conserved noncoding regions: Topics by Science.gov

Sample records for conserved noncoding regions

Analysis of conserved noncoding DNA in Drosophila reveals similar constraints in intergenic and intronic sequences.

PubMed

Bergman, C M; Kreitman, M

2001-08-01

Comparative genomic approaches to gene and cis-regulatory prediction are based on the principle that differential DNA sequence conservation reflects variation in functional constraint. Using this principle, we analyze noncoding sequence conservation in Drosophila for 40 loci with known or suspected cis-regulatory function encompassing >100 kb of DNA. We estimate the fraction of noncoding DNA conserved in both intergenic and intronic regions and describe the length distribution of ungapped conserved noncoding blocks. On average, 22%-26% of noncoding sequences surveyed are conserved in Drosophila, with median block length approximately 19 bp. We show that point substitution in conserved noncoding blocks exhibits transition bias as well as lineage effects in base composition, and occurs more than an order of magnitude more frequently than insertion/deletion (indel) substitution. Overall, patterns of noncoding DNA structure and evolution differ remarkably little between intergenic and intronic conserved blocks, suggesting that the effects of transcription per se contribute minimally to the constraints operating on these sequences. The results of this study have implications for the development of alignment and prediction algorithms specific to noncoding DNA, as well as for models of cis-regulatory DNA sequence evolution.
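The block-based summary in the abstract above (fraction of noncoding DNA conserved, length distribution of ungapped conserved blocks) can be illustrated with a minimal sketch. It assumes a pairwise alignment given as two equal-length strings with `-` for gaps, and it treats a "conserved block" as a maximal run of identical, gap-free columns. That definition, the function names, and the `min_len` parameter are illustrative assumptions; the abstract does not state the study's exact block criteria.

```python
from statistics import median


def ungapped_identical_blocks(seq_a, seq_b, min_len=1):
    """Return (start, length) for each maximal run of identical, gap-free columns.

    Assumption: "conserved" means perfectly identical here; the study's own
    criteria may differ.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    blocks, start = [], None
    for i, (x, y) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        match = x == y and x != "-"
        if match and start is None:
            start = i                      # a new block opens at this column
        elif not match and start is not None:
            if i - start >= min_len:
                blocks.append((start, i - start))
            start = None                   # block closed by a mismatch or gap
    if start is not None and len(seq_a) - start >= min_len:
        blocks.append((start, len(seq_a) - start))
    return blocks


def summarize(seq_a, seq_b, min_len=1):
    """Summarize block count, fraction of columns in blocks, and median length."""
    lengths = [n for _, n in ungapped_identical_blocks(seq_a, seq_b, min_len)]
    columns = len(seq_a)
    return {
        "n_blocks": len(lengths),
        "fraction_conserved": sum(lengths) / columns if columns else 0.0,
        "median_block_length": median(lengths) if lengths else 0,
    }
```

Calling `summarize(aligned_a, aligned_b, min_len=10)` on a real alignment would give per-locus figures of the same kind as those reported in the abstract, though the numbers depend heavily on the chosen block definition.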
Nucleotide sequence determination of guinea-pig casein B mRNA reveals homology with bovine and rat alpha s1 caseins and conservation of the non-coding regions of the mRNA.

PubMed Central

Hall, L; Laird, J E; Craig, R K

1984-01-01

Nucleotide sequence analysis of cloned guinea-pig casein B cDNA sequences has identified two casein B variants related to the bovine and rat alpha s1 caseins. Amino acid homology was largely confined to the known bovine or predicted rat phosphorylation sites and within the 'signal' precursor sequence. Comparison of the deduced nucleotide sequence of the guinea-pig and rat alpha s1 casein mRNA species showed greater sequence conservation in the non-coding than in the coding regions, suggesting a functional and possibly regulatory role for the non-coding regions of casein mRNA. The results provide insight into the evolution of the casein genes, and raise questions as to the role of conserved nucleotide sequences within the non-coding regions of mRNA species. PMID:6548375
RNA expression in a cartilaginous fish cell line reveals ancient 3′ noncoding regions highly conserved in vertebrates

PubMed Central

Forest, David; Nishikawa, Ryuhei; Kobayashi, Hiroshi; Parton, Angela; Bayne, Christopher J.; Barnes, David W.

2007-01-01

We have established a cartilaginous fish cell line [Squalus acanthias embryo cell line (SAE)], a mesenchymal stem cell line derived from the embryo of an elasmobranch, the spiny dogfish shark S. acanthias. Elasmobranchs (sharks and rays) first appeared >400 million years ago, and existing species provide useful models for comparative vertebrate cell biology, physiology, and genomics. Comparative vertebrate genomics among evolutionarily distant organisms can provide sequence conservation information that facilitates identification of critical coding and noncoding regions. Although these genomic analyses are informative, experimental verification of functions of genomic sequences depends heavily on cell culture approaches. Using ESTs defining mRNAs derived from the SAE cell line, we identified lengthy and highly conserved gene-specific nucleotide sequences in the noncoding 3′ UTRs of eight genes involved in the regulation of cell growth and proliferation. Conserved noncoding 3′ mRNA regions detected by using the shark nucleotide sequences as a starting point were found in a range of other vertebrate orders, including bony fish, birds, amphibians, and mammals. Nucleotide identity of shark and human in these regions was remarkably well conserved. Our results indicate that highly conserved gene sequences dating from the appearance of jawed vertebrates and representing potential cis-regulatory elements can be identified through the use of cartilaginous fish as a baseline. Because the expression of genes in the SAE cell line was prerequisite for their identification, this cartilaginous fish culture system also provides a physiologically valid tool to test functional hypotheses on the role of these ancient conserved sequences in comparative cell biology. PMID:17227856
Highly conserved elements discovered in vertebrates are present in non-syntenic loci of tunicates, act as enhancers and can be transcribed during development

PubMed Central

Sanges, Remo; Hadzhiev, Yavor; Gueroult-Bellone, Marion; Roure, Agnes; Ferg, Marco; Meola, Nicola; Amore, Gabriele; Basu, Swaraj; Brown, Euan R.; De Simone, Marco; Petrera, Francesca; Licastro, Danilo; Strähle, Uwe; Banfi, Sandro; Lemaire, Patrick; Birney, Ewan; Müller, Ferenc; Stupka, Elia

2013-01-01

Co-option of cis-regulatory modules has been suggested as a mechanism for the evolution of expression sites during development. However, the extent and mechanisms involved in mobilization of cis-regulatory modules remain elusive. To trace the history of non-coding elements, which may represent candidate ancestral cis-regulatory modules affirmed during chordate evolution, we have searched for conserved elements in tunicate and vertebrate (Olfactores) genomes. We identified, for the first time, 183 non-coding sequences that are highly conserved between the two groups. Our results show that all but one element are conserved in non-syntenic regions between vertebrate and tunicate genomes, while being syntenic among vertebrates. Nevertheless, in all the groups, they are significantly associated with transcription factors showing specific functions fundamental to animal development, such as multicellular organism development and sequence-specific DNA binding. The majority of these regions map onto ultraconserved elements and we demonstrate that they can act as functional enhancers within the organism of origin, as well as in cross-transgenesis experiments, and that they are transcribed in extant species of Olfactores. We refer to the elements as ‘Olfactores conserved non-coding elements’. PMID:23393190
Variation in conserved non-coding sequences on chromosome 5q and susceptibility to asthma and atopy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Donfack, Joseph; Schneider, Daniel H.; Tan, Zheng

2005-09-10

Background: Evolutionarily conserved sequences likely have biological function. Methods: To determine whether variation in conserved sequences in non-coding DNA contributes to risk for human disease, we studied six conserved non-coding elements in the Th2 cytokine cluster on human chromosome 5q31 in a large Hutterite pedigree and in samples of outbred European American and African American asthma cases and controls. Results: Among six conserved non-coding elements (>100 bp, >70 percent identity; human-mouse comparison), we identified one single nucleotide polymorphism (SNP) in each of two conserved elements and six SNPs in the flanking regions of three conserved elements. We genotyped our samples for four of these SNPs and an additional three SNPs each in the IL13 and IL4 genes. While there was only modest evidence for association with single SNPs in the Hutterite and European American samples (P<0.05), there were highly significant associations in European Americans between asthma and haplotypes comprised of SNPs in the IL4 gene (P<0.001), including a SNP in a conserved non-coding element. Furthermore, variation in the IL13 gene was strongly associated with total IgE (P = 0.00022) and allergic sensitization to mold allergens (P = 0.00076) in the Hutterites, and more modestly associated with sensitization to molds in the European Americans and African Americans (P<0.01). Conclusion: These results indicate that there is overall little variation in the conserved non-coding elements on 5q31, but variation in IL4 and IL13, including possibly one SNP in a conserved element, influences asthma and atopic phenotypes in diverse populations.
Interpreting Mammalian Evolution using Fugu Genome Comparisons

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stubbs, L; Ovcharenko, I; Loots, G G

2004-04-02

Comparative sequence analysis of the human and the pufferfish Fugu rubripes (fugu) genomes has revealed several novel functional coding and noncoding regions in the human genome. In particular, the fugu genome has been extremely valuable for identifying transcriptional regulatory elements in human loci harboring unusually high levels of evolutionary conservation to rodent genomes. In such regions, the large evolutionary distance between human and fishes provides an additional filter through which functional noncoding elements can be detected with high efficiency.
The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans

PubMed Central

Shoguchi, Eiichi; Shinzato, Chuya; Hisata, Kanako; Satoh, Nori; Mungpakdee, Sutada

2015-01-01

Even though mitochondrial genomes, which characterize eukaryotic cells, were first discovered more than 50 years ago, mitochondrial genomics remains an important topic in molecular biology and genome sciences. The Phylum Alveolata comprises three major groups (ciliates, apicomplexans, and dinoflagellates), the mitochondrial genomes of which have diverged widely. Even though the gene content of dinoflagellate mitochondrial genomes is reportedly comparable to that of apicomplexans, the highly fragmented and rearranged genome structures of dinoflagellates have frustrated whole genomic analysis. Consequently, noncoding sequences and gene arrangements of dinoflagellate mitochondrial genomes have not been well characterized. Here we report that the continuous assembled genome (∼326 kb) of the dinoflagellate, Symbiodinium minutum, is AT-rich (∼64.3%) and that it contains three protein-coding genes. Based upon in silico analysis, the remaining 99% of the genome comprises transcriptomic noncoding sequences. RNA edited sites and unique, possible start and stop codons clarify conserved regions among dinoflagellates. Our massive transcriptome analysis shows that almost all regions of the genome are transcribed, including 27 possible fragmented ribosomal RNA genes and 12 uncharacterized small RNAs that are similar to mitochondrial RNA genes of the malarial parasite, Plasmodium falciparum. Gene map comparisons show that gene order is only slightly conserved between S. minutum and P. falciparum. However, small RNAs and intergenic sequences share sequence similarities with P. falciparum, suggesting that the function of noncoding sequences has been preserved despite development of very different genome structures. PMID:26199191
A comparative genomics strategy for targeted discovery of single-nucleotide polymorphisms and conserved-noncoding sequences in orphan crops.

PubMed

Feltus, F A; Singh, H P; Lohithaswa, H C; Schulze, S R; Silva, T D; Paterson, A H

2006-04-01

Completed genome sequences provide templates for the design of genome analysis tools in orphan species lacking sequence information. To demonstrate this principle, we designed 384 PCR primer pairs to conserved exonic regions flanking introns, using Sorghum/Pennisetum expressed sequence tag alignments to the Oryza genome. Conserved-intron scanning primers (CISPs) amplified single-copy loci at 37% to 80% success rates in taxa that sample much of the approximately 50-million years of Poaceae divergence. While the conserved nature of exons fostered cross-taxon amplification, the lesser evolutionary constraints on introns enhanced single-nucleotide polymorphism detection. For example, in eight rice (Oryza sativa) genotypes, polymorphism averaged 12.1 per kb in introns but only 3.6 per kb in exons. Curiously, among 124 CISPs evaluated across Oryza, Sorghum, Pennisetum, Cynodon, Eragrostis, Zea, Triticum, and Hordeum, 23 (18.5%) seemed to be subject to rigid intron size constraints that were independent of per-nucleotide DNA sequence variation. Furthermore, we identified 487 conserved-noncoding sequence motifs in 129 CISP loci. A large CISP set (6,062 primer pairs, amplifying introns from 1,676 genes) designed using an automated pipeline showed generally higher abundance in recombinogenic than in nonrecombinogenic regions of the rice genome, thus providing relatively even distribution along genetic maps. CISPs are an effective means to explore poorly characterized genomes for both DNA polymorphism and noncoding sequence conservation on a genome-wide or candidate gene basis, and also provide anchor points for comparative genomics across a diverse range of species.
A Comparative Genomics Strategy for Targeted Discovery of Single-Nucleotide Polymorphisms and Conserved-Noncoding Sequences in Orphan Crops

PubMed Central

Feltus, F.A.; Singh, H.P.; Lohithaswa, H.C.; Schulze, S.R.; Silva, T.D.; Paterson, A.H.

2006-01-01

Completed genome sequences provide templates for the design of genome analysis tools in orphan species lacking sequence information. To demonstrate this principle, we designed 384 PCR primer pairs to conserved exonic regions flanking introns, using Sorghum/Pennisetum expressed sequence tag alignments to the Oryza genome. Conserved-intron scanning primers (CISPs) amplified single-copy loci at 37% to 80% success rates in taxa that sample much of the approximately 50-million years of Poaceae divergence. While the conserved nature of exons fostered cross-taxon amplification, the lesser evolutionary constraints on introns enhanced single-nucleotide polymorphism detection. For example, in eight rice (Oryza sativa) genotypes, polymorphism averaged 12.1 per kb in introns but only 3.6 per kb in exons. Curiously, among 124 CISPs evaluated across Oryza, Sorghum, Pennisetum, Cynodon, Eragrostis, Zea, Triticum, and Hordeum, 23 (18.5%) seemed to be subject to rigid intron size constraints that were independent of per-nucleotide DNA sequence variation. Furthermore, we identified 487 conserved-noncoding sequence motifs in 129 CISP loci. A large CISP set (6,062 primer pairs, amplifying introns from 1,676 genes) designed using an automated pipeline showed generally higher abundance in recombinogenic than in nonrecombinogenic regions of the rice genome, thus providing relatively even distribution along genetic maps. CISPs are an effective means to explore poorly characterized genomes for both DNA polymorphism and noncoding sequence conservation on a genome-wide or candidate gene basis, and also provide anchor points for comparative genomics across a diverse range of species. PMID:16607031
Automated conserved non-coding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses

PubMed Central

Turco, Gina; Schnable, James C.; Pedersen, Brent; Freeling, Michael

2013-01-01

Conserved non-coding sequences (CNS) are islands of non-coding sequence that, like protein coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNS in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of arabidopsis, Japonica rice, foxtail millet, sorghum, brachypodium, and maize. PMID:23874343
The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans.

PubMed

Shoguchi, Eiichi; Shinzato, Chuya; Hisata, Kanako; Satoh, Nori; Mungpakdee, Sutada

2015-07-20

Even though mitochondrial genomes, which characterize eukaryotic cells, were first discovered more than 50 years ago, mitochondrial genomics remains an important topic in molecular biology and genome sciences. The Phylum Alveolata comprises three major groups (ciliates, apicomplexans, and dinoflagellates), the mitochondrial genomes of which have diverged widely. Even though the gene content of dinoflagellate mitochondrial genomes is reportedly comparable to that of apicomplexans, the highly fragmented and rearranged genome structures of dinoflagellates have frustrated whole genomic analysis. Consequently, noncoding sequences and gene arrangements of dinoflagellate mitochondrial genomes have not been well characterized. Here we report that the continuous assembled genome (∼326 kb) of the dinoflagellate, Symbiodinium minutum, is AT-rich (∼64.3%) and that it contains three protein-coding genes. Based upon in silico analysis, the remaining 99% of the genome comprises transcriptomic noncoding sequences. RNA edited sites and unique, possible start and stop codons clarify conserved regions among dinoflagellates. Our massive transcriptome analysis shows that almost all regions of the genome are transcribed, including 27 possible fragmented ribosomal RNA genes and 12 uncharacterized small RNAs that are similar to mitochondrial RNA genes of the malarial parasite, Plasmodium falciparum. Gene map comparisons show that gene order is only slightly conserved between S. minutum and P. falciparum. However, small RNAs and intergenic sequences share sequence similarities with P. falciparum, suggesting that the function of noncoding sequences has been preserved despite development of very different genome structures. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Control of seed dormancy in Arabidopsis by a cis-acting noncoding antisense transcript.

PubMed

Fedak, Halina; Palusinska, Malgorzata; Krzyczmonik, Katarzyna; Brzezniak, Lien; Yatusevich, Ruslan; Pietras, Zbigniew; Kaczanowski, Szymon; Swiezewski, Szymon

2016-11-29

Seed dormancy is one of the most crucial process transitions in a plant's life cycle. Its timing is tightly controlled by the expression level of the Delay of Germination 1 gene (DOG1). DOG1 is the major quantitative trait locus for seed dormancy in Arabidopsis and has been shown to control dormancy in many other plant species. This is reflected by the evolutionary conservation of the functional short alternatively polyadenylated form of the DOG1 mRNA. Notably, the 3' region of DOG1, including the last exon that is not included in this transcript isoform, shows a high level of conservation at the DNA level, but the encoded polypeptide is poorly conserved. Here, we demonstrate that this region of DOG1 contains a promoter for the transcription of a noncoding antisense RNA, asDOG1, that is 5' capped, polyadenylated, and relatively stable. This promoter is autonomous and asDOG1 has an expression profile that is different from known DOG1 transcripts. Using several approaches we show that asDOG1 strongly suppresses DOG1 expression during seed maturation in cis, but is unable to do so in trans. Therefore, the negative regulation of seed dormancy by asDOG1 in cis results in allele-specific suppression of DOG1 expression and promotes germination. Given the evolutionary conservation of the asDOG1 promoter, we propose that this cis-constrained noncoding RNA-mediated mechanism limiting the duration of seed dormancy functions across the Brassicaceae.
Comparative sequence analysis of the X-inactivation center region in mouse, human, and bovine.

PubMed

Chureau, Corinne; Prissette, Marine; Bourdet, Agnès; Barbe, Valérie; Cattolico, Laurence; Jones, Louis; Eggen, André; Avner, Philip; Duret, Laurent

2002-06-01

We have sequenced to high levels of accuracy 714-kb and 233-kb regions of the mouse and bovine X-inactivation centers (Xic), respectively, centered on the Xist gene. This has provided the basis for a fully annotated comparative analysis of the mouse Xic with the 2.3-Mb orthologous region in human and has allowed a three-way species comparison of the core central region, including the Xist gene. These comparisons have revealed conserved genes, both coding and noncoding, conserved CpG islands and, more surprisingly, conserved pseudogenes. The distribution of repeated elements, especially LINE repeats, in the mouse Xic region when compared to the rest of the genome does not support the hypothesis of a role for these repeat elements in the spreading of X inactivation. Interestingly, an asymmetric distribution of LINE elements on the two DNA strands was observed in the three species, not only within introns but also in intergenic regions. This feature is suggestive of important transcriptional activity within these intergenic regions. In silico prediction followed by experimental analysis has allowed four new genes, Cnbp2, Ftx, Jpx, and Ppnx, to be identified and novel, widespread, complex, and apparently noncoding transcriptional activity to be characterized in a region 5' of Xist that was recently shown to attract histone modification early after the onset of X inactivation.
Natural Selection and Functional Potentials of Human Noncoding Elements Revealed by Analysis of Next Generation Sequencing Data

PubMed Central

Xu, Shuhua

2015-01-01

Noncoding DNA sequences (NCS) have attracted much attention recently due to their functional potentials. Here we attempted to reveal the functional roles of noncoding sequences from the point of view of natural selection that typically indicates the functional potentials of certain genomic elements. We analyzed nearly 37 million single nucleotide polymorphisms (SNPs) of Phase I data of the 1000 Genomes Project. We estimated a series of key parameters of population genetics and molecular evolution to characterize sequence variations of the noncoding genome within and between populations, and identified the natural selection footprints in NCS in worldwide human populations. Our results showed that purifying selection is prevalent and there is substantial constraint of variations in NCS, while positive selection is more likely to be specific to some particular genomic regions and regional populations. Intriguingly, we observed a larger fraction of non-conserved NCS variants with lower derived allele frequency in the genome, indicating possible functional gain of non-conserved NCS. Notably, NCS elements are enriched for potentially functional markers such as eQTLs, TF motif, and DNase I footprints in the genome. More interestingly, some NCS variants associated with diseases such as Alzheimer's disease, Type 1 diabetes, and immune-related bowel disorder (IBD) showed signatures of positive selection, although the majority of NCS variants, reported as risk alleles by genome-wide association studies, showed signatures of negative selection. Our analyses provided compelling evidence of natural selection forces on noncoding sequences in the human genome and advanced our understanding of their functional potentials that play important roles in disease etiology and human evolution. PMID:26053627
Trichodesmium genome maintains abundant, widespread noncoding DNA in situ, despite oligotrophic lifestyle

DOE PAGES

Walworth, Nathan; Pfreundt, Ulrike; Nelson, William C.; ...

2015-03-23

Understanding the evolution of the free-living, cyanobacterial, diazotroph Trichodesmium is of great importance because of its critical role in oceanic biogeochemistry and primary production. Unlike the other >150 available genomes of free-living cyanobacteria, only 63.8% of the Trichodesmium erythraeum (strain IMS101) genome is predicted to encode protein, which is 20–25% less than the average for other cyanobacteria and nonpathogenic, free-living bacteria. In this paper, we use distinctive isolates and metagenomic data to show that low coding density observed in IMS101 is a common feature of the Trichodesmium genus, both in culture and in situ. Transcriptome analysis indicates that 86% of the noncoding space is expressed, although the function of these transcripts is unclear. The density of noncoding, possible regulatory elements predicted in Trichodesmium, when normalized per intergenic kilobase, was comparable and twofold higher than that found in the gene-dense genomes of the sympatric cyanobacterial genera Synechococcus and Prochlorococcus, respectively. Conserved Trichodesmium noncoding RNA secondary structures were predicted between most culture and metagenomic sequences, lending support to the structural conservation. Conservation of these intergenic regions in spatiotemporally separated Trichodesmium populations suggests possible genus-wide selection for their maintenance. These large intergenic spacers may have developed during intervals of strong genetic drift caused by periodic blooms of a subset of genotypes, which may have reduced effective population size. Finally, our data suggest that transposition of selfish DNA, low effective population size, and high-fidelity replication allowed the unusual “inflation” of noncoding sequence observed in Trichodesmium despite its oligotrophic lifestyle.
Identification and Characterization of Small Noncoding RNAs in Genome Sequences of the Edible Fungus Pleurotus ostreatus

PubMed Central

Zhao, Mengran; Hsiang, Tom; Feng, Xiaoxing

2016-01-01

Noncoding RNAs (ncRNAs) have been identified in many fungi. However, no genome-scale identification of ncRNAs has been inventoried for basidiomycetes. In this research, we detected 254 small noncoding RNAs (sncRNAs) in a genome assembly of an isolate (CCEF00389) of Pleurotus ostreatus, which is a widely cultivated edible basidiomycetous fungus worldwide. The identified sncRNAs include snRNAs, snoRNAs, tRNAs, and miRNAs. SnRNA U1 was not found in CCEF00389 genome assembly and some other basidiomycetous genomes by BLASTn. This implies that if snRNA U1 of basidiomycetes exists, it has a sequence that varies significantly from other organisms. By analyzing the distribution of sncRNA loci, we found that snRNAs and most tRNAs (88.6%) were located in pseudo-UTR regions, while miRNAs are commonly found in introns. To analyze the evolutionary conservation of the sncRNAs in P. ostreatus, we aligned all 254 sncRNAs to the genome assemblies of some other Agaricomycotina fungi. The results suggest that most sncRNAs (77.56%) were highly conserved in P. ostreatus, and 20% were conserved in Agaricomycotina fungi. These findings indicate that most sncRNAs of P. ostreatus were not conserved across Agaricomycotina fungi. PMID:27703969
Identification of coding and non-coding mutational hotspots in cancer genomes.

PubMed

Piraino, Scott W; Furney, Simon J

2017-01-05

The identification of mutations that play a causal role in tumour development, so called "driver" mutations, is of critical importance for understanding how cancers form and how they might be treated. Several large cancer sequencing projects have identified genes that are recurrently mutated in cancer patients, suggesting a role in tumourigenesis. While the landscape of coding drivers has been extensively studied and many of the most prominent driver genes are well characterised, comparatively less is known about the role of mutations in the non-coding regions of the genome in cancer development. The continuing fall in genome sequencing costs has resulted in a concomitant increase in the number of cancer whole genome sequences being produced, facilitating systematic interrogation of both the coding and non-coding regions of cancer genomes. To examine the mutational landscapes of tumour genomes we have developed a novel method to identify mutational hotspots in tumour genomes using both mutational data and information on evolutionary conservation. We have applied our methodology to over 1300 whole cancer genomes and show that it identifies prominent coding and non-coding regions that are known or highly suspected to play a role in cancer. Importantly, we applied our method to the entire genome, rather than relying on predefined annotations (e.g. promoter regions) and we highlight recurrently mutated regions that may have resulted from increased exposure to mutational processes rather than selection, some of which have been identified previously as targets of selection. Finally, we implicate several pan-cancer and cancer-specific candidate non-coding regions, which could be involved in tumourigenesis. We have developed a framework to identify mutational hotspots in cancer genomes, which is applicable to the entire genome. This framework identifies known and novel coding and non-coding mutational hotspots and can be used to differentiate candidate driver regions from likely passenger regions susceptible to somatic mutation.
Reptiles and mammals have differentially retained long conserved noncoding sequences from the amniote ancestor.

PubMed

Janes, D E; Chapus, C; Gondo, Y; Clayton, D F; Sinha, S; Blatti, C A; Organ, C L; Fujita, M K; Balakrishnan, C N; Edwards, S V

2011-01-01

Many noncoding regions of genomes appear to be essential to genome function. Conservation of large numbers of noncoding sequences has been reported repeatedly among mammals but not thus far among birds and reptiles. By searching genomes of chicken (Gallus gallus), zebra finch (Taeniopygia guttata), and green anole (Anolis carolinensis), we quantified the conservation among birds and reptiles and across amniotes of long, conserved noncoding sequences (LCNS), which we define as sequences ≥500 bp in length and exhibiting ≥95% similarity between species. We found 4,294 LCNS shared between chicken and zebra finch and 574 LCNS shared by the two birds and Anolis. The percent of genomes comprised by LCNS in the two birds (0.0024%) is notably higher than the percent in mammals (<0.0003% to <0.001%), differences that we show may be explained in part by differences in genome-wide substitution rates. We reconstruct a large number of LCNS for the amniote ancestor (ca. 8,630) and hypothesize differential loss and substantial turnover of these sites in descendent lineages. By contrast, we estimated a small role for recruitment of LCNS via acquisition of novel functions over time. Across amniotes, LCNS are significantly enriched with transcription factor binding sites for many developmental genes, and 2.9% of LCNS shared between the two birds show evidence of expression in brain expressed sequence tag databases. These results show that the rate of retention of LCNS from the amniote ancestor differs between mammals and Reptilia (including birds) and that this may reflect differing roles and constraints in gene regulation.
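The LCNS definition quoted above (≥500 bp in length, ≥95% similarity between species) can be written as a simple threshold test. The sketch below is only an illustration under stated assumptions: it takes a pairwise alignment as two equal-length gapped strings, interprets "similarity" as percent identity over alignment columns with gaps counted as mismatches, and measures length as the shorter ungapped sequence. The function names and these interpretations are hypothetical and are not taken from the study.

```python
def percent_identity(aln_a, aln_b):
    """Percent of alignment columns with identical bases (gaps count as mismatches)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if not aln_a:
        return 0.0
    same = sum(
        1 for x, y in zip(aln_a.upper(), aln_b.upper()) if x == y and x != "-"
    )
    return 100.0 * same / len(aln_a)


def is_lcns(aln_a, aln_b, min_len=500, min_identity=95.0):
    """Apply the length and similarity thresholds from the abstract.

    Assumption: length is the shorter of the two ungapped sequences.
    """
    ungapped_len = min(len(aln_a.replace("-", "")), len(aln_b.replace("-", "")))
    return ungapped_len >= min_len and percent_identity(aln_a, aln_b) >= min_identity
```

The same predicate covers other threshold-based definitions of conserved noncoding elements by changing `min_len` and `min_identity`.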
Reptiles and Mammals Have Differentially Retained Long Conserved Noncoding Sequences from the Amniote Ancestor

PubMed Central

Janes, D.E.; Chapus, C.; Gondo, Y.; Clayton, D.F.; Sinha, S.; Blatti, C.A.; Organ, C.L.; Fujita, M.K.; Balakrishnan, C.N.; Edwards, S.V.

2010-01-01

Many noncoding regions of genomes appear to be essential to genome function. Conservation of large numbers of noncoding sequences has been reported repeatedly among mammals but not thus far among birds and reptiles. By searching genomes of chicken (Gallus gallus), zebra finch (Taeniopygia guttata), and green anole (Anolis carolinensis), we quantified the conservation among birds and reptiles and across amniotes of long, conserved noncoding sequences (LCNS), which we define as sequences ≥500 bp in length and exhibiting ≥95% similarity between species. We found 4,294 LCNS shared between chicken and zebra finch and 574 LCNS shared by the two birds and Anolis. The percent of genomes comprised by LCNS in the two birds (0.0024%) is notably higher than the percent in mammals (<0.0003% to <0.001%), differences that we show may be explained in part by differences in genome-wide substitution rates. We reconstruct a large number of LCNS for the amniote ancestor (ca. 8,630) and hypothesize differential loss and substantial turnover of these sites in descendent lineages. By contrast, we estimated a small role for recruitment of LCNS via acquisition of novel functions over time. Across amniotes, LCNS are significantly enriched with transcription factor binding sites for many developmental genes, and 2.9% of LCNS shared between the two birds show evidence of expression in brain expressed sequence tag databases. These results show that the rate of retention of LCNS from the amniote ancestor differs between mammals and Reptilia (including birds) and that this may reflect differing roles and constraints in gene regulation. PMID:21183607
The Intolerance of Regulatory Sequence to Genetic Variation Predicts Gene Dosage Sensitivity

PubMed Central

Wang, Quanli; Halvorsen, Matt; Han, Yujun; Weir, William H.; Allen, Andrew S.; Goldstein, David B.

2015-01-01

Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene’s proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene’s regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen’s Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. 
In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance, ncCADD and ncGWAVA, and find both scores are significantly predictive of human dosage sensitive genes and appear to carry information beyond conservation, as assessed by ncGERP. These results highlight that the intolerance of noncoding sequence stretches in the human genome can provide a critical complementary tool to other genome annotation approaches to help identify the parts of the human genome increasingly likely to harbor mutations that influence risk of disease. PMID:26332131
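The abstract above describes ncRVIS as a comparison of observed and predicted levels of standing variation in a gene's regulatory sequence. One way to picture a residual-based score of this kind is a least-squares fit across genes, with each gene scored by its residual. The Python sketch below is an illustration under assumptions: the choice of predictor (total variant count), the plain standardization of residuals, and the gene names and counts are all hypothetical, and the published method may differ in each of these respects.

```python
from statistics import mean, pstdev


def residual_scores(total, common):
    """Fit common ~ intercept + slope * total by least squares.

    Returns the residuals divided by their population standard deviation.
    A negative score means fewer common variants than predicted, which is
    read here as intolerance to variation.
    """
    mean_x, mean_y = mean(total), mean(common)
    sxx = sum((x - mean_x) ** 2 for x in total)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(total, common))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(total, common)]
    spread = pstdev(residuals)
    return [r / spread for r in residuals]


# Hypothetical genes with made-up variant counts in their regulatory sequence.
genes = ["geneA", "geneB", "geneC", "geneD"]
total_variants = [10, 20, 30, 40]
common_variants = [2, 5, 3, 9]

scores = residual_scores(total_variants, common_variants)
most_intolerant = genes[scores.index(min(scores))]
print(most_intolerant)
```

The design point is that the score is relative: a gene is flagged because it carries less common variation than other genes with a similar total, not because its raw count is low.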

Human Variation in Short Regions Predisposed to Deep Evolutionary Conservation

PubMed Central

Loots, Gabriela G.; Ovcharenko, Ivan

2010-01-01

The landscape of the human genome consists of millions of short islands of conservation that are 100% conserved across multiple vertebrate genomes (termed “bricks”), the majority of which are located in noncoding regions. Several hundred thousand bricks are deeply conserved reaching the genomes of amphibians and fish. Deep phylogenetic conservation of noncoding DNA has been reported to be strongly associated with the presence of gene regulatory elements, introducing bricks as a proxy to the functional noncoding landscape of the human genome. Here, we report a significant overrepresentation of bricks in the promoters of transcription factors and developmental genes, where the high level of phylogenetic conservation correlates with an increase in brick overrepresentation. We also found that the presence of a brick dictates a predisposition to evolutionary constraint, with only 0.7% of the amniota brick central nucleotides being diverged within the primate lineage—an 11-fold reduction in the divergence rate compared with random expectation. Human single-nucleotide polymorphism (SNP) data explains only 3% of primate-specific variation in amniota bricks, thus arguing for a widespread fixation of brick mutations within the primate lineage and prior to human radiation. This variation, in turn, might have been utilized as a driving force for primate- and hominoid-specific adaptation. We also discovered a pronounced deviation from the evolutionary predisposition in the human lineage, with over 20-fold increase in the substitution rate at brick SNP sites over expected values. In addition, contrary to typical brick mutations, brick variation commonly encountered in the human population displays limited, if any, signatures of negative selection as measured by the minor allele frequency and population differentiation (F-statistical measure) measures. 
These observations argue for the plasticity of gene regulatory mechanisms in vertebrates—with evidence of strong purifying selection acting on the gene regulatory landscape of the human genome, where widespread advantageous mutations in putative regulatory elements are likely utilized in functional diversification and adaptation of species. PMID:20093432
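The "bricks" described above are short stretches that are 100% conserved across multiple genomes. Given a multiple sequence alignment, such stretches can be located by scanning for runs of gap-free columns in which every sequence carries the same base. The Python sketch below illustrates that scan. The function name, the `min_len` parameter, and the toy alignment are hypothetical, and the abstract does not state the authors' actual procedure or length cutoff.

```python
def conserved_runs(aligned, min_len=1):
    """Return half-open (start, end) column ranges that are fully conserved.

    A column counts as conserved when all sequences share one base and none
    has a gap ('-'). Runs shorter than min_len are discarded.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    runs = []
    start = None
    for i in range(length):
        column = {seq[i] for seq in aligned}
        identical = len(column) == 1 and "-" not in column
        if identical and start is None:
            start = i                      # a new run begins
        elif not identical and start is not None:
            if i - start >= min_len:
                runs.append((start, i))    # close the current run
            start = None
    if start is not None and length - start >= min_len:
        runs.append((start, length))       # run extends to the last column
    return runs


# Toy alignment of three species (illustrative only).
alignment = [
    "ACGTTGCA-ACGT",
    "ACGTAGCA-ACGT",
    "ACGTTGCATACGT",
]
print(conserved_runs(alignment, min_len=4))
```

With `min_len=4`, the three-column conserved stretch in the middle of the toy alignment is dropped and only the two four-column runs at the ends remain.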
Genomic Sequence around Butterfly Wing Development Genes: Annotation and Comparative Analysis

PubMed Central

Conceição, Inês C.; Long, Anthony D.; Gruber, Jonathan D.; Beldade, Patrícia

2011-01-01

Background: Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. Methodology/Principal Findings: We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). Conclusions: The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high conservation of non-coding sequence around the genes wingless and Ecdysone receptor, both involved in multiple developmental processes including wing pattern formation. PMID:21909358
Using the NCBI Genome Databases to Compare the Genes for Human & Chimpanzee Beta Hemoglobin

ERIC Educational Resources Information Center

Offner, Susan

2010-01-01

The beta hemoglobin protein is identical in humans and chimpanzees. In this tutorial, students see that even though the proteins are identical, the genes that code for them are not. There are many more differences in the introns than in the exons, which indicates that coding regions of DNA are more highly conserved than non-coding regions.
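The comparison described in the tutorial above, counting differences in exons versus introns between two aligned gene sequences, can be sketched in a few lines of Python. The sequences and exon coordinates below are invented for illustration and are not the human or chimpanzee beta hemoglobin sequences. The function name is likewise hypothetical.

```python
def mismatch_rates(seq_a, seq_b, exons):
    """Return (exon_rate, intron_rate) for two aligned, equal-length sequences.

    exons is a list of half-open (start, end) index ranges. Every position
    outside those ranges is treated as intron. Each rate is the fraction of
    positions in that class where the two sequences differ.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    in_exon = [False] * len(seq_a)
    for start, end in exons:
        for i in range(start, end):
            in_exon[i] = True

    # Per class: [number of mismatches, number of positions]
    counts = {True: [0, 0], False: [0, 0]}
    for base_a, base_b, exonic in zip(seq_a, seq_b, in_exon):
        counts[exonic][1] += 1
        if base_a != base_b:
            counts[exonic][0] += 1

    def rate(pair):
        mismatches, positions = pair
        return mismatches / positions if positions else 0.0

    return rate(counts[True]), rate(counts[False])


# Invented toy sequences: two exons flanking one intron (indices 6 to 11).
species_1 = "ATGGCCGTAAGTCTGA"
species_2 = "ATGGCCGTTAGACTGA"
exon_rate, intron_rate = mismatch_rates(species_1, species_2, [(0, 6), (12, 16)])
print(exon_rate, intron_rate)
```

In this toy case both differences fall in the intron, mirroring the pattern the tutorial asks students to observe.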
Molecular Evolution of the Non-Coding Eosinophil Granule Ontogeny Transcript

PubMed Central

Rose, Dominic; Stadler, Peter F.

2011-01-01

Eukaryotic genomes are pervasively transcribed. A large fraction of the transcriptional output consists of long, mRNA-like, non-protein-coding transcripts (mlncRNAs). The evolutionary history of mlncRNAs is still largely uncharted territory. In this contribution, we explore in detail the evolutionary traces of the eosinophil granule ontogeny transcript (EGOT), an experimentally confirmed representative of an abundant class of totally intronic non-coding transcripts (TINs). EGOT is located antisense to an intron of the ITPR1 gene. We computationally identify putative EGOT orthologs in the genomes of 32 different amniotes, including orthologs from primates, rodents, ungulates, carnivores, afrotherians, and xenarthrans, as well as putative candidates from basal amniotes, such as opossum or platypus. We investigate the EGOT gene phylogeny, analyze patterns of sequence conservation, and the evolutionary conservation of the EGOT gene structure. We show that EGO-B, the spliced isoform, may be present throughout the placental mammals, but most likely dates back even further. We demonstrate here for the first time that the whole EGOT locus is highly structured, containing several evolutionary conserved, and thermodynamic stable secondary structures. Our analyses allow us to postulate novel functional roles of a hitherto poorly understood region at the intron of EGO-B which is highly conserved at the sequence level. The region contains a novel ITPR1 exon and also conserved RNA secondary structures together with a conserved TATA-like element, which putatively acts as a promoter of an independent regulatory element. PMID:22303364
RNAcode: Robust discrimination of coding and noncoding regions in comparative sequence data

PubMed Central

Washietl, Stefan; Findeiß, Sven; Müller, Stephan A.; Kalkhof, Stefan; von Bergen, Martin; Hofacker, Ivo L.; Stadler, Peter F.; Goldman, Nick

2011-01-01

With the availability of genome-wide transcription data and massive comparative sequencing, the discrimination of coding from noncoding RNAs and the assessment of coding potential in evolutionarily conserved regions arose as a core analysis task. Here we present RNAcode, a program to detect coding regions in multiple sequence alignments that is optimized for emerging applications not covered by current protein gene-finding software. Our algorithm combines information from nucleotide substitution and gap patterns in a unified framework and also deals with real-life issues such as alignment and sequencing errors. It uses an explicit statistical model with no machine learning component and can therefore be applied “out of the box,” without any training, to data from all domains of life. We describe the RNAcode method and apply it in combination with mass spectrometry experiments to predict and confirm seven novel short peptides in Escherichia coli and to analyze the coding potential of RNAs previously annotated as “noncoding.” RNAcode is open source software and available for all major platforms at http://wash.github.com/rnacode. PMID:21357752
RNAcode: robust discrimination of coding and noncoding regions in comparative sequence data.

PubMed

Washietl, Stefan; Findeiss, Sven; Müller, Stephan A; Kalkhof, Stefan; von Bergen, Martin; Hofacker, Ivo L; Stadler, Peter F; Goldman, Nick

2011-04-01

With the availability of genome-wide transcription data and massive comparative sequencing, the discrimination of coding from noncoding RNAs and the assessment of coding potential in evolutionarily conserved regions arose as a core analysis task. Here we present RNAcode, a program to detect coding regions in multiple sequence alignments that is optimized for emerging applications not covered by current protein gene-finding software. Our algorithm combines information from nucleotide substitution and gap patterns in a unified framework and also deals with real-life issues such as alignment and sequencing errors. It uses an explicit statistical model with no machine learning component and can therefore be applied "out of the box," without any training, to data from all domains of life. We describe the RNAcode method and apply it in combination with mass spectrometry experiments to predict and confirm seven novel short peptides in Escherichia coli and to analyze the coding potential of RNAs previously annotated as "noncoding." RNAcode is open source software and available for all major platforms at http://wash.github.com/rnacode.
Long non-coding RNA produced by RNA polymerase V determines boundaries of heterochromatin

PubMed Central

Böhmdorfer, Gudrun; Sethuraman, Shriya; Rowley, M Jordan; Krzyszton, Michal; Rothi, M Hafiz; Bouzit, Lilia; Wierzbicki, Andrzej T

2016-01-01

RNA-mediated transcriptional gene silencing is a conserved process where small RNAs target transposons and other sequences for repression by establishing chromatin modifications. A central element of this process are long non-coding RNAs (lncRNA), which in Arabidopsis thaliana are produced by a specialized RNA polymerase known as Pol V. Here we show that non-coding transcription by Pol V is controlled by preexisting chromatin modifications located within the transcribed regions. Most Pol V transcripts are associated with AGO4 but are not sliced by AGO4. Pol V-dependent DNA methylation is established on both strands of DNA and is tightly restricted to Pol V-transcribed regions. This indicates that chromatin modifications are established in close proximity to Pol V. Finally, Pol V transcription is preferentially enriched on edges of silenced transposable elements, where Pol V transcribes into TEs. We propose that Pol V may play an important role in the determination of heterochromatin boundaries. DOI: http://dx.doi.org/10.7554/eLife.19092.001 PMID:27779094
Trichodesmium genome maintains abundant, widespread noncoding DNA in situ, despite oligotrophic lifestyle

DOE PAGES

Walworth, Nathan G.; Pfreundt, Ulrike; Nelson, William C.; ...

2015-04-07

Understanding the evolution of the free-living, cyanobacterial, diazotroph Trichodesmium is of great importance due to its critical role in oceanic biogeochemistry and primary production. Unlike the other >150 available genomes of free-living cyanobacteria, only 63.8% of the Trichodesmium erythraeum (strain IMS101) genome is predicted to encode protein, which is 20-25% less than the average for other cyanobacteria and non-pathogenic, free-living bacteria. We use distinctive isolates and metagenomic data to show that low coding density observed in IMS101 is a common feature of the Trichodesmium genus both in culture and in situ. Transcriptome analysis indicates that 86% of the non-coding space is expressed, although the function of these transcripts is unclear. The density of noncoding, possible regulatory elements predicted in Trichodesmium, when normalized per intergenic kilobase, was comparable and two fold higher than that found in the gene dense genomes of the sympatric cyanobacterial genera Synechococcus and Prochlorococcus, respectively. Conserved Trichodesmium ncRNA secondary structures were predicted between most culture and metagenomic sequences lending support to the structural conservation. Conservation of these intergenic regions in spatiotemporally separated Trichodesmium populations suggests possible genus-wide selection for their maintenance. These large intergenic spacers may have developed during intervals of strong genetic drift caused by periodic blooms of a subset of genotypes, which may have reduced effective population size. Our data suggest that transposition of selfish DNA, low effective population size, and high fidelity replication allowed the unusual ‘inflation’ of noncoding sequence observed in Trichodesmium despite its oligotrophic lifestyle.
Theria-Specific Homeodomain and cis-Regulatory Element Evolution of the Dlx3–4 Bigene Cluster in 12 Different Mammalian Species

PubMed Central

SUMIYAMA, KENTA; MIYAKE, TSUTOMU; GRIMWOOD, JANE; STUART, ANDREW; DICKSON, MARK; SCHMUTZ, JEREMY; RUDDLE, FRANK H.; MYERS, RICHARD M.; AMEMIYA, CHRIS T.

2013-01-01

The mammalian Dlx3 and Dlx4 genes are configured as a bigene cluster, and their respective expression patterns are controlled temporally and spatially by cis-elements that largely reside within the intergenic region of the cluster. Previous work revealed that there are conspicuously conserved elements within the intergenic region of the Dlx3–4 bigene clusters of mouse and human. In this paper we have extended these analyses to include 12 additional mammalian taxa (including a marsupial and a monotreme) in order to better define the nature and molecular evolutionary trends of the coding and non-coding functional elements among morphologically divergent mammals. Dlx3–4 regions were fully sequenced from 12 divergent taxa of interest. We identified three theria-specific amino acid replacements in homeodomain of Dlx4 gene that functions in placenta. Sequence analyses of constrained nucleotide sites in the intergenic non-coding region showed that many of the intergenic conserved elements are highly conserved and have evolved slowly within the mammals. In contrast, a branchial arch/craniofacial enhancer I37-2 exhibited accelerated evolution at the branch between the monotreme and therian common ancestor despite being highly conserved among therian species. Functional analysis of I37-2 in transgenic mice has shown that the equivalent region of the platypus fails to drive transcriptional activity in branchial arches. These observations, taken together with our molecular evolutionary data, suggest that theria-specific episodic changes in the I37-2 element may have contributed to craniofacial innovation at the base of the mammalian lineage. PMID:22951979
Identification of long non-coding RNAs in two anthozoan species and their possible implications for coral bleaching.

PubMed

Huang, Chen; Morlighem, Jean-Étienne R L; Cai, Jing; Liao, Qiwen; Perez, Carlos Daniel; Gomes, Paula Braga; Guo, Min; Rádis-Baptista, Gandhi; Lee, Simon Ming-Yuen

2017-07-13

Long non-coding RNAs (lncRNAs) have been shown to play regulatory roles in a diverse range of biological processes and are associated with the outcomes of various diseases. The majority of studies about lncRNAs focus on model organisms, with less investigation in non-model organisms to date. Herein, we have undertaken an investigation on lncRNA in two zoanthids (cnidarian): Protopalythoa variabilis and Palythoa caribaeorum. A total of 11,206 and 13,240 lncRNAs were detected in the P. variabilis and P. caribaeorum transcriptomes, respectively. Comparison using the NONCODE database indicated that the majority of these lncRNAs are taxonomically species-restricted with no identifiable orthologs. Even so, we found cases in which short regions of P. caribaeorum's lncRNAs were similar to vertebrate species' lncRNAs, and could be associated with lncRNA conserved regulatory functions. Consequently, some high-confidence lncRNA-mRNA interactions were predicted based on such conserved regions, therefore revealing possible involvement of lncRNAs in posttranscriptional processing and regulation in anthozoans. Moreover, investigation of differentially expressed lncRNAs, in healthy colonies and colonial individuals undergoing natural bleaching, indicated that some up-regulated lncRNAs in P. caribaeorum could posttranscriptionally regulate the mRNAs encoding proteins of the Ras-mediated signal transduction pathway and components of the innate immune system, which could contribute to the molecular response of coral bleaching.
A screen for nuclear transcripts identifies two linked noncoding RNAs associated with SC35 splicing domains

PubMed Central

Hutchinson, John N; Ensminger, Alexander W; Clemson, Christine M; Lynch, Christopher R; Lawrence, Jeanne B; Chess, Andrew

2007-01-01

Background: Noncoding RNA species play a diverse set of roles in the eukaryotic cell. While much recent attention has focused on smaller RNA species, larger noncoding transcripts are also thought to be highly abundant in mammalian cells. To search for large noncoding RNAs that might control gene expression or mRNA metabolism, we used Affymetrix expression arrays to identify polyadenylated RNA transcripts displaying nuclear enrichment. Results: This screen identified no more than three transcripts: XIST, and two unique noncoding nuclear enriched abundant transcript (NEAT) RNAs strikingly located less than 70 kb apart on human chromosome 11: NEAT1, a noncoding RNA from the locus encoding TncRNA, and NEAT2 (also known as MALAT-1). While the two NEAT transcripts share no significant homology with each other, each is conserved within the mammalian lineage, suggesting significant function for these noncoding RNAs. NEAT2 is extraordinarily well conserved for a noncoding RNA, more so than even XIST. Bioinformatic analyses of publicly available mouse transcriptome data support our findings from human cells as they confirm that the murine homologs of these noncoding RNAs are also nuclear enriched. RNA FISH analyses suggest that these noncoding RNAs function in mRNA metabolism as they demonstrate an intimate association of these RNA species with SC35 nuclear speckles in both human and mouse cells. These studies show that one of these transcripts, NEAT1, localizes to the periphery of such domains, whereas the neighboring transcript, NEAT2, is part of the long-sought polyadenylated component of nuclear speckles. Conclusion: Our genome-wide screens in two mammalian species reveal no more than three abundant large non-coding polyadenylated RNAs in the nucleus: the canonical large noncoding RNA XIST, and NEAT1 and NEAT2.
The function of these noncoding RNAs in mRNA metabolism is suggested by their high levels of conservation and their intimate association with SC35 splicing domains in multiple mammalian species. PMID:17270048
Comparative Sequence Analysis of the X-Inactivation Center Region in Mouse, Human, and Bovine

PubMed Central

Chureau, Corinne; Prissette, Marine; Bourdet, Agnès; Barbe, Valérie; Cattolico, Laurence; Jones, Louis; Eggen, André; Avner, Philip; Duret, Laurent

2002-01-01

We have sequenced to high levels of accuracy 714-kb and 233-kb regions of the mouse and bovine X-inactivation centers (Xic), respectively, centered on the Xist gene. This has provided the basis for a fully annotated comparative analysis of the mouse Xic with the 2.3-Mb orthologous region in human and has allowed a three-way species comparison of the core central region, including the Xist gene. These comparisons have revealed conserved genes, both coding and noncoding, conserved CpG islands and, more surprisingly, conserved pseudogenes. The distribution of repeated elements, especially LINE repeats, in the mouse Xic region when compared to the rest of the genome does not support the hypothesis of a role for these repeat elements in the spreading of X inactivation. Interestingly, an asymmetric distribution of LINE elements on the two DNA strands was observed in the three species, not only within introns but also in intergenic regions. This feature is suggestive of important transcriptional activity within these intergenic regions. In silico prediction followed by experimental analysis has allowed four new genes, Cnbp2, Ftx, Jpx, and Ppnx, to be identified and novel, widespread, complex, and apparently noncoding transcriptional activity to be characterized in a region 5′ of Xist that was recently shown to attract histone modification early after the onset of X inactivation. [The sequence data described in this paper have been submitted to the EMBL data library under accession nos. AJ421478, AJ421479, AJ421480, and AJ421481. Online supplemental data are available at http://pbil.univ-lyon1.fr/datasets/Xic2002/data.html and www.genome.org.] PMID:12045143
Conserved Nonexonic Elements: A Novel Class of Marker for Phylogenomics.

PubMed

Edwards, Scott V; Cloutier, Alison; Baker, Allan J

2017-11-01

Noncoding markers have a particular appeal as tools for phylogenomic analysis because, at least in vertebrates, they appear less subject to strong variation in GC content among lineages. Thus far, ultraconserved elements (UCEs) and introns have been the most widely used noncoding markers. Here we analyze and study the evolutionary properties of a new type of noncoding marker, conserved nonexonic elements (CNEEs), which consists of noncoding elements that are estimated to evolve slower than the neutral rate across a set of species. Although they often include UCEs, CNEEs are distinct from UCEs because they are not ultraconserved, and, most importantly, the core region alone is analyzed, rather than both the core and its flanking regions. Using a data set of 16 birds plus an alligator outgroup, and ∼3600-∼3800 loci per marker type, we found that although CNEEs were less variable than bioinformatically derived UCEs or introns and in some cases exhibited a slower approach to branch resolution as determined by phylogenomic subsampling, the quality of CNEE alignments was superior to those of the other markers, with fewer gaps and missing species. Phylogenetic resolution using coalescent approaches was comparable among the three marker types, with most nodes being fully and congruently resolved. Comparison of phylogenetic results across the three marker types indicated that one branch, the sister group to the passerine + falcon clade, was resolved differently and with moderate (>70%) bootstrap support between CNEEs and UCEs or introns. Overall, CNEEs appear to be promising as phylogenomic markers, yielding phylogenetic resolution as high as for UCEs and introns but with fewer gaps, less ambiguity in alignments and with patterns of nucleotide substitution more consistent with the assumptions of commonly used methods of phylogenetic analysis. © The Author(s) 2017. Published by Oxford University Press on behalf of the Systematic Biologists.
Conserved Nonexonic Elements: A Novel Class of Marker for Phylogenomics

PubMed Central

Cloutier, Alison; Baker, Allan J.

2017-01-01

Noncoding markers have a particular appeal as tools for phylogenomic analysis because, at least in vertebrates, they appear less subject to strong variation in GC content among lineages. Thus far, ultraconserved elements (UCEs) and introns have been the most widely used noncoding markers. Here we analyze and study the evolutionary properties of a new type of noncoding marker, conserved nonexonic elements (CNEEs), which consists of noncoding elements that are estimated to evolve slower than the neutral rate across a set of species. Although they often include UCEs, CNEEs are distinct from UCEs because they are not ultraconserved, and, most importantly, the core region alone is analyzed, rather than both the core and its flanking regions. Using a data set of 16 birds plus an alligator outgroup, and ∼3600–∼3800 loci per marker type, we found that although CNEEs were less variable than bioinformatically derived UCEs or introns and in some cases exhibited a slower approach to branch resolution as determined by phylogenomic subsampling, the quality of CNEE alignments was superior to those of the other markers, with fewer gaps and missing species. Phylogenetic resolution using coalescent approaches was comparable among the three marker types, with most nodes being fully and congruently resolved. Comparison of phylogenetic results across the three marker types indicated that one branch, the sister group to the passerine + falcon clade, was resolved differently and with moderate (>70%) bootstrap support between CNEEs and UCEs or introns. Overall, CNEEs appear to be promising as phylogenomic markers, yielding phylogenetic resolution as high as for UCEs and introns but with fewer gaps, less ambiguity in alignments and with patterns of nucleotide substitution more consistent with the assumptions of commonly used methods of phylogenetic analysis. PMID:28637293
Arabidopsis intragenomic conserved noncoding sequence

PubMed Central

Thomas, Brian C.; Rapaka, Lakshmi; Lyons, Eric; Pedersen, Brent; Freeling, Michael

2007-01-01

After the most recent tetraploidy in the Arabidopsis lineage, most gene pairs lost one, but not both, of their duplicates. We manually inspected the 3,179 retained gene pairs and their surrounding gene space still present in the genome using a custom-made viewer application. The display of these pairs allowed us to define intragenic conserved noncoding sequences (CNSs), identify exon annotation errors, and discover potentially new genes. Using a strict algorithm to sort high-scoring pair sequences from the bl2seq data, we created a database of 14,944 intragenomic Arabidopsis CNSs. The mean CNS length is 31 bp, ranging from 15 to 285 bp. There are ≈1.7 CNSs associated with a typical gene, and Arabidopsis CNSs are found in all areas around exons, most frequently in the 5′ upstream region. Gene ontology classifications related to transcription, regulation, or “response to …” external or endogenous stimuli, especially hormones, tend to be significantly overrepresented among genes containing a large number of CNSs, whereas protein localization, transport, and metabolism are common among genes with no CNSs. There is a 1.5% overlap between these CNSs and the 218,982 putative RNAs in the Arabidopsis Small RNA Project database, allowing for two mismatches. These CNSs provide a unique set of noncoding sequences enriched for function. CNS function is implied by evolutionary conservation and independently supported because CNS-richness predicts regulatory gene ontology categories. PMID:17301222
Conserved expression of transposon-derived non-coding transcripts in primate stem cells.

PubMed

Ramsay, LeeAnn; Marchetto, Maria C; Caron, Maxime; Chen, Shu-Huang; Busche, Stephan; Kwan, Tony; Pastinen, Tomi; Gage, Fred H; Bourque, Guillaume

2017-02-28

A significant portion of expressed non-coding RNAs in human cells is derived from transposable elements (TEs). Moreover, it has been shown that various long non-coding RNAs (lncRNAs), which come from the human endogenous retrovirus subfamily H (HERVH), are not only expressed but required for pluripotency in human embryonic stem cells (hESCs). To identify additional TE-derived functional non-coding transcripts, we generated RNA-seq data from induced pluripotent stem cells (iPSCs) of four primate species (human, chimpanzee, gorilla, and rhesus) and searched for transcripts whose expression was conserved. We observed that about 30% of TE instances expressed in human iPSCs had orthologous TE instances that were also expressed in chimpanzee and gorilla. Notably, our analysis revealed a number of repeat families with highly conserved expression profiles including HERVH but also MER53, which is known to be the source of a placental-specific family of microRNAs (miRNAs). We also identified a number of repeat families from all classes of TEs, including MLT1-type and Tigger families, that contributed a significant amount of sequence to primate lncRNAs whose expression was conserved. Together, these results describe TE families and TE-derived lncRNAs whose conserved expression patterns can be used to identify what are likely functional TE-derived non-coding transcripts in primate iPSCs.
Functional formation of domain V of the poliovirus noncoding region: significance of unpaired bases.

PubMed

Rowe, A; Burlison, J; Macadam, A J; Minor, P D

2001-10-10

Previously we have shown that polioviruses with mutations that disrupt the predicted secondary structure of the 5' noncoding region of domain V are temperature sensitive for growth. Non-temperature-sensitive revertant viruses had mutations that re-formed secondary structure by a direct back mutation of changes in the opposite strand. We mutated unpaired regions and selected revertants of viruses with single base deletions, where no obvious back mutation was available in order to gain information on secondary structure. Results indicated that conservation of length of a three base loop between two double-stranded stems was essential for a functional domain V to form. The requirement for the unpaired "hinge" base at 484 which is implicated in the attenuation of Sabin 2 was also confirmed. Results also underline the necessity for functional folding over local secondary structure stability. Copyright 2001 Academic Press.
Conserved noncoding sequences (CNSs) in higher plants.

PubMed

Freeling, Michael; Subramaniam, Shabarinath

2009-04-01

Plant conserved noncoding sequences (CNSs)--a specific category of phylogenetic footprint--have been shown experimentally to function. No plant CNS is conserved to the extent that ultraconserved noncoding sequences are conserved in vertebrates. Plant CNSs are enriched in known transcription factor or other cis-acting binding sites, and are usually clustered around genes. Genes that encode transcription factors and/or those that respond to stimuli are particularly CNS-rich. Only rarely could this function involve small RNA binding. Some transcribed CNSs encode short translation products as a form of negative control. Approximately 4% of Arabidopsis gene content is estimated to be both CNS-rich and to occupy a relatively long stretch of chromosome: Bigfoot genes (long phylogenetic footprints). We discuss a 'DNA-templated protein assembly' idea that might help explain Bigfoot gene CNSs.
Genetic evidence for conserved non-coding element function across species–the ears have it

PubMed Central

Turner, Eric E.; Cox, Timothy C.

2014-01-01

Comparison of genomic sequences from diverse vertebrate species has revealed numerous highly conserved regions that do not appear to encode proteins or functional RNAs. Often these “conserved non-coding elements,” or CNEs, can direct gene expression to specific tissues in transgenic models, demonstrating they have regulatory function. CNEs are frequently found near “developmental” genes, particularly transcription factors, implying that these elements have essential regulatory roles in development. However, actual examples demonstrating CNE regulatory functions across species have been few, and recent loss-of-function studies of several CNEs in mice have shown relatively minor effects. In this Perspectives article, we discuss new findings in “fancy” rats and Highland cattle demonstrating that function of a CNE near the Hmx1 gene is crucial for normal external ear development and when disrupted can mimic loss-of-function Hmx1 coding mutations in mice and humans. These findings provide important support for conserved developmental roles of CNEs in divergent species, and reinforce the concept that CNEs should be examined systematically in the ongoing search for genetic causes of human developmental disorders in the era of genome-scale sequencing. PMID:24478720
Potential Novel Mechanism for Axenfeld-Rieger Syndrome: Deletion of a Distant Region Containing Regulatory Elements of PITX2

PubMed Central

Volkmann, Bethany A.; Zinkevich, Natalya S.; Mustonen, Aki; Schilter, Kala F.; Bosenko, Dmitry V.; Reis, Linda M.; Broeckel, Ulrich; Link, Brian A.

2011-01-01

Purpose. Mutations in PITX2 are associated with Axenfeld-Rieger syndrome (ARS), which involves ocular, dental, and umbilical abnormalities. Identification of cis-regulatory elements of PITX2 is important to better understand the mechanisms of disease. Methods. Conserved noncoding elements surrounding PITX2/pitx2 were identified and examined through transgenic analysis in zebrafish; expression pattern was studied by in situ hybridization. Patient samples were screened for deletion/duplication of the PITX2 upstream region using arrays and probes. Results. Zebrafish pitx2 demonstrates conserved expression during ocular and craniofacial development. Thirteen conserved noncoding sequences positioned within a gene desert as far as 1.1 Mb upstream of the human PITX2 gene were identified; 11 have enhancer activities consistent with pitx2 expression. Ten elements mediated expression in the developing brain, four regions were active during eye formation, and two sequences were associated with craniofacial expression. One region, CE4, located approximately 111 kb upstream of PITX2, directed a complex pattern including expression in the developing eye and craniofacial region, the classic sites affected in ARS. Screening of ARS patients identified an approximately 7600-kb deletion that began 106 to 108 kb upstream of the PITX2 gene, leaving PITX2 intact while removing regulatory elements CE4 to CE13. Conclusions. These data suggest the presence of a complex distant regulatory matrix within the gene desert located upstream of PITX2 with an essential role in its activity and provide a possible mechanism for the previous reports of ARS in patients with balanced translocations involving the 4q25 region upstream of PITX2 and the current patient with an upstream deletion. PMID:20881290

Transcriptional Regulation in Ebola Virus: Effects of Gene Border Structure and Regulatory Elements on Gene Expression and Polymerase Scanning Behavior

PubMed Central

Brauburger, Kristina; Boehmann, Yannik; Krähling, Verena

2015-01-01

ABSTRACT The highly pathogenic Ebola virus (EBOV) has a nonsegmented negative-strand (NNS) RNA genome containing seven genes. The viral genes either are separated by intergenic regions (IRs) of variable length or overlap. The structure of the EBOV gene overlaps is conserved throughout all filovirus genomes and is distinct from that of the overlaps found in other NNS RNA viruses. Here, we analyzed how diverse gene borders and noncoding regions surrounding the gene borders influence transcript levels and govern polymerase behavior during viral transcription. Transcription of overlapping genes in EBOV bicistronic minigenomes followed the stop-start mechanism, similar to that followed by IR-containing gene borders. When the gene overlaps were extended, the EBOV polymerase was able to scan the template in an upstream direction. This polymerase feature seems to be generally conserved among NNS RNA virus polymerases. Analysis of IR-containing gene borders showed that the IR sequence plays only a minor role in transcription regulation. Changes in IR length were generally well tolerated, but specific IR lengths led to a strong decrease in downstream gene expression. Correlation analysis revealed that these effects were largely independent of the surrounding gene borders. Each EBOV gene contains exceptionally long untranslated regions (UTRs) flanking the open reading frame. Our data suggest that the UTRs adjacent to the gene borders are the main regulators of transcript levels. A highly complex interplay between the different cis-acting elements to modulate transcription was revealed for specific combinations of IRs and UTRs, emphasizing the importance of the noncoding regions in EBOV gene expression control. IMPORTANCE Our data extend those from previous analyses investigating the implication of noncoding regions at the EBOV gene borders for gene expression control. 
We show that EBOV transcription is regulated in a highly complex yet not easily predictable manner by a set of interacting cis-active elements. These findings are important not only for the design of recombinant filoviruses but also for the design of other replicon systems widely used as surrogate systems to study the filovirus replication cycle under low biosafety levels. Insights into the complex regulation of EBOV transcription conveyed by noncoding sequences will also help to interpret the importance of mutations that have been detected within these regions, including in isolates of the current outbreak. PMID:26656691
Transcriptional Regulation in Ebola Virus: Effects of Gene Border Structure and Regulatory Elements on Gene Expression and Polymerase Scanning Behavior.

PubMed

Brauburger, Kristina; Boehmann, Yannik; Krähling, Verena; Mühlberger, Elke

2016-02-15

The highly pathogenic Ebola virus (EBOV) has a nonsegmented negative-strand (NNS) RNA genome containing seven genes. The viral genes either are separated by intergenic regions (IRs) of variable length or overlap. The structure of the EBOV gene overlaps is conserved throughout all filovirus genomes and is distinct from that of the overlaps found in other NNS RNA viruses. Here, we analyzed how diverse gene borders and noncoding regions surrounding the gene borders influence transcript levels and govern polymerase behavior during viral transcription. Transcription of overlapping genes in EBOV bicistronic minigenomes followed the stop-start mechanism, similar to that followed by IR-containing gene borders. When the gene overlaps were extended, the EBOV polymerase was able to scan the template in an upstream direction. This polymerase feature seems to be generally conserved among NNS RNA virus polymerases. Analysis of IR-containing gene borders showed that the IR sequence plays only a minor role in transcription regulation. Changes in IR length were generally well tolerated, but specific IR lengths led to a strong decrease in downstream gene expression. Correlation analysis revealed that these effects were largely independent of the surrounding gene borders. Each EBOV gene contains exceptionally long untranslated regions (UTRs) flanking the open reading frame. Our data suggest that the UTRs adjacent to the gene borders are the main regulators of transcript levels. A highly complex interplay between the different cis-acting elements to modulate transcription was revealed for specific combinations of IRs and UTRs, emphasizing the importance of the noncoding regions in EBOV gene expression control. Our data extend those from previous analyses investigating the implication of noncoding regions at the EBOV gene borders for gene expression control. 
We show that EBOV transcription is regulated in a highly complex yet not easily predictable manner by a set of interacting cis-active elements. These findings are important not only for the design of recombinant filoviruses but also for the design of other replicon systems widely used as surrogate systems to study the filovirus replication cycle under low biosafety levels. Insights into the complex regulation of EBOV transcription conveyed by noncoding sequences will also help to interpret the importance of mutations that have been detected within these regions, including in isolates of the current outbreak. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
Mitochondrial genome evolution in the Saccharomyces sensu stricto complex.

PubMed

Ruan, Jiangxing; Cheng, Jian; Zhang, Tongcun; Jiang, Huifeng

2017-01-01

Exploring the evolutionary patterns of mitochondrial genomes is important for our understanding of the Saccharomyces sensu stricto (SSS) group, which is a model system for genomic evolution and ecological analysis. In this study, we first obtained the complete mitochondrial sequences of two important species, Saccharomyces mikatae and Saccharomyces kudriavzevii. We then compared the mitochondrial genomes in the SSS group with those of close relatives, and found that the non-coding regions evolved rapidly, including dramatic expansion of intergenic regions, fast evolution of introns and almost 20-fold higher rearrangement rates than those of the nuclear genomes. However, the coding regions, and especially the protein-coding genes, are more conserved than those in the nuclear genomes of the SSS group. The different evolutionary patterns of coding and non-coding regions in the mitochondrial and nuclear genomes may be related to the origin of the aerobic fermentation lifestyle in this group. Our analysis thus provides novel insights into the evolution of mitochondrial genomes.
Comparison of the complete mitochondrial genome of the stonefly Sweltsa longistyla (Plecoptera: Chloroperlidae) with mitogenomes of three other stoneflies.

PubMed

Chen, Zhi-Teng; Du, Yu-Zhou

2015-03-01

The complete mitochondrial genome of the stonefly, Sweltsa longistyla Wu (Plecoptera: Chloroperlidae), was sequenced in this study. The mitogenome of S. longistyla is 16,151 bp and contains 37 genes including 13 protein-coding genes (PCGs), 22 tRNA genes, two rRNA genes, and a large non-coding region. S. longistyla, Pteronarcys princeps Banks, Kamimuria wangi Du and Cryptoperla stilifera Sivec belong to the Plecoptera, and the gene order and orientation of their mitogenomes were similar. The overall AT content for the four stoneflies was below 72%, and the AT content of tRNA genes was above 69%. The four genomes were compact and contained only 65-127 bp of non-coding intergenic DNAs. Overlapping nucleotides existed in all four genomes and ranged from 24 (P. princeps) to 178 bp (K. wangi). There was a 7-bp motif ('ATGATAA') of overlapping DNA and an 8-bp motif (AAGCCTTA) conserved in three stonefly species (P. princeps, K. wangi and C. stilifera). The control regions of four stoneflies contained a stem-loop structure. Four conserved sequence blocks (CSBs) were present in the A+T-rich regions of all four stoneflies. Copyright © 2014 Elsevier B.V. All rights reserved.
Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution.

PubMed

2004-12-09

We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.
Comparative evolutionary genomics of the HADH2 gene encoding Aβ-binding alcohol dehydrogenase/17β-hydroxysteroid dehydrogenase type 10 (ABAD/HSD10)

PubMed Central

Marques, Alexandra T; Antunes, Agostinho; Fernandes, Pedro A; Ramos, Maria J

2006-01-01

Background The Aβ-binding alcohol dehydrogenase/17β-hydroxysteroid dehydrogenase type 10 (ABAD/HSD10) is an enzyme involved in pivotal metabolic processes and in the mitochondrial dysfunction seen in Alzheimer's disease. Here we use comparative genomic analyses to study the evolution of the HADH2 gene encoding ABAD/HSD10 across several eukaryotic species. Results Both vertebrate and nematode HADH2 genes showed a six-exon/five-intron organization while those of the insects had a reduced and varied number of exons (two to three). Eutherian mammal HADH2 genes revealed some highly conserved noncoding regions, which may indicate the presence of functional elements, namely in the region about 1 kb upstream of the transcription start site and in the first part of intron 1. These regions were also conserved between Tetraodon and Fugu fishes. We identified a conserved alternative splicing event between human and dog, which results in a nine-amino-acid deletion, causing the removal of the strand βF. This strand is one of the seven strands that compose the core β-sheet of the Rossmann fold dinucleotide-binding motif characteristic of the short chain dehydrogenase/reductase (SDR) family members. However, the fact that the substrate binding cleft residues are retained and the existence of a shared variant between human and dog suggest that it might be functional. Molecular adaptation analyses across eutherian mammal orthologues revealed the existence of sites under positive selection, some of which are localized in the substrate-binding cleft and in the insertion 1 region on loop D (an important region for the Aβ-binding to the enzyme). Interestingly, a higher than expected number of nonsynonymous substitutions was observed between human/chimpanzee and orangutan, with six out of the seven amino acid replacements being under molecular adaptation (including three in loop D and one in the substrate binding loop). 
Conclusion Our study revealed that HADH2 genes maintained a reasonably conserved organization across a large evolutionary distance. The conserved noncoding regions identified among mammals and between pufferfishes, the evidence of an alternative splicing variant conserved between human and dog, and the detection of positive selection across eutherian mammals may be of importance for further research on ABAD/HSD10 function and its implication in Alzheimer's disease. PMID:16899120
Conserved noncoding sequences conserve biological networks and influence genome evolution.

PubMed

Xie, Jianbo; Qian, Kecheng; Si, Jingna; Xiao, Liang; Ci, Dong; Zhang, Deqiang

2018-05-01

Comparative genomics approaches have identified numerous conserved cis-regulatory sequences near genes in plant genomes. Despite the identification of these conserved noncoding sequences (CNSs), our knowledge of their functional importance and selection remains limited. Here, we used a combination of DNA methylome analysis, microarray expression analyses, and functional annotation to study these sequences in the model tree Populus trichocarpa. Methylation in CG contexts and non-CG contexts was lower in CNSs, particularly CNSs in the 5'-upstream regions of genes, compared with other sites in the genome. We observed that CNSs are enriched in genes with transcription and binding functions, and this also associated with syntenic genes and those from whole-genome duplications, suggesting that cis-regulatory sequences play a key role in genome evolution. We detected a significant positive correlation between CNS number and protein interactions, suggesting that CNSs may have roles in the evolution and maintenance of biological networks. The divergence of CNSs indicates that duplication-degeneration-complementation drives the subfunctionalization of a proportion of duplicated genes from whole-genome duplication. Furthermore, population genomics confirmed that most CNSs are under strong purifying selection and only a small subset of CNSs shows evidence of adaptive evolution. These findings provide a foundation for future studies exploring these key genomic features in the maintenance of biological networks, local adaptation, and transcription.
Complex organisation and structure of the ghrelin antisense strand gene GHRLOS, a candidate non-coding RNA gene

PubMed Central

Seim, Inge; Carter, Shea L; Herington, Adrian C; Chopin, Lisa K

2008-01-01

Background The peptide hormone ghrelin has many important physiological and pathophysiological roles, including the stimulation of growth hormone (GH) release, appetite regulation, gut motility and proliferation of cancer cells. We previously identified a gene on the opposite strand of the ghrelin gene, ghrelinOS (GHRLOS), which spans the promoter and untranslated regions of the ghrelin gene (GHRL). Here we further characterise GHRLOS. Results We have described GHRLOS mRNA isoforms that extend over 1.4 kb of the promoter region and 106 nucleotides of exon 4 of the ghrelin gene, GHRL. These GHRLOS transcripts initiate 4.8 kb downstream of the terminal exon 4 of GHRL and are present in the 3' untranslated exon of the adjacent gene TATDN2 (TatD DNase domain containing 2). Interestingly, we have also identified a putative non-coding TATDN2-GHRLOS chimaeric transcript, indicating that GHRLOS RNA biogenesis is extremely complex. Moreover, we have discovered that the 3' region of GHRLOS is also antisense, in a tail-to-tail fashion to a novel terminal exon of the neighbouring SEC13 gene, which is important in protein transport. Sequence analyses revealed that GHRLOS is riddled with stop codons, and that there is little nucleotide and amino-acid sequence conservation of the GHRLOS gene between vertebrates. The gene spans 44 kb on 3p25.3, is extensively spliced and harbours multiple variable exons. We have also investigated the expression of GHRLOS and found evidence of differential tissue expression. It is highly expressed in tissues which are emerging as major sites of non-coding RNA expression (the thymus, brain, and testis), as well as in the ovary and uterus. In contrast, very low levels were found in the stomach where sense, GHRL-derived RNAs are highly expressed. Conclusion GHRLOS RNA transcripts display several distinctive features of non-coding RNA (ncRNA) genes, including 5' capping, polyadenylation, extensive splicing and short open reading frames. 
The gene is also non-conserved, with differential and tissue-restricted expression. The overlapping genomic arrangement of GHRLOS with the ghrelin gene indicates that it is likely to have interesting regulatory and functional roles in the ghrelin axis. PMID:18954468
Complex organisation and structure of the ghrelin antisense strand gene GHRLOS, a candidate non-coding RNA gene.

PubMed

Seim, Inge; Carter, Shea L; Herington, Adrian C; Chopin, Lisa K

2008-10-28

The peptide hormone ghrelin has many important physiological and pathophysiological roles, including the stimulation of growth hormone (GH) release, appetite regulation, gut motility and proliferation of cancer cells. We previously identified a gene on the opposite strand of the ghrelin gene, ghrelinOS (GHRLOS), which spans the promoter and untranslated regions of the ghrelin gene (GHRL). Here we further characterise GHRLOS. We have described GHRLOS mRNA isoforms that extend over 1.4 kb of the promoter region and 106 nucleotides of exon 4 of the ghrelin gene, GHRL. These GHRLOS transcripts initiate 4.8 kb downstream of the terminal exon 4 of GHRL and are present in the 3' untranslated exon of the adjacent gene TATDN2 (TatD DNase domain containing 2). Interestingly, we have also identified a putative non-coding TATDN2-GHRLOS chimaeric transcript, indicating that GHRLOS RNA biogenesis is extremely complex. Moreover, we have discovered that the 3' region of GHRLOS is also antisense, in a tail-to-tail fashion to a novel terminal exon of the neighbouring SEC13 gene, which is important in protein transport. Sequence analyses revealed that GHRLOS is riddled with stop codons, and that there is little nucleotide and amino-acid sequence conservation of the GHRLOS gene between vertebrates. The gene spans 44 kb on 3p25.3, is extensively spliced and harbours multiple variable exons. We have also investigated the expression of GHRLOS and found evidence of differential tissue expression. It is highly expressed in tissues which are emerging as major sites of non-coding RNA expression (the thymus, brain, and testis), as well as in the ovary and uterus. In contrast, very low levels were found in the stomach where sense, GHRL-derived RNAs are highly expressed. GHRLOS RNA transcripts display several distinctive features of non-coding RNA (ncRNA) genes, including 5' capping, polyadenylation, extensive splicing and short open reading frames. 
The gene is also non-conserved, with differential and tissue-restricted expression. The overlapping genomic arrangement of GHRLOS with the ghrelin gene indicates that it is likely to have interesting regulatory and functional roles in the ghrelin axis.
The Most Deeply Conserved Noncoding Sequences in Plants Serve Similar Functions to Those in Vertebrates Despite Large Differences in Evolutionary Rates

PubMed Central

Burgess, Diane; Freeling, Michael

2014-01-01

In vertebrates, conserved noncoding elements (CNEs) are functionally constrained sequences that can show striking conservation over >400 million years of evolutionary distance and frequently are located megabases away from target developmental genes. Conserved noncoding sequences (CNSs) in plants are much shorter, and it has been difficult to detect conservation among distantly related genomes. In this article, we show not only that CNS sequences can be detected throughout the eudicot clade of flowering plants, but also that a subset of 37 CNSs can be found in all flowering plants (diverging ∼170 million years ago). These CNSs are functionally similar to vertebrate CNEs, being highly associated with transcription factor and development genes and enriched in transcription factor binding sites. Some of the most highly conserved sequences occur in genes encoding RNA binding proteins, particularly the RNA splicing–associated SR genes. Differences in sequence conservation between plants and animals are likely to reflect differences in the biology of the organisms, with plants being much more able to tolerate genomic deletions and whole-genome duplication events due, in part, to their far greater fecundity compared with vertebrates. PMID:24681619
Early Evolution of Conserved Regulatory Sequences Associated with Development in Vertebrates

PubMed Central

McEwen, Gayle K.; Goode, Debbie K.; Parker, Hugo J.; Woolfe, Adam; Callaway, Heather; Elgar, Greg

2009-01-01

Comparisons between diverse vertebrate genomes have uncovered thousands of highly conserved non-coding sequences, an increasing number of which have been shown to function as enhancers during early development. Despite their extreme conservation over 500 million years from humans to cartilaginous fish, these elements appear to be largely absent in invertebrates, and, to date, there has been little understanding of their mode of action or the evolutionary processes that have modelled them. We have now exploited emerging genomic sequence data for the sea lamprey, Petromyzon marinus, to explore the depth of conservation of this type of element in the earliest diverging extant vertebrate lineage, the jawless fish (agnathans). We searched for conserved non-coding elements (CNEs) at 13 human gene loci and identified lamprey elements associated with all but two of these gene regions. Although markedly shorter and less well conserved than within jawed vertebrates, identified lamprey CNEs are able to drive specific patterns of expression in zebrafish embryos, which are almost identical to those driven by the equivalent human elements. These CNEs are therefore a unique and defining characteristic of all vertebrates. Furthermore, alignment of lamprey and other vertebrate CNEs should permit the identification of persistent sequence signatures that are responsible for common patterns of expression and contribute to the elucidation of the regulatory language in CNEs. Identifying the core regulatory code for development, common to all vertebrates, provides a foundation upon which regulatory networks can be constructed and might also illuminate how large conserved regulatory sequence blocks evolve and become fixed in genomic DNA. PMID:20011110
Disease-Causing 7.4 kb Cis-Regulatory Deletion Disrupting Conserved Non-Coding Sequences and Their Interaction with the FOXL2 Promotor: Implications for Mutation Screening

PubMed Central

Dostie, Josée; Lemire, Edmond; Bouchard, Philippe; Field, Michael; Jones, Kristie; Lorenz, Birgit; Menten, Björn; Buysse, Karen; Pattyn, Filip; Friedli, Marc; Ucla, Catherine; Rossier, Colette; Wyss, Carine; Speleman, Frank; De Paepe, Anne; Dekker, Job; Antonarakis, Stylianos E.; De Baere, Elfride

2009-01-01

To date, the contribution of disrupted potentially cis-regulatory conserved non-coding sequences (CNCs) to human disease is most likely underestimated, as no systematic screens for putative deleterious variations in CNCs have been conducted. As a model for monogenic disease we studied the involvement of genetic changes of CNCs in the cis-regulatory domain of FOXL2 in blepharophimosis syndrome (BPES). Fifty-seven molecularly unsolved BPES patients underwent high-resolution copy number screening and targeted sequencing of CNCs. Apart from three larger distant deletions, a de novo deletion as small as 7.4 kb was found at 283 kb 5′ to FOXL2. The deletion appeared to be triggered by an H-DNA-induced double-stranded break (DSB). In addition, it disrupts a novel long non-coding RNA (ncRNA) PISRT1 and 8 CNCs. The regulatory potential of the deleted CNCs was substantiated by in vitro luciferase assays. Interestingly, Chromosome Conformation Capture (3C) of a 625 kb region surrounding FOXL2 in expressing cellular systems revealed physical interactions of three upstream fragments and the FOXL2 core promoter. Importantly, one of these contains the 7.4 kb deleted fragment. Overall, this study revealed the smallest distant deletion causing monogenic disease and impacts upon the concept of mutation screening in human disease and developmental disorders in particular. PMID:19543368
Chimeric mitochondrial minichromosomes of the human body louse, Pediculus humanus: evidence for homologous and non-homologous recombination.

PubMed

Shao, Renfu; Barker, Stephen C

2011-02-15

The mitochondrial (mt) genome of the human body louse, Pediculus humanus, consists of 18 minichromosomes. Each minichromosome is 3 to 4 kb long and has 1 to 3 genes. There is unequivocal evidence for recombination between different mt minichromosomes in P. humanus. It is not known, however, how these minichromosomes recombine. Here, we report the discovery of eight chimeric mt minichromosomes in P. humanus. We classify these chimeric mt minichromosomes into two groups: Group I and Group II. Group I chimeric minichromosomes contain parts of two different protein-coding genes that are from different minichromosomes. The two parts of protein-coding genes in each Group I chimeric minichromosome are joined at a microhomologous nucleotide sequence; microhomologous nucleotide sequences are hallmarks of non-homologous recombination. Group II chimeric minichromosomes contain all of the genes and the non-coding regions of two different minichromosomes. The conserved sequence blocks in the non-coding regions of Group II chimeric minichromosomes resemble the "recombination repeats" in the non-coding regions of the mt genomes of higher plants. These repeats are essential to homologous recombination in higher plants. Our analyses of the nucleotide sequences of chimeric mt minichromosomes indicate both homologous and non-homologous recombination between minichromosomes in the mitochondria of the human body louse. Copyright © 2010 Elsevier B.V. All rights reserved.
Many human accelerated regions are developmental enhancers

PubMed Central

Capra, John A.; Erwin, Genevieve D.; McKinsey, Gabriel; Rubenstein, John L. R.; Pollard, Katherine S.

2013-01-01

The genetic changes underlying the dramatic differences in form and function between humans and other primates are largely unknown, although it is clear that gene regulatory changes play an important role. To identify regulatory sequences with potentially human-specific functions, we and others used comparative genomics to find non-coding regions conserved across mammals that have acquired many sequence changes in humans since divergence from chimpanzees. These regions are good candidates for performing human-specific regulatory functions. Here, we analysed the DNA sequence, evolutionary history, histone modifications, chromatin state and transcription factor (TF) binding sites of a combined set of 2649 non-coding human accelerated regions (ncHARs) and predicted that at least 30% of them function as developmental enhancers. We prioritized the predicted ncHAR enhancers using analysis of TF binding site gain and loss, along with the functional annotations and expression patterns of nearby genes. We then tested both the human and chimpanzee sequence for 29 ncHARs in transgenic mice, and found 24 novel developmental enhancers active in both species, 17 of which had very consistent patterns of activity in specific embryonic tissues. Of these ncHAR enhancers, five drove expression patterns suggestive of different activity for the human and chimpanzee sequence at embryonic day 11.5. The changes to human non-coding DNA in these ncHAR enhancers may modify the complex patterns of gene expression necessary for proper development in a human-specific manner and are thus promising candidates for understanding the genetic basis of human-specific biology. PMID:24218637
The Mitochondrial Cytochrome Oxidase Subunit I Gene Occurs on a Minichromosome with Extensive Heteroplasmy in Two Species of Chewing Lice, Geomydoecus aurei and Thomomydoecus minor

PubMed Central

Pietan, Lucas L.; Spradling, Theresa A.

2016-01-01

In animals, mitochondrial DNA (mtDNA) typically occurs as a single circular chromosome with 13 protein-coding genes and 22 tRNA genes. The various species of lice examined previously, however, have shown mitochondrial genome rearrangements with a range of chromosome sizes and numbers. Our research demonstrates that the mitochondrial genomes of two species of chewing lice found on pocket gophers, Geomydoecus aurei and Thomomydoecus minor, are fragmented with the 1,536 base-pair (bp) cytochrome-oxidase subunit I (cox1) gene occurring as the only protein-coding gene on a 1,916–1,964 bp minicircular chromosome in the two species, respectively. The cox1 gene of T. minor begins with an atypical start codon, while that of G. aurei does not. Components of the non-protein coding sequence of G. aurei and T. minor include a tRNA (isoleucine) gene, inverted repeat sequences consistent with origins of replication, and an additional non-coding region that is smaller than the non-coding sequence of other lice with such fragmented mitochondrial genomes. Sequences of cox1 minichromosome clones for each species reveal extensive length and sequence heteroplasmy in both coding and noncoding regions. The highly variable non-gene regions of G. aurei and T. minor have little sequence similarity with one another except for a 19-bp region of phylogenetically conserved sequence with unknown function. PMID:27589589
Detection of hyper-conserved regions in hepatitis B virus X gene potentially useful for gene therapy.

PubMed

González, Carolina; Tabernero, David; Cortese, Maria Francesca; Gregori, Josep; Casillas, Rosario; Riveiro-Barciela, Mar; Godoy, Cristina; Sopena, Sara; Rando, Ariadna; Yll, Marçal; Lopez-Martinez, Rosa; Quer, Josep; Esteban, Rafael; Buti, Maria; Rodríguez-Frías, Francisco

2018-05-21

To detect hyper-conserved regions in the hepatitis B virus (HBV) X gene (HBX) 5' region that could be candidates for gene therapy. The study included 27 chronic hepatitis B treatment-naive patients in various clinical stages (from chronic infection to cirrhosis and hepatocellular carcinoma, both HBeAg-negative and HBeAg-positive), and infected with HBV genotypes A-F and H. In a serum sample from each patient with viremia > 3.5 log IU/mL, the HBX 5' end region [nucleotide (nt) 1255-1611] was PCR-amplified and submitted to next-generation sequencing (NGS). We assessed genotype variants by phylogenetic analysis, and evaluated conservation of this region by calculating the information content of each nucleotide position in a multiple alignment of all unique sequences (haplotypes) obtained by NGS. Conservation at the HBx protein amino acid (aa) level was also analyzed. NGS yielded 1333069 sequences from the 27 samples, with a median of 4578 sequences/sample (2487-9279, IQR 2817). In 14/27 patients (51.8%), phylogenetic analysis of viral nucleotide haplotypes showed a complex mixture of genotypic variants. Analysis of the information content in the haplotype multiple alignments detected 2 hyper-conserved nucleotide regions, one in the HBX upstream non-coding region (nt 1255-1286) and the other in the 5' end coding region (nt 1519-1603). This last region coded for a conserved amino acid region (aa 63-76) that partially overlaps a Kunitz-like domain. Two hyper-conserved regions detected in the HBX 5' end may be of value for targeted gene therapy, regardless of the patients' clinical stage or HBV genotype.
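The abstract above scores conservation by the information content of each position in a multiple alignment of haplotypes, but does not give the formula. The sketch below assumes the usual Shannon-based definition for DNA (2 bits minus the column entropy, with no small-sample correction); the function names and the toy alignment are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def information_content(column, alphabet="ACGT"):
    """Information content (bits) of one alignment column.

    Assumed definition: log2(alphabet size) minus the Shannon entropy of the
    observed base frequencies. Gaps and ambiguous symbols are ignored.
    """
    counts = Counter(base for base in column.upper() if base in alphabet)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(len(alphabet)) - entropy


def conservation_profile(haplotypes):
    """Information content for every column of equal-length aligned sequences."""
    if len({len(h) for h in haplotypes}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    return [information_content("".join(col)) for col in zip(*haplotypes)]


# Hypothetical toy alignment of four haplotypes (not data from the study).
toy_alignment = ["ACGT", "ACGA", "ACCA", "ACTT"]
profile = conservation_profile(toy_alignment)
for position, bits in enumerate(profile, start=1):
    print(f"position {position}: {bits:.2f} bits")
```

Under this definition a fully conserved column scores 2 bits and a column with all four bases at equal frequency scores 0, so a run of positions near 2 bits would correspond to a candidate hyper-conserved region.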
Conserved Noncoding Elements in the Most Distant Genera of Cephalochordates: The Goldilocks Principle

PubMed Central

Yue, Jia-Xing; Kozmikova, Iryna; Ono, Hiroki; Nossa, Carlos W.; Kozmik, Zbynek; Putnam, Nicholas H.; Yu, Jr-Kai; Holland, Linda Z.

2016-01-01

Cephalochordates, the sister group of vertebrates + tunicates, are evolving particularly slowly. Therefore, genome comparisons between two congeners of Branchiostoma revealed so many conserved noncoding elements (CNEs), that it was not clear how many are functional regulatory elements. To more effectively identify CNEs with potential regulatory functions, we compared noncoding sequences of genomes of the most phylogenetically distant cephalochordate genera, Asymmetron and Branchiostoma, which diverged approximately 120–160 million years ago. We found 113,070 noncoding elements conserved between the two species, amounting to 3.3% of the genome. The genomic distribution, target gene ontology, and enriched motifs of these CNEs all suggest that many of them are probably cis-regulatory elements. More than 90% of previously verified amphioxus regulatory elements were re-captured in this study. A search of the cephalochordate CNEs around 50 developmental genes in several vertebrate genomes revealed eight CNEs conserved between cephalochordates and vertebrates, indicating sequence conservation over >500 million years of divergence. The function of five CNEs was tested in reporter assays in zebrafish, and one was also tested in amphioxus. All five CNEs proved to be tissue-specific enhancers. Taken together, these findings indicate that even though Branchiostoma and Asymmetron are distantly related, as they are evolving slowly, comparisons between them are likely optimal for identifying most of their tissue-specific cis-regulatory elements laying the foundation for functional characterizations and a better understanding of the evolution of developmental regulation in cephalochordates. PMID:27412606
Highly tissue specific expression of Sphinx supports its male courtship related role in Drosophila melanogaster.

PubMed

Chen, Ying; Dai, Hongzheng; Chen, Sidi; Zhang, Luoying; Long, Manyuan

2011-04-26

Sphinx is a lineage-specific non-coding RNA gene involved in regulating courtship behavior in Drosophila melanogaster. The 5' flanking region of the gene is conserved across Drosophila species, with the proximal 300 bp being conserved out to D. virilis and a further 600 bp region being conserved amongst the melanogaster subgroup (D. melanogaster, D. simulans, D. sechellia, D. yakuba, and D. erecta). Using a green fluorescence protein transformation system, we demonstrated that a 253 bp region of the highly conserved segment was sufficient to drive sphinx expression in male accessory gland. GFP signals were also observed in brain, wing hairs and leg bristles. An additional ∼800 bp upstream region was able to enhance expression specifically in proboscis, suggesting the existence of enhancer elements. Using anti-GFP staining, we identified putative sphinx expression signal in the brain antennal lobe and inner antennocerebral tract, suggesting that sphinx might be involved in olfactory neuron mediated regulation of male courtship behavior. Whole genome expression profiling of the sphinx knockout mutation identified significant up-regulated gene categories related to accessory gland protein function and odor perception, suggesting sphinx might be a negative regulator of its target genes.
Highly Tissue Specific Expression of Sphinx Supports Its Male Courtship Related Role in Drosophila melanogaster

PubMed Central

Chen, Sidi; Zhang, Luoying; Long, Manyuan

2011-01-01

Sphinx is a lineage-specific non-coding RNA gene involved in regulating courtship behavior in Drosophila melanogaster. The 5′ flanking region of the gene is conserved across Drosophila species, with the proximal 300 bp being conserved out to D. virilis and a further 600 bp region being conserved amongst the melanogaster subgroup (D. melanogaster, D. simulans, D. sechellia, D. yakuba, and D. erecta). Using a green fluorescence protein transformation system, we demonstrated that a 253 bp region of the highly conserved segment was sufficient to drive sphinx expression in male accessory gland. GFP signals were also observed in brain, wing hairs and leg bristles. An additional ∼800 bp upstream region was able to enhance expression specifically in proboscis, suggesting the existence of enhancer elements. Using anti-GFP staining, we identified putative sphinx expression signal in the brain antennal lobe and inner antennocerebral tract, suggesting that sphinx might be involved in olfactory neuron mediated regulation of male courtship behavior. Whole genome expression profiling of the sphinx knockout mutation identified significant up-regulated gene categories related to accessory gland protein function and odor perception, suggesting sphinx might be a negative regulator of its target genes. PMID:21541324
Conserved Non-Coding Sequences are Associated with Rates of mRNA Decay in Arabidopsis

PubMed Central

Spangler, Jacob B.; Feltus, Frank Alex

2013-01-01

Steady-state mRNA levels are tightly regulated through a combination of transcriptional and post-transcriptional control mechanisms. The discovery of cis-acting DNA elements that encode these control mechanisms is of high importance. We have investigated the influence of conserved non-coding sequences (CNSs), DNA patterns retained after an ancient whole genome duplication event, on the breadth of gene expression and the rates of mRNA decay in Arabidopsis thaliana. The absence of CNSs near α duplicate genes was associated with a decrease in breadth of gene expression and slower mRNA decay rates, while the presence of CNSs near α duplicates was associated with an increase in breadth of gene expression and faster mRNA decay rates. The observed difference in mRNA decay rate was greatest in genes with CNSs in both non-transcribed and transcribed regions, albeit through an unknown mechanism. This study supports the notion that some Arabidopsis CNSs regulate the steady-state mRNA levels through post-transcriptional control mechanisms and that CNSs also play a role in controlling the breadth of gene expression. PMID:23675377
SHOX gene and conserved noncoding element deletions/duplications in Colombian patients with idiopathic short stature

PubMed Central

Sandoval, Gloria Tatiana Vinasco; Jaimes, Giovanna Carola; Barrios, Mauricio Coll; Cespedes, Camila; Velasco, Harvy Mauricio

2014-01-01

SHOX gene mutations or haploinsufficiency cause a wide range of phenotypes such as Leri Weill dyschondrosteosis (LWD), Turner syndrome, and disproportionate short stature (DSS). However, this gene has also been found to be mutated in cases of idiopathic short stature (ISS) with a 3–15% frequency. In this study, the multiplex ligation-dependent probe amplification (MLPA) technique was employed to determine the frequency of SHOX gene mutations and their conserved noncoding elements (CNE) in Colombian patients with ISS. Patients were referred from different centers around the country. From a sample of 62 patients, 8.1% deletions and insertions in the intragenic regions and in the CNE were found. This result is similar to others published in other countries. Moreover, an isolated case of CNE 9 duplication and a new intron 6b deletion in another patient, associated with ISS, are described. This is one of the first studies of a Latin American population in which deletions/duplications of the SHOX gene and its CNE are examined in patients with ISS. PMID:24689071
Three-Dimensional RNA Structure of the Major HIV-1 Packaging Signal Region

PubMed Central

Stephenson, James D.; Li, Haitao; Kenyon, Julia C.; Symmons, Martyn; Klenerman, Dave; Lever, Andrew M.L.

2013-01-01

HIV-1 genomic RNA has a noncoding 5′ region containing sequential conserved structural motifs that control many parts of the life cycle. Very limited data exist on their three-dimensional (3D) conformation and, hence, how they work structurally. To assemble a working model, we experimentally reassessed secondary structure elements of a 240-nt region and used single-molecule distances, derived from fluorescence resonance energy transfer, between defined locations in these elements as restraints to drive folding of the secondary structure into a 3D model with an estimated resolution below 10 Å. The folded 3D model satisfying the data is consistent with short nuclear-magnetic-resonance-solved regions and reveals previously unpredicted motifs, offering insight into earlier functional assays. It is a 3D representation of this entire region, with implications for RNA dimerization and protein binding during regulatory steps. The structural information of this highly conserved region of the virus has the potential to reveal promising therapeutic targets. PMID:23685210
Comparative Mitogenomics of Plant Bugs (Hemiptera: Miridae): Identifying the AGG Codon Reassignments between Serine and Lysine

PubMed Central

Wang, Pei; Song, Fan; Cai, Wanzhi

2014-01-01

Insect mitochondrial genomes are very important for understanding molecular evolution as well as for phylogenetic and phylogeographic studies of insects. The Miridae are the largest family of Heteroptera, encompassing more than 11,000 described species, and are of great economic importance. To better understand the diversity and evolution of plant bugs, we sequenced five new mitochondrial genomes and present the first comparative analysis of the nine mirid mitochondrial genomes available to date. Our results showed that gene content, gene arrangement, base composition and sequences of the mitochondrial transcription termination factor were conserved in plant bugs. Intra-genus species shared more conserved genomic characteristics, such as nucleotide and amino acid composition of protein-coding genes, secondary structure and anticodon mutations of tRNAs, and non-coding sequences. The control region possessed several distinct characteristics, including variable size, abundant tandem repetitions, and intra-genus conservation, and was useful in evolutionary and population genetic studies. The AGG codon reassignments between serine and lysine were investigated in the genus Adelphocoris and other cimicomorphans. Our analysis revealed correlated evolution between reassignments of the AGG codon and specific point mutations at the anticodons of tRNALys and tRNASer(AGN). Phylogenetic analysis indicated that mitochondrial genome sequences were useful in resolving family-level relationships of Cimicomorpha. Comparative evolutionary analysis of plant bug mitochondrial genomes allowed the identification of previously neglected coding genes or non-coding regions as potential molecular markers. The finding of the AGG codon reassignments between serine and lysine indicated the parallel evolution of the genetic code in Hemiptera mitochondrial genomes. PMID:24988409
Divergent evolutionary rates in vertebrate and mammalian specific conserved non-coding elements (CNEs) in echolocating mammals.

PubMed

Davies, Kalina T J; Tsagkogeorga, Georgia; Rossiter, Stephen J

2014-12-19

The majority of DNA contained within vertebrate genomes is non-coding, with a certain proportion of this thought to play regulatory roles during development. Conserved Non-coding Elements (CNEs) are an abundant group of putative regulatory sequences that are highly conserved across divergent groups and thus assumed to be under strong selective constraint. Many CNEs may contain regulatory factor binding sites, and their frequent spatial association with key developmental genes - such as those regulating sensory system development - suggests crucial roles in regulating gene expression and cellular patterning. Yet surprisingly little is known about the molecular evolution of CNEs across diverse mammalian taxa or their role in specific phenotypic adaptations. We examined 3,110 vertebrate-specific and ~82,000 mammalian-specific CNEs across 19 and 9 mammalian orders respectively, and tested for changes in the rate of evolution of CNEs located in the proximity of genes underlying the development or functioning of auditory systems. As we focused on CNEs putatively associated with genes underlying the development/functioning of auditory systems, we incorporated echolocating taxa in our dataset because of their highly specialised and derived auditory systems. Phylogenetic reconstructions of concatenated CNEs broadly recovered accepted mammal relationships despite high levels of sequence conservation. We found that CNE substitution rates were highest in rodents and lowest in primates, consistent with previous findings. Comparisons of CNE substitution rates from several genomic regions containing genes linked to auditory system development and hearing revealed differences between echolocating and non-echolocating taxa. Wider taxonomic sampling of four CNEs associated with the homeobox genes Hmx2 and Hmx3 - which are required for inner ear development - revealed family-wise variation across diverse bat species. 
Specifically, within one family of echolocating bats that utilise frequency-modulated echolocation calls varying widely in frequency and intensity, high levels of sequence divergence were found. Levels of selective constraint acting on CNEs differed both across genomic locations and taxa, with observed variation in substitution rates of CNEs among bat species. More work is needed to determine whether this variation can be linked to echolocation, and wider taxonomic sampling is necessary to fully document levels of conservation in CNEs across diverse taxa.
The identification and functional annotation of RNA structures conserved in vertebrates

PubMed Central

Seemann, Stefan E.; Mirza, Aashiq H.; Hansen, Claus; Bang-Berthelsen, Claus H.; Garde, Christian; Christensen-Dalsgaard, Mikkel; Torarinsson, Elfar; Yao, Zizhen; Workman, Christopher T.; Pociot, Flemming; Nielsen, Henrik; Tommerup, Niels; Ruzzo, Walter L.; Gorodkin, Jan

2017-01-01

Structured elements of RNA molecules are essential in, e.g., RNA stabilization, localization, and protein interaction, and their conservation across species suggests a common functional role. We computationally screened vertebrate genomes for conserved RNA structures (CRSs), leveraging structure-based, rather than sequence-based, alignments. After careful correction for sequence identity and GC content, we predict ∼516,000 human genomic regions containing CRSs. We find that a substantial fraction of human–mouse CRS regions (1) colocalize consistently with binding sites of the same RNA binding proteins (RBPs) or (2) are transcribed in corresponding tissues. Additionally, a CaptureSeq experiment revealed expression of many of our CRS regions in human fetal brain, including 662 novel ones. For selected human and mouse candidate pairs, qRT-PCR and in vitro RNA structure probing supported both shared expression and shared structure despite low abundance and low sequence identity. About 30,000 CRS regions are located near coding or long noncoding RNA genes or within enhancers. Structured (CRS overlapping) enhancer RNAs and extended 3′ ends have significantly increased expression levels over their nonstructured counterparts. Our findings of transcribed uncharacterized regulatory regions that contain CRSs support their RNA-mediated functionality. PMID:28487280
Analysis of a new homozygous deletion in the tumor suppressor region at 3p12.3 reveals two novel intronic noncoding RNA genes.

PubMed

Angeloni, Debora; ter Elst, Arja; Wei, Ming Hui; van der Veen, Anneke Y; Braga, Eleonora A; Klimov, Eugene A; Timmer, Tineke; Korobeinikova, Luba; Lerman, Michael I; Buys, Charles H C M

2006-07-01

Homozygous deletions or loss of heterozygosity (LOH) at human chromosome band 3p12 are consistent features of lung and other malignancies, suggesting the presence of a tumor suppressor gene(s) (TSG) at this location. Only one gene has been cloned thus far from the overlapping region deleted in lung and breast cancer cell lines U2020, NCI H2198, and HCC38. It is DUTT1 (Deleted in U Twenty Twenty), also known as ROBO1, FLJ21882, and SAX3, according to HUGO. DUTT1, the human ortholog of the fly gene ROBO, has homology with NCAM proteins. Extensive analyses of DUTT1 in lung cancer have not revealed any mutations, suggesting that another gene(s) at this location could be of importance in lung cancer initiation and progression. Here, we report the discovery of a new, small, homozygous deletion in the small cell lung cancer (SCLC) cell line GLC20, nested in the overlapping, critical region. The deletion was delineated using several polymorphic markers and three overlapping P1 phage clones. Fiber-FISH experiments revealed the deletion was approximately 130 kb. Comparative genomic sequence analysis uncovered short sequence elements highly conserved among mammalian genomes and the chicken genome. The discovery of two EST clusters within the deleted region led to the isolation of two noncoding RNA (ncRNA) genes. These were subsequently found differentially expressed in various tumors when compared to their normal tissues. The ncRNA and other highly conserved sequence elements in the deleted region may represent miRNA targets of importance in cancer initiation or progression. Published 2006 Wiley-Liss, Inc.
A subset of conserved mammalian long non-coding RNAs are fossils of ancestral protein-coding genes.

PubMed

Hezroni, Hadas; Ben-Tov Perry, Rotem; Meir, Zohar; Housman, Gali; Lubelsky, Yoav; Ulitsky, Igor

2017-08-30

Only a small portion of human long non-coding RNAs (lncRNAs) appear to be conserved outside of mammals, but the events underlying the birth of new lncRNAs in mammals remain largely unknown. One potential source is remnants of protein-coding genes that transitioned into lncRNAs. We systematically compare lncRNA and protein-coding loci across vertebrates, and estimate that up to 5% of conserved mammalian lncRNAs are derived from lost protein-coding genes. These lncRNAs have specific characteristics, such as broader expression domains, that set them apart from other lncRNAs. Fourteen lncRNAs have sequence similarity with the loci of the contemporary homologs of the lost protein-coding genes. We propose that selection acting on enhancer sequences is mostly responsible for retention of these regions. As an example of an RNA element from a protein-coding ancestor that was retained in the lncRNA, we describe in detail a short translated ORF in the JPX lncRNA that was derived from an upstream ORF in a protein-coding gene and retains some of its functionality. We estimate that ~ 55 annotated conserved human lncRNAs are derived from parts of ancestral protein-coding genes, and loss of coding potential is thus a non-negligible source of new lncRNAs. Some lncRNAs inherited regulatory elements influencing transcription and translation from their protein-coding ancestors and those elements can influence the expression breadth and functionality of these lncRNAs.
A class of circadian long non-coding RNAs mark enhancers modulating long-range circadian gene regulation

PubMed Central

Fan, Zenghua; Zhao, Meng; Joshi, Parth D.; Li, Ping; Zhang, Yan; Guo, Weimin; Xu, Yichi; Wang, Haifang; Zhao, Zhihu

2017-01-01

Circadian rhythm exerts its influence on animal physiology and behavior by regulating gene expression at various levels. Here we systematically explored circadian long non-coding RNAs (lncRNAs) in mouse liver and examined their circadian regulation. We found that a significant proportion of circadian lncRNAs are expressed at enhancer regions, mostly bound by two key circadian transcription factors, BMAL1 and REV-ERBα. These circadian lncRNAs showed similar circadian phases with their nearby genes. The extent of their nuclear localization is higher than protein coding genes but less than enhancer RNAs. The association between enhancer and circadian lncRNAs is also observed in tissues other than liver. Comparative analysis between mouse and rat circadian liver transcriptomes showed that circadian transcription at lncRNA loci tends to be conserved despite low sequence conservation of lncRNAs. One such circadian lncRNA termed lnc-Crot led us to identify a super-enhancer region interacting with a cluster of genes involved in circadian regulation of metabolism through long-range interactions. Further experiments showed that the lnc-Crot locus has enhancer function independent of lnc-Crot's transcription. Our results suggest that the enhancer-associated circadian lncRNAs mark the genomic loci modulating long-range circadian gene regulation and shed new light on the evolutionary origin of lncRNAs. PMID:28335007
CSTminer: a web tool for the identification of coding and noncoding conserved sequence tags through cross-species genome comparison

PubMed Central

Castrignanò, Tiziana; Canali, Alessandro; Grillo, Giorgio; Liuni, Sabino; Mignone, Flavio; Pesole, Graziano

2004-01-01

The identification and characterization of genome tracts that are highly conserved across species during evolution may contribute significantly to the functional annotation of whole-genome sequences. Indeed, such sequences are likely to correspond to known or unknown coding exons or regulatory motifs. Here, we present a web server implementing a previously developed algorithm that, by comparing user-submitted genome sequences, is able to identify statistically significant conserved blocks and assess their coding or noncoding nature through the measure of a coding potential score. The web tool, available at http://www.caspur.it/CSTminer/, is dynamically interconnected with the Ensembl genome resources and produces a graphical output showing a map of detected conserved sequences and annotated gene features. PMID:15215464
Characterization of Non-coding DNA Satellites Associated with Sweepoviruses (Genus Begomovirus, Geminiviridae) – Definition of a Distinct Class of Begomovirus-Associated Satellites

PubMed Central

Lozano, Gloria; Trenado, Helena P.; Fiallo-Olivé, Elvira; Chirinos, Dorys; Geraud-Pouey, Francis; Briddon, Rob W.; Navas-Castillo, Jesús

2016-01-01

Begomoviruses (family Geminiviridae) are whitefly-transmitted, plant-infecting single-stranded DNA viruses that cause crop losses throughout the warmer parts of the World. Sweepoviruses are a phylogenetically distinct group of begomoviruses that infect plants of the family Convolvulaceae, including sweet potato (Ipomoea batatas). Two classes of subviral molecules are often associated with begomoviruses, particularly in the Old World; the betasatellites and the alphasatellites. An analysis of sweet potato and Ipomoea indica samples from Spain and Merremia dissecta samples from Venezuela identified small non-coding subviral molecules in association with several distinct sweepoviruses. The sequences of 18 clones were obtained and found to be structurally similar to tomato leaf curl virus-satellite (ToLCV-sat, the first DNA satellite identified in association with a begomovirus), with a region with significant sequence identity to the conserved region of betasatellites, an A-rich sequence, a predicted stem–loop structure containing the nonanucleotide TAATATTAC, and a second predicted stem–loop. These sweepovirus-associated satellites join an increasing number of ToLCV-sat-like non-coding satellites identified recently. Although sharing some features with betasatellites, evidence is provided to suggest that the ToLCV-sat-like satellites are distinct from betasatellites and should be considered a separate class of satellites, for which the collective name deltasatellites is proposed. PMID:26925037
Evolutionary conservation of regulatory elements in vertebrate HOX gene clusters

DOE Office of Scientific and Technical Information (OSTI.GOV)

Santini, Simona; Boore, Jeffrey L.; Meyer, Axel

2003-12-31

Due to their high degree of conservation, comparisons of DNA sequences among evolutionarily distantly related genomes make it possible to identify functional regions in noncoding DNA. Hox genes are optimal candidate sequences for comparative genome analyses, because they are extremely conserved in vertebrates and occur in clusters. We aligned (Pipmaker) the nucleotide sequences of HoxA clusters of tilapia, pufferfish, striped bass, zebrafish, horn shark, human and mouse (over 500 million years of evolutionary distance). We identified several highly conserved intergenic sequences, likely to be important in gene regulation. Only a few of these putative regulatory elements have been previously described as being involved in the regulation of Hox genes, while several others are new elements that might have regulatory functions. The majority of these newly identified putative regulatory elements contain short fragments that are almost completely conserved and are identical to known binding sites for regulatory proteins (Transfac). The conserved intergenic regions located between the most rostrally expressed genes in the developing embryo are longer and better retained through evolution. We document that presumed regulatory sequences are retained differentially in either Aα or Aβ clusters resulting from a genome duplication in the fish lineage. This observation supports both the hypothesis that the conserved elements are involved in gene regulation and the Duplication-Deletion-Complementation model.
ChIP-seq Identification of Weakly Conserved Heart Enhancers

PubMed Central

Blow, Matthew J.; McCulley, David J.; Li, Zirong; Zhang, Tao; Akiyama, Jennifer A.; Holt, Amy; Plajzer-Frick, Ingrid; Shoukry, Malak; Wright, Crystal; Chen, Feng; Afzal, Veena; Bristow, James; Ren, Bing; Black, Brian L.; Rubin, Edward M.; Visel, Axel; Pennacchio, Len A.

2011-01-01

Accurate control of tissue-specific gene expression plays a pivotal role in heart development, but few cardiac transcriptional enhancers have thus far been identified. Extreme non-coding sequence conservation successfully predicts enhancers active in many tissues, but fails to identify substantial numbers of heart enhancers. Here we used ChIP-seq with the enhancer-associated protein p300 from mouse embryonic day 11.5 heart tissue to identify over three thousand candidate heart enhancers genome-wide. Compared to other tissues studied at this time-point, most candidate heart enhancers are less deeply conserved in vertebrate evolution. Nevertheless, the testing of 130 candidate regions in a transgenic mouse assay revealed that most of them reproducibly function as enhancers active in the heart, irrespective of their degree of evolutionary constraint. These results provide evidence for a large population of poorly conserved heart enhancers and suggest that the evolutionary constraint of embryonic enhancers can vary depending on tissue type. PMID:20729851
rVISTA 2.0: Evolutionary Analysis of Transcription Factor Binding Sites

DOE Office of Scientific and Technical Information (OSTI.GOV)

Loots, G G; Ovcharenko, I

2004-01-28

Identifying and characterizing the patterns of DNA cis-regulatory modules represents a challenge that has the potential to reveal the regulatory language the genome uses to dictate transcriptional dynamics. Several studies have demonstrated that regulatory modules are under positive selection and therefore are often conserved between related species. Using this evolutionary principle we have created a comparative tool, rVISTA, for analyzing the regulatory potential of noncoding sequences. The rVISTA tool combines transcription factor binding site (TFBS) predictions, sequence comparisons and cluster analysis to identify noncoding DNA regions that are highly conserved and present in a specific configuration within an alignment. Here we present the newly developed version 2.0 of the rVISTA tool that can process alignments generated by both zPicture and PipMaker alignment programs or use pre-computed pairwise alignments of seven vertebrate genomes available from the ECR Browser. The rVISTA web server is closely interconnected with the TRANSFAC database, allowing users to either search for matrices present in the TRANSFAC library collection or search for user-defined consensus sequences. The rVISTA tool is publicly available at http://rvista.dcode.org/.
The complete mitochondrial genome of the sandbar shark Carcharhinus plumbeus.

PubMed

Blower, Dean C; Ovenden, Jennifer R

2016-01-01

The sandbar shark, Carcharhinus plumbeus, a major representative species in shark fisheries worldwide, is now considered vulnerable to overfishing. A pool of 774,234 Roche 454 shotgun sequences from one individual was assembled into a 16,706 bp mitogenome with 33× average coverage depth. It comprised 13 protein coding genes, 22 transfer RNAs, 2 ribosomal genes and 2 non-coding regions, typical of a vertebrate mitogenome. As expected for sharks, an A-T nucleotide bias was evident. This adds to the rapidly growing number of mitogenome assemblies for the economically important Carcharhinidae family. The C. plumbeus mitogenome will assist researchers, fisheries and conservation managers interested in shark molecular systematics, phylogeography, conservation genetics, population and stock structure.
Evolution of coding and non-coding genes in HOX clusters of a marsupial.

PubMed

Yu, Hongshi; Lindsay, James; Feng, Zhi-Ping; Frankenberg, Stephen; Hu, Yanqiu; Carone, Dawn; Shaw, Geoff; Pask, Andrew J; O'Neill, Rachel; Papenfuss, Anthony T; Renfree, Marilyn B

2012-06-18

The HOX gene clusters are thought to be highly conserved amongst mammals and other vertebrates, but the long non-coding RNAs have only been studied in detail in human and mouse. The sequencing of the kangaroo genome provides an opportunity to use comparative analyses to compare the HOX clusters of a mammal with a distinct body plan to those of other mammals. Here we report a comparative analysis of HOX gene clusters between an Australian marsupial of the kangaroo family and the eutherians. There was a strikingly high level of conservation of HOX gene sequence and structure and non-protein coding genes including the microRNAs miR-196a, miR-196b, miR-10a and miR-10b and the long non-coding RNAs HOTAIR, HOTAIRM1 and HOXA11AS that play critical roles in regulating gene expression and controlling development. By microRNA deep sequencing and comparative genomic analyses, two conserved microRNAs (miR-10a and miR-10b) were identified and one new candidate microRNA with typical hairpin precursor structure that is expressed in both fibroblasts and testes was found. The prediction of microRNA target analysis showed that several known microRNA targets, such as miR-10, miR-414 and miR-464, were found in the tammar HOX clusters. In addition, several novel and putative miRNAs were identified that originated from elsewhere in the tammar genome and that target the tammar HOXB and HOXD clusters. This study confirms that the emergence of known long non-coding RNAs in the HOX clusters clearly predates the marsupial-eutherian divergence 160 Ma ago. It also identified a new potentially functional microRNA as well as conserved miRNAs. These non-coding RNAs may participate in the regulation of HOX genes to influence the body plan of this marsupial.
Evolution of coding and non-coding genes in HOX clusters of a marsupial

PubMed Central

2012-01-01

Background The HOX gene clusters are thought to be highly conserved amongst mammals and other vertebrates, but the long non-coding RNAs have only been studied in detail in human and mouse. The sequencing of the kangaroo genome provides an opportunity to use comparative analyses to compare the HOX clusters of a mammal with a distinct body plan to those of other mammals. Results Here we report a comparative analysis of HOX gene clusters between an Australian marsupial of the kangaroo family and the eutherians. There was a strikingly high level of conservation of HOX gene sequence and structure and non-protein coding genes including the microRNAs miR-196a, miR-196b, miR-10a and miR-10b and the long non-coding RNAs HOTAIR, HOTAIRM1 and HOXA11AS that play critical roles in regulating gene expression and controlling development. By microRNA deep sequencing and comparative genomic analyses, two conserved microRNAs (miR-10a and miR-10b) were identified and one new candidate microRNA with typical hairpin precursor structure that is expressed in both fibroblasts and testes was found. The prediction of microRNA target analysis showed that several known microRNA targets, such as miR-10, miR-414 and miR-464, were found in the tammar HOX clusters. In addition, several novel and putative miRNAs were identified that originated from elsewhere in the tammar genome and that target the tammar HOXB and HOXD clusters. Conclusions This study confirms that the emergence of known long non-coding RNAs in the HOX clusters clearly predates the marsupial-eutherian divergence 160 Ma ago. It also identified a new potentially functional microRNA as well as conserved miRNAs. These non-coding RNAs may participate in the regulation of HOX genes to influence the body plan of this marsupial. PMID:22708672
The complete mitochondrial genome of Pholis nebulosus (Perciformes: Pholidae).

PubMed

Wang, Zhongquan; Qin, Kaili; Liu, Jingxi; Song, Na; Han, Zhiqiang; Gao, Tianxiang

2016-11-01

In this study, the complete mitochondrial genome (mitogenome) sequence of Pholis nebulosus has been determined by long polymerase chain reaction and primer-walking methods. The mitogenome is a circular molecule of 16 524 bp in length, including the typical structure of 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and 2 non-coding regions (L-strand replication origin and control region), the gene contents of which are identical to those observed in most bony fishes. Within the control region, we identified the termination-associated sequence domain (TAS), and the conserved sequence block domain (CSB-F, CSB-E, CSB-D, CSB-C, CSB-B, CSB-A, CSB-1, CSB-2, CSB-3).
Structural architecture of the human long non-coding RNA, steroid receptor RNA activator

PubMed Central

Novikova, Irina V.; Hennelly, Scott P.; Sanbonmatsu, Karissa Y.

2012-01-01

While functional roles of several long non-coding RNAs (lncRNAs) have been determined, the molecular mechanisms are not well understood. Here, we report the first experimentally derived secondary structure of a human lncRNA, the steroid receptor RNA activator (SRA), 0.87 kb in size. The SRA RNA is a non-coding RNA that coactivates several human sex hormone receptors and is strongly associated with breast cancer. Coding isoforms of SRA are also expressed to produce proteins, making the SRA gene a unique bifunctional system. Our experimental findings (SHAPE, in-line, DMS and RNase V1 probing) reveal that this lncRNA has a complex structural organization, consisting of four domains, with a variety of secondary structure elements. We examine the coevolution of the SRA gene at the RNA structure and protein structure levels using comparative sequence analysis across vertebrates. Rapid evolutionary stabilization of RNA structure, combined with frame-disrupting mutations in conserved regions, suggests that evolutionary pressure preserves the RNA structural core rather than its translational product. We perform similar experiments on alternatively spliced SRA isoforms to assess their structural features. PMID:22362738

Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

NASA Technical Reports Server (NTRS)

Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.

1995-01-01

We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior while the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
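The two tests named in this abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption: overlapping n-tuples are treated as "words", the Zipf table ranks them by relative frequency, and redundancy is taken as 1 - H(n) / (n log2 4), a common normalization for a four-letter alphabet. The function names are hypothetical and are not the authors' code.

```python
from collections import Counter
from math import log2


def ngram_counts(seq, n):
    """Count overlapping n-tuples ("words") in a sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def zipf_ranking(seq, n):
    """Return (rank, word, relative frequency), most frequent word first."""
    counts = ngram_counts(seq, n)
    total = sum(counts.values())
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, c / total) for rank, (word, c) in enumerate(ranked, 1)]


def ngram_entropy(seq, n):
    """Shannon entropy (in bits) of the n-tuple frequency distribution."""
    counts = ngram_counts(seq, n)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def ngram_redundancy(seq, n, alphabet_size=4):
    """Assumed definition: 1 - H(n) / (n * log2(alphabet_size))."""
    return 1.0 - ngram_entropy(seq, n) / (n * log2(alphabet_size))
```

A Zipf plot is then log(frequency) against log(rank) from `zipf_ranking`; a straight line over a range of ranks corresponds to the power-law regime the abstract describes, and a lower `ngram_entropy` gives a higher `ngram_redundancy`.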
Molecular evolution of the HoxA cluster in the three major gnathostome lineages

PubMed Central

Chiu, Chi-hua; Amemiya, Chris; Dewar, Ken; Kim, Chang-Bae; Ruddle, Frank H.; Wagner, Günter P.

2002-01-01

The duplication of Hox clusters and their maintenance in a lineage has a prominent but little understood role in chordate evolution. Here we examined how Hox cluster duplication may influence changes in cluster architecture and patterns of noncoding sequence evolution. We sequenced the entire duplicated HoxAa and HoxAb clusters of zebrafish (Danio rerio) and extended the 5′ (posterior) part of the HoxM (HoxA-like) cluster of horn shark (Heterodontus francisci) containing the hoxa11 and hoxa13 orthologs as well as intergenic and flanking noncoding sequences. The duplicated HoxA clusters in zebrafish each house considerably fewer genes and are dramatically shorter than the single HoxA clusters of human and horn shark. We compared the intergenic sequences of the HoxA clusters of human, horn shark, zebrafish (Aa, Ab), and striped bass and found extensive conservation of noncoding sequence motifs, i.e., phylogenetic footprints, between the human and horn shark, representing two of the three gnathostome lineages. These are putative cis-regulatory elements that may play a role in the regulation of the ancestral HoxA cluster. In contrast, homologous regions of the duplicated HoxAa and HoxAb clusters of zebrafish and the HoxA cluster of striped bass revealed a striking loss of conservation of these putative cis-regulatory sequences in the 3′ (anterior) segment of the cluster, where zebrafish only retains single representatives of group 1, 3, 4, and 5 (HoxAa) and group 2 (HoxAb) genes and in the 5′ part of the clusters, where zebrafish retains two copies of the group 13, 11, and 9 genes, i.e., AbdB-like genes. In analyzing patterns of cis-sequence evolution in the 5′ part of the clusters, we explicitly looked for evidence of complementary loss of conserved noncoding sequences, as predicted by the duplication-degeneration-complementation model in which genetic redundancy after gene duplication is resolved because of the fixation of complementary degenerative mutations. 
Our data did not yield evidence supporting this prediction. We conclude that changes in the pattern of cis-sequence conservation after Hox cluster duplication are more consistent with being the outcome of adaptive modification rather than passive mechanisms that erode redundancy created by the duplication event. These results support the view that genome duplications may provide a mechanism whereby master control genes undergo radical modifications conducive to major alterations in body plan. Such genomic revolutions may contribute significantly to the evolutionary process. PMID:11943847
The identification and functional annotation of RNA structures conserved in vertebrates.

PubMed

Seemann, Stefan E; Mirza, Aashiq H; Hansen, Claus; Bang-Berthelsen, Claus H; Garde, Christian; Christensen-Dalsgaard, Mikkel; Torarinsson, Elfar; Yao, Zizhen; Workman, Christopher T; Pociot, Flemming; Nielsen, Henrik; Tommerup, Niels; Ruzzo, Walter L; Gorodkin, Jan

2017-08-01

Structured elements of RNA molecules are essential in, e.g., RNA stabilization, localization, and protein interaction, and their conservation across species suggests a common functional role. We computationally screened vertebrate genomes for conserved RNA structures (CRSs), leveraging structure-based, rather than sequence-based, alignments. After careful correction for sequence identity and GC content, we predict ∼516,000 human genomic regions containing CRSs. We find that a substantial fraction of human-mouse CRS regions (1) colocalize consistently with binding sites of the same RNA binding proteins (RBPs) or (2) are transcribed in corresponding tissues. Additionally, a CaptureSeq experiment revealed expression of many of our CRS regions in human fetal brain, including 662 novel ones. For selected human and mouse candidate pairs, qRT-PCR and in vitro RNA structure probing supported both shared expression and shared structure despite low abundance and low sequence identity. About 30,000 CRS regions are located near coding or long noncoding RNA genes or within enhancers. Structured (CRS overlapping) enhancer RNAs and extended 3' ends have significantly increased expression levels over their nonstructured counterparts. Our findings of transcribed uncharacterized regulatory regions that contain CRSs support their RNA-mediated functionality. © 2017 Seemann et al.; Published by Cold Spring Harbor Laboratory Press.
Spontaneous and engineered deletions in the 3' noncoding region of tick-borne encephalitis virus: construction of highly attenuated mutants of a flavivirus.

PubMed

Mandl, C W; Holzmann, H; Meixner, T; Rauscher, S; Stadler, P F; Allison, S L; Heinz, F X

1998-03-01

The flavivirus genome is a positive-strand RNA molecule containing a single long open reading frame flanked by noncoding regions (NCR) that mediate crucial processes of the viral life cycle. The 3' NCR of tick-borne encephalitis (TBE) virus can be divided into a variable region that is highly heterogeneous in length among strains of TBE virus and in certain cases includes an internal poly(A) tract and a 3'-terminal conserved core element that is believed to fold as a whole into a well-defined secondary structure. We have now investigated the genetic stability of the TBE virus 3' NCR and its influence on viral growth properties and virulence. We observed spontaneous deletions in the variable region during growth of TBE virus in cell culture and in mice. These deletions varied in size and location but always included the internal poly(A) element of the TBE virus 3' NCR and never extended into the conserved 3'-terminal core element. Subsequently, we constructed specific deletion mutants by using infectious cDNA clones with the entire variable region and increasing segments of the core element removed. A virus mutant lacking the entire variable region was indistinguishable from wild-type virus with respect to cell culture growth properties and virulence in the mouse model. In contrast, even small extensions of the deletion into the core element led to significant biological effects. Deletions extending to nucleotides 10826, 10847, and 10870 caused distinct attenuation in mice without measurable reduction of cell culture growth properties, which, however, were significantly restricted when the deletion was extended to nucleotide 10919. An even larger deletion (to nucleotide 10994) abolished viral viability. In spite of their high degree of attenuation, these mutants efficiently induced protective immune responses even at low inoculation doses. 
Thus, 3'-NCR deletions represent a useful technique for achieving stable attenuation of flaviviruses that can be included in the rational design of novel flavivirus live vaccines.
Conserved features of eukaryotic hsp70 genes revealed by comparison with the nucleotide sequence of human hsp70.

PubMed Central

Hunt, C; Morimoto, R I

1985-01-01

We have determined the nucleotide sequence of the human hsp70 gene and 5' flanking region. The hsp70 gene is transcribed as an uninterrupted primary transcript of 2440 nucleotides composed of a 5' noncoding leader sequence of 212 nucleotides, a 3' noncoding region of 242 nucleotides, and a continuous open reading frame of 1986 nucleotides that encodes a protein with predicted molecular mass of 69,800 daltons. Upstream of the 5' terminus are the canonical TATAAA box, the sequence ATTGG that corresponds in the inverted orientation to the CCAAT motif, and the dyad sequence CTGGAAT/ATTCCCG that shares homology in 12 of 14 positions with the consensus transcription regulatory sequence common to Drosophila heat shock genes. Comparison of the predicted amino acid sequences of human hsp70 with the published sequences of Drosophila hsp70 and Escherichia coli dnaK reveals that human hsp70 is 73% identical to Drosophila hsp70 and 47% identical to E. coli dnaK. Surprisingly, the nucleotide sequences of the human and Drosophila genes are 72% identical and human and E. coli genes are 50% identical, which is more highly conserved than necessary given the degeneracy of the genetic code. The lack of accumulated silent nucleotide substitutions leads us to propose that there may be additional information in the nucleotide sequence of the hsp70 gene or the corresponding mRNA that precludes the maximum divergence allowed in the silent codon positions. PMID:3931075
Statistical properties of DNA sequences

NASA Technical Reports Server (NTRS)

Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Simons, M.; Stanley, H. E.

1995-01-01

We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range--indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationarity" feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33301 coding and 29453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.
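The abstract names detrended fluctuation analysis (DFA) without giving details, so the following is a minimal sketch of the standard first-order formulation rather than the authors' implementation. The step rule (+1 for a pyrimidine, -1 for anything else), the non-overlapping boxes, and the default box sizes are all assumptions made for illustration.

```python
from math import log, sqrt


def dna_walk(seq):
    """Mean-subtracted cumulative walk: +1 for C/T, -1 for any other symbol."""
    steps = [1 if base in "CT" else -1 for base in seq.upper()]
    mean = sum(steps) / len(steps)
    walk, total = [], 0.0
    for s in steps:
        total += s - mean
        walk.append(total)
    return walk


def _linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fluctuation(walk, box):
    """RMS residual after removing a linear trend in each non-overlapping box."""
    n_boxes = len(walk) // box
    xs = list(range(box))
    sq_sum = 0.0
    for b in range(n_boxes):
        ys = walk[b * box:(b + 1) * box]
        slope, intercept = _linear_fit(xs, ys)
        sq_sum += sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return sqrt(sq_sum / (n_boxes * box))


def dfa_exponent(seq, boxes=(8, 16, 32, 64)):
    """Slope of log F(box) versus log box; fails if any F(box) is zero."""
    walk = dna_walk(seq)
    points = [(log(b), log(fluctuation(walk, b))) for b in boxes]
    slope, _ = _linear_fit([p[0] for p in points], [p[1] for p in points])
    return slope
```

For an uncorrelated sequence the exponent is expected to be close to 0.5; values clearly above 0.5 over a wide range of box sizes are what is usually read as long-range correlation.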
COOLAIR Antisense RNAs Form Evolutionarily Conserved Elaborate Secondary Structures

DOE PAGES

Hawkes, Emily J.; Hennelly, Scott P.; Novikova, Irina V.; ...

2016-09-20

There is considerable debate about the functionality of long non-coding RNAs (lncRNAs). Lack of sequence conservation has been used to argue against functional relevance. Here, we investigated antisense lncRNAs, called COOLAIR, at the A. thaliana FLC locus and experimentally determined their secondary structure. The major COOLAIR variants are highly structured, organized by exon. The distally polyadenylated transcript has a complex multi-domain structure, altered by a single non-coding SNP defining a functionally distinct A. thaliana FLC haplotype. The A. thaliana COOLAIR secondary structure was used to predict COOLAIR exons in evolutionarily divergent Brassicaceae species. These predictions were validated through chemical probing and cloning. Despite the relatively low nucleotide sequence identity, the structures, including multi-helix junctions, show remarkable evolutionary conservation. In a number of places, the structure is conserved through covariation of a non-contiguous DNA sequence. This structural conservation supports a functional role for COOLAIR transcripts rather than, or in addition to, antisense transcription.
Genome-wide identification of conserved intronic non-coding sequences using a Bayesian segmentation approach.

PubMed

Algama, Manjula; Tasker, Edward; Williams, Caitlin; Parslow, Adam C; Bryson-Richardson, Robert J; Keith, Jonathan M

2017-03-27

Computational identification of non-coding RNAs (ncRNAs) is a challenging problem. We describe a genome-wide analysis using Bayesian segmentation to identify intronic elements highly conserved between three evolutionarily distant vertebrate species: human, mouse and zebrafish. We investigate the extent to which these elements include ncRNAs (or conserved domains of ncRNAs) and regulatory sequences. We identified 655 deeply conserved intronic sequences in a genome-wide analysis. We also performed a pathway-focussed analysis on genes involved in muscle development, detecting 27 intronic elements, of which 22 were not detected in the genome-wide analysis. At least 87% of the genome-wide and 70% of the pathway-focussed elements have existing annotations indicative of conserved RNA secondary structure. The expression of 26 of the pathway-focused elements was examined using RT-PCR, providing confirmation that they include expressed ncRNAs. Consistent with previous studies, these elements are significantly over-represented in the introns of transcription factors. This study demonstrates a novel, highly effective, Bayesian approach to identifying conserved non-coding sequences. Our results complement previous findings that these sequences are enriched in transcription factors. However, in contrast to previous studies which suggest the majority of conserved sequences are regulatory factor binding sites, the majority of conserved sequences identified using our approach contain evidence of conserved RNA secondary structures, and our laboratory results suggest most are expressed. Functional roles at DNA and RNA levels are not mutually exclusive, and many of our elements possess evidence of both. Moreover, ncRNAs play roles in transcriptional and post-transcriptional regulation, and this may contribute to the over-representation of these elements in introns of transcription factors. 
We attribute the higher sensitivity of the pathway-focussed analysis compared to the genome-wide analysis to improved alignment quality, suggesting that enhanced genomic alignments may reveal many more conserved intronic sequences.
Complete nucleotide sequences of the coat protein messenger RNAs of brome mosaic virus and cowpea chlorotic mottle virus.

PubMed Central

Dasgupta, R; Kaesberg, P

1982-01-01

The nucleotide sequences of the subgenomic coat protein messengers (RNA4's) of two related bromoviruses, brome mosaic virus (BMV) and cowpea chlorotic mottle virus (CCMV), have been determined by direct RNA and cDNA sequencing without cloning. BMV RNA4 is 876 b long including a 5' noncoding region of nine nucleotides and a 3' noncoding region of 300 nucleotides. CCMV RNA 4 is 824 b long, including a 5' noncoding region of 10 nucleotides and a 3' noncoding region of 244 nucleotides. The encoded coat proteins are similar in length (188 amino acids for BMV and 189 amino acids for CCMV) and display about 70% homology in their amino acid sequences. The length difference between the two RNAs is due mostly to a single deletion, in CCMV with respect to BMV, of about 57 b immediately following the coding region. Allowing for this deletion the RNAs are indicate that mutations leading to divergence were constrained in the coding region primarily by the requirement of maintaining a favorable coat protein structure and in the 3' noncoding region primarily by the requirement of maintaining a favorable RNA spatial configuration. PMID:6895941
Highly conserved non-coding elements on either side of SOX9 associated with Pierre Robin sequence.

PubMed

Benko, Sabina; Fantes, Judy A; Amiel, Jeanne; Kleinjan, Dirk-Jan; Thomas, Sophie; Ramsay, Jacqueline; Jamshidi, Negar; Essafi, Abdelkader; Heaney, Simon; Gordon, Christopher T; McBride, David; Golzio, Christelle; Fisher, Malcolm; Perry, Paul; Abadie, Véronique; Ayuso, Carmen; Holder-Espinasse, Muriel; Kilpatrick, Nicky; Lees, Melissa M; Picard, Arnaud; Temple, I Karen; Thomas, Paul; Vazquez, Marie-Paule; Vekemans, Michel; Roest Crollius, Hugues; Hastie, Nicholas D; Munnich, Arnold; Etchevers, Heather C; Pelet, Anna; Farlie, Peter G; Fitzpatrick, David R; Lyonnet, Stanislas

2009-03-01

Pierre Robin sequence (PRS) is an important subgroup of cleft palate. We report several lines of evidence for the existence of a 17q24 locus underlying PRS, including linkage analysis results, a clustering of translocation breakpoints 1.06-1.23 Mb upstream of SOX9, and microdeletions both approximately 1.5 Mb centromeric and approximately 1.5 Mb telomeric of SOX9. We have also identified a heterozygous point mutation in an evolutionarily conserved region of DNA with in vitro and in vivo features of a developmental enhancer. This enhancer is centromeric to the breakpoint cluster and maps within one of the microdeletion regions. The mutation abrogates the in vitro enhancer function and alters binding of the transcription factor MSX1 as compared to the wild-type sequence. In the developing mouse mandible, the 3-Mb region bounded by the microdeletions shows a regionally specific chromatin decompaction in cells expressing Sox9. Some cases of PRS may thus result from developmental misexpression of SOX9 due to disruption of very-long-range cis-regulatory elements.
GenomeVista

DOE Office of Scientific and Technical Information (OSTI.GOV)

Poliakov, Alexander; Couronne, Olivier

2002-11-04

Aligning large vertebrate genomes that are structurally complex poses a variety of problems not encountered on smaller scales. Such genomes are rich in repetitive elements and contain multiple segmental duplications, which increases the difficulty of identifying true orthologous DNA segments in alignments. The sizes of the sequences make many alignment algorithms designed for comparing single proteins extremely inefficient when processing large genomic intervals. We integrated both local and global alignment tools and developed a suite of programs for automatically aligning large vertebrate genomes and identifying conserved non-coding regions in the alignments. Our method uses the BLAT local alignment program to find anchors on the base genome to identify regions of possible homology for a query sequence. These regions are postprocessed to find the best candidates which are then globally aligned using the AVID global alignment program. In the last step conserved non-coding segments are identified using VISTA. Our methods are fast and the resulting alignments exhibit a high degree of sensitivity, covering more than 90% of known coding exons in the human genome. The GenomeVISTA software is a suite of Perl programs that is built on a MySQL database platform. The scheduler gets control data from the database, builds a queue of jobs, and dispatches them to a PC cluster for execution. The main program, running on each node of the cluster, processes individual sequences. A Perl library acts as an interface between the database and the above programs. The use of a separate library allows the programs to function independently of the database schema. The library also improves on the standard Perl MySQL database interface package by providing auto-reconnect functionality and improved error handling.
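The last step of the pipeline described above, finding conserved segments in a finished pairwise alignment, can be sketched as a sliding-window identity scan. This is an illustration only and not the VISTA code: the function name, the 100-column window, and the 70% identity cutoff are assumptions that the abstract does not state, and the masking of annotated exons needed to keep only non-coding hits is omitted.

```python
def conserved_windows(aln_a, aln_b, window=100, min_identity=0.7):
    """Return merged half-open (start, end) column intervals of an alignment
    in which the sliding-window identity is at least min_identity.

    aln_a and aln_b are the two gapped rows of a pairwise alignment
    ('-' marks a gap). A column counts as a match only when both rows
    carry the same non-gap symbol.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    n = len(aln_a)
    if n < window:
        return []
    match = [1 if a == b and a != "-" else 0
             for a, b in zip(aln_a.upper(), aln_b.upper())]
    hits = []
    count = sum(match[:window])
    for start in range(n - window + 1):
        if start > 0:
            # Slide the window one column: add the new column, drop the old one.
            count += match[start + window - 1] - match[start - 1]
        if count / window >= min_identity:
            if hits and start <= hits[-1][1]:
                # Overlaps or touches the previous hit: extend it.
                hits[-1] = (hits[-1][0], start + window)
            else:
                hits.append((start, start + window))
    return hits
```

The returned intervals are in alignment-column coordinates; mapping them back to genome coordinates would require subtracting the gap columns in the relevant row.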
Massive Gene Transfer and Extensive RNA Editing of a Symbiotic Dinoflagellate Plastid Genome

PubMed Central

Mungpakdee, Sutada; Shinzato, Chuya; Takeuchi, Takeshi; Kawashima, Takeshi; Koyanagi, Ryo; Hisata, Kanako; Tanaka, Makiko; Goto, Hiroki; Fujie, Manabu; Lin, Senjie; Satoh, Nori; Shoguchi, Eiichi

2014-01-01

Genome sequencing of Symbiodinium minutum revealed that 95 of 109 plastid-associated genes have been transferred to the nuclear genome and subsequently expanded by gene duplication. Only 14 genes remain in plastids and occur as DNA minicircles. Each minicircle (1.8–3.3 kb) contains one gene and a conserved noncoding region containing putative promoters and RNA-binding sites. Nine types of RNA editing, including a novel G/U type, were discovered in minicircle transcripts but not in genes transferred to the nucleus. In contrast to DNA editing sites in dinoflagellate mitochondria, which tend to be highly conserved across all taxa, editing sites employed in DNA minicircles are highly variable from species to species. Editing is crucial for core photosystem protein function. It restores evolutionarily conserved amino acids and increases peptidyl hydropathy. It also increases protein plasticity necessary to initiate photosystem complex assembly. PMID:24881086
Statistical and linguistic features of DNA sequences

NASA Technical Reports Server (NTRS)

Havlin, S.; Buldyrev, S. V.; Goldberger, A. L.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

1995-01-01

We present evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range--indeed, base pairs thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationary" feature of the sequence of base pairs by applying a new algorithm called Detrended Fluctuation Analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and noncoding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to all eukaryotic DNA sequences (33 301 coding and 29 453 noncoding) in the entire GenBank database. We describe a simple model to account for the presence of long-range power-law correlations which is based upon a generalization of the classic Levy walk. Finally, we describe briefly some recent work showing that the noncoding sequences have certain statistical features in common with natural languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function. We suggest that noncoding regions in plants and invertebrates may display a smaller entropy and larger redundancy than coding regions, further supporting the possibility that noncoding regions of DNA may carry biological information.
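Detrended Fluctuation Analysis, as named in the abstract above, can be sketched in a few lines. The version below is an illustrative first-order DFA under stated assumptions: the purine/pyrimidine mapping to a ±1 walk, the non-overlapping boxes, and the function names `dna_walk`, `dfa_fluctuation` and `dfa_exponent` are choices made here, not details taken from the record.

```python
import math


def dna_walk(seq):
    """Map purines (A, G) to +1 and pyrimidines (C, T) to -1; other symbols are skipped.
    The mapping rule is an assumption made for this sketch."""
    steps = []
    for base in seq.upper():
        if base in "AG":
            steps.append(1.0)
        elif base in "CT":
            steps.append(-1.0)
    return steps


def dfa_fluctuation(steps, box):
    """RMS fluctuation F(box): integrate the mean-subtracted steps, split the
    profile into non-overlapping boxes, remove a least-squares line from each
    box, and take the root mean square of the residuals."""
    if box < 3 or len(steps) < box:
        raise ValueError("box must be >= 3 and no longer than the series")
    mean = sum(steps) / len(steps)
    profile, total = [], 0.0
    for s in steps:
        total += s - mean
        profile.append(total)

    xs = range(box)
    x_mean = (box - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in xs)
    n_boxes = len(profile) // box
    squared = 0.0
    for b in range(n_boxes):
        ys = profile[b * box:(b + 1) * box]
        y_mean = sum(ys) / box
        slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
        squared += sum((y - (y_mean + slope * (x - x_mean))) ** 2
                       for x, y in zip(xs, ys))
    return math.sqrt(squared / (n_boxes * box))


def dfa_exponent(steps, boxes):
    """Least-squares slope of log F(n) against log n over the given box sizes."""
    xs = [math.log(n) for n in boxes]
    ys = [math.log(dfa_fluctuation(steps, n)) for n in boxes]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    return (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
            / sum((x - x_mean) ** 2 for x in xs))
```

An exponent near 0.5 is what an uncorrelated sequence would give, while values above 0.5 indicate the kind of long-range power-law correlation the abstract reports for noncoding regions. A strictly alternating sequence, by contrast, gives an exponent close to zero.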
Cell cycle, oncogenic and tumor suppressor pathways regulate numerous long and macro non-protein-coding RNAs

PubMed Central

2014-01-01

Background The genome is pervasively transcribed but most transcripts do not code for proteins, constituting non-protein-coding RNAs. Despite increasing numbers of functional reports of individual long non-coding RNAs (lncRNAs), assessing the extent of functionality among the non-coding transcriptional output of mammalian cells remains intricate. In the protein-coding world, transcripts differentially expressed in the context of processes essential for the survival of multicellular organisms have been instrumental in the discovery of functionally relevant proteins and their deregulation is frequently associated with diseases. We therefore systematically identified lncRNAs expressed differentially in response to oncologically relevant processes and cell-cycle, p53 and STAT3 pathways, using tiling arrays. Results We found that up to 80% of the pathway-triggered transcriptional responses are non-coding. Among these we identified very large macroRNAs with pathway-specific expression patterns and demonstrated that these are likely continuous transcripts. MacroRNAs contain elements conserved in mammals and sauropsids, which in part exhibit conserved RNA secondary structure. Comparing evolutionary rates of a macroRNA to adjacent protein-coding genes suggests a local action of the transcript. Finally, in different grades of astrocytoma, a tumor disease unrelated to the initially used cell lines, macroRNAs are differentially expressed. Conclusions It has been shown previously that the majority of expressed non-ribosomal transcripts are non-coding. We now conclude that differential expression triggered by signaling pathways gives rise to a similar abundance of non-coding content. It is thus unlikely that the prevalence of non-coding transcripts in the cell is a trivial consequence of leaky or random transcription events. PMID:24594072
Identification and Potential Regulatory Properties of Evolutionary Conserved Regions (ECRs) at the Schizophrenia-Associated MIR137 Locus.

PubMed

Gianfrancesco, Olympia; Griffiths, Daniel; Myers, Paul; Collier, David A; Bubb, Vivien J; Quinn, John P

2016-10-01

Genome-wide association studies (GWAS) have identified a region at chromosome 1p21.3, containing the microRNA MIR137, to be among the most significant associations for schizophrenia. However, the mechanism by which genetic variation at this locus increases risk of schizophrenia is unknown. Identifying key regulatory regions around MIR137 is crucial to understanding the potential role of this gene in the aetiology of psychiatric disorders. Through alignment of vertebrate genomes, we identified seven non-coding regions at the MIR137 locus with conservation comparable to exons (>70 %). Bioinformatic analysis using the Psychiatric Genomics Consortium GWAS dataset for schizophrenia showed five of the ECRs to have genome-wide significant SNPs in or adjacent to their sequence. Analysis of available datasets on chromatin marks and histone modification data showed that three of the ECRs were predicted to be functional in the human brain, and three in development. In vitro analysis of ECR activity using reporter gene assays showed that all seven of the selected ECRs displayed transcriptional regulatory activity in the SH-SY5Y neuroblastoma cell line. This data suggests a regulatory role in the developing and adult brain for these highly conserved regions at the MIR137 schizophrenia-associated locus and further that these domains could act individually or synergistically to regulate levels of MIR137 expression.
Identification of differentially expressed small non-coding RNAs in the legume endosymbiont Sinorhizobium meliloti by comparative genomics

PubMed Central

del Val, Coral; Rivas, Elena; Torres-Quesada, Omar; Toro, Nicolás; Jiménez-Zurdo, José I

2007-01-01

Bacterial small non-coding RNAs (sRNAs) are being recognized as novel widespread regulators of gene expression in response to environmental signals. Here, we present the first search for sRNA-encoding genes in the nitrogen-fixing endosymbiont Sinorhizobium meliloti, performed by a genome-wide computational analysis of its intergenic regions. Comparative sequence data from eight related α-proteobacteria were obtained, and the interspecies pairwise alignments were scored with the programs eQRNA and RNAz as complementary predictive tools to identify conserved and stable secondary structures corresponding to putative non-coding RNAs. Northern experiments confirmed that eight of the predicted loci, selected among the original 32 candidates as most probable sRNA genes, expressed small transcripts. This result supports the combined use of eQRNA and RNAz as a robust strategy to identify novel sRNAs in bacteria. Furthermore, seven of the transcripts accumulated differentially in free-living and symbiotic conditions. Experimental mapping of the 5′-ends of the detected transcripts revealed that their encoding genes are organized in autonomous transcription units with recognizable promoter and, in most cases, termination signatures. These findings suggest novel regulatory functions for sRNAs related to the interactions of α-proteobacteria with their eukaryotic hosts. PMID:17971083
Comparative Mitogenomics of the Assassin Bug Genus Peirates (Hemiptera: Reduviidae: Peiratinae) Reveal Conserved Mitochondrial Genome Organization of P. atromaculatus, P. fulvescens and P. turpis

PubMed Central

Zhao, Guangyu; Li, Hu; Zhao, Ping; Cai, Wanzhi

2015-01-01

In this study, we sequenced four new mitochondrial genomes and presented comparative mitogenomic analyses of five species in the genus Peirates (Hemiptera: Reduviidae). Mitochondrial genomes of these five assassin bugs had a typical set of 37 genes and retained the ancestral gene arrangement of insects. The A+T content, AT- and GC-skews were similar to the common base composition biases of insect mtDNA. Genomic size ranges from 15,702 bp to 16,314 bp and most of the size variation was due to length and copy number of the repeat unit in the putative control region. All of the control region sequences included large tandem repeats present in two or more copies. Our result revealed similarity in mitochondrial genomes of P. atromaculatus, P. fulvescens and P. turpis, as well as the highly conserved genomic-level characteristics of these three species, e.g., the same start and stop codons of protein-coding genes, conserved secondary structure of tRNAs, identical location and length of non-coding and overlapping regions, and conservation of structural elements and tandem repeat unit in control region. Phylogenetic analyses also supported a close relationship between P. atromaculatus, P. fulvescens and P. turpis, which might be recently diverged species. The present study indicates that mitochondrial genome has important implications on phylogenetics, population genetics and speciation in the genus Peirates. PMID:25689825
Mechanisms of haplotype divergence at the RGA08 nucleotide-binding leucine-rich repeat gene locus in wild banana (Musa balbisiana).

PubMed

Baurens, Franc-Christophe; Bocs, Stéphanie; Rouard, Mathieu; Matsumoto, Takashi; Miller, Robert N G; Rodier-Goud, Marguerite; MBéguié-A-MBéguié, Didier; Yahiaoui, Nabila

2010-07-16

Comparative sequence analysis of complex loci such as resistance gene analog clusters allows estimating the degree of sequence conservation and mechanisms of divergence at the intraspecies level. In banana (Musa sp.), two diploid wild species Musa acuminata (A genome) and Musa balbisiana (B genome) contribute to the polyploid genome of many cultivars. The M. balbisiana species is associated with vigour and tolerance to pests and disease and little is known on the genome structure and haplotype diversity within this species. Here, we compare two genomic sequences of 253 and 223 kb corresponding to two haplotypes of the RGA08 resistance gene analog locus in M. balbisiana "Pisang Klutuk Wulung" (PKW). Sequence comparison revealed two regions of contrasting features. The first is a highly colinear gene-rich region where the two haplotypes diverge only by single nucleotide polymorphisms and two repetitive element insertions. The second corresponds to a large cluster of RGA08 genes, with 13 and 18 predicted RGA genes and pseudogenes spread over 131 and 152 kb respectively on each haplotype. The RGA08 cluster is enriched in repetitive element insertions, in duplicated non-coding intergenic sequences including low complexity regions and shows structural variations between haplotypes. Although some allelic relationships are retained, a large diversity of RGA08 genes occurs in this single M. balbisiana genotype, with several RGA08 paralogs specific to each haplotype. The RGA08 gene family has evolved by mechanisms of unequal recombination, intragenic sequence exchange and diversifying selection. An unequal recombination event taking place between duplicated non-coding intergenic sequences resulted in a different RGA08 gene content between haplotypes pointing out the role of such duplicated regions in the evolution of RGA clusters. Based on the synonymous substitution rate in coding sequences, we estimated a 1 million year divergence time for these M. balbisiana haplotypes. 
A large RGA08 gene cluster identified in wild banana corresponds to a highly variable genomic region between haplotypes surrounded by conserved flanking regions. High level of sequence identity (70 to 99%) of the genic and intergenic regions suggests a recent and rapid evolution of this cluster in M. balbisiana.
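The divergence-time estimate mentioned above rests on the usual molecular-clock relation between the synonymous substitution rate and time. The abstract gives neither the measured synonymous divergence nor the rate that was used, so the numbers below are hypothetical values chosen only to show how an estimate of about 1 million years can arise; here K_s is the synonymous divergence between the two haplotypes and r is an assumed substitution rate per synonymous site per year.

```latex
T = \frac{K_s}{2r}
\qquad\text{e.g.}\qquad
T = \frac{0.009}{2 \times 4.5 \times 10^{-9}\ \text{yr}^{-1}} = 1.0 \times 10^{6}\ \text{yr}
```

The factor of 2 accounts for substitutions accumulating independently along both lineages since their split.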
Sost, independent of the non-coding enhancer ECR5, is required for bone mechanoadaptation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Robling, Alexander G.; Kang, Kyung Shin; Bullock, Whitney A.

Sclerostin (Sost) is a negative regulator of bone formation that acts upon the Wnt signaling pathway. Sost is mechanically regulated at both mRNA and protein level such that loading represses and unloading enhances Sost expression, in osteocytes and in circulation. The non-coding evolutionarily conserved enhancer ECR5 has been previously reported as a transcriptional regulatory element required for modulating Sost expression in osteocytes. Here we explored the mechanisms by which ECR5, or several other putative transcriptional enhancers, regulate Sost expression in response to mechanical stimulation. We found that in vivo ulna loading is equally osteoanabolic in wildtype and Sost–/– mice, although Sost is required for proper distribution of load-induced bone formation to regions of high strain. Using Luciferase reporters carrying the ECR5 non-coding enhancer and heterologous or homologous hSOST promoters, we found that ECR5 is mechanosensitive in vitro and that ECR5-driven Luciferase activity decreases in osteoblasts exposed to oscillatory fluid flow. Yet, ECR5–/– mice showed similar magnitude of load-induced bone formation and similar periosteal distribution of bone formation to high-strain regions compared to wildtype mice. Further, we found that in contrast to Sost–/– mice, which are resistant to disuse-induced bone loss, ECR5–/– mice lose bone upon unloading to a degree similar to wildtype control mice. ECR5 deletion did not abrogate positive effects of unloading on Sost, suggesting that additional transcriptional regulators and regulatory elements contribute to load-induced regulation of Sost.

Sost, independent of the non-coding enhancer ECR5, is required for bone mechanoadaptation

DOE PAGES

Robling, Alexander G.; Kang, Kyung Shin; Bullock, Whitney A.; ...

2016-09-04

Sclerostin (Sost) is a negative regulator of bone formation that acts upon the Wnt signaling pathway. Sost is mechanically regulated at both mRNA and protein level such that loading represses and unloading enhances Sost expression, in osteocytes and in circulation. The non-coding evolutionarily conserved enhancer ECR5 has been previously reported as a transcriptional regulatory element required for modulating Sost expression in osteocytes. Here we explored the mechanisms by which ECR5, or several other putative transcriptional enhancers, regulate Sost expression in response to mechanical stimulation. We found that in vivo ulna loading is equally osteoanabolic in wildtype and Sost–/– mice, although Sost is required for proper distribution of load-induced bone formation to regions of high strain. Using Luciferase reporters carrying the ECR5 non-coding enhancer and heterologous or homologous hSOST promoters, we found that ECR5 is mechanosensitive in vitro and that ECR5-driven Luciferase activity decreases in osteoblasts exposed to oscillatory fluid flow. Yet, ECR5–/– mice showed similar magnitude of load-induced bone formation and similar periosteal distribution of bone formation to high-strain regions compared to wildtype mice. Further, we found that in contrast to Sost–/– mice, which are resistant to disuse-induced bone loss, ECR5–/– mice lose bone upon unloading to a degree similar to wildtype control mice. ECR5 deletion did not abrogate positive effects of unloading on Sost, suggesting that additional transcriptional regulators and regulatory elements contribute to load-induced regulation of Sost.
Characteristics and phylogenetic analysis of the complete mitochondrial genome of Cheilodactylus quadricornis (Perciformes, Cheilodactylidae).

PubMed

Wang, Aishuai; Sun, Yuena; Wu, Changwen

2016-11-01

The complete mitochondrial genome of Cheilodactylus quadricornis was determined for the first time in the present study. The mitochondrial genome of C. quadricornis is 16 521 nucleotides long, comprising 13 protein-coding genes, 2 ribosomal RNA genes, 22 tRNA genes and 2 main non-coding regions (the control region and the origin of light-strand replication). The overall base composition was T, 26.3%; C, 29.6%; A, 27.8% and G, 16.3%. The gene arrangement, base composition, and tRNA structures of the complete mitochondrial genome of C. quadricornis are similar to those of other teleosts. Only two central conserved sequence blocks (CSB-2 and CSB-3) were identified in the control region. In addition, the conserved motif 5'-GCCGG-3' was identified in the origin of light-strand replication of C. quadricornis. The complete mitochondrial genome of C. quadricornis was used to construct a phylogenetic tree, which shows that C. quadricornis and C. variegatus clustered in a clade and formed a sister relationship. These mitogenome sequence data should play an important role in population genetics and phylogenetic analysis of the Cheilodactylidae.
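The base composition reported above can be converted into the A+T content and strand-skew statistics often quoted for mitogenomes. The abstract itself does not report skews, so the sketch below is only a worked illustration using the standard definitions AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/(G + C); the function name `composition_stats` is hypothetical.

```python
def composition_stats(percent):
    """Compute A+T content (in percent) and AT-/GC-skew from a mapping of
    base -> percentage. Skews are dimensionless and lie between -1 and 1."""
    a, c, g, t = (percent[base] for base in "ACGT")
    return {
        "at_content": a + t,
        "at_skew": (a - t) / (a + t),
        "gc_skew": (g - c) / (g + c),
    }


# Percentages as given in the abstract for C. quadricornis.
stats = composition_stats({"T": 26.3, "C": 29.6, "A": 27.8, "G": 16.3})
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

With these inputs the AT-skew comes out slightly positive and the GC-skew clearly negative, meaning slightly more A than T and markedly more C than G on the reported strand.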
RNA Polymerase III promoter screen uncovers a novel noncoding RNA family conserved in Caenorhabditis and other clade V nematodes.

PubMed

Gruber, Andreas R

2014-07-10

RNA Polymerase III is a highly specialized enzyme complex responsible for the transcription of a very distinct set of housekeeping noncoding RNAs including tRNAs, 7SK snRNA, Y RNAs, U6 snRNA, and the RNA components of RNaseP and RNaseMRP. In this work we have utilized the conserved promoter structure of known RNA Polymerase III transcripts consisting of characteristic sequence elements termed proximal sequence elements (PSE) A and B and a TATA-box to uncover a novel RNA Polymerase III-transcribed, noncoding RNA family found to be conserved in Caenorhabditis as well as other clade V nematode species. Homology search in combination with detailed sequence and secondary structure analysis revealed that members of this novel ncRNA family evolve rapidly, and only maintain a potentially functional small stem structure that links the 5' end to the very 3' end of the transcript and a small hairpin structure at the 3' end. This is most likely required for efficient transcription termination. In addition, our study revealed evidence that canonical C/D box snoRNAs are also transcribed from a PSE A-PSE B-TATA-box promoter in Caenorhabditis elegans. Copyright © 2014 Elsevier B.V. All rights reserved.
RRE: a tool for the extraction of non-coding regions surrounding annotated genes from genomic datasets.

PubMed

Lazzarato, F; Franceschinis, G; Botta, M; Cordero, F; Calogero, R A

2004-11-01

RRE allows the extraction of non-coding regions surrounding a coding sequence [i.e. gene upstream region, 5'-untranslated region (5'-UTR), introns, 3'-UTR, downstream region] from annotated genomic datasets available at NCBI. RRE parser and web-based interface are accessible at http://www.bioinformatica.unito.it/bioinformatics/rre/rre.html
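The region types listed above can be derived from a simple gene model. The following is a minimal sketch, not RRE's parser: the function name `noncoding_regions`, the 0-based half-open coordinate convention, and the default flank length are assumptions made for illustration.

```python
def noncoding_regions(tx_start, tx_end, cds_start, cds_end, exons,
                      strand="+", flank=1000):
    """Split a gene model into non-coding regions.

    Coordinates are 0-based, half-open, on the forward strand. `exons` is a
    list of (start, end) tuples covering the transcript. `flank` is the number
    of bases taken on each side of the transcript (hypothetical default).
    """
    exons = sorted(exons)

    # Introns are the gaps between consecutive exons.
    introns = [(prev_end, next_start)
               for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
               if next_start > prev_end]

    # Exonic sequence outside the CDS, on the low- and high-coordinate side.
    left_utr = [(s, min(e, cds_start)) for s, e in exons if s < cds_start]
    right_utr = [(max(s, cds_end), e) for s, e in exons if e > cds_end]

    # The right flank is not clipped to the chromosome length in this sketch.
    left_flank = (max(0, tx_start - flank), tx_start)
    right_flank = (tx_end, tx_end + flank)

    if strand == "+":
        return {"upstream": left_flank, "utr5": left_utr, "introns": introns,
                "utr3": right_utr, "downstream": right_flank}
    # On the minus strand the 5' end of the gene is at the higher coordinate.
    return {"upstream": right_flank, "utr5": right_utr, "introns": introns,
            "utr3": left_utr, "downstream": left_flank}
```

For example, a plus-strand transcript spanning 1000 to 5000 with a CDS from 1200 to 4800 and three exons yields one 5'-UTR piece, two introns and one 3'-UTR piece; passing `strand="-"` swaps the upstream/downstream and 5'/3' labels.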
An Ultraconserved Brain-specific Enhancer within ADGRL3 (LPHN3) Underpins ADHD Susceptibility

PubMed Central

Martinez, Ariel F.; Abe, Yu; Hong, Sungkook; Molyneux, Kevin; Yarnell, David; Löhr, Heiko; Driever, Wolfgang; Acosta, Maria T.; Arcos-Burgos, Mauricio; Muenke, Maximilian

2016-01-01

BACKGROUND Genetic factors predispose to attention deficit/hyperactivity disorder (ADHD). Previous studies have reported linkage and association to ADHD of gene variants within ADGRL3. In this study, we functionally analyzed non-coding variants in this gene as likely pathological contributors. METHODS In silico, in vitro and in vivo approaches were used to identify and characterize evolutionary conserved elements within the ADGRL3 linkage region (~207 Kb). Family-based genetic analyses on 838 individuals (372 affected and 466 unaffected) identified ADHD-associated SNPs harbored in some of these conserved elements. Luciferase assays and zebrafish GFP transgenesis tested conserved elements for transcriptional enhancer activity. Electromobility shift assays were used to verify transcription factor binding disruption by ADHD risk alleles. RESULTS An ultraconserved element was discovered (ECR47) that functions as a transcriptional enhancer. A three-variant ADHD risk haplotype in ECR47, formed by rs17226398, rs56038622 and rs2271338, reduced enhancer activity by 40% in neuroblastoma and astrocytoma cells (PBonferroni<0.0001). This enhancer also drove GFP expression in the zebrafish brain in a tissue-specific manner, sharing aspects of endogenous ADGRL3 expression. The rs2271338 risk allele disrupts binding of YY1, an important factor in the development and function of the central nervous system. Expression quantitative trait loci analysis of post-mortem human brain tissues revealed an association between rs2271338 and reduced ADGRL3 expression in the thalamus. CONCLUSIONS These results uncover the first functional evidence of common non-coding variants with potential implications for the pathology of ADHD. PMID:27692237
An integrative approach to predicting the functional effects of small indels in non-coding regions of the human genome

PubMed Central

Ferlaino, Michael; Rogers, Mark F.; Shihab, Hashem A.; Mort, Matthew; Cooper, David N.; Gaunt, Tom R.; Campbell, Colin

2018-01-01

Background Small insertions and deletions (indels) have a significant influence in human disease and, in terms of frequency, they are second only to single nucleotide variants as pathogenic mutations. As the majority of mutations associated with complex traits are located outside the exome, it is crucial to investigate the potential pathogenic impact of indels in non-coding regions of the human genome. Results We present FATHMM-indel, an integrative approach to predict the functional effect, pathogenic or neutral, of indels in non-coding regions of the human genome. Our method exploits various genomic annotations in addition to sequence data. When validated on benchmark data, FATHMM-indel significantly outperforms CADD and GAVIN, state of the art models in assessing the pathogenic impact of non-coding variants. FATHMM-indel is available via a web server at indels.biocompute.org.uk. Conclusions FATHMM-indel can accurately predict the functional impact and prioritise small indels throughout the whole non-coding genome. PMID:28985712
An integrative approach to predicting the functional effects of small indels in non-coding regions of the human genome.

PubMed

Ferlaino, Michael; Rogers, Mark F; Shihab, Hashem A; Mort, Matthew; Cooper, David N; Gaunt, Tom R; Campbell, Colin

2017-10-06

Small insertions and deletions (indels) have a significant influence in human disease and, in terms of frequency, they are second only to single nucleotide variants as pathogenic mutations. As the majority of mutations associated with complex traits are located outside the exome, it is crucial to investigate the potential pathogenic impact of indels in non-coding regions of the human genome. We present FATHMM-indel, an integrative approach to predict the functional effect, pathogenic or neutral, of indels in non-coding regions of the human genome. Our method exploits various genomic annotations in addition to sequence data. When validated on benchmark data, FATHMM-indel significantly outperforms CADD and GAVIN, state of the art models in assessing the pathogenic impact of non-coding variants. FATHMM-indel is available via a web server at indels.biocompute.org.uk. FATHMM-indel can accurately predict the functional impact and prioritise small indels throughout the whole non-coding genome.
The complete mitogenome of the Australian tadpole shrimp Triops australiensis (Spencer & Hall, 1895) (Crustacea: Branchiopoda: Notostraca).

PubMed

Gan, Han Ming; Tan, Mun Hua; Lee, Yin Peng; Austin, Christopher M

2016-05-01

The mitochondrial genome sequence of the Australian tadpole shrimp, Triops australiensis is presented (GenBank Accession Number: NC_024439) and compared with other Triops species. Triops australiensis has a mitochondrial genome of 15,125 base pairs consisting of 13 protein-coding genes, 2 ribosomal subunit genes, 22 transfer RNAs, and a non-coding AT-rich region. The T. australiensis mitogenome is composed of 36.4% A, 16.1% C, 12.3% G and 35.1% T. The mitogenome gene order conforms to the primitive arrangement for Branchiopod crustaceans, which is also conserved within the Pancrustacean.
Dose-sensitivity, conserved non-coding sequences, and duplicate gene retention through multiple tetraploidies in the grasses.

PubMed

Schnable, James C; Pedersen, Brent S; Subramaniam, Sabarinath; Freeling, Michael

2011-01-01

Whole genome duplications, or tetraploidies, are an important source of increased gene content. Following whole genome duplication, duplicate copies of many genes are lost from the genome. This loss of genes is biased both in the classes of genes deleted and the subgenome from which they are lost. Many or all classes of genes preferentially retained as duplicate copies are engaged in dose-sensitive protein-protein interactions, such that deletion of any one duplicate upsets the status quo of subunit concentrations, and presumably lowers fitness as a result. Transcription factors are also preferentially retained following every whole genome duplication studied. This has been explained as a consequence of protein-protein interactions, just as for other highly retained classes of genes. We show that the quantity of conserved noncoding sequences (CNSs) associated with genes predicts the likelihood of their retention as duplicate pairs following whole genome duplication. As many CNSs likely represent binding sites for transcriptional regulators, we propose that the likelihood of gene retention following tetraploidy may also be influenced by dose-sensitive protein-DNA interactions between the regulatory regions of CNS-rich genes - nicknamed bigfoot genes - and the proteins that bind to them. Using grass genomes, we show that differential loss of CNSs from one member of a pair following the pre-grass tetraploidy reduces its chance of retention in the subsequent maize lineage tetraploidy.
Dose–Sensitivity, Conserved Non-Coding Sequences, and Duplicate Gene Retention Through Multiple Tetraploidies in the Grasses

PubMed Central

Schnable, James C.; Pedersen, Brent S.; Subramaniam, Sabarinath; Freeling, Michael

2011-01-01

Whole genome duplications, or tetraploidies, are an important source of increased gene content. Following whole genome duplication, duplicate copies of many genes are lost from the genome. This loss of genes is biased both in the classes of genes deleted and the subgenome from which they are lost. Many or all classes of genes preferentially retained as duplicate copies are engaged in dose-sensitive protein–protein interactions, such that deletion of any one duplicate upsets the status quo of subunit concentrations, and presumably lowers fitness as a result. Transcription factors are also preferentially retained following every whole genome duplication studied. This has been explained as a consequence of protein–protein interactions, just as for other highly retained classes of genes. We show that the quantity of conserved noncoding sequences (CNSs) associated with genes predicts the likelihood of their retention as duplicate pairs following whole genome duplication. As many CNSs likely represent binding sites for transcriptional regulators, we propose that the likelihood of gene retention following tetraploidy may also be influenced by dose–sensitive protein–DNA interactions between the regulatory regions of CNS-rich genes – nicknamed bigfoot genes – and the proteins that bind to them. Using grass genomes, we show that differential loss of CNSs from one member of a pair following the pre-grass tetraploidy reduces its chance of retention in the subsequent maize lineage tetraploidy. PMID:22645525
Ftx is a non-coding RNA which affects Xist expression and chromatin structure within the X-inactivation center region.

PubMed

Chureau, Corinne; Chantalat, Sophie; Romito, Antonio; Galvani, Angélique; Duret, Laurent; Avner, Philip; Rougeulle, Claire

2011-02-15

X chromosome inactivation (XCI) is an essential epigenetic process which involves several non-coding RNAs (ncRNAs), including Xist, the master regulator of X-inactivation initiation. Xist is flanked in its 5' region by a large heterochromatic hotspot, which contains several transcription units including a gene of unknown function, Ftx (five prime to Xist). In this article, we describe the characterization and functional analysis of murine Ftx. We present evidence that Ftx produces a conserved functional long ncRNA, and additionally hosts microRNAs (miR) in its introns. Strikingly, Ftx partially escapes X-inactivation and is upregulated specifically in female ES cells at the onset of X-inactivation, an expression profile which closely follows that of Xist. We generated Ftx null ES cells to address the function of this gene. In these cells, only local changes in chromatin marks are detected within the hotspot, indicating that Ftx is not involved in the global maintenance of the heterochromatic structure of this region. The Ftx mutation, however, results in widespread alteration of transcript levels within the X-inactivation center (Xic) and particularly important decreases in Xist RNA levels, which were correlated with increased DNA methylation at the Xist CpG island. Altogether our results indicate that Ftx is a positive regulator of Xist and lead us to propose that Ftx is a novel ncRNA involved in XCI.
Noncoding Genomics in Gastric Cancer and the Gastric Precancerous Cascade: Pathogenesis and Biomarkers

PubMed Central

Garcia-Bloj, Benjamin; Fry, Jacqueline; Wichmann, Ignacio

2015-01-01

Gastric cancer is the fifth most common cancer and the third leading cause of cancer-related death, whose patterns vary among geographical regions and ethnicities. It is a multifactorial disease, and its development depends on infection by Helicobacter pylori (H. pylori) and Epstein-Barr virus (EBV), host genetic factors, and environmental factors. The heterogeneity of the disease has begun to be unraveled by a comprehensive mutational evaluation of primary tumors. The low-abundance of mutations suggests that other mechanisms participate in the evolution of the disease, such as those found through analyses of noncoding genomics. Noncoding genomics includes single nucleotide polymorphisms (SNPs), regulation of gene expression through DNA methylation of promoter sites, miRNAs, other noncoding RNAs in regulatory regions, and other topics. These processes and molecules ultimately control gene expression. Potential biomarkers are appearing from analyses of noncoding genomics. This review focuses on noncoding genomics and potential biomarkers in the context of gastric cancer and the gastric precancerous cascade. PMID:26379360
Comparative transgenic analysis of enhancers from the human SHOX and mouse Shox2 genomic regions.

PubMed

Rosin, Jessica M; Abassah-Oppong, Samuel; Cobb, John

2013-08-01

Disruption of presumptive enhancers downstream of the human SHOX gene (hSHOX) is a frequent cause of the zeugopodal limb defects characteristic of Léri-Weill dyschondrosteosis (LWD). The closely related mouse Shox2 gene (mShox2) is also required for limb development, but in the more proximal stylopodium. In this study, we used transgenic mice in a comparative approach to characterize enhancer sequences in the hSHOX and mShox2 genomic regions. Among conserved noncoding elements (CNEs) that function as enhancers in vertebrate genomes, those that are maintained near paralogous genes are of particular interest given their ancient origins. Therefore, we first analyzed the regulatory potential of a genomic region containing one such duplicated CNE (dCNE) downstream of mShox2 and hSHOX. We identified a strong limb enhancer directly adjacent to the mShox2 dCNE that recapitulates the expression pattern of the endogenous gene. Interestingly, this enhancer requires sequences only conserved in the mammalian lineage in order to drive strong limb expression, whereas the more deeply conserved sequences of the dCNE function as a neural enhancer. Similarly, we found that a conserved element downstream of hSHOX (CNE9) also functions as a neural enhancer in transgenic mice. However, when the CNE9 transgenic construct was enlarged to include adjacent, non-conserved sequences frequently deleted in LWD patients, the transgene drove expression in the zeugopodium of the limbs. Therefore, both hSHOX and mShox2 limb enhancers are coupled to distinct neural enhancers. This is the first report demonstrating the activity of cis-regulatory elements from the hSHOX and mShox2 genomic regions in mammalian embryos.
A 3' UTR-Derived Small RNA Provides the Regulatory Noncoding Arm of the Inner Membrane Stress Response.

PubMed

Chao, Yanjie; Vogel, Jörg

2016-02-04

Small RNAs (sRNAs) from conserved noncoding genes are crucial regulators in bacterial signaling pathways but have remained elusive in the Cpx response to inner membrane stress. Here we report that an alternative biogenesis pathway releasing the conserved mRNA 3' UTR of stress chaperone CpxP as an ∼60-nt sRNA provides the noncoding arm of the Cpx response. This so-called CpxQ sRNA, generated by general mRNA decay through RNase E, acts as an Hfq-dependent repressor of multiple mRNAs encoding extracytoplasmic proteins. Both CpxQ and the Cpx pathway are required for cell survival under conditions of dissipation of membrane potential. Our discovery of CpxQ illustrates how the conversion of a transcribed 3' UTR into an sRNA doubles the output of a single mRNA to produce two factors with spatially segregated functions during inner membrane stress: a chaperone that targets problematic proteins in the periplasm and a regulatory RNA that dampens their synthesis in the cytosol. Copyright © 2016 Elsevier Inc. All rights reserved.
Principles of long noncoding RNA evolution derived from direct comparison of transcriptomes in 17 species.

PubMed

Hezroni, Hadas; Koppstein, David; Schwartz, Matthew G; Avrutin, Alexandra; Bartel, David P; Ulitsky, Igor

2015-05-19

The inability to predict long noncoding RNAs from genomic sequence has impeded the use of comparative genomics for studying their biology. Here, we develop methods that use RNA sequencing (RNA-seq) data to annotate the transcriptomes of 16 vertebrates and the echinoid sea urchin, uncovering thousands of previously unannotated genes, most of which produce long intervening noncoding RNAs (lincRNAs). Although in each species, >70% of lincRNAs cannot be traced to homologs in species that diverged >50 million years ago, thousands of human lincRNAs have homologs with similar expression patterns in other species. These homologs share short, 5'-biased patches of sequence conservation nested in exonic architectures that have been extensively rewired, in part by transposable element exonization. Thus, over a thousand human lincRNAs are likely to have conserved functions in mammals, and hundreds beyond mammals, but those functions require only short patches of specific sequences and can tolerate major changes in gene architecture. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
The Hippo pathway in hepatocellular carcinoma: Non-coding RNAs in action.

PubMed

Shi, Xuan; Zhu, Hai-Rong; Liu, Tao-Tao; Shen, Xi-Zhong; Zhu, Ji-Min

2017-08-01

Hepatocellular carcinoma (HCC) is the sixth most common cancer and the third leading cause of cancer-related death worldwide. However, current strategies for curing HCC are far from satisfactory. The Hippo pathway is an evolutionarily conserved tumor suppressive pathway that plays crucial roles in organ size control and tissue homeostasis. Its dysregulation is commonly observed in various types of cancer including HCC. Recently, a prominent role for non-coding RNAs in the Hippo pathway during normal development and neoplastic progression has also been emerging in the liver. Thus, further investigation into the regulatory network between non-coding RNAs and the Hippo pathway and their connections with HCC may provide new therapeutic avenues towards developing an effective preventative or perhaps curative treatment for HCC. Herein we summarize the role of non-coding RNAs in the Hippo pathway, with an emphasis on their contribution to carcinogenesis, diagnosis, treatment and prognosis of HCC. Copyright © 2017 Elsevier B.V. All rights reserved.
Massive gene transfer and extensive RNA editing of a symbiotic dinoflagellate plastid genome.

PubMed

Mungpakdee, Sutada; Shinzato, Chuya; Takeuchi, Takeshi; Kawashima, Takeshi; Koyanagi, Ryo; Hisata, Kanako; Tanaka, Makiko; Goto, Hiroki; Fujie, Manabu; Lin, Senjie; Satoh, Nori; Shoguchi, Eiichi

2014-05-31

Genome sequencing of Symbiodinium minutum revealed that 95 of 109 plastid-associated genes have been transferred to the nuclear genome and subsequently expanded by gene duplication. Only 14 genes remain in plastids and occur as DNA minicircles. Each minicircle (1.8-3.3 kb) contains one gene and a conserved noncoding region containing putative promoters and RNA-binding sites. Nine types of RNA editing, including a novel G/U type, were discovered in minicircle transcripts but not in genes transferred to the nucleus. In contrast to DNA editing sites in dinoflagellate mitochondria, which tend to be highly conserved across all taxa, editing sites employed in DNA minicircles are highly variable from species to species. Editing is crucial for core photosystem protein function. It restores evolutionarily conserved amino acids and increases peptidyl hydropathy. It also increases protein plasticity necessary to initiate photosystem complex assembly. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Deep sequencing reveals unique small RNA repertoire that is regulated during head regeneration in Hydra magnipapillata.

PubMed

Krishna, Srikar; Nair, Aparna; Cheedipudi, Sirisha; Poduval, Deepak; Dhawan, Jyotsna; Palakodeti, Dasaradhi; Ghanekar, Yashoda

2013-01-07

Small non-coding RNAs such as miRNAs, piRNAs and endo-siRNAs fine-tune gene expression through post-transcriptional regulation, modulating important processes in development, differentiation, homeostasis and regeneration. Using deep sequencing, we have profiled small non-coding RNAs in Hydra magnipapillata and investigated changes in small RNA expression pattern during head regeneration. Our results reveal a unique repertoire of small RNAs in hydra. We have identified 126 miRNA loci; 123 of these miRNAs are unique to hydra. Less than 50% are conserved across two different strains of Hydra vulgaris tested in this study, indicating a highly diverse nature of hydra miRNAs in contrast to bilaterian miRNAs. We also identified siRNAs derived from precursors with perfect stem-loop structure and that arise from inverted repeats. piRNAs were the most abundant small RNAs in hydra, mapping to transposable elements, the annotated transcriptome and unique non-coding regions on the genome. piRNAs that map to transposable elements and the annotated transcriptome display a ping-pong signature. Further, we have identified several miRNAs and piRNAs whose expression is regulated during hydra head regeneration. Our study defines different classes of small RNAs in this cnidarian model system, which may play a role in orchestrating gene expression essential for hydra regeneration.
Deep sequencing reveals unique small RNA repertoire that is regulated during head regeneration in Hydra magnipapillata

PubMed Central

Krishna, Srikar; Nair, Aparna; Cheedipudi, Sirisha; Poduval, Deepak; Dhawan, Jyotsna; Palakodeti, Dasaradhi; Ghanekar, Yashoda

2013-01-01

Small non-coding RNAs such as miRNAs, piRNAs and endo-siRNAs fine-tune gene expression through post-transcriptional regulation, modulating important processes in development, differentiation, homeostasis and regeneration. Using deep sequencing, we have profiled small non-coding RNAs in Hydra magnipapillata and investigated changes in small RNA expression pattern during head regeneration. Our results reveal a unique repertoire of small RNAs in hydra. We have identified 126 miRNA loci; 123 of these miRNAs are unique to hydra. Less than 50% are conserved across two different strains of Hydra vulgaris tested in this study, indicating a highly diverse nature of hydra miRNAs in contrast to bilaterian miRNAs. We also identified siRNAs derived from precursors with perfect stem–loop structure and that arise from inverted repeats. piRNAs were the most abundant small RNAs in hydra, mapping to transposable elements, the annotated transcriptome and unique non-coding regions on the genome. piRNAs that map to transposable elements and the annotated transcriptome display a ping–pong signature. Further, we have identified several miRNAs and piRNAs whose expression is regulated during hydra head regeneration. Our study defines different classes of small RNAs in this cnidarian model system, which may play a role in orchestrating gene expression essential for hydra regeneration. PMID:23166307
Scaling features of noncoding DNA

NASA Technical Reports Server (NTRS)

Stanley, H. E.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.

1999-01-01

We review evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range--indeed, base pairs thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene, and utilize this fact to build a Coding Sequence Finder Algorithm, which uses statistical ideas to locate the coding regions of an unknown DNA sequence. Finally, we describe briefly some recent work adapting to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function, and reporting that noncoding regions in eukaryotes display a larger redundancy than coding regions. Specifically, we consider the possibility that this result is solely a consequence of nucleotide concentration differences as first noted by Bonhoeffer and his collaborators. We find that cytosine-guanine (CG) concentration does have a strong "background" effect on redundancy. However, we find that for the purine-pyrimidine binary mapping rule, which is not affected by the difference in CG concentration, the Shannon redundancy for the set of analyzed sequences is larger for noncoding regions compared to coding regions.
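
The redundancy comparison described in this abstract can be illustrated with a minimal sketch. It assumes a common textbook form of Shannon redundancy for words of length n, R = 1 - H_n / (n log2 k), where H_n is the entropy of the observed n-letter words and k is the alphabet size (k = 2 after the purine-pyrimidine mapping). The function names and this exact normalization are assumptions made for illustration and are not taken from the record itself.

```python
import math
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def to_purine_pyrimidine(seq):
    """Map a DNA string to the binary alphabet R (purine) / Y (pyrimidine).

    Characters other than A, C, G, T (for example N) are skipped.
    """
    out = []
    for base in seq.upper():
        if base in PURINES:
            out.append("R")
        elif base in PYRIMIDINES:
            out.append("Y")
    return "".join(out)


def shannon_redundancy(symbols, n=1, alphabet_size=2):
    """Return 1 - H_n / (n * log2(alphabet_size)) for overlapping n-letter words.

    Assumed definition (hypothetical helper, not the authors' code).
    """
    words = [symbols[i:i + n] for i in range(len(symbols) - n + 1)]
    if not words:
        raise ValueError("sequence is shorter than the word length n")
    counts = Counter(words)
    total = len(words)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    max_entropy = n * math.log2(alphabet_size)
    return 1.0 - entropy / max_entropy
```

Under this definition, a sequence made only of purines has redundancy 1, and a sequence in which purines and pyrimidines are equally frequent has single-letter redundancy 0. Because the mapping merges C with T and G with A, the result does not depend directly on CG content, which is the property the abstract relies on.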

Non-coding cancer driver candidates identified with a sample- and position-specific model of the somatic mutation rate

PubMed Central

Juul, Malene; Bertl, Johanna; Guo, Qianyun; Nielsen, Morten Muhlig; Świtnicki, Michał; Hornshøj, Henrik; Madsen, Tobias; Hobolth, Asger; Pedersen, Jakob Skou

2017-01-01

Non-coding mutations may drive cancer development. Statistical detection of non-coding driver regions is challenged by a varying mutation rate and uncertainty of functional impact. Here, we develop a statistically founded non-coding driver-detection method, ncdDetect, which includes sample-specific mutational signatures, long-range mutation rate variation, and position-specific impact measures. Using ncdDetect, we screened non-coding regulatory regions of protein-coding genes across a pan-cancer set of whole-genomes (n = 505), which top-ranked known drivers and identified new candidates. For individual candidates, presence of non-coding mutations associates with altered expression or decreased patient survival across an independent pan-cancer sample set (n = 5454). This includes an antigen-presenting gene (CD1A), where 5’UTR mutations correlate significantly with decreased survival in melanoma. Additionally, mutations in a base-excision-repair gene (SMUG1) correlate with a C-to-T mutational-signature. Overall, we find that a rich model of mutational heterogeneity facilitates non-coding driver identification and integrative analysis points to candidates of potential clinical relevance. DOI: http://dx.doi.org/10.7554/eLife.21778.001 PMID:28362259
Japanese encephalitis virus non-coding RNA inhibits activation of interferon by blocking nuclear translocation of interferon regulatory factor 3.

PubMed

Chang, Ruey-Yi; Hsu, Ta-Wen; Chen, Yen-Lin; Liu, Shu-Fan; Tsai, Yi-Jer; Lin, Yun-Tong; Chen, Yi-Shiuan; Fan, Yi-Hsin

2013-09-27

Noncoding RNA (ncRNA) plays a critical role in modulating a broad range of diseases. All arthropod-borne flaviviruses produce short fragment ncRNA (sfRNA) collinear with highly conserved regions of the 3'-untranslated region (UTR) in the viral genome. We show that the molar ratio of sfRNA to genomic RNA in Japanese encephalitis virus (JEV) persistently infected cells is greater than that in acutely infected cells, indicating an sfRNA role in establishing persistent infection. Transfecting excess quantities of sfRNA into JEV-infected cells reduced interferon-β (IFN-β) promoter activity by 57% and IFN-β mRNA levels by 52%, compared to mock-transfected cells. Transfection of sfRNA into JEV-infected cells also reduced phosphorylation of interferon regulatory factor-3 (IRF-3), the IFN-β upstream regulator, and blocked roughly 30% of IRF-3 nuclear localization. Furthermore, JEV-infected sfRNA transfected cells produced 23% less IFN-β-stimulated apoptosis than mock-transfected groups did. Taken together, these results suggest that sfRNA plays a role against host-cell antiviral responses, prevents cells from undergoing apoptosis, and thus contributes to viral persistence. Copyright © 2013 Elsevier B.V. All rights reserved.
Identification and Characterization of Long Non-Coding RNAs Related to Mouse Embryonic Brain Development from Available Transcriptomic Data

PubMed Central

He, Hongjuan; Xiu, Youcheng; Guo, Jing; Liu, Hui; Liu, Qi; Zeng, Tiebo; Chen, Yan; Zhang, Yan; Wu, Qiong

2013-01-01

Long non-coding RNAs (lncRNAs), a key group of non-coding RNAs, have gained wide attention. Although lncRNAs have been functionally annotated and systematically explored in higher mammals, few have undergone systematic identification and annotation. Owing to their expression specificity, the known lncRNAs expressed in embryonic brain tissues remain limited. Because a large number of lncRNAs are transcribed only in brain tissues, studies of lncRNAs in the developing brain are of special interest. Here, publicly available RNA-sequencing (RNA-seq) data from embryonic brain are integrated to identify thousands of embryonic brain lncRNAs with a customized pipeline. A significant proportion of the novel transcripts have not been annotated by available genomic resources. The putative embryonic brain lncRNAs are shorter, less spliced and less conserved than known genes. The expression of putative lncRNAs is on average one tenth that of known coding genes, and comparable with that of known lncRNAs. From chromatin data, putative embryonic brain lncRNAs are associated with active chromatin marks, comparable with known lncRNAs. Embryonic brain expressed lncRNAs are also indicated to have expression, though not evident, in adult brain. Gene Ontology analysis of putative embryonic brain lncRNAs suggests that they are associated with brain development. The putative lncRNAs are shown to be related to possible cis-regulatory roles in imprinting, even when they are themselves deemed to be imprinted lncRNAs. Re-analysis of one knockdown dataset suggests that four regulators are associated with lncRNAs. Taken together, the identification and systematic analysis of putative lncRNAs provide novel insights into uncharacterized mouse non-coding regions and their relationships with mammalian embryonic brain development. PMID:23967161
Comparative analysis of human protein-coding and noncoding RNAs between brain and 10 mixed cell lines by RNA-Seq.

PubMed

Chen, Geng; Yin, Kangping; Shi, Leming; Fang, Yuanzhang; Qi, Ya; Li, Peng; Luo, Jian; He, Bing; Liu, Mingyao; Shi, Tieliu

2011-01-01

In their expression process, different genes can generate diverse functional products, including various protein-coding or noncoding RNAs. Here, using two transcriptome sequencing datasets from human brain tissues and 10 mixed cell lines, we investigated the protein-coding capacities and isoform expression levels of known human genes, as well as the conservation and disease association of long noncoding RNAs (ncRNAs). Comparative analysis revealed that about two-thirds of the genes expressed in brain and cell lines are the same, but less than one-third of their isoforms are identical. Besides those genes specifically expressed in brain or cell lines, about 66% of genes expressed in common encoded different isoforms. Moreover, most genes dominantly expressed one isoform and some genes only generated protein-coding (or noncoding) RNAs in one sample but not in another. We found that 282 human genes could encode both protein-coding and noncoding RNAs through alternative splicing in the two samples. We also identified more than 1,000 long ncRNAs, and most of those long ncRNAs contain conserved elements across either 46 vertebrates or 33 placental mammals or 10 primates. Further analysis showed that some long ncRNAs were differentially expressed in human breast cancer or lung cancer; several of those differentially expressed long ncRNAs were validated by RT-PCR. In addition, those validated differentially expressed long ncRNAs were found to be significantly correlated with certain breast cancer or lung cancer related genes, indicating the important biological relevance between long ncRNAs and human cancers. Our findings reveal that the differences in gene expression profiles between samples mainly result from the expressed gene isoforms, and highlight the importance of studying genes at the isoform level for completely illustrating the intricate transcriptome.
Predicted stem-loop structures and variation in nucleotide sequence of 3' noncoding regions among animal calicivirus genomes.

PubMed

Seal, B S; Neill, J D; Ridpath, J F

1994-07-01

Caliciviruses are nonenveloped with a polyadenylated genome of approximately 7.6 kb and a single capsid protein. The "RNA Fold" computer program was used to analyze 3'-terminal noncoding sequences of five feline calicivirus (FCV), rabbit hemorrhagic disease virus (RHDV), and two San Miguel sea lion virus (SMSV) isolates. The FCV 3'-terminal sequences are 40-46 nucleotides in length and 72-91% similar. The FCV sequences were predicted to contain two possible duplex structures and one stem-loop structure with free energies of -2.1 to -18.2 kcal/mole. The RHDV genomic 3'-terminal RNA sequences are 54 nucleotides in length and share 49% sequence similarity to homologous regions of the FCV genome. The RHDV sequence was predicted to form two duplex structures in the 3'-terminal noncoding region with a single stem-loop structure, resembling that of FCV. In contrast, the SMSV 1 and 4 genomic 3'-terminal noncoding sequences were 185 and 182 nucleotides in length, respectively. Ten possible duplex structures were predicted with an average structural free energy of -35 kcal/mole. Sequence similarity between the two SMSV isolates was 75%. Furthermore, extensive cloverleaflike structures are predicted in the 3' noncoding region of the SMSV genome, in contrast to the predicted single stem-loop structures of FCV or RHDV.
Regulated Formation of lncRNA-DNA Hybrids Enables Faster Transcriptional Induction and Environmental Adaptation.

PubMed

Cloutier, Sara C; Wang, Siwen; Ma, Wai Kit; Al Husini, Nadra; Dhoondia, Zuzer; Ansari, Athar; Pascuzzi, Pete E; Tran, Elizabeth J

2016-02-04

Long non-coding (lnc)RNAs, once thought to merely represent noise from imprecise transcription initiation, have now emerged as major regulatory entities in all eukaryotes. In contrast to the rapidly expanding identification of individual lncRNAs, mechanistic characterization has lagged behind. Here we provide evidence that the GAL lncRNAs in the budding yeast S. cerevisiae promote transcriptional induction in trans by formation of lncRNA-DNA hybrids or R-loops. The evolutionarily conserved RNA helicase Dbp2 regulates formation of these R-loops as genomic deletion or nuclear depletion results in accumulation of these structures across the GAL cluster gene promoters and coding regions. Enhanced transcriptional induction is manifested by lncRNA-dependent displacement of the Cyc8 co-repressor and subsequent gene looping, suggesting that these lncRNAs promote induction by altering chromatin architecture. Moreover, the GAL lncRNAs confer a competitive fitness advantage to yeast cells because expression of these non-coding molecules correlates with faster adaptation in response to an environmental switch. Copyright © 2016 Elsevier Inc. All rights reserved.
Defining functional DNA elements in the human genome

PubMed Central

Kellis, Manolis; Wold, Barbara; Snyder, Michael P.; Bernstein, Bradley E.; Kundaje, Anshul; Marinov, Georgi K.; Ward, Lucas D.; Birney, Ewan; Crawford, Gregory E.; Dekker, Job; Dunham, Ian; Elnitski, Laura L.; Farnham, Peggy J.; Feingold, Elise A.; Gerstein, Mark; Giddings, Morgan C.; Gilbert, David M.; Gingeras, Thomas R.; Green, Eric D.; Guigo, Roderic; Hubbard, Tim; Kent, Jim; Lieb, Jason D.; Myers, Richard M.; Pazin, Michael J.; Ren, Bing; Stamatoyannopoulos, John A.; Weng, Zhiping; White, Kevin P.; Hardison, Ross C.

2014-01-01

With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease. PMID:24753594
Evolutionary growth process of highly conserved sequences in vertebrate genomes.

PubMed

Ishibashi, Minaka; Noda, Akiko Ogura; Sakate, Ryuichi; Imanishi, Tadashi

2012-08-01

Genome sequence comparison between evolutionarily distant species revealed ultraconserved elements (UCEs) among mammals under strong purifying selection. Most of them were also conserved among vertebrates. Because they tend to be located in the flanking regions of developmental genes, they would have fundamental roles in creating vertebrate body plans. However, the evolutionary origin and selection mechanism of these UCEs remain unclear. Here we report that UCEs arose in primitive vertebrates, and gradually grew in vertebrate evolution. We searched for UCEs in two teleost fishes, Tetraodon nigroviridis and Oryzias latipes, and found 554 UCEs with 100% identity over 100 bp. Comparison of teleost and mammalian UCEs revealed 43 pairs of common, jawed-vertebrate UCEs (jUCE) with high sequence identities, ranging from 83.1% to 99.2%. Ten of them retain lower similarities to the Petromyzon marinus genome, and the substitution rates of four non-exonic jUCEs were reduced after the teleost-mammal divergence, suggesting that robust conservation had been acquired in the jawed vertebrate lineage. Our results indicate that prototypical UCEs originated before the divergence of jawed and jawless vertebrates and have been frozen as perfectly conserved sequences in the jawed vertebrate lineage. In addition, our comparative sequence analyses of UCEs and neighboring regions led to the discovery of lineage-specific conserved sequences. They were added progressively to prototypical UCEs, suggesting step-wise acquisition of novel regulatory roles. Our results indicate that conserved non-coding elements (CNEs) consist of blocks with distinct evolutionary histories, each having been frozen since a different evolutionary era along the vertebrate lineage. Copyright © 2012 Elsevier B.V. All rights reserved.
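
The "100% identity over 100 bp" criterion mentioned in this abstract can be illustrated with a minimal sketch. It assumes the two genomes have already been aligned and are supplied as two equal-length strings (gaps written as "-"); the function name and the half-open coordinate convention are hypothetical choices for illustration, and the alignment step itself, which the record does not describe, is out of scope here.

```python
def perfect_identity_blocks(aligned_a, aligned_b, min_len=100):
    """Return (start, end) half-open alignment intervals of perfect identity.

    A column counts as identical only when both sequences carry the same
    unambiguous base (A, C, G or T); gaps and N never match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    a, b = aligned_a.upper(), aligned_b.upper()
    blocks = []
    start = None  # start column of the current run of identical columns
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y and x in "ACGT":
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                blocks.append((start, i))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and len(a) - start >= min_len:
        blocks.append((start, len(a)))
    return blocks
```

With min_len=100 this reports only runs that meet the threshold quoted in the abstract; a smaller threshold is convenient for toy inputs.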
Mechanisms of haplotype divergence at the RGA08 nucleotide-binding leucine-rich repeat gene locus in wild banana (Musa balbisiana)

PubMed Central

2010-01-01

Background: Comparative sequence analysis of complex loci such as resistance gene analog clusters allows estimating the degree of sequence conservation and mechanisms of divergence at the intraspecies level. In banana (Musa sp.), two diploid wild species Musa acuminata (A genome) and Musa balbisiana (B genome) contribute to the polyploid genome of many cultivars. The M. balbisiana species is associated with vigour and tolerance to pests and disease and little is known on the genome structure and haplotype diversity within this species. Here, we compare two genomic sequences of 253 and 223 kb corresponding to two haplotypes of the RGA08 resistance gene analog locus in M. balbisiana "Pisang Klutuk Wulung" (PKW). Results: Sequence comparison revealed two regions of contrasting features. The first is a highly colinear gene-rich region where the two haplotypes diverge only by single nucleotide polymorphisms and two repetitive element insertions. The second corresponds to a large cluster of RGA08 genes, with 13 and 18 predicted RGA genes and pseudogenes spread over 131 and 152 kb respectively on each haplotype. The RGA08 cluster is enriched in repetitive element insertions, in duplicated non-coding intergenic sequences including low complexity regions and shows structural variations between haplotypes. Although some allelic relationships are retained, a large diversity of RGA08 genes occurs in this single M. balbisiana genotype, with several RGA08 paralogs specific to each haplotype. The RGA08 gene family has evolved by mechanisms of unequal recombination, intragenic sequence exchange and diversifying selection. An unequal recombination event taking place between duplicated non-coding intergenic sequences resulted in a different RGA08 gene content between haplotypes pointing out the role of such duplicated regions in the evolution of RGA clusters. Based on the synonymous substitution rate in coding sequences, we estimated a 1 million year divergence time for these M. balbisiana haplotypes. Conclusions: A large RGA08 gene cluster identified in wild banana corresponds to a highly variable genomic region between haplotypes surrounded by conserved flanking regions. High level of sequence identity (70 to 99%) of the genic and intergenic regions suggests a recent and rapid evolution of this cluster in M. balbisiana. PMID:20637079
Functional interrogation of non-coding DNA through CRISPR genome editing

PubMed Central

Canver, Matthew C.; Bauer, Daniel E.; Orkin, Stuart H.

2017-01-01

Methodologies to interrogate non-coding regions have lagged behind coding regions despite comprising the vast majority of the genome. However, the rapid evolution of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing has provided a multitude of novel techniques for laboratory investigation including significant contributions to the toolbox for studying non-coding DNA. CRISPR-mediated loss-of-function strategies rely on direct disruption of the underlying sequence or repression of transcription without modifying the targeted DNA sequence. CRISPR-mediated gain-of-function approaches similarly benefit from methods to alter the targeted sequence through integration of customized sequence into the genome as well as methods to activate transcription. Here we review CRISPR-based loss- and gain-of-function techniques for the interrogation of non-coding DNA. PMID:28288828
Discovery of functional non-coding conserved regions in the α-synuclein gene locus

PubMed Central

Sterling, Lori; Walter, Michael; Ting, Dennis; Schüle, Birgitt

2014-01-01

Several single nucleotide polymorphisms (SNPs) and the Rep-1 microsatellite marker of the α-synuclein (SNCA) gene have consistently been shown to be associated with Parkinson’s disease, but the functional relevance is unclear. Based on these findings we hypothesized that conserved cis-regulatory elements in the SNCA genomic region regulate expression of SNCA, and that SNPs in these regions could be functionally modulating the expression of SNCA, thus contributing to neuronal demise and predisposing to Parkinson’s disease. In a pair-wise comparison of a 206 kb genomic region encompassing the SNCA gene, we revealed 34 evolutionarily conserved DNA sequences between human and mouse. All elements were cloned into reporter vectors and assessed for expression modulation in dual luciferase reporter assays. We found that 12 out of 34 elements exhibited either an enhancement or reduction of the expression of the reporter gene. Three elements upstream of the SNCA gene displayed an approximately 1.5-fold (p<0.009) increase in expression. Of the intronic regions, three showed a 1.5-fold increase and two others indicated a 2- and 2.5-fold increase in expression (p<0.002). Three elements downstream of the SNCA gene showed 1.5-fold and 2.5-fold increases (p<0.0009). One element downstream of SNCA reduced expression of the reporter gene to 0.35-fold (p<0.0009) of normal activity. Our results demonstrate that the SNCA gene contains cis-regulatory regions that might regulate the transcription and expression of SNCA. Further studies in disease-relevant tissue types will be important to understand the functional impact of regulatory regions and specific Parkinson’s disease-associated SNPs and their function in the disease process. PMID:25566351
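
The fold changes quoted in this abstract come from dual luciferase reporter assays. A minimal sketch of how such a value is commonly computed follows, assuming the usual convention of normalizing the firefly signal to the co-transfected Renilla signal and then dividing by the same ratio for a control vector. The function name, argument names, and example readings are hypothetical; the record does not state which normalization the authors used.

```python
def normalized_fold_change(firefly, renilla, control_firefly, control_renilla):
    """Fold change of a test construct relative to a control vector.

    Each firefly reading is first divided by its Renilla reading
    (assumed normalization), then the test ratio is divided by the
    control ratio.
    """
    if renilla <= 0 or control_renilla <= 0 or control_firefly <= 0:
        raise ValueError("Renilla and control readings must be positive")
    test_ratio = firefly / renilla
    control_ratio = control_firefly / control_renilla
    return test_ratio / control_ratio


# Made-up luminometer readings, for illustration only.
enhancer_like = normalized_fold_change(300.0, 100.0, 200.0, 100.0)
repressor_like = normalized_fold_change(70.0, 100.0, 200.0, 100.0)
```

A value above 1 corresponds to enhanced reporter expression and a value below 1 to reduced expression, matching the way the abstract reports both 1.5-fold increases and a reduction to 0.35-fold.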
ICAM-1-related long non-coding RNA: promoter analysis and expression in human retinal endothelial cells.

PubMed

Lumsden, Amanda L; Ma, Yuefang; Ashander, Liam M; Stempel, Andrew J; Keating, Damien J; Smith, Justine R; Appukuttan, Binoy

2018-05-09

Regulation of intercellular adhesion molecule (ICAM)-1 in retinal endothelial cells is a promising druggable target for retinal vascular diseases. The ICAM-1-related (ICR) long non-coding RNA stabilizes ICAM-1 transcript, increasing protein expression. However, studies of ICR involvement in disease have been limited as the promoter is uncharacterized. To address this issue, we undertook a comprehensive in silico analysis of the human ICR gene promoter region. We used genomic evolutionary rate profiling to identify a 115 base pair (bp) sequence within 500 bp upstream of the transcription start site of the annotated human ICR gene that was conserved across 25 eutherian genomes. A second constrained sequence upstream of the orthologous mouse gene (68 bp; conserved across 27 Eutherian genomes including human) was also discovered. Searching these elements identified 33 matrices predictive of binding sites for transcription factors known to be responsive to a broad range of pathological stimuli, including hypoxia, and metabolic and inflammatory proteins. Five phenotype-associated single nucleotide polymorphisms (SNPs) in the immediate vicinity of these elements included four SNPs (i.e. rs2569693, rs281439, rs281440 and rs11575074) predicted to impact binding motifs of transcription factors, and thus the expression of ICR and ICAM-1 genes, with potential to influence disease susceptibility. We verified that human retinal endothelial cells expressed ICR, and observed induction of expression by tumor necrosis factor-α.
Parallel evolution of chordate cis-regulatory code for development.

PubMed

Doglio, Laura; Goode, Debbie K; Pelleri, Maria C; Pauls, Stefan; Frabetti, Flavia; Shimeld, Sebastian M; Vavouri, Tanya; Elgar, Greg

2013-11-01

Urochordates are the closest relatives of vertebrates and at the larval stage, possess a characteristic bilateral chordate body plan. In vertebrates, the genes that orchestrate embryonic patterning are in part regulated by highly conserved non-coding elements (CNEs), yet these elements have not been identified in urochordate genomes. Consequently the evolution of the cis-regulatory code for urochordate development remains largely uncharacterised. Here, we use genome-wide comparisons between C. intestinalis and C. savignyi to identify putative urochordate cis-regulatory sequences. Ciona conserved non-coding elements (ciCNEs) are associated with largely the same key regulatory genes as vertebrate CNEs. Furthermore, some of the tested ciCNEs are able to activate reporter gene expression in both zebrafish and Ciona embryos, in a pattern that at least partially overlaps that of the gene they associate with, despite the absence of sequence identity. We also show that the ability of a ciCNE to up-regulate gene expression in vertebrate embryos can in some cases be localised to short sub-sequences, suggesting that functional cross-talk may be defined by small regions of ancestral regulatory logic, although functional sub-sequences may also be dispersed across the whole element. We conclude that the structure and organisation of cis-regulatory modules is very different between vertebrates and urochordates, reflecting their separate evolutionary histories. However, functional cross-talk still exists because the same repertoire of transcription factors has likely guided their parallel evolution, exploiting similar sets of binding sites but in different combinations.
Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium.

PubMed

Catania, Francesco; Lynch, Michael

2010-05-04

In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties in generating reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal gene expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes.
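The screening step described in this abstract (checking 3' UTRs for a predefined list of 8-mers) can be illustrated with a minimal Python sketch. The gene names, sequences, candidate 8-mers, and function names below are hypothetical and chosen only for illustration; this is not the authors' pipeline.

```python
from collections import Counter


def kmers(seq, k=8):
    """Return the set of all k-mers present in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def count_utrs_with_motif(utrs, candidates, k=8):
    """Count, for each candidate k-mer, how many UTRs contain it at least once."""
    counts = Counter()
    for seq in utrs.values():
        # Each UTR contributes at most one count per candidate k-mer.
        counts.update(kmers(seq, k) & candidates)
    return counts


# Hypothetical toy data (not taken from the study).
utrs = {
    "geneA": "TTTGTAAATATTAGC",
    "geneB": "CCGTAAATATGG",
    "geneC": "ACGTACGTACGTACG",
}
candidates = {"GTAAATAT", "AAAAAAAA"}
hits = count_utrs_with_motif(utrs, candidates)
print(hits["GTAAATAT"], hits["AAAAAAAA"])
```

Counting presence per UTR rather than total occurrences keeps a single repeat-rich UTR from dominating the tally; whether the study counted this way is not stated in the abstract.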
Long-Range Control of Gene Expression: Emerging Mechanisms and Disruption in Disease

PubMed Central

Kleinjan, Dirk A.; van Heyningen, Veronica

2005-01-01

Transcriptional control is a major mechanism for regulating gene expression. The complex machinery required to effect this control is still emerging from functional and evolutionary analysis of genomic architecture. In addition to the promoter, many other regulatory elements are required for spatiotemporally and quantitatively correct gene expression. Enhancer and repressor elements may reside in introns or up- and downstream of the transcription unit. For some genes with highly complex expression patterns—often those that function as key developmental control genes—the cis-regulatory domain can extend long distances outside the transcription unit. Some of the earliest hints of this came from disease-associated chromosomal breaks positioned well outside the relevant gene. With the availability of wide-ranging genome sequence comparisons, strong conservation of many noncoding regions became obvious. Functional studies have shown many of these conserved sites to be transcriptional regulatory elements that sometimes reside inside unrelated neighboring genes. Such sequence-conserved elements generally harbor sites for tissue-specific DNA-binding proteins. Developmentally variable chromatin conformation can control protein access to these sites and can regulate transcription. Disruption of these finely tuned mechanisms can cause disease. Some regulatory element mutations will be associated with phenotypes distinct from any identified for coding-region mutations. PMID:15549674
Conservation genetics and geographic patterns of genetic variation of the endangered officinal herb Fritillaria pallidiflora

Treesearch

Zhihao Su; Borong Pan; Stewart C. Sanderson; Xiaolong Jiang; Mingli Zhang

2015-01-01

Fritillaria pallidiflora is an endangered officinal herb distributed in the Tianshan Mountains of northwestern China. We examined its phylogeography to study evolutionary processes and suggest implications for conservation. Six haplotypes were detected based on three chloroplast non-coding spacers (psbA-trnH, rps16, and trnS-trnG); genetic variation mainly occurred...
Genomic positional conservation identifies topological anchor point RNAs linked to developmental loci.

PubMed

Amaral, Paulo P; Leonardi, Tommaso; Han, Namshik; Viré, Emmanuelle; Gascoigne, Dennis K; Arias-Carrasco, Raúl; Büscher, Magdalena; Pandolfini, Luca; Zhang, Anda; Pluchino, Stefano; Maracaja-Coutinho, Vinicius; Nakaya, Helder I; Hemberg, Martin; Shiekhattar, Ramin; Enright, Anton J; Kouzarides, Tony

2018-03-15

The mammalian genome is transcribed into large numbers of long noncoding RNAs (lncRNAs), but the definition of functional lncRNA groups has proven difficult, partly due to their low sequence conservation and lack of identified shared properties. Here we consider promoter conservation and positional conservation as indicators of functional commonality. We identify 665 conserved lncRNA promoters in mouse and human that are preserved in genomic position relative to orthologous coding genes. These positionally conserved lncRNA genes are primarily associated with developmental transcription factor loci with which they are coexpressed in a tissue-specific manner. Over half of positionally conserved RNAs in this set are linked to chromatin organization structures, overlapping binding sites for the CTCF chromatin organiser and located at chromatin loop anchor points and borders of topologically associating domains (TADs). We define these RNAs as topological anchor point RNAs (tapRNAs). Characterization of these noncoding RNAs and their associated coding genes shows that they are functionally connected: they regulate each other's expression and influence the metastatic phenotype of cancer cells in vitro in a similar fashion. Furthermore, we find that tapRNAs contain conserved sequence domains that are enriched in motifs for zinc finger domain-containing RNA-binding proteins and transcription factors, whose binding sites are found mutated in cancers. This work leverages positional conservation to identify lncRNAs with potential importance in genome organization, development and disease. The evidence that many developmental transcription factors are physically and functionally connected to lncRNAs represents an exciting stepping-stone to further our understanding of genome regulation.
Functional interrogation of non-coding DNA through CRISPR genome editing.

PubMed

Canver, Matthew C; Bauer, Daniel E; Orkin, Stuart H

2017-05-15

Methodologies to interrogate non-coding regions have lagged behind coding regions despite comprising the vast majority of the genome. However, the rapid evolution of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing has provided a multitude of novel techniques for laboratory investigation including significant contributions to the toolbox for studying non-coding DNA. CRISPR-mediated loss-of-function strategies rely on direct disruption of the underlying sequence or repression of transcription without modifying the targeted DNA sequence. CRISPR-mediated gain-of-function approaches similarly benefit from methods to alter the targeted sequence through integration of customized sequence into the genome as well as methods to activate transcription. Here we review CRISPR-based loss- and gain-of-function techniques for the interrogation of non-coding DNA. Copyright © 2017 Elsevier Inc. All rights reserved.
Structure and characterization of a cDNA clone for phenylalanine ammonia-lyase from cut-injured roots of sweet potato

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tanaka, Yoshiyuki; Matsuoka, Makoto; Yamamoto, Naoki

A cDNA clone for phenylalanine ammonia-lyase (PAL) induced in wounded sweet potato (Ipomoea batatas Lam.) root was obtained by immunoscreening a cDNA library. The protein produced in Escherichia coli cells containing the plasmid pPAL02 was indistinguishable from sweet potato PAL as judged by Ouchterlony double diffusion assays. The M_r of its subunit was 77,000. The cells converted [14C]-L-phenylalanine into [14C]-t-cinnamic acid and PAL activity was detected in the homogenate of the cells. The activity was dependent on the presence of the pPAL02 plasmid DNA. The nucleotide sequence of the cDNA contained a 2,121-base pair (bp) open reading frame capable of coding for a polypeptide with 707 amino acids (M_r 77,137), a 22-bp 5'-noncoding region and a 207-bp 3'-noncoding region. The results suggest that the insert DNA fully encoded the amino acid sequence for sweet potato PAL that is induced by wounding. Comparison of the deduced amino acid sequence with that of a PAL cDNA fragment from Phaseolus vulgaris revealed 78.9% homology. The sequence from amino acid residues 258 to 494 was highly conserved, showing 90.7% homology.
Complete mitochondrial genome of the larch hawk moth, Sphinx morio (Lepidoptera: Sphingidae).

PubMed

Kim, Min Jee; Choi, Sei-Woong; Kim, Iksoo

2013-12-01

The larch hawk moth, Sphinx morio, belongs to the lepidopteran family Sphingidae, which has long been studied as a family of model insects in diverse fields. In this study, we describe the complete mitochondrial genome (mitogenome) sequence of the species in terms of general genomic features and characteristic short repetitive sequences found in the A + T-rich region. The 15,299-bp-long genome consisted of a typical set of genes (13 protein-coding genes, 2 rRNA genes, and 22 tRNA genes) and one major non-coding A + T-rich region, with the typical arrangement found in Lepidoptera. The 316-bp-long A + T-rich region located between srRNA and tRNA(Met) harbored the conserved sequence blocks that are typically found in lepidopteran insects. Additionally, the A + T-rich region of S. morio contained three characteristic repeat sequences that are rarely found in Lepidoptera: two identical 12-bp repeats, three identical 5-bp-long tandem repeats, and six nearly identical 5-6 bp long repeat sequences.

A statistical framework to predict functional non-coding regions in the human genome through integrated analysis of annotation data.

PubMed

Lu, Qiongshi; Hu, Yiming; Sun, Jiehuan; Cheng, Yuwei; Cheung, Kei-Hoi; Zhao, Hongyu

2015-05-27

Identifying functional regions in the human genome is a major goal in human genetics. Great efforts have been made to functionally annotate the human genome either through computational predictions, such as genomic conservation, or high-throughput experiments, such as the ENCODE project. These efforts have resulted in a rich collection of functional annotation data of diverse types that need to be jointly analyzed for integrated interpretation and annotation. Here we present GenoCanyon, a whole-genome annotation method that performs unsupervised statistical learning using 22 computational and experimental annotations, thereby inferring the functional potential of each position in the human genome. With GenoCanyon, we are able to predict many of the known functional regions. Its ability to predict functional regions, together with its generalizable statistical framework, makes GenoCanyon a unique and powerful tool for whole-genome annotation. The GenoCanyon web server is available at http://genocanyon.med.yale.edu.
Translational efficiency of poliovirus mRNA: mapping inhibitory cis-acting elements within the 5' noncoding region.

PubMed Central

Pelletier, J; Kaplan, G; Racaniello, V R; Sonenberg, N

1988-01-01

Poliovirus mRNA contains a long 5' noncoding region of about 750 nucleotides (the exact number varies among the three virus serotypes), which contains several AUG codons upstream of the major initiator AUG. Unlike most eucaryotic mRNAs, poliovirus does not contain a m7GpppX (where X is any nucleotide) cap structure at its 5' end and is translated by a cap-independent mechanism. To study the manner by which poliovirus mRNA is expressed, we examined the translational efficiencies of a series of deletion mutants within the 5' noncoding region of the mRNA. In this paper we report striking translation system-specific differences in the ability of the altered mRNAs to be translated. The results suggest the existence of an inhibitory cis-acting element(s) within the 5' noncoding region of poliovirus (between nucleotides 70 and 381) which restricts mRNA translation in reticulocyte lysate, wheat germ extract, and Xenopus oocytes, but not in HeLa cell extracts. In addition, we show that HeLa cell extracts contain a trans-acting factor(s) that overcomes this restriction. PMID:2836606
Simple sequence repeats in Escherichia coli: abundance, distribution, composition, and polymorphism.

PubMed

Gur-Arie, R; Cohen, C J; Eitan, Y; Shelef, L; Hallerman, E M; Kashi, Y

2000-01-01

Computer-based genome-wide screening of the DNA sequence of Escherichia coli strain K12 revealed tens of thousands of tandem simple sequence repeat (SSR) tracts, with motifs ranging from 1 to 6 nucleotides. SSRs were well distributed throughout the genome. Mononucleotide SSRs were over-represented in noncoding regions and under-represented in open reading frames (ORFs). Nucleotide composition of mono- and dinucleotide SSRs, both in ORFs and in noncoding regions, differed from that of the genomic region in which they occurred, with 93% of all mononucleotide SSRs consisting of A or T. Computer-based analysis of the fine position of every SSR locus in the noncoding portion of the genome relative to downstream ORFs showed SSRs located in areas that could affect gene regulation. DNA sequences at 14 arbitrarily chosen SSR tracts were compared among E. coli strains. Polymorphisms of SSR copy number were observed at four of seven mononucleotide SSR tracts screened, with all polymorphisms occurring in noncoding regions. SSR polymorphism could prove important as a genome-wide source of variation, both for practical applications (including rapid detection, strain identification, and detection of loci affecting key phenotypes) and for evolutionary adaptation of microbes.
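A genome-wide screen for SSR tracts with motifs of 1 to 6 nucleotides, as described above, might be sketched with regular expressions as follows. The minimum repeat count, the function names, and the test sequence are assumptions made for illustration; the abstract does not state the thresholds or software the authors used.

```python
import re


def is_primitive(motif):
    """True when the motif is not itself a repeat of a shorter unit (e.g. 'AA' is not)."""
    n = len(motif)
    return not any(
        n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n)
    )


def find_ssrs(seq, max_motif=6, min_repeats=3):
    """Return sorted (start, motif, copy_number) tuples for tandem repeat tracts.

    Simplified scan: matches for each motif length are found independently and
    without overlap, so unusual overlapping tracts may be missed.
    """
    seq = seq.upper()
    tracts = []
    for m in range(1, max_motif + 1):
        # A motif of length m followed by at least (min_repeats - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (m, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                tracts.append((match.start(), motif, len(match.group(0)) // m))
    return sorted(tracts)


# Hypothetical test sequence: an A5 tract and a (CA)4 tract.
example = "GGCAAAAATG" + "CACACACA" + "TT"
ssrs = find_ssrs(example)
print(ssrs)
```

The primitive-motif check prevents a mononucleotide run such as AAAAAA from also being reported as a dinucleotide or trinucleotide repeat.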
Genetics Home Reference: isolated Pierre Robin sequence

MedlinePlus

... PG, Fitzpatrick DR, Lyonnet S. Highly conserved non-coding elements on either side of SOX9 associated with Pierre ... Citation on PubMed or Free article on PubMed Central Jakobsen LP, Ullmann R, Christensen SB, Jensen KE, ...
The complete chloroplast genome sequence of strawberry (Fragaria × ananassa Duch.) and comparison with related species of Rosaceae

PubMed Central

Cheng, Hui; Li, Jinfeng; Zhang, Hong; Cai, Binhua; Gao, Zhihong

2017-01-01

Compared with other members of the family Rosaceae, the chloroplast genomes of Fragaria species exhibit low variation, and this situation has limited phylogenetic analyses; thus, complete chloroplast genome sequencing of Fragaria species is needed. In this study, we sequenced the complete chloroplast genome of F. × ananassa ‘Benihoppe’ using the Illumina HiSeq 2500-PE150 platform and then performed a combination of de novo assembly and reference-guided mapping of contigs to generate complete chloroplast genome sequences. The chloroplast genome exhibits a typical quadripartite structure with a pair of inverted repeats (IRs, 25,936 bp) separated by large (LSC, 85,531 bp) and small (SSC, 18,146 bp) single-copy (SC) regions. The length of the F. × ananassa ‘Benihoppe’ chloroplast genome is 155,549 bp, representing the smallest Fragaria chloroplast genome observed to date. The genome encodes 112 unique genes, comprising 78 protein-coding genes, 30 tRNA genes and four rRNA genes. Comparative analysis of the overall nucleotide sequence identity among ten complete chloroplast genomes confirmed that for both coding and non-coding regions in Rosaceae, SC regions exhibit higher sequence variation than IRs. The Ka/Ks ratio of most genes was less than 1, suggesting that most genes are under purifying selection. Moreover, the mVISTA results also showed a high degree of conservation in genome structure, gene order and gene content in Fragaria, particularly among three octoploid strawberries, namely F. × ananassa ‘Benihoppe’, F. chiloensis (GP33) and F. virginiana (O477). However, when the sequences of the coding and non-coding regions of F. × ananassa ‘Benihoppe’ were compared in detail with those of F. chiloensis (GP33) and F. virginiana (O477), a number of SNPs and InDels were revealed by MEGA 7. Six non-coding regions (trnK-matK, trnS-trnG, atpF-atpH, trnC-petN, trnT-psbD and trnP-psaJ) with a percentage of variable sites greater than 1% and no less than five parsimony-informative sites were identified and may be useful for phylogenetic analysis of the genus Fragaria. PMID:29038765
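The figures reported for the quadripartite structure can be cross-checked with simple arithmetic: the total length should equal LSC + SSC + 2 × IR, and the gene categories should sum to the stated number of unique genes. The short Python sketch below does this using only numbers quoted in the abstract; the function name is a hypothetical helper.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total length of a circular chloroplast genome with two inverted-repeat copies."""
    return lsc + ssc + 2 * ir


# Values quoted in the abstract (bp).
LSC, SSC, IR = 85_531, 18_146, 25_936
total = quadripartite_length(LSC, SSC, IR)
print(total)  # 155549, matching the reported genome length

# Gene categories quoted in the abstract.
unique_genes = 78 + 30 + 4
print(unique_genes)  # 112

# Share of the genome occupied by the two IR copies.
ir_fraction = 2 * IR / total
print(round(ir_fraction, 3))
```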
Social insect genomes exhibit dramatic evolution in gene composition and regulation while preserving regulatory features linked to sociality

PubMed Central

Simola, Daniel F.; Wissler, Lothar; Donahue, Greg; Waterhouse, Robert M.; Helmkampf, Martin; Roux, Julien; Nygaard, Sanne; Glastad, Karl M.; Hagen, Darren E.; Viljakainen, Lumi; Reese, Justin T.; Hunt, Brendan G.; Graur, Dan; Elhaik, Eran; Kriventseva, Evgenia V.; Wen, Jiayu; Parker, Brian J.; Cash, Elizabeth; Privman, Eyal; Childers, Christopher P.; Muñoz-Torres, Monica C.; Boomsma, Jacobus J.; Bornberg-Bauer, Erich; Currie, Cameron R.; Elsik, Christine G.; Suen, Garret; Goodisman, Michael A.D.; Keller, Laurent; Liebig, Jürgen; Rawls, Alan; Reinberg, Danny; Smith, Chris D.; Smith, Chris R.; Tsutsui, Neil; Wurm, Yannick; Zdobnov, Evgeny M.; Berger, Shelley L.; Gadau, Jürgen

2013-01-01

Genomes of eusocial insects code for dramatic examples of phenotypic plasticity and social organization. We compared the genomes of seven ants, the honeybee, and various solitary insects to examine whether eusocial lineages share distinct features of genomic organization. Each ant lineage contains ∼4000 novel genes, but only 64 of these genes are conserved among all seven ants. Many gene families have been expanded in ants, notably those involved in chemical communication (e.g., desaturases and odorant receptors). Alignment of the ant genomes revealed reduced purifying selection compared with Drosophila without significantly reduced synteny. Correspondingly, ant genomes exhibit dramatic divergence of noncoding regulatory elements; however, extant conserved regions are enriched for novel noncoding RNAs and transcription factor–binding sites. Comparison of orthologous gene promoters between eusocial and solitary species revealed significant regulatory evolution in both cis (e.g., Creb) and trans (e.g., fork head) for nearly 2000 genes, many of which exhibit phenotypic plasticity. Our results emphasize that genomic changes can occur remarkably fast in ants, because two recently diverged leaf-cutter ant species exhibit faster accumulation of species-specific genes and greater divergence in regulatory elements compared with other ants or Drosophila. Thus, while the “socio-genomes” of ants and the honeybee are broadly characterized by a pervasive pattern of divergence in gene composition and regulation, they preserve lineage-specific regulatory features linked to eusociality. We propose that changes in gene regulation played a key role in the origins of insect eusociality, whereas changes in gene composition were more relevant for lineage-specific eusocial adaptations. PMID:23636946
Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium

PubMed Central

2010-01-01

Background: In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties in generating reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. Results: By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Conclusions: Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal gene expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes. PMID:20441586
Characterization of the complete mitochondrial genome of Chilo auricilius and comparison with three other rice stem borers.

PubMed

Cao, Shuang-Shuang; Du, Yu-Zhou

2014-09-15

The mitogenome of Chilo auricilius (Lepidoptera: Pyraloidea: Crambidae) was a circular molecule made up of 15,367 bp. Sesamia inferens, Chilo suppressalis, Tryporyza incertulas, and C. auricilius are closely related, well-known rice stem borers that are widely distributed in the main rice-growing regions of China. The gene order and orientation of all four stem borers were similar to those of other insect mitogenomes. Among the four stem borers, all AT contents were below 83%, while all AT contents of tRNA genes were above 80%. The genomes were compact, with only 121-257 bp of non-coding intergenic spacer. There were 56 or 62 bp of overlapping nucleotides in the Crambidae moths, but only 25 bp in the noctuid moth S. inferens. There was a conserved motif 'ATACTAAA' between trnS2 (UCN) and nad1 in Crambidae moths, but this same region was 'ATCATA' in the noctuid S. inferens. There was also a 7-bp motif 'ATGATAA' of overlapping nucleotides, which was conserved in Lepidoptera, and a 14-bp motif 'TAAGCTATTTAAAT' conserved in the three Crambidae moths (C. suppressalis, C. auricilius and T. incertulas), but not in the noctuid. Finally, there were no stem-and-loop structures in the two Chilo moths. Copyright © 2014 Elsevier B.V. All rights reserved.
Detecting the borders between coding and non-coding DNA regions in prokaryotes based on recursive segmentation and nucleotide doublets statistics

PubMed Central

2012-01-01

Background: Detecting the borders between coding and non-coding regions is an essential step in genome annotation, and information entropy measures are useful for describing the signals in a genome sequence. However, the accuracy of previous entropy-segmentation methods for finding these borders still needs to be improved. Methods: In this study, we first applied a new recursive entropic segmentation method to DNA sequences to obtain preliminary significant cuts. A 22-symbol alphabet is used to capture the differential composition of nucleotide doublets and stop codon patterns along three phases in both DNA strands. This process requires no prior training datasets. Results: Compared with previous segmentation methods, the experimental results on three bacterial genomes, Rickettsia prowazekii, Borrelia burgdorferi and E. coli, show that our approach improves the accuracy of finding the borders between coding and non-coding regions in DNA sequences. Conclusions: This paper presents a new segmentation method in prokaryotes based on Jensen-Rényi divergence with a 22-symbol alphabet. For three bacterial genomes, compared with the A12_JR method, our method raised the accuracy of finding the borders between protein-coding and non-coding regions in DNA sequences. PMID:23282225
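The core idea of entropic segmentation can be illustrated with a small Python sketch that finds the single cut point maximizing the Jensen-Rényi divergence between the left and right parts of a sequence. This is a deliberately simplified illustration, not the published method: it uses the plain 4-letter nucleotide alphabet rather than the 22-symbol alphabet, an assumed Rényi order of 0.5, length-proportional weights, and no significance test or recursion. All function names are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # simplification: the paper uses a 22-symbol alphabet


def renyi_entropy(counts, total, alpha=0.5):
    """Rényi entropy of order alpha (alpha != 1) from symbol counts."""
    if total == 0:
        return 0.0
    s = sum((counts[c] / total) ** alpha for c in ALPHABET if counts[c] > 0)
    return math.log(s) / (1.0 - alpha)


def best_cut(seq, alpha=0.5):
    """Return (position, divergence) of the cut maximizing Jensen-Rényi divergence.

    The divergence is H(whole) - w_left * H(left) - w_right * H(right),
    with weights proportional to segment lengths.
    """
    n = len(seq)
    total_counts = Counter(seq)
    h_all = renyi_entropy(total_counts, n, alpha)
    left = Counter()
    best_pos, best_div = None, -1.0
    for i in range(1, n):
        left[seq[i - 1]] += 1
        right = total_counts - left  # Counter subtraction keeps positive counts only
        w_left = i / n
        div = (
            h_all
            - w_left * renyi_entropy(left, i, alpha)
            - (1.0 - w_left) * renyi_entropy(right, n - i, alpha)
        )
        if div > best_div:
            best_pos, best_div = i, div
    return best_pos, best_div


# Hypothetical toy sequence with an obvious compositional border at position 20.
pos, div = best_cut("A" * 20 + "G" * 20)
print(pos, div)
```

A recursive version would apply `best_cut` again to each resulting segment and stop when the divergence no longer passes a chosen significance threshold; the order 0.5 is used here because Rényi entropy is concave for orders between 0 and 1, which keeps the divergence non-negative.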
Long Non-Coding RNAs Differentially Expressed between Normal versus Primary Breast Tumor Tissues Disclose Converse Changes to Breast Cancer-Related Protein-Coding Genes

PubMed Central

Reiche, Kristin; Kasack, Katharina; Schreiber, Stephan; Lüders, Torben; Due, Eldri U.; Naume, Bjørn; Riis, Margit; Kristensen, Vessela N.; Horn, Friedemann; Børresen-Dale, Anne-Lise; Hackermüller, Jörg; Baumbusch, Lars O.

2014-01-01

Breast cancer, the second leading cause of cancer death in women, is a highly heterogeneous disease, characterized by distinct genomic and transcriptomic profiles. Transcriptome analyses prevalently assessed protein-coding genes; however, the majority of the mammalian genome is expressed in numerous non-coding transcripts. Emerging evidence supports that many of these non-coding RNAs are specifically expressed during development, tumorigenesis, and metastasis. The focus of this study was to investigate the expression features and molecular characteristics of long non-coding RNAs (lncRNAs) in breast cancer. We investigated 26 breast tumor and 5 normal tissue samples utilizing a custom expression microarray containing probes for mRNAs as well as novel and previously identified lncRNAs. We identified more than 19,000 unique regions significantly differentially expressed between normal and breast tumor tissue; half of these regions were non-coding without any evidence for functional open reading frames or sequence similarity to known proteins. The identified non-coding regions were primarily located in introns (53%) or in the intergenic space (33%), frequently orientated in antisense-direction of protein-coding genes (14%), and commonly distributed at promoter-, transcription factor binding-, or enhancer-sites. Analyzing the most diverse mRNA breast cancer subtypes Basal-like versus Luminal A and B resulted in 3,025 significantly differentially expressed unique loci, including 682 (23%) for non-coding transcripts. A notable number of differentially expressed protein-coding genes displayed non-synonymous expression changes compared to their nearest differentially expressed lncRNA, including an antisense lncRNA strongly anticorrelated to the mRNA coding for histone deacetylase 3 (HDAC3), which was investigated in more detail. Previously identified chromatin-associated lncRNAs (CARs) were predominantly downregulated in breast tumor samples, including CARs located in the protein-coding genes for CALD1, FTX, and HNRNPH1. In conclusion, a number of differentially expressed lncRNAs have been identified with relation to cancer-related protein-coding genes. PMID:25264628
Hepatic Long Intergenic Noncoding RNAs: High Promoter Conservation and Dynamic, Sex-Dependent Transcriptional Regulation by Growth Hormone

PubMed Central

Melia, Tisha; Hao, Pengying; Yilmaz, Feyza

2015-01-01

Long intergenic noncoding RNAs (lincRNAs) are increasingly recognized as key chromatin regulators, yet few studies have characterized lincRNAs in a single tissue under diverse conditions. Here, we analyzed 45 mouse liver RNA sequencing (RNA-Seq) data sets collected under diverse conditions to systematically characterize 4,961 liver lincRNAs, 59% of them novel, with regard to gene structures, species conservation, chromatin accessibility, transcription factor binding, and epigenetic states. To investigate the potential for functionality, we focused on the responses of the liver lincRNAs to growth hormone stimulation, which imparts clinically relevant sex differences to hepatic metabolism and liver disease susceptibility. Sex-biased expression characterized 247 liver lincRNAs, with many being nuclear RNA enriched and regulated by growth hormone. The sex-biased lincRNA genes are enriched for nearby and correspondingly sex-biased accessible chromatin regions, as well as sex-biased binding sites for growth hormone-regulated transcriptional activators (STAT5, hepatocyte nuclear factor 6 [HNF6], FOXA1, and FOXA2) and transcriptional repressors (CUX2 and BCL6). Repression of female-specific lincRNAs in male liver, but not that of male-specific lincRNAs in female liver, was associated with enrichment of H3K27me3-associated inactive states and poised (bivalent) enhancer states. Strikingly, we found that liver-specific lincRNA gene promoters are more highly species conserved and have a significantly higher frequency of proximal binding by liver transcription factors than liver-specific protein-coding gene promoters. Orthologs for many liver lincRNAs were identified in one or more supraprimates, including two rat lincRNAs showing the same growth hormone-regulated, sex-biased expression as their mouse counterparts. This integrative analysis of liver lincRNA chromatin states, transcription factor occupancy, and growth hormone regulation provides novel insights into the expression of sex-specific lincRNAs and their potential for regulation of sex differences in liver physiology and disease. PMID:26459762
Conservation of gene linkage in dispersed vertebrate NK homeobox clusters.

PubMed

Wotton, Karl R; Weierud, Frida K; Juárez-Morales, José L; Alvares, Lúcia E; Dietrich, Susanne; Lewis, Katharine E

2009-10-01

Nk homeobox genes are important regulators of many different developmental processes including muscle, heart, central nervous system and sensory organ development. They are thought to have arisen as part of the ANTP megacluster, which also gave rise to Hox and ParaHox genes, and at least some NK genes remain tightly linked in all animals examined so far. The protostome-deuterostome ancestor probably contained a cluster of nine Nk genes: (Msx)-(Nk4/tinman)-(Nk3/bagpipe)-(Lbx/ladybird)-(Tlx/c15)-(Nk7)-(Nk6/hgtx)-(Nk1/slouch)-(Nk5/Hmx). Of these genes, only NKX2.6-NKX3.1, LBX1-TLX1 and LBX2-TLX2 remain tightly linked in humans. However, it is currently unclear whether this is unique to the human genome as we do not know which of these Nk genes are clustered in other vertebrates. This makes it difficult to assess whether the remaining linkages are due to selective pressures or because chance rearrangements have "missed" certain genes. In this paper, we identify all of the paralogs of these ancestrally clustered NK genes in several distinct vertebrates. We demonstrate that tight linkages of Lbx1-Tlx1, Lbx2-Tlx2 and Nkx3.1-Nkx2.6 have been widely maintained in both the ray-finned and lobe-finned fish lineages. Moreover, the recently duplicated Hmx2-Hmx3 genes are also tightly linked. Finally, we show that Lbx1-Tlx1 and Hmx2-Hmx3 are flanked by highly conserved noncoding elements, suggesting that shared regulatory regions may have resulted in evolutionary pressure to maintain these linkages. Consistent with this, these pairs of genes have overlapping expression domains. In contrast, Lbx2-Tlx2 and Nkx3.1-Nkx2.6, which do not seem to be coexpressed, are also not associated with conserved noncoding sequences, suggesting that an alternative mechanism may be responsible for the continued clustering of these genes.
Identification of evolutionarily conserved Momordica charantia microRNAs using computational approach and its utility in phylogeny analysis.

PubMed

Thirugnanasambantham, Krishnaraj; Saravanan, Subramanian; Karikalan, Kulandaivelu; Bharanidharan, Rajaraman; Lalitha, Perumal; Ilango, S; HairulIslam, Villianur Ibrahim

2015-10-01

Momordica charantia (bitter gourd, bitter melon) is a monoecious Cucurbitaceae with anti-oxidant, anti-microbial, anti-viral and anti-diabetic potential. Molecular studies on this economically valuable plant are essential to understand its phylogeny and evolution. MicroRNAs (miRNAs) are conserved, small, non-coding RNAs that regulate gene expression by binding the 3' UTR of target mRNAs and that evolve at different rates in different plant species. In this study we used a homology-based computational approach and identified 27 mature miRNAs for the first time from this biomedically important plant. The phylogenetic tree built from binary presence/absence data for the identified miRNAs was found to be uncertain and biased. Most of the identified miRNAs were highly conserved among plant species, and sequence-based phylogenetic analysis of the miRNAs resolved these difficulties. Predicted gene targets of the identified miRNAs revealed their importance in the regulation of plant developmental processes. The reported miRNAs showed sequence conservation in their mature forms, and detailed phylogenetic analysis of pre-miRNA sequences revealed genus-specific segregation of clusters. Copyright © 2015 Elsevier Ltd. All rights reserved.
MOSAIC: an online database dedicated to the comparative genomics of bacterial strains at the intra-species level.

PubMed

Chiapello, Hélène; Gendrault, Annie; Caron, Christophe; Blum, Jérome; Petit, Marie-Agnès; El Karoui, Meriem

2008-11-27

The recent availability of complete sequences for numerous closely related bacterial genomes opens up new challenges in comparative genomics. Several methods have been developed to align complete genomes at the nucleotide level but their use and the biological interpretation of results are not straightforward. It is therefore necessary to develop new resources to access, analyze, and visualize genome comparisons. Here we present recent developments on MOSAIC, a generalist comparative bacterial genome database. This database provides the bacteriologist community with easy access to comparisons of complete bacterial genomes at the intra-species level. The strategy we developed for comparison allows us to define two types of regions in bacterial genomes: backbone segments (i.e., regions conserved in all compared strains) and variable segments (i.e., regions that are either specific to or variable in one of the aligned genomes). Definition of these segments at the nucleotide level allows precise comparative and evolutionary analyses of both coding and non-coding regions of bacterial genomes. Such work is easily performed using the MOSAIC Web interface, which allows browsing and graphical visualization of genome comparisons. The MOSAIC database now includes 493 pairwise comparisons and 35 multiple maximal comparisons representing 78 bacterial species. Genome conserved regions (backbones) and variable segments are presented in various formats for further analysis. A graphical interface allows visualization of aligned genomes and functional annotations. The MOSAIC database is available online at http://genome.jouy.inra.fr/mosaic.
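The backbone/variable distinction described above can be illustrated with a small interval computation. This is only a sketch of the definition, not MOSAIC's actual algorithm: the function name `variable_segments`, the half-open 0-based coordinates, and the assumption that backbone intervals are already known for one genome are all hypothetical choices made for illustration.

```python
def variable_segments(genome_length, backbone):
    """Return the regions of one genome that lie outside every backbone interval.

    genome_length: length of the genome in nucleotides (assumed linear here).
    backbone: list of (start, end) half-open, 0-based intervals conserved in
              all compared strains (assumed to be given, e.g. from an alignment).
    """
    segments = []
    cursor = 0  # first position not yet assigned to backbone or variable
    for start, end in sorted(backbone):
        if start > cursor:
            # gap before this backbone interval is a variable segment
            segments.append((cursor, start))
        cursor = max(cursor, end)  # tolerate overlapping backbone intervals
    if cursor < genome_length:
        segments.append((cursor, genome_length))
    return segments


if __name__ == "__main__":
    print(variable_segments(100, [(10, 30), (50, 80)]))
```

Under these assumptions, every nucleotide of the genome falls in exactly one of the two classes, which is what makes segment-level comparison of coding and non-coding regions possible.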
Emergence and Evolution of Hominidae-Specific Coding and Noncoding Genomic Sequences

PubMed Central

Saber, Morteza Mahmoudi; Adeyemi Babarinde, Isaac; Hettiarachchi, Nilmini; Saitou, Naruya

2016-01-01

Family Hominidae, which includes humans and great apes, is recognized for unique complex social behavior and intellectual abilities. Despite the increasing genome data, however, the genomic origin of its phenotypic uniqueness has remained elusive. Clade-specific genes and highly conserved noncoding sequences (HCNSs) are among the high-potential evolutionary candidates involved in driving clade-specific characters and phenotypes. On this premise, we analyzed whole genome sequences along with gene orthology data retrieved from major DNA databases to find Hominidae-specific (HS) genes and HCNSs. We discovered that Down syndrome critical region 4 (DSCR4) is the only experimentally verified gene uniquely present in Hominidae. DSCR4 has no structural homology to any known protein and was inferred to have emerged in several steps through LTR/ERV1, LTR/ERVL retrotransposition, and transversion. Using the genomic distance as neutral evolution threshold, we identified 1,658 HS HCNSs. Polymorphism coverage and derived allele frequency analysis of HS HCNSs showed that these HCNSs are under purifying selection, indicating that they may harbor important functions. They are overrepresented in promoters/untranslated regions, in close proximity of genes involved in sensory perception of sound and developmental process, and also showed a significantly lower nucleosome occupancy probability. Interestingly, many ancestral sequences of the HS HCNSs showed very high evolutionary rates. This suggests that new functions emerged through some kind of positive selection, and then purifying selection started to operate to keep these functions. PMID:27289096
Regulatory variation: an emerging vantage point for cancer biology.

PubMed

Li, Luolan; Lorzadeh, Alireza; Hirst, Martin

2014-01-01

Transcriptional regulation involves complex and interdependent interactions of noncoding and coding regions of the genome with proteins that interact with and modify them. Genetic variation/mutation in coding and noncoding regions of the genome can drive aberrant transcription and disease. Although noncoding DNA accounts for nearly 98% of the genome, comparatively little is known about its contribution to disease. Genome-wide association studies of complex human diseases including cancer have revealed enrichment for variants in the noncoding genome. A striking finding of recent cancer genome re-sequencing efforts has been the previously underappreciated frequency of mutations in epigenetic modifiers across a wide range of cancer types. Taken together, these results point to the importance of dysregulated transcriptional control in the genesis of cancer. Powered by recent technological advancements in functional genomic profiling, exploration of normal and transformed regulatory networks will provide novel insight into the initiation and progression of cancer and open new windows to future prognostic and diagnostic tools. © 2013 Wiley Periodicals, Inc.
Detection of non-coding RNA in bacteria and archaea using the DETR'PROK Galaxy pipeline.

PubMed

Toffano-Nioche, Claire; Luo, Yufei; Kuchly, Claire; Wallon, Claire; Steinbach, Delphine; Zytnicki, Matthias; Jacq, Annick; Gautheret, Daniel

2013-09-01

RNA-seq experiments are now routinely used for the large scale sequencing of transcripts. In bacteria or archaea, such deep sequencing experiments typically produce 10-50 million fragments that cover most of the genome, including intergenic regions. In this context, the precise delineation of the non-coding elements is challenging. Non-coding elements include untranslated regions (UTRs) of mRNAs, independent small RNA genes (sRNAs) and transcripts produced from the antisense strand of genes (asRNA). Here we present a computational pipeline (DETR'PROK: detection of ncRNAs in prokaryotes) based on the Galaxy framework that takes as input a mapping of deep sequencing reads and performs successive steps of clustering, comparison with existing annotation and identification of transcribed non-coding fragments classified into putative 5' UTRs, sRNAs and asRNAs. We provide a step-by-step description of the protocol using real-life example data sets from Vibrio splendidus and Escherichia coli. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
The Spot 42 RNA: A regulatory small RNA with roles in the central metabolism.

PubMed

Bækkedal, Cecilie; Haugen, Peik

2015-01-01

The Spot 42 RNA is a 109 nucleotide long (in Escherichia coli) noncoding small regulatory RNA (sRNA) encoded by the spf (spot forty-two) gene. spf is found in gamma-proteobacteria, and the majority of experimental work on Spot 42 RNA has been performed using E. coli and, recently, Aliivibrio salmonicida. In the cell, Spot 42 RNA plays essential roles as a regulator in carbohydrate metabolism and uptake; its expression is activated by glucose and inhibited by the cAMP-CRP complex. Here we summarize the current knowledge on Spot 42, present the natural distribution of spf, show family-specific secondary structural features of Spot 42, and link highly conserved structural regions to mRNA target binding.
Comparative genomics reveals insights into avian genome evolution and adaptation

PubMed Central

Zhang, Guojie; Li, Cai; Li, Qiye; Li, Bo; Larkin, Denis M.; Lee, Chul; Storz, Jay F.; Antunes, Agostinho; Greenwold, Matthew J.; Meredith, Robert W.; Ödeen, Anders; Cui, Jie; Zhou, Qi; Xu, Luohao; Pan, Hailin; Wang, Zongji; Jin, Lijun; Zhang, Pei; Hu, Haofu; Yang, Wei; Hu, Jiang; Xiao, Jin; Yang, Zhikai; Liu, Yang; Xie, Qiaolin; Yu, Hao; Lian, Jinmin; Wen, Ping; Zhang, Fang; Li, Hui; Zeng, Yongli; Xiong, Zijun; Liu, Shiping; Zhou, Long; Huang, Zhiyong; An, Na; Wang, Jie; Zheng, Qiumei; Xiong, Yingqi; Wang, Guangbiao; Wang, Bo; Wang, Jingjing; Fan, Yu; da Fonseca, Rute R.; Alfaro-Núñez, Alonzo; Schubert, Mikkel; Orlando, Ludovic; Mourier, Tobias; Howard, Jason T.; Ganapathy, Ganeshkumar; Pfenning, Andreas; Whitney, Osceola; Rivas, Miriam V.; Hara, Erina; Smith, Julia; Farré, Marta; Narayan, Jitendra; Slavov, Gancho; Romanov, Michael N; Borges, Rui; Machado, João Paulo; Khan, Imran; Springer, Mark S.; Gatesy, John; Hoffmann, Federico G.; Opazo, Juan C.; Håstad, Olle; Sawyer, Roger H.; Kim, Heebal; Kim, Kyu-Won; Kim, Hyeon Jeong; Cho, Seoae; Li, Ning; Huang, Yinhua; Bruford, Michael W.; Zhan, Xiangjiang; Dixon, Andrew; Bertelsen, Mads F.; Derryberry, Elizabeth; Warren, Wesley; Wilson, Richard K; Li, Shengbin; Ray, David A.; Green, Richard E.; O’Brien, Stephen J.; Griffin, Darren; Johnson, Warren E.; Haussler, David; Ryder, Oliver A.; Willerslev, Eske; Graves, Gary R.; Alström, Per; Fjeldså, Jon; Mindell, David P.; Edwards, Scott V.; Braun, Edward L.; Rahbek, Carsten; Burt, David W.; Houde, Peter; Zhang, Yong; Yang, Huanming; Wang, Jian; Jarvis, Erich D.; Gilbert, M. Thomas P.; Wang, Jun

2015-01-01

Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, which predominantly arose because of lineage-specific erosion of repetitive elements, large segmental deletions, and gene loss. Avian genomes furthermore show a remarkably high degree of evolutionary stasis at the levels of nucleotide sequence, gene synteny, and chromosomal structure. Despite this pattern of conservation, we detected many non-neutral evolutionary changes in protein-coding genes and noncoding regions. These analyses reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits. PMID:25504712

Cap-independent translation of poliovirus mRNA is conferred by sequence elements within the 5' noncoding region

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pelletier, J.; Kaplan, G.; Racaniello, V.R.

1988-03-01

Poliovirus polysomal RNA is naturally uncapped, and as such, its translation must bypass any 5' cap-dependent ribosome recognition event. To elucidate the manner by which poliovirus mRNA is translated, the authors determined the translational efficiencies of a series of deletion mutants within the 5' noncoding region of the mRNA. They found striking differences in translatability among the altered mRNAs when assayed in mock-infected and poliovirus-infected HeLa cell extracts. The results identify a functional cis-acting element within the 5' noncoding region of the poliovirus mRNA which enables it to translate in a cap-independent fashion. The major determinant of this element maps between nucleotides 320 and 631 of the 5' end of the poliovirus mRNA. They also show that this region (320 to 631), when fused to a heterologous mRNA, can function in cis to render the mRNA cap independent in translation.
Diversity of Antisense and Other Non-Coding RNAs in Archaea Revealed by Comparative Small RNA Sequencing in Four Pyrobaculum Species

PubMed Central

Bernick, David L.; Dennis, Patrick P.; Lui, Lauren M.; Lowe, Todd M.

2012-01-01

A great diversity of small, non-coding RNA (ncRNA) molecules with roles in gene regulation and RNA processing have been intensely studied in eukaryotic and bacterial model organisms, yet our knowledge of possible parallel roles for small RNAs (sRNA) in archaea is limited. We employed RNA-seq to identify novel sRNA across multiple species of the hyperthermophilic genus Pyrobaculum, known for unusual RNA gene characteristics. By comparing transcriptional data collected in parallel among four species, we were able to identify conserved RNA genes fitting into known and novel families. Among our findings, we highlight three novel cis-antisense sRNAs encoded opposite to key regulatory (ferric uptake regulator), metabolic (triose-phosphate isomerase), and core transcriptional apparatus genes (transcription factor B). We also found a large increase in the number of conserved C/D box sRNA genes over what had been previously recognized; many of these genes are encoded antisense to protein coding genes. The conserved opposition to orthologous genes across the Pyrobaculum genus suggests similarities to other cis-antisense regulatory systems. Furthermore, the genus-specific nature of these sRNAs indicates they are relatively recent, stable adaptations. PMID:22783241
Genome-scale deletion screening of human long non-coding RNAs using a paired-guide RNA CRISPR library

PubMed Central

Zhu, Shiyou; Li, Wei; Liu, Jingze; Chen, Chen-Hao; Liao, Qi; Xu, Ping; Xu, Han; Xiao, Tengfei; Cao, Zhongzheng; Peng, Jingyu; Yuan, Pengfei; Brown, Myles; Liu, Xiaole Shirley; Wei, Wensheng

2017-01-01

CRISPR/Cas9 screens have been widely adopted to analyse coding gene functions, but high throughput screening of non-coding elements using this method is more challenging, because indels caused by a single cut in non-coding regions are unlikely to produce a functional knockout. A high-throughput method to produce deletions of non-coding DNA is needed. Herein, we report a high throughput genomic deletion strategy to screen for functional long non-coding RNAs (lncRNAs) that is based on a lentiviral paired-guide RNA (pgRNA) library. Applying our screening method, we identified 51 lncRNAs that can positively or negatively regulate human cancer cell growth. We individually validated 9 lncRNAs using CRISPR/Cas9-mediated genomic deletion and functional rescue, CRISPR activation or inhibition, and gene expression profiling. Our high-throughput pgRNA genome deletion method should enable rapid identification of functional mammalian non-coding elements. PMID:27798563
Variations in the non-coding transcriptome as a driver of inter-strain divergence and physiological adaptation in bacteria.

PubMed

Kopf, Matthias; Klähn, Stephan; Scholz, Ingeborg; Hess, Wolfgang R; Voß, Björn

2015-04-22

In all studied organisms, a substantial portion of the transcriptome consists of non-coding RNAs that frequently execute regulatory functions. Here, we have compared the primary transcriptomes of the cyanobacteria Synechocystis sp. PCC 6714 and PCC 6803 under 10 different conditions. These strains share 2854 protein-coding genes and a 16S rRNA identity of 99.4%, indicating their close relatedness. Conserved major transcriptional start sites (TSSs) give rise to non-coding transcripts within the sigB gene, from the 5'UTRs of cmpA and isiA, and 168 loci in antisense orientation. Distinct differences include single nucleotide polymorphisms rendering promoters inactive in one of the strains, e.g., for cmpR and for the asRNA PsbA2R. Based on the genome-wide mapped location, regulation and classification of TSSs, non-coding transcripts were identified as the most dynamic component of the transcriptome. We identified a class of mRNAs that originate by read-through from an sRNA that accumulates as a discrete and abundant transcript while also serving as the 5'UTR. Such an sRNA/mRNA structure, which we name 'actuaton', represents another way for bacteria to remodel their transcriptional network. Our findings support the hypothesis that variations in the non-coding transcriptome constitute a major evolutionary element of inter-strain divergence and capability for physiological adaptation.
New PAH gene promoter KLF1 and 3'-region C/EBPalpha motifs influence transcription in vitro.

PubMed

Klaassen, Kristel; Stankovic, Biljana; Kotur, Nikola; Djordjevic, Maja; Zukic, Branka; Nikcevic, Gordana; Ugrin, Milena; Spasovski, Vesna; Srzentic, Sanja; Pavlovic, Sonja; Stojiljkovic, Maja

2017-02-01

Phenylketonuria (PKU) is a metabolic disease caused by mutations in the phenylalanine hydroxylase (PAH) gene. Although the PAH genotype remains the main determinant of PKU phenotype severity, genotype-phenotype inconsistencies have been reported. In this study, we focused on unanalysed sequences in non-coding PAH gene regions to assess their possible influence on the PKU phenotype. We transiently transfected HepG2 cells with various chloramphenicol acetyl transferase (CAT) reporter constructs which included PAH gene non-coding regions. The non-coding regions were selected because in silico prediction indicated that they contain transcription factor binding sites. Furthermore, electrophoretic mobility shift assay (EMSA) and supershift assays were performed to identify which transcription factors were engaged in the interaction. We found a novel KLF1 motif in the PAH promoter, which decreases CAT activity by 50% in comparison to basal transcription in vitro. The cytosine at the c.-170 promoter position creates an additional binding site for the protein complex involving the KLF1 transcription factor. Moreover, we assessed for the first time the role of a multivariant variable number tandem repeat (VNTR) region located in the 3'-region of the PAH gene. We found that the VNTR3, VNTR7 and VNTR8 constructs had approximately 60% of CAT activity. The regulation is mediated by the C/EBPalpha transcription factor, present in the protein complex binding to VNTR3. Our study highlighted two novel motifs in the PAH gene, a promoter KLF1 motif and a 3'-region C/EBPalpha motif, which decrease transcription in vitro and, thus, could be considered as PAH expression modifiers. New transcription motifs in non-coding regions will contribute to better understanding of the PKU phenotype complexity and may become important for the optimisation of PKU treatment.
Disruption of long-distance highly conserved noncoding elements in neurocristopathies.

PubMed

Amiel, Jeanne; Benko, Sabina; Gordon, Christopher T; Lyonnet, Stanislas

2010-12-01

One of the key discoveries of vertebrate genome sequencing projects has been the identification of highly conserved noncoding elements (CNEs). Some characteristics of CNEs include their high frequency in mammalian genomes, their potential regulatory role in gene expression, and their enrichment in gene deserts nearby master developmental genes. The abnormal development of neural crest cells (NCCs) leads to a broad spectrum of congenital malformation(s), termed neurocristopathies, and/or tumor predisposition. Here we review recent findings that disruptions of CNEs, within or at long distance from the coding sequences of key genes involved in NCC development, result in neurocristopathies via the alteration of tissue- or stage-specific long-distance regulation of gene expression. While most studies on human genetic disorders have focused on protein-coding sequences, these examples suggest that investigation of genomic alterations of CNEs will provide a broader understanding of the molecular etiology of both rare and common human congenital malformations. © 2010 New York Academy of Sciences.
DOMAINS REARRANGED METHYLTRANSFERASE3 controls DNA methylation and regulates RNA polymerase V transcript abundance in Arabidopsis

PubMed Central

Zhong, Xuehua; Hale, Christopher J.; Nguyen, Minh; Ausin, Israel; Groth, Martin; Hetzel, Jonathan; Vashisht, Ajay A.; Henderson, Ian R.; Wohlschlegel, James A.; Jacobsen, Steven E.

2015-01-01

DNA methylation is a mechanism of epigenetic gene regulation and genome defense conserved in many eukaryotic organisms. In Arabidopsis, the DNA methyltransferase DOMAINS REARRANGED METHYLASE 2 (DRM2) controls RNA-directed DNA methylation in a pathway that also involves the plant-specific RNA Polymerase V (Pol V). Additionally, the Arabidopsis genome encodes an evolutionarily conserved but catalytically inactive DNA methyltransferase, DRM3. Here, we show that DRM3 has moderate effects on global DNA methylation and small RNA abundance and that DRM3 physically interacts with Pol V. In Arabidopsis drm3 mutants, we observe a lower level of Pol V-dependent noncoding RNA transcripts even though Pol V chromatin occupancy is increased at many sites in the genome. These findings suggest that DRM3 acts to promote Pol V transcriptional elongation or assist in the stabilization of Pol V transcripts. This work sheds further light on the mechanism by which long noncoding RNAs facilitate RNA-directed DNA methylation. PMID:25561521
Interplay between chromatin modulators and histone acetylation regulates the formation of accessible chromatin in the upstream regulatory region of fission yeast fbp1.

PubMed

Adachi, Akira; Senmatsu, Satoshi; Asada, Ryuta; Abe, Takuya; Hoffman, Charles S; Ohta, Kunihiro; Hirota, Kouji

2018-05-03

Numerous noncoding RNA transcripts are detected in eukaryotic cells. Noncoding RNAs transcribed across gene promoters are involved in the regulation of mRNA transcription via chromatin modulation. This function of noncoding RNA transcription was first demonstrated for the fission yeast fbp1 gene, where a cascade of noncoding RNA transcription events induces chromatin remodeling to facilitate transcription factor binding. We recently demonstrated that the noncoding RNAs from the fbp1 upstream region facilitate binding of the transcription activator Atf1 and thereby promote histone acetylation. Histone acetylation by histone acetyl transferases (HATs) and ATP-dependent chromatin remodelers (ADCRs) are implicated in chromatin remodeling, but the interplay between HATs and ADCRs in this process has not been fully elucidated. Here, we examine the roles played by two distinct ADCRs, Snf22 and Hrp3, and by the HAT Gcn5 in the transcriptional activation of fbp1. Snf22 and Hrp3 redundantly promote disassembly of chromatin in the fbp1 upstream region. Gcn5 critically contributes to nucleosome eviction in the absence of either Snf22 or Hrp3, presumably by recruiting Hrp3 in snf22∆ cells and Snf22 in hrp3∆ cells. Conversely, Gcn5-dependent histone H3 acetylation is impaired in snf22∆/hrp3∆ cells, suggesting that both redundant ADCRs induce recruitment of Gcn5 to the chromatin array in the fbp1 upstream region. These results reveal a previously unappreciated interplay between ADCRs and histone acetylation in which histone acetylation facilitates recruitment of ADCRs, while ADCRs are required for histone acetylation.
Association of Amine-Receptor DNA Sequence Variants with Associative Learning in the Honeybee.

PubMed

Lagisz, Malgorzata; Mercer, Alison R; de Mouzon, Charlotte; Santos, Luana L S; Nakagawa, Shinichi

2016-03-01

Octopamine- and dopamine-based neuromodulatory systems play a critical role in learning and learning-related behaviour in insects. To further our understanding of these systems and resulting phenotypes, we quantified DNA sequence variations at six loci encoding octopamine and dopamine receptors and their association with aversive and appetitive learning traits in a population of honeybees. We identified 79 polymorphic sequence markers (mostly SNPs and a few insertions/deletions) located within or close to six candidate genes. Intriguingly, we found that levels of sequence variation in the protein-coding regions studied were low, indicating that sequence variation in the coding regions of receptor genes critical to learning and memory is strongly selected against. Non-coding and upstream regions of the same genes, however, were less conserved, and sequence variations in these regions were weakly associated with between-individual differences in learning-related traits. While these associations do not directly imply a specific molecular mechanism, they suggest that the cross-talk between dopamine and octopamine signalling pathways may influence olfactory learning and memory in the honeybee.
Simple Sequence Repeats in Escherichia coli: Abundance, Distribution, Composition, and Polymorphism

PubMed Central

Gur-Arie, Riva; Cohen, Cyril J.; Eitan, Yuval; Shelef, Leora; Hallerman, Eric M.; Kashi, Yechezkel

2000-01-01

Computer-based genome-wide screening of the DNA sequence of Escherichia coli strain K12 revealed tens of thousands of tandem simple sequence repeat (SSR) tracts, with motifs ranging from 1 to 6 nucleotides. SSRs were well distributed throughout the genome. Mononucleotide SSRs were over-represented in noncoding regions and under-represented in open reading frames (ORFs). Nucleotide composition of mono- and dinucleotide SSRs, both in ORFs and in noncoding regions, differed from that of the genomic region in which they occurred, with 93% of all mononucleotide SSRs proving to be of A or T. Computer-based analysis of the fine position of every SSR locus in the noncoding portion of the genome relative to downstream ORFs showed SSRs located in areas that could affect gene regulation. DNA sequences at 14 arbitrarily chosen SSR tracts were compared among E. coli strains. Polymorphisms of SSR copy number were observed at four of seven mononucleotide SSR tracts screened, with all polymorphisms occurring in noncoding regions. SSR polymorphism could prove important as a genome-wide source of variation, both for practical applications (including rapid detection, strain identification, and detection of loci affecting key phenotypes) and for evolutionary adaptation of microbes. [The sequence data described in this paper have been submitted to the GenBank data library under accession numbers AF209020–209030 and AF209508–209518.] PMID:10645951
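A genome-wide screen for SSR tracts with motifs of 1 to 6 nucleotides, as described above, can be sketched with regular expressions. The abstract does not give the authors' detection criteria, so the function name `find_ssrs` and the thresholds `min_copies` and `min_length` are assumptions chosen only for illustration, not the published method.

```python
import re


def find_ssrs(seq, max_motif=6, min_copies=3, min_length=6):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats.

    seq: DNA string; positions are 0-based.
    max_motif: longest motif considered (1 to 6 nt, as in the abstract).
    min_copies, min_length: hypothetical thresholds, not taken from the paper.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif + 1):
        # a k-nt motif followed by at least (min_copies - 1) further copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # skip motifs that are just a shorter unit repeated (e.g. "AA")
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            tract = m.group(0)
            if len(tract) >= min_length:
                hits.append((m.start(), motif, len(tract) // k))
    return sorted(hits)


if __name__ == "__main__":
    example = "GGC" + "A" * 8 + "GT" + "CA" * 4 + "CTTG"
    for start, motif, copies in find_ssrs(example):
        print(start, motif, copies)
```

Comparing the `copies` value reported at the same locus across strains is one simple way to look for the copy-number polymorphisms the abstract mentions; only perfect repeats are detected by this sketch.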
An Enhancer Near ISL1 and an Ultraconserved Exon of PCBP2 are Derived from a Retroposon

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bejerano, Gill; Lowe, Craig; Ahituv, Nadav

2005-11-27

Hundreds of highly conserved distal cis-regulatory elements have been characterized to date in vertebrate genomes [1]. Many thousands more are predicted based on comparative genomics [2,3]. Yet, in stark contrast to the genes they regulate, virtually none of these regions can be traced using sequence similarity in invertebrates, leaving their evolutionary origin obscure. Here we show that a class of conserved, primarily non-coding regions in tetrapods originated from a novel short interspersed repetitive element (SINE) retroposon family that was active in Sarcopterygii (lobe-finned fishes and terrestrial vertebrates) in the Silurian at least 410 Mya [4], and, remarkably, appears to be recently active in the "living fossil" Indonesian coelacanth, Latimeria menadoensis. We show that one copy is a distal enhancer, located 500 kb from the neuro-developmental gene ISL1. Several others represent new, possibly regulatory, alternatively spliced exons in the middle of pre-existing Sarcopterygian genes. One of these is the >200 bp ultraconserved region [5], 100 percent identical in mammals, and 80 percent identical to the coelacanth SINE, that contains a 31 aa alternatively spliced exon of the mRNA processing gene PCBP2 [6]. These add to a growing list of examples [7] in which relics of transposable elements have acquired a function that serves their host, a process termed "exaptation" [8], and provide an origin for at least some of the highly conserved vertebrate-specific genomic sequences recently discovered using comparative genomics.
Analysis and recognition of 5′ UTR intron splice sites in human pre-mRNA

PubMed Central

Eden, E.; Brunak, S.

2004-01-01

Prediction of splice sites in non-coding regions of genes is one of the most challenging aspects of gene structure recognition. We perform a rigorous analysis of such splice sites embedded in human 5′ untranslated regions (UTRs), and investigate correlations between this class of splice sites and other features found in the adjacent exons and introns. By restricting the training of neural network algorithms to ‘pure’ UTRs (not extending partially into protein coding regions), we for the first time investigate the predictive power of the splicing signal proper, in contrast to conventional splice site prediction, which typically relies on the change in sequence at the transition from protein coding to non-coding. By doing so, the algorithms were able to pick up subtler splicing signals that were otherwise masked by ‘coding’ noise, thus enhancing significantly the prediction of 5′ UTR splice sites. For example, the non-coding splice site predicting networks pick up compositional and positional bias in the 3′ ends of non-coding exons and 5′ non-coding intron ends, where cytosine and guanine are over-represented. This compositional bias at the true UTR donor sites is also visible in the synaptic weights of the neural networks trained to identify UTR donor sites. Conventional splice site prediction methods perform poorly in UTRs because the reading frame pattern is absent. The NetUTR method presented here performs 2–3-fold better compared with NetGene2 and GenScan in 5′ UTRs. We also tested the 5′ UTR trained method on protein coding regions, and discovered, surprisingly, that it works quite well (although it cannot compete with NetGene2). This indicates that the local splicing pattern in UTRs and coding regions is largely the same. The NetUTR method is made publicly available at www.cbs.dtu.dk/services/NetUTR. PMID:14960723
Long Noncoding RNA-Associated Transcriptomic Changes in Resiliency or Susceptibility to Depression and Response to Antidepressant Treatment

PubMed Central

Roy, Bhaskar; Wang, Qingzhong; Dwivedi, Yogesh

2018-01-01

Background: Recent emergence of long noncoding RNAs in regulating gene expression and thereby modulating physiological functions in brain has manifested their possible role in psychiatric disorders. In this study, the roles of long noncoding RNAs in susceptibility and resiliency to develop stress-induced depression and their response to antidepressant treatment were examined. Methods: Microarray-based transcriptome-wide changes in long noncoding RNAs were determined in hippocampus of male Holtzman rats who showed susceptibility (learned helplessness) or resiliency (nonlearned helplessness) to develop depression. Changes in long noncoding RNA expression were also ascertained after subchronic administration of fluoxetine to learned helplessness rats. Bioinformatic and target prediction analyses (cis- and trans-acting) and qPCR-based assays were performed to decipher the functional role of altered long noncoding RNAs. Results: Group-wise comparison showed an overrepresented class of long noncoding RNAs that were uniquely associated with nonlearned helplessness or learned helplessness behavior. Chromosomal mapping within the 5-kbp flank region of the top 20 dysregulated long noncoding RNAs in the learned helplessness group showed several target genes that were regulated through cis- or trans-actions, including Zbtb20 and Zfp385b from zinc finger binding protein family. Genomic context of differentially expressed long noncoding RNAs showed an overall blunted response in the learned helplessness group regardless of the long noncoding RNA classes analyzed. Gene ontology exhibited the functional clustering for anatomical structure development, cellular architecture modulation, protein metabolism, and cellular communications. Fluoxetine treatment reversed learned helplessness-induced changes in many long noncoding RNAs and target genes. Conclusions: The involvement of specific classes of long noncoding RNAs with distinctive roles in modulating target gene expression could confer the role of long noncoding RNAs in resiliency or susceptibility to develop depression with a reciprocal response to antidepressant treatment. PMID:29390069
Evolution of the unspliced transcriptome.

PubMed

Engelhardt, Jan; Stadler, Peter F

2015-08-20

Despite their abundance, unspliced EST data have received little attention as a source of information on non-coding RNAs. Very little is known, therefore, about the genomic distribution of unspliced non-coding transcripts and their relationship with the much better studied regularly spliced products. In particular, their evolution has remained virtually unstudied. We systematically study the evidence on unspliced transcripts available in EST annotation tracks for human and mouse, comprising 104,980 and 66,109 unspliced EST clusters, respectively. Roughly one third of these are located totally inside introns of known genes (TINs) and another third overlap exonic regions (PINs). Eleven percent are "intergenic", far away from any annotated gene. Direct evidence for the independent transcription of many PINs and TINs is obtained from CAGE tag and chromatin data. We predict more than 2000 3'UTR-associated RNA candidates each for human and mouse. Fifteen to twenty percent of the unspliced EST clusters are conserved between human and mouse. With the exception of TINs, the sequences of unspliced EST clusters evolve significantly slower than genomic background. Furthermore, like spliced lincRNAs, they show highly tissue-specific expression patterns. Unspliced long non-coding RNAs are an important, rapidly evolving, component of mammalian transcriptomes. Their analysis is complicated by their preferential association with complex transcribed loci that usually also harbor a plethora of spliced transcripts. Unspliced EST data, although typically disregarded in transcriptome analysis, can be used to gain insights into this rarely investigated transcriptome component. The frequently postulated connection between lack of splicing and nuclear retention and the surprising overlap of chromatin-associated transcripts suggests that this class of transcripts might be involved in chromatin organization and possibly other mechanisms of epigenetic control.
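The TIN/PIN categories described in this abstract amount to an interval comparison against a gene annotation. The following is a minimal sketch of that idea, not the authors' pipeline: the function names, the half-open coordinate convention, and the catch-all "other" label are assumptions made for illustration, and strand and chromosome handling are left out. The abstract does not say how far from a gene a cluster must lie to count as "intergenic", so that class is not modelled.

```python
from typing import List, Tuple

# Half-open [start, end) coordinates on a single chromosome (assumed convention).
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def contained(inner: Interval, outer: Interval) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def classify_cluster(cluster: Interval,
                     exons: List[Interval],
                     introns: List[Interval]) -> str:
    """Assign a hypothetical label to one unspliced EST cluster.

    "PIN"   : the cluster overlaps at least one annotated exon
    "TIN"   : the cluster lies totally inside an annotated intron
    "other" : neither of the above (illustrative catch-all)
    """
    if any(overlaps(cluster, exon) for exon in exons):
        return "PIN"
    if any(contained(cluster, intron) for intron in introns):
        return "TIN"
    return "other"
```

For a toy gene with exons at 100-200 and 500-600 and one intron at 200-500, a cluster at 250-300 would be labelled TIN and a cluster at 180-260 would be labelled PIN.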
Variability among the Most Rapidly Evolving Plastid Genomic Regions is Lineage-Specific: Implications of Pairwise Genome Comparisons in Pyrus (Rosaceae) and Other Angiosperms for Marker Choice

PubMed Central

Ter-Voskanyan, Hasmik; Allgaier, Martin; Borsch, Thomas

2014-01-01

Plastid genomes exhibit different levels of variability in their sequences, depending on the respective kinds of genomic regions. Genes are usually more conserved while noncoding introns and spacers evolve at a faster pace. While a set of about thirty maximum variable noncoding genomic regions has been suggested to provide universally promising phylogenetic markers throughout angiosperms, applications often require several regions to be sequenced for many individuals. Our project aims to illuminate evolutionary relationships and species-limits in the genus Pyrus (Rosaceae)—a typical case with very low genetic distances between taxa. In this study, we have sequenced the plastid genome of Pyrus spinosa and aligned it to the already available P. pyrifolia sequence. The overall p-distance of the two Pyrus genomes was 0.00145. The intergenic spacers between ndhC–trnV, trnR–atpA, ndhF–rpl32, psbM–trnD, and trnQ–rps16 were the most variable regions, also comprising the highest total numbers of substitutions, indels and inversions (potentially informative characters). Our comparative analysis of further plastid genome pairs with similar low p-distances from Oenothera (representing another rosid), Olea (asterids) and Cymbidium (monocots) showed in each case a different ranking of genomic regions in terms of variability and potentially informative characters. Only two intergenic spacers (ndhF–rpl32 and trnK–rps16) were consistently found among the 30 top-ranked regions. We have mapped the occurrence of substitutions and microstructural mutations in the four genome pairs. High AT content in specific sequence elements seems to foster frequent mutations. We conclude that the variability among the fastest evolving plastid genomic regions is lineage-specific and thus cannot be precisely predicted across angiosperms. The often lineage-specific occurrence of stem-loop elements in the sequences of introns and spacers also governs lineage-specific mutations. 
Sequencing whole plastid genomes to find markers for evolutionary analyses is therefore particularly useful when overall genetic distances are low. PMID:25405773
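The p-distance quoted in this abstract (0.00145) is the proportion of aligned sites at which two sequences differ. A minimal sketch of the calculation follows; the function name is illustrative, and skipping columns that contain gaps or ambiguity codes is an assumption, since the abstract does not state how such columns were treated.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap or a non-ACGT symbol are
    skipped (an assumption; other conventions exist).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    valid = set("ACGT")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in valid or b not in valid:
            continue  # gap or ambiguous base: not counted
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared
```

Under this definition, a p-distance of 0.00145 corresponds to roughly 1.45 substitutions per 1,000 compared sites.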
Positive selection in the SLC11A1 gene in the family Equidae.

PubMed

Bayerova, Zuzana; Janova, Eva; Matiasovic, Jan; Orlando, Ludovic; Horin, Petr

2016-05-01

Immunity-related genes are a suitable model for studying effects of selection at the genomic level. Some of them are highly conserved due to functional constraints and purifying selection, while others are variable and change quickly to cope with the variation of pathogens. The SLC11A1 gene encodes a transporter protein mediating antimicrobial activity of macrophages. Little is known about the patterns of selection shaping this gene during evolution. Although it is a typical evolutionarily conserved gene, functionally important polymorphisms associated with various diseases were identified in humans and other species. We analyzed the genomic organization, genetic variation, and evolution of the SLC11A1 gene in the family Equidae to identify patterns of selection within this important gene. Nucleotide SLC11A1 sequences were shown to be highly conserved in ten equid species, with more than 97% sequence identity across the family. Single nucleotide polymorphisms (SNPs) were found in the coding and noncoding regions of the gene. Seven codon sites were identified to be under strong purifying selection. Codons located in three regions, including the glycosylated extracellular loop, were shown to be under diversifying selection. A 3-bp indel resulting in a deletion of the amino acid 321 in the predicted protein was observed in all horses, while it has been maintained in all other equid species. This codon, which is part of an N-glycosylation site, was found to be under positive selection. Interspecific variation in the presence of predicted N-glycosylation sites was observed.
Effects of GWAS-Associated Genetic Variants on lncRNAs within IBD and T1D Candidate Loci

PubMed Central

Brorsson, Caroline A.; Pociot, Flemming

2014-01-01

Long non-coding RNAs are a new class of non-coding RNAs that are at the crosshairs in many human diseases such as cancers, cardiovascular disorders, and inflammatory and autoimmune diseases such as Inflammatory Bowel Disease (IBD) and Type 1 Diabetes (T1D). Nearly 90% of the phenotype-associated single-nucleotide polymorphisms (SNPs) identified by genome-wide association studies (GWAS) lie outside of the protein coding regions, and map to the non-coding intervals. However, the relationship between phenotype-associated loci and the non-coding regions including the long non-coding RNAs (lncRNAs) is poorly understood. Here, we systematically identified all annotated IBD and T1D loci-associated lncRNAs, and mapped nominally significant GWAS/ImmunoChip SNPs for IBD and T1D within these lncRNAs. Additionally, we identified tissue-specific cis-eQTLs, and strong linkage disequilibrium (LD) signals associated with these SNPs. We explored sequence and structure based attributes of these lncRNAs, and also predicted the structural effects of mapped SNPs within them. We also identified lncRNAs in IBD and T1D that are under recent positive selection. Our analysis identified putative lncRNA secondary structure-disruptive SNPs within and in close proximity (+/−5 kb flanking regions) of IBD and T1D loci-associated candidate genes, suggesting that these RNA conformation-altering polymorphisms might be associated with the disease phenotype. Disruption of lncRNA secondary structure due to presence of GWAS SNPs provides valuable information that could be potentially useful for future structure-function studies on lncRNAs. PMID:25144376
Dynamic interplay and function of multiple noncoding genes governing X chromosome inactivation

PubMed Central

Yue, Minghui; Richard, John Lalith Charles

2015-01-01

There is increasing evidence for the emergence of long noncoding RNAs (lncRNAs) as important components, especially in the regulation of gene expression. In the event of X chromosome inactivation, robust epigenetic marks are established in a long noncoding Xist RNA-dependent manner, giving rise to a distinct epigenetic landscape on the inactive X chromosome (Xi). The X inactivation center (Xic) is essential for induction of X chromosome inactivation and harbors two topologically associated domains (TADs) to regulate monoallelic Xist expression: one at the noncoding Xist gene and its upstream region, and the other at the antisense Tsix and its upstream region. The monoallelic expression of Xist is tightly regulated by these two functionally distinct TADs as well as their constituting lncRNAs and proteins. In this review, we summarize recent updates in our knowledge of lncRNAs found at the Xic and discuss their overall mechanisms of action. We also discuss our current understanding of the molecular mechanism behind Xist RNA-mediated induction of the repressive epigenetic landscape at the Xi. PMID:26260844
A genome-wide survey of maternal and embryonic transcripts during Xenopus tropicalis development.

PubMed

Paranjpe, Sarita S; Jacobi, Ulrike G; van Heeringen, Simon J; Veenstra, Gert Jan C

2013-11-06

Dynamics of polyadenylation vs. deadenylation determine the fate of several developmentally regulated genes. Decay of a subset of maternal mRNAs and new transcription define the maternal-to-zygotic transition, but the full complement of polyadenylated and deadenylated coding and non-coding transcripts has not yet been assessed in Xenopus embryos. To analyze the dynamics and diversity of coding and non-coding transcripts during development, both polyadenylated mRNA and ribosomal RNA-depleted total RNA were harvested across six developmental stages and subjected to high throughput sequencing. The maternally loaded transcriptome is highly diverse and consists of both polyadenylated and deadenylated transcripts. Many maternal genes show peak expression in the oocyte and include genes which are known to be the key regulators of events like oocyte maturation and fertilization. Of all the transcripts that increase in abundance between early blastula and larval stages, about 30% of the embryonic genes are induced by fourfold or more by the late blastula stage and another 35% by late gastrulation. Using a gene model validation and discovery pipeline, we identified novel transcripts and putative long non-coding RNAs (lncRNA). These lncRNA transcripts were stringently selected as spliced transcripts generated from independent promoters, with limited coding potential and a codon bias characteristic of noncoding sequences. Many lncRNAs are conserved and expressed in a developmental stage-specific fashion. These data reveal dynamics of transcriptome polyadenylation and abundance and provide a high-confidence catalogue of novel and long non-coding RNAs.
Variations in the non-coding transcriptome as a driver of inter-strain divergence and physiological adaptation in bacteria

PubMed Central

Kopf, Matthias; Klähn, Stephan; Scholz, Ingeborg; Hess, Wolfgang R.; Voß, Björn

2015-01-01

In all studied organisms, a substantial portion of the transcriptome consists of non-coding RNAs that frequently execute regulatory functions. Here, we have compared the primary transcriptomes of the cyanobacteria Synechocystis sp. PCC 6714 and PCC 6803 under 10 different conditions. These strains share 2854 protein-coding genes and a 16S rRNA identity of 99.4%, indicating their close relatedness. Conserved major transcriptional start sites (TSSs) give rise to non-coding transcripts within the sigB gene, from the 5′UTRs of cmpA and isiA, and 168 loci in antisense orientation. Distinct differences include single nucleotide polymorphisms rendering promoters inactive in one of the strains, e.g., for cmpR and for the asRNA PsbA2R. Based on the genome-wide mapped location, regulation and classification of TSSs, non-coding transcripts were identified as the most dynamic component of the transcriptome. We identified a class of mRNAs that originate by read-through from an sRNA that accumulates as a discrete and abundant transcript while also serving as the 5′UTR. Such an sRNA/mRNA structure, which we name ‘actuaton’, represents another way for bacteria to remodel their transcriptional network. Our findings support the hypothesis that variations in the non-coding transcriptome constitute a major evolutionary element of inter-strain divergence and capability for physiological adaptation. PMID:25902393

Hundreds of conserved non-coding genomic regions are independently lost in mammals

PubMed Central

Hiller, Michael; Schaar, Bruce T.; Bejerano, Gill

2012-01-01

Conserved non-protein-coding DNA elements (CNEs) often encode cis-regulatory elements and are rarely lost during evolution. However, CNE losses that do occur can be associated with phenotypic changes, exemplified by pelvic spine loss in sticklebacks. Using a computational strategy to detect complete loss of CNEs in mammalian genomes while strictly controlling for artifacts, we find >600 CNEs that are independently lost in at least two mammalian lineages, including a spinal cord enhancer near GDF11. We observed several genomic regions where multiple independent CNE loss events happened; the most extreme is the DIAPH2 locus. We show that CNE losses often involve deletions and that CNE loss frequencies are non-uniform. Similar to less pleiotropic enhancers, we find that independently lost CNEs are shorter, slightly less constrained and evolutionarily younger than CNEs without detected losses. This suggests that independently lost CNEs are less pleiotropic and that pleiotropic constraints contribute to non-uniform CNE loss frequencies. We also detected 35 CNEs that are independently lost in the human lineage and in other mammals. Our study uncovers an interesting aspect of the evolution of functional DNA in mammalian genomes. Experiments are necessary to test if these independently lost CNEs are associated with parallel phenotype changes in mammals. PMID:23042682
The sequence of camelpox virus shows it is most closely related to variola virus, the cause of smallpox.

PubMed

Gubser, Caroline; Smith, Geoffrey L

2002-04-01

Camelpox virus (CMPV) and variola virus (VAR) are orthopoxviruses (OPVs) that share several biological features and cause high mortality and morbidity in their single host species. The sequence of a virulent CMPV strain was determined; it is 202182 bp long, with inverted terminal repeats (ITRs) of 6045 bp and has 206 predicted open reading frames (ORFs). As for other poxviruses, the genes are tightly packed with little non-coding sequence. Most genes within 25 kb of each terminus are transcribed outwards towards the terminus, whereas genes within the centre of the genome are transcribed from either DNA strand. The central region of the genome contains genes that are highly conserved in other OPVs and 87 of these are conserved in all sequenced chordopoxviruses. In contrast, genes towards either terminus are more variable and encode proteins involved in host range, virulence or immunomodulation. In some cases, these are broken versions of genes found in other OPVs. The relationship of CMPV to other OPVs was analysed by comparisons of DNA and predicted protein sequences, repeats within the ITRs and arrangement of ORFs within the terminal regions. Each comparison gave the same conclusion: CMPV is the closest known virus to variola virus, the cause of smallpox.
The Spot 42 RNA: A regulatory small RNA with roles in the central metabolism

PubMed Central

Bækkedal, Cecilie; Haugen, Peik

2015-01-01

The Spot 42 RNA is a 109-nucleotide-long (in Escherichia coli) noncoding small regulatory RNA (sRNA) encoded by the spf (spot forty-two) gene. spf is found in gamma-proteobacteria, and the majority of experimental work on Spot 42 RNA has been performed using E. coli and, recently, Aliivibrio salmonicida. In the cell, Spot 42 RNA plays essential roles as a regulator in carbohydrate metabolism and uptake; its expression is activated by glucose and inhibited by the cAMP-CRP complex. Here we summarize the current knowledge on Spot 42, present the natural distribution of spf, show family-specific secondary structural features of Spot 42, and link highly conserved structural regions to mRNA target binding. PMID:26327359
Comparative genomics reveals insights into avian genome evolution and adaptation.

PubMed

Zhang, Guojie; Li, Cai; Li, Qiye; Li, Bo; Larkin, Denis M; Lee, Chul; Storz, Jay F; Antunes, Agostinho; Greenwold, Matthew J; Meredith, Robert W; Ödeen, Anders; Cui, Jie; Zhou, Qi; Xu, Luohao; Pan, Hailin; Wang, Zongji; Jin, Lijun; Zhang, Pei; Hu, Haofu; Yang, Wei; Hu, Jiang; Xiao, Jin; Yang, Zhikai; Liu, Yang; Xie, Qiaolin; Yu, Hao; Lian, Jinmin; Wen, Ping; Zhang, Fang; Li, Hui; Zeng, Yongli; Xiong, Zijun; Liu, Shiping; Zhou, Long; Huang, Zhiyong; An, Na; Wang, Jie; Zheng, Qiumei; Xiong, Yingqi; Wang, Guangbiao; Wang, Bo; Wang, Jingjing; Fan, Yu; da Fonseca, Rute R; Alfaro-Núñez, Alonzo; Schubert, Mikkel; Orlando, Ludovic; Mourier, Tobias; Howard, Jason T; Ganapathy, Ganeshkumar; Pfenning, Andreas; Whitney, Osceola; Rivas, Miriam V; Hara, Erina; Smith, Julia; Farré, Marta; Narayan, Jitendra; Slavov, Gancho; Romanov, Michael N; Borges, Rui; Machado, João Paulo; Khan, Imran; Springer, Mark S; Gatesy, John; Hoffmann, Federico G; Opazo, Juan C; Håstad, Olle; Sawyer, Roger H; Kim, Heebal; Kim, Kyu-Won; Kim, Hyeon Jeong; Cho, Seoae; Li, Ning; Huang, Yinhua; Bruford, Michael W; Zhan, Xiangjiang; Dixon, Andrew; Bertelsen, Mads F; Derryberry, Elizabeth; Warren, Wesley; Wilson, Richard K; Li, Shengbin; Ray, David A; Green, Richard E; O'Brien, Stephen J; Griffin, Darren; Johnson, Warren E; Haussler, David; Ryder, Oliver A; Willerslev, Eske; Graves, Gary R; Alström, Per; Fjeldså, Jon; Mindell, David P; Edwards, Scott V; Braun, Edward L; Rahbek, Carsten; Burt, David W; Houde, Peter; Zhang, Yong; Yang, Huanming; Wang, Jian; Jarvis, Erich D; Gilbert, M Thomas P; Wang, Jun

2014-12-12

Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, which predominantly arose because of lineage-specific erosion of repetitive elements, large segmental deletions, and gene loss. Avian genomes furthermore show a remarkably high degree of evolutionary stasis at the levels of nucleotide sequence, gene synteny, and chromosomal structure. Despite this pattern of conservation, we detected many non-neutral evolutionary changes in protein-coding genes and noncoding regions. These analyses reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits. Copyright © 2014, American Association for the Advancement of Science.
Complete sequence of two tick-borne flaviviruses isolated from Siberia and the UK: analysis and significance of the 5' and 3'-UTRs.

PubMed

Gritsun, T S; Venugopal, K; Zanotto, P M; Mikhailov, M V; Sall, A A; Holmes, E C; Polkinghorne, I; Frolova, T V; Pogodina, V V; Lashkevich, V A; Gould, E A

1997-05-01

The complete nucleotide sequences of two tick-transmitted flaviviruses, Vasilchenko (Vs) from Siberia and louping ill (LI) from the UK, have been determined. The genomes were 10928 and 10871 nucleotides (nt) in length, respectively. The coding strategy and functional protein sequence motifs of tick-borne flaviviruses are presented in both Vs and LI viruses. The phylogenies based on maximum likelihood, maximum parsimony and distance analysis of the polyproteins identified Vs virus as a member of the tick-borne encephalitis virus subgroup within the tick-borne serocomplex, genus Flavivirus, family Flaviviridae. Comparative alignment of the 3'-untranslated regions revealed deletions of different lengths essentially at the same position downstream of the stop codon for all tick-borne viruses. Two direct 27 nucleotide repeats at the 3'-end were found only for Vs and LI virus. Immediately following the deletions, a region of 332-334 nt with relatively conserved primary structure (67-94% identity) was observed at the 3'-non-coding end of the virus genome. Pairwise comparisons of the nucleotide sequence data revealed similar levels of variation between the coding region and the 5' and 3'-termini of the genome, implying an equivalent strong selective control for translated and untranslated regions. Indeed, the predicted folding of the 5' and 3'-untranslated regions revealed patterns of stem and loop structures conserved for all tick-borne flaviviruses, suggesting a purifying selection for preservation of essential RNA secondary structures which could be involved in translational control and replication. The possible implications of these findings are discussed.
Divergent genome evolution caused by regional variation in DNA gain and loss between human and mouse

PubMed Central

Kortschak, R. Daniel

2018-01-01

The forces driving the accumulation and removal of non-coding DNA and ultimately the evolution of genome size in complex organisms are intimately linked to genome structure and organisation. Our analysis provides a novel method for capturing the regional variation of lineage-specific DNA gain and loss events in their respective genomic contexts. To further understand this connection we used comparative genomics to identify genome-wide individual DNA gain and loss events in the human and mouse genomes. Focusing on the distribution of DNA gains and losses, relationships to important structural features and potential impact on biological processes, we found that in autosomes, DNA gains and losses both followed separate lineage-specific accumulation patterns. However, in both species chromosome X was particularly enriched for DNA gain, consistent with its high L1 retrotransposon content required for X inactivation. We found that DNA loss was associated with gene-rich open chromatin regions and DNA gain events with gene-poor closed chromatin regions. Additionally, we found that DNA loss events tended to be smaller than DNA gain events suggesting that they were able to accumulate in gene-rich open chromatin regions due to their reduced capacity to interrupt gene regulatory architecture. GO term enrichment showed that mouse loss hotspots were strongly enriched for terms related to developmental processes. However, these genes were also located in regions with a high density of conserved elements, suggesting that despite high levels of DNA loss, gene regulatory architecture remained conserved. This is consistent with a model in which DNA gain and loss results in turnover or “churning” in regulatory element dense regions of open chromatin, where interruption of regulatory elements is selected against. PMID:29677183
Long-range comparison of human and mouse Sprr loci to identify conserved noncoding sequences involved in coordinate regulation

PubMed Central

Martin, Natalia; Patel, Satyakam; Segre, Julia A.

2004-01-01

Mammalian epidermis provides a permeability barrier between an organism and its environment. Under homeostatic conditions, epidermal cells produce structural proteins, which are cross-linked in an orderly fashion to form a cornified envelope (CE). However, under genetic or environmental stress, specific genes are induced to rapidly build a temporary barrier. Small proline-rich (SPRR) proteins are the primary constituents of the CE. Under stress the entire family of 14 Sprr genes is upregulated. The Sprr genes are clustered within the larger epidermal differentiation complex on mouse chromosome 3, human chromosome 1q21. The clustering of the Sprr genes and their upregulation under stress suggest that these genes may be coordinately regulated. To identify enhancer elements that regulate this stress response activation of the Sprr locus, we utilized bioinformatic tools and classical biochemical dissection. Long-range comparative sequence analysis identified conserved noncoding sequences (CNSs). Clusters of epidermal-specific DNaseI-hypersensitive sites (HSs) mapped to specific CNSs. Increased prevalence of these HSs in barrier-deficient epidermis provides in vivo evidence of the regulation of the Sprr locus by these conserved sequences. Individual components of these HSs were cloned, and one was shown to have strong enhancer activity specific to conditions when the Sprr genes are coordinately upregulated. PMID:15574822
Antisense Transcription Is Pervasive but Rarely Conserved in Enteric Bacteria

PubMed Central

Raghavan, Rahul; Sloan, Daniel B.; Ochman, Howard

2012-01-01

Noncoding RNAs, including antisense RNAs (asRNAs) that originate from the complementary strand of protein-coding genes, are involved in the regulation of gene expression in all domains of life. Recent application of deep-sequencing technologies has revealed that the transcription of asRNAs occurs genome-wide in bacteria. Although the role of the vast majority of asRNAs remains unknown, it is often assumed that their presence implies important regulatory functions, similar to those of other noncoding RNAs. Alternatively, many antisense transcripts may be produced by chance transcription events from promoter-like sequences that result from the degenerate nature of bacterial transcription factor binding sites. To investigate the biological relevance of antisense transcripts, we compared genome-wide patterns of asRNA expression in closely related enteric bacteria, Escherichia coli and Salmonella enterica serovar Typhimurium, by performing strand-specific transcriptome sequencing. Although antisense transcripts are abundant in both species, less than 3% of asRNAs are expressed at high levels in both species, and only about 14% appear to be conserved among species. And unlike the promoters of protein-coding genes, asRNA promoters show no evidence of sequence conservation between, or even within, species. Our findings suggest that many or even most bacterial asRNAs are nonadaptive by-products of the cell’s transcription machinery. PMID:22872780
Emergence and Evolution of Hominidae-Specific Coding and Noncoding Genomic Sequences.

PubMed

Saber, Morteza Mahmoudi; Adeyemi Babarinde, Isaac; Hettiarachchi, Nilmini; Saitou, Naruya

2016-07-12

Family Hominidae, which includes humans and great apes, is recognized for unique complex social behavior and intellectual abilities. Despite the increasing genome data, however, the genomic origin of its phenotypic uniqueness has remained elusive. Clade-specific genes and highly conserved noncoding sequences (HCNSs) are among the high-potential evolutionary candidates involved in driving clade-specific characters and phenotypes. On this premise, we analyzed whole genome sequences along with gene orthology data retrieved from major DNA databases to find Hominidae-specific (HS) genes and HCNSs. We discovered that Down syndrome critical region 4 (DSCR4) is the only experimentally verified gene uniquely present in Hominidae. DSCR4 has no structural homology to any known protein and was inferred to have emerged in several steps through LTR/ERV1, LTR/ERVL retrotransposition, and transversion. Using the genomic distance as neutral evolution threshold, we identified 1,658 HS HCNSs. Polymorphism coverage and derived allele frequency analysis of HS HCNSs showed that these HCNSs are under purifying selection, indicating that they may harbor important functions. They are overrepresented in promoters/untranslated regions, in close proximity of genes involved in sensory perception of sound and developmental process, and also showed a significantly lower nucleosome occupancy probability. Interestingly, many ancestral sequences of the HS HCNSs showed very high evolutionary rates. This suggests that new functions emerged through some kind of positive selection, and then purifying selection started to operate to keep these functions. © The Author(s) 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Genome-Wide Discovery of Long Non-Coding RNAs in Rainbow Trout.

PubMed

Al-Tobasei, Rafet; Paneru, Bam; Salem, Mohamed

2016-01-01

The ENCODE project revealed that ~70% of the human genome is transcribed. While only 1-2% of the RNAs encode proteins, the rest are non-coding RNAs. Long non-coding RNAs (lncRNAs) form a diverse class of non-coding RNAs that are longer than 200 nt. Emerging evidence indicates that lncRNAs play critical roles in various cellular processes including regulation of gene expression. LncRNAs show low levels of gene expression and sequence conservation, which make their computational identification in genomes difficult. In this study, more than two billion Illumina sequence reads were mapped to the genome reference using the TopHat and Cufflinks software. Transcripts shorter than 200 nt, with ORFs longer than 83-100 amino acids, or with significant homologies to the NCBI nr-protein database were removed. In addition, a computational pipeline was used to filter the remaining transcripts based on a protein-coding-score test. Depending on the filtering stringency conditions, between 31,195 and 54,503 lncRNAs were identified, with only 421 matching known lncRNAs in other species. A digital gene expression atlas revealed 2,935 tissue-specific and 3,269 ubiquitously-expressed lncRNAs. This study annotates lncRNAs in the rainbow trout genome and provides a valuable resource for functional genomics research in salmonids.
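The first two filters named in this abstract (minimum transcript length and maximum ORF length) can be sketched as below. This is an illustrative approximation rather than the authors' pipeline: the function names are hypothetical, only the three forward reading frames are scanned, ORFs are taken as ATG-to-stop with no open-ended ORFs counted, and the default `max_orf_aa=100` is simply one end of the 83-100 amino acid range given in the abstract. The homology search and protein-coding-score steps are not modelled.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_aa(seq: str) -> int:
    """Length (amino acids) of the longest ATG-to-stop ORF in the forward frames."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None  # position of the ATG opening the current ORF, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Count codons from the ATG up to, but excluding, the stop codon.
                best = max(best, (i - start) // 3)
                start = None
    return best


def is_candidate_lncrna(seq: str,
                        min_length_nt: int = 200,
                        max_orf_aa: int = 100) -> bool:
    """Keep transcripts that are long enough and lack a long ORF (assumed thresholds)."""
    return len(seq) >= min_length_nt and longest_orf_aa(seq) <= max_orf_aa
```

Lowering `max_orf_aa` toward 83 makes the filter stricter, which mirrors how the abstract's range of stringency settings yields different lncRNA counts.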
Genomic assessment of the evolution of the prion protein gene family in vertebrates.

PubMed

Harrison, Paul M; Khachane, Amit; Kumar, Manish

2010-05-01

Prion diseases are devastating neurological disorders caused by the propagation of particles containing an alternative beta-sheet-rich form of the prion protein (PrP). Genes paralogous to PrP, called Doppel and Shadoo, have been identified that also have neuropathological relevance. To aid in the further functional characterization of PrP and its relatives, we annotated completely the PrP gene family (PrP-GF) in the genomes of 42 vertebrates, through combined strategic application of gene prediction programs and advanced remote homology detection techniques (such as HMMs, PSI-TBLASTN and pGenThreader). We have uncovered several previously undescribed paralogous genes and pseudogenes. We find that current high-quality genomic evidence indicates that the PrP relative Doppel was likely present in the last common ancestor of present-day Tetrapoda, but was lost in the bird lineage since its divergence from reptiles. Using the new gene annotations, we have defined the consensus of structural features that are characteristic of the PrP and Doppel structures across diverse Tetrapoda clades. Furthermore, we describe in detail a transcribed pseudogene derived from Shadoo that is conserved across primates and that overlaps the meiosis gene SYCE1, thus possibly regulating its expression. In addition, we analysed the locus of PRNP/PRND for significant conservation across the genomic DNA of eleven mammals, and determined the phylogenetic penetration of non-coding exons. The genomic evidence indicates that the second PRNP non-coding exon found in even-toed ungulates and rodents is conserved in all high-coverage genome assemblies of primates (human, chimp, orangutan and macaque), and is, at least, likely to have fallen out of use during primate speciation. Furthermore, we have demonstrated that the PRNT gene (at the PRNP human locus) is conserved across at least sixteen mammals, and evolves like a long non-coding RNA, fashioned from fragments of ancient, long, interspersed elements. These annotations and evolutionary analyses will be of further use for functional characterisation of the PrP-GF, and will be updatable in a semi-automated fashion as more genomes accumulate. Copyright 2010 Elsevier Inc. All rights reserved.
Leveraging cross-species transcription factor binding site patterns: from diabetes risk loci to disease mechanisms.

PubMed

Claussnitzer, Melina; Dankel, Simon N; Klocke, Bernward; Grallert, Harald; Glunk, Viktoria; Berulava, Tea; Lee, Heekyoung; Oskolkov, Nikolay; Fadista, Joao; Ehlers, Kerstin; Wahl, Simone; Hoffmann, Christoph; Qian, Kun; Rönn, Tina; Riess, Helene; Müller-Nurasyid, Martina; Bretschneider, Nancy; Schroeder, Timm; Skurk, Thomas; Horsthemke, Bernhard; Spieler, Derek; Klingenspor, Martin; Seifert, Martin; Kern, Michael J; Mejhert, Niklas; Dahlman, Ingrid; Hansson, Ola; Hauck, Stefanie M; Blüher, Matthias; Arner, Peter; Groop, Leif; Illig, Thomas; Suhre, Karsten; Hsu, Yi-Hsiang; Mellgren, Gunnar; Hauner, Hans; Laumen, Helmut

2014-01-16

Genome-wide association studies have revealed numerous risk loci associated with diverse diseases. However, identification of disease-causing variants within association loci remains a major challenge. Divergence in gene expression due to cis-regulatory variants in noncoding regions is central to disease susceptibility. We show that integrative computational analysis of phylogenetic conservation with a complexity assessment of co-occurring transcription factor binding sites (TFBS) can identify cis-regulatory variants and elucidate their mechanistic role in disease. Analysis of established type 2 diabetes risk loci revealed a striking clustering of distinct homeobox TFBS. We identified the PRRX1 homeobox factor as a repressor of PPARG2 expression in adipose cells and demonstrate its adverse effect on lipid metabolism and systemic insulin sensitivity, dependent on the rs4684847 risk allele that triggers PRRX1 binding. Thus, cross-species conservation analysis at the level of co-occurring TFBS provides a valuable contribution to the translation of genetic association signals to disease-related molecular mechanisms. Copyright © 2014 Elsevier Inc. All rights reserved.
Regulatory elements of Caenorhabditis elegans ribosomal protein genes

PubMed Central

2012-01-01

Background: Ribosomal protein genes (RPGs) are essential, tightly regulated, and highly expressed during embryonic development and cell growth. Even though their protein sequences are strongly conserved, their mechanism of regulation is not conserved across yeast, Drosophila, and vertebrates. A recent investigation of genomic sequences conserved across both nematode species and associated with different gene groups indicated the existence of several elements in the upstream regions of C. elegans RPGs, providing a new insight regarding the regulation of these genes in C. elegans. Results: In this study, we performed an in-depth examination of C. elegans RPG regulation and found nine highly conserved motifs in the upstream regions of C. elegans RPGs using the motif discovery algorithm DME. Four motifs were partially similar to transcription factor binding sites from C. elegans, Drosophila, yeast, and human. One pair of these motifs was found to co-occur in the upstream regions of 250 transcripts including 22 RPGs. The distance between the two motifs displayed a complex frequency pattern that was related to their relative orientation. We tested the impact of three of these motifs on the expression of rpl-2 using a series of reporter gene constructs and showed that all three motifs are necessary to maintain the high natural expression level of this gene. One of the motifs was similar to the binding site of an orthologue of POP-1, and we showed that RNAi knockdown of pop-1 impacts the expression of rpl-2. We further determined the transcription start site of rpl-2 by 5’ RACE and found that the motifs lie 40–90 bases upstream of the start site. We also found evidence that a noncoding RNA, contained within the outron of rpl-2, is co-transcribed with rpl-2 and cleaved during trans-splicing. Conclusions: Our results indicate that C. elegans RPGs are regulated by a complex novel series of regulatory elements that is evolutionarily distinct from those of all other species examined up until now. PMID:22928635
Restless legs syndrome-associated intronic common variant in Meis1 alters enhancer function in the developing telencephalon.

PubMed

Spieler, Derek; Kaffe, Maria; Knauf, Franziska; Bessa, José; Tena, Juan J; Giesert, Florian; Schormair, Barbara; Tilch, Erik; Lee, Heekyoung; Horsch, Marion; Czamara, Darina; Karbalai, Nazanin; von Toerne, Christine; Waldenberger, Melanie; Gieger, Christian; Lichtner, Peter; Claussnitzer, Melina; Naumann, Ronald; Müller-Myhsok, Bertram; Torres, Miguel; Garrett, Lillian; Rozman, Jan; Klingenspor, Martin; Gailus-Durner, Valérie; Fuchs, Helmut; Hrabě de Angelis, Martin; Beckers, Johannes; Hölter, Sabine M; Meitinger, Thomas; Hauck, Stefanie M; Laumen, Helmut; Wurst, Wolfgang; Casares, Fernando; Gómez-Skarmeta, Jose Luis; Winkelmann, Juliane

2014-04-01

Genome-wide association studies (GWAS) identified the MEIS1 locus for Restless Legs Syndrome (RLS), but causal single nucleotide polymorphisms (SNPs) and their functional relevance remain unknown. This locus contains a large number of highly conserved noncoding regions (HCNRs) potentially functioning as cis-regulatory modules. We analyzed these HCNRs for allele-dependent enhancer activity in zebrafish and mice and found that the risk allele of the lead SNP rs12469063 reduces enhancer activity in the Meis1 expression domain of the murine embryonic ganglionic eminences (GE). CREB1 binds this enhancer and rs12469063 affects its binding in vitro. In addition, MEIS1 target genes suggest a role in the specification of neuronal progenitors in the GE, and heterozygous Meis1-deficient mice exhibit hyperactivity, resembling the RLS phenotype. Thus, in vivo and in vitro analysis of a common SNP with small effect size showed allele-dependent function in the prospective basal ganglia representing the first neurodevelopmental region implicated in RLS.
Inactivation of the first nucleotide-binding fold of the sulfonylurea receptor, and familial persistent hyperinsulinemic hypoglycemia of infancy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Thomas, P.M.; Wohllk, N.; Huang, E.

1996-09-01

Familial persistent hyperinsulinemic hypoglycemia of infancy is a disorder of glucose homeostasis and is characterized by unregulated insulin secretion and profound hypoglycemia. Loss-of-function mutations in the second nucleotide-binding fold of the sulfonylurea receptor, a subunit of the pancreatic-islet β-cell ATP-dependent potassium channel, have been demonstrated to be causative for persistent hyperinsulinemic hypoglycemia of infancy. We now describe three additional mutations in the first nucleotide-binding fold of the sulfonylurea-receptor gene. One point mutation disrupts the highly conserved Walker A motif of the first nucleotide-binding-fold region. The other two mutations occur in noncoding sequences required for RNA processing and are predicted to disrupt the normal splicing pathway of the sulfonylurea-receptor mRNA precursor. These data suggest that both nucleotide-binding-fold regions of the sulfonylurea receptor are required for normal regulation of β-cell ATP-dependent potassium channel activity and insulin secretion. 32 refs., 4 figs., 1 tab.
Complete cure of persistent virus infections by antiviral siRNAs.

PubMed

Saulnier, Aure; Pelletier, Isabelle; Labadie, Karine; Colbère-Garapin, Florence

2006-01-01

Small interfering RNAs (siRNAs) have been developed as antiviral agents for mammalian cells. The capacity of specific siRNAs to prevent virus infections has been demonstrated, and there is evidence that these new antiviral agents could have a partial therapeutic effect a few days after infection. We investigated the possibility of curing a persistent infection, several months after becoming established, using an in vitro model of persistent poliovirus (PV) infection in HEp-2 cells. Despite high virus titers and the presence of PV mutants, repeated treatment with a mixture of two siRNAs targeting both noncoding and coding regions, one of them in a highly conserved region, resulted in the complete cure of the majority of persistently infected cultures. No escape mutants emerged in treated cultures. The antiviral effect of specific siRNAs, consistent with a mechanism of RNA interference, correlated with a decrease in the amount of viral RNA, until its complete disappearance, resulting in cultures cured of virions and viral RNA.
The conservation and signatures of lincRNAs in Marek’s disease of chicken

USDA-ARS's Scientific Manuscript database

Long intergenic non-coding RNAs (lincRNAs) associated with a number of cancers and other diseases have been identified in mammals, but comprehensively identifying and characterizing them remains a formidable task. Marek’s disease (MD) is a T cell lymphoma of chickens induced by Marek’s disease virus ...
The conservation and signatures of lincRNAs in Marek’s disease of chicken

USDA-ARS's Scientific Manuscript database

Long intergenic non-coding RNAs (lincRNAs) associated with a number of cancers and other diseases have been identified in mammals, but comprehensively identifying and characterizing them in chicken remains a formidable task. Marek’s disease (MD) is a T cell lymphoma of chickens induced by Marek’s dis...
Partial Shotgun Sequencing of the Boechera stricta Genome Reveals Extensive Microsynteny and Promoter Conservation with Arabidopsis

PubMed Central

Windsor, Aaron J.; Schranz, M. Eric; Formanová, Nataša; Gebauer-Jung, Steffi; Bishop, John G.; Schnabelrauch, Domenica; Kroymann, Juergen; Mitchell-Olds, Thomas

2006-01-01

Comparative genomics provides insight into the evolutionary dynamics that shape discrete sequences as well as whole genomes. To advance comparative genomics within the Brassicaceae, we have end sequenced 23,136 medium-sized insert clones from Boechera stricta, a wild relative of Arabidopsis (Arabidopsis thaliana). A significant proportion of these sequences, 18,797, are nonredundant and display highly significant similarity (BLASTn e-value ≤ 10^-30) to low copy number Arabidopsis genomic regions, including more than 9,000 annotated coding sequences. We have used this dataset to identify orthologous gene pairs in the two species and to perform a global comparison of DNA regions 5′ to annotated coding regions. On average, the 500 nucleotides upstream to coding sequences display 71.4% identity between the two species. In a similar analysis, 61.4% identity was observed between 5′ noncoding sequences of Brassica oleracea and Arabidopsis, indicating that regulatory regions are not as diverged among these lineages as previously anticipated. By mapping the B. stricta end sequences onto the Arabidopsis genome, we have identified nearly 2,000 conserved blocks of microsynteny (bracketing 26% of the Arabidopsis genome). A comparison of fully sequenced B. stricta inserts to their homologous Arabidopsis genomic regions indicates that indel polymorphisms >5 kb contribute substantially to the genome size difference observed between the two species. Further, we demonstrate that microsynteny inferred from end-sequence data can be applied to the rapid identification and cloning of genomic regions of interest from nonmodel species. These results suggest that among diploid relatives of Arabidopsis, small- to medium-scale shotgun sequencing approaches can provide rapid and cost-effective benefits to evolutionary and/or functional comparative genomic frameworks. PMID:16607030
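The percent-identity figures quoted for upstream regions can be made concrete with a small sketch. The abstract does not say how alignment gaps were handled, so the choice below to skip gapped columns is an assumption, and the function name is hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity between two rows of a pairwise alignment.

    Assumption: columns where either row has a gap ('-') are excluded
    from both the numerator and the denominator.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

Counting gaps as mismatches instead would lower the reported identity, so the convention should be stated whenever such values are compared across studies.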
Transposon-driven transcription is a conserved feature of vertebrate spermatogenesis and transcript evolution.

PubMed

Davis, Matthew P; Carrieri, Claudia; Saini, Harpreet K; van Dongen, Stijn; Leonardi, Tommaso; Bussotti, Giovanni; Monahan, Jack M; Auchynnikava, Tania; Bitetti, Angelo; Rappsilber, Juri; Allshire, Robin C; Shkumatava, Alena; O'Carroll, Dónal; Enright, Anton J

2017-07-01

Spermatogenesis is associated with major and unique changes to chromosomes and chromatin. Here, we sought to understand the impact of these changes on spermatogenic transcriptomes. We show that long terminal repeats (LTRs) of specific mouse endogenous retroviruses (ERVs) drive the expression of many long non-coding transcripts (lncRNA). This process occurs post-mitotically predominantly in spermatocytes and round spermatids. We demonstrate that this transposon-driven lncRNA expression is a conserved feature of vertebrate spermatogenesis. We propose that transposon promoters are a mechanism by which the genome can explore novel transcriptional substrates, increasing evolutionary plasticity and allowing for the genesis of novel coding and non-coding genes. Accordingly, we show that a small fraction of these novel ERV-driven transcripts encode short open reading frames that produce detectable peptides. Finally, we find that distinct ERV elements from the same subfamilies act as differentially activated promoters in a tissue-specific context. In summary, we demonstrate that LTRs can act as tissue-specific promoters and contribute to post-mitotic spermatogenic transcriptome diversity. © 2017 The Authors. Published under the terms of the CC BY 4.0 license.

Widespread long noncoding RNAs as endogenous target mimics for microRNAs in plants.

PubMed

Wu, Hua-Jun; Wang, Zhi-Min; Wang, Meng; Wang, Xiu-Jie

2013-04-01

Target mimicry is a recently identified regulatory mechanism for microRNA (miRNA) functions in plants in which the decoy RNAs bind to miRNAs via complementary sequences and therefore block the interaction between miRNAs and their authentic targets. Both endogenous decoy RNAs (miRNA target mimics) and engineered artificial RNAs can induce target mimicry effects. Yet until now, only the Induced by Phosphate Starvation1 RNA has been proven to be a functional endogenous microRNA target mimic (eTM). In this work, we developed a computational method and systematically identified intergenic or noncoding gene-originated eTMs for 20 conserved miRNAs in Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa). The predicted miRNA binding sites were well conserved among eTMs of the same miRNA, whereas sequences outside of the binding sites varied a lot. We proved that the eTMs of miR160 and miR166 are functional target mimics and identified their roles in the regulation of plant development. The effectiveness of eTMs for three other miRNAs was also confirmed by transient agroinfiltration assay.
Identification of miRNA from Bouteloua gracilis, a drought tolerant grass, by deep sequencing and their in silico analysis.

PubMed

Ordóñez-Baquera, Perla Lucía; González-Rodríguez, Everardo; Aguado-Santacruz, Gerardo Armando; Rascón-Cruz, Quintín; Conesa, Ana; Moreno-Brito, Verónica; Echavarria, Raquel; Dominguez-Viveros, Joel

2017-02-01

MicroRNAs (miRNAs) are small non-coding RNA molecules that regulate signal transduction, development, metabolism, and stress responses in plants through post-transcriptional degradation and/or translational repression of target mRNAs. Several studies have addressed the role of miRNAs in model plant species, but miRNA expression and function in economically important forage crops, such as Bouteloua gracilis (Poaceae), a high-quality and drought-resistant grass distributed in semiarid regions of the United States and northern Mexico remain unknown. We applied high-throughput sequencing technology and bioinformatics analysis and identified 31 conserved miRNA families and 53 novel putative miRNAs with different abundance of reads in chlorophyllic cell cultures derived from B. gracilis. Some conserved miRNA families were highly abundant and possessed predicted targets involved in metabolism, plant growth and development, and stress responses. We also predicted additional identified novel miRNAs with specific targets, including B. gracilis ESTs, which were detected under drought stress conditions. Here we report 31 conserved miRNA families and 53 putative novel miRNAs in B. gracilis. Our results suggested the presence of regulatory miRNAs involved in modulating physiological and stress responses in this grass species. Copyright © 2016 Elsevier Ltd. All rights reserved.
A new method for species identification via protein-coding and non-coding DNA barcodes by combining machine learning with bioinformatic methods.

PubMed

Zhang, Ai-bing; Feng, Jie; Ward, Robert D; Wan, Ping; Gao, Qiang; Wu, Jun; Zhao, Wei-zhong

2012-01-01

Species identification via DNA barcodes is contributing greatly to current bioinventory efforts. The initial, and widely accepted, proposal was to use the protein-coding cytochrome c oxidase subunit I (COI) region as the standard barcode for animals, but recently non-coding internal transcribed spacer (ITS) genes have been proposed as candidate barcodes for both animals and plants. However, achieving a robust alignment for non-coding regions can be problematic. Here we propose two new methods (DV-RBF and FJ-RBF) to address this issue for species assignment by both coding and non-coding sequences that take advantage of the power of machine learning and bioinformatics. We demonstrate the value of the new methods with four empirical datasets, two representing typical protein-coding COI barcode datasets (neotropical bats and marine fish) and two representing non-coding ITS barcodes (rust fungi and brown algae). Using two random sub-sampling approaches, we demonstrate that the new methods significantly outperformed existing Neighbor-joining (NJ) and Maximum likelihood (ML) methods for both coding and non-coding barcodes when there was complete species coverage in the reference dataset. The new methods also out-performed NJ and ML methods for non-coding sequences in circumstances of potentially incomplete species coverage, although then the NJ and ML methods performed slightly better than the new methods for protein-coding barcodes. A 100% success rate of species identification was achieved with the two new methods for 4,122 bat queries and 5,134 fish queries using COI barcodes, with 95% confidence intervals (CI) of 99.75-100%. The new methods also obtained a 96.29% success rate (95%CI: 91.62-98.40%) for 484 rust fungi queries and a 98.50% success rate (95%CI: 96.60-99.37%) for 1094 brown algae queries, both using ITS barcodes.
Understanding Neurodevelopmental Disorders: The Promise of Regulatory Variation in the 3'UTRome.

PubMed

Wanke, Kai A; Devanna, Paolo; Vernes, Sonja C

2018-04-01

Neurodevelopmental disorders have a strong genetic component, but despite widespread efforts, the specific genetic factors underlying these disorders remain undefined for a large proportion of affected individuals. Given the accessibility of exome sequencing, this problem has thus far been addressed from a protein-centric standpoint; however, protein-coding regions only make up ∼1% to 2% of the human genome. With the advent of whole genome sequencing we are in the midst of a paradigm shift as it is now possible to interrogate the entire sequence of the human genome (coding and noncoding) to fill in the missing heritability of complex disorders. These new technologies bring new challenges, as the number of noncoding variants identified per individual can be overwhelming, making it prudent to focus on noncoding regions of known function, for which the effects of variation can be predicted and directly tested to assess pathogenicity. The 3'UTRome is a region of the noncoding genome that perfectly fulfills these criteria and is of high interest when searching for pathogenic variation related to complex neurodevelopmental disorders. Herein, we review the regulatory roles of the 3'UTRome as binding sites for microRNAs or RNA binding proteins, or during alternative polyadenylation. We detail existing evidence that these regions contribute to neurodevelopmental disorders and outline strategies for identification and validation of novel putatively pathogenic variation in these regions. This evidence suggests that studying the 3'UTRome will lead to the identification of new risk factors, new candidate disease genes, and a better understanding of the molecular mechanisms contributing to neurodevelopmental disorders. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
Molecular organization of the 5S rDNA gene type II in elasmobranchs.

PubMed

Castro, Sergio I; Hleap, Jose S; Cárdenas, Heiber; Blouin, Christian

2016-01-01

The 5S rDNA gene is a non-coding RNA that can be found in 2 copies (type I and type II) in bony and cartilaginous fish. Previous studies have pointed out that type II gene is a paralog derived from type I. We analyzed the molecular organization of 5S rDNA type II in elasmobranchs. Although the structure of the 5S rDNA is supposed to be highly conserved, our results show that the secondary structure in this group possesses some variability and is different from the consensus secondary structure. One of these differences in Selachii is an internal loop at nucleotides 7 and 112. These mutations observed in the transcribed region suggest an independent origin of the gene among Batoids and Selachii. All promoters were highly conserved with the exception of BoxA, possibly due to its affinity to polymerase III. This latter enzyme recognizes a dT4 sequence as stop signal; however, in Rajiformes this signal was doubled in length to dT8. This could be an adaptation toward a higher efficiency in the termination process. Our results suggest that there is no TATA box in elasmobranchs in the NTS region. We also provide some evidence suggesting that the complexity of the microsatellites present in the NTS region plays an important role in the 5S rRNA gene since it is significantly correlated with the length of the NTS.
Molecular organization of the 5S rDNA gene type II in elasmobranchs

PubMed Central

Castro, Sergio I.; Hleap, Jose S.; Cárdenas, Heiber; Blouin, Christian

2016-01-01

The 5S rDNA gene is a non-coding RNA that can be found in 2 copies (type I and type II) in bony and cartilaginous fish. Previous studies have pointed out that type II gene is a paralog derived from type I. We analyzed the molecular organization of 5S rDNA type II in elasmobranchs. Although the structure of the 5S rDNA is supposed to be highly conserved, our results show that the secondary structure in this group possesses some variability and is different from the consensus secondary structure. One of these differences in Selachii is an internal loop at nucleotides 7 and 112. These mutations observed in the transcribed region suggest an independent origin of the gene among Batoids and Selachii. All promoters were highly conserved with the exception of BoxA, possibly due to its affinity to polymerase III. This latter enzyme recognizes a dT4 sequence as stop signal; however, in Rajiformes this signal was doubled in length to dT8. This could be an adaptation toward a higher efficiency in the termination process. Our results suggest that there is no TATA box in elasmobranchs in the NTS region. We also provide some evidence suggesting that the complexity of the microsatellites present in the NTS region plays an important role in the 5S rRNA gene since it is significantly correlated with the length of the NTS. PMID:26488198
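The dT4 versus dT8 terminator observation can be illustrated by scanning a sequence for runs of consecutive T residues. This is only a sketch under stated assumptions: the function name and the example sequence are hypothetical, and a poly-T run found this way is a candidate signal, not evidence of termination.

```python
import re


def find_poly_t_runs(seq: str, min_len: int = 4) -> list[tuple[int, int]]:
    """Return (start, length) for every run of at least min_len T's.

    Positions are 0-based. RNA input is accepted by mapping U to T.
    The default min_len of 4 mirrors the dT4 signal in the abstract.
    """
    seq = seq.upper().replace("U", "T")
    pattern = re.compile(r"T{%d,}" % min_len)
    return [(m.start(), len(m.group())) for m in pattern.finditer(seq)]


# Hypothetical sequence containing one dT4 and one dT8 stretch.
example = "ACGTTTTACGTTTTTTTTGG"
runs = find_poly_t_runs(example)
```

Filtering the result for runs of length 8 or more would separate the doubled signal described for Rajiformes from the shorter dT4 form.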
Enhancer elements upstream of the SHOX gene are active in the developing limb.

PubMed

Durand, Claudia; Bangs, Fiona; Signolet, Jason; Decker, Eva; Tickle, Cheryll; Rappold, Gudrun

2010-05-01

Léri-Weill Dyschondrosteosis (LWD) is a dominant skeletal disorder characterized by short stature and distinct bone anomalies. SHOX gene mutations and deletions of regulatory elements downstream of SHOX resulting in haploinsufficiency have been found in patients with LWD. SHOX encodes a homeodomain transcription factor and is known to be expressed in the developing limb. We have now analyzed the regulatory significance of the region upstream of the SHOX gene. By comparative genomic analyses, we identified several conserved non-coding elements, which subsequently were tested in an in ovo enhancer assay in both chicken limb bud and cornea, where SHOX is also expressed. In this assay, we found three enhancers to be active in the developing chicken limb, but none were functional in the developing cornea. A screening of 60 LWD patients with an intact SHOX coding and downstream region did not yield any deletion of the upstream enhancer region. Thus, we speculate that SHOX upstream deletions occur at a lower frequency because of the structural organization of this genomic region and/or that SHOX upstream deletions may cause a phenotype that differs from the one observed in LWD.
Enhancer elements upstream of the SHOX gene are active in the developing limb

PubMed Central

Durand, Claudia; Bangs, Fiona; Signolet, Jason; Decker, Eva; Tickle, Cheryll; Rappold, Gudrun

2010-01-01

Léri-Weill Dyschondrosteosis (LWD) is a dominant skeletal disorder characterized by short stature and distinct bone anomalies. SHOX gene mutations and deletions of regulatory elements downstream of SHOX resulting in haploinsufficiency have been found in patients with LWD. SHOX encodes a homeodomain transcription factor and is known to be expressed in the developing limb. We have now analyzed the regulatory significance of the region upstream of the SHOX gene. By comparative genomic analyses, we identified several conserved non-coding elements, which subsequently were tested in an in ovo enhancer assay in both chicken limb bud and cornea, where SHOX is also expressed. In this assay, we found three enhancers to be active in the developing chicken limb, but none were functional in the developing cornea. A screening of 60 LWD patients with an intact SHOX coding and downstream region did not yield any deletion of the upstream enhancer region. Thus, we speculate that SHOX upstream deletions occur at a lower frequency because of the structural organization of this genomic region and/or that SHOX upstream deletions may cause a phenotype that differs from the one observed in LWD. PMID:19997128
Compound heterozygous deletions in pseudoautosomal region 1 in an infant with mild manifestations of Langer mesomelic dysplasia.

PubMed

Tsuchiya, Takayoshi; Shibata, Minoru; Numabe, Hironao; Jinno, Tomoko; Nakabayashi, Kazuhiko; Nishimura, Gen; Nagai, Toshiro; Ogata, Tsutomu; Fukami, Maki

2014-02-01

Haploinsufficiency of SHOX on the short arm pseudoautosomal region (PAR1) leads to Leri-Weill dyschondrosteosis (LWD), and nullizygosity of SHOX results in Langer mesomelic dysplasia (LMD). Molecular defects of LWD/LMD include various microdeletions in PAR1 that involve exons and/or the putative upstream or downstream enhancer regions of SHOX, as well as several intragenic mutations. Here, we report on a Japanese male infant with mild manifestations of LMD and hitherto unreported microdeletions in PAR1. Clinical analysis revealed mesomelic short stature with various radiological findings indicative of LMD. Molecular analyses identified compound heterozygous deletions, that is, a maternally inherited ∼46 kb deletion involving the upstream region and exons 1-5 of SHOX, and a paternally inherited ∼500 kb deletion started from a position ∼300 kb downstream from SHOX. In silico analysis revealed that the downstream deletion did not affect the known putative enhancer regions of SHOX, although it encompassed several non-coding elements which were well conserved among various species with SHOX orthologs. These results provide the possibility of the presence of a novel enhancer for SHOX in the genomic region ∼300 to ∼800 kb downstream of the start codon. © 2013 Wiley Periodicals, Inc.
Identification of 88 regulatory small RNAs in the TIGR4 strain of the human pathogen Streptococcus pneumoniae

PubMed Central

Acebo, Paloma; Martin-Galiano, Antonio J.; Navarro, Sara; Zaballos, Ángel; Amblar, Mónica

2012-01-01

Streptococcus pneumoniae is the main etiological agent of community-acquired pneumonia and a major cause of mortality and morbidity among children and the elderly. Genome sequencing of several pneumococcal strains revealed valuable information about the potential proteins and genetic diversity of this prevalent human pathogen. However, little is known about its transcriptional regulation and its small regulatory noncoding RNAs. In this study, we performed deep sequencing of the S. pneumoniae TIGR4 strain RNome to identify small regulatory RNA candidates expressed in this pathogen. We discovered 1047 potential small RNAs including intragenic, 5′- and/or 3′-overlapping RNAs and 88 small RNAs encoded in intergenic regions. With this approach, we recovered many of the previously identified intergenic small RNAs and identified 68 novel candidates, most of which are conserved in both sequence and genomic context in other S. pneumoniae strains. We confirmed the independent expression of 17 intergenic small RNAs and predicted putative mRNA targets for six of them using bioinformatics tools. Preliminary results suggest that one of these six is a key player in the regulation of competence development. This study is the biggest catalog of small noncoding RNAs reported to date in S. pneumoniae and provides a highly complete view of the small RNA network in this pathogen. PMID:22274957
Extracellular Vesicle-Associated RNA as a Carrier of Epigenetic Information

PubMed Central

2017-01-01

Post-transcriptional regulation of messenger RNA (mRNA) metabolism and subcellular localization is of the utmost importance both during development and in cell differentiation. Besides carrying genetic information, mRNAs contain cis-acting signals (zip codes), usually present in their 5′- and 3′-untranslated regions (UTRs). By binding to these signals, trans-acting factors, such as RNA-binding proteins (RBPs), and/or non-coding RNAs (ncRNAs), control mRNA localization, translation and stability. RBPs can also form complexes with non-coding RNAs of different sizes. The release of extracellular vesicles (EVs) is a conserved process that allows both normal and cancer cells to horizontally transfer molecules, and hence properties, to neighboring cells. By interacting with proteins that are specifically sorted to EVs, mRNAs as well as ncRNAs can be transferred from cell to cell. In this review, we discuss the mechanisms underlying the sorting to EVs of different classes of molecules, as well as the role of extracellular RNAs and the associated proteins in altering gene expression in the recipient cells. Importantly, if, on the one hand, RBPs play a critical role in transferring RNAs through EVs, RNA itself could, on the other hand, function as a carrier to transfer proteins (i.e., chromatin modifiers, and transcription factors) that, once transferred, can alter the cell’s epigenome. PMID:28937658
Whole-Genome Sequencing Suggests Schizophrenia Risk Mechanisms in Humans with 22q11.2 Deletion Syndrome.

PubMed

Merico, Daniele; Zarrei, Mehdi; Costain, Gregory; Ogura, Lucas; Alipanahi, Babak; Gazzellone, Matthew J; Butcher, Nancy J; Thiruvahindrapuram, Bhooma; Nalpathamkalam, Thomas; Chow, Eva W C; Andrade, Danielle M; Frey, Brendan J; Marshall, Christian R; Scherer, Stephen W; Bassett, Anne S

2015-09-16

Chromosome 22q11.2 microdeletions impart a high but incomplete risk for schizophrenia. Possible mechanisms include genome-wide effects of DGCR8 haploinsufficiency. In a proof-of-principle study to assess the power of this model, we used high-quality, whole-genome sequencing of nine individuals with 22q11.2 deletions and extreme phenotypes (schizophrenia, or no psychotic disorder at age >50 years). The schizophrenia group had a greater burden of rare, damaging variants impacting protein-coding neurofunctional genes, including genes involved in neuron projection (nominal P = 0.02, joint burden of three variant types). Variants in the intact 22q11.2 region were not major contributors. Restricting to genes affected by a DGCR8 mechanism tended to amplify between-group differences. Damaging variants in highly conserved long intergenic noncoding RNA genes also were enriched in the schizophrenia group (nominal P = 0.04). The findings support the 22q11.2 deletion model as a threshold-lowering first hit for schizophrenia risk. If applied to a larger and thus better-powered cohort, this appears to be a promising approach to identify genome-wide rare variants in coding and noncoding sequence that perturb gene networks relevant to idiopathic schizophrenia. Similarly designed studies exploiting genetic models may prove useful to help delineate the genetic architecture of other complex phenotypes. Copyright © 2015 Merico et al.
Whole-Genome Sequencing Suggests Schizophrenia Risk Mechanisms in Humans with 22q11.2 Deletion Syndrome

PubMed Central

Merico, Daniele; Zarrei, Mehdi; Costain, Gregory; Ogura, Lucas; Alipanahi, Babak; Gazzellone, Matthew J.; Butcher, Nancy J.; Thiruvahindrapuram, Bhooma; Nalpathamkalam, Thomas; Chow, Eva W. C.; Andrade, Danielle M.; Frey, Brendan J.; Marshall, Christian R.; Scherer, Stephen W.; Bassett, Anne S.

2015-01-01

Chromosome 22q11.2 microdeletions impart a high but incomplete risk for schizophrenia. Possible mechanisms include genome-wide effects of DGCR8 haploinsufficiency. In a proof-of-principle study to assess the power of this model, we used high-quality, whole-genome sequencing of nine individuals with 22q11.2 deletions and extreme phenotypes (schizophrenia, or no psychotic disorder at age >50 years). The schizophrenia group had a greater burden of rare, damaging variants impacting protein-coding neurofunctional genes, including genes involved in neuron projection (nominal P = 0.02, joint burden of three variant types). Variants in the intact 22q11.2 region were not major contributors. Restricting to genes affected by a DGCR8 mechanism tended to amplify between-group differences. Damaging variants in highly conserved long intergenic noncoding RNA genes also were enriched in the schizophrenia group (nominal P = 0.04). The findings support the 22q11.2 deletion model as a threshold-lowering first hit for schizophrenia risk. If applied to a larger and thus better-powered cohort, this appears to be a promising approach to identify genome-wide rare variants in coding and noncoding sequence that perturb gene networks relevant to idiopathic schizophrenia. Similarly designed studies exploiting genetic models may prove useful to help delineate the genetic architecture of other complex phenotypes. PMID:26384369
Analysis of the complete genome of the first Irkut virus isolate from China: comparison across the Lyssavirus genus.

PubMed

Liu, Ye; Li, Nan; Zhang, Shoufeng; Zhang, Fei; Lian, Hai; Wang, Ying; Zhang, Jinxia; Hu, Rongliang

2013-12-01

The genome of Irkut virus, isolate IRKV-THChina12, the first non-rabies lyssavirus from China (of bat origin), has been completely sequenced. In general, coding and non-coding regions of this viral genome are similar to those of other lyssaviruses. However, alignment of the deduced amino acid sequences of the structural proteins of IRKV-THChina12 with those of other lyssavirus representatives revealed significant variability between viral species. The nucleoprotein and matrix protein were found to be the most conserved, followed by the large protein, glycoprotein and phosphoprotein. Differences in the antigenic sites in glycoprotein may result in only partial protection of the available rabies biologics against Irkut virus, which is of particular concern for pre- and post-exposure rabies prophylaxis. Copyright © 2013 Elsevier Inc. All rights reserved.
Chromatin Heterogeneity and Distribution of Regulatory Elements in the Late-Replicating Intercalary Heterochromatin Domains of Drosophila melanogaster Chromosomes

PubMed Central

Khoroshko, Varvara A.; Levitsky, Viktor G.; Zykova, Tatyana Yu.; Antonenko, Oksana V.; Belyaeva, Elena S.; Zhimulev, Igor F.

2016-01-01

Late-replicating domains (intercalary heterochromatin) in the Drosophila genome display a number of features suggesting their organization is quite unique. Typically, they are quite large and encompass clusters of functionally unrelated tissue-specific genes. They correspond to the topologically associating domains and conserved microsynteny blocks. Our study aims at exploring further details of molecular organization of intercalary heterochromatin and has uncovered surprising heterogeneity of chromatin composition in these regions. Using the 4HMM model developed in our group earlier, intercalary heterochromatin regions were found to host chromatin fragments with a particular epigenetic profile. Aquamarine chromatin fragments (spanning 0.67% of late-replicating regions) are characterized as a class of sequences that appear heterogeneous in terms of their decompactization. These fragments are enriched with enhancer sequences and binding sites for insulator proteins. They likely mark the chromatin state that is related to the binding of cis-regulatory proteins. Malachite chromatin fragments (11% of late-replicating regions) appear to function as universal transitional regions between two contrasting chromatin states. Namely, they invariably delimit intercalary heterochromatin regions from the adjacent active chromatin of interbands. Malachite fragments also flank aquamarine fragments embedded in the repressed chromatin of late-replicating regions. Significant enrichment of insulator proteins CP190, SU(HW), and MOD2.2 was observed in malachite chromatin. Neither aquamarine nor malachite chromatin types appear to correlate with the positions of highly conserved non-coding elements (HCNE) that are typically replete in intercalary heterochromatin. Malachite chromatin found on the flanks of intercalary heterochromatin regions tends to replicate earlier than the malachite chromatin embedded in intercalary heterochromatin. 
In other words, there exists a gradient of replication progressing from the flanks of intercalary heterochromatin regions center-wise. The peculiar organization and features of replication in large late-replicating regions are discussed as possible factors shaping the evolutionary stability of intercalary heterochromatin. PMID:27300486
Small RNAs, big impact: small RNA pathways in transposon control and their effect on the host stress response.

PubMed

Wheeler, Bayly S

2013-12-01

Transposons are mobile genetic elements that are a major constituent of most genomes. Organisms regulate transposable element expression, transposition, and insertion site preference, mitigating the genome instability caused by uncontrolled transposition. A recent burst of research has demonstrated the critical role of small non-coding RNAs in regulating transposition in fungi, plants, and animals. While mechanistically distinct, these pathways work through a conserved paradigm. The presence of a transposon is communicated by the presence of its RNA or by its integration into specific genomic loci. These signals are then translated into small non-coding RNAs that guide epigenetic modifications and gene silencing back to the transposon. In addition to being regulated by the host, transposable elements are themselves capable of influencing host gene expression. Transposon expression is responsive to environmental signals, and many transposons are activated by various cellular stresses. TEs can confer local gene regulation by acting as enhancers and can also confer global gene regulation through their non-coding RNAs. Thus, transposable elements can act as stress-responsive regulators that control host gene expression in cis and trans.
Non-coding stem-bulge RNAs are required for cell proliferation and embryonic development in C. elegans

PubMed Central

Kowalski, Madzia P.; Baylis, Howard A.; Krude, Torsten

2015-01-01

Stem bulge RNAs (sbRNAs) are a family of small non-coding stem-loop RNAs present in Caenorhabditis elegans and other nematodes, the function of which is unknown. Here, we report the first functional characterisation of nematode sbRNAs. We demonstrate that sbRNAs from a range of nematode species are able to reconstitute the initiation of chromosomal DNA replication in the presence of replication proteins in vitro, and that conserved nucleotide sequence motifs are essential for this function. By functionally inactivating sbRNAs with antisense morpholino oligonucleotides, we show that sbRNAs are required for S phase progression, early embryonic development and the viability of C. elegans in vivo. Thus, we demonstrate a new and essential role for sbRNAs during the early development of C. elegans. sbRNAs show limited nucleotide sequence similarity to vertebrate Y RNAs, which are also essential for the initiation of DNA replication. Our results therefore establish that the essential function of small non-coding stem-loop RNAs during DNA replication extends beyond vertebrates. PMID:25908866
Correlation approach to identify coding regions in DNA sequences

NASA Technical Reports Server (NTRS)

Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

1994-01-01

Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.
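The abstract above does not spell out the algorithm, so the following is only a minimal sketch of one common way to quantify the correlation property it refers to: a purine/pyrimidine "DNA walk" whose fluctuation F(l) is fitted to a power law F(l) ~ l^alpha. The step rule, the window sizes, and the function names `dna_walk`, `fluctuation`, and `scaling_exponent` are assumptions made for illustration, not the authors' implementation. An exponent near 0.5 is what an uncorrelated (or only short-range correlated) sequence would give; larger values would suggest long-range correlations.

```python
import math
import random


def dna_walk(seq):
    """Cumulative walk: +1 for pyrimidines (C, T), -1 for purines (A, G).

    Any other symbol contributes a step of 0 (an assumption of this sketch).
    """
    steps = {"C": 1, "T": 1, "A": -1, "G": -1}
    walk = [0]
    for base in seq.upper():
        walk.append(walk[-1] + steps.get(base, 0))
    return walk


def fluctuation(walk, window):
    """Standard deviation of the walk displacement over a sliding window."""
    diffs = [walk[i + window] - walk[i] for i in range(len(walk) - window)]
    mean = sum(diffs) / len(diffs)
    variance = sum((d - mean) ** 2 for d in diffs) / len(diffs)
    return math.sqrt(variance)


def scaling_exponent(seq, windows=(4, 8, 16, 32, 64)):
    """Least-squares slope of log F(l) versus log l over the given windows."""
    walk = dna_walk(seq)
    xs = [math.log(w) for w in windows]
    ys = [math.log(fluctuation(walk, w)) for w in windows]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Synthetic, uncorrelated test sequence (not real genomic data).
rng = random.Random(0)
random_seq = "".join(rng.choice("ACGT") for _ in range(20000))
alpha = scaling_exponent(random_seq)
print(f"estimated exponent for a random sequence: {alpha:.2f}")
```

In a scan for coding regions, one might compute such an exponent in sliding segments along a chromosome and flag segments whose exponent stays close to 0.5; the segment length and threshold would need to be chosen empirically.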
Functionally conserved cis-regulatory elements of COL18A1 identified through zebrafish transgenesis.

PubMed

Kague, Erika; Bessling, Seneca L; Lee, Josephine; Hu, Gui; Passos-Bueno, Maria Rita; Fisher, Shannon

2010-01-15

Type XVIII collagen is a component of basement membranes, and expressed prominently in the eye, blood vessels, liver, and the central nervous system. Homozygous mutations in COL18A1 lead to Knobloch Syndrome, characterized by ocular defects and occipital encephalocele. However, relatively little has been described on the role of type XVIII collagen in development, and nothing is known about the regulation of its tissue-specific expression pattern. We have used zebrafish transgenesis to identify and characterize cis-regulatory sequences controlling expression of the human gene. Candidate enhancers were selected from non-coding sequence associated with COL18A1 based on sequence conservation among mammals. Although these displayed no overt conservation with orthologous zebrafish sequences, four regions nonetheless acted as tissue-specific transcriptional enhancers in the zebrafish embryo, and together recapitulated the major aspects of col18a1 expression. Additional post-hoc computational analysis on positive enhancer sequences revealed alignments between mammalian and teleost sequences, which we hypothesize predict the corresponding zebrafish enhancers; for one of these, we demonstrate functional overlap with the orthologous human enhancer sequence. Our results provide important insight into the biological function and regulation of COL18A1, and point to additional sequences that may contribute to complex diseases involving COL18A1. More generally, we show that combining functional data with targeted analyses for phylogenetic conservation can reveal conserved cis-regulatory elements in the large number of cases where computational alignment alone falls short. Copyright 2009 Elsevier Inc. All rights reserved.
Behind the curtain of non-coding RNAs; long non-coding RNAs regulating hepatocarcinogenesis

PubMed Central

El Khodiry, Aya; Afify, Menna; El Tayebi, Hend M

2018-01-01

Hepatocellular carcinoma (HCC) is one of the most common and aggressive cancers worldwide. HCC is the fifth most common malignancy in the world and the second leading cause of cancer death in Asia. Long non-coding RNAs (lncRNAs) are RNAs with a length greater than 200 nucleotides that do not encode proteins. lncRNAs can regulate gene expression and protein synthesis in several ways by interacting with DNA, RNA and proteins in a sequence-specific manner. They can regulate cellular and developmental processes through either gene inhibition or gene activation. Many studies have shown that dysregulation of lncRNAs is related to many human diseases such as cardiovascular diseases, genetic disorders, neurological diseases, immune-mediated disorders and cancers. However, the study of lncRNAs is challenging because they are poorly conserved between species, their expression levels are lower than those of mRNAs, and they show great interpatient variation. The study of lncRNA expression in cancers has been a breakthrough, as it unveils potential biomarkers and drug targets for cancer therapy and helps to explain the mechanism of pathogenesis. This review discusses many long non-coding RNAs and their contribution to HCC, their role in the development, metastasis, and prognosis of HCC, and how these lncRNAs might be regulated and targeted as a therapeutic tool in future HCC treatment. PMID:29434445

Long noncoding RNAs and their proposed functions in fibre development of cotton (Gossypium spp.).

PubMed

Wang, Maojun; Yuan, Daojun; Tu, Lili; Gao, Wenhui; He, Yonghui; Hu, Haiyan; Wang, Pengcheng; Liu, Nian; Lindsey, Keith; Zhang, Xianlong

2015-09-01

Long noncoding RNAs (lncRNAs) are transcripts of at least 200 bp in length, possess no apparent coding capacity and are involved in various biological regulatory processes. Until now, no systematic identification of lncRNAs has been reported in cotton (Gossypium spp.). Here, we describe the identification of 30 550 long intergenic noncoding RNA (lincRNA) loci (50 566 transcripts) and 4718 long noncoding natural antisense transcript (lncNAT) loci (5826 transcripts). LncRNAs are rich in repetitive sequences and preferentially expressed in a tissue-specific manner. The detection of abundant genome-specific and/or lineage-specific lncRNAs indicated their weak evolutionary conservation. Approximately 76% of homoeologous lncRNAs exhibit biased expression patterns towards the At or Dt subgenomes. Compared with protein-coding genes, lncRNAs showed overall higher methylation levels and their expression was less affected by gene body methylation. Expression validation in different cotton accessions and coexpression network construction helped to identify several functional lncRNA candidates involved in cotton fibre initiation and elongation. Analysis of integrated expression from the subgenomes of lncRNAs generating miR397 and its targets as a result of genome polyploidization indicated their pivotal functions in regulating lignin metabolism in domesticated tetraploid cotton fibres. This study provides the first comprehensive identification of lncRNAs in Gossypium. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.
Standing your Ground to Exoribonucleases: Function of Flavivirus Long Non-coding RNAs

PubMed Central

Charley, Phillida A.; Wilusz, Jeffrey

2015-01-01

Members of the Flaviviridae (e.g. Dengue virus, West Nile virus, and Hepatitis C virus) contain a positive-sense RNA genome that encodes a large polyprotein. It is now clear that most, if not all, of these viruses also produce an abundant subgenomic long non-coding RNA. These non-coding RNAs, which are called subgenomic flavivirus RNAs (sfRNAs) or Xrn1-resistant RNAs (xrRNAs), are stable decay intermediates generated from the viral genomic RNA through the stalling of the cellular exoribonuclease Xrn1 at highly structured regions. Several functions of these flavivirus long non-coding RNAs have been revealed in recent years. The generation of these sfRNAs/xrRNAs from viral transcripts results in the repression of Xrn1 and the dysregulation of cellular mRNA stability. The abundant sfRNAs also serve directly as a decoy for important cellular protein regulators of the interferon and RNA interference antiviral pathways. Thus, the generation of long non-coding RNAs from flaviviruses, hepaciviruses and pestiviruses likely disrupts aspects of innate immunity and may directly contribute to viral replication, cytopathology and pathogenesis. PMID:26368052
Long non-coding RNA discovery across the genus anopheles reveals conserved secondary structures within and beyond the Gambiae complex.

PubMed

Jenkins, Adam M; Waterhouse, Robert M; Muskavitch, Marc A T

2015-04-23

Long non-coding RNAs (lncRNAs) have been defined as mRNA-like transcripts longer than 200 nucleotides that lack significant protein-coding potential, and many of them constitute scaffolds for ribonucleoprotein complexes with critical roles in epigenetic regulation. Various lncRNAs have been implicated in the modulation of chromatin structure, transcriptional and post-transcriptional gene regulation, and regulation of genomic stability in mammals, Caenorhabditis elegans, and Drosophila melanogaster. The purpose of this study is to identify the lncRNA landscape in the malaria vector An. gambiae and assess the evolutionary conservation of lncRNAs and their secondary structures across the Anopheles genus. Using deep RNA sequencing of multiple Anopheles gambiae life stages, we have identified 2,949 lncRNAs and more than 300 previously unannotated putative protein-coding genes. The lncRNAs exhibit differential expression profiles across life stages and adult genders. We find that across the genus Anopheles, lncRNAs display much lower sequence conservation than protein-coding genes. Additionally, we find that lncRNA secondary structure is highly conserved within the Gambiae complex, but diverges rapidly across the rest of the genus Anopheles. This study offers one of the first lncRNA secondary structure analyses in vector insects. Our description of lncRNAs in An. gambiae offers the most comprehensive genome-wide insights to date into lncRNAs in this vector mosquito, and defines a set of potential targets for the development of vector-based interventions that may further curb the human malaria burden in disease-endemic countries.
A conserved long noncoding RNA affects sleep behavior in Drosophila.

PubMed

Soshnev, Alexey A; Ishimoto, Hiroshi; McAllister, Bryant F; Li, Xingguo; Wehling, Misty D; Kitamoto, Toshihiro; Geyer, Pamela K

2011-10-01

Metazoan genomes encode an abundant collection of mRNA-like, long noncoding (lnc)RNAs. Although lncRNAs greatly expand the transcriptional repertoire, we have a limited understanding of how these RNAs contribute to developmental regulation. Here, we investigate the function of the Drosophila lncRNA called yellow-achaete intergenic RNA (yar). Comparative sequence analyses show that the yar gene is conserved in Drosophila species representing 40-60 million years of evolution, with one of the conserved sequence motifs encompassing the yar promoter. Further, the timing of yar expression in Drosophila virilis parallels that in D. melanogaster, suggesting that transcriptional regulation of yar is conserved. The function of yar was defined by generating null alleles. Flies lacking yar RNAs are viable and show no overt morphological defects, consistent with maintained transcriptional regulation of the adjacent yellow (y) and achaete (ac) genes. The location of yar within a neural gene cluster led to the investigation of effects of yar in behavioral assays. These studies demonstrated that loss of yar alters sleep regulation in the context of a normal circadian rhythm. Nighttime sleep was reduced and fragmented, with yar mutants displaying diminished sleep rebound following sleep deprivation. Importantly, these defects were rescued by a yar transgene. These data provide the first example of a lncRNA gene involved in Drosophila sleep regulation. We find that yar is a cytoplasmic lncRNA, suggesting that yar may regulate sleep by affecting stabilization or translational regulation of mRNAs. Such functions of lncRNAs may extend to vertebrates, as lncRNAs are abundant in neural tissues.
The Ftx Noncoding Locus Controls X Chromosome Inactivation Independently of Its RNA Products.

PubMed

Furlan, Giulia; Gutierrez Hernandez, Nancy; Huret, Christophe; Galupa, Rafael; van Bemmel, Joke Gerarda; Romito, Antonio; Heard, Edith; Morey, Céline; Rougeulle, Claire

2018-05-03

Accumulation of the Xist long noncoding RNA (lncRNA) on one X chromosome is the trigger for X chromosome inactivation (XCI) in female mammals. Xist expression, which needs to be tightly controlled, involves a cis-acting region, the X-inactivation center (Xic), containing many lncRNA genes that evolved concomitantly to Xist from protein-coding ancestors through pseudogeneization and loss of coding potential. Here, we uncover an essential role for the Xic-linked noncoding gene Ftx in the regulation of Xist expression. We show that Ftx is required in cis to promote Xist transcriptional activation and establishment of XCI. Importantly, we demonstrate that this function depends on Ftx transcription and not on the RNA products. Our findings illustrate the multiplicity of layers operating in the establishment of XCI and highlight the diversity in the modus operandi of the noncoding players. Copyright © 2018 Elsevier Inc. All rights reserved.
2-D Structure of the A Region of Xist RNA and Its Implication for PRC2 Association

PubMed Central

Maenner, Sylvain; Blaud, Magali; Fouillen, Laetitia; Savoye, Anne; Marchand, Virginie; Dubois, Agnès; Sanglier-Cianférani, Sarah; Van Dorsselaer, Alain; Clerc, Philippe; Avner, Philip; Visvikis, Athanase; Branlant, Christiane

2010-01-01

In placental mammals, inactivation of one of the X chromosomes in female cells ensures sex chromosome dosage compensation. The 17 kb non-coding Xist RNA is crucial to this process and accumulates on the future inactive X chromosome. The most conserved Xist RNA region, the A region, contains eight or nine repeats separated by U-rich spacers. It is implicated in the recruitment of late inactivated X genes to the silencing compartment and likely in the recruitment of complex PRC2. Little is known about the structure of the A region and more generally about Xist RNA structure. Knowledge of its structure is restricted to an NMR study of a single A repeat element. Our study is the first experimental analysis of the structure of the entire A region in solution. By the use of chemical and enzymatic probes and FRET experiments, using oligonucleotides carrying fluorescent dyes, we resolved problems linked to sequence redundancies and established a 2-D structure for the A region that contains two long stem-loop structures each including four repeats. Interactions formed between repeats and between repeats and spacers stabilize these structures. Conservation of the spacer terminal sequences allows formation of such structures in all sequenced Xist RNAs. By combination of RNP affinity chromatography, immunoprecipitation assays, mass spectrometry, and Western blot analysis, we demonstrate that the A region can associate with components of the PRC2 complex in mouse ES cell nuclear extracts. Whilst a single four-repeat motif is able to associate with components of this complex, recruitment of Suz12 is clearly more efficient when the entire A region is present. Our data with their emphasis on the importance of inter-repeat pairing change fundamentally our conception of the 2-D structure of the A region of Xist RNA and support its possible implication in recruitment of the PRC2 complex. PMID:20052282
The primary structures of two yeast enolase genes. Homology between the 5' noncoding flanking regions of yeast enolase and glyceraldehyde-3-phosphate dehydrogenase genes.

PubMed

Holland, M J; Holland, J P; Thill, G P; Jackson, K A

1981-02-10

Segments of yeast genomic DNA containing two enolase structural genes have been isolated by subculture cloning procedures using a cDNA hybridization probe synthesized from purified yeast enolase mRNA. Based on restriction endonuclease and transcriptional maps of these two segments of yeast DNA, each hybrid plasmid contains a region of extensive nucleotide sequence homology which forms hybrids with the cDNA probe. The DNA sequences which flank this homologous region in the two hybrid plasmids are nonhomologous indicating that these sequences are nontandemly repeated in the yeast genome. The complete nucleotide sequence of the coding as well as the flanking noncoding regions of these genes has been determined. The amino acid sequence predicted from one reading frame of both structural genes is extremely similar to that determined for yeast enolase (Chin, C. C. Q., Brewer, J. M., Eckard, E., and Wold, F. (1981) J. Biol. Chem. 256, 1370-1376), confirming that these isolated structural genes encode yeast enolase. The nucleotide sequences of the coding regions of the genes are approximately 95% homologous, and neither gene contains an intervening sequence. Codon utilization in the enolase genes follows the same biased pattern previously described for two yeast glyceraldehyde-3-phosphate dehydrogenase structural genes (Holland, J. P., and Holland, M. J. (1980) J. Biol. Chem. 255, 2596-2605). DNA blotting analysis confirmed that the isolated segments of yeast DNA are colinear with yeast genomic DNA and that there are two nontandemly repeated enolase genes per haploid yeast genome. The noncoding portions of the two enolase genes adjacent to the initiation and termination codons are approximately 70% homologous and contain sequences thought to be involved in the synthesis and processing messenger RNA. 
Finally, there are regions of extensive homology between the two enolase structural genes and two yeast glyceraldehyde-3-phosphate dehydrogenase structural genes within the 5' noncoding portions of these glycolytic genes.
Conservation of regulatory sequences and gene expression patterns in the disintegrating Drosophila Hox gene complex

PubMed Central

Negre, Bárbara; Casillas, Sònia; Suzanne, Magali; Sánchez-Herrero, Ernesto; Akam, Michael; Nefedov, Michael; Barbadilla, Antonio; de Jong, Pieter; Ruiz, Alfredo

2005-01-01

Homeotic (Hox) genes are usually clustered and arranged in the same order as they are expressed along the anteroposterior body axis of metazoans. The mechanistic explanation for this colinearity has been elusive, and it may well be that a single and universal cause does not exist. The Hox-gene complex (HOM-C) has been rearranged differently in several Drosophila species, producing a striking diversity of Hox gene organizations. We investigated the genomic and functional consequences of the two HOM-C splits present in Drosophila buzzatii. Firstly, we sequenced two regions of the D. buzzatii genome, one containing the genes labial and abdominal A, and another one including proboscipedia, and compared their organization with that of D. melanogaster and D. pseudoobscura in order to map precisely the two splits. Then, a plethora of conserved noncoding sequences, which are putative enhancers, were identified around the three Hox genes closer to the splits. The position and order of these enhancers are conserved, with minor exceptions, between the three Drosophila species. Finally, we analyzed the expression patterns of the same three genes in embryos and imaginal discs of four Drosophila species with different Hox-gene organizations. The results show that their expression patterns are conserved despite the HOM-C splits. We conclude that, in Drosophila, Hox-gene clustering is not an absolute requirement for proper function. Rather, the organization of Hox genes is modular, and their clustering seems the result of phylogenetic inertia more than functional necessity. PMID:15867430
A deep learning method for lincRNA detection using auto-encoder algorithm.

PubMed

Yu, Ning; Yu, Zeng; Pan, Yi

2017-12-06

The RNA sequencing technique (RNA-seq) enables scientists to develop novel data-driven methods for discovering more unidentified lincRNAs. Meanwhile, knowledge-based technologies are experiencing a potential revolution ignited by new deep learning methods. By scanning the newly found data set from RNA-seq, scientists have found that: (1) the expression of lincRNAs appears to be regulated, that is, relevance exists along the DNA sequences; (2) lincRNAs contain some conserved patterns/motifs tethered together by non-conserved regions. These two lines of evidence provide the rationale for adopting knowledge-based deep learning methods in lincRNA detection. Similar to coding region transcription, non-coding regions are split at transcriptional sites. However, regulatory RNAs rather than messenger RNAs are generated. That is, the transcribed RNAs participate in the biological process as regulatory units instead of generating proteins. Identifying these transcriptional regions within non-coding regions is the first step towards lincRNA recognition. The auto-encoder method achieves 100% and 92.4% prediction accuracy on transcription sites over the putative data sets. The experimental results also show the excellent performance of the predictive deep neural network on the lincRNA data sets compared with a support vector machine and a traditional neural network. In addition, the method is validated on the newly discovered lincRNA data set, and one unreported transcription site is found by feeding the whole annotated sequences through the deep learning machine, which indicates that the deep learning method has extensive ability for lincRNA prediction. The transcriptional sequences of lincRNAs are collected from the annotated human DNA genome data. Subsequently, a two-layer deep neural network is developed for lincRNA detection, which adopts the auto-encoder algorithm and utilizes different encoding schemes to obtain the best performance over intergenic DNA sequence data. 
Driven by these newly annotated lincRNA data, deep learning methods based on the auto-encoder algorithm can exert their capability in knowledge learning to capture useful features and the information correlation along DNA genome sequences for lincRNA detection. To our knowledge, this is the first application of deep learning techniques to identifying lincRNA transcription sequences.
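The abstract mentions "different encoding schemes" for feeding DNA sequence to the network but does not name them. As an illustration only, the sketch below shows one widely used scheme, one-hot encoding, in which each base becomes a 4-element indicator vector. The function name `one_hot` and the all-zero handling of ambiguous bases such as N are assumptions of this sketch, not details taken from the paper.

```python
def one_hot(seq):
    """Flatten a DNA string into a one-hot vector (4 values per base).

    Order of channels is A, C, G, T. Symbols outside ACGT (for example N)
    are encoded as four zeros, which is an assumption of this sketch.
    """
    table = {
        "A": [1, 0, 0, 0],
        "C": [0, 1, 0, 0],
        "G": [0, 0, 1, 0],
        "T": [0, 0, 0, 1],
    }
    unknown = [0, 0, 0, 0]
    vector = []
    for base in seq.upper():
        vector.extend(table.get(base, unknown))
    return vector


# A fixed-length window of sequence becomes a fixed-length input vector.
window = "ACGTN"
encoded = one_hot(window)
print(len(encoded), encoded)
```

A vector of this kind could serve as the input layer of an auto-encoder; the window length and any alternative encodings (for example k-mer counts) would be separate design choices.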
Noncoding Subgenomic Flavivirus RNA Is Processed by the Mosquito RNA Interference Machinery and Determines West Nile Virus Transmission by Culex pipiens Mosquitoes.

PubMed

Göertz, G P; Fros, J J; Miesen, P; Vogels, C B F; van der Bent, M L; Geertsema, C; Koenraadt, C J M; van Rij, R P; van Oers, M M; Pijlman, G P

2016-11-15

Flaviviruses, such as Zika virus, yellow fever virus, dengue virus, and West Nile virus (WNV), are a serious concern for human health. Flaviviruses produce an abundant noncoding subgenomic flavivirus RNA (sfRNA) in infected cells. sfRNA results from stalling of the host 5'-3' exoribonuclease XRN1/Pacman on conserved RNA structures in the 3' untranslated region (UTR) of the viral genomic RNA. sfRNA production is conserved in insect-specific, mosquito-borne, and tick-borne flaviviruses and flaviviruses with no known vector, suggesting a pivotal role for sfRNA in the flavivirus life cycle. Here, we investigated the function of sfRNA during WNV infection of Culex pipiens mosquitoes and evaluated its role in determining vector competence. An sfRNA1-deficient WNV was generated that displayed growth kinetics similar to those of wild-type WNV in both RNA interference (RNAi)-competent and -compromised mosquito cell lines. Small-RNA deep sequencing of WNV-infected mosquitoes indicated an active small interfering RNA (siRNA)-based antiviral response for both the wild-type and sfRNA1-deficient viruses. Additionally, we provide the first evidence that sfRNA is an RNAi substrate in vivo. Two reproducible small-RNA hot spots within the 3' UTR/sfRNA of the wild-type virus mapped to RNA stem-loops SL-III and 3' SL, which stick out of the three-dimensional (3D) sfRNA structure model. Importantly, we demonstrate that sfRNA-deficient WNV displays significantly decreased infection and transmission rates in vivo when administered via the blood meal. Finally, we show that transmission and infection rates are not affected by sfRNA after intrathoracic injection, thereby identifying sfRNA as a key driver to overcome the mosquito midgut infection barrier. This is the first report to describe a key biological function of sfRNA for flavivirus infection of the arthropod vector, providing an explanation for the strict conservation of sfRNA production. 
Understanding the flavivirus transmission cycle is important to identify novel targets to interfere with disease and to aid development of virus control strategies. Flaviviruses produce an abundant noncoding viral RNA called sfRNA in both arthropod and mammalian cells. To evaluate the role of sfRNA in flavivirus transmission, we infected mosquitoes with the flavivirus West Nile virus and an sfRNA-deficient mutant West Nile virus. We demonstrate that sfRNA determines the infection and transmission rates of West Nile virus in Culex pipiens mosquitoes. Comparison of infection via the blood meal versus intrathoracic injection, which bypasses the midgut, revealed that sfRNA is important to overcome the mosquito midgut barrier. We also show that sfRNA is processed by the antiviral RNA interference machinery in mosquitoes. This is the first report to describe a pivotal biological function of sfRNA in arthropods. The results explain why sfRNA production is evolutionarily conserved. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
Noncoding Subgenomic Flavivirus RNA Is Processed by the Mosquito RNA Interference Machinery and Determines West Nile Virus Transmission by Culex pipiens Mosquitoes

PubMed Central

Göertz, G. P.; Fros, J. J.; Miesen, P.; Vogels, C. B. F.; van der Bent, M. L.; Geertsema, C.; Koenraadt, C. J. M.; van Oers, M. M.

2016-01-01

ABSTRACT Flaviviruses, such as Zika virus, yellow fever virus, dengue virus, and West Nile virus (WNV), are a serious concern for human health. Flaviviruses produce an abundant noncoding subgenomic flavivirus RNA (sfRNA) in infected cells. sfRNA results from stalling of the host 5′-3′ exoribonuclease XRN1/Pacman on conserved RNA structures in the 3′ untranslated region (UTR) of the viral genomic RNA. sfRNA production is conserved in insect-specific, mosquito-borne, and tick-borne flaviviruses and flaviviruses with no known vector, suggesting a pivotal role for sfRNA in the flavivirus life cycle. Here, we investigated the function of sfRNA during WNV infection of Culex pipiens mosquitoes and evaluated its role in determining vector competence. An sfRNA1-deficient WNV was generated that displayed growth kinetics similar to those of wild-type WNV in both RNA interference (RNAi)-competent and -compromised mosquito cell lines. Small-RNA deep sequencing of WNV-infected mosquitoes indicated an active small interfering RNA (siRNA)-based antiviral response for both the wild-type and sfRNA1-deficient viruses. Additionally, we provide the first evidence that sfRNA is an RNAi substrate in vivo. Two reproducible small-RNA hot spots within the 3′ UTR/sfRNA of the wild-type virus mapped to RNA stem-loops SL-III and 3′ SL, which stick out of the three-dimensional (3D) sfRNA structure model. Importantly, we demonstrate that sfRNA-deficient WNV displays significantly decreased infection and transmission rates in vivo when administered via the blood meal. Finally, we show that transmission and infection rates are not affected by sfRNA after intrathoracic injection, thereby identifying sfRNA as a key driver to overcome the mosquito midgut infection barrier. This is the first report to describe a key biological function of sfRNA for flavivirus infection of the arthropod vector, providing an explanation for the strict conservation of sfRNA production. 
IMPORTANCE Understanding the flavivirus transmission cycle is important to identify novel targets to interfere with disease and to aid development of virus control strategies. Flaviviruses produce an abundant noncoding viral RNA called sfRNA in both arthropod and mammalian cells. To evaluate the role of sfRNA in flavivirus transmission, we infected mosquitoes with the flavivirus West Nile virus and an sfRNA-deficient mutant West Nile virus. We demonstrate that sfRNA determines the infection and transmission rates of West Nile virus in Culex pipiens mosquitoes. Comparison of infection via the blood meal versus intrathoracic injection, which bypasses the midgut, revealed that sfRNA is important to overcome the mosquito midgut barrier. We also show that sfRNA is processed by the antiviral RNA interference machinery in mosquitoes. This is the first report to describe a pivotal biological function of sfRNA in arthropods. The results explain why sfRNA production is evolutionarily conserved. PMID:27581979
Ultraconserved regions encoding ncRNAs are altered in human leukemias and carcinomas.

PubMed

Calin, George A; Liu, Chang-gong; Ferracin, Manuela; Hyslop, Terry; Spizzo, Riccardo; Sevignani, Cinzia; Fabbri, Muller; Cimmino, Amelia; Lee, Eun Joo; Wojcik, Sylwia E; Shimizu, Masayoshi; Tili, Esmerina; Rossi, Simona; Taccioli, Cristian; Pichiorri, Flavia; Liu, Xiuping; Zupo, Simona; Herlea, Vlad; Gramantieri, Laura; Lanza, Giovanni; Alder, Hansjuerg; Rassenti, Laura; Volinia, Stefano; Schmittgen, Thomas D; Kipps, Thomas J; Negrini, Massimo; Croce, Carlo M

2007-09-01

Noncoding RNA (ncRNA) transcripts are thought to be involved in human tumorigenesis. We report that a large fraction of genomic ultraconserved regions (UCRs) encode a particular set of ncRNAs whose expression is altered in human cancers. Genome-wide profiling revealed that UCRs have distinct signatures in human leukemias and carcinomas. UCRs are frequently located at fragile sites and genomic regions involved in cancers. We identified certain UCRs whose expression may be regulated by microRNAs abnormally expressed in human chronic lymphocytic leukemia, and we proved that the inhibition of an overexpressed UCR induces apoptosis in colon cancer cells. Our findings argue that ncRNAs and interaction between noncoding genes are involved in tumorigenesis to a greater extent than previously thought.
Post-transcriptional Regulation of Genes Related to Biological Behaviors of Gastric Cancer by Long Noncoding RNAs and MicroRNAs

PubMed Central

Liu, Wenjing; Ma, Rui; Yuan, Yuan

2017-01-01

Noncoding RNAs play critical roles in regulating protein-coding genes and comprise two major classes: long noncoding RNAs (lncRNAs) and microRNAs (miRNAs). LncRNAs regulate gene expression at transcriptional, post-transcriptional, and epigenetic levels via multiple action modes. LncRNAs can also function as endogenous competitive RNAs for miRNAs and indirectly regulate gene expression post-transcriptionally. By binding to the 3'-untranslated regions (3'-UTR) of target genes, miRNAs post-transcriptionally regulate gene expression. Herein, we conducted a review of post-transcriptional regulation by lncRNAs and miRNAs of genes associated with biological behaviors of gastric cancer. PMID:29187891
Different phylogenomic approaches to resolve the evolutionary relationships among model fish species.

PubMed

Negrisolo, Enrico; Kuhl, Heiner; Forcato, Claudio; Vitulo, Nicola; Reinhardt, Richard; Patarnello, Tomaso; Bargelloni, Luca

2010-12-01

Comparative genomics holds the promise to magnify the information obtained from individual genome sequencing projects, revealing common features conserved across genomes and identifying lineage-specific characteristics. To implement such a comparative approach, a robust phylogenetic framework is required to accurately reconstruct evolution at the genome level. Among vertebrate taxa, teleosts represent the second best characterized group, with high-quality draft genome sequences for five model species (Danio rerio, Gasterosteus aculeatus, Oryzias latipes, Takifugu rubripes, and Tetraodon nigroviridis), and several others are in the finishing lane. However, the relationships among the acanthomorph teleost model fishes remain an unresolved taxonomic issue. Here, a genomic region spanning over 1.2 million base pairs was sequenced in the teleost fish Dicentrarchus labrax. Together with genomic data available for the above fish models, the new sequence was used to identify unique orthologous genomic regions shared across all target taxa. Different strategies were applied to produce robust multiple gene and genomic alignments spanning from 11,802 to 186,474 amino acid/nucleotide positions. Ten data sets were analyzed according to Bayesian inference, maximum likelihood, maximum parsimony, and neighbor joining methods. Extensive analyses were performed to explore the influence of several factors (e.g., alignment methodology, substitution model, data set partitions, and long-branch attraction) on the tree topology. Although a general consensus was observed for a closer relationship between G. aculeatus (Gasterosteidae) and Di. labrax (Moronidae) with the atherinomorph O. latipes (Beloniformes) sister taxon of this clade, with the tetraodontiform group Ta. rubripes and Te. nigroviridis (Tetraodontiformes) representing a more distantly related taxon among acanthomorph model fish species, conflicting results were obtained between data sets and methods, especially with respect to the choice of alignment methodology applied to noncoding parts of the genomic region under study. This may limit the use of intergenic/noncoding sequences in phylogenomics until more robust alignment algorithms are developed.
The PRC2-binding long non-coding RNAs in human and mouse genomes are associated with predictive sequence features

NASA Astrophysics Data System (ADS)

Tu, Shiqi; Yuan, Guo-Cheng; Shao, Zhen

2017-01-01

Recently, long non-coding RNAs (lncRNAs) have emerged as an important class of molecules involved in many cellular processes. One of their primary functions is to shape the epigenetic landscape through interactions with chromatin-modifying proteins. However, mechanisms contributing to the specificity of such interactions remain poorly understood. Here we took the human and mouse lncRNAs that were experimentally determined to have physical interactions with Polycomb repressive complex 2 (PRC2), and systematically investigated the sequence features of these lncRNAs by developing a new computational pipeline for sequence composition analysis, in which each sequence is considered as a series of transitions between adjacent nucleotides. Through that, PRC2-binding lncRNAs were found to be associated with a set of distinctive and evolutionarily conserved sequence features, which can be utilized to distinguish them from the others with considerable accuracy. We further identified fragments of PRC2-binding lncRNAs that are enriched with these sequence features, and found they show strong PRC2-binding signals and are more highly conserved across species than the other parts, implying their functional importance.
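The idea of treating a sequence as a series of transitions between adjacent nucleotides can be illustrated with a small sketch. This is not the authors' pipeline: the function name `transition_frequencies`, the RNA alphabet, and the choice to normalise each row to a conditional frequency are all assumptions made here for illustration.

```python
from collections import Counter

ALPHABET = "ACGU"  # assumed RNA alphabet; T is mapped to U below


def transition_frequencies(seq):
    """Return {(a, b): frequency of b following a} for adjacent nucleotides.

    Hypothetical helper: each row (fixed first nucleotide a) is normalised
    to sum to 1; rows with no observations are left at 0.0.
    """
    seq = seq.upper().replace("T", "U")
    # Count every adjacent pair, skipping pairs with ambiguous symbols.
    counts = Counter(
        (a, b)
        for a, b in zip(seq, seq[1:])
        if a in ALPHABET and b in ALPHABET
    )
    freqs = {}
    for a in ALPHABET:
        row_total = sum(counts[(a, b)] for b in ALPHABET)
        for b in ALPHABET:
            freqs[(a, b)] = counts[(a, b)] / row_total if row_total else 0.0
    return freqs
```

The resulting 16 values per sequence could serve as a feature vector for a classifier; whether the published method uses this normalisation is not stated in the abstract.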
Conserved Non-Coding Regulatory Signatures in Arabidopsis Co-Expressed Gene Modules

PubMed Central

Spangler, Jacob B.; Ficklin, Stephen P.; Luo, Feng; Freeling, Michael; Feltus, F. Alex

2012-01-01

Complex traits and other polygenic processes require coordinated gene expression. Co-expression networks model mRNA co-expression: the product of gene regulatory networks. To identify regulatory mechanisms underlying coordinated gene expression in a tissue-enriched context, ten Arabidopsis thaliana co-expression networks were constructed after manually sorting 4,566 RNA profiling datasets into aerial, flower, leaf, root, rosette, seedling, seed, shoot, whole plant, and global (all samples combined) groups. Collectively, the ten networks contained 30% of the measurable genes of Arabidopsis and were circumscribed into 5,491 modules. Modules were scrutinized for cis regulatory mechanisms putatively encoded in conserved non-coding sequences (CNSs) previously identified as remnants of a whole genome duplication event. We determined the non-random association of 1,361 unique CNSs to 1,904 co-expression network gene modules. Furthermore, the CNS elements were placed in the context of known gene regulatory networks (GRNs) by connecting 250 CNS motifs with known GRN cis elements. Our results provide support for a regulatory role of some CNS elements and suggest the functional consequences of CNS activation of co-expression in specific gene sets dispersed throughout the genome. PMID:23024789
Transcriptomic analysis of the mussel Elliptio complanata identifies candidate stress-response genes and an abundance of novel or noncoding transcripts

USGS Publications Warehouse

Cornman, Robert S.; Robertson, Laura S.; Galbraith, Heather S.; Blakeslee, Carrie J.

2014-01-01

Mussels are useful indicator species of environmental stress and degradation, and the global decline in freshwater mussel diversity and abundance is of conservation concern. Elliptio complanata is a common freshwater mussel of eastern North America that can serve both as an indicator and as an experimental model for understanding mussel physiology and genetics. To support genetic components of these research goals, we assembled transcriptome contigs from Illumina paired-end reads. Despite efforts to collapse similar contigs, the final assembly was in excess of 136,000 contigs with an N50 of 982 bp. Even so, comparisons to the CEGMA database of conserved eukaryotic genes indicated that ∼20% of genes remain unrepresented. However, numerous candidate stress-response genes were present, and we identified lineage-specific patterns of diversification among molluscs for cytochrome P450 detoxification genes and two saccharide-modifying enzymes: 1,3 beta-galactosyltransferase and fucosyltransferase. Less than a quarter of contigs had protein-level similarity based on modest BLAST and Hmmer3 statistical thresholds. These results add comparative genomic resources for molluscs and suggest a wealth of novel proteins and noncoding transcripts.
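For readers unfamiliar with the N50 statistic quoted above, the following sketch shows one common way to compute it: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly length. The function name `n50` is an assumption, and the abstract does not say which tool or exact definition the authors used.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")
```

For example, `n50([5, 4, 3, 2, 1])` returns 4, because the two longest contigs (5 + 4 = 9) are the first to cover half of the 15 bp total.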
Widespread Long Noncoding RNAs as Endogenous Target Mimics for MicroRNAs in Plants

PubMed Central

Wu, Hua-Jun; Wang, Zhi-Min; Wang, Meng; Wang, Xiu-Jie

2013-01-01

Target mimicry is a recently identified regulatory mechanism for microRNA (miRNA) functions in plants in which the decoy RNAs bind to miRNAs via complementary sequences and therefore block the interaction between miRNAs and their authentic targets. Both endogenous decoy RNAs (miRNA target mimics) and engineered artificial RNAs can induce target mimicry effects. Yet until now, only the Induced by Phosphate Starvation1 RNA has been proven to be a functional endogenous microRNA target mimic (eTM). In this work, we developed a computational method and systematically identified intergenic or noncoding gene-originated eTMs for 20 conserved miRNAs in Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa). The predicted miRNA binding sites were well conserved among eTMs of the same miRNA, whereas sequences outside of the binding sites varied a lot. We proved that the eTMs of miR160 and miR166 are functional target mimics and identified their roles in the regulation of plant development. The effectiveness of eTMs for three other miRNAs was also confirmed by transient agroinfiltration assay. PMID:23429259
A Multi-Platform Draft de novo Genome Assembly and Comparative Analysis for the Scarlet Macaw (Ara macao)

PubMed Central

Seabury, Christopher M.; Dowd, Scot E.; Seabury, Paul M.; Raudsepp, Terje; Brightsmith, Donald J.; Liboriussen, Poul; Halley, Yvette; Fisher, Colleen A.; Owens, Elaine; Viswanathan, Ganesh; Tizard, Ian R.

2013-01-01

Data deposition to NCBI Genomes: This Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession AMXX00000000 (SMACv1.0, unscaffolded genome assembly). The version described in this paper is the first version (AMXX01000000). The scaffolded assembly (SMACv1.1) has been deposited at DDBJ/EMBL/GenBank under the accession AOUJ00000000, and is also the first version (AOUJ01000000). Strong biological interest in traits such as the acquisition and utilization of speech, cognitive abilities, and longevity catalyzed the utilization of two next-generation sequencing platforms to provide the first-draft de novo genome assembly for the large, new world parrot Ara macao (Scarlet Macaw). Despite the challenges associated with genome assembly for an outbred avian species, including 951,507 high-quality putative single nucleotide polymorphisms, the final genome assembly (>1.035 Gb) includes more than 997 Mb of unambiguous sequence data (excluding N’s). Cytogenetic analyses including ZooFISH revealed complex rearrangements associated with two scarlet macaw macrochromosomes (AMA6, AMA7), which supports the hypothesis that translocations, fusions, and intragenomic rearrangements are key factors associated with karyotype evolution among parrots. In silico annotation of the scarlet macaw genome provided robust evidence for 14,405 nuclear gene annotation models, their predicted transcripts and proteins, and a complete mitochondrial genome. Comparative analyses involving the scarlet macaw, chicken, and zebra finch genomes revealed high levels of nucleotide-based conservation as well as evidence for overall genome stability among the three highly divergent species. Application of a new whole-genome analysis of divergence involving all three species yielded prioritized candidate genes and noncoding regions for parrot traits of interest (i.e., speech, intelligence, longevity) which were independently supported by the results of previous human GWAS studies. We also observed evidence for genes and noncoding loci that displayed extreme conservation across the three avian lineages, thereby reflecting their likely biological and developmental importance among birds. PMID:23667475

The complete chloroplast genome of Cinnamomum camphora and its comparison with related Lauraceae species.

PubMed

Chen, Caihui; Zheng, Yongjie; Liu, Sian; Zhong, Yongda; Wu, Yanfang; Li, Jiang; Xu, Li-An; Xu, Meng

2017-01-01

Cinnamomum camphora, a member of the Lauraceae family, is a valuable aromatic and timber tree that is indigenous to the south of China and Japan. All parts of Cinnamomum camphora have secretory cells containing different volatile chemical compounds that are utilized as herbal medicines and essential oils. Here, we report the complete sequencing of the chloroplast genome of Cinnamomum camphora using Illumina technology. The chloroplast genome of Cinnamomum camphora is 152,570 bp in length and characterized by a relatively conserved quadripartite structure containing a large single copy region of 93,705 bp, a small single copy region of 19,093 bp and two inverted repeat (IR) regions of 19,886 bp each. Overall, the genome contained 123 coding regions, of which 15 were repeated in the IR regions. An analysis of chloroplast sequence divergence revealed that the small single copy region was highly variable among the different genera in the Lauraceae family. A total of 40 repeat structures and 83 simple sequence repeats were detected in both the coding and non-coding regions. A phylogenetic analysis indicated that Calycanthus is most closely related to Lauraceae, both being members of Laurales, which forms a sister group to Magnoliids. The complete sequence of the chloroplast of Cinnamomum camphora will aid in in-depth taxonomical studies of the Lauraceae family in the future. The genetic sequence information will also have valuable applications for chloroplast genetic engineering.
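The region lengths reported above can be checked against the reported genome length. The sketch below assumes that each of the two inverted repeat copies has the stated length of 19,886 bp; the function name `quadripartite_length` is introduced here only for illustration.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total plastome length: one LSC, one SSC, and two IR copies (all in bp)."""
    return lsc + ssc + 2 * ir


# Values taken from the abstract above.
total = quadripartite_length(lsc=93_705, ssc=19_093, ir=19_886)
print(total)  # 152570, matching the reported genome length
```

Under that assumption the four regions sum exactly to the reported 152,570 bp.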
The SHOX region and its mutations.

PubMed

Capone, L; Iughetti, L; Sabatini, S; Bacciaglia, A; Forabosco, A

2010-06-01

The short stature homeobox-containing (SHOX) gene lies in the pseudoautosomal region 1 (PAR1) that comprises 2.6 Mb of the short-arm tips of both the X and Y chromosomes. Heterozygous SHOX mutations cause Leri-Weill dyschondrosteosis (LWD) (OMIM #127300), while homozygous mutations cause a severe form of dwarfism known as Langer mesomelic dysplasia (LMD) (OMIM #249700). The analysis of 238 LWD patients between 1998 and 2007 by multiple authors shows that deletions (46.4%) are more prevalent than point mutations (21.2%). On the whole, deletions and point mutations account for about 67% of LWD patients. SHOX is located within a 1000 kb gene desert. The comparative genomic analysis of this region between genomes of different vertebrates has led to the identification of evolutionarily conserved non-coding DNA elements (CNE). Further functional studies have shown that one of these CNE downstream of the SHOX gene is necessary for the expression of SHOX; this is considered to be typical "enhancer" activity. Even when the enhancer is included, mutations of the SHOX region do not account for 100% of LWD cases. Various authors have demonstrated the existence of other CNE both downstream and upstream of the SHOX region. The resulting conclusion is that it is necessary to reanalyze all LWD/LMD patients without SHOX mutations for the presence of mutations in the 5'- and 3'-flanking SHOX regions.
An imprinted non-coding genomic cluster at 14q32 defines clinically relevant molecular subtypes in osteosarcoma across multiple independent datasets.

PubMed

Hill, Katherine E; Kelly, Andrew D; Kuijjer, Marieke L; Barry, William; Rattani, Ahmed; Garbutt, Cassandra C; Kissick, Haydn; Janeway, Katherine; Perez-Atayde, Antonio; Goldsmith, Jeffrey; Gebhardt, Mark C; Arredouani, Mohamed S; Cote, Greg; Hornicek, Francis; Choy, Edwin; Duan, Zhenfeng; Quackenbush, John; Haibe-Kains, Benjamin; Spentzos, Dimitrios

2017-05-15

A microRNA (miRNA) collection on the imprinted 14q32 MEG3 region has been associated with outcome in osteosarcoma. We assessed the clinical utility of this miRNA set and their association with methylation status. We integrated coding and non-coding RNA data from three independent annotated clinical osteosarcoma cohorts (n = 65, n = 27, and n = 25) and miRNA and methylation data from one in vitro (19 cell lines) and one clinical (NCI Therapeutically Applicable Research to Generate Effective Treatments (TARGET) osteosarcoma dataset, n = 80) dataset. We used time-dependent receiver operating characteristic (tdROC) analysis to evaluate the clinical value of candidate miRNA profiles and machine learning approaches to compare the coding and non-coding transcriptional programs of high- and low-risk osteosarcoma tumors and high- versus low-aggressiveness cell lines. In the cell line and TARGET datasets, we also studied the methylation patterns of the MEG3 imprinting control region on 14q32 and their association with miRNA expression and tumor aggressiveness. In the tdROC analysis, miRNA sets on 14q32 showed strong discriminatory power for recurrence and survival in the three clinical datasets. High- or low-risk tumor classification was robust to using different microRNA sets or classification methods. Machine learning approaches showed that genome-wide miRNA profiles and miRNA regulatory networks were quite different between the two outcome groups and mRNA profiles categorized the samples in a manner concordant with the miRNAs, suggesting potential molecular subtypes. Further, miRNA expression patterns were reproducible in comparing high-aggressiveness versus low-aggressiveness cell lines. Methylation patterns in the MEG3 differentially methylated region (DMR) also distinguished high-aggressiveness from low-aggressiveness cell lines and were associated with expression of several 14q32 miRNAs in both the cell lines and the large TARGET clinical dataset. Within the limits of available CpG array coverage, we observed a potential methylation-sensitive regulation of the non-coding RNA cluster by CTCF, a known enhancer-blocking factor. Loss of imprinting/methylation changes in the 14q32 non-coding region defines reproducible previously unrecognized osteosarcoma subtypes with distinct transcriptional programs and biologic and clinical behavior. Future studies will define the precise relationship between 14q32 imprinting, non-coding RNA expression, genomic enhancer binding, and tumor aggressiveness, with possible therapeutic implications for both early- and advanced-stage patients.
Monocyte-specific Accessibility of a Matrix Attachment Region in the Tumor Necrosis Factor Locus

PubMed Central

Biglione, Sebastian; Tsytsykova, Alla V.; Goldfeld, Anne E.

2011-01-01

Regulation of TNF gene expression is cell type- and stimulus-specific. We have previously identified highly conserved noncoding regulatory elements within DNase I-hypersensitive sites (HSS) located 9 kb upstream (HSS−9) and 3 kb downstream (HSS+3) of the TNF gene, which play an important role in the transcriptional regulation of TNF in T cells. They act as enhancers and interact with the TNF promoter and with each other, generating a higher order chromatin structure. Here, we report a novel monocyte-specific AT-rich DNase I-hypersensitive element located 7 kb upstream of the TNF gene (HSS−7), which serves as a matrix attachment region in monocytes. We show that HSS−7 associates with topoisomerase IIα (Top2) in vivo and that induction of endogenous TNF mRNA expression is suppressed by etoposide, a Top2 inhibitor. Moreover, Top2 binds to and cleaves HSS−7 in in vitro analysis. Thus, HSS−7, which is selectively accessible in monocytes, can tether the TNF locus to the nuclear matrix via matrix attachment region formation, potentially promoting TNF gene expression by acting as a Top2 substrate. PMID:22027829
Computational approach for elucidating interactions of cross-species miRNAs and their targets in Flaviviruses.

PubMed

Shinde, Santosh P; Banerjee, Amit Kumar; Arora, Neelima; Murty, U S N; Sripathi, Venkateswara Rao; Pal-Bhadra, Manika; Bhadra, Utpal

2015-03-01

Combating viral diseases has been a challenging task since time immemorial. Available molecular approaches are limited and not very effective for this daunting task. MicroRNA-based therapies have shown promise in recent times. MicroRNAs are tiny non-coding RNAs that mediate translational repression of target mRNAs in a highly specific manner. In this study, we have determined the target regions for human and viral microRNAs in the conserved genomic regions of selected viruses of the Flaviviridae family using miRanda and performed a comparative target selectivity analysis among them. Specific target regions were determined and compared extensively by exploring their positions to determine their proximity to one another. Based on the multiplicity and cooperativity analysis, interaction maps were developed manually to represent the interactions between top-ranking miRNAs and genomes of the viruses considered in this study. A self-organizing map (SOM) was used to cluster the best-ranked microRNAs based on key physicochemical properties. This study provides insight into the interactions of viral and human microRNAs with the selected Flaviviridae genomes and will help to identify cross-species microRNA targets on the viral genome.
Cancer-specific SNPs originate from low-level heteroplasmic variants in human mitochondrial genomes of a matched cell line pair.

PubMed

Hedberg, Annica; Knutsen, Erik; Løvhaugen, Anne Silje; Jørgensen, Tor Erik; Perander, Maria; Johansen, Steinar D

2018-04-19

Low-level mitochondrial heteroplasmy is a common phenomenon in both normal and cancer cells. Here, we investigate the link between low-level heteroplasmy and mitogenome mutations in a human breast cancer matched cell line by high-throughput sequencing. We identified 23 heteroplasmic sites, of which 15 were common between normal cells (Hs578Bst) and cancer cells (Hs578T). Most sites were clustered within the highly conserved Complex IV and ribosomal RNA genes. Two heteroplasmic variants in normal cells were found as fixed mutations in cancer cells. This indicates a positive selection of these variants in cancer cells. RNA-Seq analysis identified upregulated L-strand specific transcripts in cancer cells, which include three mitochondrial long non-coding RNA molecules. We hypothesize that this is due to two cancer cell-specific mutations in the control region.
Enhancer Evolution across 20 Mammalian Species

PubMed Central

Villar, Diego; Berthelot, Camille; Aldridge, Sarah; Rayner, Tim F.; Lukk, Margus; Pignatelli, Miguel; Park, Thomas J.; Deaville, Robert; Erichsen, Jonathan T.; Jasinska, Anna J.; Turner, James M.A.; Bertelsen, Mads F.; Murchison, Elizabeth P.; Flicek, Paul; Odom, Duncan T.

2015-01-01

Summary The mammalian radiation has corresponded with rapid changes in noncoding regions of the genome, but we lack a comprehensive understanding of regulatory evolution in mammals. Here, we track the evolution of promoters and enhancers active in liver across 20 mammalian species from six diverse orders by profiling genomic enrichment of H3K27 acetylation and H3K4 trimethylation. We report that rapid evolution of enhancers is a universal feature of mammalian genomes. Most of the recently evolved enhancers arise from ancestral DNA exaptation, rather than lineage-specific expansions of repeat elements. In contrast, almost all liver promoters are partially or fully conserved across these species. Our data further reveal that recently evolved enhancers can be associated with genes under positive selection, demonstrating the power of this approach for annotating regulatory adaptations in genomic sequences. These results provide important insight into the functional genetics underpinning mammalian regulatory evolution. PMID:25635462
Long-range evolutionary constraints reveal cis-regulatory interactions on the human X chromosome

PubMed Central

Naville, Magali; Ishibashi, Minaka; Ferg, Marco; Bengani, Hemant; Rinkwitz, Silke; Krecsmarik, Monika; Hawkins, Thomas A.; Wilson, Stephen W.; Manning, Elizabeth; Chilamakuri, Chandra S. R.; Wilson, David I.; Louis, Alexandra; Lucy Raymond, F.; Rastegar, Sepand; Strähle, Uwe; Lenhard, Boris; Bally-Cuif, Laure; van Heyningen, Veronica; FitzPatrick, David R.; Becker, Thomas S.; Roest Crollius, Hugues

2015-01-01

Enhancers can regulate the transcription of genes over long genomic distances. This is thought to lead to selection against genomic rearrangements within such regions that may disrupt this functional linkage. Here we test this concept experimentally using the human X chromosome. We describe a scoring method to identify evolutionary maintenance of linkage between conserved noncoding elements and neighbouring genes. Chromatin marks associated with enhancer function are strongly correlated with this linkage score. We test >1,000 putative enhancers by transgenesis assays in zebrafish to ascertain the identity of the target gene. The majority of active enhancers drive a transgenic expression in a pattern consistent with the known expression of a linked gene. These results show that evolutionary maintenance of linkage is a reliable predictor of an enhancer's function, and provide new information to discover the genetic basis of diseases caused by the mis-regulation of gene expression. PMID:25908307
A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region.

PubMed

Kress, W John; Erickson, David L

2007-06-06

A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species-level. Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species. A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination.
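The comparison of single loci against locus pairs described above can be illustrated with a minimal sketch. It assumes that a genus counts as "discriminated" when the two sampled species differ at one or more of the chosen loci, and that the sequences are already aligned. The abstract does not state the authors' exact criterion, so both the rule and the function name `fraction_discriminated` are hypothetical, and the toy sequences are invented for illustration only.

```python
def fraction_discriminated(pairs_by_genus, loci):
    """Fraction of genera whose species pair differs at one or more of `loci`.

    pairs_by_genus maps genus -> {locus: (seq_species_1, seq_species_2)}.
    Assumption: any sequence difference at a locus counts as discrimination.
    """
    if not pairs_by_genus:
        raise ValueError("at least one genus is required")
    resolved = 0
    for by_locus in pairs_by_genus.values():
        # A genus is resolved if any selected locus separates the two species.
        if any(
            by_locus[locus][0] != by_locus[locus][1]
            for locus in loci
            if locus in by_locus
        ):
            resolved += 1
    return resolved / len(pairs_by_genus)


# Invented toy data (not from the study): three genera, two loci each.
toy = {
    "GenusA": {"rbcL": ("ACGT", "ACGT"), "trnH-psbA": ("TTGA", "TTGC")},
    "GenusB": {"rbcL": ("ACGT", "ACGA"), "trnH-psbA": ("TTGA", "TTGA")},
    "GenusC": {"rbcL": ("ACGT", "ACGT"), "trnH-psbA": ("TTGA", "TTGA")},
}

single_rbcl = fraction_discriminated(toy, ["rbcL"])
single_spacer = fraction_discriminated(toy, ["trnH-psbA"])
two_locus = fraction_discriminated(toy, ["rbcL", "trnH-psbA"])
```

In this toy set each locus alone resolves one genus of three, while the pair resolves two of three, which mirrors the qualitative pattern reported (a locus pair outperforming any single locus) without reproducing the study's numbers.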
A regional approach to plant DNA barcoding provides high species resolution of sedges (Carex and Kobresia, Cyperaceae) in the Canadian Arctic Archipelago.

PubMed

Clerc-Blain, Jessica L E; Starr, Julian R; Bull, Roger D; Saarela, Jeffery M

2010-01-01

Previous research on barcoding sedges (Carex) suggested that basic searches within a global barcoding database would probably not resolve more than 60% of the world's approximately 2000 species. In this study, we take an alternative approach and explore the performance of plant DNA barcoding in the Carex lineage from an explicitly regional perspective. We characterize the utility of a subset of the proposed protein-coding and noncoding plastid barcoding regions (matK, rpoB, rpoC1, rbcL, atpF-atpH, psbK-psbI) for distinguishing species of Carex and Kobresia in the Canadian Arctic Archipelago, a clearly defined eco-geographical region representing 1% of the Earth's landmass. Our results show that matK resolves the greatest number of species of any single locus (95%), and when combined in a two-locus barcode, it provides 100% species resolution in all but one combination (matK + atpFH) in unweighted pair-group method with arithmetic mean (UPGMA) analyses. Noncoding regions were equally or more variable than matK, but as single markers they resolve substantially fewer taxa than matK alone. When difficulties with sequencing and alignment due to microstructural variation in noncoding regions are also considered, our results support other studies in suggesting that protein-coding regions are more practical as barcoding markers. Plastid DNA barcodes are an effective identification tool for species of Carex and Kobresia in the Canadian Arctic Archipelago, a region where the number of co-existing closely related species is limited. We suggest that if a regional approach to plant DNA barcoding were applied on a global scale, it could provide a solution to the generally poor species resolution seen in previous barcoding studies. © 2009 Blackwell Publishing Ltd.
Transcriptome interrogation of human myometrium identifies differentially expressed sense-antisense pairs of protein-coding and long non-coding RNA genes in spontaneous labor at term.

PubMed

Romero, Roberto; Tarca, Adi L; Chaemsaithong, Piya; Miranda, Jezid; Chaiworapongsa, Tinnakorn; Jia, Hui; Hassan, Sonia S; Kalita, Cynthia A; Cai, Juan; Yeo, Lami; Lipovich, Leonard

2014-09-01

This study aimed to identify differentially expressed long non-coding RNA (lncRNA) genes in human myometrium in women with spontaneous labor at term. Myometrium was obtained from women undergoing cesarean deliveries who were not in labor (n = 19) and women in spontaneous labor at term (n = 20). RNA was extracted and profiled using an Illumina® microarray platform. We used computational approaches to bound the extent of long non-coding RNA representation on this platform, and to identify co-differentially expressed and correlated pairs of long non-coding RNA genes and protein-coding genes sharing the same genomic loci. We identified co-differential expression and correlation at two genomic loci that contain coding-lncRNA gene pairs, SOCS2-AK054607 and LMCD1-NR_024065, in women in spontaneous labor at term. This co-differential expression and correlation was validated by qRT-PCR, an experimental method completely independent of the microarray analysis. Intriguingly, one of the two lncRNA genes differentially expressed in term labor had a key genomic structure element, a splice site, that lacked evolutionary conservation beyond primates. We provide, for the first time, evidence for coordinated differential expression and correlation of cis-encoded antisense lncRNAs and protein-coding genes with known as well as novel roles in pregnancy in the myometrium of women in spontaneous labor at term.
Systematic analysis and evolution of 5S ribosomal DNA in metazoans.

PubMed

Vierna, J; Wehner, S; Höner zu Siederdissen, C; Martínez-Lage, A; Marz, M

2013-11-01

Several studies on 5S ribosomal DNA (5S rDNA) have been focused on a subset of the following features in mostly one organism: number of copies, pseudogenes, secondary structure, promoter and terminator characteristics, genomic arrangements, types of non-transcribed spacers and evolution. In this work, we systematically analyzed 5S rDNA sequence diversity in available metazoan genomes, and showed organism-specific and evolutionary-conserved features. Putatively functional sequences (12,766) from 97 organisms allowed us to identify general features of this multigene family in animals. Interestingly, we show that each mammal species has a highly conserved (housekeeping) 5S rRNA type and many variable ones. The genomic organization of 5S rDNA is still under debate. Here, we report the occurrence of several paralog 5S rRNA sequences in 58 of the examined species, and a flexible genome organization of 5S rDNA in animals. We found heterogeneous 5S rDNA clusters in several species, supporting the hypothesis of an exchange of 5S rDNA from one locus to another. A rather high degree of variation of upstream, internal and downstream putative regulatory regions appears to characterize metazoan 5S rDNA. We systematically studied the internal promoters and described three different types of termination signals, as well as variable distances between the coding region and the typical termination signal. Finally, we present a statistical method for detection of linkage among noncoding RNA (ncRNA) gene families. This method showed no evolutionary-conserved linkage among 5S rDNAs and any other ncRNA genes within Metazoa, even though we found 5S rDNA to be linked to various ncRNAs in several clades.
Systematic analysis and evolution of 5S ribosomal DNA in metazoans

PubMed Central

Vierna, J; Wehner, S; Höner zu Siederdissen, C; Martínez-Lage, A; Marz, M

2013-01-01

Several studies on 5S ribosomal DNA (5S rDNA) have been focused on a subset of the following features in mostly one organism: number of copies, pseudogenes, secondary structure, promoter and terminator characteristics, genomic arrangements, types of non-transcribed spacers and evolution. In this work, we systematically analyzed 5S rDNA sequence diversity in available metazoan genomes, and showed organism-specific and evolutionary-conserved features. Putatively functional sequences (12 766) from 97 organisms allowed us to identify general features of this multigene family in animals. Interestingly, we show that each mammal species has a highly conserved (housekeeping) 5S rRNA type and many variable ones. The genomic organization of 5S rDNA is still under debate. Here, we report the occurrence of several paralog 5S rRNA sequences in 58 of the examined species, and a flexible genome organization of 5S rDNA in animals. We found heterogeneous 5S rDNA clusters in several species, supporting the hypothesis of an exchange of 5S rDNA from one locus to another. A rather high degree of variation of upstream, internal and downstream putative regulatory regions appears to characterize metazoan 5S rDNA. We systematically studied the internal promoters and described three different types of termination signals, as well as variable distances between the coding region and the typical termination signal. Finally, we present a statistical method for detection of linkage among noncoding RNA (ncRNA) gene families. This method showed no evolutionary-conserved linkage among 5S rDNAs and any other ncRNA genes within Metazoa, even though we found 5S rDNA to be linked to various ncRNAs in several clades. PMID:23838690
Evolutionary conservation and expression of miR-10a-3p in olive flounder and rock bream.

PubMed

Jo, Ara; Im, Jennifer; Lee, Hee-Eun; Jang, Dongmin; Nam, Gyu-Hwi; Mishra, Anshuman; Kim, Woo-Jin; Kim, Won; Cha, Hee-Jae; Kim, Heui-Soo

2017-09-10

MicroRNAs (miRNAs) are small non-coding RNAs (ncRNAs) that mainly bind to the seed sequences located within the 3' untranslated region (3' UTR) of target genes. They perform an important biological function as regulators of gene expression. Different genes can be regulated by the same miRNA, whilst different miRNAs can be regulated by the same genes. Here, the evolutionary conservation and expression pattern of miR-10a-3p in olive flounder and rock bream were examined. Binding sites (AAAUUC) for the seed region in the 3' UTR of target genes were highly conserved in various species. The expression pattern of miR-10a-3p was ubiquitous in the examined tissues, whilst its expression level was decreased in gill tissues infected by viral hemorrhagic septicemia virus (VHSV) compared to the normal control. In the case of rock bream, the spleen, kidney, and liver tissues showed dominant expression levels of miR-10a-3p. Only the liver tissues in the rock bream samples infected by the iridovirus indicated a dominant miR-10a-3p expression. The gene ontology (GO) analysis of predicted target genes for miR-10a-3p revealed that multiple genes are related to binding activity, catalytic activity, and cell components, as well as cellular and metabolic processes. Overall, the results imply that miR-10a-3p could be used as a biomarker to detect VHSV infection in olive flounder and iridovirus infection in rock bream. In addition, the data provide fundamental information for further study of the complex interaction between miR-10a-3p and gene expression. Copyright © 2017 Elsevier B.V. All rights reserved.
Identification and characterization of a class of MALAT1-like genomic loci

DOE PAGES

Zhang, Bin; Mao, Yuntao S.; Diermeier, Sarah D.; ...

2017-05-23

The MALAT1 (Metastasis-Associated Lung Adenocarcinoma Transcript 1) gene encodes a noncoding RNA that is processed into a long nuclear retained transcript (MALAT1) and a small cytoplasmic tRNA-like transcript (mascRNA). Using an RNA sequence- and structure-based covariance model, we identified more than 130 genomic loci in vertebrate genomes containing the MALAT1 3' end triple-helix structure and its immediate downstream tRNA-like structure, including 44 in the green lizard Anolis carolinensis. Structural and computational analyses revealed a co-occurrence of components of the 3' end module. MALAT1-like genes in Anolis carolinensis are highly expressed in adult testis, thus we named them testis-abundant long noncoding RNAs (tancRNAs). MALAT1-like loci also produce multiple small RNA species, including PIWI-interacting RNAs (piRNAs), from the antisense strand. The 3' ends of tancRNAs serve as potential targets for the PIWI-piRNA complex. Furthermore, we have identified an evolutionarily conserved class of long noncoding RNAs (lncRNAs) with similar structural constraints, post-transcriptional processing, and subcellular localization and a distinct function in spermatocytes.
Identification and characterization of a class of MALAT1-like genomic loci

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Bin; Mao, Yuntao S.; Diermeier, Sarah D.

The MALAT1 (Metastasis-Associated Lung Adenocarcinoma Transcript 1) gene encodes a noncoding RNA that is processed into a long nuclear retained transcript (MALAT1) and a small cytoplasmic tRNA-like transcript (mascRNA). Using an RNA sequence- and structure-based covariance model, we identified more than 130 genomic loci in vertebrate genomes containing the MALAT1 3' end triple-helix structure and its immediate downstream tRNA-like structure, including 44 in the green lizard Anolis carolinensis. Structural and computational analyses revealed a co-occurrence of components of the 3' end module. MALAT1-like genes in Anolis carolinensis are highly expressed in adult testis, thus we named them testis-abundant long noncoding RNAs (tancRNAs). MALAT1-like loci also produce multiple small RNA species, including PIWI-interacting RNAs (piRNAs), from the antisense strand. The 3' ends of tancRNAs serve as potential targets for the PIWI-piRNA complex. Furthermore, we have identified an evolutionarily conserved class of long noncoding RNAs (lncRNAs) with similar structural constraints, post-transcriptional processing, and subcellular localization and a distinct function in spermatocytes.
Dynamic and Widespread lncRNA Expression in a Sponge and the Origin of Animal Complexity

PubMed Central

Gaiti, Federico; Fernandez-Valverde, Selene L.; Nakanishi, Nagayasu; Calcino, Andrew D.; Yanai, Itai; Tanurdzic, Milos; Degnan, Bernard M.

2015-01-01

Long noncoding RNAs (lncRNAs) are important developmental regulators in bilaterian animals. A correlation has been claimed between the lncRNA repertoire expansion and morphological complexity in vertebrate evolution. However, this claim has not been tested by examining morphologically simple animals. Here, we undertake a systematic investigation of lncRNAs in the demosponge Amphimedon queenslandica, a morphologically simple, early-branching metazoan. We combine RNA-Seq data across multiple developmental stages of Amphimedon with a filtering pipeline to conservatively predict 2,935 lncRNAs. These include intronic overlapping lncRNAs, exonic antisense overlapping lncRNAs, long intergenic nonprotein coding RNAs, and precursors for small RNAs. Sponge lncRNAs are remarkably similar to their bilaterian counterparts in being relatively short with few exons and having low primary sequence conservation relative to protein-coding genes. As in bilaterians, a majority of sponge lncRNAs exhibit typical hallmarks of regulatory molecules, including high temporal specificity and dynamic developmental expression. Specific lncRNA expression profiles correlate tightly with conserved protein-coding genes likely involved in a range of developmental and physiological processes, such as the Wnt signaling pathway. Although the majority of Amphimedon lncRNAs appears to be taxonomically restricted with no identifiable orthologs, we find a few cases of conservation between demosponges in lncRNAs that are antisense to coding sequences. Based on the high similarity in the structure, organization, and dynamic expression of sponge lncRNAs to their bilaterian counterparts, we propose that these noncoding RNAs are an ancient feature of the metazoan genome. These results are consistent with lncRNAs regulating the development of animals, regardless of their level of morphological complexity. PMID:25976353
microRNA Therapeutics in Cancer - An Emerging Concept.

PubMed

Shah, Maitri Y; Ferrajoli, Alessandra; Sood, Anil K; Lopez-Berestein, Gabriel; Calin, George A

2016-10-01

MicroRNAs (miRNAs) are an evolutionarily conserved class of small, regulatory non-coding RNAs that negatively regulate the expression of protein-coding genes and other non-coding transcripts. miRNAs have been established as master regulators of cellular processes, and they play a vital role in tumor initiation, progression and metastasis. Further, widespread deregulation of microRNAs has been reported in several cancers, with several microRNAs playing oncogenic and tumor suppressive roles. Based on these findings, miRNAs have emerged as promising therapeutic tools for cancer management. In this review, we have focused on the roles of miRNAs in tumorigenesis, the miRNA-based therapeutic strategies currently being evaluated for use in cancer, and the advantages and current challenges to their use in the clinic. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Characterization of noncoding regulatory DNA in the human genome.

PubMed

Elkon, Ran; Agami, Reuven

2017-08-08

Genetic variants associated with common diseases are usually located in noncoding parts of the human genome. Delineation of the full repertoire of functional noncoding elements, together with efficient methods for probing their biological roles, is therefore of crucial importance. Over the past decade, DNA accessibility and various epigenetic modifications have been associated with regulatory functions. Mapping these features across the genome has enabled researchers to begin to document the full complement of putative regulatory elements. High-throughput reporter assays to probe the functions of regulatory regions have also been developed but these methods separate putative regulatory elements from the chromosome so that any effects of chromatin context and long-range regulatory interactions are lost. Definitive assignment of function(s) to putative cis-regulatory elements requires perturbation of these elements. Genome-editing technologies are now transforming our ability to perturb regulatory elements across entire genomes. Interpretation of high-throughput genetic screens that incorporate genome editors might enable the construction of an unbiased map of functional noncoding elements in the human genome.
Genetic diversity of Histoplasma and Sporothrix complexes based on sequences of their ITS1-5.8S-ITS2 regions from the BOLD System.

PubMed

Estrada-Bárcenas, Daniel Alfonso; Vite-Garín, Tania; Navarro-Barranco, Hortensia; de la Torre-Arciniega, Raúl; Pérez-Mejía, Amelia; Rodríguez-Arellanes, Gabriela; Ramirez, Jose Antonio; Humberto Sahaza, Jorge; Taylor, Maria Lucia; Toriello, Conchita

2014-01-01

The high sensitivity and specificity of molecular biology techniques have proven useful for the detection, identification and typing of different pathogens. The ITS (Internal Transcribed Spacer) regions of the ribosomal DNA are highly conserved non-coding regions, and have been widely used in different studies including the determination of the genetic diversity of human fungal pathogens. This article aims to contribute to the understanding of the intra- and interspecific genetic diversity of isolates of the Histoplasma capsulatum and Sporothrix schenckii species complexes by an analysis of the available sequences of the ITS regions from different sequence databases. ITS1-5.8S-ITS2 sequences of each fungus, either deposited in GenBank, or from our research groups (registered in the Fungi Barcode of Life Database), were analyzed using the maximum likelihood (ML) method. ML analysis of the ITS sequences discriminated isolates from distant geographic origins and particular wild hosts, depending on the fungal species analyzed. This manuscript is part of the series of works presented at the "V International Workshop: Molecular genetic approaches to the study of human pathogenic fungi" (Oaxaca, Mexico, 2012). Copyright © 2013 Revista Iberoamericana de Micología. Published by Elsevier Espana. All rights reserved.

A transcribed ultraconserved noncoding RNA, Uc.173, is a key molecule for the inhibition of lead-induced neuronal apoptosis

PubMed Central

Chen, Lijian; Liu, Meiling; Zhang, Nan; Zhang, Li; Luo, Yuanwei; Liu, Zhenzhong; Dai, Lijun; Jiang, Yiguo

2016-01-01

Lead, a common toxic metal, is significantly neurotoxic to the developing brain. Long non-coding RNAs (lncRNAs) function in multiple biological processes. However, whether lncRNAs are involved in lead-induced neurotoxicity remains unclear. Uc.173 is a lncRNA from a transcribed ultraconserved region (T-UCR) of the human, mouse and rat genomes. We established a mouse model of lead-induced nerve injury, in which the levels of Uc.173 decreased significantly in hippocampal tissue and serum. We further tested the expression of Uc.173 in the serum of lead-exposed children, which also showed a tendency to decrease. To explore the effects of Uc.173 on lead-induced nerve injury, we overexpressed Uc.173 in the N2a mouse nerve cell line and found that Uc.173 had an inhibitory effect on lead-induced apoptosis of N2a cells. To investigate the molecular mechanisms of Uc.173 in apoptosis associated with lead-induced nerve injury, we predicted the target microRNAs of Uc.173 by using miRanda, TargetScan and RegRNA. After performing quantitative real-time PCR and bioinformatics analysis, we showed that Uc.173 might inter-regulate with miR-291a-3p in lead-induced apoptosis and regulate apoptosis-associated genes. Our study suggests that Uc.173 significantly inhibits the apoptosis of nerve cells, which may be mediated by inter-regulation with miRNAs in lead-induced nerve injury. PMID:26683706
A benchmark study of scoring methods for non-coding mutations.

PubMed

Drubay, Damien; Gautheret, Daniel; Michiels, Stefan

2018-05-15

Detailed knowledge of coding sequences has led to different candidate models for pathogenic variant prioritization. Several deleteriousness scores have been proposed for the non-coding part of the genome, but no large-scale comparison has been carried out to date to assess their performance. We compared the leading scoring tools (CADD, FATHMM-MKL, Funseq2 and GWAVA) and some recent competitors (DANN, SNP and SOM scores) for their ability to discriminate assumed pathogenic variants from assumed benign variants (using the ClinVar, COSMIC and 1000 Genomes Project databases). Using the ClinVar benchmark, CADD was the best tool for detecting the pathogenic variants that are mainly located in protein-coding gene regions. Using the COSMIC benchmark, FATHMM-MKL, GWAVA and SOMliver outperformed the other tools for pathogenic variants that are typically located in lincRNAs, pseudogenes and other parts of the non-coding genome. However, all tools had low precision, which could potentially be improved by future non-coding genome feature discoveries. These results may have been influenced by the presence of potential benign variants in the COSMIC database. The development of a gold standard as consistent as ClinVar for these regions will be necessary to confirm our tool ranking. The Snakemake, C++ and R code is freely available from https://github.com/Oncostat/BenchmarkNCVTools and is supported on Linux. Contact: damien.drubay@gustaveroussy.fr or stefan.michiels@gustaveroussy.fr. Supplementary data are available at Bioinformatics online.
Transcription profiling suggests that mitochondrial topoisomerase IB acts as a topological barrier and regulator of mitochondrial DNA transcription.

PubMed

Dalla Rosa, Ilaria; Zhang, Hongliang; Khiati, Salim; Wu, Xiaolin; Pommier, Yves

2017-12-08

Mitochondrial DNA (mtDNA) is essential for cell viability because it encodes subunits of the respiratory chain complexes. Mitochondrial topoisomerase IB (TOP1MT) facilitates mtDNA replication by removing DNA topological tensions produced during mtDNA transcription, but it appears to be dispensable. To test whether cells lacking TOP1MT have aberrant mtDNA transcription, we performed mitochondrial transcriptome profiling. To that end, we designed and implemented a customized tiling array, which enabled genome-wide, strand-specific, and simultaneous detection of all mitochondrial transcripts. Our technique revealed that Top1mt KO mouse cells process the mitochondrial transcripts normally but that protein-coding mitochondrial transcripts are elevated. Moreover, we found discrete long noncoding RNAs produced by H-strand transcription and encompassing the noncoding regulatory region of mtDNA in human and murine cells and tissues. Of note, these noncoding RNAs were strongly up-regulated in the absence of TOP1MT. In contrast, 7S DNA, produced by mtDNA replication, was reduced in the Top1mt KO cells. We propose that the long noncoding RNA species in the D-loop region are generated by the extension of H-strand transcripts beyond their canonical stop site and that TOP1MT acts as a topological barrier and regulator for mtDNA transcription and D-loop formation.
Evidence for recombination of mtDNA in the marine mussel Mytilus trossulus from the Baltic.

PubMed

Burzyński, Artur; Zbawicka, Małgorzata; Skibinski, David O F; Wenne, Roman

2003-03-01

A number of studies have claimed that recombination occurs in animal mtDNA, although this evidence is controversial. Ladoukakis and Zouros (2001) provided strong evidence for mtDNA recombination in the COIII gene in gonadal tissue in the marine mussel Mytilus galloprovincialis from the Black Sea. The recombinant molecules they reported had not however become established in the population from which experimental animals were sampled. In the present study, we provide further evidence of the generality of mtDNA recombination in Mytilus by reporting recombinant mtDNA molecules in a related mussel species, Mytilus trossulus, from the Baltic. The mtDNA region studied begins in the 16S rRNA gene and terminates in the cytochrome b gene and includes a major noncoding region that may be analogous to the D-loop region observed in other animals. Many bivalve species, including some Mytilus species, are unusual in that they have two mtDNA genomes, one of which is inherited maternally (F genome) the other inherited paternally (M genome). Two recombinant variants reported in the present study have population frequencies of 5% and 36% and appear to be mosaic for F-like and M-like sequences. However, both variants have the noncoding region from the M genome, and both are transmitted to sperm like the M genome. We speculate that acquisition of the noncoding region by the recombinant molecules has conferred a paternal role on mtDNA genomes that otherwise resemble the F genome in sequence.
CRX ChIP-seq reveals the cis-regulatory architecture of mouse photoreceptors

PubMed Central

Corbo, Joseph C.; Lawrence, Karen A.; Karlstetter, Marcus; Myers, Connie A.; Abdelaziz, Musa; Dirkes, William; Weigelt, Karin; Seifert, Martin; Benes, Vladimir; Fritsche, Lars G.; Weber, Bernhard H.F.; Langmann, Thomas

2010-01-01

Approximately 98% of mammalian DNA is noncoding, yet we understand relatively little about the function of this enigmatic portion of the genome. The cis-regulatory elements that control gene expression reside in noncoding regions and can be identified by mapping the binding sites of tissue-specific transcription factors. Cone-rod homeobox (CRX) is a key transcription factor in photoreceptor differentiation and survival, but its in vivo targets are largely unknown. Here, we used chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq) on CRX to identify thousands of cis-regulatory regions around photoreceptor genes in adult mouse retina. CRX directly regulates downstream photoreceptor transcription factors and their target genes via a network of spatially distributed regulatory elements around each locus. CRX-bound regions act in a synergistic fashion to activate transcription and contain multiple CRX binding sites which interact in a spacing- and orientation-dependent manner to fine-tune transcript levels. CRX ChIP-seq was also performed on Nrl−/− retinas, which represent an enriched source of cone photoreceptors. Comparison with the wild-type ChIP-seq data set identified numerous rod- and cone-specific CRX-bound regions as well as many shared elements. Thus, CRX combinatorially orchestrates the transcriptional networks of both rods and cones by coordinating the expression of photoreceptor genes including most retinal disease genes. In addition, this study pinpoints thousands of noncoding regions of relevance to both Mendelian and complex retinal disease. PMID:20693478
Expression analysis and in silico characterization of intronic long noncoding RNAs in renal cell carcinoma: emerging functional associations

PubMed Central

2013-01-01

Background: Intronic and intergenic long noncoding RNAs (lncRNAs) are emerging gene expression regulators. The molecular pathogenesis of renal cell carcinoma (RCC) is still poorly understood, and in particular, limited studies are available for intronic lncRNAs expressed in RCC. Methods: Microarray experiments were performed with custom-designed arrays enriched with probes for lncRNAs mapping to intronic genomic regions. Samples from 18 primary RCC tumors and 11 nontumor adjacent matched tissues were analyzed. Meta-analyses were performed with microarray expression data from three additional human tissues (normal liver, prostate tumor and kidney nontumor samples), and with large-scale public data for epigenetic regulatory marks and for evolutionarily conserved sequences. Results: A signature of 29 intronic lncRNAs differentially expressed between RCC and nontumor samples was obtained (false discovery rate (FDR) <5%). A signature of 26 intronic lncRNAs significantly correlated with the RCC five-year patient survival outcome was identified (FDR <5%, p-value ≤0.01). We identified 4303 intronic antisense lncRNAs expressed in RCC, of which 22% were significantly (p <0.05) cis correlated with the expression of the mRNA in the same locus across RCC and three other human tissues. Gene Ontology (GO) analysis of those loci pointed to 'regulation of biological processes' as the main enriched category. A module map analysis of the protein-coding genes significantly (p <0.05) trans correlated with the 20% most abundant lncRNAs identified 51 enriched GO terms (p <0.05). We determined that 60% of the expressed lncRNAs are evolutionarily conserved. At the genomic loci containing the intronic RCC-expressed lncRNAs, a strong association (p <0.001) was found between their transcription start sites and genomic marks such as CpG islands, RNA Pol II binding, and histone methylation and acetylation. Conclusion: Intronic antisense lncRNAs are widely expressed in RCC tumors. Some of them are significantly altered in RCC in comparison with nontumor samples. The majority of these lncRNAs are evolutionarily conserved and possibly modulated by epigenetic modifications. Our data suggest that these RCC lncRNAs may contribute to the complex network of regulatory RNAs playing a role in renal cell malignant transformation. PMID:24238219
Replication of 13q31.1 Association in Nonsyndromic Cleft Lip with Cleft Palate in Europeans

PubMed Central

Cooper, Margaret E.; Butali, Azeez; Standley, Jennifer; Rigdon, Jennifer; Suzuki, Satoshi; Gongorjav, Ayana; Shonkhuuz, T. Enkhtur; Natsume, Nagato; Shi, Bing; Marazita, Mary L.; Murray, Jeffrey C.

2015-01-01

Genome-wide association (GWA) studies have successfully identified at least a dozen loci associated with orofacial clefts. However, these signals may be unique to specific populations and require replication to validate and extend findings as a prelude to etiologic SNP discovery. We attempted to replicate the findings of a recent meta-analysis of orofacial cleft GWA studies using four different ancestral populations. We studied 946 pedigrees (3436 persons) of European (US white and Danish) and Asian (Japanese and Mongolian) origin. We genotyped six SNPs which represented the most significant P value associations identified in published studies: rs742071 (1p36), rs7590268 (2p21), rs7632427 (3p11.1), rs12543318 (8q21.3), rs8001641 (13q31.1) and rs7179658 (15q22.2). We directly sequenced three non-coding conserved regions 200 kb downstream of SPRY2 in 713 cases, 438 controls, and 485 trios from the US, Mongolia, and the Philippines. We found rs8001641 to be significantly associated with nonsyndromic cleft lip with cleft palate (NSCLP) in Europeans (p-value = 4 × 10^-5; transmission odds ratio = 1.86, 95% confidence interval: 1.38-2.52). We also found several novel sequence variants in the conserved regions in Asian and European samples, which may help to localize common variants contributing directly to the risk for NSCLP. This study confirms the prior association between rs8001641 and NSCLP in European populations. PMID:25786657
IDH-mutant glioma specific association of rs55705857 located at 8q24.21 involves MYC deregulation

PubMed Central

Oktay, Yavuz; Ülgen, Ege; Can, Özge; Akyerli, Cemaliye B.; Yüksel, Şirin; Erdemgil, Yiğit; Durası, İ. Melis; Henegariu, Octavian Ioan; Nanni, E. Paolo; Selevsek, Nathalie; Grossmann, Jonas; Erson-Omay, E. Zeynep; Bai, Hanwen; Gupta, Manu; Lee, William; Turcan, Şevin; Özpınar, Aysel; Huse, Jason T.; Sav, M. Aydın; Flanagan, Adrienne; Günel, Murat; Sezerman, O. Uğur; Yakıcıer, M. Cengiz; Pamir, M. Necmettin; Özduman, Koray

2016-01-01

The single nucleotide polymorphism rs55705857, located in a non-coding but evolutionarily conserved region at 8q24.21, is strongly associated with IDH-mutant glioma development and was suggested to be a causal variant. However, the molecular mechanism underlying this association has remained unknown. With a case control study in 285 gliomas, 316 healthy controls, 380 systemic cancers, 31 other CNS-tumors, and 120 IDH-mutant cartilaginous tumors, we identified that the association was specific to IDH-mutant gliomas. Odds-ratios were 9.25 (5.17–16.52; 95% CI) for IDH-mutated gliomas and 12.85 (5.94–27.83; 95% CI) for IDH-mutated, 1p/19q co-deleted gliomas. Decreasing strength with increasing anaplasia implied a modulatory effect. No somatic mutations were noted at this locus in 114 blood-tumor pairs, nor was there a copy number difference between risk-allele and only-ancestral allele carriers. CCDC26 RNA-expression was rare and not different between the two groups. There were only minor subtype-specific differences in common glioma driver genes. RNA sequencing and LC-MS/MS comparisons pointed to significantly altered MYC-signaling. Baseline enhancer activity of the conserved region specifically on the MYC promoter and its further positive modulation by the SNP risk-allele was shown in vitro. Our findings implicate MYC deregulation as the underlying cause of the observed association. PMID:27282637
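The odds ratios and 95% confidence intervals quoted in the abstract above are the kind of summary that can be derived from a 2x2 table of allele carriers versus non-carriers in cases and controls. The sketch below uses the standard Woolf (log-scale Wald) interval; it is an illustration only, the abstract does not say which estimator the authors used, and the function name `odds_ratio_ci` and the counts are invented for the example.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-scale Wald) confidence interval for a 2x2 table.

    a: cases carrying the risk allele      b: cases without it
    c: controls carrying the risk allele   d: controls without it
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper

# Invented counts, for illustration only (not data from the study).
or_, lo, hi = odds_ratio_ci(a=40, b=60, c=20, d=180)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The interval is symmetric on the log scale, which is why reported intervals such as those above look lopsided around the point estimate.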
A Two-Locus Global DNA Barcode for Land Plants: The Coding rbcL Gene Complements the Non-Coding trnH-psbA Spacer Region

PubMed Central

Kress, W. John; Erickson, David L.

2007-01-01

Background: A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species-level. Methodology/Principal Findings: Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species. Conclusions/Significance: A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination. PMID:17551588
Recurrent and functional regulatory mutations in breast cancer.

PubMed

Rheinbay, Esther; Parasuraman, Prasanna; Grimsby, Jonna; Tiao, Grace; Engreitz, Jesse M; Kim, Jaegil; Lawrence, Michael S; Taylor-Weiner, Amaro; Rodriguez-Cuevas, Sergio; Rosenberg, Mara; Hess, Julian; Stewart, Chip; Maruvka, Yosef E; Stojanov, Petar; Cortes, Maria L; Seepo, Sara; Cibulskis, Carrie; Tracy, Adam; Pugh, Trevor J; Lee, Jesse; Zheng, Zongli; Ellisen, Leif W; Iafrate, A John; Boehm, Jesse S; Gabriel, Stacey B; Meyerson, Matthew; Golub, Todd R; Baselga, Jose; Hidalgo-Miranda, Alfredo; Shioda, Toshi; Bernards, Andre; Lander, Eric S; Getz, Gad

2017-07-06

Genomic analysis of tumours has led to the identification of hundreds of cancer genes on the basis of the presence of mutations in protein-coding regions. By contrast, much less is known about cancer-causing mutations in non-coding regions. Here we perform deep sequencing in 360 primary breast cancers and develop computational methods to identify significantly mutated promoters. Clear signals are found in the promoters of three genes. FOXA1, a known driver of hormone-receptor positive breast cancer, harbours a mutational hotspot in its promoter leading to overexpression through increased E2F binding. RMRP and NEAT1, two non-coding RNA genes, carry mutations that affect protein binding to their promoters and alter expression levels. Our study shows that promoter regions harbour recurrent mutations in cancer with functional consequences and that the mutations occur at similar frequencies as in coding regions. Power analyses indicate that more such regions remain to be discovered through deep sequencing of adequately sized cohorts of patients.
Identification of significantly mutated regions across cancer types highlights a rich landscape of functional molecular alterations

PubMed Central

Araya, Carlos L.; Cenik, Can; Reuter, Jason A.; Kiss, Gert; Pande, Vijay S.; Snyder, Michael P.; Greenleaf, William J.

2015-01-01

Cancer sequencing studies have primarily identified cancer-driver genes by the accumulation of protein-altering mutations. An improved method would be annotation-independent, sensitive to unknown distributions of functions within proteins, and inclusive of non-coding drivers. We employed density-based clustering methods in 21 tumor types to detect variably-sized significantly mutated regions (SMRs). SMRs reveal recurrent alterations across a spectrum of coding and non-coding elements, including transcription factor binding sites and untranslated regions mutated in up to ∼15% of specific tumor types. SMRs reveal spatial clustering of mutations at molecular domains and interfaces, often with associated changes in signaling. Mutation frequencies in SMRs demonstrate that distinct protein regions are differentially mutated among tumor types, as exemplified by a linker region of PIK3CA in which biophysical simulations suggest mutations affect regulatory interactions. The functional diversity of SMRs underscores both the varied mechanisms of oncogenic misregulation and the advantage of functionally-agnostic driver identification. PMID:26691984
Evidence of function for conserved noncoding sequences in Arabidopsis thaliana.

PubMed

Spangler, Jacob B; Subramaniam, Sabarinath; Freeling, Michael; Feltus, F Alex

2012-01-01

• Whole genome duplication events provide a lineage with a large reservoir of genes that can be molded by evolutionary forces into phenotypes that fit alternative environments. A well-studied whole genome duplication, the α-event, occurred in an ancestor of the model plant Arabidopsis thaliana. Retained segments of the α-event have been defined in recent years in the form of duplicate protein coding sequences (α-pairs) and associated conserved noncoding DNA sequences (CNSs). Our aim was to identify any association between CNSs and α-pair co-functionality at the gene expression level. • Here, we tested for correlation between CNS counts and α-pair co-expression and expression intensity across nine expression datasets: aerial tissue, flowers, leaves, roots, rosettes, seedlings, seeds, shoots and whole plants. • We provide evidence for a putative regulatory role of the CNSs. The association of CNSs with α-pair co-expression and expression intensity varied by gene function, subgene position and the presence of transcription factor binding motifs. A range of possible CNS regulatory mechanisms, including intron-mediated enhancement, messenger RNA fold stability and transcriptional regulation, are discussed. • This study provides a framework to understand how CNS motifs are involved in the maintenance of gene expression after a whole genome duplication event. © 2011 The Authors. New Phytologist © 2011 New Phytologist Trust.
G-Boxes, Bigfoot Genes, and Environmental Response: Characterization of Intragenomic Conserved Noncoding Sequences in Arabidopsis

PubMed Central

Freeling, Michael; Rapaka, Lakshmi; Lyons, Eric; Pedersen, Brent; Thomas, Brian C.

2007-01-01

A tetraploidy left Arabidopsis thaliana with 6358 pairs of homoeologs that, when aligned, generated 14,944 intragenomic conserved noncoding sequences (CNSs). Our previous work assembled these phylogenetic footprints into a database. We show that known transcription factor (TF) binding motifs, including the G-box, are overrepresented in these CNSs. A total of 254 genes spanning long lengths of CNS-rich chromosomes (Bigfoot) dominate this database. Therefore, we made subdatabases: one containing Bigfoot genes and the other containing genes with three to five CNSs (Smallfoot). Bigfoot genes are generally TFs that respond to signals, with their modal CNS positioned 3.1 kb 5′ from the ATG. Smallfoot genes encode components of signal transduction machinery, the cytoskeleton, or involve transcription. We queried each subdatabase with each possible 7-nucleotide sequence. Among hundreds of hits, most were purified from CNSs, and almost all of those significantly enriched in CNSs had no experimental history. The 7-mers in CNSs are not 5′- to 3′-oriented in Bigfoot genes but are often oriented in Smallfoot genes. CNSs with one G-box tend to have two G-boxes. CNSs were shared with the homoeolog only and with no other gene, suggesting that binding site turnover impedes detection. Bigfoot genes may function in adaptation to environmental change. PMID:17496117
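The abstract above describes querying the CNS collections with every possible 7-nucleotide sequence. A minimal sketch of that kind of exhaustive k-mer scan follows. It assumes a simple frequency ratio against a background set with a pseudocount, which is not necessarily the scoring the authors used; the function names and the toy sequences are invented for illustration.

```python
from collections import Counter
from itertools import product

def count_kmers(sequences, k=7):
    """Count every overlapping k-mer across a list of DNA strings."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip windows containing N or gaps
                counts[kmer] += 1
    return counts

def enrichment(cns_seqs, background_seqs, k=7, pseudocount=1.0):
    """Ratio of each k-mer's frequency in CNSs to its background frequency."""
    cns = count_kmers(cns_seqs, k)
    bg = count_kmers(background_seqs, k)
    n_kmers = 4 ** k
    # The pseudocount keeps ratios finite for k-mers absent from either set.
    cns_total = sum(cns.values()) + pseudocount * n_kmers
    bg_total = sum(bg.values()) + pseudocount * n_kmers
    ratios = {}
    for letters in product("ACGT", repeat=k):  # all 4**k possible k-mers
        kmer = "".join(letters)
        f_cns = (cns[kmer] + pseudocount) / cns_total
        f_bg = (bg[kmer] + pseudocount) / bg_total
        ratios[kmer] = f_cns / f_bg
    return ratios

# Toy sequences, invented for illustration; both "CNSs" contain a G-box core (CACGTG).
toy_cns = ["TTGCCACGTGGCAT", "AACCACGTGTCC"]
toy_background = ["ATATATATATATATAT", "GGGCCCTTTAAAGGG"]
ratios = enrichment(toy_cns, toy_background)
top = max(ratios, key=ratios.get)
print(top, round(ratios[top], 3))
```

With real data the ratio would normally be paired with a significance test and a multiple-testing correction, since 16,384 7-mers are examined at once.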
GATA: A graphic alignment tool for comparative sequence analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nix, David A.; Eisen, Michael B.

2005-01-01

Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution yet the enhancer retains its activity. Dotplot analysis is often used to estimate non-coding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments.
Novel Approach to Analyzing MFE of Noncoding RNA Sequences

PubMed Central

George, Tina P.; Thomas, Tessamma

2016-01-01

Genomic studies have become noncoding RNA (ncRNA) centric after the study of different genomes provided enormous information on ncRNA over the past decades. The function of ncRNA is decided by its secondary structure, and across organisms, the secondary structure is more conserved than the sequence itself. In this study, the optimal secondary structure or the minimum free energy (MFE) structure of ncRNA was found based on the thermodynamic nearest neighbor model. MFE of over 2600 ncRNA sequences was analyzed in view of its signal properties. Mathematical models linking MFE to the signal properties were found for each of the four classes of ncRNA analyzed. MFE values computed with the proposed models were in concordance with those obtained with the standard web servers. A total of 95% of the sequences analyzed had deviation of MFE values within ±15% relative to those obtained from standard web servers. PMID:27695341
Silencing Effect of Hominoid Highly Conserved Noncoding Sequences on Embryonic Brain Development

PubMed Central

Mahmoudi Saber, Morteza

2017-01-01

Superfamily Hominoidea, which consists of Hominidae (humans and great apes) and Hylobatidae (gibbons), is well known for sharing human-like characteristics; however, the genomic origins of these shared unique phenotypes have mainly remained elusive. To decipher the underlying genomic basis of Hominoidea-restricted phenotypes, we identified and characterized Hominoidea-restricted highly conserved noncoding sequences (HCNSs) that are a class of potential regulatory elements which may be involved in evolution of lineage-specific phenotypes. We discovered 679 such HCNSs from human, chimpanzee, gorilla, orangutan and gibbon genomes. These HCNSs were demonstrated to be under purifying selection but with lineage-restricted characteristics different from old CNSs. A significant proportion of their ancestral sequences had accelerated rates of nucleotide substitutions, insertions and deletions during the evolution of the common ancestor of Hominoidea, suggesting the intervention of positive Darwinian selection for creating those HCNSs. In contrast to enhancer elements and similarly to silencer sequences, these Hominoidea-restricted HCNSs are located in close proximity to transcription start sites. Their target genes are enriched in the nervous system, development and transcription, and they tend to be remotely located from the nearest coding gene. ChIP-seq signals and gene expression patterns suggest that Hominoidea-restricted HCNSs are likely to be functional regulatory elements by imposing silencing effects on their target genes in a tissue-restricted manner during fetal brain development. These HCNSs, which emerged through adaptive evolution and were conserved through purifying selection, represent a set of promising targets for future functional studies of the evolution of Hominoidea-restricted phenotypes. PMID:28633494
Origin and evolution of the long non-coding genes in the X-inactivation center.

PubMed

Romito, Antonio; Rougeulle, Claire

2011-11-01

Random X chromosome inactivation (XCI), the eutherian mechanism of X-linked gene dosage compensation, is controlled by a cis-acting locus termed the X-inactivation center (Xic). One of the striking features that characterize the Xic landscape is the abundance of loci transcribing non-coding RNAs (ncRNAs), including Xist, the master regulator of the inactivation process. Recent comparative genomic analyses have depicted the evolutionary scenario behind the origin of the X-inactivation center, revealing that this locus evolved from a region harboring protein-coding genes. During mammalian radiation, this ancestral protein-coding region was disrupted in the marsupial group, whilst it provided in eutherian lineage the starting material for the non-translated RNAs of the X-inactivation center. The emergence of non-coding genes occurred by a dual mechanism involving loss of protein-coding function of the pre-existing genes and integration of different classes of mobile elements, some of which modeled the structure and sequence of the non-coding genes in a species-specific manner. The rising genes started to produce transcripts that acquired function in regulating the epigenetic status of the X chromosome, as shown for Xist, its antisense Tsix, Jpx, and recently suggested for Ftx. Thus, the appearance of the Xic, which occurred after the divergence between eutherians and marsupials, was the basis for the evolution of random X inactivation as a strategy to achieve dosage compensation. Copyright © 2011. Published by Elsevier Masson SAS.
Evaluation of non-coding variation in GLUT1 deficiency.

PubMed

Liu, Yu-Chi; Lee, Jia Wei Audrey; Bellows, Susannah T; Damiano, John A; Mullen, Saul A; Berkovic, Samuel F; Bahlo, Melanie; Scheffer, Ingrid E; Hildebrand, Michael S

2016-12-01

Loss-of-function mutations in SLC2A1, encoding glucose transporter-1 (GLUT-1), lead to dysfunction of glucose transport across the blood-brain barrier. Ten percent of cases with hypoglycorrhachia (fasting cerebrospinal fluid [CSF] glucose <2.2mmol/L) do not have mutations. We hypothesized that GLUT1 deficiency could be due to non-coding SLC2A1 variants. We performed whole exome sequencing of one proband with a GLUT1 phenotype and hypoglycorrhachia negative for SLC2A1 sequencing and copy number variants. We studied a further 55 patients with different epilepsies and low CSF glucose who did not have exonic mutations or copy number variants. We sequenced non-coding promoter and intronic regions. We performed mRNA studies for the recurrent intronic variant. The proband had a de novo splice site mutation five base pairs from the intron-exon boundary. Three of 55 patients had deep intronic SLC2A1 variants, including a recurrent variant in two. The recurrent variant produced less SLC2A1 mRNA transcript. Fasting CSF glucose levels show an age-dependent correlation, which makes the definition of hypoglycorrhachia challenging. Low CSF glucose levels may be associated with pathogenic SLC2A1 mutations including deep intronic SLC2A1 variants. Extending genetic screening to non-coding regions will enable diagnosis of more patients with GLUT1 deficiency, allowing implementation of the ketogenic diet to improve outcomes. © 2016 Mac Keith Press.

A-to-I editing of coding and non-coding RNAs by ADARs

PubMed Central

Nishikura, Kazuko

2016-01-01

Adenosine deaminases acting on RNA (ADARs) convert adenosine to inosine in double-stranded RNA. This A-to-I editing occurs not only in protein-coding regions of mRNAs, but also frequently in non-coding regions that contain inverted Alu repeats. Editing of coding sequences can result in the expression of functionally altered proteins that are not encoded in the genome, whereas the significance of Alu editing remains largely unknown. Certain microRNA (miRNA) precursors are also edited, leading to reduced expression or altered function of mature miRNAs. Conversely, recent studies indicate that ADAR1 forms a complex with Dicer to promote miRNA processing, revealing a new function of ADAR1 in the regulation of RNA interference. PMID:26648264
Long noncoding RNA linc00617 exhibits oncogenic activity in breast cancer.

PubMed

Li, Hengyu; Zhu, Li; Xu, Lu; Qin, Keyu; Liu, Chaoqian; Yu, Yue; Su, Dongwei; Wu, Kainan; Sheng, Yuan

2017-01-01

Protein-coding genes account for only 2% of the human genome, whereas the vast majority of transcripts are noncoding RNAs including long noncoding RNAs. LncRNAs are involved in the regulation of a diverse array of biological processes, including cancer progression. An evolutionarily conserved lncRNA, TUNA, was found to be required for pluripotency of mouse embryonic stem cells. In this study, we found the human ortholog of TUNA, linc00617, was upregulated in breast cancer samples. Linc00617 promoted motility and invasion of breast cancer cells and induced epithelial-mesenchymal transition (EMT), which was accompanied by generation of stem cell properties. Moreover, knockdown of linc00617 repressed lung metastasis in vivo. We demonstrated that linc00617 upregulated the expression of stemness factor Sox2 in breast cancer cells, which was shown to promote the oncogenic activity of breast cancer cells by stimulating epithelial-to-mesenchymal transition and enhancing the tumor-initiating capacity. Thus, our data indicate that linc00617 functions as an important regulator of EMT and promotes breast cancer progression and metastasis via activating the transcription of Sox2. Together, these findings suggest that linc00617 may be a potential therapeutic target for aggressive breast cancer. © 2015 Wiley Periodicals, Inc.
The origins and evolutionary history of human non-coding RNA regulatory networks.

PubMed

Sherafatian, Masih; Mowla, Seyed Javad

2017-04-01

The evolutionary history and origin of the regulatory function of animal non-coding RNAs are not well understood. Lack of conservation of long non-coding RNAs and the small size of microRNAs have been major obstacles to their phylogenetic analysis. In this study, we tried to shed more light on the evolution of ncRNA regulatory networks by changing our phylogenetic strategy to focus on the evolutionary pattern of their protein coding targets. We used available target databases of miRNAs and lncRNAs to find their protein coding targets in human. We were able to recognize evolutionary hallmarks of ncRNA targets by phylostratigraphic analysis. We found the conventional 3'-UTR and lesser known 5'-UTR targets of miRNAs to be enriched at three consecutive phylostrata. Firstly, in eukaryata phylostratum corresponding to the emergence of miRNAs, our study revealed that miRNA targets function primarily in cell cycle processes. Moreover, the same overrepresentation of the targets observed in the next two consecutive phylostrata, opisthokonta and eumetazoa, corresponded to the expansion periods of miRNAs in animal evolution. Coding sequence targets of miRNAs showed a delayed rise at opisthokonta phylostratum, compared to the 3' and 5' UTR targets of miRNAs. LncRNA regulatory network was the latest to evolve at eumetazoa.
Cytoplasmic long noncoding RNAs are frequently bound to and degraded at ribosomes in human cells

PubMed Central

Carlevaro-Fita, Joana; Rahim, Anisa; Guigó, Roderic; Vardy, Leah A.; Johnson, Rory

2016-01-01

Recent footprinting studies have made the surprising observation that long noncoding RNAs (lncRNAs) physically interact with ribosomes. However, these findings remain controversial, and the overall proportion of cytoplasmic lncRNAs involved is unknown. Here we make a global, absolute estimate of the cytoplasmic and ribosome-associated population of stringently filtered lncRNAs in a human cell line using polysome profiling coupled to spike-in normalized microarray analysis. Fifty-four percent of expressed lncRNAs are detected in the cytoplasm. The majority of these (70%) have >50% of their cytoplasmic copies associated with polysomal fractions. These interactions are lost upon disruption of ribosomes by puromycin. Polysomal lncRNAs are distinguished by a number of 5′ mRNA-like features, including capping and 5′UTR length. On the other hand, nonpolysomal “free cytoplasmic” lncRNAs have more conserved promoters and a wider range of expression across cell types. Exons of polysomal lncRNAs are depleted of endogenous retroviral insertions, suggesting a role for repetitive elements in lncRNA localization. Finally, we show that blocking of ribosomal elongation results in stabilization of many associated lncRNAs. Together these findings suggest that the ribosome is the default destination for the majority of cytoplasmic long noncoding RNAs and may play a role in their degradation. PMID:27090285
Secondary structure of the 3'-noncoding region of flavivirus genomes: comparative analysis of base pairing probabilities.

PubMed

Rauscher, S; Flamm, C; Mandl, C W; Heinz, F X; Stadler, P F

1997-07-01

The prediction of the complete matrix of base pairing probabilities was applied to the 3' noncoding region (NCR) of flavivirus genomes. This approach identifies not only well-defined secondary structure elements, but also regions of high structural flexibility. Flaviviruses, many of which are important human pathogens, have a common genomic organization, but exhibit a significant degree of RNA sequence diversity in the functionally important 3'-NCR. We demonstrate the presence of secondary structures shared by all flaviviruses, as well as structural features that are characteristic for groups of viruses within the genus reflecting the established classification scheme. The significance of most of the predicted structures is corroborated by compensatory mutations. The availability of infectious clones for several flaviviruses will allow the assessment of these structural elements in processes of the viral life cycle, such as replication and assembly.
Sequence variations of the bovine prion protein gene (PRNP) in native Korean Hanwoo cattle

PubMed Central

Choi, Sangho

2012-01-01

Bovine spongiform encephalopathy (BSE) is one of the fatal neurodegenerative diseases known as transmissible spongiform encephalopathies (TSEs) caused by infectious prion proteins. Genetic variations correlated with susceptibility or resistance to TSE in humans and sheep have not been reported for bovine strains including those from Holstein, Jersey, and Japanese Black cattle. Here, we investigated bovine prion protein gene (PRNP) variations in Hanwoo cattle [Bos (B.) taurus coreanae], a native breed in Korea. We identified mutations and polymorphisms in the coding region of PRNP, determined their frequency, and evaluated their significance. We identified four synonymous polymorphisms and two non-synonymous mutations in PRNP, but found no novel polymorphisms. The sequence and number of octapeptide repeats were completely conserved, and the haplotype frequency of the coding region was similar to that of other B. taurus strains. When we examined the 23-bp and 12-bp insertion/deletion (indel) polymorphisms in the non-coding region of PRNP, Hanwoo cattle had a lower deletion allele and 23-bp del/12-bp del haplotype frequency than healthy and BSE-affected animals of other strains. Thus, Hanwoo are seemingly less susceptible to BSE than other strains due to the 23-bp and 12-bp indel polymorphisms. PMID:22705734
Promoter-Specific Expression and Imprint Status of Marsupial IGF2

PubMed Central

Stringer, Jessica M.; Suzuki, Shunsuke; Pask, Andrew J.; Shaw, Geoff; Renfree, Marilyn B.

2012-01-01

In mice and humans, IGF2 has multiple promoters to maintain its complex tissue- and developmental stage-specific imprinting and expression. IGF2 is also imprinted in marsupials, but little is known about its promoter region. In this study, three IGF2 transcripts were isolated from placental and liver samples of the tammar wallaby, Macropus eugenii. Each transcript contained a unique 5' untranslated region, orthologous to the non-coding exons derived from promoters P1–P3 in the human and mouse IGF2 locus. The expression of tammar IGF2 was predominantly from the P2 promoter, similar to humans. Expression of IGF2 was higher in pouch young than in the adult and imprinting was highly tissue and developmental-stage specific. Interestingly, while IGF2 was expressed throughout the placenta, imprinting seemed to be restricted to the vascular, trilaminar region. In addition, IGF2 was monoallelically expressed in the adult mammary gland while in the liver it switched from monoallelic expression in the pouch young to biallelic in the adult. These data suggest a complex mode of IGF2 regulation in marsupials as seen in eutherian mammals. The conservation of the IGF2 promoters suggests they originated before the divergence of marsupials and eutherians, and have been selectively maintained for at least 160 million years. PMID:22848567
Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis.

PubMed

Buldyrev, S V; Goldberger, A L; Havlin, S; Mantegna, R N; Matsa, M E; Peng, C K; Simons, M; Stanley, H E

1995-05-01

An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences--each of length larger than 512 base pairs (bp)--in the present release of the GenBank to determine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta=0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 +/- 0.05) which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and we find that it is less than 10^-10. We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.
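Detrended fluctuation analysis, mentioned in the abstract above, can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it assumes a purine/pyrimidine "DNA walk" (purine +1, pyrimidine -1), first-order (linear) detrending in non-overlapping windows, and window sizes chosen arbitrarily. The function names are invented, and the input is a randomly generated sequence rather than GenBank data.

```python
import math
import random

def dna_walk(seq):
    """Integrated 'DNA walk': purines step +1, pyrimidines step -1, mean removed."""
    steps = [1.0 if base in "AG" else -1.0 for base in seq.upper() if base in "ACGT"]
    mean = sum(steps) / len(steps)
    profile, total = [], 0.0
    for s in steps:
        total += s - mean
        profile.append(total)
    return profile

def linear_residual_ss(y):
    """Sum of squared residuals after a least-squares straight-line fit to y."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    slope = sxy / sxx
    return sum((v - y_mean - slope * (i - x_mean)) ** 2 for i, v in enumerate(y))

def dfa_exponent(profile, window_sizes):
    """Slope of log F(n) against log n, where F(n) is the detrended RMS fluctuation."""
    log_n, log_f = [], []
    for n in window_sizes:
        n_windows = len(profile) // n  # non-overlapping windows; any remainder is dropped
        ss = sum(linear_residual_ss(profile[w * n:(w + 1) * n]) for w in range(n_windows))
        f_n = math.sqrt(ss / (n_windows * n))
        log_n.append(math.log(n))
        log_f.append(math.log(f_n))
    # Ordinary least-squares slope of log F on log n.
    mx, my = sum(log_n) / len(log_n), sum(log_f) / len(log_f)
    num = sum((a - mx) * (b - my) for a, b in zip(log_n, log_f))
    den = sum((a - mx) ** 2 for a in log_n)
    return num / den

random.seed(0)
random_seq = "".join(random.choice("ACGT") for _ in range(8192))  # synthetic, uncorrelated
alpha = dfa_exponent(dna_walk(random_seq), [8, 16, 32, 64, 128])
print(f"DFA exponent for a random sequence: {alpha:.2f}")
```

For an uncorrelated sequence the exponent should come out close to 0.5; values clearly above 0.5 over a range of window sizes are what is read as evidence of long-range correlation.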
Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis

NASA Technical Reports Server (NTRS)

Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Matsa, M. E.; Peng, C. K.; Simons, M.; Stanley, H. E.

1995-01-01

An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences--each of length larger than 512 base pairs (bp)--in the present release of GenBank to determine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta=0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 +/- 0.05) which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and we find that it is less than 10^-10. We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.
Measles virus minigenomes encoding two autofluorescent proteins reveal cell-to-cell variation in reporter expression dependent on viral sequences between the transcription units.

PubMed

Rennick, Linda J; Duprex, W Paul; Rima, Bert K

2007-10-01

Transcription from morbillivirus genomes commences at a single promoter in the 3' non-coding terminus, with the six genes being transcribed sequentially. The 3' and 5' untranslated regions (UTRs) of the genes (mRNA sense), together with the intergenic trinucleotide spacer, comprise the non-coding sequences (NCS) of the virus and contain the conserved gene end and gene start signals, respectively. Bicistronic minigenomes containing transcription units (TUs) encoding autofluorescent reporter proteins separated by measles virus (MV) NCS were used to give a direct estimation of gene expression in single, living cells by assessing the relative amounts of each fluorescent protein in each cell. Initially, five minigenomes containing each of the MV NCS were generated. Assays were developed to determine the amount of each fluorescent protein in cells at both cell population and single-cell levels. This revealed significant variations in gene expression between cells expressing the same NCS-containing minigenome. The minigenome containing the M/F NCS produced significantly lower amounts of fluorescent protein from the second TU (TU2), compared with the other minigenomes. A minigenome with a truncated F 5' UTR had increased expression from TU2. This UTR is 524 nt longer than the other MV 5' UTRs. Insertions into the 5' UTR of the enhanced green fluorescent protein gene in the minigenome containing the N/P NCS showed that specific sequences, rather than just the additional length of F 5' UTR, govern this decreased expression from TU2.
Divergent transcription is associated with promoters of transcriptional regulators

PubMed Central

2013-01-01

Background: Divergent transcription is a widespread phenomenon in mammals. For instance, short bidirectional transcripts are a hallmark of active promoters, while longer transcripts can be detected antisense from active genes in conditions where the RNA degradation machinery is inhibited. Moreover, many described long non-coding RNAs (lncRNAs) are transcribed antisense from coding gene promoters. However, the general significance of divergent lncRNA/mRNA gene pair transcription is still poorly understood. Here, we used strand-specific RNA-seq with high sequencing depth to thoroughly identify antisense transcripts from coding gene promoters in primary mouse tissues. Results: We found that a substantial fraction of coding-gene promoters sustain divergent transcription of long non-coding RNA (lncRNA)/mRNA gene pairs. Strikingly, upstream antisense transcription is significantly associated with genes related to transcriptional regulation and development. Their promoters share several characteristics with those of transcriptional developmental genes, including very large CpG islands, high degree of conservation and epigenetic regulation in ES cells. In-depth analysis revealed a unique GC skew profile at these promoter regions, while the associated coding genes were found to have large first exons, two genomic features that might enforce bidirectional transcription. Finally, genes associated with antisense transcription harbor specific H3K79me2 epigenetic marking and RNA polymerase II enrichment profiles linked to an intensified rate of early transcriptional elongation. Conclusions: We concluded that promoters of a class of transcription regulators are characterized by a specialized transcriptional control mechanism, which is directly coupled to relaxed bidirectional transcription. PMID:24365181
Non-coding functions of alternative pre-mRNA splicing in development

PubMed Central

Mockenhaupt, Stefan; Makeyev, Eugene V.

2015-01-01

A majority of messenger RNA precursors (pre-mRNAs) in the higher eukaryotes undergo alternative splicing to generate more than one mature product. By targeting the open reading frame region this process increases diversity of protein isoforms beyond the nominal coding capacity of the genome. However, alternative splicing also frequently controls output levels and spatiotemporal features of cellular and organismal gene expression programs. Here we discuss how these non-coding functions of alternative splicing contribute to development through regulation of mRNA stability, translational efficiency and cellular localization. PMID:26493705
First Mitochondrial Genome from Nemouridae (Plecoptera) Reveals Novel Features of the Elongated Control Region and Phylogenetic Implications

PubMed Central

Chen, Zhi-Teng; Du, Yu-Zhou

2017-01-01

The complete mitochondrial genome (mitogenome) of Nemoura nankinensis (Plecoptera: Nemouridae) was sequenced as the first reported mitogenome from the family Nemouridae. The N. nankinensis mitogenome was the longest (16,602 bp) among reported plecopteran mitogenomes, and it contains 37 genes including 13 protein-coding genes (PCGs), 22 transfer RNA (tRNA) genes and two ribosomal RNA (rRNA) genes. Most PCGs used standard ATN as start codons, and TAN as termination codons. All tRNA genes of N. nankinensis could fold into the cloverleaf secondary structures except for trnSer (AGN), whose dihydrouridine (DHU) arm was reduced to a small loop. There was also a large non-coding region (control region, CR) in the N. nankinensis mitogenome. The 1751 bp CR was the longest and had the highest A+T content (81.8%) among stoneflies. A large tandem repeat region, five potential stem-loop (SL) structures, four tRNA-like structures and four conserved sequence blocks (CSBs) were detected in the elongated CR. The presence of these tRNA-like structures in the CR has never been reported in other plecopteran mitogenomes. These novel features of the elongated CR in N. nankinensis may have functions associated with the process of replication and transcription. Finally, phylogenetic reconstruction suggested that Nemouridae was the sister-group of Capniidae. PMID:28475163
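The A+T content quoted for the control region above is a simple base-composition ratio. As a minimal sketch, assuming the sequence is available as a plain string, it could be computed as follows. The helper name `at_content` and the choice to ignore ambiguous bases such as N are illustrative assumptions, not details from the study.

```python
def at_content(seq):
    """Return the A+T percentage of a DNA string.

    Only unambiguous bases (A, C, G, T) are counted; anything else,
    such as N or gap characters, is ignored (an illustrative choice).
    """
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A/C/G/T bases")
    at = sum(1 for b in counted if b in "AT")
    return 100.0 * at / len(counted)


# Usage on a toy sequence (not the N. nankinensis control region).
toy_percent = at_content("AATTGC")
```

Applied to a full control region sequence, the same function would give the kind of percentage reported in the abstract.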
First Mitochondrial Genome from Nemouridae (Plecoptera) Reveals Novel Features of the Elongated Control Region and Phylogenetic Implications.

PubMed

Chen, Zhi-Teng; Du, Yu-Zhou

2017-05-05

The complete mitochondrial genome (mitogenome) of Nemoura nankinensis (Plecoptera: Nemouridae) was sequenced as the first reported mitogenome from the family Nemouridae. The N. nankinensis mitogenome was the longest (16,602 bp) among reported plecopteran mitogenomes, and it contains 37 genes including 13 protein-coding genes (PCGs), 22 transfer RNA (tRNA) genes and two ribosomal RNA (rRNA) genes. Most PCGs used standard ATN as start codons, and TAN as termination codons. All tRNA genes of N. nankinensis could fold into the cloverleaf secondary structures except for trnSer (AGN), whose dihydrouridine (DHU) arm was reduced to a small loop. There was also a large non-coding region (control region, CR) in the N. nankinensis mitogenome. The 1751 bp CR was the longest and had the highest A+T content (81.8%) among stoneflies. A large tandem repeat region, five potential stem-loop (SL) structures, four tRNA-like structures and four conserved sequence blocks (CSBs) were detected in the elongated CR. The presence of these tRNA-like structures in the CR has never been reported in other plecopteran mitogenomes. These novel features of the elongated CR in N. nankinensis may have functions associated with the process of replication and transcription. Finally, phylogenetic reconstruction suggested that Nemouridae was the sister-group of Capniidae.
Rare pseudoautosomal copy-number variations involving SHOX and/or its flanking regions in individuals with and without short stature.

PubMed

Fukami, Maki; Naiki, Yasuhiro; Muroya, Koji; Hamajima, Takashi; Soneda, Shun; Horikawa, Reiko; Jinno, Tomoko; Katsumi, Momori; Nakamura, Akie; Asakura, Yumi; Adachi, Masanori; Ogata, Tsutomu; Kanzaki, Susumu

2015-09-01

Pseudoautosomal region 1 (PAR1) contains SHOX, in addition to seven highly conserved non-coding DNA elements (CNEs) with cis-regulatory activity. Microdeletions involving SHOX exons 1-6a and/or the CNEs result in idiopathic short stature (ISS) and Leri-Weill dyschondrosteosis (LWD). Here, we report six rare copy-number variations (CNVs) in PAR1 identified through copy-number analyses of 245 ISS/LWD patients and 15 unaffected individuals. The six CNVs consisted of three microduplications encompassing SHOX and some of the CNEs, two microduplications in the SHOX 3'-region affecting one or four of the downstream CNEs, and a microdeletion involving SHOX exon 6b and its neighboring CNE. The amplified DNA fragments of two SHOX-containing duplications were detected at chromosomal regions adjacent to the original positions. The breakpoints of a SHOX-containing duplication resided within Alu repeats. A microduplication encompassing four downstream CNEs was identified in an unaffected father-daughter pair, whereas the other five CNVs were detected in ISS patients. These results suggest that microduplications involving SHOX cause ISS by disrupting the cis-regulatory machinery of this gene and that at least some of the microduplications in PAR1 arise from Alu-mediated non-allelic homologous recombination. The pathogenicity of other rare PAR1-linked CNVs, such as CNE-containing microduplications and exon 6b-flanking microdeletions, merits further investigation.
Differences in Relative Levels of 88 microRNAs in Various Regions of the Normal Adult Human Brain.

PubMed

Filatova, Elena V; Alieva, Anelya; Shadrina, Maria I; Slominsky, Petr A

2017-08-16

Since the discovery of microRNAs (miRNAs) in the 1990s, our knowledge about their biology has grown considerably. The increasing number of studies addressing the role of miRNAs in development and in various diseases emphasizes the need for a comprehensive catalogue of accurate sequence, expression and conservation information regarding the large number of miRNAs proposed recently in all organs and tissues. The objective of this study was to provide data on the levels of miRNA expression in 15 tissues of the normal human brain. We conducted an analysis of the relative levels of 88 of the most abundantly expressed and best characterized miRNA derived postmortem from well-characterized samples of various regions of the brains from five normal individuals. The cluster analysis revealed some differences in the relative levels of these miRNAs among the brain regions studied. Such diversity can be explained by different functioning of these brain regions. We hope that the data from the current study are a resource that will be useful to our colleagues in this exciting field, as more hypotheses will be generated and tested with regard to small noncoding RNA in the human brain in healthy and disease states.
Junk DNA and the long non-coding RNA twist in cancer genetics

PubMed Central

Ling, Hui; Vincent, Kimberly; Pichler, Martin; Fodde, Riccardo; Berindan-Neagoe, Ioana; Slack, Frank J.; Calin, George A

2015-01-01

The central dogma of molecular biology states that the flow of genetic information moves from DNA to RNA to protein. However, in the last decade this dogma has been challenged by new findings on non-coding RNAs (ncRNAs) such as microRNAs (miRNAs). More recently, long non-coding RNAs (lncRNAs) have attracted much attention due to their large number and biological significance. Many lncRNAs have been identified as mapping to regulatory elements including gene promoters and enhancers, ultraconserved regions, and intergenic regions of protein-coding genes. Yet, the biological function and molecular mechanisms of lncRNAs in human diseases in general and cancer in particular remain largely unknown. Data from the literature suggest that lncRNAs, often via interaction with proteins, function at specific genomic loci or use their own transcription loci for regulatory activity. In this review, we summarize recent findings supporting the importance of DNA loci in lncRNA function, and the underlying molecular mechanisms via cis or trans regulation, and discuss their implications in cancer. In addition, we use the 8q24 genomic locus, a region containing interactive SNPs, DNA regulatory elements and lncRNAs, as an example to illustrate how single nucleotide polymorphisms (SNPs) located within lncRNAs may be functionally associated with the individual’s susceptibility to cancer. PMID:25619839
CUDR promotes liver cancer stem cell growth through upregulating TERT and C-Myc

PubMed Central

Pu, Hu; Zheng, Qidi; Li, Haiyan; Wu, Mengying; An, Jiahui; Gui, Xin; Li, Tianming; Lu, Dongdong

2015-01-01

Cancer up-regulated drug resistant (CUDR) is a novel non-coding RNA gene. Herein, we demonstrate that excessive CUDR cooperates with excessive CyclinD1 or PTEN depletion to accelerate liver cancer stem cell growth and liver stem cell malignant transformation in vitro and in vivo. Mechanistically, we reveal that a decrease of PTEN in cells may increase the binding capacity of CUDR to CyclinD1. The CUDR-CyclinD1 complex then loads onto the promoter region of the long noncoding RNA H19, which may reduce DNA methylation at the H19 promoter region and thereby enhance H19 expression. Strikingly, the overexpression of H19 increases the binding of TERT to TERC and reduces the interplay between TERT and TERRA, thus enhancing cellular telomerase activity and extending telomere length. On the other hand, the insulator CTCF recruits the CUDR-CyclinD1 complex to form a composite CUDR-CyclinD1-insulator CTCF complex that occupies the C-myc gene promoter region, increasing the output of the oncogene C-myc. Ultimately, excessive TERT and C-myc lead to malignant proliferation of liver cancer stem cells and hepatocyte-like stem cells. Understanding the novel functions of the long noncoding RNA CUDR will help in the development of new liver cancer therapeutic and diagnostic approaches. PMID:26513297
Regulation of Six1 expression by evolutionarily conserved enhancers in tetrapods.

PubMed

Sato, Shigeru; Ikeda, Keiko; Shioi, Go; Nakao, Kazuki; Yajima, Hiroshi; Kawakami, Kiyoshi

2012-08-01

The Six1 homeobox gene plays critical roles in vertebrate organogenesis. Mice deficient for Six1 show severe defects in organs such as skeletal muscle, kidney, thymus, sensory organs and ganglia derived from cranial placodes, and mutations in human SIX1 cause branchio-oto-renal syndrome, an autosomal dominant developmental disorder characterized by hearing loss and branchial defects. The present study was designed to identify enhancers responsible for the dynamic expression pattern of Six1 during mouse embryogenesis. The results showed distinct enhancer activities of seven conserved non-coding sequences (CNSs) retained in tetrapod Six1 loci. The activities were detected in all cranial placodes (excluding the lens placode), dorsal root ganglia, somites, nephrogenic cord, notochord and cranial mesoderm. The major Six1-expression domains during development were covered by the sum of activities of these enhancers, together with the previously identified enhancer for the pre-placodal region and foregut endoderm. Thus, the eight CNSs identified in our series of studies represent major evolutionarily conserved enhancers responsible for the expression of Six1 in tetrapods. The results also confirmed that chick electroporation is a robust means to decipher regulatory information stored in vertebrate genomes. Mutational analysis of the most conserved placode-specific enhancer, Six1-21, indicated that the enhancer integrates a variety of inputs from Sox, Pax, Fox, Six, Wnt/Lef1 and basic helix-loop-helix proteins. Positive autoregulation of Six1 is achieved through the regulation of Six protein-binding sites. The identified Six1 enhancers provide valuable tools to understand the mechanism of Six1 regulation and to manipulate gene expression in the developing embryo, particularly in the sensory organs.
MicroRNAome of Spodoptera frugiperda cells (Sf9) and its alteration following baculovirus infection.

PubMed

Mehrabadi, Mohammad; Hussain, Mazhar; Asgari, Sassan

2013-06-01

MicroRNAs (miRNAs) as small non-coding RNAs play important roles in many biological processes such as development, cell signalling and immune response. Studies also suggest that miRNAs are important in host-virus interactions where the host limits virus infection by differentially expressing miRNAs that target essential viral genes. Here, we identified conserved and new miRNAs from Spodoptera frugiperda cells (Sf9) using a combination of deep sequencing and bioinformatics as well as experimental approaches. S. frugiperda miRNAs share common features of miRNAs in other organisms, such as uracil (U) at the 5' end of miRNA. The 5' ends of the miRNAs were more conserved than the 3' ends, revealing evolutionary protection of the seed region in miRNAs. The predominant miRNAs were found to be conserved among arthropods. The majority of homologous miRNAs were found in Bombyx mori, with 76 of the 90 identified miRNAs. We found that seed shifting and arm switching have happened in this insect's miRNAs. Expression levels of the majority of miRNAs changed following baculovirus infection. Results revealed that baculovirus infection mainly led to an overall suppression of cellular miRNAs. We found four different genes being regulated by sfr-miR-184 at the post-transcriptional level. The data presented here further support conservation of miRNAs in insects and other organisms. In addition, the results reveal a differential expression of host miRNAs upon baculovirus infection, suggesting their potential roles in host-virus interactions. Seed shifting and arm switching happened during evolution of miRNAs in different insects and caused miRNA diversification, which led to changes in the target repository of miRNAs.

CsrB, a noncoding regulatory RNA, is required for BarA-dependent expression of biocontrol traits in Rahnella aquatilis HX2.

PubMed

Mei, Li; Xu, Sanger; Lu, Peng; Lin, Haiping; Guo, Yanbin; Wang, Yongjun

2017-01-01

Rahnella aquatilis is ubiquitous, and certain strains have potential for application as plant growth-promoting rhizobacteria. R. aquatilis HX2 is a biocontrol agent that produces an antibacterial substance (ABS) and shows efficient biocontrol against crown gall caused by Agrobacterium vitis on sunflower and grapevine plants. The regulatory network governing ABS production and biocontrol activity is still poorly understood. In this study, a transposon-mediated mutagenesis strategy was used to investigate the regulators involved in the biocontrol activity of R. aquatilis HX2. A 366-nt noncoding RNA, CsrB, was identified that regulated ABS production in vitro and biocontrol activity against crown gall on sunflower plants in vivo. The predicted product of noncoding RNA CsrB contains 14 stem-loop structures and an additional ρ-independent terminator hairpin, with 23 characteristic GGA motifs in the loops and other unpaired regions. CsrB is required for the ABS production and biocontrol activity regulated by the two-component regulatory system BarA/UvrY in R. aquatilis HX2. The noncoding RNA CsrB regulates BarA-dependent ABS production and biocontrol activity in R. aquatilis HX2. To the best of our knowledge, this is the first report of noncoding RNA as a regulator for biocontrol function in R. aquatilis.
CsrB, a noncoding regulatory RNA, is required for BarA-dependent expression of biocontrol traits in Rahnella aquatilis HX2

PubMed Central

Lu, Peng; Lin, Haiping; Guo, Yanbin

2017-01-01

Background: Rahnella aquatilis is ubiquitous, and certain strains have potential for application as plant growth-promoting rhizobacteria. R. aquatilis HX2 is a biocontrol agent that produces an antibacterial substance (ABS) and shows efficient biocontrol against crown gall caused by Agrobacterium vitis on sunflower and grapevine plants. The regulatory network governing ABS production and biocontrol activity is still poorly understood. Methodology/Principal findings: In this study, a transposon-mediated mutagenesis strategy was used to investigate the regulators involved in the biocontrol activity of R. aquatilis HX2. A 366-nt noncoding RNA, CsrB, was identified that regulated ABS production in vitro and biocontrol activity against crown gall on sunflower plants in vivo. The predicted product of noncoding RNA CsrB contains 14 stem-loop structures and an additional ρ-independent terminator hairpin, with 23 characteristic GGA motifs in the loops and other unpaired regions. CsrB is required for the ABS production and biocontrol activity regulated by the two-component regulatory system BarA/UvrY in R. aquatilis HX2. Conclusion/Significance: The noncoding RNA CsrB regulates BarA-dependent ABS production and biocontrol activity in R. aquatilis HX2. To the best of our knowledge, this is the first report of noncoding RNA as a regulator for biocontrol function in R. aquatilis. PMID:29091941
Genomic Editing of Non-Coding RNA Genes with CRISPR/Cas9 Ushers in a Potential Novel Approach to Study and Treat Schizophrenia

PubMed Central

Zhuo, Chuanjun; Hou, Weihong; Hu, Lirong; Lin, Chongguang; Chen, Ce; Lin, Xiaodong

2017-01-01

Schizophrenia is a genetically related mental illness, in which the majority of genetic alterations occur in the non-coding regions of the human genome. In the past decade, a growing number of regulatory non-coding RNAs (ncRNAs) including microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) have been identified to be strongly associated with schizophrenia. However, the studies of these ncRNAs in the pathophysiology of schizophrenia and the reverting of their genetic defects in restoration of the normal phenotype have been hampered by insufficient technology to manipulate these ncRNA genes effectively as well as a lack of appropriate animal models. Most recently, a revolutionary gene editing technology known as Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated nuclease 9 (Cas9; CRISPR/Cas9) has been developed that enables researchers to overcome these challenges. In this review article, we mainly focus on the schizophrenia-related ncRNAs and the use of CRISPR/Cas9-mediated editing on the non-coding regions of the genomic DNA in proving a causal relationship between the genetic defects and the pathophysiology of schizophrenia. We subsequently discuss the potential of translating this advanced technology into a clinical therapy for schizophrenia, although the CRISPR/Cas9 technology is currently still in its infancy and too immature to be used in the treatment of diseases. Furthermore, we suggest strategies to accelerate the pace from the bench to the bedside. This review describes the application of the powerful and feasible CRISPR/Cas9 technology to manipulate schizophrenia-associated ncRNA genes. This technology could help researchers tackle this complex health problem and perhaps other genetically related mental disorders due to the overlapping genetic alterations of schizophrenia with other mental illnesses. PMID:28217082
Comprehensive Identification of Long Non-coding RNAs in Purified Cell Types from the Brain Reveals Functional LncRNA in OPC Fate Determination.

PubMed

Dong, Xiaomin; Chen, Kenian; Cuevas-Diaz Duran, Raquel; You, Yanan; Sloan, Steven A; Zhang, Ye; Zong, Shan; Cao, Qilin; Barres, Ben A; Wu, Jia Qian

2015-12-01

Long non-coding RNAs (lncRNAs) (> 200 bp) play crucial roles in transcriptional regulation during numerous biological processes. However, it is challenging to comprehensively identify lncRNAs, because they are often expressed at low levels and with more cell-type specificity than are protein-coding genes. In the present study, we performed ab initio transcriptome reconstruction using eight purified cell populations from mouse cortex and detected more than 5000 lncRNAs. Predicting the functions of lncRNAs using cell-type specific data revealed their potential functional roles in Central Nervous System (CNS) development. We performed motif searches in ENCODE DNase I digital footprint data and Mouse ENCODE promoters to infer transcription factor (TF) occupancy. By integrating TF binding and cell-type specific transcriptomic data, we constructed a novel framework that is useful for systematically identifying lncRNAs that are potentially essential for brain cell fate determination. Based on this integrative analysis, we identified lncRNAs that are regulated during Oligodendrocyte Precursor Cell (OPC) differentiation from Neural Stem Cells (NSCs) and that are likely to be involved in oligodendrogenesis. The top candidate, lnc-OPC, shows highly specific expression in OPCs and remarkable sequence conservation among placental mammals. Interestingly, lnc-OPC is significantly up-regulated in glial progenitors from experimental autoimmune encephalomyelitis (EAE) mouse models compared to wild-type mice. OLIG2-binding sites in the upstream regulatory region of lnc-OPC were identified by ChIP (chromatin immunoprecipitation)-Sequencing and validated by luciferase assays. Loss-of-function experiments confirmed that lnc-OPC plays a functional role in OPC genesis. Overall, our results substantiated the role of lncRNA in OPC fate determination and provided an unprecedented data source for future functional investigations in CNS cell types. 
We present our datasets and analysis results via the interactive genome browser at our laboratory website that is freely accessible to the research community. This is the first lncRNA expression database of collective populations of glia, vascular cells, and neurons. We anticipate that these studies will advance the knowledge of this major class of non-coding genes and their potential roles in neurological development and diseases.
Characterization and Analysis of Whole Transcriptome of Giant Panda Spleens: Implying Critical Roles of Long Non-Coding RNAs in Immunity.

PubMed

Peng, Rui; Liu, Yuliang; Cai, Zhigang; Shen, Fujun; Chen, Jiasong; Hou, Rong; Zou, Fangdong

2018-01-01

Giant pandas, an endangered species, are a powerful symbol of species conservation. Giant pandas may suffer from a variety of diseases. Owing to their highly specialized diet of bamboo, giant pandas are thought to have a relatively weak ability to resist diseases. The spleen is the largest organ in the lymphatic system. However, there is little known about giant panda spleen at a molecular level. Thus, clarifying the regulatory mechanisms of spleen could help us further understand the immune system of the giant panda as well as its conservation. The two giant panda spleens were from two male individuals, one newborn and one an adult, in a non-pathological condition. The whole transcriptomes of mRNA, lncRNA, miRNA, and circRNA in the two spleens were sequenced using the Illumina HiSeq platform. EBseq and IDEG6 were used to observe the differentially expressed genes (DEGs) between these two spleens. Gene Ontology and KEGG analyses were used to annotate the function of DEGs. Furthermore, networks between non-coding RNAs and protein-coding genes were constructed to investigate the relationship between non-coding RNAs and immune-associated genes. By comparative analysis of the whole transcriptomes of these two spleens, we found that one of the major roles of lncRNAs could be involved in the regulation of immune responses of giant panda spleens. In addition, our results also revealed that microRNAs and circRNAs may have evolved to regulate a large set of biological processes of giant panda spleens, and circRNAs may function as miRNA sponges. To our knowledge, this is the first report of lncRNAs and circRNAs in giant panda, which could be a useful resource for further giant panda research. Our study reveals the potential functional roles of miRNAs, lncRNAs, and circRNAs in giant panda spleen.
Detection of 224 candidate structured RNAs by comparative analysis of specific subsets of intergenic regions

PubMed Central

Lünse, Christina E.; Corbino, Keith A.; Ames, Tyler D.; Nelson, James W.; Roth, Adam; Perkins, Kevin R.; Sherlock, Madeline E.

2017-01-01

The discovery of structured non-coding RNAs (ncRNAs) in bacteria can reveal new facets of biology and biochemistry. Comparative genomics analyses executed by powerful computer algorithms have successfully been used to uncover many novel bacterial ncRNA classes in recent years. However, this general search strategy favors the discovery of more common ncRNA classes, whereas progressively rarer classes are correspondingly more difficult to identify. In the current study, we confront this problem by devising several methods to select subsets of intergenic regions that can concentrate these rare RNA classes, thereby increasing the probability that comparative sequence analysis approaches will reveal their existence. By implementing these methods, we discovered 224 novel ncRNA classes, which include ROOL RNA, an RNA class averaging 581 nt and present in multiple phyla, several highly conserved and widespread ncRNA classes with properties that suggest sophisticated biochemical functions and a multitude of putative cis-regulatory RNA classes involved in a variety of biological processes. We expect that further research on these newly found RNA classes will reveal additional aspects of novel biology, and allow for greater insights into the biochemistry performed by ncRNAs. PMID:28977401
Phylogeography and conservation genetics of Hygrophila pogonocalyx (Acanthaceae) based on atpB-rbcL noncoding spacer cpDNA.

PubMed

Huang, Jao-Ching; Wang, Wei-Kuang; Peng, Ching-I; Chiang, Tzen-Yuh

2005-02-01

Genetic variation in the atpB-rbcL intergenic spacer region of chloroplast DNA (cpDNA) was investigated in Hygrophila pogonocalyx Hayata (Acanthaceae), an endangered and endemic species in Taiwan. In this aquatic species, seed dispersal from capsules via elasticity is constrained by gravity and is thereby confined within populations, resulting in limited gene flow between populations. In this study, a total of 849 bp of the cpDNA atpB-rbcL spacer were sequenced from eight populations of H. pogonocalyx. Nucleotide diversity in the cpDNA is low (theta = 0.00343+/-0.00041). The distribution of genetic variation among populations agrees with an "isolation-by-distance" model. Two geographically correlated groups, the western and eastern regions, were identified in a neighbor-joining tree and a minimum-spanning network. Phylogeographical analyses based on the cpDNA network suggest that the present-day differentiation between western and eastern groups of H. pogonocalyx resulted from past fragmentation. The differentiation between eastern and western populations may be ascribed to isolation since the formation of the Central Mountain Range about 5 million years ago, which is consistent with the rate estimates based on a molecular clock of cpDNA.
Global Identification and Characterization of Transcriptionally Active Regions in the Rice Genome

PubMed Central

Stolc, Viktor; Deng, Wei; He, Hang; Korbel, Jan; Chen, Xuewei; Tongprasit, Waraporn; Ronald, Pamela; Chen, Runsheng; Gerstein, Mark; Wang Deng, Xing

2007-01-01

Genome tiling microarray studies have consistently documented rich transcriptional activity beyond the annotated genes. However, systematic characterization and transcriptional profiling of the putative novel transcripts on the genome scale are still lacking. We report here the identification of 25,352 and 27,744 transcriptionally active regions (TARs) not encoded by annotated exons in the rice (Oryza sativa) subspecies japonica and indica, respectively. The non-exonic TARs account for approximately two thirds of the total TARs detected by tiling arrays and represent transcripts likely conserved between japonica and indica. Transcription of 21,018 (83%) japonica non-exonic TARs was verified through expression profiling in 10 tissue types using a re-array in which annotated genes and TARs were each represented by five independent probes. Subsequent analyses indicate that about 80% of the japonica TARs that were not assigned to annotated exons can be assigned to various putatively functional or structural elements of the rice genome, including splice variants, uncharacterized portions of incompletely annotated genes, antisense transcripts, duplicated gene fragments, and potential non-coding RNAs. These results provide a systematic characterization of non-exonic transcripts in rice and thus expand the current view of the complexity and dynamics of the rice transcriptome. PMID:17372628
RNA editing differently affects protein-coding genes in D. melanogaster and H. sapiens.

PubMed

Grassi, Luigi; Leoni, Guido; Tramontano, Anna

2015-07-14

When an RNA editing event occurs within a coding sequence it can lead to a different encoded amino acid. The biological significance of these events remains an open question: they can modulate protein functionality, increase the complexity of transcriptomes or arise from a loose specificity of the involved enzymes. We analysed the editing events in coding regions that do or do not produce a change in the encoded amino acid (nonsynonymous and synonymous events, respectively) in D. melanogaster and in H. sapiens and compared them with the appropriate random models. Interestingly, our results show that the phenomenon has rather different characteristics in the two organisms. For example, we confirm the observation that editing events occur more frequently in non-coding than in coding regions, and report that this effect is much more evident in H. sapiens. Additionally, in this latter organism, editing events tend to affect less conserved residues. The less frequently occurring editing events in Drosophila tend to avoid drastic amino acid changes. Interestingly, we find that, in Drosophila, changes from less frequently used codons to more frequently used ones are favoured, while this is not the case in H. sapiens.
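The synonymous/nonsynonymous distinction used in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes the standard genetic code and assumes that each event is an A-to-I edit read as G, and the function name `classify_edit` is hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_edit(codon, position, edited_base="G"):
    """Classify an A-to-I (read as G) edit at a 0-based codon position.

    Hypothetical helper: returns "synonymous" if the encoded amino acid
    is unchanged by the edit, otherwise "nonsynonymous".
    """
    codon = codon.upper()
    if codon[position] != "A":
        raise ValueError("assumed A-to-I editing: the edited base must be A")
    edited = codon[:position] + edited_base + codon[position + 1:]
    same = CODON_TABLE[codon] == CODON_TABLE[edited]
    return "synonymous" if same else "nonsynonymous"


if __name__ == "__main__":
    # AAA (Lys) -> AAG (Lys): third-position edit, amino acid unchanged.
    print(classify_edit("AAA", 2))
    # AAA (Lys) -> GAA (Glu): first-position edit, amino acid changed.
    print(classify_edit("AAA", 0))
```

Comparing the counts of the two classes against a random model, as the abstract describes, would require the actual editing sites and is outside this sketch.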
SELEX and SHAPE reveal that sequence motifs and an extended hairpin in the 5' portion of Turnip crinkle virus satellite RNA C mediate fitness in plants.

PubMed

Bayne, Charlie F; Widawski, Max E; Gao, Feng; Masab, Mohammed H; Chattopadhyay, Maitreyi; Murawski, Allison M; Sansevere, Robert M; Lerner, Bryan D; Castillo, Rinaldys J; Griesman, Trevor; Fu, Jiantao; Hibben, Jennifer K; Garcia-Perez, Alma D; Simon, Anne E; Kushner, David B

2018-07-01

Noncoding RNAs use their sequence and/or structure to mediate function(s). The 5' portion (166 nt) of the 356-nt noncoding satellite RNA C (satC) of Turnip crinkle virus (TCV) was previously modeled to contain a central region with two stem-loops (H6 and H7) and a large connecting hairpin (H2). We now report that in vivo functional selection (SELEX) experiments assessing sequence/structure requirements in H2, H6, and H7 reveal that H6 loop sequence motifs were recovered at nonrandom rates and only some residues are proposed to base-pair with accessible complementary sequences within the 5' central region. In vitro SHAPE of SELEX winners indicates that the central region is heavily base-paired, such that along with the lower stem and H2 region, one extensive hairpin exists composing the entire 5' region. As these SELEX winners are highly fit, these characteristics facilitate satRNA amplification in association with TCV in plants. Copyright © 2018 Elsevier Inc. All rights reserved.
The noncoding RNA IPW regulates the imprinted DLK1-DIO3 locus in an induced pluripotent stem cell model of Prader-Willi syndrome.

PubMed

Stelzer, Yonatan; Sagi, Ido; Yanuka, Ofra; Eiges, Rachel; Benvenisty, Nissim

2014-06-01

Parental imprinting is a form of epigenetic regulation that results in parent-of-origin differential gene expression. To study Prader-Willi syndrome (PWS), a developmental imprinting disorder, we generated case-derived induced pluripotent stem cells (iPSCs) harboring distinct aberrations in the affected region on chromosome 15. In studying PWS-iPSCs and human parthenogenetic iPSCs, we unexpectedly found substantial upregulation of virtually all maternally expressed genes (MEGs) in the imprinted DLK1-DIO3 locus on chromosome 14. Subsequently, we determined that IPW, a long noncoding RNA in the critical region of the PWS locus, is a regulator of the DLK1-DIO3 region, as its overexpression in PWS and parthenogenetic iPSCs resulted in downregulation of MEGs in this locus. We further show that gene expression changes in the DLK1-DIO3 region coincide with chromatin modifications rather than DNA methylation levels. Our results suggest that a subset of PWS phenotypes may arise from dysregulation of an imprinted locus distinct from the PWS region.
DCODE.ORG Anthology of Comparative Genomic Tools

DOE Office of Scientific and Technical Information (OSTI.GOV)

Loots, G G; Ovcharenko, I

2005-01-11

Comparative genomics provides the means to demarcate functional regions in anonymous DNA sequences. The successful application of this method to identifying novel genes is currently shifting to deciphering the noncoding encryption of gene regulation across genomes. To facilitate the use of comparative genomics to practical applications in genetics and genomics we have developed several analytical and visualization tools for the analysis of arbitrary sequences and whole genomes. These tools include two alignment tools: zPicture and Mulan; a phylogenetic shadowing tool: eShadow for identifying lineage- and species-specific functional elements; two evolutionary conserved transcription factor analysis tools: rVista and multiTF; a tool for extracting cis-regulatory modules governing the expression of co-regulated genes, CREME; and a dynamic portal to multiple vertebrate and invertebrate genome alignments, the ECR Browser. Here we briefly describe each one of these tools and provide specific examples on their practical applications. All the tools are publicly available at the http://www.dcode.org/ web site.
Dcode.org anthology of comparative genomic tools.

PubMed

Loots, Gabriela G; Ovcharenko, Ivan

2005-07-01

Comparative genomics provides the means to demarcate functional regions in anonymous DNA sequences. The successful application of this method to identifying novel genes is currently shifting to deciphering the non-coding encryption of gene regulation across genomes. To facilitate the practical application of comparative sequence analysis to genetics and genomics, we have developed several analytical and visualization tools for the analysis of arbitrary sequences and whole genomes. These tools include two alignment tools, zPicture and Mulan; a phylogenetic shadowing tool, eShadow for identifying lineage- and species-specific functional elements; two evolutionary conserved transcription factor analysis tools, rVista and multiTF; a tool for extracting cis-regulatory modules governing the expression of co-regulated genes, Creme 2.0; and a dynamic portal to multiple vertebrate and invertebrate genome alignments, the ECR Browser. Here, we briefly describe each one of these tools and provide specific examples on their practical applications. All the tools are publicly available at the http://www.dcode.org/ website.
An integrated expression atlas of miRNAs and their promoters in human and mouse

PubMed Central

de Rie, Derek; Abugessaisa, Imad; Alam, Tanvir; Arner, Erik; Arner, Peter; Ashoor, Haitham; Åström, Gaby; Babina, Magda; Bertin, Nicolas; Burroughs, A. Maxwell; Carlisle, Ailsa J.; Daub, Carsten O.; Detmar, Michael; Deviatiiarov, Ruslan; Fort, Alexandre; Gebhard, Claudia; Goldowitz, Daniel; Guhl, Sven; Ha, Thomas J.; Harshbarger, Jayson; Hasegawa, Akira; Hashimoto, Kosuke; Herlyn, Meenhard; Heutink, Peter; Hitchens, Kelly J.; Hon, Chung Chau; Huang, Edward; Ishizu, Yuri; Kai, Chieko; Kasukawa, Takeya; Klinken, Peter; Lassmann, Timo; Lecellier, Charles-Henri; Lee, Weonju; Lizio, Marina; Makeev, Vsevolod; Mathelier, Anthony; Medvedeva, Yulia A.; Mejhert, Niklas; Mungall, Christopher J.; Noma, Shohei; Ohshima, Mitsuhiro; Okada-Hatakeyama, Mariko; Persson, Helena; Rizzu, Patrizia; Roudnicky, Filip; Sætrom, Pål; Sato, Hiroki; Severin, Jessica; Shin, Jay W.; Swoboda, Rolf K.; Tarui, Hiroshi; Toyoda, Hiroo; Vitting-Seerup, Kristoffer; Winteringham, Louise; Yamaguchi, Yoko; Yasuzawa, Kayoko; Yoneda, Misako; Yumoto, Noriko; Zabierowski, Susan; Zhang, Peter G.; Wells, Christine A.; Summers, Kim M.; Kawaji, Hideya; Sandelin, Albin; Rehli, Michael; Hayashizaki, Yoshihide; Carninci, Piero; Forrest, Alistair R. R.; de Hoon, Michiel J. L.

2018-01-01

MicroRNAs (miRNAs) are short non-coding RNAs with key roles in cellular regulation. As part of the fifth edition of the Functional Annotation of Mammalian Genome (FANTOM5) project, we created an integrated expression atlas of miRNAs and their promoters by deep-sequencing 492 short RNA (sRNA) libraries, with matching Cap Analysis Gene Expression (CAGE) data, from 396 human and 47 mouse RNA samples. Promoters were identified for 1,357 human and 804 mouse miRNAs and showed strong sequence conservation between species. We also found that primary and mature miRNA expression levels were correlated, allowing us to use the primary miRNA measurements as a proxy for mature miRNA levels in a total of 1,829 human and 1,029 mouse CAGE libraries. We thus provide a broad atlas of miRNA expression and promoters in primary mammalian cells, establishing a foundation for detailed analysis of miRNA expression patterns and transcriptional control regions. PMID:28829439
The Histone Modification H3K27me3 Is Retained after Gene Duplication and Correlates with Conserved Noncoding Sequences in Arabidopsis

PubMed Central

Berke, Lidija; Snel, Berend

2014-01-01

The histone modification H3K27me3 is involved in repression of transcription and plays a crucial role in developmental transitions in both animals and plants. It is deposited by PRC2 (Polycomb repressive complex 2), a conserved protein complex. In Arabidopsis thaliana, H3K27me3 is found at 15% of all genes. These tend to encode transcription factors and other regulators important for development. However, it is not known how PRC2 is recruited to target loci nor how this set of target genes arose during Arabidopsis evolution. To resolve the latter, we integrated A. thaliana gene families with five independent genome-wide H3K27me3 data sets. Gene families were either significantly enriched or depleted of H3K27me3, showing a strong impact of shared ancestry to H3K27me3 distribution. To quantify this, we performed ancestral state reconstruction of H3K27me3 on phylogenetic trees of gene families. The set of H3K27me3-marked genes changed less than expected by chance, suggesting that H3K27me3 was retained after gene duplication. This retention suggests that the PRC2-recruiting signal could be encoded in the DNA and also conserved among certain duplicated genes. Indeed, H3K27me3-marked genes were overrepresented among paralogs sharing conserved noncoding sequences (CNSs) that are enriched with transcription factor binding sites. The association of upstream CNSs with H3K27me3-marked genes represents the first genome-wide connection between H3K27me3 and potential regulatory elements in plants. Thus, we propose that CNSs likely function as part of the PRC2 recruitment in plants. PMID:24567304
The roles of picornavirus untranslated regions in infection and innate immunity

USDA-ARS's Scientific Manuscript database

Viral genomes have evolved to maximize their potential of overcoming host defense mechanisms and to induce a variety of disease syndromes. Structurally, a genome of a virus consists of coding and noncoding regions, and both have been shown to contribute to initiation and progression of disease. Ac...
Identification of MicroRNAs in the Coral Stylophora pistillata

PubMed Central

Liew, Yi Jin; Aranda, Manuel; Carr, Adrian; Baumgarten, Sebastian; Zoccola, Didier; Tambutté, Sylvie; Allemand, Denis; Micklem, Gos; Voolstra, Christian R.

2014-01-01

Coral reefs are major contributors to marine biodiversity. However, they are in rapid decline due to global environmental changes such as rising sea surface temperatures, ocean acidification, and pollution. Genomic and transcriptomic analyses have broadened our understanding of coral biology, but a study of the microRNA (miRNA) repertoire of corals is missing. miRNAs constitute a class of small non-coding RNAs of ∼22 nt in size that play crucial roles in development, metabolism, and stress response in plants and animals alike. In this study, we examined the coral Stylophora pistillata for the presence of miRNAs and the corresponding core protein machinery required for their processing and function. Based on small RNA sequencing, we present evidence for 31 bona fide microRNAs, 5 of which (miR-100, miR-2022, miR-2023, miR-2030, and miR-2036) are conserved in other metazoans. Homologues of Argonaute, Piwi, Dicer, Drosha, Pasha, and HEN1 were identified in the transcriptome of S. pistillata based on strong sequence conservation with known RNAi proteins, with additional support derived from phylogenetic trees. Examination of putative miRNA gene targets indicates potential roles in development, metabolism, immunity, and biomineralisation for several of the microRNAs. Here, we present first evidence of a functional RNAi machinery and five conserved miRNAs in S. pistillata, implying that miRNAs play a role in organismal biology of scleractinian corals. Analysis of predicted miRNA target genes in S. pistillata suggests potential roles of miRNAs in symbiosis and coral calcification. Given the importance of miRNAs in regulating gene expression in other metazoans, further expression analyses of small non-coding RNAs in transcriptional studies of corals should be informative about miRNA-affected processes and pathways. PMID:24658574
Predicting Gene Structure Changes Resulting from Genetic Variants via Exon Definition Features.

PubMed

Majoros, William H; Holt, Carson; Campbell, Michael S; Ware, Doreen; Yandell, Mark; Reddy, Timothy E

2018-04-25

Genetic variation that disrupts gene function by altering gene splicing between individuals can substantially influence traits and disease. In those cases, accurately predicting the effects of genetic variation on splicing can be highly valuable for investigating the mechanisms underlying those traits and diseases. While methods have been developed to generate high quality computational predictions of gene structures in reference genomes, the same methods perform poorly when used to predict the potentially deleterious effects of genetic changes that alter gene splicing between individuals. Underlying that discrepancy in predictive ability are the common assumptions by reference gene finding algorithms that genes are conserved, well-formed, and produce functional proteins. We describe a probabilistic approach for predicting recent changes to gene structure that may or may not conserve function. The model is applicable to both coding and noncoding genes, and can be trained on existing gene annotations without requiring curated examples of aberrant splicing. We apply this model to the problem of predicting altered splicing patterns in the genomes of individual humans, and we demonstrate that performing gene-structure prediction without relying on conserved coding features is feasible. The model predicts an unexpected abundance of variants that create de novo splice sites, an observation supported by both simulations and empirical data from RNA-seq experiments. While these de novo splice variants are commonly misinterpreted by other tools as coding or noncoding variants of little or no effect, we find that in some cases they can have large effects on splicing activity and protein products, and we propose that they may commonly act as cryptic factors in disease. The software is available from geneprediction.org/SGRF. Contact: bmajoros@duke.edu. Supplementary information is available at Bioinformatics online.
LncRNA, a new component of expanding RNA-protein regulatory network important for animal sperm development.

PubMed

Zhang, Chenwang; Gao, Liuze; Xu, Eugene Yujun

2016-11-01

Spermatogenesis is one of the fundamental processes of sexual reproduction, present in almost all metazoan animals. Like many other reproductive traits, developmental features and traits of spermatogenesis are under strong selective pressure to change, both at morphological and underlying molecular levels. Yet evidence suggests that some fundamental features of spermatogenesis may be ancient and conserved among metazoan species. Identifying the underlying conserved molecular mechanisms could reveal core components of metazoan spermatogenic machinery and provide novel insight into causes of human infertility. Conserved RNA-binding proteins and their interacting RNA network emerge to be a common theme important for animal sperm development. We review research on the recent addition to the RNA family - Long non-coding RNA (lncRNA) and its roles in spermatogenesis in the context of the expanding RNA-protein network. Copyright © 2016 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Helfenbein, Kevin G.; Brown, Wesley M.; Boore, Jeffrey L.

We have sequenced the complete mitochondrial DNA (mtDNA) of the articulate brachiopod Terebratalia transversa. The circular genome is 14,291 bp in size, relatively small compared to other published metazoan mtDNAs. The 37 genes commonly found in animal mtDNA are present; the size decrease is due to the truncation of several tRNA, rRNA, and protein genes, to some nucleotide overlaps, and to a paucity of non-coding nucleotides. Although the gene arrangement differs radically from those reported for other metazoans, some gene junctions are shared with two other articulate brachiopods, Laqueus rubellus and Terebratulina retusa. All genes in the T. transversa mtDNA, unlike those in most metazoan mtDNAs reported, are encoded by the same strand. The A+T content (59.1 percent) is low for a metazoan mtDNA, and there is a high propensity for homopolymer runs and a strong base-compositional strand bias. The coding strand is quite G+T-rich, a skew that is shared by the confamilial (laqueid) species L. rubellus, but opposite to that found in T. retusa, a cancellothyridid. These compositional skews are strongly reflected in the codon usage patterns and the amino acid compositions of the mitochondrial proteins, with markedly different usage observed between T. retusa and the two laqueids. This observation, plus the similarity of the laqueid non-coding regions to the reverse complement of the non-coding region of the cancellothyridid, suggest that an inversion that resulted in a reversal in the direction of first-strand replication has occurred in one of the two lineages. In addition to the presence of one non-coding region in T. transversa that is comparable to those in the other brachiopod mtDNAs, there are two others with the potential to form secondary structures; one or both of these may be involved in the process of transcript cleavage.

Unique XCI evolution in Tokudaia: initial XCI of the neo-X chromosome in Tokudaia muenninki and function loss of XIST in Tokudaia osimensis.

PubMed

Zushi, Hideki; Murata, Chie; Mizushima, Shusei; Nishida, Chizuko; Kuroiwa, Asato

2017-12-01

X chromosome inactivation (XCI) is an essential mechanism to compensate gene dosage in mammals. Here, we show that XCI has evolved differently in two species of the genus Tokudaia. The Amami spiny rat, Tokudaia osimensis, has a single X chromosome in males and females (XO/XO). By contrast, the Okinawa spiny rat, Tokudaia muenninki, has XX/XY sex chromosomes like most mammals, although the X chromosome has acquired a neo-X region by fusion with an autosome. BAC clones containing the XIST gene, which produces the long non-coding RNA XIST required for XCI, were obtained by screening of T. osimensis and T. muenninki BAC libraries. Each clone was mapped to the homologous region of the X inactivation center in the X chromosome of the two species by BAC-FISH. XIST RNAs were expressed in T. muenninki females, whereas no expression was observed in T. osimensis. The sequence of the XIST RNA was compared with that of mouse, showing that the XIST gene is highly conserved in T. muenninki. XIST RNAs were localized to the ancestral X region (Xq), to the heterochromatic region (pericentromeric region), and partially to the neo-X region (Xp). The hybridization pattern correlated with LINE-1 accumulation in Xq but not in Xp. Dosage of genes located on the neo-X chromosome was not compensated, suggesting that the neo-X region is in an early state of XCI. By contrast, many mutations were observed in the XIST gene of T. osimensis, indicating its loss of function in the XO/XO species.
Non-coding functions of alternative pre-mRNA splicing in development.

PubMed

Mockenhaupt, Stefan; Makeyev, Eugene V

2015-12-01

A majority of messenger RNA precursors (pre-mRNAs) in the higher eukaryotes undergo alternative splicing to generate more than one mature product. By targeting the open reading frame region this process increases diversity of protein isoforms beyond the nominal coding capacity of the genome. However, alternative splicing also frequently controls output levels and spatiotemporal features of cellular and organismal gene expression programs. Here we discuss how these non-coding functions of alternative splicing contribute to development through regulation of mRNA stability, translational efficiency and cellular localization. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
Re-annotation, improved large-scale assembly and establishment of a catalogue of noncoding loci for the genome of the model brown alga Ectocarpus.

PubMed

Cormier, Alexandre; Avia, Komlan; Sterck, Lieven; Derrien, Thomas; Wucher, Valentin; Andres, Gwendoline; Monsoor, Misharl; Godfroy, Olivier; Lipinska, Agnieszka; Perrineau, Marie-Mathilde; Van De Peer, Yves; Hitte, Christophe; Corre, Erwan; Coelho, Susana M; Cock, J Mark

2017-04-01

The genome of the filamentous brown alga Ectocarpus was the first to be completely sequenced from within the brown algal group and has served as a key reference genome both for this lineage and for the stramenopiles. We present a complete structural and functional reannotation of the Ectocarpus genome. The large-scale assembly of the Ectocarpus genome was significantly improved and genome-wide gene re-annotation using extensive RNA-seq data improved the structure of 11 108 existing protein-coding genes and added 2030 new loci. A genome-wide analysis of splicing isoforms identified an average of 1.6 transcripts per locus. A large number of previously undescribed noncoding genes were identified and annotated, including 717 loci that produce long noncoding RNAs. Conservation of lncRNAs between Ectocarpus and another brown alga, the kelp Saccharina japonica, suggests that at least a proportion of these loci serve a function. Finally, a large collection of single nucleotide polymorphism-based markers was developed for genetic analyses. These resources are available through an updated and improved genome database. This study significantly improves the utility of the Ectocarpus genome as a high-quality reference for the study of many important aspects of brown algal biology and as a reference for genomic analyses across the stramenopiles. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.
A long noncoding RNA contributes to neuropathic pain by silencing Kcna2 in primary afferent neurons

PubMed Central

Zhao, Xiuli; Tang, Zongxiang; Zhang, Hongkang; Atianjoh, Fidelis E.; Zhao, Jian-Yuan; Liang, Lingli; Wang, Wei; Guan, Xiaowei; Kao, Sheng-Chin; Tiwari, Vinod; Gao, Yong-Jing; Hoffman, Paul N.; Cui, Hengmi; Li, Min; Dong, Xinzhong; Tao, Yuan-Xiang

2013-01-01

Neuropathic pain is a refractory disease characterized by maladaptive changes in gene transcription and translation within the sensory pathway. Long noncoding RNAs (lncRNAs) are emerging as new players in gene regulation, but how lncRNAs operate in the development of neuropathic pain is unclear. Here we identify a conserved lncRNA for Kcna2 (named Kcna2 antisense RNA) in first-order sensory neurons of rat dorsal root ganglion (DRG). Peripheral nerve injury increases Kcna2 antisense RNA expression in injured DRG through activation of myeloid zinc finger protein 1, a transcription factor that binds to Kcna2 antisense RNA gene promoter. Mimicking this increase downregulates Kcna2, reduces total Kv current, increases excitability in DRG neurons, and produces neuropathic pain symptoms. Blocking this increase reverses nerve injury-induced downregulation of DRG Kcna2 and attenuates development and maintenance of neuropathic pain. These findings suggest native Kcna2 antisense RNA as a new therapeutic target for the treatment of neuropathic pain. PMID:23792947
Perspectives on the mechanism of transcriptional regulation by long non-coding RNAs.

PubMed

Roberts, Thomas C; Morris, Kevin V; Weinberg, Marc S

2014-01-01

Long non-coding RNAs (lncRNAs) are increasingly being recognized as epigenetic regulators of gene transcription. The diversity and complexity of lncRNA genes means that they exert their regulatory effects by a variety of mechanisms. Although there is still much to be learned about the mechanism of lncRNA function, general principles are starting to emerge. In particular, the application of high throughput (deep) sequencing methodologies has greatly advanced our understanding of lncRNA gene function. lncRNAs function as adaptors that link specific chromatin loci with chromatin-remodeling complexes and transcription factors. lncRNAs can act in cis or trans to guide epigenetic-modifier complexes to distinct genomic sites, or act as scaffolds which recruit multiple proteins simultaneously, thereby coordinating their activities. In this review we discuss the genomic organization of lncRNAs, the importance of RNA secondary structure to lncRNA functionality, the multitude of ways in which they interact with the genome, and what evolutionary conservation tells us about their function.
DUSP11 – An RNA phosphatase that regulates host and viral non-coding RNAs in mammalian cells

PubMed Central

Burke, James M.; Sullivan, Christopher S.

2017-01-01

Dual-specificity phosphatase 11 (DUSP11) is a conserved protein tyrosine phosphatase (PTP) in metazoans. The cellular substrates and physiologic activities of DUSP11 remain largely unknown. In nematodes, DUSP11 is required for normal development and RNA interference against endogenous RNAs (endo-RNAi) via molecular mechanisms that are not well understood. However, mammals lack analogous endo-RNAi pathways and consequently, a role for DUSP11 in mammalian RNA silencing was unanticipated. Recent work from our laboratory demonstrated that DUSP11 activity alters the silencing potential of noncanonical viral miRNAs in mammalian cells. Our studies further uncovered direct cellular substrates of DUSP11 and suggest that DUSP11 is part of a regulatory pathway that controls the abundance of select triphosphorylated noncoding RNAs. Here, we highlight recent findings and present new data that advance understanding of mammalian DUSP11 during gene silencing and discuss the emerging biological activities of DUSP11 in mammalian cells. PMID:28296624
Dissecting non-coding RNA mechanisms in cellulo by single-molecule high-resolution localization and counting

PubMed Central

Pitchiaya, Sethuramasundaram; Krishnan, Vishalakshi; Custer, Thomas C.; Walter, Nils G.

2013-01-01

Non-coding RNAs (ncRNAs) recently were discovered to outnumber their protein-coding counterparts, yet their diverse functions are still poorly understood. Here we report on a method for the intracellular Single-molecule High Resolution Localization and Counting (iSHiRLoC) of microRNAs (miRNAs), a conserved, ubiquitous class of regulatory ncRNAs that controls the expression of over 60% of all mammalian protein coding genes post-transcriptionally, by a mechanism shrouded by seemingly contradictory observations. We present protocols to execute single particle tracking (SPT) and single-molecule counting of functional microinjected, fluorophore-labeled miRNAs and thereby extract diffusion coefficients and molecular stoichiometries of micro-ribonucleoprotein (miRNP) complexes from living and fixed cells, respectively. This probing of miRNAs at the single molecule level sheds new light on the intracellular assembly/disassembly of miRNPs, thus beginning to unravel the dynamic nature of this important gene regulatory pathway and facilitating the development of a parsimonious model for their obscured mechanism of action. PMID:23820309
The molecular dynamics of long noncoding RNA control of transcription in PTEN and its pseudogene

PubMed Central

Lister, Nicholas; Shevchenko, Galina; Walshe, James L.; Groen, Jessica; Johnsson, Per; Vidarsdóttir, Linda; Grander, Dan; Ataide, Sandro F.; Morris, Kevin V.

2017-01-01

RNA has been found to interact with chromatin and modulate gene transcription. In human cells, little is known about how long noncoding RNAs (lncRNAs) interact with target loci in the context of chromatin. We find here, using the phosphatase and tensin homolog (PTEN) pseudogene as a model system, that antisense lncRNAs interact first with a 5′ UTR-containing promoter-spanning transcript, which is then followed by the recruitment of DNA methyltransferase 3a (DNMT3a), ultimately resulting in the transcriptional and epigenetic control of gene expression. Moreover, we find that the lncRNA and promoter-spanning transcript interaction are based on a combination of structural and sequence components of the antisense lncRNA. These observations suggest, on the basis of this one example, that evolutionary pressures may be placed on RNA structure more so than sequence conservation. Collectively, the observations presented here suggest a much more complex and vibrant RNA regulatory world may be operative in the regulation of gene expression. PMID:28847966
RPS8—a New Informative DNA Marker for Phylogeny of Babesia and Theileria Parasites in China

PubMed Central

Tian, Zhan-Cheng; Liu, Guang-Yuan; Yin, Hong; Luo, Jian-Xun; Guan, Gui-Quan; Luo, Jin; Xie, Jun-Ren; Shen, Hui; Tian, Mei-Yuan; Zheng, Jin-feng; Yuan, Xiao-song; Wang, Fang-fang

2013-01-01

Piroplasmosis is a serious debilitating and sometimes fatal disease. Phylogenetic relationships within piroplasmida are complex and remain unclear. We compared the intron–exon structure and DNA sequences of the RPS8 gene from Babesia and Theileria spp. isolates in China. Similar to 18S rDNA, the 40S ribosomal protein S8 gene, RPS8, including both coding and non-coding regions, is a useful and novel genetic marker for defining species boundaries and for inferring phylogenies because it tends to have little intra-specific variation but considerable inter-specific difference. However, more samples are needed to verify the usefulness of the RPS8 (coding and non-coding regions) gene as a marker for the phylogenetic position and detection of most Babesia and Theileria species, particularly for some closely related species. PMID:24244571
A transcriptional serenAID: the role of noncoding RNAs in class switch recombination

PubMed Central

Yewdell, William T.; Chaudhuri, Jayanta

2017-01-01

During an immune response, activated B cells may undergo class switch recombination (CSR), a molecular rearrangement that allows B cells to switch from expressing IgM and IgD to a secondary antibody heavy chain isotype such as IgG, IgA or IgE. Secondary antibody isotypes provide the adaptive immune system with distinct effector functions to optimally combat various pathogens. CSR occurs between repetitive DNA elements within the immunoglobulin heavy chain (Igh) locus, termed switch (S) regions and requires the DNA-modifying enzyme activation-induced cytidine deaminase (AID). AID-mediated DNA deamination within S regions initiates the formation of DNA double-strand breaks, which serve as biochemical beacons for downstream DNA repair pathways that coordinate the ligation of DNA breaks. Myriad factors contribute to optimal AID targeting; however, many of these factors also localize to genomic regions outside of the Igh locus. Thus, a current challenge is to explain the specific targeting of AID to the Igh locus. Recent studies have implicated noncoding RNAs in CSR, suggesting a provocative mechanism that incorporates Igh-specific factors to enable precise AID targeting. Here, we chronologically recount the rich history of noncoding RNAs functioning in CSR to provide a comprehensive context for recent and future discoveries. We present a model for the RNA-guided targeting of AID that attempts to integrate historical and recent findings, and highlight potential caveats. Lastly, we discuss testable hypotheses ripe for current experimentation, and explore promising ideas for future investigations. PMID:28535205
The complete mitochondrial genome of Pomacea canaliculata (Gastropoda: Ampullariidae).

PubMed

Zhou, Xuming; Chen, Yu; Zhu, Shanliang; Xu, Haigen; Liu, Yan; Chen, Lian

2016-01-01

The mitochondrial genome of Pomacea canaliculata (Gastropoda: Ampullariidae) is the first complete mtDNA sequence reported in the genus Pomacea. The total length of the mtDNA is 15,707 bp, containing 13 protein-coding genes, 2 ribosomal RNAs, 22 transfer RNAs, and a 359 bp non-coding region. The A + T content of the overall base composition of the H-strand is 71.7% (T: 41%, C: 12.7%, A: 30.7%, G: 15.6%). The ATP6, ATP8, CO1, CO2, ND1-3, ND5, ND6, ND4L and Cyt b genes begin with ATG as the start codon, whereas CO3 and ND4 begin with ATA. The ATP8, CO2-3, ND4L, ND2-6 and Cyt b genes are terminated with TAA as the stop codon, whereas ATP6, ND1, and CO1 end with TAG. A long non-coding region is found, in which a 23 bp repeat unit is repeated 11 times.
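The base-composition figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the helper name `at_content` and the toy sequence are assumptions, not part of the published analysis, and the repeat calculation assumes the 11 copies of the 23 bp unit are contiguous and uninterrupted.

```python
# Base composition as reported in the abstract (percent of H-strand bases).
reported = {"T": 41.0, "C": 12.7, "A": 30.7, "G": 15.6}


def at_content(sequence: str) -> float:
    """Return the A+T percentage of a DNA sequence (case-insensitive).

    Hypothetical helper for illustration; not taken from the paper.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(seq.count(base) for base in "AT") / len(seq)


# A + T from the reported percentages; rounded to one decimal place
# to avoid floating-point noise.
at_from_report = round(reported["A"] + reported["T"], 1)

# The four base percentages should account for the whole strand.
total = round(sum(reported.values()), 1)

# Span covered by the tandem repeat, assuming contiguous copies.
repeat_span = 23 * 11

print(at_from_report, total, repeat_span)
```

Under the contiguity assumption the repeat array would fit inside the 359 bp non-coding region, which is consistent with the abstract's description.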
Whole-genome sequencing identifies EN1 as a determinant of bone density and fracture

PubMed Central

Zheng, Hou-Feng; Forgetta, Vincenzo; Hsu, Yi-Hsiang; Estrada, Karol; Rosello-Diez, Alberto; Leo, Paul J; Dahia, Chitra L; Park-Min, Kyung Hyun; Tobias, Jonathan H; Kooperberg, Charles; Kleinman, Aaron; Styrkarsdottir, Unnur; Liu, Ching-Ti; Uggla, Charlotta; Evans, Daniel S; Nielson, Carrie M; Walter, Klaudia; Pettersson-Kymmer, Ulrika; McCarthy, Shane; Eriksson, Joel; Kwan, Tony; Jhamai, Mila; Trajanoska, Katerina; Memari, Yasin; Min, Josine; Huang, Jie; Danecek, Petr; Wilmot, Beth; Li, Rui; Chou, Wen-Chi; Mokry, Lauren E; Moayyeri, Alireza; Claussnitzer, Melina; Cheng, Chia-Ho; Cheung, Warren; Medina-Gómez, Carolina; Ge, Bing; Chen, Shu-Huang; Choi, Kwangbom; Oei, Ling; Fraser, James; Kraaij, Robert; Hibbs, Matthew A; Gregson, Celia L; Paquette, Denis; Hofman, Albert; Wibom, Carl; Tranah, Gregory J; Marshall, Mhairi; Gardiner, Brooke B; Cremin, Katie; Auer, Paul; Hsu, Li; Ring, Sue; Tung, Joyce Y; Thorleifsson, Gudmar; Enneman, Anke W; van Schoor, Natasja M; de Groot, Lisette C.P.G.M.; van der Velde, Nathalie; Melin, Beatrice; Kemp, John P; Christiansen, Claus; Sayers, Adrian; Zhou, Yanhua; Calderari, Sophie; van Rooij, Jeroen; Carlson, Chris; Peters, Ulrike; Berlivet, Soizik; Dostie, Josée; Uitterlinden, Andre G; Williams, Stephen R.; Farber, Charles; Grinberg, Daniel; LaCroix, Andrea Z; Haessler, Jeff; Chasman, Daniel I; Giulianini, Franco; Rose, Lynda M; Ridker, Paul M; Eisman, John A; Nguyen, Tuan V; Center, Jacqueline R; Nogues, Xavier; Garcia-Giralt, Natalia; Launer, Lenore L; Gudnason, Vilmunder; Mellström, Dan; Vandenput, Liesbeth; Karlsson, Magnus K; Ljunggren, Östen; Svensson, Olle; Hallmans, Göran; Rousseau, François; Giroux, Sylvie; Bussière, Johanne; Arp, Pascal P; Koromani, Fjorda; Prince, Richard L; Lewis, Joshua R; Langdahl, Bente L; Hermann, A Pernille; Jensen, Jens-Erik B; Kaptoge, Stephen; Khaw, Kay-Tee; Reeve, Jonathan; Formosa, Melissa M; Xuereb-Anastasi, Angela; Åkesson, Kristina; McGuigan, Fiona E; Garg, Gaurav; Olmos, Jose M; Zarrabeitia, Maria T; Riancho, Jose A; Ralston, Stuart H; Alonso, Nerea; Jiang, Xi; Goltzman, David; Pastinen, Tomi; Grundberg, Elin; Gauguier, Dominique; Orwoll, Eric S; Karasik, David; Davey-Smith, George; Smith, Albert V; Siggeirsdottir, Kristin; Harris, Tamara B; Zillikens, M Carola; van Meurs, Joyce BJ; Thorsteinsdottir, Unnur; Maurano, Matthew T; Timpson, Nicholas J; Soranzo, Nicole; Durbin, Richard; Wilson, Scott G; Ntzani, Evangelia E; Brown, Matthew A; Stefansson, Kari; Hinds, David A; Spector, Tim; Cupples, L Adrienne; Ohlsson, Claes; Greenwood, Celia MT; Jackson, Rebecca D; Rowe, David W; Loomis, Cynthia A; Evans, David M; Ackert-Bicknell, Cheryl L; Joyner, Alexandra L; Duncan, Emma L; Kiel, Douglas P; Rivadeneira, Fernando; Richards, J Brent

2016-01-01

The extent to which low-frequency (minor allele frequency [MAF] between 1–5%) and rare (MAF ≤ 1%) variants contribute to complex traits and disease in the general population is largely unknown. Bone mineral density (BMD) is highly heritable, is a major predictor of osteoporotic fractures and has been previously associated with common genetic variants [1–8], and rare, population-specific, coding variants [9]. Here we identify novel non-coding genetic variants with large effects on BMD (n_total = 53,236) and fracture (n_total = 508,253) in individuals of European ancestry from the general population. Associations for BMD were derived from whole-genome sequencing (n = 2,882 from UK10K), whole-exome sequencing (n = 3,549), deep imputation of genotyped samples using a combined UK10K/1000Genomes reference panel (n = 26,534), and de-novo replication genotyping (n = 20,271). We identified a low-frequency non-coding variant near a novel locus, EN1, with an effect size 4-fold larger than the mean of previously reported common variants for lumbar spine BMD [8] (rs11692564[T], MAF = 1.7%, replication effect size = +0.20 standard deviations [SD], P_meta = 2×10^−14), which was also associated with a decreased risk of fracture (OR = 0.85; P = 2×10^−11; n_cases = 98,742 and n_controls = 409,511). Using an En1Cre/flox mouse model, we observed that conditional loss of En1 results in low bone mass, likely as a consequence of high bone turn-over. We also identified a novel low-frequency non-coding variant with large effects on BMD near WNT16 (rs148771817[T], MAF = 1.1%, replication effect size = +0.39 SD, P_meta = 1×10^−11). In general, there was an excess of association signals arising from deleterious coding and conserved non-coding variants. These findings provide evidence that low-frequency non-coding variants have large effects on BMD and fracture, thereby providing rationale for whole-genome sequencing and improved imputation reference panels to study the genetic architecture of complex traits and disease in the general population. PMID:26367794
Dehydration stress extends mRNA 3′ untranslated regions with noncoding RNA functions in Arabidopsis

PubMed Central

Sun, Hai-Xi; Li, Yan; Niu, Qi-Wen; Chua, Nam-Hai

2017-01-01

The 3′ untranslated regions (3′ UTRs) of mRNAs play important roles in the regulation of mRNA localization, translation, and stability. Alternative cleavage and polyadenylation (APA) generates mRNAs with different 3′ UTRs, but the involvement of this process in stress response has not yet been clarified. Here, we report that a subset of stress-related genes exhibits 3′ UTR extensions of their mRNAs during dehydration stress. These extended 3′ UTRs have characteristics of long noncoding RNAs and likely do not interact with miRNAs. Functional studies using T-DNA insertion mutants reveal that they can act as antisense transcripts to repress expression levels of sense genes from the opposite strand or can activate the transcription or lead to read-through transcription of their downstream genes. Further analysis suggests that transcripts with 3′ UTR extensions have weaker poly(A) signals than those without 3′ UTR extensions. Finally, we show that their biogenesis is partially dependent on a trans-acting factor FPA. Taken together, we report that dehydration stress could induce transcript 3′ UTR extensions and elucidate a novel function for these stress-induced 3′ UTR extensions as long noncoding RNAs in the regulation of their neighboring genes. PMID:28522613
Targeted deletion of the 9p21 noncoding coronary artery disease risk interval in mice

DOE Office of Scientific and Technical Information (OSTI.GOV)

Visel, Axel; Zhu, Yiwen; May, Dalit

2010-01-01

Sequence polymorphisms in a 58 kb interval on chromosome 9p21 confer a markedly increased risk for coronary artery disease (CAD), the leading cause of death worldwide [1,2]. The variants have a substantial impact on the epidemiology of CAD and other life-threatening vascular conditions since nearly a quarter of Caucasians are homozygous for risk alleles. However, the risk interval is devoid of protein-coding genes and the mechanism linking the region to CAD risk has remained enigmatic. Here we show that deletion of the orthologous 70 kb noncoding interval on mouse chromosome 4 affects cardiac expression of neighboring genes, as well as proliferation properties of vascular cells. Chr4delta70kb/delta70kb mice are viable, but show increased mortality both during development and as adults. Cardiac expression of two genes near the noncoding interval, Cdkn2a and Cdkn2b, is severely reduced in chr4delta70kb/delta70kb mice, indicating that distant-acting gene regulatory functions are located in the noncoding CAD risk interval. Allele-specific expression of Cdkn2b transcripts in heterozygous mice revealed that the deletion affects expression through a cis-acting mechanism. Primary cultures of chr4delta70kb/delta70kb aortic smooth muscle cells exhibited excessive proliferation and diminished senescence, a cellular phenotype consistent with accelerated CAD pathogenesis. Taken together, our results provide direct evidence that the CAD risk interval plays a pivotal role in regulation of cardiac Cdkn2a/b expression and suggest that this region affects CAD progression by altering the dynamics of vascular cell proliferation.
RNA-Seq Based Transcriptional Map of Bovine Respiratory Disease Pathogen “Histophilus somni 2336”

PubMed Central

Kumar, Ranjit; Lawrence, Mark L.; Watt, James; Cooksey, Amanda M.; Burgess, Shane C.; Nanduri, Bindu

2012-01-01

Genome structural annotation, i.e., identification and demarcation of the boundaries for all the functional elements in a genome (e.g., genes, non-coding RNAs, proteins and regulatory elements), is a prerequisite for systems level analysis. Current genome annotation programs do not identify all of the functional elements of the genome, especially small non-coding RNAs (sRNAs). Whole genome transcriptome analysis is a complementary method to identify “novel” genes, small RNAs, regulatory regions, and operon structures, thus improving the structural annotation in bacteria. In particular, the identification of non-coding RNAs has revealed their widespread occurrence and functional importance in gene regulation, stress and virulence. However, very little is known about non-coding transcripts in Histophilus somni, one of the causative agents of Bovine Respiratory Disease (BRD) as well as bovine infertility, abortion, septicemia, arthritis, myocarditis, and thrombotic meningoencephalitis. In this study, we report a single nucleotide resolution transcriptome map of H. somni strain 2336 using RNA-Seq method. The RNA-Seq based transcriptome map identified 94 sRNAs in the H. somni genome of which 82 sRNAs were never predicted or reported in earlier studies. We also identified 38 novel potential protein coding open reading frames that were absent in the current genome annotation. The transcriptome map allowed the identification of 278 operon (total 730 genes) structures in the genome. When compared with the genome sequence of a non-virulent strain 129Pt, a disproportionate number of sRNAs (∼30%) were located in genomic region unique to strain 2336 (∼18% of the total genome). This observation suggests that a number of the newly identified sRNAs in strain 2336 may be involved in strain-specific adaptations. PMID:22276113
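The "disproportionate" share of sRNAs in the strain-specific region (about 30% of 94 sRNAs in about 18% of the genome) can be illustrated with a one-sided binomial tail. This is a rough sketch, not the authors' analysis: it assumes sRNAs would otherwise fall independently and uniformly along the genome, and the observed count of 28 is back-calculated from the rounded percentage in the abstract rather than taken from the paper's data.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_srnas = 94            # sRNAs identified in the transcriptome map
unique_fraction = 0.18  # approximate share of the genome unique to strain 2336

# Assumption: ~30% of 94 sRNAs, rounded to a whole count.
observed = round(0.30 * n_srnas)

# Count expected in the unique region under uniform placement.
expected = unique_fraction * n_srnas

# Probability of seeing at least `observed` sRNAs there by chance alone.
p_value = binomial_upper_tail(observed, n_srnas, unique_fraction)

print(observed, round(expected, 2), p_value)
```

A small tail probability here would only indicate departure from the uniform-placement assumption; it does not by itself show that the sRNAs are involved in strain-specific adaptation.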
RNA-seq based transcriptional map of bovine respiratory disease pathogen "Histophilus somni 2336".

PubMed

Kumar, Ranjit; Lawrence, Mark L; Watt, James; Cooksey, Amanda M; Burgess, Shane C; Nanduri, Bindu

2012-01-01

Genome structural annotation, i.e., identification and demarcation of the boundaries for all the functional elements in a genome (e.g., genes, non-coding RNAs, proteins and regulatory elements), is a prerequisite for systems level analysis. Current genome annotation programs do not identify all of the functional elements of the genome, especially small non-coding RNAs (sRNAs). Whole genome transcriptome analysis is a complementary method to identify "novel" genes, small RNAs, regulatory regions, and operon structures, thus improving the structural annotation in bacteria. In particular, the identification of non-coding RNAs has revealed their widespread occurrence and functional importance in gene regulation, stress and virulence. However, very little is known about non-coding transcripts in Histophilus somni, one of the causative agents of Bovine Respiratory Disease (BRD) as well as bovine infertility, abortion, septicemia, arthritis, myocarditis, and thrombotic meningoencephalitis. In this study, we report a single nucleotide resolution transcriptome map of H. somni strain 2336 using RNA-Seq method. The RNA-Seq based transcriptome map identified 94 sRNAs in the H. somni genome of which 82 sRNAs were never predicted or reported in earlier studies. We also identified 38 novel potential protein coding open reading frames that were absent in the current genome annotation. The transcriptome map allowed the identification of 278 operon (total 730 genes) structures in the genome. When compared with the genome sequence of a non-virulent strain 129Pt, a disproportionate number of sRNAs (∼30%) were located in genomic region unique to strain 2336 (∼18% of the total genome). This observation suggests that a number of the newly identified sRNAs in strain 2336 may be involved in strain-specific adaptations.
Functional annotation of HOT regions in the human genome: implications for human disease and cancer

PubMed Central

Li, Hao; Chen, Hebing; Liu, Feng; Ren, Chao; Wang, Shengqi; Bo, Xiaochen; Shu, Wenjie

2015-01-01

Advances in genome-wide association studies (GWAS) and large-scale sequencing studies have resulted in an impressive and growing list of disease- and trait-associated genetic variants. Most studies have emphasised the discovery of genetic variation in coding sequences, however, the noncoding regulatory effects responsible for human disease and cancer biology have been substantially understudied. To better characterise the cis-regulatory effects of noncoding variation, we performed a comprehensive analysis of the genetic variants in HOT (high-occupancy target) regions, which are considered to be one of the most intriguing findings of recent large-scale sequencing studies. We observed that GWAS variants that map to HOT regions undergo a substantial net decrease and illustrate development-specific localisation during haematopoiesis. Additionally, genetic risk variants are disproportionally enriched in HOT regions compared with LOT (low-occupancy target) regions in both disease-relevant and cancer cells. Importantly, this enrichment is biased toward disease- or cancer-specific cell types. Furthermore, we observed that cancer cells generally acquire cancer-specific HOT regions at oncogenes through diverse mechanisms of cancer pathogenesis. Collectively, our findings demonstrate the key roles of HOT regions in human disease and cancer and represent a critical step toward further understanding disease biology, diagnosis, and therapy. PMID:26113264
Functional annotation of HOT regions in the human genome: implications for human disease and cancer.

PubMed

Li, Hao; Chen, Hebing; Liu, Feng; Ren, Chao; Wang, Shengqi; Bo, Xiaochen; Shu, Wenjie

2015-06-26

Advances in genome-wide association studies (GWAS) and large-scale sequencing studies have resulted in an impressive and growing list of disease- and trait-associated genetic variants. Most studies have emphasised the discovery of genetic variation in coding sequences, however, the noncoding regulatory effects responsible for human disease and cancer biology have been substantially understudied. To better characterise the cis-regulatory effects of noncoding variation, we performed a comprehensive analysis of the genetic variants in HOT (high-occupancy target) regions, which are considered to be one of the most intriguing findings of recent large-scale sequencing studies. We observed that GWAS variants that map to HOT regions undergo a substantial net decrease and illustrate development-specific localisation during haematopoiesis. Additionally, genetic risk variants are disproportionally enriched in HOT regions compared with LOT (low-occupancy target) regions in both disease-relevant and cancer cells. Importantly, this enrichment is biased toward disease- or cancer-specific cell types. Furthermore, we observed that cancer cells generally acquire cancer-specific HOT regions at oncogenes through diverse mechanisms of cancer pathogenesis. Collectively, our findings demonstrate the key roles of HOT regions in human disease and cancer and represent a critical step toward further understanding disease biology, diagnosis, and therapy.
Regulation of an antisense RNA with the transition of neonatal to IIb myosin heavy chain during postnatal development and hypothyroidism in rat skeletal muscle

PubMed Central

Jiang, Weihua; Qin, Anqi X.; Bodell, Paul W.; Baldwin, Kenneth M.; Haddad, Fadia

2012-01-01

Postnatal development of fast skeletal muscle is characterized by a transition in expression of myosin heavy chain (MHC) isoforms, from primarily neonatal MHC at birth to primarily IIb MHC in adults, in a tightly coordinated manner. These isoforms are encoded by distinct genes, which are separated by ∼17 kb on rat chromosome 10. The neonatal-to-IIb MHC transition is inhibited by a hypothyroid state. We examined RNA products [mRNA, pre-mRNA, and natural antisense transcript (NAT)] of developmental and adult-expressed MHC genes (embryonic, neonatal, I, IIa, IIx, and IIb) at 2, 10, 20, and 40 days after birth in normal and thyroid-deficient rat neonates treated with propylthiouracil. We found that a long noncoding antisense-oriented RNA transcript, termed bII NAT, is transcribed from a site within the IIb-Neo intergenic region and across most of the IIb MHC gene. NATs have previously been shown to mediate transcriptional repression of sense-oriented counterparts. The bII NAT is transcriptionally regulated during postnatal development and in response to hypothyroidism. Evidence for a regulatory mechanism is suggested by an inverse relationship between IIb MHC and bII NAT in normal and hypothyroid-treated muscle. Neonatal MHC transcription is coordinately expressed with bII NAT. A comparative phylogenetic analysis also suggests that bII NAT-mediated regulation has been a conserved trait of placental mammals for most of the eutherian evolutionary history. The evidence in support of the regulatory model implicates long noncoding antisense RNA as a mechanism to coordinate the transition between neonatal and IIb MHC during postnatal development. PMID:22262309
Regulation of an antisense RNA with the transition of neonatal to IIb myosin heavy chain during postnatal development and hypothyroidism in rat skeletal muscle.

PubMed

Pandorf, Clay E; Jiang, Weihua; Qin, Anqi X; Bodell, Paul W; Baldwin, Kenneth M; Haddad, Fadia

2012-04-01

Postnatal development of fast skeletal muscle is characterized by a transition in expression of myosin heavy chain (MHC) isoforms, from primarily neonatal MHC at birth to primarily IIb MHC in adults, in a tightly coordinated manner. These isoforms are encoded by distinct genes, which are separated by ∼17 kb on rat chromosome 10. The neonatal-to-IIb MHC transition is inhibited by a hypothyroid state. We examined RNA products [mRNA, pre-mRNA, and natural antisense transcript (NAT)] of developmental and adult-expressed MHC genes (embryonic, neonatal, I, IIa, IIx, and IIb) at 2, 10, 20, and 40 days after birth in normal and thyroid-deficient rat neonates treated with propylthiouracil. We found that a long noncoding antisense-oriented RNA transcript, termed bII NAT, is transcribed from a site within the IIb-Neo intergenic region and across most of the IIb MHC gene. NATs have previously been shown to mediate transcriptional repression of sense-oriented counterparts. The bII NAT is transcriptionally regulated during postnatal development and in response to hypothyroidism. Evidence for a regulatory mechanism is suggested by an inverse relationship between IIb MHC and bII NAT in normal and hypothyroid-treated muscle. Neonatal MHC transcription is coordinately expressed with bII NAT. A comparative phylogenetic analysis also suggests that bII NAT-mediated regulation has been a conserved trait of placental mammals for most of the eutherian evolutionary history. The evidence in support of the regulatory model implicates long noncoding antisense RNA as a mechanism to coordinate the transition between neonatal and IIb MHC during postnatal development.

Expression characteristics of long noncoding RNA uc.322 and its effects on pancreatic islet function.

PubMed

Zhao, Xiaoqin; Rong, Can; Pan, Fenghui; Xiang, Lizhi; Wang, Xinlei; Hu, Yun

2018-06-28

Increasing evidence indicates that long noncoding RNAs (lncRNAs) perform special biological functions by regulating gene expression through multiple pathways and molecular mechanisms. The aim of this study was to explore the expression characteristics of lncRNA uc.322 in pancreatic islet cells and its effects on the secretory function of islet cells. Bioinformatics analysis was used to determine the lncRNA uc.322 sequence, location, and structural features. Expression of lncRNA uc.322 in different tissues was detected by quantitative polymerase chain reaction analyses. Quantitative polymerase chain reaction, Western blot analysis, adenosine triphosphate determination, glucose-stimulated insulin secretion, and enzyme-linked immunosorbent assay were used to evaluate the effects of lncRNA uc.322 on insulin secretion. The results showed that the full length of lncRNA uc.322 is 224 bp and that it is highly conserved in various species. Bioinformatics analysis revealed that lncRNA uc.322 is located on chr7:122893196-122893419 (GRCh37/hg19) within the SRY-related HMG-box 6 gene exon region. Compared with other tissues, lncRNA uc.322 is highly expressed in pancreatic tissue. Upregulation of lncRNA uc.322 expression increases the expression of the insulin transcription factors pancreatic and duodenal homeobox 1 and Forkhead box O1, promotes insulin secretion in the extracellular fluid of Min6 cells, and increases the adenosine triphosphate concentration. On the other hand, knockdown of lncRNA uc.322 has opposite effects on Min6 cells. Overall, this study showed that upregulation of lncRNA uc.322 in islet β-cells can increase the expression of insulin transcription factors and promote insulin secretion, and it may be a new therapeutic target for diabetes. © 2018 Wiley Periodicals, Inc.
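The 224 bp length quoted above follows directly from the genomic coordinates if they are read as a 1-based, inclusive interval (the convention used for browser-style position strings such as `chr7:start-end`). A minimal sketch, with hypothetical helper names `parse_region` and `region_length`:

```python
import re


def parse_region(region: str) -> tuple[str, int, int]:
    """Split a 'chrN:start-end' string into (chromosome, start, end).

    Thousands separators are tolerated. Hypothetical helper for illustration.
    """
    match = re.fullmatch(r"(chr\w+):(\d+)-(\d+)", region.replace(",", ""))
    if match is None:
        raise ValueError(f"unrecognised region string: {region!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if end < start:
        raise ValueError("end coordinate precedes start coordinate")
    return chrom, start, end


def region_length(region: str) -> int:
    """Length in bp, assuming 1-based inclusive coordinates."""
    _, start, end = parse_region(region)
    return end - start + 1


print(region_length("chr7:122893196-122893419"))
```

If the same interval were stored in a 0-based, half-open format such as BED, the start would be one lower and the length would be computed as `end - start` instead.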
Functional noncoding sequences derived from SINEs in the mammalian genome.

PubMed

Nishihara, Hidenori; Smit, Arian F A; Okada, Norihiro

2006-07-01

Recent comparative analyses of mammalian sequences have revealed that a large number of nonprotein-coding genomic regions are under strong selective constraint. Here, we report that some of these loci have been derived from a newly defined family of ancient SINEs (short interspersed repetitive elements). This is a surprising result, as SINEs and other transposable elements are commonly thought to be genomic parasites. We named the ancient SINE family AmnSINE1, for Amniota SINE1, because we found it to be present in mammals as well as in birds, and some copies predate the mammalian-bird split 310 million years ago (Mya). AmnSINE1 has a chimeric structure of a 5S rRNA and a tRNA-derived SINE, and is related to five tRNA-derived SINE families that we characterized here in the coelacanth, dogfish shark, hagfish, and amphioxus genomes. All of the newly described SINE families have a common central domain that is also shared by zebrafish SINE3, and we collectively name them the DeuSINE (Deuterostomia SINE) superfamily. Notably, of the approximately 1000 still identifiable copies of AmnSINE1 in the human genome, 105 correspond to loci phylogenetically highly conserved among mammalian orthologs. The conservation is strongest over the central domain. Thus, AmnSINE1 appears to be the best example of a transposable element of which a significant fraction of the copies have acquired genomic functionality.
The complete mitochondrial genome of the American black flour beetle Tribolium audax (Coleoptera: Tenebrionidae).

PubMed

Ou, Jing; Liu, Jin-Bo; Yao, Fu-Jiao; Wang, Xin-Guo; Wei, Zhao-Ming

2016-01-01

Flour beetles of the genus Tribolium are all pests of stored products and cause severe economic losses every year. The American black flour beetle Tribolium audax is one of the important pest species of flour beetle, and it is also an important quarantine insect. Here we sequenced and characterized the complete mitochondrial genome of T. audax, which was intercepted by Huangpu Customs in maize from America. The complete circular mitochondrial genome (mitogenome) of T. audax was 15,924 bp in length, containing 37 typical coding genes and one non-coding AT-rich region. The mitogenome of T. audax exhibits a gene arrangement and content identical to the most common type in insects. All protein-coding genes (PCGs) start with a typical ATN initiation codon, except for cox1, which uses AAC as its start codon. Eleven genes use a standard complete termination codon (nine TAA, two TAG), whereas the nad4 and nad5 genes end with a single T. Except for trnS1 (AGN), all tRNA genes display the typical cloverleaf secondary structure seen in other insects. The sizes of the large and small ribosomal RNA genes are 1288 and 780 bp, respectively. The AT content of the AT-rich region is 81.36%. The 5 bp conserved motif TACTA was found in the intergenic region between trnS2 (UCN) and nad1.
Multiplexed direct genomic selection (MDiGS): a pooled BAC capture approach for highly accurate CNV and SNP/INDEL detection.

PubMed

Alvarado, David M; Yang, Ping; Druley, Todd E; Lovett, Michael; Gurnett, Christina A

2014-06-01

Despite declining sequencing costs, few methods are available for cost-effective single-nucleotide polymorphism (SNP), insertion/deletion (INDEL) and copy number variation (CNV) discovery in a single assay. Commercially available methods require a high investment to a specific region and are only cost-effective for large samples. Here, we introduce a novel, flexible approach for multiplexed targeted sequencing and CNV analysis of large genomic regions called multiplexed direct genomic selection (MDiGS). MDiGS combines biotinylated bacterial artificial chromosome (BAC) capture and multiplexed pooled capture for SNP/INDEL and CNV detection of 96 multiplexed samples on a single MiSeq run. MDiGS is advantageous over other methods for CNV detection because pooled sample capture and hybridization to large contiguous BAC baits reduces sample and probe hybridization variability inherent in other methods. We performed MDiGS capture for three chromosomal regions consisting of ∼ 550 kb of coding and non-coding sequence with DNA from 253 patients with congenital lower limb disorders. PITX1 nonsense and HOXC11 S191F missense mutations were identified that segregate in clubfoot families. Using a novel pooled-capture reference strategy, we identified recurrent chromosome chr17q23.1q23.2 duplications and small HOXC 5' cluster deletions (51 kb and 12 kb). Given the current interest in coding and non-coding variants in human disease, MDiGS fulfills a niche for comprehensive and low-cost evaluation of CNVs, coding, and non-coding variants across candidate regions of interest. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Mutations in the Norrie disease gene.

PubMed

Schuback, D E; Chen, Z Y; Craig, I W; Breakefield, X O; Sims, K B

1995-01-01

We report our experience to date in mutation identification in the Norrie disease (ND) gene. We carried out mutational analysis in 26 kindreds in an attempt to identify regions presumed critical to protein function and potentially correlated with generation of the disease phenotype. All coding exons, as well as noncoding regions of exons 1 and 2, 636 nucleotides in the noncoding region of exon 3, and 197 nucleotides of 5' flanking sequence, were analyzed for single-strand conformation polymorphisms (SSCP) by polymerase chain reaction (PCR) amplification of genomic DNA. DNA fragments that showed altered SSCP band mobilities were sequenced to locate the specific mutations. In addition to three previously described submicroscopic deletions encompassing the entire ND gene, we have now identified 6 intragenic deletions, 8 missense (seven point mutations, one 9-bp deletion), 6 nonsense (three point mutations, three single bp deletions/frameshift) and one 10-bp insertion, creating an expanded repeat in the 5' noncoding region of exon 1. Thus, mutations have been identified in a total of 24 of 26 (92%) of the kindreds we have studied to date. With the exception of two different mutations, each found in two apparently unrelated kindreds, these mutations are unique and expand the genotype database. Localization of the majority of point mutations at or near cysteine residues, potentially critical in protein tertiary structure, supports a previous protein model for norrin as member of a cystine knot growth factor family (Meitinger et al., 1993). Genotype-phenotype correlations were not evident with the limited clinical data available, except in the cases of larger submicroscopic deletions associated with a more severe neurologic syndrome.(ABSTRACT TRUNCATED AT 250 WORDS)
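The mutation tally in this abstract can be checked with simple arithmetic. The sketch below assumes that each count in the abstract corresponds to one kindred (the abstract notes that two mutations were each seen in two kindreds, so the counts are read here as per-kindred findings); the dictionary keys are descriptive labels, not terms from the paper.

```python
# Per-class counts as listed in the abstract.
mutation_counts = {
    "whole-gene submicroscopic deletions": 3,
    "intragenic deletions": 6,
    "missense": 8,
    "nonsense": 6,
    "10-bp insertion (5' noncoding, exon 1)": 1,
}

kindreds_studied = 26

# Total kindreds in which a mutation was found.
kindreds_with_mutation = sum(mutation_counts.values())

# Detection rate as a percentage of all kindreds studied.
detection_rate = 100 * kindreds_with_mutation / kindreds_studied

print(kindreds_with_mutation, round(detection_rate))
```

Under that reading the class counts add up to the 24 kindreds reported, and the rate rounds to the 92% given in the text.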
SNPnexus: assessing the functional relevance of genetic variation to facilitate the promise of precision medicine.

PubMed

Dayem Ullah, Abu Z; Oscanoa, Jorge; Wang, Jun; Nagano, Ai; Lemoine, Nicholas R; Chelala, Claude

2018-05-11

Broader functional annotation of genetic variation is a valuable means for prioritising phenotypically-important variants in further disease studies and large-scale genotyping projects. We developed SNPnexus to meet this need by assessing the potential significance of known and novel SNPs on the major transcriptome, proteome, regulatory and structural variation models. Since its previous release in 2012, we have made significant improvements to the annotation categories and updated the query and data viewing systems. The most notable changes include broader functional annotation of noncoding variants and expanding annotations to the most recent human genome assembly GRCh38/hg38. SNPnexus has now integrated rich resources from ENCODE and Roadmap Epigenomics Consortium to map and annotate the noncoding variants onto different classes of regulatory regions and noncoding RNAs as well as providing their predicted functional impact from eight popular non-coding variant scoring algorithms and computational methods. A novel functionality offered now is the support for neo-epitope predictions from leading tools to facilitate its use in immunotherapeutic applications. These updates to SNPnexus are in preparation for its future expansion towards a fully comprehensive computational workflow for disease-associated variant prioritization from sequencing data, placing its users at the forefront of translational research. SNPnexus is freely available at http://www.snp-nexus.org.
Comparisons between Arabidopsis thaliana and Drosophila melanogaster in relation to Coding and Noncoding Sequence Length and Gene Expression

PubMed Central

Caldwell, Rachel; Lin, Yan-Xia; Zhang, Ren

2015-01-01

There is a continuing interest in the analysis of gene architecture and gene expression to determine the relationship that may exist. Advances in high-quality sequencing technologies and large-scale resource datasets have increased the understanding of relationships and cross-referencing of expression data to the large genome data. Although a negative correlation between expression level and gene (especially transcript) length has been generally accepted, there have been some conflicting results arising from the literature concerning the impacts of different regions of genes, and the underlying reason is not well understood. The research aims to apply quantile regression techniques for statistical analysis of coding and noncoding sequence length and gene expression data in the plant, Arabidopsis thaliana, and fruit fly, Drosophila melanogaster, to determine if a relationship exists and if there is any variation or similarities between these species. The quantile regression analysis found that the coding sequence length and gene expression correlations varied, and similarities emerged for the noncoding sequence length (5′ and 3′ UTRs) between animal and plant species. In conclusion, the information described in this study provides the basis for further exploration into gene regulation with regard to coding and noncoding sequence length. PMID:26114098
The expanding regulatory universe of p53 in gastrointestinal cancer.

PubMed

Fesler, Andrew; Zhang, Ning; Ju, Jingfang

2016-01-01

Tumor suppressor gene TP53 is one of the most frequently deleted or mutated genes in gastrointestinal cancers. As a transcription factor, p53 regulates a number of important protein coding genes to control cell cycle, cell death, DNA damage/repair, stemness, differentiation and other key cellular functions. In addition, p53 is also able to activate the expression of a number of small non-coding microRNAs (miRNAs) through direct binding to the promoter region of these miRNAs. Many miRNAs have been identified to be potential tumor suppressors by regulating key effector target mRNAs. Our understanding of the regulatory network of p53 has recently expanded to include long non-coding RNAs (lncRNAs). Like miRNA, lncRNAs have been found to play important roles in cancer biology. With our increased understanding of the important functions of these non-coding RNAs and their relationship with p53, we are gaining exciting new insights into the biology and function of cells in response to various growth environment changes. In this review we summarize the current understanding of the ever-expanding involvement of non-coding RNAs in the p53 regulatory network and its implications for our understanding of gastrointestinal cancer.
NONCODE v2.0: decoding the non-coding.

PubMed

He, Shunmin; Liu, Changning; Skogerbø, Geir; Zhao, Haitao; Wang, Jie; Liu, Tao; Bai, Baoyan; Zhao, Yi; Chen, Runsheng

2008-01-01

The NONCODE database is an integrated knowledge database designed for the analysis of non-coding RNAs (ncRNAs). Since NONCODE was first released 3 years ago, the number of known ncRNAs has grown rapidly, and there is growing recognition that ncRNAs play important regulatory roles in most organisms. In the updated version of NONCODE (NONCODE v2.0), the number of collected ncRNAs has reached 206 226, including a wide range of microRNAs, Piwi-interacting RNAs and mRNA-like ncRNAs. The improvements brought to the database include not only new and updated ncRNA data sets, but also an incorporation of BLAST alignment search service and access through our custom UCSC Genome Browser. NONCODE can be found under http://www.noncode.org or http://noncode.bioinfo.org.cn.
A 5′ Noncoding Exon Containing Engineered Intron Enhances Transgene Expression from Recombinant AAV Vectors in vivo

PubMed Central

Lu, Jiamiao; Williams, James A.; Luke, Jeremy; Zhang, Feijie; Chu, Kirk; Kay, Mark A.

2017-01-01

We previously developed a mini-intronic plasmid (MIP) expression system in which the essential bacterial elements for plasmid replication and selection are placed within an engineered intron contained within a universal 5′ UTR noncoding exon. Like minicircle DNA plasmids (devoid of bacterial backbone sequences), MIP plasmids overcome transcriptional silencing of the transgene. In addition, however, MIP plasmids yield transgene expression 2 and often >10 times higher than minicircle vectors in vivo and in vitro. Based on these findings, we examined the effects of the MIP intronic sequences in a recombinant adeno-associated virus (AAV) vector system. Recombinant AAV vectors containing an intron with a bacterial replication origin and bacterial selectable marker increased transgene expression by 40 to 100 times in vivo when compared with conventional AAV vectors. Therefore, inclusion of this noncoding exon/intron sequence upstream of the coding region can substantially enhance AAV-mediated gene expression in vivo. PMID:27903072
Identification and role of regulatory non-coding RNAs in Listeria monocytogenes.

PubMed

Izar, Benjamin; Mraheil, Mobarak Abu; Hain, Torsten

2011-01-01

Bacterial regulatory non-coding RNAs control numerous mRNA targets that direct a plethora of biological processes, such as the adaptation to environmental changes, growth and virulence. Recently developed high-throughput techniques, such as genomic tiling arrays and RNA-Seq, have allowed the investigation of prokaryotic cis- and trans-acting regulatory RNAs, including sRNAs, asRNAs, untranslated regions (UTR) and riboswitches. As a result, we obtained a more comprehensive view on the complexity and plasticity of the prokaryotic genome biology. Listeria monocytogenes was utilized as a model system for intracellular pathogenic bacteria in several studies, which revealed the presence of about 180 regulatory RNAs in the listerial genome. A regulatory role of non-coding RNAs in survival, virulence and adaptation mechanisms of L. monocytogenes was confirmed in subsequent experiments, thus providing insight into a multifaceted modulatory function of RNA/mRNA interference. In this review, we discuss the identification of regulatory RNAs by high-throughput techniques and their functional role in L. monocytogenes.
The complete mitochondrial genome of Rapana venosa (Gastropoda, Muricidae).

PubMed

Sun, Xiujun; Yang, Aiguo

2016-01-01

The complete mitochondrial (mt) genome of the veined rapa whelk, Rapana venosa, was determined using genome walking techniques in this study. The total length of the mt genome sequence of R. venosa was 15,271 bp, which is comparable to the reported Muricidae mitogenomes to date. It contained 13 protein-coding genes, 21 transfer RNA genes, and two ribosomal RNA genes. A bias towards a higher representation of nucleotides A and T (69%) was detected in the mt genome of R. venosa. A small number of non-coding nucleotides (302 bp) was detected, and the largest non-coding region was 74 bp in length.
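The A+T bias quoted above is a base-composition fraction. A minimal sketch of how such a figure can be computed, assuming a plain nucleotide string as input; the function name `at_fraction` and the toy sequence are illustrative only and are not taken from the study:

```python
def at_fraction(sequence: str) -> float:
    """Return the fraction of unambiguous bases (A, C, G, T) that are A or T."""
    seq = sequence.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc  # ambiguous symbols such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return at / total


# Toy sequence for illustration only (not R. venosa data).
toy = "ATTAGCATTA"
print(f"A+T content: {at_fraction(toy):.0%}")
```

Ignoring ambiguous symbols is a design choice made here for simplicity; a real analysis might instead report them separately.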
Regulating infidelity: RNA-mediated recruitment of AID to DNA during class switch recombination.

PubMed

DiMenna, Lauren J; Chaudhuri, Jayanta

2016-03-01

The mechanism by which the DNA deaminase activation-induced cytidine deaminase (AID) is specifically recruited to repetitive switch region DNA during class switch recombination is still poorly understood. Work over the past decade has revealed a strong link between transcription and RNA polymerase-associated factors in AID recruitment, yet none of these processes satisfactorily explain how AID specificity is affected. Here, we review a recent finding wherein AID is guided to switch regions not by a protein factor but by an RNA moiety, and especially one associated with a noncoding RNA that has been long thought of as being inert. This work explains the long-standing requirement of splicing of noncoding transcripts during class switching, and has implications in both B cell-mediated immunity as well as the underlying pathological syndromes associated with the recombination reaction. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Mitochondrial genomes of parasitic flatworms.

PubMed

Le, Thanh H; Blair, David; McManus, Donald P

2002-05-01

Complete or near-complete mitochondrial genomes are now available for 11 species or strains of parasitic flatworms belonging to the Trematoda and the Cestoda. The organization of these genomes is not strikingly different from those of other eumetazoans, although one gene (atp8) commonly found in other phyla is absent from flatworms. The gene order in most flatworms has similarities to those seen in higher protostomes such as annelids. However, the gene order has been drastically altered in Schistosoma mansoni, which obscures this possible relationship. Among the sequenced taxa, base composition varies considerably, creating potential difficulties for phylogeny reconstruction. Long non-coding regions are present in all taxa, but these vary in length from only a few hundred to approximately 10000 nucleotides. Among Schistosoma spp., the long non-coding regions are rich in repeats and length variation among individuals is known. Data from mitochondrial genomes are valuable for studies on species identification, phylogenies and biogeography.
Identification of novel non-coding small RNAs from Streptococcus pneumoniae TIGR4 using high-resolution genome tiling arrays

PubMed Central

2010-01-01

Background: The identification of non-coding transcripts in human, mouse, and Escherichia coli has revealed their widespread occurrence and functional importance in both eukaryotic and prokaryotic life. In prokaryotes, studies have shown that non-coding transcripts participate in a broad range of cellular functions like gene regulation, stress and virulence. However, very little is known about non-coding transcripts in Streptococcus pneumoniae (pneumococcus), an obligate human respiratory pathogen responsible for significant worldwide morbidity and mortality. Tiling microarrays enable genome wide mRNA profiling as well as identification of novel transcripts at a high-resolution. Results: Here, we describe a high-resolution transcription map of the S. pneumoniae clinical isolate TIGR4 using genomic tiling arrays. Our results indicate that approximately 66% of the genome is expressed under our experimental conditions. We identified a total of 50 non-coding small RNAs (sRNAs) from the intergenic regions, of which 36 had no predicted function. Half of the identified sRNA sequences were found to be unique to S. pneumoniae genome. We identified eight overrepresented sequence motifs among sRNA sequences that correspond to sRNAs in different functional categories. Tiling arrays also identified approximately 202 operon structures in the genome. Conclusions: In summary, the pneumococcal operon structures and novel sRNAs identified in this study enhance our understanding of the complexity and extent of the pneumococcal 'expressed' genome. Furthermore, the results of this study open up new avenues of research for understanding the complex RNA regulatory network governing S. pneumoniae physiology and virulence. PMID:20525227
Identification of a Conserved Non-Protein-Coding Genomic Element that Plays an Essential Role in Alphabaculovirus Pathogenesis

PubMed Central

Kikhno, Irina

2014-01-01

Highly homologous sequences 154–157 bp in length grouped under the name of “conserved non-protein-coding element” (CNE) were revealed in all of the sequenced genomes of baculoviruses belonging to the genus Alphabaculovirus. A CNE alignment led to the detection of a set of highly conserved nucleotide clusters that occupy strictly conserved positions in the CNE sequence. The significant length of the CNE and conservation of both its length and cluster architecture were identified as a combination of characteristics that make this CNE different from known viral non-coding functional sequences. The essential role of the CNE in the Alphabaculovirus life cycle was demonstrated through the use of a CNE-knockout Autographa californica multiple nucleopolyhedrovirus (AcMNPV) bacmid. It was shown that the essential function of the CNE was not mediated by the presumed expression activities of the protein- and non-protein-coding genes that overlap the AcMNPV CNE. On the basis of the presented data, the AcMNPV CNE was categorized as a complex-structured, polyfunctional genomic element involved in an essential DNA transaction that is associated with an undefined function of the baculovirus genome. PMID:24740153
Conserved sequence-specific lincRNA-steroid receptor interactions drive transcriptional repression and direct cell fate

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hudson, William H.; Pickard, Mark R.; de Vera, Ian Mitchelle S.

2014-12-23

The majority of the eukaryotic genome is transcribed, generating a significant number of long intergenic noncoding RNAs (lincRNAs). Although lincRNAs represent the most poorly understood product of transcription, recent work has shown lincRNAs fulfill important cellular functions. In addition to low sequence conservation, poor understanding of structural mechanisms driving lincRNA biology hinders systematic prediction of their function. Here we report the molecular requirements for the recognition of steroid receptors (SRs) by the lincRNA growth arrest-specific 5 (Gas5), which regulates steroid-mediated transcriptional regulation, growth arrest and apoptosis. We identify the functional Gas5-SR interface and generate point mutations that ablate the SR-Gas5 lincRNA interaction, altering Gas5-driven apoptosis in cancer cell lines. Further, we find that the Gas5 SR-recognition sequence is conserved among haplorhines, with its evolutionary origin as a splice acceptor site. This study demonstrates that lincRNAs can recognize protein targets in a conserved, sequence-specific manner in order to affect critical cell functions.
Ferritin gene organization: differences between plants and animals suggest possible kingdom-specific selective constraints.

PubMed

Proudhon, D; Wei, J; Briat, J; Theil, E C

1996-03-01

Ferritin, a protein widespread in nature, concentrates iron approximately 10^11- to 10^12-fold above the solubility within a spherical shell of 24 subunits; it derives in plants and animals from a common ancestor (based on sequence) but displays a cytoplasmic location in animals compared to the plastid in contemporary plants. Ferritin gene regulation in plants and animals is altered by development, hormones, and excess iron; iron signals target DNA in plants but mRNA in animals. Evolution has thus conserved the two end points of ferritin gene expression, the physiological signals and the protein structure, while allowing some divergence of the genetic mechanisms. Comparison of ferritin gene organization in plants and animals, made possible by the cloning of a dicot (soybean) ferritin gene presented here and the recent cloning of two monocot (maize) ferritin genes, shows evolutionary divergence in ferritin gene organization between plants and animals but conservation among plants or among animals; divergence in the genetic mechanism for iron regulation is reflected by the absence in all three plant genes of the IRE, a highly conserved, noncoding sequence in vertebrate animal ferritin mRNA. In plant ferritin genes, the number of introns (n = 7) is higher than in animals (n = 3). Second, no intron positions are conserved when ferritin genes of plants and animals are compared, although all ferritin gene introns are in the coding region; within kingdoms, the intron positions in ferritin genes are conserved. Finally, secondary protein structure has no apparent relationship to intron/exon boundaries in plant ferritin genes, whereas in animal ferritin genes the correspondence is high.
The structural differences in introns/exons among phylogenetically related ferritin coding sequences and the high conservation of the gene structure within plant or animal kingdoms suggest that kingdom-specific functional constraints may exist to maintain a particular intron/exon pattern within ferritin genes. In the case of plants, where ferritin gene intron placement is unrelated to triplet codons or protein structure, and where ferritin is targeted to the plastid, the selection pressure on gene organization may relate to RNA function and plastid/nuclear signaling.
A Celiac Disease Associated lncRNA Named HCG14 Regulates NOD1 Expression in Intestinal Cells.

PubMed

Santin, Izortze; Jauregi-Miguel, Amaia; Velayos, Teresa; Castellanos-Rubio, Ainara; Garcia-Etxebarria, Koldo; Romero-Garmendia, Irati; Fernandez-Jimenez, Nora; Irastorza, Iñaki; Castaño, Luis; Bilbao, Jose Ramón

2018-03-29

To identify additional celiac disease associated loci in the Major Histocompatibility Complex independent from classical HLA risk alleles (HLA-DR3-DQ2) and to characterize their potential functional impact in celiac disease pathogenesis at the intestinal level. We performed a high resolution SNP genotyping of the MHC region, comparing HLA-DR3 homozygous celiac patients and non-celiac controls carrying a single copy of the B8-DR3-DQ2 conserved extended haplotype. Expression level of potential novel risk genes was determined by RT-PCR in intestinal biopsies and in intestinal and immune cells isolated from control and celiac individuals. Small interfering RNA-driven silencing of selected genes was performed in the intestinal cell line T84. MHC genotyping revealed two associated SNPs, one located in TRIM27 gene and another in the non-coding gene HCG14. After stratification analysis, only HCG14 showed significant association independent from HLA-DR-DQ loci. Expression of HCG14 was slightly downregulated in epithelial cells isolated from duodenal biopsies of celiac patients, and eQTL analysis revealed that polymorphisms in HCG14 region were associated with decreased NOD1 expression in duodenal intestinal cells. We have successfully employed a conserved extended haplotype-matching strategy and identified a novel additional celiac disease risk variant in the lncRNA HCG14. This lncRNA seems to regulate the expression of NOD1 in an allele-specific manner. Further functional studies are needed to clarify the role of HCG14 in the regulation of gene expression and to determine the molecular mechanisms by which the risk variant in HCG14 contributes to celiac disease pathogenesis.
Comparative chloroplast genomes of eleven Schima (Theaceae) species: Insights into DNA barcoding and phylogeny.

PubMed

Yu, Xiang-Qin; Drew, Bryan T; Yang, Jun-Bo; Gao, Lian-Ming; Li, De-Zhu

2017-01-01

Schima is an ecologically and economically important woody genus in the tea family (Theaceae). Unresolved species delimitations and phylogenetic relationships within Schima limit our understanding of the genus and hinder utilization of the genus for economic purposes. In the present study, we conducted comparative analysis among the complete chloroplast (cp) genomes of 11 Schima species. Our results indicate that Schima cp genomes possess a typical quadripartite structure, with conserved genomic structure and gene order. The size of the Schima cp genome is about 157 kilo base pairs (kb). They consistently encode 114 unique genes, including 80 protein-coding genes, 30 tRNAs, and 4 rRNAs, with 17 duplicated in the inverted repeat (IR). These cp genomes are highly conserved and do not show obvious expansion or contraction of the IR region. The percent variability of the 68 coding and 93 noncoding (>150 bp) fragments is consistently less than 3%. The seven most widely touted DNA barcode regions as well as one promising barcode candidate showed low sequence divergence. Eight mutational hotspots were identified from the 11 cp genomes. These hotspots may potentially be useful as specific DNA barcodes for species identification of Schima. The 58 cpSSR loci reported here are complementary to the microsatellite markers identified from the nuclear genome, and will be leveraged for further population-level studies. Phylogenetic relationships among the 11 Schima species were resolved with strong support based on the cp genome data set, which corresponds well with the species distribution pattern. The data presented here will serve as a foundation to facilitate species identification, DNA barcoding and phylogenetic reconstructions for future exploration of Schima.

Genetic, comparative genomic, and expression analyses of the Mc1r locus in the polychromatic Midas cichlid fish (Teleostei, Cichlidae Amphilophus sp.) species group.

PubMed

Henning, Frederico; Renz, Adina Josepha; Fukamachi, Shoji; Meyer, Axel

2010-05-01

Natural populations of the Midas cichlid species in several different crater lakes in Nicaragua exhibit a conspicuous color polymorphism. Most individuals are dark and the remaining have a gold coloration. The color morphs mate assortatively and sympatric population differentiation has been shown based on neutral molecular data. We investigated the color polymorphism using segregation analysis and a candidate gene approach. The segregation patterns observed in a mapping cross between a gold and a dark individual were consistent with a single dominant gene as a cause of the gold phenotype. This suggests that a simple genetic architecture underlies some of the speciation events in the Midas cichlids. We compared the expression levels of several candidate color genes Mc1r, Ednrb1, Slc45a2, and Tfap1a between the color morphs. Mc1r was found to be upregulated in the gold morph. Given its widespread association in color evolution and role in melanin synthesis, the Mc1r locus was further investigated using sequences derived from a genomic library. Comparative analysis revealed conserved synteny in relation to the majority of teleosts and highlighted several previously unidentified conserved non-coding elements (CNEs) in the upstream and downstream regions in the vicinity of Mc1r. The identification of the CNE regions allowed the comparison of sequences from gold and dark specimens of natural populations. No polymorphisms were found in the population sample and Mc1r showed no linkage to the gold phenotype in the mapping cross, demonstrating that it is not causally related to the color polymorphism in the Midas cichlid.
Genome sequences of a mouse-avirulent and a mouse-virulent strain of Ross River virus.

PubMed

Faragher, S G; Meek, A D; Rice, C M; Dalgarno, L

1988-04-01

The nucleotide sequence of the genomic RNA of a mouse-avirulent strain of Ross River virus, RRV NB5092 (isolated in 1969), has been determined and the corresponding sequence for the prototype mouse-virulent strain, RRV T48 (isolated in 1959), has been completed. The RRV NB5092 genome is approximately 11,674 nucleotides in length, compared with 11,853 nucleotides for RRV T48. RRV NB5092 and RRV T48 have the same genome organization. For both viruses an untranslated region of 80 nucleotides at the 5' end of the genome is followed by a 7440-nucleotide open reading frame which is interrupted after 5586 nucleotides by a single opal termination codon. By homology with other alphaviruses, the 5586-nucleotide open reading frame encodes the nonstructural proteins nsP1, nsP2, and nsP3; a fourth nonstructural protein, nsP4, is produced by read-through of the opal codon. The RRV nonstructural proteins show strong homology with the corresponding proteins of Sindbis virus and Semliki Forest virus in terms of size, net charge, and hydropathy characteristics. However, homology is not uniform between or within the proteins; nsP1, nsP2, and nsP4 contain extended domains which are highly conserved between alphaviruses, while the C-terminal region of nsP3 shows little conservation in sequence or length between alphaviruses. An untranslated "junction" region of 44 nucleotides (for RRV NB5092) or 47 nucleotides (for RRV T48) separates the nonstructural and structural protein coding regions. The structural proteins (capsid-E3-E2-6K-E1) are translated from an open reading frame of 3762 nucleotides which is followed by a 3'-untranslated region of approximately 348 nucleotides (for RRV NB5092) or 524 nucleotides (for RRV T48). Excluding deletions and insertions, the genomes of RRV NB5092 and RRV T48 differ at 284 nucleotides, representing a sequence divergence of 2.38%. 
Sequence deletions or insertions were found only in the noncoding regions and include a 173-nucleotide deletion in the 3'-untranslated region of RRV NB5092, compared with RRV T48. In the coding regions, most of the nucleotide differences are silent; there are 36 amino acid differences in the nonstructural proteins and 12 in the structural proteins. The distribution of amino acid differences between the two RRV strains correlates with the location of domains which are poorly conserved in sequence between alphaviruses. The possible role of amino acid differences in envelope glycoproteins E1 and E2 in determining the different antigenic and biological properties of RRV NB5092 and RRV T48 is discussed.
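The genome organization described in this abstract can be checked arithmetically: the untranslated and coding segments listed for each strain should add up to the reported genome length. A minimal sketch of that check follows; the segment labels and the function name `genome_length` are illustrative, all lengths are copied from the abstract, and the 3'-untranslated lengths are the approximate values given there:

```python
# Segment lengths in nucleotides, as stated in the abstract.
SEGMENTS = {
    "RRV T48": {
        "5' untranslated region": 80,
        "nonstructural ORF": 7440,
        "junction region": 47,
        "structural ORF": 3762,
        "3' untranslated region": 524,  # approximate, per the abstract
    },
    "RRV NB5092": {
        "5' untranslated region": 80,
        "nonstructural ORF": 7440,
        "junction region": 44,
        "structural ORF": 3762,
        "3' untranslated region": 348,  # approximate, per the abstract
    },
}

# Genome lengths reported in the abstract.
REPORTED_LENGTH = {"RRV T48": 11853, "RRV NB5092": 11674}


def genome_length(segments: dict) -> int:
    """Sum the lengths of consecutive, non-overlapping genome segments."""
    return sum(segments.values())


for strain, segments in SEGMENTS.items():
    total = genome_length(segments)
    print(f"{strain}: segments sum to {total} nt, reported {REPORTED_LENGTH[strain]} nt")
```

Under these figures the two genomes differ in length only in the junction and 3'-untranslated regions, which agrees with the statement that deletions and insertions were found only in noncoding regions.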
Current Research on Non-Coding Ribonucleic Acid (RNA).

PubMed

Wang, Jing; Samuels, David C; Zhao, Shilin; Xiang, Yu; Zhao, Ying-Yong; Guo, Yan

2017-12-05

Non-coding ribonucleic acid (RNA) has without a doubt captured the interest of biomedical researchers. The ability to screen the entire human genome with high-throughput sequencing technology has greatly enhanced the identification, annotation and prediction of the functionality of non-coding RNAs. In this review, we discuss the current landscape of non-coding RNA research and quantitative analysis. Non-coding RNA will be categorized into two major groups by size: long non-coding RNAs and small RNAs. In long non-coding RNA, we discuss regular long non-coding RNA, pseudogenes and circular RNA. In small RNA, we discuss miRNA, transfer RNA, piwi-interacting RNA, small nucleolar RNA, small nuclear RNA, Y RNA, signal recognition particle RNA, and 7SK RNA. We elaborate on the origin, detection method, and potential association with disease, putative functional mechanisms, and public resources for these non-coding RNAs. We aim to provide readers with a complete overview of non-coding RNAs and incite additional interest in non-coding RNA research.
The emergence of noncoding RNAs as Heracles in autophagy.

PubMed

Zhang, Jian; Wang, Peiyuan; Wan, Lin; Xu, Shouping; Pang, Da

2017-06-03

Macroautophagy/autophagy is a catabolic process that is widely found in nature. Over the past few decades, mounting evidence has indicated that noncoding RNAs, ranging from small noncoding RNAs to long noncoding RNAs (lncRNAs) and even circular RNAs (circRNAs), mediate the transcriptional and post-transcriptional regulation of autophagy-related genes by participating in autophagy regulatory networks. The differential expression of noncoding RNAs affects autophagy levels at different physiological and pathological stages, including embryonic proliferation and differentiation, cellular senescence, and even diseases such as cancer. We summarize the current knowledge regarding noncoding RNA dysregulation in autophagy and investigate the molecular regulatory mechanisms underlying noncoding RNA involvement in autophagy regulatory networks. Then, we integrate public resources to predict autophagy-related noncoding RNAs across species and discuss strategies for and the challenges of identifying autophagy-related noncoding RNAs. This article will deepen our understanding of the relationship between noncoding RNAs and autophagy, and provide new insights to specifically target noncoding RNAs in autophagy-associated therapeutic strategies.
Polar bears, antibiotics, and the evolving ribosome (Nobel Lecture).

PubMed

Yonath, Ada

2010-06-14

High-resolution structures of ribosomes, the cellular machines that translate the genetic code into proteins, revealed the decoding mechanism, detected the mRNA path, identified the sites of the tRNA molecules in the ribosome, elucidated the position and the nature of the nascent proteins exit tunnel, illuminated the interactions of the ribosome with non-ribosomal factors, such as the initiation, release and recycling factors, and provided valuable information on ribosomal antibiotics, their binding sites, modes of action, principles of selectivity and the mechanisms leading to their resistance. Notably, these structures proved that the ribosome is a ribozyme whose active site, namely where the peptide bonds are being formed, is situated within a universal symmetrical region that is embedded in the otherwise asymmetric ribosome structure. As this symmetrical region is highly conserved and provides the machinery required for peptide bond formation and for ribosome polymerase activity, it may be the remnant of the proto-ribosome, a dimeric prebiotic machine that formed peptide bonds and non-coded polypeptide chains. Structures of complexes of ribosomes with antibiotics targeting them revealed the principles allowing for their clinical use, identified resistance mechanisms and showed the structural bases for discriminating pathogenic bacteria from hosts, hence providing valuable structural information for antibiotics improvement and for the design of novel compounds that can serve as antibiotics.
Developmental roles of 21 Drosophila transcription factors are determined by quantitative differences in binding to an overlapping set of thousands of genomic regions

DOE Office of Scientific and Technical Information (OSTI.GOV)

MacArthur, Stewart; Li, Xiao-Yong; Li, Jingyi

2009-05-15

BACKGROUND: We previously established that six sequence-specific transcription factors that initiate anterior/posterior patterning in Drosophila bind to overlapping sets of thousands of genomic regions in blastoderm embryos. While regions bound at high levels include known and probable functional targets, more poorly bound regions are preferentially associated with housekeeping genes and/or genes not transcribed in the blastoderm, and are frequently found in protein coding sequences or in less conserved non-coding DNA, suggesting that many are likely non-functional. RESULTS: Here we show that an additional 15 transcription factors that regulate other aspects of embryo patterning show a similar quantitative continuum of function and binding to thousands of genomic regions in vivo. Collectively, the 21 regulators show a surprisingly high overlap in the regions they bind given that they belong to 11 DNA binding domain families, specify distinct developmental fates, and can act via different cis-regulatory modules. We demonstrate, however, that quantitative differences in relative levels of binding to shared targets correlate with the known biological and transcriptional regulatory specificities of these factors. CONCLUSIONS: It is likely that the overlap in binding of biochemically and functionally unrelated transcription factors arises from the high concentrations of these proteins in nuclei, which, coupled with their broad DNA binding specificities, directs them to regions of open chromatin. We suggest that most animal transcription factors will be found to show a similar broad overlapping pattern of binding in vivo, with specificity achieved by modulating the amount, rather than the identity, of bound factor.
A genome-wide signature of positive selection in ancient and recent invasive expansions of the honey bee Apis mellifera

PubMed Central

Zayed, Amro; Whitfield, Charles W.

2008-01-01

Apis mellifera originated in Africa and extended its range into Eurasia in two or more ancient expansions. In 1956, honey bees of African origin were introduced into South America, their descendents admixing with previously introduced European bees, giving rise to the highly invasive and economically devastating “Africanized” honey bee. Here we ask whether the honey bee's out-of-Africa expansions, both ancient and recent (invasive), were associated with a genome-wide signature of positive selection, detected by contrasting genetic differentiation estimates (FST) between coding and noncoding SNPs. In native populations, SNPs in protein-coding regions had significantly higher FST estimates than those in noncoding regions, indicating adaptive evolution in the genome driven by positive selection. This signal of selection was associated with the expansion of honey bees from Africa into Western and Northern Europe, perhaps reflecting adaptation to temperate environments. We estimate that positive selection acted on a minimum of 852–1,371 genes or ≈10% of the bee's coding genome. We also detected positive selection associated with the invasion of African-derived honey bees in the New World. We found that introgression of European-derived alleles into Africanized bees was significantly greater for coding than noncoding regions. Our findings demonstrate that Africanized bees exploited the genetic diversity present from preexisting introductions in an adaptive way. Finally, we found a significant negative correlation between FST estimates and the local GC content surrounding coding SNPs, suggesting that AT-rich genes play an important role in adaptive evolution in the honey bee. PMID:18299560
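The contrast described above rests on comparing per-SNP FST between coding and noncoding sites. The abstract does not say which FST estimator was used, so the sketch below assumes the simplest textbook form for a biallelic SNP in two equally weighted populations, FST = (HT - HS) / HT. The function names and the allele frequencies are hypothetical and chosen only for illustration:

```python
from statistics import mean


def fst_biallelic(p1: float, p2: float) -> float:
    """Wright's FST for one biallelic SNP in two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                        # expected heterozygosity, pooled
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2   # mean within-population value
    if h_total == 0:
        return 0.0  # allele fixed in both populations: no differentiation measurable
    return (h_total - h_within) / h_total


def mean_fst(allele_freq_pairs) -> float:
    """Average per-SNP FST over a collection of (p1, p2) pairs."""
    return mean(fst_biallelic(p1, p2) for p1, p2 in allele_freq_pairs)


# Hypothetical allele frequencies (population A, population B); not data from the study.
coding_snps = [(0.10, 0.80), (0.25, 0.70), (0.40, 0.90)]
noncoding_snps = [(0.30, 0.45), (0.50, 0.60), (0.20, 0.35)]

print(f"mean FST, coding SNPs:    {mean_fst(coding_snps):.3f}")
print(f"mean FST, noncoding SNPs: {mean_fst(noncoding_snps):.3f}")
```

A real analysis would use a sample-size-corrected estimator and a significance test for the difference between the two SNP classes; this sketch shows only the direction of the comparison.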
A genome-wide signature of positive selection in ancient and recent invasive expansions of the honey bee Apis mellifera.

PubMed

Zayed, Amro; Whitfield, Charles W

2008-03-04

Apis mellifera originated in Africa and extended its range into Eurasia in two or more ancient expansions. In 1956, honey bees of African origin were introduced into South America, their descendents admixing with previously introduced European bees, giving rise to the highly invasive and economically devastating "Africanized" honey bee. Here we ask whether the honey bee's out-of-Africa expansions, both ancient and recent (invasive), were associated with a genome-wide signature of positive selection, detected by contrasting genetic differentiation estimates (F(ST)) between coding and noncoding SNPs. In native populations, SNPs in protein-coding regions had significantly higher F(ST) estimates than those in noncoding regions, indicating adaptive evolution in the genome driven by positive selection. This signal of selection was associated with the expansion of honey bees from Africa into Western and Northern Europe, perhaps reflecting adaptation to temperate environments. We estimate that positive selection acted on a minimum of 852-1,371 genes or approximately 10% of the bee's coding genome. We also detected positive selection associated with the invasion of African-derived honey bees in the New World. We found that introgression of European-derived alleles into Africanized bees was significantly greater for coding than noncoding regions. Our findings demonstrate that Africanized bees exploited the genetic diversity present from preexisting introductions in an adaptive way. Finally, we found a significant negative correlation between F(ST) estimates and the local GC content surrounding coding SNPs, suggesting that AT-rich genes play an important role in adaptive evolution in the honey bee.
G-quadruplex prediction in E. coli genome reveals a conserved putative G-quadruplex-Hairpin-Duplex switch.

PubMed

Kaplan, Oktay I; Berber, Burak; Hekim, Nezih; Doluca, Osman

2016-11-02

Many studies show that short non-coding sequences are widely conserved among regulatory elements. More and more conserved sequences are being discovered since the development of next generation sequencing technology. A common approach to identify conserved sequences with regulatory roles relies on topological changes such as hairpin formation at the DNA or RNA level. G-quadruplexes, non-canonical nucleic acid topologies with little established biological roles, are increasingly considered for conserved regulatory element discovery. Since the tertiary structure of G-quadruplexes is strongly dependent on the loop sequence which is disregarded by the generally accepted algorithm, we hypothesized that G-quadruplexes with similar topology and, indirectly, similar interaction patterns, can be determined using phylogenetic clustering based on differences in the loop sequences. Phylogenetic analysis of 52 G-quadruplex forming sequences in the Escherichia coli genome revealed two conserved G-quadruplex motifs with a potential regulatory role. Further analysis revealed that both motifs tend to form hairpins and G quadruplexes, as supported by circular dichroism studies. The phylogenetic analysis as described in this work can greatly improve the discovery of functional G-quadruplex structures and may explain unknown regulatory patterns. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
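The abstract notes that the generally accepted prediction algorithm disregards loop sequence. A minimal Python sketch of such a scan is given here, assuming the widely used pattern of four runs of at least three guanines separated by loops of 1 to 7 nucleotides. This pattern is an assumption for illustration; the abstract does not state the exact rule the authors applied, and the function name `find_putative_g4` and the test sequence are hypothetical.

```python
import re

# Assumed pattern: four G-runs (>= 3 G each) separated by three loops of 1-7 nt.
# The loop *sequence* is unconstrained; only loop *length* is limited, which is
# the limitation the abstract points out.
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def find_putative_g4(sequence: str):
    """Return (start, end, motif) for each non-overlapping match on the given strand."""
    sequence = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(sequence)]


# Made-up sequence for illustration only.
hits = find_putative_g4("ttGGGAGGGTGGGCGGGtt")
print(hits)
```

Only the given strand is scanned; a genome-wide search would also scan the reverse complement (equivalently, the same pattern with C in place of G). Clustering the hits by loop sequence, as the abstract proposes, would be a separate step applied to the motifs returned here.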
Roles of Non-Coding RNA in Sugarcane-Microbe Interaction.

PubMed

Thiebaut, Flávia; Rojas, Cristian A; Grativol, Clícia; Calixto, Edmundo P da R; Motta, Mariana R; Ballesteros, Helkin G F; Peixoto, Barbara; de Lima, Berenice N S; Vieira, Lucas M; Walter, Maria Emilia; de Armas, Elvismary M; Entenza, Júlio O P; Lifschitz, Sergio; Farinelli, Laurent; Hemerly, Adriana S; Ferreira, Paulo C G

2017-12-20

Studies have highlighted the importance of non-coding RNA regulation in plant-microbe interaction. However, the roles of sugarcane microRNAs (miRNAs) in the regulation of disease responses have not been investigated. Firstly, we screened the sRNA transcriptome of sugarcane infected with Acidovorax avenae. Conserved and novel miRNAs were identified. Additionally, small interfering RNAs (siRNAs) were aligned to differentially expressed sequences from the sugarcane transcriptome. Interestingly, many siRNAs aligned to a transcript encoding a copper-transporter gene whose expression was induced in the presence of A. avenae, while the siRNAs were repressed in the presence of A. avenae. Moreover, a long intergenic non-coding RNA was identified as a potential target or decoy of miR408. To extend the bioinformatics analysis, we carried out independent inoculations and the expression patterns of six miRNAs were validated by quantitative reverse transcription-PCR (qRT-PCR). Among these miRNAs, miR408, a copper-microRNA, was downregulated. The cleavage of a putative miR408 target, a laccase, was confirmed by a modified 5'RACE (rapid amplification of cDNA ends) assay. MiR408 was also downregulated in samples infected with other pathogens, but it was upregulated in the presence of a beneficial diazotrophic bacterium. Our results suggest that regulation by miR408 is important in sugarcane sensing whether microorganisms are either pathogenic or beneficial, triggering specific miRNA-mediated regulatory mechanisms accordingly.
Roles of Non-Coding RNA in Sugarcane-Microbe Interaction

PubMed Central

Grativol, Clícia; Motta, Mariana R.; Ballesteros, Helkin G. F.; Peixoto, Barbara; Vieira, Lucas M.; Walter, Maria Emilia; de Armas, Elvismary M.; Entenza, Júlio O. P.; Lifschitz, Sergio; Farinelli, Laurent; Hemerly, Adriana S.

2017-01-01

Studies have highlighted the importance of non-coding RNA regulation in plant-microbe interaction. However, the roles of sugarcane microRNAs (miRNAs) in the regulation of disease responses have not been investigated. Firstly, we screened the sRNA transcriptome of sugarcane infected with Acidovorax avenae. Conserved and novel miRNAs were identified. Additionally, small interfering RNAs (siRNAs) were aligned to differentially expressed sequences from the sugarcane transcriptome. Interestingly, many siRNAs aligned to a transcript encoding a copper-transporter gene whose expression was induced in the presence of A. avenae, while the siRNAs were repressed in the presence of A. avenae. Moreover, a long intergenic non-coding RNA was identified as a potential target or decoy of miR408. To extend the bioinformatics analysis, we carried out independent inoculations and the expression patterns of six miRNAs were validated by quantitative reverse transcription-PCR (qRT-PCR). Among these miRNAs, miR408—a copper-microRNA—was downregulated. The cleavage of a putative miR408 target, a laccase, was confirmed by a modified 5′RACE (rapid amplification of cDNA ends) assay. MiR408 was also downregulated in samples infected with other pathogens, but it was upregulated in the presence of a beneficial diazotrophic bacteria. Our results suggest that regulation by miR408 is important in sugarcane sensing whether microorganisms are either pathogenic or beneficial, triggering specific miRNA-mediated regulatory mechanisms accordingly. PMID:29657296
DIANA-LncBase v2: indexing microRNA targets on non-coding transcripts

PubMed Central

Paraskevopoulou, Maria D.; Vlachos, Ioannis S.; Karagkouni, Dimitra; Georgakilas, Georgios; Kanellos, Ilias; Vergoulis, Thanasis; Zagganas, Konstantinos; Tsanakas, Panayiotis; Floros, Evangelos; Dalamagas, Theodore; Hatzigeorgiou, Artemis G.

2016-01-01

microRNAs (miRNAs) are short non-coding RNAs (ncRNAs) that act as post-transcriptional regulators of coding gene expression. Long non-coding RNAs (lncRNAs) have been recently reported to interact with miRNAs. The sponge-like function of lncRNAs introduces an extra layer of complexity in the miRNA interactome. DIANA-LncBase v1 provided a database of experimentally supported and in silico predicted miRNA Recognition Elements (MREs) on lncRNAs. The second version of LncBase (www.microrna.gr/LncBase) presents an extensive collection of miRNA:lncRNA interactions. The significantly enhanced database includes more than 70 000 low and high-throughput, (in)direct miRNA:lncRNA experimentally supported interactions, derived from manually curated publications and the analysis of 153 AGO CLIP-Seq libraries. The new experimental module presents a 14-fold increase compared to the previous release. LncBase v2 hosts in silico predicted miRNA targets on lncRNAs, identified with the DIANA-microT algorithm. The relevant module provides millions of predicted miRNA binding sites, accompanied with detailed metadata and MRE conservation metrics. LncBase v2 caters information regarding cell type specific miRNA:lncRNA regulation and enables users to easily identify interactions in 66 different cell types, spanning 36 tissues for human and mouse. Database entries are also supported by accurate lncRNA expression information, derived from the analysis of more than 6 billion RNA-Seq reads. PMID:26612864
Small Open Reading Frames, Non-Coding RNAs and Repetitive Elements in Bradyrhizobium japonicum USDA 110

PubMed Central

Hahn, Julia; Tsoy, Olga V.; Thalmann, Sebastian; Čuklina, Jelena; Gelfand, Mikhail S.

2016-01-01

Small open reading frames (sORFs) and genes for non-coding RNAs are poorly investigated components of most genomes. Our analysis of 1391 ORFs recently annotated in the soybean symbiont Bradyrhizobium japonicum USDA 110 revealed that 78% of them contain less than 80 codons. Twenty-one of these sORFs are conserved in or outside Alphaproteobacteria and most of them are similar to genes found in transposable elements, in line with their broad distribution. Stabilizing selection was demonstrated for sORFs with proteomic evidence and bll1319_ISGA which is conserved at the nucleotide level in 16 alphaproteobacterial species, 79 species from other taxa and 49 other Proteobacteria. Further we used Northern blot hybridization to validate ten small RNAs (BjsR1 to BjsR10) belonging to new RNA families. We found that BjsR1 and BjsR3 have homologs outside the genus Bradyrhizobium, and BjsR5, BjsR6, BjsR7, and BjsR10 have up to four imperfect copies in Bradyrhizobium genomes. BjsR8, BjsR9, and BjsR10 are present exclusively in nodules, while the other sRNAs are also expressed in liquid cultures. We also found that the level of BjsR4 decreases after exposure to tellurite and iron, and this down-regulation contributes to survival under high iron conditions. Analysis of additional small RNAs overlapping with 3’-UTRs revealed two new repetitive elements named Br-REP1 and Br-REP2. These REP elements may play roles in the genomic plasticity and gene regulation and could be useful for strain identification by PCR-fingerprinting. Furthermore, we studied two potential toxin genes in the symbiotic island and confirmed toxicity of the yhaV homolog bll1687 but not of the newly annotated higB homolog blr0229_ISGA in E. coli. Finally, we revealed transcription interference resulting in an antisense RNA complementary to blr1853, a gene induced in symbiosis. The presented results expand our knowledge on sORFs, non-coding RNAs and repetitive elements in B. japonicum and related bacteria. PMID:27788207
Specific inhibition of aphthovirus infection by RNAs transcribed from both the 5' and the 3' noncoding regions.

PubMed Central

Gutiérrez, A; Martínez-Salas, E; Pintado, B; Sobrino, F

1994-01-01

RNA molecules containing the 3' terminal region of foot-and-mouth disease virus (FMDV) RNA in both antisense and sense orientations were able to inhibit viral FMDV translation and infective particle formation in BHK-21 cells following comicroinjection or cotransfection with infectious viral RNA. Antisense, but not sense, transcripts from the 5' noncoding region including the proximal element of the internal ribosome entry site and the two functional initiation AUGs were also inhibitory, both in in vitro translation and in vivo in comicroinjected or cotransfected BHK-21 cells. This effect was not observed with nonrelated RNA transcripts from lambda phage. The inhibitions found were permanent, sequence specific, and dose dependent; an inverse correlation between the length of the transcript and the extent of the antiviral effect was seen. In all cases, the extent of inhibition increased when viral RNAs and transcripts were allowed to reanneal before transfection, concomitant with a decrease in the doses required. The antiviral effect was specific for FMDV, since transcripts failed to inhibit infective particle formation by other picornavirus, such as encephalomyocarditis virus. These results indicate that the ability of RNA transcripts to inhibit viral multiplication depends on their efficient hybridization with target regions on the viral genome. Furthermore, cells transfected with the 5'1as transcript, which is complementary to the 5' noncoding region, showed a significant reduction of plaque-forming ability during the course of a natural infection. RNA 5'1as was able to inhibit FMDV RNA translation in vitro, suggesting that the inhibitions observed are mediated by a blockage of the viral translation initiation. 
Conversely, hybridization of short sequences of both sense and antisense transcripts from the 3' end induces distortion of predicted highly ordered structural motifs, which could be required for the synthesis of negative-stranded viral RNA, and correlates with inhibition of viral propagation. PMID:7933126
Specific inhibition of aphthovirus infection by RNAs transcribed from both the 5' and the 3' noncoding regions.

PubMed

Gutiérrez, A; Martínez-Salas, E; Pintado, B; Sobrino, F

1994-11-01

RNA molecules containing the 3' terminal region of foot-and-mouth disease virus (FMDV) RNA in both antisense and sense orientations were able to inhibit viral FMDV translation and infective particle formation in BHK-21 cells following comicroinjection or cotransfection with infectious viral RNA. Antisense, but not sense, transcripts from the 5' noncoding region including the proximal element of the internal ribosome entry site and the two functional initiation AUGs were also inhibitory, both in in vitro translation and in vivo in comicroinjected or cotransfected BHK-21 cells. This effect was not observed with nonrelated RNA transcripts from lambda phage. The inhibitions found were permanent, sequence specific, and dose dependent; an inverse correlation between the length of the transcript and the extent of the antiviral effect was seen. In all cases, the extent of inhibition increased when viral RNAs and transcripts were allowed to reanneal before transfection, concomitant with a decrease in the doses required. The antiviral effect was specific for FMDV, since transcripts failed to inhibit infective particle formation by other picornavirus, such as encephalomyocarditis virus. These results indicate that the ability of RNA transcripts to inhibit viral multiplication depends on their efficient hybridization with target regions on the viral genome. Furthermore, cells transfected with the 5'1as transcript, which is complementary to the 5' noncoding region, showed a significant reduction of plaque-forming ability during the course of a natural infection. RNA 5'1as was able to inhibit FMDV RNA translation in vitro, suggesting that the inhibitions observed are mediated by a blockage of the viral translation initiation. 
Conversely, hybridization of short sequences of both sense and antisense transcripts from the 3' end induces distortion of predicted highly ordered structural motifs, which could be required for the synthesis of negative-stranded viral RNA, and correlates with inhibition of viral propagation.
Core histone genes of Giardia intestinalis: genomic organization, promoter structure, and expression

PubMed Central

Yee, Janet; Tang, Anita; Lau, Wei-Ling; Ritter, Heather; Delport, Dewald; Page, Melissa; Adam, Rodney D; Müller, Miklós; Wu, Gang

2007-01-01

Background Giardia intestinalis is a protist found in freshwaters worldwide, and is the most common cause of parasitic diarrhea in humans. The phylogenetic position of this parasite is still much debated. Histones are small, highly conserved proteins that associate tightly with DNA to form chromatin within the nucleus. There are two classes of core histone genes in higher eukaryotes: DNA replication-independent histones and DNA replication-dependent ones. Results We identified two copies each of the core histone H2a, H2b and H3 genes, and three copies of the H4 gene, at separate locations on chromosomes 3, 4 and 5 within the genome of Giardia intestinalis, but no gene encoding a H1 linker histone could be recognized. The copies of each gene share extensive DNA sequence identities throughout their coding and 5' noncoding regions, which suggests these copies have arisen from relatively recent gene duplications or gene conversions. The transcription start sites are at triplet A sequences 1–27 nucleotides upstream of the translation start codon for each gene. We determined that a 50 bp region upstream from the start of the histone H4 coding region is the minimal promoter, and a highly conserved 15 bp sequence called the histone motif (him) is essential for its activity. The Giardia core histone genes are constitutively expressed at approximately equivalent levels and their mRNAs are polyadenylated. Competition gel-shift experiments suggest that a factor within the protein complex that binds him may also be a part of the protein complexes that bind other promoter elements described previously in Giardia. Conclusion In contrast to other eukaryotes, the Giardia genome has only a single class of core histone genes that encode replication-independent histones. Our inability to locate a gene encoding the linker histone H1 leads us to speculate that the H1 protein may not be required for the compaction of Giardia's small and gene-rich genome. PMID:17425802
Adaptive evolution of the matrix extracellular phosphoglycoprotein in mammals

PubMed Central

2011-01-01

Background Matrix extracellular phosphoglycoprotein (MEPE) belongs to a family of small integrin-binding ligand N-linked glycoproteins (SIBLINGs) that play a key role in skeleton development, particularly in mineralization, phosphate regulation and osteogenesis. MEPE associated disorders cause various physiological effects, such as loss of bone mass, tumors and disruption of renal function (hypophosphatemia). The study of this developmental gene from an evolutionary perspective could provide valuable insights on the adaptive diversification of morphological phenotypes in vertebrates. Results Here we studied the adaptive evolution of the MEPE gene in 26 Eutherian mammals and three birds. The comparative genomic analyses revealed a high degree of evolutionary conservation of some coding and non-coding regions of the MEPE gene across mammals indicating a possible regulatory or functional role likely related with mineralization and/or phosphate regulation. However, the majority of the coding region had a fast evolutionary rate, particularly within the largest exon (1467 bp). Rodentia and Scandentia had distinct substitution rates with an increased accumulation of both synonymous and non-synonymous mutations compared with other mammalian lineages. Characteristics of the gene (e.g. biochemical, evolutionary rate, and intronic conservation) differed greatly among lineages of the eight mammalian orders. We identified 20 sites with significant positive selection signatures (codon and protein level) outside the main regulatory motifs (dentonin and ASARM) suggestive of an adaptive role. Conversely, we find three sites under selection in the signal peptide and one in the ASARM motif that were supported by at least one selection model. The MEPE protein tends to accumulate amino acids promoting disorder and potential phosphorylation targets. 
Conclusion MEPE shows a high number of selection signatures, revealing the crucial role of positive selection in the evolution of this SIBLING member. The selection signatures were found mainly outside the functional motifs, reinforcing the idea that other regions outside the dentonin and the ASARM might be crucial for the function of the protein and future studies should be undertaken to understand its importance. PMID:22103247
Intrinsic and extrinsic approaches for detecting genes in a bacterial genome.

PubMed Central

Borodovsky, M; Rudd, K E; Koonin, E V

1994-01-01

The unannotated regions of the Escherichia coli genome DNA sequence from the EcoSeq6 database, totaling 1,278 'intergenic' sequences of the combined length of 359,279 basepairs, were analyzed using computer-assisted methods with the aim of identifying putative unknown genes. The proposed strategy for finding new genes includes two key elements: i) prediction of expressed open reading frames (ORFs) using the GeneMark method based on Markov chain models for coding and non-coding regions of Escherichia coli DNA, and ii) search for protein sequence similarities using programs based on the BLAST algorithm and programs for motif identification. A total of 354 putative expressed ORFs were predicted by GeneMark. Using the BLASTX and TBLASTN programs, it was shown that 208 ORFs located in the unannotated regions of the E. coli chromosome are significantly similar to other protein sequences. Identification of 182 ORFs as probable genes was supported by GeneMark and BLAST, comprising 51.4% of the GeneMark 'hits' and 87.5% of the BLAST 'hits'. 73 putative new genes, comprising 20.6% of the GeneMark predictions, belong to ancient conserved protein families that include both eubacterial and eukaryotic members. This value is close to the overall proportion of highly conserved sequences among eubacterial proteins, indicating that the majority of the putative expressed ORFs that are predicted by GeneMark, but have no significant BLAST hits, nevertheless are likely to be real genes. The majority of the putative genes identified by BLAST search have been described since the release of the EcoSeq6 database, but about 70 genes have not been detected so far. Among these new identifications are genes encoding proteins with a variety of predicted functions including dehydrogenases, kinases, several other metabolic enzymes, ATPases, rRNA methyltransferases, membrane proteins, and different types of regulatory proteins. PMID:7984428
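The percentages quoted in this abstract follow directly from the ORF counts it reports. The short Python check below recomputes them; the counts are taken from the abstract, while the variable names and the helper `percent` are introduced here for illustration.

```python
# Counts as reported in the abstract.
genemark_orfs = 354      # putative expressed ORFs predicted by GeneMark
blast_orfs = 208         # ORFs with significant similarity found by BLASTX/TBLASTN
supported_by_both = 182  # ORFs supported by both GeneMark and BLAST
ancient_families = 73    # putative new genes in ancient conserved protein families


def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


overlap_vs_genemark = percent(supported_by_both, genemark_orfs)  # share of GeneMark 'hits'
overlap_vs_blast = percent(supported_by_both, blast_orfs)        # share of BLAST 'hits'
ancient_vs_genemark = percent(ancient_families, genemark_orfs)   # share of GeneMark predictions

print(overlap_vs_genemark, overlap_vs_blast, ancient_vs_genemark)
```

The three values reproduce the 51.4%, 87.5% and 20.6% figures given in the text.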
Sequencing of emerging canine distemper virus strain reveals new distinct genetic lineage in the United States associated with disease in wildlife and domestic canine populations.

PubMed

Riley, Matthew C; Wilkes, Rebecca P

2015-12-18

Recent outbreaks of canine distemper have prompted examination of strains from clinical samples submitted to the University of Tennessee College of Veterinary Medicine (UTCVM) Clinical Virology Lab. We previously described a new strain of CDV that significantly diverged from all genotypes reported to date including America 2, the genotype proposed to be the main lineage currently circulating in the US. The aim of this study was to determine when this new strain appeared and how widespread it is in animal populations, given that it has also been detected in fully vaccinated adult dogs. Additionally, we sequenced complete viral genomes to characterize the strain and determine if variation is confined to known variable regions of the genome or if the changes are also present in more conserved regions. Archived clinical samples were genotyped using real-time RT-PCR amplification and sequencing. The genomes of two unrelated viruses from a dog and fox each from a different state were sequenced and aligned with previously published genomes. Phylogenetic analysis was performed using coding, non-coding and genome-length sequences. Virus neutralization assays were used to evaluate potential antigenic differences between this strain and a vaccine strain and mixed ANOVA test was used to compare the titers. Genotyping revealed this strain first appeared in 2011 and was detected in dogs from multiple states in the Southeast region of the United States. It was the main strain detected among the clinical samples that were typed from 2011-2013, including wildlife submissions. Genome sequencing demonstrated that it is highly conserved within a new lineage and preliminary serologic testing showed significant differences in neutralizing antibody titers between this strain and the strain commonly used in vaccines. 
This new strain represents an emerging CDV in domestic dogs in the US, may be associated with a stable reservoir in the wildlife population, and could facilitate vaccine escape.
The spotted gar genome illuminates vertebrate evolution and facilitates human-teleost comparisons.

PubMed

Braasch, Ingo; Gehrke, Andrew R; Smith, Jeramiah J; Kawasaki, Kazuhiko; Manousaki, Tereza; Pasquier, Jeremy; Amores, Angel; Desvignes, Thomas; Batzel, Peter; Catchen, Julian; Berlin, Aaron M; Campbell, Michael S; Barrell, Daniel; Martin, Kyle J; Mulley, John F; Ravi, Vydianathan; Lee, Alison P; Nakamura, Tetsuya; Chalopin, Domitille; Fan, Shaohua; Wcisel, Dustin; Cañestro, Cristian; Sydes, Jason; Beaudry, Felix E G; Sun, Yi; Hertel, Jana; Beam, Michael J; Fasold, Mario; Ishiyama, Mikio; Johnson, Jeremy; Kehr, Steffi; Lara, Marcia; Letaw, John H; Litman, Gary W; Litman, Ronda T; Mikami, Masato; Ota, Tatsuya; Saha, Nil Ratan; Williams, Louise; Stadler, Peter F; Wang, Han; Taylor, John S; Fontenot, Quenton; Ferrara, Allyse; Searle, Stephen M J; Aken, Bronwen; Yandell, Mark; Schneider, Igor; Yoder, Jeffrey A; Volff, Jean-Nicolas; Meyer, Axel; Amemiya, Chris T; Venkatesh, Byrappa; Holland, Peter W H; Guiguen, Yann; Bobe, Julien; Shubin, Neil H; Di Palma, Federica; Alföldi, Jessica; Lindblad-Toh, Kerstin; Postlethwait, John H

2016-04-01

To connect human biology to fish biomedical models, we sequenced the genome of spotted gar (Lepisosteus oculatus), whose lineage diverged from teleosts before teleost genome duplication (TGD). The slowly evolving gar genome has conserved in content and size many entire chromosomes from bony vertebrate ancestors. Gar bridges teleosts to tetrapods by illuminating the evolution of immunity, mineralization and development (mediated, for example, by Hox, ParaHox and microRNA genes). Numerous conserved noncoding elements (CNEs; often cis regulatory) undetectable in direct human-teleost comparisons become apparent using gar: functional studies uncovered conserved roles for such cryptic CNEs, facilitating annotation of sequences identified in human genome-wide association studies. Transcriptomic analyses showed that the sums of expression domains and expression levels for duplicated teleost genes often approximate the patterns and levels of expression for gar genes, consistent with subfunctionalization. The gar genome provides a resource for understanding evolution after genome duplication, the origin of vertebrate genomes and the function of human regulatory sequences.

Identification of microRNAs and their targets in Finger millet by high throughput sequencing.

PubMed

Usha, S; Jyothi, M N; Sharadamma, N; Dixit, Rekha; Devaraj, V R; Nagesh Babu, R

2015-12-15

MicroRNAs are short non-coding RNAs which play an important role in regulating gene expression by mRNA cleavage or by translational repression. The majority of identified miRNAs were evolutionarily conserved; however, others were expressed in a species-specific manner. Finger millet is an important cereal crop; nonetheless, no practical information is available on its microRNAs to date. In this study, we have identified 95 conserved microRNAs belonging to 39 families and 3 novel microRNAs by high throughput sequencing. For the identified conserved and novel miRNAs, a total of 507 targets were predicted. Eleven miRNAs were validated, and tissue specificity was determined by stem-loop RT-qPCR and Northern blot. GO analyses revealed that targets of the miRNAs were involved in a wide range of regulatory functions. This study indicates that a large number of known and novel miRNAs are found in Finger millet, which may play an important role in growth and development. Copyright © 2015 Elsevier B.V. All rights reserved.
The spotted gar genome illuminates vertebrate evolution and facilitates human-to-teleost comparisons

PubMed Central

Braasch, Ingo; Gehrke, Andrew R.; Smith, Jeramiah J.; Kawasaki, Kazuhiko; Manousaki, Tereza; Pasquier, Jeremy; Amores, Angel; Desvignes, Thomas; Batzel, Peter; Catchen, Julian; Berlin, Aaron M.; Campbell, Michael S.; Barrell, Daniel; Martin, Kyle J.; Mulley, John F.; Ravi, Vydianathan; Lee, Alison P.; Nakamura, Tetsuya; Chalopin, Domitille; Fan, Shaohua; Wcisel, Dustin; Cañestro, Cristian; Sydes, Jason; Beaudry, Felix E. G.; Sun, Yi; Hertel, Jana; Beam, Michael J.; Fasold, Mario; Ishiyama, Mikio; Johnson, Jeremy; Kehr, Steffi; Lara, Marcia; Letaw, John H.; Litman, Gary W.; Litman, Ronda T.; Mikami, Masato; Ota, Tatsuya; Saha, Nil Ratan; Williams, Louise; Stadler, Peter F.; Wang, Han; Taylor, John S.; Fontenot, Quenton; Ferrara, Allyse; Searle, Stephen M. J.; Aken, Bronwen; Yandell, Mark; Schneider, Igor; Yoder, Jeffrey A.; Volff, Jean-Nicolas; Meyer, Axel; Amemiya, Chris T.; Venkatesh, Byrappa; Holland, Peter W. H.; Guiguen, Yann; Bobe, Julien; Shubin, Neil H.; Di Palma, Federica; Alföldi, Jessica; Lindblad-Toh, Kerstin; Postlethwait, John H.

2016-01-01

To connect human biology to fish biomedical models, we sequenced the genome of spotted gar (Lepisosteus oculatus), whose lineage diverged from teleosts before the teleost genome duplication (TGD). The slowly evolving gar genome conserved in content and size many entire chromosomes from bony vertebrate ancestors. Gar bridges teleosts to tetrapods by illuminating the evolution of immunity, mineralization, and development (e.g., Hox, ParaHox, and miRNA genes). Numerous conserved non-coding elements (CNEs, often cis-regulatory) undetectable in direct human-teleost comparisons become apparent using gar: functional studies uncovered conserved roles of such cryptic CNEs, facilitating annotation of sequences identified in human genome-wide association studies. Transcriptomic analyses revealed that the sum of expression domains and levels from duplicated teleost genes often approximate patterns and levels of gar genes, consistent with subfunctionalization. The gar genome provides a resource for understanding evolution after genome duplication, the origin of vertebrate genomes, and the function of human regulatory sequences. PMID:26950095
Comprehensive Identification of Long Non-coding RNAs in Purified Cell Types from the Brain Reveals Functional LncRNA in OPC Fate Determination

PubMed Central

Dong, Xiaomin; Chen, Kenian; Cuevas-Diaz Duran, Raquel; You, Yanan; Sloan, Steven A.; Zhang, Ye; Zong, Shan; Cao, Qilin; Barres, Ben A.; Wu, Jia Qian

2015-01-01

Long non-coding RNAs (lncRNAs) (> 200 bp) play crucial roles in transcriptional regulation during numerous biological processes. However, it is challenging to comprehensively identify lncRNAs, because they are often expressed at low levels and with more cell-type specificity than are protein-coding genes. In the present study, we performed ab initio transcriptome reconstruction using eight purified cell populations from mouse cortex and detected more than 5000 lncRNAs. Predicting the functions of lncRNAs using cell-type specific data revealed their potential functional roles in Central Nervous System (CNS) development. We performed motif searches in ENCODE DNase I digital footprint data and Mouse ENCODE promoters to infer transcription factor (TF) occupancy. By integrating TF binding and cell-type specific transcriptomic data, we constructed a novel framework that is useful for systematically identifying lncRNAs that are potentially essential for brain cell fate determination. Based on this integrative analysis, we identified lncRNAs that are regulated during Oligodendrocyte Precursor Cell (OPC) differentiation from Neural Stem Cells (NSCs) and that are likely to be involved in oligodendrogenesis. The top candidate, lnc-OPC, shows highly specific expression in OPCs and remarkable sequence conservation among placental mammals. Interestingly, lnc-OPC is significantly up-regulated in glial progenitors from experimental autoimmune encephalomyelitis (EAE) mouse models compared to wild-type mice. OLIG2-binding sites in the upstream regulatory region of lnc-OPC were identified by ChIP (chromatin immunoprecipitation)-Sequencing and validated by luciferase assays. Loss-of-function experiments confirmed that lnc-OPC plays a functional role in OPC genesis. Overall, our results substantiated the role of lncRNA in OPC fate determination and provided an unprecedented data source for future functional investigations in CNS cell types. 
We present our datasets and analysis results via the interactive genome browser at our laboratory website that is freely accessible to the research community. This is the first lncRNA expression database of collective populations of glia, vascular cells, and neurons. We anticipate that these studies will advance the knowledge of this major class of non-coding genes and their potential roles in neurological development and diseases. PMID:26683846
Natural variation in non-coding regions underlying phenotypic diversity in budding yeast

PubMed Central

Salinas, Francisco; de Boer, Carl G.; Abarca, Valentina; García, Verónica; Cuevas, Mara; Araos, Sebastian; Larrondo, Luis F.; Martínez, Claudio; Cubillos, Francisco A.

2016-01-01

Linkage mapping studies in model organisms have typically focused their efforts on polymorphisms within coding regions, ignoring those within regulatory regions that may contribute to gene expression variation. In this context, differences in transcript abundance are frequently proposed as a source of phenotypic diversity between individuals; however, until now, little molecular evidence has been provided. Here, we examined Allele Specific Expression (ASE) in six F1 hybrids from Saccharomyces cerevisiae derived from crosses between representative strains of the four main lineages described in yeast. ASE varied between crosses with levels ranging between 28% and 60%. Part of the variation in expression levels could be explained by differences in transcription factors binding to polymorphic cis-regulatory regions and by differences in trans-activation depending on the allelic form of the TF. Analysis of highly expressed alleles on each background suggested ASN1 as a candidate transcript underlying nitrogen consumption differences between two strains. Further promoter allele swap analysis under fermentation conditions confirmed that coding and non-coding regions explained aspartic and glutamic acid consumption differences, likely due to a polymorphism affecting Uga3 binding. Together, we provide a new catalogue of variants to bridge the gap between genotype and phenotype. PMID:26898953
Cloning and characterization of transferrin cDNA and rapid detection of transferrin gene polymorphism in rainbow trout (Oncorhynchus mykiss).

PubMed

Tange, N; Jong-Young, L; Mikawa, N; Hirono, I; Aoki, T

1997-12-01

A cDNA clone of rainbow trout (Oncorhynchus mykiss) transferrin was obtained from a liver cDNA library. The 2537-bp cDNA sequence contained an open reading frame encoding 691 amino acids and the 5' and 3' noncoding regions. The amino acid sequences at the iron-binding sites and the two N-linked glycosylation sites, and the cysteine residues were consistent with known, conserved vertebrate transferrin cDNA sequences. Single N-linked glycosylation sites existed on the N- and C-lobe. The deduced amino acid sequence of the rainbow trout transferrin cDNA had 92.9% identity with transferrin of coho salmon (Oncorhynchus kisutch); 85%, Atlantic salmon (Salmo salar); 67.3%, medaka (Oryzias latipes); 61.3%, Atlantic cod (Gadus morhua); and 59.7%, Japanese flounder (Paralichthys olivaceus). The long and accurate polymerase chain reaction (LA-PCR) was used to amplify approximately 6.5 kb of the transferrin gene from rainbow trout genomic DNA. Restriction fragment length polymorphisms (RFLPs) of the LA-PCR products revealed three digestion patterns in 22 samples.
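The percent identities quoted in the record above are pairwise comparisons of aligned amino acid sequences. The abstract does not state how the alignment was made or which denominator was used, so the following is only a minimal sketch under stated assumptions: the two sequences are already aligned to equal length, `-` marks a gap, and identity is taken over all alignment columns that are not gap-only. The function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: the denominator is the number of alignment columns in which
    at least one sequence has a residue. Other conventions exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Drop columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    # A column counts as identical only if both residues match and are not gaps.
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Toy example with made-up peptide fragments (not real transferrin sequence).
print(round(percent_identity("MKT-LV", "MKTALI"), 1))
```

Changing the denominator (for example, to the length of the shorter ungapped sequence) gives slightly different values, which is one reason identity figures from different studies are not always directly comparable.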
Identification of a Csr system in Serratia marcescens 2170.

PubMed

Ito, Manabu; Nomura, Kazuki; Sugimoto, Hayuki; Watanabe, Takeshi; Suzuki, Kazushi

2014-01-01

The carbon storage regulator (Csr) global regulatory system is conserved in many eubacteria and coordinates the expression of various genes that facilitate adaptation during the major physiological growth phase. The Csr system in Escherichia coli comprises an RNA-binding protein, CsrA; small non-coding RNAs, CsrB and CsrC; and a decay factor for small RNAs, CsrD. In this study, we identified the Csr system in Serratia marcescens 2170. S. marcescens CsrA was 97% identical to E. coli CsrA. CsrB and CsrC RNAs had typical stem-loop structures, including a GGA motif that is the CsrA binding site. CsrD was composed of two N-terminal transmembrane regions and HAMP-like, GGDEF, and EAL domains. Overexpression of S. marcescens csr genes complemented the phenotype of E. coli csr mutants. S. marcescens CsrD affected the decay of CsrB and CsrC RNAs in E. coli. These results suggest that the Csr system in S. marcescens is composed of an RNA-binding protein, two Csr small RNAs, and a decay factor for Csr small RNAs.
Epigenetic Control of Cytokine Gene Expression: Regulation of the TNF/LT Locus and T Helper Cell Differentiation

PubMed Central

Falvo, James V.; Jasenosky, Luke D.; Kruidenier, Laurens; Goldfeld, Anne E.

2014-01-01

Epigenetics encompasses transient and heritable modifications to DNA and nucleosomes in the native chromatin context. For example, enzymatic addition of chemical moieties to the N-terminal “tails” of histones, particularly acetylation and methylation of lysine residues in the histone tails of H3 and H4, plays a key role in regulation of gene transcription. The modified histones, which are physically associated with gene regulatory regions that typically occur within conserved noncoding sequences, play a functional role in active, poised, or repressed gene transcription. The “histone code” defined by these modifications, along with the chromatin-binding acetylases, deacetylases, methylases, demethylases, and other enzymes that direct modifications resulting in specific patterns of histone modification, shows considerable evolutionary conservation from yeast to humans. Direct modifications at the DNA level, such as cytosine methylation at CpG motifs that represses promoter activity, are another highly conserved epigenetic mechanism of gene regulation. Furthermore, epigenetic modifications at the nucleosome or DNA level can also be coupled with higher-order intra- or interchromosomal interactions that influence the location of regulatory elements and that can place them in an environment of specific nucleoprotein complexes associated with transcription. In the mammalian immune system, epigenetic gene regulation is a crucial mechanism for a range of physiological processes, including the innate host immune response to pathogens and T cell differentiation driven by specific patterns of cytokine gene expression. Here, we will review current findings regarding epigenetic regulation of cytokine genes important in innate and/or adaptive immune responses, with a special focus upon the tumor necrosis factor/lymphotoxin locus and cytokine-driven CD4+ T cell differentiation into the Th1, Th2, and Th17 lineages. PMID:23683942
Antisense transcription is pervasive but rarely conserved in enteric bacteria.

PubMed

Raghavan, Rahul; Sloan, Daniel B; Ochman, Howard

2012-01-01

Noncoding RNAs, including antisense RNAs (asRNAs) that originate from the complementary strand of protein-coding genes, are involved in the regulation of gene expression in all domains of life. Recent application of deep-sequencing technologies has revealed that the transcription of asRNAs occurs genome-wide in bacteria. Although the role of the vast majority of asRNAs remains unknown, it is often assumed that their presence implies important regulatory functions, similar to those of other noncoding RNAs. Alternatively, many antisense transcripts may be produced by chance transcription events from promoter-like sequences that result from the degenerate nature of bacterial transcription factor binding sites. To investigate the biological relevance of antisense transcripts, we compared genome-wide patterns of asRNA expression in closely related enteric bacteria, Escherichia coli and Salmonella enterica serovar Typhimurium, by performing strand-specific transcriptome sequencing. Although antisense transcripts are abundant in both species, less than 3% of asRNAs are expressed at high levels in both species, and only about 14% appear to be conserved among species. And unlike the promoters of protein-coding genes, asRNA promoters show no evidence of sequence conservation between, or even within, species. Our findings suggest that many or even most bacterial asRNAs are nonadaptive by-products of the cell's transcription machinery. IMPORTANCE Application of high-throughput methods has revealed the expression throughout bacterial genomes of transcripts encoded on the strand complementary to protein-coding genes. Because transcription is costly, it is usually assumed that these transcripts, termed antisense RNAs (asRNAs), serve some function; however, the role of most asRNAs is unclear, raising questions about their relevance in cellular processes. 
Because natural selection conserves functional elements, comparisons between related species provide a method for assessing functionality genome-wide. Applying such an approach, we assayed all transcripts in two closely related bacteria, Escherichia coli and Salmonella enterica serovar Typhimurium, and demonstrate that, although the levels of genome-wide antisense transcription are similarly high in both bacteria, only a small fraction of asRNAs are shared across species. Moreover, the promoters associated with asRNAs show no evidence of sequence conservation between, or even within, species. These findings indicate that despite the genome-wide transcription of asRNAs, many of these transcripts are likely nonfunctional.
Identification of common, unique and polymorphic microsatellites among 73 cyanobacterial genomes.

PubMed

Kabra, Ritika; Kapil, Aditi; Attarwala, Kherunnisa; Rai, Piyush Kant; Shanker, Asheesh

2016-04-01

Microsatellites, also known as Simple Sequence Repeats, are short tandem repeats of 1-6 nucleotides. These repeats are found in coding as well as non-coding regions of both prokaryotic and eukaryotic genomes and play a significant role in the study of gene regulation, genetic mapping, DNA fingerprinting and evolutionary studies. The availability of 73 complete genome sequences of cyanobacteria enabled us to mine and statistically analyze microsatellites in these genomes. The cyanobacterial microsatellites identified through bioinformatics analysis were stored in a user-friendly database named CyanoSat, which is an efficient data representation and query system designed using ASP.net. The information in CyanoSat comprises perfect, imperfect and compound microsatellites found in coding, non-coding and coding-non-coding regions. Moreover, it contains PCR primers with 200-nucleotide-long flanking regions. The mined cyanobacterial microsatellites can be freely accessed at www.compubio.in/CyanoSat/home.aspx. In addition, 82 polymorphic, 13,866 unique and 2390 common microsatellites were detected. These microsatellites will be useful in strain identification and genetic diversity studies of cyanobacteria.
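To make the definition in the record above concrete (tandem repeats of a 1-6 nucleotide motif), here is a minimal sketch of how perfect microsatellites can be located with a regular expression. It is not the pipeline used to build CyanoSat, which the abstract does not describe. The function name `find_perfect_ssrs` and the minimum repeat counts are hypothetical choices, and imperfect or compound repeats are not handled.

```python
import re

# Hypothetical minimum repeat counts per motif length (1-6 nt); real tools
# let the user choose these thresholds.
DEFAULT_MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 4, 5: 3, 6: 3}


def find_perfect_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    thresholds = min_repeats or DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in thresholds.items():
        # A motif of unit_len bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # so each tract is reported once under its shortest motif.
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)


# Made-up test sequence: an (AT)7 tract and an (A)12 tract.
example = "GGC" + "AT" * 7 + "CCG" + "A" * 12 + "T"
print(find_perfect_ssrs(example))
```

Start positions are 0-based. Classifying a hit as coding or non-coding would additionally require gene annotation coordinates, which this sketch does not model.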
Gene end-like sequences within the 3' non-coding region of the Nipah virus genome attenuate viral gene transcription.

PubMed

Sugai, Akihiro; Sato, Hiroki; Yoneda, Misako; Kai, Chieko

2017-08-01

The regulation of transcription during Nipah virus (NiV) replication is poorly understood. Using a bicistronic minigenome system, we investigated the involvement of non-coding regions (NCRs) in the transcriptional re-initiation efficiency of NiV RNA polymerase. Reporter assays revealed that attenuation of NiV gene expression was not constant at each gene junction, and that the attenuating property was controlled by the 3' NCR. However, this regulation was independent of the gene-end, gene-start and intergenic regions. Northern blot analysis indicated that regulation of viral gene expression by the phosphoprotein (P) and large protein (L) 3' NCRs occurred at the transcription level. We identified uridine-rich tracts within the L 3' NCR that are similar to gene-end signals. These gene-end-like sequences were recognized as weak transcription termination signals by the viral RNA polymerase, thereby reducing downstream gene transcription. Thus, we suggest that NiV has a unique mechanism of transcriptional regulation.
The Big Entity of New RNA World: Long Non-Coding RNAs in Microvascular Complications of Diabetes.

PubMed

Raut, Satish K; Khullar, Madhu

2018-01-01

A major part of the genome is known to be transcribed into non-protein coding RNAs (ncRNAs), such as microRNA and long non-coding RNA (lncRNA). The importance of ncRNAs is being increasingly recognized in physiological and pathological processes. lncRNAs are a novel class of ncRNAs that do not code for proteins and are important regulators of gene expression. In the past, these molecules were thought to be transcriptional "noise" with low levels of evolutionary conservation. However, recent studies provide strong evidence indicating that lncRNAs (i) are regulated during various cellular processes, (ii) exhibit cell type-specific expression, (iii) localize to specific organelles, and (iv) are associated with human diseases. Emerging evidence indicates an aberrant expression of lncRNAs in diabetes and diabetes-related microvascular complications. In the present review, we discuss the current state of knowledge of lncRNAs, their genesis from the genome, and the mechanism of action of individual lncRNAs in the pathogenesis of microvascular complications of diabetes and therapeutic approaches.
lncRScan-SVM: A Tool for Predicting Long Non-Coding RNAs Using Support Vector Machine.

PubMed

Sun, Lei; Liu, Hui; Zhang, Lin; Meng, Jia

2015-01-01

Functional long non-coding RNAs (lncRNAs) have been bringing novel insight into biological study; however, it is still not trivial to accurately distinguish the lncRNA transcripts (LNCTs) from the protein coding ones (PCTs). As various information and data about lncRNAs are preserved by previous studies, it is appealing to develop novel methods to identify the lncRNAs more accurately. Our method lncRScan-SVM aims at classifying PCTs and LNCTs using support vector machine (SVM). The gold-standard datasets for lncRScan-SVM model training, lncRNA prediction and method comparison were constructed according to the GENCODE gene annotations of human and mouse respectively. By integrating features derived from gene structure, transcript sequence, potential codon sequence and conservation, lncRScan-SVM outperforms other approaches, as evaluated by several criteria such as sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC) and area under curve (AUC). In addition, several known human lncRNA datasets were assessed using lncRScan-SVM. LncRScan-SVM is an efficient tool for predicting the lncRNAs, and it is quite useful for current lncRNA study.
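The evaluation criteria named in the record above (sensitivity, specificity, accuracy and MCC) all derive from a binary confusion matrix. The sketch below shows the standard formulas. It is not code from lncRScan-SVM, the function name `classification_metrics` is hypothetical, and the counts in the example are invented for illustration. Treating lncRNA transcripts as the positive class is an assumption. AUC is omitted because it needs per-transcript scores rather than counts.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard metrics from confusion-matrix counts.

    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Invented counts, with lncRNA transcripts assumed to be the positive class.
print(classification_metrics(tp=45, fp=5, tn=45, fn=5))
```

MCC is often reported alongside accuracy because it stays informative when the two classes are of very different sizes.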
Mutation in a primate-conserved retrotransposon reveals a noncoding RNA as a mediator of infantile encephalopathy

PubMed Central

Cartault, François; Munier, Patrick; Benko, Edgar; Desguerre, Isabelle; Hanein, Sylvain; Boddaert, Nathalie; Bandiera, Simonetta; Vellayoudom, Jeanine; Krejbich-Trotot, Pascale; Bintner, Marc; Hoarau, Jean-Jacques; Girard, Muriel; Génin, Emmanuelle; de Lonlay, Pascale; Fourmaintraux, Alain; Naville, Magali; Rodriguez, Diana; Feingold, Josué; Renouil, Michel; Munnich, Arnold; Westhof, Eric; Fähling, Michael; Lyonnet, Stanislas; Henrion-Caude, Alexandra

2012-01-01

The human genome is densely populated with transposons and transposon-like repetitive elements. Although the impact of these transposons and elements on human genome evolution is recognized, the significance of subtle variations in their sequence remains mostly unexplored. Here we report homozygosity mapping of an infantile neurodegenerative disease locus in a genetic isolate. Complete DNA sequencing of the 400-kb linkage locus revealed a point mutation in a primate-specific retrotransposon that was transcribed as part of a unique noncoding RNA, which was expressed in the brain. In vitro knockdown of this RNA increased neuronal apoptosis, consistent with the inappropriate dosage of this RNA in vivo and with the phenotype. Moreover, structural analysis of the sequence revealed a small RNA-like hairpin that was consistent with the putative gain of a functional site when mutated. We show here that a mutation in a unique transposable element-containing RNA is associated with lethal encephalopathy, and we suggest that RNAs that harbor evolutionarily recent repetitive elements may play important roles in human brain development. PMID:22411793
Identification of microRNA Genes in Three Opisthorchiids

PubMed Central

Ovchinnikov, Vladimir Y.; Afonnikov, Dmitry A.; Vasiliev, Gennady V.; Kashina, Elena V.; Sripa, Banchob; Mordvinov, Viacheslav A.; Katokhin, Alexey V.

2015-01-01

Background Opisthorchis felineus, O. viverrini, and Clonorchis sinensis (family Opisthorchiidae) are parasitic flatworms that pose a serious threat to humans in some countries and cause opisthorchiasis/clonorchiasis. Chronic disease may lead to a risk of carcinogenesis in the biliary ducts. MicroRNAs (miRNAs) are small noncoding RNAs that control gene expression at the post-transcriptional level and are implicated in the regulation of various cellular processes during the parasite-host interplay. However, to date, the miRNAs of opisthorchiid flukes, in particular those essential for maintaining their complex biology and parasitic mode of existence, have not been satisfactorily described. Methodology/Principal Findings Using a SOLiD deep sequencing-bioinformatic approach, we identified 43 novel and 18 conserved miRNAs for O. felineus (miracidia, metacercariae and adult worms), 20 novel and 16 conserved miRNAs for O. viverrini (adult worms), and 33 novel and 18 conserved miRNAs for C. sinensis (adult worms). The analysis of the data revealed differences in the expression level of conserved miRNAs among the three species and among the three developmental stages of O. felineus. Analysis of miRNA genes revealed two gene clusters, one cluster-like region and one intronic miRNA in the genome. The presence and structure of the two gene clusters were validated using a PCR-based approach in the three flukes. Conclusions This study represents a comprehensive description of miRNAs in three members of the family Opisthorchiidae, significantly expands our knowledge of miRNAs in multicellular parasites and provides a basis for understanding the structural and functional evolution of miRNAs in these metazoan parasites. Results of this study also provide novel resources for a deeper understanding of the complex parasite biology and for further research on the pathogenesis and molecular events of disease induced by the liver flukes. 
The present data may also facilitate the development of novel approaches for the prevention and treatment of opisthorchiasis/clonorchiasis. PMID:25898350
In Situ Detection of MicroRNA Expression with RNAscope Probes.

PubMed

Yin, Viravuth P

2018-01-01

Elucidating the spatial resolution of gene transcripts provides important insight into potential gene function. MicroRNAs are short, single-stranded noncoding RNAs that control gene expression through base-pair complementarity with target mRNAs in the 3' untranslated region (UTR) and inhibiting protein expression. However, given their small size of ~22 to 24 nt and low expression levels, standard in situ hybridization detection methods are not amenable to microRNA spatial resolution. Here, I describe a technique that employs RNAscope probe design and proprietary amplification technology that provides simultaneous single molecule detection of individual microRNA and its target gene. This method allows for rapid and sensitive detection of noncoding RNA transcripts in frozen tissue sections.
Long noncoding RNAs responsive to Fusarium oxysporum infection in Arabidopsis thaliana.

PubMed

Zhu, Qian-Hao; Stephen, Stuart; Taylor, Jennifer; Helliwell, Chris A; Wang, Ming-Bo

2014-01-01

Short noncoding RNAs have been demonstrated to play important roles in regulation of gene expression and stress responses, but the repertoire and functions of long noncoding RNAs (lncRNAs) remain largely unexplored, particularly in plants. To explore the role of lncRNAs in disease resistance, we used a strand-specific RNA-sequencing approach to identify lncRNAs responsive to Fusarium oxysporum infection in Arabidopsis thaliana. Antisense transcription was found in c. 20% of the annotated A. thaliana genes. Several noncoding natural antisense transcripts responsive to F. oxysporum infection were found in genes implicated in disease defense. While the majority of the novel transcriptionally active regions (TARs) were adjacent to annotated genes and could be an extension of the annotated transcripts, 159 novel intergenic TARs, including 20 F. oxysporum-responsive lncTARs, were identified. Ten F. oxysporum-induced lncTARs were functionally characterized using T-DNA insertion or RNA-interference knockdown lines, and five were demonstrated to be related to disease development. Promoter analysis suggests that some of the F. oxysporum-induced lncTARs are direct targets of transcription factor(s) responsive to pathogen attack. Our results demonstrated that strand-specific RNA sequencing is a powerful tool for uncovering hidden levels of transcriptome and that lncRNAs are important components of the antifungal networks in A. thaliana.
Biological significance of long non-coding RNA FTX expression in human colorectal cancer.

PubMed

Guo, Xiao-Bo; Hua, Zhu; Li, Chen; Peng, Li-Pan; Wang, Jing-Shen; Wang, Bo; Zhi, Qiao-Ming

2015-01-01

The purpose of this study was to determine the expression of long non-coding RNA (lncRNA) FTX and analyze its prognostic and biological significance in colorectal cancer (CRC). A quantitative reverse transcription PCR was performed to detect the expression of long non-coding RNA FTX in 35 pairs of colorectal cancer and corresponding noncancerous tissues. The expression of long non-coding RNA FTX was detected in 187 colorectal cancer tissues and its correlations with clinicopathological factors of patients were examined. Univariate and multivariate analyses were performed to analyze the prognostic significance of long non-coding RNA FTX expression. The effects of long non-coding RNA FTX expression on malignant phenotypes of colorectal cancer cells and its possible biological significance were further determined. Long non-coding RNA FTX was significantly upregulated in colorectal cancer tissues, and low long non-coding RNA FTX expression was significantly correlated with differentiation grade, lymph vascular invasion, and clinical stage. Patients with high long non-coding RNA FTX showed poorer overall survival than those with low long non-coding RNA FTX. Multivariate analyses indicated that status of long non-coding RNA FTX was an independent prognostic factor for patients. Functional analyses showed that upregulation of long non-coding RNA FTX significantly promoted growth, migration, invasion, and increased colony formation in colorectal cancer cells. Therefore, long non-coding RNA FTX may be a potential biomarker for predicting the survival of colorectal cancer patients and might be a molecular target for treatment of human colorectal cancer.
Transcriptome interrogation of human myometrium identifies differentially expressed sense-antisense pairs of protein-coding and long non-coding RNA genes in spontaneous labor at term

PubMed Central

Romero, Roberto; Tarca, Adi; Chaemsaithong, Piya; Miranda, Jezid; Chaiworapongsa, Tinnakorn; Jia, Hui; Hassan, Sonia S.; Kalita, Cynthia A.; Cai, Juan; Yeo, Lami; Lipovich, Leonard

2014-01-01

Objective The mechanisms responsible for normal and abnormal parturition are poorly understood. Myometrial activation leading to regular uterine contractions is a key component of labor. Dysfunctional labor (arrest of dilatation and/or descent) is a leading indication for cesarean delivery. Compelling evidence suggests that most of these disorders are functional in nature, and not the result of cephalopelvic disproportion. The methodology and the datasets afforded by the post-genomic era provide novel opportunities to understand and target gene functions in these disorders. In 2012, the ENCODE Consortium elucidated the extraordinary abundance and functional complexity of long non-coding RNA genes in the human genome. The purpose of the study was to identify differentially expressed long non-coding RNA genes in human myometrium in women in spontaneous labor at term. Materials and Methods Myometrium was obtained from women undergoing cesarean deliveries who were not in labor (n=19) and women in spontaneous labor at term (n=20). RNA was extracted and profiled using an Illumina® microarray platform. The analysis of the protein coding genes from this study has been previously reported. Here, we have used computational approaches to bound the extent of long non-coding RNA representation on this platform, and to identify co-differentially expressed and correlated pairs of long non-coding RNA genes and protein-coding genes sharing the same genomic loci. Results Upon considering more than 18,498 distinct lncRNA genes compiled nonredundantly from public experimental data sources, and interrogating 2,634 that matched Illumina microarray probes, we identified co-differential expression and correlation at two genomic loci that contain coding-lncRNA gene pairs: SOCS2-AK054607 and LMCD1-NR_024065 in women in spontaneous labor at term. This co-differential expression and correlation was validated by qRT-PCR, an independent experimental method. 
Intriguingly, one of the two lncRNA genes differentially expressed in term labor had a key genomic structure element, a splice site that lacked evolutionary conservation beyond primates. Conclusions We provide for the first time evidence for coordinated differential expression and correlation of cis-encoded antisense lncRNAs and protein-coding genes with known, as well as novel roles in pregnancy in the myometrium of women in spontaneous labor at term. PMID:24168098
Inverted repeat Alu elements in the human lincRNA-p21 adopt a conserved secondary structure that regulates RNA function

PubMed Central

Chillón, Isabel; Pyle, Anna M.

2016-01-01

LincRNA-p21 is a long intergenic non-coding RNA (lincRNA) involved in the p53-mediated stress response. We sequenced the human lincRNA-p21 (hLincRNA-p21) and found that it has a single exon that includes inverted repeat Alu elements (IRAlus). Sense and antisense Alu elements fold independently of one another into a secondary structure that is conserved in lincRNA-p21 among primates. Moreover, the structures formed by IRAlus are involved in the localization of hLincRNA-p21 in the nucleus, where hLincRNA-p21 colocalizes with paraspeckles. Our results underscore the importance of IRAlus structures for the function of hLincRNA-p21 during the stress response. PMID:27378782
Metazoan tRNA introns generate stable circular RNAs in vivo

PubMed Central

Lu, Zhipeng; Filonov, Grigory S.; Noto, John J.; Schmidt, Casey A.; Hatkevich, Talia L.; Wen, Ying; Jaffrey, Samie R.; Matera, A. Gregory

2015-01-01

We report the discovery of a class of abundant circular noncoding RNAs that are produced during metazoan tRNA splicing. These transcripts, termed tRNA intronic circular (tric)RNAs, are conserved features of animal transcriptomes. Biogenesis of tricRNAs requires anciently conserved tRNA sequence motifs and processing enzymes, and their expression is regulated in an age-dependent and tissue-specific manner. Furthermore, we exploited this biogenesis pathway to develop an in vivo expression system for generating “designer” circular RNAs in human cells. Reporter constructs expressing RNA aptamers such as Spinach and Broccoli can be used to follow the transcription and subcellular localization of tricRNAs in living cells. Owing to the superior stability of circular vs. linear RNA isoforms, this expression system has a wide range of potential applications, from basic research to pharmaceutical science. PMID:26194134

Comparative genomics of ParaHox clusters of teleost fishes: gene cluster breakup and the retention of gene sets following whole genome duplications

PubMed Central

Siegel, Nicol; Hoegg, Simone; Salzburger, Walter; Braasch, Ingo; Meyer, Axel

2007-01-01

Background The evolutionary lineage leading to the teleost fish underwent a whole genome duplication termed FSGD or 3R in addition to two prior genome duplications that took place earlier during vertebrate evolution (termed 1R and 2R). Resulting from the FSGD, additional copies of genes are present in fish, compared to tetrapods whose lineage did not experience the 3R genome duplication. Interestingly, we find that ParaHox genes do not differ in number in extant teleost fishes despite their additional genome duplication from the genomic situation in mammals, but they are distributed over twice as many paralogous regions in fish genomes. Results We determined the DNA sequence of the entire ParaHox C1 paralogon in the East African cichlid fish Astatotilapia burtoni, and compared it to orthologous regions in other vertebrate genomes as well as to the paralogous vertebrate ParaHox D paralogons. Evolutionary relationships among genes from these four chromosomal regions were studied with several phylogenetic algorithms. We provide evidence that the genes of the ParaHox C paralogous cluster are duplicated in teleosts, just as it had been shown previously for the D paralogon genes. Overall, however, synteny and cluster integrity seem to be less conserved in ParaHox gene clusters than in Hox gene clusters. Comparative analyses of non-coding sequences uncovered conserved, possibly co-regulatory elements, which are likely to contain promoter motifs of the genes belonging to the ParaHox paralogons. Conclusion There seems to be strong stabilizing selection for gene order as well as gene orientation in the ParaHox C paralogon, since with a few exceptions, only the lengths of the introns and intergenic regions differ between the distantly related species examined. 
The high degree of evolutionary conservation of this gene cluster's architecture in particular – but possibly clusters of genes more generally – might be linked to the presence of promoter, enhancer or inhibitor motifs that serve to regulate more than just one gene. Therefore, deletions, inversions or relocations of individual genes could destroy the regulation of the clustered genes in this region. The existence of such a regulation network might explain the evolutionary conservation of gene order and orientation over the course of hundreds of millions of years of vertebrate evolution. Another possible explanation for the highly conserved gene order might be the existence of a regulator not located immediately next to its corresponding gene but further away since a relocation or inversion would possibly interrupt this interaction. Different ParaHox clusters were found to have experienced differential gene loss in teleosts. Yet the complete set of these homeobox genes was maintained, albeit distributed over almost twice the number of chromosomes. Selection due to dosage effects and/or stoichiometric disturbance might act more strongly to maintain a modal number of homeobox genes (and possibly transcription factors more generally) per genome, yet permit the accumulation of other (non regulatory) genes associated with these homeobox gene clusters. PMID:17822543
Structural and dynamic properties of the C-terminal region of the Escherichia coli RNA chaperone Hfq: integrative experimental and computational studies.

PubMed

Wen, Bin; Wang, Weiwei; Zhang, Jiahai; Gong, Qingguo; Shi, Yunyu; Wu, Jihui; Zhang, Zhiyong

2017-08-09

In Escherichia coli, hexameric Hfq is an important RNA chaperone that facilitates small RNA-mediated post-transcriptional regulation. The Hfq monomer consists of an evolutionarily conserved Sm domain (residues 1-65) and a flexible C-terminal region (residues 66-102). It has been recognized that the existence of the C-terminal region is important for the function of Hfq, but its detailed structural and dynamic properties remain elusive due to its disordered nature. In this work, using integrative experimental techniques, such as nuclear magnetic resonance spectroscopy and small-angle X-ray scattering, as well as multi-scale computational simulations, new insights into the structure and dynamics of the C-terminal region in the context of the Hfq hexamer are provided. Although the C-terminal region is intrinsically disordered, some residues (83-86) are motionally restricted. The hexameric core may affect the secondary structure propensity of the C-terminal region, due to transient interactions between them. The residues at the rim and the proximal side of the core have significantly more transient contacts with the C-terminal region than those residues at the distal side, which may facilitate the function of the C-terminal region in the release of double-stranded RNAs and the cycling of small non-coding RNAs. Structure ensembles constructed by fitting the experimental data also support that the C-terminal region prefers to locate at the proximal side. From multi-scale simulations, we propose that the C-terminal region may play a dual role of steric effect (especially at the proximal side) and recruitment (at both sides) in the binding process of RNA substrates. Interestingly, we have found that these motionally restricted residues may serve as important binding sites for the incoming RNAs, with binding probably driven by favorable electrostatic interactions. These integrative studies may aid in our understanding of the functional role of the C-terminal region of Hfq.
In silico screening of the chicken genome for overlaps between genomic regions: microRNA genes, coding and non-coding transcriptional units, QTL, and genetic variations.

PubMed

Zorc, Minja; Kunej, Tanja

2016-05-01

MicroRNAs (miRNAs) are a class of non-coding RNAs involved in posttranscriptional regulation of target genes. Regulation requires complementarity between target mRNA and the mature miRNA seed region, responsible for their recognition and binding. It has been estimated that each miRNA targets approximately 200 genes, and genetic variability of miRNA genes has been reported to affect phenotypic variability and disease susceptibility in humans, livestock species, and model organisms. Polymorphisms in miRNA genes could therefore represent biomarkers for phenotypic traits in livestock animals. In our previous study, we collected polymorphisms within miRNA genes in chicken. In the present study, we identified miRNA-related genomic overlaps to prioritize genomic regions of interest for further functional studies and biomarker discovery. Overlapping genomic regions in chicken were analyzed using the following bioinformatics tools and databases: miRNA SNiPer, Ensembl, miRBase, NCBI Blast, and QTLdb. Out of 740 known pre-miRNA genes, 263 (35.5%) contain polymorphisms; among them, 35 contain more than three polymorphisms. The most polymorphic miRNA genes in chicken are gga-miR-6662, containing 23 single nucleotide polymorphisms (SNPs) within the pre-miRNA region, including five consecutive SNPs, and gga-miR-6688, containing ten polymorphisms including three consecutive polymorphisms. Several miRNA-related genomic hotspots have been revealed in the chicken genome; polymorphic miRNA genes are located within protein-coding and/or non-coding transcription units and quantitative trait loci (QTL) associated with production traits. The present study includes the first description of an exonic miRNA in a chicken genome, an overlap between the miRNA gene and the exon of the protein-coding gene (gga-miR-6578/HADHB), and the first report of a missense polymorphism located within a mature miRNA seed region. Identified miRNA-related genomic hotspots in chicken can serve researchers as a starting point for further functional studies and association studies with poultry production and health traits and the basis for systematic screening of exonic miRNAs and missense/miRNA seed polymorphisms in other genomes.
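The overlap screening described in the abstract above comes down to testing whether two genomic intervals on the same chromosome intersect. The following is only a minimal sketch of that idea, not the authors' pipeline: the `Region` type, the function names, and all coordinates and feature names are hypothetical, and intervals are assumed to be half-open (start inclusive, end exclusive).

```python
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    """A genomic interval; coordinates are assumed half-open: [start, end)."""
    chrom: str
    start: int
    end: int
    name: str


def overlaps(a: Region, b: Region) -> bool:
    """True if the two regions share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def find_overlaps(queries: List[Region], targets: List[Region]) -> List[Tuple[str, str]]:
    """Return (query name, target name) for every overlapping pair (brute force)."""
    return [(q.name, t.name) for q in queries for t in targets if overlaps(q, t)]


# Hypothetical coordinates, made up purely for illustration.
mirnas = [
    Region("chr3", 1_000, 1_100, "mirna_A"),
    Region("chr5", 500, 590, "mirna_B"),
]
features = [
    Region("chr3", 900, 1_050, "exon_X"),
    Region("chr5", 600, 700, "qtl_Y"),
]

pairs = find_overlaps(mirnas, features)
print(pairs)
```

The brute-force pairwise scan is adequate for a few hundred miRNA genes; a genome-wide screen would more likely sort the intervals or use an interval tree.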
Outbreak of poliomyelitis in Finland in 1984-85 - Re-analysis of viral sequences using the current standard approach.

PubMed

Simonen, Marja-Leena; Roivainen, Merja; Iber, Jane; Burns, Cara; Hovi, Tapani

2010-01-01

In 1984, a wild type 3 poliovirus (PV3/FIN84) spread all over Finland causing nine cases of paralytic poliomyelitis and one case of aseptic meningitis. The outbreak was ended in 1985 with an intensive vaccination campaign. By limited sequence comparison with previously isolated PV3 strains, closest relatives of PV3/FIN84 were found among strains circulating in the Mediterranean region. Now we wanted to reanalyse the relationships using approaches currently exploited in poliovirus surveillance. Cell lysates of 22 strains isolated during the outbreak and stored frozen were subjected to RT-PCR amplification in three genomic regions without prior subculture. Sequences of the entire VP1 coding region, 150 nucleotides in the VP1-2A junction, most of the 5' non-coding region, partial sequences of the 3D RNA polymerase coding region and partial 3' non-coding region were compared within the outbreak and with sequences available in data banks. In addition, complete nucleotide sequences were obtained for 2 strains isolated from two different cases of disease during the outbreak. The results confirmed the previously described wide intraepidemic variation of the strains, including amino acid substitutions in antigenic sites, as well as the likely Mediterranean region origin of the strains. Simplot and bootscanning analyses of the complete genomes indicated complicated evolutionary history of the non-capsid coding regions of the genome suggesting several recombinations with different HEV-C viruses in the past.
Biological significance of long non-coding RNA FTX expression in human colorectal cancer

PubMed Central

Guo, Xiao-Bo; Hua, Zhu; Li, Chen; Peng, Li-Pan; Wang, Jing-Shen; Wang, Bo; Zhi, Qiao-Ming

2015-01-01

The purpose of this study was to determine the expression of long non-coding RNA (lncRNA) FTX and analyze its prognostic and biological significance in colorectal cancer (CRC). A quantitative reverse transcription PCR was performed to detect the expression of long non-coding RNA FTX in 35 pairs of colorectal cancer and corresponding noncancerous tissues. The expression of long non-coding RNA FTX was detected in 187 colorectal cancer tissues and its correlations with clinicopathological factors of patients were examined. Univariate and multivariate analyses were performed to analyze the prognostic significance of Long Non-coding RNA FTX expression. The effects of long non-coding RNA FTX expression on malignant phenotypes of colorectal cancer cells and its possible biological significances were further determined. Long non-coding RNA FTX was significantly upregulated in colorectal cancer tissues, and low long non-coding RNA FTX expression was significantly correlated with differentiation grade, lymph vascular invasion, and clinical stage. Patients with high long non-coding RNA FTX showed poorer overall survival than those with low long non-coding RNA FTX. Multivariate analyses indicated that status of long non-coding RNA FTX was an independent prognostic factor for patients. Functional analyses showed that upregulation of long non-coding RNA FTX significantly promoted growth, migration, invasion, and increased colony formation in colorectal cancer cells. Therefore, long non-coding RNA FTX may be a potential biomarker for predicting the survival of colorectal cancer patients and might be a molecular target for treatment of human colorectal cancer. PMID:26629053
ALUminating the Path of Atherosclerosis Progression: Chaos Theory Suggests a Role for Alu Repeats in the Development of Atherosclerotic Vascular Disease.

PubMed

Hueso, Miguel; Cruzado, Josep M; Torras, Joan; Navarro, Estanislao

2018-06-12

Atherosclerosis (ATH) and coronary artery disease (CAD) are chronic inflammatory diseases with an important genetic background; they derive from the cumulative effect of multiple common risk alleles, most of which are located in genomic noncoding regions. These complex diseases behave as nonlinear dynamical systems that show a high dependence on their initial conditions; thus, long-term predictions of disease progression are unreliable. One likely possibility is that the nonlinear nature of ATH could be dependent on nonlinear correlations in the structure of the human genome. In this review, we show how chaos theory analysis has highlighted genomic regions that have shared specific structural constraints, which could have a role in ATH progression. These regions were shown to be enriched with repetitive sequences of the Alu family, genomic parasites that have colonized the human genome, which show a particular secondary structure and are involved in the regulation of gene expression. Here, we show the impact of Alu elements on the mechanisms that regulate gene expression, especially highlighting the molecular mechanisms via which the Alu elements alter the inflammatory response. We devote special attention to their relationship with the long noncoding RNA (lncRNA); antisense noncoding RNA in the INK4 locus (ANRIL), a risk factor for ATH; their role as microRNA (miRNA) sponges; and their ability to interfere with the regulatory circuitry of the nuclear factor kappa B (NF-κB) response. We aim to characterize ATH as a nonlinear dynamic system, in which small initial alterations in the expression of a number of repetitive elements are somehow amplified to reach phenotypic significance.
Long Noncoding RNA LINC00958 Accelerates Gliomagenesis Through Regulating miR-203/CDK2.

PubMed

Guo, Erkun; Liang, Chaohui; He, Xin; Song, Guozhi; Liu, Hongjiang; Lv, Zhongqiang; Guan, Jianchao; Yang, Dezhen; Zheng, Jiapeng

2018-05-01

Increasing evidence has indicated that long noncoding RNAs (lncRNAs) play crucial roles in various biological processes, including glioma. However, the underlying mechanism of lncRNAs in gliomagenesis is still ambiguous. In this study, we aim to investigate the role of long intergenic noncoding RNA 00958 (LINC00958) in the tumorigenesis of glioma. Results revealed that LINC00958 was significantly upregulated in glioma tissues and cell lines compared with that of adjacent normal brain tissues and normal human astrocytes. Moreover, the ectopic overexpression of LINC00958 was correlated with poor prognosis of glioma patients. Loss-of-function experiments indicated that LINC00958 knockdown suppressed glioma cell proliferation and invasion and induced cell cycle arrest at G0/G1 phase in vitro, and inhibited tumor growth in vivo. Bioinformatics programs and luciferase reporter assay revealed that miR-203 shared complementary binding sites with both 3'-untranslated region of LINC00958 and CDK2. In summary, our study concludes that LINC00958 acts as an oncogenic gene in the gliomagenesis through miR-203-CDK2 regulation, providing a novel insight into glioma tumorigenesis.
Changes in the Coding and Non-coding Transcriptome and DNA Methylome that Define the Schwann Cell Repair Phenotype after Nerve Injury.

PubMed

Arthur-Farraj, Peter J; Morgan, Claire C; Adamowicz, Martyna; Gomez-Sanchez, Jose A; Fazal, Shaline V; Beucher, Anthony; Razzaghi, Bonnie; Mirsky, Rhona; Jessen, Kristjan R; Aitman, Timothy J

2017-09-12

Repair Schwann cells play a critical role in orchestrating nerve repair after injury, but the cellular and molecular processes that generate them are poorly understood. Here, we perform a combined whole-genome, coding and non-coding RNA and CpG methylation study following nerve injury. We show that genes involved in the epithelial-mesenchymal transition are enriched in repair cells, and we identify several long non-coding RNAs in Schwann cells. We demonstrate that the AP-1 transcription factor C-JUN regulates the expression of certain micro RNAs in repair Schwann cells, in particular miR-21 and miR-34. Surprisingly, unlike during development, changes in CpG methylation are limited in injury, restricted to specific locations, such as enhancer regions of Schwann cell-specific genes (e.g., Nedd4l), and close to local enrichment of AP-1 motifs. These genetic and epigenomic changes broaden our mechanistic understanding of the formation of repair Schwann cell during peripheral nervous system tissue repair. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Characterization of a novel hepadnavirus in the white sucker (Catostomus commersonii) from the Great Lakes Region of the USA

USGS Publications Warehouse

Hahn, Cassidy M.; Iwanowicz, Luke R.; Cornman, Robert S.; Conway, Carla M.; Winton, James R.; Blazer, Vicki S.

2015-01-01

The white sucker Catostomus commersonii is a freshwater teleost often utilized as a resident sentinel. Here, we sequenced the full genome of a hepatitis B-like virus that infects white suckers from the Great Lakes Region of the USA. Dideoxysequencing confirmed the white sucker hepatitis B virus (WSHBV) has a circular genome (3542 bp) with the prototypical codon organization of hepadnaviruses. Electron microscopy demonstrated that complete virions of approximately 40 nm were present in the plasma of infected fish. Compared to avi- and orthohepadnaviruses, sequence conservation of the core, polymerase and surface proteins was low and ranged from 16-27% at the amino acid level. An X protein homologue common to the orthohepadnaviruses was not present. The WSHBV genome included an atypical, presumptively non-coding region absent in previously described hepadnaviruses. Phylogenetic analyses confirmed WSHBV as distinct from previously documented hepadnaviruses. The level of divergence in protein sequences between WSHBV and other hepadnaviruses, and the identification of an HBV-like sequence in an African cichlid provide evidence that a novel genus of the family Hepadnaviridae may need to be established that includes these hepatitis B-like viruses in fishes. Viral transcription was observed in 9.5% (16 of 169) of white suckers evaluated. The prevalence of hepatic tumors in these fish was 4.9%, of which only 2.4% were positive for both virus and hepatic tumors. These results are not sufficient to draw inferences regarding the association of WSHBV and carcinogenesis in white sucker.
Functional noncoding sequences derived from SINEs in the mammalian genome

PubMed Central

Nishihara, Hidenori; Smit, Arian F.A.; Okada, Norihiro

2006-01-01

Recent comparative analyses of mammalian sequences have revealed that a large number of nonprotein-coding genomic regions are under strong selective constraint. Here, we report that some of these loci have been derived from a newly defined family of ancient SINEs (short interspersed repetitive elements). This is a surprising result, as SINEs and other transposable elements are commonly thought to be genomic parasites. We named the ancient SINE family AmnSINE1, for Amniota SINE1, because we found it to be present in mammals as well as in birds, and some copies predate the mammalian-bird split 310 million years ago (Mya). AmnSINE1 has a chimeric structure of a 5S rRNA and a tRNA-derived SINE, and is related to five tRNA-derived SINE families that we characterized here in the coelacanth, dogfish shark, hagfish, and amphioxus genomes. All of the newly described SINE families have a common central domain that is also shared by zebrafish SINE3, and we collectively name them the DeuSINE (Deuterostomia SINE) superfamily. Notably, of the ∼1000 still identifiable copies of AmnSINE1 in the human genome, 105 correspond to loci phylogenetically highly conserved among mammalian orthologs. The conservation is strongest over the central domain. Thus, AmnSINE1 appears to be the best example of a transposable element of which a significant fraction of the copies have acquired genomic functionality. PMID:16717141
Allergic TH2 Response Governed by B-Cell Lymphoma 6 Function in Naturally Occurring Memory Phenotype CD4+ T Cells

PubMed Central

Ogasawara, Takashi; Kohashi, Yuko; Ikari, Jun; Taniguchi, Toshibumi; Tsuruoka, Nobuhide; Watanabe-Takano, Haruko; Fujimura, Lisa; Sakamoto, Akemi; Hatano, Masahiko; Hirata, Hirokuni; Fukushima, Yasutsugu; Fukuda, Takeshi; Kurasawa, Kazuhiro; Tatsumi, Koichiro; Tokuhisa, Takeshi; Arima, Masafumi

2018-01-01

Transcriptional repressor B-cell lymphoma 6 (Bcl6) appears to regulate TH2 immune responses in allergies, but its precise role is unclear. We previously reported that Bcl6 suppressed IL-4 production in naïve CD4+ T cell-derived memory TH2 cells. To investigate Bcl6 function in allergic responses in naturally occurring memory phenotype CD4+ T (MPT) cells and their derived TH2 (MPTH2) cells, Bcl6-manipulated mice, highly conserved intron enhancer (hcIE)-deficient mice, and reporter mice for conserved noncoding sequence 2 (CNS2) 3′ distal enhancer region were used to elucidate Bcl6 function in MPT cells. The molecular mechanisms of Bcl6-mediated TH2 cytokine gene regulation were elucidated using cellular and molecular approaches. Bcl6 function in MPT cells was determined using adoptive transfer to naïve mice, which were assessed for allergic airway inflammation. Bcl6 suppressed IL-4 production in MPT and MPTH2 cells by suppressing CNS2 enhancer activity. Bcl6 downregulated Il4 expression in MPTH2 cells, but not MPT cells, by suppressing hcIE activity. The inhibitory functions of Bcl6 in MPT and MPTH2 cells attenuated allergic responses. Bcl6 is a critical regulator of IL-4 production by MPT and MPTH2 cells in TH2 immune responses related to the pathogenesis of allergies. PMID:29696026
A conserved α-proteobacterial small RNA contributes to osmoadaptation and symbiotic efficiency of rhizobia on legume roots.

PubMed

Robledo, Marta; Peregrina, Alexandra; Millán, Vicenta; García-Tomsig, Natalia I; Torres-Quesada, Omar; Mateos, Pedro F; Becker, Anke; Jiménez-Zurdo, José I

2017-07-01

Small non-coding RNAs (sRNAs) are expected to have pivotal roles in the adaptive responses underlying symbiosis of nitrogen-fixing rhizobia with legumes. Here, we provide primary insights into the function and activity mechanism of the Sinorhizobium meliloti trans-sRNA NfeR1 (Nodule Formation Efficiency RNA). Northern blot probing and transcription tracking with fluorescent promoter-reporter fusions unveiled high nfeR1 expression in response to salt stress and throughout the symbiotic interaction. The strength and differential regulation of nfeR1 transcription are conferred by a motif, which is conserved in nfeR1 promoter regions in α-proteobacteria. NfeR1 loss-of-function compromised osmoadaptation of free-living bacteria, whilst causing misregulation of salt-responsive genes related to stress adaptation, osmolytes catabolism and membrane trafficking. Nodulation tests revealed that lack of NfeR1 affected competitiveness, infectivity, nodule development and symbiotic efficiency of S. meliloti on alfalfa roots. Comparative computer predictions and a genetic reporter assay evidenced a redundant role of three identical NfeR1 unpaired anti Shine-Dalgarno motifs for targeting and downregulation of translation of multiple mRNAs from transporter genes. Our data provide genetic evidence of the hyperosmotic conditions of the endosymbiotic compartments. NfeR1-mediated gene regulation in response to this cue could contribute to coordinate nutrient uptake with the metabolic reprogramming concomitant to symbiotic transitions. © 2017 Society for Applied Microbiology and John Wiley & Sons Ltd.
Distribution of RPTLN Genes Across Reptilia: Hypothesized Role for RPTLN in the Evolution of SVMPs.

PubMed

Sanz-Soler, Raquel; Sanz, Libia; Calvete, Juan J

2016-11-01

We report the cloning, full-length sequencing, and broad distribution of reptile-specific RPTLN genes across a number of Anapsida (Testudines), Diapsida (Serpentes, Sauria), and Archosauria (Crocodylia) taxa. The remarkable structural conservation of RPTLN genes in species that had a common ancestor more than 250 million years ago, their low transcriptional level, and the lack of evidence for RPTLN translation in any reptile organ investigated, suggest for this ancient gene family a yet elusive function as long noncoding RNAs. The high conservation in extant snake venom metalloproteinases (SVMPs) of the signal peptide sequence coded for by RPTLN genes strongly suggests that this region may have played a key role in the recruitment and restricted expression of SVMP genes in the venom gland of Caenophidian snakes, some 60-50 Mya. More recently, 23-16 Mya, the neofunctionalization of an RPTLN copy in the venom gland of snakes of the genera Macrovipera and Daboia marked the beginning of the evolutionary history of a new family of disintegrins, the α1β1-collagen binding antagonists, short-RTS/KTS disintegrins. This evolutionary scenario predicts that venom gland RPTLN and SVMP genes may share tissue-specific regulatory elements. Future genomic studies should support or refute this hypothesis. © The Author 2016. Published by Oxford University Press on behalf of the Society for Integrative and Comparative Biology. All rights reserved. For permissions please email: journals.permissions@oup.com.
COME: a robust coding potential calculation tool for lncRNA identification and characterization based on multiple features.

PubMed

Hu, Long; Xu, Zhiyu; Hu, Boqin; Lu, Zhi John

2017-01-09

Recent genomic studies suggest that novel long non-coding RNAs (lncRNAs) are specifically expressed and far outnumber annotated lncRNA sequences. To identify and characterize novel lncRNAs in RNA sequencing data from new samples, we have developed COME, a coding potential calculation tool based on multiple features. It integrates multiple sequence-derived and experiment-based features using a decompose-compose method, which makes it more accurate and robust than other well-known tools. We also showed that COME was able to substantially improve the consistency of prediction results from other coding potential calculators. Moreover, COME annotates and characterizes each predicted lncRNA transcript with multiple lines of supporting evidence, which are not provided by other tools. Remarkably, we found that one subgroup of lncRNAs classified by such supporting features (i.e. conserved local RNA secondary structure) was highly enriched in a well-validated database (lncRNAdb). We further found that the conserved structural domains on lncRNAs had a better chance than other RNA regions to interact with RNA binding proteins, based on the recent eCLIP-seq data in human, indicating their potential regulatory roles. Overall, we present COME as an accurate, robust and multiple-feature supported method for the identification and characterization of novel lncRNAs. The software implementation is available at https://github.com/lulab/COME. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Primate-specific evolution of noncoding element insertion into PLA2G4C and human preterm birth

PubMed Central

2010-01-01

Background: The onset of birth in humans, like other apes, differs from non-primate mammals in its endocrine physiology. We hypothesize that higher primate-specific gene evolution may lead to these differences and target genes involved in human preterm birth, an area of global health significance. Methods: We performed a comparative genomics screen of highly conserved noncoding elements and identified PLA2G4C, a phospholipase A isoform involved in prostaglandin biosynthesis, as human accelerated. To examine whether this gene demonstrating primate-specific evolution was associated with birth timing, we genotyped and analyzed 8 common single nucleotide polymorphisms (SNPs) in PLA2G4C in US Hispanic (n = 73 preterm, 292 control), US White (n = 147 preterm, 157 control) and US Black (n = 79 preterm, 166 control) mothers. Results: Detailed structural and phylogenic analysis of PLA2G4C suggested a short genomic element within the gene duplicated from a paralogous highly conserved element on chromosome 1 specifically in primates. SNPs rs8110925 and rs2307276 in US Hispanics and rs11564620 in US Whites were significant after correcting for multiple tests (p < 0.006). Additionally, rs11564620 (Thr360Pro) was associated with increased metabolite levels of the prostaglandin thromboxane in healthy individuals (p = 0.02), suggesting this variant may affect PLA2G4C activity. Conclusions: Our findings suggest that variation in PLA2G4C may influence preterm birth risk by increasing levels of prostaglandins, which are known to regulate labor. PMID:21184677
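The abstract above reports a significance cut-off of p < 0.006 after correcting for multiple tests but does not name the correction method. One plausible reading, offered here only as an assumption, is a Bonferroni correction of a conventional alpha of 0.05 across the 8 genotyped SNPs. A minimal sketch of that arithmetic (the function name is illustrative, not from the study):

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# Assumed inputs: alpha = 0.05 (conventional) and 8 tests (one per SNP).
threshold = bonferroni_threshold(0.05, 8)
print(f"Bonferroni threshold for 8 tests: {threshold:.5f}")

# The p = 0.02 reported for the thromboxane association would not pass this
# threshold; the abstract reports it separately from the corrected SNP tests.
print(0.02 < threshold)
```

Under these assumptions the threshold is 0.05 / 8 = 0.00625, which is consistent with the reported p < 0.006.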
Phylum-Level Conservation of Regulatory Information in Nematodes despite Extensive Non-coding Sequence Divergence

PubMed Central

Gordon, Kacy L.; Arthur, Robert K.; Ruvinsky, Ilya

2015-01-01

Gene regulatory information guides development and shapes the course of evolution. To test conservation of gene regulation within the phylum Nematoda, we compared the functions of putative cis-regulatory sequences of four sets of orthologs (unc-47, unc-25, mec-3 and elt-2) from distantly-related nematode species. These species, Caenorhabditis elegans, its congeneric C. briggsae, and three parasitic species Meloidogyne hapla, Brugia malayi, and Trichinella spiralis, represent four of the five major clades in the phylum Nematoda. Despite the great phylogenetic distances sampled and the extensive sequence divergence of nematode genomes, all but one of the regulatory elements we tested are able to drive at least a subset of the expected gene expression patterns. We show that functionally conserved cis-regulatory elements have no more extended sequence similarity to their C. elegans orthologs than would be expected by chance, but they do harbor motifs that are important for proper expression of the C. elegans genes. These motifs are too short to be distinguished from the background level of sequence similarity, and while identical in sequence they are not conserved in orientation or position. Functional tests reveal that some of these motifs contribute to proper expression. Our results suggest that conserved regulatory circuitry can persist despite considerable turnover within cis elements. PMID:26020930
Transcriptional dynamics of a conserved gene expression network associated with craniofacial divergence in Arctic charr.

PubMed

Ahi, Ehsan Pashay; Kapralova, Kalina Hristova; Pálsson, Arnar; Maier, Valerie Helene; Gudbrandsson, Jóhannes; Snorrason, Sigurdur S; Jónsson, Zophonías O; Franzdóttir, Sigrídur Rut

2014-01-01

Understanding the molecular basis of craniofacial variation can provide insights into key developmental mechanisms of adaptive changes and their role in trophic divergence and speciation. Arctic charr (Salvelinus alpinus) is a polymorphic fish species, and, in Lake Thingvallavatn in Iceland, four sympatric morphs have evolved distinct craniofacial structures. We conducted a gene expression study on candidates from a conserved gene coexpression network, focusing on the development of craniofacial elements in embryos of two contrasting Arctic charr morphotypes (benthic and limnetic). Four Arctic charr morphs were studied: one limnetic and two benthic morphs from Lake Thingvallavatn and a limnetic reference aquaculture morph. The presence of morphological differences at developmental stages before the onset of feeding was verified by morphometric analysis. Following up on our previous findings that Mmp2 and Sparc were differentially expressed between morphotypes, we identified a network of genes with conserved coexpression across diverse vertebrate species. A comparative expression study of candidates from this network in developing heads of the four Arctic charr morphs verified the coexpression relationship of these genes and revealed distinct transcriptional dynamics strongly correlated with contrasting craniofacial morphologies (benthic versus limnetic). A literature review and Gene Ontology analysis indicated that a significant proportion of the network genes play a role in extracellular matrix organization and skeletogenesis, and motif enrichment analysis of conserved noncoding regions of network candidates predicted a handful of transcription factors, including Ap1 and Ets2, as potential regulators of the gene network. The expression of Ets2 itself was also found to associate with network gene expression. Genes linked to glucocorticoid signalling were also studied, as both Mmp2 and Sparc are responsive to this pathway. Among those, several transcriptional targets and upstream regulators showed differential expression between the contrasting morphotypes. Interestingly, although selected network genes showed overlapping expression patterns in situ and no morph differences, Timp2 expression patterns differed between morphs. Our comparative study of transcriptional dynamics in divergent craniofacial morphologies of Arctic charr revealed a conserved network of coexpressed genes sharing functional roles in structural morphogenesis. We also implicate transcriptional regulators of the network as targets for future functional studies.
The complete mitochondrial genome of Chrysopa pallens (Insecta, Neuroptera, Chrysopidae).

PubMed

He, Kun; Chen, Zhe; Yu, Dan-Na; Zhang, Jia-Yong

2012-10-01

The complete mitochondrial genome of Chrysopa pallens (Neuroptera, Chrysopidae) was sequenced. It consists of 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA (rRNA) genes, and a control region (AT-rich region). The total length of the C. pallens mitogenome is 16,723 bp with 79.5% AT content, and the length of the control region is 1905 bp with 89.1% AT content. The non-coding regions of C. pallens include the control region between the 12S rRNA and trnI genes, and a 75-bp spacer region between the trnI and trnQ genes.
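AT content, as quoted in the abstract above, is the fraction of bases in a sequence that are A or T. The sketch below shows that calculation on a made-up toy sequence, then uses the abstract's reported figures to estimate the AT content implied for the genome outside the control region. The function names and the toy sequence are illustrative assumptions, and the derived figure is only approximate because the published percentages are rounded.

```python
def at_content(seq: str) -> float:
    """Fraction of bases in seq that are A or T (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "AT" for base in seq) / len(seq)


def implied_at_outside(total_len: int, total_at: float,
                       region_len: int, region_at: float) -> float:
    """AT fraction of the genome excluding one region, from summary figures."""
    at_bases_total = total_len * total_at
    at_bases_region = region_len * region_at
    return (at_bases_total - at_bases_region) / (total_len - region_len)


# Toy sequence, made up for illustration (not from C. pallens).
toy = "ATTAGCATTA"
print(f"Toy AT content: {at_content(toy):.1%}")

# Figures reported in the abstract: 16,723 bp at 79.5% AT overall,
# with a 1905 bp control region at 89.1% AT.
rest = implied_at_outside(16_723, 0.795, 1_905, 0.891)
print(f"Implied AT content outside the control region: {rest:.1%}")
```

The second result sits slightly below the genome-wide 79.5%, as expected when the AT-rich control region is removed from the average.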
Lineage-Specific Genome Architecture Links Enhancers and Non-coding Disease Variants to Target Gene Promoters.

PubMed

Javierre, Biola M; Burren, Oliver S; Wilder, Steven P; Kreuzhuber, Roman; Hill, Steven M; Sewitz, Sven; Cairns, Jonathan; Wingett, Steven W; Várnai, Csilla; Thiecke, Michiel J; Burden, Frances; Farrow, Samantha; Cutler, Antony J; Rehnström, Karola; Downes, Kate; Grassi, Luigi; Kostadima, Myrto; Freire-Pritchett, Paula; Wang, Fan; Stunnenberg, Hendrik G; Todd, John A; Zerbino, Daniel R; Stegle, Oliver; Ouwehand, Willem H; Frontini, Mattia; Wallace, Chris; Spivakov, Mikhail; Fraser, Peter

2016-11-17

Long-range interactions between regulatory elements and gene promoters play key roles in transcriptional regulation. The vast majority of interactions are uncharted, constituting a major missing link in understanding genome control. Here, we use promoter capture Hi-C to identify interacting regions of 31,253 promoters in 17 human primary hematopoietic cell types. We show that promoter interactions are highly cell type specific and enriched for links between active promoters and epigenetically marked enhancers. Promoter interactomes reflect lineage relationships of the hematopoietic tree, consistent with dynamic remodeling of nuclear architecture during differentiation. Interacting regions are enriched in genetic variants linked with altered expression of genes they contact, highlighting their functional role. We exploit this rich resource to connect non-coding disease variants to putative target promoters, prioritizing thousands of disease-candidate genes and implicating disease pathways. Our results demonstrate the power of primary cell promoter interactomes to reveal insights into genomic regulatory mechanisms underlying common diseases. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Recombined sequences between the non-coding control regions of JC and BK viruses found in the urine of a renal transplantation patient.

PubMed

Liaw, Yu-Ching; Chen, Cheng-Hsu; Shu, Kuo-Hsiung; Fang, Chiung-Yao; Ou, Wei-Chih; Chen, Pei-Lain; Shen, Cheng-Huang; Lin, Mien-Chun; Chang, Deching; Wang, Meilin

2012-12-01

Kidney cells are the common host for JC virus (JCV) and BK virus (BKV). Reactivation of JCV and/or BKV in patients after organ transplantation, such as renal transplantation, may cause hemorrhagic cystitis and polyomavirus-associated nephropathy. Furthermore, JCV and BKV may be shed in the urine after reactivation in the kidney. Rearranged as well as archetypal non-coding control regions (NCCRs) of JCV and BKV have been frequently identified in human samples. In this study, three JC/BK recombined NCCR sequences were identified in the urine of a patient who had undergone renal transplantation. They were designated as JC-BK hybrids 1, 2, and 3. The three JC/BK recombinant NCCRs contain up-stream JCV as well as down-stream BKV sequences. Deletions of both JCV and BKV sequences were found in these recombined NCCRs. Recombination of DNA sequences between JCV and BKV may occur during co-infection due to the relatively high homology of the two viral genomes.

A Somatically Acquired Enhancer of the Androgen Receptor Is a Noncoding Driver in Advanced Prostate Cancer.

PubMed

Takeda, David Y; Spisák, Sándor; Seo, Ji-Heui; Bell, Connor; O'Connor, Edward; Korthauer, Keegan; Ribli, Dezső; Csabai, István; Solymosi, Norbert; Szállási, Zoltán; Stillman, David R; Cejas, Paloma; Qiu, Xintao; Long, Henry W; Tisza, Viktória; Nuzzo, Pier Vitale; Rohanizadegan, Mersedeh; Pomerantz, Mark M; Hahn, William C; Freedman, Matthew L

2018-06-09

Increased androgen receptor (AR) activity drives therapeutic resistance in advanced prostate cancer. The most common resistance mechanism is amplification of this locus presumably targeting the AR gene. Here, we identify and characterize a somatically acquired AR enhancer located 650 kb centromeric to the AR. Systematic perturbation of this enhancer using genome editing decreased proliferation by suppressing AR levels. Insertion of an additional copy of this region sufficed to increase proliferation under low androgen conditions and to decrease sensitivity to enzalutamide. Epigenetic data generated in localized prostate tumors and benign specimens support the notion that this region is a developmental enhancer. Collectively, these observations underscore the importance of epigenomic profiling in primary specimens and the value of deploying genome editing to functionally characterize noncoding elements. More broadly, this work identifies a therapeutic vulnerability for targeting the AR and emphasizes the importance of regulatory elements as highly recurrent oncogenic drivers. Copyright © 2018 Elsevier Inc. All rights reserved.
Non-coding recurrent mutations in chronic lymphocytic leukaemia.

PubMed

Puente, Xose S; Beà, Silvia; Valdés-Mas, Rafael; Villamor, Neus; Gutiérrez-Abril, Jesús; Martín-Subero, José I; Munar, Marta; Rubio-Pérez, Carlota; Jares, Pedro; Aymerich, Marta; Baumann, Tycho; Beekman, Renée; Belver, Laura; Carrio, Anna; Castellano, Giancarlo; Clot, Guillem; Colado, Enrique; Colomer, Dolors; Costa, Dolors; Delgado, Julio; Enjuanes, Anna; Estivill, Xavier; Ferrando, Adolfo A; Gelpí, Josep L; González, Blanca; González, Santiago; González, Marcos; Gut, Marta; Hernández-Rivas, Jesús M; López-Guerra, Mónica; Martín-García, David; Navarro, Alba; Nicolás, Pilar; Orozco, Modesto; Payer, Ángel R; Pinyol, Magda; Pisano, David G; Puente, Diana A; Queirós, Ana C; Quesada, Víctor; Romeo-Casabona, Carlos M; Royo, Cristina; Royo, Romina; Rozman, María; Russiñol, Nuria; Salaverría, Itziar; Stamatopoulos, Kostas; Stunnenberg, Hendrik G; Tamborero, David; Terol, María J; Valencia, Alfonso; López-Bigas, Nuria; Torrents, David; Gut, Ivo; López-Guillermo, Armando; López-Otín, Carlos; Campo, Elías

2015-10-22

Chronic lymphocytic leukaemia (CLL) is a frequent disease in which the genetic alterations determining the clinicobiological behaviour are not fully understood. Here we describe a comprehensive evaluation of the genomic landscape of 452 CLL cases and 54 patients with monoclonal B-lymphocytosis, a precursor disorder. We extend the number of CLL driver alterations, including changes in ZNF292, ZMYM3, ARID1A and PTPN11. We also identify novel recurrent mutations in non-coding regions, including the 3' region of NOTCH1, which cause aberrant splicing events, increase NOTCH1 activity and result in a more aggressive disease. In addition, mutations in an enhancer located on chromosome 9p13 result in reduced expression of the B-cell-specific transcription factor PAX5. The accumulative number of driver alterations (0 to ≥4) discriminated between patients with differences in clinical behaviour. This study provides an integrated portrait of the CLL genomic landscape, identifies new recurrent driver mutations of the disease, and suggests clinical interventions that may improve the management of this neoplasia.
Complete Sequence of the mitochondrial genome of the tapeworm Hymenolepis diminuta: Gene arrangements indicate that platyhelminths are eutrochozoans

DOE Office of Scientific and Technical Information (OSTI.GOV)

von Nickisch-Rosenegk, Markus; Brown, Wesley M.; Boore, Jeffrey L.

2001-01-01

Using "long-PCR" we have amplified in overlapping fragments the complete mitochondrial genome of the tapeworm Hymenolepis diminuta (Platyhelminthes: Cestoda) and determined its 13,900 nucleotide sequence. The gene content is the same as that typically found for animal mitochondrial DNA (mtDNA) except that atp8 appears to be lacking, a condition found previously for several other animals. Despite the small size of this mtDNA, there are two large non-coding regions, one of which contains 13 repeats of a 31 nucleotide sequence and a potential stem-loop structure of 25 base pairs with an 11-member loop. Large potential secondary structures are identified also for the non-coding regions of two other cestode mtDNAs. Comparison of the mitochondrial gene arrangement of H. diminuta with those previously published supports a phylogenetic position of flatworms as members of the Eutrochozoa, rather than being basal to either a clade of protostomes or a clade of coelomates.
α satellite DNA variation and function of the human centromere

PubMed Central

Sullivan, Lori L.; Chew, Kimberline

2017-01-01

Genomic variation is a source of functional diversity that is typically studied in genic and non-coding regulatory regions. However, the extent of variation within noncoding portions of the human genome, particularly highly repetitive regions, and the functional consequences are not well understood. Satellite DNA, including α satellite DNA found at human centromeres, comprises up to 10% of the genome, but is difficult to study because its repetitive nature hinders contiguous sequence assemblies. We recently described variation within α satellite DNA that affects centromere function. On human chromosome 17 (HSA17), we showed that size and sequence polymorphisms within primary array D17Z1 are associated with chromosome aneuploidy and defective centromere architecture. However, HSA17 can counteract this instability by assembling the centromere at a second, “backup” array lacking variation. Here, we discuss our findings in a broader context of human centromere assembly, and highlight areas of future study to uncover links between genomic and epigenetic features of human centromeres. PMID:28406740
The complete mitochondrial genome of the green lizard Lacerta viridis viridis (Reptilia: Lacertidae) and its phylogenetic position within squamate reptiles.

PubMed

Böhme, M U; Fritzsch, G; Tippmann, A; Schlegel, M; Berendonk, T U

2007-06-01

For the first time the complete mitochondrial genome was sequenced for a member of Lacertidae. Lacerta viridis viridis was sequenced in order to compare the phylogenetic relationships of this family to other reptilian lineages. Using the long-polymerase chain reaction (long PCR) we characterized a mitochondrial genome, 17,156 bp long showing a typical vertebrate pattern with 13 protein coding genes, 22 transfer RNAs (tRNA), two ribosomal RNAs (rRNA) and one major noncoding region. The noncoding region of L. v. viridis was characterized by a conspicuous 35 bp tandem repeat at its 5' terminus. A phylogenetic study including all currently available squamate mitochondrial sequences demonstrates the position of Lacertidae within a monophyletic squamate group. We obtained a narrow relationship of Lacertidae to Scincidae, Iguanidae, Varanidae, Anguidae, and Cordylidae. Although, the internal relationships within this group yielded only a weak resolution and low bootstrap support, the revealed relationships were more congruent with morphological studies than with recent molecular analyses.
An analytical framework for whole-genome sequence association studies and its implications for autism spectrum disorder.

PubMed

Werling, Donna M; Brand, Harrison; An, Joon-Yong; Stone, Matthew R; Zhu, Lingxue; Glessner, Joseph T; Collins, Ryan L; Dong, Shan; Layer, Ryan M; Markenscoff-Papadimitriou, Eirene; Farrell, Andrew; Schwartz, Grace B; Wang, Harold Z; Currall, Benjamin B; Zhao, Xuefang; Dea, Jeanselle; Duhn, Clif; Erdman, Carolyn A; Gilson, Michael C; Yadav, Rachita; Handsaker, Robert E; Kashin, Seva; Klei, Lambertus; Mandell, Jeffrey D; Nowakowski, Tomasz J; Liu, Yuwen; Pochareddy, Sirisha; Smith, Louw; Walker, Michael F; Waterman, Matthew J; He, Xin; Kriegstein, Arnold R; Rubenstein, John L; Sestan, Nenad; McCarroll, Steven A; Neale, Benjamin M; Coon, Hilary; Willsey, A Jeremy; Buxbaum, Joseph D; Daly, Mark J; State, Matthew W; Quinlan, Aaron R; Marth, Gabor T; Roeder, Kathryn; Devlin, Bernie; Talkowski, Michael E; Sanders, Stephan J

2018-05-01

Genomic association studies of common or rare protein-coding variation have established robust statistical approaches to account for multiple testing. Here we present a comparable framework to evaluate rare and de novo noncoding single-nucleotide variants, insertion/deletions, and all classes of structural variation from whole-genome sequencing (WGS). Integrating genomic annotations at the level of nucleotides, genes, and regulatory regions, we define 51,801 annotation categories. Analyses of 519 autism spectrum disorder families did not identify association with any categories after correction for 4,123 effective tests. Without appropriate correction, biologically plausible associations are observed in both cases and controls. Despite excluding previously identified gene-disrupting mutations, coding regions still exhibited the strongest associations. Thus, in autism, the contribution of de novo noncoding variation is probably modest in comparison to that of de novo coding variants. Robust results from future WGS studies will require large cohorts and comprehensive analytical strategies that consider the substantial multiple-testing burden.
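
The multiple-testing burden described above can be made concrete with a short calculation. The sketch below assumes a Bonferroni-style correction and a family-wise error rate of 0.05; the abstract states only that results were corrected for 4,123 effective tests, so both the method and the alpha value are illustrative assumptions, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant(p_values, alpha: float, n_tests: int):
    """Flag each p-value that passes the corrected threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    # 4,123 effective tests is taken from the abstract; alpha = 0.05 is assumed.
    print(bonferroni_threshold(0.05, 4123))
    # Made-up p-values, for illustration only.
    print(significant([1e-6, 1e-3], 0.05, 4123))
```

Under these assumptions, a category would need a p-value on the order of 1e-5 to count as associated, which illustrates why nominally significant noncoding associations may not survive correction.
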
DIANA-LncBase v2: indexing microRNA targets on non-coding transcripts.

PubMed

Paraskevopoulou, Maria D; Vlachos, Ioannis S; Karagkouni, Dimitra; Georgakilas, Georgios; Kanellos, Ilias; Vergoulis, Thanasis; Zagganas, Konstantinos; Tsanakas, Panayiotis; Floros, Evangelos; Dalamagas, Theodore; Hatzigeorgiou, Artemis G

2016-01-04

microRNAs (miRNAs) are short non-coding RNAs (ncRNAs) that act as post-transcriptional regulators of coding gene expression. Long non-coding RNAs (lncRNAs) have been recently reported to interact with miRNAs. The sponge-like function of lncRNAs introduces an extra layer of complexity in the miRNA interactome. DIANA-LncBase v1 provided a database of experimentally supported and in silico predicted miRNA Recognition Elements (MREs) on lncRNAs. The second version of LncBase (www.microrna.gr/LncBase) presents an extensive collection of miRNA:lncRNA interactions. The significantly enhanced database includes more than 70 000 low and high-throughput, (in)direct miRNA:lncRNA experimentally supported interactions, derived from manually curated publications and the analysis of 153 AGO CLIP-Seq libraries. The new experimental module presents a 14-fold increase compared to the previous release. LncBase v2 hosts in silico predicted miRNA targets on lncRNAs, identified with the DIANA-microT algorithm. The relevant module provides millions of predicted miRNA binding sites, accompanied with detailed metadata and MRE conservation metrics. LncBase v2 caters information regarding cell type specific miRNA:lncRNA regulation and enables users to easily identify interactions in 66 different cell types, spanning 36 tissues for human and mouse. Database entries are also supported by accurate lncRNA expression information, derived from the analysis of more than 6 billion RNA-Seq reads. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
NPInter v3.0: an upgraded database of noncoding RNA-associated interactions

PubMed Central

Hao, Yajing; Wu, Wei; Li, Hui; Yuan, Jiao; Luo, Jianjun; Zhao, Yi; Chen, Runsheng

2016-01-01

Despite the fact that a large quantity of noncoding RNAs (ncRNAs) have been identified, their functions remain unclear. To enable researchers to have a better understanding of ncRNAs’ functions, we updated the NPInter database to version 3.0, which contains experimentally verified interactions between ncRNAs (excluding tRNAs and rRNAs), especially long noncoding RNAs (lncRNAs) and other biomolecules (proteins, mRNAs, miRNAs and genomic DNAs). In NPInter v3.0, interactions pertaining to ncRNAs are not only manually curated from scientific literature but also curated from high-throughput technologies. In addition, we also curated lncRNA–miRNA interactions from in silico predictions supported by AGO CLIP-seq data. When compared with NPInter v2.0, the interactions are more informative (with additional information on tissues or cell lines, binding sites, conservation, co-expression values and other features) and more organized (with divisions on data sets by data sources, tissues or cell lines, experiments and other criteria). NPInter v3.0 expands the data set to 491,416 interactions in 188 tissues (or cell lines) from 68 kinds of experimental technologies. NPInter v3.0 also improves the user interface and adds new web services, including a local UCSC Genome Browser to visualize binding sites. Additionally, NPInter v3.0 defined a high-confidence set of interactions and predicted the functions of lncRNAs in human and mouse based on the interactions curated in the database. NPInter v3.0 is available at http://www.bioinfo.org/NPInter/. Database URL: http://www.bioinfo.org/NPInter/ PMID:27087310
The Non-Coding RNA Ncr0700/PmgR1 is Required for Photomixotrophic Growth and the Regulation of Glycogen Accumulation in the Cyanobacterium Synechocystis sp. PCC 6803.

PubMed

de Porcellinis, Alice J; Klähn, Stephan; Rosgaard, Lisa; Kirsch, Rebekka; Gutekunst, Kirstin; Georg, Jens; Hess, Wolfgang R; Sakuragi, Yumiko

2016-10-01

Carbohydrate metabolism is a tightly regulated process in photosynthetic organisms. In the cyanobacterium Synechocystis sp. PCC 6803, the photomixotrophic growth protein A (PmgA) is involved in the regulation of glucose and storage carbohydrate (i.e. glycogen) metabolism, while its biochemical activity and possible factors acting downstream of PmgA are unknown. Here, a genome-wide microarray analysis of a ΔpmgA strain identified the expression of 36 protein-coding genes and 42 non-coding transcripts as significantly altered. From these, the non-coding RNA Ncr0700 was identified as the transcript most strongly reduced in abundance. Ncr0700 is widely conserved among cyanobacteria. In Synechocystis its expression is inversely correlated with light intensity. Similarly to a ΔpmgA mutant, a Δncr0700 deletion strain showed an approximately 2-fold increase in glycogen content under photoautotrophic conditions and wild-type-like growth. Moreover, its growth was arrested by 38 h after a shift to photomixotrophic conditions. Ectopic expression of Ncr0700 in Δncr0700 and ΔpmgA restored the glycogen content and photomixotrophic growth to wild-type levels. These results indicate that Ncr0700 is required for photomixotrophic growth and the regulation of glycogen accumulation, and acts downstream of PmgA. Hence Ncr0700 is renamed here as PmgR1 for photomixotrophic growth RNA 1. © The Author 2016. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.
SNPs in putative regulatory regions identified by human mouse comparative sequencing and transcription factor binding site data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Banerjee, Poulabi; Bahlo, Melanie; Schwartz, Jody R.

2002-01-01

Genome wide disease association analysis using SNPs is being explored as a method for dissecting complex genetic traits and a vast number of SNPs have been generated for this purpose. As there are cost and throughput limitations of genotyping large numbers of SNPs and statistical issues regarding the large number of dependent tests on the same data set, to make association analysis practical it has been proposed that SNPs should be prioritized based on likely functional importance. The most easily identifiable functional SNPs are coding SNPs (cSNPs) and accordingly cSNPs have been screened in a number of studies. SNPs in gene regulatory sequences embedded in noncoding DNA are another class of SNPs suggested for prioritization due to their predicted quantitative impact on gene expression. The main challenge in evaluating these SNPs, in contrast to cSNPs, is a lack of robust algorithms and databases for recognizing regulatory sequences in noncoding DNA. Approaches that have been previously used to delineate noncoding sequences with gene regulatory activity include cross-species sequence comparisons and the search for sequences recognized by transcription factors. We combined these two methods to sift through mouse human genomic sequences to identify putative gene regulatory elements and subsequently localized SNPs within these sequences in a 1 Megabase (Mb) region of human chromosome 5q31, orthologous to mouse chromosome 11 containing the Interleukin cluster.
Non coding RNAs in vascular disease - from basic science to clinical applications: Scientific update from the Working Group of Myocardial Function of the European Society of Cardiology

PubMed

Fiedler, Jan; Baker, Andrew H; Dimmeler, Stefanie; Heymans, Stephane; Mayr, Manuel; Thum, Thomas

2018-05-23

Non-coding RNAs are increasingly recognized not only as regulators of various biological functions but also as targets for a new generation of RNA therapeutics and biomarkers. We hereby review recent insights relating to non-coding RNAs including microRNAs (e.g. miR-126, miR-146a), long non-coding RNAs (e.g. MIR503HG, GATA6-AS, SMILR) and circular RNAs (e.g. cZNF292) and their role in vascular diseases. This includes identification and therapeutic use of hypoxia-regulated non-coding RNAs and endogenous non-coding RNAs that regulate intrinsic smooth muscle cell signalling, age-related non-coding RNAs and non-coding RNAs involved in the regulation of mitochondrial biology and metabolic control. Finally, we discuss non-coding RNA species with biomarker potential.
Intergenic disease-associated regions are abundant in novel transcripts.

PubMed

Bartonicek, N; Clark, M B; Quek, X C; Torpy, J R; Pritchard, A L; Maag, J L V; Gloss, B S; Crawford, J; Taft, R J; Hayward, N K; Montgomery, G W; Mattick, J S; Mercer, T R; Dinger, M E

2017-12-28

Genotyping of large populations through genome-wide association studies (GWAS) has successfully identified many genomic variants associated with traits or disease risk. Unexpectedly, a large proportion of GWAS single nucleotide polymorphisms (SNPs) and associated haplotype blocks are in intronic and intergenic regions, hindering their functional evaluation. While some of these risk-susceptibility regions encompass cis-regulatory sites, their transcriptional potential has never been systematically explored. To detect rare tissue-specific expression, we employed the transcript-enrichment method CaptureSeq on 21 human tissues to identify 1775 multi-exonic transcripts from 561 intronic and intergenic haploblocks associated with 392 traits and diseases, covering 73.9 Mb (2.2%) of the human genome. We show that a large proportion (85%) of disease-associated haploblocks express novel multi-exonic non-coding transcripts that are tissue-specific and enriched for GWAS SNPs as well as epigenetic markers of active transcription and enhancer activity. Similarly, we captured transcriptomes from 13 melanomas, targeting nine melanoma-associated haploblocks, and characterized 31 novel melanoma-specific transcripts that include fusion proteins, novel exons and non-coding RNAs, one-third of which showed allelically imbalanced expression. This resource of previously unreported transcripts in disease-associated regions ( http://gwas-captureseq.dingerlab.org ) should provide an important starting point for the translational community in search of novel biomarkers, disease mechanisms, and drug targets.
[Exon-intron structure of the fet5+ gene of Schizosaccharomyces pombe and physical mapping of genome encompassing regions].

PubMed

Shpakovskiĭ, G V; Lebedenko, E N

1998-01-01

Plasmid pYUK3 bearing the fet5+ gene of Schizosaccharomyces pombe was isolated from a genomic library of the fission yeast, and a detailed physical map of the whole genomic insert (ca. 9.6 Kbp) was constructed. The primary structure of the fet5+ gene and its flanking regions is established. The gene contains a single 45-bp intron in its distal part. A typical TATA-box (TATAAG) was found in the 5'-noncoding region ca. 50 bp upstream of the putative start of transcription, and the 3'-noncoding region contains AT-rich palindromes, which are probably involved in termination of the fet5+ transcription. A previously unidentified gene of Sz. pombe encoding a protein with some similarity to one of the transcriptional activators from the TBP (TATA-binding protein) group of SPT factors of transcription was found in the vicinity of the fet5+ gene. Taking into account that cDNA of the fet5(+)-gene was isolated as a suppressor of the genetic-defect of nuclear RNA polymerases I-III (Bioorg. Khim., 1997, vol. 23, No 3, pp. 234-237), this vicinity may be the first evidence of possible clustering, in the genome of the fission yeast, of genes participating in transcription regulation.
The non-coding RNA landscape of human hematopoiesis and leukemia.

PubMed

Schwarzer, Adrian; Emmrich, Stephan; Schmidt, Franziska; Beck, Dominik; Ng, Michelle; Reimer, Christina; Adams, Felix Ferdinand; Grasedieck, Sarah; Witte, Damian; Käbler, Sebastian; Wong, Jason W H; Shah, Anushi; Huang, Yizhou; Jammal, Razan; Maroz, Aliaksandra; Jongen-Lavrencic, Mojca; Schambach, Axel; Kuchenbauer, Florian; Pimanda, John E; Reinhardt, Dirk; Heckl, Dirk; Klusmann, Jan-Henning

2017-08-09

Non-coding RNAs have emerged as crucial regulators of gene expression and cell fate decisions. However, their expression patterns and regulatory functions during normal and malignant human hematopoiesis are incompletely understood. Here we present a comprehensive resource defining the non-coding RNA landscape of the human hematopoietic system. Based on highly specific non-coding RNA expression portraits per blood cell population, we identify unique fingerprint non-coding RNAs, such as LINC00173 in granulocytes, and assign these to critical regulatory circuits involved in blood homeostasis. Following the incorporation of acute myeloid leukemia samples into the landscape, we further uncover prognostically relevant non-coding RNA stem cell signatures shared between acute myeloid leukemia blasts and healthy hematopoietic stem cells. Our findings highlight the importance of the non-coding transcriptome in the formation and maintenance of the human blood hierarchy. While micro-RNAs are known regulators of haematopoiesis and leukemogenesis, the role of long non-coding RNAs is less clear. Here the authors provide a non-coding RNA expression landscape of the human hematopoietic system, highlighting their role in the formation and maintenance of the human blood hierarchy.
Circular RNA: a new star in neurological diseases.

PubMed

Li, Tao-Ran; Jia, Yan-Jie; Wang, Qun; Shao, Xiao-Qiu; Lv, Rui-Juan

2017-08-01

Circular RNAs (circRNAs) are novel endogenous non-coding RNAs characterized by the presence of a covalent bond linking the 3' and 5' ends generated by backsplicing. In this review, we summarize a number of the latest theories regarding the biogenesis, properties and functions of circRNAs. Specifically, we focus on the advancing characteristics and functions of circRNAs in the brain and neurological diseases. CircRNAs exhibit the characteristics of species conservation, abundance and tissue/developmental-stage-specific expression in the brain. We also describe the relationship between circRNAs and several neurological diseases and highlight their functions in neurological diseases.
The ribonucleoprotein Csr network.

PubMed

Seyll, Ethel; Van Melderen, Laurence

2013-11-08

Ribonucleoprotein complexes are essential regulatory components in bacteria. In this review, we focus on the carbon storage regulator (Csr) network, which is well conserved in the bacterial world. This regulatory network is composed of the CsrA master regulator, its targets and regulators. CsrA binds to mRNA targets and regulates translation either negatively or positively. Binding to small non-coding RNAs controls activity of this protein. Expression of these regulators is tightly regulated at the level of transcription and stability by various global regulators (RNAses, two-component systems, alarmone). We discuss the implications of these complex regulations in bacterial adaptation.
Polyomavirus BK non-coding control region rearrangements in health and disease.

PubMed

Sharma, Preety M; Gupta, Gaurav; Vats, Abhay; Shapiro, Ron; Randhawa, Parmjeet S

2007-08-01

BK virus is an increasingly recognized pathogen in transplanted patients. DNA sequencing of this virus shows considerable genomic variability. To understand the clinical significance of rearrangements in the non-coding control region (NCCR) of BK virus (BKV), we report a meta-analysis of 507 sequences, including 40 sequences generated in our own laboratory, for associations between rearrangements and disease, tissue tropism, geographic origin, and viral genotype. NCCR rearrangements were less frequent in (a) asymptomatic BKV viruria compared to patients with viral nephropathy (1.7% vs. 22.5%), and (b) viral genotype 1 compared to other genotypes (2.4% vs. 11.2%). Rearrangements were commoner in malignancy (78.6%) and in Norwegians (45.7%), and less common in East Indians (0%) and Japanese (4.3%). A surprising number of rearranged sequences were reported from mononuclear cells of healthy subjects, whereas most plasma sequences were archetypal. This difference could not be related to potential recombinase activity in lymphocytes, as consensus recombination signal sequences could not be found in the NCCR region. NCCR rearrangements are neither a necessary nor a sufficient condition to produce clinical disease. BKV nephropathy and hemorrhagic cystitis are not associated with any unique NCCR configuration or nucleotide sequence.
Self-organizing approach for meta-genomes.

PubMed

Zhu, Jianfeng; Zheng, Wei-Mou

2014-12-01

We extend the self-organizing approach for annotation of a bacterial genome to analyze the raw sequencing data of the human gut metagenome without sequence assembling. The original approach divides the genomic sequence of a bacterium into non-overlapping segments of equal length and assigns to each segment one of seven 'phases', among which one is for the noncoding regions, three for the direct coding regions to indicate the three possible codon positions of the segment starting site, and three for the reverse coding regions. The noncoding phase and the six coding phases are described by two frequency tables of the 64 triplet types or 'codon usages'. A set of codon usages can be used to update the phase assignment and vice versa. An iteration after an initialization leads to a convergent phase assignment to give an annotation of the genome. In the extension of the approach to a metagenome, we consider a mixture model of a number of categories described by different codon usages. The Illumina Genome Analyzer sequencing data of the total DNA from faecal samples are then examined to understand the diversity of the human gut microbiome. Copyright © 2014 Elsevier Ltd. All rights reserved.
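
The alternating update described above (codon usages refine the phase assignment, and the phase assignment refines the codon usages) can be sketched as follows. This is a loose illustration, not the authors' implementation: the segment length, the round-robin initialization, the pseudocount, the mapping of phases 1-3 and 4-6 to reading-frame offsets, and the use of a mean log-likelihood per triplet are all assumptions, and every function name is hypothetical.

```python
import math
from itertools import product

TRIPLETS = ["".join(t) for t in product("ACGT", repeat=3)]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def triplets(segment: str, phase: int):
    """Triplets read under a phase: 0 noncoding, 1-3 direct frames, 4-6 reverse frames."""
    if phase >= 4:
        segment = revcomp(segment)
    offset = 0 if phase == 0 else (phase - 1) % 3
    return [segment[i:i + 3] for i in range(offset, len(segment) - 2, 3)]


def estimate(segments, phases, coding: bool):
    """Triplet frequency table from segments currently labelled coding or noncoding."""
    counts = dict.fromkeys(TRIPLETS, 1.0)  # pseudocount avoids log(0)
    for seg, ph in zip(segments, phases):
        if (ph != 0) == coding:
            for t in triplets(seg, ph):
                if t in counts:
                    counts[t] += 1.0
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def score(segment, phase, noncoding, coding):
    """Mean log-likelihood per triplet, so frames with fewer triplets are not favoured."""
    table = noncoding if phase == 0 else coding
    trips = [t for t in triplets(segment, phase) if t in table]
    if not trips:
        return float("-inf")
    return sum(math.log(table[t]) for t in trips) / len(trips)


def annotate(sequence: str, seg_len: int = 90, max_iter: int = 50):
    """Iterate table estimation and phase reassignment until the labels stop changing."""
    segments = [sequence[i:i + seg_len]
                for i in range(0, len(sequence) - seg_len + 1, seg_len)]
    phases = [i % 7 for i in range(len(segments))]  # assumed initialization
    for _ in range(max_iter):
        noncoding = estimate(segments, phases, coding=False)
        coding = estimate(segments, phases, coding=True)
        updated = [max(range(7), key=lambda ph: score(seg, ph, noncoding, coding))
                   for seg in segments]
        if updated == phases:
            break
        phases = updated
    return phases
```

Calling `annotate(genome_string)` returns one phase label (0 to 6) per non-overlapping segment. The metagenome extension mentioned in the abstract would replace the single coding table with a mixture of several category-specific tables, which this sketch does not attempt.
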
Nonneutral GC3 and retroelement codon mimicry in Phytophthora.

PubMed

Jiang, Rays H Y; Govers, Francine

2006-10-01

Phytophthora is a genus entirely comprised of destructive plant pathogens. It belongs to the Stramenopila, a unique branch of eukaryotes, phylogenetically distinct from plants, animals, or fungi. Phytophthora genes show a strong preference for usage of codons ending with G or C (high GC3). The presence of high GC3 in genes can be utilized to differentiate coding regions from noncoding regions in the genome. We found that both selective pressure and mutation bias drive codon bias in Phytophthora. Indicative for selection pressure is the higher GC3 value of highly expressed genes in different Phytophthora species. Lineage specific GC increase of noncoding regions is reminiscent of whole-genome mutation bias, whereas the elevated Phytophthora GC3 is primarily a result of translation efficiency-driven selection. Heterogeneous retrotransposons exist in Phytophthora genomes and many of them vary in their GC content. Interestingly, the most widespread groups of retroelements in Phytophthora show high GC3 and a codon bias that is similar to host genes. Apparently, selection pressure has been exerted on the retroelement's codon usage, and such mimicry of host codon bias might be beneficial for the propagation of retrotransposons.
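
GC3, the quantity central to this abstract, is the fraction of codons whose third position is G or C. A minimal sketch of the calculation follows, assuming the input is an in-frame coding sequence starting at the first codon position; the function name `gc3` and the example sequences are hypothetical and are not taken from the paper.

```python
def gc3(cds: str) -> float:
    """Fraction of codons whose third base is G or C.

    Assumes `cds` is in frame; a trailing partial codon is ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3      # drop any incomplete final codon
    third_bases = seq[2:usable:3]         # every third base, starting at codon position 3
    if not third_bases:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)


if __name__ == "__main__":
    print(gc3("ATGGCCAAG"))   # codons ATG, GCC, AAG
    print(gc3("ATGAAATTT"))   # codons ATG, AAA, TTT
```

Comparing this value between annotated genes and the same statistic computed on arbitrary frames of noncoding sequence is one simple way to see the coding versus noncoding contrast the abstract describes.
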
Transcriptional regulatory elements in the noncoding region of human papillomavirus type 6

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wu, Tzyy-Choou.

1989-01-01

The structure and function of the transcriptional regulatory region of human papillomavirus type 6 (HPV-6) has been investigated. To investigate tissue specific gene expression, a sensitive method to detect and localize HPV-6 viral DNA, mRNA and protein in plastic-embedded tissue sections of genital and respiratory tract papillomata by using in situ hybridization and immunoperoxidase assays has been developed. This method, using ultrathin sections and strand-specific ³H labeled riboprobes, offers the advantages of superior morphological preservation and detection of viral genomes at low copy number with good resolution, and the modified immunocytochemistry provides better sensitivity. The results suggest that genital tract epithelium is more permissive for HPV-6 replication than respiratory tract epithelium. To study the tissue tropism of HPV-6 at the level of regulation of viral gene expression, the polymerase chain reaction was used to isolate the noncoding region (NCR) of HPV-6 in independent isolates. Nucleotide sequence analysis of molecularly cloned DNA identified base substitutions, deletions/insertions and tandem duplications. Transcriptional regulatory elements in the NCR were assayed in recombinant plasmids containing the bacterial gene for chloramphenicol acetyl transferase.

Function and regulation of AUTS2, a gene implicated in autism and human evolution.

PubMed

Oksenberg, Nir; Stevison, Laurie; Wall, Jeffrey D; Ahituv, Nadav

2013-01-01

Nucleotide changes in the AUTS2 locus, some of which affect only noncoding regions, are associated with autism and other neurological disorders, including attention deficit hyperactivity disorder, epilepsy, dyslexia, motor delay, language delay, visual impairment, microcephaly, and alcohol consumption. In addition, AUTS2 contains the most significantly accelerated genomic region differentiating humans from Neanderthals, which is primarily composed of noncoding variants. However, the function and regulation of this gene remain largely unknown. To characterize auts2 function, we knocked it down in zebrafish, leading to a smaller head size, neuronal reduction, and decreased mobility. To characterize AUTS2 regulatory elements, we tested sequences for enhancer activity in zebrafish and mice. We identified 23 functional zebrafish enhancers, 10 of which were active in the brain. Our mouse enhancer assays characterized three mouse brain enhancers that overlap an ASD-associated deletion and four mouse enhancers that reside in regions implicated in human evolution, two of which are active in the brain. Combined, our results show that AUTS2 is important for neurodevelopment and expose candidate enhancer sequences in which nucleotide variation could lead to neurological disease and human-specific traits.
Glucose-6-phosphate dehydrogenase (G6PD) mutations database: review of the "old" and update of the new mutations.

PubMed

Minucci, Angelo; Moradkhani, Kamran; Hwang, Ming Jing; Zuppi, Cecilia; Giardina, Bruno; Capoluongo, Ettore

2012-03-15

In the present paper we have updated the G6PD mutations database, including all the most recently discovered G6PD genetic variants. We underline that the last database was published by Vulliamy et al. [1], who analytically reported 140 G6PD mutations: along with Vulliamy's database, there are two main sites, http://202.120.189.88/mutdb/ and www.LOVD.nl/MR, where almost all G6PD mutations can be found. Compared to the previous mutation reports, in our paper we have included for each mutation some additional information, such as: the secondary structure and the enzyme 3D position affected by the mutation, the creation or abolition of a restriction site (with the enzyme involved) and the conservation score associated with each amino acid position. The mutations reported in the present table have been divided according to the gene region involved (coding and non-coding), and mutations affecting the coding region into: single, multiple (with at least two bases involved) and deletion. We underline that for the listed mutations reported in italic, the literature doesn't provide all the biochemical or bio-molecular information or the research data. Finally, for the "old" mutations, we tried to verify features previously reported and, when subsequently modified, we updated the specific information using the latest literature data. Copyright © 2012 Elsevier Inc. All rights reserved.
The crystal structure of the Split End protein SHARP adds a new layer of complexity to proteins containing RNA recognition motifs

PubMed Central

Arieti, Fabiana; Gabus, Caroline; Tambalo, Margherita; Huet, Tiphaine; Round, Adam; Thore, Stéphane

2014-01-01

The Split Ends (SPEN) protein was originally discovered in Drosophila in the late 1990s. Since then, homologous proteins have been identified in eukaryotic species ranging from plants to humans. Every family member contains three predicted RNA recognition motifs (RRMs) in the N-terminal region of the protein. We have determined the crystal structure of the region of the human SPEN homolog that contains these RRMs—the SMRT/HDAC1 Associated Repressor Protein (SHARP), at 2.0 Å resolution. SHARP is a co-regulator of the nuclear receptors. We demonstrate that two of the three RRMs, namely RRM3 and RRM4, interact via a highly conserved interface. Furthermore, we show that the RRM3–RRM4 block is the main platform mediating the stable association with the H12–H13 substructure found in the steroid receptor RNA activator (SRA), a long, non-coding RNA previously shown to play a crucial role in nuclear receptor transcriptional regulation. We determine that SHARP association with SRA relies on both single- and double-stranded RNA sequences. The crystal structure of the SHARP–RRM fragment, together with the associated RNA-binding studies, extend the repertoire of nucleic acid binding properties of RRM domains suggesting a new hypothesis for a better understanding of SPEN protein functions. PMID:24748666
Experimental RNomics in Aquifex aeolicus: identification of small non-coding RNAs and the putative 6S RNA homolog

PubMed Central

Willkomm, Dagmar K.; Minnerup, Jens; Hüttenhofer, Alexander; Hartmann, Roland K.

2005-01-01

By an experimental RNomics approach, we have generated a cDNA library from small RNAs expressed from the genome of the hyperthermophilic bacterium Aquifex aeolicus. The library included RNAs that were antisense to mRNAs and tRNAs as well as RNAs encoded in intergenic regions. Substantial steady-state levels in A.aeolicus cells were confirmed for several of the cloned RNAs by northern blot analysis. The most abundant intergenic RNA of the library was identified as the 6S RNA homolog of A.aeolicus. Although shorter in size (150 nt) than its γ-proteobacterial homologs (∼185 nt), it is predicted to have the most stable structure among known 6S RNAs. As in the γ-proteobacteria, the A.aeolicus 6S RNA gene (ssrS) is located immediately upstream of the ygfA gene encoding a widely conserved 5-formyltetrahydrofolate cyclo-ligase. We identified novel 6S RNA candidates within the γ-proteobacteria but were unable to identify reasonable 6S RNA candidates in other bacterial branches, utilizing mfold analyses of the region immediately upstream of ygfA combined with 6S RNA blastn searches. By RACE experiments, we mapped the major transcription initiation site of A.aeolicus 6S RNA primary transcripts, located within the pheT gene preceding ygfA, as well as three processing sites. PMID:15814812
Metazoan tRNA introns generate stable circular RNAs in vivo.

PubMed

Lu, Zhipeng; Filonov, Grigory S; Noto, John J; Schmidt, Casey A; Hatkevich, Talia L; Wen, Ying; Jaffrey, Samie R; Matera, A Gregory

2015-09-01

We report the discovery of a class of abundant circular noncoding RNAs that are produced during metazoan tRNA splicing. These transcripts, termed tRNA intronic circular (tric)RNAs, are conserved features of animal transcriptomes. Biogenesis of tricRNAs requires anciently conserved tRNA sequence motifs and processing enzymes, and their expression is regulated in an age-dependent and tissue-specific manner. Furthermore, we exploited this biogenesis pathway to develop an in vivo expression system for generating "designer" circular RNAs in human cells. Reporter constructs expressing RNA aptamers such as Spinach and Broccoli can be used to follow the transcription and subcellular localization of tricRNAs in living cells. Owing to the superior stability of circular vs. linear RNA isoforms, this expression system has a wide range of potential applications, from basic research to pharmaceutical science. © 2015 Lu et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Transcriptional landscapes of Axolotl (Ambystoma mexicanum).

PubMed

Caballero-Pérez, Juan; Espinal-Centeno, Annie; Falcon, Francisco; García-Ortega, Luis F; Curiel-Quesada, Everardo; Cruz-Hernández, Andrés; Bako, Laszlo; Chen, Xuemei; Martínez, Octavio; Alberto Arteaga-Vázquez, Mario; Herrera-Estrella, Luis; Cruz-Ramírez, Alfredo

2018-01-15

The axolotl (Ambystoma mexicanum) is the vertebrate model system with the highest regeneration capacity. Experimental tools established over the past 100 years have been fundamental to start unraveling the cellular and molecular basis of tissue and limb regeneration. In the absence of a reference genome for the axolotl, transcriptomic analysis becomes fundamental to understanding the genetic basis of regeneration. Here we present one of the most diverse transcriptomic data sets for the axolotl by profiling coding and non-coding RNAs from diverse tissues. We reconstructed a population of 115,906 putative protein coding mRNAs as full ORFs (including isoforms). We also identified 352 conserved miRNAs and 297 novel putative mature miRNAs. Systematic enrichment analysis of gene expression allowed us to identify tissue-specific protein-coding transcripts. We also found putative novel and conserved microRNAs that potentially target mRNAs reported as important disease candidates in heart and liver. Copyright © 2017 Elsevier Inc. All rights reserved.
Strategies and tools for whole genome alignments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Couronne, Olivier; Poliakov, Alexander; Bray, Nicolas

2002-11-25

The availability of the assembled mouse genome makes possible, for the first time, an alignment and comparison of two large vertebrate genomes. We have investigated different strategies of alignment for the subsequent analysis of conservation of genomes that are effective for different quality assemblies. These strategies were applied to the comparison of the working draft of the human genome with the Mouse Genome Sequencing Consortium assembly, as well as other intermediate mouse assemblies. Our methods are fast and the resulting alignments exhibit a high degree of sensitivity, covering more than 90 percent of known coding exons in the human genome. We have obtained such coverage while preserving specificity. With a view towards the end user, we have developed a suite of tools and websites for automatically aligning, and subsequently browsing and working with whole genome comparisons. We describe the use of these tools to identify conserved non-coding regions between the human and mouse genomes, some of which have not been identified by other methods.
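The sensitivity figure quoted in this abstract (the share of known coding exons covered by alignments) can be illustrated with a minimal sketch. The record does not say how coverage was scored, so everything here is an assumption made for illustration: the half-open (start, end) coordinates, the hypothetical function names, and the rule that an exon counts as covered when at least `min_fraction` of its bases fall inside alignment blocks.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_bases(exon, merged_blocks):
    """Count bases of one exon that fall inside any merged alignment block."""
    exon_start, exon_end = exon
    return sum(
        max(0, min(exon_end, block_end) - max(exon_start, block_start))
        for block_start, block_end in merged_blocks
    )


def exon_coverage_fraction(exons, blocks, min_fraction=0.5):
    """Fraction of exons with at least min_fraction of their bases aligned.

    The min_fraction threshold is an illustrative assumption, not a value
    taken from the abstract.
    """
    merged = merge_intervals(blocks)
    covered = sum(
        1
        for exon in exons
        if covered_bases(exon, merged) >= min_fraction * (exon[1] - exon[0])
    )
    return covered / len(exons)


# Hypothetical coordinates on a single chromosome.
exons = [(100, 200), (300, 400), (500, 600), (700, 800)]
blocks = [(90, 150), (140, 210), (350, 380), (690, 900)]
fraction = exon_coverage_fraction(exons, blocks)
```

In this toy input, two of the four exons meet the threshold. Merging the blocks first avoids double-counting bases where alignments overlap.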
Computational analysis of conserved RNA secondary structure in transcriptomes and genomes.

PubMed

Eddy, Sean R

2014-01-01

Transcriptomics experiments and computational predictions both enable systematic discovery of new functional RNAs. However, many putative noncoding transcripts arise instead from artifacts and biological noise, and current computational prediction methods have high false positive rates. I discuss prospects for improving computational methods for analyzing and identifying functional RNAs, with a focus on detecting signatures of conserved RNA secondary structure. An interesting new front is the application of chemical and enzymatic experiments that probe RNA structure on a transcriptome-wide scale. I review several proposed approaches for incorporating structure probing data into the computational prediction of RNA secondary structure. Using probabilistic inference formalisms, I show how all these approaches can be unified in a well-principled framework, which in turn allows RNA probing data to be easily integrated into a wide range of analyses that depend on RNA secondary structure inference. Such analyses include homology search and genome-wide detection of new structural RNAs.
Noncoding sequence classification based on wavelet transform analysis: part I

NASA Astrophysics Data System (ADS)

Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.

2017-09-01

DNA sequences in the human genome can be divided into coding and noncoding ones. Coding sequences are those that are read during transcription. The identification of coding sequences has been widely reported in the literature due to their much-studied periodicity. Noncoding sequences represent the majority of the human genome. They play an important role in gene regulation and differentiation among the cells. However, noncoding sequences do not exhibit periodicities that correlate to their functions. The ENCODE (Encyclopedia of DNA Elements) and Epigenomic Roadmap projects have cataloged the human noncoding sequences into specific functions. We study characteristics of noncoding sequences with wavelet analysis of genomic signals.
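As a rough illustration of what "wavelet analysis of genomic signals" can mean, the sketch below converts a DNA string into a numeric signal and computes the energy of the Haar detail coefficients at each scale. The abstract names neither the encoding nor the wavelet family, so both are assumptions chosen for simplicity: a purine/pyrimidine indicator and the orthonormal Haar transform. All function names are hypothetical.

```python
import math


def encode_purine(seq):
    """Map purines (A, G) to +1.0 and pyrimidines (C, T) to -1.0.

    This encoding is an illustrative assumption; other numeric mappings exist.
    """
    mapping = {"A": 1.0, "G": 1.0, "C": -1.0, "T": -1.0}
    return [mapping[base] for base in seq.upper()]


def haar_step(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


def detail_energies(signal, levels):
    """Sum of squared detail coefficients at each level, finest scale first."""
    energies = []
    current = signal
    for _ in range(levels):
        current, detail = haar_step(current)
        energies.append(sum(d * d for d in detail))
    return energies


# Toy sequence of length 8 (a power of two, so three levels are possible).
signal = encode_purine("AGCTAGCT")
energies = detail_energies(signal, 3)
```

For this toy sequence, all of the detail energy sits at the second level, because the purine/pyrimidine pattern alternates in blocks of two bases. A real analysis would use much longer windows and compare energy profiles across annotated sequence classes.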
Retention of duplicated long-wavelength opsins in mosquito lineages by positive selection and differential expression.

PubMed

Giraldo-Calderón, Gloria I; Zanis, Michael J; Hill, Catherine A

2017-03-21

Opsins are light sensitive receptors associated with visual processes. Insects typically possess opsins that are stimulated by ultraviolet, short and long wavelength (LW) radiation. Six putative LW-sensitive opsins predicted in the yellow fever mosquito, Aedes aegypti and malaria mosquito, Anopheles gambiae, and eight in the southern house mosquito, Culex quinquefasciatus, suggest gene expansion in the Family Culicidae (mosquitoes) relative to other insects. Here we report the first detailed molecular and evolutionary analyses of LW opsins in three mosquito vectors, with the goal of understanding the molecular basis of opsin-mediated visual processes that could be exploited for mosquito control. Time of divergence estimates suggest that the mosquito LW opsins originated from 18 or 19 duplication events between 166.9/197.5 to 1.07/0.94 million years ago (MY) and that these likely occurred following the predicted divergence of the lineages Anophelinae and Culicinae 145-226 MY. Fitmodel analyses identified nine amino acid residues in the LW opsins that may be under positive selection. Of these, eight amino acids occur in the N and C termini and are shared among all three species, and one residue in TMIII was unique to culicine species. Alignment of 5' non-coding regions revealed potential Conserved Non-coding Sequences (CNS) and transcription factor binding sites (TFBS) in seven pairs of LW opsin paralogs. Our analyses suggest opsin gene duplication and residues possibly associated with spectral tuning of LW-sensitive photoreceptors. We explore two mechanisms - positive selection and differential expression mediated by regulatory units in CNS - that may have contributed to the retention of LW opsin genes in Culicinae and Anophelinae. We discuss the evolution of mosquito LW opsins in the context of major Earth events and possible adaptation of mosquitoes to LW-dominated photo environments, and implications for mosquito control strategies based on disrupting vision-mediated behaviors.
Conservation of σ28-Dependent Non-Coding RNA Paralogs and Predicted σ54-Dependent Targets in Thermophilic Campylobacter Species

PubMed Central

Le, My Thanh; van Veldhuizen, Mart; Porcelli, Ida; Bongaerts, Roy J.; Gaskin, Duncan J. H.; Pearson, Bruce M.; van Vliet, Arnoud H. M.

2015-01-01

Assembly of flagella requires strict hierarchical and temporal control via flagellar sigma and anti-sigma factors, regulatory proteins and the assembly complex itself, but to date non-coding RNAs (ncRNAs) have not been described to regulate genes directly involved in flagellar assembly. In this study we have investigated the possible role of two ncRNA paralogs (CjNC1, CjNC4) in flagellar assembly and gene regulation of the diarrhoeal pathogen Campylobacter jejuni. CjNC1 and CjNC4 are 37/44 nt identical and predicted to target the 5' untranslated region (5' UTR) of genes transcribed from the flagellar sigma factor σ54. Orthologs of the σ54-dependent 5' UTRs and ncRNAs are present in the genomes of other thermophilic Campylobacter species, and transcription of CjNC1 and CjNC4 is dependent on the flagellar sigma factor σ28. Surprisingly, inactivation and overexpression of CjNC1 and CjNC4 did not affect growth, motility or flagella-associated phenotypes such as autoagglutination. However, CjNC1 and CjNC4 were able to mediate sequence-dependent, but Hfq-independent, partial repression of fluorescence of predicted target 5' UTRs in an Escherichia coli-based GFP reporter gene system. This hints towards a subtle role for the CjNC1 and CjNC4 ncRNAs in post-transcriptional gene regulation in thermophilic Campylobacter species, and suggests that the currently used phenotypic methodologies are insufficiently sensitive to detect such subtle phenotypes. The lack of a role of Hfq in the E. coli GFP-based system indicates that the CjNC1 and CjNC4 ncRNAs may mediate post-transcriptional gene regulation in ways that do not conform to the paradigms obtained from the Enterobacteriaceae. PMID:26512728
miPrimer: an empirical-based qPCR primer design method for small noncoding microRNA

PubMed Central

Kang, Shih-Ting; Hsieh, Yi-Shan; Feng, Chi-Ting; Chen, Yu-Ting; Yang, Pok Eric; Chen, Wei-Ming

2018-01-01

MicroRNAs (miRNAs) are 18–25 nucleotides (nt) of highly conserved, noncoding RNAs involved in gene regulation. Because of miRNAs’ short length, the design of miRNA primers for PCR amplification remains a significant challenge. Adding to the challenge are miRNAs similar in sequence and miRNA family members that often only differ in sequences by 1 nt. Here, we describe a novel empirical-based method, miPrimer, which greatly reduces primer dimerization and increases primer specificity by factoring various intrinsic primer properties and employing four primer design strategies. The resulting primer pairs displayed an acceptable qPCR efficiency of between 90% and 110%. When tested on miRNA families, miPrimer-designed primers are capable of discriminating among members of miRNA families, as validated by qPCR assays using Quark Biosciences’ platform. Of the 120 miRNA primer pairs tested, 95.6% and 93.3% were successful in amplifying specifically non-family and family miRNA members, respectively, after only one design trial. In summary, miPrimer provides a cost-effective and valuable tool for designing miRNA primers. PMID:29208706
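The 90% to 110% window cited in this abstract refers to qPCR amplification efficiency. A minimal sketch of the standard-curve calculation follows, using the widely used relation E = 10^(-1/slope) - 1, where the slope comes from regressing Cq on log10 of template quantity. The dilution series, the Cq values and the function names are hypothetical and are not taken from the miPrimer paper, whose own efficiency procedure is not described in this record.

```python
def qpcr_efficiency(log10_quantities, cq_values):
    """Return (slope, efficiency in percent) from a standard curve.

    The slope is the ordinary least-squares slope of Cq against log10 quantity.
    """
    n = len(log10_quantities)
    mean_x = sum(log10_quantities) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_quantities)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(log10_quantities, cq_values)
    )
    slope = sxy / sxx
    efficiency_percent = (10 ** (-1.0 / slope) - 1.0) * 100.0
    return slope, efficiency_percent


def within_acceptable_range(efficiency_percent, low=90.0, high=110.0):
    """Check the efficiency against the window quoted in the abstract."""
    return low <= efficiency_percent <= high


# Hypothetical 10-fold dilution series (log10 copies) and measured Cq values.
log10_q = [5.0, 4.0, 3.0, 2.0, 1.0]
cq = [15.0, 18.32, 21.64, 24.96, 28.28]
slope, efficiency = qpcr_efficiency(log10_q, cq)
acceptable = within_acceptable_range(efficiency)
```

A slope near -3.32 corresponds to a doubling of product per cycle, that is, an efficiency close to 100%. Slopes that are shallower or steeper push the estimate above or below the acceptable window.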
Upregulation of Haploinsufficient Gene Expression in the Brain by Targeting a Long Non-coding RNA Improves Seizure Phenotype in a Model of Dravet Syndrome.

PubMed

Hsiao, J; Yuan, T Y; Tsai, M S; Lu, C Y; Lin, Y C; Lee, M L; Lin, S W; Chang, F C; Liu Pimentel, H; Olive, C; Coito, C; Shen, G; Young, M; Thorne, T; Lawrence, M; Magistri, M; Faghihi, M A; Khorkova, O; Wahlestedt, C

2016-07-01

Dravet syndrome is a devastating genetic brain disorder caused by heterozygous loss-of-function mutation in the voltage-gated sodium channel gene SCN1A. There are currently no treatments, but the upregulation of the healthy SCN1A allele represents an appealing therapeutic strategy. In this study we identified a novel, evolutionarily conserved mechanism controlling the expression of SCN1A that is mediated by an antisense non-coding RNA (SCN1ANAT). Using oligonucleotide-based compounds (AntagoNATs) targeting SCN1ANAT we were able to induce specific upregulation of SCN1A both in vitro and in vivo, in the brain of a Dravet knock-in mouse model and a non-human primate. AntagoNAT-mediated upregulation of Scn1a in postnatal Dravet mice led to significant improvements in seizure phenotype and excitability of hippocampal interneurons. These results further elucidate the pathophysiology of Dravet syndrome and outline a possible new approach for the treatment of this and other genetic disorders with similar etiology. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
A long non-coding RNA, LncMyoD, regulates skeletal muscle differentiation by blocking IMP2-mediated mRNA translation.

PubMed

Gong, Chenguang; Li, Zhizhong; Ramanujan, Krishnan; Clay, Ieuan; Zhang, Yunyu; Lemire-Brachat, Sophie; Glass, David J

2015-07-27

Increasing evidence suggests that long non-coding RNAs (LncRNAs) represent a new class of regulators of stem cells. However, the roles of LncRNAs in stem cell maintenance and myogenesis remain largely unexamined. For this study, hundreds of intergenic LncRNAs were identified that are expressed in myoblasts and regulated during differentiation. One of these LncRNAs, termed LncMyoD, is encoded next to the Myod gene and is directly activated by MyoD during myoblast differentiation. Knockdown of LncMyoD strongly inhibits terminal muscle differentiation, largely due to a failure to exit the cell cycle. LncMyoD directly binds to IGF2-mRNA-binding protein 2 (IMP2) and negatively regulates IMP2-mediated translation of proliferation genes such as N-Ras and c-Myc. While the RNA sequence of LncMyoD is not well conserved between human and mouse, its locus, gene structure, and function are preserved. The MyoD-LncMyoD-IMP2 pathway elucidates a mechanism as to how MyoD blocks proliferation to create a permissive state for differentiation. Copyright © 2015 Elsevier Inc. All rights reserved.
A Positive Regulatory Loop between a Wnt-Regulated Non-coding RNA and ASCL2 Controls Intestinal Stem Cell Fate.

PubMed

Giakountis, Antonis; Moulos, Panagiotis; Zarkou, Vasiliki; Oikonomou, Christina; Harokopos, Vaggelis; Hatzigeorgiou, Artemis G; Reczko, Martin; Hatzis, Pantelis

2016-06-21

The canonical Wnt pathway plays a central role in stem cell maintenance, differentiation, and proliferation in the intestinal epithelium. Constitutive, aberrant activity of the TCF4/β-catenin transcriptional complex is the primary transforming factor in colorectal cancer. We identify a nuclear long non-coding RNA, termed WiNTRLINC1, as a direct target of TCF4/β-catenin in colorectal cancer cells. WiNTRLINC1 positively regulates the expression of its genomic neighbor ASCL2, a transcription factor that controls intestinal stem cell fate. WiNTRLINC1 interacts with TCF4/β-catenin to mediate the juxtaposition of its promoter with the regulatory regions of ASCL2. ASCL2, in turn, regulates WiNTRLINC1 transcriptionally, closing a feedforward regulatory loop that controls stem cell-related gene expression. This regulatory circuitry is highly amplified in colorectal cancer and correlates with increased metastatic potential and decreased patient survival. Our results uncover the interplay between non-coding RNA-mediated regulation and Wnt signaling and point to the diagnostic and therapeutic potential of WiNTRLINC1. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.
Prosurvival long noncoding RNA PINCR regulates a subset of p53 targets in human colorectal cancer cells by binding to Matrin 3

PubMed Central

Chaudhary, Ritu; Gryder, Berkley; Woods, Wendy S; Subramanian, Murugan; Jones, Matthew F; Li, Xiao Ling; Jenkins, Lisa M; Shabalina, Svetlana A; Mo, Min; Dasso, Mary; Yang, Yuan; Wakefield, Lalage M; Zhu, Yuelin; Frier, Susan M; Moriarity, Branden S; Prasanth, Kannanganattu V; Perez-Pinera, Pablo; Lal, Ashish

2017-01-01

Thousands of long noncoding RNAs (lncRNAs) have been discovered, yet the function of the vast majority remains unclear. Here, we show that a p53-regulated lncRNA which we named PINCR (p53-induced noncoding RNA), is induced ~100-fold after DNA damage and exerts a prosurvival function in human colorectal cancer cells (CRC) in vitro and tumor growth in vivo. Targeted deletion of PINCR in CRC cells significantly impaired G1 arrest and induced hypersensitivity to chemotherapeutic drugs. PINCR regulates the induction of a subset of p53 targets involved in G1 arrest and apoptosis, including BTG2, RRM2B and GPX1. Using a novel RNA pulldown approach that utilized endogenous S1-tagged PINCR, we show that PINCR associates with the enhancer region of these genes by binding to RNA-binding protein Matrin 3 that, in turn, associates with p53. Our findings uncover a critical prosurvival function of a p53/PINCR/Matrin 3 axis in response to DNA damage in CRC cells. DOI: http://dx.doi.org/10.7554/eLife.23244.001 PMID:28580901
Transcriptome analysis demonstrates that long noncoding RNA is involved in the hypoxic response in Larimichthys crocea.

PubMed

Liu, Wei; Liu, Xiaoxu; Wu, Changwen; Jiang, Lihua

2018-06-15

The large yellow croaker (Larimichthys crocea) has low hypoxia tolerance compared with other fish species, and the mRNA levels of hypoxia-inducible factor (HIF)-1α in its brain do not change markedly under hypoxic conditions. In this study, we investigated noncoding transcription in the hypoxic response mechanism of L. crocea. We generated a catalog of long noncoding RNAs (lncRNAs) from the brain of L. crocea individuals under hypoxic stress, investigated lncRNA expression patterns, and analyzed the HIF signaling pathway by RNA sequencing. Prolyl hydroxylase domain 2 (PHD2) expression significantly increased after 6 and 12 h of hypoxia, and a lncRNA (Linc_06633.1) was found in the upstream, antisense region of PHD2. Linc_06633.1 may be an important regulator that promotes PHD2 expression under hypoxia in L. crocea, and we constructed a regulatory profile of L. crocea under hypoxic conditions. To the best of our knowledge, this is the first study conducted on hypoxia signaling pathway regulation by lncRNAs in L. crocea, and it elucidates the role played by lncRNAs in the regulation of the hypoxia stress response in teleost fish.
Increasing the Yield in Targeted Next-Generation Sequencing by Implicating CNV Analysis, Non-Coding Exons and the Overall Variant Load: The Example of Retinal Dystrophies

PubMed Central

Eisenberger, Tobias; Neuhaus, Christine; Khan, Arif O.; Decker, Christian; Preising, Markus N.; Friedburg, Christoph; Bieg, Anika; Gliem, Martin; Issa, Peter Charbel; Holz, Frank G.; Baig, Shahid M.; Hellenbroich, Yorck; Galvez, Alberto; Platzer, Konrad; Wollnik, Bernd; Laddach, Nadja; Ghaffari, Saeed Reza; Rafati, Maryam; Botzenhart, Elke; Tinschert, Sigrid; Börger, Doris; Bohring, Axel; Schreml, Julia; Körtge-Jung, Stefani; Schell-Apacik, Chayim; Bakur, Khadijah; Al-Aama, Jumana Y.; Neuhann, Teresa; Herkenrath, Peter; Nürnberg, Gudrun; Nürnberg, Peter; Davis, John S.; Gal, Andreas; Bergmann, Carsten; Lorenz, Birgit; Bolz, Hanno J.

2013-01-01

Retinitis pigmentosa (RP) and Leber congenital amaurosis (LCA) are major causes of blindness. They result from mutations in many genes which has long hampered comprehensive genetic analysis. Recently, targeted next-generation sequencing (NGS) has proven useful to overcome this limitation. To uncover “hidden mutations” such as copy number variations (CNVs) and mutations in non-coding regions, we extended the use of NGS data by quantitative readout for the exons of 55 RP and LCA genes in 126 patients, and by including non-coding 5′ exons. We detected several causative CNVs which were key to the diagnosis in hitherto unsolved constellations, e.g. hemizygous point mutations in consanguineous families, and CNVs complemented apparently monoallelic recessive alleles. Mutations of non-coding exon 1 of EYS revealed its contribution to disease. In view of the high carrier frequency for retinal disease gene mutations in the general population, we considered the overall variant load in each patient to assess if a mutation was causative or reflected accidental carriership in patients with mutations in several genes or with single recessive alleles. For example, truncating mutations in RP1, a gene implicated in both recessive and dominant RP, were causative in biallelic constellations, unrelated to disease when heterozygous on a biallelic mutation background of another gene, or even non-pathogenic if close to the C-terminus. Patients with mutations in several loci were common, but without evidence for di- or oligogenic inheritance. Although the number of targeted genes was low compared to previous studies, the mutation detection rate was highest (70%) which likely results from completeness and depth of coverage, and quantitative data analysis. CNV analysis should routinely be applied in targeted NGS, and mutations in non-coding exons give reason to systematically include 5′-UTRs in disease gene or exome panels. Consideration of all variants is indispensable because even truncating mutations may be misleading. PMID:24265693
An expanding universe of noncoding RNAs between the poles of basic science and clinical investigations.

PubMed

Weil, Patrick P; Hensel, Kai O; Weber, David; Postberg, Jan

2016-03-01

The Keystone Symposium 'MicroRNAs and Noncoding RNAs in Cancer', Keystone, CO, USA, 7-12 June 2015. Since the discovery of RNAi, great efforts have been undertaken to unleash the potential biomedical applicability of small noncoding RNAs, mainly miRNAs, involving their use as biomarkers for personalized diagnostics or their usability as active agents or therapy targets. The research focus on the noncoding RNA world is now slowly moving from a phase of basic discoveries into a new phase, where every single molecule out of many hundreds of cataloged noncoding RNAs is dissected in order to investigate its biomedical relevance. In addition, previously neglected RNA classes, such as long noncoding RNAs or circular RNAs, are attracting more attention. Numerous timely results and hypotheses were presented at the 2015 Keystone Symposium 'MicroRNAs and Noncoding RNAs in Cancer'.

Nuclear factor 90 uses an ADAR2-like binding mode to recognize specific bases in dsRNA.

PubMed

Jayachandran, Uma; Grey, Heather; Cook, Atlanta G

2016-02-29

Nuclear factors 90 and 45 (NF90 and NF45) form a protein complex involved in the post-transcriptional control of many genes in vertebrates. NF90 is a member of the dsRNA binding domain (dsRBD) family of proteins. RNA binding partners identified so far include elements in 3' untranslated regions of specific mRNAs and several non-coding RNAs. In NF90, a tandem pair of dsRBDs separated by a natively unstructured segment confers dsRNA binding activity. We determined a crystal structure of the tandem dsRBDs of NF90 in complex with a synthetic dsRNA. This complex shows surprising similarity to the tandem dsRBDs from an adenosine-to-inosine editing enzyme, ADAR2 in complex with a substrate RNA. Residues involved in unusual base-specific recognition in the minor groove of dsRNA are conserved between NF90 and ADAR2. These data suggest that, like ADAR2, underlying sequences in dsRNA may influence how NF90 recognizes its target RNAs. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Spliced DNA Sequences in the Paramecium Germline: Their Properties and Evolutionary Potential

PubMed Central

Catania, Francesco; McGrath, Casey L.; Doak, Thomas G.; Lynch, Michael

2013-01-01

Despite playing a crucial role in germline-soma differentiation, the evolutionary significance of developmentally regulated genome rearrangements (DRGRs) has received scant attention. An example of DRGR is DNA splicing, a process that removes segments of DNA interrupting genic and/or intergenic sequences. Perhaps, best known for shaping immune-system genes in vertebrates, DNA splicing plays a central role in the life of ciliated protozoa, where thousands of germline DNA segments are eliminated after sexual reproduction to regenerate a functional somatic genome. Here, we identify and chronicle the properties of 5,286 sequences that putatively undergo DNA splicing (i.e., internal eliminated sequences [IESs]) across the genomes of three closely related species of the ciliate Paramecium (P. tetraurelia, P. biaurelia, and P. sexaurelia). The study reveals that these putative IESs share several physical characteristics. Although our results are consistent with excision events being largely conserved between species, episodes of differential IES retention/excision occur, may have a recent origin, and frequently involve coding regions. Our findings indicate interconversion between somatic—often coding—DNA sequences and noncoding IESs, and provide insights into the role of DNA splicing in creating potentially functional genetic innovation. PMID:23737328
Alu-mediated deletion of SOX10 regulatory elements in Waardenburg syndrome type 4

PubMed Central

Bondurand, Nadége; Fouquet, Virginie; Baral, Viviane; Lecerf, Laure; Loundon, Natalie; Goossens, Michel; Duriez, Benedicte; Labrune, Philippe; Pingault, Veronique

2012-01-01

Waardenburg syndrome type 4 (WS4) is a rare neural crest disorder defined by the combination of Waardenburg syndrome (sensorineural hearing loss and pigmentation defects) and Hirschsprung disease (intestinal aganglionosis). Three genes are known to be involved in this syndrome, that is, EDN3 (endothelin-3), EDNRB (endothelin receptor type B), and SOX10. However, 15–35% of WS4 remains unexplained at the molecular level, suggesting that other genes could be involved and/or that mutations within known genes may have escaped previous screenings. Here, we searched for deletions within recently identified SOX10 regulatory sequences and describe the first characterization of a WS4 patient presenting with a large deletion encompassing three of these enhancers. Analysis of the breakpoint region suggests a complex rearrangement involving three Alu sequences that could be mediated by a FosTes/MMBIR replication mechanism. Taken together with recent reports, our results demonstrate that the disruption of highly conserved non-coding elements located within or at a long distance from the coding sequences of key genes can result in several neurocristopathies. This opens up new routes to the molecular dissection of neural crest disorders. PMID:22378281
DLEU2 encodes an antisense RNA for the putative bicistronic RFP2/LEU5 gene in humans and mouse.

PubMed

Corcoran, Martin M; Hammarsund, Marianne; Zhu, Chaoyong; Lerner, Mikael; Kapanadze, Bagrat; Wilson, Bill; Larsson, Catharina; Forsberg, Lars; Ibbotson, Rachel E; Einhorn, Stefan; Oscier, David G; Grandér, Dan; Sangfelt, Olle

2004-08-01

Our group previously identified two novel genes, RFP2/LEU5 and DLEU2, within a 13q14.3 genomic region of loss seen in various malignancies. However, no specific inactivating mutations were found in these or other genes in the vicinity of the deletion, suggesting that a nonclassical tumor-suppressor mechanism may be involved. Here, we present data showing that the DLEU2 gene encodes a putative noncoding antisense RNA, with one exon directly overlapping the first exon of the RFP2/LEU5 gene in the opposite orientation. In addition, the RFP2/LEU5 transcript can be alternatively spliced to produce either several monocistronic transcripts or a putative bicistronic transcript encoding two separate open-reading frames, adding to the complexity of the locus. The finding that these gene structures are conserved in the mouse, including the putative bicistronic RFP2/LEU5 transcript as well as the antisense relationship with DLEU2, further underlines the significance of this unusual organization and suggests a biological function for DLEU2 in the regulation of RFP2/LEU5. Copyright 2004 Wiley-Liss, Inc.
Facts and updates about cardiovascular non-coding RNAs in heart failure.

PubMed

Thum, Thomas

2015-09-01

About 11% of all deaths include heart failure as a contributing cause. The annual cost of heart failure amounts to US $34,000,000,000 in the United States alone. With the exception of heart transplantation, there is no curative therapy available. Only occasionally does a new area of science develop into a completely new research field. The study of non-coding RNAs, including microRNAs, long non-coding RNAs, and circular RNAs, is such a field. In this short review, we discuss the latest developments concerning non-coding RNAs in cardiovascular disease. MicroRNAs are short, endogenous, regulatory non-coding RNA species that are involved in virtually all cellular processes. Long non-coding RNAs also regulate gene and protein levels, but by much more complicated and diverse mechanisms. In general, non-coding RNAs have been shown to be of great value as therapeutic targets in adverse cardiac remodelling and also as diagnostic and prognostic biomarkers for heart failure. In the future, non-coding RNA-based therapeutics are likely to enter clinical practice, offering a new treatment approach for heart failure.
Noncoding RNAs in human intervertebral disc degeneration: An integrated microarray study.

PubMed

Liu, Xu; Che, Lu; Xie, Yan-Ke; Hu, Qing-Jie; Ma, Chi-Jiao; Pei, Yan-Jun; Wu, Zhi-Gang; Liu, Zhi-Heng; Fan, Li-Ying; Wang, Hai-Qiang

2015-09-01

Accumulating evidence indicates that noncoding RNAs play important roles in a multitude of biological processes. The striking findings of miRNAs (microRNAs) and lncRNAs (long noncoding RNAs) as members of noncoding RNAs open up an exciting era in the studies of gene regulation. More recently, reports of circRNAs (circular RNAs) have added fuel to noncoding RNA research. Human intervertebral disc degeneration (IDD), a disabling spinal disease, is a main cause of low back pain. We have addressed the expression profiles of miRNAs, lncRNAs and mRNAs in IDD (Wang et al., J Pathology, 2011 and Wan et al., Arthritis Res Ther, 2014). Furthermore, we thoroughly analysed noncoding RNAs, including miRNAs, lncRNAs and circRNAs, in IDD using the very same samples. Here we delineate in detail the contents of the aforementioned microarray analyses. Microarray and sample annotation data were deposited in GEO under accession number GSE67567 as a SuperSeries. The integrated analyses of these noncoding RNAs will shed new light on the coding-noncoding regulatory machinery.
Cross-species inference of long non-coding RNAs greatly expands the ruminant transcriptome.

PubMed

Bush, Stephen J; Muriuki, Charity; McCulloch, Mary E B; Farquhar, Iseabail L; Clark, Emily L; Hume, David A

2018-04-24

mRNA-like long non-coding RNAs (lncRNAs) are a significant component of mammalian transcriptomes, although most are expressed only at low levels, with high tissue-specificity and/or at specific developmental stages. Thus, in many cases lncRNA detection by RNA-sequencing (RNA-seq) is compromised by stochastic sampling. To account for this and create a catalogue of ruminant lncRNAs, we compared de novo assembled lncRNAs derived from large RNA-seq datasets in transcriptional atlas projects for sheep and goats with previous lncRNAs assembled in cattle and human. We then combined the novel lncRNAs with the sheep transcriptional atlas to identify co-regulated sets of protein-coding and non-coding loci. Few lncRNAs could be reproducibly assembled from a single dataset, even with deep sequencing of the same tissues from multiple animals. Furthermore, there was little sequence overlap between lncRNAs that were assembled from pooled RNA-seq data. We combined positional conservation (synteny) with cross-species mapping of candidate lncRNAs to identify a consensus set of ruminant lncRNAs and then used the RNA-seq data to demonstrate detectable and reproducible expression in each species. In sheep, 20 to 30% of lncRNAs were located close to protein-coding genes with which they are strongly co-expressed, which is consistent with the evolutionary origin of some ncRNAs in enhancer sequences. Nevertheless, most of the lncRNAs are not co-expressed with neighbouring protein-coding genes. Alongside substantially expanding the ruminant lncRNA repertoire, the outcomes of our analysis demonstrate that stochastic sampling can be partly overcome by combining RNA-seq datasets from related species. This has practical implications for the future discovery of lncRNAs in other species.
A-to-I RNA editing promotes developmental stage–specific gene and lncRNA expression

PubMed Central

Goldstein, Boaz; Agranat-Tamir, Lily; Light, Dean; Ben-Naim Zgayer, Orna; Fishman, Alla; Lamm, Ayelet T.

2017-01-01

A-to-I RNA editing is a conserved widespread phenomenon in which adenosine (A) is converted to inosine (I) by adenosine deaminases (ADARs) in double-stranded RNA regions, mainly noncoding. Mutations in ADAR enzymes in Caenorhabditis elegans cause defects in normal development but are not lethal as in human and mouse. Previous studies in C. elegans indicated competition between RNA interference (RNAi) and RNA editing mechanisms, based on the observation that worms that lack both mechanisms do not exhibit defects, in contrast to the developmental defects observed when only RNA editing is absent. To study the effects of RNA editing on gene expression and function, we established a novel screen that enabled us to identify thousands of RNA editing sites in nonrepetitive regions in the genome. These include dozens of genes that are edited at their 3′ UTR region. We found that these genes are mainly germline and neuronal genes, and that they are down-regulated in the absence of ADAR enzymes. Moreover, we discovered that almost half of these genes are edited in a developmental-specific manner, indicating that RNA editing is a highly regulated process. We found that many pseudogenes and other lncRNAs are also extensively down-regulated in the absence of ADARs in the embryo but not in the fourth larval (L4) stage. This down-regulation is not observed upon additional knockout of RNAi. Furthermore, levels of siRNAs aligned to pseudogenes in ADAR mutants are enhanced. Taken together, our results suggest a role for RNA editing in normal growth and development by regulating silencing via RNAi. PMID:28031250
A Noncoding Expansion in EIF4A3 Causes Richieri-Costa-Pereira Syndrome, a Craniofacial Disorder Associated with Limb Defects

PubMed Central

Favaro, Francine P.; Alvizi, Lucas; Zechi-Ceide, Roseli M.; Bertola, Debora; Felix, Temis M.; de Souza, Josiane; Raskin, Salmo; Twigg, Stephen R.F.; Weiner, Andrea M.J.; Armas, Pablo; Margarit, Ezequiel; Calcaterra, Nora B.; Andersen, Gregers R.; McGowan, Simon J.; Wilkie, Andrew O.M.; Richieri-Costa, Antonio; de Almeida, Maria L.G.; Passos-Bueno, Maria Rita

2014-01-01

Richieri-Costa-Pereira syndrome is an autosomal-recessive acrofacial dysostosis characterized by mandibular median cleft associated with other craniofacial anomalies and severe limb defects. Learning and language disabilities are also prevalent. We mapped the mutated gene to a 122 kb region at 17q25.3 through identity-by-descent analysis in 17 genealogies. Sequencing strategies identified an expansion of a region with several repeats of 18- or 20-nucleotide motifs in the 5′ untranslated region (5′ UTR) of EIF4A3, which contained from 14 to 16 repeats in the affected individuals and from 3 to 12 repeats in 520 healthy individuals. A missense substitution of a highly conserved residue likely to affect the interaction of eIF4AIII with the UPF3B subunit of the exon junction complex in trans with an expanded allele was found in an unrelated individual with an atypical presentation, thus expanding mutational mechanisms and phenotypic diversity of RCPS. EIF4A3 transcript abundance was reduced in both white blood cells and mesenchymal cells of RCPS-affected individuals as compared to controls. Notably, targeting the orthologous eif4a3 in zebrafish led to underdevelopment of several craniofacial cartilage and bone structures, in agreement with the craniofacial alterations seen in RCPS. Our data thus suggest that RCPS is caused by mutations in EIF4A3 and show that EIF4A3, a gene involved in RNA metabolism, plays a role in mandible, laryngeal, and limb morphogenesis. PMID:24360810
Deciphering the Regulatory Logic of an Ancient, Ultraconserved Nuclear Receptor Enhancer Module

PubMed Central

Bagamasbad, Pia D.; Bonett, Ronald M.; Sachs, Laurent; Buisine, Nicolas; Raj, Samhitha; Knoedler, Joseph R.; Kyono, Yasuhiro; Ruan, Yijun; Ruan, Xiaoan

2015-01-01

Cooperative, synergistic gene regulation by nuclear hormone receptors can increase sensitivity and amplify cellular responses to hormones. We investigated thyroid hormone (TH) and glucocorticoid (GC) synergy on the Krüppel-like factor 9 (Klf9) gene, which codes for a zinc finger transcription factor involved in development and homeostasis of diverse tissues. We identified regions of the Xenopus and mouse Klf9 genes 5–6 kb upstream of the transcription start sites that supported synergistic transactivation by TH plus GC. Within these regions, we found an orthologous sequence of approximately 180 bp that is highly conserved among tetrapods, but absent in other chordates, and possesses chromatin marks characteristic of an enhancer element. The Xenopus and mouse approximately 180-bp DNA element conferred synergistic transactivation by hormones in transient transfection assays, so we designate this the Klf9 synergy module (KSM). We identified binding sites within the mouse KSM for TH receptor, GC receptor, and nuclear factor κB. TH strongly increased recruitment of liganded GC receptor and serine 5 phosphorylated (initiating) RNA polymerase II to chromatin at the KSM, suggesting a mechanism for transcriptional synergy. The KSM is transcribed to generate long noncoding RNAs, which are also synergistically induced by combined hormone treatment, and the KSM interacts with the Klf9 promoter and a far upstream region through chromosomal looping. Our findings support the conclusions that the KSM plays a central role in hormone regulation of vertebrate Klf9 genes, that it evolved in the tetrapod lineage, and that it has been maintained by strong stabilizing selection. PMID:25866873
Faster-X evolution of gene expression is driven by recessive adaptive cis-regulatory variation in Drosophila.

PubMed

Llopart, Ana

2018-05-01

The hemizygosity of the X (Z) chromosome fully exposes the fitness effects of mutations on that chromosome and has evolutionary consequences on the relative rates of evolution of X and autosomes. Specifically, several population genetics models predict increased rates of evolution in X-linked loci relative to autosomal loci. This prediction of faster-X evolution has been evaluated and confirmed for both protein coding sequences and gene expression. In the case of faster-X evolution for gene expression divergence, it is often assumed that variation in 5' noncoding sequences is associated with variation in transcript abundance between species but a formal, genomewide test of this hypothesis is still missing. Here, I use whole genome sequence data in Drosophila yakuba and D. santomea to evaluate this hypothesis and report positive correlations between sequence divergence at 5' noncoding sequences and gene expression divergence. I also examine polymorphism and divergence in 9,279 noncoding sequences located at the 5' end of annotated genes and detected multiple signals of positive selection. Notably, I used the traditional synonymous sites as neutral reference to test for adaptive evolution, but I also used bases 8-30 of introns <65 bp, which have been proposed to be a better neutral choice. X-linked genes with high degree of male-biased expression show the most extreme adaptive pattern at 5' noncoding regions, in agreement with faster-X evolution for gene expression divergence and a higher incidence of positively selected recessive mutations. © 2018 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.
Functional analysis of the TRIB1 associated locus linked to plasma triglycerides and coronary artery disease.

PubMed

Douvris, Adrianna; Soubeyrand, Sébastien; Naing, Thet; Martinuk, Amy; Nikpay, Majid; Williams, Andrew; Buick, Julie; Yauk, Carole; McPherson, Ruth

2014-06-03

The TRIB1 locus has been linked to hepatic triglyceride metabolism in mice and to plasma triglycerides and coronary artery disease in humans. The lipid-associated single nucleotide polymorphisms (SNPs), identified by genome-wide association studies, are located ≈30 kb downstream from TRIB1, suggesting complex regulatory effects on genes or pathways relevant to hepatic triglyceride metabolism. The goal of this study was to investigate the functional relationship between common SNPs at the TRIB1 locus and plasma lipid traits. Characterization of the risk locus reveals that it encompasses a gene, TRIB1-associated locus (TRIBAL), composed of a well-conserved promoter region and an alternatively spliced transcript. Bioinformatic analysis and resequencing identified a single SNP, rs2001844, within the promoter region that associates with increased plasma triglycerides and reduced high-density lipoprotein cholesterol and coronary artery disease risk. Further, correction for triglycerides as a covariate indicated that the association reported by genome-wide association studies is largely dependent on triglycerides. In addition, we show that rs2001844 is an expression quantitative trait locus (eQTL) for TRIB1 expression in blood and alters TRIBAL promoter activity in a reporter assay model. The TRIBAL transcript has features typical of long noncoding RNAs, including poor sequence conservation. Modulation of TRIBAL expression had limited impact on the mRNA levels of either TRIB1 or lipid regulatory genes in human hepatocyte models. In contrast, TRIB1 knockdown markedly increased TRIBAL expression in HepG2 cells and primary human hepatocytes. These studies demonstrate an interplay between a novel locus, TRIBAL, and TRIB1. TRIBAL is located in the risk locus identified by genome-wide association studies, responds to altered expression of TRIB1, harbors a risk SNP that is an eQTL for TRIB1 expression, and associates with plasma triglyceride concentrations. © 2014 The Authors. Published on behalf of the American Heart Association, Inc., by Wiley Blackwell.
Function and Evolution of DNA Methylation in Nasonia vitripennis

PubMed Central

Wang, Xu; Wheeler, David; Avery, Amanda; Rago, Alfredo; Choi, Jeong-Hyeon; Colbourne, John K.; Clark, Andrew G.; Werren, John H.

2013-01-01

The parasitoid wasp Nasonia vitripennis is an emerging genetic model for functional analysis of DNA methylation. Here, we characterize genome-wide methylation at a base-pair resolution, and compare these results to gene expression across five developmental stages and to methylation patterns reported in other insects. An accurate assessment of DNA methylation across the genome is accomplished using bisulfite sequencing of adult females from a highly inbred line. One-third of genes show extensive methylation over the gene body, yet methylated DNA is not found in non-coding regions and rarely in transposons. Methylated genes occur in small clusters across the genome. Methylation demarcates exon-intron boundaries, with elevated levels over exons, primarily in the 5′ regions of genes. It is also elevated near the sites of translational initiation and termination, with reduced levels in 5′ and 3′ UTRs. Methylated genes have higher median expression levels and lower expression variation across development stages than non-methylated genes. There is no difference in frequency of differential splicing between methylated and non-methylated genes, and as yet no established role for methylation in regulating alternative splicing in Nasonia. Phylogenetic comparisons indicate that many genes maintain methylation status across long evolutionary time scales. Nasonia methylated genes are more likely to be conserved in insects, but even those that are not conserved show broader expression across development than comparable non-methylated genes. Finally, examination of duplicated genes shows that those paralogs that have lost methylation in the Nasonia lineage following gene duplication evolve more rapidly, show decreased median expression levels, and increased specialization in expression across development. Methylation of Nasonia genes signals constitutive transcription across developmental stages, whereas non-methylated genes show more dynamic developmental expression patterns. We speculate that loss of methylation may result in increased developmental specialization in evolution and acquisition of methylation may lead to broader constitutive expression. PMID:24130511
Molecular variation in AVP and AVPR1a in New World monkeys (Primates, Platyrrhini): evolution and implications for social monogamy.

PubMed

Ren, Dongren; Chin, Kelvin R; French, Jeffrey A

2014-01-01

The neurohypophysial hormone arginine vasopressin (AVP) plays important roles in fluid regulation and vascular resistance. Differences in AVP receptor expression, particularly mediated through variation in the noncoding promoter region of the primary receptor for AVP (AVPR1a), may play a role in social phenotypes, particularly social monogamy, in rodents and humans. Among primates, social monogamy is rare, but is common among New World monkeys (NWM). AVP is a nonapeptide and generally conserved among eutherian mammals, although a recent paper demonstrated that some NWM species possess a novel form of the related neuropeptide hormone, oxytocin. We therefore characterized variation in the AVP and AVPR1a genes in 22 species representing every genus in the three major platyrrhine families (Cebidae, Atelidae and Pitheciidae). For AVP, a total of 16 synonymous substitutions were detected in 15 NWM species. No non-synonymous substitutions were noted; hence, AVP is conserved in NWM. By contrast, relative to the human AVPR1a, 66 predicted amino acid (AA) substitutions were identified in NWM. The AVPR1a N-terminus (ligand binding domain), third intracellular (G-protein binding domain), and C-terminus were variable among species. Complex evolution of AVPR1a is also apparent in NWM. A molecular phylogenetic tree inferred from AVPR1a coding sequences revealed some consensus taxonomic separation by families, but also a mixed group composed of genera from all three families. The overall dN/dS ratio of AVPR1a was 0.11, but signals of positive selection in distinct AVPR1a regions were observed, including the N-terminus, in which we identified six potential positive selection sites. AA substitutions at positions 241, 319, 399 and 409 occurred uniquely in marmosets and tamarins. Our results enhance the appreciation of genetic diversity in the mammalian AVP/AVPR1a system, and set the stage for molecular modeling of the neurohypophyseal hormones and social behavior in primates.
Breast cancer risk-associated SNPs modulate the affinity of chromatin for FOXA1 and alter gene expression

PubMed Central

Cowper-Sal·lari, Richard; Zhang, Xiaoyang; Wright, Jason B.; Bailey, Swneke D.; Cole, Michael D.; Eeckhoute, Jerome; Moore, Jason H.; Lupien, Mathieu

2012-01-01

Genome-wide association studies (GWASs) have identified thousands of single nucleotide polymorphisms (SNPs) associated with human traits and diseases. However, because the vast majority of these SNPs are located in the noncoding regions of the genome, their risk-promoting mechanisms are elusive. Employing a new methodology combining cistromics, epigenomics and genotype imputation, we annotate the noncoding regions of the genome in breast cancer cells and systematically identify the functional nature of SNPs associated with breast cancer risk. Our results demonstrate that breast cancer risk-associated SNPs are enriched in the cistromes of FOXA1 and ESR1 and the epigenome of H3K4me1 in a cancer and cell-type-specific manner. Furthermore, the majority of these risk-associated SNPs modulate the affinity of chromatin for FOXA1 at distal regulatory elements, which results in allele-specific gene expression, exemplified by the effect of the rs4784227 SNP on the TOX3 gene found within the 16q12.1 risk locus. PMID:23001124
Generation of Recombinant Polioviruses Harboring RNA Affinity Tags in the 5′ and 3′ Noncoding Regions of Genomic RNAs

PubMed Central

Flather, Dylan; Cathcart, Andrea L.; Cruz, Casey; Baggs, Eric; Ngo, Tuan; Gershon, Paul D.; Semler, Bert L.

2016-01-01

Despite being intensely studied for more than 50 years, a complete understanding of the enterovirus replication cycle remains elusive. Specifically, only a handful of cellular proteins have been shown to be involved in the RNA replication cycle of these viruses. In an effort to isolate and identify additional cellular proteins that function in enteroviral RNA replication, we have generated multiple recombinant polioviruses containing RNA affinity tags within the 3′ or 5′ noncoding region of the genome. These recombinant viruses retained RNA affinity sequences within the genome while remaining viable and infectious over multiple passages in cell culture. Further characterization of these viruses demonstrated that viral protein production and growth kinetics were unchanged or only slightly altered relative to wild type poliovirus. However, attempts to isolate these genetically-tagged viral genomes from infected cells have been hindered by high levels of co-purification of nonspecific proteins and the limited matrix-binding efficiency of RNA affinity sequences. Regardless, these recombinant viruses represent a step toward more thorough characterization of enterovirus ribonucleoprotein complexes involved in RNA replication. PMID:26861382
The complete mitochondrial genome of the Giant Manta ray, Manta birostris.

PubMed

Hinojosa-Alvarez, Silvia; Díaz-Jaimes, Pindaro; Marcet-Houben, Marina; Gabaldón, Toni

2015-01-01

The complete mitochondrial genome of the giant manta ray (Manta birostris) is 18,075 bp long, with high A + T and low G content. Gene organization and length are similar to those of other ray species. It comprises 13 protein-coding genes, 2 rRNA genes, 23 tRNA genes, 1 non-coding sequence, and the control region. We identified an AT tandem repeat region, similar to that reported in Mobula japanica.
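
The base-composition statement above (high A + T, low G) is the kind of summary that can be computed directly from a sequence. The following is a minimal sketch, not the authors' pipeline: the function names and the 20-base toy sequence are illustrative assumptions, and the toy sequence is not taken from the Manta birostris genome.

```python
from collections import Counter


def base_composition(seq):
    """Return the fraction of A, C, G and T among the unambiguous bases of seq."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return {base: counts[base] / total for base in "ACGT"}


def at_content(seq):
    """Return the combined A + T fraction of seq."""
    comp = base_composition(seq)
    return comp["A"] + comp["T"]


# Hypothetical toy sequence, used only to illustrate the calculation.
toy_sequence = "ATTAACGTATATATTTAGCA"
composition = base_composition(toy_sequence)
print(f"A+T fraction: {at_content(toy_sequence):.2f}")
print(f"G fraction:   {composition['G']:.2f}")
```

For a real mitochondrial genome the same functions would be applied to the full sequence read from a FASTA record; ambiguous bases such as N are ignored here by design.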

Tau mRNA 3'UTR-to-CDS ratio is increased in Alzheimer disease.

PubMed

García-Escudero, Vega; Gargini, Ricardo; Martín-Maestro, Patricia; García, Esther; García-Escudero, Ramón; Avila, Jesús

2017-08-10

Neurons frequently show an imbalance in expression of the 3' untranslated region (3'UTR) relative to the coding DNA sequence (CDS) region of mature messenger RNAs (mRNA). The ratio varies among different cells or parts of the brain. The Map2 protein levels per cell depend on the 3'UTR-to-CDS ratio rather than the total mRNA amount, which suggests powerful regulation of protein expression by 3'UTR sequences. Here we found that MAPT (the microtubule-associated protein tau gene) 3'UTR levels are particularly high with respect to other genes; indeed, the 3'UTR-to-CDS ratio of MAPT is balanced in healthy brain in mouse and human. The tau protein accumulates in Alzheimer diseased brain. We nonetheless observed that the levels of RNA encoding MAPT/tau were diminished in these patients' brains. To explain this apparently contradictory result, we studied MAPT mRNA stoichiometry in coding and non-coding regions, and found that the 3'UTR-to-CDS ratio was higher in the hippocampus of Alzheimer disease patients, with higher tau protein but lower total mRNA levels. Our data indicate that changes in the 3'UTR-to-CDS ratio have a regulatory role in the disease. Future research should thus consider not only mRNA levels, but also the ratios between coding and non-coding regions. Copyright © 2017 Elsevier B.V. All rights reserved.
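
The abstract's central point, that the 3'UTR-to-CDS ratio can rise even while total mRNA falls, can be illustrated with simple arithmetic. This is a hedged sketch: the function name and all signal values are invented for illustration and are not data from the study, and "total" is taken here simply as the sum of the two signals.

```python
def utr_to_cds_ratio(utr_signal, cds_signal):
    """Return the 3'UTR-to-CDS ratio for one transcript.

    Both arguments are expression signals in the same arbitrary units
    (for example normalized read counts); the CDS signal must be positive.
    """
    if cds_signal <= 0:
        raise ValueError("CDS signal must be positive")
    return utr_signal / cds_signal


# Invented numbers: a balanced control and a case where both signals drop,
# but the CDS signal drops more, so the ratio increases.
samples = {
    "control": {"utr": 100.0, "cds": 100.0},
    "disease": {"utr": 90.0, "cds": 60.0},
}

for name, signal in samples.items():
    ratio = utr_to_cds_ratio(signal["utr"], signal["cds"])
    total = signal["utr"] + signal["cds"]
    print(f"{name}: total signal = {total:.0f}, 3'UTR-to-CDS ratio = {ratio:.2f}")
```

In this toy case the total signal is lower in the "disease" sample while its ratio is higher, which is the pattern the abstract describes for MAPT in the hippocampus.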
A family of long intergenic non-coding RNA genes in human chromosomal region 22q11.2 carry a DNA translocation breakpoint/AT-rich sequence

PubMed Central

2018-01-01

FAM230C, a long intergenic non-coding RNA (lincRNA) gene on human chromosome 13 (chr13), is a member of a group of lincRNA genes termed family with sequence similarity 230. An analysis using bioinformatics search tools and alignment programs was undertaken to determine the properties of FAM230C and its related genes. The results reveal that a DNA translocation element, the Translocation Breakpoint Type A (TBTA) sequence, which consists of satellite DNA, Alu elements, and AT-rich sequences, is embedded in the FAM230C gene. Eight lincRNA genes related to FAM230C also carry TBTA sequences. These genes were formed from a large segment of the 3’ half of the FAM230C sequence that was duplicated in chr22, and they lie specifically in regions of low copy repeats (LCR22s), in or close to the 22q11.2 region. 22q11.2 is a chromosomal segment that undergoes a high rate of DNA translocation and is prone to genetic deletions. FAM230C-related genes present on other chromosomes do not carry the TBTA motif and were formed from the 5’ half of the FAM230C sequence. These findings identify a high specificity in lincRNA gene formation by gene sequence duplication in different chromosomes. PMID:29668722
Noncoding RNAs of the Ultrabithorax Domain of the Drosophila Bithorax Complex

PubMed Central

Pease, Benjamin; Borges, Ana C.; Bender, Welcome

2013-01-01

RNA transcripts without obvious coding potential are widespread in many creatures, including the fruit fly, Drosophila melanogaster. Several noncoding RNAs have been identified within the Drosophila bithorax complex. These first appear in blastoderm stage embryos, and their expression patterns indicate that they are transcribed only from active domains of the bithorax complex. It has been suggested that these noncoding RNAs have a role in establishing active domains, perhaps by setting the state of Polycomb Response Elements. A comprehensive survey across the proximal half of the bithorax complex has now revealed nine distinct noncoding RNA transcripts, including four within the Ultrabithorax transcription unit. At the blastoderm stage, the noncoding transcripts collectively span ∼75% of the 135 kb surveyed. Recombination-mediated cassette exchange was used to invert the promoter of one of the noncoding RNAs, a 23-kb transcript from the bxd domain of the bithorax complex. The resulting animals fail to make the normal bxd noncoding RNA and show no transcription across the bxd Polycomb Response Element in early embryos. The mutant flies look normal; the regulation of the bxd domain appears unaffected. Thus, the bxd noncoding RNA has no apparent function. PMID:24077301
Insertion of a self-splicing intron into the mtDNA of a triploblastic animal

DOE Office of Scientific and Technical Information (OSTI.GOV)

Valles, Y.; Halanych, K.; Boore, J.L.

2006-04-14

Nephtys longosetosa is a carnivorous polychaete worm that lives in the intertidal and subtidal zones with worldwide distribution (pleijel&rouse2001). Its mitochondrial genome has the characteristics typical of most metazoans: 37 genes; circular molecule; almost no intergenic sequence; and no significant gene rearrangements when compared to other annelid mtDNAs (booremoritz19981995). Ubiquitous features such as small intergenic regions and lack of introns suggested that metazoan mtDNAs are under strong selective pressures to reduce their genome size, allowing for faster replication (booremoritz19981995Lynch2005). Yet, in 1996 two group I introns were found in the mtDNA of the basal metazoan Metridium senile (FigureX). Breaking a long-standing rule (absence of introns in metazoan mtDNA), this finding was later supported by the further presence of group I introns in other cnidarians. Interestingly, only the class Anthozoa within cnidarians seems to harbor such introns. Although several hundred triploblastic metazoan mtDNAs have been sequenced, this study is the first evidence of mitochondrial introns in triploblastic metazoans. The cox1 gene of N. longosetosa has an intron of almost 2 kb in length. This finding also represents the first instance of a group II intron (anthozoans harbor group I introns) in any metazoan lineage. Opposite trends are observed within plant, fungal and protist mtDNAs, where introns (both group I and II) and other non-coding sequences are widespread. Plant, fungal and protist mtDNA structure and organization differ enormously from that of metazoan mtDNA. Both plant and fungal mtDNAs are dynamic molecules that undergo high rates of recombination, contain long intergenic spacer regions and harbor both group I and group II introns. However, like metazoans, they have a conserved gene content. Protists, on the other hand, have a striking variation of gene content and introns that account for the genome size variation.
In contrast to this mtDNA structure and organization diversity, current genome level studies point to a monophyletic origin of the mitochondria (REFS), raising questions such as: what are the pressures at work shaping the evolution of the mitochondrial genome at 'higher' levels? What drives the absence of introns and other non-coding spacers in metazoan mtDNA? What characteristics must an intron have to be maintained in an environment where 'extra chromosomes' are usually selected against?
Novel variants of the 5S rRNA genes in Eruca sativa.

PubMed

Singh, K; Bhatia, S; Lakshmikumaran, M

1994-02-01

The 5S ribosomal RNA (rRNA) genes of Eruca sativa were cloned and characterized. They are organized into clusters of tandemly repeated units. Each repeat unit consists of a 119-bp coding region followed by a noncoding spacer region that separates it from the coding region of the next repeat unit. Our study reports novel gene variants of the 5S rRNA genes in plants. Two families of the 5S rDNA, the 0.5-kb size family and the 1-kb size family, coexist in the E. sativa genome. The 0.5-kb size family consists of the 5S rRNA genes (S4) that have coding regions similar to those of other reported plant 5S rDNA sequences, whereas the 1-kb size family consists of the 5S rRNA gene variants (S1) that exist as 1-kb BamHI tandem repeats. S1 is made up of two variant units (V1 and V2) of 5S rDNA where the BamHI site between the two units is mutated. Sequence heterogeneity among S4, V1, and V2 units exists throughout the sequence and is not limited to the noncoding spacer region only. The coding regions of V1 and V2 show approximately 20% dissimilarity to the coding regions of S4 and other reported plant 5S rDNA sequences. Such a large variation in the coding regions of the 5S rDNA units within the same plant species has been observed for the first time. Restriction site variation is observed between the two size classes of 5S rDNA in E. sativa.(ABSTRACT TRUNCATED AT 250 WORDS)
Non-coding genomic regions possessing enhancer and silencer potential are associated with healthy aging and exceptional survival.

PubMed

Kim, Sangkyu; Welsh, David A; Myers, Leann; Cherry, Katie E; Wyckoff, Jennifer; Jazwinski, S Michal

2015-02-28

We have completed a genome-wide linkage scan for healthy aging using data collected from a family study, followed by fine-mapping by association in a separate population, the first such attempt reported. The family cohort consisted of parents of age 90 or above and their children ranging in age from 50 to 80. As a quantitative measure of healthy aging, we used a frailty index, called FI34, based on 34 health and function variables. The linkage scan found a single significant linkage peak on chromosome 12. Using an independent cohort of unrelated nonagenarians, we carried out a fine-scale association mapping of the region suggestive of linkage and identified three sites associated with healthy aging. These healthy-aging sites (HASs) are located in intergenic regions at 12q13-14. HAS-1 has been previously associated with multiple diseases, and an enhancer was recently mapped and experimentally validated within the site. HAS-2 is a previously uncharacterized site possessing genomic features suggestive of enhancer activity. HAS-3 contains features associated with Polycomb repression. The HASs also contain variants associated with exceptional longevity, based on a separate analysis. Our results provide insight into functional genomic networks involving non-coding regulatory elements that are involved in healthy aging and longevity.
Non-coding genomic regions possessing enhancer and silencer potential are associated with healthy aging and exceptional survival

PubMed Central

Kim, Sangkyu; Welsh, David A.; Myers, Leann; Cherry, Katie E.; Wyckoff, Jennifer; Jazwinski, S. Michal

2015-01-01

We have completed a genome-wide linkage scan for healthy aging using data collected from a family study, followed by fine-mapping by association in a separate population, the first such attempt reported. The family cohort consisted of parents of age 90 or above and their children ranging in age from 50 to 80. As a quantitative measure of healthy aging, we used a frailty index, called FI34, based on 34 health and function variables. The linkage scan found a single significant linkage peak on chromosome 12. Using an independent cohort of unrelated nonagenarians, we carried out a fine-scale association mapping of the region suggestive of linkage and identified three sites associated with healthy aging. These healthy-aging sites (HASs) are located in intergenic regions at 12q13–14. HAS-1 has been previously associated with multiple diseases, and an enhancer was recently mapped and experimentally validated within the site. HAS-2 is a previously uncharacterized site possessing genomic features suggestive of enhancer activity. HAS-3 contains features associated with Polycomb repression. The HASs also contain variants associated with exceptional longevity, based on a separate analysis. Our results provide insight into functional genomic networks involving non-coding regulatory elements that are involved in healthy aging and longevity. PMID:25682868
Elevated Rate of Fixation of Endogenous Retroviral Elements in Haplorhini TRIM5 and TRIM22 Genomic Sequences: Impact on Transcriptional Regulation

PubMed Central

Diehl, William E.; Johnson, Welkin E.; Hunter, Eric

2013-01-01

All genes in the TRIM6/TRIM34/TRIM5/TRIM22 locus are type I interferon inducible, with TRIM5 and TRIM22 possessing antiviral properties. Evolutionary studies involving the TRIM6/34/5/22 locus have predominantly focused on the coding sequence of the genes, finding that TRIM5 and TRIM22 have undergone high rates of both non-synonymous nucleotide replacements and in-frame insertions and deletions. We sought to understand if divergent evolutionary pressures on TRIM6/34/5/22 coding regions have selected for modifications in the non-coding regions of these genes and explore whether such non-coding changes may influence the biological function of these genes. The transcribed genomic regions, including the introns, of TRIM6, TRIM34, TRIM5, and TRIM22 from ten Haplorhini primates and one prosimian species were analyzed for transposable element content. In Haplorhini species, TRIM5 displayed an exaggerated interspecies variability, predominantly resulting from changes in the composition of transposable elements in the large first and fourth introns. Multiple lineage-specific endogenous retroviral long terminal repeats (LTRs) were identified in the first intron of TRIM5 and TRIM22. In the prosimian genome, we identified a duplication of TRIM5 with a concomitant loss of TRIM22. The transposable element content of the prosimian TRIM5 genes appears to largely represent the shared Haplorhini/prosimian ancestral state for this gene. Furthermore, we demonstrated that one such differentially fixed LTR provides for species-specific transcriptional regulation of TRIM22 in response to p53 activation. Our results identify a previously unrecognized source of species-specific variation in the antiviral TRIM genes, which can lead to alterations in their transcriptional regulation. These observations suggest that there has existed long-term pressure for exaptation of retroviral LTRs in the non-coding regions of these genes. 
This likely resulted from serial viral challenges and provided a mechanism for rapid alteration of transcriptional regulation. To our knowledge, this represents the first report of persistent evolutionary pressure for the capture of retroviral LTR insertions. PMID:23516500
Identification and Characterization of miRNA Transcriptome in Asiatic Cotton (Gossypium arboreum) Using High Throughput Sequencing

PubMed Central

Farooq, Muhammad; Mansoor, Shahid; Guo, Hui; Amin, Imran; Chee, Peng W.; Azim, M. Kamran; Paterson, Andrew H.

2017-01-01

MicroRNAs (miRNAs) are small 20–24nt molecules that have been well studied over the past decade due to their important regulatory roles in different cellular processes. The mature sequences are more conserved across vast phylogenetic scales than their precursors and some are conserved within entire kingdoms, hence, their loci and function can be predicted by homology searches. Different studies have been performed to elucidate miRNAs using de novo prediction methods but due to complex regulatory mechanisms or false positive in silico predictions, not all of them express in reality and sometimes computationally predicted mature transcripts differ from the actual expressed ones. With the availability of a complete genome sequence of Gossypium arboreum, it is important to annotate the genome for both coding and non-coding regions using high confidence transcript evidence, for this cotton species that is highly resistant to various biotic and abiotic stresses. Here we have analyzed the small RNA transcriptome of G. arboreum leaves and provided genome annotation of miRNAs with evidence from miRNA/miRNA∗ transcripts. A total of 446 miRNAs clustered into 224 miRNA families were found, among which 48 families are conserved in other plants and 176 are novel. Four short RNA libraries were used to shortlist best predictions based on high reads per million. The size, origin, copy numbers and transcript depth of all miRNAs along with their isoforms and targets has been reported. The highest gene copy number was observed for gar-miR7504 followed by gar-miR166, gar-miR8771, gar-miR156, and gar-miR7484. Altogether, 1274 target genes were found in G. arboreum that are enriched for 216 KEGG pathways. The resultant genomic annotations are provided in UCSC, BED format. PMID:28663752
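The shortlisting step mentioned in this abstract (ranking predictions by reads per million across several small RNA libraries) can be sketched in Python as follows. The abstract gives no thresholds or procedure, so the cutoff values, the function names, and the candidate labels `cand_a` and `cand_b` are hypothetical and chosen only to illustrate the normalisation.

```python
def reads_per_million(count, total_reads):
    """Scale a raw read count to reads per million (RPM) for one library."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return count * 1_000_000 / total_reads


def shortlist(libraries, min_rpm=10.0, min_libraries=2):
    """Return candidates reaching min_rpm in at least min_libraries libraries.

    libraries: list of (counts, total_reads) pairs, where counts maps a
    candidate name to its raw read count in that library.
    """
    support = {}
    for counts, total_reads in libraries:
        for name, count in counts.items():
            if reads_per_million(count, total_reads) >= min_rpm:
                support[name] = support.get(name, 0) + 1
    return sorted(name for name, hits in support.items() if hits >= min_libraries)


# Hypothetical libraries; names and counts are not from the study.
libs = [
    ({"cand_a": 120, "cand_b": 3}, 2_000_000),   # RPM: 60.0 and 1.5
    ({"cand_a": 80, "cand_b": 40}, 4_000_000),   # RPM: 20.0 and 10.0
]
kept = shortlist(libs, min_rpm=10.0, min_libraries=2)
print(kept)
```

Normalising by library size before applying a cutoff keeps a deeply sequenced library from dominating the shortlist.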
The Murine Norovirus Core Subgenomic RNA Promoter Consists of a Stable Stem-Loop That Can Direct Accurate Initiation of RNA Synthesis

PubMed Central

Yunus, Muhammad Amir; Lin, Xiaoyan; Bailey, Dalan; Karakasiliotis, Ioannis; Chaudhry, Yasmin; Vashist, Surender; Zhang, Guo; Thorne, Lucy; Kao, C. Cheng

2014-01-01

ABSTRACT All members of the Caliciviridae family of viruses produce a subgenomic RNA during infection. The subgenomic RNA typically encodes only the major and minor capsid proteins, but in murine norovirus (MNV), the subgenomic RNA also encodes the VF1 protein, which functions to suppress host innate immune responses. To date, the mechanism of norovirus subgenomic RNA synthesis has not been characterized. We have previously described the presence of an evolutionarily conserved RNA stem-loop structure on the negative-sense RNA, the complementary sequence of which codes for the viral RNA-dependent RNA polymerase (NS7). The conserved stem-loop is positioned 6 nucleotides 3′ of the start site of the subgenomic RNA in all caliciviruses. We demonstrate that the conserved stem-loop is essential for MNV viability. Mutant MNV RNAs with substitutions in the stem-loop replicated poorly until they accumulated mutations that revert to restore the stem-loop sequence and/or structure. The stem-loop sequence functions in a noncoding context, as it was possible to restore the replication of an MNV mutant by introducing an additional copy of the stem-loop between the NS7- and VP1-coding regions. Finally, in vitro biochemical data suggest that the stem-loop sequence is sufficient for the initiation of viral RNA synthesis by the recombinant MNV RNA-dependent RNA polymerase, confirming that the stem-loop forms the core of the norovirus subgenomic promoter. IMPORTANCE Noroviruses are a significant cause of viral gastroenteritis, and it is important to understand the mechanism of norovirus RNA synthesis. Here we describe the identification of an RNA stem-loop structure that functions as the core of the norovirus subgenomic RNA promoter in cells and in vitro. This work provides new insights into the molecular mechanisms of norovirus RNA synthesis and the sequences that determine the recognition of viral RNA by the RNA-dependent RNA polymerase. PMID:25392209
The murine norovirus core subgenomic RNA promoter consists of a stable stem-loop that can direct accurate initiation of RNA synthesis.

PubMed

Yunus, Muhammad Amir; Lin, Xiaoyan; Bailey, Dalan; Karakasiliotis, Ioannis; Chaudhry, Yasmin; Vashist, Surender; Zhang, Guo; Thorne, Lucy; Kao, C Cheng; Goodfellow, Ian

2015-01-15

All members of the Caliciviridae family of viruses produce a subgenomic RNA during infection. The subgenomic RNA typically encodes only the major and minor capsid proteins, but in murine norovirus (MNV), the subgenomic RNA also encodes the VF1 protein, which functions to suppress host innate immune responses. To date, the mechanism of norovirus subgenomic RNA synthesis has not been characterized. We have previously described the presence of an evolutionarily conserved RNA stem-loop structure on the negative-sense RNA, the complementary sequence of which codes for the viral RNA-dependent RNA polymerase (NS7). The conserved stem-loop is positioned 6 nucleotides 3' of the start site of the subgenomic RNA in all caliciviruses. We demonstrate that the conserved stem-loop is essential for MNV viability. Mutant MNV RNAs with substitutions in the stem-loop replicated poorly until they accumulated mutations that revert to restore the stem-loop sequence and/or structure. The stem-loop sequence functions in a noncoding context, as it was possible to restore the replication of an MNV mutant by introducing an additional copy of the stem-loop between the NS7- and VP1-coding regions. Finally, in vitro biochemical data suggest that the stem-loop sequence is sufficient for the initiation of viral RNA synthesis by the recombinant MNV RNA-dependent RNA polymerase, confirming that the stem-loop forms the core of the norovirus subgenomic promoter. Noroviruses are a significant cause of viral gastroenteritis, and it is important to understand the mechanism of norovirus RNA synthesis. Here we describe the identification of an RNA stem-loop structure that functions as the core of the norovirus subgenomic RNA promoter in cells and in vitro. This work provides new insights into the molecular mechanisms of norovirus RNA synthesis and the sequences that determine the recognition of viral RNA by the RNA-dependent RNA polymerase. Copyright © 2015, American Society for Microbiology. 
All Rights Reserved.
[Relevance of long non-coding RNAs in tumour biology].

PubMed

Nagy, Zoltán; Szabó, Diána Rita; Zsippai, Adrienn; Falus, András; Rácz, Károly; Igaz, Péter

2012-09-23

The discovery of the biological relevance of non-coding RNA molecules represents one of the most significant advances in contemporary molecular biology. It has turned out that a major fraction of the non-coding part of the genome is transcribed. Beside small RNAs (including microRNAs) more and more data are disclosed concerning long non-coding RNAs of 200 nucleotides to 100 kb length that are implicated in the regulation of several basic molecular processes (cell proliferation, chromatin functioning, microRNA-mediated effects, etc.). Some of these long non-coding RNAs have been associated with human tumours, including H19, HOTAIR, MALAT1, etc., the different expression of which has been noted in various neoplasms relative to healthy tissues. Long non-coding RNAs may represent novel markers of molecular diagnostics and they might even turn out to be targets of therapeutic intervention.
Determination of the melon chloroplast and mitochondrial genome sequences reveals that the largest reported mitochondrial genome in plants contains a significant amount of DNA having a nuclear origin

PubMed Central

2011-01-01

Background: The melon belongs to the Cucurbitaceae family, whose economic importance among vegetable crops is second only to Solanaceae. The melon has a small genome size (454 Mb), which makes it suitable for molecular and genetic studies. Despite similar nuclear and chloroplast genome sizes, cucurbits show great variation when their mitochondrial genomes are compared. The melon possesses the largest plant mitochondrial genome, as much as eight times larger than that of other cucurbits. Results: The nucleotide sequences of the melon chloroplast and mitochondrial genomes were determined. The chloroplast genome (156,017 bp) included 132 genes, with 98 single-copy genes dispersed between the small (SSC) and large (LSC) single-copy regions and 17 duplicated genes in the inverted repeat regions (IRa and IRb). A comparison of the cucumber and melon chloroplast genomes showed differences in only approximately 5% of nucleotides, mainly due to short indels and SNPs. Additionally, 2.74 Mb of mitochondrial sequence, accounting for 95% of the estimated mitochondrial genome size, were assembled into five scaffolds and four additional unscaffolded contigs. Of the mitochondrial genome, 84% is contained in a single scaffold. The gene-coding region accounted for 1.7% (45,926 bp) of the total sequence, including 51 protein-coding genes, 4 conserved ORFs, 3 rRNA genes and 24 tRNA genes. Despite the differences observed in the mitochondrial genome sizes of cucurbit species, Citrullus lanatus (379 kb), Cucurbita pepo (983 kb) and Cucumis melo (2,740 kb) share 120 kb of sequence, including the predicted protein-coding regions. Nevertheless, melon contained a high number of repetitive sequences and a high content of DNA of nuclear origin, which represented 42% and 47% of the total sequence, respectively.
Conclusions: Whereas the size and gene organisation of chloroplast genomes are similar among the cucurbit species, mitochondrial genomes show a wide variety of sizes, with a non-conserved structure both in gene number and organisation, as well as in the features of the noncoding DNA. The transfer of nuclear DNA to the melon mitochondrial genome and the high proportion of repetitive DNA appear to explain the size of the largest mitochondrial genome reported so far. PMID:21854637
Complete mitochondrial genome of a Asian lion (Panthera leo goojratensis).

PubMed

Li, Yu-Fei; Wang, Qiang; Zhao, Jian-ning

2016-01-01

The entire mitochondrial genome of this Asian lion (Panthera leo goojratensis) was 17,183 bp in length. Its gene composition and arrangement conformed to those of other lions, with the typical structure of 22 tRNAs, 2 rRNAs, 13 protein-coding genes and a non-coding region. The characteristics of the mitochondrial genome were analyzed in detail.
Single nucleotide polymorphisms in common bean: their discovery and genotyping using a multiplex detection system

USDA-ARS's Scientific Manuscript database

Single-nucleotide Polymorphism (SNP) markers are by far the most common form of DNA polymorphism in a genome. The objectives of this study were to discover SNPs in common bean comparing sequences from coding and non-coding regions obtained from Genbank and genomic DNA and to compare sequencing resu...
Length and nucleotide sequence polymorphism at the trnL and trnF non-coding regions of chloroplast genomes among Saccharum and Erianthus species

USDA-ARS's Scientific Manuscript database

The aneupolyploidy genome of sugarcane (Saccharum hybrids spp.) and lack of a classical genetic linkage map make genetics research most difficult for sugarcane. Whole genome sequencing and genetic characterization of sugarcane and related taxa are far behind other crops. In this study, universal PCR...
Genomic Sequence of the WHO International Standard for Hepatitis A Virus RNA.

PubMed

Jenkins, Adrian; Minhas, Rehan; Morris, Clare; Berry, Neil

2018-05-10

The World Health Organization (WHO) international standard for hepatitis A virus (HAV) RNA nucleic acid assays was characterized by complete genome sequencing. The entire coding sequence and noncoding regions were assigned HAV genotype IB. This information will aid the design, development, and evaluation of HAV RNA amplification assays. Copyright © 2018 Jenkins et al.
Complete Genome Sequences of Four Isolates of Plutella xylostella Granulovirus.

PubMed

Spence, Robert J; Noune, Christopher; Hauxwell, Caroline

2016-06-30

Granuloviruses are widespread pathogens of Plutella xylostella L. (diamondback moth) and potential biopesticides for control of this global insect pest. We report the complete genomes of four Plutella xylostella granulovirus isolates from China, Malaysia, and Taiwan exhibiting pairs of noncoding, homologous repeat regions with significant sequence variation but equivalent length. Copyright © 2016 Spence et al.
Noncoding sequence classification based on wavelet transform analysis: part II

NASA Astrophysics Data System (ADS)

Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez-Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.

2017-09-01

DNA sequences in the human genome can be divided into coding and noncoding ones. We hypothesize that the characteristic periodicities of the noncoding sequences are related to their function. We describe the procedure to identify these characteristic periodicities using wavelet analysis. Our results show that three groups of noncoding sequences, each with a different biological function, may be differentiated by their wavelet coefficients within a specific frequency range.
Analysis of nucleotide diversity among alleles of the major bacterial blight resistance gene Xa27 in cultivars of rice (Oryza sativa) and its wild relatives.

PubMed

Bimolata, Waikhom; Kumar, Anirudh; Sundaram, Raman Meenakshi; Laha, Gouri Shankar; Qureshi, Insaf Ahmed; Reddy, Gajjala Ashok; Ghazi, Irfan Ahmad

2013-08-01

Xa27 is one of the important R-genes, effective against bacterial blight disease of rice caused by Xanthomonas oryzae pv. oryzae (Xoo). Using a natural population of Oryza, we analyzed the sequence variation in the functionally important domains of Xa27 across the Oryza species. DNA sequences of Xa27 alleles from 27 rice accessions revealed higher nucleotide diversity among the reported R-genes of rice. Sequence polymorphism analysis revealed synonymous and non-synonymous mutations in addition to a number of InDels in non-coding regions of the gene. High sequence variation was observed in the promoter region including the 5'UTR, with π = 0.00916 and θw = 0.01785. Comparative analysis of the identified Xa27 alleles with that of IRBB27 and IR24 indicated the operation of both positive selection (Ka/Ks > 1) and neutral selection (Ka/Ks ≈ 0). The genetic distances of alleles of the gene from Oryza nivara were nearer to IRBB27 as compared to IR24. We also found the presence of conserved and null UPT (upregulated by transcriptional activator) box in the isolated alleles. Considerable amino acid polymorphism was localized in the trans-membrane domain, the functional significance of which is yet to be elucidated. However, the absence of a functional UPT box in all the alleles except IRBB27 suggests the maintenance of a single resistant allele throughout the natural population.
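The two diversity statistics quoted in this abstract follow standard definitions: nucleotide diversity (π) is the mean number of pairwise differences per site, and Watterson's estimator (θw) is the number of segregating sites divided by the harmonic number a_n = 1 + 1/2 + ... + 1/(n-1) and by the alignment length. The Python sketch below computes both on a toy alignment. It assumes equal-length, gap-free sequences, and the sequences shown are invented for illustration and are not Xa27 alleles.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site (pi) for aligned, gap-free sequences."""
    n, length = len(seqs), len(seqs[0])
    if n < 2 or any(len(s) != length for s in seqs):
        raise ValueError("need at least two sequences of equal length")
    total_diffs = sum(
        sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) / 2
    return total_diffs / n_pairs / length


def watterson_theta(seqs):
    """Watterson's estimator per site: S / (a_n * L)."""
    n, length = len(seqs), len(seqs[0])
    if n < 2 or any(len(s) != length for s in seqs):
        raise ValueError("need at least two sequences of equal length")
    # A site is segregating if more than one base is observed in its column.
    segregating = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    a_n = sum(1 / i for i in range(1, n))
    return segregating / a_n / length


# Toy alignment (hypothetical): two segregating sites among four sequences.
alignment = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGTACGTAC"]
pi = nucleotide_diversity(alignment)
theta_w = watterson_theta(alignment)
print(pi, theta_w)
```

When θw exceeds π, as in the values reported for the promoter region, the sample contains an excess of low-frequency variants relative to the neutral expectation, although the abstract itself does not draw that inference.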
