genomic transcript mapping: Topics by Science.gov

Sample records for genomic transcript mapping

Temporal analysis and spatial mapping of Lymantria dispar nuclear polyhedrosis virus transcripts and in vitro translation polypeptides

Treesearch

James M. Slavicek

1991-01-01

Genomic expression of the Lymantria dispar multinucleocapsid nuclear polyhedrosis virus (LdMNPV) was studied. Viral specific transcripts expressed in cell culture at various times from 2 through 72 h postinfection were identified and their genomic origins mapped through Northern analysis. Sixty-five distinct transcripts were identified in this...
Dynamic maps of UV damage formation and repair for the human genome

PubMed Central

Hu, Jinchuan; Adebali, Ogun; Adar, Sheera; Sancar, Aziz

2017-01-01

Formation and repair of UV-induced DNA damage in human cells are affected by cellular context. To study factors influencing damage formation and repair genome-wide, we developed a highly sensitive single-nucleotide resolution damage mapping method [high-sensitivity damage sequencing (HS–Damage-seq)]. Damage maps of both cyclobutane pyrimidine dimers (CPDs) and pyrimidine-pyrimidone (6-4) photoproducts [(6-4)PPs] from UV-irradiated cellular and naked DNA revealed that the effect of transcription factor binding on bulky adducts formation varies, depending on the specific transcription factor, damage type, and strand. We also generated time-resolved UV damage maps of both CPDs and (6-4)PPs by HS–Damage-seq and compared them to the complementary repair maps of the human genome obtained by excision repair sequencing to gain insight into factors that affect UV-induced DNA damage and repair and ultimately UV carcinogenesis. The combination of the two methods revealed that, whereas UV-induced damage is virtually uniform throughout the genome, repair is affected by chromatin states, transcription, and transcription factor binding, in a manner that depends on the type of DNA damage. PMID:28607063
Dynamic maps of UV damage formation and repair for the human genome.

PubMed

Hu, Jinchuan; Adebali, Ogun; Adar, Sheera; Sancar, Aziz

2017-06-27

Formation and repair of UV-induced DNA damage in human cells are affected by cellular context. To study factors influencing damage formation and repair genome-wide, we developed a highly sensitive single-nucleotide resolution damage mapping method [high-sensitivity damage sequencing (HS-Damage-seq)]. Damage maps of both cyclobutane pyrimidine dimers (CPDs) and pyrimidine-pyrimidone (6-4) photoproducts [(6-4)PPs] from UV-irradiated cellular and naked DNA revealed that the effect of transcription factor binding on bulky adducts formation varies, depending on the specific transcription factor, damage type, and strand. We also generated time-resolved UV damage maps of both CPDs and (6-4)PPs by HS-Damage-seq and compared them to the complementary repair maps of the human genome obtained by excision repair sequencing to gain insight into factors that affect UV-induced DNA damage and repair and ultimately UV carcinogenesis. The combination of the two methods revealed that, whereas UV-induced damage is virtually uniform throughout the genome, repair is affected by chromatin states, transcription, and transcription factor binding, in a manner that depends on the type of DNA damage.
Construction of a Transcription Map for Papillomaviruses using RACE, RNAse Protection and Primer Extension Assays

PubMed Central

Wang, Xiaohong; Zheng, Zhi-Ming

2016-01-01

Papillomaviruses are a family of small, non-enveloped DNA tumor viruses. Knowing a complete transcription map from each papillomavirus genome can provide guidance for various papillomavirus studies. This unit provides detailed protocols to construct a transcription map of human papillomavirus type 18. The same approach can be easily adapted to other transcription map studies of any other papillomavirus genotype due to the high degree of conservation in the genome structure, organization and gene expression among papillomaviruses. The focused methods are 5’- and 3’- rapid amplification of cDNA ends (RACE), which are the techniques commonly used in molecular biology to obtain the full length RNA transcript or to map a transcription start site (TSS) or an RNA polyadenylation (pA) cleavage site. Primer walking RT-PCR is a method for studying splicing junction of RACE products. In addition, RNase protection assay and primer extension are also introduced as alternative methods in the mapping analysis. PMID:26855281
Chromatin-associated RNA sequencing (ChAR-seq) maps genome-wide RNA-to-DNA contacts

PubMed Central

Jukam, David; Teran, Nicole A; Risca, Viviana I; Smith, Owen K; Johnson, Whitney L; Skotheim, Jan M; Greenleaf, William James

2018-01-01

RNA is a critical component of chromatin in eukaryotes, both as a product of transcription, and as an essential constituent of ribonucleoprotein complexes that regulate both local and global chromatin states. Here, we present a proximity ligation and sequencing method called Chromatin-Associated RNA sequencing (ChAR-seq) that maps all RNA-to-DNA contacts across the genome. Using Drosophila cells, we show that ChAR-seq provides unbiased, de novo identification of targets of chromatin-bound RNAs including nascent transcripts, chromosome-specific dosage compensation ncRNAs, and genome-wide trans-associated RNAs involved in co-transcriptional RNA processing. PMID:29648534
A Transcriptome Map of Actinobacillus pleuropneumoniae at Single-Nucleotide Resolution Using Deep RNA-Seq

PubMed Central

Su, Zhipeng; Zhu, Jiawen; Xu, Zhuofei; Xiao, Ran; Zhou, Rui; Li, Lu; Chen, Huanchun

2016-01-01

Actinobacillus pleuropneumoniae is the pathogen of porcine contagious pleuropneumonia, a highly contagious respiratory disease of swine. Although the genome of A. pleuropneumoniae was sequenced several years ago, limited information is available on the genome-wide transcriptional analysis to accurately annotate the gene structures and regulatory elements. High-throughput RNA sequencing (RNA-seq) has been applied to study the transcriptional landscape of bacteria, which can efficiently and accurately identify gene expression regions and unknown transcriptional units, especially small non-coding RNAs (sRNAs), UTRs and regulatory regions. The aim of this study is to comprehensively analyze the transcriptome of A. pleuropneumoniae by RNA-seq in order to improve the existing genome annotation and promote our understanding of A. pleuropneumoniae gene structures and RNA-based regulation. In this study, we utilized RNA-seq to construct a single nucleotide resolution transcriptome map of A. pleuropneumoniae. More than 3.8 million high-quality reads (average length ~90 bp) from a cDNA library were generated and aligned to the reference genome. We identified 32 open reading frames encoding novel proteins that were mis-annotated in the previous genome annotations. The start sites for 35 genes based on the current genome annotation were corrected. Furthermore, 51 sRNAs in the A. pleuropneumoniae genome were discovered, of which 40 sRNAs were never reported in previous studies. The transcriptome map also enabled visualization of 5'- and 3'-UTR regions, which contained 11 sRNAs. In addition, 351 operons covering 1230 genes throughout the whole genome were identified. The RNA-Seq based transcriptome map validated annotated genes and corrected annotations of open reading frames in the genome, and led to the identification of many functional elements (e.g. regions encoding novel proteins, non-coding sRNAs and operon structures). 
The transcriptional units described in this study provide a foundation for future studies concerning the gene functions and the transcriptional regulatory architectures of this pathogen. PMID:27018591
De Novo Transcriptome Assembly and Analyses of Gene Expression during Photomorphogenesis in Diploid Wheat Triticum monococcum

PubMed Central

Naithani, Sushma; Sullivan, Chris; Preece, Justin; Tiwari, Vijay K.; Elser, Justin; Leonard, Jeffrey M.; Sage, Abigail; Gresham, Cathy; Kerhornou, Arnaud; Bolser, Dan; McCarthy, Fiona; Kersey, Paul; Lazo, Gerard R.; Jaiswal, Pankaj

2014-01-01

Background Triticum monococcum (2n) is a close ancestor of T. urartu, the A-genome progenitor of cultivated hexaploid wheat, and is therefore a useful model for the study of components regulating photomorphogenesis in diploid wheat. In order to develop genetic and genomic resources for such a study, we constructed genome-wide transcriptomes of two Triticum monococcum subspecies, the wild winter wheat T. monococcum ssp. aegilopoides (accession G3116) and the domesticated spring wheat T. monococcum ssp. monococcum (accession DV92) by generating de novo assemblies of RNA-Seq data derived from both etiolated and green seedlings. Principal Findings The de novo transcriptome assemblies of DV92 and G3116 represent 120,911 and 117,969 transcripts, respectively. We successfully mapped ∼90% of these transcripts from each accession to barley and ∼95% of the transcripts to T. urartu genomes. However, only ∼77% transcripts mapped to the annotated barley genes and ∼85% transcripts mapped to the annotated T. urartu genes. Differential gene expression analyses revealed 22% more light up-regulated and 35% more light down-regulated transcripts in the G3116 transcriptome compared to DV92. The DV92 and G3116 mRNA sequence reads aligned against the reference barley genome led to the identification of ∼500,000 single nucleotide polymorphism (SNP) and ∼22,000 simple sequence repeat (SSR) sites. Conclusions De novo transcriptome assemblies of two accessions of the diploid wheat T. monococcum provide new empirical transcriptome references for improving Triticeae genome annotations, and insights into transcriptional programming during photomorphogenesis. The SNP and SSR sites identified in our analysis provide additional resources for the development of molecular markers. PMID:24821410
Global survey of genomic imprinting by transcriptome sequencing.

PubMed

Babak, Tomas; Deveale, Brian; Armour, Christopher; Raymond, Christopher; Cleary, Michele A; van der Kooy, Derek; Johnson, Jason M; Lim, Lee P

2008-11-25

Genomic imprinting restricts gene expression to a paternal or maternal allele. To date, approximately 90 imprinted transcripts have been identified in mouse, of which the majority were detected after intense interrogation of clusters of imprinted genes identified by phenotype-driven assays in mice with uniparental disomies [1]. Here we use selective priming and parallel sequencing to measure allelic bias in whole transcriptomes. By distinguishing parent-of-origin bias from strain-specific bias in embryos derived from a reciprocal cross of mice, we constructed a genome-wide map of imprinted transcription. This map was able to objectively locate over 80% of known imprinted loci and allowed the detection and confirmation of six novel imprinted genes. Even in the intensely studied embryonic day 9.5 developmental stage that we analyzed, more than half of all imprinted single-nucleotide polymorphisms did not overlap previously discovered imprinted transcripts; a large fraction of these represent novel noncoding RNAs within known imprinted loci. For example, a previously unnoticed, maternally expressed antisense transcript was mapped within the Grb10 locus. This study demonstrates the feasibility of using transcriptome sequencing for mapping of imprinted gene expression in physiologically normal animals. Such an approach will allow researchers to study imprinting without restricting themselves to individual loci or specific transcripts.
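The distinction drawn in the abstract above, between parent-of-origin bias and strain-specific bias in a reciprocal cross, can be illustrated with a small sketch. This is not the authors' pipeline: the function name, the input layout (allele read counts per cross at one SNP), and the 0.7 threshold are all assumptions made only for illustration.

```python
def classify_allelic_bias(cross1, cross2, threshold=0.7):
    """Classify allelic bias at one SNP from a reciprocal cross.

    cross1: (reads_A, reads_B) from embryos whose mother is strain A.
    cross2: (reads_A, reads_B) from embryos whose mother is strain B.
    threshold: hypothetical cutoff on the allele fraction (an assumption).
    """
    a1, b1 = cross1
    a2, b2 = cross2
    if a1 + b1 == 0 or a2 + b2 == 0:
        return "no_coverage"

    # Fraction of reads from the maternal allele in each cross.
    maternal1 = a1 / (a1 + b1)  # mother is strain A
    maternal2 = b2 / (a2 + b2)  # mother is strain B

    # Parent-of-origin bias follows the parent, whichever strain it is.
    if maternal1 >= threshold and maternal2 >= threshold:
        return "maternal"
    if maternal1 <= 1 - threshold and maternal2 <= 1 - threshold:
        return "paternal"

    # Strain bias follows the strain, whichever parent carries it.
    strain_a1 = maternal1
    strain_a2 = 1 - maternal2
    if strain_a1 >= threshold and strain_a2 >= threshold:
        return "strain_A"
    if strain_a1 <= 1 - threshold and strain_a2 <= 1 - threshold:
        return "strain_B"
    return "unbiased_or_inconsistent"
```

A real analysis would also need a statistical test on the read counts rather than a fixed cutoff; the sketch only shows why both directions of the cross are required to separate the two kinds of bias.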
High Throughput Computing Impact on Meta Genomics (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

ScienceCinema

Gore, Brooklin

2018-02-01

This presentation includes a brief background on High Throughput Computing, correlating gene transcription factors, optical mapping, genotype to phenotype mapping via QTL analysis, and current work on next gen sequencing.
Genome-wide transcription start site profiling in biofilm-grown Burkholderia cenocepacia J2315.

PubMed

Sass, Andrea M; Van Acker, Heleen; Förstner, Konrad U; Van Nieuwerburgh, Filip; Deforce, Dieter; Vogel, Jörg; Coenye, Tom

2015-10-13

Burkholderia cenocepacia is a soil-dwelling Gram-negative Betaproteobacterium with an important role as opportunistic pathogen in humans. Infections with B. cenocepacia are very difficult to treat due to their high intrinsic resistance to most antibiotics. Biofilm formation further adds to their antibiotic resistance. B. cenocepacia harbours a large, multi-replicon genome with a high GC-content; the reference genome of strain J2315 includes 7374 annotated genes. This study aims to annotate transcription start sites and identify novel transcripts on a whole genome scale. RNA extracted from B. cenocepacia J2315 biofilms was analysed by differential RNA-sequencing and the resulting dataset compared to data derived from conventional, global RNA-sequencing. Transcription start sites were annotated and further analysed according to their position relative to annotated genes. Four thousand ten transcription start sites were mapped over the whole B. cenocepacia genome and the primary transcription start site of 2089 genes expressed in B. cenocepacia biofilms were defined. For 64 genes a start codon alternative to the annotated one was proposed. Substantial antisense transcription for 105 genes and two novel protein coding sequences were identified. The distribution of internal transcription start sites can be used to identify genomic islands in B. cenocepacia. A potassium pump strongly induced only under biofilm conditions was found and 15 non-coding small RNAs highly expressed in biofilms were discovered. Mapping transcription start sites across the B. cenocepacia genome added relevant information to the J2315 annotation. Genes and novel regulatory RNAs putatively involved in B. cenocepacia biofilm formation were identified. These findings will help in understanding regulation of B. cenocepacia biofilm formation.
Poly A- transcripts expressed in HeLa cells.

PubMed

Wu, Qingfa; Kim, Yeong C; Lu, Jian; Xuan, Zhenyu; Chen, Jun; Zheng, Yonglan; Zhou, Tom; Zhang, Michael Q; Wu, Chung-I; Wang, San Ming

2008-07-30

Transcripts expressed in eukaryotes are classified as poly A+ transcripts or poly A- transcripts based on the presence or absence of the 3' poly A tail. Most transcripts identified so far are poly A+ transcripts, whereas the poly A- transcripts remain largely unknown. We developed the TRD (Total RNA Detection) system for transcript identification. The system detects the transcripts through the following steps: 1) depleting the abundant ribosomal and small-size transcripts; 2) synthesizing cDNA without regard to the status of the 3' poly A tail; 3) applying the 454 sequencing technology for massive 3' EST collection from the cDNA; and 4) determining the genome origins of the detected transcripts by mapping the sequences to the human genome reference sequences. Using this system, we characterized the cytoplasmic transcripts from HeLa cells. Of the 13,467 distinct 3' ESTs analyzed, 24% are poly A-, 36% are poly A+, and 40% are bimorphic with poly A+ features but without the 3' poly A tail. Most of the poly A- 3' ESTs do not match known transcript sequences; they have a similar distribution pattern in the genome as the poly A+ and bimorphic 3' ESTs, and their mapped intergenic regions are evolutionarily conserved. Experiments confirmed the authenticity of the detected poly A- transcripts. Our study provides the first large-scale sequence evidence for the presence of poly A- transcripts in eukaryotes. The abundance of the poly A- transcripts highlights the need for comprehensive identification of these transcripts for decoding the transcriptome, annotating the genome and studying biological relevance of the poly A- transcripts.
Mediator binding to UASs is broadly uncoupled from transcription and cooperative with TFIID recruitment to promoters.

PubMed

Grünberg, Sebastian; Henikoff, Steven; Hahn, Steven; Zentner, Gabriel E

2016-11-15

Mediator is a conserved, essential transcriptional coactivator complex, but its in vivo functions have remained unclear due to conflicting data regarding its genome-wide binding pattern obtained by genome-wide ChIP. Here, we used ChEC-seq, a method orthogonal to ChIP, to generate a high-resolution map of Mediator binding to the yeast genome. We find that Mediator associates with upstream activating sequences (UASs) rather than the core promoter or gene body under all conditions tested. Mediator occupancy is surprisingly correlated with transcription levels at only a small fraction of genes. Using the same approach to map TFIID, we find that TFIID is associated with both TFIID- and SAGA-dependent genes and that TFIID and Mediator occupancy is cooperative. Our results clarify Mediator recruitment and binding to the genome, showing that Mediator binding to UASs is widespread, partially uncoupled from transcription, and mediated in part by TFIID. © 2016 The Authors.
RNA-Seq Alignment to Individualized Genomes Improves Transcript Abundance Estimates in Multiparent Populations

PubMed Central

Munger, Steven C.; Raghupathy, Narayanan; Choi, Kwangbom; Simons, Allen K.; Gatti, Daniel M.; Hinerfeld, Douglas A.; Svenson, Karen L.; Keller, Mark P.; Attie, Alan D.; Hibbs, Matthew A.; Graber, Joel H.; Chesler, Elissa J.; Churchill, Gary A.

2014-01-01

Massively parallel RNA sequencing (RNA-seq) has yielded a wealth of new insights into transcriptional regulation. A first step in the analysis of RNA-seq data is the alignment of short sequence reads to a common reference genome or transcriptome. Genetic variants that distinguish individual genomes from the reference sequence can cause reads to be misaligned, resulting in biased estimates of transcript abundance. Fine-tuning of read alignment algorithms does not correct this problem. We have developed Seqnature software to construct individualized diploid genomes and transcriptomes for multiparent populations and have implemented a complete analysis pipeline that incorporates other existing software tools. We demonstrate in simulated and real data sets that alignment to individualized transcriptomes increases read mapping accuracy, improves estimation of transcript abundance, and enables the direct estimation of allele-specific expression. Moreover, when applied to expression QTL mapping we find that our individualized alignment strategy corrects false-positive linkage signals and unmasks hidden associations. We recommend the use of individualized diploid genomes over reference sequence alignment for all applications of high-throughput sequencing technology in genetically diverse populations. PMID:25236449
A genome-wide SNP scan accelerates trait-regulatory genomic loci identification in chickpea

PubMed Central

Kujur, Alice; Bajaj, Deepak; Upadhyaya, Hari D.; Das, Shouvik; Ranjan, Rajeev; Shree, Tanima; Saxena, Maneesha S.; Badoni, Saurabh; Kumar, Vinod; Tripathi, Shailesh; Gowda, C.L.L.; Sharma, Shivali; Singh, Sube; Tyagi, Akhilesh K.; Parida, Swarup K.

2015-01-01

We identified 44844 high-quality SNPs by sequencing 92 diverse chickpea accessions belonging to a seed and pod trait-specific association panel using reference genome- and de novo-based GBS (genotyping-by-sequencing) assays. A GWAS (genome-wide association study) in an association panel of 211, including the 92 sequenced accessions, identified 22 major genomic loci showing significant association (explaining 23–47% phenotypic variation) with pod and seed number/plant and 100-seed weight. Eighteen trait-regulatory major genomic loci underlying 13 robust QTLs were validated and mapped on an intra-specific genetic linkage map by QTL mapping. A combinatorial approach of GWAS, QTL mapping and gene haplotype-specific LD mapping and transcript profiling uncovered one superior haplotype and favourable natural allelic variants in the upstream regulatory region of a CesA-type cellulose synthase (Ca_Kabuli_CesA3) gene regulating high pod and seed number/plant (explaining 47% phenotypic variation) in chickpea. The up-regulation of this superior gene haplotype correlated with increased transcript expression of Ca_Kabuli_CesA3 gene in the pollen and pod of high pod/seed number accession, resulting in higher cellulose accumulation for normal pollen and pollen tube growth. A rapid combinatorial genome-wide SNP genotyping-based approach has potential to dissect complex quantitative agronomic traits and delineate trait-regulatory genomic loci (candidate genes) for genetic enhancement in crop plants, including chickpea. PMID:26058368
RNA-Seq analysis of yak ovary: improving yak gene structure information and mining reproduction-related genes.

PubMed

Lan, DaoLiang; Xiong, XianRong; Wei, YanLi; Xu, Tong; Zhong, JinCheng; Zhi, XiangDong; Wang, Yong; Li, Jian

2014-09-01

RNA-Seq, a high-throughput (HT) sequencing technique, has been used effectively in large-scale transcriptomic studies, and is particularly useful for improving gene structure information and mining of new genes. In this study, RNA-Seq HT technology was employed to analyze the transcriptome of yak ovary. After Illumina-Solexa deep sequencing, 26826516 clean reads with a total of 4828772880 bp were obtained from the ovary library. Alignment analysis showed that 16992 yak genes mapped to the yak genome and 3734 of these genes were involved in alternative splicing. Gene structure refinement analysis showed that 7340 genes that were annotated in the yak genome could be extended at the 5' or 3' ends based on the alignments between the transcripts and the genome sequence. Novel transcript prediction analysis identified 6321 new transcripts with lengths ranging from 180 to 14884 bp, and 2267 of them were predicted to code proteins. BLAST analysis of the new transcripts showed that 1200–4933 mapped to the non-redundant (nr), nucleotide (nt) and/or SwissProt sequence databases. Comparative statistical analysis of the new mapped transcripts showed that the majority of them were similar to genes in Bos taurus (41.4%), Bos grunniens mutus (33.0%), Ovis aries (6.3%), Homo sapiens (2.8%), Mus musculus (1.6%) and other species. Functional analysis showed that these expressed genes were involved in various Gene Ontology (GO) categories and Kyoto Encyclopedia of Genes and Genomes pathways. GO analysis of the new transcripts found that the largest proportion of them was associated with reproduction. The results of this study will provide a basis for describing the normal transcriptome map of yak ovary and for future studies on yak breeding performance. Moreover, the results confirmed that RNA-Seq HT technology is highly advantageous in improving gene structure information and mining of new genes, as well as in providing valuable data to expand the yak genome information.
Genome-wide mapping of autonomous promoter activity in human cells

PubMed Central

van Arensbergen, Joris; FitzPatrick, Vincent D.; de Haas, Marcel; Pagie, Ludo; Sluimer, Jasper; Bussemaker, Harmen J.; van Steensel, Bas

2017-01-01

Previous methods to systematically characterize sequence-intrinsic activity of promoters have been limited by relatively low throughput and the length of sequences that could be tested. Here we present Survey of Regulatory Elements (SuRE), a method to assay more than 10^8 DNA fragments, each 0.2–2kb in size, for their ability to drive transcription autonomously. In SuRE, a plasmid library is constructed of random genomic fragments upstream of a 20bp barcode and decoded by paired-end sequencing. This library is then transfected into cells and transcribed barcodes are quantified in the RNA by high throughput sequencing. When applied to the human genome, we achieved a 55-fold genome coverage, allowing us to map autonomous promoter activity genome-wide. By computational modeling we delineated subregions within promoters that are relevant for their activity. For instance, we show that antisense promoter transcription is generally dependent on the sense core promoter sequences, and that most enhancers and several families of repetitive elements act as autonomous transcription initiation sites. PMID:28024146
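The barcode quantification step described in the abstract above might be sketched as follows. The abstract does not give the normalization used, so dividing RNA barcode counts by plasmid-library barcode counts is an assumption here, and the function name and inputs are hypothetical.

```python
from collections import Counter


def barcode_activity(plasmid_barcodes, rna_barcodes):
    """Estimate per-barcode activity as RNA count / plasmid count.

    plasmid_barcodes: barcode strings read from the plasmid library.
    rna_barcodes: barcode strings read from RNA of transfected cells.
    Barcodes seen only in RNA are ignored, because they cannot be
    linked to a decoded genomic fragment (an assumption of this sketch).
    """
    plasmid_counts = Counter(plasmid_barcodes)
    rna_counts = Counter(rna_barcodes)  # missing barcodes count as 0
    return {
        barcode: rna_counts[barcode] / n_plasmid
        for barcode, n_plasmid in plasmid_counts.items()
    }


activity = barcode_activity(
    ["AAA", "AAA", "CCC"],
    ["AAA", "AAA", "AAA", "AAA", "GGG"],
)
```

In this toy input, barcode "AAA" is transcribed at twice its library abundance, "CCC" is silent, and "GGG" is dropped because it has no library entry.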
Poly A- Transcripts Expressed in HeLa Cells

PubMed Central

Lu, Jian; Xuan, Zhenyu; Chen, Jun; Zheng, Yonglan; Zhou, Tom; Zhang, Michael Q.; Wu, Chung-I; Wang, San Ming

2008-01-01

Background Transcripts expressed in eukaryotes are classified as poly A+ transcripts or poly A- transcripts based on the presence or absence of the 3′ poly A tail. Most transcripts identified so far are poly A+ transcripts, whereas the poly A- transcripts remain largely unknown. Methodology/Principal Findings We developed the TRD (Total RNA Detection) system for transcript identification. The system detects the transcripts through the following steps: 1) depleting the abundant ribosomal and small-size transcripts; 2) synthesizing cDNA without regard to the status of the 3′ poly A tail; 3) applying the 454 sequencing technology for massive 3′ EST collection from the cDNA; and 4) determining the genome origins of the detected transcripts by mapping the sequences to the human genome reference sequences. Using this system, we characterized the cytoplasmic transcripts from HeLa cells. Of the 13,467 distinct 3′ ESTs analyzed, 24% are poly A-, 36% are poly A+, and 40% are bimorphic with poly A+ features but without the 3′ poly A tail. Most of the poly A- 3′ ESTs do not match known transcript sequences; they have a similar distribution pattern in the genome as the poly A+ and bimorphic 3′ ESTs, and their mapped intergenic regions are evolutionarily conserved. Experiments confirmed the authenticity of the detected poly A- transcripts. Conclusion/Significance Our study provides the first large-scale sequence evidence for the presence of poly A- transcripts in eukaryotes. The abundance of the poly A- transcripts highlights the need for comprehensive identification of these transcripts for decoding the transcriptome, annotating the genome and studying biological relevance of the poly A- transcripts. PMID:18665230
Genome-wide Analysis Reveals Extensive Functional Interaction between DNA Replication Initiation and Transcription in the Genome of Trypanosoma brucei

PubMed Central

Tiengwe, Calvin; Marcello, Lucio; Farr, Helen; Dickens, Nicholas; Kelly, Steven; Swiderski, Michal; Vaughan, Diane; Gull, Keith; Barry, J. David; Bell, Stephen D.; McCulloch, Richard

2012-01-01

Summary Identification of replication initiation sites, termed origins, is a crucial step in understanding genome transmission in any organism. Transcription of the Trypanosoma brucei genome is highly unusual, with each chromosome comprising a few discrete transcription units. To understand how DNA replication occurs in the context of such organization, we have performed genome-wide mapping of the binding sites of the replication initiator ORC1/CDC6 and have identified replication origins, revealing that both localize to the boundaries of the transcription units. A remarkably small number of active origins is seen, whose spacing is greater than in any other eukaryote. We show that replication and transcription in T. brucei have a profound functional overlap, as reducing ORC1/CDC6 levels leads to genome-wide increases in mRNA levels arising from the boundaries of the transcription units. In addition, ORC1/CDC6 loss causes derepression of silent Variant Surface Glycoprotein genes, which are critical for host immune evasion. PMID:22840408
Hematopoietic transcriptional mechanisms: from locus-specific to genome-wide vantage points.

PubMed

DeVilbiss, Andrew W; Sanalkumar, Rajendran; Johnson, Kirby D; Keles, Sunduz; Bresnick, Emery H

2014-08-01

Hematopoiesis is an exquisitely regulated process in which stem cells in the developing embryo and the adult generate progenitor cells that give rise to all blood lineages. Master regulatory transcription factors control hematopoiesis by integrating signals from the microenvironment and dynamically establishing and maintaining genetic networks. One of the most rudimentary aspects of cell type-specific transcription factor function, how they occupy a highly restricted cohort of cis-elements in chromatin, remains poorly understood. Transformative technologic advances involving the coupling of next-generation DNA sequencing technology with the chromatin immunoprecipitation assay (ChIP-seq) have enabled genome-wide mapping of factor occupancy patterns. However, formidable problems remain; notably, ChIP-seq analysis yields hundreds to thousands of chromatin sites occupied by a given transcription factor, and only a fraction of the sites appear to be endowed with critical, non-redundant function. It has become en vogue to map transcription factor occupancy patterns genome-wide, while using powerful statistical tools to establish correlations to inform biology and mechanisms. With the advent of revolutionary genome editing technologies, one can now reach beyond correlations to conduct definitive hypothesis testing. This review focuses on key discoveries that have emerged during the path from single loci to genome-wide analyses, specifically in the context of hematopoietic transcriptional mechanisms. Copyright © 2014 ISEH - International Society for Experimental Hematology. Published by Elsevier Inc. All rights reserved.
A Genome-wide Combinatorial Strategy Dissects Complex Genetic Architecture of Seed Coat Color in Chickpea

PubMed Central

Bajaj, Deepak; Das, Shouvik; Upadhyaya, Hari D.; Ranjan, Rajeev; Badoni, Saurabh; Kumar, Vinod; Tripathi, Shailesh; Gowda, C. L. Laxmipathi; Sharma, Shivali; Singh, Sube; Tyagi, Akhilesh K.; Parida, Swarup K.

2015-01-01

The study identified 9045 high-quality SNPs employing both genome-wide GBS- and candidate gene-based SNP genotyping assays in 172, including 93 cultivated (desi and kabuli) and 79 wild chickpea accessions. The GWAS in a structured population of 93 sequenced accessions detected 15 major genomic loci exhibiting significant association with seed coat color. Five seed color-associated major genomic loci underlying robust QTLs mapped on a high-density intra-specific genetic linkage map were validated by QTL mapping. The integration of association and QTL mapping with gene haplotype-specific LD mapping and transcript profiling identified novel allelic variants (non-synonymous SNPs) and haplotypes in a MATE secondary transporter gene regulating light/yellow brown and beige seed coat color differentiation in chickpea. The down-regulation and decreased transcript expression of beige seed coat color-associated MATE gene haplotype was correlated with reduced proanthocyanidins accumulation in the mature seed coats of beige than light/yellow brown seed colored desi and kabuli accessions for their coloration/pigmentation. This seed color-regulating MATE gene revealed strong purifying selection pressure primarily in LB/YB seed colored desi and wild Cicer reticulatum accessions compared with the BE seed colored kabuli accessions. The functionally relevant molecular tags identified have potential to decipher the complex transcriptional regulatory gene function of seed coat coloration and for understanding the selective sweep-based seed color trait evolutionary pattern in cultivated and wild accessions during chickpea domestication. The genome-wide integrated approach employed will expedite marker-assisted genetic enhancement for developing cultivars with desirable seed coat color types in chickpea. PMID:26635822

A high-density transcript linkage map with 1,845 expressed genes positioned by microarray-based Single Feature Polymorphisms (SFP) in Eucalyptus

PubMed Central

2011-01-01

Background Technological advances are progressively increasing the application of genomics to a wider array of economically and ecologically important species. High-density maps enriched for transcribed genes facilitate the discovery of connections between genes and phenotypes. We report the construction of a high-density linkage map of expressed genes for the heterozygous genome of Eucalyptus using Single Feature Polymorphism (SFP) markers. Results SFP discovery and mapping were achieved using pseudo-testcross screening and selective mapping to simultaneously optimize linkage mapping and microarray costs. SFP genotyping was carried out by hybridizing complementary RNA prepared from the xylem of 4.5-year-old trees to an SFP array containing 103,000 25-mer oligonucleotide probes representing 20,726 unigenes derived from a modest-sized expressed sequence tag collection. An SFP-mapping microarray with 43,777 selected candidate SFP probes representing 15,698 genes was subsequently designed and used to genotype SFPs in a larger subset of the segregating population drawn by selective mapping. A total of 1,845 genes were mapped, with 884 of them ordered with high likelihood support on a framework map anchored to 180 microsatellites with an average density of 1.2 cM. Using more probes per unigene increased by two-fold the likelihood of detecting segregating SFPs, eventually resulting in more genes mapped. In silico validation showed that 87% of the SFPs map to the expected location on the 4.5X draft sequence of the Eucalyptus grandis genome. Conclusions The Eucalyptus 1,845 gene map is the most highly enriched map for transcriptional information for any forest tree species to date. It represents a major improvement on the number of genes previously positioned on Eucalyptus maps and provides an initial glimpse at the gene space for this global tree genome. A general protocol is proposed to build high-density transcript linkage maps in less characterized plant species by SFP genotyping with a concurrent objective of reducing microarray costs. High-density gene-rich maps represent a powerful resource to assist gene discovery endeavors when used in combination with QTL and association mapping and should be especially valuable to assist the assembly of reference genome sequences soon to come for several plant and animal species. PMID:21492453
Pervasive, Genome-Wide Transcription in the Organelle Genomes of Diverse Plastid-Bearing Protists.

PubMed

Sanitá Lima, Matheus; Smith, David Roy

2017-11-06

Organelle genomes are among the most sequenced kinds of chromosome. This is largely because they are small and widely used in molecular studies, but also because next-generation sequencing technologies made sequencing easier, faster, and cheaper. However, studies of organelle RNA have not kept pace with those of DNA, despite huge amounts of freely available eukaryotic RNA-sequencing (RNA-seq) data. Little is known about organelle transcription in nonmodel species, and most of the available eukaryotic RNA-seq data have not been mined for organelle transcripts. Here, we use publicly available RNA-seq experiments to investigate organelle transcription in 30 diverse plastid-bearing protists with varying organelle genomic architectures. Mapping RNA-seq data to organelle genomes revealed pervasive, genome-wide transcription, regardless of the taxonomic grouping, gene organization, or noncoding content. For every species analyzed, transcripts covered ≥85% of the mitochondrial and/or plastid genomes (all of which were ≤105 kb), indicating that most of the organelle DNA (coding and noncoding) is transcriptionally active. These results follow earlier studies of model species showing that organellar transcription is coupled and ubiquitous across the genome, requiring significant downstream processing of polycistronic transcripts. Our findings suggest that noncoding organelle DNA can be transcriptionally active, raising questions about the underlying function of these transcripts and underscoring the utility of publicly available RNA-seq data for recovering complete genome sequences. If pervasive transcription is also found in bigger organelle genomes (>105 kb) and across a broader range of eukaryotes, this could indicate that noncoding organelle RNAs are regulating fundamental processes within eukaryotic cells. Copyright © 2017 Sanitá Lima and Smith.
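The "transcripts covered ≥85%" figure above is a breadth-of-coverage measure. As a minimal sketch of how such a fraction can be computed from read alignments (the function name, the half-open 0-based interval convention and the toy data are assumptions for illustration, not the authors' pipeline):

```python
def breadth_of_coverage(intervals, genome_length):
    """Fraction of genome positions covered by at least one alignment.

    intervals: iterable of (start, end) spans, 0-based and half-open.
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        # Clip each alignment to the genome boundaries.
        start, end = max(start, 0), min(end, genome_length)
        if start >= end:
            continue
        if current_end is None or start > current_end:
            # Gap before this alignment: close the previous merged block.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent alignment: extend the merged block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length


# Hypothetical alignments on a 100-bp toy genome.
toy_alignments = [(0, 40), (30, 70), (80, 95)]
print(f"{breadth_of_coverage(toy_alignments, 100):.2f}")
```

Overlapping alignments are merged before counting, so deep coverage of one region does not inflate the fraction.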
[Exon-intron structure of the fet5+ gene of Schizosaccharomyces pombe and physical mapping of genome encompassing regions].

PubMed

Shpakovskiĭ, G V; Lebedenko, E N

1998-01-01

Plasmid pYUK3 bearing the fet5+ gene of Schizosaccharomyces pombe was isolated from a genomic library of the fission yeast, and a detailed physical map of the whole genomic insert (ca. 9.6 kbp) was constructed. The primary structure of the fet5+ gene and its flanking regions was established. The gene contains a single 45-bp intron in its distal part. A typical TATA-box (TATAAG) was found in the 5'-noncoding region ca. 50 bp upstream of the putative start of transcription, and the 3'-noncoding region contains AT-rich palindromes, which are probably involved in termination of fet5+ transcription. A previously unidentified gene of Sz. pombe encoding a protein with some similarity to one of the transcriptional activators from the TBP (TATA-binding protein) group of SPT factors of transcription was found in the vicinity of the fet5+ gene. Taking into account that cDNA of the fet5+ gene was isolated as a suppressor of a genetic defect of nuclear RNA polymerases I-III (Bioorg. Khim., 1997, vol. 23, No 3, pp. 234-237), this proximity may be the first evidence of possible clustering, in the genome of the fission yeast, of genes participating in transcription regulation.
Cloning and Characterization of the Scalloped Region of Drosophila Melanogaster

PubMed Central

Campbell, S. D.; Duttaroy, A.; Katzen, A. L.; Chovnick, A.

1991-01-01

Viable mutants of the scalloped gene (sd) of Drosophila melanogaster exhibit defects that can include gapping of the wing margin and ectopic bristle formation on the wing. Lethal sd alleles characterized in the present study now implicate this gene in a genetic function essential for normal development. In order to further characterize the developmental role of this gene, we have undertaken to clone and characterize the region where sd maps. A P[ry(+)] transposon insertion at 13F associated with sd([ry+2216]) served as the starting point for a 42-kb chromosomal walk. Molecular lesions associated with viable and lethal sd alleles were characterized by genomic hybridization analysis as a means of defining the extent of the gene. DNA rearrangements associated with 11 viable sd alleles map to a 2-kb interval which appears to be a "hot spot" for P element activity. Four of five recessive lethal sd mutations were mapped by denaturing gradient gel electrophoresis to a region 12-14 kb away from the region of viable lesions. In a sd(+) genotype, at least two structurally related and developmentally regulated transcripts hybridize to the genomic region where several sd lethal alleles have been localized. A viable mutation, sd(58), used for comparison in the transcript analysis, makes at least two slightly smaller transcripts that also hybridize to this region. Preliminary analysis of cDNA clones has identified three structurally related transcripts that hybridize to this genomic region. The 5' end of these transcripts extends into the 2-kb genomic region wherein DNA rearrangements were seen in the P element rearrangements. We favor the view that the transcripts represented by these cDNA clones are products of the sd gene. If this is true, the sd gene would include genomic sequences extending over at least 14 kb of the described chromosomal walk, and would appear to be subject to alternative splicing. PMID:1706292
Creating and validating cis-regulatory maps of tissue-specific gene expression regulation

PubMed Central

O'Connor, Timothy R.; Bailey, Timothy L.

2014-01-01

Predicting which genomic regions control the transcription of a given gene is a challenge. We present a novel computational approach for creating and validating maps that associate genomic regions (cis-regulatory modules–CRMs) with genes. The method infers regulatory relationships that explain gene expression observed in a test tissue using widely available genomic data for ‘other’ tissues. To predict the regulatory targets of a CRM, we use cross-tissue correlation between histone modifications present at the CRM and expression at genes within 1 Mbp of it. To validate cis-regulatory maps, we show that they yield more accurate models of gene expression than carefully constructed control maps. These gene expression models predict observed gene expression from transcription factor binding in the CRMs linked to that gene. We show that our maps are able to identify long-range regulatory interactions and improve substantially over maps linking genes and CRMs based on either the control maps or a ‘nearest neighbor’ heuristic. Our results also show that it is essential to include CRMs predicted in multiple tissues during map-building, that H3K27ac is the most informative histone modification, and that CAGE is the most informative measure of gene expression for creating cis-regulatory maps. PMID:25200088
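The linking step described above (cross-tissue correlation between a histone mark at a CRM and expression of genes within 1 Mbp) can be sketched roughly as follows. This is only an illustration: the record layout, the `min_r` cutoff and the toy values are hypothetical and are not taken from the paper.

```python
from math import sqrt

MAX_DISTANCE = 1_000_000  # 1 Mbp window, as stated in the abstract


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / sqrt(sxx * syy)


def link_crms_to_genes(crms, genes, min_r=0.5):
    """Link each CRM to nearby genes whose expression tracks its signal.

    min_r is a hypothetical cutoff chosen for this sketch.
    """
    links = []
    for crm in crms:
        for gene in genes:
            if crm["chrom"] != gene["chrom"]:
                continue
            if abs(crm["pos"] - gene["tss"]) > MAX_DISTANCE:
                continue
            r = pearson(crm["signal"], gene["expression"])
            if r >= min_r:
                links.append((crm["id"], gene["id"], round(r, 3)))
    return links


# Toy data: one value per tissue (four hypothetical tissues).
crms = [{"id": "crm1", "chrom": "chr1", "pos": 500_000, "signal": [1, 2, 3, 4]}]
genes = [
    {"id": "geneA", "chrom": "chr1", "tss": 900_000, "expression": [2, 4, 6, 8]},
    {"id": "geneB", "chrom": "chr1", "tss": 5_000_000, "expression": [2, 4, 6, 8]},
    {"id": "geneC", "chrom": "chr1", "tss": 600_000, "expression": [4, 3, 2, 1]},
]
print(link_crms_to_genes(crms, genes))
```

Unlike a nearest-neighbor heuristic, this kind of rule can link a CRM to a more distant gene (geneA) while skipping a closer one whose expression does not follow the mark (geneC).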
High-confidence coding and noncoding transcriptome maps

PubMed Central

2017-01-01

The advent of high-throughput RNA sequencing (RNA-seq) has led to the discovery of unprecedentedly immense transcriptomes encoded by eukaryotic genomes. However, the transcriptome maps are still incomplete partly because they were mostly reconstructed based on RNA-seq reads that lack their orientations (known as unstranded reads) and certain boundary information. Methods to expand the usability of unstranded RNA-seq data by predetermining the orientation of the reads and precisely determining the boundaries of assembled transcripts could significantly benefit the quality of the resulting transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline, called CAFE, that significantly improves the original assemblies, respectively assembled with stranded and/or unstranded RNA-seq data, by orienting unstranded reads using the maximum likelihood estimation and by integrating information about transcription start sites and cleavage and polyadenylation sites. Applying large-scale transcriptomic data comprising 230 billion RNA-seq reads from the ENCODE, Human BodyMap 2.0, The Cancer Genome Atlas, and GTEx projects, CAFE enabled us to predict the directions of about 220 billion unstranded reads, which led to the construction of more accurate transcriptome maps, comparable to the manually curated map, and a comprehensive lncRNA catalog that includes thousands of novel lncRNAs. Our pipeline should not only help to build comprehensive, precise transcriptome maps from complex genomes but also to expand the universe of noncoding genomes. PMID:28396519
Recent Advancements in DNA Damage-Transcription Crosstalk and High-Resolution Mapping of DNA Breaks.

PubMed

Vitelli, Valerio; Galbiati, Alessandro; Iannelli, Fabio; Pessina, Fabio; Sharma, Sheetal; d'Adda di Fagagna, Fabrizio

2017-08-31

Until recently, DNA damage arising from physiological DNA metabolism was considered a detrimental by-product for cells. However, an increasing amount of evidence has shown that DNA damage could have a positive role in transcription activation. In particular, DNA damage has been detected in transcriptional elements following different stimuli. These physiological DNA breaks are thought to be instrumental for the correct expression of genomic loci through different mechanisms. In this regard, although a plethora of methods are available to precisely map transcribed regions and transcription start sites, commonly used techniques for mapping DNA breaks lack sufficient resolution and sensitivity to draw a robust correlation between DNA damage generation and transcription. Recently, however, several methods have been developed to map DNA damage at single-nucleotide resolution, thus providing a new set of tools to correlate DNA damage and transcription. Here, we review how DNA damage can positively regulate transcription initiation, the current techniques for mapping DNA breaks at high resolution, and how these techniques can benefit future studies of DNA damage and transcription.
Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project.

PubMed

Gerstein, Mark B; Lu, Zhi John; Van Nostrand, Eric L; Cheng, Chao; Arshinoff, Bradley I; Liu, Tao; Yip, Kevin Y; Robilotto, Rebecca; Rechtsteiner, Andreas; Ikegami, Kohta; Alves, Pedro; Chateigner, Aurelien; Perry, Marc; Morris, Mitzi; Auerbach, Raymond K; Feng, Xin; Leng, Jing; Vielle, Anne; Niu, Wei; Rhrissorrakrai, Kahn; Agarwal, Ashish; Alexander, Roger P; Barber, Galt; Brdlik, Cathleen M; Brennan, Jennifer; Brouillet, Jeremy Jean; Carr, Adrian; Cheung, Ming-Sin; Clawson, Hiram; Contrino, Sergio; Dannenberg, Luke O; Dernburg, Abby F; Desai, Arshad; Dick, Lindsay; Dosé, Andréa C; Du, Jiang; Egelhofer, Thea; Ercan, Sevinc; Euskirchen, Ghia; Ewing, Brent; Feingold, Elise A; Gassmann, Reto; Good, Peter J; Green, Phil; Gullier, Francois; Gutwein, Michelle; Guyer, Mark S; Habegger, Lukas; Han, Ting; Henikoff, Jorja G; Henz, Stefan R; Hinrichs, Angie; Holster, Heather; Hyman, Tony; Iniguez, A Leo; Janette, Judith; Jensen, Morten; Kato, Masaomi; Kent, W James; Kephart, Ellen; Khivansara, Vishal; Khurana, Ekta; Kim, John K; Kolasinska-Zwierz, Paulina; Lai, Eric C; Latorre, Isabel; Leahey, Amber; Lewis, Suzanna; Lloyd, Paul; Lochovsky, Lucas; Lowdon, Rebecca F; Lubling, Yaniv; Lyne, Rachel; MacCoss, Michael; Mackowiak, Sebastian D; Mangone, Marco; McKay, Sheldon; Mecenas, Desirea; Merrihew, Gennifer; Miller, David M; Muroyama, Andrew; Murray, John I; Ooi, Siew-Loon; Pham, Hoang; Phippen, Taryn; Preston, Elicia A; Rajewsky, Nikolaus; Rätsch, Gunnar; Rosenbaum, Heidi; Rozowsky, Joel; Rutherford, Kim; Ruzanov, Peter; Sarov, Mihail; Sasidharan, Rajkumar; Sboner, Andrea; Scheid, Paul; Segal, Eran; Shin, Hyunjin; Shou, Chong; Slack, Frank J; Slightam, Cindie; Smith, Richard; Spencer, William C; Stinson, E O; Taing, Scott; Takasaki, Teruaki; Vafeados, Dionne; Voronina, Ksenia; Wang, Guilin; Washington, Nicole L; Whittle, Christina M; Wu, Beijing; Yan, Koon-Kiu; Zeller, Georg; Zha, Zheng; Zhong, Mei; Zhou, Xingliang; Ahringer, Julie; Strome, Susan; Gunsalus, Kristin C; Micklem, Gos; Liu, X Shirley; Reinke, Valerie; Kim, Stuart K; Hillier, LaDeana W; Henikoff, Steven; Piano, Fabio; Snyder, Michael; Stein, Lincoln; Lieb, Jason D; Waterston, Robert H

2010-12-24

We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor-binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor-binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.
Identification of hypertension-related genes through an integrated genomic-transcriptomic approach.

PubMed

Yagil, Chana; Hubner, Norbert; Monti, Jan; Schulz, Herbert; Sapojnikov, Marina; Luft, Friedrich C; Ganten, Detlev; Yagil, Yoram

2005-04-01

In search of the genetic basis of hypertension, we applied an integrated genomic-transcriptomic approach to identify genes involved in the pathogenesis of hypertension in the Sabra rat model of salt-susceptibility. In the genomic arm of the project, we previously detected in male rats two salt-susceptibility QTLs on chromosome 1, SS1a (D1Mgh2-D1Mit11; span 43.1 cM) and SS1b (D1Mit11-D1Mit4; span 18 cM). In the transcriptomic arm, we studied differential gene expression in kidneys of SBH/y and SBN/y rats that had been fed regular diet or salt-loaded. We used the Affymetrix Rat Genome RAE230 GeneChip and probed >30,000 transcripts. The research algorithm called for an initial genome-wide screen for differentially expressed transcripts between the study groups. This step was followed by cluster analysis based on 2x2 ANOVA to identify transcripts that were of relevance specifically to salt-sensitivity and hypertension and to salt-resistance. The two arms of the project were integrated by identifying those differentially expressed transcripts that showed an allele-specific hypertensive effect on salt-loading and that mapped within the defined boundaries of the salt-susceptibility QTLs on chromosome 1. The differentially expressed transcripts were confirmed by RT-PCR. Of the 2933 genes annotated to rat chromosome 1, 1102 genes were identified within the boundaries of the two blood pressure QTLs. The microarray identified 2470 transcripts that were differentially expressed between the study groups. Cluster analysis identified genome-wide 192 genes that were relevant to salt-susceptibility and/or hypertension, 19 of which mapped to chromosome 1. Eight of these genes mapped within the boundaries of QTLs SS1a and SS1b. RT-PCR confirmed 7 genes, leaving TcTex1, Myadm, Lisch7, Axl-like, Fah, PRC1-like, and Serpinh1. None of these genes has been implicated in hypertension before. These genes henceforth become targets for our continuing search for the genetic basis of hypertension.
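The integration step above amounts to intersecting the list of differentially expressed transcripts with the QTL boundaries. A minimal sketch under stated assumptions: the interval names, map positions and gene labels below are hypothetical placeholders, not the study's data, and positions are treated as a single map coordinate per gene.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    name: str
    chrom: str
    start: float  # map position of the proximal flanking marker
    end: float    # map position of the distal flanking marker


def genes_in_qtls(de_genes, qtls):
    """Return {gene: qtl_name} for differentially expressed genes inside a QTL.

    de_genes maps a gene label to (chromosome, map position).
    """
    hits = {}
    for gene, (chrom, pos) in de_genes.items():
        for qtl in qtls:
            if chrom == qtl.chrom and qtl.start <= pos <= qtl.end:
                hits[gene] = qtl.name
                break  # report the first matching interval only
    return hits


# Hypothetical QTL boundaries and gene positions (illustration only).
qtls = [Interval("QTL_A", "1", 10.0, 53.1), Interval("QTL_B", "1", 60.0, 78.0)]
de_genes = {
    "g1": ("1", 20.0),   # inside QTL_A
    "g2": ("1", 90.0),   # chromosome 1, outside both intervals
    "g3": ("2", 20.0),   # other chromosome
    "g4": ("1", 65.0),   # inside QTL_B
}
print(genes_in_qtls(de_genes, qtls))
```

Candidates passing this filter would still need independent confirmation, which the study performed by RT-PCR.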
A combinatorial approach of comprehensive QTL-based comparative genome mapping and transcript profiling identified a seed weight-regulating candidate gene in chickpea

PubMed Central

Bajaj, Deepak; Upadhyaya, Hari D.; Khan, Yusuf; Das, Shouvik; Badoni, Saurabh; Shree, Tanima; Kumar, Vinod; Tripathi, Shailesh; Gowda, C. L. L.; Singh, Sube; Sharma, Shivali; Tyagi, Akhilesh K.; Chattopdhyay, Debasis; Parida, Swarup K.

2015-01-01

A high experimental validation/genotyping success rate (94–96%) and intra-specific polymorphic potential (82–96%) were obtained for 1536 SNP and 472 SSR markers showing in silico polymorphism between desi ICC 4958 and kabuli ICC 12968 chickpea, in a 190-member mapping population (ICC 4958 × ICC 12968) and 92 diverse desi and kabuli genotypes. The high-density intra-specific genetic linkage map constructed from 2001 markers comprises eight LGs and is considerably more saturated (mean map density: 0.94 cM) than existing intra-specific genetic maps in chickpea. Fifteen robust QTLs (PVE: 8.8–25.8% with LOD: 7.0–13.8) associated with pod and seed number/plant (PN and SN) and 100 seed weight (SW) were identified and mapped on 10 major genomic regions of eight LGs. One major 126.8 kb genomic region harbouring a strong SW-associated robust QTL (Caq'SW1.1: 169.1–171.3 cM) was delineated by integrating high-resolution QTL mapping with comprehensive marker-based comparative genome mapping and differential expression profiling. This identified one potential regulatory SNP (G/A) in the cis-acting element of a candidate ERF (ethylene responsive factor) TF (transcription factor) gene governing seed weight in chickpea. The functionally relevant molecular tags identified have the potential to be utilized for marker-assisted genetic improvement of chickpea. PMID:25786576
RNA-Seq Based Transcriptional Map of Bovine Respiratory Disease Pathogen “Histophilus somni 2336”

PubMed Central

Kumar, Ranjit; Lawrence, Mark L.; Watt, James; Cooksey, Amanda M.; Burgess, Shane C.; Nanduri, Bindu

2012-01-01

Genome structural annotation, i.e., identification and demarcation of the boundaries for all the functional elements in a genome (e.g., genes, non-coding RNAs, proteins and regulatory elements), is a prerequisite for systems level analysis. Current genome annotation programs do not identify all of the functional elements of the genome, especially small non-coding RNAs (sRNAs). Whole genome transcriptome analysis is a complementary method to identify “novel” genes, small RNAs, regulatory regions, and operon structures, thus improving the structural annotation in bacteria. In particular, the identification of non-coding RNAs has revealed their widespread occurrence and functional importance in gene regulation, stress and virulence. However, very little is known about non-coding transcripts in Histophilus somni, one of the causative agents of Bovine Respiratory Disease (BRD) as well as bovine infertility, abortion, septicemia, arthritis, myocarditis, and thrombotic meningoencephalitis. In this study, we report a single nucleotide resolution transcriptome map of H. somni strain 2336 using RNA-Seq method. The RNA-Seq based transcriptome map identified 94 sRNAs in the H. somni genome of which 82 sRNAs were never predicted or reported in earlier studies. We also identified 38 novel potential protein coding open reading frames that were absent in the current genome annotation. The transcriptome map allowed the identification of 278 operon (total 730 genes) structures in the genome. When compared with the genome sequence of a non-virulent strain 129Pt, a disproportionate number of sRNAs (∼30%) were located in genomic region unique to strain 2336 (∼18% of the total genome). This observation suggests that a number of the newly identified sRNAs in strain 2336 may be involved in strain-specific adaptations. PMID:22276113
A draft annotation and overview of the human genome

PubMed Central

Wright, Fred A; Lemon, William J; Zhao, Wei D; Sears, Russell; Zhuo, Degen; Wang, Jian-Ping; Yang, Hee-Yung; Baer, Troy; Stredney, Don; Spitzner, Joe; Stutz, Al; Krahe, Ralf; Yuan, Bo

2001-01-01

Background The recent draft assembly of the human genome provides a unified basis for describing genomic structure and function. The draft is sufficiently accurate to provide useful annotation, enabling direct observations of previously inferred biological phenomena. Results We report here a functionally annotated human gene index placed directly on the genome. The index is based on the integration of public transcript, protein, and mapping information, supplemented with computational prediction. We describe numerous global features of the genome and examine the relationship of various genetic maps with the assembly. In addition, initial sequence analysis reveals highly ordered chromosomal landscapes associated with paralogous gene clusters and distinct functional compartments. Finally, these annotation data were synthesized to produce observations of gene density and number that accord well with historical estimates. Such a global approach had previously been described only for chromosomes 21 and 22, which together account for 2.2% of the genome. Conclusions We estimate that the genome contains 65,000-75,000 transcriptional units, with exon sequences comprising 4%. The creation of a comprehensive gene index requires the synthesis of all available computational and experimental evidence. PMID:11516338
Updates to the Cool Season Food Legume Genome Database: Resources for pea, lentil, faba bean and chickpea genetics, genomics and breeding

USDA-ARS's Scientific Manuscript database

The Cool Season Food Legume Genome database (CSFL, www.coolseasonfoodlegume.org) is an online resource for genomics, genetics, and breeding research for chickpea, lentil, pea, and faba bean. The user-friendly and curated website allows for all publicly available map, marker, trait, gene, transcript, ger...
Identification and Classification of New Transcripts in Dorper and Small-Tailed Han Sheep Skeletal Muscle Transcriptomes.

PubMed

Chao, Tianle; Wang, Guizhi; Wang, Jianmin; Liu, Zhaohua; Ji, Zhibin; Hou, Lei; Zhang, Chunlan

2016-01-01

High-throughput mRNA sequencing enables the discovery of new transcripts and additional parts of incompletely annotated transcripts. Compared with the human and cow genomes, the reference annotation level of the sheep genome is still low. An investigation of new transcripts in sheep skeletal muscle will improve our understanding of muscle development. Therefore, applying high-throughput sequencing, two cDNA libraries from the biceps brachii of small-tailed Han sheep and Dorper sheep were constructed, and whole-transcriptome analysis was performed to determine the unknown transcript catalogue of this tissue. In this study, 40,129 transcripts were finally mapped to the sheep genome. Among them, 3,467 transcripts were determined to be unannotated in the current reference sheep genome and were defined as new transcripts. Based on protein-coding capacity prediction and comparative analysis of sequence similarity, 246 transcripts were classified as portions of unannotated genes or incompletely annotated genes. Another 1,520 transcripts were predicted with high confidence to be long non-coding RNAs. Our analysis also revealed 334 new transcripts that displayed specific expression in ruminants and uncovered a number of new transcripts without intergenus homology but with specific expression in sheep skeletal muscle. The results confirmed a complex transcript pattern of coding and non-coding RNA in sheep skeletal muscle. This study provided important information concerning the sheep genome and transcriptome annotation, which could provide a basis for further study.
Genome Maps, a new generation genome browser

PubMed Central

Medina, Ignacio; Salavert, Francisco; Sanchez, Rubén; de Maria, Alejandro; Alonso, Roberto; Escobar, Pablo; Bleda, Marta; Dopazo, Joaquín

2013-01-01

Genome browsers have gained importance as more genomes and related genomic information become available. However, the increase of information brought about by new generation sequencing technologies is, at the same time, causing a subtle but continuous decrease in the efficiency of conventional genome browsers. Here, we present Genome Maps, a genome browser that implements an innovative model of data transfer and management. The program uses highly efficient technologies from the new HTML5 standard, such as scalable vector graphics, that optimize workloads at both server and client sides and ensure future scalability. Thus, data management and representation are entirely carried out by the browser, without the need of any Java Applet, Flash or other plug-in technology installation. Relevant biological data on genes, transcripts, exons, regulatory features, single-nucleotide polymorphisms, karyotype and so forth, are imported from web services and are available as tracks. In addition, several DAS servers are already included in Genome Maps. As a novelty, this web-based genome browser allows the local upload of huge genomic data files (e.g. VCF or BAM) that can be dynamically visualized in real time at the client side, thus facilitating the management of medical data affected by privacy restrictions. Finally, Genome Maps can easily be integrated in any web application by including only a few lines of code. Genome Maps is an open source collaborative initiative available in the GitHub repository (https://github.com/compbio-bigdata-viz/genome-maps). Genome Maps is available at: http://www.genomemaps.org. PMID:23748955
Widespread antisense transcription of Populus genome under drought.

PubMed

Yuan, Yinan; Chen, Su

2018-06-06

Antisense transcription is widespread in many genomes and plays important regulatory roles in gene expression. The objective of our study was to investigate the extent and functional relevance of antisense transcription in forest trees. We employed Populus, a model tree species, to probe the antisense transcriptional response of tree genome under drought, through stranded RNA-seq analysis. We detected nearly 48% of annotated Populus gene loci with antisense transcripts and 44% of them with co-transcription from both DNA strands. Global distribution of reads pattern across annotated gene regions uncovered that antisense transcription was enriched in untranslated regions while sense reads were predominantly mapped in coding exons. We further detected 1185 drought-responsive sense and antisense gene loci and identified a strong positive correlation between the expression of antisense and sense transcripts. Additionally, we assessed the antisense expression in introns and found a strong correlation between intronic expression and exonic expression, confirming antisense transcription of introns contributes to transcriptional activity of Populus genome under drought. Finally, we functionally characterized drought-responsive sense-antisense transcript pairs through gene ontology analysis and discovered that functional groups including transcription factors and histones were concordantly regulated at both sense and antisense transcriptional level. Overall, our study demonstrated the extensive occurrence of antisense transcripts of Populus genes under drought and provided insights into genome structure, regulation pattern and functional significance of drought-responsive antisense genes in forest trees. Datasets generated in this study serve as a foundation for future genetic analysis to improve our understanding of gene regulation by antisense transcription.
AbsIDconvert: An absolute approach for converting genetic identifiers at different granularities

PubMed Central

2012-01-01

Background High-throughput molecular biology techniques yield vast amounts of data, often by detecting small portions of ribonucleotides corresponding to specific identifiers. Existing bioinformatic methodologies categorize and compare these elements using inferred descriptive annotation given this sequence information, irrespective of the fact that it may not be representative of the identifier as a whole. Results All annotations, no matter the granularity, can be aligned to genomic sequences and therefore annotated by genomic intervals. We have developed AbsIDconvert, a methodology for converting between genomic identifiers by first mapping them onto a common universal coordinate system using an interval tree which is subsequently queried for overlapping identifiers. AbsIDconvert has many potential uses, including gene identifier conversion, identification of features within a genomic region, and cross-species comparisons. The utility is demonstrated in three case studies: 1) a comparative genomic study mapping Plasmodium gene sequences to corresponding human and mosquito transcriptional regions; 2) a cross-species study of Incyte clone sequences; and 3) an analysis of human Ensembl transcripts mapped by Affymetrix® and Agilent microarray probes. AbsIDconvert currently supports ID conversion of 53 species for a given list of input identifiers, genomic sequence, or genome intervals. Conclusion AbsIDconvert provides an efficient and reliable mechanism for conversion between identifier domains of interest. The flexibility of this tool allows for custom definition of identifier domains contingent upon the availability and determination of a genomic mapping interval. As the genomes and the sequences for genetic elements are further refined, this tool will become increasingly useful and accurate. AbsIDconvert is freely available as a web application or downloadable as a virtual machine at: http://bioinformatics.louisville.edu/abid/. PMID:22967011
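The core idea above, converting identifiers by placing them on shared genomic coordinates and querying for overlaps, can be illustrated with a small sketch. Note the assumptions: a sorted list with a linear scan stands in for the interval tree the abstract mentions, intervals are 0-based and half-open, and all identifiers and coordinates are hypothetical. This is not AbsIDconvert's code or API.

```python
from collections import defaultdict


def build_index(features):
    """Group target features by chromosome, sorted by start coordinate.

    features maps an identifier to (chrom, start, end), half-open.
    """
    index = defaultdict(list)
    for ident, (chrom, start, end) in features.items():
        index[chrom].append((start, end, ident))
    for chrom in index:
        index[chrom].sort()
    return index


def convert(ident, source, target_index):
    """Return target identifiers whose intervals overlap the source identifier."""
    chrom, start, end = source[ident]
    hits = []
    for t_start, t_end, t_id in target_index.get(chrom, []):
        if t_start >= end:
            break  # sorted by start: no later feature can overlap
        if t_end > start:
            hits.append(t_id)
    return hits


# Hypothetical probes and transcripts on a shared coordinate system.
probes = {"probe_1": ("chr1", 100, 125), "probe_2": ("chr1", 5000, 5025)}
transcripts = {
    "tx_A": ("chr1", 50, 400),
    "tx_B": ("chr1", 110, 900),
    "tx_C": ("chr2", 100, 200),
}
tx_index = build_index(transcripts)
print(convert("probe_1", probes, tx_index))
print(convert("probe_2", probes, tx_index))
```

A real interval tree answers the same overlap query in logarithmic rather than linear time per lookup, which matters at genome scale.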
Epigenetics, chromatin and genome organization: recent advances from the ENCODE project.

PubMed

Siggens, L; Ekwall, K

2014-09-01

The organization of the genome into functional units, such as enhancers and active or repressed promoters, is associated with distinct patterns of DNA and histone modifications. The Encyclopedia of DNA Elements (ENCODE) project has advanced our understanding of the principles of genome, epigenome and chromatin organization, identifying hundreds of thousands of potential regulatory regions and transcription factor binding sites. Part of the ENCODE consortium, GENCODE, has annotated the human genome with novel transcripts including new noncoding RNAs and pseudogenes, highlighting transcriptional complexity. Many disease variants identified in genome-wide association studies are located within putative enhancer regions defined by the ENCODE project. Understanding the principles of chromatin and epigenome organization will help to identify new disease mechanisms, biomarkers and drug targets, particularly as ongoing epigenome mapping projects generate data for primary human cell types that play important roles in disease. © 2014 The Association for the Publication of the Journal of Internal Medicine.

Discovering transcription factor binding sites in highly repetitive regions of genomes with multi-read analysis of ChIP-Seq data.

PubMed

Chung, Dongjun; Kuan, Pei Fen; Li, Bo; Sanalkumar, Rajendran; Liang, Kun; Bresnick, Emery H; Dewey, Colin; Keleş, Sündüz

2011-07-01

Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip) as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The state of the art for analyzing ChIP-seq data relies on using only reads that map uniquely to a relevant reference genome (uni-reads). This can lead to the omission of up to 30% of alignable reads. We describe a general approach for utilizing reads that map to multiple locations on the reference genome (multi-reads). Our approach is based on allocating multi-reads as fractional counts using a weighted alignment scheme. Using human STAT1 and mouse GATA1 ChIP-seq datasets, we illustrate that incorporation of multi-reads significantly increases sequencing depths, leads to detection of novel peaks that are not otherwise identifiable with uni-reads, and improves detection of peaks in mappable regions. We investigate various genome-wide characteristics of peaks detected only by utilization of multi-reads via computational experiments. Overall, peaks from multi-read analysis have similar characteristics to peaks that are identified by uni-reads except that the majority of them reside in segmental duplications. We further validate a number of GATA1 multi-read only peaks by independent quantitative real-time ChIP analysis and identify novel target genes of GATA1. These computational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-seq experiments.
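
Illustrative note: the abstract above describes allocating multi-reads as fractional counts but does not give the weighting scheme. The sketch below assumes one plausible scheme, weighting each candidate location by its uni-read count plus a pseudocount, purely for illustration; the function name `allocate_multireads`, the pseudocount, and the locus names are hypothetical.

```python
from collections import defaultdict


def allocate_multireads(uni_counts, multi_reads, pseudocount=1.0):
    """Split each multi-read across its candidate loci as fractional counts.

    uni_counts:  dict locus -> number of uniquely mapped reads
    multi_reads: list of candidate-locus lists, one per multi-read
    The weighting (uni-read count + pseudocount) is an assumption.
    """
    coverage = defaultdict(float, uni_counts)
    for candidates in multi_reads:
        weights = [uni_counts.get(locus, 0) + pseudocount for locus in candidates]
        total = sum(weights)
        for locus, weight in zip(candidates, weights):
            coverage[locus] += weight / total  # fractions sum to 1 per read
    return dict(coverage)


# Made-up counts: locusA has strong uni-read support, locusB and locusC none.
uni = {"locusA": 9, "locusB": 0}
multi = [["locusA", "locusB"], ["locusB", "locusC"]]
result = allocate_multireads(uni, multi)
print(result)
```

Each multi-read contributes exactly one count in total, so the overall depth rises by the number of multi-reads used.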
Systematic analysis of transcription start sites in avian development.

PubMed

Lizio, Marina; Deviatiiarov, Ruslan; Nagai, Hiroki; Galan, Laura; Arner, Erik; Itoh, Masayoshi; Lassmann, Timo; Kasukawa, Takeya; Hasegawa, Akira; Ros, Marian A; Hayashizaki, Yoshihide; Carninci, Piero; Forrest, Alistair R R; Kawaji, Hideya; Gusev, Oleg; Sheng, Guojun

2017-09-01

Cap Analysis of Gene Expression (CAGE) in combination with single-molecule sequencing technology allows precision mapping of transcription start sites (TSSs) and genome-wide capture of promoter activities in differentiated and steady state cell populations. Much less is known about whether TSS profiling can characterize diverse and non-steady state cell populations, such as the approximately 400 transitory and heterogeneous cell types that arise during ontogeny of vertebrate animals. To gain such insight, we used the chick model and performed CAGE-based TSS analysis on embryonic samples covering the full 3-week developmental period. In total, 31,863 robust TSS peaks (>1 tag per million [TPM]) were mapped to the latest chicken genome assembly, of which 34% to 46% were active in any given developmental stage. ZENBU, a web-based, open-source platform, was used for interactive data exploration. TSSs of genes critical for lineage differentiation could be precisely mapped and their activities tracked throughout development, suggesting that non-steady state and heterogeneous cell populations are amenable to CAGE-based transcriptional analysis. Our study also uncovered a large set of extremely stable housekeeping TSSs and many novel stage-specific ones. We furthermore demonstrated that TSS mapping could expedite motif-based promoter analysis for regulatory modules associated with stage-specific and housekeeping genes. Finally, using Brachyury as an example, we provide evidence that precise TSS mapping in combination with Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-on technology enables us, for the first time, to efficiently target endogenous avian genes for transcriptional activation. Taken together, our results represent the first report of genome-wide TSS mapping in birds and the first systematic developmental TSS analysis in any amniote species (birds and mammals). 
By facilitating promoter-based molecular analysis and genetic manipulation, our work also underscores the value of avian models in unravelling the complex regulatory mechanism of cell lineage specification during amniote development.
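
Illustrative note: the ">1 tag per million (TPM)" criterion quoted above is a simple normalization followed by a threshold. The sketch below shows that arithmetic only; it is not the study's peak-calling pipeline, and the function names and tag counts are hypothetical.

```python
def tags_per_million(raw_counts):
    """Scale raw tag counts so that they sum to one million."""
    total = sum(raw_counts.values())
    if total == 0:
        raise ValueError("no tags to normalize")
    return {peak: count * 1_000_000 / total for peak, count in raw_counts.items()}


def robust_peaks(raw_counts, min_tpm=1.0):
    """Return peak names whose TPM is strictly above the threshold."""
    tpm = tags_per_million(raw_counts)
    return sorted(peak for peak, value in tpm.items() if value > min_tpm)


# Made-up library of 2,000,000 tags; peak3 has 1 tag, i.e. 0.5 TPM.
counts = {"peak1": 1_500_000, "peak2": 499_999, "peak3": 1}
kept = robust_peaks(counts)
print(kept)
```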
A Genome-Wide Scan of Selective Sweeps and Association Mapping of Fruit Traits Using Microsatellite Markers in Watermelon

PubMed Central

Reddy, Umesh K.; Abburi, Lavanya; Abburi, Venkata Lakshmi; Saminathan, Thangasamy; Cantrell, Robert; Vajja, Venkata Gopinath; Reddy, Rishi; Tomason, Yan R.; Levi, Amnon; Wehner, Todd C.; Nimmakayala, Padma

2015-01-01

Our genetic diversity study uses microsatellites of known map position to estimate genome level population structure and linkage disequilibrium, and to identify genomic regions that have undergone selection during watermelon domestication and improvement. Thirty regions that showed evidence of selective sweep were scanned for the presence of candidate genes using the watermelon genome browser (www.icugi.org). We localized selective sweeps in intergenic regions, close to the promoters, and within the exons and introns of various genes. This study provided evidence of convergent evolution for the presence of diverse ecotypes with special reference to American and European ecotypes. Our search for location of linked markers in the whole-genome draft sequence revealed that BVWS00358, a GA repeat microsatellite, is the GAGA type transcription factor located in the 5′ untranslated regions of a structure and insertion element that expresses a Cys2His2 Zinc finger motif, with presumed biological processes related to chitin response and transcriptional regulation. In addition, BVWS01708, an ATT repeat microsatellite, located in the promoter of a DTW domain-containing protein (Cla002761); and 2 other simple sequence repeats that association mapping link to fruit length and rind thickness. PMID:25425675
Benchmarking database performance for genomic data.

PubMed

Khushi, Matloob

2015-06-01

Genomic regions represent features such as gene annotations, transcription factor binding sites and epigenetic modifications. Performing various genomic operations such as identifying overlapping/non-overlapping regions or nearest gene annotations are common research needs. The data can be saved in a database system for easy management; however, there is no comprehensive database built-in algorithm at present to identify overlapping regions. Therefore I have developed a novel region-mapping (RegMap) SQL-based algorithm to perform genomic operations and have benchmarked the performance of different databases. Benchmarking identified that PostgreSQL extracts overlapping regions much faster than MySQL. Insertion and data uploads in PostgreSQL were also better, although general searching capability of both databases was almost equivalent. In addition, using the algorithm, pair-wise overlaps of >1000 datasets of transcription factor binding sites and histone marks, collected from previous publications, were reported and it was found that HNF4G significantly co-locates with cohesin subunit STAG1 (SA1). © 2015 Wiley Periodicals, Inc.
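
Illustrative note: the kind of overlap query discussed above can be expressed as a plain SQL self-join condition. The sketch below is not the RegMap algorithm, and it uses Python's built-in `sqlite3` module rather than PostgreSQL or MySQL so that it runs without a server; the table names, column names, and coordinates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE peaks_a (chrom TEXT, start_pos INTEGER, end_pos INTEGER, name TEXT);
    CREATE TABLE peaks_b (chrom TEXT, start_pos INTEGER, end_pos INTEGER, name TEXT);
    """
)
conn.executemany(
    "INSERT INTO peaks_a VALUES (?, ?, ?, ?)",
    [("chr1", 100, 200, "a1"), ("chr1", 300, 400, "a2"), ("chr2", 100, 200, "a3")],
)
conn.executemany(
    "INSERT INTO peaks_b VALUES (?, ?, ?, ?)",
    [("chr1", 150, 350, "b1"), ("chr2", 500, 600, "b2")],
)

# Two closed intervals on the same chromosome overlap when each one
# starts no later than the other one ends.
overlaps = conn.execute(
    """
    SELECT a.name, b.name
    FROM peaks_a AS a
    JOIN peaks_b AS b
      ON a.chrom = b.chrom
     AND a.start_pos <= b.end_pos
     AND b.start_pos <= a.end_pos
    ORDER BY a.name, b.name
    """
).fetchall()
print(overlaps)
conn.close()
```

On large tables this join benefits from an index on (chrom, start_pos); how each database engine plans such range joins is one reason benchmark results can differ between systems.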
ReMap 2018: an updated atlas of regulatory regions from an integrative analysis of DNA-binding ChIP-seq experiments

PubMed Central

Chèneby, Jeanne; Gheorghe, Marius; Artufel, Marie

2018-01-01

Abstract With this latest release of ReMap (http://remap.cisreg.eu), we present a unique collection of regulatory regions in human, as a result of a large-scale integrative analysis of ChIP-seq experiments for hundreds of transcriptional regulators (TRs) such as transcription factors, transcriptional co-activators and chromatin regulators. In 2015, we introduced the ReMap database to capture the genome regulatory space by integrating public ChIP-seq datasets, covering 237 TRs across 13 million (M) peaks. In this release, we have extended this catalog to constitute a unique collection of regulatory regions. Specifically, we have collected, analyzed and retained after quality control a total of 2829 ChIP-seq datasets available from public sources, covering a total of 485 TRs with a catalog of 80M peaks. Additionally, the updated database includes new search features for TR names as well as aliases, including cell line names and the ability to navigate the data directly within genome browsers via public track hubs. Finally, full access to this catalog is available online together with a TR binding enrichment analysis tool. ReMap 2018 provides a significant update of the ReMap database, providing an in depth view of the complexity of the regulatory landscape in human. PMID:29126285
Transcription factor profiling reveals molecular choreography and key regulators of human retrotransposon expression

PubMed Central

Sun, Xiaoji; Wang, Xuya; Tang, Zuojian; Grivainis, Mark; Kahler, David; Yun, Chi; Mita, Paolo; Fenyö, David

2018-01-01

Transposable elements (TEs) represent a substantial fraction of many eukaryotic genomes, and transcriptional regulation of these factors is important to determine TE activities in human cells. However, due to the repetitive nature of TEs, identifying transcription factor (TF)-binding sites from ChIP-sequencing (ChIP-seq) datasets is challenging. Current algorithms are focused on subtle differences between TE copies and thus bias the analysis to relatively old and inactive TEs. Here we describe an approach termed “MapRRCon” (mapping repeat reads to a consensus) which allows us to identify proteins binding to TE DNA sequences by mapping ChIP-seq reads to the TE consensus sequence after whole-genome alignment. Although this method does not assign binding sites to individual insertions in the genome, it provides a landscape of interacting TFs by capturing factors that bind to TEs under various conditions. We applied this method to screen TFs’ interaction with L1 in human cells/tissues using ENCODE ChIP-seq datasets and identified 178 of the 512 TFs tested as bound to L1 in at least one biological condition with most of them (138) localized to the promoter. Among these L1-binding factors, we focused on Myc and CTCF, as they play important roles in cancer progression and 3D chromatin structure formation. Furthermore, we explored the transcriptomes of The Cancer Genome Atlas breast and ovarian tumor samples in which a consistent anti-/correlation between L1 and Myc/CTCF expression was observed, suggesting that these two factors may play roles in regulating L1 transcription during the development of such tumors. PMID:29802231
Transcriptional analysis of Penaeus stylirostris densovirus genes

USDA-ARS's Scientific Manuscript database

Penaeus stylirostris densovirus (PstDNV) genome contains three open reading frames (ORFs), left, middle, and right, which encode a non-structural (NS) protein, an unknown protein, and a capsid protein (CP), respectively. Transcription mapping revealed that P2, P11 and P61 promoters transcribe the le...
DOE Office of Scientific and Technical Information (OSTI.GOV)

Gardiner, K.

Tremendous progress has been made in the construction of physical and genetic maps of the human chromosomes. The next step in the solving of disease related problems, and in understanding the human genome as a whole, is the systematic isolation of transcribed sequences. Many investigators have already embarked upon comprehensive gene searches, and many more are considering the best strategies for undertaking such searches. Because these are likely to be costly and time consuming endeavors, it is important to determine the most efficient approaches. As a result, it is critical that investigators involved in the construction of transcriptional maps have the opportunity to discuss their experiences and their successes with both old and new technologies. This document contains the proceedings of the Fourth Annual Workshop on the Identification of Transcribed Sequences, held in Montreal, Quebec, October 16-18, 1994. Included are the workshop notebook, containing the agenda, abstracts presented and list of attendees. Topics included: Progress in the application of the hybridization based approaches and exon trapping; Progress in transcriptional map construction of selected genomic regions; Computer assisted analysis of genomic and protein coding sequences and additional new approaches; and, Sequencing and mapping of random cDNAs.
Candidate gene database and transcript map for peach, a model species for fruit trees.

PubMed

Horn, Renate; Lecouls, Anne-Claire; Callahan, Ann; Dandekar, Abhaya; Garay, Lilibeth; McCord, Per; Howad, Werner; Chan, Helen; Verde, Ignazio; Main, Doreen; Jung, Sook; Georgi, Laura; Forrest, Sam; Mook, Jennifer; Zhebentyayeva, Tatyana; Yu, Yeisoo; Kim, Hye Ran; Jesudurai, Christopher; Sosinski, Bryon; Arús, Pere; Baird, Vance; Parfitt, Dan; Reighard, Gregory; Scorza, Ralph; Tomkins, Jeffrey; Wing, Rod; Abbott, Albert Glenn

2005-05-01

Peach (Prunus persica) is a model species for the Rosaceae, which includes a number of economically important fruit tree species. To develop an extensive Prunus expressed sequence tag (EST) database for identifying and cloning the genes important to fruit and tree development, we generated 9,984 high-quality ESTs from a peach cDNA library of developing fruit mesocarp. After assembly and annotation, a putative peach unigene set consisting of 3,842 ESTs was defined. Gene ontology (GO) classification was assigned based on the annotation of the single "best hit" match against the Swiss-Prot database. No significant homology could be found in the GenBank nr databases for 24.3% of the sequences. Using core markers from the general Prunus genetic map, we anchored bacterial artificial chromosome (BAC) clones on the genetic map, thereby providing a framework for the construction of a physical and transcript map. A transcript map was developed by hybridizing 1,236 ESTs from the putative peach unigene set and an additional 68 peach cDNA clones against the peach BAC library. Hybridizing ESTs to genetically anchored BACs immediately localized 11.2% of the ESTs on the genetic map. ESTs showed a clustering of expressed genes in defined regions of the linkage groups. [The data were built into a regularly updated Genome Database for Rosaceae (GDR), available at (http://www.genome.clemson.edu/gdr/).].
Effects of Nickel Treatment on H3K4 Trimethylation and Gene Expression

PubMed Central

Tchou-Wong, Kam-Meng; Kluz, Thomas; Arita, Adriana; Smith, Phillip R.; Brown, Stuart; Costa, Max

2011-01-01

Occupational exposure to nickel compounds has been associated with lung and nasal cancers. We have previously shown that exposure of the human lung adenocarcinoma A549 cells to NiCl2 for 24 hr significantly increased global levels of trimethylated H3K4 (H3K4me3), a transcriptional activating mark that maps to the promoters of transcribed genes. To further understand the potential epigenetic mechanism(s) underlying nickel carcinogenesis, we performed genome-wide mapping of H3K4me3 by chromatin immunoprecipitation and direct genome sequencing (ChIP-seq) and correlated with transcriptome genome-wide mapping of RNA transcripts by massive parallel sequencing of cDNA (RNA-seq). The effect of NiCl2 treatment on H3K4me3 peaks within 5,000 bp of transcription start sites (TSSs) on a set of genes highly induced by nickel in both A549 cells and human peripheral blood mononuclear cells were analyzed. Nickel exposure increased the level of H3K4 trimethylation in both the promoters and coding regions of several genes including CA9 and NDRG1 that were increased in expression in A549 cells. We have also compared the extent of the H3K4 trimethylation in the absence and presence of formaldehyde crosslinking and observed that crosslinking of chromatin was required to observe H3K4 trimethylation in the coding regions immediately downstream of TSSs of some nickel-induced genes including ADM and IGFBP3. This is the first genome-wide mapping of trimethylated H3K4 in the promoter and coding regions of genes induced after exposure to NiCl2. This study may provide insights into the epigenetic mechanism(s) underlying the carcinogenicity of nickel compounds. PMID:21455298
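
Illustrative note: selecting peaks "within 5,000 bp of transcription start sites" amounts to a distance test between each peak interval and each TSS position. The sketch below shows one way to write that test; it is not the study's analysis code, the function name `peaks_near_tss` is hypothetical, and the coordinates attached to the gene names CA9 and NDRG1 are invented for the example and are not real genomic positions.

```python
def peaks_near_tss(peaks, tss_sites, window=5000):
    """Return (peak_id, gene, distance) for peaks within `window` bp of a TSS.

    peaks:     iterable of (chrom, start, end, peak_id)
    tss_sites: iterable of (chrom, position, gene)
    Distance is 0 when the TSS lies inside the peak, otherwise the gap
    to the nearest peak edge.
    """
    hits = []
    for chrom, p_start, p_end, peak_id in peaks:
        for t_chrom, t_pos, gene in tss_sites:
            if chrom != t_chrom:
                continue
            distance = max(p_start - t_pos, t_pos - p_end, 0)
            if distance <= window:
                hits.append((peak_id, gene, distance))
    return hits


# Invented coordinates for illustration only.
peaks = [
    ("chr9", 10_000, 10_800, "peak1"),
    ("chr9", 40_000, 41_000, "peak2"),
    ("chr8", 5_000, 5_500, "peak3"),
]
tss_sites = [("chr9", 12_000, "CA9"), ("chr8", 5_200, "NDRG1")]

near = peaks_near_tss(peaks, tss_sites)
print(near)
```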
Global mapping of DNA conformational flexibility on Saccharomyces cerevisiae.

PubMed

Menconi, Giulia; Bedini, Andrea; Barale, Roberto; Sbrana, Isabella

2015-04-01

In this study we provide the first comprehensive map of DNA conformational flexibility in Saccharomyces cerevisiae complete genome. Flexibility plays a key role in DNA supercoiling and DNA/protein binding, regulating DNA transcription, replication or repair. Specific interest in flexibility analysis concerns its relationship with human genome instability. Enrichment in flexible sequences has been detected in unstable regions of human genome defined fragile sites, where genes map and carry frequent deletions and rearrangements in cancer. Flexible sequences have been suggested to be the determinants of fragile gene proneness to breakage; however, their actual role and properties remain elusive. Our in silico analysis carried out genome-wide via the StabFlex algorithm, shows the conserved presence of highly flexible regions in budding yeast genome as well as in genomes of other Saccharomyces sensu stricto species. Flexible peaks in S. cerevisiae identify 175 ORFs mapping on their 3'UTR, a region affecting mRNA translation, localization and stability. (TA)n repeats of different extension shape the central structure of peaks and co-localize with polyadenylation efficiency element (EE) signals. ORFs with flexible peaks share common features. Transcripts are characterized by decreased half-life: this is considered peculiar of genes involved in regulatory systems with high turnover; consistently, their function affects biological processes such as cell cycle regulation or stress response. Our findings support the functional importance of flexibility peaks, suggesting that the flexible sequence may be derived by an expansion of canonical TAYRTA polyadenylation efficiency element. The flexible (TA)n repeat amplification could be the outcome of an evolutionary neofunctionalization leading to a differential 3'-end processing and expression regulation in genes with peculiar function. 
Our study provides a new support to the functional role of flexibility in genomes and a strategy for its characterization inside human fragile sites.
Global Mapping of DNA Conformational Flexibility on Saccharomyces cerevisiae

PubMed Central

Menconi, Giulia; Bedini, Andrea; Barale, Roberto; Sbrana, Isabella

2015-01-01

In this study we provide the first comprehensive map of DNA conformational flexibility in Saccharomyces cerevisiae complete genome. Flexibility plays a key role in DNA supercoiling and DNA/protein binding, regulating DNA transcription, replication or repair. Specific interest in flexibility analysis concerns its relationship with human genome instability. Enrichment in flexible sequences has been detected in unstable regions of human genome defined fragile sites, where genes map and carry frequent deletions and rearrangements in cancer. Flexible sequences have been suggested to be the determinants of fragile gene proneness to breakage; however, their actual role and properties remain elusive. Our in silico analysis carried out genome-wide via the StabFlex algorithm, shows the conserved presence of highly flexible regions in budding yeast genome as well as in genomes of other Saccharomyces sensu stricto species. Flexible peaks in S. cerevisiae identify 175 ORFs mapping on their 3’UTR, a region affecting mRNA translation, localization and stability. (TA)n repeats of different extension shape the central structure of peaks and co-localize with polyadenylation efficiency element (EE) signals. ORFs with flexible peaks share common features. Transcripts are characterized by decreased half-life: this is considered peculiar of genes involved in regulatory systems with high turnover; consistently, their function affects biological processes such as cell cycle regulation or stress response. Our findings support the functional importance of flexibility peaks, suggesting that the flexible sequence may be derived by an expansion of canonical TAYRTA polyadenylation efficiency element. The flexible (TA)n repeat amplification could be the outcome of an evolutionary neofunctionalization leading to a differential 3’-end processing and expression regulation in genes with peculiar function. 
Our study provides a new support to the functional role of flexibility in genomes and a strategy for its characterization inside human fragile sites. PMID:25860149
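
Illustrative note: the (TA)n repeats mentioned in the abstract above are easy to locate with a regular expression. The sketch below only finds such repeats; it does not compute DNA flexibility and is unrelated to the StabFlex algorithm. The function name `find_ta_repeats`, the minimum repeat length, and the test sequence are hypothetical.

```python
import re


def find_ta_repeats(sequence, min_units=4):
    """Return (start, end, units) for each run of at least `min_units` TA units.

    Coordinates are 0-based, end-exclusive. The default threshold is an
    arbitrary choice for illustration.
    """
    pattern = re.compile(r"(?:TA){%d,}" % min_units)
    return [
        (m.start(), m.end(), (m.end() - m.start()) // 2)
        for m in pattern.finditer(sequence.upper())
    ]


# Made-up sequence: a 5-unit (TA) run followed by a 2-unit run that is too short.
seq = "GGC" + "TA" * 5 + "CCG" + "TA" * 2 + "GG"
repeats = find_ta_repeats(seq)
print(repeats)
```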
A Comparative Encyclopedia of DNA Elements in the Mouse Genome

PubMed Central

Yue, Feng; Cheng, Yong; Breschi, Alessandra; Vierstra, Jeff; Wu, Weisheng; Ryba, Tyrone; Sandstrom, Richard; Ma, Zhihai; Davis, Carrie; Pope, Benjamin D.; Shen, Yin; Pervouchine, Dmitri D.; Djebali, Sarah; Thurman, Bob; Kaul, Rajinder; Rynes, Eric; Kirilusha, Anthony; Marinov, Georgi K.; Williams, Brian A.; Trout, Diane; Amrhein, Henry; Fisher-Aylor, Katherine; Antoshechkin, Igor; DeSalvo, Gilberto; See, Lei-Hoon; Fastuca, Meagan; Drenkow, Jorg; Zaleski, Chris; Dobin, Alex; Prieto, Pablo; Lagarde, Julien; Bussotti, Giovanni; Tanzer, Andrea; Denas, Olgert; Li, Kanwei; Bender, M. A.; Zhang, Miaohua; Byron, Rachel; Groudine, Mark T.; McCleary, David; Pham, Long; Ye, Zhen; Kuan, Samantha; Edsall, Lee; Wu, Yi-Chieh; Rasmussen, Matthew D.; Bansal, Mukul S.; Keller, Cheryl A.; Morrissey, Christapher S.; Mishra, Tejaswini; Jain, Deepti; Dogan, Nergiz; Harris, Robert S.; Cayting, Philip; Kawli, Trupti; Boyle, Alan P.; Euskirchen, Ghia; Kundaje, Anshul; Lin, Shin; Lin, Yiing; Jansen, Camden; Malladi, Venkat S.; Cline, Melissa S.; Erickson, Drew T.; Kirkup, Vanessa M; Learned, Katrina; Sloan, Cricket A.; Rosenbloom, Kate R.; de Sousa, Beatriz Lacerda; Beal, Kathryn; Pignatelli, Miguel; Flicek, Paul; Lian, Jin; Kahveci, Tamer; Lee, Dongwon; Kent, W. James; Santos, Miguel Ramalho; Herrero, Javier; Notredame, Cedric; Johnson, Audra; Vong, Shinny; Lee, Kristen; Bates, Daniel; Neri, Fidencio; Diegel, Morgan; Canfield, Theresa; Sabo, Peter J.; Wilken, Matthew S.; Reh, Thomas A.; Giste, Erika; Shafer, Anthony; Kutyavin, Tanya; Haugen, Eric; Dunn, Douglas; Reynolds, Alex P.; Neph, Shane; Humbert, Richard; Hansen, R. 
Scott; De Bruijn, Marella; Selleri, Licia; Rudensky, Alexander; Josefowicz, Steven; Samstein, Robert; Eichler, Evan E.; Orkin, Stuart H.; Levasseur, Dana; Papayannopoulou, Thalia; Chang, Kai-Hsin; Skoultchi, Arthur; Gosh, Srikanta; Disteche, Christine; Treuting, Piper; Wang, Yanli; Weiss, Mitchell J.; Blobel, Gerd A.; Good, Peter J.; Lowdon, Rebecca F.; Adams, Leslie B.; Zhou, Xiao-Qiao; Pazin, Michael J.; Feingold, Elise A.; Wold, Barbara; Taylor, James; Kellis, Manolis; Mortazavi, Ali; Weissman, Sherman M.; Stamatoyannopoulos, John; Snyder, Michael P.; Guigo, Roderic; Gingeras, Thomas R.; Gilbert, David M.; Hardison, Ross C.; Beer, Michael A.; Ren, Bing

2014-01-01

Summary As the premier model organism in biomedical research, the laboratory mouse shares the majority of protein-coding genes with humans, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications, and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of other sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases. PMID:25409824
A comparative encyclopedia of DNA elements in the mouse genome.

PubMed

Yue, Feng; Cheng, Yong; Breschi, Alessandra; Vierstra, Jeff; Wu, Weisheng; Ryba, Tyrone; Sandstrom, Richard; Ma, Zhihai; Davis, Carrie; Pope, Benjamin D; Shen, Yin; Pervouchine, Dmitri D; Djebali, Sarah; Thurman, Robert E; Kaul, Rajinder; Rynes, Eric; Kirilusha, Anthony; Marinov, Georgi K; Williams, Brian A; Trout, Diane; Amrhein, Henry; Fisher-Aylor, Katherine; Antoshechkin, Igor; DeSalvo, Gilberto; See, Lei-Hoon; Fastuca, Meagan; Drenkow, Jorg; Zaleski, Chris; Dobin, Alex; Prieto, Pablo; Lagarde, Julien; Bussotti, Giovanni; Tanzer, Andrea; Denas, Olgert; Li, Kanwei; Bender, M A; Zhang, Miaohua; Byron, Rachel; Groudine, Mark T; McCleary, David; Pham, Long; Ye, Zhen; Kuan, Samantha; Edsall, Lee; Wu, Yi-Chieh; Rasmussen, Matthew D; Bansal, Mukul S; Kellis, Manolis; Keller, Cheryl A; Morrissey, Christapher S; Mishra, Tejaswini; Jain, Deepti; Dogan, Nergiz; Harris, Robert S; Cayting, Philip; Kawli, Trupti; Boyle, Alan P; Euskirchen, Ghia; Kundaje, Anshul; Lin, Shin; Lin, Yiing; Jansen, Camden; Malladi, Venkat S; Cline, Melissa S; Erickson, Drew T; Kirkup, Vanessa M; Learned, Katrina; Sloan, Cricket A; Rosenbloom, Kate R; Lacerda de Sousa, Beatriz; Beal, Kathryn; Pignatelli, Miguel; Flicek, Paul; Lian, Jin; Kahveci, Tamer; Lee, Dongwon; Kent, W James; Ramalho Santos, Miguel; Herrero, Javier; Notredame, Cedric; Johnson, Audra; Vong, Shinny; Lee, Kristen; Bates, Daniel; Neri, Fidencio; Diegel, Morgan; Canfield, Theresa; Sabo, Peter J; Wilken, Matthew S; Reh, Thomas A; Giste, Erika; Shafer, Anthony; Kutyavin, Tanya; Haugen, Eric; Dunn, Douglas; Reynolds, Alex P; Neph, Shane; Humbert, Richard; Hansen, R Scott; De Bruijn, Marella; Selleri, Licia; Rudensky, Alexander; Josefowicz, Steven; Samstein, Robert; Eichler, Evan E; Orkin, Stuart H; Levasseur, Dana; Papayannopoulou, Thalia; Chang, Kai-Hsin; Skoultchi, Arthur; Gosh, Srikanta; Disteche, Christine; Treuting, Piper; Wang, Yanli; Weiss, Mitchell J; Blobel, Gerd A; Cao, Xiaoyi; Zhong, Sheng; Wang, Ting; Good, Peter J; 
Lowdon, Rebecca F; Adams, Leslie B; Zhou, Xiao-Qiao; Pazin, Michael J; Feingold, Elise A; Wold, Barbara; Taylor, James; Mortazavi, Ali; Weissman, Sherman M; Stamatoyannopoulos, John A; Snyder, Michael P; Guigo, Roderic; Gingeras, Thomas R; Gilbert, David M; Hardison, Ross C; Beer, Michael A; Ren, Bing

2014-11-20

The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.
Mapping the pericentric heterochromatin by comparative genomic hybridization analysis and chromosome deletions in Drosophila melanogaster

PubMed Central

He, Bing; Caudy, Amy; Parsons, Lance; Rosebrock, Adam; Pane, Attilio; Raj, Sandeep; Wieschaus, Eric

2012-01-01

Heterochromatin represents a significant portion of eukaryotic genomes and has essential structural and regulatory functions. Its molecular organization is largely unknown due to difficulties in sequencing through and assembling repetitive sequences enriched in the heterochromatin. Here we developed a novel strategy using chromosomal rearrangements and embryonic phenotypes to position unmapped Drosophila melanogaster heterochromatic sequence to specific chromosomal regions. By excluding sequences that can be mapped to the assembled euchromatic arms, we identified sequences that are specific to heterochromatin and used them to design heterochromatin specific probes (“H-probes”) for microarray. By comparative genomic hybridization (CGH) analyses of embryos deficient for each chromosome or chromosome arm, we were able to map most of our H-probes to specific chromosome arms. We also positioned sequences mapped to the second and X chromosomes to finer intervals by analyzing smaller deletions with breakpoints in heterochromatin. Using this approach, we were able to map >40% (13.9 Mb) of the previously unmapped heterochromatin sequences assembled by the whole-genome sequencing effort on arm U and arm Uextra to specific locations. We also identified and mapped 110 kb of novel heterochromatic sequences. Subsequent analyses revealed that sequences located within different heterochromatic regions have distinct properties, such as sequence composition, degree of repetitiveness, and level of underreplication in polytenized tissues. Surprisingly, although heterochromatin is generally considered to be transcriptionally silent, we detected region-specific temporal patterns of transcription in heterochromatin during oogenesis and early embryonic development. Our study provides a useful approach to elucidate the molecular organization and function of heterochromatin and reveals region-specific variation of heterochromatin. PMID:22745230
Decomposing Oncogenic Transcriptional Signatures to Generate Maps of Divergent Cellular States

Cancer.gov

The systematic sequencing of the cancer genome has led to the identification of numerous genetic alterations in cancer. However, a deeper understanding of the functional consequences of these alterations is necessary to guide appropriate therapeutic strategies. Here, we describe Onco-GPS (OncoGenic Positioning System), a data-driven analysis framework to organize individual tumor samples with shared oncogenic alterations onto a reference map defined by their underlying cellular states.
Unprecedented high-resolution view of bacterial operon architecture revealed by RNA sequencing.

PubMed

Conway, Tyrrell; Creecy, James P; Maddox, Scott M; Grissom, Joe E; Conkle, Trevor L; Shadid, Tyler M; Teramoto, Jun; San Miguel, Phillip; Shimada, Tomohiro; Ishihama, Akira; Mori, Hirotada; Wanner, Barry L

2014-07-08

We analyzed the transcriptome of Escherichia coli K-12 by strand-specific RNA sequencing at single-nucleotide resolution during steady-state (logarithmic-phase) growth and upon entry into stationary phase in glucose minimal medium. To generate high-resolution transcriptome maps, we developed an organizational schema which showed that in practice only three features are required to define operon architecture: the promoter, terminator, and deep RNA sequence read coverage. We precisely annotated 2,122 promoters and 1,774 terminators, defining 1,510 operons with an average of 1.98 genes per operon. Our analyses revealed an unprecedented view of E. coli operon architecture. A large proportion (36%) of operons are complex with internal promoters or terminators that generate multiple transcription units. For 43% of operons, we observed differential expression of polycistronic genes, despite being in the same operons, indicating that E. coli operon architecture allows fine-tuning of gene expression. We found that 276 of 370 convergent operons terminate inefficiently, generating complementary 3' transcript ends which overlap on average by 286 nucleotides, and 136 of 388 divergent operons have promoters arranged such that their 5' ends overlap on average by 168 nucleotides. We found 89 antisense transcripts of 397-nucleotide average length, 7 unannotated transcripts within intergenic regions, and 18 sense transcripts that completely overlap operons on the opposite strand. Of 519 overlapping transcripts, 75% correspond to sequences that are highly conserved in E. coli (>50 genomes). Our data extend recent studies showing unexpected transcriptome complexity in several bacteria and suggest that antisense RNA regulation is widespread. Importance: We precisely mapped the 5' and 3' ends of RNA transcripts across the E. coli K-12 genome by using a single-nucleotide analytical approach. Our resulting high-resolution transcriptome maps show that ca. one-third of E. coli operons are complex, with internal promoters and terminators generating multiple transcription units and allowing differential gene expression within these operons. We discovered extensive antisense transcription that results from more than 500 operons, which fully overlap or extensively overlap adjacent divergent or convergent operons. The genomic regions corresponding to these antisense transcripts are highly conserved in E. coli (including Shigella species), although it remains to be proven whether or not they are functional. Our observations of features unearthed by single-nucleotide transcriptome mapping suggest that deeper layers of transcriptional regulation in bacteria are likely to be revealed in the future. Copyright © 2014 Conway et al.
Genome organization and long-range regulation of gene expression by enhancers

PubMed Central

Smallwood, Andrea; Ren, Bing

2014-01-01

It is now well accepted that cell-type specific gene regulation is under the purview of enhancers. Great strides have been made recently to characterize and identify enhancers both genetically and epigenetically for multiple cell types and species, but efforts have just begun to link enhancers to their target promoters. Mapping these interactions and understanding how the 3D landscape of the genome constrains such interactions is fundamental to our understanding of mammalian gene regulation. Here, we review recent progress in mapping long-range regulatory interactions in mammalian genomes, focusing on transcriptional enhancers and chromatin organization principles. PMID:23465541
A distinct subgroup of cardiomyopathy patients characterized by transcriptionally active cardiotropic erythrovirus and altered cardiac gene expression.

PubMed

Kuhl, U; Lassner, D; Dorner, A; Rohde, M; Escher, F; Seeberg, B; Hertel, E; Tschope, C; Skurk, C; Gross, U M; Schultheiss, H-P; Poller, W

2013-09-01

Recent studies have detected erythrovirus genomes in the hearts of cardiomyopathy and cardiac transplant patients. Assessment of the functional status of viruses may provide clinically important information beyond detection of the viral genomes. Here, we report transcriptional activation of cardiotropic erythrovirus to be associated with strongly altered myocardial gene expression in a distinct subgroup of cardiomyopathy patients. Endomyocardial biopsies (EMBs) from 415 consecutive cardiac erythrovirus (B19V)-positive patients with clinically suspected cardiomyopathy were screened for virus-encoded VP1/VP2 mRNA indicating transcriptional activation of the virus, and correlated with cardiac host gene expression patterns in transcriptionally active versus latent infections, and in virus-free control hearts. Transcriptional activity was detected in baseline biopsies of only 66/415 patients (15.9 %) harbouring erythrovirus. At the molecular level, significant differences between cardiac B19V-positive patients with transcriptionally active versus latent virus were revealed by expression profiling of EMBs. Importantly, latent B19V infection was indistinguishable from controls. Genes involved encode proteins of antiviral immune response, B19V receptor complex, and mitochondrial energy metabolism. Thus, functional mapping of erythrovirus allows definition of a subgroup of B19V-infected cardiomyopathy patients characterized by virus-encoded VP1/VP2 transcripts and anomalous host myocardial transcriptomes. Cardiac B19V reactivation from latency, as reported here for the first time, is a key factor required for erythrovirus to induce altered cardiac gene expression in a subgroup of cardiomyopathy patients. Virus genome detection is insufficient to assess pathogenic potential, but additional transcriptional mapping should be incorporated into future pathogenetic and therapeutic studies both in cardiology and transplantation medicine.
Genome-wide maps of alkylation damage, repair, and mutagenesis in yeast reveal mechanisms of mutational heterogeneity.

PubMed

Mao, Peng; Brown, Alexander J; Malc, Ewa P; Mieczkowski, Piotr A; Smerdon, Michael J; Roberts, Steven A; Wyrick, John J

2017-10-01

DNA base damage is an important contributor to genome instability, but how the formation and repair of these lesions is affected by the genomic landscape and contributes to mutagenesis is unknown. Here, we describe genome-wide maps of DNA base damage, repair, and mutagenesis at single nucleotide resolution in yeast treated with the alkylating agent methyl methanesulfonate (MMS). Analysis of these maps revealed that base excision repair (BER) of alkylation damage is significantly modulated by chromatin, with faster repair in nucleosome-depleted regions, and slower repair and higher mutation density within strongly positioned nucleosomes. Both the translational and rotational settings of lesions within nucleosomes significantly influence BER efficiency; moreover, this effect is asymmetric relative to the nucleosome dyad axis and is regulated by histone modifications. Our data also indicate that MMS-induced mutations at adenine nucleotides are significantly enriched on the nontranscribed strand (NTS) of yeast genes, particularly in BER-deficient strains, due to higher damage formation on the NTS and transcription-coupled repair of the transcribed strand (TS). These findings reveal the influence of chromatin on repair and mutagenesis of base lesions on a genome-wide scale and suggest a novel mechanism for transcription-associated mutation asymmetry, which is frequently observed in human cancers. © 2017 Mao et al.; Published by Cold Spring Harbor Laboratory Press.

ASFinder: a tool for genome-wide identification of alternatively splicing transcripts from EST-derived sequences.

PubMed

Min, Xiang Jia

2013-01-01

Expressed Sequence Tags (ESTs) are a rich resource for identifying Alternatively Splicing (AS) genes. The ASFinder webserver is designed to identify AS isoforms from EST-derived sequences. Two approaches are implemented in ASFinder. If no genomic sequences are provided, the server performs a local BLASTN to identify AS isoforms from ESTs having both ends aligned but an internal segment unaligned. Otherwise, ASFinder uses SIM4 to map ESTs to the genome, then the overlapping ESTs that are mapped to the same genomic locus and have internal variable exon/intron boundaries are identified as AS isoforms. The tool is available at http://proteomics.ysu.edu/tools/ASFinder.html.
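The genome-free approach described above (both EST ends aligned, an internal segment unaligned) can be illustrated with a minimal Python sketch. This is not ASFinder's code: the function name, the block representation, and the `min_gap` and `end_slack` thresholds are all hypothetical, chosen only to show the idea.

```python
from typing import List, Tuple

# An aligned block on the EST: (start, end), 0-based, end-exclusive.
# This representation is an assumption made for illustration.
Block = Tuple[int, int]


def internal_gaps(est_length: int, blocks: List[Block],
                  min_gap: int = 10, end_slack: int = 5) -> List[Block]:
    """Return unaligned internal segments of an EST whose two ends are aligned.

    min_gap and end_slack are hypothetical thresholds, not ASFinder settings.
    """
    if not blocks:
        return []
    blocks = sorted(blocks)
    # Require that both ends of the EST are covered by an aligned block.
    if blocks[0][0] > end_slack or est_length - blocks[-1][1] > end_slack:
        return []
    gaps = []
    # Compare each block with the next one and record large unaligned stretches.
    for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:]):
        if next_start - prev_end >= min_gap:
            gaps.append((prev_end, next_start))
    return gaps


# A 300-nt EST aligned at both ends with 60 unaligned nucleotides in the middle.
print(internal_gaps(300, [(0, 120), (180, 300)]))
```

An EST returning a non-empty list would be a candidate AS isoform under these assumptions; a real pipeline would take the blocks from BLASTN output.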
A consensus genetic map of cowpea [Vigna unguiculata (L) Walp.] and synteny based on EST-derived SNPs.

PubMed

Muchero, Wellington; Diop, Ndeye N; Bhat, Prasanna R; Fenton, Raymond D; Wanamaker, Steve; Pottorff, Marti; Hearne, Sarah; Cisse, Ndiaga; Fatokun, Christian; Ehlers, Jeffrey D; Roberts, Philip A; Close, Timothy J

2009-10-27

Consensus genetic linkage maps provide a genomic framework for quantitative trait loci identification, map-based cloning, assessment of genetic diversity, association mapping, and applied breeding in marker-assisted selection schemes. Among "orphan crops" with limited genomic resources such as cowpea [Vigna unguiculata (L.) Walp.] (2n = 2x = 22), the use of transcript-derived SNPs in genetic maps provides opportunities for automated genotyping and estimation of genome structure based on synteny analysis. Here, we report the development and validation of a high-throughput EST-derived SNP assay for cowpea, its application in consensus map building, and determination of synteny to reference genomes. SNP mining from 183,118 ESTs sequenced from 17 cDNA libraries yielded approximately 10,000 high-confidence SNPs from which an Illumina 1,536-SNP GoldenGate genotyping array was developed and applied to 741 recombinant inbred lines from six mapping populations. Approximately 90% of the SNPs were technically successful, providing 1,375 dependable markers. Of these, 928 were incorporated into a consensus genetic map spanning 680 cM with 11 linkage groups and an average marker distance of 0.73 cM. Comparison of this cowpea genetic map to reference legumes, soybean (Glycine max) and Medicago truncatula, revealed extensive macrosynteny encompassing 85 and 82%, respectively, of the cowpea map. Regions of soybean genome duplication were evident relative to the simpler diploid cowpea. Comparison with Arabidopsis revealed extensive genomic rearrangement with some conserved microsynteny. These results support evolutionary closeness between cowpea and soybean and identify regions for synteny-based functional genomics studies in legumes.
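The summary figures in this abstract can be checked with simple arithmetic. The short Python sketch below assumes that the average marker distance is the map length divided by the number of markers, which reproduces the reported 0.73 cM; the function name is illustrative only.

```python
def average_marker_spacing(map_length_cm: float, n_markers: int) -> float:
    """Average spacing, assuming map length divided by marker count."""
    return map_length_cm / n_markers


spacing = average_marker_spacing(680, 928)  # consensus map: 680 cM, 928 markers
success = 1375 / 1536                       # dependable markers / SNPs on the array

print(f"{spacing:.2f} cM")  # 0.73 cM
print(f"{success:.1%}")     # 89.5%, i.e. "approximately 90%"
```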
ReMap 2018: an updated atlas of regulatory regions from an integrative analysis of DNA-binding ChIP-seq experiments.

PubMed

Chèneby, Jeanne; Gheorghe, Marius; Artufel, Marie; Mathelier, Anthony; Ballester, Benoit

2018-01-04

With this latest release of ReMap (http://remap.cisreg.eu), we present a unique collection of regulatory regions in human, as a result of a large-scale integrative analysis of ChIP-seq experiments for hundreds of transcriptional regulators (TRs) such as transcription factors, transcriptional co-activators and chromatin regulators. In 2015, we introduced the ReMap database to capture the genome regulatory space by integrating public ChIP-seq datasets, covering 237 TRs across 13 million (M) peaks. In this release, we have extended this catalog to constitute a unique collection of regulatory regions. Specifically, we have collected, analyzed and retained after quality control a total of 2829 ChIP-seq datasets available from public sources, covering a total of 485 TRs with a catalog of 80M peaks. Additionally, the updated database includes new search features for TR names as well as aliases, including cell line names and the ability to navigate the data directly within genome browsers via public track hubs. Finally, full access to this catalog is available online together with a TR binding enrichment analysis tool. ReMap 2018 provides a significant update of the ReMap database, providing an in depth view of the complexity of the regulatory landscape in human. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Complex multi-enhancer contacts captured by Genome Architecture Mapping (GAM)

PubMed Central

Beagrie, Robert A.; Scialdone, Antonio; Schueler, Markus; Kraemer, Dorothee C.A.; Chotalia, Mita; Xie, Sheila Q.; Barbieri, Mariano; de Santiago, Inês; Lavitas, Liron-Mark; Branco, Miguel R.; Fraser, James; Dostie, Josée; Game, Laurence; Dillon, Niall; Edwards, Paul A.W.; Nicodemi, Mario; Pombo, Ana

2017-01-01

The organization of the genome in the nucleus and the interactions of genes with their regulatory elements are key features of transcriptional control and their disruption can cause disease. We developed a novel genome-wide method, Genome Architecture Mapping (GAM), for measuring chromatin contacts, and other features of three-dimensional chromatin topology, based on sequencing DNA from a large collection of thin nuclear sections. We apply GAM to mouse embryonic stem cells and identify an enrichment for specific interactions between active genes and enhancers across very large genomic distances, using a mathematical model ‘SLICE’ (Statistical Inference of Co-segregation). GAM also reveals an abundance of three-way contacts genome-wide, especially between regions that are highly transcribed or contain super-enhancers, highlighting a previously inaccessible complexity in genome architecture and a major role for gene-expression specific contacts in organizing the genome in mammalian nuclei. PMID:28273065
Trinity: Transcriptome Assembly for Genetic and Functional Analysis of Cancer | Informatics Technology for Cancer Research (ITCR)

Cancer.gov

The cancer transcriptome is shaped by genetic changes, variation in gene transcription, mRNA processing, editing and stability, and the cancer microbiome. Deciphering this variation and understanding its implications on tumorigenesis requires sophisticated computational analyses. Most RNA-Seq analyses rely on methods that first map short reads to a reference genome, and then compare them to annotated transcripts or assemble them. However, this strategy can be limited when the cancer genome is substantially different than the reference or for detecting sequences from the cancer microbiome.
Distinct contributions of replication and transcription to mutation rate variation of human genomes.

PubMed

Cui, Peng; Ding, Feng; Lin, Qiang; Zhang, Lingfang; Li, Ang; Zhang, Zhang; Hu, Songnian; Yu, Jun

2012-02-01

Here, we evaluate the contribution of two major biological processes--DNA replication and transcription--to mutation rate variation in human genomes. Based on analysis of public human tissue transcriptomics data, a high-resolution replication map of HeLa cells and dbSNP data, we present significant correlations between expression breadth, replication time in local regions and SNP density. SNP density of tissue-specific (TS) genes is significantly higher than that of housekeeping (HK) genes. TS genes tend to be located in late-replicating genomic regions, and genes in such regions have a higher SNP density than those in early-replicating regions. In addition, SNP density is found to be positively correlated with expression level among HK genes. We conclude that the process of DNA replication generates stronger mutational pressure than transcription-associated biological processes do, resulting in an increase of mutation rate in TS genes while having weaker effects on HK genes. In contrast, transcription-associated processes are mainly responsible for the accumulation of mutations in highly-expressed HK genes. Copyright © 2012 Beijing Genomics Institute. Published by Elsevier Ltd. All rights reserved.
An Integrated Genomic Approach for Rapid Delineation of Candidate Genes Regulating Agro-Morphological Traits in Chickpea

PubMed Central

Saxena, Maneesha S.; Bajaj, Deepak; Das, Shouvik; Kujur, Alice; Kumar, Vinod; Singh, Mohar; Bansal, Kailash C.; Tyagi, Akhilesh K.; Parida, Swarup K.

2014-01-01

The identification and fine mapping of robust quantitative trait loci (QTLs)/genes governing important agro-morphological traits in chickpea still lacks systematic efforts at a genome-wide scale involving wild Cicer accessions. In this context, an 834 simple sequence repeat and single-nucleotide polymorphism marker-based high-density genetic linkage map between cultivated and wild parental accessions (Cicer arietinum desi cv. ICC 4958 and Cicer reticulatum wild cv. ICC 17160) was constructed. This inter-specific genetic map comprising eight linkage groups spanned a map length of 949.4 cM with an average inter-marker distance of 1.14 cM. Eleven novel major genomic regions harbouring 15 robust QTLs (15.6–39.8% R² at 4.2–15.7 logarithm of odds) associated with four agro-morphological traits (100-seed weight, pod and branch number/plant and plant hairiness) were identified and mapped on chickpea chromosomes. Most of these QTLs showed positive additive gene effects with effective allelic contribution from ICC 4958, particularly for increasing seed weight (SW) and pod and branch number. One robust SW-influencing major QTL region (qSW4.2) has been narrowed down by combining QTL mapping with high-resolution QTL region-specific association analysis, differential expression profiling and gene haplotype-based association/LD mapping. This enabled us to delineate a strong SW-regulating ABI3VP1 transcription factor (TF) gene at the trait-specific QTL interval and consequently to identify favourable natural allelic variants and superior high seed weight-specific haplotypes in the upstream regulatory region of this gene, which showed increased transcript expression during seed development. The genes (TFs) harbouring diverse trait-regulating QTLs, once validated and fine-mapped by our developed rapid integrated genomic approach and through gene/QTL map-based cloning, can be utilized as potential candidates for marker-assisted genetic enhancement of chickpea. PMID:25335477
Molecular Targeting of Prostate Cancer During Androgen Ablation: Inhibition of CHES1/FOXN3

DTIC Science & Technology

2013-05-01

the DNA sequences (~25^6 reads/sample) were mapped to the human genome reference sequence (hg19...tumor the AR has a genomic abnormality, placing the novel sequence 3’ of the transcriptional start site. However, it is unclear if a genomic alteration...exon/intron organization of the CHES1 gene was determined by BLAST analysis of the human genome using the 1,473-bp CHES1 cDNA sequence
Deep analysis of wild Vitis flower transcriptome reveals unexplored genome regions associated with sex specification.

PubMed

Ramos, Miguel Jesus Nunes; Coito, João Lucas; Fino, Joana; Cunha, Jorge; Silva, Helena; de Almeida, Patrícia Gomes; Costa, Maria Manuela Ribeiro; Amâncio, Sara; Paulo, Octávio S; Rocheta, Margarida

2017-01-01

RNA-seq of Vitis during early stages of bud development, in male, female and hermaphrodite flowers, identified new loci outside of annotated gene models, suggesting their involvement in sex establishment. The molecular mechanisms responsible for flower sex specification remain unclear for most plant species. In the case of V. vinifera ssp. vinifera, it is not fully understood what determines hermaphroditism in the domesticated subspecies and male or female flowers in wild dioecious relatives (Vitis vinifera ssp. sylvestris). Here, we describe a de novo assembly of the transcriptome of three flower developmental stages from the three Vitis vinifera flower types. The validation of de novo assembly showed a correlation of 0.825. The main goals of this work were the identification of V. v. sylvestris exclusive transcripts and the characterization of differential gene expression during flower development. RNA from several flower developmental stages was used previously to generate Illumina sequence reads. Through a sequential de novo assembly strategy one comprehensive transcriptome comprising 95,516 non-redundant transcripts was assembled. From this dataset 81,064 transcripts were annotated against the V. v. vinifera reference transcriptome and 11,084 were annotated against the V. v. vinifera reference genome. Moreover, we found 3368 transcripts that could not be mapped to the Vitis reference genome. From all the non-redundant transcripts that were assembled, bioinformatics analysis identified 133 transcripts specific to V. v. sylvestris and 516 transcripts differentially expressed among the three flower types. The detection of transcription from areas of the genome not currently annotated suggests active transcription of previously unannotated genomic loci during early stages of bud development.
Transcriptional atlas of cardiogenesis maps congenital heart disease interactome.

PubMed

Li, Xing; Martinez-Fernandez, Almudena; Hartjes, Katherine A; Kocher, Jean-Pierre A; Olson, Timothy M; Terzic, Andre; Nelson, Timothy J

2014-07-01

Mammalian heart development is built on highly conserved molecular mechanisms with polygenetic perturbations resulting in a spectrum of congenital heart diseases (CHD). However, knowledge of cardiogenic ontogeny that regulates proper cardiogenesis remains largely based on candidate-gene approaches. Mapping the dynamic transcriptional landscape of cardiogenesis from a genomic perspective is essential to integrate the knowledge of heart development into translational applications that accelerate disease discovery efforts toward mechanistic-based treatment strategies. Herein, we designed a time-course transcriptome analysis to investigate the genome-wide dynamic expression landscape of innate murine cardiogenesis ranging from embryonic stem cells to adult cardiac structures. This comprehensive analysis generated temporal and spatial expression profiles, revealed stage-specific gene functions, and mapped the dynamic transcriptome of cardiogenesis to curated pathways. Reconciling known genetic underpinnings of CHD, we deconstructed a disease-centric dynamic interactome encoded within this cardiogenic atlas to identify stage-specific developmental disturbances clustered on regulation of epithelial-to-mesenchymal transition (EMT), BMP signaling, NF-AT signaling, TGFb-dependent EMT, and Notch signaling. Collectively, this cardiogenic transcriptional landscape defines the time-dependent expression of cardiac ontogeny and prioritizes regulatory networks at the interface between health and disease. Copyright © 2014 the American Physiological Society.
Genome organization and long-range regulation of gene expression by enhancers.

PubMed

Smallwood, Andrea; Ren, Bing

2013-06-01

It is now well accepted that cell-type specific gene regulation is under the purview of enhancers. Great strides have been made recently to characterize and identify enhancers both genetically and epigenetically for multiple cell types and species, but efforts have just begun to link enhancers to their target promoters. Mapping these interactions and understanding how the 3D landscape of the genome constrains such interactions is fundamental to our understanding of mammalian gene regulation. Here, we review recent progress in mapping long-range regulatory interactions in mammalian genomes, focusing on transcriptional enhancers and chromatin organization principles. Copyright © 2013. Published by Elsevier Ltd.
ACTG: novel peptide mapping onto gene models.

PubMed

Choi, Seunghyuk; Kim, Hyunwoo; Paek, Eunok

2017-04-15

In many proteogenomic applications, mapping peptide sequences onto genome sequences can be very useful, because it allows us to understand the origins of the gene products. Existing software tools either take the genomic position of a peptide start site as an input or assume that the peptide sequence exactly matches the coding sequence of a given gene model. In the case of novel peptides resulting from genomic variations, especially structural variations such as alternative splicing, these existing tools cannot be directly applied unless users supply information about the variant, either its genomic position or its transcription model. Mapping potentially novel peptides to genome sequences, while allowing certain genomic variations, requires introducing novel gene models when aligning peptide sequences to gene structures. We have developed a new tool called ACTG (Amino aCids To Genome), which maps peptides to the genome, assuming all possible single exon skipping, junction variation allowing three edit distances from the original splice sites, exon extension and frame shift. In addition, it can also consider SNVs (single nucleotide variations) during the mapping phase if a user provides a VCF (variant call format) file as an input. ACTG is available at http://prix.hanyang.ac.kr/ACTG/search.jsp. Contact: eunokpaek@hanyang.ac.kr. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.
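To make "all possible single exon skipping" concrete, the Python sketch below enumerates candidate gene models with exactly one exon removed. It is an illustration only and not ACTG's implementation: the function name and the exon representation are hypothetical, and it assumes that only internal exons can be skipped, so the first and last exon are always retained.

```python
from typing import List, Tuple

# An exon as genomic (start, end) coordinates; representation is illustrative.
Exon = Tuple[int, int]


def single_exon_skips(exons: List[Exon]) -> List[List[Exon]]:
    """Return every gene model obtained by removing exactly one internal exon.

    Assumption for this sketch: terminal exons are never skipped.
    """
    return [exons[:i] + exons[i + 1:] for i in range(1, len(exons) - 1)]


gene_model = [(100, 200), (300, 400), (500, 600), (700, 800)]
for variant in single_exon_skips(gene_model):
    print(variant)
```

A four-exon model yields two variants under this assumption. Junction variation, exon extension and frame shifts, which the abstract also lists, are not covered here.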
Circularization of the HIV-1 genome facilitates strand transfer during reverse transcription

PubMed Central

Beerens, Nancy; Kjems, Jørgen

2010-01-01

Two obligatory DNA strand transfers take place during reverse transcription of a retroviral RNA genome. The first strand transfer involves a jump from the 5′ to the 3′ terminal repeat (R) region positioned at each end of the viral genome. The process depends on base pairing between the cDNA synthesized from the 5′ R region and the 3′ R RNA. The tertiary conformation of the viral RNA genome may facilitate strand transfer by juxtaposing the 5′ R and 3′ R sequences that are 9 kb apart in the linear sequence. In this study, RNA sequences involved in an interaction between the 5′ and 3′ ends of the HIV-1 genome were mapped by mutational analysis. This interaction appears to be mediated mainly by a sequence in the extreme 3′ end of the viral genome and in the gag open reading frame. Mutation of 3′ R sequences was found to inhibit the 5′–3′ interaction, which could be restored by a complementary mutation in the 5′ gag region. Furthermore, we find that circularization of the HIV-1 genome does not affect the initiation of reverse transcription, but stimulates the first strand transfer during reverse transcription in vitro, underscoring the functional importance of the interaction. PMID:20430859
Best practices for mapping replication origins in eukaryotic chromosomes.

PubMed

Besnard, Emilie; Desprat, Romain; Ryan, Michael; Kahli, Malik; Aladjem, Mirit I; Lemaitre, Jean-Marc

2014-09-02

Understanding the regulatory principles ensuring complete DNA replication in each cell division is critical for deciphering the mechanisms that maintain genomic stability. Recent advances in genome sequencing technology facilitated complete mapping of DNA replication sites and helped move the field from observing replication patterns at a handful of single loci to analyzing replication patterns genome-wide. These advances address issues, such as the relationship between replication initiation events, transcription, and chromatin modifications, and identify potential replication origin consensus sequences. This unit summarizes the technological and fundamental aspects of replication profiling and briefly discusses novel insights emerging from mining large datasets, published in the last 3 years, and also describes DNA replication dynamics on a whole-genome scale. Copyright © 2014 John Wiley & Sons, Inc.
Diverse patterns of genomic targeting by transcriptional regulators in Drosophila melanogaster.

PubMed

Slattery, Matthew; Ma, Lijia; Spokony, Rebecca F; Arthur, Robert K; Kheradpour, Pouya; Kundaje, Anshul; Nègre, Nicolas; Crofts, Alex; Ptashkin, Ryan; Zieba, Jennifer; Ostapenko, Alexander; Suchy, Sarah; Victorsen, Alec; Jameel, Nader; Grundstad, A Jason; Gao, Wenxuan; Moran, Jennifer R; Rehm, E Jay; Grossman, Robert L; Kellis, Manolis; White, Kevin P

2014-07-01

Annotation of regulatory elements and identification of the transcription-related factors (TRFs) targeting these elements are key steps in understanding how cells interpret their genetic blueprint and their environment during development, and how that process goes awry in the case of disease. One goal of the modENCODE (model organism ENCyclopedia of DNA Elements) Project is to survey a diverse sampling of TRFs, both DNA-binding and non-DNA-binding factors, to provide a framework for the subsequent study of the mechanisms by which transcriptional regulators target the genome. Here we provide an updated map of the Drosophila melanogaster regulatory genome based on the location of 84 TRFs at various stages of development. This regulatory map reveals a variety of genomic targeting patterns, including factors with strong preferences toward proximal promoter binding, factors that target intergenic and intronic DNA, and factors with distinct chromatin state preferences. The data also highlight the stringency of the Polycomb regulatory network, and show association of the Trithorax-like (Trl) protein with hotspots of DNA binding throughout development. Furthermore, the data identify more than 5800 instances in which TRFs target DNA regions with demonstrated enhancer activity. Regions of high TRF co-occupancy are more likely to be associated with open enhancers used across cell types, while lower TRF occupancy regions are associated with complex enhancers that are also regulated at the epigenetic level. Together these data serve as a resource for the research community in the continued effort to dissect transcriptional regulatory mechanisms directing Drosophila development. © 2014 Slattery et al.; Published by Cold Spring Harbor Laboratory Press.
EMSAR: estimation of transcript abundance from RNA-seq data by mappability-based segmentation and reclustering.

PubMed

Lee, Soohyun; Seo, Chae Hwa; Alver, Burak Han; Lee, Sanghyuk; Park, Peter J

2015-09-03

RNA-seq has been widely used for genome-wide expression profiling. RNA-seq data typically consists of tens of millions of short sequenced reads from different transcripts. However, due to sequence similarity among genes and among isoforms, the source of a given read is often ambiguous. Existing approaches for estimating expression levels from RNA-seq reads tend to compromise between accuracy and computational cost. We introduce a new approach for quantifying transcript abundance from RNA-seq data. EMSAR (Estimation by Mappability-based Segmentation And Reclustering) groups reads according to the set of transcripts to which they are mapped and finds maximum likelihood estimates using a joint Poisson model for each optimal set of segments of transcripts. The method uses nearly all mapped reads, including those mapped to multiple genes. With an efficient transcriptome indexing based on modified suffix arrays, EMSAR minimizes the use of CPU time and memory while achieving accuracy comparable to the best existing methods. EMSAR is a method for quantifying transcripts from RNA-seq data with high accuracy and low computational cost. EMSAR is available at https://github.com/parklab/emsar.
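The first step described above, grouping reads by the set of transcripts to which they map, can be sketched in Python as follows. This shows only the grouping idea and not EMSAR's segmentation, Poisson model or suffix-array index; the function name, the input format and the read and transcript identifiers are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable


def group_reads_by_transcript_set(
    read_hits: Dict[str, Iterable[str]]
) -> Counter:
    """Count reads per set of compatible transcripts.

    read_hits maps a read ID to the transcript IDs it aligns to
    (an assumed input format for this sketch).
    """
    counts: Counter = Counter()
    for transcripts in read_hits.values():
        key: FrozenSet[str] = frozenset(transcripts)
        if key:  # ignore reads with no mapped transcript
            counts[key] += 1
    return counts


hits = {
    "r1": ["tA"],
    "r2": ["tA", "tB"],
    "r3": ["tB", "tA"],  # same transcript set as r2, different order
    "r4": ["tB"],
    "r5": [],            # unmapped read
}
counts = group_reads_by_transcript_set(hits)
print(counts[frozenset({"tA", "tB"})])  # 2
```

Reads r2 and r3 fall into the same group even though they are ambiguous between tA and tB, which is how multi-mapped reads can be retained for a later likelihood-based estimate.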
Computational Prediction and Experimental Verification of New MAP Kinase Docking Sites and Substrates Including Gli Transcription Factors

PubMed Central

Whisenant, Thomas C.; Ho, David T.; Benz, Ryan W.; Rogers, Jeffrey S.; Kaake, Robyn M.; Gordon, Elizabeth A.; Huang, Lan; Baldi, Pierre; Bardwell, Lee

2010-01-01

In order to fully understand protein kinase networks, new methods are needed to identify regulators and substrates of kinases, especially for weakly expressed proteins. Here we have developed a hybrid computational search algorithm that combines machine learning and expert knowledge to identify kinase docking sites, and used this algorithm to search the human genome for novel MAP kinase substrates and regulators focused on the JNK family of MAP kinases. Predictions were tested by peptide array followed by rigorous biochemical verification with in vitro binding and kinase assays on wild-type and mutant proteins. Using this procedure, we found new ‘D-site’ class docking sites in previously known JNK substrates (hnRNP-K, PPM1J/PP2Czeta), as well as new JNK-interacting proteins (MLL4, NEIL1). Finally, we identified new D-site-dependent MAPK substrates, including the hedgehog-regulated transcription factors Gli1 and Gli3, suggesting that a direct connection between MAP kinase and hedgehog signaling may occur at the level of these key regulators. These results demonstrate that a genome-wide search for MAP kinase docking sites can be used to find new docking sites and substrates. PMID:20865152
De novo assembly, characterization and functional annotation of pineapple fruit transcriptome through massively parallel sequencing.

PubMed

Ong, Wen Dee; Voo, Lok-Yung Christopher; Kumar, Vijay Subbiah

2012-01-01

Pineapple (Ananas comosus var. comosus) is an important tropical non-climacteric fruit with high commercial potential. Understanding the mechanisms and processes underlying fruit ripening would enable scientists to improve quality traits such as flavor, texture, appearance and fruit sweetness. Although the pineapple is an important fruit, insufficient transcriptomic or genomic information is available in public databases. Application of high-throughput transcriptome sequencing to profile the pineapple fruit transcripts is therefore needed. To facilitate this, we have performed transcriptome sequencing of ripe yellow pineapple fruit flesh using Illumina technology. About 4.7 million Illumina paired-end reads were generated and assembled using the Velvet de novo assembler. The assembly produced 28,728 unique transcripts with a mean length of approximately 200 bp. A sequence similarity search against the non-redundant NCBI database identified a total of 16,932 unique transcripts (58.93%) with significant hits. Of these, 15,507 unique transcripts were assigned to gene ontology terms. Functional annotation against the Kyoto Encyclopedia of Genes and Genomes pathway database identified 13,598 unique transcripts (47.33%), which were mapped to 126 pathways. The assembly revealed many transcripts that were previously unknown. The unique transcripts derived from this work have rapidly increased the number of pineapple fruit mRNA transcripts now available in public databases. This information can be further utilized in gene expression, genomics and other functional genomics studies in pineapple.
De Novo Assembly, Characterization and Functional Annotation of Pineapple Fruit Transcriptome through Massively Parallel Sequencing

PubMed Central

Ong, Wen Dee; Voo, Lok-Yung Christopher; Kumar, Vijay Subbiah

2012-01-01

Background: Pineapple (Ananas comosus var. comosus) is an important tropical non-climacteric fruit with high commercial potential. Understanding the mechanisms and processes underlying fruit ripening would enable scientists to improve quality traits such as flavor, texture, appearance and fruit sweetness. Although the pineapple is an important fruit, insufficient transcriptomic or genomic information is available in public databases. Application of high-throughput transcriptome sequencing to profile the pineapple fruit transcripts is therefore needed. Methodology/Principal Findings: To facilitate this, we have performed transcriptome sequencing of ripe yellow pineapple fruit flesh using Illumina technology. About 4.7 million Illumina paired-end reads were generated and assembled using the Velvet de novo assembler. The assembly produced 28,728 unique transcripts with a mean length of approximately 200 bp. A sequence similarity search against the non-redundant NCBI database identified a total of 16,932 unique transcripts (58.93%) with significant hits. Of these, 15,507 unique transcripts were assigned to gene ontology terms. Functional annotation against the Kyoto Encyclopedia of Genes and Genomes pathway database identified 13,598 unique transcripts (47.33%), which were mapped to 126 pathways. The assembly revealed many transcripts that were previously unknown. Conclusions: The unique transcripts derived from this work have rapidly increased the number of pineapple fruit mRNA transcripts now available in public databases. This information can be further utilized in gene expression, genomics and other functional genomics studies in pineapple. PMID:23091603
Rice Annotation Project Database (RAP-DB): an integrative and interactive database for rice genomics.

PubMed

Sakai, Hiroaki; Lee, Sung Shin; Tanaka, Tsuyoshi; Numa, Hisataka; Kim, Jungsok; Kawahara, Yoshihiro; Wakimoto, Hironobu; Yang, Ching-chia; Iwamoto, Masao; Abe, Takashi; Yamada, Yuko; Muto, Akira; Inokuchi, Hachiro; Ikemura, Toshimichi; Matsumoto, Takashi; Sasaki, Takuji; Itoh, Takeshi

2013-02-01

The Rice Annotation Project Database (RAP-DB, http://rapdb.dna.affrc.go.jp/) has been providing a comprehensive set of gene annotations for the genome sequence of rice, Oryza sativa (japonica group) cv. Nipponbare. Since the first release in 2005, RAP-DB has been updated several times along with the genome assembly updates. Here, we present our newest RAP-DB based on the latest genome assembly, Os-Nipponbare-Reference-IRGSP-1.0 (IRGSP-1.0), which was released in 2011. We detected 37,869 loci by mapping transcript and protein sequences of 150 monocot species. To provide plant researchers with highly reliable and up-to-date rice gene annotations, we have been incorporating literature-based manually curated data, and 1,626 loci currently incorporate literature-based annotation data, including commonly used gene names or gene symbols. Transcriptional activities are shown at the nucleotide level by mapping RNA-Seq reads derived from 27 samples. We also mapped the Illumina reads of a leading Japanese japonica cultivar, Koshihikari, and a Chinese indica cultivar, Guangluai-4, to the genome and show alignments together with the single nucleotide polymorphisms (SNPs) and gene functional annotations through a newly developed browser, Short-Read Assembly Browser (S-RAB). We have developed two satellite databases, Plant Gene Family Database (PGFD) and Integrative Database of Cereal Gene Phylogeny (IDCGP), which display gene family and homologous gene relationships among diverse plant species. RAP-DB and the satellite databases offer simple and user-friendly web interfaces, enabling plant and genome researchers to access the data easily and facilitating a broad range of plant research topics.

Rice-Map: a new-generation rice genome browser.

PubMed

Wang, Jun; Kong, Lei; Zhao, Shuqi; Zhang, He; Tang, Liang; Li, Zhe; Gu, Xiaocheng; Luo, Jingchu; Gao, Ge

2011-03-30

The concurrent release of rice genome sequences for two subspecies (Oryza sativa L. ssp. japonica and Oryza sativa L. ssp. indica) facilitates rice studies at the whole genome level. Since the advent of high-throughput analysis, huge amounts of functional genomics data have been delivered rapidly, making an integrated online genome browser indispensable for scientists to visualize and analyze these data. Based on next-generation web technologies and high-throughput experimental data, we have developed Rice-Map, a novel genome browser for researchers to navigate, analyze and annotate rice genome interactively. More than one hundred annotation tracks (81 for japonica and 82 for indica) have been compiled and loaded into Rice-Map. These pre-computed annotations cover gene models, transcript evidences, expression profiling, epigenetic modifications, inter-species and intra-species homologies, genetic markers and other genomic features. In addition to these pre-computed tracks, registered users can interactively add comments and research notes to Rice-Map as User-Defined Annotation entries. By smoothly scrolling, dragging and zooming, users can browse various genomic features simultaneously at multiple scales. On-the-fly analysis for selected entries could be performed through dedicated bioinformatic analysis platforms such as WebLab and Galaxy. Furthermore, a BioMart-powered data warehouse "Rice Mart" is offered for advanced users to fetch bulk datasets based on complex criteria. Rice-Map delivers abundant up-to-date japonica and indica annotations, providing a valuable resource for both computational and bench biologists. Rice-Map is publicly accessible at http://www.ricemap.org/, with all data available for free downloading.
Activation of individual L1 retrotransposon instances is restricted to cell-type dependent permissive loci

PubMed Central

Philippe, Claude; Vargas-Landin, Dulce B; Doucet, Aurélien J; van Essen, Dominic; Vera-Otarola, Jorge; Kuciak, Monika; Corbin, Antoine; Nigumann, Pilvi; Cristofari, Gaël

2016-01-01

LINE-1 (L1) retrotransposons represent approximately one sixth of the human genome, but only the human-specific L1HS-Ta subfamily acts as an endogenous mutagen in modern humans, reshaping both somatic and germline genomes. Due to their high levels of sequence identity and the existence of many polymorphic insertions absent from the reference genome, the transcriptional activation of individual genomic L1HS-Ta copies remains poorly understood. Here we comprehensively mapped fixed and polymorphic L1HS-Ta copies in 12 commonly-used somatic cell lines, and identified transcriptional and epigenetic signatures allowing the unambiguous identification of active L1HS-Ta copies in their genomic context. Strikingly, only a very restricted subset of L1HS-Ta loci - some being polymorphic among individuals - significantly contributes to the bulk of L1 expression, and these loci are differentially regulated among distinct cell lines. Thus, our data support a local model of L1 transcriptional activation in somatic cells, governed by individual-, locus-, and cell-type-specific determinants. DOI: http://dx.doi.org/10.7554/eLife.13926.001 PMID:27016617
Mechanisms and dynamics of nuclear lamina-genome interactions.

PubMed

Amendola, Mario; van Steensel, Bas

2014-06-01

The nuclear lamina (NL) interacts with the genomic DNA and is thought to influence chromosome organization and gene expression. Both DNA sequences and histone modifications are important for NL tethering of the genomic DNA. These interactions are dynamic in individual cells and can change during differentiation and development. Evidence is accumulating that the NL contributes to the repression of transcription. Advances in mapping, genome-editing and microscopy techniques are increasing our understanding of the molecular mechanisms involved in NL-genome interactions. Copyright © 2014 Elsevier Ltd. All rights reserved.
Comparative analysis reveals genomic features of stress-induced transcriptional readthrough

PubMed Central

Vilborg, Anna; Sabath, Niv; Wiesel, Yuval; Nathans, Jenny; Levy-Adam, Flonia; Yario, Therese A.; Steitz, Joan A.; Shalgi, Reut

2017-01-01

Transcription is a highly regulated process, and stress-induced changes in gene transcription have been shown to play a major role in stress responses and adaptation. Genome-wide studies reveal prevalent transcription beyond known protein-coding gene loci, generating a variety of RNA classes, most of unknown function. One such class, termed downstream of gene-containing transcripts (DoGs), was reported to result from transcriptional readthrough upon osmotic stress in human cells. However, how widespread the readthrough phenomenon is, and what its causes and consequences are, remain elusive. Here we present a genome-wide mapping of transcriptional readthrough, using nuclear RNA-Seq, comparing heat shock, osmotic stress, and oxidative stress in NIH 3T3 mouse fibroblast cells. We observe massive induction of transcriptional readthrough, both in levels and length, under all stress conditions, with significant, yet not complete, overlap of readthrough-induced loci between different conditions. Importantly, our analyses suggest that stress-induced transcriptional readthrough is not a random failure process, but is rather differentially induced across different conditions. We explore potential regulators and find a role for HSF1 in the induction of a subset of heat shock-induced readthrough transcripts. Analysis of public datasets detected increases in polymerase II occupancy in DoG regions after heat shock, supporting our findings. Interestingly, DoGs tend to be produced in the vicinity of neighboring genes, leading to a marked increase in their antisense-generating potential. Finally, we examine genomic features of readthrough transcription and observe a unique chromatin signature typical of DoG-producing regions, suggesting that readthrough transcription is associated with the maintenance of an open chromatin state. PMID:28928151
Identification of novel non-coding small RNAs from Streptococcus pneumoniae TIGR4 using high-resolution genome tiling arrays

PubMed Central

2010-01-01

Background: The identification of non-coding transcripts in human, mouse, and Escherichia coli has revealed their widespread occurrence and functional importance in both eukaryotic and prokaryotic life. In prokaryotes, studies have shown that non-coding transcripts participate in a broad range of cellular functions like gene regulation, stress and virulence. However, very little is known about non-coding transcripts in Streptococcus pneumoniae (pneumococcus), an obligate human respiratory pathogen responsible for significant worldwide morbidity and mortality. Tiling microarrays enable genome-wide mRNA profiling as well as identification of novel transcripts at high resolution. Results: Here, we describe a high-resolution transcription map of the S. pneumoniae clinical isolate TIGR4 using genomic tiling arrays. Our results indicate that approximately 66% of the genome is expressed under our experimental conditions. We identified a total of 50 non-coding small RNAs (sRNAs) from the intergenic regions, of which 36 had no predicted function. Half of the identified sRNA sequences were found to be unique to the S. pneumoniae genome. We identified eight overrepresented sequence motifs among sRNA sequences that correspond to sRNAs in different functional categories. Tiling arrays also identified approximately 202 operon structures in the genome. Conclusions: In summary, the pneumococcal operon structures and novel sRNAs identified in this study enhance our understanding of the complexity and extent of the pneumococcal 'expressed' genome. Furthermore, the results of this study open up new avenues of research for understanding the complex RNA regulatory network governing S. pneumoniae physiology and virulence. PMID:20525227
Integrating and mining the chromatin landscape of cell-type specificity using self-organizing maps.

PubMed

Mortazavi, Ali; Pepke, Shirley; Jansen, Camden; Marinov, Georgi K; Ernst, Jason; Kellis, Manolis; Hardison, Ross C; Myers, Richard M; Wold, Barbara J

2013-12-01

We tested whether self-organizing maps (SOMs) could be used to effectively integrate, visualize, and mine diverse genomics data types, including complex chromatin signatures. A fine-grained SOM was trained on 72 ChIP-seq histone modifications and DNase-seq data sets from six biologically diverse cell lines studied by The ENCODE Project Consortium. We mined the resulting SOM to identify chromatin signatures related to sequence-specific transcription factor occupancy, sequence motif enrichment, and biological functions. To highlight clusters enriched for specific functions such as transcriptional promoters or enhancers, we overlaid onto the map additional data sets not used during training, such as ChIP-seq, RNA-seq, CAGE, and information on cis-acting regulatory modules from the literature. We used the SOM to parse known transcriptional enhancers according to the cell-type-specific chromatin signature, and we further corroborated this pattern on the map by EP300 (also known as p300) occupancy. New candidate cell-type-specific enhancers were identified for multiple ENCODE cell types in this way, along with new candidates for ubiquitous enhancer activity. An interactive web interface was developed to allow users to visualize and custom-mine the ENCODE SOM. We conclude that large SOMs trained on chromatin data from multiple cell types provide a powerful way to identify complex relationships in genomic data at user-selected levels of granularity.
Integrating and mining the chromatin landscape of cell-type specificity using self-organizing maps

PubMed Central

Mortazavi, Ali; Pepke, Shirley; Jansen, Camden; Marinov, Georgi K.; Ernst, Jason; Kellis, Manolis; Hardison, Ross C.; Myers, Richard M.; Wold, Barbara J.

2013-01-01

We tested whether self-organizing maps (SOMs) could be used to effectively integrate, visualize, and mine diverse genomics data types, including complex chromatin signatures. A fine-grained SOM was trained on 72 ChIP-seq histone modifications and DNase-seq data sets from six biologically diverse cell lines studied by The ENCODE Project Consortium. We mined the resulting SOM to identify chromatin signatures related to sequence-specific transcription factor occupancy, sequence motif enrichment, and biological functions. To highlight clusters enriched for specific functions such as transcriptional promoters or enhancers, we overlaid onto the map additional data sets not used during training, such as ChIP-seq, RNA-seq, CAGE, and information on cis-acting regulatory modules from the literature. We used the SOM to parse known transcriptional enhancers according to the cell-type-specific chromatin signature, and we further corroborated this pattern on the map by EP300 (also known as p300) occupancy. New candidate cell-type-specific enhancers were identified for multiple ENCODE cell types in this way, along with new candidates for ubiquitous enhancer activity. An interactive web interface was developed to allow users to visualize and custom-mine the ENCODE SOM. We conclude that large SOMs trained on chromatin data from multiple cell types provide a powerful way to identify complex relationships in genomic data at user-selected levels of granularity. PMID:24170599
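For readers unfamiliar with self-organizing maps, the kind of training loop that underlies the approach in the abstract above can be sketched briefly. This is a generic textbook SOM in plain Python, not the authors' code. The grid size, the linear decay schedules and the toy two-dimensional "signal vectors" are assumptions chosen only for illustration.

```python
import math
import random


def best_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda u: sum((wi - xi) ** 2 for wi, xi in zip(weights[u], x)),
    )


def train_som(data, rows, cols, n_iter=500, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM on a list of equal-length numeric vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for step in range(n_iter):
        frac = step / n_iter
        lr = lr0 * (1 - frac)                 # learning rate decays to 0
        sigma = sigma0 * (1 - frac) + 0.01    # neighbourhood radius shrinks
        x = rng.choice(data)
        bmu = best_unit(weights, x)
        for unit, w in weights.items():
            d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
            h = math.exp(-d2 / (2 * sigma * sigma))
            # Pull each unit toward the sample, weighted by grid distance to the BMU.
            for i in range(dim):
                w[i] += lr * h * (x[i] - w[i])
    return weights


# Toy input: two groups of regions with different (hypothetical) signal profiles.
profiles = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
som = train_som(profiles, rows=3, cols=3)
for p in profiles:
    print(p, "->", best_unit(som, p))
```

Data sets not used in training can then be overlaid by looking up the best-matching unit of each region and summarizing the extra signal per unit, which is the overlay idea the abstract describes.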
The primary transcriptome of the marine diazotroph Trichodesmium erythraeum IMS101

NASA Astrophysics Data System (ADS)

Pfreundt, Ulrike; Kopf, Matthias; Belkin, Natalia; Berman-Frank, Ilana; Hess, Wolfgang R.

2014-08-01

Blooms of the dinitrogen-fixing marine cyanobacterium Trichodesmium considerably contribute to new nitrogen inputs into tropical oceans. Intriguingly, only 60% of the Trichodesmium erythraeum IMS101 genome sequence codes for protein, compared with ~85% in other sequenced cyanobacterial genomes. The extensive non-coding genome fraction suggests space for an unusually high number of unidentified, potentially regulatory non-protein-coding RNAs (ncRNAs). To identify the transcribed fraction of the genome, here we present a genome-wide map of transcriptional start sites (TSS) at single nucleotide resolution, revealing the activity of 6,080 promoters. We demonstrate that T. erythraeum has the highest number of actively splicing group II introns and the highest percentage of TSS yielding ncRNAs of any bacterium examined to date. We identified a highly transcribed retroelement that serves as template repeat for the targeted mutation of at least 12 different genes by mutagenic homing. Our findings explain the non-coding portion of the T. erythraeum genome by the transcription of an unusually high number of non-coding transcripts in addition to the known high incidence of transposable elements. We conclude that riboregulation and RNA maturation-dependent processes constitute a major part of the Trichodesmium regulatory apparatus.
Issues in the analysis of oligonucleotide tiling microarrays for transcript mapping

NASA Technical Reports Server (NTRS)

Royce, Thomas E.; Rozowsky, Joel S.; Bertone, Paul; Samanta, Manoj; Stolc, Viktor; Weissman, Sherman; Snyder, Michael; Gerstein, Mark

2005-01-01

Traditional microarrays use probes complementary to known genes to quantitate the differential gene expression between two or more conditions. Genomic tiling microarray experiments differ in that probes that span a genomic region at regular intervals are used to detect the presence or absence of transcription. This difference means the same sets of biases and the methods for addressing them are unlikely to be relevant to both types of experiment. We introduce the informatics challenges arising in the analysis of tiling microarray experiments as open problems to the scientific community and present initial approaches for the analysis of this nascent technology.
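One simple way to turn regularly spaced probe intensities into calls of transcribed regions is to smooth the signal and merge neighbouring probes that pass a threshold. The Python sketch below illustrates that idea only. It is not the analysis proposed by the authors, and the threshold, window size, gap limit and intensity values are all assumed for the example.

```python
from statistics import median


def call_transcribed(positions, intensities, threshold, window=3, max_gap=50):
    """Return (start, end) segments where smoothed probe signal passes threshold.

    positions:   genomic coordinate of each probe, in increasing order
    intensities: one intensity per probe
    """
    half = window // 2
    # Sliding-window median reduces the effect of single noisy probes.
    smoothed = [
        median(intensities[max(0, i - half): i + half + 1])
        for i in range(len(intensities))
    ]
    segments = []
    for pos, value in zip(positions, smoothed):
        if value < threshold:
            continue
        if segments and pos - segments[-1][1] <= max_gap:
            segments[-1][1] = pos          # extend the current segment
        else:
            segments.append([pos, pos])    # start a new segment
    return [tuple(s) for s in segments]


# Probes every 25 bp; one dropout inside a transcribed block and one lone spike.
positions = [i * 25 for i in range(12)]
intensities = [1, 1, 9, 8, 1, 9, 9, 1, 1, 10, 1, 1]
print(call_transcribed(positions, intensities, threshold=5))
```

With these toy values the dropout inside the block is bridged and the isolated spike is ignored, which shows why per-probe thresholding alone is usually not enough for tiling data.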
Transcriptional activation is a conserved feature of the early embryonic factor Zelda that requires a cluster of four zinc fingers for DNA binding and a low-complexity activation domain.

PubMed

Hamm, Danielle C; Bondra, Eliana R; Harrison, Melissa M

2015-02-06

Delayed transcriptional activation of the zygotic genome is a nearly universal phenomenon in metazoans. Immediately following fertilization, development is controlled by maternally deposited products, and it is not until later stages that widespread activation of the zygotic genome occurs. Although the mechanisms driving this genome activation are currently unknown, the transcriptional activator Zelda (ZLD) has been shown to be instrumental in driving this process in Drosophila melanogaster. Here we define functional domains of ZLD required for both DNA binding and transcriptional activation. We show that the C-terminal cluster of four zinc fingers mediates binding to TAGteam DNA elements in the promoters of early expressed genes. All four zinc fingers are required for this activity, and splice isoforms lacking three of the four zinc fingers fail to activate transcription. These truncated splice isoforms dominantly suppress activation by the full-length, embryonically expressed isoform. We map the transcriptional activation domain of ZLD to a central region characterized by low complexity. Despite relatively little sequence conservation within this domain, ZLD orthologs from Drosophila virilis, Anopheles gambiae, and Nasonia vitripennis activate transcription in D. melanogaster cells. Transcriptional activation by these ZLD orthologs suggests that ZLD functions through conserved interactions with a protein cofactor(s). We have identified distinct DNA-binding and activation domains within the critical transcription factor ZLD that controls the initial activation of the zygotic genome. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
Complex multi-enhancer contacts captured by genome architecture mapping.

PubMed

Beagrie, Robert A; Scialdone, Antonio; Schueler, Markus; Kraemer, Dorothee C A; Chotalia, Mita; Xie, Sheila Q; Barbieri, Mariano; de Santiago, Inês; Lavitas, Liron-Mark; Branco, Miguel R; Fraser, James; Dostie, Josée; Game, Laurence; Dillon, Niall; Edwards, Paul A W; Nicodemi, Mario; Pombo, Ana

2017-03-23

The organization of the genome in the nucleus and the interactions of genes with their regulatory elements are key features of transcriptional control and their disruption can cause disease. Here we report a genome-wide method, genome architecture mapping (GAM), for measuring chromatin contacts and other features of three-dimensional chromatin topology on the basis of sequencing DNA from a large collection of thin nuclear sections. We apply GAM to mouse embryonic stem cells and identify enrichment for specific interactions between active genes and enhancers across very large genomic distances using a mathematical model termed SLICE (statistical inference of co-segregation). GAM also reveals an abundance of three-way contacts across the genome, especially between regions that are highly transcribed or contain super-enhancers, providing a level of insight into genome architecture that, owing to the technical limitations of current technologies, has previously remained unattainable. Furthermore, GAM highlights a role for gene-expression-specific contacts in organizing the genome in mammalian nuclei.
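The co-segregation idea behind GAM can be illustrated with a toy calculation: loci that are in contact should be detected in the same thin nuclear sections more often than expected by chance. The Python sketch below computes the co-segregation frequency of two loci and a simple linkage-style score (observed frequency minus the frequency expected under independence). It is an illustration under stated assumptions, not the SLICE model named in the abstract, and the locus names and section contents are invented.

```python
def cosegregation(sections, locus_a, locus_b):
    """Return (co-segregation frequency, observed minus expected frequency).

    sections: list of sets, each holding the loci detected in one nuclear section
    """
    n = len(sections)
    f_a = sum(locus_a in s for s in sections) / n
    f_b = sum(locus_b in s for s in sections) / n
    f_ab = sum(locus_a in s and locus_b in s for s in sections) / n
    return f_ab, f_ab - f_a * f_b


# Hypothetical detection data for six sections.
sections = [
    {"geneX", "enh1"},
    {"geneX", "enh1"},
    {"far"},
    {"geneX", "enh1", "far"},
    set(),
    {"far"},
]
print(cosegregation(sections, "geneX", "enh1"))  # frequently found together
print(cosegregation(sections, "geneX", "far"))   # rarely found together
```

A positive score suggests the two loci travel together across sections more than chance would predict. Extending the same counting to triplets of loci is one way to look for three-way contacts.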
Breaking the 1000-gene barrier for Mimivirus using ultra-deep genome and transcriptome sequencing.

PubMed

Legendre, Matthieu; Santini, Sébastien; Rico, Alain; Abergel, Chantal; Claverie, Jean-Michel

2011-03-04

Mimivirus, a giant dsDNA virus infecting Acanthamoeba, is the prototype of the Mimiviridae family, the latest addition to the family of the nucleocytoplasmic large DNA viruses (NCLDVs). Its 1.2-Mb genome was initially predicted to encode 917 genes. A subsequent RNA-Seq analysis precisely mapped many transcript boundaries and identified 75 new genes. We now report a much deeper analysis using the SOLiD™ technology combining RNA-Seq of the Mimivirus transcriptome during the infectious cycle (202.4 million reads), and a complete genome re-sequencing (45.3 million reads). This study corrected the genome sequence and identified several single nucleotide polymorphisms. Our results also provided clear evidence of previously overlooked transcription units, including an important RNA polymerase subunit distantly related to Euryarchea homologues. The total Mimivirus gene count is now 1018, 11% greater than the original annotation. This study highlights the huge progress brought about by ultra-deep sequencing for the comprehensive annotation of virus genomes, opening the door to a complete one-nucleotide resolution level description of their transcriptional activity, and to the realistic modeling of the viral genome expression at the ultimate molecular level. This work also illustrates the need to go beyond bioinformatics-only approaches for the annotation of short protein and non-coding genes in viral genomes.
Genome-wide DNA methylation map of human neutrophils reveals widespread inter-individual epigenetic variation

PubMed Central

Chatterjee, Aniruddha; Stockwell, Peter A.; Rodger, Euan J.; Duncan, Elizabeth J.; Parry, Matthew F.; Weeks, Robert J.; Morison, Ian M.

2015-01-01

The extent of variation in DNA methylation patterns in healthy individuals is not yet well documented. Identification of inter-individual epigenetic variation is important for understanding phenotypic variation and disease susceptibility. Using neutrophils from a cohort of healthy individuals, we generated base-resolution DNA methylation maps to document inter-individual epigenetic variation. We identified 12851 autosomal inter-individual variably methylated fragments (iVMFs). Gene promoters were the least variable, whereas gene body and upstream regions showed higher variation in DNA methylation. The iVMFs were relatively enriched in repetitive elements compared to non-iVMFs, and were associated with genome regulation and chromatin function elements. Further, variably methylated genes were disproportionately associated with regulation of transcription, responsive function and signal transduction pathways. Transcriptome analysis indicates that iVMF methylation at differentially expressed exons has a positive correlation and local effect on the inclusion of that exon in the mRNA transcript. PMID:26612583
Improving the genome annotation of the acarbose producer Actinoplanes sp. SE50/110 by sequencing enriched 5'-ends of primary transcripts.

PubMed

Schwientek, Patrick; Neshat, Armin; Kalinowski, Jörn; Klein, Andreas; Rückert, Christian; Schneiker-Bekel, Susanne; Wendler, Sergej; Stoye, Jens; Pühler, Alfred

2014-11-20

Actinoplanes sp. SE50/110 is the producer of the alpha-glucosidase inhibitor acarbose, which is an economically relevant and potent drug in the treatment of type-2 diabetes mellitus. In this study, we present the detection of transcription start sites on this genome by sequencing enriched 5'-ends of primary transcripts. Altogether, 1427 putative transcription start sites were initially identified. With help of the annotated genome sequence, 661 transcription start sites were found to belong to the leader region of protein-coding genes with the surprising result that roughly 20% of these genes rank among the class of leaderless transcripts. Next, conserved promoter motifs were identified for protein-coding genes with and without leader sequences. The mapped transcription start sites were finally used to improve the annotation of the Actinoplanes sp. SE50/110 genome sequence. Concerning protein-coding genes, 41 translation start sites were corrected and 9 novel protein-coding genes could be identified. In addition to this, 122 previously undetermined non-coding RNA (ncRNA) genes of Actinoplanes sp. SE50/110 were defined. Focusing on antisense transcription start sites located within coding genes or their leader sequences, it was discovered that 96 of those ncRNA genes belong to the class of antisense RNA (asRNA) genes. The remaining 26 ncRNA genes were found outside of known protein-coding genes. Four chosen examples of prominent ncRNA genes, namely the transfer messenger RNA gene ssrA, the ribonuclease P class A RNA gene rnpB, the cobalamin riboswitch RNA gene cobRS, and the selenocysteine-specific tRNA gene selC, are presented in more detail. This study demonstrates that sequencing of enriched 5'-ends of primary transcripts and the identification of transcription start sites are valuable tools for advanced genome annotation of Actinoplanes sp. SE50/110 and most probably also for other bacteria. Copyright © 2014 Elsevier B.V. All rights reserved.
Decomposing Oncogenic Transcriptional Signatures to Generate Maps of Divergent Cellular States.

PubMed

Kim, Jong Wook; Abudayyeh, Omar O; Yeerna, Huwate; Yeang, Chen-Hsiang; Stewart, Michelle; Jenkins, Russell W; Kitajima, Shunsuke; Konieczkowski, David J; Medetgul-Ernar, Kate; Cavazos, Taylor; Mah, Clarence; Ting, Stephanie; Van Allen, Eliezer M; Cohen, Ofir; Mcdermott, John; Damato, Emily; Aguirre, Andrew J; Liang, Jonathan; Liberzon, Arthur; Alexe, Gabriella; Doench, John; Ghandi, Mahmoud; Vazquez, Francisca; Weir, Barbara A; Tsherniak, Aviad; Subramanian, Aravind; Meneses-Cime, Karina; Park, Jason; Clemons, Paul; Garraway, Levi A; Thomas, David; Boehm, Jesse S; Barbie, David A; Hahn, William C; Mesirov, Jill P; Tamayo, Pablo

2017-08-23

The systematic sequencing of the cancer genome has led to the identification of numerous genetic alterations in cancer. However, a deeper understanding of the functional consequences of these alterations is necessary to guide appropriate therapeutic strategies. Here, we describe Onco-GPS (OncoGenic Positioning System), a data-driven analysis framework to organize individual tumor samples with shared oncogenic alterations onto a reference map defined by their underlying cellular states. We applied the methodology to the RAS pathway and identified nine distinct components that reflect transcriptional activities downstream of RAS and defined several functional states associated with patterns of transcriptional component activation that associates with genomic hallmarks and response to genetic and pharmacological perturbations. These results show that the Onco-GPS is an effective approach to explore the complex landscape of oncogenic cellular states across cancers, and an analytic framework to summarize knowledge, establish relationships, and generate more effective disease models for research or as part of individualized precision medicine paradigms. Copyright © 2017 Elsevier Inc. All rights reserved.
Identification of functional elements and regulatory circuits by Drosophila modENCODE

DOE Office of Scientific and Technical Information (OSTI.GOV)

Roy, Sushmita; Ernst, Jason; Kharchenko, Peter V.

2010-12-22

To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- andmore » tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation. Several years after the complete genetic sequencing of many species, it is still unclear how to translate genomic information into a functional map of cellular and developmental programs. The Encyclopedia of DNA Elements (ENCODE) (1) and model organism ENCODE (modENCODE) (2) projects use diverse genomic assays to comprehensively annotate the Homo sapiens (human), Drosophila melanogaster (fruit fly), and Caenorhabditis elegans (worm) genomes, through systematic generation and computational integration of functional genomic data sets. Previous genomic studies in flies have made seminal contributions to our understanding of basic biological mechanisms and genome functions, facilitated by genetic, experimental, computational, and manual annotation of the euchromatic and heterochromatic genome (3), small genome size, short life cycle, and a deep knowledge of development, gene function, and chromosome biology. 
The functions of approximately 40% of the protein and nonprotein-coding genes [FlyBase 5.12 (4)] have been determined from cDNA collections (5, 6), manual curation of gene models (7), gene mutations and comprehensive genome-wide RNA interference screens (8-10), and comparative genomic analyses (11, 12). The Drosophila modENCODE project has generated more than 700 data sets that profile transcripts, histone modifications and physical nucleosome properties, general and specific transcription factors (TFs), and replication programs in cell lines, isolated tissues, and whole organisms across several developmental stages (Fig. 1). Here, we computationally integrate these data sets and report (i) improved and additional genome annotations, including full-length protein-coding genes and peptides as short as 21 amino acids; (ii) noncoding transcripts, including 132 candidate structural RNAs and 1608 nonstructural transcripts; (iii) additional Argonaute (Ago)-associated small RNA genes and pathways, including new microRNAs (miRNAs) encoded within protein-coding exons and endogenous small interfering RNAs (siRNAs) from 3' untranslated regions; (iv) chromatin 'states' defined by combinatorial patterns of 18 chromatin marks that are associated with distinct functions and properties; (v) regions of high TF occupancy and replication activity with likely epigenetic regulation; (vi) mixed TF and miRNA regulatory networks with hierarchical structure and enriched feed-forward loops; (vii) coexpression- and co-regulation-based functional annotations for nearly 3000 genes; (viii) stage- and tissue-specific regulators; and (ix) predictive models of gene expression levels and regulator function.
Global Mapping of Transcription Factor Binding Sites by Sequencing Chromatin Surrogates: a Perspective on Experimental Design, Data Analysis, and Open Problems.

PubMed

Wei, Yingying; Wu, George; Ji, Hongkai

2013-05-01

Mapping genome-wide binding sites of all transcription factors (TFs) in all biological contexts is a critical step toward understanding gene regulation. The state-of-the-art technologies for mapping transcription factor binding sites (TFBSs) couple chromatin immunoprecipitation (ChIP) with high-throughput sequencing (ChIP-seq) or tiling array hybridization (ChIP-chip). These technologies have limitations: they are low-throughput with respect to surveying many TFs. Recent advances in genome-wide chromatin profiling, including development of technologies such as DNase-seq, FAIRE-seq and ChIP-seq for histone modifications, make it possible to predict in vivo TFBSs by analyzing chromatin features at computationally determined DNA motif sites. This promising new approach may allow researchers to monitor the genome-wide binding sites of many TFs simultaneously. In this article, we discuss various experimental design and data analysis issues that arise when applying this approach. Through a systematic analysis of the data from the Encyclopedia Of DNA Elements (ENCODE) project, we compare the predictive power of individual and combinations of chromatin marks using supervised and unsupervised learning methods, and evaluate the value of integrating information from public ChIP and gene expression data. We also highlight the challenges and opportunities for developing novel analytical methods, such as resolving the one-motif-multiple-TF ambiguity and distinguishing functional and non-functional TF binding targets from the predicted binding sites. The online version of this article (doi:10.1007/s12561-012-9066-5) contains supplementary material, which is available to authorized users.
C2H2 type of zinc finger transcription factors in foxtail millet define response to abiotic stresses.

PubMed

Muthamilarasan, Mehanathan; Bonthala, Venkata Suresh; Mishra, Awdhesh Kumar; Khandelwal, Rohit; Khan, Yusuf; Roy, Riti; Prasad, Manoj

2014-09-01

C2H2-type zinc finger transcription factors (TFs) play crucial roles in plant stress response and hormone signal transduction. Given their importance, genome-wide investigation and characterization of C2H2 zinc finger proteins have been performed in Arabidopsis, rice and poplar, but no such study had been conducted in foxtail millet, a C4 Panicoid model crop well known for its abiotic stress tolerance. The present study identified 124 C2H2-type zinc finger TFs in foxtail millet (SiC2H2) and physically mapped them onto the genome. The gene duplication analysis revealed that SiC2H2s primarily expanded in the genome through tandem duplication. The phylogenetic tree classified these TFs into five groups (I-V). Further, miRNAs targeting SiC2H2 transcripts in foxtail millet were identified. A heat map demonstrated differential and tissue-specific expression patterns of these SiC2H2 genes. Comparative physical mapping between foxtail millet SiC2H2 genes and their orthologs in sorghum, maize and rice revealed the evolutionary relationships of C2H2-type zinc finger TFs. The duplication and divergence data provided novel insight into the evolutionary aspects of these TFs in foxtail millet and related grass species. Expression profiling of candidate SiC2H2 genes in response to salinity, dehydration and cold stress showed differential expression patterns of these genes at different time points of stress.
CTCF-Mediated Human 3D Genome Architecture Reveals Chromatin Topology for Transcription.

PubMed

Tang, Zhonghui; Luo, Oscar Junhong; Li, Xingwang; Zheng, Meizhen; Zhu, Jacqueline Jufen; Szalaj, Przemyslaw; Trzaskoma, Pawel; Magalska, Adriana; Wlodarczyk, Jakub; Ruszczycki, Blazej; Michalski, Paul; Piecuch, Emaly; Wang, Ping; Wang, Danjuan; Tian, Simon Zhongyuan; Penrad-Mobayed, May; Sachs, Laurent M; Ruan, Xiaoan; Wei, Chia-Lin; Liu, Edison T; Wilczynski, Grzegorz M; Plewczynski, Dariusz; Li, Guoliang; Ruan, Yijun

2015-12-17

Spatial genome organization and its effect on transcription remains a fundamental question. We applied an advanced chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) strategy to comprehensively map higher-order chromosome folding and specific chromatin interactions mediated by CCCTC-binding factor (CTCF) and RNA polymerase II (RNAPII) with haplotype specificity and nucleotide resolution in different human cell lineages. We find that CTCF/cohesin-mediated interaction anchors serve as structural foci for spatial organization of constitutive genes concordant with CTCF-motif orientation, whereas RNAPII interacts within these structures by selectively drawing cell-type-specific genes toward CTCF foci for coordinated transcription. Furthermore, we show that haplotype variants and allelic interactions have differential effects on chromosome configuration, influencing gene expression, and may provide mechanistic insights into functions associated with disease susceptibility. 3D genome simulation suggests a model of chromatin folding around chromosomal axes, where CTCF is involved in defining the interface between condensed and open compartments for structural regulation. Our 3D genome strategy thus provides unique insights in the topological mechanism of human variations and diseases. Copyright © 2015 Elsevier Inc. All rights reserved.
Transcription Mapping of the Kaposi’s Sarcoma-Associated Herpesvirus (Human Herpesvirus 8) Genome in a Body Cavity-Based Lymphoma Cell Line (BC-1)

PubMed Central

Sarid, Ronit; Flore, Ornella; Bohenzky, Roy A.; Chang, Yuan; Moore, Patrick S.

1998-01-01

Kaposi’s sarcoma-associated herpesvirus (KSHV) gene transcription in the BC-1 cell line (KSHV and Epstein-Barr virus coinfected) was examined by using Northern analysis with DNA probes extending across the viral genome except for a 3-kb unclonable rightmost region. Three broad classes of viral gene transcription have been identified. Class I genes, such as those encoding the v-cyclin, latency-associated nuclear antigen, and v-FLIP, are constitutively transcribed under standard growth conditions, are unaffected by tetradecanoylphorbol acetate (TPA) induction, and presumably represent latent viral transcripts. Class II genes are primarily clustered in nonconserved regions of the genome and include small polyadenylated RNAs (T0.7 and T1.1) as well as most of the virus-encoded cytokines and signal transduction genes. Class II genes are transcribed without TPA treatment but are induced to higher transcription levels by TPA treatment. Class III genes are primarily structural and replication genes that are transcribed only following TPA treatment and are presumably responsible for lytic virion production. These results indicate that BC-1 cells have detectable transcription of a number of KSHV genes, particularly nonconserved genes involved in cellular signal transduction and regulation, during noninduced (latent) virus culture. PMID:9444993

mQTL-seq delineates functionally relevant candidate gene harbouring a major QTL regulating pod number in chickpea

PubMed Central

Das, Shouvik; Singh, Mohar; Srivastava, Rishi; Bajaj, Deepak; Saxena, Maneesha S.; Rana, Jai C.; Bansal, Kailash C.; Tyagi, Akhilesh K.; Parida, Swarup K.

2016-01-01

The present study used a whole-genome, NGS resequencing-based mQTL-seq (multiple QTL-seq) strategy in two inter-specific mapping populations (Pusa 1103 × ILWC 46 and Pusa 256 × ILWC 46) to scan the major genomic region(s) underlying QTL(s) governing pod number trait in chickpea. Essentially, the whole-genome resequencing of low and high pod number-containing parental accessions and homozygous individuals (constituting bulks) from each of these two mapping populations discovered >8 million high-quality homozygous SNPs with respect to the reference kabuli chickpea. The functional significance of the physically mapped SNPs was apparent from the identified 2,264 non-synonymous and 23,550 regulatory SNPs, with 8–10% of these SNPs-carrying genes corresponding to transcription factors and disease resistance-related proteins. The utilization of these mined SNPs in Δ (SNP index)-led QTL-seq analysis and their correlation between two mapping populations based on mQTL-seq, narrowed down two (CaqaPN4.1: 867.8 kb and CaqaPN4.2: 1.8 Mb) major genomic regions harbouring robust pod number QTLs into the high-resolution short QTL intervals (CaqbPN4.1: 637.5 kb and CaqbPN4.2: 1.28 Mb) on chickpea chromosome 4. The integration of mQTL-seq-derived one novel robust QTL with QTL region-specific association analysis delineated the regulatory (C/T) and coding (C/A) SNPs-containing one pentatricopeptide repeat (PPR) gene at a major QTL region regulating pod number in chickpea. This target gene exhibited anther, mature pollen and pod-specific expression, including pronounced higher up-regulated (∼3.5-folds) transcript expression in high pod number-containing parental accessions and homozygous individuals of two mapping populations especially during pollen and pod development. 
The proposed mQTL-seq-driven combinatorial strategy has profound efficacy in rapid genome-wide scanning of potential candidate gene(s) underlying trait-associated high-resolution robust QTL(s), thereby expediting genomics-assisted breeding and genetic enhancement of crop plants, including chickpea. PMID:26685680
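The Δ (SNP index) statistic mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual QTL-seq convention, which the abstract itself does not spell out: the SNP index of a bulk is the fraction of reads at a site that carry the non-reference allele, and Δ (SNP index) is the high-bulk index minus the low-bulk index. The function names and read counts below are hypothetical and are not taken from the study.

```python
def snp_index(alt_reads, total_reads):
    """Fraction of reads at one site that carry the non-reference allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def delta_snp_index(high_bulk, low_bulk):
    """Difference in SNP index between two bulks at the same site.

    Each bulk is an (alt_reads, total_reads) tuple. Values near +1 or -1
    suggest the site is linked to the trait; values near 0 suggest it is not.
    """
    return snp_index(*high_bulk) - snp_index(*low_bulk)


# Hypothetical read counts for one SNP in the high and low pod-number bulks.
high_pod_bulk = (15, 20)   # 15 of 20 reads carry the alternate allele
low_pod_bulk = (5, 20)     # 5 of 20 reads carry the alternate allele
print(delta_snp_index(high_pod_bulk, low_pod_bulk))
```

In practice the statistic is usually averaged over sliding windows along each chromosome and compared against a confidence interval before a QTL interval is called; that step is omitted here.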
Analysis of the complete nucleotide sequence and functional organization of the genome of Streptococcus pneumoniae bacteriophage Cp-1.

PubMed

Martín, A C; López, R; García, P

1996-06-01

Cp-1, a bacteriophage infecting Streptococcus pneumoniae, has a linear double-stranded DNA genome, with a terminal protein covalently linked to its 5' ends, that replicates by the protein-priming mechanism. We describe here the complete DNA sequence and transcriptional map of the Cp-1 genome. These analyses have led to the firm assignment of 10 genes and the localization of 19 additional open reading frames in the 19,345-bp Cp-1 DNA. Striking similarities and differences between some of these proteins and those of the Bacillus subtilis phage phi 29, a system that also replicates its DNA by the protein-priming mechanism, have been revealed. The genes coding for structural proteins and assembly factors are located in the central part of the Cp-1 genome. Several proteins corresponding to the predicted gene products were identified by in vitro and in vivo expression of the cloned genes. Mature major head protein from the virion particles results from hydrolysis of the primary gene product at the His-49 residue, whereas the phage gene is expressed in Escherichia coli without modification. We have also identified two open reading frames coding for proteins that show high degrees of similarity to the N- and C-terminal regions, respectively, of the single tail protein identified in phi 29. Sequencing and primer extension analysis suggest transcription of a small RNA showing a secondary structure similar to that of the prohead RNA required for the ATP-dependent packaging of phi 29 DNA. On the basis of its temporal expression, transcription of the Cp-1 genome takes place in two stages, early and late. Combined Northern (RNA) blot and primer extension experiments allowed us to map the 5' initiation sites of the transcripts, and we found that only three genes were transcribed from right to left. These analyses reveal that there are also noticeable differences between Cp-1 and phi 29 in transcriptional organization. 
Considered together, the observations reported here provide new tangible evidence on phylogenetic relationships between B. subtilis and S. pneumoniae.
Entering the Next Dimension: Plant Genomes in 3D.

PubMed

Sotelo-Silveira, Mariana; Chávez Montes, Ricardo A; Sotelo-Silveira, Jose R; Marsch-Martínez, Nayelli; de Folter, Stefan

2018-04-24

After linear sequences of genomes and epigenomic landscape data, the 3D organization of chromatin in the nucleus is the next level to be explored. Different organisms present a general hierarchical organization, with chromosome territories at the top. Chromatin interaction maps, obtained by chromosome conformation capture (3C)-based methodologies, for eight plant species reveal commonalities, but also differences, among them and with animals. The smallest structures, found in high-resolution maps of the Arabidopsis genome, are single genes. Epigenetic marks (histone modification and DNA methylation), transcriptional activity, and chromatin interaction appear to be correlated, and whether structure is the cause or consequence of the function of interacting regions is being actively investigated. Copyright © 2018 Elsevier Ltd. All rights reserved.
An Integrated Encyclopedia of DNA Elements in the Human Genome

PubMed Central

2012-01-01

Summary: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure, and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall the project provides new insights into the organization and regulation of our genes and genome, and an expansive resource of functional annotations for biomedical research. PMID:22955616
Necklace: combining reference and assembled transcriptomes for more comprehensive RNA-Seq analysis.

PubMed

Davidson, Nadia M; Oshlack, Alicia

2018-05-01

RNA sequencing (RNA-seq) analyses can benefit from performing a genome-guided and de novo assembly, in particular for species where the reference genome or the annotation is incomplete. However, tools for integrating an assembled transcriptome with reference annotation are lacking. Necklace is a software pipeline that runs genome-guided and de novo assembly and combines the resulting transcriptomes with reference genome annotations. Necklace constructs a compact but comprehensive superTranscriptome out of the assembled and reference data. Reads are subsequently aligned and counted in preparation for differential expression testing. Necklace allows a comprehensive transcriptome to be built from a combination of assembled and annotated transcripts, which results in a more comprehensive transcriptome for the majority of organisms. In addition RNA-seq data are mapped back to this newly created superTranscript reference to enable differential expression testing with standard methods.
Improving a Synechocystis-based photoautotrophic chassis through systematic genome mapping and validation of neutral sites

PubMed Central

Pinto, Filipe; Pacheco, Catarina C.; Oliveira, Paulo; Montagud, Arnau; Landels, Andrew; Couto, Narciso; Wright, Phillip C.; Urchueguía, Javier F.; Tamagnini, Paula

2015-01-01

The use of microorganisms as cell factories frequently requires extensive molecular manipulation. Therefore, the identification of genomic neutral sites for the stable integration of ectopic DNA is required to ensure a successful outcome. Here we describe the genome mapping and validation of five neutral sites in the chromosome of Synechocystis sp. PCC 6803, foreseeing the use of this cyanobacterium as a photoautotrophic chassis. To evaluate the neutrality of these loci, insertion/deletion mutants were produced, and to assess their functionality, a synthetic green fluorescent reporter module was introduced. The constructed integrative vectors include a BioBrick-compatible multiple cloning site insulated by transcription terminators, constituting robust cloning interfaces for synthetic biology approaches. Moreover, Synechocystis mutants (chassis) ready to receive purpose-built synthetic modules/circuits are also available. This work presents a systematic approach to map and validate chromosomal neutral sites in cyanobacteria, and that can be extended to other organisms. PMID:26490728
Gene Expression Profiling Reveals a Massive, Aneuploidy-Dependent Transcriptional Deregulation and Distinct Differences between Lymph Node–Negative and Lymph Node–Positive Colon Carcinomas

PubMed Central

Grade, Marian; Hörmann, Patrick; Becker, Sandra; Hummon, Amanda B.; Wangsa, Danny; Varma, Sudhir; Simon, Richard; Liersch, Torsten; Becker, Heinz; Difilippantonio, Michael J.; Ghadimi, B. Michael; Ried, Thomas

2016-01-01

To characterize patterns of global transcriptional deregulation in primary colon carcinomas, we did gene expression profiling of 73 tumors [Unio Internationale Contra Cancrum stage II (n = 33) and stage III (n = 40)] using oligonucleotide microarrays. For 30 of the tumors, expression profiles were compared with those from matched normal mucosa samples. We identified a set of 1,950 genes with highly significant deregulation between tumors and mucosa samples (P < 1e–7). A significant proportion of these genes mapped to chromosome 20 (P = 0.01). Seventeen genes had a >5-fold average expression difference between normal colon mucosa and carcinomas, including up-regulation of MYC and of HMGA1, a putative oncogene. Furthermore, we identified 68 genes that were significantly differentially expressed between lymph node–negative and lymph node–positive tumors (P < 0.001), the functional annotation of which revealed a preponderance of genes that play a role in cellular immune response and surveillance. The microarray-derived gene expression levels of 20 deregulated genes were validated using quantitative real-time reverse transcription-PCR in >40 tumor and normal mucosa samples with good concordance between the techniques. Finally, we established a relationship between specific genomic imbalances, which were mapped for 32 of the analyzed colon tumors by comparative genomic hybridization, and alterations of global transcriptional activity. Previously, we had conducted a similar analysis of primary rectal carcinomas. The systematic comparison of colon and rectal carcinomas revealed a significant overlap of genomic imbalances and transcriptional deregulation, including activation of the Wnt/β-catenin signaling cascade, suggesting similar pathogenic pathways. PMID:17210682
Gene expression profiling reveals a massive, aneuploidy-dependent transcriptional deregulation and distinct differences between lymph node-negative and lymph node-positive colon carcinomas.

PubMed

Grade, Marian; Hörmann, Patrick; Becker, Sandra; Hummon, Amanda B; Wangsa, Danny; Varma, Sudhir; Simon, Richard; Liersch, Torsten; Becker, Heinz; Difilippantonio, Michael J; Ghadimi, B Michael; Ried, Thomas

2007-01-01

To characterize patterns of global transcriptional deregulation in primary colon carcinomas, we did gene expression profiling of 73 tumors [Unio Internationale Contra Cancrum stage II (n = 33) and stage III (n = 40)] using oligonucleotide microarrays. For 30 of the tumors, expression profiles were compared with those from matched normal mucosa samples. We identified a set of 1,950 genes with highly significant deregulation between tumors and mucosa samples (P < 1e-7). A significant proportion of these genes mapped to chromosome 20 (P = 0.01). Seventeen genes had a >5-fold average expression difference between normal colon mucosa and carcinomas, including up-regulation of MYC and of HMGA1, a putative oncogene. Furthermore, we identified 68 genes that were significantly differentially expressed between lymph node-negative and lymph node-positive tumors (P < 0.001), the functional annotation of which revealed a preponderance of genes that play a role in cellular immune response and surveillance. The microarray-derived gene expression levels of 20 deregulated genes were validated using quantitative real-time reverse transcription-PCR in >40 tumor and normal mucosa samples with good concordance between the techniques. Finally, we established a relationship between specific genomic imbalances, which were mapped for 32 of the analyzed colon tumors by comparative genomic hybridization, and alterations of global transcriptional activity. Previously, we had conducted a similar analysis of primary rectal carcinomas. The systematic comparison of colon and rectal carcinomas revealed a significant overlap of genomic imbalances and transcriptional deregulation, including activation of the Wnt/beta-catenin signaling cascade, suggesting similar pathogenic pathways.
Draft genome of the lined seahorse, Hippocampus erectus.

PubMed

Lin, Qiang; Qiu, Ying; Gu, Ruobo; Xu, Meng; Li, Jia; Bian, Chao; Zhang, Huixian; Qin, Geng; Zhang, Yanhong; Luo, Wei; Chen, Jieming; You, Xinxin; Fan, Mingjun; Sun, Min; Xu, Pao; Venkatesh, Byrappa; Xu, Junming; Fu, Hongtuo; Shi, Qiong

2017-06-01

The lined seahorse, Hippocampus erectus, is an Atlantic species and mainly inhabits shallow sea beds or coral reefs. It has become very popular in China for its wide use in traditional Chinese medicine. In order to improve the aquaculture yield of this valuable fish species, we are trying to develop genomic resources for assistant selection in genetic breeding. Here, we provide whole genome sequencing, assembly, and gene annotation of the lined seahorse, which can enrich genome resource and further application for its molecular breeding. A total of 174.6 Gb (Gigabase) raw DNA sequences were generated by the Illumina Hiseq2500 platform. The final assembly of the lined seahorse genome is around 458 Mb, representing 94% of the estimated genome size (489 Mb by k-mer analysis). The contig N50 and scaffold N50 reached 14.57 kb and 1.97 Mb, respectively. Quality of the assembled genome was assessed by BUSCO with prediction of 85% of the known vertebrate genes and evaluated using the de novo assembled RNA-seq transcripts to prove a high mapping ratio (more than 99% transcripts could be mapped to the assembly). Using homology-based, de novo and transcriptome-based prediction methods, we predicted 20 788 protein-coding genes in the generated assembly, which is less than our previously reported gene number (23 458) of the tiger tail seahorse (H. comes). We report a draft genome of the lined seahorse. These generated genomic data are going to enrich genome resource of this economically important fish, and also provide insights into the genetic mechanisms of its iconic morphology and male pregnancy behavior. © The Authors 2017. Published by Oxford University Press.
Draft genome of the lined seahorse, Hippocampus erectus

PubMed Central

Lin, Qiang; Qiu, Ying; Gu, Ruobo; Xu, Meng; Li, Jia; Bian, Chao; Zhang, Huixian; Qin, Geng; Zhang, Yanhong; Luo, Wei; Chen, Jieming; You, Xinxin; Fan, Mingjun; Sun, Min; Xu, Pao; Venkatesh, Byrappa

2017-01-01

Abstract Background: The lined seahorse, Hippocampus erectus, is an Atlantic species and mainly inhabits shallow sea beds or coral reefs. It has become very popular in China for its wide use in traditional Chinese medicine. In order to improve the aquaculture yield of this valuable fish species, we are trying to develop genomic resources for assistant selection in genetic breeding. Here, we provide whole genome sequencing, assembly, and gene annotation of the lined seahorse, which can enrich genome resource and further application for its molecular breeding. Findings: A total of 174.6 Gb (Gigabase) raw DNA sequences were generated by the Illumina Hiseq2500 platform. The final assembly of the lined seahorse genome is around 458 Mb, representing 94% of the estimated genome size (489 Mb by k-mer analysis). The contig N50 and scaffold N50 reached 14.57 kb and 1.97 Mb, respectively. Quality of the assembled genome was assessed by BUSCO with prediction of 85% of the known vertebrate genes and evaluated using the de novo assembled RNA-seq transcripts to prove a high mapping ratio (more than 99% transcripts could be mapped to the assembly). Using homology-based, de novo and transcriptome-based prediction methods, we predicted 20 788 protein-coding genes in the generated assembly, which is less than our previously reported gene number (23 458) of the tiger tail seahorse (H. comes). Conclusion: We report a draft genome of the lined seahorse. These generated genomic data are going to enrich genome resource of this economically important fish, and also provide insights into the genetic mechanisms of its iconic morphology and male pregnancy behavior. PMID:28444302
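The contig and scaffold N50 values quoted in this record follow the standard definition: sort the sequences from longest to shortest and report the length at which the running total first reaches half of the total assembly length. The sketch below illustrates that definition and the assembly-completeness percentage; the toy contig lengths are hypothetical, and only the 458 Mb and 489 Mb figures come from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running sum to half the total
        # defines the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical contig lengths (arbitrary units), not data from the study.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70: 80 + 70 = 150, half of the 300 total

# Fraction of the estimated genome covered by the assembly (from the abstract).
assembly_mb, estimated_mb = 458, 489
print(round(100 * assembly_mb / estimated_mb))  # 94
```

A higher N50 indicates a more contiguous assembly, which is why the scaffold N50 (1.97 Mb) is much larger than the contig N50 (14.57 kb) here.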
Genomics of a Metamorphic Timing QTL: met1 Maps to a Unique Genomic Position and Regulates Morph and Species-Specific Patterns of Brain Transcription

PubMed Central

Page, Robert B.; Boley, Meredith A.; Kump, David K.; Voss, Stephen R.

2013-01-01

Very little is known about genetic factors that regulate life history transitions during ontogeny. Closely related tiger salamanders (Ambystoma species complex) show extreme variation in metamorphic timing, with some species foregoing metamorphosis altogether, an adaptive trait called paedomorphosis. Previous studies identified a major effect quantitative trait locus (met1) for metamorphic timing and expression of paedomorphosis in hybrid crosses between the biphasic Eastern tiger salamander (Ambystoma tigrinum tigrinum) and the paedomorphic Mexican axolotl (Ambystoma mexicanum). We used existing hybrid mapping panels and a newly created hybrid cross to map the met1 genomic region and determine the effect of met1 on larval growth, metamorphic timing, and gene expression in the brain. We show that met1 maps to the position of a urodele-specific chromosome rearrangement on linkage group 2 that uniquely brought functionally associated genes into linkage. Furthermore, we found that more than 200 genes were differentially expressed during larval development as a function of met1 genotype. This list of differentially expressed genes is enriched for proteins that function in the mitochondria, providing evidence of a link between met1, thyroid hormone signaling, and mitochondrial energetics associated with metamorphosis. Finally, we found that met1 significantly affected metamorphic timing in hybrids, but not early larval growth rate. Collectively, our results show that met1 regulates species and morph-specific patterns of brain transcription and life history variation. PMID:23946331
Sequencing, Annotation and Analysis of the Syrian Hamster (Mesocricetus auratus) Transcriptome

PubMed Central

Tchitchek, Nicolas; Safronetz, David; Rasmussen, Angela L.; Martens, Craig; Virtaneva, Kimmo; Porcella, Stephen F.; Feldmann, Heinz

2014-01-01

Background: The Syrian hamster (golden hamster, Mesocricetus auratus) is gaining importance as a new experimental animal model for multiple pathogens, including emerging zoonotic diseases such as Ebola. Nevertheless there are currently no publicly available transcriptome reference sequences or genome for this species. Results: A cDNA library derived from mRNA and snRNA isolated and pooled from the brains, lungs, spleens, kidneys, livers, and hearts of three adult female Syrian hamsters was sequenced. Sequence reads were assembled into 62,482 contigs and 111,796 reads remained unassembled (singletons). This combined contig/singleton dataset, designated as the Syrian hamster transcriptome, represents a total of 60,117,204 nucleotides. Our Mesocricetus auratus Syrian hamster transcriptome mapped to 11,648 mouse transcripts representing 9,562 distinct genes, and mapped to a similar number of transcripts and genes in the rat. We identified 214 quasi-complete transcripts based on mouse annotations. Canonical pathways involved in a broad spectrum of fundamental biological processes were significantly represented in the library. The Syrian hamster transcriptome was aligned to the current release of the Chinese hamster ovary (CHO) cell transcriptome and genome to improve the genomic annotation of this species. Finally, our Syrian hamster transcriptome was aligned against 14 other rodents, primate and laurasiatheria species to gain insights about the genetic relatedness and placement of this species. Conclusions: This Syrian hamster transcriptome dataset significantly improves our knowledge of the Syrian hamster's transcriptome, especially towards its future use in infectious disease research. Moreover, this library is an important resource for the wider scientific community to help improve genome annotation of the Syrian hamster and other closely related species. 
Furthermore, these data provide the basis for development of expression microarrays that can be used in functional genomics studies. PMID:25398096
Genome-Wide Mapping of Collier In Vivo Binding Sites Highlights Its Hierarchical Position in Different Transcription Regulatory Networks

PubMed Central

Dubois, Laurence; Bataillé, Laetitia; Painset, Anaïs; Le Gras, Stéphanie; Jost, Bernard; Crozatier, Michèle; Vincent, Alain

2015-01-01

Collier, the single Drosophila COE (Collier/EBF/Olf-1) transcription factor, is required in several developmental processes, including head patterning and specification of muscle and neuron identity during embryogenesis. To identify direct Collier (Col) targets in different cell types, we used ChIP-seq to map Col binding sites throughout the genome, at mid-embryogenesis. In vivo Col binding peaks were associated to 415 potential direct target genes. Gene Ontology analysis revealed a strong enrichment in proteins with DNA binding and/or transcription-regulatory properties. Characterization of a selection of candidates, using transgenic CRM-reporter assays, identified direct Col targets in dorso-lateral somatic muscles and specific neuron types in the central nervous system. These data brought new evidence that Col direct control of the expression of the transcription regulators apterous and eyes-absent (eya) is critical to specifying neuronal identities. They also showed that cross-regulation between col and eya in muscle progenitor cells is required for specification of muscle identity, revealing a new parallel between the myogenic regulatory networks operating in Drosophila and vertebrates. Col regulation of eya, both in specific muscle and neuronal lineages, may illustrate one mechanism behind the evolutionary diversification of Col biological roles. PMID:26204530
Genome-wide profiling of PRC1 and PRC2 Polycomb chromatin binding in Drosophila melanogaster.

PubMed

Tolhuis, Bas; de Wit, Elzo; Muijrers, Inhua; Teunissen, Hans; Talhout, Wendy; van Steensel, Bas; van Lohuizen, Maarten

2006-06-01

Polycomb group (PcG) proteins maintain transcriptional repression of developmentally important genes and have been implicated in cell proliferation and stem cell self-renewal. We used a genome-wide approach to map binding patterns of PcG proteins (Pc, esc and Sce) in Drosophila melanogaster Kc cells. We found that Pc associates with large genomic regions of up to approximately 150 kb in size, hereafter referred to as 'Pc domains'. Sce and esc accompany Pc in most of these domains. PcG-bound chromatin is trimethylated at histone H3 Lys27 and is generally transcriptionally silent. Furthermore, PcG proteins preferentially bind to developmental genes. Many of these encode transcriptional regulators and key components of signal transduction pathways, including Wingless, Hedgehog, Notch and Delta. We also identify several new putative functions of PcG proteins, such as in steroid hormone biosynthesis. These results highlight the extensive involvement of PcG proteins in the coordination of development through the formation of large repressive chromatin domains.
5. international workshop on the identification of transcribed sequences

DOE Office of Scientific and Technical Information (OSTI.GOV)

NONE

1995-12-31

This workshop was held November 5-8, 1995 in Les Embiez, France. The purpose of this conference was to provide a multidisciplinary forum for exchange of state-of-the-art information on mapping the human genome. Attention is focused on the following topics: transcriptional maps; functional analysis; techniques; model organisms; and tissue specific libraries and genes. Abstracts are included of the papers that were presented.
A Consensus Genetic Map for Pinus taeda and Pinus elliottii and Extent of Linkage Disequilibrium in Two Genotype-Phenotype Discovery Populations of Pinus taeda

PubMed Central

Westbrook, Jared W.; Chhatre, Vikram E.; Wu, Le-Shin; Chamala, Srikar; Neves, Leandro Gomide; Muñoz, Patricio; Martínez-García, Pedro J.; Neale, David B.; Kirst, Matias; Mockaitis, Keithanne; Nelson, C. Dana; Peter, Gary F.; Echt, Craig S.

2015-01-01

A consensus genetic map for Pinus taeda (loblolly pine) and Pinus elliottii (slash pine) was constructed by merging three previously published P. taeda maps with a map from a pseudo-backcross between P. elliottii and P. taeda. The consensus map positioned 3856 markers via genotyping of 1251 individuals from four pedigrees. It is the densest linkage map for a conifer to date. Average marker spacing was 0.6 cM and total map length was 2305 cM. Functional predictions of mapped genes were improved by aligning expressed sequence tags used for marker discovery to full-length P. taeda transcripts. Alignments to the P. taeda genome mapped 3305 scaffold sequences onto 12 linkage groups. The consensus genetic map was used to compare the genome-wide linkage disequilibrium in a population of distantly related P. taeda individuals (ADEPT2) used for association genetic studies and a multiple-family pedigree used for genomic selection (CCLONES). The prevalence and extent of LD was greater in CCLONES as compared to ADEPT2; however, extended LD with LGs or between LGs was rare in both populations. The average squared correlations, r², between SNP alleles less than 1 cM apart were less than 0.05 in both populations and r² did not decay substantially with genetic distance. The consensus map and analysis of linkage disequilibrium establish a foundation for comparative association mapping and genomic selection in P. taeda and P. elliottii. PMID:26068575
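As a rough illustration of the squared-correlation LD statistic mentioned in this abstract, the sketch below computes the squared Pearson correlation between two SNPs coded as allele dosages (0, 1 or 2 copies of one allele). The abstract does not say which estimator the authors used, so the dosage-based formula, the function name `r_squared` and the toy genotypes are assumptions made only for illustration.

```python
from statistics import mean


def r_squared(dosage_a, dosage_b):
    """Squared Pearson correlation between two SNPs coded as allele dosages (0, 1, 2)."""
    if len(dosage_a) != len(dosage_b) or not dosage_a:
        raise ValueError("dosage vectors must be non-empty and of equal length")
    mean_a, mean_b = mean(dosage_a), mean(dosage_b)
    # Sums of cross-products and squares about the means (the 1/n factors cancel in the ratio).
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosage_a, dosage_b))
    var_a = sum((a - mean_a) ** 2 for a in dosage_a)
    var_b = sum((b - mean_b) ** 2 for b in dosage_b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # a monomorphic SNP has no defined correlation
    return cov * cov / (var_a * var_b)


# Toy genotypes (hypothetical): identical SNPs versus statistically independent SNPs.
linked = r_squared([0, 1, 2, 1], [0, 1, 2, 1])
unlinked = r_squared([0, 2, 0, 2], [0, 0, 2, 2])

# The reported average spacing follows from the map totals quoted in the abstract.
average_spacing_cm = 2305 / 3856
```

With these toy inputs, identical dosage vectors give a value of 1 and the independent pair gives 0; the spacing computed from the quoted totals rounds to the 0.6 cM stated in the abstract.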
Construction of an Integrated High Density Simple Sequence Repeat Linkage Map in Cultivated Strawberry (Fragaria × ananassa) and its Applicability

PubMed Central

Isobe, Sachiko N.; Hirakawa, Hideki; Sato, Shusei; Maeda, Fumi; Ishikawa, Masami; Mori, Toshiki; Yamamoto, Yuko; Shirasawa, Kenta; Kimura, Mitsuhiro; Fukami, Masanobu; Hashizume, Fujio; Tsuji, Tomoko; Sasamoto, Shigemi; Kato, Midori; Nanri, Keiko; Tsuruoka, Hisano; Minami, Chiharu; Takahashi, Chika; Wada, Tsuyuko; Ono, Akiko; Kawashima, Kumiko; Nakazaki, Naomi; Kishida, Yoshie; Kohara, Mitsuyo; Nakayama, Shinobu; Yamada, Manabu; Fujishiro, Tsunakazu; Watanabe, Akiko; Tabata, Satoshi

2013-01-01

The cultivated strawberry (Fragaria × ananassa) is an octoploid (2n = 8x = 56) of the Rosaceae family whose genomic architecture is still controversial. Several recent studies support the AAA′A′BBB′B′ model, but its complexity has hindered genetic and genomic analysis of this important crop. To overcome this difficulty and to assist genome-wide analysis of F. × ananassa, we constructed an integrated linkage map by organizing a total of 4474 simple sequence repeat (SSR) markers collected from published Fragaria sequences, including 3746 SSR markers [Fragaria vesca expressed sequence tag (EST)-derived SSR markers] derived from F. vesca ESTs, 603 markers (F. × ananassa EST-derived SSR markers) from F. × ananassa ESTs, and 125 markers (F. × ananassa transcriptome-derived SSR markers) from F. × ananassa transcripts. Along with the previously published SSR markers, these markers were mapped onto five parent-specific linkage maps derived from three mapping populations, which were then assembled into an integrated linkage map. The constructed map consists of 1856 loci in 28 linkage groups (LGs) that total 2364.1 cM in length. Macrosynteny at the chromosome level was observed between the LGs of F. × ananassa and the genome of F. vesca. Variety distinction on 129 F. × ananassa lines was demonstrated using 45 selected SSR markers. PMID:23248204
Redefining the genetics of Murine Gammaherpesvirus 68 via transcriptome-based annotation

PubMed Central

Johnson, L. Steven; Willert, Erin K.; Virgin, Herbert W.

2010-01-01

Summary Viral genetic studies often focus on large open reading frames (ORFs) identified during genome annotation (ORF-based annotation). Here we provide a tool and software set for defining gene expression by murine gammaherpesvirus 68 (γHV68) nucleotide-by-nucleotide across the 119,450 basepair (bp) genome. These tools allowed us to determine that viral RNA expression was significantly more complex than predicted from ORF-based annotation, including over 73,000 nucleotides of unexpected transcription within 30 expressed genomic regions (EGRs). Approximately 90% of this RNA expression was antisense to genomic regions containing known large ORFs. We verified the existence of novel transcripts in three EGRs using standard methods to validate the approach and determined which parts of the transcriptome depend on protein or viral DNA synthesis. This redefines the genetic map of γHV68, indicates that herpesviruses contain significantly more genetic complexity than predicted from ORF-based genome annotations, and provides new tools and approaches for viral genetic studies. PMID:20542255
Analyses of a whole-genome inter-clade recombination map of hepatitis delta virus suggest a host polymerase-driven and viral RNA structure-promoted template-switching mechanism for viral RNA recombination

PubMed Central

Chao, Mei; Wang, Tzu-Chi; Lin, Chia-Chi; Yung-Liang Wang, Robert; Lin, Wen-Bin; Lee, Shang-En; Cheng, Ying-Yu; Yeh, Chau-Ting; Iang, Shan-Bei

2017-01-01

The genome of hepatitis delta virus (HDV) is a 1.7-kb single-stranded circular RNA that folds into an unbranched rod-like structure and has ribozyme activity. HDV redirects host RNA polymerase(s) (RNAP) to perform viral RNA-directed RNA transcription. RNA recombination is known to contribute to the genetic heterogeneity of HDV, but its molecular mechanism is poorly understood. Here, we established a whole-genome HDV-1/HDV-4 recombination map using two cloned sequences coexisting in cultured cells. Our functional analyses of the resulting chimeric delta antigens (the only viral-encoded protein) and recombinant genomes provide insights into how recombination promotes the genotypic and phenotypic diversity of HDV. Our examination of crossover distribution and subsequent mutagenesis analyses demonstrated that ribozyme activity on HDV genome, which is required for viral replication, also contributes to the generation of an inter-clade junction. These data provide circumstantial evidence supporting our contention that HDV RNA recombination occurs via a replication-dependent mechanism. Furthermore, we identify an intrinsic asymmetric bulge on the HDV genome, which appears to promote recombination events in the vicinity. We therefore propose a mammalian RNAP-driven and viral-RNA-structure-promoted template-switching mechanism for HDV genetic recombination. The present findings improve our understanding of the capacities of the host RNAP beyond typical DNA-directed transcription. PMID:28977829
A brief introduction to web-based genome browsers.

PubMed

Wang, Jun; Kong, Lei; Gao, Ge; Luo, Jingchu

2013-03-01

A genome browser provides a graphical interface for users to browse, search, retrieve and analyze genomic sequence and annotation data. Web-based genome browsers can be classified into general genome browsers with multiple species and species-specific genome browsers. In this review, we attempt to give an overview of the main functions and features of web-based genome browsers, covering data visualization, retrieval, analysis and customization. To give a brief introduction to the multiple-species genome browser, we describe the user interface and main functions of the Ensembl and UCSC genome browsers using the human alpha-globin gene cluster as an example. We further use the MSU and the Rice-Map genome browsers to show some special features of species-specific genome browsers, taking a rice transcription factor gene OsSPL14 as an example.

Genome-Wide Analysis of Alternative Splicing Landscapes Modulated during Plant-Virus Interactions in Brachypodium distachyon

PubMed Central

Scholthof, Karen-Beth G.

2015-01-01

In eukaryotes, alternative splicing (AS) promotes transcriptome and proteome diversity. The extent of genome-wide AS changes occurring during a plant-microbe interaction is largely unknown. Here, using high-throughput, paired-end RNA sequencing, we generated an isoform-level spliceome map of Brachypodium distachyon infected with Panicum mosaic virus and its satellite virus. Overall, we detected ∼44,443 transcripts in B. distachyon, ∼30% more than those annotated in the reference genome. Expression of ∼28,900 transcripts was ≥2 fragments per kilobase of transcript per million mapped fragments, and ∼42% of multi-exonic genes were alternatively spliced. Comparative analysis of AS patterns in B. distachyon, rice (Oryza sativa), maize (Zea mays), sorghum (Sorghum bicolor), Arabidopsis thaliana, potato (Solanum tuberosum), Medicago truncatula, and poplar (Populus trichocarpa) revealed conserved ratios of the AS types between monocots and dicots. Virus infection quantitatively altered AS events in Brachypodium with little effect on the AS ratios. We discovered AS events for >100 immune-related genes encoding receptor-like kinases, NB-LRR resistance proteins, transcription factors, RNA silencing, and splicing-associated proteins. Cloning and molecular characterization of SCL33, a serine/arginine-rich splicing factor, identified multiple novel intron-retaining splice variants that are developmentally regulated and modulated during virus infection. B. distachyon SCL33 splicing patterns are also strikingly conserved compared with a distant Arabidopsis SCL33 ortholog. This analysis provides new insights into AS landscapes conserved among monocots and dicots and uncovered AS events in plant defense-related genes. PMID:25634987
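The expression threshold in this abstract is stated in fragments per kilobase of transcript per million mapped fragments (FPKM). A minimal sketch of that normalization follows; the function name `fpkm` and the example counts are hypothetical, and real pipelines often apply further corrections (for example, effective transcript length) that are not shown here.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # Divide by length in kb (length / 1e3) and by library size in millions (total / 1e6).
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical transcript: 200 fragments on a 2 kb transcript in a 50-million-fragment library.
example = fpkm(200, 2_000, 50_000_000)
passes_threshold = example >= 2  # the ">= 2 FPKM" cutoff quoted in the abstract
```

In this made-up case the transcript sits exactly at the cutoff of 2 FPKM used in the abstract.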
Tissue-Specific Chromatin Modifications at a Multigene Locus Generate Asymmetric Transcriptional Interactions

PubMed Central

Yoo, Eung Jae; Cajiao, Isabela; Kim, Jeong-Seon; Kimura, Atsushi P.; Zhang, Aiwen; Cooke, Nancy E.; Liebhaber, Stephen A.

2006-01-01

Random assortment within mammalian genomes juxtaposes genes with distinct expression profiles. This organization, along with the prevalence of long-range regulatory controls, generates a potential for aberrant transcriptional interactions. The human CD79b/GH locus contains six tightly linked genes with three mutually exclusive tissue specificities and interdigitated control elements. One consequence of this compact organization is that the pituitary cell-specific transcriptional events that activate hGH-N also trigger ectopic activation of CD79b. However, the B-cell-specific events that activate CD79b do not trigger reciprocal activation of hGH-N. Here we utilized DNase I hypersensitive site mapping, chromatin immunoprecipitation, and transgenic models to explore the basis for this asymmetric relationship. The results reveal tissue-specific patterns of chromatin structures and transcriptional controls at the CD79b/GH locus in B cells distinct from those in the pituitary gland and placenta. These three unique transcriptional environments suggest a set of corresponding gene expression pathways and transcriptional interactions that are likely to be found juxtaposed at multiple sites within the eukaryotic genome. PMID:16847312
QuickMap: a public tool for large-scale gene therapy vector insertion site mapping and analysis.

PubMed

Appelt, J-U; Giordano, F A; Ecker, M; Roeder, I; Grund, N; Hotz-Wagenblatt, A; Opelz, G; Zeller, W J; Allgayer, H; Fruehauf, S; Laufs, S

2009-07-01

Several events of insertional mutagenesis in pre-clinical and clinical gene therapy studies have created intense interest in assessing the genomic insertion profiles of gene therapy vectors. For the construction of such profiles, vector-flanking sequences detected by inverse PCR, linear amplification-mediated-PCR or ligation-mediated-PCR need to be mapped to the host cell's genome and compared to a reference set. Although remarkable progress has been achieved in mapping gene therapy vector insertion sites, public reference sets are lacking, as are the possibilities to quickly detect non-random patterns in experimental data. We developed a tool termed QuickMap, which uniformly maps and analyzes human and murine vector-flanking sequences within seconds (available at www.gtsg.org). Besides information about hits in chromosomes and fragile sites, QuickMap automatically determines insertion frequencies in +/- 250 kb adjacency to genes, cancer genes, pseudogenes, transcription factor and (post-transcriptional) miRNA binding sites, CpG islands and repetitive elements (short interspersed nuclear elements (SINE), long interspersed nuclear elements (LINE), Type II elements and LTR elements). Additionally, all experimental frequencies are compared with the data obtained from a reference set, containing 1 000 000 random integrations ('random set'). Thus, for the first time a tool allowing high-throughput profiling of gene therapy vector insertion sites is available. It provides a basis for large-scale insertion site analyses, which is now urgently needed to discover novel gene therapy vectors with 'safe' insertion profiles.
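The comparison described in this abstract (insertion frequencies within +/- 250 kb of annotated features, set against a random reference set) can be sketched roughly as below. This is not QuickMap's implementation: the function names, the tuple layouts and all coordinates are hypothetical and serve only to show the windowed counting idea.

```python
from bisect import bisect_right

WINDOW_BP = 250_000  # +/- 250 kb window quoted in the abstract


def build_padded_index(features, window=WINDOW_BP):
    """Pad each (chrom, start, end) feature by `window` and merge overlaps per chromosome."""
    per_chrom = {}
    for chrom, start, end in features:
        per_chrom.setdefault(chrom, []).append((start - window, end + window))
    index = {}
    for chrom, intervals in per_chrom.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([iv[0] for iv in merged], merged)
    return index


def fraction_near_features(sites, index):
    """Fraction of (chrom, pos) insertion sites that fall inside any padded interval."""
    if not sites:
        return 0.0
    hits = 0
    for chrom, pos in sites:
        if chrom not in index:
            continue
        starts, merged = index[chrom]
        i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
        if i >= 0 and pos <= merged[i][1]:
            hits += 1
    return hits / len(sites)


# Toy, made-up coordinates for illustration only.
genes = [("chr1", 1_000_000, 1_010_000)]
observed = [("chr1", 800_000), ("chr1", 700_000), ("chr2", 1_000_000), ("chr1", 1_260_000)]
random_set = [("chr1", 100_000), ("chr1", 900_000), ("chr1", 5_000_000), ("chr2", 42)]

index = build_padded_index(genes)
observed_fraction = fraction_near_features(observed, index)
random_fraction = fraction_near_features(random_set, index)
```

Merging the padded intervals first keeps each lookup to a single binary search, which matters when the reference set is as large as the one million random integrations mentioned in the abstract. A statistical test on the two fractions would be a natural next step but is left out of this sketch.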
Transcriptome Analysis of Cotton (Gossypium hirsutum L.) Genotypes That Are Susceptible, Resistant, and Hypersensitive to Reniform Nematode (Rotylenchulus reniformis).

PubMed

Li, Ruijuan; Rashotte, Aaron M; Singh, Narendra K; Lawrence, Kathy S; Weaver, David B; Locy, Robert D

2015-01-01

Reniform nematode is a semi-endoparasitic nematode species causing significant yield loss in numerous crops, including cotton (Gossypium hirsutum L.). An RNA-sequencing analysis was conducted to measure transcript abundance in reniform nematode susceptible (DP90 & SG747), resistant (BARBREN-713), and hypersensitive (LONREN-1) genotypes of cotton (Gossypium hirsutum L.) with and without reniform nematode infestation. Over 90 million trimmed high quality reads were assembled into 84,711 and 80,353 transcripts using the G. arboreum and the G. raimondii genomes as references. Many transcripts were significantly differentially expressed between the three different genotypes both prior to and during nematode pathogenesis, including transcripts corresponding to the gene ontology categories of cell wall, hormone metabolism and signaling, redox reactions, secondary metabolism, transcriptional regulation, stress responses, and signaling. Further analysis revealed that a number of these differentially expressed transcripts mapped to the G. raimondii and/or the G. arboreum genomes within 1 megabase of quantitative trait loci that had previously been linked to reniform nematode resistance. Several resistance genes encoding proteins known to be strongly linked to pathogen perception and resistance, including LRR-like and NBS-LRR domain-containing proteins, were among the differentially expressed transcripts mapping near these quantitative trait loci. Further investigation is required to confirm a role for these transcripts in reniform nematode susceptibility, hypersensitivity, and/or resistance. This study presents the first systemic investigation of reniform nematode resistance-associated genes using different genotypes of cotton. The candidate reniform nematode resistance-associated genes identified in this study can serve as the basis for further functional analysis and aid in further development of reniform nematode-resistant cotton germplasm.
Integrative genetic analysis of transcription modules: towards filling the gap between genetic loci and inherited traits

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Hongqiang; Chen, Hao; Bao, Lei

2005-01-01

Genetic loci that regulate inherited traits are routinely identified using quantitative trait locus (QTL) mapping methods. However, the genotype-phenotype associations do not provide information on the gene expression program through which the genetic loci regulate the traits. Transcription modules are 'self-consistent regulatory units' and are closely related to the modular components of gene regulatory network [Ihmels, J., Friedlander, G., Bergmann, S., Sarig, O., Ziv, Y. and Barkai, N. (2002) Revealing modular organization in the yeast transcriptional network. Nat. Genet., 31, 370-377; Segal, E., Shapira, M., Regev, A., Pe'er, D., Botstein, D., Koller, D. and Friedman, N. (2003) Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat. Genet., 34, 166-176]. We used genome-wide genotype and gene expression data of a genetic reference population that consists of mice of 32 recombinant inbred strains to identify the transcription modules and the genetic loci regulating them. Twenty-nine transcription modules defined by genetic variations were identified. Statistically significant associations between the transcription modules and 18 classical physiological and behavioral traits were found. Genome-wide interval mapping showed that major QTLs regulating the transcription modules are often co-localized with the QTLs regulating the associated classical traits. The association and the possible co-regulation of the classical trait and transcription module indicate that the transcription module may be involved in the gene pathways connecting the QTL and the classical trait. Our results show that a transcription module may associate with multiple seemingly unrelated classical traits and a classical trait may associate with different modules. Literature mining results provided strong independent evidence for the relations among genes of the transcription modules, genes in the regions of the QTLs regulating the transcription modules and the keywords representing the classical traits.
Genome-wide inference of regulatory networks in Streptomyces coelicolor.

PubMed

Castro-Melchor, Marlene; Charaniya, Salim; Karypis, George; Takano, Eriko; Hu, Wei-Shou

2010-10-18

The onset of antibiotics production in Streptomyces species is co-ordinated with differentiation events. An understanding of the genetic circuits that regulate these coupled biological phenomena is essential to discover and engineer the pharmacologically important natural products made by these species. The availability of genomic tools and access to a large warehouse of transcriptome data for the model organism, Streptomyces coelicolor, provides incentive to decipher the intricacies of the regulatory cascades and develop biologically meaningful hypotheses. In this study, more than 500 samples of genome-wide temporal transcriptome data, comprising wild-type and more than 25 regulatory gene mutants of Streptomyces coelicolor probed across multiple stress and medium conditions, were investigated. Information based on transcript and functional similarity was used to update a previously-predicted whole-genome operon map and further applied to predict transcriptional networks constituting modules enriched in diverse functions such as secondary metabolism, and sigma factor. The predicted network displays a scale-free architecture with a small-world property observed in many biological networks. The networks were further investigated to identify functionally-relevant modules that exhibit functional coherence and a consensus motif in the promoter elements indicative of DNA-binding elements. Despite the enormous experimental as well as computational challenges, a systems approach for integrating diverse genome-scale datasets to elucidate complex regulatory networks is beginning to emerge. We present an integrated analysis of transcriptome data and genomic features to refine a whole-genome operon map and to construct regulatory networks at the cistron level in Streptomyces coelicolor. The functionally-relevant modules identified in this study pose as potential targets for further studies and verification.
Comprehensive annotation of Glossina pallidipes salivary gland hypertrophy virus from Ethiopian tsetse flies: a proteogenomics approach

PubMed Central

Kariithi, Henry M.; Cousserans, François; Parker, Nicolas J.; İnce, İkbal Agah; Scully, Erin D.; Boeren, Sjef; Geib, Scott M.; Mekonnen, Solomon; Vlak, Just M.; Parker, Andrew G.; Vreysen, Marc J. B.; Bergoin, Max

2016-01-01

Glossina pallidipes salivary gland hypertrophy virus (GpSGHV; family Hytrosaviridae) can establish asymptomatic and symptomatic infection in its tsetse fly host. Here, we present a comprehensive annotation of the genome of an Ethiopian GpSGHV isolate (GpSGHV-Eth) compared with the reference Ugandan GpSGHV isolate (GpSGHV-Uga; GenBank accession number EF568108). GpSGHV-Eth has higher salivary gland hypertrophy syndrome prevalence than GpSGHV-Uga. We show that the GpSGHV-Eth genome has 190 291 nt, a low G+C content (27.9 %) and encodes 174 putative ORFs. Using proteogenomic and transcriptome mapping, 141 and 86 ORFs were mapped by transcripts and peptides, respectively. Furthermore, of the 174 ORFs, 132 had putative transcriptional signals [TATA-like box and poly(A) signals]. Sixty ORFs had both TATA-like box promoter and poly(A) signals, and mapped by both transcripts and peptides, implying that these ORFs encode functional proteins. Of the 60 ORFs, 10 ORFs are homologues to baculovirus and nudivirus core genes, including three per os infectivity factors and four RNA polymerase subunits (LEF4, 5, 8 and 9). Whereas GpSGHV-Eth and GpSGHV-Uga are 98.1 % similar at the nucleotide level, 37 ORFs in the GpSGHV-Eth genome had nucleotide insertions (n = 17) and deletions (n = 20) compared with their homologues in GpSGHV-Uga. Furthermore, compared with the GpSGHV-Uga genome, 11 and 24 GpSGHV ORFs were deleted and novel, respectively. Further, 13 GpSGHV-Eth ORFs were non-canonical; they had either CTG or TTG start codons instead of ATG. Taken together, these data suggest that GpSGHV-Eth and GpSGHV-Uga represent two different lineages of the same virus. Genetic differences combined with host and environmental factors possibly explain the differential GpSGHV pathogenesis observed in different G. pallidipes colonies. PMID:26801744
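Two of the quantities quoted in this abstract, G+C content and the classification of ORFs by start codon (ATG versus the non-canonical CTG or TTG), are simple to compute from sequence. The helper names and the toy sequences below are hypothetical, and the sketch makes no claim about how the authors performed their annotation.

```python
NONCANONICAL_STARTS = {"CTG", "TTG"}  # alternative start codons reported in the abstract


def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def start_codon_class(orf):
    """Label an ORF by its first codon: 'canonical', 'non-canonical' or 'other'."""
    codon = orf[:3].upper()
    if codon == "ATG":
        return "canonical"
    if codon in NONCANONICAL_STARTS:
        return "non-canonical"
    return "other"


# Toy sequences, not taken from the GpSGHV genome.
toy_gc = gc_content("ATGC")
labels = [start_codon_class(s) for s in ("ATGAAATAA", "ctgAAATAA", "GGGAAATAA")]
```

Applied to a full genome sequence, `gc_content` would return a fraction that can be multiplied by 100 for comparison with the 27.9 % figure in the abstract.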
Whole-genome analysis of genetic recombination of hepatitis delta virus: molecular domain in delta antigen determining trans-activating efficiency.

PubMed

Chao, Mei; Lin, Chia-Chi; Lin, Feng-Ming; Li, Hsin-Pai; Iang, Shan-Bei

2015-12-01

Hepatitis delta virus (HDV) is the only animal RNA virus that has an unbranched rod-like genome with ribozyme activity and is replicated by host RNA polymerase. HDV RNA recombination was previously demonstrated in patients and in cultured cells by analysis of a region corresponding to the C terminus of the delta antigen (HDAg), the only viral-encoded protein. Here, a whole-genome recombination map of HDV was constructed using an experimental system in which two HDV-1 sequences were co-transfected into cultured cells and the recombinants were analysed by sequencing of cloned reverse transcription-PCR products. Fifty homologous recombinants with 60 crossovers mapping to 22 junctions were identified from 200 analysed clones. Small HDAg chimeras harbouring a junction newly detected in the recombination map were then constructed. The results further indicated that the genome-replication level of HDV was sensitive to the sixth amino acid within the N-terminal 22 aa of HDAg. Therefore, the recombination map established in this study provided a tool for not only understanding HDV RNA recombination, but also elucidating the related mechanisms, such as molecular elements responsible for the trans-activation levels of the small HDAg.
The draft genome sequence of cork oak

PubMed Central

Ramos, António Marcos; Usié, Ana; Barbosa, Pedro; Barros, Pedro M.; Capote, Tiago; Chaves, Inês; Simões, Fernanda; Abreu, Isabl; Carrasquinho, Isabel; Faro, Carlos; Guimarães, Joana B.; Mendonça, Diogo; Nóbrega, Filomena; Rodrigues, Leandra; Saibo, Nelson J. M.; Varela, Maria Carolina; Egas, Conceição; Matos, José; Miguel, Célia M.; Oliveira, M. Margarida; Ricardo, Cândido P.; Gonçalves, Sónia

2018-01-01

Cork oak (Quercus suber) is native to southwest Europe and northwest Africa where it plays a crucial environmental and economical role. To tackle the cork oak production and industrial challenges, advanced research is imperative but dependent on the availability of a sequenced genome. To address this, we produced the first draft version of the cork oak genome. We followed a de novo assembly strategy based on high-throughput sequence data, which generated a draft genome comprising 23,347 scaffolds and 953.3 Mb in size. A total of 79,752 genes and 83,814 transcripts were predicted, including 33,658 high-confidence genes. An InterPro signature assignment was detected for 69,218 transcripts, which represented 82.6% of the total. Validation studies demonstrated the genome assembly and annotation completeness and highlighted the usefulness of the draft genome for read mapping of high-throughput sequence data generated using different protocols. All data generated is available through the public databases where it was deposited, being therefore ready to use by the academic and industry communities working on cork oak and/or related species. PMID:29786699
The draft genome sequence of cork oak.

PubMed

Ramos, António Marcos; Usié, Ana; Barbosa, Pedro; Barros, Pedro M; Capote, Tiago; Chaves, Inês; Simões, Fernanda; Abreu, Isabl; Carrasquinho, Isabel; Faro, Carlos; Guimarães, Joana B; Mendonça, Diogo; Nóbrega, Filomena; Rodrigues, Leandra; Saibo, Nelson J M; Varela, Maria Carolina; Egas, Conceição; Matos, José; Miguel, Célia M; Oliveira, M Margarida; Ricardo, Cândido P; Gonçalves, Sónia

2018-05-22

Cork oak (Quercus suber) is native to southwest Europe and northwest Africa where it plays a crucial environmental and economical role. To tackle the cork oak production and industrial challenges, advanced research is imperative but dependent on the availability of a sequenced genome. To address this, we produced the first draft version of the cork oak genome. We followed a de novo assembly strategy based on high-throughput sequence data, which generated a draft genome comprising 23,347 scaffolds and 953.3 Mb in size. A total of 79,752 genes and 83,814 transcripts were predicted, including 33,658 high-confidence genes. An InterPro signature assignment was detected for 69,218 transcripts, which represented 82.6% of the total. Validation studies demonstrated the genome assembly and annotation completeness and highlighted the usefulness of the draft genome for read mapping of high-throughput sequence data generated using different protocols. All data generated is available through the public databases where it was deposited, being therefore ready to use by the academic and industry communities working on cork oak and/or related species.
Heritability and genetic basis of protein level variation in an outbred population

PubMed Central

Liu, Yi-Chun; Tekkedil, Manu M.; Steinmetz, Lars M.; Caudy, Amy A.; Fraser, Andrew G.

2014-01-01

The genetic basis of heritable traits has been studied for decades. Although recent mapping efforts have elucidated genetic determinants of transcript levels, mapping of protein abundance has lagged. Here, we analyze levels of 4084 GFP-tagged yeast proteins in the progeny of a cross between a laboratory and a wild strain using flow cytometry and high-content microscopy. The genotype of trans variants contributed little to protein level variation between individual cells but explained >50% of the variance in the population’s average protein abundance for half of the GFP fusions tested. To map trans-acting factors responsible, we performed flow sorting and bulk segregant analysis of 25 proteins, finding a median of five protein quantitative trait loci (pQTLs) per GFP fusion. Further, we find that cis-acting variants predominate; the genotype of a gene and its surrounding region had a large effect on protein level six times more frequently than the rest of the genome combined. We present evidence for both shared and independent genetic control of transcript and protein abundance: More than half of the expression QTLs (eQTLs) contribute to changes in protein levels of regulated genes, but several pQTLs do not affect their cognate transcript levels. Allele replacements of genes known to underlie trans eQTL hotspots confirmed the correlation of effects on mRNA and protein levels. This study represents the first genome-scale measurement of genetic contribution to protein levels in single cells and populations, identifies more than a hundred trans pQTLs, and validates the propagation of effects associated with transcript variation to protein abundance. PMID:24823668
Defining functional DNA elements in the human genome

PubMed Central

Kellis, Manolis; Wold, Barbara; Snyder, Michael P.; Bernstein, Bradley E.; Kundaje, Anshul; Marinov, Georgi K.; Ward, Lucas D.; Birney, Ewan; Crawford, Gregory E.; Dekker, Job; Dunham, Ian; Elnitski, Laura L.; Farnham, Peggy J.; Feingold, Elise A.; Gerstein, Mark; Giddings, Morgan C.; Gilbert, David M.; Gingeras, Thomas R.; Green, Eric D.; Guigo, Roderic; Hubbard, Tim; Kent, Jim; Lieb, Jason D.; Myers, Richard M.; Pazin, Michael J.; Ren, Bing; Stamatoyannopoulos, John A.; Weng, Zhiping; White, Kevin P.; Hardison, Ross C.

2014-01-01

With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease. PMID:24753594
Genomic structure, promoter identification, and chromosomal mapping of a mouse nuclear orphan receptor expressed in embryos and adult testes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lee, C.H.; Wei, Li-Na; Copeland, N.G.

We have isolated and characterized overlapping genomic clones containing the complete transcribed region of a newly isolated mouse cDNA encoding an orphan receptor expressed specifically in midgestation embryos and adult testis. This gene spans a distance of more than 50 kb and is organized into 13 exons. The transcription initiation site is located at the 158th nucleotide upstream from the translation initiation codon. All the exon/intron junction sequences follow the GT/AG rule. Based upon Northern blot analysis and the size of the transcribed region of the gene, its transcript was determined to be approximately 2.5 kb. Within approximately 500 bp upstream from the transcription initiation site, several immune response regulatory elements were identified but no TATA box was located. This gene was mapped to the distal region of mouse chromosome 10 and its locus has been designated Tr2-11. Immunohistochemical studies show that the Tr2-11 protein is present mainly in advanced germ cell populations of mature testes and that Tr2-11 gene expression is dramatically decreased in vitamin A-depleted animals. 23 refs., 7 figs.
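The GT/AG rule mentioned in the abstract above says that an intron begins with the dinucleotide GT and ends with AG. A minimal sketch of such a check follows. The function names, the 0-based half-open exon coordinates, and the toy sequence are illustrative assumptions and are not taken from the record.

```python
def follows_gt_ag_rule(intron: str) -> bool:
    """Return True if an intron sequence starts with GT and ends with AG."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def check_introns(genomic: str, exons: list[tuple[int, int]]) -> list[bool]:
    """Check the intron between each pair of consecutive exons.

    Assumption: exons are 0-based, half-open (start, end) intervals on the
    sense strand, sorted by position.
    """
    results = []
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        results.append(follows_gt_ag_rule(genomic[prev_end:next_start]))
    return results


# Toy example (hypothetical sequence): exon, intron, exon
toy = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"
print(check_introns(toy, [(0, 6), (18, 24)]))
```

For a gene with 13 exons, as reported here, the same call would return 12 values, one per intron.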
Gene-breaking: A new paradigm for human retrotransposon-mediated gene evolution

PubMed Central

Wheelan, Sarah J.; Aizawa, Yasunori; Han, Jeffrey S.; Boeke, Jef D.

2005-01-01

The L1 retrotransposon is the most highly successful autonomous retrotransposon in mammals. This prolific genome parasite may on occasion benefit its host through genome rearrangements or adjustments of host gene expression. In examining possible effects of L1 elements on host gene expression, we investigated whether a full-length L1 element inserted in the antisense orientation into an intron of a cellular gene may actually split the gene's transcript into two smaller transcripts: (1) a transcript containing the upstream exons and terminating in the major antisense polyadenylation site (MAPS) of the L1, and (2) a transcript derived from the L1 antisense promoter (ASP) that includes the downstream exons of the gene. Bioinformatic analysis and experimental follow-up provide evidence for this L1 “gene-breaking” hypothesis. We identified three human genes apparently “broken” by L1 elements, as well as 12 more candidate genes. Most of the inserted L1 elements in our 15 candidate genes predate the human/chimp divergence. If indeed split, the transcripts of these genes may in at least one case encode potentially interacting proteins, and in another case may encode novel proteins. Gene-breaking represents a new mechanism through which L1 elements remodel mammalian genomes. PMID:16024818
GENCODE: the reference human genome annotation for The ENCODE Project.

PubMed

Harrow, Jennifer; Frankish, Adam; Gonzalez, Jose M; Tapanari, Electra; Diekhans, Mark; Kokocinski, Felix; Aken, Bronwen L; Barrell, Daniel; Zadissa, Amonida; Searle, Stephen; Barnes, If; Bignell, Alexandra; Boychenko, Veronika; Hunt, Toby; Kay, Mike; Mukherjee, Gaurab; Rajan, Jeena; Despacio-Reyes, Gloria; Saunders, Gary; Steward, Charles; Harte, Rachel; Lin, Michael; Howald, Cédric; Tanzer, Andrea; Derrien, Thomas; Chrast, Jacqueline; Walters, Nathalie; Balasubramanian, Suganthi; Pei, Baikang; Tress, Michael; Rodriguez, Jose Manuel; Ezkurdia, Iakes; van Baren, Jeltje; Brent, Michael; Haussler, David; Kellis, Manolis; Valencia, Alfonso; Reymond, Alexandre; Gerstein, Mark; Guigó, Roderic; Hubbard, Tim J

2012-09-01

The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.
Functional genomics of root growth and development in Arabidopsis

PubMed Central

Iyer-Pascuzzi, Anjali; Simpson, June; Herrera-Estrella, Luis; Benfey, Philip N.

2009-01-01

Roots are vital for the uptake of water and nutrients, and for anchorage in the soil. They are highly plastic, able to adapt developmentally and physiologically to changing environmental conditions. Understanding the molecular mechanisms behind this growth and development requires knowledge of root transcriptomics, proteomics and metabolomics. Genomics approaches, including the recent publication of a root expression map, root proteome, and environment-specific root expression studies, are uncovering complex transcriptional and post-transcriptional networks underlying root development. The challenge is in further capitalizing on the information in these datasets to understand the fundamental principles of root growth and development. In this review, we highlight progress researchers have made toward this goal. PMID:19117793
Functional genomics of root growth and development in Arabidopsis.

PubMed

Iyer-Pascuzzi, Anjali; Simpson, June; Herrera-Estrella, Luis; Benfey, Philip N

2009-04-01

Roots are vital for the uptake of water and nutrients, and for anchorage in the soil. They are highly plastic, able to adapt developmentally and physiologically to changing environmental conditions. Understanding the molecular mechanisms behind this growth and development requires knowledge of root transcriptomics, proteomics, and metabolomics. Genomics approaches, including the recent publication of a root expression map, root proteome, and environment-specific root expression studies, are uncovering complex transcriptional and post-transcriptional networks underlying root development. The challenge is in further capitalizing on the information in these datasets to understand the fundamental principles of root growth and development. In this review, we highlight progress researchers have made toward this goal.
Transcriptional Regulatory Networks in Saccharomyces cerevisiae

NASA Astrophysics Data System (ADS)

Lee, Tong Ihn; Rinaldi, Nicola J.; Robert, François; Odom, Duncan T.; Bar-Joseph, Ziv; Gerber, Georg K.; Hannett, Nancy M.; Harbison, Christopher T.; Thompson, Craig M.; Simon, Itamar; Zeitlinger, Julia; Jennings, Ezra G.; Murray, Heather L.; Gordon, D. Benjamin; Ren, Bing; Wyrick, John J.; Tagne, Jean-Bosco; Volkert, Thomas L.; Fraenkel, Ernest; Gifford, David K.; Young, Richard A.

2002-10-01

We have determined how most of the transcriptional regulators encoded in the eukaryote Saccharomyces cerevisiae associate with genes across the genome in living cells. Just as maps of metabolic networks describe the potential pathways that may be used by a cell to accomplish metabolic processes, this network of regulator-gene interactions describes potential pathways yeast cells can use to regulate global gene expression programs. We use this information to identify network motifs, the simplest units of network architecture, and demonstrate that an automated process can use motifs to assemble a transcriptional regulatory network structure. Our results reveal that eukaryotic cellular functions are highly connected through networks of transcriptional regulators that regulate other transcriptional regulators.
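To make the idea of a network motif concrete, here is a minimal sketch that scans a list of regulator-to-target edges for two commonly discussed motif types, autoregulation and feed-forward loops. The abstract does not say which motifs were searched for or how, so the choice of motifs, the function names, and the gene labels A to D are all hypothetical.

```python
def find_autoregulation(edges):
    """Regulators that bind their own gene (edge r -> r)."""
    return sorted({r for r, t in edges if r == t})


def find_feed_forward_loops(edges):
    """Triples (x, y, z) of distinct nodes with edges x->y, y->z and x->z."""
    targets = {}
    for regulator, target in set(edges):
        targets.setdefault(regulator, set()).add(target)

    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:
            if y == x:
                continue
            for z in targets.get(y, ()):
                # z must be a third node that x also regulates directly
                if z not in (x, y) and z in x_targets:
                    loops.append((x, y, z))
    return sorted(loops)


# Hypothetical regulator -> target edges
edges = [("A", "A"), ("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
print(find_autoregulation(edges))      # ['A']
print(find_feed_forward_loops(edges))  # [('A', 'B', 'C')]
```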
Transcriptional and phylogenetic analysis of five complete ambystomatid salamander mitochondrial genomes.

PubMed

Samuels, Amy K; Weisrock, David W; Smith, Jeramiah J; France, Katherine J; Walker, John A; Putta, Srikrishna; Voss, S Randal

2005-04-11

We report on a study that extended mitochondrial transcript information from a recent EST project to obtain complete mitochondrial genome sequence for 5 tiger salamander complex species (Ambystoma mexicanum, A. t. tigrinum, A. andersoni, A. californiense, and A. dumerilii). We describe, for the first time, aspects of mitochondrial transcription in a representative amphibian, and then use complete mitochondrial sequence data to examine salamander phylogeny at both deep and shallow levels of evolutionary divergence. The available mitochondrial ESTs for A. mexicanum (N=2481) and A. t. tigrinum (N=1205) provided 92% and 87% coverage of the mitochondrial genome, respectively. Complete mitochondrial sequences for all species were rapidly obtained by using long distance PCR and DNA sequencing. A number of genome structural characteristics (base pair length, base composition, gene number, gene boundaries, codon usage) were highly similar among all species and to other distantly related salamanders. Overall, mitochondrial transcription in Ambystoma approximated the pattern observed in other vertebrates. We inferred from the mapping of ESTs onto mtDNA that transcription occurs from both heavy and light strand promoters and continues around the entire length of the mtDNA, followed by post-transcriptional processing. However, the observation of many short transcripts corresponding to rRNA genes indicates that transcription may often terminate prematurely to bias transcription of rRNA genes; indeed an rRNA transcription termination signal sequence was observed immediately following the 16S rRNA gene. Phylogenetic analyses of salamander family relationships consistently grouped Ambystomatidae in a clade containing Cryptobranchidae and Hynobiidae, to the exclusion of Salamandridae. This robust result suggests a novel alternative hypothesis because previous studies have consistently identified Ambystomatidae and Salamandridae as closely related taxa. 
Phylogenetic analyses of tiger salamander complex species also produced robustly supported trees. The D-loop, used in previous molecular phylogenetic studies of the complex, was found to contain a relatively low level of variation and we identified mitochondrial regions with higher rates of molecular evolution that are more useful in resolving relationships among species. Our results show the benefit of using complete genome mitochondrial information in studies of recently and rapidly diverged taxa.
Mapping Flagellar Genes in Chlamydomonas Using Restriction Fragment Length Polymorphisms

PubMed Central

Ranum, LPW.; Thompson, M. D.; Schloss, J. A.; Lefebvre, P. A.; Silflow, C. D.

1988-01-01

To correlate cloned nuclear DNA sequences with previously characterized mutations in Chlamydomonas and to gain insight into the organization of its nuclear genome, we have begun to map molecular markers using restriction fragment length polymorphisms (RFLPs). A Chlamydomonas reinhardtii strain (CC-29) containing phenotypic markers on nine of the 19 linkage groups was crossed to the interfertile species Chlamydomonas smithii. DNA from each member of 22 randomly selected tetrads was analyzed for the segregation of RFLPs associated with cloned genes detected by hybridization with radioactive DNA probes. The current set of markers allows the detection of linkage to new molecular markers over approximately 54% of the existing genetic map. This study focused on mapping cloned flagellar genes and genes whose transcripts accumulate after deflagellation. Twelve different molecular clones have been assigned to seven linkage groups. The α-1 tubulin gene maps to linkage group III and is linked to the genomic sequence homologous to pcf6-100, a cDNA clone whose corresponding transcript accumulates after deflagellation. The α-2 tubulin gene maps to linkage group IV. The two β-tubulin genes are linked, with the β-1 gene being approximately 12 cM more distal from the centromere than the β-2 gene. A clone corresponding to a 73-kD dynein protein maps to the opposite arm of the same linkage group. The gene corresponding to the cDNA clone pcf6-187, whose mRNA accumulates after deflagellation, maps very close to the tightly linked pf-26 and pf-1 mutations on linkage group V. PMID:2906025

Improved maize reference genome with single-molecule technologies.

PubMed

Jiao, Yinping; Peluso, Paul; Shi, Jinghua; Liang, Tiffany; Stitzer, Michelle C; Wang, Bo; Campbell, Michael S; Stein, Joshua C; Wei, Xuehong; Chin, Chen-Shan; Guill, Katherine; Regulski, Michael; Kumari, Sunita; Olson, Andrew; Gent, Jonathan; Schneider, Kevin L; Wolfgruber, Thomas K; May, Michael R; Springer, Nathan M; Antoniou, Eric; McCombie, W Richard; Presting, Gernot G; McMullen, Michael; Ross-Ibarra, Jeffrey; Dawe, R Kelly; Hastie, Alex; Rank, David R; Ware, Doreen

2017-06-22

Complete and accurate reference genomes and annotations provide fundamental tools for characterization of genetic and functional variation. These resources facilitate the determination of biological processes and support translation of research findings into improved and sustainable agricultural technologies. Many reference genomes for crop plants have been generated over the past decade, but these genomes are often fragmented and missing complex repeat regions. Here we report the assembly and annotation of a reference genome of maize, a genetic and agricultural model species, using single-molecule real-time sequencing and high-resolution optical mapping. Relative to the previous reference genome, our assembly features a 52-fold increase in contig length and notable improvements in the assembly of intergenic spaces and centromeres. Characterization of the repetitive portion of the genome revealed more than 130,000 intact transposable elements, allowing us to identify transposable element lineage expansions that are unique to maize. Gene annotations were updated using 111,000 full-length transcripts obtained by single-molecule real-time sequencing. In addition, comparative optical mapping of two other inbred maize lines revealed a prevalence of deletions in regions of low gene density and maize lineage-specific genes.
Localization of TFIIB binding regions using serial analysis of chromatin occupancy

PubMed Central

Yochum, Gregory S; Rajaraman, Veena; Cleland, Ryan; McWeeney, Shannon

2007-01-01

Background: RNA Polymerase II (RNAP II) is recruited to core promoters by the pre-initiation complex (PIC) of general transcription factors. Within the PIC, transcription factor for RNA polymerase IIB (TFIIB) determines the start site of transcription. TFIIB binding has not been localized, genome-wide, in metazoans. Serial analysis of chromatin occupancy (SACO) is an unbiased methodology used to empirically identify transcription factor binding regions. In this report, we use TFIIB and SACO to localize TFIIB binding regions across the rat genome. Results: A sample of the TFIIB SACO library was sequenced and 12,968 TFIIB genomic signature tags (GSTs) were assigned to the rat genome. GSTs are 20–22 base pair fragments that are derived from TFIIB bound chromatin. TFIIB localized to both non-protein coding and protein-coding loci. For 21% of the 1783 protein-coding genes in this sample of the SACO library, TFIIB binding mapped near the characterized 5' promoter that is upstream of the transcription start site (TSS). However, internal TFIIB binding positions were identified in 57% of the 1783 protein-coding genes. Internal positions are defined as those within an inclusive region greater than 2.5 kb downstream from the 5' TSS and 2.5 kb upstream from the transcription stop. We demonstrate that both TFIIB and TFIID (an additional component of PICs) bound to internal regions using chromatin immunoprecipitation (ChIP). The 5' cap of transcripts associated with internal TFIIB binding positions were identified using a cap-trapping assay. The 5' TSSs for internal transcripts were confirmed by primer extension. Additionally, an analysis of the functional annotation of mouse 3 (FANTOM3) databases indicates that internally initiated transcripts identified by TFIIB SACO in rat are conserved in mouse. 
Conclusion: Our finding that TFIIB binding is not restricted to the 5' upstream region indicates that the propensity for PIC to contribute to transcript diversity is far greater than previously appreciated. PMID:17997859
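The definition of an internal binding position in the abstract above (more than 2.5 kb downstream from the 5' TSS and 2.5 kb upstream from the transcription stop) can be sketched as a simple coordinate test. The function name and the example coordinates are hypothetical, and treating the boundaries as inclusive is an assumption based on the phrase "inclusive region".

```python
FLANK_BP = 2500  # 2.5 kb, as stated in the abstract


def is_internal(position: int, tss: int, stop: int, flank: int = FLANK_BP) -> bool:
    """Return True if position lies at least `flank` bp inside both gene ends.

    Using min/max makes the test independent of strand, because the excluded
    margin is the same size at the TSS end and at the stop end.
    Assumption: the boundaries themselves count as internal.
    """
    low, high = min(tss, stop), max(tss, stop)
    return low + flank <= position <= high - flank


# Hypothetical gene from 1,000 to 10,000 on the plus strand
print(is_internal(5000, tss=1000, stop=10000))  # True
print(is_internal(2000, tss=1000, stop=10000))  # False: within 2.5 kb of the TSS
```

Under this definition a gene shorter than 5 kb has no internal region at all, since the two excluded margins overlap.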
Intragenic origins due to short G1 phases underlie oncogene-induced DNA replication stress.

PubMed

Macheret, Morgane; Halazonetis, Thanos D

2018-03-01

Oncogene-induced DNA replication stress contributes critically to the genomic instability that is present in cancer. However, elucidating how oncogenes deregulate DNA replication has been impeded by difficulty in mapping replication initiation sites on the human genome. Here, using a sensitive assay to monitor nascent DNA synthesis in early S phase, we identified thousands of replication initiation sites in cells before and after induction of the oncogenes CCNE1 and MYC. Remarkably, both oncogenes induced firing of a novel set of DNA replication origins that mapped within highly transcribed genes. These ectopic origins were normally suppressed by transcription during G1, but precocious entry into S phase, before all genic regions had been transcribed, allowed firing of origins within genes in cells with activated oncogenes. Forks from oncogene-induced origins were prone to collapse, as a result of conflicts between replication and transcription, and were associated with DNA double-stranded break formation and chromosomal rearrangement breakpoints both in our experimental system and in a large cohort of human cancers. Thus, firing of intragenic origins caused by premature S phase entry represents a mechanism of oncogene-induced DNA replication stress that is relevant for genomic instability in human cancer.
DNA breathing dynamics distinguish binding from nonbinding consensus sites for transcription factor YY1 in cells.

PubMed

Alexandrov, Boian S; Fukuyo, Yayoi; Lange, Martin; Horikoshi, Nobuo; Gelev, Vladimir; Rasmussen, Kim Ø; Bishop, Alan R; Usheva, Anny

2012-11-01

The genome-wide mapping of the major gene expression regulators, the transcription factors (TFs) and their DNA binding sites, is of great importance for describing cellular behavior and phenotypic diversity. Presently, the methods for prediction of genomic TF binding produce a large number of false positives, most likely due to insufficient description of the physiochemical mechanisms of protein-DNA binding. Growing evidence suggests that, in the cell, the double-stranded DNA (dsDNA) is subject to local transient strand separations (breathing) that contribute to genomic functions. By using site-specific chromatin immunoprecipitations, gel shifts, BIOBASE data, and our model that accurately describes the melting behavior and breathing dynamics of dsDNA, we report a specific DNA breathing profile found at YY1 binding sites in cells. We find that the genomic flanking sequence variations and SNPs may exert long-range effects on DNA dynamics and predetermine YY1 binding. The ubiquitous TF YY1 has a fundamental role in essential biological processes by activating, initiating or repressing transcription depending upon the sequence context it binds. We anticipate that consensus binding sequences together with the related DNA dynamics profile may significantly improve the accuracy of genomic TF binding sites and TF binding-related functional SNPs.
Smooth Muscle Cell Genome Browser: Enabling the Identification of Novel Serum Response Factor Target Genes

PubMed Central

Lee, Moon Young; Park, Chanjae; Berent, Robyn M.; Park, Paul J.; Fuchs, Robert; Syn, Hannah; Chin, Albert; Townsend, Jared; Benson, Craig C.; Redelman, Doug; Shen, Tsai-wei; Park, Jong Kun; Miano, Joseph M.; Sanders, Kenton M.; Ro, Seungil

2015-01-01

Genome-scale expression data on the absolute numbers of gene isoforms offers essential clues in cellular functions and biological processes. Smooth muscle cells (SMCs) perform a unique contractile function through expression of specific genes controlled by serum response factor (SRF), a transcription factor that binds to DNA sites known as the CArG boxes. To identify SRF-regulated genes specifically expressed in SMCs, we isolated SMC populations from mouse small intestine and colon, obtained their transcriptomes, and constructed an interactive SMC genome and CArGome browser. To our knowledge, this is the first online resource that provides a comprehensive library of all genetic transcripts expressed in primary SMCs. The browser also serves as the first genome-wide map of SRF binding sites. The browser analysis revealed novel SMC-specific transcriptional variants and SRF target genes, which provided new and unique insights into the cellular and biological functions of the cells in gastrointestinal (GI) physiology. The SRF target genes in SMCs, which were discovered in silico, were confirmed by proteomic analysis of SMC-specific Srf knockout mice. Our genome browser offers a new perspective into the alternative expression of genes in the context of SRF binding sites in SMCs and provides a valuable reference for future functional studies. PMID:26241044
High-resolution mapping of transcription factor binding sites on native chromatin

PubMed Central

Kasinathan, Sivakanthan; Orsi, Guillermo A.; Zentner, Gabriel E.; Ahmad, Kami; Henikoff, Steven

2014-01-01

Sequence-specific DNA-binding proteins including transcription factors (TFs) are key determinants of gene regulation and chromatin architecture. Formaldehyde cross-linking and sonication followed by Chromatin ImmunoPrecipitation (X-ChIP) is widely used for profiling of TF binding, but is limited by low resolution and poor specificity and sensitivity. We present a simple protocol that starts with micrococcal nuclease-digested uncross-linked chromatin and is followed by affinity purification of TFs and paired-end sequencing. The resulting ORGANIC (Occupied Regions of Genomes from Affinity-purified Naturally Isolated Chromatin) profiles of Saccharomyces cerevisiae Abf1 and Reb1 provide highly accurate base-pair resolution maps that are not biased toward accessible chromatin, and do not require input normalization. We also demonstrate the high specificity of our method when applied to larger genomes by profiling Drosophila melanogaster GAGA Factor and Pipsqueak. Our results suggest that ORGANIC profiling is a widely applicable high-resolution method for sensitive and specific profiling of direct protein-DNA interactions. PMID:24336359
Genome-Wide Identification and Structural Analysis of bZIP Transcription Factor Genes in Brassica napus.

PubMed

Zhou, Yan; Xu, Daixiang; Jia, Ledong; Huang, Xiaohu; Ma, Guoqiang; Wang, Shuxian; Zhu, Meichen; Zhang, Aoxiang; Guan, Mingwei; Lu, Kun; Xu, Xinfu; Wang, Rui; Li, Jiana; Qu, Cunmin

2017-10-24

The basic region/leucine zipper motif (bZIP) transcription factor family is one of the largest families of transcriptional regulators in plants. bZIP genes have been systematically characterized in some plants, but not in rapeseed (Brassica napus). In this study, we identified 247 BnbZIP genes in the rapeseed genome, which we classified into 10 subfamilies based on phylogenetic analysis of their deduced protein sequences. The BnbZIP genes were grouped into functional clades with Arabidopsis genes with similar putative functions, indicating functional conservation. Genome mapping analysis revealed that the BnbZIPs are distributed unevenly across all 19 chromosomes, and that some of these genes arose through whole-genome duplication and dispersed duplication events. All expression profiles of 247 bZIP genes were extracted from RNA-sequencing data obtained from 17 different B. napus ZS11 tissues with 42 various developmental stages. These genes exhibited different expression patterns in various tissues, revealing that these genes are differentially regulated. Our results provide a valuable foundation for functional dissection of the different BnbZIP homologs in B. napus and its parental lines and for molecular breeding studies of bZIP genes in B. napus.
Genome-Wide Identification and Structural Analysis of bZIP Transcription Factor Genes in Brassica napus

PubMed Central

Zhou, Yan; Xu, Daixiang; Jia, Ledong; Huang, Xiaohu; Ma, Guoqiang; Wang, Shuxian; Zhu, Meichen; Zhang, Aoxiang; Guan, Mingwei; Xu, Xinfu; Wang, Rui; Li, Jiana

2017-01-01

The basic region/leucine zipper motif (bZIP) transcription factor family is one of the largest families of transcriptional regulators in plants. bZIP genes have been systematically characterized in some plants, but not in rapeseed (Brassica napus). In this study, we identified 247 BnbZIP genes in the rapeseed genome, which we classified into 10 subfamilies based on phylogenetic analysis of their deduced protein sequences. The BnbZIP genes were grouped into functional clades with Arabidopsis genes with similar putative functions, indicating functional conservation. Genome mapping analysis revealed that the BnbZIPs are distributed unevenly across all 19 chromosomes, and that some of these genes arose through whole-genome duplication and dispersed duplication events. All expression profiles of 247 bZIP genes were extracted from RNA-sequencing data obtained from 17 different B. napus ZS11 tissues with 42 various developmental stages. These genes exhibited different expression patterns in various tissues, revealing that these genes are differentially regulated. Our results provide a valuable foundation for functional dissection of the different BnbZIP homologs in B. napus and its parental lines and for molecular breeding studies of bZIP genes in B. napus. PMID:29064393
A novel bioinformatics method for efficient knowledge discovery by BLSOM from big genomic sequence data.

PubMed

Bai, Yu; Iwasaki, Yuki; Kanaya, Shigehiko; Zhao, Yue; Ikemura, Toshimichi

2014-01-01

With the remarkable increase in genomic sequence data from a wide range of species, novel tools are needed for comprehensive analyses of the big sequence data. Self-Organizing Map (SOM) is an effective tool for clustering and visualizing high-dimensional data such as oligonucleotide composition on one map. By modifying the conventional SOM, we have previously developed Batch-Learning SOM (BLSOM), which allows classification of sequence fragments according to species, solely depending on the oligonucleotide composition. In the present study, we introduce the oligonucleotide BLSOM used for characterization of vertebrate genome sequences. We first analyzed pentanucleotide compositions in 100 kb sequences derived from a wide range of vertebrate genomes and then the compositions in the human and mouse genomes in order to investigate an efficient method for detecting differences between the closely related genomes. BLSOM can recognize the species-specific key combination of oligonucleotide frequencies in each genome, which is called a "genome signature," and the regions specifically enriched in transcription-factor-binding sequences. Because the classification and visualization power is very high, BLSOM is an efficient and powerful tool for extracting a wide range of information from massive amounts of genomic sequences (i.e., big sequence data).
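The input to an oligonucleotide map of this kind is a composition vector per sequence fragment. A minimal sketch of computing pentanucleotide frequency vectors over 100 kb windows follows. It covers only the composition step, not the BLSOM itself; the function names, the handling of ambiguous bases, and the non-overlapping windowing are assumptions rather than details from the record.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq: str, k: int = 5) -> list[float]:
    """Relative frequency of every A/C/G/T k-mer, in lexicographic order.

    Assumption: k-mers containing other symbols (such as N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def window_vectors(seq: str, window: int = 100_000, k: int = 5) -> list[list[float]]:
    """One composition vector per non-overlapping window of the sequence."""
    return [
        kmer_frequencies(seq[start:start + window], k)
        for start in range(0, len(seq) - window + 1, window)
    ]


# Toy usage with a short repetitive sequence and small parameters
vectors = window_vectors("ACGT" * 50, window=100, k=2)
print(len(vectors), len(vectors[0]))  # 2 windows, 16 dinucleotide frequencies each
```

With k = 5 each vector has 4^5 = 1024 entries, which is the dimensionality a clustering method would then work on.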
Identification of human candidate genes for male infertility by digital differential display.

PubMed

Olesen, C; Hansen, C; Bendsen, E; Byskov, A G; Schwinger, E; Lopez-Pajares, I; Jensen, P K; Kristoffersson, U; Schubert, R; Van Assche, E; Wahlstroem, J; Lespinasse, J; Tommerup, N

2001-01-01

Evidence for the importance of genetic factors in male fertility is accumulating. In the literature and the Mendelian Cytogenetics Network database, 265 cases of infertile males with balanced reciprocal translocations have been described. The candidacy for infertility of 14 testis-expressed transcripts (TETs) was examined by comparing their chromosomal mapping position to the position of balanced reciprocal translocation breakpoints found in the 265 infertile males. The 14 TETs were selected by using digital differential display (electronic subtraction) to search for apparently testis-specific transcripts in the TIGR database. The testis specificity of the 14 TETs was further examined by reverse transcription-polymerase chain reaction (RT-PCR) on adult and fetal tissues showing that four TETs (TET1 to TET4) were testis-expressed only, six TETs (TET5 to TET10) appeared to be differentially expressed and the remaining four TETs (TET11 to TET14) were ubiquitously expressed. Interestingly, the two testis-expressed-only transcripts, TET1 and TET2, mapped to chromosomal regions where seven and six translocation breakpoints have been reported in infertile males respectively. Furthermore, one ubiquitously, but predominantly testis-expressed, transcript, TET11, mapped to 1p32-33, where 13 translocation breakpoints have been found in infertile males. Interestingly, the mouse mutation, skeletal fusions with sterility, sks, maps to the syntenic region in the mouse genome. Another transcript, TET7, was the human homologue of rat Tpx-1, which functions in the specific interaction of spermatogenic cells with Sertoli cells. TPX-1 maps to 6p21 where three cases of chromosomal breakpoints in infertile males have been reported. Finally, TET8 was a novel transcript which in the fetal stage is testis-specific, but in the adult is expressed in multiple tissues, including testis. We named this novel transcript fetal and adult testis-expressed transcript (FATE).
Survey of protein–DNA interactions in Aspergillus oryzae on a genomic scale

PubMed Central

Wang, Chao; Lv, Yangyong; Wang, Bin; Yin, Chao; Lin, Ying; Pan, Li

2015-01-01

The genome-scale delineation of in vivo protein–DNA interactions is key to understanding genome function. Only ∼5% of transcription factors (TFs) in the Aspergillus genus have been identified using traditional methods. Although the Aspergillus oryzae genome contains >600 TFs, knowledge of the in vivo genome-wide TF-binding sites (TFBSs) in aspergilli remains limited because of the lack of high-quality antibodies. We investigated the landscape of in vivo protein–DNA interactions across the A. oryzae genome through coupling the DNase I digestion of intact nuclei with massively parallel sequencing and the analysis of cleavage patterns in protein–DNA interactions at single-nucleotide resolution. The resulting map identified overrepresented de novo TF-binding motifs from genomic footprints, and provided the detailed chromatin remodeling patterns and the distribution of digital footprints near transcription start sites. The TFBSs of 19 known Aspergillus TFs were also identified based on DNase I digestion data surrounding potential binding sites in conjunction with TF binding specificity information. We observed that the cleavage patterns of TFBSs were dependent on the orientation of TF motifs and independent of strand orientation, consistent with the DNA shape features of binding motifs with flanking sequences. PMID:25883143
The fission yeast CENP-B protein Abp1 prevents pervasive transcription of repetitive DNA elements.

PubMed

Daulny, Anne; Mejía-Ramírez, Eva; Reina, Oscar; Rosado-Lugo, Jesus; Aguilar-Arnal, Lorena; Auer, Herbert; Zaratiegui, Mikel; Azorin, Fernando

2016-10-01

It is well established that eukaryotic genomes are pervasively transcribed producing cryptic unstable transcripts (CUTs). However, the mechanisms regulating pervasive transcription are not well understood. Here, we report that the fission yeast CENP-B homolog Abp1 plays an important role in preventing pervasive transcription. We show that loss of abp1 results in the accumulation of CUTs, which are targeted for degradation by the exosome pathway. These CUTs originate from different types of genomic features, but the highest increase corresponds to Tf2 retrotransposons and rDNA repeats, where they map along the entire elements. In the absence of abp1, increased RNAPII-Ser5P occupancy is observed throughout the Tf2 coding region and, unexpectedly, RNAPII-Ser5P is enriched at rDNA repeats. Loss of abp1 also results in Tf2 derepression and increased nucleolus size. Altogether these results suggest that Abp1 prevents pervasive RNAPII transcription of repetitive DNA elements (i.e., Tf2 and rDNA repeats) from internal cryptic sites. Copyright © 2016 Elsevier B.V. All rights reserved.
Genomic and Transcriptomic Resolution of Organic Matter Utilization Among Deep-Sea Bacteria in Guaymas Basin Hydrothermal Plumes.

PubMed

Li, Meng; Jain, Sunit; Dick, Gregory J

2016-01-01

Microbial chemosynthesis within deep-sea hydrothermal vent plumes is a regionally important source of organic carbon to the deep ocean. Although chemolithoautotrophs within hydrothermal plumes have attracted much attention, a gap remains in understanding the fate of organic carbon produced via chemosynthesis. In the present study, we conducted shotgun metagenomic and metatranscriptomic sequencing on samples from deep-sea hydrothermal vent plumes and surrounding background seawaters at Guaymas Basin (GB) in the Gulf of California. De novo assembly of metagenomic reads and binning by tetranucleotide signatures using emergent self-organizing maps (ESOM) revealed 66 partial and nearly complete bacterial genomes. These bacterial genomes belong to 10 different phyla: Actinobacteria, Bacteroidetes, Chloroflexi, Deferribacteres, Firmicutes, Gemmatimonadetes, Nitrospirae, Planctomycetes, Proteobacteria, Verrucomicrobia. Although several major transcriptionally active bacterial groups (Methylococcaceae, Methylomicrobium, SUP05, and SAR324) displayed methanotrophic and chemolithoautotrophic metabolisms, most other bacterial groups contain genes encoding extracellular peptidases and carbohydrate metabolizing enzymes with significantly higher transcripts in the plume than in background, indicating they are involved in degrading organic carbon derived from hydrothermal chemosynthesis. Among the most abundant and active heterotrophic bacteria in deep-sea hydrothermal plumes are Planctomycetes, which accounted for seven genomes with distinct functional and transcriptional activities. The Gemmatimonadetes and Verrucomicrobia also had abundant transcripts involved in organic carbon utilization. These results extend our knowledge of heterotrophic metabolism of bacterial communities in deep-sea hydrothermal plumes.
Genomic and Transcriptomic Resolution of Organic Matter Utilization Among Deep-Sea Bacteria in Guaymas Basin Hydrothermal Plumes

PubMed Central

Li, Meng; Jain, Sunit; Dick, Gregory J.

2016-01-01

Microbial chemosynthesis within deep-sea hydrothermal vent plumes is a regionally important source of organic carbon to the deep ocean. Although chemolithoautotrophs within hydrothermal plumes have attracted much attention, a gap remains in understanding the fate of organic carbon produced via chemosynthesis. In the present study, we conducted shotgun metagenomic and metatranscriptomic sequencing on samples from deep-sea hydrothermal vent plumes and surrounding background seawaters at Guaymas Basin (GB) in the Gulf of California. De novo assembly of metagenomic reads and binning by tetranucleotide signatures using emergent self-organizing maps (ESOM) revealed 66 partial and nearly complete bacterial genomes. These bacterial genomes belong to 10 different phyla: Actinobacteria, Bacteroidetes, Chloroflexi, Deferribacteres, Firmicutes, Gemmatimonadetes, Nitrospirae, Planctomycetes, Proteobacteria, Verrucomicrobia. Although several major transcriptionally active bacterial groups (Methylococcaceae, Methylomicrobium, SUP05, and SAR324) displayed methanotrophic and chemolithoautotrophic metabolisms, most other bacterial groups contain genes encoding extracellular peptidases and carbohydrate metabolizing enzymes with significantly higher transcripts in the plume than in background, indicating they are involved in degrading organic carbon derived from hydrothermal chemosynthesis. Among the most abundant and active heterotrophic bacteria in deep-sea hydrothermal plumes are Planctomycetes, which accounted for seven genomes with distinct functional and transcriptional activities. The Gemmatimonadetes and Verrucomicrobia also had abundant transcripts involved in organic carbon utilization. These results extend our knowledge of heterotrophic metabolism of bacterial communities in deep-sea hydrothermal plumes. PMID:27512389
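The abstract above mentions binning assembled contigs by tetranucleotide signatures. As a rough illustration only, the following Python sketch computes a plain tetranucleotide frequency vector for one sequence. The function name `tetranucleotide_frequencies` is hypothetical, and the sketch makes simplifying assumptions that the abstract does not state: it counts overlapping 4-mers on one strand, skips windows containing ambiguous bases, and does not merge reverse complements or perform any ESOM clustering.

```python
from collections import Counter
from itertools import product


def tetranucleotide_frequencies(seq):
    """Return relative frequencies of all 256 A/C/G/T 4-mers in seq.

    Windows containing any other character (for example N) are ignored.
    This is an illustrative signature, not the authors' pipeline.
    """
    seq = seq.upper()
    all_kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    # Count every overlapping window of length 4.
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Only windows made purely of A/C/G/T contribute to the total.
    total = sum(counts[k] for k in all_kmers)
    if total == 0:
        return {k: 0.0 for k in all_kmers}
    return {k: counts[k] / total for k in all_kmers}


if __name__ == "__main__":
    freqs = tetranucleotide_frequencies("ACGTACGT")
    print(freqs["ACGT"])
```

Vectors of this kind, one per contig, are the sort of input a clustering method could group into genome bins; the choice of clustering method is outside this sketch.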
Retrotransposon profiling of RNA polymerase III initiation sites.

PubMed

Qi, Xiaojie; Daily, Kenneth; Nguyen, Kim; Wang, Haoyi; Mayhew, David; Rigor, Paul; Forouzan, Sholeh; Johnston, Mark; Mitra, Robi David; Baldi, Pierre; Sandmeyer, Suzanne

2012-04-01

Although retroviruses are relatively promiscuous in choice of integration sites, retrotransposons can display marked integration specificity. In yeast and slime mold, some retrotransposons are associated with tRNA genes (tDNAs). In the Saccharomyces cerevisiae genome, the long terminal repeat retrotransposon Ty3 is found at RNA polymerase III (Pol III) transcription start sites of tDNAs. Ty1, 2, and 4 elements also cluster in the upstream regions of these genes. To determine the extent to which other Pol III-transcribed genes serve as genomic targets for Ty3, a set of 10,000 Ty3 genomic retrotranspositions were mapped using high-throughput DNA sequencing. Integrations occurred at all known tDNAs, two tDNA relics (iYGR033c and ZOD1), and six non-tDNA, Pol III-transcribed types of genes (RDN5, SNR6, SNR52, RPR1, RNA170, and SCR1). Previous work in vitro demonstrated that the Pol III transcription factor (TF) IIIB is important for Ty3 targeting. However, seven loci that bind the TFIIIB loader, TFIIIC, were not targeted, underscoring the unexplained absence of TFIIIB at those sites. Ty3 integrations also occurred in two open reading frames not previously associated with Pol III transcription, suggesting the existence of a small number of additional sites in the yeast genome that interact with Pol III transcription complexes.
Improving a Synechocystis-based photoautotrophic chassis through systematic genome mapping and validation of neutral sites.

PubMed

Pinto, Filipe; Pacheco, Catarina C; Oliveira, Paulo; Montagud, Arnau; Landels, Andrew; Couto, Narciso; Wright, Phillip C; Urchueguía, Javier F; Tamagnini, Paula

2015-12-01

The use of microorganisms as cell factories frequently requires extensive molecular manipulation. Therefore, the identification of genomic neutral sites for the stable integration of ectopic DNA is required to ensure a successful outcome. Here we describe the genome mapping and validation of five neutral sites in the chromosome of Synechocystis sp. PCC 6803, foreseeing the use of this cyanobacterium as a photoautotrophic chassis. To evaluate the neutrality of these loci, insertion/deletion mutants were produced, and to assess their functionality, a synthetic green fluorescent reporter module was introduced. The constructed integrative vectors include a BioBrick-compatible multiple cloning site insulated by transcription terminators, constituting robust cloning interfaces for synthetic biology approaches. Moreover, Synechocystis mutants (chassis) ready to receive purpose-built synthetic modules/circuits are also available. This work presents a systematic approach to map and validate chromosomal neutral sites in cyanobacteria, and that can be extended to other organisms. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Mining whole genomes and transcriptomes of Jatropha (Jatropha curcas) and Castor bean (Ricinus communis) for NBS-LRR genes and defense response associated transcription factors.

PubMed

Sood, Archit; Jaiswal, Varun; Chanumolu, Sree Krishna; Malhotra, Nikhil; Pal, Tarun; Chauhan, Rajinder Singh

2014-11-01

Jatropha (Jatropha curcas L.) and Castor bean (Ricinus communis) are oilseed crops of family Euphorbiaceae with the potential of producing high quality biodiesel and having industrial value. Both the bioenergy plants are becoming susceptible to various biotic stresses directly affecting the oil quality and content. No report exists as of today on analysis of Nucleotide Binding Site-Leucine Rich Repeat (NBS-LRR) gene repertoire and defense response transcription factors in both the plant species. In silico analysis of whole genomes and transcriptomes identified 47 new NBS-LRR genes in both the species and 122 and 318 defense response related transcription factors in Jatropha and Castor bean, respectively. The identified NBS-LRR genes and defense response transcription factors were mapped onto the respective genomes. Common and unique NBS-LRR genes and defense related transcription factors were identified in both the plant species. All NBS-LRR genes in both the species were characterized into Toll/interleukin-1 receptor NBS-LRRs (TNLs) and coiled-coil NBS-LRRs (CNLs), position on contigs, gene clusters and motifs and domains distribution. Transcript abundance or expression values were measured for all NBS-LRR genes and defense response transcription factors, suggesting their functional role. The current study provides a repertoire of NBS-LRR genes and transcription factors which can be used in not only dissecting the molecular basis of disease resistance phenotype but also in developing disease resistant genotypes in Jatropha and Castor bean through transgenic or molecular breeding approaches.
Transcription forms and remodels supercoiling domains unfolding large-scale chromatin structures

PubMed Central

Naughton, Catherine; Avlonitis, Nicolaos; Corless, Samuel; Prendergast, James G.; Mati, Ioulia K.; Eijk, Paul P.; Cockroft, Scott L.; Bradley, Mark; Ylstra, Bauke; Gilbert, Nick

2013-01-01

DNA supercoiling is an inherent consequence of twisting DNA and is critical for regulating gene expression and DNA replication. However, DNA supercoiling at a genomic scale in human cells is uncharacterized. To map supercoiling we used biotinylated-trimethylpsoralen as a DNA structure probe to show the genome is organized into supercoiling domains. Domains are formed and remodeled by RNA polymerase and topoisomerase activities and are flanked by GC-AT boundaries and CTCF binding sites. Under-wound domains are transcriptionally active, enriched in topoisomerase I, “open” chromatin fibers and DNaseI sites, but are depleted of topoisomerase II. Furthermore DNA supercoiling impacts on additional levels of chromatin compaction as under-wound domains are cytologically decondensed, topologically constrained, and decompacted by transcription of short RNAs. We suggest that supercoiling domains create a topological environment that facilitates gene activation providing an evolutionary purpose for clustering genes along chromosomes. PMID:23416946
A Rare SNP Identified a TCP Transcription Factor Essential for Tendril Development in Cucumber.

PubMed

Wang, Shenhao; Yang, Xueyong; Xu, Mengnan; Lin, Xingzhong; Lin, Tao; Qi, Jianjian; Shao, Guangjin; Tian, Nana; Yang, Qing; Zhang, Zhonghua; Huang, Sanwen

2015-12-07

Rare genetic variants are abundant in genomes but less tractable in genome-wide association study. Here we exploit a strategy of rare variation mapping to discover a gene essential for tendril development in cucumber (Cucumis sativus L.). In a collection of >3000 lines, we discovered a unique tendril-less line that forms branches instead of tendrils and, therefore, loses its climbing ability. We hypothesized that this unusual phenotype was caused by a rare variation and subsequently identified the causative single nucleotide polymorphism. The affected gene TEN encodes a TCP transcription factor conserved within the cucurbits and is expressed specifically in tendrils, representing a new organ identity gene. The variation occurs within a protein motif unique to the cucurbits and impairs its function as a transcriptional activator. Analyses of transcriptomes from near-isogenic lines identified downstream genes required for the tendril's capability to sense and climb a support. This study provides an example to explore rare functional variants in plant genomes. Copyright © 2015 The Author. Published by Elsevier Inc. All rights reserved.
Hydroxymethyl cytosine marks in the human mitochondrial genome are dynamic in nature.

PubMed

Ghosh, Sourav; Sengupta, Shantanu; Scaria, Vinod

2016-03-01

Apart from DNA methylation, hydroxymethylation has increasingly been studied as an important epigenetic mark. Although 5-hydroxymethylcytosines were initially thought to be an intermediary product of demethylation, recent studies suggest that hydroxymethylation is a highly regulated process modulated by the TET family of enzymes. Recent genome-wide studies have shown that hydroxymethylcytosine marks are closely associated with the regulation of important biological processes like transcription and embryonic development. It is also known that aberrant hydroxymethylation marks have been associated with diseases like cancer. The presence of hydroxymethylcytosines in the mitochondrial genome has been suggested earlier, though the genome-scale map has not been laid out. In the present study, we have mapped and analyzed the hydroxymethylcytosine marks in the mitochondrial genome using 23 different publicly available datasets. We cross-validated our data by checking for consistency across a subset of genomic regions previously annotated to hydroxymethylcytosines and found good consistency. We observe a dynamic distribution of hydroxymethylation marks in the mitochondrial genome. Unlike the methylcytosine marks, hydroxymethylcytosine marks are characterized by the lack of conservation across the samples considered, though similar cell types shared the pattern. We additionally observed that the hydroxymethylation marks are enriched upstream of gene start site (GSS) regions and in gene bodies, similar to nuclear genes. To the best of our knowledge, this is the first genome-scale map of hydroxymethyl cytosines in the human mitochondrial genome. Copyright © 2016 Elsevier B.V. and Mitochondria Research Society. All rights reserved.

A high-density genetic map of Arachis duranensis, a diploid ancestor of cultivated peanut

PubMed Central

2012-01-01

Background Cultivated peanut (Arachis hypogaea) is an allotetraploid species whose ancestral genomes are most likely derived from the A-genome species, A. duranensis, and the B-genome species, A. ipaensis. The very recent (several millennia) evolutionary origin of A. hypogaea has imposed a bottleneck for allelic and phenotypic diversity within the cultigen. However, wild diploid relatives are a rich source of alleles that could be used for crop improvement and their simpler genomes can be more easily analyzed while providing insight into the structure of the allotetraploid peanut genome. The objective of this research was to establish a high-density genetic map of the diploid species A. duranensis based on de novo generated EST databases. Arachis duranensis was chosen for mapping because it is the A-genome progenitor of cultivated peanut and also in order to circumvent the confounding effects of gene duplication associated with allopolyploidy in A. hypogaea. Results More than one million expressed sequence tag (EST) sequences generated from normalized cDNA libraries of A. duranensis were assembled into 81,116 unique transcripts. Mining this dataset, 1236 EST-SNP markers were developed between two A. duranensis accessions, PI 475887 and Grif 15036. An additional 300 SNP markers also were developed from genomic sequences representing conserved legume orthologs. Of the 1536 SNP markers, 1054 were placed on a genetic map. In addition, 598 EST-SSR markers identified in A. hypogaea assemblies were included in the map along with 37 disease resistance gene candidate (RGC) and 35 other previously published markers. In total, 1724 markers spanning 1081.3 cM over 10 linkage groups were mapped. Gene sequences that provided mapped markers were annotated using similarity searches in three different databases, and gene ontology descriptions were determined using the Medicago Gene Atlas and TAIR databases. Synteny analysis between A. duranensis, Medicago and Glycine revealed significant stretches of conserved gene clusters spread across the peanut genome. A higher level of colinearity was detected between A. duranensis and Glycine than with Medicago. Conclusions The first high-density, gene-based linkage map for A. duranensis was generated that can serve as a reference map for both wild and cultivated Arachis species. The markers developed here are valuable resources for the peanut, and more broadly, to the legume research community. The A-genome map will have utility for fine mapping in other peanut species and has already had application for mapping a nematode resistance gene that was introgressed into A. hypogaea from A. cardenasii. PMID:22967170
Clone DB: an integrated NCBI resource for clone-associated data

PubMed Central

Schneider, Valerie A.; Chen, Hsiu-Chuan; Clausen, Cliff; Meric, Peter A.; Zhou, Zhigang; Bouk, Nathan; Husain, Nora; Maglott, Donna R.; Church, Deanna M.

2013-01-01

The National Center for Biotechnology Information (NCBI) Clone DB (http://www.ncbi.nlm.nih.gov/clone/) is an integrated resource providing information about and facilitating access to clones, which serve as valuable research reagents in many fields, including genome sequencing and variation analysis. Clone DB represents an expansion and replacement of the former NCBI Clone Registry and has records for genomic and cell-based libraries and clones representing more than 100 different eukaryotic taxa. Records provide details of library construction, associated sequences, map positions and information about resource distribution. Clone DB is indexed in the NCBI Entrez system and can be queried by fields that include organism, clone name, gene name and sequence identifier. Whenever possible, genomic clones are mapped to reference assemblies and their map positions provided in clone records. Clones mapping to specific genomic regions can also be searched for using the NCBI Clone Finder tool, which accepts queries based on sequence coordinates or features such as gene or transcript names. Clone DB makes reports of library, clone and placement data on its FTP site available for download. With Clone DB, users now have available to them a centralized resource that provides them with the tools they will need to make use of these important research reagents. PMID:23193260
Stallion sperm transcriptome comprises functionally coherent coding and regulatory RNAs as revealed by microarray analysis and RNA-seq.

PubMed

Das, Pranab J; McCarthy, Fiona; Vishnoi, Monika; Paria, Nandina; Gresham, Cathy; Li, Gang; Kachroo, Priyanka; Sudderth, A Kendrick; Teague, Sheila; Love, Charles C; Varner, Dickson D; Chowdhary, Bhanu P; Raudsepp, Terje

2013-01-01

Mature mammalian sperm contain a complex population of RNAs some of which might regulate spermatogenesis while others probably play a role in fertilization and early development. Due to this limited knowledge, the biological functions of sperm RNAs remain enigmatic. Here we report the first characterization of the global transcriptome of the sperm of fertile stallions. The findings improved understanding of the biological significance of sperm RNAs which in turn will allow the discovery of sperm-based biomarkers for stallion fertility. The stallion sperm transcriptome was interrogated by analyzing sperm and testes RNA on a 21,000-element equine whole-genome oligoarray and by RNA-seq. Microarray analysis revealed 6,761 transcripts in the sperm, of which 165 were sperm-enriched, and 155 were differentially expressed between the sperm and testes. Next, 70 million raw reads were generated by RNA-seq of which 50% could be aligned with the horse reference genome. A total of 19,257 sequence tags were mapped to all horse chromosomes and the mitochondrial genome. The highest density of mapped transcripts was in gene-rich ECA11, 12 and 13, and the lowest in gene-poor ECA9 and X; 7 gene transcripts originated from ECAY. Structural annotation aligned sperm transcripts with 4,504 known horse and/or human genes, rRNAs and 82 miRNAs, whereas 13,354 sequence tags remained anonymous. The data were aligned with selected equine gene models to identify additional exons and splice variants. Gene Ontology annotations showed that sperm transcripts were associated with molecular processes (chemoattractant-activated signal transduction, ion transport) and cellular components (membranes and vesicles) related to known sperm functions at fertilization, while some messenger and micro RNAs might be critical for early development. 
The findings suggest that the rich repertoire of coding and non-coding RNAs in stallion sperm is not a random remnant from spermatogenesis in testes but a selectively retained and functionally coherent collection of RNAs.
Stallion Sperm Transcriptome Comprises Functionally Coherent Coding and Regulatory RNAs as Revealed by Microarray Analysis and RNA-seq

PubMed Central

Das, Pranab J.; McCarthy, Fiona; Vishnoi, Monika; Paria, Nandina; Gresham, Cathy; Li, Gang; Kachroo, Priyanka; Sudderth, A. Kendrick; Teague, Sheila; Love, Charles C.; Varner, Dickson D.; Chowdhary, Bhanu P.; Raudsepp, Terje

2013-01-01

Mature mammalian sperm contain a complex population of RNAs some of which might regulate spermatogenesis while others probably play a role in fertilization and early development. Due to this limited knowledge, the biological functions of sperm RNAs remain enigmatic. Here we report the first characterization of the global transcriptome of the sperm of fertile stallions. The findings improved understanding of the biological significance of sperm RNAs which in turn will allow the discovery of sperm-based biomarkers for stallion fertility. The stallion sperm transcriptome was interrogated by analyzing sperm and testes RNA on a 21,000-element equine whole-genome oligoarray and by RNA-seq. Microarray analysis revealed 6,761 transcripts in the sperm, of which 165 were sperm-enriched, and 155 were differentially expressed between the sperm and testes. Next, 70 million raw reads were generated by RNA-seq of which 50% could be aligned with the horse reference genome. A total of 19,257 sequence tags were mapped to all horse chromosomes and the mitochondrial genome. The highest density of mapped transcripts was in gene-rich ECA11, 12 and 13, and the lowest in gene-poor ECA9 and X; 7 gene transcripts originated from ECAY. Structural annotation aligned sperm transcripts with 4,504 known horse and/or human genes, rRNAs and 82 miRNAs, whereas 13,354 sequence tags remained anonymous. The data were aligned with selected equine gene models to identify additional exons and splice variants. Gene Ontology annotations showed that sperm transcripts were associated with molecular processes (chemoattractant-activated signal transduction, ion transport) and cellular components (membranes and vesicles) related to known sperm functions at fertilization, while some messenger and micro RNAs might be critical for early development. 
The findings suggest that the rich repertoire of coding and non-coding RNAs in stallion sperm is not a random remnant from spermatogenesis in testes but a selectively retained and functionally coherent collection of RNAs. PMID:23409192
Identification and molecular characterization of MYB Transcription Factor Superfamily in C4 model plant foxtail millet (Setaria italica L.).

PubMed

Muthamilarasan, Mehanathan; Khandelwal, Rohit; Yadav, Chandra Bhan; Bonthala, Venkata Suresh; Khan, Yusuf; Prasad, Manoj

2014-01-01

MYB proteins represent one of the largest transcription factor families in plants, playing important roles in diverse developmental and stress-responsive processes. Considering its significance, several genome-wide analyses have been conducted in almost all land plants except foxtail millet. Foxtail millet (Setaria italica L.) is a model crop for investigating systems biology of millets and bioenergy grasses. Further, the crop is also known for its potential abiotic stress-tolerance. In this context, a comprehensive genome-wide survey was conducted and 209 MYB protein-encoding genes were identified in foxtail millet. All 209 S. italica MYB (SiMYB) genes were physically mapped onto nine chromosomes of foxtail millet. Gene duplication study showed that segmental- and tandem-duplication have occurred in genome resulting in expansion of this gene family. The protein domain investigation classified SiMYB proteins into three classes according to number of MYB repeats present. The phylogenetic analysis categorized SiMYBs into ten groups (I-X). SiMYB-based comparative mapping revealed a maximum orthology between foxtail millet and sorghum, followed by maize, rice and Brachypodium. Heat map analysis showed tissue-specific expression pattern of predominant SiMYB genes. Expression profiling of candidate MYB genes against abiotic stresses and hormone treatments using qRT-PCR revealed specific and/or overlapping expression patterns of SiMYBs. Taken together, the present study provides a foundation for evolutionary and functional characterization of MYB TFs in foxtail millet to dissect their functions in response to environmental stimuli.
Genome-wide uniformity of human ‘open’ pre-initiation complexes

PubMed Central

Lai, William K.M.; Pugh, B. Franklin

2017-01-01

Transcription of protein-coding and noncoding DNA occurs pervasively throughout the mammalian genome. Their sites of initiation are generally inferred from transcript 5′ ends and are thought to be either locally dispersed or focused. How these two modes of initiation relate is unclear. Here, we apply permanganate treatment and chromatin immunoprecipitation (PIP-seq) of initiation factors to identify the precise location of melted DNA separately associated with the preinitiation complex (PIC) and the adjacent paused complex (PC). This approach revealed the two known modes of transcription initiation. However, in contrast to prevailing views, they co-occurred within the same promoter region: initiation originating from a focused PIC, and broad nucleosome-linked initiation. PIP-seq allowed transcriptional orientation of Pol II to be determined, which may be useful near promoters where sufficient sense/anti-sense transcript mapping information is lacking. PIP-seq detected divergently oriented Pol II at both coding and noncoding promoters, as well as at enhancers. Their occupancy levels were not necessarily coupled in the two orientations. DNA sequence and shape analysis of initiation complex sites suggest that both sequence and shape contribute to specificity, but in a context-restricted manner. That is, initiation sites have the locally “best” initiator (INR) sequence and/or shape. These findings reveal a common core to pervasive Pol II initiation throughout the human genome. PMID:27927716
Variation of DNA methylation patterns associated with gene expression in rice (Oryza sativa) exposed to cadmium.

PubMed

Feng, Sheng Jun; Liu, Xue Song; Tao, Hua; Tan, Shang Kun; Chu, Shan Shan; Oono, Youko; Zhang, Xian Duo; Chen, Jian; Yang, Zhi Min

2016-12-01

We report genome-wide single-base resolution maps of methylated cytosines and transcriptome change in Cd-exposed rice. Widespread differences were identified in CG and non-CG methylation marks between Cd-exposed and Cd-free rice genomes. There are 2320 non-redundant differentially methylated regions detected in the genome. RNA sequencing revealed 2092 DNA methylation-modified genes differentially expressed under Cd exposure. More genes were found hypermethylated than hypomethylated in CG, CHH and CHG (where H is A, C or T) contexts in upstream, gene body and downstream regions. Many of the genes were involved in stress response, metal transport and transcription factors. Most of the DNA methylation-modified genes were transcriptionally altered under Cd stress. A subset of loss of function mutants defective in DNA methylation and histone modification activities was used to identify transcript abundance of selected genes. Compared with wild type, mutation of MET1 and DRM2 resulted in generally lower transcript levels of the genes under Cd stress. Transcripts of OsIRO2, OsPR1b and Os09g02214 in drm2 were significantly reduced. A commonly used DNA methylation inhibitor 5-azacytidine was employed to investigate whether DNA demethylation affected physiological consequences. 5-azacytidine provision decreased general DNA methylation levels of selected genes, but promoted growth of rice seedlings and Cd accumulation in rice plant. © 2016 John Wiley & Sons Ltd.
Ontology and diversity of transcript-associated microsatellites mined from a globe artichoke EST database

PubMed Central

Scaglione, Davide; Acquadro, Alberto; Portis, Ezio; Taylor, Christopher A; Lanteri, Sergio; Knapp, Steven J

2009-01-01

Background The globe artichoke (Cynara cardunculus var. scolymus L.) is a significant crop in the Mediterranean basin. Despite its commercial importance and its both dietary and pharmaceutical value, knowledge of its genetics and genomics remains scant. Microsatellite markers have become a key tool in genetic and genomic analysis, and we have exploited recently acquired EST (expressed sequence tag) sequence data (Composite Genome Project - CGP) to develop an extensive set of microsatellite markers. Results A unigene assembly was created from over 36,000 globe artichoke EST sequences, containing 6,621 contigs and 12,434 singletons. Over 12,000 of these unigenes were functionally assigned on the basis of homology with Arabidopsis thaliana reference proteins. A total of 4,219 perfect repeats, located within 3,308 unigenes, was identified, and the gene ontology (GO) analysis highlighted enrichment of some GO terms among different classes of microsatellites with respect to their position. Sufficient flanking sequence was available to enable the design of primers to amplify 2,311 of these microsatellites, and a set of 300 was tested against a DNA panel derived from 28 C. cardunculus genotypes. Consistent amplification and polymorphism were obtained from 236 of these assays. Their polymorphic information content (PIC) ranged from 0.04 to 0.90 (mean 0.66). Between 176 and 198 of the assays were informative in at least one of the three available mapping populations. Conclusion EST-based microsatellites have provided a large set of de novo genetic markers, which show significant amounts of polymorphism both between and within the three taxa of C. cardunculus. They are thus well suited as assays for phylogenetic analysis, the construction of genetic maps, marker-assisted breeding, transcript mapping and other genomic applications in the species. PMID:19785740
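The abstract above reports polymorphic information content (PIC) values without giving the formula. A commonly used definition, from Botstein et al. (1980), is PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², where p_i is the frequency of allele i. The Python sketch below applies that definition; whether the authors used exactly this estimator is an assumption, and the function name `polymorphic_information_content` is hypothetical.

```python
def polymorphic_information_content(allele_counts):
    """Compute PIC for one marker from allele counts or frequencies.

    Uses the Botstein et al. (1980) definition:
        PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    Inputs are normalized, so raw counts and frequencies both work.
    """
    total = sum(allele_counts)
    if total <= 0:
        raise ValueError("allele counts must sum to a positive value")
    p = [c / total for c in allele_counts]
    # Probability that two random alleles are identical.
    homozygosity = sum(x * x for x in p)
    # Correction for matings between identical heterozygotes.
    cross_term = sum(
        2 * p[i] ** 2 * p[j] ** 2
        for i in range(len(p))
        for j in range(i + 1, len(p))
    )
    return 1 - homozygosity - cross_term


if __name__ == "__main__":
    # Two equally frequent alleles.
    print(polymorphic_information_content([10, 10]))
```

Under this definition a monomorphic marker scores 0, and the value approaches 1 as the number of equally frequent alleles grows.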
Genome network medicine: innovation to overcome huge challenges in cancer therapy.

PubMed

Roukos, Dimitrios H

2014-01-01

The post-ENCODE era is now shaping a new biomedical research direction for understanding the transcriptional and signaling networks that drive gene expression and core cellular processes such as cell fate, survival, and apoptosis. Over the past half century, the Francis Crick 'central dogma' of single gene/protein-phenotype (trait/disease) has defined biology, human physiology, disease, diagnostics, and drug discovery. However, the ENCODE project and several other genomic studies using high-throughput sequencing technologies, computational strategies, and imaging techniques to visualize regulatory networks provide evidence that transcriptional processes and gene expression are regulated by highly complex, dynamic molecular and signaling networks. This Focus article describes the linear experimentation-based limitations of diagnostics and therapeutics to cure advanced cancer and the need to move on from reductionist to network-based approaches. Given the evident wide genomic heterogeneity, the power and challenges of next-generation sequencing (NGS) technologies to identify a patient's personal mutational landscape for tailoring the best targeted drugs to the individual patient are discussed. However, the available drugs are not capable of targeting aberrant signaling networks, and functional transcriptional heterogeneity and functional genome organization are poorly understood. Therefore, future clinical genome network medicine, which aims to overcome multiple problems in the new fields of regulatory DNA mapping, noncoding RNA, enhancer RNAs, and the dynamic complexity of transcriptional circuitry, is also discussed, with the expectation of new innovative technology and strong appreciation of clinical data and evidence-based medicine. The problems and potential solutions in the discovery of next-generation, molecular, and signaling circuitry-based biomarkers and drugs are explored. © 2013 Wiley Periodicals, Inc.
Accumulation of unstable promoter-associated transcripts upon loss of the nuclear exosome subunit Rrp6p in Saccharomyces cerevisiae

PubMed Central

Davis, Carrie Anne; Ares, Manuel

2006-01-01

Mutations in RRP6 result in the accumulation of aberrant polyadenylated transcripts from small nucleolar RNA genes. We exploited this observation to search for novel noncoding RNA genes in the yeast genome. When RNA from rrp6Δ yeast is compared with wild-type on whole-genome microarrays, numerous intergenic loci exhibit an increased mutant/wild type signal ratio. Among these loci, we found one encoding a new C/D box small nucleolar RNA, as well as a surprising number that gave rise to heterogeneous Trf4p-polyadenylated RNAs with lengths of ≈250–500 nt. This class of RNAs is not easily detected in wild-type cells and appears associated with promoters. Fine mapping of several such transcripts shows they originate near known promoter elements but do not usually extend far enough to act as mRNAs, and may regulate the transcription of downstream mRNAs. Rather than being uninformative transcriptional “noise,” we hypothesize that these transcripts reflect important features of RNA polymerase activity at the promoter. This activity is normally undetectable in wild-type cells because the transcripts are somehow distinguished from true mRNAs and are degraded in an Rrp6p-dependent fashion in the nucleus. PMID:16484372
ChimeRScope: a novel alignment-free algorithm for fusion transcript prediction using paired-end RNA-Seq data

PubMed Central

Li, You; Heavican, Tayla B.; Vellichirammal, Neetha N.; Iqbal, Javeed

2017-01-01

RNA-Seq technology has revolutionized transcriptome characterization not only by accurately quantifying gene expression, but also by identifying novel transcripts such as chimeric fusion transcripts. These ‘fusion’ or ‘chimeric’ transcripts have improved the diagnosis and prognosis of several tumors, and have led to the development of novel therapeutic regimens. Fusion transcript detection is currently accomplished by several software packages, primarily relying on sequence alignment algorithms. The alignment of sequencing reads from fusion transcript loci in cancer genomes can be highly challenging due to the incorrect mapping induced by genomic alterations, thereby limiting the performance of alignment-based fusion transcript detection methods. Here, we developed a novel alignment-free method, ChimeRScope, that accurately predicts fusion transcripts based on the gene fingerprint (as k-mers) profiles of the RNA-Seq paired-end reads. Results on published datasets and in-house cancer cell line datasets, followed by experimental validations, demonstrate that ChimeRScope consistently outperforms other popular methods irrespective of read length and sequencing depth. More importantly, results on our in-house datasets show that ChimeRScope is a better tool that is capable of identifying novel fusion transcripts with potential oncogenic functions. ChimeRScope is available as standalone software at (https://github.com/ChimeRScope/ChimeRScope/wiki) or via the Galaxy web interface at (https://galaxy.unmc.edu/). PMID:28472320
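The idea of alignment-free fusion calling from k-mer fingerprints can be illustrated with a toy sketch. This is not ChimeRScope's actual algorithm or scoring scheme: the function names, the `min_hits` threshold, and the rule "keep only k-mers unique to one gene" are assumptions made for illustration, and reverse complements are ignored for brevity.

```python
from collections import Counter

def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_fingerprints(genes, k):
    """For each gene, keep only the k-mers that occur in no other gene."""
    per_gene = {name: kmers(seq, k) for name, seq in genes.items()}
    counts = Counter()
    for gene_kmers in per_gene.values():
        counts.update(gene_kmers)
    return {name: {km for km in gene_kmers if counts[km] == 1}
            for name, gene_kmers in per_gene.items()}

def assign_read(read, fingerprints, k, min_hits=2):
    """Return the gene sharing the most fingerprint k-mers with the read, or None."""
    read_kmers = kmers(read, k)
    hits = {name: len(read_kmers & fp) for name, fp in fingerprints.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] >= min_hits else None

def fusion_candidates(read_pairs, fingerprints, k):
    """Count read pairs whose two mates are assigned to different genes."""
    counts = Counter()
    for mate1, mate2 in read_pairs:
        gene1 = assign_read(mate1, fingerprints, k)
        gene2 = assign_read(mate2, fingerprints, k)
        if gene1 and gene2 and gene1 != gene2:
            counts[tuple(sorted((gene1, gene2)))] += 1
    return counts
```

A pair whose mates carry fingerprint k-mers from two different genes is counted as supporting a candidate fusion; a real tool would add strand handling, quality filtering, and statistical scoring on top of this.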
Regulating RNA polymerase pausing and transcription elongation in embryonic stem cells

PubMed Central

Min, Irene M.; Waterfall, Joshua J.; Core, Leighton J.; Munroe, Robert J.; Schimenti, John; Lis, John T.

2011-01-01

Transitions between pluripotent stem cells and differentiated cells are executed by key transcription regulators. Comparative measurements of RNA polymerase distribution over the genome's primary transcription units in different cell states can identify the genes and steps in the transcription cycle that are regulated during such transitions. To identify the complete transcriptional profiles of RNA polymerases with high sensitivity and resolution, as well as the critical regulated steps upon which regulatory factors act, we used genome-wide nuclear run-on (GRO-seq) to map the density and orientation of transcriptionally engaged RNA polymerases in mouse embryonic stem cells (ESCs) and mouse embryonic fibroblasts (MEFs). In both cell types, progression of a promoter-proximal, paused RNA polymerase II (Pol II) into productive elongation is a rate-limiting step in transcription of ∼40% of mRNA-encoding genes. Importantly, quantitative comparisons between cell types reveal that transcription is controlled frequently at paused Pol II's entry into elongation. Furthermore, “bivalent” ESC genes (exhibiting both active and repressive histone modifications) bound by Polycomb group complexes PRC1 (Polycomb-repressive complex 1) and PRC2 show dramatically reduced levels of paused Pol II at promoters relative to an average gene. In contrast, bivalent promoters bound by only PRC2 allow Pol II pausing, but it is confined to extremely 5′ proximal regions. Altogether, these findings identify rate-limiting targets for transcription regulation during cell differentiation. PMID:21460038
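Promoter-proximal pausing of the kind described above is often summarized with a "pausing index": read density near the transcription start site divided by read density over the gene body. The sketch below assumes that definition, plus-strand genes only, and a hypothetical 150 nt promoter window; none of these choices is taken from the abstract.

```python
def pausing_index(read_positions, tss, gene_end, promoter_window=150):
    """Ratio of read density near the TSS to read density in the gene body.

    read_positions: 5' end coordinates of reads on the gene's strand.
    tss, gene_end: gene coordinates with tss < gene_end (plus strand assumed).
    promoter_window: hypothetical size of the promoter-proximal region, in nt.
    """
    promoter_end = tss + promoter_window
    if gene_end <= promoter_end:
        raise ValueError("gene is too short for the chosen promoter window")
    promoter_reads = sum(tss <= p < promoter_end for p in read_positions)
    body_reads = sum(promoter_end <= p < gene_end for p in read_positions)
    promoter_density = promoter_reads / promoter_window
    body_density = body_reads / (gene_end - promoter_end)
    if body_density == 0:
        # No gene-body signal: infinite index if anything is paused, else zero.
        return float("inf") if promoter_reads else 0.0
    return promoter_density / body_density
```

A value well above 1 suggests polymerase accumulating near the promoter relative to the gene body; comparing the index for the same gene between two cell types is one simple way to look for regulated release into elongation.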
Gramene 2013: comparative plant genomics resources.

PubMed

Monaco, Marcela K; Stein, Joshua; Naithani, Sushma; Wei, Sharon; Dharmawardhana, Palitha; Kumari, Sunita; Amarasinghe, Vindhya; Youens-Clark, Ken; Thomason, James; Preece, Justin; Pasternak, Shiran; Olson, Andrew; Jiao, Yinping; Lu, Zhenyuan; Bolser, Dan; Kerhornou, Arnaud; Staines, Dan; Walts, Brandon; Wu, Guanming; D'Eustachio, Peter; Haw, Robin; Croft, David; Kersey, Paul J; Stein, Lincoln; Jaiswal, Pankaj; Ware, Doreen

2014-01-01

Gramene (http://www.gramene.org) is a curated online resource for comparative functional genomics in crops and model plant species, currently hosting 27 fully and 10 partially sequenced reference genomes in its build number 38. Its strength derives from the application of a phylogenetic framework for genome comparison and the use of ontologies to integrate structural and functional annotation data. Whole-genome alignments complemented by phylogenetic gene family trees help infer syntenic and orthologous relationships. Genetic variation data, sequences and genome mappings available for 10 species, including Arabidopsis, rice and maize, help infer putative variant effects on genes and transcripts. The pathways section also hosts 10 species-specific metabolic pathways databases developed in-house or by our collaborators using Pathway Tools software, which facilitates searches for pathway, reaction and metabolite annotations, and allows analyses of user-defined expression datasets. Recently, we released a Plant Reactome portal featuring 133 curated rice pathways. This portal will be expanded for Arabidopsis, maize and other plant species. We continue to provide genetic and QTL maps and marker datasets developed by crop researchers. The project provides a unique community platform to support scientific research in plant genomics including studies in evolution, genetics, plant breeding, molecular biology, biochemistry and systems biology.
Single strand conformation polymorphism based SNP and Indel markers for genetic mapping and synteny analysis of common bean (Phaseolus vulgaris L.)

PubMed Central

2009-01-01

Background: Expressed sequence tags (ESTs) are an important source of gene-based markers such as those based on insertion-deletions (Indels) or single-nucleotide polymorphisms (SNPs). Several gel-based methods have been reported for the detection of sequence variants; however, they have not been widely exploited in common bean, an important legume crop of the developing world. The objectives of this project were to develop and map EST-based markers using analysis of single strand conformation polymorphisms (SSCPs), to create a transcript map for common bean and to compare synteny of the common bean map with sequenced chromosomes of other legumes. Results: A set of 418 EST-based amplicons was evaluated for parental polymorphisms using the SSCP technique, and 26% of these presented a clear conformational or size polymorphism between Andean and Mesoamerican genotypes. The amplicon-based markers were then used for genetic mapping, with segregation analysis performed in the DOR364 × G19833 recombinant inbred line (RIL) population. A total of 118 new marker loci were placed into an integrated molecular map for common bean consisting of 288 markers. Of these, 218 were used for synteny analysis and 186 presented homology with segments of the soybean genome with an e-value lower than 7 × 10⁻¹². The synteny analysis with soybean showed a mosaic pattern of syntenic blocks, with most segments of any one common bean linkage group associated with two soybean chromosomes. The analysis with Medicago truncatula and Lotus japonicus presented fewer syntenic regions, consistent with the more distant phylogenetic relationship between the galegoid and phaseoloid legumes. Conclusion: The SSCP technique is a useful and inexpensive alternative to other SNP or Indel detection techniques for saturating the common bean genetic map with functional markers that may be useful in marker-assisted selection. In addition, the genetic markers based on ESTs allowed the construction of a transcript map and, given their high conservation between species, allowed synteny comparisons to be made to sequenced genomes. This synteny analysis may support positional cloning of target genes in common bean through the use of genomic information from these other legumes. PMID:20030833
Binding Sites Analyser (BiSA): Software for Genomic Binding Sites Archiving and Overlap Analysis

PubMed Central

Khushi, Matloob; Liddle, Christopher; Clarke, Christine L.; Graham, J. Dinny

2014-01-01

Genome-wide mapping of transcription factor binding and histone modification reveals complex patterns of interactions. Identifying overlaps in binding patterns by different factors is a major objective of genomic studies, but existing methods to archive large numbers of datasets in a personalised database lack sophistication and utility. Therefore we have developed transcription factor DNA binding site analyser software (BiSA), for archiving of binding regions and easy identification of overlap with or proximity to other regions of interest. Analysis results can be restricted by chromosome or base pair overlap between regions or maximum distance between binding peaks. BiSA is capable of reporting overlapping regions that share common base pairs; regions that are nearby; regions that are not overlapping; and average region sizes. BiSA can identify genes located near binding regions of interest, genomic features near a gene or locus of interest and statistical significance of overlapping regions can also be reported. Overlapping results can be visualized as Venn diagrams. A major strength of BiSA is that it is supported by a comprehensive database of publicly available transcription factor binding sites and histone modifications, which can be directly compared to user data. The documentation and source code are available on http://bisa.sourceforge.net PMID:24533055
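The overlap and proximity queries described above reduce to simple interval arithmetic. The following is a minimal sketch of that arithmetic, not BiSA's implementation: the `Region` type, the half-open coordinate convention, and the use of region midpoints as "peaks" are assumptions made for illustration.

```python
from typing import NamedTuple

class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive

def overlap_bp(a, b):
    """Number of base pairs shared by two regions (0 if on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))

def overlapping_pairs(set_a, set_b, min_bp=1):
    """All (a, b) pairs that share at least min_bp base pairs."""
    return [(a, b) for a in set_a for b in set_b if overlap_bp(a, b) >= min_bp]

def centre(region):
    """Midpoint of a region, used here as a stand-in for the binding peak."""
    return (region.start + region.end) // 2

def nearby_pairs(set_a, set_b, max_distance):
    """All same-chromosome (a, b, distance) triples with centres within max_distance."""
    pairs = []
    for a in set_a:
        for b in set_b:
            if a.chrom == b.chrom:
                distance = abs(centre(a) - centre(b))
                if distance <= max_distance:
                    pairs.append((a, b, distance))
    return pairs
```

The nested loops are quadratic in the number of regions, which is fine for a sketch; for genome-scale datasets one would sort the regions or use an interval tree.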
New bioinformatic tool for quick identification of functionally relevant endogenous retroviral inserts in human genome.

PubMed

Garazha, Andrew; Ivanova, Alena; Suntsova, Maria; Malakhova, Galina; Roumiantsev, Sergey; Zhavoronkov, Alex; Buzdin, Anton

2015-01-01

Endogenous retroviruses (ERVs) and LTR retrotransposons (LRs) occupy ∼8% of the human genome. Deep sequencing technologies provide clues to the functional relevance of individual ERVs/LRs by enabling direct identification of transcription factor binding sites (TFBS) and other landmarks of functional genomic elements. Here, we performed the genome-wide identification of human ERVs/LRs containing TFBS according to the ENCODE project. We created the first interactive ERV/LR database that groups the individual inserts according to their familial nomenclature, number of mapped TFBS and divergence from their consensus sequence. Information on any particular element can be easily extracted by the user. We also created a genome browser tool, which enables quick mapping of any ERV/LR insert according to genomic coordinates, known human genes and TFBS. These tools can be used to easily explore functionally relevant individual ERVs/LRs, and for studying their impact on the regulation of human genes. Overall, we identified ∼110,000 ERV/LR genomic elements having TFBS. We propose a hypothesis of "domestication" of ERV/LR TFBS by the genome milieu, including subsequent stages of initial epigenetic repression, partial functional release, and further mutation-driven reshaping of TFBS in tight coevolution with the enclosing genomic loci.
Genome-Wide Discovery of Microsatellite Markers from Diploid Progenitor Species, Arachis duranensis and A. ipaensis, and Their Application in Cultivated Peanut (A. hypogaea)

PubMed Central

Zhao, Chuanzhi; Qiu, Jingjing; Agarwal, Gaurav; Wang, Jiangshan; Ren, Xuezhen; Xia, Han; Guo, Baozhu; Ma, Changle; Wan, Shubo; Bertioli, David J.; Varshney, Rajeev K.; Pandey, Manish K.; Wang, Xingjun

2017-01-01

Despite several efforts in the last decade toward development of simple sequence repeat (SSR) markers in peanut, there is still a need for more markers for conducting different genetic and breeding studies. With the effort of the International Peanut Genome Initiative, the availability of reference genome for both the diploid progenitors of cultivated peanut allowed us to identify 135,529 and 199,957 SSRs from the A (Arachis duranensis) and B genomes (Arachis ipaensis), respectively. Genome sequence analysis showed uneven distribution of the SSR motifs across genomes with variation in parameters such as SSR type, repeat number, and SSR length. Using the flanking sequences of identified SSRs, primers were designed for 51,354 and 60,893 SSRs with densities of 49 and 45 SSRs per Mb in A. duranensis and A. ipaensis, respectively. In silico PCR analysis of these SSR markers showed high transferability between wild and cultivated Arachis species. Two physical maps were developed for the A genome and the B genome using these SSR markers, and two reported disease resistance quantitative trait loci (QTLs), qF2TSWV5 for tomato spotted wilt virus (TSWV) and qF2LS6 for leaf spot (LS), were mapped in the 8.135 Mb region of chromosome A04 of A. duranensis. From this genomic region, 719 novel SSR markers were developed, which provide the possibility for fine mapping of these QTLs. In addition, this region also harbors 652 genes and 49 of these are defense related genes, including two NB-ARC genes, three LRR receptor-like genes and three WRKY transcription factors. These disease resistance related genes could contribute to resistance to viral (such as TSWV) and fungal (such as LS) diseases in peanut. In summary, this study not only provides a large number of molecular markers for potential use in peanut genetic map development and QTL mapping but also for map-based gene cloning and molecular breeding. PMID:28769940
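Genome-wide SSR discovery of the kind described above starts by scanning the sequence for short motifs repeated in tandem. The sketch below uses a regular expression for perfect repeats only; the motif length range (2 to 6 bp), the minimum of four repeats, and the function names are illustrative assumptions, not the parameters used in the study.

```python
import re

# A motif of 2-6 bases (shortest first) followed by at least three more copies.
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{3,}")

def find_ssrs(sequence):
    """Return (start, motif, repeat_count) for each perfect SSR in the sequence."""
    results = []
    for match in SSR_PATTERN.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            # Skip homopolymer runs such as AAAAAAAA reported as (AA)n.
            continue
        repeat_count = len(match.group(0)) // len(motif)
        results.append((match.start(), motif, repeat_count))
    return results

def ssr_density_per_mb(ssr_count, genome_length_bp):
    """SSRs per megabase of sequence."""
    return ssr_count / (genome_length_bp / 1_000_000)
```

Primer design would then take the flanking sequence on either side of each reported start and end coordinate; that step is left out here.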
Prunus transcription factors: breeding perspectives

PubMed Central

Bianchi, Valmor J.; Rubio, Manuel; Trainotti, Livio; Verde, Ignazio; Bonghi, Claudio; Martínez-Gómez, Pedro

2015-01-01

Many plant processes depend on differential gene expression, which is generally controlled by complex proteins called transcription factors (TFs). In peach, 1533 TFs have been identified, accounting for about 5.5% of the 27,852 protein-coding genes. These TFs are the reference for the rest of the Prunus species. TF studies in Prunus have been performed on the gene expression analysis of different agronomic traits, including control of the flowering process, fruit quality, and biotic and abiotic stress resistance. These studies, using quantitative RT-PCR, have mainly been performed in peach, and to a lesser extent in other species, including almond, apricot, black cherry, Fuji cherry, Japanese apricot, plum, and sour and sweet cherry. Other tools have also been used in TF studies, including cDNA-AFLP, LC-ESI-MS, RNA, and DNA blotting or mapping. More recently, new tools assayed include microarray and high-throughput DNA sequencing (DNA-Seq) and RNA sequencing (RNA-Seq). New functional genomics opportunities include genome resequencing and the well-known synteny among Prunus genomes and transcriptomes. These new functional studies should be applied in breeding programs in the development of molecular markers. With the genome sequences available, some strategies that have been used in model systems (such as SNP genotyping assays and genotyping-by-sequencing) may be applicable in the functional analysis of Prunus TFs as well. In addition, the knowledge of the gene functions and position in the peach reference genome of the TFs represents an additional advantage. These facts could greatly facilitate the isolation of genes via QTL (quantitative trait loci) map-based cloning in the different Prunus species, following the association of these TFs with the identified QTLs using the peach reference genome. PMID:26124770
Characterization of the Structural Gene Promoter of Aedes aegypti Densovirus

PubMed Central

Ward, Todd W.; Kimmick, Michael W.; Afanasiev, Boris N.; Carlson, Jonathan O.

2001-01-01

Aedes aegypti densonucleosis virus (AeDNV) has two promoters that have been shown to be active by reporter gene expression analysis (B. N. Afanasiev, Y. V. Koslov, J. O. Carlson, and B. J. Beaty, Exp. Parasitol. 79:322–339, 1994). Northern blot analysis of cells infected with AeDNV revealed two transcripts, 1,200 and 3,500 nucleotides in length, that are assumed to express the structural protein (VP) gene and nonstructural protein genes, respectively. Primer extension was used to map the transcriptional start site of the structural protein gene. Surprisingly, the structural protein gene transcript began at an initiator consensus sequence, CAGT, 60 nucleotides upstream from the map unit 61 TATAA sequence previously thought to define the promoter. Constructs with the β-galactosidase gene fused to the structural protein gene were used to determine elements necessary for promoter function. Deletion or mutation of the initiator sequence, CAGT, reduced protein expression by 93%, whereas mutation of the TATAA sequence at map unit 61 had little effect. An additional open reading frame was observed upstream of the structural protein gene that can express β-galactosidase at a low level (20% of that of VP fusions). Expression of the AeDNV structural protein gene was shown to be stimulated by the major nonstructural protein NS1 (Afanasiev et al., Exp. Parasitol., 1994). To determine the sequences required for transactivation, expression of structural protein gene–β-galactosidase gene fusion constructs differing in AeDNV genome content was measured with and without NS1. The presence of NS1 led to an 8- to 10-fold increase in expression when either genomic end was present, compared to a 2-fold increase with a construct lacking the genomic ends. An even higher (37-fold) increase in expression occurred with both genomic ends present; however, this was in part due to template replication as shown by Southern blot analysis. These data indicate the location and importance of various elements necessary for efficient protein expression and transactivation from the structural protein gene promoter of AeDNV. PMID:11152505
The FlyBase database of the Drosophila genome projects and community literature

PubMed Central

2002-01-01

FlyBase (http://flybase.bio.indiana.edu/) provides an integrated view of the fundamental genomic and genetic data on the major genetic model Drosophila melanogaster and related species. Following on the success of the Drosophila genome project, FlyBase has primary responsibility for the continual reannotation of the D.melanogaster genome. The ultimate goal of the reannotation effort is to decorate the euchromatic sequence of the genome with as much biological information as is available from the community and from the major genome project centers. The current cycle of reannotation focuses on establishing a comprehensive data set of gene models (i.e. transcription units and CDSs). There are many points of entry to the genome within FlyBase, most notably through maps, gene ontologies, structured phenotypic and gene expression data, and anatomy. PMID:11752267

Global transcriptional analysis of psoriatic skin and blood confirms known disease-associated pathways and highlights novel genomic "hot spots" for differentially expressed genes.

PubMed

Coda, Alvin B; Icen, Murat; Smith, Jason R; Sinha, Animesh A

2012-07-01

There are major gaps in our knowledge regarding the exact mechanisms and genetic basis of psoriasis. To investigate the pathogenesis of psoriasis, gene expression in 10 skin (5 lesional, 5 nonlesional) and 11 blood (6 psoriatic, 5 nonpsoriatic) samples was examined using Affymetrix HG-U95A microarrays. We detected 535 differentially expressed genes (DEGs; 425 upregulated, 110 downregulated) in lesional skin at a 1% false discovery rate (FDR). When nine microarray studies comparing lesional and nonlesional psoriatic skin were combined, 34.5% of dysregulated genes overlapped across multiple studies. We further identified 20 skin-associated and 2 blood-associated transcriptional "hot spots" at specified genomic locations. At 5% FDR, 11.8% of skin and 10.4% of blood DEGs in our study mapped to one of the 12 PSORS loci. DEGs that overlap with PSORS loci may offer prioritized targets for downstream genetic fine mapping studies. Novel DEG "hot spots" may provide new targets for defining susceptibility loci in future studies. Copyright © 2012 Elsevier Inc. All rights reserved.
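Calling genes as differentially expressed "at 1% FDR" means choosing a p-value cutoff so that the expected fraction of false positives among the called genes stays below 1%. The abstract does not say which procedure was used; the sketch below assumes the widely used Benjamini-Hochberg step-up procedure, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at the given FDR.

    Benjamini-Hochberg step-up: sort the p-values, find the largest rank r with
    p_(r) <= (r / m) * fdr, and reject the r smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for index in order[:cutoff_rank]:
        rejected[index] = True
    return rejected
```

Note the step-up behaviour: a p-value that misses its own threshold is still rejected if a larger p-value further down the sorted list meets its threshold.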
DNA Physical Mapping via the Controlled Translocation of Single Molecules through a 5-10nm Silicon Nitride Nanopore

NASA Astrophysics Data System (ADS)

Stein, Derek; Reisner, Walter; Jiang, Zhijun; Hagerty, Nick; Wood, Charles; Chan, Jason

2009-03-01

The ability to map the binding position of sequence-specific markers, including transcription factors, peptide nucleic acids (PNAs) or deactivated restriction enzymes, along a single DNA molecule in a nanofluidic device would be of key importance for the life sciences. Such markers could give an indication of the active genes at a particular stage in a cell's transcriptional cycle, pinpoint the location of mutations or even provide a DNA barcode that could aid in genomics applications. We have developed a setup consisting of a 5-10 nm nanopore in a 20 nm thick silicon nitride film coupled to an optical tweezer setup. The translocation of DNA across the nanopore can be detected via blockades in the electrical current through the pore. By anchoring one end of the translocating DNA to an optically trapped microsphere, we hope to stretch out the molecule in the nanopore and control the translocation speed, enabling us to slowly scan across the genome and detect changes in the baseline current due to the presence of bound markers.
Pathogenesis, Molecular Genetics, and Genomics of Mycobacterium avium subsp. paratuberculosis, the Etiologic Agent of Johne’s Disease

PubMed Central

Rathnaiah, Govardhan; Zinniel, Denise K.; Bannantine, John P.; Stabel, Judith R.; Gröhn, Yrjö T.; Collins, Michael T.; Barletta, Raúl G.

2017-01-01

Mycobacterium avium subsp. paratuberculosis (MAP) is the etiologic agent of Johne’s disease in ruminants, causing chronic diarrhea, malnutrition, and muscular wasting. Neonates and young animals are infected primarily by the fecal–oral route. MAP attaches to and translocates via the intestinal mucosa, and is phagocytosed by macrophages. The ensuing host cellular immune response leads to granulomatous enteritis characterized by a thick and corrugated intestinal wall. We review various tissue culture systems, ileal loops, and mice, goats, and cattle used to study MAP pathogenesis. MAP can be detected in clinical samples by microscopy, culturing, PCR, and an enzyme-linked immunosorbent assay. There are commercial vaccines that reduce clinical disease and shedding; unfortunately, their efficacies are limited and they may not engender long-term protective immunity. Moreover, the potential linkage with Crohn’s disease and other human diseases makes MAP a concern as a zoonotic pathogen. Potential therapies with anti-mycobacterial agents are also discussed. The completion of the MAP K-10 genome sequence has greatly improved our understanding of MAP pathogenesis. The analysis of this sequence has identified a wide range of gene functions involved in virulence, lipid metabolism, transcriptional regulation, and main metabolic pathways. We also review the transposons utilized to generate random transposon mutant libraries and the recent advances in the post-genomic era. This includes the generation and characterization of allelic exchange mutants, transcriptomic analysis, transposon mutant banks analysis, new efforts to generate comprehensive mutant libraries, and the application of transposon site hybridization mutagenesis and transposon sequencing for global analysis of the MAP genome. Further analysis of the development of candidate vaccine strains is also provided, with critical discussions on their benefits and shortcomings, and strategies to develop a highly efficacious live-attenuated vaccine capable of differentiating infected from vaccinated animals. PMID:29164142
An anatomically comprehensive atlas of the adult human brain transcriptome

PubMed Central

Guillozet-Bongaarts, Angela L.; Shen, Elaine H.; Ng, Lydia; Miller, Jeremy A.; van de Lagemaat, Louie N.; Smith, Kimberly A.; Ebbert, Amanda; Riley, Zackery L.; Abajian, Chris; Beckmann, Christian F.; Bernard, Amy; Bertagnolli, Darren; Boe, Andrew F.; Cartagena, Preston M.; Chakravarty, M. Mallar; Chapin, Mike; Chong, Jimmy; Dalley, Rachel A.; David Daly, Barry; Dang, Chinh; Datta, Suvro; Dee, Nick; Dolbeare, Tim A.; Faber, Vance; Feng, David; Fowler, David R.; Goldy, Jeff; Gregor, Benjamin W.; Haradon, Zeb; Haynor, David R.; Hohmann, John G.; Horvath, Steve; Howard, Robert E.; Jeromin, Andreas; Jochim, Jayson M.; Kinnunen, Marty; Lau, Christopher; Lazarz, Evan T.; Lee, Changkyu; Lemon, Tracy A.; Li, Ling; Li, Yang; Morris, John A.; Overly, Caroline C.; Parker, Patrick D.; Parry, Sheana E.; Reding, Melissa; Royall, Joshua J.; Schulkin, Jay; Sequeira, Pedro Adolfo; Slaughterbeck, Clifford R.; Smith, Simon C.; Sodt, Andy J.; Sunkin, Susan M.; Swanson, Beryl E.; Vawter, Marquis P.; Williams, Derric; Wohnoutka, Paul; Zielke, H. Ronald; Geschwind, Daniel H.; Hof, Patrick R.; Smith, Stephen M.; Koch, Christof; Grant, Seth G. N.; Jones, Allan R.

2014-01-01

Neuroanatomically precise, genome-wide maps of transcript distributions are critical resources to complement genomic sequence data and to correlate functional and genetic brain architecture. Here we describe the generation and analysis of a transcriptional atlas of the adult human brain, comprising extensive histological analysis and comprehensive microarray profiling of ~900 neuroanatomically precise subdivisions in two individuals. Transcriptional regulation varies enormously by anatomical location, with different regions and their constituent cell types displaying robust molecular signatures that are highly conserved between individuals. Analysis of differential gene expression and gene co-expression relationships demonstrates that brain-wide variation strongly reflects the distributions of major cell classes such as neurons, oligodendrocytes, astrocytes and microglia. Local neighbourhood relationships between fine anatomical subdivisions are associated with discrete neuronal subtypes and genes involved with synaptic transmission. The neocortex displays a relatively homogeneous transcriptional pattern, but with distinct features associated selectively with primary sensorimotor cortices and with enriched frontal lobe expression. Notably, the spatial topography of the neocortex is strongly reflected in its molecular topography— the closer two cortical regions, the more similar their transcriptomes. This freely accessible online data resource forms a high-resolution transcriptional baseline for neurogenetic studies of normal and abnormal human brain function. PMID:22996553
Genome-Wide Cell Type-Specific Mapping of In Vivo Chromatin Protein Binding Using an FLP-Inducible DamID System in Drosophila.

PubMed

Pindyurin, Alexey V

2017-01-01

A thorough study of the genome-wide binding patterns of chromatin proteins is essential for understanding the regulatory mechanisms of genomic processes in eukaryotic nuclei, including DNA replication, transcription, and repair. The DNA adenine methyltransferase identification (DamID) method is a powerful tool to identify genomic binding sites of chromatin proteins. This method does not require fixation of cells and the use of specific antibodies, and has been used to generate genome-wide binding maps of more than a hundred different proteins in Drosophila tissue culture cells. Recent versions of inducible DamID allow performing cell type-specific profiling of chromatin proteins even in small samples of Drosophila tissues that contain heterogeneous cell types. Importantly, with these methods sorting of cells of interest or their nuclei is not necessary as genomic DNA isolated from the whole tissue can be used as an input. Here, I describe in detail an FLP-inducible DamID method, namely generation of suitable transgenic flies, activation of the Dam transgenes by the FLP recombinase, isolation of DNA from small amounts of dissected tissues, and subsequent identification of the DNA binding sites of the chromatin proteins.
Comprehensive Transcriptome Profiling and Functional Analysis of the Frog (Bombina maxima) Immune System

PubMed Central

Zhao, Feng; Yan, Chao; Wang, Xuan; Yang, Yang; Wang, Guangyin; Lee, Wenhui; Xiang, Yang; Zhang, Yun

2014-01-01

Amphibians occupy a key phylogenetic position in vertebrates and in the evolution of the immune system, but transcriptome and genome resources for them remain scarce. Bombina maxima has a strong ability to survive in very harsh environments and has a more mature immune system. We obtained a comprehensive transcriptome by RNA sequencing. Of the transcripts, 14.3% were identified as skin-specific genes, most of which were either not isolated from skin secretions in previous work or are novel non-coding RNAs. 27.9% of transcripts were mapped to 242 predicted KEGG pathways, and 6.16% of transcripts were related to human disease and cancer. Of 39 448 transcripts with a coding sequence, at least 1501 transcripts (570 genes) were related to immune system processes. Nearly all molecules of the immune signalling pathways were present, and several transcripts were highly expressed in skin and stomach. Experiments showed that lipopolysaccharide or bacterial challenge stimulated pro-inflammatory cytokine production and activation of pro-inflammatory caspase-1. These frog data substantially expand the existing genome and transcriptome resources for amphibians, especially immunity data. The data set as a whole provides a valuable platform for more detailed investigation of the immune response in B. maxima and for comparative studies with other amphibians. PMID:23942912
MHC class I-associated peptides derive from selective regions of the human genome.

PubMed

Pearson, Hillary; Daouda, Tariq; Granados, Diana Paola; Durette, Chantal; Bonneil, Eric; Courcelles, Mathieu; Rodenbrock, Anja; Laverdure, Jean-Philippe; Côté, Caroline; Mader, Sylvie; Lemieux, Sébastien; Thibault, Pierre; Perreault, Claude

2016-12-01

MHC class I-associated peptides (MAPs) define the immune self for CD8+ T lymphocytes and are key targets of cancer immunosurveillance. Here, the goals of our work were to determine whether the entire set of protein-coding genes could generate MAPs and whether specific features influence the ability of discrete genes to generate MAPs. Using proteogenomics, we have identified 25,270 MAPs isolated from the B lymphocytes of 18 individuals who collectively expressed 27 high-frequency HLA-A,B allotypes. The entire MAP repertoire presented by these 27 allotypes covered only 10% of the exomic sequences expressed in B lymphocytes. Indeed, 41% of expressed protein-coding genes generated no MAPs, while 59% of genes generated up to 64 MAPs, often derived from adjacent regions and presented by different allotypes. We next identified several features of transcripts and proteins associated with efficient MAP production. From these data, we built a logistic regression model that predicts with good accuracy whether a gene generates MAPs. Our results show preferential selection of MAPs from a limited repertoire of proteins with distinctive features. The notion that the MHC class I immunopeptidome presents only a small fraction of the protein-coding genome for monitoring by the immune system has profound implications in autoimmunity and cancer immunology.
MHC class I–associated peptides derive from selective regions of the human genome

PubMed Central

Pearson, Hillary; Granados, Diana Paola; Durette, Chantal; Bonneil, Eric; Courcelles, Mathieu; Rodenbrock, Anja; Laverdure, Jean-Philippe; Côté, Caroline; Thibault, Pierre

2016-01-01

MHC class I–associated peptides (MAPs) define the immune self for CD8+ T lymphocytes and are key targets of cancer immunosurveillance. Here, the goals of our work were to determine whether the entire set of protein-coding genes could generate MAPs and whether specific features influence the ability of discrete genes to generate MAPs. Using proteogenomics, we have identified 25,270 MAPs isolated from the B lymphocytes of 18 individuals who collectively expressed 27 high-frequency HLA-A,B allotypes. The entire MAP repertoire presented by these 27 allotypes covered only 10% of the exomic sequences expressed in B lymphocytes. Indeed, 41% of expressed protein-coding genes generated no MAPs, while 59% of genes generated up to 64 MAPs, often derived from adjacent regions and presented by different allotypes. We next identified several features of transcripts and proteins associated with efficient MAP production. From these data, we built a logistic regression model that predicts with good accuracy whether a gene generates MAPs. Our results show preferential selection of MAPs from a limited repertoire of proteins with distinctive features. The notion that the MHC class I immunopeptidome presents only a small fraction of the protein-coding genome for monitoring by the immune system has profound implications in autoimmunity and cancer immunology. PMID:27841757
Comparative transcriptomics of two environmentally relevant cyanobacteria reveals unexpected transcriptome diversity

PubMed Central

Voigt, Karsten; Sharma, Cynthia M; Mitschke, Jan; Joke Lambrecht, S; Voß, Björn; Hess, Wolfgang R; Steglich, Claudia

2014-01-01

Prochlorococcus is a genus of abundant and ecologically important marine cyanobacteria. Here, we present a comprehensive comparison of the structure and composition of the transcriptomes of two Prochlorococcus strains, which, despite their similarities, have adapted their gene pool to specific environmental constraints. We present genome-wide maps of transcriptional start sites (TSS) for both organisms, which are representatives of the two most diverse clades within the two major ecotypes adapted to high- and low-light conditions, respectively. Our data suggest antisense transcription for three-quarters of all genes, which is substantially more than that observed in other bacteria. We discovered hundreds of TSS within genes, most notably within 16 of the 29 prochlorosin genes, in strain MIT9313. A direct comparison revealed very little conservation in the location of TSS and the nature of non-coding transcripts between both strains. We detected extremely short 5′ untranslated regions with a median length of only 27 and 29 nt for MED4 and MIT9313, respectively, and for 8% of all protein-coding genes the median distance to the start codon is only 10 nt or even shorter. These findings and the absence of an obvious Shine–Dalgarno motif suggest that leaderless translation and ribosomal protein S1-dependent translation constitute alternative mechanisms for translation initiation in Prochlorococcus. We conclude that genome-wide antisense transcription is a major component of the transcriptional output from these relatively small genomes and that a hitherto unrecognized high degree of complexity and variability of gene expression exists in their transcriptional architecture. PMID:24739626
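As an illustration of how a 5′ UTR length statistic like the one reported above could be derived from a TSS map, the following is a minimal Python sketch. The function names, the 10 nt cutoff parameter, and the example coordinates are hypothetical and are not taken from the study; the sketch assumes each gene has a single assigned TSS and that coordinates refer to one replicon.

```python
from statistics import median

def utr5_length(tss, start_codon, strand):
    """Distance in nt from the TSS to the first base of the start codon."""
    # On the minus strand, coordinates decrease in the direction of transcription.
    return start_codon - tss if strand == "+" else tss - start_codon

def summarize_utrs(genes, short_cutoff=10):
    """Return (median 5' UTR length, fraction of genes with UTR <= short_cutoff)."""
    lengths = [utr5_length(tss, start, strand) for tss, start, strand in genes]
    short_fraction = sum(length <= short_cutoff for length in lengths) / len(lengths)
    return median(lengths), short_fraction

# Made-up coordinates (TSS, start codon, strand); not data from the paper.
example_genes = [(100, 127, "+"), (500, 490, "-"), (1000, 1040, "+")]
```

Calling `summarize_utrs(example_genes)` on the made-up coordinates gives the median UTR length together with the share of very short, potentially leaderless, UTRs.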
A Herpesviral Immediate Early Protein Promotes Transcription Elongation of Viral Transcripts.

PubMed

Fox, Hannah L; Dembowski, Jill A; DeLuca, Neal A

2017-06-13

Herpes simplex virus 1 (HSV-1) genes are transcribed by cellular RNA polymerase II (RNA Pol II). While four viral immediate early proteins (ICP4, ICP0, ICP27, and ICP22) function in some capacity in viral transcription, the mechanism by which ICP22 functions remains unclear. We observed that the FACT complex (comprised of SSRP1 and Spt16) was relocalized in infected cells as a function of ICP22. ICP22 was also required for the association of FACT and the transcription elongation factors SPT5 and SPT6 with viral genomes. We further demonstrated that the FACT complex interacts with ICP22 throughout infection. We therefore hypothesized that ICP22 recruits cellular transcription elongation factors to viral genomes for efficient transcription elongation of viral genes. We reevaluated the phenotype of an ICP22 mutant virus by determining the abundance of all viral mRNAs throughout infection by transcriptome sequencing (RNA-seq). The accumulation of almost all viral mRNAs late in infection was reduced compared to the wild type, regardless of kinetic class. Using chromatin immunoprecipitation sequencing (ChIP-seq), we mapped the location of RNA Pol II on viral genes and found that RNA Pol II levels on the bodies of viral genes were reduced in the ICP22 mutant compared to wild-type virus. In contrast, the association of RNA Pol II with transcription start sites in the mutant was not reduced. Taken together, our results indicate that ICP22 plays a role in recruiting elongation factors like the FACT complex to the HSV-1 genome to allow for efficient viral transcription elongation late in viral infection and ultimately infectious virion production. IMPORTANCE HSV-1 interacts with many cellular proteins throughout productive infection. Here, we demonstrate the interaction of a viral protein, ICP22, with a subset of cellular proteins known to be involved in transcription elongation. We determined that ICP22 is required to recruit the FACT complex and other transcription elongation factors to viral genomes and that in the absence of ICP22 viral transcription is globally reduced late in productive infection, due to an elongation defect. This insight defines a fundamental role of ICP22 in HSV-1 infection and elucidates the involvement of cellular factors in HSV-1 transcription. Copyright © 2017 Fox et al.
Identification and Molecular Characterization of MYB Transcription Factor Superfamily in C4 Model Plant Foxtail Millet (Setaria italica L.)

PubMed Central

Muthamilarasan, Mehanathan; Khandelwal, Rohit; Yadav, Chandra Bhan; Bonthala, Venkata Suresh; Khan, Yusuf; Prasad, Manoj

2014-01-01

MYB proteins represent one of the largest transcription factor families in plants, playing important roles in diverse developmental and stress-responsive processes. Considering its significance, several genome-wide analyses have been conducted in almost all land plants except foxtail millet. Foxtail millet (Setaria italica L.) is a model crop for investigating systems biology of millets and bioenergy grasses. Further, the crop is also known for its potential abiotic stress tolerance. In this context, a comprehensive genome-wide survey was conducted and 209 MYB protein-encoding genes were identified in foxtail millet. All 209 S. italica MYB (SiMYB) genes were physically mapped onto the nine chromosomes of foxtail millet. A gene duplication study showed that segmental and tandem duplications have occurred in the genome, resulting in expansion of this gene family. The protein domain investigation classified SiMYB proteins into three classes according to the number of MYB repeats present. The phylogenetic analysis categorized SiMYBs into ten groups (I - X). SiMYB-based comparative mapping revealed maximum orthology between foxtail millet and sorghum, followed by maize, rice and Brachypodium. Heat map analysis showed tissue-specific expression patterns of predominant SiMYB genes. Expression profiling of candidate MYB genes against abiotic stresses and hormone treatments using qRT-PCR revealed specific and/or overlapping expression patterns of SiMYBs. Taken together, the present study provides a foundation for evolutionary and functional characterization of MYB TFs in foxtail millet to dissect their functions in response to environmental stimuli. PMID:25279462
Conserved Regulators of Nucleolar Size Revealed by Global Phenotypic Analyses

PubMed Central

Neumüller, Ralph A.; Gross, Thomas; Samsonova, Anastasia A.; Vinayagam, Arunachalam; Buckner, Michael; Founk, Karen; Hu, Yanhui; Sharifpoor, Sara; Rosebrock, Adam P.; Andrews, Brenda; Winston, Fred; Perrimon, Norbert

2014-01-01

Regulation of cell growth is a fundamental process in development and disease that integrates a vast array of extra- and intracellular information. A central player in this process is RNA polymerase I (Pol I), which transcribes ribosomal RNA (rRNA) genes in the nucleolus. Rapidly growing cancer cells are characterized by increased Pol I–mediated transcription and, consequently, nucleolar hypertrophy. To map the genetic network underlying the regulation of nucleolar size and of Pol I–mediated transcription, we performed comparative, genome-wide loss-of-function analyses of nucleolar size in Saccharomyces cerevisiae and Drosophila melanogaster coupled with mass spectrometry–based analyses of the ribosomal DNA (rDNA) promoter. With this approach, we identified a set of conserved and nonconserved molecular complexes that control nucleolar size. Furthermore, we characterized a direct role of the histone information regulator (HIR) complex in repressing rRNA transcription in yeast. Our study provides a full-genome, cross-species analysis of a nuclear subcompartment and shows that this approach can identify conserved molecular modules. PMID:23962978
Transcriptional profiling of Medicago truncatula meristematic root cells

PubMed Central

Holmes, Peta; Goffard, Nicolas; Weiller, Georg F; Rolfe, Barry G; Imin, Nijat

2008-01-01

Background The root apical meristem of crop and model legume Medicago truncatula is a significantly different stem cell system to that of the widely studied model plant species Arabidopsis thaliana. In this study we used the Affymetrix Medicago GeneChip® to compare the transcriptomes of meristem and non-meristematic root to identify root meristem specific candidate genes. Results Using mRNA from root meristem and non-meristem we were able to identify 324 and 363 transcripts differentially expressed from the two regions. With bioinformatics tools developed to functionally annotate the Medicago genome array we could identify significant changes in metabolism, signalling and the differential expression of 55 transcription factors in meristematic and non-meristematic roots. Conclusion This is the first comprehensive analysis of M. truncatula root meristem cells using this genome array. This data will facilitate the mapping of regulatory and metabolic networks involved in the open root meristem of M. truncatula and provide candidates for functional analysis. PMID:18302802
FANTOM5 CAGE profiles of human and mouse samples.

PubMed

Noguchi, Shuhei; Arakawa, Takahiro; Fukuda, Shiro; Furuno, Masaaki; Hasegawa, Akira; Hori, Fumi; Ishikawa-Kato, Sachi; Kaida, Kaoru; Kaiho, Ai; Kanamori-Katayama, Mutsumi; Kawashima, Tsugumi; Kojima, Miki; Kubosaki, Atsutaka; Manabe, Ri-Ichiroh; Murata, Mitsuyoshi; Nagao-Sato, Sayaka; Nakazato, Kenichi; Ninomiya, Noriko; Nishiyori-Sueki, Hiromi; Noma, Shohei; Saijyo, Eri; Saka, Akiko; Sakai, Mizuho; Simon, Christophe; Suzuki, Naoko; Tagami, Michihira; Watanabe, Shoko; Yoshida, Shigehiro; Arner, Peter; Axton, Richard A; Babina, Magda; Baillie, J Kenneth; Barnett, Timothy C; Beckhouse, Anthony G; Blumenthal, Antje; Bodega, Beatrice; Bonetti, Alessandro; Briggs, James; Brombacher, Frank; Carlisle, Ailsa J; Clevers, Hans C; Davis, Carrie A; Detmar, Michael; Dohi, Taeko; Edge, Albert S B; Edinger, Matthias; Ehrlund, Anna; Ekwall, Karl; Endoh, Mitsuhiro; Enomoto, Hideki; Eslami, Afsaneh; Fagiolini, Michela; Fairbairn, Lynsey; Farach-Carson, Mary C; Faulkner, Geoffrey J; Ferrai, Carmelo; Fisher, Malcolm E; Forrester, Lesley M; Fujita, Rie; Furusawa, Jun-Ichi; Geijtenbeek, Teunis B; Gingeras, Thomas; Goldowitz, Daniel; Guhl, Sven; Guler, Reto; Gustincich, Stefano; Ha, Thomas J; Hamaguchi, Masahide; Hara, Mitsuko; Hasegawa, Yuki; Herlyn, Meenhard; Heutink, Peter; Hitchens, Kelly J; Hume, David A; Ikawa, Tomokatsu; Ishizu, Yuri; Kai, Chieko; Kawamoto, Hiroshi; Kawamura, Yuki I; Kempfle, Judith S; Kenna, Tony J; Kere, Juha; Khachigian, Levon M; Kitamura, Toshio; Klein, Sarah; Klinken, S Peter; Knox, Alan J; Kojima, Soichi; Koseki, Haruhiko; Koyasu, Shigeo; Lee, Weonju; Lennartsson, Andreas; Mackay-Sim, Alan; Mejhert, Niklas; Mizuno, Yosuke; Morikawa, Hiromasa; Morimoto, Mitsuru; Moro, Kazuyo; Morris, Kelly J; Motohashi, Hozumi; Mummery, Christine L; Nakachi, Yutaka; Nakahara, Fumio; Nakamura, Toshiyuki; Nakamura, Yukio; Nozaki, Tadasuke; Ogishima, Soichi; Ohkura, Naganari; Ohno, Hiroshi; Ohshima, Mitsuhiro; Okada-Hatakeyama, Mariko; Okazaki, Yasushi; Orlando, Valerio; 
Ovchinnikov, Dmitry A; Passier, Robert; Patrikakis, Margaret; Pombo, Ana; Pradhan-Bhatt, Swati; Qin, Xian-Yang; Rehli, Michael; Rizzu, Patrizia; Roy, Sugata; Sajantila, Antti; Sakaguchi, Shimon; Sato, Hiroki; Satoh, Hironori; Savvi, Suzana; Saxena, Alka; Schmidl, Christian; Schneider, Claudio; Schulze-Tanzil, Gundula G; Schwegmann, Anita; Sheng, Guojun; Shin, Jay W; Sugiyama, Daisuke; Sugiyama, Takaaki; Summers, Kim M; Takahashi, Naoko; Takai, Jun; Tanaka, Hiroshi; Tatsukawa, Hideki; Tomoiu, Andru; Toyoda, Hiroo; van de Wetering, Marc; van den Berg, Linda M; Verardo, Roberto; Vijayan, Dipti; Wells, Christine A; Winteringham, Louise N; Wolvetang, Ernst; Yamaguchi, Yoko; Yamamoto, Masayuki; Yanagi-Mizuochi, Chiyo; Yoneda, Misako; Yonekura, Yohei; Zhang, Peter G; Zucchelli, Silvia; Abugessaisa, Imad; Arner, Erik; Harshbarger, Jayson; Kondo, Atsushi; Lassmann, Timo; Lizio, Marina; Sahin, Serkan; Sengstag, Thierry; Severin, Jessica; Shimoji, Hisashi; Suzuki, Masanori; Suzuki, Harukazu; Kawai, Jun; Kondo, Naoto; Itoh, Masayoshi; Daub, Carsten O; Kasukawa, Takeya; Kawaji, Hideya; Carninci, Piero; Forrest, Alistair R R; Hayashizaki, Yoshihide

2017-08-29

In the FANTOM5 project, transcription initiation events across the human and mouse genomes were mapped at a single base-pair resolution and their frequencies were monitored by CAGE (Cap Analysis of Gene Expression) coupled with single-molecule sequencing. Approximately three thousand samples, consisting of a variety of primary cells, tissues, cell lines, and time series samples during cell activation and development, were subjected to a uniform pipeline of CAGE data production. The analysis pipeline started by measuring RNA extracts to assess their quality, and continued to CAGE library production by using a robotic or a manual workflow, single molecule sequencing, and computational processing to generate frequencies of transcription initiation. Resulting data represents the consequence of transcriptional regulation in each analyzed state of mammalian cells. Non-overlapping peaks over the CAGE profiles, approximately 200,000 and 150,000 peaks for the human and mouse genomes, were identified and annotated to provide precise location of known promoters as well as novel ones, and to quantify their activities.
FANTOM5 CAGE profiles of human and mouse samples

PubMed Central

Noguchi, Shuhei; Arakawa, Takahiro; Fukuda, Shiro; Furuno, Masaaki; Hasegawa, Akira; Hori, Fumi; Ishikawa-Kato, Sachi; Kaida, Kaoru; Kaiho, Ai; Kanamori-Katayama, Mutsumi; Kawashima, Tsugumi; Kojima, Miki; Kubosaki, Atsutaka; Manabe, Ri-ichiroh; Murata, Mitsuyoshi; Nagao-Sato, Sayaka; Nakazato, Kenichi; Ninomiya, Noriko; Nishiyori-Sueki, Hiromi; Noma, Shohei; Saijyo, Eri; Saka, Akiko; Sakai, Mizuho; Simon, Christophe; Suzuki, Naoko; Tagami, Michihira; Watanabe, Shoko; Yoshida, Shigehiro; Arner, Peter; Axton, Richard A.; Babina, Magda; Baillie, J. Kenneth; Barnett, Timothy C.; Beckhouse, Anthony G.; Blumenthal, Antje; Bodega, Beatrice; Bonetti, Alessandro; Briggs, James; Brombacher, Frank; Carlisle, Ailsa J.; Clevers, Hans C.; Davis, Carrie A.; Detmar, Michael; Dohi, Taeko; Edge, Albert S.B.; Edinger, Matthias; Ehrlund, Anna; Ekwall, Karl; Endoh, Mitsuhiro; Enomoto, Hideki; Eslami, Afsaneh; Fagiolini, Michela; Fairbairn, Lynsey; Farach-Carson, Mary C.; Faulkner, Geoffrey J.; Ferrai, Carmelo; Fisher, Malcolm E.; Forrester, Lesley M.; Fujita, Rie; Furusawa, Jun-ichi; Geijtenbeek, Teunis B.; Gingeras, Thomas; Goldowitz, Daniel; Guhl, Sven; Guler, Reto; Gustincich, Stefano; Ha, Thomas J.; Hamaguchi, Masahide; Hara, Mitsuko; Hasegawa, Yuki; Herlyn, Meenhard; Heutink, Peter; Hitchens, Kelly J.; Hume, David A.; Ikawa, Tomokatsu; Ishizu, Yuri; Kai, Chieko; Kawamoto, Hiroshi; Kawamura, Yuki I.; Kempfle, Judith S.; Kenna, Tony J.; Kere, Juha; Khachigian, Levon M.; Kitamura, Toshio; Klein, Sarah; Klinken, S. Peter; Knox, Alan J.; Kojima, Soichi; Koseki, Haruhiko; Koyasu, Shigeo; Lee, Weonju; Lennartsson, Andreas; Mackay-Sim, Alan; Mejhert, Niklas; Mizuno, Yosuke; Morikawa, Hiromasa; Morimoto, Mitsuru; Moro, Kazuyo; Morris, Kelly J.; Motohashi, Hozumi; Mummery, Christine L.; Nakachi, Yutaka; Nakahara, Fumio; Nakamura, Toshiyuki; Nakamura, Yukio; Nozaki, Tadasuke; Ogishima, Soichi; Ohkura, Naganari; Ohno, Hiroshi; Ohshima, Mitsuhiro; Okada-Hatakeyama, Mariko; Okazaki, Yasushi; Orlando, Valerio; Ovchinnikov, Dmitry A.; Passier, Robert; Patrikakis, Margaret; Pombo, Ana; Pradhan-Bhatt, Swati; Qin, Xian-Yang; Rehli, Michael; Rizzu, Patrizia; Roy, Sugata; Sajantila, Antti; Sakaguchi, Shimon; Sato, Hiroki; Satoh, Hironori; Savvi, Suzana; Saxena, Alka; Schmidl, Christian; Schneider, Claudio; Schulze-Tanzil, Gundula G.; Schwegmann, Anita; Sheng, Guojun; Shin, Jay W.; Sugiyama, Daisuke; Sugiyama, Takaaki; Summers, Kim M.; Takahashi, Naoko; Takai, Jun; Tanaka, Hiroshi; Tatsukawa, Hideki; Tomoiu, Andru; Toyoda, Hiroo; van de Wetering, Marc; van den Berg, Linda M.; Verardo, Roberto; Vijayan, Dipti; Wells, Christine A.; Winteringham, Louise N.; Wolvetang, Ernst; Yamaguchi, Yoko; Yamamoto, Masayuki; Yanagi-Mizuochi, Chiyo; Yoneda, Misako; Yonekura, Yohei; Zhang, Peter G.; Zucchelli, Silvia; Abugessaisa, Imad; Arner, Erik; Harshbarger, Jayson; Kondo, Atsushi; Lassmann, Timo; Lizio, Marina; Sahin, Serkan; Sengstag, Thierry; Severin, Jessica; Shimoji, Hisashi; Suzuki, Masanori; Suzuki, Harukazu; Kawai, Jun; Kondo, Naoto; Itoh, Masayoshi; Daub, Carsten O.; Kasukawa, Takeya; Kawaji, Hideya; Carninci, Piero; Forrest, Alistair R.R.; Hayashizaki, Yoshihide

2017-01-01

In the FANTOM5 project, transcription initiation events across the human and mouse genomes were mapped at a single base-pair resolution and their frequencies were monitored by CAGE (Cap Analysis of Gene Expression) coupled with single-molecule sequencing. Approximately three thousand samples, consisting of a variety of primary cells, tissues, cell lines, and time series samples during cell activation and development, were subjected to a uniform pipeline of CAGE data production. The analysis pipeline started by measuring RNA extracts to assess their quality, and continued to CAGE library production by using a robotic or a manual workflow, single molecule sequencing, and computational processing to generate frequencies of transcription initiation. Resulting data represents the consequence of transcriptional regulation in each analyzed state of mammalian cells. Non-overlapping peaks over the CAGE profiles, approximately 200,000 and 150,000 peaks for the human and mouse genomes, were identified and annotated to provide precise location of known promoters as well as novel ones, and to quantify their activities. PMID:28850106
Comparative genomics and transcriptional profiles of Saccharopolyspora erythraea NRRL 2338 and a classically improved erythromycin over-producing strain

PubMed Central

2012-01-01

Background The molecular mechanisms altered by the traditional mutation and screening approach during the improvement of antibiotic-producing microorganisms are still poorly understood although this information is essential to design rational strategies for industrial strain improvement. In this study, we applied comparative genomics to identify all genetic changes occurring during the development of an erythromycin overproducer obtained using the traditional mutate-and-screen method. Results Compared with the parental Saccharopolyspora erythraea NRRL 2338, the genome of the overproducing strain presents 117 deletion, 78 insertion and 12 transposition sites, with 71 insertion/deletion sites mapping within coding sequences (CDSs) and generating frame-shift mutations. Single nucleotide variations are present in 144 CDSs. Overall, the genomic variations affect 227 proteins of the overproducing strain and a considerable number of mutations alter genes of key enzymes in the central carbon and nitrogen metabolism and in the biosynthesis of secondary metabolites, resulting in the redirection of common precursors toward erythromycin biosynthesis. Interestingly, several mutations inactivate genes coding for proteins that play fundamental roles in basic transcription and translation machineries including the transcription anti-termination factor NusB and the transcription elongation factor Efp. These mutations, along with those affecting genes coding for pleiotropic or pathway-specific regulators, affect global expression profile as demonstrated by a comparative analysis of the parental and overproducer expression profiles. Genomic data, finally, suggest that the mutate-and-screen process might have been accelerated by mutations in DNA repair genes. Conclusions This study helps to clarify the mechanisms underlying antibiotic overproduction providing valuable information about new possible molecular targets for rational strain improvement. PMID:22401291
Optimization of oligonucleotide arrays and RNA amplification protocols for analysis of transcript structure and alternative splicing.

PubMed

Castle, John; Garrett-Engele, Phil; Armour, Christopher D; Duenwald, Sven J; Loerch, Patrick M; Meyer, Michael R; Schadt, Eric E; Stoughton, Roland; Parrish, Mark L; Shoemaker, Daniel D; Johnson, Jason M

2003-01-01

Microarrays offer a high-resolution means for monitoring pre-mRNA splicing on a genomic scale. We have developed a novel, unbiased amplification protocol that permits labeling of entire transcripts. Also, hybridization conditions, probe characteristics, and analysis algorithms were optimized for detection of exons, exon-intron edges, and exon junctions. These optimized protocols can be used to detect small variations and isoform mixtures, map the tissue specificity of known human alternative isoforms, and provide a robust, scalable platform for high-throughput discovery of alternative splicing.
Optimization of oligonucleotide arrays and RNA amplification protocols for analysis of transcript structure and alternative splicing

PubMed Central

Castle, John; Garrett-Engele, Phil; Armour, Christopher D; Duenwald, Sven J; Loerch, Patrick M; Meyer, Michael R; Schadt, Eric E; Stoughton, Roland; Parrish, Mark L; Shoemaker, Daniel D; Johnson, Jason M

2003-01-01

Microarrays offer a high-resolution means for monitoring pre-mRNA splicing on a genomic scale. We have developed a novel, unbiased amplification protocol that permits labeling of entire transcripts. Also, hybridization conditions, probe characteristics, and analysis algorithms were optimized for detection of exons, exon-intron edges, and exon junctions. These optimized protocols can be used to detect small variations and isoform mixtures, map the tissue specificity of known human alternative isoforms, and provide a robust, scalable platform for high-throughput discovery of alternative splicing. PMID:14519201
Characterization of Equine Infectious Anemia Virus Integration in the Horse Genome.

PubMed

Liu, Qiang; Wang, Xue-Feng; Ma, Jian; He, Xi-Jun; Wang, Xiao-Jun; Zhou, Jian-Hua

2015-06-19

Human immunodeficiency virus (HIV)-1 has a unique integration profile in the human genome relative to murine and avian retroviruses. Equine infectious anemia virus (EIAV) is another well-studied lentivirus that can also be used as a promising retro-transfection vector, but its integration into its native host has not been characterized. In this study, we mapped 477 integration sites of the EIAV strain EIAVFDDV13 in fetal equine dermal (FED) cells during in vitro infection. Published integration sites of EIAV and HIV-1 in the human genome were also analyzed as references. Our results demonstrated that EIAVFDDV13 tended to integrate into genes and AT-rich regions, and it avoided integrating into transcription start sites (TSS), which is consistent with EIAV and HIV-1 integration in the human genome. Notably, the integration of EIAVFDDV13 favored long interspersed elements (LINEs) and DNA transposons in the horse genome, whereas the integration of HIV-1 favored short interspersed elements (SINEs) in the human genome. The chromosomal environment near LINEs or DNA transposons potentially influences viral transcription and may be related to the unique EIAV latency states in equids. The data on EIAV integration in its natural host will facilitate studies on lentiviral infection and lentivirus-based therapeutic vectors.
Genome-Wide Discovery of Long Non-Coding RNAs in Rainbow Trout.

PubMed

Al-Tobasei, Rafet; Paneru, Bam; Salem, Mohamed

2016-01-01

The ENCODE project revealed that ~70% of the human genome is transcribed. While only 1-2% of the RNAs encode proteins, the rest are non-coding RNAs. Long non-coding RNAs (lncRNAs) form a diverse class of non-coding RNAs that are longer than 200 nt. Emerging evidence indicates that lncRNAs play critical roles in various cellular processes including regulation of gene expression. LncRNAs show low levels of gene expression and sequence conservation, which make their computational identification in genomes difficult. In this study, more than two billion Illumina sequence reads were mapped to the genome reference using the TopHat and Cufflinks software. Transcripts shorter than 200 nt, with an ORF longer than 83-100 amino acids, or with significant homologies to the NCBI nr-protein database were removed. In addition, a computational pipeline was used to filter the remaining transcripts based on a protein-coding-score test. Depending on the filtering stringency conditions, between 31,195 and 54,503 lncRNAs were identified, with only 421 matching known lncRNAs in other species. A digital gene expression atlas revealed 2,935 tissue-specific and 3,269 ubiquitously-expressed lncRNAs. This study annotates lncRNAs in the rainbow trout genome and provides a valuable resource for functional genomics research in salmonids.
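The length and ORF criteria in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, the 100-amino-acid cutoff is one illustrative value from the stated 83-100 range, only the forward strand is scanned, and the homology and protein-coding-score filters are omitted.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf_aa(seq):
    """Length in amino acids of the longest ATG..stop ORF on the forward strand."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None  # index of the currently open ATG in this frame, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best

def is_lncrna_candidate(seq, min_len=200, max_orf_aa=100):
    """Keep transcripts of at least min_len nt whose longest ORF does not exceed max_orf_aa."""
    return len(seq) >= min_len and longest_orf_aa(seq) <= max_orf_aa
```

A transcript passing this check would still need the remaining filters described in the abstract before being called a lncRNA.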

Characterization of Equine Infectious Anemia Virus Integration in the Horse Genome

PubMed Central

Liu, Qiang; Wang, Xue-Feng; Ma, Jian; He, Xi-Jun; Wang, Xiao-Jun; Zhou, Jian-Hua

2015-01-01

Human immunodeficiency virus (HIV)-1 has a unique integration profile in the human genome relative to murine and avian retroviruses. Equine infectious anemia virus (EIAV) is another well-studied lentivirus that can also be used as a promising retro-transfection vector, but its integration into its native host has not been characterized. In this study, we mapped 477 integration sites of the EIAV strain EIAVFDDV13 in fetal equine dermal (FED) cells during in vitro infection. Published integration sites of EIAV and HIV-1 in the human genome were also analyzed as references. Our results demonstrated that EIAVFDDV13 tended to integrate into genes and AT-rich regions, and it avoided integrating into transcription start sites (TSS), which is consistent with EIAV and HIV-1 integration in the human genome. Notably, the integration of EIAVFDDV13 favored long interspersed elements (LINEs) and DNA transposons in the horse genome, whereas the integration of HIV-1 favored short interspersed elements (SINEs) in the human genome. The chromosomal environment near LINEs or DNA transposons potentially influences viral transcription and may be related to the unique EIAV latency states in equids. The data on EIAV integration in its natural host will facilitate studies on lentiviral infection and lentivirus-based therapeutic vectors. PMID:26102582
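One common way to ask whether integration sites avoid transcription start sites is to compute each site's distance to the nearest annotated TSS. The Python sketch below shows that calculation under simplifying assumptions: a single chromosome, a non-empty pre-sorted list of TSS coordinates, and a hypothetical function name. It is an illustration and not the analysis used in the study.

```python
from bisect import bisect_left

def distance_to_nearest_tss(site, sorted_tss):
    """Absolute distance from an integration site to the closest TSS.

    sorted_tss must be a non-empty list of coordinates in ascending order.
    """
    i = bisect_left(sorted_tss, site)
    # Only the neighbours immediately left and right of the insertion point can be closest.
    candidates = sorted_tss[max(i - 1, 0):i + 1]
    return min(abs(site - tss) for tss in candidates)

# Made-up coordinates for illustration only.
example_tss = [100, 500, 900]
```

Comparing the distribution of these distances for observed sites against randomly placed control sites is the usual next step.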
Spatiotemporal coupling and decoupling of gene transcription with DNA replication origins during embryogenesis in C. elegans

PubMed Central

Pourkarimi, Ehsan; Bellush, James M; Whitehouse, Iestyn

2016-01-01

The primary task of developing embryos is genome replication, yet how DNA replication is integrated with the profound cellular changes that occur through development is largely unknown. Using an approach to map DNA replication at high resolution in C. elegans, we show that replication origins are marked with specific histone modifications that define gene enhancers. We demonstrate that the level of enhancer-associated modifications scales with the efficiency at which the origin is utilized. By mapping replication origins at different developmental stages, we show that the positions and activity of origins are largely invariant through embryogenesis. Contrary to expectation, we find that replication origins are specified prior to the broad onset of zygotic transcription, yet when transcription initiates it does so in close proximity to the pre-defined replication origins. Transcription and DNA replication origins are correlated, but the association breaks down when embryonic cell division ceases. Collectively, our data indicate that replication origins are fundamental organizers and regulators of gene activity through embryonic development. DOI: http://dx.doi.org/10.7554/eLife.21728.001 PMID:28009254
Mapping the subgenomic RNA promoter of the Citrus leaf blotch virus coat protein gene by Agrobacterium-mediated inoculation.

PubMed

Renovell, Agueda; Gago, Selma; Ruiz-Ruiz, Susana; Velázquez, Karelia; Navarro, Luis; Moreno, Pedro; Vives, Mari Carmen; Guerri, José

2010-10-25

Citrus leaf blotch virus has a single-stranded positive-sense genomic RNA (gRNA) of 8747 nt organized in three open reading frames (ORFs). The ORF1, encoding a polyprotein involved in replication, is translated directly from the gRNA, whereas ORFs encoding the movement (MP) and coat (CP) proteins are expressed via 3' coterminal subgenomic RNAs (sgRNAs). We characterized the minimal promoter region critical for the CP-sgRNA expression in infected cells by deletion analyses using Agrobacterium-mediated infection of Nicotiana benthamiana plants. The minimal CP-sgRNA promoter was mapped between nucleotides -67 and +50 nt around the transcription start site. Surprisingly, larger deletions in the region between the CP-sgRNA transcription start site and the CP translation initiation codon resulted in increased CP-sgRNA accumulation, suggesting that this sequence could modulate the CP-sgRNA transcription. Site-specific mutational analysis of the transcription start site revealed that the +1 guanylate and the +2 adenylate are important for CP-sgRNA synthesis. Copyright © 2010 Elsevier Inc. All rights reserved.
Analysis of Mycobacterium avium subsp. paratuberculosis mutant libraries reveals loci-dependent transcription biases and strategies to novel mutant discovery

USDA-ARS's Scientific Manuscript database

Mycobacterium avium subsp. paratuberculosis (MAP) is the etiologic agent of Johne’s disease in ruminants and it has been implicated as a cause of Crohn’s disease in humans. The generation of comprehensive random mutant banks by transposon mutagenesis is a fundamental wide genomic technology utilized...
At-TAX: a whole genome tiling array resource for developmental expression analysis and transcript identification in Arabidopsis thaliana

PubMed Central

Laubinger, Sascha; Zeller, Georg; Henz, Stefan R; Sachsenberg, Timo; Widmer, Christian K; Naouar, Naïra; Vuylsteke, Marnik; Schölkopf, Bernhard; Rätsch, Gunnar; Weigel, Detlef

2008-01-01

Gene expression maps for model organisms, including Arabidopsis thaliana, have typically been created using gene-centric expression arrays. Here, we describe a comprehensive expression atlas, Arabidopsis thaliana Tiling Array Express (At-TAX), which is based on whole-genome tiling arrays. We demonstrate that tiling arrays are accurate tools for gene expression analysis and identified more than 1,000 unannotated transcribed regions. Visualizations of gene expression estimates, transcribed regions, and tiling probe measurements are accessible online at the At-TAX homepage. PMID:18613972
Whole-genome expression analysis of mammalian-wide interspersed repeat elements in human cell lines.

PubMed

Carnevali, Davide; Conti, Anastasia; Pellegrini, Matteo; Dieci, Giorgio

2017-02-01

With more than 500,000 copies, mammalian-wide interspersed repeats (MIRs), a sub-group of SINEs, represent ∼2.5% of the human genome and one of the most numerous families of potential targets for the RNA polymerase (Pol) III transcription machinery. Since MIR elements ceased to amplify ∼130 myr ago, previous studies primarily focused on their genomic impact, while the issue of their expression has not been extensively addressed. We applied a dedicated bioinformatic pipeline to ENCODE RNA-Seq datasets of seven human cell lines and, for the first time, we were able to define the Pol III-driven MIR transcriptome at single-locus resolution. While the majority of Pol III-transcribed MIR elements are cell-specific, we discovered a small set of ubiquitously transcribed MIRs mapping within Pol II-transcribed genes in antisense orientation that could influence the expression of the overlapping gene. We also identified novel Pol III-transcribed ncRNAs, deriving from transcription of annotated MIR fragments flanked by unique MIR-unrelated sequences, and confirmed the role of Pol III-specific internal promoter elements in MIR transcription. Besides demonstrating widespread transcription at these retrotranspositionally inactive elements in human cells, the ability to profile MIR expression at single-locus resolution will facilitate their study in different cell types and states including pathological alterations. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Global Analysis of Transcription Factor-Binding Sites in Yeast Using ChIP-Seq

PubMed Central

Lefrançois, Philippe; Gallagher, Jennifer E. G.; Snyder, Michael

2016-01-01

Transcription factors influence gene expression through their ability to bind DNA at specific regulatory elements. Specific DNA-protein interactions can be isolated through the chromatin immunoprecipitation (ChIP) procedure, in which DNA fragments bound by the protein of interest are recovered. ChIP is followed by high-throughput DNA sequencing (Seq) to determine the genomic provenance of ChIP DNA fragments and their relative abundance in the sample. This chapter describes a ChIP-Seq strategy adapted for budding yeast to enable the genome-wide characterization of binding sites of transcription factors (TFs) and other DNA-binding proteins in an efficient and cost-effective way. Yeast strains with epitope-tagged TFs are most commonly used for ChIP-Seq, along with their matching untagged control strains. The initial step of ChIP involves the cross-linking of DNA and proteins. Next, yeast cells are lysed and sonicated to shear chromatin into smaller fragments. An antibody against an epitope-tagged TF is used to pull down chromatin complexes containing DNA and the TF of interest. DNA is then purified and proteins degraded. Specific barcoded adapters for multiplex DNA sequencing are ligated to ChIP DNA. Short DNA sequence reads (28–36 base pairs) are parsed according to the barcode and aligned against the yeast reference genome, thus generating a nucleotide-resolution map of transcription factor-binding sites and their occupancy. PMID:25213249
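The barcode-parsing step mentioned in this abstract can be sketched roughly as follows. This is a minimal illustration only, assuming every read starts with a fixed-length barcode; the barcode sequences, sample names, and the `demultiplex` helper are hypothetical and are not taken from the chapter.

```python
from collections import defaultdict

def demultiplex(reads, barcodes):
    """Group reads by their leading barcode and strip the barcode.

    reads: iterable of read sequences (strings).
    barcodes: dict mapping barcode sequence -> sample name (all same length).
    Reads whose prefix matches no barcode go to the 'unassigned' bin.
    """
    lengths = {len(b) for b in barcodes}
    if len(lengths) != 1:
        raise ValueError("all barcodes must have the same length")
    n = lengths.pop()
    bins = defaultdict(list)
    for read in reads:
        sample = barcodes.get(read[:n], "unassigned")
        # Keep only the insert for assigned reads; keep unassigned reads whole.
        bins[sample].append(read[n:] if sample != "unassigned" else read)
    return dict(bins)

# Hypothetical 4-nt barcodes for a tagged strain and its untagged control.
BARCODES = {"ACGT": "tagged_TF", "TGCA": "untagged_control"}
READS = ["ACGTGGGTTTAAACCC", "TGCACCCAAATTTGGG", "NNNNGATTACAGATTA"]
result = demultiplex(READS, BARCODES)
```

Exact prefix matching is the simplest possible choice; a real pipeline would probably tolerate sequencing errors in the barcode, which this sketch does not attempt.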
Identification and characterization of a novel serine-threonine kinase gene from the Xp22 region.

PubMed

Montini, E; Andolfi, G; Caruso, A; Buchner, G; Walpole, S M; Mariani, M; Consalez, G; Trump, D; Ballabio, A; Franco, B

1998-08-01

Eukaryotic protein kinases are part of a large and expanding family of proteins. Through our transcriptional mapping effort in the Xp22 region, we have isolated and sequenced the full-length transcript of STK9, a novel cDNA highly homologous to serine-threonine kinases. A number of human genetic disorders have been mapped to the region where STK9 has been localized including Nance-Horan (NH) syndrome, oral-facial-digital syndrome type 1 (OFD1), and a novel locus for nonsyndromic sensorineural deafness (DFN6). To evaluate the possible involvement of STK9 in any of the above-mentioned disorders, a 2416-bp full-length cDNA was assembled. The entire genomic structure of the gene, which is composed of 20 coding exons, was determined. Northern analysis revealed a transcript larger than 9.5 kb in several tissues including brain, lung, and kidney. The mouse homologue (Stk9) was identified and mapped in the mouse in the region syntenic to human Xp. This location is compatible with the location of the Xcat mutant, which shows congenital cataracts very similar to those observed in NH patients. Sequence homologies, expression pattern, and mapping information in both human and mouse make STK9 a candidate gene for the above-mentioned disorders. Copyright 1998 Academic Press.
Discovery and Fine-Mapping of Glycaemic and Obesity-Related Trait Loci Using High-Density Imputation.

PubMed

Horikoshi, Momoko; Mӓgi, Reedik; van de Bunt, Martijn; Surakka, Ida; Sarin, Antti-Pekka; Mahajan, Anubha; Marullo, Letizia; Thorleifsson, Gudmar; Hӓgg, Sara; Hottenga, Jouke-Jan; Ladenvall, Claes; Ried, Janina S; Winkler, Thomas W; Willems, Sara M; Pervjakova, Natalia; Esko, Tõnu; Beekman, Marian; Nelson, Christopher P; Willenborg, Christina; Wiltshire, Steven; Ferreira, Teresa; Fernandez, Juan; Gaulton, Kyle J; Steinthorsdottir, Valgerdur; Hamsten, Anders; Magnusson, Patrik K E; Willemsen, Gonneke; Milaneschi, Yuri; Robertson, Neil R; Groves, Christopher J; Bennett, Amanda J; Lehtimӓki, Terho; Viikari, Jorma S; Rung, Johan; Lyssenko, Valeriya; Perola, Markus; Heid, Iris M; Herder, Christian; Grallert, Harald; Müller-Nurasyid, Martina; Roden, Michael; Hypponen, Elina; Isaacs, Aaron; van Leeuwen, Elisabeth M; Karssen, Lennart C; Mihailov, Evelin; Houwing-Duistermaat, Jeanine J; de Craen, Anton J M; Deelen, Joris; Havulinna, Aki S; Blades, Matthew; Hengstenberg, Christian; Erdmann, Jeanette; Schunkert, Heribert; Kaprio, Jaakko; Tobin, Martin D; Samani, Nilesh J; Lind, Lars; Salomaa, Veikko; Lindgren, Cecilia M; Slagboom, P Eline; Metspalu, Andres; van Duijn, Cornelia M; Eriksson, Johan G; Peters, Annette; Gieger, Christian; Jula, Antti; Groop, Leif; Raitakari, Olli T; Power, Chris; Penninx, Brenda W J H; de Geus, Eco; Smit, Johannes H; Boomsma, Dorret I; Pedersen, Nancy L; Ingelsson, Erik; Thorsteinsdottir, Unnur; Stefansson, Kari; Ripatti, Samuli; Prokopenko, Inga; McCarthy, Mark I; Morris, Andrew P

2015-07-01

Reference panels from the 1000 Genomes (1000G) Project Consortium provide near complete coverage of common and low-frequency genetic variation with minor allele frequency ≥0.5% across European ancestry populations. Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we have undertaken the first large-scale meta-analysis of genome-wide association studies (GWAS), supplemented by 1000G imputation, for four quantitative glycaemic and obesity-related traits, in up to 87,048 individuals of European ancestry. We identified two loci for body mass index (BMI) at genome-wide significance, and two for fasting glucose (FG), none of which has been previously reported in larger meta-analysis efforts to combine GWAS of European ancestry. Through conditional analysis, we also detected multiple distinct signals of association mapping to established loci for waist-hip ratio adjusted for BMI (RSPO3) and FG (GCK and G6PC2). The index variant for one association signal at the G6PC2 locus is a low-frequency coding allele, H177Y, which has recently been demonstrated to have a functional role in glucose regulation. Fine-mapping analyses revealed that the non-coding variants most likely to drive association signals at established and novel loci were enriched for overlap with enhancer elements, which for FG mapped to promoter and transcription factor binding sites in pancreatic islets, in particular. Our study demonstrates that 1000G imputation and genetic fine-mapping of common and low-frequency variant association signals at GWAS loci, integrated with genomic annotation in relevant tissues, can provide insight into the functional and regulatory mechanisms through which their effects on glycaemic and obesity-related traits are mediated.
Discovery and Fine-Mapping of Glycaemic and Obesity-Related Trait Loci Using High-Density Imputation

PubMed Central

van de Bunt, Martijn; Surakka, Ida; Sarin, Antti-Pekka; Mahajan, Anubha; Marullo, Letizia; Thorleifsson, Gudmar; Hӓgg, Sara; Hottenga, Jouke-Jan; Ladenvall, Claes; Ried, Janina S.; Winkler, Thomas W.; Willems, Sara M.; Pervjakova, Natalia; Esko, Tõnu; Beekman, Marian; Nelson, Christopher P.; Willenborg, Christina; Ferreira, Teresa; Fernandez, Juan; Gaulton, Kyle J.; Steinthorsdottir, Valgerdur; Hamsten, Anders; Magnusson, Patrik K. E.; Willemsen, Gonneke; Milaneschi, Yuri; Robertson, Neil R.; Groves, Christopher J.; Bennett, Amanda J.; Lehtimӓki, Terho; Viikari, Jorma S.; Rung, Johan; Lyssenko, Valeriya; Perola, Markus; Heid, Iris M.; Herder, Christian; Grallert, Harald; Müller-Nurasyid, Martina; Roden, Michael; Hypponen, Elina; Isaacs, Aaron; van Leeuwen, Elisabeth M.; Karssen, Lennart C.; Mihailov, Evelin; Houwing-Duistermaat, Jeanine J.; de Craen, Anton J. M.; Deelen, Joris; Havulinna, Aki S.; Blades, Matthew; Hengstenberg, Christian; Erdmann, Jeanette; Schunkert, Heribert; Kaprio, Jaakko; Tobin, Martin D.; Samani, Nilesh J.; Lind, Lars; Salomaa, Veikko; Lindgren, Cecilia M.; Slagboom, P. Eline; Metspalu, Andres; van Duijn, Cornelia M.; Eriksson, Johan G.; Peters, Annette; Gieger, Christian; Jula, Antti; Groop, Leif; Raitakari, Olli T.; Power, Chris; Penninx, Brenda W. J. H.; de Geus, Eco; Smit, Johannes H.; Boomsma, Dorret I.; Pedersen, Nancy L.; Ingelsson, Erik; Thorsteinsdottir, Unnur; Stefansson, Kari; Ripatti, Samuli; Prokopenko, Inga; McCarthy, Mark I.; Morris, Andrew P.

2015-01-01

Reference panels from the 1000 Genomes (1000G) Project Consortium provide near complete coverage of common and low-frequency genetic variation with minor allele frequency ≥0.5% across European ancestry populations. Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we have undertaken the first large-scale meta-analysis of genome-wide association studies (GWAS), supplemented by 1000G imputation, for four quantitative glycaemic and obesity-related traits, in up to 87,048 individuals of European ancestry. We identified two loci for body mass index (BMI) at genome-wide significance, and two for fasting glucose (FG), none of which has been previously reported in larger meta-analysis efforts to combine GWAS of European ancestry. Through conditional analysis, we also detected multiple distinct signals of association mapping to established loci for waist-hip ratio adjusted for BMI (RSPO3) and FG (GCK and G6PC2). The index variant for one association signal at the G6PC2 locus is a low-frequency coding allele, H177Y, which has recently been demonstrated to have a functional role in glucose regulation. Fine-mapping analyses revealed that the non-coding variants most likely to drive association signals at established and novel loci were enriched for overlap with enhancer elements, which for FG mapped to promoter and transcription factor binding sites in pancreatic islets, in particular. Our study demonstrates that 1000G imputation and genetic fine-mapping of common and low-frequency variant association signals at GWAS loci, integrated with genomic annotation in relevant tissues, can provide insight into the functional and regulatory mechanisms through which their effects on glycaemic and obesity-related traits are mediated. PMID:26132169
Natural Allelic Diversity, Genetic Structure and Linkage Disequilibrium Pattern in Wild Chickpea

PubMed Central

Kujur, Alice; Das, Shouvik; Badoni, Saurabh; Kumar, Vinod; Singh, Mohar; Bansal, Kailash C.; Tyagi, Akhilesh K.; Parida, Swarup K.

2014-01-01

Characterization of natural allelic diversity and understanding the genetic structure and linkage disequilibrium (LD) pattern in wild germplasm accessions by large-scale genotyping of informative microsatellite and single nucleotide polymorphism (SNP) markers is requisite to facilitate chickpea genetic improvement. Large-scale validation and high-throughput genotyping of genome-wide physically mapped 478 genic and genomic microsatellite markers and 380 transcription factor gene-derived SNP markers using gel-based assay, fluorescent dye-labelled automated fragment analyser and matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) mass array have been performed. Outcome revealed their high genotyping success rate (97.5%) and existence of a high level of natural allelic diversity among 94 wild and cultivated Cicer accessions. High intra- and inter-specific polymorphic potential and wider molecular diversity (11–94%) along with a broader genetic base (13–78%) specifically in the functional genic regions of wild accessions was assayed by mapped markers. It suggested their utility in monitoring introgression and transferring target trait-specific genomic (gene) regions from wild to cultivated gene pool for the genetic enhancement. Distinct species/gene pool-wise differentiation, admixed domestication pattern, and differential genome-wide recombination and LD estimates/decay observed in a six structured population of wild and cultivated accessions using mapped markers further signifies their usefulness in chickpea genetics, genomics and breeding. PMID:25222488
Fine-scale maps of recombination rates and hotspots in the mouse genome.

PubMed

Brunschwig, Hadassa; Levi, Liat; Ben-David, Eyal; Williams, Robert W; Yakir, Benjamin; Shifman, Sagiv

2012-07-01

Recombination events are not uniformly distributed and often cluster in narrow regions known as recombination hotspots. Several studies using different approaches have dramatically advanced our understanding of recombination hotspot regulation. Population genetic data have been used to map and quantify hotspots in the human genome. Genetic variation in recombination rates and hotspots usage have been explored in human pedigrees, mouse intercrosses, and by sperm typing. These studies pointed to the central role of the PRDM9 gene in hotspot modulation. In this study, we used single nucleotide polymorphisms (SNPs) from whole-genome resequencing and genotyping studies of mouse inbred strains to estimate recombination rates across the mouse genome and identified 47,068 historical hotspots--an average of over 2477 per chromosome. We show by simulation that inbred mouse strains can be used to identify positions of historical hotspots. Recombination hotspots were found to be enriched for the predicted binding sequences for different alleles of the PRDM9 protein. Recombination rates were on average lower near transcription start sites (TSS). Comparing the inferred historical recombination hotspots with the recent genome-wide mapping of double-strand breaks (DSBs) in mouse sperm revealed a significant overlap, especially toward the telomeres. Our results suggest that inbred strains can be used to characterize and study the dynamics of historical recombination hotspots. They also strengthen previous findings on mouse recombination hotspots, and specifically the impact of sequence variants in Prdm9.
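The per-chromosome average quoted in this abstract is consistent with dividing the hotspot total by the 19 mouse autosomes. That divisor is an assumption made here for illustration (the abstract does not say which chromosomes were counted), and the variable names are hypothetical.

```python
HOTSPOTS = 47_068   # historical hotspots reported in the abstract
AUTOSOMES = 19      # assumption: mouse autosomes only, not stated in the abstract

# Mean number of hotspots per chromosome under that assumption.
mean_per_chromosome = HOTSPOTS / AUTOSOMES
print(round(mean_per_chromosome, 1))
```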
Genomic Signal Processing: Predicting Basic Molecular Biological Principles

NASA Astrophysics Data System (ADS)

Alter, Orly

2005-03-01

Advances in high-throughput technologies enable acquisition of different types of molecular biological data, monitoring the flow of biological information as DNA is transcribed to RNA, and RNA is translated to proteins, on a genomic scale. Future discovery in biology and medicine will come from the mathematical modeling of these data, which hold the key to fundamental understanding of life on the molecular level, as well as answers to questions regarding diagnosis, treatment and drug development. Recently we described data-driven models for genome-scale molecular biological data, which use singular value decomposition (SVD) and the comparative generalized SVD (GSVD). Now we describe an integrative data-driven model, which uses pseudoinverse projection (1). We also demonstrate the predictive power of these matrix algebra models (2). The integrative pseudoinverse projection model formulates any number of genome-scale molecular biological data sets in terms of one chosen set of data samples, or of profiles extracted mathematically from data samples, designated the "basis" set. The mathematical variables of this integrative model, the pseudoinverse correlation patterns that are uncovered in the data, represent independent processes and corresponding cellular states (such as observed genome-wide effects of known regulators or transcription factors, the biological components of the cellular machinery that generate the genomic signals, and measured samples in which these regulators or transcription factors are over- or underactive). Reconstruction of the data in the basis simulates experimental observation of only the cellular states manifest in the data that correspond to those of the basis. Classification of the data samples according to their reconstruction in the basis, rather than their overall measured profiles, maps the cellular states of the data onto those of the basis, and gives a global picture of the correlations and possibly also causal coordination of these two sets of states. Mapping genome-scale protein binding data using pseudoinverse projection onto patterns of RNA expression data that had been extracted by SVD and GSVD, a novel correlation between DNA replication initiation and RNA transcription during the cell cycle in yeast, that might be due to a previously unknown mechanism of regulation, is predicted. (1) Alter & Golub, Proc. Natl. Acad. Sci. USA 101, 16577 (2004). (2) Alter, Golub, Brown & Botstein, Miami Nat. Biotechnol. Winter Symp. 2004 (www.med.miami.edu/mnbws/alter-.pdf)
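The idea of expressing a data profile in terms of a chosen basis by pseudoinverse projection can be illustrated with a toy example. This is only a sketch under simplifying assumptions, not the authors' implementation: it handles exactly two linearly independent basis profiles, in which case the pseudoinverse reduces to the normal-equations solution; the function names and the numbers are hypothetical.

```python
def pinv_project(basis, data):
    """Coefficients of `data` in a two-profile `basis`, i.e. B^+ d.

    basis: list of two equal-length lists (basis profiles b1, b2).
    data: list of the same length (one measured profile d).
    For linearly independent b1 and b2, B^+ = (B^T B)^{-1} B^T,
    so the coefficients solve the 2x2 normal equations.
    """
    b1, b2 = basis
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    g11, g12, g22 = dot(b1, b1), dot(b1, b2), dot(b2, b2)
    det = g11 * g22 - g12 * g12
    if det == 0:
        raise ValueError("basis profiles are linearly dependent")
    r1, r2 = dot(b1, data), dot(b2, data)
    c1 = (g22 * r1 - g12 * r2) / det
    c2 = (g11 * r2 - g12 * r1) / det
    return c1, c2

def reconstruct(basis, coeffs):
    """Rebuild the part of the profile that lies in the span of the basis."""
    b1, b2 = basis
    c1, c2 = coeffs
    return [c1 * x + c2 * y for x, y in zip(b1, b2)]

# Hypothetical toy profiles over four "genes".
basis = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
data = [2.0, 3.0, 4.0, 5.0]
coeffs = pinv_project(basis, data)
approx = reconstruct(basis, coeffs)
```

Here `approx` keeps only the component of `data` that the basis can represent, which loosely mirrors the abstract's description of reconstructing the data in the basis.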
Enriching Genomic Resources and Marker Development from Transcript Sequences of Jatropha curcas for Microgravity Studies

PubMed Central

Tian, Wenlan; Paudel, Dev

2017-01-01

Jatropha (Jatropha curcas L.) is an economically important species with a great potential for biodiesel production. To enrich the jatropha genomic databases and resources for microgravity studies, we sequenced and annotated the transcriptome of jatropha and developed SSR and SNP markers from the transcriptome sequences. In total 1,714,433 raw reads with an average length of 441.2 nucleotides were generated. De novo assembling and clustering resulted in 115,611 uniquely assembled sequences (UASs) including 21,418 full-length cDNAs and 23,264 new jatropha transcript sequences. The whole set of UASs were fully annotated, out of which 59,903 (51.81%) were assigned with gene ontology (GO) term, 12,584 (10.88%) had orthologs in Eukaryotic Orthologous Groups (KOG), and 8,822 (7.63%) were mapped to 317 pathways in six different categories in Kyoto Encyclopedia of Genes and Genome (KEGG) database, and it contained 3,588 putative transcription factors. From the UASs, 9,798 SSRs were discovered with AG/CT as the most frequent (45.8%) SSR motif type. Further 38,693 SNPs were detected and 7,584 remained after filtering. This UAS set has enriched the current jatropha genomic databases and provided a large number of genetic markers, which can facilitate jatropha genetic improvement and many other genetic and biological studies. PMID:28154822
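SSR discovery of the kind reported in this abstract amounts to scanning sequences for short motifs repeated in tandem. The sketch below shows one simple way to do that with a regular expression; the minimum repeat counts, the restriction to di- and trinucleotide motifs, and the `find_ssrs` helper are assumptions made for illustration, not the criteria or software used in the study.

```python
import re

def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    min_repeats maps motif length -> minimum number of tandem copies.
    The default thresholds are hypothetical.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5}
    hits = []
    for k, n in min_repeats.items():
        # A k-nt motif followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

# Hypothetical sequence with an (AG)7 and a (CTT)5 repeat.
seq = "TTGC" + "AG" * 7 + "CCGA" + "CTT" * 5 + "GA"
ssrs = find_ssrs(seq)
```

Note that a motif and its reverse complement (for example AG and CT, as grouped in the abstract) are reported separately by this sketch.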
Zseq: An Approach for Preprocessing Next-Generation Sequencing Data.

PubMed

Alkhateeb, Abedalrhman; Rueda, Luis

2017-08-01

Next-generation sequencing technology generates a huge number of reads (short sequences), which contain a vast amount of genomic data. The sequencing process, however, comes with artifacts. Preprocessing of sequences is mandatory for further downstream analysis. We present Zseq, a linear method that identifies the most informative genomic sequences and reduces the number of biased sequences, sequence duplications, and ambiguous nucleotides. Zseq finds the complexity of the sequences by counting the number of unique k-mers in each sequence as its corresponding score and also takes into account other factors such as ambiguous nucleotides or high GC-content percentage in k-mers. Based on a z-score threshold, Zseq sweeps through the sequences again and filters those with a z-score less than the user-defined threshold. Zseq algorithm is able to provide a better mapping rate; it reduces the number of ambiguous bases significantly in comparison with other methods. Evaluation of the filtered reads has been conducted by aligning the reads and assembling the transcripts using the reference genome as well as de novo assembly. The assembled transcripts show a better discriminative ability to separate cancer and normal samples in comparison with another state-of-the-art method. Moreover, de novo assembled transcripts from the reads filtered by Zseq have longer genomic sequences than other tested methods. Estimating the threshold of the cutoff point is introduced using labeling rules with optimistic results.
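The scoring-and-filtering idea described in this abstract can be sketched in a few lines. This is a simplified illustration, not the Zseq implementation: it scores each read only by its number of distinct k-mers (dropping k-mers that contain N), ignores the GC-content factor the abstract mentions, and uses the population standard deviation; the function names, the value of k, and the threshold are all assumptions.

```python
from statistics import mean, pstdev

def unique_kmers(seq, k):
    """Number of distinct k-mers in seq, ignoring k-mers with ambiguous bases."""
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return sum(1 for kmer in kmers if "N" not in kmer)

def zscore_filter(reads, k=3, threshold=0.0):
    """Keep reads whose k-mer complexity z-score is at least `threshold`."""
    scores = [unique_kmers(r, k) for r in reads]
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return list(reads)  # all reads equally complex; nothing to filter
    return [r for r, s in zip(reads, scores) if (s - mu) / sigma >= threshold]

# Hypothetical reads: two complex, one homopolymer, one dinucleotide repeat.
reads = [
    "ACGTTGCAAGCTTAGC",
    "GATTACAGGCTCATGA",
    "AAAAAAAAAAAAAAAA",
    "ACACACACACACACAC",
]
kept = zscore_filter(reads, k=3, threshold=0.0)
```

Both passes are linear in the total read length for fixed k, which matches the abstract's description of a method that scores the sequences and then sweeps through them a second time.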
Analysis of the First Genome of a Hyperthermophilic Marine Virus-Like Particle, PAV1, Isolated from Pyrococcus abyssi

PubMed Central

Geslin, C.; Gaillard, M.; Flament, D.; Rouault, K.; Le Romancer, M.; Prieur, D.; Erauso, G.

2007-01-01

Only one virus-like particle (VLP) has been reported from hyperthermophilic Euryarchaeotes. This VLP, named PAV1, is shaped like a lemon and was isolated from a strain of “Pyrococcus abyssi,” a deep-sea isolate. Its genome consists of a double-stranded circular DNA of 18 kb which is also present at a high copy number (60 per chromosome) free within the host cytoplasm but is not integrated into the host chromosome. Here, we report the results of complete analysis of the PAV1 genome. All the 25 predicted genes, except 3, are located on one DNA strand. A transcription map has been made by using a reverse transcription-PCR assay. All the identified open reading frames (ORFs) are transcribed. The most significant similarities relate to four ORFs. ORF 180a shows 31% identity with ORF 181 of the pRT1 plasmid isolated from Pyrococcus sp. strain JT1. ORFs 676 and 678 present similarities with a concanavalin A-like lectin/glucanase domain, which could be involved in the process of host-virus recognition, and ORF 59 presents similarities with the transcriptional regulator CopG. The genome of PAV1 displays unique features at the nucleic and proteinic level, indicating that PAV1 should be attached at least to a novel genus or virus family. PMID:17449623
Dynamic epigenetic regulation of gene expression during the life cycle of malaria parasite Plasmodium falciparum.

PubMed

Gupta, Archna P; Chin, Wai Hoe; Zhu, Lei; Mok, Sachel; Luah, Yen-Hoon; Lim, Eng-How; Bozdech, Zbynek

2013-02-01

Epigenetic mechanisms are emerging as one of the major factors of the dynamics of gene expression in the human malaria parasite, Plasmodium falciparum. To elucidate the role of chromatin remodeling in transcriptional regulation associated with the progression of the P. falciparum intraerythrocytic development cycle (IDC), we mapped the temporal pattern of chromosomal association with histone H3 and H4 modifications using ChIP-on-chip. Here, we have generated a broad integrative epigenomic map of twelve histone modifications during the P. falciparum IDC including H4K5ac, H4K8ac, H4K12ac, H4K16ac, H3K9ac, H3K14ac, H3K56ac, H4K20me1, H4K20me3, H3K4me3, H3K79me3 and H4R3me2. While some modifications were found to be associated with the vast majority of the genome and their occupancy was constant, others showed more specific and highly dynamic distribution. Importantly, eight modifications displaying tight correlations with transcript levels showed differential affinity to distinct genomic regions with H4K8ac occupying predominantly promoter regions while others occurred at the 5' ends of coding sequences. The promoter occupancy of H4K8ac remained unchanged when ectopically inserted at a different locus, indicating the presence of specific DNA elements that recruit histone modifying enzymes regardless of their broad chromatin environment. In addition, we showed the presence of multivalent domains on the genome carrying more than one histone mark, highlighting the importance of combinatorial effects on transcription. Overall, our work portrays a substantial association between chromosomal locations of various epigenetic markers, transcriptional activity and global stage-specific transitions in the epigenome.
Achievements and prospects of genomics-assisted breeding in three legume crops of the semi-arid tropics.

PubMed

Varshney, Rajeev K; Mohan, S Murali; Gaur, Pooran M; Gangarao, N V P R; Pandey, Manish K; Bohra, Abhishek; Sawargaonkar, Shrikant L; Chitikineni, Annapurna; Kimurto, Paul K; Janila, Pasupuleti; Saxena, K B; Fikre, Asnake; Sharma, Mamta; Rathore, Abhishek; Pratap, Aditya; Tripathi, Shailesh; Datta, Subhojit; Chaturvedi, S K; Mallikarjuna, Nalini; Anuradha, G; Babbar, Anita; Choudhary, Arbind K; Mhase, M B; Bharadwaj, Ch; Mannur, D M; Harer, P N; Guo, Baozhu; Liang, Xuanqiang; Nadarajan, N; Gowda, C L L

2013-12-01

Advances in next-generation sequencing and genotyping technologies have enabled generation of large-scale genomic resources such as molecular markers, transcript reads and BAC-end sequences (BESs) in chickpea, pigeonpea and groundnut, three major legume crops of the semi-arid tropics. Comprehensive transcriptome assemblies and genome sequences have either been developed or are underway in these crops. Based on these resources, dense genetic maps, QTL maps as well as physical maps for these legume species have also been developed. As a result, these crops have graduated from 'orphan' or 'less-studied' crops to 'genomic resources rich' crops. This article summarizes the above-mentioned advances in genomics and genomics-assisted breeding applications in the form of marker-assisted selection (MAS) for hybrid purity assessment in pigeonpea; marker-assisted backcrossing (MABC) for introgressing QTL region for drought-tolerance related traits, Fusarium wilt (FW) resistance and Ascochyta blight (AB) resistance in chickpea; late leaf spot (LLS), leaf rust and nematode resistance in groundnut. We critically present the case of use of other modern breeding approaches like marker-assisted recurrent selection (MARS) and genomic selection (GS) to utilize the full potential of genomics-assisted breeding for developing superior cultivars with enhanced tolerance to various environmental stresses. In addition, this article recommends the use of advanced-backcross (AB-backcross) breeding and development of specialized populations such as multi-parents advanced generation intercross (MAGIC) for creating new variations that will help in developing superior lines with broadened genetic base. In summary, we propose the use of integrated genomics and breeding approach in these legume crops to enhance crop productivity in marginal environments ensuring food security in developing countries. Copyright © 2012 Elsevier Inc. All rights reserved.
Identification and characterization of a new multigene family in the human MHC: A candidate autoimmune disease susceptibility element (3.8-1)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Harris, J.M.; Venditti, C.P.; Chorney, M.J.

1994-09-01

An association between idiopathic hemochromatosis (HFE) and the HLA-A3 locus has been previously well established. In an attempt to identify potential HFE candidate genes, a genomic DNA fragment distal to the HLA-A9 breakpoint was used to screen a B cell cDNA library; a member (3.8-1) of a new multigene family, composed of five distinct genomic cross-reactive fragments, was identified. Clone 3.8-1 represents the 3' end of a 9.6 kb transcript which is expressed in multiple tissues including the spleen, thymus, lung and kidney. Sequencing and genome database analysis indicate that 3.8-1 is unique, with no homology to any known entries. The genomic residence of 3.8-1, defined by polymorphism analysis and physical mapping using YAC clones, appears to be absent from the genomes of higher primates, although four other cross-reactivities are maintained. The absence of this gene, as well as other probes which map in the TNF to HLA-B interval, suggests that this portion of the human MHC, located between the Class I and Class III regions, arose in humans as the result of a post-speciation insertional event. The large size of the 3.8-1 gene and the possible categorization of 3.8-1 as a human-specific gene are significant given the genetic data that place an autoimmune susceptibility element for IDDM and myasthenia gravis in the precise region where this gene resides. In an attempt to isolate the 5' end of this large transcript, we have constructed a cosmid contig which encompasses the genomic locus of this gene and are progressively isolating coding sequences by exon trapping.
Strand-specific transcriptome profiling with directly labeled RNA on genomic tiling microarrays

PubMed Central

2011-01-01

Background: With lower manufacturing cost, high spot density, and flexible probe design, genomic tiling microarrays are ideal for comprehensive transcriptome studies. Typically, transcriptome profiling using microarrays involves reverse transcription, which converts RNA to cDNA. The cDNA is then labeled and hybridized to the probes on the arrays, thus the RNA signals are detected indirectly. Reverse transcription is known to generate artifactual cDNA, in particular the synthesis of second-strand cDNA, leading to false discovery of antisense RNA. To address this issue, we have developed an effective method using RNA that is directly labeled, thus by-passing the cDNA generation. This paper describes this method and its application to the mapping of transcriptome profiles. Results: RNA extracted from laboratory cultures of Porphyromonas gingivalis was fluorescently labeled with an alkylation reagent and hybridized directly to probes on genomic tiling microarrays specifically designed for this periodontal pathogen. The generated transcriptome profile was strand-specific and produced signals close to background level in most antisense regions of the genome. In contrast, high levels of signal were detected in the antisense regions when the hybridization was done with cDNA. Five antisense areas were tested with independent strand-specific RT-PCR and none to negligible amplification was detected, indicating that the strong antisense cDNA signals were experimental artifacts. Conclusions: An efficient method was developed for mapping transcriptome profiles specific to both coding strands of a bacterial genome. This method chemically labels and uses extracted RNA directly in microarray hybridization. The generated transcriptome profile was free of cDNA artifactual signals. In addition, this method requires fewer processing steps and is potentially more sensitive in detecting small amount of RNA compared to conventional end-labeling methods due to the incorporation of more fluorescent molecules per RNA fragment. PMID:21235785

Comprehensive Genome-Wide Survey, Genomic Constitution and Expression Profiling of the NAC Transcription Factor Family in Foxtail Millet (Setaria italica L.)

PubMed Central

Puranik, Swati; Sahu, Pranav Pankaj; Mandal, Sambhu Nath; B., Venkata Suresh; Parida, Swarup Kumar; Prasad, Manoj

2013-01-01

The NAC proteins represent a major plant-specific transcription factor family that has established enormously diverse roles in various plant processes. Aided by the availability of complete genomes, several members of this family have been identified in Arabidopsis, rice, soybean and poplar. However, no comprehensive investigation has been presented for the recently sequenced, naturally stress tolerant crop, Setaria italica (foxtail millet) that is famed as a model crop for bioenergy research. In this study, we identified 147 putative NAC domain-encoding genes from foxtail millet by systematic sequence analysis and physically mapped them onto nine chromosomes. Genomic organization suggested that inter-chromosomal duplications may have been responsible for expansion of this gene family in foxtail millet. Phylogenetically, they were arranged into 11 distinct sub-families (I-XI), with duplicated genes fitting into one cluster and possessing conserved motif compositions. Comparative mapping with other grass species revealed some orthologous relationships and chromosomal rearrangements including duplication, inversion and deletion of genes. The evolutionary significance as duplication and divergence of NAC genes based on their amino acid substitution rates was understood. Expression profiling against various stresses and phytohormones provides novel insights into specific and/or overlapping expression patterns of SiNAC genes, which may be responsible for functional divergence among individual members in this crop. Further, we performed structure modeling and molecular simulation of a stress-responsive protein, SiNAC128, proffering an initial framework for understanding its molecular function. Taken together, this genome-wide identification and expression profiling unlocks new avenues for systematic functional analysis of novel NAC gene family candidates which may be applied for improvising stress adaption in plants. PMID:23691254
Comprehensive genome-wide survey, genomic constitution and expression profiling of the NAC transcription factor family in foxtail millet (Setaria italica L.).

PubMed

Puranik, Swati; Sahu, Pranav Pankaj; Mandal, Sambhu Nath; B, Venkata Suresh; Parida, Swarup Kumar; Prasad, Manoj

2013-01-01

The NAC proteins represent a major plant-specific transcription factor family that has established enormously diverse roles in various plant processes. Aided by the availability of complete genomes, several members of this family have been identified in Arabidopsis, rice, soybean and poplar. However, no comprehensive investigation has been presented for the recently sequenced, naturally stress tolerant crop, Setaria italica (foxtail millet) that is famed as a model crop for bioenergy research. In this study, we identified 147 putative NAC domain-encoding genes from foxtail millet by systematic sequence analysis and physically mapped them onto nine chromosomes. Genomic organization suggested that inter-chromosomal duplications may have been responsible for expansion of this gene family in foxtail millet. Phylogenetically, they were arranged into 11 distinct sub-families (I-XI), with duplicated genes fitting into one cluster and possessing conserved motif compositions. Comparative mapping with other grass species revealed some orthologous relationships and chromosomal rearrangements including duplication, inversion and deletion of genes. The evolutionary significance as duplication and divergence of NAC genes based on their amino acid substitution rates was understood. Expression profiling against various stresses and phytohormones provides novel insights into specific and/or overlapping expression patterns of SiNAC genes, which may be responsible for functional divergence among individual members in this crop. Further, we performed structure modeling and molecular simulation of a stress-responsive protein, SiNAC128, proffering an initial framework for understanding its molecular function. Taken together, this genome-wide identification and expression profiling unlocks new avenues for systematic functional analysis of novel NAC gene family candidates which may be applied for improvising stress adaption in plants.
The full transcription map of mouse papillomavirus type 1 (MmuPV1) in mouse wart tissues

PubMed Central

Kim, Bong-Hyun; Gotte, Deanna; Chen, Xiongfong; Cam, Maggie; Lambert, Paul F.

2017-01-01

Mouse papillomavirus type 1 (MmuPV1) provides, for the first time, the opportunity to study infection and pathogenesis of papillomaviruses in the context of laboratory mice. In this report, we define the transcriptome of MmuPV1 genome present in papillomas arising in experimentally infected mice using a combination of RNA-seq, PacBio Iso-seq, 5’ RACE, 3’ RACE, primer-walking RT-PCR, RNase protection, Northern blot and in situ hybridization analyses. We demonstrate that the MmuPV1 genome is transcribed unidirectionally from five major promoters (P) or transcription start sites (TSS) and polyadenylates its transcripts at two major polyadenylation (pA) sites. We designate the P7503, P360 and P859 as “early” promoters because they give rise to transcripts mostly utilizing the polyadenylation signal at nt 3844 and therefore can only encode early genes, and P7107 and P533 as “late” promoters because they give rise to transcripts utilizing polyadenylation signals at either nt 3844 or nt 7047, the latter being able to encode late, capsid proteins. MmuPV1 genome contains five splice donor sites and three acceptor sites that produce thirty-six RNA isoforms deduced to express seven predicted early gene products (E6, E7, E1, E1^M1, E1^M2, E2 and E8^E2) and three predicted late gene products (E1^E4, L2 and L1). The majority of the viral early transcripts are spliced once from nt 757 to 3139, while viral late transcripts, which are predicted to encode L1, are spliced twice, first from nt 7243 to either nt 3139 (P7107) or nt 757 to 3139 (P533) and second from nt 3431 to nt 5372. Thirteen of these viral transcripts were detectable by Northern blot analysis, with the P533-derived late E1^E4 transcripts being the most abundant. The late transcripts could be detected in highly differentiated keratinocytes of MmuPV1-infected tissues as early as ten days after MmuPV1 inoculation and correlated with detection of L1 protein and viral DNA amplification. 
In mature warts, detection of L1 was also found in more poorly differentiated cells, as previously reported. Subclinical infections were also observed. The comprehensive transcription map of MmuPV1 generated in this study provides further evidence that MmuPV1 is similar to high-risk cutaneous beta human papillomaviruses. The knowledge revealed will facilitate the use of MmuPV1 as an animal virus model for understanding of human papillomavirus gene expression, pathogenesis and immunology. PMID:29176795
Global analysis of WRKY transcription factor superfamily in Setaria identifies potential candidates involved in abiotic stress signaling

PubMed Central

Muthamilarasan, Mehanathan; Bonthala, Venkata S.; Khandelwal, Rohit; Jaishankar, Jananee; Shweta, Shweta; Nawaz, Kashif; Prasad, Manoj

2015-01-01

Transcription factors (TFs) are major players in stress signaling and constitute an integral part of signaling networks. Among the major TFs, WRKY proteins play pivotal roles in regulation of transcriptional reprogramming associated with stress responses. In view of this, genome- and transcriptome-wide identification of WRKY TF family was performed in the C4 model plants, Setaria italica (SiWRKY) and S. viridis (SvWRKY), respectively. The study identified 105 SiWRKY and 44 SvWRKY proteins that were computationally analyzed for their physicochemical properties. Sequence alignment and phylogenetic analysis classified these proteins into three major groups, namely I, II, and III with majority of WRKY proteins belonging to group II (53 SiWRKY and 23 SvWRKY), followed by group III (39 SiWRKY and 11 SvWRKY) and group I (10 SiWRKY and 6 SvWRKY). Group II proteins were further classified into 5 subgroups (IIa to IIe) based on their phylogeny. Domain analysis showed the presence of WRKY motif and zinc finger-like structures in these proteins along with additional domains in a few proteins. All SiWRKY genes were physically mapped on the S. italica genome and their duplication analysis revealed that 10 and 8 gene pairs underwent tandem and segmental duplications, respectively. Comparative mapping of SiWRKY and SvWRKY genes in related C4 panicoid genomes demonstrated the orthologous relationships between these genomes. In silico expression analysis of SiWRKY and SvWRKY genes showed their differential expression patterns in different tissues and stress conditions. Expression profiling of candidate SiWRKY genes in response to stress (dehydration and salinity) and hormone treatments (abscisic acid, salicylic acid, and methyl jasmonate) suggested the putative involvement of SiWRKY066 and SiWRKY082 in stress and hormone signaling. These genes could be potential candidates for further characterization to delineate their functional roles in abiotic stress signaling. PMID:26635818
Global analysis of WRKY transcription factor superfamily in Setaria identifies potential candidates involved in abiotic stress signaling.

PubMed

Muthamilarasan, Mehanathan; Bonthala, Venkata S; Khandelwal, Rohit; Jaishankar, Jananee; Shweta, Shweta; Nawaz, Kashif; Prasad, Manoj

2015-01-01

Transcription factors (TFs) are major players in stress signaling and constitute an integral part of signaling networks. Among the major TFs, WRKY proteins play pivotal roles in regulation of transcriptional reprogramming associated with stress responses. In view of this, genome- and transcriptome-wide identification of WRKY TF family was performed in the C4 model plants, Setaria italica (SiWRKY) and S. viridis (SvWRKY), respectively. The study identified 105 SiWRKY and 44 SvWRKY proteins that were computationally analyzed for their physicochemical properties. Sequence alignment and phylogenetic analysis classified these proteins into three major groups, namely I, II, and III with majority of WRKY proteins belonging to group II (53 SiWRKY and 23 SvWRKY), followed by group III (39 SiWRKY and 11 SvWRKY) and group I (10 SiWRKY and 6 SvWRKY). Group II proteins were further classified into 5 subgroups (IIa to IIe) based on their phylogeny. Domain analysis showed the presence of WRKY motif and zinc finger-like structures in these proteins along with additional domains in a few proteins. All SiWRKY genes were physically mapped on the S. italica genome and their duplication analysis revealed that 10 and 8 gene pairs underwent tandem and segmental duplications, respectively. Comparative mapping of SiWRKY and SvWRKY genes in related C4 panicoid genomes demonstrated the orthologous relationships between these genomes. In silico expression analysis of SiWRKY and SvWRKY genes showed their differential expression patterns in different tissues and stress conditions. Expression profiling of candidate SiWRKY genes in response to stress (dehydration and salinity) and hormone treatments (abscisic acid, salicylic acid, and methyl jasmonate) suggested the putative involvement of SiWRKY066 and SiWRKY082 in stress and hormone signaling. These genes could be potential candidates for further characterization to delineate their functional roles in abiotic stress signaling.
Recovering complete mitochondrial genome sequences from RNA-Seq: A case study of Polytomella non-photosynthetic green algae.

PubMed

Tian, Yao; Smith, David Roy

2016-05-01

Thousands of mitochondrial genomes have been sequenced, but there are comparatively few available mitochondrial transcriptomes. This might soon be changing. High-throughput RNA sequencing (RNA-Seq) techniques have made it fast and cheap to generate massive amounts of mitochondrial transcriptomic data. Here, we explore the utility of RNA-Seq for assembling mitochondrial genomes and studying their expression patterns. Specifically, we investigate the mitochondrial transcriptomes from Polytomella non-photosynthetic green algae, which have among the smallest, most reduced mitochondrial genomes from the Archaeplastida as well as fragmented rRNA-coding regions, palindromic genes, and linear chromosomes with telomeres. Isolation of whole genomic RNA from the four known Polytomella species followed by Illumina paired-end sequencing generated enough mitochondrial-derived reads to easily recover almost-entire mitochondrial genome sequences. Read-mapping and coverage statistics also gave insights into Polytomella mitochondrial transcriptional architecture, revealing polycistronic transcripts and the expression of telomeres and palindromic genes. Ultimately, RNA-Seq is a promising, cost-effective technique for studying mitochondrial genetics, but it does have drawbacks, which are discussed. One of its greatest potentials, as shown here, is that it can be used to generate near-complete mitochondrial genome sequences, which could be particularly useful in situations where there is a lack of available mtDNA data. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
N6-Methyladenosine in Flaviviridae Viral RNA Genomes Regulates Infection.

PubMed

Gokhale, Nandan S; McIntyre, Alexa B R; McFadden, Michael J; Roder, Allison E; Kennedy, Edward M; Gandara, Jorge A; Hopcraft, Sharon E; Quicke, Kendra M; Vazquez, Christine; Willer, Jason; Ilkayeva, Olga R; Law, Brittany A; Holley, Christopher L; Garcia-Blanco, Mariano A; Evans, Matthew J; Suthar, Mehul S; Bradrick, Shelton S; Mason, Christopher E; Horner, Stacy M

2016-11-09

The RNA modification N6-methyladenosine (m6A) post-transcriptionally regulates RNA function. The cellular machinery that controls m6A includes methyltransferases and demethylases that add or remove this modification, as well as m6A-binding YTHDF proteins that promote the translation or degradation of m6A-modified mRNA. We demonstrate that m6A modulates infection by hepatitis C virus (HCV). Depletion of m6A methyltransferases or an m6A demethylase, respectively, increases or decreases infectious HCV particle production. During HCV infection, YTHDF proteins relocalize to lipid droplets, sites of viral assembly, and their depletion increases infectious viral particles. We further mapped m6A sites across the HCV genome and determined that inactivating m6A in one viral genomic region increases viral titer without affecting RNA replication. Additional mapping of m6A on the RNA genomes of other Flaviviridae, including dengue, Zika, yellow fever, and West Nile virus, identifies conserved regions modified by m6A. Altogether, this work identifies m6A as a conserved regulatory mark across Flaviviridae genomes. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Molecular mapping of QTLs for plant type and earliness traits in pigeonpea (Cajanus cajan L. Millsp.).

PubMed

Kumawat, Giriraj; Raje, Ranjeet S; Bhutani, Shefali; Pal, Jitendra K; Mithra, Amitha S V C R; Gaikwad, Kishor; Sharma, Tilak R; Singh, Nagendra K

2012-10-08

Pigeonpea is an important grain legume of the semi-arid tropics and sub-tropical regions where it plays a crucial role in the food and nutritional security of the people. The average productivity of pigeonpea has remained very low and stagnant for over five decades due to lack of genomic information and intensive breeding efforts. Previous SSR-based linkage maps of pigeonpea used inter-specific crosses due to low inter-varietal polymorphism. Here our aim was to construct a high density intra-specific linkage map using genic-SNP markers for mapping of major quantitative trait loci (QTLs) for key agronomic traits, including plant height, number of primary and secondary branches, number of pods, days to flowering and days to maturity in pigeonpea. A population of 186 F2:3 lines derived from an intra-specific cross between inbred lines 'Pusa Dwarf' and 'HDM04-1' was used to construct a dense molecular linkage map of 296 genic SNP and SSR markers covering a total adjusted map length of 1520.22 cM for the 11 chromosomes of the pigeonpea genome. This is the first dense intra-specific linkage map of pigeonpea with the highest genome length coverage. Phenotypic data from the F2:3 families were used to identify thirteen QTLs for the six agronomic traits. The proportion of phenotypic variance explained by the individual QTLs ranged from 3.18% to 51.4%. Ten of these QTLs were clustered in just two genomic regions, indicating pleiotropic effects or close genetic linkage. In addition to the main effects, significant epistatic interaction effects were detected between the QTLs for number of pods per plant. A large amount of information on transcript sequences, SSR markers and draft genome sequence is now available for pigeonpea. However, there is need to develop high density linkage maps and identify genes/QTLs for important agronomic traits for practical breeding applications. This is the first report on identification of QTLs for plant type and maturity traits in pigeonpea. 
The QTLs identified in this study provide a strong foundation for further validation and fine mapping for utilization in the pigeonpea improvement.
Comparative Genomics and Host Resistance against Infectious Diseases

PubMed Central

Qureshi, Salman T.; Skamene, Emil

1999-01-01

The large size and complexity of the human genome have limited the identification and functional characterization of components of the innate immune system that play a critical role in front-line defense against invading microorganisms. However, advances in genome analysis (including the development of comprehensive sets of informative genetic markers, improved physical mapping methods, and novel techniques for transcript identification) have reduced the obstacles to discovery of novel host resistance genes. Study of the genomic organization and content of widely divergent vertebrate species has shown a remarkable degree of evolutionary conservation and enables meaningful cross-species comparison and analysis of newly discovered genes. Application of comparative genomics to host resistance will rapidly expand our understanding of human immune defense by facilitating the translation of knowledge acquired through the study of model organisms. We review the rationale and resources for comparative genomic analysis and describe three examples of host resistance genes successfully identified by this approach. PMID:10081670
A comprehensive transcriptome assembly of Pigeonpea (Cajanus cajan L.) using sanger and second-generation sequencing platforms.

PubMed

Kudapa, Himabindu; Bharti, Arvind K; Cannon, Steven B; Farmer, Andrew D; Mulaosmanovic, Benjamin; Kramer, Robin; Bohra, Abhishek; Weeks, Nathan T; Crow, John A; Tuteja, Reetu; Shah, Trushar; Dutta, Sutapa; Gupta, Deepak K; Singh, Archana; Gaikwad, Kishor; Sharma, Tilak R; May, Gregory D; Singh, Nagendra K; Varshney, Rajeev K

2012-09-01

A comprehensive transcriptome assembly for pigeonpea has been developed by analyzing 128.9 million short Illumina GA IIx single end reads, 2.19 million single end FLX/454 reads, and 18 353 Sanger expressed sequence tags from more than 16 genotypes. The resultant transcriptome assembly, referred to as CcTA v2, comprised 21 434 transcript assembly contigs (TACs) with an N50 of 1510 bp, the largest one being ~8 kb. Of the 21 434 TACs, 16 622 (77.5%) could be mapped on to the soybean genome build 1.0.9 under fairly stringent alignment parameters. Based on knowledge of intron junctions, 10 009 primer pairs were designed from 5033 TACs for amplifying intron spanning regions (ISRs). By using in silico mapping of BAC-end-derived SSR loci of pigeonpea on the soybean genome as a reference, putative mapping positions at the chromosome level were predicted for 6284 ISR markers, covering all 11 pigeonpea chromosomes. A subset of 128 ISR markers were analyzed on a set of eight genotypes. While 116 markers were validated, 70 markers showed one to three alleles, with an average of 0.16 polymorphism information content (PIC) value. In summary, the CcTA v2 transcript assembly and ISR markers will serve as a useful resource to accelerate genetic research and breeding applications in pigeonpea.
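For readers unfamiliar with the N50 statistic quoted in the record above, the following is a minimal sketch of how it is commonly computed. It uses the standard definition (the length of the contig at which the cumulative length, counted from the longest contig down, first reaches half of the total assembly length). The function name `n50` and the toy lengths are illustrative assumptions, not data or code from the CcTA v2 study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (standard definition)."""
    ordered = sorted(lengths, reverse=True)  # longest contig first
    half_total = sum(ordered) / 2            # half of the total assembly length
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to at least half
        # of the total length defines the N50.
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical toy assembly, not the pigeonpea data.
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))
```

An N50 of 1510 bp therefore means that contigs of at least 1510 bp account for at least half of the assembled bases; it says nothing about the number of contigs on its own.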
Putative bovine topological association domains and CTCF binding motifs can reduce the search space for causative regulatory variants of complex traits.

PubMed

Wang, Min; Hancock, Timothy P; Chamberlain, Amanda J; Vander Jagt, Christy J; Pryce, Jennie E; Cocks, Benjamin G; Goddard, Mike E; Hayes, Benjamin J

2018-05-24

Topological association domains (TADs) are chromosomal domains characterised by frequent internal DNA-DNA interactions. The transcription factor CTCF binds to conserved DNA sequence patterns called CTCF binding motifs to either prohibit or facilitate chromosomal interactions. TADs and CTCF binding motifs control gene expression, but they are not yet well defined in the bovine genome. In this paper, we sought to improve the annotation of bovine TADs and CTCF binding motifs, and assess whether the new annotation can reduce the search space for cis-regulatory variants. We used genomic synteny to map TADs and CTCF binding motifs from humans, mice, dogs and macaques to the bovine genome. We found that our mapped TADs exhibited the same hallmark properties of those sourced from experimental data, such as housekeeping genes, transfer RNA genes, CTCF binding motifs, short interspersed elements, H3K4me3 and H3K27ac. We showed that runs of genes with the same pattern of allele-specific expression (ASE) (either favouring paternal or maternal allele) were often located in the same TAD or between the same conserved CTCF binding motifs. Analyses of variance showed that when averaged across all bovine tissues tested, TADs explained 14% of ASE variation (standard deviation, SD: 0.056), while CTCF explained 27% (SD: 0.078). Furthermore, we showed that the quantitative trait loci (QTLs) associated with gene expression variation (eQTLs) or ASE variation (aseQTLs), which were identified from mRNA transcripts from 141 lactating cows' white blood and milk cells, were highly enriched at putative bovine CTCF binding motifs. The linearly-furthermost, and most-significant aseQTL and eQTL for each genic target were located within the same TAD as the gene more often than expected (Chi-Squared test P-value < 0.001). 
Our results suggest that genomic synteny can be used to functionally annotate conserved transcriptional components, and provides a tool to reduce the search space for causative regulatory variants in the bovine genome.
Faithful transcription initiation from a mitochondrial promoter in transgenic plastids

PubMed Central

Bohne, Alexandra-Viola; Ruf, Stephanie; Börner, Thomas; Bock, Ralph

2007-01-01

The transcriptional machineries of plastids and mitochondria in higher plants exhibit striking similarities. All mitochondrial genes and part of the plastid genes are transcribed by related phage-type RNA polymerases. Furthermore, the majority of mitochondrial promoters and a subset of plastid promoters show a similar structural organization. We show here that the plant mitochondrial atpA promoter is recognized by plastid RNA polymerases in vitro and in vivo. The Arabidopsis phage-type RNA polymerase RpoTp, an enzyme localized exclusively to plastids, was found to recognize the mitochondrial atpA promoter in in vitro assays suggesting the possibility that mitochondrial promoters might function as well in plastids. We have, therefore, generated transplastomic tobacco plants harboring in their chloroplast genome the atpA promoter fused to the coding region of the bacterial nptII gene. The chimeric nptII gene was found to be efficiently transcribed in chloroplasts. Mapping of the 5′ ends of the nptII transcripts revealed accurate recognition of the atpA promoter by the chloroplast transcription machinery. We show further that the 5′ untranslated region (UTR) of the mitochondrial atpA transcript is capable of mediating translation in chloroplasts. The functional and evolutionary implications of these findings as well as possible applications in chloroplast genome engineering are discussed. PMID:17959651
Characteristics and significance of intergenic polyadenylated RNA transcription in Arabidopsis.

PubMed

Moghe, Gaurav D; Lehti-Shiu, Melissa D; Seddon, Alex E; Yin, Shan; Chen, Yani; Juntawong, Piyada; Brandizzi, Federica; Bailey-Serres, Julia; Shiu, Shin-Han

2013-01-01

The Arabidopsis (Arabidopsis thaliana) genome is the most well-annotated plant genome. However, transcriptome sequencing in Arabidopsis continues to suggest the presence of polyadenylated (polyA) transcripts originating from presumed intergenic regions. It is not clear whether these transcripts represent novel noncoding or protein-coding genes. To understand the nature of intergenic polyA transcription, we first assessed its abundance using multiple messenger RNA sequencing data sets. We found 6,545 intergenic transcribed fragments (ITFs) occupying 3.6% of Arabidopsis intergenic space. In contrast to transcribed fragments that map to protein-coding and RNA genes, most ITFs are significantly shorter, are expressed at significantly lower levels, and tend to be more data set specific. A surprisingly large number of ITFs (32.1%) may be protein coding based on evidence of translation. However, our results indicate that these "translated" ITFs tend to be close to and are likely associated with known genes. To investigate if ITFs are under selection and are functional, we assessed ITF conservation through cross-species as well as within-species comparisons. Our analysis reveals that 237 ITFs, including 49 with translation evidence, are under strong selective constraint and relatively distant from annotated features. These ITFs are likely parts of novel genes. However, the selective pressure imposed on most ITFs is similar to that of randomly selected, untranscribed intergenic sequences. Our findings indicate that despite the prevalence of ITFs, apart from the possibility of genomic contamination, many may be background or noisy transcripts derived from "junk" DNA, whose production may be inherent to the process of transcription and which, on rare occasions, may act as catalysts for the creation of novel genes.
Identification of Regulatory DNA Elements Using Genome-wide Mapping of DNase I Hypersensitive Sites during Tomato Fruit Development.

PubMed

Qiu, Zhengkun; Li, Ren; Zhang, Shuaibin; Wang, Ketao; Xu, Meng; Li, Jiayang; Du, Yongchen; Yu, Hong; Cui, Xia

2016-08-01

Development and ripening of tomato fruit are precisely controlled by transcriptional regulation, which depends on the orchestrated accessibility of regulatory proteins to promoters and other cis-regulatory DNA elements. This accessibility and its effect on gene expression play a major role in defining the developmental process. To understand the regulatory mechanism and functional elements modulating morphological and anatomical changes during fruit development, we generated genome-wide high-resolution maps of DNase I hypersensitive sites (DHSs) from the fruit tissues of the tomato cultivar "Moneymaker" at 20 days post anthesis as well as break stage. By exploring variation of DHSs across fruit development stages, we pinpointed the most likely hypersensitive sites related to development-specific genes. By detecting binding motifs on DHSs of these development-specific genes or genes in the ascorbic acid biosynthetic pathway, we revealed the common regulatory elements contributing to coordinating gene transcription of plant ripening and specialized metabolic pathways. Our results contribute to a better understanding of the regulatory dynamics of genes involved in tomato fruit development and ripening. Copyright © 2016 The Author. Published by Elsevier Inc. All rights reserved.
The Genetic and Molecular Organization of the Dopa Decarboxylase Gene Cluster of Drosophila Melanogaster

PubMed Central

Stathakis, D. G.; Pentz, E. S.; Freeman, M. E.; Kullman, J.; Hankins, G. R.; Pearlson, N. J.; Wright, T. R. F.

1995-01-01

We report the complete molecular organization of the Dopa decarboxylase gene cluster. Mutagenesis screens recovered 77 new Df(2L)TW130 recessive lethal mutations. These new alleles combined with 263 previously isolated mutations in the cluster to define 18 essential genes. In addition, seven new deficiencies were isolated and characterized. Deficiency mapping, restriction fragment length polymorphism (RFLP) analysis and P-element-mediated germline transformation experiments determined the gene order for all 18 loci. Genomic and cDNA restriction endonuclease mapping, Northern blot analysis and DNA sequencing provided information on exact gene location, mRNA size and transcriptional direction for most of these loci. In addition, this analysis identified two transcription units that had not previously been identified by extensive mutagenesis screening. Most of the loci are contained within two dense subclusters. We discuss the effectiveness of mutagens and strategies used in our screens, the variable mutability of loci within the genome of Drosophila melanogaster, the cytological and molecular organization of the Ddc gene cluster, the validity of the one band-one gene hypothesis and a possible purpose for the clustering of genes in the Ddc region. PMID:8647399
Discovery of stimulation-responsive immune enhancers with CRISPR activation

PubMed Central

Simeonov, Dimitre R.; Gowen, Benjamin G.; Boontanrart, Mandy; Roth, Theodore L.; Gagnon, John D.; Mumbach, Maxwell R.; Satpathy, Ansuman T.; Lee, Youjin; Bray, Nicolas L.; Chan, Alice Y.; Lituiev, Dmytro S.; Nguyen, Michelle L.; Gate, Rachel E.; Subramaniam, Meena; Li, Zhongmei; Woo, Jonathan M.; Mitros, Therese; Ray, Graham J.; Curie, Gemma L.; Naddaf, Nicki; Chu, Julia S.; Ma, Hong; Boyer, Eric; Van Gool, Frederic; Huang, Hailiang; Liu, Ruize; Tobin, Victoria R.; Schumann, Kathrin; Daly, Mark J.; Farh, Kyle K; Ansel, K. Mark; Ye, Chun J.; Greenleaf, William J.; Anderson, Mark S.; Bluestone, Jeffrey A.; Chang, Howard Y.; Corn, Jacob E.; Marson, Alexander

2017-01-01

The majority of genetic variants associated with common human diseases map to enhancers, non-coding elements that shape cell-type-specific transcriptional programs and responses to extracellular cues. Systematic mapping of functional enhancers and their biological contexts is required to understand the mechanisms by which variation in non-coding genetic sequences contributes to disease. Functional enhancers can be mapped by genomic sequence disruption, but this approach is limited to the subset of enhancers that are necessary in the particular cellular context being studied. We hypothesized that recruitment of a strong transcriptional activator to an enhancer would be sufficient to drive target gene expression, even if that enhancer was not currently active in the assayed cells. Here we describe a discovery platform that can identify stimulus-responsive enhancers for a target gene independent of stimulus exposure. We used tiled CRISPR activation (CRISPRa) to synthetically recruit a transcriptional activator to sites across large genomic regions (more than 100 kilobases) surrounding two key autoimmunity risk loci, CD69 and IL2RA. We identified several CRISPRa-responsive elements with chromatin features of stimulus-responsive enhancers, including an IL2RA enhancer that harbours an autoimmunity risk variant. Using engineered mouse models, we found that sequence perturbation of the disease-associated Il2ra enhancer did not entirely block Il2ra expression, but rather delayed the timing of gene activation in response to specific extracellular signals. Enhancer deletion skewed polarization of naive T cells towards a pro-inflammatory T helper (TH17) cell state and away from a regulatory T cell state. This integrated approach identifies functional enhancers and reveals how non-coding variation associated with human immune dysfunction alters context-specific gene programs. PMID:28854172
Discovery of stimulation-responsive immune enhancers with CRISPR activation.

PubMed

Simeonov, Dimitre R; Gowen, Benjamin G; Boontanrart, Mandy; Roth, Theodore L; Gagnon, John D; Mumbach, Maxwell R; Satpathy, Ansuman T; Lee, Youjin; Bray, Nicolas L; Chan, Alice Y; Lituiev, Dmytro S; Nguyen, Michelle L; Gate, Rachel E; Subramaniam, Meena; Li, Zhongmei; Woo, Jonathan M; Mitros, Therese; Ray, Graham J; Curie, Gemma L; Naddaf, Nicki; Chu, Julia S; Ma, Hong; Boyer, Eric; Van Gool, Frederic; Huang, Hailiang; Liu, Ruize; Tobin, Victoria R; Schumann, Kathrin; Daly, Mark J; Farh, Kyle K; Ansel, K Mark; Ye, Chun J; Greenleaf, William J; Anderson, Mark S; Bluestone, Jeffrey A; Chang, Howard Y; Corn, Jacob E; Marson, Alexander

2017-09-07

The majority of genetic variants associated with common human diseases map to enhancers, non-coding elements that shape cell-type-specific transcriptional programs and responses to extracellular cues. Systematic mapping of functional enhancers and their biological contexts is required to understand the mechanisms by which variation in non-coding genetic sequences contributes to disease. Functional enhancers can be mapped by genomic sequence disruption, but this approach is limited to the subset of enhancers that are necessary in the particular cellular context being studied. We hypothesized that recruitment of a strong transcriptional activator to an enhancer would be sufficient to drive target gene expression, even if that enhancer was not currently active in the assayed cells. Here we describe a discovery platform that can identify stimulus-responsive enhancers for a target gene independent of stimulus exposure. We used tiled CRISPR activation (CRISPRa) to synthetically recruit a transcriptional activator to sites across large genomic regions (more than 100 kilobases) surrounding two key autoimmunity risk loci, CD69 and IL2RA. We identified several CRISPRa-responsive elements with chromatin features of stimulus-responsive enhancers, including an IL2RA enhancer that harbours an autoimmunity risk variant. Using engineered mouse models, we found that sequence perturbation of the disease-associated Il2ra enhancer did not entirely block Il2ra expression, but rather delayed the timing of gene activation in response to specific extracellular signals. Enhancer deletion skewed polarization of naive T cells towards a pro-inflammatory T helper (TH17) cell state and away from a regulatory T cell state. This integrated approach identifies functional enhancers and reveals how non-coding variation associated with human immune dysfunction alters context-specific gene programs.
Discovery of stimulation-responsive immune enhancers with CRISPR activation

NASA Astrophysics Data System (ADS)

Simeonov, Dimitre R.; Gowen, Benjamin G.; Boontanrart, Mandy; Roth, Theodore L.; Gagnon, John D.; Mumbach, Maxwell R.; Satpathy, Ansuman T.; Lee, Youjin; Bray, Nicolas L.; Chan, Alice Y.; Lituiev, Dmytro S.; Nguyen, Michelle L.; Gate, Rachel E.; Subramaniam, Meena; Li, Zhongmei; Woo, Jonathan M.; Mitros, Therese; Ray, Graham J.; Curie, Gemma L.; Naddaf, Nicki; Chu, Julia S.; Ma, Hong; Boyer, Eric; van Gool, Frederic; Huang, Hailiang; Liu, Ruize; Tobin, Victoria R.; Schumann, Kathrin; Daly, Mark J.; Farh, Kyle K.; Ansel, K. Mark; Ye, Chun J.; Greenleaf, William J.; Anderson, Mark S.; Bluestone, Jeffrey A.; Chang, Howard Y.; Corn, Jacob E.; Marson, Alexander

2017-09-01

The majority of genetic variants associated with common human diseases map to enhancers, non-coding elements that shape cell-type-specific transcriptional programs and responses to extracellular cues. Systematic mapping of functional enhancers and their biological contexts is required to understand the mechanisms by which variation in non-coding genetic sequences contributes to disease. Functional enhancers can be mapped by genomic sequence disruption, but this approach is limited to the subset of enhancers that are necessary in the particular cellular context being studied. We hypothesized that recruitment of a strong transcriptional activator to an enhancer would be sufficient to drive target gene expression, even if that enhancer was not currently active in the assayed cells. Here we describe a discovery platform that can identify stimulus-responsive enhancers for a target gene independent of stimulus exposure. We used tiled CRISPR activation (CRISPRa) to synthetically recruit a transcriptional activator to sites across large genomic regions (more than 100 kilobases) surrounding two key autoimmunity risk loci, CD69 and IL2RA. We identified several CRISPRa-responsive elements with chromatin features of stimulus-responsive enhancers, including an IL2RA enhancer that harbours an autoimmunity risk variant. Using engineered mouse models, we found that sequence perturbation of the disease-associated Il2ra enhancer did not entirely block Il2ra expression, but rather delayed the timing of gene activation in response to specific extracellular signals. Enhancer deletion skewed polarization of naive T cells towards a pro-inflammatory T helper (TH17) cell state and away from a regulatory T cell state. This integrated approach identifies functional enhancers and reveals how non-coding variation associated with human immune dysfunction alters context-specific gene programs.
Transcription mapping and expression patterns of genes in the major immediate-early region of Kaposi's sarcoma-associated herpesvirus.

PubMed

Saveliev, Alexei; Zhu, Fan; Yuan, Yan

2002-08-01

Viral immediate-early (IE) genes are the first class of viral genes expressed during primary infection or reactivation from latency. They usually encode regulatory proteins that play crucial roles in the viral life cycle. In a previous study, four regions in the KSHV genome were found to be actively transcribed in the immediate-early stage of viral reactivation in primary effusion lymphoma cells. Three immediate-early transcripts were characterized in these regions, as follows: mRNAs for ORF50 (KIE-1), ORF-45 (KIE-2), and ORF K4.2 (KIE-3) (F. X. Zhu, T. Cusano, and Y. Yuan, 1999, J. Virol. 73, 5556-5567). In the present study, we further analyzed the expression of genes in these IE regions in BC-1 and BCBL-1 cells. One of the immediate-early regions (KIE-1), which encompasses ORF50 and other genes, was intensively studied to establish a detailed transcription map and expression patterns of genes in this region. This study led to the identification of several novel IE transcripts in this region. They include a 2.6-kb mRNA which encodes ORF48/ORF29b, a family of transcripts that are complementary to ORF50 mRNA, and a novel K8 IE mRNA of 1.5 kb. Together with the IE mRNA for ORF50, which was identified previously, four immediate-early genes have been mapped to the KIE-1 region. Therefore, we would designate KIE-1 the major immediate-early region of KSHV. In addition, we showed that transcription of the K8 gene is controlled by two promoters, yielding two transcripts, an immediate-early mRNA of 1.5 kb and a delayed-early mRNA of 1.3 kb.
DNA Double-Strand Breaks Coupled with PARP1 and HNRNPA2B1 Binding Sites Flank Coordinately Expressed Domains in Human Chromosomes

PubMed Central

Fedoseeva, Daria M.; Sosin, Dmitri V.; Grachev, Sergei A.; Serebraykova, Marina V.; Romanenko, Svetlana A.; Vorobieva, Nadezhda V.; Kravatsky, Yuri V.

2013-01-01

Genome instability plays a key role in multiple biological processes and diseases, including cancer. Genome-wide mapping of DNA double-strand breaks (DSBs) is important for understanding both chromosomal architecture and specific chromosomal regions at DSBs. We developed a method for precise genome-wide mapping of blunt-ended DSBs in human chromosomes, and observed non-random fragmentation and DSB hot spots. These hot spots are scattered along chromosomes and delimit protected 50–250 kb DNA domains. We found that about 30% of the domains (denoted forum domains) possess coordinately expressed genes and that PARP1 and HNRNPA2B1 specifically bind DNA sequences at the forum domain termini. Thus, our data suggest a novel type of gene regulation: a coordinated transcription or silencing of gene clusters delimited by DSB hot spots as well as PARP1 and HNRNPA2B1 binding sites. PMID:23593027

Genomic structure and chromosomal mapping of the human CD22 gene

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wilson, G.L.; Kozlow, E.; Kehrl, J.H.

1993-06-01

The human CD22 gene is expressed specifically in B lymphocytes and likely has an important function in cell-cell interactions. A nearly full length human CD22 cDNA clone was used to isolate genomic clones that span the CD22 gene. The CD22 gene is spread over 22 kb of DNA and is composed of 15 exons. The first exon contains the major transcriptional start sites. The translation initiation codon is located in exon 3, which also encodes a portion of the signal peptide. Exons 4 to 10 encode the seven Ig domains of CD22, exon 11 encodes the transmembrane domain, exons 12 to 15 encode the intracytoplasmic domain of CD22, and exon 15 also contains the 3' untranslated region. A minor form of CD22 mRNA likely results from splicing of exon 5 to exon 8, skipping exons 6 and 7. A 4.6-kb XbaI fragment of the CD22 gene was used to map the chromosomal location of CD22 by fluorescence in situ hybridization. The hybridization locus was identified by combining fluorescent images of the probe with the chromosomal banding pattern generated by an Alu probe. The results demonstrate that CD22 is located within the band region q13.1 of chromosome 19. Two closely clustered major transcription start sites and several minor start sites were mapped by primer extension. Similarly to many other lymphoid-specific genes, the CD22 promoter lacks an obvious TATA box. Approximately 4 kb of DNA 5' of the transcription start sites were sequenced and found to contain multiple Alu elements. Potential binding sites for the transcription factors NF-kB, AP-1, and Oct-2 are located within 300 bp 5' of the major transcription start sites. A 400-bp fragment (bp -339 through +71) of the CD22 promoter region was subcloned into a pGEM-chloramphenicol acetyltransferase vector and after transfection into B and T cells was found to be active in both B and T cells. 45 refs., 7 figs., 2 tabs.
GAN: a platform of genomics and genetics analysis and application in Nicotiana

PubMed Central

Yang, Shuai; Zhang, Xingwei; Li, Huayang; Chen, Yudong

2018-01-01

Abstract Nicotiana is an important Solanaceae genus, and plays a significant role in modern biological research. Massive Nicotiana biological data have emerged from in-depth genomics and genetics studies. From big data to big discovery, large-scale analysis and application with new platforms is critical. Based on data accumulation, a comprehensive platform of Genomics and Genetics Analysis and Application in Nicotiana (GAN) has been developed, and is publicly available at http://biodb.sdau.edu.cn/gan/. GAN consists of four main sections: (i) Sources, a total of 5267 germplasm lines, along with detailed descriptions of associated characteristics, are all available on the Germplasm page, which can be queried using eight different inquiry modes. Seven fully sequenced species with accompanying sequences and detailed genomic annotation are available on the Genomics page. (ii) Genetics, detailed descriptions of 10 genetic linkage maps, constructed by different parents, 2239 KEGG metabolic pathway maps and 209 945 gene families across all catalogued genes, along with two co-linearity maps combining N. tabacum with available tomato and potato linkage maps are available here. Furthermore, 3 963 119 genome-SSRs, 10 621 016 SNPs, 12 388 PIPs and 102 895 reverse transcription-polymerase chain reaction primers, are all available to be used and searched on the Markers page. (iii) Tools, the genome browser JBrowse and five useful online bioinformatics softwares, Blast, Primer3, SSR-detect, Nucl-Protein and E-PCR, are provided on the JBrowse and Tools pages. (iv) Auxiliary, all the datasets are shown on a Statistics page, and are available for download on a Download page. In addition, the user’s manual is provided on a Manual page in English and Chinese languages. GAN provides a user-friendly Web interface for searching, browsing and downloading the genomics and genetics datasets in Nicotiana. 
As far as we can ascertain, GAN is the most comprehensive source of bio-data available, and the most applicable resource for breeding, gene mapping, gene cloning, the study of the origin and evolution of polyploidy, and related studies in Nicotiana. Database URL: http://biodb.sdau.edu.cn/gan/ PMID:29688356
A transcription map of the regions surrounding the CSF1R locus on human chromosome 5q31: Candidate genes for diastrophic dysplasia

DOE Office of Scientific and Technical Information (OSTI.GOV)

Clines, G.; Lovett, M.

1994-09-01

Diastrophic dysplasia (DTD) is an autosomal recessive disorder of unknown pathogenesis that is characterized by abnormal skeletal and cartilage growth. Phenotypic characteristics of the disorder include short stature, scoliosis, and deformation of the first metacarpal. The diastrophic dysplasia gene has been localized to chromosome 5q31-33, within ~60 kb of the colony stimulating factor 1 receptor gene (CSF1R). We have used direct cDNA selection to build a transcription map across ~250 kb surrounding and including the CSF1R locus. cDNA pools from human placenta, activated T cells, cerebellum, HeLa cells, fetal brain, chondrocytes, chondrosarcomas and osteosarcomas were multiplexed in these selections. After two rounds of selection, an analysis revealed that ~70% of the selected cDNAs were contained within the contig. DNA sequencing and cosmid mapping data from a collection of 310 clones revealed the presence of three new genes in this region that show no appreciable homologies on sequence database searches, as well as cDNA clones from the CSF1R and the PDGFRB loci (another of the known genes in the region). An additional cDNA was found with 100% homology to the gene encoding human ribosomal protein L7 (RPL7). This cDNA comprised ~25% of all selected clones. However, further analysis of the genomic contig revealed the presence of an RPL7 processed pseudogene in very close proximity to the CSF1R and PDGFRB genes. The selection of processed pseudogenes is one previously anticipated artifact of selection methodologies, but has not been previously observed. Mutational analysis of the three new genes is underway in diastrophic dysplasia families, as is derivation of full length cDNA clones and the expansion of this detailed transcription map into a larger genomic contig.
StereoGene: rapid estimation of genome-wide correlation of continuous or interval feature data.

PubMed

Stavrovskaya, Elena D; Niranjan, Tejasvi; Fertig, Elana J; Wheelan, Sarah J; Favorov, Alexander V; Mironov, Andrey A

2017-10-15

Genomic features with similar genome-wide distributions are generally hypothesized to be functionally related; for example, colocalization of histones and transcription start sites indicates chromatin regulation of transcription factor activity. Therefore, statistical algorithms to perform spatial, genome-wide correlation among genomic features are required. Here, we propose a method, StereoGene, that rapidly estimates genome-wide correlation among pairs of genomic features. These features may represent high-throughput data mapped to a reference genome or sets of genomic annotations in that reference genome. StereoGene enables correlation of continuous data directly, avoiding data binarization and the subsequent data loss. Correlations are computed among neighboring genomic positions using kernel correlation. Representing the correlation as a function of the genome position, StereoGene outputs the local correlation track as part of the analysis. StereoGene also accounts for confounders such as input DNA by partial correlation. We apply our method to numerous comparisons of ChIP-Seq datasets from the Human Epigenome Atlas and FANTOM CAGE to demonstrate its wide applicability. We observe changes in the correlation between epigenomic features across developmental trajectories of several tissue types consistent with known biology and find a novel spatial correlation of CAGE clusters with donor splice sites and with poly(A) sites. These analyses provide examples for the broad applicability of StereoGene for regulatory genomics. The StereoGene C++ source code, program documentation, Galaxy integration scripts and examples are available from the project homepage http://stereogene.bioinf.fbb.msu.ru/. Contact: favorov@sensi.org. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
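The idea of correlating continuous tracks through a kernel, rather than position by position, can be illustrated with a toy sketch. This is not StereoGene's algorithm or API: the function names, the Gaussian kernel, and the default `sigma` and `radius` values are assumptions chosen for illustration only. Each track is smoothed with the kernel and the smoothed tracks are then compared by Pearson correlation, so that signals at neighboring positions still contribute to the score.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return normalized Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-0.5 * (d / sigma) ** 2) for d in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(track, kernel):
    """Convolve a signal track with the kernel (values outside the track count as 0)."""
    radius = len(kernel) // 2
    n = len(track)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < n:
                acc += w * track[j]
        out.append(acc)
    return out


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant track carries no correlation signal
    return cov / math.sqrt(vx * vy)


def kernel_correlation(track_a, track_b, sigma=2.0, radius=6):
    """Correlate two tracks after Gaussian smoothing (illustrative parameters)."""
    kernel = gaussian_kernel(sigma, radius)
    return pearson(smooth(track_a, kernel), smooth(track_b, kernel))
```

With two tracks that each hold a single peak one position apart, `pearson` on the raw values is slightly negative because the peaks never coincide, whereas `kernel_correlation` is strongly positive because the smoothed peaks overlap.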
Precise Maps of RNA Polymerase Reveal How Promoters Direct Initiation and Pausing

PubMed Central

Kwak, Hojoong; Fuda, Nicholas J.; Core, Leighton J.; Lis, John T.

2014-01-01

Transcription regulation occurs frequently through promoter-associated pausing of RNA polymerase II (Pol II). We developed a Precision nuclear Run-On and sequencing assay (PRO-seq) to map the genome-wide distribution of transcriptionally-engaged Pol II at base-pair resolution. Pol II accumulates immediately downstream of promoters, at intron-exon junctions that are efficiently used for splicing, and over 3' poly-adenylation sites. Focused analyses of promoters reveal that pausing is not fixed relative to initiation sites nor is it specified directly by the position of a particular core promoter element or the first nucleosome. Core promoter elements function beyond initiation, and when optimally positioned they act collectively to dictate the position and strength of pausing. We test this ‘Complex Interaction’ model with insertional mutagenesis of the Drosophila Hsp70 core promoter. PMID:23430654
Genome-Wide Identification and Expression Analysis of WRKY Gene Family in Capsicum annuum L.

PubMed

Diao, Wei-Ping; Snyder, John C; Wang, Shu-Bin; Liu, Jin-Bing; Pan, Bao-Gui; Guo, Guang-Jun; Wei, Ge

2016-01-01

The WRKY family of transcription factors is one of the most important families of plant transcriptional regulators, with members regulating multiple biological processes, especially defense against biotic and abiotic stresses. However, little information is available about WRKYs in pepper (Capsicum annuum L.). The recent release of completely assembled genome sequences of pepper allowed us to perform a genome-wide investigation for pepper WRKY proteins. In the present study, a total of 71 WRKY genes were identified in the pepper genome. According to structural features of their encoded proteins, the pepper WRKY genes (CaWRKY) were classified into three main groups, with the second group further divided into five subgroups. Genome mapping analysis revealed that CaWRKY were enriched on four chromosomes, especially on chromosome 1, and 15.5% of the family members were tandemly duplicated genes. A phylogenetic tree was constructed based on WRKY domain sequences derived from pepper and Arabidopsis. The expression of 21 selected CaWRKY genes in response to seven different biotic and abiotic stresses (salt, heat shock, drought, Phytophthora capsici, SA, MeJA, and ABA) was evaluated by quantitative RT-PCR. Some CaWRKYs were highly expressed and up-regulated by stress treatment. Our results will provide a platform for functional identification and molecular breeding studies of WRKY genes in pepper.
A comprehensive transcript index of the human genome generated using microarrays and computational approaches

PubMed Central

Schadt, Eric E; Edwards, Stephen W; GuhaThakurta, Debraj; Holder, Dan; Ying, Lisa; Svetnik, Vladimir; Leonardson, Amy; Hart, Kyle W; Russell, Archie; Li, Guoya; Cavet, Guy; Castle, John; McDonagh, Paul; Kan, Zhengyan; Chen, Ronghua; Kasarskis, Andrew; Margarint, Mihai; Caceres, Ramon M; Johnson, Jason M; Armour, Christopher D; Garrett-Engele, Philip W; Tsinoremas, Nicholas F; Shoemaker, Daniel D

2004-01-01

Background Computational and microarray-based experimental approaches were used to generate a comprehensive transcript index for the human genome. Oligonucleotide probes designed from approximately 50,000 known and predicted transcript sequences from the human genome were used to survey transcription from a diverse set of 60 tissues and cell lines using ink-jet microarrays. Further, expression activity over at least six conditions was more generally assessed using genomic tiling arrays consisting of probes tiled through a repeat-masked version of the genomic sequence making up chromosomes 20 and 22. Results The combination of microarray data with extensive genome annotations resulted in a set of 28,456 experimentally supported transcripts. This set of high-confidence transcripts represents the first experimentally driven annotation of the human genome. In addition, the results from genomic tiling suggest that a large amount of transcription exists outside of annotated regions of the genome and serves as an example of how this activity could be measured on a genome-wide scale. Conclusions These data represent one of the most comprehensive assessments of transcriptional activity in the human genome and provide an atlas of human gene expression over a unique set of gene predictions. Before the annotation of the human genome is considered complete, however, the previously unannotated transcriptional activity throughout the genome must be fully characterized. PMID:15461792
Poly(A)-tag deep sequencing data processing to extract poly(A) sites.

PubMed

Wu, Xiaohui; Ji, Guoli; Li, Qingshun Quinn

2015-01-01

Polyadenylation [poly(A)] is an essential posttranscriptional processing step in the maturation of eukaryotic mRNA. The advent of next-generation sequencing (NGS) technology has offered feasible means to generate large-scale data and new opportunities for intensive study of polyadenylation, particularly deep sequencing of the transcriptome targeting the junction of the 3'-UTR and the poly(A) tail of the transcript. To take advantage of this unprecedented amount of data, we present an automated workflow to identify polyadenylation sites by integrating NGS data cleaning, processing, mapping, normalizing, and clustering. In this pipeline, a series of Perl scripts are seamlessly integrated to iteratively map the single- or paired-end sequences to the reference genome. After mapping, the poly(A) tags (PATs) at the same genome coordinate are grouped into one cleavage site, and the internal priming artifacts are removed. Then the ambiguous region is introduced to parse the genome annotation for cleavage site clustering. Finally, cleavage sites within a close range of 24 nucleotides and from different samples can be clustered into poly(A) clusters. This procedure could be used to identify thousands of reliable poly(A) clusters from millions of NGS sequences in different tissues or treatments.
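The final clustering step can be sketched as follows. This is a minimal illustration, not the authors' Perl pipeline: it assumes all coordinates come from one chromosome and strand, and it assumes that "within a close range of 24 nucleotides" means each site lies at most 24 nt from the previous site in the cluster. The function name and the `max_gap` parameter are hypothetical.

```python
def cluster_cleavage_sites(positions, max_gap=24):
    """Group cleavage-site coordinates into poly(A) clusters.

    Sites are sorted, then a site joins the current cluster when it lies
    within max_gap nucleotides of the previous site; otherwise it starts
    a new cluster. Assumes one chromosome and one strand.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


sites = [200, 100, 130, 110, 260, 224]
print(cluster_cleavage_sites(sites))
# -> [[100, 110, 130], [200, 224], [260]]
```

A real workflow would run this per chromosome and strand and would usually also carry the PAT count of each site, so that a representative position can be chosen for every cluster.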
ReadXplorer—visualization and analysis of mapped sequences

PubMed Central

Hilker, Rolf; Stadermann, Kai Bernd; Doppmeier, Daniel; Kalinowski, Jörn; Stoye, Jens; Straube, Jasmin; Winnebald, Jörn; Goesmann, Alexander

2014-01-01

Motivation: Fast algorithms and well-arranged visualizations are required for the comprehensive analysis of the ever-growing size of genomic and transcriptomic next-generation sequencing data. Results: ReadXplorer is a software offering straightforward visualization and extensive analysis functions for genomic and transcriptomic DNA sequences mapped on a reference. A unique specialty of ReadXplorer is the quality classification of the read mappings. It is incorporated in all analysis functions and displayed in ReadXplorer's various synchronized data viewers for (i) the reference sequence, its base coverage as (ii) normalizable plot and (iii) histogram, (iv) read alignments and (v) read pairs. ReadXplorer's analysis capability covers RNA secondary structure prediction, single nucleotide polymorphism and deletion–insertion polymorphism detection, genomic feature and general coverage analysis. Especially for RNA-Seq data, it offers differential gene expression analysis, transcription start site and operon detection as well as RPKM value and read count calculations. Furthermore, ReadXplorer can combine or superimpose coverage of different datasets. Availability and implementation: ReadXplorer is available as open-source software at http://www.readxplorer.org along with a detailed manual. Contact: rhilker@mikrobio.med.uni-giessen.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24790157
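For reference, the RPKM value mentioned above is conventionally defined as reads per kilobase of feature per million mapped reads. The sketch below implements that standard definition; it is not taken from ReadXplorer's source, and the function and parameter names are hypothetical.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads.

    RPKM = read_count * 1e9 / (feature_length_bp * total_mapped_reads)
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and total mapped reads must be positive")
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# 500 reads on a 2,000-bp gene in a library of 10 million mapped reads
print(rpkm(500, 2000, 10_000_000))  # -> 25.0
```

Dividing by feature length makes long and short genes comparable within a sample, and dividing by library size makes samples of different sequencing depth comparable.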
Tiled Microarray Identification of Novel Viral Transcript Structures and Distinct Transcriptional Profiles during Two Modes of Productive Murine Gammaherpesvirus 68 Infection

PubMed Central

Cheng, Benson Yee Hin; Zhi, Jizu; Santana, Alexis; Khan, Sohail; Salinas, Eduardo; Forrest, J. Craig; Zheng, Yueting; Jaggi, Shirin; Leatherwood, Janet

2012-01-01

We applied a custom tiled microarray to examine murine gammaherpesvirus 68 (MHV68) polyadenylated transcript expression in a time course of de novo infection of fibroblast cells and following phorbol ester-mediated reactivation from a latently infected B cell line. During de novo infection, all open reading frames (ORFs) were transcribed and clustered into four major temporal groups that were overlapping yet distinct from clusters based on the phorbol ester-stimulated B cell reactivation time course. High-density transcript analysis at 2-h intervals during de novo infection mapped gene boundaries with a 20-nucleotide resolution, including a previously undefined ORF73 transcript and the MHV68 ORF63 homolog of Kaposi's sarcoma-associated herpesvirus vNLRP1. ORF6 transcript initiation was mapped by tiled array and confirmed by 5′ rapid amplification of cDNA ends. The ∼1.3-kb region upstream of ORF6 was responsive to lytic infection and MHV68 RTA, identifying a novel RTA-responsive promoter. Transcription in intergenic regions consistent with the previously defined expressed genomic regions was detected during both types of productive infection. We conclude that the MHV68 transcriptome is dynamic and distinct during de novo fibroblast infection and upon phorbol ester-stimulated B cell reactivation, highlighting the need to evaluate further transcript structure and the context-dependent molecular events that govern viral gene expression during chronic infection. PMID:22318145
[Molecular combing method in the research of DNA replication parameters in isolated organs of Drosophila melanogaster].

PubMed

Ivankin, A V; Kolesnikova, T D; Demakov, S A; Andreenkov, O V; Bil'danova, E R; Andreenkova, N G; Zhimulev, I F

2011-01-01

Methods of physical DNA mapping and direct visualization of replication and transcription in specific regions of the genome play a crucial role in research on the structural and functional organization of eukaryotic genomes. Because DNA strands in cells are folded into higher-order structures and are present as highly compacted chromosomes, the majority of these methods have lower resolution at the chromosomal level. One approach to enhancing resolution and mapping accuracy is the method of molecular combing. The method is based on stretching and aligning DNA molecules that are covalently attached by one end to the cover glass surface. In this article we describe the major methodological steps of molecular combing and their adaptation for studying DNA replication parameters in polyploid and diploid tissues of Drosophila larvae.
GRID-seq reveals the global RNA-chromatin interactome

PubMed Central

Li, Xiao; Zhou, Bing; Chen, Liang; Gou, Lan-Tao; Li, Hairi; Fu, Xiang-Dong

2017-01-01

Higher eukaryotic genomes are bound by a large number of coding and non-coding RNAs, but approaches to comprehensively map the identity and binding sites of these RNAs are lacking. Here we report a method to in situ capture global RNA interactions with DNA by deep sequencing (GRID-seq), which enables the comprehensive identification of the entire repertoire of chromatin-interacting RNAs and their respective binding sites. In human, mouse and Drosophila cells, we detected a large set of tissue-specific coding and non-coding RNAs that are bound to active promoters and enhancers, especially super-enhancers. Assuming that most mRNA-chromatin interactions indicate the physical proximity of a promoter and an enhancer, we constructed a three-dimensional global connectivity map of promoters and enhancers, revealing transcription activity-linked genomic interactions in the nucleus. PMID:28922346
Epigenetic modulation by TFII-I during embryonic stem cell differentiation.

PubMed

Bayarsaihan, Dashzeveg; Makeyev, Aleksandr V; Enkhmandakh, Badam

2012-10-01

TFII-I transcription factors play an essential role during early vertebrate embryogenesis. Genome-wide mapping studies by ChIP-seq and ChIP-chip revealed that TFII-I primes multiple genomic loci in mouse embryonic stem cells and embryonic tissues. Moreover, many TFII-I-bound regions co-localize with H3K4me3/K27me3 bivalent chromatin within the promoters of lineage-specific genes. This minireview provides a summary of current knowledge regarding the function of TFII-I in epigenetic control of stem cell differentiation. Copyright © 2012 Wiley Periodicals, Inc.
A Novel Collection of snRNA-Like Promoters with Tissue-Specific Transcription Properties

PubMed Central

Garritano, Sonia; Gigoni, Arianna; Costa, Delfina; Malatesta, Paolo; Florio, Tullio; Cancedda, Ranieri; Pagano, Aldo

2012-01-01

We recently identified a novel dataset of snRNA-like transcriptional units in the human genome. The investigation of a subset of these elements showed that they play relevant roles in physiology and/or pathology. In this work we expand our collection of small RNAs, taking advantage of a newly developed algorithm able to identify genome sequence stretches with RNA polymerase (pol) III type 3 promoter features, thus constituting putative pol III binding sites. The bioinformatic analysis of a subset of these elements that map in introns of protein-coding genes in antisense configuration suggests their association with alternative splicing, similarly to other recently characterized small RNAs. Interestingly, the analysis of the transcriptional activity of these novel promoters shows that they are active in a cell-type specific manner, in accordance with the emerging body of evidence of a tissue/cell-specific activity of pol III. PMID:23109855
A novel collection of snRNA-like promoters with tissue-specific transcription properties.

PubMed

Garritano, Sonia; Gigoni, Arianna; Costa, Delfina; Malatesta, Paolo; Florio, Tullio; Cancedda, Ranieri; Pagano, Aldo

2012-01-01

We recently identified a novel dataset of snRNA-like transcriptional units in the human genome. The investigation of a subset of these elements showed that they play relevant roles in physiology and/or pathology. In this work we expand our collection of small RNAs, taking advantage of a newly developed algorithm able to identify genome sequence stretches with RNA polymerase (pol) III type 3 promoter features, thus constituting putative pol III binding sites. The bioinformatic analysis of a subset of these elements that map in introns of protein-coding genes in antisense configuration suggests their association with alternative splicing, similarly to other recently characterized small RNAs. Interestingly, the analysis of the transcriptional activity of these novel promoters shows that they are active in a cell-type specific manner, in accordance with the emerging body of evidence of a tissue/cell-specific activity of pol III.
DNA methylation dynamics during in vivo differentiation of blood and skin stem cells

PubMed Central

Bock, Christoph; Beerman, Isabel; Lien, Wen-Hui; Smith, Zachary D.; Gu, Hongcang; Boyle, Patrick; Gnirke, Andreas; Fuchs, Elaine; Rossi, Derrick J.; Meissner, Alexander

2012-01-01

DNA methylation is a mechanism of epigenetic regulation that is common to all vertebrates. Functional studies underscore its relevance for tissue homeostasis, but the global dynamics of DNA methylation during in vivo differentiation remain underexplored. Here we report high-resolution DNA methylation maps of adult stem cell differentiation in mouse, focusing on 19 purified cell populations of the blood and skin lineages. DNA methylation changes were locus-specific and relatively modest in magnitude. They frequently overlapped with lineage-associated transcription factors and their binding sites, suggesting that DNA methylation may protect cells from aberrant transcription factor activation. DNA methylation and gene expression provided complementary information, and combining the two enabled us to infer the cellular differentiation hierarchy of the blood lineage directly from genomic data. In summary, these results demonstrate that in vivo differentiation of adult stem cells is associated with small but informative changes in the genomic distribution of DNA methylation. PMID:22841485
Metabolic potential and in situ activity of marine Marinimicrobia bacteria in an anoxic water column.

PubMed

Bertagnolli, Anthony D; Padilla, Cory C; Glass, Jennifer B; Thamdrup, Bo; Stewart, Frank J

2017-11-01

Marinimicrobia bacteria are widespread in subeuphotic areas of the oceans and particularly abundant in oxygen minimum zones (OMZs). Information on Marinimicrobia metabolism is sparse, making the biogeochemical influence of this group challenging to predict. Here, metagenome-assembled genomes representing Marinimicrobia subgroups PN262000N21 and ARCTIC96B-7 were retrieved to near completion (97% and 94%) from OMZ metagenomes, with contamination (14.1%) observed only in ARCTIC96B-7. Genes for aerobic carbon monoxide (CO) oxidation, polysulfide metabolism and hydrogen utilization were identified only in PN262000N21, while genes for partial denitrification occurred in both genomes. Transcripts mapping to these genomes increased from <0.3% of total mRNA from the oxic zone to a max of 22% under anoxia. ARCTIC96B-7 transcript representation decreased an order of magnitude from non-sulfidic to sulfidic depths. In contrast, PN262000N21 representation was relatively constant throughout the OMZ, although transcripts encoding sulfur-utilizing proteins, including sulfur transferases, were enriched at sulfidic depths. PN262000N21 transcripts encoding a protein with fibronectin domains similar to those in cellulosome-producing bacteria were also abundant, suggesting a potential for high molecular weight carbon cycling. These data provide omic-level descriptions of metabolic potential and activity in OMZ-associated Marinimicrobia, suggesting differentiation between subgroups with roles in carbon and dissimilatory inorganic nitrogen and sulfur cycling. © 2017 Society for Applied Microbiology and John Wiley & Sons Ltd.
The small RNA complement of adult Schistosoma haematobium.

PubMed

Stroehlein, Andreas J; Young, Neil D; Korhonen, Pasi K; Hall, Ross S; Jex, Aaron R; Webster, Bonnie L; Rollinson, David; Brindley, Paul J; Gasser, Robin B

2018-05-01

Blood flukes of the genus Schistosoma cause schistosomiasis, a neglected tropical disease (NTD) that affects more than 200 million people worldwide. Studies of schistosome genomes have improved our understanding of the molecular biology of flatworms, but most of them have focused largely on protein-coding genes. Small non-coding RNAs (sncRNAs) have been explored in selected schistosome species and are suggested to play essential roles in the post-transcriptional regulation of genes, and in modulating flatworm-host interactions. However, genome-wide small RNA data are currently lacking for key schistosomes including Schistosoma haematobium, the causative agent of urogenital schistosomiasis of humans. MicroRNAs (miRNAs) and other sncRNAs of male and female adults of S. haematobium and small RNA transcription levels were explored by deep sequencing, genome mapping and detailed bioinformatic analyses. In total, 89 transcribed miRNAs were identified in S. haematobium, a complement similar to those reported for the congeners S. mansoni and S. japonicum. Of these miRNAs, 34 were novel, with no homologs in other schistosomes. Most miRNAs (n = 64) exhibited sex-biased transcription, suggestive of roles in sexual differentiation, pairing of adult worms and reproductive processes. Of the sncRNAs that were not miRNAs, some related to the spliceosome (n = 21), biogenesis of other RNAs (n = 3) or ribozyme functions (n = 16), whereas most others (n = 3798) were novel ('orphans') with unknown functions. This study provides the first genome-wide sncRNA resource for S. haematobium, extending earlier studies of schistosomes. The present work should facilitate the future curation and experimental validation of sncRNA functions in schistosomes to enhance our understanding of post-transcriptional gene regulation and of the roles that sncRNAs play in schistosome reproduction, development and parasite-host cross-talk.
Detection of PIWI and piRNAs in the mitochondria of mammalian cancer cells

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kwon, ChangHyuk, E-mail: netbuyer@hanmail.net; Tak, Hyosun, E-mail: chuberry@naver.com; Rho, Mina, E-mail: minarho@hanyang.ac.kr

2014-03-28

Highlights: • piRNA sequences were mapped to the human mitochondrial (mt) genome. • We inspected small RNA-Seq datasets from somatic cell mt subcellular fractions. • Piwi and piRNA transcripts are present in mammalian somatic cancer cell mt fractions. Abstract: Piwi-interacting RNAs (piRNAs) are 26–31 nt small noncoding RNAs that are processed from their longer precursor transcripts by Piwi proteins. Localization of Piwi and piRNA has been reported mostly in the nucleus and cytoplasm of germ-line cells of higher eukaryotes, where it is believed that known piRNA sequences are located in repeat regions of the nuclear genome. However, localization of PIWI and piRNA in mammalian somatic cell mitochondria remains largely unknown. We identified 29 piRNA sequence alignments from various regions of the human mitochondrial genome. Twelve of the 29 piRNA sequences matched stem-loop fragment sequences of seven distinct tRNAs. We observed their actual expression in mitochondria subcellular fractions by inspecting mitochondrial-specific small RNA-Seq datasets. Of interest, the majority of the 29 piRNAs overlapped with multiple longer transcripts (expressed sequence tags) that are unique to the human mitochondrial genome. The presence of mature piRNAs in mitochondria was detected by qRT-PCR of mitochondrial subcellular RNAs. Further validation showed detection of Piwi by colocalization using anti-Piwil1 and mitochondria organelle-specific protein antibodies.
The developmental transcriptome of Drosophila melanogaster

DOE Office of Scientific and Technical Information (OSTI.GOV)

University of Connecticut; Graveley, Brenton R.; Brooks, Angela N.

Drosophila melanogaster is one of the most well studied genetic model organisms; nonetheless, its genome still contains unannotated coding and non-coding genes, transcripts, exons and RNA editing sites. Full discovery and annotation are pre-requisites for understanding how the regulation of transcription, splicing and RNA editing directs the development of this complex organism. Here we used RNA-Seq, tiling microarrays and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages. We identified 111,195 new elements, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events, and inferred protein isoforms that previously eluded discovery using established experimental, prediction and conservation-based approaches. These data substantially expand the number of known transcribed elements in the Drosophila genome and provide a high-resolution view of transcriptome dynamics throughout development. Drosophila melanogaster is an important non-mammalian model system that has had a critical role in basic biological discoveries, such as identifying chromosomes as the carriers of genetic information and uncovering the role of genes in development. Because it shares a substantial genic content with humans, Drosophila is increasingly used as a translational model for human development, homeostasis and disease. High-quality maps are needed for all functional genomic elements. Previous studies demonstrated that a rich collection of genes is deployed during the life cycle of the fly. Although expression profiling using microarrays has revealed the expression of ~13,000 annotated genes, it is difficult to map splice junctions and individual base modifications generated by RNA editing using such approaches. Single-base resolution is essential to define precisely the elements that comprise the Drosophila transcriptome.
Estimates of the number of transcript isoforms are less accurate than estimates of the number of genes. Whereas ~20% of Drosophila genes are annotated as encoding alternatively spliced pre-mRNAs, splice-junction microarray experiments indicate that this number is at least 40% (ref. 7). Determining the diversity of mRNAs generated by alternative promoters, alternative splicing and RNA editing will substantially increase the inferred protein repertoire. Non-coding RNA genes (ncRNAs) including short interfering RNAs (siRNAs) and microRNAs (miRNAs) (reviewed in ref. 10), and longer ncRNAs such as bxd (ref. 11) and rox (ref. 12), have important roles in gene regulation, whereas others such as small nucleolar RNAs (snoRNAs) and small nuclear RNAs (snRNAs) are important components of macromolecular machines such as the ribosome and spliceosome. The transcription and processing of these ncRNAs must also be fully documented and mapped. As part of the modENCODE project to annotate the functional elements of the D. melanogaster and Caenorhabditis elegans genomes, we used RNA-Seq and tiling microarrays to sample the Drosophila transcriptome at unprecedented depth throughout development from early embryo to ageing male and female adults. We report on a high-resolution view of the discovery, structure and dynamic expression of the D. melanogaster transcriptome.

Global Transcriptional Start Site Mapping Using Differential RNA Sequencing Reveals Novel Antisense RNAs in Escherichia coli

PubMed Central

Thomason, Maureen K.; Bischler, Thorsten; Eisenbart, Sara K.; Förstner, Konrad U.; Zhang, Aixia; Herbig, Alexander; Nieselt, Kay

2014-01-01

While the model organism Escherichia coli has been the subject of intense study for decades, the full complement of its RNAs is only now being examined. Here we describe a survey of the E. coli transcriptome carried out using a differential RNA sequencing (dRNA-seq) approach, which can distinguish between primary and processed transcripts, and an automated prediction algorithm for transcriptional start sites (TSS). With the criterion of expression under at least one of three growth conditions examined, we predicted 14,868 TSS candidates, including 5,574 internal to annotated genes (iTSS) and 5,495 TSS corresponding to potential antisense RNAs (asRNAs). We examined expression of 14 candidate asRNAs by Northern analysis using RNA from wild-type E. coli and from strains defective for RNases III and E, two RNases reported to be involved in asRNA processing. Interestingly, nine asRNAs detected as distinct bands by Northern analysis were differentially affected by the rnc and rne mutations. We also compared our asRNA candidates with previously published asRNA annotations from RNA-seq data and discuss the challenges associated with these cross-comparisons. Our global transcriptional start site map represents a valuable resource for identification of transcription start sites, promoters, and novel transcripts in E. coli and is easily accessible, together with the cDNA coverage plots, in an online genome browser. PMID:25266388
Mapping eQTLs in the Norfolk Island Genetic Isolate Identifies Candidate Genes for CVD Risk Traits

PubMed Central

Benton, Miles C.; Lea, Rod A.; Macartney-Coxson, Donia; Carless, Melanie A.; Göring, Harald H.; Bellis, Claire; Hanna, Michelle; Eccles, David; Chambers, Geoffrey K.; Curran, Joanne E.; Harper, Jacquie L.; Blangero, John; Griffiths, Lyn R.

2013-01-01

Cardiovascular disease (CVD) affects millions of people worldwide and is influenced by numerous factors, including lifestyle and genetics. Expression quantitative trait loci (eQTLs) influence gene expression and are good candidates for CVD risk. Founder-effect pedigrees can provide additional power to map genes associated with disease risk. Therefore, we identified eQTLs in the genetic isolate of Norfolk Island (NI) and tested for associations between these and CVD risk factors. We measured genome-wide transcript levels of blood lymphocytes in 330 individuals and used pedigree-based heritability analysis to identify heritable transcripts. eQTLs were identified by genome-wide association testing of these transcripts. Testing for association between CVD risk factors (i.e., blood lipids, blood pressure, and body fat indices) and eQTLs revealed 1,712 heritable transcripts (p < 0.05) with heritability values ranging from 0.18 to 0.84. From these, we identified 200 cis-acting and 70 trans-acting eQTLs (p < 1.84 × 10^−7). An eQTL-centric analysis of CVD risk traits revealed multiple associations, including 12 previously associated with CVD-related traits. Trait versus eQTL regression modeling identified four CVD risk candidates (NAAA, PAPSS1, NME1, and PRDX1), all of which have known biological roles in disease. In addition, we implicated several genes previously associated with CVD risk traits, including MTHFR and FN3KRP. We have successfully identified a panel of eQTLs in the NI pedigree and used this to implicate several genes in CVD risk. Future studies are required for further assessing the functional importance of these eQTLs and whether the findings here also relate to outbred populations. PMID:24314549
Genome-Wide Transcriptional Start Site Mapping and sRNA Identification in the Pathogen Leptospira interrogans

PubMed Central

Zhukova, Anna; Fernandes, Luis Guilherme; Hugon, Perrine; Pappas, Christopher J.; Sismeiro, Odile; Coppée, Jean-Yves; Becavin, Christophe; Malabat, Christophe; Eshghi, Azad; Zhang, Jun-Jie; Yang, Frank X.; Picardeau, Mathieu

2017-01-01

Leptospira are emerging zoonotic pathogens transmitted from animals to humans typically through contaminated environmental sources of water and soil. Regulatory pathways of pathogenic Leptospira spp. underlying the adaptive response to different hosts and environmental conditions remain elusive. In this study, we provide the first global Transcriptional Start Site (TSS) map of a Leptospira species. RNA was obtained from the pathogen Leptospira interrogans grown at 30°C (optimal in vitro temperature) and 37°C (host temperature) and selectively enriched for 5′ ends of native transcripts. A total of 2865 and 2866 primary TSS (pTSS) were predicted in the genome of L. interrogans at 30 and 37°C, respectively. The majority of the pTSSs were located between 0 and 10 nucleotides from the translational start site, suggesting that leaderless transcripts are a common feature of the leptospiral translational landscape. Comparative differential RNA-sequencing (dRNA-seq) analysis revealed conservation of most pTSS at 30 and 37°C. Promoter prediction algorithms allow the identification of the binding sites of the alternative sigma factor sigma 54. However, other motifs were not identified, indicating that Leptospira consensus promoter sequences are inherently different from the Escherichia coli model. RNA sequencing also identified 277 and 226 putative small regulatory RNAs (sRNAs) at 30 and 37°C, respectively, including eight sRNAs validated by Northern blots. These results provide the first global view of TSS and the repertoire of sRNAs in L. interrogans. These data will establish a foundation for future experimental work on gene regulation under various environmental conditions including those in the host. PMID:28154810
eQTL Mapping Using RNA-seq Data

PubMed Central

Hu, Yijuan

2012-01-01

As RNA-seq is replacing gene expression microarrays to assess genome-wide transcription abundance, gene expression Quantitative Trait Locus (eQTL) studies using RNA-seq have emerged. RNA-seq delivers two novel features that are important for eQTL studies. First, it provides information on allele-specific expression (ASE), which is not available from gene expression microarrays. Second, it generates unprecedentedly rich data to study RNA-isoform expression. In this paper, we review current methods for eQTL mapping using ASE and discuss some future directions. We also review existing works that use RNA-seq data to study RNA-isoform expression and we discuss the gaps between these works and isoform-specific eQTL mapping. PMID:23667399
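As a rough illustration of the allele-specific expression (ASE) signal mentioned in the abstract above, the sketch below applies an exact two-sided binomial test to reference and alternate allele read counts at a single heterozygous site. It is not taken from the review: the function name `ase_binomial_pvalue`, the null hypothesis of balanced expression (each read equally likely to carry either allele), and the example counts are assumptions made for illustration only.

```python
from math import comb


def ase_binomial_pvalue(ref_reads: int, alt_reads: int) -> float:
    """Exact two-sided binomial p-value for allelic imbalance.

    Hypothetical helper: assumes that, with no ASE, each read at a
    heterozygous site carries the reference allele with probability 0.5.
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this site")
    # Under p = 0.5 every outcome k has probability C(n, k) / 2**n, so an
    # outcome is "at least as extreme" when its binomial coefficient is
    # no larger than the observed one.
    observed = comb(n, ref_reads)
    extreme = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return extreme / 2 ** n


# Illustrative counts only (not data from any study).
print(ase_binomial_pvalue(10, 10))  # balanced counts
print(ase_binomial_pvalue(20, 0))   # fully imbalanced counts
```

A real analysis would typically also need to account for reference mapping bias and overdispersion across sites, which this sketch ignores.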
Quantitative trait locus mapping and functional genomics of an organophosphate resistance trait in the western corn rootworm, Diabrotica virgifera virgifera.

PubMed

Coates, B S; Alves, A P; Wang, H; Zhou, X; Nowatzki, T; Chen, H; Rangasamy, M; Robertson, H M; Whitfield, C W; Walden, K K; Kachman, S D; French, B W; Meinke, L J; Hawthorne, D; Abel, C A; Sappington, T W; Siegfried, B D; Miller, N J

2016-02-01

The western corn rootworm, Diabrotica virgifera virgifera, is an insect pest of corn, and population suppression with chemical insecticides is an important management tool. Traits conferring organophosphate insecticide resistance have increased in frequency amongst D. v. virgifera populations, resulting in reduced efficacy in many corn-growing regions of the USA. We used comparative functional genomic and quantitative trait locus (QTL) mapping approaches to investigate the genetic basis of D. v. virgifera resistance to the organophosphate methyl-parathion. RNA from methyl-parathion resistant and susceptible adults was hybridized to 8331 microarray probes. The results predicted that 11 transcripts were significantly up-regulated in resistant phenotypes, with the most significant (fold increases ≥ 2.43) being an α-esterase-like transcript. Differential expression was validated only for the α-esterase (ST020027A20C03), with 11- to 13-fold greater expression in methyl-parathion resistant adults (P < 0.05). Progeny with a segregating methyl-parathion resistance trait were obtained from a reciprocal backcross design. QTL analyses of high-throughput single nucleotide polymorphism genotype data predicted involvement of a single genome interval. These data suggest that a specific carboxylesterase may function in field-evolved corn rootworm resistance to organophosphates, even though direct linkage between the QTL and this locus could not be established. Published 2015. This article is a U.S. Government work and is in the public domain in the USA.
The DNA-encoded nucleosome organization of a eukaryotic genome.

PubMed

Kaplan, Noam; Moore, Irene K; Fondufe-Mittendorf, Yvonne; Gossett, Andrea J; Tillo, Desiree; Field, Yair; LeProust, Emily M; Hughes, Timothy R; Lieb, Jason D; Widom, Jonathan; Segal, Eran

2009-03-19

Nucleosome organization is critical for gene regulation. In living cells this organization is determined by multiple factors, including the action of chromatin remodellers, competition with site-specific DNA-binding proteins, and the DNA sequence preferences of the nucleosomes themselves. However, it has been difficult to estimate the relative importance of each of these mechanisms in vivo, because in vivo nucleosome maps reflect the combined action of all influencing factors. Here we determine the importance of nucleosome DNA sequence preferences experimentally by measuring the genome-wide occupancy of nucleosomes assembled on purified yeast genomic DNA. The resulting map, in which nucleosome occupancy is governed only by the intrinsic sequence preferences of nucleosomes, is similar to in vivo nucleosome maps generated in three different growth conditions. In vitro, nucleosome depletion is evident at many transcription factor binding sites and around gene start and end sites, indicating that nucleosome depletion at these sites in vivo is partly encoded in the genome. We confirm these results with a micrococcal nuclease-independent experiment that measures the relative affinity of nucleosomes for approximately 40,000 double-stranded 150-base-pair oligonucleotides. Using our in vitro data, we devise a computational model of nucleosome sequence preferences that is significantly correlated with in vivo nucleosome occupancy in Caenorhabditis elegans. Our results indicate that the intrinsic DNA sequence preferences of nucleosomes have a central role in determining the organization of nucleosomes in vivo.
Expressed sequence tag analysis of human RPE/choroid for the NEIBank Project: over 6000 non-redundant transcripts, novel genes and splice variants.

PubMed

Wistow, Graeme; Bernstein, Steven L; Wyatt, M Keith; Fariss, Robert N; Behal, Amita; Touchman, Jeffrey W; Bouffard, Gerald; Smith, Don; Peterson, Katherine

2002-06-15

The retinal pigment epithelium (RPE) and choroid comprise a functional unit of the eye that is essential to normal retinal health and function. Here we describe expressed sequence tag (EST) analysis of human RPE/choroid as part of a project for ocular bioinformatics. A cDNA library (cs) was made from human RPE/choroid and sequenced. Data were analyzed and assembled using the program GRIST (GRouping and Identification of Sequence Tags). Complete sequencing, Northern and Western blots, RH mapping, peptide antibody synthesis and immunofluorescence (IF) have been used to examine expression patterns and genome location for selected transcripts and proteins. Ten thousand individual sequence reads yield over 6300 unique gene clusters of which almost half have no matches with named genes. One of the most abundant transcripts is from a gene (named "alpha") that maps to the BBS1 region of chromosome 11. A number of tissue preferred transcripts are common to both RPE/choroid and iris. These include oculoglycan/opticin, for which an alternative splice form is detected in RPE/choroid, and "oculospanin" (Ocsp), a novel tetraspanin that maps to chromosome 17q. Antiserum to Ocsp detects expression in RPE, iris, ciliary body, and retinal ganglion cells by IF. A newly identified gene for a zinc-finger protein (TIRC) maps to 19q13.4. Variant transcripts of several genes were also detected. Most notably, the predominant form of Bestrophin represented in cs contains a longer open reading frame as a result of splice junction skipping. The unamplified cs library gives a view of the transcriptional repertoire of the adult RPE/choroid. A large number of potentially novel genes and splice forms and candidates for genetic diseases are revealed. Clones from this collection are being included in a large, nonredundant set for cDNA microarray construction.
Genomics-assisted breeding for boosting crop improvement in pigeonpea (Cajanus cajan)

PubMed Central

Pazhamala, Lekha; Saxena, Rachit K.; Singh, Vikas K.; Sameerkumar, C. V.; Kumar, Vinay; Sinha, Pallavi; Patel, Kishan; Obala, Jimmy; Kaoneka, Seleman R.; Tongoona, P.; Shimelis, Hussein A.; Gangarao, N. V. P. R.; Odeny, Damaris; Rathore, Abhishek; Dharmaraj, P. S.; Yamini, K. N.; Varshney, Rajeev K.

2015-01-01

Pigeonpea is an important pulse crop grown predominantly in the tropical and sub-tropical regions of the world. Although pigeonpea growing area has considerably increased, yield has remained stagnant for the last six decades mainly due to the exposure of the crop to various biotic and abiotic constraints. In addition, low level of genetic variability and limited genomic resources have been serious impediments to pigeonpea crop improvement through modern breeding approaches. In recent years, however, due to the availability of next generation sequencing and high-throughput genotyping technologies, the scenario has changed tremendously. Reduced sequencing costs, which enabled the decoding of the pigeonpea genome, have led to the development of various genomic resources including molecular markers, transcript sequences and comprehensive genetic maps. Mapping of some important traits, including resistance to Fusarium wilt and sterility mosaic disease, fertility restoration and determinacy, along with other agronomically important traits, has paved the way for applying genomics-assisted breeding (GAB) through marker assisted selection as well as genomic selection (GS). This would accelerate the development and improvement of both varieties and hybrids in pigeonpea. Particularly for hybrid breeding programmes, mitochondrial genomes of cytoplasmic male sterile (CMS) lines, maintainers and hybrids have been sequenced to identify genes responsible for cytoplasmic male sterility. Furthermore, several diagnostic molecular markers have been developed to assess the purity of commercial hybrids. In summary, pigeonpea has become a genomic resources-rich crop and efforts have already been initiated to integrate these resources in pigeonpea breeding. PMID:25741349
Genome-wide high-throughput SNP discovery and genotyping for understanding natural (functional) allelic diversity and domestication patterns in wild chickpea

PubMed Central

Bajaj, Deepak; Das, Shouvik; Badoni, Saurabh; Kumar, Vinod; Singh, Mohar; Bansal, Kailash C.; Tyagi, Akhilesh K.; Parida, Swarup K.

2015-01-01

We identified 82489 high-quality genome-wide SNPs from 93 wild and cultivated Cicer accessions through integrated reference genome- and de novo-based GBS assays. High intra- and inter-specific polymorphic potential (66–85%) and broader natural allelic diversity (6–64%) detected by genome-wide SNPs among accessions signify their efficacy for monitoring introgression and transferring target trait-regulating genomic (gene) regions/allelic variants from wild to cultivated Cicer gene pools for genetic improvement. The population-specific assignment of wild Cicer accessions pertaining to the primary gene pool is more influenced by geographical origin/phenotypic characteristics than species/gene-pools of origination. The functional significance of allelic variants (non-synonymous and regulatory SNPs) scanned from transcription factors and stress-responsive genes in differentiating wild accessions (with potential known sources of yield-contributing and stress tolerance traits) from cultivated desi and kabuli accessions, fine-mapping/map-based cloning of QTLs and determination of LD patterns across wild and cultivated gene-pools are suitably elucidated. The correlation between phenotypic (agromorphological traits) and molecular diversity-based admixed domestication patterns within six structured populations of wild and cultivated accessions via genome-wide SNPs was apparent. This suggests utility of whole genome SNPs as a potential resource for identifying naturally selected trait-regulating genomic targets/functional allelic variants adaptive to diverse agroclimatic regions for genetic enhancement of cultivated gene-pools. PMID:26208313
Saturation of an Intra-Gene Pool Linkage Map: Towards a Unified Consensus Linkage Map for Fine Mapping and Synteny Analysis in Common Bean

PubMed Central

Galeano, Carlos H.; Fernandez, Andrea C.; Franco-Herrera, Natalia; Cichy, Karen A.; McClean, Phillip E.; Vanderleyden, Jos; Blair, Matthew W.

2011-01-01

Map-based cloning and fine mapping to find genes of interest and marker assisted selection (MAS) requires good genetic maps with reproducible markers. In this study, we saturated the linkage map of the intra-gene pool population of common bean DOR364×BAT477 (DB) by evaluating 2,706 molecular markers including SSR, SNP, and gene-based markers. On average the polymorphism rate was 7.7% due to the narrow genetic base between the parents. The DB linkage map consisted of 291 markers with a total map length of 1,788 cM. A consensus map was built using the core mapping populations derived from inter-gene pool crosses: DOR364×G19833 (DG) and BAT93×JALO EEP558 (BJ). The consensus map consisted of a total of 1,010 markers mapped, with a total map length of 2,041 cM across 11 linkage groups. On average, each linkage group on the consensus map contained 91 markers of which 83% were single copy markers. Finally, a synteny analysis was carried out using our highly saturated consensus maps compared with the soybean pseudo-chromosome assembly. A total of 772 marker sequences were compared with the soybean genome. A total of 44 syntenic blocks were identified. The linkage group Pv6 presented the most diverse pattern of synteny with seven syntenic blocks, and Pv9 showed the most consistent relations with soybean with just two syntenic blocks. Additionally, a co-linear analysis using common bean transcript map information against soybean coding sequences (CDS) revealed the relationship with 787 soybean genes. The common bean consensus map has allowed us to map a larger number of markers, to obtain a more complete coverage of the common bean genome. Our results, combined with synteny relationships provide tools to increase marker density in selected genomic regions to identify closely linked polymorphic markers for indirect selection, fine mapping or for positional cloning. PMID:22174773
An integrated map of genetic variation from 1,092 human genomes

PubMed Central

2012-01-01

Through characterising the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help understand the genetic contribution to disease. We describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methodologies to integrate information across multiple algorithms and diverse data sources we provide a validated haplotype map of 38 million SNPs, 1.4 million indels and over 14 thousand larger deletions. We show that individuals from different populations carry different profiles of rare and common variants and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways and that each individual harbours hundreds of rare non-coding variants at conserved sites, such as transcription-factor-motif disrupting changes. This resource, which captures up to 98% of accessible SNPs at a frequency of 1% in populations of medical genetics focus, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations. PMID:23128226
Identification of unannotated exons of low abundance transcripts in Drosophila melanogaster and cloning of a new serine protease gene upregulated upon injury.

PubMed

Maia, Rafaela M; Valente, Valeria; Cunha, Marco A V; Sousa, Josane F; Araujo, Daniela D; Silva, Wilson A; Zago, Marco A; Dias-Neto, Emmanuel; Souza, Sandro J; Simpson, Andrew J G; Monesi, Nadia; Ramos, Ricardo G P; Espreafico, Enilza M; Paçó-Larson, Maria L

2007-07-24

The sequencing of the D. melanogaster genome revealed an unexpectedly small number of genes (~ 14,000), indicating that mechanisms acting on generation of transcript diversity must have played a major role in the evolution of complex metazoans. Among the most extensively used mechanisms that account for this diversity is alternative splicing. It is estimated that over 40% of Drosophila protein-coding genes contain one or more alternative exons. A recent transcription map of Drosophila embryogenesis indicates that 30% of the transcribed regions are unannotated, and that 1/3 of this is estimated as missed or alternative exons of previously characterized protein-coding genes. Therefore, the identification of the variety of expressed transcripts depends on experimental data for its final validation and is continuously being performed using different approaches. We applied the Open Reading Frame Expressed Sequence Tags (ORESTES) methodology, which is capable of generating cDNA data from the central portion of rare transcripts, in order to investigate the presence of hitherto unannotated regions of the Drosophila transcriptome. Bioinformatic analysis of 1,303 Drosophila ORESTES clusters identified 68 sequences derived from unannotated regions in the current Drosophila genome version (4.3). Of these, a set of 38 was analysed by polyA+ northern blot hybridization, validating 17 (50%) new exons of low abundance transcripts. For one of these ESTs, we obtained the cDNA encompassing the complete coding sequence of a new serine protease, named SP212. The SP212 gene is part of a serine protease gene cluster located in the chromosome region 88A12-B1. This cluster includes the predicted genes CG9631, CG9649 and CG31326, which were previously identified as up-regulated after immune challenges in genomic-scale microarray analysis.
In agreement with the proposal that this locus is co-regulated in response to microorganisms infection, we show here that SP212 is also up-regulated upon injury. Using the ORESTES methodology we identified 17 novel exons from low abundance Drosophila transcripts, and through a PCR approach the complete CDS of one of these transcripts was defined. Our results show that the computational identification and manual inspection are not sufficient to annotate a genome in the absence of experimentally derived data.
Identification of unannotated exons of low abundance transcripts in Drosophila melanogaster and cloning of a new serine protease gene upregulated upon injury

PubMed Central

Maia, Rafaela M; Valente, Valeria; Cunha, Marco AV; Sousa, Josane F; Araujo, Daniela D; Silva, Wilson A; Zago, Marco A; Dias-Neto, Emmanuel; Souza, Sandro J; Simpson, Andrew JG; Monesi, Nadia; Ramos, Ricardo GP; Espreafico, Enilza M; Paçó-Larson, Maria L

2007-01-01

Background The sequencing of the D. melanogaster genome revealed an unexpectedly small number of genes (~ 14,000), indicating that mechanisms acting on generation of transcript diversity must have played a major role in the evolution of complex metazoans. Among the most extensively used mechanisms that account for this diversity is alternative splicing. It is estimated that over 40% of Drosophila protein-coding genes contain one or more alternative exons. A recent transcription map of Drosophila embryogenesis indicates that 30% of the transcribed regions are unannotated, and that 1/3 of this is estimated as missed or alternative exons of previously characterized protein-coding genes. Therefore, the identification of the variety of expressed transcripts depends on experimental data for its final validation and is continuously being performed using different approaches. We applied the Open Reading Frame Expressed Sequence Tags (ORESTES) methodology, which is capable of generating cDNA data from the central portion of rare transcripts, in order to investigate the presence of hitherto unannotated regions of the Drosophila transcriptome. Results Bioinformatic analysis of 1,303 Drosophila ORESTES clusters identified 68 sequences derived from unannotated regions in the current Drosophila genome version (4.3). Of these, a set of 38 was analysed by polyA+ northern blot hybridization, validating 17 (50%) new exons of low abundance transcripts. For one of these ESTs, we obtained the cDNA encompassing the complete coding sequence of a new serine protease, named SP212. The SP212 gene is part of a serine protease gene cluster located in the chromosome region 88A12-B1. This cluster includes the predicted genes CG9631, CG9649 and CG31326, which were previously identified as up-regulated after immune challenges in genomic-scale microarray analysis.
In agreement with the proposal that this locus is co-regulated in response to microbial infection, we show here that SP212 is also up-regulated upon injury. Conclusion Using the ORESTES methodology we identified 17 novel exons from low abundance Drosophila transcripts, and through a PCR approach the complete CDS of one of these transcripts was defined. Our results show that computational identification and manual inspection are not sufficient to annotate a genome in the absence of experimentally derived data. PMID:17650329
Genome-wide mRNA processing in methanogenic archaea reveals post-transcriptional regulation of ribosomal protein synthesis

PubMed Central

Qi, Lei; Yue, Lei; Feng, Deqin; Qi, Fengxia

2017-01-01

Abstract Unlike stable RNAs that require processing for maturation, prokaryotic cellular mRNAs generally follow an ‘all-or-none’ pattern. Herein, we used 5′ monophosphate transcript sequencing (5′P-seq), which specifically captured the 5′-end of processed transcripts, and mapped the genome-wide RNA processing sites (PSSs) in a methanogenic archaeon. Following statistical analysis and stringent filtration, we identified 1429 PSSs, among which 23.5% and 5.4% were located in 5′ untranslated regions (uPSS) and intergenic regions (iPSS), respectively. A predominant uridine downstream of PSSs served as a processing signature. Remarkably, 5′P-seq detected overrepresented uPSS and iPSS in the polycistronic operons encoding ribosomal proteins, with the majority upstream of and proximal to ribosome binding sites, suggesting a regulatory role of processing in translation initiation. The processed transcripts showed increased stability and translation efficiency. In particular, processing within the tricistronic transcript of rplA-rplJ-rplL enhanced the translation of rplL, which can provide a driving force for the 1:4 stoichiometry of L10 to L12 in the ribosome. Growth-associated mRNA processing intensities were also correlated with the cellular ribosomal protein levels, thereby suggesting that mRNA processing is involved in tuning growth-dependent ribosome synthesis. In conclusion, our findings suggest that mRNA processing-mediated post-transcriptional regulation is a potential mechanism of ribosomal protein synthesis and stoichiometry. PMID:28520982
A large-scale full-length cDNA analysis to explore the budding yeast transcriptome

PubMed Central

Miura, Fumihito; Kawaguchi, Noriko; Sese, Jun; Toyoda, Atsushi; Hattori, Masahira; Morishita, Shinichi; Ito, Takashi

2006-01-01

We performed a large-scale cDNA analysis to explore the transcriptome of the budding yeast Saccharomyces cerevisiae. We sequenced two cDNA libraries, one from the cells exponentially growing in a minimal medium and the other from meiotic cells. Both libraries were generated by using a vector-capping method that allows the accurate mapping of transcription start sites (TSSs). Consequently, we identified 11,575 TSSs associated with 3,638 annotated genomic features, including 3,599 ORFs, to suggest that most yeast genes have two or more TSSs. In addition, we identified 45 previously undescribed introns, including those affecting current ORF annotations and those spliced alternatively. Furthermore, the analysis revealed 667 transcription units in the intergenic regions and transcripts derived from antisense strands of 367 known features. We also found that 348 ORFs carry TSSs in their 3′-halves to generate sense transcripts starting from inside the ORFs. These results indicate that the budding yeast transcriptome is considerably more complex than previously thought, and it shares many recently revealed characteristics with the transcriptomes of mammals and other higher eukaryotes. Thus, the genome-wide active transcription that generates novel classes of transcripts appears to be an intrinsic feature of the eukaryotic cells. The budding yeast will serve as a versatile model for the studies on these aspects of transcriptome, and the full-length cDNA clones can function as an invaluable resource in such studies. PMID:17101987
ChIPBase: a database for decoding the transcriptional regulation of long non-coding RNA and microRNA genes from ChIP-Seq data.

PubMed

Yang, Jian-Hua; Li, Jun-Hao; Jiang, Shan; Zhou, Hui; Qu, Liang-Hu

2013-01-01

Long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) represent two classes of important non-coding RNAs in eukaryotes. Although these non-coding RNAs have been implicated in organismal development and in various human diseases, surprisingly little is known about their transcriptional regulation. Recent advances in chromatin immunoprecipitation with next-generation DNA sequencing (ChIP-Seq) have provided methods of detecting transcription factor binding sites (TFBSs) with unprecedented sensitivity. In this study, we describe ChIPBase (http://deepbase.sysu.edu.cn/chipbase/), a novel database that we have developed to facilitate the comprehensive annotation and discovery of transcription factor binding maps and transcriptional regulatory relationships of lncRNAs and miRNAs from ChIP-Seq data. The current release of ChIPBase includes high-throughput sequencing data that were generated by 543 ChIP-Seq experiments in diverse tissues and cell lines from six organisms. By analysing millions of TFBSs, we identified tens of thousands of TF-lncRNA and TF-miRNA regulatory relationships. Furthermore, two web-based servers were developed to annotate and discover transcriptional regulatory relationships of lncRNAs and miRNAs from ChIP-Seq data. In addition, we developed two genome browsers, deepView and genomeView, to provide integrated views of multidimensional data. Moreover, our web implementation supports diverse query types and the exploration of TFs, lncRNAs, miRNAs, gene ontologies and pathways.
Framework for reanalysis of publicly available Affymetrix® GeneChip® data sets based on functional regions of interest.

PubMed

Saka, Ernur; Harrison, Benjamin J; West, Kirk; Petruska, Jeffrey C; Rouchka, Eric C

2017-12-06

Since the introduction of microarrays in 1995, researchers world-wide have used both commercial and custom-designed microarrays for understanding differential expression of transcribed genes. Public databases such as ArrayExpress and the Gene Expression Omnibus (GEO) have made millions of samples readily available. One main drawback to microarray data analysis involves the selection of probes to represent a specific transcript of interest, particularly in light of the fact that transcript-specific knowledge (notably alternative splicing) is dynamic in nature. We therefore developed a framework for reannotating and reassigning probe groups for Affymetrix® GeneChip® technology based on functional regions of interest. This framework addresses three issues of Affymetrix® GeneChip® data analyses: removing nonspecific probes, updating probe target mapping based on the latest genome knowledge and grouping probes into gene, transcript and region-based (UTR, individual exon, CDS) probe sets. Updated gene and transcript probe sets provide more specific analysis results based on current genomic and transcriptomic knowledge. The framework selects unique probes, aligns them to gene annotations and generates a custom Chip Description File (CDF). The analysis reveals only 87% of the Affymetrix® GeneChip® HG-U133 Plus 2 probes uniquely align to the current hg38 human assembly without mismatches. We also tested new mappings on the publicly available data series using rat and human data from GSE48611 and GSE72551 obtained from GEO, and illustrate that functional grouping allows for the subtle detection of regions of interest likely to have phenotypical consequences. Through reanalysis of the publicly available data series GSE48611 and GSE72551, we profiled the contribution of UTR and CDS regions to the gene expression levels globally. 
The comparison between region-based and gene-based results indicated that the expressed genes detected by gene-based and region-based CDFs show high consistency, and that region-based results allow detection of changes in transcript formation.
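The probe-selection steps described in this record (discard nonspecific probes, then group the remainder by gene region) can be illustrated with a minimal Python sketch. This is not the authors' framework: the function names, the tuple layouts, the toy coordinates and the region labels are all hypothetical, and "unique" is assumed here to mean exactly one reported alignment with zero mismatches. The 25-base probe length reflects standard GeneChip probes.

```python
from collections import defaultdict

def unique_perfect_probes(alignments):
    """Keep probes with exactly one alignment that has zero mismatches.

    alignments: iterable of (probe_id, chrom, start, mismatches) tuples
    (a hypothetical layout chosen for this sketch).
    """
    hits = defaultdict(list)
    for probe_id, chrom, start, mismatches in alignments:
        hits[probe_id].append((chrom, start, mismatches))
    return {
        probe_id: (h[0][0], h[0][1])
        for probe_id, h in hits.items()
        if len(h) == 1 and h[0][2] == 0
    }

def group_by_region(probes, regions, probe_len=25):
    """Assign each retained probe to every region that fully contains it.

    regions: iterable of (name, chrom, start, end) with half-open coordinates.
    """
    probe_sets = defaultdict(list)
    for probe_id, (chrom, start) in sorted(probes.items()):
        for name, r_chrom, r_start, r_end in regions:
            if chrom == r_chrom and r_start <= start and start + probe_len <= r_end:
                probe_sets[name].append(probe_id)
    return dict(probe_sets)

# Toy data: p2 aligns twice and p3 has a mismatch, so both are dropped.
alignments = [
    ("p1", "chr1", 100, 0),
    ("p2", "chr1", 400, 0),
    ("p2", "chr2", 50, 0),
    ("p3", "chr1", 700, 1),
    ("p4", "chr1", 720, 0),
]
regions = [
    ("GENE1:5UTR", "chr1", 90, 200),
    ("GENE1:CDS", "chr1", 200, 600),
    ("GENE1:3UTR", "chr1", 600, 800),
]

kept = unique_perfect_probes(alignments)
region_sets = group_by_region(kept, regions)
print(region_sets)
```

In a real pipeline the grouped probe sets would then be written to a custom CDF; that step is omitted here because it depends on the file format and tooling in use.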
High-density genetic map using whole-genome resequencing for fine mapping and candidate gene discovery for disease resistance in peanut.

PubMed

Agarwal, Gaurav; Clevenger, Josh; Pandey, Manish K; Wang, Hui; Shasidhar, Yaduru; Chu, Ye; Fountain, Jake C; Choudhary, Divya; Culbreath, Albert K; Liu, Xin; Huang, Guodong; Wang, Xingjun; Deshmukh, Rupesh; Holbrook, C Corley; Bertioli, David J; Ozias-Akins, Peggy; Jackson, Scott A; Varshney, Rajeev K; Guo, Baozhu

2018-04-10

Whole-genome resequencing (WGRS) of mapping populations has facilitated development of high-density genetic maps essential for fine mapping and candidate gene discovery for traits of interest in crop species. Leaf spots, including early leaf spot (ELS) and late leaf spot (LLS), and Tomato spotted wilt virus (TSWV) are devastating diseases in peanut causing significant yield loss. We generated WGRS data on a recombinant inbred line population, developed a SNP-based high-density genetic map, and conducted fine mapping, candidate gene discovery and marker validation for ELS, LLS and TSWV. The first sequence-based high-density map was constructed with 8869 SNPs assigned to 20 linkage groups, representing 20 chromosomes, for the 'T' population (Tifrunner × GT-C20) with a map length of 3120 cM and an average distance of 1.45 cM. The quantitative trait locus (QTL) analysis using high-density genetic map and multiple season phenotyping data identified 35 main-effect QTLs with phenotypic variation explained (PVE) from 6.32% to 47.63%. Among major-effect QTLs mapped, there were two QTLs for ELS on B05 with 47.42% PVE and B03 with 47.38% PVE, two QTLs for LLS on A05 with 47.63% and B03 with 34.03% PVE and one QTL for TSWV on B09 with 40.71% PVE. The epistasis and environment interaction analyses identified significant environmental effects on these traits. The identified QTL regions had disease resistance genes including R-genes and transcription factors. KASP markers were developed for major QTLs and validated in the population and are ready for further deployment in genomics-assisted breeding in peanut. © 2018 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
A Three-Dimensional Model of the Yeast Genome

NASA Astrophysics Data System (ADS)

Noble, William; Duan, Zhi-Jun; Andronescu, Mirela; Schutz, Kevin; McIlwain, Sean; Kim, Yoo Jung; Lee, Choli; Shendure, Jay; Fields, Stanley; Blau, C. Anthony

Layered on top of information conveyed by DNA sequence and chromatin are higher order structures that encompass portions of chromosomes, entire chromosomes, and even whole genomes. Interphase chromosomes are not positioned randomly within the nucleus, but instead adopt preferred conformations. Disparate DNA elements co-localize into functionally defined aggregates or factories for transcription and DNA replication. In budding yeast, Drosophila and many other eukaryotes, chromosomes adopt a Rabl configuration, with arms extending from centromeres adjacent to the spindle pole body to telomeres that abut the nuclear envelope. Nonetheless, the topologies and spatial relationships of chromosomes remain poorly understood. Here we developed a method to globally capture intra- and inter-chromosomal interactions, and applied it to generate a map at kilobase resolution of the haploid genome of Saccharomyces cerevisiae. The map recapitulates known features of genome organization, thereby validating the method, and identifies new features. Extensive regional and higher order folding of individual chromosomes is observed. Chromosome XII exhibits a striking conformation that implicates the nucleolus as a formidable barrier to interaction between DNA sequences at either end. Inter-chromosomal contacts are anchored by centromeres and include interactions among transfer RNA genes, among origins of early DNA replication and among sites where chromosomal breakpoints occur. Finally, we constructed a three-dimensional model of the yeast genome. Our findings provide a glimpse of the interface between the form and function of a eukaryotic genome.
A Picea abies Linkage Map Based on SNP Markers Identifies QTLs for Four Aspects of Resistance to Heterobasidion parviporum Infection

PubMed Central

Lind, Mårten; Källman, Thomas; Chen, Jun; Ma, Xiao-Fei; Bousquet, Jean; Morgante, Michele; Zaina, Giusi; Karlsson, Bo; Elfstrand, Malin; Lascoux, Martin; Stenlid, Jan

2014-01-01

A consensus linkage map of Picea abies, an economically important conifer, was constructed based on the segregation of 686 SNP markers in an F1 progeny population consisting of 247 individuals. The total length of 1889.2 cM covered 96.5% of the estimated genome length and comprised 12 large linkage groups, corresponding to the number of haploid P. abies chromosomes. The sizes of the groups (from 5.9 to 9.9% of the total map length) correlated well with previous estimates of chromosome sizes (from 5.8 to 10.8% of total genome size). Any locus in the genome has a 97% probability of being within 10 cM of a mapped marker, which makes the map suited for QTL mapping. Infecting the progeny trees with the root rot pathogen Heterobasidion parviporum allowed for mapping of four different resistance traits: lesion length at the inoculation site, fungal spread within the sapwood, exclusion of the pathogen from the host after initial infection, and ability to prevent the infection from establishing at all. These four traits were associated with two, four, four and three QTL regions, respectively, none of which overlapped between the traits. Each QTL explained between 4.6 and 10.1% of the respective trait's phenotypic variation. Although the QTL regions contain many more genes than the ones represented by the SNP markers, at least four markers within the confidence intervals originated from genes with known function in conifer defence: a leucoanthocyanidin reductase, which has previously been shown to be upregulated during H. parviporum infection, and three intermediates of the lignification process: a hydroxycinnamoyl CoA shikimate/quinate hydroxycinnamoyltransferase, a 4-coumarate CoA ligase, and an R2R3-MYB transcription factor. PMID:25036209

Identification of differentially expressed small non-coding RNAs in the legume endosymbiont Sinorhizobium meliloti by comparative genomics

PubMed Central

del Val, Coral; Rivas, Elena; Torres-Quesada, Omar; Toro, Nicolás; Jiménez-Zurdo, José I

2007-01-01

Bacterial small non-coding RNAs (sRNAs) are being recognized as novel widespread regulators of gene expression in response to environmental signals. Here, we present the first search for sRNA-encoding genes in the nitrogen-fixing endosymbiont Sinorhizobium meliloti, performed by a genome-wide computational analysis of its intergenic regions. Comparative sequence data from eight related α-proteobacteria were obtained, and the interspecies pairwise alignments were scored with the programs eQRNA and RNAz as complementary predictive tools to identify conserved and stable secondary structures corresponding to putative non-coding RNAs. Northern experiments confirmed that eight of the predicted loci, selected among the original 32 candidates as most probable sRNA genes, expressed small transcripts. This result supports the combined use of eQRNA and RNAz as a robust strategy to identify novel sRNAs in bacteria. Furthermore, seven of the transcripts accumulated differentially in free-living and symbiotic conditions. Experimental mapping of the 5′-ends of the detected transcripts revealed that their encoding genes are organized in autonomous transcription units with recognizable promoter and, in most cases, termination signatures. These findings suggest novel regulatory functions for sRNAs related to the interactions of α-proteobacteria with their eukaryotic hosts. PMID:17971083
Molecular mapping of QTLs for plant type and earliness traits in pigeonpea (Cajanus cajan L. Millsp.)

PubMed Central

2012-01-01

Background Pigeonpea is an important grain legume of the semi-arid tropics and sub-tropical regions where it plays a crucial role in the food and nutritional security of the people. The average productivity of pigeonpea has remained very low and stagnant for over five decades due to lack of genomic information and intensive breeding efforts. Previous SSR-based linkage maps of pigeonpea used inter-specific crosses due to low inter-varietal polymorphism. Here our aim was to construct a high density intra-specific linkage map using genic-SNP markers for mapping of major quantitative trait loci (QTLs) for key agronomic traits, including plant height, number of primary and secondary branches, number of pods, days to flowering and days to maturity in pigeonpea. Results A population of 186 F2:3 lines derived from an intra-specific cross between inbred lines ‘Pusa Dwarf’ and ‘HDM04-1’ was used to construct a dense molecular linkage map of 296 genic SNP and SSR markers covering a total adjusted map length of 1520.22 cM for the 11 chromosomes of the pigeonpea genome. This is the first dense intra-specific linkage map of pigeonpea with the highest genome length coverage. Phenotypic data from the F2:3 families were used to identify thirteen QTLs for the six agronomic traits. The proportion of phenotypic variance explained by the individual QTLs ranged from 3.18% to 51.4%. Ten of these QTLs were clustered in just two genomic regions, indicating pleiotropic effects or close genetic linkage. In addition to the main effects, significant epistatic interaction effects were detected between the QTLs for number of pods per plant. Conclusions A large amount of information on transcript sequences, SSR markers and draft genome sequence is now available for pigeonpea. However, there is need to develop high density linkage maps and identify genes/QTLs for important agronomic traits for practical breeding applications. 
This is the first report on identification of QTLs for plant type and maturity traits in pigeonpea. The QTLs identified in this study provide a strong foundation for further validation and fine mapping for utilization in the pigeonpea improvement. PMID:23043321
Allelic expression mapping across cellular lineages to establish impact of non-coding SNPs

PubMed Central

Adoue, Veronique; Schiavi, Alicia; Light, Nicholas; Almlöf, Jonas Carlsson; Lundmark, Per; Ge, Bing; Kwan, Tony; Caron, Maxime; Rönnblom, Lars; Wang, Chuan; Chen, Shu-Huang; Goodall, Alison H; Cambien, Francois; Deloukas, Panos; Ouwehand, Willem H; Syvänen, Ann-Christine; Pastinen, Tomi

2014-01-01

Most complex disease-associated genetic variants are located in non-coding regions and are therefore thought to be regulatory in nature. Association mapping of differential allelic expression (AE) is a powerful method to identify SNPs with direct cis-regulatory impact (cis-rSNPs). We used AE mapping to identify cis-rSNPs regulating gene expression in 55 and 63 HapMap lymphoblastoid cell lines from a Caucasian and an African population, respectively, 70 fibroblast cell lines, and 188 purified monocyte samples and found 40–60% of these cis-rSNPs to be shared across cell types. We uncover a new class of cis-rSNPs, which disrupt footprint-derived de novo motifs that are predominantly bound by repressive factors and are implicated in disease susceptibility through overlaps with GWAS SNPs. Finally, we provide the proof-of-principle for a new approach for genome-wide functional validation of transcription factor–SNP interactions. By perturbing NFκB action in lymphoblasts, we identified 489 cis-regulated transcripts with altered AE after NFκB perturbation. Altogether, we perform a comprehensive analysis of cis-variation in four cell populations and provide new tools for the identification of functional variants associated to complex diseases. PMID:25326100
Anatomy of a hash-based long read sequence mapping algorithm for next generation DNA sequencing.

PubMed

Misra, Sanchit; Agrawal, Ankit; Liao, Wei-keng; Choudhary, Alok

2011-01-15

Recently, a number of programs have been proposed for mapping short reads to a reference genome. Many of them are heavily optimized for short-read mapping and hence are very efficient for shorter queries, but that makes them inefficient or not applicable for reads longer than 200 bp. However, many sequencers are already generating longer reads and more are expected to follow. For long read sequence mapping, there are limited options; BLAT, SSAHA2, FANGS and BWA-SW are among the popular ones. However, resequencing and personalized medicine need much faster software to map these long sequencing reads to a reference genome to identify SNPs or rare transcripts. We present AGILE (AliGnIng Long rEads), a hash table based high-throughput sequence mapping algorithm for longer 454 reads that uses diagonal multiple seed-match criteria, customized q-gram filtering and a dynamic incremental search approach among other heuristics to optimize every step of the mapping process. In our experiments, we observe that AGILE is more accurate than BLAT, and comparable to BWA-SW and SSAHA2. For practical error rates (< 5%) and read lengths (200-1000 bp), AGILE is significantly faster than BLAT, SSAHA2 and BWA-SW. Even for the other cases, AGILE is comparable to BWA-SW and several times faster than BLAT and SSAHA2. http://www.ece.northwestern.edu/~smi539/agile.html.
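The general idea behind hash-table read mappers of this kind (index the reference by q-grams, look up the read's q-grams as seeds, and require several seeds to agree on one diagonal) can be shown with a toy Python sketch. This is not AGILE's implementation: `build_index`, `map_read`, the choice q = 5, the `min_seeds` threshold and the toy reference are assumptions made for illustration, and AGILE's customized q-gram filtering and dynamic incremental search are not reproduced.

```python
from collections import Counter, defaultdict

def build_index(reference, q):
    """Hash table mapping each q-gram to the reference positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - q + 1):
        index[reference[pos:pos + q]].append(pos)
    return index

def map_read(read, index, q, min_seeds=2):
    """Return (reference_offset, seed_count) for the best-supported diagonal.

    A seed at read position i that hits reference position pos votes for the
    diagonal pos - i; seeds from a true alignment share one diagonal.
    Returns None when no diagonal collects at least min_seeds votes.
    """
    diagonals = Counter()
    for i in range(len(read) - q + 1):
        for pos in index.get(read[i:i + q], ()):
            diagonals[pos - i] += 1
    if not diagonals:
        return None
    offset, hits = diagonals.most_common(1)[0]
    return (offset, hits) if hits >= min_seeds else None

reference = "ACGTTGCATGCCGATAGCTAGGCTTACGATCGATCGGATTACAGCT"
index = build_index(reference, q=5)

read = reference[10:30]                      # exact 20-base substring
read_with_error = read[:8] + "A" + read[9:]  # one substitution (T -> A)

print(map_read(read, index, q=5))
print(map_read(read_with_error, index, q=5))
```

A single substitution destroys only the q-grams that overlap it, so the remaining seeds still vote for the correct diagonal; this is why multiple-seed criteria tolerate the sequencing error rates mentioned in the abstract.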
Genome-Wide Identification and Expression Analysis of WRKY Gene Family in Capsicum annuum L.

PubMed Central

Diao, Wei-Ping; Snyder, John C.; Wang, Shu-Bin; Liu, Jin-Bing; Pan, Bao-Gui; Guo, Guang-Jun; Wei, Ge

2016-01-01

The WRKY family of transcription factors is one of the most important families of plant transcriptional regulators, with members regulating multiple biological processes, especially defense against biotic and abiotic stresses. However, little information is available about WRKYs in pepper (Capsicum annuum L.). The recent release of completely assembled genome sequences of pepper allowed us to perform a genome-wide investigation of pepper WRKY proteins. In the present study, a total of 71 WRKY genes were identified in the pepper genome. According to structural features of their encoded proteins, the pepper WRKY genes (CaWRKY) were classified into three main groups, with the second group further divided into five subgroups. Genome mapping analysis revealed that CaWRKY genes were enriched on four chromosomes, especially on chromosome 1, and 15.5% of the family members were tandemly duplicated genes. A phylogenetic tree was constructed based on WRKY domain sequences derived from pepper and Arabidopsis. The expression of 21 selected CaWRKY genes in response to seven different biotic and abiotic stresses (salt, heat shock, drought, Phytophthora capsici, SA, MeJA, and ABA) was evaluated by quantitative RT-PCR; some CaWRKYs were highly expressed and up-regulated by stress treatment. Our results will provide a platform for functional identification and molecular breeding studies of WRKY genes in pepper. PMID:26941768
Complementary genetic and genomic approaches help characterize the linkage group I seed protein QTL in soybean

PubMed Central

2010-01-01

Background The nutritional and economic value of many crops is effectively a function of seed protein and oil content. Insight into the genetic and molecular control mechanisms involved in the deposition of these constituents in the developing seed is needed to guide crop improvement. A quantitative trait locus (QTL) on Linkage Group I (LG I) of soybean (Glycine max (L.) Merrill) has a striking effect on seed protein content. Results A soybean near-isogenic line (NIL) pair contrasting in seed protein and differing in an introgressed genomic segment containing the LG I protein QTL was used as a resource to demarcate the QTL region and to study variation in transcript abundance in developing seed. The LG I QTL region was delineated to less than 8.4 Mbp of genomic sequence on chromosome 20. Using Affymetrix® Soy GeneChip and high-throughput Illumina® whole transcriptome sequencing platforms, 13 genes displaying significant seed transcript accumulation differences between NILs were identified that mapped to the 8.4 Mbp LG I protein QTL region. Conclusions This study identifies gene candidates at the LG I protein QTL for potential involvement in the regulation of protein content in the soybean seed. The results demonstrate the power of complementary approaches to characterize contrasting NILs and provide genome-wide transcriptome insight towards understanding seed biology and the soybean genome. PMID:20199683
Variant Histone H2A.Z Is Globally Localized to the Promoters of Inactive Yeast Genes and Regulates Nucleosome Positioning

PubMed Central

Gévry, Nicolas; Adam, Maryse; Blanchette, Mathieu

2005-01-01

H2A.Z is an evolutionarily conserved histone variant involved in transcriptional regulation, antisilencing, silencing, and genome stability. The mechanism(s) by which H2A.Z regulates these various biological functions remains poorly defined, in part due to the lack of knowledge regarding its physical location along chromosomes and the bearing it has in regulating chromatin structure. Here we mapped H2A.Z across the yeast genome at an approximately 300-bp resolution, using chromatin immunoprecipitation combined with tiling microarrays. We have identified 4,862 small regions (typically one or two nucleosomes wide) decorated with H2A.Z. Those “Z loci” are predominantly found within specific nucleosomes in the promoter of inactive genes all across the genome. Furthermore, we have shown that H2A.Z can regulate nucleosome positioning at the GAL1 promoter. Within HZAD domains, the regions where H2A.Z shows an antisilencing function, H2A.Z is localized in a wider pattern, suggesting that the variant histone regulates antisilencing and transcriptional activation via different mechanisms. Our data suggest that the incorporation of H2A.Z into specific promoter-bound nucleosomes configures chromatin structure to poise genes for transcriptional activation. The relevance of these findings to higher eukaryotes is discussed. PMID:16248679
Genome-wide transcriptional responses of Alteromonas naphthalenivorans SN2 to contaminated seawater and marine tidal flat sediment

PubMed Central

Jin, Hyun Mi; Jeong, Hye Im; Kim, Kyung Hyun; Hahn, Yoonsoo; Madsen, Eugene L.; Jeon, Che Ok

2016-01-01

A genome-wide transcriptional analysis of Alteromonas naphthalenivorans SN2 was performed to investigate its ecophysiological behavior in contaminated tidal flats and seawater. The experimental design mimicked these habitats, with either naphthalene or pyruvate added: tidal flat-naphthalene (TF-N), tidal flat-pyruvate (TF-P), seawater-naphthalene (SW-N), and seawater-pyruvate (SW-P). The transcriptional profiles clustered by habitat (TF-N/TF-P and SW-N/SW-P), rather than carbon source, suggesting that the former may exert a greater influence on genome-wide expression in strain SN2 than the latter. Metabolic mapping of cDNA reads from strain SN2 based on KEGG pathways showed that metabolic and regulatory genes associated with energy metabolism, translation, and cell motility were highly expressed in all four test conditions, probably highlighting the copiotrophic properties of strain SN2 as an opportunistic marine r-strategist. Differential gene expression analysis revealed that strain SN2 displayed specific cellular responses to environmental variables (tidal flat, seawater, naphthalene, and pyruvate) and exhibited certain ecological fitness traits: its notable PAH degradation capability in seasonally cold tidal flats might be reflected in elevated expression of stress response and chaperone proteins, while fast growth in nitrogen-deficient and aerobic seawater probably correlated with high expression of glutamine synthetase, enzymes utilizing nitrite/nitrate, and those involved in the removal of reactive oxygen species. PMID:26887987
Deep transcriptome sequencing provides new insights into the structural and functional organization of the wheat genome.

PubMed

Pingault, Lise; Choulet, Frédéric; Alberti, Adriana; Glover, Natasha; Wincker, Patrick; Feuillet, Catherine; Paux, Etienne

2015-02-10

Because of its size, allohexaploid nature, and high repeat content, the bread wheat genome is a good model to study the impact of the genome structure on gene organization, function, and regulation. However, because of the lack of a reference genome sequence, such studies have long been hampered and our knowledge of the wheat gene space is still limited. The access to the reference sequence of the wheat chromosome 3B provided us with an opportunity to study the wheat transcriptome and its relationships to genome and gene structure at a level that has never been reached before. By combining this sequence with RNA-seq data, we construct a fine transcriptome map of the chromosome 3B. More than 8,800 transcription sites are identified, that are distributed throughout the entire chromosome. Expression level, expression breadth, alternative splicing as well as several structural features of genes, including transcript length, number of exons, and cumulative intron length are investigated. Our analysis reveals a non-monotonic relationship between gene expression and structure and leads to the hypothesis that gene structure is determined by its function, whereas gene expression is subject to energetic cost. Moreover, we observe a recombination-based partitioning at the gene structure and function level. Our analysis provides new insights into the relationships between gene and genome structure and function. It reveals mechanisms conserved with other plant species as well as superimposed evolutionary forces that shaped the wheat gene space, likely participating in wheat adaptation.
ANISEED 2017: extending the integrated ascidian database to the exploration and evolutionary comparison of genome-scale datasets

PubMed Central

Brozovic, Matija; Dantec, Christelle; Dardaillon, Justine; Dauga, Delphine; Faure, Emmanuel; Gineste, Mathieu; Louis, Alexandra; Naville, Magali; Nitta, Kazuhiro R; Piette, Jacques; Reeves, Wendy; Scornavacca, Céline; Simion, Paul; Vincentelli, Renaud; Bellec, Maelle; Aicha, Sameh Ben; Fagotto, Marie; Guéroult-Bellone, Marion; Haeussler, Maximilian; Jacox, Edwin; Lowe, Elijah K; Mendez, Mickael; Roberge, Alexis; Stolfi, Alberto; Yokomori, Rui; Cambillau, Christian; Christiaen, Lionel; Delsuc, Frédéric; Douzery, Emmanuel; Dumollard, Rémi; Kusakabe, Takehiro; Nakai, Kenta; Nishida, Hiroki; Satou, Yutaka; Swalla, Billie; Veeman, Michael; Volff, Jean-Nicolas

2018-01-01

Abstract ANISEED (www.aniseed.cnrs.fr) is the main model organism database for tunicates, the sister-group of vertebrates. This release gives access to annotated genomes, gene expression patterns, and anatomical descriptions for nine ascidian species. It provides increased integration with external molecular and taxonomy databases, better support for epigenomics datasets, in particular RNA-seq, ChIP-seq and SELEX-seq, and features novel interactive interfaces for existing and novel datatypes. In particular, the cross-species navigation and comparison is enhanced through a novel taxonomy section describing each represented species and through the implementation of interactive phylogenetic gene trees for 60% of tunicate genes. The gene expression section displays the results of RNA-seq experiments for the three major model species of solitary ascidians. Gene expression is controlled by the binding of transcription factors to cis-regulatory sequences. A high-resolution description of the DNA-binding specificity for 131 Ciona robusta (formerly C. intestinalis type A) transcription factors by SELEX-seq is provided and used to map candidate binding sites across the Ciona robusta and Phallusia mammillata genomes. Finally, use of a WashU Epigenome browser enhances genome navigation, while a Genomicus server was set up to explore microsynteny relationships within tunicates and with vertebrates, Amphioxus, echinoderms and hemichordates. PMID:29149270
Calcium Signaling Pathway Genes RUNX2 and CACNA1C Are Associated With Calcific Aortic Valve Disease

PubMed Central

Guauque-Olarte, Sandra; Messika-Zeitoun, David; Droit, Arnaud; Lamontagne, Maxime; Tremblay-Marchand, Joël; Lavoie-Charland, Emilie; Gaudreault, Nathalie; Arsenault, Benoit J.; Dubé, Marie-Pierre; Tardif, Jean-Claude; Body, Simon C.; Seidman, Jonathan G.; Boileau, Catherine; Mathieu, Patrick; Pibarot, Philippe; Bossé, Yohan

2016-01-01

Background Calcific aortic valve stenosis (AS) is a life-threatening disease with no medical therapy. The genetic architecture of AS remains elusive. This study combines genome-wide association studies, gene expression, and expression quantitative trait loci mapping in human valve tissues to identify susceptibility genes of AS. Methods and Results A meta-analysis was performed combining the results of 2 genome-wide association studies in 474 and 486 cases from Quebec City (Canada) and Paris (France), respectively. Corresponding controls consisted of 2988 and 1864 individuals with European ancestry from the database of genotypes and phenotypes. mRNA expression levels were evaluated in 9 calcified and 8 normal aortic valves by RNA sequencing. The results were integrated with valve expression quantitative trait loci data obtained from 22 AS patients. Twenty-five single-nucleotide polymorphisms had P<5×10⁻⁶ in the genome-wide association studies meta-analysis. The calcium signaling pathway was the top gene set enriched for genes mapped to moderately AS-associated single-nucleotide polymorphisms. Genes in this pathway were found differentially expressed in valves with and without AS. Two single-nucleotide polymorphisms located in RUNX2 (runt-related transcription factor 2), encoding an osteogenic transcription factor, demonstrated some association with AS (genome-wide association studies P=5.33×10⁻⁵). The mRNA expression levels of RUNX2 were upregulated in calcified valves and associated with eQTL-SNPs. CACNA1C encoding a subunit of a voltage-dependent calcium channel was upregulated in calcified valves. The eQTL-SNP with the most significant association with AS located in CACNA1C was associated with higher expression of the gene. Conclusions This integrative genomic study confirmed the role of RUNX2 as a potential driver of AS and identified a new AS susceptibility gene, CACNA1C, belonging to the calcium signaling pathway. PMID:26553695
Gene-based SNP discovery in tepary bean (Phaseolus acutifolius) and common bean (P. vulgaris) for diversity analysis and comparative mapping.

PubMed

Gujaria-Verma, Neha; Ramsay, Larissa; Sharpe, Andrew G; Sanderson, Lacey-Anne; Debouck, Daniel G; Tar'an, Bunyamin; Bett, Kirstin E

2016-03-15

Common bean (Phaseolus vulgaris) is an important grain legume and there has been a recent resurgence in interest in its relative, tepary bean (P. acutifolius), owing to this species' ability to better withstand abiotic stresses. Genomic resources are scarce for this minor crop species and a better knowledge of the genome-level relationship between these two species would facilitate improvement in both. High-throughput genotyping has facilitated large-scale single nucleotide polymorphism (SNP) identification leading to the development of molecular markers with associated sequence information that can be used to place them in the context of a full genome assembly. Transcript-based SNPs were identified from six common bean and two tepary bean accessions and a subset were used to generate a 768-SNP Illumina GoldenGate assay for each species. The tepary bean assay was used to assess diversity in wild and cultivated tepary bean and to generate the first gene-based map of the tepary bean genome. Genotypic analyses of the diversity panel showed a clear separation between domesticated and cultivated tepary beans, two distinct groups within the domesticated types, and P. parvifolius was confirmed to be distinct. The genetic map of tepary bean was compared to the common bean genome assembly to demonstrate high levels of collinearity between the two species with differences limited to a few intra-chromosomal rearrangements. The development of the first set of genomic resources specifically for tepary bean has allowed for greater insight into the structure of this species and its relationship to its agriculturally more prominent relative, common bean. These resources will be helpful in the development of efficient breeding strategies for both species and will facilitate the introgression of agriculturally important traits from one crop into the other.
Transposon Variants and Their Effects on Gene Expression in Arabidopsis

PubMed Central

Wang, Xi; Weigel, Detlef; Smith, Lisa M.

2013-01-01

Transposable elements (TEs) make up the majority of many plant genomes. Their transcription and transposition is controlled through siRNAs and epigenetic marks including DNA methylation. To dissect the interplay of siRNA–mediated regulation and TE evolution, and to examine how TE differences affect nearby gene expression, we investigated genome-wide differences in TEs, siRNAs, and gene expression among three Arabidopsis thaliana accessions. Both TE sequence polymorphisms and presence of linked TEs are positively correlated with intraspecific variation in gene expression. The expression of genes within 2 kb of conserved TEs is more stable than that of genes next to variant TEs harboring sequence polymorphisms. Polymorphism levels of TEs and closely linked adjacent genes are positively correlated as well. We also investigated the distribution of 24-nt-long siRNAs, which mediate TE repression. TEs targeted by uniquely mapping siRNAs are on average farther from coding genes, apparently because they more strongly suppress expression of adjacent genes. Furthermore, siRNAs, and especially uniquely mapping siRNAs, are enriched in TE regions missing in other accessions. Thus, targeting by uniquely mapping siRNAs appears to promote sequence deletions in TEs. Overall, our work indicates that siRNA–targeting of TEs may influence removal of sequences from the genome and hence evolution of gene expression in plants. PMID:23408902
Fine mapping of the celiac disease-associated LPP locus reveals a potential functional variant.

PubMed

Almeida, Rodrigo; Ricaño-Ponce, Isis; Kumar, Vinod; Deelen, Patrick; Szperl, Agata; Trynka, Gosia; Gutierrez-Achury, Javier; Kanterakis, Alexandros; Westra, Harm-Jan; Franke, Lude; Swertz, Morris A; Platteel, Mathieu; Bilbao, Jose Ramon; Barisani, Donatella; Greco, Luigi; Mearin, Luisa; Wolters, Victorien M; Mulder, Chris; Mazzilli, Maria Cristina; Sood, Ajit; Cukrowska, Bozena; Núñez, Concepción; Pratesi, Riccardo; Withoff, Sebo; Wijmenga, Cisca

2014-05-01

Using the Immunochip for genotyping, we identified 39 non-human leukocyte antigen (non-HLA) loci associated with celiac disease (CeD), an immune-mediated disease with a worldwide frequency of ∼1%. The most significant non-HLA signal mapped to the intronic region of 70 kb in the LPP gene. Our aim was to fine map and identify possible functional variants in the LPP locus. We performed a meta-analysis in a cohort of 25,169 individuals from six different populations previously genotyped using Immunochip. Imputation using data from the Genome of the Netherlands and 1000 Genomes projects, followed by meta-analysis, confirmed the strong association signal on the LPP locus (rs2030519, P = 1.79 × 10⁻⁴⁹), without any novel associations. The conditional analysis on this top SNP indicated association with a single common haplotype. By performing haplotype analyses in each population separately, as well as in a combined group of the four populations that reached the significance threshold after correction (P < 0.008), we narrowed down the CeD-associated region from 70 to 2.8 kb (P = 1.35 × 10⁻⁴⁴). By intersecting regulatory data from the ENCODE project, we found a functional SNP, rs4686484 (P = 3.12 × 10⁻⁴⁹), that maps to several B-cell enhancer elements and a highly conserved region. This SNP was also predicted to change the binding motif of the transcription factors IRF4, IRF11, Nkx2.7 and Nkx2.9, suggesting its role in transcriptional regulation. We later found significantly lower levels of LPP mRNA in CeD biopsies compared with controls; thus, our results suggest that rs4686484 is the functional variant in this locus, while LPP expression is decreased in CeD.
ChIP-seq Accurately Predicts Tissue-Specific Activity of Enhancers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Visel, Axel; Blow, Matthew J.; Li, Zirong

2009-02-01

A major yet unresolved quest in decoding the human genome is the identification of the regulatory sequences that control the spatial and temporal expression of genes. Distant-acting transcriptional enhancers are particularly challenging to uncover since they are scattered amongst the vast non-coding portion of the genome. Evolutionary sequence constraint can facilitate the discovery of enhancers, but fails to predict when and where they are active in vivo. Here, we performed chromatin immunoprecipitation with the enhancer-associated protein p300, followed by massively-parallel sequencing, to map several thousand in vivo binding sites of p300 in mouse embryonic forebrain, midbrain, and limb tissue. We tested 86 of these sequences in a transgenic mouse assay, which in nearly all cases revealed reproducible enhancer activity in those tissues predicted by p300 binding. Our results indicate that in vivo mapping of p300 binding is a highly accurate means for identifying enhancers and their associated activities and suggest that such datasets will be useful to study the role of tissue-specific enhancers in human biology and disease on a genome-wide scale.
Transcriptionally active LTR retrotransposons in Eucalyptus genus are differentially expressed and insertionally polymorphic.

PubMed

Marcon, Helena Sanches; Domingues, Douglas Silva; Silva, Juliana Costa; Borges, Rafael Junqueira; Matioli, Fábio Filippi; Fontes, Marcos Roberto de Mattos; Marino, Celso Luis

2015-08-14

In the genus Eucalyptus, studies on genome composition and transposable elements (TEs) are particularly scarce. Nearly half of the recently released Eucalyptus grandis genome is composed of retrotransposons, and these data provide an important opportunity to understand TE dynamics in the Eucalyptus genome and transcriptome. We characterized nine families of transcriptionally active LTR retrotransposons from the Copia and Gypsy superfamilies in the Eucalyptus grandis genome and we depicted genomic distribution and copy number in two Eucalyptus species. We also evaluated genomic polymorphism and transcriptional profile in three organs of five Eucalyptus species. We observed contrasting genomic and transcriptional behavior in the same family among different species. RLC_egMax_1 was the most prevalent family and RLC_egAngela_1 was the family with the lowest copy number. Most families of both superfamilies have insertions dated to less than 3 million years ago, except one Copia family, RLC_egBianca_1. Protein theoretical models suggest different properties between Copia and Gypsy domains. IRAP and REMAP markers suggested genomic polymorphisms among Eucalyptus species. Using EST analysis and qRT-PCRs, we observed transcriptional activity in several tissues and in all evaluated species. In some families, osmotic stress increases transcript values. Our strategy was successful in isolating transcriptionally active retrotransposons in Eucalyptus, and each family has a particular genomic and transcriptional pattern. Overall, our results show that retrotransposon activity has differentially affected genome and transcriptome among Eucalyptus species.
“A draft Musa balbisiana genome sequence for molecular genetics in polyploid, inter- and intra-specific Musa hybrids”

PubMed Central

2013-01-01

Background Modern banana cultivars are primarily interspecific triploid hybrids of two species, Musa acuminata and Musa balbisiana, which respectively contribute the A- and B-genomes. The M. balbisiana genome has been associated with improved vigour and tolerance to biotic and abiotic stresses and is thus a target for Musa breeding programs. However, while a reference M. acuminata genome has recently been released (Nature 488:213–217, 2012), little sequence data is available for the corresponding B-genome. To address these problems we carried out Next Generation gDNA sequencing of the wild diploid M. balbisiana variety ‘Pisang Klutuk Wulung’ (PKW). Our strategy was to align PKW gDNA reads against the published A-genome and to extract the mapped consensus sequences for subsequent rounds of evaluation and gene annotation. Results The resulting B-genome is 79% the size of the A-genome, and contains 36,638 predicted functional gene sequences which is nearly identical to the 36,542 of the A-genome. There is substantial sequence divergence from the A-genome at a frequency of 1 homozygous SNP per 23.1 bp, and a high degree of heterozygosity corresponding to one heterozygous SNP per 55.9 bp. Using expressed small RNA data, a similar number of microRNA sequences were predicted in both A- and B-genomes, but additional novel miRNAs were detected, including some that are unique to each genome. The usefulness of this B-genome sequence was evaluated by mapping RNA-seq data from a set of triploid AAA and AAB hybrids simultaneously to both genomes. Results for the plantains demonstrated the expected 2:1 distribution of reads across the A- and B-genomes, but for the AAA genomes, results show they contain regions of significant homology to the B-genome supporting proposals that there has been a history of interspecific recombination between homeologous A and B chromosomes in Musa hybrids. 
Conclusions We have generated and annotated a draft reference Musa B-genome and demonstrate that this can be used for molecular genetic mapping of gene transcripts and small RNA expression data from several allopolyploid banana cultivars. This draft therefore represents a valuable resource to support the study of metabolism in inter- and intraspecific triploid Musa hybrids and to help direct breeding programs. PMID:24094114
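The "expected 2:1 distribution" of reads for the plantains follows from genome dosage: an AAB triploid carries two A-genome copies for every B-genome copy. The lines below are a minimal sketch of that expectation, assuming every genome copy contributes equally to the RNA-seq library (an idealisation not stated in the abstract). The function name `expected_read_fractions` is hypothetical and not part of the study.

```python
from collections import Counter
from fractions import Fraction


def expected_read_fractions(genome_formula: str) -> dict[str, Fraction]:
    """Expected share of reads per subgenome for a formula such as 'AAB'.

    Assumption: each genome copy contributes equally to the library.
    """
    copies = Counter(genome_formula)
    total = sum(copies.values())
    return {genome: Fraction(n, total) for genome, n in copies.items()}


# Plantain (AAB): ratio of A-genome to B-genome reads under the assumption.
plantain = expected_read_fractions("AAB")
print(plantain["A"] / plantain["B"])
```

Under the same assumption an AAA cultivar would be expected to give A-genome reads only, which is why B-genome homology in the AAA hybrids stands out in the reported results.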
Regulation of cotton (Gossypium hirsutum) drought responses by mitogen-activated protein (MAP) kinase cascade-mediated phosphorylation of GhWRKY59.

PubMed

Li, Fangjun; Li, Maoying; Wang, Ping; Cox, Kevin L; Duan, Liusheng; Dever, Jane K; Shan, Libo; Li, Zhaohu; He, Ping

2017-09-01

Drought is a key limiting factor for cotton (Gossypium spp.) production, as more than half of the global cotton supply is grown in regions with high water shortage. However, the underlying mechanism of the response of cotton to drought stress remains elusive. By combining genome-wide transcriptome profiling and a loss-of-function screen using virus-induced gene silencing, we identified Gossypium hirsutum GhWRKY59 as an important transcription factor that regulates the drought stress response in cotton. Biochemical and genetic analyses revealed a drought stress-activated mitogen-activated protein (MAP) kinase cascade consisting of GhMAP3K15-Mitogen-activated Protein Kinase Kinase 4 (GhMKK4)-Mitogen-activated Protein Kinase 6 (GhMPK6) that directly phosphorylates GhWRKY59 at residue serine 221. Interestingly, GhWRKY59 is required for dehydration-induced expression of GhMAP3K15, constituting a positive feedback loop of GhWRKY59-regulated MAP kinase activation in response to drought stress. Moreover, GhWRKY59 directly binds to the W-boxes of DEHYDRATION-RESPONSIVE ELEMENT-BINDING PROTEIN 2 (GhDREB2), which encodes a dehydration-inducible transcription factor regulating the plant hormone abscisic acid (ABA)-independent drought response. Our study identified a complete MAP kinase cascade that phosphorylates and activates a key WRKY transcription factor, and elucidated a regulatory module, consisting of GhMAP3K15-GhMKK4-GhMPK6-GhWRKY59-GhDREB2, that is involved in controlling the cotton drought response. © 2017 The Authors. New Phytologist © 2017 New Phytologist Trust.
Prdm5 Regulates Collagen Gene Transcription by Association with RNA Polymerase II in Developing Bone

PubMed Central

Galli, Giorgio Giacomo; Honnens de Lichtenberg, Kristian; Carrara, Matteo; Hans, Wolfgang; Wuelling, Manuela; Mentz, Bettina; Multhaupt, Hinke Arnolda; Fog, Cathrine Kolster; Jensen, Klaus Thorleif; Rappsilber, Juri; Vortkamp, Andrea; Coulton, Les; Fuchs, Helmut; Gailus-Durner, Valérie; Hrabě de Angelis, Martin; Calogero, Raffaele Adolfo; Couchman, John Robert; Lund, Anders Henrik

2012-01-01

PRDM family members are transcriptional regulators involved in tissue specific differentiation. PRDM5 has been reported to predominantly repress transcription, but a characterization of its molecular functions in a relevant biological context is lacking. We demonstrate here that Prdm5 is highly expressed in developing bones; and, by genome-wide mapping of Prdm5 occupancy in pre-osteoblastic cells, we uncover a novel and unique role for Prdm5 in targeting all mouse collagen genes as well as several SLRP proteoglycan genes. In particular, we show that Prdm5 controls both Collagen I transcription and fibrillogenesis by binding inside the Col1a1 gene body and maintaining RNA polymerase II occupancy. In vivo, Prdm5 loss results in delayed ossification involving a pronounced impairment in the assembly of fibrillar collagens. Collectively, our results define a novel role for Prdm5 in sustaining the transcriptional program necessary to the proper assembly of osteoblastic extracellular matrix. PMID:22589746

Regulated expression of a novel TCP domain transcription factor indicates an involvement in the control of meristem activation processes in Solanum tuberosum.

PubMed

Faivre-Rampant, Odile; Bryan, Glenn J; Roberts, Alison G; Milbourne, Daniel; Viola, Roberto; Taylor, Mark A

2004-04-01

In this study, the aim was to determine whether TCP transcription factors are implicated in meristem activation in potato (Solanum tuberosum). By searching a database of potato EST sequences with a sequence characteristically conserved in TCP domains, a potato tcp gene was identified. A BAC clone containing the tcp sequence was isolated and the genomic sequence was determined. Using a CAPS marker assay, the potato tcp gene (sttcp1) was mapped to chromosome 8. In dormant buds, relatively high levels of sttcp1-specific transcript were detected by in situ hybridization. By contrast, in sprouting buds, no expression of sttcp1 could be detected. Furthermore, an inverse relationship between axillary bud size and the steady-state level of the sttcp1 transcript was demonstrated. In non-growing buds exhibiting correlative inhibition, sttcp1-specific transcript levels were also relatively high, but rapidly decreased when apical dominance was removed by excision of the apical bud.
Transcription arrest by a G quadruplex forming-trinucleotide repeat sequence from the human c-myb gene.

PubMed

Broxson, Christopher; Beckett, Joshua; Tornaletti, Silvia

2011-05-17

Noncanonical DNA structures correspond to genomic regions particularly susceptible to genetic instability. The transcription process facilitates formation of these structures and plays a major role in generating the instability associated with these genomic sites. However, little is known about how noncanonical structures are processed when encountered by an elongating RNA polymerase. Here we have studied the behavior of T7 RNA polymerase (T7 RNAP) when encountering a G quadruplex-forming (GGA)4 repeat located in the human c-myb proto-oncogene. To make direct correlations between formation of the structure and effects on transcription, we have taken advantage of the ability of the T7 polymerase to transcribe single-stranded substrates and of G4 DNA to form in single-stranded G-rich sequences in the presence of potassium ions. Under physiological KCl concentrations, we found that T7 RNAP transcription was arrested at two sites that mapped to the c-myb (GGA)4 repeat sequence. The extent of arrest did not change with time, indicating that the c-myb repeat represented an absolute block and not a transient pause to T7 RNAP. Consistent with G4 DNA formation, arrest was not observed in the absence of KCl or in the presence of LiCl. Furthermore, mutations in the c-myb (GGA)4 repeat, expected to prevent transition to G4, also eliminated the transcription block. We show T7 RNAP arrest at the c-myb repeat in double-stranded DNA under conditions mimicking the cellular concentration of biomolecules and potassium ions, suggesting that the G4 structure formed in the c-myb repeat may represent a transcription roadblock in vivo. Our results support a mechanism of transcription-coupled DNA repair initiated by arrest of transcription at G4 structures.
Genomic organization of the Neurospora crassa gsn gene: possible involvement of the STRE and HSE elements in the modulation of transcription during heat shock.

PubMed

Freitas, F Zanolli; Bertolini, M C

2004-12-01

Glycogen synthase, an enzyme involved in glycogen biosynthesis, is regulated by phosphorylation and by the allosteric ligand glucose-6-phosphate (G6P). In addition, enzyme levels can be regulated by changes in gene expression. We recently cloned a cDNA for glycogen synthase (gsn) from Neurospora crassa, and showed that gsn transcription decreased when cells were exposed to heat shock (shifted from 30 °C to 45 °C). In order to understand the mechanisms that control gsn expression, we isolated the gene, including its 5' and 3' flanking regions, from the genome of N. crassa. An ORF of approximately 2.4 kb was identified, which is interrupted by four small introns (II-V). Intron I (482 bp) is located in the 5'UTR region. Three putative Transcription Initiation Sites (TISs) were mapped, one of which lies downstream of a canonical TATA-box sequence (5'-TGTATAAA-3'). Analysis of the 5'-flanking region revealed the presence of putative transcription factor-binding sites, including Heat Shock Elements (HSEs) and STress Responsive Elements (STREs). The possible involvement of these motifs in the negative regulation of gsn transcription was investigated using Electrophoretic Mobility Shift Assays (EMSA) with nuclear extracts of N. crassa mycelium obtained before and after heat shock, and DNA fragments encompassing HSE and STRE elements from the 5'-flanking region. While elements within the promoter region are involved in transcription under heat shock, elements in the 5'UTR intron may participate in transcription during vegetative growth. The results thus suggest that N. crassa possesses trans-acting elements that interact with the 5'-flanking region to regulate gsn transcription during heat shock and vegetative growth.
Genome-wide DNase hypersensitivity, and occupancy of RUNX2 and CTCF reveal a highly dynamic gene regulome during MC3T3 pre-osteoblast differentiation.

PubMed

Tai, Phillip W L; Wu, Hai; van Wijnen, André J; Stein, Gary S; Stein, Janet L; Lian, Jane B

2017-01-01

The ability to discover regulatory sequences that control bone-related genes during development has been greatly improved by massively parallel sequencing methodologies. To expand our understanding of cis-regulatory regions critical to the control of gene expression during osteoblastogenesis, we probed the presence of open chromatin states across the osteoblast genome using global DNase hypersensitivity (DHS) mapping. Our profiling of MC3T3 mouse pre-osteoblasts during differentiation has identified more than 224,000 unique DHS sites. Approximately 65% of these sites are dynamic during temporal stages of osteoblastogenesis, and a majority of them are located within non-promoter (intergenic and intronic) regions. Nearly half of all DHS sites (both constitutive and dynamic) overlap binding events of the bone-essential RUNX2 and/or the chromatin-related CTCF transcription factors. This finding reinforces the role of these regulatory proteins as essential components of the bone gene regulome. We observe a reduction in chromatin accessibility throughout the genome between pre-osteoblast and early osteoblasts. Our analysis also defined a class of differentially expressed genes that harbor DHS peaks centered within 1 kb downstream of transcriptional end sites (TES). These DHSs at the 3'-flanks of genes exhibit dynamic changes during differentiation that may impact regulation of the osteoblast genome. Taken together, the distribution of DHS regions within non-promoter locations harboring osteoblast and chromatin related transcription factor binding motifs, reflect novel cis-regulatory requirements to support temporal gene expression in differentiating osteoblasts.
Global transcriptional start site mapping using differential RNA sequencing reveals novel antisense RNAs in Escherichia coli.

PubMed

Thomason, Maureen K; Bischler, Thorsten; Eisenbart, Sara K; Förstner, Konrad U; Zhang, Aixia; Herbig, Alexander; Nieselt, Kay; Sharma, Cynthia M; Storz, Gisela

2015-01-01

While the model organism Escherichia coli has been the subject of intense study for decades, the full complement of its RNAs is only now being examined. Here we describe a survey of the E. coli transcriptome carried out using a differential RNA sequencing (dRNA-seq) approach, which can distinguish between primary and processed transcripts, and an automated prediction algorithm for transcriptional start sites (TSS). With the criterion of expression under at least one of three growth conditions examined, we predicted 14,868 TSS candidates, including 5,574 internal to annotated genes (iTSS) and 5,495 TSS corresponding to potential antisense RNAs (asRNAs). We examined expression of 14 candidate asRNAs by Northern analysis using RNA from wild-type E. coli and from strains defective for RNases III and E, two RNases reported to be involved in asRNA processing. Interestingly, nine asRNAs detected as distinct bands by Northern analysis were differentially affected by the rnc and rne mutations. We also compared our asRNA candidates with previously published asRNA annotations from RNA-seq data and discuss the challenges associated with these cross-comparisons. Our global transcriptional start site map represents a valuable resource for identification of transcription start sites, promoters, and novel transcripts in E. coli and is easily accessible, together with the cDNA coverage plots, in an online genome browser. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
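The iTSS and asRNA categories in the record above can be pictured as a position-and-strand test against the gene annotation. The sketch below is a simplified illustration and not the study's prediction algorithm. The `Gene` record, the `classify_tss` function, the coordinates, and the rule itself (a same-strand TSS inside a gene is "internal", an opposite-strand TSS inside a gene is "antisense") are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def classify_tss(position: int, strand: str, genes: list[Gene]) -> str:
    """Label a TSS as 'internal', 'antisense', or 'other' (simplified rule)."""
    labels = set()
    for gene in genes:
        if gene.start <= position <= gene.end:
            labels.add("internal" if strand == gene.strand else "antisense")
    # Prefer 'internal' when a TSS overlaps genes on both strands.
    if "internal" in labels:
        return "internal"
    if "antisense" in labels:
        return "antisense"
    return "other"


# Hypothetical annotation and TSS positions, for illustration only.
genes = [Gene("geneA", 100, 900, "+"), Gene("geneB", 1500, 2100, "-")]
for pos, strand in [(400, "+"), (400, "-"), (1200, "+")]:
    print(pos, strand, classify_tss(pos, strand, genes))
```

A real pipeline would also need rules for TSSs near gene boundaries and for primary TSSs upstream of genes, which this sketch lumps into "other".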
Conditional Selection of Genomic Alterations Dictates Cancer Evolution and Oncogenic Dependencies.

PubMed

Mina, Marco; Raynaud, Franck; Tavernari, Daniele; Battistello, Elena; Sungalee, Stephanie; Saghafinia, Sadegh; Laessle, Titouan; Sanchez-Vega, Francisco; Schultz, Nikolaus; Oricchio, Elisa; Ciriello, Giovanni

2017-08-14

Cancer evolves through the emergence and selection of molecular alterations. Cancer genome profiling has revealed that specific events are more or less likely to be co-selected, suggesting that the selection of one event depends on the others. However, the nature of these evolutionary dependencies and their impact remain unclear. Here, we designed SELECT, an algorithmic approach to systematically identify evolutionary dependencies from alteration patterns. By analyzing 6,456 genomes from multiple tumor types, we constructed a map of oncogenic dependencies associated with cellular pathways, transcriptional readouts, and therapeutic response. Finally, modeling of cancer evolution shows that alteration dependencies emerge only under conditional selection. These results provide a framework for the design of strategies to predict cancer progression and therapeutic response. Copyright © 2017 Elsevier Inc. All rights reserved.
Genome-nuclear lamina interactions and gene regulation.

PubMed

Kind, Jop; van Steensel, Bas

2010-06-01

The nuclear lamina, a filamentous protein network that coats the inner nuclear membrane, has long been thought to interact with specific genomic loci and regulate their expression. Molecular mapping studies have now identified large genomic domains that are in contact with the lamina. Genes in these domains are typically repressed, and artificial tethering experiments indicate that the lamina can actively contribute to this repression. Furthermore, the lamina indirectly controls gene expression in the nuclear interior by sequestration of certain transcription factors. A variety of DNA-binding and chromatin proteins may anchor specific loci to the lamina, while histone-modifying enzymes partly mediate the local repressive effect of the lamina. Experimental tools are now available to begin to unravel the underlying molecular mechanisms. Copyright 2010 Elsevier Ltd. All rights reserved.
Gene Expression Profiling of Development and Anthocyanin Accumulation in Kiwifruit (Actinidia chinensis) Based on Transcriptome Sequencing

PubMed Central

Zeng, Shaohua; Xiao, Gong; Wang, Gan; Wang, Ying; Peng, Ming; Huang, Hongwen

2015-01-01

Red-fleshed kiwifruit (Actinidia chinensis Planch. ‘Hongyang’) is a promising commercial cultivar due to its nutritional value and unique flesh color, derived from vitamin C and anthocyanins. In this study, we obtained transcriptome data of ‘Hongyang’ from seven developmental stages using Illumina sequencing. We mapped 39–54 million reads to the recently sequenced kiwifruit genome and other databases to define gene structure, to analyze alternative splicing, and to quantify gene transcript abundance at different developmental stages. The transcript profiles throughout red kiwifruit development were constructed and analyzed, with a focus on the biosynthesis and metabolism of compounds such as phytohormones, sugars, starch and L-ascorbic acid, which are indispensable for the development and formation of quality fruit. Candidate genes for these pathways were identified through MapMan and phylogenetic analysis. The transcript levels of genes involved in sucrose and starch metabolism were consistent with the change in soluble sugar and starch content throughout kiwifruit development. The metabolism of L-ascorbic acid was very active, primarily through the L-galactose pathway. The genes responsible for the accumulation of anthocyanin in red kiwifruit were identified, and their expression levels were investigated during kiwifruit development. This survey of gene expression during kiwifruit development paves the way for further investigation of the development of this uniquely colored and nutritious fruit and reveals which factors are needed for high quality fruit formation. These transcriptome data and their analysis will be useful for improving kiwifruit genome annotation, for basic fruit molecular biology research, and for kiwifruit breeding and improvement. PMID:26301713
A high-quality human reference panel reveals the complexity and distribution of genomic structural variants.

PubMed

Hehir-Kwa, Jayne Y; Marschall, Tobias; Kloosterman, Wigard P; Francioli, Laurent C; Baaijens, Jasmijn A; Dijkstra, Louis J; Abdellaoui, Abdel; Koval, Vyacheslav; Thung, Djie Tjwan; Wardenaar, René; Renkens, Ivo; Coe, Bradley P; Deelen, Patrick; de Ligt, Joep; Lameijer, Eric-Wubbo; van Dijk, Freerk; Hormozdiari, Fereydoun; Uitterlinden, André G; van Duijn, Cornelia M; Eichler, Evan E; de Bakker, Paul I W; Swertz, Morris A; Wijmenga, Cisca; van Ommen, Gert-Jan B; Slagboom, P Eline; Boomsma, Dorret I; Schönhuth, Alexander; Ye, Kai; Guryev, Victor

2016-10-06

Structural variation (SV) represents a major source of differences between individual human genomes and has been linked to disease phenotypes. However, the majority of studies neither provide a global view of the full spectrum of these variants nor integrate them into reference panels of genetic variation. Here, we analyse whole genome sequencing data of 769 individuals from 250 Dutch families, and provide a haplotype-resolved map of 1.9 million genome variants across 9 different variant classes, including novel forms of complex indels, and retrotransposition-mediated insertions of mobile elements and processed RNAs. A large proportion are previously underreported variants sized between 21 and 100 bp. We detect 4 megabases of novel sequence, encoding 11 new transcripts. Finally, we show 191 known, trait-associated SNPs to be in strong linkage disequilibrium with SVs and demonstrate that our panel facilitates accurate imputation of SVs in unrelated individuals.
Microarray Analyses and Comparisons of Upper or Lower Flanks of Rice Shoot Base Preceding Gravitropic Bending

PubMed Central

Zang, Aiping; Chen, Haiying; Dou, Xianying; Jin, Jing; Cai, Weiming

2013-01-01

Gravitropism is a complex process involving a series of physiological pathways. Despite ongoing research, gravitropism sensing and response mechanisms are not well understood. To identify the key transcripts and corresponding pathways in gravitropism, a whole-genome microarray approach was used to analyze transcript abundance in the shoot base of rice (Oryza sativa sp. japonica) at 0.5 h and 6 h after gravistimulation by horizontal reorientation. Between upper and lower flanks of the shoot base, 167 transcripts at 0.5 h and 1202 transcripts at 6 h were discovered to be significantly different in abundance by 2-fold. Among these transcripts, 48 were changed at both 0.5 h and 6 h, while 119 transcripts were changed only at 0.5 h and 1154 transcripts were changed only at 6 h in association with gravitropism. MapMan and PageMan analyses were used to identify transcripts significantly changed in abundance. The asymmetric regulation of transcripts related to phytohormones, signaling, RNA transcription, metabolism and cell wall-related categories between upper and lower flanks was demonstrated. Potential roles of the identified transcripts in gravitropism are discussed. Our results suggest that the induction of asymmetrical transcription, likely as a consequence of gravitropic reorientation, precedes gravitropic bending in the rice shoot base. PMID:24040303
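The transcript counts reported in this abstract can be checked with simple set arithmetic. The sketch below is only an illustration: the function name and dictionary keys are hypothetical, and the only inputs are the three counts given in the abstract (167 at 0.5 h, 1202 at 6 h, 48 shared).

```python
def partition_overlap(n_early, n_late, n_shared):
    """Split two overlapping sets into early-only, late-only and union sizes."""
    return {
        "early_only": n_early - n_shared,        # changed only at 0.5 h
        "late_only": n_late - n_shared,          # changed only at 6 h
        "union": n_early + n_late - n_shared,    # distinct transcripts overall
    }

# Counts taken from the abstract: 167 (0.5 h), 1202 (6 h), 48 in both.
counts = partition_overlap(167, 1202, 48)
print(counts)
```

The early-only and late-only sizes reproduce the 119 and 1154 transcripts quoted in the abstract, which indicates that both figures refer to transcripts changed at a single time point.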
A spruce gene map infers ancient plant genome reshuffling and subsequent slow evolution in the gymnosperm lineage leading to extant conifers

PubMed Central

2012-01-01

Background Seed plants are composed of angiosperms and gymnosperms, which diverged from each other around 300 million years ago. While much light has been shed on the mechanisms and rate of genome evolution in flowering plants, such knowledge remains conspicuously meagre for the gymnosperms. Conifers are key representatives of gymnosperms and the sheer size of their genomes represents a significant challenge for characterization, sequencing and assembling. Results To gain insight into the macro-organisation and long-term evolution of the conifer genome, we developed a genetic map involving 1,801 spruce genes. We designed a statistical approach based on kernel density estimation to analyse gene density and identified seven gene-rich isochors. Groups of co-localizing genes were also found that were transcriptionally co-regulated, indicative of functional clusters. Phylogenetic analyses of 157 gene families for which at least two duplicates were mapped on the spruce genome indicated that ancient gene duplicates shared by angiosperms and gymnosperms outnumbered conifer-specific duplicates by a ratio of eight to one. Ancient duplicates were much more translocated within and among spruce chromosomes than conifer-specific duplicates, which were mostly organised in tandem arrays. Both high synteny and collinearity were also observed between the genomes of spruce and pine, two conifers that diverged more than 100 million years ago. Conclusions Taken together, these results indicate that much genomic evolution has occurred in the seed plant lineage before the split between gymnosperms and angiosperms, and that the pace of evolution of the genome macro-structure has been much slower in the gymnosperm lineage leading to extant conifers than that seen for the same period of time in flowering plants. This trend is largely congruent with the contrasted rates of diversification and morphological evolution observed between these two groups of seed plants. PMID:23102090
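The abstract mentions a kernel density estimation approach to gene density but does not describe it. As a rough illustration of the general idea only, the sketch below evaluates a plain Gaussian kernel density estimate along one linkage group. The gene positions, grid spacing and bandwidth are invented for the example and are not taken from the study.

```python
import math

def gaussian_kde_density(positions, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    n = len(positions)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions)
        for x in grid
    ]

# Hypothetical gene positions (cM) on a single 100 cM linkage group.
positions_cm = [2.0, 3.5, 4.0, 4.2, 5.1, 40.0, 41.0, 42.5, 43.0, 90.0]
grid_cm = [float(x) for x in range(0, 101, 5)]

# The bandwidth (5 cM) is an arbitrary choice for this sketch.
density = gaussian_kde_density(positions_cm, grid_cm, bandwidth=5.0)
peak_cm = grid_cm[density.index(max(density))]
print(f"densest grid point: {peak_cm} cM")
```

Grid points whose density clearly exceeds what a uniform gene distribution would give are candidates for gene-rich regions. How that threshold is set is a separate statistical question that this sketch does not address.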
A spruce gene map infers ancient plant genome reshuffling and subsequent slow evolution in the gymnosperm lineage leading to extant conifers.

PubMed

Pavy, Nathalie; Pelgas, Betty; Laroche, Jérôme; Rigault, Philippe; Isabel, Nathalie; Bousquet, Jean

2012-10-26

Seed plants are composed of angiosperms and gymnosperms, which diverged from each other around 300 million years ago. While much light has been shed on the mechanisms and rate of genome evolution in flowering plants, such knowledge remains conspicuously meagre for the gymnosperms. Conifers are key representatives of gymnosperms and the sheer size of their genomes represents a significant challenge for characterization, sequencing and assembling. To gain insight into the macro-organisation and long-term evolution of the conifer genome, we developed a genetic map involving 1,801 spruce genes. We designed a statistical approach based on kernel density estimation to analyse gene density and identified seven gene-rich isochors. Groups of co-localizing genes were also found that were transcriptionally co-regulated, indicative of functional clusters. Phylogenetic analyses of 157 gene families for which at least two duplicates were mapped on the spruce genome indicated that ancient gene duplicates shared by angiosperms and gymnosperms outnumbered conifer-specific duplicates by a ratio of eight to one. Ancient duplicates were much more translocated within and among spruce chromosomes than conifer-specific duplicates, which were mostly organised in tandem arrays. Both high synteny and collinearity were also observed between the genomes of spruce and pine, two conifers that diverged more than 100 million years ago. Taken together, these results indicate that much genomic evolution has occurred in the seed plant lineage before the split between gymnosperms and angiosperms, and that the pace of evolution of the genome macro-structure has been much slower in the gymnosperm lineage leading to extant conifers than that seen for the same period of time in flowering plants. This trend is largely congruent with the contrasted rates of diversification and morphological evolution observed between these two groups of seed plants.
Identification of ovule transcripts from the Apospory-Specific Genomic Region (ASGR)-carrier chromosome

PubMed Central

2011-01-01

Background Apomixis, asexual seed production in plants, holds great potential for agriculture as a means to fix hybrid vigor. Apospory is a form of apomixis where the embryo develops from an unreduced egg that is derived from a somatic nucellar cell, the aposporous initial, via mitosis. Understanding the molecular mechanism regulating aposporous initial specification will be a critical step toward elucidation of apomixis and also provide insight into developmental regulation and downstream signaling that results in apomixis. To discover candidate transcripts for regulating aposporous initial specification in P. squamulatum, we compared two transcriptomes derived from microdissected ovules at the stage of aposporous initial formation between the apomictic donor parent, P. squamulatum (accession PS26), and an apomictic derived backcross 8 (BC8) line containing only the Apospory-Specific Genomic Region (ASGR)-carrier chromosome from P. squamulatum. Toward this end, two transcriptomes derived from ovules of an apomictic donor parent and its apomictic backcross derivative at the stage of apospory initiation were sequenced using 454-FLX technology. Results Using 454-FLX technology, we generated 332,567 reads with an average read length of 147 base pairs (bp) for the PS26 ovule transcriptome library and 363,637 reads with an average read length of 142 bp for the BC8 ovule transcriptome library. A total of 33,977 contigs from the PS26 ovule transcriptome library and 26,576 contigs from the BC8 ovule transcriptome library were assembled using the Multifunctional Inertial Reference Assembly program. Using stringent in silico parameters, 61 transcripts were predicted to map to the ASGR-carrier chromosome, of which 49 transcripts were verified as ASGR-carrier chromosome specific. One of the alien expressed genes could be assigned as tightly linked to the ASGR by screening of apomictic and sexual F1s. Only one transcript, which did not map to the ASGR, showed expression primarily in reproductive tissue. Conclusions Our results suggest that a strategy of comparative sequencing of transcriptomes between donor parent and backcross lines containing an alien chromosome of interest can be an efficient method of identifying transcripts derived from an alien chromosome in a chromosome addition line. PMID:21521529
Fine-mapping inflammatory bowel disease loci to single-variant resolution.

PubMed

Huang, Hailiang; Fang, Ming; Jostins, Luke; Umićević Mirkov, Maša; Boucher, Gabrielle; Anderson, Carl A; Andersen, Vibeke; Cleynen, Isabelle; Cortes, Adrian; Crins, François; D'Amato, Mauro; Deffontaine, Valérie; Dmitrieva, Julia; Docampo, Elisa; Elansary, Mahmoud; Farh, Kyle Kai-How; Franke, Andre; Gori, Ann-Stephan; Goyette, Philippe; Halfvarson, Jonas; Haritunians, Talin; Knight, Jo; Lawrance, Ian C; Lees, Charlie W; Louis, Edouard; Mariman, Rob; Meuwissen, Theo; Mni, Myriam; Momozawa, Yukihide; Parkes, Miles; Spain, Sarah L; Théâtre, Emilie; Trynka, Gosia; Satsangi, Jack; van Sommeren, Suzanne; Vermeire, Severine; Xavier, Ramnik J; Weersma, Rinse K; Duerr, Richard H; Mathew, Christopher G; Rioux, John D; McGovern, Dermot P B; Cho, Judy H; Georges, Michel; Daly, Mark J; Barrett, Jeffrey C

2017-07-13

Inflammatory bowel diseases are chronic gastrointestinal inflammatory disorders that affect millions of people worldwide. Genome-wide association studies have identified 200 inflammatory bowel disease-associated loci, but few have been conclusively resolved to specific functional variants. Here we report fine-mapping of 94 inflammatory bowel disease loci using high-density genotyping in 67,852 individuals. We pinpoint 18 associations to a single causal variant with greater than 95% certainty, and an additional 27 associations to a single variant with greater than 50% certainty. These 45 variants are significantly enriched for protein-coding changes (n = 13), direct disruption of transcription-factor binding sites (n = 3), and tissue-specific epigenetic marks (n = 10), with the last category showing enrichment in specific immune cells among associations stronger in Crohn's disease and in gut mucosa among associations stronger in ulcerative colitis. The results of this study suggest that high-resolution fine-mapping in large samples can convert many discoveries from genome-wide association studies into statistically convincing causal variants, providing a powerful substrate for experimental elucidation of disease mechanisms.
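The abstract reports per-variant certainties (greater than 95% or 50%) without describing how they were computed. A common simplification in statistical fine-mapping, shown here purely as an assumption and not as the study's method, is to assume one causal variant per locus with equal prior probability, normalise per-variant Bayes factors into posterior probabilities, and take the smallest set of variants reaching the requested coverage. The variant names and log Bayes factors below are invented.

```python
import math

def posterior_probabilities(log_bayes_factors):
    """Normalise log Bayes factors into posteriors (single causal variant, flat prior)."""
    m = max(log_bayes_factors.values())  # subtract the max for numerical stability
    weights = {v: math.exp(lbf - m) for v, lbf in log_bayes_factors.items()}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}

def credible_set(posteriors, coverage=0.95):
    """Smallest set of variants whose posteriors sum to at least `coverage`."""
    chosen, cumulative = [], 0.0
    for variant, p in sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(variant)
        cumulative += p
        if cumulative >= coverage:
            break
    return chosen

# Hypothetical natural-log Bayes factors for four variants at one locus.
log_bfs = {"variant_a": 12.0, "variant_b": 6.0, "variant_c": 5.5, "variant_d": 1.0}
posteriors = posterior_probabilities(log_bfs)
print(credible_set(posteriors, coverage=0.95))
```

With these made-up inputs the 95% credible set contains a single variant, which corresponds to the "single causal variant with greater than 95% certainty" category in the abstract.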
Identification of differentially expressed genes in cucumber (Cucumis sativus L.) root under waterlogging stress by digital gene expression profile.

PubMed

Qi, Xiao-Hua; Xu, Xue-Wen; Lin, Xiao-Jian; Zhang, Wen-Jie; Chen, Xue-Hao

2012-03-01

High-throughput tag-sequencing (Tag-seq) analysis based on the Solexa Genome Analyzer platform was applied to analyze the gene expression profiling of cucumber plant at 5 time points over a 24h period of waterlogging treatment. Approximately 5.8 million total clean sequence tags per library were obtained with 143013 distinct clean tag sequences. Approximately 23.69%-29.61% of the distinct clean tags were mapped unambiguously to the unigene database, and 53.78%-60.66% of the distinct clean tags were mapped to the cucumber genome database. Analysis of the differentially expressed genes revealed that most of the genes were down-regulated in the waterlogging stages, and the differentially expressed genes mainly linked to carbon metabolism, photosynthesis, reactive oxygen species generation/scavenging, and hormone synthesis/signaling. Finally, quantitative real-time polymerase chain reaction using nine genes independently verified the tag-mapped results. This present study reveals the comprehensive mechanisms of waterlogging-responsive transcription in cucumber.
High-resolution genetic mapping of allelic variants associated with cell wall chemistry in Populus.

PubMed

Muchero, Wellington; Guo, Jianjun; DiFazio, Stephen P; Chen, Jin-Gui; Ranjan, Priya; Slavov, Gancho T; Gunter, Lee E; Jawdy, Sara; Bryan, Anthony C; Sykes, Robert; Ziebell, Angela; Klápště, Jaroslav; Porth, Ilga; Skyba, Oleksandr; Unda, Faride; El-Kassaby, Yousry A; Douglas, Carl J; Mansfield, Shawn D; Martin, Joel; Schackwitz, Wendy; Evans, Luke M; Czarnecki, Olaf; Tuskan, Gerald A

2015-01-23

QTL cloning for the discovery of genes underlying polygenic traits has historically been cumbersome in long-lived perennial plants like Populus. Linkage disequilibrium-based association mapping has been proposed as a cloning tool, and recent advances in high-throughput genotyping and whole-genome resequencing enable marker saturation to levels sufficient for association mapping with no a priori candidate gene selection. Here, multiyear and multienvironment evaluation of cell wall phenotypes was conducted in an interspecific P. trichocarpa x P. deltoides pseudo-backcross mapping pedigree and two partially overlapping populations of unrelated P. trichocarpa genotypes using pyrolysis molecular beam mass spectrometry, saccharification, and/or traditional wet chemistry. QTL mapping was conducted using a high-density genetic map with 3,568 SNP markers. As a fine-mapping approach, chromosome-wide association mapping targeting a QTL hot-spot on linkage group XIV was performed in the two P. trichocarpa populations. Both populations were genotyped using the 34 K Populus Infinium SNP array and whole-genome resequencing of one of the populations facilitated marker-saturation of candidate intervals for gene identification. Five QTLs ranging in size from 0.6 to 1.8 Mb were mapped on linkage group XIV for lignin content, syringyl to guaiacyl (S/G) ratio, 5- and 6-carbon sugars using the mapping pedigree. Six candidate loci exhibiting significant associations with phenotypes were identified within QTL intervals. These associations were reproducible across multiple environments, two independent genotyping platforms, and different plant growth stages. cDNA sequencing for allelic variants of three of the six loci identified polymorphisms leading to variable length poly glutamine (PolyQ) stretch in a transcription factor annotated as an ANGUSTIFOLIA C-terminus Binding Protein (CtBP) and premature stop codons in a KANADI transcription factor as well as a protein kinase. Results from protoplast transient expression assays suggested that each of the polymorphisms conferred allelic differences in the activation of cellulose, hemicelluloses, and lignin pathway marker genes. This study illustrates the utility of complementary QTL and association mapping as tools for gene discovery with no a priori candidate gene selection. This proof of concept in a perennial organism opens up opportunities for discovery of novel genetic determinants of economically important but complex traits in plants.
Cohesin regulates tissue-specific expression by stabilizing highly occupied cis-regulatory modules

PubMed Central

Faure, Andre J.; Schmidt, Dominic; Watt, Stephen; Schwalie, Petra C.; Wilson, Michael D.; Xu, Huiling; Ramsay, Robert G.; Odom, Duncan T.; Flicek, Paul

2012-01-01

The cohesin protein complex contributes to transcriptional regulation in a CTCF-independent manner by colocalizing with master regulators at tissue-specific loci. The regulation of transcription involves the concerted action of multiple transcription factors (TFs) and cohesin's role in this context of combinatorial TF binding remains unexplored. To investigate cohesin-non-CTCF (CNC) binding events in vivo we mapped cohesin and CTCF, as well as a collection of tissue-specific and ubiquitous transcriptional regulators using ChIP-seq in primary mouse liver. We observe a positive correlation between the number of distinct TFs bound and the presence of CNC sites. In contrast to regions of the genome where cohesin and CTCF colocalize, CNC sites coincide with the binding of master regulators and enhancer-markers and are significantly associated with liver-specific expressed genes. We also show that cohesin presence partially explains the commonly observed discrepancy between TF motif score and ChIP signal. Evidence from these statistical analyses in wild-type cells, and comparisons to maps of TF binding in Rad21-cohesin haploinsufficient mouse liver, suggests that cohesin helps to stabilize large protein–DNA complexes. Finally, we observe that the presence of mirrored CTCF binding events at promoters and their nearby cohesin-bound enhancers is associated with elevated expression levels. PMID:22780989
Polytene Chromosomes - A Portrait of Functional Organization of the Drosophila Genome.

PubMed

Zykova, Tatyana Yu; Levitsky, Victor G; Belyaeva, Elena S; Zhimulev, Igor F

2018-04-01

This mini-review is devoted to the genetic meaning of the main polytene chromosome structures, bands and interbands. Generally, densely packed chromatin forms black bands, moderately condensed regions form grey loose bands, whereas decondensed regions of the genome appear as interbands. Recent progress in the annotation of the Drosophila genome and epigenome has made it possible to compare the banding pattern with the structural organization of genes, as well as their activity. This was greatly aided by our ability to establish the borders of bands and interbands on the physical map, which allowed us to perform comprehensive side-by-side comparisons of cytological, genetic and epigenetic maps and to uncover the association between the morphological structures and the functional domains of the genome. These studies largely conclude that interbands contain the 5'-ends of housekeeping genes that are active across all cell types. Interbands are enriched with proteins involved in transcription and nucleosome remodeling, as well as with active histone modifications. Notably, most of the replication origins map to interband regions. As for grey loose bands adjacent to interbands, they typically host the bodies of housekeeping genes. Thus, the bipartite structure composed of an interband and an adjacent grey band functions as a standalone genetic unit. Finally, black bands harbor tissue-specific genes with narrow temporal and tissue expression profiles. Thus, the uniform and permanent activity of interbands combined with the inactivity of genes in bands forms the basis of the universal banding pattern observed in various Drosophila tissues.
Signal Correlations in Ecological Niches Can Shape the Organization and Evolution of Bacterial Gene Regulatory Networks

PubMed Central

Dufour, Yann S.; Donohue, Timothy J.

2015-01-01

Transcriptional regulation plays a significant role in the biological response of bacteria to changing environmental conditions. Therefore, mapping transcriptional regulatory networks is an important step not only in understanding how bacteria sense and interpret their environment but also in identifying the functions involved in biological responses to specific conditions. Recent experimental and computational developments have facilitated the characterization of regulatory networks on a genome-wide scale in model organisms. In addition, the multiplication of complete genome sequences has encouraged comparative analyses to detect conserved regulatory elements and infer regulatory networks in other less well-studied organisms. However, transcription regulation appears to evolve rapidly, thus creating challenges for the transfer of knowledge to nonmodel organisms. Nevertheless, the mechanisms and constraints driving the evolution of regulatory networks have been the subjects of numerous analyses, and several models have been proposed. Overall, the contributions of mutations, recombination, and horizontal gene transfer are complex. Finally, the rapid evolution of regulatory networks plays a significant role in the remarkable capacity of bacteria to adapt to new or changing environments. Conversely, the characteristics of environmental niches determine the selective pressures and can shape the structure of regulatory networks accordingly. PMID:23046950
Proteome- and transcriptome-driven reconstruction of the human myocyte metabolic network and its use for identification of markers for diabetes.

PubMed

Väremo, Leif; Scheele, Camilla; Broholm, Christa; Mardinoglu, Adil; Kampf, Caroline; Asplund, Anna; Nookaew, Intawat; Uhlén, Mathias; Pedersen, Bente Klarlund; Nielsen, Jens

2015-05-12

Skeletal myocytes are metabolically active and susceptible to insulin resistance and are thus implicated in type 2 diabetes (T2D). This complex disease involves systemic metabolic changes, and their elucidation at the systems level requires genome-wide data and biological networks. Genome-scale metabolic models (GEMs) provide a network context for the integration of high-throughput data. We generated myocyte-specific RNA-sequencing data and investigated their correlation with proteome data. These data were then used to reconstruct a comprehensive myocyte GEM. Next, we performed a meta-analysis of six studies comparing muscle transcription in T2D versus healthy subjects. Transcriptional changes were mapped on the myocyte GEM, revealing extensive transcriptional regulation in T2D, particularly around pyruvate oxidation, branched-chain amino acid catabolism, and tetrahydrofolate metabolism, connected through the downregulated dihydrolipoamide dehydrogenase. Strikingly, the gene signature underlying this metabolic regulation successfully classifies the disease state of individual samples, suggesting that regulation of these pathways is a ubiquitous feature of myocytes in response to T2D.

Large-scale turnover of functional transcription factor binding sites in Drosophila

DOE Office of Scientific and Technical Information (OSTI.GOV)

Moses, Alan M.; Pollard, Daniel A.; Nix, David A.

2006-07-14

The gain and loss of functional transcription-factor binding sites has been proposed as a major source of evolutionary change in cis-regulatory DNA and gene expression. We have developed an evolutionary model to study binding site turnover that uses multiple sequence alignments to assess the evolutionary constraint on individual binding sites, and to map gain and loss events along a phylogenetic tree. We apply this model to study the evolutionary dynamics of binding sites of the Drosophila melanogaster transcription factor Zeste, using genome-wide in vivo (ChIP-chip) binding data to identify functional Zeste binding sites, and the genome sequences of D. melanogaster, D. simulans, D. erecta and D. yakuba to study their evolution. We estimate that more than 5 percent of functional Zeste binding sites in D. melanogaster were gained along the D. melanogaster lineage or lost along one of the other lineages. We find that Zeste bound regions have a reduced rate of binding site loss and an increased rate of binding site gain relative to flanking sequences. Finally, we show that binding site gains and losses are asymmetrically distributed with respect to D. melanogaster, consistent with lineage-specific acquisition and loss of Zeste-responsive regulatory elements.
A combination of improved differential and global RNA-seq reveals pervasive transcription initiation and events in all stages of the life-cycle of functional RNAs in Propionibacterium acnes, a major contributor to wide-spread human disease

PubMed Central

2013-01-01

Background Sequencing of the genome of Propionibacterium acnes produced a catalogue of genes many of which enable this organism to colonise skin and survive exposure to the elements. Despite this platform, there was little understanding of the gene regulation that gives rise to an organism that has a major impact on human health and wellbeing and causes infections beyond the skin. To address this situation, we have undertaken a genome-wide study of gene regulation using a combination of improved differential and global RNA-sequencing and an analytical approach that takes into account the inherent noise within the data. Results We have produced nucleotide-resolution transcriptome maps that identify and differentiate sites of transcription initiation from sites of stable RNA processing and mRNA cleavage. Moreover, analysis of these maps provides strong evidence for ‘pervasive’ transcription and shows that contrary to initial indications it is not biased towards the production of antisense RNAs. In addition, the maps reveal an extensive array of riboswitches, leaderless mRNAs and small non-protein-coding RNAs alongside vegetative promoters and post-transcriptional events, which include unusual tRNA processing. The identification of such features will inform models of complex gene regulation, as illustrated here for ribonucleotide reductases and a potential quorum-sensing, two-component system. Conclusions The approach described here, which is transferable to any bacterial species, has produced a step increase in whole-cell knowledge of gene regulation in P. acnes. Continued expansion of our maps to include transcription associated with different growth conditions and genetic backgrounds will provide a new platform from which to computationally model the gene expression that determines the physiology of P. acnes and its role in human disease. PMID:24034785
Detection of microRNAs in color space.

PubMed

Marco, Antonio; Griffiths-Jones, Sam

2012-02-01

Deep sequencing provides inexpensive opportunities to characterize the transcriptional diversity of known genomes. The AB SOLiD technology generates millions of short sequencing reads in color-space; that is, the raw data is a sequence of colors, where each color represents 2 nt and each nucleotide is represented by two consecutive colors. This strategy is purported to have several advantages, including increased ability to distinguish sequencing errors from polymorphisms. Several programs have been developed to map short reads to genomes in color space. However, a number of previously unexplored technical issues arise when using SOLiD technology to characterize microRNAs. Here we explore these technical difficulties. First, since the sequenced reads are longer than the biological sequences, every read is expected to contain linker fragments. The color-calling error rate increases toward the 3' end of the read such that recognizing the linker sequence for removal becomes problematic. Second, mapping in color space may lead to the loss of the first nucleotide of each read. We propose a sequential trimming and mapping approach to map small RNAs. Using our strategy, we reanalyze three published insect small RNA deep sequencing datasets and characterize 22 new microRNAs. A bash shell script to perform the sequential trimming and mapping procedure, called SeqTrimMap, is available at: http://www.mirbase.org/tools/seqtrimmap/. Contact: antonio.marco@manchester.ac.uk. Supplementary data are available at Bioinformatics online.
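The color-space encoding described in this abstract can be illustrated with a short sketch. It assumes the standard SOLiD di-base code, in which bases map to two-bit values (A=0, C=1, G=2, T=3) and each color is the XOR of two adjacent bases. The function names are hypothetical and the code is unrelated to SeqTrimMap itself.

```python
# Two-bit base codes assumed for the standard SOLiD di-base encoding.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"

def encode_colorspace(seq):
    """Return the first base followed by one color (0-3) per adjacent base pair."""
    colors = [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]
    return seq[0] + "".join(str(c) for c in colors)

def decode_colorspace(read):
    """Decode a color-space read; the leading known base anchors the whole chain."""
    current = BASE_TO_BITS[read[0]]
    bases = [read[0]]
    for color in read[1:]:
        current ^= int(color)  # each color is the transition to the next base
        bases.append(BITS_TO_BASE[current])
    return "".join(bases)

encoded = encode_colorspace("ATGGC")
print(encoded)                     # A3103
print(decode_colorspace(encoded))  # ATGGC
print(decode_colorspace("A3003"))  # ATTTA: one miscalled color alters every later base
```

Because decoding is a chain anchored on the first known base, a single color-calling error changes all downstream bases after translation. This is one reason reads are usually mapped in color space rather than decoded first, and it shows why the anchoring base matters for the first nucleotide of each read.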
Evaluation of potential models for imprinted and nonimprinted components of human chromosome 15q11-q13 syndromes by fine-structure homology mapping in the mouse.

PubMed Central

Nicholls, R D; Gottlieb, W; Russell, L B; Davda, M; Horsthemke, B; Rinchik, E M

1993-01-01

Prader-Willi and Angelman syndromes are complex neurobehavioral contiguous gene syndromes whose expression depends on the unmasking of genomic imprinting for different genetic loci in human chromosome 15q11-q13. The homologous chromosomal region in the mouse genome has been fine-mapped by using interspecific (Mus spretus) crosses and overlapping, radiation-induced deletions to evaluate potential animal models for both imprinted and nonimprinted components of these syndromes. Four evolutionarily conserved sequences from human 15q11-q13, including two cDNAs from fetal brain (DN10, D15S12h; DN34, D15S9h-1), a microdissected clone (MN7; D15F37S1h) expressed in mouse brain, and the gene for the beta 3 subunit of the gamma-aminobutyric acid type A receptor (Gabrb3), were mapped in mouse chromosome 7 by analysis of deletions at the pink-eyed dilution (p) locus. Three of these loci are deleted in pre- and postnatally lethal p-locus mutations, which extend up to 5.5 +/- 1.7 centimorgans (cM) proximal to p; D15S9h-1, which maps 1.1 +/- 0.8 cM distal to p and is the mouse homolog of the human gene D15S9 (which shows a DNA methylation imprint), is not deleted in any of the p-locus deletion series. A transcript from the Gabrb3 gene, but not the transcript detected by MN7 at the D15F37S1h locus, is expressed in mice homozygous for the p6H deletion, which have an abnormal neurological phenotype. Furthermore, the Gabrb3 transcript is expressed equally well from the maternal or paternal chromosome 7 and, therefore, its expression is not imprinted in mouse brain. Deletions at the mouse p locus should serve as intermediate genetic reagents and models with which to analyze the genetics and etiology of individual components of human 15q11-q13 disorders. PMID:8095339
A Hierarchical Framework for State-Space Matrix Inference and Clustering.

PubMed

Zuo, Chandler; Chen, Kailei; Hewitt, Kyle J; Bresnick, Emery H; Keleş, Sündüz

2016-09-01

In recent years, a large number of genomic and epigenomic studies have been focusing on the integrative analysis of multiple experimental datasets measured over a large number of observational units. The objectives of such studies include not only inferring a hidden state of activity for each unit over individual experiments, but also detecting highly associated clusters of units based on their inferred states. Although there are a number of methods tailored for specific datasets, there is currently no state-of-the-art modeling framework for this general class of problems. In this paper, we develop the MBASIC ( M atrix B ased A nalysis for S tate-space I nference and C lustering) framework. MBASIC consists of two parts: state-space mapping and state-space clustering. In state-space mapping, it maps observations onto a finite state-space, representing the activation states of units across conditions. In state-space clustering, MBASIC incorporates a finite mixture model to cluster the units based on their inferred state-space profiles across all conditions. Both the state-space mapping and clustering can be simultaneously estimated through an Expectation-Maximization algorithm. MBASIC flexibly adapts to a large number of parametric distributions for the observed data, as well as the heterogeneity in replicate experiments. It allows for imposing structural assumptions on each cluster, and enables model selection using information criterion. In our data-driven simulation studies, MBASIC showed significant accuracy in recovering both the underlying state-space variables and clustering structures. We applied MBASIC to two genome research problems using large numbers of datasets from the ENCODE project. The first application grouped genes based on transcription factor occupancy profiles of their promoter regions in two different cell types. 
The second application focused on identifying groups of loci that are similar to a GATA2 binding site that is functional at its endogenous locus by utilizing transcription factor occupancy data and illustrated applicability of MBASIC in a wide variety of problems. In both studies, MBASIC showed higher levels of raw data fidelity than analyzing these data with a two-step approach using ENCODE results on transcription factor occupancy data.
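The abstract above mentions fitting a finite mixture model with an Expectation-Maximization algorithm. As a rough illustration of that general idea only, the following is a minimal sketch of EM for a two-component one-dimensional Gaussian mixture. It is not the MBASIC model or its code; the function name `em_two_gaussians`, the initialization, and the toy data are assumptions made for this example.

```python
import math


def em_two_gaussians(xs, iters=50):
    """Fit a two-component 1-D Gaussian mixture by EM (illustrative sketch).

    Hypothetical helper, not part of MBASIC. Returns (weights, means, variances).
    There is no guard against a component receiving zero total responsibility.
    """
    means = [min(xs), max(xs)]      # deterministic initialization (an assumption)
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior probability of each component for each observation.
        resp = []
        for x in xs:
            dens = [
                weights[k]
                * math.exp(-((x - means[k]) ** 2) / (2 * variances[k]))
                / math.sqrt(2 * math.pi * variances[k])
                for k in range(2)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate parameters from the responsibility-weighted data.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var, 1e-6)   # floor keeps the density finite
            weights[k] = nk / len(xs)
    return weights, means, variances


# Toy data: two well-separated groups of four observations each.
data = [0.1, -0.2, 0.0, 0.2, 5.1, 4.9, 5.0, 5.2]
weights, means, variances = em_two_gaussians(data)
print(weights, means, variances)
```

In this sketch the E-step plays the role of inferring a hidden state for each observation, and the M-step updates the component parameters; a framework such as the one described would couple this with additional structure that is not shown here.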
SHARAKU: an algorithm for aligning and clustering read mapping profiles of deep sequencing in non-coding RNA processing.

PubMed

Tsuchiya, Mariko; Amano, Kojiro; Abe, Masaya; Seki, Misato; Hase, Sumitaka; Sato, Kengo; Sakakibara, Yasubumi

2016-06-15

Deep sequencing of the transcripts of regulatory non-coding RNA generates footprints of post-transcriptional processes. After obtaining sequence reads, the short reads are mapped to a reference genome, and specific mapping patterns can be detected called read mapping profiles, which are distinct from random non-functional degradation patterns. These patterns reflect the maturation processes that lead to the production of shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also the various processing mechanisms of small RNAs derived from tRNAs and snoRNAs. We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast with previous work, SHARAKU incorporates the primary and secondary sequence structures into an alignment of read mapping profiles to allow for the detection of common processing patterns. Using a benchmark simulated dataset, SHARAKU exhibited superior performance to previous methods for correctly clustering the read mapping profiles with respect to 5'-end processing and 3'-end processing from degradation patterns and in detecting similar processing patterns in deriving the shorter RNAs. Further, using experimental data of small RNA sequencing for the common marmoset brain, SHARAKU succeeded in identifying the significant clusters of read mapping profiles for similar processing patterns of small derived RNA families expressed in the brain. The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link. Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work is available from the DNA DataBank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA004502. 
Contact: yasu@bio.keio.ac.jp. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Bacillus anthracis genome organization in light of whole transcriptome sequencing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Martin, Jeffrey; Zhu, Wenhan; Passalacqua, Karla D.

2010-03-22

Emerging knowledge of whole prokaryotic transcriptomes could validate a number of theoretical concepts introduced in the early days of genomics. What are the rules connecting gene expression levels with sequence determinants such as quantitative scores of promoters and terminators? Are translation efficiency measures, e.g. codon adaptation index and RBS score, related to gene expression? We used the whole transcriptome shotgun sequencing of a bacterial pathogen Bacillus anthracis to assess correlation of gene expression level with promoter, terminator and RBS scores, codon adaptation index, as well as with a new measure of gene translational efficiency, average translation speed. We compared computational predictions of operon topologies with the transcript borders inferred from RNA-Seq reads. Transcriptome mapping may also improve existing gene annotation. Upon assessment of accuracy of current annotation of protein-coding genes in the B. anthracis genome we have shown that the transcriptome data indicate existence of more than a hundred genes missing in the annotation though predicted by an ab initio gene finder. Interestingly, we observed that many pseudogenes possess not only a sequence with detectable coding potential but also promoters that maintain transcriptional activity.
Dynamics of DNA methylomes underlie oyster development

PubMed Central

Sourdaine, Pascal; Guo, Ximing; Favrel, Pascal

2017-01-01

DNA methylation is a critical epigenetic regulator of development in mammals and social insects, but its significance in development outside these groups is not understood. Here we investigated the genome-wide dynamics of DNA methylation in a mollusc model, the oyster Crassostrea gigas, from the egg to the completion of organogenesis. Large-scale methylation maps reveal that the oyster genome displays a succession of methylated and non-methylated regions, which persist throughout development. Differentially methylated regions (DMRs) are strongly regulated during cleavage and metamorphosis. The distribution and levels of methylated DNA within genomic features (exons, introns, promoters, repeats and transposons) show different developmental landscapes marked by a strong increase in the methylation of exons against introns after metamorphosis. Kinetics of methylation in gene-bodies correlate to their transcription regulation and to distinct functional gene clusters, and DMRs at cleavage and metamorphosis bear the genes functionally related to these steps, respectively. This study shows that DNA methylome dynamics underlie development through transcription regulation in the oyster, a lophotrochozoan species. To our knowledge, this is the first demonstration of such epigenetic regulation outside vertebrates and ecdysozoan models, bringing new insights into the evolution and the epigenetic regulation of developmental processes. PMID:28594821
Dynamics of DNA methylomes underlie oyster development.

PubMed

Riviere, Guillaume; He, Yan; Tecchio, Samuele; Crowell, Elizabeth; Gras, Michaël; Sourdaine, Pascal; Guo, Ximing; Favrel, Pascal

2017-06-01

DNA methylation is a critical epigenetic regulator of development in mammals and social insects, but its significance in development outside these groups is not understood. Here we investigated the genome-wide dynamics of DNA methylation in a mollusc model, the oyster Crassostrea gigas, from the egg to the completion of organogenesis. Large-scale methylation maps reveal that the oyster genome displays a succession of methylated and non-methylated regions, which persist throughout development. Differentially methylated regions (DMRs) are strongly regulated during cleavage and metamorphosis. The distribution and levels of methylated DNA within genomic features (exons, introns, promoters, repeats and transposons) show different developmental landscapes marked by a strong increase in the methylation of exons against introns after metamorphosis. Kinetics of methylation in gene-bodies correlate to their transcription regulation and to distinct functional gene clusters, and DMRs at cleavage and metamorphosis bear the genes functionally related to these steps, respectively. This study shows that DNA methylome dynamics underlie development through transcription regulation in the oyster, a lophotrochozoan species. To our knowledge, this is the first demonstration of such epigenetic regulation outside vertebrates and ecdysozoan models, bringing new insights into the evolution and the epigenetic regulation of developmental processes.
Replication landscape of the human genome

PubMed Central

Petryk, Nataliya; Kahli, Malik; d'Aubenton-Carafa, Yves; Jaszczyszyn, Yan; Shen, Yimin; Silvain, Maud; Thermes, Claude; Chen, Chun-Long; Hyrien, Olivier

2016-01-01

Despite intense investigation, human replication origins and termini remain elusive. Existing data have shown strong discrepancies. Here we sequenced highly purified Okazaki fragments from two cell types and, for the first time, quantitated replication fork directionality and delineated initiation and termination zones genome-wide. Replication initiates stochastically, primarily within non-transcribed, broad (up to 150 kb) zones that often abut transcribed genes, and terminates dispersively between them. Replication fork progression is significantly co-oriented with transcription. Initiation and termination zones are frequently contiguous, sometimes separated by regions of unidirectional replication. Initiation zones are enriched in open chromatin and enhancer marks, even when not flanked by genes, and often border ‘topologically associating domains’ (TADs). Initiation zones are enriched in origin recognition complex (ORC)-binding sites and better align to origins previously mapped using bubble-trap than λ-exonuclease. This novel panorama of replication reveals how chromatin and transcription modulate the initiation process to create cell-type-specific replication programs. PMID:26751768
Pancreatic islet enhancer clusters enriched in type 2 diabetes risk-associated variants.

PubMed

Pasquali, Lorenzo; Gaulton, Kyle J; Rodríguez-Seguí, Santiago A; Mularoni, Loris; Miguel-Escalada, Irene; Akerman, İldem; Tena, Juan J; Morán, Ignasi; Gómez-Marín, Carlos; van de Bunt, Martijn; Ponsa-Cobas, Joan; Castro, Natalia; Nammo, Takao; Cebola, Inês; García-Hurtado, Javier; Maestro, Miguel Angel; Pattou, François; Piemonti, Lorenzo; Berney, Thierry; Gloyn, Anna L; Ravassard, Philippe; Skarmeta, José Luis Gómez; Müller, Ferenc; McCarthy, Mark I; Ferrer, Jorge

2014-02-01

Type 2 diabetes affects over 300 million people, causing severe complications and premature death, yet the underlying molecular mechanisms are largely unknown. Pancreatic islet dysfunction is central in type 2 diabetes pathogenesis, and understanding islet genome regulation could therefore provide valuable mechanistic insights. We have now mapped and examined the function of human islet cis-regulatory networks. We identify genomic sequences that are targeted by islet transcription factors to drive islet-specific gene activity and show that most such sequences reside in clusters of enhancers that form physical three-dimensional chromatin domains. We find that sequence variants associated with type 2 diabetes and fasting glycemia are enriched in these clustered islet enhancers and identify trait-associated variants that disrupt DNA binding and islet enhancer activity. Our studies illustrate how islet transcription factors interact functionally with the epigenome and provide systematic evidence that the dysregulation of islet enhancers is relevant to the mechanisms underlying type 2 diabetes.
NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins

PubMed Central

Pruitt, Kim D.; Tatusova, Tatiana; Maglott, Donna R.

2005-01-01

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff. PMID:15608248
Mapping and analysis of Caenorhabditis elegans transcription factor sequence specificities

PubMed Central

Narasimhan, Kamesh; Lambert, Samuel A; Yang, Ally WH; Riddell, Jeremy; Mnaimneh, Sanie; Zheng, Hong; Albu, Mihai; Najafabadi, Hamed S; Reece-Hoyes, John S; Fuxman Bass, Juan I; Walhout, Albertha JM; Weirauch, Matthew T; Hughes, Timothy R

2015-01-01

Caenorhabditis elegans is a powerful model for studying gene regulation, as it has a compact genome and a wealth of genomic tools. However, identification of regulatory elements has been limited, as DNA-binding motifs are known for only 71 of the estimated 763 sequence-specific transcription factors (TFs). To address this problem, we performed protein binding microarray experiments on representatives of canonical TF families in C. elegans, obtaining motifs for 129 TFs. Additionally, we predict motifs for many TFs that have DNA-binding domains similar to those already characterized, increasing coverage of binding specificities to 292 C. elegans TFs (∼40%). These data highlight the diversification of binding motifs for the nuclear hormone receptor and C2H2 zinc finger families and reveal unexpected diversity of motifs for T-box and DM families. Motif enrichment in promoters of functionally related genes is consistent with known biology and also identifies putative regulatory roles for unstudied TFs. DOI: http://dx.doi.org/10.7554/eLife.06967.001 PMID:25905672
Draft genome and reference transcriptomic resources for the urticating pine defoliator Thaumetopoea pityocampa (Lepidoptera: Notodontidae).

PubMed

Gschloessl, B; Dorkeld, F; Berges, H; Beydon, G; Bouchez, O; Branco, M; Bretaudeau, A; Burban, C; Dubois, E; Gauthier, P; Lhuillier, E; Nichols, J; Nidelet, S; Rocha, S; Sauné, L; Streiff, R; Gautier, M; Kerdelhué, C

2018-05-01

The pine processionary moth Thaumetopoea pityocampa (Lepidoptera: Notodontidae) is the main pine defoliator in the Mediterranean region. Its urticating larvae cause severe human and animal health concerns in the invaded areas. This species shows a high phenotypic variability for various traits, such as phenology, fecundity and tolerance to extreme temperatures. This study presents the construction and analysis of extensive genomic and transcriptomic resources, which are an obligate prerequisite to understand their underlying genetic architecture. Using a well-studied population from Portugal with peculiar phenological characteristics, the karyotype was first determined and a first draft genome of 537 Mb total length was assembled into 68,292 scaffolds (N50 = 164 kb). From this genome assembly, 29,415 coding genes were predicted. To circumvent some limitations for fine-scale physical mapping of genomic regions of interest, a 3X coverage BAC library was also developed. In particular, 11 BACs from this library were individually sequenced to assess the assembly quality. Additionally, de novo transcriptomic resources were generated from various developmental stages sequenced with HiSeq and MiSeq Illumina technologies. The reads were de novo assembled into 62,376 and 63,175 transcripts, respectively. Then, a robust subset of the genome-predicted coding genes, the de novo transcriptome assemblies and previously published 454/Sanger data were clustered to obtain a high-quality and comprehensive reference transcriptome consisting of 29,701 bona fide unigenes. These sequences covered 99% of the CEGMA and 88% of the BUSCO highly conserved eukaryotic genes and 84% of the BUSCO arthropod gene set. Moreover, 90% of these transcripts could be localized on the draft genome. The described information is available via a genome annotation portal (http://bipaa.genouest.org/sp/thaumetopoea_pityocampa/). © 2018 John Wiley & Sons Ltd.
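The abstract above reports an assembly N50. For readers unfamiliar with the statistic, the following is a minimal sketch of the usual definition (the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly). The function name `n50` and the toy lengths are assumptions for illustration and are unrelated to the authors' pipeline or data.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy scaffold lengths (arbitrary units), not taken from the study.
toy_scaffolds = [50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

The same function applies to transcript or optical-map lengths, since N50 depends only on the list of lengths.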
BLISS is a versatile and quantitative method for genome-wide profiling of DNA double-strand breaks.

PubMed

Yan, Winston X; Mirzazadeh, Reza; Garnerone, Silvano; Scott, David; Schneider, Martin W; Kallas, Tomasz; Custodio, Joaquin; Wernersson, Erik; Li, Yinqing; Gao, Linyi; Federova, Yana; Zetsche, Bernd; Zhang, Feng; Bienko, Magda; Crosetto, Nicola

2017-05-12

Precisely measuring the location and frequency of DNA double-strand breaks (DSBs) along the genome is instrumental to understanding genomic fragility, but current methods are limited in versatility, sensitivity or practicality. Here we present Breaks Labeling In Situ and Sequencing (BLISS), featuring the following: (1) direct labelling of DSBs in fixed cells or tissue sections on a solid surface; (2) low-input requirement by linear amplification of tagged DSBs by in vitro transcription; (3) quantification of DSBs through unique molecular identifiers; and (4) easy scalability and multiplexing. We apply BLISS to profile endogenous and exogenous DSBs in low-input samples of cancer cells, embryonic stem cells and liver tissue. We demonstrate the sensitivity of BLISS by assessing the genome-wide off-target activity of two CRISPR-associated RNA-guided endonucleases, Cas9 and Cpf1, observing that Cpf1 has higher specificity than Cas9. Our results establish BLISS as a versatile, sensitive and efficient method for genome-wide DSB mapping in many applications.
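Point (3) of the abstract above, quantification through unique molecular identifiers (UMIs), can be illustrated with a short sketch: reads sharing a genomic position and a UMI are treated as amplification copies of one tagged break end and counted once. This is only an illustration under simplifying assumptions; the tuple layout and the function name `count_breaks` are hypothetical, and this sketch requires exact UMI matches, so it makes no allowance for sequencing errors in the UMI.

```python
from collections import defaultdict


def count_breaks(reads):
    """Count distinct UMIs per break site.

    reads: iterable of (chrom, position, strand, umi) tuples (assumed layout).
    Returns a dict mapping (chrom, position, strand) to the number of
    distinct UMIs observed at that site.
    """
    umis_per_site = defaultdict(set)
    for chrom, pos, strand, umi in reads:
        umis_per_site[(chrom, pos, strand)].add(umi)
    return {site: len(umis) for site, umis in umis_per_site.items()}


# Toy reads: the first two are duplicates of the same tagged molecule.
toy_reads = [
    ("chr1", 1000, "+", "ACGTACGT"),
    ("chr1", 1000, "+", "ACGTACGT"),
    ("chr1", 1000, "+", "TTGCAAGC"),
    ("chr2", 500, "-", "ACGTACGT"),
]
print(count_breaks(toy_reads))
```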
[Genome similarity of Baikal omul and sig].

PubMed

Bychenko, O S; Sukhanova, L V; Ukolova, S S; Skvortsov, T A; Potapov, V K; Azhikina, T L; Sverdlov, E D

2009-01-01

Two members of the Baikal sig family, a lake sig (Coregonus lavaretus baicalensis Dybovsky) and omul (C. autumnalis migratorius Georgi), are close relatives that diverged from the same ancestor 10-20 thousand years ago. In this work, we studied genomic polymorphism of these two fish species. The method of subtraction hybridization (SH) did not reveal the presence of extended sequences in the sig genome and their absence in the omul genome. All the fragments found by SH corresponded to polymorphous noncoding genome regions varying in mononucleotide substitutions and short deletions. Many of them are mapped close to genes of the immune system and have regions identical to the Tc-1-like transposons abundant among fish, whose transcription activity may affect the expression of adjacent genes. Thus, we showed for the first time that genetic differences between Baikal sig family members are extremely small and cannot be revealed by the SH method. This is another endorsement of the hypothesis on the close relationship between Baikal sig and omul and their evolutionarily recent divergence from a common ancestor.
Clock genes and their genomic distributions in three species of salmonid fishes: Associations with genes regulating sexual maturation and cell cycling

PubMed Central

2010-01-01

Background: Clock family genes encode transcription factors that regulate clock-controlled genes and thus regulate many physiological mechanisms/processes in a circadian fashion. Clock1 duplicates and copies of Clock3 and NPAS2-like genes were partially characterized (genomic sequencing) and mapped using family-based indels/SNPs in rainbow trout (RT) (Oncorhynchus mykiss), Arctic charr (AC) (Salvelinus alpinus), and Atlantic salmon (AS) (Salmo salar) mapping panels. Results: Clock1 duplicates mapped to linkage groups RT-8/-24, AC-16/-13 and AS-2/-18. Clock3/NPAS2-like genes mapped to RT-9/-20, AC-20/-43, and AS-5. Most of these linkage group regions containing the Clock gene duplicates were derived from the most recent 4R whole genome duplication event specific to the salmonids. These linkage groups contain quantitative trait loci (QTL) for life history and growth traits (i.e., reproduction and cell cycling). Comparative synteny analyses with other model teleost species reveal a high degree of conservation for genes in these chromosomal regions suggesting that functionally related or co-regulated genes are clustered in syntenic blocks. For example, anti-müllerian hormone (amh), regulating sexual maturation, and ornithine decarboxylase antizymes (oaz1 and oaz2), regulating cell cycling, are contained within these syntenic blocks. Conclusions: Synteny analyses indicate that regions homologous to major life-history QTL regions in salmonids contain many candidate genes that are likely to influence reproduction and cell cycling. The order of these genes is highly conserved across the vertebrate species examined, and as such, these genes may make up a functional cluster of genes that are likely co-regulated. CLOCK, as a transcription factor, is found within this block and therefore has the potential to cis-regulate the processes influenced by these genes. 
Additionally, clock-controlled genes (CCGs) are located in other life-history QTL regions within salmonids suggesting that at least in part, trans-regulation of these QTL regions may also occur via Clock expression. PMID:20670436
Genome-wide analysis of the TPX2 family proteins in Eucalyptus grandis.

PubMed

Du, Pingzhou; Kumar, Manoj; Yao, Yuan; Xie, Qiaoli; Wang, Jinyan; Zhang, Baolong; Gan, Siming; Wang, Yuqi; Wu, Ai-Min

2016-11-24

The Xklp2 (TPX2) proteins belong to the microtubule-associated (MAP) family of proteins. All members of the family contain the conserved TPX2 motif, which can interact with microtubules, regulate microtubule dynamics or assist with different microtubule functions, for example, maintenance of cell morphology or regulation of cell growth and development. However, the roles of members of the TPX2 family have not been studied in the model tree species Eucalyptus to date. Here, we report the identification of the members of the TPX2 family in Eucalyptus grandis (Eg) and analyse the expression patterns and functions of these genes. In the present study, a comprehensive analysis of the plant TPX2 family proteins was performed. Phylogenetic analyses indicated that the genes can be classified into 6 distinct subfamilies. A genome-wide survey identified 12 members of the TPX2 family in the sequenced genome of Eucalyptus grandis. The basic genetic properties of the TPX2 family in Eucalyptus were analysed. Our results suggest that the TPX2 family proteins within different sub-groups are relatively conserved but there are important differences between groups. Quantitative real-time PCR (qRT-PCR) was performed to confirm the expression levels of the genes in different tissues. The results showed that in the whole plant, the levels of EgWDL5 transcript are the highest, followed by those of EgWDL4. Compared with other tissues, the level of the EgMAP20 transcript is the highest in the root. Over-expression of EgMAP20 in Arabidopsis resulted in organ twisting. The cotyledon petioles showed left-handed twisting while the hypocotyl epidermal cells produced right-handed helical twisting. Finally, EgMAP20, EgWDL3 and EgWDL3L were all able to decorate microtubules. Plant TPX2 family proteins were systematically analysed using bioinformatics methods. There are 12 TPX2 family proteins in Eucalyptus. 
We have performed an initial characterization of the functions of several members of the TPX2 family. We found that the gene products are localized to the microtubule cytoskeleton. Our results lay the foundation for future efforts to reveal the biological significance of TPX2 family proteins in Eucalyptus.
Genome-wide mRNA processing in methanogenic archaea reveals post-transcriptional regulation of ribosomal protein synthesis.

PubMed

Qi, Lei; Yue, Lei; Feng, Deqin; Qi, Fengxia; Li, Jie; Dong, Xiuzhu

2017-07-07

Unlike stable RNAs that require processing for maturation, prokaryotic cellular mRNAs generally follow an 'all-or-none' pattern. Herein, we used 5′ monophosphate transcript sequencing (5′P-seq), which specifically captures the 5′-end of processed transcripts, and mapped the genome-wide RNA processing sites (PSSs) in a methanogenic archaeon. Following statistical analysis and stringent filtration, we identified 1429 PSSs, among which 23.5% and 5.4% were located in the 5′ untranslated region (uPSS) and intergenic region (iPSS), respectively. A predominant uridine downstream of PSSs served as a processing signature. Remarkably, 5′P-seq detected overrepresented uPSS and iPSS in the polycistronic operons encoding ribosomal proteins, with the majority located upstream of and proximal to ribosome binding sites, suggesting a regulatory role of processing in translation initiation. The processed transcripts showed increased stability and translation efficiency. In particular, processing within the tricistronic transcript of rplA-rplJ-rplL enhanced the translation of rplL, which can provide a driving force for the 1:4 stoichiometry of L10 to L12 in the ribosome. Growth-associated mRNA processing intensities were also correlated with the cellular ribosomal protein levels, thereby suggesting that mRNA processing is involved in tuning growth-dependent ribosome synthesis. In conclusion, our findings suggest that mRNA processing-mediated post-transcriptional regulation is a potential mechanism of ribosomal protein synthesis and stoichiometry. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
First insights into the giant panda (Ailuropoda melanoleuca) blood transcriptome: a resource for novel gene loci and immunogenetics.

PubMed

Du, Lianming; Li, Wujiao; Fan, Zhenxin; Shen, Fujun; Yang, Mingyu; Wang, Zili; Jian, Zuoyi; Hou, Rong; Yue, Bisong; Zhang, Xiuyue

2015-07-01

The giant panda (Ailuropoda melanoleuca) is one of the most famous flagship species for conservation, and its draft genome has recently been assembled. However, the transcriptome is not yet available. In this study, the blood transcriptomes of three pandas were characterized and about 160 million sequencing reads were generated using Illumina HiSeq 2000 paired-end sequencing technology. The assembly yielded 92 598 transcripts with an average length of 1626 bp and N50 length of 2842 bp. Based on a sequence similarity search against nonredundant (nr) protein database, a total of 38 522 (41.6%) transcripts were annotated. Of these annotated transcripts, 25 142 and 8272 transcripts were assigned to gene ontology terms and clusters of orthologous group, respectively. A search against the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG) indicated that 9098 (9.83%) transcripts mapped to 324 KEGG pathways, and the best represented functional categories of pathways were signal transduction and immune system. We have also identified 23 460 microsatellites, 43 560 SNPs as well as 21 456 alternative splicing events in the assembly. Additionally, a total of 24 341 complete open reading frames (ORFs) were detected from the assembly where 1492 ORFs were found to be novel gene loci as these have not been annotated so far in any public database. © 2014 John Wiley & Sons Ltd.

Transcriptome-wide N6-methyladenosine methylome profiling of porcine muscle and adipose tissues reveals a potential mechanism for transcriptional regulation and differential methylation pattern.

PubMed

Tao, Xuelian; Chen, Jianning; Jiang, Yanzhi; Wei, Yingying; Chen, Yan; Xu, Huaming; Zhu, Li; Tang, Guoqing; Li, Mingzhou; Jiang, Anan; Shuai, Surong; Bai, Lin; Liu, Haifeng; Ma, Jideng; Jin, Long; Wen, Anxiang; Wang, Qin; Zhu, Guangxiang; Xie, Meng; Wu, Jiayun; He, Tao; Huang, Chunyu; Gao, Xiang; Li, Xuewei

2017-04-28

N6-methyladenosine (m6A) is the most prevalent internal modification in messenger RNA in higher eukaryotes, and potential regulatory functions of reversible m6A methylation on mRNA have been revealed by mapping of m6A methylomes in several species. m6A modification in active gene regulation manifests itself as altered methylation profiles in a tissue-specific manner or in response to changing cellular or species living environment. However, to date, there have been no data on a porcine transcriptome-wide m6A map or its potential biological roles in adipose deposition and muscle growth. In this work, we used methylated RNA immunoprecipitation with next-generation sequencing (MeRIP-Seq) to acquire the first porcine transcriptome-wide m6A map. Transcriptomes of muscle and adipose tissues from three different pig breeds, the wild boar, Landrace, and Rongchang pig, were used to generate these maps. Our findings show that there were 5,872 and 2,826 m6A peaks, respectively, in the porcine muscle and adipose tissue transcriptomes. Stop codons, 3'-untranslated regions, and coding regions were found to be mainly enriched for m6A peaks. Gene ontology analysis revealed that common m6A peaks in nuclear genes are associated with transcriptional factors, suggestive of a relationship between m6A mRNA methylation and nuclear genome transcription. Some genes showed tissue- and breed-differential methylation, and have novel biological functions. We also found a relationship between the m6A methylation extent and the transcript level, suggesting a regulatory role for m6A in gene expression. This comprehensive map provides a solid basis for the determination of potential functional roles for RNA m6A modification in adipose deposition and muscle growth.
Identification of quantitative trait loci affecting ectomycorrhizal symbiosis in an interspecific F1 poplar cross and differential expression of genes in ectomycorrhizas of the two parents: Populus deltoides and Populus trichocarpa

DOE Office of Scientific and Technical Information (OSTI.GOV)

Labbe, Jessy L; Jorge, Veronique; Vion, Patrice

A Populus deltoides × Populus trichocarpa F1 pedigree was analyzed for quantitative trait loci (QTLs) affecting ectomycorrhizal development and for microarray characterization of gene networks involved in this symbiosis. A 300 genotype progeny set was evaluated for its ability to form ectomycorrhiza with the basidiomycete Laccaria bicolor. The percentage of mycorrhizal root tips was determined on the root systems of all 300 progeny and their two parents. QTL analysis identified four significant QTLs, one on the P. deltoides and three on the P. trichocarpa genetic maps. These QTLs were aligned to the P. trichocarpa genome and each contained several megabases and encompassed numerous genes. NimbleGen whole-genome microarray, using cDNA from RNA extracts of ectomycorrhizal root tips from the parental genotypes P. trichocarpa and P. deltoides, was used to narrow the candidate gene list. Among the 1,543 differentially expressed genes (p value ≤ 0.05; ≥ 5.0-fold change in transcript level) having different transcript levels in mycorrhiza of the two parents, 41 transcripts were located in the QTL intervals: 20 in Myc_d1, 14 in Myc_t1, and seven in Myc_t2, while no significant differences among transcripts were found in Myc_t3. Among these 41 transcripts, 25 were overrepresented in P. deltoides relative to P. trichocarpa; 16 were overrepresented in P. trichocarpa. The transcript showing the highest overrepresentation in P. trichocarpa mycorrhiza libraries compared to P. deltoides mycorrhiza codes for an ethylene-sensitive EREBP-4 protein which may repress defense mechanisms in P. trichocarpa while the highest overrepresented transcripts in P. deltoides code for proteins/genes typically associated with pathogen resistance.
Nencki Genomics Database—Ensembl funcgen enhanced with intersections, user data and genome-wide TFBS motifs

PubMed Central

Krystkowiak, Izabella; Lenart, Jakub; Debski, Konrad; Kuterba, Piotr; Petas, Michal; Kaminska, Bozena; Dabrowski, Michal

2013-01-01

We present the Nencki Genomics Database, which extends the functionality of Ensembl Regulatory Build (funcgen) for the three species: human, mouse and rat. The key enhancements over Ensembl funcgen include the following: (i) a user can add private data, analyze them alongside the public data and manage access rights; (ii) inside the database, we provide efficient procedures for computing intersections between regulatory features and for mapping them to the genes. To Ensembl funcgen-derived data, which include data from ENCODE, we add information on conserved non-coding (putative regulatory) sequences, and on genome-wide occurrence of transcription factor binding site motifs from the current versions of two major motif libraries, namely, Jaspar and Transfac. The intersections and mapping to the genes are pre-computed for the public data, and the result of any procedure run on the data added by the users is stored back into the database, thus incrementally increasing the body of pre-computed data. As the Ensembl funcgen schema for the rat is currently not populated, our database is the first database of regulatory features for this frequently used laboratory animal. The database is accessible without registration using the mysql client: mysql -h database.nencki-genomics.org -u public. Registration is required only to add or access private data. A WSDL webservice provides access to the database from any SOAP client, including the Taverna Workbench with a graphical user interface. Database URL: http://www.nencki-genomics.org. PMID:24089456
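The abstract above refers to computing intersections between regulatory features. The database's own stored procedures are not described here, so the following is only a generic sketch of the underlying operation: a linear sweep over two sorted lists of genomic intervals on the same chromosome. The function name `intersect_features`, the half-open coordinate convention, and the toy coordinates are assumptions for illustration.

```python
def intersect_features(a, b):
    """Return overlapping segments between two interval lists.

    a, b: lists of half-open (start, end) tuples on the same chromosome,
    each sorted by start and free of internal overlaps (assumed inputs).
    """
    overlaps = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:                 # non-empty overlap
            overlaps.append((start, end))
        # Advance whichever interval ends first; it cannot overlap anything later.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return overlaps


# Toy features, e.g. predicted enhancers versus motif hits (made-up coordinates).
enhancers = [(100, 200), (300, 400)]
motif_hits = [(150, 350)]
print(intersect_features(enhancers, motif_hits))
```

A sweep of this kind runs in time linear in the number of features, which is one reason sorted interval lists are a common representation for such comparisons.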
Rapid construction of genome map for large yellow croaker (Larimichthys crocea) by the whole-genome mapping in BioNano Genomics Irys system.

PubMed

Xiao, Shijun; Li, Jiongtang; Ma, Fengshou; Fang, Lujing; Xu, Shuangbin; Chen, Wei; Wang, Zhi Yong

2015-09-03

Large yellow croaker (Larimichthys crocea) is an important commercial fish in China and East Asia. The annual production of the species from the aqua-farming industry is about 90 thousand tons. In spite of its economic importance, genetic studies of economic traits and genomic selection in the species are hindered by the lack of genomic resources. Specifically, a whole-genome physical map of large yellow croaker is still missing. The traditional BAC-based fingerprint method is extremely time- and labour-consuming. Here we report the first genome map construction using the high-throughput whole-genome mapping technique by nanochannel arrays in the BioNano Genomics Irys system. For an optimal marker density of ~10 per 100 kb, the nicking endonuclease Nt.BspQ1 was chosen for the genome map generation. 645,305 DNA molecules with a total length of ~112 Gb were labelled and detected, covering more than 160X of the large yellow croaker genome. Employing the IrysView package and signature patterns in raw DNA molecules, a whole-genome map of large yellow croaker was assembled into 686 maps with a total length of 727 Mb, which was consistent with the estimated genome size. The N50 length of the whole-genome map, including 126 maps, was up to 1.7 Mb. The excellent hybrid alignment with the large yellow croaker draft genome validated the consensus genome map assembly and highlighted a promising application of whole-genome mapping to draft genome sequence super-scaffolding. The genome map data of large yellow croaker are accessible on lycgenomics.jmu.edu.cn/pm. Using the state-of-the-art whole-genome mapping technique in the Irys system, the first whole-genome map for large yellow croaker has been constructed, which greatly facilitates the ongoing genomic and evolutionary studies for the species. To our knowledge, this is the first public report on genome map construction by whole-genome mapping for aquatic organisms. Our study demonstrates a promising application of whole-genome mapping to genome map construction for other non-model organisms in a fast and reliable manner.
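The N50 statistic quoted in the abstract above (1.7 Mb, reached with 126 maps) follows a standard definition: sort the map lengths from longest to shortest and accumulate them until at least half of the total length is covered; the length of the last map added is the N50. A minimal Python sketch of that definition is given here. The function name `n50` and the toy lengths are illustrative assumptions, not data from the study.

```python
from typing import Iterable, Tuple

def n50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50 length, number of maps needed to reach half the total length)."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if 2 * running >= total:  # at least half of the total is now covered
            return length, count
    raise ValueError("lengths must be non-empty")

# Toy example with made-up lengths (arbitrary units).
toy_lengths = [2, 8, 3, 5, 2, 4]
print(n50(toy_lengths))
```

Reporting the count alongside the length mirrors the way the abstract states both figures.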
A high-resolution map of the three-dimensional chromatin interactome in human cells.

PubMed

Jin, Fulai; Li, Yan; Dixon, Jesse R; Selvaraj, Siddarth; Ye, Zhen; Lee, Ah Young; Yen, Chia-An; Schmitt, Anthony D; Espinoza, Celso A; Ren, Bing

2013-11-14

A large number of cis-regulatory sequences have been annotated in the human genome, but defining their target genes remains a challenge. One strategy is to identify the long-range looping interactions at these elements with the use of chromosome conformation capture (3C)-based techniques. However, previous studies lack either the resolution or coverage to permit a whole-genome, unbiased view of chromatin interactions. Here we report a comprehensive chromatin interaction map generated in human fibroblasts using a genome-wide 3C analysis method (Hi-C). We determined over one million long-range chromatin interactions at 5-10-kb resolution, and uncovered general principles of chromatin organization at different types of genomic features. We also characterized the dynamics of promoter-enhancer contacts after TNF-α signalling in these cells. Unexpectedly, we found that TNF-α-responsive enhancers are already in contact with their target promoters before signalling. Such pre-existing chromatin looping, which also exists in other cell types with different extracellular signalling, is a strong predictor of gene induction. Our observations suggest that the three-dimensional chromatin landscape, once established in a particular cell type, is relatively stable and could influence the selection or activation of target genes by a ubiquitous transcription activator in a cell-specific manner.
The Fragmented Mitochondrial Ribosomal RNAs of Plasmodium falciparum

PubMed Central

Feagin, Jean E.; Harrell, Maria Isabel; Lee, Jung C.; Coe, Kevin J.; Sands, Bryan H.; Cannone, Jamie J.; Tami, Germaine; Schnare, Murray N.; Gutell, Robin R.

2012-01-01

Background: The mitochondrial genome in the human malaria parasite Plasmodium falciparum is most unusual. Over half the genome is composed of the genes for three classic mitochondrial proteins: cytochrome oxidase subunits I and III and apocytochrome b. The remainder encodes numerous small RNAs, ranging in size from 23 to 190 nt. Previous analysis revealed that some of these transcripts have significant sequence identity with highly conserved regions of large and small subunit rRNAs, and can form the expected secondary structures. However, these rRNA fragments are not encoded in linear order; instead, they are intermixed with one another and the protein coding genes, and are coded on both strands of the genome. This unorthodox arrangement hindered the identification of transcripts corresponding to other regions of rRNA that are highly conserved and/or are known to participate directly in protein synthesis. Principal Findings: The identification of 14 additional small mitochondrial transcripts from P. falciparum and the assignment of 27 small RNAs (12 SSU RNAs totaling 804 nt, 15 LSU RNAs totaling 1233 nt) to specific regions of rRNA are supported by multiple lines of evidence. The regions now represented are highly similar to those of the small but contiguous mitochondrial rRNAs of Caenorhabditis elegans. The P. falciparum rRNA fragments cluster on the interfaces of the two ribosomal subunits in the three-dimensional structure of the ribosome. Significance: All of the rRNA fragments are now presumed to have been identified with experimental methods, and nearly all of these have been mapped onto the SSU and LSU rRNAs. Conversely, all regions of the rRNAs that are known to be directly associated with protein synthesis have been identified in the P. falciparum mitochondrial genome and RNA transcripts. The fragmentation of the rRNA in the P. falciparum mitochondrion is the most extreme example of any rRNA fragmentation discovered. PMID:22761677
Transcriptional activity across the Epstein-Barr virus genome in Raji cells during latency and after induction of an abortive lytic cycle.

PubMed

Kirchner, E A; Bornkamm, G W; Polack, A

1991-10-01

We have studied the relative rate of transcription across the Epstein-Barr virus genome in the Burkitt's lymphoma cell line Raji by nuclear run-on analysis during latency and after induction of an abortive lytic cycle with 12-O-tetradecanoylphorbol 13-acetate (TPA) and 5-iodo-2'-deoxyuridine (IUdR). During latency the entire, or almost the entire, viral genome was found to be transcriptionally active to a low or intermediate extent, with some variation in activity along the genome. The fragment with the highest transcriptional activity was EcoRI J, which contains the genes encoding the small nuclear RNAs EBER1 and -2, transcribed predominantly by RNA polymerase III. An intermediate level of transcription was observed between positions 10 and 138 (kb), with areas of slightly higher activity on the large internal repeats and the left duplicated region (DL). The remaining part of the viral genome, between position 138 and the termini, and the termini and position 10 (kb) (with the exception of the EcoRI J fragment), showed very little transcriptional activity, except for the intermediately active regions carrying the right-hand oriLyt (DR) and the terminal repeats. Upon induction of the viral genome with TPA and IUdR, the viral genome was transcriptionally active at a rate at least tenfold that seen during latency. Polymerases were not equally distributed along the genome after induction; the highest density was found in regions 48 to 58 kb, 82 to 84 kb, 102 to 104 kb, 118 to 122 kb and 142 to 145 kb of the viral genome. High transcriptional activity correlated with distinct transcription units in some cases, i.e. BamHI H1LF1 (DL), BamHI MLF1, BamHI ZLF1/BamHI RLF1 and BamHI X (thymidine kinase), but not in others (BamHI H2). Besides initiation of transcription, other regulatory processes such as stabilization and processing of primary transcripts may also contribute to regulation of virus gene expression. Addition of cycloheximide completely abolished the transcriptional activation of the genome mediated by TPA and IUdR.
Genome-wide analysis of WRKY transcription factors in Solanum lycopersicum.

PubMed

Huang, Shengxiong; Gao, Yongfeng; Liu, Jikai; Peng, Xiaoli; Niu, Xiangli; Fei, Zhangjun; Cao, Shuqing; Liu, Yongsheng

2012-06-01

The WRKY transcription factors have been implicated in multiple biological processes in plants, especially in regulating defense against biotic and abiotic stresses. However, little information is available about the WRKYs in tomato (Solanum lycopersicum). The recent release of the whole-genome sequence of tomato allowed us to perform a genome-wide investigation for tomato WRKY proteins, and to compare these positively identified proteins with their orthologs in model plants, such as Arabidopsis and rice. In the present study, based on the recently released tomato whole-genome sequences, we identified 81 SlWRKY genes that were classified into three main groups, with the second group further divided into five subgroups. A phylogenetic tree constructed from the WRKY domain sequences of tomato, Arabidopsis and rice demonstrated distinct clustering and unique gene expansion of WRKY genes among the three species. Genome mapping analysis revealed that tomato WRKY genes were enriched on several chromosomes, especially on chromosome 5, and 16% of the family members were tandemly duplicated genes. The tomato WRKYs from each group were shown to share similar motif compositions. Furthermore, tomato WRKY genes showed distinct temporal and spatial expression patterns in different developmental processes and in response to various biotic and abiotic stresses. The expression of 18 selected tomato WRKY genes in response to drought and salt stresses and Pseudomonas syringae invasion, respectively, was validated by quantitative RT-PCR. Our results will provide a platform for functional identification and molecular breeding study of WRKY genes in tomato and probably other Solanaceae plants.
Efficient and accurate causal inference with hidden confounders from genome-transcriptome variation data

PubMed Central

2017-01-01

Mapping gene expression as a quantitative trait using whole-genome sequencing and transcriptome analysis allows the discovery of the functional consequences of genetic variation. We developed a novel method and ultra-fast software, Findr, for highly accurate causal inference between gene expression traits using cis-regulatory DNA variations as causal anchors, which improves on current methods by taking into consideration hidden confounders and weak regulations. Findr outperformed existing methods on the DREAM5 Systems Genetics challenge and on the prediction of microRNA and transcription factor targets in human lymphoblastoid cells, while being nearly a million times faster. Findr is publicly available at https://github.com/lingfeiwang/findr. PMID:28821014
Multi-tissue transcriptomics for construction of a comprehensive gene resource for the terrestrial snail Theba pisana.

PubMed

Zhao, M; Wang, T; Adamson, K J; Storey, K B; Cummins, S F

2016-02-08

The land snail Theba pisana is native to the Mediterranean region but has become one of the most abundant invasive species worldwide. Here, we present three transcriptomes of this agricultural pest derived from three tissues: the central nervous system, hepatopancreas (digestive gland), and foot muscle. Sequencing of the three tissues produced 339,479,092 high-quality reads and a global de novo assembly generated a total of 250,848 unique transcripts (unigenes). BLAST analysis mapped 52,590 unigenes to NCBI non-redundant protein databases and further functional analysis annotated 21,849 unigenes with gene ontology. We report that T. pisana transcripts have representatives in all functional classes and a comparison of differentially expressed transcripts amongst all three tissues demonstrates enormous differences in their potential metabolic activities. The differentially expressed genes include those with sequence similarity to genes associated with multiple bacterial diseases and neurological diseases. To provide a valuable resource that will assist functional genomics studies, we have implemented a user-friendly web interface, ThebaDB (http://thebadb.bioinfo-minzhao.org/). This online database allows for complex text queries, sequence searches, and data browsing by enriched functional terms and KEGG mapping.
The MAP kinase substrate MKS1 is a regulator of plant defense responses

PubMed Central

Andreasson, Erik; Jenkins, Thomas; Brodersen, Peter; Thorgrimsen, Stephan; Petersen, Nikolaj H T; Zhu, Shijiang; Qiu, Jin-Long; Micheelsen, Pernille; Rocher, Anne; Petersen, Morten; Newman, Mari-Anne; Bjørn Nielsen, Henrik; Hirt, Heribert; Somssich, Imre; Mattsson, Ole; Mundy, John

2005-01-01

Arabidopsis MAP kinase 4 (MPK4) functions as a regulator of pathogen defense responses, because it is required for both repression of salicylic acid (SA)-dependent resistance and for activation of jasmonate (JA)-dependent defense gene expression. To understand MPK4 signaling mechanisms, we used yeast two-hybrid screening to identify the MPK4 substrate MKS1. Analyses of transgenic plants and genome-wide transcript profiling indicated that MKS1 is required for full SA-dependent resistance in mpk4 mutants, and that overexpression of MKS1 in wild-type plants is sufficient to activate SA-dependent resistance, but does not interfere with induction of a defense gene by JA. Further yeast two-hybrid screening revealed that MKS1 interacts with the WRKY transcription factors WRKY25 and WRKY33. WRKY25 and WRKY33 were shown to be in vitro substrates of MPK4, and a wrky33 knockout mutant was found to exhibit increased expression of the SA-related defense gene PR1. MKS1 may therefore contribute to MPK4-regulated defense activation by coupling the kinase to specific WRKY transcription factors. PMID:15990873
Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes.

PubMed

Riechmann, J L; Heard, J; Martin, G; Reuber, L; Jiang, C; Keddie, J; Adam, L; Pineda, O; Ratcliffe, O J; Samaha, R R; Creelman, R; Pilgrim, M; Broun, P; Zhang, J Z; Ghandehari, D; Sherman, B K; Yu, G

2000-12-15

The completion of the Arabidopsis thaliana genome sequence allows a comparative analysis of transcriptional regulators across the three eukaryotic kingdoms. Arabidopsis dedicates over 5% of its genome to code for more than 1500 transcription factors, about 45% of which are from families specific to plants. Arabidopsis transcription factors that belong to families common to all eukaryotes do not share significant similarity with those of the other kingdoms beyond the conserved DNA binding domains, many of which have been arranged in combinations specific to each lineage. The genome-wide comparison reveals the evolutionary generation of diversity in the regulation of transcription.
Transcription and chromatin determinants of de novo DNA methylation timing in oocytes.

PubMed

Gahurova, Lenka; Tomizawa, Shin-Ichi; Smallwood, Sébastien A; Stewart-Morgan, Kathleen R; Saadeh, Heba; Kim, Jeesun; Andrews, Simon R; Chen, Taiping; Kelsey, Gavin

2017-01-01

Gametogenesis in mammals entails profound re-patterning of the epigenome. In the female germline, DNA methylation is acquired late in oogenesis from an essentially unmethylated baseline and is established largely as a consequence of transcription events. Molecular and functional studies have shown that imprinted genes become methylated at different times during oocyte growth; however, little is known about the kinetics of methylation gain genome wide and the reasons for asynchrony in methylation at imprinted loci. Given the predominant role of transcription, we sought to investigate whether transcription timing is rate limiting for de novo methylation and determines the asynchrony of methylation events. Therefore, we generated genome-wide methylation and transcriptome maps of size-selected, growing oocytes to capture the onset and progression of methylation. We find that most sequence elements, including most classes of transposable elements, acquire methylation at similar rates overall. However, methylation of CpG islands (CGIs) is delayed compared with the genome average and there are reproducible differences amongst CGIs in onset of methylation. Although more highly transcribed genes acquire methylation earlier, the major transitions in the oocyte transcriptome occur well before the de novo methylation phase, indicating that transcription is generally not rate limiting in conferring permissiveness to DNA methylation. Instead, CGI methylation timing negatively correlates with enrichment for histone 3 lysine 4 (H3K4) methylation and dependence on the H3K4 demethylases KDM1A and KDM1B, implicating chromatin remodelling as a major determinant of methylation timing. We also identified differential enrichment of transcription factor binding motifs in CGIs acquiring methylation early or late in oocyte growth. By combining these parameters into multiple regression models, we were able to account for about a fifth of the variation in methylation timing of CGIs. Finally, we show that establishment of non-CpG methylation, which is prevalent in fully grown oocytes, and methylation over non-transcribed regions, are later events in oogenesis. These results do not support a major role for transcriptional transitions in the time of onset of DNA methylation in the oocyte, but suggest a model in which sequences least dependent on chromatin remodelling are the earliest to become permissive for methylation.
Comprehensive Transcriptome Analysis of Response to Nickel Stress in White Birch (Betula papyrifera)

PubMed Central

Theriault, Gabriel; Michael, Paul; Nkongolo, Kabwe

2016-01-01

White birch (Betula papyrifera) is a dominant tree species of the Boreal Forest. Recent studies have shown that it is fairly resistant to heavy metal contamination, specifically to nickel. Knowledge of regulation of genes associated with metal resistance in higher plants is very sketchy. Availability and annotation of the dwarf birch (B. nana) genome enables the use of high-throughput sequencing approaches to understanding responses to environmental challenges in other Betula species such as B. papyrifera. The main objectives of this study are to 1) develop and characterize the B. papyrifera transcriptome, 2) assess gene expression dynamics of B. papyrifera in response to nickel stress, and 3) describe gene function based on ontology. Nickel resistant and susceptible genotypes were selected and used for transcriptome analysis. A total of 208,058 trinity genes were identified and were assembled to 275,545 total trinity transcripts. The transcripts were mapped to protein sequences and, based on the best match, we annotated the B. papyrifera genes and assigned gene ontology. In total, 215,700 transcripts were annotated and were compared to the published B. nana genome. Overall, a genomic match with the reference genome was found for 61% of transcripts. Expression profiles were generated and 62,587 genes were found to be significantly differentially expressed among the nickel resistant, susceptible, and untreated libraries. The main nickel resistance mechanism in B. papyrifera is a downregulation of genes associated with translation (in ribosome), binding, and transporter activities. Five candidate genes associated with nickel resistance were identified. They include Glutathione S-transferase, thioredoxin family protein, putative transmembrane protein and two Nramp transporters. These genes could be useful for genetic engineering of birch trees. PMID:27082755
SpinachDB: A Well-Characterized Genomic Database for Gene Family Classification and SNP Information of Spinach.

PubMed

Yang, Xue-Dong; Tan, Hua-Wei; Zhu, Wei-Min

2016-01-01

Spinach (Spinacia oleracea L.), which originated in central and western Asia, belongs to the family Amaranthaceae. Spinach is one of the most important leafy vegetables, with a high nutritional value, and is also an excellent research material for plant sex chromosome models. Following the completion of genome assembly and gene prediction of spinach, we developed SpinachDB (http://222.73.98.124/spinachdb) to store, annotate, mine and analyze genomics and genetics datasets efficiently. In this study, all 21702 spinach genes were annotated. A total of 15741 spinach genes were catalogued into 4351 families, including identification of a substantial number of transcription factors. To construct a high-density genetic map, a total of 131592 SSRs and 1125743 potential SNPs located in 548801 loci of the spinach genome were identified in 11 cultivated and wild spinach cultivars. Expression profiles were also generated from RNA-seq data using the FPKM method, which allows expression to be compared between genes. Paralogs in spinach and the orthologous genes in Arabidopsis, grape, sugar beet and rice were identified for comparative genome analysis. Finally, the SpinachDB website contains seven main sections, including the homepage; the GBrowse map that integrates genome, genes, SSR and SNP marker information; the Blast alignment service; the gene family classification search tool; the orthologous and paralogous gene pairs search tool; and the download and useful contact information. SpinachDB will be continually expanded to include newly generated robust genomics and genetics data sets along with the associated data mining and analysis tools.
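The FPKM method mentioned in the abstract above normalizes a gene's fragment count by both gene length (per kilobase) and sequencing depth (per million mapped fragments), so values can be compared across genes and libraries. A minimal Python sketch of the standard formula is given here. The function name `fpkm` and the example counts are illustrative assumptions and are not taken from SpinachDB.

```python
def fpkm(fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    FPKM = fragments * 1e9 / (gene_length_bp * total_mapped_fragments)
    The factor 1e9 combines the per-kilobase (1e3) and per-million (1e6) scalings.
    """
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)

# Made-up example: 500 fragments on a 2,000 bp gene in a library of 10 million fragments.
example_value = fpkm(500, 2_000, 10_000_000)
print(example_value)
```

A gene twice as long with the same fragment count would receive half the FPKM, which is the length correction the method is designed to provide.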
Convergent evolution of adenosine aptamers spanning bacterial, human, and random sequences revealed by structure-based bioinformatics and genomic SELEX

PubMed Central

Vu, Michael M. K.; Jameson, Nora E.; Masuda, Stuart J.; Lin, Dana; Larralde-Ridaura, Rosa; Lupták, Andrej

2012-01-01

SUMMARY Aptamers are structured macromolecules in vitro evolved to bind molecular targets, whereas in nature they form the ligand-binding domains of riboswitches. Adenosine aptamers of a single structural family were isolated several times from random pools but they have not been identified in genomic sequences. We used two unbiased methods, structure-based bioinformatics and human genome-based in vitro selection, to identify aptamers that form the same adenosine-binding structure in a bacterium, and several vertebrates, including humans. Two of the human aptamers map to introns of RAB3C and FGD3 genes. The RAB3C aptamer binds ATP with dissociation constants about ten times lower than physiological ATP concentration, while the minimal FGD3 aptamer binds ATP only co-transcriptionally. PMID:23102219
Mapping replication dynamics in Trypanosoma brucei reveals a link with telomere transcription and antigenic variation

PubMed Central

Devlin, Rebecca; Marques, Catarina A; Paape, Daniel; Prorocic, Marko; Zurita-Leal, Andrea C; Campbell, Samantha J; Lapsley, Craig; Dickens, Nicholas; McCulloch, Richard

2016-01-01

Survival of Trypanosoma brucei depends upon switches in its protective Variant Surface Glycoprotein (VSG) coat by antigenic variation. VSG switching occurs by frequent homologous recombination, which is thought to require locus-specific initiation. Here, we show that a RecQ helicase, RECQ2, acts to repair DNA breaks, including in the telomeric site of VSG expression. Despite this, RECQ2 loss does not impair antigenic variation, but causes increased VSG switching by recombination, arguing against models for VSG switch initiation through direct generation of a DNA double strand break (DSB). Indeed, we show DSBs inefficiently direct recombination in the VSG expression site. By mapping genome replication dynamics, we reveal that the transcribed VSG expression site is the only telomeric site that is early replicating – a differential timing only seen in mammal-infective parasites. Specific association between VSG transcription and replication timing reveals a model for antigenic variation based on replication-derived DNA fragility. DOI: http://dx.doi.org/10.7554/eLife.12765.001 PMID:27228154
Aptazyme-embedded guide RNAs enable ligand-responsive genome editing and transcriptional activation

PubMed Central

Tang, Weixin; Hu, Johnny H.; Liu, David R.

2017-01-01

Programmable sequence-specific genome editing agents such as CRISPR-Cas9 have greatly advanced our ability to manipulate the human genome. Although canonical forms of genome-editing agents and programmable transcriptional regulators are constitutively active, precise temporal and spatial control over genome editing and transcriptional regulation activities would enable the more selective and potentially safer use of these powerful technologies. Here, by incorporating ligand-responsive self-cleaving catalytic RNAs (aptazymes) into guide RNAs, we developed a set of aptazyme-embedded guide RNAs that enable small molecule-controlled nuclease-mediated genome editing and small molecule-controlled base editing, as well as small molecule-dependent transcriptional activation in mammalian cells. PMID:28656978
ANISEED 2017: extending the integrated ascidian database to the exploration and evolutionary comparison of genome-scale datasets.

PubMed

Brozovic, Matija; Dantec, Christelle; Dardaillon, Justine; Dauga, Delphine; Faure, Emmanuel; Gineste, Mathieu; Louis, Alexandra; Naville, Magali; Nitta, Kazuhiro R; Piette, Jacques; Reeves, Wendy; Scornavacca, Céline; Simion, Paul; Vincentelli, Renaud; Bellec, Maelle; Aicha, Sameh Ben; Fagotto, Marie; Guéroult-Bellone, Marion; Haeussler, Maximilian; Jacox, Edwin; Lowe, Elijah K; Mendez, Mickael; Roberge, Alexis; Stolfi, Alberto; Yokomori, Rui; Brown, C Titus; Cambillau, Christian; Christiaen, Lionel; Delsuc, Frédéric; Douzery, Emmanuel; Dumollard, Rémi; Kusakabe, Takehiro; Nakai, Kenta; Nishida, Hiroki; Satou, Yutaka; Swalla, Billie; Veeman, Michael; Volff, Jean-Nicolas; Lemaire, Patrick

2018-01-04

ANISEED (www.aniseed.cnrs.fr) is the main model organism database for tunicates, the sister-group of vertebrates. This release gives access to annotated genomes, gene expression patterns, and anatomical descriptions for nine ascidian species. It provides increased integration with external molecular and taxonomy databases, better support for epigenomics datasets, in particular RNA-seq, ChIP-seq and SELEX-seq, and features novel interactive interfaces for existing and novel datatypes. In particular, the cross-species navigation and comparison is enhanced through a novel taxonomy section describing each represented species and through the implementation of interactive phylogenetic gene trees for 60% of tunicate genes. The gene expression section displays the results of RNA-seq experiments for the three major model species of solitary ascidians. Gene expression is controlled by the binding of transcription factors to cis-regulatory sequences. A high-resolution description of the DNA-binding specificity for 131 Ciona robusta (formerly C. intestinalis type A) transcription factors by SELEX-seq is provided and used to map candidate binding sites across the Ciona robusta and Phallusia mammillata genomes. Finally, use of a WashU Epigenome browser enhances genome navigation, while a Genomicus server was set up to explore microsynteny relationships within tunicates and with vertebrates, Amphioxus, echinoderms and hemichordates. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Genome-wide mapping of DNase I hypersensitive sites in rare cell populations using single-cell DNase sequencing.

PubMed

Cooper, James; Ding, Yi; Song, Jiuzhou; Zhao, Keji

2017-11-01

Increased chromatin accessibility is a feature of cell-type-specific cis-regulatory elements; therefore, mapping of DNase I hypersensitive sites (DHSs) enables the detection of active regulatory elements of transcription, including promoters, enhancers, insulators and locus-control regions. Single-cell DNase sequencing (scDNase-seq) is a method of detecting genome-wide DHSs when starting with either single cells or <1,000 cells from primary cell sources. This technique enables genome-wide mapping of hypersensitive sites in a wide range of cell populations that cannot be analyzed using conventional DNase I sequencing because of the requirement for millions of starting cells. Fresh cells, formaldehyde-cross-linked cells or cells recovered from formalin-fixed paraffin-embedded (FFPE) tissue slides are suitable for scDNase-seq assays. To generate scDNase-seq libraries, cells are lysed and then digested with DNase I. Circular carrier plasmid DNA is included during subsequent DNA purification and library preparation steps to prevent loss of the small quantity of DHS DNA. Libraries are generated for high-throughput sequencing on the Illumina platform using standard methods. Preparation of scDNase-seq libraries requires only 2 d. The materials and molecular biology techniques described in this protocol should be accessible to any general molecular biology laboratory. Processing of high-throughput sequencing data requires basic bioinformatics skills and uses publicly available bioinformatics software.

RPO41-independent maintenance of [rho-] mitochondrial DNA in Saccharomyces cerevisiae.

PubMed

Fangman, W L; Henly, J W; Brewer, B J

1990-01-01

A subset of promoters in the mitochondrial DNA (mtDNA) of the yeast Saccharomyces cerevisiae has been proposed to participate in replication initiation, giving rise to a primer through site-specific cleavage of an RNA transcript. To test whether transcription is essential for mtDNA maintenance, we examined two simple mtDNA deletion ([rho-]) genomes in yeast cells. One genome (HS3324) contains a consensus promoter (ATATAAGTA) for the mitochondrial RNA polymerase encoded by the nuclear gene RPO41, and the other genome (4a) does not. As anticipated, in RPO41 cells transcripts from the HS3324 genome were more abundant than were transcripts from the 4a genome. When the RPO41 gene was disrupted, both [rho-] genomes were efficiently maintained. The level of transcripts from HS3324 mtDNA was decreased greater than 400-fold in cells carrying the RPO41 disrupted gene; however, the low-level transcripts from 4a mtDNA were undiminished. These results indicate that replication of [rho-] genomes can be initiated in the absence of wild-type levels of the RPO41-encoded RNA polymerase.
Semantic integration of data on transcriptional regulation

PubMed Central

Baitaluk, Michael; Ponomarenko, Julia

2010-01-01

Motivation: Experimental and predicted data concerning gene transcriptional regulation are distributed among many heterogeneous sources. However, there are no resources to integrate these data automatically or to provide a ‘one-stop shop’ experience for users seeking information essential for deciphering and modeling gene regulatory networks. Results: IntegromeDB, a semantic graph-based ‘deep-web’ data integration system that automatically captures, integrates and manages publicly available data concerning transcriptional regulation, as well as other relevant biological information, is proposed in this article. The problems associated with data integration are addressed by ontology-driven data mapping, multiple data annotation and heterogeneous data querying, also enabling integration of the user's data. IntegromeDB integrates over 100 experimental and computational data sources relating to genomics, transcriptomics, genetics, and functional and interaction data concerning gene transcriptional regulation in eukaryotes and prokaryotes. Availability: IntegromeDB is accessible through the integrated research environment BiologicalNetworks at http://www.BiologicalNetworks.org Contact: baitaluk@sdsc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20427517
Semantic integration of data on transcriptional regulation.

PubMed

Baitaluk, Michael; Ponomarenko, Julia

2010-07-01

Experimental and predicted data concerning gene transcriptional regulation are distributed among many heterogeneous sources. However, there are no resources to integrate these data automatically or to provide a 'one-stop shop' experience for users seeking information essential for deciphering and modeling gene regulatory networks. IntegromeDB, a semantic graph-based 'deep-web' data integration system that automatically captures, integrates and manages publicly available data concerning transcriptional regulation, as well as other relevant biological information, is proposed in this article. The problems associated with data integration are addressed by ontology-driven data mapping, multiple data annotation and heterogeneous data querying, also enabling integration of the user's data. IntegromeDB integrates over 100 experimental and computational data sources relating to genomics, transcriptomics, genetics, and functional and interaction data concerning gene transcriptional regulation in eukaryotes and prokaryotes. IntegromeDB is accessible through the integrated research environment BiologicalNetworks at http://www.BiologicalNetworks.org. Contact: baitaluk@sdsc.edu. Supplementary data are available at Bioinformatics online.
An Endogenous Accelerator for Viral Gene Expression Confers a Fitness Advantage

PubMed Central

Teng, Melissa W.; Bolovan-Fritts, Cynthia; Dar, Roy D.; Womack, Andrew; Simpson, Michael L.; Shenk, Thomas; Weinberger, Leor S.

2012-01-01

Many signaling circuits face a fundamental tradeoff between accelerating their response speed and maintaining final levels below a cytotoxic threshold. Here, we describe a transcriptional circuitry that dynamically converts signaling inputs into faster rates without amplifying final equilibrium levels. Using time-lapse microscopy, we find that transcriptional activators accelerate human cytomegalovirus (CMV) gene expression in single cells without amplifying steady-state expression levels, and this acceleration generates a significant replication advantage. We map the accelerator to a highly self-cooperative transcriptional negative-feedback loop (Hill coefficient ~ 7) generated by homo-multimerization of the virus’s essential transactivator protein IE2 at nuclear PML bodies. Eliminating the IE2-accelerator circuit reduces transcriptional strength through mislocalization of incoming viral genomes away from PML bodies and carries a heavy fitness cost. In general, accelerators may provide a mechanism for signal-transduction circuits to respond quickly to external signals without increasing steady-state levels of potentially cytotoxic molecules. PMID:23260143
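The idea of a cooperative negative-feedback loop that speeds up a response without raising the final level can be illustrated with a generic textbook model. The sketch below is not the model used in the paper: it assumes a simple Hill-type repression term with coefficient 7, and every parameter value (maximal production rate, repression threshold, decay rate, time step) is a hypothetical choice made only for illustration. It compares a self-repressing circuit with an unregulated circuit tuned to the same steady state and measures how long each takes to reach half of its final level.

```python
def simulate(production, decay=1.0, dt=0.001, t_end=20.0):
    """Euler-integrate dx/dt = production(x) - decay * x, starting from x = 0."""
    x = 0.0
    trace = [x]
    for _ in range(int(t_end / dt)):
        x += dt * (production(x) - decay * x)
        trace.append(x)
    return trace


def time_to_half(trace, dt=0.001):
    """Time at which the trace first reaches half of its final value."""
    half = trace[-1] / 2.0
    return next(i for i, x in enumerate(trace) if x >= half) * dt


# Hypothetical parameters (assumptions, not values from the abstract).
BETA = 10.0   # maximal production rate of the self-repressing circuit
K = 1.0       # repression threshold
HILL_N = 7    # Hill coefficient, matching the "~ 7" quoted in the abstract


def feedback_production(x):
    """Production rate repressed by the product itself (Hill repression)."""
    return BETA / (1.0 + (x / K) ** HILL_N)


feedback = simulate(feedback_production)

# Unregulated circuit with constant production, tuned to the same steady state
# (steady state of dx/dt = b - decay * x is b / decay, and decay is 1.0 here).
steady_state = feedback[-1]
unregulated = simulate(lambda x: 1.0 * steady_state)

t_feedback = time_to_half(feedback)
t_unregulated = time_to_half(unregulated)
print("final levels:", feedback[-1], unregulated[-1])
print("time to half-maximum:", t_feedback, t_unregulated)
```

Under these assumed parameters the self-repressing circuit starts with a high production rate that shuts off sharply near the threshold, so it reaches its half-maximum sooner than the unregulated circuit while both settle at the same level.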
Identification of neuronal target genes for CCAAT/Enhancer Binding Proteins

PubMed Central

Kfoury, N.; Kapatos, G.

2009-01-01

CCAAT/Enhancer Binding Proteins (C/EBPs) play pivotal roles in development and plasticity of the nervous system. Identification of the physiological targets of C/EBPs (C/EBP target genes) should therefore provide insight into the underlying biology of these processes. We used unbiased genome-wide mapping to identify 115 C/EBPβ target genes in PC12 cells that include transcription factors, neurotransmitter receptors, ion channels, protein kinases and synaptic vesicle proteins. C/EBPβ binding sites were located primarily within introns, suggesting novel regulatory functions, and were associated with binding sites for other developmentally important transcription factors. Experiments using dominant negatives showed C/EBPβ to repress transcription of a subset of target genes. Target genes in rat brain were subsequently found to preferentially bind C/EBPα, β and δ. Analysis of the hippocampal transcriptome of C/EBPβ knockout mice revealed dysregulation of a high percentage of transcripts identified as C/EBP target genes. These results support the hypothesis that C/EBPs play non-redundant roles in the brain. PMID:19103292
Functional classification of rice flanking sequence tagged genes using MapMan terms and global understanding on metabolic and regulatory pathways affected by dxr mutant having defects in light response.

PubMed

Chandran, Anil Kumar Nalini; Lee, Gang-Seob; Yoo, Yo-Han; Yoon, Ung-Han; Ahn, Byung-Ohg; Yun, Doh-Won; Kim, Jin-Hyun; Choi, Hong-Kyu; An, GynHeung; Kim, Tae-Ho; Jung, Ki-Hong

2016-12-01

Rice is one of the most important food crops for humans. To improve the agronomical traits of rice, the functions of more than 1,000 rice genes have been recently characterized and summarized. The completed, map-based sequence of the rice genome has significantly accelerated the functional characterization of rice genes, but progress remains limited in assigning functions to all predicted non-transposable element (non-TE) genes, estimated to number 37,000-41,000. The International Rice Functional Genomics Consortium (IRFGC) has generated a huge number of gene-indexed mutants by using mutagens such as T-DNA, Tos17 and Ds/dSpm. These mutants have been identified by 246,566 flanking sequence tags (FSTs) and cover 65 % (25,275 of 38,869) of the non-TE genes in rice, while the mutation ratio of TE genes is 25.7 %. In addition, almost 80 % of highly expressed non-TE genes have insertion mutations, indicating that highly expressed genes in rice chromosomes are more likely to have mutations by mutagens such as T-DNA, Ds, dSpm and Tos17. The functions of around 2.5 % of rice genes have been characterized, and studies have mainly focused on transcriptional and post-transcriptional regulation. Slow progress in characterizing the function of rice genes is mainly due to a lack of clues to guide functional studies or to functional redundancy. These limitations can be partially solved by a well-categorized functional classification of FST genes. To create this classification, we used the diverse overviews installed in the MapMan toolkit. Gene Ontology (GO) assignment to FST genes compensated for the limitations of the MapMan overviews. The functions of 863 of 1,022 known genes can be evaluated by current FST lines, indicating that FST genes are useful resources for functional genomic studies. We assigned 16,169 out of 29,624 FST genes to 34 MapMan classes, including three major categories such as DNA, RNA and protein.
To demonstrate the MapMan application on FST genes, transcriptome analysis was performed on a rice FST mutant of the 1-deoxy-D-xylulose 5-phosphate reductoisomerase (DXR) gene. Mapping of 756 down-regulated genes in dxr mutants and their annotation in terms of various MapMan overviews revealed candidate genes downstream of the DXR-mediated light signaling pathway in diverse functional classes such as the methyl-D-erythritol 4-phosphate (MEP) pathway overview, photosynthesis, secondary metabolism and regulatory overview. This report provides a useful guide for systematic phenomics and further applications to enhance the key agronomic traits of rice.
Subtracting the sequence bias from partially digested MNase-seq data reveals a general contribution of TFIIS to nucleosome positioning.

PubMed

Gutiérrez, Gabriel; Millán-Zambrano, Gonzalo; Medina, Daniel A; Jordán-Pla, Antonio; Pérez-Ortín, José E; Peñate, Xenia; Chávez, Sebastián

2017-12-07

TFIIS stimulates RNA cleavage by RNA polymerase II and promotes the resolution of backtracking events. TFIIS acts in the chromatin context, but its contribution to the chromatin landscape has not yet been investigated. Co-transcriptional chromatin alterations include subtle changes in nucleosome positioning, like those expected to be elicited by TFIIS, which are elusive to detect. The most popular method to map nucleosomes involves intensive chromatin digestion by micrococcal nuclease (MNase). Maps based on these exhaustively digested samples miss any MNase-sensitive nucleosomes caused by transcription. In contrast, partial digestion approaches preserve such nucleosomes, but introduce noise due to MNase sequence preferences. A systematic way of correcting this bias for massively parallel sequencing experiments is still missing. To investigate the contribution of TFIIS to the chromatin landscape, we developed a refined nucleosome-mapping method in Saccharomyces cerevisiae. Based on partial MNase digestion and a sequence-bias correction derived from naked DNA cleavage, the refined method efficiently mapped nucleosomes in promoter regions rich in MNase-sensitive structures. The naked DNA correction was also important for mapping gene body nucleosomes, particularly in those genes whose core promoters contain a canonical TATA element. With this improved method, we analyzed the global nucleosomal changes caused by lack of TFIIS. We detected a general increase in nucleosomal fuzziness and more restricted changes in nucleosome occupancy, which concentrated in some gene categories. The TATA-containing genes were preferentially associated with decreased occupancy in gene bodies, whereas the TATA-like genes did so with increased fuzziness. The detected chromatin alterations correlated with functional defects in nascent transcription, as revealed by genomic run-on experiments. 
The combination of partial MNase digestion and naked DNA correction of the sequence bias is a precise nucleosomal mapping method that does not exclude MNase-sensitive nucleosomes. This method is useful for detecting subtle alterations in nucleosome positioning produced by lack of TFIIS. Their analysis revealed that TFIIS generally contributed to nucleosome positioning in both gene promoters and bodies. The independent effect of lack of TFIIS on nucleosome occupancy and fuzziness supports the existence of alternative chromatin dynamics during transcription elongation.
Draft Sequencing of the Heterozygous Diploid Genome of Satsuma (Citrus unshiu Marc.) Using a Hybrid Assembly Approach

PubMed Central

Shimizu, Tokurou; Tanizawa, Yasuhiro; Mochizuki, Takako; Nagasaki, Hideki; Yoshioka, Terutaka; Toyoda, Atsushi; Fujiyama, Asao; Kaminuma, Eli; Nakamura, Yasukazu

2017-01-01

Satsuma (Citrus unshiu Marc.) is one of the most abundantly produced mandarin varieties of citrus, known for its seedless fruit production and as a breeding parent of citrus. De novo assembly of the heterozygous diploid genome of Satsuma (“Miyagawa Wase”) was conducted by a hybrid assembly approach using short-read sequences, three mate-pair libraries, and a long-read sequence of PacBio by the PLATANUS assembler. The assembled sequence, with a total size of 359.7 Mb at the N50 length of 386,404 bp, consisted of 20,876 scaffolds. Pseudomolecules of Satsuma constructed by aligning the scaffolds to three genetic maps showed genome-wide synteny to the genomes of Clementine, pummelo, and sweet orange. Gene prediction by modeling with MAKER-P proposed 29,024 genes and 37,970 mRNA; additionally, gene prediction analysis found candidates for novel genes in several biosynthesis pathways for gibberellin and violaxanthin catabolism. BUSCO scores for the assembled scaffold and predicted transcripts, and another analysis by BAC end sequence mapping indicated the assembled genome consistency was close to those of the haploid Clementine, pummelo, and sweet orange genomes. The number of repeat elements and long terminal repeat retrotransposon were comparable to those of the seven citrus genomes; this suggested no significant failure in the assembly at the repeat region. A resequencing application using the assembled sequence confirmed that both kunenbo-A and Satsuma are offspring of Kishu, and Satsuma is a back-crossed offspring of Kishu. These results illustrated the performance of the hybrid assembly approach and its ability to construct an accurate heterozygous diploid genome. PMID:29259619
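The N50 statistic quoted in this record can be made concrete with a short sketch. It uses the standard definition (the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly), and the scaffold lengths are a hypothetical toy list, not data from the Satsuma assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running sum to at least half
        # of the total assembly size defines the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy assembly, for illustration only.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

A larger N50 for the same total size means the assembly is concentrated in fewer, longer scaffolds, which is why the value is reported alongside the scaffold count.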
Draft Sequencing of the Heterozygous Diploid Genome of Satsuma (Citrus unshiu Marc.) Using a Hybrid Assembly Approach.

PubMed

Shimizu, Tokurou; Tanizawa, Yasuhiro; Mochizuki, Takako; Nagasaki, Hideki; Yoshioka, Terutaka; Toyoda, Atsushi; Fujiyama, Asao; Kaminuma, Eli; Nakamura, Yasukazu

2017-01-01

Satsuma (Citrus unshiu Marc.) is one of the most abundantly produced mandarin varieties of citrus, known for its seedless fruit production and as a breeding parent of citrus. De novo assembly of the heterozygous diploid genome of Satsuma ("Miyagawa Wase") was conducted by a hybrid assembly approach using short-read sequences, three mate-pair libraries, and a long-read sequence of PacBio by the PLATANUS assembler. The assembled sequence, with a total size of 359.7 Mb at the N50 length of 386,404 bp, consisted of 20,876 scaffolds. Pseudomolecules of Satsuma constructed by aligning the scaffolds to three genetic maps showed genome-wide synteny to the genomes of Clementine, pummelo, and sweet orange. Gene prediction by modeling with MAKER-P proposed 29,024 genes and 37,970 mRNA; additionally, gene prediction analysis found candidates for novel genes in several biosynthesis pathways for gibberellin and violaxanthin catabolism. BUSCO scores for the assembled scaffold and predicted transcripts, and another analysis by BAC end sequence mapping indicated the assembled genome consistency was close to those of the haploid Clementine, pummelo, and sweet orange genomes. The number of repeat elements and long terminal repeat retrotransposon were comparable to those of the seven citrus genomes; this suggested no significant failure in the assembly at the repeat region. A resequencing application using the assembled sequence confirmed that both kunenbo-A and Satsuma are offspring of Kishu, and Satsuma is a back-crossed offspring of Kishu. These results illustrated the performance of the hybrid assembly approach and its ability to construct an accurate heterozygous diploid genome.
Identification and reproducibility of diagnostic DNA markers for tuber starch and yield optimization in a novel association mapping population of potato (Solanum tuberosum L.).

PubMed

Schönhals, E M; Ortega, F; Barandalla, L; Aragones, A; Ruiz de Galarreta, J I; Liao, J-C; Sanetomo, R; Walkemeier, B; Tacke, E; Ritter, E; Gebhardt, C

2016-04-01

SNPs in candidate genes Pain-1, InvCD141 (invertases), SSIV (starch synthase), StCDF1 (transcription factor), LapN (leucine aminopeptidase), and cytoplasm type are associated with potato tuber yield, starch content and/or starch yield. Tuber yield (TY), starch content (TSC), and starch yield (TSY) are complex characters of high importance for the potato crop in general and for industrial starch production in particular. DNA markers associated with superior alleles of genes that control the natural variation of TY, TSC, and TSY could increase precision and speed of breeding new cultivars optimized for potato starch production. Diagnostic DNA markers are identified by association mapping in populations of tetraploid potato varieties and advanced breeding clones. A novel association mapping population of 282 genotypes including varieties, breeding clones and Andean landraces was assembled and field evaluated in Northern Spain for TY, TSC, TSY, tuber number (TN) and tuber weight (TW). The landraces had lower mean values of TY, TW, TN, and TSY. The population was genotyped for 183 microsatellite alleles, 221 single nucleotide polymorphisms (SNPs) in fourteen candidate genes and eight known diagnostic markers for TSC and TSY. Association test statistics including kinship and population structure reproduced five known marker-trait associations of candidate genes and discovered new ones, particularly for tuber yield and starch yield. The inclusion of landraces increased the number of detected marker-trait associations. Integration of the present association mapping results with previous QTL linkage mapping studies for TY, TSC, TSY, TW, TN, and tuberization revealed some hot spots of QTL for these traits in the potato genome. The genomic positions of markers linked or associated with QTL for complex tuber traits suggest high multiplicity and genome wide distribution of the underlying genes.
Markers and mapping revisited: finding your gene.

PubMed

Jones, Neil; Ougham, Helen; Thomas, Howard; Pasakinskiene, Izolda

2009-01-01

This paper is an update of our earlier review (Jones et al., 1997, Markers and mapping: we are all geneticists now. New Phytologist 137: 165-177), which dealt with the genetics of mapping, in terms of recombination as the basis of the procedure, and covered some of the first generation of markers, including restriction fragment length polymorphisms (RFLPs), random amplified polymorphic DNA (RAPDs), simple sequence repeats (SSRs) and quantitative trait loci (QTLs). In the intervening decade there have been numerous developments in marker science with many new systems becoming available, which are herein described: cleavage amplification polymorphism (CAP), sequence-specific amplification polymorphism (S-SAP), inter-simple sequence repeat (ISSR), sequence tagged site (STS), sequence characterized amplification region (SCAR), selective amplification of microsatellite polymorphic loci (SAMPL), single nucleotide polymorphism (SNP), expressed sequence tag (EST), sequence-related amplified polymorphism (SRAP), target region amplification polymorphism (TRAP), microarrays, diversity arrays technology (DArT), single-strand conformation polymorphism (SSCP), denaturing gradient gel electrophoresis (DGGE), temperature gradient gel electrophoresis (TGGE) and methylation-sensitive PCR. In addition there has been an explosion of knowledge and databases in the area of genomics and bioinformatics. The number of flowering plant ESTs is c. 19 million and counting, with all the opportunity that this provides for gene-hunting, while the survey of bioinformatics and computer resources points to a rapid growth point for future activities in unravelling and applying the burst of new information on plant genomes. A case study is presented on tracking down a specific gene (stay-green (SGR), a post-transcriptional senescence regulator) using the full suite of mapping tools and comparative mapping resources. 
We end with a brief speculation on how genome analysis may progress into the future of this highly dynamic arena of plant science.
In silico comparison of genomic regions containing genes coding for enzymes and transcription factors for the phenylpropanoid pathway in Phaseolus vulgaris L. and Glycine max L. Merr

PubMed Central

Reinprecht, Yarmilla; Yadegari, Zeinab; Perry, Gregory E.; Siddiqua, Mahbuba; Wright, Lori C.; McClean, Phillip E.; Pauls, K. Peter

2013-01-01

Legumes contain a variety of phytochemicals derived from the phenylpropanoid pathway that have important effects on human health as well as seed coat color, plant disease resistance and nodulation. However, the information about the genes involved in this important pathway is fragmentary in common bean (Phaseolus vulgaris L.). The objectives of this research were to isolate genes that function in and control the phenylpropanoid pathway in common bean, determine their genomic locations in silico in common bean and soybean, and analyze sequences of the 4CL gene family in two common bean genotypes. Sequences of phenylpropanoid pathway genes available for common bean or other plant species were aligned, and the conserved regions were used to design sequence-specific primers. The PCR products were cloned and sequenced and the gene sequences along with common bean gene-based (g) markers were BLASTed against the Glycine max v.1.0 genome and the P. vulgaris v.1.0 (Andean) early release genome. In addition, gene sequences were BLASTed against the OAC Rex (Mesoamerican) genome sequence assembly. In total, fragments of 46 structural and regulatory phenylpropanoid pathway genes were characterized in this way and placed in silico on common bean and soybean sequence maps. The maps contain over 250 common bean g and SSR (simple sequence repeat) markers and identify the positions of more than 60 additional phenylpropanoid pathway gene sequences, plus the putative locations of seed coat color genes. The majority of cloned phenylpropanoid pathway gene sequences were mapped to one location in the common bean genome but had two positions in soybean. The comparison of the genomic maps confirmed previous studies, which show that common bean and soybean share genomic regions, including those containing phenylpropanoid pathway gene sequences, with conserved synteny. 
Indels identified in the comparison of Andean and Mesoamerican common bean 4CL gene sequences might be used to develop inter-pool phenylpropanoid pathway gene-based markers. We anticipate that the information obtained by this study will simplify and accelerate selections of common bean with specific phenylpropanoid pathway alleles to increase the contents of beneficial phenylpropanoids in common bean and other legumes. PMID:24046770
Genome-wide mapping of infection-induced SINE RNAs reveals a role in selective mRNA export.

PubMed

Karijolich, John; Zhao, Yang; Alla, Ravi; Glaunsinger, Britt

2017-06-02

Short interspersed nuclear elements (SINEs) are retrotransposons evolutionarily derived from endogenous RNA Polymerase III RNAs. Though SINE elements have undergone exaptation into gene regulatory elements, how transcribed SINE RNA impacts transcriptional and post-transcriptional regulation is largely unknown. This is partly due to a lack of information regarding which of the loci have transcriptional potential. Here, we present an approach (short interspersed nuclear element sequencing, SINE-seq), which selectively profiles RNA Polymerase III-derived SINE RNA, thereby identifying transcriptionally active SINE loci. Applying SINE-seq to monitor murine B2 SINE expression during a gammaherpesvirus infection revealed transcription from 28 270 SINE loci, with ∼50% of active SINE elements residing within annotated RNA Polymerase II loci. Furthermore, B2 RNA can form intermolecular RNA-RNA interactions with complementary mRNAs, leading to nuclear retention of the targeted mRNA via a mechanism involving p54nrb. These findings illuminate a pathway for the selective regulation of mRNA export during stress via retrotransposon activation. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Condensin controls recruitment of RNA polymerase II to achieve nematode X-chromosome dosage compensation

PubMed Central

Kruesi, William S; Core, Leighton J; Waters, Colin T; Lis, John T; Meyer, Barbara J

2013-01-01

The X-chromosome gene regulatory process called dosage compensation ensures that males (1X) and females (2X) express equal levels of X-chromosome transcripts. The mechanism in Caenorhabditis elegans has been elusive due to improperly annotated transcription start sites (TSSs). Here we define TSSs and the distribution of transcriptionally engaged RNA polymerase II (Pol II) genome-wide in wild-type and dosage-compensation-defective animals to dissect this regulatory mechanism. Our TSS-mapping strategy integrates GRO-seq, which tracks nascent transcription, with a new derivative of this method, called GRO-cap, which recovers nascent RNAs with 5′ caps prior to their removal by co-transcriptional processing. Our analyses reveal that promoter-proximal pausing is rare, unlike in other metazoans, and promoters are unexpectedly far upstream from the 5′ ends of mature mRNAs. We find that C. elegans equalizes X-chromosome expression between the sexes, to a level equivalent to autosomes, by reducing Pol II recruitment to promoters of hermaphrodite X-linked genes using a chromosome-restructuring condensin complex. DOI: http://dx.doi.org/10.7554/eLife.00808.001 PMID:23795297
Genome-wide mapping of infection-induced SINE RNAs reveals a role in selective mRNA export

PubMed Central

Zhao, Yang; Alla, Ravi

2017-01-01

Short interspersed nuclear elements (SINEs) are retrotransposons evolutionarily derived from endogenous RNA Polymerase III RNAs. Though SINE elements have undergone exaptation into gene regulatory elements, how transcribed SINE RNA impacts transcriptional and post-transcriptional regulation is largely unknown. This is partly due to a lack of information regarding which of the loci have transcriptional potential. Here, we present an approach (short interspersed nuclear element sequencing, SINE-seq), which selectively profiles RNA Polymerase III-derived SINE RNA, thereby identifying transcriptionally active SINE loci. Applying SINE-seq to monitor murine B2 SINE expression during a gammaherpesvirus infection revealed transcription from 28 270 SINE loci, with ∼50% of active SINE elements residing within annotated RNA Polymerase II loci. Furthermore, B2 RNA can form intermolecular RNA–RNA interactions with complementary mRNAs, leading to nuclear retention of the targeted mRNA via a mechanism involving p54nrb. These findings illuminate a pathway for the selective regulation of mRNA export during stress via retrotransposon activation. PMID:28334904
Alterations of immune response of non-small cell lung cancer with Azacytidine

PubMed Central

Easwaran, Hariharan; Mohammad, Helai P.; Vendetti, Frank; VanCriekinge, Wim; DeMeyer, Tim; Du, Zhengzong; Parsana, Princy; Rodgers, Kristen; Yen, Ray-Whay; Zahnow, Cynthia A.; Taube, Janis M.; Brahmer, Julie R.; Tykodi, Scott S.; Easton, Keith; Carvajal, Richard D.; Jones, Peter A.; Laird, Peter W.; Weisenberger, Daniel J.; Tsai, Salina; Juergens, Rosalyn A.; Topalian, Suzanne L.; Rudin, Charles M.; Brock, Malcolm V.; Pardoll, Drew; Baylin, Stephen B.

2013-01-01

Innovative therapies are needed for advanced Non-Small Cell Lung Cancer (NSCLC). We have undertaken a genomics-based, hypothesis-driving approach to query an emerging potential that epigenetic therapy may sensitize to immune checkpoint therapy targeting PD-L1/PD-1 interaction. NSCLC cell lines were treated with the DNA hypomethylating agent azacytidine (AZA – Vidaza) and genes and pathways altered were mapped by genome-wide expression and DNA methylation analyses. AZA-induced pathways were analyzed in The Cancer Genome Atlas (TCGA) project by mapping the derived gene signatures in hundreds of lung adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC) samples. AZA up-regulates genes and pathways related to both innate and adaptive immunity and genes related to immune evasion in several NSCLC lines. DNA hypermethylation and low expression of IRF7, an interferon transcription factor, tracks with this signature particularly in LUSC. In concert with these events, AZA up-regulates PD-L1 transcripts and protein, a key ligand-mediator of immune tolerance. Analysis of TCGA samples demonstrates that a significant proportion of primary NSCLC have low expression of AZA-induced immune genes, including PD-L1. We hypothesize that epigenetic therapy combined with blockade of immune checkpoints – in particular the PD-1/PD-L1 pathway – may augment response of NSCLC by shifting the balance between immune activation and immune inhibition, particularly in a subset of NSCLC with low expression of these pathways. Our studies define a biomarker strategy for response in a recently initiated trial to examine the potential of epigenetic therapy to sensitize patients with NSCLC to PD-1 immune checkpoint blockade. PMID:24162015
Pervasive Transcription of a Herpesvirus Genome Generates Functionally Important RNAs

PubMed Central

Canny, Susan P.; Reese, Tiffany A.; Johnson, L. Steven; Zhang, Xin; Kambal, Amal; Duan, Erning; Liu, Catherine Y.; Virgin, Herbert W.

2014-01-01

Pervasive transcription is observed in a wide range of organisms, including humans, mice, and viruses, but the functional significance of the resulting transcripts remains uncertain. Current genetic approaches are often limited by their emphasis on protein-coding open reading frames (ORFs). We previously identified extensive pervasive transcription from the murine gammaherpesvirus 68 (MHV68) genome outside known ORFs and antisense to known genes (termed expressed genomic regions [EGRs]). Similar antisense transcripts have been identified in many other herpesviruses, including Kaposi’s sarcoma-associated herpesvirus and human and murine cytomegalovirus. Despite their prevalence, whether these RNAs have any functional importance in the viral life cycle is unknown, and one interpretation is that these are merely “noise” generated by functionally unimportant transcriptional events. To determine whether pervasive transcription of a herpesvirus genome generates RNA molecules that are functionally important, we used a strand-specific functional approach to target transcripts from thirteen EGRs in MHV68. We found that targeting transcripts from six EGRs reduced viral protein expression, proving that pervasive transcription can generate functionally important RNAs. We characterized transcripts emanating from EGRs 26 and 27 in detail using several methods, including RNA sequencing, and identified several novel polyadenylated transcripts that were enriched in the nuclei of infected cells. These data provide the first evidence of the functional importance of regions of pervasive transcription emanating from MHV68 EGRs. Therefore, studies utilizing mutation of a herpesvirus genome must account for possible effects on RNAs generated by pervasive transcription. PMID:24618256
Bombyx mori Transcription Factors: Genome-Wide Identification, Expression Profiles and Response to Pathogens by Microarray Analysis

PubMed Central

Huang, Lulin; Cheng, Tingcai; Xu, Pingzhen; Fang, Ting; Xia, Qingyou

2012-01-01

Transcription factors are present in all living organisms, and play vital roles in a wide range of biological processes. Studies of transcription factors will help reveal the complex regulation mechanism of organisms. So far, hundreds of domains have been identified that show transcription factor activity. Here, 281 reported transcription factor domains were used as seeds to search the transcription factors in genomes of Bombyx mori L. (Lepidoptera: Bombycidae) and four other model insects. Overall, 666 transcription factors including 36 basal factors and 630 other factors were identified in B. mori genome, which accounted for 4.56% of its genome. The silkworm transcription factors' expression profiles were investigated in relation to multiple tissues, developmental stages, sexual dimorphism, and responses to oral infection by pathogens and direct bacterial injection. These all provided rich clues for revealing the transcriptional regulation mechanism of silkworm organ differentiation, growth and development, sexual dimorphism, and response to pathogen infection. PMID:22943524
Transcriptional Elongation Control of Hepatitis B Virus Covalently Closed Circular DNA Transcription by Super Elongation Complex and BRD4.

PubMed

Francisco, Joel Celio; Dai, Qian; Luo, Zhuojuan; Wang, Yan; Chong, Roxanne Hui-Heng; Tan, Yee Joo; Xie, Wei; Lee, Guan-Huei; Lin, Chengqi

2017-10-01

Chronic hepatitis B virus (HBV) infection can lead to liver cirrhosis and hepatocellular carcinoma. HBV reactivation during or after chemotherapy is a potentially fatal complication for cancer patients with chronic HBV infection. Transcription of HBV is a critical intermediate step of the HBV life cycle. However, factors controlling HBV transcription remain largely unknown. Here, we found that different P-TEFb complexes are involved in the transcription of the HBV viral genome. Both BRD4 and the super elongation complex (SEC) bind to the HBV genome. The treatment of bromodomain inhibitor JQ1 stimulates HBV transcription and increases the occupancy of BRD4 on the HBV genome, suggesting the bromodomain-independent recruitment of BRD4 to the HBV genome. JQ1 also leads to the increased binding of SEC to the HBV genome, and SEC is required for JQ1-induced HBV transcription. These findings reveal a novel mechanism by which the HBV genome hijacks the host P-TEFb-containing complexes to promote its own transcription. Our findings also point out an important clinical implication, that is, the potential risk of HBV reactivation during therapy with a BRD4 inhibitor, such as JQ1 or its analogues, which are a potential treatment for acute myeloid leukemia. Copyright © 2017 American Society for Microbiology.
An apple MYB transcription factor, MdMYB3, is involved in regulation of anthocyanin biosynthesis and flower development

PubMed Central

2013-01-01

Background: Red coloration of fruit is an important trait in apple, and it is mainly attributed to the accumulation of anthocyanins, a class of plant flavonoid metabolites. Anthocyanin biosynthesis is genetically determined by structural and regulatory genes. Plant tissue pigmentation patterns are mainly controlled by expression profiles of regulatory genes. Among these regulatory genes are MYB transcription factors (TFs), wherein the class of two-repeats (R2R3) is deemed the largest, and these are associated with the anthocyanin biosynthesis pathway. Although three MdMYB genes, almost identical in nucleotide sequences, have been identified in apple, it is likely that other R2R3 MYB TFs present in the apple genome are also involved in the regulation of red pigmentation of the skin of apple fruits. Results: In this study, a novel R2R3 MYB gene has been isolated and characterized in apple. This MYB gene is closely related to the Arabidopsis thaliana AtMYB3, and has been designated as MdMYB3. This TF belongs to the subgroup 4 R2R3 family of plant MYB transcription factors. This apple MdMYB3 gene is mapped onto linkage group 15 of the integrated apple genetic map. Transcripts of MdMYB3 are detected in all analyzed tissues including leaves, flowers, and fruits. However, transcript levels of MdMYB3 are higher in the exocarp of red-skinned apple cultivars than in yellowish-green skinned apple cultivars. When this gene is ectopically expressed in Nicotiana tabacum cv. Petite Havana SR1, flowers of transgenic tobacco lines carrying MdMYB3 have exhibited increased pigmentation and accumulate higher levels of anthocyanins and flavonols than wild-type flowers. Overexpression of MdMYB3 has resulted in transcriptional activation of several flavonoid pathway genes, including CHS, CHI, UFGT, and FLS.
Moreover, peduncles of flowers and styles of pistils of transgenic plants overexpressing MdMYB3 are longer than those of wild-type plants, thus suggesting that this TF is involved in regulation of flower development. Conclusions: This study has identified a novel MYB transcription factor in the apple genome. This TF, designated as MdMYB3, is involved in transcriptional activation of several flavonoid pathway genes. Moreover, this TF not only regulates the accumulation of anthocyanin in the skin of apple fruits, but it is also involved in the regulation of flower development, particularly that of pistil development. PMID:24199943

An apple MYB transcription factor, MdMYB3, is involved in regulation of anthocyanin biosynthesis and flower development.

PubMed

Vimolmangkang, Sornkanok; Han, Yuepeng; Wei, Guochao; Korban, Schuyler S

2013-11-07

Red coloration of fruit is an important trait in apple, and it is mainly attributed to the accumulation of anthocyanins, a class of plant flavonoid metabolites. Anthocyanin biosynthesis is genetically determined by structural and regulatory genes. Plant tissue pigmentation patterns are mainly controlled by expression profiles of regulatory genes. Among these regulatory genes are MYB transcription factors (TFs), wherein the class of two-repeats (R2R3) is deemed the largest, and these are associated with the anthocyanin biosynthesis pathway. Although three MdMYB genes, almost identical in nucleotide sequences, have been identified in apple, it is likely that other R2R3 MYB TFs present in the apple genome are also involved in the regulation of red pigmentation of the skin of apple fruits. In this study, a novel R2R3 MYB gene has been isolated and characterized in apple. This MYB gene is closely related to the Arabidopsis thaliana AtMYB3, and has been designated as MdMYB3. This TF belongs to the subgroup 4 R2R3 family of plant MYB transcription factors. This apple MdMYB3 gene is mapped onto linkage group 15 of the integrated apple genetic map. Transcripts of MdMYB3 are detected in all analyzed tissues including leaves, flowers, and fruits. However, transcript levels of MdMYB3 are higher in the exocarp of red-skinned apple cultivars than in yellowish-green skinned apple cultivars. When this gene is ectopically expressed in Nicotiana tabacum cv. Petite Havana SR1, flowers of transgenic tobacco lines carrying MdMYB3 have exhibited increased pigmentation and accumulate higher levels of anthocyanins and flavonols than wild-type flowers. Overexpression of MdMYB3 has resulted in transcriptional activation of several flavonoid pathway genes, including CHS, CHI, UFGT, and FLS.
Moreover, peduncles of flowers and styles of pistils of transgenic plants overexpressing MdMYB3 are longer than those of wild-type plants, thus suggesting that this TF is involved in regulation of flower development. This study has identified a novel MYB transcription factor in the apple genome. This TF, designated as MdMYB3, is involved in transcriptional activation of several flavonoid pathway genes. Moreover, this TF not only regulates the accumulation of anthocyanin in the skin of apple fruits, but it is also involved in the regulation of flower development, particularly that of pistil development.
SeqTU: A web server for identification of bacterial transcription units

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Xin; Chou, Wen-Chi; Ma, Qin

A transcription unit (TU) consists of K ≥ 1 consecutive genes on the same strand of a bacterial genome that are transcribed into a single mRNA molecule under certain conditions. Their identification is an essential step in elucidation of transcriptional regulatory networks. We have recently developed a machine-learning method to accurately identify TUs from RNA-seq data, based on two features of the assembled RNA reads: the continuity and stability of RNA-seq coverage across a genomic region. While good performance was achieved by the method on Escherichia coli and Clostridium thermocellum, substantial work is needed to make the program generally applicable to all bacteria, knowing that the program requires organism specific information. A web server, named SeqTU, was developed to automatically identify TUs with given RNA-seq data of any bacterium using a machine-learning approach. The server consists of a number of utility tools, in addition to TU identification, such as data preparation, data quality check and RNA-read mapping. SeqTU provides a user-friendly interface and automated prediction of TUs from given RNA-seq data. Furthermore, the predicted TUs are displayed intuitively using HTML format along with a graphic visualization of the prediction.
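The TU definition above (consecutive same-strand genes covered by one transcript) can be illustrated with a small sketch. This is not the SeqTU machine-learning method; it is a simplified greedy heuristic that assumes a per-base coverage list and merges neighbouring genes only when coverage never drops below a threshold in the intergenic gap. The function name, the tuple layout, and the `min_cov` parameter are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

# (name, start, end, strand) with 0-based, half-open coordinates (an assumed layout)
Gene = Tuple[str, int, int, str]


def group_transcription_units(
    genes: Sequence[Gene], coverage: Sequence[int], min_cov: int = 1
) -> List[List[str]]:
    """Greedily merge consecutive same-strand genes into candidate TUs.

    Two neighbouring genes are joined when they lie on the same strand and
    every intergenic base between them has coverage >= min_cov.
    """
    units: List[List[Gene]] = []
    for gene in sorted(genes, key=lambda g: g[1]):
        if units:
            prev = units[-1][-1]
            gap = coverage[prev[2]:gene[1]]  # empty when genes abut or overlap
            if prev[3] == gene[3] and all(c >= min_cov for c in gap):
                units[-1].append(gene)
                continue
        units.append([gene])
    return [[g[0] for g in unit] for unit in units]


genes = [("a", 0, 3, "+"), ("b", 5, 8, "+"), ("c", 10, 12, "+"), ("d", 13, 15, "-")]
coverage = [5, 5, 5, 4, 4, 5, 5, 5, 0, 0, 3, 3, 3, 3, 3]
print(group_transcription_units(genes, coverage))  # [['a', 'b'], ['c'], ['d']]
```

A real predictor would also need the "stability" feature mentioned in the abstract (how evenly coverage is distributed), which this sketch ignores.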
SeqTU: A web server for identification of bacterial transcription units

DOE PAGES

Chen, Xin; Chou, Wen-Chi; Ma, Qin; ...

2017-03-07

A transcription unit (TU) consists of K ≥ 1 consecutive genes on the same strand of a bacterial genome that are transcribed into a single mRNA molecule under certain conditions. Their identification is an essential step in elucidation of transcriptional regulatory networks. We have recently developed a machine-learning method to accurately identify TUs from RNA-seq data, based on two features of the assembled RNA reads: the continuity and stability of RNA-seq coverage across a genomic region. While good performance was achieved by the method on Escherichia coli and Clostridium thermocellum, substantial work is needed to make the program generally applicable to all bacteria, knowing that the program requires organism specific information. A web server, named SeqTU, was developed to automatically identify TUs with given RNA-seq data of any bacterium using a machine-learning approach. The server consists of a number of utility tools, in addition to TU identification, such as data preparation, data quality check and RNA-read mapping. SeqTU provides a user-friendly interface and automated prediction of TUs from given RNA-seq data. Furthermore, the predicted TUs are displayed intuitively using HTML format along with a graphic visualization of the prediction.
Molecular dissection of transcriptional reprogramming of steviol glycosides synthesis in leaf tissue during developmental phase transitions in Stevia rebaudiana Bert.

PubMed

Singh, Gopal; Singh, Gagandeep; Singh, Pradeep; Parmar, Rajni; Paul, Navgeet; Vashist, Radhika; Swarnkar, Mohit Kumar; Kumar, Ashok; Singh, Sanatsujat; Singh, Anil Kumar; Kumar, Sanjay; Sharma, Ram Kumar

2017-09-19

Stevia is a natural source of commercially important steviol glycosides (SGs), which share a biosynthesis route with gibberellic acids (GAs) through the plastidal MEP and cytosolic MVA pathways. Ontogeny-dependent deviation in SG biosynthesis, one of the key factors for global cultivation of Stevia, has not been studied at the transcriptional level. To dissect the underlying molecular mechanism, we followed a global transcriptome sequencing approach and generated more than 100 million reads. Annotation of 41,262 de novo assembled transcripts identified all the genes required for SG and GA biosynthesis. Differential gene expression and quantitative analysis of important pathway genes (DXS, HMGR, KA13H) and gene regulators (WRKY, MYB, NAC TFs) indicated developmental phase-dependent utilization of metabolic flux between SG and GA synthesis. Further, identification of 124 CYPs and 45 UGTs enriches the genomic resources, and their PPI network analysis with SG/GA biosynthesis proteins identifies putative candidates involved in metabolic changes, as supported by their developmental phase-dependent expression. These putative targets can expedite molecular breeding and genetic engineering efforts to enhance SG content, biomass and yield. In the future, the generated dataset will be a useful resource for development of functional molecular markers for diversity characterization, genome mapping and evolutionary studies in Stevia.
AKT1, LKB1, and YAP1 revealed as MYC interactors with NanoLuc-based protein-fragment complementation assay.

Cancer.gov

The c-Myc (MYC) transcription factor is a major cancer driver and a well-validated therapeutic target. However, directly targeting MYC has been challenging. Thus, identifying proteins that interact with and regulate MYC may provide alternative strategies to inhibit its oncogenic activity. Here we report the development of a NanoLuc®-based protein-fragment complementation assay (NanoPCA) and mapping of the MYC protein interaction hub in live mammalian cells.
Rheumatoid arthritis association at 6q23

PubMed Central

Thomson, Wendy; Barton, Anne; Ke, Xiayi; Eyre, Steve; Hinks, Anne; Bowes, John; Donn, Rachelle; Symmons, Deborah; Hider, Samantha; Bruce, Ian N; Wilson, Anthony G; Marinou, Ioanna; Morgan, Ann; Emery, Paul; Carter, Angela; Steer, Sophia; Hocking, Lynne; Reid, David M; Wordsworth, Paul; Harrison, Pille; Strachan, David; Worthington, Jane

2009-01-01

The Wellcome Trust Case Control Consortium (WTCCC) identified nine single SNPs putatively associated with rheumatoid arthritis at P = 1 × 10⁻⁵ to 5 × 10⁻⁷ in a genome-wide association screen. One, rs6920220, was unequivocally replicated (trend P = 1.1 × 10⁻⁸) in a validation study, as described here. This SNP maps to 6q23, between the genes oligodendrocyte lineage transcription factor 3 (OLIG3) and tumor necrosis factor-α-induced protein 3 (TNFAIP3). PMID:17982455
GenomeRNAi: a database for cell-based RNAi phenotypes.

PubMed

Horn, Thomas; Arziman, Zeynep; Berger, Juerg; Boutros, Michael

2007-01-01

RNA interference (RNAi) has emerged as a powerful tool to generate loss-of-function phenotypes in a variety of organisms. Combined with the sequence information of almost completely annotated genomes, RNAi technologies have opened new avenues to conduct systematic genetic screens for every annotated gene in the genome. As increasingly large datasets of RNAi-induced phenotypes become available, an important challenge remains the systematic integration and annotation of functional information. Genome-wide RNAi screens have been performed both in Caenorhabditis elegans and Drosophila for a variety of phenotypes and several RNAi libraries have become available to assess phenotypes for almost every gene in the genome. These screens were performed using different types of assays from visible phenotypes to focused transcriptional readouts and provide a rich data source for functional annotation across different species. The GenomeRNAi database provides access to published RNAi phenotypes obtained from cell-based screens and maps them to their genomic locus, including possible non-specific regions. The database also gives access to sequence information of RNAi probes used in various screens. It can be searched by phenotype, by gene, by RNAi probe or by sequence and is accessible at http://rnai.dkfz.de.
GenomeRNAi: a database for cell-based RNAi phenotypes

PubMed Central

Horn, Thomas; Arziman, Zeynep; Berger, Juerg; Boutros, Michael

2007-01-01

RNA interference (RNAi) has emerged as a powerful tool to generate loss-of-function phenotypes in a variety of organisms. Combined with the sequence information of almost completely annotated genomes, RNAi technologies have opened new avenues to conduct systematic genetic screens for every annotated gene in the genome. As increasingly large datasets of RNAi-induced phenotypes become available, an important challenge remains the systematic integration and annotation of functional information. Genome-wide RNAi screens have been performed both in Caenorhabditis elegans and Drosophila for a variety of phenotypes and several RNAi libraries have become available to assess phenotypes for almost every gene in the genome. These screens were performed using different types of assays from visible phenotypes to focused transcriptional readouts and provide a rich data source for functional annotation across different species. The GenomeRNAi database provides access to published RNAi phenotypes obtained from cell-based screens and maps them to their genomic locus, including possible non-specific regions. The database also gives access to sequence information of RNAi probes used in various screens. It can be searched by phenotype, by gene, by RNAi probe or by sequence and is accessible at http://rnai.dkfz.de. PMID:17135194
TEA: the epigenome platform for Arabidopsis methylome study.

PubMed

Su, Sheng-Yao; Chen, Shu-Hwa; Lu, I-Hsuan; Chiang, Yih-Shien; Wang, Yu-Bin; Chen, Pao-Yang; Lin, Chung-Yen

2016-12-22

Bisulfite sequencing (BS-seq) has become a standard technology to profile genome-wide DNA methylation at single-base resolution. It allows researchers to conduct genome-wide cytosine methylation analyses on questions of genomic imprinting, transcriptional regulation, cellular development and differentiation. A single dataset from a BS-Seq experiment is resolved into many features according to the sequence contexts, making methylome data analysis and data visualization a complex task. We developed a streamlined platform, TEA, for analyzing and visualizing data from whole-genome BS-Seq (WGBS) experiments conducted in the model plant Arabidopsis thaliana. To capture the essence of the genome methylation level and to be efficient enough to run online, we introduce a straightforward method for measuring genome methylation in each sequence context by gene. The method is scripted in Java to process BS-Seq mapping results. Through a simple data uploading process, the TEA server deploys a web-based platform for deep analysis by linking data to an updated Arabidopsis annotation database and toolkits. TEA is an intuitive and efficient online platform for analyzing the Arabidopsis genomic DNA methylation landscape. It provides several ways to help users exploit WGBS data. TEA is freely accessible for academic users at: http://tea.iis.sinica.edu.tw.
Hierarchical regulation of the genome: global changes in nucleosome organization potentiate genome response

PubMed Central

Sexton, Brittany S.; Druliner, Brooke R.; Vera, Daniel L.; Avey, Denis; Zhu, Fanxiu; Dennis, Jonathan H.

2016-01-01

Nucleosome occupancy is critically important in regulating access to the eukaryotic genome. Few studies in human cells have measured genome-wide nucleosome distributions at high temporal resolution during a response to a common stimulus. We measured nucleosome distributions at high temporal resolution following Kaposi's-sarcoma-associated herpesvirus (KSHV) reactivation using our newly developed mTSS-seq technology, which maps nucleosome distribution at the transcription start sites (TSS) of all human genes. Nucleosomes underwent widespread changes in organization 24 hours after KSHV reactivation and returned to their basal nucleosomal architecture 48 hours after KSHV reactivation. The widespread changes consisted of an indiscriminate remodeling event resulting in the loss of nucleosome rotational phasing signals. Additionally, one in six TSSs in the human genome possessed nucleosomes that are translationally remodeled. 72% of the loci with translationally remodeled nucleosomes have nucleosomes that moved to positions encoded by the underlying DNA sequence. Finally we demonstrated that these widespread alterations in nucleosomal architecture potentiated regulatory factor binding. These descriptions of nucleosomal architecture changes provide a new framework for understanding the role of chromatin in the genomic response, and have allowed us to propose a hierarchical model for chromatin-based regulation of genome response. PMID:26771136
Lineage-specific expansions of retroviral insertions within the genomes of African great apes but not humans and orangutans.

PubMed

Yohn, Chris T; Jiang, Zhaoshi; McGrath, Sean D; Hayden, Karen E; Khaitovich, Philipp; Johnson, Matthew E; Eichler, Marla Y; McPherson, John D; Zhao, Shaying; Pääbo, Svante; Eichler, Evan E

2005-04-01

Retroviral infections of the germline have the potential to episodically alter gene function and genome structure during the course of evolution. Horizontal transmissions between species have been proposed, but little evidence exists for such events in the human/great ape lineage of evolution. Based on analysis of finished BAC chimpanzee genome sequence, we characterize a retroviral element (Pan troglodytes endogenous retrovirus 1 [PTERV1]) that has become integrated in the germline of African great ape and Old World monkey species but is absent from humans and Asian ape genomes. We unambiguously map 287 retroviral integration sites and determine that approximately 95.8% of the insertions occur at non-orthologous regions between closely related species. Phylogenetic analysis of the endogenous retrovirus reveals that the gorilla and chimpanzee elements share a monophyletic origin with a subset of the Old World monkey retroviral elements, but that the average sequence divergence exceeds neutral expectation for a strictly nuclear inherited DNA molecule. Within the chimpanzee, there is a significant integration bias against genes, with only 14 of these insertions mapping within intronic regions. Six out of ten of these genes, for which there are expression data, show significant differences in transcript expression between human and chimpanzee. Our data are consistent with a retroviral infection that bombarded the genomes of chimpanzees and gorillas independently and concurrently, 3-4 million years ago. We speculate on the potential impact of such recent events on the evolution of humans and great apes.
Nencki Genomics Database--Ensembl funcgen enhanced with intersections, user data and genome-wide TFBS motifs.

PubMed

Krystkowiak, Izabella; Lenart, Jakub; Debski, Konrad; Kuterba, Piotr; Petas, Michal; Kaminska, Bozena; Dabrowski, Michal

2013-01-01

We present the Nencki Genomics Database, which extends the functionality of Ensembl Regulatory Build (funcgen) for the three species: human, mouse and rat. The key enhancements over Ensembl funcgen include the following: (i) a user can add private data, analyze them alongside the public data and manage access rights; (ii) inside the database, we provide efficient procedures for computing intersections between regulatory features and for mapping them to the genes. To Ensembl funcgen-derived data, which include data from ENCODE, we add information on conserved non-coding (putative regulatory) sequences, and on genome-wide occurrence of transcription factor binding site motifs from the current versions of two major motif libraries, namely, Jaspar and Transfac. The intersections and mapping to the genes are pre-computed for the public data, and the result of any procedure run on the data added by the users is stored back into the database, thus incrementally increasing the body of pre-computed data. As the Ensembl funcgen schema for the rat is currently not populated, our database is the first database of regulatory features for this frequently used laboratory animal. The database is accessible without registration using the mysql client: mysql -h database.nencki-genomics.org -u public. Registration is required only to add or access private data. A WSDL webservice provides access to the database from any SOAP client, including the Taverna Workbench with a graphical user interface.
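The "intersections between regulatory features" and "mapping them to the genes" described above boil down to interval overlap on a shared chromosome. The following is a minimal in-memory sketch of that idea, not the database's stored procedures; the tuple layout, the half-open coordinate convention, and the function name are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# (id, chromosome, start, end) with 0-based, half-open coordinates (assumed layout)
Interval = Tuple[str, str, int, int]


def intersect_features(
    features: Sequence[Interval], genes: Sequence[Interval]
) -> List[Tuple[str, str, int]]:
    """Return (feature_id, gene_id, overlap_bp) for every overlapping pair.

    Quadratic scan for clarity; a sorted sweep or an interval tree would be
    the usual choice for genome-scale inputs.
    """
    hits = []
    for f_id, f_chrom, f_start, f_end in features:
        for g_id, g_chrom, g_start, g_end in genes:
            if f_chrom != g_chrom:
                continue
            overlap = min(f_end, g_end) - max(f_start, g_start)
            if overlap > 0:
                hits.append((f_id, g_id, overlap))
    return hits


features = [("f1", "chr1", 100, 200), ("f2", "chr1", 500, 600), ("f3", "chr2", 100, 200)]
genes = [("g1", "chr1", 150, 400), ("g2", "chr2", 0, 120)]
print(intersect_features(features, genes))  # [('f1', 'g1', 50), ('f3', 'g2', 20)]
```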
Genome-Wide Association Study Identifies Novel Loci Associated With Diisocyanate-Induced Occupational Asthma

PubMed Central

Yucesoy, Berran; Kaufman, Kenneth M.; Lummus, Zana L.; Weirauch, Matthew T.; Zhang, Ge; Cartier, André; Boulet, Louis-Philippe; Sastre, Joaquin; Quirce, Santiago; Tarlo, Susan M.; Cruz, Maria-Jesus; Munoz, Xavier; Harley, John B.; Bernstein, David I.

2015-01-01

Diisocyanates, reactive chemicals used to produce polyurethane products, are the most common causes of occupational asthma. The aim of this study is to identify susceptibility gene variants that could contribute to the pathogenesis of diisocyanate asthma (DA) using a Genome-Wide Association Study (GWAS) approach. Genome-wide single nucleotide polymorphism (SNP) genotyping was performed in 74 diisocyanate-exposed workers with DA and 824 healthy controls using Omni-2.5 and Omni-5 SNP microarrays. We identified 11 SNPs that exceeded genome-wide significance; the strongest association was for the rs12913832 SNP located on chromosome 15, which has been mapped to the HERC2 gene (p = 6.94 × 10⁻¹⁴). Strong associations were also found for SNPs near the ODZ3 and CDH17 genes on chromosomes 4 and 8 (rs908084, p = 8.59 × 10⁻⁹ and rs2514805, p = 1.22 × 10⁻⁸, respectively). We also prioritized 38 SNPs with suggestive genome-wide significance (p < 1 × 10⁻⁶). Among them, 17 SNPs map to the PITPNC1, ACMSD, ZBTB16, ODZ3, and CDH17 gene loci. Functional genomics data indicate that 2 of the suggestive SNPs (rs2446823 and rs2446824) are located within putative binding sites for the CCAAT/Enhancer Binding Protein (CEBP) and Hepatocyte Nuclear Factor 4, Alpha transcription factors (TFs), respectively. This study identified SNPs mapping to the HERC2, CDH17, and ODZ3 genes as potential susceptibility loci for DA. Pathway analysis indicated that these genes are associated with antigen processing and presentation, and other immune pathways. Overlap of 2 suggestive SNPs with likely TF binding sites suggests possible roles in disruption of gene regulation. These results provide new insights into the genetic architecture of DA and serve as a basis for future functional and mechanistic studies. PMID:25918132
Normal and compound poisson approximations for pattern occurrences in NGS reads.

PubMed

Zhai, Zhiyuan; Reinert, Gesine; Song, Kai; Waterman, Michael S; Luan, Yihui; Sun, Fengzhu

2012-06-01

Next generation sequencing (NGS) technologies are now widely used in many biological studies. In NGS, sequence reads are randomly sampled from the genome sequence of interest. Most computational approaches for NGS data first map the reads to the genome and then analyze the data based on the mapped reads. Since many organisms have unknown genome sequences and many reads cannot be uniquely mapped to the genomes even if the genome sequences are known, alternative analytical methods are needed for the study of NGS data. Here we suggest using word patterns to analyze NGS data. Word pattern counting (the study of the probabilistic distribution of the number of occurrences of word patterns in one or multiple long sequences) has played an important role in molecular sequence analysis. However, no studies are available on the distribution of the number of occurrences of word patterns in NGS reads. In this article, we build probabilistic models for the background sequence and the sampling process of the sequence reads from the genome. Based on the models, we provide normal and compound Poisson approximations for the number of occurrences of word patterns from the sequence reads, with bounds on the approximation error. The main challenge is to consider the randomness in generating the long background sequence, as well as in the sampling of the reads using NGS. We show the accuracy of these approximations under a variety of conditions for different patterns with various characteristics. Under realistic assumptions, the compound Poisson approximation seems to outperform the normal approximation in most situations. These approximate distributions can be used to evaluate the statistical significance of the occurrence of patterns from NGS data. The theory and the computational algorithm for calculating the approximate distributions are then used to analyze ChIP-Seq data using transcription factor GABP. 
Software is available online (www-rcf.usc.edu/∼fsun/Programs/NGS_motif_power/NGS_motif_power.html). In addition, Supplementary Material can be found online (www.liebertonline.com/cmb).
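To make the idea of word pattern counting in reads concrete, here is a small sketch that counts (possibly overlapping) occurrences of a word across reads and compares the count with its expectation under an i.i.d. background. It uses a plain Poisson tail, which is simpler than the compound Poisson approximation discussed in the abstract and does not model clumps of overlapping occurrences; all function names and the i.i.d. assumption are illustrative choices, not the authors' software.

```python
import math
from typing import Mapping, Sequence


def count_word(reads: Sequence[str], word: str) -> int:
    """Count occurrences of `word` in all reads, allowing overlaps."""
    k = len(word)
    return sum(
        1 for read in reads for i in range(len(read) - k + 1) if read[i:i + k] == word
    )


def expected_count(reads: Sequence[str], word: str, base_probs: Mapping[str, float]) -> float:
    """Expected occurrences under an i.i.d. background with the given base frequencies."""
    k = len(word)
    p_word = math.prod(base_probs[b] for b in word)
    n_positions = sum(max(len(read) - k + 1, 0) for read in reads)
    return n_positions * p_word


def poisson_upper_tail(observed: int, mean: float) -> float:
    """P(X >= observed) for X ~ Poisson(mean), by direct summation of the CDF."""
    cdf = sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(observed))
    return 1.0 - cdf


reads = ["ACGTACGT", "TTACGA", "GG"]
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
observed = count_word(reads, "ACG")            # 3 occurrences in this toy input
mean = expected_count(reads, "ACG", uniform)   # 10 positions * (1/4)^3
p_value = poisson_upper_tail(observed, mean)
```

For self-overlapping words such as "AAA", occurrences cluster and the plain Poisson tail is a poor fit, which is one motivation for a compound Poisson model.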
Unique cell-type-specific patterns of DNA methylation in the root meristem.

PubMed

Kawakatsu, Taiji; Stuart, Tim; Valdes, Manuel; Breakfield, Natalie; Schmitz, Robert J; Nery, Joseph R; Urich, Mark A; Han, Xinwei; Lister, Ryan; Benfey, Philip N; Ecker, Joseph R

2016-04-29

DNA methylation is an epigenetic modification that differs between plant organs and tissues, but the extent of variation between cell types is not known. Here, we report single-base-resolution whole-genome DNA methylomes, mRNA transcriptomes and small RNA transcriptomes for six cell populations covering the major cell types of the Arabidopsis root meristem. We identify widespread cell-type-specific patterns of DNA methylation, especially in the CHH sequence context, where H is A, C or T. The genome of the columella root cap is the most highly methylated Arabidopsis cell characterized so far. It is hypermethylated within transposable elements (TEs), accompanied by increased abundance of transcripts encoding RNA-directed DNA methylation (RdDM) pathway components and 24-nt small RNAs (smRNAs). The absence of the nucleosome remodeller DECREASED DNA METHYLATION 1 (DDM1), required for maintenance of DNA methylation, and low abundance of histone transcripts involved in heterochromatin formation suggests that a loss of heterochromatin may occur in the columella, thus allowing access of RdDM factors to the whole genome, and producing an excess of 24-nt smRNAs in this tissue. Together, these maps provide new insights into the epigenomic diversity that exists between distinct plant somatic cell types.
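The sequence contexts mentioned above (CG, CHG, and CHH, where H is A, C or T) are determined by the two bases downstream of a cytosine. A minimal sketch of that classification on the forward strand follows; the function name and the choice to return `None` for ambiguous or truncated contexts are assumptions for illustration, and reverse-strand cytosines would need the reverse complement first.

```python
from typing import Optional

H_BASES = frozenset("ACT")  # IUPAC code H: any base except G


def cytosine_context(seq: str, pos: int) -> Optional[str]:
    """Classify the forward-strand cytosine at seq[pos] as 'CG', 'CHG' or 'CHH'.

    Returns None when the downstream bases are missing or ambiguous (e.g. 'N').
    """
    seq = seq.upper()
    if seq[pos] != "C":
        raise ValueError(f"position {pos} is {seq[pos]!r}, not a cytosine")
    downstream = seq[pos + 1:pos + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) < 2 or downstream[0] not in H_BASES:
        return None
    if downstream[1] == "G":
        return "CHG"
    if downstream[1] in H_BASES:
        return "CHH"
    return None


for s in ("CGA", "CAG", "CTT"):
    print(s, cytosine_context(s, 0))
```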
Regulatory Architecture of Gene Expression Variation in the Threespine Stickleback Gasterosteus aculeatus.

PubMed

Pritchard, Victoria L; Viitaniemi, Heidi M; McCairns, R J Scott; Merilä, Juha; Nikinmaa, Mikko; Primmer, Craig R; Leder, Erica H

2017-01-05

Much adaptive evolutionary change is underlain by mutational variation in regions of the genome that regulate gene expression rather than in the coding regions of the genes themselves. An understanding of the role of gene expression variation in facilitating local adaptation will be aided by an understanding of underlying regulatory networks. Here, we characterize the genetic architecture of gene expression variation in the threespine stickleback (Gasterosteus aculeatus), an important model in the study of adaptive evolution. We collected transcriptomic and genomic data from 60 half-sib families using an expression microarray and genotyping-by-sequencing, and located expression quantitative trait loci (eQTL) underlying the variation in gene expression in liver tissue using an interval mapping approach. We identified eQTL for several thousand expression traits. Expression was influenced by polymorphism in both cis- and trans-regulatory regions. Trans-eQTL clustered into hotspots. We did not identify master transcriptional regulators in hotspot locations: rather, the presence of hotspots may be driven by complex interactions between multiple transcription factors. One observed hotspot colocated with a QTL recently found to underlie salinity tolerance in the threespine stickleback. However, most other observed hotspots did not colocate with regions of the genome known to be involved in adaptive divergence between marine and freshwater habitats. Copyright © 2017 Pritchard et al.
GETPrime 2.0: gene- and transcript-specific qPCR primers for 13 species including polymorphisms

PubMed Central

David, Fabrice P.A.; Rougemont, Jacques; Deplancke, Bart

2017-01-01

GETPrime (http://bbcftools.epfl.ch/getprime) is a database with a web frontend providing gene- and transcript-specific, pre-computed qPCR primer pairs. The primers have been optimized for genome-wide specificity and for allowing the selective amplification of one or several splice variants of most known genes. To ease selection, primers have also been ranked according to defined criteria such as genome-wide specificity (with BLAST), amplicon size, and isoform coverage. Here, we report a major upgrade (2.0) of the database: eight new species (yeast, chicken, macaque, chimpanzee, rat, platypus, pufferfish, and Anolis carolinensis) now complement the five already included in the previous version (human, mouse, zebrafish, fly, and worm). Furthermore, the genomic reference has been updated to Ensembl v81 (while keeping earlier versions for backward compatibility) as a result of re-designing the back-end database and automating the import of relevant sections of the Ensembl database in species-independent fashion. This also allowed us to map known polymorphisms to the primers (on average three per primer for human), with the aim of reducing experimental error when targeting specific strains or individuals. Another consequence is that the inclusion of future Ensembl releases and other species has now become a relatively straightforward task. PMID:28053161
QTLs Regulating the Contents of Antioxidants, Phenolics, and Flavonoids in Soybean Seeds Share a Common Genomic Region.

PubMed

Li, Man-Wah; Muñoz, Nacira B; Wong, Chi-Fai; Wong, Fuk-Ling; Wong, Kwong-Sen; Wong, Johanna Wing-Hang; Qi, Xinpeng; Li, Kwan-Pok; Ng, Ming-Sin; Lam, Hon-Ming

2016-01-01

Soybean seeds are a rich source of phenolic compounds, especially isoflavonoids, which are important nutraceuticals. Our study using 14 wild- and 16 cultivated-soybean accessions shows that seeds from cultivated soybeans generally contain lower total antioxidants compared to their wild counterparts, likely an unintended consequence of domestication or human selection. Using a recombinant inbred population resulting from a wild and a cultivated soybean parent and a bin map approach, we have identified an overlapping genomic region containing major quantitative trait loci (QTLs) that regulate the seed contents of total antioxidants, phenolics, and flavonoids. The QTL for seed antioxidant content contains 14 annotated genes based on the Williams 82 reference genome (Gmax1.01). None of these genes encodes functions that are related to the phenylpropanoid pathway of soybean. However, we found three putative Multidrug And Toxic Compound Extrusion (MATE) transporter genes within this QTL and one adjacent to it (GmMATE1-4). Moreover, we have identified non-synonymous changes between GmMATE1 and GmMATE2, and found that GmMATE3 encodes an antisense transcript that is expressed in pods. Whether the polymorphisms in GmMATE proteins are major determinants of the antioxidant contents, or whether the antisense transcripts of GmMATE3 play important regulatory roles, awaits further functional investigations.
Regulatory Architecture of Gene Expression Variation in the Threespine Stickleback Gasterosteus aculeatus

PubMed Central

Pritchard, Victoria L.; Viitaniemi, Heidi M.; McCairns, R. J. Scott; Merilä, Juha; Nikinmaa, Mikko; Primmer, Craig R.; Leder, Erica H.

2016-01-01

Much adaptive evolutionary change is underlain by mutational variation in regions of the genome that regulate gene expression rather than in the coding regions of the genes themselves. An understanding of the role of gene expression variation in facilitating local adaptation will be aided by an understanding of underlying regulatory networks. Here, we characterize the genetic architecture of gene expression variation in the threespine stickleback (Gasterosteus aculeatus), an important model in the study of adaptive evolution. We collected transcriptomic and genomic data from 60 half-sib families using an expression microarray and genotyping-by-sequencing, and located expression quantitative trait loci (eQTL) underlying the variation in gene expression in liver tissue using an interval mapping approach. We identified eQTL for several thousand expression traits. Expression was influenced by polymorphism in both cis- and trans-regulatory regions. Trans-eQTL clustered into hotspots. We did not identify master transcriptional regulators in hotspot locations: rather, the presence of hotspots may be driven by complex interactions between multiple transcription factors. One observed hotspot colocated with a QTL recently found to underlie salinity tolerance in the threespine stickleback. However, most other observed hotspots did not colocate with regions of the genome known to be involved in adaptive divergence between marine and freshwater habitats. PMID:27836907
An Ultra-High-Density, Transcript-Based, Genetic Map of Lettuce

PubMed Central

Truco, Maria José; Ashrafi, Hamid; Kozik, Alexander; van Leeuwen, Hans; Bowers, John; Wo, Sebastian Reyes Chin; Stoffel, Kevin; Xu, Huaqin; Hill, Theresa; Van Deynze, Allen; Michelmore, Richard W.

2013-01-01

We have generated an ultra-high-density genetic map for lettuce, an economically important member of the Compositae, consisting of 12,842 unigenes (13,943 markers) mapped in 3696 genetic bins distributed over nine chromosomal linkage groups. Genomic DNA was hybridized to a custom Affymetrix oligonucleotide array containing 6.4 million features representing 35,628 unigenes of Lactuca spp. Segregation of single-position polymorphisms was analyzed using 213 F7:8 recombinant inbred lines that had been generated by crossing cultivated Lactuca sativa cv. Salinas and L. serriola acc. US96UC23, the wild progenitor species of L. sativa. The high level of replication of each allele in the recombinant inbred lines was exploited to identify single-position polymorphisms that were assigned to parental haplotypes. Marker information has been made available using GBrowse to facilitate access to the map. This map has been anchored to the previously published integrated map of lettuce providing candidate genes for multiple phenotypes. The high density of markers achieved in this ultradense map allowed syntenic studies between lettuce and Vitis vinifera as well as other plant species. PMID:23550116

An Ultra-High-Density, Transcript-Based, Genetic Map of Lettuce.

PubMed

Truco, Maria José; Ashrafi, Hamid; Kozik, Alexander; van Leeuwen, Hans; Bowers, John; Wo, Sebastian Reyes Chin; Stoffel, Kevin; Xu, Huaqin; Hill, Theresa; Van Deynze, Allen; Michelmore, Richard W

2013-04-09

We have generated an ultra-high-density genetic map for lettuce, an economically important member of the Compositae, consisting of 12,842 unigenes (13,943 markers) mapped in 3696 genetic bins distributed over nine chromosomal linkage groups. Genomic DNA was hybridized to a custom Affymetrix oligonucleotide array containing 6.4 million features representing 35,628 unigenes of Lactuca spp. Segregation of single-position polymorphisms was analyzed using 213 F7:8 recombinant inbred lines that had been generated by crossing cultivated Lactuca sativa cv. Salinas and L. serriola acc. US96UC23, the wild progenitor species of L. sativa. The high level of replication of each allele in the recombinant inbred lines was exploited to identify single-position polymorphisms that were assigned to parental haplotypes. Marker information has been made available using GBrowse to facilitate access to the map. This map has been anchored to the previously published integrated map of lettuce providing candidate genes for multiple phenotypes. The high density of markers achieved in this ultradense map allowed syntenic studies between lettuce and Vitis vinifera as well as other plant species. Copyright © 2013 Truco et al.
Translational profiling of B cells infected with the Epstein-Barr virus reveals 5' leader ribosome recruitment through upstream open reading frames.

PubMed

Bencun, Maja; Klinke, Olaf; Hotz-Wagenblatt, Agnes; Klaus, Severina; Tsai, Ming-Han; Poirey, Remy; Delecluse, Henri-Jacques

2018-04-06

The Epstein-Barr virus (EBV) genome encodes several hundred transcripts. We have used ribosome profiling to characterize viral translation in infected cells and map new translation initiation sites. We show here that EBV transcripts are translated with highly variable efficiency, owing to variable transcription and translation rates, variable ribosome recruitment to the leader region and coverage by monosomes versus polysomes. Some transcripts were hardly translated, others mainly carried monosomes, showed ribosome accumulation in leader regions and most likely represent non-coding RNAs. A similar process was visible for a subset of lytic genes including the key transactivators BZLF1 and BRLF1 in cells infected with weakly replicating EBV strains. This suggests that ribosome trapping, particularly in the leader region, represents a new checkpoint for the repression of lytic replication. We could identify 25 upstream open reading frames (uORFs) located upstream of coding transcripts that displayed 5' leader ribosome trapping, six of which were located in the leader region shared by many latent transcripts. These uORFs repressed viral translation and are likely to play an important role in the regulation of EBV translation.
[Transcription activator-like effectors (TALEs) based genome engineering].

PubMed

Zhao, Mei-Wei; Duan, Cheng-Li; Liu, Jiang

2013-10-01

Systematic reverse-engineering of functional genome architecture requires precise modifications of gene sequences and transcription levels. The development and application of transcription activator-like effectors (TALEs) has created a wealth of genome engineering possibilities. TALEs are a class of naturally occurring DNA-binding proteins found in the plant pathogen Xanthomonas species. The DNA-binding domain of each TALE typically consists of tandem 34-amino acid repeat modules rearranged according to a simple cipher to target new DNA sequences. Customized TALEs can be used for a wide variety of genome engineering applications, including transcriptional modulation and genome editing. Such "genome engineering" has now been established in human cells and a number of model organisms, thus opening the door to better understanding gene function in model organisms, improving traits in crop plants and treating human genetic disorders.
HEDD: Human Enhancer Disease Database

PubMed Central

Wang, Zhen; Zhang, Quanwei; Zhang, Wen; Lin, Jhih-Rong; Cai, Ying; Mitra, Joydeep

2018-01-01

Enhancers, as specialized genomic cis-regulatory elements, activate transcription of their target genes and play an important role in pathogenesis of many human complex diseases. Despite recent systematic identification of them in the human genome, currently there is an urgent need for comprehensive annotation databases of human enhancers with a focus on their disease connections. In response, we built the Human Enhancer Disease Database (HEDD) to facilitate studies of enhancers and their potential roles in human complex diseases. HEDD currently provides comprehensive genomic information for ∼2.8 million human enhancers identified by ENCODE, FANTOM5 and RoadMap with disease association scores based on enhancer–gene and gene–disease connections. It also provides Web-based analytical tools to visualize enhancer networks and score enhancers given a set of selected genes in a specific gene network. HEDD is freely accessible at http://zdzlab.einstein.yu.edu/1/hedd.php. PMID:29077884
Motif discovery and motif finding from genome-mapped DNase footprint data.

PubMed

Kulakovskiy, Ivan V; Favorov, Alexander V; Makeev, Vsevolod J

2009-09-15

Footprint data is an important source of information on transcription factor recognition motifs. However, a footprinting fragment can contain no sequences similar to known protein recognition sites. Inspection of genome fragments nearby can help to identify missing site positions. Genome fragments containing footprints were supplied to a pipeline that constructed a position weight matrix (PWM) for different motif lengths and selected the optimal PWM. Fragments were aligned with the SeSiMCMC sampler and a new heuristic algorithm, Bigfoot. Footprints with missing hits were found for approximately 50% of factors. Adding only 2 bp on both sides of a footprinting fragment recovered most hits. We automatically constructed motifs for 41 Drosophila factors. New motifs can recognize footprints with a greater sensitivity at the same false positive rate than existing models. Also we discuss possible overfitting of constructed motifs. Software and the collection of regulatory motifs are freely available at http://line.imb.ac.ru/DMMPMM.
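The position weight matrix idea in the abstract above can be illustrated with a minimal Python sketch. This is not the SeSiMCMC or Bigfoot algorithm: it assumes the sites are already aligned and of equal length, uses a uniform background and a pseudocount of 1 (both arbitrary choices), and the function names `build_pwm`, `best_hit` and `extend_footprint` are hypothetical.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from pre-aligned, equal-length sites (A/C/G/T only)."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos].upper()] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((c / total) / background) for b, c in counts.items()})
    return pwm

def best_hit(pwm, sequence):
    """Return (score, offset) of the best-scoring window on the given strand."""
    width = len(pwm)
    best = None
    for offset in range(len(sequence) - width + 1):
        window = sequence[offset:offset + width].upper()
        score = sum(column[base] for column, base in zip(pwm, window))
        if best is None or score > best[0]:
            best = (score, offset)
    return best

def extend_footprint(start, end, flank, genome_length):
    """Widen a half-open [start, end) footprint by `flank` bp on each side."""
    return max(0, start - flank), min(genome_length, end + flank)

# Illustrative, made-up sites; not data from the study.
example_sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
example_pwm = build_pwm(example_sites)
example_score, example_offset = best_hit(example_pwm, "GGTATAATCC")
```

Scanning the extended interval returned by `extend_footprint` rather than the original footprint is one simple way to look for hits that straddle the footprint boundary, which is the situation the 2 bp extension in the abstract addresses.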
Genetics of climate change adaptation.

PubMed

Franks, Steven J; Hoffmann, Ary A

2012-01-01

The rapid rate of current global climate change is having strong effects on many species and, at least in some cases, is driving evolution, particularly when changes in conditions alter patterns of selection. Climate change thus provides an opportunity for the study of the genetic basis of adaptation. Such studies include a variety of observational and experimental approaches, such as sampling across clines, artificial evolution experiments, and resurrection studies. These approaches can be combined with a number of techniques in genetics and genomics, including association and mapping analyses, genome scans, and transcription profiling. Recent research has revealed a number of candidate genes potentially involved in climate change adaptation and has also illustrated that genetic regulatory networks and epigenetic effects may be particularly relevant for evolution driven by climate change. Although genetic and genomic data are rapidly accumulating, we still have much to learn about the genetic architecture of climate change adaptation.
Horizontal Gene Transfer of Pectinases from Bacteria Preceded the Diversification of Stick and Leaf Insects

PubMed Central

Shelomi, Matan; Danchin, Etienne G. J.; Heckel, David; Wipfler, Benjamin; Bradler, Sven; Zhou, Xin; Pauchet, Yannick

2016-01-01

Genes acquired by horizontal transfer are increasingly being found in animal genomes. Understanding their origin and evolution requires knowledge about the phylogenetic relationships from both source and recipient organisms. We used RNASeq data and respective assembled transcript libraries to trace the evolutionary history of polygalacturonase (pectinase) genes in stick insects (Phasmatodea). By mapping the distribution of pectinase genes on a Polyneoptera phylogeny, we identified the transfer of pectinase genes from known phasmatodean gut microbes into the genome of an early euphasmatodean ancestor that took place between 60 and 100 million years ago. This transfer preceded the rapid diversification of the suborder, enabling symbiont-free pectinase production that would increase the insects’ digestive efficiency and reduce dependence on microbes. Bacteria-to-insect gene transfer was thought to be uncommon, however the increasing availability of large-scale genomic data may change this prevailing notion. PMID:27210832
Transcription Factors Bind Thousands of Active and Inactive Regions in the Drosophila Blastoderm

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Xiao-Yong; MacArthur, Stewart; Bourgon, Richard

2008-01-10

Identifying the genomic regions bound by sequence-specific regulatory factors is central both to deciphering the complex DNA cis-regulatory code that controls transcription in metazoans and to determining the range of genes that shape animal morphogenesis. Here, we use whole-genome tiling arrays to map sequences bound in Drosophila melanogaster embryos by the six maternal and gap transcription factors that initiate anterior-posterior patterning. We find that these sequence-specific DNA binding proteins bind with quantitatively different specificities to highly overlapping sets of several thousand genomic regions in blastoderm embryos. Specific high- and moderate-affinity in vitro recognition sequences for each factor are enriched in bound regions. This enrichment, however, is not sufficient to explain the pattern of binding in vivo and varies in a context-dependent manner, demonstrating that higher-order rules must govern targeting of transcription factors. The more highly bound regions include all of the over forty well-characterized enhancers known to respond to these factors as well as several hundred putative new cis-regulatory modules clustered near developmental regulators and other genes with patterned expression at this stage of embryogenesis. The new targets include most of the microRNAs (miRNAs) transcribed in the blastoderm, as well as all major zygotically transcribed dorsal-ventral patterning genes, whose expression we show to be quantitatively modulated by anterior-posterior factors. In addition to these highly bound regions, there are several thousand regions that are reproducibly bound at lower levels. However, these poorly bound regions are, collectively, far more distant from genes transcribed in the blastoderm than highly bound regions; are preferentially found in protein-coding sequences; and are less conserved than highly bound regions. Together these observations suggest that many of these poorly-bound regions are not involved in early-embryonic transcriptional regulation, and a significant proportion may be nonfunctional. Surprisingly, for five of the six factors, their recognition sites are not unambiguously more constrained evolutionarily than the immediate flanking DNA, even in more highly bound and presumably functional regions, indicating that comparative DNA sequence analysis is limited in its ability to identify functional transcription factor targets.
Genome-wide characterization of the WRKY gene family in radish (Raphanus sativus L.) reveals its critical functions under different abiotic stresses.

PubMed

Karanja, Bernard Kinuthia; Fan, Lianxue; Xu, Liang; Wang, Yan; Zhu, Xianwen; Tang, Mingjia; Wang, Ronghua; Zhang, Fei; Muleke, Everlyne M'mbone; Liu, Liwang

2017-11-01

WRKY genes were identified genome-wide in radish and play critical roles in the response to multiple abiotic stresses. The WRKY family is among the largest families of transcription factors (TFs) and is associated with multiple biological activities for plant survival, including the control of response mechanisms against abiotic stresses such as heat, salinity, and heavy metals. Radish is an important root vegetable crop, and therefore characterization and expression pattern investigation of WRKY transcription factors in radish is imperative. In the present study, 126 putative WRKY genes were retrieved from the radish genome database. Protein sequence and annotation scrutiny confirmed that RsWRKY proteins possess highly conserved domains and a zinc finger motif. Based on phylogenetic analysis, RsWRKY candidate genes were divided into three groups (Groups I, II, and III) with 31, 74, and 20 members, respectively. Additionally, gene structure analysis revealed that intron-exon patterns of the WRKY genes are highly conserved in radish. Linkage map analysis indicated that RsWRKY genes were distributed with varying densities over nine linkage groups. Further, RT-qPCR analysis showed significant variation in the expression of 36 RsWRKY genes under one or more abiotic stress treatments, implying that they might be stress-responsive genes. In total, 126 WRKY TFs were identified from the R. sativus genome, of which 35 showed abiotic stress-induced expression patterns. These results provide a genome-wide characterization of RsWRKY TFs and a baseline for further functional dissection and molecular evolution investigation, specifically for improving abiotic stress resistance with the ultimate goal of increasing the yield and quality of radish.
Enabling a Community to Dissect an Organism: Overview of the Neurospora Functional Genomics Project

PubMed Central

Dunlap, Jay C.; Borkovich, Katherine A.; Henn, Matthew R.; Turner, Gloria E.; Sachs, Matthew S.; Glass, N. Louise; McCluskey, Kevin; Plamann, Michael; Galagan, James E.; Birren, Bruce W.; Weiss, Richard L.; Townsend, Jeffrey P.; Loros, Jennifer J.; Nelson, Mary Anne; Lambreghts, Randy; Colot, Hildur V.; Park, Gyungsoon; Collopy, Patrick; Ringelberg, Carol; Crew, Christopher; Litvinkova, Liubov; DeCaprio, Dave; Hood, Heather M.; Curilla, Susan; Shi, Mi; Crawford, Matthew; Koerhsen, Michael; Montgomery, Phil; Larson, Lisa; Pearson, Matthew; Kasuga, Takao; Tian, Chaoguang; Baştürkmen, Meray; Altamirano, Lorena; Xu, Junhuan

2013-01-01

A consortium of investigators is engaged in a functional genomics project centered on the filamentous fungus Neurospora, with an eye to opening up the functional genomic analysis of all the filamentous fungi. The overall goal of the four interdependent projects in this effort is to accomplish functional genomics, annotation, and expression analyses of Neurospora crassa, a filamentous fungus that is an established model for the assemblage of over 250,000 species of nonyeast fungi. Building from the completely sequenced 43-Mb Neurospora genome, Project 1 is pursuing the systematic disruption of genes through targeted gene replacements, phenotypic analysis of mutant strains, and their distribution to the scientific community at large. Project 2, through a primary focus in Annotation and Bioinformatics, has developed a platform for electronically capturing community feedback and data about the existing annotation, while building and maintaining a database to capture and display information about phenotypes. Oligonucleotide-based microarrays created in Project 3 are being used to collect baseline expression data for the nearly 11,000 distinguishable transcripts in Neurospora under various conditions of growth and development, and eventually to begin to analyze the global effects of loss of novel genes in strains created by Project 1. cDNA libraries generated in Project 4 document the overall complexity of expressed sequences in Neurospora, including alternative splicing, alternative promoters, and antisense transcripts. In addition, these studies have driven the assembly of an SNP map presently populated by nearly 300 markers that will greatly accelerate the positional cloning of genes. PMID:17352902
Modeling and interoperability of heterogeneous genomic big data for integrative processing and querying.

PubMed

Masseroli, Marco; Kaitoua, Abdulrahman; Pinoli, Pietro; Ceri, Stefano

2016-12-01

While a huge amount of (epi)genomic data of multiple types is becoming available by using Next Generation Sequencing (NGS) technologies, the most important emerging problem is the so-called tertiary analysis, concerned with sense making, e.g., discovering how different (epi)genomic regions and their products interact and cooperate with each other. We propose a paradigm shift in tertiary analysis, based on the use of the Genomic Data Model (GDM), a simple data model which links genomic feature data to their associated experimental, biological and clinical metadata. GDM encompasses all the data formats which have been produced for feature extraction from (epi)genomic datasets. We specifically describe the mapping to GDM of SAM (Sequence Alignment/Map), VCF (Variant Call Format), NARROWPEAK (for called peaks produced by NGS ChIP-seq or DNase-seq methods), and BED (Browser Extensible Data) formats, but GDM supports as well all the formats describing experimental datasets (e.g., including copy number variations, DNA somatic mutations, or gene expressions) and annotations (e.g., regarding transcription start sites, genes, enhancers or CpG islands). We downloaded and integrated samples of all the above-mentioned data types and formats from multiple sources. The GDM is able to homogeneously describe semantically heterogeneous data and makes the ground for providing data interoperability, e.g., achieved through the GenoMetric Query Language (GMQL), a high-level, declarative query language for genomic big data. The combined use of the data model and the query language allows comprehensive processing of multiple heterogeneous data, and supports the development of domain-specific data-driven computations and bio-molecular knowledge discovery. Copyright © 2016 Elsevier Inc. All rights reserved.
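The core idea described above, region data linked to sample-level metadata, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it is not the actual GDM schema or any GMQL API, the class and function names (`Region`, `Sample`, `parse_bed_line`, `select_by_metadata`) are hypothetical, and only the first six standard BED columns (chrom, start, end, name, score, strand) are handled.

```python
from dataclasses import dataclass, field

@dataclass
class Region:
    """One genomic interval plus format-specific values (hypothetical layout)."""
    chrom: str
    start: int
    end: int
    strand: str = "*"  # "*" marks an unstranded region (a convention chosen here)
    values: dict = field(default_factory=dict)

@dataclass
class Sample:
    """A set of regions tied to free-form experimental/clinical metadata."""
    sample_id: str
    regions: list
    metadata: dict

def parse_bed_line(line):
    """Convert one tab-separated BED line (3 to 6 columns) into a Region."""
    fields = line.rstrip("\n").split("\t")
    values = {}
    if len(fields) > 3:
        values["name"] = fields[3]
    if len(fields) > 4:
        values["score"] = float(fields[4])
    strand = fields[5] if len(fields) > 5 else "*"
    return Region(fields[0], int(fields[1]), int(fields[2]), strand, values)

def select_by_metadata(samples, key, value):
    """Keep the samples whose metadata has `key` equal to `value`."""
    return [s for s in samples if s.metadata.get(key) == value]

# Illustrative, made-up record; not data from the study.
peak = parse_bed_line("chr1\t100\t200\tpeak1\t37.5\t+")
sample = Sample("s1", [peak], {"assay": "ChIP-seq", "cell": "K562"})
chip_samples = select_by_metadata([sample], "assay", "ChIP-seq")
```

Keeping the fixed coordinates separate from a per-format `values` dictionary is what would let records parsed from different formats share one region type in a sketch like this, while metadata-based selection operates on whole samples.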
In the loop: promoter–enhancer interactions and bioinformatics

PubMed Central

Mora, Antonio; Sandve, Geir Kjetil; Gabrielsen, Odd Stokke

2016-01-01

Enhancer–promoter regulation is a fundamental mechanism underlying differential transcriptional regulation. Spatial chromatin organization brings remote enhancers in contact with target promoters in cis to regulate gene expression. There is considerable evidence for promoter–enhancer interactions (PEIs). In recent years, genome-wide analyses have identified signatures and mapped novel enhancers; however, being able to precisely identify their target gene(s) requires massive biological and bioinformatics efforts. In this review, we give a short overview of the chromatin landscape and transcriptional regulation. We discuss some key concepts and problems related to chromatin interaction detection technologies, and emerging knowledge from genome-wide chromatin interaction data sets. Then, we critically review different types of bioinformatics analysis methods and tools related to representation and visualization of PEI data, raw data processing and PEI prediction. Lastly, we provide specific examples of how PEIs have been used to elucidate a functional role of non-coding single-nucleotide polymorphisms. The topic is at the forefront of epigenetic research, and by highlighting some future bioinformatics challenges in the field, this review provides a comprehensive background for future PEI studies. PMID:26586731
Metabolic Reconstruction and Modeling Microbial Electrosynthesis.

PubMed

Marshall, Christopher W; Ross, Daniel E; Handley, Kim M; Weisenhorn, Pamela B; Edirisinghe, Janaka N; Henry, Christopher S; Gilbert, Jack A; May, Harold D; Norman, R Sean

2017-08-21

Microbial electrosynthesis is a renewable energy and chemical production platform that relies on microbial cells to capture electrons from a cathode and fix carbon. Yet despite the promise of this technology, the metabolic capacity of the microbes that inhabit the electrode surface and catalyze electron transfer in these systems remains largely unknown. We assembled thirteen draft genomes from a microbial electrosynthesis system producing primarily acetate from carbon dioxide, and their transcriptional activity was mapped to genomes from cells on the electrode surface and in the supernatant. This allowed us to create a metabolic model of the predominant community members belonging to Acetobacterium, Sulfurospirillum, and Desulfovibrio. According to the model, the Acetobacterium was the primary carbon fixer, and a keystone member of the community. Transcripts of soluble hydrogenases and ferredoxins from Acetobacterium and hydrogenases, formate dehydrogenase, and cytochromes of Desulfovibrio were found in high abundance near the electrode surface. Cytochrome c oxidases of facultative members of the community were highly expressed in the supernatant despite completely sealed reactors and constant flushing with anaerobic gases. These molecular discoveries and metabolic modeling now serve as a foundation for future examination and development of electrosynthetic microbial communities.
ChIP-Chip Identifies SEC23A, CFDP1, and NSD1 as TFII-I Target Genes in Human Neural Crest Progenitor Cells.

PubMed

Makeyev, Aleksandr V; Bayarsaihan, Dashzeveg

2013-05-01

Objectives: GTF2I and GTF2IRD1 genes located in Williams-Beuren syndrome (WBS) critical region encode TFII-I family transcription factors. The aim of this study was to map genomic sites bound by these proteins across promoter regions of developmental regulators associated with craniofacial development. Design: Chromatin was isolated from human neural crest progenitor cells and the DNA-binding profile was generated using the human RefSeq tiling promoter ChIP-chip arrays. Results: TFII-I transcription factors are recruited to the promoters of SEC23A, CFDP1, and NSD1 previously defined as TFII-I target genes. Moreover, our analysis revealed additional binding elements that contain E-boxes and initiator-like motifs. Conclusions: Genome-wide promoter binding studies revealed SEC23A, CFDP1, and NSD1 linked to craniofacial or dental development as direct TFII-I targets. Developmental regulation of these genes by TFII-I factors could contribute to the WBS-specific facial dysmorphism.
Inferring genome-wide interplay landscape between DNA methylation and transcriptional regulation.

PubMed

Tang, Binhua; Wang, Xin

2015-01-01

DNA methylation and transcriptional regulation play important roles in cancer cell development and differentiation processes. Based on the currently available cell line profiling information from the ENCODE Consortium, we propose a Bayesian inference model to infer and construct genome-wide interaction landscape between DNA methylation and transcriptional regulation, which sheds light on the underlying complex functional mechanisms important within the human cancer and disease context. For the first time, we select all the currently available cell lines (>=20) and transcription factors (>=80) profiling information from the ENCODE Consortium portal. Through the integration of those genome-wide profiling sources, our genome-wide analysis detects multiple functional loci of interest, and indicates that DNA methylation is cell- and region-specific, due to the interplay mechanisms with transcription regulatory activities. We validate our analysis results with the corresponding RNA-sequencing technique for those detected genomic loci. Our results provide novel and meaningful insights for the interplay mechanisms of transcriptional regulation and gene expression for the human cancer and disease studies.
Global Mapping of Cell Type–Specific Open Chromatin by FAIRE-seq Reveals the Regulatory Role of the NFI Family in Adipocyte Differentiation

PubMed Central

Yu, Jing; Hirose-Yotsuya, Lisa; Take, Kazumi; Sun, Wei; Iwabu, Masato; Okada-Iwabu, Miki; Fujita, Takanori; Aoyama, Tomohisa; Tsutsumi, Shuichi; Ueki, Kohjiro; Kodama, Tatsuhiko; Sakai, Juro; Aburatani, Hiroyuki; Kadowaki, Takashi

2011-01-01

Identification of regulatory elements within the genome is crucial for understanding the mechanisms that govern cell type–specific gene expression. We generated genome-wide maps of open chromatin sites in 3T3-L1 adipocytes (on day 0 and day 8 of differentiation) and NIH-3T3 fibroblasts using formaldehyde-assisted isolation of regulatory elements coupled with high-throughput sequencing (FAIRE-seq). FAIRE peaks at the promoter were associated with active transcription and histone modifications of H3K4me3 and H3K27ac. Non-promoter FAIRE peaks were characterized by H3K4me1+/me3-, the signature of enhancers, and were largely located in distal regions. The non-promoter FAIRE peaks showed dynamic change during differentiation, while the promoter FAIRE peaks were relatively constant. Functionally, the adipocyte- and preadipocyte-specific non-promoter FAIRE peaks were, respectively, associated with genes up-regulated and down-regulated by differentiation. Genes highly up-regulated during differentiation were associated with multiple clustered adipocyte-specific FAIRE peaks. Among the adipocyte-specific FAIRE peaks, 45.3% and 11.7% overlapped binding sites for, respectively, PPARγ and C/EBPα, the master regulators of adipocyte differentiation. Computational motif analyses of the adipocyte-specific FAIRE peaks revealed enrichment of a binding motif for nuclear family I (NFI) transcription factors. Indeed, ChIP assay showed that NFI occupy the adipocyte-specific FAIRE peaks and/or the PPARγ binding sites near PPARγ, C/EBPα, and aP2 genes. Overexpression of NFIA in 3T3-L1 cells resulted in robust induction of these genes and lipid droplet formation without differentiation stimulus. Overexpression of dominant-negative NFIA or siRNA–mediated knockdown of NFIA or NFIB significantly suppressed both induction of genes and lipid accumulation during differentiation, suggesting a physiological function of these factors in the adipogenic program. 
Together, our study demonstrates the utility of FAIRE-seq in providing a global view of cell type–specific regulatory elements in the genome and in identifying transcriptional regulators of adipocyte differentiation. PMID:22028663
Genome-wide screening of indicator genes for assessing the potential carcinogenic risk of Nanjing city drinking water.

PubMed

Zhang, Rui; Cheng, Shupei; Li, Aimin; Sun, Jie; Zhang, Yan; Zhang, Xuxiang

2011-07-01

The effects of all pollutants present in Nanjing city drinking water (DWNC) on mouse gene transcription levels were measured to assess the carcinogenic risks of DWNC and to identify candidate indicator genes for the assessment and early warning of cancer risks. Transcriptional expression levels of 14,000 hepatic genes in treatment-group mice (Mus musculus, ICR) fed with DWNC for 90 days were detected using the GeneChip® Mouse Genome 430A 2.0 array. The analysis indicated that the transcriptional levels of 294 genes were up-regulated and those of 542 genes were down-regulated. Of these genes, 12 that were identified as being involved in at least five different types of cancer were further analyzed. An interrogation of the Kyoto Encyclopedia of Genes and Genomes (KEGG) revealed that three of the 12 genes (ITGAV, CCND1 and SMAD2) mapped to pathways in cancer. Gene Ontology (GO) functional annotation also showed that they were associated with functional categories such as cell cycle regulation, adhesion, apoptosis, and signal transduction, which are closely implicated in tumorigenesis and progression. The correlations between their aberrant expression and the genesis and progression of cancers have been further documented in a number of studies. These results suggest that potential toxicity and carcinogenic risks are associated with DWNC. Moreover, ITGAV, CCND1 and SMAD2 were identified as the most likely candidate indicator genes for the assessment of the combined carcinogenic risk of all pollutants existing in DWNC.
Friends-enemies: endogenous retroviruses are major transcriptional regulators of human DNA

NASA Astrophysics Data System (ADS)

Buzdin, Anton A.; Prassolov, Vladimir; Garazha, Andrew V.

2017-06-01

Endogenous retroviruses are mobile genetic elements hardly distinguishable from infectious, or “exogenous”, retroviruses at the time of insertion in the host DNA. Human endogenous retroviruses (HERVs) are not rare. They gave rise to multiple families of closely related mobile elements that occupy 8% of the human genome. Together, they shape genomic regulatory landscape by providing at least 320,000 human transcription factor binding sites (TFBS) located on 110,000 individual HERV elements. The HERVs host as many as 155,000 mapped DNaseI hypersensitivity sites, which denote loci active in the regulation of gene expression or chromatin structure. The contemporary view of the HERVs evolutionary dynamics suggests that at the early stages after insertion, the HERV is treated by the host cells as a foreign genetic element, and is likely to be suppressed by the targeted methylation and mutations. However, at the later stages, when significant number of mutations has been already accumulated and when the retroviral genes are broken, the regulatory potential of a HERV may be released and recruited to modify the genomic balance of transcription factor binding sites. This process goes together with further accumulation and selection of mutations, which reshape the regulatory landscape of the human DNA. However, developmental reprogramming, stress or pathological conditions like cancer, inflammation and infectious diseases, can remove the blocks limiting expression and HERV-mediated host gene regulation. This, in turn, can dramatically alter the gene expression equilibrium and shift it to a newer state, thus further amplifying instability and exacerbating the stressful situation.
Topological analysis of metabolic networks integrating co-segregating transcriptomes and metabolomes in type 2 diabetic rat congenic series.

PubMed

Dumas, Marc-Emmanuel; Domange, Céline; Calderari, Sophie; Martínez, Andrea Rodríguez; Ayala, Rafael; Wilder, Steven P; Suárez-Zamorano, Nicolas; Collins, Stephan C; Wallis, Robert H; Gu, Quan; Wang, Yulan; Hue, Christophe; Otto, Georg W; Argoud, Karène; Navratil, Vincent; Mitchell, Steve C; Lindon, John C; Holmes, Elaine; Cazier, Jean-Baptiste; Nicholson, Jeremy K; Gauguier, Dominique

2016-09-30

The genetic regulation of metabolic phenotypes (i.e., metabotypes) in type 2 diabetes mellitus occurs through complex organ-specific cellular mechanisms and networks contributing to impaired insulin secretion and insulin resistance. Genome-wide gene expression profiling systems can dissect the genetic contributions to metabolome and transcriptome regulations. The integrative analysis of multiple gene expression traits and metabotypes together with their underlying genetic regulation remains a challenge. Here, we introduce a systems genetics approach based on the topological analysis of a combined molecular network made of genes and metabolites identified through expression and metabotype quantitative trait locus mapping (i.e., eQTL and mQTL) to prioritise biological characterisation of candidate genes and traits. We used systematic metabotyping by 1H NMR spectroscopy and genome-wide gene expression in white adipose tissue to map molecular phenotypes to genomic blocks associated with obesity and insulin secretion in a series of rat congenic strains derived from spontaneously diabetic Goto-Kakizaki (GK) and normoglycemic Brown-Norway (BN) rats. We implemented a network biology approach to visualize the shortest paths between metabolites and genes significantly associated with each genomic block. Despite strong genomic similarities (95-99%) among congenics, each strain exhibited specific patterns of gene expression and metabotypes, reflecting the metabolic consequences of series of linked genetic polymorphisms in the congenic intervals. We subsequently used the congenic panel to map quantitative trait loci underlying specific mQTLs and genome-wide eQTLs. Variation in key metabolites like glucose, succinate, lactate, or 3-hydroxybutyrate and second messenger precursors like inositol was associated with several independent genomic intervals, indicating functional redundancy in these regions. 
To navigate through the complexity of these association networks we mapped candidate genes and metabolites onto metabolic pathways and implemented a shortest path strategy to highlight potential mechanistic links between metabolites and transcripts at colocalized mQTLs and eQTLs. Minimizing the shortest path length drove prioritization of biological validations by gene silencing. These results underline the importance of network-based integration of multilevel systems genetics datasets to improve understanding of the genetic architecture of metabotype and transcriptomic regulation and to characterize novel functional roles for genes determining tissue-specific metabolism.
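The shortest-path prioritisation described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy network, the node names (gene_A, gene_B, and so on) and the helper functions are all hypothetical, and an unweighted breadth-first search is assumed as the shortest-path method.

```python
from collections import deque

# Hypothetical toy network: undirected links between metabolites and genes.
# None of these edges come from the study; they only illustrate the idea.
EDGES = [
    ("glucose", "gene_A"),
    ("gene_A", "succinate"),
    ("succinate", "gene_B"),
    ("glucose", "gene_C"),
    ("gene_C", "lactate"),
    ("lactate", "gene_D"),
    ("gene_D", "gene_B"),
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search; returns one shortest path as a node list, or None."""
    if source == target:
        return [source]
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in parents:
                continue
            parents[neighbour] = node
            if neighbour == target:
                # Walk back through the parent links to rebuild the path.
                path = [target]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


def rank_candidates(graph, metabolite, genes):
    """Rank candidate genes by path length (number of edges) to a metabolite."""
    ranked = []
    for gene in genes:
        path = shortest_path(graph, metabolite, gene)
        if path is not None:
            ranked.append((len(path) - 1, gene, path))
    return sorted(ranked)


graph = build_graph(EDGES)
ranking = rank_candidates(graph, "glucose", ["gene_B", "gene_A", "gene_D"])
```

In this sketch, candidates with the shortest path to the metabolite sort first, which mirrors the idea of using minimal path length to decide which genes to validate first.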
Molecular pathways: transcription factories and chromosomal translocations.

PubMed

Osborne, Cameron S

2014-01-15

The mammalian nucleus is a highly complex structure that carries out a diverse range of functions such as DNA replication, cell division, RNA processing, and nuclear export/import. Many of these activities occur at discrete subcompartments that intersect with specific regions of the genome. Over the past few decades, evidence has accumulated to suggest that RNA transcription also occurs in specialized sites, called transcription factories, that may influence how the genome is organized. There may be certain efficiency benefits to cluster transcriptional activity in this way. However, the clustering of genes at transcription factories may have consequences for genome stability, and increase the susceptibility to recurrent chromosomal translocations that lead to cancer. The relationships between genome organization, transcription, and chromosomal translocation formation will have important implications in understanding the causes of therapy-related cancers. ©2013 AACR.

USE OF TRANSCRIPTIONAL COUPLING AND KEGG PATHWAY ANALYSIS OF GLOBAL GENE EXPRESSION TO REVEAL TRANSCRIPTIONAL CHANGES BETWEEN STATIONARY- AND LOG-PHASE SALMONELLA TYPHIMURIUM LT2

EPA Science Inventory

DNA microarray analysis is plagued by a lack of data reproducibility and by limits to the detectability of transcripts by hybridization. To mitigate these limitations, we employed transcriptional coupling within the S. typhimurium genome. This genome has 2664 transcriptionally co...
Comparative transcriptome and gene co-expression network analysis reveal genes and signaling pathways adaptively responsive to varied adverse stresses in the insect fungal pathogen, Beauveria bassiana.

PubMed

He, Zhangjiang; Zhao, Xin; Lu, Zhuoyue; Wang, Huifang; Liu, Pengfei; Zeng, Fanqin; Zhang, Yongjun

2018-01-01

Sensing, responding, and adapting to the surrounding environment are crucial for all living organisms to survive, proliferate, and differentiate in their biological niches. Beauveria bassiana is an economically important insect-pathogenic fungus which is widely used as a biocontrol agent to control a variety of insect pests. The fungal pathogen unavoidably encounters a variety of adverse environmental stresses and defense responses from the host insects during application of the fungal agents. However, little is known about the transcriptional response of the fungus to varied adverse stresses. Here, we comparatively analyzed the genome-wide transcriptome of B. bassiana under varied stationary-phase stresses, including an osmotic agent (0.8 M NaCl), high temperature (32 °C), a cell wall-perturbing agent (Congo red), and oxidative agents (H2O2 or menadione). A total of 12,412 reads were obtained and mapped to the 6767 genes of B. bassiana. All of these stresses caused transcription responses involved in basal metabolism, cell wall construction, stress response or cell rescue/detoxification, signaling transduction and gene transcription regulation, and likely other cellular processes. An array of genes displayed similar transcription patterns in response to at least two of the five stresses, suggesting a shared transcription response to varied adverse stresses. Gene co-expression network analysis revealed that the mTOR signaling pathway, but not the HOG1 MAP kinase pathway, played a central role in regulating the varied adverse stress responses, which was verified by RNAi-mediated knockdown of TOR1. Our findings provide insight into the transcription response and gene co-expression network of B. bassiana in adaptation to varied environments. Copyright © 2017 Elsevier Inc. All rights reserved.
Aquaculture genomics, genetics and breeding in the United States: current status, challenges, and priorities for future research.

PubMed

Abdelrahman, Hisham; ElHady, Mohamed; Alcivar-Warren, Acacia; Allen, Standish; Al-Tobasei, Rafet; Bao, Lisui; Beck, Ben; Blackburn, Harvey; Bosworth, Brian; Buchanan, John; Chappell, Jesse; Daniels, William; Dong, Sheng; Dunham, Rex; Durland, Evan; Elaswad, Ahmed; Gomez-Chiarri, Marta; Gosh, Kamal; Guo, Ximing; Hackett, Perry; Hanson, Terry; Hedgecock, Dennis; Howard, Tiffany; Holland, Leigh; Jackson, Molly; Jin, Yulin; Khalil, Karim; Kocher, Thomas; Leeds, Tim; Li, Ning; Lindsey, Lauren; Liu, Shikai; Liu, Zhanjiang; Martin, Kyle; Novriadi, Romi; Odin, Ramjie; Palti, Yniv; Peatman, Eric; Proestou, Dina; Qin, Guyu; Reading, Benjamin; Rexroad, Caird; Roberts, Steven; Salem, Mohamed; Severin, Andrew; Shi, Huitong; Shoemaker, Craig; Stiles, Sheila; Tan, Suxu; Tang, Kathy F J; Thongda, Wilawan; Tiersch, Terrence; Tomasso, Joseph; Prabowo, Wendy Tri; Vallejo, Roger; van der Steen, Hein; Vo, Khoi; Waldbieser, Geoff; Wang, Hanping; Wang, Xiaozhu; Xiang, Jianhai; Yang, Yujia; Yant, Roger; Yuan, Zihao; Zeng, Qifan; Zhou, Tao

2017-02-20

Advancing the production efficiency and profitability of aquaculture is dependent upon the ability to utilize a diverse array of genetic resources. The ultimate goals of aquaculture genomics, genetics and breeding research are to enhance aquaculture production efficiency, sustainability, product quality, and profitability in support of the commercial sector and for the benefit of consumers. In order to achieve these goals, it is important to understand the genomic structure and organization of aquaculture species, and their genomic and phenomic variations, as well as the genetic basis of traits and their interrelationships. In addition, it is also important to understand the mechanisms of regulation and evolutionary conservation at the levels of genome, transcriptome, proteome, epigenome, and systems biology. With genomic information and information between the genomes and phenomes, technologies for marker/causal mutation-assisted selection, genome selection, and genome editing can be developed for applications in aquaculture. A set of genomic tools and resources must be made available including reference genome sequences and their annotations (including coding and non-coding regulatory elements), genome-wide polymorphic markers, efficient genotyping platforms, high-density and high-resolution linkage maps, and transcriptome resources including non-coding transcripts. Genomic and genetic control of important performance and production traits, such as disease resistance, feed conversion efficiency, growth rate, processing yield, behaviour, reproductive characteristics, and tolerance to environmental stressors like low dissolved oxygen, high or low water temperature and salinity, must be understood. QTL need to be identified, validated across strains, lines and populations, and their mechanisms of control understood. Causal gene(s) need to be identified. 
Genetic and epigenetic regulation of important aquaculture traits needs to be determined, and technologies for marker-assisted selection, causal gene/mutation-assisted selection, genome selection, and genome editing using CRISPR and other technologies must be developed, shown to be applicable, and applied in aquaculture industries. Major progress has been made in aquaculture genomics for dozens of fish and shellfish species including the development of genetic linkage maps, physical maps, microarrays, single nucleotide polymorphism (SNP) arrays, transcriptome databases and various stages of genome reference sequences. This paper provides a general review of the current status, challenges and future research needs of aquaculture genomics, genetics, and breeding, with a focus on major aquaculture species in the United States: catfish, rainbow trout, Atlantic salmon, tilapia, striped bass, oysters, and shrimp. While the overall research priorities and the practical goals are similar across various aquaculture species, the current status in each species should dictate the next priority areas within the species. This paper is an output of the USDA Workshop for Aquaculture Genomics, Genetics, and Breeding held in late March 2016 in Auburn, Alabama, with participants from all parts of the United States.
Transcription as a Threat to Genome Integrity.

PubMed

Gaillard, Hélène; Aguilera, Andrés

2016-06-02

Genomes undergo different types of sporadic alterations, including DNA damage, point mutations, and genome rearrangements, that constitute the basis for evolution. However, these changes may occur at high levels as a result of cell pathology and trigger genome instability, a hallmark of cancer and a number of genetic diseases. In the last two decades, evidence has accumulated that transcription constitutes an important natural source of DNA metabolic errors that can compromise the integrity of the genome. Transcription can create the conditions for high levels of mutations and recombination by its ability to open the DNA structure and remodel chromatin, making it more accessible to DNA insulting agents, and by its ability to become a barrier to DNA replication. Here we review the molecular basis of such events from a mechanistic perspective with particular emphasis on the role of transcription as a genome instability determinant.
Pea Marker Database (PMD) - A new online database combining known pea (Pisum sativum L.) gene-based markers.

PubMed

Kulaeva, Olga A; Zhernakov, Aleksandr I; Afonin, Alexey M; Boikov, Sergei S; Sulima, Anton S; Tikhonovich, Igor A; Zhukov, Vladimir A

2017-01-01

Pea (Pisum sativum L.) is the oldest model object of plant genetics and one of the most agriculturally important legumes in the world. Since the pea genome has not been sequenced yet, identification of genes responsible for mutant phenotypes or desirable agricultural traits is usually performed via genetic mapping followed by candidate gene search. Such mapping is best carried out using gene-based molecular markers, as it opens the possibility of exploiting genome synteny between pea and its close relative Medicago truncatula Gaertn., which has a sequenced and annotated genome. In the last 5 years, a large number of pea gene-based molecular markers have been designed and mapped owing to the rapid evolution of "next-generation sequencing" technologies. However, access to the complete set of markers designed worldwide is limited because the data are not standardized and are therefore hard to use. The Pea Marker Database was designed to combine the information about pea markers in the form of a user-friendly and practical online tool. Version 1 (PMD1) comprises information about 2484 genic markers, including their locations in linkage groups, the sequences of corresponding pea transcripts and the names of related genes in M. truncatula. Version 2 (PMD2) is an updated version comprising 15944 pea markers in the same format with several advanced features. To test the performance of the PMD, fine mapping of pea symbiotic genes Sym13 and Sym27 in linkage groups VII and V, respectively, was carried out. The results of mapping allowed us to propose the Sen1 gene (a homologue of SEN1 gene of Lotus japonicus (Regel) K. Larsen) as the best candidate gene for Sym13, and to narrow the list of possible candidate genes for Sym27 to ten, thus proving PMD to be useful for pea gene mapping and cloning. All information contained in PMD1 and PMD2 is available at www.peamarker.arriam.ru.
β-adrenergic-stimulated macrophages: Comprehensive localization in the M1–M2 spectrum

PubMed Central

Lamkin, Donald M.; Ho, Hsin-Yun; Ong, Tiffany H.; Kawanishi, Carly K.; Stoffers, Victoria L.; Ahlawat, Nivedita; Ma, Jeffrey C.Y.; Arevalo, Jesusa M. G.; Cole, Steve W.; Sloan, Erica K.

2016-01-01

β-adrenergic signaling can regulate macrophage involvement in several diseases and often produces anti-inflammatory properties in macrophages, which are similar to M2 properties in a dichotomous M1 vs. M2 macrophage taxonomy. However, it is not clear that β-adrenergic-stimulated macrophages may be classified strictly as M2. In this in vitro study, we utilized recently published criteria and transcriptome-wide bioinformatics methods to map the relative polarity of murine β-adrenergic-stimulated macrophages within a wider M1–M2 spectrum. Results show that β-adrenergic-stimulated macrophages did not fit entirely into any one predefined category of the M1–M2 spectrum but did express genes that are representative of some M2 side categories. Moreover, transcript origin analysis of genome-wide transcriptional profiles located β-adrenergic-stimulated macrophages firmly on the M2 side of the M1–M2 spectrum and found active suppression of M1 side gene transcripts. The signal transduction pathways involved were mapped through blocking experiments and bioinformatics analysis of transcription factor binding motifs. M2-promoting effects were mediated specifically through β2-adrenergic receptors and were associated with CREB, C/EBPβ, and ATF transcription factor pathways but not with established M1–M2 STAT pathways. Thus, β-adrenergic-signaling induces a macrophage transcriptome that locates on the M2 side of the M1–M2 spectrum but likely accomplishes this effect through a signaling pathway that is atypical for M2-spectrum macrophages. PMID:27485040
β-Adrenergic-stimulated macrophages: Comprehensive localization in the M1-M2 spectrum.

PubMed

Lamkin, Donald M; Ho, Hsin-Yun; Ong, Tiffany H; Kawanishi, Carly K; Stoffers, Victoria L; Ahlawat, Nivedita; Ma, Jeffrey C Y; Arevalo, Jesusa M G; Cole, Steve W; Sloan, Erica K

2016-10-01

β-Adrenergic signaling can regulate macrophage involvement in several diseases and often produces anti-inflammatory properties in macrophages, which are similar to M2 properties in a dichotomous M1 vs. M2 macrophage taxonomy. However, it is not clear that β-adrenergic-stimulated macrophages may be classified strictly as M2. In this in vitro study, we utilized recently published criteria and transcriptome-wide bioinformatics methods to map the relative polarity of murine β-adrenergic-stimulated macrophages within a wider M1-M2 spectrum. Results show that β-adrenergic-stimulated macrophages did not fit entirely into any one pre-defined category of the M1-M2 spectrum but did express genes that are representative of some M2 side categories. Moreover, transcript origin analysis of genome-wide transcriptional profiles located β-adrenergic-stimulated macrophages firmly on the M2 side of the M1-M2 spectrum and found active suppression of M1 side gene transcripts. The signal transduction pathways involved were mapped through blocking experiments and bioinformatics analysis of transcription factor binding motifs. M2-promoting effects were mediated specifically through β2-adrenergic receptors and were associated with CREB, C/EBPβ, and ATF transcription factor pathways but not with established M1-M2 STAT pathways. Thus, β-adrenergic-signaling induces a macrophage transcriptome that locates on the M2 side of the M1-M2 spectrum but likely accomplishes this effect through a signaling pathway that is atypical for M2-spectrum macrophages. Copyright © 2016 Elsevier Inc. All rights reserved.
Competition between histone and transcription factor binding regulates the onset of transcription in zebrafish embryos

PubMed Central

Joseph, Shai R; Pálfy, Máté; Hilbert, Lennart; Kumar, Mukesh; Karschau, Jens; Zaburdaev, Vasily; Shevchenko, Andrej; Vastenhouw, Nadine L

2017-01-01

Upon fertilization, the genome of animal embryos remains transcriptionally inactive until the maternal-to-zygotic transition. At this time, the embryo takes control of its development and transcription begins. How the onset of zygotic transcription is regulated remains unclear. Here, we show that a dynamic competition for DNA binding between nucleosome-forming histones and transcription factors regulates zebrafish genome activation. Taking a quantitative approach, we found that the concentration of non-DNA-bound core histones sets the time for the onset of transcription. The reduction in nuclear histone concentration that coincides with genome activation does not affect nucleosome density on DNA, but allows transcription factors to compete successfully for DNA binding. In agreement with this, transcription factor binding is sensitive to histone levels and the concentration of transcription factors also affects the time of transcription. Our results demonstrate that the relative levels of histones and transcription factors regulate the onset of transcription in the embryo. DOI: http://dx.doi.org/10.7554/eLife.23326.001 PMID:28425915
Competition between histone and transcription factor binding regulates the onset of transcription in zebrafish embryos.

PubMed

Joseph, Shai R; Pálfy, Máté; Hilbert, Lennart; Kumar, Mukesh; Karschau, Jens; Zaburdaev, Vasily; Shevchenko, Andrej; Vastenhouw, Nadine L

2017-04-20

Upon fertilization, the genome of animal embryos remains transcriptionally inactive until the maternal-to-zygotic transition. At this time, the embryo takes control of its development and transcription begins. How the onset of zygotic transcription is regulated remains unclear. Here, we show that a dynamic competition for DNA binding between nucleosome-forming histones and transcription factors regulates zebrafish genome activation. Taking a quantitative approach, we found that the concentration of non-DNA-bound core histones sets the time for the onset of transcription. The reduction in nuclear histone concentration that coincides with genome activation does not affect nucleosome density on DNA, but allows transcription factors to compete successfully for DNA binding. In agreement with this, transcription factor binding is sensitive to histone levels and the concentration of transcription factors also affects the time of transcription. Our results demonstrate that the relative levels of histones and transcription factors regulate the onset of transcription in the embryo.
Starch biosynthesis in cassava: a genome-based pathway reconstruction and its exploitation in data integration

PubMed Central

2013-01-01

Background: Cassava is a well-known starchy root crop utilized for food, feed and biofuel production. However, the comprehension underlying the process of starch production in cassava is not yet available. Results: In this work, we exploited the recently released genome information and utilized the post-genomic approaches to reconstruct the metabolic pathway of starch biosynthesis in cassava using multiple plant templates. The quality of pathway reconstruction was assured by the employed parsimonious reconstruction framework and the collective validation steps. Our reconstructed pathway is presented in the form of an informative map, which describes all important information of the pathway, and an interactive map, which facilitates the integration of omics data into the metabolic pathway. Additionally, to demonstrate the advantage of the reconstructed pathways beyond just the schematic presentation, the pathway could be used for incorporating the gene expression data obtained from various developmental stages of cassava roots. Our results exhibited the distinct activities of the starch biosynthesis pathway in different stages of root development at the transcriptional level whereby the activity of the pathway is higher toward the development of mature storage roots. Conclusions: To expand its applications, the interactive map of the reconstructed starch biosynthesis pathway is available for download at the SBI group’s website (http://sbi.pdti.kmutt.ac.th/?page_id=33). This work is considered a big step in the quantitative modeling pipeline aiming to investigate the dynamic regulation of starch biosynthesis in cassava roots. PMID:23938102
Starch biosynthesis in cassava: a genome-based pathway reconstruction and its exploitation in data integration.

PubMed

Saithong, Treenut; Rongsirikul, Oratai; Kalapanulak, Saowalak; Chiewchankaset, Porntip; Siriwat, Wanatsanan; Netrphan, Supatcharee; Suksangpanomrung, Malinee; Meechai, Asawin; Cheevadhanarak, Supapon

2013-08-10

Cassava is a well-known starchy root crop utilized for food, feed and biofuel production. However, the comprehension underlying the process of starch production in cassava is not yet available. In this work, we exploited the recently released genome information and utilized the post-genomic approaches to reconstruct the metabolic pathway of starch biosynthesis in cassava using multiple plant templates. The quality of pathway reconstruction was assured by the employed parsimonious reconstruction framework and the collective validation steps. Our reconstructed pathway is presented in the form of an informative map, which describes all important information of the pathway, and an interactive map, which facilitates the integration of omics data into the metabolic pathway. Additionally, to demonstrate the advantage of the reconstructed pathways beyond just the schematic presentation, the pathway could be used for incorporating the gene expression data obtained from various developmental stages of cassava roots. Our results exhibited the distinct activities of the starch biosynthesis pathway in different stages of root development at the transcriptional level whereby the activity of the pathway is higher toward the development of mature storage roots. To expand its applications, the interactive map of the reconstructed starch biosynthesis pathway is available for download at the SBI group's website (http://sbi.pdti.kmutt.ac.th/?page_id=33). This work is considered a big step in the quantitative modeling pipeline aiming to investigate the dynamic regulation of starch biosynthesis in cassava roots.
Single nucleotide-level mapping of DNA double-strand breaks in human HEK293T cells.

PubMed

Pope, Bernard J; Mahmood, Khalid; Jung, Chol-Hee; Georgeson, Peter; Park, Daniel J

2017-03-01

Constitutional biological processes involve the generation of DNA double-strand breaks (DSBs). The production of such breaks and their subsequent resolution are also highly relevant to neurodegenerative diseases and cancer, in which extensive DNA fragmentation has been described (Stephens et al., 2011; Blondet et al., 2001). Tchurikov et al. (2011, 2013) have reported previously that frequent sites of DSBs occur in chromosomal domains involved in the co-ordinated expression of genes. This group reports that hot spots of DSBs in human HEK293T cells often coincide with H3K4me3 marks, which are associated with active transcription (Kravatsky et al., 2015), and that frequent sites of DNA double-strand breakage are likely to be relevant to cancer genomics (Tchurikov et al., 2013, 2016). Recently, they applied a RAFT (rapid amplification of forum termini) protocol that selects for blunt-ended DSB sites and mapped these to the human genome within defined co-ordinate 'windows'. In this paper, we re-analyse public RAFT data to derive sites of DSBs at the single-nucleotide level across the built genome for human HEK293T cells (https://figshare.com/s/35220b2b79eaaaf64ed8). This refined mapping, combined with accessory ENCODE data tracks and ribosomal DNA-related sequence annotations, will likely be of value for the design of clinically relevant targeted assays such as those for cancer susceptibility, diagnosis, treatment-matching and prognostication.
Mouse scrapie responsive gene 1 (Scrg1): genomic organization, physical linkage to sap30, genetic mapping on chromosome 8, and expression in neuronal primary cell cultures.

PubMed

Dron, M; Tartare, X; Guillo, F; Haik, S; Barbin, G; Maury, C; Tovey, M; Dandoy-Dron, F

2000-11-15

We have previously reported a transcript of a novel mouse gene (Scrg1) with increased expression in transmissible spongiform encephalopathies and the cloning of the human mRNA analogue. In this paper, we present the genomic organization of the mouse and human SCRG1 loci, which exhibit a high degree of conservation. The genes are composed of three exons; the two downstream exons contain the protein coding region. The mouse gene is expressed in brain tissue essentially as a 0.7-kb message but also as a minor 2.6-kb mRNA. We have sequenced 20 kb of DNA at the mouse Scrg1 locus and found that the longer transcript is the prolongation of the 0.7-kb mRNA to a polyadenylation site located about 2 kb further downstream. Sequencing revealed that the mouse Scrg1 gene is physically linked to Sap30, a gene that encodes a protein of the histone deacetylase complex, and genetic linkage mapping assigned the localization of Scrg1 to chromosome 8 between Ant1 and Hmg2. Northern blot analysis showed that Scrg1 is under strict developmental control in mouse embryo and is expressed by cells of neuronal origin in vitro. Comparison of the rat, mouse, and human SCRG1 proteins identified a box of 35 identical contiguous amino acids and a characteristic cysteine distribution pattern defining a new protein signature. Copyright 2000 Academic Press.
Genome-wide identification, classification, and functional analysis of the basic helix-loop-helix transcription factors in the cattle, Bos Taurus.

PubMed

Li, Fengmei; Liu, Wuyi

2017-06-01

The basic helix-loop-helix (bHLH) transcription factors (TFs) form a huge superfamily and play crucial roles in many essential developmental, genetic, and physiological-biochemical processes of eukaryotes. In total, 109 putative bHLH TFs were identified and categorized in the genomic databases of cattle, Bos taurus, after removing redundant sequences and merging genetic isoforms. Through phylogenetic analyses, 105 of these bHLH TFs were classified into 44 families with 46, 25, 14, 3, 13, and 4 members in the high-order groups A, B, C, D, E, and F, respectively. The remaining 4 bHLH proteins were classed as 'orphans.' Next, these 109 putative bHLH proteins were further characterized as significantly enriched in 524 Gene Ontology (GO) annotations (corrected P value ≤ 0.05) and 21 pathways (corrected P value ≤ 0.05) that had been mapped by the web server KOBAS 2.0. Furthermore, 95 bHLH proteins were screened and analyzed together with two uncharacterized proteins in the STRING online database to reconstruct the protein-protein interaction network of cattle bHLH TFs. Ultimately, 89 bHLH proteins were fully mapped in a network with 67 biological processes, 13 molecular functions, 5 KEGG pathways, 12 PFAM protein domains, and 25 INTERPRO classified protein domains and features. These results provide much useful information and a good reference for further functional investigations and updated research on cattle bHLH TFs.
An atypical deletion of the Williams–Beuren syndrome interval implicates genes associated with defective visuospatial processing and autism

PubMed Central

Edelmann, Lisa; Prosnitz, Aaron; Pardo, Sherly; Bhatt, Jahnavi; Cohen, Ninette; Lauriat, Tara; Ouchanov, Leonid; González, Patricia J; Manghi, Elina R; Bondy, Pamela; Esquivel, Marcela; Monge, Silvia; Delgado, Marietha F; Splendore, Alessandra; Francke, Uta; Burton, Barbara K; McInnes, L Alison

2007-01-01

Background: During a genetic study of autism, a female child who met diagnostic criteria for autism spectrum disorder, but also exhibited the cognitive–behavioural profile (CBP) associated with Williams–Beuren syndrome (WBS), was examined. The WBS CBP includes impaired visuospatial ability, an overly friendly personality, excessive non-social anxiety and language delay. Methods: Using array-based comparative genomic hybridisation (aCGH), a deletion corresponding to BAC RP11-89A20 in the distal end of the WBS deletion interval was detected. Hemizygosity was confirmed using fluorescence in situ hybridisation and fine mapping was performed by measuring the copy number of genomic DNA using quantitative polymerase chain reaction. Results: The proximal breakpoint was mapped to intron 1 of GTF2IRD1 and the distal breakpoint lies 2.4–3.1 Mb towards the telomere. The subject was completely hemizygous for GTF2I, commonly deleted in carriers of the classic ∼1.5 Mb WBS deletion, and GTF2IRD2, deleted in carriers of the rare ∼1.84 Mb WBS deletion. Conclusion: Hemizygosity of the GTF2 family of transcription factors is sufficient to produce many aspects of the WBS CBP, and particularly implicates the GTF2 transcription factors in the visuospatial construction deficit. Symptoms of autism in this case may be due to deletion of additional genes outside the typical WBS interval or remote effects on gene expression at other loci. PMID:16971481
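Fine mapping by qPCR copy number, as mentioned in this abstract, can be sketched as follows. The abstract does not state which quantification method was used, so this example assumes the common comparative Ct (2^-ΔΔCt) calculation with 100% amplification efficiency; the function names, the threshold and all Ct values are hypothetical illustrations, not data from the study.

```python
def relative_copy_number(ct_target, ct_reference,
                         ct_target_control, ct_reference_control,
                         control_copies=2.0):
    """Estimate copy number by the comparative Ct method (assumed here).

    ct_target / ct_reference: Ct values of the test amplicon and a
    reference (two-copy) amplicon in the subject's DNA.
    ct_target_control / ct_reference_control: the same amplicons in a
    control DNA assumed to carry `control_copies` copies of the target.
    Assumes perfect doubling per cycle (100% efficiency).
    """
    delta_sample = ct_target - ct_reference
    delta_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_sample - delta_control
    return control_copies * 2.0 ** (-delta_delta_ct)


def first_deleted_amplicon(amplicons, threshold=1.5):
    """Return the position of the first amplicon whose copy number falls
    below `threshold` (a hypothetical cut-off between 1 and 2 copies).

    amplicons: list of (position_mb, copy_number), sorted by position.
    """
    for position, copies in amplicons:
        if copies < threshold:
            return position
    return None


# Hypothetical Ct values: one extra cycle relative to control suggests one copy.
hemizygous = relative_copy_number(26.0, 24.0, 25.0, 24.0)
diploid = relative_copy_number(25.0, 24.0, 25.0, 24.0)
breakpoint_side = first_deleted_amplicon([(0.1, 2.0), (0.2, 1.9), (0.3, 1.0)])
```

Walking a series of amplicons across the interval and noting where the estimate drops from about two copies to about one is one way such measurements could localise a breakpoint.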
A universal genomic coordinate translator for comparative genomics

PubMed Central

2014-01-01

Background Genomic duplications constitute major events in the evolution of species, allowing paralogous copies of genes to take on fine-tuned biological roles. Unambiguously identifying the orthology relationship between copies across multiple genomes can be resolved by synteny, i.e. the conserved order of genomic sequences. However, a comprehensive analysis of duplication events and their contributions to evolution would require all-to-all genome alignments, which increases at N2 with the number of available genomes, N. Results Here, we introduce Kraken, software that omits the all-to-all requirement by recursively traversing a graph of pairwise alignments and dynamically re-computing orthology. Kraken scales linearly with the number of targeted genomes, N, which allows for including large numbers of genomes in analyses. We first evaluated the method on the set of 12 Drosophila genomes, finding that orthologous correspondence computed indirectly through a graph of multiple synteny maps comes at minimal cost in terms of sensitivity, but reduces overall computational runtime by an order of magnitude. We then used the method on three well-annotated mammalian genomes, human, mouse, and rat, and show that up to 93% of protein coding transcripts have unambiguous pairwise orthologous relationships across the genomes. On a nucleotide level, 70 to 83% of exons match exactly at both splice junctions, and up to 97% on at least one junction. We last applied Kraken to an RNA-sequencing dataset from multiple vertebrates and diverse tissues, where we confirmed that brain-specific gene family members, i.e. one-to-many or many-to-many homologs, are more highly correlated across species than single-copy (i.e. one-to-one homologous) genes. 
Not limited to protein coding genes, Kraken also identifies thousands of newly identified transcribed loci, likely non-coding RNAs that are consistently transcribed in human, chimpanzee and gorilla, and maintain significant correlation of expression levels across species. Conclusions Kraken is a computational genome coordinate translator that facilitates cross-species comparisons, distinguishes orthologs from paralogs, and does not require costly all-to-all whole genome mappings. Kraken is freely available under LGPL from http://github.com/nedaz/kraken. PMID:24976580
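The idea of translating coordinates through a graph of pairwise maps, rather than aligning every genome to every other, can be illustrated with a minimal sketch. This is not Kraken's implementation: the block format, the toy coordinates and all function names below are hypothetical, blocks are assumed to be ungapped and same-strand, and no orthology re-computation is attempted.

```python
from collections import deque

# Hypothetical toy data. Each edge of the genome graph holds ungapped,
# same-strand blocks: (src_chrom, src_start, src_end, dst_chrom, dst_start),
# with half-open coordinates.
PAIRWISE = {
    ("human", "mouse"): [("chr1", 100, 200, "chr4", 1000)],
    ("mouse", "rat"): [("chr4", 1000, 1150, "chr5", 50)],
}


def find_path(maps, source, target):
    """Breadth-first search for a chain of genomes linked by pairwise maps."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for a, b in maps:
            if a == path[-1] and b not in seen:
                seen.add(b)
                queue.append(path + [b])
    return None


def translate_step(blocks, chrom, start, end):
    """Map an interval through one pairwise map; None if no block contains it."""
    for s_chrom, s_start, s_end, d_chrom, d_start in blocks:
        if s_chrom == chrom and s_start <= start and end <= s_end:
            offset = start - s_start
            return d_chrom, d_start + offset, d_start + offset + (end - start)
    return None


def translate(maps, source, target, chrom, start, end):
    """Translate an interval from source to target along the graph path."""
    path = find_path(maps, source, target)
    if path is None:
        return None
    interval = (chrom, start, end)
    for a, b in zip(path, path[1:]):
        interval = translate_step(maps[(a, b)], *interval)
        if interval is None:
            return None
    return interval


print(translate(PAIRWISE, "human", "rat", "chr1", 120, 140))
```

In this toy setup a chain of N genomes needs only N - 1 pairwise maps, whereas an all-to-all design needs N(N - 1)/2 (11 versus 66 for 12 genomes), which is the scaling contrast the abstract describes.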
A universal genomic coordinate translator for comparative genomics.

PubMed

Zamani, Neda; Sundström, Görel; Meadows, Jennifer R S; Höppner, Marc P; Dainat, Jacques; Lantz, Henrik; Haas, Brian J; Grabherr, Manfred G

2014-06-30

Genomic duplications constitute major events in the evolution of species, allowing paralogous copies of genes to take on fine-tuned biological roles. Unambiguously identifying the orthology relationship between copies across multiple genomes can be resolved by synteny, i.e. the conserved order of genomic sequences. However, a comprehensive analysis of duplication events and their contributions to evolution would require all-to-all genome alignments, which increases at N² with the number of available genomes, N. Here, we introduce Kraken, software that omits the all-to-all requirement by recursively traversing a graph of pairwise alignments and dynamically re-computing orthology. Kraken scales linearly with the number of targeted genomes, N, which allows for including large numbers of genomes in analyses. We first evaluated the method on the set of 12 Drosophila genomes, finding that orthologous correspondence computed indirectly through a graph of multiple synteny maps comes at minimal cost in terms of sensitivity, but reduces overall computational runtime by an order of magnitude. We then used the method on three well-annotated mammalian genomes, human, mouse, and rat, and show that up to 93% of protein coding transcripts have unambiguous pairwise orthologous relationships across the genomes. On a nucleotide level, 70 to 83% of exons match exactly at both splice junctions, and up to 97% on at least one junction. We last applied Kraken to an RNA-sequencing dataset from multiple vertebrates and diverse tissues, where we confirmed that brain-specific gene family members, i.e. one-to-many or many-to-many homologs, are more highly correlated across species than single-copy (i.e. one-to-one homologous) genes.
Not limited to protein coding genes, Kraken also identifies thousands of newly identified transcribed loci, likely non-coding RNAs that are consistently transcribed in human, chimpanzee and gorilla, and maintain significant correlation of expression levels across species. Kraken is a computational genome coordinate translator that facilitates cross-species comparisons, distinguishes orthologs from paralogs, and does not require costly all-to-all whole genome mappings. Kraken is freely available under LGPL from http://github.com/nedaz/kraken.
Genome-Wide Profiling of DNA Double-Strand Breaks by the BLESS and BLISS Methods.

PubMed

Mirzazadeh, Reza; Kallas, Tomasz; Bienko, Magda; Crosetto, Nicola

2018-01-01

DNA double-strand breaks (DSBs) are major DNA lesions that are constantly formed during physiological processes such as DNA replication, transcription, and recombination, or as a result of exogenous agents such as ionizing radiation, radiomimetic drugs, and genome editing nucleases. Unrepaired DSBs threaten genomic stability by leading to the formation of potentially oncogenic rearrangements such as translocations. In the past few years, several methods based on next-generation sequencing (NGS) have been developed to study the genome-wide distribution of DSBs or their conversion to translocation events. We developed Breaks Labeling, Enrichment on Streptavidin, and Sequencing (BLESS), which was the first method for direct labeling of DSBs in situ followed by their genome-wide mapping at nucleotide resolution (Crosetto et al., Nat Methods 10:361-365, 2013). Recently, we have further expanded the quantitative nature, applicability, and scalability of BLESS by developing Breaks Labeling In Situ and Sequencing (BLISS) (Yan et al., Nat Commun 8:15058, 2017). Here, we first present an overview of existing methods for genome-wide localization of DSBs, and then focus on the BLESS and BLISS methods, discussing different assay design options depending on the sample type and application.
Epigenetic functions enriched in transcription factors binding to mouse recombination hotspots.

PubMed

Wu, Min; Kwoh, Chee-Keong; Przytycka, Teresa M; Li, Jing; Zheng, Jie

2012-06-21

The regulatory mechanism of recombination is a fundamental problem in genomics, with wide applications in genome-wide association studies, birth-defect diseases, molecular evolution, cancer research, etc. In mammalian genomes, recombination events cluster into short genomic regions called "recombination hotspots". Recently, a 13-mer motif enriched in hotspots was identified as a candidate cis-regulatory element of human recombination hotspots; moreover, a zinc finger protein, PRDM9, binds to this motif and is associated with variation of recombination phenotype in human and mouse genomes, and is thus a trans-acting regulator of recombination hotspots. However, this pair of cis and trans-regulators covers only a fraction of hotspots, so other regulators of recombination hotspots remain to be discovered. In this paper, we propose an approach to predicting additional trans-regulators from DNA-binding proteins by comparing their enrichment of binding sites in hotspots. Applying this approach to newly mapped mouse hotspots genome-wide, we confirmed that PRDM9 is a major trans-regulator of hotspots. In addition, a list of top candidate trans-regulators of mouse hotspots is reported. Using GO analysis we observed that the top genes are enriched for histone modification functions, highlighting the epigenetic regulatory mechanisms of recombination hotspots.
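One simple way to quantify "enrichment of binding sites in hotspots" is to compare the observed number of sites falling in hotspots with the number expected from the fraction of the genome that hotspots cover. The sketch below assumes that null model and a binomial tail probability; the abstract does not state which statistic the authors used, and the function names and numbers are illustrative only.

```python
from math import comb


def fold_enrichment(sites_in_hotspots, total_sites, hotspot_bp, genome_bp):
    """Observed sites in hotspots divided by the count expected under uniform placement."""
    expected = total_sites * hotspot_bp / genome_bp
    return sites_in_hotspots / expected


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of at least k sites in hotspots."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Illustrative numbers (not from the paper): 200 binding sites, 30 of them in
# hotspots that together cover 5% of a 100 Mb genome.
fold = fold_enrichment(30, 200, 5_000_000, 100_000_000)
p_value = binomial_tail(30, 200, 0.05)
print(fold, p_value)
```

Ranking DNA-binding proteins by such a score would put factors whose sites concentrate in hotspots at the top of a candidate list; a real analysis would also need to correct for multiple testing and for sequence composition.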
Epigenetic functions enriched in transcription factors binding to mouse recombination hotspots

PubMed Central

2012-01-01

The regulatory mechanism of recombination is a fundamental problem in genomics, with wide applications in genome-wide association studies, birth-defect diseases, molecular evolution, cancer research, etc. In mammalian genomes, recombination events cluster into short genomic regions called "recombination hotspots". Recently, a 13-mer motif enriched in hotspots was identified as a candidate cis-regulatory element of human recombination hotspots; moreover, a zinc finger protein, PRDM9, binds to this motif and is associated with variation of recombination phenotype in human and mouse genomes, and is thus a trans-acting regulator of recombination hotspots. However, this pair of cis and trans-regulators covers only a fraction of hotspots, so other regulators of recombination hotspots remain to be discovered. In this paper, we propose an approach to predicting additional trans-regulators from DNA-binding proteins by comparing their enrichment of binding sites in hotspots. Applying this approach to newly mapped mouse hotspots genome-wide, we confirmed that PRDM9 is a major trans-regulator of hotspots. In addition, a list of top candidate trans-regulators of mouse hotspots is reported. Using GO analysis we observed that the top genes are enriched for histone modification functions, highlighting the epigenetic regulatory mechanisms of recombination hotspots. PMID:22759569

Transcription facilitated genome-wide recruitment of topoisomerase I and DNA gyrase.

PubMed

Ahmed, Wareed; Sala, Claudia; Hegde, Shubhada R; Jha, Rajiv Kumar; Cole, Stewart T; Nagaraja, Valakunja

2017-05-01

Movement of the transcription machinery along a template alters DNA topology resulting in the accumulation of supercoils in DNA. The positive supercoils generated ahead of transcribing RNA polymerase (RNAP) and the negative supercoils accumulating behind impose severe topological constraints impeding transcription process. Previous studies have implied the role of topoisomerases in the removal of torsional stress and the maintenance of template topology but the in vivo interaction of functionally distinct topoisomerases with heterogeneous chromosomal territories is not deciphered. Moreover, how the transcription-induced supercoils influence the genome-wide recruitment of DNA topoisomerases remains to be explored in bacteria. Using ChIP-Seq, we show the genome-wide occupancy profile of both topoisomerase I and DNA gyrase in conjunction with RNAP in Mycobacterium tuberculosis taking advantage of minimal topoisomerase representation in the organism. The study unveils the first in vivo genome-wide interaction of both the topoisomerases with the genomic regions and establishes that transcription-induced supercoils govern their recruitment at genomic sites. Distribution profiles revealed co-localization of RNAP and the two topoisomerases on the active transcriptional units (TUs). At a given locus, topoisomerase I and DNA gyrase were localized behind and ahead of RNAP, respectively, correlating with the twin-supercoiled domains generated. The recruitment of topoisomerases was higher at the genomic loci with higher transcriptional activity and/or at regions under high torsional stress compared to silent genomic loci. Importantly, the occupancy of DNA gyrase, sole type II topoisomerase in Mtb, near the Ter domain of the Mtb chromosome validates its function as a decatenase.
Transcription facilitated genome-wide recruitment of topoisomerase I and DNA gyrase

PubMed Central

Ahmed, Wareed; Sala, Claudia; Hegde, Shubhada R.; Jha, Rajiv Kumar

2017-01-01

Movement of the transcription machinery along a template alters DNA topology resulting in the accumulation of supercoils in DNA. The positive supercoils generated ahead of transcribing RNA polymerase (RNAP) and the negative supercoils accumulating behind impose severe topological constraints impeding transcription process. Previous studies have implied the role of topoisomerases in the removal of torsional stress and the maintenance of template topology but the in vivo interaction of functionally distinct topoisomerases with heterogeneous chromosomal territories is not deciphered. Moreover, how the transcription-induced supercoils influence the genome-wide recruitment of DNA topoisomerases remains to be explored in bacteria. Using ChIP-Seq, we show the genome-wide occupancy profile of both topoisomerase I and DNA gyrase in conjunction with RNAP in Mycobacterium tuberculosis taking advantage of minimal topoisomerase representation in the organism. The study unveils the first in vivo genome-wide interaction of both the topoisomerases with the genomic regions and establishes that transcription-induced supercoils govern their recruitment at genomic sites. Distribution profiles revealed co-localization of RNAP and the two topoisomerases on the active transcriptional units (TUs). At a given locus, topoisomerase I and DNA gyrase were localized behind and ahead of RNAP, respectively, correlating with the twin-supercoiled domains generated. The recruitment of topoisomerases was higher at the genomic loci with higher transcriptional activity and/or at regions under high torsional stress compared to silent genomic loci. Importantly, the occupancy of DNA gyrase, sole type II topoisomerase in Mtb, near the Ter domain of the Mtb chromosome validates its function as a decatenase. PMID:28463980
RNAi Functions in Adaptive Reprogramming of the Genome | Center for Cancer Research

Cancer.gov

The regulation of transcribing DNA into RNA, including the production, processing, and degradation of RNA transcripts, affects the expression and the regulation of the genome in ways that are just beginning to be unraveled. A surprising discovery in recent years is that the vast majority of the genome is transcribed to yield an abundance of RNA transcripts. Many transcripts
Functional analysis and transcriptional output of the Göttingen minipig genome.

PubMed

Heckel, Tobias; Schmucki, Roland; Berrera, Marco; Ringshandl, Stephan; Badi, Laura; Steiner, Guido; Ravon, Morgane; Küng, Erich; Kuhn, Bernd; Kratochwil, Nicole A; Schmitt, Georg; Kiialainen, Anna; Nowaczyk, Corinne; Daff, Hamina; Khan, Azinwi Phina; Lekolool, Isaac; Pelle, Roger; Okoth, Edward; Bishop, Richard; Daubenberger, Claudia; Ebeling, Martin; Certa, Ulrich

2015-11-14

In the past decade the Göttingen minipig has gained increasing recognition as animal model in pharmaceutical and safety research because it recapitulates many aspects of human physiology and metabolism. Genome-based comparison of drug targets together with quantitative tissue expression analysis allows rational prediction of pharmacology and cross-reactivity of human drugs in animal models thereby improving drug attrition which is an important challenge in the process of drug development. Here we present a new chromosome level based version of the Göttingen minipig genome together with a comparative transcriptional analysis of tissues with pharmaceutical relevance as basis for translational research. We relied on mapping and assembly of WGS (whole-genome-shotgun sequencing) derived reads to the reference genome of the Duroc pig and predict 19,228 human orthologous protein-coding genes. Genome-based prediction of the sequence of human drug targets enables the prediction of drug cross-reactivity based on conservation of binding sites. We further support the finding that the genome of Sus scrofa contains about ten-times less pseudogenized genes compared to other vertebrates. Among the functional human orthologs of these minipig pseudogenes we found HEPN1, a putative tumor suppressor gene. The genomes of Sus scrofa, the Tibetan boar, the African Bushpig, and the Warthog show sequence conservation of all inactivating HEPN1 mutations suggesting disruption before the evolutionary split of these pig species. We identify 133 Sus scrofa specific, conserved long non-coding RNAs (lncRNAs) in the minipig genome and show that these transcripts are highly conserved in the African pigs and the Tibetan boar suggesting functional significance. Using a new minipig specific microarray we show high conservation of gene expression signatures in 13 tissues with biomedical relevance between humans and adult minipigs. 
We underline this relationship for minipig and human liver where we could demonstrate similar expression levels for most phase I drug-metabolizing enzymes. Higher expression levels and metabolic activities were found for FMO1, AKR/CRs and for phase II drug metabolizing enzymes in minipig as compared to human. The variability of gene expression in equivalent human and minipig tissues is considerably higher in minipig organs, which is important for study design in case a human target belongs to this variable category in the minipig. The first analysis of gene expression in multiple tissues during development from young to adult shows that the majority of transcriptional programs are concluded four weeks after birth. This finding is in line with the advanced state of human postnatal organ development at comparative age categories and further supports the minipig as model for pediatric drug safety studies. Genome based assessment of sequence conservation combined with gene expression data in several tissues improves the translational value of the minipig for human drug development. The genome and gene expression data presented here are important resources for researchers using the minipig as model for biomedical research or commercial breeding. Potential impact of our data for comparative genomics, translational research, and experimental medicine are discussed.
Transcriptome of the adult female malaria mosquito vector Anopheles albimanus.

PubMed

Martínez-Barnetche, Jesús; Gómez-Barreto, Rosa E; Ovilla-Muñoz, Marbella; Téllez-Sosa, Juan; García López, David E; Dinglasan, Rhoel R; Ubaida Mohien, Ceereena; MacCallum, Robert M; Redmond, Seth N; Gibbons, John G; Rokas, Antonis; Machado, Carlos A; Cazares-Raga, Febe E; González-Cerón, Lilia; Hernández-Martínez, Salvador; Rodríguez López, Mario H

2012-05-30

Human Malaria is transmitted by mosquitoes of the genus Anopheles. Transmission is a complex phenomenon involving biological and environmental factors of humans, parasites and mosquitoes. Among more than 500 anopheline species, only a few species from different branches of the mosquito evolutionary tree transmit malaria, suggesting that their vectorial capacity has evolved independently. Anopheles albimanus (subgenus Nyssorhynchus) is an important malaria vector in the Americas. The divergence time between Anopheles gambiae, the main malaria vector in Africa, and the Neotropical vectors has been estimated to be 100 My. To better understand the biological basis of malaria transmission and to develop novel and effective means of vector control, there is a need to explore the mosquito biology beyond the An. gambiae complex. We sequenced the transcriptome of the An. albimanus adult female. By combining Sanger, 454 and Illumina sequences from cDNA libraries derived from the midgut, cuticular fat body, dorsal vessel, salivary gland and whole body, we generated a single, high-quality assembly containing 16,669 transcripts, 92% of which mapped to the An. darlingi genome and covered 90% of the core eukaryotic genome. Bidirectional comparisons between the An. gambiae, An. darlingi and An. albimanus predicted proteomes allowed the identification of 3,772 putative orthologs. More than half of the transcripts had a match to proteins in other insect vectors and had an InterPro annotation. We identified several protein families that may be relevant to the study of Plasmodium-mosquito interaction. An open source transcript annotation browser called GDAV (Genome-Delinked Annotation Viewer) was developed to facilitate public access to the data generated by this and future transcriptome projects. We have explored the adult female transcriptome of one important New World malaria vector, An. albimanus. 
We identified protein-coding transcripts involved in biological processes that may be relevant to the Plasmodium lifecycle and can serve as the starting point for searching targets for novel control strategies. Our data increase the available genomic information regarding An. albimanus several hundred-fold, and will facilitate molecular research in medical entomology, evolutionary biology, genomics and proteomics of anopheline mosquito vectors. The data reported in this manuscript is accessible to the community via the VectorBase website (http://www.vectorbase.org/Other/AdditionalOrganisms/).
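The "bidirectional comparisons" between predicted proteomes mentioned above are commonly implemented as reciprocal best hits. The following is a minimal sketch of that general technique for two proteomes, assuming similarity scores are already available as dictionaries; the identifiers and scores are hypothetical, and the abstract does not specify the exact procedure or thresholds the authors used across the three species.

```python
def best_hits(scores):
    """Return {query: best-scoring subject} from {(query, subject): score}."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical bit scores between two toy proteomes.
albimanus_vs_gambiae = {
    ("alb1", "gam1"): 250.0,
    ("alb1", "gam2"): 90.0,
    ("alb2", "gam2"): 310.0,
    ("alb3", "gam2"): 120.0,
}
gambiae_vs_albimanus = {
    ("gam1", "alb1"): 245.0,
    ("gam2", "alb2"): 305.0,
    ("gam2", "alb3"): 118.0,
}

print(reciprocal_best_hits(albimanus_vs_gambiae, gambiae_vs_albimanus))
```

Here alb3 is excluded because its best hit, gam2, prefers alb2; extending the approach to three species would require the pairing to hold for every pair of proteomes.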
Improved methods and resources for paramecium genomics: transcription units, gene annotation and gene expression.

PubMed

Arnaiz, Olivier; Van Dijk, Erwin; Bétermier, Mireille; Lhuillier-Akakpo, Maoussi; de Vanssay, Augustin; Duharcourt, Sandra; Sallet, Erika; Gouzy, Jérôme; Sperling, Linda

2017-06-26

The 15 sibling species of the Paramecium aurelia cryptic species complex emerged after a whole genome duplication that occurred tens of millions of years ago. Given extensive knowledge of the genetics and epigenetics of Paramecium acquired over the last century, this species complex offers a uniquely powerful system to investigate the consequences of whole genome duplication in a unicellular eukaryote as well as the genetic and epigenetic mechanisms that drive speciation. High quality Paramecium gene models are important for research using this system. The major aim of the work reported here was to build an improved gene annotation pipeline for the Paramecium lineage. We generated oriented RNA-Seq transcriptome data across the sexual process of autogamy for the model species Paramecium tetraurelia. We determined, for the first time in a ciliate, candidate P. tetraurelia transcription start sites using an adapted Cap-Seq protocol. We developed TrUC, multi-threaded Perl software that in conjunction with TopHat mapping of RNA-Seq data to a reference genome, predicts transcription units for the annotation pipeline. We used EuGene software to combine annotation evidence. The high quality gene structural annotations obtained for P. tetraurelia were used as evidence to improve published annotations for 3 other Paramecium species. The RNA-Seq data were also used for differential gene expression analysis, providing a gene expression atlas that is more sensitive than the previously established microarray resource. We have developed a gene annotation pipeline tailored for the compact genomes and tiny introns of Paramecium species. A novel component of this pipeline, TrUC, predicts transcription units using Cap-Seq and oriented RNA-Seq data. TrUC could prove useful beyond Paramecium, especially in the case of high gene density. Accurate predictions of 3' and 5' UTR will be particularly valuable for studies of gene expression (e.g. nucleosome positioning, identification of cis regulatory motifs). The P. tetraurelia improved transcriptome resource, gene annotations for P. tetraurelia, P. biaurelia, P. sexaurelia and P. caudatum, and Paramecium-trained EuGene configuration are available through ParameciumDB ( http://paramecium.i2bc.paris-saclay.fr ). TrUC software is freely distributed under a GNU GPL v3 licence ( https://github.com/oarnaiz/TrUC ).
In Silico and Fluorescence In Situ Hybridization Mapping Reveals Collinearity between the Pennisetum squamulatum Apomixis Carrier-Chromosome and Chromosome 2 of Sorghum and Foxtail Millet.

PubMed

Sapkota, Sirjan; Conner, Joann A; Hanna, Wayne W; Simon, Bindu; Fengler, Kevin; Deschamps, Stéphane; Cigan, Mark; Ozias-Akins, Peggy

2016-01-01

Apomixis, or clonal propagation through seed, is a trait identified within multiple species of the grass family (Poaceae). The genetic locus controlling apomixis in Pennisetum squamulatum (syn Cenchrus squamulatus) and Cenchrus ciliaris (syn Pennisetum ciliare, buffelgrass) is the apospory-specific genomic region (ASGR). Previously, the ASGR was shown to be highly conserved but inverted in marker order between P. squamulatum and C. ciliaris based on fluorescence in situ hybridization (FISH) and varied in both karyotype and position of the ASGR on the ASGR-carrier chromosome among other apomictic Cenchrus/Pennisetum species. Using in silico transcript mapping and verification of physical positions of some of the transcripts via FISH, we discovered that the ASGR-carrier chromosome from P. squamulatum is collinear with chromosome 2 of foxtail millet and sorghum outside of the ASGR. The in silico ordering of the ASGR-carrier chromosome markers, previously unmapped in P. squamulatum, allowed for the identification of a backcross line with structural changes to the P. squamulatum ASGR-carrier chromosome derived from gamma irradiated pollen.
In Silico and Fluorescence In Situ Hybridization Mapping Reveals Collinearity between the Pennisetum squamulatum Apomixis Carrier-Chromosome and Chromosome 2 of Sorghum and Foxtail Millet

PubMed Central

Sapkota, Sirjan; Conner, Joann A.; Hanna, Wayne W.; Simon, Bindu; Fengler, Kevin; Deschamps, Stéphane; Cigan, Mark; Ozias-Akins, Peggy

2016-01-01

Apomixis, or clonal propagation through seed, is a trait identified within multiple species of the grass family (Poaceae). The genetic locus controlling apomixis in Pennisetum squamulatum (syn Cenchrus squamulatus) and Cenchrus ciliaris (syn Pennisetum ciliare, buffelgrass) is the apospory-specific genomic region (ASGR). Previously, the ASGR was shown to be highly conserved but inverted in marker order between P. squamulatum and C. ciliaris based on fluorescence in situ hybridization (FISH) and varied in both karyotype and position of the ASGR on the ASGR-carrier chromosome among other apomictic Cenchrus/Pennisetum species. Using in silico transcript mapping and verification of physical positions of some of the transcripts via FISH, we discovered that the ASGR-carrier chromosome from P. squamulatum is collinear with chromosome 2 of foxtail millet and sorghum outside of the ASGR. The in silico ordering of the ASGR-carrier chromosome markers, previously unmapped in P. squamulatum, allowed for the identification of a backcross line with structural changes to the P. squamulatum ASGR-carrier chromosome derived from gamma irradiated pollen. PMID:27031857
Mapping a candidate gene (MdMYB10) for red flesh and foliage colour in apple

PubMed Central

Chagné, David; Carlisle, Charmaine M; Blond, Céline; Volz, Richard K; Whitworth, Claire J; Oraguzie, Nnadozie C; Crowhurst, Ross N; Allan, Andrew C; Espley, Richard V; Hellens, Roger P; Gardiner, Susan E

2007-01-01

Background Integrating plant genomics and classical breeding is a challenge for both plant breeders and molecular biologists. Marker-assisted selection (MAS) is a tool that can be used to accelerate the development of novel apple varieties such as cultivars that have fruit with anthocyanin through to the core. In addition, determining the inheritance of novel alleles, such as the one responsible for red flesh, adds to our understanding of allelic variation. Our goal was to map candidate anthocyanin biosynthetic and regulatory genes in a population segregating for the red flesh phenotypes. Results We have identified the Rni locus, a major genetic determinant of the red foliage and red colour in the core of apple fruit. In a population segregating for the red flesh and foliage phenotype we have determined the inheritance of the Rni locus and DNA polymorphisms of candidate anthocyanin biosynthetic and regulatory genes. Simple Sequence Repeats (SSRs) and Single Nucleotide Polymorphisms (SNPs) in the candidate genes were also located on an apple genetic map. We have shown that the MdMYB10 gene co-segregates with the Rni locus and is on Linkage Group (LG) 09 of the apple genome. Conclusion We have performed candidate gene mapping in a fruit tree crop and have provided genetic evidence that red colouration in the fruit core as well as red foliage are both controlled by a single locus named Rni. We have shown that the transcription factor MdMYB10 may be the gene underlying Rni as there were no recombinants between the marker for this gene and the red phenotype in a population of 516 individuals. Associating markers derived from candidate genes with a desirable phenotypic trait has demonstrated the application of genomic tools in a breeding programme of a horticultural crop species. PMID:17608951
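The statement that no recombinants were found between the MdMYB10 marker and the red phenotype among 516 individuals can be turned into a rough upper bound on the recombination fraction. The sketch below assumes, as a simplification not stated in the abstract, that each individual represents one independent informative meiosis; if r is the recombination fraction, the probability of seeing zero recombinants in n meioses is (1 - r)^n, and solving (1 - r)^n = 1 - confidence gives the bound.

```python
def upper_bound_recombination(n_informative, confidence=0.95):
    """Largest recombination fraction still compatible, at the given confidence,
    with observing zero recombinants in n_informative independent meioses."""
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_informative)


bound = upper_bound_recombination(516)
print(f"upper bound on r: {bound:.4%}")
```

Under these assumptions the marker and the Rni locus would lie within well under one percent recombination of each other, which is consistent with, but does not prove, MdMYB10 being the underlying gene.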
TRACTOR_DB: a database of regulatory networks in gamma-proteobacterial genomes

PubMed Central

González, Abel D.; Espinosa, Vladimir; Vasconcelos, Ana T.; Pérez-Rueda, Ernesto; Collado-Vides, Julio

2005-01-01

Experimental data on the Escherichia coli transcriptional regulatory system has been used in the past years to predict new regulatory elements (promoters, transcription factors (TFs), TFs' binding sites and operons) within its genome. As more genomes of gamma-proteobacteria are being sequenced, the prediction of these elements in a growing number of organisms has become more feasible, as a step towards the study of how different bacteria respond to environmental changes at the level of transcriptional regulation. In this work, we present TRACTOR_DB (TRAnscription FaCTORs' predicted binding sites in prokaryotic genomes), a relational database that contains computational predictions of new members of 74 regulons in 17 gamma-proteobacterial genomes. For these predictions we used a comparative genomics approach regarding which several proof-of-principle articles for large regulons have been published. TRACTOR_DB may be currently accessed at http://www.bioinfo.cu/Tractor_DB, http://www.tractor.lncc.br/ or at http://www.cifn.unam.mx/Computational_Genomics/tractorDB. Contact Email id is tractor@cifn.unam.mx. PMID:15608293
The abundance of homoeologue transcripts is disrupted by hybridization and is partially restored by genome doubling in synthetic hexaploid wheat.

PubMed

Hao, Ming; Li, Aili; Shi, Tongwei; Luo, Jiangtao; Zhang, Lianquan; Zhang, Xuechuan; Ning, Shunzong; Yuan, Zhongwei; Zeng, Deying; Kong, Xingchen; Li, Xiaolong; Zheng, Hongkun; Lan, Xiujin; Zhang, Huaigang; Zheng, Youliang; Mao, Long; Liu, Dengcai

2017-02-10

The formation of an allopolyploid is a two step process, comprising an initial wide hybridization event, which is later followed by a whole genome doubling. Both processes can affect the transcription of homoeologues. Here, RNA-Seq was used to obtain the genome-wide leaf transcriptome of two independent Triticum turgidum × Aegilops tauschii allotriploids (F1), along with their spontaneous allohexaploids (S1) and their parental lines. The resulting sequence data were then used to characterize variation in homoeologue transcript abundance. The hybridization event strongly down-regulated D-subgenome homoeologues, but this effect was in many cases reversed by whole genome doubling. The suppression of D-subgenome homoeologue transcription resulted in a marked frequency of parental transcription level dominance, especially with respect to genes encoding proteins involved in photosynthesis. Singletons (genes where no homoeologues were present) were frequently transcribed at both the allotriploid and allohexaploid plants. The implication is that whole genome doubling helps to overcome the phenotypic weakness of the allotriploid, restoring a more favourable gene dosage in genes experiencing transcription level dominance in hexaploid wheat.
Comparative Analysis of Transcription Factors Families across Fungal Tree of Life

DOE Office of Scientific and Technical Information (OSTI.GOV)

Salamov, Asaf; Grigoriev, Igor

2015-03-19

Transcription factors (TFs) are proteins that regulate the transcription of genes by binding to specific DNA sequences. Based on the literature (Shelest, 2008; Weirauch and Hughes, 2011), we collected and manually curated a list of DBD Pfam domains (62 DBD domains in total). We looked for the distribution of TFs in 395 fungal genomes and, additionally, in plant genomes (Phytozome), prokaryotes (IMG), and some animal/metazoan and protist genomes
The 3D genome in transcriptional regulation and pluripotency.

PubMed

Gorkin, David U; Leung, Danny; Ren, Bing

2014-06-05

It can be convenient to think of the genome as simply a string of nucleotides, the linear order of which encodes an organism's genetic blueprint. However, the genome does not exist as a linear entity within cells where this blueprint is actually utilized. Inside the nucleus, the genome is organized in three-dimensional (3D) space, and lineage-specific transcriptional programs that direct stem cell fate are implemented in this native 3D context. Here, we review principles of 3D genome organization in mammalian cells. We focus on the emerging relationship between genome organization and lineage-specific transcriptional regulation, which we argue are inextricably linked. Copyright © 2014 Elsevier Inc. All rights reserved.
Transcriptome and proteomic analysis of mango (Mangifera indica Linn) fruits.

PubMed

Wu, Hong-xia; Jia, Hui-min; Ma, Xiao-wei; Wang, Song-biao; Yao, Quan-sheng; Xu, Wen-tian; Zhou, Yi-gang; Gao, Zhong-shan; Zhan, Ru-lin

2014-06-13

Here we used Illumina RNA-seq technology for transcriptome sequencing of a mixed fruit sample from 'Zill' mango (Mangifera indica Linn) fruit pericarp and pulp during the development and ripening stages. RNA-seq generated 68,419,722 sequence reads that were assembled into 54,207 transcripts with a mean length of 858 bp, including 26,413 clusters and 27,794 singletons. A total of 42,515 (78.43%) transcripts were annotated using public protein databases, with a cut-off E-value above 10^-5, of which 35,198 and 14,619 transcripts were assigned to gene ontology terms and clusters of orthologous groups respectively. Functional annotation against the Kyoto Encyclopedia of Genes and Genomes database identified 23,741 (43.79%) transcripts which were mapped to 128 pathways. These pathways revealed many previously unknown transcripts. We also applied mass spectrometry-based transcriptome data to characterize the proteome of ripe fruit. LC-MS/MS analysis of the mango fruit proteome was performed using tandem mass spectrometry (MS/MS) in an LTQ Orbitrap Velos (Thermo) coupled online to the HPLC. This approach enabled the identification of 7536 peptides that matched 2754 proteins. Our study provides a comprehensive sequence for a systemic view of transcriptome during mango fruit development and the most comprehensive fruit proteome to date, which are useful for further genomics research and proteomic studies. Our study provides a comprehensive sequence for a systemic view of both the transcriptome and proteome of mango fruit, and a valuable reference for further research on gene expression and protein identification. This article is part of a Special Issue entitled: Proteomics of non-model organisms. Copyright © 2014 Elsevier B.V. All rights reserved.
ChIP-nexus: a novel ChIP-exo protocol for improved detection of in vivo transcription factor binding footprints

PubMed Central

He, Qiye; Johnston, Jeff; Zeitlinger, Julia

2014-01-01

Understanding how eukaryotic enhancers are bound and regulated by specific combinations of transcription factors is still a major challenge. To better map transcription factor binding genome-wide at nucleotide resolution in vivo, we have developed a robust ChIP-exo protocol called ChIP experiments with nucleotide resolution through exonuclease, unique barcode and single ligation (ChIP-nexus), which utilizes an efficient DNA self-circularization step during library preparation. Application of ChIP-nexus to four proteins—human TBP and Drosophila NFkB, Twist and Max— demonstrates that it outperforms existing ChIP protocols in resolution and specificity, pinpoints relevant binding sites within enhancers containing multiple binding motifs and allows the analysis of in vivo binding specificities. Notably, we show that Max frequently interacts with DNA sequences next to its motif, and that this binding pattern correlates with local DNA sequence features such as DNA shape. ChIP-nexus will be broadly applicable to studying in vivo transcription factor binding specificity and its relationship to cis-regulatory changes in humans and model organisms. PMID:25751057
Mapping RNA-seq Reads with STAR

PubMed Central

Dobin, Alexander; Gingeras, Thomas R.

2015-01-01

Mapping of large sets of high-throughput sequencing reads to a reference genome is one of the foundational steps in RNA-seq data analysis. The STAR software package performs this task with high levels of accuracy and speed. In addition to detecting annotated and novel splice junctions, STAR is capable of discovering more complex RNA sequence arrangements, such as chimeric and circular RNA. STAR can align spliced sequences of any length with moderate error rates providing scalability for emerging sequencing technologies. STAR generates output files that can be used for many downstream analyses such as transcript/gene expression quantification, differential gene expression, novel isoform reconstruction, signal visualization, and so forth. In this unit we describe computational protocols that produce various output files, use different RNA-seq datatypes, and utilize different mapping strategies. STAR is Open Source software that can be run on Unix, Linux or Mac OS X systems. PMID:26334920
Mapping RNA-seq Reads with STAR.

PubMed

Dobin, Alexander; Gingeras, Thomas R

2015-09-03

Mapping of large sets of high-throughput sequencing reads to a reference genome is one of the foundational steps in RNA-seq data analysis. The STAR software package performs this task with high levels of accuracy and speed. In addition to detecting annotated and novel splice junctions, STAR is capable of discovering more complex RNA sequence arrangements, such as chimeric and circular RNA. STAR can align spliced sequences of any length with moderate error rates, providing scalability for emerging sequencing technologies. STAR generates output files that can be used for many downstream analyses such as transcript/gene expression quantification, differential gene expression, novel isoform reconstruction, and signal visualization. In this unit, we describe computational protocols that produce various output files, use different RNA-seq datatypes, and utilize different mapping strategies. STAR is open source software that can be run on Unix, Linux, or Mac OS X systems. Copyright © 2015 John Wiley & Sons, Inc.
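The two STAR steps that the abstract alludes to (building a genome index, then mapping reads) can be sketched as command lines. The following Python sketch only assembles the argument lists and does not run STAR. The file names, thread count, and read length are hypothetical placeholders, and the choice of sorted BAM output with per-gene counts is just one of the output options the unit describes, not the authors' recommended protocol.

```python
import shlex


def star_index_cmd(genome_dir, fasta, gtf, read_length=101, threads=8):
    """Build the argument list for STAR genome index generation."""
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", str(genome_dir),
        "--genomeFastaFiles", str(fasta),
        "--sjdbGTFfile", str(gtf),
        # Commonly set to read length minus one.
        "--sjdbOverhang", str(read_length - 1),
        "--runThreadN", str(threads),
    ]


def star_align_cmd(genome_dir, reads, out_prefix, threads=8):
    """Build the argument list for mapping one sample (single or paired end)."""
    cmd = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", str(genome_dir),
        "--readFilesIn", *[str(r) for r in reads],
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--quantMode", "GeneCounts",
        "--outFileNamePrefix", str(out_prefix),
    ]
    # Gzipped FASTQ files need a decompression command.
    if all(str(r).endswith(".gz") for r in reads):
        cmd += ["--readFilesCommand", "zcat"]
    return cmd


# Hypothetical paths, shown only to illustrate the resulting commands.
index_cmd = star_index_cmd("star_index", "genome.fa", "annotation.gtf")
align_cmd = star_align_cmd(
    "star_index", ["sample_R1.fastq.gz", "sample_R2.fastq.gz"], "sample_"
)
print(shlex.join(index_cmd))
print(shlex.join(align_cmd))
```

Passing either list to `subprocess.run(cmd, check=True)` would execute it, assuming STAR is installed and on the `PATH`.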
Histone modifications influence mediator interactions with chromatin

PubMed Central

Zhu, Xuefeng; Zhang, Yongqiang; Bjornsdottir, Gudrun; Liu, Zhongle; Quan, Amy; Costanzo, Michael; Dávila López, Marcela; Westholm, Jakub Orzechowski; Ronne, Hans; Boone, Charles; Gustafsson, Claes M.; Myers, Lawrence C.

2011-01-01

The Mediator complex transmits activation signals from DNA-bound transcription factors to the core transcription machinery. Genome-wide localization studies have demonstrated that Mediator occupancy not only correlates with high levels of transcription, but that the complex also is present at transcriptionally silenced locations. We provide evidence that Mediator localization is guided by an interaction with histone tails, and that this interaction is regulated by their post-translational modifications. A quantitative, high-density genetic interaction map revealed links between Mediator components and factors affecting chromatin structure, especially histone deacetylases. Peptide binding assays demonstrated that pure wild-type Mediator forms stable complexes with the tails of histones H3 and H4. These binding assays also showed that Mediator-histone H4 peptide interactions are specifically inhibited by acetylation of histone H4 lysine 16, a residue critical in transcriptional silencing. Finally, these findings were validated by tiling array analysis that revealed a broad correlation between Mediator and nucleosome occupancy in vivo, but a negative correlation between Mediator and nucleosomes acetylated at histone H4 lysine 16. Our studies show that chromatin structure and the acetylation state of histones are intimately connected to Mediator localization. PMID:21742760
Evidence for Transcript Networks Composed of Chimeric RNAs in Human Cells

PubMed Central

Borel, Christelle; Mudge, Jonathan M.; Howald, Cédric; Foissac, Sylvain; Ucla, Catherine; Chrast, Jacqueline; Ribeca, Paolo; Martin, David; Murray, Ryan R.; Yang, Xinping; Ghamsari, Lila; Lin, Chenwei; Bell, Ian; Dumais, Erica; Drenkow, Jorg; Tress, Michael L.; Gelpí, Josep Lluís; Orozco, Modesto; Valencia, Alfonso; van Berkum, Nynke L.; Lajoie, Bryan R.; Vidal, Marc; Stamatoyannopoulos, John; Batut, Philippe; Dobin, Alex; Harrow, Jennifer; Hubbard, Tim; Dekker, Job; Frankish, Adam; Salehi-Ashtiani, Kourosh; Reymond, Alexandre; Antonarakis, Stylianos E.; Guigó, Roderic; Gingeras, Thomas R.

2012-01-01

The classic organization of a gene structure has followed the Jacob and Monod bacterial gene model proposed more than 50 years ago. Since then, empirical determinations of the complexity of the transcriptomes found in yeast to human have blurred the definition and physical boundaries of genes. Using multiple analysis approaches, we have characterized individual gene boundaries mapping on human chromosomes 21 and 22. Analyses of the locations of the 5′ and 3′ transcriptional termini of 492 protein-coding genes revealed that for 85% of these genes the boundaries extend beyond the current annotated termini, most often connecting with exons of transcripts from other well-annotated genes. The biological and evolutionary importance of these chimeric transcripts is underscored by (1) the non-random interconnections of genes involved, (2) the greater phylogenetic depth of the genes involved in many chimeric interactions, (3) the coordination of the expression of connected genes and (4) the close in vivo and three-dimensional proximity of the genomic regions being transcribed and contributing to parts of the chimeric RNAs. The non-random nature of the connection of the genes involved suggests that chimeric transcripts should not be studied in isolation, but together, as an RNA network. PMID:22238572
RegPrecise 3.0--a resource for genome-scale exploration of transcriptional regulation in bacteria.

PubMed

Novichkov, Pavel S; Kazakov, Alexey E; Ravcheev, Dmitry A; Leyn, Semen A; Kovaleva, Galina Y; Sutormin, Roman A; Kazanov, Marat D; Riehl, William; Arkin, Adam P; Dubchak, Inna; Rodionov, Dmitry A

2013-11-01

Genome-scale prediction of gene regulation and reconstruction of transcriptional regulatory networks in prokaryotes is one of the critical tasks of modern genomics. Bacteria from different taxonomic groups, whose lifestyles and natural environments are substantially different, possess highly diverged transcriptional regulatory networks. Comparative genomics approaches are useful for in silico reconstruction of bacterial regulons and networks operated by both transcription factors (TFs) and RNA regulatory elements (riboswitches). RegPrecise (http://regprecise.lbl.gov) is a web resource for collection, visualization and analysis of transcriptional regulons reconstructed by comparative genomics. We significantly expanded a reference collection of manually curated regulons we introduced earlier. RegPrecise 3.0 provides access to inferred regulatory interactions organized by phylogenetic, structural and functional properties. Taxonomy-specific collections include 781 TF regulogs inferred in more than 160 genomes representing 14 taxonomic groups of Bacteria. TF-specific collections include regulogs for a selected subset of 40 TFs reconstructed across more than 30 taxonomic lineages. Novel collections of regulons operated by RNA regulatory elements (riboswitches) include nearly 400 regulogs inferred in 24 bacterial lineages. RegPrecise 3.0 provides four classifications of the reference regulons implemented as controlled vocabularies: 55 TF protein families; 43 RNA motif families; ~150 biological processes or metabolic pathways; and ~200 effectors or environmental signals. Genome-wide visualization of regulatory networks and metabolic pathways covered by the reference regulons is available for all studied genomes. A separate section of RegPrecise 3.0 contains draft regulatory networks in 640 genomes obtained by a conservative propagation of the reference regulons to closely related genomes.
RegPrecise 3.0 gives access to the transcriptional regulons reconstructed in bacterial genomes. Analytical capabilities include exploration of: regulon content, structure and function; TF binding site motifs; conservation and variations in genome-wide regulatory networks across all taxonomic groups of Bacteria. RegPrecise 3.0 was selected as a core resource on transcriptional regulation of the Department of Energy Systems Biology Knowledgebase, an emerging software and data environment designed to enable researchers to collaboratively generate, test and share new hypotheses about gene and protein functions, perform large-scale analyses, and model interactions in microbes, plants, and their communities.

Advances in Exercise, Fitness, and Performance Genomics in 2011

PubMed Central

Roth, Stephen M.; Rankinen, Tuomo; Hagberg, James M.; Loos, Ruth J. F.; Pérusse, Louis; Sarzynski, Mark A.; Wolfarth, Bernd; Bouchard, Claude

2014-01-01

This review of the exercise genomics literature emphasizes the highest quality papers published in 2011. Given this emphasis on the best publications, only a small number of published papers are reviewed. One study found that physical activity levels were significantly lower in patients with mitochondrial DNA mutations compared to controls. A two-stage fine mapping follow-up of a previous linkage peak found strong associations between sequence variation in the activin A receptor, type-1B (ACVR1B) gene and knee extensor strength, with rs2854464 emerging as the most promising candidate polymorphism. The association of higher muscular strength with the rs2854464 A-allele was confirmed in two separate cohorts. A study using a combination of transcriptomic and genomic data identified a comprehensive map of the transcriptomic features important for aerobic exercise training-induced improvements in maximal oxygen consumption, but no genetic variants derived from candidate transcripts were associated with trainability. A large-scale de novo meta-analysis confirmed that the effect of sequence variation in the fat mass and obesity-associated (FTO) gene on the risk of obesity differs between sedentary and physically active adults. Evidence for gene-physical activity interactions on type 2 diabetes risk was found in two separate studies. A large study of women found that physical activity modified the effect of polymorphisms in the lipoprotein lipase (LPL), hepatic lipase (LIPC), and cholesteryl ester transfer protein (CETP) genes, identified in previous genome-wide association study (GWAS) reports, on HDL-C. We conclude that a strong exercise genomics corpus of evidence would not only translate into powerful genomic predictors but would also have a major impact on exercise biology and exercise behavior research. PMID:22330029
Global Identification and Characterization of Transcriptionally Active Regions in the Rice Genome

PubMed Central

Stolc, Viktor; Deng, Wei; He, Hang; Korbel, Jan; Chen, Xuewei; Tongprasit, Waraporn; Ronald, Pamela; Chen, Runsheng; Gerstein, Mark; Wang Deng, Xing

2007-01-01

Genome tiling microarray studies have consistently documented rich transcriptional activity beyond the annotated genes. However, systematic characterization and transcriptional profiling of the putative novel transcripts on the genome scale are still lacking. We report here the identification of 25,352 and 27,744 transcriptionally active regions (TARs) not encoded by annotated exons in the rice (Oryza sativa) subspecies japonica and indica, respectively. The non-exonic TARs account for approximately two-thirds of the total TARs detected by tiling arrays and represent transcripts likely conserved between japonica and indica. Transcription of 21,018 (83%) japonica non-exonic TARs was verified through expression profiling in 10 tissue types using a re-array in which annotated genes and TARs were each represented by five independent probes. Subsequent analyses indicate that about 80% of the japonica TARs that were not assigned to annotated exons can be assigned to various putatively functional or structural elements of the rice genome, including splice variants, uncharacterized portions of incompletely annotated genes, antisense transcripts, duplicated gene fragments, and potential non-coding RNAs. These results provide a systematic characterization of non-exonic transcripts in rice and thus expand the current view of the complexity and dynamics of the rice transcriptome. PMID:17372628
Deciphering RNA regulatory elements in trypanosomatids: one piece at a time or genome-wide?

PubMed

Gazestani, Vahid H; Lu, Zhiquan; Salavati, Reza

2014-05-01

Morphological and metabolic changes in the life cycle of Trypanosoma brucei are accomplished by precise regulation of hundreds of genes. In the absence of transcriptional control, RNA-binding proteins (RBPs) shape the structure of gene regulatory maps in this organism, but our knowledge about their target RNAs, binding sites, and mechanisms of action is far from complete. Although recent technological advances have revolutionized the RBP-based approaches, the main framework for the RNA regulatory element (RRE)-based approaches has not changed over the last two decades in T. brucei. In this Opinion, after highlighting the current challenges in RRE inference, we explain some genome-wide solutions that can significantly boost our current understanding about gene regulatory networks in T. brucei. Copyright © 2014 Elsevier Ltd. All rights reserved.
Information theory applications for biological sequence analysis.

PubMed

Vinga, Susana

2014-05-01

Information theory (IT) addresses the analysis of communication systems and has been widely applied in molecular biology. In particular, alignment-free sequence analysis and comparison greatly benefited from concepts derived from IT, such as entropy and mutual information. This review covers several aspects of IT applications, ranging from genome global analysis and comparison, including block-entropy estimation and resolution-free metrics based on iterative maps, to local analysis, comprising the classification of motifs, prediction of transcription factor binding sites and sequence characterization based on linguistic complexity and entropic profiles. IT has also been applied to high-level correlations that combine DNA, RNA or protein features with sequence-independent properties, such as gene mapping and phenotype analysis, and has also provided models based on communication systems theory to describe information transmission channels at the cell level and also during evolutionary processes. While not exhaustive, this review attempts to categorize existing methods and to indicate their relation with broader transversal topics such as genomic signatures, data compression and complexity, time series analysis and phylogenetic classification, providing a resource for future developments in this promising area.
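As a concrete instance of the entropy concept mentioned in this abstract, the sketch below computes the Shannon entropy (in bits) of the k-mer distribution of a sequence. It is a minimal illustration, not one of the specific methods surveyed in the review; the function name `kmer_entropy` and the example sequences are assumptions made for the example.

```python
from collections import Counter
from math import log2


def kmer_entropy(seq, k=1):
    """Shannon entropy (bits) of the overlapping k-mer distribution of seq."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        raise ValueError("sequence is shorter than k")
    counts = Counter(kmers)
    total = len(kmers)
    # H = sum over k-mers of p * log2(1 / p)
    return sum((c / total) * log2(total / c) for c in counts.values())


# Illustrative sequences: a uniform base composition gives the 2-bit maximum
# for k=1, while a homopolymer carries no information.
print(kmer_entropy("ACGT"))
print(kmer_entropy("AAAA"))
print(kmer_entropy("ACGTACGTTTTT", k=2))
```

Sliding this function along a genome in fixed windows gives a simple alignment-free complexity profile; low values flag repetitive or low-complexity regions.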
Defining the location of promoter-associated R-loops at near-nucleotide resolution using bisDRIP-seq

PubMed Central

Dumelie, Jason G

2017-01-01

R-loops are features of chromatin consisting of a strand of DNA hybridized to RNA, as well as the expelled complementary DNA strand. R-loops are enriched at promoters where they have recently been shown to have important roles in modifying gene expression. However, the location of promoter-associated R-loops and the genomic domains they perturb to modify gene expression remain unclear. To resolve this issue, we developed a bisulfite-based approach, bisDRIP-seq, to map R-loops across the genome at near-nucleotide resolution in MCF-7 cells. We found the location of promoter-associated R-loops is dependent on the presence of introns. In intron-containing genes, R-loops are bounded between the transcription start site and the first exon-intron junction. In intronless genes, the 3' boundary displays gene-specific heterogeneity. Moreover, intronless genes are often associated with promoter-associated R-loop formation. Together, these studies provide a high-resolution map of R-loops and identify gene structure as a critical determinant of R-loop formation. PMID:29072160
A genome-wide map of hyper-edited RNA reveals numerous new sites.

PubMed

Porath, Hagit T; Carmi, Shai; Levanon, Erez Y

2014-08-27

Adenosine-to-inosine editing is one of the most frequent post-transcriptional modifications, manifested as A-to-G mismatches when comparing RNA sequences with their source DNA. Recently, a number of RNA-seq data sets have been screened for the presence of A-to-G editing, and hundreds of thousands of editing sites identified. Here we show that existing screens missed the majority of sites by ignoring reads with excessive ('hyper') editing that do not easily align to the genome. We show that careful alignment and examination of the unmapped reads in RNA-seq studies reveal numerous new sites, usually many more than originally discovered, and in precisely those regions that are most heavily edited. Specifically, we discover 327,096 new editing sites in the heavily studied Illumina Human BodyMap data and more than double the number of detected sites in several published screens. We also identify thousands of new sites in mouse, rat, opossum and fly. Our results establish that hyper-editing events account for the majority of editing sites.
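The A-to-G mismatch signature described above can be illustrated with a small sketch that tallies mismatches between a reference segment and an already aligned read of equal length, then flags reads whose mismatches are dominated by A-to-G changes. This is not the authors' pipeline: the function names and the thresholds `min_edits` and `min_fraction` are hypothetical, and the sketch ignores indels, base quality, and reads from the opposite strand (where editing appears as T-to-C).

```python
from collections import Counter


def mismatch_profile(ref, read):
    """Count (reference base, read base) pairs at mismatching positions."""
    if len(ref) != len(read):
        raise ValueError("ref and read must be aligned to the same length")
    return Counter(
        (r, q) for r, q in zip(ref.upper(), read.upper()) if r != q
    )


def looks_hyper_edited(ref, read, min_edits=3, min_fraction=0.8):
    """Flag a read whose mismatches are mostly A-to-G (illustrative thresholds)."""
    profile = mismatch_profile(ref, read)
    total = sum(profile.values())
    a_to_g = profile[("A", "G")]
    return total > 0 and a_to_g >= min_edits and a_to_g / total >= min_fraction


# Made-up sequences: three A-to-G changes and no other mismatches.
reference = "ACATAGACAT"
edited_read = "GCGTGGACAT"
print(mismatch_profile(reference, edited_read))
print(looks_hyper_edited(reference, edited_read))
```

Because heavily edited reads often fail standard alignment, as the abstract notes, a check like this would only be useful after the reads have been placed on the genome by some editing-tolerant strategy.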
A hybrid BAC physical map of potato: a framework for sequencing a heterozygous genome

PubMed Central

2011-01-01

Background Potato is the world's third most important food crop, yet cultivar improvement and genomic research in general remain difficult because of the heterozygous and tetraploid nature of its genome. The development of physical map resources that can facilitate genomic analyses in potato has so far been very limited. Here we present the methods of construction and the general statistics of the first two genome-wide BAC physical maps of potato, which were made from the heterozygous diploid clone RH89-039-16 (RH). Results First, a gel electrophoresis-based physical map was made by AFLP fingerprinting of 64478 BAC clones, which were aligned into 4150 contigs with an estimated total length of 1361 Mb. Screening of BAC pools, followed by the KeyMaps in silico anchoring procedure, identified 1725 AFLP markers in the physical map, and 1252 BAC contigs were anchored to the ultradense potato genetic map. A second, sequence-tag-based physical map was constructed from 65919 whole genome profiling (WGP) BAC fingerprints and these were aligned into 3601 BAC contigs spanning 1396 Mb. The 39733 BAC clones that overlap between both physical maps provided anchors to 1127 contigs in the WGP physical map, and reduced the number of contigs to around 2800 in each map separately. Both physical maps were 1.64 times longer than the 850 Mb potato genome. Genome heterozygosity and incomplete merging of BAC contigs are two factors that can explain this map inflation. The contig information of both physical maps was united in a single table that describes the hybrid potato physical map. Conclusions The AFLP physical map has already been used by the Potato Genome Sequencing Consortium for sequencing 10% of the heterozygous genome of clone RH on a BAC-by-BAC basis. By layering a new WGP physical map on top of the AFLP physical map, a genetically anchored genome-wide framework of 322434 sequence tags has been created.
This reference framework can be used for anchoring and ordering of genomic sequences of clone RH (and other potato genotypes), and opens the possibility to finish sequencing of the RH genome in a more efficient way via high throughput next generation approaches. PMID:22142254
RECQL5 Controls Transcript Elongation and Suppresses Genome Instability Associated with Transcription Stress

PubMed Central

Saponaro, Marco; Kantidakis, Theodoros; Mitter, Richard; Kelly, Gavin P.; Heron, Mark; Williams, Hannah; Söding, Johannes; Stewart, Aengus; Svejstrup, Jesper Q.

2014-01-01

Summary RECQL5 is the sole member of the RECQ family of helicases associated with RNA polymerase II (RNAPII). We now show that RECQL5 is a general elongation factor that is important for preserving genome stability during transcription. Depletion or overexpression of RECQL5 results in corresponding shifts in the genome-wide RNAPII density profile. Elongation is particularly affected, with RECQL5 depletion causing a striking increase in the average rate, concurrent with increased stalling, pausing, arrest, and/or backtracking (transcription stress). RECQL5 therefore controls the movement of RNAPII across genes. Loss of RECQL5 also results in the loss or gain of genomic regions, with the breakpoints of lost regions located in genes and common fragile sites. The chromosomal breakpoints overlap with areas of elevated transcription stress, suggesting that RECQL5 suppresses such stress and its detrimental effects, and thereby prevents genome instability in the transcribed region of genes. PMID:24836610
T antigen mutations are a human tumor-specific signature for Merkel cell polyomavirus

PubMed Central

Shuda, Masahiro; Feng, Huichen; Kwun, Hyun Jin; Rosen, Steven T.; Gjoerup, Ole; Moore, Patrick S.; Chang, Yuan

2008-01-01

Merkel cell polyomavirus (MCV) is a virus discovered in our laboratory at the University of Pittsburgh that is monoclonally integrated into the genome of ≈80% of human Merkel cell carcinomas (MCCs). Transcript mapping was performed to show that MCV expresses transcripts in MCCs similar to large T (LT), small T (ST), and 17kT transcripts of SV40. Nine MCC tumor-derived LT genomic sequences have been examined, and all were found to harbor mutations prematurely truncating the MCV LT helicase. In contrast, four presumed episomal viruses from nontumor sources did not possess this T antigen signature mutation. Using coimmunoprecipitation and origin replication assays, we show that tumor-derived virus mutations do not affect retinoblastoma tumor suppressor protein (Rb) binding by LT but do eliminate viral DNA replication capacity. Identification of an MCC cell line (MKL-1) having monoclonal MCV integration and the signature LT mutation allowed us to functionally test both tumor-derived and wild type (WT) T antigens. Only WT LT expression activates replication of integrated MCV DNA in MKL-1 cells. Our findings suggest that MCV-positive MCC tumor cells undergo selection for LT mutations to prevent autoactivation of integrated virus replication that would be detrimental to cell survival. Because these mutations render the virus replication-incompetent, MCV is not a “passenger virus” that secondarily infects MCC tumors. PMID:18812503
The Medicago truncatula GRAS protein RAD1 supports arbuscular mycorrhiza symbiosis and Phytophthora palmivora susceptibility.

PubMed

Rey, Thomas; Bonhomme, Maxime; Chatterjee, Abhishek; Gavrin, Aleksandr; Toulotte, Justine; Yang, Weibing; André, Olivier; Jacquet, Christophe; Schornack, Sebastian

2017-12-16

The roots of most land plants are colonized by symbiotic arbuscular mycorrhiza (AM) fungi. To facilitate this symbiosis, plant genomes encode a set of genes required for microbial perception and accommodation. However, the extent to which infection by filamentous root pathogens also relies on some of these genes remains an open question. Here, we used genome-wide association mapping to identify genes contributing to colonization of Medicago truncatula roots by the pathogenic oomycete Phytophthora palmivora. Single-nucleotide polymorphism (SNP) markers most significantly associated with plant colonization response were identified upstream of RAD1, which encodes a GRAS transcription regulator first negatively implicated in root nodule symbiosis and recently identified as a positive regulator of AM symbiosis. RAD1 transcript levels are up-regulated both in response to AM fungus and, to a lower extent, in infected tissues by P. palmivora where its expression is restricted to root cortex cells proximal to pathogen hyphae. Reverse genetics showed that reduction of RAD1 transcript levels as well as a rad1 mutant are impaired in their full colonization by AM fungi as well as by P. palmivora. Thus, the importance of RAD1 extends beyond symbiotic interactions, suggesting a general involvement in M. truncatula microbe-induced root development and interactions with unrelated beneficial and detrimental filamentous microbes. © The Author 2017. Published by Oxford University Press on behalf of the Society for Experimental Biology.
Genetic and epigenetic variation in 5S ribosomal RNA genes reveals genome dynamics in Arabidopsis thaliana

PubMed Central

Simon, Lauriane; Rabanal, Fernando A; Dubos, Tristan; Oliver, Cecilia; Lauber, Damien; Poulet, Axel; Vogt, Alexander; Mandlbauer, Ariane; Le Goff, Samuel; Sommer, Andreas; Duborjal, Hervé; Tatout, Christophe

2018-01-01

Organized in tandem repeat arrays in most eukaryotes and transcribed by RNA polymerase III, expression of 5S rRNA genes is under epigenetic control. To unveil mechanisms of transcriptional regulation, we obtained here in depth sequence information on 5S rRNA genes from the Arabidopsis thaliana genome and identified differential enrichment in epigenetic marks between the three 5S rDNA loci situated on chromosomes 3, 4 and 5. We reveal the chromosome 5 locus as the major source of an atypical, long 5S rRNA transcript characteristic of an open chromatin structure. 5S rRNA genes from this locus translocated in the Landsberg erecta ecotype as shown by linkage mapping and chromosome-specific FISH analysis. These variations in 5S rDNA locus organization cause changes in the spatial arrangement of chromosomes in the nucleus. Furthermore, 5S rRNA gene arrangements are highly dynamic with alterations in chromosomal positions through translocations in certain mutants of the RNA-directed DNA methylation pathway and important copy number variations among ecotypes. Finally, variations in 5S rRNA gene sequence, chromatin organization and transcripts indicate differential usage of 5S rDNA loci in distinct ecotypes. We suggest that both the usage of existing and new 5S rDNA loci resulting from translocations may impact neighboring chromatin organization. PMID:29518237
Genetic and epigenetic variation in 5S ribosomal RNA genes reveals genome dynamics in Arabidopsis thaliana.

PubMed

Simon, Lauriane; Rabanal, Fernando A; Dubos, Tristan; Oliver, Cecilia; Lauber, Damien; Poulet, Axel; Vogt, Alexander; Mandlbauer, Ariane; Le Goff, Samuel; Sommer, Andreas; Duborjal, Hervé; Tatout, Christophe; Probst, Aline V

2018-04-06

Organized in tandem repeat arrays in most eukaryotes and transcribed by RNA polymerase III, expression of 5S rRNA genes is under epigenetic control. To unveil mechanisms of transcriptional regulation, we obtained here in depth sequence information on 5S rRNA genes from the Arabidopsis thaliana genome and identified differential enrichment in epigenetic marks between the three 5S rDNA loci situated on chromosomes 3, 4 and 5. We reveal the chromosome 5 locus as the major source of an atypical, long 5S rRNA transcript characteristic of an open chromatin structure. 5S rRNA genes from this locus translocated in the Landsberg erecta ecotype as shown by linkage mapping and chromosome-specific FISH analysis. These variations in 5S rDNA locus organization cause changes in the spatial arrangement of chromosomes in the nucleus. Furthermore, 5S rRNA gene arrangements are highly dynamic with alterations in chromosomal positions through translocations in certain mutants of the RNA-directed DNA methylation pathway and important copy number variations among ecotypes. Finally, variations in 5S rRNA gene sequence, chromatin organization and transcripts indicate differential usage of 5S rDNA loci in distinct ecotypes. We suggest that both the usage of existing and new 5S rDNA loci resulting from translocations may impact neighboring chromatin organization.
A reference genetic linkage map of apomictic Hieracium species based on expressed markers derived from developing ovule transcripts

PubMed Central

Shirasawa, Kenta; Hand, Melanie L.; Henderson, Steven T.; Okada, Takashi; Johnson, Susan D.; Taylor, Jennifer M.; Spriggs, Andrew; Siddons, Hayley; Hirakawa, Hideki; Isobe, Sachiko; Tabata, Satoshi; Koltunow, Anna M. G.

2015-01-01

Background and Aims Apomixis in plants generates clonal progeny with a maternal genotype through asexual seed formation. Hieracium subgenus Pilosella (Asteraceae) contains polyploid, highly heterozygous apomictic and sexual species. Within apomictic Hieracium, dominant genetic loci independently regulate the qualitative developmental components of apomixis. In H. praealtum, LOSS OF APOMEIOSIS (LOA) enables formation of embryo sacs without meiosis and LOSS OF PARTHENOGENESIS (LOP) enables fertilization-independent seed formation. A locus required for fertilization-independent endosperm formation (AutE) has been identified in H. piloselloides. Additional quantitative loci appear to influence the penetrance of the qualitative loci, although the controlling genes remain unknown. This study aimed to develop the first genetic linkage maps for sexual and apomictic Hieracium species using simple sequence repeat (SSR) markers derived from expressed transcripts within the developing ovaries. Methods RNA from microdissected Hieracium ovule cell types and ovaries was sequenced and SSRs were identified. Two different F1 mapping populations were created to overcome difficulties associated with genome complexity and asexual reproduction. SSR markers were analysed within each mapping population to generate draft linkage maps for apomictic and sexual Hieracium species. Key Results A collection of 14 684 Hieracium expressed SSR markers were developed and linkage maps were constructed for Hieracium species using a subset of the SSR markers. Both the LOA and LOP loci were successfully assigned to linkage groups; however, AutE could not be mapped using the current populations. Comparisons with lettuce (Lactuca sativa) revealed partial macrosynteny between the two Asteraceae species. Conclusions A collection of SSR markers and draft linkage maps were developed for two apomictic and one sexual Hieracium species. 
These maps will support cloning of controlling genes at LOA and LOP loci in Hieracium and should also assist with identification of quantitative loci that affect the expressivity of apomixis. Future work will focus on mapping AutE using alternative populations. PMID:25538115
A resource for characterizing genome-wide binding and putative target genes of transcription factors expressed during secondary growth and wood formation in Populus

Treesearch

Lijun Liu; Trevor Ramsay; Matthew S. Zinkgraf; David Sundell; Nathaniel Robert Street; Vladimir Filkov; Andrew Groover

2015-01-01

Identifying transcription factor target genes is essential for modeling the transcriptional networks underlying developmental processes. Here we report a chromatin immunoprecipitation sequencing (ChIP-seq) resource consisting of genome-wide binding regions and associated putative target genes for four Populus homeodomain transcription factors...
Genomic identification of regulatory elements by evolutionary sequence comparison and functional analysis.

PubMed

Loots, Gabriela G

2008-01-01

Despite remarkable recent advances in genomics that have enabled us to identify most of the genes in the human genome, comparable efforts to define transcriptional cis-regulatory elements that control gene expression are lagging behind. The difficulty of this task stems from two equally important problems: our knowledge of how regulatory elements are encoded in genomes remains elementary, and there is a vast genomic search space for regulatory elements, since most of the mammalian genome is noncoding. Comparative genomic approaches are having a remarkable impact on the study of transcriptional regulation in eukaryotes and currently represent the most efficient and reliable methods of predicting noncoding sequences likely to control the patterns of gene expression. By subjecting eukaryotic genomic sequences to computational comparisons and subsequent experimentation, we are inching our way toward a more comprehensive catalog of common regulatory motifs that lie behind fundamental biological processes. We are still far from comprehending how the transcriptional regulatory code is encrypted in the human genome and providing an initial global view of regulatory gene networks, but collectively, the continued development of comparative and experimental approaches will rapidly expand our knowledge of the transcriptional regulome.
Evolutionary Story of a Satellite DNA from Phodopus sungorus (Rodentia, Cricetidae)

PubMed Central

Paço, Ana; Adega, Filomena; Meštrović, Nevenka; Plohl, Miroslav; Chaves, Raquel

2014-01-01

To contribute to the understanding of satellite DNA evolution and its genomic involvement, we isolated and characterized the first satellite DNA (PSUcentSat) from Phodopus sungorus (Cricetidae). Physical mapping of this sequence in P. sungorus showed large PSUcentSat arrays located at the heterochromatic (peri)centromeric region of five autosomal pairs and the Y-chromosome. The presence of orthologous PSUcentSat sequences in the genomes of other Cricetidae and Muridae rodents was also verified; these sequences, however, show an interspersed chromosomal distribution. This distribution pattern suggests that PSUcentSat was scattered in an ancestor of the Muridae/Cricetidae families and afterwards, in the descendant genome of P. sungorus, became restricted to a few chromosomes in the (peri)centromeric region. We believe that after the divergence of the studied species, PSUcentSat was most probably highly amplified in the (peri)centromeric region of some chromosome pairs of this hamster by recombinational mechanisms. The bouquet chromosome configuration (prophase I) possibly plays an important role in this selective amplification, providing physical proximity of centromeric regions between chromosomes with similar size and/or morphology. This seems particularly evident for the acrocentric chromosomes of P. sungorus (including the Y-chromosome), all presenting large PSUcentSat arrays at the (peri)centromeric region. The conservation of this sequence in the studied genomes and its (peri)centromeric amplification in P. sungorus strongly suggest functional significance, with this satellite family possibly having different functions in the different genomes. The verification of PSUcentSat transcriptional activity in normal proliferative cells suggests that its transcription is not stage-limited, as described for some other satellites. PMID:25336681
The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color.

PubMed

Motamayor, Juan C; Mockaitis, Keithanne; Schmutz, Jeremy; Haiminen, Niina; Livingstone, Donald; Cornejo, Omar; Findley, Seth D; Zheng, Ping; Utro, Filippo; Royaert, Stefan; Saski, Christopher; Jenkins, Jerry; Podicheti, Ram; Zhao, Meixia; Scheffler, Brian E; Stack, Joseph C; Feltus, Frank A; Mustiga, Guiliana M; Amores, Freddy; Phillips, Wilbert; Marelli, Jean Philippe; May, Gregory D; Shapiro, Howard; Ma, Jianxin; Bustamante, Carlos D; Schnell, Raymond J; Main, Dorrie; Gilbert, Don; Parida, Laxmi; Kuhn, David N

2013-06-03

Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, which is significantly larger than a sequenced Criollo cultivar, and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Version 1.1 has 10x the scaffold N50 and 4x the contig N50 as Criollo, and includes 111 Mb more anchored sequence. The version 1.1 assembly has 4.4% gap sequence, while Criollo has 10.9%. Through a combination of haplotype, association mapping and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. We demonstrate that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. One SNP within the target site for a highly conserved trans-acting siRNA in dicots, found within TcMYB113, seems to affect transcript levels of this gene and therefore pod color variation. We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits.
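The contig and scaffold N50 values quoted above follow the usual definition: the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the assembly size. A minimal Python sketch of that calculation is given below; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from the cacao assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("n50() needs at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # reached half of the assembly
            return length

if __name__ == "__main__":
    toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # made-up lengths
    print(n50(toy_contigs))
```

Because N50 depends on the total assembled length, it is most informative when compared between assemblies of similar size, as in the Matina 1-6 versus Criollo comparison.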
Structure and expression strategy of the genome of Culex pipiens densovirus, a mosquito densovirus with an ambisense organization.

PubMed

Baquerizo-Audiot, Elizabeth; Abd-Alla, Adly; Jousset, Françoise-Xavière; Cousserans, François; Tijssen, Peter; Bergoin, Max

2009-07-01

The genome of all densoviruses (DNVs) so far isolated from mosquitoes or mosquito cell lines consists of a 4-kb single-stranded DNA molecule with a monosense organization (genus Brevidensovirus, subfamily Densovirinae). We previously reported the isolation of a Culex pipiens DNV (CpDNV) that differs significantly from brevidensoviruses by (i) having an approximately 6-kb genome, (ii) lacking sequence homology, and (iii) lacking antigenic cross-reactivity with Brevidensovirus capsid polypeptides. We report here the sequence organization and transcription map of this virus. The cloned genome of CpDNV is 5,759 nucleotides (nt) long, and it possesses an inverted terminal repeat (ITR) of 285 nt and an ambisense organization of its genes. The nonstructural (NS) proteins NS-1, NS-2, and NS-3 are located in the 5' half of one strand and are organized into five open reading frames (ORFs) due to the split of both NS-1 and NS-2 into two ORFs. The ORF encoding capsid polypeptides is located in the 5' half of the complementary strand. The expression of NS proteins is controlled by two promoters, P7 and P17, driving the transcription of a 2.4-kb mRNA encoding NS-3 and of a 1.8-kb mRNA encoding NS-1 and NS-2, respectively. The two NS mRNAs species are spliced off a 53-nt sequence. Capsid proteins are translated from an unspliced 2.3-kb mRNA driven by the P88 promoter. CpDNV thus appears as a new type of mosquito DNV, and based on the overall organization and expression modalities of its genome, it may represent the prototype of a new genus of DNV.
The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color

PubMed Central

2013-01-01

Background Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. Results We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, which is significantly larger than a sequenced Criollo cultivar, and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Version 1.1 has 10x the scaffold N50 and 4x the contig N50 as Criollo, and includes 111 Mb more anchored sequence. The version 1.1 assembly has 4.4% gap sequence, while Criollo has 10.9%. Through a combination of haplotype, association mapping and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. We demonstrate that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. One SNP within the target site for a highly conserved trans-acting siRNA in dicots, found within TcMYB113, seems to affect transcript levels of this gene and therefore pod color variation. Conclusions We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits. PMID:23731509
Complex Interplay among DNA Modification, Noncoding RNA Expression and Protein-Coding RNA Expression in Salvia miltiorrhiza Chloroplast Genome

PubMed Central

Chen, Haimei; Zhang, Jianhui; Yuan, George; Liu, Chang

2014-01-01

Salvia miltiorrhiza is one of the most widely used medicinal plants. As a first step to develop a chloroplast-based genetic engineering method for the over-production of active components from S. miltiorrhiza, we have analyzed the genome, transcriptome, and base modifications of the S. miltiorrhiza chloroplast. Total genomic DNA and RNA were extracted from fresh leaves and then subjected to strand-specific RNA-Seq and Single-Molecule Real-Time (SMRT) sequencing analyses. Mapping the RNA-Seq reads to the genome assembly allowed us to determine the relative expression levels of 80 protein-coding genes. In addition, we identified 19 polycistronic transcription units and 136 putative antisense and intergenic noncoding RNA (ncRNA) genes. Comparison of the abundance of protein-coding transcripts (cRNA) with and without overlapping antisense ncRNAs (asRNA) suggests that the presence of asRNA is associated with increased cRNA abundance (p<0.05). Using the SMRT Portal software (v1.3.2), 2687 potential DNA modification sites and two potential DNA modification motifs were predicted. The two motifs include a TATA box–like motif (CPGDMM1, “TATANNNATNA”), and an unknown motif (CPGDMM2 “WNYANTGAW”). Specifically, 35 of the 97 CPGDMM1 motifs (36.1%) and 91 of the 369 CPGDMM2 motifs (24.7%) were found to be significantly modified (p<0.01). Analysis of genes downstream of the CPGDMM1 motif revealed a significantly increased abundance of ncRNA genes that are less than 400 bp away from the significantly modified CPGDMM1 motif (p<0.01). Taken together, the present study revealed a complex interplay among DNA modifications, ncRNA and cRNA expression in the chloroplast genome. PMID:24914614
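The two motifs are written in IUPAC nucleotide codes (N = any base, W = A or T, Y = C or T). As a rough illustration of how such a motif can be located in a sequence, the Python sketch below translates IUPAC codes into a regular expression and also reproduces the reported percentages. The helper names and the short test sequences are assumptions made for this example; the study itself used the SMRT Portal software, not this code.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def find_motif(motif, seq):
    """Return 0-based start positions of (possibly overlapping) motif matches.

    Only the given strand is scanned; the reverse complement is not searched.
    """
    body = "".join(IUPAC[base] for base in motif.upper())
    pattern = re.compile("(?=(%s))" % body)  # lookahead allows overlaps
    return [m.start() for m in pattern.finditer(seq.upper())]

def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

if __name__ == "__main__":
    print(find_motif("TATANNNATNA", "GGTATACGTATCAGG"))  # made-up sequence
    print(find_motif("WNYANTGAW", "ACCAGTGAT"))          # made-up sequence
    print(percent(35, 97), percent(91, 369))             # counts from the abstract
```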

Complex interplay among DNA modification, noncoding RNA expression and protein-coding RNA expression in Salvia miltiorrhiza chloroplast genome.

PubMed

Chen, Haimei; Zhang, Jianhui; Yuan, George; Liu, Chang

2014-01-01

Salvia miltiorrhiza is one of the most widely used medicinal plants. As a first step to develop a chloroplast-based genetic engineering method for the over-production of active components from S. miltiorrhiza, we have analyzed the genome, transcriptome, and base modifications of the S. miltiorrhiza chloroplast. Total genomic DNA and RNA were extracted from fresh leaves and then subjected to strand-specific RNA-Seq and Single-Molecule Real-Time (SMRT) sequencing analyses. Mapping the RNA-Seq reads to the genome assembly allowed us to determine the relative expression levels of 80 protein-coding genes. In addition, we identified 19 polycistronic transcription units and 136 putative antisense and intergenic noncoding RNA (ncRNA) genes. Comparison of the abundance of protein-coding transcripts (cRNA) with and without overlapping antisense ncRNAs (asRNA) suggests that the presence of asRNA is associated with increased cRNA abundance (p<0.05). Using the SMRT Portal software (v1.3.2), 2687 potential DNA modification sites and two potential DNA modification motifs were predicted. The two motifs include a TATA box-like motif (CPGDMM1, "TATANNNATNA"), and an unknown motif (CPGDMM2 "WNYANTGAW"). Specifically, 35 of the 97 CPGDMM1 motifs (36.1%) and 91 of the 369 CPGDMM2 motifs (24.7%) were found to be significantly modified (p<0.01). Analysis of genes downstream of the CPGDMM1 motif revealed a significantly increased abundance of ncRNA genes that are less than 400 bp away from the significantly modified CPGDMM1 motif (p<0.01). Taken together, the present study revealed a complex interplay among DNA modifications, ncRNA and cRNA expression in the chloroplast genome.
Profilin Is Required for Optimal Actin-Dependent Transcription of Respiratory Syncytial Virus Genome RNA

PubMed Central

Burke, Emily; Mahoney, Nicole M.; Almo, Steven C.; Barik, Sailen

2000-01-01

Transcription of human respiratory syncytial virus (RSV) genome RNA exhibited an obligatory need for the host cytoskeletal protein actin. Optimal transcription, however, required the participation of another cellular protein that was characterized as profilin by a number of criteria. The amino acid sequence of the protein, purified on the basis of its transcription-optimizing activity in vitro, exactly matched that of profilin. RSV transcription was inhibited 60 to 80% by antiprofilin antibody or poly-l-proline, molecules that specifically bind profilin. Native profilin, purified from extracts of lung epithelial cells by affinity binding to a poly-l-proline matrix, stimulated the actin-saturated RSV transcription by 2.5- to 3-fold. Recombinant profilin, expressed in bacteria, stimulated viral transcription as effectively as the native protein and was also inhibited by poly-l-proline. Profilin alone, in the absence of actin, did not activate viral transcription. It is estimated that at optimal levels of transcription, every molecule of viral genomic RNA associates with approximately the following number of protein molecules: 30 molecules of L, 120 molecules of phosphoprotein P, and 60 molecules each of actin and profilin. Together, these results demonstrated for the first time a cardinal role for profilin, an actin-modulatory protein, in the transcription of a paramyxovirus RNA genome. PMID:10623728
Resolving the problem of multiple accessions of the same transcript deposited across various public databases.

PubMed

Weirick, Tyler; John, David; Uchida, Shizuka

2017-03-01

Maintaining the consistency of genomic annotations is an increasingly complex task because of the iterative and dynamic nature of assembly and annotation, growing numbers of biological databases and insufficient integration of annotations across databases. As information exchange among databases is poor, a 'novel' sequence from one reference annotation could be annotated in another. Furthermore, relationships to nearby or overlapping annotated transcripts are even more complicated when using different genome assemblies. To better understand these problems, we surveyed current and previous versions of genomic assemblies and annotations across a number of public databases containing long noncoding RNA. We identified numerous discrepancies of transcripts regarding their genomic locations, transcript lengths and identifiers. Further investigation showed that the positional differences between reference annotations of essentially the same transcript could lead to differences in its measured expression at the RNA level. To aid in resolving these problems, we present the algorithm 'Universal Genomic Accession Hash (UGAHash)' and created an open source web tool to encourage the usage of the UGAHash algorithm. The UGAHash web tool (http://ugahash.uni-frankfurt.de) can be accessed freely without registration. The web tool allows researchers to generate Universal Genomic Accessions for genomic features or to explore annotations deposited in the public databases of the past and present versions. We anticipate that the UGAHash web tool will be a valuable tool to check for the existence of transcripts before judging the newly discovered transcripts as novel. © The Author 2016. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
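The abstract does not describe how UGAHash derives its accessions, so the following Python sketch is not the published algorithm. It only illustrates the general idea of a location-based identifier: hashing the assembly name, chromosome, strand and exon coordinates so that the same genomic feature receives the same key regardless of which database reports it. The function name `location_accession`, the `LOC` prefix, the key layout and the example coordinates are all hypothetical.

```python
import hashlib

def location_accession(assembly, chrom, strand, exons, prefix="LOC"):
    """Derive a deterministic identifier from a transcript's genomic location.

    exons is an iterable of (start, end) pairs; they are sorted first so the
    result does not depend on the order in which a database lists them.
    This is an illustrative scheme, not the UGAHash algorithm.
    """
    blocks = ",".join("%d-%d" % (start, end) for start, end in sorted(exons))
    key = "|".join([assembly, chrom, strand, blocks])
    digest = hashlib.sha256(key.encode("ascii")).hexdigest()
    return "%s_%s" % (prefix, digest[:16])

if __name__ == "__main__":
    a = location_accession("GRCh38", "chr1", "+", [(100, 200), (300, 450)])
    b = location_accession("GRCh38", "chr1", "+", [(300, 450), (100, 200)])
    c = location_accession("GRCh37", "chr1", "+", [(100, 200), (300, 450)])
    print(a == b, a == c)
```

Including the assembly name in the key reflects the point made in the abstract that coordinates are only comparable within one genome assembly.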
RNA polymerase II pausing can be retained or acquired during activation of genes involved in the epithelial to mesenchymal transition

PubMed Central

Samarakkody, Ann; Abbas, Ata; Scheidegger, Adam; Warns, Jessica; Nnoli, Oscar; Jokinen, Bradley; Zarns, Kris; Kubat, Brooke; Dhasarathy, Archana; Nechaev, Sergei

2015-01-01

Promoter-proximal RNA polymerase II (Pol II) pausing is implicated in the regulation of gene transcription. However, the mechanisms of pausing including its dynamics during transcriptional responses remain to be fully understood. We performed global analysis of short capped RNAs and Pol II Chromatin Immunoprecipitation sequencing in MCF-7 breast cancer cells to map Pol II pausing across the genome, and used permanganate footprinting to specifically follow pausing during transcriptional activation of several genes involved in the epithelial to mesenchymal transition (EMT). We find that the gene for EMT master regulator Snail (SNAI1), but not Slug (SNAI2), shows evidence of Pol II pausing before activation. Transcriptional activation of the paused SNAI1 gene is accompanied by a further increase in Pol II pausing signal, whereas activation of non-paused SNAI2 gene results in the acquisition of a typical pausing signature. The increase in pausing signal reflects increased transcription initiation without changes in Pol II pausing. Activation of the heat shock HSP70 gene involves pausing release that speeds up Pol II turnover, but does not change pausing location. We suggest that Pol II pausing is retained during transcriptional activation and can further undergo regulated release in a signal-specific manner. PMID:25820424
Isolation of a promoter region in mouse cytochrome P450 3A (Cyp3A16) gene and its transcriptional control.

PubMed

Itoh, S; Abe, Y; Kubo, A; Okuda, M; Shimoji, M; Nakayama, K; Kamataki, T

1997-02-07

An 11.5 kb fragment of the mouse Cyp3a16 gene containing the 5' flanking region was isolated from the lambda DASHII mouse genomic library. A part of the 5' flanking region and the first exon of the Cyp3a16 gene were sequenced. S1 mapping analysis showed the presence of two transcriptional initiation sites. The first exon was completely identical to Cyp3a16 cDNA. The identity of 5' flanking sequences between the Cyp3a16 and Cyp3a11 genes was about 69%. A typical TATA box and a basic transcription element (BTE) were found, as seen with other CYP3A genes from various animal species. Moreover, some putative transcriptional regulatory elements were also found in addition to the sequence motif seen for the formation of Z-type DNA. To examine the transcriptional activity of Cyp3a11 gene, DNA fragments in the 5'-flanking region of the gene were inserted in front of the luciferase structural gene, and the constructs were transfected in primary hepatocytes. The analysis of the luciferase activity indicated that the region between -146 and -56 was necessary for the transcription of the Cyp3a16 gene.
Genome-wide analysis of starch metabolism genes in potato (Solanum tuberosum L.).

PubMed

Van Harsselaar, Jessica K; Lorenz, Julia; Senning, Melanie; Sonnewald, Uwe; Sonnewald, Sophia

2017-01-05

Starch is the principal constituent of potato tubers and is of considerable importance for food and non-food applications. Its metabolism has been the subject of extensive research over the past decades. Despite its importance, a description of the complete inventory of genes involved in starch metabolism and their genome organization in potato plants is still missing. Moreover, mechanisms regulating the expression of starch genes in leaves and tubers remain elusive with regard to differences between transitory and storage starch metabolism, respectively. This study aimed at identifying and mapping the complete set of potato starch genes, and to study their expression pattern in leaves and tubers using different sets of transcriptome data. Moreover, we wanted to uncover transcription factors co-regulated with starch accumulation in tubers in order to get insight into the regulation of starch metabolism. We identified 77 genomic loci encoding enzymes involved in starch metabolism. Novel isoforms of many enzymes were found. Their analysis will help to elucidate mechanisms of starch biosynthesis and degradation. Expression analysis of starch genes led to the identification of tissue-specific isoenzymes suggesting differences in the transcriptional regulation of starch metabolism between potato leaf and tuber tissues. Selection of genes predominantly expressed in developing potato tubers and exhibiting an expression pattern indicative of a role in starch biosynthesis enabled the identification of possible transcriptional regulators of tuber starch biosynthesis by co-expression analysis. This study provides the annotation of the complete set of starch metabolic genes in potato plants and their genomic localizations. Novel, so far undescribed, enzyme isoforms were revealed. Comparative transcriptome analysis enabled the identification of tuber- and leaf-specific isoforms of starch genes. This finding suggests distinct regulatory mechanisms in transitory and storage starch metabolism. Putative regulatory proteins of starch biosynthesis in potato tubers have been identified by co-expression and their expression was verified by quantitative RT-PCR.
Transcriptomic Changes of Drought-Tolerant and Sensitive Banana Cultivars Exposed to Drought Stress

PubMed Central

Muthusamy, Muthusamy; Uma, Subbaraya; Backiyarani, Suthanthiram; Saraswathi, Marimuthu Somasundaram; Chandrasekar, Arumugam

2016-01-01

In banana, drought-responsive gene expression profiles of drought-tolerant and sensitive genotypes remain largely unexplored. In this research, the transcriptome of a drought-tolerant banana cultivar (Saba, ABB genome) and a sensitive cultivar (Grand Naine, AAA genome) was monitored using mRNA-Seq under control and drought stress conditions. A total of 162.36 million reads from tolerant and 126.58 million reads from sensitive libraries were produced and mapped onto the Musa acuminata genome sequence and assembled into 23,096 and 23,079 unigenes. Differential gene expression analysis between the two conditions (control and drought) identified at least 2268 and 2963 statistically significant, functionally known, non-redundant differentially expressed genes (DEGs) in the tolerant and sensitive libraries, respectively. Drought up-regulated 991 and 1378 DEGs and down-regulated 1104 and 1585 DEGs in the tolerant and sensitive libraries, respectively. Among DEGs, 15.9% code for transcription factors (TFs) comprising 46 families and 9.5% of DEGs are protein kinases from 82 families. The most enriched DEGs are mainly involved in protein modifications, lipid metabolism, alkaloid biosynthesis, carbohydrate degradation, glycan metabolism, and biosynthesis of amino acid, cofactor, nucleotide-sugar, hormone, terpenoids and other secondary metabolites. Several specific genotype-dependent gene expression patterns were observed for drought stress in both cultivars. A subset of 9 DEGs was confirmed using quantitative reverse transcription-PCR. These results will provide necessary information for developing drought-resilient banana plants. PMID:27867388
GETPrime 2.0: gene- and transcript-specific qPCR primers for 13 species including polymorphisms.

PubMed

David, Fabrice P A; Rougemont, Jacques; Deplancke, Bart

2017-01-04

GETPrime (http://bbcftools.epfl.ch/getprime) is a database with a web frontend providing gene- and transcript-specific, pre-computed qPCR primer pairs. The primers have been optimized for genome-wide specificity and for allowing the selective amplification of one or several splice variants of most known genes. To ease selection, primers have also been ranked according to defined criteria such as genome-wide specificity (with BLAST), amplicon size, and isoform coverage. Here, we report a major upgrade (2.0) of the database: eight new species (yeast, chicken, macaque, chimpanzee, rat, platypus, pufferfish, and Anolis carolinensis) now complement the five already included in the previous version (human, mouse, zebrafish, fly, and worm). Furthermore, the genomic reference has been updated to Ensembl v81 (while keeping earlier versions for backward compatibility) as a result of re-designing the back-end database and automating the import of relevant sections of the Ensembl database in species-independent fashion. This also allowed us to map known polymorphisms to the primers (on average three per primer for human), with the aim of reducing experimental error when targeting specific strains or individuals. Another consequence is that the inclusion of future Ensembl releases and other species has now become a relatively straightforward task. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Decoding the regulatory landscape of melanoma reveals TEADS as regulators of the invasive cell state

PubMed Central

Verfaillie, Annelien; Imrichova, Hana; Atak, Zeynep Kalender; Dewaele, Michael; Rambow, Florian; Hulselmans, Gert; Christiaens, Valerie; Svetlichnyy, Dmitry; Luciani, Flavie; Van den Mooter, Laura; Claerhout, Sofie; Fiers, Mark; Journe, Fabrice; Ghanem, Ghanem-Elias; Herrmann, Carl; Halder, Georg; Marine, Jean-Christophe; Aerts, Stein

2015-01-01

Transcriptional reprogramming of proliferative melanoma cells into a phenotypically distinct invasive cell subpopulation is a critical event at the origin of metastatic spreading. Here we generate transcriptome, open chromatin and histone modification maps of melanoma cultures; and integrate this data with existing transcriptome and DNA methylation profiles from tumour biopsies to gain insight into the mechanisms underlying this key reprogramming event. This shows thousands of genomic regulatory regions underlying the proliferative and invasive states, identifying SOX10/MITF and AP-1/TEAD as regulators, respectively. Knockdown of TEADs shows a previously unrecognized role in the invasive gene network and establishes a causative link between these transcription factors, cell invasion and sensitivity to MAPK inhibitors. Using regulatory landscapes and in silico analysis, we show that transcriptional reprogramming underlies the distinct cellular states present in melanoma. Furthermore, it reveals an essential role for the TEADs, linking it to clinically relevant mechanisms such as invasion and resistance. PMID:25865119
The Zur regulon of Corynebacterium glutamicum ATCC 13032

PubMed Central

2010-01-01

Background Zinc is considered as an essential element for all living organisms, but it can be toxic at large concentrations. Bacteria therefore tightly regulate zinc metabolism. The Cg2502 protein of Corynebacterium glutamicum was a candidate to control zinc metabolism in this species, since it was classified as metalloregulator of the zinc uptake regulator (Zur) subgroup of the ferric uptake regulator (Fur) family of DNA-binding transcription regulators. Results The cg2502 (zur) gene was deleted in the chromosome of C. glutamicum ATCC 13032 by an allelic exchange procedure to generate the zur-deficient mutant C. glutamicum JS2502. Whole-genome DNA microarray hybridizations and real-time RT-PCR assays comparing the gene expression in C. glutamicum JS2502 with that of the wild-type strain detected 18 genes with enhanced expression in the zur mutant. The expression data were combined with results from cross-genome comparisons of shared regulatory sites, revealing the presence of candidate Zur-binding sites in the mapped promoter regions of five transcription units encoding components of potential zinc ABC-type transporters (cg0041-cg0042/cg0043; cg2911-cg2912-cg2913), a putative secreted protein (cg0040), a putative oxidoreductase (cg0795), and a putative P-loop GTPase of the COG0523 protein family (cg0794). Enhanced transcript levels of the respective genes in C. glutamicum JS2502 were verified by real-time RT-PCR, and complementation of the mutant with a wild-type zur gene reversed the effect of differential gene expression. The zinc-dependent expression of the putative cg0042 and cg2911 operons was detected in vivo with a gfp reporter system. Moreover, the zinc-dependent binding of purified Zur protein to double-stranded 40-mer oligonucleotides containing candidate Zur-binding sites was demonstrated in vitro by DNA band shift assays. 
Conclusion Whole-genome expression profiling and DNA band shift assays demonstrated that Zur directly represses in a zinc-dependent manner the expression of nine genes organized in five transcription units. Accordingly, the Zur (Cg2502) protein is the key transcription regulator for genes involved in zinc homeostasis in C. glutamicum. PMID:20055984
Genome-wide identification and characterisation of human DNA replication origins by initiation site sequencing (ini-seq)

PubMed Central

Langley, Alexander R.; Gräf, Stefan; Smith, James C.; Krude, Torsten

2016-01-01

Next-generation sequencing has enabled the genome-wide identification of human DNA replication origins. However, different approaches to mapping replication origins, namely (i) sequencing isolated small nascent DNA strands (SNS-seq); (ii) sequencing replication bubbles (bubble-seq) and (iii) sequencing Okazaki fragments (OK-seq), show only limited concordance. To address this controversy, we describe here an independent high-resolution origin mapping technique that we call initiation site sequencing (ini-seq). In this approach, newly replicated DNA is directly labelled with digoxigenin-dUTP near the sites of its initiation in a cell-free system. The labelled DNA is then immunoprecipitated and genomic locations are determined by DNA sequencing. Using this technique we identify >25,000 discrete origin sites at sub-kilobase resolution on the human genome, with high concordance between biological replicates. Most activated origins identified by ini-seq are found at transcriptional start sites and contain G-quadruplex (G4) motifs. They tend to cluster in early-replicating domains, providing a correlation between early replication timing and local density of activated origins. Origins identified by ini-seq show highest concordance with sites identified by SNS-seq, followed by OK-seq and bubble-seq. Furthermore, germline origins identified by positive nucleotide distribution skew jumps overlap with origins identified by ini-seq and OK-seq more frequently and more specifically than do sites identified by either SNS-seq or bubble-seq. PMID:27587586
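Concordance between origin maps of this kind is commonly summarised as the fraction of sites in one set that overlap at least one site in another. The abstract does not state how concordance was computed, so the Python sketch below is only one plausible way to do it; the function name `fraction_overlapping`, the half-open `(chrom, start, end)` interval convention and the toy coordinates are assumptions.

```python
from bisect import bisect_left
from collections import defaultdict

def fraction_overlapping(query, reference):
    """Fraction of query intervals overlapping at least one reference interval.

    Intervals are (chrom, start, end) tuples with half-open coordinates,
    so intervals that merely touch are not counted as overlapping.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in reference:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        prefix_max_end, running = [], float("-inf")
        for _, e in intervals:
            running = max(running, e)
            prefix_max_end.append(running)  # furthest end seen so far
        index[chrom] = (starts, prefix_max_end)

    hits = 0
    for chrom, start, end in query:
        if chrom not in index:
            continue
        starts, prefix_max_end = index[chrom]
        i = bisect_left(starts, end)  # references with start < end are [0, i)
        if i > 0 and prefix_max_end[i - 1] > start:
            hits += 1
    return hits / len(query) if query else 0.0

if __name__ == "__main__":
    set_a = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]  # toy data
    set_b = [("chr1", 150, 300), ("chr2", 80, 120)]                     # toy data
    print(fraction_overlapping(set_a, set_b), fraction_overlapping(set_b, set_a))
```

The measure is asymmetric, which is why comparisons such as ini-seq versus SNS-seq can give different values depending on which set is treated as the query.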
Genome-wide identification and characterisation of human DNA replication origins by initiation site sequencing (ini-seq).

PubMed

Langley, Alexander R; Gräf, Stefan; Smith, James C; Krude, Torsten

2016-12-01

Next-generation sequencing has enabled the genome-wide identification of human DNA replication origins. However, different approaches to mapping replication origins, namely (i) sequencing isolated small nascent DNA strands (SNS-seq); (ii) sequencing replication bubbles (bubble-seq) and (iii) sequencing Okazaki fragments (OK-seq), show only limited concordance. To address this controversy, we describe here an independent high-resolution origin mapping technique that we call initiation site sequencing (ini-seq). In this approach, newly replicated DNA is directly labelled with digoxigenin-dUTP near the sites of its initiation in a cell-free system. The labelled DNA is then immunoprecipitated and genomic locations are determined by DNA sequencing. Using this technique we identify >25,000 discrete origin sites at sub-kilobase resolution on the human genome, with high concordance between biological replicates. Most activated origins identified by ini-seq are found at transcriptional start sites and contain G-quadruplex (G4) motifs. They tend to cluster in early-replicating domains, providing a correlation between early replication timing and local density of activated origins. Origins identified by ini-seq show highest concordance with sites identified by SNS-seq, followed by OK-seq and bubble-seq. Furthermore, germline origins identified by positive nucleotide distribution skew jumps overlap with origins identified by ini-seq and OK-seq more frequently and more specifically than do sites identified by either SNS-seq or bubble-seq. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Mapping the Shh long-range regulatory domain

PubMed Central

Anderson, Eve; Devenney, Paul S.; Hill, Robert E.; Lettice, Laura A.

2014-01-01

Coordinated gene expression controlled by long-distance enhancers is orchestrated by DNA regulatory sequences involving transcription factors and layers of control mechanisms. The Shh gene and well-established regulators are an example of genomic composition in which enhancers reside in a large desert extending into neighbouring genes to control the spatiotemporal pattern of expression. Exploiting the local hopping activity of the Sleeping Beauty transposon, the lacZ reporter gene was dispersed throughout the Shh region to systematically map the genomic features responsible for expression activity. We found that enhancer activities are retained inside a genomic region that corresponds to the topological associated domain (TAD) defined by Hi-C. This domain of approximately 900 kb is in an open conformation over its length and is generally susceptible to all Shh enhancers. Similar to the distal enhancers, an enhancer residing within the Shh second intron activates the reporter gene located at distances of hundreds of kilobases away, suggesting that both proximal and distal enhancers have the capacity to survey the Shh topological domain to recognise potential promoters. The widely expressed Rnf32 gene lying within the Shh domain evades enhancer activities by a process that may be common among other housekeeping genes that reside in large regulatory domains. Finally, the boundaries of the Shh TAD do not represent the absolute expression limits of enhancer activity, as expression activity is lost stepwise at a number of genomic positions at the verges of these domains. PMID:25252942
CoryneRegNet: an ontology-based data warehouse of corynebacterial transcription factors and regulatory networks.

PubMed

Baumbach, Jan; Brinkrolf, Karina; Czaja, Lisa F; Rahmann, Sven; Tauch, Andreas

2006-02-14

The application of DNA microarray technology in post-genomic analysis of bacterial genome sequences has allowed the generation of huge amounts of data related to regulatory networks. This data along with literature-derived knowledge on regulation of gene expression has opened the way for genome-wide reconstruction of transcriptional regulatory networks. These large-scale reconstructions can be converted into in silico models of bacterial cells that allow a systematic analysis of network behavior in response to changing environmental conditions. CoryneRegNet was designed to facilitate the genome-wide reconstruction of transcriptional regulatory networks of corynebacteria relevant in biotechnology and human medicine. During the import and integration process of data derived from experimental studies or literature knowledge CoryneRegNet generates links to genome annotations, to identified transcription factors and to the corresponding cis-regulatory elements. CoryneRegNet is based on a multi-layered, hierarchical and modular concept of transcriptional regulation and was implemented by using the relational database management system MySQL and an ontology-based data structure. Reconstructed regulatory networks can be visualized by using the yFiles JAVA graph library. As an application example of CoryneRegNet, we have reconstructed the global transcriptional regulation of a cellular module involved in SOS and stress response of corynebacteria. CoryneRegNet is an ontology-based data warehouse that allows a pertinent data management of regulatory interactions along with the genome-scale reconstruction of transcriptional regulatory networks. These models can further be combined with metabolic networks to build integrated models of cellular function including both metabolism and its transcriptional regulation.
Platinum coat color in red fox (Vulpes vulpes) is caused by a mutation in an autosomal copy of KIT.

PubMed

Johnson, J L; Kozysa, A; Kharlamova, A V; Gulevich, R G; Perelman, P L; Fong, H W F; Vladimirova, A V; Oskina, I N; Trut, L N; Kukekova, A V

2015-04-01

The red fox (Vulpes vulpes) demonstrates a variety of coat colors including platinum, a common phenotype maintained in farm-bred fox populations. Foxes heterozygous for the platinum allele have a light silver coat and extensive white spotting, whereas homozygosity is embryonic lethal. Two KIT transcripts were identified in skin cDNA from platinum foxes. The long transcript was identical to the KIT transcript of silver foxes, whereas the short transcript, which lacks exon 17, was specific to platinum. The KIT gene has several copies in the fox genome: an autosomal copy on chromosome 2 and additional copies on the B chromosomes. To identify the platinum-specific KIT sequence, the genomes of one platinum and one silver fox were sequenced. A single nucleotide polymorphism (SNP) was identified at the first nucleotide of KIT intron 17 in the platinum fox. In platinum foxes, the A allele of the SNP disrupts the donor splice site and causes exon 17, which is part of a segment that encodes a conserved tyrosine kinase domain, to be skipped. Complete cosegregation of the A allele with the platinum phenotype was confirmed by linkage mapping (LOD 25.59). All genotyped farm-bred platinum foxes from Russia and the US were heterozygous for the SNP (A/G), whereas foxes with different coat colors were homozygous for the G allele. Identification of the platinum mutation suggests that other fox white-spotting phenotypes, which are allelic to platinum, would also be caused by mutations in the KIT gene. © 2015 Stichting International Foundation for Animal Genetics.
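The effect of the platinum SNP on the donor splice site can be sketched as follows. The sketch assumes only that a canonical intron begins with the dinucleotide GT; the intron sequence used is a hypothetical placeholder, not the fox KIT intron 17 sequence, and the helper names are invented for illustration.

```python
def has_canonical_donor(intron_seq):
    """True if the intron starts with the canonical GT donor dinucleotide."""
    return intron_seq[:2].upper() == "GT"


def apply_snp(seq, pos, alt):
    """Return a copy of seq with the base at 0-based position pos set to alt."""
    return seq[:pos] + alt + seq[pos + 1:]


# Hypothetical start of an intron carrying the G allele (placeholder sequence).
g_allele_intron = "GTAAGTCC"
# The A allele replaces the first intron nucleotide, as described for platinum.
a_allele_intron = apply_snp(g_allele_intron, 0, "A")

print(has_canonical_donor(g_allele_intron))  # donor dinucleotide intact
print(has_canonical_donor(a_allele_intron))  # donor dinucleotide disrupted
```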
Intrinsic DNA curvature in trypanosomes.

PubMed

Smircich, Pablo; El-Sayed, Najib M; Garat, Beatriz

2017-11-09

Trypanosoma cruzi and Trypanosoma brucei are protozoan parasites causing Chagas disease and African sleeping sickness, displaying unique features of cellular and molecular biology. Remarkably, no canonical signals for RNA polymerase II promoters, which drive transcription of protein-coding genes, have been identified so far. The secondary structure of DNA has long been recognized as a signal in biological processes and, more recently, its involvement in transcription initiation in Leishmania was proposed. In order to study whether this feature is conserved in trypanosomatids, we undertook a genome-wide search for intrinsic DNA curvature in T. cruzi and T. brucei. Using a region integrated intrinsic curvature (RIIC) scoring that we previously developed, a non-random distribution of sequence-dependent curvature was observed. High RIIC scores were found to be significantly correlated with transcription start sites in T. cruzi, which have been mapped in divergent switch regions, whereas in T. brucei, the high RIIC scores correlated with sites that have been involved not only in RNA polymerase II initiation but also in termination. In addition, we observed regions with high RIIC scores presenting in-phase tracts of adenines in the subtelomeric regions of the T. brucei chromosomes that harbor the variable surface glycoprotein genes. In both T. cruzi and T. brucei genomes, a link between DNA conformational signals and gene expression was found. High sequence-dependent curvature is associated with transcriptional regulatory regions. High intrinsic curvature also occurs at the T. brucei chromosome subtelomeric regions where the recombination processes involved in the evasion of the host immune system take place. These findings underscore the relevance of indirect DNA readout in these ancient eukaryotes.
Genome investigation suggests MdSHN3, an APETALA2-domain transcription factor gene, to be a positive regulator of apple fruit cuticle formation and an inhibitor of russet development

PubMed Central

Lashbrooke, Justin; Aharoni, Asaph; Costa, Fabrizio

2015-01-01

The outer epidermal layer of apple fruit is covered by a protective cuticle. Composed of a polymerized cutin matrix embedded with waxes, the cuticle is a natural waterproof barrier and protects against several abiotic and biotic stresses. In terms of apple production, the cuticle is essential to maintain long post-harvest storage, while severe failure of the cuticle can result in the formation of a disorder known as russet. Apple russet results from micro-cracking of the cuticle and the formation of a corky suberized layer. This is typically an undesirable consumer trait, and negatively impacts the post-harvest storage of apples. In order to identify genetic factors controlling cuticle biosynthesis (and thus preventing russet) in apple, a quantitative trait locus (QTL) mapping survey was performed on a full-sib population. Two genomic regions located on chromosomes 2 and 15 that could be associated with russeting were identified. Apples with compromised cuticles were identified through a novel and high-throughput tensile analysis of the skin, while histological analysis confirmed cuticle failure in a subset of the progeny. Additional genomic investigation of the determined QTL regions identified a set of underlying genes involved in cuticle biosynthesis. Candidate gene expression profiling by quantitative real-time PCR on a subset of the progeny highlighted the specific expression pattern of a SHN1/WIN1 transcription factor gene (termed MdSHN3) on chromosome 15. Orthologues of SHN1/WIN1 have been previously shown to regulate cuticle formation in Arabidopsis, tomato, and barley. The MdSHN3 transcription factor gene displayed extremely low expression in lines with improper cuticle formation, suggesting it to be a fundamental regulator of cuticle biosynthesis in apple fruit. PMID:26220084
Grape RNA-Seq analysis pipeline environment

PubMed Central

Knowles, David G.; Röder, Maik; Merkel, Angelika; Guigó, Roderic

2013-01-01

Motivation: The avalanche of data arriving since the development of NGS technologies has prompted the need for fast, accurate and easily automated bioinformatic tools capable of dealing with massive datasets. Among the most productive applications of NGS technologies is the sequencing of cellular RNA, known as RNA-Seq. Although RNA-Seq provides a dynamic range similar or superior to that of microarrays at similar or lower cost, the lack of standard and user-friendly pipelines is a bottleneck preventing RNA-Seq from becoming the standard for transcriptome analysis. Results: In this work we present a pipeline for processing and analyzing RNA-Seq data that we have named Grape (Grape RNA-Seq Analysis Pipeline Environment). Grape supports raw sequencing reads produced by a variety of technologies, either in FASTA or FASTQ format, or as prealigned reads in SAM/BAM format. A minimal Grape configuration consists of the file location of the raw sequencing reads, the genome of the species and the corresponding gene and transcript annotation. Grape first runs a set of quality control steps, and then aligns the reads to the genome, a step that is omitted for prealigned read formats. Grape next estimates gene and transcript expression levels, calculates exon inclusion levels and identifies novel transcripts. Grape can be run on a single computer or in parallel on a computer cluster. It is distributed with specific mapping and quantification tools, but given its modular design, any tool supporting popular data interchange formats can be integrated. Availability: Grape can be obtained from the Bioinformatics and Genomics website at: http://big.crg.cat/services/grape. Contact: david.gonzalez@crg.eu or roderic.guigo@crg.eu PMID:23329413
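The step ordering described in this abstract can be sketched as a small planning function. This is not Grape's actual configuration format or API; the class, field and step names are hypothetical, and the only behaviour modelled is the one stated above: three required inputs, and an alignment step that is skipped for prealigned SAM/BAM input.

```python
from dataclasses import dataclass
from pathlib import PurePath

# File suffixes treated as prealigned reads (an assumption for this sketch).
PREALIGNED_SUFFIXES = {".sam", ".bam"}


@dataclass
class PipelineConfig:
    """Hypothetical minimal configuration: reads, genome and annotation paths."""
    reads: str
    genome: str
    annotation: str


def plan_steps(config):
    """Return the ordered list of steps implied by the abstract."""
    steps = ["quality_control"]
    # Alignment is omitted when the reads are already aligned.
    if PurePath(config.reads).suffix.lower() not in PREALIGNED_SUFFIXES:
        steps.append("align_to_genome")
    steps += ["estimate_expression", "exon_inclusion", "novel_transcripts"]
    return steps


raw_plan = plan_steps(PipelineConfig("sample.fastq", "genome.fa", "genes.gtf"))
bam_plan = plan_steps(PipelineConfig("sample.bam", "genome.fa", "genes.gtf"))
print(raw_plan)
print(bam_plan)
```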
Large-scale transcriptome characterization and mass discovery of SNPs in globe artichoke and its related taxa.

PubMed

Scaglione, Davide; Lanteri, Sergio; Acquadro, Alberto; Lai, Zhao; Knapp, Steven J; Rieseberg, Loren; Portis, Ezio

2012-10-01

Cynara cardunculus (2n = 2× = 34) is a member of the Asteraceae family that contributes significantly to the agricultural economy of the Mediterranean basin. The species includes two cultivated varieties, globe artichoke and cardoon, which are grown mainly for food. Cynara cardunculus is an orphan crop species whose genome/transcriptome has been relatively unexplored, especially in comparison to other Asteraceae crops. Hence, there is a significant need to improve its genomic resources through the identification of novel genes and sequence-based markers, to design new breeding schemes aimed at increasing quality and crop productivity. We report the outcome of cDNA sequencing and assembly for eleven accessions of C. cardunculus. Sequencing of three mapping parental genotypes using Roche 454-Titanium technology generated 1.7 × 10⁶ reads, which were assembled into 38,726 reference transcripts covering 32 Mbp. Putative enzyme-encoding genes were annotated using the KEGG-database. Transcription factors and candidate resistance genes were surveyed as well. Paired-end sequencing was done for cDNA libraries of eight other representative C. cardunculus accessions on an Illumina Genome Analyzer IIx, generating 46 × 10⁶ reads. Alignment of the IGA and 454 reads to reference transcripts led to the identification of 195,400 SNPs with a Bayesian probability exceeding 95%; a validation rate of 90% was obtained by Sanger-sequencing of a subset of contigs. These results demonstrate that the integration of data from different NGS platforms enables large-scale transcriptome characterization, along with massive SNP discovery. This information will contribute to the dissection of key agricultural traits in C. cardunculus and facilitate the implementation of marker-assisted selection programs. © 2012 The Authors. Plant Biotechnology Journal © 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd.
High-throughput interpretation of gene structure changes in human and nonhuman resequencing data, using ACE

PubMed Central

Majoros, William H.; Campbell, Michael S.; Holt, Carson; DeNardo, Erin K.; Ware, Doreen; Allen, Andrew S.; Yandell, Mark; Reddy, Timothy E.

2017-01-01

Motivation: The accurate interpretation of genetic variants is critical for characterizing genotype–phenotype associations. Because the effects of genetic variants can depend strongly on their local genomic context, accurate genome annotations are essential. Furthermore, as some variants have the potential to disrupt or alter gene structure, variant interpretation efforts stand to gain from the use of individualized annotations that account for differences in gene structure between individuals or strains. Results: We describe a suite of software tools for identifying possible functional changes in gene structure that may result from sequence variants. ACE (‘Assessing Changes to Exons’) converts phased genotype calls to a collection of explicit haplotype sequences, maps transcript annotations onto them, detects gene-structure changes and their possible repercussions, and identifies several classes of possible loss of function. Novel transcripts predicted by ACE are commonly supported by spliced RNA-seq reads, and can be used to improve read alignment and transcript quantification when an individual-specific genome sequence is available. Using publicly available RNA-seq data, we show that ACE predictions confirm earlier results regarding the quantitative effects of nonsense-mediated decay, and we show that predicted loss-of-function events are highly concordant with patterns of intolerance to mutations across the human population. ACE can be readily applied to diverse species including animals and plants, making it a broadly useful tool for use in eukaryotic population-based resequencing projects, particularly for assessing the joint impact of all variants at a locus. Availability and Implementation: ACE is written in open-source C++ and Perl and is available from geneprediction.org/ACE. Contact: myandell@genetics.utah.edu or tim.reddy@duke.edu. Supplementary information: Supplementary information is available at Bioinformatics online. PMID:28011790
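The first step named in this abstract, converting phased genotype calls into explicit haplotype sequences, can be sketched for the simplest case. This is not ACE's implementation: the function name and input layout are hypothetical, only single-nucleotide substitutions are handled (indels would shift coordinates), and the reference sequence and variants are made up.

```python
def build_haplotypes(reference, variants):
    """Apply phased SNVs to a reference and return the two haplotype strings.

    Each variant is (pos, ref, alt, genotype), where pos is 0-based and
    genotype is a phased call such as "0|1" (first haplotype | second).
    """
    haplotypes = [list(reference), list(reference)]
    for pos, ref, alt, genotype in variants:
        if reference[pos] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        for index, allele in enumerate(genotype.split("|")):
            if allele == "1":
                haplotypes[index][pos] = alt
    return "".join(haplotypes[0]), "".join(haplotypes[1])


# Made-up reference and phased calls, for illustration only.
reference = "ATGGTAAGT"
variants = [
    (3, "G", "A", "0|1"),  # alternate allele on the second haplotype only
    (6, "A", "C", "1|1"),  # alternate allele on both haplotypes
]

hap1, hap2 = build_haplotypes(reference, variants)
print(hap1)
print(hap2)
```

Keeping the two haplotypes as separate sequences is what allows all variants at a locus to be assessed jointly, since each haplotype carries its own combination of alleles.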
