genomes regulatory code: Topics by Science.gov

Sample records for genomes regulatory code

Deciphering the transcriptional cis-regulatory code.

PubMed

Yáñez-Cuna, J Omar; Kvon, Evgeny Z; Stark, Alexander

2013-01-01

Information about developmental gene expression resides in defined regulatory elements, called enhancers, in the non-coding part of the genome. Although cells reliably utilize enhancers to orchestrate gene expression, a cis-regulatory code that would allow their interpretation has remained one of the greatest challenges of modern biology. In this review, we summarize studies from the past three decades that describe progress towards revealing the properties of enhancers and discuss how recent approaches are providing unprecedented insights into regulatory elements in animal genomes. Over the next years, we believe that the functional characterization of regulatory sequences in entire genomes, combined with recent computational methods, will provide a comprehensive view of genomic regulatory elements and their building blocks and will enable researchers to begin to understand the sequence basis of the cis-regulatory code. Copyright © 2012 Elsevier Ltd. All rights reserved.
Cis-regulatory somatic mutations and gene-expression alteration in B-cell lymphomas.

PubMed

Mathelier, Anthony; Lefebvre, Calvin; Zhang, Allen W; Arenillas, David J; Ding, Jiarui; Wasserman, Wyeth W; Shah, Sohrab P

2015-04-23

With the rapid increase of whole-genome sequencing of human cancers, an important opportunity to analyze and characterize somatic mutations lying within cis-regulatory regions has emerged. A focus on protein-coding regions to identify nonsense or missense mutations disruptive to protein structure and/or function has led to important insights; however, the impact on gene expression of mutations lying within cis-regulatory regions remains under-explored. We analyzed somatic mutations from 84 matched tumor-normal whole genomes from B-cell lymphomas with accompanying gene expression measurements to elucidate the extent to which these cancers are disrupted by cis-regulatory mutations. We characterize mutations overlapping a high quality set of well-annotated transcription factor binding sites (TFBSs), covering a similar portion of the genome as protein-coding exons. Our results indicate that cis-regulatory mutations overlapping predicted TFBSs are enriched in promoter regions of genes involved in apoptosis or growth/proliferation. By integrating gene expression data with mutation data, our computational approach culminates with identification of cis-regulatory mutations most likely to participate in dysregulation of the gene expression program. The impact can be measured along with protein-coding mutations to highlight key mutations disrupting gene expression and pathways in cancer. Our study yields specific genes with disrupted expression triggered by genomic mutations in either the coding or the regulatory space. It implies that mutated regulatory components of the genome contribute substantially to cancer pathways. Our analyses demonstrate that identifying genomically altered cis-regulatory elements coupled with analysis of gene expression data will augment biological interpretation of mutational landscapes of cancers.
Identification and role of regulatory non-coding RNAs in Listeria monocytogenes.

PubMed

Izar, Benjamin; Mraheil, Mobarak Abu; Hain, Torsten

2011-01-01

Bacterial regulatory non-coding RNAs control numerous mRNA targets that direct a plethora of biological processes, such as the adaption to environmental changes, growth and virulence. Recently developed high-throughput techniques, such as genomic tiling arrays and RNA-Seq have allowed investigating prokaryotic cis- and trans-acting regulatory RNAs, including sRNAs, asRNAs, untranslated regions (UTR) and riboswitches. As a result, we obtained a more comprehensive view on the complexity and plasticity of the prokaryotic genome biology. Listeria monocytogenes was utilized as a model system for intracellular pathogenic bacteria in several studies, which revealed the presence of about 180 regulatory RNAs in the listerial genome. A regulatory role of non-coding RNAs in survival, virulence and adaptation mechanisms of L. monocytogenes was confirmed in subsequent experiments, thus, providing insight into a multifaceted modulatory function of RNA/mRNA interference. In this review, we discuss the identification of regulatory RNAs by high-throughput techniques and in their functional role in L. monocytogenes.
Regulatory variation: an emerging vantage point for cancer biology.

PubMed

Li, Luolan; Lorzadeh, Alireza; Hirst, Martin

2014-01-01

Transcriptional regulation involves complex and interdependent interactions of noncoding and coding regions of the genome with proteins that interact and modify them. Genetic variation/mutation in coding and noncoding regions of the genome can drive aberrant transcription and disease. In spite of accounting for nearly 98% of the genome comparatively little is known about the contribution of noncoding DNA elements to disease. Genome-wide association studies of complex human diseases including cancer have revealed enrichment for variants in the noncoding genome. A striking finding of recent cancer genome re-sequencing efforts has been the previously underappreciated frequency of mutations in epigenetic modifiers across a wide range of cancer types. Taken together these results point to the importance of dysregulation in transcriptional regulatory control in genesis of cancer. Powered by recent technological advancements in functional genomic profiling, exploration of normal and transformed regulatory networks will provide novel insight into the initiation and progression of cancer and open new windows to future prognostic and diagnostic tools. © 2013 Wiley Periodicals, Inc.
Genome-wide prediction of cis-regulatory regions using supervised deep learning methods.

PubMed

Li, Yifeng; Shi, Wenqiang; Wasserman, Wyeth W

2018-05-31

In the human genome, 98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA. In fact, non-coding regions host a variety of cis-regulatory regions which precisely control the expression of genes. Thus, Identifying active cis-regulatory regions in the human genome is critical for understanding gene regulation and assessing the impact of genetic variation on phenotype. The developments of high-throughput sequencing and machine learning technologies make it possible to predict cis-regulatory regions genome wide. Based on rich data resources such as the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM) projects, we introduce DECRES based on supervised deep learning approaches for the identification of enhancer and promoter regions in the human genome. Due to their ability to discover patterns in large and complex data, the introduction of deep learning methods enables a significant advance in our knowledge of the genomic locations of cis-regulatory regions. Using models for well-characterized cell lines, we identify key experimental features that contribute to the predictive performance. Applying DECRES, we delineate locations of 300,000 candidate enhancers genome wide (6.8% of the genome, of which 40,000 are supported by bidirectional transcription data), and 26,000 candidate promoters (0.6% of the genome). The predicted annotations of cis-regulatory regions will provide broad utility for genome interpretation from functional genomics to clinical applications. The DECRES model demonstrates potentials of deep learning technologies when combined with high-throughput sequencing data, and inspires the development of other advanced neural network models for further improvement of genome annotations.
cncRNAs: Bi-functional RNAs with protein coding and non-coding functions

PubMed Central

Kumari, Pooja; Sampath, Karuna

2015-01-01

For many decades, the major function of mRNA was thought to be to provide protein-coding information embedded in the genome. The advent of high-throughput sequencing has led to the discovery of pervasive transcription of eukaryotic genomes and opened the world of RNA-mediated gene regulation. Many regulatory RNAs have been found to be incapable of protein coding and are hence termed as non-coding RNAs (ncRNAs). However, studies in recent years have shown that several previously annotated non-coding RNAs have the potential to encode proteins, and conversely, some coding RNAs have regulatory functions independent of the protein they encode. Such bi-functional RNAs, with both protein coding and non-coding functions, which we term as ‘cncRNAs’, have emerged as new players in cellular systems. Here, we describe the functions of some cncRNAs identified from bacteria to humans. Because the functions of many RNAs across genomes remains unclear, we propose that RNAs be classified as coding, non-coding or both only after careful analysis of their functions. PMID:26498036
The primary transcriptome of the marine diazotroph Trichodesmium erythraeum IMS101

NASA Astrophysics Data System (ADS)

Pfreundt, Ulrike; Kopf, Matthias; Belkin, Natalia; Berman-Frank, Ilana; Hess, Wolfgang R.

2014-08-01

Blooms of the dinitrogen-fixing marine cyanobacterium Trichodesmium considerably contribute to new nitrogen inputs into tropical oceans. Intriguingly, only 60% of the Trichodesmium erythraeum IMS101 genome sequence codes for protein, compared with ~85% in other sequenced cyanobacterial genomes. The extensive non-coding genome fraction suggests space for an unusually high number of unidentified, potentially regulatory non-protein-coding RNAs (ncRNAs). To identify the transcribed fraction of the genome, here we present a genome-wide map of transcriptional start sites (TSS) at single nucleotide resolution, revealing the activity of 6,080 promoters. We demonstrate that T. erythraeum has the highest number of actively splicing group II introns and the highest percentage of TSS yielding ncRNAs of any bacterium examined to date. We identified a highly transcribed retroelement that serves as template repeat for the targeted mutation of at least 12 different genes by mutagenic homing. Our findings explain the non-coding portion of the T. erythraeum genome by the transcription of an unusually high number of non-coding transcripts in addition to the known high incidence of transposable elements. We conclude that riboregulation and RNA maturation-dependent processes constitute a major part of the Trichodesmium regulatory apparatus.
Exploring the read-write genome: mobile DNA and mammalian adaptation.

PubMed

Shapiro, James A

2017-02-01

The read-write genome idea predicts that mobile DNA elements will act in evolution to generate adaptive changes in organismal DNA. This prediction was examined in the context of mammalian adaptations involving regulatory non-coding RNAs, viviparous reproduction, early embryonic and stem cell development, the nervous system, and innate immunity. The evidence shows that mobile elements have played specific and sometimes major roles in mammalian adaptive evolution by generating regulatory sites in the DNA and providing interaction motifs in non-coding RNA. Endogenous retroviruses and retrotransposons have been the predominant mobile elements in mammalian adaptive evolution, with the notable exception of bats, where DNA transposons are the major agents of RW genome inscriptions. A few examples of independent but convergent exaptation of mobile DNA elements for similar regulatory rewiring functions are noted.
Reconstitution of wild type viral DNA in simian cells transfected with early and late SV40 defective genomes.

PubMed

O'Neill, F J; Gao, Y; Xu, X

1993-11-01

The DNAs of polyomaviruses ordinarily exist as a single circular molecule of approximately 5000 base pairs. Variants of SV40, BKV and JCV have been described which contain two complementing defective DNA molecules. These defectives, which form a bipartite genome structure, contain either the viral early region or the late region. The defectives have the unique property of being able to tolerate variable sized reiterations of regulatory and terminus region sequences, and portions of the coding region. They can also exchange coding region sequences with other polyomaviruses. It has been suggested that the bipartite genome structure might be a stage in the evolution of polyomaviruses which can uniquely sustain genome and sequence diversity. However, it is not known if the regulatory and terminus region sequences are highly mutable. Also, it is not known if the bipartite genome structure is reversible and what the conditions might be which would favor restoration of the monomolecular genome structure. We addressed the first question by sequencing the reiterated regulatory and terminus regions of E- and L-SV40 DNAs. This revealed a large number of mutations in the regulatory regions of the defective genomes, including deletions, insertions, rearrangements and base substitutions. We also detected insertions and base substitutions in the T-antigen gene. We addressed the second question by introducing into permissive simian cells, E- and L-SV40 genomes which had been engineered to contain only a single regulatory region. Analysis of viral DNA from transfected cells demonstrated recombined genomes containing a wild type monomolecular DNA structure. However, the complete defectives, containing reiterated regulatory regions, could often compete away the wild type genomes. The recombinant monomolecular genomes were isolated, cloned and found to be infectious. All of the DNA alterations identified in one of the regulatory regions of E-SV40 DNA were present in the recombinant monomolecular genomes. These and other findings indicate that the bipartite genome state can sustain many mutations which wtSV40 cannot directly sustain. However, the mutations can later be introduced into the wild type genomes when the E- and L-SV40 DNAs recombine to generate a new monomolecular genome structure.
Genomic identification of regulatory elements by evolutionary sequence comparison and functional analysis.

PubMed

Loots, Gabriela G

2008-01-01

Despite remarkable recent advances in genomics that have enabled us to identify most of the genes in the human genome, comparable efforts to define transcriptional cis-regulatory elements that control gene expression are lagging behind. The difficulty of this task stems from two equally important problems: our knowledge of how regulatory elements are encoded in genomes remains elementary, and there is a vast genomic search space for regulatory elements, since most of mammalian genomes are noncoding. Comparative genomic approaches are having a remarkable impact on the study of transcriptional regulation in eukaryotes and currently represent the most efficient and reliable methods of predicting noncoding sequences likely to control the patterns of gene expression. By subjecting eukaryotic genomic sequences to computational comparisons and subsequent experimentation, we are inching our way toward a more comprehensive catalog of common regulatory motifs that lie behind fundamental biological processes. We are still far from comprehending how the transcriptional regulatory code is encrypted in the human genome and providing an initial global view of regulatory gene networks, but collectively, the continued development of comparative and experimental approaches will rapidly expand our knowledge of the transcriptional regulome.
Decoding the genome beyond sequencing: the new phase of genomic research.

PubMed

Heng, Henry H Q; Liu, Guo; Stevens, Joshua B; Bremer, Steven W; Ye, Karen J; Abdallah, Batoul Y; Horne, Steven D; Ye, Christine J

2011-10-01

While our understanding of gene-based biology has greatly improved, it is clear that the function of the genome and most diseases cannot be fully explained by genes and other regulatory elements. Genes and the genome represent distinct levels of genetic organization with their own coding systems; Genes code parts like protein and RNA, but the genome codes the structure of genetic networks, which are defined by the whole set of genes, chromosomes and their topological interactions within a cell. Accordingly, the genetic code of DNA offers limited understanding of genome functions. In this perspective, we introduce the genome theory which calls for the departure of gene-centric genomic research. To make this transition for the next phase of genomic research, it is essential to acknowledge the importance of new genome-based biological concepts and to establish new technology platforms to decode the genome beyond sequencing. Copyright © 2011 Elsevier Inc. All rights reserved.
Decoding the complex genetic causes of heart diseases using systems biology.

PubMed

Djordjevic, Djordje; Deshpande, Vinita; Szczesnik, Tomasz; Yang, Andrian; Humphreys, David T; Giannoulatou, Eleni; Ho, Joshua W K

2015-03-01

The pace of disease gene discovery is still much slower than expected, even with the use of cost-effective DNA sequencing and genotyping technologies. It is increasingly clear that many inherited heart diseases have a more complex polygenic aetiology than previously thought. Understanding the role of gene-gene interactions, epigenetics, and non-coding regulatory regions is becoming increasingly critical in predicting the functional consequences of genetic mutations identified by genome-wide association studies and whole-genome or exome sequencing. A systems biology approach is now being widely employed to systematically discover genes that are involved in heart diseases in humans or relevant animal models through bioinformatics. The overarching premise is that the integration of high-quality causal gene regulatory networks (GRNs), genomics, epigenomics, transcriptomics and other genome-wide data will greatly accelerate the discovery of the complex genetic causes of congenital and complex heart diseases. This review summarises state-of-the-art genomic and bioinformatics techniques that are used in accelerating the pace of disease gene discovery in heart diseases. Accompanying this review, we provide an interactive web-resource for systems biology analysis of mammalian heart development and diseases, CardiacCode ( http://CardiacCode.victorchang.edu.au/ ). CardiacCode features a dataset of over 700 pieces of manually curated genetic or molecular perturbation data, which enables the inference of a cardiac-specific GRN of 280 regulatory relationships between 33 regulator genes and 129 target genes. We believe this growing resource will fill an urgent unmet need to fully realise the true potential of predictive and personalised genomic medicine in tackling human heart disease.
RNA-Seq Based Transcriptional Map of Bovine Respiratory Disease Pathogen “Histophilus somni 2336”

PubMed Central

Kumar, Ranjit; Lawrence, Mark L.; Watt, James; Cooksey, Amanda M.; Burgess, Shane C.; Nanduri, Bindu

2012-01-01

Genome structural annotation, i.e., identification and demarcation of the boundaries for all the functional elements in a genome (e.g., genes, non-coding RNAs, proteins and regulatory elements), is a prerequisite for systems level analysis. Current genome annotation programs do not identify all of the functional elements of the genome, especially small non-coding RNAs (sRNAs). Whole genome transcriptome analysis is a complementary method to identify “novel” genes, small RNAs, regulatory regions, and operon structures, thus improving the structural annotation in bacteria. In particular, the identification of non-coding RNAs has revealed their widespread occurrence and functional importance in gene regulation, stress and virulence. However, very little is known about non-coding transcripts in Histophilus somni, one of the causative agents of Bovine Respiratory Disease (BRD) as well as bovine infertility, abortion, septicemia, arthritis, myocarditis, and thrombotic meningoencephalitis. In this study, we report a single nucleotide resolution transcriptome map of H. somni strain 2336 using RNA-Seq method. The RNA-Seq based transcriptome map identified 94 sRNAs in the H. somni genome of which 82 sRNAs were never predicted or reported in earlier studies. We also identified 38 novel potential protein coding open reading frames that were absent in the current genome annotation. The transcriptome map allowed the identification of 278 operon (total 730 genes) structures in the genome. When compared with the genome sequence of a non-virulent strain 129Pt, a disproportionate number of sRNAs (∼30%) were located in genomic region unique to strain 2336 (∼18% of the total genome). This observation suggests that a number of the newly identified sRNAs in strain 2336 may be involved in strain-specific adaptations. PMID:22276113
RNA-seq based transcriptional map of bovine respiratory disease pathogen "Histophilus somni 2336".

PubMed

Kumar, Ranjit; Lawrence, Mark L; Watt, James; Cooksey, Amanda M; Burgess, Shane C; Nanduri, Bindu

2012-01-01

Genome structural annotation, i.e., identification and demarcation of the boundaries for all the functional elements in a genome (e.g., genes, non-coding RNAs, proteins and regulatory elements), is a prerequisite for systems level analysis. Current genome annotation programs do not identify all of the functional elements of the genome, especially small non-coding RNAs (sRNAs). Whole genome transcriptome analysis is a complementary method to identify "novel" genes, small RNAs, regulatory regions, and operon structures, thus improving the structural annotation in bacteria. In particular, the identification of non-coding RNAs has revealed their widespread occurrence and functional importance in gene regulation, stress and virulence. However, very little is known about non-coding transcripts in Histophilus somni, one of the causative agents of Bovine Respiratory Disease (BRD) as well as bovine infertility, abortion, septicemia, arthritis, myocarditis, and thrombotic meningoencephalitis. In this study, we report a single nucleotide resolution transcriptome map of H. somni strain 2336 using RNA-Seq method.The RNA-Seq based transcriptome map identified 94 sRNAs in the H. somni genome of which 82 sRNAs were never predicted or reported in earlier studies. We also identified 38 novel potential protein coding open reading frames that were absent in the current genome annotation. The transcriptome map allowed the identification of 278 operon (total 730 genes) structures in the genome. When compared with the genome sequence of a non-virulent strain 129Pt, a disproportionate number of sRNAs (∼30%) were located in genomic region unique to strain 2336 (∼18% of the total genome). This observation suggests that a number of the newly identified sRNAs in strain 2336 may be involved in strain-specific adaptations.
[The ENCODE project and functional genomics studies].

PubMed

Ding, Nan; Qu, Hongzhu; Fang, Xiangdong

2014-03-01

Upon the completion of the Human Genome Project, scientists have been trying to interpret the underlying genomic code for human biology. Since 2003, National Human Genome Research Institute (NHGRI) has invested nearly $0.3 billion and gathered over 440 scientists from more than 32 institutions in the United States, China, United Kingdom, Japan, Spain and Singapore to initiate the Encyclopedia of DNA Elements (ENCODE) project, aiming to identify and analyze all regulatory elements in the human genome. Taking advantage of the development of next-generation sequencing technologies and continuous improvement of experimental methods, ENCODE had made remarkable achievements: identified methylation and histone modification of DNA sequences and their regulatory effects on gene expression through altering chromatin structures, categorized binding sites of various transcription factors and constructed their regulatory networks, further revised and updated database for pseudogenes and non-coding RNA, and identified SNPs in regulatory sequences associated with diseases. These findings help to comprehensively understand information embedded in gene and genome sequences, the function of regulatory elements as well as the molecular mechanism underlying the transcriptional regulation by noncoding regions, and provide extensive data resource for life sciences, particularly for translational medicine. We re-viewed the contributions of high-throughput sequencing platform development and bioinformatical technology improve-ment to the ENCODE project, the association between epigenetics studies and the ENCODE project, and the major achievement of the ENCODE project. We also provided our prospective on the role of the ENCODE project in promoting the development of basic and clinical medicine.
Polymorphisms of 20 regulatory proteins between Mycobacterium tuberculosis and Mycobacterium bovis.

PubMed

Bigi, María M; Blanco, Federico Carlos; Araújo, Flabio R; Thacker, Tyler C; Zumárraga, Martín J; Cataldi, Angel A; Soria, Marcelo A; Bigi, Fabiana

2016-08-01

Mycobacterium tuberculosis and Mycobacterium bovis are responsible for tuberculosis in humans and animals, respectively. Both species are closely related and belong to the Mycobacterium tuberculosis complex (MTC). M. tuberculosis is the most ancient species from which M. bovis and other members of the MTC evolved. The genome of M. bovis is over >99.95% identical to that of M. tuberculosis but with seven deletions ranging in size from 1 to 12.7 kb. In addition, 1200 single nucleotide mutations in coding regions distinguish M. bovis from M. tuberculosis. In the present study, we assessed 75 M. tuberculosis genomes and 23 M. bovis genomes to identify non-synonymous mutations in 202 coding sequences of regulatory genes between both species. We identified species-specific variants in 20 regulatory proteins and confirmed differential expression of hypoxia-related genes between M. bovis and M. tuberculosis. © 2016 The Societies and John Wiley & Sons Australia, Ltd.
GREGOR: evaluating global enrichment of trait-associated variants in epigenomic features using a systematic, data-driven approach.

PubMed

Schmidt, Ellen M; Zhang, Ji; Zhou, Wei; Chen, Jin; Mohlke, Karen L; Chen, Y Eugene; Willer, Cristen J

2015-08-15

The majority of variation identified by genome wide association studies falls in non-coding genomic regions and is hypothesized to impact regulatory elements that modulate gene expression. Here we present a statistically rigorous software tool GREGOR (Genomic Regulatory Elements and Gwas Overlap algoRithm) for evaluating enrichment of any set of genetic variants with any set of regulatory features. Using variants from five phenotypes, we describe a data-driven approach to determine the tissue and cell types most relevant to a trait of interest and to identify the subset of regulatory features likely impacted by these variants. Last, we experimentally evaluate six predicted functional variants at six lipid-associated loci and demonstrate significant evidence for allele-specific impact on expression levels. GREGOR systematically evaluates enrichment of genetic variation with the vast collection of regulatory data available to explore novel biological mechanisms of disease and guide us toward the functional variant at trait-associated loci. GREGOR, including source code, documentation, examples, and executables, is available at http://genome.sph.umich.edu/wiki/GREGOR. cristen@umich.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Decoding the genome with an integrative analysis tool: combinatorial CRM Decoder.

PubMed

Kang, Keunsoo; Kim, Joomyeong; Chung, Jae Hoon; Lee, Daeyoup

2011-09-01

The identification of genome-wide cis-regulatory modules (CRMs) and characterization of their associated epigenetic features are fundamental steps toward the understanding of gene regulatory networks. Although integrative analysis of available genome-wide information can provide new biological insights, the lack of novel methodologies has become a major bottleneck. Here, we present a comprehensive analysis tool called combinatorial CRM decoder (CCD), which utilizes the publicly available information to identify and characterize genome-wide CRMs in a species of interest. CCD first defines a set of the epigenetic features which is significantly associated with a set of known CRMs as a code called 'trace code', and subsequently uses the trace code to pinpoint putative CRMs throughout the genome. Using 61 genome-wide data sets obtained from 17 independent mouse studies, CCD successfully catalogued ∼12 600 CRMs (five distinct classes) including polycomb repressive complex 2 target sites as well as imprinting control regions. Interestingly, we discovered that ∼4% of the identified CRMs belong to at least two different classes named 'multi-functional CRM', suggesting their functional importance for regulating spatiotemporal gene expression. From these examples, we show that CCD can be applied to any potential genome-wide datasets and therefore will shed light on unveiling genome-wide CRMs in various species.
CSTminer: a web tool for the identification of coding and noncoding conserved sequence tags through cross-species genome comparison

PubMed Central

Castrignanò, Tiziana; Canali, Alessandro; Grillo, Giorgio; Liuni, Sabino; Mignone, Flavio; Pesole, Graziano

2004-01-01

The identification and characterization of genome tracts that are highly conserved across species during evolution may contribute significantly to the functional annotation of whole-genome sequences. Indeed, such sequences are likely to correspond to known or unknown coding exons or regulatory motifs. Here, we present a web server implementing a previously developed algorithm that, by comparing user-submitted genome sequences, is able to identify statistically significant conserved blocks and assess their coding or noncoding nature through the measure of a coding potential score. The web tool, available at http://www.caspur.it/CSTminer/, is dynamically interconnected with the Ensembl genome resources and produces a graphical output showing a map of detected conserved sequences and annotated gene features. PMID:15215464
Regulatory sequence analysis tools.

PubMed

van Helden, Jacques

2003-07-01

The web resource Regulatory Sequence Analysis Tools (RSAT) (http://rsat.ulb.ac.be/rsat) offers a collection of software tools dedicated to the prediction of regulatory sites in non-coding DNA sequences. These tools include sequence retrieval, pattern discovery, pattern matching, genome-scale pattern matching, feature-map drawing, random sequence generation and other utilities. Alternative formats are supported for the representation of regulatory motifs (strings or position-specific scoring matrices) and several algorithms are proposed for pattern discovery. RSAT currently holds >100 fully sequenced genomes and these data are regularly updated from GenBank.

AP1 Keeps Chromatin Poised for Action | Center for Cancer Research

Cancer.gov

The human genome harbors gene-encoding DNA, the blueprint for building proteins that regulate cellular function. Embedded across the genome, in non-coding regions, are DNA elements to which regulatory factors bind. The interaction of regulatory factors with DNA at these sites modifies gene expression to modulate cell activity. In cells, DNA exists in a complex with proteins
Prevalence of transcription promoters within archaeal operons and coding sequences

PubMed Central

Koide, Tie; Reiss, David J; Bare, J Christopher; Pang, Wyming Lee; Facciotti, Marc T; Schmid, Amy K; Pan, Min; Marzolf, Bruz; Van, Phu T; Lo, Fang-Yin; Pratap, Abhishek; Deutsch, Eric W; Peterson, Amelia; Martin, Dan; Baliga, Nitin S

2009-01-01

Despite the knowledge of complex prokaryotic-transcription mechanisms, generalized rules, such as the simplified organization of genes into operons with well-defined promoters and terminators, have had a significant role in systems analysis of regulatory logic in both bacteria and archaea. Here, we have investigated the prevalence of alternate regulatory mechanisms through genome-wide characterization of transcript structures of ∼64% of all genes, including putative non-coding RNAs in Halobacterium salinarum NRC-1. Our integrative analysis of transcriptome dynamics and protein–DNA interaction data sets showed widespread environment-dependent modulation of operon architectures, transcription initiation and termination inside coding sequences, and extensive overlap in 3′ ends of transcripts for many convergently transcribed genes. A significant fraction of these alternate transcriptional events correlate to binding locations of 11 transcription factors and regulators (TFs) inside operons and annotated genes—events usually considered spurious or non-functional. Using experimental validation, we illustrate the prevalence of overlapping genomic signals in archaeal transcription, casting doubt on the general perception of rigid boundaries between coding sequences and regulatory elements. PMID:19536208
Prevalence of transcription promoters within archaeal operons and coding sequences.

PubMed

Koide, Tie; Reiss, David J; Bare, J Christopher; Pang, Wyming Lee; Facciotti, Marc T; Schmid, Amy K; Pan, Min; Marzolf, Bruz; Van, Phu T; Lo, Fang-Yin; Pratap, Abhishek; Deutsch, Eric W; Peterson, Amelia; Martin, Dan; Baliga, Nitin S

2009-01-01

Despite the knowledge of complex prokaryotic-transcription mechanisms, generalized rules, such as the simplified organization of genes into operons with well-defined promoters and terminators, have had a significant role in systems analysis of regulatory logic in both bacteria and archaea. Here, we have investigated the prevalence of alternate regulatory mechanisms through genome-wide characterization of transcript structures of approximately 64% of all genes, including putative non-coding RNAs in Halobacterium salinarum NRC-1. Our integrative analysis of transcriptome dynamics and protein-DNA interaction data sets showed widespread environment-dependent modulation of operon architectures, transcription initiation and termination inside coding sequences, and extensive overlap in 3' ends of transcripts for many convergently transcribed genes. A significant fraction of these alternate transcriptional events correlate to binding locations of 11 transcription factors and regulators (TFs) inside operons and annotated genes-events usually considered spurious or non-functional. Using experimental validation, we illustrate the prevalence of overlapping genomic signals in archaeal transcription, casting doubt on the general perception of rigid boundaries between coding sequences and regulatory elements.
On the Concept of Cis-regulatory Information: From Sequence Motifs to Logic Functions

NASA Astrophysics Data System (ADS)

Tarpine, Ryan; Istrail, Sorin

The regulatory genome is about the “system level organization of the core genomic regulatory apparatus, and how this is the locus of causality underlying the twin phenomena of animal development and animal evolution” (E.H. Davidson. The Regulatory Genome: Gene Regulatory Networks in Development and Evolution, Academic Press, 2006). Information processing in the regulatory genome is done through regulatory states, defined as sets of transcription factors (sequence-specific DNA binding proteins which determine gene expression) that are expressed and active at the same time. The core information processing machinery consists of modular DNA sequence elements, called cis-modules, that interact with transcription factors. The cis-modules “read” the information contained in the regulatory state of the cell through transcription factor binding, “process” it, and directly or indirectly communicate with the basal transcription apparatus to determine gene expression. This endowment of each gene with the information-receiving capacity through their cis-regulatory modules is essential for the response to every possible regulatory state to which it might be exposed during all phases of the life cycle and in all cell types. We present here a set of challenges addressed by our CYRENE research project aimed at studying the cis-regulatory code of the regulatory genome. The CYRENE Project is devoted to (1) the construction of a database, the cis-Lexicon, containing comprehensive information across species about experimentally validated cis-regulatory modules; and (2) the software development of a next-generation genome browser, the cis-Browser, specialized for the regulatory genome. The presentation is anchored on three main computational challenges: the Gene Naming Problem, the Consensus Sequence Bottleneck Problem, and the Logic Function Inference Problem.
Parallel evolution of chordate cis-regulatory code for development.

PubMed

Doglio, Laura; Goode, Debbie K; Pelleri, Maria C; Pauls, Stefan; Frabetti, Flavia; Shimeld, Sebastian M; Vavouri, Tanya; Elgar, Greg

2013-11-01

Urochordates are the closest relatives of vertebrates and at the larval stage, possess a characteristic bilateral chordate body plan. In vertebrates, the genes that orchestrate embryonic patterning are in part regulated by highly conserved non-coding elements (CNEs), yet these elements have not been identified in urochordate genomes. Consequently the evolution of the cis-regulatory code for urochordate development remains largely uncharacterised. Here, we use genome-wide comparisons between C. intestinalis and C. savignyi to identify putative urochordate cis-regulatory sequences. Ciona conserved non-coding elements (ciCNEs) are associated with largely the same key regulatory genes as vertebrate CNEs. Furthermore, some of the tested ciCNEs are able to activate reporter gene expression in both zebrafish and Ciona embryos, in a pattern that at least partially overlaps that of the gene they associate with, despite the absence of sequence identity. We also show that the ability of a ciCNE to up-regulate gene expression in vertebrate embryos can in some cases be localised to short sub-sequences, suggesting that functional cross-talk may be defined by small regions of ancestral regulatory logic, although functional sub-sequences may also be dispersed across the whole element. We conclude that the structure and organisation of cis-regulatory modules is very different between vertebrates and urochordates, reflecting their separate evolutionary histories. However, functional cross-talk still exists because the same repertoire of transcription factors has likely guided their parallel evolution, exploiting similar sets of binding sites but in different combinations.
Sheep genome functional annotation reveals proximal regulatory elements contributed to the evolution of modern breeds.

PubMed

Naval-Sanchez, Marina; Nguyen, Quan; McWilliam, Sean; Porto-Neto, Laercio R; Tellam, Ross; Vuocolo, Tony; Reverter, Antonio; Perez-Enciso, Miguel; Brauning, Rudiger; Clarke, Shannon; McCulloch, Alan; Zamani, Wahid; Naderi, Saeid; Rezaei, Hamid Reza; Pompanon, Francois; Taberlet, Pierre; Worley, Kim C; Gibbs, Richard A; Muzny, Donna M; Jhangiani, Shalini N; Cockett, Noelle; Daetwyler, Hans; Kijas, James

2018-02-28

Domestication fundamentally reshaped animal morphology, physiology and behaviour, offering the opportunity to investigate the molecular processes driving evolutionary change. Here we assess sheep domestication and artificial selection by comparing genome sequence from 43 modern breeds (Ovis aries) and their Asian mouflon ancestor (O. orientalis) to identify selection sweeps. Next, we provide a comparative functional annotation of the sheep genome, validated using experimental ChIP-Seq of sheep tissue. Using these annotations, we evaluate the impact of selection and domestication on regulatory sequences and find that sweeps are significantly enriched for protein coding genes, proximal regulatory elements of genes and genome features associated with active transcription. Finally, we find individual sites displaying strong allele frequency divergence are enriched for the same regulatory features. Our data demonstrate that remodelling of gene expression is likely to have been one of the evolutionary forces that drove phenotypic diversification of this common livestock species.
The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4).

PubMed

Huntemann, Marcel; Ivanova, Natalia N; Mavromatis, Konstantinos; Tripp, H James; Paez-Espino, David; Palaniappan, Krishnaveni; Szeto, Ernest; Pillay, Manoj; Chen, I-Min A; Pati, Amrita; Nielsen, Torben; Markowitz, Victor M; Kyrpides, Nikos C

2015-01-01

The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. Structural annotation is followed by assignment of protein product names and functions.
Open chromatin reveals the functional maize genome

USDA-ARS?s Scientific Manuscript database

Every cellular process mediated through nuclear DNA must contend with chromatin. As results from ENCODE show, open chromatin assays can efficiently integrate across diverse regulatory elements, revealing functional non-coding genome. In this study, we use a MNase hypersensitivity assay to discover o...
Contribution of transposable elements and distal enhancers to evolution of human-specific features of interphase chromatin architecture in embryonic stem cells.

PubMed

Glinsky, Gennadi V

2018-03-01

Transposable elements have made major evolutionary impacts on creation of primate-specific and human-specific genomic regulatory loci and species-specific genomic regulatory networks (GRNs). Molecular and genetic definitions of human-specific changes to GRNs contributing to development of unique to human phenotypes remain a highly significant challenge. Genome-wide proximity placement analysis of diverse families of human-specific genomic regulatory loci (HSGRL) identified topologically associating domains (TADs) that are significantly enriched for HSGRL and designated rapidly evolving in human TADs. Here, the analysis of HSGRL, hESC-enriched enhancers, super-enhancers (SEs), and specific sub-TAD structures termed super-enhancer domains (SEDs) has been performed. In the hESC genome, 331 of 504 (66%) of SED-harboring TADs contain HSGRL and 68% of SEDs co-localize with HSGRL, suggesting that emergence of HSGRL may have rewired SED-associated GRNs within specific TADs by inserting novel and/or erasing existing non-coding regulatory sequences. Consequently, markedly distinct features of the principal regulatory structures of interphase chromatin evolved in the hESC genome compared to mouse: the SED quantity is 3-fold higher and the median SED size is significantly larger. Concomitantly, the overall TAD quantity is increased by 42% while the median TAD size is significantly decreased (p = 9.11E-37) in the hESC genome. Present analyses illustrate a putative global role for transposable elements and HSGRL in shaping the human-specific features of the interphase chromatin organization and functions, which are facilitated by accelerated creation of novel transcription factor binding sites and new enhancers driven by targeted placement of HSGRL at defined genomic coordinates. A trend toward the convergence of TAD and SED architectures of interphase chromatin in the hESC genome may reflect changes of 3D-folding patterns of linear chromatin fibers designed to enhance both regulatory complexity and functional precision of GRNs by creating predominantly a single gene (or a set of functionally linked genes) per regulatory domain structures. Collectively, present analyses reveal critical evolutionary contributions of transposable elements and distal enhancers to creation of thousands primate- and human-specific elements of a chromatin folding code, which defines the 3D context of interphase chromatin both restricting and facilitating biological functions of GRNs.
QuIN: A Web Server for Querying and Visualizing Chromatin Interaction Networks.

PubMed

Thibodeau, Asa; Márquez, Eladio J; Luo, Oscar; Ruan, Yijun; Menghi, Francesca; Shin, Dong-Guk; Stitzel, Michael L; Vera-Licona, Paola; Ucar, Duygu

2016-06-01

Recent studies of the human genome have indicated that regulatory elements (e.g. promoters and enhancers) at distal genomic locations can interact with each other via chromatin folding and affect gene expression levels. Genomic technologies for mapping interactions between DNA regions, e.g., ChIA-PET and HiC, can generate genome-wide maps of interactions between regulatory elements. These interaction datasets are important resources to infer distal gene targets of non-coding regulatory elements and to facilitate prioritization of critical loci for important cellular functions. With the increasing diversity and complexity of genomic information and public ontologies, making sense of these datasets demands integrative and easy-to-use software tools. Moreover, network representation of chromatin interaction maps enables effective data visualization, integration, and mining. Currently, there is no software that can take full advantage of network theory approaches for the analysis of chromatin interaction datasets. To fill this gap, we developed a web-based application, QuIN, which enables: 1) building and visualizing chromatin interaction networks, 2) annotating networks with user-provided private and publicly available functional genomics and interaction datasets, 3) querying network components based on gene name or chromosome location, and 4) utilizing network based measures to identify and prioritize critical regulatory targets and their direct and indirect interactions. QuIN's web server is available at http://quin.jax.org QuIN is developed in Java and JavaScript, utilizing an Apache Tomcat web server and MySQL database and the source code is available under the GPLV3 license available on GitHub: https://github.com/UcarLab/QuIN/.
HiView: an integrative genome browser to leverage Hi-C results for the interpretation of GWAS variants.

PubMed

Xu, Zheng; Zhang, Guosheng; Duan, Qing; Chai, Shengjie; Zhang, Baqun; Wu, Cong; Jin, Fulai; Yue, Feng; Li, Yun; Hu, Ming

2016-03-11

Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with complex traits and diseases. However, most of them are located in the non-protein coding regions, and therefore it is challenging to hypothesize the functions of these non-coding GWAS variants. Recent large efforts such as the ENCODE and Roadmap Epigenomics projects have predicted a large number of regulatory elements. However, the target genes of these regulatory elements remain largely unknown. Chromatin conformation capture based technologies such as Hi-C can directly measure the chromatin interactions and have generated an increasingly comprehensive catalog of the interactome between the distal regulatory elements and their potential target genes. Leveraging such information revealed by Hi-C holds the promise of elucidating the functions of genetic variants in human diseases. In this work, we present HiView, the first integrative genome browser to leverage Hi-C results for the interpretation of GWAS variants. HiView is able to display Hi-C data and statistical evidence for chromatin interactions in genomic regions surrounding any given GWAS variant, enabling straightforward visualization and interpretation. We believe that as the first GWAS variants-centered Hi-C genome browser, HiView is a useful tool guiding post-GWAS functional genomics studies. HiView is freely accessible at: http://www.unc.edu/~yunmli/HiView .
Early Evolution of Conserved Regulatory Sequences Associated with Development in Vertebrates

PubMed Central

McEwen, Gayle K.; Goode, Debbie K.; Parker, Hugo J.; Woolfe, Adam; Callaway, Heather; Elgar, Greg

2009-01-01

Comparisons between diverse vertebrate genomes have uncovered thousands of highly conserved non-coding sequences, an increasing number of which have been shown to function as enhancers during early development. Despite their extreme conservation over 500 million years from humans to cartilaginous fish, these elements appear to be largely absent in invertebrates, and, to date, there has been little understanding of their mode of action or the evolutionary processes that have modelled them. We have now exploited emerging genomic sequence data for the sea lamprey, Petromyzon marinus, to explore the depth of conservation of this type of element in the earliest diverging extant vertebrate lineage, the jawless fish (agnathans). We searched for conserved non-coding elements (CNEs) at 13 human gene loci and identified lamprey elements associated with all but two of these gene regions. Although markedly shorter and less well conserved than within jawed vertebrates, identified lamprey CNEs are able to drive specific patterns of expression in zebrafish embryos, which are almost identical to those driven by the equivalent human elements. These CNEs are therefore a unique and defining characteristic of all vertebrates. Furthermore, alignment of lamprey and other vertebrate CNEs should permit the identification of persistent sequence signatures that are responsible for common patterns of expression and contribute to the elucidation of the regulatory language in CNEs. Identifying the core regulatory code for development, common to all vertebrates, provides a foundation upon which regulatory networks can be constructed and might also illuminate how large conserved regulatory sequence blocks evolve and become fixed in genomic DNA. PMID:20011110
The noncoding human genome and the future of personalised medicine.

PubMed

Cowie, Philip; Hay, Elizabeth A; MacKenzie, Alasdair

2015-01-30

Non-coding cis-regulatory sequences act as the 'eyes' of the genome and their role is to perceive, organise and relay cellular communication information to RNA polymerase II at gene promoters. The evolution of these sequences, that include enhancers, silencers, insulators and promoters, has progressed in multicellular organisms to the extent that cis-regulatory sequences make up as much as 10% of the human genome. Parallel evidence suggests that 75% of polymorphisms associated with heritable disease occur within predicted cis-regulatory sequences that effectively alter the 'perception' of cis-regulatory sequences or render them blind to cell communication cues. Cis-regulatory sequences also act as major functional targets of epigenetic modification thus representing an important conduit through which changes in DNA-methylation affects disease susceptibility. The objectives of the current review are (1) to describe what has been learned about identifying and characterising cis-regulatory sequences since the sequencing of the human genome; (2) to discuss their role in interpreting cell signalling pathways pathways; and (3) outline how this role may be altered by polymorphisms and epigenetic changes. We argue that the importance of the cis-regulatory genome for the interpretation of cellular communication pathways cannot be overstated and understanding its role in health and disease will be critical for the future development of personalised medicine.
The agents of natural genome editing.

PubMed

Witzany, Guenther

2011-06-01

The DNA serves as a stable information storage medium and every protein which is needed by the cell is produced from this blueprint via an RNA intermediate code. More recently it was found that an abundance of various RNA elements cooperate in a variety of steps and substeps as regulatory and catalytic units with multiple competencies to act on RNA transcripts. Natural genome editing on one side is the competent agent-driven generation and integration of meaningful DNA nucleotide sequences into pre-existing genomic content arrangements, and the ability to (re-)combine and (re-)regulate them according to context-dependent (i.e. adaptational) purposes of the host organism. Natural genome editing on the other side designates the integration of all RNA activities acting on RNA transcripts without altering DNA-encoded genes. If we take the genetic code seriously as a natural code, there must be agents that are competent to act on this code because no natural code codes itself as no natural language speaks itself. As code editing agents, viral and subviral agents have been suggested because there are several indicators that demonstrate viruses competent in both RNA and DNA natural genome editing.
Decoding the non-coding genome: elucidating genetic risk outside the coding genome.

PubMed

Barr, C L; Misener, V L

2016-01-01

Current evidence emerging from genome-wide association studies indicates that the genetic underpinnings of complex traits are likely attributable to genetic variation that changes gene expression, rather than (or in combination with) variation that changes protein-coding sequences. This is particularly compelling with respect to psychiatric disorders, as genetic changes in regulatory regions may result in differential transcriptional responses to developmental cues and environmental/psychosocial stressors. Until recently, however, the link between transcriptional regulation and psychiatric genetic risk has been understudied. Multiple obstacles have contributed to the paucity of research in this area, including challenges in identifying the positions of remote (distal from the promoter) regulatory elements (e.g. enhancers) and their target genes and the underrepresentation of neural cell types and brain tissues in epigenome projects - the availability of high-quality brain tissues for epigenetic and transcriptome profiling, particularly for the adolescent and developing brain, has been limited. Further challenges have arisen in the prediction and testing of the functional impact of DNA variation with respect to multiple aspects of transcriptional control, including regulatory-element interaction (e.g. between enhancers and promoters), transcription factor binding and DNA methylation. Further, the brain has uncommon DNA-methylation marks with unique genomic distributions not found in other tissues - current evidence suggests the involvement of non-CG methylation and 5-hydroxymethylation in neurodevelopmental processes but much remains unknown. We review here knowledge gaps as well as both technological and resource obstacles that will need to be overcome in order to elucidate the involvement of brain-relevant gene-regulatory variants in genetic risk for psychiatric disorders. © 2015 John Wiley & Sons Ltd and International Behavioural and Neural Genetics Society.
Feather Development Genes and Associated Regulatory Innovation Predate the Origin of Dinosauria

PubMed Central

Lowe, Craig B.; Clarke, Julia A.; Baker, Allan J.; Haussler, David; Edwards, Scott V.

2015-01-01

The evolution of avian feathers has recently been illuminated by fossils and the identification of genes involved in feather patterning and morphogenesis. However, molecular studies have focused mainly on protein-coding genes. Using comparative genomics and more than 600,000 conserved regulatory elements, we show that patterns of genome evolution in the vicinity of feather genes are consistent with a major role for regulatory innovation in the evolution of feathers. Rates of innovation at feather regulatory elements exhibit an extended period of innovation with peaks in the ancestors of amniotes and archosaurs. We estimate that 86% of such regulatory elements and 100% of the nonkeratin feather gene set were present prior to the origin of Dinosauria. On the branch leading to modern birds, we detect a strong signal of regulatory innovation near insulin-like growth factor binding protein (IGFBP) 2 and IGFBP5, which have roles in body size reduction, and may represent a genomic signature for the miniaturization of dinosaurian body size preceding the origin of flight. PMID:25415961
The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4)

DOE PAGES

Huntemann, Marcel; Ivanova, Natalia N.; Mavromatis, Konstantinos; ...

2015-10-26

The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. In conclusion, structural annotation is followed by assignment of protein product names and functions.
The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Huntemann, Marcel; Ivanova, Natalia N.; Mavromatis, Konstantinos

The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. In conclusion, structural annotation is followed by assignment of protein product names and functions.
Altruistic functions for selfish DNA.

PubMed

Faulkner, Geoffrey J; Carninci, Piero

2009-09-15

Mammalian genomes are comprised of 30-50% transposed elements (TEs). The vast majority of these TEs are truncated and mutated fragments of retrotransposons that are no longer capable of transposition. Although initially regarded as important factors in the evolution of gene regulatory networks, TEs are now commonly perceived as neutrally evolving and non-functional genomic elements. In a major development, recent works have strongly contradicted this "selfish DNA" or "junk DNA" dogma by demonstrating that TEs use a host of novel promoters to generate RNA on a massive scale across most eukaryotic cells. This transcription frequently functions to control the expression of protein-coding genes via alternative promoters, cis regulatory non protein-coding RNAs and the formation of double stranded short RNAs. If considered in sum, these findings challenge the designation of TEs as selfish and neutrally evolving genomic elements. Here, we will expand upon these themes and discuss challenges in establishing novel TE functions in vivo.
Conserved Non-Coding Regulatory Signatures in Arabidopsis Co-Expressed Gene Modules

PubMed Central

Spangler, Jacob B.; Ficklin, Stephen P.; Luo, Feng; Freeling, Michael; Feltus, F. Alex

2012-01-01

Complex traits and other polygenic processes require coordinated gene expression. Co-expression networks model mRNA co-expression: the product of gene regulatory networks. To identify regulatory mechanisms underlying coordinated gene expression in a tissue-enriched context, ten Arabidopsis thaliana co-expression networks were constructed after manually sorting 4,566 RNA profiling datasets into aerial, flower, leaf, root, rosette, seedling, seed, shoot, whole plant, and global (all samples combined) groups. Collectively, the ten networks contained 30% of the measurable genes of Arabidopsis and were circumscribed into 5,491 modules. Modules were scrutinized for cis regulatory mechanisms putatively encoded in conserved non-coding sequences (CNSs) previously identified as remnants of a whole genome duplication event. We determined the non-random association of 1,361 unique CNSs to 1,904 co-expression network gene modules. Furthermore, the CNS elements were placed in the context of known gene regulatory networks (GRNs) by connecting 250 CNS motifs with known GRN cis elements. Our results provide support for a regulatory role of some CNS elements and suggest the functional consequences of CNS activation of co-expression in specific gene sets dispersed throughout the genome. PMID:23024789

Conserved non-coding regulatory signatures in Arabidopsis co-expressed gene modules.

PubMed

Spangler, Jacob B; Ficklin, Stephen P; Luo, Feng; Freeling, Michael; Feltus, F Alex

2012-01-01

Complex traits and other polygenic processes require coordinated gene expression. Co-expression networks model mRNA co-expression: the product of gene regulatory networks. To identify regulatory mechanisms underlying coordinated gene expression in a tissue-enriched context, ten Arabidopsis thaliana co-expression networks were constructed after manually sorting 4,566 RNA profiling datasets into aerial, flower, leaf, root, rosette, seedling, seed, shoot, whole plant, and global (all samples combined) groups. Collectively, the ten networks contained 30% of the measurable genes of Arabidopsis and were circumscribed into 5,491 modules. Modules were scrutinized for cis regulatory mechanisms putatively encoded in conserved non-coding sequences (CNSs) previously identified as remnants of a whole genome duplication event. We determined the non-random association of 1,361 unique CNSs to 1,904 co-expression network gene modules. Furthermore, the CNS elements were placed in the context of known gene regulatory networks (GRNs) by connecting 250 CNS motifs with known GRN cis elements. Our results provide support for a regulatory role of some CNS elements and suggest the functional consequences of CNS activation of co-expression in specific gene sets dispersed throughout the genome.
Activity-Dependent Human Brain Coding/Noncoding Gene Regulatory Networks

PubMed Central

Lipovich, Leonard; Dachet, Fabien; Cai, Juan; Bagla, Shruti; Balan, Karina; Jia, Hui; Loeb, Jeffrey A.

2012-01-01

While most gene transcription yields RNA transcripts that code for proteins, a sizable proportion of the genome generates RNA transcripts that do not code for proteins, but may have important regulatory functions. The brain-derived neurotrophic factor (BDNF) gene, a key regulator of neuronal activity, is overlapped by a primate-specific, antisense long noncoding RNA (lncRNA) called BDNFOS. We demonstrate reciprocal patterns of BDNF and BDNFOS transcription in highly active regions of human neocortex removed as a treatment for intractable seizures. A genome-wide analysis of activity-dependent coding and noncoding human transcription using a custom lncRNA microarray identified 1288 differentially expressed lncRNAs, of which 26 had expression profiles that matched activity-dependent coding genes and an additional 8 were adjacent to or overlapping with differentially expressed protein-coding genes. The functions of most of these protein-coding partner genes, such as ARC, include long-term potentiation, synaptic activity, and memory. The nuclear lncRNAs NEAT1, MALAT1, and RPPH1, composing an RNAse P-dependent lncRNA-maturation pathway, were also upregulated. As a means to replicate human neuronal activity, repeated depolarization of SY5Y cells resulted in sustained CREB activation and produced an inverse pattern of BDNF-BDNFOS co-expression that was not achieved with a single depolarization. RNAi-mediated knockdown of BDNFOS in human SY5Y cells increased BDNF expression, suggesting that BDNFOS directly downregulates BDNF. Temporal expression patterns of other lncRNA-messenger RNA pairs validated the effect of chronic neuronal activity on the transcriptome and implied various lncRNA regulatory mechanisms. lncRNAs, some of which are unique to primates, thus appear to have potentially important regulatory roles in activity-dependent human brain plasticity. PMID:22960213
QuIN: A Web Server for Querying and Visualizing Chromatin Interaction Networks

PubMed Central

Thibodeau, Asa; Márquez, Eladio J.; Luo, Oscar; Ruan, Yijun; Shin, Dong-Guk; Stitzel, Michael L.; Ucar, Duygu

2016-01-01

Recent studies of the human genome have indicated that regulatory elements (e.g. promoters and enhancers) at distal genomic locations can interact with each other via chromatin folding and affect gene expression levels. Genomic technologies for mapping interactions between DNA regions, e.g., ChIA-PET and HiC, can generate genome-wide maps of interactions between regulatory elements. These interaction datasets are important resources to infer distal gene targets of non-coding regulatory elements and to facilitate prioritization of critical loci for important cellular functions. With the increasing diversity and complexity of genomic information and public ontologies, making sense of these datasets demands integrative and easy-to-use software tools. Moreover, network representation of chromatin interaction maps enables effective data visualization, integration, and mining. Currently, there is no software that can take full advantage of network theory approaches for the analysis of chromatin interaction datasets. To fill this gap, we developed a web-based application, QuIN, which enables: 1) building and visualizing chromatin interaction networks, 2) annotating networks with user-provided private and publicly available functional genomics and interaction datasets, 3) querying network components based on gene name or chromosome location, and 4) utilizing network based measures to identify and prioritize critical regulatory targets and their direct and indirect interactions. AVAILABILITY: QuIN’s web server is available at http://quin.jax.org QuIN is developed in Java and JavaScript, utilizing an Apache Tomcat web server and MySQL database and the source code is available under the GPLV3 license available on GitHub: https://github.com/UcarLab/QuIN/. PMID:27336171
Interpreting Mammalian Evolution using Fugu Genome Comparisons

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stubbs, L; Ovcharenko, I; Loots, G G

2004-04-02

Comparative sequence analysis of the human and the pufferfish Fugu rubripes (fugu) genomes has revealed several novel functional coding and noncoding regions in the human genome. In particular, the fugu genome has been extremely valuable for identifying transcriptional regulatory elements in human loci harboring unusually high levels of evolutionary conservation to rodent genomes. In such regions, the large evolutionary distance between human and fishes provides an additional filter through which functional noncoding elements can be detected with high efficiency.
An expanding universe of the non-coding genome in cancer biology.

PubMed

Xue, Bin; He, Lin

2014-06-01

Neoplastic transformation is caused by accumulation of genetic and epigenetic alterations that ultimately convert normal cells into tumor cells with uncontrolled proliferation and survival, unlimited replicative potential and invasive growth [Hanahan,D. et al. (2011) Hallmarks of cancer: the next generation. Cell, 144, 646-674]. Although the majority of the cancer studies have focused on the functions of protein-coding genes, emerging evidence has started to reveal the importance of the vast non-coding genome, which constitutes more than 98% of the human genome. A number of non-coding RNAs (ncRNAs) derived from the 'dark matter' of the human genome exhibit cancer-specific differential expression and/or genomic alterations, and it is increasingly clear that ncRNAs, including small ncRNAs and long ncRNAs (lncRNAs), play an important role in cancer development by regulating protein-coding gene expression through diverse mechanisms. In addition to ncRNAs, nearly half of the mammalian genomes consist of transposable elements, particularly retrotransposons. Once depicted as selfish genomic parasites that propagate at the expense of host fitness, retrotransposon elements could also confer regulatory complexity to the host genomes during development and disease. Reactivation of retrotransposons in cancer, while capable of causing insertional mutagenesis and genome rearrangements to promote oncogenesis, could also alter host gene expression networks to favor tumor development. Taken together, the functional significance of non-coding genome in tumorigenesis has been previously underestimated, and diverse transcripts derived from the non-coding genome could act as integral functional components of the oncogene and tumor suppressor network. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Long Noncoding RNAs: a New Regulatory Code in Metabolic Control

PubMed Central

Zhao, Xu-Yun; Lin, Jiandie D.

2015-01-01

Long noncoding RNAs (lncRNAs) are emerging as an integral part of the regulatory information encoded in the genome. LncRNAs possess the unique capability to interact with nucleic acids and proteins and exert discrete effects on numerous biological processes. Recent studies have delineated multiple lncRNA pathways that control metabolic tissue development and function. The expansion of the regulatory code that links nutrient and hormonal signals to tissue metabolism gives new insights into the genetic and pathogenic mechanisms underlying metabolic disease. This review discusses lncRNA biology with a focus on its role in the development, signaling, and function of key metabolic tissues. PMID:26410599
Artificial Intelligence, DNA Mimicry, and Human Health.

PubMed

Stefano, George B; Kream, Richard M

2017-08-14

The molecular evolution of genomic DNA across diverse plant and animal phyla involved dynamic registrations of sequence modifications to maintain existential homeostasis to increasingly complex patterns of environmental stressors. As an essential corollary, driver effects of positive evolutionary pressure are hypothesized to effect concerted modifications of genomic DNA sequences to meet expanded platforms of regulatory controls for successful implementation of advanced physiological requirements. It is also clearly apparent that preservation of updated registries of advantageous modifications of genomic DNA sequences requires coordinate expansion of convergent cellular proofreading/error correction mechanisms that are encoded by reciprocally modified genomic DNA. Computational expansion of operationally defined DNA memory extends to coordinate modification of coding and previously under-emphasized noncoding regions that now appear to represent essential reservoirs of untapped genetic information amenable to evolutionary driven recruitment into the realm of biologically active domains. Additionally, expansion of DNA memory potential via chemical modification and activation of noncoding sequences is targeted to vertical augmentation and integration of an expanded cadre of transcriptional and epigenetic regulatory factors affecting linear coding of protein amino acid sequences within open reading frames.
Feather development genes and associated regulatory innovation predate the origin of Dinosauria.

PubMed

Lowe, Craig B; Clarke, Julia A; Baker, Allan J; Haussler, David; Edwards, Scott V

2015-01-01

The evolution of avian feathers has recently been illuminated by fossils and the identification of genes involved in feather patterning and morphogenesis. However, molecular studies have focused mainly on protein-coding genes. Using comparative genomics and more than 600,000 conserved regulatory elements, we show that patterns of genome evolution in the vicinity of feather genes are consistent with a major role for regulatory innovation in the evolution of feathers. Rates of innovation at feather regulatory elements exhibit an extended period of innovation with peaks in the ancestors of amniotes and archosaurs. We estimate that 86% of such regulatory elements and 100% of the nonkeratin feather gene set were present prior to the origin of Dinosauria. On the branch leading to modern birds, we detect a strong signal of regulatory innovation near insulin-like growth factor binding protein (IGFBP) 2 and IGFBP5, which have roles in body size reduction, and may represent a genomic signature for the miniaturization of dinosaurian body size preceding the origin of flight. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Regulatory activities of transposable elements: from conflicts to benefits

PubMed Central

Chuong, Edward B.; Elde, Nels C.; Feschotte, Cédric

2017-01-01

Transposable elements (TEs) are a prolific source of tightly regulated, biochemically active non-coding elements, such as transcription factor binding sites and non-coding RNAs. A wealth of recent studies reinvigorates the idea that these elements are pervasively co-opted for the regulation of host genes. We argue that the inherent genetic properties of TEs and conflicting relationships with their hosts facilitate their recruitment for regulatory functions in diverse genomes. We review recent findings supporting the long-standing hypothesis that the waves of TE invasions endured by organisms for eons have catalyzed the evolution of gene regulatory networks. We also discuss the challenges of dissecting and interpreting the phenotypic impact of regulatory activities encoded by TEs in health and disease. PMID:27867194
Independent evolution of genomic characters during major metazoan transitions.

PubMed

Simakov, Oleg; Kawashima, Takeshi

2017-07-15

Metazoan evolution encompasses a vast evolutionary time scale spanning over 600 million years. Our ability to infer ancestral metazoan characters, both morphological and functional, is limited by our understanding of the nature and evolutionary dynamics of the underlying regulatory networks. Increasing coverage of metazoan genomes enables us to identify the evolutionary changes of the relevant genomic characters such as the loss or gain of coding sequences, gene duplications, micro- and macro-synteny, and non-coding element evolution in different lineages. In this review we describe recent advances in our understanding of ancestral metazoan coding and non-coding features, as deduced from genomic comparisons. Some genomic changes such as innovations in gene and linkage content occur at different rates across metazoan clades, suggesting some level of independence among genomic characters. While their contribution to biological innovation remains largely unclear, we review recent literature about certain genomic changes that do correlate with changes to specific developmental pathways and metazoan innovations. In particular, we discuss the origins of the recently described pharyngeal cluster which is conserved across deuterostome genomes, and highlight different genomic features that have contributed to the evolution of this group. We also assess our current capacity to infer ancestral metazoan states from gene models and comparative genomics tools and elaborate on the future directions of metazoan comparative genomics relevant to evo-devo studies. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Genetic evidence for conserved non-coding element function across species–the ears have it

PubMed Central

Turner, Eric E.; Cox, Timothy C.

2014-01-01

Comparison of genomic sequences from diverse vertebrate species has revealed numerous highly conserved regions that do not appear to encode proteins or functional RNAs. Often these “conserved non-coding elements,” or CNEs, can direct gene expression to specific tissues in transgenic models, demonstrating they have regulatory function. CNEs are frequently found near “developmental” genes, particularly transcription factors, implying that these elements have essential regulatory roles in development. However, actual examples demonstrating CNE regulatory functions across species have been few, and recent loss-of-function studies of several CNEs in mice have shown relatively minor effects. In this Perspectives article, we discuss new findings in “fancy” rats and Highland cattle demonstrating that function of a CNE near the Hmx1 gene is crucial for normal external ear development and when disrupted can mimic loss-of function Hmx1 coding mutations in mice and humans. These findings provide important support for conserved developmental roles of CNEs in divergent species, and reinforce the concept that CNEs should be examined systematically in the ongoing search for genetic causes of human developmental disorders in the era of genome-scale sequencing. PMID:24478720
Highly conserved elements discovered in vertebrates are present in non-syntenic loci of tunicates, act as enhancers and can be transcribed during development

PubMed Central

Sanges, Remo; Hadzhiev, Yavor; Gueroult-Bellone, Marion; Roure, Agnes; Ferg, Marco; Meola, Nicola; Amore, Gabriele; Basu, Swaraj; Brown, Euan R.; De Simone, Marco; Petrera, Francesca; Licastro, Danilo; Strähle, Uwe; Banfi, Sandro; Lemaire, Patrick; Birney, Ewan; Müller, Ferenc; Stupka, Elia

2013-01-01

Co-option of cis-regulatory modules has been suggested as a mechanism for the evolution of expression sites during development. However, the extent and mechanisms involved in mobilization of cis-regulatory modules remains elusive. To trace the history of non-coding elements, which may represent candidate ancestral cis-regulatory modules affirmed during chordate evolution, we have searched for conserved elements in tunicate and vertebrate (Olfactores) genomes. We identified, for the first time, 183 non-coding sequences that are highly conserved between the two groups. Our results show that all but one element are conserved in non-syntenic regions between vertebrate and tunicate genomes, while being syntenic among vertebrates. Nevertheless, in all the groups, they are significantly associated with transcription factors showing specific functions fundamental to animal development, such as multicellular organism development and sequence-specific DNA binding. The majority of these regions map onto ultraconserved elements and we demonstrate that they can act as functional enhancers within the organism of origin, as well as in cross-transgenesis experiments, and that they are transcribed in extant species of Olfactores. We refer to the elements as ‘Olfactores conserved non-coding elements’. PMID:23393190
Determining Epigenetic Targets: A Beginner's Guide to Identifying Genome Functionality Through Database Analysis.

PubMed

Hay, Elizabeth A; Cowie, Philip; MacKenzie, Alasdair

2017-01-01

There can now be little doubt that the cis-regulatory genome represents the largest information source within the human genome essential for health. In addition to containing up to five times more information than the coding genome, the cis-regulatory genome also acts as a major reservoir of disease-associated polymorphic variation. The cis-regulatory genome, which is comprised of enhancers, silencers, promoters, and insulators, also acts as a major functional target for epigenetic modification including DNA methylation and chromatin modifications. These epigenetic modifications impact the ability of cis-regulatory sequences to maintain tissue-specific and inducible expression of genes that preserve health. There has been limited ability to identify and characterize the functional components of this huge and largely misunderstood part of the human genome that, for decades, was ignored as "Junk" DNA. In an attempt to address this deficit, the current chapter will first describe methods of identifying and characterizing functional elements of the cis-regulatory genome at a genome-wide level using databases such as ENCODE, the UCSC browser, and NCBI. We will then explore the databases on the UCSC genome browser, which provides access to DNA methylation and chromatin modification datasets. Finally, we will describe how we can superimpose the huge volume of study data contained in the NCBI archives onto that contained within the UCSC browser in order to glean relevant in vivo study data for any locus within the genome. An ability to access and utilize these information sources will become essential to informing the future design of experiments and subsequent determination of the role of epigenetics in health and disease and will form a critical step in our development of personalized medicine.
Lineage-Specific Genome Architecture Links Enhancers and Non-coding Disease Variants to Target Gene Promoters.

PubMed

Javierre, Biola M; Burren, Oliver S; Wilder, Steven P; Kreuzhuber, Roman; Hill, Steven M; Sewitz, Sven; Cairns, Jonathan; Wingett, Steven W; Várnai, Csilla; Thiecke, Michiel J; Burden, Frances; Farrow, Samantha; Cutler, Antony J; Rehnström, Karola; Downes, Kate; Grassi, Luigi; Kostadima, Myrto; Freire-Pritchett, Paula; Wang, Fan; Stunnenberg, Hendrik G; Todd, John A; Zerbino, Daniel R; Stegle, Oliver; Ouwehand, Willem H; Frontini, Mattia; Wallace, Chris; Spivakov, Mikhail; Fraser, Peter

2016-11-17

Long-range interactions between regulatory elements and gene promoters play key roles in transcriptional regulation. The vast majority of interactions are uncharted, constituting a major missing link in understanding genome control. Here, we use promoter capture Hi-C to identify interacting regions of 31,253 promoters in 17 human primary hematopoietic cell types. We show that promoter interactions are highly cell type specific and enriched for links between active promoters and epigenetically marked enhancers. Promoter interactomes reflect lineage relationships of the hematopoietic tree, consistent with dynamic remodeling of nuclear architecture during differentiation. Interacting regions are enriched in genetic variants linked with altered expression of genes they contact, highlighting their functional role. We exploit this rich resource to connect non-coding disease variants to putative target promoters, prioritizing thousands of disease-candidate genes and implicating disease pathways. Our results demonstrate the power of primary cell promoter interactomes to reveal insights into genomic regulatory mechanisms underlying common diseases. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
AP1 Keeps Chromatin Poised for Action | Center for Cancer Research

Cancer.gov

The human genome harbors gene-encoding DNA, the blueprint for building proteins that regulate cellular function. Embedded across the genome, in non-coding regions, are DNA elements to which regulatory factors bind. The interaction of regulatory factors with DNA at these sites modifies gene expression to modulate cell activity. In cells, DNA exists in a complex with proteins called chromatin that compacts the DNA in the nucleus, strongly restricting access to DNA sequences. As a result, regulatory factors only interact with a small subset of their potential binding elements in a given cell to regulate genes. How factors recognize and select sites in chromatin across the genome is not well understood -- but several discoveries in CCR’s Laboratory of Receptor Biology and Gene Expression (LRBGE) have shed light on the mechanisms that direct factors to DNA.
The 'dark matter' in the plant genomes: non-coding and unannotated DNA sequences associated with open chromatin.

PubMed

Jiang, Jiming

2015-04-01

Sequencing of complete plant genomes has become increasingly more routine since the advent of the next-generation sequencing technology. Identification and annotation of large amounts of noncoding but functional DNA sequences, including cis-regulatory DNA elements (CREs), have become a new frontier in plant genome research. Genomic regions containing active CREs bound to regulatory proteins are hypersensitive to DNase I digestion and are called DNase I hypersensitive sites (DHSs). Several recent DHS studies in plants illustrate that DHS datasets produced by DNase I digestion followed by next-generation sequencing (DNase-seq) are highly valuable for the identification and characterization of CREs associated with plant development and responses to environmental cues. DHS-based genomic profiling has opened a door to identify and annotate the 'dark matter' in sequenced plant genomes. Copyright © 2015 Elsevier Ltd. All rights reserved.
Promoter analysis reveals globally differential regulation of human long non-coding RNA and protein-coding genes

DOE PAGES

Alam, Tanvir; Medvedeva, Yulia A.; Jia, Hui; ...

2014-10-02

Transcriptional regulation of protein-coding genes is increasingly well-understood on a global scale, yet no comparable information exists for long non-coding RNA (lncRNA) genes, which were recently recognized to be as numerous as protein-coding genes in mammalian genomes. We performed a genome-wide comparative analysis of the promoters of human lncRNA and protein-coding genes, finding global differences in specific genetic and epigenetic features relevant to transcriptional regulation. These two groups of genes are hence subject to separate transcriptional regulatory programs, including distinct transcription factor (TF) proteins that significantly favor lncRNA, rather than coding-gene, promoters. We report a specific signature of promoter-proximal transcriptionalmore » regulation of lncRNA genes, including several distinct transcription factor binding sites (TFBS). Experimental DNase I hypersensitive site profiles are consistent with active configurations of these lncRNA TFBS sets in diverse human cell types. TFBS ChIP-seq datasets confirm the binding events that we predicted using computational approaches for a subset of factors. For several TFs known to be directly regulated by lncRNAs, we find that their putative TFBSs are enriched at lncRNA promoters, suggesting that the TFs and the lncRNAs may participate in a bidirectional feedback loop regulatory network. Accordingly, cells may be able to modulate lncRNA expression levels independently of mRNA levels via distinct regulatory pathways. Our results also raise the possibility that, given the historical reliance on protein-coding gene catalogs to define the chromatin states of active promoters, a revision of these chromatin signature profiles to incorporate expressed lncRNA genes is warranted in the future.« less
Promoter analysis reveals globally differential regulation of human long non-coding RNA and protein-coding genes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Alam, Tanvir; Medvedeva, Yulia A.; Jia, Hui

Transcriptional regulation of protein-coding genes is increasingly well-understood on a global scale, yet no comparable information exists for long non-coding RNA (lncRNA) genes, which were recently recognized to be as numerous as protein-coding genes in mammalian genomes. We performed a genome-wide comparative analysis of the promoters of human lncRNA and protein-coding genes, finding global differences in specific genetic and epigenetic features relevant to transcriptional regulation. These two groups of genes are hence subject to separate transcriptional regulatory programs, including distinct transcription factor (TF) proteins that significantly favor lncRNA, rather than coding-gene, promoters. We report a specific signature of promoter-proximal transcriptionalmore » regulation of lncRNA genes, including several distinct transcription factor binding sites (TFBS). Experimental DNase I hypersensitive site profiles are consistent with active configurations of these lncRNA TFBS sets in diverse human cell types. TFBS ChIP-seq datasets confirm the binding events that we predicted using computational approaches for a subset of factors. For several TFs known to be directly regulated by lncRNAs, we find that their putative TFBSs are enriched at lncRNA promoters, suggesting that the TFs and the lncRNAs may participate in a bidirectional feedback loop regulatory network. Accordingly, cells may be able to modulate lncRNA expression levels independently of mRNA levels via distinct regulatory pathways. Our results also raise the possibility that, given the historical reliance on protein-coding gene catalogs to define the chromatin states of active promoters, a revision of these chromatin signature profiles to incorporate expressed lncRNA genes is warranted in the future.« less
Shared regulatory sites are abundant in the human genome and shed light on genome evolution and disease pleiotropy.

PubMed

Tong, Pin; Monahan, Jack; Prendergast, James G D

2017-03-01

Large-scale gene expression datasets are providing an increasing understanding of the location of cis-eQTLs in the human genome and their role in disease. However, little is currently known regarding the extent of regulatory site-sharing between genes. This is despite it having potentially wide-ranging implications, from the determination of the way in which genetic variants may shape multiple phenotypes to the understanding of the evolution of human gene order. By first identifying the location of non-redundant cis-eQTLs, we show that regulatory site-sharing is a relatively common phenomenon in the human genome, with over 10% of non-redundant regulatory variants linked to the expression of multiple nearby genes. We show that these shared, local regulatory sites are linked to high levels of chromatin looping between the regulatory sites and their associated genes. In addition, these co-regulated gene modules are found to be strongly conserved across mammalian species, suggesting that shared regulatory sites have played an important role in shaping human gene order. The association of these shared cis-eQTLs with multiple genes means they also appear to be unusually important in understanding the genetics of human phenotypes and pleiotropy, with shared regulatory sites more often linked to multiple human phenotypes than other regulatory variants. This study shows that regulatory site-sharing is likely an underappreciated aspect of gene regulation and has important implications for the understanding of various biological phenomena, including how the two and three dimensional structures of the genome have been shaped and the potential causes of disease pleiotropy outside coding regions.
High-resolution transcriptional analysis of the regulatory influence of cell-to-cell signalling reveals novel genes that contribute to Xanthomonas phytopathogenesis

PubMed Central

An, Shi-Qi; Febrer, Melanie; McCarthy, Yvonne; Tang, Dong-Jie; Clissold, Leah; Kaithakottil, Gemy; Swarbreck, David; Tang, Ji-Liang; Rogers, Jane; Dow, J Maxwell; Ryan, Robert P

2013-01-01

The bacterium Xanthomonas campestris is an economically important pathogen of many crop species and a model for the study of bacterial phytopathogenesis. In X. campestris, a regulatory system mediated by the signal molecule DSF controls virulence to plants. The synthesis and recognition of the DSF signal depends upon different Rpf proteins. DSF signal generation requires RpfF whereas signal perception and transduction depends upon a system comprising the sensor RpfC and regulator RpfG. Here we have addressed the action and role of Rpf/DSF signalling in phytopathogenesis by high-resolution transcriptional analysis coupled to functional genomics. We detected transcripts for many genes that were unidentified by previous computational analysis of the genome sequence. Novel transcribed regions included intergenic transcripts predicted as coding or non-coding as well as those that were antisense to coding sequences. In total, mutation of rpfF, rpfG and rpfC led to alteration in transcript levels (more than fourfold) of approximately 480 genes. The regulatory influence of RpfF and RpfC demonstrated considerable overlap. Contrary to expectation, the regulatory influence of RpfC and RpfG had limited overlap, indicating complexities of the Rpf signalling system. Importantly, functional analysis revealed over 160 new virulence factors within the group of Rpf-regulated genes. PMID:23617851

The spotted gar genome illuminates vertebrate evolution and facilitates human-to-teleost comparisons

PubMed Central

Braasch, Ingo; Gehrke, Andrew R.; Smith, Jeramiah J.; Kawasaki, Kazuhiko; Manousaki, Tereza; Pasquier, Jeremy; Amores, Angel; Desvignes, Thomas; Batzel, Peter; Catchen, Julian; Berlin, Aaron M.; Campbell, Michael S.; Barrell, Daniel; Martin, Kyle J.; Mulley, John F.; Ravi, Vydianathan; Lee, Alison P.; Nakamura, Tetsuya; Chalopin, Domitille; Fan, Shaohua; Wcisel, Dustin; Cañestro, Cristian; Sydes, Jason; Beaudry, Felix E. G.; Sun, Yi; Hertel, Jana; Beam, Michael J.; Fasold, Mario; Ishiyama, Mikio; Johnson, Jeremy; Kehr, Steffi; Lara, Marcia; Letaw, John H.; Litman, Gary W.; Litman, Ronda T.; Mikami, Masato; Ota, Tatsuya; Saha, Nil Ratan; Williams, Louise; Stadler, Peter F.; Wang, Han; Taylor, John S.; Fontenot, Quenton; Ferrara, Allyse; Searle, Stephen M. J.; Aken, Bronwen; Yandell, Mark; Schneider, Igor; Yoder, Jeffrey A.; Volff, Jean-Nicolas; Meyer, Axel; Amemiya, Chris T.; Venkatesh, Byrappa; Holland, Peter W. H.; Guiguen, Yann; Bobe, Julien; Shubin, Neil H.; Di Palma, Federica; Alföldi, Jessica; Lindblad-Toh, Kerstin; Postlethwait, John H.

2016-01-01

To connect human biology to fish biomedical models, we sequenced the genome of spotted gar (Lepisosteus oculatus), whose lineage diverged from teleosts before the teleost genome duplication (TGD). The slowly evolving gar genome conserved in content and size many entire chromosomes from bony vertebrate ancestors. Gar bridges teleosts to tetrapods by illuminating the evolution of immunity, mineralization, and development (e.g., Hox, ParaHox, and miRNA genes). Numerous conserved non-coding elements (CNEs, often cis-regulatory) undetectable in direct human-teleost comparisons become apparent using gar: functional studies uncovered conserved roles of such cryptic CNEs, facilitating annotation of sequences identified in human genome-wide association studies. Transcriptomic analyses revealed that the sum of expression domains and levels from duplicated teleost genes often approximate patterns and levels of gar genes, consistent with subfunctionalization. The gar genome provides a resource for understanding evolution after genome duplication, the origin of vertebrate genomes, and the function of human regulatory sequences. PMID:26950095
Junk DNA and the long non-coding RNA twist in cancer genetics

PubMed Central

Ling, Hui; Vincent, Kimberly; Pichler, Martin; Fodde, Riccardo; Berindan-Neagoe, Ioana; Slack, Frank J.; Calin, George A

2015-01-01

The central dogma of molecular biology states that the flow of genetic information moves from DNA to RNA to protein. However, in the last decade this dogma has been challenged by new findings on non-coding RNAs (ncRNAs) such as microRNAs (miRNAs). More recently, long non-coding RNAs (lncRNAs) have attracted much attention due to their large number and biological significance. Many lncRNAs have been identified as mapping to regulatory elements including gene promoters and enhancers, ultraconserved regions, and intergenic regions of protein-coding genes. Yet, the biological function and molecular mechanisms of lncRNA in human diseases in general and cancer in particular remain largely unknown. Data from the literature suggest that lncRNA, often via interaction with proteins, functions in specific genomic loci or use their own transcription loci for regulatory activity. In this review, we summarize recent findings supporting the importance of DNA loci in lncRNA function, and the underlying molecular mechanisms via cis or trans regulation, and discuss their implications in cancer. In addition, we use the 8q24 genomic locus, a region containing interactive SNPs, DNA regulatory elements and lncRNAs, as an example to illustrate how single nucleotide polymorphism (SNP) located within lncRNAs may be functionally associated with the individual’s susceptibility to cancer. PMID:25619839
Disease-associated variants in different categories of disease located in distinct regulatory elements.

PubMed

Ma, Meng; Ru, Ying; Chuang, Ling-Shiang; Hsu, Nai-Yun; Shi, Li-Song; Hakenberg, Jörg; Cheng, Wei-Yi; Uzilov, Andrew; Ding, Wei; Glicksberg, Benjamin S; Chen, Rong

2015-01-01

The invention of high throughput sequencing technologies has led to the discoveries of hundreds of thousands of genetic variants associated with thousands of human diseases. Many of these genetic variants are located outside the protein coding regions, and as such, it is challenging to interpret the function of these genetic variants by traditional genetic approaches. Recent genome-wide functional genomics studies, such as FANTOM5 and ENCODE have uncovered a large number of regulatory elements across hundreds of different tissues or cell lines in the human genome. These findings provide an opportunity to study the interaction between regulatory elements and disease-associated genetic variants. Identifying these diseased-related regulatory elements will shed light on understanding the mechanisms of how these variants regulate gene expression and ultimately result in disease formation and progression. In this study, we curated and categorized 27,558 Mendelian disease variants, 20,964 complex disease variants, 5,809 cancer predisposing germline variants, and 43,364 recurrent cancer somatic mutations. Compared against nine different types of regulatory regions from FANTOM5 and ENCODE projects, we found that different types of disease variants show distinctive propensity for particular regulatory elements. Mendelian disease variants and recurrent cancer somatic mutations are 22-fold and 10- fold significantly enriched in promoter regions respectively (q<0.001), compared with allele-frequency-matched genomic background. Separate from these two categories, cancer predisposing germline variants are 27-fold enriched in histone modification regions (q<0.001), 10-fold enriched in chromatin physical interaction regions (q<0.001), and 6-fold enriched in transcription promoters (q<0.001). Furthermore, Mendelian disease variants and recurrent cancer somatic mutations share very similar distribution across types of functional effects. We further found that regulatory regions are located within over 50% coding exon regions. Transcription promoters, methylation regions, and transcription insulators have the highest density of disease variants, with 472, 239, and 72 disease variants per one million base pairs, respectively. Disease-associated variants in different disease categories are preferentially located in particular regulatory elements. These results will be useful for an overall understanding about the differences among the pathogenic mechanisms of various disease-associated variants.
Disease-associated variants in different categories of disease located in distinct regulatory elements

PubMed Central

2015-01-01

Background The invention of high throughput sequencing technologies has led to the discoveries of hundreds of thousands of genetic variants associated with thousands of human diseases. Many of these genetic variants are located outside the protein coding regions, and as such, it is challenging to interpret the function of these genetic variants by traditional genetic approaches. Recent genome-wide functional genomics studies, such as FANTOM5 and ENCODE have uncovered a large number of regulatory elements across hundreds of different tissues or cell lines in the human genome. These findings provide an opportunity to study the interaction between regulatory elements and disease-associated genetic variants. Identifying these diseased-related regulatory elements will shed light on understanding the mechanisms of how these variants regulate gene expression and ultimately result in disease formation and progression. Results In this study, we curated and categorized 27,558 Mendelian disease variants, 20,964 complex disease variants, 5,809 cancer predisposing germline variants, and 43,364 recurrent cancer somatic mutations. Compared against nine different types of regulatory regions from FANTOM5 and ENCODE projects, we found that different types of disease variants show distinctive propensity for particular regulatory elements. Mendelian disease variants and recurrent cancer somatic mutations are 22-fold and 10- fold significantly enriched in promoter regions respectively (q<0.001), compared with allele-frequency-matched genomic background. Separate from these two categories, cancer predisposing germline variants are 27-fold enriched in histone modification regions (q<0.001), 10-fold enriched in chromatin physical interaction regions (q<0.001), and 6-fold enriched in transcription promoters (q<0.001). Furthermore, Mendelian disease variants and recurrent cancer somatic mutations share very similar distribution across types of functional effects. We further found that regulatory regions are located within over 50% coding exon regions. Transcription promoters, methylation regions, and transcription insulators have the highest density of disease variants, with 472, 239, and 72 disease variants per one million base pairs, respectively. Conclusions Disease-associated variants in different disease categories are preferentially located in particular regulatory elements. These results will be useful for an overall understanding about the differences among the pathogenic mechanisms of various disease-associated variants. PMID:26110593
Radiation and the regulatory landscape of neo2-Darwinism.

PubMed

Rollo, C David

2006-05-11

Several recently revealed features of eukaryotic genomes were not predicted by earlier evolutionary paradigms, including the relatively small number of genes, the very large amounts of non-functional code and its quarantine in heterochromatin, the remarkable conservation of many functionally important genes across relatively enormous phylogenetic distances, and the prevalence of extra-genomic information associated with chromatin structure and histone proteins. All of these emphasize a paramount role for regulatory evolution, which is further reinforced by recent perspectives highlighting even higher-order regulation governing epigenetics and development (EVO-DEVO). Modern neo2-Darwinism, with its emphasis on regulatory mechanisms and regulatory evolution provides new vision for understanding radiation biology, particularly because free radicals and redox states are central to many regulatory mechanisms and free radicals generated by radiation mimic and amplify endogenous signalling. This paper explores some of these aspects and their implications for low-dose radiation biology.
GWAS4D: multidimensional analysis of context-specific regulatory variant for human complex diseases and traits.

PubMed

Huang, Dandan; Yi, Xianfu; Zhang, Shijie; Zheng, Zhanye; Wang, Panwen; Xuan, Chenghao; Sham, Pak Chung; Wang, Junwen; Li, Mulin Jun

2018-05-16

Genome-wide association studies have generated over thousands of susceptibility loci for many human complex traits, and yet for most of these associations the true causal variants remain unknown. Tissue/cell type-specific prediction and prioritization of non-coding regulatory variants will facilitate the identification of causal variants and underlying pathogenic mechanisms for particular complex diseases and traits. By leveraging recent large-scale functional genomics/epigenomics data, we develop an intuitive web server, GWAS4D (http://mulinlab.tmu.edu.cn/gwas4d or http://mulinlab.org/gwas4d), that systematically evaluates GWAS signals and identifies context-specific regulatory variants. The updated web server includes six major features: (i) updates the regulatory variant prioritization method with our new algorithm; (ii) incorporates 127 tissue/cell type-specific epigenomes data; (iii) integrates motifs of 1480 transcriptional regulators from 13 public resources; (iv) uniformly processes Hi-C data and generates significant interactions at 5 kb resolution across 60 tissues/cell types; (v) adds comprehensive non-coding variant functional annotations; (vi) equips a highly interactive visualization function for SNP-target interaction. Using a GWAS fine-mapped set for 161 coronary artery disease risk loci, we demonstrate that GWAS4D is able to efficiently prioritize disease-causal regulatory variants.
RNA regulatory networks in animals and plants: a long noncoding RNA perspective.

PubMed

Bai, Youhuang; Dai, Xiaozhuan; Harrison, Andrew P; Chen, Ming

2015-03-01

A recent highlight of genomics research has been the discovery of many families of transcripts which have function but do not code for proteins. An important group is long noncoding RNAs (lncRNAs), which are typically longer than 200 nt, and whose members originate from thousands of loci across genomes. We review progress in understanding the biogenesis and regulatory mechanisms of lncRNAs. We describe diverse computational and high throughput technologies for identifying and studying lncRNAs. We discuss the current knowledge of functional elements embedded in lncRNAs as well as insights into the lncRNA-based regulatory network in animals. We also describe genome-wide studies of large amount of lncRNAs in plants, as well as knowledge of selected plant lncRNAs with a focus on biotic/abiotic stress-responsive lncRNAs. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
The Legionella pneumophila genome evolved to accommodate multiple regulatory mechanisms controlled by the CsrA-system

PubMed Central

Sahr, Tobias; Rusniok, Christophe; Impens, Francis; Oliva, Giulia; Sismeiro, Odile; Coppée, Jean-Yves

2017-01-01

The carbon storage regulator protein CsrA regulates cellular processes post-transcriptionally by binding to target-RNAs altering translation efficiency and/or their stability. Here we identified and analyzed the direct targets of CsrA in the human pathogen Legionella pneumophila. Genome wide transcriptome, proteome and RNA co-immunoprecipitation followed by deep sequencing of a wild type and a csrA mutant strain identified 479 RNAs with potential CsrA interaction sites located in the untranslated and/or coding regions of mRNAs or of known non-coding sRNAs. Further analyses revealed that CsrA exhibits a dual regulatory role in virulence as it affects the expression of the regulators FleQ, LqsR, LetE and RpoS but it also directly regulates the timely expression of over 40 Dot/Icm substrates. CsrA controls its own expression and the stringent response through a regulatory feedback loop as evidenced by its binding to RelA-mRNA and links it to quorum sensing and motility. CsrA is a central player in the carbon, amino acid, fatty acid metabolism and energy transfer and directly affects the biosynthesis of cofactors, vitamins and secondary metabolites. We describe the first L. pneumophila riboswitch, a thiamine pyrophosphate riboswitch whose regulatory impact is fine-tuned by CsrA, and identified a unique regulatory mode of CsrA, the active stabilization of RNA anti-terminator conformations inside a coding sequence preventing Rho-dependent termination of the gap operon through transcriptional polarity effects. This allows L. pneumophila to regulate the pentose phosphate pathway and the glycolysis combined or individually although they share genes in a single operon. Thus the L. pneumophila genome has evolved to acclimate at least five different modes of regulation by CsrA giving it a truly unique position in its life cycle. PMID:28212376
A Transcriptome Map of Actinobacillus pleuropneumoniae at Single-Nucleotide Resolution Using Deep RNA-Seq

PubMed Central

Su, Zhipeng; Zhu, Jiawen; Xu, Zhuofei; Xiao, Ran; Zhou, Rui; Li, Lu; Chen, Huanchun

2016-01-01

Actinobacillus pleuropneumoniae is the pathogen of porcine contagious pleuropneumoniae, a highly contagious respiratory disease of swine. Although the genome of A. pleuropneumoniae was sequenced several years ago, limited information is available on the genome-wide transcriptional analysis to accurately annotate the gene structures and regulatory elements. High-throughput RNA sequencing (RNA-seq) has been applied to study the transcriptional landscape of bacteria, which can efficiently and accurately identify gene expression regions and unknown transcriptional units, especially small non-coding RNAs (sRNAs), UTRs and regulatory regions. The aim of this study is to comprehensively analyze the transcriptome of A. pleuropneumoniae by RNA-seq in order to improve the existing genome annotation and promote our understanding of A. pleuropneumoniae gene structures and RNA-based regulation. In this study, we utilized RNA-seq to construct a single nucleotide resolution transcriptome map of A. pleuropneumoniae. More than 3.8 million high-quality reads (average length ~90 bp) from a cDNA library were generated and aligned to the reference genome. We identified 32 open reading frames encoding novel proteins that were mis-annotated in the previous genome annotations. The start sites for 35 genes based on the current genome annotation were corrected. Furthermore, 51 sRNAs in the A. pleuropneumoniae genome were discovered, of which 40 sRNAs were never reported in previous studies. The transcriptome map also enabled visualization of 5'- and 3'-UTR regions, in which contained 11 sRNAs. In addition, 351 operons covering 1230 genes throughout the whole genome were identified. The RNA-Seq based transcriptome map validated annotated genes and corrected annotations of open reading frames in the genome, and led to the identification of many functional elements (e.g. regions encoding novel proteins, non-coding sRNAs and operon structures). The transcriptional units described in this study provide a foundation for future studies concerning the gene functions and the transcriptional regulatory architectures of this pathogen. PMID:27018591
Evolution of Salmonella-Host Cell Interactions through a Dynamic Bacterial Genome

PubMed Central

Ilyas, Bushra; Tsai, Caressa N.; Coombes, Brian K.

2017-01-01

Salmonella Typhimurium has a broad arsenal of genes that are tightly regulated and coordinated to facilitate adaptation to the various host environments it colonizes. The genome of Salmonella Typhimurium has undergone multiple gene acquisition events and has accrued changes in non-coding DNA that have undergone selection by regulatory evolution. Together, at least 17 horizontally acquired pathogenicity islands (SPIs), prophage-associated genes, and changes in core genome regulation contribute to the virulence program of Salmonella. Here, we review the latest understanding of these elements and their contributions to pathogenesis, emphasizing the regulatory circuitry that controls niche-specific gene expression. In addition to an overview of the importance of SPI-1 and SPI-2 to host invasion and colonization, we describe the recently characterized contributions of other SPIs, including the antibacterial activity of SPI-6 and adhesion and invasion mediated by SPI-4. We further discuss how these fitness traits have been integrated into the regulatory circuitry of the bacterial cell through cis-regulatory evolution and by a careful balance of silencing and counter-silencing by regulatory proteins. Detailed understanding of regulatory evolution within Salmonella is uncovering novel aspects of infection biology that relate to host-pathogen interactions and evasion of host immunity. PMID:29034217
The genomic basis of adaptive evolution in threespine sticklebacks

PubMed Central

Jones, Felicity C; Grabherr, Manfred G; Chan, Yingguang Frank; Russell, Pamela; Mauceli, Evan; Johnson, Jeremy; Swofford, Ross; Pirun, Mono; Zody, Michael C; White, Simon; Birney, Ewan; Searle, Stephen; Schmutz, Jeremy; Grimwood, Jane; Dickson, Mark C; Myers, Richard M; Miller, Craig T; Summers, Brian R; Knecht, Anne K; Brady, Shannon D; Zhang, Haili; Pollen, Alex A; Howes, Timothy; Amemiya, Chris; Lander, Eric S; Di Palma, Federica

2012-01-01

Summary Marine stickleback fish have colonized and adapted to innumerable streams and lakes formed since the last ice age, providing an exceptional opportunity to characterize genomic mechanisms underlying repeated ecological adaptation in nature. Here we develop a high quality reference genome assembly for threespine sticklebacks. By sequencing the genomes of 20 additional individuals from a global set of marine and freshwater populations, we identify a genome-wide set of loci that are consistently associated with marine-freshwater divergence. Our results suggest that reuse of globally-shared standing genetic variation, including chromosomal inversions, plays an important role in repeated evolution of distinct marine and freshwater sticklebacks, and in the maintenance of divergent ecotypes during early stages of reproductive isolation. Both coding and regulatory changes occur in the set of loci underlying marine-freshwater evolution, with regulatory changes likely predominating in this classic example of repeated adaptive evolution in nature. PMID:22481358
The genomic basis of adaptive evolution in threespine sticklebacks.

PubMed

Jones, Felicity C; Grabherr, Manfred G; Chan, Yingguang Frank; Russell, Pamela; Mauceli, Evan; Johnson, Jeremy; Swofford, Ross; Pirun, Mono; Zody, Michael C; White, Simon; Birney, Ewan; Searle, Stephen; Schmutz, Jeremy; Grimwood, Jane; Dickson, Mark C; Myers, Richard M; Miller, Craig T; Summers, Brian R; Knecht, Anne K; Brady, Shannon D; Zhang, Haili; Pollen, Alex A; Howes, Timothy; Amemiya, Chris; Baldwin, Jen; Bloom, Toby; Jaffe, David B; Nicol, Robert; Wilkinson, Jane; Lander, Eric S; Di Palma, Federica; Lindblad-Toh, Kerstin; Kingsley, David M

2012-04-04

Marine stickleback fish have colonized and adapted to thousands of streams and lakes formed since the last ice age, providing an exceptional opportunity to characterize genomic mechanisms underlying repeated ecological adaptation in nature. Here we develop a high-quality reference genome assembly for threespine sticklebacks. By sequencing the genomes of twenty additional individuals from a global set of marine and freshwater populations, we identify a genome-wide set of loci that are consistently associated with marine-freshwater divergence. Our results indicate that reuse of globally shared standing genetic variation, including chromosomal inversions, has an important role in repeated evolution of distinct marine and freshwater sticklebacks, and in the maintenance of divergent ecotypes during early stages of reproductive isolation. Both coding and regulatory changes occur in the set of loci underlying marine-freshwater evolution, but regulatory changes appear to predominate in this well known example of repeated adaptive evolution in nature.
Social insect genomes exhibit dramatic evolution in gene composition and regulation while preserving regulatory features linked to sociality

PubMed Central

Simola, Daniel F.; Wissler, Lothar; Donahue, Greg; Waterhouse, Robert M.; Helmkampf, Martin; Roux, Julien; Nygaard, Sanne; Glastad, Karl M.; Hagen, Darren E.; Viljakainen, Lumi; Reese, Justin T.; Hunt, Brendan G.; Graur, Dan; Elhaik, Eran; Kriventseva, Evgenia V.; Wen, Jiayu; Parker, Brian J.; Cash, Elizabeth; Privman, Eyal; Childers, Christopher P.; Muñoz-Torres, Monica C.; Boomsma, Jacobus J.; Bornberg-Bauer, Erich; Currie, Cameron R.; Elsik, Christine G.; Suen, Garret; Goodisman, Michael A.D.; Keller, Laurent; Liebig, Jürgen; Rawls, Alan; Reinberg, Danny; Smith, Chris D.; Smith, Chris R.; Tsutsui, Neil; Wurm, Yannick; Zdobnov, Evgeny M.; Berger, Shelley L.; Gadau, Jürgen

2013-01-01

Genomes of eusocial insects code for dramatic examples of phenotypic plasticity and social organization. We compared the genomes of seven ants, the honeybee, and various solitary insects to examine whether eusocial lineages share distinct features of genomic organization. Each ant lineage contains ∼4000 novel genes, but only 64 of these genes are conserved among all seven ants. Many gene families have been expanded in ants, notably those involved in chemical communication (e.g., desaturases and odorant receptors). Alignment of the ant genomes revealed reduced purifying selection compared with Drosophila without significantly reduced synteny. Correspondingly, ant genomes exhibit dramatic divergence of noncoding regulatory elements; however, extant conserved regions are enriched for novel noncoding RNAs and transcription factor–binding sites. Comparison of orthologous gene promoters between eusocial and solitary species revealed significant regulatory evolution in both cis (e.g., Creb) and trans (e.g., fork head) for nearly 2000 genes, many of which exhibit phenotypic plasticity. Our results emphasize that genomic changes can occur remarkably fast in ants, because two recently diverged leaf-cutter ant species exhibit faster accumulation of species-specific genes and greater divergence in regulatory elements compared with other ants or Drosophila. Thus, while the “socio-genomes” of ants and the honeybee are broadly characterized by a pervasive pattern of divergence in gene composition and regulation, they preserve lineage-specific regulatory features linked to eusociality. We propose that changes in gene regulation played a key role in the origins of insect eusociality, whereas changes in gene composition were more relevant for lineage-specific eusocial adaptations. PMID:23636946
[Long non-coding RNAs in the pathophysiology of atherosclerosis].

PubMed

Novak, Jan; Vašků, Julie Bienertová; Souček, Miroslav

2018-01-01

The human genome contains about 22 000 protein-coding genes that are transcribed to an even larger amount of messenger RNAs (mRNA). Interestingly, the results of the project ENCODE from 2012 show, that despite up to 90 % of our genome being actively transcribed, protein-coding mRNAs make up only 2-3 % of the total amount of the transcribed RNA. The rest of RNA transcripts is not translated to proteins and that is why they are referred to as "non-coding RNAs". Earlier the non-coding RNA was considered "the dark matter of genome", or "the junk", whose genes has accumulated in our DNA during the course of evolution. Today we already know that non-coding RNAs fulfil a variety of regulatory functions in our body - they intervene into epigenetic processes from chromatin remodelling to histone methylation, or into the transcription process itself, or even post-transcription processes. Long non-coding RNAs (lncRNA) are one of the classes of non-coding RNAs that have more than 200 nucleotides in length (non-coding RNAs with less than 200 nucleotides in length are called small non-coding RNAs). lncRNAs represent a widely varied and large group of molecules with diverse regulatory functions. We can identify them in all thinkable cell types or tissues, or even in an extracellular space, which includes blood, specifically plasma. Their levels change during the course of organogenesis, they are specific to different tissues and their changes also occur along with the development of different illnesses, including atherosclerosis. This review article aims to present lncRNAs problematics in general and then focuses on some of their specific representatives in relation to the process of atherosclerosis (i.e. we describe lncRNA involvement in the biology of endothelial cells, vascular smooth muscle cells or immune cells), and we further describe possible clinical potential of lncRNA, whether in diagnostics or therapy of atherosclerosis and its clinical manifestations.Key words: atherosclerosis - lincRNA - lncRNA - MALAT - MIAT.
Automated conserved non-coding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses

PubMed Central

Turco, Gina; Schnable, James C.; Pedersen, Brent; Freeling, Michael

2013-01-01

Conserved non-coding sequences (CNS) are islands of non-coding sequence that, like protein coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNS in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of arabidopsis, Japonica rice, foxtail millet, sorghum, brachypodium, and maize. PMID:23874343
Nothing in Evolution Makes Sense Except in the Light of Genomics: Read-Write Genome Evolution as an Active Biological Process.

PubMed

Shapiro, James A

2016-06-08

The 21st century genomics-based analysis of evolutionary variation reveals a number of novel features impossible to predict when Dobzhansky and other evolutionary biologists formulated the neo-Darwinian Modern Synthesis in the middle of the last century. These include three distinct realms of cell evolution; symbiogenetic fusions forming eukaryotic cells with multiple genome compartments; horizontal organelle, virus and DNA transfers; functional organization of proteins as systems of interacting domains subject to rapid evolution by exon shuffling and exonization; distributed genome networks integrated by mobile repetitive regulatory signals; and regulation of multicellular development by non-coding lncRNAs containing repetitive sequence components. Rather than single gene traits, all phenotypes involve coordinated activity by multiple interacting cell molecules. Genomes contain abundant and functional repetitive components in addition to the unique coding sequences envisaged in the early days of molecular biology. Combinatorial coding, plus the biochemical abilities cells possess to rearrange DNA molecules, constitute a powerful toolbox for adaptive genome rewriting. That is, cells possess "Read-Write Genomes" they alter by numerous biochemical processes capable of rapidly restructuring cellular DNA molecules. Rather than viewing genome evolution as a series of accidental modifications, we can now study it as a complex biological process of active self-modification.
Auto-Regulatory RNA Editing Fine-Tunes mRNA Re-Coding and Complex Behaviour in Drosophila

PubMed Central

Savva, Yiannis A.; Jepson, James E.C; Sahin, Asli; Sugden, Arthur U.; Dorsky, Jacquelyn S.; Alpert, Lauren; Lawrence, Charles; Reenan, Robert A.

2014-01-01

Auto-regulatory feedback loops are a common molecular strategy used to optimize protein function. In Drosophila many mRNAs involved in neuro-transmission are re-coded at the RNA level by the RNA editing enzyme dADAR, leading to the incorporation of amino acids that are not directly encoded by the genome. dADAR also re-codes its own transcript, but the consequences of this auto-regulation in vivo are unclear. Here we show that hard-wiring or abolishing endogenous dADAR auto-regulation dramatically remodels the landscape of re-coding events in a site-specific manner. These molecular phenotypes correlate with altered localization of dADAR within the nuclear compartment. Furthermore, auto-editing exhibits sexually dimorphic patterns of spatial regulation and can be modified by abiotic environmental factors. Finally, we demonstrate that modifying dAdar auto-editing affects adaptive complex behaviors. Our results reveal the in vivo relevance of auto-regulatory control over post-transcriptional mRNA re-coding events in fine-tuning brain function and organismal behavior. PMID:22531175
An analytical framework for whole-genome sequence association studies and its implications for autism spectrum disorder.

PubMed

Werling, Donna M; Brand, Harrison; An, Joon-Yong; Stone, Matthew R; Zhu, Lingxue; Glessner, Joseph T; Collins, Ryan L; Dong, Shan; Layer, Ryan M; Markenscoff-Papadimitriou, Eirene; Farrell, Andrew; Schwartz, Grace B; Wang, Harold Z; Currall, Benjamin B; Zhao, Xuefang; Dea, Jeanselle; Duhn, Clif; Erdman, Carolyn A; Gilson, Michael C; Yadav, Rachita; Handsaker, Robert E; Kashin, Seva; Klei, Lambertus; Mandell, Jeffrey D; Nowakowski, Tomasz J; Liu, Yuwen; Pochareddy, Sirisha; Smith, Louw; Walker, Michael F; Waterman, Matthew J; He, Xin; Kriegstein, Arnold R; Rubenstein, John L; Sestan, Nenad; McCarroll, Steven A; Neale, Benjamin M; Coon, Hilary; Willsey, A Jeremy; Buxbaum, Joseph D; Daly, Mark J; State, Matthew W; Quinlan, Aaron R; Marth, Gabor T; Roeder, Kathryn; Devlin, Bernie; Talkowski, Michael E; Sanders, Stephan J

2018-05-01

Genomic association studies of common or rare protein-coding variation have established robust statistical approaches to account for multiple testing. Here we present a comparable framework to evaluate rare and de novo noncoding single-nucleotide variants, insertion/deletions, and all classes of structural variation from whole-genome sequencing (WGS). Integrating genomic annotations at the level of nucleotides, genes, and regulatory regions, we define 51,801 annotation categories. Analyses of 519 autism spectrum disorder families did not identify association with any categories after correction for 4,123 effective tests. Without appropriate correction, biologically plausible associations are observed in both cases and controls. Despite excluding previously identified gene-disrupting mutations, coding regions still exhibited the strongest associations. Thus, in autism, the contribution of de novo noncoding variation is probably modest in comparison to that of de novo coding variants. Robust results from future WGS studies will require large cohorts and comprehensive analytical strategies that consider the substantial multiple-testing burden.
The Intolerance of Regulatory Sequence to Genetic Variation Predicts Gene Dosage Sensitivity

PubMed Central

Wang, Quanli; Halvorsen, Matt; Han, Yujun; Weir, William H.; Allen, Andrew S.; Goldstein, David B.

2015-01-01

Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene’s proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene’s regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen’s Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance, ncCADD and ncGWAVA, and find both scores are significantly predictive of human dosage sensitive genes and appear to carry information beyond conservation, as assessed by ncGERP. These results highlight that the intolerance of noncoding sequence stretches in the human genome can provide a critical complementary tool to other genome annotation approaches to help identify the parts of the human genome increasingly likely to harbor mutations that influence risk of disease. PMID:26332131
ChIPBase: a database for decoding the transcriptional regulation of long non-coding RNA and microRNA genes from ChIP-Seq data.

PubMed

Yang, Jian-Hua; Li, Jun-Hao; Jiang, Shan; Zhou, Hui; Qu, Liang-Hu

2013-01-01

Long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) represent two classes of important non-coding RNAs in eukaryotes. Although these non-coding RNAs have been implicated in organismal development and in various human diseases, surprisingly little is known about their transcriptional regulation. Recent advances in chromatin immunoprecipitation with next-generation DNA sequencing (ChIP-Seq) have provided methods of detecting transcription factor binding sites (TFBSs) with unprecedented sensitivity. In this study, we describe ChIPBase (http://deepbase.sysu.edu.cn/chipbase/), a novel database that we have developed to facilitate the comprehensive annotation and discovery of transcription factor binding maps and transcriptional regulatory relationships of lncRNAs and miRNAs from ChIP-Seq data. The current release of ChIPBase includes high-throughput sequencing data that were generated by 543 ChIP-Seq experiments in diverse tissues and cell lines from six organisms. By analysing millions of TFBSs, we identified tens of thousands of TF-lncRNA and TF-miRNA regulatory relationships. Furthermore, two web-based servers were developed to annotate and discover transcriptional regulatory relationships of lncRNAs and miRNAs from ChIP-Seq data. In addition, we developed two genome browsers, deepView and genomeView, to provide integrated views of multidimensional data. Moreover, our web implementation supports diverse query types and the exploration of TFs, lncRNAs, miRNAs, gene ontologies and pathways.

Nothing in Evolution Makes Sense Except in the Light of Genomics: Read–Write Genome Evolution as an Active Biological Process

PubMed Central

Shapiro, James A.

2016-01-01

The 21st century genomics-based analysis of evolutionary variation reveals a number of novel features impossible to predict when Dobzhansky and other evolutionary biologists formulated the neo-Darwinian Modern Synthesis in the middle of the last century. These include three distinct realms of cell evolution; symbiogenetic fusions forming eukaryotic cells with multiple genome compartments; horizontal organelle, virus and DNA transfers; functional organization of proteins as systems of interacting domains subject to rapid evolution by exon shuffling and exonization; distributed genome networks integrated by mobile repetitive regulatory signals; and regulation of multicellular development by non-coding lncRNAs containing repetitive sequence components. Rather than single gene traits, all phenotypes involve coordinated activity by multiple interacting cell molecules. Genomes contain abundant and functional repetitive components in addition to the unique coding sequences envisaged in the early days of molecular biology. Combinatorial coding, plus the biochemical abilities cells possess to rearrange DNA molecules, constitute a powerful toolbox for adaptive genome rewriting. That is, cells possess “Read–Write Genomes” they alter by numerous biochemical processes capable of rapidly restructuring cellular DNA molecules. Rather than viewing genome evolution as a series of accidental modifications, we can now study it as a complex biological process of active self-modification. PMID:27338490
Identification of functional elements and regulatory circuits by Drosophila modENCODE

DOE Office of Scientific and Technical Information (OSTI.GOV)

Roy, Sushmita; Ernst, Jason; Kharchenko, Peter V.

2010-12-22

To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- andmore » tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation. Several years after the complete genetic sequencing of many species, it is still unclear how to translate genomic information into a functional map of cellular and developmental programs. The Encyclopedia of DNA Elements (ENCODE) (1) and model organism ENCODE (modENCODE) (2) projects use diverse genomic assays to comprehensively annotate the Homo sapiens (human), Drosophila melanogaster (fruit fly), and Caenorhabditis elegans (worm) genomes, through systematic generation and computational integration of functional genomic data sets. Previous genomic studies in flies have made seminal contributions to our understanding of basic biological mechanisms and genome functions, facilitated by genetic, experimental, computational, and manual annotation of the euchromatic and heterochromatic genome (3), small genome size, short life cycle, and a deep knowledge of development, gene function, and chromosome biology. The functions of {approx}40% of the protein and nonprotein-coding genes [FlyBase 5.12 (4)] have been determined from cDNA collections (5, 6), manual curation of gene models (7), gene mutations and comprehensive genome-wide RNA interference screens (8-10), and comparative genomic analyses (11, 12). The Drosophila modENCODE project has generated more than 700 data sets that profile transcripts, histone modifications and physical nucleosome properties, general and specific transcription factors (TFs), and replication programs in cell lines, isolated tissues, and whole organisms across several developmental stages (Fig. 1). Here, we computationally integrate these data sets and report (i) improved and additional genome annotations, including full-length proteincoding genes and peptides as short as 21 amino acids; (ii) noncoding transcripts, including 132 candidate structural RNAs and 1608 nonstructural transcripts; (iii) additional Argonaute (Ago)-associated small RNA genes and pathways, including new microRNAs (miRNAs) encoded within protein-coding exons and endogenous small interfering RNAs (siRNAs) from 3-inch untranslated regions; (iv) chromatin 'states' defined by combinatorial patterns of 18 chromatin marks that are associated with distinct functions and properties; (v) regions of high TF occupancy and replication activity with likely epigenetic regulation; (vi)mixed TF and miRNA regulatory networks with hierarchical structure and enriched feed-forward loops; (vii) coexpression- and co-regulation-based functional annotations for nearly 3000 genes; (viii) stage- and tissue-specific regulators; and (ix) predictive models of gene expression levels and regulator function.« less
New families of human regulatory RNA structures identified by comparative analysis of vertebrate genomes.

PubMed

Parker, Brian J; Moltke, Ida; Roth, Adam; Washietl, Stefan; Wen, Jiayu; Kellis, Manolis; Breaker, Ronald; Pedersen, Jakob Skou

2011-11-01

Regulatory RNA structures are often members of families with multiple paralogous instances across the genome. Family members share functional and structural properties, which allow them to be studied as a whole, facilitating both bioinformatic and experimental characterization. We have developed a comparative method, EvoFam, for genome-wide identification of families of regulatory RNA structures, based on primary sequence and secondary structure similarity. We apply EvoFam to a 41-way genomic vertebrate alignment. Genome-wide, we identify 220 human, high-confidence families outside protein-coding regions comprising 725 individual structures, including 48 families with known structural RNA elements. Known families identified include both noncoding RNAs, e.g., miRNAs and the recently identified MALAT1/MEN β lincRNA family; and cis-regulatory structures, e.g., iron-responsive elements. We also identify tens of new families supported by strong evolutionary evidence and other statistical evidence, such as GO term enrichments. For some of these, detailed analysis has led to the formulation of specific functional hypotheses. Examples include two hypothesized auto-regulatory feedback mechanisms: one involving six long hairpins in the 3'-UTR of MAT2A, a key metabolic gene that produces the primary human methyl donor S-adenosylmethionine; the other involving a tRNA-like structure in the intron of the tRNA maturation gene POP1. We experimentally validate the predicted MAT2A structures. Finally, we identify potential new regulatory networks, including large families of short hairpins enriched in immunity-related genes, e.g., TNF, FOS, and CTLA4, which include known transcript destabilizing elements. Our findings exemplify the diversity of post-transcriptional regulation and provide a resource for further characterization of new regulatory mechanisms and families of noncoding RNAs.
Decoding the role of regulatory element polymorphisms in complex disease.

PubMed

Vockley, Christopher M; Barrera, Alejandro; Reddy, Timothy E

2017-04-01

Genetic variation in gene regulatory elements contributes to diverse human diseases, ranging from rare and severe developmental defects to common and complex diseases such as obesity and diabetes. Early examples of regulatory mechanisms of human diseases involve large chromosomal rearrangements that change the regulatory connections within the genome. Single nucleotide variants in regulatory elements can also contribute to disease, potentially via demonstrated associations with changes in transcription factor binding, enhancer activity, post-translational histone modifications, long-range enhancer-promoter interactions, or RNA polymerase recruitment. Establishing causality between non-coding genetic variants, gene regulation, and disease has recently become more feasible with advances in genome-editing and epigenome-editing technologies. As establishing causal regulatory mechanisms of diseases becomes routine, functional annotation of target genes is likely to emerge as a major bottleneck for translation into patient benefits. In this review, we discuss the history and recent advances in understanding the regulatory mechanisms of human disease, and new challenges likely to be encountered once establishing those mechanisms becomes rote. Copyright © 2016 Elsevier Ltd. All rights reserved.
Global Organization of a Positive-strand RNA Virus Genome

PubMed Central

Wu, Baodong; Grigull, Jörg; Ore, Moriam O.; Morin, Sylvie; White, K. Andrew

2013-01-01

The genomes of plus-strand RNA viruses contain many regulatory sequences and structures that direct different viral processes. The traditional view of these RNA elements are as local structures present in non-coding regions. However, this view is changing due to the discovery of regulatory elements in coding regions and functional long-range intra-genomic base pairing interactions. The ∼4.8 kb long RNA genome of the tombusvirus tomato bushy stunt virus (TBSV) contains these types of structural features, including six different functional long-distance interactions. We hypothesized that to achieve these multiple interactions this viral genome must utilize a large-scale organizational strategy and, accordingly, we sought to assess the global conformation of the entire TBSV genome. Atomic force micrographs of the genome indicated a mostly condensed structure composed of interconnected protrusions extending from a central hub. This configuration was consistent with the genomic secondary structure model generated using high-throughput selective 2′-hydroxyl acylation analysed by primer extension (i.e. SHAPE), which predicted different sized RNA domains originating from a central region. Known RNA elements were identified in both domain and inter-domain regions, and novel structural features were predicted and functionally confirmed. Interestingly, only two of the six long-range interactions known to form were present in the structural model. However, for those interactions that did not form, complementary partner sequences were positioned relatively close to each other in the structure, suggesting that the secondary structure level of viral genome structure could provide a basic scaffold for the formation of different long-range interactions. The higher-order structural model for the TBSV RNA genome provides a snapshot of the complex framework that allows multiple functional components to operate in concert within a confined context. PMID:23717202
The locus of evolution: evo devo and the genetics of adaptation.

PubMed

Hoekstra, Hopi E; Coyne, Jerry A

2007-05-01

An important tenet of evolutionary developmental biology ("evo devo") is that adaptive mutations affecting morphology are more likely to occur in the cis-regulatory regions than in the protein-coding regions of genes. This argument rests on two claims: (1) the modular nature of cis-regulatory elements largely frees them from deleterious pleiotropic effects, and (2) a growing body of empirical evidence appears to support the predominant role of gene regulatory change in adaptation, especially morphological adaptation. Here we discuss and critique these assertions. We first show that there is no theoretical or empirical basis for the evo devo contention that adaptations involving morphology evolve by genetic mechanisms different from those involving physiology and other traits. In addition, some forms of protein evolution can avoid the negative consequences of pleiotropy, most notably via gene duplication. In light of evo devo claims, we then examine the substantial data on the genetic basis of adaptation from both genome-wide surveys and single-locus studies. Genomic studies lend little support to the cis-regulatory theory: many of these have detected adaptation in protein-coding regions, including transcription factors, whereas few have examined regulatory regions. Turning to single-locus studies, we note that the most widely cited examples of adaptive cis-regulatory mutations focus on trait loss rather than gain, and none have yet pinpointed an evolved regulatory site. In contrast, there are many studies that have both identified structural mutations and functionally verified their contribution to adaptation and speciation. Neither the theoretical arguments nor the data from nature, then, support the claim for a predominance of cis-regulatory mutations in evolution. Although this claim may be true, it is at best premature. Adaptation and speciation probably proceed through a combination of cis-regulatory and structural mutations, with a substantial contribution of the latter.
Partitioning of genetic variation between regulatory and coding gene segments: the predominance of software variation in genes encoding introvert proteins.

PubMed

Mitchison, A

1997-01-01

In considering genetic variation in eukaryotes, a fundamental distinction can be made between variation in regulatory (software) and coding (hardware) gene segments. For quantitative traits the bulk of variation, particularly that near the population mean, appears to reside in regulatory segments. The main exceptions to this rule concern proteins which handle extrinsic substances, here termed extrovert proteins. The immune system includes an unusually large proportion of this exceptional category, but even so its chief source of variation may well be polymorphism in regulatory gene segments. The main evidence for this view emerges from genome scanning for quantitative trait loci (QTL), which in the case of the immune system points to a major contribution of pro-inflammatory cytokine genes. Further support comes from sequencing of major histocompatibility complex (Mhc) class II promoters, where a high level of polymorphism has been detected. These Mhc promoters appear to act, in part at least, by gating the back-signal from T cells into antigen-presenting cells. Both these forms of polymorphism are likely to be sustained by the need for flexibility in the immune response. Future work on promoter polymorphism is likely to benefit from the input from genome informatics.
Identification of novel non-coding small RNAs from Streptococcus pneumoniae TIGR4 using high-resolution genome tiling arrays

PubMed Central

2010-01-01

Background The identification of non-coding transcripts in human, mouse, and Escherichia coli has revealed their widespread occurrence and functional importance in both eukaryotic and prokaryotic life. In prokaryotes, studies have shown that non-coding transcripts participate in a broad range of cellular functions like gene regulation, stress and virulence. However, very little is known about non-coding transcripts in Streptococcus pneumoniae (pneumococcus), an obligate human respiratory pathogen responsible for significant worldwide morbidity and mortality. Tiling microarrays enable genome wide mRNA profiling as well as identification of novel transcripts at a high-resolution. Results Here, we describe a high-resolution transcription map of the S. pneumoniae clinical isolate TIGR4 using genomic tiling arrays. Our results indicate that approximately 66% of the genome is expressed under our experimental conditions. We identified a total of 50 non-coding small RNAs (sRNAs) from the intergenic regions, of which 36 had no predicted function. Half of the identified sRNA sequences were found to be unique to S. pneumoniae genome. We identified eight overrepresented sequence motifs among sRNA sequences that correspond to sRNAs in different functional categories. Tiling arrays also identified approximately 202 operon structures in the genome. Conclusions In summary, the pneumococcal operon structures and novel sRNAs identified in this study enhance our understanding of the complexity and extent of the pneumococcal 'expressed' genome. Furthermore, the results of this study open up new avenues of research for understanding the complex RNA regulatory network governing S. pneumoniae physiology and virulence. PMID:20525227
Nencki Genomics Database—Ensembl funcgen enhanced with intersections, user data and genome-wide TFBS motifs

PubMed Central

Krystkowiak, Izabella; Lenart, Jakub; Debski, Konrad; Kuterba, Piotr; Petas, Michal; Kaminska, Bozena; Dabrowski, Michal

2013-01-01

We present the Nencki Genomics Database, which extends the functionality of Ensembl Regulatory Build (funcgen) for the three species: human, mouse and rat. The key enhancements over Ensembl funcgen include the following: (i) a user can add private data, analyze them alongside the public data and manage access rights; (ii) inside the database, we provide efficient procedures for computing intersections between regulatory features and for mapping them to the genes. To Ensembl funcgen-derived data, which include data from ENCODE, we add information on conserved non-coding (putative regulatory) sequences, and on genome-wide occurrence of transcription factor binding site motifs from the current versions of two major motif libraries, namely, Jaspar and Transfac. The intersections and mapping to the genes are pre-computed for the public data, and the result of any procedure run on the data added by the users is stored back into the database, thus incrementally increasing the body of pre-computed data. As the Ensembl funcgen schema for the rat is currently not populated, our database is the first database of regulatory features for this frequently used laboratory animal. The database is accessible without registration using the mysql client: mysql –h database.nencki-genomics.org –u public. Registration is required only to add or access private data. A WSDL webservice provides access to the database from any SOAP client, including the Taverna Workbench with a graphical user interface. Database URL: http://www.nencki-genomics.org. PMID:24089456
Genome-wide identification of conserved intronic non-coding sequences using a Bayesian segmentation approach.

PubMed

Algama, Manjula; Tasker, Edward; Williams, Caitlin; Parslow, Adam C; Bryson-Richardson, Robert J; Keith, Jonathan M

2017-03-27

Computational identification of non-coding RNAs (ncRNAs) is a challenging problem. We describe a genome-wide analysis using Bayesian segmentation to identify intronic elements highly conserved between three evolutionarily distant vertebrate species: human, mouse and zebrafish. We investigate the extent to which these elements include ncRNAs (or conserved domains of ncRNAs) and regulatory sequences. We identified 655 deeply conserved intronic sequences in a genome-wide analysis. We also performed a pathway-focussed analysis on genes involved in muscle development, detecting 27 intronic elements, of which 22 were not detected in the genome-wide analysis. At least 87% of the genome-wide and 70% of the pathway-focussed elements have existing annotations indicative of conserved RNA secondary structure. The expression of 26 of the pathway-focused elements was examined using RT-PCR, providing confirmation that they include expressed ncRNAs. Consistent with previous studies, these elements are significantly over-represented in the introns of transcription factors. This study demonstrates a novel, highly effective, Bayesian approach to identifying conserved non-coding sequences. Our results complement previous findings that these sequences are enriched in transcription factors. However, in contrast to previous studies which suggest the majority of conserved sequences are regulatory factor binding sites, the majority of conserved sequences identified using our approach contain evidence of conserved RNA secondary structures, and our laboratory results suggest most are expressed. Functional roles at DNA and RNA levels are not mutually exclusive, and many of our elements possess evidence of both. Moreover, ncRNAs play roles in transcriptional and post-transcriptional regulation, and this may contribute to the over-representation of these elements in introns of transcription factors. We attribute the higher sensitivity of the pathway-focussed analysis compared to the genome-wide analysis to improved alignment quality, suggesting that enhanced genomic alignments may reveal many more conserved intronic sequences.
Identification of cyanobacterial non-coding RNAs by comparative genome analysis.

PubMed

Axmann, Ilka M; Kensche, Philip; Vogel, Jörg; Kohl, Stefan; Herzel, Hanspeter; Hess, Wolfgang R

2005-01-01

Whole genome sequencing of marine cyanobacteria has revealed an unprecedented degree of genomic variation and streamlining. With a size of 1.66 megabase-pairs, Prochlorococcus sp. MED4 has the most compact of these genomes and it is enigmatic how the few identified regulatory proteins efficiently sustain the lifestyle of an ecologically successful marine microorganism. Small non-coding RNAs (ncRNAs) control a plethora of processes in eukaryotes as well as in bacteria; however, systematic searches for ncRNAs are still lacking for most eubacterial phyla outside the enterobacteria. Based on a computational prediction we show the presence of several ncRNAs (cyanobacterial functional RNA or Yfr) in several different cyanobacteria of the Prochlorococcus-Synechococcus lineage. Some ncRNA genes are present only in two or three of the four strains investigated, whereas the RNAs Yfr2 through Yfr5 are structurally highly related and are encoded by a rapidly evolving gene family as their genes exist in different copy numbers and at different sites in the four investigated genomes. One ncRNA, Yfr7, is present in at least seven other cyanobacteria. In addition, control elements for several ribosomal operons were predicted as well as riboswitches for thiamine pyrophosphate and cobalamin. This is the first genome-wide and systematic screen for ncRNAs in cyanobacteria. Several ncRNAs were both computationally predicted and their presence was biochemically verified. These RNAs may have regulatory functions and each shows a distinct phylogenetic distribution. Our approach can be applied to any group of microorganisms for which more than one total genome sequence is available for comparative analysis.
Regulatory element-based prediction identifies new susceptibility regulatory variants for osteoporosis.

PubMed

Yao, Shi; Guo, Yan; Dong, Shan-Shan; Hao, Ruo-Han; Chen, Xiao-Feng; Chen, Yi-Xiao; Chen, Jia-Bin; Tian, Qing; Deng, Hong-Wen; Yang, Tie-Lin

2017-08-01

Despite genome-wide association studies (GWASs) have identified many susceptibility genes for osteoporosis, it still leaves a large part of missing heritability to be discovered. Integrating regulatory information and GWASs could offer new insights into the biological link between the susceptibility SNPs and osteoporosis. We generated five machine learning classifiers with osteoporosis-associated variants and regulatory features data. We gained the optimal classifier and predicted genome-wide SNPs to discover susceptibility regulatory variants. We further utilized Genetic Factors for Osteoporosis Consortium (GEFOS) and three in-house GWASs samples to validate the associations for predicted positive SNPs. The random forest classifier performed best among all machine learning methods with the F1 score of 0.8871. Using the optimized model, we predicted 37,584 candidate SNPs for osteoporosis. According to the meta-analysis results, a list of regulatory variants was significantly associated with osteoporosis after multiple testing corrections and contributed to the expression of known osteoporosis-associated protein-coding genes. In summary, combining GWASs and regulatory elements through machine learning could provide additional information for understanding the mechanism of osteoporosis. The regulatory variants we predicted will provide novel targets for etiology research and treatment of osteoporosis.
Integrating evolutionary and regulatory information with a multispecies approach implicates genes and pathways in obsessive-compulsive disorder.

PubMed

Noh, Hyun Ji; Tang, Ruqi; Flannick, Jason; O'Dushlaine, Colm; Swofford, Ross; Howrigan, Daniel; Genereux, Diane P; Johnson, Jeremy; van Grootheest, Gerard; Grünblatt, Edna; Andersson, Erik; Djurfeldt, Diana R; Patel, Paresh D; Koltookian, Michele; M Hultman, Christina; Pato, Michele T; Pato, Carlos N; Rasmussen, Steven A; Jenike, Michael A; Hanna, Gregory L; Stewart, S Evelyn; Knowles, James A; Ruhrmann, Stephan; Grabe, Hans-Jörgen; Wagner, Michael; Rück, Christian; Mathews, Carol A; Walitza, Susanne; Cath, Daniëlle C; Feng, Guoping; Karlsson, Elinor K; Lindblad-Toh, Kerstin

2017-10-17

Obsessive-compulsive disorder is a severe psychiatric disorder linked to abnormalities in glutamate signaling and the cortico-striatal circuit. We sequenced coding and regulatory elements for 608 genes potentially involved in obsessive-compulsive disorder in human, dog, and mouse. Using a new method that prioritizes likely functional variants, we compared 592 cases to 560 controls and found four strongly associated genes, validated in a larger cohort. NRXN1 and HTR2A are enriched for coding variants altering postsynaptic protein-binding domains. CTTNBP2 (synapse maintenance) and REEP3 (vesicle trafficking) are enriched for regulatory variants, of which at least six (35%) alter transcription factor-DNA binding in neuroblastoma cells. NRXN1 achieves genome-wide significance (p = 6.37 × 10 -11 ) when we include 33,370 population-matched controls. Our findings suggest synaptic adhesion as a key component in compulsive behaviors, and show that targeted sequencing plus functional annotation can identify potentially causative variants, even when genomic data are limited.Obsessive-compulsive disorder (OCD) is a neuropsychiatric disorder with symptoms including intrusive thoughts and time-consuming repetitive behaviors. Here Noh and colleagues identify genes enriched for functional variants associated with increased risk of OCD.
CHiCP: a web-based tool for the integrative and interactive visualization of promoter capture Hi-C datasets.

PubMed

Schofield, E C; Carver, T; Achuthan, P; Freire-Pritchett, P; Spivakov, M; Todd, J A; Burren, O S

2016-08-15

Promoter capture Hi-C (PCHi-C) allows the genome-wide interrogation of physical interactions between distal DNA regulatory elements and gene promoters in multiple tissue contexts. Visual integration of the resultant chromosome interaction maps with other sources of genomic annotations can provide insight into underlying regulatory mechanisms. We have developed Capture HiC Plotter (CHiCP), a web-based tool that allows interactive exploration of PCHi-C interaction maps and integration with both public and user-defined genomic datasets. CHiCP is freely accessible from www.chicp.org and supports most major HTML5 compliant web browsers. Full source code and installation instructions are available from http://github.com/D-I-L/django-chicp ob219@cam.ac.uk. © The Author 2016. Published by Oxford University Press. All rights reserved.
CisMiner: Genome-Wide In-Silico Cis-Regulatory Module Prediction by Fuzzy Itemset Mining

PubMed Central

Navarro, Carmen; Lopez, Francisco J.; Cano, Carlos; Garcia-Alcalde, Fernando; Blanco, Armando

2014-01-01

Eukaryotic gene control regions are known to be spread throughout non-coding DNA sequences which may appear distant from the gene promoter. Transcription factors are proteins that coordinately bind to these regions at transcription factor binding sites to regulate gene expression. Several tools allow to detect significant co-occurrences of closely located binding sites (cis-regulatory modules, CRMs). However, these tools present at least one of the following limitations: 1) scope limited to promoter or conserved regions of the genome; 2) do not allow to identify combinations involving more than two motifs; 3) require prior information about target motifs. In this work we present CisMiner, a novel methodology to detect putative CRMs by means of a fuzzy itemset mining approach able to operate at genome-wide scale. CisMiner allows to perform a blind search of CRMs without any prior information about target CRMs nor limitation in the number of motifs. CisMiner tackles the combinatorial complexity of genome-wide cis-regulatory module extraction using a natural representation of motif combinations as itemsets and applying the Top-Down Fuzzy Frequent- Pattern Tree algorithm to identify significant itemsets. Fuzzy technology allows CisMiner to better handle the imprecision and noise inherent to regulatory processes. Results obtained for a set of well-known binding sites in the S. cerevisiae genome show that our method yields highly reliable predictions. Furthermore, CisMiner was also applied to putative in-silico predicted transcription factor binding sites to identify significant combinations in S. cerevisiae and D. melanogaster, proving that our approach can be further applied genome-wide to more complex genomes. CisMiner is freely accesible at: http://genome2.ugr.es/cisminer. CisMiner can be queried for the results presented in this work and can also perform a customized cis-regulatory module prediction on a query set of transcription factor binding sites provided by the user. PMID:25268582
De novo mutations in regulatory elements in neurodevelopmental disorders

PubMed Central

Short, Patrick J.; McRae, Jeremy F.; Gallone, Giuseppe; Sifrim, Alejandro; Won, Hyejung; Geschwind, Daniel H.; Wright, Caroline F.; Firth, Helen V; FitzPatrick, David R.; Barrett, Jeffrey C.; Hurles, Matthew E.

2018-01-01

We previously estimated that 42% of patients with severe developmental disorders carry pathogenic de novo mutations in coding sequences. The role of de novo mutations in regulatory elements affecting genes associated with developmental disorders, or other genes, has been essentially unexplored. We identified de novo mutations in three classes of putative regulatory elements in almost 8,000 patients with developmental disorders. Here we show that de novo mutations in highly evolutionarily conserved fetal brain-active elements are significantly and specifically enriched in neurodevelopmental disorders. We identified a significant twofold enrichment of recurrently mutated elements. We estimate that, genome-wide, 1-3% of patients without a diagnostic coding variant carry pathogenic de novo mutations in fetal brain-active regulatory elements and that only 0.15% of all possible mutations within highly conserved fetal brain-active elements cause neurodevelopmental disorders with a dominant mechanism. Our findings represent a robust estimate of the contribution of de novo mutations in regulatory elements to this genetically heterogeneous set of disorders, and emphasize the importance of combining functional and evolutionary evidence to identify regulatory causes of genetic disorders. PMID:29562236
Modularity and design principles in the sea urchin embryo gene regulatory network

PubMed Central

Peter, Isabelle S.; Davidson, Eric H.

2010-01-01

The gene regulatory network (GRN) established experimentally for the pre-gastrular sea urchin embryo provides causal explanations of the biological functions required for spatial specification of embryonic regulatory states. Here we focus on the structure of the GRN which controls the progressive increase in complexity of territorial regulatory states during embryogenesis; and on the types of modular subcircuits of which the GRN is composed. Each of these subcircuit topologies executes a particular operation of spatial information processing. The GRN architecture reflects the particular mode of embryogenesis represented by sea urchin development. Network structure not only specifies the linkages constituting the genomic regulatory code for development, but also indicates the various regulatory requirements of regional developmental processes. PMID:19932099
Living Organisms Author Their Read-Write Genomes in Evolution.

PubMed

Shapiro, James A

2017-12-06

Evolutionary variations generating phenotypic adaptations and novel taxa resulted from complex cellular activities altering genome content and expression: (i) Symbiogenetic cell mergers producing the mitochondrion-bearing ancestor of eukaryotes and chloroplast-bearing ancestors of photosynthetic eukaryotes; (ii) interspecific hybridizations and genome doublings generating new species and adaptive radiations of higher plants and animals; and, (iii) interspecific horizontal DNA transfer encoding virtually all of the cellular functions between organisms and their viruses in all domains of life. Consequently, assuming that evolutionary processes occur in isolated genomes of individual species has become an unrealistic abstraction. Adaptive variations also involved natural genetic engineering of mobile DNA elements to rewire regulatory networks. In the most highly evolved organisms, biological complexity scales with "non-coding" DNA content more closely than with protein-coding capacity. Coincidentally, we have learned how so-called "non-coding" RNAs that are rich in repetitive mobile DNA sequences are key regulators of complex phenotypes. Both biotic and abiotic ecological challenges serve as triggers for episodes of elevated genome change. The intersections of cell activities, biosphere interactions, horizontal DNA transfers, and non-random Read-Write genome modifications by natural genetic engineering provide a rich molecular and biological foundation for understanding how ecological disruptions can stimulate productive, often abrupt, evolutionary transformations.
The identification of cis-regulatory elements: A review from a machine learning perspective.

PubMed

Li, Yifeng; Chen, Chih-Yu; Kaye, Alice M; Wasserman, Wyeth W

2015-12-01

The majority of the human genome consists of non-coding regions that have been called junk DNA. However, recent studies have unveiled that these regions contain cis-regulatory elements, such as promoters, enhancers, silencers, insulators, etc. These regulatory elements can play crucial roles in controlling gene expressions in specific cell types, conditions, and developmental stages. Disruption to these regions could contribute to phenotype changes. Precisely identifying regulatory elements is key to deciphering the mechanisms underlying transcriptional regulation. Cis-regulatory events are complex processes that involve chromatin accessibility, transcription factor binding, DNA methylation, histone modifications, and the interactions between them. The development of next-generation sequencing techniques has allowed us to capture these genomic features in depth. Applied analysis of genome sequences for clinical genetics has increased the urgency for detecting these regions. However, the complexity of cis-regulatory events and the deluge of sequencing data require accurate and efficient computational approaches, in particular, machine learning techniques. In this review, we describe machine learning approaches for predicting transcription factor binding sites, enhancers, and promoters, primarily driven by next-generation sequencing data. Data sources are provided in order to facilitate testing of novel methods. The purpose of this review is to attract computational experts and data scientists to advance this field. Crown Copyright © 2015. Published by Elsevier Ireland Ltd. All rights reserved.
Transposable Elements and DNA Methylation Create in Embryonic Stem Cells Human-Specific Regulatory Sequences Associated with Distal Enhancers and Noncoding RNAs

PubMed Central

Glinsky, Gennadi V.

2015-01-01

Despite significant progress in the structural and functional characterization of the human genome, understanding of the mechanisms underlying the genetic basis of human phenotypic uniqueness remains limited. Here, I report that transposable element-derived sequences, most notably LTR7/HERV-H, LTR5_Hs, and L1HS, harbor 99.8% of the candidate human-specific regulatory loci (HSRL) with putative transcription factor-binding sites in the genome of human embryonic stem cells (hESC). A total of 4,094 candidate HSRL display selective and site-specific binding of critical regulators (NANOG [Nanog homeobox], POU5F1 [POU class 5 homeobox 1], CCCTC-binding factor [CTCF], Lamin B1), and are preferentially located within the matrix of transcriptionally active DNA segments that are hypermethylated in hESC. hESC-specific NANOG-binding sites are enriched near the protein-coding genes regulating brain size, pluripotency long noncoding RNAs, hESC enhancers, and 5-hydroxymethylcytosine-harboring regions immediately adjacent to binding sites. Sequences of only 4.3% of hESC-specific NANOG-binding sites are present in Neanderthals’ genome, suggesting that a majority of these regulatory elements emerged in Modern Humans. Comparisons of estimated creation rates of novel TF-binding sites revealed that there was 49.7-fold acceleration of creation rates of NANOG-binding sites in genomes of Chimpanzees compared with the mouse genomes and further 5.7-fold acceleration in genomes of Modern Humans compared with the Chimpanzees genomes. Preliminary estimates suggest that emergence of one novel NANOG-binding site detectable in hESC required 466 years of evolution. Pathway analysis of coding genes that have hESC-specific NANOG-binding sites within gene bodies or near gene boundaries revealed their association with physiological development and functions of nervous and cardiovascular systems, embryonic development, behavior, as well as development of a diverse spectrum of pathological conditions such as cancer, diseases of cardiovascular and reproductive systems, metabolic diseases, multiple neurological and psychological disorders. A proximity placement model is proposed explaining how a 33–47% excess of NANOG, CTCF, and POU5F1 proteins immobilized on a DNA scaffold may play a functional role at distal regulatory elements. PMID:25956794

Nencki Genomics Database--Ensembl funcgen enhanced with intersections, user data and genome-wide TFBS motifs.

PubMed

Krystkowiak, Izabella; Lenart, Jakub; Debski, Konrad; Kuterba, Piotr; Petas, Michal; Kaminska, Bozena; Dabrowski, Michal

2013-01-01

We present the Nencki Genomics Database, which extends the functionality of Ensembl Regulatory Build (funcgen) for the three species: human, mouse and rat. The key enhancements over Ensembl funcgen include the following: (i) a user can add private data, analyze them alongside the public data and manage access rights; (ii) inside the database, we provide efficient procedures for computing intersections between regulatory features and for mapping them to the genes. To Ensembl funcgen-derived data, which include data from ENCODE, we add information on conserved non-coding (putative regulatory) sequences, and on genome-wide occurrence of transcription factor binding site motifs from the current versions of two major motif libraries, namely, Jaspar and Transfac. The intersections and mapping to the genes are pre-computed for the public data, and the result of any procedure run on the data added by the users is stored back into the database, thus incrementally increasing the body of pre-computed data. As the Ensembl funcgen schema for the rat is currently not populated, our database is the first database of regulatory features for this frequently used laboratory animal. The database is accessible without registration using the mysql client: mysql -h database.nencki-genomics.org -u public. Registration is required only to add or access private data. A WSDL webservice provides access to the database from any SOAP client, including the Taverna Workbench with a graphical user interface.
Genome-guided exploration of metabolic features of Streptomyces peucetius ATCC 27952: past, current, and prospect.

PubMed

Thuan, Nguyen Huy; Dhakal, Dipesh; Pokhrel, Anaya Raj; Chu, Luan Luong; Van Pham, Thi Thuy; Shrestha, Anil; Sohng, Jae Kyung

2018-05-01

Streptomyces peucetius ATCC 27952 produces two major anthracyclines, doxorubicin (DXR) and daunorubicin (DNR), which are potent chemotherapeutic agents for the treatment of several cancers. In order to gain detailed insight on genetics and biochemistry of the strain, the complete genome was determined and analyzed. The result showed that its complete sequence contains 7187 protein coding genes in a total of 8,023,114 bp, whereas 87% of the genome contributed to the protein coding region. The genomic sequence included 18 rRNA, 66 tRNAs, and 3 non-coding RNAs. In silico studies predicted ~ 68 biosynthetic gene clusters (BCGs) encoding diverse classes of secondary metabolites, including non-ribosomal polyketide synthase (NRPS), polyketide synthase (PKS I, II, and III), terpenes, and others. Detailed analysis of the genome sequence revealed versatile biocatalytic enzymes such as cytochrome P450 (CYP), electron transfer systems (ETS) genes, methyltransferase (MT), glycosyltransferase (GT). In addition, numerous functional genes (transporter gene, SOD, etc.) and regulatory genes (afsR-sp, metK-sp, etc.) involved in the regulation of secondary metabolites were found. This minireview summarizes the genome-based genome mining (GM) of diverse BCGs and genome exploration (GE) of versatile biocatalytic enzymes, and other enzymes involved in maintenance and regulation of metabolism of S. peucetius. The detailed analysis of genome sequence provides critically important knowledge useful in the bioengineering of the strain or harboring catalytically efficient enzymes for biotechnological applications.
Non-coding RNAs as regulators of gene expression and epigenetics

PubMed Central

Kaikkonen, Minna U.; Lam, Michael T.Y.; Glass, Christopher K.

2011-01-01

Genome-wide studies have revealed that mammalian genomes are pervasively transcribed. This has led to the identification and isolation of novel classes of non-coding RNAs (ncRNAs) that influence gene expression by a variety of mechanisms. Here we review the characteristics and functions of regulatory ncRNAs in chromatin remodelling and at multiple levels of transcriptional and post-transcriptional regulation. We also describe the potential roles of ncRNAs in vascular biology and in mediating epigenetic modifications that might play roles in cardiovascular disease susceptibility. The emerging recognition of the diverse functions of ncRNAs in regulation of gene expression suggests that they may represent new targets for therapeutic intervention. PMID:21558279
Divergent genome evolution caused by regional variation in DNA gain and loss between human and mouse

PubMed Central

Kortschak, R. Daniel

2018-01-01

The forces driving the accumulation and removal of non-coding DNA and ultimately the evolution of genome size in complex organisms are intimately linked to genome structure and organisation. Our analysis provides a novel method for capturing the regional variation of lineage-specific DNA gain and loss events in their respective genomic contexts. To further understand this connection we used comparative genomics to identify genome-wide individual DNA gain and loss events in the human and mouse genomes. Focusing on the distribution of DNA gains and losses, relationships to important structural features and potential impact on biological processes, we found that in autosomes, DNA gains and losses both followed separate lineage-specific accumulation patterns. However, in both species chromosome X was particularly enriched for DNA gain, consistent with its high L1 retrotransposon content required for X inactivation. We found that DNA loss was associated with gene-rich open chromatin regions and DNA gain events with gene-poor closed chromatin regions. Additionally, we found that DNA loss events tended to be smaller than DNA gain events suggesting that they were able to accumulate in gene-rich open chromatin regions due to their reduced capacity to interrupt gene regulatory architecture. GO term enrichment showed that mouse loss hotspots were strongly enriched for terms related to developmental processes. However, these genes were also located in regions with a high density of conserved elements, suggesting that despite high levels of DNA loss, gene regulatory architecture remained conserved. This is consistent with a model in which DNA gain and loss results in turnover or “churning” in regulatory element dense regions of open chromatin, where interruption of regulatory elements is selected against. PMID:29677183
Transposable elements and G-quadruplexes.

PubMed

Kejnovsky, Eduard; Tokan, Viktor; Lexa, Matej

2015-09-01

A significant part of eukaryotic genomes is formed by transposable elements (TEs) containing not only genes but also regulatory sequences. Some of the regulatory sequences located within TEs can form secondary structures like hairpins or three-stranded (triplex DNA) and four-stranded (quadruplex DNA) conformations. This review focuses on recent evidence showing that G-quadruplex-forming sequences in particular are often present in specific parts of TEs in plants and humans. We discuss the potential role of these structures in the TE life cycle as well as the impact of G-quadruplexes on replication, transcription, translation, chromatin status, and recombination. The aim of this review is to emphasize that TEs may serve as vehicles for the genomic spread of G-quadruplexes. These non-canonical DNA structures and their conformational switches may constitute another regulatory system that, together with small and long non-coding RNA molecules and proteins, contribute to the complex cellular network resulting in the large diversity of eukaryotes.
Identification and Potential Regulatory Properties of Evolutionary Conserved Regions (ECRs) at the Schizophrenia-Associated MIR137 Locus.

PubMed

Gianfrancesco, Olympia; Griffiths, Daniel; Myers, Paul; Collier, David A; Bubb, Vivien J; Quinn, John P

2016-10-01

Genome-wide association studies (GWAS) have identified a region at chromosome 1p21.3, containing the microRNA MIR137, to be among the most significant associations for schizophrenia. However, the mechanism by which genetic variation at this locus increases risk of schizophrenia is unknown. Identifying key regulatory regions around MIR137 is crucial to understanding the potential role of this gene in the aetiology of psychiatric disorders. Through alignment of vertebrate genomes, we identified seven non-coding regions at the MIR137 locus with conservation comparable to exons (>70 %). Bioinformatic analysis using the Psychiatric Genomics Consortium GWAS dataset for schizophrenia showed five of the ECRs to have genome-wide significant SNPs in or adjacent to their sequence. Analysis of available datasets on chromatin marks and histone modification data showed that three of the ECRs were predicted to be functional in the human brain, and three in development. In vitro analysis of ECR activity using reporter gene assays showed that all seven of the selected ECRs displayed transcriptional regulatory activity in the SH-SY5Y neuroblastoma cell line. This data suggests a regulatory role in the developing and adult brain for these highly conserved regions at the MIR137 schizophrenia-associated locus and further that these domains could act individually or synergistically to regulate levels of MIR137 expression.
A Positive Regulatory Loop between a Wnt-Regulated Non-coding RNA and ASCL2 Controls Intestinal Stem Cell Fate.

PubMed

Giakountis, Antonis; Moulos, Panagiotis; Zarkou, Vasiliki; Oikonomou, Christina; Harokopos, Vaggelis; Hatzigeorgiou, Artemis G; Reczko, Martin; Hatzis, Pantelis

2016-06-21

The canonical Wnt pathway plays a central role in stem cell maintenance, differentiation, and proliferation in the intestinal epithelium. Constitutive, aberrant activity of the TCF4/β-catenin transcriptional complex is the primary transforming factor in colorectal cancer. We identify a nuclear long non-coding RNA, termed WiNTRLINC1, as a direct target of TCF4/β-catenin in colorectal cancer cells. WiNTRLINC1 positively regulates the expression of its genomic neighbor ASCL2, a transcription factor that controls intestinal stem cell fate. WiNTRLINC1 interacts with TCF4/β-catenin to mediate the juxtaposition of its promoter with the regulatory regions of ASCL2. ASCL2, in turn, regulates WiNTRLINC1 transcriptionally, closing a feedforward regulatory loop that controls stem cell-related gene expression. This regulatory circuitry is highly amplified in colorectal cancer and correlates with increased metastatic potential and decreased patient survival. Our results uncover the interplay between non-coding RNA-mediated regulation and Wnt signaling and point to the diagnostic and therapeutic potential of WiNTRLINC1. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.
A Comparative Encyclopedia of DNA Elements in the Mouse Genome

PubMed Central

Yue, Feng; Cheng, Yong; Breschi, Alessandra; Vierstra, Jeff; Wu, Weisheng; Ryba, Tyrone; Sandstrom, Richard; Ma, Zhihai; Davis, Carrie; Pope, Benjamin D.; Shen, Yin; Pervouchine, Dmitri D.; Djebali, Sarah; Thurman, Bob; Kaul, Rajinder; Rynes, Eric; Kirilusha, Anthony; Marinov, Georgi K.; Williams, Brian A.; Trout, Diane; Amrhein, Henry; Fisher-Aylor, Katherine; Antoshechkin, Igor; DeSalvo, Gilberto; See, Lei-Hoon; Fastuca, Meagan; Drenkow, Jorg; Zaleski, Chris; Dobin, Alex; Prieto, Pablo; Lagarde, Julien; Bussotti, Giovanni; Tanzer, Andrea; Denas, Olgert; Li, Kanwei; Bender, M. A.; Zhang, Miaohua; Byron, Rachel; Groudine, Mark T.; McCleary, David; Pham, Long; Ye, Zhen; Kuan, Samantha; Edsall, Lee; Wu, Yi-Chieh; Rasmussen, Matthew D.; Bansal, Mukul S.; Keller, Cheryl A.; Morrissey, Christapher S.; Mishra, Tejaswini; Jain, Deepti; Dogan, Nergiz; Harris, Robert S.; Cayting, Philip; Kawli, Trupti; Boyle, Alan P.; Euskirchen, Ghia; Kundaje, Anshul; Lin, Shin; Lin, Yiing; Jansen, Camden; Malladi, Venkat S.; Cline, Melissa S.; Erickson, Drew T.; Kirkup, Vanessa M; Learned, Katrina; Sloan, Cricket A.; Rosenbloom, Kate R.; de Sousa, Beatriz Lacerda; Beal, Kathryn; Pignatelli, Miguel; Flicek, Paul; Lian, Jin; Kahveci, Tamer; Lee, Dongwon; Kent, W. James; Santos, Miguel Ramalho; Herrero, Javier; Notredame, Cedric; Johnson, Audra; Vong, Shinny; Lee, Kristen; Bates, Daniel; Neri, Fidencio; Diegel, Morgan; Canfield, Theresa; Sabo, Peter J.; Wilken, Matthew S.; Reh, Thomas A.; Giste, Erika; Shafer, Anthony; Kutyavin, Tanya; Haugen, Eric; Dunn, Douglas; Reynolds, Alex P.; Neph, Shane; Humbert, Richard; Hansen, R. Scott; De Bruijn, Marella; Selleri, Licia; Rudensky, Alexander; Josefowicz, Steven; Samstein, Robert; Eichler, Evan E.; Orkin, Stuart H.; Levasseur, Dana; Papayannopoulou, Thalia; Chang, Kai-Hsin; Skoultchi, Arthur; Gosh, Srikanta; Disteche, Christine; Treuting, Piper; Wang, Yanli; Weiss, Mitchell J.; Blobel, Gerd A.; Good, Peter J.; Lowdon, Rebecca F.; Adams, Leslie B.; Zhou, Xiao-Qiao; Pazin, Michael J.; Feingold, Elise A.; Wold, Barbara; Taylor, James; Kellis, Manolis; Mortazavi, Ali; Weissman, Sherman M.; Stamatoyannopoulos, John; Snyder, Michael P.; Guigo, Roderic; Gingeras, Thomas R.; Gilbert, David M.; Hardison, Ross C.; Beer, Michael A.; Ren, Bing

2014-01-01

Summary As the premier model organism in biomedical research, the laboratory mouse shares the majority of protein-coding genes with humans, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications, and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of other sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases. PMID:25409824
A comparative encyclopedia of DNA elements in the mouse genome.

PubMed

Yue, Feng; Cheng, Yong; Breschi, Alessandra; Vierstra, Jeff; Wu, Weisheng; Ryba, Tyrone; Sandstrom, Richard; Ma, Zhihai; Davis, Carrie; Pope, Benjamin D; Shen, Yin; Pervouchine, Dmitri D; Djebali, Sarah; Thurman, Robert E; Kaul, Rajinder; Rynes, Eric; Kirilusha, Anthony; Marinov, Georgi K; Williams, Brian A; Trout, Diane; Amrhein, Henry; Fisher-Aylor, Katherine; Antoshechkin, Igor; DeSalvo, Gilberto; See, Lei-Hoon; Fastuca, Meagan; Drenkow, Jorg; Zaleski, Chris; Dobin, Alex; Prieto, Pablo; Lagarde, Julien; Bussotti, Giovanni; Tanzer, Andrea; Denas, Olgert; Li, Kanwei; Bender, M A; Zhang, Miaohua; Byron, Rachel; Groudine, Mark T; McCleary, David; Pham, Long; Ye, Zhen; Kuan, Samantha; Edsall, Lee; Wu, Yi-Chieh; Rasmussen, Matthew D; Bansal, Mukul S; Kellis, Manolis; Keller, Cheryl A; Morrissey, Christapher S; Mishra, Tejaswini; Jain, Deepti; Dogan, Nergiz; Harris, Robert S; Cayting, Philip; Kawli, Trupti; Boyle, Alan P; Euskirchen, Ghia; Kundaje, Anshul; Lin, Shin; Lin, Yiing; Jansen, Camden; Malladi, Venkat S; Cline, Melissa S; Erickson, Drew T; Kirkup, Vanessa M; Learned, Katrina; Sloan, Cricket A; Rosenbloom, Kate R; Lacerda de Sousa, Beatriz; Beal, Kathryn; Pignatelli, Miguel; Flicek, Paul; Lian, Jin; Kahveci, Tamer; Lee, Dongwon; Kent, W James; Ramalho Santos, Miguel; Herrero, Javier; Notredame, Cedric; Johnson, Audra; Vong, Shinny; Lee, Kristen; Bates, Daniel; Neri, Fidencio; Diegel, Morgan; Canfield, Theresa; Sabo, Peter J; Wilken, Matthew S; Reh, Thomas A; Giste, Erika; Shafer, Anthony; Kutyavin, Tanya; Haugen, Eric; Dunn, Douglas; Reynolds, Alex P; Neph, Shane; Humbert, Richard; Hansen, R Scott; De Bruijn, Marella; Selleri, Licia; Rudensky, Alexander; Josefowicz, Steven; Samstein, Robert; Eichler, Evan E; Orkin, Stuart H; Levasseur, Dana; Papayannopoulou, Thalia; Chang, Kai-Hsin; Skoultchi, Arthur; Gosh, Srikanta; Disteche, Christine; Treuting, Piper; Wang, Yanli; Weiss, Mitchell J; Blobel, Gerd A; Cao, Xiaoyi; Zhong, Sheng; Wang, Ting; Good, Peter J; Lowdon, Rebecca F; Adams, Leslie B; Zhou, Xiao-Qiao; Pazin, Michael J; Feingold, Elise A; Wold, Barbara; Taylor, James; Mortazavi, Ali; Weissman, Sherman M; Stamatoyannopoulos, John A; Snyder, Michael P; Guigo, Roderic; Gingeras, Thomas R; Gilbert, David M; Hardison, Ross C; Beer, Michael A; Ren, Bing

2014-11-20

The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.
Chromatin accessibility prediction via a hybrid deep convolutional neural network.

PubMed

Liu, Qiao; Xia, Fei; Yin, Qijin; Jiang, Rui

2018-03-01

A majority of known genetic variants associated with human-inherited diseases lie in non-coding regions that lack adequate interpretation, making it indispensable to systematically discover functional sites at the whole genome level and precisely decipher their implications in a comprehensive manner. Although computational approaches have been complementing high-throughput biological experiments towards the annotation of the human genome, it still remains a big challenge to accurately annotate regulatory elements in the context of a specific cell type via automatic learning of the DNA sequence code from large-scale sequencing data. Indeed, the development of an accurate and interpretable model to learn the DNA sequence signature and further enable the identification of causative genetic variants has become essential in both genomic and genetic studies. We proposed Deopen, a hybrid framework mainly based on a deep convolutional neural network, to automatically learn the regulatory code of DNA sequences and predict chromatin accessibility. In a series of comparison with existing methods, we show the superior performance of our model in not only the classification of accessible regions against background sequences sampled at random, but also the regression of DNase-seq signals. Besides, we further visualize the convolutional kernels and show the match of identified sequence signatures and known motifs. We finally demonstrate the sensitivity of our model in finding causative noncoding variants in the analysis of a breast cancer dataset. We expect to see wide applications of Deopen with either public or in-house chromatin accessibility data in the annotation of the human genome and the identification of non-coding variants associated with diseases. Deopen is freely available at https://github.com/kimmo1019/Deopen. ruijiang@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Computational Approaches to Identify Promoters and cis-Regulatory Elements in Plant Genomes1

PubMed Central

Rombauts, Stephane; Florquin, Kobe; Lescot, Magali; Marchal, Kathleen; Rouzé, Pierre; Van de Peer, Yves

2003-01-01

The identification of promoters and their regulatory elements is one of the major challenges in bioinformatics and integrates comparative, structural, and functional genomics. Many different approaches have been developed to detect conserved motifs in a set of genes that are either coregulated or orthologous. However, although recent approaches seem promising, in general, unambiguous identification of regulatory elements is not straightforward. The delineation of promoters is even harder, due to its complex nature, and in silico promoter prediction is still in its infancy. Here, we review the different approaches that have been developed for identifying promoters and their regulatory elements. We discuss the detection of cis-acting regulatory elements using word-counting or probabilistic methods (so-called “search by signal” methods) and the delineation of promoters by considering both sequence content and structural features (“search by content” methods). As an example of search by content, we explored in greater detail the association of promoters with CpG islands. However, due to differences in sequence content, the parameters used to detect CpG islands in humans and other vertebrates cannot be used for plants. Therefore, a preliminary attempt was made to define parameters that could possibly define CpG and CpNpG islands in Arabidopsis, by exploring the compositional landscape around the transcriptional start site. To this end, a data set of more than 5,000 gene sequences was built, including the promoter region, the 5′-untranslated region, and the first introns and coding exons. Preliminary analysis shows that promoter location based on the detection of potential CpG/CpNpG islands in the Arabidopsis genome is not straightforward. Nevertheless, because the landscape of CpG/CpNpG islands differs considerably between promoters and introns on the one side and exons (whether coding or not) on the other, more sophisticated approaches can probably be developed for the successful detection of “putative” CpG and CpNpG islands in plants. PMID:12857799
The Ever-Evolving Concept of the Gene: The Use of RNA/Protein Experimental Techniques to Understand Genome Functions

PubMed Central

Cipriano, Andrea; Ballarino, Monica

2018-01-01

The completion of the human genome sequence together with advances in sequencing technologies have shifted the paradigm of the genome, as composed of discrete and hereditable coding entities, and have shown the abundance of functional noncoding DNA. This part of the genome, previously dismissed as “junk” DNA, increases proportionally with organismal complexity and contributes to gene regulation beyond the boundaries of known protein-coding genes. Different classes of functionally relevant nonprotein-coding RNAs are transcribed from noncoding DNA sequences. Among them are the long noncoding RNAs (lncRNAs), which are thought to participate in the basal regulation of protein-coding genes at both transcriptional and post-transcriptional levels. Although knowledge of this field is still limited, the ability of lncRNAs to localize in different cellular compartments, to fold into specific secondary structures and to interact with different molecules (RNA or proteins) endows them with multiple regulatory mechanisms. It is becoming evident that lncRNAs may play a crucial role in most biological processes such as the control of development, differentiation and cell growth. This review places the evolution of the concept of the gene in its historical context, from Darwin's hypothetical mechanism of heredity to the post-genomic era. We discuss how the original idea of protein-coding genes as unique determinants of phenotypic traits has been reconsidered in light of the existence of noncoding RNAs. We summarize the technological developments which have been made in the genome-wide identification and study of lncRNAs and emphasize the methodologies that have aided our understanding of the complexity of lncRNA-protein interactions in recent years. PMID:29560353
The genomic substrate for adaptive radiation in African cichlid fish.

PubMed

Brawand, David; Wagner, Catherine E; Li, Yang I; Malinsky, Milan; Keller, Irene; Fan, Shaohua; Simakov, Oleg; Ng, Alvin Y; Lim, Zhi Wei; Bezault, Etienne; Turner-Maier, Jason; Johnson, Jeremy; Alcazar, Rosa; Noh, Hyun Ji; Russell, Pamela; Aken, Bronwen; Alföldi, Jessica; Amemiya, Chris; Azzouzi, Naoual; Baroiller, Jean-François; Barloy-Hubler, Frederique; Berlin, Aaron; Bloomquist, Ryan; Carleton, Karen L; Conte, Matthew A; D'Cotta, Helena; Eshel, Orly; Gaffney, Leslie; Galibert, Francis; Gante, Hugo F; Gnerre, Sante; Greuter, Lucie; Guyon, Richard; Haddad, Natalie S; Haerty, Wilfried; Harris, Rayna M; Hofmann, Hans A; Hourlier, Thibaut; Hulata, Gideon; Jaffe, David B; Lara, Marcia; Lee, Alison P; MacCallum, Iain; Mwaiko, Salome; Nikaido, Masato; Nishihara, Hidenori; Ozouf-Costaz, Catherine; Penman, David J; Przybylski, Dariusz; Rakotomanga, Michaelle; Renn, Suzy C P; Ribeiro, Filipe J; Ron, Micha; Salzburger, Walter; Sanchez-Pulido, Luis; Santos, M Emilia; Searle, Steve; Sharpe, Ted; Swofford, Ross; Tan, Frederick J; Williams, Louise; Young, Sarah; Yin, Shuangye; Okada, Norihiro; Kocher, Thomas D; Miska, Eric A; Lander, Eric S; Venkatesh, Byrappa; Fernald, Russell D; Meyer, Axel; Ponting, Chris P; Streelman, J Todd; Lindblad-Toh, Kerstin; Seehausen, Ole; Di Palma, Federica

2014-09-18

Cichlid fishes are famous for large, diverse and replicated adaptive radiations in the Great Lakes of East Africa. To understand the molecular mechanisms underlying cichlid phenotypic diversity, we sequenced the genomes and transcriptomes of five lineages of African cichlids: the Nile tilapia (Oreochromis niloticus), an ancestral lineage with low diversity; and four members of the East African lineage: Neolamprologus brichardi/pulcher (older radiation, Lake Tanganyika), Metriaclima zebra (recent radiation, Lake Malawi), Pundamilia nyererei (very recent radiation, Lake Victoria), and Astatotilapia burtoni (riverine species around Lake Tanganyika). We found an excess of gene duplications in the East African lineage compared to tilapia and other teleosts, an abundance of non-coding element divergence, accelerated coding sequence evolution, expression divergence associated with transposable element insertions, and regulation by novel microRNAs. In addition, we analysed sequence data from sixty individuals representing six closely related species from Lake Victoria, and show genome-wide diversifying selection on coding and regulatory variants, some of which were recruited from ancient polymorphisms. We conclude that a number of molecular mechanisms shaped East African cichlid genomes, and that amassing of standing variation during periods of relaxed purifying selection may have been important in facilitating subsequent evolutionary diversification.
The genomic substrate for adaptive radiation in African cichlid fish

PubMed Central

Malinsky, Milan; Keller, Irene; Fan, Shaohua; Simakov, Oleg; Ng, Alvin Y.; Lim, Zhi Wei; Bezault, Etienne; Turner-Maier, Jason; Johnson, Jeremy; Alcazar, Rosa; Noh, Hyun Ji; Russell, Pamela; Aken, Bronwen; Alföldi, Jessica; Amemiya, Chris; Azzouzi, Naoual; Baroiller, Jean-François; Barloy-Hubler, Frederique; Berlin, Aaron; Bloomquist, Ryan; Carleton, Karen L.; Conte, Matthew A.; D'Cotta, Helena; Eshel, Orly; Gaffney, Leslie; Galibert, Francis; Gante, Hugo F.; Gnerre, Sante; Greuter, Lucie; Guyon, Richard; Haddad, Natalie S.; Haerty, Wilfried; Harris, Rayna M.; Hofmann, Hans A.; Hourlier, Thibaut; Hulata, Gideon; Jaffe, David B.; Lara, Marcia; Lee, Alison P.; MacCallum, Iain; Mwaiko, Salome; Nikaido, Masato; Nishihara, Hidenori; Ozouf-Costaz, Catherine; Penman, David J.; Przybylski, Dariusz; Rakotomanga, Michaelle; Renn, Suzy C. P.; Ribeiro, Filipe J.; Ron, Micha; Salzburger, Walter; Sanchez-Pulido, Luis; Santos, M. Emilia; Searle, Steve; Sharpe, Ted; Swofford, Ross; Tan, Frederick J.; Williams, Louise; Young, Sarah; Yin, Shuangye; Okada, Norihiro; Kocher, Thomas D.; Miska, Eric A.; Lander, Eric S.; Venkatesh, Byrappa; Fernald, Russell D.; Meyer, Axel; Ponting, Chris P.; Streelman, J. Todd; Lindblad-Toh, Kerstin; Seehausen, Ole; Di Palma, Federica

2015-01-01

Cichlid fishes are famous for large, diverse and replicated adaptive radiations in the Great Lakes of East Africa. To understand the molecular mechanisms underlying cichlid phenotypic diversity, we sequenced the genomes and transcriptomes of five lineages of African cichlids: the Nile tilapia (Oreochromis niloticus), an ancestral lineage with low diversity; and four members of the East African lineage: Neolamprologus brichardi/pulcher (older radiation, Lake Tanganyika), Metriaclima zebra (recent radiation, Lake Malawi), Pundamilia nyererei (very recent radiation, Lake Victoria), and Astatotilapia burtoni (riverine species around Lake Tanganyika). We found an excess of gene duplications in the East African lineage compared to tilapia and other teleosts, an abundance of non-coding element divergence, accelerated coding sequence evolution, expression divergence associated with transposable element insertions, and regulation by novel microRNAs. In addition, we analysed sequence data from sixty individuals representing six closely related species from Lake Victoria, and show genome-wide diversifying selection on coding and regulatory variants, some of which were recruited from ancient polymorphisms. We conclude that a number of molecular mechanisms shaped East African cichlid genomes, and that amassing of standing variation during periods of relaxed purifying selection may have been important in facilitating subsequent evolutionary diversification. PMID:25186727
Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures

PubMed Central

Stark, Alexander; Lin, Michael F.; Kheradpour, Pouya; Pedersen, Jakob S.; Parts, Leopold; Carlson, Joseph W.; Crosby, Madeline A.; Rasmussen, Matthew D.; Roy, Sushmita; Deoras, Ameya N.; Ruby, J. Graham; Brennecke, Julius; Hodges, Emily; Hinrichs, Angie S.; Caspi, Anat; Paten, Benedict; Park, Seung-Won; Han, Mira V.; Maeder, Morgan L.; Polansky, Benjamin J.; Robson, Bryanne E.; Aerts, Stein; van Helden, Jacques; Hassan, Bassem; Gilbert, Donald G.; Eastman, Deborah A.; Rice, Michael; Weir, Michael; Hahn, Matthew W.; Park, Yongkyu; Dewey, Colin N.; Pachter, Lior; Kent, W. James; Haussler, David; Lai, Eric C.; Bartel, David P.; Hannon, Gregory J.; Kaufman, Thomas C.; Eisen, Michael B.; Clark, Andrew G.; Smith, Douglas; Celniker, Susan E.; Gelbart, William M.; Kellis, Manolis

2008-01-01

Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or ‘evolutionary signatures’, dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies. PMID:17994088
Technological Developments in lncRNA Biology.

PubMed

Jathar, Sonali; Kumar, Vikram; Srivastava, Juhi; Tripathi, Vidisha

2017-01-01

It is estimated that more than 90% of the mammalian genome is transcribed as non-coding RNAs. Recent evidences have established that these non-coding transcripts are not junk or just transcriptional noise, but they do serve important biological purpose. One of the rapidly expanding fields of this class of transcripts is the regulatory lncRNAs, which had been a major challenge in terms of their molecular functions and mechanisms of action. The emergence of high-throughput technologies and the development in various conventional approaches have led to the expansion of the lncRNA world. The combination of multidisciplinary approaches has proven to be essential to unravel the complexity of their regulatory networks and helped establish the importance of their existence. Here, we review the current methodologies available for discovering and investigating functions of long non-coding RNAs (lncRNAs) and focus on the powerful technological advancement available to specifically address their functional importance.
Draft genome sequence for virulent and avirulent strains of Xanthomonas arboricola isolated from Prunus spp. in Spain.

PubMed

Garita-Cambronero, Jerson; Palacio-Bielsa, Ana; López, María M; Cubero, Jaime

2016-01-01

Xanthomonas arboricola is a species in genus Xanthomonas which is mainly comprised of plant pathogens. Among the members of this taxon, X. arboricola pv. pruni, the causal agent of bacterial spot disease of stone fruits and almond, is distributed worldwide although it is considered a quarantine pathogen in the European Union. Herein, we report the draft genome sequence, the classification, the annotation and the sequence analyses of a virulent strain, IVIA 2626.1, and an avirulent strain, CITA 44, of X. arboricola associated with Prunus spp. The draft genome sequence of IVIA 2626.1 consists of 5,027,671 bp, 4,720 protein coding genes and 50 RNA encoding genes. The draft genome sequence of strain CITA 44 consists of 4,760,482 bp, 4,250 protein coding genes and 56 RNA coding genes. Initial comparative analyses reveals differences in the presence of structural and regulatory components of the type IV pilus, the type III secretion system, the type III effectors as well as variations in the number of the type IV secretion systems. The genome sequence data for these strains will facilitate the development of molecular diagnostics protocols that differentiate virulent and avirulent strains. In addition, comparative genome analysis will provide insights into the plant-pathogen interaction during the bacterial spot disease process.
Networking Omic Data to Envisage Systems Biological Regulation.

PubMed

Kalapanulak, Saowalak; Saithong, Treenut; Thammarongtham, Chinae

To understand how biological processes work, it is necessary to explore the systematic regulation governing the behaviour of the processes. Not only driving the normal behavior of organisms, the systematic regulation evidently underlies the temporal responses to surrounding environments (dynamics) and long-term phenotypic adaptation (evolution). The systematic regulation is, in effect, formulated from the regulatory components which collaboratively work together as a network. In the drive to decipher such a code of lives, a spectrum of technologies has continuously been developed in the post-genomic era. With current advances, high-throughput sequencing technologies are tremendously powerful for facilitating genomics and systems biology studies in the attempt to understand system regulation inside the cells. The ability to explore relevant regulatory components which infer transcriptional and signaling regulation, driving core cellular processes, is thus enhanced. This chapter reviews high-throughput sequencing technologies, including second and third generation sequencing technologies, which support the investigation of genomics and transcriptomics data. Utilization of this high-throughput data to form the virtual network of systems regulation is explained, particularly transcriptional regulatory networks. Analysis of the resulting regulatory networks could lead to an understanding of cellular systems regulation at the mechanistic and dynamics levels. The great contribution of the biological networking approach to envisage systems regulation is finally demonstrated by a broad range of examples.
Puzzles in modern biology. V. Why are genomes overwired?

PubMed

Frank, Steven A

2017-01-01

Many factors affect eukaryotic gene expression. Transcription factors, histone codes, DNA folding, and noncoding RNA modulate expression. Those factors interact in large, broadly connected regulatory control networks. An engineer following classical principles of control theory would design a simpler regulatory network. Why are genomes overwired? Neutrality or enhanced robustness may lead to the accumulation of additional factors that complicate network architecture. Dynamics progresses like a ratchet. New factors get added. Genomes adapt to the additional complexity. The newly added factors can no longer be removed without significant loss of fitness. Alternatively, highly wired genomes may be more malleable. In large networks, most genomic variants tend to have a relatively small effect on gene expression and trait values. Many small effects lead to a smooth gradient, in which traits may change steadily with respect to underlying regulatory changes. A smooth gradient may provide a continuous path from a starting point up to the highest peak of performance. A potential path of increasing performance promotes adaptability and learning. Genomes gain by the inductive process of natural selection, a trial and error learning algorithm that discovers general solutions for adapting to environmental challenge. Similarly, deeply and densely connected computational networks gain by various inductive trial and error learning procedures, in which the networks learn to reduce the errors in sequential trials. Overwiring alters the geometry of induction by smoothing the gradient along the inductive pathways of improving performance. Those overwiring benefits for induction apply to both natural biological networks and artificial deep learning networks.
Inference of gene regulatory networks from genome-wide knockout fitness data

PubMed Central

Wang, Liming; Wang, Xiaodong; Arkin, Adam P.; Samoilov, Michael S.

2013-01-01

Motivation: Genome-wide fitness is an emerging type of high-throughput biological data generated for individual organisms by creating libraries of knockouts, subjecting them to broad ranges of environmental conditions, and measuring the resulting clone-specific fitnesses. Since fitness is an organism-scale measure of gene regulatory network behaviour, it may offer certain advantages when insights into such phenotypical and functional features are of primary interest over individual gene expression. Previous works have shown that genome-wide fitness data can be used to uncover novel gene regulatory interactions, when compared with results of more conventional gene expression analysis. Yet, to date, few algorithms have been proposed for systematically using genome-wide mutant fitness data for gene regulatory network inference. Results: In this article, we describe a model and propose an inference algorithm for using fitness data from knockout libraries to identify underlying gene regulatory networks. Unlike most prior methods, the presented approach captures not only structural, but also dynamical and non-linear nature of biomolecular systems involved. A state–space model with non-linear basis is used for dynamically describing gene regulatory networks. Network structure is then elucidated by estimating unknown model parameters. Unscented Kalman filter is used to cope with the non-linearities introduced in the model, which also enables the algorithm to run in on-line mode for practical use. Here, we demonstrate that the algorithm provides satisfying results for both synthetic data as well as empirical measurements of GAL network in yeast Saccharomyces cerevisiae and TyrR–LiuR network in bacteria Shewanella oneidensis. Availability: MATLAB code and datasets are available to download at http://www.duke.edu/∼lw174/Fitness.zip and http://genomics.lbl.gov/supplemental/fitness-bioinf/ Contact: wangx@ee.columbia.edu or mssamoilov@lbl.gov Supplementary information: Supplementary data are available at Bioinformatics online PMID:23271269

Partial Shotgun Sequencing of the Boechera stricta Genome Reveals Extensive Microsynteny and Promoter Conservation with Arabidopsis1[W

PubMed Central

Windsor, Aaron J.; Schranz, M. Eric; Formanová, Nataša; Gebauer-Jung, Steffi; Bishop, John G.; Schnabelrauch, Domenica; Kroymann, Juergen; Mitchell-Olds, Thomas

2006-01-01

Comparative genomics provides insight into the evolutionary dynamics that shape discrete sequences as well as whole genomes. To advance comparative genomics within the Brassicaceae, we have end sequenced 23,136 medium-sized insert clones from Boechera stricta, a wild relative of Arabidopsis (Arabidopsis thaliana). A significant proportion of these sequences, 18,797, are nonredundant and display highly significant similarity (BLASTn e-value ≤ 10−30) to low copy number Arabidopsis genomic regions, including more than 9,000 annotated coding sequences. We have used this dataset to identify orthologous gene pairs in the two species and to perform a global comparison of DNA regions 5′ to annotated coding regions. On average, the 500 nucleotides upstream to coding sequences display 71.4% identity between the two species. In a similar analysis, 61.4% identity was observed between 5′ noncoding sequences of Brassica oleracea and Arabidopsis, indicating that regulatory regions are not as diverged among these lineages as previously anticipated. By mapping the B. stricta end sequences onto the Arabidopsis genome, we have identified nearly 2,000 conserved blocks of microsynteny (bracketing 26% of the Arabidopsis genome). A comparison of fully sequenced B. stricta inserts to their homologous Arabidopsis genomic regions indicates that indel polymorphisms >5 kb contribute substantially to the genome size difference observed between the two species. Further, we demonstrate that microsynteny inferred from end-sequence data can be applied to the rapid identification and cloning of genomic regions of interest from nonmodel species. These results suggest that among diploid relatives of Arabidopsis, small- to medium-scale shotgun sequencing approaches can provide rapid and cost-effective benefits to evolutionary and/or functional comparative genomic frameworks. PMID:16607030
Genomic Identification and Analysis of Shared Cis-regulator Elements in a Developmentally Critical homeobox Cluster

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chris Amemiya

2003-04-01

The goals of this project were to isolate, characterize, and sequence the Dlx3/Dlx7 bigene cluster from twelve different species of mammals. The Dlx3 and Dlx7 genes are known to encode homeobox transcription factors involved in patterning of structures in the vertebrate jaw as well as vertebrate limbs. Genomic sequences from the respective taxa will subsequently be compared in order to identify conserved non-coding sequences that are potential cis-regulatory elements. Based on the comparisons they will fashion transgenic mouse experiments to functionally test the strength of the potential cis-regulatory elements. A goal of the project is to attempt to identify thosemore » elements that may function in coordinately regulating both Dlx3 and Dlx7 functions.« less
Predictive computation of genomic logic processing functions in embryonic development

PubMed Central

Peter, Isabelle S.; Faure, Emmanuel; Davidson, Eric H.

2012-01-01

Gene regulatory networks (GRNs) control the dynamic spatial patterns of regulatory gene expression in development. Thus, in principle, GRN models may provide system-level, causal explanations of developmental process. To test this assertion, we have transformed a relatively well-established GRN model into a predictive, dynamic Boolean computational model. This Boolean model computes spatial and temporal gene expression according to the regulatory logic and gene interactions specified in a GRN model for embryonic development in the sea urchin. Additional information input into the model included the progressive embryonic geometry and gene expression kinetics. The resulting model predicted gene expression patterns for a large number of individual regulatory genes each hour up to gastrulation (30 h) in four different spatial domains of the embryo. Direct comparison with experimental observations showed that the model predictively computed these patterns with remarkable spatial and temporal accuracy. In addition, we used this model to carry out in silico perturbations of regulatory functions and of embryonic spatial organization. The model computationally reproduced the altered developmental functions observed experimentally. Two major conclusions are that the starting GRN model contains sufficiently complete regulatory information to permit explanation of a complex developmental process of gene expression solely in terms of genomic regulatory code, and that the Boolean model provides a tool with which to test in silico regulatory circuitry and developmental perturbations. PMID:22927416
Cellular miR-2909 RNomics governs the genes that ensure immune checkpoint regulation.

PubMed

Kaul, Deepak; Malik, Deepti; Wani, Sameena

2018-06-20

Cross-talk between coding RNAs and regulatory non-coding microRNAs, within human genome, has provided compelling evidence for the existence of flexible checkpoint control of T-Cell activation. The present study attempts to demonstrate that the interplay between miR-2909 and its effector KLF4 gene has the inherent capacity to regulate genes coding for CTLA4, CD28, CD40, CD134, PDL1, CD80, CD86, IL-6 and IL-10 within normal human peripheral blood mononuclear cells (PBMCs). Based upon these findings, we propose a pathway that links miR-2909 RNomics with the genes coding for immune checkpoint regulators required for the maintenance of immune homeostasis.
Evaluating the protein coding potential of exonized transposable element sequences

PubMed Central

Piriyapongsa, Jittima; Rutledge, Mark T; Patel, Sanil; Borodovsky, Mark; Jordan, I King

2007-01-01

Background Transposable element (TE) sequences, once thought to be merely selfish or parasitic members of the genomic community, have been shown to contribute a wide variety of functional sequences to their host genomes. Analysis of complete genome sequences have turned up numerous cases where TE sequences have been incorporated as exons into mRNAs, and it is widely assumed that such 'exonized' TEs encode protein sequences. However, the extent to which TE-derived sequences actually encode proteins is unknown and a matter of some controversy. We have tried to address this outstanding issue from two perspectives: i-by evaluating ascertainment biases related to the search methods used to uncover TE-derived protein coding sequences (CDS) and ii-through a probabilistic codon-frequency based analysis of the protein coding potential of TE-derived exons. Results We compared the ability of three classes of sequence similarity search methods to detect TE-derived sequences among data sets of experimentally characterized proteins: 1-a profile-based hidden Markov model (HMM) approach, 2-BLAST methods and 3-RepeatMasker. Profile based methods are more sensitive and more selective than the other methods evaluated. However, the application of profile-based search methods to the detection of TE-derived sequences among well-curated experimentally characterized protein data sets did not turn up many more cases than had been previously detected and nowhere near as many cases as recent genome-wide searches have. We observed that the different search methods used were complementary in the sense that they yielded largely non-overlapping sets of hits and differed in their ability to recover known cases of TE-derived CDS. The probabilistic analysis of TE-derived exon sequences indicates that these sequences have low protein coding potential on average. In particular, non-autonomous TEs that do not encode protein sequences, such as Alu elements, are frequently exonized but unlikely to encode protein sequences. Conclusion The exaptation of the numerous TE sequences found in exons as bona fide protein coding sequences may prove to be far less common than has been suggested by the analysis of complete genomes. We hypothesize that many exonized TE sequences actually function as post-transcriptional regulators of gene expression, rather than coding sequences, which may act through a variety of double stranded RNA related regulatory pathways. Indeed, their relatively high copy numbers and similarity to sequences dispersed throughout the genome suggests that exonized TE sequences could serve as master regulators with a wide scope of regulatory influence. Reviewers: This article was reviewed by Itai Yanai, Kateryna D. Makova, Melissa Wilson (nominated by Kateryna D. Makova) and Cedric Feschotte (nominated by John M. Logsdon Jr.). PMID:18036258
A Noncoding, Regulatory Mutation Implicates HCFC1 in Nonsyndromic Intellectual Disability

PubMed Central

Huang, Lingli; Jolly, Lachlan A.; Willis-Owen, Saffron; Gardner, Alison; Kumar, Raman; Douglas, Evelyn; Shoubridge, Cheryl; Wieczorek, Dagmar; Tzschach, Andreas; Cohen, Monika; Hackett, Anna; Field, Michael; Froyen, Guy; Hu, Hao; Haas, Stefan A.; Ropers, Hans-Hilger; Kalscheuer, Vera M.; Corbett, Mark A.; Gecz, Jozef

2012-01-01

The discovery of mutations causing human disease has so far been biased toward protein-coding regions. Having excluded all annotated coding regions, we performed targeted massively parallel resequencing of the nonrepetitive genomic linkage interval at Xq28 of family MRX3. We identified in the binding site of transcription factor YY1 a regulatory mutation that leads to overexpression of the chromatin-associated transcriptional regulator HCFC1. When tested on embryonic murine neural stem cells and embryonic hippocampal neurons, HCFC1 overexpression led to a significant increase of the production of astrocytes and a considerable reduction in neurite growth. Two other nonsynonymous, potentially deleterious changes have been identified by X-exome sequencing in individuals with intellectual disability, implicating HCFC1 in normal brain function. PMID:23000143
Comprehensive identification and analysis of human accelerated regulatory DNA

PubMed Central

Gittelman, Rachel M.; Hun, Enna; Ay, Ferhat; Madeoy, Jennifer; Pennacchio, Len; Noble, William S.; Hawkins, R. David; Akey, Joshua M.

2015-01-01

It has long been hypothesized that changes in gene regulation have played an important role in human evolution, but regulatory DNA has been much more difficult to study compared with protein-coding regions. Recent large-scale studies have created genome-scale catalogs of DNase I hypersensitive sites (DHSs), which demark potentially functional regulatory DNA. To better define regulatory DNA that has been subject to human-specific adaptive evolution, we performed comprehensive evolutionary and population genetics analyses on over 18 million DHSs discovered in 130 cell types. We identified 524 DHSs that are conserved in nonhuman primates but accelerated in the human lineage (haDHS), and estimate that 70% of substitutions in haDHSs are attributable to positive selection. Through extensive computational and experimental analyses, we demonstrate that haDHSs are often active in brain or neuronal cell types; play an important role in regulating the expression of developmentally important genes, including many transcription factors such as SOX6, POU3F2, and HOX genes; and identify striking examples of adaptive regulatory evolution that may have contributed to human-specific phenotypes. More generally, our results reveal new insights into conserved and adaptive regulatory DNA in humans and refine the set of genomic substrates that distinguish humans from their closest living primate relatives. PMID:26104583
Refactored M13 Bacteriophage as a Platform for Tumor Cell Imaging and Drug Delivery

PubMed Central

MOSER, FELIX; ENDY, DREW; BELCHER, ANGELA M.

2014-01-01

M13 bacteriophage is a well-characterized platform for peptide display. The utility of the M13 display platform is derived from the ability to encode phage protein fusions with display peptides at the genomic level. However, the genome of the phage is complicated by overlaps of key genetic elements. These overlaps directly couple the coding sequence of one gene to the coding or regulatory sequence of another, making it difficult to alter one gene without disrupting the other. Specifically, overlap of the end of gene VII and the beginning of gene IX has prevented the functional genomic modification of the N-terminus of p9. By redesigning the M13 genome to physically separate these overlapping genetic elements, a process known as “refactoring,” we enabled independent manipulation of gene VII and gene IX and the construction of the first N-terminal genomic modification of p9 for peptide display. We demonstrate the utility of this refactored genome by developing an M13 bacteriophage-based platform for targeted imaging of and drug delivery to prostate cancer cells in vitro. This successful use of refactoring principles to reengineer a natural biological system strengthens the suggestion that natural genomes can be rationally designed for a number of applications. PMID:23656279
Refactored M13 bacteriophage as a platform for tumor cell imaging and drug delivery.

PubMed

Ghosh, Debadyuti; Kohli, Aditya G; Moser, Felix; Endy, Drew; Belcher, Angela M

2012-12-21

M13 bacteriophage is a well-characterized platform for peptide display. The utility of the M13 display platform is derived from the ability to encode phage protein fusions with display peptides at the genomic level. However, the genome of the phage is complicated by overlaps of key genetic elements. These overlaps directly couple the coding sequence of one gene to the coding or regulatory sequence of another, making it difficult to alter one gene without disrupting the other. Specifically, overlap of the end of gene VII and the beginning of gene IX has prevented the functional genomic modification of the N-terminus of p9. By redesigning the M13 genome to physically separate these overlapping genetic elements, a process known as "refactoring," we enabled independent manipulation of gene VII and gene IX and the construction of the first N-terminal genomic modification of p9 for peptide display. We demonstrate the utility of this refactored genome by developing an M13 bacteriophage-based platform for targeted imaging of and drug delivery to prostate cancer cells in vitro. This successful use of refactoring principles to re-engineer a natural biological system strengthens the suggestion that natural genomes can be rationally designed for a number of applications.
Integration of multi-omics data of a genome-reduced bacterium: Prevalence of post-transcriptional regulation and its correlation with protein abundances

PubMed Central

Chen, Wei-Hua; van Noort, Vera; Lluch-Senar, Maria; Hennrich, Marco L.; H. Wodke, Judith A.; Yus, Eva; Alibés, Andreu; Roma, Guglielmo; Mende, Daniel R.; Pesavento, Christina; Typas, Athanasios; Gavin, Anne-Claude; Serrano, Luis; Bork, Peer

2016-01-01

We developed a comprehensive resource for the genome-reduced bacterium Mycoplasma pneumoniae comprising 1748 consistently generated ‘-omics’ data sets, and used it to quantify the power of antisense non-coding RNAs (ncRNAs), lysine acetylation, and protein phosphorylation in predicting protein abundance (11%, 24% and 8%, respectively). These factors taken together are four times more predictive of the proteome abundance than of mRNA abundance. In bacteria, post-translational modifications (PTMs) and ncRNA transcription were both found to increase with decreasing genomic GC-content and genome size. Thus, the evolutionary forces constraining genome size and GC-content modify the relative contributions of the different regulatory layers to proteome homeostasis, and impact more genomic and genetic features than previously appreciated. Indeed, these scaling principles will enable us to develop more informed approaches when engineering minimal synthetic genomes. PMID:26773059
Piecing together cis-regulatory networks: insights from epigenomics studies in plants.

PubMed

Huang, Shao-Shan C; Ecker, Joseph R

2018-05-01

5-Methylcytosine, a chemical modification of DNA, is a covalent modification found in the genomes of both plants and animals. Epigenetic inheritance of phenotypes mediated by DNA methylation is well established in plants. Most of the known mechanisms of establishing, maintaining and modifying DNA methylation have been worked out in the reference plant Arabidopsis thaliana. Major functions of DNA methylation in plants include regulation of gene expression and silencing of transposable elements (TEs) and repetitive sequences, both of which have parallels in mammalian biology, involve interaction with the transcriptional machinery, and may have profound effects on the regulatory networks in the cell. Methylome and transcriptome dynamics have been investigated in development and environmental responses in Arabidopsis and agriculturally and ecologically important plants, revealing the interdependent relationship among genomic context, methylation patterns, and expression of TE and protein coding genes. Analyses of methylome variation among plant natural populations and species have begun to quantify the extent of genetic control of methylome variation vs. true epimutation, and model the evolutionary forces driving methylome evolution in both short and long time scales. The ability of DNA methylation to positively or negatively modulate binding affinity of transcription factors (TFs) provides a natural link from genome sequence and methylation changes to transcription. Technologies that allow systematic determination of methylation sensitivities of TFs, in native genomic and methylation context without confounding factors such as histone modifications, will provide baseline datasets for building cell-type- and individual-specific regulatory networks that underlie the establishment and inheritance of complex traits. This article is categorized under: Laboratory Methods and Technologies > Genetic/Genomic Methods Biological Mechanisms > Regulatory Biology. © 2017 Wiley Periodicals, Inc.
Resurrection of DNA Function In Vivo from an Extinct Genome

PubMed Central

Pask, Andrew J.; Behringer, Richard R.; Renfree, Marilyn B.

2008-01-01

There is a burgeoning repository of information available from ancient DNA that can be used to understand how genomes have evolved and to determine the genetic features that defined a particular species. To assess the functional consequences of changes to a genome, a variety of methods are needed to examine extinct DNA function. We isolated a transcriptional enhancer element from the genome of an extinct marsupial, the Tasmanian tiger (Thylacinus cynocephalus or thylacine), obtained from 100 year-old ethanol-fixed tissues from museum collections. We then examined the function of the enhancer in vivo. Using a transgenic approach, it was possible to resurrect DNA function in transgenic mice. The results demonstrate that the thylacine Col2A1 enhancer directed chondrocyte-specific expression in this extinct mammalian species in the same way as its orthologue does in mice. While other studies have examined extinct coding DNA function in vitro, this is the first example of the restoration of extinct non-coding DNA and examination of its function in vivo. Our method using transgenesis can be used to explore the function of regulatory and protein-coding sequences obtained from any extinct species in an in vivo model system, providing important insights into gene evolution and diversity. PMID:18493600
The evolutionary history of Saccharomyces species inferred from completed mitochondrial genomes and revision in the ‘yeast mitochondrial genetic code’

PubMed Central

Szabóová, Dana; Bielik, Peter; Poláková, Silvia; Šoltys, Katarína; Jatzová, Katarína; Szemes, Tomáš

2017-01-01

Abstract The yeast Saccharomyces are widely used to test ecological and evolutionary hypotheses. A large number of nuclear genomic DNA sequences are available, but mitochondrial genomic data are insufficient. We completed mitochondrial DNA (mtDNA) sequencing from Illumina MiSeq reads for all Saccharomyces species. All are circularly mapped molecules decreasing in size with phylogenetic distance from Saccharomyces cerevisiae but with similar gene content including regulatory and selfish elements like origins of replication, introns, free-standing open reading frames or GC clusters. Their most profound feature is species-specific alteration in gene order. The genetic code slightly differs from well-established yeast mitochondrial code as GUG is used rarely as the translation start and CGA and CGC code for arginine. The multilocus phylogeny, inferred from mtDNA, does not correlate with the trees derived from nuclear genes. mtDNA data demonstrate that Saccharomyces cariocanus should be assigned as a separate species and Saccharomyces bayanus CBS 380T should not be considered as a distinct species due to mtDNA nearly identical to Saccharomyces uvarum mtDNA. Apparently, comparison of mtDNAs should not be neglected in genomic studies as it is an important tool to understand the origin and evolutionary history of some yeast species. PMID:28992063
Intricate and Cell Type-Specific Populations of Endogenous Circular DNA (eccDNA) in Caenorhabditis elegans and Homo sapiens.

PubMed

Shoura, Massa J; Gabdank, Idan; Hansen, Loren; Merker, Jason; Gotlib, Jason; Levene, Stephen D; Fire, Andrew Z

2017-10-05

Investigations aimed at defining the 3D configuration of eukaryotic chromosomes have consistently encountered an endogenous population of chromosome-derived circular genomic DNA, referred to as extrachromosomal circular DNA (eccDNA). While the production, distribution, and activities of eccDNAs remain understudied, eccDNA formation from specific regions of the linear genome has profound consequences on the regulatory and coding capabilities for these regions. Here, we define eccDNA distributions in Caenorhabditis elegans and in three human cell types, utilizing a set of DNA topology-dependent approaches for enrichment and characterization. The use of parallel biophysical, enzymatic, and informatic approaches provides a comprehensive profiling of eccDNA robust to isolation and analysis methodology. Results in human and nematode systems provide quantitative analysis of the eccDNA loci at both unique and repetitive regions. Our studies converge on and support a consistent picture, in which endogenous genomic DNA circles are present in normal physiological states, and in which the circles come from both coding and noncoding genomic regions. Prominent among the coding regions generating DNA circles are several genes known to produce a diversity of protein isoforms, with mucin proteins and titin as specific examples. Copyright © 2017 Shoura et al.
The impact of rare variation on gene expression across tissues.

PubMed

Li, Xin; Kim, Yungil; Tsang, Emily K; Davis, Joe R; Damani, Farhan N; Chiang, Colby; Hess, Gaelen T; Zappala, Zachary; Strober, Benjamin J; Scott, Alexandra J; Li, Amy; Ganna, Andrea; Bassik, Michael C; Merker, Jason D; Hall, Ira M; Battle, Alexis; Montgomery, Stephen B

2017-10-11

Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk. While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants. Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles, but no analogous code exists for non-coding variants. Therefore, ascertaining which rare variants have phenotypic effects remains a major challenge. Rare non-coding variants have been associated with extreme gene expression in studies using single tissues, but their effects across tissues are unknown. Here we identify gene expression outliers, or individuals showing extreme expression levels for a particular gene, across 44 human tissues by using combined analyses of whole genomes and multi-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project v6p release. We find that 58% of underexpression and 28% of overexpression outliers have nearby conserved rare variants compared to 8% of non-outliers. Additionally, we developed RIVER (RNA-informed variant effect on regulation), a Bayesian statistical model that incorporates expression data to predict a regulatory effect for rare variants with higher accuracy than models using genomic annotations alone. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues and provide an integrative method for interpretation of rare variants in individual genomes.
Genome-wide identification of miRNAs and lncRNAs in Cajanus cajan.

PubMed

Nithin, Chandran; Thomas, Amal; Basak, Jolly; Bahadur, Ranjit Prasad

2017-11-15

Non-coding RNAs (ncRNAs) are important players in the post transcriptional regulation of gene expression (PTGR). On one hand, microRNAs (miRNAs) are an abundant class of small ncRNAs (~22nt long) that negatively regulate gene expression at the levels of messenger RNAs stability and translation inhibition, on the other hand, long ncRNAs (lncRNAs) are a large and diverse class of transcribed non-protein coding RNA molecules (> 200nt) that play both up-regulatory as well as down-regulatory roles at the transcriptional level. Cajanus cajan, a leguminosae pulse crop grown in tropical and subtropical areas of the world, is a source of high value protein to vegetarians or very poor populations globally. Hence, genome-wide identification of miRNAs and lncRNAs in C. cajan is extremely important to understand their role in PTGR with a possible implication to generate improve variety of crops. We have identified 616 mature miRNAs in C. cajan belonging to 118 families, of which 578 are novel and not reported in MirBase21. A total of 1373 target sequences were identified for 180 miRNAs. Of these, 298 targets were characterized at the protein level. Besides, we have also predicted 3919 lncRNAs. Additionally, we have identified 87 of the predicted lncRNAs to be targeted by 66 miRNAs. miRNA and lncRNAs in plants are known to control a variety of traits including yield, quality and stress tolerance. Owing to its agricultural importance and medicinal value, the identified miRNA, lncRNA and their targets in C. cajan may be useful for genome editing to improve better quality crop. A thorough understanding of ncRNA-based cellular regulatory networks will aid in the improvement of C. cajan agricultural traits.
Heterogeneous conservation of Dlx paralog co-expression in jawed vertebrates.

PubMed

Debiais-Thibaud, Mélanie; Metcalfe, Cushla J; Pollack, Jacob; Germon, Isabelle; Ekker, Marc; Depew, Michael; Laurenti, Patrick; Borday-Birraux, Véronique; Casane, Didier

2013-01-01

The Dlx gene family encodes transcription factors involved in the development of a wide variety of morphological innovations that first evolved at the origins of vertebrates or of the jawed vertebrates. This gene family expanded with the two rounds of genome duplications that occurred before jawed vertebrates diversified. It includes at least three bigene pairs sharing conserved regulatory sequences in tetrapods and teleost fish, but has been only partially characterized in chondrichthyans, the third major group of jawed vertebrates. Here we take advantage of developmental and molecular tools applied to the shark Scyliorhinus canicula to fill in the gap and provide an overview of the evolution of the Dlx family in the jawed vertebrates. These results are analyzed in the theoretical framework of the DDC (Duplication-Degeneration-Complementation) model. The genomic organisation of the catshark Dlx genes is similar to that previously described for tetrapods. Conserved non-coding elements identified in bony fish were also identified in catshark Dlx clusters and showed regulatory activity in transgenic zebrafish. Gene expression patterns in the catshark showed that there are some expression sites with high conservation of the expressed paralog(s) and other expression sites with events of paralog sub-functionalization during jawed vertebrate diversification, resulting in a wide variety of evolutionary scenarios within this gene family. Dlx gene expression patterns in the catshark show that there has been little neo-functionalization in Dlx genes over gnathostome evolution. In most cases, one tandem duplication and two rounds of vertebrate genome duplication have led to at least six Dlx coding sequences with redundant expression patterns followed by some instances of paralog sub-functionalization. Regulatory constraints such as shared enhancers, and functional constraints including gene pleiotropy, may have contributed to the evolutionary inertia leading to high redundancy between gene expression patterns.
Sequence and functional characterization of MIRNA164 promoters from Brassica shows copy number dependent regulatory diversification among homeologs.

PubMed

Jain, Aditi; Anand, Saurabh; Singh, Neer K; Das, Sandip

2018-03-12

The impact of polyploidy on functional diversification of cis-regulatory elements is poorly understood. This is primarily on account of lack of well-defined structure of cis-elements and a universal regulatory code. To the best of our knowledge, this is the first report on characterization of sequence and functional diversification of paralogous and homeologous promoter elements associated with MIR164 from Brassica. The availability of whole genome sequence allowed us to identify and isolate a total of 42 homologous copies of MIR164 from diploid species-Brassica rapa (A-genome), Brassica nigra (B-genome), Brassica oleracea (C-genome), and allopolyploids-Brassica juncea (AB-genome), Brassica carinata (BC-genome) and Brassica napus (AC-genome). Additionally, we retrieved homologous sequences based on comparative genomics from Arabidopsis lyrata, Capsella rubella, and Thellungiella halophila, spanning ca. 45 million years of evolutionary history of Brassicaceae. Sequence comparison across Brassicaceae revealed lineage-, karyotype, species-, and sub-genome specific changes providing a snapshot of evolutionary dynamics of miRNA promoters in polyploids. Tree topology of cis-elements associated with MIR164 was found to re-capitulate the species and family evolutionary history. Phylogenetic shadowing identified transcription factor binding sites (TFBS) conserved across Brassicaceae, of which, some are already known as regulators of MIR164 expression. Some of the TFBS were found to be distributed in a sub-genome specific (e.g., SOX specific to promoter of MIR164c from MF2 sub-genome), lineage-specific (YABBY binding motif, specific to C. rubella in MIR164b), or species-specific (e.g., VOZ in A. thaliana MIR164a) manner which might contribute towards genetic and adaptive variation. Reporter activity driven by promoters associated with MIR164 paralogs and homeologs was majorly in agreement with known role of miR164 in leaf shaping, regulation of lateral root development and senescence, and one previously un-described novel role in trichome. The impact of polyploidy was most profound when reporter activity across three MIR164c homeologs were compared that revealed negligible overlap, whereas reporter activity among two homeologs of MIR164a displays significant overlap. A copy number dependent cis-regulatory divergence thus exists in MIR164 genes in Brassica juncea. The full extent of regulatory diversification towards adaptive strategies will only be known when future endeavors analyze the promoter function under duress of stress and hormonal regimes.
kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets

PubMed Central

Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S.; Beer, Michael A.

2013-01-01

Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, Chromatin immunoprecipitation followed by sequence assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167–80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic datasets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org. PMID:23771147
kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets.

PubMed

Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S; Beer, Michael A

2013-07-01

Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, Chromatin immunoprecipitation followed by sequence assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167-80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic datasets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org.

Identification and in vitro characterization of a Marek’s disease virus encoded ribonucleotide reductase

USDA-ARS?s Scientific Manuscript database

Marek’s disease virus (MDV) encodes a ribonucleotide reductase (RR), a key regulatory enzyme in the DNA synthesis pathway. The gene coding for the RR of MDV is located in the unique long (UL) region of the genome. The large subunit is encoded by UL39 (RR1) and is predicted to comprise 860 amino acid...
Genome-wide piRNA profiles of virus transmitting whitefly Bemisia tabaci during feeding on TYLCV-infected tomato

USDA-ARS?s Scientific Manuscript database

Small RNAs (sRNAs) are 20-31 nucleotide (nt) non-coding regulatory elements commonly found in plants and animals, which are classified as short interfering RNA (siRNA), microRNA (miRNA) and Piwi-interacting RNA (piRNA). The whitefly Bemisia tabaci MEAM1 is a vector capable of transmitting many devas...
Perspectives on the mechanism of transcriptional regulation by long non-coding RNAs.

PubMed

Roberts, Thomas C; Morris, Kevin V; Weinberg, Marc S

2014-01-01

Long non-coding RNAs (lncRNAs) are increasingly being recognized as epigenetic regulators of gene transcription. The diversity and complexity of lncRNA genes means that they exert their regulatory effects by a variety of mechanisms. Although there is still much to be learned about the mechanism of lncRNA function, general principles are starting to emerge. In particular, the application of high throughput (deep) sequencing methodologies has greatly advanced our understanding of lncRNA gene function. lncRNAs function as adaptors that link specific chromatin loci with chromatin-remodeling complexes and transcription factors. lncRNAs can act in cis or trans to guide epigenetic-modifier complexes to distinct genomic sites, or act as scaffolds which recruit multiple proteins simultaneously, thereby coordinating their activities. In this review we discuss the genomic organization of lncRNAs, the importance of RNA secondary structure to lncRNA functionality, the multitude of ways in which they interact with the genome, and what evolutionary conservation tells us about their function.
Disruption of long-distance highly conserved noncoding elements in neurocristopathies.

PubMed

Amiel, Jeanne; Benko, Sabina; Gordon, Christopher T; Lyonnet, Stanislas

2010-12-01

One of the key discoveries of vertebrate genome sequencing projects has been the identification of highly conserved noncoding elements (CNEs). Some characteristics of CNEs include their high frequency in mammalian genomes, their potential regulatory role in gene expression, and their enrichment in gene deserts nearby master developmental genes. The abnormal development of neural crest cells (NCCs) leads to a broad spectrum of congenital malformation(s), termed neurocristopathies, and/or tumor predisposition. Here we review recent findings that disruptions of CNEs, within or at long distance from the coding sequences of key genes involved in NCC development, result in neurocristopathies via the alteration of tissue- or stage-specific long-distance regulation of gene expression. While most studies on human genetic disorders have focused on protein-coding sequences, these examples suggest that investigation of genomic alterations of CNEs will provide a broader understanding of the molecular etiology of both rare and common human congenital malformations. © 2010 New York Academy of Sciences.
Association analysis identifies 65 new breast cancer risk loci

PubMed Central

Lemaçon, Audrey; Soucy, Penny; Glubb, Dylan; Rostamianfar, Asha; Bolla, Manjeet K.; Wang, Qin; Tyrer, Jonathan; Dicks, Ed; Lee, Andrew; Wang, Zhaoming; Allen, Jamie; Keeman, Renske; Eilber, Ursula; French, Juliet D.; Chen, Xiao Qing; Fachal, Laura; McCue, Karen; McCart Reed, Amy E.; Ghoussaini, Maya; Carroll, Jason; Jiang, Xia; Finucane, Hilary; Adams, Marcia; Adank, Muriel A.; Ahsan, Habibul; Aittomäki, Kristiina; Anton-Culver, Hoda; Antonenkova, Natalia N.; Arndt, Volker; Aronson, Kristan J.; Arun, Banu; Auer, Paul L.; Bacot, François; Barrdahl, Myrto; Baynes, Caroline; Beckmann, Matthias W.; Behrens, Sabine; Benitez, Javier; Bermisheva, Marina; Bernstein, Leslie; Blomqvist, Carl; Bogdanova, Natalia V.; Bojesen, Stig E.; Bonanni, Bernardo; Børresen-Dale, Anne-Lise; Brand, Judith S.; Brauch, Hiltrud; Brennan, Paul; Brenner, Hermann; Brinton, Louise; Broberg, Per; Brock, Ian W.; Broeks, Annegien; Brooks-Wilson, Angela; Brucker, Sara Y.; Brüning, Thomas; Burwinkel, Barbara; Butterbach, Katja; Cai, Qiuyin; Cai, Hui; Caldés, Trinidad; Canzian, Federico; Carracedo, Angel; Carter, Brian D.; Castelao, Jose E.; Chan, Tsun L.; Cheng, Ting-Yuan David; Chia, Kee Seng; Choi, Ji-Yeob; Christiansen, Hans; Clarke, Christine L.; Collée, Margriet; Conroy, Don M.; Cordina-Duverger, Emilie; Cornelissen, Sten; Cox, David G; Cox, Angela; Cross, Simon S.; Cunningham, Julie M.; Czene, Kamila; Daly, Mary B.; Devilee, Peter; Doheny, Kimberly F.; Dörk, Thilo; dos-Santos-Silva, Isabel; Dumont, Martine; Durcan, Lorraine; Dwek, Miriam; Eccles, Diana M.; Ekici, Arif B.; Eliassen, A. Heather; Ellberg, Carolina; Elvira, Mingajeva; Engel, Christoph; Eriksson, Mikael; Fasching, Peter A.; Figueroa, Jonine; Flesch-Janys, Dieter; Fletcher, Olivia; Flyger, Henrik; Fritschi, Lin; Gaborieau, Valerie; Gabrielson, Marike; Gago-Dominguez, Manuela; Gao, Yu-Tang; Gapstur, Susan M.; García-Sáenz, José A.; Gaudet, Mia M.; Georgoulias, Vassilios; Giles, Graham G.; Glendon, Gord; Goldberg, Mark S.; Goldgar, David E.; González-Neira, Anna; Grenaker Alnæs, Grethe I.; Grip, Mervi; Gronwald, Jacek; Grundy, Anne; Guénel, Pascal; Haeberle, Lothar; Hahnen, Eric; Haiman, Christopher A.; Håkansson, Niclas; Hamann, Ute; Hamel, Nathalie; Hankinson, Susan; Harrington, Patricia; Hart, Steven N.; Hartikainen, Jaana M.; Hartman, Mikael; Hein, Alexander; Heyworth, Jane; Hicks, Belynda; Hillemanns, Peter; Ho, Dona N.; Hollestelle, Antoinette; Hooning, Maartje J.; Hoover, Robert N.; Hopper, John L.; Hou, Ming-Feng; Hsiung, Chia-Ni; Huang, Guanmengqian; Humphreys, Keith; Ishiguro, Junko; Ito, Hidemi; Iwasaki, Motoki; Iwata, Hiroji; Jakubowska, Anna; Janni, Wolfgang; John, Esther M.; Johnson, Nichola; Jones, Kristine; Jones, Michael; Jukkola-Vuorinen, Arja; Kaaks, Rudolf; Kabisch, Maria; Kaczmarek, Katarzyna; Kang, Daehee; Kasuga, Yoshio; Kerin, Michael J.; Khan, Sofia; Khusnutdinova, Elza; Kiiski, Johanna I.; Kim, Sung-Won; Knight, Julia A.; Kosma, Veli-Matti; Kristensen, Vessela N.; Krüger, Ute; Kwong, Ava; Lambrechts, Diether; Marchand, Loic Le; Lee, Eunjung; Lee, Min Hyuk; Lee, Jong Won; Lee, Chuen Neng; Lejbkowicz, Flavio; Li, Jingmei; Lilyquist, Jenna; Lindblom, Annika; Lissowska, Jolanta; Lo, Wing-Yee; Loibl, Sibylle; Long, Jirong; Lophatananon, Artitaya; Lubinski, Jan; Luccarini, Craig; Lux, Michael P.; Ma, Edmond S.K.; MacInnis, Robert J.; Maishman, Tom; Makalic, Enes; Malone, Kathleen E; Kostovska, Ivana Maleva; Mannermaa, Arto; Manoukian, Siranoush; Manson, JoAnn E.; Margolin, Sara; Mariapun, Shivaani; Martinez, Maria Elena; Matsuo, Keitaro; Mavroudis, Dimitrios; McKay, James; McLean, Catriona; Meijers-Heijboer, Hanne; Meindl, Alfons; Menéndez, Primitiva; Menon, Usha; Meyer, Jeffery; Miao, Hui; Miller, Nicola; Mohd Taib, Nur Aishah; Muir, Kenneth; Mulligan, Anna Marie; Mulot, Claire; Neuhausen, Susan L.; Nevanlinna, Heli; Neven, Patrick; Nielsen, Sune F.; Noh, Dong-Young; Nordestgaard, Børge G.; Norman, Aaron; Olopade, Olufunmilayo I.; Olson, Janet E.; Olsson, Håkan; Olswold, Curtis; Orr, Nick; Pankratz, V. Shane; Park, Sue K.; Park-Simon, Tjoung-Won; Lloyd, Rachel; Perez, Jose I.A.; Peterlongo, Paolo; Peto, Julian; Phillips, Kelly-Anne; Pinchev, Mila; Plaseska-Karanfilska, Dijana; Prentice, Ross; Presneau, Nadege; Prokofieva, Darya; Pugh, Elizabeth; Pylkäs, Katri; Rack, Brigitte; Radice, Paolo; Rahman, Nazneen; Rennert, Gadi; Rennert, Hedy S.; Rhenius, Valerie; Romero, Atocha; Romm, Jane; Ruddy, Kathryn J; Rüdiger, Thomas; Rudolph, Anja; Ruebner, Matthias; Rutgers, Emiel J. Th.; Saloustros, Emmanouil; Sandler, Dale P.; Sangrajrang, Suleeporn; Sawyer, Elinor J.; Schmidt, Daniel F.; Schmutzler, Rita K.; Schneeweiss, Andreas; Schoemaker, Minouk J.; Schumacher, Fredrick; Schürmann, Peter; Scott, Rodney J.; Scott, Christopher; Seal, Sheila; Seynaeve, Caroline; Shah, Mitul; Sharma, Priyanka; Shen, Chen-Yang; Sheng, Grace; Sherman, Mark E.; Shrubsole, Martha J.; Shu, Xiao-Ou; Smeets, Ann; Sohn, Christof; Southey, Melissa C.; Spinelli, John J.; Stegmaier, Christa; Stewart-Brown, Sarah; Stone, Jennifer; Stram, Daniel O.; Surowy, Harald; Swerdlow, Anthony; Tamimi, Rulla; Taylor, Jack A.; Tengström, Maria; Teo, Soo H.; Terry, Mary Beth; Tessier, Daniel C.; Thanasitthichai, Somchai; Thöne, Kathrin; Tollenaar, Rob A.E.M.; Tomlinson, Ian; Tong, Ling; Torres, Diana; Truong, Thérèse; Tseng, Chiu-chen; Tsugane, Shoichiro; Ulmer, Hans-Ulrich; Ursin, Giske; Untch, Michael; Vachon, Celine; van Asperen, Christi J.; Van Den Berg, David; van den Ouweland, Ans M.W.; van der Kolk, Lizet; van der Luijt, Rob B.; Vincent, Daniel; Vollenweider, Jason; Waisfisz, Quinten; Wang-Gohrke, Shan; Weinberg, Clarice R.; Wendt, Camilla; Whittemore, Alice S.; Wildiers, Hans; Willett, Walter; Winqvist, Robert; Wolk, Alicja; Wu, Anna H.; Xia, Lucy; Yamaji, Taiki; Yang, Xiaohong R.; Yip, Cheng Har; Yoo, Keun-Young; Yu, Jyh-Cherng; Zheng, Wei; Zheng, Ying; Zhu, Bin; Ziogas, Argyrios; Ziv, Elad; Lakhani, Sunil R.; Antoniou, Antonis C.; Droit, Arnaud; Andrulis, Irene L.; Amos, Christopher I.; Couch, Fergus J.; Pharoah, Paul D.P.; Chang-Claude, Jenny; Hall, Per; Hunter, David J.; Milne, Roger L.; García-Closas, Montserrat; Schmidt, Marjanka K.; Chanock, Stephen J.; Dunning, Alison M.; Edwards, Stacey L.; Bader, Gary D.; Chenevix-Trench, Georgia; Simard, Jacques; Kraft, Peter; Easton, Douglas F.

2017-01-01

Breast cancer risk is influenced by rare coding variants in susceptibility genes such as BRCA1 and many common, mainly non-coding variants. However, much of the genetic contribution to breast cancer risk remains unknown. We report results from a genome-wide association study (GWAS) of breast cancer in 122,977 cases and 105,974 controls of European ancestry and 14,068 cases and 13,104 controls of East Asian ancestry1. We identified 65 new loci associated with overall breast cancer at p<5x10-8. The majority of credible risk SNPs in the new loci fall in distal regulatory elements, and by integrating in-silico data to predict target genes in breast cells at each locus, we demonstrate a strong overlap between candidate target genes and somatic driver genes in breast tumours. We also find that heritability of breast cancer due to all SNPs in regulatory features was 2-5-fold enriched relative to the genome-wide average, with strong enrichment for particular transcription factor binding sites. These results provide further insight into genetic susceptibility to breast cancer and will improve the utility of genetic risk scores for individualized screening and prevention. PMID:29059683
Enhancer elements upstream of the SHOX gene are active in the developing limb.

PubMed

Durand, Claudia; Bangs, Fiona; Signolet, Jason; Decker, Eva; Tickle, Cheryll; Rappold, Gudrun

2010-05-01

Léri-Weill Dyschondrosteosis (LWD) is a dominant skeletal disorder characterized by short stature and distinct bone anomalies. SHOX gene mutations and deletions of regulatory elements downstream of SHOX resulting in haploinsufficiency have been found in patients with LWD. SHOX encodes a homeodomain transcription factor and is known to be expressed in the developing limb. We have now analyzed the regulatory significance of the region upstream of the SHOX gene. By comparative genomic analyses, we identified several conserved non-coding elements, which subsequently were tested in an in ovo enhancer assay in both chicken limb bud and cornea, where SHOX is also expressed. In this assay, we found three enhancers to be active in the developing chicken limb, but none were functional in the developing cornea. A screening of 60 LWD patients with an intact SHOX coding and downstream region did not yield any deletion of the upstream enhancer region. Thus, we speculate that SHOX upstream deletions occur at a lower frequency because of the structural organization of this genomic region and/or that SHOX upstream deletions may cause a phenotype that differs from the one observed in LWD.
Enhancer elements upstream of the SHOX gene are active in the developing limb

PubMed Central

Durand, Claudia; Bangs, Fiona; Signolet, Jason; Decker, Eva; Tickle, Cheryll; Rappold, Gudrun

2010-01-01

Léri-Weill Dyschondrosteosis (LWD) is a dominant skeletal disorder characterized by short stature and distinct bone anomalies. SHOX gene mutations and deletions of regulatory elements downstream of SHOX resulting in haploinsufficiency have been found in patients with LWD. SHOX encodes a homeodomain transcription factor and is known to be expressed in the developing limb. We have now analyzed the regulatory significance of the region upstream of the SHOX gene. By comparative genomic analyses, we identified several conserved non-coding elements, which subsequently were tested in an in ovo enhancer assay in both chicken limb bud and cornea, where SHOX is also expressed. In this assay, we found three enhancers to be active in the developing chicken limb, but none were functional in the developing cornea. A screening of 60 LWD patients with an intact SHOX coding and downstream region did not yield any deletion of the upstream enhancer region. Thus, we speculate that SHOX upstream deletions occur at a lower frequency because of the structural organization of this genomic region and/or that SHOX upstream deletions may cause a phenotype that differs from the one observed in LWD. PMID:19997128
Association analysis identifies 65 new breast cancer risk loci.

PubMed

Michailidou, Kyriaki; Lindström, Sara; Dennis, Joe; Beesley, Jonathan; Hui, Shirley; Kar, Siddhartha; Lemaçon, Audrey; Soucy, Penny; Glubb, Dylan; Rostamianfar, Asha; Bolla, Manjeet K; Wang, Qin; Tyrer, Jonathan; Dicks, Ed; Lee, Andrew; Wang, Zhaoming; Allen, Jamie; Keeman, Renske; Eilber, Ursula; French, Juliet D; Qing Chen, Xiao; Fachal, Laura; McCue, Karen; McCart Reed, Amy E; Ghoussaini, Maya; Carroll, Jason S; Jiang, Xia; Finucane, Hilary; Adams, Marcia; Adank, Muriel A; Ahsan, Habibul; Aittomäki, Kristiina; Anton-Culver, Hoda; Antonenkova, Natalia N; Arndt, Volker; Aronson, Kristan J; Arun, Banu; Auer, Paul L; Bacot, François; Barrdahl, Myrto; Baynes, Caroline; Beckmann, Matthias W; Behrens, Sabine; Benitez, Javier; Bermisheva, Marina; Bernstein, Leslie; Blomqvist, Carl; Bogdanova, Natalia V; Bojesen, Stig E; Bonanni, Bernardo; Børresen-Dale, Anne-Lise; Brand, Judith S; Brauch, Hiltrud; Brennan, Paul; Brenner, Hermann; Brinton, Louise; Broberg, Per; Brock, Ian W; Broeks, Annegien; Brooks-Wilson, Angela; Brucker, Sara Y; Brüning, Thomas; Burwinkel, Barbara; Butterbach, Katja; Cai, Qiuyin; Cai, Hui; Caldés, Trinidad; Canzian, Federico; Carracedo, Angel; Carter, Brian D; Castelao, Jose E; Chan, Tsun L; David Cheng, Ting-Yuan; Seng Chia, Kee; Choi, Ji-Yeob; Christiansen, Hans; Clarke, Christine L; Collée, Margriet; Conroy, Don M; Cordina-Duverger, Emilie; Cornelissen, Sten; Cox, David G; Cox, Angela; Cross, Simon S; Cunningham, Julie M; Czene, Kamila; Daly, Mary B; Devilee, Peter; Doheny, Kimberly F; Dörk, Thilo; Dos-Santos-Silva, Isabel; Dumont, Martine; Durcan, Lorraine; Dwek, Miriam; Eccles, Diana M; Ekici, Arif B; Eliassen, A Heather; Ellberg, Carolina; Elvira, Mingajeva; Engel, Christoph; Eriksson, Mikael; Fasching, Peter A; Figueroa, Jonine; Flesch-Janys, Dieter; Fletcher, Olivia; Flyger, Henrik; Fritschi, Lin; Gaborieau, Valerie; Gabrielson, Marike; Gago-Dominguez, Manuela; Gao, Yu-Tang; Gapstur, Susan M; García-Sáenz, José A; Gaudet, Mia M; Georgoulias, Vassilios; Giles, Graham G; Glendon, Gord; Goldberg, Mark S; Goldgar, David E; González-Neira, Anna; Grenaker Alnæs, Grethe I; Grip, Mervi; Gronwald, Jacek; Grundy, Anne; Guénel, Pascal; Haeberle, Lothar; Hahnen, Eric; Haiman, Christopher A; Håkansson, Niclas; Hamann, Ute; Hamel, Nathalie; Hankinson, Susan; Harrington, Patricia; Hart, Steven N; Hartikainen, Jaana M; Hartman, Mikael; Hein, Alexander; Heyworth, Jane; Hicks, Belynda; Hillemanns, Peter; Ho, Dona N; Hollestelle, Antoinette; Hooning, Maartje J; Hoover, Robert N; Hopper, John L; Hou, Ming-Feng; Hsiung, Chia-Ni; Huang, Guanmengqian; Humphreys, Keith; Ishiguro, Junko; Ito, Hidemi; Iwasaki, Motoki; Iwata, Hiroji; Jakubowska, Anna; Janni, Wolfgang; John, Esther M; Johnson, Nichola; Jones, Kristine; Jones, Michael; Jukkola-Vuorinen, Arja; Kaaks, Rudolf; Kabisch, Maria; Kaczmarek, Katarzyna; Kang, Daehee; Kasuga, Yoshio; Kerin, Michael J; Khan, Sofia; Khusnutdinova, Elza; Kiiski, Johanna I; Kim, Sung-Won; Knight, Julia A; Kosma, Veli-Matti; Kristensen, Vessela N; Krüger, Ute; Kwong, Ava; Lambrechts, Diether; Le Marchand, Loic; Lee, Eunjung; Lee, Min Hyuk; Lee, Jong Won; Neng Lee, Chuen; Lejbkowicz, Flavio; Li, Jingmei; Lilyquist, Jenna; Lindblom, Annika; Lissowska, Jolanta; Lo, Wing-Yee; Loibl, Sibylle; Long, Jirong; Lophatananon, Artitaya; Lubinski, Jan; Luccarini, Craig; Lux, Michael P; Ma, Edmond S K; MacInnis, Robert J; Maishman, Tom; Makalic, Enes; Malone, Kathleen E; Kostovska, Ivana Maleva; Mannermaa, Arto; Manoukian, Siranoush; Manson, JoAnn E; Margolin, Sara; Mariapun, Shivaani; Martinez, Maria Elena; Matsuo, Keitaro; Mavroudis, Dimitrios; McKay, James; McLean, Catriona; Meijers-Heijboer, Hanne; Meindl, Alfons; Menéndez, Primitiva; Menon, Usha; Meyer, Jeffery; Miao, Hui; Miller, Nicola; Taib, Nur Aishah Mohd; Muir, Kenneth; Mulligan, Anna Marie; Mulot, Claire; Neuhausen, Susan L; Nevanlinna, Heli; Neven, Patrick; Nielsen, Sune F; Noh, Dong-Young; Nordestgaard, Børge G; Norman, Aaron; Olopade, Olufunmilayo I; Olson, Janet E; Olsson, Håkan; Olswold, Curtis; Orr, Nick; Pankratz, V Shane; Park, Sue K; Park-Simon, Tjoung-Won; Lloyd, Rachel; Perez, Jose I A; Peterlongo, Paolo; Peto, Julian; Phillips, Kelly-Anne; Pinchev, Mila; Plaseska-Karanfilska, Dijana; Prentice, Ross; Presneau, Nadege; Prokofyeva, Darya; Pugh, Elizabeth; Pylkäs, Katri; Rack, Brigitte; Radice, Paolo; Rahman, Nazneen; Rennert, Gadi; Rennert, Hedy S; Rhenius, Valerie; Romero, Atocha; Romm, Jane; Ruddy, Kathryn J; Rüdiger, Thomas; Rudolph, Anja; Ruebner, Matthias; Rutgers, Emiel J T; Saloustros, Emmanouil; Sandler, Dale P; Sangrajrang, Suleeporn; Sawyer, Elinor J; Schmidt, Daniel F; Schmutzler, Rita K; Schneeweiss, Andreas; Schoemaker, Minouk J; Schumacher, Fredrick; Schürmann, Peter; Scott, Rodney J; Scott, Christopher; Seal, Sheila; Seynaeve, Caroline; Shah, Mitul; Sharma, Priyanka; Shen, Chen-Yang; Sheng, Grace; Sherman, Mark E; Shrubsole, Martha J; Shu, Xiao-Ou; Smeets, Ann; Sohn, Christof; Southey, Melissa C; Spinelli, John J; Stegmaier, Christa; Stewart-Brown, Sarah; Stone, Jennifer; Stram, Daniel O; Surowy, Harald; Swerdlow, Anthony; Tamimi, Rulla; Taylor, Jack A; Tengström, Maria; Teo, Soo H; Beth Terry, Mary; Tessier, Daniel C; Thanasitthichai, Somchai; Thöne, Kathrin; Tollenaar, Rob A E M; Tomlinson, Ian; Tong, Ling; Torres, Diana; Truong, Thérèse; Tseng, Chiu-Chen; Tsugane, Shoichiro; Ulmer, Hans-Ulrich; Ursin, Giske; Untch, Michael; Vachon, Celine; van Asperen, Christi J; Van Den Berg, David; van den Ouweland, Ans M W; van der Kolk, Lizet; van der Luijt, Rob B; Vincent, Daniel; Vollenweider, Jason; Waisfisz, Quinten; Wang-Gohrke, Shan; Weinberg, Clarice R; Wendt, Camilla; Whittemore, Alice S; Wildiers, Hans; Willett, Walter; Winqvist, Robert; Wolk, Alicja; Wu, Anna H; Xia, Lucy; Yamaji, Taiki; Yang, Xiaohong R; Har Yip, Cheng; Yoo, Keun-Young; Yu, Jyh-Cherng; Zheng, Wei; Zheng, Ying; Zhu, Bin; Ziogas, Argyrios; Ziv, Elad; Lakhani, Sunil R; Antoniou, Antonis C; Droit, Arnaud; Andrulis, Irene L; Amos, Christopher I; Couch, Fergus J; Pharoah, Paul D P; Chang-Claude, Jenny; Hall, Per; Hunter, David J; Milne, Roger L; García-Closas, Montserrat; Schmidt, Marjanka K; Chanock, Stephen J; Dunning, Alison M; Edwards, Stacey L; Bader, Gary D; Chenevix-Trench, Georgia; Simard, Jacques; Kraft, Peter; Easton, Douglas F

2017-11-02

Breast cancer risk is influenced by rare coding variants in susceptibility genes, such as BRCA1, and many common, mostly non-coding variants. However, much of the genetic contribution to breast cancer risk remains unknown. Here we report the results of a genome-wide association study of breast cancer in 122,977 cases and 105,974 controls of European ancestry and 14,068 cases and 13,104 controls of East Asian ancestry. We identified 65 new loci that are associated with overall breast cancer risk at P < 5 × 10 -8 . The majority of credible risk single-nucleotide polymorphisms in these loci fall in distal regulatory elements, and by integrating in silico data to predict target genes in breast cells at each locus, we demonstrate a strong overlap between candidate target genes and somatic driver genes in breast tumours. We also find that heritability of breast cancer due to all single-nucleotide polymorphisms in regulatory features was 2-5-fold enriched relative to the genome-wide average, with strong enrichment for particular transcription factor binding sites. These results provide further insight into genetic susceptibility to breast cancer and will improve the use of genetic risk scores for individualized screening and prevention.
The Evolution of Bony Vertebrate Enhancers at Odds with Their Coding Sequence Landscape.

PubMed

Yousaf, Aisha; Sohail Raza, Muhammad; Ali Abbasi, Amir

2015-08-06

Enhancers lie at the heart of transcriptional and developmental gene regulation. Therefore, changes in enhancer sequences usually disrupt the target gene expression and result in disease phenotypes. Despite the well-established role of enhancers in development and disease, evolutionary sequence studies are lacking. The current study attempts to unravel the puzzle of bony vertebrates' conserved noncoding elements (CNE) enhancer evolution. Bayesian phylogenetics of enhancer sequences spotlights promising interordinal relationships among placental mammals, proposing a closer relationship between humans and laurasiatherians while placing rodents at the basal position. Clock-based estimates of enhancer evolution provided a dynamic picture of interspecific rate changes across the bony vertebrate lineage. Moreover, coelacanth in the study augmented our appreciation of the vertebrate cis-regulatory evolution during water-land transition. Intriguingly, we observed a pronounced upsurge in enhancer evolution in land-dwelling vertebrates. These novel findings triggered us to further investigate the evolutionary trend of coding as well as CNE nonenhancer repertoires, to highlight the relative evolutionary dynamics of diverse genomic landscapes. Surprisingly, the evolutionary rates of enhancer sequences were clearly at odds with those of the coding and the CNE nonenhancer sequences during vertebrate adaptation to land, with land vertebrates exhibiting significantly reduced rates of coding sequence evolution in comparison to their fast evolving regulatory landscape. The observed variation in tetrapod cis-regulatory elements caused the fine-tuning of associated gene regulatory networks. Therefore, the increased evolutionary rate of tetrapods' enhancer sequences might be responsible for the variation in developmental regulatory circuits during the process of vertebrate adaptation to land. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
The Reference Genome of the Halophytic Plant Eutrema salsugineum

PubMed Central

Yang, Ruolin; Jarvis, David E.; Chen, Hao; Beilstein, Mark A.; Grimwood, Jane; Jenkins, Jerry; Shu, ShengQiang; Prochnik, Simon; Xin, Mingming; Ma, Chuang; Schmutz, Jeremy; Wing, Rod A.; Mitchell-Olds, Thomas; Schumaker, Karen S.; Wang, Xiangfeng

2013-01-01

Halophytes are plants that can naturally tolerate high concentrations of salt in the soil, and their tolerance to salt stress may occur through various evolutionary and molecular mechanisms. Eutrema salsugineum is a halophytic species in the Brassicaceae that can naturally tolerate multiple types of abiotic stresses that typically limit crop productivity, including extreme salinity and cold. It has been widely used as a laboratorial model for stress biology research in plants. Here, we present the reference genome sequence (241 Mb) of E. salsugineum at 8× coverage sequenced using the traditional Sanger sequencing-based approach with comparison to its close relative Arabidopsis thaliana. The E. salsugineum genome contains 26,531 protein-coding genes and 51.4% of its genome is composed of repetitive sequences that mostly reside in pericentromeric regions. Comparative analyses of the genome structures, protein-coding genes, microRNAs, stress-related pathways, and estimated translation efficiency of proteins between E. salsugineum and A. thaliana suggest that halophyte adaptation to environmental stresses may occur via a global network adjustment of multiple regulatory mechanisms. The E. salsugineum genome provides a resource to identify naturally occurring genetic alterations contributing to the adaptation of halophytic plants to salinity and that might be bioengineered in related crop species. PMID:23518688
Imprinted and X-linked non-coding RNAs as potential regulators of human placental function

PubMed Central

Buckberry, Sam; Bianco-Miotto, Tina; Roberts, Claire T

2014-01-01

Pregnancy outcome is inextricably linked to placental development, which is strictly controlled temporally and spatially through mechanisms that are only partially understood. However, increasing evidence suggests non-coding RNAs (ncRNAs) direct and regulate a considerable number of biological processes and therefore may constitute a previously hidden layer of regulatory information in the placenta. Many ncRNAs, including both microRNAs and long non-coding transcripts, show almost exclusive or predominant expression in the placenta compared with other somatic tissues and display altered expression patterns in placentas from complicated pregnancies. In this review, we explore the results of recent genome-scale and single gene expression studies using human placental tissue, but include studies in the mouse where human data are lacking. Our review focuses on the ncRNAs epigenetically regulated through genomic imprinting or X-chromosome inactivation and includes recent evidence surrounding the H19 lincRNA, the imprinted C19MC cluster microRNAs, and X-linked miRNAs associated with pregnancy complications. PMID:24081302
Modelling Human Regulatory Variation in Mouse: Finding the Function in Genome-Wide Association Studies and Whole-Genome Sequencing

PubMed Central

Schmouth, Jean-François; Bonaguro, Russell J.; Corso-Diaz, Ximena; Simpson, Elizabeth M.

2012-01-01

An increasing body of literature from genome-wide association studies and human whole-genome sequencing highlights the identification of large numbers of candidate regulatory variants of potential therapeutic interest in numerous diseases. Our relatively poor understanding of the functions of non-coding genomic sequence, and the slow and laborious process of experimental validation of the functional significance of human regulatory variants, limits our ability to fully benefit from this information in our efforts to comprehend human disease. Humanized mouse models (HuMMs), in which human genes are introduced into the mouse, suggest an approach to this problem. In the past, HuMMs have been used successfully to study human disease variants; e.g., the complex genetic condition arising from Down syndrome, common monogenic disorders such as Huntington disease and β-thalassemia, and cancer susceptibility genes such as BRCA1. In this commentary, we highlight a novel method for high-throughput single-copy site-specific generation of HuMMs entitled High-throughput Human Genes on the X Chromosome (HuGX). This method can be applied to most human genes for which a bacterial artificial chromosome (BAC) construct can be derived and a mouse-null allele exists. This strategy comprises (1) the use of recombineering technology to create a human variant–harbouring BAC, (2) knock-in of this BAC into the mouse genome using Hprt docking technology, and (3) allele comparison by interspecies complementation. We demonstrate the throughput of the HuGX method by generating a series of seven different alleles for the human NR2E1 gene at Hprt. In future challenges, we consider the current limitations of experimental approaches and call for a concerted effort by the genetics community, for both human and mouse, to solve the challenge of the functional analysis of human regulatory variation. PMID:22396661
G-quadruplex prediction in E. coli genome reveals a conserved putative G-quadruplex-Hairpin-Duplex switch.

PubMed

Kaplan, Oktay I; Berber, Burak; Hekim, Nezih; Doluca, Osman

2016-11-02

Many studies show that short non-coding sequences are widely conserved among regulatory elements. More and more conserved sequences are being discovered since the development of next generation sequencing technology. A common approach to identify conserved sequences with regulatory roles relies on topological changes such as hairpin formation at the DNA or RNA level. G-quadruplexes, non-canonical nucleic acid topologies with little established biological roles, are increasingly considered for conserved regulatory element discovery. Since the tertiary structure of G-quadruplexes is strongly dependent on the loop sequence which is disregarded by the generally accepted algorithm, we hypothesized that G-quadruplexes with similar topology and, indirectly, similar interaction patterns, can be determined using phylogenetic clustering based on differences in the loop sequences. Phylogenetic analysis of 52 G-quadruplex forming sequences in the Escherichia coli genome revealed two conserved G-quadruplex motifs with a potential regulatory role. Further analysis revealed that both motifs tend to form hairpins and G quadruplexes, as supported by circular dichroism studies. The phylogenetic analysis as described in this work can greatly improve the discovery of functional G-quadruplex structures and may explain unknown regulatory patterns. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Comprehensive Rare Variant Analysis via Whole-Genome Sequencing to Determine the Molecular Pathology of Inherited Retinal Disease.

PubMed

Carss, Keren J; Arno, Gavin; Erwood, Marie; Stephens, Jonathan; Sanchis-Juan, Alba; Hull, Sarah; Megy, Karyn; Grozeva, Detelina; Dewhurst, Eleanor; Malka, Samantha; Plagnol, Vincent; Penkett, Christopher; Stirrups, Kathleen; Rizzo, Roberta; Wright, Genevieve; Josifova, Dragana; Bitner-Glindzicz, Maria; Scott, Richard H; Clement, Emma; Allen, Louise; Armstrong, Ruth; Brady, Angela F; Carmichael, Jenny; Chitre, Manali; Henderson, Robert H H; Hurst, Jane; MacLaren, Robert E; Murphy, Elaine; Paterson, Joan; Rosser, Elisabeth; Thompson, Dorothy A; Wakeling, Emma; Ouwehand, Willem H; Michaelides, Michel; Moore, Anthony T; Webster, Andrew R; Raymond, F Lucy

2017-01-05

Inherited retinal disease is a common cause of visual impairment and represents a highly heterogeneous group of conditions. Here, we present findings from a cohort of 722 individuals with inherited retinal disease, who have had whole-genome sequencing (n = 605), whole-exome sequencing (n = 72), or both (n = 45) performed, as part of the NIHR-BioResource Rare Diseases research study. We identified pathogenic variants (single-nucleotide variants, indels, or structural variants) for 404/722 (56%) individuals. Whole-genome sequencing gives unprecedented power to detect three categories of pathogenic variants in particular: structural variants, variants in GC-rich regions, which have significantly improved coverage compared to whole-exome sequencing, and variants in non-coding regulatory regions. In addition to previously reported pathogenic regulatory variants, we have identified a previously unreported pathogenic intronic variant in CHM in two males with choroideremia. We have also identified 19 genes not previously known to be associated with inherited retinal disease, which harbor biallelic predicted protein-truncating variants in unsolved cases. Whole-genome sequencing is an increasingly important comprehensive method with which to investigate the genetic causes of inherited retinal disease. Copyright © 2017. Published by Elsevier Inc.
Regulatory Architecture of Gene Expression Variation in the Threespine Stickleback Gasterosteus aculeatus.

PubMed

Pritchard, Victoria L; Viitaniemi, Heidi M; McCairns, R J Scott; Merilä, Juha; Nikinmaa, Mikko; Primmer, Craig R; Leder, Erica H

2017-01-05

Much adaptive evolutionary change is underlain by mutational variation in regions of the genome that regulate gene expression rather than in the coding regions of the genes themselves. An understanding of the role of gene expression variation in facilitating local adaptation will be aided by an understanding of underlying regulatory networks. Here, we characterize the genetic architecture of gene expression variation in the threespine stickleback (Gasterosteus aculeatus), an important model in the study of adaptive evolution. We collected transcriptomic and genomic data from 60 half-sib families using an expression microarray and genotyping-by-sequencing, and located expression quantitative trait loci (eQTL) underlying the variation in gene expression in liver tissue using an interval mapping approach. We identified eQTL for several thousand expression traits. Expression was influenced by polymorphism in both cis- and trans-regulatory regions. Trans-eQTL clustered into hotspots. We did not identify master transcriptional regulators in hotspot locations: rather, the presence of hotspots may be driven by complex interactions between multiple transcription factors. One observed hotspot colocated with a QTL recently found to underlie salinity tolerance in the threespine stickleback. However, most other observed hotspots did not colocate with regions of the genome known to be involved in adaptive divergence between marine and freshwater habitats. Copyright © 2017 Pritchard et al.
Regulatory Architecture of Gene Expression Variation in the Threespine Stickleback Gasterosteus aculeatus

PubMed Central

Pritchard, Victoria L.; Viitaniemi, Heidi M.; McCairns, R. J. Scott; Merilä, Juha; Nikinmaa, Mikko; Primmer, Craig R.; Leder, Erica H.

2016-01-01

Much adaptive evolutionary change is underlain by mutational variation in regions of the genome that regulate gene expression rather than in the coding regions of the genes themselves. An understanding of the role of gene expression variation in facilitating local adaptation will be aided by an understanding of underlying regulatory networks. Here, we characterize the genetic architecture of gene expression variation in the threespine stickleback (Gasterosteus aculeatus), an important model in the study of adaptive evolution. We collected transcriptomic and genomic data from 60 half-sib families using an expression microarray and genotyping-by-sequencing, and located expression quantitative trait loci (eQTL) underlying the variation in gene expression in liver tissue using an interval mapping approach. We identified eQTL for several thousand expression traits. Expression was influenced by polymorphism in both cis- and trans-regulatory regions. Trans-eQTL clustered into hotspots. We did not identify master transcriptional regulators in hotspot locations: rather, the presence of hotspots may be driven by complex interactions between multiple transcription factors. One observed hotspot colocated with a QTL recently found to underlie salinity tolerance in the threespine stickleback. However, most other observed hotspots did not colocate with regions of the genome known to be involved in adaptive divergence between marine and freshwater habitats. PMID:27836907
Genome-wide chromatin and gene expression profiling during memory formation and maintenance in adult mice.

PubMed

Centeno, Tonatiuh Pena; Shomroni, Orr; Hennion, Magali; Halder, Rashi; Vidal, Ramon; Rahman, Raza-Ur; Bonn, Stefan

2016-10-11

Recent evidence suggests that the formation and maintenance of memory requires epigenetic changes. In an effort to understand the spatio-temporal extent of learning and memory-related epigenetic changes we have charted genome-wide histone and DNA methylation profiles, in two different brain regions, two cell types, and three time-points, before and after learning. In this data descriptor we provide detailed information on data generation, give insights into the rationale of experiments, highlight necessary steps to assess data quality, offer guidelines for future use of the data and supply ready-to-use code to replicate the analysis results. The data provides a blueprint of the gene regulatory network underlying short- and long-term memory formation and maintenance. This 'healthy' gene regulatory network of learning can now be compared to changes in neurological or psychiatric diseases, providing mechanistic insights into brain disorders and highlighting potential therapeutic avenues.
Divergent evolutionary rates in vertebrate and mammalian specific conserved non-coding elements (CNEs) in echolocating mammals.

PubMed

Davies, Kalina T J; Tsagkogeorga, Georgia; Rossiter, Stephen J

2014-12-19

The majority of DNA contained within vertebrate genomes is non-coding, with a certain proportion of this thought to play regulatory roles during development. Conserved Non-coding Elements (CNEs) are an abundant group of putative regulatory sequences that are highly conserved across divergent groups and thus assumed to be under strong selective constraint. Many CNEs may contain regulatory factor binding sites, and their frequent spatial association with key developmental genes - such as those regulating sensory system development - suggests crucial roles in regulating gene expression and cellular patterning. Yet surprisingly little is known about the molecular evolution of CNEs across diverse mammalian taxa or their role in specific phenotypic adaptations. We examined 3,110 vertebrate-specific and ~82,000 mammalian-specific CNEs across 19 and 9 mammalian orders respectively, and tested for changes in the rate of evolution of CNEs located in the proximity of genes underlying the development or functioning of auditory systems. As we focused on CNEs putatively associated with genes underlying the development/functioning of auditory systems, we incorporated echolocating taxa in our dataset because of their highly specialised and derived auditory systems. Phylogenetic reconstructions of concatenated CNEs broadly recovered accepted mammal relationships despite high levels of sequence conservation. We found that CNE substitution rates were highest in rodents and lowest in primates, consistent with previous findings. Comparisons of CNE substitution rates from several genomic regions containing genes linked to auditory system development and hearing revealed differences between echolocating and non-echolocating taxa. Wider taxonomic sampling of four CNEs associated with the homeobox genes Hmx2 and Hmx3 - which are required for inner ear development - revealed family-wise variation across diverse bat species. Specifically within one family of echolocating bats that utilise frequency-modulated echolocation calls varying widely in frequency and intensity high levels of sequence divergence were found. Levels of selective constraint acting on CNEs differed both across genomic locations and taxa, with observed variation in substitution rates of CNEs among bat species. More work is needed to determine whether this variation can be linked to echolocation, and wider taxonomic sampling is necessary to fully document levels of conservation in CNEs across diverse taxa.
Bioinformatics of prokaryotic RNAs

PubMed Central

Backofen, Rolf; Amman, Fabian; Costa, Fabrizio; Findeiß, Sven; Richter, Andreas S; Stadler, Peter F

2014-01-01

The genome of most prokaryotes gives rise to surprisingly complex transcriptomes, comprising not only protein-coding mRNAs, often organized as operons, but also harbors dozens or even hundreds of highly structured small regulatory RNAs and unexpectedly large levels of anti-sense transcripts. Comprehensive surveys of prokaryotic transcriptomes and the need to characterize also their non-coding components is heavily dependent on computational methods and workflows, many of which have been developed or at least adapted specifically for the use with bacterial and archaeal data. This review provides an overview on the state-of-the-art of RNA bioinformatics focusing on applications to prokaryotes. PMID:24755880
Convergent functional genomics of psychiatric disorders.

PubMed

Niculescu, Alexander B

2013-10-01

Genetic and gene expression studies, in humans and animal models of psychiatric and other medical disorders, are becoming increasingly integrated. Particularly for genomics, the convergence and integration of data across species, experimental modalities and technical platforms is providing a fit-to-disease way of extracting reproducible and biologically important signal, in contrast to the fit-to-cohort effect and limited reproducibility of human genetic analyses alone. With the advent of whole-genome sequencing and the realization that a major portion of the non-coding genome may contain regulatory variants, Convergent Functional Genomics (CFG) approaches are going to be essential to identify disease-relevant signal from the tremendous polymorphic variation present in the general population. Such work in psychiatry can provide an example of how to address other genetically complex disorders, and in turn will benefit by incorporating concepts from other areas, such as cancer, cardiovascular diseases, and diabetes. © 2013 Wiley Periodicals, Inc.

Formation of new chromatin domains determines pathogenicity of genomic duplications.

PubMed

Franke, Martin; Ibrahim, Daniel M; Andrey, Guillaume; Schwarzer, Wibke; Heinrich, Verena; Schöpflin, Robert; Kraft, Katerina; Kempfer, Rieke; Jerković, Ivana; Chan, Wing-Lee; Spielmann, Malte; Timmermann, Bernd; Wittler, Lars; Kurth, Ingo; Cambiaso, Paola; Zuffardi, Orsetta; Houge, Gunnar; Lambie, Lindsay; Brancati, Francesco; Pombo, Ana; Vingron, Martin; Spitz, Francois; Mundlos, Stefan

2016-10-13

Chromosome conformation capture methods have identified subchromosomal structures of higher-order chromatin interactions called topologically associated domains (TADs) that are separated from each other by boundary regions. By subdividing the genome into discrete regulatory units, TADs restrict the contacts that enhancers establish with their target genes. However, the mechanisms that underlie partitioning of the genome into TADs remain poorly understood. Here we show by chromosome conformation capture (capture Hi-C and 4C-seq methods) that genomic duplications in patient cells and genetically modified mice can result in the formation of new chromatin domains (neo-TADs) and that this process determines their molecular pathology. Duplications of non-coding DNA within the mouse Sox9 TAD (intra-TAD) that cause female to male sex reversal in humans, showed increased contact of the duplicated regions within the TAD, but no change in the overall TAD structure. In contrast, overlapping duplications that extended over the next boundary into the neighbouring TAD (inter-TAD), resulted in the formation of a new chromatin domain (neo-TAD) that was isolated from the rest of the genome. As a consequence of this insulation, inter-TAD duplications had no phenotypic effect. However, incorporation of the next flanking gene, Kcnj2, in the neo-TAD resulted in ectopic contacts of Kcnj2 with the duplicated part of the Sox9 regulatory region, consecutive misexpression of Kcnj2, and a limb malformation phenotype. Our findings provide evidence that TADs are genomic regulatory units with a high degree of internal stability that can be sculptured by structural genomic variations. This process is important for the interpretation of copy number variations, as these variations are routinely detected in diagnostic tests for genetic disease and cancer. This finding also has relevance in an evolutionary setting because copy-number differences are thought to have a crucial role in the evolution of genome complexity.
Genomicus update 2015: KaryoView and MatrixView provide a genome-wide perspective to multispecies comparative genomics

PubMed Central

Louis, Alexandra; Nguyen, Nga Thi Thuy; Muffato, Matthieu; Roest Crollius, Hugues

2015-01-01

The Genomicus web server (http://www.genomicus.biologie.ens.fr/genomicus) is a visualization tool allowing comparative genomics in four different phyla (Vertebrate, Fungi, Metazoan and Plants). It provides access to genomic information from extant species, as well as ancestral gene content and gene order for vertebrates and flowering plants. Here we present the new features available for vertebrate genome with a focus on new graphical tools. The interface to enter the database has been improved, two pairwise genome comparison tools are now available (KaryoView and MatrixView) and the multiple genome comparison tools (PhyloView and AlignView) propose three new kinds of representation and a more intuitive menu. These new developments have been implemented for Genomicus portal dedicated to vertebrates. This allows the analysis of 68 extant animal genomes, as well as 58 ancestral reconstructed genomes. The Genomicus server also provides access to ancestral gene orders, to facilitate evolutionary and comparative genomics studies, as well as computationally predicted regulatory interactions, thanks to the representation of conserved non-coding elements with their putative gene targets. PMID:25378326
Genome Sequence of the Mesophilic Thermotogales Bacterium Mesotoga prima MesG1.Ag.4.2 Reveals the Largest Thermotogales Genome To Date

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhaxybayeva, Olga; Swithers, Kristen S; Foght, Julia

2012-01-01

Here we describe the genome of Mesotoga prima MesG1.Ag4.2, the first genome of a mesophilic Thermotogales bacterium. Mesotoga prima was isolated from a polychlorinated biphenyl (PCB)-dechlorinating enrichment culture from Baltimore Harbor sediments. Its 2.97 Mb genome is considerably larger than any previously sequenced Thermotogales genomes, which range between 1.86 and 2.30 Mb. This larger size is due to both higher numbers of protein-coding genes and larger intergenic regions. In particular, the M. prima genome contains more genes for proteins involved in regulatory functions, for instance those involved in regulation of transcription. Together with its closest relative, Kosmotoga olearia, it alsomore » encodes different types of proteins involved in environmental and cell-cell interactions as compared with other Thermotogales bacteria. Amino acid composition analysis of M. prima proteins implies that this lineage has inhabited low-temperature environments for a long time. A large fraction of the M. prima genome has been acquired by lateral gene transfer (LGT): a DarkHorse analysis suggests that 766 (32%) of predicted protein-coding genes have been involved in LGT after Mesotoga diverged from the other Thermotogales lineages. A notable example of a lineage-specific LGT event is a reductive dehalogenase gene - a key enzyme in dehalorespiration, indicating M. prima may have a more active role in PCB dechlorination than was previously assumed.« less
Genes uniquely expressed in human growth plate chondrocytes uncover a distinct regulatory network.

PubMed

Li, Bing; Balasubramanian, Karthika; Krakow, Deborah; Cohn, Daniel H

2017-12-20

Chondrogenesis is the earliest stage of skeletal development and is a highly dynamic process, integrating the activities and functions of transcription factors, cell signaling molecules and extracellular matrix proteins. The molecular mechanisms underlying chondrogenesis have been extensively studied and multiple key regulators of this process have been identified. However, a genome-wide overview of the gene regulatory network in chondrogenesis has not been achieved. In this study, employing RNA sequencing, we identified 332 protein coding genes and 34 long non-coding RNA (lncRNA) genes that are highly selectively expressed in human fetal growth plate chondrocytes. Among the protein coding genes, 32 genes were associated with 62 distinct human skeletal disorders and 153 genes were associated with skeletal defects in knockout mice, confirming their essential roles in skeletal formation. These gene products formed a comprehensive physical interaction network and participated in multiple cellular processes regulating skeletal development. The data also revealed 34 transcription factors and 11,334 distal enhancers that were uniquely active in chondrocytes, functioning as transcriptional regulators for the cartilage-selective genes. Our findings revealed a complex gene regulatory network controlling skeletal development whereby transcription factors, enhancers and lncRNAs participate in chondrogenesis by transcriptional regulation of key genes. Additionally, the cartilage-selective genes represent candidate genes for unsolved human skeletal disorders.
A CRISPR Path to Engineering New Genetic Mouse Models for Cardiovascular Research

PubMed Central

Miano, Joseph M.; Zhu, Qiuyu Martin; Lowenstein, Charles J.

2016-01-01

Previous efforts to target the mouse genome for the addition, subtraction, or substitution of biologically informative sequences required complex vector design and a series of arduous steps only a handful of labs could master. The facile and inexpensive clustered regularly interspaced short palindromic repeats (CRISPR) method has now superseded traditional means of genome modification such that virtually any lab can quickly assemble reagents for developing new mouse models for cardiovascular research. Here we briefly review the history of CRISPR in prokaryotes, highlighting major discoveries leading to its formulation for genome modification in the animal kingdom. Core components of CRISPR technology are reviewed and updated. Practical pointers for two-component and three-component CRISPR editing are summarized with a number of applications in mice including frameshift mutations, deletion of enhancers and non-coding genes, nucleotide substitution of protein-coding and gene regulatory sequences, incorporation of loxP sites for conditional gene inactivation, and epitope tag integration. Genotyping strategies are presented and topics of genetic mosaicism and inadvertent targeting discussed. Finally, clinical applications and ethical considerations are addressed as the biomedical community eagerly embraces this astonishing innovation in genome editing to tackle previously intractable questions. PMID:27102963
Understanding Neurodevelopmental Disorders: The Promise of Regulatory Variation in the 3'UTRome.

PubMed

Wanke, Kai A; Devanna, Paolo; Vernes, Sonja C

2018-04-01

Neurodevelopmental disorders have a strong genetic component, but despite widespread efforts, the specific genetic factors underlying these disorders remain undefined for a large proportion of affected individuals. Given the accessibility of exome sequencing, this problem has thus far been addressed from a protein-centric standpoint; however, protein-coding regions only make up ∼1% to 2% of the human genome. With the advent of whole genome sequencing we are in the midst of a paradigm shift as it is now possible to interrogate the entire sequence of the human genome (coding and noncoding) to fill in the missing heritability of complex disorders. These new technologies bring new challenges, as the number of noncoding variants identified per individual can be overwhelming, making it prudent to focus on noncoding regions of known function, for which the effects of variation can be predicted and directly tested to assess pathogenicity. The 3'UTRome is a region of the noncoding genome that perfectly fulfills these criteria and is of high interest when searching for pathogenic variation related to complex neurodevelopmental disorders. Herein, we review the regulatory roles of the 3'UTRome as binding sites for microRNAs or RNA binding proteins, or during alternative polyadenylation. We detail existing evidence that these regions contribute to neurodevelopmental disorders and outline strategies for identification and validation of novel putatively pathogenic variation in these regions. This evidence suggests that studying the 3'UTRome will lead to the identification of new risk factors, new candidate disease genes, and a better understanding of the molecular mechanisms contributing to neurodevelopmental disorders. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
Novel microRNA-like viral small regulatory RNAs arising during human hepatitis A virus infection.

PubMed

Shi, Jiandong; Sun, Jing; Wang, Bin; Wu, Meini; Zhang, Jing; Duan, Zhiqing; Wang, Haixuan; Hu, Ningzhu; Hu, Yunzhang

2014-10-01

MicroRNAs (miRNAs), including host miRNAs and viral miRNAs, play vital roles in regulating host-virus interactions. DNA viruses encode miRNAs that regulate the viral life cycle. However, it is generally believed that cytoplasmic RNA viruses do not encode miRNAs, owing to inaccessible cellular miRNA processing machinery. Here, we provide a comprehensive genome-wide analysis and identification of miRNAs that were derived from hepatitis A virus (HAV; Hu/China/H2/1982), which is a typical cytoplasmic RNA virus. Using deep-sequencing and in silico approaches, we identified 2 novel virally encoded miRNAs, named hav-miR-1-5p and hav-miR-2-5p. Both of the novel virally encoded miRNAs were clearly detected in infected cells. Analysis of Dicer enzyme silencing demonstrated that HAV-derived miRNA biogenesis is Dicer dependent. Furthermore, we confirmed that HAV mature miRNAs were generated from viral miRNA precursors (pre-miRNAs) in host cells. Notably, naturally derived HAV miRNAs were biologically and functionally active and induced post-transcriptional gene silencing (PTGS). Genomic location analysis revealed novel miRNAs located in the coding region of the viral genome. Overall, our results show that HAV naturally generates functional miRNA-like small regulatory RNAs during infection. This is the first report of miRNAs derived from the coding region of genomic RNA of a cytoplasmic RNA virus. These observations demonstrate that a cytoplasmic RNA virus can naturally generate functional miRNAs, as DNA viruses do. These findings also contribute to improved understanding of host-RNA virus interactions mediated by RNA virus-derived miRNAs. © FASEB.
Whole-exome/genome sequencing and genomics.

PubMed

Grody, Wayne W; Thompson, Barry H; Hudgins, Louanne

2013-12-01

As medical genetics has progressed from a descriptive entity to one focused on the functional relationship between genes and clinical disorders, emphasis has been placed on genomics. Genomics, a subelement of genetics, is the study of the genome, the sum total of all the genes of an organism. The human genome, which is contained in the 23 pairs of nuclear chromosomes and in the mitochondrial DNA of each cell, comprises >6 billion nucleotides of genetic code. There are some 23,000 protein-coding genes, a surprisingly small fraction of the total genetic material, with the remainder composed of noncoding DNA, regulatory sequences, and introns. The Human Genome Project, launched in 1990, produced a draft of the genome in 2001 and then a finished sequence in 2003, on the 50th anniversary of the initial publication of Watson and Crick's paper on the double-helical structure of DNA. Since then, this mass of genetic information has been translated at an ever-increasing pace into useable knowledge applicable to clinical medicine. The recent advent of massively parallel DNA sequencing (also known as shotgun, high-throughput, and next-generation sequencing) has brought whole-genome analysis into the clinic for the first time, and most of the current applications are directed at children with congenital conditions that are undiagnosable by using standard genetic tests for single-gene disorders. Thus, pediatricians must become familiar with this technology, what it can and cannot offer, and its technical and ethical challenges. Here, we address the concepts of human genomic analysis and its clinical applicability for primary care providers.
Lnc2Meth: a manually curated database of regulatory relationships between long non-coding RNAs and DNA methylation associated with human disease

PubMed Central

Zhi, Hui; Li, Xin; Wang, Peng; Gao, Yue; Gao, Baoqing; Zhou, Dianshuang; Zhang, Yan; Guo, Maoni; Yue, Ming; Shen, Weitao

2018-01-01

Abstract Lnc2Meth (http://www.bio-bigdata.com/Lnc2Meth/), an interactive resource to identify regulatory relationships between human long non-coding RNAs (lncRNAs) and DNA methylation, is not only a manually curated collection and annotation of experimentally supported lncRNAs-DNA methylation associations but also a platform that effectively integrates tools for calculating and identifying the differentially methylated lncRNAs and protein-coding genes (PCGs) in diverse human diseases. The resource provides: (i) advanced search possibilities, e.g. retrieval of the database by searching the lncRNA symbol of interest, DNA methylation patterns, regulatory mechanisms and disease types; (ii) abundant computationally calculated DNA methylation array profiles for the lncRNAs and PCGs; (iii) the prognostic values for each hit transcript calculated from the patients clinical data; (iv) a genome browser to display the DNA methylation landscape of the lncRNA transcripts for a specific type of disease; (v) tools to re-annotate probes to lncRNA loci and identify the differential methylation patterns for lncRNAs and PCGs with user-supplied external datasets; (vi) an R package (LncDM) to complete the differentially methylated lncRNAs identification and visualization with local computers. Lnc2Meth provides a timely and valuable resource that can be applied to significantly expand our understanding of the regulatory relationships between lncRNAs and DNA methylation in various human diseases. PMID:29069510
Rapid evolution of cis-regulatory sequences via local point mutations

NASA Technical Reports Server (NTRS)

Stone, J. R.; Wray, G. A.

2001-01-01

Although the evolution of protein-coding sequences within genomes is well understood, the same cannot be said of the cis-regulatory regions that control transcription. Yet, changes in gene expression are likely to constitute an important component of phenotypic evolution. We simulated the evolution of new transcription factor binding sites via local point mutations. The results indicate that new binding sites appear and become fixed within populations on microevolutionary timescales under an assumption of neutral evolution. Even combinations of two new binding sites evolve very quickly. We predict that local point mutations continually generate considerable genetic variation that is capable of altering gene expression.
The complete mitochondrial genome of the mudsnail Cipangopaludina cathayensis (Gastropoda: Viviparidae).

PubMed

Yang, Huirong; Zhang, Jia-En; Luo, Hao; Luo, Mingzhu; Guo, Jing; Deng, Zhixin; Zhao, Benliang

2016-05-01

We present the complete mitochondrial genome of Cipangopaludina cathayensis in this study. The mitochondrial genome is 17,157 bp in length, containing 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes. All of them are encoded on the heavy strand except 7 tRNA genes on the light strand. Overall nucleotide compositions of the light strand are 44.51% of A, 26.74% of T, 20.48% of C and 8.28% of G. All the protein-coding genes start with ATG initiation codon except ATP6 with ATA and ND4 with TTG, and 2 types of termination codons are TAA (ATP6, ND2, COX1, COX2, ATP8, ND1, ND6, Cytb, COX3, ND4) and TAG (ND4L, ND5, ND3). There are 29 intergenic spacers and 5 gene overlaps. The tandem repeat sequences are observed in COX2, tRNA(Asp), ATP6, tRNA(Cys), S-rRNA, ND1, Cytb, ND4 and COX3 genes. Gene arrangement and distribution are different from the typical vertebrates. The absence of D-loop is consistent with the Gastropoda, but at least one lengthy non-coding region is essential regulatory element for the initiation of transcription and replication.
Tenebrio molitor antifreeze protein gene identification and regulation.

PubMed

Qin, Wensheng; Walker, Virginia K

2006-02-15

The yellow mealworm, Tenebrio molitor, is a freeze susceptible, stored product pest. Its winter survival is facilitated by the accumulation of antifreeze proteins (AFPs), encoded by a small gene family. We have now isolated 11 different AFP genomic clones from 3 genomic libraries. All the clones had a single coding sequence, with no evidence of intervening sequences. Three genomic clones were further characterized. All have putative TATA box sequences upstream of the coding regions and multiple potential poly(A) signal sequences downstream of the coding regions. A TmAFP regulatory region, B1037, conferred transcriptional activity when ligated to a luciferase reporter sequence and after transfection into an insect cell line. A 143 bp core promoter including a TATA box sequence was identified. Its promoter activity was increased 4.4 times by inserting an exotic 245 bp intron into the construct, similar to the enhancement of transgenic expression seen in several other systems. The addition of a duplication of the first 120 bp sequence from the 143 bp core promoter decreased promoter activity by half. Although putative hormonal response sequences were identified, none of the five hormones tested enhanced reporter activity. These studies on the mechanisms of AFP transcriptional control are important for the consideration of any transfer of freeze-resistance phenotypes to beneficial hosts.
Deploying QTL-seq for rapid delineation of a potential candidate gene underlying major trait-associated QTL in chickpea

PubMed Central

Das, Shouvik; Upadhyaya, Hari D.; Bajaj, Deepak; Kujur, Alice; Badoni, Saurabh; Laxmi; Kumar, Vinod; Tripathi, Shailesh; Gowda, C. L. Laxmipathi; Sharma, Shivali; Singh, Sube; Tyagi, Akhilesh K.; Parida, Swarup K.

2015-01-01

A rapid high-resolution genome-wide strategy for molecular mapping of major QTL(s)/gene(s) regulating important agronomic traits is vital for in-depth dissection of complex quantitative traits and genetic enhancement in chickpea. The present study for the first time employed a NGS-based whole-genome QTL-seq strategy to identify one major genomic region harbouring a robust 100-seed weight QTL using an intra-specific 221 chickpea mapping population (desi cv. ICC 7184 × desi cv. ICC 15061). The QTL-seq-derived major SW QTL (CaqSW1.1) was further validated by single-nucleotide polymorphism (SNP) and simple sequence repeat (SSR) marker-based traditional QTL mapping (47.6% R2 at higher LOD >19). This reflects the reliability and efficacy of QTL-seq as a strategy for rapid genome-wide scanning and fine mapping of major trait regulatory QTLs in chickpea. The use of QTL-seq and classical QTL mapping in combination narrowed down the 1.37 Mb (comprising 177 genes) major SW QTL (CaqSW1.1) region into a 35 kb genomic interval on desi chickpea chromosome 1 containing six genes. One coding SNP (G/A)-carrying constitutive photomorphogenic9 (COP9) signalosome complex subunit 8 (CSN8) gene of these exhibited seed-specific expression, including pronounced differential up-/down-regulation in low and high seed weight mapping parents and homozygous individuals during seed development. The coding SNP mined in this potential seed weight-governing candidate CSN8 gene was found to be present exclusively in all cultivated species/genotypes, but not in any wild species/genotypes of primary, secondary and tertiary gene pools. This indicates the effect of strong artificial and/or natural selection pressure on target SW locus during chickpea domestication. The proposed QTL-seq-driven integrated genome-wide strategy has potential to delineate major candidate gene(s) harbouring a robust trait regulatory QTL rapidly with optimal use of resources. This will further assist us to extrapolate the molecular mechanism underlying complex quantitative traits at a genome-wide scale leading to fast-paced marker-assisted genetic improvement in diverse crop plants, including chickpea. PMID:25922536
Transcriptome landscape of Lactococcus lactis reveals many novel RNAs including a small regulatory RNA involved in carbon uptake and metabolism.

PubMed

van der Meulen, Sjoerd B; de Jong, Anne; Kok, Jan

2016-01-01

RNA sequencing has revolutionized genome-wide transcriptome analyses, and the identification of non-coding regulatory RNAs in bacteria has thus increased concurrently. Here we reveal the transcriptome map of the lactic acid bacterial paradigm Lactococcus lactis MG1363 by employing differential RNA sequencing (dRNA-seq) and a combination of manual and automated transcriptome mining. This resulted in a high-resolution genome annotation of L. lactis and the identification of 60 cis-encoded antisense RNAs (asRNAs), 186 trans-encoded putative regulatory RNAs (sRNAs) and 134 novel small ORFs. Based on the putative targets of asRNAs, a novel classification is proposed. Several transcription factor DNA binding motifs were identified in the promoter sequences of (a)sRNAs, providing insight in the interplay between lactococcal regulatory RNAs and transcription factors. The presence and lengths of 14 putative sRNAs were experimentally confirmed by differential Northern hybridization, including the abundant RNA 6S that is differentially expressed depending on the available carbon source. For another sRNA, LLMGnc_147, functional analysis revealed that it is involved in carbon uptake and metabolism. L. lactis contains 13% leaderless mRNAs (lmRNAs) that, from an analysis of overrepresentation in GO classes, seem predominantly involved in nucleotide metabolism and DNA/RNA binding. Moreover, an A-rich sequence motif immediately following the start codon was uncovered, which could provide novel insight in the translation of lmRNAs. Altogether, this first experimental genome-wide assessment of the transcriptome landscape of L. lactis and subsequent sRNA studies provide an extensive basis for the investigation of regulatory RNAs in L. lactis and related lactococcal species.
The DNA Methylome of Human Peripheral Blood Mononuclear Cells

PubMed Central

Ye, Mingzhi; Zheng, Hancheng; Yu, Jian; Wu, Honglong; Sun, Jihua; Zhang, Hongyu; Chen, Quan; Luo, Ruibang; Chen, Minfeng; He, Yinghua; Jin, Xin; Zhang, Qinghui; Yu, Chang; Zhou, Guangyu; Sun, Jinfeng; Huang, Yebo; Zheng, Huisong; Cao, Hongzhi; Zhou, Xiaoyu; Guo, Shicheng; Hu, Xueda; Li, Xin; Kristiansen, Karsten; Bolund, Lars; Xu, Jiujin; Wang, Wen; Yang, Huanming; Wang, Jian; Li, Ruiqiang; Beck, Stephan; Wang, Jun; Zhang, Xiuqing

2010-01-01

DNA methylation plays an important role in biological processes in human health and disease. Recent technological advances allow unbiased whole-genome DNA methylation (methylome) analysis to be carried out on human cells. Using whole-genome bisulfite sequencing at 24.7-fold coverage (12.3-fold per strand), we report a comprehensive (92.62%) methylome and analysis of the unique sequences in human peripheral blood mononuclear cells (PBMC) from the same Asian individual whose genome was deciphered in the YH project. PBMC constitute an important source for clinical blood tests world-wide. We found that 68.4% of CpG sites and <0.2% of non-CpG sites were methylated, demonstrating that non-CpG cytosine methylation is minor in human PBMC. Analysis of the PBMC methylome revealed a rich epigenomic landscape for 20 distinct genomic features, including regulatory, protein-coding, non-coding, RNA-coding, and repeat sequences. Integration of our methylome data with the YH genome sequence enabled a first comprehensive assessment of allele-specific methylation (ASM) between the two haploid methylomes of any individual and allowed the identification of 599 haploid differentially methylated regions (hDMRs) covering 287 genes. Of these, 76 genes had hDMRs within 2 kb of their transcriptional start sites of which >80% displayed allele-specific expression (ASE). These data demonstrate that ASM is a recurrent phenomenon and is highly correlated with ASE in human PBMCs. Together with recently reported similar studies, our study provides a comprehensive resource for future epigenomic research and confirms new sequencing technology as a paradigm for large-scale epigenomics studies. PMID:21085693
CORALINA: a universal method for the generation of gRNA libraries for CRISPR-based screening.

PubMed

Köferle, Anna; Worf, Karolina; Breunig, Christopher; Baumann, Valentin; Herrero, Javier; Wiesbeck, Maximilian; Hutter, Lukas H; Götz, Magdalena; Fuchs, Christiane; Beck, Stephan; Stricker, Stefan H

2016-11-14

The bacterial CRISPR system is fast becoming the most popular genetic and epigenetic engineering tool due to its universal applicability and adaptability. The desire to deploy CRISPR-based methods in a large variety of species and contexts has created an urgent need for the development of easy, time- and cost-effective methods enabling large-scale screening approaches. Here we describe CORALINA (comprehensive gRNA library generation through controlled nuclease activity), a method for the generation of comprehensive gRNA libraries for CRISPR-based screens. CORALINA gRNA libraries can be derived from any source of DNA without the need of complex oligonucleotide synthesis. We show the utility of CORALINA for human and mouse genomic DNA, its reproducibility in covering the most relevant genomic features including regulatory, coding and non-coding sequences and confirm the functionality of CORALINA generated gRNAs. The simplicity and cost-effectiveness make CORALINA suitable for any experimental system. The unprecedented sequence complexities obtainable with CORALINA libraries are a necessary pre-requisite for less biased large scale genomic and epigenomic screens.
Horizontal gene transfer of chromosomal Type II toxin-antitoxin systems of Escherichia coli.

PubMed

Ramisetty, Bhaskar Chandra Mohan; Santhosh, Ramachandran Sarojini

2016-02-01

Type II toxin-antitoxin systems (TAs) are small autoregulated bicistronic operons that encode a toxin protein with the potential to inhibit metabolic processes and an antitoxin protein to neutralize the toxin. Most of the bacterial genomes encode multiple TAs. However, the diversity and accumulation of TAs on bacterial genomes and its physiological implications are highly debated. Here we provide evidence that Escherichia coli chromosomal TAs (encoding RNase toxins) are 'acquired' DNA likely originated from heterologous DNA and are the smallest known autoregulated operons with the potential for horizontal propagation. Sequence analyses revealed that integration of TAs into the bacterial genome is unique and contributes to variations in the coding and/or regulatory regions of flanking host genome sequences. Plasmids and genomes encoding identical TAs of natural isolates are mutually exclusive. Chromosomal TAs might play significant roles in the evolution and ecology of bacteria by contributing to host genome variation and by moderation of plasmid maintenance. © FEMS 2015. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Dcode.org anthology of comparative genomic tools.

PubMed

Loots, Gabriela G; Ovcharenko, Ivan

2005-07-01

Comparative genomics provides the means to demarcate functional regions in anonymous DNA sequences. The successful application of this method to identifying novel genes is currently shifting to deciphering the non-coding encryption of gene regulation across genomes. To facilitate the practical application of comparative sequence analysis to genetics and genomics, we have developed several analytical and visualization tools for the analysis of arbitrary sequences and whole genomes. These tools include two alignment tools, zPicture and Mulan; a phylogenetic shadowing tool, eShadow for identifying lineage- and species-specific functional elements; two evolutionary conserved transcription factor analysis tools, rVista and multiTF; a tool for extracting cis-regulatory modules governing the expression of co-regulated genes, Creme 2.0; and a dynamic portal to multiple vertebrate and invertebrate genome alignments, the ECR Browser. Here, we briefly describe each one of these tools and provide specific examples on their practical applications. All the tools are publicly available at the http://www.dcode.org/ website.
Innate immune activity conditions the effect of regulatory variants upon monocyte gene expression.

PubMed

Fairfax, Benjamin P; Humburg, Peter; Makino, Seiko; Naranbhai, Vivek; Wong, Daniel; Lau, Evelyn; Jostins, Luke; Plant, Katharine; Andrews, Robert; McGee, Chris; Knight, Julian C

2014-03-07

To systematically investigate the impact of immune stimulation upon regulatory variant activity, we exposed primary monocytes from 432 healthy Europeans to interferon-γ (IFN-γ) or differing durations of lipopolysaccharide and mapped expression quantitative trait loci (eQTLs). More than half of cis-eQTLs identified, involving hundreds of genes and associated pathways, are detected specifically in stimulated monocytes. Induced innate immune activity reveals multiple master regulatory trans-eQTLs including the major histocompatibility complex (MHC), coding variants altering enzyme and receptor function, an IFN-β cytokine network showing temporal specificity, and an interferon regulatory factor 2 (IRF2) transcription factor-modulated network. Induced eQTL are significantly enriched for genome-wide association study loci, identifying context-specific associations to putative causal genes including CARD9, ATM, and IRF8. Thus, applying pathophysiologically relevant immune stimuli assists resolution of functional genetic variants.
Long-Range Control of Gene Expression: Emerging Mechanisms and Disruption in Disease

PubMed Central

Kleinjan, Dirk A.; van Heyningen, Veronica

2005-01-01

Transcriptional control is a major mechanism for regulating gene expression. The complex machinery required to effect this control is still emerging from functional and evolutionary analysis of genomic architecture. In addition to the promoter, many other regulatory elements are required for spatiotemporally and quantitatively correct gene expression. Enhancer and repressor elements may reside in introns or up- and downstream of the transcription unit. For some genes with highly complex expression patterns—often those that function as key developmental control genes—the cis-regulatory domain can extend long distances outside the transcription unit. Some of the earliest hints of this came from disease-associated chromosomal breaks positioned well outside the relevant gene. With the availability of wide-ranging genome sequence comparisons, strong conservation of many noncoding regions became obvious. Functional studies have shown many of these conserved sites to be transcriptional regulatory elements that sometimes reside inside unrelated neighboring genes. Such sequence-conserved elements generally harbor sites for tissue-specific DNA-binding proteins. Developmentally variable chromatin conformation can control protein access to these sites and can regulate transcription. Disruption of these finely tuned mechanisms can cause disease. Some regulatory element mutations will be associated with phenotypes distinct from any identified for coding-region mutations. PMID:15549674

Genome-scale cold stress response regulatory networks in ten Arabidopsis thaliana ecotypes

PubMed Central

2013-01-01

Background Low temperature leads to major crop losses every year. Although several studies have been conducted focusing on diversity of cold tolerance level in multiple phenotypically divergent Arabidopsis thaliana (A. thaliana) ecotypes, genome-scale molecular understanding is still lacking. Results In this study, we report genome-scale transcript response diversity of 10 A. thaliana ecotypes originating from different geographical locations to non-freezing cold stress (10°C). To analyze the transcriptional response diversity, we initially compared transcriptome changes in all 10 ecotypes using Arabidopsis NimbleGen ATH6 microarrays. In total 6061 transcripts were significantly cold regulated (p < 0.01) in 10 ecotypes, including 498 transcription factors and 315 transposable elements. The majority of the transcripts (75%) showed ecotype specific expression pattern. By using sequence data available from Arabidopsis thaliana 1001 genome project, we further investigated sequence polymorphisms in the core cold stress regulon genes. Significant numbers of non-synonymous amino acid changes were observed in the coding region of the CBF regulon genes. Considering the limited knowledge about regulatory interactions between transcription factors and their target genes in the model plant A. thaliana, we have adopted a powerful systems genetics approach- Network Component Analysis (NCA) to construct an in-silico transcriptional regulatory network model during response to cold stress. The resulting regulatory network contained 1,275 nodes and 7,720 connections, with 178 transcription factors and 1,331 target genes. Conclusions A. thaliana ecotypes exhibit considerable variation in transcriptome level responses to non-freezing cold stress treatment. Ecotype specific transcripts and related gene ontology (GO) categories were identified to delineate natural variation of cold stress regulated differential gene expression in the model plant A. thaliana. The predicted regulatory network model was able to identify new ecotype specific transcription factors and their regulatory interactions, which might be crucial for their local geographic adaptation to cold temperature. Additionally, since the approach presented here is general, it could be adapted to study networks regulating biological process in any biological systems. PMID:24148294
Transcriptome Analysis of an Insecticide Resistant Housefly Strain: Insights about SNPs and Regulatory Elements in Cytochrome P450 Genes.

PubMed

Mahmood, Khalid; Højland, Dorte H; Asp, Torben; Kristensen, Michael

2016-01-01

Insecticide resistance in the housefly, Musca domestica, has been investigated for more than 60 years. It will enter a new era after the recent publication of the housefly genome and the development of multiple next generation sequencing technologies. The genetic background of the xenobiotic response can now be investigated in greater detail. Here, we investigate the 454-pyrosequencing transcriptome of the spinosad-resistant 791spin strain in relation to the housefly genome with focus on P450 genes. The de novo assembly of clean reads gave 35,834 contigs consisting of 21,780 sequences of the spinosad resistant strain. The 3,648 sequences were annotated with an enzyme code EC number and were mapped to 124 KEGG pathways with metabolic processes as most highly represented pathway. One hundred and twenty contigs were annotated as P450s covering 44 different P450 genes of housefly. Eight differentially expressed P450s genes were identified and investigated for SNPs, CpG islands and common regulatory motifs in promoter and coding regions. Functional annotation clustering of metabolic related genes and motif analysis of P450s revealed their association with epigenetic, transcription and gene expression related functions. The sequence variation analysis resulted in 12 SNPs and eight of them found in cyp6d1. There is variation in location, size and frequency of CpG islands and specific motifs were also identified in these P450s. Moreover, identified motifs were associated to GO terms and transcription factors using bioinformatic tools. Transcriptome data of a spinosad resistant strain provide together with genome data fundamental support for future research to understand evolution of resistance in houseflies. Here, we report for the first time the SNPs, CpG islands and common regulatory motifs in differentially expressed P450s. Taken together our findings will serve as a stepping stone to advance understanding of the mechanism and role of P450s in xenobiotic detoxification.
Basic biology and therapeutic implications of lncRNA.

PubMed

Khorkova, O; Hsiao, J; Wahlestedt, C

2015-06-29

Long non-coding RNAs (lncRNA), a class of non-coding RNA molecules recently identified largely due to the efforts of FANTOM, and later GENCODE and ENCODE consortia, have been a subject of intense investigation in the past decade. Extensive efforts to get deeper understanding of lncRNA biology have yielded evidence of their diverse structural and regulatory roles in protecting chromosome integrity, maintaining genomic architecture, X chromosome inactivation, imprinting, transcription, translation and epigenetic regulation. Here we will briefly review the recent studies in the field of lncRNA biology focusing mostly on mammalian species and discuss their therapeutic implications. Copyright © 2015 Elsevier B.V. All rights reserved.
Platelet function is modified by common sequence variation in megakaryocyte super enhancers

PubMed Central

Petersen, Romina; Lambourne, John J.; Javierre, Biola M.; Grassi, Luigi; Kreuzhuber, Roman; Ruklisa, Dace; Rosa, Isabel M.; Tomé, Ana R.; Elding, Heather; van Geffen, Johanna P.; Jiang, Tao; Farrow, Samantha; Cairns, Jonathan; Al-Subaie, Abeer M.; Ashford, Sofie; Attwood, Antony; Batista, Joana; Bouman, Heleen; Burden, Frances; Choudry, Fizzah A.; Clarke, Laura; Flicek, Paul; Garner, Stephen F.; Haimel, Matthias; Kempster, Carly; Ladopoulos, Vasileios; Lenaerts, An-Sofie; Materek, Paulina M.; McKinney, Harriet; Meacham, Stuart; Mead, Daniel; Nagy, Magdolna; Penkett, Christopher J.; Rendon, Augusto; Seyres, Denis; Sun, Benjamin; Tuna, Salih; van der Weide, Marie-Elise; Wingett, Steven W.; Martens, Joost H.; Stegle, Oliver; Richardson, Sylvia; Vallier, Ludovic; Roberts, David J.; Freson, Kathleen; Wernisch, Lorenz; Stunnenberg, Hendrik G.; Danesh, John; Fraser, Peter; Soranzo, Nicole; Butterworth, Adam S.; Heemskerk, Johan W.; Turro, Ernest; Spivakov, Mikhail; Ouwehand, Willem H.; Astle, William J.; Downes, Kate; Kostadima, Myrto; Frontini, Mattia

2017-01-01

Linking non-coding genetic variants associated with the risk of diseases or disease-relevant traits to target genes is a crucial step to realize GWAS potential in the introduction of precision medicine. Here we set out to determine the mechanisms underpinning variant association with platelet quantitative traits using cell type-matched epigenomic data and promoter long-range interactions. We identify potential regulatory functions for 423 of 565 (75%) non-coding variants associated with platelet traits and we demonstrate, through ex vivo and proof of principle genome editing validation, that variants in super enhancers play an important role in controlling archetypical platelet functions. PMID:28703137
Non-coding RNAs and Their Roles in Stress Response in Plants.

PubMed

Wang, Jingjing; Meng, Xianwen; Dobrovolskaya, Oxana B; Orlov, Yuriy L; Chen, Ming

2017-10-01

Eukaryotic genomes encode thousands of non-coding RNAs (ncRNAs), which play crucial roles in transcriptional and post-transcriptional regulation of gene expression. Accumulating evidence indicates that ncRNAs, especially microRNAs (miRNAs) and long ncRNAs (lncRNAs), have emerged as key regulatory molecules in plant stress responses. In this review, we have summarized the current progress on the understanding of plant miRNA and lncRNA identification, characteristics, bioinformatics tools, and resources, and provided examples of mechanisms of miRNA- and lncRNA-mediated plant stress tolerance. Copyright © 2017 The Authors. Production and hosting by Elsevier B.V. All rights reserved.
Identification of novel craniofacial regulatory domains located far upstream of SOX9 and disrupted in Pierre Robin sequence

PubMed Central

Gordon, Christopher T.; Attanasio, Catia; Bhatia, Shipra; Benko, Sabina; Ansari, Morad; Tan, Tiong Y.; Munnich, Arnold; Pennacchio, Len A.; Abadie, Véronique; Temple, I. Karen; Goldenberg, Alice; van Heyningen, Veronica; Amiel, Jeanne; FitzPatrick, David; Kleinjan, Dirk A.; Visel, Axel; Lyonnet, Stanislas

2015-01-01

Mutations in the coding sequence of SOX9 cause campomelic dysplasia (CD), a disorder of skeletal development associated with 46,XY disorders of sex development (DSDs). Translocations, deletions and duplications within a ~2 Mb region upstream of SOX9 can recapitulate the CD-DSD phenotype fully or partially, suggesting the existence of an unusually large cis-regulatory control region. Pierre Robin sequence (PRS) is a craniofacial disorder that is frequently an endophenotype of CD and a locus for isolated PRS at ~1.2-1.5 Mb upstream of SOX9 has been previously reported. The craniofacial regulatory potential within this locus, and within the greater genomic domain surrounding SOX9, remains poorly defined. We report two novel deletions upstream of SOX9 in families with PRS, allowing refinement of the regions harbouring candidate craniofacial regulatory elements. In parallel, ChIP-Seq for p300 binding sites in mouse craniofacial tissue led to the identification of several novel craniofacial enhancers at the SOX9 locus, which were validated in transgenic reporter mice and zebrafish. Notably, some of the functionally validated elements fall within the PRS deletions. These studies suggest that multiple non-coding elements contribute to the craniofacial regulation of SOX9 expression, and that their disruption results in PRS. PMID:24934569
Bayesian variable selection for post-analytic interrogation of susceptibility loci.

PubMed

Chen, Siying; Nunez, Sara; Reilly, Muredach P; Foulkes, Andrea S

2017-06-01

Understanding the complex interplay among protein coding genes and regulatory elements requires rigorous interrogation with analytic tools designed for discerning the relative contributions of overlapping genomic regions. To this aim, we offer a novel application of Bayesian variable selection (BVS) for classifying genomic class level associations using existing large meta-analysis summary level resources. This approach is applied using the expectation maximization variable selection (EMVS) algorithm to typed and imputed SNPs across 502 protein coding genes (PCGs) and 220 long intergenic non-coding RNAs (lncRNAs) that overlap 45 known loci for coronary artery disease (CAD) using publicly available Global Lipids Gentics Consortium (GLGC) (Teslovich et al., 2010; Willer et al., 2013) meta-analysis summary statistics for low-density lipoprotein cholesterol (LDL-C). The analysis reveals 33 PCGs and three lncRNAs across 11 loci with >50% posterior probabilities for inclusion in an additive model of association. The findings are consistent with previous reports, while providing some new insight into the architecture of LDL-cholesterol to be investigated further. As genomic taxonomies continue to evolve, additional classes such as enhancer elements and splicing regions, can easily be layered into the proposed analysis framework. Moreover, application of this approach to alternative publicly available meta-analysis resources, or more generally as a post-analytic strategy to further interrogate regions that are identified through single point analysis, is straightforward. All coding examples are implemented in R version 3.2.1 and provided as supplemental material. © 2016, The International Biometric Society.
An Integrated Encyclopedia of DNA Elements in the Human Genome

PubMed Central

2012-01-01

Summary The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure, and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall the project provides new insights into the organization and regulation of our genes and genome, and an expansive resource of functional annotations for biomedical research. PMID:22955616
Complete genome sequence of an isolate of Potato virus X (PVX) infecting Cape gooseberry (Physalis peruviana) in Colombia.

PubMed

Gutiérrez, Pablo A; Alzate, Juan F; Montoya, Mauricio Marín

2015-06-01

Transcriptome analysis of a Cape gooseberry (Physalis peruviana) plant with leaf symptoms of a mild yellow mosaic typical of a viral disease revealed an infection with Potato virus X (PVX). The genome sequence of the PVX-Physalis isolate comprises 6435 nt and exhibits higher sequence similarity to members of the Eurasian group of PVX (~95 %) than to the American group (~77 %). Genome organization is similar to other PVX isolates with five open reading frames coding for proteins RdRp, TGBp1, TGBp2, TGBp3, and CP. 5' and 3' untranslated regions revealed all regulatory motifs typically found in PVX isolates. The PVX-Physalis genome is the only complete sequence available for a Potexvirus in Colombia and is a new addition to the restricted number of available sequences of PVX isolates infecting plant species different to potato.
Characterization of noncoding regulatory DNA in the human genome.

PubMed

Elkon, Ran; Agami, Reuven

2017-08-08

Genetic variants associated with common diseases are usually located in noncoding parts of the human genome. Delineation of the full repertoire of functional noncoding elements, together with efficient methods for probing their biological roles, is therefore of crucial importance. Over the past decade, DNA accessibility and various epigenetic modifications have been associated with regulatory functions. Mapping these features across the genome has enabled researchers to begin to document the full complement of putative regulatory elements. High-throughput reporter assays to probe the functions of regulatory regions have also been developed but these methods separate putative regulatory elements from the chromosome so that any effects of chromatin context and long-range regulatory interactions are lost. Definitive assignment of function(s) to putative cis-regulatory elements requires perturbation of these elements. Genome-editing technologies are now transforming our ability to perturb regulatory elements across entire genomes. Interpretation of high-throughput genetic screens that incorporate genome editors might enable the construction of an unbiased map of functional noncoding elements in the human genome.
Mining the LIPG Allelic Spectrum Reveals the Contribution of Rare and Common Regulatory Variants to HDL Cholesterol

PubMed Central

Raghavan, Avanthi; Neeli, Hemanth; Jin, Weijun; Badellino, Karen O.; Demissie, Serkalem; Manning, Alisa K.; DerOhannessian, Stephanie L.; Wolfe, Megan L.; Cupples, L. Adrienne; Li, Mingyao; Kathiresan, Sekar; Rader, Daniel J.

2011-01-01

Genome-wide association studies (GWAS) have successfully identified loci associated with quantitative traits, such as blood lipids. Deep resequencing studies are being utilized to catalogue the allelic spectrum at GWAS loci. The goal of these studies is to identify causative variants and missing heritability, including heritability due to low frequency and rare alleles with large phenotypic impact. Whereas rare variant efforts have primarily focused on nonsynonymous coding variants, we hypothesized that noncoding variants in these loci are also functionally important. Using the HDL-C gene LIPG as an example, we explored the effect of regulatory variants identified through resequencing of subjects at HDL-C extremes on gene expression, protein levels, and phenotype. Resequencing a portion of the LIPG promoter and 5′ UTR in human subjects with extreme HDL-C, we identified several rare variants in individuals from both extremes. Luciferase reporter assays were used to measure the effect of these rare variants on LIPG expression. Variants conferring opposing effects on gene expression were enriched in opposite extremes of the phenotypic distribution. Minor alleles of a common regulatory haplotype and noncoding GWAS SNPs were associated with reduced plasma levels of the LIPG gene product endothelial lipase (EL), consistent with its role in HDL-C catabolism. Additionally, we found that a common nonfunctional coding variant associated with HDL-C (rs2000813) is in linkage disequilibrium with a 5′ UTR variant (rs34474737) that decreases LIPG promoter activity. We attribute the gene regulatory role of rs34474737 to the observed association of the coding variant with plasma EL levels and HDL-C. Taken together, the findings show that both rare and common noncoding regulatory variants are important contributors to the allelic spectrum in complex trait loci. PMID:22174694
Modeling positional effects of regulatory sequences with spline transformations increases prediction accuracy of deep neural networks

PubMed Central

Avsec, Žiga; Cheng, Jun; Gagneur, Julien

2018-01-01

Abstract Motivation Regulatory sequences are not solely defined by their nucleic acid sequence but also by their relative distances to genomic landmarks such as transcription start site, exon boundaries or polyadenylation site. Deep learning has become the approach of choice for modeling regulatory sequences because of its strength to learn complex sequence features. However, modeling relative distances to genomic landmarks in deep neural networks has not been addressed. Results Here we developed spline transformation, a neural network module based on splines to flexibly and robustly model distances. Modeling distances to various genomic landmarks with spline transformations significantly increased state-of-the-art prediction accuracy of in vivo RNA-binding protein binding sites for 120 out of 123 proteins. We also developed a deep neural network for human splice branchpoint based on spline transformations that outperformed the current best, already distance-based, machine learning model. Compared to piecewise linear transformation, as obtained by composition of rectified linear units, spline transformation yields higher prediction accuracy as well as faster and more robust training. As spline transformation can be applied to further quantities beyond distances, such as methylation or conservation, we foresee it as a versatile component in the genomics deep learning toolbox. Availability and implementation Spline transformation is implemented as a Keras layer in the CONCISE python package: https://github.com/gagneurlab/concise. Analysis code is available at https://github.com/gagneurlab/Manuscript_Avsec_Bioinformatics_2017. Contact avsec@in.tum.de or gagneur@in.tum.de Supplementary information Supplementary data are available at Bioinformatics online. PMID:29155928
Defining the diverse spectrum of inversions, complex structural variation, and chromothripsis in the morbid human genome.

PubMed

Collins, Ryan L; Brand, Harrison; Redin, Claire E; Hanscom, Carrie; Antolik, Caroline; Stone, Matthew R; Glessner, Joseph T; Mason, Tamara; Pregno, Giulia; Dorrani, Naghmeh; Mandrile, Giorgia; Giachino, Daniela; Perrin, Danielle; Walsh, Cole; Cipicchio, Michelle; Costello, Maura; Stortchevoi, Alexei; An, Joon-Yong; Currall, Benjamin B; Seabra, Catarina M; Ragavendran, Ashok; Margolin, Lauren; Martinez-Agosto, Julian A; Lucente, Diane; Levy, Brynn; Sanders, Stephan J; Wapner, Ronald J; Quintero-Rivera, Fabiola; Kloosterman, Wigard; Talkowski, Michael E

2017-03-06

Structural variation (SV) influences genome organization and contributes to human disease. However, the complete mutational spectrum of SV has not been routinely captured in disease association studies. We sequenced 689 participants with autism spectrum disorder (ASD) and other developmental abnormalities to construct a genome-wide map of large SV. Using long-insert jumping libraries at 105X mean physical coverage and linked-read whole-genome sequencing from 10X Genomics, we document seven major SV classes at ~5 kb SV resolution. Our results encompass 11,735 distinct large SV sites, 38.1% of which are novel and 16.8% of which are balanced or complex. We characterize 16 recurrent subclasses of complex SV (cxSV), revealing that: (1) cxSV are larger and rarer than canonical SV; (2) each genome harbors 14 large cxSV on average; (3) 84.4% of large cxSVs involve inversion; and (4) most large cxSV (93.8%) have not been delineated in previous studies. Rare SVs are more likely to disrupt coding and regulatory non-coding loci, particularly when truncating constrained and disease-associated genes. We also identify multiple cases of catastrophic chromosomal rearrangements known as chromoanagenesis, including somatic chromoanasynthesis, and extreme balanced germline chromothripsis events involving up to 65 breakpoints and 60.6 Mb across four chromosomes, further defining rare categories of extreme cxSV. These data provide a foundational map of large SV in the morbid human genome and demonstrate a previously underappreciated abundance and diversity of cxSV that should be considered in genomic studies of human disease.
Scanning the Human Genome for Novel Therapeutic Targets for Breast Cancer

DTIC Science & Technology

2006-04-01

action of this class of non-coding regulatory RNAs13,14. MicroRNAs are transcribed by RNA polymerase II as long primary polyadenylated transcripts...Artificial miRNAs can be expressed from both RNA polymerase II and III promoters resulting in silencing to varying degrees. At present there...the highest levels of mature microRNA in RISC and generally effective silencing. These structures can be transcribed by either RNA polymerase II or
Cell-type-specific enrichment of risk-associated regulatory elements at ovarian cancer susceptibility loci.

PubMed

Coetzee, Simon G; Shen, Howard C; Hazelett, Dennis J; Lawrenson, Kate; Kuchenbaecker, Karoline; Tyrer, Jonathan; Rhie, Suhn K; Levanon, Keren; Karst, Alison; Drapkin, Ronny; Ramus, Susan J; Couch, Fergus J; Offit, Kenneth; Chenevix-Trench, Georgia; Monteiro, Alvaro N A; Antoniou, Antonis; Freedman, Matthew; Coetzee, Gerhard A; Pharoah, Paul D P; Noushmehr, Houtan; Gayther, Simon A

2015-07-01

Understanding the regulatory landscape of the human genome is a central question in complex trait genetics. Most single-nucleotide polymorphisms (SNPs) associated with cancer risk lie in non-protein-coding regions, implicating regulatory DNA elements as functional targets of susceptibility variants. Here, we describe genome-wide annotation of regions of open chromatin and histone modification in fallopian tube and ovarian surface epithelial cells (FTSECs, OSECs), the debated cellular origins of high-grade serous ovarian cancers (HGSOCs) and in endometriosis epithelial cells (EECs), the likely precursor of clear cell ovarian carcinomas (CCOCs). The regulatory architecture of these cell types was compared with normal human mammary epithelial cells and LNCaP prostate cancer cells. We observed similar positional patterns of global enhancer signatures across the three different ovarian cancer precursor cell types, and evidence of tissue-specific regulatory signatures compared to non-gynecological cell types. We found significant enrichment for risk-associated SNPs intersecting regulatory biofeatures at 17 known HGSOC susceptibility loci in FTSECs (P = 3.8 × 10(-30)), OSECs (P = 2.4 × 10(-23)) and HMECs (P = 6.7 × 10(-15)) but not for EECs (P = 0.45) or LNCaP cells (P = 0.88). Hierarchical clustering of risk SNPs conditioned on the six different cell types indicates FTSECs and OSECs are highly related (96% of samples using multi-scale bootstrapping) suggesting both cell types may be precursors of HGSOC. These data represent the first description of regulatory catalogues of normal precursor cells for different ovarian cancer subtypes, and provide unique insights into the tissue specific regulatory variation with respect to the likely functional targets of germline genetic susceptibility variants for ovarian cancer. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
The analysis of APOL1 genetic variation and haplotype diversity provided by 1000 Genomes project.

PubMed

Peng, Ting; Wang, Li; Li, Guisen

2017-08-11

The APOL1 gene variants has been shown to be associated with an increased risk of multiple kinds of diseases, particularly in African Americans, but not in Caucasians and Asians. In this study, we explored the single nucleotide polymorphism (SNP) and haplotype diversity of APOL1 gene in different races provided by 1000 Genomes project. Variants of APOL1 gene in 1000 Genome Project were obtained and SNPs located in the regulatory region or coding region were selected for genetic variation analysis. Total 2504 individuals from 26 populations were classified as four groups that included Africa, Europe, Asia and Admixed populations. Tag SNPs were selected to evaluate the haplotype diversities in the four populations by HaploStats software. APOL1 gene was surrounded by some of the most polymorphic genes in the human genome, variation of APOL1 gene was common, with up to 613 SNP (1000 Genome Project reported) and 99 of them (16.2%) with MAF ≥ 1%. There were 79 SNPs in the URR and 92 SNPs in 3'UTR. Total 12 SNPs in URR and 24 SNPs in 3'UTR were considered as common variants with MAF ≥ 1%. It is worth noting that URR-1 was presents lower frequencies in European populations, while other three haplotypes taken an opposite pattern; 3'UTR presents several high-frequency variation sites in a short segment, and the differences of its haplotypes among different population were significant (P < 0.01), UTR-1 and UTR-5 presented much higher frequency in African population, while UTR-2, UTR-3 and UTR-4 were much lower. APOL1 coding region showed that two SNP of G1 with higher frequency are actually pull down the haplotype H-1 frequency when considering all populations pooled together, and the diversity among the four populations be widen by the G1 two mutation (P 1 = 3.33E-4 vs P 2 = 3.61E-30). The distributions of APOL1 gene variants and haplotypes were significantly different among the different populations, in either regulatory or coding regions. It could provide clues for the future genetic study of APOL1 related diseases.
Non-coding-regulatory regions of human brain genes delineated by bacterial artificial chromosome knock-in mice.

PubMed

Schmouth, Jean-François; Castellarin, Mauro; Laprise, Stéphanie; Banks, Kathleen G; Bonaguro, Russell J; McInerny, Simone C; Borretta, Lisa; Amirabbasi, Mahsa; Korecki, Andrea J; Portales-Casamar, Elodie; Wilson, Gary; Dreolini, Lisa; Jones, Steven J M; Wasserman, Wyeth W; Goldowitz, Daniel; Holt, Robert A; Simpson, Elizabeth M

2013-10-14

The next big challenge in human genetics is understanding the 98% of the genome that comprises non-coding DNA. Hidden in this DNA are sequences critical for gene regulation, and new experimental strategies are needed to understand the functional role of gene-regulation sequences in health and disease. In this study, we build upon our HuGX ('high-throughput human genes on the X chromosome') strategy to expand our understanding of human gene regulation in vivo. In all, ten human genes known to express in therapeutically important brain regions were chosen for study. For eight of these genes, human bacterial artificial chromosome clones were identified, retrofitted with a reporter, knocked single-copy into the Hprt locus in mouse embryonic stem cells, and mouse strains derived. Five of these human genes expressed in mouse, and all expressed in the adult brain region for which they were chosen. This defined the boundaries of the genomic DNA sufficient for brain expression, and refined our knowledge regarding the complexity of gene regulation. We also characterized for the first time the expression of human MAOA and NR2F2, two genes for which the mouse homologs have been extensively studied in the central nervous system (CNS), and AMOTL1 and NOV, for which roles in CNS have been unclear. We have demonstrated the use of the HuGX strategy to functionally delineate non-coding-regulatory regions of therapeutically important human brain genes. Our results also show that a careful investigation, using publicly available resources and bioinformatics, can lead to accurate predictions of gene expression.
Genomicus update 2015: KaryoView and MatrixView provide a genome-wide perspective to multispecies comparative genomics.

PubMed

Louis, Alexandra; Nguyen, Nga Thi Thuy; Muffato, Matthieu; Roest Crollius, Hugues

2015-01-01

The Genomicus web server (http://www.genomicus.biologie.ens.fr/genomicus) is a visualization tool allowing comparative genomics in four different phyla (Vertebrate, Fungi, Metazoan and Plants). It provides access to genomic information from extant species, as well as ancestral gene content and gene order for vertebrates and flowering plants. Here we present the new features available for vertebrate genome with a focus on new graphical tools. The interface to enter the database has been improved, two pairwise genome comparison tools are now available (KaryoView and MatrixView) and the multiple genome comparison tools (PhyloView and AlignView) propose three new kinds of representation and a more intuitive menu. These new developments have been implemented for Genomicus portal dedicated to vertebrates. This allows the analysis of 68 extant animal genomes, as well as 58 ancestral reconstructed genomes. The Genomicus server also provides access to ancestral gene orders, to facilitate evolutionary and comparative genomics studies, as well as computationally predicted regulatory interactions, thanks to the representation of conserved non-coding elements with their putative gene targets. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
A multitasking Argonaute: exploring the many facets of C. elegans CSR-1.

PubMed

Wedeles, Christopher J; Wu, Monica Z; Claycomb, Julie M

2013-12-01

While initial studies of small RNA-mediated gene regulatory pathways focused on the cytoplasmic functions of such pathways, identifying roles for Argonaute/small RNA pathways in modulating chromatin and organizing the genome has become a topic of intense research in recent years. Nuclear regulatory mechanisms for Argonaute/small RNA pathways appear to be widespread, in organisms ranging from plants to fission yeast, Caenorhabditis elegans to humans. As the effectors of small RNA-mediated gene regulatory pathways, Argonaute proteins guide the chromatin-directed activities of these pathways. Of particular interest is the C. elegans Argonaute, chromosome segregation and RNAi deficient (CSR-1), which has been implicated in such diverse functions as organizing the holocentromeres of worm chromosomes, modulating germline chromatin, protecting the genome from foreign nucleic acid, regulating histone levels, executing RNAi, and inhibiting translation in conjunction with Pumilio proteins. CSR-1 interacts with small RNAs known as 22G-RNAs, which have complementarity to 25 % of the protein coding genes. This peculiar Argonaute is the only essential C. elegans Argonaute out of 24 family members in total. Here, we summarize the current understanding of CSR-1 functions in the worm, with emphasis on the chromatin-directed activities of this ever-intriguing Argonaute.
Waiting in the wings: what can we learn about gene co-option from the diversification of butterfly wing patterns?

PubMed

Jiggins, Chris D; Wallbank, Richard W R; Hanly, Joseph J

2017-02-05

A major challenge is to understand how conserved gene regulatory networks control the wonderful diversity of form that we see among animals and plants. Butterfly wing patterns are an excellent example of this diversity. Butterfly wings form as imaginal discs in the caterpillar and are constructed by a gene regulatory network, much of which is conserved across the holometabolous insects. Recent work in Heliconius butterflies takes advantage of genomic approaches and offers insights into how the diversification of wing patterns is overlaid onto this conserved network. WntA is a patterning morphogen that alters spatial information in the wing. Optix is a transcription factor that acts later in development to paint specific wing regions red. Both of these loci fit the paradigm of conserved protein-coding loci with diverse regulatory elements and developmental roles that have taken on novel derived functions in patterning wings. These discoveries offer insights into the 'Nymphalid Ground Plan', which offers a unifying hypothesis for pattern formation across nymphalid butterflies. These loci also represent 'hotspots' for morphological change that have been targeted repeatedly during evolution. Both convergent and divergent evolution of a great diversity of patterns is controlled by complex alleles at just a few genes. We suggest that evolutionary change has become focused on one or a few genetic loci for two reasons. First, pre-existing complex cis-regulatory loci that already interact with potentially relevant transcription factors are more likely to acquire novel functions in wing patterning. Second, the shape of wing regulatory networks may constrain evolutionary change to one or a few loci. Overall, genomic approaches that have identified wing patterning loci in these butterflies offer broad insight into how gene regulatory networks evolve to produce diversity.This article is part of the themed issue 'Evo-devo in the genomics era, and the origins of morphological diversity'. © 2016 The Author(s).

Waiting in the wings: what can we learn about gene co-option from the diversification of butterfly wing patterns?

PubMed Central

Wallbank, Richard W. R.; Hanly, Joseph J.

2017-01-01

A major challenge is to understand how conserved gene regulatory networks control the wonderful diversity of form that we see among animals and plants. Butterfly wing patterns are an excellent example of this diversity. Butterfly wings form as imaginal discs in the caterpillar and are constructed by a gene regulatory network, much of which is conserved across the holometabolous insects. Recent work in Heliconius butterflies takes advantage of genomic approaches and offers insights into how the diversification of wing patterns is overlaid onto this conserved network. WntA is a patterning morphogen that alters spatial information in the wing. Optix is a transcription factor that acts later in development to paint specific wing regions red. Both of these loci fit the paradigm of conserved protein-coding loci with diverse regulatory elements and developmental roles that have taken on novel derived functions in patterning wings. These discoveries offer insights into the ‘Nymphalid Ground Plan’, which offers a unifying hypothesis for pattern formation across nymphalid butterflies. These loci also represent ‘hotspots’ for morphological change that have been targeted repeatedly during evolution. Both convergent and divergent evolution of a great diversity of patterns is controlled by complex alleles at just a few genes. We suggest that evolutionary change has become focused on one or a few genetic loci for two reasons. First, pre-existing complex cis-regulatory loci that already interact with potentially relevant transcription factors are more likely to acquire novel functions in wing patterning. Second, the shape of wing regulatory networks may constrain evolutionary change to one or a few loci. Overall, genomic approaches that have identified wing patterning loci in these butterflies offer broad insight into how gene regulatory networks evolve to produce diversity. This article is part of the themed issue ‘Evo-devo in the genomics era, and the origins of morphological diversity’. PMID:27994126
StarScan: a web server for scanning small RNA targets from degradome sequencing data.

PubMed

Liu, Shun; Li, Jun-Hao; Wu, Jie; Zhou, Ke-Ren; Zhou, Hui; Yang, Jian-Hua; Qu, Liang-Hu

2015-07-01

Endogenous small non-coding RNAs (sRNAs), including microRNAs, PIWI-interacting RNAs and small interfering RNAs, play important gene regulatory roles in animals and plants by pairing to the protein-coding and non-coding transcripts. However, computationally assigning these various sRNAs to their regulatory target genes remains technically challenging. Recently, a high-throughput degradome sequencing method was applied to identify biologically relevant sRNA cleavage sites. In this study, an integrated web-based tool, StarScan (sRNA target Scan), was developed for scanning sRNA targets using degradome sequencing data from 20 species. Given a sRNA sequence from plants or animals, our web server performs an ultrafast and exhaustive search for potential sRNA-target interactions in annotated and unannotated genomic regions. The interactions between small RNAs and target transcripts were further evaluated using a novel tool, alignScore. A novel tool, degradomeBinomTest, was developed to quantify the abundance of degradome fragments located at the 9-11th nucleotide from the sRNA 5' end. This is the first web server for discovering potential sRNA-mediated RNA cleavage events in plants and animals, which affords mechanistic insights into the regulatory roles of sRNAs. The StarScan web server is available at http://mirlab.sysu.edu.cn/starscan/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Genomic organization of human fetal specific P-450IIIA7 (cytochrome P-450HFLa)-related gene(s) and interaction of transcriptional regulatory factor with its DNA element in the 5' flanking region.

PubMed

Itoh, S; Yanagimoto, T; Tagawa, S; Hashimoto, H; Kitamura, R; Nakajima, Y; Okochi, T; Fujimoto, S; Uchino, J; Kamataki, T

1992-03-24

P-450IIIA7 is a form of cytochrome P-450 which was isolated from human fetal livers and termed P-450HFLa. This form has been clarified to be expressed during fetal life specifically (Komori, M., Nishio, K., Kitada, M., Shiramatsu, K., Muroya, K., Soma, M., Nagashima, K. and Kamataki, T. (1990) Biochemistry 29, 4430-4433). In the present study, we isolated five independent clones which probably corresponded to the human P-450IIIA7 gene. These clones were completely sequenced, all exons, exon-intron junctions and the 5' flanking region from the cap site to-869. Although the sequences in the coding region were completely identical to P-450IIIA7, it is possible that genomic fragments sequenced in this study encode portions of other P-450IIIA7-related genes since we could not obtain a complete overlapping set of genomic clones. Within its 5' flanking sequence, the putative binding sites of several transcriptional regulatory factors existed. Among them, it was shown that a basic transcription element binding factor (BTEB) actually interacted with the 5' flanking region of this gene.
Non-coding genomic regions possessing enhancer and silencer potential are associated with healthy aging and exceptional survival.

PubMed

Kim, Sangkyu; Welsh, David A; Myers, Leann; Cherry, Katie E; Wyckoff, Jennifer; Jazwinski, S Michal

2015-02-28

We have completed a genome-wide linkage scan for healthy aging using data collected from a family study, followed by fine-mapping by association in a separate population, the first such attempt reported. The family cohort consisted of parents of age 90 or above and their children ranging in age from 50 to 80. As a quantitative measure of healthy aging, we used a frailty index, called FI34, based on 34 health and function variables. The linkage scan found a single significant linkage peak on chromosome 12. Using an independent cohort of unrelated nonagenarians, we carried out a fine-scale association mapping of the region suggestive of linkage and identified three sites associated with healthy aging. These healthy-aging sites (HASs) are located in intergenic regions at 12q13-14. HAS-1 has been previously associated with multiple diseases, and an enhancer was recently mapped and experimentally validated within the site. HAS-2 is a previously uncharacterized site possessing genomic features suggestive of enhancer activity. HAS-3 contains features associated with Polycomb repression. The HASs also contain variants associated with exceptional longevity, based on a separate analysis. Our results provide insight into functional genomic networks involving non-coding regulatory elements that are involved in healthy aging and longevity.
Non-coding genomic regions possessing enhancer and silencer potential are associated with healthy aging and exceptional survival

PubMed Central

Kim, Sangkyu; Welsh, David A.; Myers, Leann; Cherry, Katie E.; Wyckoff, Jennifer; Jazwinski, S. Michal

2015-01-01

We have completed a genome-wide linkage scan for healthy aging using data collected from a family study, followed by fine-mapping by association in a separate population, the first such attempt reported. The family cohort consisted of parents of age 90 or above and their children ranging in age from 50 to 80. As a quantitative measure of healthy aging, we used a frailty index, called FI34, based on 34 health and function variables. The linkage scan found a single significant linkage peak on chromosome 12. Using an independent cohort of unrelated nonagenarians, we carried out a fine-scale association mapping of the region suggestive of linkage and identified three sites associated with healthy aging. These healthy-aging sites (HASs) are located in intergenic regions at 12q13–14. HAS-1 has been previously associated with multiple diseases, and an enhancer was recently mapped and experimentally validated within the site. HAS-2 is a previously uncharacterized site possessing genomic features suggestive of enhancer activity. HAS-3 contains features associated with Polycomb repression. The HASs also contain variants associated with exceptional longevity, based on a separate analysis. Our results provide insight into functional genomic networks involving non-coding regulatory elements that are involved in healthy aging and longevity. PMID:25682868
Hundreds of conserved non-coding genomic regions are independently lost in mammals

PubMed Central

Hiller, Michael; Schaar, Bruce T.; Bejerano, Gill

2012-01-01

Conserved non-protein-coding DNA elements (CNEs) often encode cis-regulatory elements and are rarely lost during evolution. However, CNE losses that do occur can be associated with phenotypic changes, exemplified by pelvic spine loss in sticklebacks. Using a computational strategy to detect complete loss of CNEs in mammalian genomes while strictly controlling for artifacts, we find >600 CNEs that are independently lost in at least two mammalian lineages, including a spinal cord enhancer near GDF11. We observed several genomic regions where multiple independent CNE loss events happened; the most extreme is the DIAPH2 locus. We show that CNE losses often involve deletions and that CNE loss frequencies are non-uniform. Similar to less pleiotropic enhancers, we find that independently lost CNEs are shorter, slightly less constrained and evolutionarily younger than CNEs without detected losses. This suggests that independently lost CNEs are less pleiotropic and that pleiotropic constraints contribute to non-uniform CNE loss frequencies. We also detected 35 CNEs that are independently lost in the human lineage and in other mammals. Our study uncovers an interesting aspect of the evolution of functional DNA in mammalian genomes. Experiments are necessary to test if these independently lost CNEs are associated with parallel phenotype changes in mammals. PMID:23042682
The complete genome sequence of Corynebacterium pseudotuberculosis FRC41 isolated from a 12-year-old girl with necrotizing lymphadenitis reveals insights into gene-regulatory networks contributing to virulence

PubMed Central

2010-01-01

Background Corynebacterium pseudotuberculosis is generally regarded as an important animal pathogen that rarely infects humans. Clinical strains are occasionally recovered from human cases of lymphadenitis, such as C. pseudotuberculosis FRC41 that was isolated from the inguinal lymph node of a 12-year-old girl with necrotizing lymphadenitis. To detect potential virulence factors and corresponding gene-regulatory networks in this human isolate, the genome sequence of C. pseudotuberculosis FCR41 was determined by pyrosequencing and functionally annotated. Results Sequencing and assembly of the C. pseudotuberculosis FRC41 genome yielded a circular chromosome with a size of 2,337,913 bp and a mean G+C content of 52.2%. Specific gene sets associated with iron and zinc homeostasis were detected among the 2,110 predicted protein-coding regions and integrated into a gene-regulatory network that is linked with both the central metabolism and the oxidative stress response of FRC41. Two gene clusters encode proteins involved in the sortase-mediated polymerization of adhesive pili that can probably mediate the adherence to host tissue to facilitate additional ligand-receptor interactions and the delivery of virulence factors. The prominent virulence factors phospholipase D (Pld) and corynebacterial protease CP40 are encoded in the genome of this human isolate. The genome annotation revealed additional serine proteases, neuraminidase H, nitric oxide reductase, an invasion-associated protein, and acyl-CoA carboxylase subunits involved in mycolic acid biosynthesis as potential virulence factors. The cAMP-sensing transcription regulator GlxR plays a key role in controlling the expression of several genes contributing to virulence. Conclusion The functional data deduced from the genome sequencing and the extended knowledge of virulence factors indicate that the human isolate C. pseudotuberculosis FRC41 is equipped with a distinct gene set promoting its survival under unfavorable environmental conditions encountered in the mammalian host. PMID:21192786
Lnc2Meth: a manually curated database of regulatory relationships between long non-coding RNAs and DNA methylation associated with human disease.

PubMed

Zhi, Hui; Li, Xin; Wang, Peng; Gao, Yue; Gao, Baoqing; Zhou, Dianshuang; Zhang, Yan; Guo, Maoni; Yue, Ming; Shen, Weitao; Ning, Shangwei; Jin, Lianhong; Li, Xia

2018-01-04

Lnc2Meth (http://www.bio-bigdata.com/Lnc2Meth/), an interactive resource to identify regulatory relationships between human long non-coding RNAs (lncRNAs) and DNA methylation, is not only a manually curated collection and annotation of experimentally supported lncRNAs-DNA methylation associations but also a platform that effectively integrates tools for calculating and identifying the differentially methylated lncRNAs and protein-coding genes (PCGs) in diverse human diseases. The resource provides: (i) advanced search possibilities, e.g. retrieval of the database by searching the lncRNA symbol of interest, DNA methylation patterns, regulatory mechanisms and disease types; (ii) abundant computationally calculated DNA methylation array profiles for the lncRNAs and PCGs; (iii) the prognostic values for each hit transcript calculated from the patients clinical data; (iv) a genome browser to display the DNA methylation landscape of the lncRNA transcripts for a specific type of disease; (v) tools to re-annotate probes to lncRNA loci and identify the differential methylation patterns for lncRNAs and PCGs with user-supplied external datasets; (vi) an R package (LncDM) to complete the differentially methylated lncRNAs identification and visualization with local computers. Lnc2Meth provides a timely and valuable resource that can be applied to significantly expand our understanding of the regulatory relationships between lncRNAs and DNA methylation in various human diseases. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Mammalian genomic regulatory regions predicted by utilizing human genomics, transcriptomics, and epigenetics data

PubMed Central

Nguyen, Quan H; Tellam, Ross L; Naval-Sanchez, Marina; Porto-Neto, Laercio R; Barendse, William; Reverter, Antonio; Hayes, Benjamin; Kijas, James; Dalrymple, Brian P

2018-01-01

Abstract Genome sequences for hundreds of mammalian species are available, but an understanding of their genomic regulatory regions, which control gene expression, is only beginning. A comprehensive prediction of potential active regulatory regions is necessary to functionally study the roles of the majority of genomic variants in evolution, domestication, and animal production. We developed a computational method to predict regulatory DNA sequences (promoters, enhancers, and transcription factor binding sites) in production animals (cows and pigs) and extended its broad applicability to other mammals. The method utilizes human regulatory features identified from thousands of tissues, cell lines, and experimental assays to find homologous regions that are conserved in sequences and genome organization and are enriched for regulatory elements in the genome sequences of other mammalian species. Importantly, we developed a filtering strategy, including a machine learning classification method, to utilize a very small number of species-specific experimental datasets available to select for the likely active regulatory regions. The method finds the optimal combination of sensitivity and accuracy to unbiasedly predict regulatory regions in mammalian species. Furthermore, we demonstrated the utility of the predicted regulatory datasets in cattle for prioritizing variants associated with multiple production and climate change adaptation traits and identifying potential genome editing targets. PMID:29618048
Mammalian genomic regulatory regions predicted by utilizing human genomics, transcriptomics, and epigenetics data.

PubMed

Nguyen, Quan H; Tellam, Ross L; Naval-Sanchez, Marina; Porto-Neto, Laercio R; Barendse, William; Reverter, Antonio; Hayes, Benjamin; Kijas, James; Dalrymple, Brian P

2018-03-01

Genome sequences for hundreds of mammalian species are available, but an understanding of their genomic regulatory regions, which control gene expression, is only beginning. A comprehensive prediction of potential active regulatory regions is necessary to functionally study the roles of the majority of genomic variants in evolution, domestication, and animal production. We developed a computational method to predict regulatory DNA sequences (promoters, enhancers, and transcription factor binding sites) in production animals (cows and pigs) and extended its broad applicability to other mammals. The method utilizes human regulatory features identified from thousands of tissues, cell lines, and experimental assays to find homologous regions that are conserved in sequences and genome organization and are enriched for regulatory elements in the genome sequences of other mammalian species. Importantly, we developed a filtering strategy, including a machine learning classification method, to utilize a very small number of species-specific experimental datasets available to select for the likely active regulatory regions. The method finds the optimal combination of sensitivity and accuracy to unbiasedly predict regulatory regions in mammalian species. Furthermore, we demonstrated the utility of the predicted regulatory datasets in cattle for prioritizing variants associated with multiple production and climate change adaptation traits and identifying potential genome editing targets.
SNPs in putative regulatory regions identified by human mouse comparative sequencing and transcription factor binding site data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Banerjee, Poulabi; Bahlo, Melanie; Schwartz, Jody R.

2002-01-01

Genome wide disease association analysis using SNPs is being explored as a method for dissecting complex genetic traits and a vast number of SNPs have been generated for this purpose. As there are cost and throughput limitations of genotyping large numbers of SNPs and statistical issues regarding the large number of dependent tests on the same data set, to make association analysis practical it has been proposed that SNPs should be prioritized based on likely functional importance. The most easily identifiable functional SNPs are coding SNPs (cSNPs) and accordingly cSNPs have been screened in a number of studies. SNPs inmore » gene regulatory sequences embedded in noncoding DNA are another class of SNPs suggested for prioritization due to their predicted quantitative impact on gene expression. The main challenge in evaluating these SNPs, in contrast to cSNPs is a lack of robust algorithms and databases for recognizing regulatory sequences in noncoding DNA. Approaches that have been previously used to delineate noncoding sequences with gene regulatory activity include cross-species sequence comparisons and the search for sequences recognized by transcription factors. We combined these two methods to sift through mouse human genomic sequences to identify putative gene regulatory elements and subsequently localized SNPs within these sequences in a 1 Megabase (Mb) region of human chromosome 5q31, orthologous to mouse chromosome 11 containing the Interleukin cluster.« less
ChIPBase v2.0: decoding transcriptional regulatory networks of non-coding RNAs and protein-coding genes from ChIP-seq data.

PubMed

Zhou, Ke-Ren; Liu, Shun; Sun, Wen-Ju; Zheng, Ling-Ling; Zhou, Hui; Yang, Jian-Hua; Qu, Liang-Hu

2017-01-04

The abnormal transcriptional regulation of non-coding RNAs (ncRNAs) and protein-coding genes (PCGs) is contributed to various biological processes and linked with human diseases, but the underlying mechanisms remain elusive. In this study, we developed ChIPBase v2.0 (http://rna.sysu.edu.cn/chipbase/) to explore the transcriptional regulatory networks of ncRNAs and PCGs. ChIPBase v2.0 has been expanded with ∼10 200 curated ChIP-seq datasets, which represent about 20 times expansion when comparing to the previous released version. We identified thousands of binding motif matrices and their binding sites from ChIP-seq data of DNA-binding proteins and predicted millions of transcriptional regulatory relationships between transcription factors (TFs) and genes. We constructed 'Regulator' module to predict hundreds of TFs and histone modifications that were involved in or affected transcription of ncRNAs and PCGs. Moreover, we built a web-based tool, Co-Expression, to explore the co-expression patterns between DNA-binding proteins and various types of genes by integrating the gene expression profiles of ∼10 000 tumor samples and ∼9100 normal tissues and cell lines. ChIPBase also provides a ChIP-Function tool and a genome browser to predict functions of diverse genes and visualize various ChIP-seq data. This study will greatly expand our understanding of the transcriptional regulations of ncRNAs and PCGs. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Global transcriptome analysis reveals extensive gene remodeling, alternative splicing and differential transcription profiles in non-seed vascular plant Selaginella moellendorffii.

PubMed

Zhu, Yan; Chen, Longxian; Zhang, Chengjun; Hao, Pei; Jing, Xinyun; Li, Xuan

2017-01-25

Selaginella moellendorffii, a lycophyte, is a model plant to study the early evolution and development of vascular plants. As the first and only sequenced lycophyte to date, the genome of S. moellendorffii revealed many conserved genes and pathways, as well as specialized genes different from flowering plants. Despite the progress made, little is known about long noncoding RNAs (lncRNA) and the alternative splicing (AS) of coding genes in S. moellendorffii. Its coding gene models have not been fully validated with transcriptome data. Furthermore, it remains important to understand whether the regulatory mechanisms similar to flowering plants are used, and how they operate in a non-seed primitive vascular plant. RNA-sequencing (RNA-seq) was performed for three S. moellendorffii tissues, root, stem, and leaf, by constructing strand-specific RNA-seq libraries from RNA purified using RiboMinus isolation protocol. A total of 176 million reads (44 Gbp) were obtained from three tissue types, and were mapped to S. moellendorffii genome. By comparing with 22,285 existing gene models of S. moellendorffii, we identified 7930 high-confidence novel coding genes (a 35.6% increase), and for the first time reported 4422 lncRNAs in a lycophyte. Further, we refined 2461 (11.0%) of existing gene models, and identified 11,030 AS events (for 5957 coding genes) revealed for the first time for lycophytes. Tissue-specific gene expression with functional implication was analyzed, and 1031, 554, and 269 coding genes, and 174, 39, and 17 lncRNAs were identified in root, stem, and leaf tissues, respectively. The expression of critical genes for vascular development stages, i.e. formation of provascular cells, xylem specification and differentiation, and phloem specification and differentiation, was compared in S. moellendorffii tissues, indicating a less complex regulatory mechanism in lycophytes than in flowering plants. The results were further strengthened by the evolutionary trend of seven transcription factor families related to vascular development, which was observed among four representative species of seed and non-seed vascular plants, and nonvascular land and aquatic plants. The deep RNA-seq study of S. moellendorffii discovered extensive new gene contents, including novel coding genes, lncRNAs, AS events, and refined gene models. Compared to flowering vascular plants, S. moellendorffii displayed a less complexity in both gene structure, alternative splicing, and regulatory elements of vascular development. The study offered important insight into the evolution of vascular plants, and the regulation mechanism of vascular development in a non-seed plant.
A systemic identification approach for primary transcription start site of Arabidopsis miRNAs from multidimensional omics data.

PubMed

You, Qi; Yan, Hengyu; Liu, Yue; Yi, Xin; Zhang, Kang; Xu, Wenying; Su, Zhen

2017-05-01

The 22-nucleotide non-coding microRNAs (miRNAs) are mostly transcribed by RNA polymerase II and are similar to protein-coding genes. Unlike the clear process from stem-loop precursors to mature miRNAs, the primary transcriptional regulation of miRNA, especially in plants, still needs to be further clarified, including the original transcription start site, functional cis-elements and primary transcript structures. Due to several well-characterized transcription signals in the promoter region, we proposed a systemic approach integrating multidimensional "omics" (including genomics, transcriptomics, and epigenomics) data to improve the genome-wide identification of primary miRNA transcripts. Here, we used the model plant Arabidopsis thaliana to improve the ability to identify candidate promoter locations in intergenic miRNAs and to determine rules for identifying primary transcription start sites of miRNAs by integrating high-throughput omics data, such as the DNase I hypersensitive sites, chromatin immunoprecipitation-sequencing of polymerase II and H3K4me3, as well as high throughput transcriptomic data. As a result, 93% of refined primary transcripts could be confirmed by the primer pairs from a previous study. Cis-element and secondary structure analyses also supported the feasibility of our results. This work will contribute to the primary transcriptional regulatory analysis of miRNAs, and the conserved regulatory pattern may be a suitable miRNA characteristic in other plant species.
A Novel YY1-miR-1 Regulatory Circuit in Skeletal Myogenesis Revealed by Genome-Wide Prediction of YY1-miRNA Network

PubMed Central

Lu, Leina; Zhou, Liang; Chen, Eric Z.; Sun, Kun; Jiang, Peiyong; Wang, Lijun; Su, Xiaoxi; Sun, Hao; Wang, Huating

2012-01-01

microRNAs (miRNAs) are non-coding RNAs that regulate gene expression post-transcriptionally, and mounting evidence supports the prevalence and functional significance of their interplay with transcription factors (TFs). Here we describe the identification of a regulatory circuit between muscle miRNAs (miR-1, miR-133 and miR-206) and Yin Yang 1 (YY1), an epigenetic repressor of skeletal myogenesis in mouse. Genome-wide identification of potential down-stream targets of YY1 by combining computational prediction with expression profiling data reveals a large number of putative miRNA targets of YY1 during skeletal myoblasts differentiation into myotubes with muscle miRs ranking on top of the list. The subsequent experimental results demonstrate that YY1 indeed represses muscle miRs expression in myoblasts and the repression is mediated through multiple enhancers and recruitment of Polycomb complex to several YY1 binding sites. YY1 regulating miR-1 is functionally important for both C2C12 myogenic differentiation and injury-induced muscle regeneration. Furthermore, we demonstrate that miR-1 in turn targets YY1, thus forming a negative feedback loop. Together, these results identify a novel regulatory circuit required for skeletal myogenesis and reinforce the idea that regulatory circuitries involving miRNAs and TFs are prevalent mechanisms. PMID:22319554
Complete plastid genome sequence of Daucus carota: implications for biotechnology and phylogeny of angiosperms.

PubMed

Ruhlman, Tracey; Lee, Seung-Bum; Jansen, Robert K; Hostetler, Jessica B; Tallon, Luke J; Town, Christopher D; Daniell, Henry

2006-08-31

Carrot (Daucus carota) is a major food crop in the US and worldwide. Its capacity for storage and its lifecycle as a biennial make it an attractive species for the introduction of foreign genes, especially for oral delivery of vaccines and other therapeutic proteins. Until recently efforts to express recombinant proteins in carrot have had limited success in terms of protein accumulation in the edible tap roots. Plastid genetic engineering offers the potential to overcome this limitation, as demonstrated by the accumulation of BADH in chromoplasts of carrot taproots to confer exceedingly high levels of salt resistance. The complete plastid genome of carrot provides essential information required for genetic engineering. Additionally, the sequence data add to the rapidly growing database of plastid genomes for assessing phylogenetic relationships among angiosperms. The complete carrot plastid genome is 155,911 bp in length, with 115 unique genes and 21 duplicated genes within the IR. There are four ribosomal RNAs, 30 distinct tRNA genes and 18 intron-containing genes. Repeat analysis reveals 12 direct and 2 inverted repeats > or = 30 bp with a sequence identity > or = 90%. Phylogenetic analysis of nucleotide sequences for 61 protein-coding genes using both maximum parsimony (MP) and maximum likelihood (ML) were performed for 29 angiosperms. Phylogenies from both methods provide strong support for the monophyly of several major angiosperm clades, including monocots, eudicots, rosids, asterids, eurosids II, euasterids I, and euasterids II. The carrot plastid genome contains a number of dispersed direct and inverted repeats scattered throughout coding and non-coding regions. This is the first sequenced plastid genome of the family Apiaceae and only the second published genome sequence of the species-rich euasterid II clade. Both MP and ML trees provide very strong support (100% bootstrap) for the sister relationship of Daucus with Panax in the euasterid II clade. These results provide the best taxon sampling of complete chloroplast genomes and the strongest support yet for the sister relationship of Caryophyllales to the asterids. The availability of the complete plastid genome sequence should facilitate improved transformation efficiency and foreign gene expression in carrot through utilization of endogenous flanking sequences and regulatory elements.
Comparative genomics of 9 novel Paenibacillus larvae bacteriophages

PubMed Central

Stamereilers, Casey; LeBlanc, Lucy; Yost, Diane; Amy, Penny S.; Tsourkas, Philippos K.

2016-01-01

ABSTRACT American Foulbrood Disease, caused by the bacterium Paenibacillus larvae, is one of the most destructive diseases of the honeybee, Apis mellifera. Our group recently published the sequences of 9 new phages with the ability to infect and lyse P. larvae. Here, we characterize the genomes of these P. larvae phages, compare them to each other and to other sequenced P. larvae phages, and putatively identify protein function. The phage genomes are 38–45 kb in size and contain 68–86 genes, most of which appear to be unique to P. larvae phages. We classify P. larvae phages into 2 main clusters and one singleton based on nucleotide sequence identity. Three of the new phages show sequence similarity to other sequenced P. larvae phages, while the remaining 6 do not. We identified functions for roughly half of the P. larvae phage proteins, including structural, assembly, host lysis, DNA replication/metabolism, regulatory, and host-related functions. Structural and assembly proteins are highly conserved among our phages and are located at the start of the genome. DNA replication/metabolism, regulatory, and host-related proteins are located in the middle and end of the genome, and are not conserved, with many of these genes found in some of our phages but not others. All nine phages code for a conserved N-acetylmuramoyl-L-alanine amidase. Comparative analysis showed the phages use the “cohesive ends with 3′ overhang” DNA packaging strategy. This work is the first in-depth study of P. larvae phage genomics, and serves as a marker for future work in this area. PMID:27738559
TrawlerWeb: an online de novo motif discovery tool for next-generation sequencing datasets.

PubMed

Dang, Louis T; Tondl, Markus; Chiu, Man Ho H; Revote, Jerico; Paten, Benedict; Tano, Vincent; Tokolyi, Alex; Besse, Florence; Quaife-Ryan, Greg; Cumming, Helen; Drvodelic, Mark J; Eichenlaub, Michael P; Hallab, Jeannette C; Stolper, Julian S; Rossello, Fernando J; Bogoyevitch, Marie A; Jans, David A; Nim, Hieu T; Porrello, Enzo R; Hudson, James E; Ramialison, Mirana

2018-04-05

A strong focus of the post-genomic era is mining of the non-coding regulatory genome in order to unravel the function of regulatory elements that coordinate gene expression (Nat 489:57-74, 2012; Nat 507:462-70, 2014; Nat 507:455-61, 2014; Nat 518:317-30, 2015). Whole-genome approaches based on next-generation sequencing (NGS) have provided insight into the genomic location of regulatory elements throughout different cell types, organs and organisms. These technologies are now widespread and commonly used in laboratories from various fields of research. This highlights the need for fast and user-friendly software tools dedicated to extracting cis-regulatory information contained in these regulatory regions; for instance transcription factor binding site (TFBS) composition. Ideally, such tools should not require prior programming knowledge to ensure they are accessible for all users. We present TrawlerWeb, a web-based version of the Trawler_standalone tool (Nat Methods 4:563-5, 2007; Nat Protoc 5:323-34, 2010), to allow for the identification of enriched motifs in DNA sequences obtained from next-generation sequencing experiments in order to predict their TFBS composition. TrawlerWeb is designed for online queries with standard options common to web-based motif discovery tools. In addition, TrawlerWeb provides three unique new features: 1) TrawlerWeb allows the input of BED files directly generated from NGS experiments, 2) it automatically generates an input-matched biologically relevant background, and 3) it displays resulting conservation scores for each instance of the motif found in the input sequences, which assists the researcher in prioritising the motifs to validate experimentally. Finally, to date, this web-based version of Trawler_standalone remains the fastest online de novo motif discovery tool compared to other popular web-based software, while generating predictions with high accuracy. TrawlerWeb provides users with a fast, simple and easy-to-use web interface for de novo motif discovery. This will assist in rapidly analysing NGS datasets that are now being routinely generated. TrawlerWeb is freely available and accessible at: http://trawler.erc.monash.edu.au .
Systematic elucidation and in vivo validation of sequences enriched in hindbrain transcriptional control

PubMed Central

Burzynski, Grzegorz M.; Reed, Xylena; Taher, Leila; Stine, Zachary E.; Matsui, Takeshi; Ovcharenko, Ivan; McCallion, Andrew S.

2012-01-01

Illuminating the primary sequence encryption of enhancers is central to understanding the regulatory architecture of genomes. We have developed a machine learning approach to decipher motif patterns of hindbrain enhancers and identify 40,000 sequences in the human genome that we predict display regulatory control that includes the hindbrain. Consistent with their roles in hindbrain patterning, MEIS1, NKX6-1, as well as HOX and POU family binding motifs contributed strongly to this enhancer model. Predicted hindbrain enhancers are overrepresented at genes expressed in hindbrain and associated with nervous system development, and primarily reside in the areas of open chromatin. In addition, 77 (0.2%) of these predictions are identified as hindbrain enhancers on the VISTA Enhancer Browser, and 26,000 (60%) overlap enhancer marks (H3K4me1 or H3K27ac). To validate these putative hindbrain enhancers, we selected 55 elements distributed throughout our predictions and six low scoring controls for evaluation in a zebrafish transgenic assay. When assayed in mosaic transgenic embryos, 51/55 elements directed expression in the central nervous system. Furthermore, 30/34 (88%) predicted enhancers analyzed in stable zebrafish transgenic lines directed expression in the larval zebrafish hindbrain. Subsequent analysis of sequence fragments selected based upon motif clustering further confirmed the critical role of the motifs contributing to the classifier. Our results demonstrate the existence of a primary sequence code characteristic to hindbrain enhancers. This code can be accurately extracted using machine-learning approaches and applied successfully for de novo identification of hindbrain enhancers. This study represents a critical step toward the dissection of regulatory control in specific neuronal subtypes. PMID:22759862
An Enhancer Near ISL1 and an Ultraconserved Exon of PCBP2 areDerived from a Retroposon

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bejerano, Gill; Lowe, Craig; Ahituv, Nadav

2005-11-27

Hundreds of highly conserved distal cis-regulatory elementshave been characterized to date in vertebrate genomes1. Many thousandsmore are predicted based on comparative genomics2,3. Yet, in starkcontrast to the genes they regulate, virtually none of these regions canbe traced using sequence similarity in invertebrates, leaving theirevolutionary origin obscure. Here we show that a class of conserved,primarily non-coding regions in tetrapods originated from a novel shortinterspersed repetitive element (SINE) retroposon family that was activein Sarcopterygii (lobe-finned fishes and terrestrial vertebrates) in theSilurian at least 410 Mya4, and, remarkably, appears to be recentlyactive in the "living fossil" Indonesian coelacanth, Latimeriamenadoensis. We show that onemore » copy is a distal enhancer, located 500kbfrom the neuro-developmental gene ISL1. Several others represent new,possibly regulatory, alternatively spliced exons in the middle ofpre-existing Sarcopterygian genes. One of these is the>200bpultraconserved region5, 100 percent identical in mammals, and 80 percentidentical to the coelacanth SINE, that contains a 31aa alternativelyspliced exon of the mRNA processing gene PCBP26. These add to a growinglist of examples7 in which relics of transposable elements have acquireda function that serves their host, a process termed "exaptation"8, andprovide an origin for at least some of the highly-conservedvertebrate-specific genomic sequences recently discovered usingcomparative genomics.« less

Widespread Site-Dependent Buffering of Human Regulatory Polymorphism

PubMed Central

Kutyavin, Tanya; Stamatoyannopoulos, John A.

2012-01-01

The average individual is expected to harbor thousands of variants within non-coding genomic regions involved in gene regulation. However, it is currently not possible to interpret reliably the functional consequences of genetic variation within any given transcription factor recognition sequence. To address this, we comprehensively analyzed heritable genome-wide binding patterns of a major sequence-specific regulator (CTCF) in relation to genetic variability in binding site sequences across a multi-generational pedigree. We localized and quantified CTCF occupancy by ChIP-seq in 12 related and unrelated individuals spanning three generations, followed by comprehensive targeted resequencing of the entire CTCF–binding landscape across all individuals. We identified hundreds of variants with reproducible quantitative effects on CTCF occupancy (both positive and negative). While these effects paralleled protein–DNA recognition energetics when averaged, they were extensively buffered by striking local context dependencies. In the significant majority of cases buffering was complete, resulting in silent variants spanning every position within the DNA recognition interface irrespective of level of binding energy or evolutionary constraint. The prevalence of complex partial or complete buffering effects severely constrained the ability to predict reliably the impact of variation within any given binding site instance. Surprisingly, 40% of variants that increased CTCF occupancy occurred at positions of human–chimp divergence, challenging the expectation that the vast majority of functional regulatory variants should be deleterious. Our results suggest that, even in the presence of “perfect” genetic information afforded by resequencing and parallel studies in multiple related individuals, genomic site-specific prediction of the consequences of individual variation in regulatory DNA will require systematic coupling with empirical functional genomic measurements. PMID:22457641
Genomic Correlates of Relationship QTL Involved in Fore- versus Hind Limb Divergence in Mice

PubMed Central

Pavlicev, Mihaela; Wagner, Günter P.; Noonan, James P.; Hallgrímsson, Benedikt; Cheverud, James M.

2013-01-01

Divergence of serially homologous elements of organisms is a common evolutionary pattern contributing to increased phenotypic complexity. Here, we study the genomic intervals affecting the variational independence of fore- and hind limb traits within an experimental mouse population. We use an advanced intercross of inbred mouse strains to map the loci associated with the degree of autonomy between fore- and hind limb long bone lengths (loci affecting the relationship between traits, relationship quantitative trait loci [rQTL]). These loci have been proposed to interact locally with the products of pleiotropic genes, thereby freeing the local trait from the variational constraint due to pleiotropic mutations. Using the known polymorphisms (single nucleotide polymorphisms [SNPs]) between the parental strains, we characterized and compared the genomic regions in which the rQTL, as well as their interaction partners (intQTL), reside. We find that these two classes of QTL intervals harbor different kinds of molecular variation. SNPs in rQTL intervals more frequently reside in limb-specific cis-regulatory regions than SNPs in intQTL intervals. The intQTL loci modified by the rQTL, in contrast, show the signature of protein-coding variation. This result is consistent with the widely accepted view that protein-coding mutations have broader pleiotropic effects than cis-regulatory polymorphisms. For both types of QTL intervals, the underlying candidate genes are enriched for genes involved in protein binding. This finding suggests that rQTL effects are caused by local interactions among the products of the causal genes harbored in rQTL and intQTL intervals. This is the first study to systematically document the population-level molecular variation underlying the evolution of character individuation. PMID:24065733
Developmental roles of 21 Drosophila transcription factors are determined by quantitative differences in binding to an overlapping set of thousands of genomic regions

DOE Office of Scientific and Technical Information (OSTI.GOV)

MacArthur, Stewart; Li, Xiao-Yong; Li, Jingyi

2009-05-15

BACKGROUND: We previously established that six sequence-specific transcription factors that initiate anterior/posterior patterning in Drosophila bind to overlapping sets of thousands of genomic regions in blastoderm embryos. While regions bound at high levels include known and probable functional targets, more poorly bound regions are preferentially associated with housekeeping genes and/or genes not transcribed in the blastoderm, and are frequently found in protein coding sequences or in less conserved non-coding DNA, suggesting that many are likely non-functional. RESULTS: Here we show that an additional 15 transcription factors that regulate other aspects of embryo patterning show a similar quantitative continuum of functionmore » and binding to thousands of genomic regions in vivo. Collectively, the 21 regulators show a surprisingly high overlap in the regions they bind given that they belong to 11 DNA binding domain families, specify distinct developmental fates, and can act via different cis-regulatory modules. We demonstrate, however, that quantitative differences in relative levels of binding to shared targets correlate with the known biological and transcriptional regulatory specificities of these factors. CONCLUSIONS: It is likely that the overlap in binding of biochemically and functionally unrelated transcription factors arises from the high concentrations of these proteins in nuclei, which, coupled with their broad DNA binding specificities, directs them to regions of open chromatin. We suggest that most animal transcription factors will be found to show a similar broad overlapping pattern of binding in vivo, with specificity achieved by modulating the amount, rather than the identity, of bound factor.« less
Identification and Characterization of miRNA Transcriptome in Asiatic Cotton (Gossypium arboreum) Using High Throughput Sequencing

PubMed Central

Farooq, Muhammad; Mansoor, Shahid; Guo, Hui; Amin, Imran; Chee, Peng W.; Azim, M. Kamran; Paterson, Andrew H.

2017-01-01

MicroRNAs (miRNAs) are small 20–24nt molecules that have been well studied over the past decade due to their important regulatory roles in different cellular processes. The mature sequences are more conserved across vast phylogenetic scales than their precursors and some are conserved within entire kingdoms, hence, their loci and function can be predicted by homology searches. Different studies have been performed to elucidate miRNAs using de novo prediction methods but due to complex regulatory mechanisms or false positive in silico predictions, not all of them express in reality and sometimes computationally predicted mature transcripts differ from the actual expressed ones. With the availability of a complete genome sequence of Gossypium arboreum, it is important to annotate the genome for both coding and non-coding regions using high confidence transcript evidence, for this cotton species that is highly resistant to various biotic and abiotic stresses. Here we have analyzed the small RNA transcriptome of G. arboreum leaves and provided genome annotation of miRNAs with evidence from miRNA/miRNA∗ transcripts. A total of 446 miRNAs clustered into 224 miRNA families were found, among which 48 families are conserved in other plants and 176 are novel. Four short RNA libraries were used to shortlist best predictions based on high reads per million. The size, origin, copy numbers and transcript depth of all miRNAs along with their isoforms and targets has been reported. The highest gene copy number was observed for gar-miR7504 followed by gar-miR166, gar-miR8771, gar-miR156, and gar-miR7484. Altogether, 1274 target genes were found in G. arboreum that are enriched for 216 KEGG pathways. The resultant genomic annotations are provided in UCSC, BED format. PMID:28663752
Fine mapping of regulatory loci for mammalian gene expression using radiation hybrids

PubMed Central

Park, Christopher C; Ahn, Sangtae; Bloom, Joshua S; Lin, Andy; Wang, Richard T; Wu, Tongtong; Sekar, Aswin; Khan, Arshad H; Farr, Christine J; Lusis, Aldons J; Leahy, Richard M; Lange, Kenneth; Smith, Desmond J

2010-01-01

We mapped regulatory loci for nearly all protein-coding genes in mammals using comparative genomic hybridization and expression array measurements from a panel of mouse–hamster radiation hybrid cell lines. The large number of breaks in the mouse chromosomes and the dense genotyping of the panel allowed extremely sharp mapping of loci. As the regulatory loci result from extra gene dosage, we call them copy number expression quantitative trait loci, or ceQTLs. The −2log10P support interval for the ceQTLs was <150 kb, containing an average of <2–3 genes. We identified 29,769 trans ceQTLs with −log10P > 4, including 13 hotspots each regulating >100 genes in trans. Further, this work identifies 2,761 trans ceQTLs harboring no known genes, and provides evidence for a mode of gene expression autoregulation specific to the X chromosome. PMID:18362883
Genomic analysis reveals major determinants of cis-regulatory variation in Capsella grandiflora

PubMed Central

Steige, Kim A.; Laenen, Benjamin; Reimegård, Johan; Slotte, Tanja

2017-01-01

Understanding the causes of cis-regulatory variation is a long-standing aim in evolutionary biology. Although cis-regulatory variation has long been considered important for adaptation, we still have a limited understanding of the selective importance and genomic determinants of standing cis-regulatory variation. To address these questions, we studied the prevalence, genomic determinants, and selective forces shaping cis-regulatory variation in the outcrossing plant Capsella grandiflora. We first identified a set of 1,010 genes with common cis-regulatory variation using analyses of allele-specific expression (ASE). Population genomic analyses of whole-genome sequences from 32 individuals showed that genes with common cis-regulatory variation (i) are under weaker purifying selection and (ii) undergo less frequent positive selection than other genes. We further identified genomic determinants of cis-regulatory variation. Gene body methylation (gbM) was a major factor constraining cis-regulatory variation, whereas presence of nearby transposable elements (TEs) and tissue specificity of expression increased the odds of ASE. Our results suggest that most common cis-regulatory variation in C. grandiflora is under weak purifying selection, and that gene-specific functional constraints are more important for the maintenance of cis-regulatory variation than genome-scale variation in the intensity of selection. Our results agree with previous findings that suggest TE silencing affects nearby gene expression, and provide evidence for a link between gbM and cis-regulatory constraint, possibly reflecting greater dosage sensitivity of body-methylated genes. Given the extensive conservation of gbM in flowering plants, this suggests that gbM could be an important predictor of cis-regulatory variation in a wide range of plant species. PMID:28096395
Non-coding RNAs and plant male sterility: current knowledge and future prospects.

PubMed

Mishra, Ankita; Bohra, Abhishek

2018-02-01

Latest outcomes assign functional role to non-coding (nc) RNA molecules in regulatory networks that confer male sterility to plants. Male sterility in plants offers great opportunity for improving crop performance through application of hybrid technology. In this respect, cytoplasmic male sterility (CMS) and sterility induced by photoperiod (PGMS)/temperature (TGMS) have greatly facilitated development of high-yielding hybrids in crops. Participation of non-coding (nc) RNA molecules in plant reproductive development is increasingly becoming evident. Recent breakthroughs in rice definitively associate ncRNAs with PGMS and TGMS. In case of CMS, the exact mechanism through which the mitochondrial ORFs exert influence on the development of male gametophyte remains obscure in several crops. High-throughput sequencing has enabled genome-wide discovery and validation of these regulatory molecules and their target genes, describing their potential roles performed in relation to CMS. Discovery of ncRNA localized in plant mtDNA with its possible implication in CMS induction is intriguing in this respect. Still, conclusive evidences linking ncRNA with CMS phenotypes are currently unavailable, demanding complementing genetic approaches like transgenics to substantiate the preliminary findings. Here, we review the recent literature on the contribution of ncRNAs in conferring male sterility to plants, with an emphasis on microRNAs. Also, we present a perspective on improved understanding about ncRNA-mediated regulatory pathways that control male sterility in plants. A refined understanding of plant male sterility would strengthen crop hybrid industry to deliver hybrids with improved performance.
Many human accelerated regions are developmental enhancers

PubMed Central

Capra, John A.; Erwin, Genevieve D.; McKinsey, Gabriel; Rubenstein, John L. R.; Pollard, Katherine S.

2013-01-01

The genetic changes underlying the dramatic differences in form and function between humans and other primates are largely unknown, although it is clear that gene regulatory changes play an important role. To identify regulatory sequences with potentially human-specific functions, we and others used comparative genomics to find non-coding regions conserved across mammals that have acquired many sequence changes in humans since divergence from chimpanzees. These regions are good candidates for performing human-specific regulatory functions. Here, we analysed the DNA sequence, evolutionary history, histone modifications, chromatin state and transcription factor (TF) binding sites of a combined set of 2649 non-coding human accelerated regions (ncHARs) and predicted that at least 30% of them function as developmental enhancers. We prioritized the predicted ncHAR enhancers using analysis of TF binding site gain and loss, along with the functional annotations and expression patterns of nearby genes. We then tested both the human and chimpanzee sequence for 29 ncHARs in transgenic mice, and found 24 novel developmental enhancers active in both species, 17 of which had very consistent patterns of activity in specific embryonic tissues. Of these ncHAR enhancers, five drove expression patterns suggestive of different activity for the human and chimpanzee sequence at embryonic day 11.5. The changes to human non-coding DNA in these ncHAR enhancers may modify the complex patterns of gene expression necessary for proper development in a human-specific manner and are thus promising candidates for understanding the genetic basis of human-specific biology. PMID:24218637
The African coelacanth genome provides insights into tetrapod evolution.

PubMed

Amemiya, Chris T; Alföldi, Jessica; Lee, Alison P; Fan, Shaohua; Philippe, Hervé; Maccallum, Iain; Braasch, Ingo; Manousaki, Tereza; Schneider, Igor; Rohner, Nicolas; Organ, Chris; Chalopin, Domitille; Smith, Jeramiah J; Robinson, Mark; Dorrington, Rosemary A; Gerdol, Marco; Aken, Bronwen; Biscotti, Maria Assunta; Barucca, Marco; Baurain, Denis; Berlin, Aaron M; Blatch, Gregory L; Buonocore, Francesco; Burmester, Thorsten; Campbell, Michael S; Canapa, Adriana; Cannon, John P; Christoffels, Alan; De Moro, Gianluca; Edkins, Adrienne L; Fan, Lin; Fausto, Anna Maria; Feiner, Nathalie; Forconi, Mariko; Gamieldien, Junaid; Gnerre, Sante; Gnirke, Andreas; Goldstone, Jared V; Haerty, Wilfried; Hahn, Mark E; Hesse, Uljana; Hoffmann, Steve; Johnson, Jeremy; Karchner, Sibel I; Kuraku, Shigehiro; Lara, Marcia; Levin, Joshua Z; Litman, Gary W; Mauceli, Evan; Miyake, Tsutomu; Mueller, M Gail; Nelson, David R; Nitsche, Anne; Olmo, Ettore; Ota, Tatsuya; Pallavicini, Alberto; Panji, Sumir; Picone, Barbara; Ponting, Chris P; Prohaska, Sonja J; Przybylski, Dariusz; Saha, Nil Ratan; Ravi, Vydianathan; Ribeiro, Filipe J; Sauka-Spengler, Tatjana; Scapigliati, Giuseppe; Searle, Stephen M J; Sharpe, Ted; Simakov, Oleg; Stadler, Peter F; Stegeman, John J; Sumiyama, Kenta; Tabbaa, Diana; Tafer, Hakim; Turner-Maier, Jason; van Heusden, Peter; White, Simon; Williams, Louise; Yandell, Mark; Brinkmann, Henner; Volff, Jean-Nicolas; Tabin, Clifford J; Shubin, Neil; Schartl, Manfred; Jaffe, David B; Postlethwait, John H; Venkatesh, Byrappa; Di Palma, Federica; Lander, Eric S; Meyer, Axel; Lindblad-Toh, Kerstin

2013-04-18

The discovery of a living coelacanth specimen in 1938 was remarkable, as this lineage of lobe-finned fish was thought to have become extinct 70 million years ago. The modern coelacanth looks remarkably similar to many of its ancient relatives, and its evolutionary proximity to our own fish ancestors provides a glimpse of the fish that first walked on land. Here we report the genome sequence of the African coelacanth, Latimeria chalumnae. Through a phylogenomic analysis, we conclude that the lungfish, and not the coelacanth, is the closest living relative of tetrapods. Coelacanth protein-coding genes are significantly more slowly evolving than those of tetrapods, unlike other genomic features. Analyses of changes in genes and regulatory elements during the vertebrate adaptation to land highlight genes involved in immunity, nitrogen excretion and the development of fins, tail, ear, eye, brain and olfaction. Functional assays of enhancers involved in the fin-to-limb transition and in the emergence of extra-embryonic tissues show the importance of the coelacanth genome as a blueprint for understanding tetrapod evolution.
Genome-wide analysis of the DNA-binding with one zinc finger (Dof) transcription factor family in bananas.

PubMed

Dong, Chen; Hu, Huigang; Xie, Jianghui

2016-12-01

DNA-binding with one finger (Dof) domain proteins are a multigene family of plant-specific transcription factors involved in numerous aspects of plant growth and development. In this study, we report a genome-wide search for Musa acuminata Dof (MaDof) genes and their expression profiles at different developmental stages and in response to various abiotic stresses. In addition, a complete overview of the Dof gene family in bananas is presented, including the gene structures, chromosomal locations, cis-regulatory elements, conserved protein domains, and phylogenetic inferences. Based on the genome-wide analysis, we identified 74 full-length protein-coding MaDof genes unevenly distributed on 11 chromosomes. Phylogenetic analysis with Dof members from diverse plant species showed that MaDof genes can be classified into four subgroups (StDof I, II, III, and IV). The detailed genomic information of the MaDof gene homologs in the present study provides opportunities for functional analyses to unravel the exact role of the genes in plant growth and development.
α satellite DNA variation and function of the human centromere

PubMed Central

Sullivan, Lori L.; Chew, Kimberline

2017-01-01

ABSTRACT Genomic variation is a source of functional diversity that is typically studied in genic and non-coding regulatory regions. However, the extent of variation within noncoding portions of the human genome, particularly highly repetitive regions, and the functional consequences are not well understood. Satellite DNA, including α satellite DNA found at human centromeres, comprises up to 10% of the genome, but is difficult to study because its repetitive nature hinders contiguous sequence assemblies. We recently described variation within α satellite DNA that affects centromere function. On human chromosome 17 (HSA17), we showed that size and sequence polymorphisms within primary array D17Z1 are associated with chromosome aneuploidy and defective centromere architecture. However, HSA17 can counteract this instability by assembling the centromere at a second, “backup” array lacking variation. Here, we discuss our findings in a broader context of human centromere assembly, and highlight areas of future study to uncover links between genomic and epigenetic features of human centromeres. PMID:28406740
In cell mutational interference mapping experiment (in cell MIME) identifies the 5' polyadenylation signal as a dual regulator of HIV-1 genomic RNA production and packaging.

PubMed

Smyth, Redmond P; Smith, Maureen R; Jousset, Anne-Caroline; Despons, Laurence; Laumond, Géraldine; Decoville, Thomas; Cattenoz, Pierre; Moog, Christiane; Jossinet, Fabrice; Mougel, Marylène; Paillart, Jean-Christophe; von Kleist, Max; Marquet, Roland

2018-05-18

Non-coding RNA regulatory elements are important for viral replication, making them promising targets for therapeutic intervention. However, regulatory RNA is challenging to detect and characterise using classical structure-function assays. Here, we present in cell Mutational Interference Mapping Experiment (in cell MIME) as a way to define RNA regulatory landscapes at single nucleotide resolution under native conditions. In cell MIME is based on (i) random mutation of an RNA target, (ii) expression of mutated RNA in cells, (iii) physical separation of RNA into functional and non-functional populations, and (iv) high-throughput sequencing to identify mutations affecting function. We used in cell MIME to define RNA elements within the 5' region of the HIV-1 genomic RNA (gRNA) that are important for viral replication in cells. We identified three distinct RNA motifs controlling intracellular gRNA production, and two distinct motifs required for gRNA packaging into virions. Our analysis reveals the 73AAUAAA78 polyadenylation motif within the 5' PolyA domain as a dual regulator of gRNA production and gRNA packaging, and demonstrates that a functional polyadenylation signal is required for viral packaging even though it negatively affects gRNA production.
In cell mutational interference mapping experiment (in cell MIME) identifies the 5′ polyadenylation signal as a dual regulator of HIV-1 genomic RNA production and packaging

PubMed Central

Smith, Maureen R; Jousset, Anne-Caroline; Despons, Laurence; Laumond, Géraldine; Decoville, Thomas; Cattenoz, Pierre; Moog, Christiane; Jossinet, Fabrice; Mougel, Marylène; Paillart, Jean-Christophe

2018-01-01

Abstract Non-coding RNA regulatory elements are important for viral replication, making them promising targets for therapeutic intervention. However, regulatory RNA is challenging to detect and characterise using classical structure-function assays. Here, we present in cell Mutational Interference Mapping Experiment (in cell MIME) as a way to define RNA regulatory landscapes at single nucleotide resolution under native conditions. In cell MIME is based on (i) random mutation of an RNA target, (ii) expression of mutated RNA in cells, (iii) physical separation of RNA into functional and non-functional populations, and (iv) high-throughput sequencing to identify mutations affecting function. We used in cell MIME to define RNA elements within the 5′ region of the HIV-1 genomic RNA (gRNA) that are important for viral replication in cells. We identified three distinct RNA motifs controlling intracellular gRNA production, and two distinct motifs required for gRNA packaging into virions. Our analysis reveals the 73AAUAAA78 polyadenylation motif within the 5′ PolyA domain as a dual regulator of gRNA production and gRNA packaging, and demonstrates that a functional polyadenylation signal is required for viral packaging even though it negatively affects gRNA production. PMID:29514260
Functional and topological characteristics of mammalian regulatory domains

PubMed Central

Symmons, Orsolya; Uslu, Veli Vural; Tsujimura, Taro; Ruf, Sandra; Nassari, Sonya; Schwarzer, Wibke; Ettwiller, Laurence; Spitz, François

2014-01-01

Long-range regulatory interactions play an important role in shaping gene-expression programs. However, the genomic features that organize these activities are still poorly characterized. We conducted a large operational analysis to chart the distribution of gene regulatory activities along the mouse genome, using hundreds of insertions of a regulatory sensor. We found that enhancers distribute their activities along broad regions and not in a gene-centric manner, defining large regulatory domains. Remarkably, these domains correlate strongly with the recently described TADs, which partition the genome into distinct self-interacting blocks. Different features, including specific repeats and CTCF-binding sites, correlate with the transition zones separating regulatory domains, and may help to further organize promiscuously distributed regulatory influences within large domains. These findings support a model of genomic organization where TADs confine regulatory activities to specific but large regulatory domains, contributing to the establishment of specific gene expression profiles. PMID:24398455
Genetic variation and gene expression across multiple tissues and developmental stages in a non-human primate

PubMed Central

Jasinska, Anna J.; Zelaya, Ivette; Service, Susan K.; Peterson, Christine B.; Cantor, Rita M.; Choi, Oi-Wa; DeYoung, Joseph; Eskin, Eleazar; Fairbanks, Lynn A.; Fears, Scott; Furterer, Allison E.; Huang, Yu S.; Ramensky, Vasily; Schmitt, Christopher A.; Svardal, Hannes; Jorgensen, Matthew J.; Kaplan, Jay R.; Villar, Diego; Aken, Bronwen L.; Flicek, Paul; Nag, Rishi; Wong, Emily S.; Blangero, John; Dyer, Thomas D.; Bogomolov, Marina; Benjamini, Yoav; Weinstock, George M.; Dewar, Ken; Sabatti, Chiara; Wilson, Richard K.; Jentsch, J. David; Warren, Wesley; Coppola, Giovanni; Woods, Roger P.; Freimer, Nelson B.

2017-01-01

By analyzing multi-tissue gene expression and genome-wide genetic variation data in samples from a vervet monkey pedigree, we generated a transcriptome resource and produced the first catalogue of expression quantitative trait loci (eQTLs) in a non-human primate model. This catalogue contains more genome-wide significant eQTLs, per sample, than comparable human resources, and reveals sex and age-related expression patterns. Findings include a master regulatory locus that likely plays a role in immune function, and a locus regulating hippocampal long non-coding RNAs (lncRNAs), whose expression correlates with hippocampal volume. This resource will facilitate genetic investigation of quantitative traits, including brain and behavioral phenotypes relevant to neuropsychiatric disorders. PMID:29083405
MSDB: A Comprehensive Database of Simple Sequence Repeats

PubMed Central

Avvaru, Akshay Kumar; Saxena, Saketh; Mishra, Rakesh Kumar

2017-01-01

Abstract Microsatellites, also known as Simple Sequence Repeats (SSRs), are short tandem repeats of 1–6 nt motifs present in all genomes, particularly eukaryotes. Besides their usefulness as genome markers, SSRs have been shown to perform important regulatory functions, and variations in their length at coding regions are linked to several disorders in humans. Microsatellites show a taxon-specific enrichment in eukaryotic genomes, and some may be functional. MSDB (Microsatellite Database) is a collection of >650 million SSRs from 6,893 species including Bacteria, Archaea, Fungi, Plants, and Animals. This database is by far the most exhaustive resource to access and analyze SSR data of multiple species. In addition to exploring data in a customizable tabular format, users can view and compare the data of multiple species simultaneously using our interactive plotting system. MSDB is developed using the Django framework and MySQL. It is freely available at http://tdb.ccmb.res.in/msdb. PMID:28854643
Murine endogenous retroviruses

PubMed Central

2016-01-01

Up to 10% of the mouse genome is comprised of endogenous retrovirus (ERV) sequences, and most represent the remains of ancient germ line infections. Our knowledge of the three distinct classes of ERVs is inversely correlated with their copy number, and their characterization has benefited from the availability of divergent wild mouse species and subspecies, and from ongoing analysis of the Mus genome sequence. In contrast to human ERVs, which are nearly all extinct, active mouse ERVs can still be found in all three ERV classes. The distribution and diversity of ERVs has been shaped by host-virus interactions over the course of evolution, but ERVs have also been pivotal in shaping the mouse genome by altering host genes through insertional mutagenesis, by adding novel regulatory and coding sequences, and by their co-option by host cells as retroviral resistance genes. We review mechanisms by which an adaptive coexistence has evolved. (Part of a Multi-author Review) PMID:18818872
Genome-wide transcription start site profiling in biofilm-grown Burkholderia cenocepacia J2315.

PubMed

Sass, Andrea M; Van Acker, Heleen; Förstner, Konrad U; Van Nieuwerburgh, Filip; Deforce, Dieter; Vogel, Jörg; Coenye, Tom

2015-10-13

Burkholderia cenocepacia is a soil-dwelling Gram-negative Betaproteobacterium with an important role as opportunistic pathogen in humans. Infections with B. cenocepacia are very difficult to treat due to their high intrinsic resistance to most antibiotics. Biofilm formation further adds to their antibiotic resistance. B. cenocepacia harbours a large, multi-replicon genome with a high GC-content, the reference genome of strain J2315 includes 7374 annotated genes. This study aims to annotate transcription start sites and identify novel transcripts on a whole genome scale. RNA extracted from B. cenocepacia J2315 biofilms was analysed by differential RNA-sequencing and the resulting dataset compared to data derived from conventional, global RNA-sequencing. Transcription start sites were annotated and further analysed according to their position relative to annotated genes. Four thousand ten transcription start sites were mapped over the whole B. cenocepacia genome and the primary transcription start site of 2089 genes expressed in B. cenocepacia biofilms were defined. For 64 genes a start codon alternative to the annotated one was proposed. Substantial antisense transcription for 105 genes and two novel protein coding sequences were identified. The distribution of internal transcription start sites can be used to identify genomic islands in B. cenocepacia. A potassium pump strongly induced only under biofilm conditions was found and 15 non-coding small RNAs highly expressed in biofilms were discovered. Mapping transcription start sites across the B. cenocepacia genome added relevant information to the J2315 annotation. Genes and novel regulatory RNAs putatively involved in B. cenocepacia biofilm formation were identified. These findings will help in understanding regulation of B. cenocepacia biofilm formation.
Evidence of translation efficiency adaptation of the coding regions of the bacteriophage lambda.

PubMed

Goz, Eli; Mioduser, Oriah; Diament, Alon; Tuller, Tamir

2017-08-01

Deciphering the way gene expression regulatory aspects are encoded in viral genomes is a challenging mission with ramifications related to all biomedical disciplines. Here, we aimed to understand how the evolution shapes the bacteriophage lambda genes by performing a high resolution analysis of ribosomal profiling data and gene expression related synonymous/silent information encoded in bacteriophage coding regions.We demonstrated evidence of selection for distinct compositions of synonymous codons in early and late viral genes related to the adaptation of translation efficiency to different bacteriophage developmental stages. Specifically, we showed that evolution of viral coding regions is driven, among others, by selection for codons with higher decoding rates; during the initial/progressive stages of infection the decoding rates in early/late genes were found to be superior to those in late/early genes, respectively. Moreover, we argued that selection for translation efficiency could be partially explained by adaptation to Escherichia coli tRNA pool and the fact that it can change during the bacteriophage life cycle.An analysis of additional aspects related to the expression of viral genes, such as mRNA folding and more complex/longer regulatory signals in the coding regions, is also reported. The reported conclusions are likely to be relevant also to additional viruses. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Evidence of function for conserved noncoding sequences in Arabidopsis thaliana.

PubMed

Spangler, Jacob B; Subramaniam, Sabarinath; Freeling, Michael; Feltus, F Alex

2012-01-01

• Whole genome duplication events provide a lineage with a large reservoir of genes that can be molded by evolutionary forces into phenotypes that fit alternative environments. A well-studied whole genome duplication, the α-event, occurred in an ancestor of the model plant Arabidopsis thaliana. Retained segments of the α-event have been defined in recent years in the form of duplicate protein coding sequences (α-pairs) and associated conserved noncoding DNA sequences (CNSs). Our aim was to identify any association between CNSs and α-pair co-functionality at the gene expression level. • Here, we tested for correlation between CNS counts and α-pair co-expression and expression intensity across nine expression datasets: aerial tissue, flowers, leaves, roots, rosettes, seedlings, seeds, shoots and whole plants. • We provide evidence for a putative regulatory role of the CNSs. The association of CNSs with α-pair co-expression and expression intensity varied by gene function, subgene position and the presence of transcription factor binding motifs. A range of possible CNS regulatory mechanisms, including intron-mediated enhancement, messenger RNA fold stability and transcriptional regulation, are discussed. • This study provides a framework to understand how CNS motifs are involved in the maintenance of gene expression after a whole genome duplication event. © 2011 The Authors. New Phytologist © 2011 New Phytologist Trust.

Meta-analysis and genome-wide interpretation of genetic susceptibility to drug addiction

PubMed Central

2011-01-01

Background Classical genetic studies provide strong evidence for heritable contributions to susceptibility to developing dependence on addictive substances. Candidate gene and genome-wide association studies (GWAS) have sought genes, chromosomal regions and allelic variants likely to contribute to susceptibility to drug addiction. Results Here, we performed a meta-analysis of addiction candidate gene association studies and GWAS to investigate possible functional mechanisms associated with addiction susceptibility. From meta-data retrieved from 212 publications on candidate gene association studies and 5 GWAS reports, we linked a total of 843 haplotypes to addiction susceptibility. We mapped the SNPs in these haplotypes to functional and regulatory elements in the genome and estimated the magnitude of the contributions of different molecular mechanisms to their effects on addiction susceptibility. In addition to SNPs in coding regions, these data suggest that haplotypes in gene regulatory regions may also contribute to addiction susceptibility. When we compared the lists of genes identified by association studies and those identified by molecular biological studies of drug-regulated genes, we observed significantly higher participation in the same gene interaction networks than expected by chance, despite little overlap between the two gene lists. Conclusions These results appear to offer new insights into the genetic factors underlying drug addiction. PMID:21999673
Nucleotide sequence of the Kaposi sarcoma-associated herpesvirus (HHV8)

PubMed Central

Russo, James J.; Bohenzky, Roy A.; Chien, Ming-Cheng; Chen, Jing; Yan, Ming; Maddalena, Dawn; Parry, J. Preston; Peruzzi, Daniela; Edelman, Isidore S.; Chang, Yuan; Moore, Patrick S.

1996-01-01

The genome of the Kaposi sarcoma-associated herpesvirus (KSHV or HHV8) was mapped with cosmid and phage genomic libraries from the BC-1 cell line. Its nucleotide sequence was determined except for a 3-kb region at the right end of the genome that was refractory to cloning. The BC-1 KSHV genome consists of a 140.5-kb-long unique coding region flanked by multiple G+C-rich 801-bp terminal repeat sequences. A genomic duplication that apparently arose in the parental tumor is present in this cell culture-derived strain. At least 81 ORFs, including 66 with homology to herpesvirus saimiri ORFs, and 5 internal repeat regions are present in the long unique region. The virus encodes homologs to complement-binding proteins, three cytokines (two macrophage inflammatory proteins and interleukin 6), dihydrofolate reductase, bcl-2, interferon regulatory factors, interleukin 8 receptor, neural cell adhesion molecule-like adhesin, and a D-type cyclin, as well as viral structural and metabolic proteins. Terminal repeat analysis of virus DNA from a KS lesion suggests a monoclonal expansion of KSHV in the KS tumor. PMID:8962146
The Genome of Deep-Sea Vent Chemolithoautotroph Thiomicrospira crunogena XCL-2

PubMed Central

Scott, Kathleen M; Sievert, Stefan M; Abril, Fereniki N; Ball, Lois A; Barrett, Chantell J; Blake, Rodrigo A; Boller, Amanda J; Chain, Patrick S. G; Clark, Justine A; Davis, Carisa R; Detter, Chris; Do, Kimberly F; Dobrinski, Kimberly P; Faza, Brandon I; Fitzpatrick, Kelly A; Freyermuth, Sharyn K; Harmer, Tara L; Hauser, Loren J; Hügler, Michael; Kerfeld, Cheryl A; Klotz, Martin G; Kong, William W; Land, Miriam; Lapidus, Alla; Larimer, Frank W; Longo, Dana L; Lucas, Susan; Malfatti, Stephanie A; Massey, Steven E; Martin, Darlene D; McCuddin, Zoe; Meyer, Folker; Moore, Jessica L; Ocampo, Luis H; Paul, John H; Paulsen, Ian T; Reep, Douglas K; Ren, Qinghu; Ross, Rachel L; Sato, Priscila Y; Thomas, Phaedra; Tinkham, Lance E; Zeruth, Gary T

2006-01-01

Presented here is the complete genome sequence of Thiomicrospira crunogena XCL-2, representative of ubiquitous chemolithoautotrophic sulfur-oxidizing bacteria isolated from deep-sea hydrothermal vents. This gammaproteobacterium has a single chromosome (2,427,734 base pairs), and its genome illustrates many of the adaptations that have enabled it to thrive at vents globally. It has 14 methyl-accepting chemotaxis protein genes, including four that may assist in positioning it in the redoxcline. A relative abundance of coding sequences (CDSs) encoding regulatory proteins likely control the expression of genes encoding carboxysomes, multiple dissolved inorganic nitrogen and phosphate transporters, as well as a phosphonate operon, which provide this species with a variety of options for acquiring these substrates from the environment. Thiom. crunogena XCL-2 is unusual among obligate sulfur-oxidizing bacteria in relying on the Sox system for the oxidation of reduced sulfur compounds. The genome has characteristics consistent with an obligately chemolithoautotrophic lifestyle, including few transporters predicted to have organic allocrits, and Calvin-Benson-Bassham cycle CDSs scattered throughout the genome. PMID:17105352
Non-coding cancer driver candidates identified with a sample- and position-specific model of the somatic mutation rate

PubMed Central

Juul, Malene; Bertl, Johanna; Guo, Qianyun; Nielsen, Morten Muhlig; Świtnicki, Michał; Hornshøj, Henrik; Madsen, Tobias; Hobolth, Asger; Pedersen, Jakob Skou

2017-01-01

Non-coding mutations may drive cancer development. Statistical detection of non-coding driver regions is challenged by a varying mutation rate and uncertainty of functional impact. Here, we develop a statistically founded non-coding driver-detection method, ncdDetect, which includes sample-specific mutational signatures, long-range mutation rate variation, and position-specific impact measures. Using ncdDetect, we screened non-coding regulatory regions of protein-coding genes across a pan-cancer set of whole-genomes (n = 505), which top-ranked known drivers and identified new candidates. For individual candidates, presence of non-coding mutations associates with altered expression or decreased patient survival across an independent pan-cancer sample set (n = 5454). This includes an antigen-presenting gene (CD1A), where 5’UTR mutations correlate significantly with decreased survival in melanoma. Additionally, mutations in a base-excision-repair gene (SMUG1) correlate with a C-to-T mutational-signature. Overall, we find that a rich model of mutational heterogeneity facilitates non-coding driver identification and integrative analysis points to candidates of potential clinical relevance. DOI: http://dx.doi.org/10.7554/eLife.21778.001 PMID:28362259
Transcriptome Analysis of an Insecticide Resistant Housefly Strain: Insights about SNPs and Regulatory Elements in Cytochrome P450 Genes

PubMed Central

Asp, Torben; Kristensen, Michael

2016-01-01

Background Insecticide resistance in the housefly, Musca domestica, has been investigated for more than 60 years. It will enter a new era after the recent publication of the housefly genome and the development of multiple next generation sequencing technologies. The genetic background of the xenobiotic response can now be investigated in greater detail. Here, we investigate the 454-pyrosequencing transcriptome of the spinosad-resistant 791spin strain in relation to the housefly genome with focus on P450 genes. Results The de novo assembly of clean reads gave 35,834 contigs consisting of 21,780 sequences of the spinosad resistant strain. The 3,648 sequences were annotated with an enzyme code EC number and were mapped to 124 KEGG pathways with metabolic processes as most highly represented pathway. One hundred and twenty contigs were annotated as P450s covering 44 different P450 genes of housefly. Eight differentially expressed P450s genes were identified and investigated for SNPs, CpG islands and common regulatory motifs in promoter and coding regions. Functional annotation clustering of metabolic related genes and motif analysis of P450s revealed their association with epigenetic, transcription and gene expression related functions. The sequence variation analysis resulted in 12 SNPs and eight of them found in cyp6d1. There is variation in location, size and frequency of CpG islands and specific motifs were also identified in these P450s. Moreover, identified motifs were associated to GO terms and transcription factors using bioinformatic tools. Conclusion Transcriptome data of a spinosad resistant strain provide together with genome data fundamental support for future research to understand evolution of resistance in houseflies. Here, we report for the first time the SNPs, CpG islands and common regulatory motifs in differentially expressed P450s. Taken together our findings will serve as a stepping stone to advance understanding of the mechanism and role of P450s in xenobiotic detoxification. PMID:27019205
Transcription Factors Bind Thousands of Active and InactiveRegions in the Drosophila Blastoderm

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Xiao-Yong; MacArthur, Stewart; Bourgon, Richard

2008-01-10

Identifying the genomic regions bound by sequence-specific regulatory factors is central both to deciphering the complex DNA cis-regulatory code that controls transcription in metazoans and to determining the range of genes that shape animal morphogenesis. Here, we use whole-genome tiling arrays to map sequences bound in Drosophila melanogaster embryos by the six maternal and gap transcription factors that initiate anterior-posterior patterning. We find that these sequence-specific DNA binding proteins bind with quantitatively different specificities to highly overlapping sets of several thousand genomic regions in blastoderm embryos. Specific high- and moderate-affinity in vitro recognition sequences for each factor are enriched inmore » bound regions. This enrichment, however, is not sufficient to explain the pattern of binding in vivo and varies in a context-dependent manner, demonstrating that higher-order rules must govern targeting of transcription factors. The more highly bound regions include all of the over forty well-characterized enhancers known to respond to these factors as well as several hundred putative new cis-regulatory modules clustered near developmental regulators and other genes with patterned expression at this stage of embryogenesis. The new targets include most of the microRNAs (miRNAs) transcribed in the blastoderm, as well as all major zygotically transcribed dorsal-ventral patterning genes, whose expression we show to be quantitatively modulated by anterior-posterior factors. In addition to these highly bound regions, there are several thousand regions that are reproducibly bound at lower levels. However, these poorly bound regions are, collectively, far more distant from genes transcribed in the blastoderm than highly bound regions; are preferentially found in protein-coding sequences; and are less conserved than highly bound regions. Together these observations suggest that many of these poorly-bound regions are not involved in early-embryonic transcriptional regulation, and a significant proportion may be nonfunctional. Surprisingly, for five of the six factors, their recognition sites are not unambiguously more constrained evolutionarily than the immediate flanking DNA, even in more highly bound and presumably functional regions, indicating that comparative DNA sequence analysis is limited in its ability to identify functional transcription factor targets.« less
SigmoID: a user-friendly tool for improving bacterial genome annotation through analysis of transcription control signals

PubMed Central

Damienikan, Aliaksandr U.

2016-01-01

The majority of bacterial genome annotations are currently automated and based on a ‘gene by gene’ approach. Regulatory signals and operon structures are rarely taken into account which often results in incomplete and even incorrect gene function assignments. Here we present SigmoID, a cross-platform (OS X, Linux and Windows) open-source application aiming at simplifying the identification of transcription regulatory sites (promoters, transcription factor binding sites and terminators) in bacterial genomes and providing assistance in correcting annotations in accordance with regulatory information. SigmoID combines a user-friendly graphical interface to well known command line tools with a genome browser for visualising regulatory elements in genomic context. Integrated access to online databases with regulatory information (RegPrecise and RegulonDB) and web-based search engines speeds up genome analysis and simplifies correction of genome annotation. We demonstrate some features of SigmoID by constructing a series of regulatory protein binding site profiles for two groups of bacteria: Soft Rot Enterobacteriaceae (Pectobacterium and Dickeya spp.) and Pseudomonas spp. Furthermore, we inferred over 900 transcription factor binding sites and alternative sigma factor promoters in the annotated genome of Pectobacterium atrosepticum. These regulatory signals control putative transcription units covering about 40% of the P. atrosepticum chromosome. Reviewing the annotation in cases where it didn’t fit with regulatory information allowed us to correct product and gene names for over 300 loci. PMID:27257541
Aquaculture genomics, genetics and breeding in the United States: current status, challenges, and priorities for future research.

PubMed

Abdelrahman, Hisham; ElHady, Mohamed; Alcivar-Warren, Acacia; Allen, Standish; Al-Tobasei, Rafet; Bao, Lisui; Beck, Ben; Blackburn, Harvey; Bosworth, Brian; Buchanan, John; Chappell, Jesse; Daniels, William; Dong, Sheng; Dunham, Rex; Durland, Evan; Elaswad, Ahmed; Gomez-Chiarri, Marta; Gosh, Kamal; Guo, Ximing; Hackett, Perry; Hanson, Terry; Hedgecock, Dennis; Howard, Tiffany; Holland, Leigh; Jackson, Molly; Jin, Yulin; Khalil, Karim; Kocher, Thomas; Leeds, Tim; Li, Ning; Lindsey, Lauren; Liu, Shikai; Liu, Zhanjiang; Martin, Kyle; Novriadi, Romi; Odin, Ramjie; Palti, Yniv; Peatman, Eric; Proestou, Dina; Qin, Guyu; Reading, Benjamin; Rexroad, Caird; Roberts, Steven; Salem, Mohamed; Severin, Andrew; Shi, Huitong; Shoemaker, Craig; Stiles, Sheila; Tan, Suxu; Tang, Kathy F J; Thongda, Wilawan; Tiersch, Terrence; Tomasso, Joseph; Prabowo, Wendy Tri; Vallejo, Roger; van der Steen, Hein; Vo, Khoi; Waldbieser, Geoff; Wang, Hanping; Wang, Xiaozhu; Xiang, Jianhai; Yang, Yujia; Yant, Roger; Yuan, Zihao; Zeng, Qifan; Zhou, Tao

2017-02-20

Advancing the production efficiency and profitability of aquaculture is dependent upon the ability to utilize a diverse array of genetic resources. The ultimate goals of aquaculture genomics, genetics and breeding research are to enhance aquaculture production efficiency, sustainability, product quality, and profitability in support of the commercial sector and for the benefit of consumers. In order to achieve these goals, it is important to understand the genomic structure and organization of aquaculture species, and their genomic and phenomic variations, as well as the genetic basis of traits and their interrelationships. In addition, it is also important to understand the mechanisms of regulation and evolutionary conservation at the levels of genome, transcriptome, proteome, epigenome, and systems biology. With genomic information and information between the genomes and phenomes, technologies for marker/causal mutation-assisted selection, genome selection, and genome editing can be developed for applications in aquaculture. A set of genomic tools and resources must be made available including reference genome sequences and their annotations (including coding and non-coding regulatory elements), genome-wide polymorphic markers, efficient genotyping platforms, high-density and high-resolution linkage maps, and transcriptome resources including non-coding transcripts. Genomic and genetic control of important performance and production traits, such as disease resistance, feed conversion efficiency, growth rate, processing yield, behaviour, reproductive characteristics, and tolerance to environmental stressors like low dissolved oxygen, high or low water temperature and salinity, must be understood. QTL need to be identified, validated across strains, lines and populations, and their mechanisms of control understood. Causal gene(s) need to be identified. Genetic and epigenetic regulation of important aquaculture traits need to be determined, and technologies for marker-assisted selection, causal gene/mutation-assisted selection, genome selection, and genome editing using CRISPR and other technologies must be developed, demonstrated with applicability, and application to aquaculture industries.Major progress has been made in aquaculture genomics for dozens of fish and shellfish species including the development of genetic linkage maps, physical maps, microarrays, single nucleotide polymorphism (SNP) arrays, transcriptome databases and various stages of genome reference sequences. This paper provides a general review of the current status, challenges and future research needs of aquaculture genomics, genetics, and breeding, with a focus on major aquaculture species in the United States: catfish, rainbow trout, Atlantic salmon, tilapia, striped bass, oysters, and shrimp. While the overall research priorities and the practical goals are similar across various aquaculture species, the current status in each species should dictate the next priority areas within the species. This paper is an output of the USDA Workshop for Aquaculture Genomics, Genetics, and Breeding held in late March 2016 in Auburn, Alabama, with participants from all parts of the United States.
MicroRNA-mediated networks underlie immune response regulation in papillary thyroid carcinoma

NASA Astrophysics Data System (ADS)

Huang, Chen-Tsung; Oyang, Yen-Jen; Huang, Hsuan-Cheng; Juan, Hsueh-Fen

2014-09-01

Papillary thyroid carcinoma (PTC) is a common endocrine malignancy with low death rate but increased incidence and recurrence in recent years. MicroRNAs (miRNAs) are small non-coding RNAs with diverse regulatory capacities in eukaryotes and have been frequently implied in human cancer. Despite current progress, however, a panoramic overview concerning miRNA regulatory networks in PTC is still lacking. Here, we analyzed the expression datasets of PTC from The Cancer Genome Atlas (TCGA) Data Portal and demonstrate for the first time that immune responses are significantly enriched and under specific regulation in the direct miRNA-target network among distinctive PTC variants to different extents. Additionally, considering the unconventional properties of miRNAs, we explore the protein-coding competing endogenous RNA (ceRNA) and the modulatory networks in PTC and unexpectedly disclose concerted regulation of immune responses from these networks. Interestingly, miRNAs from these conventional and unconventional networks share general similarities and differences but tend to be disparate as regulatory activities increase, coordinately tuning the immune responses that in part account for PTC tumor biology. Together, our systematic results uncover the intensive regulation of immune responses underlain by miRNA-mediated networks in PTC, opening up new avenues in the management of thyroid cancer.
Recurrent and functional regulatory mutations in breast cancer.

PubMed

Rheinbay, Esther; Parasuraman, Prasanna; Grimsby, Jonna; Tiao, Grace; Engreitz, Jesse M; Kim, Jaegil; Lawrence, Michael S; Taylor-Weiner, Amaro; Rodriguez-Cuevas, Sergio; Rosenberg, Mara; Hess, Julian; Stewart, Chip; Maruvka, Yosef E; Stojanov, Petar; Cortes, Maria L; Seepo, Sara; Cibulskis, Carrie; Tracy, Adam; Pugh, Trevor J; Lee, Jesse; Zheng, Zongli; Ellisen, Leif W; Iafrate, A John; Boehm, Jesse S; Gabriel, Stacey B; Meyerson, Matthew; Golub, Todd R; Baselga, Jose; Hidalgo-Miranda, Alfredo; Shioda, Toshi; Bernards, Andre; Lander, Eric S; Getz, Gad

2017-07-06

Genomic analysis of tumours has led to the identification of hundreds of cancer genes on the basis of the presence of mutations in protein-coding regions. By contrast, much less is known about cancer-causing mutations in non-coding regions. Here we perform deep sequencing in 360 primary breast cancers and develop computational methods to identify significantly mutated promoters. Clear signals are found in the promoters of three genes. FOXA1, a known driver of hormone-receptor positive breast cancer, harbours a mutational hotspot in its promoter leading to overexpression through increased E2F binding. RMRP and NEAT1, two non-coding RNA genes, carry mutations that affect protein binding to their promoters and alter expression levels. Our study shows that promoter regions harbour recurrent mutations in cancer with functional consequences and that the mutations occur at similar frequencies as in coding regions. Power analyses indicate that more such regions remain to be discovered through deep sequencing of adequately sized cohorts of patients.
Genome Maps, a new generation genome browser.

PubMed

Medina, Ignacio; Salavert, Francisco; Sanchez, Rubén; de Maria, Alejandro; Alonso, Roberto; Escobar, Pablo; Bleda, Marta; Dopazo, Joaquín

2013-07-01

Genome browsers have gained importance as more genomes and related genomic information become available. However, the increase of information brought about by new generation sequencing technologies is, at the same time, causing a subtle but continuous decrease in the efficiency of conventional genome browsers. Here, we present Genome Maps, a genome browser that implements an innovative model of data transfer and management. The program uses highly efficient technologies from the new HTML5 standard, such as scalable vector graphics, that optimize workloads at both server and client sides and ensure future scalability. Thus, data management and representation are entirely carried out by the browser, without the need of any Java Applet, Flash or other plug-in technology installation. Relevant biological data on genes, transcripts, exons, regulatory features, single-nucleotide polymorphisms, karyotype and so forth, are imported from web services and are available as tracks. In addition, several DAS servers are already included in Genome Maps. As a novelty, this web-based genome browser allows the local upload of huge genomic data files (e.g. VCF or BAM) that can be dynamically visualized in real time at the client side, thus facilitating the management of medical data affected by privacy restrictions. Finally, Genome Maps can easily be integrated in any web application by including only a few lines of code. Genome Maps is an open source collaborative initiative available in the GitHub repository (https://github.com/compbio-bigdata-viz/genome-maps). Genome Maps is available at: http://www.genomemaps.org.
Genome Maps, a new generation genome browser

PubMed Central

Medina, Ignacio; Salavert, Francisco; Sanchez, Rubén; de Maria, Alejandro; Alonso, Roberto; Escobar, Pablo; Bleda, Marta; Dopazo, Joaquín

2013-01-01

Genome browsers have gained importance as more genomes and related genomic information become available. However, the increase of information brought about by new generation sequencing technologies is, at the same time, causing a subtle but continuous decrease in the efficiency of conventional genome browsers. Here, we present Genome Maps, a genome browser that implements an innovative model of data transfer and management. The program uses highly efficient technologies from the new HTML5 standard, such as scalable vector graphics, that optimize workloads at both server and client sides and ensure future scalability. Thus, data management and representation are entirely carried out by the browser, without the need of any Java Applet, Flash or other plug-in technology installation. Relevant biological data on genes, transcripts, exons, regulatory features, single-nucleotide polymorphisms, karyotype and so forth, are imported from web services and are available as tracks. In addition, several DAS servers are already included in Genome Maps. As a novelty, this web-based genome browser allows the local upload of huge genomic data files (e.g. VCF or BAM) that can be dynamically visualized in real time at the client side, thus facilitating the management of medical data affected by privacy restrictions. Finally, Genome Maps can easily be integrated in any web application by including only a few lines of code. Genome Maps is an open source collaborative initiative available in the GitHub repository (https://github.com/compbio-bigdata-viz/genome-maps). Genome Maps is available at: http://www.genomemaps.org. PMID:23748955
The circulating non-coding RNA landscape for biomarker research: lessons and prospects from cardiovascular diseases.

PubMed

St Ecedil Pień, Ewa; Costa, Marina C; Kurc, Szczepan; Drożdż, Anna; Cortez-Dias, Nuno; Enguita, Francisco J

2018-06-07

Pervasive transcription of the human genome is responsible for the production of a myriad of non-coding RNA molecules (ncRNAs) some of them with regulatory functions. The pivotal role of ncRNAs in cardiovascular biology has been unveiled in the last decade, starting from the characterization of the involvement of micro-RNAs in cardiovascular development and function, and followed by the use of circulating ncRNAs as biomarkers of cardiovascular diseases. The human non-coding secretome is composed by several RNA species that circulate in body fluids and could be used as biomarkers for diagnosis and outcome prediction. In cardiovascular diseases, secreted ncRNAs have been described as biomarkers of several conditions including myocardial infarction, cardiac failure, and atrial fibrillation. Among circulating ncRNAs, micro-RNAs (miRNAs), long noncoding RNAs (lncRNAs) and circular RNAs (circRNAs) have been proposed as biomarkers in different cardiovascular diseases. In comparison with standard biomarkers, the biochemical nature of ncRNAs offers better stability and flexible storage conditions of the samples, and increased sensitivity and specificity. In this review we describe the current trends and future prospects of the use of the ncRNA secretome components as biomarkers of cardiovascular diseases, including the opening questions related with their secretion mechanisms and regulatory actions.
A new age in functional genomics using CRISPR/Cas9 in arrayed library screening.

PubMed

Agrotis, Alexander; Ketteler, Robin

2015-01-01

CRISPR technology has rapidly changed the face of biological research, such that precise genome editing has now become routine for many labs within several years of its initial development. What makes CRISPR/Cas9 so revolutionary is the ability to target a protein (Cas9) to an exact genomic locus, through designing a specific short complementary nucleotide sequence, that together with a common scaffold sequence, constitute the guide RNA bridging the protein and the DNA. Wild-type Cas9 cleaves both DNA strands at its target sequence, but this protein can also be modified to exert many other functions. For instance, by attaching an activation domain to catalytically inactive Cas9 and targeting a promoter region, it is possible to stimulate the expression of a specific endogenous gene. In principle, any genomic region can be targeted, and recent efforts have successfully generated pooled guide RNA libraries for coding and regulatory regions of human, mouse and Drosophila genomes with high coverage, thus facilitating functional phenotypic screening. In this review, we will highlight recent developments in the area of CRISPR-based functional genomics and discuss potential future directions, with a special focus on mammalian cell systems and arrayed library screening.
Intergenic disease-associated regions are abundant in novel transcripts.

PubMed

Bartonicek, N; Clark, M B; Quek, X C; Torpy, J R; Pritchard, A L; Maag, J L V; Gloss, B S; Crawford, J; Taft, R J; Hayward, N K; Montgomery, G W; Mattick, J S; Mercer, T R; Dinger, M E

2017-12-28

Genotyping of large populations through genome-wide association studies (GWAS) has successfully identified many genomic variants associated with traits or disease risk. Unexpectedly, a large proportion of GWAS single nucleotide polymorphisms (SNPs) and associated haplotype blocks are in intronic and intergenic regions, hindering their functional evaluation. While some of these risk-susceptibility regions encompass cis-regulatory sites, their transcriptional potential has never been systematically explored. To detect rare tissue-specific expression, we employed the transcript-enrichment method CaptureSeq on 21 human tissues to identify 1775 multi-exonic transcripts from 561 intronic and intergenic haploblocks associated with 392 traits and diseases, covering 73.9 Mb (2.2%) of the human genome. We show that a large proportion (85%) of disease-associated haploblocks express novel multi-exonic non-coding transcripts that are tissue-specific and enriched for GWAS SNPs as well as epigenetic markers of active transcription and enhancer activity. Similarly, we captured transcriptomes from 13 melanomas, targeting nine melanoma-associated haploblocks, and characterized 31 novel melanoma-specific transcripts that include fusion proteins, novel exons and non-coding RNAs, one-third of which showed allelically imbalanced expression. This resource of previously unreported transcripts in disease-associated regions ( http://gwas-captureseq.dingerlab.org ) should provide an important starting point for the translational community in search of novel biomarkers, disease mechanisms, and drug targets.
Complete plastid genome sequence of Daucus carota: Implications for biotechnology and phylogeny of angiosperms

PubMed Central

Ruhlman, Tracey; Lee, Seung-Bum; Jansen, Robert K; Hostetler, Jessica B; Tallon, Luke J; Town, Christopher D; Daniell, Henry

2006-01-01

Background Carrot (Daucus carota) is a major food crop in the US and worldwide. Its capacity for storage and its lifecycle as a biennial make it an attractive species for the introduction of foreign genes, especially for oral delivery of vaccines and other therapeutic proteins. Until recently efforts to express recombinant proteins in carrot have had limited success in terms of protein accumulation in the edible tap roots. Plastid genetic engineering offers the potential to overcome this limitation, as demonstrated by the accumulation of BADH in chromoplasts of carrot taproots to confer exceedingly high levels of salt resistance. The complete plastid genome of carrot provides essential information required for genetic engineering. Additionally, the sequence data add to the rapidly growing database of plastid genomes for assessing phylogenetic relationships among angiosperms. Results The complete carrot plastid genome is 155,911 bp in length, with 115 unique genes and 21 duplicated genes within the IR. There are four ribosomal RNAs, 30 distinct tRNA genes and 18 intron-containing genes. Repeat analysis reveals 12 direct and 2 inverted repeats ≥ 30 bp with a sequence identity ≥ 90%. Phylogenetic analysis of nucleotide sequences for 61 protein-coding genes using both maximum parsimony (MP) and maximum likelihood (ML) were performed for 29 angiosperms. Phylogenies from both methods provide strong support for the monophyly of several major angiosperm clades, including monocots, eudicots, rosids, asterids, eurosids II, euasterids I, and euasterids II. Conclusion The carrot plastid genome contains a number of dispersed direct and inverted repeats scattered throughout coding and non-coding regions. This is the first sequenced plastid genome of the family Apiaceae and only the second published genome sequence of the species-rich euasterid II clade. Both MP and ML trees provide very strong support (100% bootstrap) for the sister relationship of Daucus with Panax in the euasterid II clade. These results provide the best taxon sampling of complete chloroplast genomes and the strongest support yet for the sister relationship of Caryophyllales to the asterids. The availability of the complete plastid genome sequence should facilitate improved transformation efficiency and foreign gene expression in carrot through utilization of endogenous flanking sequences and regulatory elements. PMID:16945140
Genome-wide identification and characterization of putative lncRNAs in the diamondback moth, Plutella xylostella (L.).

PubMed

Wang, Yue; Xu, Tingting; He, Weiyi; Shen, Xiujing; Zhao, Qian; Bai, Jianlin; You, Minsheng

2018-01-01

Long non-coding RNAs (lncRNAs) are of particular interest because of their contributions to many biological processes. Here, we present the genome-wide identification and characterization of putative lncRNAs in a global insect pest, Plutella xylostella. A total of 8096 lncRNAs were identified and classified into three groups. The average length of exons in lncRNAs was longer than that in coding genes and the GC content was lower than that in mRNAs. Most lncRNAs were flanked by canonical splice sites, similar to mRNAs. Expression profiling identified 114 differentially expressed lncRNAs during the DBM development and found that majority were temporally specific. While the biological functions of lncRNAs remain uncharacterized, many are microRNA precursors or competing endogenous RNAs involved in micro-RNA regulatory pathways. This work provides a valuable resource for further studies on molecular bases for development of DBM and lay the foundation for discovery of lncRNA functions in P. xylostella. Copyright © 2017 Elsevier Inc. All rights reserved.
Generating and repairing genetically programmed DNA breaks during immunoglobulin class switch recombination

PubMed Central

Nicolas, Laura; Cols, Montserrat; Choi, Jee Eun; Chaudhuri, Jayanta; Vuong, Bao

2018-01-01

Adaptive immune responses require the generation of a diverse repertoire of immunoglobulins (Igs) that can recognize and neutralize a seemingly infinite number of antigens. V(D)J recombination creates the primary Ig repertoire, which subsequently is modified by somatic hypermutation (SHM) and class switch recombination (CSR). SHM promotes Ig affinity maturation whereas CSR alters the effector function of the Ig. Both SHM and CSR require activation-induced cytidine deaminase (AID) to produce dU:dG mismatches in the Ig locus that are transformed into untemplated mutations in variable coding segments during SHM or DNA double-strand breaks (DSBs) in switch regions during CSR. Within the Ig locus, DNA repair pathways are diverted from their canonical role in maintaining genomic integrity to permit AID-directed mutation and deletion of gene coding segments. Recently identified proteins, genes, and regulatory networks have provided new insights into the temporally and spatially coordinated molecular interactions that control the formation and repair of DSBs within the Ig locus. Unravelling the genetic program that allows B cells to selectively alter the Ig coding regions while protecting non-Ig genes from DNA damage advances our understanding of the molecular processes that maintain genomic integrity as well as humoral immunity. PMID:29744038
Identification of differentially expressed small non-coding RNAs in the legume endosymbiont Sinorhizobium meliloti by comparative genomics

PubMed Central

del Val, Coral; Rivas, Elena; Torres-Quesada, Omar; Toro, Nicolás; Jiménez-Zurdo, José I

2007-01-01

Bacterial small non-coding RNAs (sRNAs) are being recognized as novel widespread regulators of gene expression in response to environmental signals. Here, we present the first search for sRNA-encoding genes in the nitrogen-fixing endosymbiont Sinorhizobium meliloti, performed by a genome-wide computational analysis of its intergenic regions. Comparative sequence data from eight related α-proteobacteria were obtained, and the interspecies pairwise alignments were scored with the programs eQRNA and RNAz as complementary predictive tools to identify conserved and stable secondary structures corresponding to putative non-coding RNAs. Northern experiments confirmed that eight of the predicted loci, selected among the original 32 candidates as most probable sRNA genes, expressed small transcripts. This result supports the combined use of eQRNA and RNAz as a robust strategy to identify novel sRNAs in bacteria. Furthermore, seven of the transcripts accumulated differentially in free-living and symbiotic conditions. Experimental mapping of the 5′-ends of the detected transcripts revealed that their encoding genes are organized in autonomous transcription units with recognizable promoter and, in most cases, termination signatures. These findings suggest novel regulatory functions for sRNAs related to the interactions of α-proteobacteria with their eukaryotic hosts. PMID:17971083
Towards Breaking the Histone Code – Bayesian Graphical Models for Histone Modifications

PubMed Central

Mitra, Riten; Müller, Peter; Liang, Shoudan; Xu, Yanxun; Ji, Yuan

2013-01-01

Background Histones are proteins that wrap DNA around in small spherical structures called nucleosomes. Histone modifications (HMs) refer to the post-translational modifications to the histone tails. At a particular genomic locus, each of these HMs can either be present or absent, and the combinatory patterns of the presence or absence of multiple HMs, or the ‘histone codes,’ are believed to co-regulate important biological processes. We aim to use raw data on HM markers at different genomic loci to (1) decode the complex biological network of HMs in a single region and (2) demonstrate how the HM networks differ in different regulatory regions. We suggest that these differences in network attributes form a significant link between histones and genomic functions. Methods and Results We develop a powerful graphical model under Bayesian paradigm. Posterior inference is fully probabilistic, allowing us to compute the probabilities of distinct dependence patterns of the HMs using graphs. Furthermore, our model-based framework allows for easy but important extensions for inference on differential networks under various conditions, such as the different annotations of the genomic locations (e.g., promoters versus insulators). We applied these models to ChIP-Seq data based on CD4+ T lymphocytes. The results confirmed many existing findings and provided a unified tool to generate various promising hypotheses. Differential network analyses revealed new insights on co-regulation of HMs of transcriptional activities in different genomic regions. Conclusions The use of Bayesian graphical models and borrowing strength across different conditions provide high power to infer histone networks and their differences. PMID:23748248

RegPrecise 3.0--a resource for genome-scale exploration of transcriptional regulation in bacteria.

PubMed

Novichkov, Pavel S; Kazakov, Alexey E; Ravcheev, Dmitry A; Leyn, Semen A; Kovaleva, Galina Y; Sutormin, Roman A; Kazanov, Marat D; Riehl, William; Arkin, Adam P; Dubchak, Inna; Rodionov, Dmitry A

2013-11-01

Genome-scale prediction of gene regulation and reconstruction of transcriptional regulatory networks in prokaryotes is one of the critical tasks of modern genomics. Bacteria from different taxonomic groups, whose lifestyles and natural environments are substantially different, possess highly diverged transcriptional regulatory networks. The comparative genomics approaches are useful for in silico reconstruction of bacterial regulons and networks operated by both transcription factors (TFs) and RNA regulatory elements (riboswitches). RegPrecise (http://regprecise.lbl.gov) is a web resource for collection, visualization and analysis of transcriptional regulons reconstructed by comparative genomics. We significantly expanded a reference collection of manually curated regulons we introduced earlier. RegPrecise 3.0 provides access to inferred regulatory interactions organized by phylogenetic, structural and functional properties. Taxonomy-specific collections include 781 TF regulogs inferred in more than 160 genomes representing 14 taxonomic groups of Bacteria. TF-specific collections include regulogs for a selected subset of 40 TFs reconstructed across more than 30 taxonomic lineages. Novel collections of regulons operated by RNA regulatory elements (riboswitches) include near 400 regulogs inferred in 24 bacterial lineages. RegPrecise 3.0 provides four classifications of the reference regulons implemented as controlled vocabularies: 55 TF protein families; 43 RNA motif families; ~150 biological processes or metabolic pathways; and ~200 effectors or environmental signals. Genome-wide visualization of regulatory networks and metabolic pathways covered by the reference regulons are available for all studied genomes. A separate section of RegPrecise 3.0 contains draft regulatory networks in 640 genomes obtained by an conservative propagation of the reference regulons to closely related genomes. RegPrecise 3.0 gives access to the transcriptional regulons reconstructed in bacterial genomes. Analytical capabilities include exploration of: regulon content, structure and function; TF binding site motifs; conservation and variations in genome-wide regulatory networks across all taxonomic groups of Bacteria. RegPrecise 3.0 was selected as a core resource on transcriptional regulation of the Department of Energy Systems Biology Knowledgebase, an emerging software and data environment designed to enable researchers to collaboratively generate, test and share new hypotheses about gene and protein functions, perform large-scale analyses, and model interactions in microbes, plants, and their communities.
Genomic Editing of Non-Coding RNA Genes with CRISPR/Cas9 Ushers in a Potential Novel Approach to Study and Treat Schizophrenia

PubMed Central

Zhuo, Chuanjun; Hou, Weihong; Hu, Lirong; Lin, Chongguang; Chen, Ce; Lin, Xiaodong

2017-01-01

Schizophrenia is a genetically related mental illness, in which the majority of genetic alterations occur in the non-coding regions of the human genome. In the past decade, a growing number of regulatory non-coding RNAs (ncRNAs) including microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) have been identified to be strongly associated with schizophrenia. However, the studies of these ncRNAs in the pathophysiology of schizophrenia and the reverting of their genetic defects in restoration of the normal phenotype have been hampered by insufficient technology to manipulate these ncRNA genes effectively as well as a lack of appropriate animal models. Most recently, a revolutionary gene editing technology known as Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated nuclease 9 (Cas9; CRISPR/Cas9) has been developed that enable researchers to overcome these challenges. In this review article, we mainly focus on the schizophrenia-related ncRNAs and the use of CRISPR/Cas9-mediated editing on the non-coding regions of the genomic DNA in proving causal relationship between the genetic defects and the pathophysiology of schizophrenia. We subsequently discuss the potential of translating this advanced technology into a clinical therapy for schizophrenia, although the CRISPR/Cas9 technology is currently still in its infancy and immature to put into use in the treatment of diseases. Furthermore, we suggest strategies to accelerate the pace from the bench to the bedside. This review describes the application of the powerful and feasible CRISPR/Cas9 technology to manipulate schizophrenia-associated ncRNA genes. This technology could help researchers tackle this complex health problem and perhaps other genetically related mental disorders due to the overlapping genetic alterations of schizophrenia with other mental illnesses. PMID:28217082
Genomics in the land of regulatory science.

PubMed

Tong, Weida; Ostroff, Stephen; Blais, Burton; Silva, Primal; Dubuc, Martine; Healy, Marion; Slikker, William

2015-06-01

Genomics science has played a major role in the generation of new knowledge in the basic research arena, and currently question arises as to its potential to support regulatory processes. However, the integration of genomics in the regulatory decision-making process requires rigorous assessment and would benefit from consensus amongst international partners and research communities. To that end, the Global Coalition for Regulatory Science Research (GCRSR) hosted the fourth Global Summit on Regulatory Science (GSRS2014) to discuss the role of genomics in regulatory decision making, with a specific emphasis on applications in food safety and medical product development. Challenges and issues were discussed in the context of developing an international consensus for objective criteria in the analysis, interpretation and reporting of genomics data with an emphasis on transparency, traceability and "fitness for purpose" for the intended application. It was recognized that there is a need for a global path in the establishment of a regulatory bioinformatics framework for the development of transparent, reliable, reproducible and auditable processes in the management of food and medical product safety risks. It was also recognized that training is an important mechanism in achieving internationally consistent outcomes. GSRS2014 provided an effective venue for regulators andresearchers to meet, discuss common issues, and develop collaborations to address the challenges posed by the application of genomics to regulatory science, with the ultimate goal of wisely integrating novel technical innovations into regulatory decision-making. Published by Elsevier Inc.
The genome sequence of Sea-Island cotton (Gossypium barbadense) provides insights into the allopolyploidization and development of superior spinnable fibres

PubMed Central

Yuan, Daojun; Tang, Zhonghui; Wang, Maojun; Gao, Wenhui; Tu, Lili; Jin, Xin; Chen, Lingling; He, Yonghui; Zhang, Lin; Zhu, Longfu; Li, Yang; Liang, Qiqi; Lin, Zhongxu; Yang, Xiyan; Liu, Nian; Jin, Shuangxia; Lei, Yang; Ding, Yuanhao; Li, Guoliang; Ruan, Xiaoan; Ruan, Yijun; Zhang, Xianlong

2015-01-01

Gossypium hirsutum contributes the most production of cotton fibre, but G. barbadense is valued for its better comprehensive resistance and superior fibre properties. However, the allotetraploid genome of G. barbadense has not been comprehensively analysed. Here we present a high-quality assembly of the 2.57 gigabase genome of G. barbadense, including 80,876 protein-coding genes. The double-sized genome of the A (or At) (1.50 Gb) against D (or Dt) (853 Mb) primarily resulted from the expansion of Gypsy elements, including Peabody and Retrosat2 subclades in the Del clade, and the Athila subclade in the Athila/Tat clade. Substantial gene expansion and contraction were observed and rich homoeologous gene pairs with biased expression patterns were identified, suggesting abundant gene sub-functionalization occurred by allopolyploidization. More specifically, the CesA gene family has adapted differentially temporal expression patterns, suggesting an integrated regulatory mechanism of CesA genes from At and Dt subgenomes for the primary and secondary cellulose biosynthesis of cotton fibre in a “relay race”-like fashion. We anticipate that the G. barbadense genome sequence will advance our understanding the mechanism of genome polyploidization and underpin genome-wide comparison research in this genus. PMID:26634818
Origins of De Novo Genes in Human and Chimpanzee.

PubMed

Ruiz-Orera, Jorge; Hernandez-Rodriguez, Jessica; Chiva, Cristina; Sabidó, Eduard; Kondova, Ivanela; Bontrop, Ronald; Marqués-Bonet, Tomàs; Albà, M Mar

2015-12-01

The birth of new genes is an important motor of evolutionary innovation. Whereas many new genes arise by gene duplication, others originate at genomic regions that did not contain any genes or gene copies. Some of these newly expressed genes may acquire coding or non-coding functions and be preserved by natural selection. However, it is yet unclear which is the prevalence and underlying mechanisms of de novo gene emergence. In order to obtain a comprehensive view of this process, we have performed in-depth sequencing of the transcriptomes of four mammalian species--human, chimpanzee, macaque, and mouse--and subsequently compared the assembled transcripts and the corresponding syntenic genomic regions. This has resulted in the identification of over five thousand new multiexonic transcriptional events in human and/or chimpanzee that are not observed in the rest of species. Using comparative genomics, we show that the expression of these transcripts is associated with the gain of regulatory motifs upstream of the transcription start site (TSS) and of U1 snRNP sites downstream of the TSS. In general, these transcripts show little evidence of purifying selection, suggesting that many of them are not functional. However, we find signatures of selection in a subset of de novo genes which have evidence of protein translation. Taken together, the data support a model in which frequently-occurring new transcriptional events in the genome provide the raw material for the evolution of new proteins.
Origins of De Novo Genes in Human and Chimpanzee

PubMed Central

Ruiz-Orera, Jorge; Hernandez-Rodriguez, Jessica; Chiva, Cristina; Sabidó, Eduard; Kondova, Ivanela; Bontrop, Ronald; Marqués-Bonet, Tomàs; Albà, M.Mar

2015-01-01

The birth of new genes is an important motor of evolutionary innovation. Whereas many new genes arise by gene duplication, others originate at genomic regions that did not contain any genes or gene copies. Some of these newly expressed genes may acquire coding or non-coding functions and be preserved by natural selection. However, it is yet unclear which is the prevalence and underlying mechanisms of de novo gene emergence. In order to obtain a comprehensive view of this process, we have performed in-depth sequencing of the transcriptomes of four mammalian species—human, chimpanzee, macaque, and mouse—and subsequently compared the assembled transcripts and the corresponding syntenic genomic regions. This has resulted in the identification of over five thousand new multiexonic transcriptional events in human and/or chimpanzee that are not observed in the rest of species. Using comparative genomics, we show that the expression of these transcripts is associated with the gain of regulatory motifs upstream of the transcription start site (TSS) and of U1 snRNP sites downstream of the TSS. In general, these transcripts show little evidence of purifying selection, suggesting that many of them are not functional. However, we find signatures of selection in a subset of de novo genes which have evidence of protein translation. Taken together, the data support a model in which frequently-occurring new transcriptional events in the genome provide the raw material for the evolution of new proteins. PMID:26720152
A survey of single nucleotide polymorphisms identified from whole-genome sequencing and their functional effect in the porcine genome.

PubMed

Keel, B N; Nonneman, D J; Rohrer, G A

2017-08-01

Genetic variants detected from sequence have been used to successfully identify causal variants and map complex traits in several organisms. High and moderate impact variants, those expected to alter or disrupt the protein coded by a gene and those that regulate protein production, likely have a more significant effect on phenotypic variation than do other types of genetic variants. Hence, a comprehensive list of these functional variants would be of considerable interest in swine genomic studies, particularly those targeting fertility and production traits. Whole-genome sequence was obtained from 72 of the founders of an intensely phenotyped experimental swine herd at the U.S. Meat Animal Research Center (USMARC). These animals included all 24 of the founding boars (12 Duroc and 12 Landrace) and 48 Yorkshire-Landrace composite sows. Sequence reads were mapped to the Sscrofa10.2 genome build, resulting in a mean of 6.1 fold (×) coverage per genome. A total of 22 342 915 high confidence SNPs were identified from the sequenced genomes. These included 21 million previously reported SNPs and 79% of the 62 163 SNPs on the PorcineSNP60 BeadChip assay. Variation was detected in the coding sequence or untranslated regions (UTRs) of 87.8% of the genes in the porcine genome: loss-of-function variants were predicted in 504 genes, 10 202 genes contained nonsynonymous variants, 10 773 had variation in UTRs and 13 010 genes contained synonymous variants. Approximately 139 000 SNPs were classified as loss-of-function, nonsynonymous or regulatory, which suggests that over 99% of the variation detected in our pigs could potentially be ignored, allowing us to focus on a much smaller number of functional SNPs during future analyses. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
Genome-Wide Profiling of p63 DNA–Binding Sites Identifies an Element that Regulates Gene Expression during Limb Development in the 7q21 SHFM1 Locus

PubMed Central

Oti, Martin; Dutilh, Bas E.; Alonso, M. Eva; de la Calle-Mustienes, Elisa; Smeenk, Leonie; Rinne, Tuula; Parsaulian, Lilian; Bolat, Emine; Jurgelenaite, Rasa; Huynen, Martijn A.; Hoischen, Alexander; Veltman, Joris A.; Brunner, Han G.; Roscioli, Tony; Oates, Emily; Wilson, Meredith; Manzanares, Miguel; Gómez-Skarmeta, José Luis; Stunnenberg, Hendrik G.; Lohrum, Marion; van Bokhoven, Hans; Zhou, Huiqing

2010-01-01

Heterozygous mutations in p63 are associated with split hand/foot malformations (SHFM), orofacial clefting, and ectodermal abnormalities. Elucidation of the p63 gene network that includes target genes and regulatory elements may reveal new genes for other malformation disorders. We performed genome-wide DNA–binding profiling by chromatin immunoprecipitation (ChIP), followed by deep sequencing (ChIP–seq) in primary human keratinocytes, and identified potential target genes and regulatory elements controlled by p63. We show that p63 binds to an enhancer element in the SHFM1 locus on chromosome 7q and that this element controls expression of DLX6 and possibly DLX5, both of which are important for limb development. A unique micro-deletion including this enhancer element, but not the DLX5/DLX6 genes, was identified in a patient with SHFM. Our study strongly indicates disruption of a non-coding cis-regulatory element located more than 250 kb from the DLX5/DLX6 genes as a novel disease mechanism in SHFM1. These data provide a proof-of-concept that the catalogue of p63 binding sites identified in this study may be of relevance to the studies of SHFM and other congenital malformations that resemble the p63-associated phenotypes. PMID:20808887
The standard operating procedure of the DOE-JGI Metagenome Annotation Pipeline (MAP v.4)

DOE PAGES

Huntemann, Marcel; Ivanova, Natalia N.; Mavromatis, Konstantinos; ...

2016-02-24

The DOE-JGI Metagenome Annotation Pipeline (MAP v.4) performs structural and functional annotation for metagenomic sequences that are submitted to the Integrated Microbial Genomes with Microbiomes (IMG/M) system for comparative analysis. The pipeline runs on nucleotide sequences provide d via the IMG submission site. Users must first define their analysis projects in GOLD and then submit the associated sequence datasets consisting of scaffolds/contigs with optional coverage information and/or unassembled reads in fasta and fastq file formats. The MAP processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNAs, as well as CRISPR elements. Structural annotation ismore » followed by functional annotation including assignment of protein product names and connection to various protein family databases.« less
The standard operating procedure of the DOE-JGI Metagenome Annotation Pipeline (MAP v.4)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Huntemann, Marcel; Ivanova, Natalia N.; Mavromatis, Konstantinos

The DOE-JGI Metagenome Annotation Pipeline (MAP v.4) performs structural and functional annotation for metagenomic sequences that are submitted to the Integrated Microbial Genomes with Microbiomes (IMG/M) system for comparative analysis. The pipeline runs on nucleotide sequences provide d via the IMG submission site. Users must first define their analysis projects in GOLD and then submit the associated sequence datasets consisting of scaffolds/contigs with optional coverage information and/or unassembled reads in fasta and fastq file formats. The MAP processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNAs, as well as CRISPR elements. Structural annotation ismore » followed by functional annotation including assignment of protein product names and connection to various protein family databases.« less
NONCODE v2.0: decoding the non-coding.

PubMed

He, Shunmin; Liu, Changning; Skogerbø, Geir; Zhao, Haitao; Wang, Jie; Liu, Tao; Bai, Baoyan; Zhao, Yi; Chen, Runsheng

2008-01-01

The NONCODE database is an integrated knowledge database designed for the analysis of non-coding RNAs (ncRNAs). Since NONCODE was first released 3 years ago, the number of known ncRNAs has grown rapidly, and there is growing recognition that ncRNAs play important regulatory roles in most organisms. In the updated version of NONCODE (NONCODE v2.0), the number of collected ncRNAs has reached 206 226, including a wide range of microRNAs, Piwi-interacting RNAs and mRNA-like ncRNAs. The improvements brought to the database include not only new and updated ncRNA data sets, but also an incorporation of BLAST alignment search service and access through our custom UCSC Genome Browser. NONCODE can be found under http://www.noncode.org or http://noncode.bioinfo.org.cn.
Phylum-Level Conservation of Regulatory Information in Nematodes despite Extensive Non-coding Sequence Divergence

PubMed Central

Gordon, Kacy L.; Arthur, Robert K.; Ruvinsky, Ilya

2015-01-01

Gene regulatory information guides development and shapes the course of evolution. To test conservation of gene regulation within the phylum Nematoda, we compared the functions of putative cis-regulatory sequences of four sets of orthologs (unc-47, unc-25, mec-3 and elt-2) from distantly-related nematode species. These species, Caenorhabditis elegans, its congeneric C. briggsae, and three parasitic species Meloidogyne hapla, Brugia malayi, and Trichinella spiralis, represent four of the five major clades in the phylum Nematoda. Despite the great phylogenetic distances sampled and the extensive sequence divergence of nematode genomes, all but one of the regulatory elements we tested are able to drive at least a subset of the expected gene expression patterns. We show that functionally conserved cis-regulatory elements have no more extended sequence similarity to their C. elegans orthologs than would be expected by chance, but they do harbor motifs that are important for proper expression of the C. elegans genes. These motifs are too short to be distinguished from the background level of sequence similarity, and while identical in sequence they are not conserved in orientation or position. Functional tests reveal that some of these motifs contribute to proper expression. Our results suggest that conserved regulatory circuitry can persist despite considerable turnover within cis elements. PMID:26020930
Genome-wide profiling of the PIWI-interacting RNA-mRNA regulatory networks in epithelial ovarian cancers.

PubMed

Singh, Garima; Roy, Jyoti; Rout, Pratiti; Mallick, Bibekanand

2018-01-01

PIWI-interacting (piRNAs), ~23-36 nucleotide-long small non-coding RNAs (sncRNAs), earlier believed to be germline-specific, have now been identified in somatic cells, including cancer cells. These sncRNAs impact critical biological processes by fine-tuning gene expression at post-transcriptional and epigenetic levels. The expression of piRNAs in ovarian cancer, the most lethal gynecologic cancer is largely uncharted. In this study, we investigated the expression of PIWILs by qRT-PCR and western blotting and then identified piRNA transcriptomes in tissues of normal ovary and two most prevalent epithelial ovarian cancer subtypes, serous and endometrioid by small RNA sequencing. We detected 219, 256 and 234 piRNAs in normal ovary, endometrioid and serous ovarian cancer samples respectively. We observed piRNAs are encoded from various genomic regions, among which introns harbor the majority of them. Surprisingly, piRNAs originated from different genomic contexts showed the varied level of conservations across vertebrates. The functional analysis of predicted targets of differentially expressed piRNAs revealed these could modulate key processes and pathways involved in ovarian oncogenesis. Our study provides the first comprehensive piRNA landscape in these samples and a useful resource for further functional studies to decipher new mechanistic views of piRNA-mediated gene regulatory networks affecting ovarian oncogenesis. The RNA-seq data is submitted to GEO database (GSE83794).
Analysis of the African coelacanth genome sheds light on tetrapod evolution

PubMed Central

Amemiya, Chris T.; Alföldi, Jessica; Lee, Alison P.; Fan, Shaohua; Philippe, Hervé; MacCallum, Iain; Braasch, Ingo; Manousaki, Tereza; Schneider, Igor; Rohner, Nicolas; Organ, Chris; Chalopin, Domitille; Smith, Jeramiah J.; Robinson, Mark; Dorrington, Rosemary A.; Gerdol, Marco; Aken, Bronwen; Biscotti, Maria Assunta; Barucca, Marco; Baurain, Denis; Berlin, Aaron M.; Blatch, Gregory L.; Buonocore, Francesco; Burmester, Thorsten; Campbell, Michael S.; Canapa, Adriana; Cannon, John P.; Christoffels, Alan; De Moro, Gianluca; Edkins, Adrienne L.; Fan, Lin; Fausto, Anna Maria; Feiner, Nathalie; Forconi, Mariko; Gamieldien, Junaid; Gnerre, Sante; Gnirke, Andreas; Goldstone, Jared V.; Haerty, Wilfried; Hahn, Mark E.; Hesse, Uljana; Hoffmann, Steve; Johnson, Jeremy; Karchner, Sibel I.; Kuraku, Shigehiro; Lara, Marcia; Levin, Joshua Z.; Litman, Gary W.; Mauceli, Evan; Miyake, Tsutomu; Mueller, M. Gail; Nelson, David R.; Nitsche, Anne; Olmo, Ettore; Ota, Tatsuya; Pallavicini, Alberto; Panji, Sumir; Picone, Barbara; Ponting, Chris P.; Prohaska, Sonja J.; Przybylski, Dariusz; Saha, Nil Ratan; Ravi, Vydianathan; Ribeiro, Filipe J.; Sauka-Spengler, Tatjana; Scapigliati, Giuseppe; Searle, Stephen M. J.; Sharpe, Ted; Simakov, Oleg; Stadler, Peter F.; Stegeman, John J.; Sumiyama, Kenta; Tabbaa, Diana; Tafer, Hakim; Turner-Maier, Jason; van Heusden, Peter; White, Simon; Williams, Louise; Yandell, Mark; Brinkmann, Henner; Volff, Jean-Nicolas; Tabin, Clifford J.; Shubin, Neil; Schartl, Manfred; Jaffe, David; Postlethwait, John H.; Venkatesh, Byrappa; Di Palma, Federica; Lander, Eric S.; Meyer, Axel; Lindblad-Toh, Kerstin

2013-01-01

It was a zoological sensation when a living specimen of the coelacanth was first discovered in 1938, as this lineage of lobe-finned fish was thought to have gone extinct 70 million years ago. The modern coelacanth looks remarkably similar to many of its ancient relatives, and its evolutionary proximity to our own fish ancestors provides a glimpse of the fish that first walked on land. Here we report the genome sequence of the African coelacanth, Latimeria chalumnae. Through a phylogenomic analysis, we conclude that the lungfish, and not the coelacanth, is the closest living relative of tetrapods. Coelacanth protein-coding genes are significantly more slowly evolving than those of tetrapods, unlike other genomic features . Analyses of changes in genes and regulatory elements during the vertebrate adaptation to land highlight genes involved in immunity, nitrogen excretion and the development of fins, tail, ear, eye, brain, and olfaction. Functional assays of enhancers involved in the fin-to-limb transition and in the emergence of extra-embryonic tissues demonstrate the importance of the coelacanth genome as a blueprint for understanding tetrapod evolution. PMID:23598338
Refined mapping of autoimmune disease associated genetic variants with gene expression suggests an important role for non-coding RNAs.

PubMed

Ricaño-Ponce, Isis; Zhernakova, Daria V; Deelen, Patrick; Luo, Oscar; Li, Xingwang; Isaacs, Aaron; Karjalainen, Juha; Di Tommaso, Jennifer; Borek, Zuzanna Agnieszka; Zorro, Maria M; Gutierrez-Achury, Javier; Uitterlinden, Andre G; Hofman, Albert; van Meurs, Joyce; Netea, Mihai G; Jonkers, Iris H; Withoff, Sebo; van Duijn, Cornelia M; Li, Yang; Ruan, Yijun; Franke, Lude; Wijmenga, Cisca; Kumar, Vinod

2016-04-01

Genome-wide association and fine-mapping studies in 14 autoimmune diseases (AID) have implicated more than 250 loci in one or more of these diseases. As more than 90% of AID-associated SNPs are intergenic or intronic, pinpointing the causal genes is challenging. We performed a systematic analysis to link 460 SNPs that are associated with 14 AID to causal genes using transcriptomic data from 629 blood samples. We were able to link 71 (39%) of the AID-SNPs to two or more nearby genes, providing evidence that for part of the AID loci multiple causal genes exist. While 54 of the AID loci are shared by one or more AID, 17% of them do not share candidate causal genes. In addition to finding novel genes such as ULK3, we also implicate novel disease mechanisms and pathways like autophagy in celiac disease pathogenesis. Furthermore, 42 of the AID SNPs specifically affected the expression of 53 non-coding RNA genes. To further understand how the non-coding genome contributes to AID, the SNPs were linked to functional regulatory elements, which suggest a model where AID genes are regulated by network of chromatin looping/non-coding RNAs interactions. The looping model also explains how a causal candidate gene is not necessarily the gene closest to the AID SNP, which was the case in nearly 50% of cases. Copyright © 2016 The Authors. Published by Elsevier Ltd.. All rights reserved.
Modeling the evolution of regulatory elements by simultaneous detection and alignment with phylogenetic pair HMMs.

PubMed

Majoros, William H; Ohler, Uwe

2010-12-16

The computational detection of regulatory elements in DNA is a difficult but important problem impacting our progress in understanding the complex nature of eukaryotic gene regulation. Attempts to utilize cross-species conservation for this task have been hampered both by evolutionary changes of functional sites and poor performance of general-purpose alignment programs when applied to non-coding sequence. We describe a new and flexible framework for modeling binding site evolution in multiple related genomes, based on phylogenetic pair hidden Markov models which explicitly model the gain and loss of binding sites along a phylogeny. We demonstrate the value of this framework for both the alignment of regulatory regions and the inference of precise binding-site locations within those regions. As the underlying formalism is a stochastic, generative model, it can also be used to simulate the evolution of regulatory elements. Our implementation is scalable in terms of numbers of species and sequence lengths and can produce alignments and binding-site predictions with accuracy rivaling or exceeding current systems that specialize in only alignment or only binding-site prediction. We demonstrate the validity and power of various model components on extensive simulations of realistic sequence data and apply a specific model to study Drosophila enhancers in as many as ten related genomes and in the presence of gain and loss of binding sites. Different models and modeling assumptions can be easily specified, thus providing an invaluable tool for the exploration of biological hypotheses that can drive improvements in our understanding of the mechanisms and evolution of gene regulation.
HOXA1 and TALE proteins display cross-regulatory interactions and form a combinatorial binding code on HOXA1 targets

PubMed Central

De Kumar, Bony; Parker, Hugo J.; Paulson, Ariel; Parrish, Mark E.; Pushel, Irina; Singh, Narendra Pratap; Zhang, Ying; Slaughter, Brian D.; Unruh, Jay R.; Florens, Laurence; Zeitlinger, Julia; Krumlauf, Robb

2017-01-01

Hoxa1 has diverse functional roles in differentiation and development. We identify and characterize properties of regions bound by HOXA1 on a genome-wide basis in differentiating mouse ES cells. HOXA1-bound regions are enriched for clusters of consensus binding motifs for HOX, PBX, and MEIS, and many display co-occupancy of PBX and MEIS. PBX and MEIS are members of the TALE family and genome-wide analysis of multiple TALE members (PBX, MEIS, TGIF, PREP1, and PREP2) shows that nearly all HOXA1 targets display occupancy of one or more TALE members. The combinatorial binding patterns of TALE proteins define distinct classes of HOXA1 targets, which may create functional diversity. Transgenic reporter assays in zebrafish confirm enhancer activities for many HOXA1-bound regions and the importance of HOX-PBX and TGIF motifs for their regulation. Proteomic analyses show that HOXA1 physically interacts on chromatin with PBX, MEIS, and PREP family members, but not with TGIF, suggesting that TGIF may have an independent input into HOXA1-bound regions. Therefore, TALE proteins appear to represent a wide repertoire of HOX cofactors, which may coregulate enhancers through distinct mechanisms. We also discover extensive auto- and cross-regulatory interactions among the Hoxa1 and TALE genes, indicating that the specificity of HOXA1 during development may be regulated though a complex cross-regulatory network of HOXA1 and TALE proteins. This study provides new insight into a regulatory network involving combinatorial interactions between HOXA1 and TALE proteins. PMID:28784834
Improved accuracy of supervised CRM discovery with interpolated Markov models and cross-species comparison

PubMed Central

Kazemian, Majid; Zhu, Qiyun; Halfon, Marc S.; Sinha, Saurabh

2011-01-01

Despite recent advances in experimental approaches for identifying transcriptional cis-regulatory modules (CRMs, ‘enhancers’), direct empirical discovery of CRMs for all genes in all cell types and environmental conditions is likely to remain an elusive goal. Effective methods for computational CRM discovery are thus a critically needed complement to empirical approaches. However, existing computational methods that search for clusters of putative binding sites are ineffective if the relevant TFs and/or their binding specificities are unknown. Here, we provide a significantly improved method for ‘motif-blind’ CRM discovery that does not depend on knowledge or accurate prediction of TF-binding motifs and is effective when limited knowledge of functional CRMs is available to ‘supervise’ the search. We propose a new statistical method, based on ‘Interpolated Markov Models’, for motif-blind, genome-wide CRM discovery. It captures the statistical profile of variable length words in known CRMs of a regulatory network and finds candidate CRMs that match this profile. The method also uses orthologs of the known CRMs from closely related genomes. We perform in silico evaluation of predicted CRMs by assessing whether their neighboring genes are enriched for the expected expression patterns. This assessment uses a novel statistical test that extends the widely used Hypergeometric test of gene set enrichment to account for variability in intergenic lengths. We find that the new CRM prediction method is superior to existing methods. Finally, we experimentally validate 12 new CRM predictions by examining their regulatory activity in vivo in Drosophila; 10 of the tested CRMs were found to be functional, while 6 of the top 7 predictions showed the expected activity patterns. We make our program available as downloadable source code, and as a plugin for a genome browser installed on our servers. PMID:21821659
50 years of DNA ‘Breathing’: Reflections on Old and New Approaches

PubMed Central

von Hippel, Peter H.; Johnson, Neil P.; Marcus, Andrew H.

2015-01-01

Summary The coding sequences for genes, and much other regulatory information involved in genome expression, are located ‘inside’ the DNA duplex. Thus the ‘macromolecular machines’ that read-out this information from the base sequence of the DNA must somehow access the DNA ‘interior’. Double-stranded (ds) DNA is a highly structured and cooperatively stabilized system at physiological temperatures, but is also only marginally stable and undergoes a cooperative ‘melting phase transition’ at temperatures not far above physiological. Furthermore, due to its length and heterogeneous sequence, with AT-rich segments being less stable than GC-rich segments, the DNA genome ‘melts’ in a multistate fashion. Therefore the DNA genome must also manifest thermally driven structural (‘breathing’) fluctuations at physiological temperatures that should reflect the heterogeneity of the dsDNA stability near the melting temperature. Thus many of the breathing fluctuations of dsDNA are likely also to be sequence dependent, and could well contain information that should be ‘readable’ and useable by regulatory proteins and protein complexes in site-specific binding reactions involving dsDNA ‘opening’. Our laboratory has been involved in studying the breathing fluctuations of duplex DNA for about 50 years. In this ‘Reflections’ article we present a relatively chronological overview of these studies, starting with the use of simple chemical probes (such as hydrogen exchange, formaldehyde and simple DNA ‘melting’ proteins) to examine the local stability of the dsDNA structure, and culminating in sophisticated spectroscopic approaches that can be used to monitor the breathing-dependent interactions of regulatory complexes with their duplex DNA targets in ‘real time’. PMID:23840028
DOE Office of Scientific and Technical Information (OSTI.GOV)

Kappler, Ulrike; Davenport, Karen W.; Beatson, Scott

Starkeya novella (Starkey 1934) Kelly et al. 2000 is a member of the family Xanthobacteraceae in the order Rhizobiales, which is thus far poorly characterized at the genome level. Cultures from this spe- cies are most interesting due to their facultatively chemolithoautotrophic lifestyle, which allows them to both consume carbon dioxide and to produce it. This feature makes S. novella an interesting model organism for studying the genomic basis of regulatory networks required for the switch between consumption and production of carbon dioxide, a key component of the global carbon cycle. In addition, S. novella is of interest for itsmore » ability to grow on various inorganic sulfur compounds and several C1- compounds such as methanol. Besides Azorhizobium caulinodans, S. novella is only the second species in the family Xanthobacteraceae with a completely sequenced genome of a type strain. The cur- rent taxonomic classification of this group is in significant conflict with the 16S rRNA data. The genomic data indicate that the physiological capabilities of the organism might have been underestimated. Here, the 4,765,023 bp long chromosome with its 4,511 protein-coding and 52 RNA genes was sequenced as part of the DOE Joint Genome Institute Community Sequencing Program (CSP) 2008.« less

Transcriptome profile of a bovine respiratory disease pathogen: Mannheimia haemolytica PHL213

PubMed Central

2012-01-01

Background Computational methods for structural gene annotation have propelled gene discovery but face certain drawbacks with regards to prokaryotic genome annotation. Identification of transcriptional start sites, demarcating overlapping gene boundaries, and identifying regulatory elements such as small RNA are not accurate using these approaches. In this study, we re-visit the structural annotation of Mannheimia haemolytica PHL213, a bovine respiratory disease pathogen. M. haemolytica is one of the causative agents of bovine respiratory disease that results in about $3 billion annual losses to the cattle industry. We used RNA-Seq and analyzed the data using freely-available computational methods and resources. The aim was to identify previously unannotated regions of the genome using RNA-Seq based expression profile to complement the existing annotation of this pathogen. Results Using the Illumina Genome Analyzer, we generated 9,055,826 reads (average length ~76 bp) and aligned them to the reference genome using Bowtie. The transcribed regions were analyzed using SAMTOOLS and custom Perl scripts in conjunction with BLAST searches and available gene annotation information. The single nucleotide resolution map enabled the identification of 14 novel protein coding regions as well as 44 potential novel sRNA. The basal transcription profile revealed that 2,506 of the 2,837 annotated regions were expressed in vitro, at 95.25% coverage, representing all broad functional gene categories in the genome. The expression profile also helped identify 518 potential operon structures involving 1,086 co-expressed pairs. We also identified 11 proteins with mutated/alternate start codons. Conclusions The application of RNA-Seq based transcriptome profiling to structural gene annotation helped correct existing annotation errors and identify potential novel protein coding regions and sRNA. We used computational tools to predict regulatory elements such as promoters and terminators associated with the novel expressed regions for further characterization of these novel functional elements. Our study complements the existing structural annotation of Mannheimia haemolytica PHL213 based on experimental evidence. Given the role of sRNA in virulence gene regulation and stress response, potential novel sRNA described in this study can form the framework for future studies to determine the role of sRNA, if any, in M. haemolytica pathogenesis. PMID:23046475
A genome-wide resource for the analysis of protein localisation in Drosophila

PubMed Central

Sarov, Mihail; Barz, Christiane; Jambor, Helena; Hein, Marco Y; Schmied, Christopher; Suchold, Dana; Stender, Bettina; Janosch, Stephan; KJ, Vinay Vikas; Krishnan, RT; Krishnamoorthy, Aishwarya; Ferreira, Irene RS; Ejsmont, Radoslaw K; Finkl, Katja; Hasse, Susanne; Kämpfer, Philipp; Plewka, Nicole; Vinis, Elisabeth; Schloissnig, Siegfried; Knust, Elisabeth; Hartenstein, Volker; Mann, Matthias; Ramaswami, Mani; VijayRaghavan, K; Tomancak, Pavel; Schnorrer, Frank

2016-01-01

The Drosophila genome contains >13000 protein-coding genes, the majority of which remain poorly investigated. Important reasons include the lack of antibodies or reporter constructs to visualise these proteins. Here, we present a genome-wide fosmid library of 10000 GFP-tagged clones, comprising tagged genes and most of their regulatory information. For 880 tagged proteins, we created transgenic lines, and for a total of 207 lines, we assessed protein expression and localisation in ovaries, embryos, pupae or adults by stainings and live imaging approaches. Importantly, we visualised many proteins at endogenous expression levels and found a large fraction of them localising to subcellular compartments. By applying genetic complementation tests, we estimate that about two-thirds of the tagged proteins are functional. Moreover, these tagged proteins enable interaction proteomics from developing pupae and adult flies. Taken together, this resource will boost systematic analysis of protein expression and localisation in various cellular and developmental contexts. DOI: http://dx.doi.org/10.7554/eLife.12068.001 PMID:26896675
DNA transposon activity is associated with increased mutation rates in genes of rice and other grasses

PubMed Central

Wicker, Thomas; Yu, Yeisoo; Haberer, Georg; Mayer, Klaus F. X.; Marri, Pradeep Reddy; Rounsley, Steve; Chen, Mingsheng; Zuccolo, Andrea; Panaud, Olivier; Wing, Rod A.; Roffler, Stefan

2016-01-01

DNA (class 2) transposons are mobile genetic elements which move within their ‘host' genome through excising and re-inserting elsewhere. Although the rice genome contains tens of thousands of such elements, their actual role in evolution is still unclear. Analysing over 650 transposon polymorphisms in the rice species Oryza sativa and Oryza glaberrima, we find that DNA repair following transposon excisions is associated with an increased number of mutations in the sequences neighbouring the transposon. Indeed, the 3,000 bp flanking the excised transposons can contain over 10 times more mutations than the genome-wide average. Since DNA transposons preferably insert near genes, this is correlated with increases in mutation rates in coding sequences and regulatory regions. Most importantly, we find this phenomenon also in maize, wheat and barley. Thus, these findings suggest that DNA transposon activity is a major evolutionary force in grasses which provide the basis of most food consumed by humankind. PMID:27599761
Can a few non‐coding mutations make a human brain?

PubMed Central

Franchini, Lucía F.

2015-01-01

The recent finding that the human version of a neurodevelopmental enhancer of the Wnt receptor Frizzled 8 (FZD8) gene alters neural progenitor cell cycle timing and brain size is a step forward to understanding human brain evolution. The human brain is distinctive in terms of its cognitive abilities as well as its susceptibility to neurological disease. Identifying which of the millions of genomic changes that occurred during human evolution led to these and other uniquely human traits is extremely challenging. Recent studies have demonstrated that many of the fastest evolving regions of the human genome function as gene regulatory enhancers during embryonic development and that the human‐specific mutations in them might alter expression patterns. However, elucidating molecular and cellular effects of sequence or expression pattern changes is a major obstacle to discovering the genetic bases of the evolution of our species. There is much work to do before human‐specific genetic and genomic changes are linked to complex human traits. Also watch the Video Abstract. PMID:26350501
MSDB: A Comprehensive Database of Simple Sequence Repeats.

PubMed

Avvaru, Akshay Kumar; Saxena, Saketh; Sowpati, Divya Tej; Mishra, Rakesh Kumar

2017-06-01

Microsatellites, also known as Simple Sequence Repeats (SSRs), are short tandem repeats of 1-6 nt motifs present in all genomes, particularly eukaryotes. Besides their usefulness as genome markers, SSRs have been shown to perform important regulatory functions, and variations in their length at coding regions are linked to several disorders in humans. Microsatellites show a taxon-specific enrichment in eukaryotic genomes, and some may be functional. MSDB (Microsatellite Database) is a collection of >650 million SSRs from 6,893 species including Bacteria, Archaea, Fungi, Plants, and Animals. This database is by far the most exhaustive resource to access and analyze SSR data of multiple species. In addition to exploring data in a customizable tabular format, users can view and compare the data of multiple species simultaneously using our interactive plotting system. MSDB is developed using the Django framework and MySQL. It is freely available at http://tdb.ccmb.res.in/msdb. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Interplay between DNA methylation, histone modification and chromatin remodeling in stem cells and during development.

PubMed

Ikegami, Kohta; Ohgane, Jun; Tanaka, Satoshi; Yagi, Shintaro; Shiota, Kunio

2009-01-01

Genes constitute only a small proportion of the mammalian genome, the majority of which is composed of non-genic repetitive elements including interspersed repeats and satellites. A unique feature of the mammalian genome is that there are numerous tissue-dependent, differentially methylated regions (T-DMRs) in the non-repetitive sequences, which include genes and their regulatory elements. The epigenetic status of T-DMRs varies from that of repetitive elements and constitutes the DNA methylation profile genome-wide. Since the DNA methylation profile is specific to each cell and tissue type, much like a fingerprint, it can be used as a means of identification. The formation of DNA methylation profiles is the basis for cell differentiation and development in mammals. The epigenetic status of each T-DMR is regulated by the interplay between DNA methyltransferases, histone modification enzymes, histone subtypes, non-histone nuclear proteins and non-coding RNAs. In this review, we will discuss how these epigenetic factors cooperate to establish cell- and tissue-specific DNA methylation profiles.
Endogenous Retroviruses in the Genomics Era.

PubMed

Johnson, Welkin E

2015-11-01

Endogenous retroviruses comprise millions of discrete genetic loci distributed within the genomes of extant vertebrates. These sequences, which are clearly related to exogenous retroviruses, represent retroviral infections of the deep past, and their abundance suggests that retroviruses were a near-constant presence throughout the evolutionary history of modern vertebrates. Endogenous retroviruses contribute in myriad ways to the evolution of host genomes, as mutagens and as sources of genetic novelty (both coding and regulatory) to be acted upon by the twin engines of random genetic drift and natural selection. Importantly, the richness and complexity of endogenous retrovirus data can be used to understand how viruses spread and adapt on evolutionary timescales by combining population genetics and evolutionary theory with a detailed understanding of retrovirus biology (gleaned from the study of extant retroviruses). In addition to revealing the impact of viruses on organismal evolution, such studies can help us better understand, by looking back in time, how life-history traits, as well as ecological and geological events, influence the movement of viruses within and between populations.
Genetic therapy for the nervous system.

PubMed

Bowers, William J; Breakefield, Xandra O; Sena-Esteves, Miguel

2011-04-15

Genetic therapy is undergoing a renaissance with expansion of viral and synthetic vectors, use of oligonucleotides (RNA and DNA) and sequence-targeted regulatory molecules, as well as genetically modified cells, including induced pluripotent stem cells from the patients themselves. Several clinical trials for neurologic syndromes appear quite promising. This review covers genetic strategies to ameliorate neurologic syndromes of different etiologies, including lysosomal storage diseases, Alzheimer's disease and other amyloidopathies, Parkinson's disease, spinal muscular atrophy, amyotrophic lateral sclerosis and brain tumors. This field has been propelled by genetic technologies, including identifying disease genes and disruptive mutations, design of genomic interacting elements to regulate transcription and splicing of specific precursor mRNAs and use of novel non-coding regulatory RNAs. These versatile new tools for manipulation of genetic elements provide the ability to tailor the mode of genetic intervention to specific aspects of a disease state.
Signaling coupled epigenomic regulation of gene expression.

PubMed

Kumar, R; Deivendran, S; Santhoshkumar, T R; Pillai, M R

2017-10-26

Inheritance of genomic information independent of the DNA sequence, the epigenetics, as well as gene transcription are profoundly shaped by serine/threonine and tyrosine signaling kinases and components of the chromatin remodeling complexes. To precisely respond to a changing external milieu, human cells efficiently translate upstream signals into post-translational modifications (PTMs) on histones and coregulators such as corepressors, coactivators, DNA-binding factors and PTM modifying enzymes. Because a protein with multiple residues for putative PTMs is expected to undergo more than one PTM in cells stimulated with growth factors, the outcome of combinational PTM codes on histones and coregulators is profoundly shaped by regulatory interplays between PTMs. The genomic functions of signaling kinases in cancer cells are manifested by the downstream effectors of cytoplasmic signaling cascades as well as translocation of the cytoplasmic signaling kinases to the nucleus. Signaling-mediated phosphorylation of histones serves as a regulatory switch for other PTMs, and connects chromatin remodeling complexes into gene transcription and gene activity. Here, we will discuss the recent advances in signaling-dependent epigenomic regulation of gene transcription using a few representative cancer-relevant serine/threonine and tyrosine kinases and their interplay with chromatin remodeling factors in cancer cells.
Generation of transgenic goats by pronuclear microinjection: a retrospective analysis of a commercial operation (1995-2012).

PubMed

Gavin, W; Blash, S; Buzzell, N; Pollock, D; Chen, L; Hawkins, N; Howe, J; Miner, K; Pollock, J; Porter, C; Schofield, M; Echelard, Y; Meade, H

2018-02-01

Production of transgenic founder goats involves introducing and stably integrating an engineered piece of DNA into the genome of the animal. At LFB USA, the ultimate use of these transgenic goats is for the production of recombinant human protein therapeutics in the milk of these dairy animals. The transgene or construct typically links a milk protein specific promoter sequence, the coding sequence for the gene of interest, and the necessary downstream regulatory sequences thereby directing expression of the recombinant protein in the milk during the lactation period. Over the time period indicated (1995-2012), pronuclear microinjection was used in a number of programs to insert transgenes into 18,120, 1- or 2- cell stage fertilized embryos. These embryos were transferred into 4180 synchronized recipient females with 1934 (47%) recipients becoming pregnant, 2594 offspring generated, and a 109 (4.2%) of those offspring determined to be transgenic. Even with new and improving genome editing tools now available, pronuclear microinjection is still the predominant and proven technology used in this commercial setting supporting regulatory filings and market authorizations when producing founder transgenic animals with large transgenes (> 10 kb) such as those necessary for directing monoclonal antibody production in milk.
Employing genome-wide SNP discovery and genotyping strategy to extrapolate the natural allelic diversity and domestication patterns in chickpea

PubMed Central

Kujur, Alice; Bajaj, Deepak; Upadhyaya, Hari D.; Das, Shouvik; Ranjan, Rajeev; Shree, Tanima; Saxena, Maneesha S.; Badoni, Saurabh; Kumar, Vinod; Tripathi, Shailesh; Gowda, C. L. L.; Sharma, Shivali; Singh, Sube; Tyagi, Akhilesh K.; Parida, Swarup K.

2015-01-01

The genome-wide discovery and high-throughput genotyping of SNPs in chickpea natural germplasm lines is indispensable to extrapolate their natural allelic diversity, domestication, and linkage disequilibrium (LD) patterns leading to the genetic enhancement of this vital legume crop. We discovered 44,844 high-quality SNPs by sequencing of 93 diverse cultivated desi, kabuli, and wild chickpea accessions using reference genome- and de novo-based GBS (genotyping-by-sequencing) assays that were physically mapped across eight chromosomes of desi and kabuli. Of these, 22,542 SNPs were structurally annotated in different coding and non-coding sequence components of genes. Genes with 3296 non-synonymous and 269 regulatory SNPs could functionally differentiate accessions based on their contrasting agronomic traits. A high experimental validation success rate (92%) and reproducibility (100%) along with strong sensitivity (93–96%) and specificity (99%) of GBS-based SNPs was observed. This infers the robustness of GBS as a high-throughput assay for rapid large-scale mining and genotyping of genome-wide SNPs in chickpea with sub-optimal use of resources. With 23,798 genome-wide SNPs, a relatively high intra-specific polymorphic potential (49.5%) and broader molecular diversity (13–89%)/functional allelic diversity (18–77%) was apparent among 93 chickpea accessions, suggesting their tremendous applicability in rapid selection of desirable diverse accessions/inter-specific hybrids in chickpea crossbred varietal improvement program. The genome-wide SNPs revealed complex admixed domestication pattern, extensive LD estimates (0.54–0.68) and extended LD decay (400–500 kb) in a structured population inclusive of 93 accessions. These findings reflect the utility of our identified SNPs for subsequent genome-wide association study (GWAS) and selective sweep-based domestication trait dissection analysis to identify potential genomic loci (gene-associated targets) specifically regulating important complex quantitative agronomic traits in chickpea. The numerous informative genome-wide SNPs, natural allelic diversity-led domestication pattern, and LD-based information generated in our study have got multidimensional applicability with respect to chickpea genomics-assisted breeding. PMID:25873920
Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.

PubMed

Birney, Ewan; Stamatoyannopoulos, John A; Dutta, Anindya; Guigó, Roderic; Gingeras, Thomas R; Margulies, Elliott H; Weng, Zhiping; Snyder, Michael; Dermitzakis, Emmanouil T; Thurman, Robert E; Kuehn, Michael S; Taylor, Christopher M; Neph, Shane; Koch, Christoph M; Asthana, Saurabh; Malhotra, Ankit; Adzhubei, Ivan; Greenbaum, Jason A; Andrews, Robert M; Flicek, Paul; Boyle, Patrick J; Cao, Hua; Carter, Nigel P; Clelland, Gayle K; Davis, Sean; Day, Nathan; Dhami, Pawandeep; Dillon, Shane C; Dorschner, Michael O; Fiegler, Heike; Giresi, Paul G; Goldy, Jeff; Hawrylycz, Michael; Haydock, Andrew; Humbert, Richard; James, Keith D; Johnson, Brett E; Johnson, Ericka M; Frum, Tristan T; Rosenzweig, Elizabeth R; Karnani, Neerja; Lee, Kirsten; Lefebvre, Gregory C; Navas, Patrick A; Neri, Fidencio; Parker, Stephen C J; Sabo, Peter J; Sandstrom, Richard; Shafer, Anthony; Vetrie, David; Weaver, Molly; Wilcox, Sarah; Yu, Man; Collins, Francis S; Dekker, Job; Lieb, Jason D; Tullius, Thomas D; Crawford, Gregory E; Sunyaev, Shamil; Noble, William S; Dunham, Ian; Denoeud, France; Reymond, Alexandre; Kapranov, Philipp; Rozowsky, Joel; Zheng, Deyou; Castelo, Robert; Frankish, Adam; Harrow, Jennifer; Ghosh, Srinka; Sandelin, Albin; Hofacker, Ivo L; Baertsch, Robert; Keefe, Damian; Dike, Sujit; Cheng, Jill; Hirsch, Heather A; Sekinger, Edward A; Lagarde, Julien; Abril, Josep F; Shahab, Atif; Flamm, Christoph; Fried, Claudia; Hackermüller, Jörg; Hertel, Jana; Lindemeyer, Manja; Missal, Kristin; Tanzer, Andrea; Washietl, Stefan; Korbel, Jan; Emanuelsson, Olof; Pedersen, Jakob S; Holroyd, Nancy; Taylor, Ruth; Swarbreck, David; Matthews, Nicholas; Dickson, Mark C; Thomas, Daryl J; Weirauch, Matthew T; Gilbert, James; Drenkow, Jorg; Bell, Ian; Zhao, XiaoDong; Srinivasan, K G; Sung, Wing-Kin; Ooi, Hong Sain; Chiu, Kuo Ping; Foissac, Sylvain; Alioto, Tyler; Brent, Michael; Pachter, Lior; Tress, Michael L; Valencia, Alfonso; Choo, Siew Woh; Choo, Chiou Yu; Ucla, Catherine; Manzano, Caroline; Wyss, Carine; Cheung, Evelyn; Clark, Taane G; Brown, James B; Ganesh, Madhavan; Patel, Sandeep; Tammana, Hari; Chrast, Jacqueline; Henrichsen, Charlotte N; Kai, Chikatoshi; Kawai, Jun; Nagalakshmi, Ugrappa; Wu, Jiaqian; Lian, Zheng; Lian, Jin; Newburger, Peter; Zhang, Xueqing; Bickel, Peter; Mattick, John S; Carninci, Piero; Hayashizaki, Yoshihide; Weissman, Sherman; Hubbard, Tim; Myers, Richard M; Rogers, Jane; Stadler, Peter F; Lowe, Todd M; Wei, Chia-Lin; Ruan, Yijun; Struhl, Kevin; Gerstein, Mark; Antonarakis, Stylianos E; Fu, Yutao; Green, Eric D; Karaöz, Ulaş; Siepel, Adam; Taylor, James; Liefer, Laura A; Wetterstrand, Kris A; Good, Peter J; Feingold, Elise A; Guyer, Mark S; Cooper, Gregory M; Asimenos, George; Dewey, Colin N; Hou, Minmei; Nikolaev, Sergey; Montoya-Burgos, Juan I; Löytynoja, Ari; Whelan, Simon; Pardi, Fabio; Massingham, Tim; Huang, Haiyan; Zhang, Nancy R; Holmes, Ian; Mullikin, James C; Ureta-Vidal, Abel; Paten, Benedict; Seringhaus, Michael; Church, Deanna; Rosenbloom, Kate; Kent, W James; Stone, Eric A; Batzoglou, Serafim; Goldman, Nick; Hardison, Ross C; Haussler, David; Miller, Webb; Sidow, Arend; Trinklein, Nathan D; Zhang, Zhengdong D; Barrera, Leah; Stuart, Rhona; King, David C; Ameur, Adam; Enroth, Stefan; Bieda, Mark C; Kim, Jonghwan; Bhinge, Akshay A; Jiang, Nan; Liu, Jun; Yao, Fei; Vega, Vinsensius B; Lee, Charlie W H; Ng, Patrick; Shahab, Atif; Yang, Annie; Moqtaderi, Zarmik; Zhu, Zhou; Xu, Xiaoqin; Squazzo, Sharon; Oberley, Matthew J; Inman, David; Singer, Michael A; Richmond, Todd A; Munn, Kyle J; Rada-Iglesias, Alvaro; Wallerman, Ola; Komorowski, Jan; Fowler, Joanna C; Couttet, Phillippe; Bruce, Alexander W; Dovey, Oliver M; Ellis, Peter D; Langford, Cordelia F; Nix, David A; Euskirchen, Ghia; Hartman, Stephen; Urban, Alexander E; Kraus, Peter; Van Calcar, Sara; Heintzman, Nate; Kim, Tae Hoon; Wang, Kun; Qu, Chunxu; Hon, Gary; Luna, Rosa; Glass, Christopher K; Rosenfeld, M Geoff; Aldred, Shelley Force; Cooper, Sara J; Halees, Anason; Lin, Jane M; Shulha, Hennady P; Zhang, Xiaoling; Xu, Mousheng; Haidar, Jaafar N S; Yu, Yong; Ruan, Yijun; Iyer, Vishwanath R; Green, Roland D; Wadelius, Claes; Farnham, Peggy J; Ren, Bing; Harte, Rachel A; Hinrichs, Angie S; Trumbower, Heather; Clawson, Hiram; Hillman-Jackson, Jennifer; Zweig, Ann S; Smith, Kayla; Thakkapallayil, Archana; Barber, Galt; Kuhn, Robert M; Karolchik, Donna; Armengol, Lluis; Bird, Christine P; de Bakker, Paul I W; Kern, Andrew D; Lopez-Bigas, Nuria; Martin, Joel D; Stranger, Barbara E; Woodroffe, Abigail; Davydov, Eugene; Dimas, Antigone; Eyras, Eduardo; Hallgrímsdóttir, Ingileif B; Huppert, Julian; Zody, Michael C; Abecasis, Gonçalo R; Estivill, Xavier; Bouffard, Gerard G; Guan, Xiaobin; Hansen, Nancy F; Idol, Jacquelyn R; Maduro, Valerie V B; Maskeri, Baishali; McDowell, Jennifer C; Park, Morgan; Thomas, Pamela J; Young, Alice C; Blakesley, Robert W; Muzny, Donna M; Sodergren, Erica; Wheeler, David A; Worley, Kim C; Jiang, Huaiyang; Weinstock, George M; Gibbs, Richard A; Graves, Tina; Fulton, Robert; Mardis, Elaine R; Wilson, Richard K; Clamp, Michele; Cuff, James; Gnerre, Sante; Jaffe, David B; Chang, Jean L; Lindblad-Toh, Kerstin; Lander, Eric S; Koriabine, Maxim; Nefedov, Mikhail; Osoegawa, Kazutoyo; Yoshinaga, Yuko; Zhu, Baoli; de Jong, Pieter J

2007-06-14

We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.
Implications of streamlining theory for microbial ecology

PubMed Central

Giovannoni, Stephen J; Cameron Thrash, J; Temperton, Ben

2014-01-01

Whether a small cell, a small genome or a minimal set of chemical reactions with self-replicating properties, simplicity is beguiling. As Leonardo da Vinci reportedly said, ‘simplicity is the ultimate sophistication'. Two diverging views of simplicity have emerged in accounts of symbiotic and commensal bacteria and cosmopolitan free-living bacteria with small genomes. The small genomes of obligate insect endosymbionts have been attributed to genetic drift caused by small effective population sizes (Ne). In contrast, streamlining theory attributes small cells and genomes to selection for efficient use of nutrients in populations where Ne is large and nutrients limit growth. Regardless of the cause of genome reduction, lost coding potential eventually dictates loss of function. Consequences of reductive evolution in streamlined organisms include atypical patterns of prototrophy and the absence of common regulatory systems, which have been linked to difficulty in culturing these cells. Recent evidence from metagenomics suggests that streamlining is commonplace, may broadly explain the phenomenon of the uncultured microbial majority, and might also explain the highly interdependent (connected) behavior of many microbial ecosystems. Streamlining theory is belied by the observation that many successful bacteria are large cells with complex genomes. To fully appreciate streamlining, we must look to the life histories and adaptive strategies of cells, which impose minimum requirements for complexity that vary with niche. PMID:24739623
Living Organisms Author Their Read-Write Genomes in Evolution

PubMed Central

2017-01-01

Evolutionary variations generating phenotypic adaptations and novel taxa resulted from complex cellular activities altering genome content and expression: (i) Symbiogenetic cell mergers producing the mitochondrion-bearing ancestor of eukaryotes and chloroplast-bearing ancestors of photosynthetic eukaryotes; (ii) interspecific hybridizations and genome doublings generating new species and adaptive radiations of higher plants and animals; and, (iii) interspecific horizontal DNA transfer encoding virtually all of the cellular functions between organisms and their viruses in all domains of life. Consequently, assuming that evolutionary processes occur in isolated genomes of individual species has become an unrealistic abstraction. Adaptive variations also involved natural genetic engineering of mobile DNA elements to rewire regulatory networks. In the most highly evolved organisms, biological complexity scales with “non-coding” DNA content more closely than with protein-coding capacity. Coincidentally, we have learned how so-called “non-coding” RNAs that are rich in repetitive mobile DNA sequences are key regulators of complex phenotypes. Both biotic and abiotic ecological challenges serve as triggers for episodes of elevated genome change. The intersections of cell activities, biosphere interactions, horizontal DNA transfers, and non-random Read-Write genome modifications by natural genetic engineering provide a rich molecular and biological foundation for understanding how ecological disruptions can stimulate productive, often abrupt, evolutionary transformations. PMID:29211049
Complex interplay between neutral and adaptive evolution shaped differential genomic background and disease susceptibility along the Italian peninsula.

PubMed

Sazzini, Marco; Gnecchi Ruscone, Guido Alberto; Giuliani, Cristina; Sarno, Stefania; Quagliariello, Andrea; De Fanti, Sara; Boattini, Alessio; Gentilini, Davide; Fiorito, Giovanni; Catanoso, Mariagrazia; Boiardi, Luigi; Croci, Stefania; Macchioni, Pierluigi; Mantovani, Vilma; Di Blasio, Anna Maria; Matullo, Giuseppe; Salvarani, Carlo; Franceschi, Claudio; Pettener, Davide; Garagnani, Paolo; Luiselli, Donata

2016-09-01

The Italian peninsula has long represented a natural hub for human migrations across the Mediterranean area, being involved in several prehistoric and historical population movements. Coupled with a patchy environmental landscape entailing different ecological/cultural selective pressures, this might have produced peculiar patterns of population structure and local adaptations responsible for heterogeneous genomic background of present-day Italians. To disentangle this complex scenario, genome-wide data from 780 Italian individuals were generated and set into the context of European/Mediterranean genomic diversity by comparison with genotypes from 50 populations. To maximize possibility of pinpointing functional genomic regions that have played adaptive roles during Italian natural history, our survey included also ~250,000 exomic markers and ~20,000 coding/regulatory variants with well-established clinical relevance. This enabled fine-grained dissection of Italian population structure through the identification of clusters of genetically homogeneous provinces and of genomic regions underlying their local adaptations. Description of such patterns disclosed crucial implications for understanding differential susceptibility to some inflammatory/autoimmune disorders, coronary artery disease and type 2 diabetes of diverse Italian subpopulations, suggesting the evolutionary causes that made some of them particularly exposed to the metabolic and immune challenges imposed by dietary and lifestyle shifts that involved western societies in the last centuries.
DNaseI Hypersensitivity and Ultraconservation Reveal Novel, Interdependent Long-Range Enhancers at the Complex Pax6 Cis-Regulatory Region

PubMed Central

McBride, David J.; Buckle, Adam; van Heyningen, Veronica; Kleinjan, Dirk A.

2011-01-01

The PAX6 gene plays a crucial role in development of the eye, brain, olfactory system and endocrine pancreas. Consistent with its pleiotropic role the gene exhibits a complex developmental expression pattern which is subject to strict spatial, temporal and quantitative regulation. Control of expression depends on a large array of cis-elements residing in an extended genomic domain around the coding region of the gene. The minimal essential region required for proper regulation of this complex locus has been defined through analysis of human aniridia-associated breakpoints and YAC transgenic rescue studies of the mouse smalleye mutant. We have carried out a systematic DNase I hypersensitive site (HS) analysis across 200 kb of this critical region of mouse chromosome 2E3 to identify putative regulatory elements. Mapping the identified HSs onto a percent identity plot (PIP) shows many HSs correspond to recognisable genomic features such as evolutionarily conserved sequences, CpG islands and retrotransposon derived repeats. We then focussed on a region previously shown to contain essential long range cis-regulatory information, the Pax6 downstream regulatory region (DRR), allowing comparison of mouse HS data with previous human HS data for this region. Reporter transgenic mice for two of the HS sites, HS5 and HS6, show that they function as tissue specific regulatory elements. In addition we have characterised enhancer activity of an ultra-conserved cis-regulatory region located near Pax6, termed E60. All three cis-elements exhibit multiple spatio-temporal activities in the embryo that overlap between themselves and other elements in the locus. Using a deletion set of YAC reporter transgenic mice we demonstrate functional interdependence of the elements. Finally, we use the HS6 enhancer as a marker for the migration of precerebellar neuro-epithelium cells to the hindbrain precerebellar nuclei along the posterior and anterior extramural streams allowing visualisation of migratory defects in both pathways in Pax6Sey/Sey mice. PMID:22220192
RSAT 2015: Regulatory Sequence Analysis Tools

PubMed Central

Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A.; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M.; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

2015-01-01

RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632
Dark matter RNA: an intelligent scaffold for the dynamic regulation of the nuclear information landscape

PubMed Central

St. Laurent, Georges; Savva, Yiannis A.; Kapranov, Philipp

2012-01-01

Perhaps no other topic in contemporary genomics has inspired such diverse viewpoints as the 95+% of the genome, previously known as “junk DNA,” that does not code for proteins. Here, we present a theory in which dark matter RNA plays a role in the generation of a landscape of spatial micro-domains coupled to the information signaling matrix of the nuclear landscape. Within and between these micro-domains, dark matter RNAs additionally function to tether RNA interacting proteins and complexes of many different types, and by doing so, allow for a higher performance of the various processes requiring them at ultra-fast rates. This improves signal to noise characteristics of RNA processing, trafficking, and epigenetic signaling, where competition and differential RNA binding among proteins drives the computational decisions inherent in regulatory events. PMID:22539933
Contribution of transposable elements in the plant's genome.

PubMed

Sahebi, Mahbod; Hanafi, Mohamed M; van Wijnen, Andre J; Rice, David; Rafii, M Y; Azizi, Parisa; Osman, Mohamad; Taheri, Sima; Bakar, Mohd Faizal Abu; Isa, Mohd Noor Mat; Noor, Yusuf Muhammad

2018-07-30

Plants maintain extensive growth flexibility under different environmental conditions, allowing them to continuously and rapidly adapt to alterations in their environment. A large portion of many plant genomes consists of transposable elements (TEs) that create new genetic variations within plant species. Different types of mutations may be created by TEs in plants. Many TEs can avoid the host's defense mechanisms and survive alterations in transposition activity, internal sequence and target site. Thus, plant genomes are expected to utilize a variety of mechanisms to tolerate TEs that are near or within genes. TEs affect the expression of not only nearby genes but also unlinked inserted genes. TEs can create new promoters, leading to novel expression patterns or alternative coding regions to generate alternate transcripts in plant species. TEs can also provide novel cis-acting regulatory elements that act as enhancers or inserts within original enhancers that are required for transcription. Thus, the regulation of plant gene expression is strongly managed by the insertion of TEs into nearby genes. TEs can also lead to chromatin modifications and thereby affect gene expression in plants. TEs are able to generate new genes and modify existing gene structures by duplicating, mobilizing and recombining gene fragments. They can also facilitate cellular functions by sharing their transposase-coding regions. Hence, TE insertions can not only act as simple mutagens but can also alter the elementary functions of the plant genome. Here, we review recent discoveries concerning the contribution of TEs to gene expression in plant genomes and discuss the different mechanisms by which TEs can affect plant gene expression and reduce host defense mechanisms. Copyright © 2018 Elsevier B.V. All rights reserved.
Discovery and Fine-Mapping of Glycaemic and Obesity-Related Trait Loci Using High-Density Imputation.

PubMed

Horikoshi, Momoko; Mӓgi, Reedik; van de Bunt, Martijn; Surakka, Ida; Sarin, Antti-Pekka; Mahajan, Anubha; Marullo, Letizia; Thorleifsson, Gudmar; Hӓgg, Sara; Hottenga, Jouke-Jan; Ladenvall, Claes; Ried, Janina S; Winkler, Thomas W; Willems, Sara M; Pervjakova, Natalia; Esko, Tõnu; Beekman, Marian; Nelson, Christopher P; Willenborg, Christina; Wiltshire, Steven; Ferreira, Teresa; Fernandez, Juan; Gaulton, Kyle J; Steinthorsdottir, Valgerdur; Hamsten, Anders; Magnusson, Patrik K E; Willemsen, Gonneke; Milaneschi, Yuri; Robertson, Neil R; Groves, Christopher J; Bennett, Amanda J; Lehtimӓki, Terho; Viikari, Jorma S; Rung, Johan; Lyssenko, Valeriya; Perola, Markus; Heid, Iris M; Herder, Christian; Grallert, Harald; Müller-Nurasyid, Martina; Roden, Michael; Hypponen, Elina; Isaacs, Aaron; van Leeuwen, Elisabeth M; Karssen, Lennart C; Mihailov, Evelin; Houwing-Duistermaat, Jeanine J; de Craen, Anton J M; Deelen, Joris; Havulinna, Aki S; Blades, Matthew; Hengstenberg, Christian; Erdmann, Jeanette; Schunkert, Heribert; Kaprio, Jaakko; Tobin, Martin D; Samani, Nilesh J; Lind, Lars; Salomaa, Veikko; Lindgren, Cecilia M; Slagboom, P Eline; Metspalu, Andres; van Duijn, Cornelia M; Eriksson, Johan G; Peters, Annette; Gieger, Christian; Jula, Antti; Groop, Leif; Raitakari, Olli T; Power, Chris; Penninx, Brenda W J H; de Geus, Eco; Smit, Johannes H; Boomsma, Dorret I; Pedersen, Nancy L; Ingelsson, Erik; Thorsteinsdottir, Unnur; Stefansson, Kari; Ripatti, Samuli; Prokopenko, Inga; McCarthy, Mark I; Morris, Andrew P

2015-07-01

Reference panels from the 1000 Genomes (1000G) Project Consortium provide near complete coverage of common and low-frequency genetic variation with minor allele frequency ≥0.5% across European ancestry populations. Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we have undertaken the first large-scale meta-analysis of genome-wide association studies (GWAS), supplemented by 1000G imputation, for four quantitative glycaemic and obesity-related traits, in up to 87,048 individuals of European ancestry. We identified two loci for body mass index (BMI) at genome-wide significance, and two for fasting glucose (FG), none of which has been previously reported in larger meta-analysis efforts to combine GWAS of European ancestry. Through conditional analysis, we also detected multiple distinct signals of association mapping to established loci for waist-hip ratio adjusted for BMI (RSPO3) and FG (GCK and G6PC2). The index variant for one association signal at the G6PC2 locus is a low-frequency coding allele, H177Y, which has recently been demonstrated to have a functional role in glucose regulation. Fine-mapping analyses revealed that the non-coding variants most likely to drive association signals at established and novel loci were enriched for overlap with enhancer elements, which for FG mapped to promoter and transcription factor binding sites in pancreatic islets, in particular. Our study demonstrates that 1000G imputation and genetic fine-mapping of common and low-frequency variant association signals at GWAS loci, integrated with genomic annotation in relevant tissues, can provide insight into the functional and regulatory mechanisms through which their effects on glycaemic and obesity-related traits are mediated.

Discovery and Fine-Mapping of Glycaemic and Obesity-Related Trait Loci Using High-Density Imputation

PubMed Central

van de Bunt, Martijn; Surakka, Ida; Sarin, Antti-Pekka; Mahajan, Anubha; Marullo, Letizia; Thorleifsson, Gudmar; Hӓgg, Sara; Hottenga, Jouke-Jan; Ladenvall, Claes; Ried, Janina S.; Winkler, Thomas W.; Willems, Sara M.; Pervjakova, Natalia; Esko, Tõnu; Beekman, Marian; Nelson, Christopher P.; Willenborg, Christina; Ferreira, Teresa; Fernandez, Juan; Gaulton, Kyle J.; Steinthorsdottir, Valgerdur; Hamsten, Anders; Magnusson, Patrik K. E.; Willemsen, Gonneke; Milaneschi, Yuri; Robertson, Neil R.; Groves, Christopher J.; Bennett, Amanda J.; Lehtimӓki, Terho; Viikari, Jorma S.; Rung, Johan; Lyssenko, Valeriya; Perola, Markus; Heid, Iris M.; Herder, Christian; Grallert, Harald; Müller-Nurasyid, Martina; Roden, Michael; Hypponen, Elina; Isaacs, Aaron; van Leeuwen, Elisabeth M.; Karssen, Lennart C.; Mihailov, Evelin; Houwing-Duistermaat, Jeanine J.; de Craen, Anton J. M.; Deelen, Joris; Havulinna, Aki S.; Blades, Matthew; Hengstenberg, Christian; Erdmann, Jeanette; Schunkert, Heribert; Kaprio, Jaakko; Tobin, Martin D.; Samani, Nilesh J.; Lind, Lars; Salomaa, Veikko; Lindgren, Cecilia M.; Slagboom, P. Eline; Metspalu, Andres; van Duijn, Cornelia M.; Eriksson, Johan G.; Peters, Annette; Gieger, Christian; Jula, Antti; Groop, Leif; Raitakari, Olli T.; Power, Chris; Penninx, Brenda W. J. H.; de Geus, Eco; Smit, Johannes H.; Boomsma, Dorret I.; Pedersen, Nancy L.; Ingelsson, Erik; Thorsteinsdottir, Unnur; Stefansson, Kari; Ripatti, Samuli; Prokopenko, Inga; McCarthy, Mark I.; Morris, Andrew P.

2015-01-01

Reference panels from the 1000 Genomes (1000G) Project Consortium provide near complete coverage of common and low-frequency genetic variation with minor allele frequency ≥0.5% across European ancestry populations. Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we have undertaken the first large-scale meta-analysis of genome-wide association studies (GWAS), supplemented by 1000G imputation, for four quantitative glycaemic and obesity-related traits, in up to 87,048 individuals of European ancestry. We identified two loci for body mass index (BMI) at genome-wide significance, and two for fasting glucose (FG), none of which has been previously reported in larger meta-analysis efforts to combine GWAS of European ancestry. Through conditional analysis, we also detected multiple distinct signals of association mapping to established loci for waist-hip ratio adjusted for BMI (RSPO3) and FG (GCK and G6PC2). The index variant for one association signal at the G6PC2 locus is a low-frequency coding allele, H177Y, which has recently been demonstrated to have a functional role in glucose regulation. Fine-mapping analyses revealed that the non-coding variants most likely to drive association signals at established and novel loci were enriched for overlap with enhancer elements, which for FG mapped to promoter and transcription factor binding sites in pancreatic islets, in particular. Our study demonstrates that 1000G imputation and genetic fine-mapping of common and low-frequency variant association signals at GWAS loci, integrated with genomic annotation in relevant tissues, can provide insight into the functional and regulatory mechanisms through which their effects on glycaemic and obesity-related traits are mediated. PMID:26132169
Identification of coding and non-coding mutational hotspots in cancer genomes.

PubMed

Piraino, Scott W; Furney, Simon J

2017-01-05

The identification of mutations that play a causal role in tumour development, so called "driver" mutations, is of critical importance for understanding how cancers form and how they might be treated. Several large cancer sequencing projects have identified genes that are recurrently mutated in cancer patients, suggesting a role in tumourigenesis. While the landscape of coding drivers has been extensively studied and many of the most prominent driver genes are well characterised, comparatively less is known about the role of mutations in the non-coding regions of the genome in cancer development. The continuing fall in genome sequencing costs has resulted in a concomitant increase in the number of cancer whole genome sequences being produced, facilitating systematic interrogation of both the coding and non-coding regions of cancer genomes. To examine the mutational landscapes of tumour genomes we have developed a novel method to identify mutational hotspots in tumour genomes using both mutational data and information on evolutionary conservation. We have applied our methodology to over 1300 whole cancer genomes and show that it identifies prominent coding and non-coding regions that are known or highly suspected to play a role in cancer. Importantly, we applied our method to the entire genome, rather than relying on predefined annotations (e.g. promoter regions) and we highlight recurrently mutated regions that may have resulted from increased exposure to mutational processes rather than selection, some of which have been identified previously as targets of selection. Finally, we implicate several pan-cancer and cancer-specific candidate non-coding regions, which could be involved in tumourigenesis. We have developed a framework to identify mutational hotspots in cancer genomes, which is applicable to the entire genome. This framework identifies known and novel coding and non-coding mutional hotspots and can be used to differentiate candidate driver regions from likely passenger regions susceptible to somatic mutation.
A Catalogue of Putative cis-Regulatory Interactions Between Long Non-coding RNAs and Proximal Coding Genes Based on Correlative Analysis Across Diverse Human Tumors.

PubMed

Basu, Swaraj; Larsson, Erik

2018-05-31

Antisense transcripts and other long non-coding RNAs are pervasive in mammalian cells, and some of these molecules have been proposed to regulate proximal protein-coding genes in cis For example, non-coding transcription can contribute to inactivation of tumor suppressor genes in cancer, and antisense transcripts have been implicated in the epigenetic inactivation of imprinted genes. However, our knowledge is still limited and more such regulatory interactions likely await discovery. Here, we make use of available gene expression data from a large compendium of human tumors to generate hypotheses regarding non-coding-to-coding cis -regulatory relationships with emphasis on negative associations, as these are less likely to arise for reasons other than cis -regulation. We document a large number of possible regulatory interactions, including 193 coding/non-coding pairs that show expression patterns compatible with negative cis -regulation. Importantly, by this approach we capture several known cases, and many of the involved coding genes have known roles in cancer. Our study provides a large catalog of putative non-coding/coding cis -regulatory pairs that may serve as a basis for further experimental validation and characterization. Copyright © 2018 Basu and Larsson.
Allelic expression mapping across cellular lineages to establish impact of non-coding SNPs

PubMed Central

Adoue, Veronique; Schiavi, Alicia; Light, Nicholas; Almlöf, Jonas Carlsson; Lundmark, Per; Ge, Bing; Kwan, Tony; Caron, Maxime; Rönnblom, Lars; Wang, Chuan; Chen, Shu-Huang; Goodall, Alison H; Cambien, Francois; Deloukas, Panos; Ouwehand, Willem H; Syvänen, Ann-Christine; Pastinen, Tomi

2014-01-01

Most complex disease-associated genetic variants are located in non-coding regions and are therefore thought to be regulatory in nature. Association mapping of differential allelic expression (AE) is a powerful method to identify SNPs with direct cis-regulatory impact (cis-rSNPs). We used AE mapping to identify cis-rSNPs regulating gene expression in 55 and 63 HapMap lymphoblastoid cell lines from a Caucasian and an African population, respectively, 70 fibroblast cell lines, and 188 purified monocyte samples and found 40–60% of these cis-rSNPs to be shared across cell types. We uncover a new class of cis-rSNPs, which disrupt footprint-derived de novo motifs that are predominantly bound by repressive factors and are implicated in disease susceptibility through overlaps with GWAS SNPs. Finally, we provide the proof-of-principle for a new approach for genome-wide functional validation of transcription factor–SNP interactions. By perturbing NFκB action in lymphoblasts, we identified 489 cis-regulated transcripts with altered AE after NFκB perturbation. Altogether, we perform a comprehensive analysis of cis-variation in four cell populations and provide new tools for the identification of functional variants associated to complex diseases. PMID:25326100
A candidate gene for choanal atresia in alpaca.

PubMed

Reed, Kent M; Bauer, Miranda M; Mendoza, Kristelle M; Armién, Aníbal G

2010-03-01

Choanal atresia (CA) is a common nasal craniofacial malformation in New World domestic camelids (alpaca and llama). CA results from abnormal development of the nasal passages and is especially debilitating to newborn crias. CA in camelids shares many of the clinical manifestations of a similar condition in humans (CHARGE syndrome). Herein we report on the regulatory gene CHD7 of alpaca, whose homologue in humans is most frequently associated with CHARGE. Sequence of the CHD7 coding region was obtained from a non-affected cria. The complete coding region was 9003 bp, corresponding to a translated amino acid sequence of 3000 aa. Additional genomic sequences corresponding to a significant portion of the CHD7 gene were identified and assembled from the 2x alpaca whole genome sequence, providing confirmatory sequence for much of the CHD7 coding region. The alpaca CHD7 mRNA sequence was 97.9% similar to the human sequence, with the greatest sequence difference being an insertion in exon 38 that results in a polyalanine repeat (A12). Polymorphism in this repeat was tested for association with CA in alpaca by cloning and sequencing the repeat from both affected and non-affected individuals. Variation in length of the poly-A repeat was not associated with CA. Complete sequencing of the CHD7 gene will be necessary to determine whether other mutations in CHD7 are the cause of CA in camelids.
CoryneRegNet: an ontology-based data warehouse of corynebacterial transcription factors and regulatory networks.

PubMed

Baumbach, Jan; Brinkrolf, Karina; Czaja, Lisa F; Rahmann, Sven; Tauch, Andreas

2006-02-14

The application of DNA microarray technology in post-genomic analysis of bacterial genome sequences has allowed the generation of huge amounts of data related to regulatory networks. This data along with literature-derived knowledge on regulation of gene expression has opened the way for genome-wide reconstruction of transcriptional regulatory networks. These large-scale reconstructions can be converted into in silico models of bacterial cells that allow a systematic analysis of network behavior in response to changing environmental conditions. CoryneRegNet was designed to facilitate the genome-wide reconstruction of transcriptional regulatory networks of corynebacteria relevant in biotechnology and human medicine. During the import and integration process of data derived from experimental studies or literature knowledge CoryneRegNet generates links to genome annotations, to identified transcription factors and to the corresponding cis-regulatory elements. CoryneRegNet is based on a multi-layered, hierarchical and modular concept of transcriptional regulation and was implemented by using the relational database management system MySQL and an ontology-based data structure. Reconstructed regulatory networks can be visualized by using the yFiles JAVA graph library. As an application example of CoryneRegNet, we have reconstructed the global transcriptional regulation of a cellular module involved in SOS and stress response of corynebacteria. CoryneRegNet is an ontology-based data warehouse that allows a pertinent data management of regulatory interactions along with the genome-scale reconstruction of transcriptional regulatory networks. These models can further be combined with metabolic networks to build integrated models of cellular function including both metabolism and its transcriptional regulation.
Pervasive transcription: detecting functional RNAs in bacteria.

PubMed

Lybecker, Meghan; Bilusic, Ivana; Raghavan, Rahul

2014-01-01

Pervasive, or genome-wide, transcription has been reported in all domains of life. In bacteria, most pervasive transcription occurs antisense to protein-coding transcripts, although recently a new class of pervasive RNAs was identified that originates from within annotated genes. Initially considered to be non-functional transcriptional noise, pervasive transcription is increasingly being recognized as important in regulating gene expression. The function of pervasive transcription is an extensively debated question in the field of transcriptomics and regulatory RNA biology. Here, we highlight the most recent contributions addressing the purpose of pervasive transcription in bacteria and discuss their implications.
Bringing the fathead minnow into the genomic era | Science ...

EPA Pesticide Factsheets

The fathead minnow is a well-established ecotoxicological model organism that has been widely used for regulatory ecotoxicity testing and research for over a half century. While a large amount of molecular information has been gathered on the fathead minnow over the years, the lack of genomic sequence data has limited the utility of the fathead minnow for certain applications. To address this limitation, high-throughput Illumina sequencing technology was employed to sequence the fathead minnow genome. Approximately 100X coverage was achieved by sequencing several libraries of paired-end reads with differing genome insert sizes. Two draft genome assemblies were generated using the SOAPdenovo and String Graph Assembler (SGA) methods, respectively. When these were compared, the SOAPdenovo assembly had a higher scaffold N50 value of 60.4 kbp versus 15.4 kbp, and it also performed better in a Core Eukaryotic Genes Mapping Analysis (CEGMA), mapping 91% versus 67% of genes. As such, this assembly was selected for further development and annotation. The foundation for genome annotation was generated using AUGUSTUS, an ab initio method for gene prediction. A total of 43,345 potential coding sequences were predicted on the genome assembly. These predicted sequences were translated to peptides and queried in a BLAST search against all vertebrates, with 28,290 of these sequences corresponding to zebrafish peptides and 5,242 producing no significant alignments. Additional ty
Genomic and Epigenomic Alterations in Cancer.

PubMed

Chakravarthi, Balabhadrapatruni V S K; Nepal, Saroj; Varambally, Sooryanarayana

2016-07-01

Multiple genetic and epigenetic events characterize tumor progression and define the identity of the tumors. Advances in high-throughput technologies, like gene expression profiling, next-generation sequencing, proteomics, and metabolomics, have enabled detailed molecular characterization of various tumors. The integration and analyses of these high-throughput data have unraveled many novel molecular aberrations and network alterations in tumors. These molecular alterations include multiple cancer-driving mutations, gene fusions, amplification, deletion, and post-translational modifications, among others. Many of these genomic events are being used in cancer diagnosis, whereas others are therapeutically targeted with small-molecule inhibitors. Multiple genes/enzymes that play a role in DNA and histone modifications are also altered in various cancers, changing the epigenomic landscape during cancer initiation and progression. Apart from protein-coding genes, studies are uncovering the critical regulatory roles played by noncoding RNAs and noncoding regions of the genome during cancer progression. Many of these genomic and epigenetic events function in tandem to drive tumor development and metastasis. Concurrent advances in genome-modulating technologies, like gene silencing and genome editing, are providing ability to understand in detail the process of cancer initiation, progression, and signaling as well as opening up avenues for therapeutic targeting. In this review, we discuss some of the recent advances in cancer genomic and epigenomic research. Copyright © 2016 American Society for Investigative Pathology. Published by Elsevier Inc. All rights reserved.
The Janthinobacterium sp. HH01 Genome Encodes a Homologue of the V. cholerae CqsA and L. pneumophila LqsA Autoinducer Synthases

PubMed Central

Hornung, Claudia; Poehlein, Anja; Haack, Frederike S.; Schmidt, Martina; Dierking, Katja; Pohlen, Andrea; Schulenburg, Hinrich; Blokesch, Melanie; Plener, Laure; Jung, Kirsten; Bonge, Andreas; Krohn-Molt, Ines; Utpatel, Christian; Timmermann, Gabriele; Spieck, Eva; Pommerening-Röser, Andreas; Bode, Edna; Bode, Helge B.; Daniel, Rolf; Schmeisser, Christel; Streit, Wolfgang R.

2013-01-01

Janthinobacteria commonly form biofilms on eukaryotic hosts and are known to synthesize antibacterial and antifungal compounds. Janthinobacterium sp. HH01 was recently isolated from an aquatic environment and its genome sequence was established. The genome consists of a single chromosome and reveals a size of 7.10 Mb, being the largest janthinobacterial genome so far known. Approximately 80% of the 5,980 coding sequences (CDSs) present in the HH01 genome could be assigned putative functions. The genome encodes a wealth of secretory functions and several large clusters for polyketide biosynthesis. HH01 also encodes a remarkable number of proteins involved in resistance to drugs or heavy metals. Interestingly, the genome of HH01 apparently lacks the N-acylhomoserine lactone (AHL)-dependent signaling system and the AI-2-dependent quorum sensing regulatory circuit. Instead it encodes a homologue of the Legionella- and Vibrio-like autoinducer (lqsA/cqsA) synthase gene which we designated jqsA. The jqsA gene is linked to a cognate sensor kinase (jqsS) which is flanked by the response regulator jqsR. Here we show that a jqsA deletion has strong impact on the violacein biosynthesis in Janthinobacterium sp. HH01 and that a jqsA deletion mutant can be functionally complemented with the V. cholerae cqsA and the L. pneumophila lqsA genes. PMID:23405110
Relative stability of DNA as a generic criterion for promoter prediction: whole genome annotation of microbial genomes with varying nucleotide base composition.

PubMed

Rangannan, Vetriselvi; Bansal, Manju

2009-12-01

The rapid increase in genome sequence information has necessitated the annotation of their functional elements, particularly those occurring in the non-coding regions, in the genomic context. Promoter region is the key regulatory region, which enables the gene to be transcribed or repressed, but it is difficult to determine experimentally. Hence an in silico identification of promoters is crucial in order to guide experimental work and to pin point the key region that controls the transcription initiation of a gene. In this analysis, we demonstrate that while the promoter regions are in general less stable than the flanking regions, their average free energy varies depending on the GC composition of the flanking genomic sequence. We have therefore obtained a set of free energy threshold values, for genomic DNA with varying GC content and used them as generic criteria for predicting promoter regions in several microbial genomes, using an in-house developed tool PromPredict. On applying it to predict promoter regions corresponding to the 1144 and 612 experimentally validated TSSs in E. coli (50.8% GC) and B. subtilis (43.5% GC) sensitivity of 99% and 95% and precision values of 58% and 60%, respectively, were achieved. For the limited data set of 81 TSSs available for M. tuberculosis (65.6% GC) a sensitivity of 100% and precision of 49% was obtained.
Trichodesmium genome maintains abundant, widespread noncoding DNA in situ, despite oligotrophic lifestyle

DOE PAGES

Walworth, Nathan G.; Pfreundt, Ulrike; Nelson, William C.; ...

2015-04-07

Understanding the evolution of the free-living, cyanobacterial, diazotroph Trichodesmium is of great importance due to its critical role in oceanic biogeochemistry and primary production. Unlike the other >150 available genomes of free-living cyanobacteria, only 63.8% of the Trichodesmium erythraeum (strain IMS101) genome is predicted to encode protein, which is 20-25% less than the average for other cyanobacteria and non-pathogenic, free-living bacteria. We use distinctive isolates and metagenomic data to show that low coding density observed in IMS101 is a common feature of the Trichodesmium genus both in culture and in situ. Transcriptome analysis indicates that 86% of the non-coding spacemore » is expressed, although the function of these transcripts is unclear. The density of noncoding, possible regulatory elements predicted in Trichodesmium, when normalized per intergenic kilobase, was comparable and two fold higher than that found in the gene dense genomes of the sympatric cyanobacterial genera Synechococcus and Prochlorococcus, respectively. Conserved Trichodesmium ncRNA secondary structures were predicted between most culture and metagenomic sequences lending support to the structural conservation. Conservation of these intergenic regions in spatiotemporally separated Trichodesmium populations suggests possible genus-wide selection for their maintenance. These large intergenic spacers may have developed during intervals of strong genetic drift caused by periodic blooms of a subset of genotypes, which may have reduced effective population size. Our data suggest that transposition of selfish DNA, low effective population size, and high fidelity replication allowed the unusual ‘inflation’ of noncoding sequence observed in Trichodesmium despite its oligotrophic lifestyle.« less
Transposable elements at the center of the crossroads between embryogenesis, embryonic stem cells, reprogramming, and long non-coding RNAs.

PubMed

Hutchins, Andrew Paul; Pei, Duanqing

Transposable elements (TEs) are mobile genomic sequences of DNA capable of autonomous and non-autonomous duplication. TEs have been highly successful, and nearly half of the human genome now consists of various families of TEs. Originally thought to be non-functional, these elements have been co-opted by animal genomes to perform a variety of physiological functions ranging from TE-derived proteins acting directly in normal biological functions, to innovations in transcription factor logic and influence on epigenetic control of gene expression. During embryonic development, when the genome is epigenetically reprogrammed and DNA-demethylated, TEs are released from repression and show embryonic stage-specific expression, and in human and mouse embryos, intact TE-derived endogenous viral particles can even be detected. A similar process occurs during the reprogramming of somatic cells to pluripotent cells: When the somatic DNA is demethylated, TEs are released from repression. In embryonic stem cells (ESCs), where DNA is hypomethylated, an elaborate system of epigenetic control is employed to suppress TEs, a system that often overlaps with normal epigenetic control of ESC gene expression. Finally, many long non-coding RNAs (lncRNAs) involved in normal ESC function and those assisting or impairing reprogramming contain multiple TEs in their RNA. These TEs may act as regulatory units to recruit RNA-binding proteins and epigenetic modifiers. This review covers how TEs are interlinked with the epigenetic machinery and lncRNAs, and how these links influence each other to modulate aspects of ESCs, embryogenesis, and somatic cell reprogramming.
In the shadow: The emerging role of long non-coding RNAs in the immune response of Atlantic salmon.

PubMed

Tarifeño-Saldivia, E; Valenzuela-Miranda, D; Gallardo-Escárate, C

2017-08-01

The genomic era has increased the research effort to uncover how the genome of an organism, and specifically the transcriptome, is modulated after interplaying with pathogenic microorganisms and ectoparasites. However, the ever-increasing accessibility of sequencing technology has also evidenced regulatory roles of long non-coding RNAs (lncRNAs) related to several biological processes including immune response. This study reports a high-confidence annotation and a comparative transcriptome analysis of lncRNAs from several tissues of Salmo salar infected with the most prevalent pathogens in the Chilean salmon aquaculture such as the infectious salmon anemia (ISA) virus, the intracellular bacterium Piscirickettsia salmonis and the ectoparasite copepod Caligus rogercresseyi. Our analyses showed that lncRNAs are widely modulated during infection. However, this modulation is pathogen-specific and highly correlated with immuno-related genes associated with innate immune response. These findings represent the first discovery for the widespread differential expression of lncRNAs in response to infections with different types of pathogens in Atlantic salmon, suggesting that lncRNAs are pivotal player during the fish immune response. Copyright © 2017 Elsevier Ltd. All rights reserved.
Core signaling pathways in human pancreatic cancers revealed by global genomic analyses.

PubMed

Jones, Siân; Zhang, Xiaosong; Parsons, D Williams; Lin, Jimmy Cheng-Ho; Leary, Rebecca J; Angenendt, Philipp; Mankoo, Parminder; Carter, Hannah; Kamiyama, Hirohiko; Jimeno, Antonio; Hong, Seung-Mo; Fu, Baojin; Lin, Ming-Tseh; Calhoun, Eric S; Kamiyama, Mihoko; Walter, Kimberly; Nikolskaya, Tatiana; Nikolsky, Yuri; Hartigan, James; Smith, Douglas R; Hidalgo, Manuel; Leach, Steven D; Klein, Alison P; Jaffee, Elizabeth M; Goggins, Michael; Maitra, Anirban; Iacobuzio-Donahue, Christine; Eshleman, James R; Kern, Scott E; Hruban, Ralph H; Karchin, Rachel; Papadopoulos, Nickolas; Parmigiani, Giovanni; Vogelstein, Bert; Velculescu, Victor E; Kinzler, Kenneth W

2008-09-26

There are currently few therapeutic options for patients with pancreatic cancer, and new insights into the pathogenesis of this lethal disease are urgently needed. Toward this end, we performed a comprehensive genetic analysis of 24 pancreatic cancers. We first determined the sequences of 23,219 transcripts, representing 20,661 protein-coding genes, in these samples. Then, we searched for homozygous deletions and amplifications in the tumor DNA by using microarrays containing probes for approximately 10(6) single-nucleotide polymorphisms. We found that pancreatic cancers contain an average of 63 genetic alterations, the majority of which are point mutations. These alterations defined a core set of 12 cellular signaling pathways and processes that were each genetically altered in 67 to 100% of the tumors. Analysis of these tumors' transcriptomes with next-generation sequencing-by-synthesis technologies provided independent evidence for the importance of these pathways and processes. Our data indicate that genetically altered core pathways and regulatory processes only become evident once the coding regions of the genome are analyzed in depth. Dysregulation of these core pathways and processes through mutation can explain the major features of pancreatic tumorigenesis.
Genome-Wide Identification of Regulatory Sequences Undergoing Accelerated Evolution in the Human Genome

PubMed Central

Dong, Xinran; Wang, Xiao; Zhang, Feng; Tian, Weidong

2016-01-01

Accelerated evolution of regulatory sequence can alter the expression pattern of target genes, and cause phenotypic changes. In this study, we used DNase I hypersensitive sites (DHSs) to annotate putative regulatory sequences in the human genome, and conducted a genome-wide analysis of the effects of accelerated evolution on regulatory sequences. Working under the assumption that local ancient repeat elements of DHSs are under neutral evolution, we discovered that ∼0.44% of DHSs are under accelerated evolution (ace-DHSs). We found that ace-DHSs tend to be more active than background DHSs, and are strongly associated with epigenetic marks of active transcription. The target genes of ace-DHSs are significantly enriched in neuron-related functions, and their expression levels are positively selected in the human brain. Thus, these lines of evidences strongly suggest that accelerated evolution on regulatory sequences plays important role in the evolution of human-specific phenotypes. PMID:27401230
The nucleotide composition of microbial genomes indicates differential patterns of selection on core and accessory genomes.

PubMed

Bohlin, Jon; Eldholm, Vegard; Pettersson, John H O; Brynildsrud, Ola; Snipen, Lars

2017-02-10

The core genome consists of genes shared by the vast majority of a species and is therefore assumed to have been subjected to substantially stronger purifying selection than the more mobile elements of the genome, also known as the accessory genome. Here we examine intragenic base composition differences in core genomes and corresponding accessory genomes in 36 species, represented by the genomes of 731 bacterial strains, to assess the impact of selective forces on base composition in microbes. We also explore, in turn, how these results compare with findings for whole genome intragenic regions. We found that GC content in coding regions is significantly higher in core genomes than accessory genomes and whole genomes. Likewise, GC content variation within coding regions was significantly lower in core genomes than in accessory genomes and whole genomes. Relative entropy in coding regions, measured as the difference between observed and expected trinucleotide frequencies estimated from mononucleotide frequencies, was significantly higher in the core genomes than in accessory and whole genomes. Relative entropy was positively associated with coding region GC content within the accessory genomes, but not within the corresponding coding regions of core or whole genomes. The higher intragenic GC content and relative entropy, as well as the lower GC content variation, observed in the core genomes is most likely associated with selective constraints. It is unclear whether the positive association between GC content and relative entropy in the more mobile accessory genomes constitutes signatures of selection or selective neutral processes.
An Efficient Strategy Combining SSR Markers- and Advanced QTL-seq-driven QTL Mapping Unravels Candidate Genes Regulating Grain Weight in Rice

PubMed Central

Daware, Anurag; Das, Sweta; Srivastava, Rishi; Badoni, Saurabh; Singh, Ashok K.; Agarwal, Pinky; Parida, Swarup K.; Tyagi, Akhilesh K.

2016-01-01

Development and use of genome-wide informative simple sequence repeat (SSR) markers and novel integrated genomic strategies are vital to drive genomics-assisted breeding applications and for efficient dissection of quantitative trait loci (QTLs) underlying complex traits in rice. The present study developed 6244 genome-wide informative SSR markers exhibiting in silico fragment length polymorphism based on repeat-unit variations among genomic sequences of 11 indica, japonica, aus, and wild rice accessions. These markers were mapped on diverse coding and non-coding sequence components of known cloned/candidate genes annotated from 12 chromosomes and revealed a much higher amplification (97%) and polymorphic potential (88%) along with wider genetic/functional diversity level (16–74% with a mean 53%) especially among accessions belonging to indica cultivar group, suggesting their utility in large-scale genomics-assisted breeding applications in rice. A high-density 3791 SSR markers-anchored genetic linkage map (IR 64 × Sonasal) spanning 2060 cM total map-length with an average inter-marker distance of 0.54 cM was generated. This reference genetic map identified six major genomic regions harboring robust QTLs (31% combined phenotypic variation explained with a 5.7–8.7 LOD) governing grain weight on six rice chromosomes. One strong grain weight major QTL region (OsqGW5.1) was narrowed-down by integrating traditional QTL mapping with high-resolution QTL region-specific integrated SSR and single nucleotide polymorphism markers-based QTL-seq analysis and differential expression profiling. This led us to delineate two natural allelic variants in two known cis-regulatory elements (RAV1AAT and CARGCW8GAT) of glycosyl hydrolase and serine carboxypeptidase genes exhibiting pronounced seed-specific differential regulation in low (Sonasal) and high (IR 64) grain weight mapping parental accessions. Our genome-wide SSR marker resource (polymorphic within/between diverse cultivar groups) and integrated genomic strategy can efficiently scan functionally relevant potential molecular tags (markers, candidate genes and alleles) regulating complex agronomic traits (grain weight) and expedite marker-assisted genetic enhancement in rice. PMID:27833617
Enhancer scanning to locate regulatory regions in genomic loci

PubMed Central

Buckley, Melissa; Gjyshi, Anxhela; Mendoza-Fandiño, Gustavo; Baskin, Rebekah; Carvalho, Renato S.; Carvalho, Marcelo A.; Woods, Nicholas T.; Monteiro, Alvaro N.A.

2016-01-01

The present protocol provides a rapid, streamlined and scalable strategy to systematically scan genomic regions for the presence of transcriptional regulatory regions active in a specific cell type. It creates genomic tiles spanning a region of interest that are subsequently cloned by recombination into a luciferase reporter vector containing the Simian Virus 40 promoter. Tiling clones are transfected into specific cell types to test for the presence of transcriptional regulatory regions. The protocol includes testing of different SNP (single nucleotide polymorphism) alleles to determine their effect on regulatory activity. This procedure provides a systematic framework to identify candidate functional SNPs within a locus during functional analysis of genome-wide association studies. This protocol adapts and combines previous well-established molecular biology methods to provide a streamlined strategy, based on automated primer design and recombinational cloning to rapidly go from a genomic locus to a set of candidate functional SNPs in eight weeks. PMID:26658467
GARLIC: a bioinformatic toolkit for aetiologically connecting diseases and cell type-specific regulatory maps

PubMed Central

Nikolić, Miloš; Papantonis, Argyris

2017-01-01

Abstract Genome-wide association studies (GWAS) have emerged as a powerful tool to uncover the genetic basis of human common diseases, which often show a complex, polygenic and multi-factorial aetiology. These studies have revealed that 70–90% of all single nucleotide polymorphisms (SNPs) associated with common complex diseases do not occur within genes (i.e. they are non-coding), making the discovery of disease-causative genetic variants and the elucidation of the underlying pathological mechanisms far from straightforward. Based on emerging evidences suggesting that disease-associated SNPs are frequently found within cell type-specific regulatory sequences, here we present GARLIC (GWAS-based Prediction Toolkit for Connecting Diseases and Cell Types), a user-friendly, multi-purpose software with an associated database and online viewer that, using global maps of cis-regulatory elements, can aetiologically connect human diseases with relevant cell types. Additionally, GARLIC can be used to retrieve potential disease-causative genetic variants overlapping regulatory sequences of interest. Overall, GARLIC can satisfy several important needs within the field of medical genetics, thus potentially assisting in the ultimate goal of uncovering the elusive and complex genetic basis of common human disorders. PMID:28007912

A novel method for in silico identification of regulatory SNPs in human genome.

PubMed

Li, Rong; Zhong, Dexing; Liu, Ruiling; Lv, Hongqiang; Zhang, Xinman; Liu, Jun; Han, Jiuqiang

2017-02-21

Regulatory single nucleotide polymorphisms (rSNPs), kind of functional noncoding genetic variants, can affect gene expression in a regulatory way, and they are thought to be associated with increased susceptibilities to complex diseases. Here a novel computational approach to identify potential rSNPs is presented. Different from most other rSNPs finding methods which based on hypothesis that SNPs causing large allele-specific changes in transcription factor binding affinities are more likely to play regulatory functions, we use a set of documented experimentally verified rSNPs and nonfunctional background SNPs to train classifiers, so the discriminating features are found. To characterize variants, an extensive range of characteristics, such as sequence context, DNA structure and evolutionary conservation etc. are analyzed. Support vector machine is adopted to build the classifier model together with an ensemble method to deal with unbalanced data. 10-fold cross-validation result shows that our method can achieve accuracy with sensitivity of ~78% and specificity of ~82%. Furthermore, our method performances better than some other algorithms based on aforementioned hypothesis in handling false positives. The original data and the source matlab codes involved are available at https://sourceforge.net/projects/rsnppredict/. Copyright © 2016 Elsevier Ltd. All rights reserved.
Transterm: a database to aid the analysis of regulatory sequences in mRNAs

PubMed Central

Jacobs, Grant H.; Chen, Augustine; Stevens, Stewart G.; Stockwell, Peter A.; Black, Michael A.; Tate, Warren P.; Brown, Chris M.

2009-01-01

Messenger RNAs, in addition to coding for proteins, may contain regulatory elements that affect how the protein is translated. These include protein and microRNA-binding sites. Transterm (http://mRNA.otago.ac.nz/Transterm.html) is a database of regions and elements that affect translation with two major unique components. The first is integrated results of analysis of general features that affect translation (initiation, elongation, termination) for species or strains in Genbank, processed through a standard pipeline. The second is curated descriptions of experimentally determined regulatory elements that function as translational control elements in mRNAs. Transterm focuses on protein binding sites, particularly those in 3′-untranslated regions (3′-UTR). For this release the interface has been extensively updated based on user feedback. The data is now accessible by strain rather than species, for example there are 10 Escherichia coli strains (genomes) analysed separately. In addition to providing a repository of data, the database also provides tools for users to query their own mRNA sequences. Users can search sequences for Transterm or user defined regulatory elements, including protein or miRNA targets. Transterm also provides a central core of links to related resources for complementary analyses. PMID:18984623
Functional Assessment of Disease-Associated Regulatory Variants In Vivo Using a Versatile Dual Colour Transgenesis Strategy in Zebrafish

PubMed Central

Bhatia, Shipra; Gordon, Christopher T.; Foster, Robert G.; Melin, Lucie; Abadie, Véronique; Baujat, Geneviève; Vazquez, Marie-Paule; Amiel, Jeanne; Lyonnet, Stanislas; van Heyningen, Veronica; Kleinjan, Dirk A.

2015-01-01

Disruption of gene regulation by sequence variation in non-coding regions of the genome is now recognised as a significant cause of human disease and disease susceptibility. Sequence variants in cis-regulatory elements (CREs), the primary determinants of spatio-temporal gene regulation, can alter transcription factor binding sites. While technological advances have led to easy identification of disease-associated CRE variants, robust methods for discerning functional CRE variants from background variation are lacking. Here we describe an efficient dual-colour reporter transgenesis approach in zebrafish, simultaneously allowing detailed in vivo comparison of spatio-temporal differences in regulatory activity between putative CRE variants and assessment of altered transcription factor binding potential of the variant. We validate the method on known disease-associated elements regulating SHH, PAX6 and IRF6 and subsequently characterise novel, ultra-long-range SOX9 enhancers implicated in the craniofacial abnormality Pierre Robin Sequence. The method provides a highly cost-effective, fast and robust approach for simultaneously unravelling in a single assay whether, where and when in embryonic development a disease-associated CRE-variant is affecting its regulatory function. PMID:26030420
CTCF and Cohesin in Genome Folding and Transcriptional Gene Regulation.

PubMed

Merkenschlager, Matthias; Nora, Elphège P

2016-08-31

Genome function, replication, integrity, and propagation rely on the dynamic structural organization of chromosomes during the cell cycle. Genome folding in interphase provides regulatory segmentation for appropriate transcriptional control, facilitates ordered genome replication, and contributes to genome integrity by limiting illegitimate recombination. Here, we review recent high-resolution chromosome conformation capture and functional studies that have informed models of the spatial and regulatory compartmentalization of mammalian genomes, and discuss mechanistic models for how CTCF and cohesin control the functional architecture of mammalian chromosomes.
78 FR 37848 - ASME Code Cases Not Approved for Use

Federal Register 2010, 2011, 2012, 2013, 2014

2013-06-24

...The U.S. Nuclear Regulatory Commission (NRC) is issuing for public comment draft regulatory guide (DG), DG-1233, ``ASME Code Cases not Approved for Use.'' This regulatory guide lists the American Society of Mechanical Engineers (ASME) Code Cases that the NRC has determined not to be acceptable for use on a generic basis.
EPA ACTIVITIES TO PREPARE FOR REGULATORY AND RISK ASSESSMENT APPLICATIONS OF GENOMICS INFORMATION

EPA Science Inventory

Genomics will have significant implications for risk assessment and regulatory decision making. Since 2002, the U.S. EPA has undertaken a number of cross-Agency activities to further prepare itself to receive,interpret and apply genomics information for risk assessment and regul...
Extensive Evolutionary Changes in Regulatory Element Activity during Human Origins Are Associated with Altered Gene Expression and Positive Selection

PubMed Central

Fedrigo, Olivier; Babbitt, Courtney C.; Wortham, Matthew; Tewari, Alok K.; London, Darin; Song, Lingyun; Lee, Bum-Kyu; Iyer, Vishwanath R.; Parker, Stephen C. J.; Margulies, Elliott H.; Wray, Gregory A.; Furey, Terrence S.; Crawford, Gregory E.

2012-01-01

Understanding the molecular basis for phenotypic differences between humans and other primates remains an outstanding challenge. Mutations in non-coding regulatory DNA that alter gene expression have been hypothesized as a key driver of these phenotypic differences. This has been supported by differential gene expression analyses in general, but not by the identification of specific regulatory elements responsible for changes in transcription and phenotype. To identify the genetic source of regulatory differences, we mapped DNaseI hypersensitive (DHS) sites, which mark all types of active gene regulatory elements, genome-wide in the same cell type isolated from human, chimpanzee, and macaque. Most DHS sites were conserved among all three species, as expected based on their central role in regulating transcription. However, we found evidence that several hundred DHS sites were gained or lost on the lineages leading to modern human and chimpanzee. Species-specific DHS site gains are enriched near differentially expressed genes, are positively correlated with increased transcription, show evidence of branch-specific positive selection, and overlap with active chromatin marks. Species-specific sequence differences in transcription factor motifs found within these DHS sites are linked with species-specific changes in chromatin accessibility. Together, these indicate that the regulatory elements identified here are genetic contributors to transcriptional and phenotypic differences among primate species. PMID:22761590
RSAT 2015: Regulatory Sequence Analysis Tools.

PubMed

Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

2015-07-01

RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
TEAD and YAP regulate the enhancer network of human embryonic pancreatic progenitors.

PubMed

Cebola, Inês; Rodríguez-Seguí, Santiago A; Cho, Candy H-H; Bessa, José; Rovira, Meritxell; Luengo, Mario; Chhatriwala, Mariya; Berry, Andrew; Ponsa-Cobas, Joan; Maestro, Miguel Angel; Jennings, Rachel E; Pasquali, Lorenzo; Morán, Ignasi; Castro, Natalia; Hanley, Neil A; Gomez-Skarmeta, Jose Luis; Vallier, Ludovic; Ferrer, Jorge

2015-05-01

The genomic regulatory programmes that underlie human organogenesis are poorly understood. Pancreas development, in particular, has pivotal implications for pancreatic regeneration, cancer and diabetes. We have now characterized the regulatory landscape of embryonic multipotent progenitor cells that give rise to all pancreatic epithelial lineages. Using human embryonic pancreas and embryonic-stem-cell-derived progenitors we identify stage-specific transcripts and associated enhancers, many of which are co-occupied by transcription factors that are essential for pancreas development. We further show that TEAD1, a Hippo signalling effector, is an integral component of the transcription factor combinatorial code of pancreatic progenitor enhancers. TEAD and its coactivator YAP activate key pancreatic signalling mediators and transcription factors, and regulate the expansion of pancreatic progenitors. This work therefore uncovers a central role for TEAD and YAP as signal-responsive regulators of multipotent pancreatic progenitors, and provides a resource for the study of embryonic development of the human pancreas.
Genetic therapy for the nervous system

PubMed Central

Bowers, William J.; Breakefield, Xandra O.; Sena-Esteves, Miguel

2011-01-01

Genetic therapy is undergoing a renaissance with expansion of viral and synthetic vectors, use of oligonucleotides (RNA and DNA) and sequence-targeted regulatory molecules, as well as genetically modified cells, including induced pluripotent stem cells from the patients themselves. Several clinical trials for neurologic syndromes appear quite promising. This review covers genetic strategies to ameliorate neurologic syndromes of different etiologies, including lysosomal storage diseases, Alzheimer's disease and other amyloidopathies, Parkinson's disease, spinal muscular atrophy, amyotrophic lateral sclerosis and brain tumors. This field has been propelled by genetic technologies, including identifying disease genes and disruptive mutations, design of genomic interacting elements to regulate transcription and splicing of specific precursor mRNAs and use of novel non-coding regulatory RNAs. These versatile new tools for manipulation of genetic elements provide the ability to tailor the mode of genetic intervention to specific aspects of a disease state. PMID:21429918
HOXA1 and TALE proteins display cross-regulatory interactions and form a combinatorial binding code on HOXA1 targets.

PubMed

De Kumar, Bony; Parker, Hugo J; Paulson, Ariel; Parrish, Mark E; Pushel, Irina; Singh, Narendra Pratap; Zhang, Ying; Slaughter, Brian D; Unruh, Jay R; Florens, Laurence; Zeitlinger, Julia; Krumlauf, Robb

2017-09-01

Hoxa1 has diverse functional roles in differentiation and development. We identify and characterize properties of regions bound by HOXA1 on a genome-wide basis in differentiating mouse ES cells. HOXA1-bound regions are enriched for clusters of consensus binding motifs for HOX, PBX, and MEIS, and many display co-occupancy of PBX and MEIS. PBX and MEIS are members of the TALE family and genome-wide analysis of multiple TALE members (PBX, MEIS, TGIF, PREP1, and PREP2) shows that nearly all HOXA1 targets display occupancy of one or more TALE members. The combinatorial binding patterns of TALE proteins define distinct classes of HOXA1 targets, which may create functional diversity. Transgenic reporter assays in zebrafish confirm enhancer activities for many HOXA1-bound regions and the importance of HOX-PBX and TGIF motifs for their regulation. Proteomic analyses show that HOXA1 physically interacts on chromatin with PBX, MEIS, and PREP family members, but not with TGIF, suggesting that TGIF may have an independent input into HOXA1-bound regions. Therefore, TALE proteins appear to represent a wide repertoire of HOX cofactors, which may coregulate enhancers through distinct mechanisms. We also discover extensive auto- and cross-regulatory interactions among the Hoxa1 and TALE genes, indicating that the specificity of HOXA1 during development may be regulated though a complex cross-regulatory network of HOXA1 and TALE proteins. This study provides new insight into a regulatory network involving combinatorial interactions between HOXA1 and TALE proteins. © 2017 De Kumar et al.; Published by Cold Spring Harbor Laboratory Press.
Identification, characterization and expression analysis of pigeonpea miRNAs in response to Fusarium wilt.

PubMed

Hussain, Khalid; Mungikar, Kanak; Kulkarni, Abhijeet; Kamble, Avinash

2018-05-05

Upon confrontation with unfavourable conditions, plants invoke a very complex set of biochemical and physiological reactions and alter gene expression patterns to combat the situations. MicroRNAs (miRNAs), a class of small non-coding RNA, contribute extensively in regulation of gene expression through translation inhibition or degradation of their target mRNAs during such conditions. Therefore, identification of miRNAs and their targets holds importance in understanding the regulatory networks triggered during stress. Structure and sequence similarity based in silico prediction of miRNAs in Cajanus cajan L. (Pigeonpea) draft genome sequence has been carried out earlier. These annotations also appear in related GenBank genome sequence entries. However, there are no reports available on context dependent miRNA expression and their targets in pigeonpea. Therefore, in the present study we addressed these questions computationally, using pigeonpea EST sequence information. We identified five novel pigeonpea miRNA precursors, their mature forms and targets. Interestingly, only one of these miRNAs (miR169i-3p) was identified earlier in draft genome sequence. We then validated expression of these miRNAs, experimentally. It was also observed that these miRNAs show differential expression patterns in response to Fusarium inoculation indicating their biotic stress responsive nature. Overall these results will help towards better understanding the regulatory network of defense during pigeonpea -pathogen interactions and role of miRNAs in the process. Copyright © 2018 Elsevier B.V. All rights reserved.
CoryneRegNet: An ontology-based data warehouse of corynebacterial transcription factors and regulatory networks

PubMed Central

Baumbach, Jan; Brinkrolf, Karina; Czaja, Lisa F; Rahmann, Sven; Tauch, Andreas

2006-01-01

Background The application of DNA microarray technology in post-genomic analysis of bacterial genome sequences has allowed the generation of huge amounts of data related to regulatory networks. This data along with literature-derived knowledge on regulation of gene expression has opened the way for genome-wide reconstruction of transcriptional regulatory networks. These large-scale reconstructions can be converted into in silico models of bacterial cells that allow a systematic analysis of network behavior in response to changing environmental conditions. Description CoryneRegNet was designed to facilitate the genome-wide reconstruction of transcriptional regulatory networks of corynebacteria relevant in biotechnology and human medicine. During the import and integration process of data derived from experimental studies or literature knowledge CoryneRegNet generates links to genome annotations, to identified transcription factors and to the corresponding cis-regulatory elements. CoryneRegNet is based on a multi-layered, hierarchical and modular concept of transcriptional regulation and was implemented by using the relational database management system MySQL and an ontology-based data structure. Reconstructed regulatory networks can be visualized by using the yFiles JAVA graph library. As an application example of CoryneRegNet, we have reconstructed the global transcriptional regulation of a cellular module involved in SOS and stress response of corynebacteria. Conclusion CoryneRegNet is an ontology-based data warehouse that allows a pertinent data management of regulatory interactions along with the genome-scale reconstruction of transcriptional regulatory networks. These models can further be combined with metabolic networks to build integrated models of cellular function including both metabolism and its transcriptional regulation. PMID:16478536
Genome-wide colonization of gene regulatory elements by G4 DNA motifs

PubMed Central

Du, Zhuo; Zhao, Yiqiang; Li, Ning

2009-01-01

G-quadruplex (or G4 DNA), a stable four-stranded structure found in guanine-rich regions, is implicated in the transcriptional regulation of genes involved in growth and development. Previous studies on the role of G4 DNA in gene regulation mostly focused on genomic regions proximal to transcription start sites (TSSs). To gain a more comprehensive understanding of the regulatory role of G4 DNA, we examined the landscape of potential G4 DNA (PG4Ms) motifs in the human genome and found that G4 motifs, not restricted to those found in the TSS-proximal regions, are bias toward gene-associated regions. Significantly, analyses of G4 motifs in seven types of well-known gene regulatory elements revealed a constitutive enrichment pattern and the clusters of G4 motifs tend to be colocalized with regulatory elements. Considering our analysis from a genome evolutionary perspective, we found evidence that the occurrence and accumulation of certain progenitors and canonical G4 DNA motifs within regulatory regions were progressively favored by natural selection. Our results suggest that G4 DNA motifs are ‘colonized’ in regulatory regions, supporting a likely genome-wide role of G4 DNA in gene regulation. We hypothesize that G4 DNA is a regulatory apparatus situated in regulatory elements, acting as a molecular switch that can modulate the role of the host functional regions, by transition in DNA structure. PMID:19759215
Mitochondrial genome evolution in the Saccharomyces sensu stricto complex.

PubMed

Ruan, Jiangxing; Cheng, Jian; Zhang, Tongcun; Jiang, Huifeng

2017-01-01

Exploring the evolutionary patterns of mitochondrial genomes is important for our understanding of the Saccharomyces sensu stricto (SSS) group, which is a model system for genomic evolution and ecological analysis. In this study, we first obtained the complete mitochondrial sequences of two important species, Saccharomyces mikatae and Saccharomyces kudriavzevii. We then compared the mitochondrial genomes in the SSS group with those of close relatives, and found that the non-coding regions evolved rapidly, including dramatic expansion of intergenic regions, fast evolution of introns and almost 20-fold higher rearrangement rates than those of the nuclear genomes. However, the coding regions, and especially the protein-coding genes, are more conserved than those in the nuclear genomes of the SSS group. The different evolutionary patterns of coding and non-coding regions in the mitochondrial and nuclear genomes may be related to the origin of the aerobic fermentation lifestyle in this group. Our analysis thus provides novel insights into the evolution of mitochondrial genomes.
Network perturbation by recurrent regulatory variants in cancer

PubMed Central

Cho, Ara; Lee, Insuk; Choi, Jung Kyoon

2017-01-01

Cancer driving genes have been identified as recurrently affected by variants that alter protein-coding sequences. However, a majority of cancer variants arise in noncoding regions, and some of them are thought to play a critical role through transcriptional perturbation. Here we identified putative transcriptional driver genes based on combinatorial variant recurrence in cis-regulatory regions. The identified genes showed high connectivity in the cancer type-specific transcription regulatory network, with high outdegree and many downstream genes, highlighting their causative role during tumorigenesis. In the protein interactome, the identified transcriptional drivers were not as highly connected as coding driver genes but appeared to form a network module centered on the coding drivers. The coding and regulatory variants associated via these interactions between the coding and transcriptional drivers showed exclusive and complementary occurrence patterns across tumor samples. Transcriptional cancer drivers may act through an extensive perturbation of the regulatory network and by altering protein network modules through interactions with coding driver genes. PMID:28333928
New data and an old puzzle: the negative association between schizophrenia and rheumatoid arthritis.

PubMed

Lee, S Hong; Byrne, Enda M; Hultman, Christina M; Kähler, Anna; Vinkhuyzen, Anna A E; Ripke, Stephan; Andreassen, Ole A; Frisell, Thomas; Gusev, Alexander; Hu, Xinli; Karlsson, Robert; Mantzioris, Vasilis X; McGrath, John J; Mehta, Divya; Stahl, Eli A; Zhao, Qiongyi; Kendler, Kenneth S; Sullivan, Patrick F; Price, Alkes L; O'Donovan, Michael; Okada, Yukinori; Mowry, Bryan J; Raychaudhuri, Soumya; Wray, Naomi R; Byerley, William; Cahn, Wiepke; Cantor, Rita M; Cichon, Sven; Cormican, Paul; Curtis, David; Djurovic, Srdjan; Escott-Price, Valentina; Gejman, Pablo V; Georgieva, Lyudmila; Giegling, Ina; Hansen, Thomas F; Ingason, Andrés; Kim, Yunjung; Konte, Bettina; Lee, Phil H; McIntosh, Andrew; McQuillin, Andrew; Morris, Derek W; Nöthen, Markus M; O'Dushlaine, Colm; Olincy, Ann; Olsen, Line; Pato, Carlos N; Pato, Michele T; Pickard, Benjamin S; Posthuma, Danielle; Rasmussen, Henrik B; Rietschel, Marcella; Rujescu, Dan; Schulze, Thomas G; Silverman, Jeremy M; Thirumalai, Srinivasa; Werge, Thomas; Agartz, Ingrid; Amin, Farooq; Azevedo, Maria H; Bass, Nicholas; Black, Donald W; Blackwood, Douglas H R; Bruggeman, Richard; Buccola, Nancy G; Choudhury, Khalid; Cloninger, Robert C; Corvin, Aiden; Craddock, Nicholas; Daly, Mark J; Datta, Susmita; Donohoe, Gary J; Duan, Jubao; Dudbridge, Frank; Fanous, Ayman; Freedman, Robert; Freimer, Nelson B; Friedl, Marion; Gill, Michael; Gurling, Hugh; De Haan, Lieuwe; Hamshere, Marian L; Hartmann, Annette M; Holmans, Peter A; Kahn, René S; Keller, Matthew C; Kenny, Elaine; Kirov, George K; Krabbendam, Lydia; Krasucki, Robert; Lawrence, Jacob; Lencz, Todd; Levinson, Douglas F; Lieberman, Jeffrey A; Lin, Dan-Yu; Linszen, Don H; Magnusson, Patrik K E; Maier, Wolfgang; Malhotra, Anil K; Mattheisen, Manuel; Mattingsdal, Morten; McCarroll, Steven A; Medeiros, Helena; Melle, Ingrid; Milanova, Vihra; Myin-Germeys, Inez; Neale, Benjamin M; Ophoff, Roel A; Owen, Michael J; Pimm, Jonathan; Purcell, Shaun M; Puri, Vinay; Quested, Digby J; Rossin, Lizzy; Ruderfer, Douglas; Sanders, Alan R; Shi, Jianxin; Sklar, Pamela; St Clair, David; Stroup, T Scott; Van Os, Jim; Visscher, Peter M; Wiersma, Durk; Zammit, Stanley; Bridges, S Louis; Choi, Hyon K; Coenen, Marieke J H; de Vries, Niek; Dieud, Philippe; Greenberg, Jeffrey D; Huizinga, Tom W J; Padyukov, Leonid; Siminovitch, Katherine A; Tak, Paul P; Worthington, Jane; De Jager, Philip L; Denny, Joshua C; Gregersen, Peter K; Klareskog, Lars; Mariette, Xavier; Plenge, Robert M; van Laar, Mart; van Riel, Piet

2015-10-01

A long-standing epidemiological puzzle is the reduced rate of rheumatoid arthritis (RA) in those with schizophrenia (SZ) and vice versa. Traditional epidemiological approaches to determine if this negative association is underpinned by genetic factors would test for reduced rates of one disorder in relatives of the other, but sufficiently powered data sets are difficult to achieve. The genomics era presents an alternative paradigm for investigating the genetic relationship between two uncommon disorders. We use genome-wide common single nucleotide polymorphism (SNP) data from independently collected SZ and RA case-control cohorts to estimate the SNP correlation between the disorders. We test a genotype X environment (GxE) hypothesis for SZ with environment defined as winter- vs summer-born. We estimate a small but significant negative SNP-genetic correlation between SZ and RA (-0.046, s.e. 0.026, P = 0.036). The negative correlation was stronger for the SNP set attributed to coding or regulatory regions (-0.174, s.e. 0.071, P = 0.0075). Our analyses led us to hypothesize a gene-environment interaction for SZ in the form of immune challenge. We used month of birth as a proxy for environmental immune challenge and estimated the genetic correlation between winter-born and non-winter born SZ to be significantly less than 1 for coding/regulatory region SNPs (0.56, s.e. 0.14, P = 0.00090). Our results are consistent with epidemiological observations of a negative relationship between SZ and RA reflecting, at least in part, genetic factors. Results of the month of birth analysis are consistent with pleiotropic effects of genetic variants dependent on environmental context.
New data and an old puzzle: the negative association between schizophrenia and rheumatoid arthritis

PubMed Central

Lee, S Hong; Byrne, Enda M; Hultman, Christina M; Kähler, Anna; Vinkhuyzen, Anna AE; Ripke, Stephan; Andreassen, Ole A; Frisell, Thomas; Gusev, Alexander; Hu, Xinli; Karlsson, Robert; Mantzioris, Vasilis X; McGrath, John J; Mehta, Divya; Stahl, Eli A; Zhao, Qiongyi; Kendler, Kenneth S; Sullivan, Patrick F; Price, Alkes L; O’Donovan, Michael; Okada, Yukinori; Mowry, Bryan J; Raychaudhuri, Soumya; Wray, Naomi R; Byerley, William; Cahn, Wiepke; Cantor, Rita M; Cichon, Sven; Cormican, Paul; Curtis, David; Djurovic, Srdjan; Escott-Price, Valentina; Gejman, Pablo V; Georgieva, Lyudmila; Giegling, Ina; Hansen, Thomas F; Ingason, Andrés; Kim, Yunjung; Konte, Bettina; Lee, Phil H; McIntosh, Andrew; McQuillin, Andrew; Morris, Derek W; Nöthen, Markus M; O’Dushlaine, Colm; Olincy, Ann; Olsen, Line; Pato, Carlos N; Pato, Michele T; Pickard, Benjamin S; Posthuma, Danielle; Rasmussen, Henrik B; Rietschel, Marcella; Rujescu, Dan; Schulze, Thomas G; Silverman, Jeremy M; Thirumalai, Srinivasa; Werge, Thomas; Agartz, Ingrid; Amin, Farooq; Azevedo, Maria H; Bass, Nicholas; Black, Donald W; Blackwood, Douglas H R; Bruggeman, Richard; Buccola, Nancy G; Choudhury, Khalid; Cloninger, Robert C; Corvin, Aiden; Craddock, Nicholas; Daly, Mark J; Datta, Susmita; Donohoe, Gary J; Duan, Jubao; Dudbridge, Frank; Fanous, Ayman; Freedman, Robert; Freimer, Nelson B; Friedl, Marion; Gill, Michael; Gurling, Hugh; De Haan, Lieuwe; Hamshere, Marian L; Hartmann, Annette M; Holmans, Peter A; Kahn, René S; Keller, Matthew C; Kenny, Elaine; Kirov, George K; Krabbendam, Lydia; Krasucki, Robert; Lawrence, Jacob; Lencz, Todd; Levinson, Douglas F; Lieberman, Jeffrey A; Lin, Dan-Yu; Linszen, Don H; Magnusson, Patrik KE; Maier, Wolfgang; Malhotra, Anil K; Mattheisen, Manuel; Mattingsdal, Morten; McCarroll, Steven A; Medeiros, Helena; Melle, Ingrid; Milanova, Vihra; Myin-Germeys, Inez; Neale, Benjamin M; Ophoff, Roel A; Owen, Michael J; Pimm, Jonathan; Purcell, Shaun M; Puri, Vinay; Quested, Digby J; Rossin, Lizzy; Ruderfer, Douglas; Sanders, Alan R; Shi, Jianxin; Sklar, Pamela; St. Clair, David; Stroup, T Scott; Van Os, Jim; Visscher, Peter M; Wiersma, Durk; Zammit, Stanley; Bridges, S Louis; Choi, Hyon K; Coenen, Marieke JH; de Vries, Niek; Dieud, Philippe; Greenberg, Jeffrey D; Huizinga, Tom WJ; Padyukov, Leonid; Siminovitch, Katherine A; Tak, Paul P; Worthington, Jane; De Jager, Philip L; Denny, Joshua C; Gregersen, Peter K; Klareskog, Lars; Mariette, Xavier; Plenge, Robert M; van Laar, Mart; van Riel, Piet

2015-01-01

Background: A long-standing epidemiological puzzle is the reduced rate of rheumatoid arthritis (RA) in those with schizophrenia (SZ) and vice versa. Traditional epidemiological approaches to determine if this negative association is underpinned by genetic factors would test for reduced rates of one disorder in relatives of the other, but sufficiently powered data sets are difficult to achieve. The genomics era presents an alternative paradigm for investigating the genetic relationship between two uncommon disorders. Methods: We use genome-wide common single nucleotide polymorphism (SNP) data from independently collected SZ and RA case-control cohorts to estimate the SNP correlation between the disorders. We test a genotype X environment (GxE) hypothesis for SZ with environment defined as winter- vs summer-born. Results: We estimate a small but significant negative SNP-genetic correlation between SZ and RA (−0.046, s.e. 0.026, P = 0.036). The negative correlation was stronger for the SNP set attributed to coding or regulatory regions (−0.174, s.e. 0.071, P = 0.0075). Our analyses led us to hypothesize a gene-environment interaction for SZ in the form of immune challenge. We used month of birth as a proxy for environmental immune challenge and estimated the genetic correlation between winter-born and non-winter born SZ to be significantly less than 1 for coding/regulatory region SNPs (0.56, s.e. 0.14, P = 0.00090). Conclusions: Our results are consistent with epidemiological observations of a negative relationship between SZ and RA reflecting, at least in part, genetic factors. Results of the month of birth analysis are consistent with pleiotropic effects of genetic variants dependent on environmental context. PMID:26286434
Expanded subgenomic mRNA transcriptome and coding capacity of a nidovirus

PubMed Central

Di, Han; Madden, Joseph C.; Morantz, Esther K.; Tang, Hsin-Yao; Graham, Rachel L.; Baric, Ralph S.

2017-01-01

Members of the order Nidovirales express their structural protein ORFs from a nested set of 3′ subgenomic mRNAs (sg mRNAs), and for most of these ORFs, a single genomic transcription regulatory sequence (TRS) was identified. Nine TRSs were previously reported for the arterivirus Simian hemorrhagic fever virus (SHFV). In the present study, which was facilitated by next-generation sequencing, 96 SHFV body TRSs were identified that were functional in both infected MA104 cells and macaque macrophages. The abundance of sg mRNAs produced from individual TRSs was consistent over time in the two different cell types. Most of the TRSs are located in the genomic 3′ region, but some are in the 5′ ORF1a/1b region and provide alternative sources of nonstructural proteins. Multiple functional TRSs were identified for the majority of the SHFV 3′ ORFs, and four previously identified TRSs were found not to be the predominant ones used. A third of the TRSs generated sg mRNAs with variant leader–body junction sequences. Sg mRNAs encoding E′, GP2, or ORF5a as their 5′ ORF as well as sg mRNAs encoding six previously unreported alternative frame ORFs or 14 previously unreported C-terminal ORFs of known proteins were also identified. Mutation of the start codon of two C-terminal ORFs in an infectious clone reduced virus yield. Mass spectrometry detected one previously unreported protein and suggested translation of some of the C-terminal ORFs. The results reveal the complexity of the transcriptional regulatory mechanism and expanded coding capacity for SHFV, which may also be characteristic of other nidoviruses. PMID:29073030
Moving through the Stressed Genome: Emerging Regulatory Roles for Transposons in Plant Stress Response.

PubMed

Negi, Pooja; Rai, Archana N; Suprasanna, Penna

2016-01-01

The recognition of a positive correlation between organism genome size with its transposable element (TE) content, represents a key discovery of the field of genome biology. Considerable evidence accumulated since then suggests the involvement of TEs in genome structure, evolution and function. The global genome reorganization brought about by transposon activity might play an adaptive/regulatory role in the host response to environmental challenges, reminiscent of McClintock's original 'Controlling Element' hypothesis. This regulatory aspect of TEs is also garnering support in light of the recent evidences, which project TEs as "distributed genomic control modules." According to this view, TEs are capable of actively reprogramming host genes circuits and ultimately fine-tuning the host response to specific environmental stimuli. Moreover, the stress-induced changes in epigenetic status of TE activity may allow TEs to propagate their stress responsive elements to host genes; the resulting genome fluidity can permit phenotypic plasticity and adaptation to stress. Given their predominating presence in the plant genomes, nested organization in the genic regions and potential regulatory role in stress response, TEs hold unexplored potential for crop improvement programs. This review intends to present the current information about the roles played by TEs in plant genome organization, evolution, and function and highlight the regulatory mechanisms in plant stress responses. We will also briefly discuss the connection between TE activity, host epigenetic response and phenotypic plasticity as a critical link for traversing the translational bridge from a purely basic study of TEs, to the applied field of stress adaptation and crop improvement.

Moving through the Stressed Genome: Emerging Regulatory Roles for Transposons in Plant Stress Response

PubMed Central

Negi, Pooja; Rai, Archana N.; Suprasanna, Penna

2016-01-01

The recognition of a positive correlation between organism genome size with its transposable element (TE) content, represents a key discovery of the field of genome biology. Considerable evidence accumulated since then suggests the involvement of TEs in genome structure, evolution and function. The global genome reorganization brought about by transposon activity might play an adaptive/regulatory role in the host response to environmental challenges, reminiscent of McClintock's original ‘Controlling Element’ hypothesis. This regulatory aspect of TEs is also garnering support in light of the recent evidences, which project TEs as “distributed genomic control modules.” According to this view, TEs are capable of actively reprogramming host genes circuits and ultimately fine-tuning the host response to specific environmental stimuli. Moreover, the stress-induced changes in epigenetic status of TE activity may allow TEs to propagate their stress responsive elements to host genes; the resulting genome fluidity can permit phenotypic plasticity and adaptation to stress. Given their predominating presence in the plant genomes, nested organization in the genic regions and potential regulatory role in stress response, TEs hold unexplored potential for crop improvement programs. This review intends to present the current information about the roles played by TEs in plant genome organization, evolution, and function and highlight the regulatory mechanisms in plant stress responses. We will also briefly discuss the connection between TE activity, host epigenetic response and phenotypic plasticity as a critical link for traversing the translational bridge from a purely basic study of TEs, to the applied field of stress adaptation and crop improvement. PMID:27777577
A CRISPR Path to Engineering New Genetic Mouse Models for Cardiovascular Research.

PubMed

Miano, Joseph M; Zhu, Qiuyu Martin; Lowenstein, Charles J

2016-06-01

Previous efforts to target the mouse genome for the addition, subtraction, or substitution of biologically informative sequences required complex vector design and a series of arduous steps only a handful of laboratories could master. The facile and inexpensive clustered regularly interspaced short palindromic repeats (CRISPR) method has now superseded traditional means of genome modification such that virtually any laboratory can quickly assemble reagents for developing new mouse models for cardiovascular research. Here, we briefly review the history of CRISPR in prokaryotes, highlighting major discoveries leading to its formulation for genome modification in the animal kingdom. Core components of CRISPR technology are reviewed and updated. Practical pointers for 2-component and 3-component CRISPR editing are summarized with many applications in mice including frameshift mutations, deletion of enhancers and noncoding genes, nucleotide substitution of protein-coding and gene regulatory sequences, incorporation of loxP sites for conditional gene inactivation, and epitope tag integration. Genotyping strategies are presented and topics of genetic mosaicism and inadvertent targeting discussed. Finally, clinical applications and ethical considerations are addressed as the biomedical community eagerly embraces this astonishing innovation in genome editing to tackle previously intractable questions. © 2016 American Heart Association, Inc.
Applications of CRISPR genome editing technology in drug target identification and validation.

PubMed

Lu, Quinn; Livi, George P; Modha, Sundip; Yusa, Kosuke; Macarrón, Ricardo; Dow, David J

2017-06-01

The analysis of pharmaceutical industry data indicates that the major reason for drug candidates failing in late stage clinical development is lack of efficacy, with a high proportion of these due to erroneous hypotheses about target to disease linkage. More than ever, there is a requirement to better understand potential new drug targets and their role in disease biology in order to reduce attrition in drug development. Genome editing technology enables precise modification of individual protein coding genes, as well as noncoding regulatory sequences, enabling the elucidation of functional effects in human disease relevant cellular systems. Areas covered: This article outlines applications of CRISPR genome editing technology in target identification and target validation studies. Expert opinion: Applications of CRISPR technology in target validation studies are in evidence and gaining momentum. Whilst technical challenges remain, we are on the cusp of CRISPR being applied in complex cell systems such as iPS derived differentiated cells and stem cell derived organoids. In the meantime, our experience to date suggests that precise genome editing of putative targets in primary cell systems is possible, offering more human disease relevant systems than conventional cell lines.
Systems analysis of cis-regulatory motifs in C4 photosynthesis genes using maize and rice leaf transcriptomic data during a process of de-etiolation

PubMed Central

Xu, Jiajia; Bräutigam, Andrea; Weber, Andreas P. M.; Zhu, Xin-Guang

2016-01-01

Identification of potential cis-regulatory motifs controlling the development of C4 photosynthesis is a major focus of current research. In this study, we used time-series RNA-seq data collected from etiolated maize and rice leaf tissues sampled during a de-etiolation process to systematically characterize the expression patterns of C4-related genes and to further identify potential cis elements in five different genomic regions (i.e. promoter, 5′UTR, 3′UTR, intron, and coding sequence) of C4 orthologous genes. The results demonstrate that although most of the C4 genes show similar expression patterns, a number of them, including chloroplast dicarboxylate transporter 1, aspartate aminotransferase, and triose phosphate transporter, show shifted expression patterns compared with their C3 counterparts. A number of conserved short DNA motifs between maize C4 genes and their rice orthologous genes were identified not only in the promoter, 5′UTR, 3′UTR, and coding sequences, but also in the introns of core C4 genes. We also identified cis-regulatory motifs that exist in maize C4 genes and also in genes showing similar expression patterns as maize C4 genes but that do not exist in rice C3 orthologs, suggesting a possible recruitment of pre-existing cis-elements from genes unrelated to C4 photosynthesis into C4 photosynthesis genes during C4 evolution. PMID:27436282
Identification of Small RNAs in Desulfovibrio vulgaris Hildenborough

DOE Office of Scientific and Technical Information (OSTI.GOV)

Burns, Andrew; Joachimiak, Marcin; Deutschbauer, Adam

2010-05-17

Desulfovibrio vulgaris is an anaerobic sulfate-reducing bacterium capable of facilitating the removal of toxic metals such as uranium from contaminated sites via reduction. As such, it is essential to understand the intricate regulatory cascades involved in how D. vulgaris and its relatives respond to stressors in such sites. One approach is the identification and analysis of small non-coding RNAs (sRNAs); molecules ranging in size from 20-200 nucleotides that predominantly affect gene regulation by binding to complementary mRNA in an anti-sense fashion and therefore provide an immediate regulatory response. To identify sRNAs in D. vulgaris, a bacterium that does not possessmore » an annotated hfq gene, RNA was pooled from stationary and exponential phases, nitrate exposure, and biofilm conditions. The subsequent RNA was size fractionated, modified, and converted to cDNA for high throughput transcriptomic deep sequencing. A computational approach to identify sRNAs via the alignment of seven separate Desulfovibrio genomes was also performed. From the deep sequencing analysis, 2,296 reads between 20 and 250 nt were identified with expression above genome background. Analysis of those reads limited the number of candidates to ~;;87 intergenic, while ~;;140 appeared to be antisense to annotated open reading frames (ORFs). Further BLAST analysis of the intergenic candidates and other Desulfovibrio genomes indicated that eight candidates were likely portions of ORFs not previously annotated in the D. vulgaris genome. Comparison of the intergenic and antisense data sets to the bioinformatical predicted candidates, resulted in ~;;54 common candidates. Current approaches using Northern analysis and qRT-PCR are being used toverify expression of the candidates and to further develop the role these sRNAs play in D. vulgaris regulation.« less
Sensitive Periods in Epigenetics: bringing us closer to complex behavioral phenotypes

PubMed Central

Nagy, Corina; Turecki, Gustavo

2017-01-01

Genetic studies have attempted to elucidate causal mechanisms for the development of complex disease but genome-wide associations have been largely unsuccessful in establishing these links. As an alternative link between genes and disease, recent efforts have focused on mechanisms that alter the function of genes without altering the underlying DNA sequence. Known as epigenetic mechanisms, these include: DNA methylation, chromatin conformational changes through histone modifications, non-coding RNAs, and most recently, 5-hydroxymethylcytosine. Though DNA methylation is involved in normal development, aging and gene regulation, altered methylation patterns have been associated with disease. It is generally believed that early life constitutes a period during which there is increased sensitivity to the regulatory effects of epigenetic mechanisms. The purpose of this review is to outline the contribution of epigenetic mechanisms to genomic function, particularly in the development of complex behavioral phenotypes, focusing on the sensitive periods. PMID:22920183
Bioinformatic analysis of microRNA biogenesis and function related proteins in eleven animal genomes.

PubMed

Liu, Xiuying; Luo, GuanZheng; Bai, Xiujuan; Wang, Xiu-Jie

2009-10-01

MicroRNAs are approximately 22 nt long small non-coding RNAs that play important regulatory roles in eukaryotes. The biogenesis and functional processes of microRNAs require the participation of many proteins, of which, the well studied ones are Dicer, Drosha, Argonaute and Exportin 5. To systematically study these four protein families, we screened 11 animal genomes to search for genes encoding above mentioned proteins, and identified some new members for each family. Domain analysis results revealed that most proteins within the same family share identical or similar domains. Alternative spliced transcript variants were found for some proteins. We also examined the expression patterns of these proteins in different human tissues and identified other proteins that could potentially interact with these proteins. These findings provided systematic information on the four key proteins involved in microRNA biogenesis and functional pathways in animals, and will shed light on further functional studies of these proteins.
Genome-wide network of regulatory genes for construction of a chordate embryo.

PubMed

Shoguchi, Eiichi; Hamaguchi, Makoto; Satoh, Nori

2008-04-15

Animal development is controlled by gene regulation networks that are composed of sequence-specific transcription factors (TF) and cell signaling molecules (ST). Although housekeeping genes have been reported to show clustering in the animal genomes, whether the genes comprising a given regulatory network are physically clustered on a chromosome is uncertain. We examined this question in the present study. Ascidians are the closest living relatives of vertebrates, and their tadpole-type larva represents the basic body plan of chordates. The Ciona intestinalis genome contains 390 core TF genes and 119 major ST genes. Previous gene disruption assays led to the formulation of a basic chordate embryonic blueprint, based on over 3000 genetic interactions among 79 zygotic regulatory genes. Here, we mapped the regulatory genes, including all 79 regulatory genes, on the 14 pairs of Ciona chromosomes by fluorescent in situ hybridization (FISH). Chromosomal localization of upstream and downstream regulatory genes demonstrates that the components of coherent developmental gene networks are evenly distributed over the 14 chromosomes. Thus, this study provides the first comprehensive evidence that the physical clustering of regulatory genes, or their target genes, is not relevant for the genome-wide control of gene expression during development.
Regulatory RNA-assisted genome engineering in microorganisms.

PubMed

Si, Tong; HamediRad, Mohammad; Zhao, Huimin

2015-12-01

Regulatory RNAs are increasingly recognized and utilized as key modulators of gene expression in diverse organisms. Thanks to their modular and programmable nature, trans-acting regulatory RNAs are especially attractive in genome-scale applications. Here we discuss the recent examples in microbial genome engineering implementing various trans-acting RNA platforms, including sRNA, RNAi, asRNA and CRISRP-Cas. In particular, we focus on how the scalable and multiplex nature of trans-acting RNAs has been used to tackle the challenges in creating genome-wide and combinatorial diversity for functional genomics and metabolic engineering applications. Advances in computational design and context-dependent regulation are also discussed for their contribution in improving fine-tuning capabilities of trans-acting RNAs. Copyright © 2015 Elsevier Ltd. All rights reserved.
Comparative Genomics in Drosophila.

PubMed

Oti, Martin; Pane, Attilio; Sammeth, Michael

2018-01-01

Since the pioneering studies of Thomas Hunt Morgan and coworkers at the dawn of the twentieth century, Drosophila melanogaster and its sister species have tremendously contributed to unveil the rules underlying animal genetics, development, behavior, evolution, and human disease. Recent advances in DNA sequencing technologies launched Drosophila into the post-genomic era and paved the way for unprecedented comparative genomics investigations. The complete sequencing and systematic comparison of the genomes from 12 Drosophila species represents a milestone achievement in modern biology, which allowed a plethora of different studies ranging from the annotation of known and novel genomic features to the evolution of chromosomes and, ultimately, of entire genomes. Despite the efforts of countless laboratories worldwide, the vast amount of data that were produced over the past 15 years is far from being fully explored.In this chapter, we will review some of the bioinformatic approaches that were developed to interrogate the genomes of the 12 Drosophila species. Setting off from alignments of the entire genomic sequences, the degree of conservation can be separately evaluated for every region of the genome, providing already first hints about elements that are under purifying selection and therefore likely functional. Furthermore, the careful analysis of repeated sequences sheds light on the evolutionary dynamics of transposons, an enigmatic and fascinating class of mobile elements housed in the genomes of animals and plants. Comparative genomics also aids in the computational identification of the transcriptionally active part of the genome, first and foremost of protein-coding loci, but also of transcribed nevertheless apparently noncoding regions, which were once considered "junk" DNA. Eventually, the synergy between functional and comparative genomics also facilitates in silico and in vivo studies on cis-acting regulatory elements, like transcription factor binding sites, that due to the high degree of sequence variability usually impose increased challenges for bioinformatics approaches.
StereoGene: rapid estimation of genome-wide correlation of continuous or interval feature data.

PubMed

Stavrovskaya, Elena D; Niranjan, Tejasvi; Fertig, Elana J; Wheelan, Sarah J; Favorov, Alexander V; Mironov, Andrey A

2017-10-15

Genomics features with similar genome-wide distributions are generally hypothesized to be functionally related, for example, colocalization of histones and transcription start sites indicate chromatin regulation of transcription factor activity. Therefore, statistical algorithms to perform spatial, genome-wide correlation among genomic features are required. Here, we propose a method, StereoGene, that rapidly estimates genome-wide correlation among pairs of genomic features. These features may represent high-throughput data mapped to reference genome or sets of genomic annotations in that reference genome. StereoGene enables correlation of continuous data directly, avoiding the data binarization and subsequent data loss. Correlations are computed among neighboring genomic positions using kernel correlation. Representing the correlation as a function of the genome position, StereoGene outputs the local correlation track as part of the analysis. StereoGene also accounts for confounders such as input DNA by partial correlation. We apply our method to numerous comparisons of ChIP-Seq datasets from the Human Epigenome Atlas and FANTOM CAGE to demonstrate its wide applicability. We observe the changes in the correlation between epigenomic features across developmental trajectories of several tissue types consistent with known biology and find a novel spatial correlation of CAGE clusters with donor splice sites and with poly(A) sites. These analyses provide examples for the broad applicability of StereoGene for regulatory genomics. The StereoGene C ++ source code, program documentation, Galaxy integration scripts and examples are available from the project homepage http://stereogene.bioinf.fbb.msu.ru/. favorov@sensi.org. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
The pig genome project has plenty to squeal about.

PubMed

Fan, B; Gorbach, D M; Rothschild, M F

2011-01-01

Significant progress on pig genetics and genomics research has been witnessed in recent years due to the integration of advanced molecular biology techniques, bioinformatics and computational biology, and the collaborative efforts of researchers in the swine genomics community. Progress on expanding the linkage map has slowed down, but the efforts have created a higher-resolution physical map integrating the clone map and BAC end sequence. The number of QTL mapped is still growing and most of the updated QTL mapping results are available through PigQTLdb. Additionally, expression studies using high-throughput microarrays and other gene expression techniques have made significant advancements. The number of identified non-coding RNAs is rapidly increasing and their exact regulatory functions are being explored. A publishable draft (build 10) of the swine genome sequence was available for the pig genomics community by the end of December 2010. Build 9 of the porcine genome is currently available with Ensembl annotation; manual annotation is ongoing. These drafts provide useful tools for such endeavors as comparative genomics and SNP scans for fine QTL mapping. A recent community-wide effort to create a 60K porcine SNP chip has greatly facilitated whole-genome association analyses, haplotype block construction and linkage disequilibrium mapping, which can contribute to whole-genome selection. The future 'systems biology' that integrates and optimizes the information from all research levels can enhance the pig community's understanding of the full complexity of the porcine genome. These recent technological advances and where they may lead are reviewed. Copyright © 2011 S. Karger AG, Basel.
Genome sequencing and comparative genomics of honey bee microsporidia, Nosema apis reveal novel insights into host-parasite interactions.

PubMed

Chen, Yan ping; Pettis, Jeffery S; Zhao, Yan; Liu, Xinyue; Tallon, Luke J; Sadzewicz, Lisa D; Li, Renhua; Zheng, Huoqing; Huang, Shaokang; Zhang, Xuan; Hamilton, Michele C; Pernal, Stephen F; Melathopoulos, Andony P; Yan, Xianghe; Evans, Jay D

2013-07-05

The microsporidia parasite Nosema contributes to the steep global decline of honey bees that are critical pollinators of food crops. There are two species of Nosema that have been found to infect honey bees, Nosema apis and N. ceranae. Genome sequencing of N. apis and comparative genome analysis with N. ceranae, a fully sequenced microsporidia species, reveal novel insights into host-parasite interactions underlying the parasite infections. We applied the whole-genome shotgun sequencing approach to sequence and assemble the genome of N. apis which has an estimated size of 8.5 Mbp. We predicted 2,771 protein- coding genes and predicted the function of each putative protein using the Gene Ontology. The comparative genomic analysis led to identification of 1,356 orthologs that are conserved between the two Nosema species and genes that are unique characteristics of the individual species, thereby providing a list of virulence factors and new genetic tools for studying host-parasite interactions. We also identified a highly abundant motif in the upstream promoter regions of N. apis genes. This motif is also conserved in N. ceranae and other microsporidia species and likely plays a role in gene regulation across the microsporidia. The availability of the N. apis genome sequence is a significant addition to the rapidly expanding body of microsprodian genomic data which has been improving our understanding of eukaryotic genome diversity and evolution in a broad sense. The predicted virulent genes and transcriptional regulatory elements are potential targets for innovative therapeutics to break down the life cycle of the parasite.
Mapping cis-Regulatory Domains in the Human Genome UsingMulti-Species Conservation of Synteny

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ahituv, Nadav; Prabhakar, Shyam; Poulin, Francis

2005-06-13

Our inability to associate distant regulatory elements with the genes that they regulate has largely precluded their examination for sequence alterations contributing to human disease. One major obstacle is the large genomic space surrounding targeted genes in which such elements could potentially reside. In order to delineate gene regulatory boundaries we used whole-genome human-mouse-chicken (HMC) and human-mouse-frog (HMF) multiple alignments to compile conserved blocks of synteny (CBS), under the hypothesis that these blocks have been kept intact throughout evolution at least in part by the requirement of regulatory elements to stay linked to the genes that they regulate. A totalmore » of 2,116 and 1,942 CBS>200 kb were assembled for HMC and HMF respectively, encompassing 1.53 and 0.86 Gb of human sequence. To support the existence of complex long-range regulatory domains within these CBS we analyzed the prevalence and distribution of chromosomal aberrations leading to position effects (disruption of a genes regulatory environment), observing a clear bias not only for mapping onto CBS but also for longer CBS size. Our results provide a genome wide data set characterizing the regulatory domains of genes and the conserved regulatory elements within them.« less
Complete sequence and analysis of the mitochondrial genome of Hemiselmis andersenii CCMP644 (Cryptophyceae).

PubMed

Kim, Eunsoo; Lane, Christopher E; Curtis, Bruce A; Kozera, Catherine; Bowman, Sharen; Archibald, John M

2008-05-12

Cryptophytes are an enigmatic group of unicellular eukaryotes with plastids derived by secondary (i.e., eukaryote-eukaryote) endosymbiosis. Cryptophytes are unusual in that they possess four genomes-a host cell-derived nuclear and mitochondrial genome and an endosymbiont-derived plastid and 'nucleomorph' genome. The evolutionary origins of the host and endosymbiont components of cryptophyte algae are at present poorly understood. Thus far, a single complete mitochondrial genome sequence has been determined for the cryptophyte Rhodomonas salina. Here, the second complete mitochondrial genome of the cryptophyte alga Hemiselmis andersenii CCMP644 is presented. The H. andersenii mtDNA is 60,553 bp in size and encodes 30 structural RNAs and 36 protein-coding genes, all located on the same strand. A prominent feature of the genome is the presence of a approximately 20 Kbp long intergenic region comprised of numerous tandem and dispersed repeat units of between 22-336 bp. Adjacent to these repeats are 27 copies of palindromic sequences predicted to form stable DNA stem-loop structures. One such stem-loop is located near a GC-rich and GC-poor region and may have a regulatory function in replication or transcription. The H. andersenii mtDNA shares a number of features in common with the genome of the cryptophyte Rhodomonas salina, including general architecture, gene content, and the presence of a large repeat region. However, the H. andersenii mtDNA is devoid of inverted repeats and introns, which are present in R. salina. Comparative analyses of the suite of tRNAs encoded in the two genomes reveal that the H. andersenii mtDNA has lost or converted its original trnK(uuu) gene and possesses a trnS-derived 'trnK(uuu)', which appears unable to produce a functional tRNA. Mitochondrial protein coding gene phylogenies strongly support a variety of previously established eukaryotic groups, but fail to resolve the relationships among higher-order eukaryotic lineages. Comparison of the H. andersenii and R. salina mitochondrial genomes reveals a number of cryptophyte-specific genomic features, most notably the presence of a large repeat-rich intergenic region. However, unlike R. salina, the H. andersenii mtDNA does not possess introns and lacks a Lys-tRNA, which is presumably imported from the cytosol.
A High-Resolution InDel (Insertion–Deletion) Markers-Anchored Consensus Genetic Map Identifies Major QTLs Governing Pod Number and Seed Yield in Chickpea

PubMed Central

Srivastava, Rishi; Singh, Mohar; Bajaj, Deepak; Parida, Swarup K.

2016-01-01

Development and large-scale genotyping of user-friendly informative genome/gene-derived InDel markers in natural and mapping populations is vital for accelerating genomics-assisted breeding applications of chickpea with minimal resource expenses. The present investigation employed a high-throughput whole genome next-generation resequencing strategy in low and high pod number parental accessions and homozygous individuals constituting the bulks from each of two inter-specific mapping populations [(Pusa 1103 × ILWC 46) and (Pusa 256 × ILWC 46)] to develop non-erroneous InDel markers at a genome-wide scale. Comparing these high-quality genomic sequences, 82,360 InDel markers with reference to kabuli genome and 13,891 InDel markers exhibiting differentiation between low and high pod number parental accessions and bulks of aforementioned mapping populations were developed. These informative markers were structurally and functionally annotated in diverse coding and non-coding sequence components of genome/genes of kabuli chickpea. The functional significance of regulatory and coding (frameshift and large-effect mutations) InDel markers for establishing marker-trait linkages through association/genetic mapping was apparent. The markers detected a greater amplification (97%) and intra-specific polymorphic potential (58–87%) among a diverse panel of cultivated desi, kabuli, and wild accessions even by using a simpler cost-efficient agarose gel-based assay implicating their utility in large-scale genetic analysis especially in domesticated chickpea with narrow genetic base. Two high-density inter-specific genetic linkage maps generated using aforesaid mapping populations were integrated to construct a consensus 1479 InDel markers-anchored high-resolution (inter-marker distance: 0.66 cM) genetic map for efficient molecular mapping of major QTLs governing pod number and seed yield per plant in chickpea. Utilizing these high-density genetic maps as anchors, three major genomic regions harboring each of pod number and seed yield robust QTLs (15–28% phenotypic variation explained) were identified on chromosomes 2, 4, and 6. The integration of genetic and physical maps at these QTLs mapped on chromosomes scaled-down the long major QTL intervals into high-resolution short pod number and seed yield robust QTL physical intervals (0.89–2.94 Mb) which were essentially got validated in multiple genetic backgrounds of two chickpea mapping populations. The genome-wide InDel markers including natural allelic variants and genomic loci/genes delineated at major six especially in one colocalized novel congruent robust pod number and seed yield robust QTLs mapped on a high-density consensus genetic map were found most promising in chickpea. These functionally relevant molecular tags can drive marker-assisted genetic enhancement to develop high-yielding cultivars with increased seed/pod number and yield in chickpea. PMID:27695461
HIPAA's Individual Right of Access to Genomic Data: Reconciling Safety and Civil Rights.

PubMed

Evans, Barbara J

2018-01-04

In 2014, the United States granted individuals a right of access to their own laboratory test results, including genomic data. Many observers feel that this right is in tension with regulatory and bioethical standards designed to protect the safety of people who undergo genomic testing. This commentary attributes this tension to growing pains within an expanding federal regulatory program for genetic and genomic testing. The Genetic Information Nondiscrimination Act of 2008 expanded the regulatory agenda to encompass civil rights and consumer safety. The individual access right, as it applies to genomic data, is best understood as a civil-rights regulation. Competing regulatory objectives-safety and civil rights-were not successfully integrated during the initial rollout of genomic civil-rights regulations after 2008. Federal law clarifies how to prioritize safety and civil rights when the two come into conflict, although with careful policy design, the two need not collide. This commentary opens a dialog about possible solutions to advance safety and civil rights together. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Dose-sensitivity, conserved non-coding sequences, and duplicate gene retention through multiple tetraploidies in the grasses.

PubMed

Schnable, James C; Pedersen, Brent S; Subramaniam, Sabarinath; Freeling, Michael

2011-01-01

Whole genome duplications, or tetraploidies, are an important source of increased gene content. Following whole genome duplication, duplicate copies of many genes are lost from the genome. This loss of genes is biased both in the classes of genes deleted and the subgenome from which they are lost. Many or all classes are genes preferentially retained as duplicate copies are engaged in dose sensitive protein-protein interactions, such that deletion of any one duplicate upsets the status quo of subunit concentrations, and presumably lowers fitness as a result. Transcription factors are also preferentially retained following every whole genome duplications studied. This has been explained as a consequence of protein-protein interactions, just as for other highly retained classes of genes. We show that the quantity of conserved noncoding sequences (CNSs) associated with genes predicts the likelihood of their retention as duplicate pairs following whole genome duplication. As many CNSs likely represent binding sites for transcriptional regulators, we propose that the likelihood of gene retention following tetraploidy may also be influenced by dose-sensitive protein-DNA interactions between the regulatory regions of CNS-rich genes - nicknamed bigfoot genes - and the proteins that bind to them. Using grass genomes, we show that differential loss of CNSs from one member of a pair following the pre-grass tetraploidy reduces its chance of retention in the subsequent maize lineage tetraploidy.
Dose–Sensitivity, Conserved Non-Coding Sequences, and Duplicate Gene Retention Through Multiple Tetraploidies in the Grasses

PubMed Central

Schnable, James C.; Pedersen, Brent S.; Subramaniam, Sabarinath; Freeling, Michael

2011-01-01

Whole genome duplications, or tetraploidies, are an important source of increased gene content. Following whole genome duplication, duplicate copies of many genes are lost from the genome. This loss of genes is biased both in the classes of genes deleted and the subgenome from which they are lost. Many or all classes are genes preferentially retained as duplicate copies are engaged in dose sensitive protein–protein interactions, such that deletion of any one duplicate upsets the status quo of subunit concentrations, and presumably lowers fitness as a result. Transcription factors are also preferentially retained following every whole genome duplications studied. This has been explained as a consequence of protein–protein interactions, just as for other highly retained classes of genes. We show that the quantity of conserved noncoding sequences (CNSs) associated with genes predicts the likelihood of their retention as duplicate pairs following whole genome duplication. As many CNSs likely represent binding sites for transcriptional regulators, we propose that the likelihood of gene retention following tetraploidy may also be influenced by dose–sensitive protein–DNA interactions between the regulatory regions of CNS-rich genes – nicknamed bigfoot genes – and the proteins that bind to them. Using grass genomes, we show that differential loss of CNSs from one member of a pair following the pre-grass tetraploidy reduces its chance of retention in the subsequent maize lineage tetraploidy. PMID:22645525
Antisense transcription is pervasive but rarely conserved in enteric bacteria.

PubMed

Raghavan, Rahul; Sloan, Daniel B; Ochman, Howard

2012-01-01

Noncoding RNAs, including antisense RNAs (asRNAs) that originate from the complementary strand of protein-coding genes, are involved in the regulation of gene expression in all domains of life. Recent application of deep-sequencing technologies has revealed that the transcription of asRNAs occurs genome-wide in bacteria. Although the role of the vast majority of asRNAs remains unknown, it is often assumed that their presence implies important regulatory functions, similar to those of other noncoding RNAs. Alternatively, many antisense transcripts may be produced by chance transcription events from promoter-like sequences that result from the degenerate nature of bacterial transcription factor binding sites. To investigate the biological relevance of antisense transcripts, we compared genome-wide patterns of asRNA expression in closely related enteric bacteria, Escherichia coli and Salmonella enterica serovar Typhimurium, by performing strand-specific transcriptome sequencing. Although antisense transcripts are abundant in both species, less than 3% of asRNAs are expressed at high levels in both species, and only about 14% appear to be conserved among species. And unlike the promoters of protein-coding genes, asRNA promoters show no evidence of sequence conservation between, or even within, species. Our findings suggest that many or even most bacterial asRNAs are nonadaptive by-products of the cell's transcription machinery. IMPORTANCE Application of high-throughput methods has revealed the expression throughout bacterial genomes of transcripts encoded on the strand complementary to protein-coding genes. Because transcription is costly, it is usually assumed that these transcripts, termed antisense RNAs (asRNAs), serve some function; however, the role of most asRNAs is unclear, raising questions about their relevance in cellular processes. Because natural selection conserves functional elements, comparisons between related species provide a method for assessing functionality genome-wide. Applying such an approach, we assayed all transcripts in two closely related bacteria, Escherichia coli and Salmonella enterica serovar Typhimurium, and demonstrate that, although the levels of genome-wide antisense transcription are similarly high in both bacteria, only a small fraction of asRNAs are shared across species. Moreover, the promoters associated with asRNAs show no evidence of sequence conservation between, or even within, species. These findings indicate that despite the genome-wide transcription of asRNAs, many of these transcripts are likely nonfunctional.

Genome-Wide Identification of Regulatory Sequences Undergoing Accelerated Evolution in the Human Genome.

PubMed

Dong, Xinran; Wang, Xiao; Zhang, Feng; Tian, Weidong

2016-10-01

Accelerated evolution of regulatory sequence can alter the expression pattern of target genes, and cause phenotypic changes. In this study, we used DNase I hypersensitive sites (DHSs) to annotate putative regulatory sequences in the human genome, and conducted a genome-wide analysis of the effects of accelerated evolution on regulatory sequences. Working under the assumption that local ancient repeat elements of DHSs are under neutral evolution, we discovered that ∼0.44% of DHSs are under accelerated evolution (ace-DHSs). We found that ace-DHSs tend to be more active than background DHSs, and are strongly associated with epigenetic marks of active transcription. The target genes of ace-DHSs are significantly enriched in neuron-related functions, and their expression levels are positively selected in the human brain. Thus, these lines of evidences strongly suggest that accelerated evolution on regulatory sequences plays important role in the evolution of human-specific phenotypes. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Japanese encephalitis virus non-coding RNA inhibits activation of interferon by blocking nuclear translocation of interferon regulatory factor 3.

PubMed

Chang, Ruey-Yi; Hsu, Ta-Wen; Chen, Yen-Lin; Liu, Shu-Fan; Tsai, Yi-Jer; Lin, Yun-Tong; Chen, Yi-Shiuan; Fan, Yi-Hsin

2013-09-27

Noncoding RNA (ncRNA) plays a critical role in modulating a broad range of diseases. All arthropod-borne flaviviruses produce short fragment ncRNA (sfRNA) collinear with highly conserved regions of the 3'-untranslated region (UTR) in the viral genome. We show that the molar ratio of sfRNA to genomic RNA in Japanese encephalitis virus (JEV) persistently infected cells is greater than that in acutely infected cells, indicating an sfRNA role in establishing persistent infection. Transfecting excess quantities of sfRNA into JEV-infected cells reduced interferon-β (IFN-β) promoter activity by 57% and IFN-β mRNA levels by 52%, compared to mock-transfected cells. Transfection of sfRNA into JEV-infected cells also reduced phosphorylation of interferon regulatory factor-3 (IRF-3), the IFN-β upstream regulator, and blocked roughly 30% of IRF-3 nuclear localization. Furthermore, JEV-infected sfRNA transfected cells produced 23% less IFN-β-stimulated apoptosis than mock-transfected groups did. Taken together, these results suggest that sfRNA plays a role against host-cell antiviral responses, prevents cells from undergoing apoptosis, and thus contributes to viral persistence. Copyright © 2013 Elsevier B.V. All rights reserved.
Mutations Affecting Expression of the rosy Locus in Drosophila melanogaster

PubMed Central

Lee, Chong Sung; Curtis, Daniel; McCarron, Margaret; Love, Carol; Gray, Mark; Bender, Welcome; Chovnick, Arthur

1987-01-01

The rosy locus in Drosophila melanogaster codes for the enzyme xanthine dehydrogenase (XDH). Previous studies defined a "control element" near the 5' end of the gene, where variant sites affected the amount of rosy mRNA and protein produced. We have determined the DNA sequence of this region from both genomic and cDNA clones, and from the ry+10 underproducer strain. This variant strain had many sequence differences, so that the site of the regulatory change could not be fixed. A mutagenesis was also undertaken to isolate new regulatory mutations. We induced 376 new mutations with 1-ethyl-1-nitrosourea (ENU) and screened them to isolate those that reduced the amount of XDH protein produced, but did not change the properties of the enzyme. Genetic mapping was used to find mutations located near the 5' end of the gene. DNA from each of seven mutants was cloned and sequenced through the 5' region. Mutant base changes were identified in all seven; they appear to affect splicing and translation of the rosy mRNA. In a related study (T. P. Keith et al. 1987), the genomic and cDNA sequences are extended through the 3' end of the gene; the combined sequences define the processing pattern of the rosy transcript and predict the amino acid sequence of XDH. PMID:3036645
Functional annotation of HOT regions in the human genome: implications for human disease and cancer

PubMed Central

Li, Hao; Chen, Hebing; Liu, Feng; Ren, Chao; Wang, Shengqi; Bo, Xiaochen; Shu, Wenjie

2015-01-01

Advances in genome-wide association studies (GWAS) and large-scale sequencing studies have resulted in an impressive and growing list of disease- and trait-associated genetic variants. Most studies have emphasised the discovery of genetic variation in coding sequences, however, the noncoding regulatory effects responsible for human disease and cancer biology have been substantially understudied. To better characterise the cis-regulatory effects of noncoding variation, we performed a comprehensive analysis of the genetic variants in HOT (high-occupancy target) regions, which are considered to be one of the most intriguing findings of recent large-scale sequencing studies. We observed that GWAS variants that map to HOT regions undergo a substantial net decrease and illustrate development-specific localisation during haematopoiesis. Additionally, genetic risk variants are disproportionally enriched in HOT regions compared with LOT (low-occupancy target) regions in both disease-relevant and cancer cells. Importantly, this enrichment is biased toward disease- or cancer-specific cell types. Furthermore, we observed that cancer cells generally acquire cancer-specific HOT regions at oncogenes through diverse mechanisms of cancer pathogenesis. Collectively, our findings demonstrate the key roles of HOT regions in human disease and cancer and represent a critical step toward further understanding disease biology, diagnosis, and therapy. PMID:26113264
Functional annotation of HOT regions in the human genome: implications for human disease and cancer.

PubMed

Li, Hao; Chen, Hebing; Liu, Feng; Ren, Chao; Wang, Shengqi; Bo, Xiaochen; Shu, Wenjie

2015-06-26

Advances in genome-wide association studies (GWAS) and large-scale sequencing studies have resulted in an impressive and growing list of disease- and trait-associated genetic variants. Most studies have emphasised the discovery of genetic variation in coding sequences, however, the noncoding regulatory effects responsible for human disease and cancer biology have been substantially understudied. To better characterise the cis-regulatory effects of noncoding variation, we performed a comprehensive analysis of the genetic variants in HOT (high-occupancy target) regions, which are considered to be one of the most intriguing findings of recent large-scale sequencing studies. We observed that GWAS variants that map to HOT regions undergo a substantial net decrease and illustrate development-specific localisation during haematopoiesis. Additionally, genetic risk variants are disproportionally enriched in HOT regions compared with LOT (low-occupancy target) regions in both disease-relevant and cancer cells. Importantly, this enrichment is biased toward disease- or cancer-specific cell types. Furthermore, we observed that cancer cells generally acquire cancer-specific HOT regions at oncogenes through diverse mechanisms of cancer pathogenesis. Collectively, our findings demonstrate the key roles of HOT regions in human disease and cancer and represent a critical step toward further understanding disease biology, diagnosis, and therapy.
Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium.

PubMed

Catania, Francesco; Lynch, Michael

2010-05-04

In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties to generate reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal genes expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes.
Chromatin Immunoprecipitation (ChIP) Protocol for Low-abundance Embryonic Samples.

PubMed

Rehimi, Rizwan; Bartusel, Michaela; Solinas, Francesca; Altmüller, Janine; Rada-Iglesias, Alvaro

2017-08-29

Chromatin immunoprecipitation (ChIP) is a widely-used technique for mapping the localization of post-translationally modified histones, histone variants, transcription factors, or chromatin-modifying enzymes at a given locus or on a genome-wide scale. The combination of ChIP assays with next-generation sequencing (i.e., ChIP-Seq) is a powerful approach to globally uncover gene regulatory networks and to improve the functional annotation of genomes, especially of non-coding regulatory sequences. ChIP protocols normally require large amounts of cellular material, thus precluding the applicability of this method to investigating rare cell types or small tissue biopsies. In order to make the ChIP assay compatible with the amount of biological material that can typically be obtained in vivo during early vertebrate embryogenesis, we describe here a simplified ChIP protocol in which the number of steps required to complete the assay were reduced to minimize sample loss. This ChIP protocol has been successfully used to investigate different histone modifications in various embryonic chicken and adult mouse tissues using low to medium cell numbers (5 x 10 4 - 5 x 10 5 cells). Importantly, this protocol is compatible with ChIP-seq technology using standard library preparation methods, thus providing global epigenomic maps in highly relevant embryonic tissues.
Identification of a novel antisense long non-coding RNA PLA2G16-AS that regulates the expression of PLA2G16 in pigs.

PubMed

Liu, Pengliang; Jin, Long; Zhao, Lirui; Long, Keren; Song, Yang; Tang, Qianzi; Ma, Jideng; Wang, Xun; Tang, Guoqing; Jiang, Yanzhi; Zhu, Li; Li, Xuewei; Li, Mingzhou

2018-05-31

Natural antisense transcripts (NATs) are widely present in mammalian genomes and act as pivotal regulator molecules to control gene expression. However, studies on the NATs of pigs are relatively rare. Here, we identified a novel antisense transcript, designated PLA2G16-AS, transcribed from the phospholipase A2 group XVI locus (PLA2G16) in the porcine genome, which is a well-known regulatory molecule of fat deposition. PLA2G16-AS and PLA2G16 were dominantly expressed in porcine adipose tissue, and were differentially expressed between Tibetan pigs and Rongchang pigs. In addition, PLA2G16-AS has a weak sequence conservation among different vertebrates. PLA2G16-AS was also shown to form an RNA-RNA duplex with PLA2G16, and to regulate PLA2G16 expression at the mRNA level. Moreover, the overexpression of PLA2G16-AS increased the stability of PLA2G16 mRNA in porcine cells. We envision that our findings of a NAT for a regulatory gene associated with lipolysis might further our understanding of the molecular regulation of fat deposition. Copyright © 2017. Published by Elsevier B.V.
Comparative analysis of gene regulatory networks: from network reconstruction to evolution.

PubMed

Thompson, Dawn; Regev, Aviv; Roy, Sushmita

2015-01-01

Regulation of gene expression is central to many biological processes. Although reconstruction of regulatory circuits from genomic data alone is therefore desirable, this remains a major computational challenge. Comparative approaches that examine the conservation and divergence of circuits and their components across strains and species can help reconstruct circuits as well as provide insights into the evolution of gene regulatory processes and their adaptive contribution. In recent years, advances in genomic and computational tools have led to a wealth of methods for such analysis at the sequence, expression, pathway, module, and entire network level. Here, we review computational methods developed to study transcriptional regulatory networks using comparative genomics, from sequence to functional data. We highlight how these methods use evolutionary conservation and divergence to reliably detect regulatory components as well as estimate the extent and rate of divergence. Finally, we discuss the promise and open challenges in linking regulatory divergence to phenotypic divergence and adaptation.
ChIP-seq Accurately Predicts Tissue-Specific Activity of Enhancers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Visel, Axel; Blow, Matthew J.; Li, Zirong

2009-02-01

A major yet unresolved quest in decoding the human genome is the identification of the regulatory sequences that control the spatial and temporal expression of genes. Distant-acting transcriptional enhancers are particularly challenging to uncover since they are scattered amongst the vast non-coding portion of the genome. Evolutionary sequence constraint can facilitate the discovery of enhancers, but fails to predict when and where they are active in vivo. Here, we performed chromatin immunoprecipitation with the enhancer-associated protein p300, followed by massively-parallel sequencing, to map several thousand in vivo binding sites of p300 in mouse embryonic forebrain, midbrain, and limb tissue. Wemore » tested 86 of these sequences in a transgenic mouse assay, which in nearly all cases revealed reproducible enhancer activity in those tissues predicted by p300 binding. Our results indicate that in vivo mapping of p300 binding is a highly accurate means for identifying enhancers and their associated activities and suggest that such datasets will be useful to study the role of tissue-specific enhancers in human biology and disease on a genome-wide scale.« less
Translation elicits a growth rate-dependent, genome-wide, differential protein production in Bacillus subtilis.

PubMed

Borkowski, Olivier; Goelzer, Anne; Schaffer, Marc; Calabre, Magali; Mäder, Ulrike; Aymerich, Stéphane; Jules, Matthieu; Fromion, Vincent

2016-05-17

Complex regulatory programs control cell adaptation to environmental changes by setting condition-specific proteomes. In balanced growth, bacterial protein abundances depend on the dilution rate, transcript abundances and transcript-specific translation efficiencies. We revisited the current theory claiming the invariance of bacterial translation efficiency. By integrating genome-wide transcriptome datasets and datasets from a library of synthetic gfp-reporter fusions, we demonstrated that translation efficiencies in Bacillus subtilis decreased up to fourfold from slow to fast growth. The translation initiation regions elicited a growth rate-dependent, differential production of proteins without regulators, hence revealing a unique, hard-coded, growth rate-dependent mode of regulation. We combined model-based data analyses of transcript and protein abundances genome-wide and revealed that this global regulation is extensively used in B. subtilis We eventually developed a knowledge-based, three-step translation initiation model, experimentally challenged the model predictions and proposed that a growth rate-dependent drop in free ribosome abundance accounted for the differential protein production. © 2016 The Authors. Published under the terms of the CC BY 4.0 license.
Mi-DISCOVERER: A bioinformatics tool for the detection of mi-RNA in human genome.

PubMed

Arshad, Saadia; Mumtaz, Asia; Ahmad, Freed; Liaquat, Sadia; Nadeem, Shahid; Mehboob, Shahid; Afzal, Muhammad

2010-11-27

MicroRNAs (miRNAs) are 22 nucleotides non-coding RNAs that play pivotal regulatory roles in diverse organisms including the humans and are difficult to be identified due to lack of either sequence features or robust algorithms to efficiently identify. Therefore, we made a tool that is Mi-Discoverer for the detection of miRNAs in human genome. The tools used for the development of software are Microsoft Office Access 2003, the JDK version 1.6.0, BioJava version 1.0, and the NetBeans IDE version 6.0. All already made miRNAs softwares were web based; so the advantage of our project was to make a desktop facility to the user for sequence alignment search with already identified miRNAs of human genome present in the database. The user can also insert and update the newly discovered human miRNA in the database. Mi-Discoverer, a bioinformatics tool successfully identifies human miRNAs based on multiple sequence alignment searches. It's a non redundant database containing a large collection of publicly available human miRNAs.
Mi-DISCOVERER: A bioinformatics tool for the detection of mi-RNA in human genome

PubMed Central

Arshad, Saadia; Mumtaz, Asia; Ahmad, Freed; Liaquat, Sadia; Nadeem, Shahid; Mehboob, Shahid; Afzal, Muhammad

2010-01-01

MicroRNAs (miRNAs) are 22 nucleotides non-coding RNAs that play pivotal regulatory roles in diverse organisms including the humans and are difficult to be identified due to lack of either sequence features or robust algorithms to efficiently identify. Therefore, we made a tool that is Mi-Discoverer for the detection of miRNAs in human genome. The tools used for the development of software are Microsoft Office Access 2003, the JDK version 1.6.0, BioJava version 1.0, and the NetBeans IDE version 6.0. All already made miRNAs softwares were web based; so the advantage of our project was to make a desktop facility to the user for sequence alignment search with already identified miRNAs of human genome present in the database. The user can also insert and update the newly discovered human miRNA in the database. Mi-Discoverer, a bioinformatics tool successfully identifies human miRNAs based on multiple sequence alignment searches. It's a non redundant database containing a large collection of publicly available human miRNAs. PMID:21364831
An imprinted non-coding genomic cluster at 14q32 defines clinically relevant molecular subtypes in osteosarcoma across multiple independent datasets.

PubMed

Hill, Katherine E; Kelly, Andrew D; Kuijjer, Marieke L; Barry, William; Rattani, Ahmed; Garbutt, Cassandra C; Kissick, Haydn; Janeway, Katherine; Perez-Atayde, Antonio; Goldsmith, Jeffrey; Gebhardt, Mark C; Arredouani, Mohamed S; Cote, Greg; Hornicek, Francis; Choy, Edwin; Duan, Zhenfeng; Quackenbush, John; Haibe-Kains, Benjamin; Spentzos, Dimitrios

2017-05-15

A microRNA (miRNA) collection on the imprinted 14q32 MEG3 region has been associated with outcome in osteosarcoma. We assessed the clinical utility of this miRNA set and their association with methylation status. We integrated coding and non-coding RNA data from three independent annotated clinical osteosarcoma cohorts (n = 65, n = 27, and n = 25) and miRNA and methylation data from one in vitro (19 cell lines) and one clinical (NCI Therapeutically Applicable Research to Generate Effective Treatments (TARGET) osteosarcoma dataset, n = 80) dataset. We used time-dependent receiver operating characteristic (tdROC) analysis to evaluate the clinical value of candidate miRNA profiles and machine learning approaches to compare the coding and non-coding transcriptional programs of high- and low-risk osteosarcoma tumors and high- versus low-aggressiveness cell lines. In the cell line and TARGET datasets, we also studied the methylation patterns of the MEG3 imprinting control region on 14q32 and their association with miRNA expression and tumor aggressiveness. In the tdROC analysis, miRNA sets on 14q32 showed strong discriminatory power for recurrence and survival in the three clinical datasets. High- or low-risk tumor classification was robust to using different microRNA sets or classification methods. Machine learning approaches showed that genome-wide miRNA profiles and miRNA regulatory networks were quite different between the two outcome groups and mRNA profiles categorized the samples in a manner concordant with the miRNAs, suggesting potential molecular subtypes. Further, miRNA expression patterns were reproducible in comparing high-aggressiveness versus low-aggressiveness cell lines. Methylation patterns in the MEG3 differentially methylated region (DMR) also distinguished high-aggressiveness from low-aggressiveness cell lines and were associated with expression of several 14q32 miRNAs in both the cell lines and the large TARGET clinical dataset. Within the limits of available CpG array coverage, we observed a potential methylation-sensitive regulation of the non-coding RNA cluster by CTCF, a known enhancer-blocking factor. Loss of imprinting/methylation changes in the 14q32 non-coding region defines reproducible previously unrecognized osteosarcoma subtypes with distinct transcriptional programs and biologic and clinical behavior. Future studies will define the precise relationship between 14q32 imprinting, non-coding RNA expression, genomic enhancer binding, and tumor aggressiveness, with possible therapeutic implications for both early- and advanced-stage patients.
Complete Coding Genome Sequence for Mogiana Tick Virus, a Jingmenvirus Isolated from Ticks in Brazil

DTIC Science & Technology

2017-05-04

and capable of infecting a wide range of animal hosts (1–5). Here, we report the complete coding genome sequence (i.e., only missing portions of...segmented nature of the genome was not under- stood. Therefore, only the two genome segments with detectable sequence homolo- gies to flaviviruses were...originally reported (2). We revisited the data set of Maruyama et al. (2) and assembled the complete coding sequences for all four genome segments. We
Regulatory Networks Controlling Plant Cold Acclimation or Low Temperature Regulatory Networks Controlling Cold Acclimation in Arabidopsis (2011 JGI User Meeting)

ScienceCinema

Thomashow, Mike

2018-02-06

The U.S. Department of Energy Joint Genome Institute (JGI) invited scientists interested in the application of genomics to bioenergy and environmental issues, as well as all current and prospective users and collaborators, to attend the annual DOE JGI Genomics of Energy & Environment Meeting held March 22-24, 2011 in Walnut Creek, Calif. The emphasis of this meeting was on the genomics of renewable energy strategies, carbon cycling, environmental gene discovery, and engineering of fuel-producing organisms. The meeting features presentations by leading scientists advancing these topics. Mike Thomashow of Michigan State University gives a presentation on on "Low Temperature Regulatory Networks Controlling Cold Acclimation in Arabidopsis" at the 6th annual Genomics of Energy & Environment Meeting on March 23, 2011."
Using FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) to isolate active regulatory DNA

PubMed Central

Simon, Jeremy M.; Giresi, Paul G.; Davis, Ian J.; Lieb, Jason D.

2013-01-01

Eviction or destabilization of nucleosomes from chromatin is a hallmark of functional regulatory elements of the eukaryotic genome. Historically identified by nuclease hypersensitivity, these regulatory elements are typically bound by transcription factors or other regulatory proteins. FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) is an alternative approach to identify these genomic regions and has proven successful in a multitude of eukaryotic cell and tissue types. Cells or dissociated tissues are crosslinked briefly with formaldehyde, lysed, and sonicated. Sheared chromatin is subjected to phenol-chloroform extraction and the isolated DNA, typically encompassing 1–3% of the human genome, is purified. We provide guidelines for quantitative analysis by PCR, microarrays, or next-generation sequencing. Regulatory elements enriched by FAIRE display high concordance with those identified by nuclease hypersensitivity or ChIP, and the entire procedure can be completed in three days. FAIRE exhibits low technical variability, which allows its use in large-scale studies of chromatin from normal or diseased tissues. PMID:22262007
Are we Genomic Mosaics? Variations of the Genome of Somatic Cells can Contribute to Diversify our Phenotypes.

PubMed

Astolfi, P A; Salamini, F; Sgaramella, V

2010-09-01

Theoretical and experimental evidences support the hypothesis that the genomes and the epigenomes may be different in the somatic cells of complex organisms. In the genome, the differences range from single base substitutions to chromosome number; in the epigenome, they entail multiple postsynthetic modifications of the chromatin. Somatic genome variations (SGV) may accumulate during development in response both to genetic programs, which may differ from tissue to tissue, and to environmental stimuli, which are often undetected and generally irreproducible. SGV may jeopardize physiological cellular functions, but also create novel coding and regulatory sequences, to be exposed to intraorganismal Darwinian selection. Genomes acknowledged as comparatively poor in genes, such as humans', could thus increase their pristine informational endowment. A better understanding of SGV will contribute to basic issues such as the "nature vs nurture" dualism and the inheritance of acquired characters. On the applied side, they may explain the low yield of cloning via somatic cell nuclear transfer, provide clues to some of the problems associated with transdifferentiation, and interfere with individual DNA analysis. SGV may be unique in the different cells types and in the different developmental stages, and thus explain the several hundred gaps persisting in the human genomes "completed" so far. They may compound the variations associated to our epigenomes and make of each of us an "(epi)genomic" mosaic. An ensuing paradigm is the possibility that a single genome (the ephemeral one assembled at fertilization) has the capacity to generate several different brains in response to different environments.
Genomic regulatory blocks encompass multiple neighboring genes and maintain conserved synteny in vertebrates

PubMed Central

Kikuta, Hiroshi; Laplante, Mary; Navratilova, Pavla; Komisarczuk, Anna Z.; Engström, Pär G.; Fredman, David; Akalin, Altuna; Caccamo, Mario; Sealy, Ian; Howe, Kerstin; Ghislain, Julien; Pezeron, Guillaume; Mourrain, Philippe; Ellingsen, Staale; Oates, Andrew C.; Thisse, Christine; Thisse, Bernard; Foucher, Isabelle; Adolf, Birgit; Geling, Andrea; Lenhard, Boris; Becker, Thomas S.

2007-01-01

We report evidence for a mechanism for the maintenance of long-range conserved synteny across vertebrate genomes. We found the largest mammal-teleost conserved chromosomal segments to be spanned by highly conserved noncoding elements (HCNEs), their developmental regulatory target genes, and phylogenetically and functionally unrelated “bystander” genes. Bystander genes are not specifically under the control of the regulatory elements that drive the target genes and are expressed in patterns that are different from those of the target genes. Reporter insertions distal to zebrafish developmental regulatory genes pax6.1/2, rx3, id1, and fgf8 and miRNA genes mirn9-1 and mirn9-5 recapitulate the expression patterns of these genes even if located inside or beyond bystander genes, suggesting that the regulatory domain of a developmental regulatory gene can extend into and beyond adjacent transcriptional units. We termed these chromosomal segments genomic regulatory blocks (GRBs). After whole genome duplication in teleosts, GRBs, including HCNEs and target genes, were often maintained in both copies, while bystander genes were typically lost from one GRB, strongly suggesting that evolutionary pressure acts to keep the single-copy GRBs of higher vertebrates intact. We show that loss of bystander genes and other mutational events suffered by duplicated GRBs in teleost genomes permits target gene identification and HCNE/target gene assignment. These findings explain the absence of evolutionary breakpoints from large vertebrate chromosomal segments and will aid in the recognition of position effect mutations within human GRBs. PMID:17387144
The excludon: a new concept in bacterial antisense RNA-mediated gene regulation.

PubMed

Sesto, Nina; Wurtzel, Omri; Archambaud, Cristel; Sorek, Rotem; Cossart, Pascale

2013-02-01

In recent years, non-coding RNAs have emerged as key regulators of gene expression. Among these RNAs, the antisense RNAs (asRNAs) are particularly abundant, but in most cases the function and mechanism of action for a particular asRNA remains elusive. Here, we highlight a recently discovered paradigm termed the excludon, which defines a genomic locus encoding an unusually long asRNA that spans divergent genes or operons with related or opposing functions. Because these asRNAs can inhibit the expression of one operon while functioning as an mRNA for the adjacent operon, they act as fine-tuning regulatory switches in bacteria.

Molecular Evolution of the Non-Coding Eosinophil Granule Ontogeny Transcript

PubMed Central

Rose, Dominic; Stadler, Peter F.

2011-01-01

Eukaryotic genomes are pervasively transcribed. A large fraction of the transcriptional output consists of long, mRNA-like, non-protein-coding transcripts (mlncRNAs). The evolutionary history of mlncRNAs is still largely uncharted territory. In this contribution, we explore in detail the evolutionary traces of the eosinophil granule ontogeny transcript (EGOT), an experimentally confirmed representative of an abundant class of totally intronic non-coding transcripts (TINs). EGOT is located antisense to an intron of the ITPR1 gene. We computationally identify putative EGOT orthologs in the genomes of 32 different amniotes, including orthologs from primates, rodents, ungulates, carnivores, afrotherians, and xenarthrans, as well as putative candidates from basal amniotes, such as opossum or platypus. We investigate the EGOT gene phylogeny, analyze patterns of sequence conservation, and the evolutionary conservation of the EGOT gene structure. We show that EGO-B, the spliced isoform, may be present throughout the placental mammals, but most likely dates back even further. We demonstrate here for the first time that the whole EGOT locus is highly structured, containing several evolutionary conserved, and thermodynamic stable secondary structures. Our analyses allow us to postulate novel functional roles of a hitherto poorly understood region at the intron of EGO-B which is highly conserved at the sequence level. The region contains a novel ITPR1 exon and also conserved RNA secondary structures together with a conserved TATA-like element, which putatively acts as a promoter of an independent regulatory element. PMID:22303364
CMIP: a software package capable of reconstructing genome-wide regulatory networks using gene expression data.

PubMed

Zheng, Guangyong; Xu, Yaochen; Zhang, Xiujun; Liu, Zhi-Ping; Wang, Zhuo; Chen, Luonan; Zhu, Xin-Guang

2016-12-23

A gene regulatory network (GRN) represents interactions of genes inside a cell or tissue, in which vertexes and edges stand for genes and their regulatory interactions respectively. Reconstruction of gene regulatory networks, in particular, genome-scale networks, is essential for comparative exploration of different species and mechanistic investigation of biological processes. Currently, most of network inference methods are computationally intensive, which are usually effective for small-scale tasks (e.g., networks with a few hundred genes), but are difficult to construct GRNs at genome-scale. Here, we present a software package for gene regulatory network reconstruction at a genomic level, in which gene interaction is measured by the conditional mutual information measurement using a parallel computing framework (so the package is named CMIP). The package is a greatly improved implementation of our previous PCA-CMI algorithm. In CMIP, we provide not only an automatic threshold determination method but also an effective parallel computing framework for network inference. Performance tests on benchmark datasets show that the accuracy of CMIP is comparable to most current network inference methods. Moreover, running tests on synthetic datasets demonstrate that CMIP can handle large datasets especially genome-wide datasets within an acceptable time period. In addition, successful application on a real genomic dataset confirms its practical applicability of the package. This new software package provides a powerful tool for genomic network reconstruction to biological community. The software can be accessed at http://www.picb.ac.cn/CMIP/ .
Complete Sequence and Analysis of the Mitochondrial Genome of Hemiselmis andersenii CCMP644 (Cryptophyceae)

PubMed Central

Kim, Eunsoo; Lane, Christopher E; Curtis, Bruce A; Kozera, Catherine; Bowman, Sharen; Archibald, John M

2008-01-01

Background Cryptophytes are an enigmatic group of unicellular eukaryotes with plastids derived by secondary (i.e., eukaryote-eukaryote) endosymbiosis. Cryptophytes are unusual in that they possess four genomes–a host cell-derived nuclear and mitochondrial genome and an endosymbiont-derived plastid and 'nucleomorph' genome. The evolutionary origins of the host and endosymbiont components of cryptophyte algae are at present poorly understood. Thus far, a single complete mitochondrial genome sequence has been determined for the cryptophyte Rhodomonas salina. Here, the second complete mitochondrial genome of the cryptophyte alga Hemiselmis andersenii CCMP644 is presented. Results The H. andersenii mtDNA is 60,553 bp in size and encodes 30 structural RNAs and 36 protein-coding genes, all located on the same strand. A prominent feature of the genome is the presence of a ~20 Kbp long intergenic region comprised of numerous tandem and dispersed repeat units of between 22–336 bp. Adjacent to these repeats are 27 copies of palindromic sequences predicted to form stable DNA stem-loop structures. One such stem-loop is located near a GC-rich and GC-poor region and may have a regulatory function in replication or transcription. The H. andersenii mtDNA shares a number of features in common with the genome of the cryptophyte Rhodomonas salina, including general architecture, gene content, and the presence of a large repeat region. However, the H. andersenii mtDNA is devoid of inverted repeats and introns, which are present in R. salina. Comparative analyses of the suite of tRNAs encoded in the two genomes reveal that the H. andersenii mtDNA has lost or converted its original trnK(uuu) gene and possesses a trnS-derived 'trnK(uuu)', which appears unable to produce a functional tRNA. Mitochondrial protein coding gene phylogenies strongly support a variety of previously established eukaryotic groups, but fail to resolve the relationships among higher-order eukaryotic lineages. Conclusion Comparison of the H. andersenii and R. salina mitochondrial genomes reveals a number of cryptophyte-specific genomic features, most notably the presence of a large repeat-rich intergenic region. However, unlike R. salina, the H. andersenii mtDNA does not possess introns and lacks a Lys-tRNA, which is presumably imported from the cytosol. PMID:18474103
Comparison of Five Major Trichome Regulatory Genes in Brassica villosa with Orthologues within the Brassicaceae

PubMed Central

Nayidu, Naghabushana K.; Kagale, Sateesh; Taheri, Ali; Withana-Gamage, Thushan S.; Parkin, Isobel A. P.; Sharpe, Andrew G.; Gruber, Margaret Y.

2014-01-01

Coding sequences for major trichome regulatory genes, including the positive regulators GLABRA 1(GL1), GLABRA 2 (GL2), ENHANCER OF GLABRA 3 (EGL3), and TRANSPARENT TESTA GLABRA 1 (TTG1) and the negative regulator TRIPTYCHON (TRY), were cloned from wild Brassica villosa, which is characterized by dense trichome coverage over most of the plant. Transcript (FPKM) levels from RNA sequencing indicated much higher expression of the GL2 and TTG1 regulatory genes in B. villosa leaves compared with expression levels of GL1 and EGL3 genes in either B. villosa or the reference genome species, glabrous B. oleracea; however, cotyledon TTG1 expression was high in both species. RNA sequencing and Q-PCR also revealed an unusual expression pattern for the negative regulators TRY and CPC, which were much more highly expressed in trichome-rich B. villosa leaves than in glabrous B. oleracea leaves and in glabrous cotyledons from both species. The B. villosa TRY expression pattern also contrasted with TRY expression patterns in two diploid Brassica species, and with the Arabidopsis model for expression of negative regulators of trichome development. Further unique sequence polymorphisms, protein characteristics, and gene evolution studies highlighted specific amino acids in GL1 and GL2 coding sequences that distinguished glabrous species from hairy species and several variants that were specific for each B. villosa gene. Positive selection was observed for GL1 between hairy and non-hairy plants, and as expected the origin of the four expressed positive trichome regulatory genes in B. villosa was predicted to be from B. oleracea. In particular the unpredicted expression patterns for TRY and CPC in B. villosa suggest additional characterization is needed to determine the function of the expanded families of trichome regulatory genes in more complex polyploid species within the Brassicaceae. PMID:24755905
Gene Editing: Regulatory and Translation to Clinic.

PubMed

Ando, Dale; Meyer, Kathleen

2017-10-01

The clinical application and regulatory strategy of genome editing for ex vivo cell therapy is derived from the intersection of two fields of study: viral vector gene therapy trials; and clinical trials with ex vivo purification and engraftment of CD34 + hematopoietic stem cells, T cells, and tumor cell vaccines. This article covers the regulatory and translational preclinical activities needed for a genome editing clinical trial modifying hematopoietic stem cells and the genesis of this current strategy based on previous clinical trials using genome-edited T cells. The SB-728 zinc finger nuclease platform is discussed because this is the most clinically advanced genome editing technology. Copyright © 2017 Elsevier Inc. All rights reserved.
TRACTOR_DB: a database of regulatory networks in gamma-proteobacterial genomes

PubMed Central

González, Abel D.; Espinosa, Vladimir; Vasconcelos, Ana T.; Pérez-Rueda, Ernesto; Collado-Vides, Julio

2005-01-01

Experimental data on the Escherichia coli transcriptional regulatory system has been used in the past years to predict new regulatory elements (promoters, transcription factors (TFs), TFs' binding sites and operons) within its genome. As more genomes of gamma-proteobacteria are being sequenced, the prediction of these elements in a growing number of organisms has become more feasible, as a step towards the study of how different bacteria respond to environmental changes at the level of transcriptional regulation. In this work, we present TRACTOR_DB (TRAnscription FaCTORs' predicted binding sites in prokaryotic genomes), a relational database that contains computational predictions of new members of 74 regulons in 17 gamma-proteobacterial genomes. For these predictions we used a comparative genomics approach regarding which several proof-of-principle articles for large regulons have been published. TRACTOR_DB may be currently accessed at http://www.bioinfo.cu/Tractor_DB, http://www.tractor.lncc.br/ or at http://www.cifn.unam.mx/Computational_Genomics/tractorDB. Contact Email id is tractor@cifn.unam.mx. PMID:15608293
Targeting Non-Coding RNAs in Plants with the CRISPR-Cas Technology is a Challenge yet Worth Accepting.

PubMed

Basak, Jolly; Nithin, Chandran

2015-01-01

Non-coding RNAs (ncRNAs) have emerged as versatile master regulator of biological functions in recent years. MicroRNAs (miRNAs) are small endogenous ncRNAs of 18-24 nucleotides in length that originates from long self-complementary precursors. Besides their direct involvement in developmental processes, plant miRNAs play key roles in gene regulatory networks and varied biological processes. Alternatively, long ncRNAs (lncRNAs) are a large and diverse class of transcribed ncRNAs whose length exceed that of 200 nucleotides. Plant lncRNAs are transcribed by different RNA polymerases, showing diverse structural features. Plant lncRNAs also are important regulators of gene expression in diverse biological processes. There has been a breakthrough in the technology of genome editing, the CRISPR-Cas9 (clustered regulatory interspaced short palindromic repeats/CRISPR-associated protein 9) technology, in the last decade. CRISPR loci are transcribed into ncRNA and eventually form a functional complex with Cas9 and further guide the complex to cleave complementary invading DNA. The CRISPR-Cas technology has been successfully applied in model plants such as Arabidopsis and tobacco and important crops like wheat, maize, and rice. However, all these studies are focused on protein coding genes. Information about targeting non-coding genes is scarce. Hitherto, the CRISPR-Cas technology has been exclusively used in vertebrate systems to engineer miRNA/lncRNAs, but it is still relatively unexplored in plants. While briefing miRNAs, lncRNAs and applications of the CRISPR-Cas technology in human and animals, this review essentially elaborates several strategies to overcome the challenges of applying the CRISPR-Cas technology in editing ncRNAs in plants and the future perspective of this field.
BiRen: predicting enhancers with a deep-learning-based model using the DNA sequence alone.

PubMed

Yang, Bite; Liu, Feng; Ren, Chao; Ouyang, Zhangyi; Xie, Ziwei; Bo, Xiaochen; Shu, Wenjie

2017-07-01

Enhancer elements are noncoding stretches of DNA that play key roles in controlling gene expression programmes. Despite major efforts to develop accurate enhancer prediction methods, identifying enhancer sequences continues to be a challenge in the annotation of mammalian genomes. One of the major issues is the lack of large, sufficiently comprehensive and experimentally validated enhancers for humans or other species. Thus, the development of computational methods based on limited experimentally validated enhancers and deciphering the transcriptional regulatory code encoded in the enhancer sequences is urgent. We present a deep-learning-based hybrid architecture, BiRen, which predicts enhancers using the DNA sequence alone. Our results demonstrate that BiRen can learn common enhancer patterns directly from the DNA sequence and exhibits superior accuracy, robustness and generalizability in enhancer prediction relative to other state-of-the-art enhancer predictors based on sequence characteristics. Our BiRen will enable researchers to acquire a deeper understanding of the regulatory code of enhancer sequences. Our BiRen method can be freely accessed at https://github.com/wenjiegroup/BiRen . shuwj@bmi.ac.cn or boxc@bmi.ac.cn. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
I-motif DNA structures are formed in the nuclei of human cells

NASA Astrophysics Data System (ADS)

Zeraati, Mahdi; Langley, David B.; Schofield, Peter; Moye, Aaron L.; Rouet, Romain; Hughes, William E.; Bryan, Tracy M.; Dinger, Marcel E.; Christ, Daniel

2018-06-01

Human genome function is underpinned by the primary storage of genetic information in canonical B-form DNA, with a second layer of DNA structure providing regulatory control. I-motif structures are thought to form in cytosine-rich regions of the genome and to have regulatory functions; however, in vivo evidence for the existence of such structures has so far remained elusive. Here we report the generation and characterization of an antibody fragment (iMab) that recognizes i-motif structures with high selectivity and affinity, enabling the detection of i-motifs in the nuclei of human cells. We demonstrate that the in vivo formation of such structures is cell-cycle and pH dependent. Furthermore, we provide evidence that i-motif structures are formed in regulatory regions of the human genome, including promoters and telomeric regions. Our results support the notion that i-motif structures provide key regulatory roles in the genome.
The PathoYeastract database: an information system for the analysis of gene and genomic transcription regulation in pathogenic yeasts.

PubMed

Monteiro, Pedro Tiago; Pais, Pedro; Costa, Catarina; Manna, Sauvagya; Sá-Correia, Isabel; Teixeira, Miguel Cacho

2017-01-04

We present the PATHOgenic YEAst Search for Transcriptional Regulators And Consensus Tracking (PathoYeastract - http://pathoyeastract.org) database, a tool for the analysis and prediction of transcription regulatory associations at the gene and genomic levels in the pathogenic yeasts Candida albicans and C. glabrata Upon data retrieval from hundreds of publications, followed by curation, the database currently includes 28 000 unique documented regulatory associations between transcription factors (TF) and target genes and 107 DNA binding sites, considering 134 TFs in both species. Following the structure used for the YEASTRACT database, PathoYeastract makes available bioinformatics tools that enable the user to exploit the existing information to predict the TFs involved in the regulation of a gene or genome-wide transcriptional response, while ranking those TFs in order of their relative importance. Each search can be filtered based on the selection of specific environmental conditions, experimental evidence or positive/negative regulatory effect. Promoter analysis tools and interactive visualization tools for the representation of TF regulatory networks are also provided. The PathoYeastract database further provides simple tools for the prediction of gene and genomic regulation based on orthologous regulatory associations described for other yeast species, a comparative genomics setup for the study of cross-species evolution of regulatory networks. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Improved accuracy of supervised CRM discovery with interpolated Markov models and cross-species comparison.

PubMed

Kazemian, Majid; Zhu, Qiyun; Halfon, Marc S; Sinha, Saurabh

2011-12-01

Despite recent advances in experimental approaches for identifying transcriptional cis-regulatory modules (CRMs, 'enhancers'), direct empirical discovery of CRMs for all genes in all cell types and environmental conditions is likely to remain an elusive goal. Effective methods for computational CRM discovery are thus a critically needed complement to empirical approaches. However, existing computational methods that search for clusters of putative binding sites are ineffective if the relevant TFs and/or their binding specificities are unknown. Here, we provide a significantly improved method for 'motif-blind' CRM discovery that does not depend on knowledge or accurate prediction of TF-binding motifs and is effective when limited knowledge of functional CRMs is available to 'supervise' the search. We propose a new statistical method, based on 'Interpolated Markov Models', for motif-blind, genome-wide CRM discovery. It captures the statistical profile of variable length words in known CRMs of a regulatory network and finds candidate CRMs that match this profile. The method also uses orthologs of the known CRMs from closely related genomes. We perform in silico evaluation of predicted CRMs by assessing whether their neighboring genes are enriched for the expected expression patterns. This assessment uses a novel statistical test that extends the widely used Hypergeometric test of gene set enrichment to account for variability in intergenic lengths. We find that the new CRM prediction method is superior to existing methods. Finally, we experimentally validate 12 new CRM predictions by examining their regulatory activity in vivo in Drosophila; 10 of the tested CRMs were found to be functional, while 6 of the top 7 predictions showed the expected activity patterns. We make our program available as downloadable source code, and as a plugin for a genome browser installed on our servers. © The Author(s) 2011. Published by Oxford University Press.
A genome-wide SNP scan accelerates trait-regulatory genomic loci identification in chickpea

PubMed Central

Kujur, Alice; Bajaj, Deepak; Upadhyaya, Hari D.; Das, Shouvik; Ranjan, Rajeev; Shree, Tanima; Saxena, Maneesha S.; Badoni, Saurabh; Kumar, Vinod; Tripathi, Shailesh; Gowda, C.L.L.; Sharma, Shivali; Singh, Sube; Tyagi, Akhilesh K.; Parida, Swarup K.

2015-01-01

We identified 44844 high-quality SNPs by sequencing 92 diverse chickpea accessions belonging to a seed and pod trait-specific association panel using reference genome- and de novo-based GBS (genotyping-by-sequencing) assays. A GWAS (genome-wide association study) in an association panel of 211, including the 92 sequenced accessions, identified 22 major genomic loci showing significant association (explaining 23–47% phenotypic variation) with pod and seed number/plant and 100-seed weight. Eighteen trait-regulatory major genomic loci underlying 13 robust QTLs were validated and mapped on an intra-specific genetic linkage map by QTL mapping. A combinatorial approach of GWAS, QTL mapping and gene haplotype-specific LD mapping and transcript profiling uncovered one superior haplotype and favourable natural allelic variants in the upstream regulatory region of a CesA-type cellulose synthase (Ca_Kabuli_CesA3) gene regulating high pod and seed number/plant (explaining 47% phenotypic variation) in chickpea. The up-regulation of this superior gene haplotype correlated with increased transcript expression of Ca_Kabuli_CesA3 gene in the pollen and pod of high pod/seed number accession, resulting in higher cellulose accumulation for normal pollen and pollen tube growth. A rapid combinatorial genome-wide SNP genotyping-based approach has potential to dissect complex quantitative agronomic traits and delineate trait-regulatory genomic loci (candidate genes) for genetic enhancement in crop plants, including chickpea. PMID:26058368
cis-Regulatory control of the initial neurogenic pattern of onecut gene expression in the sea urchin embryo.

PubMed

Barsi, Julius C; Davidson, Eric H

2016-01-01

Specification of the ciliated band (CB) of echinoid embryos executes three spatial functions essential for postgastrular organization. These are establishment of a band about 5 cells wide which delimits and bounds other embryonic territories; definition of a neurogenic domain within this band; and generation within it of arrays of ciliary cells that bear the special long cilia from which the structure derives its name. In Strongylocentrotus purpuratus the spatial coordinates of the future ciliated band are initially and exactly determined by the disposition of a ring of cells that transcriptionally activate the onecut homeodomain regulatory gene, beginning in blastula stage, long before the appearance of the CB per se. Thus the cis-regulatory apparatus that governs onecut expression in the blastula directly reveals the genomic sequence code by which these aspects of the spatial organization of the embryo are initially determined. We screened the entire onecut locus and its flanking region for transcriptionally active cis-regulatory elements, and by means of BAC recombineered deletions identified three separated and required cis-regulatory modules that execute different functions. The operating logic of the crucial spatial control module accounting for the spectacularly precise and beautiful early onecut expression domain depends on spatial repression. Previously predicted oral ectoderm and aboral ectoderm repressors were identified by cis-regulatory mutation as the products of goosecoid and irxa genes respectively, while the pan-ectodermal activator SoxB1 supplies a transcriptional driver function. Copyright © 2015. Published by Elsevier Inc.
Genome-Wide Identification of Regulatory Elements and Reconstruction of Gene Regulatory Networks of the Green Alga Chlamydomonas reinhardtii under Carbon Deprivation

PubMed Central

Vischi Winck, Flavia; Arvidsson, Samuel; Riaño-Pachón, Diego Mauricio; Hempel, Sabrina; Koseska, Aneta; Nikoloski, Zoran; Urbina Gomez, David Alejandro; Rupprecht, Jens; Mueller-Roeber, Bernd

2013-01-01

The unicellular green alga Chlamydomonas reinhardtii is a long-established model organism for studies on photosynthesis and carbon metabolism-related physiology. Under conditions of air-level carbon dioxide concentration [CO2], a carbon concentrating mechanism (CCM) is induced to facilitate cellular carbon uptake. CCM increases the availability of carbon dioxide at the site of cellular carbon fixation. To improve our understanding of the transcriptional control of the CCM, we employed FAIRE-seq (formaldehyde-assisted Isolation of Regulatory Elements, followed by deep sequencing) to determine nucleosome-depleted chromatin regions of algal cells subjected to carbon deprivation. Our FAIRE data recapitulated the positions of known regulatory elements in the promoter of the periplasmic carbonic anhydrase (Cah1) gene, which is upregulated during CCM induction, and revealed new candidate regulatory elements at a genome-wide scale. In addition, time series expression patterns of 130 transcription factor (TF) and transcription regulator (TR) genes were obtained for cells cultured under photoautotrophic condition and subjected to a shift from high to low [CO2]. Groups of co-expressed genes were identified and a putative directed gene-regulatory network underlying the CCM was reconstructed from the gene expression data using the recently developed IOTA (inner composition alignment) method. Among the candidate regulatory genes, two members of the MYB-related TF family, Lcr1 (Low-CO 2 response regulator 1) and Lcr2 (Low-CO 2 response regulator 2), may play an important role in down-regulating the expression of a particular set of TF and TR genes in response to low [CO2]. The results obtained provide new insights into the transcriptional control of the CCM and revealed more than 60 new candidate regulatory genes. Deep sequencing of nucleosome-depleted genomic regions indicated the presence of new, previously unknown regulatory elements in the C. reinhardtii genome. Our work can serve as a basis for future functional studies of transcriptional regulator genes and genomic regulatory elements in Chlamydomonas. PMID:24224019
BLSSpeller: exhaustive comparative discovery of conserved cis-regulatory elements.

PubMed

De Witte, Dieter; Van de Velde, Jan; Decap, Dries; Van Bel, Michiel; Audenaert, Pieter; Demeester, Piet; Dhoedt, Bart; Vandepoele, Klaas; Fostier, Jan

2015-12-01

The accurate discovery and annotation of regulatory elements remains a challenging problem. The growing number of sequenced genomes creates new opportunities for comparative approaches to motif discovery. Putative binding sites are then considered to be functional if they are conserved in orthologous promoter sequences of multiple related species. Existing methods for comparative motif discovery usually rely on pregenerated multiple sequence alignments, which are difficult to obtain for more diverged species such as plants. As a consequence, misaligned regulatory elements often remain undetected. We present a novel algorithm that supports both alignment-free and alignment-based motif discovery in the promoter sequences of related species. Putative motifs are exhaustively enumerated as words over the IUPAC alphabet and screened for conservation using the branch length score. Additionally, a confidence score is established in a genome-wide fashion. In order to take advantage of a cloud computing infrastructure, the MapReduce programming model is adopted. The method is applied to four monocotyledon plant species and it is shown that high-scoring motifs are significantly enriched for open chromatin regions in Oryza sativa and for transcription factor binding sites inferred through protein-binding microarrays in O.sativa and Zea mays. Furthermore, the method is shown to recover experimentally profiled ga2ox1-like KN1 binding sites in Z.mays. BLSSpeller was written in Java. Source code and manual are available at http://bioinformatics.intec.ugent.be/blsspeller Klaas.Vandepoele@psb.vib-ugent.be or jan.fostier@intec.ugent.be. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
BLSSpeller: exhaustive comparative discovery of conserved cis-regulatory elements

PubMed Central

De Witte, Dieter; Van de Velde, Jan; Decap, Dries; Van Bel, Michiel; Audenaert, Pieter; Demeester, Piet; Dhoedt, Bart; Vandepoele, Klaas; Fostier, Jan

2015-01-01

Motivation: The accurate discovery and annotation of regulatory elements remains a challenging problem. The growing number of sequenced genomes creates new opportunities for comparative approaches to motif discovery. Putative binding sites are then considered to be functional if they are conserved in orthologous promoter sequences of multiple related species. Existing methods for comparative motif discovery usually rely on pregenerated multiple sequence alignments, which are difficult to obtain for more diverged species such as plants. As a consequence, misaligned regulatory elements often remain undetected. Results: We present a novel algorithm that supports both alignment-free and alignment-based motif discovery in the promoter sequences of related species. Putative motifs are exhaustively enumerated as words over the IUPAC alphabet and screened for conservation using the branch length score. Additionally, a confidence score is established in a genome-wide fashion. In order to take advantage of a cloud computing infrastructure, the MapReduce programming model is adopted. The method is applied to four monocotyledon plant species and it is shown that high-scoring motifs are significantly enriched for open chromatin regions in Oryza sativa and for transcription factor binding sites inferred through protein-binding microarrays in O.sativa and Zea mays. Furthermore, the method is shown to recover experimentally profiled ga2ox1-like KN1 binding sites in Z.mays. Availability and implementation: BLSSpeller was written in Java. Source code and manual are available at http://bioinformatics.intec.ugent.be/blsspeller Contact: Klaas.Vandepoele@psb.vib-ugent.be or jan.fostier@intec.ugent.be Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26254488
Genetically engineered livestock for agriculture: a generation after the first transgenic animal research conference.

PubMed

Murray, James D; Maga, Elizabeth A

2016-06-01

At the time of the first Transgenic Animal Research Conference, the lack of knowledge about promoter, enhancer and coding regions of genes of interest greatly hampered our efforts to create transgenes that would express appropriately in livestock. Additionally, we were limited to gene insertion by pronuclear microinjection. As predicted then, widespread genome sequencing efforts and technological advancements have profoundly altered what we can do. There have been many developments in technology to create transgenic animals since we first met at Granlibakken in 1997, including the advent of somatic cell nuclear transfer-based cloning and gene editing. We can now create new transgenes that will express when and where we want and can target precisely in the genome where we want to make a change or insert a transgene. With the large number of sequenced genomes, we have unprecedented access to sequence information including, control regions, coding regions, and known allelic variants. These technological developments have ushered in new and renewed enthusiasm for the production of transgenic animals among scientists and animal agriculturalists around the world, both for the production of more relevant biomedical research models as well as for agricultural applications. However, even though great advancements have been made in our ability to control gene expression and target genetic changes in our animals, there still are no genetically engineered animal products on the market for food. World-wide there has been a failure of the regulatory processes to effectively move forward. Estimates suggest the world will need to increase our current food production 70 % by 2050; that is we will have to produce the total amount of food each year that has been consumed by mankind over the past 500 years. The combination of transgenic animal technology and gene editing will become increasingly more important tools to help feed the world. However, to date the practical benefits of these technologies have not yet reached consumers in any country and in the absence of predictable, science-based regulatory programs it is unlikely that the benefits will be realized in the short to medium term.
Regulated Formation of lncRNA-DNA Hybrids Enables Faster Transcriptional Induction and Environmental Adaptation.

PubMed

Cloutier, Sara C; Wang, Siwen; Ma, Wai Kit; Al Husini, Nadra; Dhoondia, Zuzer; Ansari, Athar; Pascuzzi, Pete E; Tran, Elizabeth J

2016-02-04

Long non-coding (lnc)RNAs, once thought to merely represent noise from imprecise transcription initiation, have now emerged as major regulatory entities in all eukaryotes. In contrast to the rapidly expanding identification of individual lncRNAs, mechanistic characterization has lagged behind. Here we provide evidence that the GAL lncRNAs in the budding yeast S. cerevisiae promote transcriptional induction in trans by formation of lncRNA-DNA hybrids or R-loops. The evolutionarily conserved RNA helicase Dbp2 regulates formation of these R-loops as genomic deletion or nuclear depletion results in accumulation of these structures across the GAL cluster gene promoters and coding regions. Enhanced transcriptional induction is manifested by lncRNA-dependent displacement of the Cyc8 co-repressor and subsequent gene looping, suggesting that these lncRNAs promote induction by altering chromatin architecture. Moreover, the GAL lncRNAs confer a competitive fitness advantage to yeast cells because expression of these non-coding molecules correlates with faster adaptation in response to an environmental switch. Copyright © 2016 Elsevier Inc. All rights reserved.
Antisense Transcription Is Pervasive but Rarely Conserved in Enteric Bacteria

PubMed Central

Raghavan, Rahul; Sloan, Daniel B.; Ochman, Howard

2012-01-01

ABSTRACT Noncoding RNAs, including antisense RNAs (asRNAs) that originate from the complementary strand of protein-coding genes, are involved in the regulation of gene expression in all domains of life. Recent application of deep-sequencing technologies has revealed that the transcription of asRNAs occurs genome-wide in bacteria. Although the role of the vast majority of asRNAs remains unknown, it is often assumed that their presence implies important regulatory functions, similar to those of other noncoding RNAs. Alternatively, many antisense transcripts may be produced by chance transcription events from promoter-like sequences that result from the degenerate nature of bacterial transcription factor binding sites. To investigate the biological relevance of antisense transcripts, we compared genome-wide patterns of asRNA expression in closely related enteric bacteria, Escherichia coli and Salmonella enterica serovar Typhimurium, by performing strand-specific transcriptome sequencing. Although antisense transcripts are abundant in both species, less than 3% of asRNAs are expressed at high levels in both species, and only about 14% appear to be conserved among species. And unlike the promoters of protein-coding genes, asRNA promoters show no evidence of sequence conservation between, or even within, species. Our findings suggest that many or even most bacterial asRNAs are nonadaptive by-products of the cell’s transcription machinery. PMID:22872780
Silencing Effect of Hominoid Highly Conserved Noncoding Sequences on Embryonic Brain Development

PubMed Central

Mahmoudi Saber, Morteza

2017-01-01

Abstract Superfamily Hominoidea, which consists of Hominidae (humans and great apes) and Hylobatidae (gibbons), is well-known for sharing human-like characteristics, however, the genomic origins of these shared unique phenotypes have mainly remained elusive. To decipher the underlying genomic basis of Hominoidea-restricted phenotypes, we identified and characterized Hominoidea-restricted highly conserved noncoding sequences (HCNSs) that are a class of potential regulatory elements which may be involved in evolution of lineage-specific phenotypes. We discovered 679 such HCNSs from human, chimpanzee, gorilla, orangutan and gibbon genomes. These HCNSs were demonstrated to be under purifying selection but with lineage-restricted characteristics different from old CNSs. A significant proportion of their ancestral sequences had accelerated rates of nucleotide substitutions, insertions and deletions during the evolution of common ancestor of Hominoidea, suggesting the intervention of positive Darwinian selection for creating those HCNSs. In contrary to enhancer elements and similar to silencer sequences, these Hominoidea-restricted HCNSs are located in close proximity of transcription start sites. Their target genes are enriched in the nervous system, development and transcription, and they tend to be remotely located from the nearest coding gene. Chip-seq signals and gene expression patterns suggest that Hominoidea-restricted HCNSs are likely to be functional regulatory elements by imposing silencing effects on their target genes in a tissue-restricted manner during fetal brain development. These HCNSs, emerged through adaptive evolution and conserved through purifying selection, represent a set of promising targets for future functional studies of the evolution of Hominoidea-restricted phenotypes. PMID:28633494

Are There Genetic Paths Common to Obesity, Cardiovascular Disease Outcomes, and Cardiovascular Risk Factors?

PubMed Central

Rankinen, Tuomo; Sarzynski, Mark A.; Ghosh, Sujoy; Bouchard, Claude

2015-01-01

Clustering of obesity, coronary artery disease, and cardiovascular disease risk factors is observed in epidemiological studies and clinical settings. Twin and family studies have provided some supporting evidence for the clustering hypothesis. Loci nearest a lead single nucleotide polymorphism (SNP) showing genome-wide significant associations with coronary artery disease, body mass index, C-reactive protein, blood pressure, lipids, and type 2 diabetes mellitus were selected for pathway and network analyses. Eighty-seven autosomal regions (181 SNPs), mapping to 56 genes, were found to be pleiotropic. Most pleiotropic regions contained genes associated with coronary artery disease and plasma lipids, whereas some exhibited coaggregation between obesity and cardiovascular disease risk factors. We observed enrichment for liver X receptor (LXR)/retinoid X receptor (RXR) and farnesoid X receptor/RXR nuclear receptor signaling among pleiotropic genes and for signatures of coronary artery disease and hepatic steatosis. In the search for functionally interacting networks, we found that 43 pleiotropic genes were interacting in a network with an additional 24 linker genes. ENCODE (Encyclopedia of DNA Elements) data were queried for distribution of pleiotropic SNPs among regulatory elements and coding sequence variations. Of the 181 SNPs, 136 were annotated to ≥1 regulatory feature. An enrichment analysis found over-representation of enhancers and DNAse hypersensitive regions when compared against all SNPs of the 1000 Genomes pilot project. In summary, there are genomic regions exerting pleiotropic effects on cardiovascular disease risk factors, although only a few included obesity. Further studies are needed to resolve the clustering in terms of DNA variants, genes, pathways, and actionable targets. PMID:25722444
An integrative and applicable phylogenetic footprinting framework for cis-regulatory motifs identification in prokaryotic genomes.

PubMed

Liu, Bingqiang; Zhang, Hanyuan; Zhou, Chuan; Li, Guojun; Fennell, Anne; Wang, Guanghui; Kang, Yu; Liu, Qi; Ma, Qin

2016-08-09

Phylogenetic footprinting is an important computational technique for identifying cis-regulatory motifs in orthologous regulatory regions from multiple genomes, as motifs tend to evolve slower than their surrounding non-functional sequences. Its application, however, has several difficulties for optimizing the selection of orthologous data and reducing the false positives in motif prediction. Here we present an integrative phylogenetic footprinting framework for accurate motif predictions in prokaryotic genomes (MP(3)). The framework includes a new orthologous data preparation procedure, an additional promoter scoring and pruning method and an integration of six existing motif finding algorithms as basic motif search engines. Specifically, we collected orthologous genes from available prokaryotic genomes and built the orthologous regulatory regions based on sequence similarity of promoter regions. This procedure made full use of the large-scale genomic data and taxonomy information and filtered out the promoters with limited contribution to produce a high quality orthologous promoter set. The promoter scoring and pruning is implemented through motif voting by a set of complementary predicting tools that mine as many motif candidates as possible and simultaneously eliminate the effect of random noise. We have applied the framework to Escherichia coli k12 genome and evaluated the prediction performance through comparison with seven existing programs. This evaluation was systematically carried out at the nucleotide and binding site level, and the results showed that MP(3) consistently outperformed other popular motif finding tools. We have integrated MP(3) into our motif identification and analysis server DMINDA, allowing users to efficiently identify and analyze motifs in 2,072 completely sequenced prokaryotic genomes. The performance evaluation indicated that MP(3) is effective for predicting regulatory motifs in prokaryotic genomes. Its application may enhance progress in elucidating transcription regulation mechanism, thus provide benefit to the genomic research community and prokaryotic genome researchers in particular.
77 FR 3073 - American Society of Mechanical Engineers (ASME) Codes and New and Revised ASME Code Cases...

Federal Register 2010, 2011, 2012, 2013, 2014

2012-01-23

... NUCLEAR REGULATORY COMMISSION 10 CFR Part 50 [NRC-2008-0554] RIN 3150-AI35 American Society of Mechanical Engineers (ASME) Codes and New and Revised ASME Code Cases; Corrections AGENCY: Nuclear Regulatory... the American Society of Mechanical Engineers, Three Park Avenue, New York, NY 10016, phone (800) 843...
Transposable Elements as Stress Adaptive Capacitors Induce Genomic Instability in Fungal Pathogen Magnaporthe oryzae

PubMed Central

Chadha, Sonia; Sharma, Mradul

2014-01-01

A fundamental problem in fungal pathogenesis is to elucidate the evolutionary forces responsible for genomic rearrangements leading to races with fitter genotypes. Understanding the adaptive evolutionary mechanisms requires identification of genomic components and environmental factors reshaping the genome of fungal pathogens to adapt. Herein, Magnaporthe oryzae, a model fungal plant pathogen is used to demonstrate the impact of environmental cues on transposable elements (TE) based genome dynamics. For heat shock and copper stress exposed samples, eight TEs belonging to class I and II family were employed to obtain DNA profiles. Stress induced mutant bands showed a positive correlation with dose/duration of stress and provided evidences of TEs role in stress adaptiveness. Further, we demonstrate that genome dynamics differ for the type/family of TEs upon stress exposition and previous reports of stress induced MAGGY transposition has underestimated the role of TEs in M. oryzae. Here, we identified Pyret, MAGGY, Pot3, MINE, Mg-SINE, Grasshopper and MGLR3 as contributors of high genomic instability in M. oryzae in respective order. Sequencing of mutated bands led to the identification of LTR-retrotransposon sequences within regulatory regions of psuedogenes. DNA transposon Pot3 was identified in the coding regions of chromatin remodelling protein containing tyrosinase copper-binding and PWWP domains. LTR-retrotransposons Pyret and MAGGY are identified as key components responsible for the high genomic instability and perhaps these TEs are utilized by M. oryzae for its acclimatization to adverse environmental conditions. Our results demonstrate how common field stresses change genome dynamics of pathogen and provide perspective to explore the role of TEs in genome adaptability, signalling network and its impact on the virulence of fungal pathogens. PMID:24709911
Rewiring the severe acute respiratory syndrome coronavirus (SARS-CoV) transcription circuit: Engineering a recombination-resistant genome

NASA Astrophysics Data System (ADS)

Yount, Boyd; Roberts, Rhonda S.; Lindesmith, Lisa; Baric, Ralph S.

2006-08-01

Live virus vaccines provide significant protection against many detrimental human and animal diseases, but reversion to virulence by mutation and recombination has reduced appeal. Using severe acute respiratory syndrome coronavirus as a model, we engineered a different transcription regulatory circuit and isolated recombinant viruses. The transcription network allowed for efficient expression of the viral transcripts and proteins, and the recombinant viruses replicated to WT levels. Recombinant genomes were then constructed that contained mixtures of the WT and mutant regulatory circuits, reflecting recombinant viruses that might occur in nature. Although viable viruses could readily be isolated from WT and recombinant genomes containing homogeneous transcription circuits, chimeras that contained mixed regulatory networks were invariantly lethal, because viable chimeric viruses were not isolated. Mechanistically, mixed regulatory circuits promoted inefficient subgenomic transcription from inappropriate start sites, resulting in truncated ORFs and effectively minimize viral structural protein expression. Engineering regulatory transcription circuits of intercommunicating alleles successfully introduces genetic traps into a viral genome that are lethal in RNA recombinant progeny viruses. regulation | systems biology | vaccine design
Variations in the non-coding transcriptome as a driver of inter-strain divergence and physiological adaptation in bacteria.

PubMed

Kopf, Matthias; Klähn, Stephan; Scholz, Ingeborg; Hess, Wolfgang R; Voß, Björn

2015-04-22

In all studied organisms, a substantial portion of the transcriptome consists of non-coding RNAs that frequently execute regulatory functions. Here, we have compared the primary transcriptomes of the cyanobacteria Synechocystis sp. PCC 6714 and PCC 6803 under 10 different conditions. These strains share 2854 protein-coding genes and a 16S rRNA identity of 99.4%, indicating their close relatedness. Conserved major transcriptional start sites (TSSs) give rise to non-coding transcripts within the sigB gene, from the 5'UTRs of cmpA and isiA, and 168 loci in antisense orientation. Distinct differences include single nucleotide polymorphisms rendering promoters inactive in one of the strains, e.g., for cmpR and for the asRNA PsbA2R. Based on the genome-wide mapped location, regulation and classification of TSSs, non-coding transcripts were identified as the most dynamic component of the transcriptome. We identified a class of mRNAs that originate by read-through from an sRNA that accumulates as a discrete and abundant transcript while also serving as the 5'UTR. Such an sRNA/mRNA structure, which we name 'actuaton', represents another way for bacteria to remodel their transcriptional network. Our findings support the hypothesis that variations in the non-coding transcriptome constitute a major evolutionary element of inter-strain divergence and capability for physiological adaptation.
A system-level model for the microbial regulatory genome.

PubMed

Brooks, Aaron N; Reiss, David J; Allard, Antoine; Wu, Wei-Ju; Salvanha, Diego M; Plaisier, Christopher L; Chandrasekaran, Sriram; Pan, Min; Kaur, Amardeep; Baliga, Nitin S

2014-07-15

Microbes can tailor transcriptional responses to diverse environmental challenges despite having streamlined genomes and a limited number of regulators. Here, we present data-driven models that capture the dynamic interplay of the environment and genome-encoded regulatory programs of two types of prokaryotes: Escherichia coli (a bacterium) and Halobacterium salinarum (an archaeon). The models reveal how the genome-wide distributions of cis-acting gene regulatory elements and the conditional influences of transcription factors at each of those elements encode programs for eliciting a wide array of environment-specific responses. We demonstrate how these programs partition transcriptional regulation of genes within regulons and operons to re-organize gene-gene functional associations in each environment. The models capture fitness-relevant co-regulation by different transcriptional control mechanisms acting across the entire genome, to define a generalized, system-level organizing principle for prokaryotic gene regulatory networks that goes well beyond existing paradigms of gene regulation. An online resource (http://egrin2.systemsbiology.net) has been developed to facilitate multiscale exploration of conditional gene regulation in the two prokaryotes. © 2014 The Authors. Published under the terms of the CC BY 4.0 license.
A deep learning method for lincRNA detection using auto-encoder algorithm.

PubMed

Yu, Ning; Yu, Zeng; Pan, Yi

2017-12-06

RNA sequencing technique (RNA-seq) enables scientists to develop novel data-driven methods for discovering more unidentified lincRNAs. Meantime, knowledge-based technologies are experiencing a potential revolution ignited by the new deep learning methods. By scanning the newly found data set from RNA-seq, scientists have found that: (1) the expression of lincRNAs appears to be regulated, that is, the relevance exists along the DNA sequences; (2) lincRNAs contain some conversed patterns/motifs tethered together by non-conserved regions. The two evidences give the reasoning for adopting knowledge-based deep learning methods in lincRNA detection. Similar to coding region transcription, non-coding regions are split at transcriptional sites. However, regulatory RNAs rather than message RNAs are generated. That is, the transcribed RNAs participate the biological process as regulatory units instead of generating proteins. Identifying these transcriptional regions from non-coding regions is the first step towards lincRNA recognition. The auto-encoder method achieves 100% and 92.4% prediction accuracy on transcription sites over the putative data sets. The experimental results also show the excellent performance of predictive deep neural network on the lincRNA data sets compared with support vector machine and traditional neural network. In addition, it is validated through the newly discovered lincRNA data set and one unreported transcription site is found by feeding the whole annotated sequences through the deep learning machine, which indicates that deep learning method has the extensive ability for lincRNA prediction. The transcriptional sequences of lincRNAs are collected from the annotated human DNA genome data. Subsequently, a two-layer deep neural network is developed for the lincRNA detection, which adopts the auto-encoder algorithm and utilizes different encoding schemes to obtain the best performance over intergenic DNA sequence data. Driven by those newly annotated lincRNA data, deep learning methods based on auto-encoder algorithm can exert their capability in knowledge learning in order to capture the useful features and the information correlation along DNA genome sequences for lincRNA detection. As our knowledge, this is the first application to adopt the deep learning techniques for identifying lincRNA transcription sequences.
Mechanistically Distinct Pathways of Divergent Regulatory DNA Creation Contribute to Evolution of Human-Specific Genomic Regulatory Networks Driving Phenotypic Divergence of Homo sapiens

PubMed Central

Glinsky, Gennadi V.

2016-01-01

Abstract Thousands of candidate human-specific regulatory sequences (HSRS) have been identified, supporting the hypothesis that unique to human phenotypes result from human-specific alterations of genomic regulatory networks. Collectively, a compendium of multiple diverse families of HSRS that are functionally and structurally divergent from Great Apes could be defined as the backbone of human-specific genomic regulatory networks. Here, the conservation patterns analysis of 18,364 candidate HSRS was carried out requiring that 100% of bases must remap during the alignments of human, chimpanzee, and bonobo sequences. A total of 5,535 candidate HSRS were identified that are: (i) highly conserved in Great Apes; (ii) evolved by the exaptation of highly conserved ancestral DNA; (iii) defined by either the acceleration of mutation rates on the human lineage or the functional divergence from non-human primates. The exaptation of highly conserved ancestral DNA pathway seems mechanistically distinct from the evolution of regulatory DNA segments driven by the species-specific expansion of transposable elements. Genome-wide proximity placement analysis of HSRS revealed that a small fraction of topologically associating domains (TADs) contain more than half of HSRS from four distinct families. TADs that are enriched for HSRS and termed rapidly evolving in humans TADs (revTADs) comprise 0.8–10.3% of 3,127 TADs in the hESC genome. RevTADs manifest distinct correlation patterns between placements of human accelerated regions, human-specific transcription factor-binding sites, and recombination rates. There is a significant enrichment within revTAD boundaries of hESC-enhancers, primate-specific CTCF-binding sites, human-specific RNAPII-binding sites, hCONDELs, and H3K4me3 peaks with human-specific enrichment at TSS in prefrontal cortex neurons (P < 0.0001 in all instances). Present analysis supports the idea that phenotypic divergence of Homo sapiens is driven by the evolution of human-specific genomic regulatory networks via at least two mechanistically distinct pathways of creation of divergent sequences of regulatory DNA: (i) recombination-associated exaptation of the highly conserved ancestral regulatory DNA segments; (ii) human-specific insertions of transposable elements. PMID:27503290
Consequences of reductive evolution for gene expression in an obligate endosymbiont.

PubMed

Wilcox, Jennifer L; Dunbar, Helen E; Wolfinger, Russell D; Moran, Nancy A

2003-06-01

The smallest cellular genomes are found in obligate symbiotic and pathogenic bacteria living within eukaryotic hosts. In comparison with large genomes of free-living relatives, these reduced genomes are rearranged and have lost most regulatory elements. To test whether reduced bacterial genomes incur reduced regulatory capacities, we used full-genome microarrays to evaluate transcriptional response to environmental stress in Buchnera aphidicola, the obligate endosymbiont of aphids. The 580 genes of the B. aphidicola genome represent a subset of the 4500 genes known from the related organism, Escherichia coli. Although over 20 orthologues of E. coli heat stress (HS) genes are retained by B. aphidicola, only five were differentially expressed after near-lethal heat stress treatments, and only modest shifts were observed. Analyses of upstream regulatory regions revealed loss or degradation of most HS (sigma32) promoters. Genomic rearrangements downstream of an intact HS promoter yielded upregulation of a functionally unrelated and an inactivated gene. Reanalyses of comparable experimental array data for E. coli and Bacillus subtilis revealed that genome-wide differential expression was significantly lower in B. aphidicola. Our demonstration of a diminished stress response validates reports of temperature sensitivity in B. aphidicola and suggests that this reduced bacterial genome exhibits transcriptional inflexibility.
Genome-wide inference of regulatory networks in Streptomyces coelicolor.

PubMed

Castro-Melchor, Marlene; Charaniya, Salim; Karypis, George; Takano, Eriko; Hu, Wei-Shou

2010-10-18

The onset of antibiotics production in Streptomyces species is co-ordinated with differentiation events. An understanding of the genetic circuits that regulate these coupled biological phenomena is essential to discover and engineer the pharmacologically important natural products made by these species. The availability of genomic tools and access to a large warehouse of transcriptome data for the model organism, Streptomyces coelicolor, provides incentive to decipher the intricacies of the regulatory cascades and develop biologically meaningful hypotheses. In this study, more than 500 samples of genome-wide temporal transcriptome data, comprising wild-type and more than 25 regulatory gene mutants of Streptomyces coelicolor probed across multiple stress and medium conditions, were investigated. Information based on transcript and functional similarity was used to update a previously-predicted whole-genome operon map and further applied to predict transcriptional networks constituting modules enriched in diverse functions such as secondary metabolism, and sigma factor. The predicted network displays a scale-free architecture with a small-world property observed in many biological networks. The networks were further investigated to identify functionally-relevant modules that exhibit functional coherence and a consensus motif in the promoter elements indicative of DNA-binding elements. Despite the enormous experimental as well as computational challenges, a systems approach for integrating diverse genome-scale datasets to elucidate complex regulatory networks is beginning to emerge. We present an integrated analysis of transcriptome data and genomic features to refine a whole-genome operon map and to construct regulatory networks at the cistron level in Streptomyces coelicolor. The functionally-relevant modules identified in this study pose as potential targets for further studies and verification.
Regulatory states in the developmental control of gene expression.

PubMed

Peter, Isabelle S

2017-09-01

A growing body of evidence shows that gene expression in multicellular organisms is controlled by the combinatorial function of multiple transcription factors. This indicates that not the individual transcription factors or signaling molecules, but the combination of expressed regulatory molecules, the regulatory state, should be viewed as the functional unit in gene regulation. Here, I discuss the concept of the regulatory state and its proposed role in the genome-wide control of gene expression. Recent analyses of regulatory gene expression in sea urchin embryos have been instrumental for solving the genomic control of cell fate specification in this system. Some of the approaches that were used to determine the expression of regulatory states during sea urchin embryogenesis are reviewed. Significant developmental changes in regulatory state expression leading to the distinct specification of cell fates are regulated by gene regulatory network circuits. How these regulatory state transitions are encoded in the genome is illuminated using the sea urchin endoderm-mesoderms cell fate decision circuit as an example. These observations highlight the importance of considering developmental gene regulation, and the function of individual transcription factors, in the context of regulatory states. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Systems analysis of cis-regulatory motifs in C4 photosynthesis genes using maize and rice leaf transcriptomic data during a process of de-etiolation.

PubMed

Xu, Jiajia; Bräutigam, Andrea; Weber, Andreas P M; Zhu, Xin-Guang

2016-09-01

Identification of potential cis-regulatory motifs controlling the development of C4 photosynthesis is a major focus of current research. In this study, we used time-series RNA-seq data collected from etiolated maize and rice leaf tissues sampled during a de-etiolation process to systematically characterize the expression patterns of C4-related genes and to further identify potential cis elements in five different genomic regions (i.e. promoter, 5'UTR, 3'UTR, intron, and coding sequence) of C4 orthologous genes. The results demonstrate that although most of the C4 genes show similar expression patterns, a number of them, including chloroplast dicarboxylate transporter 1, aspartate aminotransferase, and triose phosphate transporter, show shifted expression patterns compared with their C3 counterparts. A number of conserved short DNA motifs between maize C4 genes and their rice orthologous genes were identified not only in the promoter, 5'UTR, 3'UTR, and coding sequences, but also in the introns of core C4 genes. We also identified cis-regulatory motifs that exist in maize C4 genes and also in genes showing similar expression patterns as maize C4 genes but that do not exist in rice C3 orthologs, suggesting a possible recruitment of pre-existing cis-elements from genes unrelated to C4 photosynthesis into C4 photosynthesis genes during C4 evolution. © The Author 2016. Published by Oxford University Press on behalf of the Society for Experimental Biology.
mQTL-seq delineates functionally relevant candidate gene harbouring a major QTL regulating pod number in chickpea

PubMed Central

Das, Shouvik; Singh, Mohar; Srivastava, Rishi; Bajaj, Deepak; Saxena, Maneesha S.; Rana, Jai C.; Bansal, Kailash C.; Tyagi, Akhilesh K.; Parida, Swarup K.

2016-01-01

The present study used a whole-genome, NGS resequencing-based mQTL-seq (multiple QTL-seq) strategy in two inter-specific mapping populations (Pusa 1103 × ILWC 46 and Pusa 256 × ILWC 46) to scan the major genomic region(s) underlying QTL(s) governing pod number trait in chickpea. Essentially, the whole-genome resequencing of low and high pod number-containing parental accessions and homozygous individuals (constituting bulks) from each of these two mapping populations discovered >8 million high-quality homozygous SNPs with respect to the reference kabuli chickpea. The functional significance of the physically mapped SNPs was apparent from the identified 2,264 non-synonymous and 23,550 regulatory SNPs, with 8–10% of these SNPs-carrying genes corresponding to transcription factors and disease resistance-related proteins. The utilization of these mined SNPs in Δ (SNP index)-led QTL-seq analysis and their correlation between two mapping populations based on mQTL-seq, narrowed down two (CaqaPN4.1: 867.8 kb and CaqaPN4.2: 1.8 Mb) major genomic regions harbouring robust pod number QTLs into the high-resolution short QTL intervals (CaqbPN4.1: 637.5 kb and CaqbPN4.2: 1.28 Mb) on chickpea chromosome 4. The integration of mQTL-seq-derived one novel robust QTL with QTL region-specific association analysis delineated the regulatory (C/T) and coding (C/A) SNPs-containing one pentatricopeptide repeat (PPR) gene at a major QTL region regulating pod number in chickpea. This target gene exhibited anther, mature pollen and pod-specific expression, including pronounced higher up-regulated (∼3.5-folds) transcript expression in high pod number-containing parental accessions and homozygous individuals of two mapping populations especially during pollen and pod development. The proposed mQTL-seq-driven combinatorial strategy has profound efficacy in rapid genome-wide scanning of potential candidate gene(s) underlying trait-associated high-resolution robust QTL(s), thereby expediting genomics-assisted breeding and genetic enhancement of crop plants, including chickpea. PMID:26685680
The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases

PubMed Central

Caspi, Ron; Altman, Tomer; Dale, Joseph M.; Dreher, Kate; Fulcher, Carol A.; Gilham, Fred; Kaipa, Pallavi; Karthikeyan, Athikkattuvalasu S.; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A.; Paley, Suzanne; Popescu, Liviu; Pujar, Anuradha; Shearer, Alexander G.; Zhang, Peifen; Karp, Peter D.

2010-01-01

The MetaCyc database (MetaCyc.org) is a comprehensive and freely accessible resource for metabolic pathways and enzymes from all domains of life. The pathways in MetaCyc are experimentally determined, small-molecule metabolic pathways and are curated from the primary scientific literature. With more than 1400 pathways, MetaCyc is the largest collection of metabolic pathways currently available. Pathways reactions are linked to one or more well-characterized enzymes, and both pathways and enzymes are annotated with reviews, evidence codes, and literature citations. BioCyc (BioCyc.org) is a collection of more than 500 organism-specific Pathway/Genome Databases (PGDBs). Each BioCyc PGDB contains the full genome and predicted metabolic network of one organism. The network, which is predicted by the Pathway Tools software using MetaCyc as a reference, consists of metabolites, enzymes, reactions and metabolic pathways. BioCyc PGDBs also contain additional features, such as predicted operons, transport systems, and pathway hole-fillers. The BioCyc Web site offers several tools for the analysis of the PGDBs, including Omics Viewers that enable visualization of omics datasets on two different genome-scale diagrams and tools for comparative analysis. The BioCyc PGDBs generated by SRI are offered for adoption by any party interested in curation of metabolic, regulatory, and genome-related information about an organism. PMID:19850718
The pineapple genome and the evolution of CAM photosynthesis.

PubMed

Ming, Ray; VanBuren, Robert; Wai, Ching Man; Tang, Haibao; Schatz, Michael C; Bowers, John E; Lyons, Eric; Wang, Ming-Li; Chen, Jung; Biggers, Eric; Zhang, Jisen; Huang, Lixian; Zhang, Lingmao; Miao, Wenjing; Zhang, Jian; Ye, Zhangyao; Miao, Chenyong; Lin, Zhicong; Wang, Hao; Zhou, Hongye; Yim, Won C; Priest, Henry D; Zheng, Chunfang; Woodhouse, Margaret; Edger, Patrick P; Guyot, Romain; Guo, Hao-Bo; Guo, Hong; Zheng, Guangyong; Singh, Ratnesh; Sharma, Anupma; Min, Xiangjia; Zheng, Yun; Lee, Hayan; Gurtowski, James; Sedlazeck, Fritz J; Harkess, Alex; McKain, Michael R; Liao, Zhenyang; Fang, Jingping; Liu, Juan; Zhang, Xiaodan; Zhang, Qing; Hu, Weichang; Qin, Yuan; Wang, Kai; Chen, Li-Yu; Shirley, Neil; Lin, Yann-Rong; Liu, Li-Yu; Hernandez, Alvaro G; Wright, Chris L; Bulone, Vincent; Tuskan, Gerald A; Heath, Katy; Zee, Francis; Moore, Paul H; Sunkar, Ramanjulu; Leebens-Mack, James H; Mockler, Todd; Bennetzen, Jeffrey L; Freeling, Michael; Sankoff, David; Paterson, Andrew H; Zhu, Xinguang; Yang, Xiaohan; Smith, J Andrew C; Cushman, John C; Paull, Robert E; Yu, Qingyi

2015-12-01

Pineapple (Ananas comosus (L.) Merr.) is the most economically valuable crop possessing crassulacean acid metabolism (CAM), a photosynthetic carbon assimilation pathway with high water-use efficiency, and the second most important tropical fruit. We sequenced the genomes of pineapple varieties F153 and MD2 and a wild pineapple relative, Ananas bracteatus accession CB5. The pineapple genome has one fewer ancient whole-genome duplication event than sequenced grass genomes and a conserved karyotype with seven chromosomes from before the ρ duplication event. The pineapple lineage has transitioned from C3 photosynthesis to CAM, with CAM-related genes exhibiting a diel expression pattern in photosynthetic tissues. CAM pathway genes were enriched with cis-regulatory elements associated with the regulation of circadian clock genes, providing the first cis-regulatory link between CAM and circadian clock regulation. Pineapple CAM photosynthesis evolved by the reconfiguration of pathways in C3 plants, through the regulatory neofunctionalization of preexisting genes and not through the acquisition of neofunctionalized genes via whole-genome or tandem gene duplication.
Genomics and the Public Health Code of Ethics

PubMed Central

Thomas, James C.; Irwin, Debra E.; Zuiker, Erin Shaugnessy; Millikan, Robert C.

2005-01-01

We consider the public health applications of genomic technologies as viewed through the lens of the public health code of ethics. We note, for example, the potential for genomics to increase our appreciation for the public health value of interdependence, the potential for some genomic tools to exacerbate health disparities because of their inaccessibility by the poor and the way in which genomics forces public health to refine its notions of prevention. The public health code of ethics sheds light on concerns raised by commercial genomic products that are not discussed in detail by more clinically oriented perspectives. In addition, the concerns raised by genomics highlight areas of our understanding of the ethical principles of public health in which further refinement may be necessary. PMID:16257942
A future scenario of the global regulatory landscape regarding genome-edited crops

PubMed Central

Araki, Motoko

2017-01-01

ABSTRACT The global agricultural landscape regarding the commercial cultivation of genetically modified (GM) crops is mosaic. Meanwhile, a new plant breeding technique, genome editing is expected to make genetic engineering-mediated crop breeding more socially acceptable because it can be used to develop crop varieties without introducing transgenes, which have hampered the regulatory review and public acceptance of GM crops. The present study revealed that product- and process-based concepts have been implemented to regulate GM crops in 30 countries. Moreover, this study analyzed the regulatory responses to genome-edited crops in the USA, Argentina, Sweden and New Zealand. The findings suggested that countries will likely be divided in their policies on genome-edited crops: Some will deregulate transgene-free crops, while others will regulate all types of crops that have been modified by genome editing. These implications are discussed from the viewpoint of public acceptance. PMID:27960622
Comparative genomics of the mimicry switch in Papilio dardanus.

PubMed

Timmermans, Martijn J T N; Baxter, Simon W; Clark, Rebecca; Heckel, David G; Vogel, Heiko; Collins, Steve; Papanicolaou, Alexie; Fukova, Iva; Joron, Mathieu; Thompson, Martin J; Jiggins, Chris D; ffrench-Constant, Richard H; Vogler, Alfried P

2014-07-22

The African Mocker Swallowtail, Papilio dardanus, is a textbook example in evolutionary genetics. Classical breeding experiments have shown that wing pattern variation in this polymorphic Batesian mimic is determined by the polyallelic H locus that controls a set of distinct mimetic phenotypes. Using bacterial artificial chromosome (BAC) sequencing, recombination analyses and comparative genomics, we show that H co-segregates with an interval of less than 500 kb that is collinear with two other Lepidoptera genomes and contains 24 genes, including the transcription factor genes engrailed (en) and invected (inv). H is located in a region of conserved gene order, which argues against any role for genomic translocations in the evolution of a hypothesized multi-gene mimicry locus. Natural populations of P. dardanus show significant associations of specific morphs with single nucleotide polymorphisms (SNPs), centred on en. In addition, SNP variation in the H region reveals evidence of non-neutral molecular evolution in the en gene alone. We find evidence for a duplication potentially driving physical constraints on recombination in the lamborni morph. Absence of perfect linkage disequilibrium between different genes in the other morphs suggests that H is limited to nucleotide positions in the regulatory and coding regions of en. Our results therefore support the hypothesis that a single gene underlies wing pattern variation in P. dardanus.
Characterization of the reniform nematode genome by shotgun sequencing.

PubMed

Nyaku, Seloame T; Sripathi, Venkateswara R; Kantety, Ramesh V; Cseke, Sarah B; Buyyarapu, Ramesh; Mc Ewan, Robert; Gu, Yong Q; Lawrence, Kathy; Senwo, Zachary; Sripathi, Padmini; George, Pheba; Sharma, Govind C

2014-04-01

The reniform nematode (RN), a major agricultural pest particularly on cotton in the United States, is among the major plant-parasitic nematodes for which limited genomic information exists. In this study, over 380 Mb of sequence data were generated from pooled DNA of four adult female RNs and assembled into 67,317 contigs, including 25,904 (38.5%) predicted coding contigs and 41,413 (61.5%) noncoding contigs. Most of the characterized repeats were of low complexity (88.9%), and 0.9% of the contigs matched with 53.2% of GenBank ESTs. The most frequent Gene Ontology (GO) terms for molecular function and biological process were protein binding (32%) and embryonic development (20%). Further analysis showed that 741 (1.1%), 94 (0.1%), and 169 (0.25%) RN genomic contigs matched with 1328 (13.9%), 1480 (5.4%), and 1330 (7.4%) supercontigs of Meloidogyne incognita, Brugia malayi, and Pristionchus pacificus, respectively. Chromosome 5 of Caenorhabditis elegans had the highest number of hits to the RN contigs. Seven putative detoxification genes and three carbohydrate-active enzymes (CAZymes) involved in cell wall degradation were studied in more detail. Additionally, kinases, G protein-coupled receptors, and neuropeptides functioning in physiological, developmental, and regulatory processes were identified in the RN genome.

Feasibility of Genome-Wide Screening for Biosafety Assessment of Probiotics: A Case Study of Lactobacillus helveticus MTCC 5463.

PubMed

Senan, S; Prajapati, J B; Joshi, C G

2015-12-01

Recent years have witnessed an explosion in genome sequencing of probiotic strains for accurate identification and characterization. Regulatory bodies are emphasizing on the need for performing phase I safety studies for probiotics. The main hypothesis of this study was to explore the feasibility of using genome databases for safety screening of strains. In this study, we attempted to develop a framework for the safety assessment of a potential probiotic strain, Lactobacillus helveticus MTCC 5463 based on genome mining for genes associated with antibiotic resistance, production of harmful metabolites, and virulence. The sequencing of MTCC 5463 was performed using GS-FLX Titanium reagents. Genes coding for antibiotic resistance and virulence were identified using Antibiotic Resistance Genes Database and Virulence Factors Database. Results indicated that MTCC 5463 carried antibiotic resistance genes associated with beta-lactam and fluoroquinolone. There is no threat of transfer of these genes to host gut commensals because the genes are not plasmid encoded. The presence of genes for adhesion, biofilm, surface proteins, and stress-related proteins provides robustness to the strain. The presence of hemolysin gene in the genome revealed a theoretical risk of virulence. The results of in silico analysis complemented the in vitro studies and human clinical trials, confirming the safety of the probiotic strain. We propose that the safety assessment of probiotic strains administered live at high doses using a genome-wide screening could be an effective and time-saving tool for identifying prognostic biomarkers of biosafety.
On the Structural Plasticity of the Human Genome: Chromosomal Inversions Revisited

PubMed Central

Alves, Joao M; Lopes, Alexandra M; Chikhi, Lounès; Amorim, António

2012-01-01

With the aid of novel and powerful molecular biology techniques, recent years have witnessed a dramatic increase in the number of studies reporting the involvement of complex structural variants in several genomic disorders. In fact, with the discovery of Copy Number Variants (CNVs) and other forms of unbalanced structural variation, much attention has been directed to the detection and characterization of such rearrangements, as well as the identification of the mechanisms involved in their formation. However, it has long been appreciated that chromosomes can undergo other forms of structural changes - balanced rearrangements - that do not involve quantitative variation of genetic material. Indeed, a particular subtype of balanced rearrangement – inversions – was recently found to be far more common than had been predicted from traditional cytogenetics. Chromosomal inversions alter the orientation of a specific genomic sequence and, unless involving breaks in coding or regulatory regions (and, disregarding complex trans effects, in their close vicinity), appear to be phenotypically silent. Such a surprising finding, which is difficult to reconcile with the classical interpretation of inversions as a mechanism causing subfertility (and ultimately reproductive isolation), motivated a new series of theoretical and empirical studies dedicated to understand their role in human genome evolution and to explore their possible association to complex genetic disorders. With this review, we attempt to describe the latest methodological improvements to inversions detection at a genome wide level, while exploring some of the possible implications of inversion rearrangements on the evolution of the human genome. PMID:23730202
Organisation of the plant genome in chromosomes.

PubMed

Heslop-Harrison, J S Pat; Schwarzacher, Trude

2011-04-01

The plant genome is organized into chromosomes that provide the structure for the genetic linkage groups and allow faithful replication, transcription and transmission of the hereditary information. Genome sizes in plants are remarkably diverse, with a 2350-fold range from 63 to 149,000 Mb, divided into n=2 to n= approximately 600 chromosomes. Despite this huge range, structural features of chromosomes like centromeres, telomeres and chromatin packaging are well-conserved. The smallest genomes consist of mostly coding and regulatory DNA sequences present in low copy, along with highly repeated rDNA (rRNA genes and intergenic spacers), centromeric and telomeric repetitive DNA and some transposable elements. The larger genomes have similar numbers of genes, with abundant tandemly repeated sequence motifs, and transposable elements alone represent more than half the DNA present. Chromosomes evolve by fission, fusion, duplication and insertion events, allowing evolution of chromosome size and chromosome number. A combination of sequence analysis, genetic mapping and molecular cytogenetic methods with comparative analysis, all only becoming widely available in the 21st century, is elucidating the exact nature of the chromosome evolution events at all timescales, from the base of the plant kingdom, to intraspecific or hybridization events associated with recent plant breeding. As well as being of fundamental interest, understanding and exploiting evolutionary mechanisms in plant genomes is likely to be a key to crop development for food production. © 2011 The Authors. The Plant Journal © 2011 Blackwell Publishing Ltd.
Comparative genomics of metabolic capacities of regulons controlled by cis-regulatory RNA motifs in bacteria.

PubMed

Sun, Eric I; Leyn, Semen A; Kazanov, Marat D; Saier, Milton H; Novichkov, Pavel S; Rodionov, Dmitry A

2013-09-02

In silico comparative genomics approaches have been efficiently used for functional prediction and reconstruction of metabolic and regulatory networks. Riboswitches are metabolite-sensing structures often found in bacterial mRNA leaders controlling gene expression on transcriptional or translational levels.An increasing number of riboswitches and other cis-regulatory RNAs have been recently classified into numerous RNA families in the Rfam database. High conservation of these RNA motifs provides a unique advantage for their genomic identification and comparative analysis. A comparative genomics approach implemented in the RegPredict tool was used for reconstruction and functional annotation of regulons controlled by RNAs from 43 Rfam families in diverse taxonomic groups of Bacteria. The inferred regulons include ~5200 cis-regulatory RNAs and more than 12000 target genes in 255 microbial genomes. All predicted RNA-regulated genes were classified into specific and overall functional categories. Analysis of taxonomic distribution of these categories allowed us to establish major functional preferences for each analyzed cis-regulatory RNA motif family. Overall, most RNA motif regulons showed predictable functional content in accordance with their experimentally established effector ligands. Our results suggest that some RNA motifs (including thiamin pyrophosphate and cobalamin riboswitches that control the cofactor metabolism) are widespread and likely originated from the last common ancestor of all bacteria. However, many more analyzed RNA motifs are restricted to a narrow taxonomic group of bacteria and likely represent more recent evolutionary innovations. The reconstructed regulatory networks for major known RNA motifs substantially expand the existing knowledge of transcriptional regulation in bacteria. The inferred regulons can be used for genetic experiments, functional annotations of genes, metabolic reconstruction and evolutionary analysis. The obtained genome-wide collection of reference RNA motif regulons is available in the RegPrecise database (http://regprecise.lbl.gov/).
Population Genomics of Paramecium Species.

PubMed

Johri, Parul; Krenek, Sascha; Marinov, Georgi K; Doak, Thomas G; Berendonk, Thomas U; Lynch, Michael

2017-05-01

Population-genomic analyses are essential to understanding factors shaping genomic variation and lineage-specific sequence constraints. The dearth of such analyses for unicellular eukaryotes prompted us to assess genomic variation in Paramecium, one of the most well-studied ciliate genera. The Paramecium aurelia complex consists of ∼15 morphologically indistinguishable species that diverged subsequent to two rounds of whole-genome duplications (WGDs, as long as 320 MYA) and possess extremely streamlined genomes. We examine patterns of both nuclear and mitochondrial polymorphism, by sequencing whole genomes of 10-13 worldwide isolates of each of three species belonging to the P. aurelia complex: P. tetraurelia, P. biaurelia, P. sexaurelia, as well as two outgroup species that do not share the WGDs: P. caudatum and P. multimicronucleatum. An apparent absence of global geographic population structure suggests continuous or recent dispersal of Paramecium over long distances. Intergenic regions are highly constrained relative to coding sequences, especially in P. caudatum and P. multimicronucleatum that have shorter intergenic distances. Sequence diversity and divergence are reduced up to ∼100-150 bp both upstream and downstream of genes, suggesting strong constraints imposed by the presence of densely packed regulatory modules. In addition, comparison of sequence variation at non-synonymous and synonymous sites suggests similar recent selective pressures on paralogs within and orthologs across the deeply diverging species. This study presents the first genome-wide population-genomic analysis in ciliates and provides a valuable resource for future studies in evolutionary and functional genetics in Paramecium. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Genomic deletion of a long-range bone enhancer misregulatessclerostin in Van Buchem disease

DOE Office of Scientific and Technical Information (OSTI.GOV)

Loots, Gabriela G.; Kneissel, Michaela; Keller, Hansjoerg

2005-04-15

Mutations in distant regulatory elements can negatively impact human development and health, yet due to the difficulty of detecting these critical sequences we predominantly focus on coding sequences for diagnostic purposes. We have undertaken a comparative sequence-based approach to characterize a large noncoding region deleted in patients affected by Van Buchem disease (VB), a severe sclerosing bone dysplasia. Using BAC recombination and transgenesis we characterized the expression of human sclerostin (sost) from normal (hSOSTwt) or Van Buchem(hSOSTvb D) alleles. Only the hSOSTwt allele faithfully expressed high levels of human sost in the adult bone and impacted bone metabolism, consistent withmore » the model that the VB noncoding deletion removes a sost specific regulatory element. By exploiting cross-species sequence comparisons with in vitro and in vivo enhancer assays we were able to identify a candidate enhancer element that drives human sost expression in osteoblast-like cell lines in vitro and in the skeletal anlage of the E14.5 mouse embryo, and discovered a novel function for sclerostin during limb development. Our approach represents a framework for characterizing distant regulatory elements associated with abnormal human phenotypes.« less
Expression of Antisense Long Noncoding RNAs as Potential Regulators in Rainbow Trout with Different Tolerance to Plant-Based Diets.

PubMed

Abernathy, Jason; Overturf, Ken

2018-01-04

Reformulation of aquafeeds in salmonid diets to include more plant proteins is critical for sustainable aquaculture. However, increasing plant proteins can lead to stunted growth and enteritis. Toward an understanding of the regulatory mechanisms behind plant protein utilization, directional RNA sequencing of liver tissues from a rainbow trout strain selected for growth on an all plant-protein diet and a control strain, both fed a plant diet for 12 weeks, were utilized to construct long noncoding RNAs. Antisense long noncoding RNAs were selected for differential expression and functional analyses since they have been shown to have regulatory actions within a genome. A total of 142 unique antisense long noncoding RNAs were differentially expressed between strains, 60 of which could be mapped to a gene. Genes underlying these noncoding RNAs are indicated in lipid metabolism and immunity. Six noncoding transcripts were also found to overlap with differentially expressed protein-coding genes, all of which were co-expressed. Associating variation in regulatory elements between rainbow trout strains with differing tolerance to plant-protein diets will assist in future studies toward increased gains throughout carnivorous aquaculture.
Reverse Engineering of Genome-wide Gene Regulatory Networks from Gene Expression Data

PubMed Central

Liu, Zhi-Ping

2015-01-01

Transcriptional regulation plays vital roles in many fundamental biological processes. Reverse engineering of genome-wide regulatory networks from high-throughput transcriptomic data provides a promising way to characterize the global scenario of regulatory relationships between regulators and their targets. In this review, we summarize and categorize the main frameworks and methods currently available for inferring transcriptional regulatory networks from microarray gene expression profiling data. We overview each of strategies and introduce representative methods respectively. Their assumptions, advantages, shortcomings, and possible improvements and extensions are also clarified and commented. PMID:25937810
A somatic-mutational process recurrently duplicates germline susceptibility loci and tissue-specific super-enhancers in breast cancers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Glodzik, Dominik; Morganella, Sandro; Davies, Helen

Somatic rearrangements contribute to the mutagenized landscape of cancer genomes. Here, we systematically interrogated rearrangements in 560 breast cancers by using a piecewise constant fitting approach. We identified 33 hotspots of large (>100 kb) tandem duplications, a mutational signature associated with homologous-recombination-repair deficiency. Notably, these tandem-duplication hotspots were enriched in breast cancer germline susceptibility loci (odds ratio (OR) = 4.28) and breast-specific 'super-enhancer' regulatory elements (OR = 3.54). These hotspots may be sites of selective susceptibility to double-strand-break damage due to high transcriptional activity or, through incrementally increasing copy number, may be sites of secondary selective pressure. Furthermore, the transcriptomicmore » consequences ranged from strong individual oncogene effects to weak but quantifiable multigene expression effects. We thus present a somatic-rearrangement mutational process affecting coding sequences and noncoding regulatory elements and contributing a continuum of driver consequences, from modest to strong effects, thereby supporting a polygenic model of cancer development.« less
A somatic-mutational process recurrently duplicates germline susceptibility loci and tissue-specific super-enhancers in breast cancers

DOE PAGES

Glodzik, Dominik; Morganella, Sandro; Davies, Helen; ...

2017-01-23

Somatic rearrangements contribute to the mutagenized landscape of cancer genomes. Here, we systematically interrogated rearrangements in 560 breast cancers by using a piecewise constant fitting approach. We identified 33 hotspots of large (>100 kb) tandem duplications, a mutational signature associated with homologous-recombination-repair deficiency. Notably, these tandem-duplication hotspots were enriched in breast cancer germline susceptibility loci (odds ratio (OR) = 4.28) and breast-specific 'super-enhancer' regulatory elements (OR = 3.54). These hotspots may be sites of selective susceptibility to double-strand-break damage due to high transcriptional activity or, through incrementally increasing copy number, may be sites of secondary selective pressure. Furthermore, the transcriptomicmore » consequences ranged from strong individual oncogene effects to weak but quantifiable multigene expression effects. We thus present a somatic-rearrangement mutational process affecting coding sequences and noncoding regulatory elements and contributing a continuum of driver consequences, from modest to strong effects, thereby supporting a polygenic model of cancer development.« less
Cnidarian Cell Type Diversity and Regulation Revealed by Whole-Organism Single-Cell RNA-Seq.

PubMed

Sebé-Pedrós, Arnau; Saudemont, Baptiste; Chomsky, Elad; Plessier, Flora; Mailhé, Marie-Pierre; Renno, Justine; Loe-Mie, Yann; Lifshitz, Aviezer; Mukamel, Zohar; Schmutz, Sandrine; Novault, Sophie; Steinmetz, Patrick R H; Spitz, François; Tanay, Amos; Marlow, Heather

2018-05-31

The emergence and diversification of cell types is a leading factor in animal evolution. So far, systematic characterization of the gene regulatory programs associated with cell type specificity was limited to few cell types and few species. Here, we perform whole-organism single-cell transcriptomics to map adult and larval cell types in the cnidarian Nematostella vectensis, a non-bilaterian animal with complex tissue-level body-plan organization. We uncover eight broad cell classes in Nematostella, including neurons, cnidocytes, and digestive cells. Each class comprises different subtypes defined by the expression of multiple specific markers. In particular, we characterize a surprisingly diverse repertoire of neurons, which comparative analysis suggests are the result of lineage-specific diversification. By integrating transcription factor expression, chromatin profiling, and sequence motif analysis, we identify the regulatory codes that underlie Nematostella cell-specific expression. Our study reveals cnidarian cell type complexity and provides insights into the evolution of animal cell-specific genomic regulation. Copyright © 2018 Elsevier Inc. All rights reserved.
Whole Genome and Tandem Duplicate Retention Facilitated Glucosinolate Pathway Diversification in the Mustard Family

PubMed Central

Hofberger, Johannes A.; Lyons, Eric; Edger, Patrick P.; Chris Pires, J.; Eric Schranz, M.

2013-01-01

Plants share a common history of successive whole-genome duplication (WGD) events retaining genomic patterns of duplicate gene copies (ohnologs) organized in conserved syntenic blocks. Duplication was often proposed to affect the origin of novel traits during evolution. However, genetic evidence linking WGD to pathway diversification is scarce. We show that WGD and tandem duplication (TD) accelerated genetic versatility of plant secondary metabolism, exemplified with the glucosinolate (GS) pathway in the mustard family. GS biosynthesis is a well-studied trait, employing at least 52 biosynthetic and regulatory genes in the model plant Arabidopsis. In a phylogenomics approach, we identified 67 GS loci in Aethionema arabicum of the tribe Aethionemae, sister group to all mustard family members. All but one of the Arabidopsis GS gene families evolved orthologs in Aethionema and all but one of the orthologous sequence pairs exhibit synteny. The 45% fraction of duplicates among all protein-coding genes in Arabidopsis was increased to 95% and 97% for Arabidopsis and Aethionema GS pathway inventory, respectively. Compared with the 22% average for all protein-coding genes in Arabidopsis, 52% and 56% of Aethionema and Arabidopsis GS loci align to ohnolog copies dating back to the last common WGD event. Although 15% of all Arabidopsis genes are organized in tandem arrays, 45% and 48% of GS loci in Arabidopsis and Aethionema descend from TD, respectively. We describe a sequential combination of TD and WGD events driving gene family extension, thereby expanding the evolutionary playground for functional diversification and thus potential novelty and success. PMID:24171911
Functional interrogation of non-coding DNA through CRISPR genome editing

PubMed Central

Canver, Matthew C.; Bauer, Daniel E.; Orkin, Stuart H.

2017-01-01

Methodologies to interrogate non-coding regions have lagged behind coding regions despite comprising the vast majority of the genome. However, the rapid evolution of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing has provided a multitude of novel techniques for laboratory investigation including significant contributions to the toolbox for studying non-coding DNA. CRISPR-mediated loss-of-function strategies rely on direct disruption of the underlying sequence or repression of transcription without modifying the targeted DNA sequence. CRISPR-mediated gain-of-function approaches similarly benefit from methods to alter the targeted sequence through integration of customized sequence into the genome as well as methods to activate transcription. Here we review CRISPR-based loss- and gain-of-function techniques for the interrogation of non-coding DNA. PMID:28288828
A genomic regulatory network for development

NASA Technical Reports Server (NTRS)

Davidson, Eric H.; Rast, Jonathan P.; Oliveri, Paola; Ransick, Andrew; Calestani, Cristina; Yuh, Chiou-Hwa; Minokawa, Takuya; Amore, Gabriele; Hinman, Veronica; Arenas-Mena, Cesar;

2002-01-01

Development of the body plan is controlled by large networks of regulatory genes. A gene regulatory network that controls the specification of endoderm and mesoderm in the sea urchin embryo is summarized here. The network was derived from large-scale perturbation analyses, in combination with computational methodologies, genomic data, cis-regulatory analysis, and molecular embryology. The network contains over 40 genes at present, and each node can be directly verified at the DNA sequence level by cis-regulatory analysis. Its architecture reveals specific and general aspects of development, such as how given cells generate their ordained fates in the embryo and why the process moves inexorably forward in developmental time.

Genome-scale deletion screening of human long non-coding RNAs using a paired-guide RNA CRISPR library

PubMed Central

Zhu, Shiyou; Li, Wei; Liu, Jingze; Chen, Chen-Hao; Liao, Qi; Xu, Ping; Xu, Han; Xiao, Tengfei; Cao, Zhongzheng; Peng, Jingyu; Yuan, Pengfei; Brown, Myles; Liu, Xiaole Shirley; Wei, Wensheng

2017-01-01

CRISPR/Cas9 screens have been widely adopted to analyse coding gene functions, but high throughput screening of non-coding elements using this method is more challenging, because indels caused by a single cut in non-coding regions are unlikely to produce a functional knockout. A high-throughput method to produce deletions of non-coding DNA is needed. Herein, we report a high throughput genomic deletion strategy to screen for functional long non-coding RNAs (lncRNAs) that is based on a lentiviral paired-guide RNA (pgRNA) library. Applying our screening method, we identified 51 lncRNAs that can positively or negatively regulate human cancer cell growth. We individually validated 9 lncRNAs using CRISPR/Cas9-mediated genomic deletion and functional rescue, CRISPR activation or inhibition, and gene expression profiling. Our high-throughput pgRNA genome deletion method should enable rapid identification of functional mammalian non-coding elements. PMID:27798563
The X chromosome in space.

PubMed

Jégu, Teddy; Aeby, Eric; Lee, Jeannie T

2017-06-01

Extensive 3D folding is required to package a genome into the tiny nuclear space, and this packaging must be compatible with proper gene expression. Thus, in the well-hierarchized nucleus, chromosomes occupy discrete territories and adopt specific 3D organizational structures that facilitate interactions between regulatory elements for gene expression. The mammalian X chromosome exemplifies this structure-function relationship. Recent studies have shown that, upon X-chromosome inactivation, active and inactive X chromosomes localize to different subnuclear positions and adopt distinct chromosomal architectures that reflect their activity states. Here, we review the roles of long non-coding RNAs, chromosomal organizational structures and the subnuclear localization of chromosomes as they relate to X-linked gene expression.
Genome sequence of an Australian kangaroo, Macropus eugenii, provides insight into the evolution of mammalian reproduction and development

PubMed Central

2011-01-01

Background We present the genome sequence of the tammar wallaby, Macropus eugenii, which is a member of the kangaroo family and the first representative of the iconic hopping mammals that symbolize Australia to be sequenced. The tammar has many unusual biological characteristics, including the longest period of embryonic diapause of any mammal, extremely synchronized seasonal breeding and prolonged and sophisticated lactation within a well-defined pouch. Like other marsupials, it gives birth to highly altricial young, and has a small number of very large chromosomes, making it a valuable model for genomics, reproduction and development. Results The genome has been sequenced to 2 × coverage using Sanger sequencing, enhanced with additional next generation sequencing and the integration of extensive physical and linkage maps to build the genome assembly. We also sequenced the tammar transcriptome across many tissues and developmental time points. Our analyses of these data shed light on mammalian reproduction, development and genome evolution: there is innovation in reproductive and lactational genes, rapid evolution of germ cell genes, and incomplete, locus-specific X inactivation. We also observe novel retrotransposons and a highly rearranged major histocompatibility complex, with many class I genes located outside the complex. Novel microRNAs in the tammar HOX clusters uncover new potential mammalian HOX regulatory elements. Conclusions Analyses of these resources enhance our understanding of marsupial gene evolution, identify marsupial-specific conserved non-coding elements and critical genes across a range of biological systems, including reproduction, development and immunity, and provide new insight into marsupial and mammalian biology and genome evolution. PMID:21854559
Genome sequence of an Australian kangaroo, Macropus eugenii, provides insight into the evolution of mammalian reproduction and development.

PubMed

Renfree, Marilyn B; Papenfuss, Anthony T; Deakin, Janine E; Lindsay, James; Heider, Thomas; Belov, Katherine; Rens, Willem; Waters, Paul D; Pharo, Elizabeth A; Shaw, Geoff; Wong, Emily S W; Lefèvre, Christophe M; Nicholas, Kevin R; Kuroki, Yoko; Wakefield, Matthew J; Zenger, Kyall R; Wang, Chenwei; Ferguson-Smith, Malcolm; Nicholas, Frank W; Hickford, Danielle; Yu, Hongshi; Short, Kirsty R; Siddle, Hannah V; Frankenberg, Stephen R; Chew, Keng Yih; Menzies, Brandon R; Stringer, Jessica M; Suzuki, Shunsuke; Hore, Timothy A; Delbridge, Margaret L; Patel, Hardip R; Mohammadi, Amir; Schneider, Nanette Y; Hu, Yanqiu; O'Hara, William; Al Nadaf, Shafagh; Wu, Chen; Feng, Zhi-Ping; Cocks, Benjamin G; Wang, Jianghui; Flicek, Paul; Searle, Stephen M J; Fairley, Susan; Beal, Kathryn; Herrero, Javier; Carone, Dawn M; Suzuki, Yutaka; Sugano, Sumio; Toyoda, Atsushi; Sakaki, Yoshiyuki; Kondo, Shinji; Nishida, Yuichiro; Tatsumoto, Shoji; Mandiou, Ion; Hsu, Arthur; McColl, Kaighin A; Lansdell, Benjamin; Weinstock, George; Kuczek, Elizabeth; McGrath, Annette; Wilson, Peter; Men, Artem; Hazar-Rethinam, Mehlika; Hall, Allison; Davis, John; Wood, David; Williams, Sarah; Sundaravadanam, Yogi; Muzny, Donna M; Jhangiani, Shalini N; Lewis, Lora R; Morgan, Margaret B; Okwuonu, Geoffrey O; Ruiz, San Juana; Santibanez, Jireh; Nazareth, Lynne; Cree, Andrew; Fowler, Gerald; Kovar, Christie L; Dinh, Huyen H; Joshi, Vandita; Jing, Chyn; Lara, Fremiet; Thornton, Rebecca; Chen, Lei; Deng, Jixin; Liu, Yue; Shen, Joshua Y; Song, Xing-Zhi; Edson, Janette; Troon, Carmen; Thomas, Daniel; Stephens, Amber; Yapa, Lankesha; Levchenko, Tanya; Gibbs, Richard A; Cooper, Desmond W; Speed, Terence P; Fujiyama, Asao; Graves, Jennifer A M; O'Neill, Rachel J; Pask, Andrew J; Forrest, Susan M; Worley, Kim C

2011-08-29

We present the genome sequence of the tammar wallaby, Macropus eugenii, which is a member of the kangaroo family and the first representative of the iconic hopping mammals that symbolize Australia to be sequenced. The tammar has many unusual biological characteristics, including the longest period of embryonic diapause of any mammal, extremely synchronized seasonal breeding and prolonged and sophisticated lactation within a well-defined pouch. Like other marsupials, it gives birth to highly altricial young, and has a small number of very large chromosomes, making it a valuable model for genomics, reproduction and development. The genome has been sequenced to 2 × coverage using Sanger sequencing, enhanced with additional next generation sequencing and the integration of extensive physical and linkage maps to build the genome assembly. We also sequenced the tammar transcriptome across many tissues and developmental time points. Our analyses of these data shed light on mammalian reproduction, development and genome evolution: there is innovation in reproductive and lactational genes, rapid evolution of germ cell genes, and incomplete, locus-specific X inactivation. We also observe novel retrotransposons and a highly rearranged major histocompatibility complex, with many class I genes located outside the complex. Novel microRNAs in the tammar HOX clusters uncover new potential mammalian HOX regulatory elements. Analyses of these resources enhance our understanding of marsupial gene evolution, identify marsupial-specific conserved non-coding elements and critical genes across a range of biological systems, including reproduction, development and immunity, and provide new insight into marsupial and mammalian biology and genome evolution.
Genomic Sequence around Butterfly Wing Development Genes: Annotation and Comparative Analysis

PubMed Central

Conceição, Inês C.; Long, Anthony D.; Gruber, Jonathan D.; Beldade, Patrícia

2011-01-01

Background Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. Methodology/Principal Findings We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). Conclusions The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high conservation of non-coding sequence around the genes wingless and Ecdysone receptor, both involved in multiple developmental processes including wing pattern formation. PMID:21909358
Stallion sperm transcriptome comprises functionally coherent coding and regulatory RNAs as revealed by microarray analysis and RNA-seq.

PubMed

Das, Pranab J; McCarthy, Fiona; Vishnoi, Monika; Paria, Nandina; Gresham, Cathy; Li, Gang; Kachroo, Priyanka; Sudderth, A Kendrick; Teague, Sheila; Love, Charles C; Varner, Dickson D; Chowdhary, Bhanu P; Raudsepp, Terje

2013-01-01

Mature mammalian sperm contain a complex population of RNAs some of which might regulate spermatogenesis while others probably play a role in fertilization and early development. Due to this limited knowledge, the biological functions of sperm RNAs remain enigmatic. Here we report the first characterization of the global transcriptome of the sperm of fertile stallions. The findings improved understanding of the biological significance of sperm RNAs which in turn will allow the discovery of sperm-based biomarkers for stallion fertility. The stallion sperm transcriptome was interrogated by analyzing sperm and testes RNA on a 21,000-element equine whole-genome oligoarray and by RNA-seq. Microarray analysis revealed 6,761 transcripts in the sperm, of which 165 were sperm-enriched, and 155 were differentially expressed between the sperm and testes. Next, 70 million raw reads were generated by RNA-seq of which 50% could be aligned with the horse reference genome. A total of 19,257 sequence tags were mapped to all horse chromosomes and the mitochondrial genome. The highest density of mapped transcripts was in gene-rich ECA11, 12 and 13, and the lowest in gene-poor ECA9 and X; 7 gene transcripts originated from ECAY. Structural annotation aligned sperm transcripts with 4,504 known horse and/or human genes, rRNAs and 82 miRNAs, whereas 13,354 sequence tags remained anonymous. The data were aligned with selected equine gene models to identify additional exons and splice variants. Gene Ontology annotations showed that sperm transcripts were associated with molecular processes (chemoattractant-activated signal transduction, ion transport) and cellular components (membranes and vesicles) related to known sperm functions at fertilization, while some messenger and micro RNAs might be critical for early development. The findings suggest that the rich repertoire of coding and non-coding RNAs in stallion sperm is not a random remnant from spermatogenesis in testes but a selectively retained and functionally coherent collection of RNAs.

Stallion Sperm Transcriptome Comprises Functionally Coherent Coding and Regulatory RNAs as Revealed by Microarray Analysis and RNA-seq

PubMed Central

Das, Pranab J.; McCarthy, Fiona; Vishnoi, Monika; Paria, Nandina; Gresham, Cathy; Li, Gang; Kachroo, Priyanka; Sudderth, A. Kendrick; Teague, Sheila; Love, Charles C.; Varner, Dickson D.; Chowdhary, Bhanu P.; Raudsepp, Terje

2013-01-01

Mature mammalian sperm contain a complex population of RNAs some of which might regulate spermatogenesis while others probably play a role in fertilization and early development. Due to this limited knowledge, the biological functions of sperm RNAs remain enigmatic. Here we report the first characterization of the global transcriptome of the sperm of fertile stallions. The findings improved understanding of the biological significance of sperm RNAs which in turn will allow the discovery of sperm-based biomarkers for stallion fertility. The stallion sperm transcriptome was interrogated by analyzing sperm and testes RNA on a 21,000-element equine whole-genome oligoarray and by RNA-seq. Microarray analysis revealed 6,761 transcripts in the sperm, of which 165 were sperm-enriched, and 155 were differentially expressed between the sperm and testes. Next, 70 million raw reads were generated by RNA-seq of which 50% could be aligned with the horse reference genome. A total of 19,257 sequence tags were mapped to all horse chromosomes and the mitochondrial genome. The highest density of mapped transcripts was in gene-rich ECA11, 12 and 13, and the lowest in gene-poor ECA9 and X; 7 gene transcripts originated from ECAY. Structural annotation aligned sperm transcripts with 4,504 known horse and/or human genes, rRNAs and 82 miRNAs, whereas 13,354 sequence tags remained anonymous. The data were aligned with selected equine gene models to identify additional exons and splice variants. Gene Ontology annotations showed that sperm transcripts were associated with molecular processes (chemoattractant-activated signal transduction, ion transport) and cellular components (membranes and vesicles) related to known sperm functions at fertilization, while some messenger and micro RNAs might be critical for early development. The findings suggest that the rich repertoire of coding and non-coding RNAs in stallion sperm is not a random remnant from spermatogenesis in testes but a selectively retained and functionally coherent collection of RNAs. PMID:23409192
Genomics in Regulatory Ecotoxicology: Applications and Challenges: Publication Announcement

EPA Science Inventory

In September, 2005, a workshop was held at the original home of the SETAC Pellston series, in Pellston, MI. The goal of this workshop was to explore whether and how genomics could be effectively applied to challenges in regulatory ecotoxicology. The book emanating from the worksh...
Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium

PubMed Central

2010-01-01

Background In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties to generate reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. Results By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Conclusions Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal genes expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes. PMID:20441586
RSAT 2018: regulatory sequence analysis tools 20th anniversary.

PubMed

Nguyen, Nga Thi Thuy; Contreras-Moreira, Bruno; Castro-Mondragon, Jaime A; Santana-Garcia, Walter; Ossio, Raul; Robles-Espinoza, Carla Daniela; Bahin, Mathieu; Collombet, Samuel; Vincens, Pierre; Thieffry, Denis; van Helden, Jacques; Medina-Rivera, Alejandra; Thomas-Chollier, Morgane

2018-05-02

RSAT (Regulatory Sequence Analysis Tools) is a suite of modular tools for the detection and the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, including from genome-wide datasets like ChIP-seq/ATAC-seq, (ii) motif scanning, (iii) motif analysis (quality assessment, comparisons and clustering), (iv) analysis of regulatory variations, (v) comparative genomics. Six public servers jointly support 10 000 genomes from all kingdoms. Six novel or refactored programs have been added since the 2015 NAR Web Software Issue, including updated programs to analyse regulatory variants (retrieve-variation-seq, variation-scan, convert-variations), along with tools to extract sequences from a list of coordinates (retrieve-seq-bed), to select motifs from motif collections (retrieve-matrix), and to extract orthologs based on Ensembl Compara (get-orthologs-compara). Three use cases illustrate the integration of new and refactored tools to the suite. This Anniversary update gives a 20-year perspective on the software suite. RSAT is well-documented and available through Web sites, SOAP/WSDL (Simple Object Access Protocol/Web Services Description Language) web services, virtual machines and stand-alone programs at http://www.rsat.eu/.
Long Non-Coding RNAs: A Novel Paradigm for Toxicology

PubMed Central

Dempsey, Joseph L.; Cui, Julia Yue

2017-01-01

Long non-coding RNAs (lncRNAs) are over 200 nucleotides in length and are transcribed from the mammalian genome in a tissue-specific and developmentally regulated pattern. There is growing recognition that lncRNAs are novel biomarkers and/or key regulators of toxicological responses in humans and animal models. Lacking protein-coding capacity, the numerous types of lncRNAs possess a myriad of transcriptional regulatory functions that include cis and trans gene expression, transcription factor activity, chromatin remodeling, imprinting, and enhancer up-regulation. LncRNAs also influence mRNA processing, post-transcriptional regulation, and protein trafficking. Dysregulation of lncRNAs has been implicated in various human health outcomes such as various cancers, Alzheimer’s disease, cardiovascular disease, autoimmune diseases, as well as intermediary metabolism such as glucose, lipid, and bile acid homeostasis. Interestingly, emerging evidence in the literature over the past five years has shown that lncRNA regulation is impacted by exposures to various chemicals such as polycyclic aromatic hydrocarbons, benzene, cadmium, chlorpyrifos-methyl, bisphenol A, phthalates, phenols, and bile acids. Recent technological advancements, including next-generation sequencing technologies and novel computational algorithms, have enabled the profiling and functional characterizations of lncRNAs on a genomic scale. In this review, we summarize the biogenesis and general biological functions of lncRNAs, highlight the important roles of lncRNAs in human diseases and especially during the toxicological responses to various xenobiotics, evaluate current methods for identifying aberrant lncRNA expression and molecular target interactions, and discuss the potential to implement these tools to address fundamental questions in toxicology. PMID:27864543
The complete mitochondrial genome of Rapana venosa (Gastropoda, Muricidae).

PubMed

Sun, Xiujun; Yang, Aiguo

2016-01-01

The complete mitochondrial (mt) genome of the veined rapa whelk, Rapana venosa, was determined using genome walking techniques in this study. The total length of the mt genome sequence of R. venosa was 15,271 bp, which is comparable to the reported Muricidae mitogenomes to date. It contained 13 protein-coding genes, 21 transfer RNA genes, and two ribosomal RNA genes. A bias towards a higher representation of nucleotides A and T (69%) was detected in the mt genome of R. venosa. A small number of non-coding nucleotides (302 bp) was detected, and the largest non-coding region was 74 bp in length.
Identification and Differential Abundance of Mitochondrial Genome Encoding Small RNAs (mitosRNA) in Breast Muscles of Modern Broilers and Unselected Chicken Breed

PubMed Central

Bottje, Walter G.; Khatri, Bhuwan; Shouse, Stephanie A.; Seo, Dongwon; Mallmann, Barbara; Orlowski, Sara K.; Pan, Jeonghoon; Kong, Seongbae; Owens, Casey M.; Anthony, Nicholas B.; Kim, Jae K.; Kong, Byungwhi C.

2017-01-01

Background: Although small non-coding RNAs are mostly encoded by the nuclear genome, thousands of small non-coding RNAs encoded by the mitochondrial genome, termed as mitosRNAs were recently reported in human, mouse and trout. In this study, we first identified chicken mitosRNAs in breast muscle using small RNA sequencing method and the differential abundance was analyzed between modern pedigree male (PeM) broilers (characterized by rapid growth and large muscle mass) and the foundational Barred Plymouth Rock (BPR) chickens (characterized by slow growth and small muscle mass). Methods: Small RNA sequencing was performed with total RNAs extracted from breast muscles of PeM and BPR (n = 6 per group) using the 1 × 50 bp single end read method of Illumina sequencing. Raw reads were processed by quality assessment, adapter trimming, and alignment to the chicken mitochondrial genome (GenBank Accession: X52392.1) using the NGen program. Further statistical analyses were performed using the JMP Genomics 8. Differentially expressed (DE) mitosRNAs between PeM and BPR were confirmed by quantitative PCR. Results: Totals of 183,416 unique small RNA sequences were identified as potential chicken mitosRNAs. After stringent filtering processes, 117 mitosRNAs showing >100 raw read counts were abundantly produced from all 37 mitochondrial genes (except D-loop region) and the length of mitosRNAs ranged from 22 to 46 nucleotides. Of those, abundance of 44 mitosRNAs were significantly altered in breast muscles of PeM compared to those of BPR: all mitosRNAs were higher in PeM breast except those produced from 16S-rRNA gene. Possibly, the higher mitosRNAs abundance in PeM breast may be due to a higher mitochondrial content compared to BPR. Our data demonstrate that in addition to 37 known mitochondrial genes, the mitochondrial genome also encodes abundant mitosRNAs, that may play an important regulatory role in muscle growth via mitochondrial gene expression control. PMID:29104541
Characterization of a species-specific repetitive DNA from a highly endangered wild animal, Rhinoceros unicornis, and assessment of genetic polymorphism by microsatellite associated sequence amplification (MASA).

PubMed

Ali, S; Azfer, M A; Bashamboo, A; Mathur, P K; Malik, P K; Mathur, V B; Raha, A K; Ansari, S

1999-03-04

We have cloned and sequenced a 906bp EcoRI repeat DNA fraction from Rhinoceros unicornis genome. The contig pSS(R)2 is AT rich with 340 A (37.53%), 187 C (20.64%), 173 G (19.09%) and 206 T (22.74%). The sequence contains MALT box, NF-E1, Poly-A signal, lariat consensus sequences, TATA box, translational initiation sequences and several stop codons. Translation of the contig showed seven different types of protein motifs, among which, EGF-like domain cysteine pattern signatures and Bowman-Birk serine protease inhibitor family signatures were prominent. The presence of eukaryotic transcriptional elements, protein signatures and analysis of subset sequences in the 5' region from 1 to 165nt indicating coding potential (test code value=0.97) suggest possible regulatory and/or functional role(s) of these sequences in the rhino genome. Translation of the complementary strand from 906 to 706nt and 190 to 2nt showed proteins of more than 7kDa rich in non-polar residues. This suggests that pSS(R)2 is either a part of, or adjacent to, a functional gene. The contig contains mostly non-consecutive simple repeat units from 2 to 17nt with varying frequencies, of which four base motifs were found to be predominant. Zoo-blot hybridization revealed that pSS(R)2 sequences are unique to R. unicornis genome because they do not cross-hybridize, even with the genomic DNA of South African black rhino Diceros bicornis. Southern blot analysis of R. unicornis genomic DNA with pSS(R)2 and other synthetic oligo probes revealed a high level of genetic homogeneity, which was also substantiated by microsatellite associated sequence amplification (MASA). Owing to its uniqueness, the pSS(R)2 probe has a potential application in the area of conservation biology for unequivocal identification of horn or other body tissues of R. unicornis. The evolutionary aspect of this repeat fraction in the context of comparative genome analysis is discussed.
Short-lived non-coding transcripts (SLiTs): Clues to regulatory long non-coding RNA.

PubMed

Tani, Hidenori

2017-03-22

Whole transcriptome analyses have revealed a large number of novel long non-coding RNAs (lncRNAs). Although the importance of lncRNAs has been documented in previous reports, the biological and physiological functions of lncRNAs remain largely unknown. The role of lncRNAs seems an elusive problem. Here, I propose a clue to the identification of regulatory lncRNAs. The key point is RNA half-life. RNAs with a long half-life (t 1/2 > 4 h) contain a significant proportion of ncRNAs, as well as mRNAs involved in housekeeping functions, whereas RNAs with a short half-life (t 1/2 < 4 h) include known regulatory ncRNAs and regulatory mRNAs. This novel class of ncRNAs with a short half-life can be categorized as Short-Lived non-coding Transcripts (SLiTs). I consider that SLiTs are likely to be rich in functionally uncharacterized regulatory RNAs. This review describes recent progress in research into SLiTs.
An integrative approach to predicting the functional effects of small indels in non-coding regions of the human genome

PubMed Central

Ferlaino, Michael; Rogers, Mark F.; Shihab, Hashem A.; Mort, Matthew; Cooper, David N.; Gaunt, Tom R.; Campbell, Colin

2018-01-01

Background Small insertions and deletions (indels) have a significant influence in human disease and, in terms of frequency, they are second only to single nucleotide variants as pathogenic mutations. As the majority of mutations associated with complex traits are located outside the exome, it is crucial to investigate the potential pathogenic impact of indels in non-coding regions of the human genome. Results We present FATHMM-indel, an integrative approach to predict the functional effect, pathogenic or neutral, of indels in non-coding regions of the human genome. Our method exploits various genomic annotations in addition to sequence data. When validated on benchmark data, FATHMM-indel significantly outperforms CADD and GAVIN, state of the art models in assessing the pathogenic impact of non-coding variants. FATHMM-indel is available via a web server at indels.biocompute.org.uk. Conclusions FATHMM-indel can accurately predict the functional impact and prioritise small indels throughout the whole non-coding genome. PMID:28985712
An integrative approach to predicting the functional effects of small indels in non-coding regions of the human genome.

PubMed

Ferlaino, Michael; Rogers, Mark F; Shihab, Hashem A; Mort, Matthew; Cooper, David N; Gaunt, Tom R; Campbell, Colin

2017-10-06

Small insertions and deletions (indels) have a significant influence in human disease and, in terms of frequency, they are second only to single nucleotide variants as pathogenic mutations. As the majority of mutations associated with complex traits are located outside the exome, it is crucial to investigate the potential pathogenic impact of indels in non-coding regions of the human genome. We present FATHMM-indel, an integrative approach to predict the functional effect, pathogenic or neutral, of indels in non-coding regions of the human genome. Our method exploits various genomic annotations in addition to sequence data. When validated on benchmark data, FATHMM-indel significantly outperforms CADD and GAVIN, state of the art models in assessing the pathogenic impact of non-coding variants. FATHMM-indel is available via a web server at indels.biocompute.org.uk. FATHMM-indel can accurately predict the functional impact and prioritise small indels throughout the whole non-coding genome.
Comparative Bioinformatics Analysis of Transcription Factor Genes Indicates Conservation of Key Regulatory Domains among Babesia bovis, Babesia microti, and Theileria equi.

PubMed

Alzan, Heba F; Knowles, Donald P; Suarez, Carlos E

2016-11-01

Apicomplexa tick-borne hemoparasites, including Babesia bovis, Babesia microti, and Theileria equi are responsible for bovine and human babesiosis and equine theileriosis, respectively. These parasites of vast medical, epidemiological, and economic impact have complex life cycles in their vertebrate and tick hosts. Large gaps in knowledge concerning the mechanisms used by these parasites for gene regulation remain. Regulatory genes coding for DNA binding proteins such as members of the Api-AP2, HMG, and Myb families are known to play crucial roles as transcription factors. Although the repertoire of Api-AP2 has been defined and a HMG gene was previously identified in the B. bovis genome, these regulatory genes have not been described in detail in B. microti and T. equi. In this study, comparative bioinformatics was used to: (i) identify and map genes encoding for these transcription factors among three parasites' genomes; (ii) identify a previously unreported HMG gene in B. microti; (iii) define a repertoire of eight conserved Myb genes; and (iv) identify AP2 correlates among B. bovis and the better-studied Plasmodium parasites. Searching the available transcriptome of B. bovis defined patterns of transcription of these three gene families in B. bovis erythrocyte stage parasites. Sequence comparisons show conservation of functional domains and general architecture in the AP2, Myb, and HMG proteins, which may be significant for the regulation of common critical parasite life cycle transitions in B. bovis, B. microti, and T. equi. A detailed understanding of the role of gene families encoding DNA binding proteins will provide new tools for unraveling regulatory mechanisms involved in B. bovis, B. microti, and T. equi life cycles and environmental adaptive responses and potentially contributes to the development of novel convergent strategies for improved control of babesiosis and equine piroplasmosis.
Long non-coding RNAs and mRNAs profiling during spleen development in pig.

PubMed

Che, Tiandong; Li, Diyan; Jin, Long; Fu, Yuhua; Liu, Yingkai; Liu, Pengliang; Wang, Yixin; Tang, Qianzi; Ma, Jideng; Wang, Xun; Jiang, Anan; Li, Xuewei; Li, Mingzhou

2018-01-01

Genome-wide transcriptomic studies in humans and mice have become extensive and mature. However, a comprehensive and systematic understanding of protein-coding genes and long non-coding RNAs (lncRNAs) expressed during pig spleen development has not been achieved. LncRNAs are known to participate in regulatory networks for an array of biological processes. Here, we constructed 18 RNA libraries from developing fetal pig spleen (55 days before birth), postnatal pig spleens (0, 30, 180 days and 2 years after birth), and the samples from the 2-year-old Wild Boar. A total of 15,040 lncRNA transcripts were identified among these samples. We found that the temporal expression pattern of lncRNAs was more restricted than observed for protein-coding genes. Time-series analysis showed two large modules for protein-coding genes and lncRNAs. The up-regulated module was enriched for genes related to immune and inflammatory function, while the down-regulated module was enriched for cell proliferation processes such as cell division and DNA replication. Co-expression networks indicated the functional relatedness between protein-coding genes and lncRNAs, which were enriched for similar functions over the series of time points examined. We identified numerous differentially expressed protein-coding genes and lncRNAs in all five developmental stages. Notably, ceruloplasmin precursor (CP), a protein-coding gene participating in antioxidant and iron transport processes, was differentially expressed in all stages. This study provides the first catalog of the developing pig spleen, and contributes to a fuller understanding of the molecular mechanisms underpinning mammalian spleen development.
It’s Time for An Epigenomics Roadmap of Heart Failure

PubMed Central

Papait, Roberto; Corrado, Nadia; Rusconi, Francesca; Serio, Simone; V.G. Latronico, Michael

2015-01-01

The post-genomic era has completed its first decade. During this time, we have seen an attempt to understand life not just through the study of individual isolated processes, but through the appreciation of the amalgam of complex networks, within which each process can influence others. Greatly benefiting this view has been the study of the epigenome, the set of DNA and histone protein modifications that regulate gene expression and the function of regulatory non-coding RNAs without altering the DNA sequence itself. Indeed, the availability of reference genome assemblies of many species has led to the development of methodologies such as ChIP-Seq and RNA-Seq that have allowed us to define with high resolution the genomic distribution of several epigenetic elements and to better comprehend how they are interconnected for the regulation of gene expression. In the last few years, the use of these methodologies in the cardiovascular field has contributed to our understanding of the importance of epigenetics in heart diseases, giving new input to this area of research. Here, we review recently acquired knowledge on the role of the epigenome in heart failure, and discuss the need of an epigenomics roadmap for cardiovascular disease. PMID:27006627
Genomic structure and expression of STM2, the chromosome 1 familial Alzheimer disease gene.

PubMed

Levy-Lahad, E; Poorkaj, P; Wang, K; Fu, Y H; Oshima, J; Mulligan, J; Schellenberg, G D

1996-06-01

Mutations in the gene STM2 result in autosomal dominant familial Alzheimer disease. To screen for mutations and to identify regulatory elements for this gene, the genomic DNA sequence and intron-exon structure were determined. Twelve exons including 10 coding exons were identified in a genomic region spanning 23,737 bp. The first 2 exons encode the 5'-untranslated region. Expression analysis of STM2 indicates that two transcripts of 2.4 and 2.8 kb are found in skeletal muscle, pancreas, and heart. In addition, a splice variant of the 2.4-kb transcript was identified that is the result of the use of an alternative splice acceptor site located in exon 10. The use of this site results in a transcript lacking a single glutamate. The promotor for this gene and the alternatively spliced exons leading to the 2.8-kb form of the gene remain to be identified. Expression of STM2 was high in skeletal muscle and pancreas, with comparatively low levels observed in brain. This expression pattern is intriguing since in Alzheimer disease, pathology and degeneration are observed only in the central nervous system.
Genomic structure and expression of STM2, the chromosome 1 familial Alzheimer disease gene

DOE Office of Scientific and Technical Information (OSTI.GOV)

Levy-Lahad, E.; Wang, Kai; Fu, Ying Hui

1996-06-01

Mutations in the gene STM2 result in autosomal dominant familial Alzheimer disease. To screen for mutations and to identify regulatory elements for this gene, the genomic DNA sequence and intron-exon structure were determined. Twelve exons including 10 coding exons were identified in a genomic region spanning 23, 737 bp. The first 2 exons encode the 5{prime}-untranslated region. Expression analysis of STM2 indicates that two transcripts of 2.4 and 2.8 kb are found in skeletal muscle, pancreas, and heart. In addition, a splice variant of the 2.4-kb transcript was identified that is the result of the use of an alternative splicemore » acceptor site located in exon 10. The use of this site results in a transcript lacking a single glutamate. The promotor for this gene and the alternatively spliced exons leading to the 2.8-kb form of the gene remain to be identified. Expression of STM2 was high in skeletal muscle and pancreas, with comparatively low levels observed in brain. This expression pattern is intriguing since in Alzheimer disease, pathology and degeneration are observed only in the central nervous system. 19 refs., 2 figs., 3 tabs.« less
POTENTIAL IMPACTS OF GENOMICS ON EPA REGULATORY AND RISK ASSESSMENT APPLICATIONS

EPA Science Inventory

Gallagher, Kathryn and William Benson. In press. Potential Impacts of Genomics on EPA Regulatory and Risk Assessment (Abstract). To be presented at the EPA Science Forum: Healthy Communities and Ecosystems, 1-3 June 2004, Washington, DC. 1 p. (ERL,GB R991).

Advances in ge...
POTENTIAL IMPLICATIONS OF GENOMICS FOR REGULATORY AND RISK ASSESSMENT APPLICATIONS AT EPA

EPA Science Inventory

Benson, W.H., K. Gallagher and K. Dearfield. In press. Potential Implications of Genomics for Regulatory and Risk Assessment Applications at EPA (Abstract). To be presented at the SETAC Fourth World Congress, 14-18 November 2004, Portland, OR. 1 p. (ERL,GB R1002).

Advance...
Using a Euclid distance discriminant method to find protein coding genes in the yeast genome.

PubMed

Zhang, Chun-Ting; Wang, Ju; Zhang, Ren

2002-02-01

The Euclid distance discriminant method is used to find protein coding genes in the yeast genome, based on the single nucleotide frequencies at three codon positions in the ORFs. The method is extremely simple and may be extended to find genes in prokaryotic genomes or eukaryotic genomes with less introns. Six-fold cross-validation tests have demonstrated that the accuracy of the algorithm is better than 93%. Based on this, it is found that the total number of protein coding genes in the yeast genome is less than or equal to 5579 only, about 3.8-7.0% less than 5800-6000, which is currently widely accepted. The base compositions at three codon positions are analyzed in details using a graphic method. The result shows that the preference codons adopted by yeast genes are of the RGW type, where R, G and W indicate the bases of purine, non-G and A/T, whereas the 'codons' in the intergenic sequences are of the form NNN, where N denotes any base. This fact constitutes the basis of the algorithm to distinguish between coding and non-coding ORFs in the yeast genome. The names of putative non-coding ORFs are listed here in detail.
The mitochondrial genomes of the acoelomorph worms Paratomella rubra, Isodiametra pulchra and Archaphanostoma ylvae.

PubMed

Robertson, Helen E; Lapraz, François; Egger, Bernhard; Telford, Maximilian J; Schiffer, Philipp H

2017-05-12

Acoels are small, ubiquitous - but understudied - marine worms with a very simple body plan. Their internal phylogeny is still not fully resolved, and the position of their proposed phylum Xenacoelomorpha remains debated. Here we describe mitochondrial genome sequences from the acoels Paratomella rubra and Isodiametra pulchra, and the complete mitochondrial genome of the acoel Archaphanostoma ylvae. The P. rubra and A. ylvae sequences are typical for metazoans in size and gene content. The larger I. pulchra mitochondrial genome contains both ribosomal genes, 21 tRNAs, but only 11 protein-coding genes. We find evidence suggesting a duplicated sequence in the I. pulchra mitochondrial genome. The P. rubra, I. pulchra and A. ylvae mitochondria have a unique genome organisation in comparison to other metazoan mitochondrial genomes. We found a large degree of protein-coding gene and tRNA overlap with little non-coding sequence in the compact P. rubra genome. Conversely, the A. ylvae and I. pulchra genomes have many long non-coding sequences between genes, likely driving genome size expansion in the latter. Phylogenetic trees inferred from mitochondrial genes retrieve Xenacoelomorpha as an early branching taxon in the deuterostomes. Sequence divergence analysis between P. rubra sampled in England and Spain indicates cryptic diversity.

Functional interrogation of non-coding DNA through CRISPR genome editing.

PubMed

Canver, Matthew C; Bauer, Daniel E; Orkin, Stuart H

2017-05-15

Methodologies to interrogate non-coding regions have lagged behind coding regions despite comprising the vast majority of the genome. However, the rapid evolution of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing has provided a multitude of novel techniques for laboratory investigation including significant contributions to the toolbox for studying non-coding DNA. CRISPR-mediated loss-of-function strategies rely on direct disruption of the underlying sequence or repression of transcription without modifying the targeted DNA sequence. CRISPR-mediated gain-of-function approaches similarly benefit from methods to alter the targeted sequence through integration of customized sequence into the genome as well as methods to activate transcription. Here we review CRISPR-based loss- and gain-of-function techniques for the interrogation of non-coding DNA. Copyright © 2017 Elsevier Inc. All rights reserved.
Mechanistically Distinct Pathways of Divergent Regulatory DNA Creation Contribute to Evolution of Human-Specific Genomic Regulatory Networks Driving Phenotypic Divergence of Homo sapiens.

PubMed

Glinsky, Gennadi V

2016-09-19

Thousands of candidate human-specific regulatory sequences (HSRS) have been identified, supporting the hypothesis that unique to human phenotypes result from human-specific alterations of genomic regulatory networks. Collectively, a compendium of multiple diverse families of HSRS that are functionally and structurally divergent from Great Apes could be defined as the backbone of human-specific genomic regulatory networks. Here, the conservation patterns analysis of 18,364 candidate HSRS was carried out requiring that 100% of bases must remap during the alignments of human, chimpanzee, and bonobo sequences. A total of 5,535 candidate HSRS were identified that are: (i) highly conserved in Great Apes; (ii) evolved by the exaptation of highly conserved ancestral DNA; (iii) defined by either the acceleration of mutation rates on the human lineage or the functional divergence from non-human primates. The exaptation of highly conserved ancestral DNA pathway seems mechanistically distinct from the evolution of regulatory DNA segments driven by the species-specific expansion of transposable elements. Genome-wide proximity placement analysis of HSRS revealed that a small fraction of topologically associating domains (TADs) contain more than half of HSRS from four distinct families. TADs that are enriched for HSRS and termed rapidly evolving in humans TADs (revTADs) comprise 0.8-10.3% of 3,127 TADs in the hESC genome. RevTADs manifest distinct correlation patterns between placements of human accelerated regions, human-specific transcription factor-binding sites, and recombination rates. There is a significant enrichment within revTAD boundaries of hESC-enhancers, primate-specific CTCF-binding sites, human-specific RNAPII-binding sites, hCONDELs, and H3K4me3 peaks with human-specific enrichment at TSS in prefrontal cortex neurons (P < 0.0001 in all instances). Present analysis supports the idea that phenotypic divergence of Homo sapiens is driven by the evolution of human-specific genomic regulatory networks via at least two mechanistically distinct pathways of creation of divergent sequences of regulatory DNA: (i) recombination-associated exaptation of the highly conserved ancestral regulatory DNA segments; (ii) human-specific insertions of transposable elements. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Discovery of functional non-coding conserved regions in the α-synuclein gene locus

PubMed Central

Sterling, Lori; Walter, Michael; Ting, Dennis; Schüle, Birgitt

2014-01-01

Several single nucleotide polymorphisms (SNPs) and the Rep-1 microsatellite marker of the α-synuclein ( SNCA) gene have consistently been shown to be associated with Parkinson’s disease, but the functional relevance is unclear. Based on these findings we hypothesized that conserved cis-regulatory elements in the SNCA genomic region regulate expression of SNCA, and that SNPs in these regions could be functionally modulating the expression of SNCA, thus contributing to neuronal demise and predisposing to Parkinson’s disease. In a pair-wise comparison of a 206kb genomic region encompassing the SNCA gene, we revealed 34 evolutionary conserved DNA sequences between human and mouse. All elements were cloned into reporter vectors and assessed for expression modulation in dual luciferase reporter assays. We found that 12 out of 34 elements exhibited either an enhancement or reduction of the expression of the reporter gene. Three elements upstream of the SNCA gene displayed an approximately 1.5 fold (p<0.009) increase in expression. Of the intronic regions, three showed a 1.5 fold increase and two others indicated a 2 and 2.5 fold increase in expression (p<0.002). Three elements downstream of the SNCA gene showed 1.5 fold and 2.5 fold increase (p<0.0009). One element downstream of SNCA had a reduced expression of the reporter gene of 0.35 fold (p<0.0009) of normal activity. Our results demonstrate that the SNCA gene contains cis-regulatory regions that might regulate the transcription and expression of SNCA. Further studies in disease-relevant tissue types will be important to understand the functional impact of regulatory regions and specific Parkinson’s disease-associated SNPs and its function in the disease process. PMID:25566351
The TTSMI database: a catalog of triplex target DNA sites associated with genes and regulatory elements in the human genome.

PubMed

Jenjaroenpun, Piroon; Chew, Chee Siang; Yong, Tai Pang; Choowongkomon, Kiattawee; Thammasorn, Wimada; Kuznetsov, Vladimir A

2015-01-01

A triplex target DNA site (TTS), a stretch of DNA that is composed of polypurines, is able to form a triple-helix (triplex) structure with triplex-forming oligonucleotides (TFOs) and is able to influence the site-specific modulation of gene expression and/or the modification of genomic DNA. The co-localization of a genomic TTS with gene regulatory signals and functional genome structures suggests that TFOs could potentially be exploited in antigene strategies for the therapy of cancers and other genetic diseases. Here, we present the TTS Mapping and Integration (TTSMI; http://ttsmi.bii.a-star.edu.sg) database, which provides a catalog of unique TTS locations in the human genome and tools for analyzing the co-localization of TTSs with genomic regulatory sequences and signals that were identified using next-generation sequencing techniques and/or predicted by computational models. TTSMI was designed as a user-friendly tool that facilitates (i) fast searching/filtering of TTSs using several search terms and criteria associated with sequence stability and specificity, (ii) interactive filtering of TTSs that co-localize with gene regulatory signals and non-B DNA structures, (iii) exploration of dynamic combinations of the biological signals of specific TTSs and (iv) visualization of a TTS simultaneously with diverse annotation tracks via the UCSC genome browser. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Novel bacteriophages containing a genome of another bacteriophage within their genomes.

PubMed

Swanson, Maud M; Reavy, Brian; Makarova, Kira S; Cock, Peter J; Hopkins, David W; Torrance, Lesley; Koonin, Eugene V; Taliansky, Michael

2012-01-01

A novel bacteriophage infecting Staphylococus pasteuri was isolated during a screen for phages in Antarctic soils. The phage named SpaA1 is morphologically similar to phages of the family Siphoviridae. The 42,784 bp genome of SpaA1 is a linear, double-stranded DNA molecule with 3' protruding cohesive ends. The SpaA1 genome encompasses 63 predicted protein-coding genes which cluster within three regions of the genome, each of apparently different origin, in a mosaic pattern. In two of these regions, the gene sets resemble those in prophages of Bacillus thuringiensis kurstaki str. T03a001 (genes involved in DNA replication/transcription, cell entry and exit) and B. cereus AH676 (additional regulatory and recombination genes), respectively. The third region represents an almost complete genome (except for the short terminal segments) of a distinct bacteriophage, MZTP02. Nearly the same gene module was identified in prophages of B. thuringiensis serovar monterrey BGSC 4AJ1 and B. cereus Rock4-2. These findings suggest that MZTP02 can be shuttled between genomes of other bacteriophages and prophages, leading to the formation of chimeric genomes. The presence of a complete phage genome in the genome of other phages apparently has not been described previously and might represent a 'fast track' route of virus evolution and horizontal gene transfer. Another phage (BceA1) nearly identical in sequence to SpaA1, and also including the almost complete MZTP02 genome within its own genome, was isolated from a bacterium of the B. cereus/B. thuringiensis group. Remarkably, both SpaA1 and BceA1 phages can infect B. cereus and B. thuringiensis, but only one of them, SpaA1, can infect S. pasteuri. This finding is best compatible with a scenario in which MZTP02 was originally contained in BceA1 infecting Bacillus spp, the common hosts for these two phages, followed by emergence of SpaA1 infecting S. pasteuri.
The location of a disease-associated polymorphism and genomic structure of the human 52-kDa Ro/SSA locus (SSA1)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tsugu, H.; Horowitz, R.; Gibson, N.

1994-12-01

Sera from approximately 30% of patients with systemic lupus erythematosus (SLE) contain high titers of autoantibodies that bind to the 52-kDa Ro/SSA protein. We previously detected polymorphisms in the 52-kDa Ro/SSA gene (SSA1) with restriction enzymes, one of which is strongly associated with the presence of SLE (P < 0.0005) in African Americans. A higher disease frequency and more severe forms of the disease are commonly noted among these female patients. To determine the location and nature of this polymorphism, we obtained two clones that span 8.5 kb of the 52-kDa Ro/SSA locus including its upstream regulatory region. Six exonsmore » were identified, and their nucleotide sequences plus adjacent noncoding regions were determined. No differences were found between these exons and the coding region of one of the reported cDNAs. The disease-associated polymorphic site suggested by a restriction enzyme map and confirmed by DNA amplification and nucleotide sequencing was present upstream of exon 1. This polymorphism may be a genetic marker for a disease-related variation in the coding region for the protein or in the upstream regulatory region of this gene. Although this RFLP is present in Japanese, it is not associated with lupus in this race. 41 refs., 4 figs., 2 tabs.« less
Integrated Approach to Reconstruction of Microbial Regulatory Networks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rodionov, Dmitry A; Novichkov, Pavel S

2013-11-04

This project had the goal(s) of development of integrated bioinformatics platform for genome-scale inference and visualization of transcriptional regulatory networks (TRNs) in bacterial genomes. The work was done in Sanford-Burnham Medical Research Institute (SBMRI, P.I. D.A. Rodionov) and Lawrence Berkeley National Laboratory (LBNL, co-P.I. P.S. Novichkov). The developed computational resources include: (1) RegPredict web-platform for TRN inference and regulon reconstruction in microbial genomes, and (2) RegPrecise database for collection, visualization and comparative analysis of transcriptional regulons reconstructed by comparative genomics. These analytical resources were selected as key components in the DOE Systems Biology KnowledgeBase (SBKB). The high-quality data accumulated inmore » RegPrecise will provide essential datasets of reference regulons in diverse microbes to enable automatic reconstruction of draft TRNs in newly sequenced genomes. We outline our progress toward the three aims of this grant proposal, which were: Develop integrated platform for genome-scale regulon reconstruction; Infer regulatory annotations in several groups of bacteria and building of reference collections of microbial regulons; and Develop KnowledgeBase on microbial transcriptional regulation.« less
The pineapple genome and the evolution of CAM photosynthesis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ming, Ray; VanBuren, Robert; Wai, Ching Man

Pineapple (Ananas comosus (L.) Merr.) is the most economically valuable crop possessing crassulacean acid metabolism (CAM), a photosynthetic carbon assimilation pathway with high water-use efficiency, and the second most important tropical fruit. We sequenced the genomes of pineapple varieties F153 and MD2 and a wild pineapple relative, Ananas bracteatus accession CB5. The pineapple genome has one fewer ancient whole-genome duplication event than sequenced grass genomes and a conserved karyotype with seven chromosomes from before the ρ duplication event. The pineapple lineage has transitioned from C 3 photosynthesis to CAM, with CAM-related genes exhibiting a diel expression pattern in photosynthetic tissues.more » CAM pathway genes were enriched with cis-regulatory elements associated with the regulation of circadian clock genes, providing the first cis-regulatory link between CAM and circadian clock regulation. Lastly, we found pineapple CAM photosynthesis evolved by the reconfiguration of pathways in C 3 plants, through the regulatory neofunctionalization of preexisting genes and not through the acquisition of neofunctionalized genes via whole-genome or tandem gene duplication.« less
The pineapple genome and the evolution of CAM photosynthesis

DOE PAGES

Ming, Ray; VanBuren, Robert; Wai, Ching Man; ...

2015-11-02

Pineapple (Ananas comosus (L.) Merr.) is the most economically valuable crop possessing crassulacean acid metabolism (CAM), a photosynthetic carbon assimilation pathway with high water-use efficiency, and the second most important tropical fruit. We sequenced the genomes of pineapple varieties F153 and MD2 and a wild pineapple relative, Ananas bracteatus accession CB5. The pineapple genome has one fewer ancient whole-genome duplication event than sequenced grass genomes and a conserved karyotype with seven chromosomes from before the ρ duplication event. The pineapple lineage has transitioned from C 3 photosynthesis to CAM, with CAM-related genes exhibiting a diel expression pattern in photosynthetic tissues.more » CAM pathway genes were enriched with cis-regulatory elements associated with the regulation of circadian clock genes, providing the first cis-regulatory link between CAM and circadian clock regulation. Lastly, we found pineapple CAM photosynthesis evolved by the reconfiguration of pathways in C 3 plants, through the regulatory neofunctionalization of preexisting genes and not through the acquisition of neofunctionalized genes via whole-genome or tandem gene duplication.« less
Complete Mitochondrial Genome of Echinostoma hortense (Digenea: Echinostomatidae).

PubMed

Liu, Ze-Xuan; Zhang, Yan; Liu, Yu-Ting; Chang, Qiao-Cheng; Su, Xin; Fu, Xue; Yue, Dong-Mei; Gao, Yuan; Wang, Chun-Ren

2016-04-01

Echinostoma hortense (Digenea: Echinostomatidae) is one of the intestinal flukes with medical importance in humans. However, the mitochondrial (mt) genome of this fluke has not been known yet. The present study has determined the complete mt genome sequences of E. hortense and assessed the phylogenetic relationships with other digenean species for which the complete mt genome sequences are available in GenBank using concatenated amino acid sequences inferred from 12 protein-coding genes. The mt genome of E. hortense contained 12 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and 1 non-coding region. The length of the mt genome of E. hortense was 14,994 bp, which was somewhat smaller than those of other trematode species. Phylogenetic analyses based on concatenated nucleotide sequence datasets for all 12 protein-coding genes using maximum parsimony (MP) method showed that E. hortense and Hypoderaeum conoideum gathered together, and they were closer to each other than to Fasciolidae and other echinostomatid trematodes. The availability of the complete mt genome sequences of E. hortense provides important genetic markers for diagnostics, population genetics, and evolutionary studies of digeneans.
Complete Mitochondrial Genome of Echinostoma hortense (Digenea: Echinostomatidae)

PubMed Central

Liu, Ze-Xuan; Zhang, Yan; Liu, Yu-Ting; Chang, Qiao-Cheng; Su, Xin; Fu, Xue; Yue, Dong-Mei; Gao, Yuan; Wang, Chun-Ren

2016-01-01

Echinostoma hortense (Digenea: Echinostomatidae) is one of the intestinal flukes with medical importance in humans. However, the mitochondrial (mt) genome of this fluke has not been known yet. The present study has determined the complete mt genome sequences of E. hortense and assessed the phylogenetic relationships with other digenean species for which the complete mt genome sequences are available in GenBank using concatenated amino acid sequences inferred from 12 protein-coding genes. The mt genome of E. hortense contained 12 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and 1 non-coding region. The length of the mt genome of E. hortense was 14,994 bp, which was somewhat smaller than those of other trematode species. Phylogenetic analyses based on concatenated nucleotide sequence datasets for all 12 protein-coding genes using maximum parsimony (MP) method showed that E. hortense and Hypoderaeum conoideum gathered together, and they were closer to each other than to Fasciolidae and other echinostomatid trematodes. The availability of the complete mt genome sequences of E. hortense provides important genetic markers for diagnostics, population genetics, and evolutionary studies of digeneans. PMID:27180575
Intersecting transcriptomic profiling technologies and long non-coding RNA function in lung adenocarcinoma: discovery, mechanisms, and therapeutic applications

PubMed Central

Castillo, Jonathan; Stueve, Theresa R.; Marconett, Crystal N.

2017-01-01

Previously thought of as junk transcripts and pseudogene remnants, long non-coding RNAs (lncRNAs) have come into their own over the last decade as an essential component of cellular activity, regulating a plethora of functions within multicellular organisms. lncRNAs are now known to participate in development, cellular homeostasis, immunological processes, and the development of disease. With the advent of next generation sequencing technology, hundreds of thousands of lncRNAs have been identified. However, movement beyond mere discovery to the understanding of molecular processes has been stymied by the complicated genomic structure, tissue-restricted expression, and diverse regulatory roles lncRNAs play. In this review, we will focus on lncRNAs involved in lung cancer, the most common cause of cancer-related death in the United States and worldwide. We will summarize their various methods of discovery, provide consensus rankings of deregulated lncRNAs in lung cancer, and describe in detail the limited functional analysis that has been undertaken so far. PMID:29113413
Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution.

PubMed

2004-12-09

We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.
Competing endogenous RNA regulatory network in papillary thyroid carcinoma.

PubMed

Chen, Shouhua; Fan, Xiaobin; Gu, He; Zhang, Lili; Zhao, Wenhua

2018-05-11

The present study aimed to screen all types of RNAs involved in the development of papillary thyroid carcinoma (PTC). RNA‑sequencing data of PTC and normal samples were used for screening differentially expressed (DE) microRNAs (DE‑miRNAs), long non‑coding RNAs (DE‑lncRNAs) and genes (DEGs). Subsequently, lncRNA‑miRNA, miRNA‑gene (that is, miRNA‑mRNA) and gene‑gene interaction pairs were extracted and used to construct regulatory networks. Feature genes in the miRNA‑mRNA network were identified by topological analysis and recursive feature elimination analysis. A support vector machine (SVM) classifier was built using 15 feature genes, and its classification effect was validated using two microarray data sets that were downloaded from the Gene Expression Omnibus (GEO) database. In addition, Gene Ontology function and Kyoto Encyclopedia Genes and Genomes pathway enrichment analyses were conducted for genes identified in the ceRNA network. A total of 506 samples, including 447 tumor samples and 59 normal samples, were obtained from The Cancer Genome Atlas (TCGA); 16 DE‑lncRNAs, 917 DEGs and 30 DE‑miRNAs were screened. The miRNA‑mRNA regulatory network comprised 353 nodes and 577 interactions. From these data, 15 feature genes with high predictive precision (>95%) were extracted from the network and were used to form an SVM classifier with an accuracy of 96.05% (486/506) for PTC samples downloaded from TCGA, and accuracies of 96.81 and 98.46% for GEO downloaded data sets. The ceRNA regulatory network comprised 596 lines (or interactions) and 365 nodes. Genes in the ceRNA network were significantly enriched in 'neuron development', 'differentiation', 'neuroactive ligand‑receptor interaction', 'metabolism of xenobiotics by cytochrome P450', 'drug metabolism' and 'cytokine‑cytokine receptor interaction' pathways. Hox transcript antisense RNA, miRNA‑206 and kallikrein‑related peptidase 10 were nodes in the ceRNA regulatory network of the selected feature gene, and they may serve import roles in the development of PTC.
Balancing Selection on a Regulatory Region Exhibiting Ancient Variation That Predates Human–Neandertal Divergence

PubMed Central

Iskow, Rebecca C.; Austermann, Christian; Scharer, Christopher D.; Raj, Towfique; Boss, Jeremy M.; Sunyaev, Shamil; Price, Alkes; Stranger, Barbara; Simon, Viviana; Lee, Charles

2013-01-01

Ancient population structure shaping contemporary genetic variation has been recently appreciated and has important implications regarding our understanding of the structure of modern human genomes. We identified a ∼36-kb DNA segment in the human genome that displays an ancient substructure. The variation at this locus exists primarily as two highly divergent haplogroups. One of these haplogroups (the NE1 haplogroup) aligns with the Neandertal haplotype and contains a 4.6-kb deletion polymorphism in perfect linkage disequilibrium with 12 single nucleotide polymorphisms (SNPs) across diverse populations. The other haplogroup, which does not contain the 4.6-kb deletion, aligns with the chimpanzee haplotype and is likely ancestral. Africans have higher overall pairwise differences with the Neandertal haplotype than Eurasians do for this NE1 locus (p<10−15). Moreover, the nucleotide diversity at this locus is higher in Eurasians than in Africans. These results mimic signatures of recent Neandertal admixture contributing to this locus. However, an in-depth assessment of the variation in this region across multiple populations reveals that African NE1 haplotypes, albeit rare, harbor more sequence variation than NE1 haplotypes found in Europeans, indicating an ancient African origin of this haplogroup and refuting recent Neandertal admixture. Population genetic analyses of the SNPs within each of these haplogroups, along with genome-wide comparisons revealed significant FST (p = 0.00003) and positive Tajima's D (p = 0.00285) statistics, pointing to non-neutral evolution of this locus. The NE1 locus harbors no protein-coding genes, but contains transcribed sequences as well as sequences with putative regulatory function based on bioinformatic predictions and in vitro experiments. We postulate that the variation observed at this locus predates Human–Neandertal divergence and is evolving under balancing selection, especially among European populations. PMID:23593015
A map of human microRNA variation uncovers unexpectedly high levels of variability

PubMed Central

2012-01-01

Background MicroRNAs (miRNAs) are key components of the gene regulatory network in many species. During the past few years, these regulatory elements have been shown to be involved in an increasing number and range of diseases. Consequently, the compilation of a comprehensive map of natural variability in a healthy population seems an obvious requirement for future research on miRNA-related pathologies. Methods Data on 14 populations from the 1000 Genomes Project were analyzed, along with new data extracted from 60 exomes of healthy individuals from a population from southern Spain, sequenced in the context of the Medical Genome Project, to derive an accurate map of miRNA variability. Results Despite the common belief that miRNAs are highly conserved elements, analysis of the sequences of the 1,152 individuals indicated that the observed level of variability is double what was expected. A total of 527 variants were found. Among these, 45 variants affected the recognition region of the corresponding miRNA and were found in 43 different miRNAs, 26 of which are known to be involved in 57 diseases. Different parts of the mature structure of the miRNA were affected to different degrees by variants, which suggests the existence of a selective pressure related to the relative functional impact of the change. Moreover, 41 variants showed a significant deviation from the Hardy-Weinberg equilibrium, which supports the existence of a selective process against some alleles. The average number of variants per individual in miRNAs was 28. Conclusions Despite an expectation that miRNAs would be highly conserved genomic elements, our study reports a level of variability comparable to that observed for coding genes. PMID:22906193
Variations in the non-coding transcriptome as a driver of inter-strain divergence and physiological adaptation in bacteria

PubMed Central

Kopf, Matthias; Klähn, Stephan; Scholz, Ingeborg; Hess, Wolfgang R.; Voß, Björn

2015-01-01

In all studied organisms, a substantial portion of the transcriptome consists of non-coding RNAs that frequently execute regulatory functions. Here, we have compared the primary transcriptomes of the cyanobacteria Synechocystis sp. PCC 6714 and PCC 6803 under 10 different conditions. These strains share 2854 protein-coding genes and a 16S rRNA identity of 99.4%, indicating their close relatedness. Conserved major transcriptional start sites (TSSs) give rise to non-coding transcripts within the sigB gene, from the 5′UTRs of cmpA and isiA, and 168 loci in antisense orientation. Distinct differences include single nucleotide polymorphisms rendering promoters inactive in one of the strains, e.g., for cmpR and for the asRNA PsbA2R. Based on the genome-wide mapped location, regulation and classification of TSSs, non-coding transcripts were identified as the most dynamic component of the transcriptome. We identified a class of mRNAs that originate by read-through from an sRNA that accumulates as a discrete and abundant transcript while also serving as the 5′UTR. Such an sRNA/mRNA structure, which we name ‘actuaton’, represents another way for bacteria to remodel their transcriptional network. Our findings support the hypothesis that variations in the non-coding transcriptome constitute a major evolutionary element of inter-strain divergence and capability for physiological adaptation. PMID:25902393
Complete mitochondrial genome sequence from an endangered Indian snake, Python molurus molurus (Serpentes, Pythonidae).

PubMed

Dubey, Bhawna; Meganathan, P R; Haque, Ikramul

2012-07-01

This paper reports the complete mitochondrial genome sequence of an endangered Indian snake, Python molurus molurus (Indian Rock Python). A typical snake mitochondrial (mt) genome of 17258 bp length comprising of 37 genes including the 13 protein coding genes, 22 tRNA genes, and 2 ribosomal RNA genes along with duplicate control regions is described herein. The P. molurus molurus mt. genome is relatively similar to other snake mt. genomes with respect to gene arrangement, composition, tRNA structures and skews of AT/GC bases. The nucleotide composition of the genome shows that there are more A-C % than T-G% on the positive strand as revealed by positive AT and CG skews. Comparison of individual protein coding genes, with other snake genomes suggests that ATP8 and NADH3 genes have high divergence rates. Codon usage analysis reveals a preference of NNC codons over NNG codons in the mt. genome of P. molurus. Also, the synonymous and non-synonymous substitution rates (ka/ks) suggest that most of the protein coding genes are under purifying selection pressure. The phylogenetic analyses involving the concatenated 13 protein coding genes of P. molurus molurus conformed to the previously established snake phylogeny.
QuASAR-MPRA: accurate allele-specific analysis for massively parallel reporter assays.

PubMed

Kalita, Cynthia A; Moyerbrailean, Gregory A; Brown, Christopher; Wen, Xiaoquan; Luca, Francesca; Pique-Regi, Roger

2018-03-01

The majority of the human genome is composed of non-coding regions containing regulatory elements such as enhancers, which are crucial for controlling gene expression. Many variants associated with complex traits are in these regions, and may disrupt gene regulatory sequences. Consequently, it is important to not only identify true enhancers but also to test if a variant within an enhancer affects gene regulation. Recently, allele-specific analysis in high-throughput reporter assays, such as massively parallel reporter assays (MPRAs), have been used to functionally validate non-coding variants. However, we are still missing high-quality and robust data analysis tools for these datasets. We have further developed our method for allele-specific analysis QuASAR (quantitative allele-specific analysis of reads) to analyze allele-specific signals in barcoded read counts data from MPRA. Using this approach, we can take into account the uncertainty on the original plasmid proportions, over-dispersion, and sequencing errors. The provided allelic skew estimate and its standard error also simplifies meta-analysis of replicate experiments. Additionally, we show that a beta-binomial distribution better models the variability present in the allelic imbalance of these synthetic reporters and results in a test that is statistically well calibrated under the null. Applying this approach to the MPRA data, we found 602 SNPs with significant (false discovery rate 10%) allele-specific regulatory function in LCLs. We also show that we can combine MPRA with QuASAR estimates to validate existing experimental and computational annotations of regulatory variants. Our study shows that with appropriate data analysis tools, we can improve the power to detect allelic effects in high-throughput reporter assays. http://github.com/piquelab/QuASAR/tree/master/mpra. fluca@wayne.edu or rpique@wayne.edu. Supplementary data are available online at Bioinformatics. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Integrating non-coding RNAs in JAK-STAT regulatory networks

PubMed Central

Witte, Steven; Muljo, Stefan A

2014-01-01

Being a well-characterized pathway, JAK-STAT signaling serves as a valuable paradigm for studying the architecture of gene regulatory networks. The discovery of untranslated or non-coding RNAs, namely microRNAs and long non-coding RNAs, provides an opportunity to elucidate their roles in such networks. In principle, these regulatory RNAs can act as downstream effectors of the JAK-STAT pathway and/or affect signaling by regulating the expression of JAK-STAT components. Examples of interactions between signaling pathways and non-coding RNAs have already emerged in basic cell biology and human diseases such as cancer, and can potentially guide the identification of novel biomarkers or drug targets for medicine. PMID:24778925

Long noncoding RNAs and their proposed functions in fibre development of cotton (Gossypium spp.).

PubMed

Wang, Maojun; Yuan, Daojun; Tu, Lili; Gao, Wenhui; He, Yonghui; Hu, Haiyan; Wang, Pengcheng; Liu, Nian; Lindsey, Keith; Zhang, Xianlong

2015-09-01

Long noncoding RNAs (lncRNAs) are transcripts of at least 200 bp in length, possess no apparent coding capacity and are involved in various biological regulatory processes. Until now, no systematic identification of lncRNAs has been reported in cotton (Gossypium spp.). Here, we describe the identification of 30 550 long intergenic noncoding RNA (lincRNA) loci (50 566 transcripts) and 4718 long noncoding natural antisense transcript (lncNAT) loci (5826 transcripts). LncRNAs are rich in repetitive sequences and preferentially expressed in a tissue-specific manner. The detection of abundant genome-specific and/or lineage-specific lncRNAs indicated their weak evolutionary conservation. Approximately 76% of homoeologous lncRNAs exhibit biased expression patterns towards the At or Dt subgenomes. Compared with protein-coding genes, lncRNAs showed overall higher methylation levels and their expression was less affected by gene body methylation. Expression validation in different cotton accessions and coexpression network construction helped to identify several functional lncRNA candidates involved in cotton fibre initiation and elongation. Analysis of integrated expression from the subgenomes of lncRNAs generating miR397 and its targets as a result of genome polyploidization indicated their pivotal functions in regulating lignin metabolism in domesticated tetraploid cotton fibres. This study provides the first comprehensive identification of lncRNAs in Gossypium. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.
Getting to the Root of Things: Spatiotemporal Regulatory Networks (JGI Seventh Annual User Meeting 2012: Genomics of Energy and Environment)

ScienceCinema

Brady, Siobhan

2018-02-12

Siobhan Brady from University of California, Davis, gives a talk titled "Getting to the Root of things: Spatiotemporal Regulatory Networks" at the JGI 7th Annual Users Meeting: Genomics of Energy & Environment Meeting on March 22, 2012 in Walnut Creek, California.
ACTIVITIES TO DEVELOP AN INTERIM GUIDANCE FOR MICROARRAY-BASED ASSAYS FOR REGULATORY AND RISK ASSESSMENT APPLICATIONS AT EPA

EPA Science Inventory

Abstract for presentation. Advances in genomics will have significant implications for risk assessment policies and regulatory decision making. In 2002, EPA issued its lnterim Policy on Genomics which stated that such data may be considered in the decision making process, but tha...
Getting to the Root of Things: Spatiotemporal Regulatory Networks (JGI Seventh Annual User Meeting 2012: Genomics of Energy and Environment)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brady, Siobhan

2012-03-22

Siobhan Brady from University of California, Davis, gives a talk titled "Getting to the Root of things: Spatiotemporal Regulatory Networks" at the JGI 7th Annual Users Meeting: Genomics of Energy & Environment Meeting on March 22, 2012 in Walnut Creek, California.
Systematic review regulatory principles of non-coding RNAs in cardiovascular diseases.

PubMed

Li, Yongsheng; Huo, Caiqin; Pan, Tao; Li, Lili; Jin, Xiyun; Lin, Xiaoyu; Chen, Juan; Zhang, Jinwen; Guo, Zheng; Xu, Juan; Li, Xia

2017-08-16

Cardiovascular diseases (CVDs) continue to be a major cause of morbidity and mortality, and non-coding RNAs (ncRNAs) play critical roles in CVDs. With the recent emergence of high-throughput technologies, including small RNA sequencing, investigations of CVDs have been transformed from candidate-based studies into genome-wide undertakings, and a number of ncRNAs in CVDs were discovered in various studies. A comprehensive review of these ncRNAs would be highly valuable for researchers to get a complete picture of the ncRNAs in CVD. To address these knowledge gaps and clinical needs, in this review, we first discussed dysregulated ncRNAs and their critical roles in cardiovascular development and related diseases. Moreover, we reviewed >28 561 published papers and documented the ncRNA-CVD association benchmarking data sets to summarize the principles of ncRNA regulation in CVDs. This data set included 13 249 curated relationships between 9503 ncRNAs and 139 CVDs in 12 species. Based on this comprehensive resource, we summarized the regulatory principles of dysregulated ncRNAs in CVDs, including the complex associations between ncRNA and CVDs, tissue specificity and ncRNA synergistic regulation. The highlighted principles are that CVD microRNAs (miRNAs) are highly expressed in heart tissue and that they play central roles in miRNA-miRNA functional synergistic network. In addition, CVD-related miRNAs are close to one another in the functional network, indicating the modular characteristic features of CVD miRNAs. We believe that the regulatory principles summarized here will further contribute to our understanding of ncRNA function and dysregulation mechanisms in CVDs. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Streamlined Genome Sequence Compression using Distributed Source Coding

PubMed Central

Wang, Shuang; Jiang, Xiaoqian; Chen, Feng; Cui, Lijuan; Cheng, Samuel

2014-01-01

We aim at developing a streamlined genome sequence compression algorithm to support alternative miniaturized sequencing devices, which have limited communication, storage, and computation power. Existing techniques that require heavy client (encoder side) cannot be applied. To tackle this challenge, we carefully examined distributed source coding theory and developed a customized reference-based genome compression protocol to meet the low-complexity need at the client side. Based on the variation between source and reference, our protocol will pick adaptively either syndrome coding or hash coding to compress subsequences of changing code length. Our experimental results showed promising performance of the proposed method when compared with the state-of-the-art algorithm (GRS). PMID:25520552
Comparative analysis of the complete sequence of the plastid genome of Parthenium argentatum and identification of DNA barcodes to differentiate Parthenium species and lines

PubMed Central

2009-01-01

Background Parthenium argentatum (guayule) is an industrial crop that produces latex, which was recently commercialized as a source of latex rubber safe for people with Type I latex allergy. The complete plastid genome of P. argentatum was sequenced. The sequence provides important information useful for genetic engineering strategies. Comparison to the sequences of plastid genomes from three other members of the Asteraceae, Lactuca sativa, Guitozia abyssinica and Helianthus annuus revealed details of the evolution of the four genomes. Chloroplast-specific DNA barcodes were developed for identification of Parthenium species and lines. Results The complete plastid genome of P. argentatum is 152,803 bp. Based on the overall comparison of individual protein coding genes with those in L. sativa, G. abyssinica and H. annuus, we demonstrate that the P. argentatum chloroplast genome sequence is most closely related to that of H. annuus. Similar to chloroplast genomes in G. abyssinica, L. sativa and H. annuus, the plastid genome of P. argentatum has a large 23 kb inversion with a smaller 3.4 kb inversion, within the large inversion. Using the matK and psbA-trnH spacer chloroplast DNA barcodes, three of the four Parthenium species tested, P. tomentosum, P. hysterophorus and P. schottii, can be differentiated from P. argentatum. In addition, we identified lines within P. argentatum. Conclusion The genome sequence of the P. argentatum chloroplast will enrich the sequence resources of plastid genomes in commercial crops. The availability of the complete plastid genome sequence may facilitate transformation efficiency by using the precise sequence of endogenous flanking sequences and regulatory elements in chloroplast transformation vectors. The DNA barcoding study forms the foundation for genetic identification of commercially significant lines of P. argentatum that are important for producing latex. PMID:19917140
Conserved Noncoding Elements in the Most Distant Genera of Cephalochordates: The Goldilocks Principle

PubMed Central

Yue, Jia-Xing; Kozmikova, Iryna; Ono, Hiroki; Nossa, Carlos W.; Kozmik, Zbynek; Putnam, Nicholas H.; Yu, Jr-Kai; Holland, Linda Z.

2016-01-01

Cephalochordates, the sister group of vertebrates + tunicates, are evolving particularly slowly. Therefore, genome comparisons between two congeners of Branchiostoma revealed so many conserved noncoding elements (CNEs), that it was not clear how many are functional regulatory elements. To more effectively identify CNEs with potential regulatory functions, we compared noncoding sequences of genomes of the most phylogenetically distant cephalochordate genera, Asymmetron and Branchiostoma, which diverged approximately 120–160 million years ago. We found 113,070 noncoding elements conserved between the two species, amounting to 3.3% of the genome. The genomic distribution, target gene ontology, and enriched motifs of these CNEs all suggest that many of them are probably cis-regulatory elements. More than 90% of previously verified amphioxus regulatory elements were re-captured in this study. A search of the cephalochordate CNEs around 50 developmental genes in several vertebrate genomes revealed eight CNEs conserved between cephalochordates and vertebrates, indicating sequence conservation over >500 million years of divergence. The function of five CNEs was tested in reporter assays in zebrafish, and one was also tested in amphioxus. All five CNEs proved to be tissue-specific enhancers. Taken together, these findings indicate that even though Branchiostoma and Asymmetron are distantly related, as they are evolving slowly, comparisons between them are likely optimal for identifying most of their tissue-specific cis-regulatory elements laying the foundation for functional characterizations and a better understanding of the evolution of developmental regulation in cephalochordates. PMID:27412606
Workshop on the Application of Genomic Tools for the Rapid Molecular Characterization of Bacterial Isolates in Food-borne Disease Outbreak Investigations Ottawa, ON, February 24-25, 2014

DTIC Science & Technology

2014-05-01

techniques in regulatory food microbiology testing. When testing is conducted to verify compliance with food regulations, detection and typing are...8 Implementation of Molecular Techniques in Regulatory Food Microbiology Testing ................................ 8 From A,C,G,T to PFGE, to MLST... food -borne isolates, as well as some case studies highlighting the role of genomics in the resolution of critical regulatory food microbiology issues
Regulating transgenic crops sensibly: lessons from plant breeding, biotechnology and genomics.

PubMed

Bradford, Kent J; Van Deynze, Allen; Gutterson, Neal; Parrott, Wayne; Strauss, Steven H

2005-04-01

The costs of meeting regulatory requirements and market restrictions guided by regulatory criteria are substantial impediments to the commercialization of transgenic crops. Although a cautious approach may have been prudent initially, we argue that some regulatory requirements can now be modified to reduce costs and uncertainty without compromising safety. Long-accepted plant breeding methods for incorporating new diversity into crop varieties, experience from two decades of research on and commercialization of transgenic crops, and expanding knowledge of plant genome structure and dynamics all indicate that if a gene or trait is safe, the genetic engineering process itself presents little potential for unexpected consequences that would not be identified or eliminated in the variety development process before commercialization. We propose that as in conventional breeding, regulatory emphasis should be on phenotypic rather than genomic characteristics once a gene or trait has been shown to be safe.
Noncoding RNAs in DNA Repair and Genome Integrity

PubMed Central

Wan, Guohui; Liu, Yunhua; Han, Cecil; Zhang, Xinna

2014-01-01

Abstract Significance: The well-studied sequences in the human genome are those of protein-coding genes, which account for only 1%–2% of the total genome. However, with the advent of high-throughput transcriptome sequencing technology, we now know that about 90% of our genome is extensively transcribed and that the vast majority of them are transcribed into noncoding RNAs (ncRNAs). It is of great interest and importance to decipher the functions of these ncRNAs in humans. Recent Advances: In the last decade, it has become apparent that ncRNAs play a crucial role in regulating gene expression in normal development, in stress responses to internal and environmental stimuli, and in human diseases. Critical Issues: In addition to those constitutively expressed structural RNA, such as ribosomal and transfer RNAs, regulatory ncRNAs can be classified as microRNAs (miRNAs), Piwi-interacting RNAs (piRNAs), small interfering RNAs (siRNAs), small nucleolar RNAs (snoRNAs), and long noncoding RNAs (lncRNAs). However, little is known about the biological features and functional roles of these ncRNAs in DNA repair and genome instability, although a number of miRNAs and lncRNAs are regulated in the DNA damage response. Future Directions: A major goal of modern biology is to identify and characterize the full profile of ncRNAs with regard to normal physiological functions and roles in human disorders. Clinically relevant ncRNAs will also be evaluated and targeted in therapeutic applications. Antioxid. Redox Signal. 20, 655–677. PMID:23879367
Differential expression of non-coding RNAs and continuous evolution of the X chromosome in testicular transcriptome of two mouse species.

PubMed

Homolka, David; Ivanek, Robert; Forejt, Jiri; Jansa, Petr

2011-02-14

Tight regulation of testicular gene expression is a prerequisite for male reproductive success, while differentiation of gene activity in spermatogenesis is important during speciation. Thus, comparison of testicular transcriptomes between closely related species can reveal unique regulatory patterns and shed light on evolutionary constraints separating the species. Here, we compared testicular transcriptomes of two closely related mouse species, Mus musculus and Mus spretus, which diverged more than one million years ago. We analyzed testicular expression using tiling arrays overlapping Chromosomes 2, X, Y and mitochondrial genome. An excess of differentially regulated non-coding RNAs was found on Chromosome 2 including the intronic antisense RNAs, intergenic RNAs and premature forms of Piwi-interacting RNAs (piRNAs). Moreover, striking difference was found in the expression of X-linked G6pdx gene, the parental gene of the autosomal retrogene G6pd2. The prevalence of non-coding RNAs among differentially expressed transcripts indicates their role in species-specific regulation of spermatogenesis. The postmeiotic expression of G6pdx in Mus spretus points towards the continuous evolution of X-chromosome silencing and provides an example of expression change accompanying the out-of-the X-chromosomal retroposition.
Improvement of genome assembly completeness and identification of novel full-length protein-coding genes by RNA-seq in the giant panda genome.

PubMed

Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan

2015-12-11

High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced on the basis of solely short reads, but the genome annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives.
Enhancer Evolution across 20 Mammalian Species

PubMed Central

Villar, Diego; Berthelot, Camille; Aldridge, Sarah; Rayner, Tim F.; Lukk, Margus; Pignatelli, Miguel; Park, Thomas J.; Deaville, Robert; Erichsen, Jonathan T.; Jasinska, Anna J.; Turner, James M.A.; Bertelsen, Mads F.; Murchison, Elizabeth P.; Flicek, Paul; Odom, Duncan T.

2015-01-01

Summary The mammalian radiation has corresponded with rapid changes in noncoding regions of the genome, but we lack a comprehensive understanding of regulatory evolution in mammals. Here, we track the evolution of promoters and enhancers active in liver across 20 mammalian species from six diverse orders by profiling genomic enrichment of H3K27 acetylation and H3K4 trimethylation. We report that rapid evolution of enhancers is a universal feature of mammalian genomes. Most of the recently evolved enhancers arise from ancestral DNA exaptation, rather than lineage-specific expansions of repeat elements. In contrast, almost all liver promoters are partially or fully conserved across these species. Our data further reveal that recently evolved enhancers can be associated with genes under positive selection, demonstrating the power of this approach for annotating regulatory adaptations in genomic sequences. These results provide important insight into the functional genetics underpinning mammalian regulatory evolution. PMID:25635462
Long Non-Coding RNAs (lncRNAs) of Sea Cucumber: Large-Scale Prediction, Expression Profiling, Non-Coding Network Construction, and lncRNA-microRNA-Gene Interaction Analysis of lncRNAs in Apostichopus japonicus and Holothuria glaberrima During LPS Challenge and Radial Organ Complex Regeneration.

PubMed

Mu, Chuang; Wang, Ruijia; Li, Tianqi; Li, Yuqiang; Tian, Meilin; Jiao, Wenqian; Huang, Xiaoting; Zhang, Lingling; Hu, Xiaoli; Wang, Shi; Bao, Zhenmin

2016-08-01

Long non-coding RNA (lncRNA) structurally resembles mRNA but cannot be translated into protein. Although the systematic identification and characterization of lncRNAs have been increasingly reported in model species, information concerning non-model species is still lacking. Here, we report the first systematic identification and characterization of lncRNAs in two sea cucumber species: (1) Apostichopus japonicus during lipopolysaccharide (LPS) challenge and in heathy tissues and (2) Holothuria glaberrima during radial organ complex regeneration, using RNA-seq datasets and bioinformatics analysis. We identified A. japonicus and H. glaberrima lncRNAs that were differentially expressed during LPS challenge and radial organ complex regeneration, respectively. Notably, the predicted lncRNA-microRNA-gene trinities revealed that, in addition to targeting protein-coding transcripts, miRNAs might also target lncRNAs, thereby participating in a potential novel layer of regulatory interactions among non-coding RNA classes in echinoderms. Furthermore, the constructed coding-non-coding network implied the potential involvement of lncRNA-gene interactions during the regulation of several important genes (e.g., Toll-like receptor 1 [TLR1] and transglutaminase-1 [TGM1]) in response to LPS challenge and radial organ complex regeneration in sea cucumbers. Overall, this pioneer systematic identification, annotation, and characterization of lncRNAs in echinoderm pave the way for similar studies and future genetic, genomic, and evolutionary research in non-model species.
Genomic control of patterning

PubMed Central

Peter, Isabelle S.; Davidson, Eric H.

2014-01-01

The development of multicellular organisms involves the partitioning of the organism into territories of cells of specific structure and function. The information for spatial patterning processes is directly encoded in the genome. The genome determines its own usage depending on stage and position, by means of interactions that constitute gene regulatory networks (GRNs). The GRN driving endomesoderm development in sea urchin embryos illustrates different regulatory strategies by which developmental programs are initiated, orchestrated, stabilized or excluded to define the pattern of specified territories in the developing embryo. PMID:19378258
Interplay between cardiac transcription factors and non-coding RNAs in predisposing to atrial fibrillation.

PubMed

Mikhailov, Alexander T; Torrado, Mario

2018-05-12

There is growing evidence that putative gene regulatory networks including cardio-enriched transcription factors, such as PITX2, TBX5, ZFHX3, and SHOX2, and their effector/target genes along with downstream non-coding RNAs can play a potentially important role in the process of adaptive and maladaptive atrial rhythm remodeling. In turn, expression of atrial fibrillation-associated transcription factors is under the control of upstream regulatory non-coding RNAs. This review broadly explores gene regulatory mechanisms associated with susceptibility to atrial fibrillation-with key examples from both animal models and patients-within the context of both cardiac transcription factors and non-coding RNAs. These two systems appear to have multiple levels of cross-regulation and act coordinately to achieve effective control of atrial rhythm effector gene expression. Perturbations of a dynamic expression balance between transcription factors and corresponding non-coding RNAs can provoke the development or promote the progression of atrial fibrillation. We also outline deficiencies in current models and discuss ongoing studies to clarify remaining mechanistic questions. An understanding of the function of transcription factors and non-coding RNAs in gene regulatory networks associated with atrial fibrillation risk will enable the development of innovative therapeutic strategies.
Characterization of full-length sequenced cDNA inserts (FLIcs) from Atlantic salmon (Salmo salar)

PubMed Central

Andreassen, Rune; Lunner, Sigbjørn; Høyheim, Bjørn

2009-01-01

Background Sequencing of the Atlantic salmon genome is now being planned by an international research consortium. Full-length sequenced inserts from cDNAs (FLIcs) are an important tool for correct annotation and clustering of the genomic sequence in any species. The large amount of highly similar duplicate sequences caused by the relatively recent genome duplication in the salmonid ancestor represents a particular challenge for the genome project. FLIcs will therefore be an extremely useful resource for the Atlantic salmon sequencing project. In addition to be helpful in order to distinguish between duplicate genome regions and in determining correct gene structures, FLIcs are an important resource for functional genomic studies and for investigation of regulatory elements controlling gene expression. In contrast to the large number of ESTs available, including the ESTs from 23 developmental and tissue specific cDNA libraries contributed by the Salmon Genome Project (SGP), the number of sequences where the full-length of the cDNA insert has been determined has been small. Results High quality full-length insert sequences from 560 pre-smolt white muscle tissue specific cDNAs were generated, accession numbers [GenBank: BT043497 - BT044056]. Five hundred and ten (91%) of the transcripts were annotated using Gene Ontology (GO) terms and 440 of the FLIcs are likely to contain a complete coding sequence (cCDS). The sequence information was used to identify putative paralogs, characterize salmon Kozak motifs, polyadenylation signal variation and to identify motifs likely to be involved in the regulation of particular genes. Finally, conserved 7-mers in the 3'UTRs were identified, of which some were identical to miRNA target sequences. Conclusion This paper describes the first Atlantic salmon FLIcs from a tissue and developmental stage specific cDNA library. We have demonstrated that many FLIcs contained a complete coding sequence (cCDS). This suggests that the remaining cDNA libraries generated by SGP represent a valuable cCDS FLIc source. The conservation of 7-mers in 3'UTRs indicates that these motifs are functionally important. Identity between some of these 7-mers and miRNA target sequences suggests that they are miRNA targets in Salmo salar transcripts as well. PMID:19878547
The presence, role and clinical use of spermatozoal RNAs

PubMed Central

Jodar, Meritxell; Selvaraju, Sellappan; Sendler, Edward; Diamond, Michael P.; Krawetz, Stephen A.

2013-01-01

BACKGROUND Spermatozoa are highly differentiated, transcriptionally inert cells characterized by a compact nucleus with minimal cytoplasm. Nevertheless they contain a suite of unique RNAs that are delivered to oocyte upon fertilization. They are likely integrated as part of many different processes including genome recognition, consolidation-confrontation, early embryonic development and epigenetic transgenerational inherence. Spermatozoal RNAs also provide a window into the developmental history of each sperm thereby providing biomarkers of fertility and pregnancy outcome which are being intensely studied. METHODS Literature searches were performed to review the majority of spermatozoal RNA studies that described potential functions and clinical applications with emphasis on Next-Generation Sequencing. Human, mouse, bovine and stallion were compared as their distribution and composition of spermatozoal RNAs, using these techniques, have been described. RESULTS Comparisons highlighted the complexity of the population of spermatozoal RNAs that comprises rRNA, mRNA and both large and small non-coding RNAs. RNA-seq analysis has revealed that only a fraction of the larger RNAs retain their structure. While rRNAs are the most abundant and are highly fragmented, ensuring a translationally quiescent state, other RNAs including some mRNAs retain their functional potential, thereby increasing the opportunity for regulatory interactions. Abundant small non-coding RNAs retained in spermatozoa include miRNAs and piRNAs. Some, like miR-34c are essential to the early embryo development required for the first cellular division. Others like the piRNAs are likely part of the genomic dance of confrontation and consolidation. Other non-coding spermatozoal RNAs include transposable elements, annotated lnc-RNAs, intronic retained elements, exonic elements, chromatin-associated RNAs, small-nuclear ILF3/NF30 associated RNAs, quiescent RNAs, mse-tRNAs and YRNAs. Some non-coding RNAs are known to act as epigenetic modifiers, inducing histone modifications and DNA methylation, perhaps playing a role in transgenerational epigenetic inherence. Transcript profiling holds considerable potential for the discovery of fertility biomarkers for both agriculture and human medicine. Comparing the differential RNA profiles of infertile and fertile individuals as well as assessing species similarities, should resolve the regulatory pathways contributing to male factor infertility. CONCLUSIONS Dad delivers a complex population of RNAs to the oocyte at fertilization that likely influences fertilization, embryo development, the phenotype of the offspring and possibly future generations. Development is continuing on the use of spermatozoal RNA profiles as phenotypic markers of male factor status for use as clinical diagnostics of the father's contribution to the birth of a healthy child. PMID:23856356
Genome defense against exogenous nucleic acids in eukaryotes by non-coding DNA occurs through CRISPR-like mechanisms in the cytosol and the bodyguard protection in the nucleus.

PubMed

Qiu, Guo-Hua

2016-01-01

In this review, the protective function of the abundant non-coding DNA in the eukaryotic genome is discussed from the perspective of genome defense against exogenous nucleic acids. Peripheral non-coding DNA has been proposed to act as a bodyguard that protects the genome and the central protein-coding sequences from ionizing radiation-induced DNA damage. In the proposed mechanism of protection, the radicals generated by water radiolysis in the cytosol and IR energy are absorbed, blocked and/or reduced by peripheral heterochromatin; then, the DNA damage sites in the heterochromatin are removed and expelled from the nucleus to the cytoplasm through nuclear pore complexes, most likely through the formation of extrachromosomal circular DNA. To strengthen this hypothesis, this review summarizes the experimental evidence supporting the protective function of non-coding DNA against exogenous nucleic acids. Based on these data, I hypothesize herein about the presence of an additional line of defense formed by small RNAs in the cytosol in addition to their bodyguard protection mechanism in the nucleus. Therefore, exogenous nucleic acids may be initially inactivated in the cytosol by small RNAs generated from non-coding DNA via mechanisms similar to the prokaryotic CRISPR-Cas system. Exogenous nucleic acids may enter the nucleus, where some are absorbed and/or blocked by heterochromatin and others integrate into chromosomes. The integrated fragments and the sites of DNA damage are removed by repetitive non-coding DNA elements in the heterochromatin and excluded from the nucleus. Therefore, the normal eukaryotic genome and the central protein-coding sequences are triply protected by non-coding DNA against invasion by exogenous nucleic acids. This review provides evidence supporting the protective role of non-coding DNA in genome defense. Copyright © 2016 Elsevier B.V. All rights reserved.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.