Science.gov

Sample records for cis-regulatory motif directs

  1. Computational discovery of cis-regulatory modules in Drosophila without prior knowledge of motifs

    PubMed Central

    Ivan, Andra; Halfon, Marc S; Sinha, Saurabh

    2008-01-01

    We consider the problem of predicting cis-regulatory modules without knowledge of motifs. We formulate this problem in a pragmatic setting, and create over 30 new data sets, using Drosophila modules, to use as a 'benchmark'. We propose two new methods for the problem, and evaluate these, as well as two existing methods, on our benchmark. We find that the challenge of predicting cis-regulatory modules ab initio, without any input of relevant motifs, is a realizable goal. PMID:18226245

  2. Comparative genomics of metabolic capacities of regulons controlled by cis-regulatory RNA motifs in bacteria

    PubMed Central

    2013-01-01

    Background: In silico comparative genomics approaches have been efficiently used for functional prediction and reconstruction of metabolic and regulatory networks. Riboswitches are metabolite-sensing structures often found in bacterial mRNA leaders controlling gene expression on transcriptional or translational levels. An increasing number of riboswitches and other cis-regulatory RNAs have been recently classified into numerous RNA families in the Rfam database. High conservation of these RNA motifs provides a unique advantage for their genomic identification and comparative analysis. Results: A comparative genomics approach implemented in the RegPredict tool was used for reconstruction and functional annotation of regulons controlled by RNAs from 43 Rfam families in diverse taxonomic groups of Bacteria. The inferred regulons include ~5200 cis-regulatory RNAs and more than 12000 target genes in 255 microbial genomes. All predicted RNA-regulated genes were classified into specific and overall functional categories. Analysis of taxonomic distribution of these categories allowed us to establish major functional preferences for each analyzed cis-regulatory RNA motif family. Overall, most RNA motif regulons showed predictable functional content in accordance with their experimentally established effector ligands. Our results suggest that some RNA motifs (including thiamin pyrophosphate and cobalamin riboswitches that control the cofactor metabolism) are widespread and likely originated from the last common ancestor of all bacteria. However, many more analyzed RNA motifs are restricted to a narrow taxonomic group of bacteria and likely represent more recent evolutionary innovations. Conclusions: The reconstructed regulatory networks for major known RNA motifs substantially expand the existing knowledge of transcriptional regulation in bacteria. The inferred regulons can be used for genetic experiments, functional annotations of genes, metabolic reconstruction and

  3. On the Concept of Cis-regulatory Information: From Sequence Motifs to Logic Functions

    NASA Astrophysics Data System (ADS)

    Tarpine, Ryan; Istrail, Sorin

    The regulatory genome is about the “system level organization of the core genomic regulatory apparatus, and how this is the locus of causality underlying the twin phenomena of animal development and animal evolution” (E.H. Davidson. The Regulatory Genome: Gene Regulatory Networks in Development and Evolution, Academic Press, 2006). Information processing in the regulatory genome is done through regulatory states, defined as sets of transcription factors (sequence-specific DNA binding proteins which determine gene expression) that are expressed and active at the same time. The core information processing machinery consists of modular DNA sequence elements, called cis-modules, that interact with transcription factors. The cis-modules “read” the information contained in the regulatory state of the cell through transcription factor binding, “process” it, and directly or indirectly communicate with the basal transcription apparatus to determine gene expression. This endowment of each gene with the information-receiving capacity through their cis-regulatory modules is essential for the response to every possible regulatory state to which it might be exposed during all phases of the life cycle and in all cell types. We present here a set of challenges addressed by our CYRENE research project aimed at studying the cis-regulatory code of the regulatory genome. The CYRENE Project is devoted to (1) the construction of a database, the cis-Lexicon, containing comprehensive information across species about experimentally validated cis-regulatory modules; and (2) the software development of a next-generation genome browser, the cis-Browser, specialized for the regulatory genome. The presentation is anchored on three main computational challenges: the Gene Naming Problem, the Consensus Sequence Bottleneck Problem, and the Logic Function Inference Problem.

  4. Bacterial regulon modeling and prediction based on systematic cis regulatory motif analyses

    NASA Astrophysics Data System (ADS)

    Liu, Bingqiang; Zhou, Chuan; Li, Guojun; Zhang, Hanyuan; Zeng, Erliang; Liu, Qi; Ma, Qin

    2016-03-01

    Regulons are the basic units of the response system in a bacterial cell, and each consists of a set of transcriptionally co-regulated operons. Regulon elucidation is the basis for studying the bacterial global transcriptional regulation network. In this study, we designed a novel co-regulation score between a pair of operons based on accurate operon identification and cis regulatory motif analyses, which can capture their co-regulation relationship much better than other scores. Taking full advantage of this discovery, we developed a new computational framework and built a novel graph model for regulon prediction. This model integrates the motif comparison and clustering and makes the regulon prediction problem substantially more solvable and accurate. To evaluate our prediction, a regulon coverage score was designed based on the documented regulons and their overlap with our prediction; and a modified Fisher Exact test was implemented to measure how well our predictions match the co-expressed modules derived from E. coli microarray gene-expression datasets collected under 466 conditions. The results indicate that our program consistently performed better than others in terms of the prediction accuracy. This suggests that our algorithms substantially improve the state-of-the-art, leading to a computational capability to reliably predict regulons for any bacteria.
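
    The overlap test mentioned in this abstract can be illustrated with a standard one-sided Fisher exact (hypergeometric) test. This is a minimal sketch only: the abstract refers to a "modified" Fisher exact test whose details are not given here, and the function name, argument names and the numbers in the usage note are hypothetical.

```python
from math import comb


def overlap_pvalue(overlap, predicted_size, module_size, universe_size):
    """One-sided hypergeometric tail: P(X >= overlap).

    X is the number of genes shared between a predicted regulon of
    `predicted_size` genes and a co-expression module of `module_size`
    genes, when both are drawn from `universe_size` genes in total.
    Assumption: this is the textbook test, not the modified version
    used by the authors.
    """
    total = comb(universe_size, predicted_size)
    upper = min(predicted_size, module_size)
    favourable = sum(
        comb(module_size, i) * comb(universe_size - module_size, predicted_size - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / total
```

    For example, `overlap_pvalue(3, 3, 4, 10)` asks how likely it is that all 3 predicted genes fall inside a 4-gene module out of 10 genes by chance; a small value suggests the predicted regulon and the module overlap more than expected.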

  5. Evolution of New cis-Regulatory Motifs Required for Cell-Specific Gene Expression in Caenorhabditis

    PubMed Central

    Barkoulas, Michalis; Vargas Velazquez, Amhed M; Peluffo, Alexandre E; Félix, Marie-Anne

    2016-01-01

    Patterning of C. elegans vulval cell fates relies on inductive signaling. In this induction event, a single cell, the gonadal anchor cell, secretes LIN-3/EGF and induces three out of six competent precursor cells to acquire a vulval fate. We previously showed that this developmental system is robust to a four-fold variation in lin-3/EGF genetic dose. Here using single-molecule FISH, we find that the mean level of expression of lin-3 in the anchor cell is remarkably conserved. No change in lin-3 expression level could be detected among C. elegans wild isolates and only a low level of change (less than 30%) in the Caenorhabditis genus and in Oscheius tipulae. In C. elegans, lin-3 expression in the anchor cell is known to require three transcription factor binding sites, specifically two E-boxes and a nuclear-hormone-receptor (NHR) binding site. Mutation of any of these three elements in C. elegans results in a dramatic decrease in lin-3 expression. Yet only a single E-box is found in the Drosophilae supergroup of Caenorhabditis species, including C. angaria, while the NHR-binding site likely only evolved at the base of the Elegans group. We find that a transgene from C. angaria bearing a single E-box is sufficient for normal expression in C. elegans. Even a short 58 bp cis-regulatory fragment from C. angaria with this single E-box is able to replace the three transcription factor binding sites at the endogenous C. elegans lin-3 locus, resulting in the wild-type expression level. Thus, regulatory evolution occurring in cis within a 58 bp lin-3 fragment results in a strict requirement for the NHR binding site and a second E-box in C. elegans. This single-cell, single-molecule, quantitative and functional evo-devo study demonstrates that conserved expression levels can hide extensive change in cis-regulatory site requirements and highlights the evolution of new cis-regulatory elements required for cell-specific gene expression. PMID:27588814

  6. Evolution of New cis-Regulatory Motifs Required for Cell-Specific Gene Expression in Caenorhabditis.

    PubMed

    Barkoulas, Michalis; Vargas Velazquez, Amhed M; Peluffo, Alexandre E; Félix, Marie-Anne

    2016-09-01

    Patterning of C. elegans vulval cell fates relies on inductive signaling. In this induction event, a single cell, the gonadal anchor cell, secretes LIN-3/EGF and induces three out of six competent precursor cells to acquire a vulval fate. We previously showed that this developmental system is robust to a four-fold variation in lin-3/EGF genetic dose. Here using single-molecule FISH, we find that the mean level of expression of lin-3 in the anchor cell is remarkably conserved. No change in lin-3 expression level could be detected among C. elegans wild isolates and only a low level of change (less than 30%) in the Caenorhabditis genus and in Oscheius tipulae. In C. elegans, lin-3 expression in the anchor cell is known to require three transcription factor binding sites, specifically two E-boxes and a nuclear-hormone-receptor (NHR) binding site. Mutation of any of these three elements in C. elegans results in a dramatic decrease in lin-3 expression. Yet only a single E-box is found in the Drosophilae supergroup of Caenorhabditis species, including C. angaria, while the NHR-binding site likely only evolved at the base of the Elegans group. We find that a transgene from C. angaria bearing a single E-box is sufficient for normal expression in C. elegans. Even a short 58 bp cis-regulatory fragment from C. angaria with this single E-box is able to replace the three transcription factor binding sites at the endogenous C. elegans lin-3 locus, resulting in the wild-type expression level. Thus, regulatory evolution occurring in cis within a 58 bp lin-3 fragment results in a strict requirement for the NHR binding site and a second E-box in C. elegans. This single-cell, single-molecule, quantitative and functional evo-devo study demonstrates that conserved expression levels can hide extensive change in cis-regulatory site requirements and highlights the evolution of new cis-regulatory elements required for cell-specific gene expression. PMID:27588814

  7. A cis-regulatory module activating transcription in the suspensor contains five cis-regulatory elements.

    PubMed

    Henry, Kelli F; Kawashima, Tomokazu; Goldberg, Robert B

    2015-06-01

    Little is known about the molecular mechanisms by which the embryo proper and suspensor of plant embryos activate specific gene sets shortly after fertilization. We analyzed the upstream region of the Scarlet Runner Bean (Phaseolus coccineus) G564 gene in order to understand how genes are activated specifically in the suspensor during early embryo development. Previously, we showed that a 54-bp fragment of the G564 upstream region is sufficient for suspensor transcription and contains at least three required cis-regulatory sequences, including the 10-bp motif (5'-GAAAAGCGAA-3'), the 10 bp-like motif (5'-GAAAAACGAA-3'), and Region 2 motif (partial sequence 5'-TTGGT-3'). Here, we use site-directed mutagenesis experiments in transgenic tobacco globular-stage embryos to identify two additional cis-regulatory elements within the 54-bp cis-regulatory module that are required for G564 suspensor transcription: the Fifth motif (5'-GAGTTA-3') and a third 10-bp-related sequence (5'-GAAAACCACA-3'). Further deletion of the 54-bp fragment revealed that a 47-bp fragment containing the five motifs (the 10-bp, 10-bp-like, 10-bp-related, Region 2 and Fifth motifs) is sufficient for suspensor transcription, and represents a cis-regulatory module. A consensus sequence for each type of motif was determined by comparing motif sequences shown to activate suspensor transcription. Phylogenetic analyses suggest that the regulation of G564 is evolutionarily conserved. A homologous cis-regulatory module was found upstream of the G564 ortholog in the Common Bean (Phaseolus vulgaris), indicating that the regulation of G564 is evolutionarily conserved in closely related bean species.
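
    As an illustration of how a consensus can be derived by comparing motif sequences, the sketch below collapses aligned sequences column by column into IUPAC ambiguity codes. Pooling the three 10-bp-type sequences quoted in the abstract into one consensus is an assumption made here for demonstration; the abstract states that a consensus was determined for each motif type and does not describe the procedure. The function and variable names are hypothetical.

```python
# IUPAC nucleotide codes keyed by the set of bases observed in one column.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def iupac_consensus(sequences):
    """Collapse equal-length, pre-aligned DNA sequences into one IUPAC string."""
    seqs = [s.upper() for s in sequences]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same length")
    # zip(*seqs) walks the alignment one column at a time.
    return "".join(IUPAC[frozenset(column)] for column in zip(*seqs))


# The 10-bp, 10-bp-like and 10-bp-related sequences quoted in the abstract.
TEN_BP_FAMILY = ["GAAAAGCGAA", "GAAAAACGAA", "GAAAACCACA"]
print(iupac_consensus(TEN_BP_FAMILY))  # GAAAAVCRMA
```

    The output keeps the invariant GAAAA prefix and marks the variable positions with ambiguity codes (V = A/C/G, R = A/G, M = A/C). A frequency-based consensus or a position weight matrix would be the usual next step when more sequences are available.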

  8. Systems analysis of cis-regulatory motifs in C4 photosynthesis genes using maize and rice leaf transcriptomic data during a process of de-etiolation

    PubMed Central

    Xu, Jiajia; Bräutigam, Andrea; Weber, Andreas P. M.; Zhu, Xin-Guang

    2016-01-01

    Identification of potential cis-regulatory motifs controlling the development of C4 photosynthesis is a major focus of current research. In this study, we used time-series RNA-seq data collected from etiolated maize and rice leaf tissues sampled during a de-etiolation process to systematically characterize the expression patterns of C4-related genes and to further identify potential cis elements in five different genomic regions (i.e. promoter, 5′UTR, 3′UTR, intron, and coding sequence) of C4 orthologous genes. The results demonstrate that although most of the C4 genes show similar expression patterns, a number of them, including chloroplast dicarboxylate transporter 1, aspartate aminotransferase, and triose phosphate transporter, show shifted expression patterns compared with their C3 counterparts. A number of conserved short DNA motifs between maize C4 genes and their rice orthologous genes were identified not only in the promoter, 5′UTR, 3′UTR, and coding sequences, but also in the introns of core C4 genes. We also identified cis-regulatory motifs that exist in maize C4 genes and also in genes showing similar expression patterns as maize C4 genes but that do not exist in rice C3 orthologs, suggesting a possible recruitment of pre-existing cis-elements from genes unrelated to C4 photosynthesis into C4 photosynthesis genes during C4 evolution. PMID:27436282

  9. In planta analysis of a cis-regulatory cytokinin response motif in Arabidopsis and identification of a novel enhancer sequence.

    PubMed

    Ramireddy, Eswarayya; Brenner, Wolfram G; Pfeifer, Andreas; Heyl, Alexander; Schmülling, Thomas

    2013-07-01

    The phytohormone cytokinin plays a key role in regulating plant growth and development, and is involved in numerous physiological responses to environmental changes. The type-B response regulators, which regulate the transcription of cytokinin response genes, are a part of the cytokinin signaling system. Arabidopsis thaliana encodes 11 type-B response regulators (type-B ARRs), and some of them were shown to bind in vitro to the core cytokinin response motif (CRM) 5'-(A/G)GAT(T/C)-3' or, in the case of ARR1, to an extended motif (ECRM), 5'-AAGAT(T/C)TT-3'. Here we obtained in planta proof for the functionality of the latter motif. Promoter deletion analysis of the primary cytokinin response gene ARR6 showed that a combination of two extended motifs within the promoter is required to mediate the full transcriptional activation by ARR1 and other type-B ARRs. CRMs were found to be over-represented in the vicinity of ECRMs in the promoters of cytokinin-regulated genes, suggesting their functional relevance. Moreover, an evolutionarily conserved 27 bp long T-rich region between -220 and -193 bp was identified and shown to be required for the full activation by type-B ARRs and the response to cytokinin. This novel enhancer is not bound by the DNA-binding domain of ARR1, indicating that additional proteins might be involved in mediating the transcriptional cytokinin response. Furthermore, genome-wide expression profiling identified genes, among them ARR16, whose induction by cytokinin depends on both ARR1 and other specific type-B ARRs. This together with the ECRM/CRM sequence clustering indicates cooperative action of different type-B ARRs for the activation of particular target genes. PMID:23620480
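
    The two motifs quoted in this abstract translate directly into regular expressions, which makes it easy to locate candidate sites in a promoter sequence. The following is a minimal sketch: the helper names and the toy sequence in the usage note are hypothetical, and a match only marks a candidate site, not demonstrated binding.

```python
import re

# Patterns transcribed from the abstract; alternatives become character classes.
CRM = "[AG]GAT[TC]"    # core cytokinin response motif 5'-(A/G)GAT(T/C)-3'
ECRM = "AAGAT[TC]TT"   # extended motif 5'-AAGAT(T/C)TT-3'

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(seq, pattern):
    """Return sorted (start, strand) hits on both strands.

    `start` is the 0-based position of the leftmost base of the site on
    the forward strand. A lookahead is used so overlapping sites are
    all reported.
    """
    seq = seq.upper()
    scanner = re.compile(f"(?=({pattern}))")
    hits = [(m.start(), "+") for m in scanner.finditer(seq)]
    for m in scanner.finditer(reverse_complement(seq)):
        site_length = len(m.group(1))
        # Convert the reverse-strand coordinate back to the forward strand.
        hits.append((len(seq) - m.start() - site_length, "-"))
    return sorted(hits)
```

    On the toy sequence `CCAAGATTTTGG`, `find_motif(seq, ECRM)` reports one forward-strand hit at position 2 and `find_motif(seq, CRM)` reports the nested core motif at position 3, which mirrors the observation that the extended motif contains the core motif.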

  10. Direct regulation of knot gene expression by Ultrabithorax and the evolution of cis-regulatory elements in Drosophila.

    PubMed

    Hersh, Bradley M; Carroll, Sean B

    2005-04-01

    The regulation of development by Hox proteins is important in the evolution of animal morphology, but how the regulatory sequences of Hox-regulated target genes function and evolve is unclear. To understand the regulatory organization and evolution of a Hox target gene, we have identified a wing-specific cis-regulatory element controlling the knot gene, which is expressed in the developing Drosophila wing but not the haltere. This regulatory element contains a single binding site that is crucial for activation by the transcription factor Cubitus interruptus (Ci), and a cluster of binding sites for repression by the Hox protein Ultrabithorax (UBX). The negative and positive control regions are physically separable, demonstrating that UBX does not repress by competing for occupancy of Ci-binding sites. Although knot expression is conserved among Drosophila species, this cluster of UBX binding sites is not. We isolated the knot wing cis-regulatory element from D. pseudoobscura, which contains a cluster of UBX-binding sites that is not homologous to the functionally defined D. melanogaster cluster. It is, however, homologous to a second D. melanogaster region containing a cluster of UBX sites that can also function as a repressor element. Thus, the knot regulatory region in D. melanogaster has two apparently functionally redundant blocks of sequences for repression by UBX, both of which are widely separated from activator sequences. This redundancy suggests that the complete evolutionary unit of regulatory control is larger than the minimal experimentally defined control element. The span of regulatory sequences upon which selection acts may, in general, be more expansive and less modular than functional studies of these elements have previously indicated.

  11. Evolution of cis-regulatory sequence and function in Diptera.

    PubMed

    Wittkopp, P J

    2006-09-01

    Cis-regulatory sequences direct patterns of gene expression essential for development and physiology. Evolutionary changes in these sequences contribute to phenotypic divergence. Despite their importance, cis-regulatory regions remain one of the most enigmatic features of the genome. Patterns of sequence evolution can be used to identify cis-regulatory elements, but the power of this approach depends upon the relationship between sequence and function. Comparative studies of gene regulation among Diptera reveal that divergent sequences can underlie conserved expression, and that expression differences can evolve despite largely similar sequences. This complex structure-function relationship is the primary impediment for computational identification and interpretation of cis-regulatory sequences. Biochemical characterization and in vivo assays of cis-regulatory sequences on a genomic-scale will relieve this barrier.

  12. NAC transcription factor genes: genome-wide identification, phylogenetic, motif and cis-regulatory element analysis in pigeonpea (Cajanus cajan (L.) Millsp.).

    PubMed

    Satheesh, Viswanathan; Jagannadham, P Tej Kumar; Chidambaranathan, Parameswaran; Jain, P K; Srinivasan, R

    2014-12-01

    The NAC (NAM, ATAF and CUC) proteins are plant-specific transcription factors implicated in development and stress responses. In the present study, 88 pigeonpea NAC genes were identified from the recently published draft genome of pigeonpea by using homology-based and de novo prediction programmes. These sequences were further subjected to phylogenetic, motif and promoter analyses. In motif analysis, highly conserved motifs were identified in the NAC domain and also in the C-terminal region of the NAC proteins. A phylogenetic reconstruction using pigeonpea, Arabidopsis and soybean NAC genes revealed 33 putative stress-responsive pigeonpea NAC genes. Several stress-responsive cis-elements were identified through in silico analysis of the promoters of these putative stress-responsive genes. This analysis is the first report of the NAC gene family in pigeonpea and will be useful for the identification and selection of candidate genes associated with stress tolerance. PMID:25108674

  13. Modeling DNA sequence-based cis-regulatory gene networks.

    PubMed

    Bolouri, Hamid; Davidson, Eric H

    2002-06-01

    Gene network analysis requires computationally based models which represent the functional architecture of regulatory interactions, and which provide directly testable predictions. The type of model that is useful is constrained by the particular features of developmentally active cis-regulatory systems. These systems function by processing diverse regulatory inputs, generating novel regulatory outputs. A computational model which explicitly accommodates this basic concept was developed earlier for the cis-regulatory system of the endo16 gene of the sea urchin. This model represents the genetically mandated logic functions that the system executes, but also shows how time-varying kinetic inputs are processed in different circumstances into particular kinetic outputs. The same basic design features can be utilized to construct models that connect the large number of cis-regulatory elements constituting developmental gene networks. The ultimate aim of the network models discussed here is to represent the regulatory relationships among the genomic control systems of the genes in the network, and to state their functional meaning. The target site sequences of the cis-regulatory elements of these genes constitute the physical basis of the network architecture. Useful models for developmental regulatory networks must represent the genetic logic by which the system operates, but must also be capable of explaining the real time dynamics of cis-regulatory response as kinetic input and output data become available. Most importantly, however, such models must display in a direct and transparent manner fundamental network design features such as intra- and intercellular feedback circuitry; the sources of parallel inputs into each cis-regulatory element; gene battery organization; and use of repressive spatial inputs in specification and boundary formation. Successful network models lead to direct tests of key architectural features by targeted cis-regulatory analysis. PMID

  14. cis-regulatory region analysis using BEARR.

    PubMed

    Vega, Vinsensius Berlian

    2006-01-01

    Genome-wide studies are fast becoming the norm, partly fueled by the availability of genome sequences and the feasibility of high-throughput experimental platforms, e.g., microarrays. An important aspect of any genome-wide study is the determination of regulatory relationships, believed to be primarily transacted through transcription factor binding to DNA. Identification of specific transcription factor binding sites in the cis-regulatory regions of genes makes it possible to list direct targets of transcription factors, model transcriptional regulatory networks, and mine other associated datasets for relevant targets for experimental and clinical manipulation. We have developed a web-based tool to assist biologists in efficiently carrying out the analysis of genes from studies of specific transcription factors or otherwise. The batch extraction and analysis of cis-regulatory regions (BEARR) facilitates identification, extraction, and analysis of regulatory regions from the large amount of data that is typically generated in genome-wide studies. This chapter highlights features and serves as a tutorial for using this publicly available software. The URL is http://giscompute.gis.a-star.edu.sg/~vega/BEARR1.0/. PMID:16888354

  15. Identification of a cis-regulatory element by transient analysis of co-ordinately regulated genes

    PubMed Central

    Dare, Andrew P; Schaffer, Robert J; Lin-Wang, Kui; Allan, Andrew C; Hellens, Roger P

    2008-01-01

    Background: Transcription factors (TFs) co-ordinately regulate target genes that are dispersed throughout the genome. This co-ordinate regulation is achieved, in part, through the interaction of transcription factors with conserved cis-regulatory motifs that are in close proximity to the target genes. While much is known about the families of transcription factors that regulate gene expression in plants, there are few well characterised cis-regulatory motifs. In Arabidopsis, over-expression of the MYB transcription factor PAP1 (PRODUCTION OF ANTHOCYANIN PIGMENT 1) leads to transgenic plants with elevated anthocyanin levels due to the co-ordinated up-regulation of genes in the anthocyanin biosynthetic pathway. In addition to the anthocyanin biosynthetic genes, there are a number of un-associated genes that also change in expression level. This may be a direct or indirect consequence of the over-expression of PAP1. Results: Oligo array analysis of PAP1 over-expression Arabidopsis plants identified genes co-ordinately up-regulated in response to the elevated expression of this transcription factor. Transient assays on the promoter regions of 33 of these up-regulated genes identified eight promoter fragments that were transactivated by PAP1. Bioinformatic analysis on these promoters revealed a common cis-regulatory motif that we showed is required for PAP1 dependent transactivation. Conclusion: Co-ordinated gene regulation by individual transcription factors is a complex collection of both direct and indirect effects. Transient transactivation assays provide a rapid method to identify direct target genes from indirect target genes. Bioinformatic analysis of the promoters of these direct target genes is able to locate motifs that are common to this sub-set of promoters, which is impossible to identify with the larger set of direct and indirect target genes. While this type of analysis does not prove a direct interaction between protein and DNA, it does provide a tool to

  16. The Role of cis Regulatory Evolution in Maize Domestication

    PubMed Central

    Lemmon, Zachary H.; Bukowski, Robert; Sun, Qi; Doebley, John F.

    2014-01-01

    Gene expression differences between divergent lineages caused by modification of cis regulatory elements are thought to be important in evolution. We assayed genome-wide cis and trans regulatory differences between maize and its wild progenitor, teosinte, using deep RNA sequencing in F1 hybrid and parent inbred lines for three tissue types (ear, leaf and stem). Pervasive regulatory variation was observed with approximately 70% of ∼17,000 genes showing evidence of regulatory divergence between maize and teosinte. However, many fewer genes (1,079 genes) show consistent cis differences with all sampled maize and teosinte lines. For ∼70% of these 1,079 genes, the cis differences are specific to a single tissue. The number of genes with cis regulatory differences is greatest for ear tissue, which underwent a drastic transformation in form during domestication. As expected from the domestication bottleneck, maize possesses less cis regulatory variation than teosinte with this deficit greatest for genes showing maize-teosinte cis regulatory divergence, suggesting selection on cis regulatory differences during domestication. Consistent with selection on cis regulatory elements, genes with cis effects correlated strongly with genes under positive selection during maize domestication and improvement, while genes with trans regulatory effects did not. We observed a directional bias such that genes with cis differences showed higher expression of the maize allele more often than the teosinte allele, suggesting domestication favored up-regulation of gene expression. Finally, this work documents the cis and trans regulatory changes between maize and teosinte in over 17,000 genes for three tissues. PMID:25375861
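
    The design described here (parent inbred lines plus F1 hybrids) supports a simple decomposition that is common in allele-specific expression studies: within an F1 hybrid both alleles share the same trans environment, so allelic imbalance reflects cis differences, and whatever remains of the parental difference is attributed to trans. The sketch below shows that arithmetic only. It is not the authors' statistical procedure, which the abstract does not describe, and the function name, argument names and pseudocount are hypothetical.

```python
from math import log2


def cis_trans_effects(parent_maize, parent_teosinte,
                      f1_maize_allele, f1_teosinte_allele,
                      pseudocount=1.0):
    """Split a parental expression difference into cis and trans parts.

    All inputs are normalised read counts for one gene in one tissue.
    The pseudocount (an assumption of this sketch) avoids division by
    zero for lowly expressed genes.
    """
    parental = log2((parent_maize + pseudocount) /
                    (parent_teosinte + pseudocount))
    # Allelic ratio inside the hybrid: both alleles see the same
    # trans-acting factors, so this ratio is read as the cis effect.
    cis = log2((f1_maize_allele + pseudocount) /
               (f1_teosinte_allele + pseudocount))
    # The part of the parental difference not explained by cis.
    trans = parental - cis
    return {"parental": parental, "cis": cis, "trans": trans}
```

    A gene with a four-fold parental difference that persists between alleles in the hybrid yields cis = 2 and trans = 0 on the log2 scale; if the alleles are balanced in the hybrid, the same parental difference is assigned entirely to trans. A real analysis would add replicate-aware significance tests before classifying any gene.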

  17. The role of cis regulatory evolution in maize domestication.

    PubMed

    Lemmon, Zachary H; Bukowski, Robert; Sun, Qi; Doebley, John F

    2014-11-01

    Gene expression differences between divergent lineages caused by modification of cis regulatory elements are thought to be important in evolution. We assayed genome-wide cis and trans regulatory differences between maize and its wild progenitor, teosinte, using deep RNA sequencing in F1 hybrid and parent inbred lines for three tissue types (ear, leaf and stem). Pervasive regulatory variation was observed with approximately 70% of ∼17,000 genes showing evidence of regulatory divergence between maize and teosinte. However, many fewer genes (1,079 genes) show consistent cis differences with all sampled maize and teosinte lines. For ∼70% of these 1,079 genes, the cis differences are specific to a single tissue. The number of genes with cis regulatory differences is greatest for ear tissue, which underwent a drastic transformation in form during domestication. As expected from the domestication bottleneck, maize possesses less cis regulatory variation than teosinte with this deficit greatest for genes showing maize-teosinte cis regulatory divergence, suggesting selection on cis regulatory differences during domestication. Consistent with selection on cis regulatory elements, genes with cis effects correlated strongly with genes under positive selection during maize domestication and improvement, while genes with trans regulatory effects did not. We observed a directional bias such that genes with cis differences showed higher expression of the maize allele more often than the teosinte allele, suggesting domestication favored up-regulation of gene expression. Finally, this work documents the cis and trans regulatory changes between maize and teosinte in over 17,000 genes for three tissues.

  18. BEARR: Batch Extraction and Analysis of cis-Regulatory Regions.

    PubMed

    Vega, Vinsensius B; Bangarusamy, Dhinoth Kumar; Miller, Lance D; Liu, Edison T; Lin, Chin-Yo

    2004-07-01

    Transcription factors play important roles in regulating biological and disease processes. Microarray technology has enabled researchers to simultaneously monitor changes in the expression of thousands of transcripts. By identifying specific transcription factor binding sites in the cis-regulatory regions of differentially expressed genes, it is then possible to identify direct targets of transcription factors, model transcriptional regulatory networks and mine the dataset for relevant targets for experimental and clinical manipulation. We have developed web-based software to assist biologists in efficiently carrying out the analysis of microarray data from studies of specific transcription factors. Batch Extraction and Analysis of cis-Regulatory Regions, or BEARR, accepts gene identifier lists from microarray data analysis tools and facilitates identification, extraction and analysis of regulatory regions from the large amount of data that is typically generated in these types of studies. The software is publicly available at http://giscompute.gis.a-star.edu.sg/~vega/BEARR1.0/. PMID:15215391

  19. Phylogeny disambiguates the evolution of heat-shock cis-regulatory elements in Drosophila.

    PubMed

    Tian, Sibo; Haney, Robert A; Feder, Martin E

    2010-01-01

    Heat-shock genes have a well-studied control mechanism for their expression that is mediated through cis-regulatory motifs known as heat-shock elements (HSEs). The evolution of important features of this control mechanism has not been investigated in detail, however. Here we exploit the genome sequencing of multiple Drosophila species, combined with a wealth of available information on the structure and function of HSEs in D. melanogaster, to undertake this investigation. We find that in single-copy heat shock genes, entire HSEs have evolved or disappeared 14 times, and the phylogenetic approach bounds the timing and direction of these evolutionary events in relation to speciation. In contrast, in the multi-copy gene Hsp70, the number of HSEs is nearly constant across species. HSEs evolve in size, position, and sequence within heat-shock promoters. In turn, functional significance of certain features is implicated by preservation despite this evolutionary change; these features include tail-to-tail arrangements of HSEs, gapped HSEs, and the presence or absence of entire HSEs. The variation among Drosophila species indicates that the cis-regulatory encoding of responsiveness to heat and other stresses is diverse. The broad dimensions of variation uncovered are particularly important as they suggest a substantial challenge for functional studies.

  20. Toucan: deciphering the cis-regulatory logic of coregulated genes

    PubMed Central

    Aerts, Stein; Thijs, Gert; Coessens, Bert; Staes, Mik; Moreau, Yves; De Moor, Bart

    2003-01-01

    TOUCAN is a Java application for the rapid discovery of significant cis-regulatory elements from sets of coexpressed or coregulated genes. Biologists can automatically (i) retrieve genes and intergenic regions, (ii) identify putative regulatory regions, (iii) score sequences for known transcription factor binding sites, (iv) identify candidate motifs for unknown binding sites, and (v) detect those statistically over-represented sites that are characteristic for a gene set. Genes or intergenic regions are retrieved from Ensembl or EMBL, together with orthologs and supporting information. Orthologs are aligned and syntenic regions are selected as candidate regulatory regions. Putative sites for known transcription factors are detected using our MotifScanner, which scores position weight matrices using a probabilistic model. New motifs are detected using our MotifSampler based on Gibbs sampling. Binding sites characteristic for a gene set—and thus statistically over-represented with respect to a reference sequence set—are found using a binomial test. We have validated Toucan by analyzing muscle-specific genes, liver-specific genes and E2F target genes; we have easily detected many known binding sites within intergenic DNA and identified new biologically plausible sites for known and unknown transcription factors. Software available at http://www.esat.kuleuven.ac.be/∼dna/BioI/Software.html. PMID:12626717
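
    The binomial over-representation test mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration and not Toucan's own code: the function names, the per-sequence counting scheme, and the use of the reference set to estimate the background probability are all assumptions made for the example.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def overrepresentation_pvalue(hits_in_set: int, set_size: int,
                              hits_in_reference: int, reference_size: int) -> float:
    """P-value that a site occurs in at least `hits_in_set` of the gene-set sequences.

    Assumption: the background probability that a sequence contains the
    site is estimated as the fraction of reference sequences containing it.
    """
    p_background = hits_in_reference / reference_size
    return binomial_tail(hits_in_set, set_size, p_background)


if __name__ == "__main__":
    # Hypothetical counts: a site found in 12 of 20 co-regulated promoters
    # and in 100 of 1000 reference promoters.
    print(overrepresentation_pvalue(12, 20, 100, 1000))
```

    A small p-value indicates that the site occurs in the gene set more often than the reference frequency would predict; correction for testing many sites is left out of this sketch.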

  1. Cis-regulatory mutations in human disease

    PubMed Central

    2009-01-01

    Cis-acting regulatory sequences are required for the proper temporal and spatial control of gene expression. Variation in gene expression is highly heritable and a significant determinant of human disease susceptibility. The diversity of human genetic diseases attributed, in whole or in part, to mutations in non-coding regulatory sequences is on the rise. Improvements in genome-wide methods of associating genetic variation with human disease and predicting DNA with cis-regulatory potential are two of the major reasons for these recent advances. This review will highlight select examples from the literature that have successfully integrated genetic and genomic approaches to uncover the molecular basis by which cis-regulatory mutations alter gene expression and contribute to human disease. The fine mapping of disease-causing variants has led to the discovery of novel cis-acting regulatory elements that, in some instances, are located as far away as 1.5 Mb from the target gene. In other cases, the prior knowledge of the regulatory landscape surrounding the gene of interest aided in the selection of enhancers for mutation screening. The success of these studies should provide a framework for following up on the large number of genome-wide association studies that have identified common variants in non-coding regions of the genome that associate with increased risk of human diseases including diabetes, autism, Crohn's, colorectal cancer, and asthma, to name a few. PMID:19641089

  2. Validation of Skeletal Muscle cis-Regulatory Module Predictions Reveals Nucleotide Composition Bias in Functional Enhancers

    PubMed Central

    Kwon, Andrew T.; Chou, Alice Yi; Arenillas, David J.; Wasserman, Wyeth W.

    2011-01-01

    We performed a genome-wide scan for muscle-specific cis-regulatory modules (CRMs) using three computational prediction programs. Based on the predictions, 339 candidate CRMs were tested in cell culture with NIH3T3 fibroblasts and C2C12 myoblasts for capacity to direct selective reporter gene expression to differentiated C2C12 myotubes. A subset of 19 CRMs validated as functional in the assay. The rate of predictive success reveals striking limitations of computational regulatory sequence analysis methods for CRM discovery. Motif-based methods performed no better than predictions based only on sequence conservation. Analysis of the properties of the functional sequences relative to inactive sequences identifies nucleotide sequence composition as an important characteristic to incorporate in future methods for improved predictive specificity. Muscle-related TFBSs predicted within the functional sequences display greater sequence conservation than non-TFBS flanking regions. Comparison with recent MyoD and histone modification ChIP-Seq data supports the validity of the functional regions. PMID:22144875

  3. Global identification of the genetic networks and cis-regulatory elements of the cold response in zebrafish.

    PubMed

    Hu, Peng; Liu, Mingli; Zhang, Dong; Wang, Jinfeng; Niu, Hongbo; Liu, Yimeng; Wu, Zhichao; Han, Bingshe; Zhai, Wanying; Shen, Yu; Chen, Liangbiao

    2015-10-30

    The transcriptional programs of ectothermic teleosts are directly influenced by water temperature. However, the cis- and trans-factors governing cold responses are not well characterized. We profiled transcriptional changes in eight zebrafish tissues exposed to mildly and severely cold temperatures using RNA-Seq. A total of 1943 differentially expressed genes (DEGs) were identified, from which 34 clusters representing distinct tissue and temperature response expression patterns were derived using the k-means fuzzy clustering algorithm. The promoter regions of the clustered DEGs that demonstrated strong co-regulation were analysed for enriched cis-regulatory elements with a motif discovery program, DREME. Seventeen motifs, ten known and seven novel, were identified, which covered 23% of the DEGs. Two motifs predicted to be the binding sites for the transcription factors Bcl6 and Jun, respectively, were chosen for experimental verification, and they demonstrated the expected cold-induced and cold-repressed patterns of gene regulation. Protein interaction modeling of the network components followed by experimental validation suggested that Jun physically interacts with Bcl6 and might be a hub factor that orchestrates the cold response in zebrafish. Thus, the methodology used and the regulatory networks uncovered in this study provide a foundation for exploring the mechanisms of cold adaptation in teleosts.

  4. Identification of altered cis-regulatory elements in human disease.

    PubMed

    Mathelier, Anthony; Shi, Wenqiang; Wasserman, Wyeth W

    2015-02-01

    It has long been appreciated that variations in regulatory regions of genes can impact gene expression. With the advent of whole-genome sequencing (WGS), it has become possible to begin cataloging these noncoding variants. Evidence continues to accumulate linking clinical cases with cis-regulatory element disruption in a wide range of diseases. Identifying variants is becoming routine, but assessing their impact on regulation remains challenging. Bioinformatics approaches that identify variations functionally altering transcription factor (TF) binding are increasingly important for meeting this challenge. We present the current state of computational tools and resources for identifying the genomic regulatory components (cis-regulatory regions and TF binding sites, TFBSs) controlling gene transcriptional regulation. We review how such approaches can be used to interpret the potential disease causality of point mutations and small insertions or deletions. We hope this will motivate further the development of methods enabling the identification of etiological cis-regulatory variations.

  5. Functional cis-regulatory genomics for systems biology

    PubMed Central

    Nam, Jongmin; Dong, Ping; Tarpine, Ryan; Istrail, Sorin; Davidson, Eric H.

    2010-01-01

    Gene expression is controlled by interactions between trans-regulatory factors and cis-regulatory DNA sequences, and these interactions constitute the essential functional linkages of gene regulatory networks (GRNs). Validation of GRN models requires experimental cis-regulatory tests of predicted linkages to authenticate their identities and proposed functions. However, cis-regulatory analysis is, at present, at a severe bottleneck in genomic system biology because of the demanding experimental methodologies currently in use for discovering cis-regulatory modules (CRMs), in the genome, and for measuring their activities. Here we demonstrate a high-throughput approach to both discovery and quantitative characterization of CRMs. The unique aspect is use of DNA sequence tags to “barcode” CRM expression constructs, which can then be mixed, injected together into sea urchin eggs, and subsequently deconvolved. This method has increased the rate of cis-regulatory analysis by >100-fold compared with conventional one-by-one reporter assays. The utility of the DNA-tag reporters was demonstrated by the rapid discovery of 81 active CRMs from 37 previously unexplored sea urchin genes. We then obtained simultaneous high-resolution temporal characterization of the regulatory activities of more than 80 CRMs. On average 2–3 CRMs were discovered per gene. Comparison of endogenous gene expression profiles with those of the CRMs recovered from each gene showed that, for most cases, at least one CRM is active in each phase of endogenous expression, suggesting that CRM recovery was comprehensive. This approach will qualitatively alter the practice of GRN construction as well as validation, and will impact many additional areas of regulatory system biology. PMID:20142491

  6. Abundant raw material for cis-regulatory evolution in humans

    NASA Technical Reports Server (NTRS)

    Rockman, Matthew V.; Wray, Gregory A.

    2002-01-01

    Changes in gene expression and regulation--due in particular to the evolution of cis-regulatory DNA sequences--may underlie many evolutionary changes in phenotypes, yet little is known about the distribution of such variation in populations. We present in this study the first survey of experimentally validated functional cis-regulatory polymorphism. These data are derived from more than 140 polymorphisms involved in the regulation of 107 genes in Homo sapiens, the eukaryote species with the most available data. We find that functional cis-regulatory variation is widespread in the human genome and that the consequent variation in gene expression is twofold or greater for 63% of the genes surveyed. Transcription factor-DNA interactions are highly polymorphic, and regulatory interactions have been gained and lost within human populations. On average, humans are heterozygous at more functional cis-regulatory sites (>16,000) than at amino acid positions (<13,000), in part because of an overrepresentation among the former in multiallelic tandem repeat variation, especially (AC)(n) dinucleotide microsatellites. The role of microsatellites in gene expression variation may provide a larger store of heritable phenotypic variation, and a more rapid mutational input of such variation, than has been realized. Finally, we outline the distinctive consequences of cis-regulatory variation for the genotype-phenotype relationship, including ubiquitous epistasis and genotype-by-environment interactions, as well as underappreciated modes of pleiotropy and overdominance. Ordinary small-scale mutations contribute to pervasive variation in transcription rates and consequently to patterns of human phenotypic variation.

  7. SMCis: An Effective Algorithm for Discovery of Cis-Regulatory Modules

    PubMed Central

    Guo, Haitao; Huo, Hongwei; Yu, Qiang

    2016-01-01

    The discovery of cis-regulatory modules (CRMs) is a challenging problem in computational biology. Limited by the difficulty of using an HMM to model dependent features in transcriptional regulatory sequences (TRSs), the probabilistic modeling methods based on HMMs cannot accurately represent the distance between regulatory elements in TRSs and are cumbersome to model the prevailing dependencies between motifs within CRMs. We propose a probabilistic modeling algorithm called SMCis, which builds a more powerful CRM discovery model based on a hidden semi-Markov model. Our model characterizes the regulatory structure of CRMs and effectively models dependencies between motifs at a higher level of abstraction based on segments rather than nucleotides. Experimental results on three benchmark datasets indicate that our method performs better than the compared algorithms. PMID:27637070

  8. Cis-regulatory architecture of a brain-signaling center predates the origin of chordates

    PubMed Central

    Yao, Yao; Minor, Paul J.; Zhao, Ying-Tao; Jeong, Yongsu; Pani, Ariel M.; King, Anna N.; Symmons, Orsolya; Gan, Lin; Cardoso, Wellington V.; Spitz, François; Lowe, Christopher J.; Epstein, Douglas J.

    2016-01-01

    Genomic approaches have predicted hundreds of thousands of tissue specific cis-regulatory sequences, but the determinants critical to their function and evolutionary history are mostly unknown [1–4]. Here, we systematically decode a set of brain enhancers active in the zona limitans intrathalamica (zli), a signaling center essential for vertebrate forebrain development via the secreted morphogen, Sonic hedgehog (Shh) [5,6]. We apply a de novo motif analysis tool to identify six position-independent sequence motifs together with their cognate transcription factors that are essential for zli enhancer activity and Shh expression in the mouse embryo. Using knowledge of this regulatory lexicon, we discover novel Shh zli enhancers in mice, and a functionally equivalent element in hemichordates, indicating an ancient origin of the Shh zli regulatory network that predates the chordate phylum. These findings support a strategy for delineating functionally conserved enhancers in the absence of overt sequence homologies, and over extensive evolutionary distances. PMID:27064252

  10. Functional evolution of a cis-regulatory module.

    PubMed

    Ludwig, Michael Z; Palsson, Arnar; Alekseeva, Elena; Bergman, Casey M; Nathan, Janaki; Kreitman, Martin

    2005-04-01

    Lack of knowledge about how regulatory regions evolve in relation to their structure-function may limit the utility of comparative sequence analysis in deciphering cis-regulatory sequences. To address this we applied reverse genetics to carry out a functional genetic complementation analysis of a eukaryotic cis-regulatory module, the even-skipped stripe 2 enhancer, from four Drosophila species. The evolution of this enhancer is non-clock-like, with important functional differences between closely related species and functional convergence between distantly related species. Functional divergence is attributable to differences in activation levels rather than spatiotemporal control of gene expression. Our findings have implications for understanding enhancer structure-function, mechanisms of speciation and computational identification of regulatory modules.

  11. CREME: Cis-Regulatory Module Explorer for the Human Genome

    SciTech Connect

    Loots, G G; Sharan, R; Ovcharenko, I; Ben-Hur, A

    2004-02-11

    The binding of transcription factors to specific regulatory sequence elements is a primary mechanism for controlling gene transcription. Eukaryotic genes are often regulated by several transcription factors, whose binding sites are tightly clustered and form cis-regulatory modules. In this paper we present a web-server, CREME, for identifying and visualizing cis-regulatory modules in the promoter regions of a given set of potentially co-regulated genes. CREME relies on a database of putative transcription factor binding sites that have been annotated across the human genome using a library of position weight matrices and evolutionary conservation with the mouse and rat genomes. A search algorithm is applied to this dataset to identify combinations of transcription factors whose binding sites tend to co-occur in close proximity in the promoter regions of the input gene set. The identified cis-regulatory modules are statistically scored and significant combinations are reported and graphically visualized. Our web-server is available at http://creme.dcode.org/.
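
    The idea of searching for transcription factors whose binding sites co-occur in close proximity can be illustrated with a small counting sketch. It is not CREME's search algorithm or scoring scheme: the data layout (one list of `(tf_name, position)` tuples per promoter), the function names, and the 50 bp default window are assumptions chosen for the example, and no statistical scoring is attempted.

```python
from collections import Counter
from itertools import combinations


def cooccurring_pairs(sites, max_distance):
    """Return the set of TF pairs with sites within max_distance bp in one promoter.

    sites: list of (tf_name, position) tuples (hypothetical input format).
    """
    pairs = set()
    for (tf_a, pos_a), (tf_b, pos_b) in combinations(sites, 2):
        if tf_a != tf_b and abs(pos_a - pos_b) <= max_distance:
            pairs.add(tuple(sorted((tf_a, tf_b))))
    return pairs


def count_pairs(promoters, max_distance=50):
    """Count, for each TF pair, the number of promoters where the pair co-occurs."""
    counts = Counter()
    for sites in promoters.values():
        counts.update(cooccurring_pairs(sites, max_distance))
    return counts


if __name__ == "__main__":
    # Toy promoters with made-up site positions.
    promoters = {
        "gene1": [("A", 10), ("B", 40)],
        "gene2": [("A", 5), ("B", 200)],
        "gene3": [("B", 0), ("A", 20), ("C", 500)],
    }
    print(count_pairs(promoters))
```

    Each pair is counted at most once per promoter, so the counts can be compared directly against the size of the input gene set.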

  12. cis-Regulatory control of the initial neurogenic pattern of onecut gene expression in the sea urchin embryo.

    PubMed

    Barsi, Julius C; Davidson, Eric H

    2016-01-01

    Specification of the ciliated band (CB) of echinoid embryos executes three spatial functions essential for postgastrular organization. These are establishment of a band about 5 cells wide which delimits and bounds other embryonic territories; definition of a neurogenic domain within this band; and generation within it of arrays of ciliary cells that bear the special long cilia from which the structure derives its name. In Strongylocentrotus purpuratus the spatial coordinates of the future ciliated band are initially and exactly determined by the disposition of a ring of cells that transcriptionally activate the onecut homeodomain regulatory gene, beginning in blastula stage, long before the appearance of the CB per se. Thus the cis-regulatory apparatus that governs onecut expression in the blastula directly reveals the genomic sequence code by which these aspects of the spatial organization of the embryo are initially determined. We screened the entire onecut locus and its flanking region for transcriptionally active cis-regulatory elements, and by means of BAC recombineered deletions identified three separated and required cis-regulatory modules that execute different functions. The operating logic of the crucial spatial control module accounting for the spectacularly precise and beautiful early onecut expression domain depends on spatial repression. Previously predicted oral ectoderm and aboral ectoderm repressors were identified by cis-regulatory mutation as the products of goosecoid and irxa genes respectively, while the pan-ectodermal activator SoxB1 supplies a transcriptional driver function.

  13. A primer on regression methods for decoding cis-regulatory logic

    SciTech Connect

    Das, Debopriya; Pellegrini, Matteo; Gray, Joe W.

    2009-03-03

    The rapidly emerging field of systems biology is helping us to understand the molecular determinants of phenotype on a genomic scale [1]. Cis-regulatory elements are major sequence-based determinants of biological processes in cells and tissues [2]. For instance, during transcriptional regulation, transcription factors (TFs) bind to very specific regions on the promoter DNA [2,3] and recruit the basal transcriptional machinery, which ultimately initiates mRNA transcription (Figure 1A). Learning cis-Regulatory Elements from Omics Data A vast amount of work over the past decade has shown that omics data can be used to learn cis-regulatory logic on a genome-wide scale [4-6]--in particular, by integrating sequence data with mRNA expression profiles. The most popular approach has been to identify over-represented motifs in promoters of genes that are coexpressed [4,7,8]. Though widely used, such an approach can be limiting for a variety of reasons. First, the combinatorial nature of gene regulation is difficult to explicitly model in this framework. Moreover, in many applications of this approach, expression data from multiple conditions are necessary to obtain reliable predictions. This can potentially limit the use of this method to only large data sets [9]. Although these methods can be adapted to analyze mRNA expression data from a pair of biological conditions, such comparisons are often confounded by the fact that primary and secondary response genes are clustered together--whereas only the primary response genes are expected to contain the functional motifs [10]. A set of approaches based on regression has been developed to overcome the above limitations [11-32]. These approaches have their foundations in certain biophysical aspects of gene regulation [26,33-35]. That is, the models are motivated by the expected transcriptional response of genes due to the binding of TFs to their promoters. While such methods have gathered popularity in the computational domain

  14. A set of structural features defines the cis-regulatory modules of antenna-expressed genes in Drosophila melanogaster.

    PubMed

    López, Yosvany; Vandenbon, Alexis; Nakai, Kenta

    2014-01-01

    Unraveling the biological information within the regulatory region (RR) of genes has become one of the major focuses of current genomic research. It has been hypothesized that RRs of co-expressed genes share similar architecture, but to the best of our knowledge, no studies have simultaneously examined multiple structural features, such as positioning of cis-regulatory elements relative to transcription start sites and to each other, and the order and orientation of regulatory motifs, to accurately describe overall cis-regulatory structure. In our work we present an improved computational method that builds a feature collection based on all of these structural features. We demonstrate the utility of this approach by modeling the cis-regulatory modules of antenna-expressed genes in Drosophila melanogaster. Six potential antenna-related motifs were predicted initially, including three that appeared to be novel. A feature set was created with the predicted motifs, where a correlation-based filter was used to remove irrelevant features, and a genetic algorithm was designed to optimize the feature set. Finally, a set of eight highly informative structural features was obtained for the RRs of antenna-expressed genes, achieving an area under the curve of 0.841. We used these features to score all D. melanogaster RRs for potentially unknown antenna-expressed genes sharing a similar regulatory structure. Validation of our predictions with an independent RNA sequencing dataset showed that 76.7% of genes with high scoring RRs were expressed in antenna. In addition, we found that the structural features we identified are highly conserved in RRs of orthologs in other Drosophila sibling species. This approach to identify tissue-specific regulatory structures showed comparable performance to previous approaches, but also uncovered additional interesting features because it also considered the order and orientation of motifs.
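
    The area under the curve reported above has a simple interpretation: it is the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. The sketch below computes it that way by direct pairwise comparison. The function name and the toy scores are assumptions for illustration, and nothing here reproduces the authors' feature set or classifier.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive example scores higher; ties count as half a win.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


if __name__ == "__main__":
    # Made-up scores for regulatory regions of expressed and non-expressed genes.
    print(auc([0.9, 0.4], [0.5, 0.1]))
```

    The quadratic loop is fine for small sets; for genome-wide score lists a rank-based computation would be the usual choice.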

  15. Epistatic Interactions in the Arabinose Cis-Regulatory Element.

    PubMed

    Lagator, Mato; Igler, Claudia; Moreno, Anaísa B; Guet, Călin C; Bollback, Jonathan P

    2016-03-01

    Changes in gene expression are an important mode of evolution; however, the proximate mechanism of these changes is poorly understood. In particular, little is known about the effects of mutations within cis binding sites for transcription factors, or the nature of epistatic interactions between these mutations. Here, we tested the effects of single and double mutants in two cis binding sites involved in the transcriptional regulation of the Escherichia coli araBAD operon, a component of arabinose metabolism, using a synthetic system. This system decouples transcriptional control from any posttranslational effects on fitness, allowing a precise estimate of the effect of single and double mutations, and hence epistasis, on gene expression. We found that epistatic interactions between mutations in the araBAD cis-regulatory element are common, and that the predominant form of epistasis is negative. The magnitude of the interactions depended on whether the mutations are located in the same or in different operator sites. Importantly, these epistatic interactions were dependent on the presence of arabinose, a native inducer of the araBAD operon in vivo, with some interactions changing in sign (e.g., from negative to positive) in its presence. This study thus reveals that mutations in even relatively simple cis-regulatory elements interact in complex ways such that selection on the level of gene expression in one environment might perturb regulation in the other environment in an unpredictable and uncorrelated manner.
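
    To make the notion of epistasis between two mutations concrete, the sketch below compares a double mutant's expression with the value expected if the two single mutations acted independently. The multiplicative null model, the function names, and the numbers are assumptions for illustration; the abstract does not state which null model the authors used.

```python
def epistasis(expr_a: float, expr_b: float, expr_ab: float) -> float:
    """Deviation of the double mutant from a multiplicative expectation.

    All expression values are relative to wild type (wild type = 1.0).
    Assumption: independent mutations combine multiplicatively.
    """
    return expr_ab - expr_a * expr_b


def classify(eps: float, tolerance: float = 1e-9) -> str:
    """Label the sign of an epistasis value."""
    if eps > tolerance:
        return "positive"
    if eps < -tolerance:
        return "negative"
    return "none"


if __name__ == "__main__":
    # Hypothetical measurements for the same mutant pair in two environments.
    without_inducer = epistasis(0.5, 0.4, 0.1)
    with_inducer = epistasis(0.5, 0.4, 0.3)
    print(classify(without_inducer), classify(with_inducer))
```

    Evaluating the same pair of mutations in two environments, as in the usage lines, shows how an interaction can change sign when the inducer is present.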

  16. Conserved cis-regulatory modules in promoters of genes encoding wheat high-molecular-weight glutenin subunits

    PubMed Central

    Ravel, Catherine; Fiquet, Samuel; Boudet, Julie; Dardevet, Mireille; Vincent, Jonathan; Merlino, Marielle; Michard, Robin; Martre, Pierre

    2014-01-01

    The concentration and composition of the gliadin and glutenin seed storage proteins (SSPs) in wheat flour are the most important determinants of its end-use value. In cereals, the synthesis of SSPs is predominantly regulated at the transcriptional level by a complex network involving at least five cis-elements in gene promoters. The high-molecular-weight glutenin subunits (HMW-GS) are encoded by two tightly linked genes located on the long arms of group 1 chromosomes. Here, we sequenced and annotated the HMW-GS gene promoters of 22 electrophoretic wheat alleles to identify putative cis-regulatory motifs. We focused on 24 motifs known to be involved in SSP gene regulation. Most of them were identified in at least one HMW-GS gene promoter sequence. A common regulatory framework was observed in all the HMW-GS gene promoters, as they shared conserved cis-regulatory modules (CCRMs) including all the five motifs known to regulate the transcription of SSP genes. This common regulatory framework comprises a composite box made of the GATA motifs and GCN4-like Motifs (GLMs) and was shown to be functional as the GLMs are able to bind a bZIP transcriptional factor SPA (Storage Protein Activator). In addition to this regulatory framework, each HMW-GS gene promoter had additional motifs organized differently. The promoters of most highly expressed x-type HMW-GS genes contain an additional box predicted to bind R2R3-MYB transcriptional factors. However, the differences in annotation between promoter alleles could not be related to their level of expression. In summary, we identified a common modular organization of HMW-GS gene promoters but the lack of correlation between the cis-motifs of each HMW-GS gene promoter and their level of expression suggests that other cis-elements or other mechanisms regulate HMW-GS gene expression. PMID:25429295

  17. Identification of tissue-specific cis-regulatory modules based on interactions between transcription factors

    PubMed Central

    Yu, Xueping; Lin, Jimmy; Zack, Donald J; Qian, Jiang

    2007-01-01

    Background Evolutionary conservation has been used successfully to help identify cis-acting DNA regions that are important in regulating tissue-specific gene expression. Motivated by increasing evidence that some DNA regulatory regions are not evolutionary conserved, we have developed an approach for cis-regulatory region identification that does not rely upon evolutionary sequence conservation. Results The conservation-independent approach is based on an empirical potential energy between interacting transcription factors (TFs). In this analysis, the potential energy is defined as a function of the number of TF interactions in a genomic region and the strength of the interactions. By identifying sets of interacting TFs, the analysis locates regions enriched with the binding sites of these interacting TFs. We applied this approach to 30 human tissues and identified 6232 putative cis-regulatory modules (CRMs) regulating 2130 tissue-specific genes. Interestingly, some genes appear to be regulated by different CRMs in different tissues. Known regulatory regions are highly enriched in our predicted CRMs. In addition, DNase I hypersensitive sites, which tend to be associated with active regulatory regions, significantly overlap with the predicted CRMs, but not with more conserved regions. We also find that conserved and non-conserved CRMs regulate distinct gene groups. Conserved CRMs control more essential genes and genes involved in fundamental cellular activities such as transcription. In contrast, non-conserved CRMs, in general, regulate more non-essential genes, such as genes related to neural activity. Conclusion These results demonstrate that identifying relevant sets of binding motifs can help in the mapping of DNA regulatory regions, and suggest that non-conserved CRMs play an important role in gene regulation. PMID:17996093

  18. Construction of cis-regulatory input functions of yeast promoters.

    PubMed

    Ratna, Prasuna; Becskei, Attila

    2011-01-01

    Promoters contain a large number of binding sites for transcriptional factors transmitting signals from a variety of cellular pathways. The promoter processes these input signals and sets the level of gene expression, the output of the gene. Here, we describe how to design genetic constructs and measure gene expression to deliver data suitable for quantitative analysis. Synthetic genetic constructs are well suited to precisely control and measure gene expression to construct cis-regulatory input functions. These functions can be used to predict gene expression based on signal intensities transmitted to activators and repressors in the gene regulatory region. Simple models of gene expression are presented for competitive and noncompetitive repressions. Complex phenomena, exemplified by synergistic silencing, are modeled by reaction-diffusion equations.
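
    The difference between competitive and noncompetitive repression can be seen in a pair of simple equilibrium-occupancy functions. These are textbook thermodynamic forms, offered as a sketch and not as the models presented in the chapter; the function names, the dissociation constants `k_a` and `k_r`, and the single-site assumption are all illustrative.

```python
def competitive_repression(activator: float, repressor: float,
                           k_a: float = 1.0, k_r: float = 1.0) -> float:
    """Promoter activity when activator and repressor compete for one site.

    The site is empty, activator-bound, or repressor-bound; only the
    activator-bound state drives expression.
    """
    a = activator / k_a
    r = repressor / k_r
    return a / (1.0 + a + r)


def noncompetitive_repression(activator: float, repressor: float,
                              k_a: float = 1.0, k_r: float = 1.0) -> float:
    """Promoter activity when the two factors bind independent sites.

    Expression requires the activator site occupied and the repressor
    site free, so the two occupancy terms multiply.
    """
    a = activator / k_a
    r = repressor / k_r
    return (a / (1.0 + a)) * (1.0 / (1.0 + r))


if __name__ == "__main__":
    for activator in (1.0, 10.0, 1e6):
        print(activator,
              competitive_repression(activator, 1.0),
              noncompetitive_repression(activator, 1.0))
```

    In the competitive form a large excess of activator overcomes the repressor, whereas in the noncompetitive form the maximal output stays capped by the repressor term regardless of activator level.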

  19. Comparative genomics-based identification and analysis of cis-regulatory elements.

    PubMed

    Ogino, Hajime; Ochi, Haruki; Uchiyama, Chihiro; Louie, Sarah; Grainger, Robert M

    2012-01-01

    Identification of cis-regulatory elements, such as enhancers and promoters, is very important not only for analysis of gene regulatory networks but also as a tool for targeted gene expression experiments. In this chapter, we introduce an easy but reliable approach to predict enhancers of a gene of interest by comparing mammalian and Xenopus genome sequences, and to examine their activity using a co-transgenesis technique in Xenopus embryos. Since the bioinformatics analysis utilizes publicly available web tools, bench biologists can easily perform it without any need for special computing capability. The co-transgenesis assay, which directly uses polymerase chain reaction products, quickly screens for the activity of the candidate elements in a cloning-free manner.

  20. CisMiner: Genome-Wide In-Silico Cis-Regulatory Module Prediction by Fuzzy Itemset Mining

    PubMed Central

    Navarro, Carmen; Lopez, Francisco J.; Cano, Carlos; Garcia-Alcalde, Fernando; Blanco, Armando

    2014-01-01

    Eukaryotic gene control regions are known to be spread throughout non-coding DNA sequences which may appear distant from the gene promoter. Transcription factors are proteins that coordinately bind to these regions at transcription factor binding sites to regulate gene expression. Several tools can detect significant co-occurrences of closely located binding sites (cis-regulatory modules, CRMs). However, these tools have at least one of the following limitations: 1) their scope is limited to promoter or conserved regions of the genome; 2) they cannot identify combinations involving more than two motifs; 3) they require prior information about target motifs. In this work we present CisMiner, a novel methodology to detect putative CRMs by means of a fuzzy itemset mining approach able to operate at genome-wide scale. CisMiner performs a blind search for CRMs without any prior information about target CRMs and without any limit on the number of motifs. CisMiner tackles the combinatorial complexity of genome-wide cis-regulatory module extraction using a natural representation of motif combinations as itemsets and applying the Top-Down Fuzzy Frequent-Pattern Tree algorithm to identify significant itemsets. Fuzzy technology allows CisMiner to better handle the imprecision and noise inherent to regulatory processes. Results obtained for a set of well-known binding sites in the S. cerevisiae genome show that our method yields highly reliable predictions. Furthermore, CisMiner was also applied to putative in-silico predicted transcription factor binding sites to identify significant combinations in S. cerevisiae and D. melanogaster, proving that our approach can be further applied genome-wide to more complex genomes. CisMiner is freely accessible at: http://genome2.ugr.es/cisminer. CisMiner can be queried for the results presented in this work and can also perform a customized cis-regulatory module prediction on a query set of transcription factor binding sites provided by
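
    The representation of motif combinations as fuzzy itemsets can be illustrated with a brute-force sketch. It does not implement the Top-Down Fuzzy Frequent-Pattern Tree algorithm named in the abstract: the window layout (one dict of motif to membership degree per genomic window), the use of the minimum as the fuzzy conjunction, and all names are assumptions made for the example.

```python
from itertools import combinations


def fuzzy_support(itemset, windows):
    """Mean over windows of the weakest membership among the itemset's motifs.

    windows: list of dicts mapping motif name -> membership in [0, 1];
    a motif absent from a window has membership 0.
    """
    total = sum(min(w.get(m, 0.0) for m in itemset) for w in windows)
    return total / len(windows)


def frequent_itemsets(windows, min_support, max_size=3):
    """Enumerate every motif combination up to max_size and keep the frequent ones."""
    motifs = sorted({m for w in windows for m in w})
    result = {}
    for size in range(1, max_size + 1):
        for itemset in combinations(motifs, size):
            support = fuzzy_support(itemset, windows)
            if support >= min_support:
                result[itemset] = support
    return result


if __name__ == "__main__":
    # Toy windows with made-up membership degrees.
    windows = [
        {"A": 1.0, "B": 0.5},
        {"A": 0.5, "B": 0.5, "C": 1.0},
        {"C": 1.0},
    ]
    print(frequent_itemsets(windows, min_support=0.3))
```

    Exhaustive enumeration grows combinatorially with the number of motifs, which is the cost that tree-based mining algorithms are designed to avoid.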

  1. Favorable genomic environments for cis-regulatory evolution: A novel theoretical framework.

    PubMed

    Maeso, Ignacio; Tena, Juan J

    2016-09-01

    Cis-regulatory changes are arguably the primary evolutionary source of animal morphological diversity. With the recent explosion of genome-wide comparisons of the cis-regulatory content in different animal species, it is now possible to infer general principles underlying enhancer evolution. However, these studies have also revealed numerous discrepancies and paradoxes, suggesting that the mechanistic causes and modes of cis-regulatory evolution are still not well understood and are probably much more complex than generally appreciated. Here, we argue that the mutational mechanisms and genomic regions generating new regulatory activities must comply with the constraints imposed by the molecular properties of cis-regulatory elements (CREs) and the organizational features of long-range chromatin interactions. Accordingly, we propose a new integrative evolutionary framework for cis-regulatory evolution based on two major premises for the origin of novel enhancer activity: (i) an accessible chromatin environment and (ii) compatibility with the 3D structure and interactions of pre-existing CREs. Mechanisms and DNA sequences not fulfilling these premises will be less likely to have a measurable impact on gene expression and, as such, will make a minor contribution to the evolution of gene regulation. Finally, we discuss current comparative cis-regulatory data in light of this new evolutionary model, and propose that the two most prominent mechanisms for the evolution of cis-regulatory changes are the overprinting of ancestral CREs and the exaptation of transposable elements.

  2. Overview Article: Identifying transcriptional cis-regulatory modules in animal genomes

    PubMed Central

    Suryamohan, Kushal; Halfon, Marc S.

    2014-01-01

    Gene expression is regulated through the activity of transcription factors and chromatin modifying proteins acting on specific DNA sequences, referred to as cis-regulatory elements. These include promoters, located at the transcription initiation sites of genes, and a variety of distal cis-regulatory modules (CRMs), the most common of which are transcriptional enhancers. Because regulated gene expression is fundamental to cell differentiation and acquisition of new cell fates, identifying, characterizing, and understanding the mechanisms of action of CRMs is critical for understanding development. CRM discovery has historically been challenging, as CRMs can be located far from the genes they regulate, have few readily-identifiable sequence characteristics, and for many years were not amenable to high-throughput discovery methods. However, the recent availability of complete genome sequences and the development of next-generation sequencing methods have led to an explosion of both computational and empirical methods for CRM discovery in model and non-model organisms alike. Experimentally, CRMs can be identified through chromatin immunoprecipitation directed against transcription factors or histone post-translational modifications, identification of nucleosome-depleted “open” chromatin regions, or sequencing-based high-throughput functional screening. Computational methods include comparative genomics, clustering of known or predicted transcription factor binding sites, and supervised machine-learning approaches trained on known CRMs. All of these methods have proven effective for CRM discovery, but each has its own considerations and limitations, and each is subject to a greater or lesser number of false-positive identifications. Experimental confirmation of predictions is essential, although shortcomings in current methods suggest that additional means of validation need to be developed. PMID:25704908

  3. Direct vs 2-stage approaches to structured motif finding

    PubMed Central

    2012-01-01

    Background The notion of DNA motif is a mathematical abstraction used to model regions of the DNA (known as Transcription Factor Binding Sites, or TFBSs) that are bound by a given Transcription Factor to regulate gene expression or repression. In turn, DNA structured motifs are a mathematical counterpart that models sets of TFBSs that work in concert in the gene regulation processes of higher eukaryotic organisms. Typically, a structured motif is composed of an ordered set of isolated (or simple) motifs, separated by a variable, but somewhat constrained number of “irrelevant” base-pairs. Discovering structured motifs in a set of DNA sequences is a computationally hard problem that has been addressed by a number of authors using either a direct approach, or via the preliminary identification and successive combination of simple motifs. Results We describe a computational tool, named SISMA, for the de-novo discovery of structured motifs in a set of DNA sequences. SISMA is an exact, enumerative algorithm, meaning that it finds all the motifs conforming to the specifications. It does so in two stages: first it discovers all the possible component simple motifs, then combines them in a way that respects the given constraints. We developed SISMA mainly with the aim of understanding the potential benefits of such a 2-stage approach w.r.t. direct methods. In fact, no 2-stage software was available for the general problem of structured motif discovery, but only a few tools that solved restricted versions of the problem. We evaluated SISMA against other published tools on a comprehensive benchmark made of both synthetic and real biological datasets. In a significant number of cases, SISMA outperformed the competitors, exhibiting a good performance also in most of the cases in which it was inferior. Conclusions A reflection on the results obtained led us to conclude that a 2-stage approach can be implemented with many advantages over direct approaches. Some of these

  4. Characterization of a putative cis-regulatory element that controls transcriptional activity of the pig uroplakin II gene promoter

    SciTech Connect

    Kwon, Deug-Nam; Park, Mi-Ryung; Park, Jong-Yi; Cho, Ssang-Goo; Park, Chankyu; Oh, Jae-Wook; Song, Hyuk; Kim, Jae-Hwan; Kim, Jin-Hoi

    2011-07-01

    Highlights: The sequences of -604 to -84 bp of the pUPII promoter contained the region of a putative negative cis-regulatory element. The core promoter was located in the 5F-1 region. Transcription factor HNF4 can directly bind in the pUPII core promoter region, which plays a critical role in controlling promoter activity. These features of the pUPII promoter are fundamental to development of a target-specific vector. -- Abstract: Uroplakin II (UPII) is one of the integral membrane proteins synthesized as a major differentiation product of mammalian urothelium. UPII gene expression is bladder specific and differentiation dependent, but little is known about its transcription response elements and molecular mechanism. To identify the cis-regulatory elements in the pig UPII (pUPII) gene promoter region, we constructed pUPII 5' upstream region deletion mutants and demonstrated that each of the deletion mutants participates in controlling the expression of the pUPII gene in human bladder carcinoma RT4 cells. We also identified a new core promoter region and putative negative cis-regulatory element within a minimal promoter region. In addition, we showed that hepatocyte nuclear factor 4 (HNF4) can directly bind in the pUPII core promoter (5F-1) region, which plays a critical role in controlling promoter activity. Transient cotransfection experiments showed that HNF4 positively regulates pUPII gene promoter activity. Thus, the binding element and its binding protein, HNF4 transcription factor, may be involved in the mechanism that specifically regulates pUPII gene transcription.

  5. BLSSpeller: exhaustive comparative discovery of conserved cis-regulatory elements

    PubMed Central

    De Witte, Dieter; Van de Velde, Jan; Decap, Dries; Van Bel, Michiel; Audenaert, Pieter; Demeester, Piet; Dhoedt, Bart; Vandepoele, Klaas; Fostier, Jan

    2015-01-01

    Motivation: The accurate discovery and annotation of regulatory elements remains a challenging problem. The growing number of sequenced genomes creates new opportunities for comparative approaches to motif discovery. Putative binding sites are then considered to be functional if they are conserved in orthologous promoter sequences of multiple related species. Existing methods for comparative motif discovery usually rely on pregenerated multiple sequence alignments, which are difficult to obtain for more diverged species such as plants. As a consequence, misaligned regulatory elements often remain undetected. Results: We present a novel algorithm that supports both alignment-free and alignment-based motif discovery in the promoter sequences of related species. Putative motifs are exhaustively enumerated as words over the IUPAC alphabet and screened for conservation using the branch length score. Additionally, a confidence score is established in a genome-wide fashion. In order to take advantage of a cloud computing infrastructure, the MapReduce programming model is adopted. The method is applied to four monocotyledon plant species and it is shown that high-scoring motifs are significantly enriched for open chromatin regions in Oryza sativa and for transcription factor binding sites inferred through protein-binding microarrays in O. sativa and Zea mays. Furthermore, the method is shown to recover experimentally profiled ga2ox1-like KN1 binding sites in Z. mays. Availability and implementation: BLSSpeller was written in Java. Source code and manual are available at http://bioinformatics.intec.ugent.be/blsspeller Contact: Klaas.Vandepoele@psb.vib-ugent.be or jan.fostier@intec.ugent.be Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26254488
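
    To make "words over the IUPAC alphabet" concrete, the sketch below matches a degenerate word against promoter sequences and lists the species in which it occurs. It is an illustration only and is not BLSSpeller's implementation (which is written in Java): it does not compute the branch length score, which would additionally need a phylogenetic tree, and the function names and example sequences are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(word):
    """Compile an IUPAC word; the lookahead lets overlapping hits be reported."""
    pattern = "".join(IUPAC[ch] for ch in word.upper())
    return re.compile("(?=(" + pattern + "))")


def find_iupac(word, sequence):
    """Return 0-based start positions of all matches, including overlaps."""
    return [m.start() for m in iupac_to_regex(word).finditer(sequence.upper())]


def conserved_in(word, promoters):
    """Return names of species whose promoter contains at least one match."""
    return [name for name, seq in promoters.items() if find_iupac(word, seq)]


# Hypothetical promoter fragments keyed by made-up species labels.
promoters = {"species_a": "ATGACTTGATGG", "species_b": "GGGGCCCC"}
print(find_iupac("TGAY", promoters["species_a"]))
print(conserved_in("TGAY", promoters))
```

    This scans the forward strand only; handling the reverse complement is left out to keep the example short.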

  6. Complex interactions between cis-regulatory modules in native conformation are critical for Drosophila snail expression.

    PubMed

    Dunipace, Leslie; Ozdemir, Anil; Stathopoulos, Angelike

    2011-09-01

    It has been shown in several organisms that multiple cis-regulatory modules (CRMs) of a gene locus can be active concurrently to support similar spatiotemporal expression. To understand the functional importance of such seemingly redundant CRMs, we examined two CRMs from the Drosophila snail gene locus, which are both active in the ventral region of pre-gastrulation embryos. By performing a deletion series in a ∼25 kb DNA rescue construct using BAC recombineering and site-directed transgenesis, we demonstrate that the two CRMs are not redundant. The distal CRM is absolutely required for viability, whereas the proximal CRM is required only under extreme conditions such as high temperature. Consistent with their distinct requirements, the CRMs support distinct expression patterns: the proximal CRM exhibits an expanded expression domain relative to endogenous snail, whereas the distal CRM exhibits almost complete overlap with snail except at the anterior-most pole. We further show that the distal CRM normally limits the increased expression domain of the proximal CRM and that the proximal CRM serves as a 'damper' for the expression levels driven by the distal CRM. Thus, the two CRMs interact in cis in a non-additive fashion and these interactions may be important for fine-tuning the domains and levels of gene expression. PMID:21813571

  7. Complex interactions between cis-regulatory modules in native conformation are critical for Drosophila snail expression

    PubMed Central

    Dunipace, Leslie; Ozdemir, Anil; Stathopoulos, Angelike

    2011-01-01

    It has been shown in several organisms that multiple cis-regulatory modules (CRMs) of a gene locus can be active concurrently to support similar spatiotemporal expression. To understand the functional importance of such seemingly redundant CRMs, we examined two CRMs from the Drosophila snail gene locus, which are both active in the ventral region of pre-gastrulation embryos. By performing a deletion series in a ∼25 kb DNA rescue construct using BAC recombineering and site-directed transgenesis, we demonstrate that the two CRMs are not redundant. The distal CRM is absolutely required for viability, whereas the proximal CRM is required only under extreme conditions such as high temperature. Consistent with their distinct requirements, the CRMs support distinct expression patterns: the proximal CRM exhibits an expanded expression domain relative to endogenous snail, whereas the distal CRM exhibits almost complete overlap with snail except at the anterior-most pole. We further show that the distal CRM normally limits the increased expression domain of the proximal CRM and that the proximal CRM serves as a 'damper' for the expression levels driven by the distal CRM. Thus, the two CRMs interact in cis in a non-additive fashion and these interactions may be important for fine-tuning the domains and levels of gene expression. PMID:21813571

  8. Barcoded DNA-Tag Reporters for Multiplex Cis-Regulatory Analysis

    PubMed Central

    Nam, Jongmin; Davidson, Eric H.

    2012-01-01

    Cis-regulatory DNA sequences causally mediate patterns of gene expression, but efficient experimental analysis of these control systems has remained challenging. Here we develop a new version of “barcoded” DNA-tag reporters, “Nanotags”, that permit simultaneous quantitative analysis of up to 130 distinct cis-regulatory modules (CRMs). The activities of these reporters are measured in single experiments by the NanoString RNA counting method and other quantitative procedures. We demonstrate the efficiency of the Nanotag method by simultaneously measuring hourly temporal activities of 126 CRMs from 46 genes in the developing sea urchin embryo, otherwise a virtually impossible task. Nanotags are also used in gene perturbation experiments to reveal cis-regulatory responses of many CRMs at once. Nanotag methodology can be applied to many research areas, ranging from gene regulatory networks to functional and evolutionary genomics. PMID:22563420

  9. Identification of cis-regulatory sequence variations in individual genome sequences.

    PubMed

    Worsley-Hunt, Rebecca; Bernard, Virginie; Wasserman, Wyeth W

    2011-01-01

    Functional contributions of cis-regulatory sequence variations to human genetic disease are numerous. For instance, disrupting variations in an HNF4A transcription factor binding site upstream of the Factor IX gene contribute causally to hemophilia B Leyden. Although clinical genome sequence analysis currently focuses on the identification of protein-altering variation, the impact of cis-regulatory mutations can be similarly strong. New technologies are now enabling genome sequencing beyond exomes, revealing variation across the non-coding 98% of the genome responsible for developmental and physiological patterns of gene activity. The capacity to identify causal regulatory mutations is improving, but predicting functional changes in regulatory DNA sequences remains a great challenge. Here we explore the existing methods and software for prediction of functional variation situated in the cis-regulatory sequences governing gene transcription and RNA processing.

  10. Identification of cis-regulatory sequence variations in individual genome sequences

    PubMed Central

    2011-01-01

    Functional contributions of cis-regulatory sequence variations to human genetic disease are numerous. For instance, disrupting variations in an HNF4A transcription factor binding site upstream of the Factor IX gene contribute causally to hemophilia B Leyden. Although clinical genome sequence analysis currently focuses on the identification of protein-altering variation, the impact of cis-regulatory mutations can be similarly strong. New technologies are now enabling genome sequencing beyond exomes, revealing variation across the non-coding 98% of the genome responsible for developmental and physiological patterns of gene activity. The capacity to identify causal regulatory mutations is improving, but predicting functional changes in regulatory DNA sequences remains a great challenge. Here we explore the existing methods and software for prediction of functional variation situated in the cis-regulatory sequences governing gene transcription and RNA processing. PMID:21989199

  11. A GATA/RUNX cis-regulatory module couples Drosophila blood cell commitment and differentiation into crystal cells.

    PubMed

    Ferjoux, Géraldine; Augé, Benoit; Boyer, Karène; Haenlin, Marc; Waltzer, Lucas

    2007-05-15

    Members of the RUNX and GATA transcription factor families play critical roles during hematopoiesis from Drosophila to mammals. In Drosophila, the formation of the crystal cell hematopoietic lineage depends on the continuous expression of the lineage-specific RUNX factor Lozenge (Lz) and on its interaction with the GATA factor Serpent (Srp). Crystal cells are the main source of prophenoloxidases (proPOs), the enzymes required for melanization. By analyzing the promoter regions of several insect proPOs, we identify a conserved GATA/RUNX cis-regulatory module that ensures the crystal cell-specific expression of the three Drosophila melanogaster proPO. We demonstrate that activation of this module requires the direct binding of both Srp and Lz. Interestingly, a similar GATA/RUNX signature is over-represented in crystal cell differentiation markers, allowing us to identify new Srp/Lz target genes by genome-wide screening of Drosophila promoter regions. Finally, we show that the expression of lz in the crystal cells also relies on Srp/Lz-mediated activation via a similar module, indicating that crystal cell fate choice maintenance and activation of the differentiation program are coupled. Based on our observations, we propose that this GATA/RUNX cis-regulatory module may be reiteratively used during hematopoietic development through evolution.

  12. cis-Regulatory Mutations Are a Genetic Cause of Human Limb Malformations

    PubMed Central

    VanderMeer, Julia E.; Ahituv, Nadav

    2011-01-01

    The underlying mutations that cause human limb malformations are often difficult to determine, particularly for limb malformations that occur as isolated traits. Evidence from a variety of studies shows that cis-regulatory mutations, specifically in enhancers, can lead to some of these isolated limb malformations. Here, we provide a review of human limb malformations that have been shown to be caused by enhancer mutations and propose that cis-regulatory mutations will continue to be identified as the cause of additional human malformations as our understanding of regulatory sequences improves. PMID:21509892

  13. Cis-regulatory mechanisms governing stem and progenitor cell transitions

    PubMed Central

    Johnson, Kirby D.; Kong, Guangyao; Gao, Xin; Chang, Yuan-I; Hewitt, Kyle J.; Sanalkumar, Rajendran; Prathibha, Rajalekshmi; Ranheim, Erik A.; Dewey, Colin N.; Zhang, Jing; Bresnick, Emery H.

    2015-01-01

    Cis-element encyclopedias provide information on phenotypic diversity and disease mechanisms. Although cis-element polymorphisms and mutations are instructive, deciphering function remains challenging. Mutation of an intronic GATA motif (+9.5) in GATA2, encoding a master regulator of hematopoiesis, underlies an immunodeficiency associated with myelodysplastic syndrome (MDS) and acute myeloid leukemia (AML). Whereas an inversion relocalizes another GATA2 cis-element (−77) to the proto-oncogene EVI1, inducing EVI1 expression and AML, whether this reflects ectopic or physiological activity is unknown. We describe a mouse strain that decouples −77 function from proto-oncogene deregulation. The −77−/− mice exhibited a novel phenotypic constellation including late embryonic lethality and anemia. The −77 established a vital sector of the myeloid progenitor transcriptome, conferring multipotentiality. Unlike the +9.5−/− embryos, hematopoietic stem cell genesis was unaffected in −77−/− embryos. These results illustrate a paradigm in which cis-elements in a locus differentially control stem and progenitor cell transitions, and therefore the individual cis-element alterations cause unique and overlapping disease phenotypes. PMID:26601269

  14. Complex patterns of cis-regulatory polymorphisms in ebony underlie standing pigmentation variation in Drosophila melanogaster.

    PubMed

    Miyagi, Ryutaro; Akiyama, Noriyoshi; Osada, Naoki; Takahashi, Aya

    2015-12-01

    Pigmentation traits in adult Drosophila melanogaster were used in this study to investigate how phenotypic variations in continuous ecological traits can be maintained in a natural population. First, pigmentation variation in the adult female was measured at seven different body positions in 20 strains from the Drosophila melanogaster Genetic Reference Panel (DGRP) originating from a natural population in North Carolina. Next, to assess the contributions of cis-regulatory polymorphisms of the genes involved in the melanin biosynthesis pathway, allele-specific expression levels of four genes were quantified by amplicon sequencing using a 454 GS Junior. Among those genes, ebony was significantly associated with pigmentation intensity of the thoracic segment. Detailed sequence analysis of the gene regulatory regions of this gene indicated that many different functional cis-regulatory alleles are segregating in the population and that variations outside the core enhancer element could potentially play important roles in the regulation of gene expression. In addition, a slight enrichment of distantly associated SNP pairs was observed in the ~10 kb cis-regulatory region of ebony, which suggested the presence of interacting elements scattered across the region. In contrast, sequence analysis in the core cis-regulatory region of tan indicated that SNPs within the region are significantly associated with allele-specific expression level of this gene. Collectively, the data suggest that the underlying genetic differences in the cis-regulatory regions that control intraspecific pigmentation variation can be more complex than those of interspecific pigmentation trait differences, where causal genetic changes are typically confined to modular enhancer elements.

  15. Functional evolution of cis-regulatory modules at a homeotic gene in Drosophila.

    PubMed

    Ho, Margaret C W; Johnsen, Holly; Goetz, Sara E; Schiller, Benjamin J; Bae, Esther; Tran, Diana A; Shur, Andrey S; Allen, John M; Rau, Christoph; Bender, Welcome; Fisher, William W; Celniker, Susan E; Drewell, Robert A

    2009-11-01

    It is a long-held belief in evolutionary biology that the rate of molecular evolution for a given DNA sequence is inversely related to the level of functional constraint. This belief holds true for the protein-coding homeotic (Hox) genes originally discovered in Drosophila melanogaster. Expression of the Hox genes in Drosophila embryos is essential for body patterning and is controlled by an extensive array of cis-regulatory modules (CRMs). How the regulatory modules functionally evolve in different species is not clear. A comparison of the CRMs for the Abdominal-B gene from different Drosophila species reveals relatively low levels of overall sequence conservation. However, embryonic enhancer CRMs from other Drosophila species direct transgenic reporter gene expression in the same spatial and temporal patterns during development as their D. melanogaster orthologs. Bioinformatic analysis reveals the presence of short conserved sequences within defined CRMs, representing gap and pair-rule transcription factor binding sites. One predicted binding site for the gap transcription factor KRUPPEL in the IAB5 CRM was found to be altered in Superabdominal (Sab) mutations. In Sab mutant flies, the third abdominal segment is transformed into a copy of the fifth abdominal segment. A model for KRUPPEL-mediated repression at this binding site is presented. These findings challenge our current understanding of the relationship between sequence evolution at the molecular level and functional activity of a CRM. While the overall sequence conservation at Drosophila CRMs is not distinctive from neighboring genomic regions, functionally critical transcription factor binding sites within embryonic enhancer CRMs are highly conserved. These results have implications for understanding mechanisms of gene expression during embryonic development, enhancer function, and the molecular evolution of eukaryotic regulatory modules.

  16. Recurrent Modification of a Conserved Cis-Regulatory Element Underlies Fruit Fly Pigmentation Diversity

    PubMed Central

    Rogers, William A.; Salomone, Joseph R.; Tacy, David J.; Camino, Eric M.; Davis, Kristen A.; Rebeiz, Mark; Williams, Thomas M.

    2013-01-01

    The development of morphological traits occurs through the collective action of networks of genes connected at the level of gene expression. As any node in a network may be a target of evolutionary change, the recurrent targeting of the same node would indicate that the path of evolution is biased for the relevant trait and network. Although examples of parallel evolution have implicated recurrent modification of the same gene and cis-regulatory element (CRE), little is known about the mutational and molecular paths of parallel CRE evolution. In Drosophila melanogaster fruit flies, the Bric-à-brac (Bab) transcription factors control the development of a suite of sexually dimorphic traits on the posterior abdomen. Female-specific Bab expression is regulated by the dimorphic element, a CRE that possesses direct inputs from body plan (ABD-B) and sex-determination (DSX) transcription factors. Here, we find that the recurrent evolutionary modification of this CRE underlies both intraspecific and interspecific variation in female pigmentation in the melanogaster species group. By reconstructing the sequence and regulatory activity of the ancestral Drosophila melanogaster dimorphic element, we demonstrate that a handful of mutations were sufficient to create independent CRE alleles with differing activities. Moreover, intraspecific and interspecific dimorphic element evolution proceeded with little to no alterations to the known body plan and sex-determination regulatory linkages. Collectively, our findings represent an example where the paths of evolution appear biased to a specific CRE, and drastic changes in function were accompanied by deep conservation of key regulatory linkages. PMID:24009528

  17. Experimental approaches to evaluate the contributions of candidate cis-regulatory mutations to phenotypic evolution.

    PubMed

    Rebeiz, Mark; Williams, Thomas M

    2011-01-01

    Elucidating the molecular bases by which phenotypic traits have evolved provides a glimpse into the past, allowing the characterization of genetic changes that cumulatively contribute to evolutionary innovations. Historically, much of the experimental attention has been focused on changes in protein-coding regions that can readily be identified by the genetic code for translating gene coding sequences into proteins. Resultantly, the role of noncoding sequences in trait evolution has remained more mysterious. In recent years, several studies have reached an unprecedented level of detail in describing how noncoding mutations in gene cis-regulatory elements contribute to morphological evolution. Based on these and other studies, we describe an experimental framework and some of the genetic and molecular methods to connect a particular cis-regulatory mutation to the evolution of any phenotypic trait. PMID:22065449

  18. MyoD reprogramming requires Six1 and Six4 homeoproteins: genome-wide cis-regulatory module analysis

    PubMed Central

    Santolini, Marc; Sakakibara, Iori; Gauthier, Morgane; Ribas-Aulinas, Francesc; Takahashi, Hirotaka; Sawasaki, Tatsuya; Mouly, Vincent; Concordet, Jean-Paul; Defossez, Pierre-Antoine; Hakim, Vincent; Maire, Pascal

    2016-01-01

    Myogenic regulatory factors of the MyoD family have the ability to reprogram differentiated cells toward a myogenic fate. In this study, we demonstrate that Six1 or Six4 are required for the reprogramming by MyoD of mouse embryonic fibroblasts (MEFs). Using microarray experiments, we found 761 genes under the control of both Six and MyoD. Using MyoD ChIPseq data and a genome-wide search for Six1/4 MEF3 binding sites, we found significant co-localization of binding sites for MyoD and Six proteins on over a thousand mouse genomic DNA regions. The combination of both datasets yielded 82 genes which are synergistically activated by Six and MyoD, with 96 associated MyoD+MEF3 putative cis-regulatory modules (CRMs). Fourteen out of 19 of the CRMs that we tested demonstrated in Luciferase assays a synergistic action also observed for their cognate gene. We searched putative binding sites on these CRMs using available databases and de novo search of conserved motifs and demonstrated that the Six/MyoD synergistic activation takes place in a feedforward way. It involves the recruitment of these two families of transcription factors to their targets, together with partner transcription factors, encoded by genes that are themselves activated by Six and MyoD, including Mef2, Pbx-Meis and EBF. PMID:27302134

  19. Evolved tooth gain in sticklebacks is associated with a cis-regulatory allele of Bmp6

    PubMed Central

    Cleves, Phillip A.; Ellis, Nicholas A.; Jimenez, Monica T.; Nunez, Stephanie M.; Schluter, Dolph; Kingsley, David M.; Miller, Craig T.

    2014-01-01

    Developmental genetic studies of evolved differences in morphology have led to the hypothesis that cis-regulatory changes often underlie morphological evolution. However, because most of these studies focus on evolved loss of traits, the genetic architecture and possible association with cis-regulatory changes of gain traits are less understood. Here we show that a derived benthic freshwater stickleback population has evolved an approximate twofold gain in ventral pharyngeal tooth number compared with their ancestral marine counterparts. Comparing laboratory-reared developmental time courses of a low-toothed marine population and this high-toothed benthic population reveals that increases in tooth number and tooth plate area and decreases in tooth spacing arise at late juvenile stages. Genome-wide linkage mapping identifies largely separate sets of quantitative trait loci affecting different aspects of dental patterning. One large-effect quantitative trait locus controlling tooth number fine-maps to a genomic region containing an excellent candidate gene, Bone morphogenetic protein 6 (Bmp6). Stickleback Bmp6 is expressed in developing teeth, and no coding changes are found between the high- and low-toothed populations. However, quantitative allele-specific expression assays of Bmp6 in developing teeth in F1 hybrids show that cis-regulatory changes have elevated the relative expression level of the freshwater benthic Bmp6 allele at late, but not early, stages of stickleback development. Collectively, our data support a model where a late-acting cis-regulatory up-regulation of Bmp6 expression underlies a significant increase in tooth number in derived benthic sticklebacks. PMID:25205810

  20. Cis-Regulatory Changes Associated with a Recent Mating System Shift and Floral Adaptation in Capsella

    PubMed Central

    Steige, Kim A.; Reimegård, Johan; Koenig, Daniel; Scofield, Douglas G.; Slotte, Tanja

    2015-01-01

    The selfing syndrome constitutes a suite of floral and reproductive trait changes that have evolved repeatedly across many evolutionary lineages in response to the shift to selfing. Convergent evolution of the selfing syndrome suggests that these changes are adaptive, yet our understanding of the detailed molecular genetic basis of the selfing syndrome remains limited. Here, we investigate the role of cis-regulatory changes during the recent evolution of the selfing syndrome in Capsella rubella, which split from the outcrosser Capsella grandiflora less than 200 ka. We assess allele-specific expression (ASE) in leaves and flower buds at a total of 18,452 genes in three interspecific F1 C. grandiflora x C. rubella hybrids. Using a hierarchical Bayesian approach that accounts for technical variation using genomic reads, we find evidence for extensive cis-regulatory changes. On average, 44% of the assayed genes show evidence of ASE; however, only 6% show strong allelic expression biases. Flower buds, but not leaves, show an enrichment of cis-regulatory changes in genomic regions responsible for floral and reproductive trait divergence between C. rubella and C. grandiflora. We further detected an excess of heterozygous transposable element (TE) insertions near genes with ASE, and TE insertions targeted by uniquely mapping 24-nt small RNAs were associated with reduced expression of nearby genes. Our results suggest that cis-regulatory changes have been important during the recent adaptive floral evolution in Capsella and that differences in TE dynamics between selfing and outcrossing species could be important for rapid regulatory divergence in association with mating system shifts. PMID:26318184
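
    As a rough illustration of what testing for allele-specific expression involves, the sketch below applies an exact two-sided binomial test to allele read counts at one gene in an F1 hybrid. This is a deliberately simpler, swapped-in technique: the study above used a hierarchical Bayesian approach that accounts for technical variation using genomic reads, which this sketch does not do. Function names and read counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than the
    observed count k. Intended for modest read counts; very large n may
    overflow when the binomial coefficient is converted to float.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so ties are not lost to floating-point error.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


def allelic_ratio(allele_a_reads, allele_b_reads):
    """Fraction of reads assigned to allele A."""
    return allele_a_reads / (allele_a_reads + allele_b_reads)


# Hypothetical counts: 30 reads from one parental allele, 10 from the other.
a_reads, b_reads = 30, 10
print(allelic_ratio(a_reads, b_reads))
print(binomial_two_sided_p(a_reads, a_reads + b_reads))
```

    A null proportion of 0.5 assumes no mapping or technical bias between alleles; using genomic reads to estimate that bias, as the abstract describes, is one way to relax this assumption.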

  3. MORPH: probabilistic alignment combined with hidden Markov models of cis-regulatory modules.

    PubMed

    Sinha, Saurabh; He, Xin

    2007-11-01

    The discovery and analysis of cis-regulatory modules (CRMs) in metazoan genomes is crucial for understanding the transcriptional control of development and many other biological processes. Cross-species sequence comparison holds much promise for improving computational prediction of CRMs, for elucidating their binding site composition, and for understanding how they evolve. Current methods for analyzing orthologous CRMs from multiple species rely upon sequence alignments produced by off-the-shelf alignment algorithms, which do not exploit the presence of binding sites in the sequences. We present here a unified probabilistic framework, called MORPH, that integrates the alignment task with binding site predictions, allowing more robust CRM analysis in two species. The framework sums over all possible alignments of two sequences, thus accounting for alignment ambiguities in a natural way. We perform extensive tests on orthologous CRMs from two moderately diverged species Drosophila melanogaster and D. mojavensis, to demonstrate the advantages of the new approach. We show that it can overcome certain computational artifacts of traditional alignment tools and provide a different, likely more accurate, picture of cis-regulatory evolution than that obtained from existing methods. The burgeoning field of cis-regulatory evolution, which is amply supported by the availability of many related genomes, is currently thwarted by the lack of accurate alignments of regulatory regions. Our work will fill in this void and enable more reliable analysis of CRM evolution. PMID:17997594

  4. The identification of cis-regulatory elements: A review from a machine learning perspective.

    PubMed

    Li, Yifeng; Chen, Chih-Yu; Kaye, Alice M; Wasserman, Wyeth W

    2015-12-01

    The majority of the human genome consists of non-coding regions that have been called junk DNA. However, recent studies have unveiled that these regions contain cis-regulatory elements, such as promoters, enhancers, silencers, insulators, etc. These regulatory elements can play crucial roles in controlling gene expressions in specific cell types, conditions, and developmental stages. Disruption to these regions could contribute to phenotype changes. Precisely identifying regulatory elements is key to deciphering the mechanisms underlying transcriptional regulation. Cis-regulatory events are complex processes that involve chromatin accessibility, transcription factor binding, DNA methylation, histone modifications, and the interactions between them. The development of next-generation sequencing techniques has allowed us to capture these genomic features in depth. Applied analysis of genome sequences for clinical genetics has increased the urgency for detecting these regions. However, the complexity of cis-regulatory events and the deluge of sequencing data require accurate and efficient computational approaches, in particular, machine learning techniques. In this review, we describe machine learning approaches for predicting transcription factor binding sites, enhancers, and promoters, primarily driven by next-generation sequencing data. Data sources are provided in order to facilitate testing of novel methods. The purpose of this review is to attract computational experts and data scientists to advance this field. PMID:26499213

  5. Creating and validating cis-regulatory maps of tissue-specific gene expression regulation

    PubMed Central

    O'Connor, Timothy R.; Bailey, Timothy L.

    2014-01-01

    Predicting which genomic regions control the transcription of a given gene is a challenge. We present a novel computational approach for creating and validating maps that associate genomic regions (cis-regulatory modules–CRMs) with genes. The method infers regulatory relationships that explain gene expression observed in a test tissue using widely available genomic data for ‘other’ tissues. To predict the regulatory targets of a CRM, we use cross-tissue correlation between histone modifications present at the CRM and expression at genes within 1 Mbp of it. To validate cis-regulatory maps, we show that they yield more accurate models of gene expression than carefully constructed control maps. These gene expression models predict observed gene expression from transcription factor binding in the CRMs linked to that gene. We show that our maps are able to identify long-range regulatory interactions and improve substantially over maps linking genes and CRMs based on either the control maps or a ‘nearest neighbor’ heuristic. Our results also show that it is essential to include CRMs predicted in multiple tissues during map-building, that H3K27ac is the most informative histone modification, and that CAGE is the most informative measure of gene expression for creating cis-regulatory maps. PMID:25200088
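
    The cross-tissue correlation idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of Pearson correlation, and the toy data are assumptions made for the example; only the 1 Mbp distance window comes from the abstract.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def link_crm_to_genes(crm_pos, crm_signal, genes, max_dist=1_000_000):
    """Rank genes within max_dist bp of a CRM by cross-tissue correlation.

    crm_signal: histone-mark level at the CRM, one value per tissue.
    genes: dict of name -> (TSS position, expression per tissue, same tissue order).
    Returns (correlation, gene name) pairs, best correlation first.
    """
    scored = []
    for name, (tss, expr) in genes.items():
        if abs(tss - crm_pos) <= max_dist:
            scored.append((pearson(crm_signal, expr), name))
    return sorted(scored, reverse=True)

# Toy data, invented for illustration: four tissues, three genes.
crm_signal = [1.0, 2.0, 3.0, 4.0]
genes = {
    "geneA": (500_000, [2.0, 4.0, 6.0, 8.0]),    # tracks the CRM signal
    "geneB": (900_000, [4.0, 3.0, 2.0, 1.0]),    # anti-correlated
    "geneC": (5_000_000, [1.0, 2.0, 3.0, 4.0]),  # correlated but outside the window
}
ranking = link_crm_to_genes(crm_pos=100_000, crm_signal=crm_signal, genes=genes)
```

    In this toy case geneC is excluded by distance alone, which shows why the window and the correlation act as separate filters.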

  6. Evolutionary forces act on promoter length: identification of enriched cis-regulatory elements.

    PubMed

    Kristiansson, Erik; Thorsen, Michael; Tamás, Markus J; Nerman, Olle

    2009-06-01

    Transcription factors govern gene expression by binding to short DNA sequences called cis-regulatory elements. These sequences are typically located in promoters, which are regions of variable length upstream of the open reading frames of genes. Here, we report that promoter length and gene function are related in yeast, fungi, and plants. In particular, the promoters for stress-responsive genes are in general longer than those of other genes. Essential genes have, on the other hand, relatively short promoters. We utilize these findings in a novel method for identifying relevant cis-regulatory elements in a set of coexpressed genes. The method is shown to generate more accurate results and fewer false positives compared with other common procedures. Our results suggest that genes with complex transcriptional regulation tend to have longer promoters than genes responding to few signals. This phenomenon is present in all investigated species, indicating that evolution adjusts promoter length according to gene function. Identification of cis-regulatory elements in Saccharomyces cerevisiae can be done with the web service located at http://enricher.zool.gu.se.

  7. Predominant contribution of cis-regulatory divergence in the evolution of mouse alternative splicing

    PubMed Central

    Gao, Qingsong; Sun, Wei; Ballegeer, Marlies; Libert, Claude; Chen, Wei

    2015-01-01

    Divergence of alternative splicing represents one of the major driving forces to shape phenotypic diversity during evolution. However, the extent to which these divergences could be explained by the evolving cis-regulatory versus trans-acting factors remains unresolved. To globally investigate the relative contributions of the two factors for the first time in mammals, we measured splicing differences between C57BL/6J and SPRET/EiJ mouse strains and allele-specific splicing patterns in their F1 hybrid. Out of 11,818 alternative splicing events expressed in the cultured fibroblast cells, we identified 796 with significant differences between the parental strains. After integrating allele-specific data from the F1 hybrid, we demonstrated that these events could be predominantly attributed to cis-regulatory variants, including those residing at and beyond canonical splicing sites. Contrary to previous observations in Drosophila, such predominant contribution was consistently observed across different types of alternative splicing. Further analysis of liver tissues from the same mouse strains and reanalysis of published datasets on other strains showed similar trends, implying in general the predominant contribution of cis-regulatory changes in the evolution of mouse alternative splicing. PMID:26134616
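
    The logic of comparing parental strains with their F1 hybrid can be sketched as a small calculation. This is the commonly used cis/trans decomposition on log2 ratios, shown as an assumption-laden illustration rather than the authors' statistical procedure; the function name and the toy values are hypothetical, and a splicing study would typically work with inclusion levels rather than the generic positive measurements used here.

```python
import math

def cis_trans(parent_a, parent_b, hybrid_a, hybrid_b):
    """Split a parental difference into cis and trans components (log2 scale).

    parent_a, parent_b: measurements in the two parental strains.
    hybrid_a, hybrid_b: allele-specific measurements in the F1 hybrid, where
    both alleles share one trans environment, so any remaining allelic
    difference is attributed to cis-regulatory variants.
    """
    total = math.log2(parent_a / parent_b)  # total divergence between parents
    cis = math.log2(hybrid_a / hybrid_b)    # difference that persists in the hybrid
    return {"total": total, "cis": cis, "trans": total - cis}

# Toy values, invented for illustration.
pure_cis = cis_trans(80, 20, 80, 20)    # hybrid alleles mirror the parents
pure_trans = cis_trans(80, 20, 50, 50)  # alleles equalize in the hybrid
```

    When the allelic ratio in the hybrid matches the parental ratio, the whole difference is assigned to cis; when the alleles equalize in the hybrid, it is assigned to trans.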

  8. Rapid evolution of cis-regulatory sequences via local point mutations

    NASA Technical Reports Server (NTRS)

    Stone, J. R.; Wray, G. A.

    2001-01-01

    Although the evolution of protein-coding sequences within genomes is well understood, the same cannot be said of the cis-regulatory regions that control transcription. Yet, changes in gene expression are likely to constitute an important component of phenotypic evolution. We simulated the evolution of new transcription factor binding sites via local point mutations. The results indicate that new binding sites appear and become fixed within populations on microevolutionary timescales under an assumption of neutral evolution. Even combinations of two new binding sites evolve very quickly. We predict that local point mutations continually generate considerable genetic variation that is capable of altering gene expression.
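
    A toy version of this kind of simulation is sketched below. It is a deliberately simplified assumption, not the model used in the study: it follows a single sequence lineage, applies one point mutation per step, and ignores population size, fixation and mutation rates. The motif, sequence length and function names are hypothetical.

```python
import random

BASES = "ACGT"

def mutate(seq, rng):
    """Return seq with one randomly chosen position changed to a different base."""
    i = rng.randrange(len(seq))
    new_base = rng.choice([b for b in BASES if b != seq[i]])
    return seq[:i] + new_base + seq[i + 1:]

def mutations_until_site(motif, length, rng, max_steps=1_000_000):
    """Count point mutations until `motif` first appears in a random sequence.

    Returns 0 if the random starting sequence already contains the motif,
    or None if the motif has not appeared after max_steps mutations.
    """
    seq = "".join(rng.choice(BASES) for _ in range(length))
    for step in range(max_steps + 1):
        if motif in seq:
            return step
        seq = mutate(seq, rng)
    return None

rng = random.Random(0)  # fixed seed so repeated runs give the same result
waiting_time = mutations_until_site("TATA", length=200, rng=rng)
```

    Repeating the call with different seeds gives a distribution of waiting times, which is the quantity a fuller population-level model would compare against microevolutionary timescales.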

  9. Variation in Vertebrate Cis-Regulatory Elements in Evolution and Disease

    PubMed Central

    Douglas, Adam Thomas; Hill, Robert E

    2014-01-01

    Much of the genetic information that drives animal diversity lies within the vast non-coding regions of the genome. Multi-species sequence conservation in non-coding regions of the genome flags important regulatory elements, and more recently, techniques that look for functional signatures predicted for regulatory sequences have added to the identification of thousands more. For some time, biologists have argued that changes in cis-regulatory sequences create the basic genetic framework for evolutionary change. Recent advances support this notion and show that there is extensive genomic variability in non-coding regulatory elements associated with trait variation, speciation and disease. PMID:25764334

  10. Cryptic variation in vulva development by cis-regulatory evolution of a HAIRY-binding site.

    PubMed

    Kienle, Simone; Sommer, Ralf J

    2013-01-01

    Robustness to mutations is a general principle of biological systems that allows for the accumulation of cryptic variation. However, little is known about robustness and cryptic variation in core developmental pathways. Here we show, through gonad-ablation screens in natural isolates of Pristionchus pacificus, cryptic variation in nematode vulva development. This variation is mainly caused by cis-regulatory evolution in the conserved Notch ligand apx-1/Delta and involves binding sites for the transcription factor HAIRY. In some isolates, including a Bolivian strain, absence of a HAIRY-binding site results in Ppa-apx-1 expression in the vulva precursor cell P6.p and causes gonad-independent vulva differentiation. In contrast, a Californian strain that gained a HAIRY-binding site lacks Ppa-apx-1 vulval expression and shows gonad-dependence of vulva development. Addition of this HAIRY-binding site to the Bolivian Ppa-apx-1 promoter eliminates expression in the vulva. Our findings indicate significant cis-regulatory evolution in a core developmental pathway leading to intraspecific cryptic variation.

  11. Conservation and evolution of cis-regulatory systems in ascomycete fungi

    SciTech Connect

    Gasch, Audrey P.; Moses, Alan M.; Chiang, Derek Y.; Fraser, Hunter B.; Berardini, Mark; Eisen, Michael B.

    2004-03-15

    Relatively little is known about the mechanisms through which gene expression regulation evolves. To investigate this, we systematically explored the conservation of regulatory networks in fungi by examining the cis-regulatory elements that govern the expression of coregulated genes. We first identified groups of coregulated Saccharomyces cerevisiae genes enriched for genes with known upstream or downstream cis-regulatory sequences. Reasoning that many of these gene groups are coregulated in related species as well, we performed similar analyses on orthologs of coregulated S. cerevisiae genes in 13 other ascomycete species. We find that many species-specific gene groups are enriched for the same flanking regulatory sequences as those found in the orthologous gene groups from S. cerevisiae, indicating that those regulatory systems have been conserved in multiple ascomycete species. In addition to these clear cases of regulatory conservation, we find examples of cis-element evolution that suggest multiple modes of regulatory diversification, including alterations in transcription factor-binding specificity, incorporation of new gene targets into an existing regulatory system, and cooption of regulatory systems to control a different set of genes. We investigated one example in greater detail by measuring the in vitro activity of the S. cerevisiae transcription factor Rpn4p and its orthologs from Candida albicans and Neurospora crassa. Our results suggest that the DNA binding specificity of these proteins has coevolved with the sequences found upstream of the Rpn4p target genes and suggest that Rpn4p has a different function in N. crassa.

  12. Functionally conserved cis-regulatory elements of COL18A1 identified through zebrafish transgenesis.

    PubMed

    Kague, Erika; Bessling, Seneca L; Lee, Josephine; Hu, Gui; Passos-Bueno, Maria Rita; Fisher, Shannon

    2010-01-15

    Type XVIII collagen is a component of basement membranes, and expressed prominently in the eye, blood vessels, liver, and the central nervous system. Homozygous mutations in COL18A1 lead to Knobloch Syndrome, characterized by ocular defects and occipital encephalocele. However, relatively little has been described on the role of type XVIII collagen in development, and nothing is known about the regulation of its tissue-specific expression pattern. We have used zebrafish transgenesis to identify and characterize cis-regulatory sequences controlling expression of the human gene. Candidate enhancers were selected from non-coding sequence associated with COL18A1 based on sequence conservation among mammals. Although these displayed no overt conservation with orthologous zebrafish sequences, four regions nonetheless acted as tissue-specific transcriptional enhancers in the zebrafish embryo, and together recapitulated the major aspects of col18a1 expression. Additional post-hoc computational analysis on positive enhancer sequences revealed alignments between mammalian and teleost sequences, which we hypothesize predict the corresponding zebrafish enhancers; for one of these, we demonstrate functional overlap with the orthologous human enhancer sequence. Our results provide important insight into the biological function and regulation of COL18A1, and point to additional sequences that may contribute to complex diseases involving COL18A1. More generally, we show that combining functional data with targeted analyses for phylogenetic conservation can reveal conserved cis-regulatory elements in the large number of cases where computational alignment alone falls short.

  13. Search for enhancers: teleost models in comparative genomic and transgenic analysis of cis regulatory elements.

    PubMed

    Müller, Ferenc; Blader, Patrick; Strähle, Uwe

    2002-06-01

    Homology searches between DNA sequences of evolutionarily distant species (phylogenetic footprinting) offer a fast detection method for regulatory sequences. Because of the small size of their genomes, tetraodontid species such as the Japanese pufferfish and green spotted pufferfish have become attractive models for comparative genomics. A disadvantage of the tetraodontid species is, however, that they cannot be bred and manipulated routinely under laboratory conditions, so these species are less attractive for developmental and genetic analysis. In contrast, an increasing arsenal of transgene techniques with the developmental model species zebrafish and medaka are being used for functional analysis of cis regulatory sequences. The main disadvantage is the much larger genome. While comparison between many loci proved the suitability of phylogenetic footprinting using fish and mammalian sequences, the fast rate of change in enhancer structure and gene duplication within teleosts may obscure detection of homologies. Here we discuss the contribution and potentials provided by different teleost models for the detection and functional analysis of conserved cis-regulatory elements. PMID:12111739

  14. Profiling of conserved non-coding elements upstream of SHOX and functional characterisation of the SHOX cis-regulatory landscape

    PubMed Central

    Verdin, Hannah; Fernández-Miñán, Ana; Benito-Sanz, Sara; Janssens, Sandra; Callewaert, Bert; Waele, Kathleen De; Schepper, Jean De; François, Inge; Menten, Björn; Heath, Karen E.; Gómez-Skarmeta, José Luis; Baere, Elfride De

    2015-01-01

    Genetic defects such as copy number variations (CNVs) in non-coding regions containing conserved non-coding elements (CNEs) outside the transcription unit of their target gene can underlie genetic disease. An example of this is the short stature homeobox (SHOX) gene, regulated by seven CNEs located downstream and upstream of SHOX, with proven enhancer capacity in chicken limbs. CNVs of the downstream CNEs have been reported in many idiopathic short stature (ISS) cases; however, only recently have a few CNVs of the upstream enhancers been identified. Here, we set out to provide insight into: (i) the cis-regulatory role of these upstream CNEs in human cells, (ii) the prevalence of upstream CNVs in ISS, and (iii) the chromatin architecture of the SHOX cis-regulatory landscape in chicken and human cells. Firstly, luciferase assays in human U2OS cells, and 4C-seq both in chicken limb buds and human U2OS cells, demonstrated cis-regulatory enhancer capacities of the upstream CNEs. Secondly, CNVs of these upstream CNEs were found in three of 501 ISS patients. Finally, our 4C-seq interaction map of the SHOX region reveals a cis-regulatory domain spanning more than 1 Mb and harbouring putative new cis-regulatory elements. PMID:26631348

  15. An evolutionary constraint: strongly disfavored class of change in DNA sequence during divergence of cis-regulatory modules.

    PubMed

    Cameron, R Andrew; Chow, Suk Hen; Berney, Kevin; Chiu, Tsz-Yeung; Yuan, Qiu-Autumn; Krämer, Alexander; Helguero, Argelia; Ransick, Andrew; Yun, Mirong; Davidson, Eric H

    2005-08-16

    The DNA of functional cis-regulatory modules displays extensive sequence conservation in comparisons of genomes from modestly distant species. Patches of sequence that are several hundred base pairs in length within these modules are often seen to be 80-95% identical, although the flanking sequence cannot even be aligned. However, it is unlikely that base pairs located between the transcription factor target sites of cis-regulatory modules have sequence-dependent function, and the mechanism that constrains evolutionary change within cis-regulatory modules is incompletely understood. We chose five functionally characterized cis-regulatory modules from the Strongylocentrotus purpuratus (sea urchin) genome and obtained orthologous regulatory and flanking sequences from a bacterial artificial chromosome genome library of a congener, Strongylocentrotus franciscanus. As expected, single-nucleotide substitutions and small indels occur freely at many positions within the regulatory modules of these two species, as they do outside the regulatory modules. However, large indels (>20 bp) are statistically almost absent within the regulatory modules, although they are common in flanking intergenic or intronic sequence. The result helps to explain the patterns of evolutionary sequence divergence characteristic of cis-regulatory DNA.
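
    The distinction between small and large indels can be made concrete with a short sketch that counts gap runs in a pairwise alignment. This is an illustrative assumption about how such a tally might be done, not the procedure used in the study: the gap character, the function names and the toy alignment are hypothetical, and only the 20 bp cutoff is taken from the abstract.

```python
import re

def indel_lengths(aligned_a, aligned_b):
    """Lengths of every gap run ('-') in either row of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    lengths = []
    for row in (aligned_a, aligned_b):
        lengths.extend(len(m.group()) for m in re.finditer(r"-+", row))
    return lengths

def count_large_indels(aligned_a, aligned_b, min_len=21):
    """Count indels longer than 20 bp, i.e. gap runs of at least min_len columns."""
    return sum(1 for n in indel_lengths(aligned_a, aligned_b) if n >= min_len)

# Toy alignment, invented for illustration: one 25 bp indel and one 2 bp indel.
row_a = "ACGT" + "-" * 25 + "ACGT" + "AC"
row_b = "ACGT" + "A" * 25 + "AC--" + "AC"
large = count_large_indels(row_a, row_b)
```

    Running the same count separately on module and flanking alignments would give the two tallies that the abstract contrasts.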

  16. Establishment of a Developmental Compartment Requires Interactions between Three Synergistic Cis-regulatory Modules

    PubMed Central

    Bieli, Dimitri; Kanca, Oguz; Requena, David; Hamaratoglu, Fisun; Gohl, Daryl; Schedl, Paul; Affolter, Markus; Slattery, Matthew; Müller, Martin; Estella, Carlos

    2015-01-01

    The subdivision of cell populations into compartments is a key event during animal development. In Drosophila, the gene apterous (ap) divides the wing imaginal disc into dorsal and ventral cell lineages and is required for wing formation. ap function as a dorsal selector gene has been extensively studied. However, the regulation of its expression during wing development is poorly understood. In this study, we analyzed ap transcriptional regulation at the endogenous locus and identified three cis-regulatory modules (CRMs) essential for wing development. Only when the three CRMs are combined is robust ap expression obtained. In addition, we genetically and molecularly analyzed the trans-factors that regulate these CRMs. Our results propose a three-step mechanism for the cell lineage compartment expression of ap that includes initial activation, positive autoregulation and Trithorax-mediated maintenance through separable CRMs. PMID:26468882

  17. Dissecting the Genetic Basis of a Complex cis-Regulatory Adaptation

    PubMed Central

    Artieri, Carlo G.; Zhang, Mian; Zhou, Yiqi; Palmer, Michael E.; Fraser, Hunter B.

    2015-01-01

    Although single genes underlying several evolutionary adaptations have been identified, the genetic basis of complex, polygenic adaptations has been far more challenging to pinpoint. Here we report that the budding yeast Saccharomyces paradoxus has recently evolved resistance to citrinin, a naturally occurring mycotoxin. Applying a genome-wide test for selection on cis-regulation, we identified five genes involved in the citrinin response that are constitutively up-regulated in S. paradoxus. Four of these genes are necessary for resistance, and are also sufficient to increase the resistance of a sensitive strain when over-expressed. Moreover, cis-regulatory divergence in the promoters of these genes contributes to resistance, while exacting a cost in the absence of citrinin. Our results demonstrate how the subtle effects of individual regulatory elements can be combined, via natural selection, into a complex adaptation. Our approach can be applied to dissect the genetic basis of polygenic adaptations in a wide range of species. PMID:26713447

  18. Lessons from Domestication: Targeting Cis-Regulatory Elements for Crop Improvement.

    PubMed

    Swinnen, Gwen; Goossens, Alain; Pauwels, Laurens

    2016-06-01

    Domestication of wild plant species has provided us with crops that serve our human nutritional needs. Advanced DNA sequencing has propelled the unveiling of underlying genetic changes associated with domestication. Interestingly, many changes reside in cis-regulatory elements (CREs) that control the expression of an unmodified coding sequence. Sequence variation in CREs can impact gene expression levels, but also developmental timing and tissue specificity of expression. When genes are involved in multiple pathways or active in several organs and developmental stages, CRE modifications are favored over mutations in coding regions, due to the lack of detrimental pleiotropic effects. Therefore, learning from domestication, we propose that CREs are interesting targets for genome editing to create new alleles for plant breeding.

  19. BET bromodomain inhibition releases the Mediator complex from select cis-regulatory elements

    PubMed Central

    Bhagwat, Anand S.; Roe, Jae-Seok; Mok, Beverly A.; Hohmann, Anja F.; Shi, Junwei; Vakoc, Christopher R.

    2016-01-01

    The bromodomain and extraterminal (BET) protein BRD4 can physically interact with the Mediator complex, but the relevance of this association to the therapeutic effects of BET inhibitors in cancer is unclear. Here, we show that BET inhibition causes a rapid release of Mediator from a subset of cis-regulatory elements in the genome of acute myeloid leukemia (AML) cells. These sites of Mediator eviction were highly correlated with transcriptional suppression of neighboring genes, which are enriched for targets of the transcription factor MYB and for functions related to leukemogenesis. An shRNA screen of Mediator in AML cells identified the MED12, MED13, MED23, and MED24 subunits as performing a similar regulatory function to BRD4 in this context, including a shared role in sustaining a block in myeloid maturation. These findings suggest that the interaction between BRD4 and Mediator has functional importance for gene-specific transcriptional activation and for AML maintenance. PMID:27068464

  20. Quantitative Analysis of Cis-Regulatory Element Activity Using Synthetic Promoters in Transgenic Plants.

    PubMed

    Benn, Geoffrey; Dehesh, Katayoon

    2016-01-01

    Synthetic promoters, introduced stably or transiently into plants, are an invaluable tool for the identification of functional regulatory elements and the corresponding transcription factor(s) that regulate the amplitude, spatial distribution, and temporal patterns of gene expression. Here, we present a protocol describing the steps required to identify and characterize putative cis-regulatory elements. These steps include application of computational tools to identify putative elements, construction of a synthetic promoter upstream of luciferase, identification of transcription factors that regulate the element, testing the functionality of the element introduced transiently and/or stably into the species of interest followed by high-throughput luciferase screening assays, and subsequent data processing and statistical analysis. PMID:27557758

  1. [Identification and mapping of cis-regulatory elements within long genomic sequences].

    PubMed

    Akopov, S B; Chernov, I P; Vetchinova, A S; Bulanenkova, S S; Nikolaev, L G

    2007-01-01

    The publication of the human and other metazoan genome sequences opened up the possibility for mapping and analysis of genomic regulatory elements. Unfortunately, experimental data on genomic positions of such sequences as enhancers, silencers, insulators, transcription terminators, and replication origins are very limited, especially at the whole genome level. As most genomic regulatory elements (e.g., enhancers) are generally gene-, tissue-, or cell-specific, the prediction of these elements in silico is often ambiguous. Therefore, the development of high-throughput experimental approaches for identification and mapping of genomic functional elements is highly desirable. In this review we discuss novel approaches to high-throughput experimental identification of cis-regulatory elements in mammalian genomes, which is a necessary step toward complete genome annotation. PMID:18240562

  2. Using RSAT to scan genome sequences for transcription factor binding sites and cis-regulatory modules.

    PubMed

    Turatsinze, Jean-Valery; Thomas-Chollier, Morgane; Defrance, Matthieu; van Helden, Jacques

    2008-01-01

    This protocol shows how to detect putative cis-regulatory elements and regions enriched in such elements with the regulatory sequence analysis tools (RSAT) web server (http://rsat.ulb.ac.be/rsat/). The approach applies to known transcription factors, whose binding specificity is represented by position-specific scoring matrices, using the program matrix-scan. The detection of individual binding sites is known to return many false predictions. However, results can be strongly improved by estimating P values, and by searching for combinations of sites (homotypic and heterotypic models). We illustrate the detection of sites and enriched regions with a case study, the upstream sequence of the Drosophila melanogaster gene even-skipped. This protocol is also tested on random control sequences to evaluate the reliability of the predictions. Each task requires a few minutes of computation time on the server. The complete protocol can be executed in about one hour.
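
    The basic idea of scanning a sequence with a position-specific scoring matrix can be sketched in a few lines. This is a generic illustration, not the RSAT matrix-scan program or its interface: the function names, pseudocount, uniform background, threshold and toy motif are all assumptions, and the P-value estimation mentioned in the abstract is not shown.

```python
import math

def build_pssm(counts, background=0.25, pseudocount=1.0):
    """Turn per-position base counts into log2-odds scores against a uniform background."""
    pssm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pssm.append({
            base: math.log2((column.get(base, 0) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pssm

def scan(sequence, pssm, threshold):
    """Return (start, window, score) for every window scoring at least threshold.

    Assumes the sequence contains only the uppercase letters A, C, G and T.
    """
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(pssm, window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits

# Toy motif, invented for illustration: eight aligned sites that all read TGA.
counts = [{"T": 8}, {"G": 8}, {"A": 8}]
pssm = build_pssm(counts)
hits = scan("ACTGACCTGA", pssm, threshold=4.0)
```

    A raw score threshold like this one is what tends to produce many false predictions on long sequences, which is why the protocol stresses P values and combinations of sites.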

  3. Alignment and prediction of cis-regulatory modules based on a probabilistic model of evolution.

    PubMed

    He, Xin; Ling, Xu; Sinha, Saurabh

    2009-03-01

    Cross-species comparison has emerged as a powerful paradigm for predicting cis-regulatory modules (CRMs) and understanding their evolution. The comparison requires reliable sequence alignment, which remains a challenging task for less conserved noncoding sequences. Furthermore, the existing models of DNA sequence evolution generally do not explicitly treat the special properties of CRM sequences. To address these limitations, we propose a model of CRM evolution that captures different modes of evolution of functional transcription factor binding sites (TFBSs) and the background sequences. A particularly novel aspect of our work is a probabilistic model of gains and losses of TFBSs, a process being recognized as an important part of regulatory sequence evolution. We present a computational framework that uses this model to solve the problems of CRM alignment and prediction. Our alignment method is similar to existing methods of statistical alignment but uses the conserved binding sites to improve alignment. Our CRM prediction method deals with the inherent uncertainties of binding site annotations and sequence alignment in a probabilistic framework. In simulated as well as real data, we demonstrate that our program is able to improve both alignment and prediction of CRM sequences over several state-of-the-art methods. Finally, we used alignments produced by our program to study binding site conservation in genome-wide binding data of key transcription factors in the Drosophila blastoderm, with two intriguing results: (i) the factor-bound sequences are under strong evolutionary constraints even if their neighboring genes are not expressed in the blastoderm and (ii) binding sites in distal bound sequences (relative to transcription start sites) tend to be more conserved than those in proximal regions. Our approach is implemented as software, EMMA (Evolutionary Model-based cis-regulatory Module Analysis), ready to be applied in a broad biological context.

  4. Brachyury, Foxa2 and the cis-Regulatory Origins of the Notochord.

    PubMed

    José-Edwards, Diana S; Oda-Ishii, Izumi; Kugler, Jamie E; Passamaneck, Yale J; Katikala, Lavanya; Nibu, Yutaka; Di Gregorio, Anna

    2015-12-01

    A main challenge of modern biology is to understand how specific constellations of genes are activated to differentiate cells and give rise to distinct tissues. This study focuses on elucidating how gene expression is initiated in the notochord, an axial structure that provides support and patterning signals to embryos of humans and all other chordates. Although numerous notochord genes have been identified, the regulatory DNAs that orchestrate development and propel evolution of this structure by eliciting notochord gene expression remain mostly uncharted, and the information on their configuration and recurrence is still quite fragmentary. Here we used the simple chordate Ciona for a systematic analysis of notochord cis-regulatory modules (CRMs), and investigated their composition, architectural constraints, predictive ability and evolutionary conservation. We found that most Ciona notochord CRMs relied upon variable combinations of binding sites for the transcription factors Brachyury and/or Foxa2, which can act either synergistically or independently from one another. Notably, one of these CRMs contains a Brachyury binding site juxtaposed to an (AC) microsatellite, an unusual arrangement also found in Brachyury-bound regulatory regions in mouse. In contrast, different subsets of CRMs relied upon binding sites for transcription factors of widely diverse families. Surprisingly, we found that neither intra-genomic nor interspecific conservation of binding sites were reliably predictive hallmarks of notochord CRMs. We propose that rather than obeying a rigid sequence-based cis-regulatory code, most notochord CRMs are rather unique. Yet, this study uncovered essential elements recurrently used by divergent chordates as basic building blocks for notochord CRMs. PMID:26684323

  5. Brachyury, Foxa2 and the cis-Regulatory Origins of the Notochord

    PubMed Central

    José-Edwards, Diana S.; Oda-Ishii, Izumi; Kugler, Jamie E.; Passamaneck, Yale J.; Katikala, Lavanya; Nibu, Yutaka; Di Gregorio, Anna

    2015-01-01

    A main challenge of modern biology is to understand how specific constellations of genes are activated to differentiate cells and give rise to distinct tissues. This study focuses on elucidating how gene expression is initiated in the notochord, an axial structure that provides support and patterning signals to embryos of humans and all other chordates. Although numerous notochord genes have been identified, the regulatory DNAs that orchestrate development and propel evolution of this structure by eliciting notochord gene expression remain mostly uncharted, and the information on their configuration and recurrence is still quite fragmentary. Here we used the simple chordate Ciona for a systematic analysis of notochord cis-regulatory modules (CRMs), and investigated their composition, architectural constraints, predictive ability and evolutionary conservation. We found that most Ciona notochord CRMs relied upon variable combinations of binding sites for the transcription factors Brachyury and/or Foxa2, which can act either synergistically or independently from one another. Notably, one of these CRMs contains a Brachyury binding site juxtaposed to an (AC) microsatellite, an unusual arrangement also found in Brachyury-bound regulatory regions in mouse. In contrast, different subsets of CRMs relied upon binding sites for transcription factors of widely diverse families. Surprisingly, we found that neither intra-genomic nor interspecific conservation of binding sites were reliably predictive hallmarks of notochord CRMs. We propose that rather than obeying a rigid sequence-based cis-regulatory code, most notochord CRMs are rather unique. Yet, this study uncovered essential elements recurrently used by divergent chordates as basic building blocks for notochord CRMs. PMID:26684323

  6. Massively parallel cis-regulatory analysis in the mammalian central nervous system

    PubMed Central

    Shen, Susan Q.; Myers, Connie A.; Hughes, Andrew E.O.; Byrne, Leah C.; Flannery, John G.; Corbo, Joseph C.

    2016-01-01

    Cis-regulatory elements (CREs, e.g., promoters and enhancers) regulate gene expression, and variants within CREs can modulate disease risk. Next-generation sequencing has enabled the rapid generation of genomic data that predict the locations of CREs, but a bottleneck lies in functionally interpreting these data. To address this issue, massively parallel reporter assays (MPRAs) have emerged, in which barcoded reporter libraries are introduced into cells, and the resulting barcoded transcripts are quantified by next-generation sequencing. Thus far, MPRAs have been largely restricted to assaying short CREs in a limited repertoire of cultured cell types. Here, we present two advances that extend the biological relevance and applicability of MPRAs. First, we adapt exome capture technology to instead capture candidate CREs, thereby tiling across the targeted regions and markedly increasing the length of CREs that can be readily assayed. Second, we package the library into adeno-associated virus (AAV), thereby allowing delivery to target organs in vivo. As a proof of concept, we introduce a capture library of about 46,000 constructs, corresponding to roughly 3500 DNase I hypersensitive (DHS) sites, into the mouse retina by ex vivo plasmid electroporation and into the mouse cerebral cortex by in vivo AAV injection. We demonstrate tissue-specific cis-regulatory activity of DHSs and provide examples of high-resolution truncation mutation analysis for multiplex parsing of CREs. Our approach should enable massively parallel functional analysis of a wide range of CREs in any organ or species that can be infected by AAV, such as nonhuman primates and human stem cell–derived organoids. PMID:26576614
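
    To make the barcode quantification step described above concrete, the following minimal sketch scores each construct as the log2 ratio of its RNA barcode read fraction to its DNA barcode read fraction. This normalization is a common convention for MPRA data and is an assumption here, not necessarily the analysis used by the authors; the function name `mpra_activity`, the pseudocount, and the input dictionaries are all hypothetical.

```python
import math
from collections import defaultdict

def mpra_activity(barcode_to_construct, rna_counts, dna_counts, pseudocount=1.0):
    """Return {construct: log2(RNA fraction / DNA fraction)}.

    Hypothetical helper: reads are summed over all barcodes assigned to a
    construct, and a pseudocount is added to avoid division by zero.
    """
    rna = defaultdict(float)
    dna = defaultdict(float)
    for barcode, construct in barcode_to_construct.items():
        rna[construct] += rna_counts.get(barcode, 0)
        dna[construct] += dna_counts.get(barcode, 0)

    # Library-size totals, including the pseudocount added to every construct.
    rna_total = sum(rna.values()) + pseudocount * len(rna)
    dna_total = sum(dna.values()) + pseudocount * len(dna)

    scores = {}
    for construct in dna:
        rna_frac = (rna[construct] + pseudocount) / rna_total
        dna_frac = (dna[construct] + pseudocount) / dna_total
        scores[construct] = math.log2(rna_frac / dna_frac)
    return scores
```

    A positive score means a construct is over-represented among transcripts relative to its abundance in the delivered library; a negative score means it is under-represented.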

  7. Massively parallel cis-regulatory analysis in the mammalian central nervous system.

    PubMed

    Shen, Susan Q; Myers, Connie A; Hughes, Andrew E O; Byrne, Leah C; Flannery, John G; Corbo, Joseph C

    2016-02-01

    Cis-regulatory elements (CREs, e.g., promoters and enhancers) regulate gene expression, and variants within CREs can modulate disease risk. Next-generation sequencing has enabled the rapid generation of genomic data that predict the locations of CREs, but a bottleneck lies in functionally interpreting these data. To address this issue, massively parallel reporter assays (MPRAs) have emerged, in which barcoded reporter libraries are introduced into cells, and the resulting barcoded transcripts are quantified by next-generation sequencing. Thus far, MPRAs have been largely restricted to assaying short CREs in a limited repertoire of cultured cell types. Here, we present two advances that extend the biological relevance and applicability of MPRAs. First, we adapt exome capture technology to instead capture candidate CREs, thereby tiling across the targeted regions and markedly increasing the length of CREs that can be readily assayed. Second, we package the library into adeno-associated virus (AAV), thereby allowing delivery to target organs in vivo. As a proof of concept, we introduce a capture library of about 46,000 constructs, corresponding to roughly 3500 DNase I hypersensitive (DHS) sites, into the mouse retina by ex vivo plasmid electroporation and into the mouse cerebral cortex by in vivo AAV injection. We demonstrate tissue-specific cis-regulatory activity of DHSs and provide examples of high-resolution truncation mutation analysis for multiplex parsing of CREs. Our approach should enable massively parallel functional analysis of a wide range of CREs in any organ or species that can be infected by AAV, such as nonhuman primates and human stem cell-derived organoids.

  8. In silico evolution of the hunchback gene indicates redundancy in cis-regulatory organization and spatial gene expression.

    PubMed

    Zagrijchuk, Elizaveta A; Sabirov, Marat A; Holloway, David M; Spirov, Alexander V

    2014-04-01

    Biological development depends on the coordinated expression of genes in time and space. Developmental genes have extensive cis-regulatory regions which control their expression. These regions are organized in a modular manner, with different modules controlling expression at different times and locations. Both how modularity evolved and what function it serves are open questions. We present a computational model for the cis-regulation of the hunchback (hb) gene in the fruit fly (Drosophila). We simulate evolution (using an evolutionary computation approach from computer science) to find the optimal cis-regulatory arrangements for fitting experimental hb expression patterns. We find that the cis-regulatory region tends to readily evolve modularity. These cis-regulatory modules (CRMs) do not tend to control single spatial domains, but show a multi-CRM/multi-domain correspondence. We find that the CRM-domain correspondence seen in Drosophila evolves with a high probability in our model, supporting the biological relevance of the approach. The partial redundancy resulting from multi-CRM control may confer some biological robustness against corruption of regulatory sequences. The technique developed on hb could readily be applied to other multi-CRM developmental genes.

  9. In silico evolution of the hunchback gene indicates redundancy in cis-regulatory organization and spatial gene expression

    PubMed Central

    Zagrijchuk, Elizaveta A.; Sabirov, Marat A.; Holloway, David M.; Spirov, Alexander V.

    2014-01-01

    Biological development depends on the coordinated expression of genes in time and space. Developmental genes have extensive cis-regulatory regions which control their expression. These regions are organized in a modular manner, with different modules controlling expression at different times and locations. Both how modularity evolved and what function it serves are open questions. We present a computational model for the cis-regulation of the hunchback (hb) gene in the fruit fly (Drosophila). We simulate evolution (using an evolutionary computation approach from computer science) to find the optimal cis-regulatory arrangements for fitting experimental hb expression patterns. We find that the cis-regulatory region tends to readily evolve modularity. These cis-regulatory modules (CRMs) do not tend to control single spatial domains, but show a multi-CRM/multi-domain correspondence. We find that the CRM-domain correspondence seen in Drosophila evolves with a high probability in our model, supporting the biological relevance of the approach. The partial redundancy resulting from multi-CRM control may confer some biological robustness against corruption of regulatory sequences. The technique developed on hb could readily be applied to other multi-CRM developmental genes. PMID:24712536

  10. Regulation of the Nanog Gene by Both Positive and Negative cis-Regulatory Elements in Embryonal Carcinoma Cells and Embryonic Stem Cells

    PubMed Central

    Boer, Brian; Cox, Jesse L.; Claassen, David; Mallanna, Sunil Kumar; Desler, Michelle; Rizzino, Angie

    2008-01-01

    The transcription factor Nanog is essential for mammalian embryogenesis, as well as the pluripotency of embryonic stem (ES) cells. Work with ES cells and embryonal carcinoma (EC) cells previously identified positive and negative cis-regulatory elements that influence the activity of the Nanog promoter, including adjacent cis-regulatory elements that bind Sox2 and Oct-3/4. Given the importance of Nanog during mammalian development, we examined the cis-regulatory elements required for Nanog promoter activity more closely. In this study, we demonstrate that two positive cis-regulatory elements previously shown to be active in F9 EC cells are also active in ES cells. We also identify a novel negative regulatory region that is located in close proximity to two other positive Nanog cis-regulatory elements. Although this negative regulatory region is active in F9 EC cells and ES cells, it is inactive in P19 EC cells. Furthermore, we demonstrate that one of the positive cis-regulatory elements active in F9 EC cells and ES cells is inactive in P19 EC cells. Together, these and other studies suggest that Nanog transcription is regulated by the interplay of positive and negative cis-regulatory elements. Given that P19 appears to be more closely related to a later developmental stage of mammalian development than F9 and ES cells, differential utilization of cis-regulatory elements may reflect mechanisms used during development to achieve the correct level of Nanog expression as embryogenesis unfolds. PMID:18537119

  11. Quantitative comparison of cis-regulatory element (CRE) activities in transgenic Drosophila melanogaster.

    PubMed

    Rogers, William A; Williams, Thomas M

    2011-01-01

    Gene expression patterns are specified by cis-regulatory element (CRE) sequences, which are also called enhancers or cis-regulatory modules. A typical CRE possesses an arrangement of binding sites for several transcription factor proteins that confer a regulatory logic specifying when, where, and at what level the regulated gene(s) is expressed. The full set of CREs within an animal genome encodes the organism's program for development, and empirical as well as theoretical studies indicate that mutations in CREs played a prominent role in morphological evolution. Moreover, human genome-wide association studies indicate that genetic variation in CREs contributes substantially to phenotypic variation. Thus, understanding regulatory logic and how mutations affect such logic is a central goal of genetics. Reporter transgenes provide a powerful method to study the in vivo function of CREs. Here a known or suspected CRE sequence is coupled to heterologous promoter and coding sequences for a reporter gene encoding an easily observable protein product. When a reporter transgene is inserted into a host organism, the CRE's activity becomes visible in the form of the encoded reporter protein. P-element-mediated transgenesis in the fruit fly species Drosophila (D.) melanogaster has been used for decades to introduce reporter transgenes into this model organism, though the genomic placement of transgenes is random. Hence, reporter gene activity is strongly influenced by the local chromatin and gene environment, limiting CRE comparisons to qualitative ones. In recent years, the phiC31-based integration system was adapted for use in D. melanogaster to insert transgenes into specific genome landing sites. This capability has made the quantitative measurement of gene and, relevant here, CRE activity feasible. The production of transgenic fruit flies can be outsourced, including phiC31-based integration, eliminating the need to purchase expensive equipment and/or have proficiency at

  12. Intronic cis-regulatory modules mediate tissue-specific and microbial control of angptl4/fiaf transcription.

    PubMed

    Camp, J Gray; Jazwa, Amelia L; Trent, Chad M; Rawls, John F

    2012-01-01

    The intestinal microbiota enhances dietary energy harvest leading to increased fat storage in adipose tissues. This effect is caused in part by the microbial suppression of intestinal epithelial expression of a circulating inhibitor of lipoprotein lipase called Angiopoietin-like 4 (Angptl4/Fiaf). To define the cis-regulatory mechanisms underlying intestine-specific and microbial control of Angptl4 transcription, we utilized the zebrafish system in which host regulatory DNA can be rapidly analyzed in a live, transparent, and gnotobiotic vertebrate. We found that zebrafish angptl4 is transcribed in multiple tissues including the liver, pancreatic islet, and intestinal epithelium, which is similar to its mammalian homologs. Zebrafish angptl4 is also specifically suppressed in the intestinal epithelium upon colonization with a microbiota. In vivo transgenic reporter assays identified discrete tissue-specific regulatory modules within angptl4 intron 3 sufficient to drive expression in the liver, pancreatic islet β-cells, or intestinal enterocytes. Comparative sequence analyses and heterologous functional assays of angptl4 intron 3 sequences from 12 teleost fish species revealed differential evolution of the islet and intestinal regulatory modules. High-resolution functional mapping and site-directed mutagenesis defined the minimal set of regulatory sequences required for intestinal activity. Strikingly, the microbiota suppressed the transcriptional activity of the intestine-specific regulatory module similar to the endogenous angptl4 gene. These results suggest that the microbiota might regulate host intestinal Angptl4 protein expression and peripheral fat storage by suppressing the activity of an intestine-specific transcriptional enhancer. This study provides a useful paradigm for understanding how microbial signals interact with tissue-specific regulatory networks to control the activity and evolution of host gene transcription. PMID:22479192

  13. Identification of a novel cis-regulatory element essential for immune tolerance.

    PubMed

    LaFlam, Taylor N; Seumois, Grégory; Miller, Corey N; Lwin, Wint; Fasano, Kayla J; Waterfield, Michael; Proekt, Irina; Vijayanand, Pandurangan; Anderson, Mark S

    2015-11-16

    Thymic central tolerance is essential to preventing autoimmunity. In medullary thymic epithelial cells (mTECs), the Autoimmune regulator (Aire) gene plays an essential role in this process by driving the expression of a diverse set of tissue-specific antigens (TSAs), which are presented and help tolerize self-reactive thymocytes. Interestingly, Aire has a highly tissue-restricted pattern of expression, with only mTECs and peripheral extrathymic Aire-expressing cells (eTACs) known to express detectable levels in adults. Despite this high level of tissue specificity, the cis-regulatory elements that control Aire expression have remained obscure. Here, we identify a highly conserved noncoding DNA element that is essential for Aire expression. This element shows enrichment of enhancer-associated histone marks in mTECs and also has characteristics of being an NF-κB-responsive element. Finally, we find that this element is essential for Aire expression in vivo and necessary to prevent spontaneous autoimmunity, reflecting the importance of this regulatory DNA element in promoting immune tolerance. PMID:26527800

  14. Distal cis-regulatory elements are required for tissue-specific expression of enamelin (Enam)

    PubMed Central

    Hu, Yuanyuan; Papagerakis, Petros; Ye, Ling; Feng, Jerry Q.; Simmer, James P.; Hu, Jan C-C.

    2009-01-01

    Enamel formation is orchestrated by the sequential expression of genes encoding enamel matrix proteins; however, the mechanisms sustaining the spatio–temporal order of gene transcription during amelogenesis are poorly understood. The aim of this study was to characterize the cis-regulatory sequences necessary for normal expression of enamelin (Enam). Several enamelin transcription regulatory regions, showing high sequence homology among species, were identified. DNA constructs containing 5.2 or 3.9 kb regions upstream of the enamelin translation initiation site were linked to a LacZ reporter and used to generate transgenic mice. Only the 5.2-Enam–LacZ construct was sufficient to recapitulate the endogenous pattern of enamelin tooth-specific expression. The 3.9-Enam–LacZ transgenic lines showed no expression in dental cells, but ectopic β-galactosidase activity was detected in osteoblasts. Potential transcription factor-binding sites were identified that may be important in controlling enamelin basal promoter activity and in conferring enamelin tissue-specific expression. Our study provides new insights into regulatory mechanisms governing enamelin expression. PMID:18353004

  15. Systematic prediction of cis-regulatory elements in the Chlamydomonas reinhardtii genome using comparative genomics.

    PubMed

    Ding, Jun; Li, Xiaoman; Hu, Haiyan

    2012-10-01

    Chlamydomonas reinhardtii is one of the most important microalgal model organisms and has been widely studied to understand chloroplast functions and various cellular processes. Further exploitation of C. reinhardtii as a model system to elucidate various molecular mechanisms and pathways requires systematic study of gene regulation. However, there is a general lack of genome-scale gene regulation studies, such as global cis-regulatory element (CRE) identification, in C. reinhardtii. Recently, large-scale genomic data in microalgae species have become available, which enable the development of efficient computational methods to systematically identify CREs and characterize their roles in microalgae gene regulation. Here, we performed in silico CRE identification at the whole genome level in C. reinhardtii using a comparative genomics-based method. We predicted a large number of CREs in C. reinhardtii that are consistent with experimentally verified CREs. We also discovered that a large percentage of these CREs form combinations and have the potential to work together for coordinated gene regulation in C. reinhardtii. Multiple lines of evidence from literature, gene transcriptional profiles, and gene annotation resources support our prediction. The predicted CREs will serve, to our knowledge, as the first large-scale collection of CREs in C. reinhardtii to facilitate further experimental study of microalgae gene regulation. The accompanying software tool and the predictions in C. reinhardtii are also made available through a Web-accessible database (http://hulab.ucf.edu/research/projects/Microalgae/sdcre/motifcomb.html).

  16. Genome-wide Computational Analysis Reveals Cardiomyocyte-specific Transcriptional Cis-regulatory Motifs That Enable Efficient Cardiac Gene Therapy

    PubMed Central

    Rincon, Melvin Y; Sarcar, Shilpita; Danso-Abeam, Dina; Keyaerts, Marleen; Matrai, Janka; Samara-Kuko, Ermira; Acosta-Sanchez, Abel; Athanasopoulos, Takis; Dickson, George; Lahoutte, Tony; De Bleser, Pieter; VandenDriessche, Thierry; Chuah, Marinee K

    2015-01-01

    Gene therapy is a promising emerging therapeutic modality for the treatment of cardiovascular diseases and hereditary diseases that afflict the heart. Hence, there is a need to develop robust cardiac-specific expression modules that allow for stable expression of the gene of interest in cardiomyocytes. We therefore explored a new approach based on a genome-wide bioinformatics strategy that revealed novel cardiac-specific cis-acting regulatory modules (CS-CRMs). These transcriptional modules contained evolutionary-conserved clusters of putative transcription factor binding sites that correspond to a “molecular signature” associated with robust gene expression in the heart. We then validated these CS-CRMs in vivo using an adeno-associated viral vector serotype 9 that drives a reporter gene from a quintessential cardiac-specific α-myosin heavy chain promoter. Most de novo designed CS-CRMs resulted in a >10-fold increase in cardiac gene expression. The most robust CRMs enhanced cardiac-specific transcription 70- to 100-fold. Expression was sustained and restricted to cardiomyocytes. We then combined the most potent CS-CRM4 with a synthetic heart and muscle-specific promoter (SPc5-12) and obtained a significant 20-fold increase in cardiac gene expression compared to the cytomegalovirus promoter. This study underscores the potential of rational vector design to improve the robustness of cardiac gene therapy. PMID:25195597

  17. Genome-wide computational analysis reveals cardiomyocyte-specific transcriptional Cis-regulatory motifs that enable efficient cardiac gene therapy.

    PubMed

    Rincon, Melvin Y; Sarcar, Shilpita; Danso-Abeam, Dina; Keyaerts, Marleen; Matrai, Janka; Samara-Kuko, Ermira; Acosta-Sanchez, Abel; Athanasopoulos, Takis; Dickson, George; Lahoutte, Tony; De Bleser, Pieter; VandenDriessche, Thierry; Chuah, Marinee K

    2015-01-01

    Gene therapy is a promising emerging therapeutic modality for the treatment of cardiovascular diseases and hereditary diseases that afflict the heart. Hence, there is a need to develop robust cardiac-specific expression modules that allow for stable expression of the gene of interest in cardiomyocytes. We therefore explored a new approach based on a genome-wide bioinformatics strategy that revealed novel cardiac-specific cis-acting regulatory modules (CS-CRMs). These transcriptional modules contained evolutionary-conserved clusters of putative transcription factor binding sites that correspond to a "molecular signature" associated with robust gene expression in the heart. We then validated these CS-CRMs in vivo using an adeno-associated viral vector serotype 9 that drives a reporter gene from a quintessential cardiac-specific α-myosin heavy chain promoter. Most de novo designed CS-CRMs resulted in a >10-fold increase in cardiac gene expression. The most robust CRMs enhanced cardiac-specific transcription 70- to 100-fold. Expression was sustained and restricted to cardiomyocytes. We then combined the most potent CS-CRM4 with a synthetic heart and muscle-specific promoter (SPc5-12) and obtained a significant 20-fold increase in cardiac gene expression compared to the cytomegalovirus promoter. This study underscores the potential of rational vector design to improve the robustness of cardiac gene therapy.

  18. Testing of Cis-Regulatory Elements by Targeted Transgene Integration in Zebrafish Using PhiC31 Integrase.

    PubMed

    Hadzhiev, Yavor; Miguel-Escalada, Irene; Balciunas, Darius; Müller, Ferenc

    2016-01-01

    Herein we present several strategies for testing the function of cis-regulatory elements using the PhiC31 integrase system. Firstly, we present two different strategies to analyze the activity of candidate enhancer elements. Targeted integration of candidate enhancers into the same genomic location circumvents the variability associated with random integration and position effects. This method is suitable for testing candidate enhancers identified a priori through computational or other analyses. Secondly, we present methodology for targeted integration of BACs into the same genomic location(s). By using additional reporters integrated into a BAC, this enables experimental testing of whether cis-regulatory elements are functional in the sequence inserted in the BAC. PMID:27464802

  19. ChIP-Seq-Annotated Heliconius erato Genome Highlights Patterns of cis-Regulatory Evolution in Lepidoptera.

    PubMed

    Lewis, James J; van der Burg, Karin R L; Mazo-Vargas, Anyi; Reed, Robert D

    2016-09-13

    Uncovering phylogenetic patterns of cis-regulatory evolution remains a fundamental goal for evolutionary and developmental biology. Here, we characterize the evolution of regulatory loci in butterflies and moths using chromatin immunoprecipitation sequencing (ChIP-seq) annotation of regulatory elements across three stages of head development. In the process we provide a high-quality, functionally annotated genome assembly for the butterfly, Heliconius erato. Comparing cis-regulatory element conservation across six lepidopteran genomes, we find that regulatory sequences evolve at a pace similar to that of protein-coding regions. We also observe that elements active at multiple developmental stages are markedly more conserved than elements with stage-specific activity. Surprisingly, we also find that stage-specific proximal and distal regulatory elements evolve at nearly identical rates. Our study provides a benchmark for genome-wide patterns of regulatory element evolution in insects, and it shows that developmental timing of activity strongly predicts patterns of regulatory sequence evolution.

  20. Mapping Association between Long-Range Cis-Regulatory Regions and Their Target Genes Using Comparative Genomics

    NASA Astrophysics Data System (ADS)

    Mongin, Emmanuel; Dewar, Ken; Blanchette, Mathieu

    In chordates, long-range cis-regulatory regions are involved in the control of transcription initiation (either as repressors or enhancers). They can be located as far as 1 Mb from the transcription start site of the target gene and can regulate more than one gene. Therefore, proper characterization of functional interactions between long-range cis-regulatory regions and their target genes remains problematic. We present a novel method to predict such interactions based on the analysis of rearrangements between the human and 16 other vertebrate genomes. Our method is based on the assumption that genome rearrangements that would disrupt the functional interaction between a cis-regulatory region and its target gene are likely to be deleterious. Therefore, conservation of synteny through evolution would be an indication of a functional interaction. We use our algorithm to classify a set of 1,406,084 putative associations from the human genome. This genome-wide map of interactions has many potential applications, including the selection of candidate regions prior to in vivo experimental characterization, a better characterization of regulatory regions involved in position effect diseases, and an improved understanding of the mechanisms and importance of long-range regulation.
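
    The synteny assumption stated above can be illustrated with a deliberately simplified sketch: for one candidate region-gene pair, it reports the fraction of other species in which both orthologs remain on the same chromosome within a maximum distance. This is not the authors' rearrangement-based algorithm; the function name `synteny_conservation`, the input layout, and the 1 Mb default threshold are assumptions chosen for illustration.

```python
def synteny_conservation(region, gene, species_positions, max_distance=1_000_000):
    """Fraction of species in which `region` and `gene` stay linked.

    species_positions maps species -> {element: (chromosome, position)}.
    Species lacking an ortholog for either element are skipped.
    Returns None if no species is informative.
    """
    conserved = 0
    informative = 0
    for positions in species_positions.values():
        if region not in positions or gene not in positions:
            continue  # no ortholog mapped in this species
        informative += 1
        region_chrom, region_pos = positions[region]
        gene_chrom, gene_pos = positions[gene]
        same_chrom = region_chrom == gene_chrom
        if same_chrom and abs(region_pos - gene_pos) <= max_distance:
            conserved += 1
    return conserved / informative if informative else None
```

    Under the stated assumption, a pair that stays linked in most informative species would be a stronger candidate for a functional interaction than a pair that is frequently separated by rearrangements.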

  1. Changes in cis-regulatory elements of a key floral regulator are associated with divergence of inflorescence architectures.

    PubMed

    Kusters, Elske; Della Pina, Serena; Castel, Rob; Souer, Erik; Koes, Ronald

    2015-08-15

    Higher plant species diverged extensively with regard to the moment (flowering time) and position (inflorescence architecture) at which flowers are formed. This seems largely caused by variation in the expression patterns of conserved genes that specify floral meristem identity (FMI), rather than changes in the encoded proteins. Here, we report a functional comparison of the promoters of homologous FMI genes from Arabidopsis, petunia, tomato and Antirrhinum. Analysis of promoter-reporter constructs in petunia and Arabidopsis, as well as complementation experiments, showed that the divergent expression of leafy (LFY) and the petunia homolog aberrant leaf and flower (ALF) results from alterations in the upstream regulatory network rather than cis-regulatory changes. The divergent expression of unusual floral organs (UFO) from Arabidopsis, and the petunia homolog double top (DOT), however, is caused by the loss or gain of cis-regulatory promoter elements, which respond to trans-acting factors that are expressed in similar patterns in both species. Introduction of pUFO:UFO causes no obvious defects in Arabidopsis, but in petunia it causes the precocious and ectopic formation of flowers. This provides an example of how a change in a cis-regulatory region can account for a change in the plant body plan. PMID:26220938

  2. Changes in cis-regulatory elements of a key floral regulator are associated with divergence of inflorescence architectures.

    PubMed

    Kusters, Elske; Della Pina, Serena; Castel, Rob; Souer, Erik; Koes, Ronald

    2015-08-15

    Higher plant species diverged extensively with regard to the moment (flowering time) and position (inflorescence architecture) at which flowers are formed. This seems largely caused by variation in the expression patterns of conserved genes that specify floral meristem identity (FMI), rather than changes in the encoded proteins. Here, we report a functional comparison of the promoters of homologous FMI genes from Arabidopsis, petunia, tomato and Antirrhinum. Analysis of promoter-reporter constructs in petunia and Arabidopsis, as well as complementation experiments, showed that the divergent expression of leafy (LFY) and the petunia homolog aberrant leaf and flower (ALF) results from alterations in the upstream regulatory network rather than cis-regulatory changes. The divergent expression of unusual floral organs (UFO) from Arabidopsis, and the petunia homolog double top (DOT), however, is caused by the loss or gain of cis-regulatory promoter elements, which respond to trans-acting factors that are expressed in similar patterns in both species. Introduction of pUFO:UFO causes no obvious defects in Arabidopsis, but in petunia it causes the precocious and ectopic formation of flowers. This provides an example of how a change in a cis-regulatory region can account for a change in the plant body plan.

  3. Regulation of human PTCH1b expression by different 5' untranslated region cis-regulatory elements.

    PubMed

    Ozretić, Petar; Bisio, Alessandra; Musani, Vesna; Trnski, Diana; Sabol, Maja; Levanat, Sonja; Inga, Alberto

    2015-01-01

    The PTCH1 gene codes for a 12-pass transmembrane receptor with a negative regulatory role in the Hedgehog-Gli signaling pathway. PTCH1 germline mutations cause Gorlin syndrome, a disorder characterized by developmental abnormalities and tumor susceptibility. The autosomal dominant inheritance, and the evidence for PTCH1 haploinsufficiency, suggest that fine-tuning systems of protein patched homolog 1 (PTC1) levels exist to properly regulate the pathway. Given the role of the 5' untranslated region (5'UTR) in protein expression, our aim was to thoroughly explore cis-regulatory elements in the 5'UTR of PTCH1 transcript 1b. The (CGG)n polymorphism was the main potential regulatory element studied so far, but with inconsistent results and no clear association between repeat number and disease risk. Using luciferase reporter constructs in human cell lines, here we show that the number of CGG repeats has no strong impact on gene expression, at either the mRNA or the protein level. We observed variability in the length of the 5'UTR and changes in abundance of the associated transcripts after pathway activation. We show that upstream AUG codons (uAUGs) present only in longer 5'UTRs could negatively regulate the amount of PTC1 isoform L (PTC1-L). The existence of an internal ribosome entry site (IRES), observed using different approaches and mapped to the region comprising the CGG repeats, would counteract the effect of the uAUGs and enable synthesis of PTC1-L under stressful conditions, such as during hypoxia. Higher relative translation efficiency of PTCH1b mRNA in HEK 293T cells cultured under hypoxia was observed by polysomal profiling and Western blot analyses. All our results point to an exceptionally complex and so far unexplored role of 5'UTR PTCH1b cis-element features in the regulation of the Hedgehog-Gli signaling pathway. PMID:25826662

  4. Identification of High-Impact cis-Regulatory Mutations Using Transcription Factor Specific Random Forest Models

    PubMed Central

    Svetlichnyy, Dmitry; Imrichova, Hana; Fiers, Mark; Kalender Atak, Zeynep; Aerts, Stein

    2015-01-01

    Cancer genomes contain vast amounts of somatic mutations, many of which are passenger mutations not involved in oncogenesis. Whereas driver mutations in protein-coding genes can be distinguished from passenger mutations based on their recurrence, non-coding mutations are usually not recurrent at the same position. Therefore, it is still unclear how to identify cis-regulatory driver mutations, particularly when chromatin data from the same patient is not available, thus relying only on sequence and expression information. Here we use machine-learning methods to predict functional regulatory regions using sequence information alone, and compare the predicted activity of the mutated region with the reference sequence. This way we define the Predicted Regulatory Impact of a Mutation in an Enhancer (PRIME). We find that the recently identified driver mutation in the TAL1 enhancer has a high PRIME score, representing a “gain-of-target” for MYB, whereas the highly recurrent TERT promoter mutation has a surprisingly low PRIME score. We trained Random Forest models for 45 cancer-related transcription factors, and used these to score variations in the HeLa genome and somatic mutations across more than five hundred cancer genomes. Each model predicts only a small fraction of non-coding mutations with a potential impact on the function of the encompassing regulatory region. Nevertheless, as these few candidate driver mutations are often linked to gains in chromatin activity and gene expression, they may contribute to the oncogenic program by altering the expression levels of specific oncogenes and tumor suppressor genes. PMID:26562774

  5. Deciphering Cis-Regulatory Element Mediated Combinatorial Regulation in Rice under Blast Infected Condition

    PubMed Central

    Deb, Arindam; Kundu, Sudip

    2015-01-01

    Combinations of cis-regulatory elements (CREs) present at the promoters facilitate the binding of several transcription factors (TFs), thereby altering the consequent gene expression. Due to the eminent complexity of the regulatory mechanism, the combinatorics of CRE-mediated transcriptional regulation has been elusive. In this work, we have developed a new methodology that quantifies the co-occurrence tendencies of CREs present in a set of promoter sequences; these co-occurrence scores are filtered in three consecutive steps to test their statistical significance; and the significantly co-occurring CRE pairs are presented as networks. These networks of co-occurring CREs are further transformed to derive higher orders of regulatory combinatorics. We have further applied this methodology to the differentially up-regulated gene-sets of rice tissues under fungal (Magnaporthe) infected conditions to demonstrate how it helps to understand the CRE-mediated combinatorial gene regulation. Our analysis includes a wide spectrum of biologically important results. The CRE pairs having a strong tendency to co-occur often exhibit very similar joint distribution patterns at the promoters of rice. We couple the network approach with experimental results of plant gene regulation and defense mechanisms and find evidence of auto and cross regulation among TF families, cross-talk among multiple hormone signaling pathways, similarities and dissimilarities in regulatory combinatorics between different tissues, etc. Our analyses have pointed to a highly distributed nature of the combinatorial gene regulation facilitating an efficient alteration in response to fungal attack. Altogether, our proposed methodology could be an important approach in understanding the combinatorial gene regulation. It can be further applied to unravel the tissue and/or condition specific combinatorial gene regulation in other eukaryotic systems with the availability of annotated genomic sequences and suitable
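
    The abstract above describes scoring how often pairs of CREs occur together in a set of promoters and then filtering the pairs for statistical significance. The sketch below illustrates that general idea only; it is not the authors' scoring scheme or their three-step filter. It assumes each promoter is already annotated with the set of CREs it carries, uses a one-sided hypergeometric tail probability as a stand-in significance test, and all names (`cooccurring_pairs`, `CRE_A`, the gene labels) are hypothetical.

```python
from itertools import combinations
from math import comb


def hypergeom_tail(k, n_total, n_a, n_b):
    """P(X >= k): chance that at least k promoters carry both CREs if the
    n_b promoters carrying CRE B were drawn at random from n_total
    promoters, n_a of which carry CRE A."""
    upper = min(n_a, n_b)
    favourable = sum(
        comb(n_a, i) * comb(n_total - n_a, n_b - i) for i in range(k, upper + 1)
    )
    return favourable / comb(n_total, n_b)


def cooccurring_pairs(promoter_cres, alpha=0.05):
    """Return (cre_a, cre_b, shared_count, p_value) for pairs with p < alpha.

    promoter_cres maps a gene identifier to the set of CREs in its promoter.
    No multiple-testing correction is applied in this sketch.
    """
    n_total = len(promoter_cres)
    carriers = {}
    for gene, cres in promoter_cres.items():
        for cre in cres:
            carriers.setdefault(cre, set()).add(gene)
    hits = []
    for a, b in combinations(sorted(carriers), 2):
        shared = len(carriers[a] & carriers[b])
        p = hypergeom_tail(shared, n_total, len(carriers[a]), len(carriers[b]))
        if p < alpha:
            hits.append((a, b, shared, p))
    return hits


# Toy data (invented for illustration): five promoters carry CRE_A and CRE_B
# together, five carry only CRE_C.
toy = {f"g{i}": {"CRE_A", "CRE_B"} for i in range(5)}
toy.update({f"g{i}": {"CRE_C"} for i in range(5, 10)})
for pair in cooccurring_pairs(toy):
    print(pair)
```

    The significant pairs returned here could serve as edges of a co-occurrence network of the kind the abstract mentions; a real analysis would also need a correction for the number of pairs tested.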

  6. Deciphering Cis-Regulatory Element Mediated Combinatorial Regulation in Rice under Blast Infected Condition.

    PubMed

    Deb, Arindam; Kundu, Sudip

    2015-01-01

    Combinations of cis-regulatory elements (CREs) present at the promoters facilitate the binding of several transcription factors (TFs), thereby altering the consequent gene expression. Due to the eminent complexity of the regulatory mechanism, the combinatorics of CRE-mediated transcriptional regulation has been elusive. In this work, we have developed a new methodology that quantifies the co-occurrence tendencies of CREs present in a set of promoter sequences; these co-occurrence scores are filtered in three consecutive steps to test their statistical significance; and the significantly co-occurring CRE pairs are presented as networks. These networks of co-occurring CREs are further transformed to derive higher orders of regulatory combinatorics. We have further applied this methodology to the differentially up-regulated gene-sets of rice tissues under fungal (Magnaporthe) infected conditions to demonstrate how it helps to understand the CRE-mediated combinatorial gene regulation. Our analysis includes a wide spectrum of biologically important results. The CRE pairs having a strong tendency to co-occur often exhibit very similar joint distribution patterns at the promoters of rice. We couple the network approach with experimental results of plant gene regulation and defense mechanisms and find evidence of auto and cross regulation among TF families, cross-talk among multiple hormone signaling pathways, similarities and dissimilarities in regulatory combinatorics between different tissues, etc. Our analyses have pointed to a highly distributed nature of the combinatorial gene regulation facilitating an efficient alteration in response to fungal attack. Altogether, our proposed methodology could be an important approach in understanding the combinatorial gene regulation. It can be further applied to unravel the tissue and/or condition specific combinatorial gene regulation in other eukaryotic systems with the availability of annotated genomic sequences and suitable

  8. Identification and characterization of a cis-regulatory element for zygotic gene expression in Chlamydomonas reinhardtii

    DOE PAGES

    Hamaji, Takashi; Lopez, David; Pellegrini, Matteo; Umen, James

    2016-03-26

    Upon fertilization Chlamydomonas reinhardtii zygotes undergo a program of differentiation into a diploid zygospore that is accompanied by transcription of hundreds of zygote-specific genes. We identified a distinct sequence motif we term a zygotic response element (ZYRE) that is highly enriched in promoter regions of C. reinhardtii early zygotic genes. A luciferase reporter assay was used to show that native ZYRE motifs within the promoter of zygotic gene ZYS3 or intron of zygotic gene DMT4 are necessary for zygotic induction. A synthetic luciferase reporter with a minimal promoter was used to show that ZYRE motifs introduced upstream are sufficient to confer zygotic upregulation, and that ZYRE-controlled zygotic transcription is dependent on the homeodomain transcription factor GSP1. Furthermore, we predict that ZYRE motifs will correspond to binding sites for the homeodomain proteins GSP1-GSM1 that heterodimerize and activate zygotic gene expression in early zygotes.

  9. Identification and Characterization of a cis-Regulatory Element for Zygotic Gene Expression in Chlamydomonas reinhardtii

    PubMed Central

    Hamaji, Takashi; Lopez, David; Pellegrini, Matteo; Umen, James

    2016-01-01

    Upon fertilization Chlamydomonas reinhardtii zygotes undergo a program of differentiation into a diploid zygospore that is accompanied by transcription of hundreds of zygote-specific genes. We identified a distinct sequence motif we term a zygotic response element (ZYRE) that is highly enriched in promoter regions of C. reinhardtii early zygotic genes. A luciferase reporter assay was used to show that native ZYRE motifs within the promoter of zygotic gene ZYS3 or intron of zygotic gene DMT4 are necessary for zygotic induction. A synthetic luciferase reporter with a minimal promoter was used to show that ZYRE motifs introduced upstream are sufficient to confer zygotic upregulation, and that ZYRE-controlled zygotic transcription is dependent on the homeodomain transcription factor GSP1. We predict that ZYRE motifs will correspond to binding sites for the homeodomain proteins GSP1-GSM1 that heterodimerize and activate zygotic gene expression in early zygotes. PMID:27172209

  10. Identification and Characterization of a cis-Regulatory Element for Zygotic Gene Expression in Chlamydomonas reinhardtii.

    PubMed

    Hamaji, Takashi; Lopez, David; Pellegrini, Matteo; Umen, James

    2016-01-01

    Upon fertilization Chlamydomonas reinhardtii zygotes undergo a program of differentiation into a diploid zygospore that is accompanied by transcription of hundreds of zygote-specific genes. We identified a distinct sequence motif we term a zygotic response element (ZYRE) that is highly enriched in promoter regions of C. reinhardtii early zygotic genes. A luciferase reporter assay was used to show that native ZYRE motifs within the promoter of zygotic gene ZYS3 or intron of zygotic gene DMT4 are necessary for zygotic induction. A synthetic luciferase reporter with a minimal promoter was used to show that ZYRE motifs introduced upstream are sufficient to confer zygotic upregulation, and that ZYRE-controlled zygotic transcription is dependent on the homeodomain transcription factor GSP1. We predict that ZYRE motifs will correspond to binding sites for the homeodomain proteins GSP1-GSM1 that heterodimerize and activate zygotic gene expression in early zygotes. PMID:27172209

  11. An ancient yet flexible cis-regulatory architecture allows localized Hedgehog tuning by patched/Ptch1.

    PubMed

    Lorberbaum, David S; Ramos, Andrea I; Peterson, Kevin A; Carpenter, Brandon S; Parker, David S; De, Sandip; Hillers, Lauren E; Blake, Victoria M; Nishi, Yuichi; McFarlane, Matthew R; Chiang, Ason Cy; Kassis, Judith A; Allen, Benjamin L; McMahon, Andrew P; Barolo, Scott

    2016-01-01

    The Hedgehog signaling pathway is part of the ancient developmental-evolutionary animal toolkit. Frequently co-opted to pattern new structures, the pathway is conserved among eumetazoans yet flexible and pleiotropic in its effects. The Hedgehog receptor, Patched, is transcriptionally activated by Hedgehog, providing essential negative feedback in all tissues. Our locus-wide dissections of the cis-regulatory landscapes of fly patched and mouse Ptch1 reveal abundant, diverse enhancers with stage- and tissue-specific expression patterns. The seemingly simple, constitutive Hedgehog response of patched/Ptch1 is driven by a complex regulatory architecture, with batteries of context-specific enhancers engaged in promoter-specific interactions to tune signaling individually in each tissue, without disturbing patterning elsewhere. This structure, one of the oldest cis-regulatory features discovered in animal genomes, explains how patched/Ptch1 can drive dramatic adaptations in animal morphology while maintaining its essential core function. It may also suggest a general model for the evolutionary flexibility of conserved regulators and pathways. PMID:27146892

  12. An ancient yet flexible cis-regulatory architecture allows localized Hedgehog tuning by patched/Ptch1

    PubMed Central

    Lorberbaum, David S; Ramos, Andrea I; Peterson, Kevin A; Carpenter, Brandon S; Parker, David S; De, Sandip; Hillers, Lauren E; Blake, Victoria M; Nishi, Yuichi; McFarlane, Matthew R; Chiang, Ason CY; Kassis, Judith A; Allen, Benjamin L; McMahon, Andrew P; Barolo, Scott

    2016-01-01

    The Hedgehog signaling pathway is part of the ancient developmental-evolutionary animal toolkit. Frequently co-opted to pattern new structures, the pathway is conserved among eumetazoans yet flexible and pleiotropic in its effects. The Hedgehog receptor, Patched, is transcriptionally activated by Hedgehog, providing essential negative feedback in all tissues. Our locus-wide dissections of the cis-regulatory landscapes of fly patched and mouse Ptch1 reveal abundant, diverse enhancers with stage- and tissue-specific expression patterns. The seemingly simple, constitutive Hedgehog response of patched/Ptch1 is driven by a complex regulatory architecture, with batteries of context-specific enhancers engaged in promoter-specific interactions to tune signaling individually in each tissue, without disturbing patterning elsewhere. This structure—one of the oldest cis-regulatory features discovered in animal genomes—explains how patched/Ptch1 can drive dramatic adaptations in animal morphology while maintaining its essential core function. It may also suggest a general model for the evolutionary flexibility of conserved regulators and pathways. DOI: http://dx.doi.org/10.7554/eLife.13550.001 PMID:27146892

  13. Differential contribution of cis-regulatory elements to higher order chromatin structure and expression of the CFTR locus

    PubMed Central

    Yang, Rui; Kerschner, Jenny L.; Gosalia, Nehal; Neems, Daniel; Gorsic, Lidija K.; Safi, Alexias; Crawford, Gregory E.; Kosak, Steven T.; Leir, Shih-Hsing; Harris, Ann

    2016-01-01

    Higher order chromatin structure establishes domains that organize the genome and coordinate gene expression. However, the molecular mechanisms controlling transcription of individual loci within a topological domain (TAD) are not fully understood. The cystic fibrosis transmembrane conductance regulator (CFTR) gene provides a paradigm for investigating these mechanisms. CFTR occupies a TAD bordered by CTCF/cohesin binding sites within which are cell-type-selective cis-regulatory elements for the locus. We showed previously that intronic and extragenic enhancers, when occupied by specific transcription factors, are recruited to the CFTR promoter by a looping mechanism to drive gene expression. Here we use a combination of CRISPR/Cas9 editing of cis-regulatory elements and siRNA-mediated depletion of architectural proteins to determine the relative contribution of structural elements and enhancers to the higher order structure and expression of the CFTR locus. We found the boundaries of the CFTR TAD are conserved among diverse cell types and are dependent on CTCF and cohesin complex. Removal of an upstream CTCF-binding insulator alters the interaction profile, but has little effect on CFTR expression. Within the TAD, intronic enhancers recruit cell-type selective transcription factors and deletion of a pivotal enhancer element dramatically decreases CFTR expression, but has minor effect on its 3D structure. PMID:26673704

  14. ChIP-Seq-Annotated Heliconius erato Genome Highlights Patterns of cis-Regulatory Evolution in Lepidoptera.

    PubMed

    Lewis, James J; van der Burg, Karin R L; Mazo-Vargas, Anyi; Reed, Robert D

    2016-09-13

    Uncovering phylogenetic patterns of cis-regulatory evolution remains a fundamental goal for evolutionary and developmental biology. Here, we characterize the evolution of regulatory loci in butterflies and moths using chromatin immunoprecipitation sequencing (ChIP-seq) annotation of regulatory elements across three stages of head development. In the process we provide a high-quality, functionally annotated genome assembly for the butterfly, Heliconius erato. Comparing cis-regulatory element conservation across six lepidopteran genomes, we find that regulatory sequences evolve at a pace similar to that of protein-coding regions. We also observe that elements active at multiple developmental stages are markedly more conserved than elements with stage-specific activity. Surprisingly, we also find that stage-specific proximal and distal regulatory elements evolve at nearly identical rates. Our study provides a benchmark for genome-wide patterns of regulatory element evolution in insects, and it shows that developmental timing of activity strongly predicts patterns of regulatory sequence evolution. PMID:27626657

  15. Creation of cis-regulatory elements during sea urchin evolution by co-option and optimization of a repetitive sequence adjacent to the spec2a gene.

    PubMed

    Dayal, Sandeep; Kiyama, Takae; Villinski, Jeffrey T; Zhang, Ning; Liang, Shuguang; Klein, William H

    2004-09-15

    The creation, preservation, and degeneration of cis-regulatory elements controlling developmental gene expression are fundamental genome-level evolutionary processes about which little is known. Here, we identify critical differences in cis-regulatory elements controlling the expression of the sea urchin aboral ectoderm-specific spec genes. We found multiple copies of a repetitive sequence element termed RSR in genomes of species within the Strongylocentrotidae family, but RSRs were not detected in genomes of species outside Strongylocentrotidae. spec genes in Strongylocentrotus purpuratus are invariably associated with RSRs, and the spec2a RSR functioned as a transcriptional enhancer and displayed greater activity than did spec1 or spec2c RSRs. Single-base pair differences at two cis-regulatory elements within the spec2a RSR increased the binding affinities of four transcription factors, SpCCAAT-binding factor at one element and SpOtx, SpGoosecoid, and SpGATA-E at another. The cis-regulatory elements to which these four factors bound were recent evolutionary acquisitions that acted to either activate or repress transcription, depending on the cell type. These elements were found in the spec2a RSR ortholog in Strongylocentrotus pallidus but not in RSR orthologs of Strongylocentrotus droebachiensis or Hemicentrotus pulcherrimus. Our results indicated that a dynamic pattern of cis-regulatory element evolution exists for spec genes despite their conserved aboral ectoderm expression.

  16. FootprintDB: Analysis of Plant Cis-Regulatory Elements, Transcription Factors, and Binding Interfaces.

    PubMed

    Contreras-Moreira, Bruno; Sebastian, Alvaro

    2016-01-01

    FootprintDB is a database and search engine that compiles regulatory sequences from open access libraries of curated DNA cis-elements and motifs, and their associated transcription factors (TFs). It systematically annotates the binding interfaces of the TFs by exploiting protein-DNA complexes deposited in the Protein Data Bank. Each entry in footprintDB is thus a DNA motif linked to the protein sequence of the TF(s) known to recognize it, and in most cases, the set of predicted interface residues involved in specific recognition. This chapter explains step-by-step how to search for DNA motifs and protein sequences in footprintDB and how to focus the search to a particular organism. Two real-world examples are shown where this software was used to analyze transcriptional regulation in plants. Results are described with the aim of guiding users on their interpretation, and special attention is given to the choices users might face when performing similar analyses. PMID:27557773

  17. Shared Enhancer Activity in the Limbs and Phallus and Functional Divergence of a Limb-Genital cis-Regulatory Element in Snakes.

    PubMed

    Infante, Carlos R; Mihala, Alexandra G; Park, Sungdae; Wang, Jialiang S; Johnson, Kenji K; Lauderdale, James D; Menke, Douglas B

    2015-10-12

    The amniote phallus and limbs differ dramatically in their morphologies but share patterns of signaling and gene expression in early development. Thus far, the extent to which genital and limb transcriptional networks also share cis-regulatory elements has remained unexplored. We show that many limb enhancers are retained in snake genomes, suggesting that these elements may function in non-limb tissues. Consistent with this, our analysis of cis-regulatory activity in mice and Anolis lizards reveals that patterns of enhancer activity in embryonic limbs and genitalia overlap heavily. In mice, deletion of HLEB, an enhancer of Tbx4, produces defects in hindlimbs and genitalia, establishing the importance of this limb-genital enhancer for development of these different appendages. Further analyses demonstrate that the HLEB of snakes has lost hindlimb enhancer function while retaining genital activity. Our findings identify roles for Tbx4 in genital development and highlight deep similarities in cis-regulatory activity between limbs and genitalia.

  18. Design of hyperthermophilic lipase chimeras by key motif-directed recombination.

    PubMed

    Zhou, Xiaoli; Gao, Le; Yang, Guangyu; Liu, Donglai; Bai, Aixi; Li, Binchun; Deng, Zixin; Feng, Yan

    2015-02-01

    Recombination of diverse naturally evolved domains within a superfamily offers greater opportunity for leaps in enzyme function. How to recombine protein modules from distant parents with less disruption at cross-interfaces is a challenging issue. Here, we identified the existence of a key motif, the sequence VVSVN(D)YR, within a structural motif ψ loop in the α/β-hydrolase fold superfamily, by using a MEME server and the PROMOTIF program. To obtain thermostable lipase-like enzymes, two chimeras were engineered at the key motif regions through recombination of domains from a mesophilic lipase and a hyperthermophilic esterase/peptidase with amino acid identity less than 21%. The chimeras retained the desirable substrate preference of their mesophilic parent and exhibited more than 100-fold increased thermostability at 50 °C. Through site-directed mutation, we further improved activity of the chimera by 4.6-fold. The recombination strategy presented here enables the creation of novel catalysts. PMID:25530200

  19. Two RNA-binding motifs in eIF3 direct HCV IRES-dependent translation

    PubMed Central

    Sun, Chaomin; Querol-Audí, Jordi; Mortimer, Stefanie A.; Arias-Palomo, Ernesto; Doudna, Jennifer A.; Nogales, Eva; Cate, Jamie H. D.

    2013-01-01

    The initiation of protein synthesis plays an essential regulatory role in human biology. At the center of the initiation pathway, the 13-subunit eukaryotic translation initiation factor 3 (eIF3) controls access of other initiation factors and mRNA to the ribosome by unknown mechanisms. Using electron microscopy (EM), bioinformatics and biochemical experiments, we identify two highly conserved RNA-binding motifs in eIF3 that direct translation initiation from the hepatitis C virus internal ribosome entry site (HCV IRES) RNA. Mutations in the RNA-binding motif of subunit eIF3a weaken eIF3 binding to the HCV IRES and the 40S ribosomal subunit, thereby suppressing eIF2-dependent recognition of the start codon. Mutations in the eIF3c RNA-binding motif also reduce 40S ribosomal subunit binding to eIF3, and inhibit eIF5B-dependent steps downstream of start codon recognition. These results provide the first connection between the structure of the central translation initiation factor eIF3 and recognition of the HCV genomic RNA start codon, molecular interactions that likely extend to the human transcriptome. PMID:23766293

  20. Use of a Drosophila Genome-Wide Conserved Sequence Database to Identify Functionally Related cis-Regulatory Enhancers

    PubMed Central

    Brody, Thomas; Yavatkar, Amarendra S; Kuzin, Alexander; Kundu, Mukta; Tyson, Leonard J; Ross, Jermaine; Lin, Tzu-Yang; Lee, Chi-Hon; Awasaki, Takeshi; Lee, Tzumin; Odenwald, Ward F

    2012-01-01

    Background: Phylogenetic footprinting has revealed that cis-regulatory enhancers consist of conserved DNA sequence clusters (CSCs). Currently, there is no systematic approach for enhancer discovery and analysis that takes full advantage of the sequence information within enhancer CSCs. Results: We have generated a Drosophila genome-wide database of conserved DNA consisting of >100,000 CSCs derived from EvoPrints spanning over 90% of the genome. cis-Decoder database search and alignment algorithms enable the discovery of functionally related enhancers. The program first identifies conserved repeat elements within an input enhancer and then searches the database for CSCs that score highly against the input CSC. Scoring is based on shared repeats as well as uniquely shared matches, and includes measures of the balance of shared elements, a diagnostic that has proven to be useful in predicting cis-regulatory function. To demonstrate the utility of these tools, a temporally restricted CNS neuroblast enhancer was used to identify other functionally related enhancers and analyze their structural organization. Conclusions: cis-Decoder reveals that co-regulating enhancers consist of combinations of overlapping shared sequence elements, providing insights into the mode of integration of multiple regulating transcription factors. The database and accompanying algorithms should prove useful in the discovery and analysis of enhancers involved in any developmental process. Developmental Dynamics 241:169–189, 2012. © 2011 Wiley Periodicals, Inc. Key findings: A genome-wide catalog of Drosophila conserved DNA sequence clusters. cis-Decoder discovers functionally related enhancers. Functionally related enhancers share balanced sequence element copy numbers. Many enhancers function during multiple phases of development. PMID:22174086

  1. PreCisIon: PREdiction of CIS-regulatory elements improved by gene's positION.

    PubMed

    Elati, Mohamed; Nicolle, Rémy; Junier, Ivan; Fernández, David; Fekih, Rim; Font, Julio; Képès, François

    2013-02-01

    Conventional approaches to predict transcriptional regulatory interactions usually rely on the definition of a shared motif sequence on the target genes of a transcription factor (TF). These efforts have been frustrated by the limited availability and accuracy of TF binding site motifs, usually represented as position-specific scoring matrices, which may match large numbers of sites and produce an unreliable list of target genes. To improve the prediction of binding sites, we propose to additionally use the unrelated knowledge of the genome layout. Indeed, it has been shown that co-regulated genes tend to be either neighbors or periodically spaced along the whole chromosome. This study demonstrates that respective gene positioning carries significant information. This novel type of information is combined with traditional sequence information by a machine learning algorithm called PreCisIon. To optimize this combination, PreCisIon builds a strong gene target classifier by adaptively combining weak classifiers based on either local binding sequence or global gene position. This strategy generically paves the way to the optimized incorporation of any future advances in gene target prediction based on local sequence, genome layout or on novel criteria. With the current state of the art, PreCisIon consistently improves methods based on sequence information only. This is shown by implementing a cross-validation analysis of the 20 major TFs from two phylogenetically remote model organisms. For Bacillus subtilis and Escherichia coli, respectively, PreCisIon achieves on average an area under the receiver operating characteristic curve of 70 and 60%, a sensitivity of 80 and 70% and a specificity of 60 and 56%. The newly predicted gene targets are demonstrated to be functionally consistent with previously known targets, as assessed by analysis of Gene Ontology enrichment or of the relevant literature and databases.
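
    The abstract above notes that TF binding site motifs are usually represented as position-specific scoring matrices (PSSMs), which can match large numbers of sites. As a minimal sketch of the sequence-only scanning that this refers to (not of PreCisIon itself, which additionally uses gene position), the code below builds a log-odds PSSM from aligned sites and scans a sequence with a score threshold. The uniform background frequency, the pseudocount, the threshold and the example sites are all assumptions made for illustration.

```python
from math import log2

BACKGROUND = 0.25  # assumed uniform background frequency for A, C, G, T


def build_pssm(sites, pseudocount=1.0):
    """Turn equal-length aligned binding sites into log-odds columns."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pssm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pssm.append({
            base: log2(((column.count(base) + pseudocount) / total) / BACKGROUND)
            for base in "ACGT"
        })
    return pssm


def scan(sequence, pssm, threshold):
    """Return (offset, score) for every window scoring at or above threshold.

    The sequence is assumed to contain only the upper-case letters A, C, G, T.
    """
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(pssm, window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Invented example sites and sequence, for illustration only.
example_pssm = build_pssm(["TGACA", "TGACA", "TGACG", "TGACA"])
print(scan("AAATGACATTT", example_pssm, threshold=5.0))
```

    Lowering the threshold increases the number of reported windows, which reflects the abstract's point that short motifs alone tend to produce long and unreliable candidate lists.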

  2. A Parallel-Displaced Directly Linked 21-Carba-23-Thiaporphyrin Dimer Incorporating a Dihydrofulvalene Motif.

    PubMed

    Berlicka, Anna; Białek, Michał J; Latos-Grażyński, Lechosław

    2016-09-01

    In the search for porphyrin arrays with a unique geometry, an efficient synthesis of a directly linked 21-carba-23-thiaporphyrin dimer with a distinctive dihydrofulvalene bridging motif has been developed. This compound acquires an uncommon parallel-displaced arrangement of the two carbaporphyrin planes. The dimer undergoes acid-triggered cleavage to give either the asymmetric carbathiaporphyrin-carbathiachlorin dyad or 2,3-dihalo-21-carba-23-thiachlorin, depending on the choice of acid. Formation of a reactive carbocation intermediate is postulated to account for the mechanism of cleavage. PMID:27530897

  3. Using machine learning to predict gene expression and discover sequence motifs

    NASA Astrophysics Data System (ADS)

    Li, Xuejing

    Recently, large amounts of experimental data for complex biological systems have become available. We use tools and algorithms from machine learning to build data-driven predictive models. We first present a novel algorithm to discover gene sequence motifs associated with temporal expression patterns of genes. Our algorithm, which is based on partial least squares (PLS) regression, is able to directly model the flow of information, from gene sequence to gene expression, to learn cis regulatory motifs and characterize associated gene expression patterns. Our algorithm outperforms traditional computational methods e.g. clustering in motif discovery. We then present a study of extending a machine learning model for transcriptional regulation predictive of genetic regulatory response to Caenorhabditis elegans. We show meaningful results both in terms of prediction accuracy on the test experiments and biological information extracted from the regulatory program. The model discovers DNA binding sites ab initio. We also present a case study where we detect a signal of lineage-specific regulation. Finally we present a comparative study on learning predictive models for motif discovery, based on different boosting algorithms: Adaptive Boosting (AdaBoost), Linear Programming Boosting (LPBoost) and Totally Corrective Boosting (TotalBoost). We evaluate and compare the performance of the three boosting algorithms via both statistical and biological validation, for hypoxia response in Saccharomyces cerevisiae.

  4. Separate elements of the TERMINAL FLOWER 1 cis-regulatory region integrate pathways to control flowering time and shoot meristem identity.

    PubMed

    Serrano-Mislata, Antonio; Fernández-Nohales, Pedro; Doménech, María J; Hanzawa, Yoshie; Bradley, Desmond; Madueño, Francisco

    2016-09-15

    TERMINAL FLOWER 1 (TFL1) is a key regulator of Arabidopsis plant architecture that responds to developmental and environmental signals to control flowering time and the fate of shoot meristems. TFL1 expression is dynamic, being found in all shoot meristems, but not in floral meristems, with the level and distribution changing throughout development. Using a variety of experimental approaches we have analysed the TFL1 promoter to elucidate its functional structure. TFL1 expression is based on distinct cis-regulatory regions, the most important being located 3' of the coding sequence. Our results indicate that TFL1 expression in the shoot apical versus lateral inflorescence meristems is controlled through distinct cis-regulatory elements, suggesting that different signals control expression in these meristem types. Moreover, we identified a cis-regulatory region necessary for TFL1 expression in the vegetative shoot and required for a wild-type flowering time, supporting that TFL1 expression in the vegetative meristem controls flowering time. Our study provides a model for the functional organisation of TFL1 cis-regulatory regions, contributing to our understanding of how developmental pathways are integrated at the genomic level of a key regulator to control plant architecture. PMID:27385013

  5. Separate elements of the TERMINAL FLOWER 1 cis-regulatory region integrate pathways to control flowering time and shoot meristem identity.

    PubMed

    Serrano-Mislata, Antonio; Fernández-Nohales, Pedro; Doménech, María J; Hanzawa, Yoshie; Bradley, Desmond; Madueño, Francisco

    2016-09-15

    TERMINAL FLOWER 1 (TFL1) is a key regulator of Arabidopsis plant architecture that responds to developmental and environmental signals to control flowering time and the fate of shoot meristems. TFL1 expression is dynamic, being found in all shoot meristems, but not in floral meristems, with the level and distribution changing throughout development. Using a variety of experimental approaches we have analysed the TFL1 promoter to elucidate its functional structure. TFL1 expression is based on distinct cis-regulatory regions, the most important being located 3' of the coding sequence. Our results indicate that TFL1 expression in the shoot apical versus lateral inflorescence meristems is controlled through distinct cis-regulatory elements, suggesting that different signals control expression in these meristem types. Moreover, we identified a cis-regulatory region necessary for TFL1 expression in the vegetative shoot and required for a wild-type flowering time, supporting that TFL1 expression in the vegetative meristem controls flowering time. Our study provides a model for the functional organisation of TFL1 cis-regulatory regions, contributing to our understanding of how developmental pathways are integrated at the genomic level of a key regulator to control plant architecture.

  6. cisMEP: an integrated repository of genomic epigenetic profiles and cis-regulatory modules in Drosophila

    PubMed Central

    2014-01-01

    Background Cis-regulatory modules (CRMs), or the DNA sequences required for regulating gene expression, play the central role in biological research on transcriptional regulation in metazoan species. Nowadays, the systematic understanding of CRMs still mainly relies on computational methods due to the time-consuming and small-scale nature of experimental methods. But the accuracy and reliability of different CRM prediction tools are still unclear. Without comparative cross-analysis of the results and combinatorial consideration with extra experimental information, there is no easy way to assess the confidence of the predicted CRMs. This limits the genome-wide understanding of CRMs. Description It is known that transcription factor binding and epigenetic profiles tend to determine functions of CRMs in gene transcriptional regulation. Thus integration of the genome-wide epigenetic profiles with systematically predicted CRMs can greatly help researchers evaluate and decipher the prediction confidence and possible transcriptional regulatory functions of these potential CRMs. However, these data are still fragmentary in the literature. Here we performed the computational genome-wide screening for potential CRMs using different prediction tools and constructed the pioneer database, cisMEP (cis-regulatory module epigenetic profile database), to integrate these computationally identified CRMs with genomic epigenetic profile data. cisMEP collects the literature-curated TFBS location data and nine genres of epigenetic data for assessing the confidence of these potential CRMs and deciphering the possible CRM functionality. Conclusions cisMEP aims to provide a user-friendly interface for researchers to assess the confidence of different potential CRMs and to understand the functions of CRMs through experimentally-identified epigenetic profiles. The deposited potential CRMs and experimental epigenetic profiles for confidence assessment provide experimentally testable

  7. Specific Sequence Motifs Direct the Oxygenation and Chlorination of Tryptophan by Myeloperoxidase

    PubMed Central

    Fu, Xiaoyun; Wang, Yi; Kao, Jeffery; Irwin, Angela; d’Avignon, André; Mecham, Robert P.; Parks, William C.; Heinecke, Jay W.

    2008-01-01

    Most studies of protein oxidation have typically focused on the reactivity of single amino acid side chains while ignoring the potential importance of adjacent sequences in directing the reaction pathway. We previously showed that hypochlorous acid (HOCl), a specific product of myeloperoxidase, inactivates matrilysin by modifying adjacent tryptophan and glycine (WG) residues in the catalytic domain. Here, we use model peptides that mimic the region of matrilysin involved in this reaction, VVWGTA, VVWATA and the library VVWXTA, to determine whether specific sequence motifs are targeted for chlorination or oxygenation by myeloperoxidase. Our results demonstrate that HOCl generated by myeloperoxidase or activated neutrophils converts the peptide VVWGTA to a chlorinated product, WG+32(Cl). Tandem mass spectrometry in concert with high resolution 1H and two-dimensional NMR analysis revealed that the modification required cross-linking of the tryptophan to the amide of glycine followed by chlorination of the indole ring of tryptophan. In contrast, when glycine in the peptide was replaced with alanine, the major products were mono- and di-oxygenated tryptophan residues. When the peptide library VVWXTA (where X represents all 20 common amino acids) was exposed to HOCl, only WG produced a high yield of the chloroindolenine derivative. However, when glycine was replaced by other amino acids, oxygenated tryptophan derivatives were the major products. Our observations indicate that WG may represent a specific sequence motif in proteins that is targeted for chlorination by myeloperoxidase. PMID:16548523

  8. DMINDA: an integrated web server for DNA motif identification and analyses

    PubMed Central

    Ma, Qin; Zhang, Hanyuan; Mao, Xizeng; Zhou, Chuan; Liu, Bingqiang; Chen, Xin; Xu, Ying

    2014-01-01

    DMINDA (DNA motif identification and analyses) is an integrated web server for DNA motif identification and analyses, which is accessible at http://csbl.bmb.uga.edu/DMINDA/. This web site is freely available to all users and there is no login requirement. This server provides a suite of cis-regulatory motif analysis functions on DNA sequences, which are important to elucidation of the mechanisms of transcriptional regulation: (i) de novo motif finding for a given set of promoter sequences along with statistical scores for the predicted motifs derived based on information extracted from a control set, (ii) scanning motif instances of a query motif in provided genomic sequences, (iii) motif comparison and clustering of identified motifs, and (iv) co-occurrence analyses of query motifs in given promoter sequences. The server is powered by a backend computer cluster with over 150 computing nodes, and is particularly useful for motif prediction and analyses in prokaryotic genomes. We believe that DMINDA, as a new and comprehensive web server for cis-regulatory motif finding and analyses, will benefit the genomic research community in general and prokaryotic genome researchers in particular. PMID:24753419

  9. Comparative epigenomics in distantly related teleost species identifies conserved cis-regulatory nodes active during the vertebrate phylotypic period.

    PubMed

    Tena, Juan J; González-Aguilera, Cristina; Fernández-Miñán, Ana; Vázquez-Marín, Javier; Parra-Acero, Helena; Cross, Joe W; Rigby, Peter W J; Carvajal, Jaime J; Wittbrodt, Joachim; Gómez-Skarmeta, José L; Martínez-Morales, Juan R

    2014-07-01

    The complex relationship between ontogeny and phylogeny has been the subject of attention and controversy since von Baer's formulations in the 19th century. The classic concept that embryogenesis progresses from clade general features to species-specific characters has often been revisited. It has become accepted that embryos from a clade show maximum morphological similarity at the so-called phylotypic period (i.e., during mid-embryogenesis). According to the hourglass model, body plan conservation would depend on constrained molecular mechanisms operating at this period. More recently, comparative transcriptomic analyses have provided conclusive evidence that such molecular constraints exist. Examining cis-regulatory architecture during the phylotypic period is essential to understand the evolutionary source of body plan stability. Here we compare transcriptomes and key epigenetic marks (H3K4me3 and H3K27ac) from medaka (Oryzias latipes) and zebrafish (Danio rerio), two distantly related teleosts separated by an evolutionary distance of 115-200 Myr. We show that comparison of transcriptome profiles correlates with anatomical similarities and heterochronies observed at the phylotypic stage. Through comparative epigenomics, we uncover a pool of conserved regulatory regions (≈700), which are active during the vertebrate phylotypic period in both species. Moreover, we show that their neighboring genes encode mainly transcription factors with fundamental roles in tissue specification. We postulate that these regulatory regions, active in both teleost genomes, represent key constrained nodes of the gene networks that sustain the vertebrate body plan.

  10. Comparative epigenomics in distantly related teleost species identifies conserved cis-regulatory nodes active during the vertebrate phylotypic period

    PubMed Central

    Tena, Juan J.; González-Aguilera, Cristina; Fernández-Miñán, Ana; Vázquez-Marín, Javier; Parra-Acero, Helena; Cross, Joe W.; Rigby, Peter W.J.; Carvajal, Jaime J.; Wittbrodt, Joachim; Gómez-Skarmeta, José L.; Martínez-Morales, Juan R.

    2014-01-01

    The complex relationship between ontogeny and phylogeny has been the subject of attention and controversy since von Baer’s formulations in the 19th century. The classic concept that embryogenesis progresses from clade general features to species-specific characters has often been revisited. It has become accepted that embryos from a clade show maximum morphological similarity at the so-called phylotypic period (i.e., during mid-embryogenesis). According to the hourglass model, body plan conservation would depend on constrained molecular mechanisms operating at this period. More recently, comparative transcriptomic analyses have provided conclusive evidence that such molecular constraints exist. Examining cis-regulatory architecture during the phylotypic period is essential to understand the evolutionary source of body plan stability. Here we compare transcriptomes and key epigenetic marks (H3K4me3 and H3K27ac) from medaka (Oryzias latipes) and zebrafish (Danio rerio), two distantly related teleosts separated by an evolutionary distance of 115–200 Myr. We show that comparison of transcriptome profiles correlates with anatomical similarities and heterochronies observed at the phylotypic stage. Through comparative epigenomics, we uncover a pool of conserved regulatory regions (≈700), which are active during the vertebrate phylotypic period in both species. Moreover, we show that their neighboring genes encode mainly transcription factors with fundamental roles in tissue specification. We postulate that these regulatory regions, active in both teleost genomes, represent key constrained nodes of the gene networks that sustain the vertebrate body plan. PMID:24709821

  11. Cis-regulatory Changes at FLOWERING LOCUS T Mediate Natural Variation in Flowering Responses of Arabidopsis thaliana

    PubMed Central

    Schwartz, Christopher; Balasubramanian, Sureshkumar; Warthmann, Norman; Michael, Todd P.; Lempe, Janne; Sureshkumar, Sridevi; Kobayashi, Yasushi; Maloof, Julin N.; Borevitz, Justin O.; Chory, Joanne; Weigel, Detlef

    2009-01-01

    Flowering time, a critical adaptive trait, is modulated by several environmental cues. These external signals converge on a small set of genes that in turn mediate the flowering response. Mutant analysis and subsequent molecular studies have revealed that one of these integrator genes, FLOWERING LOCUS T (FT), responds to photoperiod and temperature cues, two environmental parameters that greatly influence flowering time. As the central player in the transition to flowering, the protein coding sequence of FT and its function are highly conserved across species. Using QTL mapping with a new advanced intercross-recombinant inbred line (AI-RIL) population, we show that a QTL tightly linked to FT contributes to natural variation in the flowering response to the combined effects of photoperiod and ambient temperature. Using heterogeneous inbred families (HIF) and introgression lines, we fine map the QTL to a 6.7 kb fragment in the FT promoter. We confirm by quantitative complementation that FT has differential activity in the two parental strains. Further support for FT underlying the QTL comes from a new approach, quantitative knockdown with artificial microRNAs (amiRNAs). Consistent with the causal sequence polymorphism being in the promoter, we find that the QTL affects FT expression. Taken together, these results indicate that allelic variation at pathway integrator genes such as FT can underlie phenotypic variability and that this may be achieved through cis-regulatory changes. PMID:19652183

  12. Identification of Predictive Cis-Regulatory Elements Using a Discriminative Objective Function and a Dynamic Search Space.

    PubMed

    Karnik, Rahul; Beer, Michael A

    2015-01-01

    The generation of genomic binding or accessibility data from massively parallel sequencing technologies such as ChIP-seq and DNase-seq continues to accelerate. Yet state-of-the-art computational approaches for the identification of DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called MotifSpec, designed to find predictive motifs, in contrast to over-represented sequence elements. The key distinguishing feature of this algorithm is that it uses a dynamic search space and a learned threshold to find discriminative motifs in combination with the modeling of motifs using a full PWM (position weight matrix) rather than k-mer words or regular expressions. We demonstrate that our approach finds motifs corresponding to known binding specificities in several mammalian ChIP-seq datasets, and that our PWMs classify the ChIP-seq signals with accuracy comparable to, or marginally better than motifs from the best existing algorithms. In other datasets, our algorithm identifies novel motifs where other methods fail. Finally, we apply this algorithm to detect motifs from expression datasets in C. elegans using a dynamic expression similarity metric rather than fixed expression clusters, and find novel predictive motifs.

  13. Identification of Predictive Cis-Regulatory Elements Using a Discriminative Objective Function and a Dynamic Search Space

    PubMed Central

    Karnik, Rahul; Beer, Michael A.

    2015-01-01

    The generation of genomic binding or accessibility data from massively parallel sequencing technologies such as ChIP-seq and DNase-seq continues to accelerate. Yet state-of-the-art computational approaches for the identification of DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called MotifSpec, designed to find predictive motifs, in contrast to over-represented sequence elements. The key distinguishing feature of this algorithm is that it uses a dynamic search space and a learned threshold to find discriminative motifs in combination with the modeling of motifs using a full PWM (position weight matrix) rather than k-mer words or regular expressions. We demonstrate that our approach finds motifs corresponding to known binding specificities in several mammalian ChIP-seq datasets, and that our PWMs classify the ChIP-seq signals with accuracy comparable to, or marginally better than motifs from the best existing algorithms. In other datasets, our algorithm identifies novel motifs where other methods fail. Finally, we apply this algorithm to detect motifs from expression datasets in C. elegans using a dynamic expression similarity metric rather than fixed expression clusters, and find novel predictive motifs. PMID:26465884
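
    The abstract above contrasts full PWM models with a learned threshold against simple over-representation. A minimal sketch of those two ingredients follows: scoring sequence windows with a log-odds PWM, then picking the score cutoff that best separates bound from unbound sequences. The count matrix, the sequences, the pseudocount and the uniform background are all invented assumptions, and this is not MotifSpec's actual search procedure.

```python
import math

BASES = "ACGT"
# Hypothetical counts for a 4-bp motif with consensus TGAC (one dict per position).
COUNTS = [
    {"A": 1, "C": 1, "G": 1, "T": 7},
    {"A": 1, "C": 1, "G": 7, "T": 1},
    {"A": 7, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 7, "G": 1, "T": 1},
]

def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position counts into log2(probability / background) scores."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(BASES)
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm

PWM = log_odds_pwm(COUNTS)

def window_score(window):
    """Sum of per-position log-odds for one motif-width window."""
    return sum(PWM[i][base] for i, base in enumerate(window))

def best_score(seq):
    """Highest-scoring window anywhere in the sequence."""
    width = len(PWM)
    return max(window_score(seq[i:i + width]) for i in range(len(seq) - width + 1))

def learn_threshold(bound, unbound):
    """Pick the observed score that classifies the two sets most accurately."""
    scored = ([(best_score(s), True) for s in bound]
              + [(best_score(s), False) for s in unbound])
    best_thr, best_acc = None, -1.0
    for thr, _ in scored:
        acc = sum((s >= thr) == label for s, label in scored) / len(scored)
        if acc > best_acc:
            best_thr, best_acc = thr, acc
    return best_thr, best_acc

bound = ["AATGACGG", "CCTGACAA"]    # toy "ChIP-seq peak" sequences
unbound = ["AAAAAAAA", "CCGGCCGG"]  # toy background sequences
threshold, accuracy = learn_threshold(bound, unbound)
```

    Choosing the cutoff from labelled data rather than fixing it in advance is what makes the motif discriminative in this sketch.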

  14. A universal algorithm for genome-wide in silicio identification of biologically significant gene promoter putative cis-regulatory-elements; identification of new elements for reactive oxygen species and sucrose signaling in Arabidopsis.

    PubMed

    Geisler, Matt; Kleczkowski, Leszek A; Karpinski, Stanislaw

    2006-02-01

    Short motifs of many cis-regulatory elements (CREs) can be found in the promoters of most Arabidopsis genes, and this raises the question of how their presence can confer specific regulation. We developed a universal algorithm to test the biological significance of CREs by first identifying every Arabidopsis gene with a CRE and then statistically correlating the presence or absence of the element with the gene expression profile on multiple DNA microarrays. This algorithm was successfully verified for previously characterized abscisic acid, ethylene, sucrose and drought responsive CREs in Arabidopsis, showing that the presence of these elements indeed correlates with treatment-specific gene induction. Later, we used standard motif sampling methods to identify 128 putative motifs induced by excess light, reactive oxygen species and sucrose. Our algorithm was able to filter 20 out of 128 novel CREs which significantly correlated with gene induction by either heat, reactive oxygen species and/or sucrose. The position, orientation and sequence specificity of CREs were tested in silicio by analyzing the expression of genes with naturally occurring sequence variations. In three novel CREs the forward orientation correlated with sucrose induction and the reverse orientation with sucrose suppression. The functionality of the predicted novel CREs was experimentally confirmed using Arabidopsis cell-suspension cultures transformed with short promoter fragments or artificial promoters fused with the GUS reporter gene. Our genome-wide analysis opens up new possibilities for in silicio verification of the biological significance of newly discovered CREs, and allows for subsequent selection of such CREs for experimental studies.
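
    The abstract above correlates the presence or absence of an element with gene induction but does not name the statistic used. One common choice for this kind of question, shown here purely as an assumption, is a hypergeometric enrichment test: given how many genes carry the element, how surprising is the number of carriers among the induced genes? The gene counts are invented.

```python
from math import comb

def hypergeom_tail(k, n_genes, n_with_cre, n_induced):
    """P(X >= k): chance that at least k of the induced genes carry the CRE
    if induced genes were drawn at random from the genome."""
    upper = min(n_with_cre, n_induced)
    favourable = sum(comb(n_with_cre, i) * comb(n_genes - n_with_cre, n_induced - i)
                     for i in range(k, upper + 1))
    return favourable / comb(n_genes, n_induced)

# Toy example: 10 genes, 3 carry the element, 3 are induced, all 3 induced carry it.
p_enriched = hypergeom_tail(3, n_genes=10, n_with_cre=3, n_induced=3)
# Sanity check: asking for "at least 0" carriers always has probability 1.
p_trivial = hypergeom_tail(0, n_genes=10, n_with_cre=3, n_induced=3)
```

    A small tail probability suggests the element is over-represented among induced genes; testing many candidate elements this way would also call for a multiple-testing correction.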

  15. Extensive cis-Regulatory Variation Robust to Environmental Perturbation in Arabidopsis

    PubMed Central

    Cubillos, Francisco A.; Stegle, Oliver; Grondin, Cécile; Canut, Matthieu; Tisné, Sébastien; Gy, Isabelle

    2014-01-01

    cis- and trans-acting factors affect gene expression and responses to environmental conditions. However, for most plant systems, we lack a comprehensive map of these factors and their interaction with environmental variation. Here, we examined allele-specific expression (ASE) in an F1 hybrid to study how alleles from two Arabidopsis thaliana accessions affect gene expression. To investigate the effect of the environment, we used drought stress and developed a variance component model to estimate the combined genetic contributions of cis- and trans-regulatory polymorphisms, environmental factors, and their interactions. We quantified ASE for 11,003 genes, identifying 3318 genes with consistent ASE in control and stress conditions, demonstrating that cis-acting genetic effects are essentially robust to changes in the environment. Moreover, we found 1618 genes with genotype x environment (GxE) interactions, mostly cis x E interactions with magnitude changes in ASE. We found fewer trans x E interactions, but these effects were relatively less robust across conditions, showing more changes in the direction of the effect between environments; this confirms that trans-regulation plays an important role in the response to environmental conditions. Our data provide a detailed map of cis- and trans-regulation and GxE interactions in A. thaliana, laying the ground for mechanistic investigations and studies in other plants and environments. PMID:25428981
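
    Allele-specific expression in an F1 hybrid, as described above, comes down to asking whether reads from the two parental alleles of a gene depart from a 1:1 ratio. The study itself fits a variance component model; the sketch below swaps in a much simpler per-gene exact binomial test, and the read counts are invented.

```python
from math import comb

def ase_binomial_p(ref_reads, alt_reads, expected=0.5):
    """Two-sided exact binomial p-value for allelic imbalance at one gene."""
    n = ref_reads + alt_reads
    pmf = [comb(n, k) * expected**k * (1 - expected)**(n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum every outcome that is no more likely than the observed one.
    p = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, p)

p_imbalanced = ase_binomial_p(10, 0)  # all reads from one allele
p_balanced = ase_binomial_p(5, 5)     # perfectly even split
```

    Running the same test separately in control and stress samples would give a crude view of whether imbalance persists across conditions, which is the question the variance component model addresses more rigorously.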

  16. Cis-regulatory underpinnings of human GLI3 expression in embryonic craniofacial structures and internal organs.

    PubMed

    Abbasi, Amir A; Minhas, Rashid; Schmidt, Ansgar; Koch, Sabine; Grzeschik, Karl-Heinz

    2013-10-01

    The zinc finger transcription factor Gli3 is an important mediator of Sonic hedgehog (Shh) signaling. During early embryonic development Gli3 participates in patterning and growth of the central nervous system, face, skeleton, limb, tooth and gut. Precise regulation of the temporal and spatial expression of Gli3 is crucial for the proper specification of these structures in mammals and other vertebrates. Previously we reported a set of human intronic cis-regulators controlling almost the entire known repertoire of endogenous Gli3 expression in mouse neural tube and limbs. However, the genetic underpinning of GLI3 expression in other embryonic domains such as craniofacial structures and internal organs remains elusive. Here we demonstrate in a transgenic mouse assay the potential of a subset of human/fish conserved non-coding sequences (CNEs) residing within GLI3 intronic intervals to induce reporter gene expression at known regions of endogenous Gli3 transcription in embryonic domains other than central nervous system (CNS) and limbs. Highly specific reporter expression was observed in craniofacial structures, eye, gut, and genitourinary system. Moreover, the comparison of expression patterns directed by these intronic cis-acting regulatory elements in mouse and zebrafish embryos suggests that in accordance with sequence conservation, the target site specificity of a subset of these elements remains preserved between these two lineages. Taken together with our recent investigations, it is proposed here that during vertebrate evolution the Gli3 expression control acquired multiple, independently acting, intronic enhancers for spatiotemporal patterning of CNS, limbs, craniofacial structures and internal organs.

  17. A Phox2- and Hand2-dependent Hand1 cis-regulatory element reveals a unique gene dosage requirement for Hand2 during sympathetic neurogenesis.

    PubMed

    Vincentz, Joshua W; VanDusen, Nathan J; Fleming, Andrew B; Rubart, Michael; Firulli, Beth A; Howard, Marthe J; Firulli, Anthony B

    2012-02-01

    Neural crest cell specification and differentiation to a sympathetic neuronal fate serves as an important model for neurogenesis and depends upon the function of both bHLH transcription factors, notably Hand2, and homeodomain transcription factors, including Phox2b. Here, we define a 1007 bp cis-regulatory element 5' of the Hand1 gene sufficient to drive reporter expression within the sympathetic chain of transgenic mice. Comparative genomic analyses uncovered evolutionarily conserved consensus-binding sites within this element, which chromatin immunoprecipitation and electrophoretic mobility shift assays confirm are bound by Hand2 and Phox2b. Mutational analyses revealed that the conserved Phox2 and E-box binding sites are necessary for proper cis-regulatory element activity, and expression analyses on both Hand2 conditionally null and hypomorphic backgrounds demonstrate that Hand2 is required for reporter activation in a gene dosage-dependent manner. We demonstrate that Hand2 and Hand1 differentially bind the E-boxes in this cis-regulatory element, establishing molecular differences between these two factors. Finally, we demonstrate that Hand1 is dispensable for normal tyrosine hydroxylase (TH) and dopamine β-hydroxylase (DBH) expression in sympathetic neurons, even when Hand2 gene dosage is concurrently reduced by half. Together, these data define a tissue-specific Hand1 cis-regulatory element controlled by two factors essential for the development of the sympathetic nervous system and provide in vivo regulatory evidence to support previous findings that Hand2, rather than Hand1, is predominantly responsible for regulating TH, DBH, and Hand1 expression in developing sympathetic neurons.

  18. A survey of ancient conserved non-coding elements in the PAX6 locus reveals a landscape of interdigitated cis-regulatory archipelagos.

    PubMed

    Bhatia, Shipra; Monahan, Jack; Ravi, Vydianathan; Gautier, Philippe; Murdoch, Emma; Brenner, Sydney; van Heyningen, Veronica; Venkatesh, Byrappa; Kleinjan, Dirk A

    2014-03-15

    Biological differences between cell types and developmental processes are characterised by differences in gene expression profiles. Gene-distal enhancers are key components of the regulatory networks that specify the tissue-specific expression patterns driving embryonic development and cell fate decisions, and variations in their sequences are a major contributor to genetic disease and disease susceptibility. Despite advances in the methods for discovery of putative cis-regulatory sequences, characterisation of their spatio-temporal enhancer activities in a mammalian model system remains a major bottleneck. We employed a strategy that combines gnathostome sequence conservation with transgenic mouse and zebrafish reporter assays to survey the genomic locus of the developmental control gene PAX6 for the presence of novel cis-regulatory elements. Sequence comparison between human and the cartilaginous elephant shark (Callorhinchus milii) revealed several ancient gnathostome conserved non-coding elements (agCNEs) dispersed widely throughout the PAX6 locus, extending the range of the known PAX6 cis-regulatory landscape to contain the full upstream PAX6-RCN1 intergenic region. Our data indicate that ancient conserved regulatory sequences can be tested effectively in transgenic zebrafish even when not conserved in zebrafish themselves. The strategy also allows efficient dissection of compound regulatory regions previously assessed in transgenic mice. Remarkable overlap in expression patterns driven by sets of agCNEs indicates that PAX6 resides in a landscape of multiple tissue-specific regulatory archipelagos. PMID:24440152

  19. Cis-regulatory sequence variation and association with Mycoplasma load in natural populations of the house finch (Carpodacus mexicanus)

    PubMed Central

    Backström, Niclas; Shipilina, Daria; Blom, Mozes P K; Edwards, Scott V

    2013-01-01

    Characterization of the genetic basis of fitness traits in natural populations is important for understanding how organisms adapt to the changing environment and to novel events, such as epizootics. However, candidate fitness-influencing loci, such as regulatory regions, are usually unavailable in nonmodel species. Here, we analyze sequence data from targeted resequencing of the cis-regulatory regions of three candidate genes for disease resistance (CD74, HSP90α, and LCP1) in populations of the house finch (Carpodacus mexicanus) historically exposed (Alabama) and naïve (Arizona) to Mycoplasma gallisepticum. Our study, the first to quantify variation in regulatory regions in wild birds, reveals that the upstream regions of CD74 and HSP90α are GC-rich, with the former exhibiting unusually low sequence variation for this species. We identified two SNPs, located in a GC-rich region immediately upstream of an inferred promoter site in the gene HSP90α, that were significantly associated with Mycoplasma pathogen load in the two populations. The SNPs are closely linked and situated in potential regulatory sequences: one in a binding site for the transcription factor nuclear NFYα and the other in a dinucleotide microsatellite ((GC)6). The genotype associated with pathogen load in the putative NFYα binding site was significantly overrepresented in the Alabama birds. However, we did not see strong effects of selection at this SNP, perhaps because selection has acted on standing genetic variation over an extremely short time in a highly recombining region. Our study is a useful starting point to explore functional relationships between sequence polymorphisms, gene expression, and phenotypic traits, such as pathogen resistance that affect fitness in the wild. PMID:23532859

  20. Identification of Important Nodes in Directed Biological Networks: A Network Motif Approach

    PubMed Central

    Wang, Pei; Lü, Jinhu; Yu, Xinghuo

    2014-01-01

    Identification of important nodes in complex networks has attracted increasing attention over the last decade. Various measures have been proposed to characterize the importance of nodes in complex networks, such as degree, betweenness and PageRank. Different measures consider different aspects of complex networks. Although numerous results have been reported on undirected complex networks, few have been reported on directed biological networks. Based on network motifs and principal component analysis (PCA), this paper introduces a new measure to characterize node importance in directed biological networks. Investigations on five real-world biological networks indicate that the proposed method can robustly identify important nodes in different networks, such as command interneurons, global regulators and nodes that are not hubs but are evolutionarily conserved. Receiver Operating Characteristic (ROC) curves for the five networks indicate remarkable prediction accuracy of the proposed measure. The proposed index provides an alternative complex network metric. Potential implications of the related investigations include identifying network control and regulation targets, biological network modeling and analysis, as well as networked medicine. PMID:25170616
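
    As a rough illustration of the motif-based idea in this record (not the authors' actual measure), the sketch below counts how often each node takes part in a single 3-node motif, the feed-forward loop, in a toy directed network. The abstract describes combining motif information with PCA; that step is omitted here, and the function name and toy edges are assumptions made for illustration.

```python
from itertools import permutations

def feed_forward_participation(edges):
    """Count, for each node, the feed-forward loops (a->b, b->c, a->c) it belongs to."""
    edge_set = set(edges)
    nodes = sorted({node for edge in edges for node in edge})
    counts = {node: 0 for node in nodes}
    # Each ordered triple (a, b, c) assigns distinct roles, so every loop is counted once.
    for a, b, c in permutations(nodes, 3):
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set:
            for node in (a, b, c):
                counts[node] += 1
    return counts

# Hypothetical toy network: one feed-forward loop (x, y, z) plus a dangling node w.
toy_edges = [("x", "y"), ("y", "z"), ("x", "z"), ("z", "w")]
participation = feed_forward_participation(toy_edges)
print(participation)
```

    A node with a high count sits inside many motif instances; a fuller treatment would tabulate several motif types per node before reducing them to one score.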

  1. i-cisTarget 2015 update: generalized cis-regulatory enrichment analysis in human, mouse and fly

    PubMed Central

    Imrichová, Hana; Hulselmans, Gert; Kalender Atak, Zeynep; Potier, Delphine; Aerts, Stein

    2015-01-01

    i-cisTarget is a web tool to predict regulators of a set of genomic regions, such as ChIP-seq peaks or co-regulated/similar enhancers. i-cisTarget can also be used to identify upstream regulators and their target enhancers starting from a set of co-expressed genes. Whereas the original version of i-cisTarget was focused on Drosophila data, the 2015 update also provides support for human and mouse data. i-cisTarget detects transcription factor motifs (position weight matrices) and experimental data tracks (e.g. from ENCODE, Roadmap Epigenomics) that are enriched in the input set of regions. As experimental data tracks we include transcription factor ChIP-seq data, histone modification ChIP-seq data and open chromatin data. The underlying processing method is based on a ranking-and-recovery procedure, allowing accurate determination of enrichment across heterogeneous datasets, while also discriminating direct from indirect target regions through a ‘leading edge’ analysis. We illustrate i-cisTarget on various Ewing sarcoma datasets to identify EWS-FLI1 targets starting from ChIP-seq, differential ATAC-seq, differential H3K27ac and differential gene expression data. Use of i-cisTarget is free and open to all, and there is no login requirement. Address: http://gbiomed.kuleuven.be/apps/lcb/i-cisTarget. PMID:25925574
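
    The ranking-and-recovery idea mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not i-cisTarget's implementation: it assumes all candidate regions have already been ranked for one motif or track, and it measures how early the input regions are recovered near the top of that ranking. The function name, the `top_fraction` parameter and the toy rankings are hypothetical.

```python
def recovery_auc(ranked_regions, input_set, top_fraction=0.1):
    """Area under the recovery curve over the top of a ranking, scaled to [0, 1]."""
    hits = set(input_set)
    if not hits:
        raise ValueError("input_set must not be empty")
    cutoff = max(1, int(len(ranked_regions) * top_fraction))
    recovered = 0
    area = 0
    for region in ranked_regions[:cutoff]:
        if region in hits:
            recovered += 1
        area += recovered  # height of the recovery curve at this rank
    # Best case: every input region sits at the very top of the ranking.
    best = sum(min(rank, len(hits)) for rank in range(1, cutoff + 1))
    return area / best

# Hypothetical rankings of ten regions for two motifs; r1 and r2 are the input set.
ranking_a = ["r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9", "r10"]
ranking_b = ["r6", "r7", "r8", "r1", "r2", "r3", "r4", "r5", "r9", "r10"]
auc_a = recovery_auc(ranking_a, {"r1", "r2"}, top_fraction=0.5)
auc_b = recovery_auc(ranking_b, {"r1", "r2"}, top_fraction=0.5)
print(auc_a, auc_b)
```

    A motif whose ranking places the input regions near the top gets a larger area, so comparing areas across many motifs or tracks gives a simple enrichment ordering.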

  2. i-cisTarget 2015 update: generalized cis-regulatory enrichment analysis in human, mouse and fly.

    PubMed

    Imrichová, Hana; Hulselmans, Gert; Atak, Zeynep Kalender; Potier, Delphine; Aerts, Stein

    2015-07-01

    i-cisTarget is a web tool to predict regulators of a set of genomic regions, such as ChIP-seq peaks or co-regulated/similar enhancers. i-cisTarget can also be used to identify upstream regulators and their target enhancers starting from a set of co-expressed genes. Whereas the original version of i-cisTarget was focused on Drosophila data, the 2015 update also provides support for human and mouse data. i-cisTarget detects transcription factor motifs (position weight matrices) and experimental data tracks (e.g. from ENCODE, Roadmap Epigenomics) that are enriched in the input set of regions. As experimental data tracks we include transcription factor ChIP-seq data, histone modification ChIP-seq data and open chromatin data. The underlying processing method is based on a ranking-and-recovery procedure, allowing accurate determination of enrichment across heterogeneous datasets, while also discriminating direct from indirect target regions through a 'leading edge' analysis. We illustrate i-cisTarget on various Ewing sarcoma datasets to identify EWS-FLI1 targets starting from ChIP-seq, differential ATAC-seq, differential H3K27ac and differential gene expression data. Use of i-cisTarget is free and open to all, and there is no login requirement. Address: http://gbiomed.kuleuven.be/apps/lcb/i-cisTarget.

  3. Caught in the evolutionary act: precise cis-regulatory basis of difference in the organization of gene networks of sea stars and sea urchins.

    PubMed

    Hinman, Veronica F; Nguyen, Albert; Davidson, Eric H

    2007-12-15

    The regulatory control of otxbeta1/2 in the sea urchin Strongylocentrotus purpuratus and the sea star Asterina miniata provides an exceptional opportunity to determine the genomic basis of evolutionary change in gene regulatory network (GRN) architectures. Network perturbation analyses in both taxa show that Otx regulates the transcription factors gatae and krox/blimp1 and both of these transcription factors also feed back and regulate otx. The otx gene also autoregulates. This three way interaction is an example of a GRN kernel. It has been conserved for 500 million years since these two taxa last shared a common ancestor. Amid this high level of conservation we show here one significant regulatory change. Tbrain is required for correct otxbeta1/2 expression in the sea star but not in the sea urchin. In sea urchin, tbrain is not co-expressed with otxbeta1/2 and instead has an essential role in specification of the embryonic skeleton. Tbrain in these echinoderms is thus a perfect example of an orthologous gene co-opted for entirely different developmental processes. We isolate and test the sea star otxbeta1/2 cis-regulatory module and demonstrate functional binding sites for each of the predicted inputs, including Tbrain. We compare it to the logic processing operating in the sea urchin otxbeta1/2 cis-regulatory module and present an evolutionary scenario of the change in Tbrain dependence. Finally, inter-specific gene transfer experiments confirm this scenario and demonstrate evolution occurring at the level of sequence changes to the cis-regulatory module.

  4. Motif analysis in directed ordered networks and applications to food webs

    PubMed Central

    Paulau, Pavel V.; Feenders, Christoph; Blasius, Bernd

    2015-01-01

    The analysis of small recurrent substructures, so called network motifs, has become a standard tool of complex network science to unveil the design principles underlying the structure of empirical networks. In many natural systems network nodes are associated with an intrinsic property according to which they can be ordered and compared against each other. Here, we expand standard motif analysis to be able to capture the hierarchical structure in such ordered networks. Our new approach is based on the identification of all ordered 3-node substructures and the visualization of their significance profile. We present a technique to calculate the fine grained motif spectrum by resolving the individual members of isomorphism classes (sets of substructures formed by permuting node-order). We apply this technique to computer generated ensembles of ordered networks and to empirical food web data, demonstrating the importance of considering node order for food-web analysis. Our approach may not only be helpful to identify hierarchical patterns in empirical food webs and other natural networks, it may also provide the base for extending motif analysis to other types of multi-layered networks. PMID:26144248
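
    To make the notion of ordered 3-node substructures concrete, the sketch below counts them in a toy directed network whose nodes carry a rank. It is an assumption-laden illustration rather than the authors' code: each triple is sorted by rank, its six possible directed links are encoded as a 0/1 tuple, and only weakly connected triples are kept. The comparison against randomized networks needed for a significance profile is not shown, and all names and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def ordered_triad_census(edges, rank):
    """Count ordered 3-node substructures; nodes in each triple are sorted by rank."""
    edge_set = set(edges)
    census = Counter()
    # combinations() preserves input order, so each triple is in ascending rank.
    for low, mid, high in combinations(sorted(rank, key=rank.get), 3):
        pairs = [(low, mid), (mid, low), (low, high),
                 (high, low), (mid, high), (high, mid)]
        code = tuple(int(pair in edge_set) for pair in pairs)
        # Weakly connected means at least two of the three node pairs are linked.
        linked_pairs = sum(1 for i in (0, 2, 4) if code[i] or code[i + 1])
        if linked_pairs >= 2:
            census[code] += 1
    return census

# Hypothetical toy food web; edges point from resource to consumer.
toy_rank = {"plant": 0, "herbivore": 1, "predator": 2, "omnivore": 3}
toy_web = [("plant", "herbivore"), ("herbivore", "predator"),
           ("plant", "omnivore"), ("herbivore", "omnivore")]
census = ordered_triad_census(toy_web, toy_rank)
print(census)
```

    Because the node order fixes which position each node occupies, two substructures that are isomorphic as plain directed graphs can still receive different codes, which is the distinction the abstract refers to as resolving members of an isomorphism class.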

  5. Motif analysis in directed ordered networks and applications to food webs.

    PubMed

    Paulau, Pavel V; Feenders, Christoph; Blasius, Bernd

    2015-01-01

    The analysis of small recurrent substructures, so called network motifs, has become a standard tool of complex network science to unveil the design principles underlying the structure of empirical networks. In many natural systems network nodes are associated with an intrinsic property according to which they can be ordered and compared against each other. Here, we expand standard motif analysis to be able to capture the hierarchical structure in such ordered networks. Our new approach is based on the identification of all ordered 3-node substructures and the visualization of their significance profile. We present a technique to calculate the fine grained motif spectrum by resolving the individual members of isomorphism classes (sets of substructures formed by permuting node-order). We apply this technique to computer generated ensembles of ordered networks and to empirical food web data, demonstrating the importance of considering node order for food-web analysis. Our approach may not only be helpful to identify hierarchical patterns in empirical food webs and other natural networks, it may also provide the base for extending motif analysis to other types of multi-layered networks.

  6. iRegulon: From a Gene List to a Gene Regulatory Network Using Large Motif and Track Collections

    PubMed Central

    Imrichová, Hana; Van de Sande, Bram; Standaert, Laura; Christiaens, Valerie; Hulselmans, Gert; Herten, Koen; Naval Sanchez, Marina; Potier, Delphine; Svetlichnyy, Dmitry; Kalender Atak, Zeynep; Fiers, Mark; Marine, Jean-Christophe; Aerts, Stein

    2014-01-01

    Identifying master regulators of biological processes and mapping their downstream gene networks are key challenges in systems biology. We developed a computational method, called iRegulon, to reverse-engineer the transcriptional regulatory network underlying a co-expressed gene set using cis-regulatory sequence analysis. iRegulon implements a genome-wide ranking-and-recovery approach to detect enriched transcription factor motifs and their optimal sets of direct targets. We increase the accuracy of network inference by using very large motif collections of up to ten thousand position weight matrices collected from various species, and linking these to candidate human TFs via a motif2TF procedure. We validate iRegulon on gene sets derived from ENCODE ChIP-seq data with increasing levels of noise, and we compare iRegulon with existing motif discovery methods. Next, we use iRegulon on more challenging types of gene lists, including microRNA target sets, protein-protein interaction networks, and genetic perturbation data. In particular, we over-activate p53 in breast cancer cells, followed by RNA-seq and ChIP-seq, and could identify an extensive up-regulated network controlled directly by p53. Similarly we map a repressive network with no indication of direct p53 regulation but rather an indirect effect via E2F and NFY. Finally, we generalize our computational framework to include regulatory tracks such as ChIP-seq data and show how motif and track discovery can be combined to map functional regulatory interactions among co-expressed genes. iRegulon is available as a Cytoscape plugin from http://iregulon.aertslab.org. PMID:25058159

  7. A cis-regulatory sequence from a short intergenic region gives rise to a strong microbe-associated molecular pattern-responsive synthetic promoter.

    PubMed

    Lehmeyer, Mona; Hanko, Erik K R; Roling, Lena; Gonzalez, Lilian; Wehrs, Maren; Hehl, Reinhard

    2016-06-01

    The high gene density in Arabidopsis thaliana leaves only relatively short intergenic regions for potential cis-regulatory sequences. To learn more about the regulation of genes harbouring only very short upstream intergenic regions, this study investigates a recently identified novel microbe-associated molecular pattern (MAMP)-responsive cis-sequence located within the 101 bp long intergenic region upstream of the At1g13990 gene. It is shown that the cis-regulatory sequence is sufficient for MAMP-responsive reporter gene activity in the context of its native promoter. The 3' UTR of the upstream gene has a quantitative effect on gene expression. In the context of a synthetic promoter, the cis-sequence is shown to achieve a strong increase in reporter gene activity as a monomer, dimer and tetramer. Mutation analysis of the cis-sequence determined the specific nucleotides required for gene expression activation. In transgenic A. thaliana the synthetic promoter harbouring a tetramer of the cis-sequence not only drives strong pathogen-responsive reporter gene expression but also shows a high background activity. The results of this study contribute to our understanding of how genes with very short upstream intergenic regions are regulated and how these regions can serve as a source for MAMP-responsive cis-sequences for synthetic promoter design.

  8. ‘In silico expression analysis’, a novel PathoPlant web tool to identify abiotic and biotic stress conditions associated with specific cis-regulatory sequences

    PubMed Central

    Machens, Fabian; Brill, Yuri; Romanov, Artyom; Bülow, Lorenz; Hehl, Reinhard

    2014-01-01

    Using bioinformatics, putative cis-regulatory sequences can be easily identified by running pattern recognition programs on promoters of specific gene sets. The abundance of predicted cis-sequences makes it a major challenge to associate these sequences with a possible function in gene expression regulation. To identify a possible function of the predicted cis-sequences, a novel web tool designated ‘in silico expression analysis’ was developed that correlates submitted cis-sequences with gene expression data from Arabidopsis thaliana. The web tool identifies the A. thaliana genes harbouring the sequence in a defined promoter region and compares the expression of these genes with microarray data. The result is a hierarchy of abiotic and biotic stress conditions to which these genes are most likely responsive. When testing the performance of the web tool, known cis-regulatory sequences were submitted to the ‘in silico expression analysis’ resulting in the correct identification of the associated stress conditions. When using a recently identified novel elicitor-responsive sequence, a WT-box (CGACTTTT), the ‘in silico expression analysis’ predicts that genes harbouring this sequence in their promoter are most likely Botrytis cinerea induced. Consistent with this prediction, the strongest induction of a reporter gene harbouring this sequence in the promoter is observed with B. cinerea in transgenic A. thaliana. Database URL: http://www.pathoplant.de/expression_analysis.php. PMID:24727366
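
    The two steps described above (find the genes whose promoter harbours a submitted sequence, then order conditions by how those genes respond) can be sketched as follows. This is a simplified illustration, not the PathoPlant implementation: searching both strands is an assumption, the promoter sequences and expression values are invented toy data, and all function names are hypothetical. Only the WT-box sequence CGACTTTT comes from the abstract.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def genes_with_motif(promoters, motif):
    """Return gene IDs whose promoter contains the motif on either strand."""
    rc = reverse_complement(motif)
    return sorted(gene for gene, seq in promoters.items()
                  if motif in seq.upper() or rc in seq.upper())

def rank_conditions(expression, genes):
    """Rank conditions by mean expression change of the selected genes, highest first."""
    means = {}
    for condition, values in expression.items():
        selected = [values[g] for g in genes if g in values]
        if selected:
            means[condition] = sum(selected) / len(selected)
    return sorted(means.items(), key=lambda item: item[1], reverse=True)

# Invented toy promoters: geneA has the WT-box, geneB its reverse complement.
toy_promoters = {"geneA": "ttgaCGACTTTTgca", "geneB": "AAAAGTCGTT", "geneC": "ACGTACGTACGT"}
# Invented log2 fold changes per condition.
toy_expression = {
    "condition_1": {"geneA": 2.0, "geneB": 3.0, "geneC": 0.0},
    "condition_2": {"geneA": 0.5, "geneB": -0.5, "geneC": 1.0},
}
genes = genes_with_motif(toy_promoters, "CGACTTTT")
ranking = rank_conditions(toy_expression, genes)
print(genes, ranking)
```

    The ranked list plays the role of the hierarchy of conditions mentioned in the abstract; a real analysis would also need a statistical test against genes lacking the sequence.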

  9. Cell adhesion molecule pathway genes are regulated by cis-regulatory SNPs and show significantly altered expression in Alzheimer's disease brains.

    PubMed

    Bao, Xinjie; Liu, Gengfeng; Jiang, Yongshuai; Jiang, Qinghua; Liao, Mingzhi; Feng, Rennan; Zhang, Liangcai; Ma, Guoda; Zhang, Shuyan; Chen, Zugen; Zhao, Bin; Wang, Renzhi; Li, Keshen; Liu, Guiyou

    2015-10-01

    We previously identified the cell adhesion molecule (CAM) pathway as a consistent signal in 2 Alzheimer's disease (AD) genome-wide association studies (GWAS). However, the genetic mechanisms of the CAM pathway in AD are unclear. Here, we conducted pathway analysis using (1) Kyoto Encyclopedia of Genes and Genomes and Gene Ontology pathways; (2) 4 brain expression GWAS datasets; and (3) 2 whole-genome AD case-control expression datasets. Using the 4 brain expression GWAS datasets, we identified that genes regulated by cis-regulatory single-nucleotide polymorphisms (SNPs) were significantly enriched in the CAM pathway (p = 2.05E-06, p = 6.10E-07, p = 2.05E-06, and p = 1.47E-07 for each dataset). Interestingly, CAM is a significantly enriched pathway using down-regulated genes (raw p = 0.0235 and adjusted p = 0.0305) and all differentially expressed genes (raw p = 0.0105 and adjusted p = 0.0156) in dataset 5, and all differentially expressed genes (raw p = 0.0041 and adjusted p = 0.0062) in dataset 6. Collectively, our results show that CAM pathway genes are regulated by cis-regulatory SNPs and show significantly altered expression in AD. We believe that our results advance the understanding of AD mechanisms and will be useful for future genetic studies of AD.

  10. A cis-regulatory sequence from a short intergenic region gives rise to a strong microbe-associated molecular pattern-responsive synthetic promoter.

    PubMed

    Lehmeyer, Mona; Hanko, Erik K R; Roling, Lena; Gonzalez, Lilian; Wehrs, Maren; Hehl, Reinhard

    2016-06-01

    The high gene density in Arabidopsis thaliana leaves only relatively short intergenic regions for potential cis-regulatory sequences. To learn more about the regulation of genes harbouring only very short upstream intergenic regions, this study investigates a recently identified novel microbe-associated molecular pattern (MAMP)-responsive cis-sequence located within the 101 bp long intergenic region upstream of the At1g13990 gene. It is shown that the cis-regulatory sequence is sufficient for MAMP-responsive reporter gene activity in the context of its native promoter. The 3' UTR of the upstream gene has a quantitative effect on gene expression. In the context of a synthetic promoter, the cis-sequence is shown to achieve a strong increase in reporter gene activity as a monomer, dimer and tetramer. Mutation analysis of the cis-sequence determined the specific nucleotides required for gene expression activation. In transgenic A. thaliana the synthetic promoter harbouring a tetramer of the cis-sequence not only drives strong pathogen-responsive reporter gene expression but also shows a high background activity. The results of this study contribute to our understanding of how genes with very short upstream intergenic regions are regulated and how these regions can serve as a source for MAMP-responsive cis-sequences for synthetic promoter design. PMID:26833485

  11. 'In silico expression analysis', a novel PathoPlant web tool to identify abiotic and biotic stress conditions associated with specific cis-regulatory sequences.

    PubMed

    Bolívar, Julio C; Machens, Fabian; Brill, Yuri; Romanov, Artyom; Bülow, Lorenz; Hehl, Reinhard

    2014-01-01

    Using bioinformatics, putative cis-regulatory sequences can be easily identified by running pattern recognition programs on promoters of specific gene sets. The abundance of predicted cis-sequences makes it a major challenge to associate these sequences with a possible function in gene expression regulation. To identify a possible function of the predicted cis-sequences, a novel web tool designated 'in silico expression analysis' was developed that correlates submitted cis-sequences with gene expression data from Arabidopsis thaliana. The web tool identifies the A. thaliana genes harbouring the sequence in a defined promoter region and compares the expression of these genes with microarray data. The result is a hierarchy of abiotic and biotic stress conditions to which these genes are most likely responsive. When testing the performance of the web tool, known cis-regulatory sequences were submitted to the 'in silico expression analysis' resulting in the correct identification of the associated stress conditions. When using a recently identified novel elicitor-responsive sequence, a WT-box (CGACTTTT), the 'in silico expression analysis' predicts that genes harbouring this sequence in their promoter are most likely Botrytis cinerea induced. Consistent with this prediction, the strongest induction of a reporter gene harbouring this sequence in the promoter is observed with B. cinerea in transgenic A. thaliana. DATABASE URL: http://www.pathoplant.de/expression_analysis.php. PMID:24727366

  12. Transvection in the Drosophila Abd-B domain: extensive upstream sequences are involved in anchoring distant cis-regulatory regions to the promoter.

    PubMed Central

    Sipos, L; Mihály, J; Karch, F; Schedl, P; Gausz, J; Gyurkovics, H

    1998-01-01

    The Abd-B gene, one of the three homeotic genes in the Drosophila bithorax complex (BX-C), is required for the proper identity of the fifth through the eighth abdominal segments (corresponding to parasegments 10-14) of the fruitfly. The morphological difference between these four segments is due to the differential expression of Abd-B, which is achieved by the action of the parasegment-specific cis-regulatory regions infra-abdominal-5 (iab-5), -6, -7 and -8. The dominant gain-of-function mutation Frontabdominal-7 (Fab-7) removes a boundary separating two of these cis-regulatory regions, iab-6 and iab-7. As a consequence of the Fab-7 deletion, the parasegment 12- (PS12-) specific iab-7 is ectopically activated in PS11. This results in the transformation of the sixth abdominal segment (A6) into the seventh (A7) in Fab-7 flies. Here we report that point mutations of the Abd-B gene in trans suppress the Fab-7 phenotype in a pairing-dependent manner and thus represent a type of transvection. We show that the observed suppression is the result of trans-regulation of the defective Abd-B gene by the ectopically activated iab-7. Unlike previously demonstrated cases of trans-regulation in the Abd-B locus, trans-suppression of Fab-7 is sensitive to heterozygosity for chromosomal rearrangements that disturb homologous pairing at the nearby Ubx locus. However, in contrast to Ubx, the transvection we observed in the Abd-B locus is insensitive to the allelic status of zeste. Analysis of different deletion alleles of Abd-B that enhance trans-regulation suggests that an extensive upstream region, different from the sequences required for transcription initiation, mediates interactions between the iab cis-regulatory regions and the proximal Abd-B promoter. 
    Moreover, we find that the amount of DNA deleted in the upstream region is roughly proportional to the strength of trans-interaction, suggesting that this region consists of numerous discrete elements that cooperate in tethering

  13. Directional Phosphorylation and Nuclear Transport of the Splicing Factor SRSF1 Is Regulated by an RNA Recognition Motif.

    PubMed

    Serrano, Pedro; Aubol, Brandon E; Keshwani, Malik M; Forli, Stefano; Ma, Chen-Ting; Dutta, Samit K; Geralt, Michael; Wüthrich, Kurt; Adams, Joseph A

    2016-06-01

    Multisite phosphorylation is required for the biological function of serine-arginine (SR) proteins, a family of essential regulators of mRNA splicing. These modifications are catalyzed by serine-arginine protein kinases (SRPKs) that phosphorylate numerous serines in arginine-serine-rich (RS) domains of SR proteins using a directional, C-to-N-terminal mechanism. The present studies explore how SRPKs govern this highly biased phosphorylation reaction and investigate biological roles of the observed directional phosphorylation mechanism. Using NMR spectroscopy with two separately expressed domains of SRSF1, we showed that several residues in the RNA-binding motif 2 interact with the N-terminal region of the RS domain (RS1). These contacts provide a structural framework that balances the activities of SRPK1 and the protein phosphatase PP1, thereby regulating the phosphoryl content of the RS domain. Disruption of the implicated intramolecular RNA-binding motif 2-RS domain interaction impairs both the directional phosphorylation mechanism and the nuclear translocation of SRSF1 demonstrating that the intrinsic phosphorylation bias is obligatory for SR protein biological function. PMID:27091468

  14. Two negative cis-regulatory regions involved in fruit-specific promoter activity from watermelon (Citrullus vulgaris S.).

    PubMed

    Yin, Tao; Wu, Hanying; Zhang, Shanglong; Lu, Hongyu; Zhang, Lingxiao; Xu, Yong; Chen, Daming; Liu, Jingmei

    2009-01-01

    A 1.8 kb 5'-flanking region of the large subunit of ADP-glucose pyrophosphorylase, isolated from watermelon (Citrullus vulgaris S.), has fruit-specific promoter activity in transgenic tomato plants. Two negative regulatory regions, from -986 to -959 and from -472 to -424, were identified in this promoter region by fine deletion analyses. Removal of both regions led to constitutive expression in epidermal cells. Gain-of-function experiments showed that these two regions were sufficient to inhibit RFP (red fluorescent protein) expression in transformed epidermal cells when fused to the cauliflower mosaic virus (CaMV) 35S minimal promoter. Gel mobility shift experiments demonstrated the presence of leaf nuclear factors that interact with these two elements. A TCCAAAA motif was identified in these two regions, as well as one in the reverse orientation, which was confirmed to be a novel specific cis-element. A quantitative beta-glucuronidase (GUS) activity assay of stable transgenic tomato plants showed that the activities of chimeric promoters harbouring only one of the two cis-elements, or both, were approximately 10-fold higher in fruits than in leaves. These data confirm that the TCCAAAA motif functions as a fruit-specific element by inhibiting gene expression in leaves.

  15. The Significance of Multivalent Bonding Motifs and "Bond Order" in DNA-Directed Nanoparticle Crystallization.

    PubMed

    Thaner, Ryan V; Eryazici, Ibrahim; Macfarlane, Robert J; Brown, Keith A; Lee, Byeongdu; Nguyen, SonBinh T; Mirkin, Chad A

    2016-05-18

    Multivalent oligonucleotide-based bonding elements have been synthesized and studied for the assembly and crystallization of gold nanoparticles. Through the use of organic branching points, divalent and trivalent DNA linkers were readily incorporated into the oligonucleotide shells that define DNA-nanoparticles and compared to monovalent linker systems. These multivalent bonding motifs enable the change of "bond strength" between particles and therefore modulate the effective "bond order." In addition, the improved accessibility of strands between neighboring particles, either due to multivalency or modifications to increase strand flexibility, gives rise to superlattices with less strain in the crystallites compared to traditional designs. Furthermore, the increased availability and number of binding modes also provide a new variable that allows previously unobserved crystal structures to be synthesized, as evidenced by the formation of a thorium phosphide superlattice. PMID:27148838

  16. Microevolution of cis-regulatory elements: an example from the pair-rule segmentation gene fushi tarazu in the Drosophila melanogaster subgroup.

    PubMed

    Bakkali, Mohammed

    2011-01-01

    The importance of non-coding DNAs that control transcription is increasingly apparent, but the characterization and analysis of the evolution of such DNAs present challenges not found in the analysis of coding sequences. In this study of the cis-regulatory elements of the pair-rule segmentation gene fushi tarazu (ftz), I report the DNA sequences of ftz's zebra element (promoter) and a region containing the proximal enhancer from a total of 45 fly lines belonging to several populations of the species Drosophila melanogaster, D. simulans, D. sechellia, D. mauritiana, D. yakuba, D. teissieri, D. orena and D. erecta. Both elements evolve at a slower rate than ftz synonymous sites, thus reflecting their functional importance. The promoter evolves more slowly than the average for ftz's coding sequence while, on average, the enhancer evolves more rapidly, suggesting more functional constraint and effective purifying selection on the former. Comparative analysis of the number and nature of base substitutions failed to detect significant evidence for positive/adaptive selection in transcription-factor-binding sites. These seem to evolve at similar rates to regions not known to bind transcription factors. Although this result reflects the evolutionary flexibility of the transcription factor binding sites, it also suggests a complex and still not completely understood nature of even the characterized cis-regulatory sequences. The latter seem to contain more functional parts than those currently identified, some of which probably bind transcription factors. This study illustrates ways in which functional assignments of sequences within cis-acting sequences can be used in the search for adaptive evolution, but also highlights difficulties in how such functional assignment and analysis can be carried out.

  17. Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles.

    PubMed

    Gautheret, D; Lambert, A

    2001-11-01

    We present here a new approach to the problem of defining RNA signatures and finding their occurrences in sequence databases. The proposed method is based on "secondary structure profiles". An RNA sequence alignment with secondary structure information is used as an input. Two types of weight matrices/profiles are constructed from this alignment: single strands are represented by a classical lod-scores profile while helical regions are represented by an extended "helical profile" comprising 16 lod-scores per position, one for each of the 16 possible base-pairs. Database searches are then conducted using a simultaneous search for helical profiles and dynamic programming alignment of single strand profiles. The algorithm has been implemented into a new software, ERPIN, that performs both profile construction and database search. Applications are presented for several RNA motifs. The automated use of sequence information in both single-stranded and helical regions yields better sensitivity/specificity ratios than descriptor-based programs. Furthermore, since the translation of alignments into profiles is straightforward with ERPIN, iterative searches can easily be conducted to enrich collections of homologous RNAs.
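
    The single-strand half of the approach, a lod-score profile built from an alignment, can be illustrated with a short sketch. This is not ERPIN's code: the uniform background, the pseudocount, the ungapped toy alignment and the function names are all assumptions. A helical profile would follow the same pattern with 16 base-pair entries per position instead of 4 bases, and is not shown.

```python
import math
from collections import Counter

def lod_profile(alignment, background=0.25, pseudocount=1.0):
    """Build per-column log-odds scores from equal-length, ungapped RNA sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + 4 * pseudocount
        profile.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in "ACGU"
        })
    return profile

def score(profile, sequence):
    """Sum the lod scores of a candidate; assumes it has the profile's length."""
    return sum(column[base] for column, base in zip(profile, sequence))

# Invented toy alignment of a 4-nucleotide single-stranded region.
toy_alignment = ["GAAA", "GCAA", "GUGA"]
profile = lod_profile(toy_alignment)
print(score(profile, "GAAA"), score(profile, "CUUU"))
```

    Candidates resembling the alignment score above zero and dissimilar ones below it, which is what lets a database search keep only windows above a chosen cutoff.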

  18. Sex Chromosome-wide Transcriptional Suppression and Compensatory Cis-Regulatory Evolution Mediate Gene Expression in the Drosophila Male Germline

    PubMed Central

    Landeen, Emily L.; Muirhead, Christina A.; Meiklejohn, Colin D.; Presgraves, Daven C.

    2016-01-01

    The evolution of heteromorphic sex chromosomes has repeatedly resulted in the evolution of sex chromosome-specific forms of regulation, including sex chromosome dosage compensation in the soma and meiotic sex chromosome inactivation in the germline. In the male germline of Drosophila melanogaster, a novel but poorly understood form of sex chromosome-specific transcriptional regulation occurs that is distinct from canonical sex chromosome dosage compensation or meiotic inactivation. Previous work shows that expression of reporter genes driven by testis-specific promoters is considerably lower—approximately 3-fold or more—for transgenes inserted into X chromosome versus autosome locations. Here we characterize this transcriptional suppression of X-linked genes in the male germline and its evolutionary consequences. Using transgenes and transpositions, we show that most endogenous X-linked genes, not just testis-specific ones, are transcriptionally suppressed several-fold specifically in the Drosophila male germline. In wild-type testes, this sex chromosome-wide transcriptional suppression is generally undetectable, being effectively compensated by the gene-by-gene evolutionary recruitment of strong promoters on the X chromosome. We identify and experimentally validate a promoter element sequence motif that is enriched upstream of the transcription start sites of hundreds of testis-expressed genes; evolutionarily conserved across species; associated with strong gene expression levels in testes; and overrepresented on the X chromosome. These findings show that the expression of X-linked genes in the Drosophila testes reflects a balance between chromosome-wide epigenetic transcriptional suppression and long-term compensatory adaptation by sex-linked genes. Our results have broad implications for the evolution of gene expression in the Drosophila male germline and for genome evolution. PMID:27404402

  19. Sex Chromosome-wide Transcriptional Suppression and Compensatory Cis-Regulatory Evolution Mediate Gene Expression in the Drosophila Male Germline.

    PubMed

    Landeen, Emily L; Muirhead, Christina A; Wright, Lori; Meiklejohn, Colin D; Presgraves, Daven C

    2016-07-01

    The evolution of heteromorphic sex chromosomes has repeatedly resulted in the evolution of sex chromosome-specific forms of regulation, including sex chromosome dosage compensation in the soma and meiotic sex chromosome inactivation in the germline. In the male germline of Drosophila melanogaster, a novel but poorly understood form of sex chromosome-specific transcriptional regulation occurs that is distinct from canonical sex chromosome dosage compensation or meiotic inactivation. Previous work shows that expression of reporter genes driven by testis-specific promoters is considerably lower (approximately 3-fold or more) for transgenes inserted into X chromosome versus autosome locations. Here we characterize this transcriptional suppression of X-linked genes in the male germline and its evolutionary consequences. Using transgenes and transpositions, we show that most endogenous X-linked genes, not just testis-specific ones, are transcriptionally suppressed several-fold specifically in the Drosophila male germline. In wild-type testes, this sex chromosome-wide transcriptional suppression is generally undetectable, being effectively compensated by the gene-by-gene evolutionary recruitment of strong promoters on the X chromosome. We identify and experimentally validate a promoter element sequence motif that is enriched upstream of the transcription start sites of hundreds of testis-expressed genes; evolutionarily conserved across species; associated with strong gene expression levels in testes; and overrepresented on the X chromosome. These findings show that the expression of X-linked genes in the Drosophila testes reflects a balance between chromosome-wide epigenetic transcriptional suppression and long-term compensatory adaptation by sex-linked genes. Our results have broad implications for the evolution of gene expression in the Drosophila male germline and for genome evolution. PMID:27404402

  20. Integrative Modeling of eQTLs and Cis-Regulatory Elements Suggests Mechanisms Underlying Cell Type Specificity of eQTLs

    PubMed Central

    Brown, Christopher D.; Mangravite, Lara M.; Engelhardt, Barbara E.

    2013-01-01

    Genetic variants in cis-regulatory elements or trans-acting regulators frequently influence the quantity and spatiotemporal distribution of gene transcription. Recent interest in expression quantitative trait locus (eQTL) mapping has paralleled the adoption of genome-wide association studies (GWAS) for the analysis of complex traits and disease in humans. Under the hypothesis that many GWAS associations tag non-coding SNPs with small effects, and that these SNPs exert phenotypic control by modifying gene expression, it has become common to interpret GWAS associations using eQTL data. To fully exploit the mechanistic interpretability of eQTL-GWAS comparisons, an improved understanding of the genetic architecture and causal mechanisms of cell type specificity of eQTLs is required. We address this need by performing an eQTL analysis in three parts: first we identified eQTLs from eleven studies on seven cell types; then we integrated eQTL data with cis-regulatory element (CRE) data from the ENCODE project; finally we built a set of classifiers to predict the cell type specificity of eQTLs. The cell type specificity of eQTLs is associated with eQTL SNP overlap with hundreds of cell type specific CRE classes, including enhancer, promoter, and repressive chromatin marks, regions of open chromatin, and many classes of DNA binding proteins. These associations provide insight into the molecular mechanisms generating the cell type specificity of eQTLs and the mode of regulation of corresponding eQTLs. Using a random forest classifier with cell specific CRE-SNP overlap as features, we demonstrate the feasibility of predicting the cell type specificity of eQTLs. We then demonstrate that CREs from a trait-associated cell type can be used to annotate GWAS associations in the absence of eQTL data for that cell type. We anticipate that such integrative, predictive modeling of cell specificity will improve our ability to understand the mechanistic basis of human complex phenotypic
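
    The "CRE-SNP overlap as features" idea in the abstract above can be illustrated with a minimal sketch. It only shows how one SNP position might be turned into binary overlap flags, one per CRE track, which a classifier such as a random forest could then consume. The function name `cre_overlap_features`, the track names, and all coordinates are hypothetical and do not come from the paper or from ENCODE.

```python
from bisect import bisect_right

def cre_overlap_features(snp_pos, cre_tracks):
    """Return {track_name: 0 or 1} overlap flags for a single SNP position.

    cre_tracks maps a (hypothetical) track name to a sorted list of
    non-overlapping, half-open (start, end) intervals on the same
    chromosome as the SNP.
    """
    features = {}
    for name, intervals in cre_tracks.items():
        starts = [start for start, _ in intervals]
        # Index of the last interval whose start is <= snp_pos, or -1.
        i = bisect_right(starts, snp_pos) - 1
        hit = i >= 0 and intervals[i][0] <= snp_pos < intervals[i][1]
        features[name] = int(hit)
    return features

# Toy, made-up annotation tracks (not real data).
toy_tracks = {
    "enhancer:cellA": [(100, 200), (500, 650)],
    "open_chromatin:cellB": [(150, 300)],
}

if __name__ == "__main__":
    print(cre_overlap_features(180, toy_tracks))
```

    Each SNP then becomes one row of 0/1 values. Half-open intervals are assumed here only because they are a common convention for genomic coordinates.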

  1. Germ line and embryonic expression of Fex, a member of the Drosophila F-element retrotransposon family, is mediated by an internal cis-regulatory control region.

    PubMed Central

    Kerber, B; Fellert, S; Taubert, H; Hoch, M

    1996-01-01

    The F elements of Drosophila melanogaster belong to the superfamily of long interspersed nucleotide element retrotransposons. To date, F-element transcription has not been detected in flies. Here we describe the isolation of a member of the F-element family, termed Fex, which is transcribed in specific cells of the female and male germ lines and in various tissues during embryogenesis of D. melanogaster. Sequence analysis revealed that this element contains two complete open reading frames coding for a putative nucleic acid-binding protein and a putative reverse transcriptase. Functional analysis of the 5' region, using germ line transformation of Fex-lacZ reporter gene constructs, demonstrates that major aspects of tissue-specific Fex expression are controlled by internal cis-acting elements that lie in the putative coding region of open reading frame 1. These sequences mediate dynamic gene expression in eight expression domains during embryonic and germ line development. The capacity of the cis-regulatory region of the Fex element to mediate such complex expression patterns is unique among members of the long interspersed nucleotide element superfamily of retrotransposons and is reminiscent of regulatory regions of developmental control genes. PMID:8649411

  2. A Hox Transcription Factor Collective Binds a Highly Conserved Distal-less cis-Regulatory Module to Generate Robust Transcriptional Outcomes

    PubMed Central

    Uhl, Juli D.; Zandvakili, Arya; Gebelein, Brian

    2016-01-01

    cis-regulatory modules (CRMs) generate precise expression patterns by integrating numerous transcription factors (TFs). Surprisingly, CRMs that control essential gene patterns can differ greatly in conservation, suggesting distinct constraints on TF binding sites. Here, we show that a highly conserved Distal-less regulatory element (DCRE) that controls gene expression in leg precursor cells recruits multiple Hox, Extradenticle (Exd) and Homothorax (Hth) complexes to mediate dual outputs: thoracic activation and abdominal repression. Using reporter assays, we found that abdominal repression is particularly robust, as neither individual binding site mutations nor a DNA binding deficient Hth protein abolished cooperative DNA binding and in vivo repression. Moreover, a re-engineered DCRE containing a distinct configuration of Hox, Exd, and Hth sites also mediated abdominal Hox repression. However, the re-engineered DCRE failed to perform additional segment-specific functions such as thoracic activation. These findings are consistent with two emerging concepts in gene regulation: First, the abdominal Hox/Exd/Hth factors utilize protein-protein and protein-DNA interactions to form repression complexes on flexible combinations of sites, consistent with the TF collective model of CRM organization. Second, the conserved DCRE mediates multiple cell-type specific outputs, consistent with recent findings that pleiotropic CRMs are associated with conserved TF binding and added evolutionary constraints. PMID:27058369

  3. A Hox Transcription Factor Collective Binds a Highly Conserved Distal-less cis-Regulatory Module to Generate Robust Transcriptional Outcomes.

    PubMed

    Uhl, Juli D; Zandvakili, Arya; Gebelein, Brian

    2016-04-01

    cis-regulatory modules (CRMs) generate precise expression patterns by integrating numerous transcription factors (TFs). Surprisingly, CRMs that control essential gene patterns can differ greatly in conservation, suggesting distinct constraints on TF binding sites. Here, we show that a highly conserved Distal-less regulatory element (DCRE) that controls gene expression in leg precursor cells recruits multiple Hox, Extradenticle (Exd) and Homothorax (Hth) complexes to mediate dual outputs: thoracic activation and abdominal repression. Using reporter assays, we found that abdominal repression is particularly robust, as neither individual binding site mutations nor a DNA binding deficient Hth protein abolished cooperative DNA binding and in vivo repression. Moreover, a re-engineered DCRE containing a distinct configuration of Hox, Exd, and Hth sites also mediated abdominal Hox repression. However, the re-engineered DCRE failed to perform additional segment-specific functions such as thoracic activation. These findings are consistent with two emerging concepts in gene regulation: First, the abdominal Hox/Exd/Hth factors utilize protein-protein and protein-DNA interactions to form repression complexes on flexible combinations of sites, consistent with the TF collective model of CRM organization. Second, the conserved DCRE mediates multiple cell-type specific outputs, consistent with recent findings that pleiotropic CRMs are associated with conserved TF binding and added evolutionary constraints. PMID:27058369

  4. A cis-regulatory mutation in troponin-I of Drosophila reveals the importance of proper stoichiometry of structural proteins during muscle assembly.

    PubMed

    Firdaus, Hena; Mohan, Jayaram; Naz, Sarwat; Arathi, Prabhashankar; Ramesh, Saraf R; Nongthomba, Upendra

    2015-05-01

    Rapid and high wing-beat frequencies achieved during insect flight are powered by the indirect flight muscles, the largest group of muscles present in the thorax. Any anomaly during the assembly and/or structural impairment of the indirect flight muscles gives rise to a flightless phenotype. Multiple mutagenesis screens in Drosophila melanogaster for defective flight behavior have led to the isolation and characterization of mutations that have been instrumental in the identification of many proteins and residues that are important for muscle assembly, function, and disease. In this article, we present a molecular-genetic characterization of a flightless mutation, flightless-H (fliH), originally designated as heldup-a (hdp-a). We show that fliH is a cis-regulatory mutation of the wings up A (wupA) gene, which codes for the troponin-I protein, one of the troponin complex proteins, involved in regulation of muscle contraction. The mutation leads to reduced levels of troponin-I transcript and protein. In addition to this, there is also coordinated reduction in transcript and protein levels of other structural protein isoforms that are part of the troponin complex. The altered transcript and protein stoichiometry ultimately culminates in unregulated acto-myosin interactions and a hypercontraction muscle phenotype. Our results provide new insights into the importance of maintaining the stoichiometry of structural proteins during muscle assembly for proper function, with implications for the identification of mutations and disease phenotypes in other species, including humans.

  5. Maps of cis-Regulatory Nodes in Megabase Long Genome Segments are an Inevitable Intermediate Step Toward Whole Genome Functional Mapping

    PubMed Central

    Nikolaev, Lev G; Akopov, Sergey B; Chernov, Igor P; Sverdlov, Eugene D

    2007-01-01

    The availability of complete human and other metazoan genome sequences has greatly facilitated positioning and analysis of various genomic functional elements, with initial emphasis on coding sequences. However, complete functional maps of sequenced eukaryotic genomes should include also positions of all non-coding regulatory elements. Unfortunately, experimental data on genomic positions of a multitude of regulatory sequences, such as enhancers, silencers, insulators, transcription terminators, and replication origins are very limited, especially at the whole genome level. Since most genomic regulatory elements (e.g. enhancers) are generally gene-, tissue-, or cell-specific, the prediction of these elements by computational methods is difficult and often ambiguous. Therefore, the development of high-throughput experimental approaches for identifying and mapping genomic functional elements is highly desirable. At the same time, the creation of whole-genome map of hundreds of thousands of regulatory elements in several hundreds of tissue/cell types is presently far beyond our capabilities. A possible alternative for the whole genome approach is to concentrate efforts on individual genomic segments and then to integrate the data obtained into a whole genome functional map. Moreover, the maps of polygenic fragments with functional cis-regulatory elements would provide valuable data on complex regulatory systems, including their variability and evolution. Here, we reviewed experimental approaches to the realization of these ideas, including our own developments of experimental techniques for selection of cis-acting functionally active DNA fragments from large (megabase-sized) segments of mammalian genomes. PMID:18660850

  6. Maps of cis-Regulatory Nodes in Megabase Long Genome Segments are an Inevitable Intermediate Step Toward Whole Genome Functional Mapping.

    PubMed

    Nikolaev, Lev G; Akopov, Sergey B; Chernov, Igor P; Sverdlov, Eugene D

    2007-04-01

    The availability of complete human and other metazoan genome sequences has greatly facilitated positioning and analysis of various genomic functional elements, with initial emphasis on coding sequences. However, complete functional maps of sequenced eukaryotic genomes should include also positions of all non-coding regulatory elements. Unfortunately, experimental data on genomic positions of a multitude of regulatory sequences, such as enhancers, silencers, insulators, transcription terminators, and replication origins are very limited, especially at the whole genome level. Since most genomic regulatory elements (e.g. enhancers) are generally gene-, tissue-, or cell-specific, the prediction of these elements by computational methods is difficult and often ambiguous. Therefore, the development of high-throughput experimental approaches for identifying and mapping genomic functional elements is highly desirable. At the same time, the creation of whole-genome map of hundreds of thousands of regulatory elements in several hundreds of tissue/cell types is presently far beyond our capabilities. A possible alternative for the whole genome approach is to concentrate efforts on individual genomic segments and then to integrate the data obtained into a whole genome functional map. Moreover, the maps of polygenic fragments with functional cis-regulatory elements would provide valuable data on complex regulatory systems, including their variability and evolution. Here, we reviewed experimental approaches to the realization of these ideas, including our own developments of experimental techniques for selection of cis-acting functionally active DNA fragments from large (megabase-sized) segments of mammalian genomes. PMID:18660850

  8. Autosomal recessive retinitis pigmentosa with homozygous rhodopsin mutation E150K and non-coding cis-regulatory variants in CRX-binding regions of SAMD7

    PubMed Central

    Van Schil, Kristof; Karlstetter, Marcus; Aslanidis, Alexander; Dannhausen, Katharina; Azam, Maleeha; Qamar, Raheel; Leroy, Bart P.; Depasse, Fanny; Langmann, Thomas; De Baere, Elfride

    2016-01-01

    The aim of this study was to unravel the molecular pathogenesis of an unusual retinitis pigmentosa (RP) phenotype observed in a Turkish consanguineous family. Homozygosity mapping revealed two candidate genes, SAMD7 and RHO. A homozygous RHO mutation c.448G>A, p.E150K was found in two affected siblings, while no coding SAMD7 mutations were identified. Interestingly, four non-coding homozygous variants were found in two SAMD7 genomic regions relevant for binding of the retinal transcription factor CRX (CRX-bound regions, CBRs) in these affected siblings. Three variants are located in a promoter CBR termed CBR1, while the fourth is located more downstream in CBR2. Transcriptional activity of these variants was assessed by luciferase assays and electroporation of mouse retinal explants with reporter constructs of wild-type and variant SAMD7 CBRs. The combined CBR2/CBR1 variant construct showed significantly decreased SAMD7 reporter activity compared to the wild-type sequence, suggesting a cis-regulatory effect on SAMD7 expression. As Samd7 is a recently identified Crx-regulated transcriptional repressor in retina, we hypothesize that these SAMD7 variants might contribute to the retinal phenotype observed here, characterized by unusual, recognizable pigment deposits, differing from the classic spicular intraretinal pigmentation observed in other individuals homozygous for p.E150K, and typically associated with RP in general. PMID:26887858

  9. Application of the cis-regulatory region of a heat-shock protein 70 gene to heat-inducible gene expression in the ascidian Ciona intestinalis.

    PubMed

    Kawaguchi, Akane; Utsumi, Nanami; Morita, Maki; Ohya, Aya; Wada, Shuichi

    2015-01-01

    Temporally controlled induction of gene expression is a useful technique for analyzing gene function. To make such a technique possible in Ciona intestinalis embryos, we employed the cis-regulatory region of the heat-shock protein 70 (HSP70) gene Ci-HSPA1/6/7-like for heat-inducible gene expression in C. intestinalis embryos. We showed that Ci-HSPA1/6/7-like becomes heat shock-inducible by the 32-cell stage during embryogenesis. The 5'-upstream region of Ci-HSPA1/6/7-like, which contains heat-shock elements indispensable for heat-inducible gene expression, induces the heat shock-dependent expression of a reporter gene in the whole embryo from the 32-cell to the middle gastrula stages and in progressively restricted areas of embryos in subsequent stages. We assessed the effects of heat-shock treatments in different conditions on the normality of embryos and induction of transgene expression. We evaluated the usefulness of this technique through overexpression experiments on the well-characterized, developmentally relevant gene, Ci-Bra, and showed that this technique is applicable for inferring the gene function in C. intestinalis.

  10. The lncRNA Malat1 is dispensable for mouse development but its transcription plays a cis-regulatory role in the adult

    PubMed Central

    Zhang, Bin; Arun, Gayatri; Mao, Yuntao S.; Lazar, Zsolt; Hung, Gene; Bhattacharjee, Gourab; Xiao, Xiaokun; Booth, Carmen J.; Wu, Jie; Zhang, Chaolin; Spector, David L.

    2012-01-01

    SUMMARY Genome-wide studies have identified thousands of long noncoding RNAs (lncRNAs) lacking protein coding capacity. However, most lncRNAs are expressed at a very low level, and in most cases there is no genetic evidence to support their in vivo function. Malat1 (metastasis associated lung adenocarcinoma transcript 1) is among the most abundant and highly conserved lncRNAs, and it exhibits an uncommon 3′-end processing mechanism. In addition, its specific nuclear localization, developmental regulation, and dysregulation in cancer are suggestive of it having a critical biological function. We have characterized a Malat1 loss-of-function genetic model that indicates Malat1 is not essential for mouse pre- and post-natal development. Furthermore, depletion of Malat1 does not impact global gene expression, splicing factor level and phosphorylation status, or alternative pre-mRNA splicing. However, among a small number of genes that were dysregulated in adult Malat1 knockout mice, many were Malat1 neighboring genes, thus indicating a potential cis regulatory role of Malat1 gene transcription. PMID:22840402

  11. Computation of direct and inverse mutations with the SEGM web server (Stochastic Evolution of Genetic Motifs): an application to splice sites of human genome introns.

    PubMed

    Benard, Emmanuel; Michel, Christian J

    2009-08-01

    We present here the SEGM web server (Stochastic Evolution of Genetic Motifs) in order to study the evolution of genetic motifs both in the direct evolutionary sense (past-present) and in the inverse evolutionary sense (present-past). The genetic motifs studied can be nucleotides, dinucleotides and trinucleotides. As an example of an application of SEGM and to understand its functionalities, we give an analysis of inverse mutations of splice sites of human genome introns. SEGM is freely accessible at http://lsiit-bioinfo.u-strasbg.fr:8080/webMathematica/SEGM/SEGM.html directly or by the web site http://dpt-info.u-strasbg.fr/~michel/. To our knowledge, this SEGM web server is to date the only computational biology software in this evolutionary approach.
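
    The direct (past-present) and inverse (present-past) senses described in the abstract above can be illustrated with a minimal sketch for single-nucleotide frequencies. It assumes a one-parameter Jukes-Cantor-style substitution model, chosen here purely for illustration; it is not taken from SEGM and may differ from what the server implements. The function names and the rate `alpha` are hypothetical. Under this assumption each frequency obeys dp/dt = alpha - 4*alpha*p, so p(t) = 1/4 + (p(0) - 1/4) * exp(-4*alpha*t), and the inverse direction simply reverses the exponential.

```python
import math

def evolve_direct(p_past, alpha, t):
    """Past -> present: relax frequencies towards 1/4 over time t.

    Assumes each nucleotide mutates to each of the other three at the
    same rate alpha (Jukes-Cantor-style model; an illustrative assumption).
    """
    decay = math.exp(-4.0 * alpha * t)
    return [0.25 + (p - 0.25) * decay for p in p_past]

def evolve_inverse(p_present, alpha, t):
    """Present -> past: undo the relaxation applied by evolve_direct.

    The result may leave [0, 1] if the present frequencies are not
    reachable under the assumed model for the given alpha and t.
    """
    growth = math.exp(4.0 * alpha * t)
    return [0.25 + (p - 0.25) * growth for p in p_present]

if __name__ == "__main__":
    past = [0.40, 0.10, 0.30, 0.20]   # made-up A, C, G, T frequencies
    present = evolve_direct(past, alpha=0.01, t=10.0)
    recovered = evolve_inverse(present, alpha=0.01, t=10.0)
    print(present)
    print(recovered)
```

    Running the inverse step on the output of the direct step recovers the starting frequencies up to floating-point error, which is a quick sanity check on the two formulas.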

  12. Distinct cis regulatory elements govern the expression of TAG1 in embryonic sensory ganglia and spinal cord.

    PubMed

    Hadas, Yoav; Nitzan, Noa; Furley, Andrew J W; Kozlov, Serguei V; Klar, Avihu

    2013-01-01

    Cell fate commitment of spinal progenitor neurons is initiated by long-range, midline-derived morphogens that regulate an array of transcription factors that, in turn, act sequentially or in parallel to control neuronal differentiation. Included among these are transcription factors that regulate the expression of receptors for guidance cues, thereby determining axonal trajectories. The Ig/FNIII superfamily molecules TAG1/Axonin1/CNTN2 (TAG1) and Neurofascin (Nfasc) are co-expressed in numerous neuronal cell types in the CNS and PNS (for example, motor neurons, DRG neurons and interneurons); both promote neurite outgrowth and both are required for the architecture and function of nodes of Ranvier. The genes encoding TAG1 and Nfasc are adjacent in the genome, an arrangement which is evolutionarily conserved. To study the transcriptional network that governs TAG1 and Nfasc expression in spinal motor and commissural neurons, we set out to identify cis elements that regulate their expression. Two evolutionarily conserved DNA modules, one located between the Nfasc and TAG1 genes and the second directly 5' to the first exon and encompassing the first intron of TAG1, were identified that direct complementary expression to the CNS and PNS, respectively, of the embryonic hindbrain and spinal cord. Sequential deletions and point mutations of the CNS enhancer element revealed a 130 bp element containing three conserved E-boxes required for motor neuron expression. In combination, these two elements appear to recapitulate a major part of the pattern of TAG1 expression in the embryonic nervous system.

  13. Evolution of an insect-specific GROUCHO-interaction motif in the ENGRAILED selector protein.

    PubMed

    Hittinger, Chris Todd; Carroll, Sean B

    2008-01-01

    Animal morphology evolves through alterations in the genetic regulatory networks that control development. Regulatory connections are commonly added, subtracted, or modified via mutations in cis-regulatory elements, but several cases are also known where transcription factors have gained or lost activity-modulating peptide motifs. In order to better assess the role of novel transcription factor peptide motifs in evolution, we searched for synapomorphic motifs in the homeotic selectors of Drosophila melanogaster and related insects. Here, we describe an evolutionarily novel GROUCHO (GRO)-interaction motif in the ENGRAILED (EN) selector protein. This "ehIFRPF" motif is not homologous to the previously characterized "engrailed homology 1" (eh1) GRO-interaction motif of EN. This second motif is an insect-specific "WRPW"-type motif that has been maintained by purifying selection in at least the dipteran/lepidopteran lineage. We demonstrate that this motif contributes to in vivo repression of the wingless (wg) target gene and to interaction with GRO in vitro. The acquisition and conservation of this auxiliary peptide motif shows how the number and activity of short peptide motifs can evolve in transcription factors while existing regulatory functions are maintained.

  14. Novel applications of motif-directed profiling to identify disease resistance genes in plants

    PubMed Central

    2013-01-01

    Background Molecular profiling of gene families is a versatile tool to study diversity between individual genomes in sexual crosses and germplasm. Nucleotide binding site (NBS) profiling, in particular, targets conserved nucleotide binding site-encoding sequences of resistance gene analogs (RGAs), and is widely used to identify molecular markers for disease resistance (R) genes. Results In this study, we used NBS profiling to identify genome-wide locations of RGA clusters in the genome of potato clone RH. Positions of RGAs in the potato RH and DM genomes that were generated using profiling and genome sequencing, respectively, were compared. Largely overlapping results, but also interesting discrepancies, were found. Due to the clustering of RGAs, several parts of the genome are overexposed while others remain underexposed using NBS profiling. It is shown how the profiling of other gene families, i.e. protein kinases and different protein domain-coding sequences (i.e., TIR), can be used to achieve a better marker distribution. The power of profiling techniques is further illustrated using RGA cluster-directed profiling in a population of Solanum berthaultii. Multiple different paralogous RGAs within the Rpi-ber cluster could be genetically distinguished. Finally, an adaptation of the profiling protocol was made that allowed the parallel sequencing of profiling fragments using next generation sequencing. The types of RGAs that were tagged in this next-generation profiling approach largely overlapped with classical gel-based profiling. As a potential application of next-generation profiling, we showed how the R gene family associated with late blight resistance in the SH*RH population could be identified using a bulked segregant approach. Conclusions In this study, we provide a comprehensive overview of previously described and novel profiling primers and their genomic targets in potato through genetic mapping and comparative genomics. Furthermore, it is shown how

  15. Epsilon glutathione transferases possess a unique class-conserved subunit interface motif that directly interacts with glutathione in the active site.

    PubMed

    Wongsantichon, Jantana; Robinson, Robert C; Ketterman, Albert J

    2015-10-20

    Epsilon class glutathione transferases (GSTs) have been shown to contribute significantly to insecticide resistance. We report a new Epsilon class protein crystal structure from Drosophila melanogaster for the glutathione transferase DmGSTE6. The structure reveals a novel Epsilon clasp motif that is conserved across hundreds of millions of years of evolution of the insect Diptera order. This histidine-serine motif lies in the subunit interface and appears to contribute to quaternary stability as well as directly connecting the two glutathiones in the active sites of this dimeric enzyme.

  16. Epsilon glutathione transferases possess a unique class-conserved subunit interface motif that directly interacts with glutathione in the active site

    PubMed Central

    Wongsantichon, Jantana; Robinson, Robert C.; Ketterman, Albert J.

    2015-01-01

    Epsilon class glutathione transferases (GSTs) have been shown to contribute significantly to insecticide resistance. We report a new Epsilon class protein crystal structure from Drosophila melanogaster for the glutathione transferase DmGSTE6. The structure reveals a novel Epsilon clasp motif that is conserved across hundreds of millions of years of evolution of the insect Diptera order. This histidine-serine motif lies in the subunit interface and appears to contribute to quaternary stability as well as directly connecting the two glutathiones in the active sites of this dimeric enzyme. PMID:26487708

  17. A Human Immunoglobulin (Ig)A Cα3 Domain Motif Directs Polymeric Ig Receptor–mediated Secretion

    PubMed Central

    Hexham, J. Mark; White, Kendra D.; Carayannopoulos, Leonidas N.; Mandecki, Wlodeck; Brisette, Renee; Yang, Yih-Sheng; Capra, J. Donald

    1999-01-01

    Polymeric immunoglobulins provide immunological protection at mucosal surfaces to which they are specifically transported by the polymeric immunoglobulin receptor (pIgR). Using a panel of human IgA1/IgG1 constant region “domain swap” mutants, the binding site for the pIgR on dimeric IgA (dIgA) was localized to the Cα3 domain. Selection of random peptides for pIgR binding and comparison with the IgA sequence suggested amino acids 402–410 (QEPSQGTTT), in a predicted exposed loop of the Cα3 domain, as a potential binding site. Alanine substitution of two groups of amino acids in this area abrogated the binding of dIgA to pIgR, whereas adjacent substitutions in a β-strand immediately NH2-terminal to this loop had no effect. All pIgR binding IgA sequences contain a conserved three amino acid insertion, not present in IgG, at this position. These data localize the pIgR binding site on dimeric human IgA to this loop structure in the Cα3 domain, which directs mucosal secretion of polymeric antibodies. We propose that it may be possible to use a pIgR binding motif to deliver antigen-specific dIgA and small-molecule drugs to mucosal epithelia for therapy. PMID:9989991

  18. Annotating RNA motifs in sequences and alignments

    PubMed Central

    Gardner, Paul P.; Eldai, Hisham

    2015-01-01

    RNA performs a diverse array of important functions across all cellular life. These functions include important roles in translation, building translational machinery and maturing messenger RNA. More recent discoveries include the miRNAs and bacterial sRNAs that regulate gene expression, the thermosensors, riboswitches and other cis-regulatory elements that help prokaryotes sense their environment and eukaryotic piRNAs that suppress transposition. However, there can be a long period between the initial discovery of a RNA and determining its function. We present a bioinformatic approach to characterize RNA motifs, which are critical components of many RNA structure–function relationships. These motifs can, in some instances, provide researchers with functional hypotheses for uncharacterized RNAs. Moreover, we introduce a new profile-based database of RNA motifs—RMfam—and illustrate some applications for investigating the evolution and functional characterization of RNA. All the data and scripts associated with this work are available from: https://github.com/ppgardne/RMfam. PMID:25520192

  19. Subtle Changes in Motif Positioning Cause Tissue-Specific Effects on Robustness of an Enhancer's Activity

    PubMed Central

    Erceg, Jelena; Saunders, Timothy E.; Girardot, Charles; Devos, Damien P.; Hufnagel, Lars; Furlong, Eileen E. M.

    2014-01-01

    Deciphering the specific contribution of individual motifs within cis-regulatory modules (CRMs) is crucial to understanding how gene expression is regulated and how this process is affected by sequence variation. But despite vast improvements in the ability to identify where transcription factors (TFs) bind throughout the genome, we are limited in our ability to relate information on motif occupancy to function from sequence alone. Here, we engineered 63 synthetic CRMs to systematically assess the relationship between variation in the content and spacing of motifs within CRMs to CRM activity during development using Drosophila transgenic embryos. In over half the cases, very simple elements containing only one or two types of TF binding motifs were capable of driving specific spatio-temporal patterns during development. Different motif organizations provide different degrees of robustness to enhancer activity, ranging from binary on-off responses to more subtle effects including embryo-to-embryo and within-embryo variation. By quantifying the effects of subtle changes in motif organization, we were able to model biophysical rules that explain CRM behavior and may contribute to the spatial positioning of CRM activity in vivo. For the same enhancer, the effects of small differences in motif positions varied in developmentally related tissues, suggesting that gene expression may be more susceptible to sequence variation in one tissue compared to another. This result has important implications for human eQTL studies in which many associated mutations are found in cis-regulatory regions, though the mechanism for how they affect tissue-specific gene expression is often not understood. PMID:24391522

  20. WordSpy: identifying transcription factor binding motifs by building a dictionary and learning a grammar

    PubMed Central

    Wang, Guandong; Yu, Taotao; Zhang, Weixiong

    2005-01-01

    Transcription factor (TF) binding sites or motifs (TFBMs) are functional cis-regulatory DNA sequences that play an essential role in gene transcriptional regulation. Although many experimental and computational methods have been developed, finding TFBMs remains a challenging problem. We propose and develop a novel dictionary based motif finding algorithm, which we call WordSpy. One significant feature of WordSpy is the combination of a word counting method and a statistical model which consists of a dictionary of motifs and a grammar specifying their usage. The algorithm is suitable for genome-wide motif finding; it is capable of discovering hundreds of motifs from a large set of promoters in a single run. We further enhance WordSpy by applying gene expression information to separate true TFBMs from spurious ones, and by incorporating negative sequences to identify discriminative motifs. In addition, we also use randomly selected promoters from the genome to evaluate the significance of the discovered motifs. The output from WordSpy consists of an ordered list of putative motifs and a set of regulatory sequences with motif binding sites highlighted. The web server of WordSpy is available at . PMID:15980501
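
    The word-counting half of the approach described above can be illustrated with a short sketch. This is not the WordSpy algorithm (the dictionary and grammar model are not reproduced here); it only counts fixed-length words in a promoter set and ranks them against a background set. The function names, the pseudocount, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def count_words(sequences, k):
    """Count every overlapping k-letter word across a list of DNA strings."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def enriched_words(foreground, background, k, pseudocount=1.0):
    """Rank foreground words by their foreground/background frequency ratio.

    A pseudocount keeps words that are absent from the background
    from producing a division by zero.
    """
    fg = count_words(foreground, k)
    bg = count_words(background, k)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    vocab = 4 ** k  # number of possible DNA words of length k
    scores = {}
    for word, n in fg.items():
        fg_freq = (n + pseudocount) / (fg_total + pseudocount * vocab)
        bg_freq = (bg[word] + pseudocount) / (bg_total + pseudocount * vocab)
        scores[word] = fg_freq / bg_freq
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy, made-up sequences: the word ACGTG is planted in every "promoter".
promoters = ["ACGTGACGTGTT", "TTACGTGAAACC", "GGACGTGTTTAA"]
controls = ["ATATATATATAT", "CCGGTTAACCGG", "TTTTAAAACCCC"]
ranking = enriched_words(promoters, controls, k=5)
print(ranking[:3])
```

    Using randomly selected promoters as the background set would mirror, in a very rough way, the significance check mentioned in the abstract.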

  1. Divergent Protein Motifs Direct Elongation Factor P-Mediated Translational Regulation in Salmonella enterica and Escherichia coli

    PubMed Central

    Hersch, Steven J.; Wang, Mengchi; Zou, S. Betty; Moon, Kyung-Mee; Foster, Leonard J.; Ibba, Michael; Navarre, William Wiley

    2013-01-01

    Elongation factor P (EF-P) is a universally conserved bacterial translation factor homologous to eukaryotic/archaeal initiation factor 5A. In Salmonella, deletion of the efp gene results in pleiotropic phenotypes, including increased susceptibility to numerous cellular stressors. Only a limited number of proteins are affected by the loss of EF-P, and it has recently been determined that EF-P plays a critical role in rescuing ribosomes stalled at PPP and PPG peptide sequences. Here we present an unbiased in vivo investigation of the specific targets of EF-P by employing stable isotope labeling of amino acids in cell culture (SILAC) to compare the proteomes of wild-type and efp mutant Salmonella. We found that metabolic and motility genes are prominent among the subset of proteins with decreased production in the Δefp mutant. Furthermore, particular tripeptide motifs are statistically overrepresented among the proteins downregulated in efp mutant strains. These include both PPP and PPG but also additional motifs, such as APP and YIRYIR, which were confirmed to induce EF-P dependence by a translational fusion assay. Notably, we found that many proteins containing polyproline motifs are not misregulated in an EF-P-deficient background, suggesting that the factors that govern EF-P-mediated regulation are complex. Finally, we analyzed the specific region of the PoxB protein that is modulated by EF-P and found that mutation of any residue within a specific GSCGPG sequence eliminates the requirement for EF-P. This work expands the known repertoire of EF-P target motifs and implicates factors beyond polyproline motifs that are required for EF-P-mediated regulation. PMID:23611909
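
    One way to ask whether a motif is overrepresented among downregulated proteins is a one-sided hypergeometric test, sketched below. The abstract does not say which statistical test the authors used, so the choice of test is an assumption, as are the function name and the made-up toy sequences.

```python
from math import comb


def enrichment_pvalue(downregulated, others, motif):
    """One-sided hypergeometric p-value for motif overrepresentation.

    downregulated, others: lists of protein sequences (strings).
    Returns P(X >= k), where k is the number of downregulated proteins
    that contain the motif at least once.
    """
    k = sum(motif in p for p in downregulated)  # motif carriers, downregulated
    K = k + sum(motif in p for p in others)     # motif carriers, all proteins
    n = len(downregulated)                      # size of the downregulated set
    N = n + len(others)                         # size of the whole proteome
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# Toy, made-up sequences; real input would be the quantified proteome.
down = ["MKPPPLA", "MAPPGTT", "MSPPPQQ"]
rest = ["MKLAVTT", "MGGSAQL", "MPPPKKA", "MTTLLQA", "MAQQLLS"]
p_value = enrichment_pvalue(down, rest, "PPP")
print(round(p_value, 4))
```

    With only eight toy proteins the p-value is not meaningful; the sketch only shows how the counts feed into the test.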

  2. Integration of bioinformatics and synthetic promoters leads to the discovery of novel elicitor-responsive cis-regulatory sequences in Arabidopsis.

    PubMed

    Koschmann, Jeannette; Machens, Fabian; Becker, Marlies; Niemeyer, Julia; Schulze, Jutta; Bülow, Lorenz; Stahl, Dietmar J; Hehl, Reinhard

    2012-09-01

    A combination of bioinformatic tools, high-throughput gene expression profiles, and the use of synthetic promoters is a powerful approach to discover and evaluate novel cis-sequences in response to specific stimuli. With Arabidopsis (Arabidopsis thaliana) microarray data annotated to the PathoPlant database, 732 different queries with a focus on fungal and oomycete pathogens were performed, leading to 510 up-regulated gene groups. Using the binding site estimation suite of tools, BEST, 407 conserved sequence motifs were identified in promoter regions of these coregulated gene sets. Motif similarities were determined with STAMP, classifying the 407 sequence motifs into 37 families. A comparative analysis of these 37 families with the AthaMap, PLACE, and AGRIS databases revealed similarities to known cis-elements but also led to the discovery of cis-sequences not yet implicated in pathogen response. Using a parsley (Petroselinum crispum) protoplast system and a modified reporter gene vector with an internal transformation control, 25 elicitor-responsive cis-sequences from 10 different motif families were identified. Many of the elicitor-responsive cis-sequences also drive reporter gene expression in an Agrobacterium tumefaciens infection assay in Nicotiana benthamiana. This work significantly increases the number of known elicitor-responsive cis-sequences and demonstrates the successful integration of a diverse set of bioinformatic resources combined with synthetic promoter analysis for data mining and functional screening in plant-pathogen interaction. PMID:22744985

  3. Integration of Bioinformatics and Synthetic Promoters Leads to the Discovery of Novel Elicitor-Responsive cis-Regulatory Sequences in Arabidopsis

    PubMed Central

    Koschmann, Jeannette; Machens, Fabian; Becker, Marlies; Niemeyer, Julia; Schulze, Jutta; Bülow, Lorenz; Stahl, Dietmar J.; Hehl, Reinhard

    2012-01-01

    A combination of bioinformatic tools, high-throughput gene expression profiles, and the use of synthetic promoters is a powerful approach to discover and evaluate novel cis-sequences in response to specific stimuli. With Arabidopsis (Arabidopsis thaliana) microarray data annotated to the PathoPlant database, 732 different queries with a focus on fungal and oomycete pathogens were performed, leading to 510 up-regulated gene groups. Using the binding site estimation suite of tools, BEST, 407 conserved sequence motifs were identified in promoter regions of these coregulated gene sets. Motif similarities were determined with STAMP, classifying the 407 sequence motifs into 37 families. A comparative analysis of these 37 families with the AthaMap, PLACE, and AGRIS databases revealed similarities to known cis-elements but also led to the discovery of cis-sequences not yet implicated in pathogen response. Using a parsley (Petroselinum crispum) protoplast system and a modified reporter gene vector with an internal transformation control, 25 elicitor-responsive cis-sequences from 10 different motif families were identified. Many of the elicitor-responsive cis-sequences also drive reporter gene expression in an Agrobacterium tumefaciens infection assay in Nicotiana benthamiana. This work significantly increases the number of known elicitor-responsive cis-sequences and demonstrates the successful integration of a diverse set of bioinformatic resources combined with synthetic promoter analysis for data mining and functional screening in plant-pathogen interaction. PMID:22744985

  4. Genetic analysis of bristle loss in hybrids between Drosophila melanogaster and D. simulans provides evidence for divergence of cis-regulatory sequences in the achaete-scute gene complex.

    PubMed

    Skaer, N; Simpson, P

    2000-05-01

    The two closely related species of Drosophila, D. melanogaster and D. simulans, display an identical bristle pattern on the notum, but hybrids between the two are lacking a variable number of bristles. We show that the loss is temperature-dependent and provide evidence for two periods of temperature sensitivity. A first period of heat sensitivity occurs during larval development and corresponds to the time when the prepattern of expression of genes whose products activate achaete-scute in the proneural clusters preceding bristle precursor formation is established. A second period of cold sensitivity corresponds to the time of emergence of the bristle precursor cells and the maintenance of their neural fate, a process requiring high levels of Achaete-Scute. Expression of achaete-scute at these two critical periods depends on cis-regulatory elements of the achaete-scute complex (AS-C). The differences between males, which have only one copy of the X-linked AS-C from D. simulans, and females, which have copies from both parental species, are compared, together with the effects of crossing in different rearrangements of the D. melanogaster AS-C that delete regulatory and/or coding sequences. We provide evidence that bristle loss in the hybrids may result from a decrease in the level of transcription at the AS-C and argue that interaction between trans-acting factors and cis-regulatory elements within the AS-C has diverged between the two species.

  5. AthaMap web tools for database-assisted identification of combinatorial cis-regulatory elements and the display of highly conserved transcription factor binding sites in Arabidopsis thaliana.

    PubMed

    Steffens, Nils Ole; Galuschka, Claudia; Schindler, Martin; Bülow, Lorenz; Hehl, Reinhard

    2005-07-01

    The AthaMap database generates a map of cis-regulatory elements for the Arabidopsis thaliana genome. AthaMap contains more than 7.4 × 10^6 putative binding sites for 36 transcription factors (TFs) from 16 different TF families. A newly implemented functionality allows the display of subsets of higher conserved transcription factor binding sites (TFBSs). Furthermore, a web tool was developed that permits a user-defined search for co-localizing cis-regulatory elements. The user can specify individually the level of conservation for each TFBS and a spacer range between them. This web tool was employed for the identification of co-localizing sites of known interacting TFs and TFs containing two DNA-binding domains. More than 1.8 × 10^5 combinatorial elements were annotated in the AthaMap database. These elements can also be used to identify more complex co-localizing elements consisting of up to four TFBSs. The AthaMap database and the connected web tools are a valuable resource for the analysis and the prediction of gene expression regulation at http://www.athamap.de. PMID:15980498
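
    The co-localization search described above can be pictured as a filter on pairs of predicted sites by the spacer between them. The sketch below is not the AthaMap implementation; the function name, the end-exclusive coordinate convention, and the toy coordinates are assumptions, and the per-site conservation threshold is left out.

```python
def colocalizing_pairs(sites_a, sites_b, min_spacer, max_spacer):
    """Return site pairs whose spacer falls within [min_spacer, max_spacer].

    sites_a, sites_b: lists of (start, end) tuples on the same sequence,
    with end exclusive. The spacer is the gap between the end of the
    upstream site and the start of the downstream site, so overlapping
    sites get a negative spacer.
    """
    pairs = []
    for a in sites_a:
        for b in sites_b:
            first, second = (a, b) if a[0] <= b[0] else (b, a)
            spacer = second[0] - first[1]
            if min_spacer <= spacer <= max_spacer:
                pairs.append((a, b, spacer))
    return pairs


# Toy, made-up coordinates for two transcription factors.
tf_a_sites = [(100, 106), (300, 306)]
tf_b_sites = [(110, 118), (250, 258), (400, 408)]
hits = colocalizing_pairs(tf_a_sites, tf_b_sites, min_spacer=0, max_spacer=50)
print(hits)
```

    The nested loop is quadratic in the number of sites; for genome-scale site lists, sorting by start position and sweeping a window would be the usual refinement.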

  6. Defining a Conformational Consensus Motif in Cotransin-Sensitive Signal Sequences: A Proteomic and Site-Directed Mutagenesis Study

    PubMed Central

    Klein, Wolfgang; Westendorf, Carolin; Schmidt, Antje; Conill-Cortés, Mercè; Rutz, Claudia; Blohs, Marcus; Beyermann, Michael; Protze, Jonas; Krause, Gerd; Krause, Eberhard; Schülein, Ralf

    2015-01-01

    The cyclodepsipeptide cotransin was described to inhibit the biosynthesis of a small subset of proteins by a signal sequence-discriminatory mechanism at the Sec61 protein-conducting channel. However, it was not clear how selective cotransin is, i.e. how many proteins are sensitive. Moreover, a consensus motif in signal sequences mediating cotransin sensitivity has not yet been described. To address these questions, we performed a proteomic study using cotransin-treated human hepatocellular carcinoma cells and the stable isotope labelling by amino acids in cell culture technique in combination with quantitative mass spectrometry. We used a saturating concentration of cotransin (30 micromolar) to also identify less-sensitive proteins and to discriminate the latter from completely resistant proteins. We found that the biosynthesis of almost all secreted proteins was cotransin-sensitive under these conditions. In contrast, biosynthesis of the majority of the integral membrane proteins was cotransin-resistant. Cotransin sensitivity of signal sequences was neither related to their length nor to their hydrophobicity. Instead, in the case of signal anchor sequences, we identified for the first time a conformational consensus motif mediating cotransin sensitivity. PMID:25806945

  7. Defining a conformational consensus motif in cotransin-sensitive signal sequences: a proteomic and site-directed mutagenesis study.

    PubMed

    Klein, Wolfgang; Westendorf, Carolin; Schmidt, Antje; Conill-Cortés, Mercè; Rutz, Claudia; Blohs, Marcus; Beyermann, Michael; Protze, Jonas; Krause, Gerd; Krause, Eberhard; Schülein, Ralf

    2015-01-01

    The cyclodepsipeptide cotransin was described to inhibit the biosynthesis of a small subset of proteins by a signal sequence-discriminatory mechanism at the Sec61 protein-conducting channel. However, it was not clear how selective cotransin is, i.e. how many proteins are sensitive. Moreover, a consensus motif in signal sequences mediating cotransin sensitivity has not yet been described. To address these questions, we performed a proteomic study using cotransin-treated human hepatocellular carcinoma cells and the stable isotope labelling by amino acids in cell culture technique in combination with quantitative mass spectrometry. We used a saturating concentration of cotransin (30 micromolar) to also identify less-sensitive proteins and to discriminate the latter from completely resistant proteins. We found that the biosynthesis of almost all secreted proteins was cotransin-sensitive under these conditions. In contrast, biosynthesis of the majority of the integral membrane proteins was cotransin-resistant. Cotransin sensitivity of signal sequences was neither related to their length nor to their hydrophobicity. Instead, in the case of signal anchor sequences, we identified for the first time a conformational consensus motif mediating cotransin sensitivity.

  8. Carbonyl-carbonyl interactions and amide π-stacking as the directing motifs of the supramolecular assembly of ethyl N-(2-acetylphenyl)oxalamate in a synperiplanar conformation.

    PubMed

    Cabrera-Pérez, Laura C; García-Báez, Efrén V; Franco-Hernández, Marina O; Martínez-Martínez, Francisco J; Padilla-Martínez, Itzia I

    2015-05-01

    The title compound, C12H13NO4, is one of the few examples that exhibits a syn conformation between the amide and ester carbonyl groups of the oxalyl group. This conformation allows the engagement of the amide H atom in an intramolecular three-centred hydrogen-bonding S(6)S(5) motif. The compound is self-assembled by C=O...C=O and amide-π interactions into stacked columns along the b-axis direction. The concurrence of both interactions seems to be responsible for stabilizing the observed syn conformation between the carbonyl groups. The second dimension, along the a-axis direction, is developed by soft C-H...O hydrogen bonding. Density functional theory (DFT) calculations at the B3LYP/6-31G(d,p) level of theory were performed to support the experimental findings.

  9. O-xylosylation in a recombinant protein is directed at a common motif on glycine-serine linkers.

    PubMed

    Spencer, David; Novarra, Shabazz; Zhu, Liang; Mugabe, Sheila; Thisted, Thomas; Baca, Manuel; Depaz, Roberto; Barton, Christopher

    2013-11-01

    Glycine-serine (GS) linkers are commonly used in recombinant proteins to connect domains. Here, we report the posttranslational O-glycosylation of a GS linker in a novel fusion protein. The structure of the O-glycan moiety is a xylose-based core substituted with hexose and sulfated hexuronic acid residues. The total level of O-xylosylation was approximately 30% in the material expressed in HEK-293 cell lines. There was an approximate 10-fold reduction in O-xylosylation levels when the material was expressed in Chinese hamster ovary cell lines. Similar O-glycan structures have been reported for human urinary thrombomodulin and represent the initial building block for proteoglycans such as chondroitin sulfate and heparin. The sites of attachment, determined by electron transfer dissociation mass spectrometry, were localized to serine in the linker regions of the recombinant fusion protein. This attachment could be attributed, in part, to the inherent xylosyltransferase motif present in GS linkers. Elimination of the O-glycan moiety was achieved with modified linkers containing only glycine residues. The aggregation and fragmentation behavior of the GGG construct were comparable to the GSG-linked material during thermal stress. The O-xylosylation reported has implications for the manufacturing consistency of recombinant proteins containing GS linkers. PMID:24105735

  10. A systematic approach to identify functional motifs within vertebrate developmental enhancers

    PubMed Central

    Li, Qiang; Ritter, Deborah; Yang, Nan; Dong, Zhiqiang; Li, Hao; Chuang, Jeffrey H.; Guo, Su

    2012-01-01

    Uncovering the cis-regulatory logic of developmental enhancers is critical to understanding the role of non-coding DNA in development. However, it is cumbersome to identify functional motifs within enhancers, and thus few vertebrate enhancers have their core functional motifs revealed. Here we report a combined experimental and computational approach for discovering regulatory motifs in developmental enhancers. Making use of the zebrafish gene expression database, we computationally identified conserved non-coding elements (CNEs) likely to have a desired tissue-specificity based on the expression of nearby genes. Through a high throughput and robust enhancer assay, we tested the activity of ~100 such CNEs and efficiently uncovered developmental enhancers with desired spatial and temporal expression patterns in the zebrafish brain. Application of de novo motif prediction algorithms on a group of forebrain enhancers identified five top-ranked motifs, all of which were experimentally validated as critical for forebrain enhancer activity. These results demonstrate a systematic approach to discover important regulatory motifs in vertebrate developmental enhancers. Moreover, this dataset provides a useful resource for further dissection of vertebrate brain development and function. PMID:19850031

  11. Chromatin-driven de novo discovery of DNA binding motifs in the human malaria parasite

    PubMed Central

    2011-01-01

    Background Despite extensive efforts to discover transcription factors and their binding sites in the human malaria parasite Plasmodium falciparum, only a few transcription factor binding motifs have been experimentally validated to date. As a consequence, gene regulation in P. falciparum is still poorly understood. There is now evidence that the chromatin architecture plays an important role in transcriptional control in malaria. Results We propose a methodology for discovering cis-regulatory elements that uses for the first time exclusively dynamic chromatin remodeling data. Our method employs nucleosome positioning data collected at seven time points during the erythrocytic cycle of P. falciparum to discover putative DNA binding motifs and their transcription factor binding sites along with their associated clusters of target genes. Our approach results in 129 putative binding motifs within the promoter region of known genes. About 75% of those are novel, the remaining being highly similar to experimentally validated binding motifs. About half of the binding motifs reported show statistically significant enrichment in functional gene sets and strong positional bias in the promoter region. Conclusion Experimental results establish the principle that dynamic chromatin remodeling data can be used in lieu of gene expression data to discover binding motifs and their transcription factor binding sites. Our approach can be applied using only dynamic nucleosome positioning data, independent from any knowledge of gene function or expression. PMID:22165844

  12. [Personal motif in art].

    PubMed

    Gerevich, József

    2015-01-01

    One of the basic questions of art psychology is whether a personal motif is to be found behind works of art and if so, how openly or indirectly it appears in the work itself. Analysis of examples and documents from the fine arts and literature allows us to conclude that the personal motif that can be identified by the viewer through symbols, at times easily, at others with more difficulty, gives an emotional plus to the artistic product. The personal motif may be found in traumatic experiences, in communication to the model or with other emotionally important persons (mourning, disappointment, revenge, hatred, rivalry, revolt etc.), in self-searching, or self-analysis. The emotions are expressed in artistic activity either directly or indirectly. The intention nourished by the artist's identity (Kunstwollen) may stand in the way of spontaneous self-expression, channelling it into hidden paths. Under the influence of certain circumstances, the artist may arouse in the viewer, consciously or unconsciously, an illusionary, misleading image of himself. An examination of the personal motif is one of the important research areas of art therapy.

  13. [Personal motif in art].

    PubMed

    Gerevich, József

    2015-01-01

    One of the basic questions of art psychology is whether a personal motif is to be found behind works of art and if so, how openly or indirectly it appears in the work itself. Analysis of examples and documents from the fine arts and literature allows us to conclude that the personal motif that can be identified by the viewer through symbols, at times easily, at others with more difficulty, gives an emotional plus to the artistic product. The personal motif may be found in traumatic experiences, in communication to the model or with other emotionally important persons (mourning, disappointment, revenge, hatred, rivalry, revolt etc.), in self-searching, or self-analysis. The emotions are expressed in artistic activity either directly or indirectly. The intention nourished by the artist's identity (Kunstwollen) may stand in the way of spontaneous self-expression, channelling it into hidden paths. Under the influence of certain circumstances, the artist may arouse in the viewer, consciously or unconsciously, an illusionary, misleading image of himself. An examination of the personal motif is one of the important research areas of art therapy. PMID:26202617

  14. Fast and Efficient Cloning of Cis-Regulatory Sequences for High-Throughput Yeast One-Hybrid Analyses of Transcription Factors.

    PubMed

    Kelemen, Zsolt; Przybyla-Toscano, Jonathan; Tissot, Nicolas; Lepiniec, Loïc; Dubos, Christian

    2016-01-01

    Yeast one-hybrid (Y1H) assay has been proven to be a powerful technique to characterize in vivo the interaction between a given transcription factor (TF), or its DNA-binding domain (DBD), and target DNA sequences. Comprehensive characterization of TF/DBD and DNA interactions should allow designing synthetic promoters that would undoubtedly be valuable for biotechnological approaches. Here, we use the ligation-independent cloning system (LIC) in order to enhance the cloning efficiency of DNA motifs into the pHISi Y1H vector. LIC overcomes important limitations of traditional cloning technologies, since any DNA fragment can be cloned into LIC compatible vectors without using restriction endonucleases, ligation, or in vitro recombination. PMID:27557765

  15. Redundant ERF-VII Transcription Factors Bind to an Evolutionarily Conserved cis-Motif to Regulate Hypoxia-Responsive Gene Expression in Arabidopsis.

    PubMed

    Gasch, Philipp; Fundinger, Moritz; Müller, Jana T; Lee, Travis; Bailey-Serres, Julia; Mustroph, Angelika

    2016-01-01

    The response of Arabidopsis thaliana to low-oxygen stress (hypoxia), such as during shoot submergence or root waterlogging, includes increasing the levels of ∼50 hypoxia-responsive gene transcripts, many of which encode enzymes associated with anaerobic metabolism. Upregulation of over half of these mRNAs involves stabilization of five group VII ethylene response factor (ERF-VII) transcription factors, which are routinely degraded via the N-end rule pathway of proteolysis in an oxygen- and nitric oxide-dependent manner. Despite their importance, neither the quantitative contribution of individual ERF-VIIs nor the cis-regulatory elements they govern are well understood. Here, using single- and double-null mutants, the constitutively synthesized ERF-VIIs RELATED TO APETALA2.2 (RAP2.2) and RAP2.12 are shown to act redundantly as principal activators of hypoxia-responsive genes; constitutively expressed RAP2.3 contributes to this redundancy, whereas the hypoxia-induced HYPOXIA RESPONSIVE ERF1 (HRE1) and HRE2 play minor roles. An evolutionarily conserved 12-bp cis-regulatory motif that binds to and is sufficient for activation by RAP2.2 and RAP2.12 is identified through a comparative phylogenetic motif search, promoter dissection, yeast one-hybrid assays, and chromatin immunopurification. This motif, designated the hypoxia-responsive promoter element, is enriched in promoters of hypoxia-responsive genes in multiple species. PMID:26668304

  16. Arabidopsis Flower and Embryo Developmental Genes are Repressed in Seedlings by Different Combinations of Polycomb Group Proteins in Association with Distinct Sets of Cis-regulatory Elements

    PubMed Central

    Liu, Jian; Zhang, Lei; He, Chongsheng; Shen, Wen-Hui; Jin, Hong; Xu, Lin; Zhang, Yijing

    2016-01-01

    Polycomb repressive complexes (PRCs) play crucial roles in transcriptional repression and developmental regulation in both plants and animals. In plants, depletion of different members of PRCs causes both overlapping and unique phenotypic defects. However, the underlying molecular mechanism determining the target specificity and functional diversity is not sufficiently characterized. Here, we quantitatively compared changes of tri-methylation at H3K27 in Arabidopsis mutants deprived of various key PRC components. We show that CURLY LEAF (CLF), a major catalytic subunit of PRC2, coordinates with different members of PRC1 in suppression of distinct plant developmental programs. We found that expression of flower development genes is repressed in seedlings preferentially via a non-redundant role of CLF, which specifically associated with LIKE HETEROCHROMATIN PROTEIN1 (LHP1). In contrast, expression of embryo development genes is repressed by PRC1-catalytic core subunits AtBMI1 and AtRING1 in common with PRC2-catalytic enzymes CLF or SWINGER (SWN). This context-dependent role of CLF corresponds well with the change in H3K27me3 profiles, and is remarkably associated with differential co-occupancy of binding motifs of transcription factors (TFs), including MADS box and ABA-related factors. We propose that different combinations of PRC members distinctively regulate different developmental programs, and their target specificity is modulated by specific TFs. PMID:26760036

  17. Are mutagenic non D-loop direct repeat motifs in mitochondrial DNA under a negative selection pressure?

    PubMed Central

    Lakshmanan, Lakshmi Narayanan; Gruber, Jan; Halliwell, Barry; Gunawan, Rudiyanto

    2015-01-01

    Non D-loop direct repeats (DRs) in mitochondrial DNA (mtDNA) have been commonly implicated in the mutagenesis of mtDNA deletions associated with neuromuscular disease and ageing. Further, these DRs have been hypothesized to put a constraint on the lifespan of mammals and are under a negative selection pressure. Using a compendium of 294 mammalian mtDNA, we re-examined the relationship between species lifespan and the mutagenicity of such DRs. Contradicting the prevailing hypotheses, we found no significant evidence that long-lived mammals possess fewer mutagenic DRs than short-lived mammals. By comparing DR counts in human mtDNA with those in selectively randomized sequences, we also showed that the number of DRs in human mtDNA is primarily determined by global mtDNA properties, such as the bias in synonymous codon usage (SCU) and nucleotide composition. We found that SCU bias in mtDNA positively correlates with DR counts, where repeated usage of a subset of codons leads to more frequent DR occurrences. While bias in SCU and nucleotide composition has been attributed to nucleotide mutational bias, mammalian mtDNA still exhibit higher SCU bias and DR counts than expected from such mutational bias, suggesting a lack of negative selection against non D-loop DRs. PMID:25855815
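
    The comparison of direct repeat (DR) counts against randomized sequences can be sketched as follows. This is a simplification: the abstract describes selectively randomized sequences that preserve properties such as codon usage, whereas the plain shuffle used here preserves only nucleotide composition. The function names, the exact-match definition of a DR, and the toy sequence are assumptions.

```python
import random
from collections import defaultdict


def count_direct_repeats(seq, k):
    """Count pairs of identical, non-overlapping k-mers in the same orientation."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    pairs = 0
    for starts in positions.values():
        for x in range(len(starts)):
            for y in range(x + 1, len(starts)):
                if starts[y] - starts[x] >= k:  # the two copies must not overlap
                    pairs += 1
    return pairs


def shuffled_mean(seq, k, trials=100, seed=0):
    """Mean DR count over shuffled copies of seq (composition-preserving null)."""
    rng = random.Random(seed)
    letters = list(seq)
    total = 0
    for _ in range(trials):
        rng.shuffle(letters)
        total += count_direct_repeats("".join(letters), k)
    return total / trials


# Toy, made-up sequence containing one planted 5-bp direct repeat (ACGTT).
toy = "ACGTTACGTTGG"
observed = count_direct_repeats(toy, 5)
expected = shuffled_mean(toy, 5)
print(observed, expected)
```

    An observed count well above the shuffled mean would suggest that something beyond nucleotide composition contributes to DR formation; codon usage bias is the additional factor the abstract points to.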

  18. Comparative motif discovery combined with comparative transcriptomics yields accurate targetome and enhancer predictions.

    PubMed

    Naval-Sánchez, Marina; Potier, Delphine; Haagen, Lotte; Sánchez, Máximo; Munck, Sebastian; Van de Sande, Bram; Casares, Fernando; Christiaens, Valerie; Aerts, Stein

    2013-01-01

    The identification of transcription factor binding sites, enhancers, and transcriptional target genes often relies on the integration of gene expression profiling and computational cis-regulatory sequence analysis. Methods for the prediction of cis-regulatory elements can take advantage of comparative genomics to increase signal-to-noise levels. However, gene expression data are usually derived from only one species. Here we investigate tissue-specific cross-species gene expression profiling by high-throughput sequencing, combined with cross-species motif discovery. First, we compared different methods for expression level quantification and cross-species integration using Tag-seq data. Using the optimal pipeline, we derived a set of genes with conserved expression during retinal determination across Drosophila melanogaster, Drosophila yakuba, and Drosophila virilis. These genes are enriched for binding sites of eye-related transcription factors including the zinc-finger Glass, a master regulator of photoreceptor differentiation. Validation of predicted Glass targets using RNA-seq in homozygous glass mutants confirms that the majority of our predictions are expressed downstream from Glass. Finally, we tested nine candidate enhancers by in vivo reporter assays and found eight of them to drive GFP in the eye disc, of which seven colocalize with the Glass protein, namely, scrt, chp, dpr10, CG6329, retn, Lim3, and dmrt99B. In conclusion, we show for the first time the combined use of cross-species expression profiling with cross-species motif discovery as a method to define a core developmental program, and we augment the candidate Glass targetome from a single known target gene, lozenge, to at least 62 conserved transcriptional targets. PMID:23070853

  19. Drosophila melanogaster Hox transcription factors access the RNA polymerase II machinery through direct homeodomain binding to a conserved motif of mediator subunit Med19.

    PubMed

    Boube, Muriel; Hudry, Bruno; Immarigeon, Clément; Carrier, Yannick; Bernat-Fabre, Sandra; Merabet, Samir; Graba, Yacine; Bourbon, Henri-Marc; Cribbs, David L

    2014-05-01

    Hox genes in species across the metazoa encode transcription factors (TFs) containing highly-conserved homeodomains that bind target DNA sequences to regulate batteries of developmental target genes. DNA-bound Hox proteins, together with other TF partners, induce an appropriate transcriptional response by RNA Polymerase II (PolII) and its associated general transcription factors. How the evolutionarily conserved Hox TFs interface with this general machinery to generate finely regulated transcriptional responses remains obscure. One major component of the PolII machinery, the Mediator (MED) transcription complex, is composed of roughly 30 protein subunits organized in modules that bridge the PolII enzyme to DNA-bound TFs. Here, we investigate the physical and functional interplay between Drosophila melanogaster Hox developmental TFs and MED complex proteins. We find that the Med19 subunit directly binds Hox homeodomains, in vitro and in vivo. Loss-of-function Med19 mutations act as dose-sensitive genetic modifiers that synergistically modulate Hox-directed developmental outcomes. Using clonal analysis, we identify a role for Med19 in Hox-dependent target gene activation. We identify a conserved, animal-specific motif that is required for Med19 homeodomain binding, and for activation of a specific Ultrabithorax target. These results provide the first direct molecular link between Hox homeodomain proteins and the general PolII machinery. They support a role for Med19 as a PolII holoenzyme-embedded "co-factor" that acts together with Hox proteins through their homeodomains in regulated developmental transcription.

  20. COPS: Detecting Co-Occurrence and Spatial Arrangement of Transcription Factor Binding Motifs in Genome-Wide Datasets

    PubMed Central

    Lohmann, Ingrid

    2012-01-01

    In multi-cellular organisms, spatiotemporal activity of cis-regulatory DNA elements depends on their occupancy by different transcription factors (TFs). In recent years, genome-wide ChIP-on-Chip, ChIP-Seq and DamID assays have been extensively used to unravel the combinatorial interaction of TFs with cis-regulatory modules (CRMs) in the genome. Even though genome-wide binding profiles are increasingly becoming available for different TFs, single TF binding profiles are in most cases not sufficient for dissecting complex regulatory networks. Thus, potent computational tools detecting statistically significant and biologically relevant TF-motif co-occurrences in genome-wide datasets are essential for analyzing context-dependent transcriptional regulation. We have developed COPS (Co-Occurrence Pattern Search), a new bioinformatics tool based on a combination of association rules and Markov chain models, which detects co-occurring TF binding sites (BSs) on genomic regions of interest. COPS scans DNA sequences for frequent motif patterns using a Frequent-Pattern tree based data mining approach, which allows efficient performance of the software with respect to both data structure and implementation speed, in particular when mining large datasets. Since transcriptional gene regulation very often relies on the formation of regulatory protein complexes mediated by closely adjoining TF binding sites on CRMs, COPS additionally detects preferred short distance between co-occurring TF motifs. The performance of our software with respect to biological significance was evaluated using three published datasets containing genomic regions that are independently bound by several TFs involved in a defined biological process. In sum, COPS is a fast, efficient and user-friendly tool mining statistically and biologically significant TFBS co-occurrences and therefore allows the identification of TFs that combinatorially regulate gene expression. PMID:23272209
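
    The co-occurrence and spacing idea described in this abstract can be illustrated with a minimal sketch. This is not the COPS implementation: COPS uses a Frequent-Pattern tree, association rules and Markov chain models, whereas the code below only counts pairwise co-occurrences naively. The motif names, regular expressions and the `max_dist` threshold are hypothetical values chosen for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def find_sites(seq, motifs):
    """Return {motif_name: [start positions]} for regex motifs in one sequence.

    A lookahead is used so that overlapping matches are also reported.
    """
    return {
        name: [m.start() for m in re.finditer(f"(?={pattern})", seq)]
        for name, pattern in motifs.items()
    }


def cooccurrence(seqs, motifs, max_dist=50):
    """Count regions where two motifs co-occur, and where they lie close together.

    pair_counts[(a, b)]  -> number of regions containing both motif a and motif b
    close_counts[(a, b)] -> number of those regions where some site of a and some
                            site of b start within max_dist bp of each other
    """
    pair_counts = Counter()
    close_counts = Counter()
    for seq in seqs:
        sites = find_sites(seq, motifs)
        present = sorted(name for name, pos in sites.items() if pos)
        for a, b in combinations(present, 2):
            pair_counts[(a, b)] += 1
            if any(abs(i - j) <= max_dist for i in sites[a] for j in sites[b]):
                close_counts[(a, b)] += 1
    return pair_counts, close_counts


# Hypothetical motifs and toy regions, for illustration only.
motifs = {"A": "GATA", "B": "CACGTG"}
regions = ["GATAAACACGTG", "GATA" + "T" * 100 + "CACGTG", "TTTT"]
pairs, close = cooccurrence(regions, motifs, max_dist=50)
```

    A real analysis would also need a background model to decide whether a count is statistically significant; that step is deliberately left out here.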

  1. Mining Conditional Phosphorylation Motifs.

    PubMed

    Liu, Xiaoqing; Wu, Jun; Gong, Haipeng; Deng, Shengchun; He, Zengyou

    2014-01-01

    Phosphorylation motifs represent position-specific amino acid patterns around the phosphorylation sites in the set of phosphopeptides. Several algorithms have been proposed to uncover phosphorylation motifs, whereas the problem of efficiently discovering a set of significant motifs with sufficiently high coverage and non-redundancy still remains unsolved. Here we present a novel notion called conditional phosphorylation motifs. Through this new concept, the motifs whose over-expressiveness mainly benefits from its constituting parts can be filtered out effectively. To discover conditional phosphorylation motifs, we propose an algorithm called C-Motif for a non-redundant identification of significant phosphorylation motifs. C-Motif is implemented under the Apriori framework, and it tests the statistical significance together with the frequency of candidate motifs in a single stage. Experiments demonstrate that C-Motif outperforms some current algorithms such as MMFPh and Motif-All in terms of coverage and non-redundancy of the results and efficiency of the execution. The source code of C-Motif is available at: https://sourceforge.net/projects/cmotif/. PMID:26356863

  2. Biological network motif detection: principles and practice.

    PubMed

    Wong, Elisabeth; Baur, Brittany; Quader, Saad; Huang, Chun-Hsi

    2012-03-01

    Network motifs are statistically overrepresented sub-structures (sub-graphs) in a network, and have been recognized as 'the simple building blocks of complex networks'. Study of biological network motifs may reveal answers to many important biological questions. The main difficulty in detecting larger network motifs in biological networks lies in the facts that the number of possible sub-graphs increases exponentially with the network or motif size (node counts, in general), and that no known polynomial-time algorithm exists in deciding if two graphs are topologically equivalent. This article discusses the biological significance of network motifs, the motivation behind solving the motif-finding problem, and strategies to solve the various aspects of this problem. A simple classification scheme is designed to analyze the strengths and weaknesses of several existing algorithms. Experimental results derived from a few comparative studies in the literature are discussed, with conclusions that lead to future research directions. PMID:22396487
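
    The notion of a statistically overrepresented subgraph can be made concrete with a small sketch that counts one three-node pattern, the feed-forward loop, and compares the count with random graphs. The null model used here (random directed graphs with the same numbers of nodes and edges) is a simplifying assumption; published tools typically use degree-preserving randomization. The function names are hypothetical.

```python
import random
from statistics import mean, pstdev


def count_ffl(edges):
    """Count feed-forward loops: node triples with a->b, b->c and a->c."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    count = 0
    for a, outs in succ.items():
        for b in outs:
            for c in succ.get(b, ()):
                if c != a and c != b and c in outs:
                    count += 1
    return count


def random_digraph(nodes, n_edges, rng):
    """Sample n_edges distinct directed edges (no self-loops) uniformly at random."""
    possible = [(u, v) for u in nodes for v in nodes if u != v]
    return rng.sample(possible, n_edges)


def ffl_zscore(edges, n_random=200, seed=0):
    """Return (observed FFL count, z-score against the simple random null model)."""
    edges = set(edges)
    rng = random.Random(seed)
    nodes = sorted({n for e in edges for n in e})
    observed = count_ffl(edges)
    null = [count_ffl(random_digraph(nodes, len(edges), rng)) for _ in range(n_random)]
    sd = pstdev(null)
    z = (observed - mean(null)) / sd if sd > 0 else float("nan")
    return observed, z


# Toy network with two feed-forward loops (1->2->3 with 1->3, and 4->5->6 with 4->6).
toy = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
observed, z = ffl_zscore(toy)
```

    Counting a fixed pattern this way avoids the graph-isomorphism test mentioned in the abstract, which is why it does not scale to arbitrary or larger motifs.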

  3. Gene expression profiling of cultured human NF1 heterozygous (NF1+/-) melanocytes reveals downregulation of a transcriptional cis-regulatory network mediating activation of the melanocyte-specific dopachrome tautomerase (DCT) gene.

    PubMed

    Boucneau, Joachim; De Schepper, Sofie; Vuylsteke, Marnik; Van Hummelen, Paul; Naeyaert, Jean-Marie; Lambert, Jo

    2005-08-01

    One of the major primary features of the neurocutaneous genetic disorder Neurofibromatosis type 1 is the hyperpigmentary café-au-lait macules, where dysregulation of melanocyte biology is supposed to play a key etiopathogenic role. To gain better insight into the possible role of the tumor suppressor gene NF1, a transcriptomic microarray analysis was performed on human NF1 heterozygous (NF1+/-) melanocytes of a Neurofibromatosis type 1 patient and NF1 wild type (NF1+/+) melanocytes of a healthy control patient, both cultured from normally pigmented skin and hyperpigmented lesional café-au-lait skin. From the magnitude of gene effects, we found that gene expression was affected most strongly by genotype and less so by lesional type. A total of 137 genes had a significant twofold or more up- (72) or downregulated (65) expression in NF1+/- melanocytes compared with NF1+/+ melanocytes. Melanocytes cultured from hyperpigmented café-au-lait skin showed 37 upregulated genes whereas only 14 were downregulated compared with normal skin melanocytes. In addition, significant genotype x lesional type interactions were observed for 465 genes. Differentially expressed genes were mainly involved in regulating cell proliferation and cell adhesion. A high number of transcription factor genes, among which a specific subset important in melanocyte lineage development, were downregulated in the cis-regulatory network governing the activation of the melanocyte-specific dopachrome tautomerase (DCT) gene. Although the results presented have been obtained with a restricted number of patients (one NF1 patient and one control) and using cDNA microarrays that may limit their interpretation, the data nevertheless address for the first time the effect of a heterozygous NF1 gene on the expression of the human melanocyte transcriptome and have generated several interesting candidate genes helpful in elucidating the etiopathology of café-au-lait macules in NF1 patients.

  4. Conserved cis-regulatory elements for DNA-binding-with-one-finger and homeo-domain-leucine-zipper transcription factors regulate companion cell-specific expression of the Arabidopsis thaliana SUCROSE TRANSPORTER 2 gene.

    PubMed

    Schneidereit, Alexander; Imlau, Astrid; Sauer, Norbert

    2008-09-01

    The transition from young carbon-importing sink leaves of higher plants to mature carbon-exporting source leaves is paralleled by a complete reversal of phloem function. While sink-leaf phloem mediates the influx of reduced carbon from older source leaves and the release of this imported carbon to the sink-leaf mesophyll, source-leaf phloem catalyzes the uptake of photoassimilates into companion cells (CCs) and sieve elements (SEs) and the net carbon export from the leaf. Phloem loading in source leaves with sucrose, the main or exclusive transport form for fixed carbon in most higher plants, is catalyzed by plasma membrane-localized sucrose transporters. Consistent with the described physiological switch from sink to source, the promoter of the Arabidopsis AtSUC2 gene is active only in source-leaf CCs of Arabidopsis or of transgenic tobacco (Nicotiana tabacum). For the identification of regulatory elements involved in this companion cell-specific and source-specific gene expression, we performed detailed analyses of the AtSUC2 promoter by truncation and mutagenesis. A 126-bp promoter fragment was identified, which seems to contain these fragments and which drives AtSUC2-typical expression when combined with a 35S minimal promoter. Within this fragment, linker-scanning analyses revealed two cis-regulatory elements that were further characterized as putative binding sites for transcription factors of the DNA-binding-with-one-finger or the homeo-domain-leucine-zipper families. Similar or identical binding sites are found in other genes and in different plant species, suggesting an ancient regulatory mechanism for this important physiological switch. PMID:18551303

  5. Motif module map reveals enforcement of aging by continual NF-kappaB activity.

    PubMed

    Adler, Adam S; Sinha, Saurabh; Kawahara, Tiara L A; Zhang, Jennifer Y; Segal, Eran; Chang, Howard Y

    2007-12-15

    Aging is characterized by specific alterations in gene expression, but their underlying mechanisms and functional consequences are not well understood. Here we develop a systematic approach to identify combinatorial cis-regulatory motifs that drive age-dependent gene expression across different tissues and organisms. Integrated analysis of 365 microarrays spanning nine tissue types predicted fourteen motifs as major regulators of age-dependent gene expression in human and mouse. The motif most strongly associated with aging was that of the transcription factor NF-kappaB. Inducible genetic blockade of NF-kappaB for 2 wk in the epidermis of chronologically aged mice reverted the tissue characteristics and global gene expression programs to those of young mice. Age-specific NF-kappaB blockade and orthogonal cell cycle interventions revealed that NF-kappaB controls cell cycle exit and gene expression signature of aging in parallel but not sequential pathways. These results identify a conserved network of regulatory pathways underlying mammalian aging and show that NF-kappaB is continually required to enforce many features of aging in a tissue-specific manner.

  6. Networks of motifs from sequences of symbols.

    PubMed

    Sinatra, Roberta; Condorelli, Daniele; Latora, Vito

    2010-10-22

    We introduce a method to convert an ensemble of sequences of symbols into a weighted directed network whose nodes are motifs, while the directed links and their weights are defined from statistically significant co-occurrences of two motifs in the same sequence. The analysis of communities of networks of motifs is shown to be able to correlate sequences with functions in the human proteome database, to detect hot topics from online social dialogs, to characterize trajectories of dynamical systems, and it might find other useful applications to process large amounts of data in various fields.

  7. Networks of Motifs from Sequences of Symbols

    NASA Astrophysics Data System (ADS)

    Sinatra, Roberta; Condorelli, Daniele; Latora, Vito

    2010-10-01

    We introduce a method to convert an ensemble of sequences of symbols into a weighted directed network whose nodes are motifs, while the directed links and their weights are defined from statistically significant co-occurrences of two motifs in the same sequence. The analysis of communities of networks of motifs is shown to be able to correlate sequences with functions in the human proteome database, to detect hot topics from online social dialogs, to characterize trajectories of dynamical systems, and it might find other useful applications to process large amounts of data in various fields.

  8. [Psychopathological study of lie motif in schizophrenia].

    PubMed

    Otsuka, Koichiro; Kato, Satoshi

    2006-01-01

    The theme of a statement is called "lie motif" by the authors when schizophrenic patients say "I have lied to anybody". We tried to analyse the psychopathological characteristics and anthropological meanings of the lie motifs in schizophrenia, which have not been thematically examined until now, based on 4 cases and in contrast with the lie motif (Lügenmotiv) in depression taken up by A. Kraus (1989). We classified the lie motifs in schizophrenia into the following two types: a) the past directive lie motif: the patients speak about their real lie regarding it as a 'petty fault' in their distant past with self-guilty feeling, b) the present directive lie motif: the patients say repeatedly 'I have lied' (about their present speech and behavior), retreating from their previous commitments. The observed false confessions of innocent fault by the patients seem to belong to the present directive lie motif. In comparison with the lie motif in depression, it is characteristic for the lie motif in schizophrenia that the patients feel themselves to already have been caught out by others before they confess the lie. The lie motif in schizophrenia seems to come into being through the attribution process of taking the others' blame on one's own shoulders, which has been pointed out to be common in the guilt experience in schizophrenia. The others' blame on this occasion is due to "the others' gaze" in the experience of the initial self-centralization (i.e. non-delusional self-referential experience) in the early stage of schizophrenia (S. Kato 1999). The others' gaze is supposed to bring about the feeling of amorphous self-revelation which could also be regarded as the guilt feeling without content, to the patients. When the guilt feeling is bound with a past concrete fault, the patients tell the past directive lie motif. On the other hand, when the patients cannot find a past fixed content, and feel their present actions as uncertain and experience them as lies, the

  9. Motifs in brain networks.

    PubMed

    Sporns, Olaf; Kötter, Rolf

    2004-11-01

    Complex brains have evolved a highly efficient network architecture whose structural connectivity is capable of generating a large repertoire of functional states. We detect characteristic network building blocks (structural and functional motifs) in neuroanatomical data sets and identify a small set of structural motifs that occur in significantly increased numbers. Our analysis suggests the hypothesis that brain networks maximize both the number and the diversity of functional motifs, while the repertoire of structural motifs remains small. Using functional motif number as a cost function in an optimization algorithm, we obtain network topologies that resemble real brain networks across a broad spectrum of structural measures, including small-world attributes. These results are consistent with the hypothesis that highly evolved neural architectures are organized to maximize functional repertoires and to support highly efficient integration of information.

  10. Motifs in Brain Networks

    PubMed Central

    2004-01-01

    Complex brains have evolved a highly efficient network architecture whose structural connectivity is capable of generating a large repertoire of functional states. We detect characteristic network building blocks (structural and functional motifs) in neuroanatomical data sets and identify a small set of structural motifs that occur in significantly increased numbers. Our analysis suggests the hypothesis that brain networks maximize both the number and the diversity of functional motifs, while the repertoire of structural motifs remains small. Using functional motif number as a cost function in an optimization algorithm, we obtain network topologies that resemble real brain networks across a broad spectrum of structural measures, including small-world attributes. These results are consistent with the hypothesis that highly evolved neural architectures are organized to maximize functional repertoires and to support highly efficient integration of information. PMID:15510229

  11. Redox active motifs in selenoproteins

    PubMed Central

    Li, Fei; Lutz, Patricia B.; Pepelyayeva, Yuliya; Arnér, Elias S. J.; Bayse, Craig A.; Rozovsky, Sharon

    2014-01-01

    Selenoproteins use the rare amino acid selenocysteine (Sec) to act as the first line of defense against oxidants, which are linked to aging, cancer, and neurodegenerative diseases. Many selenoproteins are oxidoreductases in which the reactive Sec is connected to a neighboring Cys and able to form a ring. These Sec-containing redox motifs govern much of the reactivity of selenoproteins. To study their fundamental properties, we have used 77Se NMR spectroscopy in concert with theoretical calculations to determine the conformational preferences and mobility of representative motifs. This use of 77Se as a probe enables the direct recording of the properties of Sec as its environment is systematically changed. We find that all motifs have several ring conformations in their oxidized state. These ring structures are most likely stabilized by weak, nonbonding interactions between the selenium and the amide carbon. To examine how the presence of selenium and ring geometric strain governs the motifs’ reactivity, we measured the redox potentials of Sec-containing motifs and their corresponding Cys-only variants. The comparisons reveal that for C-terminal motifs the redox potentials increased between 20–25 mV when the selenenylsulfide bond was changed to a disulfide bond. Changes of similar magnitude arose when we varied ring size or the motifs’ flanking residues. This suggests that the presence of Sec is not tied to unusually low redox potentials. The unique roles of selenoproteins in human health and their chemical reactivities may therefore not necessarily be explained by lower redox potentials, as has often been claimed. PMID:24769567
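
    To give a sense of scale for the 20–25 mV shifts reported above, the standard relation between a redox potential difference and free energy can be worked through. This is an illustrative calculation, not taken from the paper; it assumes a two-electron couple (n = 2) and T = 298 K.

```latex
\begin{aligned}
\Delta(\Delta G^{\circ}) &= -\,n F\,\Delta E^{\circ},
  \qquad n = 2,\quad F = 96\,485\ \mathrm{C\,mol^{-1}} \\
\left|\Delta(\Delta G^{\circ})\right|
  &= 2 \times 96\,485 \times (0.020\ \text{to}\ 0.025)\ \mathrm{J\,mol^{-1}}
  \approx 3.9\ \text{to}\ 4.8\ \mathrm{kJ\,mol^{-1}} \\
\frac{K_2}{K_1} &= \exp\!\left(\frac{n F\,\Delta E^{\circ}}{RT}\right)
  \approx 4.7\ \text{to}\ 7.0
  \qquad (RT \approx 2.48\ \mathrm{kJ\,mol^{-1}})
\end{aligned}
```

    Under these assumptions, a shift of this size changes the redox equilibrium constant by less than one order of magnitude, which is consistent with the abstract's point that Sec is not tied to unusually low redox potentials.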

  12. A short conserved motif in ALYREF directs cap- and EJC-dependent assembly of export complexes on spliced mRNAs.

    PubMed

    Gromadzka, Agnieszka M; Steckelberg, Anna-Lena; Singh, Kusum K; Hofmann, Kay; Gehring, Niels H

    2016-03-18

    The export of messenger RNAs (mRNAs) is the final of several nuclear posttranscriptional steps of gene expression. The formation of export-competent mRNPs involves the recruitment of export factors that are assumed to facilitate transport of the mature mRNAs. Using in vitro splicing assays, we show that a core set of export factors, including ALYREF, UAP56 and DDX39, readily associate with the spliced RNAs in an EJC (exon junction complex)- and cap-dependent manner. In order to elucidate how ALYREF and other export adaptors mediate mRNA export, we conducted a computational analysis and discovered four short, conserved, linear motifs present in RNA-binding proteins. We show that mutation in one of the new motifs (WxHD) in an unstructured region of ALYREF reduced RNA binding and abolished the interaction with eIF4A3 and CBP80. Additionally, the mutation impaired proper localization to nuclear speckles and export of a spliced reporter mRNA. Our results reveal important details of the orchestrated recruitment of export factors during the formation of export competent mRNPs. PMID:26773052

  13. A short conserved motif in ALYREF directs cap- and EJC-dependent assembly of export complexes on spliced mRNAs

    PubMed Central

    Gromadzka, Agnieszka M.; Steckelberg, Anna-Lena; Singh, Kusum K.; Hofmann, Kay; Gehring, Niels H.

    2016-01-01

    The export of messenger RNAs (mRNAs) is the final of several nuclear posttranscriptional steps of gene expression. The formation of export-competent mRNPs involves the recruitment of export factors that are assumed to facilitate transport of the mature mRNAs. Using in vitro splicing assays, we show that a core set of export factors, including ALYREF, UAP56 and DDX39, readily associate with the spliced RNAs in an EJC (exon junction complex)- and cap-dependent manner. In order to elucidate how ALYREF and other export adaptors mediate mRNA export, we conducted a computational analysis and discovered four short, conserved, linear motifs present in RNA-binding proteins. We show that mutation in one of the new motifs (WxHD) in an unstructured region of ALYREF reduced RNA binding and abolished the interaction with eIF4A3 and CBP80. Additionally, the mutation impaired proper localization to nuclear speckles and export of a spliced reporter mRNA. Our results reveal important details of the orchestrated recruitment of export factors during the formation of export competent mRNPs. PMID:26773052

  14. Flanking regulatory sequences of the locus encoding the murine GDNF receptor, c-ret, directs lac Z (beta-galactosidase) expression in developing somatosensory system.

    PubMed

    Sukumaran, M; Waxman, S G; Wood, J N; Pachnis, V

    2001-11-01

    RET forms the catalytic component within the receptor complex that transmits signals from the GDNF family of neurotrophic factors. To study the mechanisms regulating the cell-type specific expression of this gene, we have cloned and characterised the murine c-ret locus. A cosmid contig comprising approximately 60 kb of the mouse genome encompassing the entire structural gene and flanking sequences has been isolated, the transcription initiation site identified, and the promoter characterised. The murine c-ret promoter lacks a TATA initiation motif and has GC enriched DNA sequences reminiscent of CpG islands. Analysis of transgenic mouse lines bearing the Lac Z (beta-galactosidase) reporter gene under the control of 5' flanking sequences shows modularity in the organisation of cis-regulatory domains within the locus. Cloned 5' flanking sequences comprise a distal regulatory domain directing Lac Z expression at the primitive streak, lateral mesoderm and facial ganglia and a proximal sensory neurone-specific regulatory domain inducing Lac Z expression primarily within the developing somatosensory system. The spatial and temporal progression of transgene expression precisely recapitulates endogenous gene expression in developing sensory ganglia including its induction in postnatal Isolectin B4 binding nociceptive neurones. PMID:11747074

  15. Stochastic motif extraction using hidden Markov model

    SciTech Connect

    Fujiwara, Yukiko; Asogawa, Minoru; Konagaya, Akihiko

    1994-12-31

    In this paper, we study the application of an HMM (hidden Markov model) to the problem of representing protein sequences by a stochastic motif. A stochastic protein motif represents the small segments of protein sequences that have a certain function or structure. The stochastic motif, represented by an HMM, has conditional probabilities to deal with the stochastic nature of the motif. This HMM directive reflects the characteristics of the motif, such as a protein periodical structure or grouping. In order to obtain the optimal HMM, we developed the "iterative duplication method" for HMM topology learning. It starts from a small fully-connected network and iterates the network generation and parameter optimization until it achieves sufficient discrimination accuracy. Using this method, we obtained an HMM for a leucine zipper motif. Compared with a symbolic pattern representation, which reached an accuracy of 14.8 percent, the HMM achieved 79.3 percent in prediction. Additionally, the method can obtain an HMM for various types of zinc finger motifs, and it might separate the mixed data. We demonstrated that this approach is applicable to the validation of the protein databases; a constructed HMM has indicated that one protein sequence annotated as "leucine-zipper like sequence" in the database is quite different from other leucine-zipper sequences in terms of likelihood, and we found this discrimination is plausible.

  16. No tradeoff between versatility and robustness in gene circuit motifs

    NASA Astrophysics Data System (ADS)

    Payne, Joshua L.

    2016-05-01

    Circuit motifs are small directed subgraphs that appear in real-world networks significantly more often than in randomized networks. In the Boolean model of gene circuits, most motifs are realized by multiple circuit genotypes. Each of a motif's constituent circuit genotypes may have one or more functions, which are embodied in the expression patterns the circuit forms in response to specific initial conditions. Recent enumeration of a space of nearly 17 million three-gene circuit genotypes revealed that all circuit motifs have more than one function, with the number of functions per motif ranging from 12 to nearly 30,000. This indicates that some motifs are more functionally versatile than others. However, the individual circuit genotypes that constitute each motif are less robust to mutation if they have many functions, hinting that functionally versatile motifs may be less robust to mutation than motifs with few functions. Here, I explore the relationship between versatility and robustness in circuit motifs, demonstrating that functionally versatile motifs are robust to mutation despite the inherent tradeoff between versatility and robustness at the level of an individual circuit genotype.

  17. Subgraphs and network motifs in geometric networks

    NASA Astrophysics Data System (ADS)

    Itzkovitz, Shalev; Alon, Uri

    2005-02-01

    Many real-world networks describe systems in which interactions decay with the distance between nodes. Examples include systems constrained in real space such as transportation and communication networks, as well as systems constrained in abstract spaces such as multivariate biological or economic data sets and models of social networks. These networks often display network motifs: subgraphs that recur in the network much more often than in randomized networks. To understand the origin of the network motifs in these networks, it is important to study the subgraphs and network motifs that arise solely from geometric constraints. To address this, we analyze geometric network models, in which nodes are arranged on a lattice and edges are formed with a probability that decays with the distance between nodes. We present analytical solutions for the numbers of all three- and four-node subgraphs, in both directed and nondirected geometric networks. We also analyze geometric networks with arbitrary degree sequences and models with a bias for directed edges in one direction. Scaling rules for scaling of subgraph numbers with system size, lattice dimension, and interaction range are given. Several invariant measures are found, such as the ratio of feedback and feed-forward loops, which do not depend on system size, dimension, or connectivity function. We find that network motifs in many real-world networks, including social networks and neuronal networks, are not captured solely by these geometric models. This is in line with recent evidence that biological network motifs were selected as basic circuit elements with defined information-processing functions.
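
    The geometric network model described above can be sketched in a few lines. The exponential decay function, the square lattice and the parameter names (`side`, `p0`, `r`) are assumptions made for illustration; the paper treats more general connectivity functions and gives analytical results rather than simulations.

```python
import math
import random


def geometric_digraph(side, p0, r, rng):
    """Directed graph on a side x side lattice.

    Each ordered pair (u, v), u != v, is linked independently with probability
    p0 * exp(-d / r), where d is the Euclidean distance between u and v.
    """
    nodes = [(x, y) for x in range(side) for y in range(side)]
    succ = {n: set() for n in nodes}
    for u in nodes:
        for v in nodes:
            if u != v and rng.random() < p0 * math.exp(-math.dist(u, v) / r):
                succ[u].add(v)
    return succ


def count_loops(succ):
    """Return (feed-forward loops, feedback loops) among three distinct nodes."""
    ffl = fbl = 0
    for a in succ:
        for b in succ[a]:
            for c in succ[b]:
                if c == a or c == b:
                    continue
                if c in succ[a]:
                    ffl += 1  # a->b, b->c, a->c
                if a in succ[c]:
                    fbl += 1  # a->b, b->c, c->a, seen once from each of its 3 nodes
    return ffl, fbl // 3


rng = random.Random(0)
graph = geometric_digraph(side=8, p0=0.8, r=1.5, rng=rng)
ffl, fbl = count_loops(graph)
```

    Because each direction of an edge is drawn with the same probability in this sketch, the expected number of feedback loops is one third of the expected number of feed-forward loops regardless of `side`, `p0` or `r`; this is a property of the symmetric toy model and is offered only as an example of the kind of invariant ratio the abstract mentions.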

  18. Structural and Mechanistic Analysis of Trichodiene Synthase Using Site-Directed Mutagenesis: Probing the Catalytic Function of Tyrosine-295 and the Asparagine-225/Serine-229/Glutamate-233-Mg2+ B Motif

    SciTech Connect

    Vedula, L.; Jiang, J.; Zakharian, T.; Cane, D.; Christianson, D.

    2008-01-01

    Trichodiene synthase from Fusarium sporotrichioides contains two metal ion-binding motifs required for the cyclization of farnesyl diphosphate: the 'aspartate-rich' motif D100DXX(D/E) that coordinates to Mg2+A and Mg2+C, and the 'NSE/DTE' motif N225DXXSXXXE that chelates Mg2+B (boldface indicates metal ion ligands). Here, we report steady-state kinetic parameters, product array analyses, and X-ray crystal structures of trichodiene synthase mutants in which the fungal NSE motif is progressively converted into a plant-like DDXXTXXXE motif, resulting in a degradation in both steady-state kinetic parameters and product specificity. Each catalytically active mutant generates a different distribution of sesquiterpene products, and three newly detected sesquiterpenes are identified. In addition, the kinetic and structural properties of the Y295F mutant of trichodiene synthase were found to be similar to those of the wild-type enzyme, thereby ruling out a proposed role for Y295 in catalysis.

  19. Structural and mechanistic analysis of trichodiene synthase using site-directed mutagenesis: probing the catalytic function of tyrosine-295 and the asparagine-225/serine-229/glutamate-233-Mg2+B motif.

    PubMed

    Vedula, L Sangeetha; Jiang, Jiaoyang; Zakharian, Tatiana; Cane, David E; Christianson, David W

    2008-01-15

    Trichodiene synthase from Fusarium sporotrichioides contains two metal ion-binding motifs required for the cyclization of farnesyl diphosphate: the "aspartate-rich" motif D(100)DXX(D/E) that coordinates to Mg2+A and Mg2+C, and the "NSE/DTE" motif N(225)DXXSXXXE that chelates Mg2+B (boldface indicates metal ion ligands). Here, we report steady-state kinetic parameters, product array analyses, and X-ray crystal structures of trichodiene synthase mutants in which the fungal NSE motif is progressively converted into a plant-like DDXXTXXXE motif, resulting in a degradation in both steady-state kinetic parameters and product specificity. Each catalytically active mutant generates a different distribution of sesquiterpene products, and three newly detected sesquiterpenes are identified. In addition, the kinetic and structural properties of the Y295F mutant of trichodiene synthase were found to be similar to those of the wild-type enzyme, thereby ruling out a proposed role for Y295 in catalysis. PMID:17996718

  20. A G-Box-Like Motif Is Necessary for Transcriptional Regulation by Circadian Pseudo-Response Regulators in Arabidopsis

    PubMed Central

    Newton, Linsey; Liu, Ming-Jung

    2016-01-01

    PSEUDO-RESPONSE REGULATORs (PRRs) play overlapping and distinct roles in maintaining circadian rhythms and regulating diverse biological processes, including the photoperiodic control of flowering, growth, and abiotic stress responses. PRRs act as transcriptional repressors and associate with chromatin via their conserved C-terminal CCT (CONSTANS, CONSTANS-like, and TIMING OF CAB EXPRESSION 1 [TOC1/PRR1]) domains by a still-poorly understood mechanism. Here, we identified genome-wide targets of PRR9 using chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) and compared them with PRR7, PRR5, and TOC1/PRR1 ChIP-seq data. We found that PRR binding sites are located within genomic regions of low nucleosome occupancy and high DNase I hypersensitivity. Moreover, conserved noncoding regions among Brassicaceae species are enriched around PRR binding sites, indicating that PRRs associate with functionally relevant cis-regulatory regions. The PRRs shared a significant number of binding regions, and our results indicate that they coordinately restrict the expression of target genes to around dawn. A G-box-like motif was overrepresented at PRR binding regions, and we showed that this motif is necessary for mediating transcriptional regulation of CIRCADIAN CLOCK ASSOCIATED 1 and PRR9 by the PRRs. Our results further our understanding of how PRRs target specific promoters and provide an extensive resource for studying circadian regulatory networks in plants. PMID:26586835

  1. Second-site long terminal repeat (LTR) revertants of replication-defective human immunodeficiency virus: effects of revertant TATA box motifs on virus infectivity, LTR-directed expression, in vitro RNA synthesis, and binding of basal transcription factors TFIID and TFIIA.

    PubMed Central

    Kashanchi, F; Shibata, R; Ross, E K; Brady, J N; Martin, M A

    1994-01-01

    Second-site revertants from replication-incompetent molecular clones of human immunodeficiency virus (HIV) contain base substitutions adjacent to the TATA motif. The altered TATA box motifs were analyzed for their effect(s) on virus infectivity, long terminal repeat (LTR)-directed expression in transient transfection assays, in vitro RNA synthesis, and assembly of the TFIID-TFIIA preinitiation complex. The revertant TATA boxes accelerated the kinetics of HIV replication when present in the context of an LTR containing a Sp1 mutation (deletion or site specific); no effect was observed on the infectivity of wild-type HIV. In chloramphenicol acetyltransferase assays and in vitro transcription systems, the altered TATA box motifs led to elevated basal levels of RNA synthesis from NF-kappa B- and Sp1-mutagenized and wild-type templates, respectively, but did not increase responsiveness to Tat transactivation. The revertant TATA boxes accelerated the binding of TFIID and TFIIA to the LTR and stabilized their association with the promoter. The revertants did not assemble a more-processive elongation complex. These results suggest that in the context of an impaired enhancer/promoter (viz., three mutated Sp1 elements), a series of HIV revertants emerge which contain LTR alterations that significantly augment basal RNA synthesis. The TATA motif revertants are capable of rescuing the enhancer/promoter defect and sustain virus infectivity. PMID:8151790

  2. A generalization of substitution evolution models of nucleotides to genetic motifs.

    PubMed

    Benard, Emmanuel; Michel, Christian J

    2011-11-01

    We generalize here the classical stochastic substitution models of nucleotides to genetic motifs of any size. This generalized model gives the analytical occurrence probabilities of genetic motifs as a function of a substitution matrix containing up to three formal parameters (substitution rates) per motif site and of an initial occurrence probability vector of genetic motifs. The evolution direction can be direct (past-present) or inverse (present-past). This extension has been made due to the identification of a Kronecker relation between the nucleotide substitution matrices and the motif substitution matrices. The evolution models for motifs of size 4 (tetranucleotides) and 5 (pentanucleotides) are now included in the SEGM (Stochastic Evolution of Genetic Motifs) web server.
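
    The Kronecker relation mentioned in this abstract can be illustrated with a toy calculation. The sketch below is not the SEGM implementation: it assumes a discrete-step model with a single substitution probability per site (the abstract allows up to three rate parameters per site) and assumes that sites evolve independently, so that the dinucleotide matrix is the Kronecker product of two single-site matrices. All function names and parameter values are hypothetical.

```python
NUCLEOTIDES = "ACGT"


def site_matrix(p):
    """One-step substitution probabilities for one site (each change has probability p)."""
    return [[1 - 3 * p if i == j else p for j in range(4)] for i in range(4)]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]


def evolve(vector, matrix, steps):
    """Propagate a row vector of occurrence probabilities through `steps` substitutions."""
    for _ in range(steps):
        vector = [
            sum(vector[i] * matrix[i][j] for i in range(len(vector)))
            for j in range(len(matrix[0]))
        ]
    return vector


# Dinucleotides in the same order as the rows and columns of the Kronecker product.
dinucleotides = [x + y for x in NUCLEOTIDES for y in NUCLEOTIDES]
P2 = kron(site_matrix(0.1), site_matrix(0.2))  # 16 x 16 motif matrix

# Start from the motif AA with probability 1 and apply one substitution step.
start = [1.0 if d == "AA" else 0.0 for d in dinucleotides]
after_one = evolve(start, P2, 1)
print(dict(zip(dinucleotides, after_one)))
```

    Because each single-site matrix has rows summing to one, the 16 x 16 product does as well, so the evolved vector remains a probability distribution over dinucleotides.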

  3. Triadic motifs in the dependence networks of virtual societies.

    PubMed

    Xie, Wen-Jie; Li, Ming-Xia; Jiang, Zhi-Qiang; Zhou, Wei-Xing

    2014-06-10

    In friendship networks, individuals have different numbers of friends, and the closeness or intimacy between an individual and her friends is heterogeneous. Using a statistical filtering method to identify relationships about who depends on whom, we construct dependence networks (which are directed) from weighted friendship networks of avatars in more than two hundred virtual societies of a massively multiplayer online role-playing game (MMORPG). We investigate the evolution of triadic motifs in dependence networks. Several metrics show that the virtual societies evolved through a transient stage in the first two to three weeks and reached a relatively stable stage. We find that the unidirectional loop motif (M9) is underrepresented and does not appear, open motifs are also underrepresented, while other close motifs are overrepresented. We also find that, for most motifs, the overall level difference of the three avatars in the same motif is significantly lower than average, whereas the sum of ranks is only slightly larger than average. Our findings show that avatars' social status plays an important role in the formation of triadic motifs.

  4. Triadic motifs in the dependence networks of virtual societies

    NASA Astrophysics Data System (ADS)

    Xie, Wen-Jie; Li, Ming-Xia; Jiang, Zhi-Qiang; Zhou, Wei-Xing

    2014-06-01

    In friendship networks, individuals have different numbers of friends, and the closeness or intimacy between an individual and her friends is heterogeneous. Using a statistical filtering method to identify relationships about who depends on whom, we construct dependence networks (which are directed) from weighted friendship networks of avatars in more than two hundred virtual societies of a massively multiplayer online role-playing game (MMORPG). We investigate the evolution of triadic motifs in dependence networks. Several metrics show that the virtual societies evolved through a transient stage in the first two to three weeks and reached a relatively stable stage. We find that the unidirectional loop motif (M9) is underrepresented and does not appear, open motifs are also underrepresented, while other close motifs are overrepresented. We also find that, for most motifs, the overall level difference of the three avatars in the same motif is significantly lower than average, whereas the sum of ranks is only slightly larger than average. Our findings show that avatars' social status plays an important role in the formation of triadic motifs.

  5. Triadic motifs in the dependence networks of virtual societies.

    PubMed

    Xie, Wen-Jie; Li, Ming-Xia; Jiang, Zhi-Qiang; Zhou, Wei-Xing

    2014-01-01

    In friendship networks, individuals have different numbers of friends, and the closeness or intimacy between an individual and her friends is heterogeneous. Using a statistical filtering method to identify relationships about who depends on whom, we construct dependence networks (which are directed) from weighted friendship networks of avatars in more than two hundred virtual societies of a massively multiplayer online role-playing game (MMORPG). We investigate the evolution of triadic motifs in dependence networks. Several metrics show that the virtual societies evolved through a transient stage in the first two to three weeks and reached a relatively stable stage. We find that the unidirectional loop motif (M9) is underrepresented and does not appear, open motifs are also underrepresented, while other close motifs are overrepresented. We also find that, for most motifs, the overall level difference of the three avatars in the same motif is significantly lower than average, whereas the sum of ranks is only slightly larger than average. Our findings show that avatars' social status plays an important role in the formation of triadic motifs. PMID:24912755

  6. [Prediction of Promoter Motifs in Virophages].

    PubMed

    Gong, Chaowen; Zhou, Xuewen; Pan, Yingjie; Wang, Yongjie

    2015-07-01

    Virophages have crucial roles in ecosystems and are transport vectors of genetic material. To shed light on regulation and control mechanisms in virophage-host systems as well as evolution between virophages and their hosts, the promoter motifs of virophages were predicted on the upstream regions of start codons using an analytical tool for prediction of promoter motifs: Multiple EM for Motif Elicitation (MEME). Seventeen potential promoter motifs were identified based on the E-value, location, number and length of promoters in genomes. Sputnik and zamilon motif 2 with AT-rich regions were distributed widely on genomes, suggesting that these motifs may be associated with regulation of the expression of various genes. Motifs containing the TCTA box were predicted to be late promoter motifs in mavirus; motifs containing the ATCT box were the potential late promoter motifs in the Ace Lake mavirus. AT-rich regions were identified on motif 2 in the Organic Lake virophage, motif 3 in Yellowstone Lake virophage (YSLV)1 and 2, motif 1 in YSLV3, and motifs 1 and 2 in YSLV4, respectively. AT-rich regions were distributed widely on the genomes of virophages. All of these motifs may be promoter motifs of virophages. Our results provide insights into further exploration of temporal expression of genes in virophages as well as associations between virophages and giant viruses. PMID:26524912

  7. From Cis-Regulatory Elements to Complex RNPs and Back

    PubMed Central

    Gebauer, Fátima; Preiss, Thomas; Hentze, Matthias W.

    2012-01-01

    Messenger RNAs (mRNAs), the templates for translation, have evolved to harbor abundant cis-acting sequences that affect their posttranscriptional fates. These elements are frequently located in the untranslated regions and serve as binding sites for trans-acting factors, RNA-binding proteins, and/or small non-coding RNAs. This article provides a systematic synopsis of cis-acting elements, trans-acting factors, and the mechanisms by which they affect translation. It also highlights recent technical advances that have ushered in the era of transcriptome-wide studies of the ribonucleoprotein complexes formed by mRNAs and their trans-acting factors. PMID:22751153

  8. Sequential visibility-graph motifs

    NASA Astrophysics Data System (ADS)

    Iacovacci, Jacopo; Lacasa, Lucas

    2016-04-01

    Visibility algorithms transform time series into graphs and encode dynamical information in their topology, paving the way for graph-theoretical time series analysis as well as building a bridge between nonlinear dynamics and network science. In this work we introduce and study the concept of sequential visibility-graph motifs, smaller substructures of n consecutive nodes that appear with characteristic frequencies. We develop a theory to compute in an exact way the motif profiles associated with general classes of deterministic and stochastic dynamics. We find that this simple property is indeed a highly informative and computationally efficient feature capable of distinguishing among different dynamics and robust against noise contamination. We finally confirm that it can be used in practice to perform unsupervised learning, by extracting motif profiles from experimental heart-rate series and being able, accordingly, to disentangle meditative from other relaxation states. Applications of this general theory include the automatic classification and description of physical, biological, and financial time series.
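
    A sequential motif profile of the kind described in this abstract can be sketched in a few lines. The sketch assumes the horizontal visibility criterion (two points see each other when every value between them is strictly lower than both); the abstract does not say which visibility variant is used, and the function names are hypothetical. Visibility between two points depends only on the values between them, so each window of n consecutive values can be processed on its own.

```python
from collections import Counter


def hvg_edges(window):
    """Edges of the horizontal visibility graph on one window, as index pairs."""
    edges = []
    for i in range(len(window)):
        for j in range(i + 1, len(window)):
            lower = min(window[i], window[j])
            # i and j are linked if all intermediate values lie strictly below both.
            if all(window[k] < lower for k in range(i + 1, j)):
                edges.append((i, j))
    return tuple(edges)


def motif_profile(series, n=4):
    """Relative frequency of each n-node sequential motif along the series."""
    counts = Counter(
        hvg_edges(series[s:s + n]) for s in range(len(series) - n + 1)
    )
    total = sum(counts.values())
    return {motif: count / total for motif, count in counts.items()}


profile = motif_profile([1, 2, 3, 4, 5], n=4)
print(profile)
```

    For a monotonic series every window reduces to a simple chain of consecutive links, so the profile contains a single motif with frequency 1.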

  9. Unravelling daily human mobility motifs.

    PubMed

    Schneider, Christian M; Belik, Vitaly; Couronné, Thomas; Smoreda, Zbigniew; González, Marta C

    2013-07-01

    Human mobility is differentiated by time scales. While the mechanism for long time scales has been studied, the underlying mechanism on the daily scale is still unrevealed. Here, we uncover the mechanism responsible for the daily mobility patterns by analysing the temporal and spatial trajectories of thousands of persons as individual networks. Using the concept of motifs from network theory, we find only 17 unique networks are present in daily mobility and they follow simple rules. These networks, called here motifs, are sufficient to capture up to 90 per cent of the population in surveys and mobile phone datasets for different countries. Each individual exhibits a characteristic motif, which seems to be stable over several months. Consequently, daily human mobility can be reproduced by an analytically tractable framework for Markov chains by modelling periods of high-frequency trips followed by periods of lower activity as the key ingredient.

  10. Neural Circuits: Male Mating Motifs.

    PubMed

    Benton, Richard

    2015-09-01

    Characterizing microcircuit motifs in intact nervous systems is essential to relate neural computations to behavior. In this issue of Neuron, Clowney et al. (2015) identify recurring, parallel feedforward excitatory and inhibitory pathways in male Drosophila's courtship circuitry, which might explain decisive mate choice.

  11. Combinatorial Information Theoretical Measurement of the Semantic Significance of Semantic Graph Motifs

    SciTech Connect

    Joslyn, Cliff A.; al-Saffar, Sinan; Haglin, David J.; Holder, Larry

    2011-06-14

    Given an arbitrary semantic graph data set, perhaps one lacking in explicit ontological information, we wish to first identify its significant semantic structures, and then measure the extent of their significance. Casting a semantic graph dataset as an edge-labeled, directed graph, this task can be built on the ability to mine frequent labeled subgraphs in edge-labeled, directed graphs. We begin by considering the fundamentals of the enumerative combinatorics of subgraph motif structures in edge-labeled directed graphs. We identify its frequent labeled, directed subgraph motif patterns, and measure the significance of the resulting motifs by the information gain relative to the expected value of the motif based on the empirical frequency distribution of the link types which compose them, assuming independence. We illustrate the method on a small test graph, and discuss results obtained for small linear motifs (link type bigrams and trigrams) in a larger graph structure.
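
    One possible reading of the significance measure for link-type bigrams is a log-ratio of the observed bigram frequency to the frequency expected if link types were independent. The sketch below follows that reading only; the exact measure used by the authors is not given in the abstract, and the function name, link-type labels and toy data are hypothetical.

```python
import math
from collections import Counter


def bigram_information(link_types, bigrams):
    """log2(observed / expected) for each link-type bigram.

    `link_types` lists the label of every edge in the graph; `bigrams` lists
    the label pairs of every two-edge path. Expected frequencies assume the
    two labels are drawn independently from the empirical label distribution.
    Every label in `bigrams` is assumed to occur in `link_types`.
    """
    type_freq = Counter(link_types)
    bigram_freq = Counter(bigrams)
    scores = {}
    for (first, second), count in bigram_freq.items():
        observed = count / len(bigrams)
        expected = (type_freq[first] / len(link_types)) * (
            type_freq[second] / len(link_types)
        )
        scores[(first, second)] = math.log2(observed / expected)
    return scores


# Toy graph: two "authored" edges, two "cites" edges, and two paths that
# both follow an "authored" edge with a "cites" edge.
scores = bigram_information(
    ["cites", "cites", "authored", "authored"],
    [("authored", "cites"), ("authored", "cites")],
)
print(scores)
```

    In the toy data the bigram occurs four times as often as independence would predict, which corresponds to a score of 2 bits.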

  12. Observability of Neuronal Network Motifs

    PubMed Central

    Whalen, Andrew J.; Brennan, Sean N.; Sauer, Timothy D.; Schiff, Steven J.

    2014-01-01

    We quantify observability in small (3 node) neuronal networks as a function of 1) the connection topology and symmetry, 2) the measured nodes, and 3) the nodal dynamics (linear and nonlinear). We find that typical observability metrics for 3 neuron motifs range over several orders of magnitude, depending upon topology, and for motifs containing symmetry the network observability decreases when observing from particularly confounded nodes. Nonlinearities in the nodal equations generally decrease the average network observability and full network information becomes available only in limited regions of the system phase space. Our findings demonstrate that such networks are partially observable, and suggest their potential efficacy in reconstructing network dynamics from limited measurement data. How well such strategies can be used to reconstruct and control network dynamics in experimental settings is a subject for future experimental work. PMID:25909092
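
    For the linear case, the dependence of observability on topology and on the measured node can be illustrated with the standard Kalman rank test. This is a simplification: the abstract reports graded observability metrics spanning orders of magnitude and also treats nonlinear dynamics, neither of which is reproduced here. The coupling matrix and function names are hypothetical.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def rank(rows):
    """Matrix rank by Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def observability_rank(A, C):
    """Rank of [C; CA; ...; CA^(n-1)] for dx/dt = A x, y = C x."""
    rows, block = [], C
    for _ in range(len(A)):
        rows.extend(block)
        block = mat_mul(block, A)
    return rank(rows)


# Directed chain 1 -> 2 -> 3: node 2 is driven by node 1, node 3 by node 2.
chain = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]
print(observability_rank(chain, [[0, 0, 1]]))  # measuring node 3
print(observability_rank(chain, [[1, 0, 0]]))  # measuring node 1
```

    Measuring the downstream node gives full rank (3), so the whole chain can be reconstructed, whereas measuring the upstream node gives rank 1 because it receives no information from the other two.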

  13. The Thiamin Pyrophosphate-Motif

    NASA Technical Reports Server (NTRS)

    Dominiak, P.; Ciszak, E.

    2003-01-01

    Using databases the authors have identified a common thiamin pyrophosphate (TPP)-motif in the family of functionally diverse TPP-dependent enzymes. This common motif consists of multimeric organization of subunits and two catalytic centers. Each catalytic center (PP:PYR) is formed at the interface of the PP-domain binding the magnesium ion, pyrophosphate and aminopyrimidine ring of TPP, and the PYR-domain binding the aminopyrimidine ring of that cofactor. A pair of these catalytic centers constitutes the catalytic core (PP:PYR)2 within these enzymes. Analysis of the structural elements of this catalytic core reveals a novel definition of the common amino acid sequences, which are GXΦX4(G)ΦXXGQ and GDGX25-30NN in the PP-domain, and EX4(G)ΦXXGΦ in the PYR-domain, where Φ corresponds to a hydrophobic amino acid. This TPP-motif provides a novel tool for annotation of TPP-dependent enzymes useful in advancing functional proteomics.

  14. The Thiamin Pyrophosphate-Motif

    NASA Technical Reports Server (NTRS)

    Dominiak, Paulina M.; Ciszak, Ewa M.

    2003-01-01

    Using databases the authors have identified a common thiamin pyrophosphate (TPP)-motif in the family of functionally diverse TPP-dependent enzymes. This common motif consists of multimeric organization of subunits, two catalytic centers, common amino acid sequence, and specific contacts to provide a flip-flop, or alternate site, mechanism of action. Each catalytic center [PP:PYR] is formed at the interface of the PP-domain binding the magnesium ion, pyrophosphate and aminopyrimidine ring of TPP, and the PYR-domain binding the aminopyrimidine ring of that cofactor. A pair of these catalytic centers constitutes the catalytic core [PP:PYR]2 within these enzymes. Analysis of the structural elements of this catalytic core reveals a novel definition of the common amino acid sequences, which are GXΦX4(G)ΦXXGQ and GDGX25-30 within the PP-domain, and EX4(G)ΦXXGΦ within the PYR-domain, where Φ corresponds to a hydrophobic amino acid. This TPP-motif provides a novel tool for annotation of TPP-dependent enzymes useful in advancing functional proteomics.

  15. A comprehensive analysis of the La-motif protein superfamily.

    PubMed

    Bousquet-Antonelli, Cécile; Deragon, Jean-Marc

    2009-05-01

    The extremely well-conserved La motif (LAM), in synergy with the immediately following RNA recognition motif (RRM), allows direct binding of the (genuine) La autoantigen to RNA polymerase III primary transcripts. This motif is not only found on La homologs, but also on La-related proteins (LARPs) of unrelated function. LARPs are widely found amongst eukaryotes and, although poorly characterized, appear to be RNA-binding proteins fulfilling crucial cellular functions. We searched the fully sequenced genomes of 83 eukaryotic species scattered along the tree of life for the presence of LAM-containing proteins. We observed that these proteins are absent from archaea and present in all eukaryotes (except protists from the Plasmodium genus), strongly suggesting that the LAM is an ancestral motif that emerged early after the archaea-eukarya radiation. A complete evolutionary and structural analysis of these proteins resulted in their classification into five families: the genuine La homologs and four LARP families. Unexpectedly, in each family a conserved domain representing either a classical RRM or an RRM-like motif immediately follows the LAM of most proteins. An evolutionary analysis of the LAM-RRM/RRM-L regions shows that these motifs co-evolved and should be used as a single entity to define the functional region of interaction of LARPs with their substrates. We also found two extremely well conserved motifs, named LSA and DM15, shared by LARP6 and LARP1 family members, respectively. We suggest that members of the same family are functional homologs and/or share a common molecular mode of action on different RNA baits.

  16. RNA motif discovery: a computational overview.

    PubMed

    Achar, Avinash; Sætrom, Pål

    2015-01-01

    Genomic studies have greatly expanded our knowledge of structural non-coding RNAs (ncRNAs). These RNAs fold into characteristic secondary structures and perform specific structure-dependent biological functions. Hence RNA secondary structure prediction is one of the most well studied problems in computational RNA biology. Comparative sequence analysis is one of the more reliable RNA structure prediction approaches as it exploits information from multiple related sequences to infer the consensus secondary structure. This class of methods essentially learns a global secondary structure from the input sequences. In this paper, we consider the more general problem of unearthing common local secondary-structure-based patterns from a set of related sequences. The input sequences could, for example, correspond to 3' or 5' untranslated regions of a set of orthologous genes, and the unearthed local patterns could correspond to regulatory motifs found in these regions. These sequences could also correspond to in vitro selected RNA, genomic segments housing ncRNA genes from the same family, and so on. Here, we give a detailed review of the various computational techniques proposed in the literature attempting to solve this general motif discovery problem. We also give empirical comparisons of some of the current state-of-the-art methods and point out future directions of research.

  17. Detecting correlations among functional-sequence motifs

    NASA Astrophysics Data System (ADS)

    Pirino, Davide; Rigosa, Jacopo; Ledda, Alice; Ferretti, Luca

    2012-06-01

    Sequence motifs are words of nucleotides in DNA with biological functions, e.g., gene regulation. Identification of such words proceeds through rejection of Markov models on the expected motif frequency along the genome. Additional biological information can be extracted from the correlation structure among patterns of motif occurrences. In this paper a log-linear multivariate intensity Poisson model is estimated via expectation maximization on a set of motifs along the genome of E. coli K12. The proposed approach allows for excitatory as well as inhibitory interactions among motifs and between motifs and other genomic features like gene occurrences. Our findings confirm previous stylized facts about such types of interactions and shed new light on genome-maintenance functions of some particular motifs. We expect these methods to be applicable to a wider set of genomic features.

  18. Regulatory motifs in Chk1

    PubMed Central

    Caparelli, Michael L.; O’Connell, Matthew J.

    2013-01-01

    Chk1 is the effector kinase of the G2 DNA damage checkpoint. Chk1 homologs possess a highly conserved N-terminal kinase domain and a less conserved C-terminal regulatory domain. In response to DNA damage, Chk1 is recruited to mediator proteins assembled at lesions on replication protein A (RPA)-coated single-stranded DNA (ssDNA). Chk1 is then activated by phosphorylation on S345 in the C-terminal regulatory domain by the PI3 kinase-related kinases ATM and ATR to enforce a G2 cell cycle arrest to allow time for DNA repair. Models have emerged in which this C-terminal phosphorylation relieves auto-inhibitory regulation of the kinase domain by the regulatory domain. However, experiments in fission yeast have shown that deletion of this putative auto-inhibitory domain actually inactivates Chk1 function. We show here that Chk1 homologs possess a kinase-associated 1 (KA1) domain that possesses residues previously implicated in Chk1 auto-inhibition. In addition, all Chk1 homologs have a small and highly conserved C-terminal extension (CTE domain). In fission yeast, both of these motifs are essential for Chk1 activation through interaction with the mediator protein Crb2, the homolog of human 53BP1. Thus, through different intra- and intermolecular interactions, these motifs explain why the regulatory domain exerts both positive and negative control over Chk1 activation. Such motifs may provide alternative targets to the ATP-binding pocket on which to dock Chk1 inhibitors as anticancer therapeutics. PMID:23422000

  19. Structural Motifs of Gold Nanoparticles.

    NASA Astrophysics Data System (ADS)

    Cleveland, C. L.; Luedtke, W. D.; Landman, Uzi

    1996-03-01

    Through an extensive search involving energy minimization using embedded atom potentials, we found (R.L. Whetten et al., submitted to Nature, 1995) that the energetically optimal sequence for AuN clusters (30 <= N <= 3000 atoms) consists of fcc crystallites with a truncated-octahedral (TO) morphological motif, and variants thereof. These predictions for bare gold particles, and for particles coated by self-assembled thiol monolayers, are discussed in light of recent experiments on the preparation and characterization (including mass spectrometry, electron microscopy, and X-ray diffraction) of nanocrystalline gold molecules (see Ref. 2).

  20. An Algorithm for Motif Discovery with Iteration on Lengths of Motifs.

    PubMed

    Fan, Yetian; Wu, Wei; Yang, Jie; Yang, Wenyu; Liu, Rongrong

    2015-01-01

    Analysis of DNA sequence motifs is becoming increasingly important in the study of gene regulation, and the identification of motifs in DNA sequences is a complex problem in computational biology. Motif discovery has attracted the attention of more and more researchers, and a variety of algorithms have been proposed. Most existing motif discovery algorithms fix the motif's length as one of the input parameters. In this paper, a novel method is proposed to identify the optimal length of the motif and the optimal motif with that length, through an iteration process on increasing motif lengths. For each fixed length, a modified genetic algorithm (GA) is used for finding the optimal motif with that length. Three operators are used in the modified GA: Mutation, which is similar to the one used in the usual GA but is modified to avoid local optima in our case, and Addition and Deletion, which are proposed by us for the problem. A criterion is given for singling out the optimal length among the increasing motif lengths. We call this method AMDILM (an algorithm for motif discovery with iteration on lengths of motifs). The experiments on simulated data and real biological data show that AMDILM can accurately identify the optimal motif length. Meanwhile, the optimal motifs discovered by AMDILM are consistent with the real ones and are similar to the motifs obtained by three well-known methods: Gibbs Sampler, MEME and Weeder. PMID:26357084

  1. Circular code motifs in genomes of eukaryotes.

    PubMed

    El Soufi, Karim; Michel, Christian J

    2016-11-01

    A set X of 20 trinucleotides was identified in genes of bacteria, eukaryotes, plasmids and viruses, which has on average the highest occurrence in reading frame compared to its two shifted frames (Michel, 2015; Arquès and Michel, 1996). This set X has an interesting mathematical property as X is a circular code (Arquès and Michel, 1996). Thus, the motifs from this circular code X, called X motifs, have the property to always retrieve, synchronize and maintain the reading frame in genes. In this paper, we develop several statistical analyses of X motifs in 138 available complete genomes of eukaryotes in which genes as well as non-gene regions are examined. Large X motifs (with lengths of at least 15 consecutive trinucleotides of X and compositions of at least 10 different trinucleotides of X among 20) have the highest occurrence in genomes of eukaryotes compared to its 23 large bijective motifs, its two large permuted motifs and large random motifs. The largest X motifs identified in eukaryotic genomes are presented, e.g. an X motif in a non-gene region of the genome of Solanum pennellii with a length of 155 trinucleotides (465 nucleotides) and an expectation E=10^-71. In the human genome, the largest X motif occurs in a non-gene region of chromosome 13 with a length of 36 trinucleotides and an expectation E=10^-11. X motifs in non-gene regions of genomes could be evolutionary relics of primitive genes using the circular code for translation. However, the proportion of X motifs (with lengths of at least 10 consecutive trinucleotides of X and compositions of at least 5 different trinucleotides of X among 20) in genes/non-genes of the 138 complete eukaryotic genomes is about 8. Thus, the X motifs occur preferentially in genes, as expected from the previous works of 20 years.
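
    The definition of a large motif used in this abstract (a minimum number of consecutive trinucleotides from the code, with a minimum number of distinct trinucleotides among them) can be turned into a simple scan. The sketch below is an assumption-laden illustration, not the authors' pipeline: the function name is hypothetical, and the two-element `toy_code` is a placeholder rather than the 20-trinucleotide set X, which the abstract does not list.

```python
def find_code_motifs(seq, code, min_len=1, min_distinct=1):
    """Find maximal runs of consecutive trinucleotides that all belong to `code`.

    Returns (start, trinucleotides) pairs, scanning each of the three frames.
    A run is kept if it has at least `min_len` trinucleotides and at least
    `min_distinct` different ones.
    """
    motifs = []

    def keep(start, run):
        if len(run) >= min_len and len(set(run)) >= min_distinct:
            motifs.append((start, run))

    for frame in range(3):
        run_start, run = None, []
        # Step through the sequence three letters at a time in this frame.
        for pos in range(frame, len(seq) - 2, 3):
            tri = seq[pos:pos + 3]
            if tri in code:
                if not run:
                    run_start = pos
                run.append(tri)
            else:
                keep(run_start, run)
                run = []
        keep(run_start, run)  # a run may extend to the end of the sequence
    return motifs


toy_code = {"AAC", "GTT"}  # placeholder set, not the circular code X
found = find_code_motifs("CCAACGTTAACCC", toy_code, min_len=3, min_distinct=2)
print(found)
```

    With the thresholds from the abstract one would call the function with `min_len=15, min_distinct=10` (or `10` and `5` for the second definition) and the full set X as `code`.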

  2. Comparative genomic analysis of upstream miRNA regulatory motifs in Caenorhabditis.

    PubMed

    Jovelin, Richard; Krizus, Aldis; Taghizada, Bakhtiyar; Gray, Jeremy C; Phillips, Patrick C; Claycomb, Julie M; Cutter, Asher D

    2016-07-01

    MicroRNAs (miRNAs) comprise a class of short noncoding RNA molecules that play diverse developmental and physiological roles by controlling mRNA abundance and protein output of the vast majority of transcripts. Despite the importance of miRNAs in regulating gene function, we still lack a complete understanding of how miRNAs themselves are transcriptionally regulated. To fill this gap, we predicted regulatory sequences by searching for abundant short motifs located upstream of miRNAs in eight species of Caenorhabditis nematodes. We identified three conserved motifs across the Caenorhabditis phylogeny that show clear signatures of purifying selection from comparative genomics, patterns of nucleotide changes in motifs of orthologous miRNAs, and correlation between motif incidence and miRNA expression. We then validated our predictions with transgenic green fluorescent protein reporters and site-directed mutagenesis for a subset of motifs located in an enhancer region upstream of let-7. We demonstrate that a CT-dinucleotide motif is sufficient for proper expression of GFP in the seam cells of adult C. elegans, and that two other motifs play incremental roles in combination with the CT-rich motif. Thus, functional tests of sequence motifs identified through analysis of molecular evolutionary signatures provide a powerful path for efficiently characterizing the transcriptional regulation of miRNA genes. PMID:27140965

  3. The Thiamine-Pyrophosphate-Motif

    NASA Technical Reports Server (NTRS)

    Ciszak, Ewa; Dominiak, Paulina

    2004-01-01

    Thiamin pyrophosphate (TPP), a derivative of vitamin B1, is a cofactor for enzymes performing catalysis in pathways of energy production including the well known decarboxylation of α-keto acid dehydrogenases followed by transketolation. TPP-dependent enzymes constitute a structurally and functionally diverse group exhibiting multimeric subunit organization, multiple domains and two chemically equivalent catalytic centers. Annotation of functional TPP-dependent enzymes, therefore, has not been trivial due to low sequence similarity related to this complex organization. Our approach to analysis of structures of known TPP-dependent enzymes reveals for the first time features common to this group, which we have termed the TPP-motif. The TPP-motif consists of specific spatial arrangements of structural elements and their specific contacts to provide for a flip-flop, or alternate site, enzymatic mechanism of action. Analysis of structural elements entrained in the flip-flop action displayed by TPP-dependent enzymes reveals a novel definition of the common amino acid sequences. These sequences allow for annotation of TPP-dependent enzymes, thus advancing functional proteomics. Further details of three-dimensional structures of TPP-dependent enzymes will be discussed.

  4. Synthetic biology with RNA motifs.

    PubMed

    Saito, Hirohide; Inoue, Tan

    2009-02-01

    Structural motifs in naturally occurring RNAs and RNPs can be employed as new molecular parts for synthetic biology to facilitate the development of novel devices and systems that modulate cellular functions. In this review, we focus on the following: (i) experimental evolution techniques of RNA molecules in vitro and (ii) their applications for regulating gene expression systems in vivo. For experimental evolution, new artificial RNA aptamers and RNA enzymes (ribozymes) have been selected in vitro. These functional RNA molecules are likely to be applicable in the reprogramming of existing gene regulatory systems. Furthermore, they may be used for designing hypothetical RNA-based living systems in the so-called RNA world. For the regulation of gene expressions in living cells, the development of new riboswitches allows us to modulate the target gene expression in a tailor-made manner. Moreover, recently RNA-based synthetic genetic circuits have been reported by employing functional RNA molecules, expanding the repertory of synthetic biology with RNA motifs. PMID:18775792

  5. Motif3D: Relating protein sequence motifs to 3D structure.

    PubMed

    Gaulton, Anna; Attwood, Teresa K

    2003-07-01

    Motif3D is a web-based protein structure viewer designed to allow sequence motifs, and in particular those contained in the fingerprints of the PRINTS database, to be visualised on three-dimensional (3D) structures. Additional functionality is provided for the rhodopsin-like G protein-coupled receptors, enabling fingerprint motifs of any of the receptors in this family to be mapped onto the single structure available, that of bovine rhodopsin. Motif3D can be used via the web interface available at: http://www.bioinf.man.ac.uk/dbbrowser/motif3d/motif3d.html.

  6. Discriminative motif optimization based on perceptron training

    PubMed Central

    Patel, Ronak Y.; Stormo, Gary D.

    2014-01-01

    Motivation: Generating accurate transcription factor (TF) binding site motifs from data generated using next-generation sequencing, especially ChIP-seq, is challenging. The challenge arises because a typical experiment reports a large number of sequences bound by a TF, and the length of each sequence is relatively long. Most traditional motif finders are slow in handling such an enormous amount of data. To overcome this limitation, tools have been developed that compromise accuracy with speed by using heuristic discrete search strategies or limited optimization of identified seed motifs. However, such strategies may not fully use the information in input sequences to generate motifs. Such motifs often form good seeds and can be further improved with appropriate scoring functions and rapid optimization. Results: We report a tool named discriminative motif optimizer (DiMO). DiMO takes a seed motif along with a positive and a negative database and improves the motif based on a discriminative strategy. We use area under receiver-operating characteristic curve (AUC) as a measure of discriminating power of motifs and a strategy based on perceptron training that maximizes AUC rapidly in a discriminative manner. Using DiMO, on a large test set of 87 TFs from human, drosophila and yeast, we show that it is possible to significantly improve motifs identified by nine motif finders. The motifs are generated/optimized using training sets and evaluated on test sets. The AUC is improved for almost 90% of the TFs on test sets and the magnitude of increase is up to 39%. Availability and implementation: DiMO is available at http://stormo.wustl.edu/DiMO Contact: rpatel@genetics.wustl.edu, ronakypatel@gmail.com PMID:24369152
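
    The AUC criterion named in this abstract can be illustrated with a small sketch. This is not DiMO's code: the weight matrix, the sequences, and the helper names (best_window_score, auc) are hypothetical, and the perceptron update itself is left out. It only shows how a motif's discriminating power might be measured as the fraction of positive/negative sequence pairs that the motif ranks correctly.

```python
from itertools import product

def best_window_score(seq, pwm):
    """Highest sum of per-position weights over all windows of the motif's width."""
    width = len(pwm)
    scores = [
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    ]
    return max(scores)

def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count one half."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical 3-column weight matrix favouring "TGA" (illustrative values only).
PWM = [
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
]
positives = ["CCTGACC", "ATGAGG"]   # made-up "bound" sequences
negatives = ["CCCCCC", "GGTGCC"]    # made-up background sequences

pos_scores = [best_window_score(s, PWM) for s in positives]
neg_scores = [best_window_score(s, PWM) for s in negatives]
motif_auc = auc(pos_scores, neg_scores)
```

    An optimizer in this spirit would adjust the matrix entries and keep changes that raise motif_auc on a training set, then report the AUC on held-out sequences.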

  7. Mining, compressing and classifying with extensible motifs

    PubMed Central

    Apostolico, Alberto; Comin, Matteo; Parida, Laxmi

    2006-01-01

    Background Motif patterns of maximal saturation emerged originally in contexts of pattern discovery in biomolecular sequences and have recently proven a valuable notion also in the design of data compression schemes. Informally, a motif is a string of intermittently solid and wild characters that recurs more or less frequently in an input sequence or family of sequences. Motif discovery techniques and tools tend to be computationally imposing, however, special classes of "rigid" motifs have been identified of which the discovery is affordable in low polynomial time. Results In the present work, "extensible" motifs are considered such that each sequence of gaps comes endowed with some elasticity, whereby the same pattern may be stretched to fit segments of the source that match all the solid characters but are otherwise of different lengths. A few applications of this notion are then described. In applications of data compression by textual substitution, extensible motifs are seen to bring savings on the size of the codebook, and hence to improve compression. In germane contexts, in which compressibility is used in its dual role as a basis for structural inference and classification, extensible motifs are seen to support unsupervised classification and phylogeny reconstruction. Conclusion Off-line compression based on extensible motifs can be used advantageously to compress and classify biological sequences. PMID:16722593
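
    The notion of an extensible motif described above (solid characters separated by gaps that may stretch) can be sketched with regular expressions. The list-based notation and the function names below are assumptions made for illustration, not the authors' representation or discovery algorithm; the sketch only shows how one pattern matches segments of different lengths.

```python
import re

def compile_extensible(motif):
    """Turn a list of solid characters and (min, max) gap tuples into a regex."""
    parts = []
    for token in motif:
        if isinstance(token, tuple):
            lo, hi = token
            parts.append(".{%d,%d}" % (lo, hi))   # elastic run of wildcards
        else:
            parts.append(re.escape(token))         # solid character
    return re.compile("".join(parts))

def occurrences(pattern, text):
    """Start offsets of all matches, including overlapping ones."""
    lookahead = re.compile("(?=%s)" % pattern.pattern)
    return [m.start() for m in lookahead.finditer(text)]

# Hypothetical motif: 'A', then one to three wildcards, then 'G' and 'T'.
motif = ["A", (1, 3), "G", "T"]
pattern = compile_extensible(motif)
hits = occurrences(pattern, "ACGTAACCGTAGT")
```

    Here the same motif matches at offset 0 with a gap of one character, at offset 4 with a gap of three, and at offset 5 with a gap of two, which is the elasticity the abstract refers to.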

  8. The EDLL motif: a potent plant transcriptional activation domain from AP2/ERF transcription factors.

    PubMed

    Tiwari, Shiv B; Belachew, Alemu; Ma, Siu Fong; Young, Melinda; Ade, Jules; Shen, Yu; Marion, Colleen M; Holtan, Hans E; Bailey, Adina; Stone, Jeffrey K; Edwards, Leslie; Wallace, Andreah D; Canales, Roger D; Adam, Luc; Ratcliffe, Oliver J; Repetti, Peter P

    2012-06-01

    In plants, the ERF/EREBP family of transcriptional regulators plays a key role in adaptation to various biotic and abiotic stresses. These proteins contain a conserved AP2 DNA-binding domain and several uncharacterized motifs. Here, we describe a short motif, termed 'EDLL', that is present in AtERF98/TDR1 and other clade members from the same AP2 sub-family. We show that the EDLL motif, which has a unique arrangement of acidic amino acids and hydrophobic leucines, functions as a strong activation domain. The motif is transferable to other proteins, and is active at both proximal and distal positions of target promoters. As such, the EDLL motif is able to partly overcome the repression conferred by the AtHB2 transcription factor, which contains an ERF-associated amphiphilic repression (EAR) motif. We further examined the activation potential of EDLL by analysis of the regulation of flowering time by NF-Y (nuclear factor Y) proteins. Genetic evidence indicates that NF-Y protein complexes potentiate the action of CONSTANS in regulation of flowering in Arabidopsis; we show that the transcriptional activation function of CONSTANS can be substituted by direct fusion of the EDLL activation motif to NF-YB subunits. The EDLL motif represents a potent plant activation domain that can be used as a tool to confer transcriptional activation potential to heterologous DNA-binding proteins.

  9. Sampling Motif-Constrained Ensembles of Networks

    NASA Astrophysics Data System (ADS)

    Fischer, Rico; Leitão, Jorge C.; Peixoto, Tiago P.; Altmann, Eduardo G.

    2015-10-01

    The statistical significance of network properties is conditioned on null models which satisfy specified properties but that are otherwise random. Exponential random graph models are a principled theoretical framework to generate such constrained ensembles, but which often fail in practice, either due to model inconsistency or due to the impossibility to sample networks from them. These problems affect the important case of networks with prescribed clustering coefficient or number of small connected subgraphs (motifs). In this Letter we use the Wang-Landau method to obtain a multicanonical sampling that overcomes both these problems. We sample, in polynomial time, networks with arbitrary degree sequences from ensembles with imposed motifs counts. Applying this method to social networks, we investigate the relation between transitivity and homophily, and we quantify the correlation between different types of motifs, finding that single motifs can explain up to 60% of the variation of motif profiles.

  10. Automated Motif Discovery from Glycan Array Data

    PubMed Central

    Cholleti, Sharath R.; Agravat, Sanjay; Morris, Tim; Saltz, Joel H.; Song, Xuezheng

    2012-01-01

    Assessing interactions of a glycan-binding protein (GBP) or lectin with glycans on a microarray generates large datasets, making it difficult to identify a glycan structural motif or determinant associated with the highest apparent binding strength of the GBP. We have developed a computational method, termed GlycanMotifMiner, that uses the relative binding of a GBP with glycans within a glycan microarray to automatically reveal the glycan structural motifs recognized by a GBP. We implemented the software with a web-based graphical interface for users to explore and visualize the discovered motifs. The utility of GlycanMotifMiner was determined using five plant lectins, SNA, HPA, PNA, Con A, and UEA-I. Data from the analyses of the lectins at different protein concentrations were processed to rank the glycans based on their relative binding strengths. The motifs, defined as glycan substructures that exist in a large number of the bound glycans and few non-bound glycans, were then discovered by our algorithm and displayed in a web-based graphical user interface (http://glycanmotifminer.emory.edu). The information is used in defining the glycan-binding specificity of GBPs. The results were compared to the known glycan specificities of these lectins generated by manual methods. A more complex analysis was also carried out using glycan microarray data obtained for a recombinant form of human galectin-8. Results for all of these lectins show that GlycanMotifMiner identified the major motifs known in the literature along with some unexpected novel binding motifs. PMID:22877213

  11. Automated motif discovery from glycan array data.

    PubMed

    Cholleti, Sharath R; Agravat, Sanjay; Morris, Tim; Saltz, Joel H; Song, Xuezheng; Cummings, Richard D; Smith, David F

    2012-10-01

    Assessing interactions of a glycan-binding protein (GBP) or lectin with glycans on a microarray generates large datasets, making it difficult to identify a glycan structural motif or determinant associated with the highest apparent binding strength of the GBP. We have developed a computational method, termed GlycanMotifMiner, that uses the relative binding of a GBP with glycans within a glycan microarray to automatically reveal the glycan structural motifs recognized by a GBP. We implemented the software with a web-based graphical interface for users to explore and visualize the discovered motifs. The utility of GlycanMotifMiner was determined using five plant lectins, SNA, HPA, PNA, Con A, and UEA-I. Data from the analyses of the lectins at different protein concentrations were processed to rank the glycans based on their relative binding strengths. The motifs, defined as glycan substructures that exist in a large number of the bound glycans and few non-bound glycans, were then discovered by our algorithm and displayed in a web-based graphical user interface ( http://glycanmotifminer.emory.edu ). The information is used in defining the glycan-binding specificity of GBPs. The results were compared to the known glycan specificities of these lectins generated by manual methods. A more complex analysis was also carried out using glycan microarray data obtained for a recombinant form of human galectin-8. Results for all of these lectins show that GlycanMotifMiner identified the major motifs known in the literature along with some unexpected novel binding motifs. PMID:22877213

  12. Form and function in gene regulatory networks: the structure of network motifs determines fundamental properties of their dynamical state space.

    PubMed

    Ahnert, S E; Fink, T M A

    2016-07-01

    Network motifs have been studied extensively over the past decade, and certain motifs, such as the feed-forward loop, play an important role in regulatory networks. Recent studies have used Boolean network motifs to explore the link between form and function in gene regulatory networks and have found that the structure of a motif does not strongly determine its function, if this is defined in terms of the gene expression patterns the motif can produce. Here, we offer a different, higher-level definition of the 'function' of a motif, in terms of two fundamental properties of its dynamical state space as a Boolean network. One is the basin entropy, which is a complexity measure of the dynamics of Boolean networks. The other is the diversity of cyclic attractor lengths that a given motif can produce. Using these two measures, we examine all 104 topologically distinct three-node motifs and show that the structural properties of a motif, such as the presence of feedback loops and feed-forward loops, predict fundamental characteristics of its dynamical state space, which in turn determine aspects of its functional versatility. We also show that these higher-level properties have a direct bearing on real regulatory networks, as both basin entropy and cycle length diversity show a close correspondence with the prevalence, in neural and genetic regulatory networks, of the 13 connected motifs without self-interactions that have been studied extensively in the literature. PMID:27440255
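
    The two state-space measures named in this abstract can be illustrated on a toy three-node Boolean network. The update rule below is hypothetical (it is not one of the motifs analysed in the paper), synchronous updating is assumed, and basin entropy is taken here as the Shannon entropy, in bits, of the basin sizes divided by the number of states; the paper's exact conventions may differ.

```python
from itertools import product
from math import log2

def step(state):
    """Hypothetical synchronous update: a and b copy each other, c = a AND b."""
    a, b, c = state
    return (b, a, int(a and b))

def attractor_of(state, update):
    """Iterate until a state repeats; return the cycle in a canonical rotation."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = update(state)
    cycle = seen[seen.index(state):]
    pivot = cycle.index(min(cycle))
    return tuple(cycle[pivot:] + cycle[:pivot])

def basin_summary(update, n_nodes=3):
    """Map each attractor to its basin size, then derive entropy and cycle lengths."""
    basins = {}
    for state in product((0, 1), repeat=n_nodes):
        key = attractor_of(state, update)
        basins[key] = basins.get(key, 0) + 1
    total = 2 ** n_nodes
    entropy = -sum((size / total) * log2(size / total) for size in basins.values())
    cycle_lengths = sorted(len(cycle) for cycle in basins)
    return basins, entropy, cycle_lengths

basins, entropy, cycle_lengths = basin_summary(step)
```

    For this rule the eight states fall into two fixed points with basins of size 2 each and one two-state cycle with a basin of size 4, so the sketch reports an entropy of 1.5 bits and cycle lengths of 1, 1 and 2.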

  13. Form and function in gene regulatory networks: the structure of network motifs determines fundamental properties of their dynamical state space

    PubMed Central

    Ahnert, S. E.; Fink, T. M. A.

    2016-01-01

    Network motifs have been studied extensively over the past decade, and certain motifs, such as the feed-forward loop, play an important role in regulatory networks. Recent studies have used Boolean network motifs to explore the link between form and function in gene regulatory networks and have found that the structure of a motif does not strongly determine its function, if this is defined in terms of the gene expression patterns the motif can produce. Here, we offer a different, higher-level definition of the ‘function’ of a motif, in terms of two fundamental properties of its dynamical state space as a Boolean network. One is the basin entropy, which is a complexity measure of the dynamics of Boolean networks. The other is the diversity of cyclic attractor lengths that a given motif can produce. Using these two measures, we examine all 104 topologically distinct three-node motifs and show that the structural properties of a motif, such as the presence of feedback loops and feed-forward loops, predict fundamental characteristics of its dynamical state space, which in turn determine aspects of its functional versatility. We also show that these higher-level properties have a direct bearing on real regulatory networks, as both basin entropy and cycle length diversity show a close correspondence with the prevalence, in neural and genetic regulatory networks, of the 13 connected motifs without self-interactions that have been studied extensively in the literature. PMID:27440255

  14. Form and function in gene regulatory networks: the structure of network motifs determines fundamental properties of their dynamical state space.

    PubMed

    Ahnert, S E; Fink, T M A

    2016-07-01

    Network motifs have been studied extensively over the past decade, and certain motifs, such as the feed-forward loop, play an important role in regulatory networks. Recent studies have used Boolean network motifs to explore the link between form and function in gene regulatory networks and have found that the structure of a motif does not strongly determine its function, if this is defined in terms of the gene expression patterns the motif can produce. Here, we offer a different, higher-level definition of the 'function' of a motif, in terms of two fundamental properties of its dynamical state space as a Boolean network. One is the basin entropy, which is a complexity measure of the dynamics of Boolean networks. The other is the diversity of cyclic attractor lengths that a given motif can produce. Using these two measures, we examine all 104 topologically distinct three-node motifs and show that the structural properties of a motif, such as the presence of feedback loops and feed-forward loops, predict fundamental characteristics of its dynamical state space, which in turn determine aspects of its functional versatility. We also show that these higher-level properties have a direct bearing on real regulatory networks, as both basin entropy and cycle length diversity show a close correspondence with the prevalence, in neural and genetic regulatory networks, of the 13 connected motifs without self-interactions that have been studied extensively in the literature.

  15. Ordered cyclic motifs contribute to dynamic stability in biological and engineered networks.

    PubMed

    Ma'ayan, Avi; Cecchi, Guillermo A; Wagner, John; Rao, A Ravi; Iyengar, Ravi; Stolovitzky, Gustavo

    2008-12-01

    Representation and analysis of complex biological and engineered systems as directed networks is useful for understanding their global structure/function organization. Enrichment of network motifs, which are over-represented subgraphs in real networks, can be used for topological analysis. Because counting network motifs is computationally expensive, only characterization of 3- to 5-node motifs has been previously reported. In this study we used a supercomputer to analyze cyclic motifs made of 3-20 nodes for 6 biological and 3 technological networks. Using tools from statistical physics, we developed a theoretical framework for characterizing the ensemble of cyclic motifs in real networks. We have identified a generic property of real complex networks, antiferromagnetic organization, which is characterized by minimal directional coherence of edges along cyclic subgraphs, such that consecutive links tend to have opposing direction. As a consequence, we find that the lack of directional coherence in cyclic motifs leads to depletion in feedback loops, where the number of nodes affected by feedback loops appears to be at a local minimum compared with surrogate shuffled networks. This topology provides more dynamic stability in large networks.

  16. Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas.

    PubMed

    Petrov, Anton I; Zirbel, Craig L; Leontis, Neocles B

    2013-10-01

    The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson-Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules, but can also occur at nonhomologous sites in diverse RNAs, and which often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access.

  17. Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas

    PubMed Central

    Petrov, Anton I.; Zirbel, Craig L.; Leontis, Neocles B.

    2013-01-01

    The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson–Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules, but can also occur at nonhomologous sites in diverse RNAs, and which often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access. PMID:23970545

  18. Miz-1 activates gene expression via a novel consensus DNA binding motif.

    PubMed

    Barrilleaux, Bonnie L; Burow, Dana; Lockwood, Sarah H; Yu, Abigail; Segal, David J; Knoepfler, Paul S

    2014-01-01

    The transcription factor Miz-1 can either activate or repress gene expression in concert with binding partners including the Myc oncoprotein. The genomic binding of Miz-1 includes both core promoters and more distal sites, but the preferred DNA binding motif of Miz-1 has been unclear. We used a high-throughput in vitro technique, Bind-n-Seq, to identify two Miz-1 consensus DNA binding motif sequences--ATCGGTAATC and ATCGAT (Mizm1 and Mizm2)--bound by full-length Miz-1 and its zinc finger domain, respectively. We validated these sequences directly as high affinity Miz-1 binding motifs. Competition assays using mutant probes indicated that the binding affinity of Miz-1 for Mizm1 and Mizm2 is highly sequence-specific. Miz-1 strongly activates gene expression through the motifs in a Myc-independent manner. MEME-ChIP analysis of Miz-1 ChIP-seq data in two different cell types reveals a long motif with a central core sequence highly similar to the Mizm1 motif identified by Bind-n-Seq, validating the in vivo relevance of the findings. Miz-1 ChIP-seq peaks containing the long motif are predominantly located outside of proximal promoter regions, in contrast to peaks without the motif, which are highly concentrated within 1.5 kb of the nearest transcription start site. Overall, our results indicate that Miz-1 may be directed in vivo to the novel motif sequences we have identified, where it can recruit its specific binding partners to control gene expression and ultimately regulate cell fate. PMID:24983942

  19. Identification of a putative nuclear export signal motif in human NANOG homeobox domain

    SciTech Connect

    Park, Sung-Won; Do, Hyun-Jin; Huh, Sun-Hyung; Sung, Boreum; Uhm, Sang-Jun; Song, Hyuk; Kim, Nam-Hyung; Kim, Jae-Hwan

    2012-05-11

    Highlights: ► We found the putative nuclear export signal motif within human NANOG homeodomain. ► Leucine-rich residues are important for human NANOG homeodomain nuclear export. ► CRM1-specific inhibitor LMB blocked the potent human NANOG NES-mediated nuclear export. Abstract: NANOG is a homeobox-containing transcription factor that plays an important role in pluripotent stem cells and tumorigenic cells. To understand how nuclear localization of human NANOG is regulated, the NANOG sequence was examined and a leucine-rich nuclear export signal (NES) motif (¹²⁵MQELSNILNL¹³⁴) was found in the homeodomain (HD). To functionally validate the putative NES motif, deletion and site-directed mutants were fused to an EGFP expression vector and transfected into COS-7 cells, and the localization of the proteins was examined. While hNANOG HD exclusively localized to the nucleus, a mutant with both NLSs deleted and only the putative NES motif contained (hNANOG HD-ΔNLSs) was predominantly cytoplasmic, as observed by nucleo/cytoplasmic fractionation and Western blot analysis as well as confocal microscopy. Furthermore, site-directed mutagenesis of the putative NES motif in a partial hNANOG HD only containing either one of the two NLS motifs led to localization in the nucleus, suggesting that the NES motif may play a functional role in nuclear export. Furthermore, CRM1-specific nuclear export inhibitor LMB blocked the hNANOG potent NES-mediated export, suggesting that the leucine-rich motif may function in CRM1-mediated nuclear export of hNANOG. Collectively, a NES motif is present in the hNANOG HD and may be functionally involved in CRM1-mediated nuclear export pathway.

  20. MotifMiner: A Table Driven Greedy Algorithm for DNA Motif Mining

    NASA Astrophysics Data System (ADS)

    Seeja, K. R.; Alam, M. A.; Jain, S. K.

    DNA motif discovery is a much explored problem in functional genomics. This paper describes a table driven greedy algorithm for discovering regulatory motifs in the promoter sequences of co-expressed genes. The proposed algorithm searches both DNA strands for the common patterns or motifs. The inputs to the algorithm are a set of promoter sequences, the motif length and the minimum Information Content. The algorithm generates subsequences of the given length from the shortest input promoter sequence. It stores these subsequences and their reverse complements in a table. Then it searches the remaining sequences for good matches of these subsequences. The Information Content score is used to measure the goodness of the motifs. The algorithm has been tested with synthetic data and real data, and the results are promising. The algorithm could discover meaningful motifs from the muscle specific regulatory sequences.
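
    Two ingredients mentioned in this abstract, searching both strands and scoring by Information Content, can be sketched as follows. This is not the MotifMiner implementation: the mismatch-count criterion for a "good match", the uniform-background Information Content formula without pseudocounts, and all names and sequences are assumptions made for illustration.

```python
from math import log2

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def information_content(sites):
    """Sum over columns of 2 + sum_b f_b * log2(f_b), in bits (uniform background)."""
    total = 0.0
    for column in zip(*sites):
        for base in "ACGT":
            freq = column.count(base) / len(column)
            if freq > 0:
                total += freq * log2(freq)
        total += 2.0
    return total

def best_match(candidate, seq):
    """Window of seq, on either strand, with the fewest mismatches to candidate."""
    k = len(candidate)
    windows = []
    for strand in (seq, reverse_complement(seq)):
        windows.extend(strand[i:i + k] for i in range(len(strand) - k + 1))
    return min(windows, key=lambda w: sum(a != b for a, b in zip(candidate, w)))

# Hypothetical candidate subsequence and promoter sequences.
candidate = "TGAC"
promoters = ["AATGACA", "CCGTCAC", "TTTGTCG"]
sites = [best_match(candidate, seq) for seq in promoters]
score = information_content(sites)
```

    In a greedy scheme of the kind described, every candidate drawn from the shortest promoter would be scored this way and only candidates whose score reaches the minimum Information Content would be kept.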

  1. Chaotic motifs in gene regulatory networks.

    PubMed

    Zhang, Zhaoyang; Ye, Weiming; Qian, Yu; Zheng, Zhigang; Huang, Xuhui; Hu, Gang

    2012-01-01

    Chaos should occur often in gene regulatory networks (GRNs) which have been widely described by nonlinear coupled ordinary differential equations, if their dimensions are no less than 3. It is therefore puzzling that chaos has never been reported in GRNs in nature and is also extremely rare in models of GRNs. On the other hand, the topic of motifs has attracted great attention in studying biological networks, and network motifs are suggested to be elementary building blocks that carry out some key functions in the network. In this paper, chaotic motifs (subnetworks with chaos) in GRNs are systematically investigated. The conclusion is that: (i) chaos can only appear through competitions between different oscillatory modes with rivaling intensities. Conditions required for chaotic GRNs are found to be very strict, which make chaotic GRNs extremely rare. (ii) Chaotic motifs are explored as the simplest few-node structures capable of producing chaos, and serve as the intrinsic source of chaos of random few-node GRNs. Several optimal motifs causing chaos with atypically high probability are figured out. (iii) Moreover, we discovered that a number of special oscillators can never produce chaos. These structures bring some advantages on rhythmic functions and may help us understand the robustness of diverse biological rhythms. (iv) The methods of dominant phase-advanced driving (DPAD) and DPAD time fraction are proposed to quantitatively identify chaotic motifs and to explain the origin of chaotic behaviors in GRNs.

  2. Basic OSF/Motif programming and applications

    SciTech Connect

    Brooks, D.; Novak, B.

    1992-09-15

    When users refer to Motif, they are usually talking about mwm, the window manager. However, when programmers mention Motif they are usually discussing the programming toolkit. This toolkit is used to develop new or modify existing applications. In this presentation, the term Motif will refer to the toolkit. Motif comes with a number of features that help users effectively use the applications built with it. The term look and feel may be overused; nonetheless, a consistent and well designed look and feel assists the user in learning and using new applications. The term point and click generally refers to using a mouse to select program commands. While Motif supports point and click, the toolkit also supports using the keyboard as a substitute for many operations. This gives a good typist a distinct advantage when using a familiar application. We will give an overview of the toolkit, touching on the user interface features and general programming considerations. Since the source code for many useful Motif programs is readily available, we will explain how to get these sources and touch on derived benefits. We will also point to other sources of on-line help and documentation. Finally, we will present some practical experiences developing applications.

  3. Helix-packing motifs in membrane proteins.

    PubMed

    Walters, R F S; DeGrado, W F

    2006-09-12

    The fold of a helical membrane protein is largely determined by interactions between membrane-imbedded helices. To elucidate recurring helix-helix interaction motifs, we dissected the crystallographic structures of membrane proteins into a library of interacting helical pairs. The pairs were clustered according to their three-dimensional similarity (rmsd motifs whose structural features can be understood in terms of simple principles of helix-helix packing. Thus, the universe of common transmembrane helix-pairing motifs is relatively simple. The largest cluster, which comprises 29% of the library members, consists of an antiparallel motif with left-handed packing angles, and it is frequently stabilized by packing of small side chains occurring every seven residues in the sequence. Right-handed parallel and antiparallel structures show a similar tendency to segregate small residues to the helix-helix interface but spaced at four-residue intervals. Position-specific sequence propensities were derived for the most populated motifs. These structural and sequential motifs should be quite useful for the design and structural prediction of membrane proteins.

  4. iMotifs: an integrated sequence motif visualization and analysis environment

    PubMed Central

    Piipari, Matias; Down, Thomas A.; Saini, Harpreet; Enright, Anton; Hubbard, Tim J.P.

    2010-01-01

    Motivation: Short sequence motifs are an important class of models in molecular biology, used most commonly for describing transcription factor binding site specificity patterns. High-throughput methods have been recently developed for detecting regulatory factor binding sites in vivo and in vitro and consequently high-quality binding site motif data are becoming available for increasing number of organisms and regulatory factors. Development of intuitive tools for the study of sequence motifs is therefore important. iMotifs is a graphical motif analysis environment that allows visualization of annotated sequence motifs and scored motif hits in sequences. It also offers motif inference with the sensitive NestedMICA algorithm, as well as overrepresentation and pairwise motif matching capabilities. All of the analysis functionality is provided without the need to convert between file formats or learn different command line interfaces. The application includes a bundled and graphically integrated version of the NestedMICA motif inference suite that has no outside dependencies. Problems associated with local deployment of software are therefore avoided. Availability: iMotifs is licensed with the GNU Lesser General Public License v2.0 (LGPL 2.0). The software and its source is available at http://wiki.github.com/mz2/imotifs and can be run on Mac OS X Leopard (Intel/PowerPC). We also provide a cross-platform (Linux, OS X, Windows) LGPL 2.0 licensed library libxms for the Perl, Ruby, R and Objective-C programming languages for input and output of XMS formatted annotated sequence motif set files. Contact: matias.piipari@gmail.com; imotifs@googlegroups.com PMID:20106815

  5. RNA 3D Structural Motifs: Definition, Identification, Annotation, and Database Searching

    NASA Astrophysics Data System (ADS)

    Nasalean, Lorena; Stombaugh, Jesse; Zirbel, Craig L.; Leontis, Neocles B.

    Structured RNA molecules resemble proteins in the hierarchical organization of their global structures, folding and broad range of functions. Structured RNAs are composed of recurrent modular motifs that play specific functional roles. Some motifs direct the folding of the RNA or stabilize the folded structure through tertiary interactions. Others bind ligands or proteins or catalyze chemical reactions. Therefore, it is desirable, starting from the RNA sequence, to be able to predict the locations of recurrent motifs in RNA molecules. Conversely, the potential occurrence of one or more known 3D RNA motifs may indicate that a genomic sequence codes for a structured RNA molecule. To identify known RNA structural motifs in new RNA sequences, precise structure-based definitions are needed that specify the core nucleotides of each motif and their conserved interactions. By comparing instances of each recurrent motif and applying base pair isostericity relations, one can identify neutral mutations that preserve its structure and function in the contexts in which it occurs.

  6. SVM2Motif--Reconstructing Overlapping DNA Sequence Motifs by Mimicking an SVM Predictor.

    PubMed

    Vidovic, Marina M-C; Görnitz, Nico; Müller, Klaus-Robert; Rätsch, Gunnar; Kloft, Marius

    2015-01-01

    Identifying discriminative motifs underlying the functionality and evolution of organisms is a major challenge in computational biology. Machine learning approaches such as support vector machines (SVMs) achieve state-of-the-art performances in genomic discrimination tasks, but--due to its black-box character--motifs underlying its decision function are largely unknown. As a remedy, positional oligomer importance matrices (POIMs) allow us to visualize the significance of position-specific subsequences. Although being a major step towards the explanation of trained SVM models, they suffer from the fact that their size grows exponentially in the length of the motif, which renders their manual inspection feasible only for comparably small motif sizes, typically k ≤ 5. In this work, we extend the work on positional oligomer importance matrices, by presenting a new machine-learning methodology, entitled motifPOIM, to extract the truly relevant motifs--regardless of their length and complexity--underlying the predictions of a trained SVM model. Our framework thereby considers the motifs as free parameters in a probabilistic model, a task which can be phrased as a non-convex optimization problem. The exponential dependence of the POIM size on the oligomer length poses a major numerical challenge, which we address by an efficient optimization framework that allows us to find possibly overlapping motifs consisting of up to hundreds of nucleotides. We demonstrate the efficacy of our approach on a synthetic data set as well as a real-world human splice site data set. PMID:26690911

  7. Defect Motifs for Constant Mean Curvature Surfaces

    NASA Astrophysics Data System (ADS)

    Kusumaatmaja, Halim; Wales, David J.

    2013-04-01

    The energy landscapes of electrostatically charged particles embedded on constant mean curvature surfaces are analyzed for a wide range of system size, curvature, and interaction potentials. The surfaces are taken to be rigid, and the basin-hopping method is used to locate the putative global minimum structures. The defect motifs favored by potential energy agree with experimental observations for colloidal systems: extended defects (scars and pleats) for weakly positive and negative Gaussian curvatures, and isolated defects for strongly negative Gaussian curvatures. Near the phase boundary between these regimes, the two motifs are in strong competition, as evidenced from the appearance of distinct funnels in the potential energy landscape. We also report a novel defect motif consisting of pentagon pairs.

  8. Armadillo motifs involved in vesicular transport.

    PubMed

    Striegl, Harald; Andrade-Navarro, Miguel A; Heinemann, Udo

    2010-02-01

    Armadillo (ARM) repeat proteins function in various cellular processes including vesicular transport and membrane tethering. They contain an imperfect repeating sequence motif that forms a conserved three-dimensional structure. Recently, structural and functional insight into tethering mediated by the ARM-repeat protein p115 has been provided. Here we describe the p115 ARM-motifs for reasons of clarity and nomenclature and show that both sequence and structure are highly conserved among ARM-repeat proteins. We argue that there is no need to invoke repeat types other than ARM repeats for a proper description of the structure of the p115 globular head region. Additionally, we propose to define a new subfamily of ARM-like proteins and show lack of evidence that the ARM motifs found in p115 are present in other long coiled-coil tethering factors of the golgin family.

  9. Polyrhythmic synchronization in bursting networking motifs

    NASA Astrophysics Data System (ADS)

    Shilnikov, Andrey; Gordon, René; Belykh, Igor

    2008-09-01

    We study the emergence of polyrhythmic dynamics of motifs which are the building blocks for small inhibitory-excitatory networks, such as central pattern generators controlling various locomotive behaviors of animals. We discover that the pacemaker determining the specific rhythm of such a network composed of realistic Hodgkin-Huxley-type neurons is identified through the order parameter, which is the ratio of the neurons' burst durations or of duty cycles. We analyze different configurations of the motifs and describe the universal mechanisms for synergetics of the bursting patterns. We also discuss the multistability of inhibitory networks that results in polyrhythmicity of their emergent synchronous behaviors.

  10. Calendar motifs on Getashen hydria

    NASA Astrophysics Data System (ADS)

    Vrtanesyan, Garegin

    2015-07-01

    The Getashen hydria was found in Middle Bronze Age tombs (first third of the second millennium B.C.) in Armenia (Lake Sevan). It shows a scene consisting of three friezes. The lower frieze depicts six zoomorphic figures, the middle frieze six waterfowl, and the top frieze graphic signs. The calendar motifs of this composition have a numeric expression: six figures on each of the lower and middle friezes. Division of the annual cycle into two parts is known from the calendars of the ancient Indo-Iranians (the "great summer" and the "great winter"). The animals on the lower frieze mark the second, "winter" road of the Sun, because the events most important for the reproduction of the society's economy occur in this period: the rut of ungulates, both wild (deer) and domestic (goats). Moreover, the rut of goats ends in December, almost coinciding with the onset of the winter solstice. A pair of dogs on the lower frieze marks a version of the myth of the hero imprisoned in the rock, the Sun (Mihr - Artavazd), whose dogs have to chew through his chains, anticipating his exit at the winter solstice. This is indicated by the direction of their movement: the Sun moves from left to right for an observer only when it is in the southern part of the sky (i.e., beginning with the autumnal equinox). The most important event of the period of the "summer road" of the Sun is the vernal equinox, which coincides with the arrival of waterfowl (ducks, geese). Their direction on the second frieze (left to right) corresponds to the position of an observer facing north.

  11. The Verrucomicrobia LexA-Binding Motif: Insights into the Evolutionary Dynamics of the SOS Response

    PubMed Central

    Erill, Ivan; Campoy, Susana; Kılıç, Sefa; Barbé, Jordi

    2016-01-01

    The SOS response is the primary bacterial mechanism to address DNA damage, coordinating multiple cellular processes that include DNA repair, cell division, and translesion synthesis. In contrast to other regulatory systems, the composition of the SOS genetic network and the binding motif of its transcriptional repressor, LexA, have been shown to vary greatly across bacterial clades, making it an ideal system to study the co-evolution of transcription factors and their regulons. Leveraging comparative genomics approaches and prior knowledge on the core SOS regulon, here we define the binding motif of the Verrucomicrobia, a recently described phylum of emerging interest due to its association with eukaryotic hosts. Site directed mutagenesis of the Verrucomicrobium spinosum recA promoter confirms that LexA binds a 14 bp palindromic motif with consensus sequence TGTTC-N4-GAACA. Computational analyses suggest that recognition of this novel motif is determined primarily by changes in base-contacting residues of the third alpha helix of the LexA helix-turn-helix DNA binding motif. In conjunction with comparative genomics analysis of the LexA regulon in the Verrucomicrobia phylum, electrophoretic shift assays reveal that LexA binds to operators in the promoter region of DNA repair genes and a mutagenesis cassette in this organism, and identify previously unreported components of the SOS response. The identification of tandem LexA-binding sites generating instances of other LexA-binding motifs in the lexA gene promoter of Verrucomicrobia species leads us to postulate a novel mechanism for LexA-binding motif evolution. This model, based on gene duplication, successfully addresses outstanding questions in the intricate co-evolution of the LexA protein, its binding motif and the regulatory network it controls. PMID:27489856
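
    The consensus reported above, TGTTC-N4-GAACA, can be turned into a simple sequence scan. The following is a minimal sketch, assuming exact matches to the two half-sites and any base in the 4 bp spacer; the function names and the promoter fragment are hypothetical and made up for illustration, and real LexA sites are normally scored with a position weight matrix rather than an exact pattern.

```python
import re

# Consensus from the abstract: TGTTC-N4-GAACA (14 bp; N = any base).
# The lookahead lets overlapping matches be reported as well.
LEXA_PATTERN = re.compile(r"(?=(TGTTC[ACGT]{4}GAACA))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_lexa_sites(seq):
    """Return (0-based start, site) for every exact consensus match."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in LEXA_PATTERN.finditer(seq)]


# Hypothetical promoter fragment, invented for this example only.
promoter = "ccatTGTTCatatGAACAggt"
sites = find_lexa_sites(promoter)
```

    Because the two half-sites are reverse complements of each other, scanning one strand is enough for this exact-match sketch.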

  12. CytoKavosh: a cytoscape plug-in for finding network motifs in large biological networks.

    PubMed

    Masoudi-Nejad, Ali; Ansariola, Mitra; Kashani, Zahra Razaghi Moghadam; Salehzadeh-Yazdi, Ali; Khakabimamaghani, Sahand

    2012-01-01

    Network motifs are small connected sub-graphs that have recently gathered much attention as a means of discovering structural behaviors of large and complex networks. Finding motifs of any size is one of the most important problems in complex and large networks, and it requires fast and reliable algorithms and tools. CytoKavosh is one of the best choices for finding motifs of any given size in any complex network. It relies on a fast algorithm, Kavosh, which makes it faster than other existing tools. The Kavosh algorithm applies some well-known algorithmic features and includes tricky aspects, which make it an efficient algorithm in this field. CytoKavosh is a Cytoscape plug-in that supports finding motifs of a given size in a network (directed or undirected) that has already been loaded into the Cytoscape workspace. The high performance of CytoKavosh is achieved by dynamically linking highly optimized functions of Kavosh's C++ code to the Cytoscape Java program, which makes this plug-in suitable for analyzing large biological networks. Significant attributes of CytoKavosh are its efficiency in time and memory usage and the absence of any implementation-related limit on motif size. CytoKavosh is implemented in the visual environment Cytoscape, which is convenient for users to interact with and to create visual options for analyzing the structural behavior of a network. This plug-in can work on any given network, is very simple to use, and generates graphical results of discovered motifs with any required details. There has been no Cytoscape plug-in dedicated to finding network motifs based on the original concept. We have therefore introduced CytoKavosh as the first such plug-in, and we hope that it can be improved to cover other options to make it the best motif-analyzing tool.

  15. Motifs and structural blocks retrieval by GHT

    NASA Astrophysics Data System (ADS)

    Cantoni, Virginio; Ferone, Alessio; Petrosino, Alfredo; Polat, Ozlem

    2014-06-01

    The structure of a protein gives more insight into the protein function than its amino acid sequence. Protein structure analysis and comparison are important for understanding the evolutionary relationships among proteins, predicting protein functions, and predicting protein folding. Proteins are formed by two basic regular 3D structural patterns, called Secondary Structures (SSs): helices and sheets. A structural motif is a compact 3D protein block referring to a small specific combination of secondary structural elements, which appears in a variety of molecules. In this paper we compare a few approaches for motif retrieval based on the Generalized Hough Transform (GHT). A primary technique is to adopt the single SS as the structural primitive; alternatives are to adopt an SS pair as the primitive structural element, or an SS triplet, and so on up to an entire motif. The richer the primitive, the higher the time for pre-analysis and search, and the simpler the inspection process on the parameter space for analyzing the peaks. Performance comparisons, in terms of precision and computation time, are presented here considering the retrieval of motifs composed of three to five SSs for more than 15 million searches. The approach can be easily applied to the retrieval of larger blocks, up to protein domains, or even entire proteins.

  16. The Motif of Meeting in Digital Education

    ERIC Educational Resources Information Center

    Sheail, Philippa

    2015-01-01

    This article draws on theoretical work which considers the composition of meetings, in order to think about the form of the meeting in digital environments for higher education. To explore the motif of meeting, I undertake a "compositional interpretation" (Rose, 2012) of the default interface offered by "Collaborate", an…

  17. DNA motif elucidation using belief propagation.

    PubMed

    Wong, Ka-Chun; Chan, Tak-Ming; Peng, Chengbin; Li, Yue; Zhang, Zhaolei

    2013-09-01

    Protein-binding microarray (PBM) is a high-throughput platform that can measure the DNA-binding preference of a protein in a comprehensive and unbiased manner. A typical PBM experiment can measure binding signal intensities of a protein to all the possible DNA k-mers (k=8∼10); such comprehensive binding affinity data usually need to be reduced and represented as motif models before they can be further analyzed and applied. Since proteins can often bind to DNA in multiple modes, one of the major challenges is to decompose the comprehensive affinity data into multimodal motif representations. Here, we describe a new algorithm that uses Hidden Markov Models (HMMs) and can derive precise and multimodal motifs using belief propagations. We describe an HMM-based approach using belief propagations (kmerHMM), which accepts and preprocesses PBM probe raw data into median-binding intensities of individual k-mers. The k-mers are ranked and aligned for training an HMM as the underlying motif representation. Multiple motifs are then extracted from the HMM using belief propagations. Comparisons of kmerHMM with other leading methods on several data sets demonstrated its effectiveness and uniqueness. In particular, it achieved the best performance on more than half of the data sets. In addition, the multiple binding modes derived by kmerHMM are biologically meaningful and will be useful in interpreting other genome-wide data such as those generated from ChIP-seq. The executables and source code are available at the authors' websites: e.g. http://www.cs.toronto.edu/~wkc/kmerHMM. PMID:23814189
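
    The preprocessing step mentioned above (probe intensities reduced to a median intensity per k-mer, followed by ranking) might look roughly like the sketch below. It is not the kmerHMM code: the function names and the toy probes are hypothetical, each k-mer is assumed to count once per probe, and reverse complements are not merged.

```python
from collections import defaultdict
from statistics import median


def kmer_median_intensities(probes, k):
    """Map each k-mer to the median intensity of the probes containing it.

    probes: iterable of (sequence, intensity) pairs.
    """
    by_kmer = defaultdict(list)
    for seq, intensity in probes:
        seq = seq.upper()
        # A set is used so a k-mer repeated inside one probe counts once.
        for kmer in {seq[i:i + k] for i in range(len(seq) - k + 1)}:
            by_kmer[kmer].append(intensity)
    return {kmer: median(values) for kmer, values in by_kmer.items()}


def rank_kmers(medians):
    """Order k-mers by decreasing median intensity (ties broken alphabetically)."""
    return sorted(medians, key=lambda kmer: (-medians[kmer], kmer))


# Toy probes with invented intensities, for illustration only.
toy_probes = [("ACGT", 10.0), ("CGTA", 4.0), ("ACGA", 2.0)]
toy_medians = kmer_median_intensities(toy_probes, k=3)
toy_ranking = rank_kmers(toy_medians)
```

    In the abstract's pipeline, the ranked k-mers would then be aligned and used to train the HMM; that part is not sketched here.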

  18. A survey of motif finding Web tools for detecting binding site motifs in ChIP-Seq data

    PubMed Central

    2014-01-01

    Abstract ChIP-Seq (chromatin immunoprecipitation sequencing) has provided an advantage for finding motifs, as ChIP-Seq experiments narrow motif finding down to binding site locations. Recent motif finding tools facilitate motif detection by providing a user-friendly Web interface. In this work, we reviewed nine motif finding Web tools that are capable of detecting binding site motifs in ChIP-Seq data. We showed that each motif finding Web tool has its own advantages for detecting motifs that other tools may not discover. We recommended that users use multiple motif finding Web tools that implement different algorithms for obtaining significant motifs, overlapping similar motifs, and non-overlapping motifs. Finally, we provided our suggestions for future development of motif finding Web tools that better assist researchers in finding motifs in ChIP-Seq data. Reviewers This article was reviewed by Prof. Sandor Pongor, Dr. Yuriy Gusev, and Dr. Shyam Prabhakar (nominated by Prof. Limsoon Wong). PMID:24555784

  19. Discovering interacting domains and motifs in protein-protein interactions.

    PubMed

    Hugo, Willy; Sung, Wing-Kin; Ng, See-Kiong

    2013-01-01

    Many important biological processes, such as the signaling pathways, require protein-protein interactions (PPIs) that are designed for fast response to stimuli. These interactions are usually transient, easily formed, and disrupted, yet specific. Many of these transient interactions involve the binding of a protein domain to a short stretch (3-10) of amino acid residues, which can be characterized by a sequence pattern, i.e., a short linear motif (SLiM). We call these interacting domains and motifs domain-SLiM interactions. Existing methods have focused on discovering SLiMs in the interacting proteins' sequence data. With the recent increase in protein structures, we have a new opportunity to detect SLiMs directly from the proteins' 3D structures instead of their linear sequences. In this chapter, we describe a computational method called SLiMDIet to directly detect SLiMs on domain interfaces extracted from 3D structures of PPIs. SLiMDIet comprises two steps: (1) interaction interfaces belonging to the same domain are extracted and grouped together using structural clustering and (2) the extracted interaction interfaces in each cluster are structurally aligned to extract the corresponding SLiM. Using SLiMDIet, de novo SLiMs interacting with protein domains can be computationally detected from structurally clustered domain-SLiM interactions for PFAM domains which have available 3D structures in the PDB database.

  20. CombiMotif: A new algorithm for network motifs discovery in protein-protein interaction networks

    NASA Astrophysics Data System (ADS)

    Luo, Jiawei; Li, Guanghui; Song, Dan; Liang, Cheng

    2014-12-01

    Discovering motifs in protein-protein interaction networks is becoming a current major challenge in computational biology, since the distribution of the number of network motifs can reveal significant systemic differences among species. However, this task can be computationally expensive because of the involvement of graph isomorphic detection. In this paper, we present a new algorithm (CombiMotif) that incorporates combinatorial techniques to count non-induced occurrences of subgraph topologies in the form of trees. The efficiency of our algorithm is demonstrated by comparing the obtained results with the current state-of-the art subgraph counting algorithms. We also show major differences between unicellular and multicellular organisms. The datasets and source code of CombiMotif are freely available upon request.
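
    As a small illustration of what counting non-induced occurrences of tree-shaped subgraphs by combinatorial means can look like, the sketch below counts star-shaped trees from node degrees alone. It is not the CombiMotif algorithm, whose details the abstract does not give; the function names and the example graph are hypothetical, and the graph is assumed to be simple and undirected (no self-loops or duplicate edges).

```python
from collections import Counter
from math import comb


def degrees(edges):
    """Return a Counter mapping each node to its degree."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def count_noninduced_stars(edges, leaves):
    """Count non-induced stars with the given number of leaves.

    A centre of degree d contributes C(d, leaves) stars, because any
    choice of `leaves` neighbours forms one occurrence, regardless of
    whether those neighbours are also connected to each other.
    Valid for leaves >= 2 (for leaves == 1 each edge is counted twice).
    """
    return sum(comb(d, leaves) for d in degrees(edges).values())


# Hypothetical graph: a triangle a-b-c with a pendant node d attached to c.
example_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
two_paths = count_noninduced_stars(example_edges, 2)   # 3-node paths
claws = count_noninduced_stars(example_edges, 3)       # 4-node stars
```

    No subgraph isomorphism test is needed for these shapes, which is the kind of saving that motivates combinatorial counting.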

  1. Using SCOPE to identify potential regulatory motifs in coregulated genes.

    PubMed

    Martyanov, Viktor; Gross, Robert H

    2011-05-31

    SCOPE is an ensemble motif finder that uses three component algorithms in parallel to identify potential regulatory motifs by over-representation and motif position preference. Each component algorithm is optimized to find a different kind of motif. By taking the best of these three approaches, SCOPE performs better than any single algorithm, even in the presence of noisy data. In this article, we utilize a web version of SCOPE to examine genes that are involved in telomere maintenance. SCOPE has been incorporated into at least two other motif finding programs and has been used in other studies. The three algorithms that comprise SCOPE are BEAM, which finds non-degenerate motifs (ACCGGT), PRISM, which finds degenerate motifs (ASCGWT), and SPACER, which finds longer bipartite motifs (ACCnnnnnnnnGGT). These three algorithms have been optimized to find their corresponding type of motif. Together, they allow SCOPE to perform extremely well. Once a gene set has been analyzed and candidate motifs identified, SCOPE can look for other genes that contain the motif which, when added to the original set, will improve the motif score. This can occur through over-representation or motif position preference. Working with partial gene sets that have biologically verified transcription factor binding sites, SCOPE was able to identify most of the rest of the genes also regulated by the given transcription factor. Output from SCOPE shows candidate motifs, their significance, and other information both as a table and as a graphical motif map. FAQs and video tutorials are available at the SCOPE web site which also includes a "Sample Search" button that allows the user to perform a trial run. SCOPE has a very friendly user interface that enables novice users to access the algorithm's full power without having to become an expert in the bioinformatics of motif finding. As input, SCOPE can take a list of genes, or FASTA sequences. These can be entered in browser text fields, or read from
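
    The three motif types named in this record (non-degenerate ACCGGT, degenerate ASCGWT, bipartite ACCnnnnnnnnGGT) can all be written with IUPAC nucleotide codes. The sketch below is not part of SCOPE; it only shows, under that assumption, how such motif strings could be expanded into regular expressions and matched. The function names and the test sequences are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Translate an IUPAC motif such as 'ASCGWT' into a regex string."""
    return "".join(IUPAC[symbol] for symbol in motif.upper())


def count_sequences_with_motif(motif, sequences):
    """Count how many sequences contain at least one motif occurrence."""
    pattern = re.compile(iupac_to_regex(motif))
    return sum(1 for seq in sequences if pattern.search(seq.upper()))


# Invented sequences, for illustration only.
toy_sequences = ["AGCGTTCC", "TTACCGATAA", "GGGGGG"]
degenerate_hits = count_sequences_with_motif("ASCGWT", toy_sequences)
bipartite_regex = iupac_to_regex("ACCnnnnnnnnGGT")
```

    Counting how many sequences in a gene set contain a motif is only the raw ingredient of an over-representation score; the statistical scoring itself is not shown.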

  2. Functional Motifs in Biochemical Reaction Networks

    PubMed Central

    Tyson, John J.; Novák, Béla

    2013-01-01

    The signal-response characteristics of a living cell are determined by complex networks of interacting genes, proteins, and metabolites. Understanding how cells respond to specific challenges, how these responses are contravened in diseased cells, and how to intervene pharmacologically in the decision-making processes of cells requires an accurate theory of the information-processing capabilities of macromolecular regulatory networks. Adopting an engineer’s approach to control systems, we ask whether realistic cellular control networks can be decomposed into simple regulatory motifs that carry out specific functions in a cell. We show that such functional motifs exist and review the experimental evidence that they control cellular responses as expected. PMID:20055671

  3. Anticipated synchronization in neuronal network motifs

    NASA Astrophysics Data System (ADS)

    Matias, F. S.; Gollo, L. L.; Carelli, P. V.; Copelli, M.; Mirasso, C. R.

    2013-01-01

    Two identical dynamical systems coupled unidirectionally (in a so-called master-slave configuration) exhibit anticipated synchronization (AS) if the one which receives the coupling (the slave) also receives a negative delayed self-feedback. In oscillatory neuronal systems AS is characterized by a phase-locking with negative time delay τ between the spikes of the master and of the slave (slave fires before the master), while in the usual delayed synchronization (DS) regime τ is positive (slave fires after the master). A 3-neuron motif in which the slave self-feedback is replaced by a feedback loop mediated by an interneuron can exhibit both AS and DS regimes. Here we show that AS is robust in the presence of noise in a 3 Hodgkin-Huxley type neuronal motif. We also show that AS is stable for large values of τ in a chain of connected slaves-interneurons.

  4. Analyzing network reliability using structural motifs

    NASA Astrophysics Data System (ADS)

    Khorramzadeh, Yasamin; Youssef, Mina; Eubank, Stephen; Mowlaei, Shahir

    2015-04-01

    This paper uses the reliability polynomial, introduced by Moore and Shannon in 1956, to analyze the effect of network structure on diffusive dynamics such as the spread of infectious disease. We exhibit a representation for the reliability polynomial in terms of what we call structural motifs that is well suited for reasoning about the effect of a network's structural properties on diffusion across the network. We illustrate by deriving several general results relating graph structure to dynamical phenomena.
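
    For readers unfamiliar with the reliability polynomial, a brute-force sketch is given below. It assumes that each edge is present independently with probability p and that the network "works" when all nodes are connected (all-terminal reliability); the paper may use other criteria, and the function names here are hypothetical. With N_k the number of connected spanning subgraphs that use exactly k of the m edges, R(p) = sum_k N_k p^k (1-p)^(m-k).

```python
from itertools import combinations


def is_connected(nodes, edges):
    """Depth-first search check that the edges connect all nodes."""
    nodes = list(nodes)
    if not nodes:
        return True
    adjacency = {v: set() for v in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        current = stack.pop()
        for neighbour in adjacency[current] - seen:
            seen.add(neighbour)
            stack.append(neighbour)
    return len(seen) == len(nodes)


def reliability_coefficients(nodes, edges):
    """Return [N_0, ..., N_m]: connected edge subsets counted by size."""
    m = len(edges)
    return [
        sum(1 for subset in combinations(edges, k) if is_connected(nodes, subset))
        for k in range(m + 1)
    ]


def reliability(nodes, edges, p):
    """Evaluate R(p) by exhaustive enumeration (exponential in m)."""
    m = len(edges)
    coefficients = reliability_coefficients(nodes, edges)
    return sum(n_k * p**k * (1 - p) ** (m - k) for k, n_k in enumerate(coefficients))


# Triangle graph: R(p) = 3 p^2 (1 - p) + p^3.
triangle_nodes = ["a", "b", "c"]
triangle_edges = [("a", "b"), ("b", "c"), ("a", "c")]
triangle_coefficients = reliability_coefficients(triangle_nodes, triangle_edges)
```

    Exhaustive enumeration is only feasible for very small graphs; it is meant to make the definition concrete, not to be efficient.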

  5. Acidic/IQ Motif Regulator of Calmodulin*

    PubMed Central

    Putkey, John A.; Waxham, M. Neal; Gaertner, Tara R.; Brewer, Kari J.; Goldsmith, Michael; Kubota, Yoshihisa; Kleerekoper, Quinn K.

    2013-01-01

    The small IQ motif proteins PEP-19 (62 amino acids) and RC3 (78 amino acids) greatly accelerate the rates of Ca2+ binding to sites III and IV in the C-domain of calmodulin (CaM). We show here that PEP-19 decreases the degree of cooperativity of Ca2+ binding to sites III and IV, and we present a model showing that this could increase Ca2+ binding rate constants. Comparative sequence analysis showed that residues 28 to 58 from PEP-19 are conserved in other proteins. This region includes the IQ motif (amino acids 39–62), and an adjacent acidic cluster of amino acids (amino acids 28–40). A synthetic peptide spanning residues 28–62 faithfully mimics intact PEP-19 with respect to increasing the rates of Ca2+ association and dissociation, as well as binding preferentially to the C-domain of CaM. In contrast, a peptide encoding only the core IQ motif does not modulate Ca2+ binding, and binds to multiple sites on CaM. A peptide that includes only the acidic region does not bind to CaM. These results show that PEP-19 has a novel acidic/IQ CaM regulatory motif in which the IQ sequence provides a targeting function that allows binding of PEP-19 to CaM, whereas the acidic residues modify the nature of this interaction, and are essential for modulating Ca2+ binding to the C-domain of CaM. PMID:17991744

  6. Dynamic motifs in socio-economic networks

    NASA Astrophysics Data System (ADS)

    Zhang, Xin; Shao, Shuai; Stanley, H. Eugene; Havlin, Shlomo

    2014-12-01

    Socio-economic networks are of central importance in economic life. We develop a method of identifying and studying motifs in socio-economic networks by focusing on “dynamic motifs,” i.e., evolutionary connection patterns that, because of “node acquaintances” in the network, occur much more frequently than random patterns. We examine two evolving bipartite networks: i) the world-wide commercial ship chartering market and ii) the ship build-to-order market. We find similar dynamic motifs in both bipartite networks, even though they describe different economic activities. We also find that “influence” and “persistence” are strong factors in the interaction behavior of organizations. When two companies are doing business with the same customer, it is highly probable that another customer who currently has a business relationship with only one of these two companies will become a customer of the second in the future. This is the effect of influence. Persistence means that companies with close business ties to customers tend to maintain their relationships over a long period of time.

  7. The Geometry of Plasticity-Induced Sensitization in Isoinhibitory Rate Motifs.

    PubMed

    Kumar, Gautam; Ching, ShiNung

    2016-09-01

    A well-known phenomenon in sensory perception is desensitization, wherein behavioral responses to persistent stimuli become attenuated over time. In this letter, our focus is on studying mechanisms through which desensitization may be mediated at the network level and, specifically, how sensitivity changes arise as a function of long-term plasticity. Our principal object of study is a generic isoinhibitory motif: a small excitatory-inhibitory network with recurrent inhibition. Such a motif is of interest due to its overrepresentation in laminar sensory network architectures. Here, we introduce a sensitivity analysis derived from control theory in which we characterize the fixed-energy reachable set of the motif. This set describes the regions of the phase-space that are more easily (in terms of stimulus energy) accessed, thus providing a holistic assessment of sensitivity. We specifically focus on how the geometry of this set changes due to repetitive application of a persistent stimulus. We find that for certain motif dynamics, this geometry contracts along the stimulus orientation while expanding in orthogonal directions. In other words, the motif not only desensitizes to the persistent input, but heightens its responsiveness (sensitizes) to those that are orthogonal. We develop a perturbation analysis that links this sensitization to both plasticity-induced changes in synaptic weights and the intrinsic dynamics of the network, highlighting that the effect is not purely due to weight-dependent disinhibition. Instead, this effect depends on the relative neuronal time constants and the consequent stimulus-induced drift that arises in the motif phase-space. For tightly distributed (but random) parameter ranges, sensitization is quite generic and manifests in larger recurrent E-I networks within which the motif is embedded. PMID:27391684

  8. A motif for reversible nitric oxide interactions in metalloenzymes.

    PubMed

    Zhang, Shiyu; Melzer, Marie M; Sen, S Nermin; Çelebi-Ölçüm, Nihan; Warren, Timothy H

    2016-07-01

    Nitric oxide (NO) participates in numerous biological processes, such as signalling in the respiratory system and vasodilation in the cardiovascular system. Many metal-mediated processes involve direct reaction of NO to form a metal-nitrosyl (M-NO), as occurs at the Fe(2+) centres of soluble guanylate cyclase or cytochrome c oxidase. However, some copper electron-transfer proteins that bear a type 1 Cu site (His2Cu-Cys) reversibly bind NO by an unknown motif. Here, we use model complexes of type 1 Cu sites based on tris(pyrazolyl)borate copper thiolates [Cu(II)]-SR to unravel the factors involved in NO reactivity. Addition of NO provides the fully characterized S-nitrosothiol adduct [Cu(I)](κ(1)-N(O)SR), which reversibly loses NO on purging with an inert gas. Computational analysis outlines a low-barrier pathway for the capture and release of NO. These findings suggest a new motif for reversible binding of NO at bioinorganic metal centres that can interconvert NO and RSNO molecular signals at copper sites. PMID:27325092

  9. ET-Motif: Solving the Exact (l, d)-Planted Motif Problem Using Error Tree Structure.

    PubMed

    Al-Okaily, Anas; Huang, Chun-Hsi

    2016-07-01

    Motif finding is an important and challenging problem in many biological applications, such as discovering promoters, enhancers, locus control regions, transcription factors, and more. The (l, d)-planted motif search, PMS, is one of several variations of the problem. In this problem, there are n given sequences over an alphabet of size [Formula: see text], each of length m, and two given integers l and d. The problem is to find a motif M of length l, where in each sequence there is at least an l-mer at a Hamming distance of [Formula: see text] of M. In this article, we propose ET-Motif, an algorithm that can solve the PMS problem in [Formula: see text] time and [Formula: see text] space. The time bound can be further reduced by a factor of m with [Formula: see text] space. In case the suffix tree that is built for the input sequences is balanced, the problem can be solved in [Formula: see text] time and [Formula: see text] space. Similarly, the time bound can be reduced by a factor of m using [Formula: see text] space. Moreover, the variations of the problem, namely the edit distance PMS and edited PMS (Quorum), can be solved using ET-Motif with simple modifications but with higher upper bounds on space and time. For edit distance PMS, the time and space bounds will be increased by [Formula: see text], while for edited PMS the increase will be of [Formula: see text] in the time bound. PMID:27152692
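
    The problem statement in this record can be illustrated with a naive exhaustive search. The sketch below is not ET-Motif and uses no error tree; it only enumerates every candidate l-mer. It assumes the elided distance condition is the usual "Hamming distance at most d" and a DNA alphabet, and the function names are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(candidate, seq, d):
    """True if some l-mer of seq is within Hamming distance d of candidate."""
    l = len(candidate)
    return any(
        hamming(candidate, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def planted_motifs(seqs, l, d, alphabet="ACGT"):
    """Brute force: test every possible l-mer against every sequence.

    Assumption: a motif must have an occurrence with at most d mismatches
    in each input sequence. Cost grows as |alphabet|**l, so this is only
    usable for very small l.
    """
    found = []
    for letters in product(alphabet, repeat=l):
        candidate = "".join(letters)
        if all(occurs_within(candidate, s, d) for s in seqs):
            found.append(candidate)
    return found


if __name__ == "__main__":
    toy = ["ACGTACGT", "TTACGATT", "GGACCTGG"]
    print(planted_motifs(toy, l=4, d=1))
```

    The exponential enumeration is exactly the cost that dedicated PMS algorithms aim to avoid.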

  10. Occurrence probability of structured motifs in random sequences.

    PubMed

    Robin, S; Daudin, J-J; Richard, H; Sagot, M-F; Schbath, S

    2002-01-01

    The problem of extracting from a set of nucleic acid sequences motifs which may have biological function is more and more important. In this paper, we are interested in particular motifs that may be implicated in the transcription process. These motifs, called structured motifs, are composed of two ordered parts separated by a variable distance and allowing for substitutions. In order to assess their statistical significance, we propose approximations of the probability of occurrences of such a structured motif in a given sequence. An application of our method to evaluate candidate promoters in E. coli and B. subtilis is presented. Simulations show the goodness of the approximations. PMID:12614545

  11. Alanine substitutions of noncysteine residues in the cysteine-stabilized αβ motif

    PubMed Central

    Yang, Ying-Fang; Cheng, Kuo-Chang; Tsai, Ping-Hsing; Liu, Chung-Cheng; Lee, Tian-Ren; Lyu, Ping-Chiang

    2009-01-01

    The protein scaffold is a peptide framework with a high tolerance of residue modifications. The cysteine-stabilized αβ motif (CSαβ) consists of an α-helix and an antiparallel triple-stranded β-sheet connected by two disulfide bridges. Proteins containing this motif share low sequence identity but high structural similarity, and the motif has been suggested as a good scaffold for protein engineering. The Vigna radiata defensin 1 (VrD1), a plant defensin, serves here as a model protein to probe the amino acid tolerance of the CSαβ motif. A systematic alanine substitution is performed on VrD1. The key residues governing the inhibitory function and structural stability are monitored. Thirty-two of 46 residue positions of VrD1 are altered by site-directed mutagenesis techniques. The circular dichroism spectrum, intrinsic fluorescence spectrum, and chemical denaturation are used to analyze the conformation and structural stability of the proteins. The secondary structures were highly tolerant to the amino acid substitutions; however, protein stability varied from mutant to mutant. Many mutants, although they maintained their conformations, showed significantly altered inhibitory function. In this study, we report the first alanine scan on a plant defensin containing the CSαβ motif. The information is valuable for scaffolds with the CSαβ motif and for protein engineering. PMID:19533758

  12. Identification of a consensus motif in substrates bound by a Type I Hsp40

    PubMed Central

    Kota, Pradeep; Summers, Daniel W.; Ren, Hong-Yu; Cyr, Douglas M.; Dokholyan, Nikolay V.

    2009-01-01

    Protein aggregation is a hallmark of a large and diverse number of conformational diseases. Molecular chaperones of the Hsp40 family (Escherichia coli DnaJ homologs) recognize misfolded disease proteins and suppress the accumulation of toxic protein species. Type I Hsp40s are very potent at suppressing protein aggregation and facilitating the refolding of damaged proteins. Yet, the molecular mechanism for the recognition of nonnative polypeptides by Type I Hsp40s such as yeast Ydj1 is not clear. Here we computationally identify a unique motif that is selectively recognized by Ydj1p. The motif is characterized by the consensus sequence GX[LMQ]{P}X{P}{CIMPVW}, where [XY] denotes either X or Y and {XY} denotes neither X nor Y. We further verify the validity of the motif by site-directed mutagenesis and show that substrate binding by Ydj1 requires recognition of this motif. A yeast proteome screen revealed that many proteins contain more than one stretch of residues that contain the motif and are separated by varying numbers of amino acids. In light of our results, we propose a 2-site peptide-binding model and a plausible mechanism of peptide presentation by Ydj1p to the chaperones of the Hsp70 family. Based on our results, and given that Ydj1p and its human ortholog Hdj2 are functionally interchangeable, we hypothesize that our results can be extended to understanding human diseases. PMID:19549854
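
    The bracket notation defined in this abstract maps directly onto a regular expression. The following is a minimal sketch of that translation, assuming that a bare X in the consensus stands for any residue; the helper names are hypothetical and this is not the authors' search procedure.

```python
import re


def consensus_to_regex(pattern):
    """Translate the abstract's notation into a Python regex string.

    [ABC] -> one of A, B, C        (kept as a character class)
    {ABC} -> none of A, B, C       (negated character class)
    X     -> any residue           (assumption: X is a wildcard)
    other letters are matched literally.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":
            j = pattern.index("]", i)
            out.append("[" + pattern[i + 1:j] + "]")
            i = j + 1
        elif ch == "{":
            j = pattern.index("}", i)
            out.append("[^" + pattern[i + 1:j] + "]")
            i = j + 1
        elif ch == "X":
            out.append(".")
            i += 1
        else:
            out.append(re.escape(ch))
            i += 1
    return "".join(out)


def find_sites(protein, pattern):
    """Return (start, matched_text) for every match, overlaps included."""
    regex = consensus_to_regex(pattern)
    # A lookahead lets overlapping occurrences be reported as well.
    return [(m.start(), m.group(1))
            for m in re.finditer("(?=(" + regex + "))", protein)]


if __name__ == "__main__":
    motif = "GX[LMQ]{P}X{P}{CIMPVW}"
    print(consensus_to_regex(motif))
    print(find_sites("AAGALAAAAKK", motif))
```

    Reporting overlapping matches matters here because the abstract notes that many proteins contain more than one stretch matching the motif.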

  13. Identification of a SUMO-binding motif that recognizes SUMO-modified proteins

    PubMed Central

    Song, Jing; Durrin, Linda K.; Wilkinson, Thomas A.; Krontiris, Theodore G.; Chen, Yuan

    2004-01-01

    Posttranslational modification by the ubiquitin homologue, small ubiquitin-like modifier 1 (SUMO-1), has been established as an important regulatory mechanism. However, in most cases it is not clear how sumoylation regulates various cellular functions. Emerging evidence suggests that sumoylation may play a general role in regulating protein-protein interactions, as shown in RanBP2/Nup358 and RanGAP1 interaction. In this study, we have defined an amino acid sequence motif that binds SUMO. This motif, V/I-X-V/I-V/I, was identified by NMR spectroscopic characterization of interactions among SUMO-1 and peptides derived from proteins that are known to bind SUMO or sumoylated proteins. This motif binds all SUMO paralogues (SUMO-1-3). Using site-directed mutagenesis, we also show that this SUMO-binding motif in RanBP2/Nup358 is responsible for the interaction between RanBP2/Nup358 and sumoylated RanGAP1. The SUMO-binding motif exists in nearly all proteins known to be involved in SUMO-dependent processes, suggesting its general role in sumoylation-dependent cellular functions. PMID:15388847

  14. Mechano-chemical selections of two competitive unfolding pathways of a single DNA i-motif

    NASA Astrophysics Data System (ADS)

    Xu, Yue; Chen, Hu; Qu, Yu-Jie; Efremov, Artem K.; Li, Ming; Ouyang, Zhong-Can; Liu, Dong-Sheng; Yan, Jie

    2014-06-01

    The DNA i-motif is a quadruplex structure formed in tandem cytosine-rich sequences in slightly acidic conditions. Besides being considered as a building block of DNA nano-devices, it may also play roles in regulating chromosome stability and gene transcription. The stability of the i-motif is crucial for these functions. In this work, we investigated the mechanical stability of a single i-motif formed in the human telomeric sequence 5'-(CCCTAA)3CCC, which revealed a novel pH- and loading rate-dependent bimodal unfolding force distribution. Although the cause of the bimodal unfolding force species is not clear, we proposed a phenomenological model involving a direct unfolding pathway favored at lower loading rate or higher pH value, which is subject to competition with another unfolding pathway through a mechanically stable intermediate state whose nature is yet to be determined. Overall, the unique mechano-chemical responses of the i-motif provide a new perspective on its stability, which may be useful for guiding the design of new i-motif-based DNA mechanical nano-devices.

  15. CLIMP: Clustering Motifs via Maximal Cliques with Parallel Computing Design.

    PubMed

    Zhang, Shaoqiang; Chen, Yong

    2016-01-01

    A set of conserved binding sites recognized by a transcription factor is called a motif, which can be found by many applications of comparative genomics for identifying over-represented segments. Moreover, when numerous putative motifs are predicted from a collection of genome-wide data, their similarity data can be represented as a large graph, where these motifs are connected to one another. However, an efficient clustering algorithm is desired for clustering the motifs that belong to the same groups and separating the motifs that belong to different groups, or even deleting a number of spurious ones. In this work, a new motif clustering algorithm, CLIMP, is proposed by using maximal cliques and sped up by parallelizing its program. When a synthetic motif dataset from the database JASPAR, a set of putative motifs from a phylogenetic foot-printing dataset, and a set of putative motifs from a ChIP dataset are used to compare the performances of CLIMP and two other high-performance algorithms, the results demonstrate that CLIMP mostly outperforms the two algorithms on the three datasets for motif clustering, so that it can be a useful complement to the clustering procedures in some genome-wide motif prediction pipelines. CLIMP is available at http://sqzhang.cn/climp.html. PMID:27487245
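
    The idea of grouping motifs through maximal cliques of a similarity graph can be sketched as follows. This is not CLIMP's actual procedure, which the abstract does not describe in detail; it is a plain serial Bron-Kerbosch enumeration with pivoting, and the similarity scores, threshold, and function names are illustrative assumptions.

```python
def similarity_graph(scores, threshold):
    """Build an undirected graph: motifs are nodes, and an edge joins two
    motifs whose pairwise similarity is at least `threshold`.
    `scores` maps (motif_a, motif_b) -> similarity (hypothetical values)."""
    adj = {}
    for (a, b), s in scores.items():
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if s >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques with Bron-Kerbosch and pivoting."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-processed nodes
        if not p and not x:
            cliques.append(sorted(r))
            return
        # Pivot on the node with most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


if __name__ == "__main__":
    toy_scores = {
        ("m1", "m2"): 0.90, ("m1", "m3"): 0.80, ("m2", "m3"): 0.85,
        ("m3", "m4"): 0.20, ("m4", "m5"): 0.95,
    }
    graph = similarity_graph(toy_scores, threshold=0.7)
    print(maximal_cliques(graph))
```

    Each maximal clique is a set of mutually similar motifs; a real pipeline would still need rules for overlapping cliques and for discarding spurious motifs.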

  17. CLIMP: Clustering Motifs via Maximal Cliques with Parallel Computing Design

    PubMed Central

    Chen, Yong

    2016-01-01

    A set of conserved binding sites recognized by a transcription factor is called a motif, which can be found by many applications of comparative genomics for identifying over-represented segments. Moreover, when numerous putative motifs are predicted from a collection of genome-wide data, their similarity data can be represented as a large graph, where these motifs are connected to one another. However, an efficient clustering algorithm is desired for clustering the motifs that belong to the same groups and separating the motifs that belong to different groups, or even deleting a number of spurious ones. In this work, a new motif clustering algorithm, CLIMP, is proposed by using maximal cliques and sped up by parallelizing its program. When a synthetic motif dataset from the database JASPAR, a set of putative motifs from a phylogenetic foot-printing dataset, and a set of putative motifs from a ChIP dataset are used to compare the performances of CLIMP and two other high-performance algorithms, the results demonstrate that CLIMP mostly outperforms the two algorithms on the three datasets for motif clustering, so that it can be a useful complement to the clustering procedures in some genome-wide motif prediction pipelines. CLIMP is available at http://sqzhang.cn/climp.html. PMID:27487245

  18. MISAE: a new approach for regulatory motif extraction.

    PubMed

    Sun, Zhaohui; Yang, Jingyi; Deogun, Jitender S

    2004-01-01

    The recognition of regulatory motifs of co-regulated genes is essential for understanding the regulatory mechanisms. However, the automatic extraction of regulatory motifs from a given data set of the upstream non-coding DNA sequences of a family of co-regulated genes is difficult because regulatory motifs are often subtle and inexact. This problem is further complicated by the corruption of the data sets. In this paper, a new approach called Mismatch-allowed Probabilistic Suffix Tree Motif Extraction (MISAE) is proposed. It combines the mismatch-allowed probabilistic suffix tree that is a probabilistic model and local prediction for the extraction of regulatory motifs. The proposed approach is tested on 15 co-regulated gene families and compares favorably with other state-of-the-art approaches. Moreover, MISAE performs well on "corrupted" data sets. It is able to extract the motif from a "corrupted" data set with less than one fourth of the sequences containing the real motif.

  19. RNA structural motif recognition based on least-squares distance.

    PubMed

    Shen, Ying; Wong, Hau-San; Zhang, Shaohong; Zhang, Lin

    2013-09-01

    RNA structural motifs are recurrent structural elements occurring in RNA molecules. RNA structural motif recognition aims to find RNA substructures that are similar to a query motif, and it is important for RNA structure analysis and RNA function prediction. In view of this, we propose a new method known as RNA Structural Motif Recognition based on Least-Squares distance (LS-RSMR) to effectively recognize RNA structural motifs. We compile a test set consisting of five types of RNA structural motifs occurring in Escherichia coli ribosomal RNA and conduct experiments on recognizing these five types of motifs. The experimental results demonstrate the superiority of the proposed LS-RSMR compared with four other state-of-the-art methods.

  20. Functional Analysis of Semi-conserved Transit Peptide Motifs and Mechanistic Implications in Precursor Targeting and Recognition.

    PubMed

    Holbrook, Kristen; Subramanian, Chitra; Chotewutmontri, Prakitchai; Reddick, L Evan; Wright, Sarah; Zhang, Huixia; Moncrief, Lily; Bruce, Barry D

    2016-09-01

    Over 95% of plastid proteins are nuclear-encoded as precursors containing an N-terminal extension known as the transit peptide (TP). Although highly variable, TPs direct the precursors through a conserved, posttranslational mechanism involving translocons in the outer (TOC) and inner envelope (TIC). The organelle import specificity is mediated by one or more components of the TOC complex. However, the high TP diversity creates a paradox as to how the sequences can be specifically recognized. An emerging model of TP design is that they contain multiple loosely conserved motifs that are recognized at different steps in the targeting and transport process. Bioinformatics has demonstrated that many TPs contain semi-conserved physicochemical motifs, termed FGLK. In order to characterize FGLK motifs in TP recognition and import, we have analyzed two well-studied TPs from the precursor of RuBisCO small subunit (SStp) and ferredoxin (Fdtp). Both SStp and Fdtp contain two FGLK motifs. Analysis of a large set of mutations (∼85) in these two motifs using in vitro, in organello, and in vivo approaches supports a model in which the FGLK domains mediate interaction with TOC34 and possibly other TOC components. In vivo import analysis suggests that multiple FGLK motifs are functionally redundant. Furthermore, we discuss how FGLK motifs are required for efficient precursor protein import and how these elements may permit a convergent function of this highly variable class of targeting sequences. PMID:27378725

  1. Chaotic motif sampler: detecting motifs from biological sequences by using chaotic neurodynamics

    NASA Astrophysics Data System (ADS)

    Matsuura, Takafumi; Ikeguchi, Tohru

    One of the problems addressed in bioinformatics is the motif extraction problem (MEP): the identification of a region in biological sequences. However, the MEP is an NP-hard problem. Therefore, it is almost impossible to obtain an optimal solution within a reasonable time frame. To find near-optimal solutions for NP-hard combinatorial optimization problems such as traveling salesman problems, quadratic assignment problems, and vehicle routing problems, chaotic search, which is one of the deterministic approaches, has been proposed and exhibits better performance than stochastic approaches. In this paper, we propose a new alignment method that employs chaotic dynamics to solve MEPs. It is called the Chaotic Motif Sampler. We show that the performance of the Chaotic Motif Sampler is considerably better than that of conventional methods such as the Gibbs Site Sampler and the Neighborhood Optimization for Multiple Alignment Discovery.

  2. The RNA 3D Motif Atlas: Computational methods for extraction, organization and evaluation of RNA motifs.

    PubMed

    Parlea, Lorena G; Sweeney, Blake A; Hosseini-Asanjan, Maryam; Zirbel, Craig L; Leontis, Neocles B

    2016-07-01

    RNA 3D motifs occupy places in structured RNA molecules that correspond to the hairpin, internal and multi-helix junction "loops" of their secondary structure representations. As many as 40% of the nucleotides of an RNA molecule can belong to these structural elements, which are distinct from the regular double helical regions formed by contiguous AU, GC, and GU Watson-Crick basepairs. With the large number of atomic- or near atomic-resolution 3D structures appearing in a steady stream in the PDB/NDB structure databases, the automated identification, extraction, comparison, clustering and visualization of these structural elements presents an opportunity to enhance RNA science. Three broad applications are: (1) identification of modular, autonomous structural units for RNA nanotechnology, nanobiology and synthetic biology applications; (2) bioinformatic analysis to improve RNA 3D structure prediction from sequence; and (3) creation of searchable databases for exploring the binding specificities, structural flexibility, and dynamics of these RNA elements. In this contribution, we review methods developed for computational extraction of hairpin and internal loop motifs from a non-redundant set of high-quality RNA 3D structures. We provide a statistical summary of the extracted hairpin and internal loop motifs in the most recent version of the RNA 3D Motif Atlas. We also explore the reliability and accuracy of the extraction process by examining its performance in clustering recurrent motifs from homologous ribosomal RNA (rRNA) structures. We conclude with a summary of remaining challenges, especially with regard to extraction of multi-helix junction motifs. PMID:27125735

  3. Signature motifs of GDP polyribonucleotidyltransferase, a non-segmented negative strand RNA viral mRNA capping enzyme, domain in the L protein are required for covalent enzyme-pRNA intermediate formation.

    PubMed

    Neubauer, Julie; Ogino, Minako; Green, Todd J; Ogino, Tomoaki

    2016-01-01

    The unconventional mRNA capping enzyme (GDP polyribonucleotidyltransferase, PRNTase; block V) domain in RNA polymerase L proteins of non-segmented negative strand (NNS) RNA viruses (e.g. rabies, measles, Ebola) contains five collinear sequence elements, Rx(3)Wx(3-8)ΦxGxζx(P/A) (motif A; Φ, hydrophobic; ζ, hydrophilic), (Y/W)ΦGSxT (motif B), W (motif C), HR (motif D) and ζxxΦx(F/Y)QxxΦ (motif E). We performed site-directed mutagenesis of the L protein of vesicular stomatitis virus (VSV, a prototypic NNS RNA virus) to examine participation of these motifs in mRNA capping. Similar to the catalytic residues in motif D, G1100 in motif A, T1157 in motif B, W1188 in motif C, and F1269 and Q1270 in motif E were found to be essential or important for the PRNTase activity in the step of the covalent L-pRNA intermediate formation, but not for the GTPase activity that generates GDP (pRNA acceptor). Cap defective mutations in these residues induced termination of mRNA synthesis at position +40 followed by aberrant stop-start transcription, and abolished virus gene expression in host cells. These results suggest that the conserved motifs constitute the active site of the PRNTase domain and the L-pRNA intermediate formation followed by the cap formation is essential for successful synthesis of full-length mRNAs.

  4. MINER: software for phylogenetic motif identification.

    PubMed

    La, David; Livesay, Dennis R

    2005-07-01

    MINER is web-based software for phylogenetic motif (PM) identification. PMs are sequence regions (fragments) that conserve the overall familial phylogeny. PMs have been shown to correspond to a wide variety of catalytic regions, substrate-binding sites and protein interfaces, making them ideal functional site predictions. The MINER output provides an intuitive interface for interactive PM sequence analysis and structural visualization. The web implementation of MINER is freely available at http://www.pmap.csupomona.edu/MINER/. Source code is available to the academic community on request.

  5. Transcription factor motif quality assessment requires systematic comparative analysis

    PubMed Central

    Kibet, Caleb Kipkurui; Machanick, Philip

    2016-01-01

    Transcription factor (TF) binding site prediction remains a challenge in gene regulatory research due to degeneracy and potential variability in binding sites in the genome. Dozens of algorithms designed to learn binding models (motifs) have generated many motifs available in research papers with a subset making it to databases like JASPAR, UniPROBE and Transfac. The presence of many versions of motifs from the various databases for a single TF and the lack of a standardized assessment technique makes it difficult for biologists to make an appropriate choice of binding model and for algorithm developers to benchmark, test and improve on their models. In this study, we review and evaluate the approaches in use, highlight differences and demonstrate the difficulty of defining a standardized motif assessment approach. We review scoring functions, motif length, test data and the type of performance metrics used in prior studies as some of the factors that influence the outcome of a motif assessment. We show that the scoring functions and statistics used in motif assessment influence ranking of motifs in a TF-specific manner. We also show that TF binding specificity can vary by source of genomic binding data. We also demonstrate that information content of a motif is not in isolation a measure of motif quality but is influenced by TF binding behaviour. We conclude that there is a need for an easy-to-use tool that presents all available evidence for a comparative analysis. PMID:27092243
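
    The information content mentioned in this abstract is commonly computed per motif column as the relative entropy against a background distribution. The sketch below uses that common definition with a uniform DNA background; both the definition and the names are assumptions, since the abstract does not state which formulation the study used.

```python
import math


def column_information(column, background):
    """Relative entropy (bits) of one motif column versus the background.

    `column` maps each base to its probability at this position; terms
    with zero probability contribute nothing.
    """
    return sum(
        p * math.log2(p / background[base])
        for base, p in column.items()
        if p > 0
    )


def motif_information(pwm, background=None):
    """Total information content of a motif given as a list of columns."""
    if background is None:
        # Assumption: uniform background over the four bases.
        background = {base: 0.25 for base in "ACGT"}
    return sum(column_information(col, background) for col in pwm)


if __name__ == "__main__":
    pwm = [
        {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},      # fully conserved
        {"A": 0.5, "C": 0.5, "G": 0.0, "T": 0.0},      # two-way degenerate
        {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # uninformative
    ]
    print(motif_information(pwm))
```

    A high total only says the motif is specific; as the abstract argues, it does not by itself say the motif predicts binding well.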

  6. Potential Direct Regulators of the Drosophila yellow Gene Identified by Yeast One-Hybrid and RNAi Screens

    PubMed Central

    Kalay, Gizem; Lusk, Richard; Dome, Mackenzie; Hens, Korneel; Deplancke, Bart; Wittkopp, Patricia J.

    2016-01-01

    The regulation of gene expression controls development, and changes in this regulation often contribute to phenotypic evolution. Drosophila pigmentation is a model system for studying evolutionary changes in gene regulation, with differences in expression of pigmentation genes such as yellow that correlate with divergent pigment patterns among species shown to be caused by changes in cis- and trans-regulation. Currently, much more is known about the cis-regulatory component of divergent yellow expression than the trans-regulatory component, in part because very few trans-acting regulators of yellow expression have been identified. This study aims to improve our understanding of the trans-acting control of yellow expression by combining yeast-one-hybrid and RNAi screens for transcription factors binding to yellow cis-regulatory sequences and affecting abdominal pigmentation in adults, respectively. Of the 670 transcription factors included in the yeast-one-hybrid screen, 45 showed evidence of binding to one or more sequence fragments tested from the 5′ intergenic and intronic yellow sequences from D. melanogaster, D. pseudoobscura, and D. willistoni, suggesting that they might be direct regulators of yellow expression. Of the 670 transcription factors included in the yeast-one-hybrid screen, plus another TF previously shown to be genetically upstream of yellow, 125 were also tested using RNAi, and 32 showed altered abdominal pigmentation. Nine transcription factors were identified in both screens, including four nuclear receptors related to ecdysone signaling (Hr78, Hr38, Hr46, and Eip78C). This finding suggests that yellow expression might be directly controlled by nuclear receptors influenced by ecdysone during early pupal development when adult pigmentation is forming. PMID:27527791

  7. The network motif architecture of dominance hierarchies.

    PubMed

    Shizuka, Daizaburo; McDonald, David B

    2015-04-01

    The widespread existence of dominance hierarchies has been a central puzzle in social evolution, yet we lack a framework for synthesizing the vast empirical data on hierarchy structure in animal groups. We applied network motif analysis to compare the structures of dominance networks from data published over the past 80 years. Overall patterns of dominance relations, including some aspects of non-interactions, were strikingly similar across disparate group types. For example, nearly all groups exhibited high frequencies of transitive triads, whereas cycles were very rare. Moreover, pass-along triads were rare, and double-dominant triads were common in most groups. These patterns did not vary in any systematic way across taxa, study settings (captive or wild) or group size. Two factors significantly affected network motif structure: the proportion of dyads that were observed to interact and the interaction rates of the top-ranked individuals. Thus, study design (i.e. how many interactions were observed) and the behaviour of key individuals in the group could explain much of the variations we see in social hierarchies across animals. Our findings confirm the ubiquity of dominance hierarchies across all animal systems, and demonstrate that network analysis provides new avenues for comparative analyses of social hierarchies. PMID:25762649
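
    The contrast between transitive triads and cycles can be made concrete with a small count over a directed dominance network. This sketch is an illustration only, not the authors' analysis: it classifies only triads in which all three dyads have a one-way dominance relation, and it ignores the two-edge types (pass-along, double-dominant) named in the abstract. The data and names are hypothetical.

```python
from itertools import combinations


def triad_counts(dominance):
    """Count transitive and cyclic triads.

    `dominance` is a set of (winner, loser) pairs. A triad is classified
    only if each of its three dyads has exactly one directed edge.
    """
    nodes = sorted({x for edge in dominance for x in edge})
    counts = {"transitive": 0, "cycle": 0}
    for trio in combinations(nodes, 3):
        edges = [(a, b) for a in trio for b in trio
                 if a != b and (a, b) in dominance]
        # Skip incomplete triads and triads with mutual (two-way) dyads.
        if len(edges) != 3 or any((b, a) in dominance for a, b in edges):
            continue
        wins = sorted(sum(1 for w, _ in edges if w == n) for n in trio)
        # A cycle gives every member one win; a transitive triad gives
        # wins of 0, 1 and 2.
        counts["cycle" if wins == [1, 1, 1] else "transitive"] += 1
    return counts


if __name__ == "__main__":
    toy = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "a")}
    print(triad_counts(toy))
```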

  9. Structural motifs and the stability of fullerenes

    SciTech Connect

    Austin, S.J.; Fowler, P.W.; Manolopoulos, D.E.; Orlandi, G.; Zerbetto, F.

    1995-05-18

    Full geometry optimization has been performed within the semiempirical QCFF/PI model for the 1812 fullerene structural isomers of C60 formed by 12 pentagons and 20 hexagons. All are local minima on the potential energy hypersurface. Correlations of total energy with many structural motifs yield highly scattered diagrams, but some exhibit linear trends. Penalty and merit functions can be assigned to certain motifs: inclusion of a fused pentagon pair entails an average penalty of 111 kJ mol(-1); a generic hexagon triple costs 23 kJ mol(-1); a triple (open or fused) comprising a pentagon between two hexagonal neighbors gives a stabilization of 19 kJ mol(-1). These results can be understood in terms of the curved nature of fullerene molecules: pentagons should be isolated to avoid sharp local curvature, hexagon triples are costly because they enforce local planarity and hence imply high curvature in another part of the fullerene surface, but hexagon-pentagon-hexagon triples allow the surface to distribute steric strain by warping. The best linear fit is found for H, the second moment of the hexagon-neighbor-index signature, which fits the total energies with a standard deviation of only 53 kJ mol(-1) and must be minimized for stability; this index too can be interpreted in terms of curvature. 26 refs., 5 figs.

  10. Network motifs: simple building blocks of complex networks.

    PubMed

    Milo, R; Shen-Orr, S; Itzkovitz, S; Kashtan, N; Chklovskii, D; Alon, U

    2002-10-25

    Complex networks are studied across many fields of science. To uncover their structural design principles, we defined "network motifs," patterns of interconnections occurring in complex networks at numbers that are significantly higher than those in randomized networks. We found such motifs in networks from biochemistry, neurobiology, ecology, and engineering. The motifs shared by ecological food webs were distinct from the motifs shared by the genetic networks of Escherichia coli and Saccharomyces cerevisiae or from those found in the World Wide Web. Similar motifs were found in networks that perform information processing, even though they describe elements as different as biomolecules within a cell and synaptic connections between neurons in Caenorhabditis elegans. Motifs may thus define universal classes of networks. This approach may uncover the basic building blocks of most networks. PMID:12399590

  11. A Gibbs sampler for motif detection in phylogenetically close sequences

    NASA Astrophysics Data System (ADS)

    Siddharthan, Rahul; van Nimwegen, Erik; Siggia, Eric

    2004-03-01

    Genes are regulated by transcription factors that bind to DNA upstream of genes and recognize short conserved "motifs" in a random intergenic "background". Motif-finders such as the Gibbs sampler compare the probability of these short sequences being represented by "weight matrices" to the probability of their arising from the background "null model", and explore this space (analogous to a free-energy landscape). But closely related species may show conservation not because of functional sites but simply because they have not had sufficient time to diverge, so conventional methods will fail. We introduce a new Gibbs sampler algorithm that accounts for common ancestry when searching for motifs, while requiring minimal "prior" assumptions on the number and types of motifs, assessing the significance of detected motifs by "tracking" clusters that stay together. We apply this scheme to motif detection in sporulation-cycle genes in the yeast S. cerevisiae, using recent sequences of other closely-related Saccharomyces species.
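
    The weight-matrix versus background comparison that conventional motif finders rely on can be sketched as a log-odds score. This is only the scoring step, not the phylogeny-aware sampler introduced in the abstract; the pseudocount, the uniform background, and the function names are assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"


def weight_matrix(sites, pseudocount=1.0):
    """Per-column base probabilities estimated from aligned, equal-length sites."""
    width = len(sites[0])
    matrix = []
    for col in range(width):
        counts = {b: pseudocount for b in ALPHABET}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        matrix.append({b: counts[b] / total for b in ALPHABET})
    return matrix


def log_odds(window, matrix, background):
    """log P(window | weight matrix) - log P(window | background)."""
    return sum(math.log(matrix[i][b] / background[b])
               for i, b in enumerate(window))


def best_window(seq, matrix, background):
    """Return (score, start) of the highest-scoring window in seq."""
    w = len(matrix)
    return max((log_odds(seq[i:i + w], matrix, background), i)
               for i in range(len(seq) - w + 1))
```

    A Gibbs sampler would repeatedly resample site positions in proportion to the exponentiated score rather than taking the maximum, and the method in the abstract additionally corrects for shared ancestry between species.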

  12. Network Motifs: Simple Building Blocks of Complex Networks

    NASA Astrophysics Data System (ADS)

    Milo, R.; Shen-Orr, S.; Itzkovitz, S.; Kashtan, N.; Chklovskii, D.; Alon, U.

    2002-10-01

    Complex networks are studied across many fields of science. To uncover their structural design principles, we defined "network motifs," patterns of interconnections occurring in complex networks at numbers that are significantly higher than those in randomized networks. We found such motifs in networks from biochemistry, neurobiology, ecology, and engineering. The motifs shared by ecological food webs were distinct from the motifs shared by the genetic networks of Escherichia coli and Saccharomyces cerevisiae or from those found in the World Wide Web. Similar motifs were found in networks that perform information processing, even though they describe elements as different as biomolecules within a cell and synaptic connections between neurons in Caenorhabditis elegans. Motifs may thus define universal classes of networks. This approach may uncover the basic building blocks of most networks.

  13. Detecting DNA regulatory motifs by incorporating positional trends in information content

    SciTech Connect

    Kechris, Katherina J.; van Zwet, Erik; Bickel, Peter J.; Eisen, Michael B.

    2004-05-04

    On the basis of the observation that conserved positions in transcription factor binding sites are often clustered together, we propose a simple extension to the model-based motif discovery methods. We assign position-specific prior distributions to the frequency parameters of the model, penalizing deviations from a specified conservation profile. Examples with both simulated and real data show that this extension helps discover motifs as the data become noisier or when there is a competing false motif.

  14. STEME: a robust, accurate motif finder for large data sets.

    PubMed

    Reid, John E; Wernisch, Lorenz

    2014-01-01

    Motif finding is a difficult problem that has been studied for over 20 years. Some older popular motif finders are not suitable for analysis of the large data sets generated by next-generation sequencing. We recently published an efficient approximation (STEME) to the EM algorithm that is at the core of many motif finders such as MEME. This approximation allows the EM algorithm to be applied to large data sets. In this work we describe several efficient extensions to STEME that are based on the MEME algorithm. Together with the original STEME EM approximation, these extensions make STEME a fully-fledged motif finder with similar properties to MEME. We discuss the difficulty of objectively comparing motif finders. We show that STEME performs comparably to existing prominent discriminative motif finders, DREME and Trawler, on 13 sets of transcription factor binding data in mouse ES cells. We demonstrate the ability of STEME to find long degenerate motifs which these discriminative motif finders do not find. As part of our method, we extend an earlier method due to Nagarajan et al. for the efficient calculation of motif E-values. STEME's source code is available under an open source license and STEME is available via a web interface. PMID:24625410

  15. Motif content comparison between monocot and dicot species

    PubMed Central

    Cserhati, Matyas

    2015-01-01

    While a number of DNA sequence motifs have been functionally characterized, the full repertoire of motifs in an organism (the motifome) is yet to be characterized. The present study aims to widen the scope of motif content analysis in different monocot and dicot species, including both rice species, Brachypodium, corn, and wheat as monocots and Arabidopsis, Lotus japonicus, Medicago truncatula, and Populus tremula as dicots. All possible motifs were analyzed in different sets of sequences from these species: the whole genome, core proximal and distal promoters, 5′ and 3′ UTRs, and the 1st introns. Because more species were involved in this study than in previous works, species relationships were analyzed based on the similarity of common motif content. Certain secondary structure elements were inferred in the genomes of these species, as well as new unknown motifs. The 20 motifs common to the studied species were found to occur significantly more often within the promoters and 3′ UTRs of genes, both being regulatory regions. Motifs common to the promoter regions of japonica rice, Brachypodium, and corn were also found in a number of orthologous and paralogous genes. Some of our motifs were found to be complementary to miRNA elements in Brachypodium distachyon and japonica rice. PMID:26484161

  16. Gibbs motif sampling: detection of bacterial outer membrane protein repeats.

    PubMed Central

    Neuwald, A. F.; Liu, J. S.; Lawrence, C. E.

    1995-01-01

    The detection and alignment of locally conserved regions (motifs) in multiple sequences can provide insight into protein structure, function, and evolution. A new Gibbs sampling algorithm is described that detects motif-encoding regions in sequences and optimally partitions them into distinct motif models; this is illustrated using a set of immunoglobulin fold proteins. When applied to sequences sharing a single motif, the sampler can be used to classify motif regions into related submodels, as is illustrated using helix-turn-helix DNA-binding proteins. Other statistically based procedures are described for searching a database for sequences matching motifs found by the sampler. When applied to a set of 32 very distantly related bacterial integral outer membrane proteins, the sampler revealed that they share a subtle, repetitive motif. Although BLAST (Altschul SF et al., 1990, J Mol Biol 215:403-410) fails to detect significant pairwise similarity between any of the sequences, the repeats present in these outer membrane proteins, taken as a whole, are highly significant (based on a generally applicable statistical test for motifs described here). Analysis of bacterial porins with known trimeric beta-barrel structure and related proteins reveals a similar repetitive motif corresponding to alternating membrane-spanning beta-strands. These beta-strands occur on the membrane interface (as opposed to the trimeric interface) of the beta-barrel. The broad conservation and structural location of these repeats suggests that they play important functional roles. PMID:8520488

  17. Plasticity of the RNA Kink Turn Structural Motif

    SciTech Connect

    Antonioli, A.; Cochrane, J; Lipchock, S; Strobel, S

    2010-01-01

    The kink turn (K-turn) is an RNA structural motif found in many biologically significant RNAs. While most examples of the K-turn have a similar fold, the crystal structure of the Azoarcus group I intron revealed a novel RNA conformation, a reverse kink turn bent in the direction opposite that of a consensus K-turn. The reverse K-turn is bent toward the major grooves rather than the minor grooves of the flanking helices, yet the sequence differs from the K-turn consensus by only a single nucleotide. Here we demonstrate that the reverse bend direction is not solely defined by internal sequence elements, but is instead affected by structural elements external to the K-turn. It bends toward the major groove under the direction of a tetraloop-tetraloop receptor. The ability of one sequence to form two distinct structures demonstrates the inherent plasticity of the K-turn sequence. Such plasticity suggests that the K-turn is not a primary element in RNA folding, but instead is shaped by other structural elements within the RNA or ribonucleoprotein assembly.

  18. An RNA motif that binds ATP

    NASA Technical Reports Server (NTRS)

    Sassanfar, M.; Szostak, J. W.

    1993-01-01

    RNAs that contain specific high-affinity binding sites for small molecule ligands immobilized on a solid support are present at a frequency of roughly one in 10^10 to 10^11 in pools of random sequence RNA molecules. Here we describe a new in vitro selection procedure designed to ensure the isolation of RNAs that bind the ligand of interest in solution as well as on a solid support. We have used this method to isolate a remarkably small RNA motif that binds ATP, a substrate in numerous biological reactions and the universal biological high-energy intermediate. The selected ATP-binding RNAs contain a consensus sequence, embedded in a common secondary structure. The binding properties of ATP analogues and modified RNAs show that the binding interaction is characterized by a large number of close contacts between the ATP and RNA, and by a change in the conformation of the RNA.

  19. A hydrophobic proline-rich motif is involved in the intracellular targeting of temperature-induced lipocalin.

    PubMed

    Hernández-Gras, Francesc; Boronat, Albert

    2015-06-01

    Temperature-induced lipocalins (TILs) play an essential role in the response of plants to different abiotic stresses. In agreement with their proposed role in protecting membrane lipids, TILs have been reported to be associated with cell membranes. However, TILs show an overall hydrophilic character and contain neither a signal for membrane targeting nor hydrophobic sequences that could represent transmembrane domains. Arabidopsis TIL (AtTIL) is considered the ortholog of human ApoD, a protein known to associate with membranes through a short hydrophobic loop protruding from strands 5 and 6 of the lipocalin β-barrel. An equivalent loop (referred to as HPR motif) is also present between β-strands 5 and 6 of TILs. The HPR motif, which is highly conserved among TIL proteins, extends over a short stretch of eight amino acids and contains four invariant proline residues. Subcellular localization studies have shown that TILs are targeted to a variety of cell membranes and organelles. We have also found that the HPR motif is necessary and sufficient for the intracellular targeting of TILs. Modeling studies suggest that the HPR motif may directly anchor TILs to cell membranes, favoring in this way further contact with the polar group of membrane lipids. However, some particular features of the HPR motif open the possibility that targeting of TILs to cell membranes could be mediated by interaction with other proteins. The functional analysis of the HPR motif unveils the existence of novel mechanisms involved in the intracellular targeting of proteins in plants.

  20. Encoded expansion: an efficient algorithm to discover identical string motifs.

    PubMed

    Azmi, Aqil M; Al-Ssulami, Abdulrakeeb

    2014-01-01

    A major task in computational biology is the discovery of short recurring string patterns known as motifs. Most of the schemes to discover motifs are either stochastic or combinatorial in nature. Stochastic approaches do not guarantee finding the correct motifs, while the combinatorial schemes tend to have an exponential time complexity with respect to motif length. To alleviate the cost, the combinatorial approach exploits dynamic data structures such as trees or graphs. Recently (Karci (2009) Efficient automatic exact motif discovery algorithms for biological sequences, Expert Systems with Applications 36:7952-7963) devised a deterministic algorithm that finds all the identical copies of string motifs of all sizes [Formula: see text] in theoretical time complexity of [Formula: see text] and a space complexity of [Formula: see text] where [Formula: see text] is the length of the input sequence and [Formula: see text] is the length of the longest possible string motif. In this paper, we present a significant improvement on Karci's original algorithm. The algorithm that we propose reports all identical string motifs of sizes [Formula: see text] that occur at least [Formula: see text] times. Our algorithm starts with string motifs of size 2, and at each iteration it expands the candidate string motifs by one symbol throwing out those that occur less than [Formula: see text] times in the entire input sequence. We use a simple array and data encoding to achieve theoretical worst-case time complexity of [Formula: see text] and a space complexity of [Formula: see text] Encoding of the substrings can speed up the process of comparison between string motifs. Experimental results on random and real biological sequences confirm that our algorithm has indeed a linear time complexity and it is more scalable in terms of sequence length than the existing algorithms. PMID:24871320
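
    The expand-and-prune idea in this abstract (start from motifs of size 2, extend each surviving candidate by one symbol, discard those below the occurrence threshold) can be sketched as follows. This is a plain dictionary-based illustration, not the authors' array-and-encoding implementation, and it makes no claim to their stated complexity bounds; `repeated_motifs`, `min_count`, and `start_len` are hypothetical names.

```python
from collections import defaultdict


def repeated_motifs(seq, min_count=2, start_len=2):
    """Return {motif: [start positions]} for every exact substring of
    length >= start_len that occurs at least min_count times in seq
    (overlapping occurrences are counted)."""
    # Seed level: positions of every substring of length start_len.
    level = defaultdict(list)
    for i in range(len(seq) - start_len + 1):
        level[seq[i:i + start_len]].append(i)
    level = {m: p for m, p in level.items() if len(p) >= min_count}

    result = {}
    length = start_len
    while level:
        result.update(level)
        # Extend each surviving occurrence by one symbol to the right.
        extended = defaultdict(list)
        for motif, positions in level.items():
            for i in positions:
                j = i + length
                if j < len(seq):
                    extended[motif + seq[j]].append(i)
        # Prune candidates that fall below the occurrence threshold.
        level = {m: p for m, p in extended.items() if len(p) >= min_count}
        length += 1
    return result
```

    Pruning is safe because any motif occurring at least `min_count` times has a prefix that occurs at least as often, so no qualifying longer motif is lost when a short candidate is discarded.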

  1. Encoded Expansion: An Efficient Algorithm to Discover Identical String Motifs

    PubMed Central

    Azmi, Aqil M.; Al-Ssulami, Abdulrakeeb

    2014-01-01

    A major task in computational biology is the discovery of short recurring string patterns known as motifs. Most of the schemes to discover motifs are either stochastic or combinatorial in nature. Stochastic approaches do not guarantee finding the correct motifs, while the combinatorial schemes tend to have an exponential time complexity with respect to motif length. To alleviate the cost, the combinatorial approach exploits dynamic data structures such as trees or graphs. Recently (Karci (2009) Efficient automatic exact motif discovery algorithms for biological sequences, Expert Systems with Applications 36:7952–7963) devised a deterministic algorithm that finds all the identical copies of string motifs of all sizes [Formula: see text] in theoretical time complexity of [Formula: see text] and a space complexity of [Formula: see text] where [Formula: see text] is the length of the input sequence and [Formula: see text] is the length of the longest possible string motif. In this paper, we present a significant improvement on Karci's original algorithm. The algorithm that we propose reports all identical string motifs of sizes [Formula: see text] that occur at least [Formula: see text] times. Our algorithm starts with string motifs of size 2, and at each iteration it expands the candidate string motifs by one symbol throwing out those that occur less than [Formula: see text] times in the entire input sequence. We use a simple array and data encoding to achieve theoretical worst-case time complexity of [Formula: see text] and a space complexity of [Formula: see text]. Encoding of the substrings can speed up the process of comparison between string motifs. Experimental results on random and real biological sequences confirm that our algorithm has indeed a linear time complexity and it is more scalable in terms of sequence length than the existing algorithms. PMID:24871320

  2. Encoded expansion: an efficient algorithm to discover identical string motifs.

    PubMed

    Azmi, Aqil M; Al-Ssulami, Abdulrakeeb

    2014-01-01

    A major task in computational biology is the discovery of short recurring string patterns known as motifs. Most of the schemes to discover motifs are either stochastic or combinatorial in nature. Stochastic approaches do not guarantee finding the correct motifs, while the combinatorial schemes tend to have an exponential time complexity with respect to motif length. To alleviate the cost, the combinatorial approach exploits dynamic data structures such as trees or graphs. Recently (Karci (2009) Efficient automatic exact motif discovery algorithms for biological sequences, Expert Systems with Applications 36:7952-7963) devised a deterministic algorithm that finds all the identical copies of string motifs of all sizes [Formula: see text] in theoretical time complexity of [Formula: see text] and a space complexity of [Formula: see text] where [Formula: see text] is the length of the input sequence and [Formula: see text] is the length of the longest possible string motif. In this paper, we present a significant improvement on Karci's original algorithm. The algorithm that we propose reports all identical string motifs of sizes [Formula: see text] that occur at least [Formula: see text] times. Our algorithm starts with string motifs of size 2, and at each iteration it expands the candidate string motifs by one symbol throwing out those that occur less than [Formula: see text] times in the entire input sequence. We use a simple array and data encoding to achieve theoretical worst-case time complexity of [Formula: see text] and a space complexity of [Formula: see text] Encoding of the substrings can speed up the process of comparison between string motifs. Experimental results on random and real biological sequences confirm that our algorithm has indeed a linear time complexity and it is more scalable in terms of sequence length than the existing algorithms.

  3. The Q Motif Is Involved in DNA Binding but Not ATP Binding in ChlR1 Helicase.

    PubMed

    Ding, Hao; Guo, Manhong; Vidhyasagar, Venkatasubramanian; Talwar, Tanu; Wu, Yuliang

    2015-01-01

    Helicases are molecular motors that couple the energy of ATP hydrolysis to the unwinding of structured DNA or RNA and chromatin remodeling. The conversion of energy derived from ATP hydrolysis into unwinding and remodeling is coordinated by seven sequence motifs (I, Ia, II, III, IV, V, and VI). The Q motif, consisting of nine amino acids (GFXXPXPIQ) with an invariant glutamine (Q) residue, has been identified in some, but not all helicases. Compared to the seven well-recognized conserved helicase motifs, the role of the Q motif is less acknowledged. Mutations in the human ChlR1 (DDX11) gene are associated with a unique genetic disorder known as Warsaw Breakage Syndrome, which is characterized by cellular defects in genome maintenance. To examine the roles of the Q motif in ChlR1 helicase, we performed site directed mutagenesis of glutamine to alanine at residue 23 in the Q motif of ChlR1. ChlR1 recombinant protein was overexpressed and purified from HEK293T cells. ChlR1-Q23A mutant abolished the helicase activity of ChlR1 and displayed reduced DNA binding ability. The mutant showed impaired ATPase activity but normal ATP binding. A thermal shift assay revealed that ChlR1-Q23A has a melting point value similar to ChlR1-WT. Partial proteolysis mapping demonstrated that ChlR1-WT and Q23A have a similar globular structure, although some subtle conformational differences in these two proteins are evident. Finally, we found ChlR1 exists and functions as a monomer in solution, which is different from FANCJ, in which the Q motif is involved in protein dimerization. Taken together, our results suggest that the Q motif is involved in DNA binding but not ATP binding in ChlR1 helicase.

  4. Crystal structure of SEL1L: Insight into the roles of SLR motifs in ERAD pathway

    PubMed Central

    Jeong, Hanbin; Sim, Hyo Jung; Song, Eun Kyung; Lee, Hakbong; Ha, Sung Chul; Jun, Youngsoo; Park, Tae Joo; Lee, Changwook

    2016-01-01

    Terminally misfolded proteins are selectively recognized and cleared by the endoplasmic reticulum-associated degradation (ERAD) pathway. SEL1L, a component of the ERAD machinery, plays an important role in selecting and transporting ERAD substrates for degradation. We have determined the crystal structure of the mouse SEL1L central domain comprising five Sel1-Like Repeats (SLR motifs 5 to 9; hereafter called SEL1Lcent). Strikingly, SEL1Lcent forms a homodimer with two-fold symmetry in a head-to-tail manner. Particularly, the SLR motif 9 plays an important role in dimer formation by adopting a domain-swapped structure and providing an extensive dimeric interface. We identified that the full-length SEL1L forms a self-oligomer through the SEL1Lcent domain in mammalian cells. Furthermore, we discovered that the SLR-C, comprising SLR motifs 10 and 11, of SEL1L directly interacts with the N-terminus luminal loops of HRD1. Therefore, we propose that certain SLR motifs of SEL1L play a unique role in membrane bound ERAD machinery. PMID:27064360

  5. Roles of conserved proline and glycosyltransferase motifs of EmbC in biosynthesis of lipoarabinomannan.

    PubMed

    Berg, Stefan; Starbuck, James; Torrelles, Jordi B; Vissa, Varalakshmi D; Crick, Dean C; Chatterjee, Delphi; Brennan, Patrick J

    2005-02-18

    D-Arabinans, composed of D-arabinofuranose (D-Araf), dominate the structure of mycobacterial cell walls in two settings, as part of lipoarabinomannan (LAM) and arabinogalactan, each with markedly different structures and functions. Little is known of the complexity of their biosynthesis. beta-D-Arabinofuranosyl-1-monophosphoryldecaprenol is the only known sugar donor. EmbA, EmbB, and EmbC, products of the paralogous genes embA, embB, and embC, the sites of resistance to the anti-tuberculosis drug ethambutol (EMB), are the only known implicated enzymes. EmbA and -B apparently contribute to the synthesis of arabinogalactan, whereas EmbC is reserved for the synthesis of LAM. The Emb proteins show no overall similarity to any known proteins beyond Mycobacterium and related genera. However, functional motifs, equivalent to a proline-rich motif of several bacterial polysaccharide co-polymerases and a superfamily of glycosyltransferases, were found. Site-directed mutagenesis in glycosyltransferase superfamily C resulted in complete ablation of LAM synthesis. Point mutations in three amino acids of the proline motif of EmbC resulted in marked reduction of LAM-arabinan synthesis and accumulation of an unknown intermediate and of the known precursor lipomannan. Yet the pattern of the differently linked D-Araf units observed in wild type LAM-arabinan was largely retained in the proline motif mutants. The results allow for the presentation of a unique model of arabinan synthesis. PMID:15546869

  6. A Conserved Upstream Motif Orchestrates Autonomous, Germline-Enriched Expression of Caenorhabditis elegans piRNAs

    PubMed Central

    Day, Amanda M.; Chun, Sang Young; Khivansara, Vishal; Kim, John K.

    2013-01-01

    Piwi-interacting RNAs (piRNAs) fulfill a critical, conserved role in defending the genome against foreign genetic elements. In many organisms, piRNAs appear to be derived from processing of a long, polycistronic RNA precursor. Here, we establish that each Caenorhabditis elegans piRNA represents a tiny, autonomous transcriptional unit. Remarkably, the minimal C. elegans piRNA cassette requires only a 21 nucleotide (nt) piRNA sequence and an ∼50 nt upstream motif with limited genomic context for expression. Combining computational analyses with a novel, in vivo transgenic system, we demonstrate that this upstream motif is necessary for independent expression of a germline-enriched, Piwi-dependent piRNA. We further show that a single nucleotide position within this motif directs differential germline enrichment. Accordingly, over 70% of C. elegans piRNAs are selectively expressed in male or female germline, and comparison of the genes they target suggests that these two populations have evolved independently. Together, our results indicate that C. elegans piRNA upstream motifs act as independent promoters to specify which sequences are expressed as piRNAs, how abundantly they are expressed, and in what germline. As the genome encodes well over 15,000 unique piRNA sequences, our study reveals that the number of transcriptional units encoding piRNAs rivals the number of mRNA coding genes in the C. elegans genome. PMID:23516384

  7. ELM: the status of the 2010 eukaryotic linear motif resource.

    PubMed

    Gould, Cathryn M; Diella, Francesca; Via, Allegra; Puntervoll, Pål; Gemünd, Christine; Chabanis-Davidson, Sophie; Michael, Sushama; Sayadi, Ahmed; Bryne, Jan Christian; Chica, Claudia; Seiler, Markus; Davey, Norman E; Haslam, Niall; Weatheritt, Robert J; Budd, Aidan; Hughes, Tim; Pas, Jakub; Rychlewski, Leszek; Travé, Gilles; Aasland, Rein; Helmer-Citterich, Manuela; Linding, Rune; Gibson, Toby J

    2010-01-01

    Linear motifs are short segments of multidomain proteins that provide regulatory functions independently of protein tertiary structure. Much of intracellular signalling passes through protein modifications at linear motifs. Many thousands of linear motif instances, most notably phosphorylation sites, have now been reported. Although clearly very abundant, linear motifs are difficult to predict de novo in protein sequences due to the difficulty of obtaining robust statistical assessments. The ELM resource at http://elm.eu.org/ provides an expanding knowledge base, currently covering 146 known motifs, with annotation that includes >1300 experimentally reported instances. ELM is also an exploratory tool for suggesting new candidates of known linear motifs in proteins of interest. Information about protein domains, protein structure and native disorder, cellular and taxonomic contexts is used to reduce or deprecate false positive matches. Results are graphically displayed in a 'Bar Code' format, which also displays known instances from homologous proteins through a novel 'Instance Mapper' protocol based on PHI-BLAST. ELM server output provides links to the ELM annotation as well as to a number of remote resources. Using the links, researchers can explore the motifs, proteins, complex structures and associated literature to evaluate whether candidate motifs might be worth experimental investigation. PMID:19920119

  8. DETAIL VIEW, MAIN ENTRANCE GATES, SHOWING A WINGED HOURGLASS MOTIF, ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    DETAIL VIEW, MAIN ENTRANCE GATES, SHOWING A WINGED HOURGLASS MOTIF, WHICH REFERS TO THE QUICK PASSAGE OF TIME AND THE SHORTNESS OF HUMAN LIFE. USE OF THIS MOTIF WAS A CARRYOVER FROM THE MCARTHUR GATES. - Woodlands Cemetery, 4000 Woodlands Avenue, Philadelphia, Philadelphia County, PA

  9. Role of GxxxG Motifs in Transmembrane Domain Interactions.

    PubMed

    Teese, Mark G; Langosch, Dieter

    2015-08-25

    Transmembrane (TM) helices of integral membrane proteins can facilitate strong and specific noncovalent protein-protein interactions. Mutagenesis and structural analyses have revealed numerous examples in which the interaction between TM helices of single-pass membrane proteins is dependent on a GxxxG or (small)xxx(small) motif. It is therefore tempting to use the presence of these simple motifs as an indicator of TM helix interactions. In this Current Topic review, we point out that these motifs are quite common, with more than 50% of single-pass TM domains containing a (small)xxx(small) motif. However, the actual interaction strength of motif-containing helices depends strongly on sequence context and membrane properties. In addition, recent studies have revealed several GxxxG-containing TM domains that interact via alternative interfaces involving hydrophobic, polar, aromatic, or even ionizable residues that do not form recognizable motifs. In multipass membrane proteins, GxxxG motifs can be important for protein folding, and not just oligomerization. Our current knowledge thus suggests that the presence of a GxxxG motif alone is a weak predictor of protein dimerization in the membrane. PMID:26244771
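
    Scanning a transmembrane segment for the two sequence patterns named in this abstract can be sketched with regular expressions. The set of "small" residues (Gly, Ala, Ser) is an assumption made here for illustration, since the abstract does not define it, and the example sequence below is invented rather than taken from a real protein.

```python
import re

# Assumption: which residues count as "small" is not defined in the
# abstract; Gly, Ala and Ser are used here only for illustration.
SMALL = "GAS"

# Lookahead patterns so that overlapping matches are all reported.
GXXXG = re.compile(r"(?=(G...G))")
SMALL_XXX_SMALL = re.compile(rf"(?=([{SMALL}]...[{SMALL}]))")


def find_motifs(tm_seq, pattern):
    """Return (start index, matched 5-mer) for every occurrence of pattern."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(tm_seq)]


# Toy transmembrane-like sequence (hypothetical).
toy = "LIGVLAGVIGTLLS"
gxxxg_hits = find_motifs(toy, GXXXG)
small_hits = find_motifs(toy, SMALL_XXX_SMALL)
```

    As the abstract cautions, a hit from such a scan is at best a weak indicator of helix-helix interaction, because the patterns are common and interaction strength depends on sequence context and membrane properties.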

  10. Aztec, Incan and Mayan Motifs...Lead to Distinctive Designs.

    ERIC Educational Resources Information Center

    Shields, Joanne

    2001-01-01

    Describes an art project for seventh-grade students in which they choose motifs based on Incan, Aztec, and Mayan Indian materials to incorporate into two-dimensional designs. Explains that the activity objective is to create a unified, balanced and pleasing composition using a minimum of three motifs. (CMK)

  11. The phenomenon of astral motifs on late mediaeval tombstones

    NASA Astrophysics Data System (ADS)

    Mijatović, V.; Ninković, S.; Vemić, D.

    2003-10-01

    The authors study astral motifs present on some mediaeval tombstones found in present-day Serbia and Montenegro and in the neighbouring countries (especially in Bosnia and Herzegovina). The authors discern some important astral motifs, explain them and present a short review concerning their frequency.

  12. Identifying novel sequence variants of RNA 3D motifs

    PubMed Central

    Zirbel, Craig L.; Roll, James; Sweeney, Blake A.; Petrov, Anton I.; Pirrung, Meg; Leontis, Neocles B.

    2015-01-01

    Predicting RNA 3D structure from sequence is a major challenge in biophysics. An important sub-goal is accurately identifying recurrent 3D motifs from RNA internal and hairpin loop sequences extracted from secondary structure (2D) diagrams. We have developed and validated new probabilistic models for 3D motif sequences based on hybrid Stochastic Context-Free Grammars and Markov Random Fields (SCFG/MRF). The SCFG/MRF models are constructed using atomic-resolution RNA 3D structures. To parameterize each model, we use all instances of each motif found in the RNA 3D Motif Atlas and annotations of pairwise nucleotide interactions generated by the FR3D software. Isostericity relations between non-Watson–Crick basepairs are used in scoring sequence variants. SCFG techniques model nested pairs and insertions, while MRF ideas handle crossing interactions and base triples. We use test sets of randomly-generated sequences to set acceptance and rejection thresholds for each motif group and thus control the false positive rate. Validation was carried out by comparing results for four motif groups to RMDetect. The software developed for sequence scoring (JAR3D) is structured to automatically incorporate new motifs as they accumulate in the RNA 3D Motif Atlas when new structures are solved and is available free for download. PMID:26130723

  13. Automated discovery of active motifs in multiple RNA secondary structures

    SciTech Connect

    Wang, J.T.L.; Chang, Chia-Yo; Shapiro, B.A.

    1996-12-31

    In this paper we present a method for discovering approximately common motifs (also known as active motifs) in multiple RNA secondary structures. The secondary structures can be represented as ordered trees (i.e., the order among siblings matters). Motifs in these trees are connected subgraphs that can differ in both substitutions and deletions/insertions. The proposed method consists of two steps: (1) find candidate motifs in a small sample of the secondary structures; (2) search all of the secondary structures to determine how frequently these motifs occur (within the allowed approximation) in the secondary structures. To reduce the running time, we develop two optimization heuristics based on sampling and pattern matching techniques. Experimental results obtained by running these algorithms on both generated data and RNA secondary structures show the good performance of the algorithms. To demonstrate the utility of our algorithms, we discuss their applications to conducting the phylogenetic study of RNA sequences obtained from GenBank.

  14. Mapping cis-Regulatory Domains in the Human Genome Using Multi-Species Conservation of Synteny

    SciTech Connect

    Ahituv, Nadav; Prabhakar, Shyam; Poulin, Francis; Rubin, Edward M.; Couronne, Olivier

    2005-06-13

    Our inability to associate distant regulatory elements with the genes that they regulate has largely precluded their examination for sequence alterations contributing to human disease. One major obstacle is the large genomic space surrounding targeted genes in which such elements could potentially reside. In order to delineate gene regulatory boundaries we used whole-genome human-mouse-chicken (HMC) and human-mouse-frog (HMF) multiple alignments to compile conserved blocks of synteny (CBS), under the hypothesis that these blocks have been kept intact throughout evolution at least in part by the requirement of regulatory elements to stay linked to the genes that they regulate. A total of 2,116 and 1,942 CBS >200 kb were assembled for HMC and HMF, respectively, encompassing 1.53 and 0.86 Gb of human sequence. To support the existence of complex long-range regulatory domains within these CBS we analyzed the prevalence and distribution of chromosomal aberrations leading to position effects (disruption of a gene's regulatory environment), observing a clear bias not only for mapping onto CBS but also for longer CBS size. Our results provide a genome-wide data set characterizing the regulatory domains of genes and the conserved regulatory elements within them.

  15. Functional conservation of cis-regulatory elements of heat-shock genes over long evolutionary distances.

    PubMed

    He, Zhengying; Eichel, Kelsie; Ruvinsky, Ilya

    2011-01-01

    Transcriptional control of gene regulation is an intricate process that requires precise orchestration of a number of molecular components. Studying its evolution can serve as a useful model for understanding how complex molecular machines evolve. One way to investigate evolution of transcriptional regulation is to test the functions of cis-elements from one species in a distant relative. Previous results suggested that few, if any, tissue-specific promoters from Drosophila are faithfully expressed in C. elegans. Here we show that, in contrast, promoters of fly and human heat-shock genes are upregulated in C. elegans upon exposure to heat. Inducibility under conditions of heat shock may represent a relatively simple "on-off" response, whereas complex expression patterns require integration of multiple signals. Our results suggest that simpler aspects of regulatory logic may be retained over longer periods of evolutionary time, while more complex ones may be diverging more rapidly.

  16. Making connections: insulators organize eukaryotic chromosomes into independent cis-regulatory networks.

    PubMed

    Chetverina, Darya; Aoki, Tsutomu; Erokhin, Maksim; Georgiev, Pavel; Schedl, Paul

    2014-02-01

    Insulators play a central role in subdividing the chromosome into a series of discrete topologically independent domains and in ensuring that enhancers and silencers contact their appropriate target genes. In this review we first discuss the general characteristics of insulator elements and their associated protein factors. A growing collection of insulator proteins have been identified including a family of proteins whose expression is developmentally regulated. We next consider several unexpected discoveries that require us to completely rethink how insulators function (and how they can best be assayed). These discoveries also require a reevaluation of how insulators might restrict or orchestrate (by preventing or promoting) interactions between regulatory elements and their target genes. We conclude by connecting these new insights into the mechanisms of insulator action to dynamic changes in the three-dimensional topology of the chromatin fiber and the generation of specific patterns of gene activity during development and differentiation.

  17. Long-range evolutionary constraints reveal cis-regulatory interactions on the human X chromosome

    PubMed Central

    Naville, Magali; Ishibashi, Minaka; Ferg, Marco; Bengani, Hemant; Rinkwitz, Silke; Krecsmarik, Monika; Hawkins, Thomas A.; Wilson, Stephen W.; Manning, Elizabeth; Chilamakuri, Chandra S. R.; Wilson, David I.; Louis, Alexandra; Lucy Raymond, F.; Rastegar, Sepand; Strähle, Uwe; Lenhard, Boris; Bally-Cuif, Laure; van Heyningen, Veronica; FitzPatrick, David R.; Becker, Thomas S.; Roest Crollius, Hugues

    2015-01-01

    Enhancers can regulate the transcription of genes over long genomic distances. This is thought to lead to selection against genomic rearrangements within such regions that may disrupt this functional linkage. Here we test this concept experimentally using the human X chromosome. We describe a scoring method to identify evolutionary maintenance of linkage between conserved noncoding elements and neighbouring genes. Chromatin marks associated with enhancer function are strongly correlated with this linkage score. We test >1,000 putative enhancers by transgenesis assays in zebrafish to ascertain the identity of the target gene. The majority of active enhancers drive a transgenic expression in a pattern consistent with the known expression of a linked gene. These results show that evolutionary maintenance of linkage is a reliable predictor of an enhancer's function, and provide new information to discover the genetic basis of diseases caused by the mis-regulation of gene expression. PMID:25908307

  18. A cis-regulatory mutation of PDSS2 causes silky-feather in chickens.

    PubMed

    Feng, Chungang; Gao, Yu; Dorshorst, Ben; Song, Chi; Gu, Xiaorong; Li, Qingyuan; Li, Jinxiu; Liu, Tongxin; Rubin, Carl-Johan; Zhao, Yiqiang; Wang, Yanqiang; Fei, Jing; Li, Huifang; Chen, Kuanwei; Qu, Hao; Shu, Dingming; Ashwell, Chris; Da, Yang; Andersson, Leif; Hu, Xiaoxiang; Li, Ning

    2014-08-01

    Silky-feather has been selected and fixed in some breeds due to its unique appearance. This phenotype is caused by a single recessive gene (hookless, h). Here we map the silky-feather locus to chromosome 3 by linkage analysis and subsequently fine-map it to an 18.9 kb interval using the identical by descent (IBD) method. Further analysis reveals that a C to G transversion located upstream of the prenyl (decaprenyl) diphosphate synthase, subunit 2 (PDSS2) gene is causing silky-feather. All silky-feather birds are homozygous for the G allele. The silky-feather mutation significantly decreases the expression of PDSS2 during feather development in vivo. Consistent with the regulatory effect, the C to G transversion is shown to remarkably reduce PDSS2 promoter activity in vitro. We report a new example of feather structure variation associated with a spontaneous mutation and provide new insight into the PDSS2 function.

  19. A cis-Regulatory Mutation of PDSS2 Causes Silky-Feather in Chickens

    PubMed Central

    Feng, Chungang; Gao, Yu; Dorshorst, Ben; Song, Chi; Gu, Xiaorong; Li, Qingyuan; Li, Jinxiu; Liu, Tongxin; Rubin, Carl-Johan; Zhao, Yiqiang; Wang, Yanqiang; Fei, Jing; Li, Huifang; Chen, Kuanwei; Qu, Hao; Shu, Dingming; Ashwell, Chris; Da, Yang; Andersson, Leif; Hu, Xiaoxiang; Li, Ning

    2014-01-01

    Silky-feather has been selected and fixed in some breeds due to its unique appearance. This phenotype is caused by a single recessive gene (hookless, h). Here we map the silky-feather locus to chromosome 3 by linkage analysis and subsequently fine-map it to an 18.9 kb interval using the identical by descent (IBD) method. Further analysis reveals that a C to G transversion located upstream of the prenyl (decaprenyl) diphosphate synthase, subunit 2 (PDSS2) gene is causing silky-feather. All silky-feather birds are homozygous for the G allele. The silky-feather mutation significantly decreases the expression of PDSS2 during feather development in vivo. Consistent with the regulatory effect, the C to G transversion is shown to remarkably reduce PDSS2 promoter activity in vitro. We report a new example of feather structure variation associated with a spontaneous mutation and provide new insight into the PDSS2 function. PMID:25166907

  20. Characterization of cis-regulatory elements (cRE) associated with mammary gland function

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The Bos taurus genome assembly has propelled dairy science into a new era; still, most of the information encoded in the genome has not yet been decoded. The human Encyclopedia of DNA Elements (ENCODE) project has spearheaded the identification and annotation of functional genomic elements in the hu...

  1. New cis-regulatory elements in the Rht-D1b locus region of wheat

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Fifteen gene-containing BACs with an accumulated length of 1.82 Mb from the Rht-D1b locus region were sequenced and compared in detail with the orthologous regions of rice, sorghum, and maize. Our results show that Rht-D1b represents a conserved genomic region as implied by high gene sequence identity...

  2. PAZAR: a framework for collection and dissemination of cis-regulatory sequence annotation

    PubMed Central

    Portales-Casamar, Elodie; Kirov, Stefan; Lim, Jonathan; Lithwick, Stuart; Swanson, Magdalena I; Ticoll, Amy; Snoddy, Jay; Wasserman, Wyeth W

    2007-01-01

    PAZAR is an open-access and open-source database of transcription factor and regulatory sequence annotation with associated web interface and programming tools for data submission and extraction. Curated boutique data collections can be maintained and disseminated through the unified schema of the mall-like PAZAR repository. The Pleiades Promoter Project collection of brain-linked regulatory sequences is introduced to demonstrate the depth of annotation possible within PAZAR. PAZAR, located at , is open for business. PMID:17916232

  3. Dissection of transcriptional and cis-regulatory control of differentiation in human pancreatic cancer.

    PubMed

    Diaferia, Giuseppe R; Balestrieri, Chiara; Prosperini, Elena; Nicoli, Paola; Spaggiari, Paola; Zerbi, Alessandro; Natoli, Gioacchino

    2016-03-15

    The histological grade of carcinomas describes the ability of tumor cells to organize in differentiated epithelial structures and has prognostic and therapeutic impact. Here, we show that differential usage of the genomic repertoire of transcriptional enhancers leads to grade-specific gene expression programs in human pancreatic ductal adenocarcinoma (PDAC). By integrating gene expression profiling, epigenomic footprinting, and loss-of-function experiments in PDAC cell lines of different grade, we identified the repertoires of enhancers specific to high- and low-grade PDACs and the cognate set of transcription factors acting to maintain their activity. Among the candidate regulators of PDAC differentiation, KLF5 was selectively expressed in pre-neoplastic lesions and low-grade primary PDACs and cell lines, where it maintained the acetylation of grade-specific enhancers, the expression of epithelial genes such as keratins and mucins, and the ability to organize glandular epithelia in xenografts. The identification of the transcription factors controlling differentiation in PDACs will help clarify the molecular bases of its heterogeneity and progression. PMID:26769127

  4. Cis-regulatory programs in the development and evolution of vertebrate paired appendages.

    PubMed

    Gehrke, Andrew R; Shubin, Neil H

    2016-09-01

    Differential gene expression is the core of development, mediating the genetic changes necessary for determining cell identity. The regulation of gene activity by cis-acting elements (e.g., enhancers) is a crucial mechanism for determining differential gene activity by precise control of gene expression in embryonic space and time. Modifications to regulatory regions can have profound impacts on phenotype, and therefore developmental and evolutionary biologists have increasingly focused on elucidating the transcriptional control of genes that build and pattern body plans. Here, we trace the evolutionary history of transcriptional control of three loci key to vertebrate appendage development (Fgf8, Shh, and HoxD/A). Within and across these regulatory modules, we find both complex and flexible regulation in contrast with more fixed enhancers that appear unchanged over vast timescales of vertebrate evolution. The transcriptional control of vertebrate appendage development was likely already incredibly complex in the common ancestor of fish, implying that subtle changes to regulatory networks were more likely responsible for alterations in phenotype rather than the de novo addition of whole regulatory domains. Finally, we discuss the dangers of relying on inter-species transgenesis when testing enhancer function, and call for more controlled regulatory swap experiments when inferring the evolutionary history of enhancer elements. PMID:26783722

  5. Web-based identification of evolutionary conserved DNA cis-regulatory elements.

    PubMed

    Benos, Panayiotis V; Corcoran, David L; Feingold, Eleanor

    2007-01-01

    Transcription regulation on a gene-by-gene basis is achieved through transcription factors, the DNA-binding proteins that recognize short DNA sequences in the proximity of the genes. Unlike other DNA-binding proteins, each transcription factor recognizes a number of sequences, usually variants of a preferred, "consensus" sequence. The degree of dissimilarity of a given target sequence from the consensus is indicative of the binding affinity of the transcription factor-DNA interaction. Because of the short size and the degeneracy of the patterns, it is frequently difficult for a computational algorithm to distinguish between the true sites and the background genomic "noise." One way to overcome this problem of low signal-to-noise ratio is to use evolutionary information to detect signals that are conserved in two or more species. FOOTER is an algorithm that uses this phylogenetic footprinting concept and evaluates putative mammalian transcription factor binding sites in a quantitative way. The user is asked to upload the human and mouse promoter sequences and select the transcription factors to be analyzed. The results page presents an alignment of the two sequences (color-coded by degree of conservation) and information about the predicted sites and single-nucleotide polymorphisms found around the predicted sites. This chapter presents the main aspects of the underlying method and gives detailed instructions and tips on the use of this web-based tool.
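
    The phylogenetic footprinting idea in this abstract (keep a candidate site only if it resembles the consensus in both species) can be sketched as follows. This is an illustration under stated assumptions, not the FOOTER algorithm: a plain mismatch count stands in for a quantitative binding score, the two promoters are assumed to be already aligned and of equal length, and the consensus and toy sequences are made up for the example.

```python
def mismatches(site, consensus):
    """Number of positions at which a site differs from the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have equal length")
    return sum(a != b for a, b in zip(site, consensus))


def conserved_sites(human, mouse, consensus, max_mismatches=1):
    """Scan two pre-aligned sequences and keep windows that are close to
    the consensus in BOTH species (gapped windows are skipped)."""
    if len(human) != len(mouse):
        raise ValueError("aligned sequences must have equal length")
    width = len(consensus)
    hits = []
    for i in range(len(human) - width + 1):
        h, m = human[i:i + width], mouse[i:i + width]
        if "-" in h or "-" in m:
            continue
        dh, dm = mismatches(h, consensus), mismatches(m, consensus)
        if dh <= max_mismatches and dm <= max_mismatches:
            hits.append((i, h, m, dh, dm))
    return hits


# Toy aligned promoters (hypothetical sequences, for illustration only).
consensus = "TGACGTCA"
human = "AAA" + "TGACGTCA" + "TTTGG" + "TGACGTAA" + "CC"
mouse = "AAG" + "TGACGTCA" + "CTTGG" + "CCACGGAA" + "CC"

hits = conserved_sites(human, mouse, consensus)
print(hits)  # [(3, 'TGACGTCA', 'TGACGTCA', 0, 0)]
```

    The second human window (`TGACGTAA`, one mismatch) would pass a single-species scan but is dropped because the aligned mouse window is not conserved, which is the signal-to-noise gain the abstract describes.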

  6. De Novo Regulatory Motif Discovery Identifies Significant Motifs in Promoters of Five Classes of Plant Dehydrin Genes

    PubMed Central

    Zolotarov, Yevgen; Strömvik, Martina

    2015-01-01

    Plants accumulate dehydrins in response to osmotic stresses. Dehydrins are divided into five different classes, which are thought to be regulated in different manners. To better understand differences in transcriptional regulation of the five dehydrin classes, de novo motif discovery was performed on 350 dehydrin promoter sequences from a total of 51 plant genomes. Overrepresented motifs were identified in the promoters of five dehydrin classes. The Kn dehydrin promoters contain motifs linked with meristem specific expression, as well as motifs linked with cold/dehydration and abscisic acid response. KS dehydrin promoters contain a motif with a GATA core. SKn and YnSKn dehydrin promoters contain motifs that match elements connected with cold/dehydration, abscisic acid and light response. YnKn dehydrin promoters contain motifs that match abscisic acid and light response elements, but not cold/dehydration response elements. Conserved promoter motifs are present in the dehydrin classes and across different plant lineages, indicating that dehydrin gene regulation is likely also conserved. PMID:26114291

  7. Tripartite motif 32 prevents pathological cardiac hypertrophy

    PubMed Central

    Huang, Jia; Ji, Yanxiao; Zhang, Xiaojing; Wang, Pixiao; Deng, Keqiong; Jiang, Xi; Ma, Genshan

    2016-01-01

    TRIM32 (tripartite motif 32) is widely accepted to be an E3 ligase that interacts with and eventually ubiquitylates multiple substrates. TRIM32 mutants have been associated with LGMD-2H (limb girdle muscular dystrophy 2H). However, whether TRIM32 is involved in cardiac hypertrophy induced by biomechanical stresses and neurohumoral mediators remains unclear. We generated mice and isolated NRCMs (neonatal rat cardiomyocytes) that overexpressed or were deficient in TRIM32 to investigate the effect of TRIM32 on AB (aortic banding) or AngII (angiotensin II)-mediated cardiac hypertrophy. Echocardiography and both pathological and molecular analyses were used to determine the extent of cardiac hypertrophy and subsequent fibrosis. Our results showed that overexpression of TRIM32 in the heart significantly alleviated the hypertrophic response induced by pressure overload, whereas TRIM32 deficiency dramatically aggravated pathological cardiac remodelling. Similar results were also found in cultured NRCMs incubated with AngII. Mechanistically, the present study suggests that TRIM32 exerts cardioprotective action by interruption of Akt- but not MAPK (mitogen-activated protein kinase)-dependent signalling pathways. Additionally, inactivation of Akt by LY294002 offset the exacerbated hypertrophic response induced by AB in TRIM32-deficient mice. In conclusion, the present study indicates that TRIM32 plays a protective role in AB-induced pathological cardiac remodelling by blocking Akt-dependent signalling. Therefore TRIM32 could be a novel therapeutic target for the prevention of cardiac hypertrophy and heart failure. PMID:26884348

  8. A motif for infinite metal atom wires.

    PubMed

    Yin, Xi; Warren, Steven A; Pan, Yung-Tin; Tsao, Kai-Chieh; Gray, Danielle L; Bertke, Jeffery; Yang, Hong

    2014-12-15

    A new motif for infinite metal atom wires with tunable compositions and properties is developed based on the connection between metal paddlewheel and square planar complex moieties. Two infinite Pd chain compounds, [Pd4(CO)4(OAc)4Pd(acac)2] 1 and [Pd4(CO)4(TFA)4Pd(acac)2] 2, and an infinite Pd-Pt heterometallic chain compound, [Pd4(CO)4(OAc)4Pt(acac)2] 3, are identified by single-crystal X-ray diffraction analysis. In these new structures, the paddlewheel moiety is a Pd four-membered ring coordinated by bridging carboxylic ligands and μ2 carbonyl ligands. The planar moiety is either Pd(acac)2 or Pt(acac)2 (acac = acetylacetonate). These moieties are connected by metallophilic interactions. The results showed that these one-dimensional metal wire compounds have photoluminescent properties that are tunable by changing ligands and metal ions. 3 can also serve as a single source precursor for making Pd4Pt bimetallic nanostructures with precise control of metal composition.

  9. Regulatory elements of the floral homeotic gene AGAMOUS identified by phylogenetic footprinting and shadowing.

    SciTech Connect

    Hong, R. L., Hamaguchi, L., Busch, M. A., and Weigel, D.

    2003-06-01

    OAK-B135 In Arabidopsis thaliana, cis-regulatory sequences of the floral homeotic gene AGAMOUS (AG) are located in the second intron. This 3 kb intron contains binding sites for two direct activators of AG, LEAFY (LFY) and WUSCHEL (WUS), along with other putative regulatory elements. We have used phylogenetic footprinting and the related technique of phylogenetic shadowing to identify putative cis-regulatory elements in this intron. Among 29 Brassicaceae, several other motifs, but not the LFY and WUS binding sites previously identified, are largely invariant. Using reporter gene analyses, we tested six of these motifs and found that they are all functionally important for activity of AG regulatory sequences in A. thaliana. Although there is little obvious sequence similarity outside the Brassicaceae, the intron from cucumber AG has at least partial activity in A. thaliana. Our studies underscore the value of the comparative approach as a tool that complements gene-by-gene promoter dissection, but also highlight that sequence-based studies alone are insufficient for a complete identification of cis-regulatory sites.

  10. Early illness recognition using frequent motif discovery.

    PubMed

    Hajihashemi, Zahra; Popescu, Mihail

    2015-08-01

    Living alone in their own residence, older adults are at risk for late assessment of physical or cognitive changes due to many factors such as their impression that such changes are simply a normal part of aging or their reluctance to admit to a problem. This paper describes an early illness recognition framework using sensor network technology to identify the health trajectory of older adults reflected in patterns of day-to-day activities. Describing the behavior of older adults could help clinicians to identify those at the greatest risk for functional decline and adverse events. The proposed framework, denoted as Abnormal Frequent Activity Pattern (AFAP), is based on the identification of known past abnormal frequent activities in current sensor data. More specifically, AFAP declares a day abnormal when past frequent abnormal behavior patterns, not found during normal days, are discovered in the current activity data. While AFAP requires the labeling of past days as normal/abnormal, it doesn't need specific activity identification. Frequent activity patterns (FAP) are found using MEME, a bioinformatics motif detection algorithm. To validate our approach, we used data obtained from TigerPlace, an aging-in-place community situated in Columbia, MO, where apartments are equipped with sensor networks (motion, bed and depth sensors). A retrospective multiple case study (N=3) design was used to quantify the in-home older adult's daily routines, over a period of two weeks. Within-person variability of routine activities may be used as a new predictor in the study of health trajectories of older adults. PMID:26737096
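
    The AFAP decision rule stated above (flag a day when it contains a pattern that was frequent on past abnormal days and absent from past normal days) can be sketched in a few lines. The abstract says the patterns are found with MEME; the sketch below swaps in exact n-gram counting as a much simpler stand-in, and the one-letter sensor codes, `n=3`, and `min_days=2` are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(events, n):
    """All contiguous length-n runs of sensor events in one day."""
    return [tuple(events[i:i + n]) for i in range(len(events) - n + 1)]


def abnormal_patterns(labeled_days, n=3, min_days=2):
    """Patterns seen on at least min_days abnormal days and on no normal day.

    labeled_days is a list of (events, is_abnormal) pairs.
    """
    abnormal_day_counts = Counter()
    seen_on_normal_days = set()
    for events, is_abnormal in labeled_days:
        grams = set(ngrams(events, n))  # count each pattern once per day
        if is_abnormal:
            abnormal_day_counts.update(grams)
        else:
            seen_on_normal_days |= grams
    return {g for g, days in abnormal_day_counts.items()
            if days >= min_days and g not in seen_on_normal_days}


def is_abnormal_day(events, patterns, n=3):
    """Declare a day abnormal if it contains any stored abnormal pattern."""
    return any(g in patterns for g in ngrams(events, n))


# Hypothetical event codes: B = bed, K = kitchen, L = living room, T = bathroom.
history = [
    ("BKLKLB", False),
    ("BKLTLB", False),
    ("BTBTBK", True),
    ("BKBTBT", True),
]
patterns = abnormal_patterns(history)
print(sorted(patterns))
print(is_abnormal_day("BKLBTB", patterns), is_abnormal_day("BKLKLB", patterns))
```

    As in the abstract, only the normal/abnormal label of each past day is needed; the sketch never has to name the activity behind a pattern.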

  11. Targeting functional motifs of a protein family

    NASA Astrophysics Data System (ADS)

    Bhadola, Pradeep; Deo, Nivedita

    2016-10-01

    The structural organization of a protein family is investigated by devising a method based on the random matrix theory (RMT), which uses the physicochemical properties of the amino acid with multiple sequence alignment. A graphical method to represent protein sequences using physicochemical properties is devised that gives a fast, easy, and informative way of comparing the evolutionary distances between protein sequences. A correlation matrix associated with each property is calculated, where the noise reduction and information filtering are done using RMT involving an ensemble of Wishart matrices. The analysis of the eigenvalue statistics of the correlation matrix for the β-lactamase family shows the universal features as observed in the Gaussian orthogonal ensemble (GOE). The property-based approach captures the short- as well as the long-range correlation (approximately following GOE) between the eigenvalues, whereas the previous approach (treating amino acids as characters) gives the usual short-range correlations, while the long-range correlations are the same as that of an uncorrelated series. The distribution of the eigenvector components for the eigenvalues outside the bulk (RMT bound) deviates significantly from RMT observations and contains important information about the system. The information content of each eigenvector of the correlation matrix is quantified by introducing an entropic estimate, which shows that for the β-lactamase family the smallest eigenvectors (low eigenmodes) are highly localized as well as informative. These small eigenvectors, when processed, give clusters involving positions that have well-defined biological and structural importance matching with experiments. The approach is crucial for the recognition of structural motifs as shown in β-lactamase (and other families) and selectively identifies the important positions for targets to deactivate (activate) the enzymatic actions.
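
    The "RMT bound" used above to separate informative eigenvalues from the noisy bulk is, for a Wishart ensemble, commonly taken to be the Marchenko-Pastur interval. The sketch below computes that interval and filters a list of eigenvalues against it. It assumes unit-variance data, more samples than variables, and the large-matrix limit; how alignment positions and sequences map onto "variables" and "samples" in the paper is not stated in the abstract, so the argument names are hypothetical.

```python
import math


def marchenko_pastur_bounds(n_variables, n_samples, variance=1.0):
    """Edges of the Marchenko-Pastur bulk for a Wishart correlation matrix.

    lambda_minus/plus = variance * (1 -/+ sqrt(n_variables / n_samples))**2
    """
    q = n_variables / n_samples
    if not 0 < q <= 1:
        raise ValueError("expected 0 < n_variables <= n_samples")
    root = math.sqrt(q)
    return variance * (1 - root) ** 2, variance * (1 + root) ** 2


def outside_bulk(eigenvalues, n_variables, n_samples):
    """Eigenvalues below or above the random-matrix bulk."""
    low, high = marchenko_pastur_bounds(n_variables, n_samples)
    return [ev for ev in eigenvalues if ev < low or ev > high]


low, high = marchenko_pastur_bounds(100, 400)
print(low, high)  # 0.25 2.25
print(outside_bulk([0.1, 0.5, 1.0, 2.0, 3.5], 100, 400))  # [0.1, 3.5]
```

    Both ends of the spectrum can fall outside the bulk; the abstract emphasizes the small eigenvalues, whose eigenvectors it reports to be localized and informative.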

  12. Notch signaling from the endosome requires a conserved dileucine motif

    PubMed Central

    Zheng, Li; Saunders, Cosmo A.; Sorensen, Erika B.; Waxmonsky, Nicole C.; Conner, Sean D.

    2013-01-01

    Notch signaling is reliant on γ-secretase–mediated processing, although the subcellular location where γ-secretase cleaves Notch to initiate signaling remains unresolved. Accumulating evidence demonstrates that Notch signaling is modulated by endocytosis and endosomal transport. In this study, we investigated the relationship between Notch transport itinerary and signaling capacity. In doing so, we discovered a highly conserved dileucine sorting signal encoded within the cytoplasmic tail that directs Notch to the limiting membrane of the lysosome for signaling. Mutating the dileucine motif led to receptor accumulation in cation-dependent mannose-phosphate receptor–positive tubular early endosomes and a reduction in Notch signaling capacity. Moreover, truncated receptor forms that mimic activated Notch were readily cleaved by γ-secretase within the endosome; however, the cleavage product was proteasome-sensitive and failed to contribute to robust signaling. Collectively these results indicate that Notch signaling from the lysosome limiting membrane is conserved and that receptor targeting to this compartment is an active process. Moreover, the data support a model in which Notch signaling in mammalian systems is initiated from either the plasma membrane or lysosome, but not the early endosome. PMID:23171551

  13. An autoinhibited conformation of LGN reveals a distinct interaction mode between GoLoco motifs and TPR motifs.

    PubMed

    Pan, Zhu; Zhu, Jinwei; Shang, Yuan; Wei, Zhiyi; Jia, Min; Xia, Caihao; Wen, Wenyu; Wang, Wenning; Zhang, Mingjie

    2013-06-01

    LGN plays essential roles in asymmetric cell divisions via its N-terminal TPR-motif-mediated binding to mInsc and NuMA. This scaffolding activity requires the release of the autoinhibited conformation of LGN by binding of Gα(i) to its C-terminal GoLoco (GL) motifs. The interaction between the GL and TPR motifs of LGN represents a distinct GL/target binding mode with an unknown mechanism. Here, we show that two consecutive GL motifs of LGN form a minimal TPR-motif-binding unit. GL12 and GL34 bind to TPR0-3 and TPR4-7, respectively. The crystal structure of a truncated LGN reveals that GL34 forms a pair of parallel α helices and binds to the concave surface of TPR4-7, thereby preventing LGN from binding to other targets. Importantly, the GLs bind to TPR motifs with a mode distinct from that observed in the GL/Gα(i)·GDP complexes. Our results also indicate that multiple and orphan GL motif proteins likely respond to G proteins with distinct mechanisms.

  14. A Convex Atomic-Norm Approach to Multiple Sequence Alignment and Motif Discovery

    PubMed Central

    Yen, Ian E. H.; Lin, Xin; Zhang, Jiong; Ravikumar, Pradeep; Dhillon, Inderjit S.

    2016-01-01

    Multiple Sequence Alignment and Motif Discovery, both known to be NP-hard problems, are two fundamental tasks in Bioinformatics. Existing approaches to these two problems are based on either local search methods, such as Expectation Maximization (EM) and Gibbs Sampling, or greedy heuristic methods. In this work, we develop a convex relaxation approach to both problems based on the recent concept of atomic norm and develop a new algorithm, termed Greedy Direction Method of Multiplier, for solving the convex relaxation with two convex atomic constraints. Experiments show that our convex relaxation approach produces solutions of higher quality than the standard tools widely used in the Bioinformatics community on the Multiple Sequence Alignment and Motif Discovery problems. PMID:27559428

  15. Native characterization of nucleic acid motif thermodynamics via non-covalent catalysis

    PubMed Central

    Wang, Chunyan; Bae, Jin H.; Zhang, David Yu

    2016-01-01

    DNA hybridization thermodynamics is critical for accurate design of oligonucleotides for biotechnology and nanotechnology applications, but parameters currently in use are inaccurately extrapolated based on limited quantitative understanding of thermal behaviours. Here, we present a method to measure the ΔG° of DNA motifs at temperatures and buffer conditions of interest, with significantly better accuracy (6- to 14-fold lower s.e.) than prior methods. The equilibrium constant of a reaction with thermodynamics closely approximating that of a desired motif is numerically calculated from directly observed reactant and product equilibrium concentrations; a DNA catalyst is designed to accelerate equilibration. We measured the ΔG° of terminal fluorophores, single-nucleotide dangles and multinucleotide dangles, in temperatures ranging from 10 to 45 °C. PMID:26782977
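
    The calculation outlined in this abstract (an equilibrium constant from measured concentrations, then a standard free energy) rests on the textbook relation ΔG° = -RT ln K. A minimal sketch follows; the two-reactant, two-product reaction form and the concentrations are hypothetical, and nothing here reproduces the authors' catalytic design or their error analysis.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def equilibrium_constant(reactants, products):
    """K for a reaction with equal numbers of reactant and product species,
    from equilibrium concentrations given in the same units (K is then
    dimensionless)."""
    return math.prod(products) / math.prod(reactants)


def delta_g_standard(k_eq, temp_celsius):
    """Standard free energy in kcal/mol from dG = -R * T * ln(K)."""
    temp_kelvin = temp_celsius + 273.15
    return -R_KCAL * temp_kelvin * math.log(k_eq)


# Hypothetical equilibrium concentrations (molar) for A + B <=> C + D.
k_eq = equilibrium_constant(reactants=[2e-9, 2e-9], products=[8e-9, 8e-9])
print(k_eq)
print(delta_g_standard(k_eq, temp_celsius=25.0))  # about -1.64 kcal/mol
```

    Because the temperature enters explicitly, the same function can be evaluated across the 10 to 45 °C range mentioned in the abstract, provided K is measured at each temperature.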

  16. A Common Structural Motif in the Binding of Virulence Factors to Bacterial Secretion Chaperones

    SciTech Connect

    Lilic, M.; Vujanac, M.; Stebbins, C.

    2006-01-01

    Salmonella invasion protein A (SipA) is translocated into host cells by a type III secretion system (T3SS) and comprises two regions: one domain binds its cognate type III secretion chaperone, InvB, in the bacterium to facilitate translocation, while a second domain functions in the host cell, contributing to bacterial uptake by polymerizing actin. We present here the crystal structures of the SipA chaperone binding domain (CBD) alone and in complex with InvB. The SipA CBD is found to consist of a nonglobular polypeptide as well as a large globular domain, both of which are necessary for binding to InvB. We also identify a structural motif that may direct virulence factors to their cognate chaperones in a diverse range of pathogenic bacteria. Disruption of this structural motif leads to a destabilization of several chaperone-substrate complexes from different species, as well as an impairment of secretion in Salmonella.

  17. Native characterization of nucleic acid motif thermodynamics via non-covalent catalysis

    NASA Astrophysics Data System (ADS)

    Wang, Chunyan; Bae, Jin H.; Zhang, David Yu

    2016-01-01

    DNA hybridization thermodynamics is critical for accurate design of oligonucleotides for biotechnology and nanotechnology applications, but parameters currently in use are inaccurately extrapolated based on limited quantitative understanding of thermal behaviours. Here, we present a method to measure the ΔG° of DNA motifs at temperatures and buffer conditions of interest, with significantly better accuracy (6- to 14-fold lower s.e.) than prior methods. The equilibrium constant of a reaction with thermodynamics closely approximating that of a desired motif is numerically calculated from directly observed reactant and product equilibrium concentrations; a DNA catalyst is designed to accelerate equilibration. We measured the ΔG° of terminal fluorophores, single-nucleotide dangles and multinucleotide dangles, in temperatures ranging from 10 to 45 °C.

  18. Native characterization of nucleic acid motif thermodynamics via non-covalent catalysis.

    PubMed

    Wang, Chunyan; Bae, Jin H; Zhang, David Yu

    2016-01-01

    DNA hybridization thermodynamics is critical for accurate design of oligonucleotides for biotechnology and nanotechnology applications, but parameters currently in use are inaccurately extrapolated based on limited quantitative understanding of thermal behaviours. Here, we present a method to measure the ΔG° of DNA motifs at temperatures and buffer conditions of interest, with significantly better accuracy (6- to 14-fold lower s.e.) than prior methods. The equilibrium constant of a reaction with thermodynamics closely approximating that of a desired motif is numerically calculated from directly observed reactant and product equilibrium concentrations; a DNA catalyst is designed to accelerate equilibration. We measured the ΔG° of terminal fluorophores, single-nucleotide dangles and multinucleotide dangles, in temperatures ranging from 10 to 45 °C.

  19. A million peptide motifs for the molecular biologist.

    PubMed

    Tompa, Peter; Davey, Norman E; Gibson, Toby J; Babu, M Madan

    2014-07-17

    A molecular description of functional modules in the cell is the focus of many high-throughput studies in the postgenomic era. A large portion of biomolecular interactions in virtually all cellular processes is mediated by compact interaction modules, referred to as peptide motifs. Such motifs are typically less than ten residues in length, occur within intrinsically disordered regions, and are recognized and/or posttranslationally modified by structured domains of the interacting partner. In this review, we suggest that there might be over a million instances of peptide motifs in the human proteome. While this staggering number suggests that peptide motifs are numerous and the most understudied functional module in the cell, it also holds great opportunities for new discoveries. PMID:25038412

  20. Local graph alignment and motif search in biological networks

    NASA Astrophysics Data System (ADS)

    Berg, Johannes; Lässig, Michael

    2004-10-01

    Interaction networks are of central importance in postgenomic molecular biology, with increasing amounts of data becoming available by high-throughput methods. Examples are gene regulatory networks or protein interaction maps. The main challenge in the analysis of these data is to read off biological functions from the topology of the network. Topological motifs, i.e., patterns occurring repeatedly at different positions in the network, have recently been identified as basic modules of molecular information processing. In this article, we discuss motifs derived from families of mutually similar but not necessarily identical patterns. We establish a statistical model for the occurrence of such motifs, from which we derive a scoring function for their statistical significance. Based on this scoring function, we develop a search algorithm for topological motifs called graph alignment, a procedure with some analogies to sequence alignment. The algorithm is applied to the gene regulation network of Escherichia coli.

  1. DETAIL OF CORNICE MOULDING WITH RAM'S HEAD MOTIF. EIGHT SHADES ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    DETAIL OF CORNICE MOULDING WITH RAM'S HEAD MOTIF. EIGHT SHADES OF GOLD LEAF AND BURNISHED GOLD LEAF WERE USED FOR THE INTERIOR FINISHES. - Anaconda Historic District, Washoe Theater, 305 Main Street, Anaconda, Deer Lodge County, MT

  2. 10. DETAIL OF CORNICE MOULDING WITH RAM'S HEAD MOTIF. EIGHT ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    10. DETAIL OF CORNICE MOULDING WITH RAM'S HEAD MOTIF. EIGHT SHADES OF GOLD LEAF AND BURNISHED GOLD LEAF WERE USED FOR THE INTERIOR FINISHES - Anaconda Historic District, Washoe Theater, 305 Main Street, Anaconda, Deer Lodge County, MT

  3. Transmembrane helix dimerization: beyond the search for sequence motifs.

    PubMed

    Li, Edwin; Wimley, William C; Hristova, Kalina

    2012-02-01

    Studies of the dimerization of transmembrane (TM) helices have been ongoing for many years now, and have provided clues to the fundamental principles behind membrane protein (MP) folding. Our understanding of TM helix dimerization has been dominated by the idea that sequence motifs, simple recognizable amino acid sequences that drive lateral interaction, can be used to explain and predict the lateral interactions between TM helices in membrane proteins. But as more and more unique interacting helices are characterized, it is becoming clear that the sequence motif paradigm is incomplete. Experimental evidence suggests that the search for sequence motifs, as mediators of TM helix dimerization, cannot solve the membrane protein folding problem alone. Here we review the current understanding in the field, as it has evolved from the paradigm of sequence motifs into a view in which the interactions between TM helices are much more complex. This article is part of a Special Issue entitled: Membrane protein structure and function.

  4. Macrocyclization of the ATCUN Motif Controls Metal Binding and Catalysis

    PubMed Central

    Neupane, Kosh P.; Aldous, Amanda R.; Kritzer, Joshua A.

    2013-01-01

    We report the design, synthesis and characterization of macrocyclic analogs of the amino-terminal copper and nickel binding (ATCUN) motif. These macrocycles have altered pH transitions for metal binding, and unlike linear ATCUN motifs, the optimal cyclic peptide 1 binds Cu(II) selectively over Ni(II) at physiological pH. UV-vis and EPR spectroscopy showed that cyclic peptide 1 can coordinate Cu(II) or Ni(II) in a square planar geometry. Metal binding titration and ESI-MS data revealed a 1:1 binding stoichiometry. Macrocyclization allows for coordination of Cu(II) or Ni(II) as in linear ATCUN motifs, but with enhanced DNA cleavage by the Cu(II)-1 complex relative to linear analogs. The Cu(II)-1 complex was also capable of producing diffusible hydroxyl radicals, which is unique among ATCUN motifs and most other common copper(II) chelators. PMID:23421754

  5. Motif-Synchronization: A new method for analysis of dynamic brain networks with EEG

    NASA Astrophysics Data System (ADS)

    Rosário, R. S.; Cardoso, P. T.; Muñoz, M. A.; Montoya, P.; Miranda, J. G. V.

    2015-12-01

    The major aim of this work was to propose a new association method known as Motif-Synchronization. This method was developed to provide information about the synchronization degree and direction between two nodes of a network by counting the number of occurrences of some patterns between any two time series. The second objective of this work was to present a new methodology for the analysis of dynamic brain networks, by combining the Time-Varying Graph (TVG) method with a directional association method. We further applied the new algorithms to a set of human electroencephalogram (EEG) signals to perform a dynamic analysis of the brain functional networks (BFN).

  6. Acetylation of the KXGS motifs in tau is a critical determinant in modulation of tau aggregation and clearance

    PubMed Central

    Cook, Casey; Carlomagno, Yari; Gendron, Tania F.; Dunmore, Judy; Scheffel, Kristyn; Stetler, Caroline; Davis, Mary; Dickson, Dennis; Jarpe, Matthew; DeTure, Michael; Petrucelli, Leonard

    2014-01-01

    The accumulation of hyperphosphorylated tau in neurofibrillary tangles (NFTs) is a neuropathological hallmark of tauopathies, including Alzheimer's disease (AD) and chronic traumatic encephalopathy, but effective therapies directly targeting the tau protein are currently lacking. Herein, we describe a novel mechanism in which the acetylation of tau on KXGS motifs inhibits phosphorylation on this same motif, and also prevents tau aggregation. Using a site-specific antibody to detect acetylation of KXGS motifs, we demonstrate that these sites are hypoacetylated in patients with AD, as well as a mouse model of tauopathy, suggesting that loss of acetylation on KXGS motifs renders tau vulnerable to pathogenic insults. Furthermore, we identify histone deacetylase 6 (HDAC6) as the enzyme responsible for the deacetylation of these residues, and provide proof of concept that acute treatment with a selective and blood–brain barrier-permeable HDAC6 inhibitor enhances acetylation and decreases phosphorylation on tau's KXGS motifs in vivo. As such, we have uncovered a novel therapeutic pathway that can be manipulated to block the formation of pathogenic tau species in disease. PMID:23962722

  7. S6:S18 ribosomal protein complex interacts with a structural motif present in its own mRNA

    PubMed Central

    Matelska, Dorota; Purta, Elzbieta; Panek, Sylwia; Boniecki, Michal J.; Bujnicki, Janusz M.; Dunin-Horkawicz, Stanislaw

    2013-01-01

    Prokaryotic ribosomal protein genes are typically grouped within highly conserved operons. In many cases, one or more of the encoded proteins not only bind to a specific site in the ribosomal RNA, but also to a motif localized within their own mRNA, and thereby regulate expression of the operon. In this study, we computationally predicted an RNA motif present in many bacterial phyla within the 5′ untranslated region of operons encoding ribosomal proteins S6 and S18. We demonstrated that the S6:S18 complex binds to this motif, which we hereafter refer to as the S6:S18 complex-binding motif (S6S18CBM). This motif is a conserved CCG sequence presented in a bulge flanked by a stem and a hairpin structure. A similar structure containing a CCG trinucleotide forms the S6:S18 complex binding site in 16S ribosomal RNA. We have constructed a 3D structural model of a S6:S18 complex with S6S18CBM, which suggests that the CCG trinucleotide in a specific structural context may be specifically recognized by the S18 protein. This prediction was supported by site-directed mutagenesis of both RNA and protein components. These results provide a molecular basis for understanding protein-RNA recognition and suggest that the S6S18CBM is involved in an auto-regulatory mechanism. PMID:23980204

  8. Identification of internal transcribed spacer sequence motifs in truffles: a first step toward their DNA bar coding.

    PubMed

    El Karkouri, Khalid; Murat, Claude; Zampieri, Elisa; Bonfante, Paola

    2007-08-01

    This work presents DNA sequence motifs from the internal transcribed spacer (ITS) of the nuclear rRNA repeat unit which are useful for the identification of five European and Asiatic truffles (Tuber magnatum, T. melanosporum, T. indicum, T. aestivum, and T. mesentericum). Truffles are edible mycorrhizal ascomycetes that show similar morphological characteristics but that have distinct organoleptic and economic values. A total of 36 out of 46 ITS1 or ITS2 sequence motifs have allowed an accurate in silico distinction of the five truffles to be made (i.e., by pattern matching and/or BLAST analysis on downloaded GenBank sequences and directly against GenBank databases). The motifs considered the intraspecific genetic variability of each species, including rare haplotypes, and assigned their respective species from either the ascocarps or ectomycorrhizas. The data indicate that short ITS1 or ITS2 motifs (≤ 50 bp in size) can be considered promising tools for truffle species identification. A dot blot hybridization analysis of T. magnatum and T. melanosporum compared with other close relatives or distant lineages allowed at least one highly specific motif to be identified for each species. These results were confirmed in a blind test which included new field isolates. The current work has provided a reliable new tool for a truffle oligonucleotide bar code and identification in ecological and evolutionary studies. PMID:17601808

  9. Network motif-based method for identifying coronary artery disease

    PubMed Central

    LI, YIN; CONG, YAN; ZHAO, YUN

    2016-01-01

    The present study aimed to develop a more efficient method for identifying coronary artery disease (CAD) than the conventional method using individual differentially expressed genes (DEGs). GSE42148 gene microarray data were downloaded, preprocessed and screened for DEGs. Additionally, based on transcriptional regulation data obtained from ENCODE database and protein-protein interaction data from the HPRD, the common genes were downloaded and compared with genes annotated from gene microarrays to screen additional common genes in order to construct an integrated regulation network. FANMOD was then used to detect significant three-gene network motifs. Subsequently, GlobalAncova was used to screen differential three-gene network motifs between the CAD group and the normal control data from GSE42148. Genes involved in the differential network motifs were then subjected to functional annotation and pathway enrichment analysis. Finally, clustering analysis of the CAD and control samples was performed based on individual DEGs and the top 20 network motifs identified. In total, 9,008 significant three-node network motifs were detected from the integrated regulation network; these were categorized into 22 interaction modes, each containing a minimum of one transcription factor. Subsequently, 1,132 differential network motifs involving 697 genes were screened between the CAD and control group. The 697 genes were enriched in 154 gene ontology terms, including 119 biological processes, and 14 KEGG pathways. Identifying patients with CAD based on the top 20 network motifs provided increased accuracy compared with the conventional method based on individual DEGs. The results of the present study indicate that the network motif-based method is more efficient and accurate for identifying CAD patients than the conventional method based on individual DEGs. PMID:27347046
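
    To make the notion of a three-gene network motif concrete, the sketch below counts one common three-node pattern, the feed-forward loop, in a small directed network. It is a simplified illustration and not the FANMOD procedure used in the study, which also assesses significance against randomized networks; the function name, the choice of pattern, and the toy edge list are assumptions made here.

```python
from itertools import permutations


def count_feed_forward_loops(edges):
    """Count feed-forward loops (a->b, b->c, a->c) in a directed graph.

    edges: iterable of (source, target) pairs.
    Extra edges among the three nodes are ignored, so this counts
    non-induced occurrences of the pattern.
    """
    edge_set = set(edges)
    nodes = {node for edge in edge_set for node in edge}
    count = 0
    # The three roles (regulator, intermediate, target) are distinct,
    # so each loop corresponds to exactly one ordered triple.
    for a, b, c in permutations(nodes, 3):
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set:
            count += 1
    return count


# Hypothetical toy network: TF1 regulates TF2 and G; TF2 regulates G; G regulates H.
toy_edges = [("TF1", "TF2"), ("TF2", "G"), ("TF1", "G"), ("G", "H")]
print(count_feed_forward_loops(toy_edges))
```

    Enumerating all ordered triples is cubic in the number of nodes, which is acceptable for a toy network but not for a genome-scale regulation network; dedicated tools rely on more efficient subgraph enumeration.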

  10. Discovering Motifs in Biological Sequences Using the Micron Automata Processor.

    PubMed

    Roy, Indranil; Aluru, Srinivas

    2016-01-01

    Finding approximately conserved sequences, called motifs, across multiple DNA or protein sequences is an important problem in computational biology. In this paper, we consider the (l, d) motif search problem of identifying one or more motifs of length l present in at least q of the n given sequences, with each occurrence differing from the motif in at most d substitutions. The problem is known to be NP-complete, and the largest solved instance reported to date is (26,11). We propose a novel algorithm for the (l, d) motif search problem using streaming execution over a large set of non-deterministic finite automata (NFA). This solution is designed to take advantage of the Micron Automata Processor, a new technology close to deployment that can simultaneously execute multiple NFA in parallel. We demonstrate the capability for solving much larger instances of the (l, d) motif search problem using the resources available within a single automata processor board, by estimating run-times for problem instances (39,18) and (40,17). The paper serves as a useful guide to solving problems using this new accelerator technology. PMID:26886735
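
    The (l, d) motif search problem as stated above can be illustrated with a brute-force sketch. This is not the NFA-based algorithm proposed in the paper: it simply enumerates every candidate of length l, which is feasible only for very small l. The function names and the toy sequences are assumptions made for this example.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs(motif, seq, d):
    """True if some window of seq is within d substitutions of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def ld_motif_search(seqs, l, d, q, alphabet="ACGT"):
    """Return all length-l motifs occurring (with at most d substitutions)
    in at least q of the given sequences. Exhaustive over alphabet**l."""
    hits = []
    for candidate in product(alphabet, repeat=l):
        motif = "".join(candidate)
        if sum(occurs(motif, s, d) for s in seqs) >= q:
            hits.append(motif)
    return hits


# Hypothetical toy input: ACGT planted exactly once and twice with one substitution.
toy = ["TTACGTTT", "GGACCTGG", "CCAGGTCC"]
print("ACGT" in ld_motif_search(toy, l=4, d=1, q=3))
```

    The candidate space grows as 4^l for DNA, which is why instances such as (26,11) are hard and why the abstract turns to specialised parallel hardware.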

  11. An experimental test of a fundamental food web motif.

    PubMed

    Rip, Jason M K; McCann, Kevin S; Lynn, Denis H; Fawcett, Sonia

    2010-06-01

    Large-scale changes to the world's ecosystem are resulting in the deterioration of biostructure-the complex web of species interactions that make up ecological communities. A difficult, yet crucial task is to identify food web structures, or food web motifs, that are the building blocks of this baroque network of interactions. Once identified, these food web motifs can then be examined through experiments and theory to provide mechanistic explanations for how structure governs ecosystem stability. Here, we synthesize recent ecological research to show that generalist consumers coupling resources with different interaction strengths, is one such motif. This motif amazingly occurs across an enormous range of spatial scales, and so acts to distribute coupled weak and strong interactions throughout food webs. We then perform an experiment that illustrates the importance of this motif to ecological stability. We find that weak interactions coupled to strong interactions by generalist consumers dampen strong interaction strengths and increase community stability. This study takes a critical step by isolating a common food web motif and through clear, experimental manipulation, identifies the fundamental stabilizing consequences of this structure for ecological communities. PMID:20129988

  12. An experimental test of a fundamental food web motif

    PubMed Central

    Rip, Jason M. K.; McCann, Kevin S.; Lynn, Denis H.; Fawcett, Sonia

    2010-01-01

    Large-scale changes to the world's ecosystem are resulting in the deterioration of biostructure—the complex web of species interactions that make up ecological communities. A difficult, yet crucial task is to identify food web structures, or food web motifs, that are the building blocks of this baroque network of interactions. Once identified, these food web motifs can then be examined through experiments and theory to provide mechanistic explanations for how structure governs ecosystem stability. Here, we synthesize recent ecological research to show that generalist consumers coupling resources with different interaction strengths, is one such motif. This motif amazingly occurs across an enormous range of spatial scales, and so acts to distribute coupled weak and strong interactions throughout food webs. We then perform an experiment that illustrates the importance of this motif to ecological stability. We find that weak interactions coupled to strong interactions by generalist consumers dampen strong interaction strengths and increase community stability. This study takes a critical step by isolating a common food web motif and through clear, experimental manipulation, identifies the fundamental stabilizing consequences of this structure for ecological communities. PMID:20129988

  13. cWINNOWER Algorithm for Finding Fuzzy DNA Motifs

    NASA Technical Reports Server (NTRS)

    Liang, Shoudan

    2003-01-01

    The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if multiple mutated copies of the motif (i.e., the signals) are present in the DNA sequence in sufficient abundance. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum number of detectable motifs qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc, by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12000 for (l,d) = (15,4).

  14. Survey on the PABC recognition motif PAM2.

    PubMed

    Albrecht, Mario; Lengauer, Thomas

    2004-03-26

    The PABP-interacting motif PAM2 has been identified in various eukaryotic proteins as an important binding site for the PABC domain. This domain is contained in homologs of the poly(A)-binding protein PABP and the ubiquitin-protein ligase HYD. Despite the importance of the PAM2 motif, a comprehensive analysis of its occurrence in different proteins has been missing. Using iterated sequence profile searches, we obtained an extensive list of proteins carrying the PAM2 motif. We discuss their functional context and domain architecture, which often consists of RNA-binding domains. Our list of PAM2 motif proteins includes eukaryotic homologs of eRF3/GSPT1/2, PAIP1/2, Tob1/2, Ataxin-2, RBP37, RBP1, Blackjack, HELZ, TPRD, USP10, ERD15, C1D4.14, and the viral protease P29. The identification of the PAM2 motif in as yet uncharacterized proteins can give valuable hints with respect to their cellular function and potential interaction partners and suggests further experimentation. It is also striking that the PAM2 motif appears to occur solely outside globular protein domains.

  15. Finding specific RNA motifs: Function in a zeptomole world?

    PubMed Central

    KNIGHT, ROB; YARUS, MICHAEL

    2003-01-01

    We have developed a new method for estimating the abundance of any modular (piecewise) RNA motif within a longer random region. We have used this method to estimate the size of the active motifs available to modern SELEX experiments (picomoles of unique sequences) and to a plausible RNA World (zeptomoles of unique sequences: 1 zmole = 602 sequences). Unexpectedly, activities such as specific isoleucine binding are almost certainly present in zeptomoles of molecules, and even ribozymes such as self-cleavage motifs may appear (depending on assumptions about the minimal structures). The number of specified nucleotides is not the only important determinant of a motif’s rarity: The number of modules into which it is divided, and the details of this division, are also crucial. We propose three maxims for easily isolated motifs: the Maxim of Minimization, the Maxim of Multiplicity, and the Maxim of the Median. These maxims together state that selected motifs should be small and composed of as many separate, equally sized modules as possible. For evenly divided motifs with four modules, the largest accessible activity in picomole scale (1–1000 pmole) pools of length 100 is about 34 nucleotides; while for zeptomole scale (1–1000 zmole) pools it is about 20 specific nucleotides (50% probability of occurrence). This latter figure includes some ribozymes and aptamers. Consequently, an RNA metabolism apparently could have begun with only zeptomoles of RNA molecules. PMID:12554865
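
    The conversion quoted above (1 zmole = 602 sequences) follows directly from Avogadro's number. The short sketch below reproduces that arithmetic; the function name and the prefix table are assumptions introduced here for illustration.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)

# SI prefixes as multiples of one mole.
PREFIX = {"pmol": 1e-12, "zmol": 1e-21}


def molecule_count(amount, unit):
    """Number of molecules in `amount` of the given unit (pmol or zmol)."""
    return amount * PREFIX[unit] * AVOGADRO


print(round(molecule_count(1, "zmol")))   # molecules in one zeptomole
print(f"{molecule_count(1, 'pmol'):.3e}")  # molecules in one picomole
```

    The nine orders of magnitude between the two scales are what separate the pool sizes of modern SELEX experiments from those of the RNA World scenario discussed in the abstract.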

  16. PRINTS--a database of protein motif fingerprints.

    PubMed

    Attwood, T K; Beck, M E; Bleasby, A J; Parry-Smith, D J

    1994-09-01

    PRINTS is a compendium of protein motif 'fingerprints'. A fingerprint is defined as a group of motifs excised from conserved regions of a sequence alignment, whose diagnostic power or potency is refined by iterative database scanning (in this case the OWL composite sequence database). Generally, the motifs do not overlap, but are separated along a sequence, though they may be contiguous in 3D-space. The use of groups of independent, linearly- or spatially-distinct motifs allows protein folds and functionalities to be characterised more flexibly and powerfully than conventional single-component patterns or regular expressions. The current version of the database contains 200 entries (encoding 950 motifs), covering a wide range of globular and membrane proteins, modular polypeptides, and so on. The growth of the database is influenced by a number of factors; e.g. the use of multiple motifs; the maximisation of sequence information through iterative database scanning; and the fact that the database searched is a large composite. The information contained within PRINTS is distinct from, but complementary to, the consensus expressions stored in the widely-used PROSITE dictionary of patterns.

  17. Motif types, motif locations and base composition patterns around the RNA polyadenylation site in microorganisms, plants and animals

    PubMed Central

    2014-01-01

    Background The polyadenylation of RNA is critical for gene functioning, but the conserved sequence motifs (often called signal or signature motifs), motif locations and abundances, and base composition patterns around mRNA polyadenylation [poly(A)] sites are still uncharacterized in most species. The evolutionary tendency for poly(A) site selection is still largely unknown. Results We analyzed the poly(A) site regions of 31 species or phyla. Different groups of species showed different poly(A) signal motifs: UUACUU at the poly(A) site in the parasite Trypanosoma cruzi; UGUAAC (approximately 13 bases upstream of the site) in the alga Chlamydomonas reinhardtii; UGUUUG (or UGUUUGUU) at mainly the fourth base downstream of the poly(A) site in the parasite Blastocystis hominis; and AAUAAA at approximately 16 bases and approximately 19 bases upstream of the poly(A) site in animals and plants, respectively. Polyadenylation signal motifs are usually several hundred times more abundant around poly(A) sites than in whole genomes. These predominant motifs usually had very specific locations, whether upstream of, at, or downstream of poly(A) sites, depending on the species or phylum. The poly(A) site was usually an adenosine (A) in all analyzed species except for B. hominis, and there was weak A predominance in C. reinhardtii. Fungi, animals, plants, and the protist Phytophthora infestans shared a general base abundance pattern (or base composition pattern) of “U-rich—A-rich—U-rich—Poly(A) site—U-rich regions”, or U-A-U-A-U for short, with some variation for each kingdom or subkingdom. Conclusion This study identified the poly(A) signal motifs, motif locations, and base composition patterns around mRNA poly(A) sites in protists, fungi, plants, and animals and provided insight into poly(A) site evolution. PMID:25052519

  18. Analysis of the Key Elements of FFAT-Like Motifs Identifies New Proteins That Potentially Bind VAP on the ER, Including Two AKAPs and FAPP2

    PubMed Central

    Mikitova, Veronika; Levine, Timothy P.

    2012-01-01

    Background Two phenylalanines (FF) in an acidic tract (FFAT)-motifs were originally described as having seven elements: an acidic flanking region followed by 6 residues (EFFDA–E). Such motifs are found in several lipid transfer protein (LTP) families, and they interact with a protein on the cytosolic face of the ER called vesicle-associated membrane protein-associated protein (VAP), mutation of which causes ER stress and motor neuron disease, making it important to determine which proteins bind VAP. Among other proteins that bind VAP, some contain FFAT-like motifs that are missing one or more of the seven elements. Defining how much variation is tolerated in FFAT-like motifs is a preliminary step prior to the identification of the full range of VAP interactors. Results We used a quantifiable in vivo system that measured ER targeting in a reporter yeast strain that over-expressed VAP to study the effect of substituting different elements of FFAT-like motifs in turn. By defining FFAT-like motifs more widely than before, we found them in novel proteins the functions of which had not previously been directly linked to the ER, including: two PKA anchoring proteins, AKAP220 and AKAP110; a family of plant LTPs; and the glycolipid LTP phosphatidylinositol-four-phosphate adaptor-protein-2 (FAPP-2). Conclusion All of the seven essential elements of a FFAT motif tolerate variation, and weak targeting to the ER via VAP is still detected if two elements are substituted. In addition to the strong FFAT motifs already known, there are additional proteins with weaker FFAT-like motifs, which might be functionally important VAP interactors. PMID:22276202

  19. Generation of high-performance binding proteins for peptide motifs by affinity clamping

    PubMed Central

    Koide, Shohei; Huang, Jin

    2013-01-01

    We describe concepts and methodologies for generating “Affinity Clamps”, a new class of recombinant binding proteins that achieve high affinity and high specificity toward short peptide motifs of biological importance, which is a major challenge in protein engineering. The Affinity Clamping concept exploits the potential of nonhomologous recombination of protein domains in generating large changes in protein function and the inherent binding affinity and specificity of the so-called modular interaction domains toward short peptide motifs. Affinity Clamping creates a clamshell architecture that clamps onto a target peptide. The design processes involve (i) choosing a starting modular interaction domain appropriate for the target and applying structure-guided modifications, (ii) attaching a second domain, termed “enhancer domain” and (iii) optimizing the peptide-binding site located between the domains by directed evolution. The two connected domains work synergistically to achieve high levels of affinity and specificity that are unattainable with either domain alone. Because of the simple and modular architecture, affinity clamps are particularly well suited as building blocks for designing more complex functionalities. Affinity Clamping represents a major advance in protein design that is broadly applicable to the recognition of peptide motifs. PMID:23422435

  20. [A primary study of evolution of hepatitis B virus based on motif discovery].

    PubMed

    Ma, Lei; Yi, Qing-Qing; Zhang, Qi; He, Jian-Feng

    2014-01-01

    Hepatitis B is a serious infectious disease worldwide, and hepatitis B virus (HBV) is the direct cause of this disease. In recent years, as an essential part of its evolutionary process, HBV mutation has been extensively studied domestically and globally. However, the study on the conserved sequences in HBV sequences is still in its infancy. In this study, we applied multiple EM for motif elicitation (MEME) algorithm to discover HBV motif and proposed a new metric, conservative index (CI), to carry out phylogenetic analysis based on HBV sequences. Then, the constructed phylogenetic tree was subjected to reliability assessment. The results demonstrated that the new metric CI combined with the MEME algorithm can effectively help to discover motifs in HBV sequences and construct a phylogenetic tree based on them and to analyze the evolutionary relationship between HBV sequences; in addition, the possible ancestral sequences of samples may be obtained by conservative analysis. The proposed method is valuable for the exploratory study on large HBV sequence data sets. PMID:24772892

  1. Network-dosage compensation topologies as recurrent network motifs in natural gene networks

    PubMed Central

    2014-01-01

    Background Global noise in gene expression and chromosome duplication during cell-cycle progression cause inevitable fluctuations in the effective number of copies of gene networks in cells. These indirect and direct alterations of network copy numbers have the potential to change the output or activity of a gene network. For networks whose specific activity levels are crucial for optimally maintaining cellular functions, cells need to implement mechanisms to robustly compensate the effects of network dosage fluctuations. Results Here, we determine the necessary conditions for generalized N-component gene networks to be network-dosage compensated and show that the compensation mechanism can robustly operate over large ranges of gene expression levels. Furthermore, we show that the conditions that are necessary for network-dosage compensation are also sufficient. Finally, using genome-wide protein-DNA and protein-protein interaction data, we search the yeast genome for the abundance of specific dosage-compensation motifs and show that a substantial percentage of the natural networks identified contain at least one dosage-compensation motif. Conclusions Our results strengthen the hypothesis that the special network topologies that are necessary for network-dosage compensation may be recurrent network motifs in eukaryotic genomes and therefore may be an important design principle in gene network assembly in cells. PMID:24929807

  2. Ubiquitous presence of the hammerhead ribozyme motif along the tree of life

    PubMed Central

    de la Peña, Marcos; García-Robles, Inmaculada

    2010-01-01

    Examples of small self-cleaving RNAs embedded in noncoding regions already have been found to be involved in the control of gene expression, although their origin remains uncertain. In this work, we show the widespread occurrence of the hammerhead ribozyme (HHR) motif among genomes from the Bacteria, Chromalveolata, Plantae, and Metazoa kingdoms. Intergenic HHRs were detected in three different bacterial genomes, whereas metagenomic data from Galapagos Islands showed the occurrence of similar ribozymes that could be regarded as direct relics from the RNA world. Among eukaryotes, HHRs were detected in the genomes of three water molds as well as 20 plant species, ranging from unicellular algae to vascular plants. These HHRs were very similar to those previously described in small RNA plant pathogens and, in some cases, appeared as close tandem repetitions. A parallel situation of tandemly repeated HHR motifs was also detected in the genomes of lower metazoans from cnidarians to invertebrates, with special emphasis among hematophagous and parasitic organisms. Altogether, these findings unveil the HHR as a widespread motif in DNA genomes, which would be involved in new forms of retrotransposable elements. PMID:20705646

  3. Gemin5 proteolysis reveals a novel motif to identify L protease targets.

    PubMed

    Piñeiro, David; Ramajo, Jorge; Bradrick, Shelton S; Martínez-Salas, Encarnación

    2012-06-01

    Translation of picornavirus RNA is governed by the internal ribosome entry site (IRES) element, directing the synthesis of a single polyprotein. Processing of the polyprotein is performed by viral proteases that also recognize host factors as substrates. Among these substrates are translation initiation factors and RNA-binding proteins whose cleavage is responsible for inactivation of cellular gene expression. Foot-and-mouth disease virus (FMDV) encodes two proteases, L(pro) and 3C(pro). Widespread definition of L(pro) targets suffers from the lack of a sufficient number of characterized substrates. Here, we report the proteolysis of the IRES-binding protein Gemin5 in FMDV-infected cells, but not in cells infected by other picornaviruses. Proteolysis was specifically associated with expression of L(pro), yielding two stable products, p85 and p57. In silico search of putative L targets within Gemin5 identified two sequences whose potential recognition was in agreement with proteolysis products observed in infected cells. Mutational analysis revealed a novel L(pro) target sequence that included the RKAR motif. Confirming this result, the Fas-ligand Daxx was proteolysed in FMDV-infected and L(pro)-expressing cells. This protein carries an RRLR motif whose substitution to EELR abrogated L(pro) recognition. Thus, the sequence (R)(R/K)(L/A)(R) defines a novel motif to identify putative targets of L(pro) in host factors.
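
    The (R)(R/K)(L/A)(R) pattern reported in this abstract can be written as a regular expression and used to list candidate sites in a protein sequence. The following is a minimal sketch, not the authors' in silico search: the function name and the toy sequence are hypothetical, and a pattern match by itself says nothing about whether a site is actually cleaved.

```python
import re

# Regex form of the (R)(R/K)(L/A)(R) pattern from the abstract.
# The lookahead wrapper lets overlapping occurrences be reported as well.
LPRO_MOTIF = re.compile(r"(?=(R[RK][LA]R))")


def find_lpro_candidates(protein_seq):
    """Return (1-based position, matched tetrapeptide) for every pattern hit."""
    seq = protein_seq.upper()
    return [(m.start() + 1, m.group(1)) for m in LPRO_MOTIF.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical toy sequence, NOT Gemin5 or Daxx.
    toy = "MSTRKARGGEELRAARRLRQ"
    for position, hit in find_lpro_candidates(toy):
        print(position, hit)
```

    In the toy sequence the RKAR and RRLR tetrapeptides are reported, while EELR (the substitution that abrogated recognition in the abstract) is not.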

  4. Sequence motifs of myelin membrane proteins: towards the molecular basis of diseases.

    PubMed

    Sedzik, Jan; Jastrzebski, Jan Pawel; Ikenaka, Kazuhiro

    2013-04-01

    The shortest sequence of amino acids in protein containing functional and structural information is a "motif." To understand myelin protein functions, we intensively searched for motifs that can be found in myelin proteins. Some myelin proteins had several different motifs or repetition of the same motif. The most abundant motif found among myelin proteins was a myristoylation motif. Bovine MAG held 11 myristoylation motifs and human myelin basic protein held as many as eight such motifs. PMP22 had the fewest myristoylation motifs, which was only one; rat PMP22 contained no such motifs. Cholesterol recognition/interaction amino-acid consensus (CRAC) motif was not found in myelin basic protein. P2 protein of different species contained only one CRAC motif, except for P2 of horse, which had no such motifs. MAG, MOG, and P0 were very rich in CRAC, three to eight motifs per protein. The analysis of motifs in myelin proteins is expected to provide structural insight and refinement of predicted 3D models for which structures are as yet unknown. Analysis of motifs in mutant proteins associated with neurological diseases uncovered that some motifs disappeared in P0 with mutation found in neurological diseases. There are 2,500 motifs deposited in a databank, but 21 were found in myelin proteins, which is only 1% of the total known motifs. There was great variability in the number of motifs among proteins from different species. The appearance or disappearance of protein motifs after gaining point mutation in the protein related to neurological diseases was very interesting. PMID:23339078

  5. Pseudouridine synthases: four families of enzymes containing a putative uridine-binding motif also conserved in dUTPases and dCTP deaminases.

    PubMed

    Koonin, E V

    1996-06-15

    Using a combination of several methods for protein sequence comparison and motif analysis, it is shown that the four recently described pseudouridine synthases with different specificities belong to four distinct families. Three of these families share two conserved motifs that are likely to be directly involved in catalysis. One of these motifs is detected also in two other families of enzymes that specifically bind uridine, namely deoxycytidine triphosphate deaminases and deoxyuridine triphosphatases. It is proposed that this motif is an essential part of the uridine-binding site. Two of the pseudouridine synthases, one of which modifies the anticodon arm of tRNAs and the other is predicted to modify a portion of the large ribosomal subunit RNA belonging to the peptidyltransferase center, are encoded in all extensively sequenced genomes, including the 'minimal' genome of Mycoplasma genitalium. These particular RNA modifications and the respective enzymes are likely to be essential for the functioning of any cell.

  6. Discovering motifs in ranked lists of DNA sequences.

    PubMed

    Eden, Eran; Lipson, Doron; Yogev, Sivan; Yakhini, Zohar

    2007-03-23

    Computational methods for discovery of sequence elements that are enriched in a target set compared with a background set are fundamental in molecular biology research. One example is the discovery of transcription factor binding motifs that are inferred from ChIP-chip (chromatin immuno-precipitation on a microarray) measurements. Several major challenges in sequence motif discovery still require consideration: (i) the need for a principled approach to partitioning the data into target and background sets; (ii) the lack of rigorous models and of an exact p-value for measuring motif enrichment; (iii) the need for an appropriate framework for accounting for motif multiplicity; (iv) the tendency, in many of the existing methods, to report presumably significant motifs even when applied to randomly generated data. In this paper we present a statistical framework for discovering enriched sequence elements in ranked lists that resolves these four issues. We demonstrate the implementation of this framework in a software application, termed DRIM (discovery of rank imbalanced motifs), which identifies sequence motifs in lists of ranked DNA sequences. We applied DRIM to ChIP-chip and CpG methylation data and obtained the following results. (i) Identification of 50 novel putative transcription factor (TF) binding sites in yeast ChIP-chip data. The biological function of some of them was further investigated to gain new insights on transcription regulation networks in yeast. For example, our discoveries enable the elucidation of the network of the TF ARO80. Another finding concerns a systematic TF binding enhancement to sequences containing CA repeats. (ii) Discovery of novel motifs in human cancer CpG methylation data. Remarkably, most of these motifs are similar to DNA sequence elements bound by the Polycomb complex that promotes histone methylation. Our findings thus support a model in which histone methylation and CpG methylation are mechanistically linked. Overall, we

  7. Elongated Polyproline Motifs Facilitate Enamel Evolution through Matrix Subunit Compaction

    PubMed Central

    Luan, Xianghong; Dangaria, Smit; Walker, Cameron; Allen, Michael; Kulkarni, Ashok; Gibson, Carolyn; Braatz, Richard; Liao, Xiubei; Diekwisch, Thomas G. H.

    2009-01-01

    Vertebrate body designs rely on hydroxyapatite as the principal mineral component of relatively light-weight, articulated endoskeletons and sophisticated tooth-bearing jaws, facilitating rapid movement and efficient predation. Biological mineralization and skeletal growth are frequently accomplished through proteins containing polyproline repeat elements. Through their well-defined yet mobile and flexible structure polyproline-rich proteins control mineral shape and contribute many other biological functions including Alzheimer's amyloid aggregation and prolamine plant storage. In the present study we have hypothesized that polyproline repeat proteins exert their control over biological events such as mineral growth, plaque aggregation, or viscous adhesion by altering the length of their central repeat domain, resulting in dramatic changes in supramolecular assembly dimensions. In order to test our hypothesis, we have used the vertebrate mineralization protein amelogenin as an exemplar and determined the biological effect of the four-fold increased polyproline tandem repeat length in the amphibian/mammalian transition. To study the effect of polyproline repeat length on matrix assembly, protein structure, and apatite crystal growth, we have measured supramolecular assembly dimensions in various vertebrates using atomic force microscopy, tested the effect of protein assemblies on crystal growth by electron microscopy, generated a transgenic mouse model to examine the effect of an abbreviated polyproline sequence on crystal growth, and determined the structure of polyproline repeat elements using 3D NMR. Our study shows that an increase in PXX/PXQ tandem repeat motif length results (i) in a compaction of protein matrix subunit dimensions, (ii) reduced conformational variability, (iii) an increase in polyproline II helices, and (iv) promotion of apatite crystal length. Together, these findings establish a direct relationship between polyproline tandem repeat fragment

  8. A novel human AP endonuclease with conserved zinc-finger-like motifs involved in DNA strand break responses

    PubMed Central

    Kanno, Shin-ichiro; Kuzuoka, Hiroyuki; Sasao, Shigeru; Hong, Zehui; Lan, Li; Nakajima, Satoshi; Yasui, Akira

    2007-01-01

    DNA damage causes genome instability and cell death, but many of the cellular responses to DNA damage still remain elusive. We here report a human protein, PALF (PNK and APTX-like FHA protein), with an FHA (forkhead-associated) domain and novel zinc-finger-like CYR (cysteine–tyrosine–arginine) motifs that are involved in responses to DNA damage. We found that the CYR motif is widely distributed among DNA repair proteins of higher eukaryotes, and that PALF, as well as a Drosophila protein with tandem CYR motifs, has endo- and exonuclease activities against abasic site and other types of base damage. PALF accumulates rapidly at single-strand breaks in a poly(ADP-ribose) polymerase 1 (PARP1)-dependent manner in human cells. Indeed, PALF interacts directly with PARP1 and is required for its activation and for cellular resistance to methyl-methane sulfonate. PALF also interacts directly with KU86, LIGASEIV and phosphorylated XRCC4 proteins and possesses endo/exonuclease activity at protruding DNA ends. Various treatments that produce double-strand breaks induce formation of PALF foci, which fully coincide with γH2AX foci. Thus, PALF and the CYR motif may play important roles in DNA repair of higher eukaryotes. PMID:17396150

  9. Finding regulatory elements and regulatory motifs: a general probabilistic framework

    PubMed Central

    van Nimwegen, Erik

    2007-01-01

    Over the last two decades a large number of algorithms has been developed for regulatory motif finding. Here we show how many of these algorithms, especially those that model binding specificities of regulatory factors with position specific weight matrices (WMs), naturally arise within a general Bayesian probabilistic framework. We discuss how WMs are constructed from sets of regulatory sites, how sites for a given WM can be discovered by scanning of large sequences, how to cluster WMs, and more generally how to cluster large sets of sites from different WMs into clusters. We discuss how 'regulatory modules', clusters of sites for subsets of WMs, can be found in large intergenic sequences, and we discuss different methods for ab initio motif finding, including expectation maximization (EM) algorithms, and motif sampling algorithms. Finally, we extensively discuss how module finding methods and ab initio motif finding methods can be extended to take phylogenetic relations between the input sequences into account, i.e. we show how motif finding and phylogenetic footprinting can be integrated in a rigorous probabilistic framework. The article is intended for readers with a solid background in applied mathematics, and preferably with some knowledge of general Bayesian probabilistic methods. The main purpose of the article is to elucidate that all these methods are not a disconnected set of individual algorithmic recipes, but that they are just different facets of a single integrated probabilistic theory. PMID:17903285
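
    The weight-matrix (WM) steps this abstract mentions, building a WM from a set of sites and scanning a longer sequence for matches, can be illustrated as follows. This is a simplified sketch rather than the article's Bayesian treatment: it assumes equal-length aligned sites, a fixed pseudocount, and a uniform background, and all function names and the toy sites are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_wm(sites, pseudocount=1.0):
    """Estimate per-column base probabilities from aligned, equal-length sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    wm = []
    for i in range(width):
        counts = Counter(site[i] for site in sites)
        wm.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return wm


def log_odds(wm, window, background=0.25):
    """Log2 ratio of WM probability to a uniform background for one window."""
    return sum(math.log2(col[base] / background) for col, base in zip(wm, window))


def scan(wm, sequence):
    """Score every window of the WM's width along the sequence."""
    width = len(wm)
    return [(i, log_odds(wm, sequence[i:i + width]))
            for i in range(len(sequence) - width + 1)]


def best_hit(wm, sequence):
    """Return the (0-based offset, score) pair with the highest score."""
    return max(scan(wm, sequence), key=lambda hit: hit[1])


if __name__ == "__main__":
    toy_sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]  # hypothetical sites
    wm = build_wm(toy_sites)
    offset, score = best_hit(wm, "GGCGTATAATCCG")
    print(offset, round(score, 2))
```

    The pseudocount keeps every probability non-zero so that the logarithm is always defined; a different prior or background model would change the scores but not the overall procedure.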

  10. BC1 RNA motifs required for dendritic transport in vivo

    PubMed Central

    Robeck, Thomas; Skryabin, Boris V.; Rozhdestvensky, Timofey S.; Skryabin, Anastasiya B.; Brosius, Jürgen

    2016-01-01

    BC1 RNA is a small brain specific non-protein coding RNA. It is transported from the cell body into dendrites where it is involved in the fine-tuning translational control. Due to its compactness and established secondary structure, BC1 RNA is an ideal model for investigating the motifs necessary for dendritic localization. Previously, microinjection of in vitro transcribed BC1 RNA mutants into the soma of cultured primary neurons suggested the importance of RNA motifs for dendritic targeting. These ex vivo experiments identified a single bulged nucleotide (U22) and a putative K-turn (GA motif) structure required for dendritic localization or distal transport, respectively. We generated six transgenic mouse lines (three founders each) containing neuronally expressing BC1 RNA variants on a BC1 RNA knockout mouse background. In contrast to ex vivo data, we did not find indications of reduction or abolition of dendritic BC1 RNA localization in the mutants devoid of the GA motif or the bulged nucleotide. We confirmed the ex vivo data, which showed that the triloop terminal sequence had no consequence on dendritic transport. Interestingly, changing the triloop supporting structure completely abolished dendritic localization of BC1 RNA. We propose a novel RNA motif important for dendritic transport in vivo. PMID:27350115

  11. BC1 RNA motifs required for dendritic transport in vivo.

    PubMed

    Robeck, Thomas; Skryabin, Boris V; Rozhdestvensky, Timofey S; Skryabin, Anastasiya B; Brosius, Jürgen

    2016-01-01

    BC1 RNA is a small brain specific non-protein coding RNA. It is transported from the cell body into dendrites where it is involved in the fine-tuning translational control. Due to its compactness and established secondary structure, BC1 RNA is an ideal model for investigating the motifs necessary for dendritic localization. Previously, microinjection of in vitro transcribed BC1 RNA mutants into the soma of cultured primary neurons suggested the importance of RNA motifs for dendritic targeting. These ex vivo experiments identified a single bulged nucleotide (U22) and a putative K-turn (GA motif) structure required for dendritic localization or distal transport, respectively. We generated six transgenic mouse lines (three founders each) containing neuronally expressing BC1 RNA variants on a BC1 RNA knockout mouse background. In contrast to ex vivo data, we did not find indications of reduction or abolition of dendritic BC1 RNA localization in the mutants devoid of the GA motif or the bulged nucleotide. We confirmed the ex vivo data, which showed that the triloop terminal sequence had no consequence on dendritic transport. Interestingly, changing the triloop supporting structure completely abolished dendritic localization of BC1 RNA. We propose a novel RNA motif important for dendritic transport in vivo. PMID:27350115

  12. cWINNOWER algorithm for finding fuzzy DNA motifs

    NASA Technical Reports Server (NTRS)

    Liang, S.; Samanta, M. P.; Biegel, B. A.

    2004-01-01

    The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if a clique consisting of a sufficiently large number of mutated copies of the motif (i.e., the signals) is present in the DNA sequence. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum detectable clique size qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12,000 for (l, d) = (15, 4). Copyright Imperial College Press.
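
    The definition of a signal used in this abstract (a pattern of length l differing from the motif in at most d positions) can be made concrete with a Hamming-distance check. The sketch below illustrates only that definition; it is not the cWINNOWER algorithm, which searches for cliques of signals without knowing the motif in advance. Function names and the toy data are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_signals(sequence, motif, d):
    """List (0-based offset, l-mer) for windows within d mismatches of the motif."""
    l = len(motif)
    return [(i, sequence[i:i + l])
            for i in range(len(sequence) - l + 1)
            if hamming(sequence[i:i + l], motif) <= d]


def may_share_motif(a, b, d):
    """Two signals of one motif differ in at most 2*d positions
    (triangle inequality), so larger distances rule the pair out."""
    return hamming(a, b) <= 2 * d


if __name__ == "__main__":
    # Hypothetical toy data with (l, d) = (6, 1).
    for offset, signal in find_signals("TTACGTACGGACGAACTT", "ACGTAC", 1):
        print(offset, signal)
```

    The pairwise test in may_share_motif is the kind of condition under which signals can be linked in a graph, which is why groups of mutually compatible signals appear as cliques.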

  13. Regulatory role of suppressive motifs from commensal DNA.

    PubMed

    Bouladoux, N; Hall, J A; Grainger, J R; dos Santos, L M; Kann, M G; Nagarajan, V; Verthelyi, D; Belkaid, Y

    2012-11-01

    The microbiota contributes to the induction of both effector and regulatory responses in the gastrointestinal (GI) tract. However, the mechanisms controlling these distinct properties remain poorly understood. We previously showed that commensal DNA promotes intestinal immunity. Here, we find that the capacity of bacterial DNA to stimulate immune responses is species specific and correlated with the frequency of motifs known to exert immunosuppressive function. In particular, we show that the DNA of Lactobacillus species, including various probiotics, is enriched in suppressive motifs able to inhibit lamina propria dendritic cell activation. In addition, immunosuppressive oligonucleotides sustain T(reg) cell conversion during inflammation and limit pathogen-induced immunopathology and colitis. Altogether, our findings identify DNA-suppressive motifs as a molecular ligand expressed by commensals and support the idea that a balance between stimulatory and regulatory DNA motifs contributes to the induction of controlled immune responses in the GI tract and gut immune homeostasis. Further, our findings suggest that the endogenous regulatory capacity of DNA motifs enriched in some commensal bacteria could be exploited for therapeutic purposes. PMID:22617839

  14. MALISAM: a database of structurally analogous motifs in proteins.

    PubMed

    Cheng, Hua; Kim, Bong-Hyun; Grishin, Nick V

    2008-01-01

    MALISAM (manual alignments for structurally analogous motifs) represents the first database containing pairs of structural analogs and their alignments. To find reliable analogs, we developed an approach based on three ideas. First, an insertion together with a part of the evolutionary core of one domain family (a hybrid motif) is analogous to a similar motif contained within the core of another domain family. Second, a motif at an interface, formed by secondary structural elements (SSEs) contributed by two or more domains or subunits contacting along that interface, is analogous to a similar motif present in the core of a single domain. Third, an artificial protein obtained through selection from random peptides or in sequence design experiments not biased by sequences of a particular homologous family, is analogous to a structurally similar natural protein. Each analogous pair is superimposed and aligned manually, as well as by several commonly used programs. Applications of this database may range from protein evolution studies, e.g. development of remote homology inference tools and discriminators between homologs and analogs, to protein-folding research, since in the absence of evolutionary reasons, similarity between proteins is caused by structural and folding constraints. The database is publicly available at http://prodata.swmed.edu/malisam. PMID:17855399

  15. Interconnected network motifs control podocyte morphology and kidney function.

    PubMed

    Azeloglu, Evren U; Hardy, Simon V; Eungdamrong, Narat John; Chen, Yibang; Jayaraman, Gomathi; Chuang, Peter Y; Fang, Wei; Xiong, Huabao; Neves, Susana R; Jain, Mohit R; Li, Hong; Ma'ayan, Avi; Gordon, Ronald E; He, John Cijiang; Iyengar, Ravi

    2014-02-01

    Podocytes are kidney cells with specialized morphology that is required for glomerular filtration. Diseases, such as diabetes, or drug exposure that causes disruption of the podocyte foot process morphology results in kidney pathophysiology. Proteomic analysis of glomeruli isolated from rats with puromycin-induced kidney disease and control rats indicated that protein kinase A (PKA), which is activated by adenosine 3',5'-monophosphate (cAMP), is a key regulator of podocyte morphology and function. In podocytes, cAMP signaling activates cAMP response element-binding protein (CREB) to enhance expression of the gene encoding a differentiation marker, synaptopodin, a protein that associates with actin and promotes its bundling. We constructed and experimentally verified a β-adrenergic receptor-driven network with multiple feedback and feedforward motifs that controls CREB activity. To determine how the motifs interacted to regulate gene expression, we mapped multicompartment dynamical models, including information about protein subcellular localization, onto the network topology using Petri net formalisms. These computational analyses indicated that the juxtaposition of multiple feedback and feedforward motifs enabled the prolonged CREB activation necessary for synaptopodin expression and actin bundling. Drug-induced modulation of these motifs in diseased rats led to recovery of normal morphology and physiological function in vivo. Thus, analysis of regulatory motifs using network dynamics can provide insights into pathophysiology that enable predictions for drug intervention strategies to treat kidney disease. PMID:24497609

  16. Crystal structure of bacterial cell-surface alginate-binding protein with an M75 peptidase motif

    SciTech Connect

    Maruyama, Yukie; Ochiai, Akihito; Mikami, Bunzo; Hashimoto, Wataru; Murata, Kousaku

    2011-02-18

    Research highlights: Bacterial alginate-binding Algp7 is similar to component EfeO of Fe²⁺ transporter. We determined the crystal structure of Algp7 with a metal-binding motif. Algp7 consists of two helical bundles formed through duplication of a single bundle. A deep cleft involved in alginate binding locates around the metal-binding site. Algp7 may function as a Fe²⁺-chelated alginate-binding protein. Abstract: A gram-negative Sphingomonas sp. A1 directly incorporates alginate polysaccharide into the cytoplasm via the cell-surface pit and ABC transporter. A cell-surface alginate-binding protein, Algp7, functions as a concentrator of the polysaccharide in the pit. Based on the primary structure and genetic organization in the bacterial genome, Algp7 was found to be homologous to an M75 peptidase motif-containing EfeO, a component of a ferrous ion transporter. Despite the presence of an M75 peptidase motif with high similarity, the Algp7 protein purified from recombinant Escherichia coli cells was inert on insulin B chain and N-benzoyl-Phe-Val-Arg-p-nitroanilide, both of which are substrates for a typical M75 peptidase, imelysin, from Pseudomonas aeruginosa. The X-ray crystallographic structure of Algp7 was determined at 2.10 Å resolution by single-wavelength anomalous diffraction. Although a metal-binding motif, HxxE, conserved in zinc ion-dependent M75 peptidases is also found in Algp7, the crystal structure of Algp7 contains no metal even at the motif. The protein consists of two structurally similar up-and-down helical bundles as the basic scaffold. A deep cleft between the bundles is sufficiently large to accommodate macromolecules such as alginate polysaccharide. This is the first structural report on a bacterial cell-surface alginate-binding protein with an M75 peptidase motif.

  17. Modeling Small Noncanonical RNA Motifs with the Rosetta FARFAR Server.

    PubMed

    Yesselman, Joseph D; Das, Rhiju

    2016-01-01

    Noncanonical RNA motifs help define the vast complexity of RNA structure and function, and in many cases, these loops and junctions are on the order of only ten nucleotides in size. Unfortunately, despite their small size, there is no reliable method to determine the ensemble of lowest energy structures of junctions and loops at atomic accuracy. This chapter outlines straightforward protocols using a webserver for Rosetta Fragment Assembly of RNA with Full Atom Refinement (FARFAR) ( http://rosie.rosettacommons.org/rna_denovo/submit ) to model the 3D structure of small noncanonical RNA motifs for use in visualizing motifs and for further refinement or filtering with experimental data such as NMR chemical shifts. PMID:27665600

  18. Selection against spurious promoter motifs correlates with translational efficiency across bacteria

    SciTech Connect

    Froula, Jeffrey L.; Francino, M. Pilar

    2007-05-01

    Because binding of RNAP to misplaced sites could compromise the efficiency of transcription, natural selection for the optimization of gene expression should regulate the distribution of DNA motifs capable of RNAP-binding across the genome. Here we analyze the distribution of the -10 promoter motifs that bind the σ⁷⁰ subunit of RNAP in 42 bacterial genomes. We show that selection on these motifs operates across the genome, maintaining an over-representation of -10 motifs in regulatory sequences while eliminating them from the nonfunctional and, in most cases, from the protein coding regions. In some genomes, however, -10 sites are over-represented in the coding sequences; these sites could induce pauses effecting regulatory roles throughout the length of a transcriptional unit. For nonfunctional sequences, the extent of motif under-representation varies across genomes in a manner that broadly correlates with the number of tRNA genes, a good indicator of translational speed and growth rate. This suggests that minimizing the time invested in gene transcription is an important selective pressure against spurious binding. However, selection against spurious binding is detectable in the reduced genomes of host-restricted bacteria that grow at slow rates, indicating that components of efficiency other than speed may also be important. Minimizing the number of RNAP molecules per cell required for transcription, and the corresponding energetic expense, may be most relevant in slow growers. These results indicate that genome-level properties affecting the efficiency of transcription and translation can respond in an integrated manner to optimize gene expression. The detection of selection against promoter motifs in nonfunctional regions also implies that no sequence may evolve free of selective constraints, at least in the relatively small and unstructured genomes of bacteria.
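
    One simple way to quantify over- or under-representation of a motif in a region is to compare its observed count with the count expected from base composition alone. The sketch below assumes an independent-base null model and the canonical -10 hexamer TATAAT; the abstract does not state the authors' actual statistic, so both choices, like the function names, are illustrative assumptions.

```python
from collections import Counter


def count_overlapping(sequence, motif):
    """Count motif occurrences, allowing overlaps."""
    k = len(motif)
    return sum(sequence[i:i + k] == motif for i in range(len(sequence) - k + 1))


def expected_count(sequence, motif):
    """Expected occurrences if bases were drawn independently
    with the sequence's own base frequencies (assumed null model)."""
    n = len(sequence)
    freqs = Counter(sequence)
    probability = 1.0
    for base in motif:
        probability *= freqs[base] / n
    return (n - len(motif) + 1) * probability


def representation_ratio(sequence, motif):
    """Observed/expected ratio: above 1 suggests over-representation,
    below 1 under-representation, relative to the assumed null model."""
    expected = expected_count(sequence, motif)
    if expected == 0:
        return float("nan")
    return count_overlapping(sequence, motif) / expected


if __name__ == "__main__":
    toy_region = "TATAATGCGCTATAAT"  # hypothetical toy region
    print(count_overlapping(toy_region, "TATAAT"))
    print(representation_ratio(toy_region, "TATAAT") > 1)
```

    On real genomes a ratio like this would need to be computed separately for regulatory, coding, and nonfunctional regions, and a more careful null model (for example, one preserving dinucleotide frequencies) may be preferable.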

  19. Application of Synthetic Peptide Arrays To Uncover Cyclic Di-GMP Binding Motifs

    PubMed Central

    Düvel, Juliane; Bense, Sarina; Möller, Stefan; Bertinetti, Daniela; Schwede, Frank; Morr, Michael; Eckweiler, Denitsa; Genieser, Hans-Gottfried; Jänsch, Lothar; Herberg, Friedrich W.; Frank, Ronald

    2015-01-01

    ABSTRACT High levels of the universal bacterial second messenger cyclic di-GMP (c-di-GMP) promote the establishment of surface-attached growth in many bacteria. Not only can c-di-GMP bind to nucleic acids and directly control gene expression, but it also binds to a diverse array of proteins of specialized functions and orchestrates their activity. Since its development in the early 1990s, the synthetic peptide array technique has become a powerful tool for high-throughput approaches and was successfully applied to investigate the binding specificity of protein-ligand interactions. In this study, we used peptide arrays to uncover the c-di-GMP binding site of a Pseudomonas aeruginosa protein (PA3740) that was isolated in a chemical proteomics approach. PA3740 was shown to bind c-di-GMP with a high affinity, and peptide arrays uncovered LKKALKKQTNLR to be a putative c-di-GMP binding motif. Most interestingly, different from the previously identified c-di-GMP binding motif of the PilZ domain (RXXXR) or the I site of diguanylate cyclases (RXXD), two leucine residues and a glutamine residue and not the charged amino acids provided the key residues of the binding sequence. Those three amino acids are highly conserved across PA3740 homologs, and their singular exchange to alanine reduced c-di-GMP binding within the full-length protein. IMPORTANCE In many bacterial pathogens the universal bacterial second messenger c-di-GMP governs the switch from the planktonic, motile mode of growth to the sessile, biofilm mode of growth. Bacteria adapt their intracellular c-di-GMP levels to a variety of environmental challenges. Several classes of c-di-GMP binding proteins have been structurally characterized, and diverse c-di-GMP binding domains have been identified. Nevertheless, for several c-di-GMP receptors, the binding motif remains to be determined. Here we show that the use of a synthetic peptide array allowed the identification of a c-di-GMP binding motif of a putative c

  20. The Membrane-Bound NAC Transcription Factor ANAC013 Functions in Mitochondrial Retrograde Regulation of the Oxidative Stress Response in Arabidopsis

    PubMed Central

    De Clercq, Inge; Vermeirssen, Vanessa; Van Aken, Olivier; Vandepoele, Klaas; Murcha, Monika W.; Law, Simon R.; Inzé, Annelies; Ng, Sophia; Ivanova, Aneta; Rombaut, Debbie; van de Cotte, Brigitte; Jaspers, Pinja; Van de Peer, Yves; Kangasjärvi, Jaakko; Whelan, James; Van Breusegem, Frank

    2013-01-01

    Upon disturbance of their function by stress, mitochondria can signal to the nucleus to steer the expression of responsive genes. This mitochondria-to-nucleus communication is often referred to as mitochondrial retrograde regulation (MRR). Although reactive oxygen species and calcium are likely candidate signaling molecules for MRR, the protein signaling components in plants remain largely unknown. Through meta-analysis of transcriptome data, we detected a set of genes that are common and robust targets of MRR and used them as a bait to identify its transcriptional regulators. In the upstream regions of these mitochondrial dysfunction stimulon (MDS) genes, we found a cis-regulatory element, the mitochondrial dysfunction motif (MDM), which is necessary and sufficient for gene expression under various mitochondrial perturbation conditions. Yeast one-hybrid analysis and electrophoretic mobility shift assays revealed that the transmembrane domain–containing NO APICAL MERISTEM/ARABIDOPSIS TRANSCRIPTION ACTIVATION FACTOR/CUP-SHAPED COTYLEDON transcription factors (ANAC013, ANAC016, ANAC017, ANAC053, and ANAC078) bound to the MDM cis-regulatory element. We demonstrate that ANAC013 mediates MRR-induced expression of the MDS genes by direct interaction with the MDM cis-regulatory element and triggers increased oxidative stress tolerance. In conclusion, we characterized ANAC013 as a regulator of MRR upon stress in Arabidopsis thaliana. PMID:24045019

  1. The SLiMDisc server: short, linear motif discovery in proteins.

    PubMed

    Davey, Norman E; Edwards, Richard J; Shields, Denis C

    2007-07-01

    Short, linear motifs (SLiMs) play a critical role in many biological processes, particularly in protein-protein interactions. Overrepresentation of convergent occurrences of motifs in proteins with a common attribute (such as similar subcellular location or a shared interaction partner) provides a feasible means to discover novel occurrences computationally. The SLiMDisc (Short, Linear Motif Discovery) web server corrects for common ancestry in describing shared motifs, concentrating on the convergently evolved motifs. The server returns a listing of the most interesting motifs found within unmasked regions, ranked according to an information content-based scoring scheme. It allows interactive input masking, according to various criteria. Scoring allows for evolutionary relationships in the data sets through treatment of BLAST local alignments. Alongside this ranked list, visualizations of the results improve understanding of the context of suggested motifs, helping to identify true motifs of interest. These visualizations include alignments of motif occurrences, alignments of motifs and their homologues and a visual schematic of the top-ranked motifs. Additional options for filtering and/or re-ranking motifs further permit the user to focus on motifs with desired attributes. Returned motifs can also be compared with known SLiMs from the literature. SLiMDisc is available at: http://bioware.ucd.ie/~slimdisc/.

  2. Nephila clavipes Flagelliform silk-like GGX motifs contribute to extensibility and spacer motifs contribute to strength in synthetic spider silk fibers.

    PubMed

    Adrianos, Sherry L; Teulé, Florence; Hinman, Michael B; Jones, Justin A; Weber, Warner S; Yarger, Jeffery L; Lewis, Randolph V

    2013-06-10

    Flagelliform spider silk is the most extensible silk fiber produced by orb weaver spiders, though not as strong as the dragline silk of the spider. The motifs found in the core of the Nephila clavipes flagelliform Flag protein are GGX, spacer, and GPGGX. Flag does not contain the polyalanine motif known to provide the strength of dragline silk. To investigate the source of flagelliform fiber strength, four recombinant proteins were produced containing variations of the three core motifs of the Nephila clavipes flagelliform Flag protein that produces this type of fiber. The as-spun fibers were processed in 80% aqueous isopropanol using a standardized process for all four fiber types, which produced improved mechanical properties. Mechanical testing of the recombinant proteins determined that the GGX motif contributes extensibility and the spacer motif contributes strength to the recombinant fibers. Recombinant protein fibers containing the spacer motif were stronger than the proteins constructed without the spacer that contained only the GGX motif or the combination of the GGX and GPGGX motifs. The mechanical and structural X-ray diffraction analyses of the recombinant fibers provide data that suggest a functional role of the spacer motif that produces tensile strength, though the spacer motif is not clearly defined structurally. These results indicate that the spacer is likely a primary contributor of strength, with the GGX motif supplying mobility to the protein network of native N. clavipes flagelliform silk fibers. PMID:23646825

  3. Novel DNA Motif Binding Activity Observed In Vivo With an Estrogen Receptor α Mutant Mouse

    PubMed Central

    Li, Leping; Grimm, Sara A.; Winuthayanon, Wipawee; Hamilton, Katherine J.; Pockette, Brianna; Rubel, Cory A.; Pedersen, Lars C.; Fargo, David; Lanz, Rainer B.; DeMayo, Francesco J.; Schütz, Günther; Korach, Kenneth S.

    2014-01-01

    Estrogen receptor α (ERα) interacts with DNA directly or indirectly via other transcription factors, referred to as “tethering.” Evidence for tethering is based on in vitro studies and a widely used “KIKO” mouse model containing mutations that prevent direct estrogen response element DNA-binding. KIKO mice are infertile, due in part to the inability of estradiol (E2) to induce uterine epithelial proliferation. To elucidate the molecular events that prevent KIKO uterine growth, regulation of the pro-proliferative E2 target gene Klf4 and of Klf15, a progesterone (P4) target gene that opposes the pro-proliferative activity of KLF4, was evaluated. Klf4 induction was impaired in KIKO uteri; however, Klf15 was induced by E2 rather than by P4. Whole uterine chromatin immunoprecipitation-sequencing revealed enrichment of KIKO ERα binding to hormone response element (HRE) motifs. KIKO binding to HRE motifs was verified using reporter gene and DNA-binding assays. Because the KIKO ERα has HRE DNA-binding activity, we evaluated the “EAAE” ERα, which has more severe DNA-binding domain mutations, and demonstrated a lack of estrogen response element or HRE reporter gene induction or DNA-binding. The EAAE mouse has an ERα null–like phenotype, with impaired uterine growth and transcriptional activity. Our findings demonstrate that the KIKO mouse model, which has been used by numerous investigators, cannot be used to establish biological functions for ERα tethering, because KIKO ERα effectively stimulates transcription using HRE motifs. The EAAE-ERα DNA-binding domain mutant mouse demonstrates that ERα DNA-binding is crucial for biological and transcriptional processes in reproductive tissues and that ERα tethering may not contribute to estrogen responsiveness in vivo. PMID:24713037

  4. The GA motif: an RNA element common to bacterial antitermination systems, rRNA, and eukaryotic RNAs.

    PubMed Central

    Winkler, W C; Grundy, F J; Murphy, B A; Henkin, T M

    2001-01-01

    Two different transcription termination control mechanisms, the T box and S box systems, are used to regulate transcription of many bacterial aminoacyl-tRNA synthetase, amino acid biosynthesis, and amino acid transport genes. Both of these regulatory mechanisms involve an untranslated mRNA leader region capable of adopting alternate structural conformations that result in transcription termination or transcription elongation into the downstream region. Comparative analyses revealed a small RNA secondary structural element, designated the GA motif, that is highly conserved in both T box and S box leader sequences. The motif consists of two short helices separated by an asymmetric internal loop, with highly conserved GA dinucleotide sequences on either side of the internal loop. Site-directed mutagenesis of this motif in model T and S box leader sequences indicated that it is essential for transcriptional regulation in both systems. This motif is similar to the binding site of yeast ribosomal protein L30, the Snu13p binding sites found in U4 snRNA and box C/D snoRNAs, and two elements in 23S rRNA. PMID:11497434

  5. Tertiary structure and function of an RNA motif required for plant vascular entry to initiate systemic trafficking

    PubMed Central

    Zhong, Xuehua; Tao, Xiaorong; Stombaugh, Jesse; Leontis, Neocles; Ding, Biao

    2007-01-01

    Vascular entry is a decisive step for the initiation of long-distance movement of infectious and endogenous RNAs, silencing signals and developmental/defense signals in plants. However, the mechanisms remain poorly understood. We used Potato spindle tuber viroid (PSTVd) as a model to investigate the direct role of the RNA itself in vascular entry. We report here the identification of an RNA motif that is required for PSTVd to traffic from nonvascular into the vascular tissue phloem to initiate systemic infection. This motif consists of nucleotides U/C that form a water-inserted cis Watson–Crick/Watson–Crick base pair flanked by short helices that comprise canonical Watson–Crick/Watson–Crick base pairs. This tertiary structural model was inferred by comparison with X-ray crystal structures of similar motifs in rRNAs and is supported by combined mutagenesis and covariation analyses. Hydration pattern analysis suggests that water insertion induces a widened minor groove conducive to protein and/or RNA interactions. Our model and approaches have broad implications to investigate the RNA structural motifs in other RNAs for vascular entry and to study the basic principles of RNA structure–function relationships. PMID:17660743

  6. 5. DETAIL VIEW OF THE EGYPTIAN MOTIF DECORATIVE ELEMENTS OF ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    5. DETAIL VIEW OF THE EGYPTIAN MOTIF DECORATIVE ELEMENTS OF BUILDING 1'S MAIN ENTRY TOWER (INCLUDING THE ENGAGED COLUMN CAPITALS, PILASTERS & CAPITALS, CORNICES, AND TERRA COTTA EAGLES); LOOKING SW FROM THE E WING ROOF. (Ryan) - Veterans Administration Medical Center, Building No. 1, Old State Route 13 West, Marion, Williamson County, IL

  7. Insights into the motif preference of APOBEC3 enzymes.

    PubMed

    Ebrahimi, Diako; Alinejad-Rokny, Hamid; Davenport, Miles P

    2014-01-01

    We used a multivariate data analysis approach to identify motifs associated with HIV hypermutation by different APOBEC3 enzymes. The analysis showed that APOBEC3G targets G mainly within GG, TG, TGG, GGG, TGGG and also GGGT. The G nucleotides flanked by a C at the 3' end (in +1 and +2 positions) were indicated as disfavoured targets by APOBEC3G. The G nucleotides within GGGG were found to be targeted at a frequency much lower than expected. We found that the infrequent G-to-A mutation within GGGG is not limited to the inaccessibility, to APOBEC3, of poly Gs in the central and 3' polypurine tracts (PPTs), which remain double stranded during the HIV reverse transcription. GGGG motifs outside the PPTs were also disfavoured. The motifs GGAG and GAGG were also found to be disfavoured targets for APOBEC3. The motif-dependent mutation of G within the HIV genome by members of the APOBEC3 family other than APOBEC3G was limited to GA→AA changes. The results did not show evidence of other types of context dependent G-to-A changes in the HIV genome. PMID:24498164

  8. Motifs in triadic random graphs based on Steiner triple systems

    NASA Astrophysics Data System (ADS)

    Winkler, Marco; Reichardt, Jörg

    2013-08-01

    Conventionally, pairwise relationships between nodes are considered to be the fundamental building blocks of complex networks. However, over the last decade, the overabundance of certain subnetwork patterns, i.e., the so-called motifs, has attracted much attention. It has been hypothesized that these motifs, instead of links, serve as the building blocks of network structures. Although the relation between a network's topology and the general properties of the system, such as its function, its robustness against perturbations, or its efficiency in spreading information, is the central theme of network science, there is still a lack of sound generative models needed for testing the functional role of subgraph motifs. Our work aims to overcome this limitation. We employ the framework of exponential random graph models (ERGMs) to define models based on triadic substructures. The fact that only a small portion of triads can actually be set independently poses a challenge for the formulation of such models. To overcome this obstacle, we use Steiner triple systems (STSs). These are partitions of sets of nodes into pair-disjoint triads, which thus can be specified independently. Combining the concepts of ERGMs and STSs, we suggest generative models capable of generating ensembles of networks with nontrivial triadic Z-score profiles. Further, we discover inevitable correlations between the abundance of triad patterns, which occur solely for statistical reasons and need to be taken into account when discussing the functional implications of motif statistics. Moreover, we calculate the degree distributions of our triadic random graphs analytically.
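
    The defining property of a Steiner triple system mentioned above (every pair of nodes lies in exactly one triad, so triads never share a pair) can be checked directly. The sketch below is only an illustration, not the authors' model: it builds the smallest nontrivial system, the 7-node Fano plane, from the difference set {0, 1, 3} modulo 7, and the function names are hypothetical.

```python
from itertools import combinations

def fano_triples():
    """Seven triads {i, i+1, i+3} mod 7: the Steiner triple system on 7 nodes."""
    return [frozenset((i % 7, (i + 1) % 7, (i + 3) % 7)) for i in range(7)]

def is_steiner_triple_system(n, triples):
    """True if every pair of the n nodes is covered by exactly one triad."""
    covered = {}
    for triad in triples:
        for pair in combinations(sorted(triad), 2):
            covered[pair] = covered.get(pair, 0) + 1
    return all(covered.get(pair, 0) == 1 for pair in combinations(range(n), 2))

triads = fano_triples()
print(is_steiner_triple_system(7, triads))
```

    Because no two triads share a pair of nodes, each triad's pattern can be assigned without constraining any other, which is the independence the abstract relies on.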

  9. DNA containing CpG motifs induces angiogenesis

    NASA Astrophysics Data System (ADS)

    Zheng, Mei; Klinman, Dennis M.; Gierynska, Malgorzata; Rouse, Barry T.

    2002-06-01

    New blood vessel formation in the cornea is an essential step in the pathogenesis of a blinding immunoinflammatory reaction caused by ocular infection with herpes simplex virus (HSV). By using a murine corneal micropocket assay, we found that HSV DNA (which contains a significant excess of potentially bioactive "CpG" motifs when compared with mammalian DNA) induces angiogenesis. Moreover, synthetic oligodeoxynucleotides containing CpG motifs attract inflammatory cells and stimulate the release of vascular endothelial growth factor (VEGF), which in turn triggers new blood vessel formation. In vitro, CpG DNA induces the J774A.1 murine macrophage cell line to produce VEGF. In vivo CpG-induced angiogenesis was blocked by the administration of anti-mVEGF Ab or the inclusion of "neutralizing" oligodeoxynucleotides that specifically oppose the stimulatory activity of CpG DNA. These findings establish that DNA containing bioactive CpG motifs induces angiogenesis, and suggest that CpG motifs in HSV DNA may contribute to the blinding lesions of stromal keratitis.

  10. Themes or Motifs? Aiming for Coherence through Interdisciplinary Outlines.

    ERIC Educational Resources Information Center

    Barton, Keith C.; Smith, Lynne A.

    2000-01-01

    Describes how "motif-units" undermine the potential benefits of integrated thematic instruction. Suggests replacing the term "thematic unit" with the concept of "interdisciplinary outline," which focuses on meaningful content, authentic activities, students' needs, teacher mediation, and a variety of resources. Shows how one fourth-grade teacher…

  11. Folding of helical membrane proteins: the role of polar, GxxxG-like and proline motifs.

    PubMed

    Senes, Alessandro; Engel, Donald E; DeGrado, William F

    2004-08-01

    Helical integral membrane proteins share several structural determinants that are widely conserved across their universe. The discovery of common motifs has furthered our understanding of the features that are important to stability in the membrane environment, while simultaneously providing clues about proteins that lack high-resolution structures. Motif analysis also helps to target mutagenesis studies, and other experimental and computational work. Three types of transmembrane motifs have recently seen interesting developments: the GxxxG motif and its like; polar and hydrogen bonding motifs; and proline motifs.
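
    As a rough illustration of how GxxxG-like motifs are located in a sequence, the sketch below scans for a glycine followed four residues later by glycine or alanine. The lookahead keeps overlapping hits, which matter because such motifs often occur in tandem. The function name and the toy sequence are hypothetical and are not taken from the review.

```python
import re

# G, any three residues, then G or A; the lookahead allows overlapping matches.
GXXXG_LIKE = re.compile(r"(?=(G.{3}[GA]))")

def find_gxxxg_like(sequence):
    """Return (0-based start, matched residues) for each GxxxG/GxxxA occurrence."""
    return [(m.start(), m.group(1)) for m in GXXXG_LIKE.finditer(sequence)]

toy_tm_segment = "LGAIIGLMVGGVVIA"  # illustrative string only
for start, motif in find_gxxxg_like(toy_tm_segment):
    print(start, motif)
```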

  12. Charged Assembly Helix Motif in Murine Leukemia Virus Capsid: an Important Region for Virus Assembly and Particle Size Determination

    PubMed Central

    Cheslock, Sara Rasmussen; Poon, Dexter T. K.; Fu, William; Rhodes, Terence D.; Henderson, Louis E.; Nagashima, Kunio; McGrath, Connor F.; Hu, Wei-Shau

    2003-01-01

    We have identified a region near the C terminus of capsid (CA) of murine leukemia virus (MLV) that contains many charged residues. This motif is conserved in various lengths in most MLV-like viruses. One exception is that spleen necrosis virus (SNV) does not contain a well-defined domain of charged residues. When 33 amino acids of the MLV motif were deleted to mimic SNV CA, the resulting mutant produced drastically reduced amounts of virions and the virions were noninfectious. Furthermore, these viruses had abnormal sizes, often contained punctate structures resembling those in the cell cytoplasm, and packaged both ribosomal and viral RNA. When 11 or 15 amino acids were deleted to modify the MLV CA to resemble those from other gammaretroviruses, the deletion mutants produced virions at levels comparable to those of the wild-type virus and were able to complete one round of virus replication without detectable defects. We generated 10 more mutants that displayed either the wild-type or mutant phenotype. The distribution of the wild-type or mutant phenotype did not directly correlate with the number of amino acids deleted, suggesting that the function of the motif is determined not simply by its length but also by its structure. Structural modeling of the wild-type and mutant proteins suggested that this region forms α-helices; thus, we termed this motif the “charged assembly helix.” This is the first description of the charged assembly helix motif in MLV CA and demonstration of its role in virus budding and assembly. PMID:12768025

  13. The RNA recognition motif domains of RBM5 are required for RNA binding and cancer cell proliferation inhibition

    SciTech Connect

    Zhang, Lei; Zhang, Qing; Yang, Yu; Wu, Chuanfang

    2014-02-14

    Highlights: • RNA recognition motif domains of RBM5 are essential for cell proliferation inhibition. • RNA recognition motif domains of RBM5 are essential for apoptosis induction. • RNA recognition motif domains of RBM5 are essential for RNA binding. • RNA recognition motif domains of RBM5 are essential for caspase-2 alternative splicing. Abstract: RBM5 is a known putative tumor suppressor gene that has been shown to function in cell growth inhibition by modulating apoptosis. RBM5 also plays a critical role in alternative splicing as an RNA binding protein. However, it is still unclear which domains of RBM5 are required for RNA binding and related functional activities. We hypothesized that the two putative RNA recognition motif (RRM) domains of RBM5, spanning amino acids 98–178 and 231–315, are essential for RBM5-mediated cell growth inhibition, apoptosis regulation, and RNA binding. To investigate this hypothesis, we evaluated the activities of wild-type and mutant RBM5 gene transfer in low-RBM5 expressing A549 cells. We found that, unlike wild-type RBM5 (RBM5-wt), an RBM5 mutant lacking the two RRM domains (RBM5-ΔRRM) is unable to bind RNA, has compromised caspase-2 alternative splicing activity, and lacks cell proliferation inhibition and apoptosis induction functions in A549 cells. These data provide direct evidence that the two RRM domains of RBM5 are required for RNA binding and that the RNA binding activity of RBM5 contributes to its function in apoptosis induction and cell growth inhibition.

  14. The OB-fold domain 1 of human POT1 recognizes both telomeric and non-telomeric DNA motifs

    PubMed Central

    Kolar, Carol; Yan, Ying; Borgstahl, Gloria E.O.; Ouellette, Michel M.

    2015-01-01

    The POT1 protein plays a critical role in telomere protection and telomerase regulation. POT1 binds single-stranded 5’-TTAGGGTTAG-3’ and forms a dimer with the TPP1 protein. The dimer is recruited to telomeres, either directly or as part of the Shelterin complex. Human POT1 contains two Oligonucleotide/Oligosaccharide Binding (OB) fold domains, OB1 and OB2, which make physical contact with the DNA. OB1 recognizes 5’-TTAGGG whereas OB2 binds to the downstream TTAG-3’. Studies of POT1 proteins from other species have shown that some of these proteins are able to recognize a broader variety of DNA ligands than expected. To explore this possibility in humans, we have used SELEX to reexamine the sequence-specificity of the protein. Using human POT1 as a selection matrix, high-affinity DNA ligands were selected from a pool of randomized single-stranded oligonucleotides. After six successive rounds of selection, two classes of high-affinity targets were obtained. The first class was composed of oligonucleotides containing a cognate POT1 binding site (5’-TTAGGGTTAG-3’). The second and more abundant class was made of molecules that carried a novel non-telomeric consensus: 5’-TNCANNAGKKKTTAGG-3’ (where K=G/T and N=any base). Binding studies showed that these non-telomeric sites were made of an OB1-binding motif (TTAGG) and a non-telomeric motif (NT motif), with the two motifs recognized by distinct regions of the OB1 domain. POT1 interacted with these non-telomeric binding sites with high affinity and specificity, even when bound to its dimerization partner TPP1. This intrinsic ability of POT1 to recognize NT motifs raises the possibility that the protein may fulfill additional functions at certain non-telomeric locations of the genome, in perhaps gene transcription, replication, or repair. PMID:25934589
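
    A degenerate consensus such as 5’-TNCANNAGKKKTTAGG-3’ can be turned into a search pattern by expanding each ambiguity code into a character class. The sketch below is a minimal illustration that handles only the two codes defined in the abstract (K = G/T, N = any base). The helper names and the example sequence are hypothetical, and matching the pattern says nothing about actual binding affinity.

```python
import re

# Only the ambiguity codes defined in the abstract are expanded here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "K": "[GT]", "N": "[ACGT]"}

def consensus_to_regex(consensus):
    """Translate a degenerate DNA consensus into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus))

NT_CONSENSUS = "TNCANNAGKKKTTAGG"
NT_PATTERN = consensus_to_regex(NT_CONSENSUS)

def find_sites(sequence):
    """Return 0-based start positions of non-overlapping consensus matches."""
    return [m.start() for m in NT_PATTERN.finditer(sequence.upper())]

example = "AAATACAGGAGGTGTTAGGCCC"  # hypothetical oligonucleotide
print(find_sites(example))
```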

  15. Glycines from the APP GXXXG/GXXXA Transmembrane Motifs Promote Formation of Pathogenic Aβ Oligomers in Cells

    PubMed Central

    Decock, Marie; Stanga, Serena; Octave, Jean-Noël; Dewachter, Ilse; Smith, Steven O.; Constantinescu, Stefan N.; Kienlen-Campard, Pascal

    2016-01-01

    Alzheimer’s disease (AD) is the most common neurodegenerative disorder characterized by progressive cognitive decline leading to dementia. The amyloid precursor protein (APP) is a ubiquitous type I transmembrane (TM) protein sequentially processed to generate the β-amyloid peptide (Aβ), the major constituent of senile plaques that are typical AD lesions. There is a growing body of evidence that soluble Aβ oligomers correlate with clinical symptoms associated with the disease. The Aβ sequence begins in the extracellular juxtamembrane region of APP and includes roughly half of the TM domain. This region contains GXXXG and GXXXA motifs, which are critical for both TM protein interactions and fibrillogenic properties of peptides derived from TM α-helices. Glycine-to-leucine mutations of these motifs were previously shown to affect APP processing and Aβ production in cells. However, the detailed contribution of these motifs to APP dimerization, their relation to processing, and the conformational changes they can induce within Aβ species remain undefined. Here, we describe highly resistant Aβ42 oligomers that are produced in cellular membrane compartments. They are formed in cells by processing of the APP amyloidogenic C-terminal fragment (C99), or by direct expression of a peptide corresponding to Aβ42, but not to Aβ40. By a point-mutation approach, we demonstrate that glycine-to-leucine mutations in the G29XXXG33 and G38XXXA42 motifs dramatically affect the Aβ oligomerization process. G33 and G38 in these motifs are specifically involved in Aβ oligomerization; the G33L mutation strongly promotes oligomerization, while G38L blocks it with a dominant effect on G33 residue modification. Finally, we report that the secreted Aβ42 oligomers display pathological properties consistent with their suggested role in AD, but do not induce toxicity in survival assays with neuronal cells. Exposure of neurons to these Aβ42 oligomers dramatically affects neuronal

  16. Exploring water binding motifs to an excess electron via X2(-)(H2O) [X = O, F].

    PubMed

    Chiou, Mong-Feng; Sheu, Wen-Shyan

    2012-07-26

    X2(-)(H2O) [X = O, F] is utilized to explore water binding motifs to an excess electron via ab initio calculations at the MP4(SDQ)/aug-cc-pVDZ + diffs(2s2p,2s2p) level of theory. X2(-)(H2O) can be regarded as a water molecule that binds to an excess electron, the distribution of which is gauged by X2. By varying the interatomic distance of X2, r(X1-X2), the distribution of the excess electron is altered, and the water binding motifs to the excess electron are then examined. Depending on r(X1-X2), binding motifs of both Cs and C2v forms are found, with a critical distance of ∼1.37 Å and ∼1.71 Å for O2(-)(H2O) and F2(-)(H2O), respectively. The energetic and geometrical features of O2(-)(H2O) and F2(-)(H2O) are compared. In addition, various electronic properties of X2(-)(H2O) are examined. For both O2(-)(H2O) and F2(-)(H2O), the Cs binding motif appears to prevail at a compact distribution of the excess electron. However, when the electron is diffuse, characterized by the radius of gyration in the direction of the X2 bond axis with a threshold of ∼0.84 Å, the C2v binding motif is formed.

  17. Convergent evolution and mimicry of protein linear motifs in host-pathogen interactions.

    PubMed

    Chemes, Lucía Beatriz; de Prat-Gay, Gonzalo; Sánchez, Ignacio Enrique

    2015-06-01

    Pathogen linear motif mimics are highly evolvable elements that facilitate rewiring of host protein interaction networks. Host linear motifs and pathogen mimics differ in sequence, leading to thermodynamic and structural differences in the resulting protein-protein interactions. Moreover, the functional output of a mimic depends on the motif and domain repertoire of the pathogen protein. Regulatory evolution mediated by linear motifs can be understood by measuring evolutionary rates, quantifying positive and negative selection and performing phylogenetic reconstructions of linear motif natural history. Convergent evolution of linear motif mimics is widespread among unrelated proteins from viral, prokaryotic and eukaryotic pathogens and can also take place within individual protein phylogenies. Statistics, biochemistry and laboratory models of infection link pathogen linear motifs to phenotypic traits such as tropism, virulence and oncogenicity. In vitro evolution experiments and analysis of natural sequences suggest that changes in linear motif composition underlie pathogen adaptation to a changing environment. PMID:25863584

  18. A Bioinformatics Approach for Detecting Repetitive Nested Motifs using Pattern Matching

    PubMed Central

    Romero, José R.; Carballido, Jessica A.; Garbus, Ingrid; Echenique, Viviana C.; Ponzoni, Ignacio

    2016-01-01

    The identification of nested motifs in genomic sequences is a complex computational problem. The detection of these patterns is important to allow the discovery of transposable element (TE) insertions, incomplete reverse transcripts, deletions, and/or mutations. In this study, a de novo strategy for detecting patterns that represent nested motifs was designed based on exhaustive searches for pairs of motifs and combinatorial pattern analysis. These patterns can be grouped into three categories: motifs within other motifs, motifs flanked by other motifs, and motifs of large size. The methodology used in this study, applied to genomic sequences from the plant species Aegilops tauschii and Oryza sativa, revealed that it is possible to identify putative nested TEs by detecting these three types of patterns. The results were validated through BLAST alignments, which revealed the efficacy and usefulness of the new method, which is called Mamushka. PMID:27812277

  19. FPGA implementation of motifs-based neuronal network and synchronization analysis

    NASA Astrophysics Data System (ADS)

    Deng, Bin; Zhu, Zechen; Yang, Shuangming; Wei, Xile; Wang, Jiang; Yu, Haitao

    2016-06-01

    Motifs in complex networks play a crucial role in determining brain functions. In this paper, 13 kinds of motifs are implemented with a Field Programmable Gate Array (FPGA) to investigate the relationships between network properties and motif properties. We use a discretization method and a pipelined architecture to construct various motifs with the Hindmarsh-Rose (HR) neuron as the node model. We also build a small-world network based on these motifs and conduct a synchronization analysis of the motifs as well as the constructed network. We find that the synchronization properties of the motifs determine those of the motif-based small-world network, which demonstrates the effectiveness of our proposed hardware simulation platform. By imitating some vital nuclei in the brain to generate normal discharges, our proposed FPGA-based artificial neuronal networks have the potential to replace injured nuclei and complete the brain function in the treatment of Parkinson's disease and epilepsy.

  20. Identifiability and inference of pathway motifs by epistasis analysis

    NASA Astrophysics Data System (ADS)

    Phenix, Hilary; Perkins, Theodore; Kærn, Mads

    2013-06-01

    The accuracy of genetic network inference is limited by the assumptions used to determine if one hypothetical model is better than another in explaining experimental observations. Most previous work on epistasis analysis—in which one attempts to infer pathway relationships by determining equivalences among traits following mutations—has been based on Boolean or linear models. Here, we delineate the ultimate limits of epistasis-based inference by systematically surveying all two-gene network motifs and use symbolic algebra with arbitrary regulation functions to examine trait equivalences. Our analysis divides the motifs into equivalence classes, where different genetic perturbations result in indistinguishable experimental outcomes. We demonstrate that this partitioning can reveal important information about network architecture, and show, using simulated data, that it greatly improves the accuracy of genetic network inference methods. Because of the minimal assumptions involved, equivalence partitioning has broad applicability for gene network inference.

  1. Identifiability and inference of pathway motifs by epistasis analysis.

    PubMed

    Phenix, Hilary; Perkins, Theodore; Kærn, Mads

    2013-06-01

    The accuracy of genetic network inference is limited by the assumptions used to determine if one hypothetical model is better than another in explaining experimental observations. Most previous work on epistasis analysis (in which one attempts to infer pathway relationships by determining equivalences among traits following mutations) has been based on Boolean or linear models. Here, we delineate the ultimate limits of epistasis-based inference by systematically surveying all two-gene network motifs and use symbolic algebra with arbitrary regulation functions to examine trait equivalences. Our analysis divides the motifs into equivalence classes, where different genetic perturbations result in indistinguishable experimental outcomes. We demonstrate that this partitioning can reveal important information about network architecture, and show, using simulated data, that it greatly improves the accuracy of genetic network inference methods. Because of the minimal assumptions involved, equivalence partitioning has broad applicability for gene network inference. PMID:23822501

  2. Characterization of two VQIXXK motifs for tau fibrillization in vitro.

    PubMed

    Li, Wenkai; Lee, Virginia M-Y

    2006-12-26

    Tau proteins are building blocks of the filaments that form neurofibrillary tangles of Alzheimer's disease (AD) and related neurodegenerative tauopathies. It was recently reported that two VQIXXK motifs in the microtubule (MT) binding region, named PHF6 and PHF6*, are responsible for tau fibrillization. However, the exact role each of these motifs plays in this process has not been analyzed in detail. Using a recombinant human tau fragment containing only the four MT-binding repeats (K18), we show that deletion of either PHF6 or PHF6* affected tau assembly but only PHF6 is essential for filament formation, suggesting a critical role of this motif. To determine the amino acid residues within PHF6 that are required for tau fibrillization, a series of deletion and mutation constructs targeting this motif were generated. Deletion of VQI in either PHF6 or PHF6* lessened but did not eliminate K18 fibrillization. However, removal of the single K311 residue from PHF6 completely abrogated the fibril formation of K18. K311D mutation of K18 inhibited tau filament formation, while K311A and K311R mutations had no effect. These data imply that charge change at position 311 is important in tau fibril formation. A similar requirement of nonnegative charge at this position for fibrillization was observed with the full-length human tau isoform (T40), and data from these studies indicate that the formation of fibrils by T40K311D and T40K311P mutants is repressed at the nucleation phase. These findings provide important insights into the mechanisms of tau fibrillization and suggest targets for AD drug discovery to ameliorate neurodegeneration mediated by filamentous tau pathologies.

  3. Graph animals, subgraph sampling, and motif search in large networks

    NASA Astrophysics Data System (ADS)

    Baskerville, Kim; Grassberger, Peter; Paczuski, Maya

    2007-09-01

    We generalize a sampling algorithm for lattice animals (connected clusters on a regular lattice) to a Monte Carlo algorithm for “graph animals,” i.e., connected subgraphs in arbitrary networks. As with the algorithm in [N. Kashtan et al., Bioinformatics 20, 1746 (2004)], it provides a weighted sample, but the computation of the weights is much faster (linear in the size of subgraphs, instead of superexponential). This allows subgraphs with up to ten or more nodes to be sampled with very high statistics, from arbitrarily large networks. Using this together with a heuristic algorithm for rapidly classifying isomorphic graphs, we present results for two protein interaction networks obtained using the tandem affinity purification (TAP) method: one of Escherichia coli with 230 nodes and 695 links, and one for yeast (Saccharomyces cerevisiae) with roughly ten times more nodes and links. We find in both cases that most connected subgraphs are strong motifs (Z scores > 10) or antimotifs (Z scores < -10) when the null model is the ensemble of networks with fixed degree sequence. Strong differences appear between the two networks, with dominant motifs in E. coli being (nearly) bipartite graphs and having many pairs of nodes that connect to the same neighbors, while dominant motifs in yeast tend towards completeness or contain large cliques. We also explore a number of methods that do not rely on measurements of Z scores or comparisons with null models. For instance, we discuss the influence of specific complexes like the 26S proteasome in yeast, where a small number of complexes dominate the k cores with large k and have a decisive effect on the strongest motifs with 6-8 nodes. We also present Zipf plots of counts versus rank. They show broad distributions that are not power laws, in contrast to the case when disconnected subgraphs are included.
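
    The Z scores quoted above compare a subgraph's count in the real network with its counts across a null ensemble. A minimal sketch of that comparison follows, assuming the usual definition Z = (observed - mean of null counts) / standard deviation of null counts and the ±10 thresholds from the abstract. The function names are hypothetical, the use of the population standard deviation is an assumption, and the null counts would have to come from a separate degree-preserving randomization step that is not shown.

```python
from statistics import mean, pstdev

def z_score(observed, null_counts):
    """Z score of an observed subgraph count against counts from a null ensemble."""
    spread = pstdev(null_counts)  # population standard deviation (an assumption)
    if spread == 0:
        raise ValueError("null ensemble has zero variance; Z score is undefined")
    return (observed - mean(null_counts)) / spread

def classify(observed, null_counts, threshold=10.0):
    """Label a subgraph using the thresholds quoted in the abstract."""
    z = z_score(observed, null_counts)
    if z > threshold:
        return "motif"
    if z < -threshold:
        return "antimotif"
    return "neither"

null = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical counts from randomized networks
print(z_score(10, null), classify(27, null))
```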

  4. Motif, the basics: an overview of the widget set

    SciTech Connect

    McClurg, F.R.

    1992-10-01

    The Motif library provides programmers with a rich set of tools for building a graphical user interface with a three-dimensional appearance and a consistent method of interaction for controlling a Unix application. This Xt-based, high-level library presents an "object-oriented" approach to program design for programmers and allows end-users the flexibility to modify attributes of the interface.

  5. Biosynthesis of caffeine underlying the diversity of motif B' methyltransferase.

    PubMed

    Nakayama, Fumiyo; Mizuno, Kouichi; Kato, Misako

    2015-05-01

    Caffeine (1,3,7-trimethylxanthine) and theobromine (3,7-dimethylxanthine) are well-known purine alkaloids in Camellia, Coffea, Cola, Paullinia, Ilex, and Theobroma spp. The caffeine biosynthetic pathway depends on the substrate specificity of N-methyltransferases, which are members of the motif B' methyl-transferase family. The caffeine biosynthetic pathways in purine alkaloid-containing plants might have evolved in parallel with one another, consistent with different catalytic properties of the enzymes involved in these pathways. PMID:26058161

  6. Biomolecular network motif counting and discovery by color coding.

    PubMed

    Alon, Noga; Dao, Phuong; Hajirasouliha, Iman; Hormozdiari, Fereydoun; Sahinalp, S Cenk

    2008-07-01

    Protein-protein interaction (PPI) networks of many organisms share global topological features such as degree distribution, k-hop reachability, betweenness and closeness. Yet, some of these networks can differ significantly from the others in terms of local structures: e.g. the number of specific network motifs can vary significantly among PPI networks. Counting the number of network motifs provides a major challenge to compare biomolecular networks. Recently developed algorithms have been able to count the number of induced occurrences of subgraphs with k ≤ 7 vertices. Yet no practical algorithm exists for counting non-induced occurrences, or counting subgraphs with k ≥ 8 vertices. Counting non-induced occurrences of network motifs is not only challenging but also quite desirable, as available PPI networks include several false interactions and miss many others. In this article, we show how to apply the 'color coding' technique for counting non-induced occurrences of subgraph topologies in the form of trees and bounded treewidth subgraphs. Our algorithm can count all occurrences of motif G' with k vertices in a network G with n vertices in time polynomial in n, provided k = O(log n). We use our algorithm to obtain 'treelet' distributions for k ≤ 10 of available PPI networks of unicellular organisms (Saccharomyces cerevisiae, Escherichia coli and Helicobacter pylori), which are all quite similar, and a multicellular organism (Caenorhabditis elegans), which is significantly different. Furthermore, the treelet distributions of the unicellular organisms are similar to that obtained by the 'duplication model' but are quite different from that of the 'preferential attachment model'. The treelet distribution is robust w.r.t. sparsification with bait/edge coverage of 70%, but differences can be observed when bait/edge coverage drops to 50%. PMID:18586721
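
    The color-coding idea in this abstract can be illustrated on the simplest tree topology, a path on k vertices. The sketch below is not the authors' implementation; the function names, the dictionary-based graph representation, and the number of trials are assumptions made for illustration. Each vertex receives one of k random colors, "colorful" paths (all colors distinct) are counted by dynamic programming over color sets, and the count is rescaled by the probability k!/k^k that a given path is colorful.

```python
import random
from math import factorial


def count_colorful_paths(adj, colors, k):
    """Count simple paths on k vertices whose vertices all carry distinct colors.

    adj    : dict mapping each vertex to an iterable of neighbours (undirected)
    colors : dict mapping each vertex to a color in range(k)
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    # table[(v, mask)] = number of colorful paths ending at v
    # whose vertices use exactly the colors in the bitmask `mask`.
    table = {(v, 1 << colors[v]): 1 for v in adj}
    for _ in range(k - 1):
        extended = {}
        for (v, mask), count in table.items():
            for u in adj[v]:
                bit = 1 << colors[u]
                if mask & bit:  # color already used, so the path is not colorful
                    continue
                key = (u, mask | bit)
                extended[key] = extended.get(key, 0) + count
        table = extended
    # Every undirected path was built once from each of its two ends.
    return sum(table.values()) // 2


def estimate_path_count(adj, k, trials=200, seed=0):
    """Estimate the number of non-induced k-vertex paths by repeated random coloring."""
    rng = random.Random(seed)
    p_colorful = factorial(k) / k ** k  # chance that a fixed path is colorful
    total = 0
    for _ in range(trials):
        colors = {v: rng.randrange(k) for v in adj}
        total += count_colorful_paths(adj, colors, k)
    return total / trials / p_colorful
```

    Because distinct colors force distinct vertices, the dynamic program never has to remember which vertices a partial path visited, only which colors; that is what keeps the table size at n·2^k rather than growing with the number of vertex subsets.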

  7. Biosynthesis of caffeine underlying the diversity of motif B' methyltransferase.

    PubMed

    Nakayama, Fumiyo; Mizuno, Kouichi; Kato, Misako

    2015-05-01

    Caffeine (1,3,7-trimethylxanthine) and theobromine (3,7-dimethylxanthine) are well-known purine alkaloids in Camellia, Coffea, Cola, Paullinia, Ilex, and Theobroma spp. The caffeine biosynthetic pathway depends on the substrate specificity of N-methyltransferases, which are members of the motif B' methyl-transferase family. The caffeine biosynthetic pathways in purine alkaloid-containing plants might have evolved in parallel with one another, consistent with different catalytic properties of the enzymes involved in these pathways.

  8. Motif, the basics: an overview of the widget set

    SciTech Connect

    McClurg, F.R.

    1992-10-01

    The Motif library provides programmers with a rich set of tools for building a graphical user interface with a three-dimensional appearance and a consistent method of interaction for controlling a Unix application. This Xt-based, high-level library presents an "object-oriented" approach to program design for programmers and allows end-users the flexibility to modify attributes of the interface.

  9. Maximum likelihood density modification by pattern recognition of structural motifs

    DOEpatents

    Terwilliger, Thomas C.

    2004-04-13

    An electron density for a crystallographic structure having protein regions and solvent regions is improved by maximizing the log likelihood of a set of structure factors $\{F_h\}$ using a local log-likelihood function: $\ln\bigl[\,p(\rho(x)\mid\mathrm{PROT})\,p_{\mathrm{PROT}}(x) + p(\rho(x)\mid\mathrm{SOLV})\,p_{\mathrm{SOLV}}(x) + p(\rho(x)\mid\mathrm{H})\,p_{\mathrm{H}}(x)\,\bigr]$, where $p_{\mathrm{PROT}}(x)$ is the probability that x is in the protein region, $p(\rho(x)\mid\mathrm{PROT})$ is the conditional probability for $\rho(x)$ given that x is in the protein region, and $p_{\mathrm{SOLV}}(x)$ and $p(\rho(x)\mid\mathrm{SOLV})$ are the corresponding quantities for the solvent region, $p_{\mathrm{H}}(x)$ refers to the probability that there is a structural motif at a known location, with a known orientation, in the vicinity of the point x; and $p(\rho(x)\mid\mathrm{H})$ is the probability distribution for electron density at this point given that the structural motif actually is present. One appropriate structural motif is a helical structure within the crystallographic structure.

  10. Binding cofactors with triplex-based DNA motifs.

    PubMed

    Kröner, Christoph; Göckel, Anja; Liu, Wenjing; Richert, Clemens

    2013-11-18

    Cofactors are pivotal compounds for the cell and many biotechnological processes. It is therefore interesting to ask how well cofactors can be bound by oligonucleotides designed not to convert but to store and release these biomolecules. Here we show that triplex-based DNA binding motifs can be used to bind nucleotides and cofactors, including NADH, FAD, SAM, acetyl CoA, and tetrahydrofolate (THF). Dissociation constants between 0.1 μM for SAM and 35 μM for THF were measured. A two-nucleotide gap still binds NADH. The selectivity for one ligand over the others can be changed by changing the sequence of the binding pocket. For example, a mismatch placed in one of the two triplets adjacent to the base-pairing site changes the selectivity, favoring the binding of FAD over that of ATP. Further, changing one of the two thymines of an A-binding motif to cytosine gives significant affinity for G, whereas changing the other does not. Immobilization of DNA motifs gives beads that store NADH. Exploratory experiments show that the beads release the cofactor upon warming to body temperature.

  11. MAR characteristic motifs mediate episomal vector in CHO cells.

    PubMed

    Lin, Yan; Li, Zhaoxi; Wang, Tianyun; Wang, Xiaoyin; Wang, Li; Dong, Weihua; Jing, Changqin; Yang, Xianjun

    2015-04-01

    An ideal gene therapy vector should enable persistent transgene expression without limitations in safety and reproducibility. Recent insight into the ability of chromosomal matrix attachment regions (MARs) to mediate episomal maintenance of genetic elements has allowed the development of a circular episomal vector. Although a MAR-mediated engineered vector has been developed, little is known about which MAR motifs confer this function during interaction with the host genome. Here, we report an artificially synthesized DNA fragment containing only characteristic motif sequences that served as an alternative to the human beta-interferon matrix attachment region sequence. The potential of the vector to mediate gene transfer in CHO cells was investigated. The short synthetic MAR motifs were found to mediate episomal maintenance of the vector at a low copy number for many generations without integration into the host genome. Higher transgene expression was maintained for at least 4 months. In addition, the vector was maintained episomally and conferred sustained EGFP expression even in nonselective CHO cells. Together, the results demonstrate that a vector based on MAR characteristic sequences can function as a stable episome in CHO cells, supporting long-term and effective transgene expression.

  12. A novel swarm intelligence algorithm for finding DNA motifs

    PubMed Central

    Lei, Chengwei; Ruan, Jianhua

    2010-01-01

    Discovering DNA motifs from co-expressed or co-regulated genes is an important step towards deciphering complex gene regulatory networks and understanding gene functions. Despite significant improvement in the last decade, it still remains one of the most challenging problems in computational molecular biology. In this work, we propose a novel motif finding algorithm that finds consensus patterns using a population-based stochastic optimisation technique called Particle Swarm Optimisation (PSO), which has been shown to be effective in optimising difficult multidimensional problems in continuous domains. We propose to use a word dissimilarity graph to remap the neighborhood structure of the solution space of DNA motifs, and propose a modification of the naive PSO algorithm to accommodate discrete variables. In order to improve efficiency, we also propose several strategies for escaping from local optima and for automatically determining the termination criteria. Experimental results on simulated challenge problems show that our method is both more efficient and more accurate than several existing algorithms. Applications to several sets of real promoter sequences also show that our approach is able to detect known transcription factor binding sites, and outperforms two of the most popular existing algorithms. PMID:20090174

  13. Structure and ubiquitin binding of the ubiquitin-interacting motif

    SciTech Connect

    Fisher, R.; Wang, B.; Alam, S.; Higginson, D.; Robinson, H.; Sundquist, C.; Hill, C.

    2003-01-01

    Ubiquitylation is used to target proteins into a large number of different biological processes including proteasomal degradation, endocytosis, virus budding, and vacuolar protein sorting (Vps). Ubiquitylated proteins are typically recognized using one of several different conserved ubiquitin binding modules. Here, we report the crystal structure and ubiquitin binding properties of one such module, the ubiquitin-interacting motif (UIM). We found that UIM peptides from several proteins involved in endocytosis and vacuolar protein sorting including Hrs, Vps27p, Stam1, and Eps15 bound specifically, but with modest affinity (K_d = 0.1-1 mM), to free ubiquitin. Full affinity ubiquitin binding required the presence of conserved acidic patches at the N and C terminus of the UIM, as well as highly conserved central alanine and serine residues. NMR chemical shift perturbation mapping experiments demonstrated that all of these UIM peptides bind to the I44 surface of ubiquitin. The 1.45 Å resolution crystal structure of the second yeast Vps27p UIM (Vps27p-2) revealed that the ubiquitin-interacting motif forms an amphipathic helix. Although Vps27p-2 is monomeric in solution, the motif unexpectedly crystallized as an antiparallel four-helix bundle, and the potential biological implications of UIM oligomerization are therefore discussed.

  14. An update on cell surface proteins containing extensin-motifs.

    PubMed

    Borassi, Cecilia; Sede, Ana R; Mecchia, Martin A; Salgado Salter, Juan D; Marzol, Eliana; Muschietti, Jorge P; Estevez, Jose M

    2016-01-01

    In recent years it has become clear that there are several molecular links that interconnect the plant cell surface continuum, which is highly important in many biological processes such as plant growth, development, and interaction with the environment. The plant cell surface continuum can be defined as the space that contains and interlinks the cell wall, plasma membrane and cytoskeleton compartments. In this review, we provide an updated view of cell surface proteins that include modular domains with an extensin (EXT)-motif followed by a cytoplasmic kinase-like domain, known as PERKs (for proline-rich extensin-like receptor kinases); with an EXT-motif and an actin binding domain, known as formins; and with extracellular hybrid-EXTs. We focus our attention on the EXT-motifs with the short sequence Ser-Pro(3-5), which is found in several different protein contexts within the same extracellular space, highlighting a putative conserved structural and functional role. A closer understanding of the dynamic regulation of plant cell surface continuum and its relationship with the downstream signalling cascade is a crucial forthcoming challenge.

  15. Event Networks and the Identification of Crime Pattern Motifs

    PubMed Central

    2015-01-01

    In this paper we demonstrate the use of network analysis to characterise patterns of clustering in spatio-temporal events. Such clustering is of both theoretical and practical importance in the study of crime, and forms the basis for a number of preventative strategies. However, existing analytical methods show only that clustering is present in data, while offering little insight into the nature of the patterns present. Here, we show how the classification of pairs of events as close in space and time can be used to define a network, thereby generalising previous approaches. The application of graph-theoretic techniques to these networks can then offer significantly deeper insight into the structure of the data than previously possible. In particular, we focus on the identification of network motifs, which have clear interpretation in terms of spatio-temporal behaviour. Statistical analysis is complicated by the nature of the underlying data, and we provide a method by which appropriate randomised graphs can be generated. Two datasets are used as case studies: maritime piracy at the global scale, and residential burglary in an urban area. In both cases, the same significant 3-vertex motif is found; this result suggests that incidents tend to occur not just in pairs, but in fact in larger groups within a restricted spatio-temporal domain. In the 4-vertex case, different motifs are found to be significant in each case, suggesting that this technique is capable of discriminating between clustering patterns at a finer granularity than previously possible. PMID:26605544
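
    The network construction described in this abstract can be sketched in a few lines. The thresholds, the (x, y, t) event format, and the choice of the triangle as the 3-vertex pattern to count are illustrative assumptions, not details taken from the paper: two events are linked when they are close in both space and time, and motifs are then counted on the resulting graph.

```python
from itertools import combinations
from math import hypot


def build_event_network(events, max_dist, max_dt):
    """Link every pair of events that is close in both space and time.

    events   : list of (x, y, t) tuples (hypothetical format)
    max_dist : largest spatial separation that still counts as "close"
    max_dt   : largest time separation that still counts as "close"
    """
    adj = {i: set() for i in range(len(events))}
    for i, j in combinations(range(len(events)), 2):
        xi, yi, ti = events[i]
        xj, yj, tj = events[j]
        if hypot(xi - xj, yi - yj) <= max_dist and abs(ti - tj) <= max_dt:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def count_triangles(adj):
    """Count 3-vertex subgraphs in which all three pairs of events are linked."""
    return sum(
        1
        for a, b, c in combinations(sorted(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )


# Three events clustered in space and time, one far away, one much later.
events = [(0, 0, 0), (1, 0, 1), (0, 1, 2), (10, 10, 1), (0.5, 0.5, 30)]
network = build_event_network(events, max_dist=2.0, max_dt=3.0)
n_triangles = count_triangles(network)
```

    Whether such a count is significant still has to be judged against suitably randomised graphs, which is the step the abstract notes is complicated by the nature of the data.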

  16. A cost-aggregating integer linear program for motif finding.

    PubMed

    Kingsford, Carl; Zaslavsky, Elena; Singh, Mona

    2011-12-01

    In the motif finding problem one seeks a set of mutually similar substrings within a collection of biological sequences. This is an important and widely-studied problem, as such shared motifs in DNA often correspond to regulatory elements. We study a combinatorial framework where the goal is to find substrings of a given length such that the sum of their pairwise distances is minimized. We describe a novel integer linear program for the problem, which uses the fact that distances between substrings come from a limited set of possibilities allowing for aggregate consideration of sequence position pairs with the same distances. We show how to tighten its linear programming relaxation by adding an exponential set of constraints and give an efficient separation algorithm that can find violated constraints, thereby showing that the tightened linear program can still be solved in polynomial time. We apply our approach to find optimal solutions for the motif finding problem and show that it is effective in practice in uncovering known transcription factor binding sites.
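
    The combinatorial objective stated here (one substring of a given length per sequence, minimizing the sum of pairwise distances) can be made concrete with an exhaustive search. This is only a toy baseline for tiny inputs, assuming Hamming distance and hypothetical function names; it is not the integer linear program the abstract describes, which exists precisely because enumeration of this kind does not scale.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_motif(seqs, length):
    """Pick one substring per sequence minimizing the sum of pairwise distances.

    Returns (cost, chosen_substrings). Runs in time exponential in len(seqs).
    """
    candidates = [
        [s[i:i + length] for i in range(len(s) - length + 1)] for s in seqs
    ]
    best_cost, best_choice = None, None
    for choice in product(*candidates):
        cost = sum(hamming(a, b) for a, b in combinations(choice, 2))
        if best_cost is None or cost < best_cost:
            best_cost, best_choice = cost, choice
    return best_cost, best_choice


cost, motif_instances = best_motif(["AAACGTAA", "TTACGTTT", "GGGACGTG"], 4)
```

    In this toy input the 4-mer ACGT occurs in all three sequences, so the optimal cost is zero.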

  17. TOPDOM: database of conservatively located domains and motifs in proteins

    PubMed Central

    Varga, Julia; Dobson, László; Tusnády, Gábor E.

    2016-01-01

    Summary: The TOPDOM database—originally created as a collection of domains and motifs located consistently on the same side of the membranes in α-helical transmembrane proteins—has been updated and extended by taking into consideration consistently localized domains and motifs in globular proteins, too. By taking advantage of the recently developed CCTOP algorithm to determine the type of a protein and predict topology in case of transmembrane proteins, and by applying a thorough search for domains and motifs as well as utilizing the most up-to-date version of all source databases, we managed to reach a 6-fold increase in the size of the whole database and a 2-fold increase in the number of transmembrane proteins. Availability and implementation: TOPDOM database is available at http://topdom.enzim.hu. The webpage utilizes the common Apache, PHP5 and MySQL software to provide the user interface for accessing and searching the database. The database itself is generated on a high performance computer. Contact: tusnady.gabor@ttk.mta.hu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153630

  18. Motif structure and cooperation in real-world complex networks

    NASA Astrophysics Data System (ADS)

    Salehi, Mostafa; Rabiee, Hamid R.; Jalili, Mahdi

    2010-12-01

    Networks of dynamical nodes serve as generic models for real-world systems in many branches of science ranging from mathematics to physics, technology, sociology and biology. Collective behavior of agents interacting over complex networks is important in many applications. The cooperation between selfish individuals is one of the most interesting collective phenomena. In this paper we address the interplay between the motifs’ cooperation properties and their abundance in a number of real-world networks including yeast protein-protein interaction, human brain, protein structure, email communication, dolphins’ social interaction, Zachary karate club and Net-science coauthorship networks. First, the amount of cooperativity for all possible undirected subgraphs with three to six nodes is calculated. To this end, the evolutionary dynamics of the Prisoner’s Dilemma game is considered and the cooperativity of each subgraph is calculated as the percentage of cooperating agents at the end of the simulation time. Then, the three- to six-node motifs are extracted for each network. The significance of the abundance of a motif, represented by a Z-value, is obtained by comparing its abundance with that in properly randomized versions of the original network. We found that there is always a group of motifs showing a significant inverse correlation between their cooperativity and Z-value, i.e. the higher the Z-value, the lower the cooperativity. This suggests that networks composed of well-structured units do not have good cooperativity properties.
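
    The Z-value mentioned in this abstract compares a motif count in the real network with counts in randomized networks. The sketch below assumes a common convention, Z = (N_real − mean(N_rand)) / std(N_rand), with randomization by degree-preserving edge swaps and the triangle as the example motif; the function names and parameters are hypothetical and the paper's exact randomization procedure may differ.

```python
import random
from itertools import combinations
from statistics import mean, stdev


def count_triangles(edges):
    """Count triangles in an undirected simple graph given as an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return sum(
        1
        for a, b, c in combinations(sorted(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )


def degree_preserving_shuffle(edges, n_swaps, rng):
    """Randomize a simple graph by edge swaps that keep every vertex degree fixed."""
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:  # pick one of the two possible rewirings
            c, d = d, c
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        # Reject swaps that would create a self-loop or a duplicate edge.
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue
        edge_set.discard(edge_list[i])
        edge_set.discard(edge_list[j])
        edge_set.update((new1, new2))
        edge_list[i], edge_list[j] = new1, new2
    return edge_list


def motif_z_score(real_count, random_counts):
    """Z = (real - mean(random)) / sample standard deviation of random counts."""
    sigma = stdev(random_counts)
    if sigma == 0:
        raise ValueError("randomized counts have zero variance")
    return (real_count - mean(random_counts)) / sigma
```

    A typical use would count the motif in the observed edge list, call `degree_preserving_shuffle` many times with a seeded `random.Random`, count the motif in each shuffled copy, and pass both to `motif_z_score`.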

  19. The mammalian heterochromatin protein 1 binds diverse nuclear proteins through a common motif that targets the chromoshadow domain

    SciTech Connect

    Lechner, Mark S. (E-mail: msl27@drexel.edu); Schultz, David C.; Negorev, Dmitri; Maul, Gerd G.; Rauscher, Frank J.

    2005-06-17

    The HP1 proteins regulate epigenetic gene silencing by promoting and maintaining chromatin condensation. The HP1 chromodomain binds to methylated histone H3. More enigmatic is the chromoshadow domain (CSD), which mediates dimerization, transcription repression, and interaction with multiple nuclear proteins. Here we show that KAP-1, CAF-1 p150, and NIPBL carry a canonical amino acid motif, PxVxL, which binds directly to the CSD with high affinity. We also define a new class of variant PxVxL CSD-binding motifs in Sp100A, LBR, and ATRX. Both canonical and variant motifs recognize a similar surface of the CSD dimer, as demonstrated by a panel of CSD mutants. These in vitro binding results were confirmed by the analysis of polypeptides found associated with nuclear HP1 complexes, and we provide the first evidence of the NIPBL/delangin protein in human cells, a protein recently implicated in the developmental disorder Cornelia de Lange syndrome. NIPBL is related to Nipped-B, a factor participating in gene activation by remote enhancers in Drosophila melanogaster. Thus, this spectrum of direct binding partners suggests an expanded role for HP1 as a factor participating in promoter-enhancer communication, chromatin remodeling/assembly, and sub-nuclear compartmentalization.

  20. Kalata B8, a novel antiviral circular protein, exhibits conformational flexibility in the cystine knot motif.

    PubMed

    Daly, Norelle L; Clark, Richard J; Plan, Manuel R; Craik, David J

    2006-02-01

    The cyclotides are a family of circular proteins with a range of biological activities and potential pharmaceutical and agricultural applications. The biosynthetic mechanism of cyclization is unknown and the discovery of novel sequences may assist in achieving this goal. In the present study, we have isolated a new cyclotide from Oldenlandia affinis, kalata B8, which appears to be a hybrid of the two major subfamilies (Möbius and bracelet) of currently known cyclotides. We have determined the three-dimensional structure of kalata B8 and observed broadening of resonances directly involved in the cystine knot motif, suggesting flexibility in this region despite it being the core structural element of the cyclotides. The cystine knot motif is widespread throughout Nature and inherently stable, making this apparent flexibility a surprising result. Furthermore, there appears to be isomerization of the peptide backbone at an Asp-Gly sequence in the region involved in the cyclization process. Interestingly, such isomerization has been previously characterized in related cyclic knottins from Momordica cochinchinensis that have no sequence similarity to kalata B8 apart from the six conserved cysteine residues and may result from a common mechanism of cyclization. Kalata B8 also provides insight into the structure-activity relationships of cyclotides as it displays anti-HIV activity but lacks haemolytic activity. The 'uncoupling' of these two activities has not previously been observed for the cyclotides and may be related to the unusual hydrophilic nature of the peptide. PMID:16207177

  1. Binding Mode of Acetylated Histones to Bromodomains: Variations on a Common Motif.

    PubMed

    Marchand, Jean-Rémy; Caflisch, Amedeo

    2015-08-01

    Bromodomains, epigenetic readers that recognize acetylated lysine residues in histone tails, are potential drug targets in cancer and inflammation. Herein we review the crystal structures of human bromodomains in complex with histone tails and analyze the main interaction motifs. The histone backbone is extended and occupies, in one of the two possible orientations, the bromodomain surface groove lined by the ZA and BC loops. The acetyl-lysine side chain is buried in the cavity between the four helices of the bromodomain, and its oxygen atom accepts hydrogen bonds from a structural water molecule and a conserved asparagine residue in the BC loop. In stark contrast to this common binding motif, a large variety of ancillary interactions emerge from our analysis. In 10 of 26 structures, a basic side chain (up to five residues up- or downstream in sequence with respect to the acetyl-lysine) interacts with the carbonyl groups of the C-terminal turn of helix αB. Furthermore, the complexes reveal many heterogeneous backbone hydrogen bonds (direct or water-bridged). These interactions contribute unselectively to the binding of acetylated histone tails to bromodomains, which provides further evidence that specific recognition is modulated by combinations of multiple histone modifications and multiple modules of the proteins involved in transcription.

  2. Beyond consensus: statistical free energies reveal hidden interactions in the design of a TPR motif.

    PubMed

    Magliery, Thomas J; Regan, Lynne

    2004-10-22

    Consensus design methods have been used successfully to engineer proteins with a particular fold, and moreover to engineer thermostable exemplars of particular folds. Here, we consider how a statistical free energy approach can expand upon current methods of phylogenetic design. As an example, we have analyzed the tetratricopeptide repeat (TPR) motif, using multiple sequence alignment to identify the significance of each position in the TPR. The results provide information above and beyond that revealed by consensus design alone, especially at poorly conserved positions. A particularly striking finding is that certain residues, which TPR-peptide co-crystal structures show are in direct contact with the ligand, display a marked hypervariability. This suggests a novel means of identifying ligand-binding sites, and also implies that TPRs generally function as ligand-binding domains. Using perturbation analysis (or statistical coupling analysis), we examined site-site interactions within the TPR motif. Correlated occurrences of amino acid residues at poorly conserved positions explain how TPRs achieve their near-neutral surface charge distributions, and why a TPR designed from straight consensus has an unusually high net charge. Networks of interacting sites revealed that TPRs fall into two unrecognized families with distinct sets of interactions related to the identity of position 7 (Leu or Lys/Arg). Statistical free energy analysis provides a more complete description of "What makes a TPR a TPR?" than consensus alone, and it suggests general approaches to extend and improve the phylogenetic design of proteins.

  3. Kalata B8, a novel antiviral circular protein, exhibits conformational flexibility in the cystine knot motif.

    PubMed

    Daly, Norelle L; Clark, Richard J; Plan, Manuel R; Craik, David J

    2006-02-01

    The cyclotides are a family of circular proteins with a range of biological activities and potential pharmaceutical and agricultural applications. The biosynthetic mechanism of cyclization is unknown and the discovery of novel sequences may assist in achieving this goal. In the present study, we have isolated a new cyclotide from Oldenlandia affinis, kalata B8, which appears to be a hybrid of the two major subfamilies (Möbius and bracelet) of currently known cyclotides. We have determined the three-dimensional structure of kalata B8 and observed broadening of resonances directly involved in the cystine knot motif, suggesting flexibility in this region despite it being the core structural element of the cyclotides. The cystine knot motif is widespread throughout Nature and inherently stable, making this apparent flexibility a surprising result. Furthermore, there appears to be isomerization of the peptide backbone at an Asp-Gly sequence in the region involved in the cyclization process. Interestingly, such isomerization has been previously characterized in related cyclic knottins from Momordica cochinchinensis that have no sequence similarity to kalata B8 apart from the six conserved cysteine residues and may result from a common mechanism of cyclization. Kalata B8 also provides insight into the structure-activity relationships of cyclotides as it displays anti-HIV activity but lacks haemolytic activity. The 'uncoupling' of these two activities has not previously been observed for the cyclotides and may be related to the unusual hydrophilic nature of the peptide.

  4. Structure of a (Cys3His) zinc ribbon, a ubiquitous motif in archaeal and eucaryal transcription.

    PubMed

    Chen, H T; Legault, P; Glushka, J; Omichinski, J G; Scott, R A

    2000-09-01

    Transcription factor IIB (TFIIB) is an essential component in the formation of the transcription initiation complex in eucaryal and archaeal transcription. TFIIB interacts with a promoter complex containing the TATA-binding protein (TBP) to facilitate interaction with RNA polymerase II (RNA pol II) and the associated transcription factor IIF (TFIIF). TFIIB contains a zinc-binding motif near the N-terminus that is directly involved in the interaction with RNA pol II/TFIIF and plays a crucial role in selecting the transcription initiation site. The solution structure of the N-terminal residues 2-59 of human TFIIB was determined by multidimensional NMR spectroscopy. The structure consists of a nearly tetrahedral Zn(Cys)3(His)1 site confined by type I and "rubredoxin" turns, three antiparallel beta-strands, and disordered loops. The structure is similar to the reported zinc-ribbon motifs in several transcription-related proteins from archaea and eucarya, including Pyrococcus furiosus transcription factor B (PfTFB), human and yeast transcription factor IIS (TFIIS), and Thermococcus celer RNA polymerase II subunit M (TcRPOM). The zinc-ribbon structure of TFIIB, in conjunction with the biochemical analyses, suggests that residues on the beta-sheet are involved in the interaction with RNA pol II/TFIIF, while the zinc-binding site may increase the stability of the beta-sheet. PMID:11045620

  5. Structure of a (Cys3His) zinc ribbon, a ubiquitous motif in archaeal and eucaryal transcription.

    PubMed Central

    Chen, H. T.; Legault, P.; Glushka, J.; Omichinski, J. G.; Scott, R. A.

    2000-01-01

    Transcription factor IIB (TFIIB) is an essential component in the formation of the transcription initiation complex in eucaryal and archaeal transcription. TFIIB interacts with a promoter complex containing the TATA-binding protein (TBP) to facilitate interaction with RNA polymerase II (RNA pol II) and the associated transcription factor IIF (TFIIF). TFIIB contains a zinc-binding motif near the N-terminus that is directly involved in the interaction with RNA pol II/TFIIF and plays a crucial role in selecting the transcription initiation site. The solution structure of the N-terminal residues 2-59 of human TFIIB was determined by multidimensional NMR spectroscopy. The structure consists of a nearly tetrahedral Zn(Cys)3(His)1 site confined by type I and "rubredoxin" turns, three antiparallel beta-strands, and disordered loops. The structure is similar to the reported zinc-ribbon motifs in several transcription-related proteins from archaea and eucarya, including Pyrococcus furiosus transcription factor B (PfTFB), human and yeast transcription factor IIS (TFIIS), and Thermococcus celer RNA polymerase II subunit M (TcRPOM). The zinc-ribbon structure of TFIIB, in conjunction with the biochemical analyses, suggests that residues on the beta-sheet are involved in the interaction with RNA pol II/TFIIF, while the zinc-binding site may increase the stability of the beta-sheet. PMID:11045620

  6. More robust detection of motifs in coexpressed genes by using phylogenetic information

    PubMed Central

    Monsieurs, Pieter; Thijs, Gert; Fadda, Abeer A; De Keersmaecker, Sigrid CJ; Vanderleyden, Jozef; De Moor, Bart; Marchal, Kathleen

    2006-01-01

    Background Several motif detection algorithms have been developed to discover overrepresented motifs in sets of coexpressed genes. However, in a noisy gene list, the number of genes containing the motif versus the number lacking the motif might not be sufficiently high to allow detection by classical motif detection tools. To still recover motifs which are not significantly enriched but still present, we developed a procedure in which we use phylogenetic footprinting to first delineate all potential motifs in each gene. Then we mutually compare all detected motifs and identify the ones that are shared by at least a few genes in the data set as potential candidates. Results We applied our methodology to a compiled test data set containing known regulatory motifs and to two biological data sets derived from genome wide expression studies. By executing four consecutive steps of 1) identifying conserved regions in orthologous intergenic regions, 2) aligning these conserved regions, 3) clustering the conserved regions containing similar regulatory regions followed by extraction of the regulatory motifs and 4) screening the input intergenic sequences with detected regulatory motif models, our methodology proves to be a powerful tool for detecting regulatory motifs when a low signal to noise ratio is present in the input data set. Comparing our results with two other motif detection algorithms points out the robustness of our algorithm. Conclusion We developed an approach that can reliably identify multiple regulatory motifs lacking a high degree of overrepresentation in a set of coexpressed genes (motifs belonging to sparsely connected hubs in the regulatory network) by exploiting the advantages of using both coexpression and phylogenetic information. PMID:16549017

  7. Long-Lived Excited-State Dynamics of i-Motif Structures Probed by Time-Resolved Infrared Spectroscopy.

    PubMed

    Keane, Páraic M; Baptista, Frederico R; Gurung, Sarah P; Devereux, Stephen J; Sazanovich, Igor V; Towrie, Michael; Brazier, John A; Cardin, Christine J; Kelly, John M; Quinn, Susan J

    2016-05-01

    UV-generated excited states of cytosine (C) nucleobases are precursors to mutagenic photoproduct formation. The i-motif formed from C-rich sequences is known to exhibit high yields of long-lived excited states following UV absorption. Here the excited states of several i-motif structures have been characterized following 267 nm laser excitation using time-resolved infrared spectroscopy (TRIR). All structures possess a long-lived excited state of ∼300 ps and notably in some cases decays greater than 1 ns are observed. These unusually long-lived lifetimes are attributed to the interdigitated DNA structure which prevents direct base stacking overlap.

  8. Direct transcriptional regulation of Six6 is controlled by SoxB1 binding to a remote forebrain enhancer

    PubMed Central

    Lee, Bumwhee; Rizzoti, Karine; Kwon, David S.; Kim, Seon-Young; Oh, Sangtaek; Epstein, Douglas J.; Son, Youngsook; Yoon, Jaeseung; Baek, Kwanghee; Jeong, Yongsu

    2014-01-01

    Six6, a sine oculis homeobox protein, plays a crucial and conserved role in the development of the forebrain and eye. To understand how the expression of Six6 is regulated during embryogenesis, we screened ~250 kb of genomic DNA encompassing the Six6 locus for cis-regulatory elements capable of directing reporter gene expression to sites of Six6 transcription in transgenic mouse embryos. Here, we describe two novel enhancer elements, that are highly conserved in vertebrate species and whose activities recapitulate Six6 expression in the ventral forebrain and eye, respectively. Cross-species comparisons of the Six6 forebrain enhancer sequences revealed highly conserved binding sites matching the consensus for homeodomain and SoxB1 transcription factors. Deletion of either of the binding sites resulted in loss of the forebrain enhancer activity in the ventral forebrain. Moreover, our studies show that members of the SoxB1 family, including Sox2 and Sox3, are expressed in the overlapping region of the ventral forebrain with Six6 and can bind to the Six6 forebrain enhancer. Loss of function of SoxB1 genes in vivo further emphasizes their role in regulating Six6 forebrain enhancer activity. Thus, our data strongly suggest that SoxB1 transcription factors are direct activators of Six6 expression in the ventral forebrain. PMID:22561201

  9. CAGEd-oPOSSUM: motif enrichment analysis from CAGE-derived TSSs

    PubMed Central

    Arenillas, David J.; Forrest, Alistair R. R.; Kawaji, Hideya; Lassmann, Timo; Wasserman, Wyeth W.; Mathelier, Anthony

    2016-01-01

    With the emergence of large-scale Cap Analysis of Gene Expression (CAGE) datasets from individual labs and the FANTOM consortium, one can now analyze the cis-regulatory regions associated with gene transcription at an unprecedented level of refinement. By coupling transcription factor binding site (TFBS) enrichment analysis with CAGE-derived genomic regions, CAGEd-oPOSSUM can identify TFs that act as key regulators of genes involved in specific mammalian cell and tissue types. The webtool allows for the analysis of CAGE-derived transcription start sites (TSSs) either provided by the user or selected from ∼1300 mammalian samples from the FANTOM5 project with pre-computed TFBS predicted with JASPAR TF binding profiles. The tool helps power insights into the regulation of genes through the study of the specific usage of TSSs within specific cell types and/or under specific conditions. Availability and Implementation: The CAGEd-oPOSSUM web tool is implemented in Perl, MySQL and Apache and is available at http://cagedop.cmmt.ubc.ca/CAGEd_oPOSSUM. Contacts: anthony.mathelier@ncmm.uio.no or wyeth@cmmt.ubc.ca Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27334471
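
    The abstract above does not say which statistic the tool uses for TFBS enrichment. As a minimal sketch of the general idea, assuming a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test on a 2x2 table), the function below asks whether regions carrying a predicted site are over-represented in a foreground set relative to a background set. The function name and its arguments are hypothetical and are not part of CAGEd-oPOSSUM.

```python
from math import comb


def hypergeom_enrichment_p(hits_fg, n_fg, hits_bg, n_bg):
    """One-sided p-value that a site is over-represented in the foreground.

    hits_fg / n_fg: foreground regions with a site / all foreground regions.
    hits_bg / n_bg: the same two counts for the background regions.
    """
    total = n_fg + n_bg                # all regions pooled
    total_hits = hits_fg + hits_bg     # all regions carrying a site
    denom = comb(total, n_fg)          # ways to draw the foreground set
    upper = min(n_fg, total_hits)
    # Probability of seeing hits_fg or more site-carrying regions by chance.
    tail = sum(
        comb(total_hits, k) * comb(total - total_hits, n_fg - k)
        for k in range(hits_fg, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Illustrative counts only, not data from the paper.
    print(hypergeom_enrichment_p(hits_fg=40, n_fg=100, hits_bg=150, n_bg=900))
```

    In practice such a test would be repeated for every TF binding profile, so the resulting p-values would need a multiple-testing correction.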

  10. Mutational analysis of two highly conserved motifs in the silencing suppressor encoded by tomato spotted wilt virus (genus Tospovirus, family Bunyaviridae).

    PubMed

    Zhai, Ying; Bag, Sudeep; Mitter, Neena; Turina, Massimo; Pappu, Hanu R

    2014-06-01

    Tospoviruses cause serious economic losses to a wide range of field and horticultural crops on a global scale. The NSs gene encoded by tospoviruses acts as a suppressor of host plant defense. We identified amino acid motifs that are conserved in all of the NSs proteins of tospoviruses for which the sequence is known. Using tomato spotted wilt virus (TSWV) as a model, the role of these motifs in suppressor activity of NSs was investigated. Using site-directed point mutations in two conserved motifs, glycine, lysine and valine/threonine (GKV/T) at positions 181-183 and tyrosine and leucine (YL) at positions 412-413, and an assay to measure the reversal of gene silencing in Nicotiana benthamiana line 16c, we show that substitutions (K182 to A, and L413 to A) in these motifs abolished suppressor activity of the NSs protein, indicating that these two motifs are essential for the RNAi suppressor function of tospoviruses. PMID:24363189

  11. Motif-based analysis of large nucleotide data sets using MEME-ChIP.

    PubMed

    Ma, Wenxiu; Noble, William S; Bailey, Timothy L

    2014-01-01

    MEME-ChIP is a web-based tool for analyzing motifs in large DNA or RNA data sets. It can analyze peak regions identified by ChIP-seq, cross-linking sites identified by CLIP-seq and related assays, as well as sets of genomic regions selected using other criteria. MEME-ChIP performs de novo motif discovery, motif enrichment analysis, motif location analysis and motif clustering, providing a comprehensive picture of the DNA or RNA motifs that are enriched in the input sequences. MEME-ChIP performs two complementary types of de novo motif discovery: weight matrix-based discovery for high accuracy; and word-based discovery for high sensitivity. Motif enrichment analysis using DNA or RNA motifs from human, mouse, worm, fly and other model organisms provides even greater sensitivity. MEME-ChIP's interactive HTML output groups and aligns significant motifs to ease interpretation. This protocol takes less than 3 h, and it provides motif discovery approaches that are distinct and complementary to other online methods. PMID:24853928

  12. Multiple Weak Linear Motifs Enhance Recruitment and Processivity in SPOP-Mediated Substrate Ubiquitination.

    PubMed

    Pierce, Wendy K; Grace, Christy R; Lee, Jihun; Nourse, Amanda; Marzahn, Melissa R; Watson, Edmond R; High, Anthony A; Peng, Junmin; Schulman, Brenda A; Mittag, Tanja

    2016-03-27

    Primary sequence motifs, with millimolar affinities for binding partners, are abundant in disordered protein regions. In multivalent interactions, such weak linear motifs can cooperate to recruit binding partners via avidity effects. If linear motifs recruit modifying enzymes, optimal placement of weak motifs may regulate access to modification sites. Weak motifs may thus exert physiological relevance stronger than that suggested by their affinities, but molecular mechanisms of their function are still poorly understood. Herein, we use the N-terminal disordered region of the Hedgehog transcriptional regulator Gli3 (Gli3(1-90)) to determine the role of weak motifs encoded in its primary sequence for the recruitment of its ubiquitin ligase CRL3(SPOP) and the subsequent effect on ubiquitination efficiency. The substrate adaptor SPOP binds linear motifs through its MATH (meprin and TRAF homology) domain and forms higher-order oligomers through its oligomerization domains, rendering SPOP multivalent for its substrates. Gli3 has multiple weak SPOP binding motifs. We map three such motifs in Gli3(1-90), the weakest of which has a millimolar dissociation constant. Multivalency of ligase and substrate for each other facilitates enhanced ligase recruitment and stimulates Gli3(1-90) ubiquitination in in vitro ubiquitination assays. We speculate that the weak motifs enable processivity through avidity effects and by providing steric access to lysine residues that are otherwise not prioritized for polyubiquitination. Weak motifs may generally be employed in multivalent systems to act as gatekeepers regulating post-translational modification. PMID:26475525

  13. Motif-based analysis of large nucleotide data sets using MEME-ChIP

    PubMed Central

    Ma, Wenxiu; Noble, William S; Bailey, Timothy L

    2014-01-01

    MEME-ChIP is a web-based tool for analyzing motifs in large DNA or RNA data sets. It can analyze peak regions identified by ChIP-seq, cross-linking sites identified by CLIP-seq and related assays, as well as sets of genomic regions selected using other criteria. MEME-ChIP performs de novo motif discovery, motif enrichment analysis, motif location analysis and motif clustering, providing a comprehensive picture of the DNA or RNA motifs that are enriched in the input sequences. MEME-ChIP performs two complementary types of de novo motif discovery: weight matrix–based discovery for high accuracy; and word-based discovery for high sensitivity. Motif enrichment analysis using DNA or RNA motifs from human, mouse, worm, fly and other model organisms provides even greater sensitivity. MEME-ChIP’s interactive HTML output groups and aligns significant motifs to ease interpretation. This protocol takes less than 3 h, and it provides motif discovery approaches that are distinct and complementary to other online methods. PMID:24853928

  14. Co-evolution of segregation guide DNA motifs and the FtsK translocase in bacteria: identification of the atypical Lactococcus lactis KOPS motif

    PubMed Central

    Nolivos, Sophie; Touzain, Fabrice; Pages, Carine; Coddeville, Michele; Rousseau, Philippe; El Karoui, Meriem; Le Bourgeois, Pascal; Cornet, François

    2012-01-01

    Bacteria use the global bipolarization of their chromosomes into replichores to control the dynamics and segregation of their genome during the cell cycle. This involves the control of protein activities by recognition of specific short DNA motifs whose orientation along the chromosome is highly skewed. The KOPS motifs act in chromosome segregation by orienting the activity of the FtsK DNA translocase towards the terminal replichore junction. KOPS motifs have been identified in γ-Proteobacteria and in Bacillus subtilis as closely related G-rich octamers. We have identified the KOPS motif of Lactococcus lactis, a model bacterium of the Streptococcaceae family harbouring a compact and low GC% genome. This motif, 5′-GAAGAAG-3′, was predicted in silico using the occurrence and skew characteristics of known KOPS motifs. We show that it is specifically recognized by L. lactis FtsK in vitro and controls its activity in vivo. L. lactis KOPS is thus an A-rich heptamer motif. Our results show that KOPS-controlled chromosome segregation is conserved in Streptococcaceae but that KOPS may show important variation in sequence and length between bacterial families. This suggests that FtsK adapts to its host genome by selecting motifs with convenient occurrence frequencies and orientation skews to orient its activity. PMID:22373923
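
    The "occurrence and skew" criteria mentioned above can be illustrated with a small sketch. It assumes a single stretch of sequence from one replichore, given 5′ to 3′ on one strand, and reports how often a candidate motif occurs on that strand versus its reverse complement. The function names are hypothetical, and this is not the authors' prediction pipeline.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlapping matches."""
    count, start = 0, seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def strand_skew(seq: str, motif: str) -> float:
    """Fraction of motif occurrences that lie on the given strand.

    0.5 means no orientation bias; values near 1.0 mean the motif is
    found almost exclusively on the given strand. Returns 0.5 when the
    motif does not occur on either strand.
    """
    top = count_overlapping(seq, motif)
    bottom = count_overlapping(seq, reverse_complement(motif))
    total = top + bottom
    return top / total if total else 0.5


if __name__ == "__main__":
    # Toy sequence for illustration only, not genomic data.
    toy = "GAAGAAGTTTGAAGAAGCCCTTCTTC"
    print(strand_skew(toy, "GAAGAAG"))
```

    A genome-scale analysis would apply such a count separately to each replichore, since the expected orientation flips at the replication origin and terminus.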

  15. DNA nanotechnology based on i-motif structures.

    PubMed

    Dong, Yuanchen; Yang, Zhongqiang; Liu, Dongsheng

    2014-06-17

    CONSPECTUS: Most biological processes happen at the nanometer scale, and understanding the energy transformations and material transportation mechanisms within living organisms has proved challenging. To better understand the secrets of life, researchers have investigated artificial molecular motors and devices over the past decade because such systems can mimic certain biological processes. DNA nanotechnology based on i-motif structures is one system that has played an important role in these investigations. In this Account, we summarize recent advances in functional DNA nanotechnology based on i-motif structures. The i-motif is a DNA quadruplex that occurs as four stretches of cytosine repeat sequences form C·CH(+) base pairs, and their stabilization requires slightly acidic conditions. This unique property has produced the first DNA molecular motor driven by pH changes. The motor is reliable, and studies show that it is capable of millisecond running speeds, comparable to the speed of natural protein motors. With careful design, the output of these types of motors was combined to drive micrometer-sized cantilevers to bend. Using established DNA nanostructure assembly and functionalization methods, researchers can easily integrate the motor within other DNA assembled structures and functional units, producing DNA molecular devices with new functions such as suprahydrophobic/suprahydrophilic smart surfaces that switch, intelligent nanopores triggered by pH changes, molecular logic gates, and DNA nanosprings. Recently, researchers have produced motors driven by light and electricity, which have allowed DNA motors to be integrated within silicon-based nanodevices. Moreover, some devices based on i-motif structures have proven useful for investigating processes within living cells. The pH-responsiveness of the i-motif structure also provides a way to control the stepwise assembly of DNA nanostructures. In addition, because of the stability of the i-motif, this

  17. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    SciTech Connect

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by patterns in
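
    The comparison of short-motif abundances against a background set, as described in the Results above, can be sketched as a ratio of k-mer frequencies. The snippet assumes plain uppercase sequences and uses a small floor value to avoid division by zero. The function names and the floor are illustrative assumptions, not the authors' method.

```python
from collections import Counter


def kmer_frequencies(seqs, k):
    """Relative frequency of every k-mer observed in a list of sequences."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def relative_abundance(foreground, background, k, floor=1e-6):
    """Ratio of foreground to background frequency for each foreground k-mer.

    Ratios above 1 indicate k-mers that are more frequent in the
    foreground; `floor` stands in for k-mers absent from the background.
    """
    fg = kmer_frequencies(foreground, k)
    bg = kmer_frequencies(background, k)
    return {kmer: f / max(bg.get(kmer, 0.0), floor) for kmer, f in fg.items()}


if __name__ == "__main__":
    # Toy sequences for illustration only.
    ratios = relative_abundance(["AAAATTTT"], ["ACGTACGT"], k=2)
    for kmer, ratio in sorted(ratios.items()):
        print(kmer, ratio)
```

    As the abstract notes, such ratios depend strongly on the choice of background, which is why three different background sets were compared.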

  18. Sequence-based classification using discriminatory motif feature selection.

    PubMed

    Xiong, Hao; Capurso, Daniel; Sen, Saunak; Segal, Mark R

    2011-01-01

    Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative) approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k) predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small) set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed) and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated). We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters. A Python pipeline implementing the approach is available at
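
    The three-level partitioning described above (discovery, training, validation) can be sketched as a shuffled split of sample indices. The split fractions, the seed and the function name below are assumptions made for illustration; the abstract does not give the proportions used by the authors' pipeline.

```python
import random


def three_way_split(n_items, fractions=(0.3, 0.4, 0.3), seed=0):
    """Shuffle indices 0..n_items-1 and cut them into three disjoint lists.

    Returns (discovery, training, validation). Motifs would be found on
    the discovery part, a classifier fitted on the training part, and
    performance assessed on the validation part.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    cut1 = int(n_items * fractions[0])
    cut2 = cut1 + int(n_items * fractions[1])
    # Any rounding remainder falls into the validation partition.
    return indices[:cut1], indices[cut1:cut2], indices[cut2:]


if __name__ == "__main__":
    discovery, training, validation = three_way_split(10)
    print(len(discovery), len(training), len(validation))
```

    Keeping the discovery partition separate from the training partition means the selected motif features are not chosen using the same sequences that later fit the classifier.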

  19. Present status of quinoxaline motifs: excellent pathfinders in therapeutic medicine.

    PubMed

    Ajani, Olayinka Oyewale

    2014-10-01

    Quinoxalines belong to a class of excellent heterocyclic scaffolds owing to their wide biological properties and diverse therapeutic applications in medicinal research. They are complementary in shapes and charges to numerous biomolecules they interact with, thereby resulting in increased binding affinity. The pharmacokinetic properties of drugs bearing quinoxaline cores have shown them to be relatively easy to administer either as intramuscular solutions, oral capsules or rectal suppositories. This work deals with recent advances in the synthesis and pharmacological diversities of quinoxaline motifs which might pave ways for novel drugs development.

  20. Nucleic Acid i-Motif Structures in Analytical Chemistry.

    PubMed

    Alba, Joan Josep; Sadurní, Anna; Gargallo, Raimundo

    2016-09-01

    Under the appropriate experimental conditions of pH and temperature, cytosine-rich segments in DNA or RNA sequences may produce a characteristic folded structure known as an i-motif. Besides its potential role in vivo, which is still under investigation, this structure has attracted increasing interest in other fields due to its sharp, fast and reversible pH-driven conformational changes. This "on/off" switch at molecular level is being used in nanotechnology and analytical chemistry to develop nanomachines and sensors, respectively. This paper presents a review of the latest applications of this structure in the field of chemical analysis.

  1. Dysprosium-carboxylate nanomeshes with tunable cavity size and assembly motif through ionic interactions.

    PubMed

    Cirera, B; Đorđević, L; Otero, R; Gallego, J M; Bonifazi, D; Miranda, R; Ecija, D

    2016-09-28

    We report the design of dysprosium directed metallo-supramolecular architectures on a pristine Cu(111) surface. By an appropriate selection of the ditopic molecular linkers equipped with terminal carboxylic groups (TPA, PDA and TDA species), we create reticular and mononuclear metal-organic nanomeshes of tunable internodal distance, which are stabilized by eight-fold DyO interactions. A thermal annealing treatment for the reticular Dy:TDA architecture gives rise to an unprecedented quasi-hexagonal nanostructure based on dinuclear Dy clusters, exhibiting a unique six-fold DyO bonding motif. All metallo-supramolecular architectures are stable at room temperature. Our results open new avenues for the engineering of supramolecular architectures on surfaces incorporating f-block elements forming thermally robust nanoarchitectures through ionic bonds. PMID:27560774

  2. Dysprosium-carboxylate nanomeshes with tunable cavity size and assembly motif through ionic interactions.

    PubMed

    Cirera, B; Đorđević, L; Otero, R; Gallego, J M; Bonifazi, D; Miranda, R; Ecija, D

    2016-09-28

    We report the design of dysprosium directed metallo-supramolecular architectures on a pristine Cu(111) surface. By an appropriate selection of the ditopic molecular linkers equipped with terminal carboxylic groups (TPA, PDA and TDA species), we create reticular and mononuclear metal-organic nanomeshes of tunable internodal distance, which are stabilized by eight-fold DyO interactions. A thermal annealing treatment for the reticular Dy:TDA architecture gives rise to an unprecedented quasi-hexagonal nanostructure based on dinuclear Dy clusters, exhibiting a unique six-fold DyO bonding motif. All metallo-supramolecular architectures are stable at room temperature. Our results open new avenues for the engineering of supramolecular architectures on surfaces incorporating f-block elements forming thermally robust nanoarchitectures through ionic bonds.

  3. Bacterial RNA motif in the 5′ UTR of rpsF interacts with an S6:S18 complex

    PubMed Central

    Fu, Yang; Deiorio-Haggar, Kaila; Soo, Mark W.; Meyer, Michelle M.

    2014-01-01

    Approximately half the transcripts encoding ribosomal proteins in Escherichia coli include a structured RNA motif that interacts with a specific ribosomal protein to inhibit gene expression, thus allowing stoichiometric production of ribosome components. However, many of these RNA structures are not widely distributed across bacterial phyla. It is increasingly common for RNA motifs associated with ribosomal protein genes to be identified using comparative genomic methods, yet these are rarely experimentally validated. In this work, we characterize one such motif that precedes operons containing rpsF and rpsR, which encode ribosomal proteins S6 and S18. This RNA structure is widely distributed across many phyla of bacteria despite differences within the downstream operon, and examples are present in both E. coli and Bacillus subtilis. We demonstrate a direct interaction between an example of the RNA from B. subtilis and an S6:S18 complex using in vitro binding assays, verify our predicted secondary structure, and identify a putative protein-binding site. The proposed binding site bears a strong resemblance to the S18 binding site within the 16S rRNA, suggesting molecular mimicry. This interaction is a valuable addition to the canon of ribosomal protein mRNA interactions. This work shows how experimental verification translates computational results into concrete knowledge of biological systems. PMID:24310371

  4. Single-molecule study of thymidine glycol and i-motif through the alpha-hemolysin ion channel

    NASA Astrophysics Data System (ADS)

    He, Lidong

    Nanopore-based devices have emerged as a single-molecule detection and analysis tool for a wide range of applications. Through electrophoretically driving DNA molecules across a nanosized pore, a lot of information can be received, including unfolding kinetics and DNA-protein interactions. This single-molecule method has the potential to sequence kilobase length DNA polymers without amplification or labeling, approaching "the third generation" genome sequencing for around $1000 within 24 hours. alpha-Hemolysin biological nanopores have the advantages of excellent stability, low-noise level, and precise site-directed mutagenesis for engineering this protein nanopore. The first work presented in this thesis established the current signal of the thymidine glycol lesion in DNA oligomers through an immobilization experiment. The thymidine glycol enantiomers were differentiated from each other by different current blockage levels. Also, the effect of bulky hydrophobic adducts to the current blockage was investigated. Secondly, the alpha-hemolysin nanopore was used to study the human telomere i-motif and RET oncogene i-motif at a single-molecule level. In Chapter 3, it was demonstrated that the alpha-hemolysin nanopore can differentiate an i-motif form and single-strand DNA form at different pH values based on the same sequence. In addition, it shows potential to differentiate the folding topologies generated from the same DNA sequence.

  5. A conserved motif in JNK/p38-specific MAPK phosphatases as a determinant for JNK1 recognition and inactivation.

    PubMed

    Liu, Xin; Zhang, Chen-Song; Lu, Chang; Lin, Sheng-Cai; Wu, Jia-Wei; Wang, Zhi-Xin

    2016-01-01

    Mitogen-activated protein kinases (MAPKs), important in a large array of signalling pathways, are tightly controlled by a cascade of protein kinases and by MAPK phosphatases (MKPs). MAPK signalling efficiency and specificity is modulated by protein-protein interactions between individual MAPKs and the docking motifs in cognate binding partners. Two types of docking interactions have been identified: D-motif-mediated interaction and FXF-docking interaction. Here we report the crystal structure of JNK1 bound to the catalytic domain of MKP7 at 2.4-Å resolution, providing high-resolution structural insight into the FXF-docking interaction. The (285)FNFL(288) segment in MKP7 directly binds to a hydrophobic site on JNK1 that is near the MAPK insertion and helix αG. Biochemical studies further reveal that this highly conserved structural motif is present in all members of the MKP family, and the interaction mode is universal and critical for the MKP-MAPK recognition and biological function. PMID:26988444

  6. A conserved motif in JNK/p38-specific MAPK phosphatases as a determinant for JNK1 recognition and inactivation

    PubMed Central

    Liu, Xin; Zhang, Chen-Song; Lu, Chang; Lin, Sheng-Cai; Wu, Jia-Wei; Wang, Zhi-Xin

    2016-01-01

    Mitogen-activated protein kinases (MAPKs), important in a large array of signalling pathways, are tightly controlled by a cascade of protein kinases and by MAPK phosphatases (MKPs). MAPK signalling efficiency and specificity is modulated by protein–protein interactions between individual MAPKs and the docking motifs in cognate binding partners. Two types of docking interactions have been identified: D-motif-mediated interaction and FXF-docking interaction. Here we report the crystal structure of JNK1 bound to the catalytic domain of MKP7 at 2.4-Å resolution, providing high-resolution structural insight into the FXF-docking interaction. The 285FNFL288 segment in MKP7 directly binds to a hydrophobic site on JNK1 that is near the MAPK insertion and helix αG. Biochemical studies further reveal that this highly conserved structural motif is present in all members of the MKP family, and the interaction mode is universal and critical for the MKP-MAPK recognition and biological function. PMID:26988444

  7. The promoter competition assay (PCA): a new approach to identify motifs involved in the transcriptional activity of reporter genes.

    PubMed

    Hube, Florent; Myal, Yvonne; Leygue, Etienne

    2006-05-01

    Identifying particular motifs responsible for promoter activity is a crucial step toward the development of new gene-based preventive and therapeutic strategies. However, to date, experimental methods to study promoter activity remain limited. We present in this report a promoter competition assay designed to identify, within a given promoter region, motifs critical for its activity. This assay consists in co-transfecting the promoter to be analyzed and double-stranded oligonucleotides which will compete for the binding of transcription factors. Using the recently characterized SBEM promoter as model, we first delineated the feasibility of the method and optimized the experimental conditions. We then identified, within an 87-bp region responsible for a strong expression of the reporter gene, an octamer-binding site essential for its transcriptional regulation. The importance of this motif has been confirmed by site-directed mutagenesis. The promoter competition assay appears to be a fast and efficient approach to identify, within a given promoter sequence, sites critical for its activity.

  8. TAF4, a subunit of transcription factor II D, directs promoter occupancy of nuclear receptor HNF4A during post-natal hepatocyte differentiation

    PubMed Central

    Alpern, Daniil; Langer, Diana; Ballester, Benoit; Le Gras, Stephanie; Romier, Christophe; Mengus, Gabrielle; Davidson, Irwin

    2014-01-01

    The functions of the TAF subunits of mammalian TFIID in physiological processes remain poorly characterised. In this study, we describe a novel function of TAFs in directing genomic occupancy of a transcriptional activator. Using liver-specific inactivation in mice, we show that the TAF4 subunit of TFIID is required for post-natal hepatocyte maturation. TAF4 promotes pre-initiation complex (PIC) formation at post-natal expressed liver function genes and down-regulates a subset of embryonic expressed genes by increased RNA polymerase II pausing. The TAF4–TAF12 heterodimer interacts directly with HNF4A and in vivo TAF4 is necessary to maintain HNF4A-directed embryonic gene expression at post-natal stages and promotes HNF4A occupancy of functional cis-regulatory elements adjacent to the transcription start sites of post-natal expressed genes. Stable HNF4A occupancy of these regulatory elements requires TAF4-dependent PIC formation highlighting that these are mutually dependent events. Local promoter-proximal HNF4A–TFIID interactions therefore act as instructive signals for post-natal hepatocyte differentiation. DOI: http://dx.doi.org/10.7554/eLife.03613.001 PMID:25209997

  9. RAP: Accurate and Fast Motif Finding Based on Protein-Binding Microarray Data

    PubMed Central

    Orenstein, Yaron; Mick, Eran

    2013-01-01

    The novel high-throughput technology of protein-binding microarrays (PBMs) measures binding intensity of a transcription factor to thousands of DNA probe sequences. Several algorithms have been developed to extract binding-site motifs from these data. Such motifs are commonly represented by positional weight matrices. Previous studies have shown that the motifs produced by these algorithms are either accurate in predicting in vitro binding or similar to previously published motifs, but not both. In this work, we present a new simple algorithm to infer binding-site motifs from PBM data. It outperforms prior art both in predicting in vitro binding and in producing motifs similar to literature motifs. Our results challenge previous claims that motifs with lower information content are better models for transcription-factor binding specificity. Moreover, we tested the effect of motif length and side positions flanking the “core” motif in the binding site. We show that side positions have a significant effect and should not be removed, as commonly done. A large drop in the results quality of all methods is observed between in vitro and in vivo binding prediction. The software is available on acgt.cs.tau.ac.il/rap. PMID:23464877
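
    The abstract above notes that binding-site motifs are commonly represented by positional weight matrices. As a rough illustration of that representation only (this is not the RAP algorithm), the sketch below converts a position frequency matrix into log-odds weights and scans a sequence for the best-scoring window. The matrix values, the uniform background, and the function names `to_pwm`, `score_window`, and `best_hit` are all assumptions made up for this example.

```python
import math

# Hypothetical position frequency matrix for a 4-bp motif (illustrative values,
# not taken from any PBM experiment). One dict per motif position.
PFM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background probability for each base


def to_pwm(pfm, background=BACKGROUND):
    """Convert base probabilities into log2 odds against the background."""
    return [{base: math.log2(p / background) for base, p in row.items()}
            for row in pfm]


def score_window(pwm, window):
    """Sum the per-position weights for a window as long as the motif."""
    return sum(row[base] for row, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (start, score) of the highest-scoring window in the sequence."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    scores = [score_window(pwm, sequence[i:i + width])
              for i in range(len(sequence) - width + 1)]
    start = max(range(len(scores)), key=lambda i: scores[i])
    return start, scores[start]
```

    Calling `best_hit(to_pwm(PFM), "TTACGTTT")` locates the `ACGT` window. The sketch accepts only the letters A, C, G, and T; any other character raises a `KeyError`.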

  10. Automated protein motif generation in the structure-based protein function prediction tool ProMOL.

    PubMed

    Osipovitch, Mikhail; Lambrecht, Mitchell; Baker, Cameron; Madha, Shariq; Mills, Jeffrey L; Craig, Paul A; Bernstein, Herbert J

    2015-12-01

    ProMOL, a plugin for the PyMOL molecular graphics system, is a structure-based protein function prediction tool. ProMOL includes a set of routines for building motif templates that are used for screening query structures for enzyme active sites. Previously, each motif template was generated manually and required supervision in the optimization of parameters for sensitivity and selectivity. We developed an algorithm and workflow for the automation of motif building and testing routines in ProMOL. The algorithm uses a set of empirically derived parameters for optimization and requires little user intervention. The automated motif generation algorithm was first tested in a performance comparison with a set of manually generated motifs based on identical active sites from the same 112 PDB entries. The two sets of motifs were equally effective in identifying alignments with homologs and in rejecting alignments with unrelated structures. A second set of 296 active site motifs were generated automatically, based on Catalytic Site Atlas entries with literature citations, as an expansion of the library of existing manually generated motif templates. The new motif templates exhibited comparable performance to the existing ones in terms of hit rates against native structures, homologs with the same EC and Pfam designations, and randomly selected unrelated structures with a different EC designation at the first EC digit, as well as in terms of RMSD values obtained from local structural alignments of motifs and query structures. This research is supported by NIH grant GM078077. PMID:26573864

  11. Prostaglandin E2 promotes Na(v)1.8 trafficking via its intracellular RRR motif through the protein kinase A pathway.

    PubMed

    Liu, Chao; Li, Qian; Su, Yuanyuan; Bao, Lan

    2010-03-01

    Voltage-gated sodium channels (Na(v)) are essential for the initiation and propagation of action potentials in neurons. Na(v)1.8 activity is regulated by prostaglandin E(2) (PGE(2)). There is, however, no direct evidence showing the regulated trafficking of Na(v)1.8, and the molecular and cellular mechanism of PGE(2)-induced sodium channel trafficking is not clear. Here, we report that PGE(2) regulates the trafficking of Na(v)1.8 through the protein kinase A (PKA) signaling pathway, and an RRR motif in the first intracellular loop of Na(v)1.8 mediates this effect. In rat dorsal root ganglion (DRG) neurons, prolonged PGE(2) treatment enhanced Na(v)1.8 currents by increasing the channel density on the cell surface. Activation of PKA by forskolin had the same effect on DRG neurons and human embryonic kidney 293T cells expressing Na(v)1.8. Inhibition of PKA completely blocked the PGE(2)-promoted effect on Na(v)1.8. Mutation of five PKA phosphorylation sites or the RRR motif in the first intracellular loop of Na(v)1.8 abolished the PKA-promoted Na(v)1.8 surface expression. Furthermore, a membrane-tethered peptide containing the intracellular RRR motif disrupted the PGE(2)-induced promotion of the Na(v)1.8 current in DRG neurons. Our data indicate that PGE(2) promotes the surface expression of Na(v)1.8 via an intracellular RRR motif, and provide a novel mechanism for functional modulation of Na(v)1.8 by hyperalgesic agents. PMID:20028484

  12. The Runt domain of AML1 (RUNX1) binds a sequence-conserved RNA motif that mimics a DNA element

    PubMed Central

    Fukunaga, Junichi; Nomura, Yusuke; Tanaka, Yoichiro; Amano, Ryo; Tanaka, Taku; Nakamura, Yoshikazu; Kawai, Gota; Sakamoto, Taiichi; Kozu, Tomoko

    2013-01-01

    AML1 (RUNX1) is a key transcription factor for hematopoiesis that binds to the Runt-binding double-stranded DNA element (RDE) of target genes through its N-terminal Runt domain. Aberrations in the AML1 gene are frequently found in human leukemia. To better understand AML1 and its potential utility for diagnosis and therapy, we obtained RNA aptamers that bind specifically to the AML1 Runt domain. Enzymatic probing and NMR analyses revealed that Apt1-S, which is a truncated variant of one of the aptamers, has a CACG tetraloop and two stem regions separated by an internal loop. All the isolated aptamers were found to contain the conserved sequence motif 5′-NNCCAC-3′ and 5′-GCGMGN′N′-3′ (M:A or C; N and N′ form Watson–Crick base pairs). The motif contains one AC mismatch and one base bulged out. Mutational analysis of Apt1-S showed that three guanines of the motif are important for Runt binding as are the three guanines of RDE, which are directly recognized by three arginine residues of the Runt domain. Mutational analyses of the Runt domain revealed that the amino acid residues used for Apt1-S binding were similar to those used for RDE binding. Furthermore, the aptamer competed with RDE for binding to the Runt domain in vitro. These results demonstrated that the Runt domain of the AML1 protein binds to the motif of the aptamer that mimics DNA. Our findings should provide new insights into RNA function and utility in both basic and applied sciences. PMID:23709277

  13. Regulation of amyloid precursor protein processing by its KFERQ motif

    PubMed Central

    Park, Ji-Seon; Kim, Dong-Hou; Yoon, Seung-Yong

    2016-01-01

    Understanding the trafficking, processing, and degradation mechanisms of amyloid precursor protein (APP) is important because APP can be processed to produce β-amyloid (Aβ), a key pathogenic molecule in Alzheimer’s disease (AD). Here, we found that APP contains a KFERQ motif at its C-terminus, a consensus sequence for chaperone-mediated autophagy (CMA) or microautophagy, which are other types of autophagy for degradation of pathogenic molecules in neurodegenerative diseases. Deletion of KFERQ in APP increased C-terminal fragments (CTFs) and secreted N-terminal fragments of APP and kept it away from lysosomes. KFERQ deletion did not abolish the interaction of APP or its cleaved products with heat shock cognate protein 70 (Hsc70), a protein necessary for CMA or microautophagy. These findings suggest that the KFERQ motif is important for normal processing and degradation of APP to preclude the accumulation of APP-CTFs, although it may not be important for CMA or microautophagy. [BMB Reports 2016;49(6): 337-342] PMID:26779997

  14. Ultrasensitive response motifs: basic amplifiers in molecular signalling networks

    PubMed Central

    Zhang, Qiang; Bhattacharya, Sudin; Andersen, Melvin E.

    2013-01-01

    Multi-component signal transduction pathways and gene regulatory circuits underpin integrated cellular responses to perturbations. A recurring set of network motifs serve as the basic building blocks of these molecular signalling networks. This review focuses on ultrasensitive response motifs (URMs) that amplify small percentage changes in the input signal into larger percentage changes in the output response. URMs generally possess a sigmoid input–output relationship that is steeper than the Michaelis–Menten type of response and is often approximated by the Hill function. Six types of URMs can be commonly found in intracellular molecular networks and each has a distinct kinetic mechanism for signal amplification. These URMs are: (i) positive cooperative binding, (ii) homo-multimerization, (iii) multistep signalling, (iv) molecular titration, (v) zero-order covalent modification cycle and (vi) positive feedback. Multiple URMs can be combined to generate highly switch-like responses. Serving as basic signal amplifiers, these URMs are essential for molecular circuits to produce complex nonlinear dynamics, including multistability, robust adaptation and oscillation. These dynamic properties are in turn responsible for higher-level cellular behaviours, such as cell fate determination, homeostasis and biological rhythm. PMID:23615029
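
    The contrast drawn above between a Michaelis–Menten response and a steeper Hill-type response can be made concrete numerically. The sketch below is an illustration under stated assumptions, not code from the review: `hill` and `local_response_coefficient` are hypothetical helper names, and the half-saturation constant `k` and Hill coefficient `n` are free parameters chosen for the example. It estimates the log-log slope of the response, that is, the percentage change in output per percentage change in input.

```python
import math


def hill(x, k=1.0, n=1.0):
    """Fractional response x^n / (k^n + x^n); n = 1 is the Michaelis-Menten form."""
    return x ** n / (k ** n + x ** n)


def local_response_coefficient(f, x, rel_step=1e-6):
    """Estimate d ln f / d ln x at x with a central difference in log space."""
    hi = x * (1.0 + rel_step)
    lo = x * (1.0 - rel_step)
    return (math.log(f(hi)) - math.log(f(lo))) / (math.log(hi) - math.log(lo))
```

    For the Hill function this slope equals n·kⁿ/(kⁿ + xⁿ), so with k = 1 and an input of 0.1 it is just below 1 for n = 1 and close to 4 for n = 4. A slope above 1 is the amplification of percentage changes that the abstract describes.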

  15. The discodermolide hairpin structure flows from conformationally stable modular motifs.

    PubMed

    Jogalekar, Ashutosh S; Kriel, Frederik H; Shi, Qi; Cornett, Ben; Cicero, Daniel; Snyder, James P

    2010-01-14

    (+)-Discodermolide (DDM), a polyketide macrolide from marine sponge, is a potent microtubule assembly promoter. Reported solid-state, solution, and protein-bound DDM conformations reveal the unusual result that a common hairpin conformational motif exists in all three microenvironments. No other flexible microtubule binding agent exhibits such constancy of conformation. In the present study, we combine force-field conformational searches with NMR deconvolution in different solvents to compare DDM conformers with those observed in other environments. While several conformational families are perceived, the hairpin form dominates. The stability of this motif is dictated primarily by steric factors arising from repeated modular segments in DDM composed of the C(Me)-CHX-C(Me) fragment. Furthermore, docking protocols were utilized to probe the DDM binding mode in beta-tubulin. A previously suggested pose is substantiated (Pose-1), while an alternative (Pose-2) has been identified. SAR analysis for DDM analogues differentiates the two poses and suggests that Pose-2 is better able to accommodate the biodata.

  16. Motifs emerge from function in model gene regulatory networks

    PubMed Central

    Burda, Z.; Krzywicki, A.; Martin, O. C.; Zagorski, M.

    2011-01-01

    Gene regulatory networks allow the control of gene expression patterns in living cells. The study of network topology has revealed that certain subgraphs of interactions or “motifs” appear at anomalously high frequencies. We ask here whether this phenomenon may emerge because of the functions carried out by these networks. Given a framework for describing regulatory interactions and dynamics, we consider in the space of all regulatory networks those that have prescribed functional capabilities. Markov Chain Monte Carlo sampling is then used to determine how these functional networks lead to specific motif statistics in the interactions. In the case where the regulatory networks are constrained to exhibit multistability, we find a high frequency of gene pairs that are mutually inhibitory and self-activating. In contrast, networks constrained to have periodic gene expression patterns (mimicking for instance the cell cycle) have a high frequency of bifan-like motifs involving four genes with at least one activating and one inhibitory interaction. PMID:21960444
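
    One of the motifs reported above, a pair of genes that inhibit each other while each activates itself, is simple to count once a network is written down. The sketch below assumes a signed interaction matrix in which `w[i][j]` is the effect of gene j on gene i (positive for activation, negative for inhibition, zero for no interaction). That encoding and the function name `count_toggle_pairs` are assumptions for illustration, not the authors' framework or their sampling procedure.

```python
from itertools import combinations


def count_toggle_pairs(w):
    """Count gene pairs that are mutually inhibitory and both self-activating.

    w is a square list of lists; w[i][j] is the signed effect of gene j on
    gene i (hypothetical encoding chosen for this example).
    """
    n = len(w)
    if any(len(row) != n for row in w):
        raise ValueError("interaction matrix must be square")
    return sum(
        1
        for i, j in combinations(range(n), 2)
        if w[i][j] < 0 and w[j][i] < 0 and w[i][i] > 0 and w[j][j] > 0
    )
```

    Comparing such a count against the same count in randomized networks is the usual way to judge whether a motif is over-represented. The abstract instead compares ensembles of networks sampled under different functional constraints.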

  17. Proline Rich Motifs as Drug Targets in Immune Mediated Disorders

    PubMed Central

    Srinivasan, Mythily; Dunker, A. Keith

    2012-01-01

    The current version of the human immunome network consists of nearly 1400 interactions involving approximately 600 proteins. Intermolecular interactions mediated by proline-rich motifs (PRMs) are observed in many facets of the immune response. The proline-rich regions are known to preferentially adopt a polyproline type II helical conformation, an extended structure that facilitates transient intermolecular interactions such as signal transduction, antigen recognition, cell-cell communication and cytoskeletal organization. The propensity of both the side chain and the backbone carbonyls of the polyproline type II helix to participate in the interface interaction makes it an excellent recognition motif. An advantage of such distinct chemical features is that the interactions can be discriminatory even in the absence of high affinities. Indeed, the immune response is mediated by well-orchestrated low-affinity short-duration intermolecular interactions. The proline-rich regions are predominantly localized in the solvent-exposed regions such as the loops, intrinsically disordered regions, or between domains that constitute the intermolecular interface. Peptide mimics of the PRM have been suggested as potential antagonists of intermolecular interactions. In this paper, we discuss novel PRM-mediated interactions in the human immunome that potentially serve as attractive targets for immunomodulation and drug development for inflammatory and autoimmune pathologies. PMID:22666276

  18. Prevalent RNA recognition motif duplication in the human genome.

    PubMed

    Tsai, Yihsuan S; Gomez, Shawn M; Wang, Zefeng

    2014-05-01

    The sequence-specific recognition of RNA by proteins is mediated through various RNA binding domains, with the RNA recognition motif (RRM) being the most frequent and present in >50% of RNA-binding proteins (RBPs). Many RBPs contain multiple RRMs, and it is unclear how each RRM contributes to the binding specificity of the entire protein. We found that RRMs within the same RBP (i.e., sibling RRMs) tend to have significantly higher similarity than expected by chance. Sibling RRM pairs from RBPs shared by multiple species tend to have lower similarity than those found only in a single species, suggesting that multiple RRMs within the same protein might arise from domain duplication followed by divergence through random mutations. This finding is exemplified by a recent RRM domain duplication in DAZ proteins and an ancient duplication in PABP proteins. Additionally, we found that different similarities between sibling RRMs are associated with distinct functions of an RBP and that the RBPs tend to contain repetitive sequences with low complexity. Taken together, this study suggests that the number of RBPs with multiple RRMs has expanded in mammals and that the multiple sibling RRMs may recognize similar target motifs in a cooperative manner.

  19. Identifying DNA Binding Motifs by Combining Data from Different Sources

    SciTech Connect

    Mao, Linyong; Resat, Haluk; Nagib Callaos; Katsuhisa Horimoto; Jake Chen; Amy Sze Chan

    2004-07-19

    A transcription factor regulates the expression of its target genes by binding to their operator regions. It functions by affecting the interactions between RNA polymerases and the gene's promoter. Many transcription factors bind to their targets by recognizing a specific DNA sequence pattern, which is referred to as a consensus sequence or a motif. Because it would remove possible biases, combining biological data from different sources can be expected to improve the quality of the information extracted from the biological data. We analyzed the microarray gene expression data and the organism's genome sequence jointly to determine the transcription factor recognition sequences with more accuracy. Utilizing such a data integration approach, we have investigated the regulation of the photosynthesis genes of the purple non-sulphur photosynthetic bacterium Rhodobacter sphaeroides. The photosynthesis genes in this organism are tightly regulated as a function of environmental growth conditions by three major regulatory systems, PrrB/PrrA, AppA/PpsR and FnrL. In this study, we have detected a previously undefined PrrA consensus sequence, improved the previously known DNA-binding motif of PpsR, and confirmed the consensus sequence of the global regulator FnrL.
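
    A consensus sequence, as the term is used above, can be read off a set of aligned binding sites by taking the most frequent base in each column. The sketch below shows only that final step. It is not the data-integration method of the report, and the function name `consensus` and any sites passed to it are hypothetical.

```python
from collections import Counter


def consensus(sites):
    """Return the per-column majority base of equal-length aligned sites."""
    if not sites:
        raise ValueError("at least one site is required")
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    # zip(*sites) yields one tuple of bases per alignment column.
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*sites))
```

    For example, the invented sites `TTGACA`, `TTGACT`, `TTGCCA`, and `TAGACA` have the column-wise majority `TTGACA`. A plain majority vote discards how strongly each position is conserved, which is one reason weight matrices are often preferred over a single consensus string.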

  20. Cancer-causing mutations in a novel transcription-dependent nuclear export motif of VHL abrogate oxygen-dependent degradation of hypoxia-inducible factor.

    PubMed

    Khacho, Mireille; Mekhail, Karim; Pilon-Larose, Karine; Payette, Josianne; Lee, Stephen

    2008-01-01

    It is thought that degradation of nuclear proteins by the ubiquitylation system requires nuclear-cytoplasmic trafficking of E3 ubiquitin ligases. The von Hippel-Lindau (VHL) tumor suppressor protein is the substrate recognition component of a Cullin-2-containing E3 ubiquitin ligase that recruits hypoxia-inducible factor (HIF) for oxygen-dependent degradation. We demonstrated that VHL engages in nuclear-cytoplasmic trafficking that requires ongoing transcription to promote efficient HIF degradation. Here, we report the identification of a discrete motif, DXGX(2)DX(2)L, that directs transcription-dependent nuclear export of VHL and which is targeted by naturally occurring mutations associated with renal carcinoma and polycythemia in humans. The DXGX(2)DX(2)L motif is also found in other proteins, including poly(A)-binding protein 1, to direct its transcription-dependent nuclear export. We define DXGX(2)DX(2)L as TD-NEM (transcription-dependent nuclear export motif), since inhibition of transcription by actinomycin D or 5,6-dichlorobenzimidazole abrogates its nuclear export activity. Disease-causing mutations of key residues of TD-NEM restrain the ability of VHL to efficiently mediate oxygen-dependent degradation of HIF by altering its nuclear export dynamics without affecting interaction with its substrate. These results identify a novel nuclear export motif, further highlight the role of nuclear-cytoplasmic shuttling of E3 ligases in degradation of nuclear substrates, and provide evidence that disease-causing mutations can target subcellular trafficking.
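
    A sequence pattern such as DXGX(2)DX(2)L can be searched for with a regular expression. The sketch below assumes the usual reading of the notation, with X as any single residue and X(2) as two such residues; the abstract does not spell this out, so treat it as an assumption. The helper name `find_td_nem` is hypothetical and the sketch is not the authors' search procedure.

```python
import re

# D, any residue, G, two residues, D, two residues, L. The lookahead lets
# overlapping occurrences be reported as well.
TD_NEM = re.compile(r"(?=(D[A-Z]G[A-Z]{2}D[A-Z]{2}L))")


def find_td_nem(protein):
    """Return (0-based start, matched segment) for each motif occurrence."""
    return [(m.start(), m.group(1)) for m in TD_NEM.finditer(protein.upper())]
```

    Applied to the invented peptide `MKDAGSTDPQLRR` (not a VHL sequence), the function reports the segment `DAGSTDPQL` starting at index 2. A match of this kind is only a candidate; the abstract establishes function experimentally, through transcription inhibition and mutation of key residues.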

  1. Formation and Dissociation of the Interstrand i-Motif by the Sequences d(XnC4Ym) Monitored with Electrospray Ionization Mass Spectrometry

    NASA Astrophysics Data System (ADS)

    Cao, Yanwei; Qin, Yujiao; Bruist, Michael; Gao, Shang; Wang, Bing; Wang, Huixin; Guo, Xinhua

    2015-06-01

    Formation and dissociation of the interstrand i-motifs by DNA with the sequence d(XnC4Ym) (X and Y represent thymine, adenine, or guanine, and n, m range from 0 to 2) are studied with electrospray ionization mass spectrometry (ESI-MS), circular dichroism (CD), and UV spectrophotometry. The ion complexes detected in the gas phase and the melting temperatures (Tm) obtained in solution show that a non-C base residue located at the 5' end favors formation of the four-stranded structures, with T > A > G for imparting stability. In comparison, no such rule is found when a non-C base is located at the 3' end. Detection of penta- and hexa-stranded ions indicates the formation of i-motifs with more than four strands. In addition, the i-motifs seen in our mass spectra are accompanied by single-, double-, and triple-stranded ions, and the trimeric ions were always less abundant during the annealing and heat-induced dissociation of the DNA strands in solution (pH = 4.5). This provides direct evidence of a strand-by-strand formation and dissociation pathway of the interstrand i-motif and indicates that formation of the triple strands is the rate-limiting step. In contrast, the trimeric ions are abundant when the tetramolecular ions are subjected to collision-induced dissociation (CID) in the gas phase, suggesting different dissociation behaviors of the interstrand i-motif in the gas phase and in solution. Furthermore, hysteretic UV absorption melting and cooling curves reveal an irreversible dissociation and association kinetic process of the interstrand i-motif in solution.

  2. In vivo assessment of NS1-truncated influenza virus with a novel SLSYSINWRH motif as a self-adjuvanting live attenuated vaccine.

    PubMed

    Ngunjiri, John M; Ali, Ahmed; Boyaka, Prosper; Marcus, Philip I; Lee, Chang-Won

    2015-01-01

    Mutants of influenza virus that encode C-terminally truncated NS1 proteins (NS1-truncated mutants) characteristically induce high interferon responses. The dual activity of interferon in blocking virus replication and enhancing the development of adaptive immune responses makes these mutants promising as self-adjuvanting live-attenuated influenza vaccine (LAIV) candidates. Yet, among the NS1-truncated mutants, the length of NS1 is not directly correlated with the interferon-inducing efficiency, the level of attenuation, or effectiveness as LAIV. Using quantitative in vitro biologically active particle subpopulation analysis as a tool to identify potential LAIV candidates from a pool of NS1-truncated mutants, we previously predicted that a NS1-truncated mutant pc2, which was less effective as a LAIV in chickens, would be sufficiently effective as a LAIV in mammalian hosts. In this study, we confirmed that pc2 protected mice and pigs against heterologous virus challenge in terms of preventing clinical signs and reducing virus shedding. pc2 expresses a unique SLSYSINWRH motif at the C-terminus of its truncated NS1. Deletion of the SLSYSINWRH motif led to ~821-fold reduction in the peak yield of type I interferon induced in murine cells. Furthermore, replacement of the SLSYSINWRH motif with the wildtype MVKMDQAIMD sequence did not restore the interferon-inducing efficiency. The diminished interferon induction capacity in the absence of the SLSYSINWRH motif was similar to that observed in other mutants which are less effective LAIV candidates. Remarkably, pc2 induced 16-fold or more interferon in human lung and monkey kidney cells compared to the temperature-sensitive, cold-adapted Ann Arbor virus that is currently used as a master backbone for LAIVs such as FluMist. Although the mechanism by which the SLSYSINWRH motif regulates the vaccine properties of pc2 has not been elucidated, this motif has potential use in engineering self-adjuvanting NS1-truncated-based LAIVs.

  3. Solution Structure of the Cuz1 AN1 Zinc Finger Domain: An Exposed LDFLP Motif Defines a Subfamily of AN1 Proteins

    PubMed Central

    Sun, Zhen-Yu J.; Bhanu, Meera K.; Allan, Martin G.; Arthanari, Haribabu; Wagner, Gerhard; Hanna, John

    2016-01-01

    Zinc binding domains are common and versatile protein structural motifs that mediate diverse cellular functions. Among the many structurally distinct families of zinc finger (ZnF) proteins, the AN1 domain remains poorly characterized. Cuz1 is one of two AN1 ZnF proteins in the yeast S. cerevisiae, and is a stress-inducible protein that functions in protein degradation through direct interaction with the proteasome and Cdc48. Here we report the solution structure of the Cuz1 AN1 ZnF which reveals a compact C6H2 zinc-coordinating domain that resembles a two-finger hand holding a tri-helical clamp. A central phenylalanine residue sits between the two zinc-coordinating centers. The position of this phenylalanine, just before the penultimate zinc-chelating cysteine, is strongly conserved from yeast to man. This phenylalanine shows an exceptionally slow ring-flipping rate which likely contributes to the high rigidity and stability of the AN1 domain. In addition to the zinc-chelating residues, sequence analysis of Cuz1 indicates a second highly evolutionarily conserved motif. This LDFLP motif is shared with three human proteins—Zfand1, AIRAP, and AIRAP-L—the latter two of which share similar cellular functions with Cuz1. The LDFLP motif, while embedded within the zinc finger domain, is surface exposed, largely uninvolved in zinc chelation, and not required for the overall fold of the domain. The LDFLP motif was dispensable for Cuz1's major known functions, proteasome- and Cdc48-binding. These results provide the first structural characterization of the AN1 zinc finger domain, and suggest that the LDFLP motif may define a sub-family of evolutionarily conserved AN1 zinc finger proteins. PMID:27662200

  4. Multiple FLC haplotypes defined by independent cis-regulatory variation underpin life history diversity in Arabidopsis thaliana

    PubMed Central

    Filiault, Daniele; Box, Mathew S.; Kerdaffrec, Envel; van Oosterhout, Cock; Wilczek, Amity M.; Schmitt, Johanna; McMullan, Mark; Bergelson, Joy; Nordborg, Magnus

    2014-01-01

    Relating molecular variation to phenotypic diversity is a central goal in evolutionary biology. In Arabidopsis thaliana, FLOWERING LOCUS C (FLC) is a major determinant of variation in vernalization—the acceleration of flowering by prolonged cold. Here, through analysis of 1307 A. thaliana accessions, we identify five predominant FLC haplotypes defined by noncoding sequence variation. Genetic and transgenic experiments show that they are functionally distinct, varying in FLC expression level and rate of epigenetic silencing. Allelic heterogeneity at this single locus accounts for a large proportion of natural variation in vernalization that contributes to adaptation of A. thaliana. PMID:25035417

  5. Modular changes of cis-regulatory elements from two functional Pit1 genes in the duplicated genome of Cyprinus carpio.

    PubMed

    Kausel, G; Salazar, M; Castro, L; Vera, T; Romero, A; Muller, M; Figueroa, J

    2006-10-15

    The pituitary-specific transcription factor Pit1 is involved in its own regulation and in a network of transcriptional regulation of hypothalamo-hypophyseal factors including prolactin (PRL) and growth hormone (GH). In the ectotherm teleost Cyprinus carpio, Pit1 plays an important role in regulation of the adaptive response to seasonal environmental changes. Two Pit1 genes exist in carp, a tetraploid vertebrate, and transcripts of both genes were detected by RT-PCR analysis. Powerful comparative analyses of the 5'-flanking regions revealed copy-specific changes comprising modular functional units in the naturally evolved promoters. These include the precise replacement of four nucleotides around the transcription start site embedded in completely conserved regions extending upstream of the TATA-box, an additional transcription factor binding site in the 5'-UTR of gene-I and, instead, duplication of a 9 bp element in gene-II. Binding of nuclear factors was assessed by electrophoretic mobility shift assays using extracts from rat pituitary cells and carp pituitary. Binding was confirmed at one conserved Pit1, one conserved CREB and one consensus MTF1 site. Interestingly, two functional Pit1 sites and one putative MTF1 binding site are unique to the Pit1 gene-I. In situ hybridization experiments revealed that the expression of gene-I in winter carp was significantly stronger than that of gene-II. Our data suggest that the specific control elements identified in the proximal regulatory region are physiologically relevant for the function of the duplicated Pit1 genes in carp and highlight modular changes in the architecture of two Pit1 genes that have evolved for at least 12 million years in the same organism.

  6. Comprehensively evaluating cis-regulatory variation in the human prostate transcriptome by using gene-level allele-specific expression.

    PubMed

    Larson, Nicholas B; McDonnell, Shannon; French, Amy J; Fogarty, Zach; Cheville, John; Middha, Sumit; Riska, Shaun; Baheti, Saurabh; Nair, Asha A; Wang, Liang; Schaid, Daniel J; Thibodeau, Stephen N

    2015-06-01

    The identification of cis-acting regulatory variation in primary tissues has the potential to elucidate the genetic basis of complex traits and further our understanding of transcriptomic diversity across cell types. Expression quantitative trait locus (eQTL) association analysis using RNA sequencing (RNA-seq) data can improve upon the detection of cis-acting regulatory variation by leveraging allele-specific expression (ASE) patterns in association analysis. Here, we present a comprehensive evaluation of cis-acting eQTLs by analyzing RNA-seq gene-expression data and genome-wide high-density genotypes from 471 samples of normal primary prostate tissue. Using statistical models that integrate ASE information, we identified extensive cis-eQTLs across the prostate transcriptome and found that approximately 70% of expressed genes corresponded to a significant eQTL at a gene-level false-discovery rate of 0.05. Overall, cis-eQTLs were heavily concentrated near the transcription start and stop sites of affected genes, and effects were negatively correlated with distance. We identified multiple instances of cis-acting co-regulation by using phased genotype data and discovered 233 SNPs as the most strongly associated eQTLs for more than one gene. We also noted significant enrichment (25/50, p = 2E-5) of previously reported prostate cancer risk SNPs in prostate eQTLs. Our results illustrate the benefit of assessing ASE data in cis-eQTL analyses by showing better reproducibility of prior eQTL findings than of eQTL mapping based on total expression alone. Altogether, our analysis provides extensive functional context of thousands of SNPs in prostate tissue, and these results will be of critical value in guiding studies examining disease of the human prostate.

  7. A cis-regulatory antisense RNA represses translation in Vibrio cholerae through extensive complementarity and proximity to the target locus.