Science.gov

Sample records for conserved sequence motif

  1. BlockLogo: visualization of peptide and sequence motif conservation.

    PubMed

    Olsen, Lars Rønn; Kudahl, Ulrich Johan; Simon, Christian; Sun, Jing; Schönbach, Christian; Reinherz, Ellis L; Zhang, Guang Lan; Brusic, Vladimir

    2013-12-31

    BlockLogo is a web-server application for the visualization of protein and nucleotide fragments, continuous protein sequence motifs, and discontinuous sequence motifs using calculation of block entropy from multiple sequence alignments. The user input consists of a multiple sequence alignment, selection of motif positions, type of sequence, and output format definition. The output has BlockLogo along with the sequence logo, and a table of motif frequencies. We deployed BlockLogo as an online application and have demonstrated its utility through examples that show visualization of T-cell epitopes and B-cell epitopes (both continuous and discontinuous). Our additional example shows a visualization and analysis of structural motifs that determine the specificity of peptide binding to HLA-DR molecules. The BlockLogo server also employs selected experimentally validated prediction algorithms to enable on-the-fly prediction of MHC binding affinity to 15 common HLA class I and class II alleles as well as visual analysis of discontinuous epitopes from multiple sequence alignments. It enables the visualization and analysis of structural and functional motifs that are usually described as regular expressions. It provides a compact view of discontinuous motifs composed of distant positions within biological sequences. BlockLogo is available at: http://research4.dfci.harvard.edu/cvc/blocklogo/ and http://met-hilab.bu.edu/blocklogo/. PMID:24001880
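
    As a rough illustration of the block entropy idea in this record, the sketch below treats the residues at the selected alignment columns as a single "block" per sequence and computes the Shannon entropy of the block distribution. The function names and the toy alignment are hypothetical, and this is one plausible reading of the abstract, not BlockLogo's actual implementation.

```python
from collections import Counter
from math import log2


def extract_blocks(alignment, positions):
    """Join the residues found at the chosen (0-based) columns of each aligned sequence."""
    return ["".join(seq[p] for p in positions) for seq in alignment]


def block_frequencies(alignment, positions):
    """Relative frequency of every distinct block (the 'table of motif frequencies')."""
    blocks = extract_blocks(alignment, positions)
    return {block: n / len(blocks) for block, n in Counter(blocks).items()}


def block_entropy(alignment, positions):
    """Shannon entropy, in bits, of the block distribution at the chosen columns."""
    freqs = block_frequencies(alignment, positions)
    return -sum(f * log2(f) for f in freqs.values())


# Hypothetical toy alignment; columns 0 and 2 form a discontinuous motif.
toy_alignment = ["ACDEF", "ACDEF", "ACGEF", "TCDEF"]
print(block_frequencies(toy_alignment, [0, 2]))
print(block_entropy(toy_alignment, [0, 2]))
```

    Because the positions are passed as an arbitrary list, the same function covers continuous and discontinuous motifs.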

  2. The BsaHI restriction-modification system: Cloning, sequencing and analysis of conserved motifs

    PubMed Central

    Neely, Robert K; Roberts, Richard J

    2008-01-01

    Background Restriction and modification enzymes typically recognise short DNA sequences of between two and eight bases in length. Understanding the mechanism of this recognition represents a significant challenge that we begin to address for the BsaHI restriction-modification system, which recognises the six base sequence GRCGYC. Results The DNA sequences of the genes for the BsaHI methyltransferase, bsaHIM, and restriction endonuclease, bsaHIR, have been determined (GenBank accession #EU386360), cloned and expressed in E. coli. Both the restriction endonuclease and methyltransferase enzymes share significant similarity with a group of 6 other enzymes comprising the restriction-modification systems HgiDI and HgiGI and the putative HindVP, NlaCORFDP, NpuORFC228P and SplZORFNP restriction-modification systems. A sequence alignment of these homologues shows that their amino acid sequences are largely conserved and highlights several motifs of interest. We target one such conserved motif, reading SPERRFD, at the C-terminal end of the bsaHIR gene. A mutational analysis of these amino acids indicates that the motif is crucial for enzymatic activity. Sequence alignment of the methyltransferase gene reveals a short motif within the target recognition domain that is conserved among enzymes recognising the same sequences. Thus, this motif may be used as a diagnostic tool to define the recognition sequences of the cytosine C5 methyltransferases. Conclusion We have cloned and sequenced the BsaHI restriction and modification enzymes. We have identified a region of the R. BsaHI enzyme that is crucial for its activity. Analysis of the amino acid sequence of the BsaHI methyltransferase enzyme led us to propose two new motifs that can be used in the diagnosis of the recognition sequence of the cytosine C5-methyltransferases. PMID:18479503
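
    The recognition sequence GRCGYC uses IUPAC degenerate codes (R = A or G, Y = C or T). A minimal sketch of how such a site can be located in a DNA string follows; the helper names and the test sequence are hypothetical and are not taken from the paper.

```python
import re

# Subset of the IUPAC nucleotide codes, mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine
    "Y": "[CT]",    # pyrimidine
    "N": "[ACGT]",  # any base
}


def site_to_regex(site):
    """Translate a degenerate recognition sequence into a regular expression."""
    return "".join(IUPAC[base] for base in site.upper())


def find_sites(dna, site="GRCGYC"):
    """Return 0-based start positions of every (possibly overlapping) match."""
    pattern = re.compile("(?=" + site_to_regex(site) + ")")
    return [m.start() for m in pattern.finditer(dna.upper())]


# Hypothetical sequence carrying two of the four possible GRCGYC variants.
print(find_sites("TTGACGTCAAGGCGCCTT"))
```

    The zero-width lookahead keeps overlapping sites from being skipped, which matters for short recognition sequences.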

  3. A search for small noncoding RNAs in Staphylococcus aureus reveals a conserved sequence motif for regulation

    PubMed Central

    Geissmann, Thomas; Chevalier, Clément; Cros, Marie-Josée; Boisset, Sandrine; Fechter, Pierre; Noirot, Céline; Schrenzel, Jacques; François, Patrice; Vandenesch, François; Gaspin, Christine; Romby, Pascale

    2009-01-01

    Bioinformatic analysis of the intergenic regions of Staphylococcus aureus predicted multiple regulatory regions. From this analysis, we characterized 11 novel noncoding RNAs (RsaA-K) that are expressed in several S. aureus strains under different experimental conditions. Many of them accumulate in the late-exponential phase of growth. All ncRNAs are stable and their expression is Hfq-independent. The transcription of several of them is regulated by the alternative sigma B factor (RsaA, D and F) while the expression of RsaE is agrA-dependent. Six of these ncRNAs are specific to S. aureus, four are conserved in other Staphylococci, and RsaE is also present in Bacillaceae. Transcriptomic and proteomic analysis indicated that RsaE regulates the synthesis of proteins involved in various metabolic pathways. Phylogenetic analysis combined with RNA structure probing, searches for RsaE-mRNA base pairing, and toeprinting assays indicate that a conserved and unpaired UCCC sequence motif of RsaE binds to target mRNAs and prevents the formation of the ribosomal initiation complex. This study unexpectedly shows that most of the novel ncRNAs carry the conserved C-rich motif, suggesting that they are members of a class of ncRNAs that target mRNAs by a shared mechanism. PMID:19786493

  4. An approach to delineate primers for a group of poorly conserved sequences incorporating the common motif region.

    PubMed

    Sahu, Mousumi; Sahu, Jagajjit; Sahoo, Smita; Dehury, Budheswar; Sarma, Kishore; Sarmah, Ranjan; Sen, Priyabrata; Modi, Mahendra Kumar; Barooah, Madhumita

    2012-01-01

    Glutathione synthetase (gshB) has previously been reported to confer tolerance to acidic soil conditions in Rhizobium species. Cloning the gene coding for this enzyme requires the design of suitable primer sets, which in turn depends on identifying regions of high sequence similarity in multiple global alignments. In this experiment, a group of homologous gene sequences related to the gshB gene (accession no: gi-86355669:327589-328536) of Rhizobium etli CFN 42 was extracted from NCBI nucleotide sequence databases using BLASTN and analyzed for the design of degenerate primers. However, the T-coffee multiple global alignment did not show any conserved block in this sequence set from which primers could be designed. Therefore, we attempted to locate common motif regions based on multiple local alignments, employing the MEME algorithm supported by MAST and Primer3. The results revealed some common motif regions that enabled us to design primer sets for related gshB gene sequences. The results will be validated in the wet lab. PMID:22419837

  5. Mining protein sequences for motifs.

    PubMed

    Narasimhan, Giri; Bu, Changsong; Gao, Yuan; Wang, Xuning; Xu, Ning; Mathee, Kalai

    2002-01-01

    We use methods from Data Mining and Knowledge Discovery to design an algorithm for detecting motifs in protein sequences. The algorithm assumes that a motif is constituted by the presence of a "good" combination of residues in appropriate locations of the motif. The algorithm attempts to compile such good combinations into a "pattern dictionary" by processing an aligned training set of protein sequences. The dictionary is subsequently used to detect motifs in new protein sequences. Statistical significance of the detection results is ensured by statistically determining the various parameters of the algorithm. Based on this approach, we have implemented a program called GYM. The Helix-Turn-Helix motif was used as a model system on which to test our program. The program was also extended to detect Homeodomain motifs. The detection results for the two motifs compare favorably with existing programs. In addition, the GYM program provides a lot of useful information about a given protein sequence. PMID:12487759

  6. [Conserved motifs in voltage sensing proteins].

    PubMed

    Wang, Chang-He; Xie, Zhen-Li; Lv, Jian-Wei; Yu, Zhi-Dan; Shao, Shu-Li

    2012-08-25

    This paper aimed to study conserved motifs of voltage sensing proteins (VSPs) and establish a voltage sensing model. All VSPs were collected from the Uniprot database using a comprehensive keyword search followed by manual curation, and the results indicated that there are only two types of known VSPs, voltage gated ion channels and voltage dependent phosphatases. All the VSPs have a common domain of four helical transmembrane segments (TMS, S1-S4), which constitute the voltage sensing module of the VSPs. The S1 segment was shown to be responsible for membrane targeting and insertion of these proteins, while S2-S4 segments, which can sense membrane potential, for protein properties. Conserved motifs/residues of each TMS and their functional significance were identified using profile-to-profile sequence alignments. Conserved motifs in these four segments are strikingly similar for all VSPs; in particular, the conserved motif [RK]-X(2)-R-X(2)-R-X(2)-[RK] was present in all the S4 segments, with positively charged arginine (R) alternating with two hydrophobic or uncharged residues. Movement of these arginines across the membrane electric field is the core mechanism by which the VSPs detect changes in membrane potential. The negatively charged aspartate (D) in the S3 segment is universally conserved in all the VSPs, suggesting that the aspartate residue may be involved in voltage sensing properties of VSPs as well as the electrostatic interactions with the positively charged residues in the S4 segment, which may enhance the thermodynamic stability of the S4 segments in plasma membrane. PMID:22907298
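
    The S4 motif [RK]-X(2)-R-X(2)-R-X(2)-[RK] is written in a PROSITE-like pattern syntax. The sketch below converts that small subset of the syntax into a regular expression and scans a sequence with it; the converter, its function names, and the example peptide are hypothetical illustrations, not the authors' profile-to-profile method.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern (elements joined by '-') to a regex.

    Supported elements: single residues, [..] allowed sets, {..} forbidden sets,
    the wildcard x/X, and an optional repeat count written as (n).
    """
    parts = []
    for element in pattern.split("-"):
        match = re.fullmatch(r"(.+?)(?:\((\d+)\))?", element)
        core, count = match.group(1), match.group(2)
        if core.upper() == "X":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # any residue except these
        parts.append(core + ("{%s}" % count if count else ""))
    return "".join(parts)


def find_motif(sequence, pattern):
    """Return (start, matched_text) for every, possibly overlapping, occurrence."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


S4_MOTIF = "[RK]-X(2)-R-X(2)-R-X(2)-[RK]"
print(prosite_to_regex(S4_MOTIF))
print(find_motif("AAKLVRIFRLMKAA", S4_MOTIF))  # hypothetical peptide
```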

  7. Distance conservation of transcriptional and splicing regulatory motifs

    NASA Astrophysics Data System (ADS)

    Lu, Jun; Ding, Changjiang

    2012-09-01

    Distance conservation is a new kind of genomic evolutionary conservation. Transcriptional and splicing regulatory k-mer motifs are functionally important DNA sequence elements. We demonstrated that the distance between pairs of these k-mers in genomic sequences is evolutionarily conserved. This kind of conservation is not based on the strict location of bases in genome sequences, and does not depend on an excess frequency of occurrence of k-mers. By utilizing the conservation of k-mer distance, it is possible to design a non-alignment-based approach to quickly identify transcriptional or splicing regulatory motifs on a genome-wide scale. In this paper we summarize our previous studies on distance conservation, introduce the method of distance conservation, and indicate the prospects of its application.

  8. A Gibbs sampler for motif detection in phylogenetically close sequences

    NASA Astrophysics Data System (ADS)

    Siddharthan, Rahul; van Nimwegen, Erik; Siggia, Eric

    2004-03-01

    Genes are regulated by transcription factors that bind to DNA upstream of genes and recognize short conserved "motifs" in a random intergenic "background". Motif-finders such as the Gibbs sampler compare the probability of these short sequences being represented by "weight matrices" to the probability of their arising from the background "null model", and explore this space (analogous to a free-energy landscape). But closely related species may show conservation not because of functional sites but simply because they have not had sufficient time to diverge, so conventional methods will fail. We introduce a new Gibbs sampler algorithm that accounts for common ancestry when searching for motifs, while requiring minimal "prior" assumptions on the number and types of motifs, assessing the significance of detected motifs by "tracking" clusters that stay together. We apply this scheme to motif detection in sporulation-cycle genes in the yeast S. cerevisiae, using recent sequences of other closely-related Saccharomyces species.
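
    The comparison at the heart of such motif finders, weight matrix versus background null model, can be sketched as a log-odds score. The code below shows only that scoring step under simple assumptions (a pseudocount of 1, a uniform background, hypothetical example sites); it is not the phylogeny-aware sampler described in the abstract.

```python
from math import log2


def build_weight_matrix(sites, pseudocount=1.0, alphabet="ACGT"):
    """Estimate per-position base probabilities from aligned sites of equal length."""
    total = len(sites) + pseudocount * len(alphabet)
    matrix = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        matrix.append({b: (column.count(b) + pseudocount) / total for b in alphabet})
    return matrix


def log_odds(window, matrix, background):
    """log2 of P(window | weight matrix) / P(window | background)."""
    return sum(log2(matrix[i][base] / background[base])
               for i, base in enumerate(window))


# Hypothetical binding sites and a uniform background model.
example_sites = ["TGACTC", "TGACTA", "TGAGTC"]
uniform_bg = {base: 0.25 for base in "ACGT"}
wm = build_weight_matrix(example_sites)
print(log_odds("TGACTC", wm, uniform_bg))  # positive: better explained by the motif
print(log_odds("ACCTGA", wm, uniform_bg))  # negative: better explained by background
```

    A Gibbs sampler would repeatedly re-estimate the matrix while resampling site positions in proportion to scores of this kind.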

  9. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    SciTech Connect

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by patterns in

  10. Comparison of SIV and HIV-1 genomic RNA structures reveals impact of sequence evolution on conserved and non-conserved structural motifs.

    PubMed

    Pollom, Elizabeth; Dang, Kristen K; Potter, E Lake; Gorelick, Robert J; Burch, Christina L; Weeks, Kevin M; Swanstrom, Ronald

    2013-01-01

    RNA secondary structure plays a central role in the replication and metabolism of all RNA viruses, including retroviruses like HIV-1. However, structures with known function represent only a fraction of the secondary structure reported for HIV-1(NL4-3). One tool to assess the importance of RNA structures is to examine their conservation over evolutionary time. To this end, we used SHAPE to model the secondary structure of a second primate lentiviral genome, SIVmac239, which shares only 50% sequence identity at the nucleotide level with HIV-1(NL4-3). Only about half of the paired nucleotides are paired in both genomic RNAs and, across the genome, just 71 base pairs form with the same pairing partner in both genomes. On average the RNA secondary structure is thus evolving at a much faster rate than the sequence. Structure at the Gag-Pro-Pol frameshift site is maintained but in a significantly altered form, while the impact of selection for maintaining a protein binding interaction can be seen in the conservation of pairing partners in the small RRE stems where Rev binds. Structures that are conserved between SIVmac239 and HIV-1(NL4-3) also occur at the 5' polyadenylation sequence, in the plus strand primer sites, PPT and cPPT, and in the stem-loop structure that includes the first splice acceptor site. The two genomes are adenosine-rich and cytidine-poor. The structured regions are enriched in guanosines, while unpaired regions are enriched in adenosines, and functionally important structures have stronger base pairing than nonconserved structures. We conclude that much of the secondary structure is the result of fortuitous pairing in a metastable state that reforms during sequence evolution. However, secondary structure elements with important function are stabilized by higher guanosine content that allows regions of structure to persist as sequence evolution proceeds, and, within the confines of selective pressure, allows structures to evolve. PMID:23593004

  11. Detecting correlations among functional-sequence motifs

    NASA Astrophysics Data System (ADS)

    Pirino, Davide; Rigosa, Jacopo; Ledda, Alice; Ferretti, Luca

    2012-06-01

    Sequence motifs are words of nucleotides in DNA with biological functions, e.g., gene regulation. Identification of such words proceeds through rejection of Markov models on the expected motif frequency along the genome. Additional biological information can be extracted from the correlation structure among patterns of motif occurrences. In this paper a log-linear multivariate intensity Poisson model is estimated via expectation maximization on a set of motifs along the genome of E. coli K12. The proposed approach allows for excitatory as well as inhibitory interactions among motifs and between motifs and other genomic features like gene occurrences. Our findings confirm previous stylized facts about such types of interactions and shed new light on genome-maintenance functions of some particular motifs. We expect these methods to be applicable to a wider set of genomic features.

  12. Detecting correlations among functional-sequence motifs.

    PubMed

    Pirino, Davide; Rigosa, Jacopo; Ledda, Alice; Ferretti, Luca

    2012-06-01

    Sequence motifs are words of nucleotides in DNA with biological functions, e.g., gene regulation. Identification of such words proceeds through rejection of Markov models on the expected motif frequency along the genome. Additional biological information can be extracted from the correlation structure among patterns of motif occurrences. In this paper a log-linear multivariate intensity Poisson model is estimated via expectation maximization on a set of motifs along the genome of E. coli K12. The proposed approach allows for excitatory as well as inhibitory interactions among motifs and between motifs and other genomic features like gene occurrences. Our findings confirm previous stylized facts about such types of interactions and shed new light on genome-maintenance functions of some particular motifs. We expect these methods to be applicable to a wider set of genomic features. PMID:23005179

  13. QGRS-Conserve: a computational method for discovering evolutionarily conserved G-quadruplex motifs

    PubMed Central

    2014-01-01

    Background Nucleic acids containing guanine tracts can form quadruplex structures via non-Watson-Crick base pairing. Formation of G-quadruplexes is associated with the regulation of important biological functions such as transcription, genetic instability, DNA repair, DNA replication, epigenetic mechanisms, regulation of translation, and alternative splicing. G-quadruplexes play important roles in human diseases and are being considered as targets for a variety of therapies. Identification of functional G-quadruplexes and the study of their overall distribution in genomes and transcriptomes is an important pursuit. Traditional computational methods map sequence motifs capable of forming G-quadruplexes but have difficulty in distinguishing motifs that occur by chance from ones which fold into G-quadruplexes. Results We present Quadruplex forming ‘G’-rich sequences (QGRS)-Conserve, a computational method that calculates motif conservation across exomes and supports filtering, giving researchers more precise ways to study G-quadruplex distribution patterns. Our method quantitatively evaluates conservation between quadruplexes found in homologous nucleotide sequences based on several motif structural characteristics. QGRS-Conserve also efficiently manages overlapping G-quadruplex sequences such that the resulting datasets can be analyzed effectively. Conclusions We have applied QGRS-Conserve to identify a large number of G-quadruplex motifs in the human exome conserved across several mammalian and non-mammalian species. We have successfully identified multiple homologs of many previously published G-quadruplexes that play post-transcriptional regulatory roles in human genes. Preliminary large-scale analysis identified many homologous G-quadruplexes in the 5′- and 3′-untranslated regions of mammalian species. An expectedly smaller set of G-quadruplex motifs was found to be conserved across larger phylogenetic distances. QGRS-Conserve provides means
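
    The motif-mapping step that the abstract attributes to traditional methods can be sketched with a regular expression for four equal-length G-tracts separated by three loops. The tract length of 3, the loop limit of 7, the function name, and the test sequence are all assumptions chosen for illustration; the conservation scoring performed by QGRS-Conserve is not reproduced here.

```python
import re


def find_qgrs(seq, tract=3, max_loop=7):
    """Find non-overlapping putative G-quadruplex motifs in a DNA or RNA string.

    A motif is modelled as four runs of `tract` guanines separated by three
    loops of 1..max_loop nucleotides each (shortest loops are tried first).
    """
    g_run = "G{%d}" % tract
    loop = "[ACGTU]{1,%d}?" % max_loop
    pattern = re.compile(g_run + loop + g_run + loop + g_run + loop + g_run)
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]


# Hypothetical sequence containing TTA-spaced G-tracts.
print(find_qgrs("AAGGGTTAGGGTTAGGGTTAGGGAA"))
```

    A purely sequence-based scan like this cannot tell chance matches from motifs that actually fold, which is the gap the conservation analysis is meant to address.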

  14. Detecting seeded motifs in DNA sequences.

    PubMed

    Pizzi, Cinzia; Bortoluzzi, Stefania; Bisognin, Andrea; Coppe, Alessandro; Danieli, Gian Antonio

    2005-01-01

    The problem of detecting DNA motifs with functional relevance in real biological sequences is difficult due to a number of biological, statistical and computational issues and also because of the lack of knowledge about the structure of searched patterns. Many algorithms are implemented in fully automated processes, which are often based upon a guess of input parameters from the user at the very first step. In this paper, we present a novel method for the detection of seeded DNA motifs, composed of regions with a different extent of variability. The method is based on a multi-step approach, which was implemented in a motif searching web tool (MOST). Overrepresented exact patterns are extracted from input sequences and clustered to produce motif core regions, which are then extended and scored to generate seeded motifs. The combination of automated pattern discovery algorithms and different display tools for the evaluation and selection of results at several analysis steps can potentially lead to much more meaningful results than complete automation can produce. Experimental results on different real yeast and human datasets proved the methodology to be a promising solution for finding seeded motifs. The MOST web tool is freely available at http://telethon.bio.unipd.it/bioinfo/MOST. PMID:16141193

  15. Detecting seeded motifs in DNA sequences

    PubMed Central

    Pizzi, Cinzia; Bortoluzzi, Stefania; Bisognin, Andrea; Coppe, Alessandro; Danieli, Gian Antonio

    2005-01-01

    The problem of detecting DNA motifs with functional relevance in real biological sequences is difficult due to a number of biological, statistical and computational issues and also because of the lack of knowledge about the structure of searched patterns. Many algorithms are implemented in fully automated processes, which are often based upon a guess of input parameters from the user at the very first step. In this paper, we present a novel method for the detection of seeded DNA motifs, composed of regions with a different extent of variability. The method is based on a multi-step approach, which was implemented in a motif searching web tool (MOST). Overrepresented exact patterns are extracted from input sequences and clustered to produce motif core regions, which are then extended and scored to generate seeded motifs. The combination of automated pattern discovery algorithms and different display tools for the evaluation and selection of results at several analysis steps can potentially lead to much more meaningful results than complete automation can produce. Experimental results on different real yeast and human datasets proved the methodology to be a promising solution for finding seeded motifs. The MOST web tool is freely available at . PMID:16141193

  16. Discovering Motifs in Biological Sequences Using the Micron Automata Processor.

    PubMed

    Roy, Indranil; Aluru, Srinivas

    2016-01-01

    Finding approximately conserved sequences, called motifs, across multiple DNA or protein sequences is an important problem in computational biology. In this paper, we consider the (l, d) motif search problem of identifying one or more motifs of length l present in at least q of the n given sequences, with each occurrence differing from the motif in at most d substitutions. The problem is known to be NP-complete, and the largest solved instance reported to date is (26,11). We propose a novel algorithm for the (l, d) motif search problem using streaming execution over a large set of non-deterministic finite automata (NFA). This solution is designed to take advantage of the Micron Automata Processor, a new technology close to deployment that can simultaneously execute multiple NFA in parallel. We demonstrate the capability for solving much larger instances of the (l, d) motif search problem using the resources available within a single automata processor board, by estimating run-times for problem instances (39,18) and (40,17). The paper serves as a useful guide to solving problems using this new accelerator technology. PMID:26886735
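
    To make the (l, d) problem statement concrete, here is a brute-force sketch that enumerates every candidate of length l and keeps those occurring, with at most d substitutions, in at least q sequences. It is exponential in l and only usable on tiny inputs; it illustrates the problem definition and is unrelated to the NFA-based algorithm of the paper. The function names and sequences are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs(motif, seq, d):
    """True if some window of seq is within d substitutions of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def ld_motif_search(seqs, l, d, q=None, alphabet="ACGT"):
    """Return every length-l motif found (within d mismatches) in at least q sequences."""
    q = len(seqs) if q is None else q
    hits = []
    for candidate in product(alphabet, repeat=l):  # all |alphabet|**l candidates
        motif = "".join(candidate)
        if sum(occurs(motif, s, d) for s in seqs) >= q:
            hits.append(motif)
    return hits


# Hypothetical input: ACGT is planted with at most one substitution per sequence.
toy_seqs = ["TTACGTAA", "GGACCTCC", "CAACGAGT"]
print("ACGT" in ld_motif_search(toy_seqs, l=4, d=1))
```

    The candidate space grows as 4^l for DNA, which is why instances such as (26,11) are considered hard.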

  17. Fast, Sensitive Discovery of Conserved Genome-Wide Motifs

    PubMed Central

    Ihuegbu, Nnamdi E.; Buhler, Jeremy

    2012-01-01

    Regulatory sites that control gene expression are essential to the proper functioning of cells, and identifying them is critical for modeling regulatory networks. We have developed Magma (Multiple Aligner of Genomic Multiple Alignments), a software tool for multiple species, multiple gene motif discovery. Magma identifies putative regulatory sites that are conserved across multiple species and occur near multiple genes throughout a reference genome. Magma takes as input multiple alignments that can include gaps. It uses efficient clustering methods that make it about 70 times faster than PhyloNet, a previous program for this task, with slightly greater sensitivity. We ran Magma on all non-coding DNA conserved between Caenorhabditis elegans and five additional species, about 70 Mbp in total, in <4 h. We obtained 2,309 motifs with lengths of 6–20 bp, each occurring at least 10 times throughout the genome, which collectively covered about 566 kbp of the genomes, approximately 0.8% of the input. Predicted sites occurred in all types of non-coding sequence but were especially enriched in the promoter regions. Comparisons to several experimental datasets show that Magma motifs correspond to a variety of known regulatory motifs. PMID:22300316

  18. Genomic analysis of membrane protein families: abundance and conserved motifs

    PubMed Central

    Liu, Yang; Engelman, Donald M; Gerstein, Mark

    2002-01-01

    Background Polytopic membrane proteins can be related to each other on the basis of the number of transmembrane helices and sequence similarities. Building on the Pfam classification of protein domain families, and using transmembrane-helix prediction and sequence-similarity searching, we identified a total of 526 well-characterized membrane protein families in 26 recently sequenced genomes. To this we added a clustering of a number of predicted but unclassified membrane proteins, resulting in a total of 637 membrane protein families. Results Analysis of the occurrence and composition of these families revealed several interesting trends. The number of assigned membrane protein domains has an approximately linear relationship to the total number of open reading frames (ORFs) in 26 genomes studied. Caenorhabditis elegans is an apparent outlier, because of its high representation of seven-span transmembrane (7-TM) chemoreceptor families. In all genomes, including that of C. elegans, the number of distinct membrane protein families has a logarithmic relation to the number of ORFs. Glycine, proline, and tyrosine locations tend to be conserved in transmembrane regions within families, whereas isoleucine, valine, and methionine locations are relatively mutable. Analysis of motifs in putative transmembrane helices reveals that GxxxG and GxxxxxxG (which can be written GG4 and GG7, respectively; see Materials and methods) are among the most prevalent. This was noted in earlier studies; we now find these motifs are particularly well conserved in families, however, especially those corresponding to transporters, symporters, and channels. Conclusions We carried out a genome-wide analysis on patterns of the classified polytopic membrane protein families and analyzed the distribution of conserved amino acids and motifs in the transmembrane helix regions in these families. PMID:12372142
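
    In the GG4/GG7 notation, the number is the spacing between the two glycines, so GG4 (GxxxG) has three intervening residues and GG7 (GxxxxxxG) has six. A small sketch of scanning a helix sequence for such pairs follows; the function name and the example segment are hypothetical and not drawn from the paper's dataset.

```python
import re


def find_gg_motifs(helix, spacing):
    """Return 0-based positions i where glycines occur at both i and i + spacing.

    spacing=4 corresponds to GxxxG (GG4); spacing=7 to GxxxxxxG (GG7).
    """
    pattern = re.compile("(?=G.{%d}G)" % (spacing - 1))
    return [m.start() for m in pattern.finditer(helix.upper())]


segment = "LIIFGVMAGVIGTIL"  # hypothetical transmembrane segment
print(find_gg_motifs(segment, 4))
print(find_gg_motifs(segment, 7))
```

    The lookahead allows overlapping pairs, so a run such as GxxxGxxxG is reported at both starting glycines.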

  19. Sequence-Based Screening for Rare Enzymes: New Insights into the World of AMDases Reveal a Conserved Motif and 58 Novel Enzymes Clustering in Eight Distinct Families

    PubMed Central

    Maimanakos, Janine; Chow, Jennifer; Gaßmeyer, Sarah K.; Güllert, Simon; Busch, Florian; Kourist, Robert; Streit, Wolfgang R.

    2016-01-01

    Arylmalonate Decarboxylases (AMDases, EC 4.1.1.76) are very rare and mostly underexplored enzymes. Currently only four known and biochemically characterized representatives exist. However, their ability to decarboxylate α-disubstituted malonic acid derivatives to optically pure products without cofactors makes them attractive and promising candidates for use as biocatalysts in industrial processes. Until now, AMDases could not be separated from other members of the aspartate/glutamate racemase superfamily based on their gene sequences. Within this work, a search algorithm was developed that enables a reliable prediction of AMDase activity for potential candidates. Based on specific sequence patterns and screening methods, 58 novel AMDase candidate genes could be identified in this work. AMDases with the conserved sequence pattern of Bordetella bronchiseptica’s prototype appeared to be limited to the classes of Alpha-, Beta-, and Gamma-proteobacteria. Amino acid homologies and comparison of gene surrounding sequences enabled the classification of eight enzyme clusters. Particularly striking is the accumulation of genes coding for different transporters of the tripartite tricarboxylate transporters family, TRAP transporters and ABC transporters as well as genes coding for mandelate racemases/muconate lactonizing enzymes that might be involved in substrate uptake or degradation of AMDase products. Further, three novel AMDases were characterized which showed a high enantiomeric excess (>99%) of the (R)-enantiomer of flurbiprofen. These are the recombinant AmdA and AmdV from Variovorax sp. strains HH01 and HH02, which originate from soil, and AmdP from Polymorphum gilvum, found by a database search. Altogether our findings give new insights into the class of AMDases and reveal many previously unknown enzyme candidates with high potential for bioindustrial processes. PMID:27610105

  20. Sequence-Based Screening for Rare Enzymes: New Insights into the World of AMDases Reveal a Conserved Motif and 58 Novel Enzymes Clustering in Eight Distinct Families.

    PubMed

    Maimanakos, Janine; Chow, Jennifer; Gaßmeyer, Sarah K; Güllert, Simon; Busch, Florian; Kourist, Robert; Streit, Wolfgang R

    2016-01-01

    Arylmalonate Decarboxylases (AMDases, EC 4.1.1.76) are very rare and mostly underexplored enzymes. Currently only four known and biochemically characterized representatives exist. However, their ability to decarboxylate α-disubstituted malonic acid derivatives to optically pure products without cofactors makes them attractive and promising candidates for use as biocatalysts in industrial processes. Until now, AMDases could not be separated from other members of the aspartate/glutamate racemase superfamily based on their gene sequences. Within this work, a search algorithm was developed that enables a reliable prediction of AMDase activity for potential candidates. Based on specific sequence patterns and screening methods, 58 novel AMDase candidate genes could be identified in this work. AMDases with the conserved sequence pattern of Bordetella bronchiseptica's prototype appeared to be limited to the classes of Alpha-, Beta-, and Gamma-proteobacteria. Amino acid homologies and comparison of gene surrounding sequences enabled the classification of eight enzyme clusters. Particularly striking is the accumulation of genes coding for different transporters of the tripartite tricarboxylate transporters family, TRAP transporters and ABC transporters as well as genes coding for mandelate racemases/muconate lactonizing enzymes that might be involved in substrate uptake or degradation of AMDase products. Further, three novel AMDases were characterized which showed a high enantiomeric excess (>99%) of the (R)-enantiomer of flurbiprofen. These are the recombinant AmdA and AmdV from Variovorax sp. strains HH01 and HH02, which originate from soil, and AmdP from Polymorphum gilvum, found by a database search. Altogether our findings give new insights into the class of AMDases and reveal many previously unknown enzyme candidates with high potential for bioindustrial processes. PMID:27610105

  1. Identification of imine reductase-specific sequence motifs.

    PubMed

    Fademrecht, Silvia; Scheller, Philipp N; Nestl, Bettina M; Hauer, Bernhard; Pleiss, Jürgen

    2016-05-01

    Chiral amines are valuable building blocks for the production of a variety of pharmaceuticals, agrochemicals and other specialty chemicals. Only recently, imine reductases (IREDs) were discovered which catalyze the stereoselective reduction of imines to chiral amines. Although several IREDs were biochemically characterized in the last few years, knowledge of the reaction mechanism and the molecular basis of substrate specificity and stereoselectivity is limited. To gain further insights into the sequence-function relationships, the Imine Reductase Engineering Database (www.IRED.BioCatNet.de) was established and a systematic analysis of 530 putative IREDs was performed. A standard numbering scheme based on R-IRED-Sk was introduced to facilitate the identification and communication of structurally equivalent positions in different proteins. A conservation analysis revealed a highly conserved cofactor binding region and a predominantly hydrophobic substrate binding cleft. Two IRED-specific motifs were identified, the cofactor binding motif GLGxMGx5 [ATS]x4 Gx4 [VIL]WNR[TS]x2 [KR] and the active site motif Gx[DE]x[GDA]x[APS]x3 {K}x[ASL]x[LMVIAG]. Our results indicate a preference toward NADPH for all IREDs and explain why, despite their sequence similarity to β-hydroxyacid dehydrogenases (β-HADs), no conversion of β-hydroxyacids has been observed. Superfamily-specific conservations were investigated to explore the molecular basis of their stereopreference. Based on our analysis and previous experimental results on IRED mutants, an exclusive role of standard position 187 for stereoselectivity is excluded. Alternatively, two standard positions 139 and 194 were identified which are superfamily-specifically conserved and differ in R- and S-selective enzymes. Proteins 2016; 84:600-610. © 2016 Wiley Periodicals, Inc. PMID:26857686
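
    The two motifs quoted above can be turned into regular expressions for scanning protein sequences. The following is a minimal sketch, not the authors' tooling. It assumes that "x" means any residue, that a number after an element is a repeat count (the subscripts appear to have been flattened in this listing), that "[...]" lists allowed residues, and that "{...}" lists excluded residues. The function name motif_to_regex is hypothetical.

```python
import re

# Assumed notation: x = any residue, trailing digits = repeat count,
# [ABC] = one of A/B/C, {K} = any residue except K. Spaces are ignored.
_TOKEN = re.compile(r"\[([A-Z]+)\]|\{([A-Z]+)\}|(x)|([A-Z])")
_COUNT = re.compile(r"\d+")


def motif_to_regex(motif: str) -> str:
    """Translate a compact motif such as 'Gx[DE]x3{K}' into a regex string."""
    motif = motif.replace(" ", "")
    pieces, pos = [], 0
    while pos < len(motif):
        m = _TOKEN.match(motif, pos)
        if m is None:
            raise ValueError(f"unexpected character {motif[pos]!r} at {pos}")
        allowed, excluded, wildcard, literal = m.groups()
        if allowed:
            piece = f"[{allowed}]"
        elif excluded:
            piece = f"[^{excluded}]"
        elif wildcard:
            piece = "."
        else:
            piece = literal
        pos = m.end()
        # An optional repeat count directly follows the element.
        n = _COUNT.match(motif, pos)
        if n:
            piece += "{" + n.group() + "}"
            pos = n.end()
        pieces.append(piece)
    return "".join(pieces)


active_site = motif_to_regex("Gx[DE]x[GDA]x[APS]x3 {K}x[ASL]x[LMVIAG]")
cofactor = motif_to_regex("GLGxMGx5 [ATS]x4 Gx4 [VIL]WNR[TS]x2 [KR]")
```

    The resulting strings can be passed to re.search or re.finditer over a protein sequence; a match only indicates that the pattern is present, not that the protein is an IRED.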

  2. CodingMotif: exact determination of overrepresented nucleotide motifs in coding sequences

    PubMed Central

    2012-01-01

    Background It has been increasingly appreciated that coding sequences harbor regulatory sequence motifs in addition to encoding for protein. These sequence motifs are expected to be overrepresented in nucleotide sequences bound by a common protein or small RNA. However, detecting overrepresented motifs has been difficult because of interference by constraints at the protein level. Sampling-based approaches to solve this problem based on codon-shuffling have been limited to exploring only an infinitesimal fraction of the sequence space and by their use of parametric approximations. Results We present a novel O(N (log N)^2)-time algorithm, CodingMotif, to identify nucleotide-level motifs of unusual copy number in protein-coding regions. Using a new dynamic programming algorithm we are able to exhaustively calculate the distribution of the number of occurrences of a motif over all possible coding sequences that encode the same amino acid sequence, given a background model for codon usage and dinucleotide biases. Our method takes advantage of the sparseness of loci where a given motif can occur, greatly speeding up the required convolution calculations. Knowledge of the distribution allows one to assess the exact non-parametric p-value of whether a given motif is over- or under-represented. We demonstrate that our method identifies known functional motifs more accurately than sampling and parametric-based approaches in a variety of coding datasets of various size, including ChIP-seq data for the transcription factors NRSF and GABP. Conclusions CodingMotif provides a theoretically and empirically-demonstrated advance for the detection of motifs overrepresented in coding sequences. We expect CodingMotif to be useful for identifying motifs in functional genomic datasets such as DNA-protein binding, RNA-protein binding, or microRNA-RNA binding within coding regions. A software implementation is available at http://bioinformatics.bc.edu/chuanglab/codingmotif.tar PMID

  3. D-MATRIX: A web tool for constructing weight matrix of conserved DNA motifs

    PubMed Central

    Sen, Naresh; Mishra, Manoj; Khan, Feroz; Meena, Abha; Sharma, Ashok

    2009-01-01

    Despite considerable efforts to date, DNA motif prediction in whole genomes remains a challenge for researchers. Currently, genome-wide motif prediction tools require either a direct pattern sequence (for a single motif) or a weight matrix (for multiple motifs). Although there are known motif pattern databases and tools for genome-level prediction, there is no tool for weight matrix construction. Considering this, we developed the D-MATRIX tool, which predicts the different types of weight matrix based on a user-defined aligned motif sequence set and motif width. For retrieval of known motif sequences, the user can access commonly used databases such as TFD, RegulonDB, DBTBS, and Transfac. The D-MATRIX program uses a simple statistical approach for weight matrix construction, and the matrix can be converted into different file formats according to user requirements. It provides the possibility to identify the conserved motifs in co-regulated genes or the whole genome. As an example, we successfully constructed the weight matrix of the LexA transcription factor binding site with the help of known sos-box cis-regulatory elements in the Deinococcus radiodurans genome. The algorithm is implemented in C-Sharp and wrapped in ASP.Net to maintain a user-friendly web interface. The D-MATRIX tool is accessible through the CIMAP domain network. Availability http://203.190.147.116/dmatrix/ PMID:19759861
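
    For readers unfamiliar with weight matrices, the sketch below shows one common way to build a log-odds matrix from a set of aligned sites of equal width. It is a generic illustration, not the D-MATRIX implementation: the abstract does not specify its statistical approach, so the pseudocount, the uniform background, and the function names weight_matrix and score are all assumptions.

```python
import math
from collections import Counter


def weight_matrix(sites, alphabet="ACGT", pseudocount=1.0, background=None):
    """Build a log2-odds weight matrix from aligned sites of equal width.

    Returns one dict per column mapping each base to its log-odds score.
    The pseudocount and uniform background are illustrative defaults.
    """
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all aligned sites must have the same width")
    background = background or {b: 1.0 / len(alphabet) for b in alphabet}
    total = len(sites) + pseudocount * len(alphabet)
    matrix = []
    for col in range(width):
        counts = Counter(s[col].upper() for s in sites)
        matrix.append({
            b: math.log2(((counts[b] + pseudocount) / total) / background[b])
            for b in alphabet
        })
    return matrix


def score(matrix, window):
    """Sum the column scores for a candidate site of the matrix width."""
    return sum(col[base] for col, base in zip(matrix, window.upper()))


# Toy aligned sites (made up for illustration, not real binding sites).
toy_sites = ["ACGT", "ACGA", "ACGT", "TCGT"]
toy_matrix = weight_matrix(toy_sites)
```

    Sliding score over every window of a genome sequence and keeping windows above a chosen threshold is the usual way such a matrix is applied; the threshold itself has to be calibrated separately.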

  4. Conservation defines functional motifs in the squint/nodal-related 1 RNA dorsal localization element

    PubMed Central

    Gilligan, Patrick C.; Kumari, Pooja; Lim, Shimin; Cheong, Albert; Chang, Alex; Sampath, Karuna

    2011-01-01

    RNA localization is emerging as a general principle of sub-cellular protein localization and cellular organization. However, the sequence and structural requirements in many RNA localization elements remain poorly understood. Whereas transcription factor-binding sites in DNA can be recognized as short degenerate motifs, and consensus binding sites readily inferred, protein-binding sites in RNA often contain structural features, and can be difficult to infer. We previously showed that zebrafish squint/nodal-related 1 (sqt/ndr1) RNA localizes to the future dorsal side of the embryo. Interestingly, mammalian nodal RNA can also localize to dorsal when injected into zebrafish embryos, suggesting that the sequence motif(s) may be conserved, even though the fish and mammal UTRs cannot be aligned. To define potential sequence and structural features, we obtained ndr1 3′-UTR sequences from approximately 50 fishes that are closely, or distantly, related to zebrafish, for high-resolution phylogenetic footprinting. We identify conserved sequence and structural motifs within the zebrafish/carp family and catfish. We find that two novel motifs, a single-stranded AGCAC motif and a small stem-loop, are required for efficient sqt RNA localization. These findings show that comparative sequencing in the zebrafish/carp family is an efficient approach for identifying weak consensus binding sites for RNA regulatory proteins. PMID:21149265

  5. The highly conserved amino acid sequence motif Tyr-Gly-Asp-Thr-Asp-Ser in alpha-like DNA polymerases is required by phage phi 29 DNA polymerase for protein-primed initiation and polymerization.

    PubMed Central

    Bernad, A; Lázaro, J M; Salas, M; Blanco, L

    1990-01-01

    The alpha-like DNA polymerases from bacteriophage phi 29 and other viruses, prokaryotes and eukaryotes contain an amino acid consensus sequence that has been proposed to form part of the dNTP binding site. We have used site-directed mutants to study five of the six highly conserved consecutive amino acids corresponding to the most conserved C-terminal segment (Tyr-Gly-Asp-Thr-Asp-Ser). Our results indicate that in phi 29 DNA polymerase this consensus sequence, although irrelevant for the 3'→5' exonuclease activity, is essential for initiation and elongation. Based on these results and on its homology with known or putative metal-binding amino acid sequences, we propose that in phi 29 DNA polymerase the Tyr-Gly-Asp-Thr-Asp-Ser consensus motif is part of the dNTP binding site, involved in the synthetic activities of the polymerase (i.e., initiation and polymerization), and that it is involved particularly in the metal binding associated with the dNTP site. PMID:2191296

  6. Occurrence probability of structured motifs in random sequences.

    PubMed

    Robin, S; Daudin, J-J; Richard, H; Sagot, M-F; Schbath, S

    2002-01-01

    The problem of extracting from a set of nucleic acid sequences motifs which may have biological function is more and more important. In this paper, we are interested in particular motifs that may be implicated in the transcription process. These motifs, called structured motifs, are composed of two ordered parts separated by a variable distance and allowing for substitutions. In order to assess their statistical significance, we propose approximations of the probability of occurrences of such a structured motif in a given sequence. An application of our method to evaluate candidate promoters in E. coli and B. subtilis is presented. Simulations show the goodness of the approximations. PMID:12614545
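
    To make the definition of a structured motif concrete, the sketch below enumerates occurrences of two ordered boxes separated by a variable gap, allowing a bounded number of substitutions in each box. It only illustrates the object being counted; it does not implement the probability approximations the paper proposes. The function names and the promoter-like toy sequence are assumptions made for the example.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_structured_motif(seq, left, right, min_gap, max_gap, max_subs=1):
    """Return (left_start, right_start) pairs for every occurrence.

    An occurrence is the left box, then a gap of min_gap..max_gap bases,
    then the right box, each box matching with at most max_subs mismatches.
    """
    hits = []
    for i in range(len(seq) - len(left) + 1):
        if hamming(seq[i:i + len(left)], left) > max_subs:
            continue
        for gap in range(min_gap, max_gap + 1):
            j = i + len(left) + gap
            window = seq[j:j + len(right)]
            if len(window) == len(right) and hamming(window, right) <= max_subs:
                hits.append((i, j))
    return hits


# Toy sequence: two boxes separated by exactly 17 bases (illustrative only).
toy_seq = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "GG"
exact_hits = find_structured_motif(toy_seq, "TTGACA", "TATAAT", 16, 18, max_subs=0)
```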

  7. Comparative analysis of the full genome sequence of European bat lyssavirus type 1 and type 2 with other lyssaviruses and evidence for a conserved transcription termination and polyadenylation motif in the G-L 3' non-translated region.

    PubMed

    Marston, D A; McElhinney, L M; Johnson, N; Müller, T; Conzelmann, K K; Tordo, N; Fooks, A R

    2007-04-01

    We report the first full-length genomic sequences for European bat lyssavirus type-1 (EBLV-1) and type-2 (EBLV-2). The EBLV-1 genomic sequence was derived from a virus isolated from a serotine bat in Hamburg, Germany, in 1968 and the EBLV-2 sequence was derived from a virus isolate from a human case of rabies that occurred in Scotland in 2002. A long-distance PCR strategy was used to amplify the open reading frames (ORFs), followed by standard and modified RACE (rapid amplification of cDNA ends) techniques to amplify the 3' and 5' ends. The lengths of each complete viral genome for EBLV-1 and EBLV-2 were 11 966 and 11 930 base pairs, respectively, and follow the standard rhabdovirus genome organization of five viral proteins. Comparison with other lyssavirus sequences demonstrates variation in degrees of homology, with the genomic termini showing a high degree of complementarity. The nucleoprotein was the most conserved, both intra- and intergenotypically, followed by the polymerase (L), matrix and glyco- proteins, with the phosphoprotein being the most variable. In addition, we have shown that the two EBLVs utilize a conserved transcription termination and polyadenylation (TTP) motif, approximately 50 nt upstream of the L gene start codon. All available lyssavirus sequences to date, with the exception of Pasteur virus (PV) and PV-derived isolates, use the second TTP site. This observation may explain differences in pathogenicity between lyssavirus strains, dependent on the length of the untranslated region, which might affect transcriptional activity and RNA stability. PMID:17374776

  8. Identifying novel sequence variants of RNA 3D motifs

    PubMed Central

    Zirbel, Craig L.; Roll, James; Sweeney, Blake A.; Petrov, Anton I.; Pirrung, Meg; Leontis, Neocles B.

    2015-01-01

    Predicting RNA 3D structure from sequence is a major challenge in biophysics. An important sub-goal is accurately identifying recurrent 3D motifs from RNA internal and hairpin loop sequences extracted from secondary structure (2D) diagrams. We have developed and validated new probabilistic models for 3D motif sequences based on hybrid Stochastic Context-Free Grammars and Markov Random Fields (SCFG/MRF). The SCFG/MRF models are constructed using atomic-resolution RNA 3D structures. To parameterize each model, we use all instances of each motif found in the RNA 3D Motif Atlas and annotations of pairwise nucleotide interactions generated by the FR3D software. Isostericity relations between non-Watson–Crick basepairs are used in scoring sequence variants. SCFG techniques model nested pairs and insertions, while MRF ideas handle crossing interactions and base triples. We use test sets of randomly-generated sequences to set acceptance and rejection thresholds for each motif group and thus control the false positive rate. Validation was carried out by comparing results for four motif groups to RMDetect. The software developed for sequence scoring (JAR3D) is structured to automatically incorporate new motifs as they accumulate in the RNA 3D Motif Atlas when new structures are solved and is available free for download. PMID:26130723

  9. Identifying novel sequence variants of RNA 3D motifs.

    PubMed

    Zirbel, Craig L; Roll, James; Sweeney, Blake A; Petrov, Anton I; Pirrung, Meg; Leontis, Neocles B

    2015-09-01

    Predicting RNA 3D structure from sequence is a major challenge in biophysics. An important sub-goal is accurately identifying recurrent 3D motifs from RNA internal and hairpin loop sequences extracted from secondary structure (2D) diagrams. We have developed and validated new probabilistic models for 3D motif sequences based on hybrid Stochastic Context-Free Grammars and Markov Random Fields (SCFG/MRF). The SCFG/MRF models are constructed using atomic-resolution RNA 3D structures. To parameterize each model, we use all instances of each motif found in the RNA 3D Motif Atlas and annotations of pairwise nucleotide interactions generated by the FR3D software. Isostericity relations between non-Watson-Crick basepairs are used in scoring sequence variants. SCFG techniques model nested pairs and insertions, while MRF ideas handle crossing interactions and base triples. We use test sets of randomly-generated sequences to set acceptance and rejection thresholds for each motif group and thus control the false positive rate. Validation was carried out by comparing results for four motif groups to RMDetect. The software developed for sequence scoring (JAR3D) is structured to automatically incorporate new motifs as they accumulate in the RNA 3D Motif Atlas when new structures are solved and is available free for download. PMID:26130723

  10. Over-represented localized sequence motifs in ribosomal protein gene promoters of basal metazoans.

    PubMed

    Perina, Drago; Korolija, Marina; Roller, Maša; Harcet, Matija; Jeličić, Branka; Mikoč, Andreja; Cetković, Helena

    2011-07-01

    Equimolecular presence of ribosomal proteins (RPs) in the cell is needed for ribosome assembly and is achieved by synchronized expression of ribosomal protein genes (RPGs) with promoters of similar strengths. Over-represented motifs of RPG promoter regions are identified as targets for specific transcription factors. Unlike RPs, those motifs are not conserved between mammals, drosophila, and yeast. We analyzed RPG proximal promoter regions of three basal metazoans with sequenced genomes: sponge, cnidarian, and placozoan and found common features, such as 5'-terminal oligopyrimidine tracts and TATA-boxes. Furthermore, we identified over-represented motifs, some of which displayed the highest similarity to motifs abundant in human RPG promoters and not present in Drosophila or yeast. Our results indicate that human over-represented motifs, as well as corresponding domains of transcription factors, were established very early in metazoan evolution. The fast evolving nature of the RPG regulatory network leads to formation of other, lineage specific, over-represented motifs. PMID:21457775

  11. Function-based classification of carbohydrate-active enzymes by recognition of short, conserved peptide motifs.

    PubMed

    Busk, Peter Kamp; Lange, Lene

    2013-06-01

    Functional prediction of carbohydrate-active enzymes is difficult due to low sequence identity. However, similar enzymes often share a few short motifs, e.g., around the active site, even when the overall sequences are very different. To exploit this notion for functional prediction of carbohydrate-active enzymes, we developed a simple algorithm, peptide pattern recognition (PPR), that can divide proteins into groups of sequences that share a set of short conserved sequences. When this method was used on 118 glycoside hydrolase 5 proteins with 9% average pairwise identity and representing four characterized enzymatic functions, 97% of the proteins were sorted into groups correlating with their enzymatic activity. Furthermore, we analyzed 8,138 glycoside hydrolase 13 proteins including 204 experimentally characterized enzymes with 28 different functions. There was a 91% correlation between group and enzyme activity. These results indicate that the function of carbohydrate-active enzymes can be predicted with high precision by finding short, conserved motifs in their sequences. The glycoside hydrolase 61 family is important for fungal biomass conversion, but only a few proteins of this family have been functionally characterized. Interestingly, PPR divided 743 glycoside hydrolase 61 proteins into 16 subfamilies useful for targeted investigation of the function of these proteins and pinpointed three conserved motifs with putative importance for enzyme activity. Furthermore, the conserved sequences were useful for cloning of new, subfamily-specific glycoside hydrolase 61 proteins from 14 fungi. In conclusion, identification of conserved sequence motifs is a new approach to sequence analysis that can predict carbohydrate-active enzyme functions with high precision. PMID:23524681

  12. [Conserved motifs in the primary and secondary ITS1 structures in bryophytes].

    PubMed

    Milyutina, I A; Ignatov, M S

    2015-01-01

    A study of the ITS1 nucleotide sequences of 1000 moss species of 62 families, 11 liverwort species from five orders, and one hornwort Anthoceros agrestis identified five highly conserved motifs (CM1-CM5), which are presumably involved in pre-rRNA processing. Although the ITS1 sequences substantially differ in length and the extent of divergence, the conserved motifs are found in all of them. ITS1 secondary structures were constructed for 76 mosses, and main regularities at conserved motif positioning were observed. The positions of processing sites in the ITS1 secondary structure of the yeast Saccharomyces cerevisiae were found to be similar to the positions of the conserved motifs in the ITS1 secondary structures of mosses and liverworts. In addition, a potential hairpin formation in the putative secondary structure of a pre-rRNA fragment was considered for the region between ITS1 CM4-CM5 and a highly conserved region between hairpins 49 and 50 (H49 and H50) of the 18S rRNA. PMID:26107892

  13. Bioinformatic identification of novel regulatory DNA sequence motifs in Streptomyces coelicolor

    PubMed Central

    Studholme, David J; Bentley, Stephen D; Kormanec, Jan

    2004-01-01

    Background Streptomyces coelicolor is a bacterium with a vast repertoire of metabolic functions and complex systems of cellular development. Its genome sequence is rich in genes that encode regulatory proteins to control these processes in response to its changing environment. We wished to apply a recently published bioinformatic method for identifying novel regulatory sequence signals to gain new insights into regulation in S. coelicolor. Results The method involved production of position-specific weight matrices from alignments of over-represented words of DNA sequence. We generated 2497 weight matrices, each representing a candidate regulatory DNA sequence motif. We scanned the genome sequence of S. coelicolor against each of these matrices. A DNA sequence motif represented by one of the matrices was found preferentially in non-coding sequences immediately upstream of genes involved in polysaccharide degradation, including several that encode chitinases. This motif (TGGTCTAGACCA) was also found upstream of genes encoding components of the phosphoenolpyruvate phosphotransfer system (PTS). We hypothesise that this DNA sequence motif represents a regulatory element that is responsive to availability of carbon-sources. Other motifs of potential biological significance were found upstream of genes implicated in secondary metabolism (TTAGGTtAGgCTaACCTAA), sigma factors (TGACN19TGAC), DNA replication and repair (ttgtCAGTGN13TGGA), nucleotide conversions (CTACgcNCGTAG), and ArsR (TCAGN12TCAG). A motif found upstream of genes involved in chromosome replication (TGTCagtgcN7Tagg) was similar to a previously described motif found in UV-responsive promoters. Conclusions We successfully applied a recently published in silico method to identify conserved sequence motifs in S. coelicolor that may be biologically significant as regulatory elements. Our data are broadly consistent with and further extend data from previously published studies. We invite experimental testing of

  14. Characterization of the tandem CWCH2 sequence motif: a hallmark of inter-zinc finger interactions

    PubMed Central

    2010-01-01

    Background The C2H2 zinc finger (ZF) domain is widely conserved among eukaryotic proteins. In Zic/Gli/Zap1 C2H2 ZF proteins, the two N-terminal ZFs form a single structural unit by sharing a hydrophobic core. This structural unit defines a new motif comprised of two tryptophan side chains at the center of the hydrophobic core. Because each tryptophan residue is located between the two cysteine residues of the C2H2 motif, we have named this structure the tandem CWCH2 (tCWCH2) motif. Results Here, we characterized 587 tCWCH2-containing genes using data derived from public databases. We categorized genes into 11 classes including Zic/Gli/Glis, Arid2/Rsc9, PacC, Mizf, Aebp2, Zap1/ZafA, Fungl, Zfp106, Twincl, Clr1, and Fungl-4ZF, based on sequence similarity, domain organization, and functional similarities. tCWCH2 motifs are mostly found in organisms belonging to the Opisthokonta (metazoa, fungi, and choanoflagellates) and Amoebozoa (amoeba, Dictyostelium discoideum). By comparison, the C2H2 ZF motif is distributed widely among the eukaryotes. The structure and organization of the tCWCH2 motif, its phylogenetic distribution, and molecular phylogenetic analysis suggest that prototypical tCWCH2 genes existed in the Opisthokonta ancestor. Within-group or between-group comparisons of the tCWCH2 amino acid sequence identified three additional sequence features (site-specific amino acid frequencies, longer linker sequence between two C2H2 ZFs, and frequent extra-sequences within C2H2 ZF motifs). Conclusion These features suggest that the tCWCH2 motif is a specialized motif involved in inter-zinc finger interactions. PMID:20167128

  15. Fast and Accurate Discovery of Degenerate Linear Motifs in Protein Sequences

    PubMed Central

    Levy, Emmanuel D.; Michnick, Stephen W.

    2014-01-01

    Linear motifs mediate a wide variety of cellular functions, which makes their characterization in protein sequences crucial to understanding cellular systems. However, the short length and degenerate nature of linear motifs make their discovery a difficult problem. Here, we introduce MotifHound, an algorithm particularly suited for the discovery of small and degenerate linear motifs. MotifHound performs an exact and exhaustive enumeration of all motifs present in proteins of interest, including all of their degenerate forms, and scores the overrepresentation of each motif based on its occurrence in proteins of interest relative to a background (e.g., proteome) using the hypergeometric distribution. To assess MotifHound, we benchmarked it together with state-of-the-art algorithms. The benchmark consists of 11,880 sets of proteins from S. cerevisiae; in each set, we artificially spiked-in one motif varying in terms of three key parameters, (i) number of occurrences, (ii) length and (iii) the number of degenerate or “wildcard” positions. The benchmark enabled the evaluation of the impact of these three properties on the performance of the different algorithms. The results showed that MotifHound and SLiMFinder were the most accurate in detecting degenerate linear motifs. Interestingly, MotifHound was 15 to 20 times faster at comparable accuracy and performed best in the discovery of highly degenerate motifs. We complemented the benchmark by an analysis of proteins experimentally shown to bind the FUS1 SH3 domain from S. cerevisiae. Using the full-length protein partners as sole information, MotifHound recapitulated most experimentally determined motifs binding to the FUS1 SH3 domain. Moreover, these motifs exhibited properties typical of SH3 binding peptides, e.g., high intrinsic disorder and evolutionary conservation, despite the fact that none of these properties were used as prior information. MotifHound is available (http://michnick.bcm.umontreal.ca or http
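
    The hypergeometric scoring mentioned in the abstract can be illustrated with a short exact calculation. This is a minimal sketch of the standard upper-tail test, not MotifHound's code: the function name and the variable names (N background proteins, K of them containing the motif, n proteins of interest, k of those containing the motif) are assumptions chosen for the example.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """Exact P(X >= k) for a hypergeometric draw.

    N: proteins in the background (e.g. a proteome)
    K: background proteins that contain the motif
    n: proteins in the set of interest
    k: proteins of interest that contain the motif
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Toy numbers: 3 of 10 background proteins carry the motif, and 2 of the
# 4 proteins of interest do.
p_toy = hypergeom_upper_tail(2, 10, 3, 4)
```

    When many candidate motifs are scored this way, the resulting p-values need a multiple-testing correction before any of them is called significant.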

  16. Phosphatidylinositol transfer proteins: sequence motifs in structural and evolutionary analyses

    PubMed Central

    Wyckoff, Gerald J.; Solidar, Ada; Yoden, Marilyn D.

    2016-01-01

    Phosphatidylinositol transfer proteins (PITP) are a family of monomeric proteins that bind and transfer phosphatidylinositol and phosphatidylcholine between membrane compartments. They are required for production of inositol and diacylglycerol second messengers, and are found in most metazoan organisms. While PITPs are known to carry out crucial cell-signaling roles in many organisms, the structure, function and evolution of the majority of family members remain unexplored, primarily because the ubiquity and diversity of the family thwarts traditional methods of global alignment. To surmount this obstacle, we instead took a novel approach, using MEME and a parsimony-based analysis to create a cladogram of conserved sequence motifs in 56 PITP family proteins from 26 species. In keeping with previous functional annotations, three clades were supported within our evolutionary analysis: two classes of soluble proteins and a class of membrane-associated proteins. By focusing on conserved regions, the analysis allowed for in-depth queries regarding possible functional roles of PITP proteins in both intra- and extracellular signaling.

  17. Predicting candidate genomic sequences that correspond to synthetic functional RNA motifs

    PubMed Central

    Laserson, Uri; Gan, Hin Hark; Schlick, Tamar

    2005-01-01

    Riboswitches and RNA interference are important emerging mechanisms found in many organisms to control gene expression. To enhance our understanding of such RNA roles, finding small regulatory motifs in genomes presents a challenge on a wide scale. Many simple functional RNA motifs have been found by in vitro selection experiments, which produce synthetic target-binding aptamers as well as catalytic RNAs, including the hammerhead ribozyme. Motivated by the prediction of Piganeau and Schroeder [(2003) Chem. Biol., 10, 103–104] that synthetic RNAs may have natural counterparts, we develop and apply an efficient computational protocol for identifying aptamer-like motifs in genomes. We define motifs from the sequence and structural information of synthetic aptamers, search for sequences in genomes that will produce motif matches, and then evaluate the structural stability and statistical significance of the potential hits. Our application to aptamers for streptomycin, chloramphenicol, neomycin B and ATP identifies 37 candidate sequences (in coding and non-coding regions) that fold to the target aptamer structures in bacterial and archaeal genomes. Further energetic screening reveals that several candidates exhibit energetic properties and sequence conservation patterns that are characteristic of functional motifs. Besides providing candidates for experimental testing, our computational protocol offers an avenue for expanding natural RNA's functional repertoire. PMID:16254081

  18. A Conserved Motif Provides Binding Specificity to the PP2A-B56 Phosphatase.

    PubMed

    Hertz, Emil Peter Thrane; Kruse, Thomas; Davey, Norman E; López-Méndez, Blanca; Sigurðsson, Jón Otti; Montoya, Guillermo; Olsen, Jesper V; Nilsson, Jakob

    2016-08-18

    Dynamic protein phosphorylation is a fundamental mechanism regulating biological processes in all organisms. Protein phosphatase 2A (PP2A) is the main source of phosphatase activity in the cell, but the molecular details of substrate recognition are unknown. Here, we report that a conserved surface-exposed pocket on PP2A regulatory B56 subunits binds to a consensus sequence on interacting proteins, which we term the LxxIxE motif. The composition of the motif modulates the affinity for B56, which in turn determines the phosphorylation status of associated substrates. Phosphorylation of amino acid residues within the motif increases B56 binding, allowing integration of kinase and phosphatase activity. We identify conserved LxxIxE motifs in essential proteins throughout the eukaryotic domain of life and in human viruses, suggesting that the motifs are required for basic cellular function. Our study provides a molecular description of PP2A binding specificity with broad implications for understanding signaling in eukaryotes. PMID:27453045

  19. Discovering Motifs in Ranked Lists of DNA Sequences

    PubMed Central

    Eden, Eran; Lipson, Doron; Yogev, Sivan; Yakhini, Zohar

    2007-01-01

    Computational methods for discovery of sequence elements that are enriched in a target set compared with a background set are fundamental in molecular biology research. One example is the discovery of transcription factor binding motifs that are inferred from ChIP–chip (chromatin immuno-precipitation on a microarray) measurements. Several major challenges in sequence motif discovery still require consideration: (i) the need for a principled approach to partitioning the data into target and background sets; (ii) the lack of rigorous models and of an exact p-value for measuring motif enrichment; (iii) the need for an appropriate framework for accounting for motif multiplicity; (iv) the tendency, in many of the existing methods, to report presumably significant motifs even when applied to randomly generated data. In this paper we present a statistical framework for discovering enriched sequence elements in ranked lists that resolves these four issues. We demonstrate the implementation of this framework in a software application, termed DRIM (discovery of rank imbalanced motifs), which identifies sequence motifs in lists of ranked DNA sequences. We applied DRIM to ChIP–chip and CpG methylation data and obtained the following results. (i) Identification of 50 novel putative transcription factor (TF) binding sites in yeast ChIP–chip data. The biological function of some of them was further investigated to gain new insights on transcription regulation networks in yeast. For example, our discoveries enable the elucidation of the network of the TF ARO80. Another finding concerns a systematic TF binding enhancement to sequences containing CA repeats. (ii) Discovery of novel motifs in human cancer CpG methylation data. Remarkably, most of these motifs are similar to DNA sequence elements bound by the Polycomb complex that promotes histone methylation. Our findings thus support a model in which histone methylation and CpG methylation are mechanistically linked. Overall
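
    One way to avoid fixing an arbitrary target/background cutoff in a ranked list is to scan every cutoff and keep the most extreme hypergeometric tail. The sketch below illustrates that idea under stated assumptions; the abstract does not spell out DRIM's statistic, so this should not be read as its implementation. The returned minimum is a score, not a p-value, because taking the best of many cutoffs requires a further correction that is not shown. The function name and toy data are hypothetical.

```python
from math import comb


def min_hypergeom_score(flags):
    """Scan all cutoffs of a ranked 0/1 list and return (best_tail, cutoff).

    flags[i] is 1 if the sequence at rank i contains the motif, else 0.
    For each prefix of length n the upper hypergeometric tail is computed
    for the number of motif-containing sequences seen so far.
    """
    N, K = len(flags), sum(flags)
    best, best_cut, hits = 1.0, 0, 0
    for n, flag in enumerate(flags, start=1):
        hits += flag
        tail = sum(
            comb(K, i) * comb(N - K, n - i) for i in range(hits, min(K, n) + 1)
        ) / comb(N, n)
        if tail < best:
            best, best_cut = tail, n
    return best, best_cut


# Toy ranked list: all three motif-containing sequences sit at the top.
toy_score, toy_cut = min_hypergeom_score([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
```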

  20. Oligonucleotide Sequence Motifs as Nucleosome Positioning Signals

    PubMed Central

    Collings, Clayton K.; Fernandez, Alfonso G.; Pitschka, Chad G.; Hawkins, Troy B.; Anderson, John N.

    2010-01-01

    To gain a better understanding of the sequence patterns that characterize positioned nucleosomes, we first performed an analysis of the periodicities of the 256 tetranucleotides in a yeast genome-wide library of nucleosomal DNA sequences that was prepared by in vitro reconstitution. The approach entailed the identification and analysis of 24 unique tetranucleotides that were defined by 8 consensus sequences. These consensus sequences were shown to be responsible for most if not all of the tetranucleotide and dinucleotide periodicities displayed by the entire library, demonstrating that the periodicities of dinucleotides that characterize the yeast genome are, in actuality, due primarily to the 8 consensus sequences. A novel combination of experimental and bioinformatic approaches was then used to show that these tetranucleotides are important for preferred formation of nucleosomes at specific sites along DNA in vitro. These results were then compared to tetranucleotide patterns in genome-wide in vivo libraries from yeast and C. elegans in order to assess the contributions of DNA sequence in the control of nucleosome residency in the cell. These comparisons revealed striking similarities in the tetranucleotide occurrence profiles that are likely to be involved in nucleosome positioning in both in vitro and in vivo libraries, suggesting that DNA sequence is an important factor in the control of nucleosome placement in vivo. However, the strengths of the tetranucleotide periodicities were 3–4 fold higher in the in vitro as compared to the in vivo libraries, which implies that DNA sequence plays less of a role in dictating nucleosome positions in vivo. The results of this study have important implications for models of sequence-dependent positioning since they suggest that a defined subset of tetranucleotides is involved in preferred nucleosome occupancy and that these tetranucleotides are the major source of the dinucleotide periodicities that are characteristic of

  1. A conserved heptamer motif for ribosomal RNA transcription termination in animal mitochondria.

    PubMed Central

    Valverde, J R; Marco, R; Garesse, R

    1994-01-01

    A search of sequence data bases for a tridecamer transcription termination signal, previously described in human mtDNA as being responsible for the accumulation of mitochondrial ribosomal RNAs (rRNAs) in excess over the rest of mitochondrial genes, has revealed that this termination signal occurs in equivalent positions in a wide variety of organisms from protozoa to mammals. Due to the compact organization of the mtDNA, the tridecamer motif usually appears as part of the 3' adjacent gene sequence. Because in phylogenetically widely separated organisms the mitochondrial genome has experienced many rearrangements, it is interesting that its occurrence near the 3' end of the large rRNA is independent of the adjacent gene. The tridecamer sequence has diverged in phylogenetically widely separated organisms. Nevertheless, a well-conserved heptamer--TGGCAGA, the mitochondrial rRNA termination box--can be defined. Although extending the experimental evidence of its role as a transcription termination signal in humans will be of great interest, its evolutionary conservation strongly suggests that mitochondrial rRNA transcription termination could be a widely conserved mechanism in animals. Furthermore, the conservation of a homologous tridecamer motif in one of the last 3' secondary loops of nonmitochondrial 23S-like rRNAs suggests that the role of the sequence has changed during mitochondrial evolution. PMID:7515499
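
    As an illustration of the kind of database search the abstract describes, the sketch below scans a DNA string for the conserved heptamer TGGCAGA on both strands. The function names and the example sequence are hypothetical and are not taken from the paper; a real search would also need to tolerate the sequence divergence the authors report.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_heptamer(seq: str, motif: str = "TGGCAGA"):
    """List (0-based start, strand) for every exact occurrence of the motif.

    A '-' hit means the reverse complement of the motif was found in seq.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


# Made-up sequence for illustration only (not real mtDNA).
print(find_heptamer("AATGGCAGATTTCTGCCAGG"))  # [(2, '+'), (11, '-')]
```

    Exact matching is the simplest possible choice here; it was picked only to keep the sketch short.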

  2. Classification of protein motifs based on subcellular localization uncovers evolutionary relationships at both sequence and functional levels

    PubMed Central

    2013-01-01

    Background Most proteins have evolved in specific cellular compartments that limit their functions and potential interactions. On the other hand, motifs define amino acid arrangements conserved between protein family members and represent powerful tools for assigning function to protein sequences. The ideal motif would identify all members of a protein family but in practice many motifs identify both family members and unrelated proteins, referred to as True Positive (TP) and False Positive (FP) sequences, respectively. Results To address the relationship between protein motifs, protein function and cellular localization, we systematically assigned subcellular localization data to motif sequences from the comprehensive PROSITE sequence motif database. Using this data we analyzed relationships between localization and function. We find that TPs and FPs have a strong tendency to localize in different compartments. When multiple localizations are considered, TPs are usually distributed between related cellular compartments. We also identified cases where FPs are concentrated in particular subcellular regions, indicating possible functional or evolutionary relationships with TP sequences of the same motif. Conclusions Our findings suggest that the systematic examination of subcellular localization has the potential to uncover evolutionary and functional relationships between motif-containing sequences. We believe that this type of analysis complements existing motif annotations and could aid in their interpretation. Our results shed light on the evolution of cellular organelles and potentially establish the basis for new subcellular localization and function prediction algorithms. PMID:23865897

  3. Functional Analysis of Semi-conserved Transit Peptide Motifs and Mechanistic Implications in Precursor Targeting and Recognition.

    PubMed

    Holbrook, Kristen; Subramanian, Chitra; Chotewutmontri, Prakitchai; Reddick, L Evan; Wright, Sarah; Zhang, Huixia; Moncrief, Lily; Bruce, Barry D

    2016-09-01

    Over 95% of plastid proteins are nuclear-encoded as precursors containing an N-terminal extension known as the transit peptide (TP). Although highly variable, TPs direct the precursors through a conserved, posttranslational mechanism involving translocons in the outer (TOC) and inner envelope (TIC). The organelle import specificity is mediated by one or more components of the TOC complex. However, the high TP diversity creates a paradox as to how the sequences can be specifically recognized. An emerging model of TP design is that they contain multiple loosely conserved motifs that are recognized at different steps in the targeting and transport process. Bioinformatics has demonstrated that many TPs contain semi-conserved physicochemical motifs, termed FGLK. In order to characterize FGLK motifs in TP recognition and import, we have analyzed two well-studied TPs from the precursor of RuBisCO small subunit (SStp) and ferredoxin (Fdtp). Both SStp and Fdtp contain two FGLK motifs. Analysis of a large set of mutations (∼85) in these two motifs using in vitro, in organello, and in vivo approaches supports a model in which the FGLK domains mediate interaction with TOC34 and possibly other TOC components. In vivo import analysis suggests that multiple FGLK motifs are functionally redundant. Furthermore, we discuss how FGLK motifs are required for efficient precursor protein import and how these elements may permit a convergent function of this highly variable class of targeting sequences. PMID:27378725

  4. Discovering common stem–loop motifs in unaligned RNA sequences

    PubMed Central

    Gorodkin, Jan; Stricklin, Shawn L.; Stormo, Gary D.

    2001-01-01

    Post-transcriptional regulation of gene expression is often accomplished by proteins binding to specific sequence motifs in mRNA molecules, to affect their translation or stability. The motifs are often composed of a combination of sequence and structural constraints such that the overall structure is preserved even though much of the primary sequence is variable. While several methods exist to discover transcriptional regulatory sites in the DNA sequences of coregulated genes, the RNA motif discovery problem is much more difficult because of covariation in the positions. We describe the combined use of two approaches for RNA structure prediction, FOLDALIGN and COVE, that together can discover and model stem–loop RNA motifs in unaligned sequences, such as UTRs from post-transcriptionally coregulated genes. We evaluate the method on two datasets, one a section of rRNA genes with randomly truncated ends so that a global alignment is not possible, and the other a hyper-variable collection of IRE-like elements that were inserted into randomized UTR sequences. In both cases the combined method identified the motifs correctly, and in the rRNA example we show that it is capable of determining the structure, which includes bulge and internal loops as well as a variable length hairpin loop. Those automated results are quantitatively evaluated and found to agree closely with structures contained in curated databases, with correlation coefficients up to 0.9. A basic server, Stem–Loop Align SearcH (SLASH), which will perform stem–loop searches in unaligned RNA sequences, is available at http://www.bioinf.au.dk/slash/. PMID:11353083

  5. An evolutionary analysis of flightin reveals a conserved motif unique and widespread in Pancrustacea.

    PubMed

    Soto-Adames, Felipe N; Alvarez-Ortiz, Pedro; Vigoreaux, Jim O

    2014-01-01

    Flightin is a thick filament protein that in Drosophila melanogaster is uniquely expressed in the asynchronous, indirect flight muscles (IFM). Flightin is required for the structure and function of the IFM and is indispensable for flight in Drosophila. Given the importance of flight acquisition in the evolutionary history of insects, here we study the phylogeny and distribution of flightin. Flightin was identified in 69 species of hexapods in classes Collembola (springtails), Protura, Diplura, and insect orders Thysanura (silverfish), Dictyoptera (roaches), Orthoptera (grasshoppers), Phthiraptera (lice), Hemiptera (true bugs), Coleoptera (beetles), Neuroptera (green lacewing), Hymenoptera (bees, ants, and wasps), Lepidoptera (moths), and Diptera (flies and mosquitoes). Flightin was also found in 14 species of crustaceans in orders Anostraca (brine shrimp), Cladocera (water flea), Isopoda (pill bugs), Amphipoda (scuds, sideswimmers), and Decapoda (lobsters, crabs, and shrimps). Flightin was not identified in representatives of chelicerates, myriapods, or any species outside Pancrustacea (Tetraconata, sensu Dohle). Alignment of amino acid sequences revealed a conserved region of 52 amino acids, referred to herein as WYR, that is bounded by strictly conserved tryptophan (W) and arginine (R) and an intervening sequence with a high content of tyrosines (Y). This motif has no homologs in GenBank or PROSITE and is unique to flightin and paraflightin, a putative flightin paralog identified in decapods. A third motif of unclear affinities to pancrustacean WYR was observed in chelicerates. Phylogenetic analysis of amino acid sequences of the conserved motif suggests that paraflightin originated before the divergence of amphipods, isopods, and decapods. We conclude that flightin originated de novo in the ancestor of Pancrustacea > 500 MYA, well before the divergence of insects (~400 MYA) and the origin of flight (~325 MYA), and that its IFM-specific function in Drosophila is a more

  6. Conserved motif of CDK5RAP2 mediates its localization to centrosomes and the Golgi complex.

    PubMed

    Wang, Zhe; Wu, Tao; Shi, Lin; Zhang, Lin; Zheng, Wei; Qu, Jianan Y; Niu, Ruifang; Qi, Robert Z

    2010-07-16

    As the primary microtubule-organizing centers, centrosomes require gamma-tubulin for microtubule nucleation and organization. Located in close vicinity to centrosomes, the Golgi complex is another microtubule-organizing organelle in interphase cells. CDK5RAP2 is a gamma-tubulin complex-binding protein and functions in gamma-tubulin attachment to centrosomes. In this study, we find that CDK5RAP2 localizes to the Golgi complex in an ATP- and centrosome-dependent manner and associates with Golgi membranes independently of microtubules. CDK5RAP2 contains a centrosome-targeting domain with its core region highly homologous to the Motif 2 (CM2) of centrosomin, a functionally related protein in Drosophila. This sequence, referred to as the CM2-like motif, is also conserved in related proteins in chicken and zebrafish. Therefore, CDK5RAP2 may undertake a conserved mechanism for centrosomal localization. Using a mutational approach, we demonstrate that the CM2-like motif plays a crucial role in the centrosomal and Golgi localization of CDK5RAP2. Furthermore, the CM2-like motif is essential for the association of the centrosome-targeting domain to pericentrin and AKAP450. The binding with pericentrin is required for the centrosomal and Golgi localization of CDK5RAP2, whereas the binding with AKAP450 is required for the Golgi localization. Although the CM2-like motif possesses the activity of Ca(2+)-independent calmodulin binding, binding of calmodulin to this sequence is dispensable for centrosomal and Golgi association. Altogether, CDK5RAP2 may represent a novel mechanism for centrosomal and Golgi localization. PMID:20466722

  7. Computing distribution of scale independent motifs in biological sequences

    PubMed Central

    Almeida, Jonas S; Vinga, Susana

    2006-01-01

    The use of Chaos Game Representation (CGR) or its generalization, Universal Sequence Maps (USM), to describe the distribution of biological sequences has been found objectionable because of the fractal structure of that coordinate system. Consequently, the investigation of distribution of symbolic motifs at multiple scales is hampered by an inexact association between distance and sequence dissimilarity. A solution to this problem could unleash the use of iterative maps as phase-state representation of sequences where its statistical properties can be conveniently investigated. In this study a family of kernel density functions is described that accommodates the fractal nature of iterative function representations of symbolic sequences and, consequently, enables the exact investigation of sequence motifs of arbitrary lengths in that scale-independent representation. Furthermore, the proposed kernel density includes both Markovian succession and currently used alignment-free sequence dissimilarity metrics as special solutions. Therefore, the fractal kernel described is in fact a generalization that provides a common framework for a diverse suite of sequence analysis techniques. PMID:17049089
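
    For readers unfamiliar with the iterative map behind Chaos Game Representation, here is a minimal sketch of how a nucleotide sequence is turned into points in the unit square. The corner assignment (A, C, G, T) and the function name are assumptions chosen for illustration; this shows only the basic CGR iteration, not the kernel density functions the abstract proposes.

```python
# Assumed corner layout for the unit square; other conventions exist.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq: str):
    """Return the CGR coordinates visited while reading seq left to right.

    Starting from the centre of the square, each symbol moves the current
    point halfway towards the corner assigned to that symbol.
    """
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


print(cgr_points("AG"))  # [(0.25, 0.25), (0.625, 0.625)]
```

    Under this map, two sequences that share a suffix of length k end within 2^-k of each other in each coordinate, which is the link between distance and sequence similarity that the abstract discusses.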

  8. Do short, frequent DNA sequence motifs mould the epigenome?

    PubMed

    Quante, Timo; Bird, Adrian

    2016-04-01

    'Epigenome' refers to the panoply of chemical modifications borne by DNA and its associated proteins that locally affect genome function. Epigenomic patterns are thought to be determined by external constraints resulting from development, disease and the environment, but DNA sequence is also a potential influence. We propose that domains of relatively uniform DNA base composition may modulate the epigenome through cell type-specific proteins that recognize short, frequent sequence motifs. Differential recruitment of epigenomic modifiers may adjust gene expression in multigene blocks as an alternative to tuning the activity of each gene separately, thus simplifying gene expression programming. PMID:26837845

  9. Sequence-motif Detection of NAD(P)-binding Proteins: Discovery of a Unique Antibacterial Drug Target

    NASA Astrophysics Data System (ADS)

    Hua, Yun Hao; Wu, Chih Yuan; Sargsyan, Karen; Lim, Carmay

    2014-09-01

    Many enzymes use nicotinamide adenine dinucleotide or nicotinamide adenine dinucleotide phosphate (NAD(P)) as essential coenzymes. These enzymes often do not share significant sequence identity and cannot be easily detected by sequence homology. Previously, we determined all distinct locally conserved pyrophosphate-binding structures (3d motifs) from NAD(P)-bound protein structures, from which 1d sequence motifs were derived. Here, we aim to establish the precision of these 3d and 1d motifs to annotate NAD(P)-binding proteins. We show that the pyrophosphate-binding 3d motifs are characteristic of NAD(P)-binding proteins, as they are rarely found in non-NAD(P)-binding proteins. Furthermore, several 1d motifs could distinguish between proteins that bind only NAD and those that bind only NADP. They could also distinguish NAD(P)-binding proteins from non-NAD(P)-binding ones. Interestingly, one of the pyrophosphate-binding 3d and corresponding 1d motifs was found only in enoyl-acyl carrier protein reductases, which are enzymes essential for bacterial fatty acid biosynthesis. This unique 3d motif serves as an attractive novel drug target, as it is conserved across many bacterial species and is not found in human proteins.

  10. Sequence-motif Detection of NAD(P)-binding Proteins: Discovery of a Unique Antibacterial Drug Target

    PubMed Central

    Hua, Yun Hao; Wu, Chih Yuan; Sargsyan, Karen; Lim, Carmay

    2014-01-01

    Many enzymes use nicotinamide adenine dinucleotide or nicotinamide adenine dinucleotide phosphate (NAD(P)) as essential coenzymes. These enzymes often do not share significant sequence identity and cannot be easily detected by sequence homology. Previously, we determined all distinct locally conserved pyrophosphate-binding structures (3d motifs) from NAD(P)-bound protein structures, from which 1d sequence motifs were derived. Here, we aim to establish the precision of these 3d and 1d motifs to annotate NAD(P)-binding proteins. We show that the pyrophosphate-binding 3d motifs are characteristic of NAD(P)-binding proteins, as they are rarely found in non-NAD(P)-binding proteins. Furthermore, several 1d motifs could distinguish between proteins that bind only NAD and those that bind only NADP. They could also distinguish NAD(P)-binding proteins from non-NAD(P)-binding ones. Interestingly, one of the pyrophosphate-binding 3d and corresponding 1d motifs was found only in enoyl-acyl carrier protein reductases, which are enzymes essential for bacterial fatty acid biosynthesis. This unique 3d motif serves as an attractive novel drug target, as it is conserved across many bacterial species and is not found in human proteins. PMID:25253464

  11. Conserved rhodopsin intradiscal structural motifs mediate stabilization: effects of zinc.

    PubMed

    Gleim, Scott; Stojanovic, Aleksandar; Arehart, Eric; Byington, Daniel; Hwa, John

    2009-03-01

    Retinitis pigmentosa (RP), a neurodegenerative disorder, can arise from single point mutations in rhodopsin, leading to a cascade of protein instability, misfolding, aggregation, rod cell death, retinal degeneration, and ultimately blindness. Divalent cations, such as zinc and copper, have allosteric effects on misfolded aggregates of comparable neurodegenerative disorders including Alzheimer disease, prion diseases, and ALS. We report that two structurally conserved low-affinity zinc coordination motifs, located among a cluster of RP mutations in the intradiscal loop region, mediate dose-dependent rhodopsin destabilization. Disruption of native interactions involving histidines 100 and 195, through site-directed mutagenesis or exogenous zinc coordination, results in significant loss of receptor stability. Furthermore, chelation with EDTA stabilizes the structure of both wild-type rhodopsin and the most prevalent rhodopsin RP mutation, P(23)H. These interactions suggest that homeostatic regulation of trace metal concentrations in the rod outer segment of the retina may be important both physiologically and for an important cluster of RP mutations. Furthermore, with a growing awareness of allosteric zinc binding domains on a diverse range of GPCRs, such principles may apply to many other receptors and their associated diseases. PMID:19206210

  12. Conserved rhodopsin intradiscal structural motifs mediate stabilization; effects of zinc

    PubMed Central

    Gleim, Scott; Stojanovic, Aleksandar; Arehart, Eric; Byington, Daniel; Hwa, John

    2009-01-01

    Retinitis pigmentosa (RP), a neurodegenerative disorder, can arise from single point mutations in rhodopsin, leading to a cascade of protein instability, misfolding, aggregation, rod cell death, retinal degeneration, and ultimately blindness. Divalent cations, such as zinc and copper, have allosteric effects on misfolded aggregates of comparable neurodegenerative disorders including Alzheimer disease, prion diseases, and ALS. We report that two structurally conserved low-affinity zinc coordination motifs, located among a cluster of RP mutations in the intradiscal loop region, mediate dose-dependent rhodopsin destabilization. Disruption of native interactions involving histidines 100 and 195, through site-directed mutagenesis or exogenous zinc coordination, results in significant loss of receptor stability. Furthermore, chelation with EDTA stabilizes the structure of both wild type rhodopsin and the most prevalent rhodopsin RP mutation, P23H. These interactions suggest that homeostatic regulation of trace metal concentrations in the rod outer segment of the retina may be important both physiologically and for an important cluster of RP mutations. Furthermore, with a growing awareness of allosteric zinc binding domains on a diverse range of GPCRs, such principles may apply to many other receptors and their associated diseases. PMID:19206210

  13. Gapped alignment of protein sequence motifs through Monte Carlo optimization of a hidden Markov model

    PubMed Central

    Neuwald, Andrew F; Liu, Jun S

    2004-01-01

    Background Certain protein families are highly conserved across distantly related organisms and belong to large and functionally diverse superfamilies. The patterns of conservation present in these protein sequences presumably are due to selective constraints maintaining important but unknown structural mechanisms with some constraints specific to each family and others shared by a larger subset or by the entire superfamily. To exploit these patterns as a source of functional information, we recently devised a statistically based approach called contrast hierarchical alignment and interaction network (CHAIN) analysis, which infers the strengths of various categories of selective constraints from co-conserved patterns in a multiple alignment. The power of this approach strongly depends on the quality of the multiple alignments, which thus motivated development of theoretical concepts and strategies to improve alignment of conserved motifs within large sets of distantly related sequences. Results Here we describe a hidden Markov model (HMM), an algebraic system, and Markov chain Monte Carlo (MCMC) sampling strategies for alignment of multiple sequence motifs. The MCMC sampling strategies are useful both for alignment optimization and for adjusting position specific background amino acid frequencies for alignment uncertainties. Associated statistical formulations provide an objective measure of alignment quality as well as automatic gap penalty optimization. Improved alignments obtained in this way are compared with PSI-BLAST based alignments within the context of CHAIN analysis of three protein families: Giα subunits, prolyl oligopeptidases, and transitional endoplasmic reticulum (p97) AAA+ ATPases. Conclusion While not entirely replacing PSI-BLAST based alignments, which likewise may be optimized for CHAIN analysis using this approach, these motif-based methods often more accurately align very distantly related sequences and thus can provide a better measure of

  14. Characterization of evolutionarily conserved motifs involved in activity and regulation of the ABA-INSENSITIVE (ABI) 4 transcription factor.

    PubMed

    Gregorio, Josefat; Hernández-Bernal, Alma Fabiola; Cordoba, Elizabeth; León, Patricia

    2014-02-01

    In recent years, the transcription factor ABI4 has emerged as an important node of integration for external and internal signals such as nutrient status and hormone signaling that modulates critical transitions during the growth and development of plants. For this reason, understanding the mechanism of action and regulation of this protein represents an important step towards the elucidation of crosstalk mechanisms in plants. However, this understanding has been hindered due to the negligible levels of this protein as a result of multiple posttranscriptional regulations. To better understand the function and regulation of the ABI4 protein in this work, we performed a functional analysis of several evolutionarily conserved motifs. Based on these conserved motifs, we identified ortholog genes of ABI4 in different plant species. The functionality of the putative ortholog from Theobroma cacao was demonstrated in transient expression assays and in complementation studies in plants. The function of the highly conserved motifs was analyzed after their deletion or mutagenesis in the Arabidopsis ABI4 sequence using mesophyll protoplasts. This approach permitted us to immunologically detect the ABI4 protein and identify some of the mechanisms involved in its regulation. We identified sequences required for the nuclear localization (AP2-associated motif) as well as those for transcriptional activation function (LRP motif). Moreover, this approach showed that the protein stability of this transcription factor is controlled through protein degradation and subcellular localization and involves the AP2-associated and the PEST motifs. We demonstrated that the degradation of ABI4 protein through the PEST motif is mediated by the 26S proteasome in response to changes in the sugar levels. PMID:24046063

  15. Sequence-Based Classification Using Discriminatory Motif Feature Selection

    PubMed Central

    Xiong, Hao; Capurso, Daniel; Sen, Śaunak; Segal, Mark R.

    2011-01-01

    Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative) approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length at most k, such that potentially important, longer (> k) predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small) set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed) and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated). We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters. A Python pipeline implementing the approach is available at http

  16. Rewiring yeast sugar transporter preference through modifying a conserved protein motif

    PubMed Central

    Young, Eric M.; Tong, Alice; Bui, Hang; Spofford, Caitlin; Alper, Hal S.

    2014-01-01

    Utilization of exogenous sugars found in lignocellulosic biomass hydrolysates, such as xylose, must be improved before yeast can serve as an efficient biofuel and biochemical production platform. In particular, the first step in this process, the molecular transport of xylose into the cell, can serve as a significant flux bottleneck and is highly inhibited by other sugars. Here we demonstrate that sugar transport preference and kinetics can be rewired through the programming of a sequence motif of the general form G-G/F-XXX-G found in the first transmembrane span. By evaluating 46 different heterologously expressed transporters, we find that this motif is conserved among functional transporters and highly enriched in transporters that confer growth on xylose. Through saturation mutagenesis and subsequent rational mutagenesis, four transporter mutants unable to confer growth on glucose but able to sustain growth on xylose were engineered. Specifically, Candida intermedia gxs1 Phe38Ile39Met40, Scheffersomyces stipitis rgt2 Phe38 and Met40, and Saccharomyces cerevisiae hxt7 Ile39Met40Met340 all exhibit this phenotype. In these cases, primary hexose transporters were rewired into xylose transporters. These xylose transporters nevertheless remained inhibited by glucose. Furthermore, in the course of identifying this motif, novel wild-type transporters with superior monosaccharide growth profiles were discovered, namely S. stipitis RGT2 and Debaryomyces hansenii 2D01474. These findings build toward the engineering of efficient pentose utilization in yeast and provide a blueprint for reprogramming transporter properties. PMID:24344268
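
    A motif written in the general form G-G/F-XXX-G can be expressed as a regular expression for scanning protein sequences. The sketch below assumes that "G/F" means either glycine or phenylalanine at the second position and that each "X" stands for any single residue; the function name and the test sequence are made up for illustration and do not come from the paper.

```python
import re

# G, then G or F, then any three residues, then G.
# The lookahead lets overlapping occurrences be reported as well.
MOTIF = re.compile(r"(?=(G[GF][A-Z]{3}G))")


def find_motif(protein: str):
    """Return (0-based start, matched residues) for each motif occurrence."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(protein.upper())]


# Hypothetical peptide, not a real transporter sequence.
print(find_motif("MSLGGFIMGAAG"))  # [(3, 'GGFIMG')]
```

    A pattern this short will match many unrelated proteins by chance, so in practice a hit would only be meaningful within the first transmembrane span mentioned in the abstract.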

  17. Rewiring yeast sugar transporter preference through modifying a conserved protein motif.

    PubMed

    Young, Eric M; Tong, Alice; Bui, Hang; Spofford, Caitlin; Alper, Hal S

    2014-01-01

    Utilization of exogenous sugars found in lignocellulosic biomass hydrolysates, such as xylose, must be improved before yeast can serve as an efficient biofuel and biochemical production platform. In particular, the first step in this process, the molecular transport of xylose into the cell, can serve as a significant flux bottleneck and is highly inhibited by other sugars. Here we demonstrate that sugar transport preference and kinetics can be rewired through the programming of a sequence motif of the general form G-G/F-XXX-G found in the first transmembrane span. By evaluating 46 different heterologously expressed transporters, we find that this motif is conserved among functional transporters and highly enriched in transporters that confer growth on xylose. Through saturation mutagenesis and subsequent rational mutagenesis, four transporter mutants unable to confer growth on glucose but able to sustain growth on xylose were engineered. Specifically, Candida intermedia gxs1 Phe(38)Ile(39)Met(40), Scheffersomyces stipitis rgt2 Phe(38) and Met(40), and Saccharomyces cerevisiae hxt7 Ile(39)Met(40)Met(340) all exhibit this phenotype. In these cases, primary hexose transporters were rewired into xylose transporters. These xylose transporters nevertheless remained inhibited by glucose. Furthermore, in the course of identifying this motif, novel wild-type transporters with superior monosaccharide growth profiles were discovered, namely S. stipitis RGT2 and Debaryomyces hansenii 2D01474. These findings build toward the engineering of efficient pentose utilization in yeast and provide a blueprint for reprogramming transporter properties. PMID:24344268

  18. Conserved Promoter Motif Is Required for Cell Cycle Timing of dnaX Transcription in Caulobacter

    PubMed Central

    Keiler, Kenneth C.; Shapiro, Lucy

    2001-01-01

    Cells use highly regulated transcriptional networks to control temporally regulated events. In the bacterium Caulobacter crescentus, many cellular processes are temporally regulated with respect to the cell cycle, and the genes required for these processes are expressed immediately before the products are needed. Genes encoding factors required for DNA replication, including dnaX, dnaA, dnaN, gyrB, and dnaK, are induced at the G1/S-phase transition. By analyzing mutations in the dnaX promoter, we identified a motif between the −10 and −35 regions that is required for proper timing of gene expression. This motif, named RRF (for repression of replication factors), is conserved in the promoters of other coordinately induced replication factors. Because mutations in the RRF motif result in constitutive gene expression throughout the cell cycle, this sequence is likely to be the binding site for a cell cycle-regulated transcriptional repressor. Consistent with this hypothesis, Caulobacter extracts contain an activity that binds specifically to the RRF in vitro. PMID:11466289

  19. Exploiting topological constraints to reveal buried sequence motifs in the membrane-bound N-linked oligosaccharyl transferases.

    PubMed

    Jaffee, Marcie B; Imperiali, Barbara

    2011-09-01

    The central enzyme in N-linked glycosylation is the oligosaccharyl transferase (OTase), which catalyzes glycan transfer from a polyprenyldiphosphate-linked carrier to select asparagines within acceptor proteins. PglB from Campylobacter jejuni is a single-subunit OTase with homology to the Stt3 subunit of the complex multimeric yeast OTase. Sequence identity between PglB and Stt3 is low (17.9%); however, both have a similar predicted architecture and contain the conserved WWDxG motif. To investigate the relationship between PglB and other Stt3 proteins, sequence analysis was performed using 28 homologues from evolutionarily distant organisms. Since detection of small conserved motifs within large membrane-associated proteins is complicated by divergent sequences surrounding the motifs, we developed a program to parse sequences according to predicted topology and then analyze topologically related regions. This approach identified three conserved motifs that served as the basis for subsequent mutagenesis and functional studies. This work reveals that several inter-transmembrane loop regions of PglB/Stt3 contain strictly conserved motifs that are essential for PglB function. The recent publication of a 3.4 Å resolution structure of full-length C. lari OTase provides clear structural evidence that these loops play a fundamental role in catalysis [Lizak, C.; (2011) Nature 474, 350-355]. The current study provides biochemical support for the role of the inter-transmembrane domain loops in OTase catalysis and demonstrates the utility of combining topology prediction and sequence analysis for exposing buried pockets of homology in large membrane proteins. The described approach allowed detection of the catalytic motifs prior to availability of structural data and reveals additional catalytically relevant residues that are not predicted by structural data alone. PMID:21812456
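
    The idea of parsing sequences by predicted topology before comparing them can be sketched as follows. This is not the program described in the abstract, whose details are not given here. It assumes a per-residue topology string is already available from some predictor, with the hypothetical labels 'i' (inside), 'M' (membrane) and 'o' (outside); all names and the example sequence are illustrative.

```python
from itertools import groupby


def split_by_topology(seq, topo):
    """Split a sequence into contiguous segments that share one topology label.

    Returns a list of (label, 1-based start position, segment) tuples.
    """
    if len(seq) != len(topo):
        raise ValueError("sequence and topology strings must have equal length")
    segments = []
    pos = 0
    for label, run in groupby(topo):
        length = len(list(run))
        segments.append((label, pos + 1, seq[pos:pos + length]))
        pos += length
    return segments


def loop_segments(seq, topo, membrane_label="M"):
    """Keep only the segments outside the membrane (the loop regions)."""
    return [s for s in split_by_topology(seq, topo) if s[0] != membrane_label]


# Hypothetical sequence and topology labels, for illustration only.
seq = "MKWWDYGLLLLLLAAR"
topo = "iiiiiiiMMMMMMooo"
print(loop_segments(seq, topo))  # [('i', 1, 'MKWWDYG'), ('o', 14, 'AAR')]
```

    Loop segments extracted this way from each homologue could then be aligned or scanned separately, so that divergent transmembrane spans do not dominate the comparison.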

  20. Phosphotyrosine Substrate Sequence Motifs for Dual Specificity Phosphatases

    PubMed Central

    Zhao, Bryan M.; Keasey, Sarah L.; Tropea, Joseph E.; Lountos, George T.; Dyas, Beverly K.; Cherry, Scott; Raran-Kurussi, Sreejith; Waugh, David S.; Ulrich, Robert G.

    2015-01-01

    Protein tyrosine phosphatases dephosphorylate tyrosine residues of proteins, whereas dual specificity phosphatases (DUSPs), a subgroup of protein tyrosine phosphatases, dephosphorylate not only Tyr(P) residues but also the Ser(P) and Thr(P) residues of proteins. The DUSPs are linked to the regulation of many cellular functions and signaling pathways. Though many cellular targets of DUSPs are known, the relationship between catalytic activity and substrate specificity is poorly defined. We investigated the interactions of peptide substrates with select DUSPs of four types: MAP kinases (DUSP1 and DUSP7), atypical (DUSP3, DUSP14, DUSP22 and DUSP27), viral (variola VH1), and Cdc25 (A-C). Phosphatase recognition sites were experimentally determined by measuring dephosphorylation of 6,218 microarrayed Tyr(P) peptides representing confirmed and theoretical phosphorylation motifs from the cellular proteome. A broad continuum of dephosphorylation was observed across the microarrayed peptide substrates for all phosphatases, suggesting a complex relationship between substrate sequence recognition and optimal activity. Further analysis of peptide dephosphorylation by hierarchical clustering indicated that DUSPs could be organized by substrate sequence motifs, and peptide-specificities by phylogenetic relationships among the catalytic domains. The most highly dephosphorylated peptides represented proteins from 29 cell-signaling pathways, greatly expanding the list of potential targets of DUSPs. These newly identified DUSP substrates will be important for examining structure-activity relationships with physiologically relevant targets. PMID:26302245

  1. DoOPSearch: a web-based tool for finding and analysing common conserved motifs in the promoter regions of different chordate and plant genes

    PubMed Central

    Sebestyén, Endre; Nagy, Tibor; Suhai, Sándor; Barta, Endre

    2009-01-01

    Background The comparative genomic analysis of a large number of orthologous promoter regions of the chordate and plant genes from the DoOP databases shows thousands of conserved motifs. Most of these motifs differ from any known transcription factor binding site (TFBS). To identify common conserved motifs, we need a specific tool to be able to search amongst them. Since conserved motifs from the DoOP databases are linked to genes, the result of such a search can give a list of genes that are potentially regulated by the same transcription factor(s). Results We have developed a new tool called DoOPSearch for the analysis of the conserved motifs in the promoter regions of chordate or plant genes. We used the orthologous promoters of the DoOP database to extract thousands of conserved motifs from different taxonomic groups. The advantage of this approach is that different sets of conserved motifs might be found depending on how broad the taxonomic coverage of the underlying orthologous promoter sequence collection is (consider e.g. primates vs. mammals or Brassicaceae vs. Viridiplantae). The DoOPSearch tool allows the users to search these motif collections or the promoter regions of DoOP with user supplied query sequences or any of the conserved motifs from the DoOP database. To find overrepresented gene ontologies, the gene lists obtained can be analysed further using a modified version of the GeneMerge program. Conclusion We present here a comparative genomics based promoter analysis tool. Our system is based on a unique collection of conserved promoter motifs characteristic of different taxonomic groups. We offer both a command line and a web-based tool for searching in these motif collections using user specified queries. These can be either short promoter sequences or consensus sequences of known transcription factor binding sites. The GeneMerge analysis of the search results allows the user to identify statistically overrepresented Gene Ontology terms that

  2. A conserved motif flags Acyl Carrier Proteins for β-branching in polyketide synthesis

    PubMed Central

    Song, Zhongshu; Farmer, Rohit; Williams, Christopher; Hothersall, Joanne; Płoskoń, Eliza; Wattana-amorn, Pakorn; Stephens, Elton R.; Yamada, Erika; Gurney, Rachel; Takebayashi, Yuiko; Masschelein, Joleen; Cox, Russell J.; Lavigne, Rob; Willis, Christine L.; Simpson, Thomas J.; Crosby, John; Winn, Peter J.; Thomas, Christopher M.; Crump, Matthew P.

    2015-01-01

    Type I PKSs often utilise programmed β-branching, via enzymes of an “HMG-CoA synthase (HCS) cassette”, to incorporate various side chains at the second carbon from the terminal carboxylic acid of growing polyketide backbones. We identified a strong sequence motif in Acyl Carrier Proteins (ACPs) where β-branching is known. Substituting ACPs confirmed a correlation of ACP type with β-branching specificity. While these ACPs often occur in tandem, NMR analysis of tandem β-branching ACPs indicated no ACP-ACP synergistic effects and revealed that the conserved sequence motif forms an internal core rather than an exposed patch. Modelling and mutagenesis identified ACP Helix III as a probable anchor point of the ACP-HCS complex whose position is determined by the core. Mutating the core affects ACP functionality while ACP-HCS interface substitutions modulate system specificity. Our method for predicting β-carbon branching expands the potential for engineering novel polyketides and lays a basis for determining specificity rules. PMID:24056399

  3. Conserved Repeat Motifs and Glucan Binding by Glucansucrases of Oral Streptococci and Leuconostoc mesenteroides

    PubMed Central

    Shah, Deepan S. H.; Joucla, Gilles; Remaud-Simeon, Magali; Russell, Roy R. B.

    2004-01-01

    Glucansucrases of oral streptococci and Leuconostoc mesenteroides have a common pattern of structural organization and characteristically contain a domain with a series of tandem amino acid repeats in which certain residues are highly conserved, particularly aromatic amino acids and glycine. In some glucosyltransferases (GTFs) the repeat region has been identified as a glucan binding domain (GBD). Such GBDs are also found in several glucan binding proteins (GBP) of oral streptococci that do not have glucansucrase activity. Alignment of the amino acid sequences of 20 glucansucrases and GBP showed the widespread conservation of the 33-residue A repeat first identified in GtfI of Streptococcus downei. Site-directed mutagenesis of individual highly conserved residues in recombinant GBD of GtfI demonstrated the importance of the first tryptophan and the tyrosine-phenylalanine pair in the binding of dextran, as well as the essential contribution of a basic residue (arginine or lysine). A microplate binding assay was developed to measure the binding affinity of recombinant GBDs. GBD of GtfI was shown to be capable of binding glucans with predominantly α-1,3 or α-1,6 links, as well as alternating α-1,3 and α-1,6 links (alternan). Western blot experiments using biotinylated dextran or alternan as probes demonstrated a difference between the binding of streptococcal GTF and GBP and that of Leuconostoc glucansucrases. Experimental data and bioinformatics analysis showed that the A repeat motif is distinct from the 20-residue CW motif, which also has conserved aromatic amino acids and glycine and which occurs in the choline-binding proteins of Streptococcus pneumoniae and other organisms. PMID:15576779

  4. Bioinformatic Identification of Conserved Cis-Sequences in Coregulated Genes.

    PubMed

    Bülow, Lorenz; Hehl, Reinhard

    2016-01-01

    Bioinformatics tools can be employed to identify conserved cis-sequences in sets of coregulated plant genes as more and more gene expression and genomic sequence data become available. Knowledge of the specific cis-sequences, and of their enrichment and arrangement within promoters, facilitates the design of functional synthetic plant promoters that are responsive to specific stresses. The present chapter illustrates the bioinformatic identification of conserved Arabidopsis thaliana cis-sequences enriched in drought stress-responsive genes. This workflow can be applied to the identification of cis-sequences in any set of coregulated genes. The workflow includes detailed protocols to determine sets of coregulated genes, to extract the corresponding promoter sequences, and to install and run a software package to identify overrepresented motifs. Further bioinformatic analyses that can be performed with the results are discussed. PMID:27557771

  5. SLiMPrints: conservation-based discovery of functional motif fingerprints in intrinsically disordered protein regions

    PubMed Central

    Davey, Norman E.; Cowan, Joanne L.; Shields, Denis C.; Gibson, Toby J.; Coldwell, Mark J.; Edwards, Richard J.

    2012-01-01

    Large portions of higher eukaryotic proteomes are intrinsically disordered, and abundant evidence suggests that these unstructured regions of proteins are rich in regulatory interaction interfaces. A major class of disordered interaction interfaces are the compact and degenerate modules known as short linear motifs (SLiMs). As a result of the difficulties associated with the experimental identification and validation of SLiMs, our understanding of these modules is limited, advocating the use of computational methods to focus experimental discovery. This article evaluates the use of evolutionary conservation as a discriminatory technique for motif discovery. A statistical framework is introduced to assess the significance of relatively conserved residues, quantifying the likelihood a residue will have a particular level of conservation given the conservation of the surrounding residues. The framework is expanded to assess the significance of groupings of conserved residues, a metric that forms the basis of SLiMPrints (short linear motif fingerprints), a de novo motif discovery tool. SLiMPrints identifies relatively overconstrained proximal groupings of residues within intrinsically disordered regions, indicative of putatively functional motifs. Finally, the human proteome is analysed to create a set of highly conserved putative motif instances, including a novel site on translation initiation factor eIF2A that may regulate translation through binding of eIF4E. PMID:22977176

  6. Sequence Motifs in MADS Transcription Factors Responsible for Specificity and Diversification of Protein-Protein Interaction

    PubMed Central

    van Dijk, Aalt D. J.; Morabito, Giuseppa; Fiers, Martijn; van Ham, Roeland C. H. J.; Angenent, Gerco C.; Immink, Richard G. H.

    2010-01-01

    Protein sequences encompass tertiary structures and contain information about specific molecular interactions, which in turn determine biological functions of proteins. Knowledge about how protein sequences define interaction specificity is largely missing, in particular for paralogous protein families with high sequence similarity, such as the plant MADS domain transcription factor family. In comparison to the situation in mammalian species, this important family of transcription regulators has expanded enormously in plant species and contains over 100 members in the model plant species Arabidopsis thaliana. Here, we provide insight into the mechanisms that determine protein-protein interaction specificity for the Arabidopsis MADS domain transcription factor family, using an integrated computational and experimental approach. Plant MADS proteins have highly similar amino acid sequences, but their dimerization patterns vary substantially. Our computational analysis uncovered small sequence regions that explain observed differences in dimerization patterns with reasonable accuracy. Furthermore, we show the usefulness of the method for prediction of MADS domain transcription factor interaction networks in other plant species. Introduction of mutations in the predicted interaction motifs demonstrated that single amino acid mutations can have a large effect and lead to loss or gain of specific interactions. In addition, various performed bioinformatics analyses shed light on the way evolution has shaped MADS domain transcription factor interaction specificity. Identified protein-protein interaction motifs appeared to be strongly conserved among orthologs, indicating their evolutionary importance. We also provide evidence that mutations in these motifs can be a source for sub- or neo-functionalization. The analyses presented here take us a step forward in understanding protein-protein interactions and the interplay between protein sequences and network evolution. 

  7. A Monte Carlo-based framework enhances the discovery and interpretation of regulatory sequence motifs

    PubMed Central

    2012-01-01

    Background Discovery of functionally significant short, statistically overrepresented subsequence patterns (motifs) in a set of sequences is a challenging problem in bioinformatics. Oftentimes, not all sequences in the set contain a motif. These non-motif-containing sequences complicate the algorithmic discovery of motifs. Filtering the non-motif-containing sequences from the larger set of sequences while simultaneously determining the identity of the motif is, therefore, desirable and a non-trivial problem in motif discovery research. Results We describe MotifCatcher, a framework that extends the sensitivity of existing motif-finding tools by employing random sampling to effectively remove non-motif-containing sequences from the motif search. We developed two implementations of our algorithm; each built around a commonly used motif-finding tool, and applied our algorithm to three diverse chromatin immunoprecipitation (ChIP) data sets. In each case, the motif finder with the MotifCatcher extension demonstrated improved sensitivity over the motif finder alone. Our approach organizes candidate functionally significant discovered motifs into a tree, which allowed us to make additional insights. In all cases, we were able to support our findings with experimental work from the literature. Conclusions Our framework demonstrates that additional processing at the sequence entry level can significantly improve the performance of existing motif-finding tools. For each biological data set tested, we were able to propose novel biological hypotheses supported by experimental work from the literature. Specifically, in Escherichia coli, we suggested binding site motifs for 6 non-traditional LexA protein binding sites; in Saccharomyces cerevisiae, we hypothesize 2 disparate mechanisms for novel binding sites of the Cse4p protein; and in Halobacterium sp. NRC-1, we discovered subtle differences in a general transcription factor (GTF) binding site motif across several data sets.

  8. SVM2Motif--Reconstructing Overlapping DNA Sequence Motifs by Mimicking an SVM Predictor.

    PubMed

    Vidovic, Marina M-C; Görnitz, Nico; Müller, Klaus-Robert; Rätsch, Gunnar; Kloft, Marius

    2015-01-01

    Identifying discriminative motifs underlying the functionality and evolution of organisms is a major challenge in computational biology. Machine learning approaches such as support vector machines (SVMs) achieve state-of-the-art performances in genomic discrimination tasks, but--due to its black-box character--motifs underlying its decision function are largely unknown. As a remedy, positional oligomer importance matrices (POIMs) allow us to visualize the significance of position-specific subsequences. Although being a major step towards the explanation of trained SVM models, they suffer from the fact that their size grows exponentially in the length of the motif, which renders their manual inspection feasible only for comparably small motif sizes, typically k ≤ 5. In this work, we extend the work on positional oligomer importance matrices, by presenting a new machine-learning methodology, entitled motifPOIM, to extract the truly relevant motifs--regardless of their length and complexity--underlying the predictions of a trained SVM model. Our framework thereby considers the motifs as free parameters in a probabilistic model, a task which can be phrased as a non-convex optimization problem. The exponential dependence of the POIM size on the oligomer length poses a major numerical challenge, which we address by an efficient optimization framework that allows us to find possibly overlapping motifs consisting of up to hundreds of nucleotides. We demonstrate the efficacy of our approach on a synthetic data set as well as a real-world human splice site data set. PMID:26690911

  9. TOPDOM: database of conservatively located domains and motifs in proteins

    PubMed Central

    Varga, Julia; Dobson, László; Tusnády, Gábor E.

    2016-01-01

    Summary: The TOPDOM database—originally created as a collection of domains and motifs located consistently on the same side of the membranes in α-helical transmembrane proteins—has been updated and extended by taking into consideration consistently localized domains and motifs in globular proteins, too. By taking advantage of the recently developed CCTOP algorithm to determine the type of a protein and predict topology in case of transmembrane proteins, and by applying a thorough search for domains and motifs as well as utilizing the most up-to-date version of all source databases, we managed to reach a 6-fold increase in the size of the whole database and a 2-fold increase in the number of transmembrane proteins. Availability and implementation: TOPDOM database is available at http://topdom.enzim.hu. The webpage utilizes the common Apache, PHP5 and MySQL software to provide the user interface for accessing and searching the database. The database itself is generated on a high performance computer. Contact: tusnady.gabor@ttk.mta.hu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153630

  10. JAR3D Webserver: Scoring and aligning RNA loop sequences to known 3D motifs.

    PubMed

    Roll, James; Zirbel, Craig L; Sweeney, Blake; Petrov, Anton I; Leontis, Neocles

    2016-07-01

    Many non-coding RNAs have been identified and may function by forming 2D and 3D structures. RNA hairpin and internal loops are often represented as unstructured on secondary structure diagrams, but RNA 3D structures show that most such loops are structured by non-Watson-Crick basepairs and base stacking. Moreover, different RNA sequences can form the same RNA 3D motif. JAR3D finds possible 3D geometries for hairpin and internal loops by matching loop sequences to motif groups from the RNA 3D Motif Atlas, by exact sequence match when possible, and by probabilistic scoring and edit distance for novel sequences. The scoring gauges the ability of the sequences to form the same pattern of interactions observed in 3D structures of the motif. The JAR3D webserver at http://rna.bgsu.edu/jar3d/ takes one or many sequences of a single loop as input, or else one or many sequences of longer RNAs with multiple loops. Each sequence is scored against all current motif groups. The output shows the ten best-matching motif groups. Users can align input sequences to each of the motif groups found by JAR3D. JAR3D will be updated with every release of the RNA 3D Motif Atlas, and so its performance is expected to improve over time. PMID:27235417
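
    The abstract mentions edit distance as one way of relating a novel loop sequence to known ones. As a minimal sketch, the function below computes the plain Levenshtein distance (unit cost for substitution, insertion and deletion). Whether JAR3D uses exactly this variant, and how it combines it with probabilistic scoring, is not stated here, so this should be read as an illustration of the general technique only.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Hypothetical hairpin loop sequences, for illustration only.
print(edit_distance("GAAA", "GCAA"))  # 1
```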

  11. False occurrences of functional motifs in protein sequences highlight evolutionary constraints

    PubMed Central

    Via, Allegra; Gherardini, Pier Federico; Ferraro, Enrico; Ausiello, Gabriele; Scalia Tomba, Gianpaolo; Helmer-Citterich, Manuela

    2007-01-01

    Background False occurrences of functional motifs in protein sequences can be considered as random events due solely to the sequence composition of a proteome. Here we use a numerical approach to investigate the random appearance of functional motifs with the aim of addressing biological questions such as: How are organisms protected from undesirable occurrences of motifs otherwise selected for their functionality? Has the random appearance of functional motifs in protein sequences been affected during evolution? Results Here we analyse the occurrence of functional motifs in random sequences and compare it to that observed in biological proteomes; the behaviour of random motifs is also studied. Most motifs exhibit a number of false positives significantly similar to the number of times they appear in randomized proteomes (=expected number of false positives). Interestingly, about 3% of the analysed motifs show a different kind of behaviour and appear in biological proteomes less than they do in random sequences. In some of these cases, a mechanism of evolutionary negative selection is apparent; this helps to prevent unwanted functionalities which could interfere with cellular mechanisms. Conclusion Our thorough statistical and biological analysis showed that there are several mechanisms and evolutionary constraints both of which affect the appearance of functional motifs in protein sequences. PMID:17331242
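
    The comparison described above, motif occurrences in real sequences versus randomized ones, can be sketched in a few lines. The randomization scheme used in the study is not specified in this abstract. The sketch assumes the simplest option, shuffling residues within each sequence so that composition is preserved, and all function names, the example pattern and the sequences are hypothetical.

```python
import random
import re


def count_hits(pattern, seqs):
    """Total number of non-overlapping regex matches across all sequences."""
    rx = re.compile(pattern)
    return sum(len(rx.findall(s)) for s in seqs)


def shuffled_counts(pattern, seqs, n_iter=200, seed=0):
    """Motif counts in n_iter randomized copies of the sequence set.

    Assumption: each sequence is shuffled independently, which keeps its
    length and residue composition but destroys residue order.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_iter):
        shuffled = []
        for s in seqs:
            residues = list(s)
            rng.shuffle(residues)
            shuffled.append("".join(residues))
        counts.append(count_hits(pattern, shuffled))
    return counts


def fraction_at_most(observed, counts):
    """Fraction of randomized sets with a count no higher than observed."""
    return sum(c <= observed for c in counts) / len(counts)


# Hypothetical sequences and a simple example pattern, for illustration only.
seqs = ["MANGSAKNPTLL", "GGNASNKTAA"]
observed = count_hits(r"N[^P][ST]", seqs)
expected = shuffled_counts(r"N[^P][ST]", seqs, n_iter=100)
print(observed, sum(expected) / len(expected), fraction_at_most(observed, expected))
```

    A small value from fraction_at_most would suggest the motif is rarer in the real sequences than composition alone predicts, which is the kind of signal the abstract interprets as negative selection.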

  12. Ser/Thr Motifs in Transmembrane Proteins: Conservation Patterns and Effects on Local Protein Structure and Dynamics

    PubMed Central

    del Val, Coral; White, Stephen H.

    2014-01-01

    We combined systematic bioinformatics analyses and molecular dynamics simulations to assess the conservation patterns of Ser and Thr motifs in membrane proteins, and the effect of such motifs on the structure and dynamics of α-helical transmembrane (TM) segments. We find that Ser/Thr motifs are often present in β-barrel TM proteins. At least one Ser/Thr motif is present in almost half of the sequences of α-helical proteins analyzed here. The extensive bioinformatics analyses and inspection of protein structures led to the identification of molecular transporters with noticeable numbers of Ser/Thr motifs within the TM region. Given the energetic penalty for burying multiple Ser/Thr groups in the membrane hydrophobic core, the observation of transporters with multiple membrane-embedded Ser/Thr is intriguing and raises the question of how the presence of multiple Ser/Thr affects protein local structure and dynamics. Molecular dynamics simulations of four different Ser-containing model TM peptides indicate that backbone hydrogen bonding of membrane-buried Ser/Thr hydroxyl groups can significantly change the local structure and dynamics of the helix. Ser groups located close to the membrane interface can hydrogen bond to solvent water instead of protein backbone, leading to an enhanced local solvation of the peptide. PMID:22836667

  13. Ser/Thr motifs in transmembrane proteins: conservation patterns and effects on local protein structure and dynamics.

    PubMed

    Del Val, Coral; White, Stephen H; Bondar, Ana-Nicoleta

    2012-11-01

    We combined systematic bioinformatics analyses and molecular dynamics simulations to assess the conservation patterns of Ser and Thr motifs in membrane proteins, and the effect of such motifs on the structure and dynamics of α-helical transmembrane (TM) segments. We find that Ser/Thr motifs are often present in β-barrel TM proteins. At least one Ser/Thr motif is present in almost half of the sequences of α-helical proteins analyzed here. The extensive bioinformatics analyses and inspection of protein structures led to the identification of molecular transporters with noticeable numbers of Ser/Thr motifs within the TM region. Given the energetic penalty for burying multiple Ser/Thr groups in the membrane hydrophobic core, the observation of transporters with multiple membrane-embedded Ser/Thr is intriguing and raises the question of how the presence of multiple Ser/Thr affects protein local structure and dynamics. Molecular dynamics simulations of four different Ser-containing model TM peptides indicate that backbone hydrogen bonding of membrane-buried Ser/Thr hydroxyl groups can significantly change the local structure and dynamics of the helix. Ser groups located close to the membrane interface can hydrogen bond to solvent water instead of protein backbone, leading to an enhanced local solvation of the peptide. PMID:22836667

  14. Illumina MiSeq sequencing disfavours a sequence motif in the GFP reporter gene

    PubMed Central

    Van den Hoecke, Silvie; Verhelst, Judith; Saelens, Xavier

    2016-01-01

    Green fluorescent protein (GFP) is one of the most used reporter genes. We have used next-generation sequencing (NGS) to analyse the genetic diversity of a recombinant influenza A virus that expresses GFP and found a remarkable coverage dip in the GFP coding sequence. This coverage dip was present when virus-derived RT-PCR product or the parental plasmid DNA was used as starting material for NGS and regardless of whether Nextera XT transposase or Covaris shearing was used for DNA fragmentation. Therefore, the sequence coverage dip in the GFP coding sequence was not the result of emerging GFP mutant viruses or a bias introduced by Nextera XT fragmentation. Instead, we found that the Illumina MiSeq sequencing method disfavours the ‘CCCGCC’ motif in the GFP coding sequence. PMID:27193250
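
    To check whether a coverage dip coincides with the 'CCCGCC' motif named above, one would first locate the motif in the reference sequence. The sketch below does this for both strands. Scanning the reverse strand is an assumption made here, since the abstract does not say whether the effect is strand-specific, and the example sequence is hypothetical rather than the GFP coding sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_positions(seq, motif="CCCGCC"):
    """Return sorted (1-based start, strand) pairs for every motif occurrence.

    '-' hits are reported at the forward-strand coordinate where the
    reverse complement of the motif begins.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start + 1, strand))
            start = seq.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


# Hypothetical sequence for illustration; not the GFP reporter gene.
print(motif_positions("ATCCCGCCATGGCGGGTA"))  # [(3, '+'), (11, '-')]
```

    The returned positions can then be compared against per-base coverage from the sequencing run.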

  15. Illumina MiSeq sequencing disfavours a sequence motif in the GFP reporter gene.

    PubMed

    Van den Hoecke, Silvie; Verhelst, Judith; Saelens, Xavier

    2016-01-01

    Green fluorescent protein (GFP) is one of the most used reporter genes. We have used next-generation sequencing (NGS) to analyse the genetic diversity of a recombinant influenza A virus that expresses GFP and found a remarkable coverage dip in the GFP coding sequence. This coverage dip was present when virus-derived RT-PCR product or the parental plasmid DNA was used as starting material for NGS and regardless of whether Nextera XT transposase or Covaris shearing was used for DNA fragmentation. Therefore, the sequence coverage dip in the GFP coding sequence was not the result of emerging GFP mutant viruses or a bias introduced by Nextera XT fragmentation. Instead, we found that the Illumina MiSeq sequencing method disfavours the 'CCCGCC' motif in the GFP coding sequence. PMID:27193250

  16. Sequence motifs and prokaryotic expression of the reptilian paramyxovirus fusion protein

    USGS Publications Warehouse

    Franke, J.; Batts, W.N.; Ahne, W.; Kurath, G.; Winton, J.R.

    2006-01-01

    Fourteen reptilian paramyxovirus isolates were chosen to represent the known extent of genetic diversity among this novel group of viruses. Selected regions of the fusion (F) gene were sequenced, analyzed and compared. The F gene of all isolates contained conserved motifs homologous to those described for other members of the family Paramyxoviridae including: signal peptide, transmembrane domain, furin cleavage site, fusion peptide, N-linked glycosylation sites, and two heptad repeats, the second of which (HRB-LZ) had the characteristics of a leucine zipper. Selected regions of the fusion gene of isolate Gono-GER85 were inserted into a prokaryotic expression system to generate three recombinant protein fragments of various sizes. The longest recombinant protein was cleaved by furin into two fragments of predicted length. Western blot analysis with virus-neutralizing rabbit-antiserum against this isolate demonstrated that only the longest construct reacted with the antiserum. This construct was unique in containing 30 additional C-terminal amino acids that included most of the HRB-LZ. These results indicate that the F genes of reptilian paramyxoviruses contain highly conserved motifs typical of other members of the family and suggest that the HRB-LZ domain of the reptilian paramyxovirus F protein contains a linear antigenic epitope. © Springer-Verlag 2005.

  17. RSAT::Plants: Motif Discovery Within Clusters of Upstream Sequences in Plant Genomes.

    PubMed

    Contreras-Moreira, Bruno; Castro-Mondragon, Jaime A; Rioualen, Claire; Cantalapiedra, Carlos P; van Helden, Jacques

    2016-01-01

    The plant-dedicated mirror of the Regulatory Sequence Analysis Tools (RSAT, http://plants.rsat.eu ) offers specialized options for researchers dealing with plant transcriptional regulation. The website contains whole-sequenced genomes from species regularly updated from Ensembl Plants and other sources (currently 40), and supports an array of tasks frequently required for the analysis of regulatory sequences, such as retrieving upstream sequences, motif discovery, motif comparison, and pattern matching. RSAT::Plants also integrates the footprintDB collection of DNA motifs. This protocol explains step-by-step how to discover DNA motifs in regulatory regions of clusters of co-expressed genes in plants. It also explains how to empirically control the significance of the result, and how to associate the discovered motifs with putative binding factors. PMID:27557774

  18. Unique Structural Features and Sequence Motifs of Proline Utilization A (PutA)

    PubMed Central

    Singh, Ranjan K.; Tanner, John J.

    2013-01-01

    Proline utilization A proteins (PutAs) are bifunctional enzymes that catalyze the oxidation of proline to glutamate using spatially separated proline dehydrogenase and pyrroline-5-carboxylate dehydrogenase active sites. Here we use the crystal structure of the minimalist PutA from Bradyrhizobium japonicum (BjPutA) along with sequence analysis to identify unique structural features of PutAs. This analysis shows that PutAs have secondary structural elements and domains not found in the related monofunctional enzymes. Some of these extra features are predicted to be important for substrate channeling in BjPutA. Multiple sequence alignment analysis shows that some PutAs have a 17-residue conserved motif in the C-terminal 20–30 residues of the polypeptide chain. The BjPutA structure shows that this motif helps seal the internal substrate-channeling cavity from the bulk medium. Finally, it is shown that some PutAs have a 100–200 residue domain of unknown function in the C-terminus that is not found in minimalist PutAs. Remote homology detection suggests that this domain is homologous to the oligomerization beta-hairpin and Rossmann fold domain of BjPutA. PMID:22201760

  19. Unique structural features and sequence motifs of proline utilization A (PutA).

    PubMed

    Singh, Ranjan K; Tanner, John J

    2012-01-01

    Proline utilization A proteins (PutAs) are bifunctional enzymes that catalyze the oxidation of proline to glutamate using spatially separated proline dehydrogenase and pyrroline-5-carboxylate dehydrogenase active sites. Here we use the crystal structure of the minimalist PutA from Bradyrhizobium japonicum (BjPutA) along with sequence analysis to identify unique structural features of PutAs. This analysis shows that PutAs have secondary structural elements and domains not found in the related monofunctional enzymes. Some of these extra features are predicted to be important for substrate channeling in BjPutA. Multiple sequence alignment analysis shows that some PutAs have a 17-residue conserved motif in the C-terminal 20-30 residues of the polypeptide chain. The BjPutA structure shows that this motif helps seal the internal substrate-channeling cavity from the bulk medium. Finally, it is shown that some PutAs have a 100-200 residue domain of unknown function in the C-terminus that is not found in minimalist PutAs. Remote homology detection suggests that this domain is homologous to the oligomerization beta-hairpin and Rossmann fold domain of BjPutA. PMID:22201760

  20. A conserved motif mediates both multimer formation and allosteric activation of phosphoglycerate mutase 5.

    PubMed

    Wilkins, Jordan M; McConnell, Cyrus; Tipton, Peter A; Hannink, Mark

    2014-09-01

    Phosphoglycerate mutase 5 (PGAM5) is an atypical mitochondrial Ser/Thr phosphatase that modulates mitochondrial dynamics and participates in both apoptotic and necrotic cell death. The mechanisms that regulate the phosphatase activity of PGAM5 are poorly understood. The C-terminal phosphoglycerate mutase domain of PGAM5 shares homology with the catalytic domains found in other members of the phosphoglycerate mutase family, including a conserved histidine that is absolutely required for catalytic activity. However, this conserved domain is not sufficient for maximal phosphatase activity. We have identified a highly conserved amino acid motif, WDXNWD, located within the unique N-terminal region, which is required for assembly of PGAM5 into large multimeric complexes. Alanine substitutions within the WDXNWD motif abolish the formation of multimeric complexes and markedly reduce phosphatase activity of PGAM5. A peptide containing the WDXNWD motif dissociates the multimeric complex and reduces but does not fully abolish phosphatase activity. Addition of the WDXNWD-containing peptide in trans to a mutant PGAM5 protein lacking the WDXNWD motif markedly increases phosphatase activity of the mutant protein. Our results are consistent with an intermolecular allosteric regulation mechanism for the phosphatase activity of PGAM5, in which the assembly of PGAM5 into multimeric complexes, mediated by the WDXNWD motif, results in maximal activation of phosphatase activity. Our results suggest the possibility of identifying small molecules that function as allosteric regulators of the phosphatase activity of PGAM5. PMID:25012655

  1. A Conserved Motif Mediates both Multimer Formation and Allosteric Activation of Phosphoglycerate Mutase 5

    PubMed Central

    Wilkins, Jordan M.; McConnell, Cyrus; Tipton, Peter A.; Hannink, Mark

    2014-01-01

    Phosphoglycerate mutase 5 (PGAM5) is an atypical mitochondrial Ser/Thr phosphatase that modulates mitochondrial dynamics and participates in both apoptotic and necrotic cell death. The mechanisms that regulate the phosphatase activity of PGAM5 are poorly understood. The C-terminal phosphoglycerate mutase domain of PGAM5 shares homology with the catalytic domains found in other members of the phosphoglycerate mutase family, including a conserved histidine that is absolutely required for catalytic activity. However, this conserved domain is not sufficient for maximal phosphatase activity. We have identified a highly conserved amino acid motif, WDXNWD, located within the unique N-terminal region, which is required for assembly of PGAM5 into large multimeric complexes. Alanine substitutions within the WDXNWD motif abolish the formation of multimeric complexes and markedly reduce phosphatase activity of PGAM5. A peptide containing the WDXNWD motif dissociates the multimeric complex and reduces but does not fully abolish phosphatase activity. Addition of the WDXNWD-containing peptide in trans to a mutant PGAM5 protein lacking the WDXNWD motif markedly increases phosphatase activity of the mutant protein. Our results are consistent with an intermolecular allosteric regulation mechanism for the phosphatase activity of PGAM5, in which the assembly of PGAM5 into multimeric complexes, mediated by the WDXNWD motif, results in maximal activation of phosphatase activity. Our results suggest the possibility of identifying small molecules that function as allosteric regulators of the phosphatase activity of PGAM5. PMID:25012655

  2. Physical-chemical property based sequence motifs and methods regarding same

    DOEpatents

    Braun, Werner; Mathura, Venkatarajan S.; Schein, Catherine H.

    2008-09-09

    A data analysis system, program, and/or method, e.g., a data mining/data exploration method, using physical-chemical property motifs. For example, a sequence database may be searched for identifying segments thereof having physical-chemical properties similar to the physical-chemical property motifs.

  3. The Eps1p Protein Disulfide Isomerase Conserves Classic Thioredoxin Superfamily Amino Acid Motifs but Not Their Functional Geometries

    PubMed Central

    Biran, Shai; Gat, Yair; Fass, Deborah

    2014-01-01

    The widespread thioredoxin superfamily enzymes typically share the following features: a characteristic α-β fold, the presence of a Cys-X-X-Cys (or Cys-X-X-Ser) redox-active motif, and a proline in the cis configuration abutting the redox-active site in the tertiary structure. The Cys-X-X-Cys motif is at the solvent-exposed amino terminus of an α-helix, allowing the first cysteine to engage in nucleophilic attack on substrates, or substrates to attack the Cys-X-X-Cys disulfide, depending on whether the enzyme functions to reduce, isomerize, or oxidize its targets. We report here the X-ray crystal structure of an enzyme that breaks many of our assumptions regarding the sequence-structure relationship of thioredoxin superfamily proteins. The yeast Protein Disulfide Isomerase family member Eps1p has Cys-X-X-Cys motifs and proline residues at the appropriate primary structural positions in its first two predicted thioredoxin-fold domains. However, crystal structures show that the Cys-X-X-Cys of the second domain is buried and that the adjacent proline is in the trans, rather than the cis isomer. In these configurations, neither the “active-site” disulfide nor the backbone carbonyl preceding the proline is available to interact with substrate. The Eps1p structures thus expand the documented diversity of the PDI oxidoreductase family and demonstrate that conserved sequence motifs in common folds do not guarantee structural or functional conservation. PMID:25437863

  4. Interaction of MYC with host cell factor-1 is mediated by the evolutionarily conserved Myc box IV motif.

    PubMed

    Thomas, L R; Foshage, A M; Weissmiller, A M; Popay, T M; Grieb, B C; Qualls, S J; Ng, V; Carboneau, B; Lorey, S; Eischen, C M; Tansey, W P

    2016-07-01

    The MYC family of oncogenes encodes a set of three related transcription factors that are overexpressed in many human tumors and contribute to the cancer-related deaths of more than 70,000 Americans every year. MYC proteins drive tumorigenesis by interacting with co-factors that enable them to regulate the expression of thousands of genes linked to cell growth, proliferation, metabolism and genome stability. One effective way to identify critical co-factors required for MYC function has been to focus on sequence motifs within MYC that are conserved throughout evolution, on the assumption that their conservation is driven by protein-protein interactions that are vital for MYC activity. In addition to their DNA-binding domains, MYC proteins carry five regions of high sequence conservation known as Myc boxes (Mb). To date, four of the Mb motifs (MbI, MbII, MbIIIa and MbIIIb) have had a molecular function assigned to them, but the precise role of the remaining Mb, MbIV, and the reason for its preservation in vertebrate Myc proteins, is unknown. Here, we show that MbIV is required for the association of MYC with the abundant transcriptional coregulator host cell factor-1 (HCF-1). We show that the invariant core of MbIV resembles the tetrapeptide HCF-binding motif (HBM) found in many HCF-interaction partners, and demonstrate that MYC interacts with HCF-1 in a manner indistinguishable from the prototypical HBM-containing protein VP16. Finally, we show that rationalized point mutations in MYC that disrupt interaction with HCF-1 attenuate the ability of MYC to drive tumorigenesis in mice. Together, these data expose a molecular function for MbIV and indicate that HCF-1 is an important co-factor for MYC. PMID:26522729

  5. Evolutionarily conserved sequences on human chromosome 21

    SciTech Connect

    Frazer, Kelly A.; Sheehan, John B.; Stokowski, Renee P.; Chen, Xiyin; Hosseini, Roya; Cheng, Jan-Fang; Fodor, Stephen P.A.; Cox, David R.; Patil, Nila

    2001-09-01

    Comparison of human sequences with the DNA of other mammals is an excellent means of identifying functional elements in the human genome. Here we describe the utility of high-density oligonucleotide arrays as a rapid approach for comparing human sequences with the DNA of multiple species whose sequences are not presently available. High-density arrays representing approximately 22.5 Mb of nonrepetitive human chromosome 21 sequence were synthesized and then hybridized with mouse and dog DNA to identify sequences conserved between humans and mice (human-mouse elements) and between humans and dogs (human-dog elements). Our data show that sequence comparison of multiple species provides a powerful empiric method for identifying actively conserved elements in the human genome. A large fraction of these evolutionarily conserved elements are present in regions on chromosome 21 that do not encode known genes.

  6. Analysis of Genomic Sequence Motifs for Deciphering Transcription Factor Binding and Transcriptional Regulation in Eukaryotic Cells

    PubMed Central

    Boeva, Valentina

    2016-01-01

    Eukaryotic genomes contain a variety of structured patterns: repetitive elements, binding sites of DNA and RNA associated proteins, splice sites, and so on. Often, these structured patterns can be formalized as motifs and described using a proper mathematical model such as position weight matrix and IUPAC consensus. Two key tasks are typically carried out for motifs in the context of the analysis of genomic sequences. These are: identification in a set of DNA regions of over-represented motifs from a particular motif database, and de novo discovery of over-represented motifs. Here we describe existing methodology to perform these two tasks for motifs characterizing transcription factor binding. When applied to the output of ChIP-seq and ChIP-exo experiments, or to promoter regions of co-modulated genes, motif analysis techniques allow for the prediction of transcription factor binding events and enable identification of transcriptional regulators and co-regulators. The usefulness of motif analysis is further exemplified in this review by how motif discovery improves peak calling in ChIP-seq and ChIP-exo experiments and, when coupled with information on gene expression, allows insights into physical mechanisms of transcriptional modulation. PMID:26941778
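    The two motif models named in the abstract above, the IUPAC consensus and the position weight matrix (PWM), can be illustrated with a minimal sketch. It is not taken from the review: the function names, the pseudocount, the uniform background of 0.25 and the example sites are all assumptions chosen for illustration, and sequences are assumed to be uppercase A/C/G/T only.

```python
import math

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_matches(consensus, site):
    """Return True if `site` is compatible with the IUPAC `consensus`."""
    return len(consensus) == len(site) and all(
        base in IUPAC[code] for code, base in zip(consensus, site)
    )


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (one dict per column) from equal-length sites.

    The pseudocount and uniform background are illustrative assumptions.
    """
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for k in range(width):
        column = [site[k] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def best_hit(pwm, seq):
    """Score every window of `seq` and return (score, start) of the best one."""
    width = len(pwm)
    scores = [
        (sum(pwm[k][seq[start + k]] for k in range(width)), start)
        for start in range(len(seq) - width + 1)
    ]
    return max(scores)
```

    A consensus gives a yes/no answer for each candidate site, whereas the PWM ranks windows by a graded score, which is why database scanning and de novo discovery tools usually work with matrices.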

  7. Analysis of Genomic Sequence Motifs for Deciphering Transcription Factor Binding and Transcriptional Regulation in Eukaryotic Cells.

    PubMed

    Boeva, Valentina

    2016-01-01

    Eukaryotic genomes contain a variety of structured patterns: repetitive elements, binding sites of DNA and RNA associated proteins, splice sites, and so on. Often, these structured patterns can be formalized as motifs and described using a proper mathematical model such as position weight matrix and IUPAC consensus. Two key tasks are typically carried out for motifs in the context of the analysis of genomic sequences. These are: identification in a set of DNA regions of over-represented motifs from a particular motif database, and de novo discovery of over-represented motifs. Here we describe existing methodology to perform these two tasks for motifs characterizing transcription factor binding. When applied to the output of ChIP-seq and ChIP-exo experiments, or to promoter regions of co-modulated genes, motif analysis techniques allow for the prediction of transcription factor binding events and enable identification of transcriptional regulators and co-regulators. The usefulness of motif analysis is further exemplified in this review by how motif discovery improves peak calling in ChIP-seq and ChIP-exo experiments and, when coupled with information on gene expression, allows insights into physical mechanisms of transcriptional modulation. PMID:26941778

  8. Repulsive parallel MCMC algorithm for discovering diverse motifs from large sequence sets

    PubMed Central

    Ikebata, Hisaki; Yoshida, Ryo

    2015-01-01

    Motivation: The motif discovery problem consists of finding recurring patterns of short strings in a set of nucleotide sequences. This classical problem is receiving renewed attention as most early motif discovery methods lack the ability to handle large data of recent genome-wide ChIP studies. New ChIP-tailored methods focus on reducing computation time and pay little regard to the accuracy of motif detection. Unlike such methods, our method focuses on increasing the detection accuracy while maintaining the computation efficiency at an acceptable level. The major advantage of our method is that it can mine diverse multiple motifs undetectable by current methods. Results: The repulsive parallel Markov chain Monte Carlo (RPMCMC) algorithm that we propose is a parallel version of the widely used Gibbs motif sampler. RPMCMC is run on parallel interacting motif samplers. A repulsive force is generated when different motifs produced by different samplers near each other. Thus, different samplers explore different motifs. In this way, we can detect much more diverse motifs than conventional methods can. Through application to 228 transcription factor ChIP-seq datasets of the ENCODE project, we show that the RPMCMC algorithm can find many reliable cofactor interacting motifs that existing methods are unable to discover. Availability and implementation: A C++ implementation of RPMCMC and discovered cofactor motifs for the 228 ENCODE ChIP-seq datasets are available from http://daweb.ism.ac.jp/yoshidalab/motif. Contact: ikebata.hisaki@ism.ac.jp, yoshidar@ism.ac.jp Supplementary information: Supplementary data are available from Bioinformatics online. PMID:25583120
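    For orientation, the single-chain Gibbs motif sampler that the abstract above describes as the basis of RPMCMC can be sketched as follows. This is only the conventional baseline under simplifying assumptions (exactly one motif occurrence per sequence, a fixed motif width, uppercase A/C/G/T input, a pseudocount of 1); it contains neither the parallel samplers nor the repulsive force of RPMCMC, and every name in it is hypothetical.

```python
import random


def gibbs_motif_sampler(seqs, width, iterations=200, seed=0, pseudocount=1.0):
    """Return one sampled motif start position per sequence.

    Assumes each sequence is at least `width` long and uses only A, C, G, T.
    """
    rng = random.Random(seed)
    alphabet = "ACGT"
    # Start from a random motif position in every sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    total = len(seqs) - 1 + len(alphabet) * pseudocount
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Build column counts from all sequences except the held-out one.
            counts = [{b: pseudocount for b in alphabet} for _ in range(width)]
            for j, other in enumerate(seqs):
                if j == i:
                    continue
                for k in range(width):
                    counts[k][other[starts[j] + k]] += 1
            # Weight each window of the held-out sequence by its
            # probability under the current profile.
            weights = []
            for pos in range(len(seq) - width + 1):
                weight = 1.0
                for k in range(width):
                    weight *= counts[k][seq[pos + k]] / total
                weights.append(weight)
            # Resample the held-out start position in proportion to the weights.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

    Because the positions are sampled rather than maximized, different seeds can settle on different motifs; a single chain tends to return to the strongest signal, which is the limitation the abstract says RPMCMC addresses by running interacting chains.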

  9. An artificial intelligence approach to motif discovery in protein sequences: application to steroid dehydrogenases.

    PubMed

    Bailey, T L; Baker, M E; Elkan, C P

    1997-05-01

    MEME (Multiple Expectation-maximization for Motif Elicitation) is a unique new software tool that uses artificial intelligence techniques to discover motifs shared by a set of protein sequences in a fully automated manner. This paper is the first detailed study of the use of MEME to analyse a large, biologically relevant set of sequences, and to evaluate the sensitivity and accuracy of MEME in identifying structurally important motifs. For this purpose, we chose the short-chain alcohol dehydrogenase superfamily because it is large and phylogenetically diverse, providing a test of how well MEME can work on sequences with low amino acid similarity. Moreover, this dataset contains enzymes of biological importance, and because several enzymes have known X-ray crystallographic structures, we can test the usefulness of MEME for structural analysis. The first six motifs from MEME map onto structurally important alpha-helices and beta-strands on Streptomyces hydrogenans 20beta-hydroxysteroid dehydrogenase. We also describe MAST (Motif Alignment Search Tool), which conveniently uses output from MEME for searching databases such as SWISS-PROT and Genpept. MAST provides statistical measures that permit a rigorous evaluation of the significance of database searches with individual motifs or groups of motifs. A database search of Genpept90 by MAST with the log-odds matrix of the first six motifs obtained from MEME yields a bimodal output, demonstrating the selectivity of MAST. We show for the first time, using primary sequence analysis, that bacterial sugar epimerases are homologs of short-chain dehydrogenases. MEME and MAST will be increasingly useful as genome sequencing provides large datasets of phylogenetically divergent sequences of biomedical interest. PMID:9366496

  10. Characterization of a conserved C-terminal motif (RSPRR) in ribosomal protein S6 kinase 1 required for its mammalian target of rapamycin-dependent regulation.

    PubMed

    Schalm, Stefanie S; Tee, Andrew R; Blenis, John

    2005-03-25

    The mammalian target of rapamycin, mTOR, is a Ser/Thr kinase that promotes cell growth and proliferation by activating ribosomal protein S6 kinase 1 (S6K1). We previously identified a conserved TOR signaling (TOS) motif in the N terminus of S6K1 that is required for its mTOR-dependent activation. Furthermore, our data suggested that the TOS motif suppresses an inhibitory function associated with the C terminus of S6K1. Here, we have characterized the mTOR-regulated inhibitory region within the C terminus. We have identified a conserved C-terminal "RSPRR" sequence that is responsible for an mTOR-dependent suppression of S6K1 activation. Deletion or mutations within this RSPRR motif partially rescue the kinase activity of the S6K1 TOS motif mutant (S6K1-F5A), and this rescued activity is rapamycin resistant. Furthermore, we have shown that the RSPRR motif significantly suppresses S6K1 phosphorylation at two phosphorylation sites (Thr-389 and Thr-229) that are crucial for S6K1 activation. Importantly, introducing both the Thr-389 phosphomimetic and RSPRR motif mutations into the catalytically inactive S6K1 mutant S6K1-F5A completely rescues its activity and renders it fully rapamycin resistant. These data show that the N-terminal TOS motif suppresses an inhibitory function mediated by the C-terminal RSPRR motif. We propose that the RSPRR motif interacts with a negative regulator of S6K1 that is normally suppressed by mTOR. PMID:15659381

  11. A Conserved Di-Basic Motif of Drosophila Crumbs Contributes to Efficient ER Export.

    PubMed

    Kumichel, Alexandra; Kapp, Katja; Knust, Elisabeth

    2015-06-01

    The Drosophila type I transmembrane protein Crumbs is an apical determinant required for the maintenance of apico-basal epithelial cell polarity. The level of Crumbs at the plasma membrane is crucial, but how it is regulated is poorly understood. In a genetic screen for regulators of Crumbs protein trafficking we identified Sar1, the core component of the coat protein complex II transport vesicles. sar1 mutant embryos show a reduced plasma membrane localization of Crumbs, a defect similar to that observed in haunted and ghost mutant embryos, which lack Sec23 and Sec24CD, respectively. By pulse-chase assays in Drosophila Schneider cells and analysis of protein transport kinetics based on Endoglycosidase H resistance we identified an RNKR motif in Crumbs, which contributes to efficient ER export. The motif identified fits the highly conserved di-basic RxKR motif and mediates interaction with Sar1. The RNKR motif is also required for plasma membrane delivery of transgene-encoded Crumbs in epithelial cells of Drosophila embryos. Our data are the first to show that a di-basic motif acts as a signal for ER exit of a type I plasma membrane protein in a metazoan organism. PMID:25753515

  12. Two structurally distinct κB sequence motifs cooperatively control LPS-induced KC gene transcription in mouse macrophages

    SciTech Connect

    Ohmori, Y.; Fukumoto, S.; Hamilton, T.A.

    1995-10-01

    The mouse KC gene is an α-chemokine gene whose transcription is induced in mononuclear phagocytes by LPS. DNA sequences necessary for transcriptional control of KC by LPS were identified in the region flanking the transcription start site. Transient transfection analysis in macrophages using deletion mutants of a 1.5-kb sequence placed in front of the chloramphenicol acetyl transferase (CAT) gene identified an LPS-responsive region between residues -104 and +30. This region contained two κB sequence motifs. The first motif (position -70 to -59, κB1) is highly conserved in all three human GRO genes and in the mouse macrophage inflammatory protein-2 (MIP-2) gene. The second κB motif (position -89 to -78, κB2) was conserved only between the mouse and the rat KC genes. Consistent with previous reports, the highly conserved κB site (κB1) was essential for LPS inducibility. Surprisingly, the distal κB site (κB2) was also necessary for optimal response; mutation of either κB site markedly reduced sensitivity to LPS in RAW264.7 cells and to TNF-α in NIH 3T3 fibroblasts. Although both κB1 and κB2 sequences were able to bind members of the Rel homology family, including NFκB1 (p50), RelA (p65), and c-Rel, the κB1 site bound these factors with higher affinity and functioned more effectively than the κB2 site in a heterologous promoter. These findings demonstrate that transcriptional control of the KC gene requires cooperation between two κB sites and is thus distinct from that of the three human GRO genes and the mouse MIP-2 gene. 71 refs., 8 figs.

  13. A tobacco bZip transcription activator (TAF-1) binds to a G-box-like motif conserved in plant genes.

    PubMed Central

    Oeda, K; Salinas, J; Chua, N H

    1991-01-01

    Tobacco nuclear extract contains a factor that binds specifically to the motif I sequence (5'-GTACGTGGCG-3') conserved among rice rab genes and cotton lea genes. From a tobacco cDNA expression library, we isolated a partial cDNA clone encoding a truncated derivative of a protein designated as TAF-1. The truncated TAF-1 (Mr = 26,000) contains an acidic region at its N-terminus and a bZip motif at its C-terminus. Using a panel of motif I mutants as probes, we showed that the truncated TAF-1 and the tobacco nuclear factor for motif I have similar, if not identical, binding specificities. In particular, both show high-affinity binding to the perfect palindrome 5'-GCCACGTGGC-3', which is also known as the G-box motif. TAF-1 mRNA is highly expressed in root, but the level is at least 10 times lower in stem and leaf. Consistent with this observation, we found that a motif I tetramer, when fused to the -90 derivative of the CaMV 35S promoter, is inactive in leaf of transgenic tobacco. The activity, however, can be elevated by transient expression of the truncated TAF-1. We conclude from these results that TAF-1 can bind to the G-box and related motifs and that it functions as a transcription activator. PMID:2050116
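    The description of the G-box as a perfect palindrome can be checked directly: a DNA site is palindromic when it equals its own reverse complement. The short sketch below applies that test to the two sequences quoted in the abstract; the helper names are illustrative, not from the paper.

```python
# Translation table for complementing uppercase DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(seq):
    """A site is palindromic if it reads the same on both strands (5' to 3')."""
    return seq == reverse_complement(seq)


g_box = "GCCACGTGGC"    # perfect palindrome quoted in the abstract
motif_i = "GTACGTGGCG"  # motif I quoted in the abstract

print(is_palindromic(g_box))    # True
print(is_palindromic(motif_i))  # False
```

    Motif I shares the ACGTGGC portion of the G-box but differs in the flanking bases, so it is G-box-like rather than palindromic.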

  14. Conserved structural motifs located in distal loops of aphthovirus internal ribosome entry site domain 3 are required for internal initiation of translation.

    PubMed Central

    López de Quinto, S; Martínez-Salas, E

    1997-01-01

    A comparison of picornavirus internal ribosome entry site (IRES) secondary structures revealed the existence of conserved motifs located on loops. We have carried out a mutational analysis to test their requirement for IRES-driven translation. The GUAA sequence, located in the aphthovirus 3A loop, did not tolerate substitutions that disrupt the GNRA motif. Interestingly, this motif was found at similar positions in all picornavirus IRESs, suggesting that it may form part of a tertiary-structure element. The RAAA tetranucleotide located in the 3B loop was conserved only in cardiovirus and aphthovirus. A mutational analysis of the RAAA motif revealed that activities of 3B loop mutants correlated with both the presence of a sequence close to CAAA at the new 3B loop and the absence of reorganization of the 3B and 3C stem-loops. In support of this conclusion, insertion of a large number of nucleotides close to the 3B loop, which was predicted to reorganize the 3B-3C stem-loop structure, led to defective IRES elements. We conclude that the aphthovirus IRES loops located at the most distal part of domain 3, which carries GNRA and RAAA motifs, are essential for IRES function. PMID:9094703

  15. A conserved disulfide motif in human tear lipocalins influences ligand binding.

    PubMed

    Glasgow, B J; Abduragimov, A R; Yusifov, T N; Gasymov, O K; Horwitz, J; Hubbell, W L; Faull, K F

    1998-02-24

    Structural and functional characteristics of the disulfide motif have been determined for tear lipocalins, members of a novel group of proteins that carry lipids. Amino acid sequences for two of the six isolated isoforms were assigned by a comparison of molecular mass measurements with masses calculated from the cDNA-predicted protein sequence and available N-terminal protein sequence data. A third isoform was tentatively sequence assigned using the same criteria. The most abundant isoform has a measured mass of 17 446.3 Da, consistent with residues 19-176 of the putative precursor (calculated mass 17 445.8 Da). Chemical derivatization of native and reduced/denatured protein confirmed the presence of a single intramolecular disulfide bond in the native protein. Reactivity of native, reduced, and denatured protein with 4-pyridine disulfide and dithiobis(2-nitrobenzoic acid) indicated that access to the free cysteine is markedly restricted by the intact disulfide bridge. Mass measurements of tryptic fragments identified C119 as the free cysteine and showed that the single intramolecular disulfide bond joined residues C79 and C171. Circular dichroism indicated that tear lipocalins have a predominant beta-pleated sheet structure (44%) that is essentially retained after reduction of the disulfide bond. Circular dichroism in the far-UV showed reduced molecular asymmetry and enhanced urea-induced unfolding with disulfide reduction indicative of relaxation of protein structure. Circular dichroism in the near-UV shows that the disulfide bond contributes to the asymmetry of aromatic sites. The effect of disulfide reduction on ligand binding was monitored using the intrinsic optical activity of bound retinol. The intact disulfide bond diminishes the affinity of tear lipocalins for retinol and restricts the displacement of native lipids by retinol. Disulfide reduction is accompanied by a dramatic alteration in ligand-induced conformational changes that involves aromatic

  16. Modeling of the Ebola virus delta peptide reveals a potential lytic sequence motif.

    PubMed

    Gallaher, William R; Garry, Robert F

    2015-01-01

    Filoviruses, such as Ebola and Marburg viruses, cause severe outbreaks of human infection, including the extensive epidemic of Ebola virus disease (EVD) in West Africa in 2014. In the course of examining mutations in the glycoprotein gene associated with 2014 Ebola virus (EBOV) sequences, a differential level of conservation was noted between the soluble form of glycoprotein (sGP) and the full length glycoprotein (GP), which are both encoded by the GP gene via RNA editing. In the region of the proteins encoded after the RNA editing site sGP was more conserved than the overlapping region of GP when compared to a distant outlier species, Tai Forest ebolavirus. Half of the amino acids comprising the "delta peptide", a 40 amino acid carboxy-terminal fragment of sGP, were identical between otherwise widely divergent species. A lysine-rich amphipathic peptide motif was noted at the carboxyl terminus of delta peptide with high structural relatedness to the cytolytic peptide of the non-structural protein 4 (NSP4) of rotavirus. EBOV delta peptide is a candidate viroporin, a cationic pore-forming peptide, and may contribute to EBOV pathogenesis. PMID:25609303

  17. Motif composition, conservation and condition-specificity of single and alternative transcription start sites in the Drosophila genome

    PubMed Central

    Rach, Elizabeth A; Yuan, Hsiang-Yu; Majoros, William H; Tomancak, Pavel; Ohler, Uwe

    2009-01-01

    Background Transcription initiation is a key component in the regulation of gene expression. mRNA 5' full-length sequencing techniques have enhanced our understanding of mammalian transcription start sites (TSSs), revealing different initiation patterns on a genomic scale. Results To identify TSSs in Drosophila melanogaster, we applied a hierarchical clustering strategy on available 5' expressed sequence tags (ESTs) and identified a high quality set of 5,665 TSSs for approximately 4,000 genes. We distinguished two initiation patterns: 'peaked' TSSs, and 'broad' TSS cluster groups. Peaked promoters were found to contain location-specific sequence elements; conversely, broad promoters were associated with non-location-specific elements. In alignments across other Drosophila genomes, conservation levels of sequence elements exceeded 90% within the melanogaster subgroup, but dropped considerably for distal species. Elements in broad promoters had lower levels of conservation than those in peaked promoters. When characterizing the distributions of ESTs, 64% of TSSs showed distinct associations to one out of eight different spatiotemporal conditions. Available whole-genome tiling array time series data revealed different temporal patterns of embryonic activity across the majority of genes with distinct alternative promoters. Many genes with maternally inherited transcripts were found to have alternative promoters utilized later in development. Core promoters of maternally inherited transcripts showed differences in motif composition compared to zygotically active promoters. Conclusions Our study provides a comprehensive map of Drosophila TSSs and the conditions under which they are utilized. Distinct differences in motif associations with initiation pattern and spatiotemporal utilization illustrate the complex regulatory code of transcription initiation. PMID:19589141

  18. Identification of an Electrostatic Ruler Motif for Sequence-Specific Binding of Collagenase to Collagen.

    PubMed

    Subramanian, Sundar Raman; Singam, Ettayapuram Ramaprasad Azhagiya; Berinski, Michael; Subramanian, Venkatesan; Wade, Rebecca C

    2016-08-25

    Sequence-specific cleavage of collagen by mammalian collagenase plays a pivotal role in cell function. Collagenases are matrix metalloproteinases that cleave the peptide bond at a specific position on fibrillar collagen. The collagenase Hemopexin-like (HPX) domain has been proposed to be responsible for substrate recognition, but the mechanism by which collagenases identify the cleavage site on fibrillar collagen is not clearly understood. In this study, Brownian dynamics simulations coupled with atomic-detail and coarse-grained molecular dynamics simulations were performed to dock matrix metalloproteinase-1 (MMP-1) on a collagen IIIα1 triple helical peptide. We find that the HPX domain recognizes the collagen triple helix at a conserved R-X11-R motif C-terminal to the cleavage site to which the HPX domain of collagenase is guided electrostatically. The binding of the HPX domain between the two arginine residues is energetically stabilized by hydrophobic contacts with collagen. From the simulations and analysis of the sequences and structural flexibility of collagen and collagenase, a mechanistic scheme by which MMP-1 can recognize and bind collagen for proteolysis is proposed. PMID:27245212

  19. REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads

    PubMed Central

    Chu, Chong; Nielsen, Rasmus; Wu, Yufeng

    2016-01-01

    Repeat elements are important components of eukaryotic genomes. One limitation in our understanding of repeat elements is that most analyses rely on reference genomes that are incomplete and often contain missing data in highly repetitive regions that are difficult to assemble. To overcome this problem we develop a new method, REPdenovo, which assembles repeat sequences directly from raw shotgun sequencing data. REPdenovo can construct various types of repeats that are highly repetitive and have low sequence divergence within copies. We show that REPdenovo is substantially better than existing methods both in terms of the number and the completeness of the repeat sequences that it recovers. The key advantage of REPdenovo is that it can reconstruct long repeats from sequence reads. We apply the method to human data and discover a number of potentially new repeat sequences that have been missed by previous repeat annotations. Many of these sequences are incorporated into various parasite genomes, possibly because the filtering process for host DNA involved in the sequencing of the parasite genomes failed to exclude the host-derived repeat sequences. REPdenovo is a powerful new computational tool for annotating genomes and for addressing questions regarding the evolution of repeat families. The software tool, REPdenovo, is available for download at https://github.com/Reedwarbler/REPdenovo. PMID:26977803

  20. Identification of disease-specific motifs in the antibody specificity repertoire via next-generation sequencing.

    PubMed

    Pantazes, Robert J; Reifert, Jack; Bozekowski, Joel; Ibsen, Kelly N; Murray, Joseph A; Daugherty, Patrick S

    2016-01-01

    Disease-specific antibodies can serve as highly effective biomarkers but have been identified for only a relatively small number of autoimmune diseases. A method was developed to identify disease-specific binding motifs through integration of bacterial display peptide library screening, next-generation sequencing (NGS) and computational analysis. Antibody specificity repertoires were determined by identifying bound peptide library members for each specimen using cell sorting and performing NGS. A computational algorithm, termed Identifying Motifs Using Next-generation sequencing Experiments (IMUNE), was developed and applied to discover disease- and healthy control-specific motifs. IMUNE performs comprehensive pattern searches, identifies patterns statistically enriched in the disease or control groups and clusters the patterns to generate motifs. Using celiac disease sera as a discovery set, IMUNE identified a consensus motif (QPEQPF[PS]E) with high diagnostic sensitivity and specificity in a validation sera set, in addition to novel motifs. Peptide display and sequencing (Display-Seq) coupled with IMUNE analysis may thus be useful to characterize antibody repertoires and identify disease-specific antibody epitopes and biomarkers. PMID:27481573
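
    The consensus motif QPEQPF[PS]E uses bracket notation for a position that accepts either proline or serine, which maps directly onto a regular-expression character class. The sketch below shows how the fraction of peptides carrying the motif could be computed; it is an illustration only, not the IMUNE algorithm, and the function name and peptide list are hypothetical.

```python
import re

# The bracketed position [PS] accepts either P or S; all other positions are fixed.
CONSENSUS = re.compile(r"QPEQPF[PS]E")


def fraction_with_motif(peptides):
    """Fraction of peptides containing the consensus motif anywhere in the sequence."""
    if not peptides:
        return 0.0
    hits = sum(1 for p in peptides if CONSENSUS.search(p))
    return hits / len(peptides)


# Hypothetical peptides for illustration; not data from the study.
toy_peptides = ["AAQPEQPFPEKK", "QPEQPFSE", "QPEQPFAE", "GGGG"]
toy_fraction = fraction_with_motif(toy_peptides)
```

    Comparing this fraction between disease and control peptide sets is one simple way to see whether a motif is enriched in either group.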

  1. Identification of disease-specific motifs in the antibody specificity repertoire via next-generation sequencing

    PubMed Central

    Pantazes, Robert J.; Reifert, Jack; Bozekowski, Joel; Ibsen, Kelly N.; Murray, Joseph A.; Daugherty, Patrick S.

    2016-01-01

    Disease-specific antibodies can serve as highly effective biomarkers but have been identified for only a relatively small number of autoimmune diseases. A method was developed to identify disease-specific binding motifs through integration of bacterial display peptide library screening, next-generation sequencing (NGS) and computational analysis. Antibody specificity repertoires were determined by identifying bound peptide library members for each specimen using cell sorting and performing NGS. A computational algorithm, termed Identifying Motifs Using Next-generation sequencing Experiments (IMUNE), was developed and applied to discover disease- and healthy control-specific motifs. IMUNE performs comprehensive pattern searches, identifies patterns statistically enriched in the disease or control groups and clusters the patterns to generate motifs. Using celiac disease sera as a discovery set, IMUNE identified a consensus motif (QPEQPF[PS]E) with high diagnostic sensitivity and specificity in a validation sera set, in addition to novel motifs. Peptide display and sequencing (Display-Seq) coupled with IMUNE analysis may thus be useful to characterize antibody repertoires and identify disease-specific antibody epitopes and biomarkers. PMID:27481573

  2. Identification of Promoter Motifs Involved in the Network of Phytochrome A-Regulated Gene Expression by Combined Analysis of Genomic Sequence and Microarray Data

    PubMed Central

    Hudson, Matthew E.; Quail, Peter H.

    2003-01-01

    Several hundred Arabidopsis genes, transcriptionally regulated by phytochrome A (phyA), were previously identified using an oligonucleotide microarray. We have now identified, in silico, conserved sequence motifs in the promoters of these genes by comparing the promoter sequences to those of all the genes present on the microarray from which they were sampled. This was done using a Perl script (called Sift) that identifies over-represented motifs using an enumerative approach. The utility of Sift was verified by analysis of circadian-regulated promoters known to contain a biologically significant motif. Several elements were then identified in phyA-responsive promoters by their over-representation. Five previously undescribed motifs were detected in the promoters of phyA-induced genes. Four novel motifs were found in phyA-repressed promoters, plus a motif that strongly resembles the DE1 element. The G-box, CACGTG, was a prominent hit in both induced and repressed phyA-responsive promoters. Intriguingly, two distinct flanking consensus sequences were observed adjacent to the G-box core sequence: one predominating in phyA-induced promoters, the other in phyA-repressed promoters. Such different conserved flanking nucleotides around the core motif in these two sets of promoters may indicate that different members of the same family of DNA-binding proteins mediate phyA induction and repression. An increased abundance of G-box sequences was observed in the most rapidly phyA-responsive genes and in the promoters of phyA-regulated transcription factors, indicating that G-box-binding transcription factors are upstream components in a transcriptional cascade that mediates phyA-regulated development. PMID:14681527
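
    The enumerative approach described above (listing every candidate motif and asking whether it occurs in the responsive promoters more often than in the full promoter set) can be sketched as follows. This is not the Sift Perl script: the hypergeometric test, the function names, the default thresholds, and the toy promoters are assumptions chosen for illustration.

```python
from itertools import product
from math import comb


def promoters_containing(kmer, promoters):
    """Number of promoters in which the k-mer occurs at least once."""
    return sum(1 for p in promoters if kmer in p)


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n promoters are drawn from N, of which K contain the k-mer."""
    top = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return top / comb(N, n)


def overrepresented(foreground, background, k=6, alpha=0.05):
    """Enumerate all DNA k-mers; keep those enriched in the foreground.

    Assumes the foreground promoters are a subset of the background list.
    """
    N, n = len(background), len(foreground)
    results = []
    for kmer in ("".join(t) for t in product("ACGT", repeat=k)):
        hits = promoters_containing(kmer, foreground)
        if hits == 0:
            continue
        K = promoters_containing(kmer, background)
        p_value = hypergeom_tail(hits, n, K, N)
        if p_value <= alpha:
            results.append((kmer, hits, K, p_value))
    return sorted(results, key=lambda r: r[3])


# Hypothetical toy promoters; the three foreground sequences share the G-box CACGTG.
toy_foreground = ["TTCACGTGAA", "GGCACGTGCC", "AACACGTGTT"]
toy_background = toy_foreground + [
    "AAAAAAAAAA", "CCCCCCCCCC", "GGGGGGGGGG", "TTTTTTTTTT",
    "ACACACACAC", "GTGTGTGTGT", "AGAGAGAGAG",
]
toy_results = overrepresented(toy_foreground, toy_background)
```

    A real analysis would also need to correct for testing thousands of k-mers at once and to count matches on both DNA strands; both are omitted here for brevity.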

  3. Conserved Hydration Sites in Pin1 Reveal a Distinctive Water Recognition Motif in Proteins.

    PubMed

    Barman, Arghya; Smitherman, Crystal; Souffrant, Michael; Gadda, Giovanni; Hamelberg, Donald

    2016-01-25

    Structurally conserved water molecules are important for biomolecular stability, flexibility, and function. X-ray crystallographic studies of Pin1 have resolved a number of water molecules around the enzyme, including two highly conserved water molecules within the protein. The functional role of these localized water molecules remains unknown and unexplored. Pin1 catalyzes cis/trans isomerizations of peptidyl prolyl bonds that are preceded by a phosphorylated serine or threonine residue. Pin1 is involved in many subcellular signaling processes and is a potential therapeutic target for the treatment of several life-threatening diseases. Here, we investigate the significance of these structurally conserved water molecules in the catalytic domain of Pin1 using molecular dynamics (MD) simulations, free energy calculations, analysis of X-ray crystal structures, and circular dichroism (CD) experiments. MD simulations and free energy calculations suggest the tighter binding water molecule plays a crucial role in maintaining the integrity and stability of a critical hydrogen-bonding network in the active site. The second water molecule is exchangeable with bulk solvent and is found in a distinctive helix-turn-coil motif. Structural bioinformatics analysis of nonredundant X-ray crystallographic protein structures in the Protein Data Bank (PDB) suggests this motif is present in several other proteins and can act as a water site, akin to the calcium EF hand. CD experiments suggest the isolated motif is in a distorted PII conformation and requires the protein environment to fully form the α-helix-turn-coil motif. This study provides valuable insights into the role of hydration in the structural integrity of Pin1 that can be exploited in protein engineering and drug design. PMID:26651388

  4. Novel missense mutations in a conserved loop between ERCC6 (CSB) helicase motifs V and VI: Insights into Cockayne syndrome.

    PubMed

    Wilson, Brian T; Lochan, Anneline; Stark, Zornitza; Sutton, Ruth E

    2016-03-01

    Cockayne syndrome is caused by biallelic ERCC8 (CSA) or ERCC6 (CSB) mutations and is characterized by growth restriction, microcephaly, developmental delay, and premature pathological aging. Typically affected patients also have dermal photosensitivity. Although Cockayne syndrome is considered a DNA repair disorder, patients with UV-sensitive syndrome, with ERCC8 (CSA) or ERCC6 (CSB) mutations have indistinguishable DNA repair defects, but none of the extradermal features of Cockayne syndrome. We report novel missense mutations affecting a conserved loop in the ERCC6 (CSB) protein, associated with the Cockayne syndrome phenotype. Indeed, the amino acid sequence of this loop is more highly conserved than the adjacent helicase motifs V and VI, suggesting that this is a crucial structural component of the SWI/SNF family of proteins, to which ERCC6 (CSB) belongs. These comprise two RecA-like domains, separated by an interdomain linker, which interact through helicase motif VI. As the observed mutations are likely to act through destabilizing the tertiary protein structure, this prompted us to re-evaluate ERCC6 (CSB) mutation data in relation to the structure of SWI/SNF proteins. Our analysis suggests that antimorphic mutations cause Cockayne syndrome and that biallelic interdomain linker deletions produce more severe phenotypes. Based on our observations, we propose that further investigation of the pathogenic mechanisms underlying Cockayne syndrome should focus on the effect of antimorphic rather than null ERCC6 (CSB) mutations. PMID:26749132

  5. Using a color-coded ambigraphic nucleic acid notation to visualize conserved palindromic motifs within and across genomes

    PubMed Central

    2014-01-01

    Background Ambiscript is a graphically-designed nucleic acid notation that uses symbol symmetries to support sequence complementation, highlight biologically-relevant palindromes, and facilitate the analysis of consensus sequences. Although the original Ambiscript notation was designed to easily represent consensus sequences for multiple sequence alignments, the notation’s black-on-white ambiguity characters are unable to reflect the statistical distribution of nucleotides found at each position. We now propose a color-augmented ambigraphic notation to encode the frequency of positional polymorphisms in these consensus sequences. Results We have implemented this color-coding approach by creating an Adobe Flash® application (http://www.ambiscript.org) that shades and colors modified Ambiscript characters according to the prevalence of the encoded nucleotide at each position in the alignment. The resulting graphic helps viewers perceive biologically-relevant patterns in multiple sequence alignments by uniquely combining color, shading, and character symmetries to highlight palindromes and inverted repeats in conserved DNA motifs. Conclusion Juxtaposing an intuitive color scheme over the deliberate character symmetries of an ambigraphic nucleic acid notation yields a highly-functional nucleic acid notation that maximizes information content and successfully embodies key principles of graphic excellence put forth by the statistician and graphic design theorist, Edward Tufte. PMID:24447494

  6. Sequence Motifs in Transit Peptides Act as Independent Functional Units and Can Be Transferred to New Sequence Contexts.

    PubMed

    Lee, Dong Wook; Woo, Seungjin; Geem, Kyoung Rok; Hwang, Inhwan

    2015-09-01

    A large number of nuclear-encoded proteins are imported into chloroplasts after they are translated in the cytosol. Import is mediated by transit peptides (TPs) at the N termini of these proteins. TPs contain many small motifs, each of which is critical for a specific step in the process of chloroplast protein import; however, it remains unknown how these motifs are organized to give rise to TPs with diverse sequences. In this study, we generated various hybrid TPs by swapping domains between Rubisco small subunit (RbcS) and chlorophyll a/b-binding protein, which have highly divergent sequences, and examined the abilities of the resultant TPs to deliver proteins into chloroplasts. Subsequently, we compared the functionality of sequence motifs in the hybrid TPs with those of wild-type TPs. The sequence motifs in the hybrid TPs exhibited three different modes of functionality, depending on their domain composition, as follows: active in both wild-type and hybrid TPs, active in wild-type TPs but inactive in hybrid TPs, and inactive in wild-type TPs but active in hybrid TPs. Moreover, synthetic TPs, in which only three critical motifs from RbcS or chlorophyll a/b-binding protein TPs were incorporated into an unrelated sequence, were able to deliver clients to chloroplasts with a comparable efficiency to RbcS TP. Based on these results, we propose that diverse sequence motifs in TPs are independent functional units that interact with specific translocon components at various steps during protein import and can be transferred to new sequence contexts. PMID:26149569

  7. Improved detection of helix-turn-helix DNA-binding motifs in protein sequences.

    PubMed Central

    Dodd, I B; Egan, J B

    1990-01-01

    We present an update of our method for systematic detection and evaluation of potential helix-turn-helix DNA-binding motifs in protein sequences [Dodd, I. and Egan, J. B. (1987) J. Mol. Biol. 194, 557-564]. The new method is considerably more powerful, detecting approximately 50% more likely helix-turn-helix sequences without an increase in false predictions. This improvement is due almost entirely to the use of a much larger reference set of 91 presumed helix-turn-helix sequences. The scoring matrix derived from this reference set has been calibrated against a large protein sequence database so that the score obtained by a sequence can be used to give a practical estimation of the probability that the sequence is a helix-turn-helix motif. PMID:2402433
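
    The scoring-matrix idea in the abstract above (derive per-position residue preferences from a reference set of aligned motif sequences, then score candidate windows in a protein) can be sketched as a log-odds matrix scan. The uniform background, the pseudocount, the function names, and the four-residue toy reference set are all assumptions for illustration; they are not the 91-sequence reference set or the calibrated matrix of the paper.

```python
from math import log

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_matrix(reference, pseudocount=1.0):
    """Per-position log-odds scores from equal-length aligned reference sequences."""
    width = len(reference[0])
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies
    matrix = []
    for pos in range(width):
        column = [seq[pos] for seq in reference]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        matrix.append({
            aa: log(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return matrix


def best_window(matrix, protein):
    """Return (score, 0-based start) of the highest-scoring window, or None."""
    width = len(matrix)
    scored = [
        (sum(matrix[i][protein[start + i]] for i in range(width)), start)
        for start in range(len(protein) - width + 1)
    ]
    return max(scored) if scored else None


# Hypothetical four-residue reference set; real motifs of this kind are much longer.
toy_reference = ["QTEL", "QSEL", "QTDL", "ASEL"]
toy_matrix = build_matrix(toy_reference)
toy_best = best_window(toy_matrix, "MKQTELGG")
```

    Turning a raw score into a probability, as the abstract describes, would additionally require scoring a large sequence database to see how often unrelated sequences reach the same score.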

  8. Sequence conservation on the Y chromosome

    SciTech Connect

    Gibson, L.H.; Yang-Feng, L.; Lau, C.

    1994-09-01

    The Y chromosome is present in all mammals and is considered to be essential to sex determination. Despite intense genomic research, only a few genes have been identified and mapped to this chromosome in humans. Several of them, such as SRY and ZFY, have been demonstrated to be conserved and Y-located in other mammals. In order to address the issue of sequence conservation on the Y chromosome, we performed fluorescence in situ hybridization (FISH) with DNA from a human Y cosmid library as a probe to study the Y chromosomes from other mammalian species. Total DNA from 3,000-4,500 cosmid pools was labeled with biotinylated-dUTP and hybridized to metaphase chromosomes. For human and primate preparations, human cot1 DNA was included in the hybridization mixture to suppress the hybridization from repeat sequences. FISH signals were detected on the Y chromosomes of human, gorilla, orangutan and baboon (Old World monkey) and were absent on those of squirrel monkey (New World monkey), Indian muntjac, wood lemming, Chinese hamster, rat and mouse. Since sequence analysis suggested that specific genes, e.g. SRY and ZFY, are conserved between these two groups, the lack of detectable hybridization in the latter group implies either that conservation of the human Y sequences is limited to the Y chromosomes of the great apes and Old World monkeys, or that the size of the syntenic segment is too small to be detected under the resolution of FISH, or that homologous sequences have undergone considerable divergence. Further studies with reduced hybridization stringency are currently being conducted. Our results provide some clues as to Y-sequence conservation across species and demonstrate the limitations of FISH across species with total DNA sequences from a particular chromosome.

  9. Viroids: From Genotype to Phenotype Just Relying on RNA Sequence and Structural Motifs

    PubMed Central

    Flores, Ricardo; Serra, Pedro; Minoia, Sofía; Di Serio, Francesco; Navarro, Beatriz

    2012-01-01

    As a consequence of two unique physical properties, small size and circularity, viroid RNAs do not code for proteins and thus depend on RNA sequence/structural motifs for interacting with host proteins that mediate their invasion, replication, spread, and circumvention of defensive barriers. Viroid genomes fold up on themselves adopting collapsed secondary structures wherein stretches of nucleotides stabilized by Watson–Crick pairs are flanked by apparently unstructured loops. However, compelling data show that they are instead stabilized by alternative non-canonical pairs and that specific loops in the rod-like secondary structure, characteristic of Potato spindle tuber viroid and most other members of the family Pospiviroidae, are critical for replication and systemic trafficking. In contrast, rather than folding into a rod-like secondary structure, most members of the family Avsunviroidae adopt multibranched conformations occasionally stabilized by kissing-loop interactions critical for viroid viability in vivo. Besides these most stable secondary structures, viroid RNAs alternatively adopt during replication transient metastable conformations containing elements of local higher-order structure, prominent among which are the hammerhead ribozymes catalyzing a key replicative step in the family Avsunviroidae, and certain conserved hairpins that also mediate replication steps in the family Pospiviroidae. Therefore, different RNA structures – either global or local – determine different functions, thus highlighting the need for in-depth structural studies on viroid RNAs. PMID:22719735

  10. Conserved noncoding sequences (CNSs) in higher plants.

    PubMed

    Freeling, Michael; Subramaniam, Shabarinath

    2009-04-01

    Plant conserved noncoding sequences (CNSs)--a specific category of phylogenetic footprint--have been shown experimentally to function. No plant CNS is conserved to the extent that ultraconserved noncoding sequences are conserved in vertebrates. Plant CNSs are enriched in known transcription factor or other cis-acting binding sites, and are usually clustered around genes. Genes that encode transcription factors and/or those that respond to stimuli are particularly CNS-rich. Only rarely could this function involve small RNA binding. Some transcribed CNSs encode short translation products as a form of negative control. Approximately 4% of Arabidopsis gene content is estimated to be both CNS-rich and to occupy a relatively long stretch of chromosome: Bigfoot genes (long phylogenetic footprints). We discuss a 'DNA-templated protein assembly' idea that might help explain Bigfoot gene CNSs. PMID:19249238

  11. Membrane localization of MinD is mediated by a C-terminal motif that is conserved across eubacteria, archaea, and chloroplasts.

    PubMed

    Szeto, Tim H; Rowland, Susan L; Rothfield, Lawrence I; King, Glenn F

    2002-11-26

    MinD is a widely conserved ATPase that has been demonstrated to play a pivotal role in selection of the division site in eubacteria and chloroplasts. It is a member of the large ParA superfamily of ATPases that are characterized by a deviant Walker-type ATP-binding motif. MinD localizes to the cytoplasmic face of the inner membrane in Escherichia coli, and its association with the inner membrane is a prerequisite for membrane recruitment of the septation inhibitor MinC. However, the mechanism by which MinD associates with the membrane has proved enigmatic; it seems to lack a transmembrane domain and the amino acid sequence is devoid of hydrophobic tracts that might predispose the protein to interaction with lipids. In this study, we show that the extreme C-terminal region of MinD contains a highly conserved 8- to 12-residue sequence motif that is essential for membrane localization of the protein. We provide evidence that this motif forms an amphipathic helix that most likely mediates a direct interaction between MinD and membrane phospholipids. A model is proposed whereby the membrane-targeting motif mediates the rapid cycles of membrane attachment-release-reattachment that are presumed to occur during pole-to-pole oscillation of MinD in E. coli. PMID:12424340

  12. Stanniocalcin 1 binds hemin through a partially conserved heme regulatory motif

    SciTech Connect

    Westberg, Johan A.; Jiang, Ji; Andersson, Leif C.

    2011-06-03

    Highlights: Stanniocalcin 1 (STC1) binds heme through a novel heme binding motif; the central iron atom of heme and cysteine-114 of STC1 are essential for binding; STC1 binds Fe2+ and Fe3+ heme; an STC1 peptide prevents oxidative decay of heme. Abstract: Hemin (iron protoporphyrin IX) is a necessary component of many proteins, functioning either as a cofactor or an intracellular messenger. Hemoproteins have diverse functions, such as transportation of gases, gas detection, chemical catalysis and electron transfer. Stanniocalcin 1 (STC1) is a protein involved in respiratory responses of the cell but whose mechanism of action is still undetermined. We examined the ability of STC1 to bind hemin in both its reduced and oxidized states and located Cys114 as the axial ligand of the central iron atom of hemin. The amino acid sequence differs from the established (Cys-Pro) heme regulatory motif (HRM) and therefore presents a novel heme binding motif (Cys-Ser). An STC1 peptide containing the heme binding sequence was able to inhibit both spontaneous and H2O2-induced decay of hemin. Binding of hemin does not affect the mitochondrial localization of STC1.

  13. Phylum-Level Conservation of Regulatory Information in Nematodes despite Extensive Non-coding Sequence Divergence

    PubMed Central

    Gordon, Kacy L.; Arthur, Robert K.; Ruvinsky, Ilya

    2015-01-01

    Gene regulatory information guides development and shapes the course of evolution. To test conservation of gene regulation within the phylum Nematoda, we compared the functions of putative cis-regulatory sequences of four sets of orthologs (unc-47, unc-25, mec-3 and elt-2) from distantly-related nematode species. These species, Caenorhabditis elegans, its congeneric C. briggsae, and three parasitic species Meloidogyne hapla, Brugia malayi, and Trichinella spiralis, represent four of the five major clades in the phylum Nematoda. Despite the great phylogenetic distances sampled and the extensive sequence divergence of nematode genomes, all but one of the regulatory elements we tested are able to drive at least a subset of the expected gene expression patterns. We show that functionally conserved cis-regulatory elements have no more extended sequence similarity to their C. elegans orthologs than would be expected by chance, but they do harbor motifs that are important for proper expression of the C. elegans genes. These motifs are too short to be distinguished from the background level of sequence similarity, and while identical in sequence they are not conserved in orientation or position. Functional tests reveal that some of these motifs contribute to proper expression. Our results suggest that conserved regulatory circuitry can persist despite considerable turnover within cis elements. PMID:26020930

  14. A Conserved Cysteine Motif Is Critical for Rice Ceramide Kinase Activity and Function

    PubMed Central

    Liu, Zhe; Fang, Ce; Li, Jian; Su, Jian-Bin; Greenberg, Jean T.; Wang, Hong-Bin; Yao, Nan

    2011-01-01

    Background Ceramide kinase (CERK) is a key regulator of cell survival in dicotyledonous plants and animals. Much less is known about the roles of CERK and ceramides in mediating cellular processes in monocot plants. Here, we report the characterization of a ceramide kinase, OsCERK, from rice (Oryza sativa spp. Japonica cv. Nipponbare) and investigate the effects of ceramides on rice cell viability. Principal Findings OsCERK can complement the Arabidopsis CERK mutant acd5. Recombinant OsCERK has ceramide kinase activity with Michaelis-Menten kinetics and optimal activity at pH 7.0 and 40°C. Mg2+ activates OsCERK in a concentration-dependent manner. Importantly, a CXXXCXXC motif, conserved in all ceramide kinases and important for the activity of the human enzyme, is critical for OsCERK enzyme activity and in planta function. In a rice protoplast system, inhibition of CERK leads to cell death and the ratio of added ceramide and ceramide-1-phosphate, CERK's substrate and product, respectively, influences cell survival. Ceramide-induced rice cell death has apoptotic features and is an active process that requires both de novo protein synthesis and phosphorylation. Finally, mitochondria membrane potential loss previously associated with ceramide-induced cell death in Arabidopsis was also found in rice, but it occurred with different timing. Conclusions OsCERK is a bona fide ceramide kinase with a functionally and evolutionarily conserved Cys-rich motif that plays an important role in modulating cell fate in plants. The vital function of the conserved motif in both human and rice CERKs suggests that the biochemical mechanism of CERKs is similar in animals and plants. Furthermore, ceramides induce cell death with similar features in monocot and dicot plants. PMID:21483860

  15. Evolutionarily Conserved Regulatory Motifs in the Promoter of the Arabidopsis Clock Gene LATE ELONGATED HYPOCOTYL

    PubMed Central

    Spensley, Mark; Kim, Jae-Yean; Picot, Emma; Reid, John; Ott, Sascha; Helliwell, Chris; Carré, Isabelle A.

    2009-01-01

    The transcriptional regulation of the LATE ELONGATED HYPOCOTYL (LHY) gene is key to the structure of the circadian oscillator, integrating information from multiple regulatory pathways. We identified a minimal region of the LHY promoter that was sufficient for rhythmic expression. Another upstream sequence was also required for appropriate waveform of transcription and for maximum amplitude of oscillations under both diurnal and free-running conditions. We showed that two classes of protein complexes interact with a G-box and with novel 5A motifs; mutation of these sites reduced the amplitude of oscillation and broadened the peak of expression. A genome-wide bioinformatic analysis showed that these sites were enriched in phase-specific clusters of rhythmically expressed genes. Comparative genomic analyses showed that these motifs were conserved in orthologous promoters from several species. A position-specific scoring matrix for the 5A sites suggested similarity to CArG boxes, which are recognized by MADS box transcription factors. In support of this, the FLOWERING LOCUS C (FLC) protein was shown to interact with the LHY promoter in planta. This suggests a mechanism by which FLC might affect circadian period. PMID:19789276

  16. Novel hexamerization motif is discovered in a conserved cytoplasmic protein from Salmonella typhimurium.

    SciTech Connect

    Petrova, T.; Cuff, M.; Wu, R.; Kim, Y.; Holzle, D.; Joachimiak, A.; Biosciences Division; Inst. of Mathematical Problems of Biology

    2007-01-01

    The structure of the cytoplasmic protein Stm3548, of unknown function and obtained from a strain of Salmonella typhimurium, was determined by X-ray crystallography at a resolution of 2.25 Å. The asymmetric unit contains a hexamer of structurally identical monomers. The monomer is a globular domain with a long beta-hairpin protrusion that distinguishes this structure. This beta-hairpin occupies a central position in the hexamer, and its residues participate in the majority of interactions between subunits of the hexamer. We suggest that the structure of Stm3548 presents a new hexamerization motif. Because the residues participating in interdomain interactions are highly conserved among close members of protein family DUF1355 and buried solvent accessible area for the hexamer is significant, the hexamer is most likely conserved as well. A light scattering experiment confirmed the presence of hexamer in solution.

  17. Structural alphabet motif discovery and a structural motif database.

    PubMed

    Ku, Shih-Yen; Hu, Yuh-Jyh

    2012-01-01

    This study proposes a general framework for structural motif discovery. The framework is based on a modular design in which the system components can be modified or replaced independently to increase its applicability to various studies. It is a two-stage approach that first converts protein 3D structures into structural alphabet sequences, and then applies a sequence motif-finding tool to these sequences to detect conserved motifs. We named the structural motif database we built the SA-Motifbase, which provides the structural information conserved at different hierarchical levels in SCOP. For each motif, SA-Motifbase presents its 3D view; alphabet letter preference; alphabet letter frequency distribution; and the significance. SA-Motifbase is available at http://bioinfo.cis.nctu.edu.tw/samotifbase/. PMID:22099701

  18. Proteome-Wide Discovery of Evolutionary Conserved Sequences in Disordered Regions

    PubMed Central

    Nguyen Ba, Alex N.; Yeh, Brian J.; van Dyk, Dewald; Davidson, Alan R.; Andrews, Brenda J.; Weiss, Eric L.; Moses, Alan M.

    2016-01-01

    At least 30% of human proteins are thought to contain intrinsically disordered regions, which lack stable structural conformation. Despite lacking enzymatic functions and having few protein domains, disordered regions are functionally important for protein regulation and contain short linear motifs (short peptide sequences involved in protein-protein interactions), but in most disordered regions, the functional amino acid residues remain unknown. We searched for evolutionarily conserved sequences within disordered regions according to the hypothesis that conservation would indicate functional residues. Using a phylogenetic hidden Markov model (phylo-HMM), we made accurate, specific predictions of functional elements in disordered regions even when these elements are only two or three amino acids long. Among the conserved sequences that we identified were previously known and newly identified short linear motifs, and we experimentally verified key examples, including a motif that may mediate interaction between protein kinase Cbk1 and its substrates. We also observed that hub proteins, which interact with many partners in a protein interaction network, are highly enriched in these conserved sequences. Our analysis enabled the systematic identification of the functional residues in disordered regions and suggested that at least 5% of amino acids in disordered regions are important for function. PMID:22416277

  19. The conserved helicase motifs of the herpes simplex virus type 1 origin-binding protein UL9 are important for function.

    PubMed Central

    Martinez, R; Shao, L; Weller, S K

    1992-01-01

    The UL9 gene of herpes simplex virus encodes a protein that specifically recognizes sequences within the viral origins of replication and exhibits helicase and DNA-dependent ATPase activities. The specific DNA binding domain of the UL9 protein was localized to the carboxy-terminal one-third of the molecule (H. M. Weir, J. M. Calder, and N. D. Stow, Nucleic Acids Res. 17:1409-1425, 1989). The N-terminal two-thirds of the UL9 gene contains six sequence motifs found in all members of a superfamily of DNA and RNA helicases, suggesting that this region may be important for helicase activity of UL9. In this report, we examined the functional significance of these six motifs for the UL9 protein through the introduction of site-specific mutations resulting in single amino acid substitutions of the most highly conserved residues within each motif. An in vivo complementation test was used to study the effect of each mutation on the function of the UL9 protein in viral DNA replication. In this assay, a mutant UL9 protein expressed from a transfected plasmid is used to complement a replication-deficient null mutant in the UL9 gene for the amplification of herpes simplex virus origin-containing plasmids. Mutations in five of the six conserved motifs inactivated the function of the UL9 protein in viral DNA replication, providing direct evidence for the importance of these conserved motifs. Insertion mutants resulting in the introduction of two alanines at 100-residue intervals in regions outside the conserved motifs were also constructed. Three of the insertion mutations were tolerated, whereas the other five abolished UL9 function. These data indicate that other regions of the protein, in addition to the helicase motifs, are important for function in vivo. Several mutations result in instability of the mutant products, presumably because of conformational changes in the protein. 
Taken together, these results suggest that UL9 is very sensitive to mutations with respect to both

  20. Discovering active motifs in sets of related protein sequences and using them for classification.

    PubMed Central

    Wang, J T; Marr, T G; Shasha, D; Shapiro, B A; Chirn, G W

    1994-01-01

    We describe a method for discovering active motifs in a set of related protein sequences. The method is an automatic two-step process: (1) find candidate motifs in a small sample of the sequences; (2) test whether these motifs are approximately present in all the sequences. To reduce the running time, we develop two optimization heuristics based on statistical estimation and pattern matching techniques. Experimental results obtained by running these algorithms on generated data and functionally related proteins demonstrate the good performance of the presented method compared with the visual method of O'Farrell and Leopold. By combining the discovered motifs with an existing fingerprint technique, we develop a protein classifier. When we apply the classifier to the 698 groups of related proteins in the PROSITE catalog, it gives information that is complementary to the BLOCKS protein classifier of Henikoff and Henikoff. Thus, using our classifier in conjunction with theirs, one can obtain high confidence classifications (if BLOCKS and our classifier agree) or suggest a new hypothesis (if the two disagree). PMID:8052532
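
    The two-step process described in this record can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the fixed-length k-mer candidates, the Hamming-distance test for approximate presence, and every function and parameter name (candidate_motifs, occurs_approximately, discover, sample_size, max_mismatches, min_fraction) are assumptions made for illustration, and the optimization heuristics mentioned in the abstract are omitted.

```python
from collections import Counter


def candidate_motifs(sample, k, min_count):
    """Step 1: collect k-mers that occur in at least min_count sample sequences."""
    counts = Counter()
    for seq in sample:
        # A set makes each sequence count a given k-mer only once.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return [kmer for kmer, n in counts.items() if n >= min_count]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_approximately(motif, seq, max_mismatches):
    """True if some window of seq is within max_mismatches of motif."""
    k = len(motif)
    return any(
        hamming(motif, seq[i:i + k]) <= max_mismatches
        for i in range(len(seq) - k + 1)
    )


def discover(sequences, sample_size, k, max_mismatches, min_fraction):
    """Step 2: keep candidates approximately present in enough of all sequences."""
    sample = sequences[:sample_size]
    kept = []
    for motif in candidate_motifs(sample, k, min_count=len(sample)):
        hits = sum(occurs_approximately(motif, s, max_mismatches) for s in sequences)
        if hits / len(sequences) >= min_fraction:
            kept.append(motif)
    return sorted(kept)


# Hypothetical toy sequences, not taken from the paper.
toy = ["AAGGQTT", "CCGGQAA", "TTGGQCC", "AAGGHTT"]
print(discover(toy, sample_size=2, k=3, max_mismatches=1, min_fraction=1.0))
```

    Restricting candidate generation to a small sample keeps the first step cheap, and the mismatch tolerance in the second step is what allows a motif to count as "approximately present" in sequences where it is not conserved exactly.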

  1. Identification of potential regulatory motifs in odorant receptor genes by analysis of promoter sequences

    PubMed Central

    Michaloski, Jussara S.; Galante, Pedro A.F.

    2006-01-01

    Mouse odorant receptors (ORs) are encoded by >1000 genes dispersed throughout the genome. Each olfactory neuron expresses one single OR gene, while the rest of the genes remain silent. The mechanisms underlying OR gene expression are poorly understood. Here, we investigated if OR genes share common cis-regulatory sequences in their promoter regions. We carried out a comprehensive analysis in which the upstream regions of a large number of OR genes were compared. First, using RLM-RACE, we generated cDNAs containing the complete 5′-untranslated regions (5′-UTRs) for a total number of 198 mouse OR genes. Then, we aligned these cDNA sequences to the mouse genome so that the 5′ structure and transcription start sites (TSSs) of the OR genes could be precisely determined. Sequences upstream of the TSSs were retrieved and browsed for common elements. We found DNA sequence motifs that are overrepresented in the promoter regions of the OR genes. Most motifs resemble O/E-like sites and are preferentially localized within 200 bp upstream of the TSSs. Finally, we show that these motifs specifically interact with proteins extracted from nuclei prepared from the olfactory epithelium, but not from brain or liver. Our results show that the OR genes share common promoter elements. The present strategy should provide information on the role played by cis-regulatory sequences in OR gene regulation. PMID:16902085

  2. Functional roles of short sequence motifs in the endocytosis of membrane receptors

    PubMed Central

    Pandey, Kailash N.

    2009-01-01

    Internalization and trafficking of cell-surface membrane receptors and proteins into subcellular compartments is mediated by specific short-sequence signal motifs, which are usually located within the cytoplasmic domains of these receptor and protein molecules. The signals usually consist of short linear amino acid sequences, which are recognized by adaptor coat proteins along the endocytic and sorting pathways. The complex arrays of signals and recognition proteins ensure the dynamic movement, accurate trafficking, and designated distribution of transmembrane receptors and ligands into intracellular compartments, particularly to the endosomal-lysosomal system. This review summarizes the new information and concepts, integrating them with the current and established views of endocytosis, intracellular trafficking, and sorting of membrane receptors and proteins. Particular emphasis has been given to the functional roles of short-sequence signal motifs responsible for the itinerary and destination of membrane receptors and proteins moving into the subcellular compartments. The specific characteristics and functions of short-sequence motifs, including various tyrosine-based, dileucine-type, and other short-sequence signals in the trafficking and sorting of membrane receptors and membrane proteins are presented and discussed. PMID:19482617

  3. Mutations in the highly conserved GGQ motif of class 1 polypeptide release factors abolish ability of human eRF1 to trigger peptidyl-tRNA hydrolysis.

    PubMed Central

    Frolova, L Y; Tsivkovskii, R Y; Sivolobova, G F; Oparina, N Y; Serpinsky, O I; Blinov, V M; Tatkov, S I; Kisselev, L L

    1999-01-01

    Although the primary structures of class 1 polypeptide release factors (RF1 and RF2 in prokaryotes, eRF1 in eukaryotes) are known, the molecular basis by which they function in translational termination remains obscure. Because all class 1 RFs promote a stop-codon-dependent and ribosome-dependent hydrolysis of peptidyl-tRNAs, one may anticipate that this common function relies on a common structural motif(s). We have compared amino acid sequences of the available class 1 RFs and found a novel, common, unique, and strictly conserved GGQ motif that should be in a loop (coil) conformation as deduced by programs predicting protein secondary structure. Site-directed mutagenesis of the human eRF1 as a representative of class 1 RFs shows that substitution of both glycyl residues in this motif, G183 and G184, causes complete inactivation of the protein as a release factor toward all three stop codons, whereas two adjacent amino acid residues, G181 and R182, are functionally nonessential. Inactive human eRF1 mutants compete in release assays with wild-type eRF1 and strongly inhibit their release activity. Mutations of the glycyl residues in this motif do not affect another function, the ability of eRF1 together with the ribosome to induce GTPase activity of human eRF3, a class 2 RF. We assume that the novel highly conserved GGQ motif is implicated directly or indirectly in the activity of class 1 RFs in translation termination. PMID:10445876

  4. In planta analysis of a cis-regulatory cytokinin response motif in Arabidopsis and identification of a novel enhancer sequence.

    PubMed

    Ramireddy, Eswarayya; Brenner, Wolfram G; Pfeifer, Andreas; Heyl, Alexander; Schmülling, Thomas

    2013-07-01

    The phytohormone cytokinin plays a key role in regulating plant growth and development, and is involved in numerous physiological responses to environmental changes. The type-B response regulators, which regulate the transcription of cytokinin response genes, are a part of the cytokinin signaling system. Arabidopsis thaliana encodes 11 type-B response regulators (type-B ARRs), and some of them were shown to bind in vitro to the core cytokinin response motif (CRM) 5'-(A/G)GAT(T/C)-3' or, in the case of ARR1, to an extended motif (ECRM), 5'-AAGAT(T/C)TT-3'. Here we obtained in planta proof for the functionality of the latter motif. Promoter deletion analysis of the primary cytokinin response gene ARR6 showed that a combination of two extended motifs within the promoter is required to mediate the full transcriptional activation by ARR1 and other type-B ARRs. CRMs were found to be over-represented in the vicinity of ECRMs in the promoters of cytokinin-regulated genes, suggesting their functional relevance. Moreover, an evolutionarily conserved 27 bp long T-rich region between -220 and -193 bp was identified and shown to be required for the full activation by type-B ARRs and the response to cytokinin. This novel enhancer is not bound by the DNA-binding domain of ARR1, indicating that additional proteins might be involved in mediating the transcriptional cytokinin response. Furthermore, genome-wide expression profiling identified genes, among them ARR16, whose induction by cytokinin depends on both ARR1 and other specific type-B ARRs. This together with the ECRM/CRM sequence clustering indicates cooperative action of different type-B ARRs for the activation of particular target genes. PMID:23620480
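
    The two motifs quoted in this record translate directly into regular expressions, as in the short Python sketch below. Only the motif definitions come from the abstract; the helper name find_sites, the forward-strand-only scan, and the example sequence are assumptions for illustration.

```python
import re

# Core cytokinin response motif (CRM): 5'-(A/G)GAT(T/C)-3'
CRM = re.compile(r"(?=([AG]GAT[TC]))")
# Extended motif (ECRM): 5'-AAGAT(T/C)TT-3'
ECRM = re.compile(r"(?=(AAGAT[TC]TT))")


def find_sites(pattern, seq):
    """Return (start, matched_site) pairs on the forward strand only.

    The lookahead in each pattern lets overlapping sites be reported.
    """
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


# Hypothetical promoter fragment, not a real ARR6 sequence.
promoter = "CCAAGATTTTGGGGATCAA"
print(find_sites(ECRM, promoter))
print(find_sites(CRM, promoter))
```

    Every ECRM contains a CRM by construction (AGAT(T/C) is embedded in AAGAT(T/C)TT), so any analysis of CRMs near ECRMs would need to exclude that embedded site; a complete scan would also search the reverse complement.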

  5. The sea anemone actinoporin (Arg-Gly-Asp) conserved motif is involved in maintaining the competent oligomerization state of these pore-forming toxins.

    PubMed

    García-Linares, Sara; Richmond, Ryan; García-Mayoral, María F; Bustamante, Noemí; Bruix, Marta; Gavilanes, José G; Martínez-Del-Pozo, Alvaro

    2014-03-01

    Sea anemone actinoporins constitute an optimum model to investigate mechanisms of membrane pore formation. All actinoporins of known structure show a general fold of a β-sandwich motif flanked by two α-helices. The crucial structure for pore formation seems to be the helix located at the N-terminal end. The role of several other protein regions in membrane attachment is also well established. However, not much is known about the protein residues involved in the oligomerization required for pore formation. Previous detailed analysis of the soluble three-dimensional structures of different wild-type and mutant actinoporins from Stichodactyla helianthus suggested residues which could be involved in this oligomerization. One of these stretches contains a conserved sequence compatible with an integrin-binding RGD motif. The results presented now deal with mutants affecting this motif in the well-characterized actinoporin sticholysin II. Small modifications along this three-residue sequence had profound effects on its solubility. Just a single methyl group yielded an RAD mutant version with a highly diminished haemolytic activity and altered oligomerization behaviour. The results obtained are discussed in terms of a key role for the RGD motif in maintaining the actinoporins' pore-competent state of protein oligomerization. PMID:24418371

  6. Using machine learning to predict gene expression and discover sequence motifs

    NASA Astrophysics Data System (ADS)

    Li, Xuejing

    Recently, large amounts of experimental data for complex biological systems have become available. We use tools and algorithms from machine learning to build data-driven predictive models. We first present a novel algorithm to discover gene sequence motifs associated with temporal expression patterns of genes. Our algorithm, which is based on partial least squares (PLS) regression, is able to directly model the flow of information, from gene sequence to gene expression, to learn cis-regulatory motifs and characterize associated gene expression patterns. Our algorithm outperforms traditional computational methods, such as clustering, in motif discovery. We then present a study of extending a machine learning model for transcriptional regulation predictive of genetic regulatory response to Caenorhabditis elegans. We show meaningful results both in terms of prediction accuracy on the test experiments and biological information extracted from the regulatory program. The model discovers DNA binding sites ab initio. We also present a case study where we detect a signal of lineage-specific regulation. Finally, we present a comparative study on learning predictive models for motif discovery, based on different boosting algorithms: Adaptive Boosting (AdaBoost), Linear Programming Boosting (LPBoost) and Totally Corrective Boosting (TotalBoost). We evaluate and compare the performance of the three boosting algorithms via both statistical and biological validation, for hypoxia response in Saccharomyces cerevisiae.

  7. Comparative Analysis of Evolutionarily Conserved Motifs of Epidermal Growth Factor Receptor 2 (HER2) Predicts Novel Potential Therapeutic Epitopes

    PubMed Central

    Deng, Xiaohong; Zheng, Xuxu; Yang, Huanming; Moreira, José Manuel Afonso; Brünner, Nils; Christensen, Henrik

    2014-01-01

    Overexpression of human epidermal growth factor receptor 2 (HER2) is associated with tumor aggressiveness and poor prognosis in breast cancer. With the availability of therapeutic antibodies against HER2, great strides have been made in the clinical management of HER2 overexpressing breast cancer. However, de novo and acquired resistance to these antibodies presents a serious limitation to successful HER2 targeting treatment. The identification of novel epitopes of HER2 that can be used for functional/region-specific blockade could represent a central step in the development of new clinically relevant anti-HER2 antibodies. In the present study, we present a novel computational approach as an auxiliary tool for identification of novel HER2 epitopes. We hypothesized that the structurally and linearly evolutionarily conserved motifs of the extracellular domain of HER2 (ECD HER2) contain potential druggable epitopes/targets. We employed the PROSITE Scan to detect structurally conserved motifs and PRINTS to search for linearly conserved motifs of ECD HER2. We found that the epitopes recognized by trastuzumab and pertuzumab are located in the predicted conserved motifs of ECD HER2, supporting our initial hypothesis. Considering that structurally and linearly conserved motifs can provide functional specific configurations, we propose that by comparing the two types of conserved motifs, additional druggable epitopes/targets in the ECD HER2 protein can be identified, which can be further modified for potential therapeutic application. Thus, this novel computational process for predicting or searching for potential epitopes or key target sites may contribute to epitope-based vaccine and function-selected drug design, especially when x-ray crystal structure protein data is not available. PMID:25192037

  8. Sequence-specific intramembrane proteolysis: identification of a recognition motif in rhomboid substrates.

    PubMed

    Strisovsky, Kvido; Sharpe, Hayley J; Freeman, Matthew

    2009-12-25

    Members of the widespread rhomboid family of intramembrane proteases cleave transmembrane domain (TMD) proteins to regulate processes as diverse as EGF receptor signaling, mitochondrial dynamics, and invasion by apicomplexan parasites. However, lack of information about their substrates means that the biological role of most rhomboids remains obscure. Knowledge of how rhomboids recognize their substrates would illuminate their mechanism and might also allow substrate prediction. Previous work has suggested that rhomboid substrates are specified by helical instability in their TMD. Here we demonstrate that rhomboids instead primarily recognize a specific sequence surrounding the cleavage site. This recognition motif is necessary for substrate cleavage, it determines the cleavage site, and it is more strictly required than TM helix-destabilizing residues. Our work demonstrates that intramembrane proteases can be sequence specific and that genome-wide substrate prediction based on their recognition motifs is feasible. PMID:20064469

  9. A Convex Atomic-Norm Approach to Multiple Sequence Alignment and Motif Discovery

    PubMed Central

    Yen, Ian E. H.; Lin, Xin; Zhang, Jiong; Ravikumar, Pradeep; Dhillon, Inderjit S.

    2016-01-01

    Multiple Sequence Alignment and Motif Discovery, both known to be NP-hard problems, are two fundamental tasks in bioinformatics. Existing approaches to these two problems are based on either local search methods, such as Expectation Maximization (EM) and Gibbs Sampling, or greedy heuristic methods. In this work, we develop a convex relaxation approach to both problems based on the recent concept of atomic norm and develop a new algorithm, termed Greedy Direction Method of Multiplier, for solving the convex relaxation with two convex atomic constraints. Experiments show that our convex relaxation approach produces solutions of higher quality than those of the standard tools widely used in the bioinformatics community on the Multiple Sequence Alignment and Motif Discovery problems. PMID:27559428

  10. Recognition of Conserved Amino Acid Motifs of Common Viruses and Its Role in Autoimmunity

    PubMed Central

    2005-01-01

    The triggers of autoimmune diseases such as multiple sclerosis (MS) remain elusive. Epidemiological studies suggest that common pathogens can exacerbate and also induce MS, but it has been difficult to pinpoint individual organisms. Here we demonstrate that in vivo clonally expanded CD4+ T cells isolated from the cerebrospinal fluid of an MS patient during disease exacerbation respond to a poly-arginine motif of the nonpathogenic and ubiquitous Torque Teno virus. These T cell clones also can be stimulated by arginine-enriched protein domains from other common viruses and recognize multiple autoantigens. Our data suggest that repeated infections with common pathogenic and even nonpathogenic viruses could expand T cells specific for conserved protein domains that are able to cross-react with tissue-derived and ubiquitous autoantigens. PMID:16362076

  11. Integrating bioinformatic resources to predict transcription factors interacting with cis-sequences conserved in co-regulated genes

    PubMed Central

    2014-01-01

    Background Using motif detection programs it is fairly straightforward to identify conserved cis-sequences in promoters of co-regulated genes. In contrast, the identification of the transcription factors (TFs) interacting with these cis-sequences is much more elaborate. To facilitate this, we explore the possibility of using several bioinformatic and experimental approaches for TF identification. This starts with the selection of co-regulated gene sets and leads first to the prediction and then to the experimental validation of TFs interacting with cis-sequences conserved in the promoters of these co-regulated genes. Results Using the PathoPlant database, 32 up-regulated gene groups were identified with microarray data for drought-responsive gene expression from Arabidopsis thaliana. Application of the binding site estimation suite of tools (BEST) discovered 179 conserved sequence motifs within the corresponding promoters. Using the STAMP web-server, 49 sequence motifs were classified into 7 motif families for which similarities with known cis-regulatory sequences were identified. All motifs were subjected to a footprintDB analysis to predict interacting DNA binding domains from plant TF families. Predictions were confirmed by using a yeast-one-hybrid approach to select interacting TFs belonging to the predicted TF families. TF-DNA interactions were further experimentally validated in yeast and with a Physcomitrella patens transient expression system, leading to the discovery of several novel TF-DNA interactions. Conclusions The present work demonstrates the successful integration of several bioinformatic resources with experimental approaches to predict and validate TFs interacting with conserved sequence motifs in co-regulated genes. PMID:24773781

  12. Evolutionary divergence and limits of conserved non-coding sequence detection in plant genomes

    PubMed Central

    Reineke, Anna R.; Bornberg-Bauer, Erich; Gu, Jenny

    2011-01-01

    The discovery of regulatory motifs embedded in upstream regions of plants is a particularly challenging bioinformatics task. Previous studies have shown that motifs in plants are short compared with those found in vertebrates. Furthermore, plant genomes have undergone several diversification mechanisms such as genome duplication events which impact the evolution of regulatory motifs. In this article, a systematic phylogenomic comparison of upstream regions is conducted to further identify features of the plant regulatory genomes, the component of genomes regulating gene expression, to enable future de novo discoveries. The findings highlight differences in upstream region properties between major plant groups and the effects of divergence times and duplication events. First, clear differences in upstream region evolution can be detected between monocots and dicots, thus suggesting that a separation of these groups should be made when searching for novel regulatory motifs, particularly since universal motifs such as the TATA box are rare. Second, investigating the decay rate of significantly aligned regions suggests that a divergence time of ∼100 mya sets a limit for reliable conserved non-coding sequence (CNS) detection. Insights presented here will set a framework to help identify embedded motifs of functional relevance by understanding the limits of bioinformatics detection for CNSs. PMID:21470961

  13. Interaction prediction using conserved network motifs in protein-protein interaction networks

    NASA Astrophysics Data System (ADS)

    Albert, Reka

    2005-03-01

    High-throughput protein interaction detection methods are strongly affected by false positive and false negative results. Focused experiments are needed to complement the large-scale methods by validating previously detected interactions, but it is often difficult to decide which proteins to probe as interaction partners. Developing reliable computational methods assisting this decision process is a pressing need in bioinformatics. This talk will describe the recent developments in analyzing and understanding protein interaction networks, then present a method that uses the conserved properties of the protein network to identify and validate interaction candidates. We apply a number of machine learning algorithms to the protein connectivity information and achieve a surprisingly good overall performance in predicting interacting proteins. Using a "leave-one-out" approach, we find average success rates between 20% and 50% for predicting the correct interaction partner of a protein. We demonstrate that the success of these methods is based on the presence of conserved interaction motifs within the network. A reference implementation and a table with candidate interacting partners for each yeast protein are available at http://www.protsuggest.org
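
    The leave-one-out evaluation mentioned in this record can be sketched in Python as follows. The abstract does not say which machine learning algorithms were used, so a simple shared-neighbour count stands in for them here; that scorer, the function names (build_graph, predict_partner, leave_one_out_success), and the toy network are all assumptions for illustration, not the reference implementation.

```python
def build_graph(edges, skip=None):
    """Undirected adjacency sets from an edge list, optionally omitting one edge."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if (a, b) != skip:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def predict_partner(graph, protein):
    """Stand-in predictor: the non-neighbour sharing the most neighbours."""
    neighbours = graph[protein]
    best, best_score = None, 0
    for other in sorted(graph):
        if other == protein or other in neighbours:
            continue
        score = len(neighbours & graph[other])
        if score > best_score:
            best, best_score = other, score
    return best


def leave_one_out_success(edges):
    """Hide each interaction in turn and check whether it is predicted back."""
    hits = 0
    for held_out in edges:
        graph = build_graph(edges, skip=held_out)
        a, b = held_out
        if predict_partner(graph, a) == b:
            hits += 1
    return hits / len(edges)


# Hypothetical toy network: four mutually interacting proteins.
clique = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]
print(leave_one_out_success(clique))
```

    A densely connected group recovers every hidden interaction under this scorer, whereas a simple chain recovers none, which illustrates why prediction of this kind depends on recurring local structure in the network.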

  14. Conserved Intramolecular Interactions Maintain Myosin Interacting-Heads Motifs Explaining Tarantula Muscle Super-Relaxed State Structural Basis.

    PubMed

    Alamo, Lorenzo; Qi, Dan; Wriggers, Willy; Pinto, Antonio; Zhu, Jingui; Bilbao, Aivett; Gillilan, Richard E; Hu, Songnian; Padrón, Raúl

    2016-03-27

    Tarantula striated muscle is an outstanding system for understanding the molecular organization of myosin filaments. Three-dimensional reconstruction based on cryo-electron microscopy images and single-particle image processing revealed that, in a relaxed state, myosin molecules undergo intramolecular head-head interactions, explaining why head activity switches off. The filament model obtained by rigidly docking a chicken smooth muscle myosin structure to the reconstruction was improved by flexibly fitting an atomic model built by mixing structures from different species to a tilt-corrected 2-nm three-dimensional map of frozen-hydrated tarantula thick filament. We used heavy and light chain sequences from tarantula myosin to build a single-species homology model of two heavy meromyosin interacting-heads motifs (IHMs). The flexibly fitted model includes previously missing loops and shows five intramolecular and five intermolecular interactions that keep the IHM in a compact off structure, forming four helical tracks of IHMs around the backbone. The residues involved in these interactions are oppositely charged, and their sequence conservation suggests that IHM is present across animal species. The new model, PDB 3JBH, explains the structural origin of the ATP turnover rates detected in relaxed tarantula muscle by ascribing the very slow rate to docked unphosphorylated heads, the slow rate to phosphorylated docked heads, and the fast rate to phosphorylated undocked heads. The conservation of intramolecular interactions across animal species and the presence of IHM in bilaterians suggest that a super-relaxed state should be maintained, as it plays a role in saving ATP in skeletal, cardiac, and smooth muscles. PMID:26851071

  15. CyanoLyase: a database of phycobilin lyase sequences, motifs and functions

    PubMed Central

    Bretaudeau, Anthony; Coste, François; Humily, Florian; Garczarek, Laurence; Le Corguillé, Gildas; Six, Christophe; Ratin, Morgane; Collin, Olivier; Schluchter, Wendy M.; Partensky, Frédéric

    2013-01-01

    CyanoLyase (http://cyanolyase.genouest.org/) is a manually curated sequence and motif database of phycobilin lyases and related proteins. These enzymes catalyze the covalent ligation of chromophores (phycobilins) to specific binding sites of phycobiliproteins (PBPs). The latter constitute the building bricks of phycobilisomes, the major light-harvesting systems of cyanobacteria and red algae. Phycobilin lyase sequences are poorly annotated in public databases. Sequences included in CyanoLyase were retrieved from all available genomes of these organisms and a few others by similarity searches using biochemically characterized enzyme sequences and then classified into 3 clans and 32 families. Amino acid motifs were computed for each family using Protomata learner. CyanoLyase also includes BLAST and a novel pattern matching tool (Protomatch) that allow users to rapidly retrieve and annotate lyases from any new genome. In addition, it provides phylogenetic analyses of all phycobilin lyase families, describes their function, their presence/absence in all genomes of the database (phyletic profiles) and predicts the chromophorylation of PBPs in each strain. The site also includes a thorough bibliography about phycobilin lyases and genomes included in the database. This resource should be useful to scientists and companies interested in natural or artificial PBPs, which have a number of biotechnological applications, notably as fluorescent markers. PMID:23175607

  16. Conserved function of the lysine-based KXD/E motif in Golgi retention for endomembrane proteins among different organisms.

    PubMed

    Woo, Cheuk Hang; Gao, Caiji; Yu, Ping; Tu, Linna; Meng, Zhaoyue; Banfield, David K; Yao, Xiaoqiang; Jiang, Liwen

    2015-11-15

    We recently identified a new COPI-interacting KXD/E motif in the C-terminal cytosolic tail (CT) of Arabidopsis endomembrane protein 12 (AtEMP12) as being a crucial Golgi retention mechanism for AtEMP12. This KXD/E motif is conserved in CTs of all EMPs found in plants, yeast, and humans and is also present in hundreds of other membrane proteins. Here, by cloning selective EMP isoforms from plants, yeast, and mammals, we study the localizations of EMPs in different expression systems, since there are contradictory reports on the localizations of EMPs. We show that the N-terminal and C-terminal GFP-tagged EMP fusions are localized to Golgi and post-Golgi compartments, respectively, in plant, yeast, and mammalian cells. In vitro pull-down assay further proves the interaction of the KXD/E motif with COPI coatomer in yeast. COPI loss of function in yeast and plants causes mislocalization of EMPs or KXD/E motif-containing proteins to vacuole. Ultrastructural studies further show that RNA interference (RNAi) knockdown of coatomer expression in transgenic Arabidopsis plants causes severe morphological changes in the Golgi. Taken together, our results demonstrate that N-terminal GFP fusions reflect the real localization of EMPs, and KXD/E is a conserved motif in COPI interaction and Golgi retention in eukaryotes. PMID:26378254

  17. A dominant negative mutation in the conserved RNA helicase motif 'SAT' causes splicing factor PRP2 to stall in spliceosomes.

    PubMed Central

    Plumpton, M; McGarvey, M; Beggs, J D

    1994-01-01

    To characterize sequences in the RNA helicase-like PRP2 protein of Saccharomyces cerevisiae that are essential for its function in pre-mRNA splicing, a pool of random PRP2 mutants was generated. A dominant negative allele was isolated which, when overexpressed in a wild-type yeast strain, inhibited cell growth by causing a defect in pre-mRNA splicing. This defect was partially alleviated by simultaneous co-overexpression of wild-type PRP2. The dominant negative PRP2 protein inhibited splicing in vitro and caused the accumulation of stalled splicing complexes. Immunoprecipitation with anti-PRP2 antibodies confirmed that dominant negative PRP2 protein competed with its wild-type counterpart for interaction with spliceosomes, with which the mutant protein remained associated. The PRP2-dn1 mutation led to a single amino acid change within the conserved SAT motif that in the prototype helicase eIF-4A is required for RNA unwinding. Purified dominant negative PRP2 protein had approximately 40% of the wild-type level of RNA-stimulated ATPase activity. As ATPase activity was reduced only slightly, but splicing activity was abolished, we propose that the dominant negative phenotype is due primarily to a defect in the putative RNA helicase activity of PRP2 protein. PMID:8112301

  18. CDR3β sequence motifs regulate autoreactivity of human invariant NKT cell receptors.

    PubMed

    Chamoto, Kenji; Guo, Tingxi; Imataki, Osamu; Tanaka, Makito; Nakatsugawa, Munehide; Ochi, Toshiki; Yamashita, Yuki; Saito, Akiko M; Saito, Toshiki I; Butler, Marcus O; Hirano, Naoto

    2016-04-01

    Invariant natural killer T (iNKT) cells are a subset of T lymphocytes that recognize lipid ligands presented by monomorphic CD1d. Human iNKT T cell receptor (TCR) is largely composed of invariant Vα24 (Vα24i) TCRα chain and semi-variant Vβ11 TCRβ chain, where complementarity-determining region (CDR)3β is the sole variable region. One of the characteristic features of iNKT cells is that they retain autoreactivity even after the thymic selection. However, the molecular features of human iNKT TCR CDR3β sequences that regulate autoreactivity remain unknown. Since the number of iNKT cells with detectable autoreactivity in peripheral blood is limited, we introduced the Vα24i gene into peripheral T cells and generated a de novo human iNKT TCR repertoire. By stimulating the transfected T cells with artificial antigen presenting cells (aAPCs) presenting self-ligands, we enriched strongly autoreactive iNKT TCRs and isolated a large panel of human iNKT TCRs with a broad range of autoreactivity. From this panel of unique iNKT TCRs, we deciphered three CDR3β sequence motifs frequently encoded by strongly autoreactive iNKT TCRs: a VD region with 2 or more acidic amino acids, usage of the Jβ2-5 allele, and a CDR3β region of 13 amino acids in length. iNKT TCRs encoding 2 or 3 sequence motifs also exhibit higher autoreactivity than those encoding 0 or 1 motifs. These data facilitate our understanding of the molecular basis for human iNKT cell autoreactivity involved in immune responses associated with human disease. PMID:26748722

  19. Identification of sequence motifs involved in Dengue virus-host interactions.

    PubMed

    Asnet Mary, J; Paramasivan, R; Shenbagarathai, R

    2016-03-01

    Dengue fever is a rapidly spreading mosquito-borne virus infection, which remains a serious global public health problem. As there is no specific treatment or commercial vaccine available for effective control of the disease, attempts to develop novel control strategies are underway. Viruses utilize the surface receptor proteins of the host to enter cells. Though various proteins were said to be receptors of Dengue virus (DENV) using Virus Overlay Protein Binding Assay, the precise interaction between DENV and host has not been explored. Understanding the structural features of domain III envelope glycoprotein would help in developing efficient antiviral inhibitors. Therefore, an attempt was made to identify the sequence motifs present in domain III envelope glycoprotein of Dengue virus. Computational analysis revealed that the NGR motif is present in the domain III envelope glycoprotein of DENV-1 and DENV-3. Similarly, DENV-1, DENV-2 and DENV-4 were found to contain the Yxxphi motif, which is a tyrosine-based sorting signal responsible for the interaction with a mu subunit of adaptor protein complex. High-throughput virtual screening resulted in five compounds as lead molecules based on glide score, which ranges from -4.664 to -6.52 kcal/mol. This computational prediction provides an additional tool for understanding the virus-host interactions and helps to identify potential targets in the host. Further experimental evidence is warranted to confirm the virus-host interactions and also the inhibitory activity of the reported lead compounds. PMID:25905427
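
    A scan for the two short linear motifs named in this record could look like the Python sketch below. The abstract does not describe the software actually used, so this is only an illustration: the function name scan, the choice of L, I, M, F and V as the bulky hydrophobic "phi" residue, and the example peptide are assumptions.

```python
import re

# Assumption: "phi" is taken to be a bulky hydrophobic residue (L, I, M, F or V).
MOTIFS = {
    "NGR": re.compile(r"(?=(NGR))"),
    "Yxxphi": re.compile(r"(?=(Y..[LIMFV]))"),
}


def scan(protein_seq):
    """Return 0-based start positions of each motif in a protein sequence."""
    seq = protein_seq.upper()
    return {name: [m.start() for m in pat.finditer(seq)]
            for name, pat in MOTIFS.items()}


# Hypothetical peptide, not a real DENV domain III sequence.
print(scan("MKNGRAYTALQ"))
```

    Patterns this short match by chance in many proteins, so hits from such a scan are candidates only, which is consistent with the abstract's call for experimental confirmation.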

  20. A Conserved Ectodomain-Transmembrane Domain Linker Motif Tunes the Allosteric Regulation of Cell Surface Receptors.

    PubMed

    Schmidt, Thomas; Ye, Feng; Situ, Alan J; An, Woojin; Ginsberg, Mark H; Ulmer, Tobias S

    2016-08-19

    In many families of cell surface receptors, a single transmembrane (TM) α-helix separates ecto- and cytosolic domains. A defined coupling of ecto- and TM domains must be essential to allosteric receptor regulation but remains little understood. Here, we characterize the linker structure, dynamics, and resulting ecto-TM domain coupling of integrin αIIb in model constructs and relate it to other integrin α subunits by mutagenesis. Cellular integrin activation assays subsequently validate the findings in intact receptors. Our results indicate a flexible yet carefully tuned ecto-TM coupling that modulates the signaling threshold of integrin receptors. Interestingly, a proline at the N-terminal TM helix border, termed NBP, is critical to linker flexibility in integrins. NBP is further predicted in 21% of human single-pass TM proteins and validated in cytokine receptors by the TM domain structure of the cytokine receptor common subunit β and its P441A-substituted variant. Thus, NBP is a conserved uncoupling motif of the ecto-TM domain transition and the degree of ecto-TM domain coupling represents an important parameter in the allosteric regulation of diverse cell surface receptors. PMID:27365391

  1. A small conserved motif supports polarity augmentation of Shigella flexneri IcsA.

    PubMed

    Doyle, Matthew Thomas; Grabowicz, Marcin; Morona, Renato

    2015-11-01

    The rod-shaped enteric intracellular pathogen Shigella flexneri and other Shigella species are the causative agents of bacillary dysentery. S. flexneri are able to spread within the epithelial lining of the gut, resulting in lesion formation, cramps and bloody stools. The outer membrane protein IcsA is essential for this spreading process. IcsA is the initiator of an actin-based form of motility whereby it allows the formation of a filamentous actin 'tail' at the bacterial pole. Importantly, IcsA is specifically positioned at the bacterial pole such that this process occurs asymmetrically. The mechanism of IcsA polarity is not completely understood, but it appears to be a multifactorial process involving factors intrinsic to IcsA and other regulating factors. In this study, we further investigated IcsA polarization by its intramolecular N-terminal and central polar-targeting (PT) regions (nPT and cPT regions, respectively). The results obtained support a role in polar localization for the cPT region and contend the role of the nPT region. We identified single IcsA residues that have measurable impacts on IcsA polarity augmentation, resulting in decreased S. flexneri spreading efficiency. Intriguingly, regions and residues involved in PT clustered around a highly conserved motif which may provide a functional scaffold for polarity-augmenting residues. How these results fit with the current model of IcsA polarity determination is discussed. PMID:26315462

  2. Multiple cellular proteins interact with LEDGF/p75 through a conserved unstructured consensus motif.

    PubMed

    Tesina, Petr; Čermáková, Kateřina; Hořejší, Magdalena; Procházková, Kateřina; Fábry, Milan; Sharma, Subhalakshmi; Christ, Frauke; Demeulemeester, Jonas; Debyser, Zeger; De Rijck, Jan; Veverka, Václav; Řezáčová, Pavlína

    2015-01-01

    Lens epithelium-derived growth factor (LEDGF/p75) is an epigenetic reader and attractive therapeutic target involved in HIV integration and the development of mixed lineage leukaemia (MLL1) fusion-driven leukaemia. Besides HIV integrase and the MLL1-menin complex, LEDGF/p75 interacts with various cellular proteins via its integrase binding domain (IBD). Here we present structural characterization of IBD interactions with transcriptional repressor JPO2 and domesticated transposase PogZ, and show that the PogZ interaction is nearly identical to the interaction of LEDGF/p75 with MLL1. The interaction with the IBD is maintained by an intrinsically disordered IBD-binding motif (IBM) common to all known cellular partners of LEDGF/p75. In addition, based on IBM conservation, we identify and validate IWS1 as a novel LEDGF/p75 interaction partner. Our results also reveal how HIV integrase efficiently displaces cellular binding partners from LEDGF/p75. Finally, the similar binding modes of LEDGF/p75 interaction partners represent a new challenge for the development of selective interaction inhibitors. PMID:26245978

  3. Conserved phosphoprotein interaction motif is functionally interchangeable between ataxin-7 and arrestins.

    PubMed

    Mushegian, A R; Vishnivetskiy, S A; Gurevich, V V

    2000-06-13

    Olivopontocerebellar atrophy with retinal degeneration is a hereditary neurodegenerative disorder that belongs to the subtype II of the autosomal dominant cerebellar ataxias and is characterized by early-onset cerebellar and macular degeneration preceded by diagnostically useful tritan colorblindness. The gene mutated in the disease (SCA7) has been mapped to chromosome 3p12-13.5, and positional cloning identified the cause of the disease as CAG repeat expansion in this gene. The SCA7 gene product, ataxin-7, is an 897 amino acid protein with an expandable polyglutamine tract close to its N-terminus. No clues to ataxin-7 function have been obtained from sequence database searches. Here we report that ataxin-7 has a motif of ca. 50 amino acids, related to the phosphate-binding site of arrestins. To test the relevance of this sequence similarity, we introduced the putative ataxin-7 phosphate-binding site into visual arrestin and beta-arrestin. Both chimeric arrestins retain receptor-binding affinity and show characteristic high selectivity for phosphorylated activated forms of rhodopsin and beta-adrenergic receptor, respectively. Although the insertion of a Gly residue (absent in arrestins but present in the putative phosphate-binding site of ataxin-7) disrupts the function of visual arrestin-ataxin-7 chimera, it enhances the function of beta-arrestin-ataxin-7 chimera. Taken together, our data suggest that the arrestin-like site in the ataxin-7 sequence is a functional phosphate-binding site. The presence of the phosphate-binding site in ataxin-7 suggests that this protein may be involved in phosphorylation-dependent binding to its protein partner(s) in the cell. PMID:10841760

  4. Role of two sequence motifs of mesencephalic astrocyte-derived neurotrophic factor in its survival-promoting activity

    PubMed Central

    Mätlik, K; Yu, Li-ying; Eesmaa, A; Hellman, M; Lindholm, P; Peränen, J; Galli, E; Anttila, J; Saarma, M; Permi, P; Airavaara, M; Arumäe, U

    2015-01-01

    Mesencephalic astrocyte-derived neurotrophic factor (MANF) is a prosurvival protein that protects the cells when applied intracellularly in vitro or extracellularly in vivo. Its protective mechanisms are poorly known. Here we studied the role of two short sequence motifs within the carboxy-(C) terminal domain of MANF in its neuroprotective activity: the CKGC sequence (a CXXC motif) that could be involved in redox reactions, and the C-terminal RTDL sequence, an endoplasmic reticulum (ER) retention signal. We mutated these motifs and analyzed the antiapoptotic effect and intracellular localization of these mutants of MANF when overexpressed in cultured sympathetic or sensory neurons. As an in vivo model for studying the effect of these mutants after their extracellular application, we used the rat model of cerebral ischemia. Even though we found no evidence for oxidoreductase activity of MANF, the mutation of CXXC motif completely abolished its protective effect, showing that this motif is crucial for both MANF's intracellular and extracellular activity. The RTDL motif was not needed for the neuroprotective activity of MANF after its extracellular application in the stroke model in vivo. However, in vitro the deletion of RTDL motif inactivated MANF in the sympathetic neurons where the mutant protein localized to Golgi, but not in the sensory neurons where the mutant localized to the ER, showing that intracellular MANF protects these peripheral neurons in vitro only when localized to the ER. PMID:26720341

  5. Evolutionary and taxonomic implications of conserved structural motifs between picornaviruses and insect picorna-like viruses.

    PubMed

    Liljas, L; Tate, J; Lin, T; Christian, P; Johnson, J E

    2002-01-01

    A comparison of the recently determined structure of an insect picorna-like virus, Cricket paralysis virus (CrPV), with that of the mammalian picornaviruses shows that several structural features are highly conserved between these viruses. These conserved features include the topology of the coat proteins, the conformation of most loops, and the general arrangement of the internally located N-terminal arms of the coat proteins. The conformational conservation of the N-termini of the three major coat proteins between CrPV and the picornaviruses suggests a putative ancestral T = 3 virus. Comparisons of the genome structure and amino-acid sequence of the coat proteins of CrPV with a number of other insect picorna-like viruses show that most of them belong to a novel group, recently given the interim name Cricket paralysis-like viruses. Two other insect picorna-like viruses, Infectious flacherie virus (IFV) and Sacbrood virus (SBV), for which the genome sequences have recently been determined, have very different coat protein sequences and a genome organization more like the picornaviruses. However, the position of the small VP4 protein in the structural protein polyprotein as well as the mechanism for its cleavage from VP3 upon assembly strongly suggests an evolutionary link to the "Cricket paralysis-like viruses". We propose that the picornaviruses, Cricket paralysis-like viruses and IFV/SBV group are a natural assemblage. The ancestor for this assemblage had a structure based upon the CrPV/picornavirus paradigm and a genome encoding a single major coat protein; gene duplication and rearrangements have subsequently produced the viruses that we observe today. We also discuss the possible relatives of the proposed assemblage and the likely implications of future structural studies that may be carried out on the putative relatives. PMID:11855636

  6. A search for conserved sequences in coding regions reveals that the let-7 microRNA targets Dicer within its coding sequence

    PubMed Central

    Forman, Joshua J.; Legesse-Miller, Aster; Coller, Hilary A.

    2008-01-01

    Recognition sites for microRNAs (miRNAs) have been reported to be located in the 3′ untranslated regions of transcripts. In a computational screen for highly conserved motifs within coding regions, we found an excess of sequences conserved at the nucleotide level within coding regions in the human genome, the highest scoring of which are enriched for miRNA target sequences. To validate our results, we experimentally demonstrated that the let-7 miRNA directly targets the miRNA-processing enzyme Dicer within its coding sequence, thus establishing a mechanism for a miRNA/Dicer autoregulatory negative feedback loop. We also found computational evidence to suggest that miRNA target sites in coding regions and 3′ UTRs may differ in mechanism. This work demonstrates that miRNAs can directly target transcripts within their coding region in animals, and it suggests that a complete search for the regulatory targets of miRNAs should be expanded to include genes with recognition sites within their coding regions. As more genomes are sequenced, the methodological approach that we used for identifying motifs with high sequence conservation will be increasingly valuable for detecting functional sequence motifs within coding regions. PMID:18812516

  7. qPMS7: a fast algorithm for finding (ℓ, d)-motifs in DNA and protein sequences.

    PubMed

    Dinh, Hieu; Rajasekaran, Sanguthevar; Davila, Jaime

    2012-01-01

    Detection of rare events happening in a set of DNA/protein sequences could lead to new biological discoveries. One kind of such rare events is the presence of patterns called motifs in DNA/protein sequences. Finding motifs is a challenging problem since the general version of motif search has been proven to be intractable. Motif discovery is an important problem in biology. For example, it is useful in the detection of transcription factor binding sites and transcriptional regulatory elements that are very crucial in understanding gene function, human disease, drug design, etc. Many versions of the motif search problem have been proposed in the literature. One such is the (ℓ, d)-motif search (or Planted Motif Search (PMS)). A generalized version of the PMS problem, namely, Quorum Planted Motif Search (qPMS), is shown to accurately model motifs in real data. However, solving the qPMS problem is an extremely difficult task because a special case of it, the PMS Problem, is already NP-hard, which means that any algorithm solving it can be expected to take exponential time in the worst-case scenario. In this paper, we propose a novel algorithm named qPMS7 that tackles the qPMS problem on real data as well as challenging instances. Experimental results show that our Algorithm qPMS7 is on average 5 times faster than the state-of-the-art algorithm. The executable program of Algorithm qPMS7 is freely available on the web at http://pms.engr.uconn.edu/downloads/qPMS7.zip. Our online motif discovery tools that use Algorithm qPMS7 are freely available at http://pms.engr.uconn.edu or http://motifsearch.com. PMID:22848493
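
    To make the problem statement concrete, the sketch below solves small instances of quorum planted motif search by exhaustive enumeration. It is not qPMS7 and has none of its speedups. It uses the usual formulation, stated here as an assumption because the abstract does not spell it out: a string M of length ℓ is an (ℓ, d)-motif if at least a fraction q of the input sequences contain some ℓ-mer within Hamming distance d of M. The function names and the toy sequences are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(candidate, seq, d):
    """True if seq has an l-mer at Hamming distance <= d from candidate."""
    l = len(candidate)
    return any(
        hamming(candidate, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def quorum_planted_motifs(seqs, l, d, q=1.0, alphabet="ACGT"):
    """Brute-force (l, d)-motif search with quorum q (q=1.0 is plain PMS).

    Enumerates all len(alphabet)**l candidates, so it is exponential in l
    and only meant for toy inputs.
    """
    need = q * len(seqs)
    found = []
    for letters in product(alphabet, repeat=l):
        cand = "".join(letters)
        support = sum(occurs_within(cand, s, d) for s in seqs)
        if support >= need:
            found.append(cand)
    return found


# Toy input: variants of ACGT (ACGT, ACCT, AGGT) planted in three sequences.
toy = ["TTACGTAA", "GGACCTGG", "CCAGGTCC"]
motifs = quorum_planted_motifs(toy, l=4, d=1)
print("ACGT" in motifs)
```

    The exponential candidate space is exactly what makes the problem hard; practical solvers prune it instead of enumerating it.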

  8. Mutational analysis of a conserved motif of Agrobacterium tumefaciens VirD2.

    PubMed

    Vogel, A M; Yoon, J; Das, A

    1995-10-25

    The VirD2 polypeptide from Agrobacterium tumefaciens, in the presence of VirD1, introduces a site- and strand-specific nick at the T-DNA borders. A similar reaction at the origin of transfer (oriT) of plasmids is essential for plasmid transfer by bacterial conjugation. A comparison of protein sequences of VirD2 and its functional homologs in bacterial conjugation and in rolling circle replication revealed that they share a conserved 14 residue segment, HxDxxx(P/u)HuHuuux [residues 126-139 of VirD2; Ilyina, T.V. and Koonin, E.V. (1992) Nucleic Acids Res. 20, 3279-3285]. A mutational approach was used to test the role of these residues in the endonuclease activity of VirD2. The results demonstrated that the two invariant histidine residues (H133 and H135) are essential for activity. Mutations at three sites, histidine 126, aspartic acid 128 and aspartic acid 130, that are conserved in a subfamily of the plasmid mobilization proteins, led to the loss of VirD2 activity. Aspartic acid at position 130, could be substituted with glutamic acid and to a much lesser extent, with tyrosine. In contrast, another conserved residue, asparagine 139, tolerated many different amino acid substitutions. The non-conserved residues, arginine 129, proline 132 and leucine 134, were also found to be important for function. Isolation of null mutations that map throughout this conserved domain confirm the hypothesis that this region is essential for function. PMID:7479069
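
    The 14-residue consensus quoted above can be turned into a regular expression for scanning other protein sequences. The sketch below is an illustration, not part of the study. It assumes that "x" means any residue and that "u" denotes a bulky hydrophobic residue, taken here as I, L, V, M, F, Y or W; that reading of "u" is an assumption and should be checked against the cited Ilyina and Koonin paper. The test peptide is made up and is not the VirD2 sequence.

```python
import re

# Assumption: "u" = bulky hydrophobic residue (I, L, V, M, F, Y, W).
BULKY = "ILVMFYW"

# HxDxxx(P/u)HuHuuux written position by position (14 residues in total).
CONSENSUS = re.compile(
    "H.D..."                 # H x D x x x
    "[P" + BULKY + "]"       # (P/u)
    "H[" + BULKY + "]"       # H u
    "H[" + BULKY + "]{3}"    # H u u u
    "."                      # x
)


def find_consensus(seq):
    """Return [(1-based start, matched 14-mer), ...] for non-overlapping hits."""
    return [(m.start() + 1, m.group(0)) for m in CONSENSUS.finditer(seq.upper())]


# Made-up peptide containing one segment that fits the pattern.
toy = "AAHRDRDAPHLHVILNAA"
hits = find_consensus(toy)
print(hits)
```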

  9. Quadfinder: server for identification and analysis of quadruplex-forming motifs in nucleotide sequences

    PubMed Central

    Scaria, Vinod; Hariharan, Manoj; Arora, Amit; Maiti, Souvik

    2006-01-01

    G-quadruplex secondary structures, which play a structural role in repetitive DNA such as telomeres, may also play a functional role at other genomic locations as targetable regulatory elements which control gene expression. The recent interest in application of quadruplexes in biological systems prompted us to develop a tool for the identification and analysis of quadruplex-forming nucleotide sequences especially in the RNA. Here we present Quadfinder, an online server for prediction and bioinformatics of uni-molecular quadruplex-forming nucleotide sequences. The server is designed to be user-friendly and needs minimal intervention by the user, while providing flexibility of defining the variants of the motif. The server is freely available at URL . PMID:16845097
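
    A pattern search of the kind such a server performs can be sketched with a regular expression. The code below is not Quadfinder's implementation. It assumes the widely used description of a uni-molecular quadruplex-forming motif as four runs of at least three guanines separated by loops of 1 to 7 nucleotides; the defaults and the function names are assumptions, exposed as parameters so that variants of the motif can be defined.

```python
import re


def quadruplex_pattern(min_g=3, loop_min=1, loop_max=7):
    """Compile a pattern for four G-runs separated by three loops.

    Defaults (G-runs >= 3, loops of 1-7 nt) are assumed, not Quadfinder's.
    """
    g_run = "G{%d,}" % min_g
    loop = "[ACGTU]{%d,%d}" % (loop_min, loop_max)
    # Joining four G-runs with the loop expression yields G-loop-G-loop-G-loop-G.
    return re.compile(loop.join([g_run] * 4))


def find_quadruplexes(seq, **kwargs):
    """Return [(0-based start, matched sequence), ...] for non-overlapping hits."""
    pattern = quadruplex_pattern(**kwargs)
    return [(m.start(), m.group(0)) for m in pattern.finditer(seq.upper())]


# Toy DNA string built from four GGG runs with TTA loops.
toy = "AAGGGTTAGGGTTAGGGTTAGGGAA"
hits = find_quadruplexes(toy)
print(hits)
```

    Because the character class accepts both T and U, the same function works on DNA and RNA strings.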

  10. Conserved motifs II to VI of DNA helicase II from Escherichia coli are all required for biological activity.

    PubMed Central

    Zhang, G; Deng, E; Baugh, L R; Hamilton, C M; Maples, V F; Kushner, S R

    1997-01-01

    There are seven conserved motifs (IA, IB, and II to VI) in DNA helicase II of Escherichia coli that have high homology among a large family of proteins involved in DNA metabolism. To address the functional importance of motifs II to VI, we employed site-directed mutagenesis to replace the charged amino acid residues in each motif with alanines. Cells carrying these mutant alleles exhibited higher UV and methyl methanesulfonate sensitivity, increased rates of spontaneous mutagenesis, and elevated levels of homologous recombination, indicating defects in both the excision repair and mismatch repair pathways. In addition, we also changed the highly conserved tyrosine(600) in motif VI to phenylalanine (uvrD309, Y600F). This mutant displayed a moderate increase in UV sensitivity but a decrease in spontaneous mutation rate, suggesting that DNA helicase II may have different functions in the two DNA repair pathways. Furthermore, a mutation in domain IV (uvrD307, R284A) significantly reduced the viability of some E. coli K-12 strains at 30 degrees C but not at 37 degrees C. The implications of these observations are discussed. PMID:9393722

  11. Functionally conserved enhancers with divergent sequences in distant vertebrates

    SciTech Connect

    Yang, Song; Oksenberg, Nir; Takayama, Sachiko; Heo, Seok -Jin; Poliakov, Alexander; Ahituv, Nadav; Dubchak, Inna; Boffelli, Dario

    2015-10-30

    To examine the contributions of sequence and function conservation in the evolution of enhancers, we systematically identified enhancers whose sequences are not conserved among distant groups of vertebrate species, but have homologous function and are likely to be derived from a common ancestral sequence. In conclusion, our approach combined comparative genomics and epigenomics to identify potential enhancer sequences in the genomes of three groups of distantly related vertebrate species.

  12. Using Weeder, Pscan, and PscanChIP for the Discovery of Enriched Transcription Factor Binding Site Motifs in Nucleotide Sequences.

    PubMed

    Zambelli, Federico; Pesole, Graziano; Pavesi, Giulio

    2014-01-01

    One of the greatest challenges facing modern molecular biology is understanding the complex mechanisms regulating gene expression. A fundamental step in this process requires the characterization of sequence motifs involved in the regulation of gene expression at transcriptional and post-transcriptional levels. In particular, transcription is modulated by the interaction of transcription factors (TFs) with their corresponding binding sites. Weeder, Pscan, and PscanChIP are software tools freely available for noncommercial users as a stand-alone or Web-based applications for the automatic discovery of conserved motifs in a set of DNA sequences likely to be bound by the same TFs. Input for the tools can be promoter sequences from co-expressed or co-regulated genes (for which Weeder and Pscan are suitable), or regions identified through genome wide ChIP-seq or similar experiments (Weeder and PscanChIP). The motifs are either found by a de novo approach (Weeder) or by using descriptors of the binding specificity of TFs (Pscan and PscanChIP). PMID:25199791

  13. Sequence Analysis and Domain Motifs in the Porcine Skin Decorin Glycosaminoglycan Chain

    PubMed Central

    Zhao, Xue; Yang, Bo; Solakylidirim, Kemal; Joo, Eun Ji; Toida, Toshihiko; Higashi, Kyohei; Linhardt, Robert J.; Li, Lingyun

    2013-01-01

    Decorin proteoglycan is comprised of a core protein containing a single O-linked dermatan sulfate/chondroitin sulfate glycosaminoglycan (GAG) chain. Although the sequence of the decorin core protein is determined by the gene encoding its structure, the structure of its GAG chain is determined in the Golgi. The recent application of modern MS to bikunin, a far simpler chondroitin sulfate proteoglycan, suggests that it has a single or small number of defined sequences. On this basis, a similar approach was undertaken to sequence the much larger and more structurally complex dermatan sulfate/chondroitin sulfate GAG chain of porcine skin decorin. This approach resulted in information on the consistency/variability of its linkage region at the reducing end of the GAG chain, its iduronic acid-rich domain, glucuronic acid-rich domain, and non-reducing end. A general motif for the porcine skin decorin GAG chain was established. A single small decorin GAG chain was sequenced using MS/MS analysis. The data obtained in the study suggest that the decorin GAG chain has a small or a limited number of sequences. PMID:23423381

  14. DILIMOT: discovery of linear motifs in proteins.

    PubMed

    Neduva, Victor; Russell, Robert B

    2006-07-01

    Discovery of protein functional motifs is critical in modern biology. Small segments of 3-10 residues play critical roles in protein interactions, post-translational modifications and trafficking. DILIMOT (DIscovery of LInear MOTifs) is a server for the prediction of these short linear motifs within a set of proteins. Given a set of sequences sharing a common functional feature (e.g. interaction partner or localization) the method finds statistically over-represented motifs likely to be responsible for it. The input sequences are first passed through a set of filters to remove regions unlikely to contain instances of linear motifs. Motifs are then found in the remaining sequence and ranked according to a statistic that measures over-representation and conservation across homologues in related species. The results are displayed via a visual interface for easy perusal. The server is available at http://dilimot.embl.de. PMID:16845024

  15. Temporal motifs reveal homophily, gender-specific patterns, and group talk in call sequences.

    PubMed

    Kovanen, Lauri; Kaski, Kimmo; Kertész, János; Saramäki, Jari

    2013-11-01

    Recent studies on electronic communication records have shown that human communication has complex temporal structure. We study how communication patterns that involve multiple individuals are affected by attributes such as sex and age. To this end, we represent the communication records as a colored temporal network where node color is used to represent individuals' attributes, and identify patterns known as temporal motifs. We then construct a null model for the occurrence of temporal motifs that takes into account the interaction frequencies and connectivity between nodes of different colors. This null model allows us to detect significant patterns in call sequences that cannot be observed in a static network that uses interaction frequencies as link weights. We find sex-related differences in communication patterns in a large dataset of mobile phone records and show the existence of temporal homophily, the tendency of similar individuals to participate in communication patterns beyond what would be expected on the basis of their average interaction frequencies. We also show that temporal patterns differ between dense and sparse neighborhoods in the network. Because this result is also independent of interaction frequencies, it can be seen as an extension of Granovetter's hypothesis to temporal networks. PMID:24145424

  16. ZFP57 recognizes multiple and closely spaced sequence motif variants to maintain repressive epigenetic marks in mouse embryonic stem cells

    PubMed Central

    Anvar, Zahra; Cammisa, Marco; Riso, Vincenzo; Baglivo, Ilaria; Kukreja, Harpreet; Sparago, Angela; Girardot, Michael; Lad, Shraddha; De Feis, Italia; Cerrato, Flavia; Angelini, Claudia; Feil, Robert; Pedone, Paolo V.; Grimaldi, Giovanna; Riccio, Andrea

    2016-01-01

    Imprinting Control Regions (ICRs) need to maintain their parental allele-specific DNA methylation during early embryogenesis despite genome-wide demethylation and subsequent de novo methylation. ZFP57 and KAP1 are both required for maintaining the repressive DNA methylation and H3-lysine-9-trimethylation (H3K9me3) at ICRs. In vitro, ZFP57 binds a specific hexanucleotide motif that is enriched at its genomic binding sites. We now demonstrate in mouse embryonic stem cells (ESCs) that SNPs disrupting closely-spaced hexanucleotide motifs are associated with lack of ZFP57 binding and H3K9me3 enrichment. Through a transgenic approach in mouse ESCs, we further demonstrate that an ICR fragment containing three ZFP57 motif sequences recapitulates the original methylated or unmethylated status when integrated into the genome at an ectopic position. Mutation of Zfp57 or the hexanucleotide motifs led to loss of ZFP57 binding and DNA methylation of the transgene. Finally, we identified a sequence variant of the hexanucleotide motif that interacts with ZFP57 both in vivo and in vitro. The presence of multiple and closely located copies of ZFP57 motif variants emerges as a distinct characteristic that is required for the faithful maintenance of repressive epigenetic marks at ICRs and other ZFP57 binding sites. PMID:26481358

  17. Sequence analysis of the L protein of the Ebola 2014 outbreak: Insight into conserved regions and mutations.

    PubMed

    Ayub, Gohar; Waheed, Yasir

    2016-06-01

    The 2014 Ebola outbreak was one of the largest that have occurred; it started in Guinea and spread to Nigeria, Liberia and Sierra Leone. Phylogenetic analysis of the current virus species indicated that this outbreak is the result of a divergent lineage of the Zaire ebolavirus. The L protein of Ebola virus (EBOV) is the catalytic subunit of the RNA‑dependent RNA polymerase complex, which, with VP35, is key for the replication and transcription of viral RNA. Earlier sequence analysis demonstrated that the L protein of all non‑segmented negative‑sense (NNS) RNA viruses consists of six domains containing conserved functional motifs. The aim of the present study was to analyze the presence of these motifs in 2014 EBOV isolates, highlight their function and how they may contribute to the overall pathogenicity of the isolates. For this purpose, 81 2014 EBOV L protein sequences were aligned with 475 other NNS RNA viruses, including Paramyxoviridae and Rhabdoviridae viruses. Phylogenetic analysis of all EBOV outbreak L protein sequences was also performed. Analysis of the amino acid substitutions in the 2014 EBOV outbreak was conducted using sequence analysis. The alignment demonstrated the presence of previously conserved motifs in the 2014 EBOV isolates and novel residues. Notably, all the mutations identified in the 2014 EBOV isolates were tolerant; they were pathogenic, with certain examples occurring within previously determined functional conserved motifs, possibly altering viral pathogenicity, replication and virulence. The phylogenetic analysis demonstrated that all sequences with the exception of the 2014 EBOV sequences were clustered together. The 2014 EBOV outbreak has acquired a great number of mutations, which may explain the reasons behind this unprecedented outbreak. Certain residues critical to the function of the polymerase remain conserved and may be targets for the development of antiviral therapeutic agents. PMID:27082438

  18. Exceptional motifs in different Markov chain models for a statistical analysis of DNA sequences.

    PubMed

    Schbath, S; Prum, B; de Turckheim, E

    1995-01-01

    Identifying exceptional motifs is often used for extracting information from long DNA sequences. The two difficulties of the method are the choice of the model that defines the expected frequencies of words and the approximation of the variance of the difference T(W) between the number of occurrences of a word W and its estimation. We consider here different Markov chain models, either with stationary or periodic transition probabilities. We estimate the variance of the difference T(W) by the conditional variance of the number of occurrences of W given the oligonucleotides counts that define the model. Two applications show how to use asymptotically standard normal statistics associated with the counts to describe a given sequence in terms of its outlying words. Sequences of Escherichia coli and of Bacillus subtilis are compared with respect to their exceptional tri- and tetranucleotides. For both bacteria, exceptional 3-words are mainly found in the coding frame. E. coli palindrome counts are analyzed in different models, showing that many overabundant words are one-letter mutations of avoided palindromes. PMID:8521272
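
    The quantity T(W) described above can be illustrated for the simplest stationary case. The sketch below is an assumption-laden illustration and not the authors' full method: it uses the standard plug-in estimate of the expected count of a word W = w1...wk under a Markov model of order k-2, namely N(w1...w(k-1)) × N(w2...wk) / N(w2...w(k-1)), and it does not compute the conditional variance needed to turn T(W) into a standard normal statistic. The function names are hypothetical.

```python
def count(seq, word):
    """Count occurrences of word in seq, allowing overlaps."""
    k = len(word)
    return sum(seq[i:i + k] == word for i in range(len(seq) - k + 1))


def markov_difference(seq, word):
    """Return (observed, expected, T) for a word of length >= 3.

    expected is the plug-in estimate under a stationary Markov model of
    order len(word) - 2; T = observed - expected. No variance is computed.
    """
    if len(word) < 3:
        raise ValueError("word must have at least 3 letters")
    observed = count(seq, word)
    prefix = count(seq, word[:-1])   # N(w1 ... w(k-1))
    suffix = count(seq, word[1:])    # N(w2 ... wk)
    core = count(seq, word[1:-1])    # N(w2 ... w(k-1))
    expected = prefix * suffix / core if core else 0.0
    return observed, expected, observed - expected


# Toy sequence: AAA is seen 2 times, while 3 * 3 / 4 = 2.25 are expected.
result = markov_difference("AAAA", "AAA")
print(result)
```

    A negative T marks a word seen less often than the model predicts; judging whether it is exceptional requires dividing by the estimated standard deviation, which this sketch leaves out.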

  19. The nature of actinomycin D binding to d(AACCAXYG) sequence motifs

    PubMed Central

    Chen, Fu-Ming; Sha, Feng; Chin, Ko-Hsin; Chou, Shan-Ho

    2004-01-01

    Earlier studies by others had indicated that actinomycin D (ACTD) binds well to d(AACCATAG) and the end sequence TAG-3′ is essential for its strong binding. In an effort to verify these assertions and to uncover other possible strong ACTD binding sequences as well as to elucidate the nature of their binding, systematic studies have been carried out with oligomers of d(AACCAXYG) sequence motifs, where X and Y can be any DNA base. The results indicate that in addition to TAG-3′, oligomers ending with XAG-3′ and XCG-3′ all provide binding constants ≥1 × 10⁷ M⁻¹ and even sequences ending with XTG-3′ and XGG-3′ exhibit binding affinities in the range 1–8 × 10⁶ M⁻¹. The nature of the strong ACTD affinity of the sequences d(A1A2C3C4A5X6Y7G8) was delineated via comparative binding studies of d(AACCAAAG), d(AGCCAAAG) and their base substituted derivatives. Two binding modes are proposed to coexist, with the major component consisting of the 3′-terminus G base folding back to base pair with C4 and the ACTD inserting at A2C3C4 by looping out the C3 while both faces of the chromophore are stacked by A and G bases, respectively. The minor mode is for the G to base pair with C3 and to have the same A/chromophore/G stacking but without a looped out base. These assertions are supported by induced circular dichroic and fluorescence spectral measurements. PMID:14715925

  20. Identification of Internal Transcribed Spacer Sequence Motifs in Truffles: a First Step toward Their DNA Bar Coding

    PubMed Central

    El Karkouri, Khalid; Murat, Claude; Zampieri, Elisa; Bonfante, Paola

    2007-01-01

    This work presents DNA sequence motifs from the internal transcribed spacer (ITS) of the nuclear rRNA repeat unit which are useful for the identification of five European and Asiatic truffles (Tuber magnatum, T. melanosporum, T. indicum, T. aestivum, and T. mesentericum). Truffles are edible mycorrhizal ascomycetes that show similar morphological characteristics but that have distinct organoleptic and economic values. A total of 36 out of 46 ITS1 or ITS2 sequence motifs have allowed an accurate in silico distinction of the five truffles to be made (i.e., by pattern matching and/or BLAST analysis on downloaded GenBank sequences and directly against GenBank databases). The motifs considered the intraspecific genetic variability of each species, including rare haplotypes, and assigned their respective species from either the ascocarps or ectomycorrhizas. The data indicate that short ITS1 or ITS2 motifs (≤50 bp in size) can be considered promising tools for truffle species identification. A dot blot hybridization analysis of T. magnatum and T. melanosporum compared with other close relatives or distant lineages allowed at least one highly specific motif to be identified for each species. These results were confirmed in a blind test which included new field isolates. The current work has provided a reliable new tool for a truffle oligonucleotide bar code and identification in ecological and evolutionary studies. PMID:17601808

  1. Identification of internal transcribed spacer sequence motifs in truffles: a first step toward their DNA bar coding.

    PubMed

    El Karkouri, Khalid; Murat, Claude; Zampieri, Elisa; Bonfante, Paola

    2007-08-01

    This work presents DNA sequence motifs from the internal transcribed spacer (ITS) of the nuclear rRNA repeat unit which are useful for the identification of five European and Asiatic truffles (Tuber magnatum, T. melanosporum, T. indicum, T. aestivum, and T. mesentericum). Truffles are edible mycorrhizal ascomycetes that show similar morphological characteristics but that have distinct organoleptic and economic values. A total of 36 out of 46 ITS1 or ITS2 sequence motifs have allowed an accurate in silico distinction of the five truffles to be made (i.e., by pattern matching and/or BLAST analysis on downloaded GenBank sequences and directly against GenBank databases). The motifs considered the intraspecific genetic variability of each species, including rare haplotypes, and assigned their respective species from either the ascocarps or ectomycorrhizas. The data indicate that short ITS1 or ITS2 motifs (≤50 bp in size) can be considered promising tools for truffle species identification. A dot blot hybridization analysis of T. magnatum and T. melanosporum compared with other close relatives or distant lineages allowed at least one highly specific motif to be identified for each species. These results were confirmed in a blind test which included new field isolates. The current work has provided a reliable new tool for a truffle oligonucleotide bar code and identification in ecological and evolutionary studies. PMID:17601808

  2. miRNA-mediated deadenylation is orchestrated by GW182 through two conserved motifs that interact with CCR4-NOT.

    PubMed

    Fabian, Marc R; Cieplak, Maja K; Frank, Filipp; Morita, Masahiro; Green, Jonathan; Srikumar, Tharan; Nagar, Bhushan; Yamamoto, Tadashi; Raught, Brian; Duchaine, Thomas F; Sonenberg, Nahum

    2011-11-01

    miRNAs recruit the miRNA-induced silencing complex (miRISC), which includes Argonaute and GW182 as core proteins. GW182 proteins effect translational repression and deadenylation of target mRNAs. However, the molecular mechanisms of GW182-mediated repression remain obscure. We show here that human GW182 independently interacts with the PAN2-PAN3 and CCR4-NOT deadenylase complexes. Interaction of GW182 with CCR4-NOT is mediated by two newly discovered phylogenetically conserved motifs. Although either motif is sufficient to bind CCR4-NOT, only one of them can promote processive deadenylation of target mRNAs. Thus, GW182 serves as both a platform that recruits deadenylases and as a deadenylase coactivator that facilitates the removal of the poly(A) tail by CCR4-NOT. PMID:21984185

  3. A conserved motif in transmembrane helix 1 of diphtheria toxin mediates catalytic domain delivery to the cytosol

    PubMed Central

    Ratts, Ryan; Trujillo, Carolina; Bharti, Ajit; vanderSpek, Johanna; Harrison, Robert; Murphy, John R.

    2005-01-01

    A 10-aa motif in transmembrane helix 1 of diphtheria toxin that is conserved in anthrax edema factor, anthrax lethal factor, and botulinum neurotoxin serotypes A, C, and D was identified by BLAST, Clustal W, and MEME computational analysis. Using the diphtheria toxin-related fusion protein toxin DAB389IL-2, we demonstrate that introduction of the L221E mutation into a highly conserved residue within this motif results in a nontoxic catalytic domain translocation deficient phenotype. To further probe the function of this motif in the process by which the catalytic domain is delivered from the lumen of early endosomes to the cytosol, we constructed a gene encoding a portion of diphtheria toxin transmembrane helix 1, T1, which carries the motif and is expressed from a CMV promoter. We then isolated stable transfectants of Hut102/6TG cells that express the T1 peptide, Hut102/6TG-T1. In contrast to the parental cell line, Hut102/6TG-T1 cells are ca. 10⁴-fold more resistant to the fusion protein toxin. This resistance is completely reversed by coexpression of small interfering RNA directed against the gene encoding the T1 peptide in Hut102/6TG-T1 cells. We further demonstrate by GST-DT140-271 pull-down experiments in the presence and absence of synthetic T1 peptides the specific binding of coatomer protein complex subunit β to this region of the diphtheria toxin transmembrane domain. PMID:16230620

  4. Protospacer Adjacent Motif (PAM)-Distal Sequences Engage CRISPR Cas9 DNA Target Cleavage

    PubMed Central

    Ethier, Sylvain; Schmeing, T. Martin; Dostie, Josée; Pelletier, Jerry

    2014-01-01

    The clustered regularly interspaced short palindromic repeat (CRISPR)-associated enzyme Cas9 is an RNA-guided nuclease that has been widely adapted for genome editing in eukaryotic cells. However, the in vivo target specificity of Cas9 is poorly understood and most studies rely on in silico predictions to define the potential off-target editing spectrum. Using chromatin immunoprecipitation followed by sequencing (ChIP-seq), we delineate the genome-wide binding panorama of catalytically inactive Cas9 directed by two different single guide (sg) RNAs targeting the Trp53 locus. Cas9:sgRNA complexes are able to load onto multiple sites with short seed regions adjacent to 5′NGG3′ protospacer adjacent motifs (PAM). Yet among 43 ChIP-seq sites harboring seed regions analyzed for mutational status, we find editing only at the intended on-target locus and one off-target site. In vitro analysis of target site recognition revealed that interactions between the 5′ end of the guide and PAM-distal target sequences are necessary to efficiently engage Cas9 nucleolytic activity, providing an explanation for why off-target editing is significantly lower than expected from ChIP-seq data. PMID:25275497
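
    Candidate sites next to a 5′-NGG-3′ PAM can be located with a simple scan. The sketch below is only an illustration under stated assumptions: it looks at the forward strand only, takes a 20-nt protospacer length as a default parameter, and uses a made-up toy sequence. It does not model seed matching, binding, or cleavage.

```python
import re

def find_pam_sites(dna, protospacer_len=20):
    """Yield (protospacer, pam, protospacer_start) for each forward-strand NGG PAM
    that has a full-length protospacer immediately upstream (5') of it."""
    dna = dna.upper()
    # A lookahead reports overlapping NGG occurrences (e.g. inside "GGGG") as well.
    for m in re.finditer(r"(?=([ACGT]GG))", dna):
        pam_start = m.start()
        if pam_start >= protospacer_len:
            start = pam_start - protospacer_len
            yield dna[start:pam_start], m.group(1), start

# Toy sequence (hypothetical): a 20-nt stretch followed by an AGG PAM.
toy = "ACGTACGTACGTACGTACGTAGGTT"
sites = list(find_pam_sites(toy))
print(sites)
```

    Scanning the reverse strand would require running the same function on the reverse complement of the input.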

  5. Amplification of human papillomavirus DNA sequences by using conserved primers.

    PubMed Central

    Gregoire, L; Arella, M; Campione-Piccardo, J; Lancaster, W D

    1989-01-01

    The polymerase chain reaction has potential for use in the detection of small amounts of human papillomavirus (HPV) viral nucleic acids present in clinical specimens. However, new HPV types for which no probes exist would remain undetected by using type-specific primers for the polymerase chain reaction before hybridization. Primers corresponding to highly conserved HPV sequences may be useful for detecting low amounts of known HPV DNA as well as new HPV types. Here we analyze a pair of primers derived from conserved sequences within the E1 open reading frame for HPV sequence amplification by using the polymerase chain reaction. The longest perfect homology among HPV sequences is a 12-mer within the first exon of E1M. A region of conserved amino acids coded by the E1 open reading frame allowed the detection of another highly conserved region about 850 base pairs downstream. Two 21-mers derived from these conserved regions were used to amplify sequences from all HPV DNAs used as templates. The amplified DNA was shown to be specific for HPV sequences within the E1 open reading frame. DNA from HPVs whose sequences were not available were amplified by using these two primers. HPV DNA sequences in clinical specimens could also be amplified with the primers. Images PMID:2556429
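
    The idea of a "longest perfect homology" shared by a set of sequences can be sketched as a longest-common-substring search. The code below is a brute-force illustration on made-up toy sequences, not HPV data and not the method used in the study. It is adequate for short inputs but scales poorly, so a suffix-based approach would be preferable for genome-length sequences.

```python
def longest_shared_kmer(seqs):
    """Return the longest substring present in every sequence (first found on ties)."""
    if not seqs:
        return ""
    ref = min(seqs, key=len)  # a shared substring cannot be longer than the shortest sequence
    for k in range(len(ref), 0, -1):          # try the longest candidates first
        for i in range(len(ref) - k + 1):
            candidate = ref[i:i + k]
            if all(candidate in s for s in seqs):
                return candidate
    return ""

# Hypothetical toy sequences sharing one 10-nt stretch.
toy_seqs = [
    "TTGACCGTAGGCATCA",
    "CAGACCGTAGGCTTGT",
    "GGGACCGTAGGCAAAC",
]
shared = longest_shared_kmer(toy_seqs)
print(shared, len(shared))
```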

  6. Human immunodeficiency virus type 1 and 2 envelope glycoproteins oligomerize through conserved sequences.

    PubMed Central

    Center, R J; Kemp, B E; Poumbourios, P

    1997-01-01

    Hetero-oligomerization between human immunodeficiency virus type 2 (HIV-2) envelope glycoprotein (Env) truncation mutants and epitope-tagged gp160 is dependent on the presence of gp41 transmembrane protein (TM) amino acids 552 to 589, a putative amphipathic alpha-helical sequence. HIV-2 Env truncation mutants containing this sequence were also able to form cross-type hetero-oligomers with HIV-1 Env. HIV-2/HIV-1 hetero-oligomerization was, however, more sensitive to disruption by mutagenesis or increased temperature. The conservation of the Env oligomerization function of the HIV-1 and HIV-2 alpha-helical sequences suggests that retroviral TM alpha-helical motifs may have a universal role in oligomerization. PMID:9188654

  7. Identification of G and P genotype-specific motifs in the predicted VP7 and VP4 amino acid sequences.

    PubMed

    Ma, Yongping

    2015-12-01

    Equine rotavirus (ERV) strain L338 (G13P[18]) has a unique G and P genotype. However, the evolutionary relationship of L338 with other ERVs is still unknown. Here whole genome analysis of the L338 ERV strain was independently performed. Its genotype constellations were determined as G13-P[18]-I6-R9-C9-M6-A6-N9-T12-E14-H11, confirming previous genotype assignments. The L338 strain only shared the P[18] and I6 genotypes with other ERVs. The nucleotide sequences of the other 9 RNA segments were different from those of cogent genes of all other group A rotavirus (RVA) strains including ERVs and formed unique phylogenetic lineages. The L338 evolutionary footprints were tentatively identified in both VP7 and VP4 amino acid sequences: two regions were found in VP7 and twelve in VP4. The conserved regions shared between L338 and other group A rotavirus strains (RVAs) indicated that L338 was more closely related genomically to animal and human RVAs other than ERVs, suggesting that L338 may not be an endogenous equine RV but have emerged as an interspecies reassortant with other RVA strains. Furthermore, genotype-specific motifs of all 27 G and 37 P types were identified in regions 7-1a (aa 91-100) of VP7 and regions 8-1 (aa146-151) and 8-3 (aa113-118 and 125-135) of VP4 (VP8*). PMID:26321159

  8. Flow Cytometry-assisted Cloning of Specific Sequence Motifs from Complex 16S ribosomal RNA Gene Libraries.

    SciTech Connect

    Nielsen, J.L.; Schramm, A.; Bernhard, A.E.; van den Engh, G.J.; Stahl, D.A.

    2004-07-21

    A flow cytometry method was developed for rapid screening and recovery of cloned DNA containing common sequence motifs. This approach, termed fluorescence-activated cell sorting-assisted cloning, was used to recover sequences affiliated with a unique lineage within the Bacteroidetes not abundant in a clone library of environmental 16S rRNA genes. Retrieval and sequence analysis of phylogenetically informative genes has become a standard cultivation-independent technique to investigate microbial diversity in nature (7, 18). Genes encoding the 16S rRNA, because of the relative ease of their selective amplification, have been most frequently employed for general diversity surveys (16). Environmental studies have also focused on specific subpopulations affiliated with a phylogenetic group or identified by genes encoding specific metabolic functions (e.g., ammonia oxidation, sulfate respiration, and nitrate reduction) (8,15,20). However, specific populations may be of low abundance (1,23), or the genes encoding specific metabolic functions may be insufficiently conserved to provide priming sites for general PCR amplification. Three general approaches have been used to obtain 16S rRNA sequence information from low-abundance populations: screening hundreds to thousands of clones in a general 16S rRNA gene library (21), flow cytometric sorting of a subpopulation of environmentally derived cells labeled by fluorescent in situ hybridization (FISH) (27), or selective PCR amplification using primers specific for the subpopulation (2,23). While the first approach is simply time-consuming and tedious, the second has been restricted to fairly large and strongly fluorescent cells from aquatic samples (5, 27). The third approach often generates fragments of only a few hundred bases due to the limited number of specific priming sites. Partial sequence information often degrades analysis, obscuring or distorting the phylogenetic placement of the new sequences (11, 20). A more robust characterization of environ

  9. Sequence, structure, and cooperativity in folding of elementary protein structural motifs

    PubMed Central

    Lai, Jason K.; Kubelka, Ginka S.; Kubelka, Jan

    2015-01-01

    Residue-level unfolding of two helix-turn-helix proteins—one naturally occurring and one de novo designed—is reconstructed from multiple sets of site-specific 13C isotopically edited infrared (IR) and circular dichroism (CD) data using Ising-like statistical-mechanical models. Several model variants are parameterized to test the importance of sequence-specific interactions (approximated by Miyazawa–Jernigan statistical potentials), local structural flexibility (derived from the ensemble of NMR structures), interhelical hydrogen bonds, and native contacts separated by intervening disordered regions (through the Wako–Saitô–Muñoz–Eaton scheme, which disallows such configurations). The models are optimized by directly simulating experimental observables: CD ellipticity at 222 nm for model proteins and their fragments and 13C-amide I′ bands for multiple isotopologues of each protein. We find that data can be quantitatively reproduced by the model that allows two interacting segments flanking a disordered loop (double sequence approximation) and incorporates flexibility in the native contact maps, but neither sequence-specific interactions nor hydrogen bonds are required. The near-identical free energy profiles as a function of the global order parameter are consistent with expected similar folding kinetics for nearly identical structures. However, the predicted folding mechanism for the two motifs is different, reflecting the order of local stability. We introduce free energy profiles for “experimental” reaction coordinates—namely, the degree of local folding as sensed by site-specific 13C-edited IR, which highlight folding heterogeneity and contrast its overall, average description with the detailed, local picture. PMID:26216963

  10. Sequence, structure, and cooperativity in folding of elementary protein structural motifs.

    PubMed

    Lai, Jason K; Kubelka, Ginka S; Kubelka, Jan

    2015-08-11

    Residue-level unfolding of two helix-turn-helix proteins--one naturally occurring and one de novo designed--is reconstructed from multiple sets of site-specific (13)C isotopically edited infrared (IR) and circular dichroism (CD) data using Ising-like statistical-mechanical models. Several model variants are parameterized to test the importance of sequence-specific interactions (approximated by Miyazawa-Jernigan statistical potentials), local structural flexibility (derived from the ensemble of NMR structures), interhelical hydrogen bonds, and native contacts separated by intervening disordered regions (through the Wako-Saitô-Muñoz-Eaton scheme, which disallows such configurations). The models are optimized by directly simulating experimental observables: CD ellipticity at 222 nm for model proteins and their fragments and (13)C-amide I' bands for multiple isotopologues of each protein. We find that data can be quantitatively reproduced by the model that allows two interacting segments flanking a disordered loop (double sequence approximation) and incorporates flexibility in the native contact maps, but neither sequence-specific interactions nor hydrogen bonds are required. The near-identical free energy profiles as a function of the global order parameter are consistent with expected similar folding kinetics for nearly identical structures. However, the predicted folding mechanism for the two motifs is different, reflecting the order of local stability. We introduce free energy profiles for "experimental" reaction coordinates--namely, the degree of local folding as sensed by site-specific (13)C-edited IR, which highlight folding heterogeneity and contrast its overall, average description with the detailed, local picture. PMID:26216963

  11. A conserved motif in JNK/p38-specific MAPK phosphatases as a determinant for JNK1 recognition and inactivation

    PubMed Central

    Liu, Xin; Zhang, Chen-Song; Lu, Chang; Lin, Sheng-Cai; Wu, Jia-Wei; Wang, Zhi-Xin

    2016-01-01

    Mitogen-activated protein kinases (MAPKs), important in a large array of signalling pathways, are tightly controlled by a cascade of protein kinases and by MAPK phosphatases (MKPs). MAPK signalling efficiency and specificity is modulated by protein–protein interactions between individual MAPKs and the docking motifs in cognate binding partners. Two types of docking interactions have been identified: D-motif-mediated interaction and FXF-docking interaction. Here we report the crystal structure of JNK1 bound to the catalytic domain of MKP7 at 2.4-Å resolution, providing high-resolution structural insight into the FXF-docking interaction. The 285FNFL288 segment in MKP7 directly binds to a hydrophobic site on JNK1 that is near the MAPK insertion and helix αG. Biochemical studies further reveal that this highly conserved structural motif is present in all members of the MKP family, and the interaction mode is universal and critical for the MKP-MAPK recognition and biological function. PMID:26988444

  12. Comparative Sequence and Structure Analysis Reveals the Conservation and Diversity of Nucleotide Positions and Their Associated Tertiary Interactions in the Riboswitches

    PubMed Central

    Appasamy, Sri D.; Ramlan, Effirul Ikhwan; Firdaus-Raih, Mohd

    2013-01-01

    The tertiary motifs in complex RNA molecules play vital roles, either stabilizing the formation of the RNA 3D structure or providing important biological functionality to the molecule. In order to better understand the roles of these tertiary motifs in riboswitches, we examined 11 representative riboswitch PDB structures for potential agreement of both motif occurrences and conservations. A total of 61 unique tertiary interactions were found in the reference structures. In addition to the expected common A-minor motifs and base-triples, mainly involved in linking distant regions of the riboswitch structures, three highly conserved variants of A-minor interactions called G-minors were found in the SAM-I and FMN riboswitches, where they appear to be involved in the recognition of the respective ligand’s functional groups. From our structural survey as well as corresponding structure and sequence alignments, the agreement between motif occurrences and conservations is very prominent across the representative riboswitches. Our analysis provides evidence that some of these tertiary interactions are essential components in forming the structure, with their sequence positions conserved despite a high degree of diversity in other parts of the respective riboswitch sequences. This is indicative of a vital role for these tertiary interactions in determining the specific biological function of the riboswitch. PMID:24040136

  13. AptaTRACE Elucidates RNA Sequence-Structure Motifs from Selection Trends in HT-SELEX Experiments.

    PubMed

    Dao, Phuong; Hoinka, Jan; Takahashi, Mayumi; Zhou, Jiehua; Ho, Michelle; Wang, Yijie; Costa, Fabrizio; Rossi, John J; Backofen, Rolf; Burnett, John; Przytycka, Teresa M

    2016-07-01

    Aptamers, short RNA or DNA molecules that bind distinct targets with high affinity and specificity, can be identified using high-throughput systematic evolution of ligands by exponential enrichment (HT-SELEX), but scalable analytic tools for understanding sequence-function relationships from diverse HT-SELEX data are not available. Here we present AptaTRACE, a computational approach that leverages the experimental design of the HT-SELEX protocol, RNA secondary structure, and the potential presence of many secondary motifs to identify sequence-structure motifs that show a signature of selection. We apply AptaTRACE to identify nine motifs in C-C chemokine receptor type 7 targeted by aptamers in an in vitro cell-SELEX experiment. We experimentally validate two aptamers whose binding required both sequence and structural features. AptaTRACE can identify low-abundance motifs, and we show through simulations that, because of this, it could lower HT-SELEX cost and time by reducing the number of selection cycles required. PMID:27467247

  14. Finishing and Special Motifs: Lessons Learned from CRISPR Analysis Using Next-Generation Draft Sequences (7th Annual SFAF Meeting, 2012)

    ScienceCinema

    Campbell, Catherine [Noblis

    2013-03-22

    Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  15. Finishing and Special Motifs: Lessons Learned from CRISPR Analysis Using Next-Generation Draft Sequences (7th Annual SFAF Meeting, 2012)

    SciTech Connect

    Campbell, Catherine

    2012-06-01

    Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  16. Structure of the Brd4 ET domain bound to a C-terminal motif from γ-retroviral integrases reveals a conserved mechanism of interaction

    PubMed Central

    Crowe, Brandon L.; Larue, Ross C.; Yuan, Chunhua; Hess, Sonja; Kvaratskhelia, Mamuka; Foster, Mark P.

    2016-01-01

    The bromodomain and extraterminal domain (BET) protein family are promising therapeutic targets for a range of diseases linked to transcriptional activation, cancer, viral latency, and viral integration. Tandem bromodomains selectively tether BET proteins to chromatin by engaging cognate acetylated histone marks, and the extraterminal (ET) domain is the focal point for recruiting a range of cellular and viral proteins. BET proteins guide γ-retroviral integration to transcription start sites and enhancers through bimodal interaction with chromatin and the γ-retroviral integrase (IN). We report the NMR-derived solution structure of the Brd4 ET domain bound to a conserved peptide sequence from the C terminus of murine leukemia virus (MLV) IN. The complex reveals a protein–protein interaction governed by the binding-coupled folding of disordered regions in both interacting partners to form a well-structured intermolecular three-stranded β sheet. In addition, we show that a peptide comprising the ET binding motif (EBM) of MLV IN can disrupt the cognate interaction of Brd4 with NSD3, and that substitutions of Brd4 ET residues essential for binding MLV IN also impair interaction of Brd4 with a number of cellular partners involved in transcriptional regulation and chromatin remodeling. This suggests that γ-retroviruses have evolved the EBM to mimic a cognate interaction motif to achieve effective integration in host chromatin. Collectively, our findings identify key structural features of the ET domain of Brd4 that allow for interactions with both cellular and viral proteins. PMID:26858406

  17. Explorations of linked editosome domains leading to the discovery of motifs defining conserved pockets in editosome OB-folds

    PubMed Central

    Park, Young-Jun; Hol, Wim G. J.

    2012-01-01

    Trypanosomatids form a group of protozoa which contain parasites of humans, animals and plants. Several of these species cause major human diseases, including Trypanosoma brucei which is the causative agent of human African trypanosomiasis, also called sleeping sickness. These organisms have many highly unusual features including a unique U-insertion/deletion RNA editing process in the single mitochondrion. A key multi-protein complex, called the ~20S editosome, or editosome, carries out a cascade of essential RNA-modifying reactions and contains a core of 12 different proteins of which six are the interaction proteins A1 to A6. Each of these interaction proteins comprises a C-terminal OB-fold and the smallest interaction protein A6 has been shown to interact with four other editosome OB-folds. Here we report the results of a “linked OB-fold” approach to obtain a view of how multiple OB-folds might interact in the core of the editosome. Constructs of multiple variants of linked domains in 25 expression and co-expression experiments resulted in 13 soluble multi-OB-fold complexes. In several instances, these complexes were more homogeneous in size than those obtained from corresponding unlinked OB-folds. The crystal structure of A3OB linked to A6 could be elucidated and confirmed the tight interaction between these two OB domains as seen also in our recent complex of A3OB and A6 with nanobodies. In the current crystal structure of A3OB linked to A6, hydrophobic side chains reside in well-defined pockets of neighboring OB-fold domains. When analyzing the available crystal structures of editosome OB-folds, it appears that in five instances “Pocket 1” of A1OB, A3OB and A6 is occupied by a hydrophobic side chain from a neighboring protein. In these three different OB-folds, Pocket 1 is formed by two conserved sequence motifs and an invariant arginine. These pockets might play a key role in the assembly or mechanism of the editosome by interacting with hydrophobic

  18. Endocytosis and Trafficking of Natriuretic Peptide Receptor-A: Potential Role of Short Sequence Motifs

    PubMed Central

    Pandey, Kailash N.

    2015-01-01

    The targeted endocytosis and redistribution of transmembrane receptors among membrane-bound subcellular organelles are vital for their correct signaling and physiological functions. Membrane receptors committed for internalization and trafficking pathways are sorted into coated vesicles. Cardiac hormones, atrial and brain natriuretic peptides (ANP and BNP) bind to guanylyl cyclase/natriuretic peptide receptor-A (GC-A/NPRA) and elicit the generation of intracellular second messenger cyclic guanosine 3',5'-monophosphate (cGMP), which lowers blood pressure and incidence of heart failure. After ligand binding, the receptor is rapidly internalized, sequestrated, and redistributed into intracellular locations. Thus, NPRA is considered a dynamic cellular macromolecule that traverses different subcellular locations through its lifetime. The utilization of pharmacologic and molecular perturbants has helped in delineating the pathways of endocytosis, trafficking, down-regulation, and degradation of membrane receptors in intact cells. This review describes the investigation of the mechanisms of internalization, trafficking, and redistribution of NPRA compared with other cell surface receptors from the plasma membrane into the cell interior. The roles of different short-signal peptide sequence motifs in the internalization and trafficking of other membrane receptors have been briefly reviewed and their potential significance in the internalization and trafficking of NPRA is discussed. PMID:26151885

  19. The Molecular Switching Mechanism at the Conserved D(E)RY Motif in Class-A GPCRs.

    PubMed

    Sandoval, Angelica; Eichler, Stefanie; Madathil, Sineej; Reeves, Philip J; Fahmy, Karim; Böckmann, Rainer A

    2016-07-12

    The disruption of ionic and H-bond interactions between the cytosolic ends of transmembrane helices TM3 and TM6 of class-A (rhodopsin-like) G protein-coupled receptors (GPCRs) is a hallmark for their activation by chemical or physical stimuli. In the bovine photoreceptor rhodopsin, this is accompanied by proton uptake at Glu(134) in the class-conserved D(E)RY motif. Studies on TM3 model peptides proposed a crucial role of the lipid bilayer in linking protonation to stabilization of an active state-like conformation. However, the molecular details of this linkage could not be resolved and have been addressed in this study by molecular dynamics (MD) simulations on TM3 model peptides in a bilayer of 1,2-dioleoyl-sn-glycero-3-phosphocholine (DOPC). We show that protonation of the conserved glutamic acid alters the peptide insertion depth in the membrane, its side-chain rotamer preferences, and stabilizes the C-terminal helical structure. These factors contribute to the rise of the side-chain pKa (> 6) and to reduced polarity around the TM3 C terminus as confirmed by fluorescence spectroscopy. Helix stabilization requires the protonated carboxyl group; unexpectedly, this stabilization could not be evoked with an amide in MD simulations. Additionally, time-resolved Fourier transform infrared (FTIR) spectroscopy of TM3 model peptides revealed a different kinetics for lipid ester carbonyl hydration, suggesting that the carboxyl is linked to more extended H-bond clusters than an amide. Remarkably, this was seen as well in DOPC-reconstituted Glu(134)- and Gln(134)-containing bovine opsin mutants and demonstrates that the D(E)RY motif is a hydrated microdomain. The function of the D(E)RY motif as a proton switch is suggested to be based on the reorganization of the H-bond network at the membrane interface. PMID:27410736
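
    The notation D(E)RY denotes a three-residue motif with Asp or Glu at the first position, followed by Arg and Tyr. As a minimal sketch, assuming one-letter amino acid codes, it can be written as the regular expression `[DE]RY` and located in a protein sequence as shown below. The peptide strings are hypothetical toy fragments, and the function name is an illustrative assumption.

```python
import re

# Asp or Glu, then Arg, then Tyr, in one-letter amino acid codes.
DERY = re.compile(r"[DE]RY")

def find_dery(protein):
    """Return (1-based position, matched tripeptide) for each D(E)RY occurrence."""
    return [(m.start() + 1, m.group()) for m in DERY.finditer(protein.upper())]

# Hypothetical toy fragments, not taken from a database entry.
print(find_dery("LAIERYVVVC"))
print(find_dery("GGDRYAA"))
print(find_dery("GGNRYAA"))
```

    Positions are reported 1-based to match the residue-numbering convention used in the abstract (e.g. Glu134).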

  20. Direct contacts between conserved motifs of different subunits provide major contribution to active site organization in human and mycobacterial dUTPases

    PubMed Central

    Takács, Enikő; Nagy, Gergely; Leveles, Ibolya; Harmat, Veronika; Lopata, Anna; Tóth, Judit; Vértessy, Beáta G.

    2010-01-01

    dUTPases are essential for genome integrity. Recent results allowed characterization of the role of conserved residues. Here we analyzed the Asp/Asn mutation within conserved Motif I of human and mycobacterial dUTPases, wherein the Asp residue was previously implicated in Mg2+-coordination. Our results on transient/steady-state kinetics, ligand-binding and a 1.80 Å-resolution structure of the mutant mycobacterial enzyme, in comparison with wild type and C-terminally truncated structures, argue that this residue has a major role in providing intra- and intersubunit contacts, but is not essential for Mg2+ accommodation. We conclude that in addition to the role of conserved motifs in substrate accommodation, direct subunit interaction between protein atoms of active site residues from different conserved motifs are crucial for enzyme function. PMID:20493855

  1. Direct contacts between conserved motifs of different subunits provide major contribution to active site organization in human and mycobacterial dUTPases.

    PubMed

    Takács, Enikő; Nagy, Gergely; Leveles, Ibolya; Harmat, Veronika; Lopata, Anna; Tóth, Judit; Vértessy, Beáta G

    2010-07-16

    dUTP pyrophosphatases (dUTPases) are essential for genome integrity. Recent results allowed characterization of the role of conserved residues. Here we analyzed the Asp/Asn mutation within conserved Motif I of human and mycobacterial dUTPases, wherein the Asp residue was previously implicated in Mg(2+)-coordination. Our results on transient/steady-state kinetics, ligand binding and a 1.80 Å resolution structure of the mutant mycobacterial enzyme, in comparison with wild type and C-terminally truncated structures, argue that this residue has a major role in providing intra- and intersubunit contacts, but is not essential for Mg(2+) accommodation. We conclude that in addition to the role of conserved motifs in substrate accommodation, direct subunit interaction between protein atoms of active site residues from different conserved motifs are crucial for enzyme function. PMID:20493855

  2. DNA recognition for virus assembly through multiple sequence-independent interactions with a helix-turn-helix motif.

    PubMed

    Greive, Sandra J; Fung, Herman K H; Chechik, Maria; Jenkins, Huw T; Weitzel, Stephen E; Aguiar, Pedro M; Brentnall, Andrew S; Glousieau, Matthieu; Gladyshev, Grigory V; Potts, Jennifer R; Antson, Alfred A

    2016-01-29

    The helix-turn-helix (HTH) motif features frequently in protein DNA-binding assemblies. Viral pac site-targeting small terminase proteins possess an unusual architecture in which the HTH motifs are displayed in a ring, distinct from the classical HTH dimer. Here we investigate how such a circular array of HTH motifs enables specific recognition of the viral genome for initiation of DNA packaging during virus assembly. We found, by surface plasmon resonance and analytical ultracentrifugation, that individual HTH motifs of the Bacillus phage SF6 small terminase bind the packaging regions of SF6 and related SPP1 genome weakly, with little local sequence specificity. Nuclear magnetic resonance chemical shift perturbation studies with an arbitrary single-site substrate suggest that the HTH motif contacts DNA similarly to how certain HTH proteins contact DNA non-specifically. Our observations support a model where specificity is generated through conformational selection of an intrinsically bent DNA segment by a ring of HTHs which bind weakly but cooperatively. Such a system would enable viral gene regulation and control of the viral life cycle, with a minimal genome, conferring a major evolutionary advantage for SPP1-like viruses. PMID:26673721

  3. DNA recognition for virus assembly through multiple sequence-independent interactions with a helix-turn-helix motif

    PubMed Central

    Greive, Sandra J.; Fung, Herman K.H.; Chechik, Maria; Jenkins, Huw T.; Weitzel, Stephen E.; Aguiar, Pedro M.; Brentnall, Andrew S.; Glousieau, Matthieu; Gladyshev, Grigory V.; Potts, Jennifer R.; Antson, Alfred A.

    2016-01-01

    The helix-turn-helix (HTH) motif features frequently in protein DNA-binding assemblies. Viral pac site-targeting small terminase proteins possess an unusual architecture in which the HTH motifs are displayed in a ring, distinct from the classical HTH dimer. Here we investigate how such a circular array of HTH motifs enables specific recognition of the viral genome for initiation of DNA packaging during virus assembly. We found, by surface plasmon resonance and analytical ultracentrifugation, that individual HTH motifs of the Bacillus phage SF6 small terminase bind the packaging regions of SF6 and related SPP1 genome weakly, with little local sequence specificity. Nuclear magnetic resonance chemical shift perturbation studies with an arbitrary single-site substrate suggest that the HTH motif contacts DNA similarly to how certain HTH proteins contact DNA non-specifically. Our observations support a model where specificity is generated through conformational selection of an intrinsically bent DNA segment by a ring of HTHs which bind weakly but cooperatively. Such a system would enable viral gene regulation and control of the viral life cycle, with a minimal genome, conferring a major evolutionary advantage for SPP1-like viruses. PMID:26673721

  4. Defining RNA motif-aminoglycoside interactions via two-dimensional combinatorial screening and structure-activity relationships through sequencing.

    PubMed

    Velagapudi, Sai Pradeep; Disney, Matthew D

    2013-10-15

    RNA is an extremely important target for the development of chemical probes of function or small molecule therapeutics. Aminoglycosides are the best-studied class of small molecules to target RNA. However, the RNA motifs outside of the bacterial rRNA A-site that are likely to be bound by these compounds in biological systems are largely unknown. If such information were known, it could allow for aminoglycosides to be exploited to target other RNAs and, in addition, could provide invaluable insights into potential bystander targets of these clinically used drugs. We utilized two-dimensional combinatorial screening (2DCS), a library-versus-library screening approach, to select the motifs displayed in a 3×3 nucleotide internal loop library and in a 6-nucleotide hairpin library that bind with high affinity and selectivity to six aminoglycoside derivatives. The selected RNA motifs were then analyzed using structure-activity relationships through sequencing (StARTS), a statistical approach that defines the privileged RNA motif space that binds a small molecule. StARTS allowed for the facile annotation of the selected RNA motif-aminoglycoside interactions in terms of affinity and selectivity. The interactions selected by 2DCS generally have nanomolar affinities, which is higher affinity than the binding of aminoglycosides to a mimic of their therapeutic target, the bacterial rRNA A-site. PMID:23719281

  5. Identification of a Novel Sequence Motif Recognized by the Ankyrin Repeat Domain of zDHHC17/13 S-Acyltransferases.

    PubMed

    Lemonidis, Kimon; Sanchez-Perez, Maria C; Chamberlain, Luke H

    2015-09-01

    S-Acylation is a major post-translational modification affecting several cellular processes. It is particularly important for neuronal functions. This modification is catalyzed by a family of transmembrane S-acyltransferases that contain a conserved zinc finger DHHC (zDHHC) domain. Typically, eukaryote genomes encode 7-24 distinct zDHHC enzymes, with two members also harboring an ankyrin repeat (AR) domain at their cytosolic N termini. The AR domain of zDHHC enzymes is predicted to engage in numerous interactions and facilitates both substrate recruitment and S-acylation-independent functions; however, the sequence/structural features recognized by this module remain unknown. The two mammalian AR-containing S-acyltransferases are the Golgi-localized zDHHC17 and zDHHC13, also known as Huntingtin-interacting proteins 14 and 14-like, respectively; they are highly expressed in brain, and their loss in mice leads to neuropathological deficits that are reminiscent of Huntington's disease. Here, we report that zDHHC17 and zDHHC13 recognize, via their AR domain, evolutionarily conserved and closely related sequences of a [VIAP][VIT]XXQP consensus in SNAP25, SNAP23, cysteine string protein, Huntingtin, cytoplasmic linker protein 3, and microtubule-associated protein 6. This novel AR-binding sequence motif is found in regions predicted to be unstructured and is present in a number of zDHHC17 substrates and zDHHC17/13-interacting S-acylated proteins. This is the first study to identify a motif recognized by AR-containing zDHHCs. PMID:26198635
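
    A consensus written as [VIAP][VIT]XXQP maps naturally onto a regular expression. The following is a minimal sketch, assuming that X stands for any residue; the function name and the example peptide are hypothetical and are not taken from the record above.

```python
import re

# Consensus from the abstract: [VIAP][VIT]XXQP.
# Assumption: X means "any residue", so it is written as "." here.
# The lookahead lets overlapping matches be reported as well.
AR_BINDING_MOTIF = re.compile(r"(?=([VIAP][VIT]..QP))")


def find_ar_binding_motifs(protein_seq):
    """Return (0-based start, matched hexapeptide) for every consensus hit."""
    seq = protein_seq.upper()
    return [(m.start(), m.group(1)) for m in AR_BINDING_MOTIF.finditer(seq)]


# Made-up peptide for illustration only; not a real protein sequence.
example_peptide = "MKTGVVASQPLLEAITRRQPGG"
for start, hexapeptide in find_ar_binding_motifs(example_peptide):
    print(start, hexapeptide)
```

    A pattern match of this kind only flags candidate sites; whether a hit lies in an unstructured region, as the abstract describes, would need a separate check.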

  6. Fine Scale Analysis of Crossover and Non-Crossover and Detection of Recombination Sequence Motifs in the Honeybee (Apis mellifera)

    PubMed Central

    Bessoltane, Nadia; Toffano-Nioche, Claire; Solignac, Michel; Mougel, Florence

    2012-01-01

    Background Meiotic exchanges are non-uniformly distributed across the genome of most studied organisms. This uneven distribution suggests that recombination is initiated by specific signals and/or regulations. Some of these signals were recently identified in humans and mice. However, it is unclear whether or not sequence signals are also involved in chromosomal recombination of insects. Methodology We analyzed recombination frequencies in the honeybee, in which genome sequencing provided a large number of SNPs spread over the entire set of chromosomes. As the genome sequences were obtained from a pool of haploid males, which were the progeny of a single queen, an oocyte method (study of recombination on haploid males that develop from unfertilized eggs and hence are a direct reflection of female gamete haplotypes) was developed to detect recombined pairs of SNP sites. Sequences were further compared between recombinant and non-recombinant fragments to detect recombination-specific motifs. Conclusions Recombination events between adjacent SNP sites were detected at an average distance of 92 bp and revealed the existence of high rates of recombination events. This study also shows the presence of conversion without crossover (i.e. non-crossover) events, which greatly outnumber crossover events. Furthermore, the comparison of sequences that have undergone recombination with sequences that have not led to the discovery of sequence motifs (CGCA, GCCGC, CCGCA), which may correspond to recombination signals. PMID:22567142
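
    The comparison of recombinant and non-recombinant fragments can be illustrated by counting motif occurrences per base in each set. This is only a sketch of the general idea: the abstract does not state the statistic the authors used, so the density measure, the function names, and the toy fragments below are all assumptions.

```python
MOTIFS = ("CGCA", "GCCGC", "CCGCA")  # motifs reported in the abstract


def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, allowing overlapping matches."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def motif_density(fragments, motif):
    """Occurrences of motif per base, pooled over a list of fragments."""
    total_len = sum(len(f) for f in fragments)
    if total_len == 0:
        return 0.0
    hits = sum(count_overlapping(f.upper(), motif) for f in fragments)
    return hits / total_len


# Toy fragments invented for illustration; these are not honeybee data.
recombinant = ["ATCCGCATTGCCGCA", "GGCGCAAT"]
non_recombinant = ["ATATATTTAACGGTA", "GGTTAACC"]

for motif in MOTIFS:
    print(motif,
          motif_density(recombinant, motif),
          motif_density(non_recombinant, motif))
```

    A real analysis would also need a significance test and a correction for base composition, neither of which is attempted here.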

  7. A Conserved alpha-helical motif mediates the binding of diverse nuclear proteins to the SRC1 interaction domain of CBP.

    PubMed

    Matsuda, Sachiko; Harries, Janet C; Viskaduraki, Maria; Troke, Philip J F; Kindle, Karin B; Ryan, Colm; Heery, David M

    2004-04-01

    CREB-binding protein (CBP) and p300 contain modular domains that mediate protein-protein interactions with a wide variety of nuclear factors. A C-terminal domain of CBP (referred to as the SID) is responsible for interaction with the alpha-helical AD1 domain of p160 coactivators such as the steroid receptor coactivator (SRC1), and also other transcriptional regulators such as E1A, Ets-2, IRF3, and p53. Here we show that the pointed (PNT) domain of Ets-2 mediates its interaction with the CBP SID, and describe the effects of mutations in the SID on binding of Ets-2, E1A, and SRC1. In vitro binding studies indicate that SRC1, Ets-2 and E1A display mutually exclusive binding to the CBP SID. Consistent with this, we observed negative cross-talk between ERalpha/SRC1, Ets-2, and E1A proteins in reporter assays in transiently transfected cells. Transcriptional inhibition of Ets-2 or GAL4-AD1 activity by E1A was rescued by co-transfection with a CBP expression plasmid, consistent with the hypothesis that the observed inhibition was due to competition for CBP in vivo. Sequence comparisons revealed that SID-binding proteins contain a leucine-rich motif similar to the alpha-helix Aalpha1 of the SRC1 AD1 domain. Deletion mutants of E1A and Ets-2 lacking the conserved motif were unable to bind the CBP SID. Moreover, a peptide corresponding to this sequence competed the binding of full-length SRC1, Ets-2, and E1A proteins to the CBP SID. Thus, a leucine-rich amphipathic alpha-helix mediates mutually exclusive interactions of functionally diverse nuclear proteins with CBP. PMID:14722092

  8. The 2.2 Å resolution crystal structure of Bacillus cereus Nif3-family protein YqfO reveals a conserved dimetal-binding motif and a regulatory domain

    PubMed Central

    Godsey, Michael H.; Minasov, George; Shuvalova, Ludmilla; Brunzelle, Joseph S.; Vorontsov, Ivan I.; Collart, Frank R.; Anderson, Wayne F.

    2007-01-01

    YqfO of Bacillus cereus is a member of the widespread Nif3 family of proteins, which has been highlighted as an important target for structural genomics. The N- and C-terminal domains are conserved across the family and contain a dimetal-binding motif in a putative active site. YqfO contains an insert in the middle of the protein, present in a minority of bacterial family members. The structure of YqfO was determined at a resolution of 2.2 Å and reveals conservation of the putative active site. It also reveals the previously unknown structure of the insert, which despite extremely limited sequence conservation, bears great similarity to PII, CutA, and a number of other trimeric regulatory proteins. Our results suggest that this domain acts as a signal sensor to regulate the still-unknown catalytic activity of the more-conserved domains. PMID:17586767

  9. Creation of Hybrid Nanorods From Sequences of Natural Trimeric Fibrous Proteins Using the Fibritin Trimerization Motif

    NASA Astrophysics Data System (ADS)

    Papanikolopoulou, Katerina; van Raaij, Mark J.; Mitraki, Anna

    Stable, artificial fibrous proteins that can be functionalized open new avenues in fields such as bionanomaterials design and fiber engineering. An important source of inspiration for the creation of such proteins are natural fibrous proteins such as collagen, elastin, insect silks, and fibers from phages and viruses. The fibrous parts of this last class of proteins usually adopt trimeric, β-stranded structural folds and are appended to globular, receptor-binding domains. It has been recently shown that the globular domains are essential for correct folding and trimerization and can be successfully substituted by a very small (27-amino acid) trimerization motif from phage T4 fibritin. The hybrid proteins are correctly folded nanorods that can withstand extreme conditions. When the fibrous part derives from the adenovirus fiber shaft, different tissue-targeting specificities can be engineered into the hybrid proteins, which therefore can be used as gene therapy vectors. The integration of such stable nanorods in devices is also a big challenge in the field of biomechanical design. The fibritin foldon domain is a versatile trimerization motif and can be combined with a variety of fibrous motifs, such as coiled-coil, collagenous, and triple β-stranded motifs, provided the appropriate linkers are used. The combination of different motifs within the same fibrous molecule to create stable rods with multiple functions can even be envisioned. We provide a comprehensive overview of the experimental procedures used for designing, creating, and characterizing hybrid fibrous nanorods using the fibritin trimerization motif.

  10. Sequence and Spatiotemporal Expression Analysis of CLE-Motif Containing Genes from the Reniform Nematode (Rotylenchulus reniformis Linford & Oliveira)

    PubMed Central

    Wubben, Martin J.; Gavilano, Lily; Baum, Thomas J.; Davis, Eric L.

    2015-01-01

    The reniform nematode, Rotylenchulus reniformis, is a sedentary semi-endoparasitic species with a host range that encompasses more than 77 plant families. Nematode effector proteins containing plant-ligand motifs similar to CLAVATA3/ESR (CLE) peptides have been identified in the Heterodera, Globodera, and Meloidogyne genera of sedentary endoparasites. Here, we describe the isolation, sequence analysis, and spatiotemporal expression of three R. reniformis genes encoding putative CLE motifs named Rr-cle-1, Rr-cle-2, and Rr-cle-3. The Rr-cle cDNAs showed >98% identity with each other and the predicted peptides were identical with the exception of a short stretch of residues at the carboxy(C)-terminus of the variable domain (VD). Each RrCLE peptide possessed an amino-terminal signal peptide for secretion and a single C-terminal CLE motif that was most similar to Heterodera CLE motifs. Aligning the Rr-cle cDNAs with their corresponding genomic sequences showed three exons with an intron separating the signal peptide from the VD and a second intron separating the VD from the CLE motif. An alignment of the RrCLE1 peptide with Heterodera glycines and Heterodera schachtii CLE proteins revealed a high level of homology within the VD region associated with regulating in planta trafficking of the processed CLE peptide. Quantitative RT-PCR (qRT-PCR) showed similar expression profiles for each Rr-cle transcript across the R. reniformis life-cycle with the greatest transcript abundance being in sedentary parasitic female nematodes. In situ hybridization showed specific Rr-cle expression within the dorsal esophageal gland cell of sedentary parasitic females. PMID:26170479

  11. Sequence and Spatiotemporal Expression Analysis of CLE-Motif Containing Genes from the Reniform Nematode (Rotylenchulus reniformis Linford & Oliveira).

    PubMed

    Wubben, Martin J; Gavilano, Lily; Baum, Thomas J; Davis, Eric L

    2015-06-01

    The reniform nematode, Rotylenchulus reniformis, is a sedentary semi-endoparasitic species with a host range that encompasses more than 77 plant families. Nematode effector proteins containing plant-ligand motifs similar to CLAVATA3/ESR (CLE) peptides have been identified in the Heterodera, Globodera, and Meloidogyne genera of sedentary endoparasites. Here, we describe the isolation, sequence analysis, and spatiotemporal expression of three R. reniformis genes encoding putative CLE motifs named Rr-cle-1, Rr-cle-2, and Rr-cle-3. The Rr-cle cDNAs showed >98% identity with each other and the predicted peptides were identical with the exception of a short stretch of residues at the carboxy(C)-terminus of the variable domain (VD). Each RrCLE peptide possessed an amino-terminal signal peptide for secretion and a single C-terminal CLE motif that was most similar to Heterodera CLE motifs. Aligning the Rr-cle cDNAs with their corresponding genomic sequences showed three exons with an intron separating the signal peptide from the VD and a second intron separating the VD from the CLE motif. An alignment of the RrCLE1 peptide with Heterodera glycines and Heterodera schachtii CLE proteins revealed a high level of homology within the VD region associated with regulating in planta trafficking of the processed CLE peptide. Quantitative RT-PCR (qRT-PCR) showed similar expression profiles for each Rr-cle transcript across the R. reniformis life-cycle with the greatest transcript abundance being in sedentary parasitic female nematodes. In situ hybridization showed specific Rr-cle expression within the dorsal esophageal gland cell of sedentary parasitic females. PMID:26170479

  12. Sequence conservation of an avian centromeric repeated DNA component.

    PubMed

    Madsen, C S; Brooks, J E; de Kloet, E; de Kloet, S R

    1994-06-01

    The approximately 190-bp centromeric repeat monomers of the spur-winged lapwing (Vanellus spinosus, Charadriidae), the Chilean flamingo (Phoenicopterus chilensis, Phoenicopteridae), the sarus crane (Grus antigone, Gruidae), parrots (Psittacidae), waterfowl (Anatidae), and the merlin (Falco columbarius, Falconidae) contain elements that are interspecifically highly variable, as well as elements (trinucleotides and higher order oligonucleotides) that are highly conserved in sequence and relative location within the repeat. Such conservation suggests that the centromeric repeats of these avian species have evolved from a common ancestral sequence that may date from very early stages of avian radiation. PMID:8034177

  13. Specific Prenylation of Tomato Rab Proteins by Geranylgeranyl Type-II Transferase Requires a Conserved Cysteine-Cysteine Motif.

    PubMed Central

    Yalovsky, S.; Loraine, A. E.; Gruissem, W.

    1996-01-01

    Posttranslational isoprenylation of some small GTP-binding proteins is required for their biological activity. Rab geranylgeranyl transferase (Rab GGTase) uses geranylgeranyl pyrophosphate to modify Rab proteins, its only known substrates. Geranylgeranylation of Rabs is believed to promote their association with target membranes and interaction with other proteins. Plants, like other eukaryotes, contain Rab-like proteins that are associated with intracellular membranes. However, to our knowledge, the geranylgeranylation of Rab proteins has not yet been characterized from any plant source. This report presents an activity assay that allows the characterization of prenylation of Rab-like proteins in vitro, by protein extracts prepared from plants. Tomato Rab1 proteins and mammalian Rab1a were modified by geranylgeranyl pyrophosphate but not by farnesyl pyrophosphate. This modification required a conserved cysteine-cysteine motif. A mutant form lacking the cysteine-cysteine motif could not be modified, but inhibited the geranylgeranylation of its wild-type homolog. The tomato Rab proteins were modified in vitro by protein extract prepared from yeast, but failed to become modified when the protein extract was prepared from a yeast strain containing a mutant allele for the [alpha] subunit of yeast Rab GGTase (bet4 ts). These results demonstrate that plant cells, like other eukaryotes, contain Rab GGTase-like activity. PMID:12226265

  14. Conserved sequence pattern in a wide variety of phosphoesterases.

    PubMed Central

    Koonin, E. V.

    1994-01-01

    A unique sequence pattern, designated the GD/GNH signature, was shown to be conserved in a wide variety of phosphoesterases. The enzymes containing this signature cleave phosphoester bonds in such different substrates as (1) phosphoserine and phosphothreonine in polypeptides; (2) bis(5'-nucleosidyl)-tetraphosphates; (3) nucleoside 5' phosphates; (4) 2',3'-cyclic nucleotide phosphates; (5) polynucleotides; (6) 2'-5' phosphodiesters in RNA (intron) lariats; (7) sphingomyelin; and (8) various phosphomonoesters. Two conserved acidic amino acid residues and a conserved histidine residue may be directly involved in phosphoester bond cleavage. PMID:8003970

  15. Conserved Sequence Preferences Contribute to Substrate Recognition by the Proteasome

    PubMed Central

    Yu, Houqing; Singh Gautam, Amit K.; Wilmington, Shameika R.; Wylie, Dennis; Martinez-Fonts, Kirby; Kago, Grace; Warburton, Marie; Chavali, Sreenivas; Inobe, Tomonao; Finkelstein, Ilya J.; Babu, M. Madan

    2016-01-01

    The proteasome has pronounced preferences for the amino acid sequence of its substrates at the site where it initiates degradation. Here, we report that modulating these sequences can tune the steady-state abundance of proteins over 2 orders of magnitude in cells. This is the same dynamic range as seen for inducing ubiquitination through a classic N-end rule degron. The stability and abundance of His3 constructs dictated by the initiation site affect survival of yeast cells and show that variation in proteasomal initiation can affect fitness. The proteasome's sequence preferences are linked directly to the affinity of the initiation sites to their receptor on the proteasome and are conserved between Saccharomyces cerevisiae, Schizosaccharomyces pombe, and human cells. These findings establish that the sequence composition of unstructured initiation sites influences protein abundance in vivo in an evolutionarily conserved manner and can affect phenotype and fitness. PMID:27226608

  16. The Annotation of RNA Motifs

    PubMed Central

    2002-01-01

    The recent deluge of new RNA structures, including complete atomic-resolution views of both subunits of the ribosome, has on the one hand literally overwhelmed our individual abilities to comprehend the diversity of RNA structure, and on the other hand presented us with new opportunities for comprehensive use of RNA sequences for comparative genetic, evolutionary and phylogenetic studies. Two concepts are key to understanding RNA structure: hierarchical organization of global structure and isostericity of local interactions. Global structure changes extremely slowly, as it relies on conserved long-range tertiary interactions. Tertiary RNA–RNA and quaternary RNA–protein interactions are mediated by RNA motifs, defined as recurrent and ordered arrays of non-Watson–Crick base-pairs. A single RNA motif comprises a family of sequences, all of which can fold into the same three-dimensional structure and can mediate the same interaction(s). The chemistry and geometry of base pairing constrain the evolution of motifs in such a way that random mutations that occur within motifs are accepted or rejected insofar as they can mediate a similar ordered array of interactions. The steps involved in the analysis and annotation of RNA motifs in 3D structures are: (a) decomposition of each motif into non-Watson–Crick base-pairs; (b) geometric classification of each basepair; (c) identification of isosteric substitutions for each basepair by comparison to isostericity matrices; (d) alignment of homologous sequences using the isostericity matrices to identify corresponding positions in the crystal structure; (e) acceptance or rejection of the null hypothesis that the motif is conserved. PMID:18629252

  17. Application of PCR amplicon sequencing using a single primer pair in PCR amplification to assess variations in Helicobacter pylori CagA EPIYA tyrosine phosphorylation motifs

    PubMed Central

    2010-01-01

    Background The presence of various EPIYA tyrosine phosphorylation motifs in the CagA protein of Helicobacter pylori has been suggested to contribute to pathogenesis in adults. In this study, a unique PCR assay and sequencing strategy was developed to establish the number and variation of cagA EPIYA motifs. Findings MDA-DNA derived from gastric biopsy specimens from eleven subjects with gastritis was used with M13- and T7-sequence-tagged primers for amplification of the cagA EPIYA motif region. Automated capillary electrophoresis using a high resolution kit and amplicon sequencing confirmed variations in the cagA EPIYA motif region. In nine cases, sequencing revealed the presence of an AB, ABC, or ABCC (Western type) cagA EPIYA motif. In two cases, double cagA EPIYA motifs were detected (ABC/ABCC or ABC/AB), indicating the presence of two H. pylori strains in the same biopsy. Conclusion Automated capillary electrophoresis and amplicon sequencing using a single, M13- and T7-sequence-tagged primer pair in PCR amplification enabled rapid molecular typing of cagA EPIYA motifs. Moreover, the techniques described allowed for rapid detection of mixed H. pylori strains present in the same biopsy specimen. PMID:20181142

  18. Modeling and analysis of MH1 domain of Smads and their interaction with promoter DNA sequence motif.

    PubMed

    Makkar, Pooja; Metpally, Raghu Prasad R; Sangadala, Sreedhara; Reddy, Boojala Vijay B

    2009-04-01

    The Smads are a group of related intracellular proteins critical for transmitting the signals to the nucleus from the transforming growth factor-beta (TGF-beta) superfamily of proteins at the cell surface. The prototypic members of the Smad family, Mad and Sma, were first described in Drosophila and Caenorhabditis elegans, respectively. Related proteins in Xenopus, humans, mice and rats were subsequently identified, and are now known as Smads. Smad protein family members act downstream in the TGF-beta signaling pathway mediating various biological processes, including cell growth, differentiation, matrix production, apoptosis and development. Smads range from about 400-500 amino acids in length and are grouped into the receptor-regulated Smads (R-Smads), the common Smads (Co-Smads) and the inhibitory Smads (I-Smads). There are eight Smads in mammals: Smad1/5/8 (bone morphogenetic protein regulated) and Smad2/3 (TGF-beta/activin regulated) are termed R-Smads, Smad4 is denoted as Co-Smad and Smad6/7 are inhibitory Smads. A typical Smad consists of a conserved N-terminal Mad Homology 1 (MH1) domain and a C-terminal Mad Homology 2 (MH2) domain connected by a proline rich linker. The MH1 domain plays a key role in DNA recognition and also facilitates the binding of Smad4 to the phosphorylated C-terminus of R-Smads to form an activated complex. The MH2 domain exhibits transcriptional activation properties. In order to understand the structural basis of interaction of various Smads with their target proteins and the promoter DNA, we modeled the MH1 domain of the remaining mammalian Smads based on known crystal structures of Smad3-MH1 domain bound to GTCT Smad box DNA sequence (1OZJ). We generated a B-DNA structure using average base-pair parameters of Twist, Tilt, Roll and base Slide angles. We then modeled the interaction pose of the MH1 domain of Smad1/5/8 to their corresponding DNA sequence motif GCCG. These models provide the structural basis towards understanding functional

  19. Space-related pharma-motifs for fast search of protein binding motifs and polypharmacological targets

    PubMed Central

    2012-01-01

    Background Discovering a compound that inhibits multiple proteins (i.e. polypharmacological targets) is a new paradigm for treating complex diseases (e.g. cancers and diabetes). In general, polypharmacological proteins often share similar local binding environments and motifs. With the exponential growth in the number of protein structures, finding similar structural binding motifs (pharma-motifs) is an urgent task for drug discovery (e.g. side effects and new uses for old drugs) and for understanding protein functions. Results We have developed a Space-Related Pharmamotifs (called SRPmotif) method to recognize binding motifs by searching against a protein structure database. SRPmotif is able to recognize conserved binding environments containing spatially discontinuous pharma-motifs, which are often short conserved peptides with specific physico-chemical properties for protein functions. Among 356 pharma-motifs, 56.5% of interacting residues are highly conserved. Experimental results indicate that 81.1% and 92.7% of the polypharmacological targets of each protein-ligand complex are annotated with the same biological process (BP) and molecular function (MF) terms, respectively, based on Gene Ontology (GO). Our experimental results show that the identified pharma-motifs often consist of key residues in functional (active) sites and play key roles in protein functions. The SRPmotif is available at http://gemdock.life.nctu.edu.tw/SRP/. Conclusions SRPmotif is able to identify similar pharma-interfaces and pharma-motifs sharing similar binding environments for polypharmacological targets by rapidly searching against the protein structure database. Pharma-motifs describe the conservation of binding environments for drug discovery and protein functions. Additionally, these pharma-motifs provide clues for discovering new sequence-based motifs to predict protein functions from protein sequence databases. We believe that SRPmotif is useful for elucidating protein functions and drug discovery.

  20. A conserved intronic U1 snRNP-binding sequence promotes trans-splicing in Drosophila

    PubMed Central

    Gao, Jun-Li; Fan, Yu-Jie; Wang, Xiu-Ye; Zhang, Yu; Pu, Jia; Li, Liang; Shao, Wei; Zhan, Shuai; Hao, Jianjiang

    2015-01-01

    Unlike typical cis-splicing, trans-splicing joins exons from two separate transcripts to produce chimeric mRNA and has been detected in most eukaryotes. Trans-splicing in trypanosomes and nematodes has been characterized as a spliced leader RNA-facilitated reaction; in contrast, its mechanism in higher eukaryotes remains unclear. Here we investigate mod(mdg4), a classic trans-spliced gene in Drosophila, and report that two critical RNA sequences in the middle of the last 5′ intron, TSA and TSB, promote trans-splicing of mod(mdg4). In TSA, a 13-nucleotide (nt) core motif is conserved across Drosophila species and is essential and sufficient for trans-splicing, which binds U1 small nuclear RNP (snRNP) through strong base-pairing with U1 snRNA. In TSB, a conserved secondary structure acts as an enhancer. Deletions of TSA and TSB using the CRISPR/Cas9 system result in developmental defects in flies. Although it is not clear how the 5′ intron finds the 3′ introns, compensatory changes in U1 snRNA rescue trans-splicing of TSA mutants, demonstrating that U1 recruitment is critical to promote trans-splicing in vivo. Furthermore, TSA core-like motifs are found in many other trans-spliced Drosophila genes, including lola. These findings represent a novel mechanism of trans-splicing, in which RNA motifs in the 5′ intron are sufficient to bring separate transcripts into close proximity to promote trans-splicing. PMID:25838544

  1. Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas

    PubMed Central

    Petrov, Anton I.; Zirbel, Craig L.; Leontis, Neocles B.

    2013-01-01

    The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson–Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules, but can also occur at nonhomologous sites in diverse RNAs, and which often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access. PMID:23970545

  2. Gibbs motif sampling: detection of bacterial outer membrane protein repeats.

    PubMed Central

    Neuwald, A. F.; Liu, J. S.; Lawrence, C. E.

    1995-01-01

    The detection and alignment of locally conserved regions (motifs) in multiple sequences can provide insight into protein structure, function, and evolution. A new Gibbs sampling algorithm is described that detects motif-encoding regions in sequences and optimally partitions them into distinct motif models; this is illustrated using a set of immunoglobulin fold proteins. When applied to sequences sharing a single motif, the sampler can be used to classify motif regions into related submodels, as is illustrated using helix-turn-helix DNA-binding proteins. Other statistically based procedures are described for searching a database for sequences matching motifs found by the sampler. When applied to a set of 32 very distantly related bacterial integral outer membrane proteins, the sampler revealed that they share a subtle, repetitive motif. Although BLAST (Altschul SF et al., 1990, J Mol Biol 215:403-410) fails to detect significant pairwise similarity between any of the sequences, the repeats present in these outer membrane proteins, taken as a whole, are highly significant (based on a generally applicable statistical test for motifs described here). Analysis of bacterial porins with known trimeric beta-barrel structure and related proteins reveals a similar repetitive motif corresponding to alternating membrane-spanning beta-strands. These beta-strands occur on the membrane interface (as opposed to the trimeric interface) of the beta-barrel. The broad conservation and structural location of these repeats suggests that they play important functional roles. PMID:8520488
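
    The core resampling step of Gibbs motif sampling can be sketched in a few lines. The code below is a simplified site sampler that assumes exactly one motif site of fixed width per sequence; it is not the algorithm of the record above, which also partitions sites into distinct motif models. The function name, parameters, DNA alphabet, and toy sequences are assumptions made for illustration.

```python
import random


def gibbs_site_sampler(seqs, width, iters=200, pseudo=1.0, alphabet="ACGT", seed=0):
    """Simplified Gibbs site sampler: one site of fixed width per sequence.

    Assumes every sequence is at least `width` long and uses only
    characters from `alphabet`. Returns one 0-based start per sequence.
    """
    rng = random.Random(seed)
    # Start from random site positions.
    pos = [rng.randrange(len(s) - width + 1) for s in seqs]
    # Background frequencies over all sequences, smoothed with pseudocounts.
    total = sum(len(s) for s in seqs)
    bg = {a: (sum(s.count(a) for s in seqs) + pseudo)
             / (total + pseudo * len(alphabet)) for a in alphabet}

    for _ in range(iters):
        for i, s in enumerate(seqs):
            # Position-specific counts built from every site except sequence i.
            counts = [{a: pseudo for a in alphabet} for _ in range(width)]
            for j, t in enumerate(seqs):
                if j != i:
                    for k in range(width):
                        counts[k][t[pos[j] + k]] += 1
            denom = len(seqs) - 1 + pseudo * len(alphabet)
            # Weight each candidate start by its motif/background likelihood ratio.
            weights = []
            for start in range(len(s) - width + 1):
                w = 1.0
                for k in range(width):
                    a = s[start + k]
                    w *= (counts[k][a] / denom) / bg[a]
                weights.append(w)
            # Draw the new site for sequence i in proportion to the weights.
            pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos


# Toy sequences invented for illustration; each contains the string TGACGTCA.
toy_seqs = ["ACGTTGACGTCAGG", "TTGACGTCAACGTA", "GGCATGACGTCATT"]
print(gibbs_site_sampler(toy_seqs, width=8))
```

    Because the sampler is stochastic, the positions it returns depend on the seed and the number of iterations; practical implementations typically run several restarts and keep the alignment with the best score.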

  3. Amino Acids of Conserved Kinase Motifs of Cytomegalovirus Protein UL97 Are Essential for Autophosphorylation

    PubMed Central

    Michel, Detlef; Kramer, Silke; Höhn, Simone; Schaarschmidt, Peter; Wunderlich, Kirsten; Mertens, Thomas

    1999-01-01

    Thirteen point mutations targeting predicted domains conserved in homologous protein kinases were introduced into the UL97 coding region of the human cytomegalovirus. All mutagenized proteins were expressed in cells infected with recombinant vaccinia viruses (rVV). Several mutations drastically reduced ganciclovir (GCV) phosphorylation. Mutations at amino acids G340, A442, L446, and F523 resulted in a complete loss of pUL97 phosphorylation, which was strictly associated with a loss of GCV phosphorylation. Our results confirm that in rVV-infected cells pUL97 phosphorylation is due to autophosphorylation and show that several amino acids conserved within domains of protein kinases are essential for this pUL97 phosphorylation. GCV phosphorylation is dependent on pUL97 phosphorylation. PMID:10482650

  4. Using the Gibbs Motif Sampler for Phylogenetic Footprinting

    SciTech Connect

    Thompson, William; Conlan, Sean; McCue, Lee Ann; Lawrence, Charles

    2007-07-01

    The Gibbs Motif Sampler (Gibbs) (1) is a software package used to predict conserved elements in biopolymer sequences. While the software can be used to locate conserved motifs in protein sequences, its most common use is the prediction of transcription factor binding sites (TFBSs) in promoters upstream of gene sequences. We will describe approaches that use Gibbs to locate TFBSs in a collection of orthologous nucleotide sequences, i.e. phylogenetic footprinting. To illustrate this technique, we present examples that use Gibbs to detect binding sites for the transcription factor LexA in orthologous sequence data from representative species belonging to two different proteobacterial divisions.

  5. Bayesian Markov models consistently outperform PWMs at predicting motifs in nucleotide sequences.

    PubMed

    Siebert, Matthias; Söding, Johannes

    2016-07-27

    Position weight matrices (PWMs) are the standard model for DNA and RNA regulatory motifs. In PWMs nucleotide probabilities are independent of nucleotides at other positions. Models that account for dependencies need many parameters and are prone to overfitting. We have developed a Bayesian approach for motif discovery using Markov models in which conditional probabilities of order k - 1 act as priors for those of order k. This Bayesian Markov model (BaMM) training automatically adapts model complexity to the amount of available data. We also derive an EM algorithm for de-novo discovery of enriched motifs. For transcription factor binding, BaMMs achieve significantly (P = 1/16) higher cross-validated partial AUC than PWMs in 97% of 446 ChIP-seq ENCODE datasets and improve performance by 36% on average. BaMMs also learn complex multipartite motifs, improving predictions of transcription start sites, polyadenylation sites, bacterial pause sites, and RNA binding sites by 26-101%. BaMMs never performed worse than PWMs. These robust improvements argue in favour of generally replacing PWMs by BaMMs. PMID:27288444
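    To make the independence assumption of PWMs concrete, here is a minimal sketch of building a log-odds PWM from aligned sites and scanning a sequence with it. It illustrates the baseline model only, not the BaMM method. The function names, the pseudocount of 1, and the uniform background are assumptions made for the example.

```python
import math


def build_pwm(sites, alphabet="ACGT", pseudo=1.0):
    """Log2-odds matrix: one dict per motif column, each column independent."""
    background = 1.0 / len(alphabet)  # assumption: uniform background
    pwm = []
    for j in range(len(sites[0])):
        counts = {a: pseudo for a in alphabet}
        for site in sites:
            counts[site[j]] += 1
        total = sum(counts.values())
        pwm.append({a: math.log2(counts[a] / total / background) for a in alphabet})
    return pwm


def score(pwm, window):
    """Sum of per-position scores; no term couples two positions."""
    return sum(col[base] for col, base in zip(pwm, window))


def best_hit(pwm, seq):
    """Start index of the highest-scoring window in `seq`."""
    width = len(pwm)
    return max(range(len(seq) - width + 1),
               key=lambda p: score(pwm, seq[p:p + width]))
```

    Because `score` is a plain sum over columns, the model cannot represent a preference that depends on a neighbouring nucleotide. Higher-order Markov models address that limitation.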

  6. Defining a Conformational Consensus Motif in Cotransin-Sensitive Signal Sequences: A Proteomic and Site-Directed Mutagenesis Study

    PubMed Central

    Klein, Wolfgang; Westendorf, Carolin; Schmidt, Antje; Conill-Cortés, Mercè; Rutz, Claudia; Blohs, Marcus; Beyermann, Michael; Protze, Jonas; Krause, Gerd; Krause, Eberhard; Schülein, Ralf

    2015-01-01

    The cyclodepsipeptide cotransin was described to inhibit the biosynthesis of a small subset of proteins by a signal sequence-discriminatory mechanism at the Sec61 protein-conducting channel. However, it was not clear how selective cotransin is, i.e. how many proteins are sensitive. Moreover, a consensus motif in signal sequences mediating cotransin sensitivity has not yet been described. To address these questions, we performed a proteomic study using cotransin-treated human hepatocellular carcinoma cells and the stable isotope labelling by amino acids in cell culture technique in combination with quantitative mass spectrometry. We used a saturating concentration of cotransin (30 micromolar) to also identify less-sensitive proteins and to discriminate the latter from completely resistant proteins. We found that the biosynthesis of almost all secreted proteins was cotransin-sensitive under these conditions. In contrast, biosynthesis of the majority of the integral membrane proteins was cotransin-resistant. Cotransin sensitivity of signal sequences was neither related to their length nor to their hydrophobicity. Instead, in the case of signal anchor sequences, we identified for the first time a conformational consensus motif mediating cotransin sensitivity. PMID:25806945

  7. Variability Of The Conserved V3 Loop Tip Motif In HIV-1 Subtype B Isolates Collected From Brazilian And French Patients

    PubMed Central

    Tomasini-Grotto, Rejane-Maria; Montes, Brigitte; Triglia, Denise; Torres-Braconi, Carla; Aliano-Block, Juliana; de A. Zanotto, Paolo M.; de M. C. Pardini, Maria-Inès; Segondy, Michel

    2010-01-01

    The diversity of the V3 loop tip motif sequences of HIV-1 subtype B was analyzed in patients from Botucatu (Brazil) and Montpellier (France). Overall, 37 tetrameric tip motifs were identified, 28 and 17 of them being recognized in Brazilian and French patients, respectively. The GPGR (P) motif was predominant in French but not in Brazilian patients (53.5% vs 31.0%), whereas the GWGR (W) motif was frequent in Brazilian patients (23.0%) and absent in French patients. Three tip motif groups were considered: P, W, and non-P non-W groups. The distribution of HIV-1 isolates into the three groups was significantly different between isolates from Botucatu and from Montpellier (P < 0.001). A higher proportion of CXCR4-using HIV-1 (X4 variants) was observed in the non-P non-W group as compared with the P group (37.5% vs 19.1%), and no X4 variant was identified in the W group (P < 0.001). The higher proportion of X4 variants in the non-P non-W group was essentially observed among the patients from Montpellier, who have been infected with HIV-1 for a longer period of time than those from Botucatu. Among patients from Montpellier, CD4+ cell counts were lower in patients belonging to the non-P non-W group than in those belonging to the P group (24 cells/μL vs 197 cells/μL; P = 0.005). Taken together, the results suggest that variability of the V3 loop tip motif may be related to HIV-1 coreceptor usage and to disease progression. However, as analyzed by a bioinformatic method, the substitution of the V3 loop tip motif of the subtype B consensus sequence with the different tip motifs identified in the present study was not sufficient to induce a change in HIV-1 coreceptor usage. PMID:24031549

  8. Sequence-Specific Recognition of DNA by Proteins: Binding Motifs Discovered Using a Novel Statistical/Computational Analysis.

    PubMed

    Jakubec, David; Laskowski, Roman A; Vondrasek, Jiri

    2016-01-01

    Decades of intensive experimental studies of the recognition of DNA sequences by proteins have provided us with a view of a diverse and complicated world in which few to no features are shared between individual DNA-binding protein families. The originally conceived direct readout of DNA residue sequences by amino acid side chains offers very limited capacity for sequence recognition, while the effects of the dynamic properties of the interacting partners remain difficult to quantify and almost impossible to generalise. In this work we investigated the energetic characteristics of all DNA residue-amino acid side chain combinations in the conformations found at the interaction interface in a very large set of protein-DNA complexes by the means of empirical potential-based calculations. General specificity-defining criteria were derived and utilised to look beyond the binding motifs considered in previous studies. Linking energetic favourability to the observed geometrical preferences, our approach reveals several additional amino acid motifs which can distinguish between individual DNA bases. Our results remained valid in environments with various dielectric properties. PMID:27384774

  9. Sequence-Specific Recognition of DNA by Proteins: Binding Motifs Discovered Using a Novel Statistical/Computational Analysis

    PubMed Central

    Jakubec, David; Laskowski, Roman A.; Vondrasek, Jiri

    2016-01-01

    Decades of intensive experimental studies of the recognition of DNA sequences by proteins have provided us with a view of a diverse and complicated world in which few to no features are shared between individual DNA-binding protein families. The originally conceived direct readout of DNA residue sequences by amino acid side chains offers very limited capacity for sequence recognition, while the effects of the dynamic properties of the interacting partners remain difficult to quantify and almost impossible to generalise. In this work we investigated the energetic characteristics of all DNA residue—amino acid side chain combinations in the conformations found at the interaction interface in a very large set of protein—DNA complexes by the means of empirical potential-based calculations. General specificity-defining criteria were derived and utilised to look beyond the binding motifs considered in previous studies. Linking energetic favourability to the observed geometrical preferences, our approach reveals several additional amino acid motifs which can distinguish between individual DNA bases. Our results remained valid in environments with various dielectric properties. PMID:27384774

  10. Drosophila EYA regulates the immune response against DNA through an evolutionarily conserved threonine phosphatase motif.

    PubMed

    Liu, Xi; Sano, Teruyuki; Guan, Yongsheng; Nagata, Shigekazu; Hoffmann, Jules A; Fukuyama, Hidehiro

    2012-01-01

    Innate immune responses against DNA are essential to counter both pathogen infections and tissue damages. Mammalian EYAs were recently shown to play a role in regulating the innate immune responses against DNA. Here, we demonstrate that the unique Drosophila eya gene is also involved in the response specific to DNA. Haploinsufficiency of eya in mutants deficient for lysosomal DNase activity (DNaseII) reduces antimicrobial peptide gene expression, a hallmark for immune responses in flies. Like the mammalian orthologues, Drosophila EYA features a N-terminal threonine and C-terminal tyrosine phosphatase domain. Through the generation of a series of mutant EYA fly strains, we show that the threonine phosphatase domain, but not the tyrosine phosphatase domain, is responsible for the innate immune response against DNA. A similar role for the threonine phosphatase domain in mammalian EYA4 had been surmised on the basis of in vitro studies. Furthermore EYA associates with IKKβ and full-length RELISH, and the induction of the IMD pathway-dependent antimicrobial peptide gene is independent of SO. Our data provide the first in vivo demonstration for the immune function of EYA and point to their conserved immune function in response to endogenous DNA, throughout evolution. PMID:22916150

  11. Local Function Conservation in Sequence and Structure Space

    PubMed Central

    Weinhold, Nils; Sander, Oliver; Domingues, Francisco S.; Lengauer, Thomas; Sommer, Ingolf

    2008-01-01

    We assess the variability of protein function in protein sequence and structure space. Various regions in this space exhibit considerable difference in the local conservation of molecular function. We analyze and capture local function conservation by means of logistic curves. Based on this analysis, we propose a method for predicting molecular function of a query protein with known structure but unknown function. The prediction method is rigorously assessed and compared with a previously published function predictor. Furthermore, we apply the method to 500 functionally unannotated PDB structures and discuss selected examples. The proposed approach provides a simple yet consistent statistical model for the complex relations between protein sequence, structure, and function. The GOdot method is available online (http://godot.bioinf.mpi-inf.mpg.de). PMID:18604264

  12. A Glance at Microsatellite Motifs from 454 Sequencing Reads of Watermelon Genomic DNA

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A single 454 (Life Sciences Sequencing Technology) run of Charleston Gray watermelon (Citrullus lanatus var. lanatus) genomic DNA was performed and sequence data were assembled. A large scale identification of simple sequence repeat (SSR) was performed and SSR sequence data were used for the develo...

  13. Epsilon glutathione transferases possess a unique class-conserved subunit interface motif that directly interacts with glutathione in the active site.

    PubMed

    Wongsantichon, Jantana; Robinson, Robert C; Ketterman, Albert J

    2015-01-01

    Epsilon class glutathione transferases (GSTs) have been shown to contribute significantly to insecticide resistance. We report a new Epsilon class protein crystal structure from Drosophila melanogaster for the glutathione transferase DmGSTE6. The structure reveals a novel Epsilon clasp motif that is conserved across hundreds of millions of years of evolution of the insect Diptera order. This histidine-serine motif lies in the subunit interface and appears to contribute to quaternary stability as well as directly connecting the two glutathiones in the active sites of this dimeric enzyme. PMID:26487708

  14. Epsilon glutathione transferases possess a unique class-conserved subunit interface motif that directly interacts with glutathione in the active site

    PubMed Central

    Wongsantichon, Jantana; Robinson, Robert C.; Ketterman, Albert J.

    2015-01-01

    Epsilon class glutathione transferases (GSTs) have been shown to contribute significantly to insecticide resistance. We report a new Epsilon class protein crystal structure from Drosophila melanogaster for the glutathione transferase DmGSTE6. The structure reveals a novel Epsilon clasp motif that is conserved across hundreds of millions of years of evolution of the insect Diptera order. This histidine-serine motif lies in the subunit interface and appears to contribute to quaternary stability as well as directly connecting the two glutathiones in the active sites of this dimeric enzyme. PMID:26487708

  15. A conserved secondary structural motif in 23S rRNA defines the site of interaction of amicetin, a universal inhibitor of peptide bond formation.

    PubMed Central

    Leviev, I G; Rodriguez-Fonseca, C; Phan, H; Garrett, R A; Heilek, G; Noller, H F; Mankin, A S

    1994-01-01

    The binding site and probable site of action have been determined for the universal antibiotic amicetin which inhibits peptide bond formation. Evidence from in vivo mutants, site-directed mutations and chemical footprinting all implicate a highly conserved motif in the secondary structure of the 23S-like rRNA close to the central circle of domain V. We infer that this motif lies at, or close to, the catalytic site in the peptidyl transfer centre. The binding site of amicetin is the first of a group of functionally related hexose-cytosine inhibitors to be localized on the ribosome. PMID:8157007

  16. Automated conserved non-coding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses

    PubMed Central

    Turco, Gina; Schnable, James C.; Pedersen, Brent; Freeling, Michael

    2013-01-01

    Conserved non-coding sequences (CNS) are islands of non-coding sequence that, like protein coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNS in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of arabidopsis, Japonica rice, foxtail millet, sorghum, brachypodium, and maize. PMID:23874343

  17. Properties of Sequence Conservation in Upstream Regulatory and Protein Coding Sequences among Paralogs in Arabidopsis thaliana

    NASA Astrophysics Data System (ADS)

    Richardson, Dale N.; Wiehe, Thomas

    Whole genome duplication (WGD) has catalyzed the formation of new species, genes with novel functions, altered expression patterns, complexified signaling pathways and has provided organisms a level of genetic robustness. We studied the long-term evolution and interrelationships of 5’ upstream regulatory sequences (URSs), protein coding sequences (CDSs) and expression correlations (EC) of duplicated gene pairs in Arabidopsis. Three distinct methods revealed significant evolutionary conservation between paralogous URSs and were highly correlated with microarray-based expression correlation of the respective gene pairs. Positional information on exact matches between sequences unveiled the contribution of micro-chromosomal rearrangements on expression divergence. A three-way rank analysis of URS similarity, CDS divergence and EC uncovered specific gene functional biases. Transcription factor activity was associated with gene pairs exhibiting conserved URSs and divergent CDSs, whereas a broad array of metabolic enzymes was found to be associated with gene pairs showing diverged URSs but conserved CDSs.

  18. Conservation patterns in different functional sequence categories of divergent Drosophila species

    SciTech Connect

    Papatsenko, Dmitri; Kislyuk, Andrey; Levine, Michael; Dubchak, Inna

    2005-10-01

    We have explored the distributions of fully conserved ungapped blocks in genome-wide pairwise alignments of recently completed species of Drosophila: D. yakuba, D. ananassae, D. pseudoobscura, D. virilis and D. mojavensis. Based on these distributions we have found that nearly every functional sequence category possesses its own distinctive conservation pattern, sometimes independent of the overall sequence conservation level. In the coding and regulatory regions, the ungapped blocks were longer than in introns, UTRs and non-functional sequences. At the same time, the blocks in the coding regions carried 3N+2 signature characteristic to synonymic substitutions in the 3rd codon positions. Larger block sizes in transcription regulatory regions can be explained by the presence of conserved arrays of binding sites for transcription factors. We also have shown that the longest ungapped blocks, or 'ultraconserved' sequences, are associated with specific gene groups, including those encoding ion channels and components of the cytoskeleton. We discussed how restrained conservation patterns may help in mapping functional sequence categories and improving genome annotation.
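    The notion of a fully conserved ungapped block in a pairwise alignment can be sketched as follows. This is an illustrative reading of the term, not the authors' pipeline. The function name `conserved_blocks` and the use of `-` as the gap character are assumptions.

```python
def conserved_blocks(aligned_a, aligned_b, gap="-"):
    """Return (start, length) for each maximal run of identical, ungapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    blocks = []
    start = None
    for i, (x, y) in enumerate(zip(aligned_a, aligned_b)):
        conserved = x == y and x != gap
        if conserved and start is None:
            start = i                      # a new block opens here
        elif not conserved and start is not None:
            blocks.append((start, i - start))
            start = None                   # mismatch or gap closes the block
    if start is not None:
        blocks.append((start, len(aligned_a) - start))
    return blocks
```

    The lengths returned by `conserved_blocks` could then be tallied per annotation category (coding, intron, UTR and so on) to obtain block-length distributions of the kind the abstract describes.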

  19. Sturgeon conservation genomics: SNP discovery and validation using RAD sequencing.

    PubMed

    Ogden, R; Gharbi, K; Mugue, N; Martinsohn, J; Senn, H; Davey, J W; Pourkazemi, M; McEwing, R; Eland, C; Vidotto, M; Sergeev, A; Congiu, L

    2013-06-01

    Caviar-producing sturgeons belonging to the genus Acipenser are considered to be one of the most endangered species groups in the world. Continued overfishing in spite of increasing legislation, zero catch quotas and extensive aquaculture production have led to the collapse of wild stocks across Europe and Asia. The evolutionary relationships among Adriatic, Russian, Persian and Siberian sturgeons are complex because of past introgression events and remain poorly understood. Conservation management, traceability and enforcement suffer a lack of appropriate DNA markers for the genetic identification of sturgeon at the species, population and individual level. This study employed RAD sequencing to discover and characterize single nucleotide polymorphism (SNP) DNA markers for use in sturgeon conservation in these four tetraploid species over three biological levels, using a single sequencing lane. Four population meta-samples and eight individual samples from one family were barcoded separately before sequencing. Analysis of 14.4 Gb of paired-end RAD data focused on the identification of SNPs in the paired-end contig, with subsequent in silico and empirical validation of candidate markers. Thousands of putatively informative markers were identified including, for the first time, SNPs that show population-wide differentiation between Russian and Persian sturgeons, representing an important advance in our ability to manage these cryptic species. The results highlight the challenges of genotyping-by-sequencing in polyploid taxa, while establishing the potential genetic resources for developing a new range of caviar traceability and enforcement tools. PMID:23473098

  20. Conservation patterns in angiosperm rDNA ITS2 sequences.

    PubMed Central

    Hershkovitz, M A; Zimmer, E A

    1996-01-01

    The two internal transcribed spacers (ITS1 and ITS2) of nuclear ribosomal DNA have become commonly exploited sources of informative variation for interspecific-/intergeneric-level phylogenetic analyses among angiosperms and other eukaryotes. We present an alignment in which one-third to one-half of the ITS2 sequence is alignable above the family level in angiosperms and a phenetic analysis showing that ITS2 contains information sufficient to diagnose lineages at several hierarchical levels. Base compositional analysis shows that angiosperm ITS2 is inherently GC-rich, and that the proportion of T is much more variable than that for other bases. We propose a general model of angiosperm ITS2 secondary structure that shows common pairing relationships for most of the conserved sequence tracts. Variations in our secondary structure predictions for sequences from different taxa indicate that compensatory mutation is not limited to paired positions. PMID:8760866
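    A base compositional analysis of the kind mentioned above reduces to counting residues. The sketch below assumes plain DNA or RNA strings and ignores gaps and ambiguity codes. The function names are hypothetical.

```python
from collections import Counter


def base_composition(seq):
    """Fraction of A, C, G and T among unambiguous bases (U is counted as T)."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(base for base in seq if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {base: counts[base] / total for base in "ACGT"}


def gc_fraction(seq):
    """Combined G + C fraction of the sequence."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]
```

    Applying `base_composition` to each ITS2 sequence in an alignment and comparing the spread of each base across taxa would be one way to examine which base proportion varies most.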

  1. Conservative Patch Algorithm and Mesh Sequencing for PAB3D

    NASA Technical Reports Server (NTRS)

    Pao, S. P.; Abdol-Hamid, K. S.

    2005-01-01

    A mesh-sequencing algorithm and a conservative patched-grid-interface algorithm (hereafter "Patch Algorithm") have been incorporated into the PAB3D code, which is a computer program that solves the Navier-Stokes equations for the simulation of subsonic, transonic, or supersonic flows surrounding an aircraft or other complex aerodynamic shapes. These algorithms are efficient and flexible, and have added tremendously to the capabilities of PAB3D. The mesh-sequencing algorithm makes it possible to perform preliminary computations using only a fraction of the grid cells (provided the original cell count is divisible by an integer) along any grid coordinate axis, independently of the other axes. The patch algorithm addresses another critical need in multi-block grid situations where the cell faces of adjacent grid blocks may not coincide, leading to errors in calculating fluxes of conserved physical quantities across interfaces between the blocks. The patch algorithm, based on the Stokes integral formulation of the applicable conservation laws, effectively matches each of the interfacial cells on one side of the block interface to the corresponding fractional cell area pieces on the other side. This approach is comprehensive and unified such that all interface topology is automatically processed without user intervention. This algorithm is implemented in a preprocessing code that creates a cell-by-cell database that will maintain flux conservation at any level of full or reduced grid density as the user may choose by way of the mesh-sequencing algorithm. These two algorithms have enhanced the numerical accuracy of the code, reduced the time and effort for grid preprocessing, and provided users with the flexibility of performing computations at any desired full or reduced grid resolution to suit their specific computational requirements.
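    The divisibility condition on mesh sequencing can be illustrated with a small sketch. It assumes a structured grid described by one list of node coordinates per axis and coarsens each axis by keeping every `factor`-th node. This is a reading of the abstract, not PAB3D's actual implementation, and the function names are hypothetical.

```python
def coarsen_axis(nodes, factor):
    """Keep every `factor`-th node; the cell count must be divisible by `factor`."""
    cells = len(nodes) - 1
    if factor < 1 or cells % factor != 0:
        raise ValueError(f"{cells} cells cannot be reduced by a factor of {factor}")
    return nodes[::factor]


def coarsen_grid(axes, factors):
    """Coarsen each coordinate axis independently of the others."""
    return [coarsen_axis(nodes, f) for nodes, f in zip(axes, factors)]
```

    For example, an axis with 8 cells (9 nodes) can be sequenced by factors of 2, 4 or 8 but not 3, and a different factor may be chosen for each axis.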

  2. Conserved Sequences at the Origin of Adenovirus DNA Replication

    PubMed Central

    Stillman, Bruce W.; Topp, William C.; Engler, Jeffrey A.

    1982-01-01

    The origin of adenovirus DNA replication lies within an inverted sequence repetition at either end of the linear, double-stranded viral DNA. Initiation of DNA replication is primed by a deoxynucleoside that is covalently linked to a protein, which remains bound to the newly synthesized DNA. We demonstrate that virion-derived DNA-protein complexes from five human adenovirus serological subgroups (A to E) can act as a template for both the initiation and the elongation of DNA replication in vitro, using nuclear extracts from adenovirus type 2 (Ad2)-infected HeLa cells. The heterologous template DNA-protein complexes were not as active as the homologous Ad2 DNA, most probably due to inefficient initiation by Ad2 replication factors. In an attempt to identify common features which may permit this replication, we have also sequenced the inverted terminal repeated DNA from human adenovirus serotypes Ad4 (group E), Ad9 and Ad10 (group D), and Ad31 (group A), and we have compared these to previously determined sequences from Ad2 and Ad5 (group C), Ad7 (group B), and Ad12 and Ad18 (group A) DNA. In all cases, the sequence around the origin of DNA replication can be divided into two structural domains: a proximal A · T-rich region which is partially conserved among these serotypes, and a distal G · C-rich region which is less well conserved. The G · C-rich region contains sequences similar to sequences present in papovavirus replication origins. The two domains may reflect a dual mechanism for initiation of DNA replication: adenovirus-specific protein priming of replication, and subsequent utilization of this primer by host replication factors for completion of DNA synthesis. PMID:7143575

  3. Conserved sequences in the carboxyl terminus of integrase that are essential for human immunodeficiency virus type 1 replication.

    PubMed

    Cannon, P M; Byles, E D; Kingsman, S M; Kingsman, A J

    1996-01-01

    We have previously identified a residue in the carboxyl terminus of human immunodeficiency virus type 1 integrase (HIV-1 IN), W-235, the requirement for which is only revealed in viral assays for integrase function (P. M. Cannon, W. Wilson, E. Byles, S. M. Kingsman, and A. J. Kingsman, J. Virol. 68:4768-4775, 1994). Our further analysis of this region of retroviral IN has now identified several sequence motifs which are conserved in all the retroviruses we examined, apart from human spumaretrovirus. We have made mutations within these motifs in HIV-1 IN and examined their phenotypes when reintroduced into an infectious proviral clone. The deleterious effects of several of these mutations demonstrate the importance of these regions for IN function in vivo. We observed a further discrepancy, at a motif that is only conserved in the lentiviruses, in the ability of mutants to function in in vitro and in vivo assays. Substitutions both in this region and at W-235 abolish HIV-1 infectivity but do not affect particle production, morphology, reverse transcription, or nuclear import in T-cell lines. Taken together with the in vitro data suggesting that neither of these residues is directly involved in the catalytic reactions of IN, it seems likely that we have identified regions of IN that are essential for interactions with other components of the integration machinery. PMID:8523588

  4. Nucleotide sequence and organization of the human S-protein gene: repeating peptide motifs in the pexin family and a model for their evolution

    SciTech Connect

    Jenne, D.; Stanley, K.K.

    1987-10-20

    The S-protein/vitronectin gene was isolated from a human genomic DNA library, and its sequence of about 5.3 kilobases including the adjacent 5' and 3' flanking regions was established. Alignment of the genomic DNA nucleotide sequence and the cDNA sequence indicated that the gene consisted of eight exons and seven introns. The intron positions in the S-protein gene and their phase type were compared to those in the hemopexin gene which shares amino acid sequence homologies with transin and the S-protein. Three introns have been found at equivalent positions; two other introns are very close to these positions and are interpreted as cases of intron sliding. Introns 3-7 occur at a conserved glycine residue within repeating peptide segments, whereas introns 1 and 2 are at the boundaries of the Somatomedin B domain of S-protein. The analysis of the exon structure in relation to repeating peptide motifs within the S-protein strongly suggests that it contains only seven repeats, one less than the hemopexin molecule. A repeat pattern very similar to that in hemopexin is shown to be present also in two other related proteins, transin and interstitial collagenase. An evolutionary model for the generation of the repeat pattern in the S-protein and the other members of this novel pexin gene family is proposed, and the sequence modifications for some of the repeats during divergent evolution are discussed in relation to known unique functional properties of hemopexin and S-protein.

  5. Fox-2 Splicing Factor Binds to a Conserved Intron Motif to Promote Inclusion of Protein 4.1R Alternative Exon 16

    SciTech Connect

    Ponthier, Julie L.; Schluepen, Christina; Chen, Weiguo; Lersch, Robert A.; Gee, Sherry L.; Hou, Victor C.; Lo, Annie J.; Short, Sarah A.; Chasis, Joel A.; Winkelmann, John C.; Conboy, John G.

    2006-03-01

    Activation of protein 4.1R exon 16 (E16) inclusion during erythropoiesis represents a physiologically important splicing switch that increases 4.1R affinity for spectrin and actin. Previous studies showed that negative regulation of E16 splicing is mediated by the binding of hnRNP A/B proteins to silencer elements in the exon and that downregulation of hnRNP A/B proteins in erythroblasts leads to activation of E16 inclusion. This paper demonstrates that positive regulation of E16 splicing can be mediated by Fox-2 or Fox-1, two closely related splicing factors that possess identical RNA recognition motifs. SELEX experiments with human Fox-1 revealed highly selective binding to the hexamer UGCAUG. Both Fox-1 and Fox-2 were able to bind the conserved UGCAUG elements in the proximal intron downstream of E16, and both could activate E16 splicing in HeLa cell co-transfection assays in a UGCAUG-dependent manner. Conversely, knockdown of Fox-2 expression, achieved with two different siRNA sequences resulted in decreased E16 splicing. Moreover, immunoblot experiments demonstrate mouse erythroblasts express Fox-2, but not Fox-1. These findings suggest that Fox-2 is a physiological activator of E16 splicing in differentiating erythroid cells in vivo. Recent experiments show that UGCAUG is present in the proximal intron sequence of many tissue-specific alternative exons, and we propose that the Fox family of splicing enhancers plays an important role in alternative splicing switches during differentiation in metazoan organisms.
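    Locating UGCAUG elements in an intron sequence, as described above, amounts to an exact hexamer search. The sketch below is illustrative only: the function name is hypothetical, DNA input is converted to RNA by replacing T with U, and no real 4.1R intron sequence is reproduced here.

```python
def find_hexamer(seq, motif="UGCAUG"):
    """Return 0-based start positions of every exact occurrence of `motif`."""
    rna = seq.upper().replace("T", "U")   # accept DNA or RNA input
    hits = []
    pos = rna.find(motif)
    while pos != -1:
        hits.append(pos)
        pos = rna.find(motif, pos + 1)    # step by one so overlaps are not missed
    return hits
```

    Restricting the search to the first few hundred nucleotides downstream of an exon would correspond to the proximal intron region the abstract refers to. The appropriate window size is not stated in the source.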

  6. In Vivo Enhancer Analysis Chromosome 16 Conserved Noncoding Sequences

    SciTech Connect

    Pennacchio, Len A.; Ahituv, Nadav; Moses, Alan M.; Nobrega, Marcelo; Prabhakar, Shyam; Shoukry, Malak; Minovitsky, Simon; Visel, Axel; Dubchak, Inna; Holt, Amy; Lewis, Keith D.; Plajzer-Frick, Ingrid; Akiyama, Jennifer; De Val, Sarah; Afzal, Veena; Black, Brian L.; Couronne, Olivier; Eisen, Michael B.; Rubin, Edward M.

    2006-02-01

    The identification of enhancers with predicted specificities in vertebrate genomes remains a significant challenge that is hampered by a lack of experimentally validated training sets. In this study, we leveraged extreme evolutionary sequence conservation as a filter to identify putative gene regulatory elements and characterized the in vivo enhancer activity of human-fish conserved and ultraconserved noncoding elements on human chromosome 16 as well as such elements from elsewhere in the genome. We initially tested 165 of these extremely conserved sequences in a transgenic mouse enhancer assay and observed that 48 percent (79/165) functioned reproducibly as tissue-specific enhancers of gene expression at embryonic day 11.5. While driving expression in a broad range of anatomical structures in the embryo, the majority of the 79 enhancers drove expression in various regions of the developing nervous system. Studying a set of DNA elements that specifically drove forebrain expression, we identified DNA signatures specifically enriched in these elements and used these parameters to rank all ~3,400 human-fugu conserved noncoding elements in the human genome. The testing of the top predictions in transgenic mice resulted in a three-fold enrichment for sequences with forebrain enhancer activity. These data dramatically expand the catalogue of in vivo-characterized human gene enhancers and illustrate the future utility of such training sets for a variety of biological applications including decoding the regulatory vocabulary of the human genome.

  7. Conserved Ser/Arg-rich Motif in PPZ Orthologs from Fungi Is Important for Its Role in Cation Tolerance

    PubMed Central

    Minhas, Anupriya; Sharma, Anupam; Kaur, Harsimran; Rawal, Yashpal; Ganesan, Kaliannan; Mondal, Alok K.

    2012-01-01

    PPZ1 orthologs, novel members of a phosphoprotein phosphatase family of phosphatases, are found only in fungi. They regulate diverse physiological processes in fungi e.g. ion homeostasis, cell size, cell integrity, etc. Although they are an important determinant of salt tolerance in fungi, their physiological role remained unexplored in any halotolerant species. In this context we report here molecular and functional characterization of DhPPZ1 from Debaryomyces hansenii, which is one of the most halotolerant and osmotolerant species of yeast. Our results showed that DhPPZ1 knock-out strain displayed higher tolerance to toxic cations, and unlike in Saccharomyces cerevisiae, Na+/H+ antiporter appeared to have an important role in this process. Besides salt tolerance, DhPPZ1 also had role in cell wall integrity and growth in D. hansenii. We have also identified a short, serine-arginine-rich sequence motif in DhPpz1p that is essential for its role in salt tolerance but not in other physiological processes. Taken together, these results underscore a distinct role of DhPpz1p in D. hansenii and illustrate an example of how organisms utilize the same molecular tool box differently to garner adaptive fitness for their respective ecological niches. PMID:22232558

  8. Co-conservation of rRNA tetraloop sequences and helix length suggests involvement of the tetraloops in higher-order interactions

    NASA Technical Reports Server (NTRS)

    Hedenstierna, K. O.; Siefert, J. L.; Fox, G. E.; Murgola, E. J.

    2000-01-01

    Terminal loops containing four nucleotides (tetraloops) are common in structural RNAs, and they frequently conform to one of three sequence motifs, GNRA, UNCG, or CUUG. Here we compare available sequences and secondary structures for rRNAs from bacteria, and we show that helices capped by phylogenetically conserved GNRA loops display a strong tendency to be of conserved length. The simplest interpretation of this correlation is that the conserved GNRA loops are involved in higher-order interactions, intramolecular or intermolecular, resulting in a selective pressure for maintaining the lengths of these helices. A small number of conserved UNCG loops were also found to be associated with conserved length helices, consistent with the possibility that this type of tetraloop also takes part in higher-order interactions.
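
    The three tetraloop families named in this abstract can be expressed as simple patterns. The following is a minimal Python sketch, not the authors' method: it assumes the usual IUPAC reading of the motif names (N = any nucleotide, R = purine), and the function name classify_tetraloop and the example loops are hypothetical.

```python
import re

# Tetraloop families named in the abstract, written with IUPAC RNA codes:
# N = any nucleotide, R = purine (A or G). This reading is an assumption.
TETRALOOP_PATTERNS = {
    "GNRA": re.compile(r"G[ACGU][AG]A"),
    "UNCG": re.compile(r"U[ACGU]CG"),
    "CUUG": re.compile(r"CUUG"),
}


def classify_tetraloop(loop: str) -> str:
    """Return the family name for a 4-nt loop, or "other" if none matches."""
    loop = loop.upper().replace("T", "U")  # accept DNA-style input as well
    if len(loop) != 4:
        raise ValueError("a tetraloop has exactly four nucleotides")
    for family, pattern in TETRALOOP_PATTERNS.items():
        if pattern.fullmatch(loop):
            return family
    return "other"


# Illustrative loops only; these are not taken from the study's data.
for loop in ("GAAA", "GCGA", "UUCG", "CUUG", "AAAA"):
    print(loop, classify_tetraloop(loop))
```

    The sketch only classifies loop sequences; the helix-length comparison described in the abstract would additionally need secondary-structure annotations, which are not modeled here.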

  9. The Ku-binding motif is a conserved module for recruitment and stimulation of non-homologous end-joining proteins

    PubMed Central

    Grundy, Gabrielle J.; Rulten, Stuart L.; Arribas-Bosacoma, Raquel; Davidson, Kathryn; Kozik, Zuzanna; Oliver, Antony W.; Pearl, Laurence H.; Caldecott, Keith W.

    2016-01-01

    The Ku-binding motif (KBM) is a short peptide module first identified in APLF that we now show is also present in Werner syndrome protein (WRN) and in Modulator of retrovirus infection homologue (MRI). We also identify a related but functionally distinct motif in XLF, WRN, MRI and PAXX, which we denote the XLF-like motif. We show that WRN possesses two KBMs; one at the N terminus next to the exonuclease domain and one at the C terminus next to an XLF-like motif. We reveal that the WRN C-terminal KBM and XLF-like motif function cooperatively to bind Ku complexes and that the N-terminal KBM mediates Ku-dependent stimulation of WRN exonuclease activity. We also show that WRN accelerates DSB repair by a mechanism requiring both KBMs, demonstrating the importance of WRN interaction with Ku. These data define a conserved family of KBMs that function as molecular tethers to recruit and/or stimulate enzymes during NHEJ. PMID:27063109

  10. Assessment of the potential contribution of the highly conserved C-terminal motif (C10) of Borrelia burgdorferi outer surface protein C in transmission and infectivity.

    PubMed

    Earnhart, Christopher G; Rhodes, DeLacy V L; Smith, Alexis A; Yang, Xiuli; Tegels, Brittney; Carlyon, Jason A; Pal, Utpal; Marconi, Richard T

    2014-03-01

    OspC is produced by all species of the Borrelia burgdorferi sensu lato complex and is required for infectivity in mammals. To test the hypothesis that the conserved C-terminal motif (C10) of OspC is required for function in vivo, a mutant B. burgdorferi strain (B31::ospCΔC10) was created in which ospC was replaced with an ospC gene lacking the C10 motif. The ability of the mutant to infect mice was investigated using tick transmission and needle inoculation. Infectivity was assessed by cultivation, qRT-PCR, and measurement of IgG antibody responses. B31::ospCΔC10 retained the ability to infect mice by both needle and tick challenge and was competent to survive in ticks after exposure to the blood meal. To determine whether recombinant OspC protein lacking the C-terminal 10 amino acid residues (rOspCΔC10) can bind plasminogen, the only known mammalian-derived ligand for OspC, binding analyses were performed. Deletion of the C10 motif resulted in a statistically significant decrease in plasminogen binding. Although deletion of the C10 motif influenced plasminogen binding, it can be concluded that the C10 motif is not required for OspC to carry out its critical in vivo functions in tick to mouse transmission. PMID:24376161

  11. Rice MEL2, the RNA recognition motif (RRM) protein, binds in vitro to meiosis-expressed genes containing U-rich RNA consensus sequences in the 3'-UTR.

    PubMed

    Miyazaki, Saori; Sato, Yutaka; Asano, Tomoya; Nagamura, Yoshiaki; Nonomura, Ken-Ichi

    2015-10-01

    Post-transcriptional gene regulation by RNA recognition motif (RRM) proteins through binding to cis-elements in the 3'-untranslated region (3'-UTR) is widely used in eukaryotes to complete various biological processes. Rice MEIOSIS ARRESTED AT LEPTOTENE2 (MEL2) is the RRM protein that functions in the transition to meiosis in proper timing. The MEL2 RRM preferentially associated with the U-rich RNA consensus, UUAGUU[U/A][U/G][A/U/G]U, dependently on sequences and proportionally to MEL2 protein amounts in vitro. The consensus sequences were located in the putative looped structures of the RNA ligand. A genome-wide survey revealed a tendency of MEL2-binding consensus appearing in 3'-UTR of rice genes. Of 249 genes that conserved the consensus in their 3'-UTR, 13 genes spatiotemporally co-expressed with MEL2 in meiotic flowers, and included several genes whose function was supposed in meiosis; such as Replication protein A and OsMADS3. The proteome analysis revealed that the amounts of small ubiquitin-related modifier-like protein and eukaryotic translation initiation factor3-like protein were dramatically altered in mel2 mutant anthers. Taken together with transcriptome and gene ontology results, we propose that the rice MEL2 is involved in the translational regulation of key meiotic genes on 3'-UTRs to achieve the faithful transition of germ cells to meiosis. PMID:26319516

  12. The BEN domain is a novel sequence-specific DNA-binding domain conserved in neural transcriptional repressors

    PubMed Central

    Dai, Qi; Ren, Aiming; Westholm, Jakub O.; Serganov, Artem A.; Patel, Dinshaw J.; Lai, Eric C.

    2013-01-01

    We recently reported that Drosophila Insensitive (Insv) promotes sensory organ development and has activity as a nuclear corepressor for the Notch transcription factor Suppressor of Hairless [Su(H)]. Insv lacks domains of known biochemical function but contains a single BEN domain (i.e., a “BEN-solo” protein). Our chromatin immunoprecipitation (ChIP) sequencing (ChIP-seq) analysis confirmed binding of Insensitive to Su(H) target genes in the Enhancer of split gene complex [E(spl)-C]; however, de novo motif analysis revealed a novel site strongly enriched in Insv peaks (TCYAATHRGAA). We validate binding of endogenous Insv to genomic regions bearing such sites, whose associated genes are enriched for neural functions and are functionally repressed by Insv. Unexpectedly, we found that the Insv BEN domain binds specifically to this sequence motif and that Insv directly regulates transcription via this motif. We determined the crystal structure of the BEN–DNA target complex, revealing homodimeric binding of the BEN domain and extensive nucleotide contacts via α helices and a C-terminal loop. Point mutations in key DNA-contacting residues severely impair DNA binding in vitro and capacity for transcriptional regulation in vivo. We further demonstrate DNA-binding and repression activities by the mammalian neural BEN-solo protein BEND5. Altogether, we define novel DNA-binding activity in a conserved family of transcriptional repressors, opening a molecular window on this extensive gene family. PMID:23468431
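
    The site reported in this abstract, TCYAATHRGAA, is written with IUPAC degenerate codes (Y = C or T, H = A, C or T, R = A or G). As a rough illustration of how such a motif can be searched for, the Python sketch below expands the codes into a regular expression and scans one strand. The helper names and the test sequence are hypothetical, and this is not the de novo motif analysis used in the study.

```python
import re

# Standard IUPAC nucleotide codes for DNA.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif: str) -> str:
    """Expand an IUPAC motif into a regular-expression string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_sites(motif: str, sequence: str) -> list:
    """Return 0-based start positions of matches on the given strand.

    A lookahead is used so that overlapping matches are all reported.
    """
    lookahead = re.compile("(?=" + iupac_to_regex(motif) + ")")
    return [m.start() for m in lookahead.finditer(sequence.upper())]


INSV_SITE = "TCYAATHRGAA"  # motif as given in the abstract

# Made-up sequence containing one instance of the motif at offset 2.
print(find_sites(INSV_SITE, "GGTCCAATTAGAACC"))  # [2]
```

    Scanning the reverse complement as well would be needed to cover both strands; that step is left out to keep the sketch short.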

  13. Rapid characterization of CRISPR-Cas9 protospacer adjacent motif sequence elements.

    PubMed

    Karvelis, Tautvydas; Gasiunas, Giedrius; Young, Joshua; Bigelyte, Greta; Silanskas, Arunas; Cigan, Mark; Siksnys, Virginijus

    2015-01-01

    To expand the repertoire of Cas9s available for genome targeting, we present a new in vitro method for the simultaneous examination of guide RNA and protospacer adjacent motif (PAM) requirements. The method relies on the in vitro cleavage of plasmid libraries containing a randomized PAM as a function of Cas9-guide RNA complex concentration. Using this method, we accurately reproduce the canonical PAM preferences for Streptococcus pyogenes, Streptococcus thermophilus CRISPR3 (Sth3), and CRISPR1 (Sth1). Additionally, PAM and sgRNA solutions for a novel Cas9 protein from Brevibacillus laterosporus are provided by the assay and are demonstrated to support functional activity in vitro and in plants. PMID:26585795

  14. An apoptosis-inhibiting gene from a nuclear polyhedrosis virus encoding a polypeptide with Cys/His sequence motifs.

    PubMed Central

    Birnbaum, M J; Clem, R J; Miller, L K

    1994-01-01

    Two different baculovirus genes are known to be able to block apoptosis triggered upon infection of Spodoptera frugiperda cells with p35 mutants of the insect baculovirus Autographa californica nuclear polyhedrosis virus (AcMNPV): p35 (P35-encoding gene) of AcMNPV (R. J. Clem, M. Fechheimer, and L. K. Miller, Science 254:1388-1390, 1991) and iap (inhibitor of apoptosis gene) of Cydia pomonella granulosis virus (CpGV) (N. E. Crook, R. J. Clem, and L. K. Miller, J. Virol. 67:2168-2174, 1993). Using a genetic complementation assay to identify additional genes which inhibit apoptosis during infection with a p35 mutant, we have isolated a gene from Orgyia pseudotsugata NPV (OpMNPV) that was able to functionally substitute for AcMNPV p35. The nucleotide sequence of this gene, Op-iap, predicted a 30-kDa polypeptide product with approximately 58% amino acid sequence identity to the product of CpGV iap, Cp-IAP. Like Cp-IAP, the predicted product of Op-iap has a carboxy-terminal C3HC4 zinc finger-like motif. In addition, a pair of additional cysteine/histidine motifs were found in the N-terminal regions of both polypeptide sequences. Recombinant p35 mutant viruses carrying either Op-iap or Cp-iap appeared to have a normal phenotype in S. frugiperda cells. Thus, Cp-IAP and Op-IAP appear to be functionally analogous to P35 but are likely to block apoptosis by a different mechanism which may involve direct interaction with DNA. PMID:8139034

  15. An apoptosis-inhibiting gene from a nuclear polyhedrosis virus encoding a polypeptide with Cys/His sequence motifs.

    PubMed

    Birnbaum, M J; Clem, R J; Miller, L K

    1994-04-01

    Two different baculovirus genes are known to be able to block apoptosis triggered upon infection of Spodoptera frugiperda cells with p35 mutants of the insect baculovirus Autographa californica nuclear polyhedrosis virus (AcMNPV): p35 (P35-encoding gene) of AcMNPV (R. J. Clem, M. Fechheimer, and L. K. Miller, Science 254:1388-1390, 1991) and iap (inhibitor of apoptosis gene) of Cydia pomonella granulosis virus (CpGV) (N. E. Crook, R. J. Clem, and L. K. Miller, J. Virol. 67:2168-2174, 1993). Using a genetic complementation assay to identify additional genes which inhibit apoptosis during infection with a p35 mutant, we have isolated a gene from Orgyia pseudotsugata NPV (OpMNPV) that was able to functionally substitute for AcMNPV p35. The nucleotide sequence of this gene, Op-iap, predicted a 30-kDa polypeptide product with approximately 58% amino acid sequence identity to the product of CpGV iap, Cp-IAP. Like Cp-IAP, the predicted product of Op-iap has a carboxy-terminal C3HC4 zinc finger-like motif. In addition, a pair of additional cysteine/histidine motifs were found in the N-terminal regions of both polypeptide sequences. Recombinant p35 mutant viruses carrying either Op-iap or Cp-iap appeared to have a normal phenotype in S. frugiperda cells. Thus, Cp-IAP and Op-IAP appear to be functionally analogous to P35 but are likely to block apoptosis by a different mechanism which may involve direct interaction with DNA. PMID:8139034

  16. A comparative genomics strategy for targeted discovery of single-nucleotide polymorphisms and conserved-noncoding sequences in orphan crops.

    PubMed

    Feltus, F A; Singh, H P; Lohithaswa, H C; Schulze, S R; Silva, T D; Paterson, A H

    2006-04-01

    Completed genome sequences provide templates for the design of genome analysis tools in orphan species lacking sequence information. To demonstrate this principle, we designed 384 PCR primer pairs to conserved exonic regions flanking introns, using Sorghum/Pennisetum expressed sequence tag alignments to the Oryza genome. Conserved-intron scanning primers (CISPs) amplified single-copy loci at 37% to 80% success rates in taxa that sample much of the approximately 50-million years of Poaceae divergence. While the conserved nature of exons fostered cross-taxon amplification, the lesser evolutionary constraints on introns enhanced single-nucleotide polymorphism detection. For example, in eight rice (Oryza sativa) genotypes, polymorphism averaged 12.1 per kb in introns but only 3.6 per kb in exons. Curiously, among 124 CISPs evaluated across Oryza, Sorghum, Pennisetum, Cynodon, Eragrostis, Zea, Triticum, and Hordeum, 23 (18.5%) seemed to be subject to rigid intron size constraints that were independent of per-nucleotide DNA sequence variation. Furthermore, we identified 487 conserved-noncoding sequence motifs in 129 CISP loci. A large CISP set (6,062 primer pairs, amplifying introns from 1,676 genes) designed using an automated pipeline showed generally higher abundance in recombinogenic than in nonrecombinogenic regions of the rice genome, thus providing relatively even distribution along genetic maps. CISPs are an effective means to explore poorly characterized genomes for both DNA polymorphism and noncoding sequence conservation on a genome-wide or candidate gene basis, and also provide anchor points for comparative genomics across a diverse range of species. PMID:16607031

  17. Juvenile hormone regulates Aedes aegypti Krüppel homolog 1 through a conserved E box motif.

    PubMed

    Cui, Yingjun; Sui, Yipeng; Xu, Jingjing; Zhu, Fang; Palli, Subba Reddy

    2014-09-01

    Juvenile hormone (JH) plays important roles in regulation of many physiological processes including development, reproduction and metabolism in insects. However, the molecular mechanisms of JH signaling pathway are not completely understood. To elucidate the molecular mechanisms of JH regulation of Krüppel homolog 1 gene (Kr-h1) in Aedes aegypti, we employed JH-sensitive Aag-2 cells developed from the embryos of this insect. In Aag-2 cells, AaKr-h1 gene is induced by nanomolar concentration of JH III, its expression peaked at 1.5 h after treatment with JH III. RNAi studies showed that JH induction of this gene requires the presence of Ae. aegypti methoprene-tolerant (AaMet). A conserved 13 nucleotide JH response element (JHRE, TGCCTCCACGTGC) containing canonical E box motif (underlined) identified in the promoter of AaKr-h1 is required for JH induction of this gene. Critical nucleotides in the JHRE required for JH action were identified by employing mutagenesis and reporter assays. Reporter assays also showed that basic helix loop helix (bHLH) domain of AaMet is required for JH induction of AaKr-h1. 5' rapid amplification of cDNA ends method identified two isoforms of AaKr-h1, AaKr-h1α and AaKr-h1β, the expression of both isoforms is induced by JH III, but AaKr-h1α is the predominant isoform in both Aag-2 cells and Ae. aegypti larvae. PMID:24931431
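
    The underlining that marked the E box in this abstract did not survive extraction. Assuming it referred to the canonical E box core CACGTG (an assumption, since the source formatting is lost), a short Python sketch can locate that core inside the 13-nucleotide JHRE and confirm that it is palindromic. The helper name reverse_complement is hypothetical.

```python
JHRE = "TGCCTCCACGTGC"  # 13-nt JH response element as given in the abstract
E_BOX = "CACGTG"         # canonical E box core (assumed to be the underlined part)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


start = JHRE.find(E_BOX)  # 0-based offset of the E box within the JHRE
print(start)                                # 6
print(JHRE[start:start + len(E_BOX)])       # CACGTG
print(reverse_complement(E_BOX) == E_BOX)   # True: the core reads the same on both strands
```

    Under this assumption the E box occupies positions 7 to 12 (1-based) of the JHRE, leaving a single 3' flanking nucleotide.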

  18. In vitro enzymatic activity of human immunodeficiency virus type 1 reverse transcriptase mutants in the highly conserved YMDD amino acid motif correlates with the infectious potential of the proviral genome.

    PubMed Central

    Wakefield, J K; Jablonski, S A; Morrow, C D

    1992-01-01

    Reverse transcriptases contain a highly conserved YXDD amino acid motif believed to be important in enzyme function. The second amino acid is not strictly conserved, with a methionine, valine or alanine occupying the second position in reverse transcriptases from various retroviruses and retroelements. Recently, a 3.5-Å (0.35-nm) resolution electron density map of human immunodeficiency virus type 1 (HIV-1) reverse transcriptase positioned the YMDD motif within an antiparallel beta-hairpin structure which forms a portion of its catalytic site. To further explore the role of methionine of the conserved YMDD motif in HIV-1 reverse transcriptase function, we have substituted methionine with a valine, alanine, serine, glycine, or proline, reflecting in some cases sequence motifs of other related reverse transcriptases. Wild-type and mutant enzymes were expressed in Escherichia coli, partially purified by phosphocellulose chromatography, and assayed for the capacity to polymerize TTP by using a homopolymeric template [poly(rA)] with either a DNA [oligo(dT)] or an RNA [oligo(U)] primer. With a poly(rA).oligo(dT) template-primer, reverse transcriptases with the methionine replaced by valine (YVDD), serine (YSDD), or alanine (YADD) were 70 to 100% as active as the wild type, while those with the glycine substitution (YGDD) were approximately 5 to 10% as active. A proline substitution (YPDD) completely inactivated the enzyme. With a poly(rA).oligo(U) template-primer, only the activity of mutants with YVDD was similar to that of the wild type, while mutants with YADD and YSDD were approximately 5 to 10% as active as the wild-type enzyme. The reverse transcriptases with the YGDD and YPDD mutations demonstrated no activity above background. Proviruses containing the reverse transcriptase with the valine mutation (YVDD) produced viruses with infectivities similar to that of the wild type, as determined by measurement of p24 antigen in culture supernatants and visual inspection

  19. Core sequence in the RNA motif recognized by the ErmE methyltransferase revealed by relaxing the fidelity of the enzyme for its target.

    PubMed Central

    Hansen, L H; Vester, B; Douthwaite, S

    1999-01-01

    Under physiological conditions, the ErmE methyltransferase specifically modifies a single adenosine within ribosomal RNA (rRNA), and thereby confers resistance to multiple antibiotics. The adenosine (A2058 in Escherichia coli 23S rRNA) lies within a highly conserved structure, and is methylated efficiently, and with equally high fidelity, in rRNAs from phylogenetically diverse bacteria. However, the fidelity of ErmE is reduced when magnesium is removed, and over twenty new sites of ErmE methylation appear in E. coli 16S and 23S rRNAs. These sites show widely different degrees of reactivity to ErmE. The canonical A2058 site is largely unaffected by magnesium depletion and remains the most reactive site in the rRNA. This suggests that methylation at the new sites results from changes in the RNA substrate rather than the methyltransferase. Chemical probing confirms that the rRNA structure opens upon magnesium depletion, exposing potential new interaction sites to the enzyme. The new ErmE sites show homology with the canonical A2058 site, and have the consensus sequence aNNNcgGAHAg (ErmE methylation occurs exclusively at adenosines (underlined); these are preceded by a guanosine, equivalent to G2057; there is a high preference for the adenosine equivalent to A2060; H is any nucleotide except G; N is any nucleotide; and there are slight preferences for the nucleotides shown in lower case). This consensus is believed to represent the core of the motif that Erm methyltransferases recognize at their canonical A2058 site. The data also reveal constraints on the higher order structure of the motif that affect methyltransferase recognition. PMID:9917069

  20. Hemagglutinin Sequence Conservation Guided Stem Immunogen Design from Influenza A H3 Subtype

    PubMed Central

    Mallajosyula, V. Vamsee Aditya; Citron, Michael; Ferrara, Francesca; Temperton, Nigel J.; Liang, Xiaoping; Flynn, Jessica A.; Varadarajan, Raghavan

    2015-01-01

    Seasonal epidemics caused by influenza A (H1 and H3 subtypes) and B viruses are a major global health threat. The traditional, trivalent influenza vaccines have limited efficacy because of rapid antigenic evolution of the circulating viruses. This antigenic variability mediates viral escape from the host immune responses, necessitating annual vaccine updates. Influenza vaccines elicit a protective antibody response, primarily targeting the viral surface glycoprotein hemagglutinin (HA). However, the predominant humoral response is against the hypervariable head domain of HA, thereby restricting the breadth of protection. In contrast, the conserved, subdominant stem domain of HA is a potential “universal” vaccine candidate. We designed an HA stem-fragment immunogen from the 1968 pandemic H3N2 strain (A/Hong Kong/1/68) guided by a comprehensive H3 HA sequence conservation analysis. The biophysical properties of the designed immunogen were further improved by C-terminal fusion of a trimerization motif, “isoleucine-zipper”, or “foldon”. These immunogens elicited cross-reactive, antiviral antibodies and conferred partial protection against a lethal, homologous HK68 virus challenge in vivo. Furthermore, bacterial expression of these immunogens is economical and facilitates rapid scale-up. PMID:26167164

  1. Genetic diversity of the conserved motifs of six bacterial leaf blight resistance genes in a set of rice landraces

    PubMed Central

    2014-01-01

    Background Bacterial leaf blight (BLB) caused by the vascular pathogen Xanthomonas oryzae pv. oryzae (Xoo) is one of the most serious diseases leading to crop failure in rice growing countries. A total of 37 resistance genes against Xoo has been identified in rice. Of these, ten BLB resistance genes have been mapped on rice chromosomes, while 6 have been cloned, sequenced and characterized. Diversity analysis at the resistance gene level of this disease is scanty, and the landraces from West Bengal and North Eastern states of India have received little attention so far. The objective of this study was to assess the genetic diversity at conserved domains of 6 BLB resistance genes in a set of 22 rice accessions including landraces and check genotypes collected from the states of Assam, Nagaland, Mizoram and West Bengal. Results In this study 34 pairs of primers were designed from conserved domains of 6 BLB resistance genes; Xa1, xa5, Xa21, Xa21(A1), Xa26 and Xa27. The designed primer pairs were used to generate PCR based polymorphic DNA profiles to detect and elucidate the genetic diversity of the six genes in the 22 diverse rice accessions of known disease phenotype. A total of 140 alleles were identified including 41 rare and 26 null alleles. The average polymorphism information content (PIC) value was 0.56/primer pair. The DNA profiles identified each of the rice landraces unequivocally. The amplified polymorphic DNA bands were used to calculate genetic similarity of the rice landraces in all possible pair combinations. The similarity among the rice accessions ranged from 18% to 89% and the dendrogram produced from the similarity values was divided into 2 major clusters. The conserved domains identified within the sequenced rare alleles include Leucine-Rich Repeat, BED-type zinc finger domain, sugar transferase domain and the domain of the carbohydrate esterase 4 superfamily. Conclusions This study revealed high genetic diversity at conserved domains of six BLB

  2. G-boxes, bigfoot genes, and environmental response: characterization of intragenomic conserved noncoding sequences in Arabidopsis.

    PubMed

    Freeling, Michael; Rapaka, Lakshmi; Lyons, Eric; Pedersen, Brent; Thomas, Brian C

    2007-05-01

    A tetraploidy left Arabidopsis thaliana with 6358 pairs of homoeologs that, when aligned, generated 14,944 intragenomic conserved noncoding sequences (CNSs). Our previous work assembled these phylogenetic footprints into a database. We show that known transcription factor (TF) binding motifs, including the G-box, are overrepresented in these CNSs. A total of 254 genes spanning long lengths of CNS-rich chromosomes (Bigfoot) dominate this database. Therefore, we made subdatabases: one containing Bigfoot genes and the other containing genes with three to five CNSs (Smallfoot). Bigfoot genes are generally TFs that respond to signals, with their modal CNS positioned 3.1 kb 5' from the ATG. Smallfoot genes encode components of signal transduction machinery, the cytoskeleton, or involve transcription. We queried each subdatabase with each possible 7-nucleotide sequence. Among hundreds of hits, most were purified from CNSs, and almost all of those significantly enriched in CNSs had no experimental history. The 7-mers in CNSs are not 5'- to 3'-oriented in Bigfoot genes but are often oriented in Smallfoot genes. CNSs with one G-box tend to have two G-boxes. CNSs were shared with the homoeolog only and with no other gene, suggesting that binding site turnover impedes detection. Bigfoot genes may function in adaptation to environmental change. PMID:17496117

  3. A Conserved Motif in the Membrane Proximal C-Terminal Tail of Human Muscarinic M1 Acetylcholine Receptors Affects Plasma Membrane Expression

    PubMed Central

    Ehlert, Frederick J.; Shults, Crystal A.

    2010-01-01

    We investigated the functional role of a conserved motif, F(x)6LL, in the membrane proximal C-tail of the human muscarinic M1 (hM1) receptor. By use of site-directed mutagenesis, several different point mutations were introduced into the C-tail sequence 423FRDTFRLLL431. Wild-type and mutant hM1 receptors were transiently expressed in Chinese hamster ovary cells, and the amount of plasma membrane-expressed receptor was determined by use of intact, whole-cell [3H]N-methylscopolamine binding assays. The plasma membrane expression of hM1 receptors possessing either L430A or L431A or both point mutations was significantly reduced compared with the wild type. The hM1 receptor possessing a L430A/L431A double-point mutation was retained in the endoplasmic reticulum (ER), and atropine treatment caused the redistribution of the mutant receptor from the ER to the plasma membrane. Atropine treatment also caused an increase in the maximal response and potency of carbachol-stimulated phosphoinositide hydrolysis elicited by the L430A/L431A mutant. The effect of atropine on the L430A/L431A receptor mutant suggests that L430 and L431 play a role in folding hM1 receptors, which is necessary for exit from the ER. Using site-directed mutagenesis, we also identified amino acid residues at the base of transmembrane-spanning domain 1 (TM1), V46 and L47, that, when mutated, reduce the plasma membrane expression of hM1 receptors in an atropine-reversible manner. Overall, these mutagenesis data show that amino acid residues in the membrane-proximal C-tail and base of TM1 are necessary for hM1 receptors to achieve a transport-competent state. PMID:19841475
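
    The motif notation F(x)6LL in this abstract means a phenylalanine, any six residues, and then two leucines. The Python sketch below is an illustration under that reading, not part of the study: it checks the quoted C-tail segment against the pattern and maps the match back to receptor residue numbers. The constant and function names are hypothetical.

```python
import re

C_TAIL = "FRDTFRLLL"   # residues 423-431 of the hM1 receptor, as quoted in the abstract
FIRST_RESIDUE = 423    # residue number of the first letter in C_TAIL

# F, then any six residues, then two leucines: F(x)6LL
F_X6_LL = re.compile(r"F.{6}LL")


def motif_residues(segment: str, first_residue: int) -> dict:
    """Map the motif's anchoring residues to their receptor numbering."""
    match = F_X6_LL.search(segment)
    if match is None:
        return {}
    f_pos = first_residue + match.start()
    # The two leucines are the last two positions of the match.
    l1_pos = first_residue + match.end() - 2
    l2_pos = first_residue + match.end() - 1
    return {"F": f_pos, "L1": l1_pos, "L2": l2_pos}


print(motif_residues(C_TAIL, FIRST_RESIDUE))
# {'F': 423, 'L1': 430, 'L2': 431}
```

    The two leucines recovered this way, L430 and L431, are the residues whose alanine substitutions are described in the abstract.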

  4. Cross-reactivity between the rheumatoid arthritis-associated motif EQKRAA and structurally related sequences found in Proteus mirabilis.

    PubMed

    Tiwana, H; Wilson, C; Alvarez, A; Abuknesha, R; Bansal, S; Ebringer, A

    1999-06-01

    Cross-reactivity or molecular mimicry may be one of the underlying mechanisms involved in the etiopathogenesis of rheumatoid arthritis (RA). Antiserum against the RA susceptibility sequence EQKRAA was shown to bind to a similar peptide ESRRAL present in the hemolysin of the gram-negative bacterium Proteus mirabilis, and an anti-ESRRAL serum reacted with EQKRAA. There was no reactivity with either anti-EQKRAA or anti-ESRRAL to a peptide containing the EDERAA sequence which is present in HLA-DRB1*0402, an allele not associated with RA. Furthermore, the EQKRAA and ESRRAL antisera bound to a mouse fibroblast transfectant cell line (Dap.3) expressing HLA-DRB1*0401 but not to DRB1*0402. However, peptide sequences structurally related to the RA susceptibility motif LEIEKDFTTYGEE (P. mirabilis urease), VEIRAEGNRFTY (collagen type II) and DELSPETSPYVKE (collagen type XI) did not bind significantly to cell lines expressing HLA-DRB1*0401 or HLA-DRB1*0402 compared to the control peptide YASGASGASGAS. It is suggested here that molecular mimicry between HLA alleles associated with RA and P. mirabilis may be relevant in the etiopathogenesis of the disease. PMID:10338479

  5. The tryptophan repressor sequence is highly conserved among the Enterobacteriaceae.

    PubMed Central

    Arvidson, D N; Arvidson, C G; Lawson, C L; Miner, J; Adams, C; Youderian, P

    1994-01-01

    Tryptophan biosynthesis in Escherichia coli is regulated by the product of the trpR gene, the tryptophan (Trp) repressor. Trp aporepressor binds the corepressor, L-tryptophan, to form a holorepressor complex, which binds trp operator DNA tightly, and inhibits transcription of the tryptophan biosynthetic operon. The conservation of trp operator sequences among enteric Gram-negative bacteria suggests that trpR genes from other bacterial species can be cloned by complementation in E. coli. To clone trpR homologues, a deletion of the E. coli trpR gene, delta trpR504, was made on a plasmid by site-directed mutagenesis, then crossed onto the E. coli genome. Plasmid clones of the trpR genes of Enterobacter aerogenes and Enterobacter cloacae were isolated by complementation of the delta trpR504 allele, scored as the ability to repress beta-galactosidase synthesis from a prophage-borne trpE-lacZ gene fusion. The predicted amino acid sequences of four enteric TrpR proteins show differences, clustered on the backside of the folded repressor, opposite the DNA-binding helix-turn-helix substructures. These differences are predicted to have little effect on the interactions of the aporepressor with tryptophan, holorepressor with operator DNA, or tandemly bound holorepressor dimers with one another. Although there is some variation observed at the dimer interface, interactions predicted to stabilize the interface are conserved. The phylogenetic relationships revealed by the TrpR amino acid sequence alignment agree with the results of others. PMID:8208606

  6. A Conserved Acidic Motif in the N-Terminal Domain of Nitrate Reductase Is Necessary for the Inactivation of the Enzyme in the Dark by Phosphorylation and 14-3-3 Binding

    PubMed Central

    Pigaglio, Emmanuelle; Durand, Nathalie; Meyer, Christian

    1999-01-01

    It has previously been shown that the N-terminal domain of tobacco (Nicotiana tabacum) nitrate reductase (NR) is involved in the inactivation of the enzyme by phosphorylation, which occurs in the dark (L. Nussaume, M. Vincentz, C. Meyer, J.P. Boutin, and M. Caboche [1995] Plant Cell 7: 611–621). The activity of a mutant NR protein lacking this N-terminal domain was no longer regulated by light-dark transitions. In this study smaller deletions were performed in the N-terminal domain of tobacco NR that removed protein motifs conserved among higher plant NRs. The resulting truncated NR-coding sequences were then fused to the cauliflower mosaic virus 35S RNA promoter and introduced in NR-deficient mutants of the closely related species Nicotiana plumbaginifolia. We found that the deletion of a conserved stretch of acidic residues led to an active NR protein that was more thermosensitive than the wild-type enzyme, but it was relatively insensitive to the inactivation by phosphorylation in the dark. Therefore, the removal of this acidic stretch seems to have the same effects on NR activation state as the deletion of the N-terminal domain. A hypothetical explanation for these observations is that a specific factor that impedes inactivation remains bound to the truncated enzyme. A synthetic peptide derived from this acidic protein motif was also found to be a good substrate for casein kinase II. PMID:9880364

  7. Polymorphism, monomorphism, and sequences in conserved microsatellites in primate species.

    PubMed

    Blanquer-Maumont, A; Crouau-Roy, B

    1995-10-01

    Dimeric short tandem repeats are a source of highly polymorphic markers in the mammalian genome. Genetic variation at these hypervariable loci is extensively used for linkage analysis, for the identification of individuals, and may be useful for interpopulation and interspecies studies. In this paper, we analyze the variability and the sequences of a segment including three microsatellites, first described in man, in several species of primates (chimpanzee, orangutan, gibbon, and macaque) using the heterologous primers (man primers). This region is located on the human chromosome 6p, near the tumor necrosis factor genes, in the major histocompatibility complex. The fact that these primers work in all species studied indicates that they are conserved throughout the different lineages of the two superfamilies, the Hominoidea and the Cercopithecoidea, represented by the macaques. However, the intervening sequence displays intraspecific and interspecific variability. The sites of base substitutions and the insertion/deletion events are not evenly distributed within this region. The data suggest that it is necessary to have a minimal number of repeats to increase the rate of mutation sufficiently to allow the development of polymorphism. In some species, the microsatellites present single base variations which reduce the number of contiguous repeats, thus apparently slowing the rate of additional slippage events. Species with such variations or a low number of repeats are monomorphic. These microsatellite sequences are informative in the comparison of closely related species and reflect the phylogeny of the Old World monkeys, apes, and man. PMID:7563137
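    The point about single base variations shortening the contiguous repeat run can be illustrated with a minimal sketch. The repeat unit "CA", the two example sequences, and the function name are hypothetical and are not taken from the paper.

```python
import re


def longest_repeat_run(seq: str, unit: str = "CA") -> int:
    """Return the largest number of back-to-back copies of `unit` found in `seq`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    # Each match is one uninterrupted run; its length in units is len // len(unit).
    return max((len(m.group()) // len(unit) for m in pattern.finditer(seq)), default=0)


# Hypothetical alleles: a perfect dinucleotide repeat, and the same tract
# with one base substitution (C -> T) in the fourth repeat unit.
perfect = "CACACACACACA"
interrupted = "CACACATACACA"

print(longest_repeat_run(perfect))      # uninterrupted tract
print(longest_repeat_run(interrupted))  # tract split by the substitution
```

    In this toy case the two sequences have the same total length, but the substitution splits the tract so that the longest contiguous run is halved, which is the quantity the abstract relates to slippage rate.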

  8. Structural Analysis of a Repetitive Protein Sequence Motif in Strepsirrhine Primate Amelogenin

    PubMed Central

    Bromley, Keith M.; Hacia, Joseph G.; Bromage, Timothy G.; Snead, Malcolm L.; Moradian-Oldak, Janet; Paine, Michael L.

    2011-01-01

    Strepsirrhines are members of a primate suborder that has a distinctive set of features associated with the development of the dentition. Amelogenin (AMEL), the better known of the enamel matrix proteins, forms 90% of the secreted organic matrix during amelogenesis. Although AMEL has been sequenced in numerous mammalian lineages, the only reported strepsirrhine AMEL sequences are those of the ring-tailed lemur and galago, which contain a set of additional proline-rich tandem repeats absent in all other primate species analyzed to date, but present in some non-primate mammals. Here, we first determined that these repeats are present in AMEL from three additional lemur species and thus are likely to be widespread throughout this group. To evaluate the functional relevance of these repeats in strepsirrhines, we engineered a mutated murine amelogenin sequence containing a similar proline-rich sequence to that of Lemur catta. In the monomeric form, the MQP insertions had no influence on the secondary structure or refolding properties, whereas in the assembled form, the insertions increased the hydrodynamic radii. We speculate that increased AMEL nanosphere size may influence enamel formation in strepsirrhine primates. PMID:21437261

  9. Structural analysis of a repetitive protein sequence motif in strepsirrhine primate amelogenin.

    PubMed

    Lacruz, Rodrigo S; Lakshminarayanan, Rajamani; Bromley, Keith M; Hacia, Joseph G; Bromage, Timothy G; Snead, Malcolm L; Moradian-Oldak, Janet; Paine, Michael L

    2011-01-01

    Strepsirrhines are members of a primate suborder that has a distinctive set of features associated with the development of the dentition. Amelogenin (AMEL), the better known of the enamel matrix proteins, forms 90% of the secreted organic matrix during amelogenesis. Although AMEL has been sequenced in numerous mammalian lineages, the only reported strepsirrhine AMEL sequences are those of the ring-tailed lemur and galago, which contain a set of additional proline-rich tandem repeats absent in all other primate species analyzed to date, but present in some non-primate mammals. Here, we first determined that these repeats are present in AMEL from three additional lemur species and thus are likely to be widespread throughout this group. To evaluate the functional relevance of these repeats in strepsirrhines, we engineered a mutated murine amelogenin sequence containing a similar proline-rich sequence to that of Lemur catta. In the monomeric form, the MQP insertions had no influence on the secondary structure or refolding properties, whereas in the assembled form, the insertions increased the hydrodynamic radii. We speculate that increased AMEL nanosphere size may influence enamel formation in strepsirrhine primates. PMID:21437261

  10. Complex architecture of major histocompatibility complex class II promoters: reiterated motifs and conserved protein-protein interactions.

    PubMed Central

    Jabrane-Ferrat, N; Fontes, J D; Boss, J M; Peterlin, B M

    1996-01-01

    The S box (also known as the H, W, or Z box) is the 5'-most element of the conserved upstream sequences in promoters of major histocompatibility complex class II genes. It is important for their B-cell-specific and interferon gamma-inducible expression. In this study, we demonstrate that the S box represents a duplication of the downstream X box. First, RFX, which is composed of the RFX5-p36 heterodimer that binds to the X box, also binds to the S box and its 5'-flanking sequence. Second, NF-Y, which binds to the Y box and increases interactions between RFX and the X box, also increases the binding of RFX to the S box. Third, RFXs bound to S and X boxes interact with each other in a spatially constrained manner. Finally, we confirmed these protein-protein and protein-DNA interactions by expressing a hybrid RFX5-VP16 protein in cells. We conclude that RFX binds to S and X boxes and that complex interactions between RFX and NF-Y direct B-cell-specific and interferon gamma-inducible expression of major histocompatibility complex class II genes. PMID:8756625