Science.gov

Sample records for conserved sequence motif

  1. The BsaHI restriction-modification system: cloning, sequencing and analysis of conserved motifs.

    PubMed

    Neely, Robert K; Roberts, Richard J

    2008-05-14

    Restriction and modification enzymes typically recognise short DNA sequences between two and eight bases in length. Understanding the mechanism of this recognition represents a significant challenge that we begin to address for the BsaHI restriction-modification system, which recognises the six-base sequence GRCGYC. The genes for the BsaHI methyltransferase, bsaHIM, and restriction endonuclease, bsaHIR, have been sequenced (GenBank accession #EU386360), cloned and expressed in E. coli. Both the restriction endonuclease and the methyltransferase share significant similarity with a group of six other systems: the restriction-modification systems HgiDI and HgiGI and the putative HindVP, NlaCORFDP, NpuORFC228P and SplZORFNP restriction-modification systems. A sequence alignment of these homologues shows that their amino acid sequences are largely conserved and highlights several motifs of interest. We target one such conserved motif, reading SPERRFD, near the C-terminus of the protein encoded by bsaHIR. A mutational analysis of these amino acids indicates that the motif is crucial for enzymatic activity. Sequence alignment of the methyltransferase reveals a short motif within the target recognition domain that is conserved among enzymes recognising the same sequences. Thus, this motif may be used as a diagnostic tool to define the recognition sequences of the cytosine C5 methyltransferases. We have cloned and sequenced the BsaHI restriction and modification enzymes. We have identified a region of the R.BsaHI enzyme that is crucial for its activity. Analysis of the amino acid sequence of the BsaHI methyltransferase enzyme led us to propose two new motifs that can be used in the diagnosis of the recognition sequence of the cytosine C5-methyltransferases.
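    GRCGYC is written in IUPAC degenerate notation (R = A or G, Y = C or T), so locating candidate BsaHI sites in a DNA sequence means expanding those codes before matching. The sketch below is an illustration only, not part of the study: `find_sites` and the example sequence are hypothetical, and it scans both strands (GRCGYC happens to be its own reverse complement, so the second strand adds nothing here, but the code stays general).

```python
import re

# IUPAC degenerate nucleotide codes mapped to regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also handles the degenerate codes (R<->Y, K<->M, B<->V, D<->H)
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def iupac_to_regex(site: str) -> str:
    return "".join(IUPAC[base] for base in site.upper())


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(dna: str, site: str = "GRCGYC") -> list:
    """Sorted 0-based start positions of `site` on either strand of `dna`."""
    dna = dna.upper()
    hits = set()
    for pattern in {site.upper(), reverse_complement(site)}:
        # zero-width lookahead so overlapping occurrences are all reported
        for m in re.finditer(f"(?=({iupac_to_regex(pattern)}))", dna):
            hits.add(m.start())
    return sorted(hits)


print(find_sites("TTGACGTCAAGGCGCCTT"))  # GACGTC at 2, GGCGCC at 10
```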

  2. Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium.

    PubMed

    Catania, Francesco; Lynch, Michael

    2010-05-04

    In protozoa, the identification of preserved motifs by comparative genomics is often impeded by the difficulty of generating reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal gene expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes.

  3. Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium

    PubMed Central

    2010-01-01

    Background In protozoa, the identification of preserved motifs by comparative genomics is often impeded by the difficulty of generating reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. Results By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Conclusions Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal gene expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes. PMID:20441586

  4. Conserved sequence motifs among bacterial, eukaryotic, and archaeal phosphatases that define a new phosphohydrolase superfamily.

    PubMed Central

    Thaller, M. C.; Schippa, S.; Rossolini, G. M.

    1998-01-01

    Members of a new molecular family of bacterial nonspecific acid phosphatases (NSAPs), designated class C, were found to share significant sequence similarities with bacterial class B NSAPs and with some plant acid phosphatases, representing the first example of a family of bacterial NSAPs that has a relatively close eukaryotic counterpart. Despite the lack of overall similarity, conserved sequence motifs were also identified among the above enzyme families (class B and class C bacterial NSAPs, and related plant phosphatases) and several other families of phosphohydrolases, including bacterial phosphoglycolate phosphatases, histidinol-phosphatase domains of the bacterial bifunctional enzymes imidazole-glycerolphosphate dehydratases, and bacterial, eukaryotic, and archaeal phosphoserine phosphatases and trehalose-6-phosphatases. These conserved motifs are clustered within two domains, separated by a variable spacer region, according to the pattern [FILMAVT]-D-[ILFRMVY]-D-[GSNDE]-[TV]-[ILVAM]-[ATSVILMC]-X-[YFWHKR]-X-[YFWHNQ]-X(102,191)-[KRHNQ]-G-D-[FYWHILVMC]-[QNH]-[FWYGP]-D-[PSNQYW]. The dephosphorylating activity common to all these proteins supports the definition of this phosphatase motif and the inclusion of these enzymes into a superfamily of phosphohydrolases that we propose to indicate as "DDDD" after the presence of the four invariant aspartate residues. Database searches retrieved various hypothetical proteins of unknown function containing this or similar motifs, for which a phosphohydrolase activity could be hypothesized. PMID:9684901
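    The DDDD pattern uses PROSITE-style syntax: [..] lists allowed residues, X is any residue, and X(102,191) is a spacer of 102 to 191 residues. A database search with such a pattern can be approximated by translating it to a regular expression. The converter below is a hypothetical sketch covering only the elements used here (plus {..} exclusion sets); it ignores PROSITE anchors such as < and >, and the test sequence is synthetic.

```python
import re

ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern ('-'-separated elements) into a regex.

    Supports literal residues, [..] allowed sets, {..} forbidden sets,
    x/X wildcards and repeat counts such as X(102,191) or x(2).
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element.strip())
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, lo, hi = m.groups()
        if core in ("x", "X"):
            rx = "."                          # any residue
        elif core.startswith("{"):
            rx = "[^" + core[1:-1] + "]"      # any residue except these
        else:
            rx = core                         # literal or [..] set is already regex
        if lo is not None:
            rx += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(rx)
    return "".join(parts)


DDDD_PATTERN = (
    "[FILMAVT]-D-[ILFRMVY]-D-[GSNDE]-[TV]-[ILVAM]-[ATSVILMC]-X-[YFWHKR]-X-[YFWHNQ]-"
    "X(102,191)-[KRHNQ]-G-D-[FYWHILVMC]-[QNH]-[FWYGP]-D-[PSNQYW]"
)

# Synthetic protein: first motif block, a 150-residue spacer, second motif block
toy = "M" + "FDIDGTIAAYAY" + "A" * 150 + "KGDFQFDP" + "M"
match = re.search(prosite_to_regex(DDDD_PATTERN), toy)
print(match.start() if match else None)
```

A spacer shorter than 102 residues makes the same search return no match, which is the behaviour the X(102,191) element encodes.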

  5. Detecting Remote Sequence Homology in Disordered Proteins: Discovery of Conserved Motifs in the N-Termini of Mononegavirales phosphoproteins

    PubMed Central

    Karlin, David; Belshaw, Robert

    2012-01-01

    Paramyxovirinae are a large group of viruses that includes measles virus and parainfluenza viruses. The viral Phosphoprotein (P) plays a central role in viral replication. It is composed of a highly variable, disordered N-terminus and a conserved C-terminus. A second viral protein alternatively expressed, the V protein, also contains the N-terminus of P, fused to a zinc finger. We suspected that, despite their high variability, the N-termini of P/V might all be homologous; however, using standard approaches, we could previously identify sequence conservation only in some Paramyxovirinae. We have now compared the N-termini using sensitive sequence similarity search programs, able to detect residual similarities unnoticeable by conventional approaches. We discovered that all Paramyxovirinae share a short sequence motif in their first 40 amino acids, which we called soyuz1. Despite its short length (11–16 aa), several arguments allow us to conclude that soyuz1 probably evolved by homologous descent, unlike linear motifs. Conservation across such evolutionary distances suggests that soyuz1 plays a crucial role, and experimental data suggest that it binds the viral nucleoprotein to prevent its illegitimate self-assembly. In some Paramyxovirinae, the N-terminus of P/V contains a second motif, soyuz2, which might play a role in blocking interferon signaling. Finally, we discovered that the P of related Mononegavirales contain similarly overlooked motifs in their N-termini, and that their C-termini share a previously unnoticed structural similarity suggesting a common origin. Our results suggest several testable hypotheses regarding the replication of Mononegavirales and suggest that disordered regions with little overall sequence similarity, common in viral and eukaryotic proteins, might contain currently overlooked motifs (intermediate in length between linear motifs and disordered domains) that could be detected simply by comparing orthologous proteins. PMID:22403617

  6. Discovering novel sequence motifs with MEME.

    PubMed

    Bailey, Timothy L

    2002-11-01

    This unit illustrates how to use MEME to discover motifs in a group of related nucleotide or peptide sequences. A MEME motif is a sequence pattern that occurs repeatedly in one or more sequences in the input group. MEME can be used to discover novel patterns because it bases its discoveries only on the input sequences, not on any prior knowledge (such as databases of known motifs). The input to MEME is a set of unaligned sequences of the same type (peptide or nucleotide). For each motif it discovers, MEME reports the occurrences (sites), consensus sequence, and the level of conservation (information content) at each position in the pattern. MEME also produces block diagrams showing where all of the discovered motifs occur in the training set sequences. MEME's hypertext (HTML) output also contains buttons that allow for the convenient use of the motifs in other searches.
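    MEME reports, for each motif, a consensus sequence and the information content of each position. Given a set of equal-length sites, both quantities can be sketched as below: IC_j = log2(alphabet size) - H_j, where H_j is the Shannon entropy of column j. This is a simplified illustration, not MEME's own computation: it assumes a uniform background, uses no pseudocounts, and starts from already-aligned sites, whereas MEME discovers the sites itself from unaligned input. The function name and example sites are hypothetical.

```python
import math
from collections import Counter


def column_information(sites, alphabet="ACGT"):
    """Consensus string and per-position information content (bits).

    Assumes a uniform background and raw frequencies (no pseudocounts).
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    max_bits = math.log2(len(alphabet))
    consensus, info = [], []
    for j in range(length):
        counts = Counter(s[j] for s in sites)
        n = sum(counts.values())
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        # ties resolve to the letter that comes first in `alphabet`
        consensus.append(max(alphabet, key=lambda a: counts[a]))
        info.append(max_bits - entropy)
    return "".join(consensus), info


cons, ic = column_information(["TATAAT", "TATAAT", "TACAAT", "GATACT"])
print(cons, [round(x, 3) for x in ic])
```

A fully conserved column scores the maximum of 2 bits for DNA; a column with three of four sites agreeing scores about 1.19 bits.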

  7. Spatial clustering of binding motifs and charges reveals conserved functional features in disordered nucleoporin sequences

    NASA Astrophysics Data System (ADS)

    Ando, David; Colvin, Michael; Rexach, Michael; Gopinathan, Ajay

    2013-03-01

    The Nuclear Pore Complex (NPC) gates the only channel through which cells exchange material between the nucleus and cytoplasm. Traffic is regulated by transport receptors bound to cargo, which interact with numerous disordered phenylalanine-glycine (FG) repeat-containing proteins (FG nups) that line this channel. The precise physical mechanism of transport regulation has remained elusive, primarily due to the difficulty in understanding the structure and dynamics of such a large assembly of interacting disordered proteins. Here we have performed a comprehensive bioinformatic analysis, specifically tailored towards disordered proteins, on thousands of nuclear pore proteins from a variety of species, revealing a set of highly conserved features in the sequence structure among FG nups. Contrary to the general perception that these proteins are functionally equivalent to homogeneous polymers, we show that biophysically important features within individual nups, like the separation, spatial localization and ordering along the chain of FG and charge domains, are highly conserved. Our current understanding of NPC structure and function should therefore be revised to account for these common features that are functionally relevant for the underlying physical mechanism of NPC gating.
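    Two of the features named above, the spacing of FG repeats and the distribution of charge along the chain, can be extracted from a raw sequence with a few lines of code. The sketch below is a toy illustration and not the authors' pipeline: it assumes FG repeats are simply the dipeptide "FG", scores charge crudely as +1 for K/R and -1 for D/E (histidine ignored), and the window size and function names are hypothetical.

```python
def fg_positions(seq):
    """0-based start indices of every 'FG' dipeptide in a protein sequence."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "FG"]


def fg_spacings(seq):
    """Distances between consecutive FG dipeptides."""
    pos = fg_positions(seq)
    return [b - a for a, b in zip(pos, pos[1:])]


def net_charge_profile(seq, window=10):
    """Net charge in each sliding window: +1 per K/R, -1 per D/E (His ignored)."""
    charge = {"K": 1, "R": 1, "D": -1, "E": -1}
    values = [charge.get(aa, 0) for aa in seq]
    return [sum(values[i:i + window]) for i in range(len(seq) - window + 1)]


print(fg_positions("MFGSAFGTTKKFG"), fg_spacings("MFGSAFGTTKKFG"))
```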

  8. The Runt domain of AML1 (RUNX1) binds a sequence-conserved RNA motif that mimics a DNA element

    PubMed Central

    Fukunaga, Junichi; Nomura, Yusuke; Tanaka, Yoichiro; Amano, Ryo; Tanaka, Taku; Nakamura, Yoshikazu; Kawai, Gota; Sakamoto, Taiichi; Kozu, Tomoko

    2013-01-01

    AML1 (RUNX1) is a key transcription factor for hematopoiesis that binds to the Runt-binding double-stranded DNA element (RDE) of target genes through its N-terminal Runt domain. Aberrations in the AML1 gene are frequently found in human leukemia. To better understand AML1 and its potential utility for diagnosis and therapy, we obtained RNA aptamers that bind specifically to the AML1 Runt domain. Enzymatic probing and NMR analyses revealed that Apt1-S, which is a truncated variant of one of the aptamers, has a CACG tetraloop and two stem regions separated by an internal loop. All the isolated aptamers were found to contain the conserved sequence motif 5′-NNCCAC-3′ and 5′-GCGMGN′N′-3′ (M: A or C; N and N′ form Watson–Crick base pairs). The motif contains one A·C mismatch and one bulged-out base. Mutational analysis of Apt1-S showed that three guanines of the motif are important for Runt binding, as are the three guanines of RDE, which are directly recognized by three arginine residues of the Runt domain. Mutational analyses of the Runt domain revealed that the amino acid residues used for Apt1-S binding were similar to those used for RDE binding. Furthermore, the aptamer competed with RDE for binding to the Runt domain in vitro. These results demonstrated that the Runt domain of the AML1 protein binds to the motif of the aptamer that mimics DNA. Our findings should provide new insights into RNA function and utility in both basic and applied sciences. PMID:23709277

  9. FastMotif: spectral sequence motif discovery.

    PubMed

    Colombo, Nicoló; Vlassis, Nikos

    2015-08-15

    Sequence discovery tools play a central role in several fields of computational biology. In the framework of transcription factor binding studies, most of the existing motif finding algorithms are computationally demanding, and they may not be able to support the increasingly large datasets produced by modern high-throughput sequencing technologies. We present FastMotif, a new motif discovery algorithm that is built on a recent machine learning technique known as the method of moments. Based on spectral decompositions, our method is robust to model misspecifications and is not prone to locally optimal solutions. We obtain an algorithm that is extremely fast and designed for the analysis of big sequencing data. On HT-SELEX data, FastMotif extracts motif profiles that match those computed by various state-of-the-art algorithms, but one order of magnitude faster. We provide a theoretical and numerical analysis of the algorithm's robustness and discuss its sensitivity with respect to the free parameters. The Matlab code of FastMotif is available from http://lcsb-portal.uni.lu/bioinformatics. Contact: vlassis@adobe.com. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  10. Analysis of Cytochrome P450 Conserved Sequence Motifs between Helices E and H: Prediction of Critical Motifs and Residues in Enzyme Functions

    PubMed Central

    Oezguen, Numan; Kumar, Santosh

    2014-01-01

    Rational approaches have been extensively used to investigate the role of active site residues in cytochrome P450 (CYP) functions. However, recent studies using random mutagenesis suggest an important role for non-active site residues in CYP functions. Meta-analysis of the random mutants showed that 75% of the functionally important non-active site residues are present in the region between helices E and H (E-H), which makes up 20% of the entire protein, and in conserved sequence motifs (CSMs) 7 through 11. The CSM approach was developed recently to investigate the functional role of non-active site residues in CYP2B4. Furthermore, we identified and analyzed the CSMs in the E-H region of multiple CYP families and subfamilies. Results from CSM analysis showed that CSMs 7, 8, 10, and 11 are conserved in the CYP1, CYP2, and CYP3 families, while CSM 9 is conserved only in the CYP2 family. Analysis of different CYP2 subfamilies showed that CYP2B and CYP2C have similar characteristics in the CSMs, while the characteristics of the CYP2A and CYP2D subfamilies are different. Finally, we analyzed CSMs 7, 8, 10, and 11, which are common to all the CYP families/subfamilies analyzed, in fifteen important drug-metabolizing CYPs. The results showed that while CSM 8 is the most conserved among these CYPs, CSMs 7, 9, and 10 have significant variations. We suggest that CSM 8 has a common role in all the CYPs that have been analyzed, while CSMs 7, 10, and 11 may have relatively subfamily-specific roles. We further suggest that these CSMs play an important role in opening and closing the substrate access/egress channel by modulating the flexible/plastic region of the protein. Thus, site-directed mutagenesis of these CSMs can be used to study structure-function and dynamic/plasticity-function relationships and to design CYP biocatalysts. PMID:25426333

  11. Alignment of U3 region sequences of mammalian type C viruses: identification of highly conserved motifs and implications for enhancer design.

    PubMed Central

    Golemis, E A; Speck, N A; Hopkins, N

    1990-01-01

    We aligned published sequences for the U3 region of 35 type C mammalian retroviruses. The alignment reveals that certain sequence motifs within the U3 region are strikingly conserved. A number of these motifs correspond to previously identified sites. In particular, we found that the enhancer region of most of the viruses examined contains a binding site for leukemia virus factor b, a viral corelike element, the consensus motif for nuclear factor 1, and the glucocorticoid response element. Most viruses containing more than one copy of enhancer sequences include these binding sites in both copies of the repeat. We consider this set of binding sites to constitute a framework for the enhancers of this set of viruses. Other highly conserved motifs in the U3 region include the retrovirus inverted repeat sequence, a negative regulatory element, and the CCAAT and TATA boxes. In addition, we identified two novel motifs in the promoter region that were exceptionally highly conserved but have not been previously described. PMID:2153223

  12. Linear array of conserved sequence motifs to discriminate protein subfamilies: study on pyridine nucleotide-disulfide reductases

    PubMed Central

    Avila, César L; Rapisarda, Viviana A; Farías, Ricardo N; De Las Rivas, Javier; Chehín, Rosana

    2007-01-01

    Background The pyridine nucleotide disulfide reductase (PNDR) is a large and heterogeneous protein family divided into two classes (I and II), which reflect the divergent evolution of its characteristic disulfide redox active site. However, not all PNDR members fit into these categories, which suggests the need for further studies to achieve a more comprehensive classification of this complex family. Results We designed a workflow that improves the clustering of protein families based on their arrays of linear conserved motifs. Applied to the large PNDR family, the method finds two main groups, which correspond to PNDR classes I and II. However, two other separate protein clusters, previously classified as class I in most databases, are outgrouped: the peroxide reductases (NAOX, NAPE) and the type II NADH dehydrogenases (NDH-2). We therefore propose two novel PNDR classes, III and IV, for NAOX/NAPE and NDH-2, respectively. Through knowledge-driven analysis of biochemical and functional data for the new class IV, a linear array of motifs putatively related to Cu(II)-reductase activity is detected in a specific subset of NDH-2. Conclusion The results presented are a novel contribution to the classification of the complex and large PNDR protein family, supporting its reclustering into four classes. The linear array of motifs detected within the class IV PNDR subfamily could be useful as a signature for a particular subgroup of NDH-2. PMID:17367536
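    One way to picture a "linear array of conserved motifs" is as the ordered list of motifs encountered when reading a protein from N- to C-terminus; proteins sharing the same array then fall into the same group. The sketch below illustrates only that idea and is not the authors' workflow. The motif names and regular expressions (`m1`, `m2`) and the toy sequences are hypothetical placeholders, and grouping is by exact array identity rather than any clustering method from the paper.

```python
import re
from collections import defaultdict


def motif_array(seq, motifs):
    """Ordered tuple of motif names as they occur along `seq`, left to right."""
    hits = []
    for name, regex in motifs.items():
        for m in re.finditer(regex, seq):
            hits.append((m.start(), name))
    return tuple(name for _, name in sorted(hits))


def group_by_array(proteins, motifs):
    """Group protein IDs whose motif arrays are identical."""
    groups = defaultdict(list)
    for pid, seq in proteins.items():
        groups[motif_array(seq, motifs)].append(pid)
    return dict(groups)


# Hypothetical motifs and toy sequences, for illustration only
MOTIFS = {"m1": "GG.GG", "m2": "C..C"}
PROTEINS = {"p1": "AGGAGGTTCAACT", "p2": "CAACTTGGTGGA", "p3": "GGAGGCTTC"}
print(group_by_array(PROTEINS, MOTIFS))
```

Because the array records order as well as presence, p2 (m2 before m1) separates from p1 and p3 (m1 before m2) even though all three contain both motifs.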

  13. rMotifGen: random motif generator for DNA and protein sequences.

    PubMed

    Rouchka, Eric C; Hardin, C Timothy

    2007-08-07

    Detection of short, subtle conserved motif regions within a set of related DNA or amino acid sequences can lead to discoveries about important regulatory domains such as transcription factor and DNA binding sites as well as conserved protein domains. In order to help assess motif detection algorithms on motifs with varying properties and levels of conservation, we have developed a computational tool, rMotifGen, with the sole purpose of generating a number of random DNA or protein sequences containing short sequence motifs. Each motif consensus can be user-defined, randomly generated, or created from a position-specific scoring matrix (PSSM). Insertions and mutations within these motifs are created according to user-defined parameters and substitution matrices. The resulting sequences can be helpful in mutational simulations and in testing the limits of motif detection algorithms. Two implementations of rMotifGen have been created, one providing a graphical user interface (GUI) for random motif construction, and the other serving as a command line interface. The second implementation has the added advantages of platform independence and being able to be called in a batch mode. rMotifGen was used to construct sample sets of sequences containing DNA motifs and amino acid motifs that were then tested against the Gibbs sampler and MEME packages. rMotifGen provides an efficient and convenient method for creating random DNA or amino acid sequences with a variable number of motifs, where the instance of each motif can be incorporated using a position-specific scoring matrix (PSSM) or by creating an instance mutated from its corresponding consensus using an evolutionary model based on substitution matrices. rMotifGen is freely available at: http://bioinformatics.louisville.edu/brg/rMotifGen/.
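    The core idea behind a generator like rMotifGen, random background sequences with a planted and optionally mutated motif instance whose location is known, can be sketched in a few lines. This is a simplified stand-in, not rMotifGen itself: it plants exactly one instance per sequence, draws background letters uniformly, and applies uniform substitutions at a hypothetical `mutation_rate` instead of insertions, PSSM sampling or substitution matrices.

```python
import random


def plant_motif(n_seqs, seq_len, consensus, mutation_rate=0.1,
                alphabet="ACGT", seed=None):
    """Random sequences, each carrying one (possibly mutated) copy of `consensus`.

    Returns (sequences, start_positions). Background letters are i.i.d.
    uniform; each motif letter is replaced by a different random letter
    with probability `mutation_rate`.
    """
    if len(consensus) > seq_len:
        raise ValueError("motif longer than sequence")
    rng = random.Random(seed)
    seqs, positions = [], []
    for _ in range(n_seqs):
        background = [rng.choice(alphabet) for _ in range(seq_len)]
        instance = [
            rng.choice([a for a in alphabet if a != c])
            if rng.random() < mutation_rate else c
            for c in consensus
        ]
        start = rng.randrange(seq_len - len(consensus) + 1)
        background[start:start + len(consensus)] = instance
        seqs.append("".join(background))
        positions.append(start)
    return seqs, positions


seqs, starts = plant_motif(5, 50, "TGACTCA", mutation_rate=0.0, seed=1)
print(starts)
```

Keeping the true start positions alongside the sequences is what makes such data useful for benchmarking: a motif finder's predictions can be scored directly against them.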

  14. Conserved sequence motifs upstream from the co-ordinately expressed vitellogenin and apoVLDLII genes of chicken.

    PubMed

    van het Schip, F; Strijker, R; Samallo, J; Gruber, M; Geert, A B

    1986-11-11

    The vitellogenin and apoVLDLII yolk protein genes of chicken are transcribed in the liver upon estrogenization. To obtain information on putative regulatory elements, we compared more than 2 kb of their 5' flanking DNA sequences. Common sequence motifs were found in regions exhibiting estrogen-induced changes in chromatin structure. Stretches of alternating pyrimidines and purines about 30 nucleotides long are present at roughly similar positions. A distinct box of sequence homology in the chicken genes also appears to be present at a similar position in front of the vitellogenin genes of Xenopus laevis, but is absent from the estrogen-responsive egg-white protein genes expressed in the oviduct. In front of the vitellogenin (position -595) and the VLDLII gene (position -548), a DNA element of about 300 base pairs was found, which possesses structural characteristics of a mobile genetic element and bears homology to the transposon-like Vi element of Xenopus laevis.

  15. rMotifGen: random motif generator for DNA and protein sequences

    PubMed Central

    Rouchka, Eric C; Hardin, C Timothy

    2007-01-01

    Background Detection of short, subtle conserved motif regions within a set of related DNA or amino acid sequences can lead to discoveries about important regulatory domains such as transcription factor and DNA binding sites as well as conserved protein domains. In order to help assess motif detection algorithms on motifs with varying properties and levels of conservation, we have developed a computational tool, rMotifGen, with the sole purpose of generating a number of random DNA or protein sequences containing short sequence motifs. Each motif consensus can be user-defined, randomly generated, or created from a position-specific scoring matrix (PSSM). Insertions and mutations within these motifs are created according to user-defined parameters and substitution matrices. The resulting sequences can be helpful in mutational simulations and in testing the limits of motif detection algorithms. Results Two implementations of rMotifGen have been created, one providing a graphical user interface (GUI) for random motif construction, and the other serving as a command line interface. The second implementation has the added advantages of platform independence and being able to be called in a batch mode. rMotifGen was used to construct sample sets of sequences containing DNA motifs and amino acid motifs that were then tested against the Gibbs sampler and MEME packages. Conclusion rMotifGen provides an efficient and convenient method for creating random DNA or amino acid sequences with a variable number of motifs, where the instance of each motif can be incorporated using a position-specific scoring matrix (PSSM) or by creating an instance mutated from its corresponding consensus using an evolutionary model based on substitution matrices. rMotifGen is freely available at: . PMID:17683637

  16. ICAP-1, a Novel β1 Integrin Cytoplasmic Domain–associated Protein, Binds to a Conserved and Functionally Important NPXY Sequence Motif of β1 Integrin

    PubMed Central

    Chang, David D.; Wong, Carol; Smith, Healy; Liu, Jenny

    1997-01-01

    The cytoplasmic domains of integrins are essential for cell adhesion. We report the identification of a novel protein, ICAP-1 (integrin cytoplasmic domain–associated protein-1), which binds to the β1 integrin cytoplasmic domain. The interaction between ICAP-1 and β1 integrins is highly specific, as demonstrated by the lack of interaction between ICAP-1 and the cytoplasmic domains of other β integrins, and requires a conserved and functionally important NPXY sequence motif found in the COOH-terminal region of the β1 integrin cytoplasmic domain. Mutational studies reveal that Asn and Tyr of the NPXY motif and a Val residue located NH2-terminal to this motif are critical for the ICAP-1 binding. Two isoforms of ICAP-1, a 200–amino acid protein (ICAP-1α) and a shorter 150–amino acid protein (ICAP-1β), derived from alternatively spliced mRNA, are expressed in most cells. ICAP-1α is a phosphoprotein and the extent of its phosphorylation is regulated by the cell–matrix interaction. First, an enhancement of ICAP-1α phosphorylation is observed when cells are plated on fibronectin-coated but not on nonspecific poly-L-lysine–coated surfaces. Second, the expression of a constitutively activated RhoA protein that disrupts the cell–matrix interaction results in dephosphorylation of ICAP-1α. The regulation of ICAP-1α phosphorylation by the cell–matrix interaction suggests an important role of ICAP-1 during integrin-dependent cell adhesion. PMID:9281591

  17. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    SciTech Connect

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to the start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, the abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by patterns in
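    Comparing CNSs with "random sequences with the same nucleotide composition" amounts to asking whether a short motif occurs more or less often than its base composition alone predicts. A simple version of that ratio is sketched below under stated assumptions: expected counts come from a zero-order model (independent positions with the sequence's own base frequencies), which is a simplification and not necessarily the background model used in the study. The function name and toy sequence are hypothetical. A CG ratio well below 1 is the signature of the CpG avoidance described above.

```python
from collections import Counter
from itertools import product


def kmer_obs_exp(seq, k=2):
    """Observed/expected ratio for every ACGT k-mer in `seq`.

    Expected count = (number of k-length windows) * product of the
    sequence's own mononucleotide frequencies (zero-order background).
    """
    seq = seq.upper()
    n_windows = len(seq) - k + 1
    base_freq = {b: seq.count(b) / len(seq) for b in "ACGT"}
    observed = Counter(seq[i:i + k] for i in range(n_windows))
    ratios = {}
    for kmer in map("".join, product("ACGT", repeat=k)):
        expected = n_windows
        for b in kmer:
            expected *= base_freq[b]
        ratios[kmer] = observed[kmer] / expected if expected > 0 else float("nan")
    return ratios


ratios = kmer_obs_exp("AATTGGCCAATTGGCC")
print(ratios["CG"], ratios["GC"], round(ratios["AA"], 3))
```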

  18. Parametric bootstrapping for biological sequence motifs.

    PubMed

    O'Neill, Patrick K; Erill, Ivan

    2016-10-06

    Biological sequence motifs drive the specific interactions of proteins and nucleic acids. Accordingly, the effective computational discovery and analysis of such motifs is a central theme in bioinformatics. Many practical questions about the properties of motifs can be recast as random sampling problems. In this light, the task is to determine for a given motif whether a certain feature of interest is statistically unusual among relevantly similar alternatives. Despite the generality of this framework, its use has been frustrated by the difficulties of defining an appropriate reference class of motifs for comparison and of sampling from it effectively. We define two distributions over the space of all motifs of given dimension. The first is the maximum entropy distribution subject to mean information content, and the second is the truncated uniform distribution over all motifs having information content within a given interval. We derive exact sampling algorithms for each. As a proof of concept, we employ these sampling methods to analyze a broad collection of prokaryotic and eukaryotic transcription factor binding site motifs. In addition to positional information content, we consider the informational Gini coefficient of the motif, a measure of the degree to which information is evenly distributed throughout a motif's positions. We find that both prokaryotic and eukaryotic motifs tend to exhibit higher informational Gini coefficients (IGC) than would be expected by chance under either reference distribution. As a second application, we apply maximum entropy sampling to the motif p-value problem and use it to give elementary derivations of two new estimators. Despite the historical centrality of biological sequence motif analysis, to our knowledge this study constitutes the first use of principled null hypotheses for sequence motifs given information content. Through their use, we are able to characterize for the first time differences in global motif statistics
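    The informational Gini coefficient combines two standard quantities: the information content of each motif column and the Gini coefficient of those values. The sketch below assumes a uniform 0.25 background for DNA and the population form of the Gini coefficient (mean absolute pairwise difference divided by twice the mean); the paper's exact normalization may differ, and the function names are hypothetical. An IGC of 0 means information is spread evenly across positions; larger values mean it is concentrated in a few positions.

```python
import math


def column_ic(column, background=0.25):
    """Information content (bits) of one column of letter probabilities
    (A, C, G, T) relative to a uniform background."""
    return sum(p * math.log2(p / background) for p in column if p > 0)


def gini(values):
    """Population Gini coefficient: sum of |x_i - x_j| over all ordered pairs / (2 n^2 mean)."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    total = sum(abs(a - b) for a in values for b in values)
    return total / (2 * n * n * mean)


def informational_gini(motif):
    """`motif` is a list of columns, each a list of 4 probabilities."""
    return gini([column_ic(col) for col in motif])


even = [[0.7, 0.1, 0.1, 0.1]] * 4                       # same IC in every column
skewed = [[1.0, 0.0, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25]]  # 2 bits vs 0 bits
print(informational_gini(even), informational_gini(skewed))
```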

  19. Association of Arabidopsis type-II ROPs with the plasma membrane requires a conserved C-terminal sequence motif and a proximal polybasic domain.

    PubMed

    Lavy, Meirav; Yalovsky, Shaul

    2006-06-01

    Plant ROPs (or RACs) are soluble Ras-related small GTPases that are attached to cell membranes by virtue of the post-translational lipid modifications of prenylation and S-acylation. ROPs (RACs) are subdivided into two major subgroups called type-I and type-II. Whereas type-I ROPs terminate with a conserved CaaL box and undergo prenylation, type-II ROPs undergo S-acylation on two or three C-terminal cysteines. In the present work we determined the sequence requirement for association of Arabidopsis type-II ROPs with the plasma membrane. We identified a conserved sequence motif, designated the GC-CG box, in which the modified cysteines are flanked by glycines. The GC-CG box cysteines are separated by five to six mostly non-polar residues. Deletion of this sequence or the introduction of mutations that change its nature disrupted the association of ROPs with the membrane. Mutations that changed the GC-CG box glycines to alanines also interfered with membrane association. Deletion of a polybasic domain proximal to the GC-CG box disrupted the plasma membrane association of AtROP10. A green fluorescent protein fusion protein containing the C-terminal 25 residues of AtROP10, including its polybasic domain and GC-CG box, was primarily associated with the plasma membrane but a similar fusion protein lacking the polybasic domain was exclusively localized in the soluble fraction. These data provide evidence for the minimal sequence required for plasma membrane association of type-II ROPs in Arabidopsis and other plant species.

  20. A conserved sequence extending motif III of the motor domain in the Snf2-family DNA translocase Rad54 is critical for ATPase activity.

    PubMed

    Zhang, Xiao-Ping; Janke, Ryan; Kingsley, James; Luo, Jerry; Fasching, Clare; Ehmsen, Kirk T; Heyer, Wolf-Dietrich

    2013-01-01

    Rad54 is a dsDNA-dependent ATPase that translocates on duplex DNA. Its ATPase function is essential for homologous recombination, a pathway critical for meiotic chromosome segregation, repair of complex DNA damage, and recovery of stalled or broken replication forks. In recombination, Rad54 cooperates with Rad51 protein and is required to dissociate Rad51 from heteroduplex DNA to allow access by DNA polymerases for recombination-associated DNA synthesis. Sequence analysis revealed that Rad54 contains a perfect match to the consensus PIP box sequence, a widely spread PCNA interaction motif. Indeed, Rad54 interacts directly with PCNA, but this interaction is not mediated by the Rad54 PIP box-like sequence. This sequence is located as an extension of motif III of the Rad54 motor domain and is essential for full Rad54 ATPase activity. Mutations in this motif render Rad54 non-functional in vivo and severely compromise its activities in vitro. Further analysis demonstrated that such mutations affect dsDNA binding, consistent with the location of this sequence motif on the surface of the cleft formed by two RecA-like domains, which likely forms the dsDNA binding site of Rad54. Our study identified a novel sequence motif critical for Rad54 function and showed that even perfect matches to the PIP box consensus may not necessarily identify PCNA interaction sites.
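A sequence scan like the one that flagged the Rad54 PIP box-like sequence can be sketched with a regular expression. The consensus used below, Q-x-x-[ILM]-x-x-[FY]-[FY], is the commonly cited PIP box pattern. It is supplied from general knowledge rather than taken from the abstract. As the study shows, a hit only marks a candidate and does not by itself establish a PCNA interaction site.

```python
import re

# Assumed PIP box consensus: Q-x-x-[ILM]-x-x-[FY]-[FY] (x = any residue).
# The lookahead makes each match zero-width so overlapping hits are all reported.
PIP_BOX = re.compile(r"(?=(Q..[ILM]..[FY][FY]))")

def find_pip_boxes(protein):
    """Return (0-based start, matched 8-mer) for every PIP box-like match."""
    return [(m.start(), m.group(1)) for m in PIP_BOX.finditer(protein.upper())]

print(find_pip_boxes("MKQTSIKDFFAA"))  # [(2, 'QTSIKDFF')]
```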

  1. Motif Yggdrasil: sampling sequence motifs from a tree mixture model.

    PubMed

    Andersson, Samuel A; Lagergren, Jens

    2007-06-01

    In phylogenetic foot-printing, putative regulatory elements are found in upstream regions of orthologous genes by searching for common motifs. Motifs in different upstream sequences are subject to mutations along the edges of the corresponding phylogenetic tree; consequently, taking advantage of the tree in the motif search is an appealing idea. We describe the Motif Yggdrasil sampler, the first Gibbs sampler based on a general tree that uses unaligned sequences. Previous tree-based Gibbs samplers have assumed a star-shaped tree or partially aligned upstream regions. We give a probabilistic model (MY model) describing upstream sequences with regulatory elements and build a Gibbs sampler with respect to this model. The model allows toggling, i.e., the restriction of a position to a subset of nucleotides, but requires neither aligned sequences nor edge lengths, which may be difficult to come by. We apply the collapsing technique to eliminate the need to sample nuisance parameters, and give a derivation of the predictive update formula. We show that the MY model improves the modeling of difficult motif instances and that the use of the tree achieves a substantial increase in nucleotide level correlation coefficient, both for synthetic data and for 37 bacterial lexA genes. We investigate the sensitivity to errors in the tree and show that, even with random trees, the MY sampler performs similarly to the original version.
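The collapsing technique mentioned above integrates out the per-column base frequencies and leaves a simple predictive update. The sketch below shows the generic Dirichlet-multinomial form used by collapsed motif Gibbs samplers. It is not the MY model's tree-aware formula, and the pseudocount values are hypothetical.

```python
def predictive_probs(counts, alpha):
    """Collapsed (Dirichlet-multinomial) predictive distribution for one motif column:
    P(b | counts) = (n_b + alpha_b) / (N + sum(alpha))."""
    total = sum(counts.values()) + sum(alpha.values())
    return {b: (counts.get(b, 0) + a) / total for b, a in alpha.items()}

def site_probability(site, column_counts, alpha):
    """Predictive probability of a candidate site: one factor per motif column,
    where column_counts[i] holds base counts from the *other* sites at column i."""
    prob = 1.0
    for base, counts in zip(site, column_counts):
        prob *= predictive_probs(counts, alpha)[base]
    return prob

alpha = {b: 0.5 for b in "ACGT"}  # hypothetical symmetric pseudocounts
print(predictive_probs({"A": 3, "C": 1}, alpha))  # A: 3.5/6, C: 1.5/6, G and T: 0.5/6
```

In a sampler, the current site of one sequence is removed from the counts, and each candidate start position is scored with `site_probability`. A new site is then drawn in proportion to those scores.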

  2. Detecting correlations among functional-sequence motifs

    NASA Astrophysics Data System (ADS)

    Pirino, Davide; Rigosa, Jacopo; Ledda, Alice; Ferretti, Luca

    2012-06-01

    Sequence motifs are words of nucleotides in DNA with biological functions, e.g., gene regulation. Such words are identified by rejecting Markov models of their expected frequency along the genome. Additional biological information can be extracted from the correlation structure among patterns of motif occurrences. In this paper a log-linear multivariate intensity Poisson model is estimated via expectation maximization on a set of motifs along the genome of E. coli K12. The proposed approach allows for excitatory as well as inhibitory interactions among motifs and between motifs and other genomic features like gene occurrences. Our findings confirm previous stylized facts about such types of interactions and shed new light on genome-maintenance functions of some particular motifs. We expect these methods to be applicable to a wider set of genomic features.
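The baseline mentioned above, comparing a word's observed frequency with its expectation under a Markov model, can be sketched for the simplest case. This is an order-0 model with the genome's own base composition, not the paper's multivariate Poisson model. Higher-order Markov backgrounds follow the same observed-versus-expected logic.

```python
from collections import Counter

def observed_count(genome, word):
    """Number of (possibly overlapping) occurrences of `word` in `genome`."""
    return sum(1 for i in range(len(genome) - len(word) + 1) if genome.startswith(word, i))

def expected_count_order0(genome, word):
    """Expected occurrences if bases were drawn independently with the genome's own
    base composition (an order-0 Markov model): (N - L + 1) * prod_i p(w_i)."""
    composition = Counter(genome)
    n = len(genome)
    p = 1.0
    for base in word:
        p *= composition[base] / n
    return max(0, n - len(word) + 1) * p

genome = "ACGTACGT"
print(observed_count(genome, "ACG"), expected_count_order0(genome, "ACG"))  # 2 0.09375
```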

  3. MotifHyades: Expectation Maximization for de novo DNA Motif Pair Discovery on Paired Sequences.

    PubMed

    Wong, Ka-Chun

    2017-06-13

    In higher eukaryotes, protein-DNA binding interactions are the central activities in gene regulation. In particular, DNA motifs such as transcription factor binding sites are the key components in gene transcription. Harnessing the recently available chromatin interaction data, computational methods are needed to systematically identify the coupled DNA motif pairs enriched on long-range chromatin-interacting sequence pairs (e.g. promoter-enhancer pairs). To fill the void, a novel probabilistic model (namely, MotifHyades) is proposed and developed for de novo DNA motif pair discovery on paired sequences. In particular, two expectation maximization algorithms are derived for efficient model training with linear computational complexity. Under diverse scenarios, MotifHyades is shown to be faster and more accurate than the existing ad hoc computational pipeline. In addition, MotifHyades is applied to discover thousands of DNA motif pairs with higher gold standard motif matching ratio, higher DNase accessibility, and higher evolutionary conservation than the previous ones in the human K562 cell line. Lastly, it has been run on five other human cell lines (i.e. GM12878, HeLa-S3, HUVEC, IMR90, and NHEK), revealing thousands more novel DNA motif pairs, which are characterized across a broad spectrum of genomic features on long-range promoter-enhancer pairs. The matrix-algebra-optimized versions of MotifHyades and the discovered DNA motif pairs can be found in http://bioinfo.cs.cityu.edu.hk/MotifHyades . kc.w@cityu.edu.hk. Supplementary data are available at Bioinformatics online.

  4. Genomic analysis of membrane protein families: abundance and conserved motifs

    PubMed Central

    Liu, Yang; Engelman, Donald M; Gerstein, Mark

    2002-01-01

    Background Polytopic membrane proteins can be related to each other on the basis of the number of transmembrane helices and sequence similarities. Building on the Pfam classification of protein domain families, and using transmembrane-helix prediction and sequence-similarity searching, we identified a total of 526 well-characterized membrane protein families in 26 recently sequenced genomes. To this we added a clustering of a number of predicted but unclassified membrane proteins, resulting in a total of 637 membrane protein families. Results Analysis of the occurrence and composition of these families revealed several interesting trends. The number of assigned membrane protein domains has an approximately linear relationship to the total number of open reading frames (ORFs) in the 26 genomes studied. Caenorhabditis elegans is an apparent outlier, because of its high representation of seven-span transmembrane (7-TM) chemoreceptor families. In all genomes, including that of C. elegans, the number of distinct membrane protein families has a logarithmic relation to the number of ORFs. Glycine, proline, and tyrosine locations tend to be conserved in transmembrane regions within families, whereas isoleucine, valine, and methionine locations are relatively mutable. Analysis of motifs in putative transmembrane helices reveals that GxxxG and GxxxxxxG (which can be written GG4 and GG7, respectively; see Materials and methods) are among the most prevalent. This was noted in earlier studies; we now find, however, that these motifs are particularly well conserved in families, especially those corresponding to transporters, symporters, and channels. Conclusions We carried out a genome-wide analysis on patterns of the classified polytopic membrane protein families and analyzed the distribution of conserved amino acids and motifs in the transmembrane helix regions in these families. PMID:12372142
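The GG4 and GG7 notation above can be made concrete with a small counter. It assumes that predicted transmembrane segments are already available as one-letter residue strings. GG_k here means a glycine at position i and another at position i + k, so GG4 is GxxxG and GG7 is GxxxxxxG.

```python
import re

def count_gg(helix, k):
    """Count GG_k motifs (a Gly at position i and another at i + k) in one
    transmembrane segment; GG4 is GxxxG and GG7 is GxxxxxxG. Overlaps are counted."""
    return len(re.findall(r"(?=G.{%d}G)" % (k - 1), helix.upper()))

print(count_gg("LLGAAAGAAAGLL", 4), count_gg("GAAAAAAG", 7))  # 2 1
```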

  5. Genome-wide conserved consensus transcription factor binding motifs are hyper-methylated

    PubMed Central

    2010-01-01

    Background DNA methylation can regulate gene expression by modulating the interaction between DNA and proteins or protein complexes. Conserved consensus motifs exist across the human genome ("predicted transcription factor binding sites": "predicted TFBS"), but chromatin immunoprecipitation and high throughput sequencing (ChIP-seq) shows that the large majority of these are not biological transcription factor binding sites ("empirical TFBS"). We hypothesize that DNA methylation at conserved consensus motifs prevents promiscuous or disorderly transcription factor binding. Results Using genome-wide methylation maps of the human heart and sperm, we found that all conserved consensus motifs, as well as the subset of those that reside outside CpG islands, have an aggregate profile of hyper-methylation. In contrast, empirical TFBS with conserved consensus motifs have a profile of hypo-methylation. 40% of empirical TFBS with conserved consensus motifs resided in CpG islands, whereas only 7% of all conserved consensus motifs were in CpG islands. Finally, we identified a minority subset of TF whose profiles are either hypo-methylated or neutral at their respective conserved consensus motifs, implying that these TF may be responsible for establishing or maintaining an un-methylated DNA state, or that their binding is not regulated by DNA methylation. Conclusions Our analysis supports the hypothesis that, at least for a subset of TF, empirical binding to conserved consensus motifs genome-wide may be controlled by DNA methylation. PMID:20875111

  6. Discriminative motif discovery in DNA and protein sequences using the DEME algorithm.

    PubMed

    Redhead, Emma; Bailey, Timothy L

    2007-10-15

    Motif discovery aims to detect short, highly conserved patterns in a collection of unaligned DNA or protein sequences. Discriminative motif finding algorithms aim to increase the sensitivity and selectivity of motif discovery by utilizing a second set of sequences, and searching only for patterns that can differentiate the two sets of sequences. Potential applications of discriminative motif discovery include discovering transcription factor binding site motifs in ChIP-chip data and finding protein motifs involved in thermal stability using sets of orthologous proteins from thermophilic and mesophilic organisms. We describe DEME, a discriminative motif discovery algorithm for use with protein and DNA sequences. Input to DEME is two sets of sequences: a "positive" set and a "negative" set. DEME represents motifs using a probabilistic model, and uses a novel combination of global and local search to find the motif that optimally discriminates between the two sets of sequences. DEME is unique among discriminative motif finders in that it uses an informative Bayesian prior on protein motif columns, allowing it to incorporate prior knowledge of residue characteristics. We also introduce four synthetic discriminative motif discovery problems that are designed for evaluating discriminative motif finders in various biologically motivated contexts. We test DEME using these synthetic problems and on two biological problems: finding yeast transcription factor binding motifs in ChIP-chip data, and finding motifs that discriminate between groups of thermophilic and mesophilic orthologous proteins. Using artificial data, we show that DEME is more effective than a non-discriminative approach when there are "decoy" motifs or when a variant of the motif is present in the "negative" sequences. With real data, we show that DEME is as good as, but not better than, non-discriminative algorithms at discovering yeast transcription factor binding motifs. We also show that DEME can find

  7. Improved K-means clustering algorithm for exploring local protein sequence motifs representing common structural property.

    PubMed

    Zhong, Wei; Altun, Gulsah; Harrison, Robert; Tai, Phang C; Pan, Yi

    2005-09-01

    Information about local protein sequence motifs is very important to the analysis of biologically significant conserved regions of protein sequences. These conserved regions can potentially determine the diverse conformation and activities of proteins. In this work, recurring sequence motifs of proteins are explored with an improved K-means clustering algorithm on a new dataset. The structural similarity of the recurring sequence clusters that produce these sequence motifs is studied in order to evaluate the relationship between sequence motifs and their structures. To the best of our knowledge, the dataset used in our research is the most up-to-date dataset among similar studies of sequence motifs. A new greedy initialization method for the K-means algorithm is proposed to improve traditional K-means clustering techniques. The new initialization method tries to choose suitable initial points, which are well separated and have the potential to form high-quality clusters. Our experiments indicate that the improved K-means algorithm satisfactorily increases the percentage of sequence segments belonging to clusters with high structural similarity. Careful comparison of sequence motifs obtained by the improved and traditional algorithms also suggests that the improved K-means clustering algorithm may discover some relatively weak and subtle sequence motifs, which are undetectable by the traditional K-means algorithms. Many biochemical tests reported in the literature show that these sequence motifs are biologically meaningful. Experimental results also indicate that the improved K-means algorithm generates more detailed sequence motifs representing common structures than previous research. Furthermore, these motifs are universally conserved sequence patterns across protein families, overcoming some weak points of other popular sequence motifs. The satisfactory result of the experiment suggests that this new K-means algorithm may be applied to other areas of bioinformatics
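The idea of choosing well-separated initial points can be illustrated with a generic greedy "farthest-first" seeding. This is a stand-in for the idea only and not the authors' initialization method, which the abstract does not specify. It also assumes that sequence segments have already been encoded as numeric vectors, for example as flattened profile columns.

```python
import math

def farthest_first_init(points, k, first=0):
    """Greedy maximin seeding: start from points[first], then repeatedly add the
    point farthest from its nearest already-chosen center."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [points[first]]
    # distance from every point to its nearest chosen center so far
    nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=lambda i: nearest[i])
        centers.append(points[idx])
        nearest = [min(d, math.dist(p, points[idx])) for d, p in zip(nearest, points)]
    return centers

pts = [(0, 0), (1, 0), (10, 0), (0, 10)]
print(farthest_first_init(pts, 3))  # [(0, 0), (10, 0), (0, 10)]
```

The returned points would then seed ordinary K-means iterations. A quality filter on candidate seeds, such as requiring enough nearby points, could be added to reflect the "potential to form high-quality clusters" criterion.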

  8. The complete sequence of a Spanish isolate of Broad bean wilt virus 1 (BBWV-1) reveals a high variability and conserved motifs in the genus Fabavirus.

    PubMed

    Ferrer, R M; Guerri, J; Luis-Arteaga, M S; Moreno, P; Rubio, L

    2005-10-01

    The genome of a Spanish isolate of Broad bean wilt virus 1 (BBWV-1) was completely sequenced and compared with available sequences of other isolates of the genus Fabavirus (BBWV-1 and BBWV-2). The genome consisted of two RNAs of 5814 and 3431 nucleotides, respectively, and its organization was similar to that of other members of the family Comoviridae. Its mean nucleotide identity was 81.5% with a BBWV-1 American isolate and between 59.8 and 63.5% with seven BBWV-2 isolates. Our analysis showed sequence stretches in the 5' non-coding regions that are conserved in both genomic RNAs and in BBWV-1 and BBWV-2 isolates.

  9. Localization of the labile disulfide bond between SU and TM of the murine leukemia virus envelope protein complex to a highly conserved CWLC motif in SU that resembles the active-site sequence of thiol-disulfide exchange enzymes.

    PubMed Central

    Pinter, A; Kopelman, R; Li, Z; Kayman, S C; Sanders, D A

    1997-01-01

    Previous studies have indicated that the surface (SU) and transmembrane (TM) subunits of the envelope protein (Env) of murine leukemia viruses (MuLVs) are joined by a labile disulfide bond that can be stabilized by treatment of virions with thiol-specific reagents. In the present study this observation was extended to the Envs of additional classes of MuLV, and the cysteines of SU involved in this linkage were mapped by proteolytic fragmentation analyses to the CWLC sequence present at the beginning of the C-terminal domain of SU. This sequence is highly conserved across a broad range of distantly related retroviruses and resembles the CXXC motif present at the active site of thiol-disulfide exchange enzymes. A model is proposed in which rearrangements of the SU-TM intersubunit disulfide linkage, mediated by the CWLC sequence, play roles in the assembly and function of the Env complex. PMID:9311907

  10. Finding sequence motifs in groups of functionally related proteins.

    PubMed

    Smith, H O; Annau, T M; Chandrasegaran, S

    1990-01-01

    We have developed a method for rapidly finding patterns of conserved amino acid residues (motifs) in groups of functionally related proteins. All 3-amino acid patterns in a group of proteins of the type aa1 d1 aa2 d2 aa3, where d1 and d2 are distances that can be varied in a range up to 24 residues, are accumulated into an array. Segments of the proteins containing those patterns that occur most frequently are aligned on each other by a scoring method that obtains an average relatedness value for all the amino acids in each column of the aligned sequence block based on the Dayhoff relatedness odds matrix. The automated method successfully finds and displays nearly all of the sequence motifs that have been previously reported to occur in 33 reverse transcriptases, 18 DNA integrases, and 30 DNA methyltransferases.
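The accumulation step described above can be sketched as a dictionary count of aa1 d1 aa2 d2 aa3 patterns. The abstract does not say whether d1 and d2 are position offsets or numbers of intervening residues, or how repeats within one protein are counted. This sketch assumes offsets from 1 to 24 and counts each pattern at most once per protein. The toy sequences are hypothetical, and the Dayhoff-based alignment and scoring step is not shown.

```python
from collections import Counter

def triplet_patterns(proteins, max_d=24):
    """Count patterns aa1 d1 aa2 d2 aa3, where d1 and d2 (1..max_d) are taken here as
    position offsets (j - i and k - j). Each pattern is counted at most once per protein."""
    counts = Counter()
    for seq in proteins:
        seen = set()
        n = len(seq)
        for i in range(n):
            for d1 in range(1, max_d + 1):
                j = i + d1
                if j >= n:
                    break
                for d2 in range(1, max_d + 1):
                    k = j + d2
                    if k >= n:
                        break
                    seen.add((seq[i], d1, seq[j], d2, seq[k]))
        counts.update(seen)
    return counts

group = ["MKDGSAAD", "QKDGTAAD", "LRDGSPAD"]  # hypothetical toy sequences
print(triplet_patterns(group, max_d=4).most_common(3))
```

Patterns shared by most proteins in the group mark the segments that would then be aligned and scored.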

  11. WildSpan: mining structured motifs from protein sequences

    PubMed Central

    2011-01-01

    Background Automatic extraction of motifs from biological sequences is an important research problem in study of molecular biology. For proteins, it is desired to discover sequence motifs containing a large number of wildcard symbols, as the residues associated with functional sites are usually largely separated in sequences. Discovering such patterns is time-consuming because abundant combinations exist when long gaps (a gap consists of one or more successive wildcards) are considered. Mining algorithms often employ constraints to narrow down the search space in order to increase efficiency. However, improper constraint models might degrade the sensitivity and specificity of the motifs discovered by computational methods. We previously proposed a new constraint model to handle large wildcard regions for discovering functional motifs of proteins. The patterns that satisfy the proposed constraint model are called W-patterns. A W-pattern is a structured motif that groups motif symbols into pattern blocks interleaved with large irregular gaps. Considering large gaps reflects the fact that functional residues are not always from a single region of protein sequences, and restricting motif symbols into clusters corresponds to the observation that short motifs are frequently present within protein families. To efficiently discover W-patterns for large-scale sequence annotation and function prediction, this paper first formally introduces the problem to solve and proposes an algorithm named WildSpan (sequential pattern mining across large wildcard regions) that incorporates several pruning strategies to largely reduce the mining cost. Results WildSpan is shown to efficiently find W-patterns containing conserved residues that are far separated in sequences. We conducted experiments with two mining strategies, protein-based and family-based mining, to evaluate the usefulness of W-patterns and performance of WildSpan. 
The protein-based mining mode of WildSpan is developed for

  12. WildSpan: mining structured motifs from protein sequences.

    PubMed

    Hsu, Chen-Ming; Chen, Chien-Yu; Liu, Baw-Jhiune

    2011-03-31

    Automatic extraction of motifs from biological sequences is an important research problem in study of molecular biology. For proteins, it is desired to discover sequence motifs containing a large number of wildcard symbols, as the residues associated with functional sites are usually largely separated in sequences. Discovering such patterns is time-consuming because abundant combinations exist when long gaps (a gap consists of one or more successive wildcards) are considered. Mining algorithms often employ constraints to narrow down the search space in order to increase efficiency. However, improper constraint models might degrade the sensitivity and specificity of the motifs discovered by computational methods. We previously proposed a new constraint model to handle large wildcard regions for discovering functional motifs of proteins. The patterns that satisfy the proposed constraint model are called W-patterns. A W-pattern is a structured motif that groups motif symbols into pattern blocks interleaved with large irregular gaps. Considering large gaps reflects the fact that functional residues are not always from a single region of protein sequences, and restricting motif symbols into clusters corresponds to the observation that short motifs are frequently present within protein families. To efficiently discover W-patterns for large-scale sequence annotation and function prediction, this paper first formally introduces the problem to solve and proposes an algorithm named WildSpan (sequential pattern mining across large wildcard regions) that incorporates several pruning strategies to largely reduce the mining cost. WildSpan is shown to efficiently find W-patterns containing conserved residues that are far separated in sequences. We conducted experiments with two mining strategies, protein-based and family-based mining, to evaluate the usefulness of W-patterns and performance of WildSpan. The protein-based mining mode of WildSpan is developed for discovering

  13. Sequence-Based Screening for Rare Enzymes: New Insights into the World of AMDases Reveal a Conserved Motif and 58 Novel Enzymes Clustering in Eight Distinct Families

    PubMed Central

    Maimanakos, Janine; Chow, Jennifer; Gaßmeyer, Sarah K.; Güllert, Simon; Busch, Florian; Kourist, Robert; Streit, Wolfgang R.

    2016-01-01

    Arylmalonate Decarboxylases (AMDases, EC 4.1.1.76) are very rare and mostly underexplored enzymes. Currently only four known and biochemically characterized representatives exist. However, their ability to decarboxylate α-disubstituted malonic acid derivatives to optically pure products without cofactors makes them attractive and promising candidates for use as biocatalysts in industrial processes. Until now, AMDases could not be separated from other members of the aspartate/glutamate racemase superfamily based on their gene sequences. Within this work, a search algorithm was developed that enables a reliable prediction of AMDase activity for potential candidates. Based on specific sequence patterns and screening methods, 58 novel AMDase candidate genes could be identified in this work. AMDases with the conserved sequence pattern of the Bordetella bronchiseptica prototype appeared to be limited to the classes of Alpha-, Beta-, and Gammaproteobacteria. Amino acid homologies and comparison of gene surrounding sequences enabled the classification of eight enzyme clusters. Particularly striking is the accumulation of genes coding for different transporters of the tripartite tricarboxylate transporters family, TRAP transporters and ABC transporters, as well as genes coding for mandelate racemases/muconate lactonizing enzymes that might be involved in substrate uptake or degradation of AMDase products. Further, three novel AMDases were characterized which showed a high enantiomeric excess (>99%) of the (R)-enantiomer of flurbiprofen. These are the recombinant AmdA and AmdV from Variovorax sp. strains HH01 and HH02, which originated from soil, and AmdP from Polymorphum gilvum, found by a database search. Altogether our findings give new insights into the class of AMDases and reveal many previously unknown enzyme candidates with high potential for bioindustrial processes. PMID:27610105

  14. Identification of protein motifs using conserved amino acid properties and partitioning techniques

    SciTech Connect

    Wu, T.D.; Brutlag, D.L.

    1995-12-31

    Analyzing a set of protein sequences involves a fundamental relationship between the coherency of the set and the specificity of the motif that describes it. Motifs may be obscured by training sets that contain incoherent sequences, in part due to protein subclasses, contamination, or errors. We develop an algorithm for motif identification that systematically explores possible patterns of coherency within a set of protein sequences. Our algorithm constructs alternative partitions of the training set data, where one subset of each partition is presumed to contain coherent data and is used for forming a motif. The motif is represented by multiple overlapping amino acid groups based on evolutionary, biochemical, or physical properties. We demonstrate our method on a training set of reverse transcriptases that contains subclasses, sequence errors, misalignments, and contaminating sequences. Despite these complications, our program identifies a novel motif for the subclass of retroviral and retrovirus-related reverse transcriptases. This motif has a much higher specificity than previously reported motifs and suggests the importance of conserved hydrophilic and hydrophobic residues in the structure of reverse transcriptases.

  15. D-MATRIX: a web tool for constructing weight matrix of conserved DNA motifs.

    PubMed

    Sen, Naresh; Mishra, Manoj; Khan, Feroz; Meena, Abha; Sharma, Ashok

    2009-07-27

    Despite considerable efforts to date, DNA motif prediction in whole genomes remains a challenge for researchers. Current genome-wide motif prediction tools require either a direct pattern sequence (for a single motif) or a weight matrix (for multiple motifs). Although motif pattern databases and tools for genome-level prediction exist, there was no tool for weight matrix construction. Considering this, we developed D-MATRIX, a tool that predicts different types of weight matrix based on a user-defined aligned motif sequence set and motif width. For retrieval of known motif sequences, users can access commonly used databases such as TFD, RegulonDB, DBTBS, and Transfac. The D-MATRIX program uses a simple statistical approach for weight matrix construction, and the resulting matrix can be converted into different file formats according to user requirements. It makes it possible to identify conserved motifs in co-regulated genes or whole genomes. As an example, we successfully constructed the weight matrix of the LexA transcription factor binding site with the help of known SOS-box cis-regulatory elements in the Deinococcus radiodurans genome. The algorithm is implemented in C-Sharp and wrapped in ASP.Net to maintain a user-friendly web interface. The D-MATRIX tool is accessible through the CIMAP domain network. http://203.190.147.116/dmatrix/

  16. D-MATRIX: A web tool for constructing weight matrix of conserved DNA motifs

    PubMed Central

    Sen, Naresh; Mishra, Manoj; Khan, Feroz; Meena, Abha; Sharma, Ashok

    2009-01-01

    Despite considerable efforts to date, DNA motif prediction in whole genomes remains a challenge for researchers. Current genome-wide motif prediction tools require either a direct pattern sequence (for a single motif) or a weight matrix (for multiple motifs). Although motif pattern databases and tools for genome-level prediction exist, there was no tool for weight matrix construction. Considering this, we developed D-MATRIX, a tool that predicts different types of weight matrix based on a user-defined aligned motif sequence set and motif width. For retrieval of known motif sequences, users can access commonly used databases such as TFD, RegulonDB, DBTBS, and Transfac. The D-MATRIX program uses a simple statistical approach for weight matrix construction, and the resulting matrix can be converted into different file formats according to user requirements. It makes it possible to identify conserved motifs in co-regulated genes or whole genomes. As an example, we successfully constructed the weight matrix of the LexA transcription factor binding site with the help of known SOS-box cis-regulatory elements in the Deinococcus radiodurans genome. The algorithm is implemented in C-Sharp and wrapped in ASP.Net to maintain a user-friendly web interface. The D-MATRIX tool is accessible through the CIMAP domain network. Availability http://203.190.147.116/dmatrix/ PMID:19759861
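The abstract does not describe D-MATRIX's "simple statistical approach", so the following is only a generic sketch of how a weight matrix is commonly built from aligned motif sites. Counts per position are converted to log2-odds scores with a pseudocount and a uniform background. The pseudocount of 1 and the example sites are assumptions.

```python
import math

BASES = "ACGT"

def count_matrix(sites):
    """Per-position base counts (rows = positions, columns = A, C, G, T)."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("aligned sites must all have the same width")
    return [[sum(1 for s in sites if s[i].upper() == b) for b in BASES] for i in range(width)]

def log_odds_matrix(sites, pseudocount=1.0, background=0.25):
    """Log2-odds weights: log2(((count + pseudocount) / (N + 4 * pseudocount)) / background)."""
    n = len(sites)
    return [[math.log2((c + pseudocount) / (n + 4 * pseudocount) / background) for c in row]
            for row in count_matrix(sites)]

sites = ["ACGT", "ACGA", "ACGT"]  # hypothetical aligned sites
print(count_matrix(sites))  # [[3, 0, 0, 0], [0, 3, 0, 0], [0, 0, 3, 0], [1, 0, 0, 2]]
```

The pseudocount keeps bases that are unseen in a small site set from receiving a score of minus infinity.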

  17. Seed storage protein gene promoters contain conserved DNA motifs in Brassicaceae, Fabaceae and Poaceae

    PubMed Central

    Fauteux, François; Strömvik, Martina V

    2009-01-01

    Background Accurate computational identification of cis-regulatory motifs is difficult, particularly in eukaryotic promoters, which typically contain multiple short and degenerate DNA sequences bound by several interacting factors. Enrichment in combinations of rare motifs in the promoter sequence of functionally or evolutionarily related genes among several species is an indicator of conserved transcriptional regulatory mechanisms. This provides a basis for the computational identification of cis-regulatory motifs. Results We have used a discriminative seeding DNA motif discovery algorithm for an in-depth analysis of 54 seed storage protein (SSP) gene promoters from three plant families, namely Brassicaceae (mustards), Fabaceae (legumes) and Poaceae (grasses) using backgrounds based on complete sets of promoters from a representative species in each family, namely Arabidopsis (Arabidopsis thaliana (L.) Heynh.), soybean (Glycine max (L.) Merr.) and rice (Oryza sativa L.) respectively. We have identified three conserved motifs (two RY-like and one ACGT-like) in Brassicaceae and Fabaceae SSP gene promoters that are similar to experimentally characterized seed-specific cis-regulatory elements. Fabaceae SSP gene promoter sequences are also enriched in a novel, seed-specific E2Fb-like motif. Conserved motifs identified in Poaceae SSP gene promoters include a GCN4-like motif, two prolamin-box-like motifs and an Skn-1-like motif. Evidence of the presence of a variant of the TATA-box is found in the SSP gene promoters from the three plant families. Motifs discovered in SSP gene promoters were used to score whole-genome sets of promoters from Arabidopsis, soybean and rice. The highest-scoring promoters are associated with genes coding for different subunits or precursors of seed storage proteins. 
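Scoring promoter sets with discovered motifs, as described above, can be sketched by sliding a log-odds matrix along each promoter and ranking promoters by their best window. This single-motif, best-hit score is a simplifying assumption. The abstract does not say how multiple motifs were combined into one promoter score, and the toy matrix below is hypothetical.

```python
BASES = "ACGT"

def best_window(pwm, sequence):
    """Slide a log-odds PWM (rows = positions, columns = A, C, G, T) along `sequence`.
    Returns (best score, 0-based start); windows containing non-ACGT characters are skipped."""
    width = len(pwm)
    best_score, best_start = float("-inf"), -1
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width].upper()
        if any(b not in BASES for b in window):
            continue
        score = sum(row[BASES.index(b)] for row, b in zip(pwm, window))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start

def rank_promoters(pwm, promoters):
    """Order promoter IDs by their best single-window score, highest first."""
    return sorted(promoters, key=lambda pid: best_window(pwm, promoters[pid])[0], reverse=True)

toy_pwm = [[-1, 1, -1, -1], [1, -1, -1, -1]]  # hypothetical 2-position motif favoring "CA"
print(best_window(toy_pwm, "GGCAGG"))  # (2, 2)
print(rank_promoters(toy_pwm, {"p1": "GGGG", "p2": "GCAG"}))  # ['p2', 'p1']
```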
Conclusion Seed storage protein gene promoter motifs are conserved in diverse species, and different plant families are characterized by a distinct combination of conserved motifs

  18. Identification of imine reductase-specific sequence motifs.

    PubMed

    Fademrecht, Silvia; Scheller, Philipp N; Nestl, Bettina M; Hauer, Bernhard; Pleiss, Jürgen

    2016-05-01

    Chiral amines are valuable building blocks for the production of a variety of pharmaceuticals, agrochemicals and other specialty chemicals. Only recently, imine reductases (IREDs) were discovered which catalyze the stereoselective reduction of imines to chiral amines. Although several IREDs were biochemically characterized in the last few years, knowledge of the reaction mechanism and the molecular basis of substrate specificity and stereoselectivity is limited. To gain further insights into the sequence-function relationships, the Imine Reductase Engineering Database (www.IRED.BioCatNet.de) was established and a systematic analysis of 530 putative IREDs was performed. A standard numbering scheme based on R-IRED-Sk was introduced to facilitate the identification and communication of structurally equivalent positions in different proteins. A conservation analysis revealed a highly conserved cofactor binding region and a predominantly hydrophobic substrate binding cleft. Two IRED-specific motifs were identified, the cofactor binding motif GLGxMGx(5)[ATS]x(4)Gx(4)[VIL]WNR[TS]x(2)[KR] and the active site motif Gx[DE]x[GDA]x[APS]x(3){K}x[ASL]x[LMVIAG]. Our results indicate a preference toward NADPH for all IREDs and explain why, despite their sequence similarity to β-hydroxyacid dehydrogenases (β-HADs), no conversion of β-hydroxyacids has been observed. Superfamily-specific conservations were investigated to explore the molecular basis of their stereopreference. Based on our analysis and previous experimental results on IRED mutants, an exclusive role of standard position 187 for stereoselectivity is excluded. Alternatively, two standard positions 139 and 194 were identified which are superfamily-specifically conserved and differ in R- and S-selective enzymes.
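To search sequences for the two IRED motifs above, the pattern notation can be translated into a regular expression. The sketch assumes PROSITE-style conventions: x is any residue, x(n) is exactly n residues, [ATS] is one of the listed residues, and {K} is any residue except K. It also strips stray spaces left over from text extraction.

```python
import re

def motif_to_regex(pattern):
    """Translate a PROSITE-style motif string into a Python regular expression.
    Assumed notation: x = any residue, (n) = repeat previous element n times,
    [ABC] = one of A/B/C, {K} = any residue except K."""
    pattern = pattern.replace(" ", "")
    out, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":
            j = pattern.index("]", i)
            token = "[" + pattern[i + 1:j] + "]"
        elif ch == "{":
            j = pattern.index("}", i)
            token = "[^" + pattern[i + 1:j] + "]"
        else:
            j = i
            token = "." if ch == "x" else re.escape(ch)
        i = j + 1
        # optional repeat count such as (5) or (2,4)
        if i < len(pattern) and pattern[i] == "(":
            k = pattern.index(")", i)
            token += "{" + pattern[i + 1:k] + "}"
            i = k + 1
        out.append(token)
    return "".join(out)

active_site = motif_to_regex("Gx[DE]x[GDA]x[APS]x(3){K}x[ASL]x[LMVIAG]")
print(active_site)  # G.[DE].[GDA].[APS].{3}[^K].[ASL].[LMVIAG]
```

The resulting string can be passed to `re.search` to scan candidate IRED sequences. A hit suggests membership but does not by itself establish activity.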

  19. Conservation defines functional motifs in the squint/nodal-related 1 RNA dorsal localization element

    PubMed Central

    Gilligan, Patrick C.; Kumari, Pooja; Lim, Shimin; Cheong, Albert; Chang, Alex; Sampath, Karuna

    2011-01-01

    RNA localization is emerging as a general principle of sub-cellular protein localization and cellular organization. However, the sequence and structural requirements of many RNA localization elements remain poorly understood. Whereas transcription factor-binding sites in DNA can be recognized as short degenerate motifs, and consensus binding sites readily inferred, protein-binding sites in RNA often contain structural features and can be difficult to infer. We previously showed that zebrafish squint/nodal-related 1 (sqt/ndr1) RNA localizes to the future dorsal side of the embryo. Interestingly, mammalian nodal RNA can also localize dorsally when injected into zebrafish embryos, suggesting that the sequence motif(s) may be conserved, even though the fish and mammal UTRs cannot be aligned. To define potential sequence and structural features, we obtained ndr1 3′-UTR sequences from approximately 50 fishes that are closely or distantly related to zebrafish, for high-resolution phylogenetic footprinting. We identify conserved sequence and structural motifs within the zebrafish/carp family and catfish. We find that two novel motifs, a single-stranded AGCAC motif and a small stem-loop, are required for efficient sqt RNA localization. These findings show that comparative sequencing in the zebrafish/carp family is an efficient approach for identifying weak consensus binding sites for RNA regulatory proteins. PMID:21149265

  20. The highly conserved amino acid sequence motif Tyr-Gly-Asp-Thr-Asp-Ser in alpha-like DNA polymerases is required by phage phi 29 DNA polymerase for protein-primed initiation and polymerization.

    PubMed Central

    Bernad, A; Lázaro, J M; Salas, M; Blanco, L

    1990-01-01

    The alpha-like DNA polymerases from bacteriophage phi 29 and other viruses, prokaryotes and eukaryotes contain an amino acid consensus sequence that has been proposed to form part of the dNTP binding site. We have used site-directed mutants to study five of the six highly conserved consecutive amino acids corresponding to the most conserved C-terminal segment (Tyr-Gly-Asp-Thr-Asp-Ser). Our results indicate that in phi 29 DNA polymerase this consensus sequence, although irrelevant for the 3'→5' exonuclease activity, is essential for initiation and elongation. Based on these results and on its homology with known or putative metal-binding amino acid sequences, we propose that in phi 29 DNA polymerase the Tyr-Gly-Asp-Thr-Asp-Ser consensus motif is part of the dNTP binding site, is involved in the synthetic activities of the polymerase (i.e., initiation and polymerization), and participates in particular in the metal binding associated with the dNTP site. PMID:2191296

  1. Occurrence probability of structured motifs in random sequences.

    PubMed

    Robin, S; Daudin, J-J; Richard, H; Sagot, M-F; Schbath, S

    2002-01-01

    Extracting motifs that may have a biological function from a set of nucleic acid sequences is an increasingly important problem. In this paper, we focus on particular motifs that may be involved in the transcription process. These motifs, called structured motifs, are composed of two ordered parts separated by a variable distance and allow for substitutions. To assess their statistical significance, we propose approximations of the probability of occurrence of such a structured motif in a given sequence. An application of our method to the evaluation of candidate promoters in E. coli and B. subtilis is presented. Simulations show that the approximations are accurate.
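
To make the notion of a structured motif concrete, here is a deliberately naive sketch, not the approximations proposed in the paper: it counts the expected number of exact occurrences of two boxes separated by a spacer of variable length in an i.i.d. random sequence, then applies a crude Poisson approximation. The boxes are the E. coli -35 (TTGACA) and -10 (TATAAT) consensus sequences; the spacer range 15 to 19, the sequence length and the uniform base composition are illustrative assumptions, and substitutions are not allowed here.

```python
import math

def word_prob(word: str, freqs: dict) -> float:
    """Probability of an exact word at a fixed position under an i.i.d. base model."""
    p = 1.0
    for base in word:
        p *= freqs[base]
    return p

def expected_structured_matches(box1: str, box2: str, dmin: int, dmax: int,
                                n: int, freqs: dict) -> float:
    """Expected number of (start, spacer length d) pairs, dmin <= d <= dmax, at which
    box1 + spacer + box2 occurs exactly in a random sequence of length n."""
    p_pair = word_prob(box1, freqs) * word_prob(box2, freqs)
    total = 0.0
    for d in range(dmin, dmax + 1):
        span = len(box1) + d + len(box2)
        total += max(0, n - span + 1) * p_pair
    return total

def prob_at_least_one(expected: float) -> float:
    """Crude Poisson approximation P(N >= 1) ~ 1 - exp(-E[N]); it ignores the
    overlaps between occurrences that a careful treatment must handle."""
    return 1.0 - math.exp(-expected)

uniform = {b: 0.25 for b in "ACGT"}
e = expected_structured_matches("TTGACA", "TATAAT", 15, 19, 1000, uniform)
print(f"E[N] = {e:.3e}, P(N >= 1) ~ {prob_at_least_one(e):.3e}")
```

Allowing substitutions in each box would replace `word_prob` with a sum over all words within the permitted Hamming distance, which is where more careful approximations become necessary.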

  2. Probabilistic models for semisupervised discriminative motif discovery in DNA sequences.

    PubMed

    Kim, Jong Kyoung; Choi, Seungjin

    2011-01-01

    Methods for discriminative motif discovery in DNA sequences identify transcription factor binding sites (TFBSs) by searching only for patterns that differentiate two sets (positive and negative) of sequences. On the one hand, discriminative methods increase the sensitivity and specificity of motif discovery compared with generative models. On the other hand, generative models can easily exploit unlabeled sequences to better detect functional motifs when labeled training samples are limited. In this paper, we develop a hybrid generative/discriminative model that enables us to use unlabeled sequences within the framework of discriminative motif discovery, leading to semisupervised discriminative motif discovery. Numerical experiments on yeast ChIP-chip data for discovering DNA motifs demonstrate that the best performance is obtained between the purely generative and the purely discriminative extremes, and that semisupervised learning improves performance when labeled sequences are limited.
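
The core discriminative idea, preferring patterns that separate the positive set from the negative set, can be illustrated with a toy score. This sketch is not the hybrid generative/discriminative model of the paper: it simply ranks every DNA k-mer by the fraction of positive sequences containing it minus the fraction of negative sequences containing it, scans only the forward strand, and uses hypothetical example sequences.

```python
from itertools import product

def contains_fraction(kmer: str, seqs: list[str]) -> float:
    """Fraction of sequences that contain kmer at least once."""
    return sum(kmer in s for s in seqs) / len(seqs)

def rank_discriminative_kmers(pos: list[str], neg: list[str], k: int):
    """Score every DNA k-mer by its coverage in the positive set minus its
    coverage in the negative set, and return (kmer, score) pairs, best first."""
    scores = {}
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        scores[kmer] = contains_fraction(kmer, pos) - contains_fraction(kmer, neg)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

positives = ["AAGATTCC", "CGATTG", "TGATTA"]   # hypothetical bound sequences
negatives = ["CCCCCCCC", "GGGGAAAA"]           # hypothetical background sequences
print(rank_discriminative_kmers(positives, negatives, 4)[:3])
```

A purely generative method would instead model the positive sequences alone; the difference-of-coverage score only rewards patterns that are absent from the negatives.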

  3. Conservation of sequence in recombination signal sequence spacers.

    PubMed Central

    Ramsden, D A; Baetz, K; Wu, G E

    1994-01-01

    The variable domains of immunoglobulins and T cell receptors are assembled through the somatic, site-specific recombination of multiple germline segments (V, D, and J segments), a process known as V(D)J rearrangement. The recombination signal sequence (RSS) is necessary and sufficient for cell type-specific targeting of the V(D)J rearrangement machinery to these germline segments. Previously, the RSS has been described as possessing both a conserved heptamer and a conserved nonamer motif. The heptamer and nonamer motifs are separated by a 'spacer' that was not thought to possess significant sequence conservation; however, the spacer can be either 12 ± 1 bp or 23 ± 1 bp long. In this report we have assembled and analyzed an extensive database of published RSSs. Through extensive consensus comparison, we have derived a more detailed description of the RSS than has previously been reported. Our analysis indicates that RSS spacers possess significant sequence conservation, and that the conserved sequence in 12 bp spacers is similar to the conserved sequence in the first half of 23 bp spacers. PMID:8208601
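
The heptamer-spacer-nonamer architecture lends itself to a simple scan. This sketch assumes the commonly cited consensus heptamer CACAGTG and nonamer ACAAAAACC, the 12 ± 1 and 23 ± 1 bp spacer lengths stated above, and an arbitrary threshold of at most one mismatch per element. It scans only the given strand in heptamer-to-nonamer order; because real RSSs occur in both orientations, the reverse complement would also have to be scanned.

```python
HEPTAMER = "CACAGTG"      # consensus RSS heptamer
NONAMER = "ACAAAAACC"     # consensus RSS nonamer

def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def find_rss(seq: str, spacers=(11, 12, 13, 22, 23, 24), max_mm: int = 1):
    """Return (heptamer start, spacer length) for every heptamer-spacer-nonamer
    arrangement on the given strand, allowing up to max_mm mismatches per element."""
    hits = []
    for i in range(len(seq) - len(HEPTAMER) + 1):
        if hamming(seq[i:i + len(HEPTAMER)], HEPTAMER) > max_mm:
            continue
        for sp in spacers:
            j = i + len(HEPTAMER) + sp
            window = seq[j:j + len(NONAMER)]
            if len(window) == len(NONAMER) and hamming(window, NONAMER) <= max_mm:
                hits.append((i, sp))
    return hits

example = "TT" + HEPTAMER + "A" * 12 + NONAMER + "TT"   # synthetic 12-RSS
print(find_rss(example))
```

Because the spacer is treated as unconstrained sequence, this scan ignores exactly the spacer conservation the study reports; a refined scorer could add a position weight matrix for the spacer.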

  4. iMotifs: an integrated sequence motif visualization and analysis environment

    PubMed Central

    Piipari, Matias; Down, Thomas A.; Saini, Harpreet; Enright, Anton; Hubbard, Tim J.P.

    2010-01-01

    Motivation: Short sequence motifs are an important class of models in molecular biology, used most commonly to describe transcription factor binding site specificity patterns. High-throughput methods for detecting regulatory factor binding sites in vivo and in vitro have recently been developed, and consequently high-quality binding site motif data are becoming available for an increasing number of organisms and regulatory factors. Developing intuitive tools for the study of sequence motifs is therefore important. iMotifs is a graphical motif analysis environment that allows visualization of annotated sequence motifs and of scored motif hits in sequences. It also offers motif inference with the sensitive NestedMICA algorithm, as well as overrepresentation and pairwise motif matching capabilities. All of the analysis functionality is provided without the need to convert between file formats or learn different command line interfaces. The application includes a bundled and graphically integrated version of the NestedMICA motif inference suite that has no outside dependencies, so problems associated with local deployment of software are avoided. Availability: iMotifs is licensed under the GNU Lesser General Public License v2.0 (LGPL 2.0). The software and its source are available at http://wiki.github.com/mz2/imotifs and can be run on Mac OS X Leopard (Intel/PowerPC). We also provide a cross-platform (Linux, OS X, Windows) LGPL 2.0 licensed library, libxms, for the Perl, Ruby, R and Objective-C programming languages for input and output of XMS-formatted annotated sequence motif set files. Contact: matias.piipari@gmail.com; imotifs@googlegroups.com PMID:20106815

  5. A conserved motif of vertebrate Y RNAs essential for chromosomal DNA replication

    PubMed Central

    Gardiner, Timothy J.; Christov, Christo P.; Langley, Alexander R.; Krude, Torsten

    2009-01-01

    Noncoding Y RNAs are required for the reconstitution of chromosomal DNA replication in late G1 phase template nuclei in a human cell-free system. Y RNA genes are present in all vertebrates and in some isolated nonvertebrates, but the conservation of Y RNA function and key determinants for its function are unknown. Here, we identify a determinant of Y RNA function in DNA replication, which is conserved throughout vertebrate evolution. Vertebrate Y RNAs are able to reconstitute chromosomal DNA replication in the human cell-free DNA replication system, but nonvertebrate Y RNAs are not. A conserved nucleotide sequence motif in the double-stranded stem of vertebrate Y RNAs correlates with Y RNA function. A functional screen of human Y1 RNA mutants identified this conserved motif as an essential determinant for reconstituting DNA replication in vitro. Double-stranded RNA oligonucleotides comprising this RNA motif are sufficient to reconstitute DNA replication, but corresponding DNA or random sequence RNA oligonucleotides are not. In intact cells, wild-type hY1 or the conserved RNA duplex can rescue an inhibition of DNA replication after RNA interference against hY3 RNA. Therefore, we have identified a new RNA motif that is conserved in vertebrate Y RNA evolution, and essential and sufficient for Y RNA function in human chromosomal DNA replication. PMID:19474146

  6. A Discriminative Approach for Unsupervised Clustering of DNA Sequence Motifs

    PubMed Central

    Stegmaier, Philip; Kel, Alexander; Wingender, Edgar; Borlak, Jürgen

    2013-01-01

    Algorithmic comparison of DNA sequence motifs is a bioinformatics problem that has received increasing attention in recent years. Its main applications are the characterization of potentially novel motifs and the clustering of a motif collection to remove redundancy. Despite growing interest in motif clustering, the question of which motif clusters to aim for has so far not been systematically addressed. Here we analyzed motif similarities in a comprehensive set of vertebrate transcription factor classes. For this we developed enhanced similarity scores by including the information coverage (IC) criterion, which evaluates the fraction of information an alignment covers in the aligned motifs. A network-based method enabled us to identify motif clusters with high correspondence to DNA-binding domain phylogenies and prior experimental findings. Based on this analysis we derived a set of motif families representing distinct binding specificities. These motif families were used to train a classifier, which was further integrated into a novel algorithm for unsupervised motif clustering. Application of the new algorithm demonstrated its superiority to previously published methods and its ability to reproduce the motif families it was trained on. As a result, our work proposes a probabilistic approach to decide whether two motifs represent common or distinct binding specificities. PMID:23555204
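
The information coverage criterion builds on the information content of motif columns. The sketch below is a simplified reading of that idea under our own assumptions (uniform DNA background, a motif given as a list of per-column probability dictionaries, and an alignment reduced to the list of motif columns it covers); it is not the scoring function of the study.

```python
import math

def column_information(col: dict) -> float:
    """Information content in bits of one DNA motif column: 2 minus its Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in col.values() if p > 0)
    return 2.0 - entropy

def information_coverage(motif: list[dict], aligned_cols: list[int]) -> float:
    """Fraction of a motif's total information content that falls in the columns
    covered by an alignment."""
    total = sum(column_information(c) for c in motif)
    covered = sum(column_information(motif[i]) for i in aligned_cols)
    return covered / total if total > 0 else 0.0

motif = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},       # fully conserved: 2 bits
    {"A": 0.5, "C": 0.5, "G": 0.0, "T": 0.0},       # two-way choice: 1 bit
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},   # uninformative: 0 bits
]
print(information_coverage(motif, [0]))   # alignment covers only column 0
```

The point of weighting by information is that an alignment covering only uninformative flanking columns should not count as a good match, even if it covers many columns.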

  7. Structural assessment of glycyl mutations in invariantly conserved motifs.

    PubMed

    Prakash, Tulika; Sandhu, Kuljeet Singh; Singh, Nitin Kumar; Bhasin, Yasha; Ramakrishnan, C; Brahmachari, Samir K

    2007-11-15

    Motifs that are evolutionarily conserved in proteins are crucial to their structure and function. In one of our earlier studies, we demonstrated that conserved motifs occurring invariantly across several organisms could act as structural determinants of the proteins. We observed an abundance of glycyl residues in these invariantly conserved motifs. The role of glycyl residues in highly conserved motifs has not been studied extensively, so it is of interest to examine the structural perturbations induced by mutations at these conserved glycyl sites. In this work, we selected a representative set of invariant signature (IS) peptides for which both the PDB structure and mutation information were available. We thoroughly analyzed the conformational features of the glycyl sites and their local interactions with the surrounding residues. Using Ramachandran angles, we showed that the mutated glycyl residues in these IS peptides occurred more often in the L-disallowed than in the L-allowed region of the Ramachandran plot. Short-range contacts around the mutation site were analyzed to study steric effects. Based on these results, we hypothesize that if the glycyl residue in the IS peptide occurred in the L-allowed region of the Ramachandran plot, any change of activity arising from such a mutation must be attributed to long-range interaction(s) of the new residue. However, mutation of conserved glycyl residues that occurred in the L-disallowed region of the Ramachandran plot might alter the activity of the protein through an altered backbone conformation in the immediate vicinity of the glycyl residue, in addition to long-range effects arising from the long side chain of the new residue. Thus, loss of activity caused by mutation at a conserved glycyl site might relate either to long-range interactions or to local perturbations around the site.

  8. Sequence Bundles: a novel method for visualising, discovering and exploring sequence motifs

    PubMed Central

    2014-01-01

    Background We introduce Sequence Bundles, a novel data visualisation method for representing multiple sequence alignments (MSAs). We identify and address key limitations of existing bioinformatics data visualisation methods (i.e. the Sequence Logo) by enabling Sequence Bundles to give salient visual expression to sequence motifs and other data features that would otherwise remain hidden. Methods For the development of Sequence Bundles we employed research-led information design methodologies. Sequences are encoded as uninterrupted, semi-opaque lines plotted on a 2-dimensional reconfigurable grid. Each line represents a single sequence. The thickness and opacity of the stack at each residue in each position indicate the level of conservation, and the lines' curved paths expose patterns in correlation and functionality. Several MSAs can be visualised in a composite image. The Sequence Bundles method is designed to favour a tangible, continuous and intuitive display of information. Results We have developed a software demonstration application for generating a Sequence Bundles visualisation of MSAs provided for the BioVis 2013 redesign contest. A subsequent exploration of the visualised line patterns allowed for the discovery of a number of interesting features in the dataset. Reported features include the extreme conservation of sequences displaying a specific residue and bifurcations of the consensus sequence. Conclusions Sequence Bundles is a novel method for visualisation of MSAs and the discovery of sequence motifs. It can aid in generating new insights and hypotheses. Sequence Bundles is well suited to future implementation as interactive visual analytics software, which can complement existing visualisation tools. PMID:25237395

  9. Identifying novel sequence variants of RNA 3D motifs

    PubMed Central

    Zirbel, Craig L.; Roll, James; Sweeney, Blake A.; Petrov, Anton I.; Pirrung, Meg; Leontis, Neocles B.

    2015-01-01

    Predicting RNA 3D structure from sequence is a major challenge in biophysics. An important sub-goal is accurately identifying recurrent 3D motifs from RNA internal and hairpin loop sequences extracted from secondary structure (2D) diagrams. We have developed and validated new probabilistic models for 3D motif sequences based on hybrid Stochastic Context-Free Grammars and Markov Random Fields (SCFG/MRF). The SCFG/MRF models are constructed using atomic-resolution RNA 3D structures. To parameterize each model, we use all instances of each motif found in the RNA 3D Motif Atlas and annotations of pairwise nucleotide interactions generated by the FR3D software. Isostericity relations between non-Watson–Crick basepairs are used in scoring sequence variants. SCFG techniques model nested pairs and insertions, while MRF ideas handle crossing interactions and base triples. We use test sets of randomly-generated sequences to set acceptance and rejection thresholds for each motif group and thus control the false positive rate. Validation was carried out by comparing results for four motif groups to RMDetect. The software developed for sequence scoring (JAR3D) is structured to automatically incorporate new motifs as they accumulate in the RNA 3D Motif Atlas when new structures are solved and is available free for download. PMID:26130723

  10. [Conserved motifs in the primary and secondary ITS1 structures in bryophytes].

    PubMed

    Milyutina, I A; Ignatov, M S

    2015-01-01

    A study of the ITS1 nucleotide sequences of 1000 moss species from 62 families, 11 liverwort species from five orders, and one hornwort, Anthoceros agrestis, identified five highly conserved motifs (CM1-CM5), which are presumably involved in pre-rRNA processing. Although the ITS1 sequences differ substantially in length and extent of divergence, the conserved motifs are found in all of them. ITS1 secondary structures were constructed for 76 mosses, and the main regularities in the positioning of the conserved motifs were observed. The positions of processing sites in the ITS1 secondary structure of the yeast Saccharomyces cerevisiae were found to be similar to the positions of the conserved motifs in the ITS1 secondary structures of mosses and liverworts. In addition, potential hairpin formation in the putative secondary structure of a pre-rRNA fragment was considered for the region between ITS1 CM4-CM5 and a highly conserved region between hairpins 49 and 50 (H49 and H50) of the 18S rRNA.

  11. MEME: discovering and analyzing DNA and protein sequence motifs.

    PubMed

    Bailey, Timothy L; Williams, Nadya; Misleh, Chris; Li, Wilfred W

    2006-07-01

    MEME (Multiple EM for Motif Elicitation) is one of the most widely used tools for searching for novel 'signals' in sets of biological sequences. Applications include the discovery of new transcription factor binding sites and protein domains. MEME works by searching for repeated, ungapped sequence patterns that occur in the DNA or protein sequences provided by the user. Users can perform MEME searches via the web server hosted by the National Biomedical Computation Resource (http://meme.nbcr.net) and several mirror sites. Through the same web server, users can also access the Motif Alignment and Search Tool to search sequence databases for matches to motifs encoded in several popular formats. By clicking on buttons in the MEME output, users can compare the motifs discovered in their input sequences with databases of known motifs, search sequence databases for matches to the motifs and display the motifs in various formats. This article describes the freely accessible web server and its architecture, and discusses ways to use MEME effectively to find new sequence patterns in biological sequences and analyze their significance.
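
Motifs reported by MEME can be represented as position weight matrices. The sketch below only illustrates how such a matrix scores a sequence with log-odds; it is not MEME's expectation-maximization discovery algorithm and not MAST's scoring. The pseudocount of 0.5, the uniform background and the aligned sites are illustrative assumptions.

```python
import math

BASES = "ACGT"

def log_odds_matrix(sites: list[str], pseudocount: float = 0.5, background=None):
    """Build a log2-odds position weight matrix from equal-length aligned sites."""
    background = background or {b: 0.25 for b in BASES}
    pwm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background[b]) for b in BASES})
    return pwm

def best_hit(pwm, seq: str):
    """Return (start, score) of the highest-scoring window of seq, or None if seq is too short."""
    width = len(pwm)
    best = None
    for i in range(len(seq) - width + 1):
        score = sum(col[base] for col, base in zip(pwm, seq[i:i + width]))
        if best is None or score > best[1]:
            best = (i, score)
    return best

sites = ["TGACTCA", "TGACTCA", "TGAGTCA"]   # toy aligned sites
pwm = log_odds_matrix(sites)
print(best_hit(pwm, "CCCCTGACTCACCCC"))
```

Positive scores mean the window looks more like the motif than like the background; the pseudocount keeps unseen bases from producing a score of minus infinity.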

  12. MEME: discovering and analyzing DNA and protein sequence motifs

    PubMed Central

    Bailey, Timothy L.; Williams, Nadya; Misleh, Chris; Li, Wilfred W.

    2006-01-01

    MEME (Multiple EM for Motif Elicitation) is one of the most widely used tools for searching for novel ‘signals’ in sets of biological sequences. Applications include the discovery of new transcription factor binding sites and protein domains. MEME works by searching for repeated, ungapped sequence patterns that occur in the DNA or protein sequences provided by the user. Users can perform MEME searches via the web server hosted by the National Biomedical Computation Resource (http://meme.nbcr.net) and several mirror sites. Through the same web server, users can also access the Motif Alignment and Search Tool to search sequence databases for matches to motifs encoded in several popular formats. By clicking on buttons in the MEME output, users can compare the motifs discovered in their input sequences with databases of known motifs, search sequence databases for matches to the motifs and display the motifs in various formats. This article describes the freely accessible web server and its architecture, and discusses ways to use MEME effectively to find new sequence patterns in biological sequences and analyze their significance. PMID:16845028

  13. AliBiMotif: integrating alignment and biclustering to unravel transcription factor binding sites in DNA sequences.

    PubMed

    Gonçalves, Joana P; Moreau, Yves; Madeira, Sara C

    2012-01-01

    Transcription Factors (TFs) control transcription by binding to specific sites in the promoter regions of the target genes, which can be modelled by structured motifs. In this paper we propose AliBiMotif, a method combining sequence alignment and a biclustering approach based on efficient string matching techniques using suffix trees to unravel approximately conserved sets of blocks (structured motifs) while straightforwardly disregarding non-conserved stretches in-between. The ability to ignore the width of non-conserved regions is a major advantage of the proposed method over other motif finders, as the lengths of the binding sites are usually easier to estimate than the separating distances.

  14. Dipeptide frequency/bias analysis identifies conserved sites of nonrandomness shared by cysteine-rich motifs.

    PubMed

    Campion, S R; Ameen, A S; Lai, L; King, J M; Munzenmaier, T N

    2001-08-15

    This report describes the application of a simple computational tool, AAPAIR.TAB, for the systematic analysis of the cysteine-rich EGF, Sushi, and Laminin motif/sequence families at the two-amino acid level. Automated dipeptide frequency/bias analysis detects preferences in the distribution of amino acids in established protein families by determining which "ordered dipeptides" occur most frequently in comprehensive motif-specific sequence data sets. Graphic display of the dipeptide frequency/bias data revealed family-specific preferences for certain dipeptides, but more importantly detected a shared preference for the ordered dipeptides Gly-Tyr (GY) and Gly-Phe (GF) in all three protein families. The dipeptide Asn-Gly (NG) also exhibited high frequency and bias in the EGF and Sushi motif families, whereas Asn-Thr (NT) was distinguished in the Laminin family. Evaluation of the distribution of dipeptides identified by frequency/bias analysis subsequently revealed the highly restricted localization of the G(F/Y) and N(G/T) sequence elements at two separate sites of extreme conservation in the consensus sequence of all three sequence families. The similar employment of the high-frequency/bias dipeptides in three distinct protein sequence families was further correlated with the concurrence of these shared molecular determinants at similar positions within the distinctive scaffolds of three structurally divergent, but similarly employed, motif modules.
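
Ordered dipeptide counting is straightforward to sketch. This is not the AAPAIR.TAB tool: the observed/expected "bias" below assumes, as a simple null model of our own choosing, that residues pair independently according to their overall frequencies, and the input sequences are hypothetical.

```python
from collections import Counter

def ordered_dipeptides(seqs: list[str]) -> Counter:
    """Count ordered dipeptides (adjacent residue pairs; GY and YG are distinct)."""
    counts = Counter()
    for s in seqs:
        for a, b in zip(s, s[1:]):
            counts[a + b] += 1
    return counts

def dipeptide_bias(seqs: list[str]) -> dict:
    """Observed/expected ratio per ordered dipeptide, where the expectation assumes
    residues are paired independently according to their overall frequencies."""
    pairs = ordered_dipeptides(seqs)
    residues = Counter("".join(seqs))
    n_res = sum(residues.values())
    n_pairs = sum(pairs.values())
    return {
        dp: obs / (n_pairs * (residues[dp[0]] / n_res) * (residues[dp[1]] / n_res))
        for dp, obs in pairs.items()
    }

family = ["CGYCGF", "NGCGY"]   # toy, hypothetical sequences
print(ordered_dipeptides(family).most_common(2))
print(round(dipeptide_bias(family)["GY"], 2))
```

A ratio well above 1 flags a dipeptide that occurs more often than its residue composition alone would predict.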

  15. Fission yeast hotspot sequence motifs are also active in budding yeast.

    PubMed

    Steiner, Walter W; Steiner, Estelle M

    2012-01-01

    In most organisms, including humans, meiotic recombination occurs preferentially at a limited number of sites in the genome known as hotspots. There has been substantial progress recently in elucidating the factors determining the location of meiotic recombination hotspots, and it is becoming clear that simple sequence motifs play a significant role. In S. pombe, there are at least five unique sequence motifs that have been shown to produce hotspots of recombination, and it is likely that there are more. In S. cerevisiae, simple sequence motifs have also been shown to produce hotspots or show significant correlations with hotspots. Some of the hotspot motifs in both yeasts are known or suspected to bind transcription factors (TFs), which are required for the activity of those hotspots. Here we show that four of the five hotspot motifs identified in S. pombe also create hotspots in the distantly related budding yeast S. cerevisiae. For one of these hotspots, M26 (also called CRE), we identify TFs, Cst6 and Sko1, that activate and inhibit the hotspot, respectively. In addition, two of the hotspot motifs show significant correlations with naturally occurring hotspots. The conservation of these hotspots between the distantly related fission and budding yeasts suggests that these sequence motifs, and others yet to be discovered, may function widely as hotspots in many diverse organisms.

  16. Functional characterization of motif sequences under purifying selection.

    PubMed

    Chen, De-Hua; Chang, Andrew Ying-Fei; Liao, Ben-Yang; Yeang, Chen-Hsiang

    2013-02-01

    Diverse life forms are driven by the evolution of gene regulatory programs, including changes in regulator proteins and cis-regulatory elements. Alterations of cis-regulatory elements are likely to dominate the evolution of gene regulatory networks, as they are subject to weaker selective constraints than proteins and hence may evolve quickly to adapt to the environment. Prior studies on cis-regulatory element evolution focus primarily on sequence substitutions of known transcription factor-binding motifs. However, evolutionary models for the dynamics of motif occurrence are relatively rare, and comprehensive characterization of the evolution of all possible motif sequences has not been pursued. In the present study, we propose an algorithm to estimate the strength of purifying selection on a motif sequence based on an evolutionary model capturing the birth and death of motif occurrences on promoters. We term this measure the 'evolutionary retention coefficient', as it is related to, yet distinct from, the canonical definition of the selection coefficient in population genetics. Using this algorithm, we estimate and report the evolutionary retention coefficients of all possible 10-nucleotide sequences from the aligned promoter sequences of 27,748 orthologous gene families in 34 mammalian species. Intriguingly, the evolutionary retention coefficients of motifs are intimately associated with their functional relevance. Top-ranking motifs (sorted by evolutionary retention coefficient) are significantly enriched with transcription factor-binding sequences according to the curated knowledge from the TRANSFAC database and the ChIP-seq data generated by the ENCODE Consortium. Moreover, genes harbouring high-scoring motifs on their promoters retain significantly coherent expression profiles, and those genes are over-represented in the functional classes involved in gene regulation. The validation results reveal the dependencies between natural selection and

  17. A Conserved Metal Binding Motif in the Bacillus subtilis Competence Protein ComFA Enhances Transformation.

    PubMed

    Chilton, Scott S; Falbel, Tanya G; Hromada, Susan; Burton, Briana M

    2017-08-01

    Genetic competence is a process in which cells are able to take up DNA from their environment, resulting in horizontal gene transfer, a major mechanism for generating diversity in bacteria. Many bacteria carry homologs of the central DNA uptake machinery that has been well characterized in Bacillus subtilis. It has been postulated that the B. subtilis competence helicase ComFA belongs to the DEAD box family of helicases/translocases. Here, we made a series of mutants to analyze conserved amino acid motifs in several regions of B. subtilis ComFA. First, we confirmed that ComFA activity requires amino acid residues conserved among the DEAD box helicases, and second, we show that a zinc finger-like motif consisting of four cysteines is required for efficient transformation. Each cysteine in the motif is important, and mutation of at least two of the cysteines dramatically reduces transformation efficiency. Further, combining multiple cysteine mutations with the helicase mutations shows an additive phenotype. Our results suggest that the helicase and metal binding functions are two distinct activities important for ComFA function during transformation. IMPORTANCE ComFA is a highly conserved protein that has a role in DNA uptake during natural competence, a mechanism for horizontal gene transfer observed in many bacteria. Investigation of the details of the DNA uptake mechanism is important for understanding the ways in which bacteria gain new traits from their environment, such as drug resistance. To dissect the role of ComFA in the DNA uptake machinery, we introduced point mutations into several motifs in the protein sequence. We demonstrate that several amino acid motifs conserved among ComFA proteins are important for efficient transformation. This report is the first to demonstrate the functional requirement of an amino-terminal cysteine motif in ComFA. Copyright © 2017 American Society for Microbiology.

  18. Characterization of the tandem CWCH2 sequence motif: a hallmark of inter-zinc finger interactions

    PubMed Central

    2010-01-01

    Background The C2H2 zinc finger (ZF) domain is widely conserved among eukaryotic proteins. In Zic/Gli/Zap1 C2H2 ZF proteins, the two N-terminal ZFs form a single structural unit by sharing a hydrophobic core. This structural unit defines a new motif comprised of two tryptophan side chains at the center of the hydrophobic core. Because each tryptophan residue is located between the two cysteine residues of the C2H2 motif, we have named this structure the tandem CWCH2 (tCWCH2) motif. Results Here, we characterized 587 tCWCH2-containing genes using data derived from public databases. We categorized genes into 11 classes including Zic/Gli/Glis, Arid2/Rsc9, PacC, Mizf, Aebp2, Zap1/ZafA, Fungl, Zfp106, Twincl, Clr1, and Fungl-4ZF, based on sequence similarity, domain organization, and functional similarities. tCWCH2 motifs are mostly found in organisms belonging to the Opisthokonta (metazoa, fungi, and choanoflagellates) and Amoebozoa (amoeba, Dictyostelium discoideum). By comparison, the C2H2 ZF motif is distributed widely among the eukaryotes. The structure and organization of the tCWCH2 motif, its phylogenetic distribution, and molecular phylogenetic analysis suggest that prototypical tCWCH2 genes existed in the Opisthokonta ancestor. Within-group or between-group comparisons of the tCWCH2 amino acid sequence identified three additional sequence features (site-specific amino acid frequencies, longer linker sequence between two C2H2 ZFs, and frequent extra-sequences within C2H2 ZF motifs). Conclusion These features suggest that the tCWCH2 motif is a specialized motif involved in inter-zinc finger interactions. PMID:20167128

  19. Fast and Accurate Discovery of Degenerate Linear Motifs in Protein Sequences

    PubMed Central

    Levy, Emmanuel D.; Michnick, Stephen W.

    2014-01-01

    Linear motifs mediate a wide variety of cellular functions, which makes their characterization in protein sequences crucial to understanding cellular systems. However, the short length and degenerate nature of linear motifs make their discovery a difficult problem. Here, we introduce MotifHound, an algorithm particularly suited for the discovery of small and degenerate linear motifs. MotifHound performs an exact and exhaustive enumeration of all motifs present in proteins of interest, including all of their degenerate forms, and scores the overrepresentation of each motif based on its occurrence in proteins of interest relative to a background (e.g., proteome) using the hypergeometric distribution. To assess MotifHound, we benchmarked it together with state-of-the-art algorithms. The benchmark consists of 11,880 sets of proteins from S. cerevisiae; in each set, we artificially spiked-in one motif varying in terms of three key parameters, (i) number of occurrences, (ii) length and (iii) the number of degenerate or “wildcard” positions. The benchmark enabled the evaluation of the impact of these three properties on the performance of the different algorithms. The results showed that MotifHound and SLiMFinder were the most accurate in detecting degenerate linear motifs. Interestingly, MotifHound was 15 to 20 times faster at comparable accuracy and performed best in the discovery of highly degenerate motifs. We complemented the benchmark by an analysis of proteins experimentally shown to bind the FUS1 SH3 domain from S. cerevisiae. Using the full-length protein partners as sole information, MotifHound recapitulated most experimentally determined motifs binding to the FUS1 SH3 domain. Moreover, these motifs exhibited properties typical of SH3 binding peptides, e.g., high intrinsic disorder and evolutionary conservation, despite the fact that none of these properties were used as prior information. MotifHound is available (http://michnick.bcm.umontreal.ca or http
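
MotifHound scores overrepresentation with the hypergeometric distribution. A minimal sketch of the corresponding upper-tail p-value follows, assuming that counting is done per protein (whether or not a protein contains the motif); this is not MotifHound's code, and the numbers in the example are hypothetical.

```python
from math import comb

def hypergeom_pvalue(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(M, n, N): M background proteins, n of which
    contain the motif, and N proteins of interest drawn without replacement."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)

# Hypothetical numbers: 6000-protein background, 40 motif-containing proteins,
# 50 proteins of interest of which 8 contain the motif.
print(hypergeom_pvalue(8, 6000, 40, 50))
```

Exact integer arithmetic via `math.comb` avoids the underflow that naive floating-point factorials would hit for proteome-sized backgrounds; when many motifs are tested, the p-values still need a multiple-testing correction.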

  20. Correlating novel variable and conserved motifs in the Hemagglutinin protein with significant biological functions

    PubMed Central

    Gendoo, Deena MA; El-Hefnawi, Mahmoud M; Werner, Mark; Siam, Rania

    2008-01-01

    Background Variations in the influenza Hemagglutinin protein contribute to antigenic drift, resulting in decreased efficiency of seasonal influenza vaccines and escape from the host immune response. We performed an in silico study to determine characteristics of novel variable and conserved motifs in the Hemagglutinin protein from previously reported H3N2 strains isolated in Hong Kong from 1968–1999, in order to predict viral motifs involved in significant biological functions. Results Fourteen MEME blocks were generated, and comparative analysis of the MEME blocks identified blocks 1, 2, 3 and 7 as correlating with several biological functions. Analysis of the different Hemagglutinin sequences showed that block 7 has the highest frequency of amino acid substitution and the highest number of co-mutating pairs. MEME block 2 showed intermediate variability and MEME block 1 was the most conserved. Interestingly, MEME blocks 2 and 7 had the highest incidence of potential post-translational modification sites, including phosphorylation sites, ASN glycosylation motifs and N-myristylation sites. These two blocks also overlap with previously identified antigenic sites and receptor binding sites. Conclusion Our study identifies motifs in the Hemagglutinin protein with different amino acid substitution frequencies over a 31-year period, and derives relevant functional characteristics by correlating these motifs with potential post-translational modification sites, antigenic sites and receptor binding sites. PMID:18681973
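    Comparing blocks by variability rests on two simple quantities: how often each aligned position differs from a reference strain, and how often pairs of positions change together. A minimal Python sketch, assuming the sequences are already aligned to equal length, that the first (earliest) sequence is the reference, and that block boundaries are supplied by hand; it does not reproduce the MEME analysis or the study's co-mutation statistics.

```python
from collections import Counter
from itertools import combinations


def substitution_frequency(alignment: list[str], ref_index: int = 0) -> list[float]:
    """Fraction of non-reference sequences differing from the reference at each column."""
    ref = alignment[ref_index]
    others = [s for i, s in enumerate(alignment) if i != ref_index]
    return [sum(s[pos] != res for s in others) / len(others) for pos, res in enumerate(ref)]


def block_variability(freqs: list[float], start: int, end: int) -> float:
    """Mean substitution frequency over the half-open column range [start, end)."""
    window = freqs[start:end]
    return sum(window) / len(window)


def co_mutating_pairs(alignment: list[str], ref_index: int = 0, min_count: int = 2) -> dict:
    """Column pairs that both differ from the reference in at least min_count sequences."""
    ref = alignment[ref_index]
    pairs = Counter()
    for i, seq in enumerate(alignment):
        if i == ref_index:
            continue
        mutated = [p for p, (a, b) in enumerate(zip(ref, seq)) if a != b]
        pairs.update(combinations(mutated, 2))  # count every pair mutated together
    return {pair: c for pair, c in pairs.items() if c >= min_count}


freqs = substitution_frequency(["ACDE", "ACDF", "GCDF"])
print(freqs, block_variability(freqs, 0, 2))      # [0.5, 0.0, 0.0, 1.0] 0.25
print(co_mutating_pairs(["ACDE", "GCDF", "GCDF"]))  # {(0, 3): 2}
```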

  1. Physical Motif Clustering within Intrinsically Disordered Nucleoporin Sequences Reveals Universal Functional Features

    PubMed Central

    Ando, David; Colvin, Michael; Rexach, Michael; Gopinathan, Ajay

    2013-01-01

    Bioinformatic analysis of disordered proteins is especially challenging because homologous proteins show high mutation rates and their function may not be strongly tied to sequence. Here we have performed a novel bioinformatic analysis, based on the spatial clustering of physically relevant features such as binding motifs and charges within disordered proteins, on thousands of Nuclear Pore Complex (NPC) FG-motif-containing proteins (FG nups). The biophysical mechanism by which FG nups regulate nucleocytoplasmic transport has remained elusive. Our analysis revealed a set of highly conserved spatial features in the sequence structure of individual FG nups, such as the separation, localization, and ordering of FG motifs and charged residues along the protein chain. These functionally conserved features provide insight into the particular biophysical mechanisms responsible for regulation of nucleocytoplasmic traffic in the NPC, strongly constraining current models. Additionally, this method allows us to identify potentially functionally analogous disordered proteins across distantly related species. PMID:24066078
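    The raw features behind this kind of analysis (where FG motifs sit, how far apart they are, and how charge is distributed along the chain) are straightforward to extract. A minimal Python sketch, assuming FG motifs are literal "FG" dipeptides, that K/R count +1 and D/E count -1 with histidine ignored, and an arbitrary window size; the authors' clustering statistics are not reproduced.

```python
import re


def fg_positions(seq: str) -> list[int]:
    """0-based start positions of every 'FG' dipeptide."""
    return [m.start() for m in re.finditer("FG", seq)]


def fg_spacings(seq: str) -> list[int]:
    """Distances between consecutive FG motifs along the chain."""
    pos = fg_positions(seq)
    return [b - a for a, b in zip(pos, pos[1:])]


def net_charge_profile(seq: str, window: int = 10) -> list[int]:
    """Net charge of each sliding window (K/R = +1, D/E = -1, other residues 0)."""
    charge = {"K": 1, "R": 1, "D": -1, "E": -1}
    values = [charge.get(aa, 0) for aa in seq]
    return [sum(values[i:i + window]) for i in range(len(values) - window + 1)]


seq = "SSFGSSAFGKKDFG"
print(fg_positions(seq), fg_spacings(seq))   # [2, 7, 12] [5, 5]
print(net_charge_profile("KKDE", window=2))  # [2, 0, -2]
```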

  2. Predicting candidate genomic sequences that correspond to synthetic functional RNA motifs

    PubMed Central

    Laserson, Uri; Gan, Hin Hark; Schlick, Tamar

    2005-01-01

    Riboswitches and RNA interference are important emerging mechanisms found in many organisms to control gene expression. To enhance our understanding of such RNA roles, finding small regulatory motifs in genomes presents a challenge on a wide scale. Many simple functional RNA motifs have been found by in vitro selection experiments, which produce synthetic target-binding aptamers as well as catalytic RNAs, including the hammerhead ribozyme. Motivated by the prediction of Piganeau and Schroeder [(2003) Chem. Biol., 10, 103–104] that synthetic RNAs may have natural counterparts, we develop and apply an efficient computational protocol for identifying aptamer-like motifs in genomes. We define motifs from the sequence and structural information of synthetic aptamers, search for sequences in genomes that will produce motif matches, and then evaluate the structural stability and statistical significance of the potential hits. Our application to aptamers for streptomycin, chloramphenicol, neomycin B and ATP identifies 37 candidate sequences (in coding and non-coding regions) that fold to the target aptamer structures in bacterial and archaeal genomes. Further energetic screening reveals that several candidates exhibit energetic properties and sequence conservation patterns that are characteristic of functional motifs. Besides providing candidates for experimental testing, our computational protocol offers an avenue for expanding natural RNA's functional repertoire. PMID:16254081

  3. A conserved heptamer motif for ribosomal RNA transcription termination in animal mitochondria.

    PubMed Central

    Valverde, J R; Marco, R; Garesse, R

    1994-01-01

    A search of sequence data bases for a tridecamer transcription termination signal, previously described in human mtDNA as being responsible for the accumulation of mitochondrial ribosomal RNAs (rRNAs) in excess over the rest of mitochondrial genes, has revealed that this termination signal occurs in equivalent positions in a wide variety of organisms from protozoa to mammals. Due to the compact organization of the mtDNA, the tridecamer motif usually appears as part of the 3' adjacent gene sequence. Because in phylogenetically widely separated organisms the mitochondrial genome has experienced many rearrangements, it is interesting that its occurrence near the 3' end of the large rRNA is independent of the adjacent gene. The tridecamer sequence has diverged in phylogenetically widely separated organisms. Nevertheless, a well-conserved heptamer, TGGCAGA (the mitochondrial rRNA termination box), can be defined. Although extending the experimental evidence of its role as a transcription termination signal in humans will be of great interest, its evolutionary conservation strongly suggests that mitochondrial rRNA transcription termination could be a widely conserved mechanism in animals. Furthermore, the conservation of a homologous tridecamer motif in one of the last 3' secondary loops of nonmitochondrial 23S-like rRNAs suggests that the role of the sequence has changed during mitochondrial evolution. PMID:7515499
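    Locating the TGGCAGA termination box in a mitochondrial genome amounts to scanning both strands for the heptamer. A minimal Python sketch, assuming exact matches only; because the surrounding tridecamer has diverged between lineages, a real survey would likely tolerate mismatches or use a profile instead.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def find_on_both_strands(seq: str, motif: str = "TGGCAGA") -> list[tuple[int, str]]:
    """Return (0-based start on the + strand, strand) for exact matches on either strand."""
    seq, motif = seq.upper(), motif.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        if window == rc:  # the motif read on the opposite strand
            hits.append((i, "-"))
    return hits


print(find_on_both_strands("AATGGCAGAAA"))  # [(2, '+')]
print(find_on_both_strands("GGTCTGCCA"))    # [(2, '-')]
```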

  4. PISMA: A Visual Representation of Motif Distribution in DNA Sequences

    PubMed Central

    Alcántara-Silva, Rogelio; Alvarado-Hermida, Moisés; Díaz-Contreras, Gibrán; Sánchez-Barrios, Martha; Carrera, Samantha; Galván, Silvia Carolina

    2017-01-01

    Background: Because the graphical presentation and analysis of motif distribution can provide insights for experimental hypotheses, PISMA aims at identifying motifs on DNA sequences, counting them and showing them graphically. The motif length ranges from 2 to 10 bases, and the DNA sequences range up to 10 kb. The motif distribution is shown as a bar-code-like display, as a gene-map-like display, and as a transcript scheme. Results: We obtained graphical schemes of the CpG site distribution from 91 human papillomavirus genomes. Also, we present 2 analyses: one of DNA motifs associated with either methylation-resistant or methylation-sensitive CpG islands and another analysis of motifs associated with exosome RNA secretion. Availability and Implementation: PISMA is developed in Java; it is executable on any type of hardware and on diverse operating systems. PISMA is freely available to noncommercial users. The English version and the User Manual are provided in Supplementary Files 1 and 2, and a Spanish version is available at www.biomedicas.unam.mx/wp-content/software/pisma.zip and www.biomedicas.unam.mx/wp-content/pdf/manual/pisma.pdf. PMID:28469418
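    PISMA itself is a Java application; the following is an independent Python sketch of the underlying idea, not PISMA's code: find every (overlapping) occurrence of a 2 to 10 base motif, such as the CpG dinucleotide, and draw the positions as a fixed-width text bar code. The bar-code width is an arbitrary choice.

```python
def motif_positions(seq: str, motif: str) -> list[int]:
    """0-based starts of all overlapping occurrences of a 2-10 base motif."""
    if not 2 <= len(motif) <= 10:
        raise ValueError("motif length must be between 2 and 10 bases")
    seq, motif = seq.upper(), motif.upper()
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]


def barcode(seq_len: int, positions: list[int], width: int = 60) -> str:
    """Scale positions onto a fixed-width track; '|' marks a bin with at least one hit."""
    track = ["-"] * width
    for p in positions:
        track[p * width // seq_len] = "|"
    return "".join(track)


seq = "ACGCGTACG"
hits = motif_positions(seq, "CG")
print(len(hits), hits)                   # 3 [1, 3, 7]
print(barcode(len(seq), hits, width=9))  # -|-|---|-
```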

  5. The distribution of RNA motifs in natural sequences.

    PubMed

    Bourdeau, V; Ferbeyre, G; Pageau, M; Paquin, B; Cedergren, R

    1999-11-15

    Functional analysis of genome sequences has largely ignored RNA genes and their structures. We introduce here the notion of 'ribonomics' to describe the search for the distribution of and eventually the determination of the physiological roles of these RNA structures found in the sequence databases. The utility of this approach is illustrated here by the identification in the GenBank database of RNA motifs having known binding or chemical activity. The frequency of these motifs indicates that most have originated from evolutionary drift and are selectively neutral. On the other hand, their distribution among species and their location within genes suggest that the destiny of these motifs may be more elaborate. For example, the hammerhead motif has a skewed organismal presence, is phylogenetically stable and recent work on a schistosome version confirms its in vivo biological activity. The under-representation of the valine-binding motif and the Rev-binding element in GenBank hints at a detrimental effect on cell growth or viability. Data on the presence and the location of these motifs may provide critical guidance in the design of experiments directed towards the understanding and the manipulation of RNA complexes and activities in vivo.

  6. PISMA: A Visual Representation of Motif Distribution in DNA Sequences.

    PubMed

    Alcántara-Silva, Rogelio; Alvarado-Hermida, Moisés; Díaz-Contreras, Gibrán; Sánchez-Barrios, Martha; Carrera, Samantha; Galván, Silvia Carolina

    2017-01-01

    Because the graphical presentation and analysis of motif distribution can provide insights for experimental hypotheses, PISMA aims at identifying motifs on DNA sequences, counting them and showing them graphically. The motif length ranges from 2 to 10 bases, and the DNA sequences range up to 10 kb. The motif distribution is shown as a bar-code-like display, as a gene-map-like display, and as a transcript scheme. We obtained graphical schemes of the CpG site distribution from 91 human papillomavirus genomes. Also, we present 2 analyses: one of DNA motifs associated with either methylation-resistant or methylation-sensitive CpG islands and another analysis of motifs associated with exosome RNA secretion. PISMA is developed in Java; it is executable on any type of hardware and on diverse operating systems. PISMA is freely available to noncommercial users. The English version and the User Manual are provided in Supplementary Files 1 and 2, and a Spanish version is available at www.biomedicas.unam.mx/wp-content/software/pisma.zip and www.biomedicas.unam.mx/wp-content/pdf/manual/pisma.pdf.

  7. Classification of protein motifs based on subcellular localization uncovers evolutionary relationships at both sequence and functional levels

    PubMed Central

    2013-01-01

    Background Most proteins have evolved in specific cellular compartments that limit their functions and potential interactions. On the other hand, motifs define amino acid arrangements conserved between protein family members and represent powerful tools for assigning function to protein sequences. The ideal motif would identify all members of a protein family but in practice many motifs identify both family members and unrelated proteins, referred to as True Positive (TP) and False Positive (FP) sequences, respectively. Results To address the relationship between protein motifs, protein function and cellular localization, we systematically assigned subcellular localization data to motif sequences from the comprehensive PROSITE sequence motif database. Using this data we analyzed relationships between localization and function. We find that TPs and FPs have a strong tendency to localize in different compartments. When multiple localizations are considered, TPs are usually distributed between related cellular compartments. We also identified cases where FPs are concentrated in particular subcellular regions, indicating possible functional or evolutionary relationships with TP sequences of the same motif. Conclusions Our findings suggest that the systematic examination of subcellular localization has the potential to uncover evolutionary and functional relationships between motif-containing sequences. We believe that this type of analysis complements existing motif annotations and could aid in their interpretation. Our results shed light on the evolution of cellular organelles and potentially establish the basis for new subcellular localization and function prediction algorithms. PMID:23865897
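    The TP/FP distinction can be made concrete by converting a PROSITE-style pattern into a regular expression and checking whether each matching sequence belongs to the family. A minimal Python sketch that covers only the common pattern elements (x, [..], {..}, repeat counts and the < and > terminal anchors); the pattern and sequences in the example are made up, and this is not the PROSITE scanning software.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern such as 'C-x(2)-[DE]-{P}-H.' into a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        start_anchor, end_anchor = element.startswith("<"), element.endswith(">")
        element = element.strip("<>")
        core, low, high = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":
            rx = "."                      # any residue
        elif core.startswith("{"):
            rx = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            rx = core                     # single residue or [..] class, same in regex syntax
        if high:
            rx += "{%s,%s}" % (low, high)
        elif low:
            rx += "{%s}" % low
        parts.append(("^" if start_anchor else "") + rx + ("$" if end_anchor else ""))
    return "".join(parts)


def classify_matches(pattern: str, sequences: dict, family: set) -> tuple[list, list]:
    """Split matching sequences into true positives (family members) and false positives."""
    regex = re.compile(prosite_to_regex(pattern))
    tp, fp = [], []
    for name, seq in sequences.items():
        if regex.search(seq):
            (tp if name in family else fp).append(name)
    return tp, fp


seqs = {"a": "MCAADAH", "b": "MCAADPH", "c": "CQQEAH"}
print(prosite_to_regex("C-x(2)-[DE]-{P}-H."))              # C.{2}[DE][^P]H
print(classify_matches("C-x(2)-[DE]-{P}-H.", seqs, {"a"}))  # (['a'], ['c'])
```

    Attaching a localization label to each name in the TP and FP lists would then allow the kind of compartment comparison described above.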

  8. Identification of four conserved motifs among the RNA-dependent polymerase encoding elements.

    PubMed Central

    Poch, O; Sauvaget, I; Delarue, M; Tordo, N

    1989-01-01

    Four consensus sequences are conserved with the same linear arrangement in RNA-dependent DNA polymerases encoded by retroid elements and in RNA-dependent RNA polymerases encoded by plus-, minus- and double-strand RNA viruses. One of these motifs corresponds to the YGDD span previously described by Kamer and Argos (1984). These consensus sequences altogether lead to 4 strictly and 18 conservatively maintained amino acids embedded in a large domain of 120 to 210 amino acids. As judged from secondary structure predictions, each of the 4 motifs, which may cooperate to form a well-ordered domain, places one invariant amino acid in or proximal to turn structures that may be crucial for their correct positioning in a catalytic process. We suggest that this domain may constitute a prerequisite 'polymerase module' implicated in template seating and polymerase activity. At the evolutionary level, the sequence similarities, gap distribution and distances between each motif strongly suggest that the ancestral polymerase module was encoded by an individual genetic element which was most closely related to the plus-strand RNA viruses and the non-viral retroposons. This polymerase module gene may have subsequently propagated in the viral kingdom by distinct gene set recombination events leading to the wide viral variety observed today. PMID:2555175

  9. An evolutionary analysis of flightin reveals a conserved motif unique and widespread in Pancrustacea.

    PubMed

    Soto-Adames, Felipe N; Alvarez-Ortiz, Pedro; Vigoreaux, Jim O

    2014-01-01

    Flightin is a thick filament protein that in Drosophila melanogaster is uniquely expressed in the asynchronous, indirect flight muscles (IFM). Flightin is required for the structure and function of the IFM and is indispensable for flight in Drosophila. Given the importance of flight acquisition in the evolutionary history of insects, here we study the phylogeny and distribution of flightin. Flightin was identified in 69 species of hexapods in classes Collembola (springtails), Protura, Diplura, and insect orders Thysanura (silverfish), Dictyoptera (roaches), Orthoptera (grasshoppers), Phthiraptera (lice), Hemiptera (true bugs), Coleoptera (beetles), Neuroptera (green lacewing), Hymenoptera (bees, ants, and wasps), Lepidoptera (moths), and Diptera (flies and mosquitoes). Flightin was also found in 14 species of crustaceans in orders Anostraca (brine shrimp), Cladocera (water flea), Isopoda (pill bugs), Amphipoda (scuds, sideswimmers), and Decapoda (lobsters, crabs, and shrimps). Flightin was not identified in representatives of chelicerates, myriapods, or any species outside Pancrustacea (Tetraconata, sensu Dohle). Alignment of amino acid sequences revealed a conserved region of 52 amino acids, referred to herein as WYR, that is bounded by strictly conserved tryptophan (W) and arginine (R) and contains an intervening sequence with a high content of tyrosines (Y). This motif has no homologs in GenBank or PROSITE and is unique to flightin and paraflightin, a putative flightin paralog identified in decapods. A third motif of unclear affinities to pancrustacean WYR was observed in chelicerates. Phylogenetic analysis of amino acid sequences of the conserved motif suggests that paraflightin originated before the divergence of amphipods, isopods, and decapods.
We conclude that flightin originated de novo in the ancestor of Pancrustacea > 500 MYA, well before the divergence of insects (~400 MYA) and the origin of flight (~325 MYA), and that its IFM-specific function in Drosophila is a more

  10. Chaotic motif sampler: detecting motifs from biological sequences by using chaotic neurodynamics

    NASA Astrophysics Data System (ADS)

    Matsuura, Takafumi; Ikeguchi, Tohru

    Identifying a conserved region (motif) in a set of biological sequences, known as the motif extraction problem (MEP), is an important task in bioinformatics. However, the MEP is NP-hard, so it is practically impossible to obtain an optimal solution within a reasonable time. To find near-optimal solutions for NP-hard combinatorial optimization problems such as the traveling salesman problem, the quadratic assignment problem and the vehicle routing problem, chaotic search, a deterministic approach, has been proposed and exhibits better performance than stochastic approaches. In this paper, we propose a new alignment method, called the Chaotic Motif Sampler, that employs chaotic dynamics to solve the MEP. We show that the performance of the Chaotic Motif Sampler is considerably better than that of conventional methods such as the Gibbs Site Sampler and the Neighborhood Optimization for Multiple Alignment Discovery.
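    Samplers such as the Gibbs Site Sampler and the Chaotic Motif Sampler search over one start position per sequence to maximize a motif score. The Python sketch below assumes a common choice of objective (per-column relative entropy against a uniform background, with pseudocounts), which may differ from the paper's own; the exhaustive search shows why heuristics are needed, since the number of combinations grows as (sequence length) to the power of (number of sequences).

```python
import math
from itertools import product


def motif_score(seqs, starts, width, alphabet="ACGT", pseudocount=0.5):
    """Sum over motif columns of the relative entropy (bits) against a uniform background."""
    background = 1 / len(alphabet)
    score = 0.0
    for j in range(width):
        column = [s[st + j] for s, st in zip(seqs, starts)]
        for a in alphabet:
            p = (column.count(a) + pseudocount) / (len(seqs) + pseudocount * len(alphabet))
            score += p * math.log2(p / background)
    return score


def brute_force_mep(seqs, width):
    """Try every combination of start positions; feasible only for toy inputs."""
    ranges = [range(len(s) - width + 1) for s in seqs]
    best = max(product(*ranges), key=lambda st: motif_score(seqs, st, width))
    return list(best)


seqs = ["GGACGTGG", "ACGTCCCC", "CCCCACGT"]
print(brute_force_mep(seqs, 4))  # [2, 0, 4], the shared ACGT occurrences
```

    A sampler replaces the exhaustive `product` loop with iterative updates of one start position at a time, stochastically in the Gibbs case and through deterministic chaotic dynamics in the method proposed here.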

  11. A gating motif in the translocation channel sets the hydrophobicity threshold for signal sequence function

    PubMed Central

    Trueman, Steven F.; Mandon, Elisabet C.

    2012-01-01

    A critical event in protein translocation across the endoplasmic reticulum is the structural transition between the closed and open conformations of Sec61, the eukaryotic translocation channel. Channel opening allows signal sequence insertion into a gap between the N- and C-terminal halves of Sec61. We have identified a gating motif that regulates the transition between the closed and open channel conformations. Polar amino acid substitutions in the gating motif cause a gain-of-function phenotype that permits translocation of precursors with marginally hydrophobic signal sequences. In contrast, hydrophobic substitutions at certain residues in the gating motif cause a protein translocation defect. We conclude that the gating motif establishes the hydrophobicity threshold for functional insertion of a signal sequence into the Sec61 complex, thereby allowing the wild-type translocation channel to discriminate between authentic signal sequences and the less hydrophobic amino acid segments in cytosolic proteins. Bioinformatic analysis indicates that the gating motif is conserved between eubacterial and archaebacterial SecY and eukaryotic Sec61. PMID:23229898
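    The idea of a hydrophobicity threshold can be pictured as a cutoff applied to the most hydrophobic stretch of a signal sequence. A minimal Python sketch using the Kyte-Doolittle scale; the 8-residue window and the cutoff of 1.5 are hypothetical placeholders rather than values from the study, and in the paper the effective threshold is set by the Sec61 gating motif, not by a fixed number.

```python
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_window_hydrophobicity(seq: str, window: int = 8) -> float:
    """Highest mean Kyte-Doolittle value over any window of the given length."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    if not values:
        raise ValueError("empty sequence")
    window = min(window, len(values))
    return max(sum(values[i:i + window]) / window for i in range(len(values) - window + 1))


def passes_threshold(signal_seq: str, threshold: float = 1.5, window: int = 8) -> bool:
    """Hypothetical rule: the most hydrophobic window must reach the threshold."""
    return max_window_hydrophobicity(signal_seq, window) >= threshold


print(passes_threshold("MKKLLLLLLLLAAS"))  # True
print(passes_threshold("MKKDSSTLSDEQ"))    # False
```

    In this picture, the gain-of-function substitutions described above would correspond to lowering the threshold, so that marginally hydrophobic signal sequences also pass.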

  12. MoD Tools: regulatory motif discovery in nucleotide sequences from co-regulated or homologous genes

    PubMed Central

    Pavesi, Giulio; Mereghetti, Paolo; Zambelli, Federico; Stefani, Marco; Mauri, Giancarlo; Pesole, Graziano

    2006-01-01

    Understanding the complex mechanisms regulating gene expression at the transcriptional and post-transcriptional levels is one of the greatest challenges of the post-genomic era. The MoD (MOtif Discovery) Tools web server comprises a set of tools for the discovery of novel conserved sequence and structure motifs in nucleotide sequences, motifs that in turn are good candidates for regulatory activity. The server includes the following programs: Weeder, for the discovery of conserved transcription factor binding sites (TFBSs) in nucleotide sequences from co-regulated genes; WeederH, for the discovery of conserved TFBSs and distal regulatory modules in sequences from homologous genes; RNAProfile, for the discovery of conserved secondary structure motifs in unaligned RNA sequences whose secondary structure is not known. In this way, a given gene can be compared with other co-regulated genes or with its homologs, or its mRNA can be analyzed for conserved motifs regulating its post-transcriptional fate. The web server thus provides researchers with different strategies and methods to investigate the regulation of gene expression, at both the transcriptional and post-transcriptional levels. Available at and . PMID:16845071

  13. Finding sequence motifs in prokaryotic genomes--a brief practical guide for a microbiologist.

    PubMed

    Mrázek, Jan

    2009-09-01

    Finding significant nucleotide sequence motifs in prokaryotic genomes can be divided into three types of tasks: (1) supervised motif finding, where a sample of motif sequences is used to find other similar sequences in genomes; (2) unsupervised motif finding, which typically relates to the task of finding regulatory motifs and protein binding sites and (3) exploratory motif finding, which aims to identify potential functionally significant sequence motifs as those that are unusual in some statistical sense. This article provides a conceptual overview for each type of task, a brief description of basic algorithms used in their solution, and a review of selected relevant software available online.
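    Supervised motif finding, the first task type, is often done by turning known sites into a position weight matrix (PWM) and scanning a genome with it. A minimal Python sketch, assuming uppercase ACGT input, a uniform background, a pseudocount of 1 and an arbitrary score threshold; the example sites are made up.

```python
import math

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=None):
    """Log-odds position weight matrix (bits) from equal-length example sites."""
    background = background or {b: 0.25 for b in ALPHABET}
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for j in range(len(sites[0])):
        column = [s[j] for s in sites]
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background[b])
                    for b in ALPHABET})
    return pwm


def scan(sequence, pwm, threshold):
    """Return (0-based start, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        # Raises KeyError on characters outside ACGT (e.g. N), by design of this sketch.
        score = sum(pwm[j][sequence[i + j]] for j in range(width))
        if score >= threshold:
            hits.append((i, round(score, 2)))
    return hits


sites = ["TATAAT", "TATAAT", "TATAAT", "TATGAT"]
pwm = build_pwm(sites)
print(scan("GGTATAATGG", pwm, threshold=5.0))  # [(2, 7.61)]
```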

  14. Learning Cellular Sorting Pathways Using Protein Interactions and Sequence Motifs

    PubMed Central

    Lin, Tien-Ho; Bar-Joseph, Ziv

    2011-01-01

    Proper subcellular localization is critical for proteins to perform their roles in cellular functions. Proteins are transported by different cellular sorting pathways, some of which take a protein through several intermediate locations until reaching its final destination. The pathway a protein is transported through is determined by carrier proteins that bind to specific sequence motifs. In this article, we present a new method that integrates protein interaction and sequence motif data to model how proteins are sorted through these sorting pathways. We use a hidden Markov model (HMM) to represent protein sorting pathways. The model is able to determine intermediate sorting states and to assign carrier proteins and motifs to the sorting pathways. In simulation studies, we show that the method can accurately recover an underlying sorting model. Using data for yeast, we show that our model leads to accurate prediction of subcellular localization. We also show that the pathways learned by our model recover many known sorting pathways and correctly assign proteins to the path they utilize. The learned model identified new pathways and their putative carriers and motifs and these may represent novel protein sorting mechanisms. Supplementary results and software implementation are available from http://murphylab.web.cmu.edu/software/2010_RECOMB_pathways/. PMID:21999284

  15. An intuitionistic approach to scoring DNA sequences against transcription factor binding site motifs.

    PubMed

    Garcia-Alcalde, Fernando; Blanco, Armando; Shepherd, Adrian J

    2010-11-08

    Transcription factors (TFs) control transcription by binding to specific regions of DNA called transcription factor binding sites (TFBSs). The identification of TFBSs is a crucial problem in computational biology and includes the subtask of predicting the location of known TFBS motifs in a given DNA sequence. It has previously been shown that, when scoring matches to known TFBS motifs, interdependencies between positions within a motif should be taken into account. However, this remains a challenging task because sequences similar to those of known TFBSs can occur by chance with a relatively high frequency. Here we present a new method for matching sequences to TFBS motifs based on intuitionistic fuzzy sets (IFS) theory, an approach that has been shown to be particularly appropriate for tackling problems that embody a high degree of uncertainty. We propose SCintuit, a new scoring method for measuring sequence-motif affinity based on IFS theory. Unlike existing methods that consider dependencies between positions, SCintuit is designed to avoid overestimating the less conserved positions of TFBSs. For a given pair of bases, SCintuit is computed not only as a function of their combined probability of occurrence, but also by taking into account the individual importance of each single base at its corresponding position. We used SCintuit to identify known TFBSs in DNA sequences. Our method provides excellent results on both synthetic and real data, outperforming two existing methods in both sensitivity and specificity in all the experiments we performed. The results show that SCintuit improves on the prediction quality of existing approaches without compromising sensitivity. In addition, we show how SCintuit can be successfully applied to real research problems. In this study, the reliability of IFS theory for motif discovery tasks is shown.

  16. A generalized profile syntax for biomolecular sequence motifs and its function in automatic sequence interpretation

    SciTech Connect

    Bucher, P.; Bairoch, A.

    1994-12-31

    A general syntax for expressing biomolecular sequence motifs is described, which will be used in future releases of the PROSITE data bank and in a similar collection of nucleic acid sequence motifs currently under development. The central part of the syntax is a regular structure which can be viewed as a generalization of the profiles introduced by Gribskov and coworkers. Accessory features implement specific motif search strategies and provide information helpful for the interpretation of predicted matches. Two contrasting examples, representing E. coli promoters and SH3 domains respectively, are shown to demonstrate the versatility of the syntax and its compatibility with diverse motif search methods. It is argued that a comprehensive machine-readable motif collection based on the new syntax, in conjunction with a standard search program, can serve as a general-purpose sequence interpretation and function prediction tool.

  17. Integrative analysis of tissue-specific methylation and alternative splicing identifies conserved transcription factor binding motifs

    PubMed Central

    Wan, Jun; Oliver, Verity F.; Zhu, Heng; Zack, Donald J.; Qian, Jiang; Merbs, Shannath L.

    2013-01-01

    The exact role of intragenic DNA methylation in regulating tissue-specific gene expression is unclear. Recently, the DNA-binding protein CTCF has been shown to participate in the regulation of alternative splicing in a DNA methylation-dependent manner. To globally evaluate the relationship between DNA methylation and tissue-specific alternative splicing, we performed genome-wide DNA methylation profiling of mouse retina and brain. In protein-coding genes, tissue-specific differentially methylated regions (T-DMRs) were preferentially located in exons and introns. Gene ontology and evolutionary conservation analysis suggest that these T-DMRs are likely to be biologically relevant. More than 14% of alternatively spliced genes were associated with a T-DMR. T-DMR-associated genes were enriched for developmental genes, suggesting that a specific set of alternatively spliced genes may be regulated through DNA methylation. Novel DNA sequence motifs overrepresented in T-DMRs were identified as being associated with positive and/or negative regulation of alternative splicing in a position-dependent context. The majority of these evolutionarily conserved motifs contain a CpG dinucleotide. Some transcription factors that recognize these motifs are known to be involved in splicing. Our results suggest that DNA methylation-dependent alternative splicing is widespread and lay the foundation for further mechanistic studies of the role of DNA methylation in tissue-specific splicing regulation. PMID:23887936

  18. Computational definition of sequence motifs governing constitutive exon splicing.

    PubMed

    Zhang, Xiang H-F; Chasin, Lawrence A

    2004-06-01

    We have searched for sequence motifs that contribute to the recognition of human pre-mRNA splice sites by comparing the frequency of 8-mers in internal noncoding exons versus unspliced pseudo exons and 5' untranslated regions (5' UTRs) of transcripts of intronless genes. This type of comparison avoids isolating sequences that are distinguished by their protein-coding information. We classified sequence families comprising 2069 putative exonic enhancers and 974 putative exonic silencers. Representatives of each class functioned as enhancers or silencers when inserted into a test exon and assayed in transfected mammalian cells. As a class, the enhancer sequences were more prevalent, and the silencer elements less prevalent, in all exons compared with introns. A survey of 58 reported exonic splicing mutations showed good agreement between the splicing phenotype and the effect of the mutation on the motifs defined here. The large number of effective sequences implied by these results suggests that sequences that influence splicing may be very abundant in pre-mRNA.
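    The core comparison, k-mer frequency in exons versus control sequences, can be sketched as a log ratio of pseudocounted frequencies. A minimal Python sketch, assuming DNA input, a pseudocount of 1 per possible k-mer and toy sequences; the authors' statistical criteria for calling enhancers and silencers are not reproduced.

```python
import math
from collections import Counter


def kmer_counts(seqs, k):
    """Count all overlapping k-mers across a list of sequences."""
    counts = Counter()
    for s in seqs:
        counts.update(s[i:i + k] for i in range(len(s) - k + 1))
    return counts


def kmer_log_ratios(exons, controls, k=8, pseudocount=1.0):
    """log2 of each k-mer's pseudocounted frequency in exons over that in controls."""
    a, b = kmer_counts(exons, k), kmer_counts(controls, k)
    vocab = 4 ** k  # number of possible DNA k-mers
    total_a = sum(a.values()) + pseudocount * vocab
    total_b = sum(b.values()) + pseudocount * vocab
    return {kmer: math.log2(((a[kmer] + pseudocount) / total_a) / ((b[kmer] + pseudocount) / total_b))
            for kmer in set(a) | set(b)}


ratios = kmer_log_ratios(["GAAGAAGAA"], ["CTCTCTCTC"], k=3)
print(round(ratios["GAA"], 3), round(ratios["CTC"], 3))  # 2.0 -2.322
```

    Positive values flag k-mers enriched in exons (enhancer candidates) and negative values flag depleted ones (silencer candidates); with k = 8 the pseudocount term dominates unless the sequence sets are large.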

  19. Identification of conserved splicing motifs in mutually exclusive exons of 15 insect species.

    PubMed

    Buendia, Patricia; Tyree, John; Loredo, Robert; Hsu, Shu-Ning

    2012-04-12

    During alternative splicing, the inclusion of an exon in the final mRNA molecule is determined by nuclear proteins that bind cis-regulatory sequences in a target pre-mRNA molecule. A recent study suggested that the regulatory codes of individual RNA-binding proteins may be nearly immutable between very diverse species such as mammals and insects. The model system Drosophila melanogaster therefore presents an excellent opportunity for the study of alternative splicing due to the availability of quality EST annotations in FlyBase. In this paper, we describe an in silico analysis pipeline to extract putative exonic splicing regulatory sequences from a multiple alignment of 15 species of insects. Our method, ESTs-to-ESRs (E2E), uses graph analysis of EST splicing graphs to identify mutually exclusive (ME) exons and combines phylogenetic measures, a sliding window approach along the multiple alignment and Welch's t statistic to extract conserved ESR motifs. The most frequent 100% conserved word of length 5 bp in different insect exons was "ATGGA". We identified 799 statistically significant "spike" hexamers, 218 motifs with either a left or right FDR-corrected spike magnitude p-value < 0.05 and 83 with both left and right uncorrected p < 0.01. 11 genes were identified with highly significant motifs in one ME exon but not in the other, suggesting regulation of ME exon splicing through these highly conserved hexamers. The majority of these genes have been shown to have regulated spatiotemporal expression. 10 elements were found to match three mammalian splicing regulator databases. A putative ESR motif, GATGCAG, was identified in the ME-13b but not in the ME-13a of Drosophila N-Cadherin, a gene that has been shown to have a distinct spatiotemporal expression pattern of spliced isoforms in a recent study. Analysis of phylogenetic relationships and variability of sequence conservation as implemented in the E2E spikes method may lead to improved identification of ESRs
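    A "spike" in conservation can be tested by comparing a window of per-position conservation scores against its left and right flanks with Welch's t statistic. A minimal Python sketch, assuming the per-position scores are already computed and that flanks have the same width as the window; E2E's phylogenetic measures and FDR correction are not reproduced.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0:  # both samples constant
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def window_spikes(scores, width=6):
    """For each window with full flanks: (start, t vs. left flank, t vs. right flank)."""
    out = []
    for i in range(width, len(scores) - 2 * width + 1):
        window = scores[i:i + width]
        left, right = scores[i - width:i], scores[i + width:i + 2 * width]
        out.append((i, welch_t(window, left), welch_t(window, right)))
    return out


flank = [0.1, 0.2] * 3
core = [0.9, 1.0] * 3
print(window_spikes(flank + core + flank))  # one window at position 6, large positive t on both sides
```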

  20. Cloning, expression and functional characterization of the putative regeneration and tolerance factor (RTF/TJ6) as a functional vacuolar ATPase proton pump regulatory subunit with a conserved sequence of immunoreceptor tyrosine-based activation motif.

    PubMed

    Babichev, Yael; Tamir, Ami; Park, Meeyoug; Muallem, Shmuel; Isakov, Noah

    2005-10-01

    In an attempt to identify new immunoreceptor tyrosine-based activation motif (ITAM)-containing human molecules that may regulate hitherto unknown immune cell functions, we BLAST searched the National Center for Biotechnology Information database for ITAM-containing sequences. A human expressed sequence tag showing partial homology to the murine TJ6 (mTJ6) gene and encoding a putative ITAM sequence has been identified and used to clone the human TJ6 (hTJ6) gene from an HL-60-derived cDNA library. hTJ6 was found to encode a protein of 856 residues with a calculated mass of 98 155 Da. Immunolocalization and sequence analysis revealed that hTJ6 is a membrane protein with predicted six transmembrane-spanning regions, typical of ion channels, and a single putative ITAM (residues 452-466) in a juxtamembrane or hydrophobic intramembrane region. hTJ6 is highly homologous to Bos taurus 116-kDa subunit of the vacuolar proton-translocating ATPase. Over-expression of hTJ6 in HEK 293 cells increased H+ uptake into intracellular organelles, an effect that was sensitive to inhibition by bafilomycin, a selective inhibitor of vacuolar H+ pump. Northern blot analysis demonstrated three different hybridizing mRNA transcripts corresponding to 3.2, 5.0 and 7.3 kb, indicating the presence of several splice variants. Significant differences in hTJ6 mRNA levels in human tissues of different origins point to possible tissue-specific function. Although hTJ6 was found to be a poor substrate for tyrosine-phosphorylating enzymes, suggesting that its ITAM sequence is non-functional in protein tyrosine kinase-mediated signaling pathways, its role in organellar H+ pumping suggests that hTJ6 function may participate in protein trafficking/processing.
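    Searching sequences for putative ITAMs, as done here, can be approximated with a regular expression for the canonical consensus YxxL/I, a 6 to 8 residue spacer, then a second YxxL/I. A minimal Python sketch on a made-up sequence; it reports 1-based residue ranges like those quoted above, does not model extended consensus variants, and returns only non-overlapping matches.

```python
import re

# Canonical ITAM consensus: YxxL/I, a 6-8 residue spacer, then a second YxxL/I.
ITAM = re.compile(r"Y..[LI].{6,8}Y..[LI]")


def find_itams(protein: str) -> list[tuple[int, int, str]]:
    """Return (1-based start, 1-based end, matched segment) for each ITAM-like match."""
    return [(m.start() + 1, m.end(), m.group()) for m in ITAM.finditer(protein.upper())]


toy = "AA" + "YQQL" + "AAAAAA" + "YEEL" + "AA"
print(find_itams(toy))  # [(3, 16, 'YQQLAAAAAAYEEL')]
```

    A regex hit only indicates sequence compatibility; as the hTJ6 results show, a matching sequence may still be a poor substrate for tyrosine phosphorylation.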

  1. Sequence-motif Detection of NAD(P)-binding Proteins: Discovery of a Unique Antibacterial Drug Target

    NASA Astrophysics Data System (ADS)

    Hua, Yun Hao; Wu, Chih Yuan; Sargsyan, Karen; Lim, Carmay

    2014-09-01

    Many enzymes use nicotinamide adenine dinucleotide or nicotinamide adenine dinucleotide phosphate (NAD(P)) as essential coenzymes. These enzymes often do not share significant sequence identity and cannot be easily detected by sequence homology. Previously, we determined all distinct locally conserved pyrophosphate-binding structures (3d motifs) from NAD(P)-bound protein structures, from which 1d sequence motifs were derived. Here, we aim to establish the precision of these 3d and 1d motifs for annotating NAD(P)-binding proteins. We show that the pyrophosphate-binding 3d motifs are characteristic of NAD(P)-binding proteins, as they are rarely found in non-NAD(P)-binding proteins. Furthermore, several 1d motifs could distinguish between proteins that bind only NAD and those that bind only NADP. They could also distinguish NAD(P)-binding proteins from non-NAD(P)-binding ones. Interestingly, one of the pyrophosphate-binding 3d motifs and its corresponding 1d motif were found only in enoyl-acyl carrier protein reductases, which are enzymes essential for bacterial fatty acid biosynthesis. This unique 3d motif serves as an attractive novel drug target, as it is conserved across many bacterial species and is not found in human proteins.

  2. Conserved motifs in prokaryotic and eukaryotic polypeptide release factors: tRNA-protein mimicry hypothesis.

    PubMed Central

    Ito, K; Ebihara, K; Uno, M; Nakamura, Y

    1996-01-01

    Translation termination requires two codon-specific polypeptide release factors in prokaryotes and one omnipotent factor in eukaryotes. Sequences of 17 different polypeptide release factors from prokaryotes and eukaryotes were compared. The prokaryotic release factors share conserved residues grouped into seven motifs. Conservation of many discrete, perhaps critical, amino acids is observed in eukaryotic release factors, as well as in the C-terminal portion of elongation factor (EF) G. Given that the C-terminal domains of EF-G interact with ribosomes by mimicking a tRNA structure, the pattern of conservation of residues in release factors may reflect a requirement for tRNA mimicry for binding to the A site of the ribosome. This mimicry would explain why release factors recognize stop codons and suggests that all prokaryotic and eukaryotic release factors evolved from the progenitor of EF-G. PMID:8643594

  3. Identification of LAG3 high affinity aptamers by HT-SELEX and Conserved Motif Accumulation (CMA).

    PubMed

    Soldevilla, Mario Martínez; Hervas, Sandra; Villanueva, Helena; Lozano, Teresa; Rabal, Obdulia; Oyarzabal, Julen; Lasarte, Juan José; Bendandi, Maurizio; Inoges, Susana; López-Díaz de Cerio, Ascensión; Pastor, Fernando

    2017-01-01

    LAG3 receptor belongs to a family of immune checkpoints expressed in T lymphocytes and other cells of the immune system. It plays an important role as a rheostat of the immune response. Interest in this receptor as a potential therapeutic target in cancer immunotherapy has grown after the success of other immune-checkpoint blockade strategies in clinical trials. LAG3 is also of interest in the field of autoimmunity, as several studies show that LAG3-targeting antibodies can be used for the treatment of autoimmune diseases. In this work we describe the identification of a high-affinity LAG3 aptamer by High Throughput Sequencing SELEX, combined with a study of potential conserved binding modes based on sequence conservation, using 2D-structure prediction and 3D-RNA modeling with Rosetta. The aptamer with the highest accumulation of these conserved sequence motifs displays the highest affinity for recombinant soluble LAG3 protein and binds to LAG3-expressing lymphocytes. The aptamer described herein has the potential to be used as a therapeutic agent, as it enhances the threshold of T-cell activation. In future applications, it could also be engineered for the treatment of autoimmune diseases by targeted depletion of LAG3-expressing effector T lymphocytes.

  4. A novel cysteine-rich sequence-specific DNA-binding protein interacts with the conserved X-box motif of the human major histocompatibility complex class II genes via a repeated Cys-His domain and functions as a transcriptional repressor

    PubMed Central

    1994-01-01

    The class II major histocompatibility complex (MHC) molecules function in the presentation of processed peptides to helper T cells. As most mammalian cells can endocytose and process foreign antigen, the critical determinant of an antigen-presenting cell is its ability to express class II MHC molecules. Expression of these molecules is usually restricted to cells of the immune system and dysregulated expression is hypothesized to contribute to the pathogenesis of a severe combined immunodeficiency syndrome and certain autoimmune diseases. Human complementary DNA clones encoding a newly identified, cysteine-rich transcription factor, NF-X1, which binds to the conserved X-box motif of class II MHC genes, were obtained, and the primary amino acid sequence deduced. The major open reading frame encodes a polypeptide of 1,104 amino acids with a symmetrical organization. A central cysteine-rich portion encodes the DNA-binding domain, and is subdivided into seven repeated motifs. This motif is similar to but distinct from the LIM domain and the RING finger family, and is reminiscent of known metal-binding regions. The unique arrangement of cysteines indicates that the consensus sequence CX3CXL-XCGX1-5HXCX3CHXGXC represents a novel cysteine-rich motif. Two lines of evidence indicate that the polypeptide encodes a potent and biologically relevant repressor of HLA-DRA transcription: (a) overexpression of NF-X1 from a retroviral construct strongly decreases transcription from the HLA-DRA promoter; and (b) the NF-X1 transcript is markedly induced late after induction with interferon gamma (IFN-gamma), coinciding with postinduction attenuation of HLA-DRA transcription. The NF-X1 protein may therefore play an important role in regulating the duration of an inflammatory response by limiting the period in which class II MHC molecules are induced by IFN-gamma. PMID:7964459

  5. Nucleotide binding database NBDB – a collection of sequence motifs with specific protein-ligand interactions

    PubMed Central

    Zheng, Zejun; Goncearenco, Alexander; Berezovsky, Igor N.

    2016-01-01

    The NBDB database describes protein motifs, elementary functional loops (EFLs), that are involved in binding of nucleotide-containing ligands and other biologically relevant cofactors/coenzymes, including ATP, AMP, ADP, GMP, GDP, GTP, CTP, PAP, PPS, FMN, FAD(H), NAD(H), NADP, cAMP, cGMP, c-di-AMP and c-di-GMP, ThPP, THD, F-420, ACO, CoA, PLP and SAM. The database is freely available online at http://nbdb.bii.a-star.edu.sg. In total, NBDB contains data on 249 motifs that work in interactions with 24 ligands. Sequence profiles of EFL motifs were derived de novo from nonredundant Uniprot proteome sequences. Conserved amino acid residues in the profiles interact specifically with distinct chemical parts of nucleotide-containing ligands, such as nitrogenous bases, phosphate groups, ribose, nicotinamide, and flavin moieties. Each EFL profile in the database is characterized by a pattern of corresponding ligand–protein interactions found in crystallized ligand–protein complexes. The NBDB database helps to explore the determinants of nucleotide and cofactor binding in different protein folds and families. NBDB can also detect fragments in a user-provided protein sequence that match the profiles of particular EFLs. Comprehensive information on the sequences, structures, and interactions of EFLs with ligands provides a foundation for experimental and computational efforts to design required protein functions. PMID:26507856
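
    NBDB matches user sequences against EFL sequence profiles. The sketch below illustrates the general idea of profile matching with a simple log-odds position-specific scoring matrix; it assumes a uniform amino-acid background and a pseudocount of 1, and the names `build_pssm`, `scan_profile` and the toy fragments are hypothetical. It is not NBDB's actual scoring scheme.

```python
import math

AMINO = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(aligned: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log2-odds matrix against a uniform background, from equal-length aligned fragments."""
    background = 1.0 / len(AMINO)
    pssm = []
    for col in range(len(aligned[0])):
        counts = {aa: pseudocount for aa in AMINO}
        for fragment in aligned:
            counts[fragment[col]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2((counts[aa] / total) / background) for aa in AMINO})
    return pssm

def scan_profile(pssm: list[dict[str, float]], query: str, threshold: float) -> list[tuple[int, float]]:
    """Slide the profile along the query (uppercase standard residues only); keep windows scoring >= threshold."""
    width = len(pssm)
    hits = []
    for i in range(len(query) - width + 1):
        score = sum(pssm[j][query[i + j]] for j in range(width))
        if score >= threshold:
            hits.append((i, round(score, 2)))
    return hits

pssm = build_pssm(["GGAGKT", "GGSGKT", "GGPGKS"])  # toy fragments, not NBDB data
print(scan_profile(pssm, "MMGGAGKTLL", 5.0))
```

    The threshold is an arbitrary illustrative value; a real profile search would calibrate it against known positives and negatives.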

  6. Phosphotyrosine Substrate Sequence Motifs for Dual Specificity Phosphatases

    PubMed Central

    Zhao, Bryan M.; Keasey, Sarah L.; Tropea, Joseph E.; Lountos, George T.; Dyas, Beverly K.; Cherry, Scott; Raran-Kurussi, Sreejith; Waugh, David S.; Ulrich, Robert G.

    2015-01-01

    Protein tyrosine phosphatases dephosphorylate tyrosine residues of proteins, whereas dual specificity phosphatases (DUSPs) are a subgroup of protein tyrosine phosphatases that dephosphorylate not only Tyr(P) residues but also Ser(P) and Thr(P) residues. The DUSPs are linked to the regulation of many cellular functions and signaling pathways. Though many cellular targets of DUSPs are known, the relationship between catalytic activity and substrate specificity is poorly defined. We investigated the interactions of peptide substrates with select DUSPs of four types: MAP kinase phosphatases (DUSP1 and DUSP7), atypical (DUSP3, DUSP14, DUSP22 and DUSP27), viral (variola VH1), and Cdc25 (A-C). Phosphatase recognition sites were experimentally determined by measuring dephosphorylation of 6,218 microarrayed Tyr(P) peptides representing confirmed and theoretical phosphorylation motifs from the cellular proteome. A broad continuum of dephosphorylation was observed across the microarrayed peptide substrates for all phosphatases, suggesting a complex relationship between substrate sequence recognition and optimal activity. Further analysis of peptide dephosphorylation by hierarchical clustering indicated that DUSPs could be organized by substrate sequence motifs, and peptide specificities by phylogenetic relationships among the catalytic domains. The most highly dephosphorylated peptides represented proteins from 29 cell-signaling pathways, greatly expanding the list of potential targets of DUSPs. These newly identified DUSP substrates will be important for examining structure-activity relationships with physiologically relevant targets. PMID:26302245

  7. cisExpress: motif detection in DNA sequences.

    PubMed

    Triska, Martin; Grocutt, David; Southern, James; Murphy, Denis J; Tatarinova, Tatiana

    2013-09-01

    One of the major challenges for contemporary bioinformatics is the analysis and accurate annotation of genomic datasets to enable extraction of useful information about the functional role of DNA sequences. This article describes a novel genome-wide statistical approach to the detection of specific DNA sequence motifs based on similarities between the promoters of similarly expressed genes. This new tool, cisExpress, is especially designed for use with large datasets, such as those generated by publicly accessible whole genome and transcriptome projects. cisExpress uses a task farming algorithm to exploit all available computational cores within a shared memory node. We demonstrate the robust nature and validity of the proposed method. It is applicable for use with a wide range of genomic databases for any species of interest. cisExpress is available at www.cisexpress.org.

  8. cisExpress: motif detection in DNA sequences

    PubMed Central

    Triska, Martin; Grocutt, David; Southern, James; Murphy, Denis J.; Tatarinova, Tatiana

    2013-01-01

    Motivation: One of the major challenges for contemporary bioinformatics is the analysis and accurate annotation of genomic datasets to enable extraction of useful information about the functional role of DNA sequences. This article describes a novel genome-wide statistical approach to the detection of specific DNA sequence motifs based on similarities between the promoters of similarly expressed genes. This new tool, cisExpress, is especially designed for use with large datasets, such as those generated by publicly accessible whole genome and transcriptome projects. cisExpress uses a task farming algorithm to exploit all available computational cores within a shared memory node. We demonstrate the robust nature and validity of the proposed method. It is applicable for use with a wide range of genomic databases for any species of interest. Availability: cisExpress is available at www.cisexpress.org. Contact: tatiana.tatarinova@usc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23793750
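
    The abstract does not describe cisExpress's statistical model. As a hedged illustration of the underlying idea (comparing promoters of similarly expressed genes with a background set), the sketch below ranks k-mers by how much more often they occur in foreground promoters than in background promoters, using a pseudocount-smoothed ratio of presence frequencies. All names and parameters (`enriched_kmers`, `k`, `pseudo`, `top`) are assumptions, and the task-farming parallelization is omitted.

```python
from collections import Counter

def kmer_presence(seqs: list[str], k: int) -> Counter:
    """Count in how many sequences each k-mer occurs at least once."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts

def enriched_kmers(foreground, background, k=6, pseudo=1.0, top=5):
    """Rank foreground k-mers by smoothed foreground/background presence ratio."""
    fg, bg = kmer_presence(foreground, k), kmer_presence(background, k)
    n_fg, n_bg = len(foreground), len(background)
    scores = {
        kmer: ((fg[kmer] + pseudo) / (n_fg + 2 * pseudo))
        / ((bg[kmer] + pseudo) / (n_bg + 2 * pseudo))
        for kmer in fg
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]

co_expressed = ["AAAACACGTGAAAA", "TTTCACGTGTTT", "GGCACGTGGG"]  # toy promoters
others = ["AAAAAAAAAA", "TTTTTTTTTT", "GGGGGGGGGG", "CCCCCCCCCC"]
print(enriched_kmers(co_expressed, others)[0])
```

    A ratio like this is only a ranking heuristic; a real tool would attach a significance test and correct for the number of k-mers examined.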

  9. A conserved motif flags Acyl Carrier Proteins for β-branching in polyketide synthesis

    PubMed Central

    Song, Zhongshu; Farmer, Rohit; Williams, Christopher; Hothersall, Joanne; Płoskoń, Eliza; Wattana-amorn, Pakorn; Stephens, Elton R.; Yamada, Erika; Gurney, Rachel; Takebayashi, Yuiko; Masschelein, Joleen; Cox, Russell J.; Lavigne, Rob; Willis, Christine L.; Simpson, Thomas J.; Crosby, John; Winn, Peter J.; Thomas, Christopher M.; Crump, Matthew P.

    2015-01-01

    Type I PKSs often utilise programmed β-branching, via enzymes of an “HMG-CoA synthase (HCS) cassette”, to incorporate various side chains at the second carbon from the terminal carboxylic acid of growing polyketide backbones. We identified a strong sequence motif in Acyl Carrier Proteins (ACPs) where β-branching is known. Substituting ACPs confirmed a correlation of ACP type with β-branching specificity. While these ACPs often occur in tandem, NMR analysis of tandem β-branching ACPs indicated no ACP-ACP synergistic effects and revealed that the conserved sequence motif forms an internal core rather than an exposed patch. Modelling and mutagenesis identified ACP Helix III as a probable anchor point of the ACP-HCS complex whose position is determined by the core. Mutating the core affects ACP functionality while ACP-HCS interface substitutions modulate system specificity. Our method for predicting β-carbon branching expands the potential for engineering novel polyketides and lays a basis for determining specificity rules. PMID:24056399

  10. QGRS-H Predictor: a web server for predicting homologous quadruplex forming G-rich sequence motifs in nucleotide sequences

    PubMed Central

    Menendez, Camille; Frees, Scott; Bagga, Paramjeet S.

    2012-01-01

    Naturally occurring G-quadruplex structural motifs, formed by guanine-rich nucleic acids, have been reported in telomeric, promoter and transcribed regions of mammalian genomes. G-quadruplex structures have received significant attention because of growing evidence for their role in important biological processes, human disease and as therapeutic targets. Lately, there has been much interest in the potential roles of RNA G-quadruplexes as cis-regulatory elements of post-transcriptional gene expression. Large-scale computational genomics studies on G-quadruplexes have difficulty validating their predictions without laborious testing in ‘wet’ labs. We have developed a bioinformatics tool, QGRS-H Predictor that can map and analyze conserved putative Quadruplex forming 'G'-Rich Sequences (QGRS) in mRNAs, ncRNAs and other nucleotide sequences, e.g. promoter, telomeric and gene flanking regions. Identifying conserved regulatory motifs helps validate computations and enhances accuracy of predictions. The QGRS-H Predictor is particularly useful for mapping homologous G-quadruplex forming sequences as cis-regulatory elements in the context of 5′- and 3′-untranslated regions, and CDS sections of aligned mRNA sequences. QGRS-H Predictor features highly interactive graphic representation of the data. It is a unique and user-friendly application that provides many options for defining and studying G-quadruplexes. The QGRS-H Predictor can be freely accessed at: http://quadruplex.ramapo.edu/qgrs/app/start. PMID:22576365
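
    Putative quadruplex-forming G-rich sequences are often found with a simple pattern: four runs of at least three guanines separated by loops of 1-7 nucleotides. The sketch below applies that widely used heuristic as a regular expression; it is an assumption for illustration and does not reproduce the candidate enumeration, scoring or homology mapping that QGRS-H Predictor performs.

```python
import re

# Four G-runs (>= 3 G each) separated by loops of 1-7 nt (heuristic, assumed).
G4_PATTERN = re.compile(r"(?:G{3,}[ACGTUN]{1,7}){3}G{3,}")

def find_g4(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, sequence) for non-overlapping putative G4 motifs."""
    return [(m.start(), m.group()) for m in G4_PATTERN.finditer(seq.upper())]

print(find_g4("TTAGGGTTAGGGTTAGGGTTAGGG"))  # four human telomeric repeats
```

    Because the regex reports one greedy, non-overlapping match per region, it can miss alternative arrangements of the same G-runs; tools in this area typically enumerate overlapping candidates and score them instead.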

  11. Software tools for motif and pattern scanning: program descriptions including a universal sequence reading algorithm.

    PubMed

    Cockwell, K Y; Giles, I G

    1989-07-01

    Two programs, MOTIF and PATTERN, that scan sequences for matches to user-defined motifs and patterns of motifs based on identity and set membership are described. The programs use a simple and logical notation to define motifs, and may be used either interactively or by using command line parameters (suitable for batch processing). The two programs described also incorporate a simple, yet reliable, algorithm that automatically detects in which of six possible formats the sequence entry is written.
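
    The abstract does not give the MOTIF/PATTERN notation itself. Assuming a hypothetical notation in which a letter means identity, a bracketed group such as `[ST]` means set membership and `x` means any residue, a motif can be translated into a regular expression and scanned over a sequence. The function names below are illustrative only.

```python
import re

def motif_to_regex(motif: str) -> re.Pattern:
    """Hypothetical notation: letter = identity, [ABC] = set membership, x = any residue."""
    parts = []
    i = 0
    while i < len(motif):
        if motif[i] == "[":
            end = motif.index("]", i)
            parts.append("[" + re.escape(motif[i + 1:end].upper()) + "]")
            i = end + 1
            continue
        parts.append("." if motif[i].lower() == "x" else re.escape(motif[i].upper()))
        i += 1
    # Lookahead so that overlapping matches are all reported.
    return re.compile("(?=(" + "".join(parts) + "))")

def scan_motif(motif: str, seq: str) -> list[int]:
    """Return 0-based start positions of every match of the motif in seq."""
    return [m.start() for m in motif_to_regex(motif).finditer(seq.upper())]

print(scan_motif("C[ST]xG", "ACSAGCTTG"))
```

    A "pattern of motifs" in the sense of the PATTERN program could then be built by combining the position lists of several motifs with distance constraints.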

  12. Identification of multiple distinct Snf2 subfamilies with conserved structural motifs

    PubMed Central

    Flaus, Andrew; Martin, David M. A.; Barton, Geoffrey J.; Owen-Hughes, Tom

    2006-01-01

    The Snf2 family of helicase-related proteins includes the catalytic subunits of ATP-dependent chromatin remodelling complexes found in all eukaryotes. These act to regulate the structure and dynamic properties of chromatin and so influence a broad range of nuclear processes. We have exploited progress in genome sequencing to assemble a comprehensive catalogue of over 1300 Snf2 family members. Multiple sequence alignment of the helicase-related regions enables 24 distinct subfamilies to be identified, a considerable expansion over earlier surveys. Where information is known, there is a good correlation between biological or biochemical function and these assignments, suggesting Snf2 family motor domains are tuned for specific tasks. Scanning of complete genomes reveals all eukaryotes contain members of multiple subfamilies, whereas they are less common and not ubiquitous in eubacteria or archaea. The large sample of Snf2 proteins enables additional distinguishing conserved sequence blocks within the helicase-like motor to be identified. The establishment of a phylogeny for Snf2 proteins provides an opportunity to make informed assignments of function, and the identification of conserved motifs provides a framework for understanding the mechanisms by which these proteins function. PMID:16738128

  13. A conserved motif in the promoters of several cytokines expressed by human Th2-type lymphocytes.

    PubMed

    Staynov, D Z; Cousins, D J; Lee, T H

    1995-01-01

    We have recently found a novel conserved motif in the promoters of several T-cell-expressed cytokines [human interleukin-2, -4, -5 and -13 and human and mouse granulocyte/macrophage-colony stimulating factor (GM-CSF)]. It contains a core sequence CTTGG ... CCAAG which is present as part of larger palindromic sequences in each gene. This suggests that they may interact with a new family of trans-acting factors. In transfection assays, the human GM-CSF element has a strong positive effect on the expression of a reporter gene by the human T cell line Jurkat J6 upon stimulation. In DNA mobility shift assays, this sequence can give either six different specific bands which are competed out by different parts of the sequence or one specific band which is competed out by each of the inverted repeats, depending on the reconstitution conditions. In different genes, the core sequences are separated by integer numbers of helical turns. Considering the strong positive regulatory effect of this element and its presence in several T-cell-expressed cytokine genes, it may be crucial to the coordinated expression of these cytokines in T helper cells.
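
    CCAAG is the reverse complement of CTTGG, so the core forms an inverted repeat. A minimal sketch that locates such pairs and expresses their spacing in helical turns, assuming a B-DNA helical repeat of 10.5 bp and measuring start-to-start distance (both assumptions, as are the function names and the `max_gap` cutoff):

```python
import re

HELICAL_REPEAT_BP = 10.5  # assumption: canonical B-DNA helical repeat

def revcomp(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def core_pairs(seq: str, core: str = "CTTGG", max_gap: int = 60):
    """Find core ... revcomp(core) pairs; report (start, partner start, bp, turns)."""
    seq = seq.upper()
    partner = revcomp(core)  # CTTGG -> CCAAG
    left = [m.start() for m in re.finditer(f"(?={core})", seq)]
    right = [m.start() for m in re.finditer(f"(?={partner})", seq)]
    pairs = []
    for i in left:
        for j in right:
            spacing = j - i  # start-to-start distance in bp
            if len(core) <= spacing <= max_gap:
                pairs.append((i, j, spacing, round(spacing / HELICAL_REPEAT_BP, 2)))
    return pairs

print(core_pairs("CTTGG" + "A" * 16 + "CCAAG"))
```

    In this toy sequence the two cores start 21 bp apart, which corresponds to exactly two turns under the 10.5 bp assumption.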

  14. Conserved DNA motifs in the type II-A CRISPR leader region

    PubMed Central

    Babu, Kesavan; Najar, Fares Z.

    2017-01-01

    The Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated (CRISPR-Cas) systems consist of RNA-protein complexes that provide bacteria and archaea with sequence-specific immunity against bacteriophages, plasmids, and other mobile genetic elements. Bacteria and archaea become immune to phage or plasmid infections by inserting short pieces of the intruder DNA (spacer) site-specifically into the leader-repeat junction in a process called adaptation. Previous studies have shown that parts of the leader region, especially the 3′ end of the leader, are indispensable for adaptation. However, a comprehensive analysis of leader ends remains absent. Here, we have analyzed the leader, repeat, and Cas proteins from 167 type II-A CRISPR loci. Our results indicate two distinct conserved DNA motifs at the 3′ leader end: ATTTGAG (noted previously in the CRISPR1 locus of Streptococcus thermophilus DGCC7710) and a newly defined CTRCGAG, associated with the CRISPR3 locus of S. thermophilus DGCC7710. A third group with a very short CG DNA conservation at the 3′ leader end is observed mostly in lactobacilli. Analysis of the repeats and Cas proteins revealed clustering of these CRISPR components that mirrors the leader motif clustering, in agreement with the coevolution of CRISPR-Cas components. Based on our analysis of the type II-A CRISPR loci, we implicate leader end sequences that could confer site-specificity for the adaptation machinery in the different subsets of type II-A CRISPR loci. PMID:28392985
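
    The R in CTRCGAG is the IUPAC ambiguity code for A or G. The sketch below checks the 3′ end of a leader for the two motifs named above; it assumes the leader is supplied 5′→3′ ending at the first repeat and uses an arbitrary 20-nt window. The group labels and the `window` parameter are hypothetical.

```python
import re

# IUPAC nucleotide codes translated to regex fragments (subset).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}

# Hypothetical labels for the two leader-end motif groups.
LEADER_END_MOTIFS = {"ATTTGAG": "CRISPR1-like", "CTRCGAG": "CRISPR3-like"}

def iupac_regex(motif: str) -> re.Pattern:
    return re.compile("".join(IUPAC[base] for base in motif.upper()))

def classify_leader_end(leader: str, window: int = 20) -> list[str]:
    """Return labels of motifs found in the last `window` nt of the leader (5'->3')."""
    tail = leader.upper()[-window:]
    return [label for motif, label in LEADER_END_MOTIFS.items()
            if iupac_regex(motif).search(tail)]

print(classify_leader_end("GGGGGGGGGGCTACGAG"))
```

    Restricting the search to the leader's 3′ end reflects the abstract's focus on the leader-repeat junction; the window size would need tuning against real loci.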

  15. Sequence Motifs in MADS Transcription Factors Responsible for Specificity and Diversification of Protein-Protein Interaction

    PubMed Central

    van Dijk, Aalt D. J.; Morabito, Giuseppa; Fiers, Martijn; van Ham, Roeland C. H. J.; Angenent, Gerco C.; Immink, Richard G. H.

    2010-01-01

    Protein sequences encompass tertiary structures and contain information about specific molecular interactions, which in turn determine biological functions of proteins. Knowledge about how protein sequences define interaction specificity is largely missing, in particular for paralogous protein families with high sequence similarity, such as the plant MADS domain transcription factor family. In comparison to the situation in mammalian species, this important family of transcription regulators has expanded enormously in plant species and contains over 100 members in the model plant species Arabidopsis thaliana. Here, we provide insight into the mechanisms that determine protein-protein interaction specificity for the Arabidopsis MADS domain transcription factor family, using an integrated computational and experimental approach. Plant MADS proteins have highly similar amino acid sequences, but their dimerization patterns vary substantially. Our computational analysis uncovered small sequence regions that explain observed differences in dimerization patterns with reasonable accuracy. Furthermore, we show the usefulness of the method for prediction of MADS domain transcription factor interaction networks in other plant species. Introduction of mutations in the predicted interaction motifs demonstrated that single amino acid mutations can have a large effect and lead to loss or gain of specific interactions. In addition, various bioinformatics analyses shed light on the way evolution has shaped MADS domain transcription factor interaction specificity. Identified protein-protein interaction motifs appeared to be strongly conserved among orthologs, indicating their evolutionary importance. We also provide evidence that mutations in these motifs can be a source for sub- or neo-functionalization. The analyses presented here take us a step forward in understanding protein-protein interactions and the interplay between protein sequences and network evolution.
PMID

  16. Human and mouse introns are linked to the same processes and functions through each genome's most frequent non-conserved motifs.

    PubMed

    Tsirigos, Aristotelis; Rigoutsos, Isidore

    2008-06-01

    We identified the most frequent, variable-length DNA sequence motifs in the human and mouse genomes and sub-selected those with multiple recurrences in the intergenic and intronic regions and at least one additional exonic instance in the corresponding genome. We discovered that these motifs have virtually no overlap with intronic sequences that are conserved between human and mouse, and thus are genome-specific. Moreover, we found that these motifs span a substantial fraction of previously uncharacterized human and mouse intronic space. Surprisingly, we found that these genome-specific motifs are over-represented in the introns of genes belonging to the same biological processes and molecular functions in both the human and mouse genomes even though the underlying sequences are not conserved between the two genomes. In fact, the processes and functions that are linked to these genome-specific sequence-motifs are distinct from the processes and functions which are associated with intronic regions that are conserved between human and mouse. The findings show that intronic regions from different genomes are linked to the same processes and functions in the absence of underlying sequence conservation. We highlight the ramifications of this observation with a concrete example that involves the microsatellite instability gene MLH1.

  17. TOPDOM: database of conservatively located domains and motifs in proteins.

    PubMed

    Varga, Julia; Dobson, László; Tusnády, Gábor E

    2016-09-01

    The TOPDOM database, originally created as a collection of domains and motifs located consistently on the same side of the membrane in α-helical transmembrane proteins, has been updated and extended to include consistently localized domains and motifs in globular proteins as well. By taking advantage of the recently developed CCTOP algorithm to determine the type of a protein and to predict topology for transmembrane proteins, by applying a thorough search for domains and motifs, and by using the most up-to-date version of all source databases, we achieved a 6-fold increase in the size of the whole database and a 2-fold increase in the number of transmembrane proteins. The TOPDOM database is available at http://topdom.enzim.hu. The webpage uses the common Apache, PHP5 and MySQL software to provide the user interface for accessing and searching the database. The database itself is generated on a high-performance computer. Contact: tusnady.gabor@ttk.mta.hu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  18. Conservation of MHC class II DOA sequences among carnivores.

    PubMed

    Soll, S J; Stewart, B S; Lehman, N

    2005-03-01

    We obtained the nucleotide sequence for most of the major histocompatibility complex (MHC) class II DOA locus for Weddell, leopard, northern elephant, and southern elephant seals and from the coyote and compared them to all known DOA data available to date. We found generally low levels of interspecific polymorphisms, providing further support for stabilizing selection acting on the DOA locus. This suggests that DO gene products play a substantial functional role in the regulation of antigen presentation. A seven-amino-acid motif of VWRLPEF was found to be conserved across all DOA sequences and may be a DO-specific recognition element.

  19. Comprehensive analysis of animal TALE homeobox genes: new conserved motifs and cases of accelerated evolution.

    PubMed

    Mukherjee, Krishanu; Bürglin, Thomas R

    2007-08-01

    TALE homeodomain proteins are an ancient subgroup within the group of homeodomain transcription factors that play important roles in animal, plant, and fungal development. We have extracted the full complement of TALE superclass homeobox genes from the genome projects of seven protostomes, seven deuterostomes, and Nematostella. This was supplemented with TALE homeobox genes from additional species and phylogenetic analyses were carried out with 276 sequences. We found 20 homeobox genes and 4 pseudogenes in humans, 21 genes in mouse, 8 genes in Drosophila, and 5 genes plus one truncated gene in Caenorhabditis elegans. Apart from the previously identified TALE classes MEIS, PBC, IRO, and TGIF, a novel class is identified, termed MOHAWK (MKX). Further, we show that the MEIS class can be divided into two families, PREP and MEIS. Prep genes have previously only been described in vertebrates but are lacking in Drosophila. Here we identify orthologues in other insect taxa as well as in the cnidarian Nematostella. In C. elegans, a divergent Prep protein has lost the homeodomain. Full-length multiple sequence alignment of the protostome and deuterostome sequences allowed us to identify several novel conserved motifs within the MKX, TGIF, and MEIS classes. Phylogenetic analyses revealed fast-evolving PBC class genes; in particular, some X-linked PBC genes in nematodes are subject to rapid evolution. In addition, several instances of gene loss were identified. In conclusion, our comprehensive analysis provides a defining framework for the classification of animal TALE homeobox genes and the understanding of their evolution.

  20. RNA polymerase II senses obstruction in the DNA minor groove via a conserved sensor motif

    PubMed Central

    Xu, Liang; Wang, Wei; Gotte, Deanna; Yang, Fei; Hare, Alissa A.; Welch, Timothy R.; Li, Benjamin C.; Shin, Ji Hyun; Chong, Jenny; Strathern, Jeffrey N.; Dervan, Peter B.; Wang, Dong

    2016-01-01

    RNA polymerase II (pol II) encounters numerous barriers during transcription elongation, including DNA strand breaks, DNA lesions, and nucleosomes. Pyrrole-imidazole (Py-Im) polyamides bind to the minor groove of DNA with programmable sequence specificity and high affinity. Previous studies suggest that Py-Im polyamides can prevent transcription factor binding, as well as interfere with pol II transcription elongation. However, the mechanism of pol II inhibition by Py-Im polyamides is unclear. Here we investigate the mechanism of how these minor-groove binders affect pol II transcription elongation. In the presence of site-specifically bound Py-Im polyamides, we find that the pol II elongation complex becomes arrested immediately upstream of the targeted DNA sequence, and is not rescued by transcription factor IIS, which is in contrast to pol II blockage by a nucleosome barrier. Further analysis reveals that two conserved pol II residues in the Switch 1 region contribute to pol II stalling. Our study suggests this motif in pol II can sense the structural changes of the DNA minor groove and can be considered a “minor groove sensor.” Prolonged interference of transcription elongation by sequence-specific minor groove binders may present opportunities to target transcription addiction for cancer therapy. PMID:27791148

  1. Ser/Thr Motifs in Transmembrane Proteins: Conservation Patterns and Effects on Local Protein Structure and Dynamics

    PubMed Central

    del Val, Coral; White, Stephen H.

    2014-01-01

    We combined systematic bioinformatics analyses and molecular dynamics simulations to assess the conservation patterns of Ser and Thr motifs in membrane proteins, and the effect of such motifs on the structure and dynamics of α-helical transmembrane (TM) segments. We find that Ser/Thr motifs are often present in β-barrel TM proteins. At least one Ser/Thr motif is present in almost half of the sequences of α-helical proteins analyzed here. The extensive bioinformatics analyses and inspection of protein structures led to the identification of molecular transporters with noticeable numbers of Ser/Thr motifs within the TM region. Given the energetic penalty for burying multiple Ser/Thr groups in the membrane hydrophobic core, the observation of transporters with multiple membrane-embedded Ser/Thr is intriguing and raises the question of how the presence of multiple Ser/Thr affects protein local structure and dynamics. Molecular dynamics simulations of four different Ser-containing model TM peptides indicate that backbone hydrogen bonding of membrane-buried Ser/Thr hydroxyl groups can significantly change the local structure and dynamics of the helix. Ser groups located close to the membrane interface can hydrogen bond to solvent water instead of protein backbone, leading to an enhanced local solvation of the peptide. PMID:22836667

  2. Sequence-specific high mobility group box factors recognize 10-12-base pair minor groove motifs.

    PubMed

    van Beest, M; Dooijes, D; van De Wetering, M; Kjaerulff, S; Bonvin, A; Nielsen, O; Clevers, H

    2000-09-01

    Sequence-specific high mobility group (HMG) box factors bind and bend DNA via interactions in the minor groove. Three-dimensional NMR analyses have provided the structural basis for this interaction. The cognate HMG domain DNA motif is generally believed to span 6-8 bases. However, alignment of promoter elements controlled by the yeast genes ste11 and Rox1 has indicated strict conservation of a larger DNA motif. By site selection, we identify a highly specific 12-base pair motif for Ste11, AGAACAAAGAAA. Similarly, we show that Tcf1, MatMc, and Sox4 bind unique, highly specific DNA motifs of 12, 12, and 10 base pairs, respectively. Footprinting with a deletion mutant of Ste11 reveals a novel interaction between the 3' base pairs of the extended DNA motif and amino acids C-terminal to the HMG domain. The sequence-specific interaction of Ste11 with these 3' base pairs contributes significantly to binding and bending of the DNA motif.
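
    The abstract reports that Ste11 recognises the 12-bp site AGAACAAAGAAA. The sketch below shows one way to find such a site on either strand of a DNA sequence. The function names are hypothetical, and exact string matching is an assumption: real binding sites can tolerate some mismatches.

```python
# Hypothetical helpers for finding an exact DNA motif on both strands.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif_both_strands(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return (0-based forward-strand start, strand) for every exact hit."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    rc_motif = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            hits.append((i, "+"))
        if window == rc_motif:  # the site lies on the reverse strand
            hits.append((i, "-"))
    return hits


STE11_SITE = "AGAACAAAGAAA"  # 12-bp Ste11 site from the abstract

demo = "GG" + STE11_SITE + "CC" + reverse_complement(STE11_SITE)
print(find_motif_both_strands(demo, STE11_SITE))  # [(2, '+'), (16, '-')]
```

    Scanning for the reverse complement of the motif avoids building the reverse-strand sequence, and it reports positions in forward-strand coordinates.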

  3. Illumina MiSeq sequencing disfavours a sequence motif in the GFP reporter gene

    PubMed Central

    Van den Hoecke, Silvie; Verhelst, Judith; Saelens, Xavier

    2016-01-01

    Green fluorescent protein (GFP) is one of the most widely used reporter genes. We have used next-generation sequencing (NGS) to analyse the genetic diversity of a recombinant influenza A virus that expresses GFP and found a remarkable coverage dip in the GFP coding sequence. This coverage dip was present when virus-derived RT-PCR product or the parental plasmid DNA was used as starting material for NGS, regardless of whether Nextera XT transposase or Covaris shearing was used for DNA fragmentation. Therefore, the sequence coverage dip in the GFP coding sequence was not the result of emerging GFP mutant viruses or a bias introduced by Nextera XT fragmentation. Instead, we found that the Illumina MiSeq sequencing method disfavours the ‘CCCGCC’ motif in the GFP coding sequence. PMID:27193250

  4. Identification of sequence motifs in oligonucleotides whose presence is correlated with antisense activity.

    PubMed

    Matveeva, O V; Tsodikov, A D; Giddings, M; Freier, S M; Wyatt, J R; Spiridonov, A N; Shabalina, S A; Gesteland, R F; Atkins, J F

    2000-08-01

    Design of antisense oligonucleotides targeting any mRNA can be much more efficient when several activity-enhancing motifs are included and activity-decreasing motifs are avoided. This conclusion was drawn from a statistical analysis of data collected from >1000 experiments with phosphorothioate-modified oligonucleotides. A highly significant positive correlation was demonstrated between the presence of the motifs CCAC, TCCC, ACTC, GCCA and CTCT in an oligonucleotide and its antisense efficiency. In addition, a negative correlation was revealed for the motifs GGGG, ACTG, AAA and TAA. The likelihood that an oligonucleotide is active against a desired mRNA target was found to depend on its sequence motif content.

  5. Identification of sequence motifs in oligonucleotides whose presence is correlated with antisense activity

    PubMed Central

    Matveeva, O. V.; Tsodikov, A. D.; Giddings, M.; Freier, S. M.; Wyatt, J. R.; Spiridonov, A. N.; Shabalina, S. A.; Gesteland, R. F.; Atkins, J. F.

    2000-01-01

    Design of antisense oligonucleotides targeting any mRNA can be much more efficient when several activity-enhancing motifs are included and activity-decreasing motifs are avoided. This conclusion was drawn from a statistical analysis of data collected from >1000 experiments with phosphorothioate-modified oligonucleotides. A highly significant positive correlation was demonstrated between the presence of the motifs CCAC, TCCC, ACTC, GCCA and CTCT in an oligonucleotide and its antisense efficiency. In addition, a negative correlation was revealed for the motifs GGGG, ACTG, AAA and TAA. The likelihood that an oligonucleotide is active against a desired mRNA target was found to depend on its sequence motif content. PMID:10908347
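
    The abstract lists motifs that correlate positively and negatively with antisense activity. As a rough illustration only, candidate oligonucleotides could be ranked by counting those motifs. The unweighted +1/-1 scoring rule and the function names are hypothetical; they are not the authors' statistical model.

```python
# Motif lists are taken from the abstract; the +1/-1 scoring rule is a hypothetical illustration.
ENHANCING = ("CCAC", "TCCC", "ACTC", "GCCA", "CTCT")
DECREASING = ("GGGG", "ACTG", "AAA", "TAA")


def count_overlapping(seq: str, motif: str) -> int:
    """Count motif occurrences in seq, allowing overlaps (AAA occurs twice in AAAA)."""
    return sum(1 for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i))


def motif_score(oligo: str) -> int:
    """+1 per enhancing-motif hit, -1 per decreasing-motif hit, unweighted."""
    oligo = oligo.upper()
    gains = sum(count_overlapping(oligo, m) for m in ENHANCING)
    losses = sum(count_overlapping(oligo, m) for m in DECREASING)
    return gains - losses


def rank_oligos(candidates: list[str]) -> list[str]:
    """Order candidate oligonucleotides from highest to lowest motif score."""
    return sorted(candidates, key=motif_score, reverse=True)


print(rank_oligos(["GGGGAAA", "CCACTCCC"]))  # ['CCACTCCC', 'GGGGAAA']
```

    A realistic ranking would weight each motif by its measured effect size, which the abstract does not report.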

  6. Evolutionarily divergent spliceosomal snRNAs and a conserved non-coding RNA processing motif in Giardia lamblia

    PubMed Central

    Hudson, Andrew J.; Moore, Ashley N.; Elniski, David; Joseph, Joella; Yee, Janet; Russell, Anthony G.

    2012-01-01

    Non-coding RNAs (ncRNAs) have diverse essential biological functions in all organisms, and in eukaryotes, two such classes of ncRNAs are the small nucleolar (sno) and small nuclear (sn) RNAs. In this study, we have identified and characterized a collection of sno and snRNAs in Giardia lamblia, by exploiting our discovery of a conserved 12 nt RNA processing sequence motif found in the 3′ end regions of a large number of G. lamblia ncRNA genes. RNA end mapping and other experiments indicate the motif serves to mediate ncRNA 3′ end formation from mono- and di-cistronic RNA precursor transcripts. Remarkably, we find the motif is also utilized in the processing pathway of all four previously identified trans-spliced G. lamblia introns, revealing a common RNA processing pathway for ncRNAs and trans-spliced introns in this organism. Motif sequence conservation then allowed for the bioinformatic and experimental identification of additional G. lamblia ncRNAs, including new U1 and U6 spliceosomal snRNA candidates. The U6 snRNA candidate was then used as a tool to identify novel U2 and U4 snRNAs, based on predicted phylogenetically conserved snRNA–snRNA base-pairing interactions, from a set of previously identified G. lamblia ncRNAs without assigned function. The Giardia snRNAs retain the core features of spliceosomal snRNAs but are sufficiently evolutionarily divergent to explain the difficulties in their identification. Most intriguingly, all of these snRNAs show structural features diagnostic of U2-dependent/major and U12-dependent/minor spliceosomal snRNAs. PMID:23019220

  7. Evolutionarily divergent spliceosomal snRNAs and a conserved non-coding RNA processing motif in Giardia lamblia.

    PubMed

    Hudson, Andrew J; Moore, Ashley N; Elniski, David; Joseph, Joella; Yee, Janet; Russell, Anthony G

    2012-11-01

    Non-coding RNAs (ncRNAs) have diverse essential biological functions in all organisms, and in eukaryotes, two such classes of ncRNAs are the small nucleolar (sno) and small nuclear (sn) RNAs. In this study, we have identified and characterized a collection of sno and snRNAs in Giardia lamblia, by exploiting our discovery of a conserved 12 nt RNA processing sequence motif found in the 3' end regions of a large number of G. lamblia ncRNA genes. RNA end mapping and other experiments indicate the motif serves to mediate ncRNA 3' end formation from mono- and di-cistronic RNA precursor transcripts. Remarkably, we find the motif is also utilized in the processing pathway of all four previously identified trans-spliced G. lamblia introns, revealing a common RNA processing pathway for ncRNAs and trans-spliced introns in this organism. Motif sequence conservation then allowed for the bioinformatic and experimental identification of additional G. lamblia ncRNAs, including new U1 and U6 spliceosomal snRNA candidates. The U6 snRNA candidate was then used as a tool to identify novel U2 and U4 snRNAs, based on predicted phylogenetically conserved snRNA-snRNA base-pairing interactions, from a set of previously identified G. lamblia ncRNAs without assigned function. The Giardia snRNAs retain the core features of spliceosomal snRNAs but are sufficiently evolutionarily divergent to explain the difficulties in their identification. Most intriguingly, all of these snRNAs show structural features diagnostic of U2-dependent/major and U12-dependent/minor spliceosomal snRNAs.

  8. Active motif finder - a bio-tool based on mutational structures in DNA sequences

    PubMed Central

    Udayakumar, Mani; Shanmuga-priya, Palaniyandi; Hemavathi, Kamalakannan; Seenivasagam, Rengasamy

    2011-01-01

    Active Motif Finder (AMF) is a novel algorithmic tool designed around mutations in DNA sequences. Currently available motif-finding tools are based on matching a given motif in the query sequence. AMF uses a new algorithm that identifies occurrences of patterns containing any kind of mutation, including insertions, deletions and mismatches. The algorithm is based mainly on computing an Alignment Score Matrix (ASM) that compares the input motif with the full-length sequence. Much of the effort in bioinformatics is directed toward identifying such motifs in the sequences of newly discovered genes. The proposed bio-tool serves as an open resource for analysis and is useful for studying polymorphisms in DNA sequences. AMF can be searched via a user-friendly interface. This tool is intended to serve the scientific community working in the areas of chemical and structural biology, and is freely available to all users at http://www.sastra.edu/scbt/amf/. PMID:23554723
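
    AMF is described as finding motif occurrences that contain insertions, deletions and mismatches by computing an alignment score matrix between the motif and the full sequence. A common textbook way to do this is a semi-global edit-distance table (Sellers' algorithm), sketched below. This is a swapped-in standard technique; it is not claimed to be AMF's actual scoring scheme, and the function name is hypothetical.

```python
def approximate_occurrences(text: str, pattern: str, max_edits: int) -> list[tuple[int, int]]:
    """Sellers-style semi-global alignment.

    Returns (end, distance) for every position where some substring text[start:end]
    matches `pattern` with at most `max_edits` insertions, deletions or mismatches.
    """
    m = len(pattern)
    # Column for the empty text prefix: aligning pattern[:i] against nothing costs i.
    prev = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text, start=1):
        # Row 0 stays 0, so an occurrence may start anywhere in the text.
        curr = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            curr[i] = min(
                prev[i - 1] + cost,  # match or mismatch
                prev[i] + 1,         # extra character in the text (insertion)
                curr[i - 1] + 1,     # pattern character missing from the text (deletion)
            )
        if curr[m] <= max_edits:
            hits.append((j, curr[m]))
        prev = curr
    return hits


print(approximate_occurrences("ACGTTACGA", "ACGT", 0))  # [(4, 0)]
```

    Raising `max_edits` to 1 also reports the mismatched copy ACGA ending at position 9, which is the kind of mutated occurrence an exact matcher would miss.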

  9. Exhaustive Search for Over-represented DNA Sequence Motifs with CisFinder

    PubMed Central

    Sharov, Alexei A.; Ko, Minoru S.H.

    2009-01-01

    We present CisFinder software, which generates a comprehensive list of motifs enriched in a set of DNA sequences and describes them with position frequency matrices (PFMs). A new algorithm was designed to estimate PFMs directly from counts of n-mer words with and without gaps; then PFMs are extended over gaps and flanking regions and clustered to generate non-redundant sets of motifs. The algorithm successfully identified binding motifs for 12 transcription factors (TFs) in embryonic stem cells based on published chromatin immunoprecipitation sequencing data. Furthermore, CisFinder successfully identified alternative binding motifs of TFs (e.g. POU5F1, ESRRB, and CTCF) and motifs for known and unknown co-factors of genes associated with the pluripotent state of ES cells. CisFinder also showed robust performance in the identification of motifs that were only slightly enriched in a set of DNA sequences. PMID:19740934
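
    CisFinder reports motifs as position frequency matrices (PFMs). The sketch below illustrates what a PFM is by building one from equal-length aligned sites. CisFinder itself estimates PFMs from n-mer word counts, which this simplified helper does not attempt; the function names and the example sites are hypothetical.

```python
BASES = "ACGT"


def position_frequency_matrix(sites: list[str], pseudocount: float = 0.0) -> list[dict[str, float]]:
    """Build a PFM (one {base: frequency} dict per column) from equal-length sites."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    pfm = []
    for col in range(width):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[col].upper()] += 1  # raises KeyError for non-ACGT characters
        total = sum(counts.values())
        pfm.append({b: counts[b] / total for b in BASES})
    return pfm


def consensus(pfm: list[dict[str, float]]) -> str:
    """Most frequent base per column (ties resolve to the first base in ACGT order)."""
    return "".join(max(BASES, key=col.__getitem__) for col in pfm)


sites = ["ACGT", "ACGA", "ACCT", "TCGT"]  # hypothetical aligned sites
pfm = position_frequency_matrix(sites)
print(consensus(pfm))  # ACGT
```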

  10. Sequence motifs and prokaryotic expression of the reptilian paramyxovirus fusion protein

    USGS Publications Warehouse

    Franke, J.; Batts, W.N.; Ahne, W.; Kurath, G.; Winton, J.R.

    2006-01-01

    Fourteen reptilian paramyxovirus isolates were chosen to represent the known extent of genetic diversity among this novel group of viruses. Selected regions of the fusion (F) gene were sequenced, analyzed and compared. The F gene of all isolates contained conserved motifs homologous to those described for other members of the family Paramyxoviridae including: signal peptide, transmembrane domain, furin cleavage site, fusion peptide, N-linked glycosylation sites, and two heptad repeats, the second of which (HRB-LZ) had the characteristics of a leucine zipper. Selected regions of the fusion gene of isolate Gono-GER85 were inserted into a prokaryotic expression system to generate three recombinant protein fragments of various sizes. The longest recombinant protein was cleaved by furin into two fragments of predicted length. Western blot analysis with virus-neutralizing rabbit-antiserum against this isolate demonstrated that only the longest construct reacted with the antiserum. This construct was unique in containing 30 additional C-terminal amino acids that included most of the HRB-LZ. These results indicate that the F genes of reptilian paramyxoviruses contain highly conserved motifs typical of other members of the family and suggest that the HRB-LZ domain of the reptilian paramyxovirus F protein contains a linear antigenic epitope. © Springer-Verlag 2005.

  11. Sequence motifs and prokaryotic expression of the reptilian paramyxovirus fusion protein.

    PubMed

    Franke, J; Batts, W N; Ahne, W; Kurath, G; Winton, J R

    2006-03-01

    Fourteen reptilian paramyxovirus isolates were chosen to represent the known extent of genetic diversity among this novel group of viruses. Selected regions of the fusion (F) gene were sequenced, analyzed and compared. The F gene of all isolates contained conserved motifs homologous to those described for other members of the family Paramyxoviridae including: signal peptide, transmembrane domain, furin cleavage site, fusion peptide, N-linked glycosylation sites, and two heptad repeats, the second of which (HRB-LZ) had the characteristics of a leucine zipper. Selected regions of the fusion gene of isolate Gono-GER85 were inserted into a prokaryotic expression system to generate three recombinant protein fragments of various sizes. The longest recombinant protein was cleaved by furin into two fragments of predicted length. Western blot analysis with virus-neutralizing rabbit-antiserum against this isolate demonstrated that only the longest construct reacted with the antiserum. This construct was unique in containing 30 additional C-terminal amino acids that included most of the HRB-LZ. These results indicate that the F genes of reptilian paramyxoviruses contain highly conserved motifs typical of other members of the family and suggest that the HRB-LZ domain of the reptilian paramyxovirus F protein contains a linear antigenic epitope.

  12. A conserved motif mediates both multimer formation and allosteric activation of phosphoglycerate mutase 5.

    PubMed

    Wilkins, Jordan M; McConnell, Cyrus; Tipton, Peter A; Hannink, Mark

    2014-09-05

    Phosphoglycerate mutase 5 (PGAM5) is an atypical mitochondrial Ser/Thr phosphatase that modulates mitochondrial dynamics and participates in both apoptotic and necrotic cell death. The mechanisms that regulate the phosphatase activity of PGAM5 are poorly understood. The C-terminal phosphoglycerate mutase domain of PGAM5 shares homology with the catalytic domains found in other members of the phosphoglycerate mutase family, including a conserved histidine that is absolutely required for catalytic activity. However, this conserved domain is not sufficient for maximal phosphatase activity. We have identified a highly conserved amino acid motif, WDXNWD, located within the unique N-terminal region, which is required for assembly of PGAM5 into large multimeric complexes. Alanine substitutions within the WDXNWD motif abolish the formation of multimeric complexes and markedly reduce phosphatase activity of PGAM5. A peptide containing the WDXNWD motif dissociates the multimeric complex and reduces but does not fully abolish phosphatase activity. Addition of the WDXNWD-containing peptide in trans to a mutant PGAM5 protein lacking the WDXNWD motif markedly increases phosphatase activity of the mutant protein. Our results are consistent with an intermolecular allosteric regulation mechanism for the phosphatase activity of PGAM5, in which the assembly of PGAM5 into multimeric complexes, mediated by the WDXNWD motif, results in maximal activation of phosphatase activity. Our results suggest the possibility of identifying small molecules that function as allosteric regulators of the phosphatase activity of PGAM5.
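
    The PGAM5 study defines the motif WDXNWD and probes it with alanine substitutions. The sketch below locates WDXNWD matches with a regular expression and lists single-alanine mutants for each motif position. The abstract does not say whether single or combined substitutions were made, so the single-residue design is an assumption, and the names and toy sequence are hypothetical.

```python
import re

# WDXNWD from the abstract; X is read here as "any single amino acid".
WDXNWD = re.compile(r"WD[A-Z]NWD")


def alanine_scan(protein: str, pattern: re.Pattern = WDXNWD) -> list[tuple[str, str]]:
    """For each motif hit, return (label, mutant) for every single motif residue
    replaced by alanine. Labels use 1-based positions, e.g. W3A."""
    variants = []
    for match in pattern.finditer(protein):
        for pos in range(match.start(), match.end()):
            wild_type = protein[pos]
            if wild_type == "A":
                continue  # already alanine; nothing to substitute
            mutant = protein[:pos] + "A" + protein[pos + 1:]
            variants.append((f"{wild_type}{pos + 1}A", mutant))
    return variants


for label, seq in alanine_scan("MKWDLNWDE"):  # hypothetical toy sequence
    print(label, seq)
```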

  13. HIGEDA: a hierarchical gene-set genetics based algorithm for finding subtle motifs in biological sequences.

    PubMed

    Le, Thanh; Altman, Tom; Gardiner, Katheleen

    2010-02-01

    Identification of motifs in biological sequences is a challenging problem because such motifs are often short, degenerate, and may contain gaps. Most algorithms that have been developed for motif-finding use the expectation-maximization (EM) algorithm iteratively. Although EM algorithms can converge quickly, they depend strongly on initialization parameters and can converge to local sub-optimal solutions. In addition, they cannot generate gapped motifs. The effectiveness of EM algorithms in motif finding can be improved by incorporating methods that choose different sets of initial parameters to enable escape from local optima, and that allow gapped alignments within motif models. We have developed HIGEDA, an algorithm that uses the hierarchical gene-set genetic algorithm (HGA) with EM to initiate and search for the best parameters for the motif model. In addition, HIGEDA can identify gapped motifs using a position weight matrix and dynamic programming to generate an optimal gapped alignment of the motif model with sequences from the dataset. We show that HIGEDA outperforms MEME and other motif-finding algorithms on both DNA and protein sequences. Source code and test datasets are available for download at http://ouray.cudenver.edu/~tnle/, implemented in C++ and supported on Linux and MS Windows.
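
    HIGEDA represents motifs with a position weight matrix (PWM). The following is a minimal sketch of the scoring step alone. It converts per-column base counts into log-odds weights against a uniform background (an assumption) and reports the best-scoring window. The genetic-algorithm/EM search and the gapped alignment described in the abstract are not shown, and the counts are hypothetical.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts: list[dict[str, int]], pseudocount: float = 1.0,
                 background: float = 0.25) -> list[dict[str, float]]:
    """Turn per-column base counts into log2(p / background) weights."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + len(BASES) * pseudocount
        pwm.append({b: math.log2((column.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def best_window(seq: str, pwm: list[dict[str, float]]) -> tuple[int, float]:
    """Return (start, score) of the highest-scoring window; seq must be A/C/G/T only."""
    width = len(pwm)
    best = None
    for i in range(len(seq) - width + 1):
        score = sum(pwm[k][seq[i + k]] for k in range(width))
        if best is None or score > best[1]:
            best = (i, score)
    return best


# Hypothetical counts for a 3-column motif whose consensus is TAT
counts = [{"T": 8}, {"A": 8}, {"T": 8}]
pwm = log_odds_pwm(counts)
print(best_window("GGGTATGG", pwm))
```

    The pseudocount keeps unseen bases from producing log(0), which is one reason EM-based finders usually smooth their matrices.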

  14. Conserved amino acid motifs from the novel Piv/MooV family of transposases and site-specific recombinases are required for catalysis of DNA inversion by Piv.

    PubMed

    Tobiason, D M; Buchner, J M; Thiel, W H; Gernert, K M; Karls, A C

    2001-02-01

    Piv, a site-specific invertase from Moraxella lacunata, exhibits amino acid homology with the transposases of the IS110/IS492 family of insertion elements. The functions of conserved amino acid motifs that define this novel family of both transposases and site-specific recombinases (Piv/MooV family) were examined by mutagenesis of fully conserved amino acids within each motif in Piv. All Piv mutants altered in conserved residues were defective for in vivo inversion of the M. lacunata invertible DNA segment, but competent for in vivo binding to Piv DNA recognition sequences. Although the primary amino acid sequences of the Piv/MooV recombinases do not contain a conserved DDE motif, which defines the retroviral integrase/transposase (IN/Tnps) family, the predicted secondary structural elements of Piv align well with those of the IN/Tnps for which crystal structures have been determined. Molecular modelling of Piv based on these alignments predicts that E59, conserved as either E or D in the Piv/MooV family, forms a catalytic pocket with the conserved D9 and D101 residues. Analysis of Piv E59G confirms a role for E59 in catalysis of inversion. These results suggest that Piv and the related IS110/IS492 transposases mediate DNA recombination by a common mechanism involving a catalytic DED or DDD motif.

  15. PairMotifChIP: A Fast Algorithm for Discovery of Patterns Conserved in Large ChIP-seq Data Sets

    PubMed Central

    Huo, Hongwei; Feng, Dazheng

    2016-01-01

    Identifying conserved patterns in DNA sequences, namely motif discovery, is an important and challenging computational task. Because high-throughput sequencing data sets contain hundreds of sequences or more, they can improve the accuracy of motif discovery but require even greater computing performance. To identify motifs efficiently in large DNA data sets, a new algorithm called PairMotifChIP is proposed, which extracts and combines pairs of l-mers in the input that have a relatively small Hamming distance. In particular, a method for rapidly extracting pairs of l-mers is designed that can be used not only in PairMotifChIP but also in other DNA data mining tasks with the same requirement. Experimental results on simulated data show that the proposed algorithm finds motifs successfully and runs faster than state-of-the-art motif discovery algorithms. Furthermore, the validity of the proposed algorithm has been verified on real data. PMID:27843946
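
    The basic unit in PairMotifChIP is a pair of l-mers, drawn from the input sequences, whose Hamming distance is small. The brute-force sketch below lists such pairs between two sequences. It runs in quadratic time and does not reproduce the paper's fast extraction method. The function names are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def close_lmer_pairs(s1: str, s2: str, lmer_len: int, d: int) -> list[tuple[int, int]]:
    """Brute force: all (i, j) with hamming(s1[i:i+l], s2[j:j+l]) <= d."""
    lmers1 = [s1[i:i + lmer_len] for i in range(len(s1) - lmer_len + 1)]
    lmers2 = [s2[j:j + lmer_len] for j in range(len(s2) - lmer_len + 1)]
    return [(i, j)
            for i, a in enumerate(lmers1)
            for j, b in enumerate(lmers2)
            if hamming(a, b) <= d]


print(close_lmer_pairs("AACGTT", "CCGTA", 3, 1))  # [(1, 0), (2, 1), (3, 2)]
```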

  16. Computational generation and screening of RNA motifs in large nucleotide sequence pools

    PubMed Central

    Kim, Namhee; Izzo, Joseph A.; Elmetwaly, Shereef; Gan, Hin Hark; Schlick, Tamar

    2010-01-01

    Although identification of active motifs in large random sequence pools is central to RNA in vitro selection, no systematic computational equivalent of this process has yet been developed. We develop a computational approach that combines target pool generation, motif scanning and motif screening using secondary structure analysis for applications to 10^12–10^14-sequence pools; large pool sizes are made possible using program redesign and supercomputing resources. We use the new protocol to search for aptamer and ribozyme motifs in pools up to experimental pool size (10^14 sequences). We show that motif scanning, structure matching and flanking sequence analysis, respectively, reduce the initial sequence pool by 6–8, 1–2 and 1 orders of magnitude, consistent with the rare occurrence of active motifs in random pools. The final yields match the theoretical yields from probability theory for simple motifs and overestimate experimental yields, which constitute lower bounds, for aptamers because screening analyses beyond secondary structure information are not considered systematically. We also show that designed pools using our nucleotide transition probability matrices can produce higher yields for RNA ligase motifs than random pools. Our methods for generating, analyzing and designing large pools can help improve RNA design via simulation of aspects of in vitro selection. PMID:20448026
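
    The abstract reports that final yields for simple motifs match probability theory. As a back-of-the-envelope check, assume uniform, independent base composition. Under that assumption, the chance that a random N-nt sequence contains one fixed, non-self-overlapping k-nt motif is roughly (N - k + 1)/4^k, provided that value is small. The sketch compares this estimate with a small simulated pool. The pool size, motif, seed and function names are arbitrary, hypothetical choices.

```python
import random


def expected_fraction(seq_len: int, motif_len: int) -> float:
    """Approximate P(random sequence contains a fixed motif), assuming uniform,
    independent bases; ignores multiple hits, so only valid when the result is small."""
    return (seq_len - motif_len + 1) / 4 ** motif_len


def simulated_fraction(motif: str, seq_len: int, pool_size: int, seed: int = 0) -> float:
    """Fraction of a random RNA pool whose members contain `motif` at least once."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(pool_size):
        seq = "".join(rng.choices("ACGU", k=seq_len))
        if motif in seq:
            hits += 1
    return hits / pool_size


motif = "GGAUCC"  # arbitrary 6-nt motif that cannot overlap itself
print(expected_fraction(40, len(motif)))  # about 0.0085
print(simulated_fraction(motif, 40, 20_000))
```

    Multiplying the expected fraction by a pool size such as 10^14 gives the rough number of motif-bearing sequences one would anticipate before any structural screening.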

  17. Evolutionarily conserved sequences on human chromosome 21

    SciTech Connect

    Frazer, Kelly A.; Sheehan, John B.; Stokowski, Renee P.; Chen, Xiyin; Hosseini, Roya; Cheng, Jan-Fang; Fodor, Stephen P.A.; Cox, David R.; Patil, Nila

    2001-09-01

    Comparison of human sequences with the DNA of other mammals is an excellent means of identifying functional elements in the human genome. Here we describe the utility of high-density oligonucleotide arrays as a rapid approach for comparing human sequences with the DNA of multiple species whose sequences are not presently available. High-density arrays representing approximately 22.5 Mb of nonrepetitive human chromosome 21 sequence were synthesized and then hybridized with mouse and dog DNA to identify sequences conserved between humans and mice (human-mouse elements) and between humans and dogs (human-dog elements). Our data show that sequence comparison of multiple species provides a powerful empirical method for identifying actively conserved elements in the human genome. A large fraction of these evolutionarily conserved elements are present in regions on chromosome 21 that do not encode known genes.

  18. Plant and yeast cornichon possess a conserved acidic motif required for correct targeting of plasma membrane cargos.

    PubMed

    Rosas-Santiago, Paul; Lagunas-Gomez, Daniel; Yáñez-Domínguez, Carolina; Vera-Estrella, Rosario; Zimmermannová, Olga; Sychrová, Hana; Pantoja, Omar

    2017-10-01

    The export of membrane proteins along the secretory pathway is initiated at the endoplasmic reticulum (ER) after proteins are folded inside this organelle and packaged by recruitment into COPII-coated vesicles. It has been proposed that cargo receptors are required for the correct transport of proteins to their target membrane; however, little is known about ER export signals for cargo receptors. Erv14/Cornichon proteins belong to a well-conserved protein family in eukaryotes and have been proposed to function as cargo receptors for many transmembrane proteins. Amino acid sequence alignment showed the presence of a conserved acidic motif in the C-terminus of homologues from plants and yeast. Here, we demonstrate that mutation of the C-terminal acidic motif of ScErv14 or OsCNIH1 did not alter the localization of these cargo receptors; however, it altered the proper targeting of the plasma membrane transporters Nha1p, Pdr12p and Qdr2p. Our results suggest that mistargeting of these plasma membrane proteins is a consequence of a weaker interaction between the cargo receptor and cargo proteins caused by the mutation of the C-terminal acidic motif. Copyright © 2017 Elsevier B.V. All rights reserved.

  19. Mitoxantrone and Analogues Bind and Stabilize i-Motif Forming DNA Sequences

    NASA Astrophysics Data System (ADS)

    Wright, Elisé P.; Day, Henry A.; Ibrahim, Ali M.; Kumar, Jeethendra; Boswell, Leo J. E.; Huguin, Camille; Stevenson, Clare E. M.; Pors, Klaus; Waller, Zoë A. E.

    2016-12-01

    There are hundreds of ligands that can interact with G-quadruplex DNA, yet very few that target the i-motif. To better understand the dynamics between these structures and how they can be affected by small-molecule ligands, more i-motif-binding compounds are required. Herein we describe how the drug mitoxantrone can bind, induce folding of and stabilise i-motif forming DNA sequences, even at physiological pH. Additionally, mitoxantrone was found to bind i-motif forming sequences preferentially over double helical DNA. We also describe the stabilisation properties of analogues of mitoxantrone. This offers a new family of ligands with potential for use in experiments into the structure and function of i-motif forming DNA sequences.

  20. Mitoxantrone and Analogues Bind and Stabilize i-Motif Forming DNA Sequences

    PubMed Central

    Wright, Elisé P.; Day, Henry A.; Ibrahim, Ali M.; Kumar, Jeethendra; Boswell, Leo J. E.; Huguin, Camille; Stevenson, Clare E. M.; Pors, Klaus; Waller, Zoë A. E.

    2016-01-01

    There are hundreds of ligands that can interact with G-quadruplex DNA, yet very few that target the i-motif. To better understand the dynamics between these structures and how they can be affected by small-molecule ligands, more i-motif-binding compounds are required. Herein we describe how the drug mitoxantrone can bind, induce folding of and stabilise i-motif forming DNA sequences, even at physiological pH. Additionally, mitoxantrone was found to bind i-motif forming sequences preferentially over double helical DNA. We also describe the stabilisation properties of analogues of mitoxantrone. This offers a new family of ligands with potential for use in experiments into the structure and function of i-motif forming DNA sequences. PMID:28004744

  1. Physical-chemical property based sequence motifs and methods regarding same

    DOEpatents

    Braun, Werner; Mathura, Venkatarajan S.; Schein, Catherine H.

    2008-09-09

    A data analysis system, program, and/or method, e.g., a data mining/data exploration method, using physical-chemical property motifs. For example, a sequence database may be searched for identifying segments thereof having physical-chemical properties similar to the physical-chemical property motifs.
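
    The patent abstract describes searching a sequence database for segments whose physical-chemical properties resemble a property motif. The minimal sketch below illustrates that idea with a single property scale and a Euclidean distance threshold. The Kyte-Doolittle hydropathy scale is chosen here as an assumption, and the names are hypothetical. The patented method may use other properties and similarity measures.

```python
import math

# Kyte-Doolittle hydropathy scale, used here as one example property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def property_profile(peptide: str) -> list[float]:
    """Map each residue to its hydropathy value."""
    return [KYTE_DOOLITTLE[aa] for aa in peptide.upper()]


def search_property_motif(sequence: str, motif_profile: list[float],
                          max_distance: float) -> list[tuple[int, float]]:
    """Return (start, distance) for windows whose profile lies within
    max_distance (Euclidean) of motif_profile."""
    width = len(motif_profile)
    values = property_profile(sequence)
    hits = []
    for i in range(len(values) - width + 1):
        distance = math.dist(values[i:i + width], motif_profile)
        if distance <= max_distance:
            hits.append((i, distance))
    return hits


motif = property_profile("ILV")  # hypothetical hydrophobic property motif
print(search_property_motif("DDVLIDD", motif, 0.5))  # VLI matches by property, not identity
```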

  2. MISCORE: a new scoring function for characterizing DNA regulatory motifs in promoter sequences

    PubMed Central

    2012-01-01

    Background Computational approaches for finding DNA regulatory motifs in promoter sequences are useful to biologists because they reduce experimental costs and speed up the de novo discovery of binding sites. It is important for rule-based or clustering-based motif searching schemes to evaluate, effectively and efficiently, the similarity between a k-mer (a k-length subsequence) and a motif model, without assuming the independence of nucleotides in motif models and without employing computationally expensive Markov chain models to estimate the background probabilities of k-mers. It is also interesting and beneficial to use a priori knowledge in developing advanced searching tools. Results This paper presents a new scoring function, termed MISCORE, for functional motif characterization and evaluation. MISCORE is free from: (i) any assumption on model dependency; and (ii) the use of a Markov chain model for background modeling. It integrates the compositional complexity of motif instances into the function. Performance evaluations in comparison with the well-known Maximum a Posteriori (MAP) score and Information Content (IC) have shown that MISCORE has promising capabilities to separate and recognize functional DNA motifs and their instances from non-functional ones. Conclusions MISCORE is a fast computational tool for candidate motif characterization, evaluation and selection. It allows a priori known motif models to be embedded for computing motif-to-motif similarity, which is an advantage over IC and the MAP score. In addition, MISCORE can automatically filter out some repetitive k-mers from a motif model because the function includes compositional complexity. Consequently, the merits of MISCORE in terms of both motif signal modeling power and computational efficiency will make it more applicable in the development of computational motif discovery tools. PMID:23282090
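
    MISCORE is benchmarked against the Information Content (IC) score. The abstract does not give MISCORE's formula, so the sketch below shows only the IC baseline. For a DNA motif against a uniform background, each column contributes 2 + sum(p * log2 p) bits. The PFM layout (one base-to-frequency dict per column) and the function name are assumptions.

```python
import math


def information_content(pfm: list[dict[str, float]]) -> float:
    """Total IC (bits) of a DNA PFM against a uniform background.

    Each column contributes 2 + sum(p * log2 p), i.e. 2 minus the column entropy.
    """
    total = 0.0
    for column in pfm:
        entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
        total += 2.0 - entropy
    return total


conserved = {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}
uniform = {b: 0.25 for b in "ACGT"}
print(information_content([conserved, uniform]))  # 2.0
```

    A fully conserved column contributes the maximum of 2 bits and a uniform column contributes nothing. IC therefore rewards conservation but ignores the compositional complexity that MISCORE is said to add.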

  3. Nuclear Magnetic Resonance Structure of a Novel Globular Domain in RBM10 Containing OCRE, the Octamer Repeat Sequence Motif.

    PubMed

    Martin, Bryan T; Serrano, Pedro; Geralt, Michael; Wüthrich, Kurt

    2016-01-05

    The OCtamer REpeat (OCRE) has been annotated as a 42-residue sequence motif with 12 tyrosine residues in the spliceosome trans-regulatory elements RBM5 and RBM10 (RBM [RNA-binding motif]), which are known to regulate alternative splicing of Fas and Bcl-x pre-mRNA transcripts. Nuclear magnetic resonance structure determination showed that the RBM10 OCRE sequence motif is part of a 55-residue globular domain containing 16 aromatic amino acids, which consists of an anti-parallel arrangement of six β strands, with the first five strands containing complete or incomplete Tyr triplets. This OCRE globular domain is a distinctive component of RBM10 and is more widely conserved in RBM10s across the animal kingdom than the ubiquitous RNA recognition components. It is also found in the functionally related RBM5. Thus, it appears that the three-dimensional structure of the globular OCRE domain, rather than the 42-residue OCRE sequence motif alone, confers specificity on RBM10 intermolecular interactions in the spliceosome.

  4. Sequence Fingerprints of MicroRNA Conservation

    PubMed Central

    Shi, Bing; Gao, Wei; Wang, Juan

    2012-01-01

    It is known that the conservation of protein-coding genes is associated with their sequences across various species, such as animals and plants. However, the association between microRNA (miRNA) conservation and miRNA sequences in various species remains unexplored. Here we report the association of miRNA conservation with sequence features, such as base content and cleavage sites, suggesting that miRNA sequences contain fingerprints of miRNA conservation. More interestingly, different species show different and even opposite patterns between miRNA conservation and sequence features. For example, mammalian miRNAs show a positive/negative correlation between conservation and AU/GC content, whereas plant miRNAs show a negative/positive correlation between conservation and AU/GC content. Further analysis puts forward the hypothesis that the introns of protein-coding genes may be a main driving force for the origin and evolution of mammalian miRNAs. At the 5′ end, conserved miRNAs have a preference for base U, while less-conserved miRNAs have a preference for a non-U base in mammals. This difference does not exist in insects and plants, in which both conserved miRNAs and less-conserved miRNAs have a preference for base U at the 5′ end. We further revealed that the non-U preference at the 5′ end of less-conserved mammalian miRNAs is associated with miRNA function diversity, which may have evolved under pressure from the highly sophisticated environmental stimuli that mammals encountered during evolution. These results indicated that miRNA sequences contain fingerprints of conservation, and these fingerprints vary according to species. More importantly, the results suggest that although species share common mechanisms by which miRNAs originate and evolve, mammals may have developed a novel mechanism for miRNA origin and evolution. In addition, the fingerprint found in this study can be a predictor of miRNA conservation, and the findings are helpful in achieving a
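
    The study relates miRNA conservation to simple sequence features such as AU/GC content and the identity of the 5′ base. Below is a sketch of a hypothetical helper that extracts those features from a mature miRNA sequence. The correlation analysis itself would need real conservation data and is not shown.

```python
def mirna_features(seq: str) -> dict:
    """Return AU content, GC content and whether the 5' base is U."""
    s = seq.upper().replace("T", "U")  # accept DNA-style input
    if not s:
        raise ValueError("empty sequence")
    n = len(s)
    return {
        "au_content": (s.count("A") + s.count("U")) / n,
        "gc_content": (s.count("G") + s.count("C")) / n,
        "five_prime_u": s[0] == "U",
    }


print(mirna_features("UGAGGUAG"))  # hypothetical fragment
```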

  5. Examination of the transcription factor NtcA-binding motif by in vitro selection of DNA sequences from a random library.

    PubMed

    Jiang, F; Wisén, S; Widersten, M; Bergman, B; Mannervik, B

    2000-08-25

    A recursive in vitro selection among random DNA sequences was used for analysis of the cyanobacterial transcription factor NtcA-binding motifs. An eight-base palindromic sequence, TGTA-N8-TACA, was found to be the optimal NtcA-binding sequence. The more divergent the binding sequences, compared to this consensus sequence, the lower the NtcA affinity. The second and third bases in each four-nucleotide half of the consensus sequence were crucial for NtcA binding, and they were in general highly conserved. The most frequently occurring sequence in the middle weakly conserved region was similar to that of the NtcA-binding motif of the Anabaena sp. strain PCC 7120 glnA gene, previously known to have high affinity for NtcA. This indicates that the middle sequences were selected for high NtcA affinity. Analysis of natural NtcA-binding motifs showed that these could be classified into two groups based on differences in recognition consensus sequences. It is suggested that NtcA naturally recognizes different DNA-binding motifs, or has differential affinities to these sequences under different physiological conditions.
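
    The NtcA consensus TGTA-N8-TACA has eight fixed bases flanking an 8-nt spacer, and the abstract reports that affinity falls as sites diverge from it. Below is a crude, hypothetical scan that counts mismatches at the fixed bases. Mismatch count is only a proxy for affinity. The abstract also notes that the second and third bases of each half are the most crucial, which could justify a weighted version.

```python
LEFT, RIGHT, SPACER = "TGTA", "TACA", 8  # consensus TGTA-N8-TACA
SITE_LEN = len(LEFT) + SPACER + len(RIGHT)


def mismatches_to_consensus(site: str) -> int:
    """Count mismatches at the eight fixed positions; spacer bases are ignored."""
    left, right = site[:len(LEFT)], site[-len(RIGHT):]
    return (sum(a != b for a, b in zip(left, LEFT))
            + sum(a != b for a, b in zip(right, RIGHT)))


def scan_ntca_sites(seq: str, max_mismatches: int = 1) -> list[tuple[int, int]]:
    """Return (0-based start, mismatches) for windows close to the consensus.

    TGTA and TACA are reverse complements, so the consensus is palindromic
    and scanning one strand gives the same mismatch counts as the other.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - SITE_LEN + 1):
        mm = mismatches_to_consensus(seq[i:i + SITE_LEN])
        if mm <= max_mismatches:
            hits.append((i, mm))
    return hits


demo = "CC" + "TGTA" + "A" * 8 + "TACA" + "CC"  # hypothetical sequence
print(scan_ntca_sites(demo))  # [(2, 0)]
```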

  6. A Conserved Di-Basic Motif of Drosophila Crumbs Contributes to Efficient ER Export.

    PubMed

    Kumichel, Alexandra; Kapp, Katja; Knust, Elisabeth

    2015-06-01

    The Drosophila type I transmembrane protein Crumbs is an apical determinant required for the maintenance of apico-basal epithelial cell polarity. The level of Crumbs at the plasma membrane is crucial, but how it is regulated is poorly understood. In a genetic screen for regulators of Crumbs protein trafficking we identified Sar1, the core component of the coat protein complex II transport vesicles. sar1 mutant embryos show a reduced plasma membrane localization of Crumbs, a defect similar to that observed in haunted and ghost mutant embryos, which lack Sec23 and Sec24CD, respectively. By pulse-chase assays in Drosophila Schneider cells and analysis of protein transport kinetics based on Endoglycosidase H resistance we identified an RNKR motif in Crumbs, which contributes to efficient ER export. The motif identified fits the highly conserved di-basic RxKR motif and mediates interaction with Sar1. The RNKR motif is also required for plasma membrane delivery of transgene-encoded Crumbs in epithelial cells of Drosophila embryos. Our data are the first to show that a di-basic motif acts as a signal for ER exit of a type I plasma membrane protein in a metazoan organism.

  7. Factoring local sequence composition in motif significance analysis.

    PubMed

    Ng, Patrick; Keich, Uri

    2008-01-01

    We recently introduced a biologically realistic and reliable significance analysis of the output of a popular class of motif finders. In this paper we further improve our significance analysis by incorporating local base composition information. Relying on realistic biological data simulation, as well as on FDR analysis applied to real data, we show that our method is significantly better than the increasingly popular practice of using the normal approximation to estimate the significance of a finder's output. Finally we turn to leveraging our reliable significance analysis to improve the actual motif finding task. Specifically, endowing a variant of the Gibbs Sampler with our improved significance analysis we demonstrate that de novo finders can perform better than has been perceived. Significantly, our new variant outperforms all the finders reviewed in a recently published comprehensive analysis of the Harbison genome-wide binding location data. Interestingly, many of these finders incorporate additional information such as nucleosome positioning and the significance of binding data.

  8. Analysis of Genomic Sequence Motifs for Deciphering Transcription Factor Binding and Transcriptional Regulation in Eukaryotic Cells

    PubMed Central

    Boeva, Valentina

    2016-01-01

    Eukaryotic genomes contain a variety of structured patterns: repetitive elements, binding sites of DNA and RNA associated proteins, splice sites, and so on. Often, these structured patterns can be formalized as motifs and described using a proper mathematical model such as position weight matrix and IUPAC consensus. Two key tasks are typically carried out for motifs in the context of the analysis of genomic sequences. These are: identification, in a set of DNA regions, of over-represented motifs from a particular motif database, and de novo discovery of over-represented motifs. Here we describe existing methodology to perform these two tasks for motifs characterizing transcription factor binding. When applied to the output of ChIP-seq and ChIP-exo experiments, or to promoter regions of co-modulated genes, motif analysis techniques allow for the prediction of transcription factor binding events and enable identification of transcriptional regulators and co-regulators. The usefulness of motif analysis is further exemplified in this review by how motif discovery improves peak calling in ChIP-seq and ChIP-exo experiments and, when coupled with information on gene expression, allows insights into physical mechanisms of transcriptional modulation. PMID:26941778
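
    An IUPAC consensus maps directly onto a regular expression, one character class per ambiguity code. The sketch below is illustrative only; it uses the BsaHI site GRCGYC as an example consensus, and `iupac_to_regex` and `find_sites` are hypothetical helper names. A lookahead is used so that overlapping sites are all reported.

```python
import re

# Standard IUPAC nucleotide ambiguity codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(consensus):
    """Translate an IUPAC consensus string into a regex pattern."""
    return "".join(IUPAC[c] for c in consensus.upper())


def find_sites(consensus, seq):
    """Return (0-based start, matched site) for every, possibly overlapping, match."""
    pattern = re.compile("(?=(" + iupac_to_regex(consensus) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


print(find_sites("GRCGYC", "AAGACGTCAAGGCGCCTT"))
```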

  9. A sequence-specific transcription activator motif and powerful synthetic variants that bind Mediator using a fuzzy protein interface

    PubMed Central

    Warfield, Linda; Tuttle, Lisa M.; Pacheco, Derek; Klevit, Rachel E.; Hahn, Steven

    2014-01-01

    Although many transcription activators contact the same set of coactivator complexes, the mechanism and specificity of these interactions have been unclear. For example, do intrinsically disordered transcription activation domains (ADs) use sequence-specific motifs, or do ADs of seemingly different sequence have common properties that encode activation function? We find that the central activation domain (cAD) of the yeast activator Gcn4 functions through a short, conserved sequence-specific motif. Optimizing the residues surrounding this short motif by inserting additional hydrophobic residues creates very powerful ADs that bind the Mediator subunit Gal11/Med15 with high affinity via a “fuzzy” protein interface. In contrast to Gcn4, the activity of these synthetic ADs is not strongly dependent on any one residue of the AD, and this redundancy is similar to that of some natural ADs in which few if any sequence-specific residues have been identified. The additional hydrophobic residues in the synthetic ADs likely allow multiple faces of the AD helix to interact with the Gal11 activator-binding domain, effectively forming a fuzzier interface than that of the wild-type cAD. PMID:25122681

  10. An artificial intelligence approach to motif discovery in protein sequences: application to steroid dehydrogenases.

    PubMed

    Bailey, T L; Baker, M E; Elkan, C P

    1997-05-01

    MEME (Multiple Expectation-maximization for Motif Elicitation) is a unique new software tool that uses artificial intelligence techniques to discover motifs shared by a set of protein sequences in a fully automated manner. This paper is the first detailed study of the use of MEME to analyse a large, biologically relevant set of sequences, and to evaluate the sensitivity and accuracy of MEME in identifying structurally important motifs. For this purpose, we chose the short-chain alcohol dehydrogenase superfamily because it is large and phylogenetically diverse, providing a test of how well MEME can work on sequences with low amino acid similarity. Moreover, this dataset contains enzymes of biological importance, and because several enzymes have known X-ray crystallographic structures, we can test the usefulness of MEME for structural analysis. The first six motifs from MEME map onto structurally important alpha-helices and beta-strands on Streptomyces hydrogenans 20beta-hydroxysteroid dehydrogenase. We also describe MAST (Motif Alignment Search Tool), which conveniently uses output from MEME for searching databases such as SWISS-PROT and Genpept. MAST provides statistical measures that permit a rigorous evaluation of the significance of database searches with individual motifs or groups of motifs. A database search of Genpept90 by MAST with the log-odds matrix of the first six motifs obtained from MEME yields a bimodal output, demonstrating the selectivity of MAST. We show for the first time, using primary sequence analysis, that bacterial sugar epimerases are homologs of short-chain dehydrogenases. MEME and MAST will be increasingly useful as genome sequencing provides large datasets of phylogenetically divergent sequences of biomedical interest.
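
    MAST searches with a log-odds matrix derived from MEME motifs. The sketch below does not reproduce MEME's expectation-maximization; it only shows, under stated assumptions (DNA alphabet for brevity, uniform background, pseudocount of 1, log base 2), how aligned motif sites become a position-specific log-odds matrix and how a candidate word is scored against it. Function names are hypothetical.

```python
import math

ALPHABET = "ACGT"


def log_odds_matrix(sites, background=None, pseudocount=1.0):
    """Position-specific log2-odds matrix from equal-length aligned sites."""
    background = background or {b: 1 / len(ALPHABET) for b in ALPHABET}
    matrix = []
    for pos in range(len(sites[0])):
        column = [s[pos] for s in sites]
        total = len(column) + pseudocount * len(ALPHABET)
        # Smoothed frequency of each letter divided by its background frequency.
        matrix.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background[b])
            for b in ALPHABET
        })
    return matrix


def score(matrix, word):
    """Sum of per-position log-odds; higher means more motif-like."""
    return sum(row[b] for row, b in zip(matrix, word))


m = log_odds_matrix(["ACGT", "ACGT", "ACGA", "ACGT"])
print(round(score(m, "ACGT"), 3), round(score(m, "TTTT"), 3))
```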

  11. Sequence search and analysis of gene products containing RNA recognition motifs in the human genome.

    PubMed

    Malhotra, Sony; Sowdhamini, Ramanathan

    2014-12-22

    Gene expression is tightly regulated at both transcriptional and post-transcriptional levels. RNA-binding proteins are involved in post-transcriptional gene regulation events. They are involved in a variety of functions such as splicing, alternative splicing, nuclear import and export of mRNA, RNA stability and translation. There are several well-characterized RNA-binding motifs present in a genome, such as RNA recognition motif (RRM), KH domain, zinc-fingers etc. In the present study, we have investigated the human genome for the presence of RRM-containing gene products starting from RRM domains in the Pfam (Protein family database) repository. In Pfam, seven families are recorded to contain RRM-containing proteins. We studied these families for their taxonomic representation, sequence features (identity, length, phylogeny) and structural properties (mapping conservation on the structures). We then examined the presence of RRM-containing gene products in the Homo sapiens genome and identified 928 RRM-containing gene products. These were studied for their predicted domain architectures, biological processes, involvement in pathways, disease relevance and disorder content. RRM domains were observed to occur multiple times in a single polypeptide. However, there are 56 other co-existing domains involved in different regulatory functions. Further, functional enrichment analysis revealed that RRM-containing gene products are mainly involved in biological functions such as mRNA splicing and its regulation. Our sequence analysis identified RRM-containing gene products in the human genome and provides insights into their domain architectures and biological functions. Since mRNA splicing and gene regulation are important in the cellular machinery, this analysis provides an early overview of genes that carry out these functions.

  12. Two structurally distinct κB sequence motifs cooperatively control LPS-induced KC gene transcription in mouse macrophages

    SciTech Connect

    Ohmori, Y.; Fukumoto, S.; Hamilton, T.A.

    1995-10-01

    The mouse KC gene is an α-chemokine gene whose transcription is induced in mononuclear phagocytes by LPS. DNA sequences necessary for transcriptional control of KC by LPS were identified in the region flanking the transcription start site. Transient transfection analysis in macrophages using deletion mutants of a 1.5-kb sequence placed in front of the chloramphenicol acetyl transferase (CAT) gene identified an LPS-responsive region between residues -104 and +30. This region contained two κB sequence motifs. The first motif (position -70 to -59, κB1) is highly conserved in all three human GRO genes and in the mouse macrophage inflammatory protein-2 (MIP-2) gene. The second κB motif (position -89 to -78, κB2) was conserved only between the mouse and the rat KC genes. Consistent with previous reports, the highly conserved κB site (κB1) was essential for LPS inducibility. Surprisingly, the distal κB site (κB2) was also necessary for optimal response; mutation of either κB site markedly reduced sensitivity to LPS in RAW264.7 cells and to TNF-α in NIH 3T3 fibroblasts. Although both κB1 and κB2 sequences were able to bind members of the Rel homology family, including NFκB1 (p50), RelA (p65), and c-Rel, the κB1 site bound these factors with higher affinity and functioned more effectively than the κB2 site in a heterologous promoter. These findings demonstrate that transcriptional control of the KC gene requires cooperation between two κB sites and is thus distinct from that of the three human GRO genes and the mouse MIP-2 gene. 71 refs., 8 figs.

  13. Modeling of the Ebola Virus Delta Peptide Reveals a Potential Lytic Sequence Motif

    PubMed Central

    Gallaher, William R.; Garry, Robert F.

    2015-01-01

    Filoviruses, such as Ebola and Marburg viruses, cause severe outbreaks of human infection, including the extensive epidemic of Ebola virus disease (EVD) in West Africa in 2014. In the course of examining mutations in the glycoprotein gene associated with 2014 Ebola virus (EBOV) sequences, a differential level of conservation was noted between the soluble form of glycoprotein (sGP) and the full length glycoprotein (GP), which are both encoded by the GP gene via RNA editing. In the region of the proteins encoded after the RNA editing site sGP was more conserved than the overlapping region of GP when compared to a distant outlier species, Tai Forest ebolavirus. Half of the amino acids comprising the “delta peptide”, a 40 amino acid carboxy-terminal fragment of sGP, were identical between otherwise widely divergent species. A lysine-rich amphipathic peptide motif was noted at the carboxyl terminus of delta peptide with high structural relatedness to the cytolytic peptide of the non-structural protein 4 (NSP4) of rotavirus. EBOV delta peptide is a candidate viroporin, a cationic pore-forming peptide, and may contribute to EBOV pathogenesis. PMID:25609303

  14. Sequence conservation on the Y chromosome

    SciTech Connect

    Gibson, L.H.; Yang-Feng, L.; Lau, C.

    1994-09-01

    The Y chromosome is present in all mammals and is considered to be essential to sex determination. Despite intense genomic research, only a few genes have been identified and mapped to this chromosome in humans. Several of them, such as SRY and ZFY, have been demonstrated to be conserved and Y-located in other mammals. In order to address the issue of sequence conservation on the Y chromosome, we performed fluorescence in situ hybridization (FISH) with DNA from a human Y cosmid library as a probe to study the Y chromosomes from other mammalian species. Total DNA from 3,000-4,500 cosmid pools was labeled with biotinylated-dUTP and hybridized to metaphase chromosomes. For human and primate preparations, human cot1 DNA was included in the hybridization mixture to suppress the hybridization from repeat sequences. FISH signals were detected on the Y chromosomes of human, gorilla, orangutan and baboon (Old World monkey) and were absent on those of squirrel monkey (New World monkey), Indian muntjac, wood lemming, Chinese hamster, rat and mouse. Since sequence analysis suggested that specific genes, e.g. SRY and ZFY, are conserved between these two groups, the lack of detectable hybridization in the latter group implies either that conservation of the human Y sequences is limited to the Y chromosomes of the great apes and Old World monkeys, or that the size of the syntenic segment is too small to be detected under the resolution of FISH, or that homologous sequences have undergone considerable divergence. Further studies with reduced hybridization stringency are currently being conducted. Our results provide some clues as to Y-sequence conservation across species and demonstrate the limitations of FISH across species with total DNA sequences from a particular chromosome.

  15. Conserved Hydration Sites in Pin1 Reveal a Distinctive Water Recognition Motif in Proteins.

    PubMed

    Barman, Arghya; Smitherman, Crystal; Souffrant, Michael; Gadda, Giovanni; Hamelberg, Donald

    2016-01-25

    Structurally conserved water molecules are important for biomolecular stability, flexibility, and function. X-ray crystallographic studies of Pin1 have resolved a number of water molecules around the enzyme, including two highly conserved water molecules within the protein. The functional role of these localized water molecules remains unknown and unexplored. Pin1 catalyzes cis/trans isomerizations of peptidyl prolyl bonds that are preceded by a phosphorylated serine or threonine residue. Pin1 is involved in many subcellular signaling processes and is a potential therapeutic target for the treatment of several life-threatening diseases. Here, we investigate the significance of these structurally conserved water molecules in the catalytic domain of Pin1 using molecular dynamics (MD) simulations, free energy calculations, analysis of X-ray crystal structures, and circular dichroism (CD) experiments. MD simulations and free energy calculations suggest the tighter binding water molecule plays a crucial role in maintaining the integrity and stability of a critical hydrogen-bonding network in the active site. The second water molecule is exchangeable with bulk solvent and is found in a distinctive helix-turn-coil motif. Structural bioinformatics analysis of nonredundant X-ray crystallographic protein structures in the Protein Data Bank (PDB) suggests this motif is present in several other proteins and can act as a water site, akin to the calcium EF hand. CD experiments suggest the isolated motif is in a distorted PII conformation and requires the protein environment to fully form the α-helix-turn-coil motif. This study provides valuable insights into the role of hydration in the structural integrity of Pin1 that can be exploited in protein engineering and drug design.

  16. Transferring functional annotations of membrane transporters on the basis of sequence similarity and sequence motifs

    PubMed Central

    2013-01-01

    Background Membrane transporters catalyze the transport of small solute molecules across biological barriers such as lipid bilayer membranes. Experimental identification of the transported substrates is very tedious. Once a particular transport mechanism has been identified in one organism, it is thus highly desirable to transfer this information to related transporter sequences in different organisms based on bioinformatics evidence. Results We present a thorough benchmark of the level of sequence identity at which membrane transporters from Escherichia coli, Saccharomyces cerevisiae, and Arabidopsis thaliana belong to the same families of the Transporter Classification (TC) system, and of the level at which these membrane transporters mediate the transport of the same substrate. We found that two membrane transporter sequences from different organisms that are aligned with normalized BLAST expectation value better than E-value 1e-8 are highly likely to belong to the same TC family (F-measure around 90%). Enriched sequence motifs identified by MEME at thresholds below 1e-12 support accurate classification into TC families for about two thirds of the sequences (F-measure 80% and higher). For the comparison of transported substrates, we focused on the four largest substrate classes of amino acids, sugars, metal ions, and phosphate. At similar identity thresholds, the nature of the transported substrates was more divergent (F-measure 40–75% at the same thresholds) than the TC family membership. Conclusions We suggest an acceptable threshold of 1e-8 for BLAST and HMMER where at least three quarters of the sequences are classified according to the TC system with a reasonably high accuracy. Researchers who wish to apply these thresholds in their studies should multiply these thresholds by the size of the database they search against. Our findings should be useful to those who wish to transfer transporter functional annotations across species. PMID:24283849
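
    Two pieces of arithmetic in this abstract are easy to get wrong: rescaling a normalized E-value cutoff to a specific database, and the F-measure used to report accuracy. The sketch below assumes that "size of the database" is expressed in the same unit the normalization used (the abstract does not specify it), and uses the standard F-measure, the harmonic mean of precision and recall. The database size of 500,000 is a made-up example.

```python
def scaled_evalue_threshold(normalized_threshold, database_size):
    """Raw E-value cutoff for a given database, from a size-normalized cutoff."""
    return normalized_threshold * database_size


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall (F1)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical database of 500,000 entries with the suggested 1e-8 cutoff.
print(scaled_evalue_threshold(1e-8, 500_000))
print(f_measure(tp=9, fp=1, fn=1))
```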

  17. Using a color-coded ambigraphic nucleic acid notation to visualize conserved palindromic motifs within and across genomes.

    PubMed

    Rozak, David A; Rozak, Anthony J

    2014-01-22

    Ambiscript is a graphically-designed nucleic acid notation that uses symbol symmetries to support sequence complementation, highlight biologically-relevant palindromes, and facilitate the analysis of consensus sequences. Although the original Ambiscript notation was designed to easily represent consensus sequences for multiple sequence alignments, the notation's black-on-white ambiguity characters are unable to reflect the statistical distribution of nucleotides found at each position. We now propose a color-augmented ambigraphic notation to encode the frequency of positional polymorphisms in these consensus sequences. We have implemented this color-coding approach by creating an Adobe Flash® application ( http://www.ambiscript.org) that shades and colors modified Ambiscript characters according to the prevalence of the encoded nucleotide at each position in the alignment. The resulting graphic helps viewers perceive biologically-relevant patterns in multiple sequence alignments by uniquely combining color, shading, and character symmetries to highlight palindromes and inverted repeats in conserved DNA motifs. Juxtaposing an intuitive color scheme over the deliberate character symmetries of an ambigraphic nucleic acid notation yields a highly-functional nucleic acid notation that maximizes information content and successfully embodies key principles of graphic excellence put forth by the statistician and graphic design theorist, Edward Tufte.
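
    The two quantities the notation visualizes, complementarity (including palindromes) and per-position nucleotide frequency in an alignment, can be computed directly. This sketch does not reproduce Ambiscript's glyphs or the Flash application; it is an assumed illustration with hypothetical function names. IUPAC ambiguity codes are complemented as well (R↔Y, K↔M, B↔V, D↔H; S, W and N map to themselves).

```python
from collections import Counter

# IUPAC complement table, including ambiguity codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(seq):
    """True if the sequence equals its own reverse complement."""
    return reverse_complement(seq) == seq.upper()


def column_frequencies(alignment):
    """Per-position A/C/G/T frequencies across equal-length aligned sequences."""
    freqs = []
    for i in range(len(alignment[0])):
        counts = Counter(seq[i].upper() for seq in alignment)
        freqs.append({b: counts[b] / len(alignment) for b in "ACGT"})
    return freqs


print(is_palindromic("GRCGYC"), reverse_complement("TGTA"))
print(column_frequencies(["GAATTC", "GAATTC", "GATTTC", "GAATTC"])[2])
```

    A shading scheme like the one described would then map each position's dominant-base frequency onto a color intensity.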

  18. Using a color-coded ambigraphic nucleic acid notation to visualize conserved palindromic motifs within and across genomes

    PubMed Central

    2014-01-01

    Background Ambiscript is a graphically-designed nucleic acid notation that uses symbol symmetries to support sequence complementation, highlight biologically-relevant palindromes, and facilitate the analysis of consensus sequences. Although the original Ambiscript notation was designed to easily represent consensus sequences for multiple sequence alignments, the notation’s black-on-white ambiguity characters are unable to reflect the statistical distribution of nucleotides found at each position. We now propose a color-augmented ambigraphic notation to encode the frequency of positional polymorphisms in these consensus sequences. Results We have implemented this color-coding approach by creating an Adobe Flash® application ( http://www.ambiscript.org) that shades and colors modified Ambiscript characters according to the prevalence of the encoded nucleotide at each position in the alignment. The resulting graphic helps viewers perceive biologically-relevant patterns in multiple sequence alignments by uniquely combining color, shading, and character symmetries to highlight palindromes and inverted repeats in conserved DNA motifs. Conclusion Juxtaposing an intuitive color scheme over the deliberate character symmetries of an ambigraphic nucleic acid notation yields a highly-functional nucleic acid notation that maximizes information content and successfully embodies key principles of graphic excellence put forth by the statistician and graphic design theorist, Edward Tufte. PMID:24447494

  19. Identification of disease-specific motifs in the antibody specificity repertoire via next-generation sequencing

    PubMed Central

    Pantazes, Robert J.; Reifert, Jack; Bozekowski, Joel; Ibsen, Kelly N.; Murray, Joseph A.; Daugherty, Patrick S.

    2016-01-01

    Disease-specific antibodies can serve as highly effective biomarkers but have been identified for only a relatively small number of autoimmune diseases. A method was developed to identify disease-specific binding motifs through integration of bacterial display peptide library screening, next-generation sequencing (NGS) and computational analysis. Antibody specificity repertoires were determined by identifying bound peptide library members for each specimen using cell sorting and performing NGS. A computational algorithm, termed Identifying Motifs Using Next-generation sequencing Experiments (IMUNE), was developed and applied to discover disease- and healthy control-specific motifs. IMUNE performs comprehensive pattern searches, identifies patterns statistically enriched in the disease or control groups and clusters the patterns to generate motifs. Using celiac disease sera as a discovery set, IMUNE identified a consensus motif (QPEQPF[PS]E) with high diagnostic sensitivity and specificity in a validation sera set, in addition to novel motifs. Peptide display and sequencing (Display-Seq) coupled with IMUNE analysis may thus be useful to characterize antibody repertoires and identify disease-specific antibody epitopes and biomarkers. PMID:27481573
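
    The consensus QPEQPF[PS]E is already regex-style notation, with [PS] meaning P or S. A minimal sketch, not IMUNE's actual statistics, is to compare the fraction of disease and control peptides that contain the motif. The peptide lists and the small pseudocount in the ratio are hypothetical choices made only to keep the example self-contained.

```python
import re

# Consensus motif from the abstract; [PS] = proline or serine.
MOTIF = re.compile(r"QPEQPF[PS]E")


def match_fraction(peptides, motif=MOTIF):
    """Fraction of peptides containing at least one motif match."""
    return sum(1 for p in peptides if motif.search(p)) / len(peptides)


def enrichment(disease, control, motif=MOTIF, pseudo=1e-3):
    """Ratio of match fractions; the small pseudocount avoids division by zero."""
    return (match_fraction(disease, motif) + pseudo) / (match_fraction(control, motif) + pseudo)


disease = ["AAQPEQPFPEAA", "QPEQPFSEGG", "GGGGGG"]
control = ["GGGGGG", "AAAAAA", "QPEQPFAEGG"]  # 'A' at the [PS] position: no match
print(match_fraction(disease), match_fraction(control), enrichment(disease, control))
```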

  20. Conserved DNA Motifs, Including the CENP-B Box-like, Are Possible Promoters of Satellite DNA Array Rearrangements in Nematodes

    PubMed Central

    Car, Ana; Castagnone-Sereno, Philippe; Abad, Pierre; Plohl, Miroslav

    2013-01-01

    Tandemly arrayed non-coding sequences or satellite DNAs (satDNAs) are rapidly evolving segments of eukaryotic genomes, including the centromere, and may raise a genetic barrier that leads to speciation. However, determinants and mechanisms of satDNA sequence dynamics are only partially understood. Sequence analyses of a library of five satDNAs common to the root-knot nematodes Meloidogyne chitwoodi and M. fallax, together with a satDNA specific to M. chitwoodi, revealed only low sequence identity (32–64%) among them. However, despite sequence differences, two conserved motifs were recovered. One of them turned out to be highly similar to the CENP-B box of human alpha satDNA, identical in 10–12 out of 17 nucleotides. In addition, organization of nematode satDNAs was comparable to that found in alpha satDNA of human and primates, characterized by monomers concurrently arranged in simple and higher-order repeat (HOR) arrays. In contrast to alpha satDNA, phylogenetic clustering of nematode satDNA monomers extracted either from simple or from HOR arrays indicated frequent shuffling between these two organizational forms. Comparison of homogeneous simple arrays and complex HORs composed of different satDNAs enabled, for the first time, the identification of conserved motifs as obligatory components of monomer junctions. This observation highlights the role of short motifs in rearrangements, even among highly divergent sequences. Two mechanisms are proposed to be involved in this process, i.e., putative transposition-related cut-and-paste insertions and/or illegitimate recombination. The possible involvement of the nematode CENP-B box-like sequence in the transposition-related mechanism, together with the previously established similarity of the human CENP-B protein and pogo-like transposases, implicates a novel role of the CENP-B box and related sequence motifs in addition to the known function in centromere protein binding. PMID:23826269
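
    Identity figures such as "32–64%" or "10–12 out of 17 nucleotides" (roughly 59–71%) depend on how aligned columns are counted. The sketch below assumes one common convention, identical columns divided by columns where neither sequence has a gap; other conventions divide by alignment length or by the shorter sequence, so values are not always comparable across studies. The inputs are made-up examples.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    cols = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != gap and y != gap]
    if not cols:
        raise ValueError("no gap-free columns to compare")
    same = sum(1 for x, y in cols if x == y)
    return 100.0 * same / len(cols)


print(percent_identity("ACGT", "ACGA"))  # one mismatch in four columns
print(percent_identity("AC-T", "ACGT"))  # gapped column is excluded
print(round(100 * 10 / 17, 1), round(100 * 12 / 17, 1))
```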

  1. Conserved Sequence Processing in Primate Frontal Cortex.

    PubMed

    Wilson, Benjamin; Marslen-Wilson, William D; Petkov, Christopher I

    2017-02-01

    An important aspect of animal perception and cognition is learning to recognize relationships between environmental events that predict others in time, a form of relational knowledge that can be assessed using sequence-learning paradigms. Humans are exquisitely sensitive to sequencing relationships, and their combinatorial capacities, most saliently in the domain of language, are unparalleled. Recent comparative research in human and nonhuman primates has obtained behavioral and neuroimaging evidence for evolutionarily conserved substrates involved in sequence processing. The findings carry implications for the origins of domain-general capacities underlying core language functions in humans. Here, we synthesize this research into a 'ventrodorsal gradient' model, where frontal cortex engagement along this axis depends on sequencing complexity, mapping onto the sequencing capacities of different species. Copyright © 2016 Elsevier Ltd. All rights reserved.

  2. Targeting of Arabidopsis KNL2 to Centromeres Depends on the Conserved CENPC-k Motif in Its C Terminus

    PubMed Central

    Talbert, Paul; Demidov, Dmitri

    2017-01-01

    KINETOCHORE NULL2 (KNL2) is involved in recognition of centromeres and in centromeric localization of the centromere-specific histone cenH3. Our study revealed a cenH3 nucleosome binding CENPC-k motif at the C terminus of Arabidopsis thaliana KNL2, which is conserved among a wide spectrum of eukaryotes. Centromeric localization of KNL2 is abolished by deletion of the CENPC-k motif and by mutating single conserved amino acids, but can be restored by insertion of the corresponding motif of Arabidopsis CENP-C. We showed by electrophoretic mobility shift assay that the C terminus of KNL2 binds DNA sequence-independently and interacts with the centromeric transcripts in vitro. Chromatin immunoprecipitation with anti-KNL2 antibodies indicated that in vivo KNL2 is preferentially associated with the centromeric repeat pAL1. Complete deletion of the CENPC-k motif did not influence its ability to interact with DNA in vitro. Therefore, we suggest that KNL2 recognizes centromeric nucleosomes, similar to CENP-C, via the CENPC-k motif and binds adjoining DNA. PMID:28062749

  3. A conserved motif in Tetrahymena thermophila telomerase reverse transcriptase is proximal to the RNA template and is essential for boundary definition.

    PubMed

    Akiyama, Benjamin M; Gomez, Anastassia; Stone, Michael D

    2013-07-26

    The ends of linear chromosomes are extended by telomerase, a ribonucleoprotein complex minimally consisting of a protein subunit called telomerase reverse transcriptase (TERT) and the telomerase RNA (TER). TERT functions by reverse transcribing a short template region of TER into telomeric DNA. Proper assembly of TERT and TER is essential for telomerase activity; however, a detailed understanding of how TERT interacts with TER is lacking. Previous studies have identified an RNA binding domain (RBD) within TERT, which includes three evolutionarily conserved sequence motifs: CP2, CP, and T. Here, we used site-directed hydroxyl radical probing to directly identify sites of interaction between the TERT RBD and TER, revealing that the CP2 motif is in close proximity to a conserved region of TER known as the template boundary element (TBE). Gel shift assays on CP2 mutants confirmed that the CP2 motif is an RNA binding determinant. Our results explain previous work that established that mutations to the CP2 motif of TERT and to the TBE of TER both permit misincorporation of nucleotides into the growing DNA strand beyond the canonical template. Taken together, these results suggest a model in which the CP2 motif binds the TBE to strictly define which TER nucleotides can be reverse transcribed.

  4. Recurring sequence-structure motifs in (βα)8-barrel proteins and experimental optimization of a chimeric protein designed based on such motifs.

    PubMed

    Wang, Jichao; Zhang, Tongchuan; Liu, Ruicun; Song, Meilin; Wang, Juncheng; Hong, Jiong; Chen, Quan; Liu, Haiyan

    2017-02-01

    An interesting way of generating novel artificial proteins is to combine sequence motifs from natural proteins, mimicking the evolutionary path suggested by natural proteins comprising recurring motifs. We analyzed the βα and αβ modules of TIM barrel proteins by structure alignment-based sequence clustering. A number of preferred motifs were identified. A chimeric TIM was designed by using recurring elements as mutually compatible interfaces. The foldability of the designed TIM protein was then significantly improved by six rounds of directed evolution. The melting temperature has been improved by more than 20°C. A variety of characteristics suggested that the resulting protein is well-folded. Our analysis provided a library of peptide motifs that is potentially useful for different protein engineering studies. The protein engineering strategy of using recurring motifs as interfaces to connect partial natural proteins may be applied to other protein folds. Copyright © 2016 Elsevier B.V. All rights reserved.

  5. Role of the conserved lysine within the Walker A motif of human DMC1

    PubMed Central

    Sharma, Deepti; Say, Amanda F.; Ledford, LeAnna L.; Hughes, Ami J.; Sehorn, Hilarie A.; Dwyer, Donard S.; Sehorn, Michael G.

    2012-01-01

    During meiosis, the RAD51 recombinase and its meiosis-specific homolog DMC1 mediate DNA strand exchange between homologous chromosomes. The proteins form a right-handed nucleoprotein complex on ssDNA called the presynaptic filament. In an ATP-dependent manner, the presynaptic filament searches for homology to form a physical connection with the homologous chromosome. We constructed two variants of hDMC1 altering the conserved lysine residue of the Walker A motif to arginine (hDMC1K132R) or alanine (hDMC1K132A). The hDMC1 variants were expressed in Escherichia coli and purified to near homogeneity. Both hDMC1K132R and hDMC1K132A variants were devoid of ATP hydrolysis. The hDMC1K132R variant was attenuated for ATP binding that was partially restored by the addition of either ssDNA or calcium. The hDMC1K132R variant was partially capable of homologous DNA pairing and strand exchange in the presence of calcium and protecting DNA from a nuclease, while the hDMC1K132A variant was inactive. These results suggest that the conserved lysine of the Walker A motif in hDMC1 plays a key role in ATP binding. Furthermore, the binding of calcium and ssDNA promotes a conformational change in the ATP binding pocket of hDMC1 that promotes ATP binding. Our results provide evidence that the conserved lysine in the Walker A motif of hDMC1 is critical for ATP binding which is required for presynaptic filament formation. PMID:23182424
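
    The Walker A (P-loop) motif is commonly written as GxxxxGK[ST], and the conserved lysine discussed above is the K in that pattern (K132 in hDMC1, per the abstract). The sketch below locates candidate Walker A motifs and the lysine position in a protein sequence; the input is an arbitrary hypothetical peptide, not the hDMC1 sequence, and positions are reported 1-based as in residue numbering.

```python
import re

# Walker A consensus: G, any four residues, G, K, then S or T.
WALKER_A = re.compile(r"(?=G.{4}GK[ST])")  # lookahead reports overlapping hits


def walker_a_sites(protein):
    """Return (motif start, conserved lysine) as 1-based residue positions."""
    sites = []
    for m in WALKER_A.finditer(protein.upper()):
        start = m.start()               # 0-based index of the first G
        lysine = start + 6              # 0-based index of K within GxxxxGK
        sites.append((start + 1, lysine + 1))
    return sites


print(walker_a_sites("MKVLGEPGSGKTTLA"))
```

    Substituting that lysine (for example K to R or K to A, as in the variants described) would remove the match, which is a quick sanity check when designing mutants.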

  6. Phylum-Level Conservation of Regulatory Information in Nematodes despite Extensive Non-coding Sequence Divergence

    PubMed Central

    Gordon, Kacy L.; Arthur, Robert K.; Ruvinsky, Ilya

    2015-01-01

    Gene regulatory information guides development and shapes the course of evolution. To test conservation of gene regulation within the phylum Nematoda, we compared the functions of putative cis-regulatory sequences of four sets of orthologs (unc-47, unc-25, mec-3 and elt-2) from distantly-related nematode species. These species, Caenorhabditis elegans, its congeneric C. briggsae, and three parasitic species Meloidogyne hapla, Brugia malayi, and Trichinella spiralis, represent four of the five major clades in the phylum Nematoda. Despite the great phylogenetic distances sampled and the extensive sequence divergence of nematode genomes, all but one of the regulatory elements we tested are able to drive at least a subset of the expected gene expression patterns. We show that functionally conserved cis-regulatory elements have no more extended sequence similarity to their C. elegans orthologs than would be expected by chance, but they do harbor motifs that are important for proper expression of the C. elegans genes. These motifs are too short to be distinguished from the background level of sequence similarity, and while identical in sequence they are not conserved in orientation or position. Functional tests reveal that some of these motifs contribute to proper expression. Our results suggest that conserved regulatory circuitry can persist despite considerable turnover within cis elements. PMID:26020930

  7. Phylum-Level Conservation of Regulatory Information in Nematodes despite Extensive Non-coding Sequence Divergence.

    PubMed

    Gordon, Kacy L; Arthur, Robert K; Ruvinsky, Ilya

    2015-05-01

    Gene regulatory information guides development and shapes the course of evolution. To test conservation of gene regulation within the phylum Nematoda, we compared the functions of putative cis-regulatory sequences of four sets of orthologs (unc-47, unc-25, mec-3 and elt-2) from distantly-related nematode species. These species, Caenorhabditis elegans, its congeneric C. briggsae, and three parasitic species Meloidogyne hapla, Brugia malayi, and Trichinella spiralis, represent four of the five major clades in the phylum Nematoda. Despite the great phylogenetic distances sampled and the extensive sequence divergence of nematode genomes, all but one of the regulatory elements we tested are able to drive at least a subset of the expected gene expression patterns. We show that functionally conserved cis-regulatory elements have no more extended sequence similarity to their C. elegans orthologs than would be expected by chance, but they do harbor motifs that are important for proper expression of the C. elegans genes. These motifs are too short to be distinguished from the background level of sequence similarity, and while identical in sequence they are not conserved in orientation or position. Functional tests reveal that some of these motifs contribute to proper expression. Our results suggest that conserved regulatory circuitry can persist despite considerable turnover within cis elements.

  8. Sequence and structural conservation in RNA ribose zippers

    SciTech Connect

    Tamura, Makio; Holbrook, Stephen R.

    2002-03-01

    both within these three ribosomal RNA structures and in a large database of aligned prokaryotic sequences. The physical basis of the sequence conservation is stacked base triples formed between consecutive base pairs on the stem or stem-like segment and bases (often adenines) from the loop-side segment. These triples have previously been characterized as Type I and Type II A-minor motifs and are stabilized by base-base and base-ribose hydrogen bonds. The sequence and structure conservation of ribose zippers can be used directly in tertiary structure prediction and may have applications in molecular modeling and design.

  9. A Conserved Phenylalanine of Motif IV in Superfamily 2 Helicases Is Required for Cooperative, ATP-Dependent Binding of RNA Substrates in DEAD-Box Proteins

    PubMed Central

    Banroques, Josette; Cordin, Olivier; Doère, Monique; Linder, Patrick; Tanner, N. Kyle

    2008-01-01

    We have identified a highly conserved phenylalanine in motif IV of the DEAD-box helicases that is important for their enzymatic activities. In vivo analyses of essential proteins in yeast showed that mutants of this residue had severe growth phenotypes. Most of the mutants also were temperature sensitive, which suggested that the mutations altered the conformational stability. Intragenic suppressors of the F405L mutation in yeast Ded1 mapped close to regions of the protein involved in ATP or RNA binding in DEAD-box crystal structures, which implicated a defect at this level. In vitro experiments showed that these mutations affected ATP binding and hydrolysis as well as strand displacement activity. However, the most pronounced effect was the loss of the ATP-dependent cooperative binding of the RNA substrates. Sequence analyses and an examination of the Protein Data Bank showed that the motif IV phenylalanine is conserved among superfamily 2 helicases. The phenylalanine appears to be an anchor that maintains the rigidity of the RecA-like domain. For DEAD-box proteins, the phenylalanine also aligns a highly conserved arginine of motif VI through van der Waals and cation-π interactions, thereby helping to maintain the network of interactions that exist between the different motifs involved in ATP and RNA binding. PMID:18332124

  10. Viroids: from genotype to phenotype just relying on RNA sequence and structural motifs.

    PubMed

    Flores, Ricardo; Serra, Pedro; Minoia, Sofía; Di Serio, Francesco; Navarro, Beatriz

    2012-01-01

    As a consequence of two unique physical properties, small size and circularity, viroid RNAs do not code for proteins and thus depend on RNA sequence/structural motifs for interacting with host proteins that mediate their invasion, replication, spread, and circumvention of defensive barriers. Viroid genomes fold up on themselves adopting collapsed secondary structures wherein stretches of nucleotides stabilized by Watson-Crick pairs are flanked by apparently unstructured loops. However, compelling data show that they are instead stabilized by alternative non-canonical pairs and that specific loops in the rod-like secondary structure, characteristic of Potato spindle tuber viroid and most other members of the family Pospiviroidae, are critical for replication and systemic trafficking. In contrast, rather than folding into a rod-like secondary structure, most members of the family Avsunviroidae adopt multibranched conformations occasionally stabilized by kissing-loop interactions critical for viroid viability in vivo. Besides these most stable secondary structures, viroid RNAs alternatively adopt during replication transient metastable conformations containing elements of local higher-order structure, prominent among which are the hammerhead ribozymes catalyzing a key replicative step in the family Avsunviroidae, and certain conserved hairpins that also mediate replication steps in the family Pospiviroidae. Therefore, different RNA structures - either global or local - determine different functions, thus highlighting the need for in-depth structural studies on viroid RNAs.

  11. Viroids: From Genotype to Phenotype Just Relying on RNA Sequence and Structural Motifs

    PubMed Central

    Flores, Ricardo; Serra, Pedro; Minoia, Sofía; Di Serio, Francesco; Navarro, Beatriz

    2012-01-01

    As a consequence of two unique physical properties, small size and circularity, viroid RNAs do not code for proteins and thus depend on RNA sequence/structural motifs for interacting with host proteins that mediate their invasion, replication, spread, and circumvention of defensive barriers. Viroid genomes fold up on themselves adopting collapsed secondary structures wherein stretches of nucleotides stabilized by Watson–Crick pairs are flanked by apparently unstructured loops. However, compelling data show that they are instead stabilized by alternative non-canonical pairs and that specific loops in the rod-like secondary structure, characteristic of Potato spindle tuber viroid and most other members of the family Pospiviroidae, are critical for replication and systemic trafficking. In contrast, rather than folding into a rod-like secondary structure, most members of the family Avsunviroidae adopt multibranched conformations occasionally stabilized by kissing-loop interactions critical for viroid viability in vivo. Besides these most stable secondary structures, viroid RNAs alternatively adopt during replication transient metastable conformations containing elements of local higher-order structure, prominent among which are the hammerhead ribozymes catalyzing a key replicative step in the family Avsunviroidae, and certain conserved hairpins that also mediate replication steps in the family Pospiviroidae. Therefore, different RNA structures – either global or local – determine different functions, thus highlighting the need for in-depth structural studies on viroid RNAs. PMID:22719735

  12. Novel hexamerization motif is discovered in a conserved cytoplasmic protein from Salmonella typhimurium.

    SciTech Connect

    Petrova, T.; Cuff, M.; Wu, R.; Kim, Y.; Holzle, D.; Joachimiak, A.; Biosciences Division; Inst. of Mathematical Problems of Biology

    2007-01-01

    The structure of Stm3548, a cytoplasmic protein of unknown function from a strain of Salmonella typhimurium, was determined by X-ray crystallography at a resolution of 2.25 Å. The asymmetric unit contains a hexamer of structurally identical monomers. The monomer is a globular domain with a long beta-hairpin protrusion that distinguishes this structure. This beta-hairpin occupies a central position in the hexamer, and its residues participate in the majority of interactions between subunits of the hexamer. We suggest that the structure of Stm3548 presents a new hexamerization motif. The residues participating in interdomain interactions are highly conserved among close members of protein family DUF1355, and the solvent-accessible surface area buried in the hexamer is substantial. For these reasons, the hexamer is most likely conserved as well. A light scattering experiment confirmed the presence of the hexamer in solution.

  13. A conserved acidic motif is crucial for enzymatic activity of protein O-mannosyltransferases.

    PubMed

    Lommel, Mark; Schott, Andrea; Jank, Thomas; Hofmann, Verena; Strahl, Sabine

    2011-11-18

    Protein O-mannosylation is an essential modification in fungi and mammals. It is initiated at the endoplasmic reticulum by a conserved family of dolichyl phosphate mannose-dependent protein O-mannosyltransferases (PMTs). PMTs are integral membrane proteins with two hydrophilic loops (loops 1 and 5) facing the endoplasmic reticulum lumen. Formation of dimeric PMT complexes is crucial for mannosyltransferase activity, but the reason for this is not yet known. In baker's yeast, O-mannosylation is catalyzed largely by heterodimeric Pmt1p-Pmt2p and homodimeric Pmt4p complexes. To further characterize Pmt1p-Pmt2p complexes, we developed a photoaffinity probe based on the artificial mannosyl acceptor substrate Tyr-Ala-Thr-Ala-Val. The photoreactive probe was preferentially cross-linked to Pmt1p, and deletion of the loop 1 (but not loop 5) region abolished this interaction. Analysis of Pmt1p loop 1 mutants revealed that Glu-78 in particular is crucial for binding of the photoreactive probe. Glu-78 belongs to an Asp-Glu motif that is highly conserved among PMTs. We further demonstrate that single amino acid substitutions in this motif completely abolish activity of Pmt4p complexes. In contrast, both acidic residues need to be exchanged to eliminate activity of Pmt1p-Pmt2p complexes. On the basis of our data, we propose that the loop 1 regions of dimeric complexes form part of the catalytic site.

  14. ssHMM: extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data.

    PubMed

    Heller, David; Krestel, Ralf; Ohler, Uwe; Vingron, Martin; Marsico, Annalisa

    2017-08-30

    RNA-binding proteins (RBPs) play an important role in RNA post-transcriptional regulation and recognize target RNAs via sequence-structure motifs. The extent to which RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. Existing RNA motif finders either take the structure of the RNA only partially into account or employ models that are not directly interpretable as sequence-structure motifs. We developed ssHMM, an RNA motif finder based on a hidden Markov model (HMM) and Gibbs sampling. It fully captures the relationship between RNA sequence and the secondary structure preference of a given RBP. Unlike previous methods, which output separate logos for sequence and structure, it directly produces a combined sequence-structure motif when trained on a large set of sequences. ssHMM's model is visualized intuitively as a graph, which facilitates biological interpretation. ssHMM can be used to find novel bona fide sequence-structure motifs of uncharacterized RBPs, such as the one presented here for the YY1 protein. ssHMM achieves a high motif recovery rate on synthetic data and recovers known RBP motifs from CLIP-Seq data. It scales linearly with input size, is considerably faster than MEMERIS and RNAcontext on large datasets, and is on par with GraphProt. It is freely available on GitHub and as a Docker image. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
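    ssHMM's own architecture and its Gibbs-sampling training are not reproduced here. The sketch below is a minimal illustration of the underlying idea. It assumes a hypothetical HMM in which each hidden state emits a joint (nucleotide, structural context) symbol, and it uses the standard forward algorithm to compute the log-likelihood of a structure-annotated RNA sequence. The state names, probabilities and structure labels are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

Symbol = Tuple[str, str]  # (nucleotide, structural context), e.g. ("G", "stem")


def _log(p: float) -> float:
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == -math.inf:
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))


def forward_loglik(obs: List[Symbol],
                   start: Dict[str, float],
                   trans: Dict[str, Dict[str, float]],
                   emit: Dict[str, Dict[Symbol, float]]) -> float:
    """log P(obs) under a hypothetical HMM emitting joint sequence-structure symbols."""
    states = list(start)
    # Initialisation: start in state s and emit the first symbol.
    alpha = {s: _log(start[s]) + _log(emit[s].get(obs[0], 0.0)) for s in states}
    # Recursion: sum over all predecessor states, then emit the next symbol.
    for o in obs[1:]:
        alpha = {
            t: logsumexp([alpha[s] + _log(trans[s][t]) for s in states])
            + _log(emit[t].get(o, 0.0))
            for t in states
        }
    # Termination: sum over the possible final states.
    return logsumexp(list(alpha.values()))
```

Working in log space prevents underflow on long sequences. A symbol that no state can emit yields a log-likelihood of -inf.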

  15. Enhancing Gibbs sampling method for motif finding in DNA with initial graph representation of sequences.

    PubMed

    Stepančič, Ziva

    2014-10-01

    Finding short patterns with residue variation in a set of sequences is still an open problem in genetics. Motif-finding techniques applied to DNA and protein sequences are inconclusive on real data sets, and their performance varies across species. Hence, developing new algorithms and improving established methods is vital to further understanding of genome properties and the mechanisms of protein development. In this work, we present an approach to finding functional motifs in DNA sequences based on the Gibbs sampling method. Starting points in the search space are partly determined from a graphical representation of the input sequences, as opposed to the completely random starting points used in standard Gibbs sampling. Our algorithm is evaluated on synthetic as well as real data sets using several statistics, such as sensitivity, positive predictive value, specificity, performance, and correlation coefficient. Additionally, we compare our algorithm with the standard Gibbs sampling algorithm to show improvements in accuracy, repeatability, and performance.
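    The abstract does not describe its graph-based initialisation in enough detail to reproduce. The sketch below shows only the standard Gibbs sampling motif search that the method builds on. It assumes one ungapped motif of fixed width k per sequence, a pseudocount of 1 and a mismatch-to-consensus score. The optional `starts` argument is a hypothetical hook: informed starting positions, such as graph-derived ones, could be passed there instead of random ones.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Sequence

ALPHABET = "ACGT"


def build_profile(motifs: List[str], pseudo: float = 1.0) -> List[Dict[str, float]]:
    """Column-wise base probabilities of aligned k-mers, with pseudocounts."""
    total = len(motifs) + pseudo * len(ALPHABET)
    profile = []
    for j in range(len(motifs[0])):
        counts = Counter(m[j] for m in motifs)
        profile.append({b: (counts[b] + pseudo) / total for b in ALPHABET})
    return profile


def kmer_prob(kmer: str, profile: List[Dict[str, float]]) -> float:
    """Probability of a k-mer under the profile (positions treated independently)."""
    p = 1.0
    for j, base in enumerate(kmer):
        p *= profile[j][base]
    return p


def mismatch_score(motifs: List[str]) -> int:
    """Total mismatches against the column-wise consensus (lower is better)."""
    return sum(len(motifs) - Counter(m[j] for m in motifs).most_common(1)[0][1]
               for j in range(len(motifs[0])))


def gibbs_motif_search(seqs: Sequence[str], k: int, iters: int = 2000, seed: int = 0,
                       starts: Optional[Sequence[int]] = None) -> List[str]:
    """Standard Gibbs sampling search for one width-k motif per sequence (>= 2 sequences)."""
    rng = random.Random(seed)
    # Hypothetical hook: informed starting positions can replace random ones.
    pos = list(starts) if starts is not None else [
        rng.randrange(len(s) - k + 1) for s in seqs]

    def current() -> List[str]:
        return [s[p:p + k] for s, p in zip(seqs, pos)]

    best = current()
    best_score = mismatch_score(best)
    for _ in range(iters):
        i = rng.randrange(len(seqs))  # sequence whose motif is resampled
        others = [m for j, m in enumerate(current()) if j != i]
        profile = build_profile(others)
        s = seqs[i]
        # Sample a new start in sequence i in proportion to each k-mer's profile probability.
        weights = [kmer_prob(s[p:p + k], profile) for p in range(len(s) - k + 1)]
        pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
        motifs = current()
        score = mismatch_score(motifs)
        if score < best_score:
            best, best_score = motifs, score
    return best
```

Random starts can leave the sampler in a poor local optimum. Better initialisation, as the abstract argues, is meant to reduce that risk and to improve repeatability between runs.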

  16. The conserved helicase motifs of the herpes simplex virus type 1 origin-binding protein UL9 are important for function.

    PubMed Central

    Martinez, R; Shao, L; Weller, S K

    1992-01-01

    The UL9 gene of herpes simplex virus encodes a protein that specifically recognizes sequences within the viral origins of replication and exhibits helicase and DNA-dependent ATPase activities. The specific DNA binding domain of the UL9 protein was localized to the carboxy-terminal one-third of the molecule (H. M. Weir, J. M. Calder, and N. D. Stow, Nucleic Acids Res. 17:1409-1425, 1989). The N-terminal two-thirds of the UL9 gene contains six sequence motifs found in all members of a superfamily of DNA and RNA helicases, suggesting that this region may be important for helicase activity of UL9. In this report, we examined the functional significance of these six motifs for the UL9 protein through the introduction of site-specific mutations resulting in single amino acid substitutions of the most highly conserved residues within each motif. An in vivo complementation test was used to study the effect of each mutation on the function of the UL9 protein in viral DNA replication. In this assay, a mutant UL9 protein expressed from a transfected plasmid is used to complement a replication-deficient null mutant in the UL9 gene for the amplification of herpes simplex virus origin-containing plasmids. Mutations in five of the six conserved motifs inactivated the function of the UL9 protein in viral DNA replication, providing direct evidence for the importance of these conserved motifs. Insertion mutants resulting in the introduction of two alanines at 100-residue intervals in regions outside the conserved motifs were also constructed. Three of the insertion mutations were tolerated, whereas the other five abolished UL9 function. These data indicate that other regions of the protein, in addition to the helicase motifs, are important for function in vivo. Several mutations result in instability of the mutant products, presumably because of conformational changes in the protein. Taken together, these results suggest that UL9 is very sensitive to mutations with respect to both

  17. The transcription factor Spn1 regulates gene expression via a highly conserved novel structural motif

    PubMed Central

    Pujari, Venugopal; Radebaugh, Catherine A.; Chodaparambil, Jayanth V.; Muthurajan, Uma M.; Almeida, Adam R.; Fischbeck, Julie A.; Luger, Karolin; Stargell, Laurie A.

    2010-01-01

    Spn1 plays essential roles in the regulation of gene expression by RNA Polymerase II (RNAPII), and it is highly conserved in organisms ranging from yeast to humans. Spn1 physically and/or genetically interacts with RNAPII, TBP, TFIIS and a number of chromatin remodeling factors (Swi/Snf and Spt6). The central domain of Spn1 (residues 141-305 out of 410) is necessary and sufficient for performing the essential functions of SPN1 in yeast cells. Here we report the high-resolution (1.85 Å) crystal structure of the conserved central domain of Saccharomyces cerevisiae Spn1. The central domain is composed of eight alpha-helices in a right-handed superhelical arrangement and exhibits structural similarity to domain I of TFIIS. A unique structural feature of Spn1 is a highly conserved loop, which defines one side of a pronounced cavity. The loop and the other residues forming the cavity are highly conserved at the amino acid level among all Spn1 family members, suggesting that this is a signature motif for Spn1 orthologs. The locations and the molecular characterization of temperature-sensitive mutations in Spn1 indicate that the cavity is a key attribute of Spn1 that is critical for its regulatory functions during RNAPII-mediated transcriptional activity. PMID:20875428

  18. The transcription factor Spn1 regulates gene expression via a highly conserved novel structural motif.

    PubMed

    Pujari, Venugopal; Radebaugh, Catherine A; Chodaparambil, Jayanth V; Muthurajan, Uma M; Almeida, Adam R; Fischbeck, Julie A; Luger, Karolin; Stargell, Laurie A

    2010-11-19

    Spn1/Iws1 plays essential roles in the regulation of gene expression by RNA polymerase II (RNAPII), and it is highly conserved in organisms ranging from yeast to humans. Spn1 physically and/or genetically interacts with RNAPII, TBP (TATA-binding protein), TFIIS (transcription factor IIS), and a number of chromatin remodeling factors (Swi/Snf and Spt6). The central domain of Spn1 (residues 141-305 out of 410) is necessary and sufficient for performing the essential functions of SPN1 in yeast cells. Here, we report the high-resolution (1.85 Å) crystal structure of the conserved central domain of Saccharomyces cerevisiae Spn1. The central domain is composed of eight α-helices in a right-handed superhelical arrangement and exhibits structural similarity to domain I of TFIIS. A unique structural feature of Spn1 is a highly conserved loop, which defines one side of a pronounced cavity. The loop and the other residues forming the cavity are highly conserved at the amino acid level among all Spn1 family members, suggesting that this is a signature motif for Spn1 orthologs. The locations and the molecular characterization of temperature-sensitive mutations in Spn1 indicate that the cavity is a key attribute of Spn1 that is critical for its regulatory functions during RNAPII-mediated transcriptional activity.

  19. BayesMotif: de novo protein sorting motif discovery from impure datasets

    PubMed Central

    2010-01-01

    Background: Protein sorting is the process by which newly synthesized proteins are transported to their target locations within or outside the cell. This process is precisely regulated by protein sorting signals of different forms. A major category of sorting signals consists of amino acid subsequences usually located at the N- or C-termini of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals are needed to improve the understanding of protein sorting mechanisms. Methods: We formulated the protein sorting motif discovery problem as a classification problem. We proposed a Bayesian classifier-based algorithm (BayesMotif) for de novo identification of a common type of protein sorting motif in which a highly conserved anchor is present along with a less conserved motif region. A false positive removal procedure is developed to iteratively remove sequences that are unlikely to contain true motifs, so that the algorithm can identify motifs from impure input sequences. Results: Experiments on both implanted-motif datasets and real-world datasets showed that the enhanced BayesMotif algorithm can identify anchored sorting motifs from pure or impure protein sequence datasets. They also show that the false positive removal procedure can help to identify true motifs even when only 20% of the input sequences contain true motif instances. Conclusion: We proposed BayesMotif, a novel Bayesian classification-based algorithm for de novo discovery of a special category of anchored protein sorting motifs from impure datasets. Compared to conventional motif discovery algorithms such as MEME, our algorithm can find less-conserved motifs with short, highly conserved anchors. Our algorithm also has the advantage of easy incorporation of additional meta-sequence features such as hydrophobicity or charge of the motifs which

  20. An evolutionarily conserved motif in the TAB1 C-terminal region is necessary for interaction with and activation of TAK1 MAPKKK.

    PubMed

    Ono, K; Ohtomo, T; Sato, S; Sugamata, Y; Suzuki, M; Hisamoto, N; Ninomiya-Tsuji, J; Tsuchiya, M; Matsumoto, K

    2001-06-29

    TAK1, a member of the MAPKKK family, is involved in the intracellular signaling pathways mediated by transforming growth factor beta, interleukin 1, and Wnt. TAK1 kinase activity is specifically activated by the TAK1-binding protein TAB1. The C-terminal 68-amino acid sequence of TAB1 (TAB1-C68) is sufficient for TAK1 interaction and activation. Analysis of various truncated versions of TAB1-C68 defined a C-terminal 30-amino acid sequence (TAB1-C30) necessary for TAK1 binding and activation. NMR studies revealed that the TAB1-C30 region has a unique alpha-helical structure. We identified a conserved sequence motif, PYVDXA/TXF, in the C-terminal domain of mammalian TAB1, Xenopus TAB1, and its Caenorhabditis elegans homolog TAP-1, suggesting that this motif constitutes a specific TAK1 docking site. Alanine substitution mutagenesis showed that TAB1 Phe-484, located in the conserved motif, is crucial for TAK1 binding and activation. The C. elegans homolog of TAB1, TAP-1, was able to interact with and activate the C. elegans homolog of TAK1, MOM-4. However, the site in TAP-1 corresponding to Phe-484 of TAB1 is an alanine residue (Ala-364), and changing this residue to Phe abrogates the ability of TAP-1 to interact with and activate MOM-4. These results suggest that the Phe or Ala residue within the conserved motif of the TAB1-related proteins is important for interaction with and activation of specific TAK1 MAPKKK family members in vivo.
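    A consensus notation such as PYVDXA/TXF can be searched for mechanically. The sketch below assumes the common conventions that X stands for any residue and A/T means either residue at a single position. The function names and the example sequences used with them are hypothetical.

```python
import re
from typing import List, Tuple


def motif_to_regex(motif: str) -> str:
    """Translate consensus notation such as 'PYVDXA/TXF' into a regex string.

    Assumed conventions: 'X' matches any residue; 'A/T' is one position that
    may be either residue.
    """
    tokens = []
    i = 0
    while i < len(motif):
        group = [motif[i]]
        i += 1
        while i + 1 < len(motif) and motif[i] == "/":
            group.append(motif[i + 1])  # alternative residue at the same position
            i += 2
        if group == ["X"]:
            tokens.append(".")
        elif len(group) == 1:
            tokens.append(re.escape(group[0]))
        else:
            tokens.append("[" + "".join(group) + "]")
    return "".join(tokens)


def find_motif(seq: str, motif: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched residues) for every, possibly overlapping, hit."""
    # A lookahead group lets finditer report overlapping occurrences.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq)]
```

Positions are reported 1-based to match the residue numbering convention used in the abstract (for example, Phe-484).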

  1. Conserved Noncoding Sequences in the Grasses

    PubMed Central

    Inada, Dan Choffnes; Bashir, Ali; Lee, Chunghau; Thomas, Brian C.; Ko, Cynthia; Goff, Stephen A.; Freeling, Michael

    2003-01-01

    As orthologous genes from related species diverge over time, some sequences are conserved in noncoding regions. In mammals, large phylogenetic footprints, or conserved noncoding sequences (CNSs), are known to be common features of genes. Here we present the first large-scale analysis of plant genes for CNSs. We used maize and rice, maximally diverged members of the grass family of monocots. Using a local sequence alignment set to deliver only significant alignments, we found one or more CNSs in the noncoding regions of the majority of genes studied. Grass genes have dramatically fewer and much smaller CNSs than mammalian genes. Twenty-seven percent of grass gene comparisons revealed no CNSs. Genes functioning in upstream regulatory roles, such as transcription factors, are greatly enriched for CNSs relative to genes encoding enzymes or structural proteins. Further, we show that a CNS cluster in an intron of the knotted1 homeobox gene serves as a site of negative regulation. We show that CNSs in the adh1 gene do not correlate with known cis-acting sites. We discuss the potential meanings of CNSs and their value as analytical tools and evolutionary characters. We advance the idea that many CNSs function to lock in gene regulatory decisions. PMID:12952874
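    The abstract does not name its alignment program or scoring parameters. To illustrate the local-alignment idea only, the sketch below computes a best local alignment score with the classic Smith-Waterman recurrence and a linear gap penalty. The match, mismatch and gap values are hypothetical. Deciding which alignments count as significant, the filtering step the abstract mentions, is not shown.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment restart anywhere: the "local" part.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

Only two rows of the matrix are kept, so memory is linear in the length of `b`. Recovering the aligned segments themselves would require the full matrix or a traceback scheme.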

  2. Identification of an oligodeoxynucleotide sequence motif that specifically inhibits phosphorylation by protein tyrosine kinases.

    PubMed

    Krieg, A M; Matson, S; Cheng, K; Fisher, E; Koretzky, G A; Koland, J G

    1997-04-01

    Protein tyrosine kinases (PTKs) have central roles in cellular signal transduction. We have identified a sequence motif (CGT[C]GA) in phosphorothioate-modified oligodeoxynucleotides (ODNs) that specifically inhibits the enzymatic activity of recombinant or immunoprecipitated PTK in vitro. Hexamer ODNs containing this motif block both substrate and autophosphorylation of at least four different PTKs but have no apparent effect on the enzymatic activity of a serine/threonine protein kinase. These data suggest possible new applications for ODNs and have implications for the design and interpretation of experiments using antisense or triplex ODNs.

  3. A Conserved GPG-Motif in the HIV-1 Nef Core Is Required for Principal Nef-Activities

    PubMed Central

    Martínez-Bonet, Marta; Palladino, Claudia; Briz, Veronica; Rudolph, Jochen M.; Fackler, Oliver T.; Relloso, Miguel; Muñoz-Fernandez, Maria Angeles; Madrid, Ricardo

    2015-01-01

    To identify new determinants required for Nef activity, we performed a functional alanine scanning analysis along a discrete but highly conserved region at the core of HIV-1 Nef. We identified the GPG-motif, located in the 121–137 region of HIV-1 NL4.3 Nef, as a novel protein signature strictly required for p56Lck-dependent, Nef-induced CD4 downregulation in T-cells. The Nef-GPG motif was dispensable for CD4 downregulation in HeLa-CD4 cells, for Nef/AP-1 interaction and for Nef-dependent effects on Tf-R trafficking. The observed effects on CD4 downregulation therefore cannot be attributed to structural constraints or to alterations in general protein trafficking. In addition, we found that the GPG-motif was required for Nef-dependent inhibition of ring actin re-organization upon TCR triggering and for MHCI downregulation. This suggests that the GPG-motif could actively cooperate with the Nef PxxP motif for these HIV-1 Nef-related effects. Finally, we observed that the Nef-GPG motif was required for optimal infectivity of viruses produced in T-cells. Based on these findings, we propose the conserved GPG-motif in HIV-1 Nef as a functional region required for HIV-1 infectivity and therefore of potential interest for interfering with Nef activity during HIV-1 infection. PMID:26700863

  4. Using hybrid hierarchical K-means (HHK) clustering algorithm for protein sequence motif super-rule-tree (SRT) structure construction.

    PubMed

    Chen, Bernard; He, Jieyue; Pellicer, Stephen; Pan, Yi

    2010-01-01

    Many motif discovery algorithms require the window size to be fixed in advance. Because of the fixed size, these approaches often deliver a number of similar motifs that are simply shifted by a few positions or contain mismatches. To address this mismatched-motif problem, we use the super-rule concept to construct a Super-Rule-Tree (SRT) with a modified Hybrid Hierarchical K-means (HHK) clustering algorithm. The algorithm requires no parameter setup to identify the similarities and dissimilarities between the motifs. Analysis of the motifs generated by our approach shows that they are significant not only in sequence but also in secondary-structure similarity.

  5. Protospacer recognition motifs

    PubMed Central

    Shah, Shiraz A.; Erdmann, Susanne; Mojica, Francisco J.M.; Garrett, Roger A.

    2013-01-01

    Protospacer adjacent motifs (PAMs) were originally characterized for CRISPR-Cas systems that were classified on the basis of their CRISPR repeat sequences. A few short 2–5 bp sequences were identified adjacent to one end of the protospacers. Experimental and bioinformatical results linked the motif to the excision of protospacers and their insertion into CRISPR loci. Subsequently, evidence accumulated from different virus- and plasmid-targeting assays, suggesting that these motifs were also recognized during DNA interference, at least for the recently classified type I and type II CRISPR-based systems. The two processes, spacer acquisition and protospacer interference, employ different molecular mechanisms. There is increasing evidence that the sequence motifs recognized in the two processes overlap but are unlikely to be identical. In this article, we consider the properties of PAM sequences and summarize the evidence for their dual functional roles. We propose using the term protospacer associated motif (PAM) for the conserved DNA sequence, and the terms spacer acquisition motif (SAM) and target interference motif (TIM) for the acquisition and interference recognition sites, respectively. PMID:23403393
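    To make the idea of a protospacer-adjacent motif concrete, the sketch below lists candidate protospacers that flank a PAM on one strand of a target sequence. The PAM string, the protospacer length and the side of the protospacer on which the PAM lies all vary between CRISPR-Cas types, so they are parameters here. One familiar example is the NGG PAM of Streptococcus pyogenes Cas9, which lies immediately 3' of a 20-nt protospacer. Only a few IUPAC codes are supported. Scanning the opposite strand would mean running the same function on the reverse complement.

```python
from typing import List, Tuple

# Subset of IUPAC nucleotide codes; extend as needed.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "N": "ACGT"}


def matches_pam(window: str, pam: str) -> bool:
    """True if every base in window is allowed by the IUPAC code at that PAM position."""
    return len(window) == len(pam) and all(b in IUPAC[p] for b, p in zip(window, pam))


def candidate_protospacers(dna: str, pam: str, length: int,
                           pam_side: str = "3prime") -> List[Tuple[int, str, str]]:
    """Return (0-based start, protospacer, PAM) for each PAM hit on this strand.

    pam_side="3prime": the PAM immediately follows the protospacer.
    pam_side="5prime": the PAM immediately precedes the protospacer.
    """
    if pam_side not in ("3prime", "5prime"):
        raise ValueError("pam_side must be '3prime' or '5prime'")
    dna = dna.upper()
    hits = []
    for i in range(len(dna) - len(pam) + 1):
        window = dna[i:i + len(pam)]
        if not matches_pam(window, pam):
            continue
        start = i - length if pam_side == "3prime" else i + len(pam)
        if start < 0 or start + length > len(dna):
            continue  # protospacer would run off the end of the sequence
        hits.append((start, dna[start:start + length], window))
    return hits
```

For example, `candidate_protospacers(target, "NGG", 20)` lists SpCas9-style candidates on the given strand of `target`.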

  6. DNA sequence motif: a jack of all trades for ChIP-Seq data.

    PubMed

    Kulakovskiy, Ivan V; Makeev, Vsevolod J

    2013-01-01

    Chromatin immunoprecipitation followed by next-generation sequencing, often referred to as ChIP-Seq, has become an industry standard for studying the landscape of DNA-protein interactions in vivo. ChIP-Seq captures highly specific protein-DNA interactions, such as transcription factors (TFs) bound to their binding sites, as well as sparse patterns formed by different histone marks. In this review, we focus on DNA sequence analysis methods suited to TF ChIP-Seq data. We discuss numerous tasks, from basic DNA motif finding and motif discovery to their further application in exploring various features of experimental data. We show how sequence analysis of ChIP-Seq data yields novel biological knowledge on multiple levels, from individual transcription factor binding sites to genome segments operating as regulatory modules. Finally, we provide an overview of existing software in the field. Copyright © 2013 Elsevier Inc. All rights reserved.
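    Basic motif finding, mentioned above, can be illustrated with a log-odds position weight matrix (PWM). The sketch below builds a PWM from a handful of aligned binding sites and scans both strands of a sequence, since TF sites can occur in either orientation. The pseudocount, the uniform 0.25 background, the score threshold and any example sites are illustrative assumptions, not values from the review. Input is assumed to be uppercase A/C/G/T only.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_pwm(sites: List[str], pseudo: float = 0.5,
              background: float = 0.25) -> List[Dict[str, float]]:
    """Log2-odds PWM from equal-length aligned sites (uniform background)."""
    denom = len(sites) + pseudo * len(BASES)
    pwm = []
    for j in range(len(sites[0])):
        column = [s[j] for s in sites]
        pwm.append({b: math.log2((column.count(b) + pseudo) / denom / background)
                    for b in BASES})
    return pwm


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def scan(seq: str, pwm: List[Dict[str, float]],
         threshold: float) -> List[Tuple[int, str, float]]:
    """Return (0-based forward-strand start, strand, score) for windows >= threshold."""
    k = len(pwm)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - k + 1):
            score = sum(pwm[j][b] for j, b in enumerate(s[i:i + k]))
            if score >= threshold:
                # Map minus-strand windows back to forward-strand coordinates.
                start = i if strand == "+" else len(seq) - i - k
                hits.append((start, strand, round(score, 3)))
    return sorted(hits)
```

In ChIP-Seq work the threshold is usually chosen from a score distribution or a p-value rather than fixed by hand. The fixed cut-off here keeps the sketch short.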

  7. DEEP MOTIF DASHBOARD: VISUALIZING AND UNDERSTANDING GENOMIC SEQUENCES USING DEEP NEURAL NETWORKS.

    PubMed

    Lanchantin, Jack; Singh, Ritambhara; Wang, Beilun; Qi, Yanjun

    2016-01-01

    Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals, and they give little insight into why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard), which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method computes a test sequence's saliency map, which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, because recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that the convolutional-recurrent architecture performs best among the three architectures. The visualization techniques indicate that the CNN-RNN makes predictions by modeling both motifs and the dependencies among them.
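    A first-order saliency map can be sketched for any differentiable sequence scorer. The sketch below assumes a one-hot encoding in A, C, G, T order and a hypothetical stand-in model: a sigmoid of a linear score, not one of the paper's trained networks. The gradient is estimated with central finite differences to keep the example dependency-light. A real DNN framework would compute it by automatic differentiation. Per-position saliency is taken as gradient × input, which keeps only the entry for the nucleotide actually present.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as an (L, 4) one-hot matrix in A, C, G, T order."""
    x = np.zeros((len(seq), len(BASES)))
    for i, base in enumerate(seq):
        x[i, BASES.index(base)] = 1.0
    return x


def toy_model(x: np.ndarray, w: np.ndarray, bias: float = 0.0) -> float:
    """Hypothetical stand-in for a trained classifier: sigmoid of a linear score."""
    return float(1.0 / (1.0 + np.exp(-(np.sum(w * x) + bias))))


def saliency_map(f, x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Per-position saliency: df/dx at the observed nucleotide.

    The gradient is estimated with central finite differences; a real DNN
    framework would obtain it by automatic differentiation instead.
    """
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        step = np.zeros_like(x)
        step[idx] = eps
        grad[idx] = (f(x + step) - f(x - step)) / (2.0 * eps)
    # Gradient x input keeps only the entry for the nucleotide actually present.
    return (grad * x).sum(axis=1)
```

With weights that reward only a G at position 2, the saliency of "ACGT" peaks at that position and is zero elsewhere.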

  8. Evolutionary and molecular analysis of Dof transcription factors identified a conserved motif for intercellular protein trafficking.

    PubMed

    Chen, Huan; Ahmad, Munawar; Rim, Yeonggil; Lucas, William J; Kim, Jae-Yean

    2013-06-01

    · Cell-to-cell trafficking of transcription factors (TFs) has been shown to play an important role in the regulation of plant developmental events, but the evolutionary relationship between cell-autonomous and noncell-autonomous (NCA) TFs remains elusive. · AtDof4.1, named INTERCELLULAR TRAFFICKING DOF 1 (ITD1), was chosen as a representative NCA member to explore this evolutionary relationship. Using domain structure-function analyses and swapping studies, we examined the cell-to-cell trafficking of plant-specific Dof TF family members across Arabidopsis and other species. · We identified a conserved intercellular trafficking motif (ITM) that is necessary and sufficient for selective cell-to-cell trafficking and can impart gain-of-function cell-to-cell movement capacity to an otherwise cell-autonomous TF. Surprisingly, the functionality of related motifs from Dof members across the plant kingdom extended to a unicellular alga that lacks plasmodesmata. By contrast, the algal homeodomain related to the NCA KNOX homeodomain was either inefficient or unable to impart such cell-to-cell movement function. · The Dof ITM appears to predate the evolution of selective plasmodesmal trafficking in the plant kingdom and may well have acted as a molecular template for the evolution of Dof proteins as NCA TFs. However, KNOX homeodomain (HD) proteins may have acquired the ability to traffic efficiently during the evolution of early nonvascular plants.

  9. Interpreting Frequency Responses to Dose-Conserved Pulsatile Input Signals in Simple Cell Signaling Motifs

    PubMed Central

    Fletcher, Patrick A.; Clément, Frédérique; Vidal, Alexandre; Tabak, Joel; Bertram, Richard

    2014-01-01

    Many hormones are released in pulsatile patterns. This pattern can be modified, for instance by changing pulse frequency, to encode relevant physiological information. Often other properties of the pulse pattern will also change with frequency. How do signaling pathways of cells targeted by these hormones respond to different input patterns? In this study, we examine how a given dose of hormone can induce different outputs from the target system, depending on how this dose is distributed in time. We use simple mathematical models of feedforward signaling motifs to understand how the properties of the target system give rise to preferences in input pulse pattern. We frame these problems in terms of frequency responses to pulsatile inputs, where the amplitude or duration of the pulses is varied along with frequency to conserve input dose. We find that the form of the nonlinearity in the steady state input-output function of the system predicts the optimal input pattern. It does so by selecting an optimal input signal amplitude. Our results predict the behavior of common signaling motifs such as receptor binding with dimerization, and protein phosphorylation. The findings have implications for experiments aimed at studying the frequency response to pulsatile inputs, as well as for understanding how pulsatile patterns drive biological responses via feedforward signaling pathways. PMID:24748217
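The statement that the shape of the steady-state nonlinearity selects an optimal pulse amplitude can be sketched numerically. This is a minimal illustration, not the authors' feedforward models. It assumes a target that responds instantly (quasi-steady state), square pulses, and a Hill-type input-output function whose parameters are hypothetical. Conserving dose fixes the duty fraction at dose/amplitude, so the mean output becomes dose · f(A)/A.

```python
import numpy as np

def hill(u, n, k=1.0):
    """Assumed steady-state input-output function of the target system."""
    return u**n / (k**n + u**n)

def mean_output(amplitudes, dose, n, k=1.0):
    """Time-averaged output of square pulse trains that all deliver the same dose.

    Quasi-steady-state assumption: the output equals hill(A) during a pulse
    and 0 between pulses. Holding the mean input (dose) fixed forces the
    fraction of time 'on' to be dose / A, so mean output = dose * hill(A) / A."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return (dose / amplitudes) * hill(amplitudes, n, k)

def best_amplitude(dose, n, k=1.0, max_factor=50.0, points=20000):
    """Grid search over amplitudes >= dose (duty fraction <= 1)."""
    amps = np.linspace(dose, max_factor * dose, points)
    return amps[int(np.argmax(mean_output(amps, dose, n, k)))]

for n in (1, 2, 4):
    print(n, round(float(best_amplitude(0.1, n)), 2))
```

With a concave response (n = 1) the best "pulse" is the lowest amplitude, that is, continuous input. With a sigmoidal response (n > 1), f(A)/A peaks at A = K(n − 1)^(1/n), so pulses of that amplitude are preferred.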

  10. The Elongin BC complex interacts with the conserved SOCS-box motif present in members of the SOCS, ras, WD-40 repeat, and ankyrin repeat families

    PubMed Central

    Kamura, Takumi; Sato, Shigeo; Haque, Dewan; Liu, Li; Kaelin, William G.; Conaway, Ronald C.; Conaway, Joan Weliky

    1998-01-01

    The Elongin BC complex was identified initially as a positive regulator of RNA polymerase II (Pol II) elongation factor Elongin A and subsequently as a component of the multiprotein von Hippel-Lindau (VHL) tumor suppressor complex, in which it participates in both tumor suppression and negative regulation of hypoxia-inducible genes. Elongin B is a ubiquitin-like protein, and Elongin C is a Skp1-like protein that binds to a BC-box motif that is present in both Elongin A and VHL and is distinct from the conserved F-box motif recognized by Skp1. In this report, we demonstrate that the Elongin BC complex also binds to a functional BC box present in the SOCS box, a sequence motif identified recently in the suppressor of cytokine signaling-1 (SOCS-1) protein, as well as in a collection of additional proteins belonging to the SOCS, ras, WD-40 repeat, SPRY domain, and ankyrin repeat families. In addition, we present evidence (1) that the Elongin BC complex is a component of a multiprotein SOCS-1 complex that attenuates Jak/STAT signaling by binding to Jak2 and inhibiting Jak2 kinase, and (2) that by interacting with the SOCS box, the Elongin BC complex can increase expression of the SOCS-1 protein by inhibiting its degradation. These results suggest that Elongin BC is a multifunctional regulatory complex capable of controlling multiple pathways in the cell through interaction with a short degenerate sequence motif found in many different proteins. PMID:9869640

  11. Widespread position-specific conservation of synonymous rare codons within coding sequences

    PubMed Central

    Steele, Aaron; Carmichael, Rory; Rodriguez, Anabel; Specht, Alicia T.; Ngo, Kim; Emrich, Scott

    2017-01-01

    Synonymous rare codons are considered to be sub-optimal for gene expression because they are translated more slowly than common codons. Yet, surprisingly, many protein coding sequences include large clusters of synonymous rare codons. Rare codons at the 5’ terminus of coding sequences have been shown to increase translational efficiency. Although a general functional role for synonymous rare codons farther within coding sequences has not yet been established, several recent reports have identified rare-to-common synonymous codon substitutions that impair folding of the encoded protein. Here we test the hypothesis that, although the usage frequencies of synonymous codons change from organism to organism, codon rarity will be conserved at specific positions in a set of homologous coding sequences, for example, to tune translation rate without altering a protein sequence. Such conservation of rarity, rather than of specific codon identity, could coordinate co-translational folding of the encoded protein. We demonstrate that many rare codon cluster positions are indeed conserved within homologous coding sequences across diverse eukaryotic, bacterial, and archaeal species, suggesting they result from positive selection and have a functional role. Most conserved rare codon clusters occur within rather than between conserved protein domains, challenging the view that their primary function is to facilitate co-translational folding after synthesis of an autonomous structural unit. Instead, many conserved rare codon clusters separate smaller protein structural motifs within structural domains. These smaller motifs typically fold faster than an entire domain, on a time scale more consistent with translation rate modulation by synonymous codon usage. While proteins with conserved rare codon clusters are structurally and functionally diverse, they are enriched in functions associated with organism growth and development, suggesting an important role for synonymous codon usage in
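The central idea of conserving rarity rather than codon identity can be made concrete with a minimal sketch. It assumes codon-aligned homologs (gaps written as '---'), one gene set per organism for estimating synonymous codon usage, and the standard genetic code. The rarity threshold `rare_below` and the conservation cutoff `min_fraction` are hypothetical parameters, not values from the study.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codons(seq):
    """Split an in-frame sequence into consecutive triplets."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]

def synonymous_usage(genes):
    """Relative usage of each codon among its synonyms, pooled over one organism."""
    counts = Counter(c for g in genes for c in codons(g) if c in GENETIC_CODE)
    per_aa = defaultdict(int)
    for c, n in counts.items():
        per_aa[GENETIC_CODE[c]] += n
    return {c: n / per_aa[GENETIC_CODE[c]] for c, n in counts.items()}

def conserved_rare_positions(aligned, usages, rare_below=0.2, min_fraction=0.75):
    """Alignment columns where most homologs use a codon that is rare in
    *their own* organism, whichever synonymous codon that happens to be."""
    rows = [codons(s) for s in aligned]
    hits = []
    for col in range(len(rows[0])):
        calls = [usage.get(row[col], 0.0) < rare_below
                 for row, usage in zip(rows, usages)
                 if row[col] in GENETIC_CODE]          # skip gaps such as '---'
        if calls and sum(calls) / len(calls) >= min_fraction:
            hits.append(col)
    return hits

usage_a = synonymous_usage(["CTG" * 9 + "CTA"])   # CTA is rare in organism A
usage_b = synonymous_usage(["CTA" * 9 + "CTG"])   # CTG is rare in organism B
print(conserved_rare_positions(["CTGCTA", "CTACTG"], [usage_a, usage_b]))  # -> [1]
```

Column 1 is reported even though the two organisms use different leucine codons there, because each codon is rare in its own genome.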

  12. Sequence motifs associated with paternal transmission of mitochondrial DNA in the horse mussel, Modiolus modiolus (Bivalvia: Mytilidae).

    PubMed

    Robicheau, Brent M; Breton, Sophie; Stewart, Donald T

    2017-03-20

    In the majority of metazoans, paternal mitochondria represent evolutionary dead-ends. In many bivalves, however, this paradigm does not hold true; both maternal and paternal mitochondria are inherited. Herein, we characterize the maternal and paternal mitochondrial control regions of the horse mussel, Modiolus modiolus (Bivalvia: Mytilidae). The maternal control region is 808 bp long, while the paternal control region is longer, at 2.3 kb. We hypothesize that the size difference is due to a combination of repeated duplications within the control region of the paternal mtDNA genome and an evolutionarily ancient recombination event between two sex-associated mtDNA genomes that led to the insertion of a second control region sequence into the genome that is now transmitted via males. By comparison with other mytilid male control regions, we identified two evolutionarily Conserved Motifs, CMA and CMB, associated with paternal transmission of mitochondrial DNA. CMA is characterized by a conserved purine/pyrimidine pattern, while CMB exhibits a specific 13 bp nucleotide string within a stem-loop structure. The identification of motifs CMA and CMB in M. modiolus extends our understanding of Sperm Transmission Elements (STEs), which have recently been identified as being associated with the paternal transmission of mitochondria in marine bivalves. Copyright © 2017 Elsevier B.V. All rights reserved.
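A motif defined by a conserved purine/pyrimidine pattern (like CMA) can be searched for by first recoding the DNA into R (purine: A, G) and Y (pyrimidine: C, T). This minimal sketch uses a hypothetical pattern "RRYYRY". It is not the actual CMA pattern, which the abstract does not give.

```python
import re

PURINES, PYRIMIDINES = set("AG"), set("CT")

def to_ry(seq):
    """Recode DNA as purine (R) / pyrimidine (Y); any other symbol becomes N."""
    return "".join("R" if b in PURINES else "Y" if b in PYRIMIDINES else "N"
                   for b in seq.upper())

def find_ry_pattern(seq, pattern):
    """0-based starts of (possibly overlapping) matches of an R/Y pattern."""
    ry = to_ry(seq)
    return [m.start() for m in re.finditer("(?=" + re.escape(pattern) + ")", ry)]

# Two different nucleotide strings share the same hypothetical R/Y pattern.
print(find_ry_pattern("AGCTAC", "RRYYRY"), find_ry_pattern("GACTGT", "RRYYRY"))
```

The zero-width lookahead lets overlapping occurrences be reported. Because the sequences differ at the nucleotide level yet both match, the example shows why an R/Y pattern can remain conserved while the sequence itself drifts.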

  13. Species-Specific Minimal Sequence Motif for Oligodeoxyribonucleotides Activating Mouse TLR9.

    PubMed

    Pohar, Jelka; Lainšček, Duško; Fukui, Ryutaro; Yamamoto, Chikako; Miyake, Kensuke; Jerala, Roman; Benčina, Mojca

    2015-11-01

    Synthetic oligodeoxyribonucleotides (ODNs) containing unmethylated CpG recapitulate the activation of TLR9 by microbial DNA. ODNs are potent stimulators of the immune response in cells expressing TLR9. Despite extensive use of mice as experimental animals in basic and applied immunological research, the key sequence determinants that govern the activation of mouse TLR9 by ODNs have not been well defined. We performed a systematic investigation of the sequence motif of B class phosphodiester ODNs to identify the sequence properties that govern mouse TLR9 activation. In contrast to human TLR9, for which the minimal sequence motif for receptor activation comprises a pair of closely positioned CpGs, we found that mouse TLR9 requires a single CpG positioned 4-6 nt from the 5'-end. Activation is augmented by a 5'-TCC sequence one to three nucleotides from the CG. The distance of the CG dinucleotide from the 5'-end (four to six nucleotides) and the ODN's length fine-tune activation of mouse macrophages. ODN lengths below 23 nt or above 29 nt decrease activation of dendritic cells. The ODNs with the minimal sequence induce Th1-type cytokine synthesis in dendritic cells and confirm the expression of cell surface markers in B cells. Identification of the minimal sequence provides insight into the sequence selectivity of mouse TLR9 and points to differences in receptor selectivity between species, probably as a result of differences in the receptor binding sites.
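The reported rules can be turned into a simple screening function for candidate ODNs: a CpG near the 5'-end, an optional nearby TCC, and a 23-29 nt length. This is a minimal sketch only. How the distances are counted is an assumption made here and may not match the authors' convention. The sketch takes "4-6 nt from the 5'-end" to mean that 4-6 nucleotides lie 5' of the C of the CG, and "one to three nucleotides from the CG" to mean that 1-3 nucleotides separate the TCC from that C. Both readings are exposed as parameters.

```python
def check_mouse_tlr9_odn(odn, cg_offsets=range(4, 7), tcc_gaps=range(1, 4),
                         length_range=(23, 29)):
    """Flag sequence features reported for mouse TLR9 activation.

    Assumed (hypothetical) interpretation of the distances:
    - cg_offsets: number of nucleotides 5' of the C in the CG dinucleotide;
    - tcc_gaps: nucleotides between the end of a TCC and that C."""
    odn = odn.upper()
    cg_hits = [i for i in cg_offsets if odn[i:i + 2] == "CG"]
    tcc_hits = [i for i in cg_hits for gap in tcc_gaps
                if i - gap - 3 >= 0 and odn[i - gap - 3:i - gap] == "TCC"]
    return {
        "length_ok": length_range[0] <= len(odn) <= length_range[1],
        "cg_near_5prime": bool(cg_hits),
        "tcc_augmented": bool(tcc_hits),
    }

print(check_mouse_tlr9_odn("TCCTGACGTT" + "T" * 15))  # hypothetical 25-mer
```

This is a rule-based filter, not a predictor of activation strength. It only reports which of the abstract's features a sequence carries under the stated interpretation.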

  14. Using machine learning to predict gene expression and discover sequence motifs

    NASA Astrophysics Data System (ADS)

    Li, Xuejing

    Recently, large amounts of experimental data for complex biological systems have become available. We use tools and algorithms from machine learning to build data-driven predictive models. We first present a novel algorithm to discover gene sequence motifs associated with temporal expression patterns of genes. Our algorithm, which is based on partial least squares (PLS) regression, directly models the flow of information from gene sequence to gene expression, learning cis-regulatory motifs and characterizing the associated gene expression patterns. Our algorithm outperforms traditional computational methods for motif discovery, such as clustering. We then present a study extending a machine learning model of transcriptional regulation, which predicts genetic regulatory responses, to Caenorhabditis elegans. We show meaningful results both in terms of prediction accuracy on the test experiments and in terms of biological information extracted from the regulatory program. The model discovers DNA binding sites ab initio. We also present a case study in which we detect a signal of lineage-specific regulation. Finally, we present a comparative study on learning predictive models for motif discovery based on different boosting algorithms: Adaptive Boosting (AdaBoost), Linear Programming Boosting (LPBoost) and Totally Corrective Boosting (TotalBoost). We evaluate and compare the performance of the three boosting algorithms via both statistical and biological validation for the hypoxia response in Saccharomyces cerevisiae.
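To illustrate how boosting can surface motifs, here is a minimal sketch of textbook discrete AdaBoost with one-feature decision stumps over k-mer presence features. When the stumps are read as motif indicators, the k-mers chosen with high weight act as candidate motifs. This is a generic illustration, not the model used in this work, and it does not implement LPBoost or TotalBoost. The training sequences and labels are hypothetical.

```python
import math
from itertools import product

def kmer_features(seqs, k):
    """Binary presence/absence of every DNA k-mer in each sequence."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return kmers, [[int(km in s) for km in kmers] for s in seqs]

def stump(x, j, pol):
    """Predict pol if feature j is present, -pol otherwise."""
    return pol * (1 if x[j] else -1)

def adaboost_stumps(X, y, rounds=10):
    """Discrete AdaBoost; labels y are +1/-1. Returns [(alpha, feature, polarity)]."""
    n, m = len(X), len(X[0])
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error.
        err, j, pol = min((sum(wi for wi, xi, yi in zip(w, X, y)
                               if stump(xi, j, pol) != yi), j, pol)
                          for j in range(m) for pol in (1, -1))
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, pol))
        # Re-weight: misclassified examples gain weight, then normalise.
        w = [wi * math.exp(-alpha * yi * stump(xi, j, pol))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict_label(model, x):
    score = sum(alpha * stump(x, j, pol) for alpha, j, pol in model)
    return 1 if score >= 0 else -1

seqs = ["TTACGTT", "GACGAA", "CCACGC", "TTTTTT", "GGGCCC", "ATATAT"]
y = [1, 1, 1, -1, -1, -1]
kmers, X = kmer_features(seqs, 3)
model = adaboost_stumps(X, y, rounds=3)
print(kmers[model[0][1]])
```

On this toy data the first stump selects the 3-mer shared by all positive sequences. Here it is "ACG", because ties in the weighted error are broken by feature order.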

  15. Unique sequence features of the Human Adenovirus 31 complete genomic sequence are conserved in clinical isolates

    PubMed Central

    2009-01-01

    Background Human adenoviruses (HAdV) cause a broad spectrum of diseases. One of the most severe forms of adenovirus infection is a disseminated disease resulting in significant morbidity and mortality. Several reports in recent years have identified HAdV-31 from species A (HAdV-A31) as a cause of disseminated disease in children following haematopoietic stem cell transplantation (hSCT) and liver transplantation. We sequenced and analyzed the complete genome of the HAdV-A31 prototype strain to uncover unique sequence motifs associated with its high virulence. Moreover, we sequenced coding regions known to be essential for tropism and virulence (early transcription units E1A, E3, E4, the fiber knob and the penton base) of HAdV-A31 clinical isolates from patients with disseminated disease. Results The HAdV-A31 genome is 33,763 base pairs (bp) long, with a GC content of 46.36%. Nucleotide alignment to the closely related HAdV-A12 revealed an overall homology of 84.2%. The genome organization into early, intermediate and late regions is similar to that of HAdV-A12. Sequence analysis of the prototype strain showed unique sequence features, such as an immunoglobulin-like domain in the species A-specific gene product E3 CR1 beta and a potentially integrin-binding RGD motif in the C-terminal region of protein IX. These features were conserved in all analyzed clinical isolates. Overall, amino acid sequences of clinical isolates were highly conserved compared to the prototype (99.2 to 100%), but a synonymous/non-synonymous ratio (S/N) of 2.36 in E3 CR1 beta suggested positive selection. Conclusion Unique sequence features of HAdV-A31 may enhance its ability to escape the host's immune surveillance and may facilitate a promiscuous tropism for various tissues. Moderate evolution of clinical isolates did not indicate the emergence of new HAdV-A31 subtypes in recent years. PMID:19939241

  16. Phylogenomics-guided discovery of a novel conserved cassette of short linear motifs in BubR1 essential for the spindle checkpoint

    PubMed Central

    Bade, Debora

    2016-01-01

    The spindle assembly checkpoint (SAC) maintains genomic integrity by preventing progression of mitotic cell division until all chromosomes are stably attached to spindle microtubules. The SAC critically relies on the paralogues Bub1 and BubR1/Mad3, which integrate kinetochore–spindle attachment status with generation of the anaphase inhibitory complex MCC. We previously reported on the widespread occurrence of independent gene duplications of an ancestral ‘MadBub’ gene in eukaryotic evolution and the striking parallel subfunctionalization that led to loss of kinase function in BubR1/Mad3-like paralogues. Here, we present an elaborate subfunctionalization analysis of the Bub1/BubR1 gene family and perform de novo sequence discovery in a comparative phylogenomics framework to trace the distribution of ancestral sequence features to extant paralogues throughout the eukaryotic tree of life. We show that known ancestral sequence features are consistently retained in the same functional paralogue: GLEBS/CMI/CDII/kinase in the Bub1-like and KEN1/KEN2/D-Box in the BubR1/Mad3-like. The recently described ABBA motif can be found in either or both paralogues. However, we discovered two additional ABBA motifs that flank KEN2. This ABBA1-KEN2-ABBA2 cassette forms a strictly conserved module in all ancestral and BubR1/Mad3-like proteins, suggestive of a specific and crucial SAC function. Indeed, deletion of the ABBA motifs in human BUBR1 abrogates the SAC and affects APC/C–Cdc20 interactions. Our detailed comparative genomics analyses thus enabled discovery of a conserved cassette of motifs essential for the SAC and show how this approach can be used to uncover hitherto unrecognized functional protein features. PMID:28003474

  17. Pool sequencing of natural HLA-DR, DQ, and DP ligands reveals detailed peptide motifs, constraints of processing, and general rules.

    PubMed

    Falk, K; Rötzschke, O; Stevanović, S; Jung, G; Rammensee, H G

    1994-01-01

    We have approached the problem of MHC class II ligand motifs by pool sequencing natural peptides eluted from HLA-DR, DQ, and DP molecules. The results indicate surprisingly clear patterns, although not quite as clear as those of natural class I ligands. The most striking feature is a highly dominant proline at position 2. We interpret this to be a consequence of aminopeptidase N-like activity in processing. Another general aspect is the existence of three to four hydrophobic or aromatic anchors, whereby the first and the last are separated by five to eight residues. The peptide motifs for HLA-DR1, DR5, DQ7, and DPw4 are allele-specific and differ in the spacing and occupancy of anchors. The anchors tend to be flanked by clusters of charged residues, and small residues, especially Ala, are frequent in the motif centers. These detailed motifs allow most previous (DR-) motifs to be interpreted as fitting one or more of the anchors or conserved clusters. The relative symmetry of the motifs suggests the possibility of bidirectional binding of peptides in the class II groove.

  18. Caraparu virus (group C Orthobunyavirus): sequencing and phylogenetic analysis based on the conserved region 3 of the RNA polymerase gene.

    PubMed

    de Brito Magalhães, Cintia Lopes; Quinan, Bárbara Resende; Novaes, Renata Franco Vianna; dos Santos, João Rodrigues; Kroon, Erna Geessien; Bonjardim, Cláudio Antônio; Ferreira, Paulo César Peregrino

    2007-12-01

    Here, for the first time, we report the nucleotide sequence of the Caraparu virus (CARV) L segment and the analysis of RNA polymerase region 3 encoded by this segment. The 1,404 bp nucleotide sequence shares the highest identity with Bunyamwera, La Crosse, Oropouche, and Akabane virus sequences. The deduced amino acid sequence was aligned with sequences from members of the Bunyaviridae family and used for phylogenetic analysis. CARV clustered in the Orthobunyavirus genus. Premotif A and motifs A-E, which are present in region 3 across the Bunyaviridae family, were also conserved in the CARV L protein, as were other regions conserved within the Orthobunyavirus genus.

  19. A Convex Atomic-Norm Approach to Multiple Sequence Alignment and Motif Discovery

    PubMed Central

    Yen, Ian E. H.; Lin, Xin; Zhang, Jiong; Ravikumar, Pradeep; Dhillon, Inderjit S.

    2016-01-01

    Multiple Sequence Alignment and Motif Discovery, both known to be NP-hard, are two fundamental tasks in Bioinformatics. Existing approaches to these two problems are based either on local search methods, such as Expectation Maximization (EM) and Gibbs Sampling, or on greedy heuristic methods. In this work, we develop a convex relaxation approach to both problems based on the recent concept of the atomic norm, and we develop a new algorithm, termed Greedy Direction Method of Multiplier, for solving the convex relaxation with two convex atomic constraints. Experiments show that our convex relaxation approach produces higher-quality solutions than standard tools widely used in the Bioinformatics community on the Multiple Sequence Alignment and Motif Discovery problems. PMID:27559428

  20. Functional importance of GGXG sequence motifs in putative reentrant loops of 2HCT and ESS transport proteins.

    PubMed

    Dobrowolski, Adam; Lolkema, Juke S

    2009-08-11

    The 2HCT and ESS families are two families of secondary transporters. Members of the two families are unrelated in amino acid sequence but share similar hydropathy profiles, which suggest a similar folding of the proteins in membranes. Structural models show two homologous domains containing five transmembrane segments (TMSs) each, with a reentrant or pore loop between the fourth and fifth TMSs in each domain. Here we show that GGXG sequence motifs present in the putative reentrant loops are important for the activity of the transporters. Mutation of the conserved Gly residues to Cys in the motifs of the Na(+)-citrate transporter CitS in the 2HCT family and the Na(+)-glutamate transporter GltS in the ESS family resulted in strongly reduced transport activity. Similarly, mutation of the variable residue "X" to Cys in the N-terminal half of GltS essentially inactivated the transporter. The corresponding mutations in the N- and C-terminal halves of CitS reduced transport activity to 60 and 25% of that of the wild type, respectively. Residual activity of any of the mutants could be further reduced by treatment with the membrane permeable thiol reagent N-ethylmaleimide (NEM). The X to Cys mutation (S405C) in the cytoplasmic loop in the C-terminal half of CitS rendered the protein sensitive to the bulky, membrane impermeable thiol reagent 4-acetamido-4'-maleimidylstilbene-2,2'-disulfonic acid (AmdiS) added at the periplasmic side of the membrane, providing further evidence that this part of the loop is positioned between the transmembrane segments. The putative reentrant loop in the C-terminal half of the ESS family does not contain the GGXG motif, but a conserved stretch rich in Gly residues. Cysteine-scanning mutagenesis of a stretch of 18 residues in the GltS protein revealed two residues important for function. Mutant N356C was completely inactivated by treatment with NEM, and mutant P351C appeared to be the counterpart of mutant S405C of CitS; the mutant was

  1. 'Size leap' algorithm: an efficient extraction of the longest common motifs from a molecular sequence set. Application to the DNA sequence reconstruction.

    PubMed

    Danckaert, A; Chappey, C; Hazout, S

    1991-10-01

    We propose a new method, called the 'size leap' algorithm, for searching for motifs of maximum size that are common to at least two fragments. It allows the creation of a reduced database of motifs from a set of sequences whose size obeys the series of Fibonacci numbers. Its advantage lies in the efficiency of motif extraction. It can be applied to establish overlap regions for DNA sequence reconstruction and to the multiple alignment of biological sequences. The method of complete DNA sequence reconstruction by extraction of the longest motifs ('anchor motifs') is presented as an application of the size leap algorithm. The details of a reconstruction from three sequenced fragments are given as an example.
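The underlying task can be sketched as follows: find the longest motifs that occur in at least two fragments. This sketch is not the size-leap algorithm itself, because the abstract does not spell out its Fibonacci-based size schedule. It substitutes a plain binary search over motif length. The search is valid because, whenever a motif of length L is shared by two fragments, its length-(L−1) prefix is shared too. The example fragments are hypothetical.

```python
def kmers_by_fragment(fragments, k):
    """Map each length-k substring to the set of fragment indices containing it."""
    seen = {}
    for idx, frag in enumerate(fragments):
        for i in range(len(frag) - k + 1):
            seen.setdefault(frag[i:i + k], set()).add(idx)
    return seen

def shared_motifs(fragments, k):
    """Length-k substrings present in at least two different fragments."""
    return sorted(m for m, owners in kmers_by_fragment(fragments, k).items()
                  if len(owners) >= 2)

def longest_common_motifs(fragments):
    """Binary search for the largest k with a motif shared by >= 2 fragments."""
    lo, hi = 0, max(len(f) for f in fragments)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if shared_motifs(fragments, mid):
            lo = mid          # a shared motif of this length exists: go longer
        else:
            hi = mid - 1      # none exists: every longer length also fails
    return (lo, shared_motifs(fragments, lo)) if lo > 0 else (0, [])

print(longest_common_motifs(["ACGTTGCA", "TTGCAGGA", "CCCC"]))
```

In a reconstruction setting, a long shared motif that is a suffix of one fragment and a prefix of another marks a candidate overlap ("anchor") for joining them.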

  2. The tungsten formylmethanofuran dehydrogenase from Methanobacterium thermoautotrophicum contains sequence motifs characteristic for enzymes containing molybdopterin dinucleotide.

    PubMed

    Hochheimer, A; Schmitz, R A; Thauer, R K; Hedderich, R

    1995-12-15

    Formylmethanofuran dehydrogenases are molybdenum or tungsten iron-sulfur proteins containing a pterin dinucleotide cofactor. We report here on the primary structures of the four subunits FwdABCD of the tungsten enzyme from Methanobacterium thermoautotrophicum, which were determined by cloning and sequencing the encoding genes fwdABCD. FwdB was found to contain sequence motifs characteristic of molybdopterin-dinucleotide-containing enzymes, indicating that this subunit harbors the active site. FwdA, FwdC and FwdD showed no significant sequence similarity to proteins in the databases. Northern blot analysis revealed that the four fwd genes form a transcription unit together with three additional genes designated fwdE, fwdF and fwdG. A 17.8-kDa protein and an 8.6-kDa protein, both containing two [4Fe-4S] cluster binding motifs, were deduced from fwdE and fwdG. The open reading frame fwdF encodes a 38.6-kDa protein containing eight binding motifs for [4Fe-4S] clusters, suggesting the gene product to be a novel polyferredoxin. All seven fwd genes were expressed in Escherichia coli, yielding proteins of the expected size. The fwd operon was found to be located in a region of the M. thermoautotrophicum genome encoding molybdenum enzymes and proteins involved in molybdopterin biosynthesis.
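[4Fe-4S] cluster binding motifs, as counted in the fwdE, fwdF and fwdG products, are usually located by scanning for a cysteine-spaced pattern. This minimal sketch assumes the PROSITE-style ferredoxin signature C-x(2)-C-x(2)-C-x(3)-C-[PEG], written as a regular expression. The motif definition used in the paper may differ, and the test protein string is hypothetical.

```python
import re

# Assumed ferredoxin-type [4Fe-4S] signature: C-x(2)-C-x(2)-C-x(3)-C-[PEG].
# The lookahead lets overlapping occurrences be reported.
FE4S4_MOTIF = re.compile(r"(?=(C..C..C...C[PEG]))")

def find_fe4s4_motifs(protein):
    """Return (0-based start, matched segment) for each candidate binding motif."""
    return [(m.start(), m.group(1)) for m in FE4S4_MOTIF.finditer(protein.upper())]

print(find_fe4s4_motifs("MKCAACAACAAACPLL" * 2))
```

A count from such a scan, for example eight hits in a putative polyferredoxin, only flags candidate sites. Whether each one actually binds a cluster requires experimental confirmation.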

  3. Conserved Intramolecular Interactions Maintain Myosin Interacting-Heads Motifs Explaining Tarantula Muscle Super-Relaxed State Structural Basis.

    PubMed

    Alamo, Lorenzo; Qi, Dan; Wriggers, Willy; Pinto, Antonio; Zhu, Jingui; Bilbao, Aivett; Gillilan, Richard E; Hu, Songnian; Padrón, Raúl

    2016-03-27

    Tarantula striated muscle is an outstanding system for understanding the molecular organization of myosin filaments. Three-dimensional reconstruction based on cryo-electron microscopy images and single-particle image processing revealed that, in a relaxed state, myosin molecules undergo intramolecular head-head interactions, explaining why head activity switches off. The filament model obtained by rigidly docking a chicken smooth muscle myosin structure to the reconstruction was improved by flexibly fitting an atomic model built by mixing structures from different species to a tilt-corrected 2-nm three-dimensional map of frozen-hydrated tarantula thick filament. We used heavy and light chain sequences from tarantula myosin to build a single-species homology model of two heavy meromyosin interacting-heads motifs (IHMs). The flexibly fitted model includes previously missing loops and shows five intramolecular and five intermolecular interactions that keep the IHM in a compact off structure, forming four helical tracks of IHMs around the backbone. The residues involved in these interactions are oppositely charged, and their sequence conservation suggests that IHM is present across animal species. The new model, PDB 3JBH, explains the structural origin of the ATP turnover rates detected in relaxed tarantula muscle by ascribing the very slow rate to docked unphosphorylated heads, the slow rate to phosphorylated docked heads, and the fast rate to phosphorylated undocked heads. The conservation of intramolecular interactions across animal species and the presence of IHM in bilaterians suggest that a super-relaxed state should be maintained, as it plays a role in saving ATP in skeletal, cardiac, and smooth muscles. Copyright © 2016 Elsevier Ltd. All rights reserved.

  4. A Conserved EAR Motif Is Required for Avirulence and Stability of the Ralstonia solanacearum Effector PopP2 In Planta.

    PubMed

    Segonzac, Cécile; Newman, Toby E; Choi, Sera; Jayaraman, Jay; Choi, Du Seok; Jung, Ga Young; Cho, Heejung; Lee, Young Kee; Sohn, Kee Hoon

    2017-01-01

    Ralstonia solanacearum is the causal agent of the devastating bacterial wilt disease in many high value Solanaceae crops. R. solanacearum secretes around 70 effectors into host cells in order to promote infection. Plants have, however, evolved specialized immune receptors that recognize corresponding effectors and confer qualitative disease resistance. In the model species Arabidopsis thaliana, the paired immune receptors RRS1 (resistance to Ralstonia solanacearum 1) and RPS4 (resistance to Pseudomonas syringae 4) cooperatively recognize the R. solanacearum effector PopP2 in the nuclei of infected cells. PopP2 is an acetyltransferase that binds to and acetylates the RRS1 WRKY DNA-binding domain, resulting in reduced RRS1-DNA association and thereby activating plant immunity. Here, we surveyed the naturally occurring variation in PopP2 sequence among the R. solanacearum strains isolated from diseased tomato and pepper fields across the Republic of Korea. Our analysis revealed high conservation of popP2 sequence with only three polymorphic alleles present amongst 17 strains. Only one variation (a premature stop codon) caused the loss of RPS4/RRS1-dependent recognition in Arabidopsis. We also found that PopP2 harbors a putative eukaryotic transcriptional repressor motif (ethylene-responsive element binding factor-associated amphiphilic repression or EAR), which is known to be involved in the recruitment of transcriptional co-repressors. Remarkably, mutation of the EAR motif disabled PopP2 avirulence function as measured by the development of hypersensitive response, electrolyte leakage, defense marker gene expression and bacterial growth in Arabidopsis. This lack of recognition was partially but significantly reverted by the C-terminal addition of a synthetic EAR motif. We show that the EAR motif-dependent gain of avirulence correlated with the stability of the PopP2 protein. Furthermore, we demonstrated the requirement of the PopP2 EAR motif for PTI suppression. A yeast

  5. Massive microRNA sequence conservation and prevalence in human and chimpanzee introns.

    PubMed

    Hill, Aubrey E; Sorscher, Eric J

    2013-06-01

    Human and chimpanzee introns contain numerous sequences strongly related to known microRNA hairpin structures. The relative frequency is precisely maintained across all chromosomes, suggesting the possible co-evolution of gene networks that depend on microRNA regulation and whose origins correspond to the advent of primate transposable elements (TEs). While the motifs are known to be derived from transposable elements, the most common are far more numerous than expected from the number of TEs and their paralogous sequences, and they exhibit striking conservation in comparison to the surrounding TE sequence context. Several of these motifs also exhibit structural complementarity to each other, suggesting a pairing function at the level of DNA or RNA. These "pseudomicroRNAs," by analogy with pseudogenes, include hundreds of thousands of vestigial paralogs of primate microRNAs, many of which may have functioned historically or remain active today.

  6. Polar residues in a conserved motif spanning helices 1 and 2 are functionally important in the SulP transporter family.

    PubMed

    Leves, Fiona P; Tierney, M Louise; Howitt, Susan M

    2008-01-01

    The SulP family (including the SLC26 family) is a diverse family of anion transporters found in all domains of life, with different members transporting different anions. We used sequence and bioinformatics analysis of helices 1 and 2 of SulP family members to identify a conserved motif, extending the previously defined 'sulfate transporter motif'. The analysis showed that in addition to being highly conserved in both sequence and spacing, helices 1 and 2 contain a significant number of polar residues and are predicted to be buried within the protein interior, with at least some faces packed closely against other helices. This suggests a significant functional role for this region and we tested this by mutating polar residues in helices 1 and 2 in the sulfate transporter, SHST1. All mutations made, even those removing only a single hydroxyl group, had significant effects on transport. Many mutations abolished transport without affecting plasma membrane expression of the mutant protein, suggesting a functional role for these residues. Different helical faces appear to have different roles, with the most severe effects being localised to two interacting faces of helices 1 and 2. Our results confirm the predicted importance of conserved polar residues in helices 1 and 2 and suggest that transport of sulfate by SHST1 is dependent on a network of polar and aromatic interactions between these two helices.
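Assigning residues to "helical faces" is commonly done with a helical wheel. An ideal α-helix has about 3.6 residues per turn, so each residue advances about 100° around the helix axis. This minimal sketch assumes that ideal geometry. It lists the positions of chosen residue types, here Ser/Thr as example polar residues, that fall within a hypothetical angular window. It is not the authors' analysis, and the sequence is made up.

```python
def helical_wheel_angle(position, start=1):
    """Angle (degrees) of a residue on an ideal alpha-helical wheel (100 deg/residue)."""
    return ((position - start) * 100) % 360

def residues_on_face(sequence, residues, center, half_width=50):
    """1-based positions of the given residue types whose wheel angle lies
    within +/- half_width degrees of `center` (window size is hypothetical)."""
    hits = []
    for pos, aa in enumerate(sequence, start=1):
        if aa in residues:
            # Signed angular difference folded into [-180, 180).
            delta = (helical_wheel_angle(pos) - center + 180) % 360 - 180
            if abs(delta) <= half_width:
                hits.append(pos)
    return hits

print(residues_on_face("SAASAAASAAS", "ST", center=0))
```

Residues i, i+4 and i+7 land close together on the wheel, which is why polar residues spaced that way are described as sharing one face. Real helices deviate from the ideal 100° step, so such assignments are approximate.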

  7. Conservation of Tubulin-Binding Sequences in TRPV1 throughout Evolution

    PubMed Central

    Sardar, Puspendu; Kumar, Abhishek; Bhandari, Anita; Goswami, Chandan

    2012-01-01

    Background Transient Receptor Potential Vanilloid subtype 1 (TRPV1), commonly known as the capsaicin receptor, can detect multiple stimuli, ranging from noxious compounds, low pH and temperature to electromagnetic waves in different ranges. In addition, this receptor is involved in multiple physiological and sensory processes. Therefore, the functions of TRPV1 directly influence adaptation and further evolution. The availability of various eukaryotic genomic sequences in the public domain enables us to study the molecular evolution of the TRPV1 protein and the respective conservation of certain domains, motifs and interacting regions that are functionally important. Methodology and Principal Findings Using statistical and bioinformatics tools, our analysis reveals that TRPV1 evolved about 420 million years ago (MYA). Our analysis reveals that specific regions, domains and motifs of TRPV1 have gone through different selection pressures and thus have different levels of conservation. We found that, among all of these, the TRP box is the most conserved and thus has functional significance. Our results also indicate that the tubulin binding sequences (TBS) have evolutionary significance, as these stretches are more conserved than many other essential regions of TRPV1. The overall distribution of positively charged residues within the TBS motifs is conserved throughout evolution. In silico analysis reveals that TBS-1 and TBS-2 of TRPV1 can form helical structures and may play an important role in TRPV1 function. Conclusions and Significance Our analysis identifies the regions of TRPV1 that are important for the structure-function relationship. This analysis indicates that tubulin binding sequence-1 (TBS-1) near the TRP box forms a potential helix and that tubulin interactions with TRPV1 via TBS-1 have evolutionary significance. This interaction may be required for proper channel function and regulation and may also have significance in the context of Taxol

  8. A dominant negative mutation in the conserved RNA helicase motif 'SAT' causes splicing factor PRP2 to stall in spliceosomes.

    PubMed Central

    Plumpton, M; McGarvey, M; Beggs, J D

    1994-01-01

    To characterize sequences in the RNA helicase-like PRP2 protein of Saccharomyces cerevisiae that are essential for its function in pre-mRNA splicing, a pool of random PRP2 mutants was generated. A dominant negative allele was isolated which, when overexpressed in a wild-type yeast strain, inhibited cell growth by causing a defect in pre-mRNA splicing. This defect was partially alleviated by simultaneous co-overexpression of wild-type PRP2. The dominant negative PRP2 protein inhibited splicing in vitro and caused the accumulation of stalled splicing complexes. Immunoprecipitation with anti-PRP2 antibodies confirmed that dominant negative PRP2 protein competed with its wild-type counterpart for interaction with spliceosomes, with which the mutant protein remained associated. The PRP2-dn1 mutation led to a single amino acid change within the conserved SAT motif that in the prototype helicase eIF-4A is required for RNA unwinding. Purified dominant negative PRP2 protein had approximately 40% of the wild-type level of RNA-stimulated ATPase activity. As ATPase activity was reduced only slightly, but splicing activity was abolished, we propose that the dominant negative phenotype is due primarily to a defect in the putative RNA helicase activity of PRP2 protein. PMID:8112301

  9. Evolutionary Conserved Motif Finder (ECMFinder) for genome-wide identification of clustered YY1- and CTCF-binding sites

    PubMed Central

    Kang, Keunsoo; Chung, Jae Hoon; Kim, Joomyeong

    2009-01-01

    We have developed a new bioinformatics approach called ECMFinder (Evolutionary Conserved Motif Finder). This program searches for a given DNA motif within the entire genome of one species and uses the gene association information of a potential transcription factor-binding site (TFBS) to screen the homologous regions of a second and third species. If multiple species have this potential TFBS in homologous positions, this program recognizes the identified TFBS as an evolutionary conserved motif (ECM). This program outputs a list of ECMs, which can be uploaded as a Custom Track in the UCSC genome browser and can be visualized along with other available data. The feasibility of this approach was tested by searching the genomes of three mammals (human, mouse and cow) with the DNA-binding motifs of YY1 and CTCF. This program successfully identified many clustered YY1- and CTCF-binding sites that are conserved among these species but were previously undetected. In particular, this program identified CTCF-binding sites that are located close to the Dlk1, Magel2 and Cdkn1c imprinted genes. Individual ChIP experiments confirmed the in vivo binding of the YY1 and CTCF proteins to most of these newly discovered binding sites, demonstrating the feasibility and usefulness of ECMFinder. PMID:19208640
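    The cross-species filtering idea described above can be illustrated with a minimal Python sketch. It is not ECMFinder's code: it assumes the homologous region of each gene has already been extracted for every species, and the motif string `GCCATCTT`, the gene names and the sequences are hypothetical placeholders. A gene is reported when the exact motif occurs, on either strand, in the homologous region of at least `min_species` species.

```python
from typing import Dict, List

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def has_motif(seq: str, motif: str) -> bool:
    """True if the exact motif occurs on either strand of seq."""
    s = seq.upper()
    return motif.upper() in s or revcomp(motif) in s


def conserved_hits(regions: Dict[str, Dict[str, str]], motif: str,
                   min_species: int) -> List[str]:
    """regions maps gene -> {species: homologous region sequence}.

    Returns the genes whose homologous regions carry the motif in at least
    min_species species (a hypothetical simplification of the ECM idea).
    """
    hits = []
    for gene, by_species in regions.items():
        n = sum(1 for seq in by_species.values() if has_motif(seq, motif))
        if n >= min_species:
            hits.append(gene)
    return sorted(hits)


# Hypothetical toy data: geneA carries the motif in all three species
# (on the reverse strand in cow); geneB carries it only in human.
regions = {
    "geneA": {"human": "TTGCCATCTTAA", "mouse": "AAGCCATCTTGG", "cow": "CCAAGATGGCTT"},
    "geneB": {"human": "TTGCCATCTTAA", "mouse": "ACGTACGT", "cow": "ACGTACGT"},
}
print(conserved_hits(regions, "GCCATCTT", min_species=3))  # ['geneA']
```

    Requiring the hit in every species (here `min_species=3`) mirrors the abstract's criterion of a TFBS found at homologous positions in multiple species; lowering the threshold trades specificity for sensitivity.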

  10. CyanoLyase: a database of phycobilin lyase sequences, motifs and functions

    PubMed Central

    Bretaudeau, Anthony; Coste, François; Humily, Florian; Garczarek, Laurence; Le Corguillé, Gildas; Six, Christophe; Ratin, Morgane; Collin, Olivier; Schluchter, Wendy M.; Partensky, Frédéric

    2013-01-01

    CyanoLyase (http://cyanolyase.genouest.org/) is a manually curated sequence and motif database of phycobilin lyases and related proteins. These enzymes catalyze the covalent ligation of chromophores (phycobilins) to specific binding sites of phycobiliproteins (PBPs). The latter constitute the building blocks of phycobilisomes, the major light-harvesting systems of cyanobacteria and red algae. Phycobilin lyase sequences are poorly annotated in public databases. Sequences included in CyanoLyase were retrieved from all available genomes of these organisms and a few others by similarity searches using biochemically characterized enzyme sequences and then classified into 3 clans and 32 families. Amino acid motifs were computed for each family using the Protomata learner. CyanoLyase also includes BLAST and a novel pattern matching tool (Protomatch) that allow users to rapidly retrieve and annotate lyases from any new genome. In addition, it provides phylogenetic analyses of all phycobilin lyase families, describes their function and their presence/absence in all genomes of the database (phyletic profiles), and predicts the chromophorylation of PBPs in each strain. The site also includes a thorough bibliography about phycobilin lyases and the genomes included in the database. This resource should be useful to scientists and companies interested in natural or artificial PBPs, which have a number of biotechnological applications, notably as fluorescent markers. PMID:23175607

  11. Peptide sequence motif analysis of tandem MS data with the SALSA algorithm.

    PubMed

    Liebler, Daniel C; Hansen, Beau T; Davey, Sean W; Tiscareno, Laura; Mason, Daniel E

    2002-01-01

    We have developed a pattern recognition algorithm called SALSA (scoring algorithm for spectral analysis) for the detection of specific features in tandem MS (MS-MS) spectra. Application of the SALSA algorithm to the detection of peptide MS-MS ion series enables identification of MS-MS spectra displaying characteristics of specific peptide sequences. SALSA analysis scores MS-MS spectra based on correspondence between theoretical ion series for peptide sequence motifs and actual MS-MS product ion series, regardless of their absolute positions on the m/z axis. Analyses of tryptic digests of bovine serum albumin (BSA) by LC-MS-MS followed by SALSA analysis detected MS-MS spectra for both unmodified and multiple modified forms of several BSA tryptic peptides. SALSA analysis of MS-MS data from mixtures of BSA and human serum albumin (HSA) tryptic digests indicated that ion series searches with BSA peptide sequence motifs identified MS-MS spectra for both BSA and closely related HSA peptides. Optimal discrimination between MS-MS spectra of variant peptide forms is achieved when the SALSA search criteria are optimized to the target peptide. Application of SALSA to LC-MS-MS proteome analysis will facilitate the characterization of modified and sequence variant proteins.
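    The position-independent matching described above can be sketched in Python. This is a simplified stand-in, not the published SALSA scoring function: it assumes a centroided peak list of singly charged fragment ions, a fixed 0.5 Da tolerance chosen only for illustration, and a hypothetical motif `LVE`. It counts how many consecutive motif residues appear as mass steps between peaks, trying every peak as the start of the series, so the absolute m/z of the series does not matter.

```python
from typing import Dict, List

# Monoisotopic amino acid residue masses in daltons (subset).
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "K": 128.09496, "E": 129.04259,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333,
}


def ladder_score(peaks: List[float], motif: str, tol: float = 0.5) -> int:
    """Longest run of motif residues found as successive mass steps between
    peaks, trying every peak as the starting point (so the absolute m/z
    position of the series does not matter). Assumes singly charged ions."""
    best = 0
    for start in peaks:
        mz = start
        matched = 0
        for residue in motif:
            mz += RESIDUE_MASS[residue]
            if any(abs(p - mz) <= tol for p in peaks):
                matched += 1
            else:
                break
        best = max(best, matched)
    return best


spectrum = [300.0, 413.084, 512.152, 641.195, 800.0]
shifted = [p + 200.0 for p in spectrum]
print(ladder_score(spectrum, "LVE"), ladder_score(shifted, "LVE"))  # 3 3
```

    Because the score depends only on mass differences, the same motif ladder is found whether it starts at 300 or 500 m/z, which is the property the abstract highlights for detecting modified and variant forms of a peptide.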

  12. A Sequence Motif within Trypanosome Precursor tRNAs Influences Abundance and Mitochondrial Localization

    PubMed Central

    Sherrer, R. Lynn; Yermovsky-Kammerer, Audra E.; Hajduk, Stephen L.

    2003-01-01

    Trypanosoma brucei lacks mitochondrial genes encoding tRNAs and must import nuclear-encoded tRNAs from the cytosol. The mechanism and specificity of this process remain unclear. We have identified a unique sequence motif, YGG(C/A)RRC, upstream of the genes encoding mitochondrially localized tRNAs in T. brucei. Both in vitro import studies and in vivo transfection studies indicate that deletion of the YGG(C/A)RRC sequence alters mitochondrial localization of tRNA(Leu), and in vivo studies also show a decrease in the cellular abundance of tRNA(Leu). These studies provide direct evidence for cis-acting RNA motifs within precursor tRNAs that facilitate the selection of tRNAs for mitochondrial import in trypanosomes. Furthermore, we found that mutations to the YGG(C/A)RRC sequence also altered the intracellular distribution of other endogenous tRNAs, suggesting a general role for this sequence in tRNA trafficking in trypanosomes. PMID:14645518
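    Scanning sequences upstream of tRNA genes for a degenerate motif such as YGG(C/A)RRC reduces to a regular-expression search once the IUPAC codes are expanded (Y = C or T, R = A or G). The Python sketch below assumes plain sequence strings and reports overlapping matches; the test sequence is made up and is not a T. brucei sequence.

```python
import re
from typing import List

# IUPAC nucleotide codes used in motif notation (subset).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def motif_to_regex(motif: str) -> str:
    """Convert e.g. 'YGG(C/A)RRC' to '[CT]GG[CA][AG][AG]C'."""
    parts = []
    i = 0
    while i < len(motif):
        if motif[i] == "(":
            # "(C/A)" lists explicit alternatives at one position.
            end = motif.index(")", i)
            options = motif[i + 1:end].split("/")
            parts.append("[" + "".join(options) + "]")
            i = end + 1
        else:
            parts.append(IUPAC[motif[i]])
            i += 1
    return "".join(parts)


def find_motif(seq: str, motif: str) -> List[int]:
    """0-based start positions of (possibly overlapping) matches; RNA U is read as T."""
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    dna = seq.upper().replace("U", "T")
    return [m.start() for m in pattern.finditer(dna)]


print(motif_to_regex("YGG(C/A)RRC"))             # [CT]GG[CA][AG][AG]C
print(find_motif("AATGGCAGCTT", "YGG(C/A)RRC"))  # [2]
```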

  13. A sequence motif within trypanosome precursor tRNAs influences abundance and mitochondrial localization.

    PubMed

    Sherrer, R Lynn; Yermovsky-Kammerer, Audra E; Hajduk, Stephen L

    2003-12-01

    Trypanosoma brucei lacks mitochondrial genes encoding tRNAs and must import nuclear-encoded tRNAs from the cytosol. The mechanism and specificity of this process remain unclear. We have identified a unique sequence motif, YGG(C/A)RRC, upstream of the genes encoding mitochondrially localized tRNAs in T. brucei. Both in vitro import studies and in vivo transfection studies indicate that deletion of the YGG(C/A)RRC sequence alters mitochondrial localization of tRNA(Leu), and in vivo studies also show a decrease in the cellular abundance of tRNA(Leu). These studies provide direct evidence for cis-acting RNA motifs within precursor tRNAs that facilitate the selection of tRNAs for mitochondrial import in trypanosomes. Furthermore, we found that mutations to the YGG(C/A)RRC sequence also altered the intracellular distribution of other endogenous tRNAs, suggesting a general role for this sequence in tRNA trafficking in trypanosomes.

  14. Multiple cellular proteins interact with LEDGF/p75 through a conserved unstructured consensus motif.

    PubMed

    Tesina, Petr; Čermáková, Kateřina; Hořejší, Magdalena; Procházková, Kateřina; Fábry, Milan; Sharma, Subhalakshmi; Christ, Frauke; Demeulemeester, Jonas; Debyser, Zeger; De Rijck, Jan; Veverka, Václav; Řezáčová, Pavlína

    2015-08-06

    Lens epithelium-derived growth factor (LEDGF/p75) is an epigenetic reader and attractive therapeutic target involved in HIV integration and the development of mixed lineage leukaemia (MLL1) fusion-driven leukaemia. Besides HIV integrase and the MLL1-menin complex, LEDGF/p75 interacts with various cellular proteins via its integrase binding domain (IBD). Here we present structural characterization of IBD interactions with transcriptional repressor JPO2 and domesticated transposase PogZ, and show that the PogZ interaction is nearly identical to the interaction of LEDGF/p75 with MLL1. The interaction with the IBD is maintained by an intrinsically disordered IBD-binding motif (IBM) common to all known cellular partners of LEDGF/p75. In addition, based on IBM conservation, we identify and validate IWS1 as a novel LEDGF/p75 interaction partner. Our results also reveal how HIV integrase efficiently displaces cellular binding partners from LEDGF/p75. Finally, the similar binding modes of LEDGF/p75 interaction partners represent a new challenge for the development of selective interaction inhibitors.

  15. Conserved Functional Motifs and Homology Modeling to Predict Hidden Moonlighting Functional Sites

    PubMed Central

    Wong, Aloysius; Gehring, Chris; Irving, Helen R.

    2015-01-01

    Moonlighting functional centers within proteins can provide them with hitherto unrecognized functions. Here, we review how hidden moonlighting functional centers, which we define as binding sites that have catalytic activity or regulate protein function in a novel manner, can be identified using targeted bioinformatic searches. The functional motifs used in such searches consist of amino acid residues that are conserved across species, many of which have been assigned functional roles based on experimental evidence. Molecules identified in this manner in searches for cyclic mononucleotide cyclases in plants are used as examples. The strength of this computational approach is enhanced when good homology models can be developed to test the functionality of the predicted centers in silico, which, in turn, increases confidence in the ability of the identified candidates to perform the predicted functions. Computational characterization of moonlighting functional centers is not diagnostic for catalysis but serves as a rapid screening method and highlights testable targets from a potentially large pool of candidates for the subsequent in vitro and in vivo experiments required to confirm the functionality of the predicted moonlighting centers. PMID:26106597
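    Targeted searches of this kind express a functional center as conserved residues separated by variable-length spacers and scan sequence databases with that pattern. A minimal Python sketch, assuming a simplified PROSITE-style notation; the pattern below is a hypothetical example, not one of the published cyclase motifs, and exclusion brackets such as {P} are deliberately not supported.

```python
import re
from typing import List, Tuple

# One pattern element: a residue, a [..] alternative or x, optionally (n) or (n,m).
ELEMENT = re.compile(r"(\[[A-Z]+\]|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex."""
    out = []
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element}")
        core, low, high = m.groups()
        if core == "x":
            core = "."  # any residue
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        out.append(core)
    return "".join(out)


def scan(protein: str, pattern: str) -> List[Tuple[int, str]]:
    """(0-based start, matched segment) for each non-overlapping match."""
    rx = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in rx.finditer(protein)]


pattern = "[RK]-x-[DE]-x(2,4)-[KR]"  # hypothetical motif, for illustration only
print(prosite_to_regex(pattern))      # [RK].[DE].{2,4}[KR]
print(scan("MARADGGGKLL", pattern))   # [(2, 'RADGGGK')]
```

    A hit from such a scan is only a candidate; as the abstract notes, homology modeling and experiments are still needed to decide whether the predicted center is functional.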

  16. Novel selenoproteins identified in silico and in vivo by using a conserved RNA structural motif.

    PubMed

    Lescure, A; Gautheret, D; Carbon, P; Krol, A

    1999-12-31

    Selenocysteine is incorporated into selenoproteins by an in-frame UGA codon whose readthrough requires the selenocysteine insertion sequence (SECIS), a conserved hairpin in the 3'-untranslated region of eukaryotic selenoprotein mRNAs. To identify new selenoproteins, we developed a strategy that obviates the need for prior amino acid sequence information. A computational screen was used to scan nucleotide sequence data bases for sequences presenting a potential SECIS secondary structure. The computer-selected hairpins were then assayed in vivo for their functional capacities, and the cDNAs corresponding to the SECIS winners were identified. Four of them encoded novel selenoproteins as confirmed by in vivo experiments. Among these, SelZf1 and SelZf2 share a common domain with mitochondrial thioredoxin reductase-2. The three proteins, however, possess distinct N-terminal domains. We found that another protein, SelX, displays sequence similarity to a protein involved in bacterial pilus formation. For the first time, four novel selenoproteins were discovered based on a computational screen for the RNA hairpin directing selenocysteine incorporation.
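    The screen described here searched nucleotide databases for a potential SECIS secondary structure. As a much simplified illustration, not the authors' method, the Python sketch below looks only for generic stem-loops: a fully paired stem (G-U wobble allowed) closed by a loop of bounded length. The stem and loop sizes are arbitrary assumptions, and a real SECIS search would also check SECIS-specific features, such as conserved nucleotides, that this sketch ignores.

```python
from typing import List, Tuple

# Watson-Crick and G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_hairpins(rna: str, stem: int = 6, loop_min: int = 4,
                  loop_max: int = 12) -> List[Tuple[int, int]]:
    """Return (start, end) windows, end exclusive, in which the first `stem`
    bases can pair antiparallel with the last `stem` bases around a loop of
    loop_min..loop_max unpaired bases. DNA input is read as RNA."""
    rna = rna.upper().replace("T", "U")
    hits = []
    for i in range(len(rna)):
        for loop in range(loop_min, loop_max + 1):
            j = i + 2 * stem + loop
            if j > len(rna):
                break
            left, right = rna[i:i + stem], rna[j - stem:j]
            if all((a, b) in PAIRS for a, b in zip(left, reversed(right))):
                hits.append((i, j))
    return hits


print(find_hairpins("GGCAUCAAAAGAUGCC"))  # [(0, 16)]
```

    In a two-stage strategy like the one described, a cheap structural filter of this sort only nominates candidates; the in vivo assay is what decides which hairpins are functional.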

  17. Microfluidic affinity and ChIP-seq analyses converge on a conserved FOXP2-binding motif in chimp and human, which enables the detection of evolutionarily novel targets.

    PubMed

    Nelson, Christopher S; Fuller, Chris K; Fordyce, Polly M; Greninger, Alexander L; Li, Hao; DeRisi, Joseph L

    2013-07-01

    The transcription factor forkhead box P2 (FOXP2) is believed to be important in the evolution of human speech. A mutation in its DNA-binding domain causes severe speech impairment. Humans have acquired two coding changes relative to the conserved mammalian sequence. Despite intense interest in FOXP2, it has remained an open question whether the human protein's DNA-binding specificity and chromatin localization are conserved. Previous in vitro and ChIP-chip studies have provided conflicting consensus sequences for the FOXP2-binding site. Using MITOMI 2.0 microfluidic affinity assays, we describe the binding site of FOXP2 and its affinity profile in base-specific detail for all substitutions of the strongest binding site. We find that human and chimp FOXP2 have similar binding sites that are distinct from previously suggested consensus binding sites. Additionally, through analysis of FOXP2 ChIP-seq data from cultured neurons, we find strong overrepresentation of a motif that matches our in vitro results and identifies a set of genes with FOXP2 binding sites. The FOXP2-binding sites tend to be conserved, yet we identified 38 instances of evolutionarily novel sites in humans. Combined, these data present a comprehensive portrait of FOXP2's binding properties and imply that although its sequence specificity has been conserved, some of its genomic binding sites are newly evolved.

  18. Functionally conserved enhancers with divergent sequences in distant vertebrates

    DOE PAGES

    Yang, Song; Oksenberg, Nir; Takayama, Sachiko; ...

    2015-10-30

    To examine the contributions of sequence and function conservation in the evolution of enhancers, we systematically identified enhancers whose sequences are not conserved among distant groups of vertebrate species, but have homologous function and are likely to be derived from a common ancestral sequence. In conclusion, our approach combined comparative genomics and epigenomics to identify potential enhancer sequences in the genomes of three groups of distantly related vertebrate species.

  19. Functionally conserved enhancers with divergent sequences in distant vertebrates

    SciTech Connect

    Yang, Song; Oksenberg, Nir; Takayama, Sachiko; Heo, Seok -Jin; Poliakov, Alexander; Ahituv, Nadav; Dubchak, Inna; Boffelli, Dario

    2015-10-30

    To examine the contributions of sequence and function conservation in the evolution of enhancers, we systematically identified enhancers whose sequences are not conserved among distant groups of vertebrate species, but have homologous function and are likely to be derived from a common ancestral sequence. In conclusion, our approach combined comparative genomics and epigenomics to identify potential enhancer sequences in the genomes of three groups of distantly related vertebrate species.

  20. Identification of sequence motifs involved in Dengue virus-host interactions.

    PubMed

    Asnet Mary, J; Paramasivan, R; Shenbagarathai, R

    2016-01-01

    Dengue fever is a rapidly spreading mosquito-borne viral infection that remains a serious global public health problem. As there is no specific treatment or commercial vaccine available for effective control of the disease, attempts to develop novel control strategies are underway. Viruses use host surface receptor proteins to enter cells. Although various proteins have been proposed as receptors of Dengue virus (DENV) on the basis of the Virus Overlay Protein Binding Assay, the precise interaction between DENV and its host has not been explored. Understanding the structural features of the domain III envelope glycoprotein would help in developing efficient antiviral inhibitors. Therefore, an attempt was made to identify the sequence motifs present in the domain III envelope glycoprotein of Dengue virus. Computational analysis revealed that the NGR motif is present in the domain III envelope glycoprotein of DENV-1 and DENV-3. Similarly, DENV-1, DENV-2 and DENV-4 were found to contain the Yxxphi motif, a tyrosine-based sorting signal responsible for interaction with the mu subunit of adaptor protein complexes. High-throughput virtual screening yielded five lead compounds based on Glide score, which ranged from -4.664 to -6.52 kcal/mol. This computational prediction provides an additional tool for understanding virus-host interactions and helps to identify potential targets in the host. Experimental evidence is still required to confirm the virus-host interactions and the inhibitory activity of the reported lead compounds.
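    Locating short linear motifs such as NGR and Yxxphi in a domain III sequence is a pattern search. A minimal Python sketch; the residue set accepted for phi (bulky hydrophobic: L, I, M, F or V) follows the usual convention for tyrosine-based sorting signals but is stated here as an assumption, and the input sequence is invented, not a DENV envelope sequence.

```python
import re
from typing import Dict, List

# Phi is taken here as a bulky hydrophobic residue: L, I, M, F or V (assumption).
# Lookaheads let overlapping occurrences be reported.
MOTIFS = {
    "NGR": re.compile(r"(?=NGR)"),
    "Yxxphi": re.compile(r"(?=Y..[LIMFV])"),
}


def find_motifs(protein: str) -> Dict[str, List[int]]:
    """0-based start positions of each motif, overlapping matches included."""
    seq = protein.upper()
    return {name: [m.start() for m in rx.finditer(seq)]
            for name, rx in MOTIFS.items()}


print(find_motifs("ANGRKYEALNGR"))  # {'NGR': [1, 9], 'Yxxphi': [5]}
```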

  1. Characterization of the highly conserved VFMGD motif in a bacterial polyisoprenyl-phosphate N-acetylaminosugar-1-phosphate transferase.

    PubMed

    Furlong, Sarah E; Valvano, Miguel A

    2012-09-01

    Polyisoprenyl-phosphate N-acetylaminosugar-1-phosphate transferases (PNPTs) constitute a family of eukaryotic and prokaryotic membrane proteins that catalyze the transfer of a sugar-1-phosphate to a phosphoisoprenyl lipid carrier. All PNPT members share a highly conserved 213-Valine-Phenylalanine-Methionine-Glycine-Aspartic acid-217 (VFMGD) motif. Previous studies using the MraY protein suggested that the aspartic acid residue in this motif, D267, is a nucleophile for a proposed double-displacement mechanism involving the cleavage of the phosphoanhydride bond of the nucleoside. Here, we demonstrate that the corresponding residue in the E. coli WecA, D217, is not directly involved in catalysis, as its replacement by asparagine results in a more active enzyme. Kinetic data indicate that the D217N replacement leads to a more than twofold increase in V(max) without significant change in the K(m) for the nucleoside sugar substrate. Furthermore, no differences in the binding of the reaction intermediate analog tunicamycin were found in D217N or in other replacement mutants at the same position. We also found that alanine substitutions in various residues of the VFMGD motif affect the enzymatic activity of WecA to various degrees in vivo and in vitro. Together, our data suggest that the highly conserved VFMGD motif defines a common region in PNPT proteins that contributes to the active site and is likely involved in the release of the reaction product. Copyright © 2012 The Protein Society.

  2. Characterization of the highly conserved VFMGD motif in a bacterial polyisoprenyl-phosphate N-acetylaminosugar-1-phosphate transferase

    PubMed Central

    Furlong, Sarah E; Valvano, Miguel A

    2012-01-01

    Polyisoprenyl-phosphate N-acetylaminosugar-1-phosphate transferases (PNPTs) constitute a family of eukaryotic and prokaryotic membrane proteins that catalyze the transfer of a sugar-1-phosphate to a phosphoisoprenyl lipid carrier. All PNPT members share a highly conserved 213-Valine-Phenylalanine-Methionine-Glycine-Aspartic acid-217 (VFMGD) motif. Previous studies using the MraY protein suggested that the aspartic acid residue in this motif, D267, is a nucleophile for a proposed double-displacement mechanism involving the cleavage of the phosphoanhydride bond of the nucleoside. Here, we demonstrate that the corresponding residue in the E. coli WecA, D217, is not directly involved in catalysis, as its replacement by asparagine results in a more active enzyme. Kinetic data indicate that the D217N replacement leads to a more than twofold increase in Vmax without significant change in the Km for the nucleoside sugar substrate. Furthermore, no differences in the binding of the reaction intermediate analog tunicamycin were found in D217N or in other replacement mutants at the same position. We also found that alanine substitutions in various residues of the VFMGD motif affect the enzymatic activity of WecA to various degrees in vivo and in vitro. Together, our data suggest that the highly conserved VFMGD motif defines a common region in PNPT proteins that contributes to the active site and is likely involved in the release of the reaction product. PMID:22811320
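    Why a higher Vmax at unchanged Km means a more active enzyme follows from the Michaelis-Menten rate law, assuming (for illustration only) that WecA obeys simple Michaelis-Menten kinetics. If the D217N replacement multiplies Vmax by a factor f > 2 while Km stays the same, the rate rises by that same factor at every substrate concentration [S]:

```latex
v = \frac{V_{\max}[S]}{K_m + [S]}, \qquad
\frac{v_{\mathrm{D217N}}}{v_{\mathrm{WT}}}
  = \frac{f\,V_{\max}[S]/(K_m + [S])}{V_{\max}[S]/(K_m + [S])} = f > 2 .
```

    An unchanged Km is consistent with substrate affinity being largely unaffected, which fits the observation that tunicamycin binding did not change.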

  3. Conserved motifs II to VI of DNA helicase II from Escherichia coli are all required for biological activity.

    PubMed Central

    Zhang, G; Deng, E; Baugh, L R; Hamilton, C M; Maples, V F; Kushner, S R

    1997-01-01

    There are seven conserved motifs (IA, IB, and II to VI) in DNA helicase II of Escherichia coli that have high homology among a large family of proteins involved in DNA metabolism. To address the functional importance of motifs II to VI, we employed site-directed mutagenesis to replace the charged amino acid residues in each motif with alanines. Cells carrying these mutant alleles exhibited higher UV and methyl methanesulfonate sensitivity, increased rates of spontaneous mutagenesis, and elevated levels of homologous recombination, indicating defects in both the excision repair and mismatch repair pathways. In addition, we also changed the highly conserved tyrosine(600) in motif VI to phenylalanine (uvrD309, Y600F). This mutant displayed a moderate increase in UV sensitivity but a decrease in spontaneous mutation rate, suggesting that DNA helicase II may have different functions in the two DNA repair pathways. Furthermore, a mutation in domain IV (uvrD307, R284A) significantly reduced the viability of some E. coli K-12 strains at 30 degrees C but not at 37 degrees C. The implications of these observations are discussed. PMID:9393722

  4. Targeting of the human adrenoleukodystrophy protein to the peroxisomal membrane by an internal region containing a highly conserved motif.

    PubMed

    Landgraf, Pablo; Mayerhofer, Peter U; Polanetz, Roman; Roscher, Adelbert A; Holzinger, Andreas

    2003-08-01

    In this study we addressed the targeting requirements of peroxisomal ABC transporters, in particular the human adrenoleukodystrophy protein. This membrane protein is defective or missing in X-linked adrenoleukodystrophy, a neurodegenerative disorder predominantly presenting in childhood. Using adrenoleukodystrophy protein deletion constructs and green fluorescent protein fusion constructs we identified the amino acid regions 1-110 and 67-164 to be sufficient for peroxisomal targeting. However, the minimal region shared by these constructs (amino acids 67-110) is not sufficient for peroxisomal targeting by itself. Additionally, the NH2-terminal 66 amino acids enhance targeting efficiency. Green fluorescent protein-labeled fragments of human peroxisomal membrane protein 69 and Saccharomyces cerevisiae Pxa1 corresponding to the amino acid 67-164 adrenoleukodystrophy protein region were also directed to the mammalian peroxisome. The required region contains a 14-amino-acid motif (71-84) conserved between the adrenoleukodystrophy protein and human peroxisomal membrane protein 69 and yeast Pxa1. Omission or truncation of this motif in the adrenoleukodystrophy protein abolished peroxisomal targeting. The single amino acid substitution L78F resulted in a significant reduction of targeting efficiency. The in-frame deletion of three amino acids (del78-80LLR) within the proposed targeting motif in two patients suffering from X-linked adrenoleukodystrophy resulted in the mislocalization of a green fluorescent protein fusion protein to nucleus, cytosol and mitochondria. Our data define the targeting region of human adrenoleukodystrophy protein containing a highly conserved 14-amino-acid motif.

  5. DNA sequence analysis of cagA 3' motifs of Helicobacter pylori strains from patients with peptic ulcer diseases.

    PubMed

    Salih, Barik A; Bolek, Bora Kazim; Arikan, Soykan

    2010-02-01

    The Helicobacter pylori cagA gene is a major virulence factor that plays an important role in gastric pathologies. DNA sequence data for the cagA 3' region of Western isolates differ markedly in their EPIYA motifs from those of East Asian isolates. An increase in the number of these motifs is known to be associated with gastric cancer. Whether such an association also holds for peptic ulceration was investigated in this study. Gastric biopsies were collected from 96 patients with duodenal ulcer (DU), gastric ulcer (GU) and gastritis. The types of EPIYA motif detected by PCR among 28 DU strains were 13 ABC, eight ABCC, six ABCCC, and in one patient both ABC and ABCCCCC; among nine GU strains, two ABC, five ABCC and two ABCCC; and among 40 gastritis strains, 35 ABC and five ABCC. DNA sequencing was carried out to confirm the detection of the EPIYA motif types and to analyse their peptide sequences. A significant association was found between the number of EPIYA-C motifs (≥2) and peptic ulceration (P=0.00001) compared with gastritis. In conclusion, this study shows that our patients harboured cagA-positive H. pylori strains with EPIYA motifs of the Western type and that the increase in the number of EPIYA-C motifs was significantly associated with DU and GU but not with gastritis, suggesting a predictive association with the severity of the disease.
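    The grouping by number of EPIYA-C motifs can be reproduced from the type counts reported above. The Python sketch tallies strains with two or more EPIYA-C segments for the GU and gastritis groups only; the DU group is left out because one patient carried both ABC and ABCCCCC and how that case was counted is not stated. The sketch does not recompute the reported P value.

```python
from typing import Dict, Tuple


def c_count(profile: str) -> int:
    """Number of EPIYA-C segments in a type string such as 'ABCC'."""
    return profile.count("C")


def multi_c(profiles: Dict[str, int], threshold: int = 2) -> Tuple[int, int]:
    """profiles maps an EPIYA type to its strain count.
    Returns (strains with >= threshold EPIYA-C motifs, total strains)."""
    hits = sum(n for p, n in profiles.items() if c_count(p) >= threshold)
    return hits, sum(profiles.values())


# Counts as reported in the abstract for the GU and gastritis groups.
gu = {"ABC": 2, "ABCC": 5, "ABCCC": 2}
gastritis = {"ABC": 35, "ABCC": 5}
print(multi_c(gu), multi_c(gastritis))  # (7, 9) (5, 40)
```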

  6. Interleukin-11 binds specific EF-hand proteins via their conserved structural motifs.

    PubMed

    Kazakov, Alexei S; Sokolov, Andrei S; Vologzhannikova, Alisa A; Permyakova, Maria E; Khorn, Polina A; Ismailov, Ramis G; Denessiouk, Konstantin A; Denesyuk, Alexander I; Rastrygina, Victoria A; Baksheeva, Viktoriia E; Zernii, Evgeni Yu; Zinchenko, Dmitry V; Glazatov, Vladimir V; Uversky, Vladimir N; Mirzabekov, Tajib A; Permyakov, Eugene A; Permyakov, Sergei E

    2017-01-01

    Interleukin-11 (IL-11) is a hematopoietic cytokine engaged in numerous biological processes and validated as a target for the treatment of various cancers. IL-11 contains intrinsically disordered regions that might recognize multiple targets. Recently we found that, aside from the IL-11RA and gp130 receptors, IL-11 interacts with the calcium sensor protein S100P. The strict calcium dependence of this interaction suggests that IL-11 may also interact with other calcium sensor proteins. Here we probed the specificity of IL-11 for calcium-binding proteins of various types: calcium sensors of the EF-hand family (calmodulin, S100B and neuronal calcium sensors: recoverin, NCS-1, GCAP-1, GCAP-2), calcium buffers of the EF-hand family (S100G, oncomodulin), and a non-EF-hand calcium buffer (α-lactalbumin). A specific subset of the calcium sensor proteins (calmodulin, S100B, NCS-1, GCAP-1/2) exhibits metal-dependent binding of IL-11 with dissociation constants of 1-19 μM. These proteins share several amino acid residues belonging to conserved structural motifs of the EF-hand proteins, the 'black' and 'gray' clusters. Replacements of the respective S100P residues by alanine drastically decrease its affinity for IL-11, suggesting their involvement in the association process. The secondary structure and accessibility of the hinge region of the EF-hand proteins studied are predicted to control the specificity and selectivity of their binding to IL-11. The IL-11 interaction with the EF-hand proteins is expected to occur under numerous pathological conditions, accompanied by disintegration of the plasma membrane and efflux of cellular components into the extracellular milieu.

  7. A Conserved Structural Motif Mediates Retrograde Trafficking of Shiga Toxin Types 1 and 2.

    PubMed

    Selyunin, Andrey S; Mukhopadhyay, Somshuvra

    2015-12-01

    Shiga toxin-producing Escherichia coli (STEC) produce two types of Shiga toxin (STx): STx1 and STx2. The toxin A-subunits block protein synthesis, while the B-subunits mediate retrograde trafficking. STEC infections do not have definitive treatments, and there is growing interest in generating toxin transport inhibitors for therapy. However, a comprehensive understanding of the mechanisms of toxin trafficking is essential for drug development. While STx2 is more toxic in vivo, prior studies focused on STx1 B-subunit (STx1B) trafficking. Here, we show that, compared with STx1B, trafficking of the B-subunit of STx2 (STx2B) to the Golgi occurs with slower kinetics. Despite this difference, similar to STx1B, endosome-to-Golgi transport of STx2B does not involve transit through degradative late endosomes and is dependent on dynamin II, epsinR, retromer and syntaxin5. Importantly, additional experiments show that a surface-exposed loop in STx2B (β4-β5 loop) is required for its endosome-to-Golgi trafficking. We previously demonstrated that residues in the corresponding β4-β5 loop of STx1B are required for interaction with GPP130, the STx1B-specific endosomal receptor, and for endosome-to-Golgi transport. Overall, STx1B and STx2B share a common pathway and use a similar structural motif to traffic to the Golgi, suggesting that the underlying mechanisms of endosomal sorting may be evolutionarily conserved.

  8. Functional conservation of PISTILLATA activity in a pea homolog lacking the PI motif.

    PubMed

    Berbel, Ana; Navarro, Cristina; Ferrándiz, Cristina; Cañas, Luis Antonio; Beltrán, José-Pío; Madueño, Francisco

    2005-09-01

    Current understanding of floral development is mainly based on what we know from Arabidopsis (Arabidopsis thaliana) and Antirrhinum majus. However, we can learn more by comparing developmental mechanisms that may explain morphological differences between species. A good example comes from the analysis of genes controlling flower development in pea (Pisum sativum), a plant with more complex leaves and inflorescences than Arabidopsis and Antirrhinum, and a different floral ontogeny. The analysis of UNIFOLIATA (UNI) and STAMINA PISTILLOIDA (STP), the pea orthologs of LEAFY and UNUSUAL FLORAL ORGANS, has revealed a common link in the regulation of flower and leaf development not apparent in Arabidopsis. While the Arabidopsis genes mainly behave as key regulators of flower development, where they control the expression of B-function genes, UNI and STP also contribute to the development of the pea compound leaf. Here, we describe the characterization of P. sativum PISTILLATA (PsPI), a pea MADS-box gene homologous to B-function genes like PI and GLOBOSA (GLO), from Arabidopsis and Antirrhinum, respectively. PsPI encodes an atypical PI-type polypeptide that lacks the highly conserved C-terminal PI motif. Nevertheless, constitutive expression of PsPI in tobacco (Nicotiana tabacum) and Arabidopsis shows that it can specifically replace the function of PI, being able to complement the strong pi-1 mutant. Accordingly, PsPI expression in pea flowers, which is dependent on STP, is identical to PI and GLO. Interestingly, PsPI is also transiently expressed in young leaves, suggesting a role of PsPI in pea leaf development, a possibility that fits with the established role of UNI and STP in the control of this process.

  9. Processing of yeast mitochondrial messenger RNAs at a conserved dodecamer sequence.

    PubMed Central

    Osinga, K A; De Vries, E; Van der Horst, G; Tabak, H F

    1984-01-01

    The yeast mitochondrial genes coding for cytochrome c oxidase subunit I (COX1) and the ATPase subunits 8 and 6 are organized in one transcription unit. Precise mapping of RNA termini with S1 nuclease and primer extension analysis shows that the 3' end of the COX1 mRNA and the 5' end of the ATPase precursor RNA are juxtaposed within a conserved dodecamer sequence (5'-AAUAAUAUUCUU-3'). Sequence comparison reveals that this motif is present downstream of nearly all protein-encoding genes, including extragenic unassigned reading frames (URFs) and two URFs located within introns. Also the 3' terminus of an RNA species derived from the URF-containing intron of the large rRNA gene maps within such a dodecamer sequence. It is likely, therefore, that this motif serves as a processing point in the generation of mature mRNA. From a comparison of the various transcription units, we infer that RNAs that originate from an endonucleolytic cleavage at this sequence have stable 3' termini, while further processing of the 5' ends occurs. The efficiency of the initial cleavage varies between the different positions at which the motif is present. PMID:6327291
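    The sequence comparison described above amounts to scanning transcripts for an exact 12-nt match. A minimal Python sketch, assuming exact matching only (the study's own comparison procedure is not described here) and a hypothetical helper name `find_dodecamer`:

```python
# Dodecamer reported at the COX1 / ATPase junction (taken from the abstract above).
DODECAMER = "AAUAAUAUUCUU"


def find_dodecamer(seq: str, motif: str = DODECAMER) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) exact match.

    DNA input is accepted: T is converted to U before searching.
    """
    rna = seq.upper().replace("T", "U")
    hits = []
    start = rna.find(motif)
    while start != -1:
        hits.append(start)
        # Advance by one base so overlapping occurrences are not skipped.
        start = rna.find(motif, start + 1)
    return hits
```

    On a made-up sequence such as `"GC" + DODECAMER + "AU" + DODECAMER` the helper returns `[2, 16]`. Real analyses would also tolerate mismatches, which exact matching does not.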

  10. A Conserved Leucine Zipper Motif in Gammaherpesvirus ORF52 Is Critical for Distinct Microtubule Rearrangements.

    PubMed

    Loftus, Matthew S; Verville, Nancy; Kedes, Dean H

    2017-09-01

    Productive viral infection often depends on the manipulation of the cytoskeleton. Herpesviruses, including rhesus monkey rhadinovirus (RRV) and its close homolog, the oncogenic human gammaherpesvirus Kaposi's sarcoma-associated herpesvirus/human herpesvirus 8 (KSHV/HHV8), exploit microtubule (MT)-based retrograde transport to deliver their genomes to the nucleus. Subsequently, during the lytic phase of the life cycle, the maturing viral particles undergo orchestrated translocation to specialized regions within the cytoplasm, leading to tegumentation, secondary envelopment, and then egress. As a result, we hypothesized that RRV might induce changes in the cytoskeleton at both early and late stages of infection. Using confocal imaging, we found that RRV infection led to the thickening and acetylation of MTs emanating from the MT-organizing center (MTOC) shortly after viral entry and more pronounced and diffuse MT reorganization during peak stages of lytic gene expression and virion production. We subsequently identified open reading frame 52 (ORF52), a multifunctional and abundant tegument protein, as being the only virally encoded component responsible for these cytoskeletal changes. Mutational and modeling analyses indicated that an evolutionarily conserved, truncated leucine zipper motif near the N terminus as well as a strictly conserved arginine residue toward the C terminus of ORF52 play critical roles in its ability to rearrange the architecture of the MT cytoskeleton. Taken together, our findings combined with data from previous studies describing diverse roles for ORF52 suggest that it likely binds to different cellular components, thereby allowing context-dependent modulation of function. IMPORTANCE A thorough understanding of the processes governing viral infection includes knowledge of how viruses manipulate their intracellular milieu, including the cytoskeleton. Altering the dynamics of actin or MT polymerization, for example, is a common strategy

  11. A Conserved Leucine Zipper Motif in Gammaherpesvirus ORF52 Is Critical for Distinct Microtubule Rearrangements

    PubMed Central

    Loftus, Matthew S.; Verville, Nancy

    2017-01-01

    Productive viral infection often depends on the manipulation of the cytoskeleton. Herpesviruses, including rhesus monkey rhadinovirus (RRV) and its close homolog, the oncogenic human gammaherpesvirus Kaposi's sarcoma-associated herpesvirus/human herpesvirus 8 (KSHV/HHV8), exploit microtubule (MT)-based retrograde transport to deliver their genomes to the nucleus. Subsequently, during the lytic phase of the life cycle, the maturing viral particles undergo orchestrated translocation to specialized regions within the cytoplasm, leading to tegumentation, secondary envelopment, and then egress. As a result, we hypothesized that RRV might induce changes in the cytoskeleton at both early and late stages of infection. Using confocal imaging, we found that RRV infection led to the thickening and acetylation of MTs emanating from the MT-organizing center (MTOC) shortly after viral entry and more pronounced and diffuse MT reorganization during peak stages of lytic gene expression and virion production. We subsequently identified open reading frame 52 (ORF52), a multifunctional and abundant tegument protein, as being the only virally encoded component responsible for these cytoskeletal changes. Mutational and modeling analyses indicated that an evolutionarily conserved, truncated leucine zipper motif near the N terminus as well as a strictly conserved arginine residue toward the C terminus of ORF52 play critical roles in its ability to rearrange the architecture of the MT cytoskeleton. Taken together, our findings combined with data from previous studies describing diverse roles for ORF52 suggest that it likely binds to different cellular components, thereby allowing context-dependent modulation of function. IMPORTANCE A thorough understanding of the processes governing viral infection includes knowledge of how viruses manipulate their intracellular milieu, including the cytoskeleton. Altering the dynamics of actin or MT polymerization, for example, is a common

  12. Evolution, homology conservation, and identification of unique sequence signatures in GH19 family chitinases.

    PubMed

    Udaya Prakash, N A; Jayanthi, M; Sabarinathan, R; Kangueane, P; Mathew, Lazar; Sekar, K

    2010-05-01

    The discovery of GH (Glycoside Hydrolase) 19 chitinases in Streptomyces sp., proteins initially thought to be confined to higher plants, raises the possibility that they are also present in other bacterial species. The present study mainly concentrates on the phylogenetic distribution and homology conservation in GH19 family chitinases. Extensive database searches are performed to identify the presence of GH19 family chitinases in the three major superkingdoms of life. Multiple sequence alignment of all the identified GH19 chitinase family members resulted in the identification of globally conserved residues. We further identified conserved sequence motifs across the major subgroups within the family. Evolutionary distances between the various bacterial and plant chitinases are estimated to better understand the pattern of evolution. Our study also supports the horizontal gene transfer theory, which states that GH19 chitinase genes were transferred from higher plants to bacteria. Further, the present study sheds light on the phylogenetic distribution and identifies unique sequence signatures that define the GH19 chitinase family of proteins. The identified motifs could be used as markers to delineate uncharacterized GH19 family chitinases. The estimated evolutionary distances between chitinases identified in plants and bacteria show that the chitinases of flowering plants are more closely related to those of actinobacteria than to those identified in purple bacteria. We propose a model to elucidate the natural history of GH19 family chitinases.

  13. Quadfinder: server for identification and analysis of quadruplex-forming motifs in nucleotide sequences

    PubMed Central

    Scaria, Vinod; Hariharan, Manoj; Arora, Amit; Maiti, Souvik

    2006-01-01

    G-quadruplex secondary structures, which play a structural role in repetitive DNA such as telomeres, may also play a functional role at other genomic locations as targetable regulatory elements that control gene expression. The recent interest in the application of quadruplexes in biological systems prompted us to develop a tool for the identification and analysis of quadruplex-forming nucleotide sequences, especially in RNA. Here we present Quadfinder, an online server for the prediction and bioinformatic analysis of uni-molecular quadruplex-forming nucleotide sequences. The server is designed to be user-friendly and needs minimal intervention by the user, while providing the flexibility to define variants of the motif. The server is freely available at URL . PMID:16845097
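    Quadfinder lets users define variants of the quadruplex motif. The sketch below uses one widely used definition, four runs of at least three G separated by loops of 1 to 7 nt; this pattern and the helper `find_g4` are assumptions for illustration, not Quadfinder's documented defaults.

```python
import re

# Assumed motif definition: (G>=3 loop{1-7}) x3 followed by a final G>=3 run.
# Loops may contain any base, including G; U is accepted so RNA input works.
G4_PATTERN = re.compile(r"(?:G{3,}[ACGTU]{1,7}){3}G{3,}")


def find_g4(seq: str) -> list[tuple[int, int]]:
    """Return (start, end) spans of non-overlapping putative G-quadruplex motifs."""
    return [m.span() for m in G4_PATTERN.finditer(seq.upper())]
```

    For example, the human telomeric repeat `GGGTTAGGGTTAGGGTTAGGG` yields a single span `(0, 21)`. Tools of this kind typically add options for loop length, minimum G-run length, and overlapping hits, which this sketch omits.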

  14. SNPs occur in regions with less genomic sequence conservation.

    PubMed

    Castle, John C

    2011-01-01

    Rates of SNPs (single nucleotide polymorphisms) and cross-species genomic sequence conservation reflect intra- and inter-species variation, respectively. Here, I report SNP rates and genomic sequence conservation adjacent to mRNA processing regions and show that, as expected, more SNPs occur in less conserved regions and that functional regions have fewer SNPs. Results are confirmed using both mouse and human data. Regions include protein start codons, 3' splice sites, 5' splice sites, protein stop codons, predicted miRNA binding sites, and polyadenylation sites. Throughout, SNP rates are lower and conservation is higher at regulatory sites. Within coding regions, SNP rates are highest and conservation is lowest at codon position three, and the fewest SNPs are found at codon position two, reflecting codon degeneracy for amino acid encoding. Exon splice sites show high conservation and very low SNP rates, reflecting both splicing signals and protein coding. Relaxed constraint on the third codon position is clearly visible when exonic SNP rates are separated by intron phase. At polyadenylation sites, a peak of conservation and a low SNP rate occur from 30 to 17 nt preceding the site. This region is highly enriched for the sequence AAUAAA, reflecting the location of the conserved polyA signal. miRNA 3' UTR target sites are predicted incorporating interspecies genomic sequence conservation; SNP rates are low in these sites, again showing fewer SNPs in conserved regions. Together, these results confirm that SNPs, reflecting recent genetic variation, occur more frequently in regions with less evolutionary conservation.
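    The codon-position result rests on simple modular arithmetic: a coding SNP's position within its codon is its offset from the first base of the start codon, modulo 3. A hypothetical sketch, assuming a single contiguous plus-strand coding sequence (real transcripts need exon coordinates and strand handling, which is where intron phase comes in):

```python
from collections import Counter


def codon_position_counts(cds_start: int, snp_positions: list[int]) -> Counter:
    """Tally SNPs by codon position (1, 2 or 3).

    Simplifying assumptions (illustrative only): the CDS is one contiguous
    plus-strand interval beginning at 0-based coordinate `cds_start`, and
    every SNP coordinate lies inside that interval.
    """
    return Counter((pos - cds_start) % 3 + 1 for pos in snp_positions)
```

    With `cds_start=100` and made-up SNPs at 100, 101, 102, 105, 108 and 110, the tally is one SNP at position 1, two at position 2 and three at position 3. Dividing such counts by the number of sites at each position gives the per-position rates the abstract compares.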

  15. A common set of conserved motifs in a vast variety of putative nucleic acid-dependent ATPases including MCM proteins involved in the initiation of eukaryotic DNA replication.

    PubMed Central

    Koonin, E V

    1993-01-01

    A new superfamily of (putative) DNA-dependent ATPases is described that includes the ATPase domains of prokaryotic NtrC-related transcription regulators, MCM proteins involved in the initiation of eukaryotic DNA replication, and a group of uncharacterized bacterial and chloroplast proteins. MCM proteins are shown to contain a modified form of the ATP-binding motif and are predicted to mediate ATP-dependent opening of double-stranded DNA in the replication origins. In a second line of investigation, it is demonstrated that the products of unidentified open reading frames from Marchantia mitochondria and from yeast, and a domain of a baculovirus protein involved in viral DNA replication, are related to superfamily III of DNA and RNA helicases, which was previously known to include only proteins of small viruses. Comparison of the multiple alignments showed that the proteins of the NtrC superfamily and the helicases of superfamily III share three related sequence motifs tightly packed in the ATPase domain, which consists of 100-150 amino acid residues. A similar array of conserved motifs is found in the family of DnaA-related ATPases. It is hypothesized that the three large groups of nucleic acid-dependent ATPases share a similar core ATPase domain structure and have evolved from a common ancestor. PMID:8332451

  16. Complementary intron sequence motifs associated with human exon repetition: a role for intragenic, inter-transcript interactions in gene expression.

    PubMed

    Dixon, Richard J; Eperon, Ian C; Samani, Nilesh J

    2007-01-15

    Exon repetition describes the presence of tandemly repeated exons in mRNA in the absence of duplications in the genome. The regulation of this process is not fully understood. We therefore investigated the entire flanking intronic sequences of exons involved in exon repetition for common sequence elements. A computational analysis of 48 human single exon repetition events identified two common sequence motifs. One of these motifs is pyrimidine-rich and is more common in the upstream intron, whilst the other motif is highly enriched in purines and is more common in the downstream intron. As the two motifs are complementary to each other, they support a model by which exon repetition occurs as a result of trans-splicing between separate pre-mRNA transcripts from the same gene that are brought together during transcription by complementary intronic sequences. The majority of the motif instances overlap with the locations of mobile elements such as Alu elements. We explore the potential importance of complementary intron sequences in a rat gene that undergoes natural exon repetition in a strain-specific manner. The possibility that distant complementary sequences can stimulate inter-transcript splicing during transcription suggests an unsuspected new role for potential secondary structures in endogenous genes.
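    The complementarity argument can be made concrete: the reverse complement of a pyrimidine-rich stretch is purine-rich, so such a motif in an upstream intron could base-pair with a purine-rich motif downstream. A minimal sketch with hypothetical helper names and a made-up motif (not one of the motifs reported in the study):

```python
# Translation table for DNA base complements (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse, giving the antiparallel partner strand."""
    return seq.translate(COMPLEMENT)[::-1]


def purine_fraction(seq: str) -> float:
    """Fraction of A/G bases; 0.0 for an empty sequence."""
    seq = seq.upper()
    return sum(base in "AG" for base in seq) / len(seq) if seq else 0.0


def can_pair(upstream_motif: str, downstream_intron: str) -> bool:
    """True if the upstream motif's reverse complement occurs in the downstream intron."""
    return reverse_complement(upstream_motif).upper() in downstream_intron.upper()
```

    For the invented pyrimidine-rich motif `TTCTCCTTC`, `reverse_complement` gives the fully purine `GAAGGAGAA`, which illustrates why the two reported motif classes can pair. An exact-match check is a simplification; intronic pairing would tolerate mismatches and G-U pairs.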

  17. A conserved structural motif reveals the essential transcriptional repression function of Spen proteins and their role in developmental signaling.

    PubMed

    Ariyoshi, Mariko; Schwabe, John W R

    2003-08-01

    Spen proteins regulate the expression of key transcriptional effectors in diverse signaling pathways. They are large proteins characterized by N-terminal RNA-binding motifs and a highly conserved C-terminal SPOC domain. The specific biological role of the SPOC domain (Spen paralog and ortholog C-terminal domain), and hence, the common function of Spen proteins, has been unclear to date. The Spen protein, SHARP (SMRT/HDAC1-associated repressor protein), was identified as a component of transcriptional repression complexes in both nuclear receptor and Notch/RBP-Jkappa signaling pathways. We have determined the 1.8 Å crystal structure of the SPOC domain from SHARP. This structure shows that essentially all of the conserved surface residues map to a positively charged patch. Structure-based mutational analysis indicates that this conserved region is responsible for the interaction between SHARP and the universal transcriptional corepressor SMRT/NCoR (silencing mediator for retinoid and thyroid receptors/nuclear receptor corepressor). We demonstrate that this interaction involves a highly conserved acidic motif at the C terminus of SMRT/NCoR. These findings suggest that the conserved function of the SPOC domain is to mediate interaction with SMRT/NCoR corepressors, and that Spen proteins play an essential role in the repression complex.

  18. A family of cyclin D homologs from plants differentially controlled by growth regulators and containing the conserved retinoblastoma protein interaction motif.

    PubMed Central

    Soni, R; Carmichael, J P; Shah, Z H; Murray, J A

    1995-01-01

    A new family of three related cyclins has been identified in Arabidopsis by complementation of a yeast strain deficient in G1 cyclins. Individual members show tissue-specific expression and are conserved in other plant species. They form a distinctive group of plant cyclins, which we named delta-type cyclins to indicate their similarities with mammalian D-type cyclins. The sequence relationships between delta and D cyclins include the N-terminal sequence LXCXE. This motif was originally identified in certain viral oncoproteins and is strongly implicated in binding to the retinoblastoma protein pRb. By analogy to mammalian cyclin D, these plant homologs may mediate growth and phytohormonal signals into the plant cell cycle. In support of this hypothesis, we show that, on restimulation of suspension-cultured cells, cyclin delta 3 is rapidly induced by the plant growth regulator cytokinin and cyclin delta 2 is induced by carbon source. PMID:7696881
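    Because LXCXE is a degenerate motif (X = any residue), it maps directly onto a regular expression. A minimal sketch; the helper name and the test sequence are hypothetical, and a sequence match alone does not establish pRb binding:

```python
import re

# LXCXE: Leu, any residue, Cys, any residue, Glu.
# The lookahead makes the search report overlapping occurrences too.
LXCXE = re.compile(r"(?=(L.C.E))")


def find_lxcxe(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched pentapeptide) for every LXCXE occurrence."""
    return [(m.start(), m.group(1)) for m in LXCXE.finditer(protein.upper())]
```

    On the made-up fragment `MAELLCHEDS` the helper reports `[(3, "LLCHE")]`: only the leucine at index 3 has Cys two residues and Glu four residues downstream.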

  19. Patterns of sequence conservation in the S-Layer proteins and related sequences in Clostridium difficile.

    PubMed

    Calabi, Emanuela; Fairweather, Neil

    2002-07-01

    Clostridium difficile is the etiological agent of antibiotic-associated diarrhea. Among the factors that may play a role in infection are S-layer proteins (SLPs). Previous work has shown these to consist mainly of two components, resulting from the cleavage of a precursor encoded by the slpA gene. The high-molecular-weight (MW) subunit is related both to amidases from B. subtilis and to at least 28 other gene products in C. difficile strain 630. To gain insight into the functions of the SLPs and related proteins, we have further investigated the pattern of variability both at the slpA locus and at six nearby paralogs. Sequencing of the slpA gene from an S-layer group II strain and a variant S-layer group strain confirms a high degree of divergence in the low-MW SLP, which may result from diversifying selection. A highly conserved motif, however, is found at the C terminus in all low-MW subunits and may be essential for SlpA precursor cleavage. In strain 167, a variant cleavage product is present, suggesting a secondary processing site. Southern blotting analysis shows slpA-like open reading frames (ORFs) 2 to 7 to be conserved in all nine strains tested, with one exception: ORF2, which encodes a 66-kDa polypeptide coextracted at low pH with the main SLPs in strain 630, may be partially deleted in strain 167. Polymorphism within the slpA-ORF7 cluster may be more pronounced in the region proximal to the slpA gene. Unexpectedly, a high-MW subunit probe cross-hybridizes to sequences outside the slpA locus, which appear to vary in number in different strains.

  20. Functional characterization of sequence motifs in the transit peptide of Arabidopsis small subunit of rubisco.

    PubMed

    Lee, Dong Wook; Lee, Sookjin; Lee, Gil-Je; Lee, Kwang Hee; Kim, Sanguk; Cheong, Gang-Won; Hwang, Inhwan

    2006-02-01

    The transit peptides of nuclear-encoded chloroplast proteins are necessary and sufficient for targeting and import of proteins into chloroplasts. However, the sequence information encoded by transit peptides is not fully understood. In this study, we investigated sequence motifs in the transit peptide of the small subunit of the Rubisco complex by examining the ability of various mutant transit peptides to target green fluorescent protein reporter proteins to chloroplasts in Arabidopsis (Arabidopsis thaliana) leaf protoplasts. We divided the transit peptide into eight blocks (T1 through T8), each consisting of eight or 10 amino acids, and generated mutants that had alanine (Ala) substitutions or deletions of one or two T blocks in the transit peptide. In addition, we generated mutants that had the original sequence partially restored in single- or double-T-block Ala (A) substitution mutants. Analysis of chloroplast import of these mutants revealed several interesting observations. Single-T-block mutations did not noticeably affect targeting efficiency, except in T1 and T4 mutations. However, double-T mutants, T2A/T4A, T3A/T6A, T3A/T7A, T4A/T6A, and T4A/T7A, caused a 50% to 100% loss in targeting ability. T3A/T6A and T4A/T6A mutants produced only precursor proteins, whereas T2A/T4A and T4A/T7A mutants produced only a 37-kD protein. Detailed analyses revealed that sequence motifs ML in T1, LKSSA in T3, FP and RK in T4, CMQVW in T6, and KKFET in T7 play important roles in chloroplast targeting. In T1, the hydrophobicity of ML is important for targeting. LKSSA in T3 is functionally equivalent to CMQVW in T6 and KKFET in T7. Furthermore, subcellular fractionation revealed that Ala substitution in T1, T3, and T6 produced soluble precursors, whereas Ala substitution in T4 and T7 produced intermediates that were tightly associated with membranes. These results demonstrate that the transit peptide contains multiple motifs and that some of them act in concert or

  1. Using Weeder, Pscan, and PscanChIP for the Discovery of Enriched Transcription Factor Binding Site Motifs in Nucleotide Sequences.

    PubMed

    Zambelli, Federico; Pesole, Graziano; Pavesi, Giulio

    2014-09-08

    One of the greatest challenges facing modern molecular biology is understanding the complex mechanisms regulating gene expression. A fundamental step in this process requires the characterization of sequence motifs involved in the regulation of gene expression at transcriptional and post-transcriptional levels. In particular, transcription is modulated by the interaction of transcription factors (TFs) with their corresponding binding sites. Weeder, Pscan, and PscanChIP are software tools, freely available to noncommercial users as stand-alone or Web-based applications, for the automatic discovery of conserved motifs in a set of DNA sequences likely to be bound by the same TFs. Input for the tools can be promoter sequences from co-expressed or co-regulated genes (for which Weeder and Pscan are suitable), or regions identified through genome-wide ChIP-seq or similar experiments (Weeder and PscanChIP). The motifs are either found by a de novo approach (Weeder) or by using descriptors of the binding specificity of TFs (Pscan and PscanChIP). Copyright © 2014 John Wiley & Sons, Inc.
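    Pscan and PscanChIP score sequences against descriptors of TF binding specificity. The general idea behind such descriptors can be sketched with a log-odds position weight matrix. This is an illustrative assumption, not the scoring actually implemented by Pscan; the helper names are hypothetical and the training sites below are made up.

```python
import math

BASES = "ACGT"


def build_pwm(sites: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log2-odds matrix from equal-length aligned sites vs. a uniform (0.25) background."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [s[i] for s in sites]
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / 0.25)
            for b in BASES
        })
    return pwm


def best_hit(pwm: list[dict[str, float]], seq: str) -> tuple[int, float]:
    """Slide the matrix along seq (forward strand only, A/C/G/T only).

    Returns the (start, score) of the highest-scoring window.
    """
    seq = seq.upper()
    width = len(pwm)
    scores = [
        (i, sum(col[base] for col, base in zip(pwm, seq[i:i + width])))
        for i in range(len(seq) - width + 1)
    ]
    return max(scores, key=lambda item: item[1])
```

    For instance, a matrix built from the invented sites `TGACG`, `TGACG` and `TGACT` places its best window in `AATGACGTT` at index 2. Production tools also scan the reverse strand and compare scores against background promoter sets to assess enrichment.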

  2. Candidate disease resistance genes in sunflower cloned using conserved nucleotide-binding site motifs: genetic mapping and linkage to the downy mildew resistance gene Pl1.

    PubMed

    Gedil, M A; Slabaugh, M B; Berry, S; Johnson, R; Michelmore, R; Miller, J; Gulya, T; Knapp, S J

    2001-04-01

    Disease resistance gene candidates (RGCs) belonging to the nucleotide-binding site (NBS) superfamily have been cloned from numerous crop plants using highly conserved DNA sequence motifs. The aims of this research were to (i) isolate genomic DNA clones for RGCs in cultivated sunflower (Helianthus annuus L.) and (ii) map RGC markers and Pl1, a gene for resistance to downy mildew (Plasmopara halstedii (Farl.) Berl. & de Toni) race 1. Degenerate oligonucleotide primers targeted to conserved NBS DNA sequence motifs were used to amplify RGC fragments from sunflower genomic DNA. PCR products were cloned, sequenced, and assigned to 11 groups. RFLP analyses mapped six RGC loci to three linkage groups. One of the RGCs (Ha-4W2) was linked to Pl1, a downy mildew resistance gene. A cleaved amplified polymorphic sequence (CAPS) marker was developed for Ha-4W2 using gene-specific oligonucleotide primers. Downy mildew susceptible lines (HA89 and HA372) lacked a 276-bp Tsp509I restriction fragment that was present in downy mildew resistant lines (HA370, 335, 336, 337, 338, and 339). HA370 x HA372 F2 progeny were genotyped for the Ha-4W2 CAPS marker and phenotyped for resistance to downy mildew race 1. The CAPS marker was linked to but did not completely cosegregate with Pl1 on linkage group 8. Ha-4W2 was found to comprise a gene family with at least five members. Although genetic markers for Ha-4W2 have utility for marker-assisted selection, the RGC detected by the CAPS marker has been ruled out as a candidate gene for Pl1. Three of the RGC probes were monomorphic between HA370 and HA372 and still need to be mapped and screened for linkage to disease resistance loci.
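    A CAPS marker is read from the fragment pattern produced when a gene-specific amplicon is cut with a restriction enzyme. The sketch below simulates that digestion for a linear sequence, assuming a 4-bp recognition site AATT cut immediately 5' of the site (the site commonly listed for Tsp509I); the helper name and toy amplicon are hypothetical and do not reproduce the 276-bp fragment reported above.

```python
def digest(seq: str, site: str = "AATT", cut_offset: int = 0) -> list[int]:
    """Return fragment lengths after cutting a linear sequence at every `site`.

    `cut_offset` is the top-strand cut position within the site
    (0 = immediately before the site, as in ^AATT). Only the top strand
    is considered, which is enough for fragment-length comparisons.
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cut = start + cut_offset
        if 0 < cut < len(seq):  # a cut at either end produces no new fragment
            cuts.append(cut)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]
```

    Digesting the toy amplicon `GGAATTCCAATTGG` gives fragments of 2, 6 and 6 bp; an allele lacking one site would instead show a single longer fragment, which is the kind of presence/absence difference a CAPS assay scores on a gel.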

  3. Conserved forkhead dimerization motif controls DNA replication timing and spatial organization of chromosomes in S. cerevisiae.

    PubMed

    Ostrow, A Zachary; Kalhor, Reza; Gan, Yan; Villwock, Sandra K; Linke, Christian; Barberis, Matteo; Chen, Lin; Aparicio, Oscar M

    2017-03-21

    Forkhead Box (Fox) proteins share the Forkhead domain, a winged-helix DNA binding module, which is conserved among eukaryotes from yeast to humans. These sequence-specific DNA binding proteins have been primarily characterized as transcription factors regulating diverse cellular processes from cell cycle control to developmental fate, deregulation of which contributes to developmental defects, cancer, and aging. We recently identified Saccharomyces cerevisiae Forkhead 1 (Fkh1) and Forkhead 2 (Fkh2) as required for the clustering of a subset of replication origins in G1 phase and for the early initiation of these origins in the ensuing S phase, suggesting a mechanistic role linking the spatial organization of the origins and their activity. Here, we show that Fkh1 and Fkh2 share a unique structural feature of human FoxP proteins that enables FoxP2 and FoxP3 to form domain-swapped dimers capable of bridging two DNA molecules in vitro. Accordingly, Fkh1 self-associates in vitro and in vivo in a manner dependent on the conserved domain-swapping region, strongly suggestive of homodimer formation. Fkh1- and Fkh2-domain-swap-minus (dsm) mutations are functional as transcription factors yet are defective in replication origin timing control. Fkh1-dsm binds replication origins in vivo but fails to cluster them, supporting the conclusion that Fkh1 and Fkh2 dimers perform a structural role in the spatial organization of chromosomal elements with functional importance.

  4. Conserved forkhead dimerization motif controls DNA replication timing and spatial organization of chromosomes in S. cerevisiae

    PubMed Central

    Ostrow, A. Zachary; Gan, Yan; Villwock, Sandra K.; Linke, Christian; Barberis, Matteo; Chen, Lin; Aparicio, Oscar M.

    2017-01-01

    Forkhead Box (Fox) proteins share the Forkhead domain, a winged-helix DNA binding module, which is conserved among eukaryotes from yeast to humans. These sequence-specific DNA binding proteins have been primarily characterized as transcription factors regulating diverse cellular processes from cell cycle control to developmental fate, deregulation of which contributes to developmental defects, cancer, and aging. We recently identified Saccharomyces cerevisiae Forkhead 1 (Fkh1) and Forkhead 2 (Fkh2) as required for the clustering of a subset of replication origins in G1 phase and for the early initiation of these origins in the ensuing S phase, suggesting a mechanistic role linking the spatial organization of the origins and their activity. Here, we show that Fkh1 and Fkh2 share a unique structural feature of human FoxP proteins that enables FoxP2 and FoxP3 to form domain-swapped dimers capable of bridging two DNA molecules in vitro. Accordingly, Fkh1 self-associates in vitro and in vivo in a manner dependent on the conserved domain-swapping region, strongly suggestive of homodimer formation. Fkh1- and Fkh2-domain-swap-minus (dsm) mutations are functional as transcription factors yet are defective in replication origin timing control. Fkh1-dsm binds replication origins in vivo but fails to cluster them, supporting the conclusion that Fkh1 and Fkh2 dimers perform a structural role in the spatial organization of chromosomal elements with functional importance. PMID:28265091

  5. Conserved XPB Core Structure and Motifs for DNA Unwinding: Implications for Pathway Selection of Transcription or Excision Repair

    SciTech Connect

    Fan, Li; Arvai, Andrew S.; Cooper, Priscilla K.; Iwai, Shigenori; Hanaoka, Fumio; Tainer, John A.

    2005-04-01

    The human xeroderma pigmentosum group B (XPB) helicase is essential for transcription, nucleotide excision repair, and TFIIH functional assembly. Here, we determined crystal structures of an Archaeoglobus fulgidus XPB homolog (AfXPB) that characterize two RecA-like XPB helicase domains and discover a DNA damage recognition domain (DRD), a unique RED motif, a flexible thumb motif (ThM), and implied conformational changes within a conserved functional core. RED motif mutations dramatically reduce helicase activity, and the DRD and ThM, which flank the RED motif, appear structurally as well as functionally analogous to the MutS mismatch recognition and DNA polymerase thumb domains. Substrate specificity is altered by DNA damage, such that AfXPB unwinds dsDNA with 3' extensions, but not blunt-ended dsDNA unless it contains a lesion, as shown for CPD or (6-4) photoproducts. Together, these results provide an unexpected mechanism of DNA unwinding with implications for XPB damage verification in nucleotide excision repair.

  6. The Carboxy Terminus of YCF1 Contains a Motif Conserved throughout >500 Myr of Streptophyte Evolution

    PubMed Central

    Archibald, John M.; Gould, Sven B.

    2017-01-01

    Plastids evolved from cyanobacteria by endosymbiosis. During the course of evolution, the coding capacity of plastid genomes shrank due to gene loss or transfer to the nucleus. In the green lineage, however, there were apparent gene gains, including that of ycf1. Although its function is still debated, YCF1 has proven to be a useful marker for plastid evolution. YCF1 sequence and predicted structural features unite the plastid genomes of land plants with those of their closest algal relatives, the higher streptophyte algae; YCF1 appears to have undergone pronounced changes during the course of streptophyte algal evolution. Using new data, we show that YCF1 underwent divergent evolution in the common ancestor of higher streptophyte algae and Klebsormidiophyceae. This divergence resulted in the origin of an extreme, klebsormidiophycean-specific YCF1 and the higher streptophyte Ste-YCF1. Most importantly, our analysis uncovers a conserved carboxy-terminal sequence stretch within YCF1 that is unique to higher streptophytes and hints at an important, yet unexplored function. PMID:28164224

  7. Sequence analysis of the L protein of the Ebola 2014 outbreak: Insight into conserved regions and mutations.

    PubMed

    Ayub, Gohar; Waheed, Yasir

    2016-06-01

    The 2014 Ebola outbreak was one of the largest Ebola outbreaks to have occurred; it started in Guinea and spread to Nigeria, Liberia and Sierra Leone. Phylogenetic analysis of the current virus species indicated that this outbreak is the result of a divergent lineage of the Zaire ebolavirus. The L protein of Ebola virus (EBOV) is the catalytic subunit of the RNA-dependent RNA polymerase complex, which, with VP35, is key for the replication and transcription of viral RNA. Earlier sequence analysis demonstrated that the L protein of all non-segmented negative-sense (NNS) RNA viruses consists of six domains containing conserved functional motifs. The aim of the present study was to analyze the presence of these motifs in 2014 EBOV isolates, highlight their function and determine how they may contribute to the overall pathogenicity of the isolates. For this purpose, 81 2014 EBOV L protein sequences were aligned with 475 other NNS RNA viruses, including Paramyxoviridae and Rhabdoviridae viruses. Phylogenetic analysis of all EBOV outbreak L protein sequences was also performed. Analysis of the amino acid substitutions in the 2014 EBOV outbreak was conducted using sequence analysis. The alignment demonstrated the presence of previously conserved motifs in the 2014 EBOV isolates and novel residues. Notably, all the mutations identified in the 2014 EBOV isolates were tolerant; they were pathogenic, with certain examples occurring within previously determined functional conserved motifs, possibly altering viral pathogenicity, replication and virulence. The phylogenetic analysis demonstrated that all sequences with the exception of the 2014 EBOV sequences were clustered together. The 2014 EBOV outbreak has acquired a great number of mutations, which may explain the reasons behind this unprecedented outbreak. Certain residues critical to the function of the polymerase remain conserved and may be targets for the development of antiviral therapeutic agents.

  8. Nucleotide sequence conservation of novel and established cis-regulatory sites within the tyrosine hydroxylase gene promoter

    PubMed Central

    Wang, Meng; Banerjee, Kasturi; Baker, Harriet; Cave, John W.

    2015-01-01

    Tyrosine hydroxylase (TH) is the rate-limiting enzyme in catecholamine biosynthesis and its gene proximal promoter (<1 kb upstream from the transcription start site) is essential for regulating transcription in both the developing and adult nervous systems. Several putative regulatory elements within the TH proximal promoter have been reported, but evolutionary conservation of these elements has not been thoroughly investigated. Since many vertebrate species are used to model development, function and disorders of human catecholaminergic neurons, identifying evolutionarily conserved transcription regulatory mechanisms is a high priority. In this study, we align TH proximal promoter nucleotide sequences from several vertebrate species to identify evolutionarily conserved motifs. This analysis identified three elements (a TATA box, cyclic AMP response element (CRE) and a 5′-GGTGG-3′ site) that constitute the core of an ancient vertebrate TH promoter. Focusing on only eutherian mammals, two regions of high conservation within the proximal promoter were identified: a ∼250 bp region adjacent to the transcription start site and a ∼85 bp region located approximately 350 bp further upstream. Within both regions, conservation of previously reported cis-regulatory motifs and human single nucleotide variants was evaluated. Transcription reporter assays in a TH-expressing cell line demonstrated the functionality of highly conserved motifs in the proximal promoter regions and electrophoretic mobility shift assays showed that brain-region specific complexes assemble on these motifs. These studies also identified a non-canonical CRE binding (CREB) protein recognition element in the proximal promoter. Together, these studies provide a detailed analysis of evolutionary conservation within the TH promoter and identify potential cis-regulatory motifs that underlie a core set of regulatory mechanisms in mammals. PMID:25774193
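    Locating core promoter elements such as the CRE and the 5′-GGTGG-3′ site means scanning both DNA strands. The Python sketch below does exact matching on either strand. It assumes the canonical CRE consensus TGACGTCA, and the toy promoter string is invented. Exact matching will not find the non-canonical CRE variant reported in this abstract; that would need a mismatch-tolerant or matrix-based search.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def find_motif(seq: str, motif: str) -> List[Tuple[int, str]]:
    """Return (0-based start, strand) for exact matches of motif on either strand.

    Overlapping matches are reported; a palindromic motif (such as TGACGTCA)
    is reported once per position, on the '+' strand.
    """
    seq = seq.upper()
    motif = motif.upper()
    rc_motif = reverse_complement(motif)
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc_motif:
            hits.append((i, "-"))
    return hits
```

    For example, in the toy string `AAGGTGGTTTGACGTCACCACCA` the GGTGG site is found at position 2 on the plus strand and, as CCACC, at position 17 on the minus strand.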

  9. The Motif Tool Assessment Platform (MTAP) for sequence-based transcription factor binding site prediction tools.

    PubMed

    Quest, Daniel; Ali, Hesham

    2010-01-01

    Predicting transcription factor binding sites (TFBS) from sequence is one of the most challenging problems in computational biology. The development of (semi-)automated computer-assisted prediction methods is needed to find TFBS over an entire genome, which is a first step in reconstructing mechanisms that control gene activity. Bioinformatics journals continue to publish diverse methods for predicting TFBS on a monthly basis. To help practitioners decide which method to use to predict a particular TFBS, we provide a platform to assess the quality and applicability of the available methods. Assessment tools allow researchers to determine how methods can be expected to perform on specific organisms or on specific transcription factor families. This chapter introduces the TFBS detection problem and reviews current strategies for evaluating algorithm effectiveness. In this chapter, a novel and robust assessment tool, the Motif Tool Assessment Platform (MTAP), is introduced and discussed.
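    A common way to evaluate a TFBS predictor is to compare predicted and known sites nucleotide by nucleotide. The Python sketch below computes sensitivity, positive predictive value and the performance coefficient TP/(TP+FN+FP), which are metrics widely used in motif-finder benchmarks. It is an assumption that MTAP reports exactly these quantities, and the interval representation (half-open [start, end) on one sequence) is hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single sequence

def _positions(intervals: Iterable[Interval]) -> Set[int]:
    """Expand intervals into the set of covered nucleotide positions."""
    covered: Set[int] = set()
    for start, end in intervals:
        covered.update(range(start, end))
    return covered

def nucleotide_metrics(known: Iterable[Interval], predicted: Iterable[Interval]) -> Dict[str, float]:
    """Nucleotide-level sensitivity, PPV and performance coefficient (0.0 when undefined)."""
    known_pos = _positions(known)
    pred_pos = _positions(predicted)
    tp = len(known_pos & pred_pos)
    fp = len(pred_pos - known_pos)
    fn = len(known_pos - pred_pos)

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),
        "ppv": ratio(tp, tp + fp),
        "performance_coefficient": ratio(tp, tp + fn + fp),
    }
```

    With a known site at [10, 20) and a prediction at [15, 25), half of each interval overlaps, so sensitivity and PPV are both 0.5 and the performance coefficient is 5/15.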

  10. Function of the PEX19-binding site of human adrenoleukodystrophy protein as targeting motif in man and yeast. PMP targeting is evolutionarily conserved.

    PubMed

    Halbach, André; Lorenzen, Stephan; Landgraf, Christiane; Volkmer-Engert, Rudolf; Erdmann, Ralf; Rottensteiner, Hanspeter

    2005-06-03

    We predicted in human peroxisomal membrane proteins (PMPs) the binding sites for PEX19, a key player in the topogenesis of PMPs, by virtue of an algorithm developed for yeast PMPs. The best-scoring PEX19-binding site was found in the adrenoleukodystrophy protein (ALDP). The identified site was indeed bound by human PEX19 and was also recognized by the orthologous yeast PEX19 protein. Likewise, both human and yeast PEX19 bound with comparable affinities to the PEX19-binding site of the yeast PMP Pex13p. Interestingly, the identified PEX19-binding site of ALDP coincided with its previously determined targeting motif. We corroborated the requirement of the ALDP PEX19-binding site for peroxisomal targeting in human fibroblasts and showed that the minimal ALDP fragment also targets correctly in yeast, again in a PEX19-binding site-dependent manner. Furthermore, the human PEX19-binding site of ALDP proved interchangeable with that of yeast Pex13p in an in vivo targeting assay. Finally, we showed in vitro that most of the predicted binding sequences of human PMPs represent true binding sites for human PEX19, indicating that human PMPs harbor common PEX19-binding sites that do resemble those of yeast. Our data clearly revealed a role for PEX19-binding sites as PMP-targeting motifs across species, thereby demonstrating the evolutionary conservation of PMP signal sequences from yeast to man.
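    The abstract does not describe how the yeast-derived algorithm scores candidate sites. As a generic illustration only, the Python sketch below ranks protein windows with a position-specific scoring matrix. The matrix `EXAMPLE_MATRIX`, the default score and the toy sequence are hypothetical and are not the published PEX19-binding matrix.

```python
from typing import Dict, List, Tuple

# Hypothetical position-specific scores: one dict per motif position, residue -> score.
# Residues absent from a position's dict receive DEFAULT_SCORE. Illustrative values only.
DEFAULT_SCORE = -1.0
EXAMPLE_MATRIX: List[Dict[str, float]] = [
    {"L": 2.0, "I": 1.5},
    {"K": 1.0, "R": 1.0},
    {"F": 2.0, "W": 1.5},
]

def window_score(window: str, matrix: List[Dict[str, float]]) -> float:
    """Sum the per-position scores of one window (window length equals matrix length)."""
    return sum(pos.get(res, DEFAULT_SCORE) for pos, res in zip(matrix, window))

def scan(sequence: str, matrix: List[Dict[str, float]]) -> List[Tuple[int, float]]:
    """Score every window of len(matrix) residues; return (0-based start, score), best first."""
    width = len(matrix)
    scores = [(i, window_score(sequence[i:i + width], matrix))
              for i in range(len(sequence) - width + 1)]
    return sorted(scores, key=lambda item: (-item[1], item[0]))
```

    The first element of the returned list plays the role of a "best-scoring" site; a real predictor would also need a threshold calibrated against experimentally confirmed binders.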

  11. The structure of an endogenous Drosophila centromere reveals the prevalence of tandemly repeated sequences able to form i-motifs

    PubMed Central

    Garavís, Miguel; Méndez-Lago, María; Gabelica, Valérie; Whitehead, Siobhan L.; González, Carlos; Villasante, Alfredo

    2015-01-01

    Centromeres are the chromosomal loci at which spindle microtubules attach to mediate chromosome segregation during mitosis and meiosis. In most eukaryotes, centromeres are made up of highly repetitive DNA sequences (satellite DNA) interspersed with middle repetitive DNA sequences (transposable elements). Despite the efforts to establish complete genomic sequences of eukaryotic organisms, the so-called ‘finished’ genomes are not actually complete because the centromeres have not been assembled due to the intrinsic difficulties in constructing both physical maps and complete sequence assemblies of long stretches of tandemly repetitive DNA. Here we show the first molecular structure of an endogenous Drosophila centromere and the ability of the C-rich dodeca satellite strand to form dimeric i-motifs. The finding of i-motif structures in simple and complex centromeric satellite DNAs leads us to suggest that these centromeric sequences may have been selected not by their primary sequence but by their ability to form noncanonical secondary structures. PMID:26289671

  12. The structure of an endogenous Drosophila centromere reveals the prevalence of tandemly repeated sequences able to form i-motifs.

    PubMed

    Garavís, Miguel; Méndez-Lago, María; Gabelica, Valérie; Whitehead, Siobhan L; González, Carlos; Villasante, Alfredo

    2015-08-20

    Centromeres are the chromosomal loci at which spindle microtubules attach to mediate chromosome segregation during mitosis and meiosis. In most eukaryotes, centromeres are made up of highly repetitive DNA sequences (satellite DNA) interspersed with middle repetitive DNA sequences (transposable elements). Despite the efforts to establish complete genomic sequences of eukaryotic organisms, the so-called 'finished' genomes are not actually complete because the centromeres have not been assembled due to the intrinsic difficulties in constructing both physical maps and complete sequence assemblies of long stretches of tandemly repetitive DNA. Here we show the first molecular structure of an endogenous Drosophila centromere and the ability of the C-rich dodeca satellite strand to form dimeric i-motifs. The finding of i-motif structures in simple and complex centromeric satellite DNAs leads us to suggest that these centromeric sequences may have been selected not by their primary sequence but by their ability to form noncanonical secondary structures.
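    Candidate i-motif-forming regions are often flagged with a simple pattern: four runs of cytosines separated by short loops, the C-rich counterpart of the usual G-quadruplex heuristic. The Python sketch below applies that pattern. The thresholds (runs of at least 3 C, loops of 1 to 7 nt) are assumptions, not the authors' criteria, and because the dimeric i-motifs reported here form from two strands, a single-strand four-run pattern is only a rough proxy.

```python
import re
from typing import List, Tuple

# Heuristic: four runs of >= 3 cytosines separated by loops of 1-7 nt (assumed thresholds).
IMOTIF_PATTERN = re.compile(r"C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}")

def find_putative_imotifs(seq: str) -> List[Tuple[int, int]]:
    """Return non-overlapping (start, end) spans, 0-based with end exclusive, matching the pattern."""
    return [m.span() for m in IMOTIF_PATTERN.finditer(seq.upper())]
```

    On the toy string `TTCCCACCCTACCCTTCCCAA` the pattern spans positions 2 to 19; a string with only two C runs returns no hits.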

  13. Inference of transcriptional networks in Arabidopsis through conserved noncoding sequence analysis.

    PubMed

    Van de Velde, Jan; Heyndrickx, Ken S; Vandepoele, Klaas

    2014-07-01

    Transcriptional regulation plays an important role in establishing gene expression profiles during development or in response to (a)biotic stimuli. Transcription factor binding sites (TFBSs) are the functional elements that determine transcriptional activity, and the identification of individual TFBSs in genome sequences is a major step toward inferring regulatory networks. We have developed a phylogenetic footprinting approach for the identification of conserved noncoding sequences (CNSs) across 12 dicot plants. Both alignment-based and alignment-free techniques were applied to identify functional motifs in a multispecies context, and our method accounts for incomplete motif conservation as well as high sequence divergence between related species. We identified 69,361 footprints associated with 17,895 genes. Through the integration of known TFBSs obtained from the literature and experimental studies, we used the CNSs to compile a gene regulatory network in Arabidopsis thaliana containing 40,758 interactions, of which two-thirds act through binding events located in DNase I hypersensitive sites. This network shows significant enrichment for in vivo targets of known regulators, and its overall quality was confirmed using five different biological validation metrics. Finally, through the integration of detailed expression and function information, we demonstrate how static CNSs can be converted into condition-dependent regulatory networks, offering opportunities for regulatory gene annotation.
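    The step from CNSs plus known TFBSs to a regulatory network can be sketched as follows: whenever a motif of a transcription factor occurs, on either strand, in a CNS assigned to a gene, add a TF-to-gene edge. This Python sketch uses exact matching and hypothetical gene, TF and motif names. It omits the DNase I hypersensitivity and validation steps described in the abstract.

```python
from typing import Dict, List, Set, Tuple

_COMP = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMP)[::-1]

def build_network(cns_by_gene: Dict[str, List[str]],
                  motifs_by_tf: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Return (TF, target gene) edges for every TF motif found, on either strand, in a CNS of the gene."""
    edges: Set[Tuple[str, str]] = set()
    for gene, cns_list in cns_by_gene.items():
        for cns in cns_list:
            cns = cns.upper()
            for tf, motifs in motifs_by_tf.items():
                if any(m.upper() in cns or revcomp(m.upper()) in cns for m in motifs):
                    edges.add((tf, gene))
    return edges
```

    Real footprinting pipelines typically score motif matches with position weight matrices rather than exact strings; exact matching keeps the sketch short.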

  14. A Developmental Sequence of Skills Leading to Conservation

    ERIC Educational Resources Information Center

    Walker, Alice A.

    1978-01-01

    Examines the developmental sequence of skills involved in the understanding of relational concepts and in the development of conservation. Fifty kindergarten children participated in the study. (BD/BR)

  16. Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles.

    PubMed

    Gautheret, D; Lambert, A

    2001-11-09

    We present here a new approach to the problem of defining RNA signatures and finding their occurrences in sequence databases. The proposed method is based on "secondary structure profiles". An RNA sequence alignment with secondary structure information is used as an input. Two types of weight matrices/profiles are constructed from this alignment: single strands are represented by a classical lod-scores profile while helical regions are represented by an extended "helical profile" comprising 16 lod-scores per position, one for each of the 16 possible base-pairs. Database searches are then conducted using a simultaneous search for helical profiles and dynamic programming alignment of single strand profiles. The algorithm has been implemented in a new program, ERPIN, that performs both profile construction and database search. Applications are presented for several RNA motifs. The automated use of sequence information in both single-stranded and helical regions yields better sensitivity/specificity ratios than descriptor-based programs. Furthermore, since the translation of alignments into profiles is straightforward with ERPIN, iterative searches can easily be conducted to enrich collections of homologous RNAs. Copyright 2001 Academic Press.
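    The "helical profile" idea, with 16 lod-scores per paired position, can be sketched in Python as below. For each helix position (i, j) it counts the base pairs seen in the alignment columns and converts them to log2-odds. The pseudocount of 1 and the uniform background of 1/16 per pair are assumptions, not ERPIN's documented defaults, and the function names are hypothetical.

```python
import math
from itertools import product
from typing import Dict, List, Tuple

BASES = "ACGU"
PAIRS = ["".join(p) for p in product(BASES, repeat=2)]  # the 16 possible base pairs

def helical_profile(alignment: List[str], helix: List[Tuple[int, int]],
                    pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """One dict of 16 log2-odds scores per paired position (i, j) of a helix.

    Background is assumed uniform (1/16 per pair); a sequence with a gap or
    non-ACGU symbol at either column is skipped for that pair.
    """
    profile = []
    for i, j in helix:
        counts = {pair: pseudocount for pair in PAIRS}
        for seq in alignment:
            pair = (seq[i] + seq[j]).upper().replace("T", "U")
            if pair in counts:
                counts[pair] += 1
        total = sum(counts.values())
        profile.append({pair: math.log2((c / total) / (1 / 16)) for pair, c in counts.items()})
    return profile

def score_helix(seq: str, helix: List[Tuple[int, int]], profile: List[Dict[str, float]]) -> float:
    """Sum the lod-scores of the base pairs that seq forms at the helix positions."""
    score = 0.0
    for (i, j), column in zip(helix, profile):
        pair = (seq[i] + seq[j]).upper().replace("T", "U")
        score += column.get(pair, float("-inf"))
    return score
```

    Scoring both columns of a pair jointly is what lets the profile reward compensatory changes (for example GC replaced by CG) that two independent single-column profiles would penalize.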

  17. Exceptional motifs in different Markov chain models for a statistical analysis of DNA sequences.

    PubMed

    Schbath, S; Prum, B; de Turckheim, E

    1995-01-01

    Identifying exceptional motifs is often used for extracting information from long DNA sequences. The two difficulties of the method are the choice of the model that defines the expected frequencies of words and the approximation of the variance of the difference T(W) between the number of occurrences of a word W and its estimated expected count. We consider here different Markov chain models, with either stationary or periodic transition probabilities. We estimate the variance of T(W) by the conditional variance of the number of occurrences of W given the oligonucleotide counts that define the model. Two applications show how to use asymptotically standard normal statistics associated with the counts to describe a given sequence in terms of its outlying words. Sequences of Escherichia coli and of Bacillus subtilis are compared with respect to their exceptional tri- and tetranucleotides. For both bacteria, exceptional 3-words are mainly found in the coding frame. E. coli palindrome counts are analyzed in different models, showing that many overabundant words are one-letter mutations of avoided palindromes.
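    Under a stationary Markov chain of order m, the expected count of a word w1...wk is usually estimated from observed oligonucleotide counts as the product of the counts of its (m+1)-letter subwords divided by the product of the counts of its inner m-letter subwords. The Python sketch below implements that estimator only. The conditional-variance approximation that is the subject of this paper is not reproduced, and boundary effects at the sequence ends are ignored.

```python
from collections import Counter

def word_counts(seq: str, k: int) -> Counter:
    """Overlapping counts of all k-letter words in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def expected_count(seq: str, word: str, order: int) -> float:
    """Expected occurrences of `word` under a stationary Markov chain of the given order.

    Estimator: prod_{i} N(w_i..w_{i+m}) / prod_{i>1} N(w_i..w_{i+m-1}), with m = order.
    Requires order >= 1 and len(word) > order + 1.
    """
    m = order
    k = len(word)
    if m < 1 or k <= m + 1:
        raise ValueError("need order >= 1 and len(word) > order + 1")
    n_m1 = word_counts(seq, m + 1)
    n_m = word_counts(seq, m)
    numerator = 1.0
    for i in range(k - m):
        numerator *= n_m1[word[i:i + m + 1]]
    denominator = 1.0
    for i in range(1, k - m):
        denominator *= n_m[word[i:i + m]]
    return numerator / denominator if denominator else 0.0
```

    For `ACGTACGTAC` under an order-1 model, N(AC) = 3, N(CG) = 2 and N(C) = 3, so the expected count of ACG is 3 × 2 / 3 = 2, which here equals the observed count T(W) = 0.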

  18. SIRW: a web server for the Simple Indexing and Retrieval System that combines sequence motif searches with keyword searches

    PubMed Central

    Ramu, Chenna

    2003-01-01

    SIRW (http://sirw.embl.de/) is a World Wide Web interface to the Simple Indexing and Retrieval System (SIR) that is capable of parsing and indexing various flat file databases. In addition it provides a framework for doing sequence analysis (e.g. motif pattern searches) for selected biological sequences through keyword search. SIRW is an ideal tool for the bioinformatics community for searching as well as analyzing biological sequences of interest. PMID:12824415

  19. SIRW: A web server for the Simple Indexing and Retrieval System that combines sequence motif searches with keyword searches.

    PubMed

    Ramu, Chenna

    2003-07-01

    SIRW (http://sirw.embl.de/) is a World Wide Web interface to the Simple Indexing and Retrieval System (SIR) that is capable of parsing and indexing various flat file databases. In addition it provides a framework for doing sequence analysis (e.g. motif pattern searches) for selected biological sequences through keyword search. SIRW is an ideal tool for the bioinformatics community for searching as well as analyzing biological sequences of interest.
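    Combining a keyword search with a motif pattern search can be sketched in Python as follows. The abstract does not say which pattern syntax SIRW accepts, so this assumes PROSITE-style patterns (for example `C-x(2,4)-[LIV]-{P}-H`) translated into regular expressions. The record layout (`id`, `description`, `sequence` keys) is hypothetical, and only the basic PROSITE elements are handled.

```python
import re
from typing import Dict, List

def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regex.

    Handles x, x(n), x(n,m), [..], {..}, repeats such as A(2), and the < and > anchors.
    """
    pattern = pattern.strip().rstrip(".")
    regex = []
    if pattern.startswith("<"):
        regex.append("^")
        pattern = pattern[1:]
    anchor_end = pattern.endswith(">")
    if anchor_end:
        pattern = pattern[:-1]
    for element in pattern.split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        regex.append(core)
    if anchor_end:
        regex.append("$")
    return "".join(regex)

def search(records: List[Dict[str, str]], keyword: str, pattern: str) -> List[str]:
    """Keyword filter on the description, then motif search on the sequence."""
    motif = re.compile(prosite_to_regex(pattern))
    return [r["id"] for r in records
            if keyword.lower() in r["description"].lower() and motif.search(r["sequence"])]
```

    Filtering by keyword first keeps the comparatively expensive regex search to the records the user actually cares about.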

  20. Synthesis, anti-mycobacterial activity and DNA sequence-selectivity of a library of biaryl-motifs containing polyamides.

    PubMed

    Brucoli, Federico; Guzman, Juan D; Maitra, Arundhati; James, Colin H; Fox, Keith R; Bhakta, Sanjib

    2015-07-01

    The alarming rise of extensively drug-resistant tuberculosis (XDR-TB) strains compels the development of new molecules with novel modes of action to control this world health emergency. Distamycin analogues containing N-terminal biaryl-motifs 2(1-5)(1-7) were synthesised using a solution-phase approach and evaluated for their anti-mycobacterial activity and DNA-sequence selectivity. Thiophene dimer motif-containing polyamide 2(2,6) exhibited 10-fold higher inhibitory activity against Mycobacterium tuberculosis compared to distamycin and library member 2(5,7) showed high binding affinity for the 5'-ACATAT-3' sequence.

  1. The nature of actinomycin D binding to d(AACCAXYG) sequence motifs.

    PubMed

    Chen, Fu-Ming; Sha, Feng; Chin, Ko-Hsin; Chou, Shan-Ho

    2004-01-01

    Earlier studies by others had indicated that actinomycin D (ACTD) binds well to d(AACCATAG) and the end sequence TAG-3' is essential for its strong binding. In an effort to verify these assertions and to uncover other possible strong ACTD binding sequences as well as to elucidate the nature of their binding, systematic studies have been carried out with oligomers of d(AACCAXYG) sequence motifs, where X and Y can be any DNA base. The results indicate that in addition to TAG-3', oligomers ending with XAG-3' and XCG-3' all provide binding constants ≥1 × 10^7 M^-1 and even sequences ending with XTG-3' and XGG-3' exhibit binding affinities in the range 1 to 8 × 10^6 M^-1. The nature of the strong ACTD affinity of the sequences d(A1A2C3C4A5X6Y7G8) was delineated via comparative binding studies of d(AACCAAAG), d(AGCCAAAG) and their base substituted derivatives. Two binding modes are proposed to coexist, with the major component consisting of the 3'-terminus G base folding back to base pair with C4 and the ACTD inserting at A2C3C4 by looping out the C3 while both faces of the chromophore are stacked by A and G bases, respectively. The minor mode is for the G to base pair with C3 and to have the same A/chromophore/G stacking but without a looped out base. These assertions are supported by induced circular dichroic and fluorescence spectral measurements.

  2. Unsupervised statistical discovery of spaced motifs in prokaryotic genomes.

    PubMed

    Tong, Hao; Schliekelman, Paul; Mrázek, Jan

    2017-01-05

    DNA sequences contain repetitive motifs which have various functions in the physiology of the organism. A number of methods have been developed for discovery of such sequence motifs with a primary focus on detection of regulatory motifs and particularly transcription factor binding sites. Most motif-finding methods apply probabilistic models to detect motifs characterized by an unusually high number of copies of the motif in the analyzed sequences. We present a novel method for detection of pairs of motifs separated by spacers of variable nucleotide sequence but conserved length. Unlike existing methods for motif discovery, the motifs themselves are not required to occur at unusually high frequency but only to exhibit a significant preference to occur at a specific distance from each other. In the present implementation of the method, motifs are represented by pentamers and all pairs of pentamers are evaluated for statistically significant preference for a specific distance. An important step of the algorithm eliminates motif pairs where the spacers separating the two motifs exhibit a high degree of sequence similarity; such motif pairs likely arise from duplications of the whole segment including the motifs and the spacer rather than due to selective constraints indicative of a functional importance of the motif pair. The method was used to scan 569 complete prokaryotic genomes for novel sequence motifs. Some motifs detected were previously known but other motifs found in the search appear to be novel. Selected motif pairs were subjected to further investigation and in some cases their possible biological functions were proposed. We present a new motif-finding technique that is applicable to scanning complete genomes for sequence motifs. The results from analysis of 569 genomes suggest that the method detects previously known motifs that are expected to be found as well as new motifs that are unlikely to be discovered by traditional motif-finding methods. We conclude
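    The core quantity in this approach is, for a pair of pentamers, how often the second follows the first at each spacer length. The Python sketch below builds that distance histogram and summarizes it with a simple peak-to-mean ratio. The ratio is only a stand-in for the paper's statistical test, which the abstract does not specify, and the spacer-similarity filter is not included.

```python
from collections import Counter
from typing import Tuple

def spacer_histogram(seq: str, motif_a: str, motif_b: str, max_spacer: int) -> Counter:
    """Count occurrences of motif_a followed by motif_b after a spacer of d nt (0 <= d <= max_spacer)."""
    seq = seq.upper()
    len_a = len(motif_a)
    starts_a = [i for i in range(len(seq) - len_a + 1) if seq.startswith(motif_a, i)]
    hist: Counter = Counter()
    for i in starts_a:
        for d in range(max_spacer + 1):
            if seq.startswith(motif_b, i + len_a + d):
                hist[d] += 1
    return hist

def distance_preference(hist: Counter, max_spacer: int) -> Tuple[int, float]:
    """Return (best spacer, peak count / mean count over all spacers); a crude stand-in statistic."""
    counts = [hist[d] for d in range(max_spacer + 1)]
    mean = sum(counts) / len(counts)
    best = max(range(max_spacer + 1), key=lambda d: (counts[d], -d))
    return best, (counts[best] / mean if mean else 0.0)
```

    In a toy sequence where TTGAC is always followed by TATAA after exactly 10 nt, all counts fall at d = 10 and the peak-to-mean ratio equals the number of spacer lengths examined.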

  3. Functional Role of Histidine in the Conserved His-x-Asp Motif in the Catalytic Core of Protein Kinases.

    PubMed

    Zhang, Lun; Wang, Jian-Chuan; Hou, Li; Cao, Peng-Rong; Wu, Li; Zhang, Qian-Sen; Yang, Huai-Yu; Zang, Yi; Ding, Jian-Ping; Li, Jia

    2015-05-11

    The His-x-Asp (HxD) motif is one of the most conserved structural components of the catalytic core of protein kinases; however, the functional role of the conserved histidine is unclear. Here we report that replacement of the HxD-histidine with arginine or phenylalanine in Aurora A abolishes both the catalytic activity and auto-phosphorylation, whereas the histidine-to-tyrosine substitution impairs the catalytic activity without affecting its auto-phosphorylation. Comparisons of the crystal structures of wild-type (WT) and mutant Aurora A demonstrate that the impairment of the kinase activity is accounted for by (1) disruption of the regulatory spine in the His-to-Arg mutant, and (2) change in the geometry of backbones of the Asp-Phe-Gly (DFG) motif and the DFG-1 residue in the His-to-Tyr mutant. In addition, bioinformatics analyses show that the HxD-histidine is a mutational hotspot in tumor tissues. Moreover, the H174R mutation of the HxD-histidine in the tumor suppressor LKB1 abrogates the inhibition of anchorage-independent growth of A549 cells by WT LKB1. Based on these data, we propose that the HxD-histidine is involved in a conserved inflexible organization of the catalytic core that is required for the kinase activity. Mutation of the HxD-histidine may also be involved in the pathogenesis of some diseases including cancer.

  4. Tyrosine-heme ligation in heme-peptide complex: design based on conserved motif of catalase.

    PubMed

    Rai, Jagdish; Raghothama, S; Sahal, D

    2007-06-01

    On the basis of evolutionary conservation of sequence in catalases, we have designed a heme-binding peptide (Ac-RLKSYTDTQISR12-(GGGG)-CRIVHC22-NH2) for the 'redox activity modulation' of heme. Heme-binding studies showed a blue-shifted Soret (369 nm) in the presence of TFE and a red-shifted Soret (418 nm) in the absence of TFE. These blue- and red-shifted Sorets suggest ligation through tyrosinate and histidine, respectively. This is the first designed peptide ligating to heme through tyrosine. NMR studies have confirmed that tyrosine ligation to heme in this heme-peptide complex occurs only in the presence of TFE. We suggest that TFE induces helicity in the peptide and brings the arginine and tyrosine in proximity, resulting in ionization of the phenolic side chain of tyrosine. In the absence of TFE, the unstructured peptide lacks the intra-molecular Arg(+)Tyr(-) ion pair, allowing heme binding to histidine. This peptide has significant peroxidase activity though it does not have catalase activity. Copyright (c) 2007 European Peptide Society and John Wiley & Sons, Ltd.

  5. A highly conserved redox-active Mx(2)CWx(6)R motif regulates Zap70 stability and activity

    PubMed Central

    Thurm, Christoph; Poltorak, Mateusz P.; Reimer, Elisa; Brinkmann, Melanie M.; Leichert, Lars; Schraven, Burkhart; Simeoni, Luca

    2017-01-01

    ζ-associated protein of 70 kDa (Zap70) is crucial for T-cell receptor (TCR) signaling. Loss of Zap70 in both humans and mice results in severe immunodeficiency. On the other hand, the expression of Zap70 in B-cell malignancies correlates with the severity of the disease. Because of its role in immune-related disorders, Zap70 has become a therapeutic target for the treatment of human diseases. It is well established that the activity and expression of Zap70 are regulated by post-translational modifications of crucial amino acids, including the phosphorylation of tyrosines and the ubiquitination of lysines. Here, we have investigated whether oxidation of cysteine residues also regulates Zap70 function. We have identified C575 as a major sulfenylation site of Zap70. A C575A substitution results in protein instability, reduced activity, and increased dependency on the Hsp90/Cdc37 chaperone system. Indeed, Cdc37 overexpression partially restored the expression, but fully restored the function, of Zap70C575A. C575 lies within a Mx(2)CWx(6)R motif which is highly conserved among almost all human tyrosine kinases. Mutation of any of the conserved amino acids, but not of a non-conserved residue preceding the cysteine, also results in Zap70 instability. Collectively, we have identified a new redox-active motif which is crucial for the regulation of Zap70 stability and activity. We believe that this motif has the potential to become a novel target for the development of therapeutic tools to modulate the expression and activity of kinases. PMID:28415650
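    The motif notation Mx(2)CWx(6)R reads as methionine, any two residues, cysteine, tryptophan, any six residues, then arginine, which maps directly onto a regular expression. The Python sketch below scans a protein for it and reports the position of the motif cysteine. The test sequence is synthetic, not Zap70; in Zap70 itself the paper places this cysteine at C575.

```python
import re
from typing import List, Tuple

# Mx(2)CWx(6)R: M, any two residues, C, W, any six residues, R (12 residues in total).
REDOX_MOTIF = re.compile(r"M.{2}CW.{6}R")

def find_redox_motif(protein: str) -> List[Tuple[int, int, str]]:
    """Return (1-based motif start, 1-based position of the motif cysteine, matched segment)."""
    hits = []
    for m in REDOX_MOTIF.finditer(protein.upper()):
        # The cysteine is the fourth residue of the motif (offset 3 from the match start).
        hits.append((m.start() + 1, m.start() + 4, m.group()))
    return hits
```

    Applied to a set of kinase sequences, the same scan would give a quick check of how widely the motif is conserved, subject to the caveat that `finditer` reports non-overlapping matches only.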

  6. SNPs Occur in Regions with Less Genomic Sequence Conservation

    PubMed Central

    Castle, John C.

    2011-01-01

    Rates of SNPs (single nucleotide polymorphisms) and cross-species genomic sequence conservation reflect intra- and inter-species variation, respectively. Here, I report SNP rates and genomic sequence conservation adjacent to mRNA processing regions and show that, as expected, more SNPs occur in less conserved regions and that functional regions have fewer SNPs. Results are confirmed using both mouse and human data. Regions include protein start codons, 3′ splice sites, 5′ splice sites, protein stop codons, predicted miRNA binding sites, and polyadenylation sites. Throughout, SNP rates are lower and conservation is higher at regulatory sites. Within coding regions, SNP rates are highest and conservation is lowest at codon position three and the fewest SNPs are found at codon position two, reflecting codon degeneracy for amino acid encoding. Exon splice sites show high conservation and very low SNP rates, reflecting both splicing signals and protein coding. Relaxed constraint on the codon third position is dramatically seen when separating exonic SNP rates based on intron phase. At polyadenylation sites, a peak of conservation and low SNP rate occurs from 30 to 17 nt preceding the site. This region is highly enriched for the sequence AAUAAA, reflecting the location of the conserved polyA signal. miRNA 3′ UTR target sites are predicted incorporating interspecies genomic sequence conservation; SNP rates are low in these sites, again showing fewer SNPs in conserved regions. Together, these results confirm that SNPs, reflecting recent genetic variation, occur more frequently in regions with less evolutionary conservation. PMID:21674007
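    The codon-position comparison can be made concrete with a short Python sketch: given SNP offsets within a single coding sequence, it computes SNPs per site at codon positions 1, 2 and 3. It assumes 0-based offsets from the first base of the start codon and a CDS length that is a multiple of 3; it ignores intron phase and pools nothing across genes, both of which a genome-wide analysis like this one would handle.

```python
from collections import Counter
from typing import Dict, Iterable

def snp_rate_by_codon_position(cds_length: int, snp_positions: Iterable[int]) -> Dict[int, float]:
    """SNPs per site at codon positions 1, 2 and 3 of one coding sequence.

    snp_positions are 0-based offsets from the first base of the start codon;
    offsets outside the CDS are ignored.
    """
    if cds_length % 3:
        raise ValueError("CDS length must be a multiple of 3")
    sites_per_position = cds_length // 3
    hits = Counter(pos % 3 + 1 for pos in snp_positions if 0 <= pos < cds_length)
    return {p: hits[p] / sites_per_position for p in (1, 2, 3)}
```

    In a 30-nt CDS with SNPs at offsets 0, 2, 4, 5, 8 and 11, four of the six fall on third codon positions, giving rates of 0.1, 0.1 and 0.4 for positions 1, 2 and 3.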

  7. CSTminer: a web tool for the identification of coding and noncoding conserved sequence tags through cross-species genome comparison.

    PubMed

    Castrignanò, Tiziana; Canali, Alessandro; Grillo, Giorgio; Liuni, Sabino; Mignone, Flavio; Pesole, Graziano

    2004-07-01

    The identification and characterization of genome tracts that are highly conserved across species during evolution may contribute significantly to the functional annotation of whole-genome sequences. Indeed, such sequences are likely to correspond to known or unknown coding exons or regulatory motifs. Here, we present a web server implementing a previously developed algorithm that, by comparing user-submitted genome sequences, is able to identify statistically significant conserved blocks and assess their coding or noncoding nature through the measure of a coding potential score. The web tool, available at http://www.caspur.it/CSTminer/, is dynamically interconnected with the Ensembl genome resources and produces a graphical output showing a map of detected conserved sequences and annotated gene features.
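    The idea of extracting conserved blocks from a cross-species comparison can be illustrated with a sliding-window identity filter over a pairwise alignment. This Python sketch is a generic stand-in, not CSTminer's statistical test, and it does not compute the coding potential score. The window size and identity threshold are assumed defaults.

```python
from typing import List, Tuple

def conserved_blocks(aln_a: str, aln_b: str, window: int = 20,
                     min_identity: float = 0.8) -> List[Tuple[int, int]]:
    """Merge alignment windows with identity >= min_identity into (start, end) column blocks, end exclusive.

    Identity is the number of matching, ungapped columns divided by the window length.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    matches = [a == b and a != "-" for a, b in zip(aln_a.upper(), aln_b.upper())]
    blocks: List[Tuple[int, int]] = []
    for start in range(len(matches) - window + 1):
        if sum(matches[start:start + window]) / window >= min_identity:
            end = start + window
            if blocks and start <= blocks[-1][1]:
                # Overlapping or adjacent window: extend the current block.
                blocks[-1] = (blocks[-1][0], max(blocks[-1][1], end))
            else:
                blocks.append((start, end))
    return blocks
```

    Merged windows extend slightly beyond the perfectly conserved core, because windows that only partly overlap it can still pass the threshold.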

  8. Structure of PEP carboxykinase from the succinate-producing Actinobacillus succinogenes: a new conserved active-site motif.

    PubMed

    Leduc, Yvonne A; Prasad, Lata; Laivenieks, Maris; Zeikus, J Gregory; Delbaere, Louis T J

    2005-07-01

    Actinobacillus succinogenes can produce, via fermentation, high concentrations of succinate, an important industrial commodity. A key enzyme in this pathway is phosphoenolpyruvate carboxykinase (PCK), which catalyzes the production of oxaloacetate from phosphoenolpyruvate and carbon dioxide, with the concomitant conversion of adenosine 5'-diphosphate to adenosine 5'-triphosphate. Structures of the native enzyme and of a pyruvate/Mn(2+)/phosphate complex have been solved at 1.85 and 1.70 Å resolution, respectively. The structure of the complex contains sulfhydryl reducing agents covalently bound to three cysteine residues via disulfide bonds. One of these cysteine residues (Cys285) is located in the active-site cleft and may be analogous to the putative reactive cysteine of PCK from Trypanosoma cruzi. Cys285 is also part of a previously unreported conserved motif comprising residues 280-287 and containing the pattern NXEXGXY(/F)A(/G); this new motif appears to have a structural role in stabilizing and positioning side chains that bind substrates and metal ions. The first few residues of this motif connect the two domains of the enzyme and a fulcrum point appears to be located near Asn280. In addition, an active-site Asp residue forms two coordinate bonds with the Mn(2+) ion present in the structure of the complex in a symmetrical bidentate manner, unlike in other PCK structures that contain a manganese ion.
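    The pattern NXEXGXY(/F)A(/G) appears to denote eight positions: N, any residue, E, any residue, G, any residue, Y or F, then A or G. That count matches the eight residues 280-287 given in the abstract. The Python sketch below encodes this reading as a regular expression; the interpretation of "(/F)" and "(/G)" as alternatives is an assumption, and the test sequences are synthetic.

```python
import re
from typing import List, Tuple

# NXEXGXY(/F)A(/G) read as: N, any, E, any, G, any, Y or F, A or G (eight residues).
PCK_MOTIF = re.compile(r"N.E.G.[YF][AG]")

def find_pck_motif(protein: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched segment) for each non-overlapping motif occurrence."""
    return [(m.start() + 1, m.group()) for m in PCK_MOTIF.finditer(protein.upper())]
```

    In the real enzyme the reported match would start at Asn280, with the active-site Cys285 at the sixth (X) position.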

  9. A conserved motif in JNK/p38-specific MAPK phosphatases as a determinant for JNK1 recognition and inactivation

    PubMed Central

    Liu, Xin; Zhang, Chen-Song; Lu, Chang; Lin, Sheng-Cai; Wu, Jia-Wei; Wang, Zhi-Xin

    2016-01-01

    Mitogen-activated protein kinases (MAPKs), important in a large array of signalling pathways, are tightly controlled by a cascade of protein kinases and by MAPK phosphatases (MKPs). MAPK signalling efficiency and specificity are modulated by protein–protein interactions between individual MAPKs and the docking motifs in cognate binding partners. Two types of docking interactions have been identified: D-motif-mediated interaction and FXF-docking interaction. Here we report the crystal structure of JNK1 bound to the catalytic domain of MKP7 at 2.4-Å resolution, providing high-resolution structural insight into the FXF-docking interaction. The 285FNFL288 segment in MKP7 directly binds to a hydrophobic site on JNK1 that is near the MAPK insertion and helix αG. Biochemical studies further reveal that this highly conserved structural motif is present in all members of the MKP family, and the interaction mode is universal and critical for the MKP-MAPK recognition and biological function. PMID:26988444

  10. Engineering Proteins with Enhanced Mechanical Stability by Force Specific Sequence Motifs

    PubMed Central

    Lu, Wenzhe; Negi, Surendra; Oberhauser, Andres F.; Braun, Werner

    2012-01-01

    Use of atomic force microscopy (AFM) has recently led to a better understanding of the molecular mechanisms of the unfolding process by mechanical forces; however, the rational design of novel proteins with specific mechanical strength remains challenging. We have approached this problem from a new perspective that generates linear physical-chemical properties (PCP) motifs from a limited AFM data set. Guided by our linear sequence analysis we designed and analyzed four new mutants of the titin I1 domain with the goal of increasing the domain's mechanical strength. All four mutants could be cloned and expressed as soluble proteins. AFM data indicate that at least two of the mutants have increased molecular mechanical strength. This observation suggests that the PCP method is useful to graft sequences specific for high mechanical stability to weak proteins to increase their mechanical stability, and represents an additional tool in the design of novel proteins besides steered molecular dynamics calculations, coarse grained simulations and phi-value analysis of the transition state. PMID:22274941

  11. Structural analysis of the regulatory elements of the type-II procollagen gene. Conservation of promoter and first intron sequences between human and mouse.

    PubMed Central

    Vikkula, M; Metsäranta, M; Syvänen, A C; Ala-Kokko, L; Vuorio, E; Peltonen, L

    1992-01-01

    Transcription of the type-II procollagen gene (COL2A1) is very specifically restricted to a limited number of tissues, particularly cartilages. In order to identify transcription-control motifs we have sequenced the promoter region and the first intron of the human and mouse COL2A1 genes. With the assumption that these motifs should be well conserved during evolution, we have searched for potential elements important for the tissue-specific transcription of the COL2A1 gene by aligning the two sequences with each other and with the available rat type-II procollagen sequence for the promoter. With this approach we could identify specific evolutionarily well-conserved motifs in the promoter area. On the other hand, several suggested regulatory elements in the promoter region did not show evolutionary conservation. In the middle of the first intron we found a cluster of well-conserved transcription-control elements and we conclude that these conserved motifs most probably possess a significant function in the control of the tissue-specific transcription of the COL2A1 gene. We also describe locations of additional, highly conserved nucleotide stretches, which are good candidate regions in the search for binding sites of yet-uncharacterized cartilage-specific transcription regulators of the COL2A1 gene. PMID:1637314

  12. Explorations of linked editosome domains leading to the discovery of motifs defining conserved pockets in editosome OB-folds.

    PubMed

    Park, Young-Jun; Hol, Wim G J

    2012-11-01

    Trypanosomatids form a group of protozoa which contain parasites of humans, animals and plants. Several of these species cause major human diseases, including Trypanosoma brucei, which is the causative agent of human African trypanosomiasis, also called sleeping sickness. These organisms have many highly unusual features, including a unique U-insertion/deletion RNA editing process in the single mitochondrion. A key multi-protein complex, called the ∼20S editosome, or editosome, carries out a cascade of essential RNA-modifying reactions and contains a core of 12 different proteins, of which six are the interaction proteins A1 to A6. Each of these interaction proteins comprises a C-terminal OB-fold, and the smallest interaction protein, A6, has been shown to interact with four other editosome OB-folds. Here we report the results of a "linked OB-fold" approach to obtain a view of how multiple OB-folds might interact in the core of the editosome. Constructs with variants of linked domains in 25 expression and co-expression experiments resulted in 13 soluble multi-OB-fold complexes. In several instances, these complexes were more homogeneous in size than those obtained from corresponding unlinked OB-folds. The crystal structure of A3(OB) linked to A6 was elucidated and confirmed the tight interaction between these two OB domains, as also seen in our recent complex of A3(OB) and A6 with nanobodies. In the current crystal structure of A3(OB) linked to A6, hydrophobic side chains reside in well-defined pockets of neighboring OB-fold domains. When analyzing the available crystal structures of editosome OB-folds, it appears that in five instances "Pocket 1" of A1(OB), A3(OB) and A6 is occupied by a hydrophobic side chain from a neighboring protein. In these three different OB-folds, Pocket 1 is formed by two conserved sequence motifs and an invariant arginine. These pockets might play a key role in the assembly or mechanism of the editosome by interacting with hydrophobic

  13. Comparative sequence and structure analysis reveals the conservation and diversity of nucleotide positions and their associated tertiary interactions in the riboswitches.

    PubMed

    Appasamy, Sri D; Ramlan, Effirul Ikhwan; Firdaus-Raih, Mohd

    2013-01-01

    The tertiary motifs in complex RNA molecules play vital roles, either stabilizing the formation of the RNA 3D structure or providing important biological functionality to the molecule. To better understand the roles of these tertiary motifs in riboswitches, we examined 11 representative riboswitch PDB structures for agreement between motif occurrence and conservation. A total of 61 unique tertiary interactions were found in the reference structures. In addition to the expected common A-minor motifs and base triples, mainly involved in linking distant regions of the riboswitch structures, three highly conserved variants of A-minor interactions, called G-minors, were found in the SAM-I and FMN riboswitches, where they appear to be involved in recognizing the respective ligands' functional groups. In our structural survey and the corresponding structure and sequence alignments, the agreement between motif occurrence and conservation is very prominent across the representative riboswitches. Our analysis provides evidence that some of these tertiary interactions are essential structural components whose sequence positions are conserved despite a high degree of diversity in other parts of the respective riboswitch sequences. This indicates a vital role for these tertiary interactions in determining the specific biological function of riboswitches.

  14. Comparative Sequence and Structure Analysis Reveals the Conservation and Diversity of Nucleotide Positions and Their Associated Tertiary Interactions in the Riboswitches

    PubMed Central

    Appasamy, Sri D.; Ramlan, Effirul Ikhwan; Firdaus-Raih, Mohd

    2013-01-01

    The tertiary motifs in complex RNA molecules play vital roles, either stabilizing the formation of the RNA 3D structure or providing important biological functionality to the molecule. To better understand the roles of these tertiary motifs in riboswitches, we examined 11 representative riboswitch PDB structures for agreement between motif occurrence and conservation. A total of 61 unique tertiary interactions were found in the reference structures. In addition to the expected common A-minor motifs and base triples, mainly involved in linking distant regions of the riboswitch structures, three highly conserved variants of A-minor interactions, called G-minors, were found in the SAM-I and FMN riboswitches, where they appear to be involved in recognizing the respective ligands' functional groups. In our structural survey and the corresponding structure and sequence alignments, the agreement between motif occurrence and conservation is very prominent across the representative riboswitches. Our analysis provides evidence that some of these tertiary interactions are essential structural components whose sequence positions are conserved despite a high degree of diversity in other parts of the respective riboswitch sequences. This indicates a vital role for these tertiary interactions in determining the specific biological function of riboswitches. PMID:24040136

  15. Comparative Analysis of P450 Signature Motifs EXXR and CXG in the Large and Diverse Kingdom of Fungi: Identification of Evolutionarily Conserved Amino Acid Patterns Characteristic of P450 Family

    PubMed Central

    Syed, Khajamohiddin; Mashele, Samson Sitheni

    2014-01-01

    Cytochrome P450 monooxygenases (P450s) are heme-thiolate proteins distributed across the biological kingdoms. P450s are catalytically versatile and play key roles in organisms' primary and secondary metabolism. Identification of P450s across the biological kingdoms depends largely on the identification of two P450 signature motifs, EXXR and CXG, in the protein sequence. Once a putative protein has been identified as a P450, it is assigned to a family and subfamily based on the criteria that P450s within a family share more than 40% homology and members of subfamilies share more than 55% homology. However, to date, no evidence has been presented that can distinguish members of a P450 family. Here, for the first time, we report the identification of EXXR- and CXG-motif-based amino acid patterns that are characteristic of the P450 family. Analysis of P450 signature motifs in the under-explored fungal P450s from four different phyla, ascomycota, basidiomycota, zygomycota and chytridiomycota, indicated that the EXXR motif is highly variable and the CXG motif is somewhat variable. The amino acids threonine and leucine are preferred as second and third amino acids in the EXXR motif, and proline and glycine are preferred as second and third amino acids in the CXG motif in fungal P450s. Analysis of 67 P450 families from biological kingdoms such as plants, animals, bacteria and fungi showed conservation of a set of amino acid patterns characteristic of a particular P450 family in EXXR and CXG motifs. This suggests that during the divergence of P450 families from a common ancestor these amino acid patterns evolved and were retained in each P450 family as a signature of that family. The role of amino acid patterns characteristic of a P450 family in the structural and/or functional aspects of members of the P450 family is a topic for future research. PMID:24743800
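    The sketch below scans for the two signature motifs and tallies the residues that fill the XX positions of EXXR. It assumes plain one-letter protein strings. The function names and toy sequences are hypothetical. CXG matches any cysteine followed by one residue and a glycine, so a real annotation would probably restrict the search to the heme-binding region rather than scan the whole protein.

```python
import re
from collections import Counter

# Hypothetical patterns; lookahead groups let overlapping matches be reported.
EXXR = re.compile(r"(?=(E(..)R))")
CXG = re.compile(r"(?=(C(.)G))")


def find_signature_motifs(protein):
    """Return (0-based position, motif) lists for EXXR and CXG matches."""
    seq = protein.upper()
    exxr = [(m.start(), m.group(1)) for m in EXXR.finditer(seq)]
    cxg = [(m.start(), m.group(1)) for m in CXG.finditer(seq)]
    return exxr, cxg


def exxr_middle_residues(proteins):
    """Tally the two 'XX' residues of every EXXR match across a set of sequences."""
    counts = Counter()
    for protein in proteins:
        for m in EXXR.finditer(protein.upper()):
            counts[m.group(2)] += 1
    return counts


print(find_signature_motifs("MAETLRKPFCPGAG"))  # ([(2, 'ETLR')], [(9, 'CPG')])
```

    A tally such as `exxr_middle_residues` is one simple way to see a preference like the "TL" pairing the abstract reports for fungal EXXR motifs.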

  16. Flow Cytometry-assisted Cloning of Specific Sequence Motifs from Complex 16S ribosomal RNA Gene Libraries.

    SciTech Connect

    Nielsen, J.L.; Schramm, A.; Bernhard, A.E.; van den Engh, G.J.; Stahl, D.A.

    2004-07-21

    A flow cytometry method was developed for rapid screening and recovery of cloned DNA containing common sequence motifs. This approach, termed fluorescence-activated cell sorting-assisted cloning, was used to recover sequences affiliated with a unique lineage within the Bacteroidetes not abundant in a clone library of environmental 16S rRNA genes. Retrieval and sequence analysis of phylogenetically informative genes has become a standard cultivation-independent technique to investigate microbial diversity in nature (7, 18). Genes encoding the 16S rRNA, because of the relative ease of their selective amplification, have been most frequently employed for general diversity surveys (16). Environmental studies have also focused on specific subpopulations affiliated with a phylogenetic group or identified by genes encoding specific metabolic functions (e.g., ammonia oxidation, sulfate respiration, and nitrate reduction) (8, 15, 20). However, specific populations may be of low abundance (1, 23), or the genes encoding specific metabolic functions may be insufficiently conserved to provide priming sites for general PCR amplification. Three general approaches have been used to obtain 16S rRNA sequence information from low-abundance populations: screening hundreds to thousands of clones in a general 16S rRNA gene library (21), flow cytometric sorting of a subpopulation of environmentally derived cells labeled by fluorescent in situ hybridization (FISH) (27), or selective PCR amplification using primers specific for the subpopulation (2, 23). While the first approach is simply time-consuming and tedious, the second has been restricted to fairly large and strongly fluorescent cells from aquatic samples (5, 27). The third approach often generates fragments of only a few hundred bases due to the limited number of specific priming sites. Partial sequence information often degrades analysis, obscuring or distorting the phylogenetic placement of the new sequences (11, 20). A more robust characterization of environ

  17. Structure of the Brd4 ET domain bound to a C-terminal motif from γ-retroviral integrases reveals a conserved mechanism of interaction.

    PubMed

    Crowe, Brandon L; Larue, Ross C; Yuan, Chunhua; Hess, Sonja; Kvaratskhelia, Mamuka; Foster, Mark P

    2016-02-23

    The bromodomain and extraterminal domain (BET) protein family are promising therapeutic targets for a range of diseases linked to transcriptional activation, cancer, viral latency, and viral integration. Tandem bromodomains selectively tether BET proteins to chromatin by engaging cognate acetylated histone marks, and the extraterminal (ET) domain is the focal point for recruiting a range of cellular and viral proteins. BET proteins guide γ-retroviral integration to transcription start sites and enhancers through bimodal interaction with chromatin and the γ-retroviral integrase (IN). We report the NMR-derived solution structure of the Brd4 ET domain bound to a conserved peptide sequence from the C terminus of murine leukemia virus (MLV) IN. The complex reveals a protein-protein interaction governed by the binding-coupled folding of disordered regions in both interacting partners to form a well-structured intermolecular three-stranded β sheet. In addition, we show that a peptide comprising the ET binding motif (EBM) of MLV IN can disrupt the cognate interaction of Brd4 with NSD3, and that substitutions of Brd4 ET residues essential for binding MLV IN also impair interaction of Brd4 with a number of cellular partners involved in transcriptional regulation and chromatin remodeling. This suggests that γ-retroviruses have evolved the EBM to mimic a cognate interaction motif to achieve effective integration in host chromatin. Collectively, our findings identify key structural features of the ET domain of Brd4 that allow for interactions with both cellular and viral proteins.

  18. Direct contacts between conserved motifs of different subunits provide major contribution to active site organization in human and mycobacterial dUTPases

    PubMed Central

    Takács, Enikő; Nagy, Gergely; Leveles, Ibolya; Harmat, Veronika; Lopata, Anna; Tóth, Judit; Vértessy, Beáta G.

    2010-01-01

    dUTPases are essential for genome integrity. Recent results allowed characterization of the role of conserved residues. Here we analyzed the Asp/Asn mutation within conserved Motif I of human and mycobacterial dUTPases, wherein the Asp residue was previously implicated in Mg2+ coordination. Our results on transient/steady-state kinetics, ligand binding and a 1.80 Å-resolution structure of the mutant mycobacterial enzyme, in comparison with wild-type and C-terminally truncated structures, argue that this residue has a major role in providing intra- and intersubunit contacts but is not essential for Mg2+ accommodation. We conclude that, in addition to the role of conserved motifs in substrate accommodation, direct subunit interactions between protein atoms of active site residues from different conserved motifs are crucial for enzyme function. PMID:20493855

  19. SarA, a global regulator of virulence determinants in Staphylococcus aureus, binds to a conserved motif essential for sar-dependent gene regulation.

    PubMed

    Chien, Y; Manna, A C; Projan, S J; Cheung, A L

    1999-12-24

    The expression of many virulence determinants in Staphylococcus aureus, including alpha-hemolysin, protein A, and fibronectin-binding proteins, is controlled by global regulatory loci such as sar and agr. In addition to controlling target gene expression via agr (e.g. alpha-hemolysin), the sar locus can also regulate target gene transcription via agr-independent mechanisms. In particular, we have found that SarA, the major regulatory protein encoded within sar, binds to a conserved sequence, homologous to the SarA-binding site on the agr promoter, upstream of the -35 promoter boxes of several target genes including hla (alpha-hemolysin gene), spa (protein A gene), fnb (fibronectin-binding protein genes), and sec (enterotoxin C gene). Deletion of the SarA recognition motif in the promoter regions of agr and hla in shuttle plasmids rendered the transcription of these genes undetectable in agr and hla mutants, respectively. Likewise, the transcription activity of spa (a gene normally repressed by sar), as measured by a XylE reporter fusion assay, became derepressed in a wild-type strain containing a shuttle plasmid in which the SarA recognition site had been deleted from the spa promoter region. However, DNase I footprinting assays demonstrated that the SarA-binding regions on the spa and hla promoters are more extensive than the predicted consensus sequence, raising the possibility that the consensus sequence is an activation site within a larger binding region. Because sar and agr regulate an assortment of virulence factors in S. aureus, we propose, based on our data, a unifying hypothesis for virulence gene activation in S. aureus whereby SarA is a regulatory protein that binds to its consensus SarA recognition motif to activate (e.g. hla) or repress (e.g. spa) the transcription of sar target genes, thus accounting for both the agr-dependent and agr-independent modes of regulation.

  20. Localization of Proteins to the 1,2-Propanediol Utilization Microcompartment by Non-native Signal Sequences Is Mediated by a Common Hydrophobic Motif*

    PubMed Central

    Jakobson, Christopher M.; Kim, Edward Y.; Slininger, Marilyn F.; Chien, Alex; Tullman-Ercek, Danielle

    2015-01-01

    Various bacteria localize metabolic pathways to proteinaceous organelles known as bacterial microcompartments (MCPs), enabling the metabolism of carbon sources to enhance survival and pathogenicity in the gut. There is considerable interest in exploiting bacterial MCPs for metabolic engineering applications, but little is known about the interactions between MCP signal sequences and the protein shells of different MCP systems. We found that the N-terminal sequences from the ethanolamine utilization (Eut) and glycyl radical-generating protein MCPs are able to target reporter proteins to the 1,2-propanediol utilization (Pdu) MCP, and that this localization is mediated by a conserved hydrophobic residue motif. Recapitulation of this motif by the addition of a single amino acid conferred targeting function on an N-terminal sequence from the ethanol utilization MCP system that previously did not act as a Pdu signal sequence. Moreover, the Pdu-localized signal sequences competed with native Pdu targeting sequences for encapsulation in the Pdu MCP. Salmonella enterica natively possesses both the Pdu and Eut operons, and our results suggest that Eut proteins might be localized to the Pdu MCP in vivo. We further demonstrate that S. enterica LT2 retained the ability to grow on 1,2-propanediol as the sole carbon source when a Pdu enzyme was replaced with its Eut homolog. Although the relevance of this finding to the native system remains to be explored, we show that the Pdu-localized signal sequences described herein allow control over the ratio of heterologous proteins encapsulated within Pdu MCPs. PMID:26283792

  1. Localization of proteins to the 1,2-propanediol utilization microcompartment by non-native signal sequences is mediated by a common hydrophobic motif.

    PubMed

    Jakobson, Christopher M; Kim, Edward Y; Slininger, Marilyn F; Chien, Alex; Tullman-Ercek, Danielle

    2015-10-02

    Various bacteria localize metabolic pathways to proteinaceous organelles known as bacterial microcompartments (MCPs), enabling the metabolism of carbon sources to enhance survival and pathogenicity in the gut. There is considerable interest in exploiting bacterial MCPs for metabolic engineering applications, but little is known about the interactions between MCP signal sequences and the protein shells of different MCP systems. We found that the N-terminal sequences from the ethanolamine utilization (Eut) and glycyl radical-generating protein MCPs are able to target reporter proteins to the 1,2-propanediol utilization (Pdu) MCP, and that this localization is mediated by a conserved hydrophobic residue motif. Recapitulation of this motif by the addition of a single amino acid conferred targeting function on an N-terminal sequence from the ethanol utilization MCP system that previously did not act as a Pdu signal sequence. Moreover, the Pdu-localized signal sequences competed with native Pdu targeting sequences for encapsulation in the Pdu MCP. Salmonella enterica natively possesses both the Pdu and Eut operons, and our results suggest that Eut proteins might be localized to the Pdu MCP in vivo. We further demonstrate that S. enterica LT2 retained the ability to grow on 1,2-propanediol as the sole carbon source when a Pdu enzyme was replaced with its Eut homolog. Although the relevance of this finding to the native system remains to be explored, we show that the Pdu-localized signal sequences described herein allow control over the ratio of heterologous proteins encapsulated within Pdu MCPs.

  2. Helicobacter pylori CagA: analysis of sequence diversity in relation to phosphorylation motifs and implications for the role of CagA as a virulence factor.

    PubMed

    Evans, D J; Evans, D G

    2001-09-01

    CagA is transported into host target cells and subsequently phosphorylated. Clearly this is a mechanism by which Helicobacter pylori could take control of one or more host cell signal transduction pathways. Presumably the end result of this interaction favors survival of H. pylori, irrespective of eventual damage to the host cell. CagA is noted for its amino acid (AA) sequence diversity, both within and outside the variable region of the molecule. The primary purpose of this review is to examine how variation in the type and number of CagA phosphorylation sites might determine the outcome of infection by different strains of H. pylori. The answer to this question could help to explain the widely disparate results obtained when H. pylori CagA status has been compared to type and severity of disease outcome in different populations, that is, in different countries. Analysis of all available CagA sequences revealed that CagA contains both tyrosine phosphorylation motifs (TPMs) and cyclic-AMP-dependent phosphorylation motifs (CPMs). There are two potential CPMs near the N-terminus of CagA and at least two in the repeat region; these are not all equally well conserved. We also defined a 48-residue AA sequence, which includes the N-terminal TPM at tyrosine (Y)-122, which distinguishes between Eastern (Hong Kong-Taiwan-Japan-Thailand) H. pylori isolates and those from the West (Europe-Africa-the Americas-Australia). All 28 of the Eastern type CagA proteins have a functional N-terminal TPM, whereas 11 of 47 (23.4%) of the Western type contain an inactive motif, with threonine (T) replacing the critical aspartic acid (D) residue. Only 13 of 24 (54%) known CagA sequences have an active TPM in the repeat region, and only one has two TPMs in this region. The potential TPM near the C-terminus of CagA is not likely to be important, since only 3 of 24 (12.5%) sequences were found to be intact. Protein database searches revealed that the AA sequence immediately following the TPM at Y

  3. Endocytosis and Trafficking of Natriuretic Peptide Receptor-A: Potential Role of Short Sequence Motifs

    PubMed Central

    Pandey, Kailash N.

    2015-01-01

    The targeted endocytosis and redistribution of transmembrane receptors among membrane-bound subcellular organelles are vital for their correct signaling and physiological functions. Membrane receptors committed for internalization and trafficking pathways are sorted into coated vesicles. Cardiac hormones, atrial and brain natriuretic peptides (ANP and BNP) bind to guanylyl cyclase/natriuretic peptide receptor-A (GC-A/NPRA) and elicit the generation of intracellular second messenger cyclic guanosine 3',5'-monophosphate (cGMP), which lowers blood pressure and incidence of heart failure. After ligand binding, the receptor is rapidly internalized, sequestrated, and redistributed into intracellular locations. Thus, NPRA is considered a dynamic cellular macromolecule that traverses different subcellular locations through its lifetime. The utilization of pharmacologic and molecular perturbants has helped in delineating the pathways of endocytosis, trafficking, down-regulation, and degradation of membrane receptors in intact cells. This review describes the investigation of the mechanisms of internalization, trafficking, and redistribution of NPRA compared with other cell surface receptors from the plasma membrane into the cell interior. The roles of different short-signal peptide sequence motifs in the internalization and trafficking of other membrane receptors have been briefly reviewed and their potential significance in the internalization and trafficking of NPRA is discussed. PMID:26151885

  4. Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks.

    PubMed Central

    Tatusov, R L; Altschul, S F; Koonin, E V

    1994-01-01

    We describe an approach to analyzing protein sequence databases that, starting from a single uncharacterized sequence or group of related sequences, generates blocks of conserved segments. The procedure involves iterative database scans with an evolving position-dependent weight matrix constructed from a coevolving set of aligned conserved segments. For each iteration, the expected distribution of matrix scores under a random model is used to set a cutoff score for the inclusion of a segment in the next iteration. This cutoff may be calculated to allow the chance inclusion of either a fixed number or a fixed proportion of false positive segments. With sufficiently high cutoff scores, the procedure converged for all alignment blocks studied, with varying numbers of iterations required. Different methods for calculating weight matrices from alignment blocks were compared. The most effective of those tested was a logarithm-of-odds, Bayesian-based approach that used prior residue probabilities calculated from a mixture of Dirichlet distributions. The procedure described was used to detect novel conserved motifs of potential biological importance. PMID:7991589
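    The iterative block-scanning idea can be sketched in Python as follows. This is a simplified illustration, not the authors' procedure, and it differs from it in three ways. It uses additive pseudocounts instead of Dirichlet-mixture priors. It applies a fixed score cutoff instead of one derived from the random-model score distribution. It handles ungapped segments only. All names (`build_pssm`, `score`, `iterate_blocks`), the uniform background, the cutoff and the toy database are assumptions made for the example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(segments, background, pseudocount=1.0):
    """Log-odds matrix from equal-length, ungapped aligned segments.

    Simplification: additive pseudocounts stand in for Dirichlet-mixture priors.
    """
    width = len(segments[0])
    total = len(segments) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(width):
        counts = Counter(seg[col] for seg in segments)
        pssm.append({aa: math.log(((counts[aa] + pseudocount) / total) / background[aa])
                     for aa in AMINO_ACIDS})
    return pssm


def score(pssm, segment, unknown=-10.0):
    """Sum of per-column log-odds; residues outside AMINO_ACIDS get a fixed penalty."""
    return sum(column.get(aa, unknown) for column, aa in zip(pssm, segment))


def iterate_blocks(seeds, database, background, cutoff, max_iter=10):
    """Rescan until the set of (sequence index, start) hits stops changing.

    Simplification: a fixed cutoff replaces the random-model-derived one.
    Returns (pssm, hits, iterations used).
    """
    width = len(seeds[0])
    segments = list(seeds)
    previous = None
    for iteration in range(1, max_iter + 1):
        pssm = build_pssm(segments, background)
        hits = {(i, s)
                for i, seq in enumerate(database)
                for s in range(len(seq) - width + 1)
                if score(pssm, seq[s:s + width]) >= cutoff}
        if hits == previous:
            return pssm, hits, iteration
        previous = hits
        # Rebuild the block from the new hits; keep the old block if nothing passed.
        segments = [database[i][s:s + width] for i, s in sorted(hits)] or segments
    return pssm, previous, max_iter


background = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
database = ["AAAAWCHWAAAA", "GGGWCHYGGG", "KKKKKKKK"]
pssm, hits, n_iter = iterate_blocks(["WCHW", "WCHW"], database, background, cutoff=2.5)
print(sorted(hits), n_iter)  # [(0, 4), (1, 3)] 2
```

    Starting from the seed `WCHW`, the first scan also picks up the variant `WCHY`. The matrix rebuilt from both segments returns the same two hits, so this toy run converges on the second iteration.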

  5. Finishing and Special Motifs: Lessons Learned from CRISPR Analysis Using Next-Generation Draft Sequences (7th Annual SFAF Meeting, 2012)

    ScienceCinema

    Campbell, Catherine [Noblis]

    2016-07-12

    Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  6. Finishing and Special Motifs: Lessons Learned from CRISPR Analysis Using Next-Generation Draft Sequences (7th Annual SFAF Meeting, 2012)

    SciTech Connect

    Campbell, Catherine

    2012-06-01

    Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  7. Triazine-Based Sequence-Defined Polymers with Side-Chain Diversity and Backbone-Backbone Interaction Motifs.

    PubMed

    Grate, Jay W; Mo, Kai-For; Daily, Michael D

    2016-03-14

    Sequence control in polymers, well-known in nature, encodes structure and functionality. Here we introduce a new architecture, based on the nucleophilic aromatic substitution chemistry of cyanuric chloride, that creates a new class of sequence-defined polymers dubbed TZPs. Proof of concept is demonstrated with two synthesized hexamers, having neutral and ionizable side chains. Molecular dynamics simulations show backbone-backbone interactions, including H-bonding motifs and pi-pi interactions. This architecture is arguably biomimetic while differing from sequence-defined polymers having peptide bonds. The synthetic methodology supports the structural diversity of side chains known in peptides, as well as backbone-backbone hydrogen-bonding motifs, and will thus enable new macromolecules and materials with useful functions.

  8. Rust Secreted Protein Ps87 Is Conserved in Diverse Fungal Pathogens and Contains a RXLR-like Motif Sufficient for Translocation into Plant Cells

    PubMed Central

    Gu, Biao; Kale, Shiv D.; Wang, Qinhu; Wang, Dinghe; Pan, Qiaona; Cao, Hua; Meng, Yuling; Kang, Zhensheng; Tyler, Brett M.; Shan, Weixing

    2011-01-01

    Background Effector proteins of biotrophic plant pathogenic fungi and oomycetes are delivered into host cells and play important roles in both disease development and disease resistance response. How obligate fungal pathogen effectors enter host cells is poorly understood. The Ps87 gene of Puccinia striiformis encodes a protein that is conserved in diverse fungal pathogens. Ps87 homologs from a clade containing rust fungi are predicted to be secreted. The aim of this study is to test whether Ps87 may act as an effector during Puccinia striiformis infection. Methodology/Principal Findings A yeast signal sequence trap assay showed that the rust protein Ps87 could be secreted from yeast cells, but a homolog from Magnaporthe oryzae that was not predicted to be secreted could not. Cell re-entry and protein uptake assays confirmed that a region of Ps87 containing a conserved RXLR-like motif, [K/R]RLTG, was capable of delivering the oomycete effector Avr1b into soybean leaf cells and carrying GFP into soybean root cells. Mutations in the Ps87 motif (KRLTG) abolished the protein translocation ability. Conclusions/Significance The results suggest that Ps87 and its secreted homologs could use protein translocation machinery similar to that of oomycete and other fungal pathogens. Ps87 did not show direct suppression activity on plant defense responses. These results suggest Ps87 may represent an “emerging effector” that has recently acquired the ability to enter plant cells but has not yet acquired the ability to alter host physiology. PMID:22076138
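    To illustrate the motif definition, the [K/R]RLTG pattern can be written as a regular expression and scanned across a protein sequence. This sketch only flags candidate matches. The translocation activity reported above came from cell re-entry and uptake assays, and a sequence scan cannot substitute for them. The function name and test sequences are hypothetical.

```python
import re

# [K/R]RLTG: K or R, followed by the fixed residues R, L, T, G.
RXLR_LIKE = re.compile(r"[KR]RLTG")


def find_rxlr_like(protein):
    """Hypothetical helper: return (0-based position, matched motif) for each [K/R]RLTG."""
    return [(m.start(), m.group()) for m in RXLR_LIKE.finditer(protein.upper())]


print(find_rxlr_like("MSKRLTGAAERRLTGQ"))  # [(2, 'KRLTG'), (10, 'RRLTG')]
print(find_rxlr_like("MSAALAGAAE"))        # [] (toy "mutated" sequence)
```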

  9. Highly conserved repetitive DNA sequences are present at human centromeres.

    PubMed Central

    Grady, D L; Ratliff, R L; Robinson, D L; McCanlies, E C; Meyne, J; Moyzis, R K

    1992-01-01

    Highly conserved repetitive DNA sequence clones, largely consisting of (GGAAT)n repeats, have been isolated from a human recombinant repetitive DNA library by high-stringency hybridization with rodent repetitive DNA. This sequence, the predominant repetitive sequence in human satellites II and III, is similar to the essential core DNA of the Saccharomyces cerevisiae centromere, centromere DNA element (CDE) III. In situ hybridization to human telophase and Drosophila polytene chromosomes shows localization of the (GGAAT)n sequence to centromeric regions. Hyperchromicity studies indicate that the (GGAAT)n sequence exhibits unusual hydrogen bonding properties. The purine-rich strand alone has the same thermal stability as the duplex. Hyperchromicity studies of synthetic DNA variants indicate that all sequences with the composition (AATGN)n exhibit this unusual thermal stability. DNA-mobility-shift assays indicate that specific HeLa-cell nuclear proteins recognize this sequence with a relative affinity greater than 10^5. The extreme evolutionary conservation of this DNA sequence, its centromeric location, its unusual hydrogen bonding properties, its high affinity for specific nuclear proteins, and its similarity to functional centromeres isolated from yeast suggest that this sequence may be a component of the functional human centromere. PMID:1542662
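    Two sequence properties mentioned in the abstract can be checked with a short sketch: uninterrupted (GGAAT)n runs, and the (AATGN)n composition. (GGAAT)n read in a shifted frame is (AATGG)n, so testing cyclic rotations of the repeat unit is one way to express membership in the (AATGN)n family. That interpretation, the function names and the minimum copy number are assumptions for illustration.

```python
import re


def tandem_runs(dna, unit="GGAAT", min_copies=3):
    """Hypothetical helper: (start, copies) for each run of `unit` repeated >= min_copies times."""
    unit = unit.upper()
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    return [(m.start(), len(m.group()) // len(unit)) for m in pattern.finditer(dna.upper())]


def is_aatgn_frame(unit):
    """True if some cyclic rotation of a 5-mer repeat unit has the form AATGN."""
    unit = unit.upper()
    rotations = (unit[i:] + unit[:i] for i in range(len(unit)))
    return any(re.fullmatch(r"AATG[ACGT]", r) for r in rotations)


print(tandem_runs("CC" + "GGAAT" * 4 + "TT" + "GGAAT" * 2))  # [(2, 4)]
print(is_aatgn_frame("GGAAT"), is_aatgn_frame("GGACT"))    # True False
```

    The second, two-copy run in the example falls below `min_copies` and is not reported.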

  10. DNA recognition for virus assembly through multiple sequence-independent interactions with a helix-turn-helix motif

    PubMed Central

    Greive, Sandra J.; Fung, Herman K.H.; Chechik, Maria; Jenkins, Huw T.; Weitzel, Stephen E.; Aguiar, Pedro M.; Brentnall, Andrew S.; Glousieau, Matthieu; Gladyshev, Grigory V.; Potts, Jennifer R.; Antson, Alfred A.

    2016-01-01

    The helix-turn-helix (HTH) motif features frequently in protein DNA-binding assemblies. Viral pac site-targeting small terminase proteins possess an unusual architecture in which the HTH motifs are displayed in a ring, distinct from the classical HTH dimer. Here we investigate how such a circular array of HTH motifs enables specific recognition of the viral genome for initiation of DNA packaging during virus assembly. We found, by surface plasmon resonance and analytical ultracentrifugation, that individual HTH motifs of the Bacillus phage SF6 small terminase bind the packaging regions of the SF6 and related SPP1 genomes weakly, with little local sequence specificity. Nuclear magnetic resonance chemical shift perturbation studies with an arbitrary single-site substrate suggest that the HTH motif contacts DNA similarly to how certain HTH proteins contact DNA non-specifically. Our observations support a model where specificity is generated through conformational selection of an intrinsically bent DNA segment by a ring of HTHs which bind weakly but cooperatively. Such a system would enable viral gene regulation and control of the viral life cycle with a minimal genome, conferring a major evolutionary advantage for SPP1-like viruses. PMID:26673721

  11. Defining RNA motif-aminoglycoside interactions via two-dimensional combinatorial screening and structure-activity relationships through sequencing.

    PubMed

    Velagapudi, Sai Pradeep; Disney, Matthew D

    2013-10-15

    RNA is an extremely important target for the development of chemical probes of function or small molecule therapeutics. Aminoglycosides are the most well studied class of small molecules to target RNA. However, the RNA motifs outside of the bacterial rRNA A-site that are likely to be bound by these compounds in biological systems are largely unknown. If such information were known, it could allow aminoglycosides to be exploited to target other RNAs and, in addition, could provide invaluable insights into potential bystander targets of these clinically used drugs. We utilized two-dimensional combinatorial screening (2DCS), a library-versus-library screening approach, to select the motifs displayed in a 3×3 nucleotide internal loop library and in a 6-nucleotide hairpin library that bind with high affinity and selectivity to six aminoglycoside derivatives. The selected RNA motifs were then analyzed using structure-activity relationships through sequencing (StARTS), a statistical approach that defines the privileged RNA motif space that binds a small molecule. StARTS allowed for the facile annotation of the selected RNA motif-aminoglycoside interactions in terms of affinity and selectivity. The interactions selected by 2DCS generally have nanomolar affinities, which is higher than the affinity of aminoglycosides for a mimic of their therapeutic target, the bacterial rRNA A-site.
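    StARTS is described here only as a statistical approach. A common way to ask whether a sequence feature is enriched among selected RNAs relative to the starting library is a two-proportion Z-score, sketched below. This is an assumed simplification, not the published StARTS scoring scheme. Features are treated as plain substrings, and the function names and toy sequences are hypothetical.

```python
import math


def feature_z_score(feature, selected, library):
    """Two-proportion Z-score for `feature` (a substring) in selected vs. library sequences.

    Assumed simplification; not the published StARTS formula.
    """
    n1, n2 = len(selected), len(library)
    x1 = sum(feature in s for s in selected)
    x2 = sum(feature in s for s in library)
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0  # feature present in all or none of the sequences
    return (x1 / n1 - x2 / n2) / se


def one_sided_confidence(z):
    """Standard normal CDF at z, computed via math.erf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Toy data: "GC" occurs in 3 of 4 selected sequences but only 2 of 8 library sequences.
selected = ["GCAU", "GCAA", "GCUU", "AAUU"]
library = ["GCAA", "AAAA", "UUUU", "AUAU", "CCCC", "GGGG", "AUGC", "UAUA"]
z = feature_z_score("GC", selected, library)
print(round(z, 2), round(one_sided_confidence(z), 3))
```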

  12. Sequence and domain conservation of the coelacanth Hsp40 and Hsp90 chaperones suggests conservation of function.

    PubMed

    Bishop, Özlem Tastan; Edkins, Adrienne Lesley; Blatch, Gregory Lloyd

    2014-09-01

    Molecular chaperones and their associated co-chaperones play an important role in preserving and regulating the active conformational state of cellular proteins. The chaperone complement of the Indonesian coelacanth, Latimeria menadoensis, was elucidated using transcriptomic sequences. The analysis focused on heat shock protein 90 (Hsp90) and heat shock protein 40 (Hsp40) chaperones and their associated co-chaperones, using homologous human sequences to search the sequence databases. Coelacanth homologs of the cytosolic, mitochondrial and endoplasmic reticulum (ER) isoforms of human Hsp90 were identified, as well as all of the major co-chaperones of the cytosolic isoform. Most of the human Hsp40s were found to have coelacanth homologs, and the data suggested that all of the chaperone machinery for protein folding at the ribosome, protein translocation to cellular compartments such as the ER, and protein degradation was conserved. Some interesting similarities and differences were identified when comparing human, mouse, and zebrafish homologs. For example, DnaJB13 is predicted to be a non-functional Hsp40 in human, mouse, and zebrafish due to a corrupted histidine-proline-aspartic acid (HPD) motif, while the coelacanth homolog has an intact HPD. These and other comparisons enabled important functional and evolutionary questions to be posed for future experimental studies.

  13. DNA consensus sequence motif for binding response regulator PhoP, a virulence regulator of Mycobacterium tuberculosis.

    PubMed

    He, Xiaoyuan; Wang, Shuishu

    2014-12-30

    Tuberculosis has reemerged as a serious threat to human health because of the increasing prevalence of drug-resistant strains and synergistic co-infection with HIV, prompting an urgent need for new and more efficient treatments. The PhoP-PhoR two-component system of Mycobacterium tuberculosis plays an important role in the virulence of the pathogen and thus represents a potential drug target. To study the mechanism of gene transcription regulation by response regulator PhoP, we identified a high-affinity DNA sequence for PhoP binding using systematic evolution of ligands by exponential enrichment. The sequence contains a direct repeat of two 7 bp motifs separated by a 4 bp spacer, TCACAGC(N4)TCACAGC. The specificity of the direct-repeat sequence for PhoP binding was confirmed by isothermal titration calorimetry and electrophoretic mobility shift assays. PhoP binds to the direct repeat as a dimer in a highly cooperative manner. We found many genes previously identified to be regulated by PhoP that contain the direct-repeat motif in their promoter sequences. Synthetic DNA fragments at the putative promoter-binding sites bind PhoP with variable affinity, which is related to the number of mismatches in the 7 bp motifs, the positions of the mismatches, and the spacer and flanking sequences. Phosphorylation of PhoP increases the affinity but does not change the specificity of DNA binding. Overall, our results confirm the direct-repeat sequence as the consensus motif for PhoP binding and thus pave the way for identification of PhoP directly regulated genes in different mycobacterial genomes.
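    The PhoP consensus above is a direct repeat with a fixed 4 bp spacer, and binding affinity tracked the number and position of mismatches in the two 7 bp half-sites. A minimal sketch of scanning a sequence for that pattern while reporting mismatches per half-site, assuming a single strand and a simple total-mismatch threshold (both illustrative choices; the function names are hypothetical):

```python
HALF_SITE = "TCACAGC"  # 7 bp PhoP half-site from the consensus TCACAGC(N4)TCACAGC
SPACER = 4             # 4 bp spacer between the two half-sites


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def scan_direct_repeat(seq: str, max_mismatches: int = 2):
    """Yield (start, mm1, mm2) for windows resembling TCACAGC(N4)TCACAGC.

    mm1 and mm2 are the mismatch counts in the first and second half-site.
    Only the given strand is scanned, and a window is reported when
    mm1 + mm2 <= max_mismatches (an assumed, illustrative cutoff).
    """
    seq = seq.upper()
    k = len(HALF_SITE)
    width = 2 * k + SPACER
    for i in range(len(seq) - width + 1):
        first = seq[i:i + k]
        second = seq[i + k + SPACER:i + width]
        mm1 = mismatches(first, HALF_SITE)
        mm2 = mismatches(second, HALF_SITE)
        if mm1 + mm2 <= max_mismatches:
            yield i, mm1, mm2
```

Scanning the reverse complement as well, and weighting mismatches by position, would be natural extensions given the abstract's observation that mismatch positions affect affinity.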

  14. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion

    PubMed Central

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-01-01

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2Logo aims to resolve these issues by allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for low number of observations and different logotype representations each capturing different aspects related to amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed). PMID:22638583
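    Sequence logos like those produced by Seq2Logo rest on per-column amino acid frequencies, pseudo counts, and an information-content or log-odds transform. The sketch below is a deliberate simplification under stated assumptions: it uses a uniform pseudocount and a uniform background, does not reproduce Seq2Logo's own pseudo count or sequence-weighting schemes, and the helper names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(column: str, pseudocount: float = 0.0) -> dict:
    """Amino acid frequencies in one alignment column.

    A uniform pseudocount is added to every residue type (an assumed
    simplification); characters outside the 20 amino acids, such as gaps,
    are ignored.
    """
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def information_content(freqs: dict) -> float:
    """Shannon information content in bits: log2(20) minus column entropy."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return math.log2(len(AMINO_ACIDS)) - entropy


def log_odds(freqs: dict, background: float = 1 / 20) -> dict:
    """Position-specific log-odds scores in bits against a uniform background.

    Positive scores indicate enrichment and negative scores depletion;
    residues with zero frequency are omitted because log2(0) is undefined.
    """
    return {aa: math.log2(p / background) for aa, p in freqs.items() if p > 0}
```

Negative log-odds values correspond to depleted residues, the kind of information a two-sided representation adds; applying these functions column by column to an MSA gives a small PSSM.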

  15. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion.

    PubMed

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-07-01

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2Logo aims to resolve these issues by allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for low number of observations and different logotype representations each capturing different aspects related to amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed).

  16. Conserved Sequence Preferences Contribute to Substrate Recognition by the Proteasome

    PubMed Central

    Yu, Houqing; Singh Gautam, Amit K.; Wilmington, Shameika R.; Wylie, Dennis; Martinez-Fonts, Kirby; Kago, Grace; Warburton, Marie; Chavali, Sreenivas; Inobe, Tomonao; Finkelstein, Ilya J.; Babu, M. Madan

    2016-01-01

    The proteasome has pronounced preferences for the amino acid sequence of its substrates at the site where it initiates degradation. Here, we report that modulating these sequences can tune the steady-state abundance of proteins over 2 orders of magnitude in cells. This is the same dynamic range as seen for inducing ubiquitination through a classic N-end rule degron. The stability and abundance of His3 constructs dictated by the initiation site affect survival of yeast cells and show that variation in proteasomal initiation can affect fitness. The proteasome's sequence preferences are linked directly to the affinity of the initiation sites to their receptor on the proteasome and are conserved between Saccharomyces cerevisiae, Schizosaccharomyces pombe, and human cells. These findings establish that the sequence composition of unstructured initiation sites influences protein abundance in vivo in an evolutionarily conserved manner and can affect phenotype and fitness. PMID:27226608

  17. SVM2Motif—Reconstructing Overlapping DNA Sequence Motifs by Mimicking an SVM Predictor

    PubMed Central

    Vidovic, Marina M. -C.; Görnitz, Nico; Müller, Klaus-Robert; Rätsch, Gunnar; Kloft, Marius

    2015-01-01

    Identifying discriminative motifs underlying the functionality and evolution of organisms is a major challenge in computational biology. Machine learning approaches such as support vector machines (SVMs) achieve state-of-the-art performance in genomic discrimination tasks, but, owing to their black-box character, the motifs underlying their decision functions are largely unknown. As a remedy, positional oligomer importance matrices (POIMs) allow us to visualize the significance of position-specific subsequences. Although POIMs are a major step towards explaining trained SVM models, their size grows exponentially with the length of the motif, which makes manual inspection feasible only for comparatively small motif sizes, typically k ≤ 5. In this work, we extend the work on positional oligomer importance matrices by presenting a new machine-learning methodology, entitled motifPOIM, to extract the truly relevant motifs, regardless of their length and complexity, underlying the predictions of a trained SVM model. Our framework treats the motifs as free parameters in a probabilistic model, a task which can be phrased as a non-convex optimization problem. The exponential dependence of the POIM size on the oligomer length poses a major numerical challenge, which we address with an efficient optimization framework that allows us to find possibly overlapping motifs consisting of up to hundreds of nucleotides. We demonstrate the efficacy of our approach on a synthetic data set as well as a real-world human splice site data set. PMID:26690911

  18. NMR characterisation of a highly conserved secondary structural RNA motif of Halobacterium halobium 23S rRNA.

    PubMed

    King, John; Shammas, Christos; Nareen, Misbah; Lelli, Moreno; Ramesh, Vasudevan

    2013-05-28

    The highly conserved 29-mer RNA motif corresponding to the peptidyl transferase central circle region of the domain V of Halobacterium halobium 23S rRNA has been characterised by multidimensional NMR spectroscopy. The NMR structure has a good all-atom average RMSD of 1.28 Å and a stable A-form helical conformation. The NMR structure differs from the X-ray crystal structure of an analogous motif, contained within the Escherichia coli ribosome, as none of the bases are flipped out and a number of non-canonical base pairs are formed in the solution structure. Thus, in the observed NMR structure, the predicted A7 to U30 base pair is not seen and a non-canonical U6 to U30 base pair is formed in its place. Similarly, the predicted A9 to U26 base pair was also not observed and another non-canonical A9 to A27 base pair was formed. The conformational analysis also showed that the steps near the bulges had the greatest deviation from the canonical Watson-Crick base pair step parameters. Despite these differences, the 29-mer structure provides a working model of the RNA within the ribosome in a more natural solution state than that observed in the intact ribosome crystal structures, particularly around the A27 residue. The NMR structure determination of the 29-mer RNA motif provides a solid foundation for determining the NMR structure of the RNA-amicetin complex in the next step. To extend the above study, a fully (13)C and (15)N isotopically labelled 37-mer RNA version of the Halobacterium halobium RNA sample has been characterised using ultra-high-field 1 GHz spectroscopy. The results have been used to demonstrate the advantages conferred by the use of a 1 GHz spectrometer frequency over 800 MHz in terms of superior sensitivity and greater spectral dispersion achieved in the spectrum of the RNA.

  19. MSDmotif: exploring protein sites and motifs

    PubMed Central

    Golovin, Adel; Henrick, Kim

    2008-01-01

    Background Protein structures have conserved features, motifs, that have a significant influence on protein function. These motifs can be found in sequence as well as in 3D space. Understanding these fragments is essential for 3D structure prediction, modelling and drug design. The Protein Data Bank (PDB) is the source of this information; however, present search tools offer limited 3D options for integrating protein sequence with 3D structure. Results We describe here a web application for querying the PDB for ligands, binding sites, small 3D structural and sequence motifs and the underlying database. Novel algorithms for chemical fragments, 3D motifs, ϕ/ψ sequences, super-secondary structure motifs and for small 3D structural motif associations searches are incorporated. The interface provides functionality for visualization, search criteria creation, sequence and 3D multiple alignment options. MSDmotif is an integrated system where a results page is also a search form. A set of motif statistics is available for analysis. This set includes molecule and motif binding statistics, distribution of motif sequences, occurrence of an amino acid within a motif, correlation of amino acid side-chain charges within a motif and Ramachandran plots for each residue. The binding statistics are presented in association with properties that include a ligand fragment library. Access is also provided through the Distributed Annotation System (DAS) protocol. An additional entry point facilitates XML requests with XML responses. Conclusion MSDmotif is unique in combining chemical, sequence and 3D data in a single search engine with a range of search and visualisation options. It provides multiple views of data found in the PDB archive for exploring protein structures. PMID:18637174

  20. Refining multiple sequence alignments with conserved core regions

    PubMed Central

    Chakrabarti, Saikat; Lanczycki, Christopher J.; Panchenko, Anna R.; Przytycka, Teresa M.; Thiessen, Paul A.; Bryant, Stephen H.

    2006-01-01

    Accurate multiple sequence alignments of proteins are important in several areas of computational biology, providing insight into the phylogenetic history of domain families and supporting their identification and classification. This article presents a new algorithm, REFINER, that refines a multiple sequence alignment by iterative realignment of its individual sequences with the predetermined conserved core (block) model of a protein family. Realignment of each sequence can correct misalignments between a given sequence and the rest of the profile and at the same time preserve the family's overall block model. Large-scale benchmarking studies showed a noticeable improvement of alignment after refinement. This can be inferred from the increased alignment score and enhanced sensitivity for database searching using the sequence profiles derived from refined alignments compared with the original alignments. A standalone version of the program is available by ftp distribution () and will be incorporated into the next release of the Cn3D structure/alignment viewer. PMID:16707662

  1. Phylogenetic Analysis of Geographically Diverse Radopholus similis via rDNA Sequence Reveals a Monomorphic Motif.

    PubMed

    Kaplan, D T; Thomas, W K; Frisse, L M; Sarah, J L; Stanton, J M; Speijer, P R; Marin, D H; Opperman, C H

    2000-06-01

    The nucleic acid sequences of rDNA ITS1 and the rDNA D2/D3 expansion segment were compared for 57 burrowing nematode isolates collected from Australia, Cameroon, Central America, Cuba, Dominican Republic, Florida, Guadeloupe, Hawaii, Nigeria, Honduras, Indonesia, Ivory Coast, Puerto Rico, South Africa, and Uganda. Of the 57 isolates, 55 were morphologically similar to Radopholus similis and seven were citrus-parasitic. The nucleic acid sequences for PCR-amplified ITS1 and for the D2/D3 expansion segment of the 28S rDNA gene were each identical for all putative R. similis. Sequence divergence for both the ITS1 and the D2/D3 was concordant with morphological differences that distinguish R. similis from other burrowing nematode species. This result substantiates previous observations that the R. similis genome is highly conserved across geographic regions. Autapomorphies that would delimit phylogenetic lineages of non-citrus-parasitic R. similis from those that parasitize citrus were not observed. The data presented herein support the concept that R. similis comprises two pathotypes: one that parasitizes citrus and one that does not.

  2. A combined sequence and structure based method for discovering enriched motifs in RNA from in vivo binding data.

    PubMed

    Polishchuk, Maya; Paz, Inbal; Kohen, Refael; Mesika, Rona; Yakhini, Zohar; Mandel-Gutfreund, Yael

    2017-04-15

    RNA binding proteins (RBPs) play an important role in regulating many processes in the cell. RBPs often recognize their RNA targets in a specific manner. In addition to the RNA primary sequence, the structure of the RNA has been shown to play a central role in RNA recognition by RBPs. In recent years, many experimental approaches, both in vitro and in vivo, have been developed and employed to identify and characterize RBP targets and extract their binding specificities. In vivo binding techniques, such as CrossLinking and ImmunoPrecipitation (CLIP)-based methods, enable the characterization of protein binding sites on RNA targets. However, these methods do not provide information regarding the structural preferences of the protein. While methods to obtain the structure of RNA are available, inferring both the sequence and the structure preferences of RBPs remains a challenge. Here we present SMARTIV, a novel computational tool for discovering combined sequence and structure binding motifs from in vivo RNA binding data relying on the sequences of the target sites, the ranking of their binding scores and their predicted secondary structure. The combined motifs are provided in a unified representation that is informative and easy to interpret visually. We tested the method on CLIP-seq data from different platforms for a variety of RBPs. Overall, we show that our results are highly consistent with known binding motifs of RBPs, offering additional information on their structural preferences. Copyright © 2017 Elsevier Inc. All rights reserved.

  3. A Conserved Motif within RAP1 Plays Diversified Roles in Telomere Protection and Regulation in Different Organisms

    PubMed Central

    Chen, Yong; Rai, Rekha; Zhou, Zi-Ren; Kanoh, Junko; Ribeyre, Cyril; Yang, Yuting; Zheng, Hong; Damay, Pascal; Wang, Feng; Tsujii, Hisayo; Hiraoka, Yasushi; Shore, David; Hu, Hong-Yu; Chang, Sandy; Lei, Ming

    2013-01-01

    Repressor activator protein 1 (RAP1) is the most highly conserved telomere protein. It is involved in protecting chromosome ends in fission yeast, promoting gene silencing in Saccharomyces cerevisiae while in Kluyveromyces lactis it is required to repress homology directed recombination (HDR) at telomeres. Since mammalian RAP1 requires TRF2 for stable expression, its role in telomere function has remained obscure. To understand how RAP1 plays such diverse functions at telomeres, we solved the crystal or solution structures of the C-terminal RCT domains of RAP1 from multiple organisms in complex with their respective protein-binding partners. Our comparative structural analysis establishes the RCT domain of RAP1 as an evolutionarily conserved protein-protein interaction module. In mammalian and fission yeast cells, this module interacts with TRF2 and Taz1, respectively, targeting RAP1 to chromosome ends for telomere end protection. While RAP1 represses NHEJ at fission yeast telomeres, at mammalian telomeres it is required to repress HDR. In contrast, S. cerevisiae RAP1 utilizes the RCT domain to recruit Sir3 to telomeres to mediate gene silencing. Together, our results reveal that depending on the organism, the evolutionarily conserved RAP1 RCT motif plays diverse functional roles at telomeres. PMID:21217703

  4. Sequence and structure conservation in a protein core.

    PubMed

    Rodionov, M A; Blundell, T L

    1998-11-15

    In order to study structural aspects of sequence conservation in families of homologous proteins, we have analyzed structurally aligned sequences of 585 proteins grouped into 128 homologous families. The conservation of a residue in a family is defined as the average residue similarity in a given position of aligned sequences. The residue similarities were expressed in the form of log-odd substitution tables that take into account the environments of amino acids in three-dimensional structures. The protein core is defined as those residues that have less than 7% solvent accessibility. The density of a protein core is described in terms of atom packing, which is investigated as a criterion for residue substitution and conservation. Although there is no significant correlation between sequence conservation and average atom packing around nonpolar residues such as leucine, valine and isoleucine, a significant correlation is observed for polar residues in the protein core. This may be explained by the hydrogen bonds in which polar residues are involved; the better their protection from water access, the more stable the structure should be at that position.
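    The abstract defines the conservation of a position as the average residue similarity across the aligned sequences. A minimal sketch of that definition, with a hypothetical identity score standing in for the environment-specific log-odds substitution tables the authors used, and with gap handling chosen for illustration:

```python
from itertools import combinations
from typing import Callable, Sequence


def column_conservation(column: Sequence[str],
                        similarity: Callable[[str, str], float]) -> float:
    """Average pairwise similarity of the residues in one alignment column.

    `similarity` stands in for a substitution table. Gaps ('-') are skipped
    (an assumption), and a column with fewer than two residues scores 0.0.
    """
    residues = [r for r in column if r != "-"]
    pairs = list(combinations(residues, 2))
    if not pairs:
        return 0.0
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


def identity_score(a: str, b: str) -> float:
    """Hypothetical toy table: 1 for identical residues, 0 otherwise."""
    return 1.0 if a == b else 0.0
```

Replacing `identity_score` with a lookup into a real (for example, environment-specific) substitution table would make the score sensitive to conservative substitutions such as leucine to isoleucine.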

  5. Building dictionaries of 1D and 3D motifs by mining the Unaligned 1D sequences of 17 archaeal and bacterial genomes.

    PubMed

    Rigoutsos, I; Gao, Y; Floratos, A; Parida, L

    1999-01-01

    We have used the Teiresias algorithm to carry out unsupervised pattern discovery in a database containing the unaligned ORFs from the 17 publicly available complete archaeal and bacterial genomes and build a 1D dictionary of motifs. These motifs, which we refer to as seqlets, account for and cover 97.88% of this genomic input at the level of amino acid positions. Each of the seqlets in this 1D dictionary was located among the sequences in Release 38.0 of the Protein Data Bank and the structural fragments corresponding to each seqlet's instances were identified and aligned in three dimensions: those seqlets that resulted in RMSD errors below a pre-selected threshold of 2.5 Angstroms were entered in a 3D dictionary of structurally conserved seqlets. These two dictionaries can be thought of as cross-indices that facilitate the tackling of tasks such as automated functional annotation of genomic sequences, local homology identification, local structure characterization, comparative genomics, etc.

  6. Identification of a Novel Sequence Motif Recognized by the Ankyrin Repeat Domain of zDHHC17/13 S-Acyltransferases

    PubMed Central

    Lemonidis, Kimon; Sanchez-Perez, Maria C.; Chamberlain, Luke H.

    2015-01-01

    S-Acylation is a major post-translational modification affecting several cellular processes. It is particularly important for neuronal functions. This modification is catalyzed by a family of transmembrane S-acyltransferases that contain a conserved zinc finger DHHC (zDHHC) domain. Typically, eukaryote genomes encode 7–24 distinct zDHHC enzymes, with two members also harboring an ankyrin repeat (AR) domain at their cytosolic N termini. The AR domain of zDHHC enzymes is predicted to engage in numerous interactions and facilitates both substrate recruitment and S-acylation-independent functions; however, the sequence/structural features recognized by this module remain unknown. The two mammalian AR-containing S-acyltransferases are the Golgi-localized zDHHC17 and zDHHC13, also known as Huntingtin-interacting proteins 14 and 14-like, respectively; they are highly expressed in brain, and their loss in mice leads to neuropathological deficits that are reminiscent of Huntington's disease. Here, we report that zDHHC17 and zDHHC13 recognize, via their AR domain, evolutionarily conserved and closely related sequences of a [VIAP][VIT]XXQP consensus in SNAP25, SNAP23, cysteine string protein, Huntingtin, cytoplasmic linker protein 3, and microtubule-associated protein 6. This novel AR-binding sequence motif is found in regions predicted to be unstructured and is present in a number of zDHHC17 substrates and zDHHC17/13-interacting S-acylated proteins. This is the first study to identify a motif recognized by AR-containing zDHHCs. PMID:26198635
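    Because the consensus [VIAP][VIT]XXQP is a simple pattern, candidate sites can be located with a regular expression. A minimal sketch, assuming X means any single residue; the function name is hypothetical and the peptide in the usage line is synthetic, not taken from any of the proteins named above:

```python
import re

# [VIAP][VIT]XXQP consensus; "." plays the role of X (any residue).
# The pattern sits inside a lookahead so overlapping matches are all reported.
AR_MOTIF = re.compile(r"(?=([VIAP][VIT].{2}QP))")


def find_ar_motifs(protein: str):
    """Return (0-based start, matched 6-mer) for each consensus match."""
    return [(m.start(), m.group(1)) for m in AR_MOTIF.finditer(protein.upper())]


print(find_ar_motifs("MKAVTEQPLL"))  # synthetic peptide
```

On this synthetic peptide the call returns `[(2, 'AVTEQP')]`. A regex hit is only a candidate; the abstract notes that the functional motif lies in regions predicted to be unstructured, which a sequence scan alone does not check.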

  7. A Novel Family in Medicago truncatula Consisting of More Than 300 Nodule-Specific Genes Coding for Small, Secreted Polypeptides with Conserved Cysteine Motifs

    PubMed Central

    Mergaert, Peter; Nikovics, Krisztina; Kelemen, Zsolt; Maunoury, Nicolas; Vaubert, Danièle; Kondorosi, Adam; Kondorosi, Eva

    2003-01-01

    Transcriptome analysis of Medicago truncatula nodules has led to the discovery of a gene family named NCR (nodule-specific cysteine rich) with more than 300 members. The encoded polypeptides were short (60–90 amino acids), carried a conserved signal peptide, and, except for a conserved cysteine motif, displayed otherwise extensive sequence divergence. Family members were found in pea (Pisum sativum), broad bean (Vicia faba), white clover (Trifolium repens), and Galega orientalis but not in other plants, including other legumes, suggesting that the family might be specific for galegoid legumes forming indeterminate nodules. Gene expression of all family members was restricted to nodules except for two, also expressed in mycorrhizal roots. NCR genes exhibited distinct temporal and spatial expression patterns in nodules and, thus, were coupled to different stages of development. The signal peptide targeted the polypeptides in the secretory pathway, as shown by green fluorescent protein fusions expressed in onion (Allium cepa) epidermal cells. Coregulation of certain NCR genes with genes coding for a potentially secreted calmodulin-like protein and for a signal peptide peptidase suggests a concerted action in nodule development. Potential functions of the NCR polypeptides in cell-to-cell signaling and creation of a defense system are discussed. PMID:12746522

  8. Fine Scale Analysis of Crossover and Non-Crossover and Detection of Recombination Sequence Motifs in the Honeybee (Apis mellifera)

    PubMed Central

    Bessoltane, Nadia; Toffano-Nioche, Claire; Solignac, Michel; Mougel, Florence

    2012-01-01

    Background Meiotic exchanges are non-uniformly distributed across the genome of most studied organisms. This uneven distribution suggests that recombination is initiated by specific signals and/or regulations. Some of these signals were recently identified in humans and mice. However, it is unclear whether or not sequence signals are also involved in chromosomal recombination of insects. Methodology We analyzed recombination frequencies in the honeybee, in which genome sequencing provided a large amount of SNPs spread over the entire set of chromosomes. As the genome sequences were obtained from a pool of haploid males, which were the progeny of a single queen, an oocyte method (study of recombination in haploid males that develop from unfertilized eggs and hence directly reflect the haplotypes of female gametes) was developed to detect recombined pairs of SNP sites. Sequences were further compared between recombinant and non-recombinant fragments to detect recombination-specific motifs. Conclusions Recombination events between adjacent SNP sites were detected at an average distance of 92 bp and revealed the existence of high rates of recombination events. This study also shows the presence of conversion without crossover (i.e., non-crossover) events, which greatly outnumber crossover events. Furthermore, the comparison of sequences that have undergone recombination with sequences that have not led to the discovery of sequence motifs (CGCA, GCCGC, CCGCA), which may correspond to recombination signals. PMID:22567142
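    Comparing recombinant with non-recombinant fragments comes down to counting candidate motifs in each set. A minimal sketch, assuming occurrences are counted on one strand, overlaps are allowed, and counts are normalised per kilobase; these are illustrative choices, not the study's statistical procedure, and the function names are hypothetical:

```python
MOTIFS = ("CGCA", "GCCGC", "CCGCA")  # candidate signals named in the abstract


def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, including overlapping ones."""
    count, start = 0, seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def motif_rates(fragments, motifs=MOTIFS) -> dict:
    """Occurrences per kilobase of each motif across a set of fragments."""
    total_bp = sum(len(f) for f in fragments)
    if total_bp == 0:
        raise ValueError("fragments contain no sequence")
    return {
        m: 1000 * sum(count_overlapping(f.upper(), m) for f in fragments) / total_bp
        for m in motifs
    }
```

Calling `motif_rates` separately on the recombinant and the non-recombinant fragment sets gives two rate tables that could then be compared with an appropriate statistical test.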

  9. Creation of Hybrid Nanorods From Sequences of Natural Trimeric Fibrous Proteins Using the Fibritin Trimerization Motif

    NASA Astrophysics Data System (ADS)

    Papanikolopoulou, Katerina; van Raaij, Mark J.; Mitraki, Anna

    Stable, artificial fibrous proteins that can be functionalized open new avenues in fields such as bionanomaterials design and fiber engineering. An important source of inspiration for the creation of such proteins are natural fibrous proteins such as collagen, elastin, insect silks, and fibers from phages and viruses. The fibrous parts of this last class of proteins usually adopt trimeric, β-stranded structural folds and are appended to globular, receptor-binding domains. It has been recently shown that the globular domains are essential for correct folding and trimerization and can be successfully substituted by a very small (27-amino acid) trimerization motif from phage T4 fibritin. The hybrid proteins are correctly folded nanorods that can withstand extreme conditions. When the fibrous part derives from the adenovirus fiber shaft, different tissue-targeting specificities can be engineered into the hybrid proteins, which therefore can be used as gene therapy vectors. The integration of such stable nanorods in devices is also a big challenge in the field of biomechanical design. The fibritin foldon domain is a versatile trimerization motif and can be combined with a variety of fibrous motifs, such as coiled-coil, collagenous, and triple β-stranded motifs, provided the appropriate linkers are used. The combination of different motifs within the same fibrous molecule to create stable rods with multiple functions can even be envisioned. We provide a comprehensive overview of the experimental procedures used for designing, creating, and characterizing hybrid fibrous nanorods using the fibritin trimerization motif.

  10. A conserved 11 nucleotide sequence contains an essential promoter element of the maize mitochondrial atp1 gene.

    PubMed Central

    Rapp, W D; Stern, D B

    1992-01-01

    To determine the structure of a functional plant mitochondrial promoter, we have partially purified an RNA polymerase activity that correctly initiates transcription at the maize mitochondrial atp1 promoter in vitro. Using a series of 5' deletion constructs, we found that essential sequences are located within 19 nucleotides (nt) upstream of the transcription initiation site. The region surrounding the initiation site includes conserved sequence motifs previously proposed to be maize mitochondrial promoter elements. Deletion of a conserved 11 nt sequence showed that it is critical for promoter function, but deletion or alteration of conserved upstream G(A/T)3-4 repeats had no effect. When the atp1 11 nt sequence was inserted into different plasmids lacking mitochondrial promoter activity, transcription was only observed for one of these constructs. We infer from these data that the functional promoter extends beyond this motif, most likely in the 5' direction. The maize mitochondrial cox3 and atp6 promoters also direct transcription initiation in this in vitro system, suggesting that it may be widely applicable for studies of mitochondrial transcription in this species. PMID:1372246
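    The conserved upstream G(A/T)3-4 repeats mentioned above can be located with a regular expression. A minimal sketch, assuming the notation means a G followed by three or four A or T residues on the given strand (the function name is hypothetical):

```python
import re

# G followed by three or four A/T residues, i.e. G(A/T)3-4.
# The lookahead reports overlapping matches; because {3,4} is greedy,
# the 4-residue form is returned whenever it is available at a start position.
G_AT_REPEAT = re.compile(r"(?=(G[AT]{3,4}))")


def find_g_at_repeats(seq: str):
    """Return (0-based start, matched text) for each G(A/T)3-4 element."""
    return [(m.start(), m.group(1)) for m in G_AT_REPEAT.finditer(seq.upper())]
```

Scanning the reverse complement as well would catch elements on the opposite strand.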

  11. Evolutionary growth process of highly conserved sequences in vertebrate genomes.

    PubMed

    Ishibashi, Minaka; Noda, Akiko Ogura; Sakate, Ryuichi; Imanishi, Tadashi

    2012-08-01

    Genome sequence comparison between evolutionarily distant species revealed ultraconserved elements (UCEs) among mammals under strong purifying selection. Most of them were also conserved among vertebrates. Because they tend to be located in the flanking regions of developmental genes, they would have fundamental roles in creating vertebrate body plans. However, the evolutionary origin and selection mechanism of these UCEs remain unclear. Here we report that UCEs arose in primitive vertebrates, and gradually grew in vertebrate evolution. We searched for UCEs in two teleost fishes, Tetraodon nigroviridis and Oryzias latipes, and found 554 UCEs with 100% identity over 100 bp. Comparison of teleost and mammalian UCEs revealed 43 pairs of common, jawed-vertebrate UCEs (jUCE) with high sequence identities, ranging from 83.1% to 99.2%. Ten of them retain lower similarities to the Petromyzon marinus genome, and the substitution rates of four non-exonic jUCEs were reduced after the teleost-mammal divergence, suggesting that robust conservation had been acquired in the jawed vertebrate lineage. Our results indicate that prototypical UCEs originated before the divergence of jawed and jawless vertebrates and have been frozen as perfect conserved sequences in the jawed vertebrate lineage. In addition, our comparative sequence analyses of UCEs and neighboring regions resulted in a discovery of lineage-specific conserved sequences. They were added progressively to prototypical UCEs, suggesting step-wise acquisition of novel regulatory roles. Our results indicate that conserved non-coding elements (CNEs) consist of blocks with distinct evolutionary history, each having been frozen since a different evolutionary era along the vertebrate lineage. Copyright © 2012 Elsevier B.V. All rights reserved.
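    The UCE criterion used above, 100% identity over 100 bp, can be illustrated by indexing every fixed-length window of one sequence and looking up the windows of another. A minimal sketch with hypothetical names; genome-scale searches rely on whole-genome alignments rather than an in-memory dictionary like this:

```python
def shared_exact_windows(a: str, b: str, k: int = 100):
    """Return (pos_in_a, pos_in_b) pairs where a[i:i+k] == b[j:j+k].

    Every length-k window of `a` is stored in a dictionary keyed by its
    sequence, then each window of `b` is looked up. Memory grows with
    len(a) * k, so this is only a toy for short sequences.
    """
    index = {}
    for i in range(len(a) - k + 1):
        index.setdefault(a[i:i + k], []).append(i)
    hits = []
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], []):
            hits.append((i, j))
    return hits
```

Adjacent hits on the same diagonal (constant i - j) would then be merged to report maximal identical blocks of 100 bp or longer.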

  12. A conserved intronic U1 snRNP-binding sequence promotes trans-splicing in Drosophila.

    PubMed

    Gao, Jun-Li; Fan, Yu-Jie; Wang, Xiu-Ye; Zhang, Yu; Pu, Jia; Li, Liang; Shao, Wei; Zhan, Shuai; Hao, Jianjiang; Xu, Yong-Zhen

    2015-04-01

    Unlike typical cis-splicing, trans-splicing joins exons from two separate transcripts to produce chimeric mRNA and has been detected in most eukaryotes. Trans-splicing in trypanosomes and nematodes has been characterized as a spliced leader RNA-facilitated reaction; in contrast, its mechanism in higher eukaryotes remains unclear. Here we investigate mod(mdg4), a classic trans-spliced gene in Drosophila, and report that two critical RNA sequences in the middle of the last 5' intron, TSA and TSB, promote trans-splicing of mod(mdg4). In TSA, a 13-nucleotide (nt) core motif that is conserved across Drosophila species is essential and sufficient for trans-splicing; it binds U1 small nuclear RNP (snRNP) through strong base-pairing with U1 snRNA. In TSB, a conserved secondary structure acts as an enhancer. Deletion of TSA and TSB using the CRISPR/Cas9 system results in developmental defects in flies. Although it is not clear how the 5' intron finds the 3' introns, compensatory changes in U1 snRNA rescue trans-splicing of TSA mutants, demonstrating that U1 recruitment is critical for promoting trans-splicing in vivo. Furthermore, TSA core-like motifs are found in many other trans-spliced Drosophila genes, including lola. These findings reveal a novel mechanism of trans-splicing in which RNA motifs in the 5' intron are sufficient to bring separate transcripts into close proximity to promote trans-splicing.
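
    The abstract attributes TSA function to base-pairing with U1 snRNA. The minimal sketch below shows one way such complementarity can be scored. It finds the longest stretch of consecutive base pairs over all ungapped antiparallel alignments of two RNAs, with G:U wobble pairs counted optionally. The function names are hypothetical. The example pair is a 5' splice-site-like sequence and its exact reverse complement, chosen for illustration. It is not the 13-nt TSA core, whose sequence the abstract does not give.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def _pairs(x: str, y: str, allow_wobble: bool) -> bool:
    return (x, y) in WATSON_CRICK or (allow_wobble and (x, y) in WOBBLE)


def longest_duplex(motif: str, partner: str, allow_wobble: bool = True) -> int:
    """Longest run of consecutive base pairs over all ungapped, antiparallel
    alignments of motif against partner (both written 5'->3')."""
    m = motif.upper().replace("T", "U")
    p = partner.upper().replace("T", "U")[::-1]  # partner read 3'->5'
    best = 0
    for offset in range(-len(p) + 1, len(m)):
        run = 0
        for i in range(len(m)):
            j = i - offset  # partner position facing motif position i
            if 0 <= j < len(p) and _pairs(m[i], p[j], allow_wobble):
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


print(longest_duplex("CAGGUAAGU", "ACUUACCUG"))  # 9
```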

  13. A conserved intronic U1 snRNP-binding sequence promotes trans-splicing in Drosophila

    PubMed Central

    Gao, Jun-Li; Fan, Yu-Jie; Wang, Xiu-Ye; Zhang, Yu; Pu, Jia; Li, Liang; Shao, Wei; Zhan, Shuai; Hao, Jianjiang

    2015-01-01

    Unlike typical cis-splicing, trans-splicing joins exons from two separate transcripts to produce chimeric mRNA and has been detected in most eukaryotes. Trans-splicing in trypanosomes and nematodes has been characterized as a spliced leader RNA-facilitated reaction; in contrast, its mechanism in higher eukaryotes remains unclear. Here we investigate mod(mdg4), a classic trans-spliced gene in Drosophila, and report that two critical RNA sequences in the middle of the last 5′ intron, TSA and TSB, promote trans-splicing of mod(mdg4). In TSA, a 13-nucleotide (nt) core motif that is conserved across Drosophila species is essential and sufficient for trans-splicing; it binds U1 small nuclear RNP (snRNP) through strong base-pairing with U1 snRNA. In TSB, a conserved secondary structure acts as an enhancer. Deletion of TSA and TSB using the CRISPR/Cas9 system results in developmental defects in flies. Although it is not clear how the 5′ intron finds the 3′ introns, compensatory changes in U1 snRNA rescue trans-splicing of TSA mutants, demonstrating that U1 recruitment is critical for promoting trans-splicing in vivo. Furthermore, TSA core-like motifs are found in many other trans-spliced Drosophila genes, including lola. These findings reveal a novel mechanism of trans-splicing in which RNA motifs in the 5′ intron are sufficient to bring separate transcripts into close proximity to promote trans-splicing. PMID:25838544

  14. Comparison of loline alkaloid gene clusters across fungal endophytes: predicting the co-regulatory sequence motifs and the evolutionary history.

    PubMed

    Kutil, Brandi L; Greenwald, Charles; Liu, Gang; Spiering, Martin J; Schardl, Christopher L; Wilkinson, Heather H

    2007-10-01

    LOL, a fungal secondary metabolite gene cluster found in Epichloë and Neotyphodium species, is responsible for the production of insecticidal loline alkaloids. To analyze the genetic architecture and to predict the evolutionary history of LOL, we compared five clusters from four fungal species (single clusters from Epichloë festucae, Neotyphodium sp. PauTG-1, and Neotyphodium coenophialum, and two clusters we previously characterized in Neotyphodium uncinatum). Using PhyloCon to compare putative lol gene promoter regions, we identified four motifs conserved across the lol genes in all five clusters. Each motif has significant similarity to known fungal transcription factor binding sites in the TRANSFAC database. Conservation of these motifs further supports the hypothesis that the lol genes are co-regulated. Interestingly, the history of asexual Neotyphodium spp. includes multiple interspecific hybridization events. Comparing clusters from three Neotyphodium species and E. festucae allowed us to determine which Epichloë ancestors are the most likely contributors of LOL in these asexual species. For example, although no present-day Epichloë typhina isolates are known to produce lolines, our data support the hypothesis that the E. typhina ancestor(s) of three asexual endophyte species contained a LOL gene cluster. Thus, these data support a model of evolution in which the polymorphism in loline alkaloid production phenotypes among endophyte species is likely due to loss of the trait over time.
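
    PhyloCon is a more elaborate motif-finding method, and the sketch below is not PhyloCon. It shows only the simpler output that motif finders of this kind report: a position frequency matrix and a consensus built from sites that are already aligned. The site sequences, the 0.5 consensus threshold and the function names are hypothetical.

```python
from collections import Counter


def position_frequency_matrix(sites: list) -> list:
    """Per-position base frequencies for equal-length, pre-aligned sites."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must be aligned to the same width")
    matrix = []
    for i in range(width):
        counts = Counter(s[i].upper() for s in sites)
        matrix.append({b: counts[b] / len(sites) for b in "ACGT"})
    return matrix


def consensus(matrix: list, threshold: float = 0.5) -> str:
    """Most frequent base per column, or N if its frequency is below threshold."""
    out = []
    for column in matrix:
        base, freq = max(column.items(), key=lambda kv: kv[1])
        out.append(base if freq >= threshold else "N")
    return "".join(out)


sites = ["TGACGT", "TGACGA", "TGACGT", "TCACGT"]
print(consensus(position_frequency_matrix(sites)))  # TGACGT
```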

  15. Origin replication complex binding, nucleosome depletion patterns, and a primary sequence motif can predict origins of replication in a genome with epigenetic centromeres.

    PubMed

    Tsai, Hung-Ji; Baller, Joshua A; Liachko, Ivan; Koren, Amnon; Burrack, Laura S; Hickman, Meleah A; Thevandavakkam, Mathuravani A; Rusche, Laura N; Berman, Judith

    2014-09-02

    Origins of DNA replication are key genetic elements, yet their identification remains elusive in most organisms. In previous work, we found that centromeres contain origins of replication (ORIs) that are determined epigenetically in the pathogenic yeast Candida albicans. In this study, we used origin recognition complex (ORC) binding and nucleosome occupancy patterns in Saccharomyces cerevisiae and Kluyveromyces lactis to train a machine learning algorithm to predict the positions of active arm (noncentromeric) origins in the C. albicans genome. The model identified bona fide active origins, as determined by the presence of replication intermediates on nondenaturing two-dimensional (2D) gels. Importantly, these origins function both at their native chromosomal loci and as autonomously replicating sequences (ARSs) on a linear plasmid. A "mini-ARS screen" identified at least one, and often two, ARS regions of ≥100 bp within each bona fide origin. Furthermore, a 15-bp AC-rich consensus motif was associated with the predicted origins and conferred autonomous replicating activity on the mini-ARSs. Thus, while centromeres and the origins associated with them are epigenetic, arm origins depend on critical DNA features, such as a binding site for ORC and a propensity for nucleosome exclusion. DNA replication machinery is highly conserved, yet exactly what specifies a replication origin differs among species. Here, we used computational genomics to predict origin locations in Candida albicans by combining the locations of binding sites for the conserved origin recognition complex, which is necessary for replication initiation, with chromatin organization patterns. We identified predicted sequences that exhibited bona fide origin function and developed a linear plasmid assay to delimit the DNA fragments necessary for origin function. Additionally, we found that a short AC-rich motif, which is enriched in predicted origins, is required for

  16. Sequence and Spatiotemporal Expression Analysis of CLE-Motif Containing Genes from the Reniform Nematode (Rotylenchulus reniformis Linford & Oliveira).

    PubMed

    Wubben, Martin J; Gavilano, Lily; Baum, Thomas J; Davis, Eric L

    2015-06-01

    The reniform nematode, Rotylenchulus reniformis, is a sedentary semi-endoparasitic species with a host range that encompasses more than 77 plant families. Nematode effector proteins containing plant-ligand motifs similar to CLAVATA3/ESR (CLE) peptides have been identified in the sedentary endoparasitic genera Heterodera, Globodera, and Meloidogyne. Here, we describe the isolation, sequence analysis, and spatiotemporal expression of three R. reniformis genes, named Rr-cle-1, Rr-cle-2, and Rr-cle-3, that encode putative CLE motifs. The Rr-cle cDNAs showed >98% identity with each other, and the predicted peptides were identical except for a short stretch of residues at the carboxy (C)-terminus of the variable domain (VD). Each RrCLE peptide possessed an amino-terminal signal peptide for secretion and a single C-terminal CLE motif that was most similar to Heterodera CLE motifs. Aligning the Rr-cle cDNAs with their corresponding genomic sequences revealed three exons, with one intron separating the signal peptide from the VD and a second intron separating the VD from the CLE motif. An alignment of the RrCLE1 peptide with Heterodera glycines and Heterodera schachtii CLE proteins revealed a high level of homology within the VD region associated with regulating in planta trafficking of the processed CLE peptide. Quantitative RT-PCR (qRT-PCR) showed similar expression profiles for each Rr-cle transcript across the R. reniformis life cycle, with transcripts most abundant in sedentary parasitic females. In situ hybridization showed that Rr-cle expression is specific to the dorsal esophageal gland cell of sedentary parasitic females.
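
    An identity figure such as ">98%" depends on how gaps are counted. The minimal sketch below assumes two sequences already aligned to equal length, with "-" marking gaps. It excludes gapped columns from the denominator, which is one of several conventions. The sequences are toy examples and the function name is hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identical residues over ungapped columns of a pairwise alignment."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns (one convention among several)
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))  # 90.0
```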

  17. Complete mitochondrial genome of the red drum, Sciaenops ocellatus (Perciformes, Sciaenidae): absence of the typical conserved motif in the origin of the light-strand replication.

    PubMed

    Cheng, Yuanzhi; Shi, Ge; Xu, Tianjun; Li, Haiyan; Sun, Yueyan; Wang, Rixin

    2012-04-01

    In this study, the complete mitochondrial genome of the red drum Sciaenops ocellatus was determined for the first time. The genome is 16,500 bp in length and contains 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes, and 2 main non-coding regions (the control region and the origin of light-strand replication); its gene composition and order are similar to those of most other vertebrates. The overall base composition of the heavy strand was T 25.5%, C 30.7%, A 27.5%, and G 16.3%, with a slight AT bias (A + T = 53%). Discrete, conserved sequence blocks were identified within the control region. The origin of light-strand replication (O(L)) of red drum contains the motif 5'-ACCGG-3' rather than the typical 5'-GCCGG-3', a variant that is rare in the mitogenomes of Sciaenidae species. These results should help elucidate sequence-function relationships of the O(L).
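
    The reported composition can be checked with simple arithmetic. A + T = 27.5% + 25.5% = 53.0%, which matches the stated AT bias, and the four percentages sum to 100%. The minimal sketch below computes the same quantities from a sequence and tests for the two O(L) motif variants. The toy sequence and function names are hypothetical. The motif check is a plain substring search on the given strand only.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Percentage of each of A, C, G, T among unambiguous bases."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def at_content(comp: dict) -> float:
    return comp["A"] + comp["T"]


def ol_motifs(seq: str) -> dict:
    """Report which O(L) motif variant occurs (substring search, one strand)."""
    s = seq.upper()
    return {m: m in s for m in ("GCCGG", "ACCGG")}


# Heavy-strand composition reported in the abstract.
reported = {"T": 25.5, "C": 30.7, "A": 27.5, "G": 16.3}
print(round(sum(reported.values()), 1))  # 100.0
print(round(at_content(reported), 1))    # 53.0
print(ol_motifs("TTACCGGAA"))            # {'GCCGG': False, 'ACCGG': True}
```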

  18. A DNA-binding protein containing two widely separated zinc finger motifs that recognize the same DNA sequence.

    PubMed

    Fan, C M; Maniatis, T

    1990-01-01

    We have isolated a full-length cDNA clone encoding a protein (PRDII-BF1) that binds specifically to a positive regulatory domain (PRDII) of the human IFN-beta gene promoter, and to a similar sequence present in a number of other promoters and enhancers. The sequence of this protein reveals two novel structural features. First, it is the largest sequence-specific DNA-binding protein reported to date (298 kD). Second, it contains two widely separated sets of C2-H2-type zinc fingers. Remarkably, each set of zinc fingers binds to the same DNA sequence motif with similar affinities and methylation interference patterns. Thus, this protein may act by binding simultaneously to reiterated copies of the same recognition sequence. Although the function of PRDII-BF1 is not known, the level of its mRNA is inducible by serum and virus, albeit with different kinetics.

  19. An amino acid sequence motif sufficient for subnuclear localization of an arginine/serine-rich splicing factor.

    PubMed

    Hedley, M L; Amrein, H; Maniatis, T

    1995-12-05

    We have identified an amino acid sequence in the Drosophila Transformer (Tra) protein that is capable of directing a heterologous protein to nuclear speckles, regions of the nucleus previously shown to contain high concentrations of spliceosomal small nuclear RNAs and splicing factors. This sequence contains a nucleoplasmin-like bipartite nuclear localization signal (NLS) and a repeating arginine/serine (RS) dipeptide sequence adjacent to a short stretch of basic amino acids. Sequence comparisons of a number of other splicing factors that colocalize to nuclear speckles reveal one or more copies of this motif. We propose a two-step subnuclear localization mechanism for splicing factors: the first step is transport across the nuclear envelope via the nucleoplasmin-like NLS, and the second is association with components of the speckled domain via the RS dipeptide sequence.
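
    The minimal sketch below scans a protein for the repeating RS dipeptide part of this motif. It uses a hypothetical threshold of at least four consecutive RS or SR dipeptides. It does not model the bipartite NLS or the adjacent basic stretch. The sequence is a toy example and the function name is hypothetical.

```python
import re


def rs_runs(protein: str, min_dipeptides: int = 4) -> list:
    """Return (start, run) for stretches of >= min_dipeptides RS/SR dipeptides."""
    pattern = re.compile(f"(?:RS|SR){{{min_dipeptides},}}")
    return [(m.start(), m.group()) for m in pattern.finditer(protein.upper())]


print(rs_runs("MKPRSRSRSRSDE"))  # [(3, 'RSRSRSRS')]
```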

  20. Modeling and analysis of MH1 domain of Smads and their interaction with promoter DNA sequence motif.

    PubMed

    Makkar, Pooja; Metpally, Raghu Prasad R; Sangadala, Sreedhara; Reddy, Boojala Vijay B

    2009-04-01

    The Smads are a group of related intracellular proteins critical for transmitting signals from transforming growth factor-beta (TGF-beta) superfamily proteins at the cell surface to the nucleus. The prototypic members of the Smad family, Mad and Sma, were first described in Drosophila and Caenorhabditis elegans, respectively. Related proteins were subsequently identified in Xenopus, humans, mice and rats, and are now known as Smads. Smad family members act downstream in the TGF-beta signaling pathway, mediating various biological processes, including cell growth, differentiation, matrix production, apoptosis and development. Smads range from about 400 to 500 amino acids in length and are grouped into receptor-regulated Smads (R-Smads), common Smads (Co-Smads) and inhibitory Smads (I-Smads). There are eight Smads in mammals: Smad1/5/8 (bone morphogenetic protein regulated) and Smad2/3 (TGF-beta/activin regulated) are R-Smads, Smad4 is the Co-Smad, and Smad6/7 are I-Smads. A typical Smad consists of a conserved N-terminal Mad Homology 1 (MH1) domain and a C-terminal Mad Homology 2 (MH2) domain connected by a proline-rich linker. The MH1 domain plays a key role in DNA recognition and also facilitates the binding of Smad4 to the phosphorylated C-terminus of R-Smads to form the activated complex. The MH2 domain exhibits transcriptional activation properties. To understand the structural basis of the interactions of various Smads with their target proteins and promoter DNA, we modeled the MH1 domains of the remaining mammalian Smads based on the known crystal structure of the Smad3 MH1 domain bound to the GTCT Smad box DNA sequence (PDB 1OZJ). We generated a B-DNA structure using average base-pair step parameters (Twist, Tilt, Roll and Slide). We then modeled the interaction pose of the Smad1/5/8 MH1 domains with their corresponding DNA sequence motif, GCCG. These models provide the structural basis towards understanding functional

  1. Evening Expression of Arabidopsis GIGANTEA Is Controlled by Combinatorial Interactions among Evolutionarily Conserved Regulatory Motifs

    PubMed Central

    Nordström, Karl; Cremer, Frédéric; Tóth, Réka; Hartke, Martin; Simon, Samson; Klasen, Jonas R.; Bürstel, Ingmar; Coupland, George

    2014-01-01

    Diurnal patterns of gene transcription are often conferred by complex interactions between circadian clock control and acute responses to environmental cues. Arabidopsis thaliana GIGANTEA (GI) contributes to photoperiodic flowering, circadian clock control, and photoreceptor signaling, and its transcription is regulated by the circadian clock and light. We used phylogenetic shadowing to identify three evolutionarily constrained regions (conserved regulatory modules [CRMs]) within the GI promoter and show that CRM2 is sufficient to confer a transcriptional pattern similar to that of the full-length promoter. Dissection of CRM2 showed that one subfragment (CRM2-A) contributes light inducibility, while another (CRM2-B) exhibits a diurnal response. Mutational analysis showed that three ABA RESPONSE ELEMENT LIKE (ABREL) motifs in CRM2-A and three EVENING ELEMENTs (EEs) in CRM2-B are essential in combination to confer a high-amplitude diurnal pattern of expression. Genome-wide analysis identified characteristic spacing patterns of EEs and 71 A. thaliana promoters containing three EEs. Among these, the promoter of FLAVIN BINDING KELCH REPEAT F-BOX1 was analyzed in detail and shown to harbor a CRM functionally related to GI CRM2. Thus, combinatorial interactions among EEs and ABRELs confer diurnal patterns of transcription via an evolutionarily conserved module present in GI and other evening-expressed genes. PMID:25361953
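
    The genome-wide step, which counts EEs per promoter and measures their spacing, can be sketched as an exact-match scan of both strands. The sketch uses AAAATATCT, the commonly cited Evening Element consensus. This is an assumption, because the abstract does not spell the EE out. The promoter is a toy sequence and the function names are hypothetical. A real analysis would also define promoter boundaries and might allow mismatches.

```python
EE = "AAAATATCT"  # assumed Evening Element consensus (not stated in the abstract)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(promoter: str, motif: str = EE) -> list:
    """Start positions of exact motif matches on either strand."""
    s = promoter.upper()
    targets = {motif.upper(), revcomp(motif)}
    w = len(motif)
    return [i for i in range(len(s) - w + 1) if s[i:i + w] in targets]


def spacings(sites: list) -> list:
    """Distances between consecutive site start positions."""
    return [b - a for a, b in zip(sites, sites[1:])]


promoter = EE + "G" * 10 + revcomp(EE) + "C" * 5 + EE
sites = find_sites(promoter)
print(sites, spacings(sites))  # [0, 19, 33] [19, 14]
```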

  2. Drosophila melanogaster Hox Transcription Factors Access the RNA Polymerase II Machinery through Direct Homeodomain Binding to a Conserved Motif of Mediator Subunit Med19

    PubMed Central

    Boube, Muriel; Hudry, Bruno; Immarigeon, Clément; Carrier, Yannick; Bernat-Fabre, Sandra; Merabet, Samir; Graba, Yacine; Bourbon, Henri-Marc; Cribbs, David L.

    2014-01-01

    Hox genes in species across the metazoa encode transcription factors (TFs) containing highly-conserved homeodomains that bind target DNA sequences to regulate batteries of developmental target genes. DNA-bound Hox proteins, together with other TF partners, induce an appropriate transcriptional response by RNA Polymerase II (PolII) and its associated general transcription factors. How the evolutionarily conserved Hox TFs interface with this general machinery to generate finely regulated transcriptional responses remains obscure. One major component of the PolII machinery, the Mediator (MED) transcription complex, is composed of roughly 30 protein subunits organized in modules that bridge the PolII enzyme to DNA-bound TFs. Here, we investigate the physical and functional interplay between Drosophila melanogaster Hox developmental TFs and MED complex proteins. We find that the Med19 subunit directly binds Hox homeodomains, in vitro and in vivo. Loss-of-function Med19 mutations act as dose-sensitive genetic modifiers that synergistically modulate Hox-directed developmental outcomes. Using clonal analysis, we identify a role for Med19 in Hox-dependent target gene activation. We identify a conserved, animal-specific motif that is required for Med19 homeodomain binding, and for activation of a specific Ultrabithorax target. These results provide the first direct molecular link between Hox homeodomain proteins and the general PolII machinery. They support a role for Med19 as a PolII holoenzyme-embedded “co-factor” that acts together with Hox proteins through their homeodomains in regulated developmental transcription. PMID:24786462

  3. Four thiol peroxidases contain a conserved GCT catalytic motif and act as a versatile array of lipid peroxidases in Anabaena sp. PCC7120.

    PubMed

    Cha, Mee-Kyung; Hong, Seung-Keun; Kim, Il-Han

    2007-06-01

    The Anabaena sp. (ANASP) genome contains seven open reading frames with homology to thiol peroxidase (TPx), also known as peroxiredoxin (Prx). Based on sequence similarities among putative TPxs from various cyanobacterial genomes, we grouped the seven putative TPx members into VCP, VCT, TCS, and GCT clusters according to the sequence of their conserved catalytic motif. The GCT cluster consists of four members, named GCT1, GCT2, GCT3, and GCT4. The ANASP GCT-TPx genes were recombinantly expressed in Escherichia coli. The purified proteins were characterized with an emphasis on their ability to degrade various peroxides, their electron donor, and the conserved cysteine structure of the catalytic intermediate. All GCT members, which form an atypical 2-Cys TPx family, showed their highest peroxidase activity toward a lipid hydroperoxide, using electrons from thioredoxin. Periplasmic protein analysis revealed that GCT2 and GCT4 are distributed in the cytoplasm, whereas GCT1 and GCT3, homologues of E. coli bacterioferritin comigratory protein/plant PrxQ, are localized in the periplasmic space. Immunoblots of heterocyst proteins showed that the level of GCT2 in the heterocyst is comparable to that in vegetative cells, whereas the other GCT members were not significantly detected in the heterocyst. The transcriptional responses of ANASP GCT genes to various oxidative stresses and growth environments were diverse. Their intrinsic differences in transcriptional responsiveness and cellular localization suggest that this large GCT cluster is an adaptive strategy for efficiently combating lipid hydroperoxides in Anabaena sp., which performs oxygenic photosynthesis and N2 fixation.

  4. Conserved structural motifs in the central pair complex of eukaryotic flagella.

    PubMed

    Carbajal-González, Blanca I; Heuser, Thomas; Fu, Xiaofeng; Lin, Jianfeng; Smith, Brandon W; Mitchell, David R; Nicastro, Daniela

    2013-02-01

    Cilia and flagella are conserved hair-like appendages of eukaryotic cells that function as sensing and motility generating organelles. Motility is driven by thousands of axonemal dyneins that require precise regulation. One essential motility regulator is the central pair complex (CPC) and many CPC defects cause paralysis of cilia/flagella. Several human diseases, such as immotile cilia syndrome, show CPC abnormalities, but little is known about the detailed three-dimensional (3D) structure and function of the CPC. The CPC is located in the center of typical [9+2] cilia/flagella and is composed of two singlet microtubules (MTs), each with a set of associated projections that extend toward the surrounding nine doublet MTs. Using cryo-electron tomography coupled with subtomogram averaging, we visualized and compared the 3D structures of the CPC in both the green alga Chlamydomonas and the sea urchin Strongylocentrotus at the highest resolution published to date. Despite the evolutionary distance between these species, their CPCs exhibit remarkable structural conservation. We identified several new projections, including those that form the elusive sheath, and show that the bridge has a more complex architecture than previously thought. Organism-specific differences include the presence of MT inner proteins in Chlamydomonas, but not Strongylocentrotus, and different overall outlines of the highly connected projection network, which forms a round-shaped cylinder in algae, but is more oval in sea urchin. These differences could be adaptations to the mechanical requirements of the rotating CPC in Chlamydomonas, compared to the Strongylocentrotus CPC which has a fixed orientation.

  5. Sequencing Conservation Actions Through Threat Assessments in the Southeastern United States

    Treesearch

    Robert D. Sutter; Christopher C. Szell

    2006-01-01

    The identification of conservation priorities is one of the leading issues in conservation biology. We present a project of The Nature Conservancy, called Sequencing Conservation Actions, which prioritizes conservation areas and identifies foci for crosscutting strategies at various geographic scales. We use the term “Sequencing” to mean an ordering of actions over...

  6. A sequence upstream of canonical PDZ-binding motif within CFTR COOH-terminus enhances NHERF1 interaction.

    PubMed

    Sharma, Neeraj; LaRusch, Jessica; Sosnay, Patrick R; Gottschalk, Laura B; Lopez, Andrea P; Pellicore, Matthew J; Evans, Taylor; Davis, Emily; Atalar, Melis; Na, Chan-Hyun; Rosson, Gedge D; Belchis, Deborah; Milewski, Michal; Pandey, Akhilesh; Cutting, Garry R

    2016-12-01

    The development of cystic fibrosis transmembrane conductance regulator (CFTR) targeted therapy for cystic fibrosis has generated interest in maximizing membrane residence of mutant forms of CFTR by manipulating interactions with scaffold proteins, such as sodium/hydrogen exchange regulatory factor-1 (NHERF1). In this study, we explored whether COOH-terminal sequences in CFTR beyond the PDZ-binding motif influence its interaction with NHERF1. In blot overlays, NHERF1 displayed minimal self-association (Kd = 1,382 ± 61.1 nM), a value well above its physiological concentration, estimated at 240 nM from RNA-sequencing and 260 nM by liquid chromatography tandem mass spectrometry in sweat gland, a key site of CFTR function in vivo. However, NHERF1 oligomerized at considerably lower concentrations (10 nM) in the presence of the last 111 amino acids of CFTR (20 nM), as shown by blot overlays, cross-linking assays, and coimmunoprecipitations using differently tagged versions of NHERF1. Deletion and alanine mutagenesis revealed that a six-amino-acid sequence, EENKVR (residues 1417–1422), and the terminal TRL (residues 1478–1480; the PDZ-binding motif) in the COOH-terminus were essential for the enhanced oligomerization of NHERF1. Full-length CFTR stably expressed in Madin-Darby canine kidney epithelial cells fostered NHERF1 oligomerization that was substantially reduced (∼5-fold) on alanine substitution of the EEN, KVR, or EENKVR residues or deletion of the TRL motif. Confocal fluorescence microscopy revealed that the EENKVR and TRL sequences contribute to preferential localization of CFTR to the apical membrane. Together, these results indicate that COOH-terminal sequences mediate enhanced NHERF1 interaction and facilitate the localization of CFTR, a property that could be manipulated to stabilize mutant forms of CFTR at the apical surface and maximize the effect of CFTR-targeted therapeutics.

  7. A conserved MADS-box phosphorylation motif regulates differentiation and mitochondrial function in skeletal, cardiac, and smooth muscle cells.

    PubMed

    Mughal, W; Nguyen, L; Pustylnik, S; da Silva Rosa, S C; Piotrowski, S; Chapman, D; Du, M; Alli, N S; Grigull, J; Halayko, A J; Aliani, M; Topham, M K; Epand, R M; Hatch, G M; Pereira, T J; Kereliuk, S; McDermott, J C; Rampitsch, C; Dolinsky, V W; Gordon, J W

    2015-10-29

    Exposure to metabolic disease during fetal development alters cellular differentiation and perturbs metabolic homeostasis, but the underlying molecular regulators of this phenomenon in muscle cells are not completely understood. To address this, we undertook a computational approach to identify cooperating partners of the myocyte enhancer factor-2 (MEF2) family of transcription factors, known regulators of muscle differentiation and metabolic function. We demonstrate that MEF2 and the serum response factor (SRF) collaboratively regulate the expression of numerous muscle-specific genes, including microRNA-133a (miR-133a). Using tandem mass spectrometry, we identify a conserved phosphorylation motif within the MEF2 and SRF Mcm1 Agamous Deficiens SRF (MADS)-box that regulates miR-133a expression and mitochondrial function in response to a lipotoxic signal. Furthermore, reconstituting MEF2 function by expressing a neutralizing mutation in this phosphorylation motif restores miR-133a expression and mitochondrial membrane potential during lipotoxicity. Mechanistically, we demonstrate that miR-133a regulates mitochondrial function through translational inhibition of Nix, a protein that modulates mitophagy and cell death. Finally, we show that rats exposed to gestational diabetes during fetal development display, as young adults, muscle diacylglycerol accumulation concurrent with insulin resistance, reduced miR-133a, and elevated Nix expression. Given the diverse roles of miR-133a and Nix in regulating mitochondrial function and, in certain cancers, proliferation, dysregulation of this genetic pathway may have broad implications for insulin resistance, cardiovascular disease, and cancer biology.

  8. CCN2/CTGF regulates neovessel formation via targeting structurally conserved cystine knot motifs in multiple angiogenic regulators

    PubMed Central

    Pi, Liya; Shenoy, Anitha K.; Liu, Jianwen; Kim, Seungbum; Nelson, Nikole; Xia, Huiming; Hauswirth, William W.; Petersen, Bryon E.; Schultz, Gregory S.; Scott, Edward W.

    2012-01-01

    Blood vessels are formed during development and tissue repair through a plethora of modifiers that coordinate efficient vessel assembly in various cellular settings. Here we used the yeast 2-hybrid approach and demonstrated a broad affinity of connective tissue growth factor (CCN2/CTGF) to C-terminal cystine knot motifs present in key angiogenic regulators Slit3, von Willebrand factor, platelet-derived growth factor-B, and VEGF-A. Biochemical characterization and histological analysis showed close association of CCN2/CTGF with these regulators in murine angiogenesis models: normal retinal development, oxygen-induced retinopathy (OIR), and Lewis lung carcinomas. CCN2/CTGF and Slit3 proteins worked in concert to promote in vitro angiogenesis and downstream Cdc42 activation. A fragment corresponding to the first three modules of CCN2/CTGF retained this broad binding ability and gained a dominant-negative function. Intravitreal injection of this mutant caused a significant reduction in vascular obliteration and retinal neovascularization vs. saline injection in the OIR model. Knocking down CCN2/CTGF expression by short-hairpin RNA or ectopic expression of this mutant greatly decreased tumorigenesis and angiogenesis. These results provided mechanistic insight into the angiogenic action of CCN2/CTGF and demonstrated the therapeutic potential of dominant-negative CCN2/CTGF mutants for antiangiogenesis.—Pi, L., Shenoy, A. K., Liu, J., Kim, S., Nelson, N., Xia, H., Hauswirth, W. W., Petersen, B. E., Schultz, G. S., Scott, E. W. CCN2/CTGF regulates neovessel formation via targeting structurally conserved cystine knot motifs in multiple angiogenic regulators. PMID:22611085

  9. Local function conservation in sequence and structure space.

    PubMed

    Weinhold, Nils; Sander, Oliver; Domingues, Francisco S; Lengauer, Thomas; Sommer, Ingolf

    2008-07-04

    We assess the variability of protein function in protein sequence and structure space. Regions of this space differ considerably in the local conservation of molecular function. We analyze and capture local function conservation by means of logistic curves. Based on this analysis, we propose a method for predicting the molecular function of a query protein with known structure but unknown function. The prediction method is rigorously assessed and compared with a previously published function predictor. Furthermore, we apply the method to 500 functionally unannotated PDB structures and discuss selected examples. The proposed approach provides a simple yet consistent statistical model for the complex relations between protein sequence, structure, and function. The GOdot method is available online (http://godot.bioinf.mpi-inf.mpg.de).
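
    Ordinary logistic regression can illustrate the idea of capturing function conservation with a logistic curve. The probability that two proteins share a function is modeled as 1 / (1 + exp(-k(s - s0))), where s is a similarity score. This is a generic sketch, not the GOdot method. The similarity scores, labels, learning rate and step count are hypothetical.

```python
import math


def logistic(s: float, k: float, s0: float) -> float:
    """P(function conserved | similarity s); equals 0.5 at s = s0."""
    return 1.0 / (1.0 + math.exp(-k * (s - s0)))


def fit_logistic(xs, ys, lr=0.5, steps=5000):
    """Fit slope k and midpoint s0 by gradient ascent on the log-likelihood.

    Internally uses the linear form a + b*s, so k = b and s0 = -a / b.
    """
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += y - p
            grad_b += (y - p) * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return b, -a / b


# Hypothetical pairs: sequence identity (0-1) and whether function is shared.
identity = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
shared = [0, 0, 0, 1, 0, 1, 1, 1, 1]
k, s0 = fit_logistic(identity, shared)
print(f"k = {k:.1f}, s0 = {s0:.2f}")
```

The toy labels overlap around s = 0.45, so the maximum-likelihood fit is finite. With perfectly separable data, k would keep growing with more steps.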

  10. A conserved Glu-Arg salt bridge connects co-evolved motifs that define the eukaryote protein kinase fold

    PubMed Central

    Yang, Jie; Wu, Jian; Steichen, Jon M.; Kornev, Alexandr P.; Deal, Michael S.; Li, Sheng; Sankaran, Banumathi; Woods, Virgil L.; Taylor, Susan S.

    2012-01-01

    Eukaryotic protein kinases (EPKs) feature two co-evolved structural segments: the Activation segment, which starts with the Asp-Phe-Gly (DFG) motif and ends with the Ala-Pro-Glu (APE) motif, and the helical GHI-subdomain, which comprises the αG, αH and αI helices. Eukaryotic-like kinases have a much shorter Activation segment and lack the GHI-subdomain. They thus lack the conserved salt bridge between the APE Glu and an Arg from the GHI-subdomain, a hallmark signature of EPKs. Although the conservation of this salt bridge in EPKs is well known and its involvement in disease has been illustrated by polymorphism analysis, its function has not been carefully studied. In this work, we use murine cAMP-dependent protein kinase (PKA) as the model enzyme (Glu208 and Arg280) to examine the role of these two residues. We showed that Ala replacement of either residue caused a 40- to 120-fold decrease in the catalytic efficiency of the enzyme, due to an increase in Km(ATP) and a decrease in kcat. Crystal structures, as well as solution studies, also demonstrate that this ion pair contributes to the hydrophobic network and stability of the enzyme. We show that mutation of either Glu or Arg to Ala renders both mutant proteins less effective substrates for the upstream kinase phosphoinositide-dependent kinase 1. We propose that the Glu208-Arg280 pair serves as a central hub of connectivity between these two structurally conserved elements in EPKs. Mutations of either residue disrupt communication not only between the two segments but also within the rest of the molecule, leading to altered catalytic activity and enzyme regulation. PMID:22138346
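
    Catalytic efficiency is kcat/Km, so a drop in kcat and a rise in Km(ATP) compound multiplicatively. The minimal sketch below shows the fold-change arithmetic. The kinetic constants are hypothetical. They were chosen only to show that a 10-fold lower kcat combined with a 6-fold higher Km gives a 60-fold loss, which lies within the 40- to 120-fold range reported.

```python
def catalytic_efficiency(kcat: float, km: float) -> float:
    """kcat/Km, e.g. in s^-1 uM^-1 when kcat is in s^-1 and Km in uM."""
    return kcat / km


def fold_decrease(kcat_wt: float, km_wt: float, kcat_mut: float, km_mut: float) -> float:
    """How many times lower the mutant's kcat/Km is than the wild type's."""
    return catalytic_efficiency(kcat_wt, km_wt) / catalytic_efficiency(kcat_mut, km_mut)


# Hypothetical constants: kcat 20 -> 2 s^-1, Km(ATP) 10 -> 60 uM.
print(round(fold_decrease(20.0, 10.0, 2.0, 60.0), 1))  # 60.0
```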

  11. Internal epitope tagging informed by relative lack of sequence conservation

    PubMed Central

    Burg, Leonard; Zhang, Karen; Bonawitz, Tristan; Grajevskaja, Viktorija; Bellipanni, Gianfranco; Waring, Richard; Balciunas, Darius

    2016-01-01

    Many experimental techniques rely on specific recognition and stringent binding of proteins by antibodies. This can readily be achieved by introducing an epitope tag. We employed an approach that uses a relative lack of evolutionary conservation to inform epitope tag site selection, followed by integration of the tag-coding sequence into the endogenous locus in zebrafish. We demonstrate that an internal epitope tag is accessible for antibody binding, and that tagged proteins retain wild type function. PMID:27892520
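    One simple way to make "relative lack of conservation" concrete is as follows. Score each column of an ortholog alignment by its Shannon entropy. Then take the window with the highest mean entropy as a candidate internal tag site. This is an illustrative sketch only. The entropy measure, window width and toy alignment are assumptions, not the authors' procedure.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())


def least_conserved_window(alignment: list[str], width: int) -> int:
    """Return the 0-based start column of the window with the highest mean entropy."""
    length = len(alignment[0])
    entropies = [column_entropy("".join(seq[i] for seq in alignment)) for i in range(length)]
    best_start, best_score = 0, float("-inf")
    for start in range(length - width + 1):
        score = sum(entropies[start:start + width]) / width
        if score > best_score:
            best_start, best_score = start, score
    return best_start


# Toy alignment of hypothetical orthologs: 0-based columns 3-5 are the most variable.
toy = ["MKTAYGLK", "MKTEWHLK", "MKTPRCLK", "MKTDQSLK"]
print(least_conserved_window(toy, width=3))
```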

  12. Prevalent Sequences in the Human Genome Can Form Mini i-Motif Structures at Physiological pH.

    PubMed

    Mir, Bartomeu; Serrano, Israel; Buitrago, Diana; Orozco, Modesto; Escaja, Núria; González, Carlos

    2017-10-11

    We report here the solution structure of several repetitive DNA sequences containing d(TCGTTCCGT) and related repeats. At physiological pH, these sequences fold into i-motif like quadruplexes in which every two repeats a globular structure is stabilized by two hemiprotonated C:C(+) base pairs, flanked by two minor groove tetrads resulting from the association of G:C or G:T base pairs. The interaction between the minor groove tetrads and the nearby C:C(+) base pairs affords a strong stabilization, which results in effective pHT values above 7.5. Longer sequences with more than two repeats are able to fold in tandem, forming a rosary bead-like structure. Bioinformatics analysis shows that these sequences are prevalent in the human genome, and are present in development-related genes.

  13. ML2Motif—Reliable extraction of discriminative sequence motifs from learning machines

    PubMed Central

    Kloft, Marius; Müller, Klaus-Robert; Görnitz, Nico

    2017-01-01

    High prediction accuracies are not the only objective to consider when solving problems using machine learning. Instead, particular scientific applications require some explanation of the learned prediction function. For computational biology, positional oligomer importance matrices (POIMs) have been successfully applied to explain the decision of support vector machines (SVMs) using weighted-degree (WD) kernels. To extract relevant biological motifs from POIMs, the motifPOIM method has been devised and showed promising results on real-world data. Our contribution in this paper is twofold: as an extension to POIMs, we propose gPOIM, a general measure of feature importance for arbitrary learning machines and feature sets (including, but not limited to, SVMs and CNNs) and devise a sampling strategy for efficient computation. As a second contribution, we derive a convex formulation of motifPOIMs that leads to more reliable motif extraction from gPOIMs. Empirical evaluations confirm the usefulness of our approach on artificially generated data as well as on real-world datasets. PMID:28346487

  14. Bayesian Markov models consistently outperform PWMs at predicting motifs in nucleotide sequences

    PubMed Central

    Siebert, Matthias; Söding, Johannes

    2016-01-01

    Position weight matrices (PWMs) are the standard model for DNA and RNA regulatory motifs. In PWMs nucleotide probabilities are independent of nucleotides at other positions. Models that account for dependencies need many parameters and are prone to overfitting. We have developed a Bayesian approach for motif discovery using Markov models in which conditional probabilities of order k − 1 act as priors for those of order k. This Bayesian Markov model (BaMM) training automatically adapts model complexity to the amount of available data. We also derive an EM algorithm for de-novo discovery of enriched motifs. For transcription factor binding, BaMMs achieve significantly (P = 1/16) higher cross-validated partial AUC than PWMs in 97% of 446 ChIP-seq ENCODE datasets and improve performance by 36% on average. BaMMs also learn complex multipartite motifs, improving predictions of transcription start sites, polyadenylation sites, bacterial pause sites, and RNA binding sites by 26–101%. BaMMs never performed worse than PWMs. These robust improvements argue in favour of generally replacing PWMs by BaMMs. PMID:27288444
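    The central idea above is that order-(k − 1) conditional probabilities act as priors for order-k ones. A common way to write this is as a pseudocount estimate pulled toward the lower-order model. The sketch below assumes that form and applies it to a homogeneous (position-independent) DNA model for brevity. BaMMs as described are position-specific motif models, and the EM discovery step is not shown. The prior strength `alpha` is a hypothetical parameter.

```python
from collections import defaultdict

ALPHABET = "ACGT"


def train_bayesian_markov(seqs, max_order, alpha=1.0):
    """Return probs[k][context][base] = P(base | preceding k bases) for k = 0..max_order.

    Each order-k estimate uses the order-(k-1) estimate as its prior:
        p_k(a | ctx) = (n(ctx, a) + alpha * p_{k-1}(a | ctx[1:])) / (n(ctx) + alpha)
    The order-0 prior is uniform over ALPHABET (uppercase ACGT input assumed).
    """
    probs = []
    for k in range(max_order + 1):
        # counts[context][base]: how often `base` follows the k-mer `context`.
        counts = defaultdict(lambda: defaultdict(int))
        for s in seqs:
            for i in range(k, len(s)):
                counts[s[i - k:i]][s[i]] += 1
        table = {}
        for ctx, nexts in counts.items():
            total = sum(nexts.values())
            table[ctx] = {}
            for a in ALPHABET:
                prior = probs[k - 1][ctx[1:]][a] if k > 0 else 1.0 / len(ALPHABET)
                table[ctx][a] = (nexts[a] + alpha * prior) / (total + alpha)
        probs.append(table)
    return probs


def conditional_prob(probs, context, base):
    """P(base | context). An unseen order-k context falls back to its (k-1)-suffix,
    which is what the formula gives when n(ctx) = 0."""
    k = min(len(context), len(probs) - 1)
    while k > 0:
        ctx = context[len(context) - k:]
        if ctx in probs[k]:
            return probs[k][ctx][base]
        k -= 1
    return probs[0][""][base]


probs = train_bayesian_markov(["ACGTACGTAC", "ACGAACGT"], max_order=2, alpha=2.0)
print(round(conditional_prob(probs, "AC", "G"), 3))
```

    When little data support a long context, the estimate stays close to the lower-order prediction. Well-supported contexts move toward their observed counts, which is the automatic complexity adaptation the abstract describes.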

  15. Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas

    PubMed Central

    Petrov, Anton I.; Zirbel, Craig L.; Leontis, Neocles B.

    2013-01-01

    The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson–Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules, but can also occur at nonhomologous sites in diverse RNAs, and which often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access. PMID:23970545

  16. Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas.

    PubMed

    Petrov, Anton I; Zirbel, Craig L; Leontis, Neocles B

    2013-10-01

    The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson-Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules, but can also occur at nonhomologous sites in diverse RNAs, and which often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access.

  17. Gibbs motif sampling: detection of bacterial outer membrane protein repeats.

    PubMed Central

    Neuwald, A. F.; Liu, J. S.; Lawrence, C. E.

    1995-01-01

    The detection and alignment of locally conserved regions (motifs) in multiple sequences can provide insight into protein structure, function, and evolution. A new Gibbs sampling algorithm is described that detects motif-encoding regions in sequences and optimally partitions them into distinct motif models; this is illustrated using a set of immunoglobulin fold proteins. When applied to sequences sharing a single motif, the sampler can be used to classify motif regions into related submodels, as is illustrated using helix-turn-helix DNA-binding proteins. Other statistically based procedures are described for searching a database for sequences matching motifs found by the sampler. When applied to a set of 32 very distantly related bacterial integral outer membrane proteins, the sampler revealed that they share a subtle, repetitive motif. Although BLAST (Altschul SF et al., 1990, J Mol Biol 215:403-410) fails to detect significant pairwise similarity between any of the sequences, the repeats present in these outer membrane proteins, taken as a whole, are highly significant (based on a generally applicable statistical test for motifs described here). Analysis of bacterial porins with known trimeric beta-barrel structure and related proteins reveals a similar repetitive motif corresponding to alternating membrane-spanning beta-strands. These beta-strands occur on the membrane interface (as opposed to the trimeric interface) of the beta-barrel. The broad conservation and structural location of these repeats suggest that they play important functional roles. PMID:8520488
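    The sampler described above does more than this: it handles multiple motif models, protein sequences and significance testing. The sketch below shows only the basic predictive-update step of a Gibbs site sampler. It assumes a DNA alphabet, exactly one site per sequence, a uniform background and a hypothetical pseudocount of 0.5. It is not the authors' implementation.

```python
import math
import random

ALPHABET = "ACGT"


def build_pwm(seqs, starts, width, skip, pseudocount=0.5):
    """Column-wise base frequencies of the current sites, leaving out sequence `skip`."""
    pwm = []
    for j in range(width):
        counts = {a: pseudocount for a in ALPHABET}
        for idx, (s, st) in enumerate(zip(seqs, starts)):
            if idx != skip:
                counts[s[st + j]] += 1
        total = sum(counts.values())
        pwm.append({a: c / total for a, c in counts.items()})
    return pwm


def gibbs_site_sampler(seqs, width, iterations=500, seed=0):
    """Basic site sampler assuming exactly one motif site of length `width` per sequence."""
    rng = random.Random(seed)
    background = {a: 1.0 / len(ALPHABET) for a in ALPHABET}  # uniform background (assumption)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        i = rng.randrange(len(seqs))  # pick one sequence to resample
        pwm = build_pwm(seqs, starts, width, skip=i)
        s = seqs[i]
        weights = []
        for st in range(len(s) - width + 1):
            # Motif-vs-background likelihood ratio of the window starting at `st`.
            log_ratio = sum(
                math.log(pwm[j][s[st + j]] / background[s[st + j]]) for j in range(width)
            )
            weights.append(math.exp(log_ratio))
        # Draw the new site position in proportion to the likelihood ratios.
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


seqs = ["GGTGACGTCAAT", "TGACGTCAGGCC", "CCATTGACGTCA"]
starts = gibbs_site_sampler(seqs, width=8, iterations=200, seed=1)
print([s[st:st + 8] for s, st in zip(seqs, starts)])
```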

  18. Using the Gibbs Motif Sampler for Phylogenetic Footprinting

    SciTech Connect

    Thompson, William; Conlan, Sean; McCue, Lee Ann; Lawrence, Charles

    2007-07-01

    The Gibbs Motif Sampler (Gibbs) (1) is a software package used to predict conserved elements in biopolymer sequences. While the software can be used to locate conserved motifs in protein sequences, its most common use is the prediction of transcription factor binding sites (TFBSs) in promoters upstream of gene sequences. We will describe approaches that use Gibbs to locate TFBSs in a collection of orthologous nucleotide sequences, i.e. phylogenetic footprinting. To illustrate this technique, we present examples that use Gibbs to detect binding sites for the transcription factor LexA in orthologous sequence data from representative species belonging to two different proteobacterial divisions.

  19. The amino acid motif L/IIxxFE defines a novel actin-binding sequence in PDZ-RhoGEF.

    PubMed

    Banerjee, Jayashree; Fischer, Christopher C; Wedegaertner, Philip B

    2009-08-25

    PDZ-RhoGEF is a member of the family of regulator of G protein signaling (RGS) domain-containing RhoGEFs (RGS-RhoGEFs), which link activated heterotrimeric G protein alpha subunits of the G12 family to activation of the small GTPase RhoA. Unique among the RGS-RhoGEFs, PDZ-RhoGEF contains a short sequence that localizes the protein to the actin cytoskeleton. In this report, we demonstrate that the actin-binding domain, located between amino acids 561 and 585, directly binds to F-actin in vitro. Extensive mutagenesis identifies isoleucine 568, isoleucine 569, phenylalanine 572, and glutamic acid 573 as being necessary for binding to actin and for colocalization with the actin cytoskeleton in cells. These results define a novel actin-binding sequence in PDZ-RhoGEF with a critical amino acid motif of IIxxFE. Moreover, sequence analysis identifies a similar actin-binding motif in the N-terminus of the RhoGEF frabin, and as with PDZ-RhoGEF, mutagenesis and actin interaction experiments demonstrate an LIxxFE motif, consisting of the key amino acids leucine 23, isoleucine 24, phenylalanine 27, and glutamic acid 28. Taken together, results with PDZ-RhoGEF and frabin identify a novel actin-binding sequence. Lastly, inducible dimerization of the actin-binding region of PDZ-RhoGEF revealed a dimerization-dependent actin bundling activity in vitro. PDZ-RhoGEF exists in cells as a dimer, raising the possibility that PDZ-RhoGEF could influence actin structure in a manner independent of its ability to activate RhoA.
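    The L/IIxxFE pattern maps directly onto a regular expression: L or I, then I, then any two residues, then F and E. A minimal scanning sketch follows. The test sequence is a hypothetical toy, not the PDZ-RhoGEF or frabin sequence. A match by itself is only a candidate; the abstract establishes actin binding by mutagenesis and binding assays.

```python
import re

# [LI]: Leu or Ile; I: Ile; ..: any two residues; F: Phe; E: Glu.
# The lookahead lets overlapping occurrences be reported.
ACTIN_BINDING_MOTIF = re.compile(r"(?=([LI]I..FE))")


def find_motif(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched segment) for each motif occurrence."""
    return [(m.start() + 1, m.group(1)) for m in ACTIN_BINDING_MOTIF.finditer(protein)]


print(find_motif("MSLIQKFEAAIIRTFEG"))  # hypothetical toy sequence
```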

  20. Definition of the tempo of sequence diversity across an alignment and automatic identification of sequence motifs: Application to protein homologous families and superfamilies.

    PubMed

    May, Alex C W

    2002-12-01

    It is often possible to identify sequence motifs that characterize a protein family in terms of its fold and/or function from aligned protein sequences. Such motifs can be used to search for new family members. Partitioning of sequence alignments into regions of similar amino acid variability is usually done by hand. Here, I present a completely automatic method for this purpose: one that is guaranteed to produce globally optimal solutions at all levels of partition granularity. The method is used to compare the tempo of sequence diversity across reliable three-dimensional (3D) structure-based alignments of 209 protein families (HOMSTRAD) and that for 69 superfamilies (CAMPASS). (The mean alignment lengths for HOMSTRAD and CAMPASS are very similar.) Surprisingly, the optimal segmentation distributions for the closely related proteins and distantly related ones are found to be very similar. Also, optimal segmentation identifies an unusual protein superfamily. Finally, protein 3D structure clues from the tempo of sequence diversity across alignments are examined. The method is general, and could be applied to any area of comparative biological sequence and 3D structure analysis where the constraint of the inherent linear organization of the data imposes an ordering on the set of objects to be clustered.
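    Globally optimal partitioning of an ordered sequence into contiguous segments is classically solved by dynamic programming. The sketch below is that textbook approach, not necessarily the author's exact objective. It splits per-column variability scores (for example, column entropies) into `k` segments and minimizes the total within-segment squared error. Running it for each `k` gives optimal solutions at every granularity.

```python
def optimal_segmentation(values: list[float], k: int) -> tuple[float, list[int]]:
    """Split `values` into k contiguous segments minimizing total within-segment squared error.

    Returns (cost, boundaries), where boundaries are the start indices of segments 2..k.
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")
    prefix, prefix_sq = [0.0], [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def sse(i: int, j: int) -> float:
        """Squared error of values[i:j] around its own mean."""
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[m][j]: best cost of splitting values[:j] into m segments; back[m][j]: last start.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = cost[m - 1][i] + sse(i, j)
                if c < cost[m][j]:
                    cost[m][j], back[m][j] = c, i
    # Trace back the start index of each segment, from last to first.
    starts, j = [], n
    for m in range(k, 0, -1):
        j = back[m][j]
        starts.append(j)
    return cost[k][n], sorted(starts)[1:]


variability = [0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.2, 0.2]  # hypothetical per-column scores
print(optimal_segmentation(variability, k=3))
```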

  1. Properties of Sequence Conservation in Upstream Regulatory and Protein Coding Sequences among Paralogs in Arabidopsis thaliana

    NASA Astrophysics Data System (ADS)

    Richardson, Dale N.; Wiehe, Thomas

    Whole genome duplication (WGD) has catalyzed the formation of new species, genes with novel functions, altered expression patterns, complexified signaling pathways and has provided organisms a level of genetic robustness. We studied the long-term evolution and interrelationships of 5’ upstream regulatory sequences (URSs), protein coding sequences (CDSs) and expression correlations (EC) of duplicated gene pairs in Arabidopsis. Three distinct methods revealed significant evolutionary conservation between paralogous URSs, and this conservation was highly correlated with the microarray-based expression correlation of the respective gene pairs. Positional information on exact matches between sequences unveiled the contribution of micro-chromosomal rearrangements to expression divergence. A three-way rank analysis of URS similarity, CDS divergence and EC uncovered specific gene functional biases. Transcription factor activity was associated with gene pairs exhibiting conserved URSs and divergent CDSs, whereas a broad array of metabolic enzymes was found to be associated with gene pairs showing diverged URSs but conserved CDSs.

  2. Conservation patterns in different functional sequence categories of divergent Drosophila species

    SciTech Connect

    Papatsenko, Dmitri; Kislyuk, Andrey; Levine, Michael; Dubchak, Inna

    2005-10-01

    We have explored the distributions of fully conserved ungapped blocks in genome-wide pairwise alignments of recently sequenced Drosophila species: D. yakuba, D. ananassae, D. pseudoobscura, D. virilis and D. mojavensis. Based on these distributions, we have found that nearly every functional sequence category possesses its own distinctive conservation pattern, sometimes independent of the overall sequence conservation level. In the coding and regulatory regions, the ungapped blocks were longer than in introns, UTRs and non-functional sequences. At the same time, the blocks in the coding regions carried a 3N+2 signature characteristic of synonymous substitutions in the third codon position. Larger block sizes in transcription regulatory regions can be explained by the presence of conserved arrays of binding sites for transcription factors. We have also shown that the longest ungapped blocks, or 'ultraconserved' sequences, are associated with specific gene groups, including those encoding ion channels and components of the cytoskeleton. We discuss how restrained conservation patterns may help in mapping functional sequence categories and improving genome annotation.
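    Block lengths like these can be read directly off a pairwise alignment. A block is a maximal run of identical, ungapped columns. When substitutions fall only at third codon positions, consecutive differences are separated by 3N + 2 identical bases, so block lengths cluster at 2 modulo 3. A minimal sketch follows; the aligned toy sequences are hypothetical.

```python
from collections import Counter


def conserved_block_lengths(aligned_a: str, aligned_b: str) -> list[int]:
    """Lengths of maximal runs of identical, ungapped columns in a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    lengths, run = [], 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == y and x != "-":
            run += 1
        else:
            if run:
                lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths


# Hypothetical coding alignment with substitutions only at third codon positions.
lengths = conserved_block_lengths("ATGGCTAAACGT", "ATCGCAAAGCGA")
print(lengths, Counter(n % 3 for n in lengths))
```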

  3. Conservative Patch Algorithm and Mesh Sequencing for PAB3D

    NASA Technical Reports Server (NTRS)

    Pao, S. P.; Abdol-Hamid, K. S.

    2005-01-01

    A mesh-sequencing algorithm and a conservative patched-grid-interface algorithm (hereafter "Patch Algorithm") have been incorporated into the PAB3D code, which is a computer program that solves the Navier-Stokes equations for the simulation of subsonic, transonic, or supersonic flows surrounding an aircraft or other complex aerodynamic shapes. These algorithms are efficient, flexible, and have added tremendously to the capabilities of PAB3D. The mesh-sequencing algorithm makes it possible to perform preliminary computations using only a fraction of the grid cells (provided the original cell count is divisible by an integer) along any grid coordinate axis, independently of the other axes. The patch algorithm addresses another critical need in multi-block grid situations where the cell faces of adjacent grid blocks may not coincide, leading to errors in calculating fluxes of conserved physical quantities across interfaces between the blocks. The patch algorithm, based on the Stokes integral formulation of the applicable conservation laws, effectively matches each of the interfacial cells on one side of the block interface to the corresponding fractional cell area pieces on the other side. This approach is comprehensive and unified such that all interface topology is automatically processed without user intervention. This algorithm is implemented in a preprocessing code that creates a cell-by-cell database that will maintain flux conservation at any level of full or reduced grid density as the user may choose by way of the mesh-sequencing algorithm. These two algorithms have enhanced the numerical accuracy of the code, reduced the time and effort for grid preprocessing, and provided users with the flexibility of performing computations at any desired full or reduced grid resolution to suit their specific computational requirements.
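    The key property of a conservative patched interface is that each face's flux is split among the non-matching faces on the other side in proportion to overlap. The total crossing the interface is therefore preserved. PAB3D does this with fractional face areas in 3D. The sketch below is a simplified 1D analogue, using overlap lengths on a line, to illustrate the bookkeeping. The function names are hypothetical.

```python
def overlap_weights(left_edges: list[float], right_edges: list[float]) -> dict[tuple[int, int], float]:
    """Overlap lengths between faces on the two sides of a non-matching 1D interface.

    Face i spans [edges[i], edges[i + 1]]; both edge lists are sorted and cover the same span.
    Returns {(i, j): overlap length} for every pair of faces that intersect.
    """
    weights = {}
    i = j = 0
    while i < len(left_edges) - 1 and j < len(right_edges) - 1:
        lo = max(left_edges[i], right_edges[j])
        hi = min(left_edges[i + 1], right_edges[j + 1])
        if hi > lo:
            weights[(i, j)] = hi - lo
        # Advance whichever face ends first.
        if left_edges[i + 1] < right_edges[j + 1]:
            i += 1
        else:
            j += 1
    return weights


def transfer_flux(left_flux: list[float], left_edges: list[float], right_edges: list[float]) -> list[float]:
    """Distribute each left face's flux to right faces in proportion to overlap length."""
    right_flux = [0.0] * (len(right_edges) - 1)
    for (i, j), length in overlap_weights(left_edges, right_edges).items():
        face_length = left_edges[i + 1] - left_edges[i]
        right_flux[j] += left_flux[i] * length / face_length
    return right_flux


right = transfer_flux([2.0, 4.0, 6.0], [0.0, 1.0, 2.0, 3.0], [0.0, 1.5, 3.0])
print(right, sum(right))  # the total flux matches the left side
```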

  4. Sequence conservation predicts T cell reactivity against ragweed allergens

    PubMed Central

    Pham, John; Oseroff, Carla; Hinz, Denise; Sidney, John; Paul, Sinu; Greenbaum, Jason; Vita, Randi; Phillips, Elizabeth; Mallal, Simon; Peters, Bjoern; Sette, Alessandro

    2016-01-01

    Background: Ragweed is a major cause of seasonal allergy, affecting millions of people worldwide. Several allergens have been defined based on IgE reactivity, but their relative immunogenicity in terms of T cell responses has not been studied. Objective: We comprehensively characterized T cell responses from atopic, ragweed-allergic subjects to Amb a 1, Amb a 3, Amb a 4, Amb a 5, Amb a 6, Amb a 8, Amb a 9, Amb a 10, Amb a 11, and Amb p 5, and examined their correlation with serological reactivity and sequence conservation in other allergens. Methods: Peripheral blood mononuclear cells (PBMCs) from donors positive for IgE toward ragweed extracts were tested, after in vitro expansion, for secretion of IL-5 (a representative Th2 cytokine) and IFNγ (Th1) in response to a panel of overlapping peptides spanning the above-listed allergens. Results: Three previously identified dominant T cell epitopes (Amb a 1 176–191, 200–215, and 344–359) were confirmed and three novel dominant epitopes (Amb a 1 280–295, 304–319, and 320–335) were identified. Amb a 1, the dominant IgE allergen, was also the dominant T cell allergen, but dominance patterns for T cell and IgE responses for the other ragweed allergens did not correlate. Dominance for T cell responses correlated with conservation of ragweed epitopes with sequences of other well-known allergens. Conclusion and clinical relevance: These results provide the first assessment of the hierarchy of T cell reactivity in ragweed allergens, which is distinct from that observed for IgE reactivity and influenced by T cell epitope sequence conservation. The results suggest that ragweed allergens associated with lesser IgE reactivity and significant T cell reactivity may be targeted for T cell immunotherapy, and further support the development of immunotherapies against epitopes conserved across species to generate broad reactivity against many common allergens. PMID:27359111

  5. Conservation patterns in angiosperm rDNA ITS2 sequences.

    PubMed Central

    Hershkovitz, M A; Zimmer, E A

    1996-01-01

    The two internal transcribed spacers (ITS1 and ITS2) of nuclear ribosomal DNA have become commonly exploited sources of informative variation for interspecific-/intergeneric-level phylogenetic analyses among angiosperms and other eukaryotes. We present an alignment in which one-third to one-half of the ITS2 sequence is alignable above the family level in angiosperms and a phenetic analysis showing that ITS2 contains information sufficient to diagnose lineages at several hierarchical levels. Base compositional analysis shows that angiosperm ITS2 is inherently GC-rich, and that the proportion of T is much more variable than that for other bases. We propose a general model of angiosperm ITS2 secondary structure that shows common pairing relationships for most of the conserved sequence tracts. Variations in our secondary structure predictions for sequences from different taxa indicate that compensatory mutation is not limited to paired positions. PMID:8760866

  6. Composite Conserved Promoter–Terminator Motifs (PeSLs) that Mediate Modular Shuffling in the Diverse T4-Like Myoviruses

    PubMed Central

    Comeau, André M.; Arbiol, Christine; Krisch, Henry M.

    2014-01-01

    The diverse T4-like phages (Tquatrovirinae) infect a wide array of gram-negative bacterial hosts. The genome architecture of these phages is generally well conserved, with most of the phylogenetically variable genes grouped together in a series of hyperplastic regions (HPRs) that are interspersed among large blocks of conserved core genes. Recent evidence from a pair of closely related T4-like phages has suggested that small, composite terminator/promoter sequences (promoter early stem loops [PeSLs]) are implicated in mediating the high levels of genetic plasticity produced by indels within the HPRs. Here, we present the genome sequence analysis of two T4-like phages, PST (168 kb, 272 open reading frames [ORFs]) and nt-1 (248 kb, 405 ORFs). These two phages were chosen for comparative sequence analysis because, although they are closely related to phages that have been previously sequenced (T4 and KVP40, respectively), they have different host ranges. In each case, one member of the pair infects a bacterial strain that is a human pathogen, whereas the other phage’s host is a nonpathogen. Despite belonging to phylogenetically distant branches of the T4-likes, these pairs of phage have diverged from each other in part by a mechanism apparently involving PeSL-mediated recombination. This analysis confirms a role of PeSL sequences in the generation of genomic diversity by serving as a point of genetic exchange between otherwise unrelated sequences within the HPRs. Finally, the palette of divergent genes swapped by PeSL-mediated homologous recombination is discussed in the context of the PeSLs’ potentially important role in facilitating phage adaptation to new hosts and environments. PMID:24951563

  7. HIV-1 conserved-element vaccines: relationship between sequence conservation and replicative capacity.

    PubMed

    Rolland, Morgane; Manocheewa, Siriphan; Swain, J Victor; Lanxon-Cookson, Erinn C; Kim, Moon; Westfall, Dylan H; Larsen, Brendan B; Gilbert, Peter B; Mullins, James I

    2013-05-01

    To overcome the problem of HIV-1 variability, candidate vaccine antigens have been designed to be composed of conserved elements of the HIV-1 proteome. Such candidate vaccines could be improved with a better understanding of both HIV-1 evolutionary constraints and the fitness cost of specific mutations. We evaluated the in vitro fitness cost of 23 mutations engineered in the HIV-1 subtype B Gag-p24 Center-of-Tree (COT) protein through fitness competition assays. While some mutations at conserved sites exacted a high fitness cost, as expected under the assumption that the most conserved residue confers the highest fitness, there was no overall strong relationship between sequence conservation and replicative capacity. By comparing sites that have evolved since the beginning of the epidemic to those that have remained unchanged, we found that sites that have evolved over time were more likely to correspond to HLA-associated sites and that their mutation had limited fitness costs. Our data showed no transcendent link between high conservation and high fitness cost, indicating that merely focusing on conserved segments of HIV-1 would not be sufficient for a successful vaccine strategy. Nonetheless, a subset of sites exacted a high fitness cost upon mutation; these sites have been under selective pressure to change since the beginning of the epidemic but have proved virtually nonmutable and could constitute preferred targets for vaccine design.

  8. A conserved secondary structural motif in 23S rRNA defines the site of interaction of amicetin, a universal inhibitor of peptide bond formation.

    PubMed Central

    Leviev, I G; Rodriguez-Fonseca, C; Phan, H; Garrett, R A; Heilek, G; Noller, H F; Mankin, A S

    1994-01-01

    The binding site and probable site of action have been determined for the universal antibiotic amicetin which inhibits peptide bond formation. Evidence from in vivo mutants, site-directed mutations and chemical footprinting all implicate a highly conserved motif in the secondary structure of the 23S-like rRNA close to the central circle of domain V. We infer that this motif lies at, or close to, the catalytic site in the peptidyl transfer centre. The binding site of amicetin is the first of a group of functionally related hexose-cytosine inhibitors to be localized on the ribosome. PMID:8157007

  9. Evolutionarily Conserved Dual Lysine Motif Determines the Non-Chaperone Function of Secreted Hsp90alpha in Tumor Progression

    PubMed Central

    Sahu, Divya; Hou, Yingping; Tsen, Fred; Tong, Chang; O’Brien, Kathryn; Situ, Alan J.; Schmidt, Thomas; Chen, Mei; Ying, Qilong; Ulmer, Tobias S.; Woodley, David T.; Li, Wei

    2016-01-01

    Both intracellular and extracellular heat shock protein-90 (Hsp90) family proteins (α and β) have been shown to support tumor progression. The tumor-promoting activity of the intracellular Hsp90 proteins is attributed to their N-terminal ATPase-driven chaperone function. What determines the extracellular function of secreted Hsp90 was unclear. Here we show that knocking out Hsp90α abolishes the ability of tumor cells to migrate, invade and metastasize without affecting cell survival and growth. Knocking out Hsp90β leads to cell death. Extracellular supplementation with recombinant Hsp90α, but not Hsp90β, protein recovers the tumorigenicity of Hsp90α-knockout cells. Sequential mutagenesis identifies two evolutionarily conserved lysine residues, lys-270 and lys-277, in the Hsp90α subfamily that determine the extracellular Hsp90α function. The Hsp90β subfamily lacks the dual lysine motif and does not show the same extracellular function. Substitutions of gly-262 and thr-269 in Hsp90β with lysines convert Hsp90β to act as Hsp90α outside the cells. A monoclonal antibody, 1G6-D7, against the dual lysine region of secreted Hsp90α blocks de novo tumor formation and significantly inhibits expansion of already formed tumors. This study suggests selectively targeting extracellular Hsp90α as an alternative therapeutic approach to the conventional one of targeting the ATPase of intracellular Hsp90α and Hsp90β in cancer. PMID:27721406

  10. Exploring the conserved water site and hydration of a coiled-coil trimerisation motif: a MD simulation study.

    PubMed

    Dolenc, Jozica; Baron, Riccardo; Missimer, John H; Steinmetz, Michel O; van Gunsteren, Wilfred F

    2008-07-21

    The solvent structure and dynamics around ccbeta-p, a 17-residue peptide that forms a parallel three-stranded alpha-helical coiled coil in solution, was analysed through 10 ns explicit solvent molecular dynamics (MD) simulations at 278 and 330 K. Comparison with two corresponding simulations of the monomeric form of ccbeta-p was used to investigate the changes of hydration upon coiled-coil formation. Pronounced peaks in the solvent density distribution between residues Arg8 and Glu13 of neighbouring helices show the presence of water bridges between the helices of the ccbeta-p trimer; this is in agreement with the water sites observed in X-ray crystallography experiments. Interestingly, this water site is structurally conserved in many three-stranded coiled coils and, together with the Arg and Glu residues, forms part of a motif that determines three-stranded coiled-coil formation. Our findings show that little direct correlation exists between the solvent density distribution and the temporal ordering of water around the trimeric coiled coil. The MD-calculated effective residence times of up to 40 ps show rapid exchange of surface water molecules with the bulk phase, and indicate that the solvent distribution around biomolecules requires interpretation in terms of continuous density distributions rather than in terms of discrete molecules of water. Together, our study contributes to understanding the principles of three-stranded coiled-coil formation.

  11. One to rule them all: A highly conserved motif in mariner transposase controls multiple steps of transposition.

    PubMed

    Bouuaert, Corentin Claeys; Tellier, Michael; Chalmers, Ronald

    2014-01-01

    The development of transposon-based genome manipulation tools can benefit greatly from understanding transposons' inherent regulatory mechanisms. The Tc1-mariner transposons, which are being widely used in biotechnological applications, are subject to a self-inhibitory mechanism whereby increasing transposase expression beyond a certain point decreases the rate of transposition. In a recent paper, Liu and Chalmers performed saturating mutagenesis on the highly conserved WVPHEL motif in the mariner-family transposase from the Hsmar1 element. Curiously, they found that the majority of all possible single mutations were hyperactive. Biochemical characterizations of the mutants revealed that the hyperactivity is due to a defect in communication between transposase subunits, which normally regulates transposition by reducing the rate of synapsis. This provides important clues for improving transposon-based tools. However, some WVPHEL mutants also showed features that would be undesirable for most biotechnological applications: they showed uncontrolled DNA cleavage activities and defects in the coordination of cleavage between the two transposon ends. The study illustrates how the knowledge of inhibitory mechanisms can help improve transposon tools but also highlights an important challenge, which is to specifically target a regulatory mechanism without affecting other important functions of the transposase.
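    Saturating mutagenesis of a six-residue motif such as WVPHEL means testing every single substitution to one of the other 19 standard amino acids, which gives 6 × 19 = 114 variants. A minimal enumeration sketch follows. Positions are numbered within the motif (1 to 6), not by Hsmar1 transposase residue numbers. The labeling scheme is an assumption for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_substitutions(motif: str) -> list[str]:
    """All single-residue substitution variants of `motif`, labeled like 'W1A'."""
    variants = []
    for pos, wild_type in enumerate(motif, start=1):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                variants.append(f"{wild_type}{pos}{aa}")
    return variants


mutants = single_substitutions("WVPHEL")
print(len(mutants), mutants[:3])
```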

  12. In Vivo Enhancer Analysis Chromosome 16 Conserved Noncoding Sequences

    SciTech Connect

    Pennacchio, Len A.; Ahituv, Nadav; Moses, Alan M.; Nobrega, Marcelo; Prabhakar, Shyam; Shoukry, Malak; Minovitsky, Simon; Visel, Axel; Dubchak, Inna; Holt, Amy; Lewis, Keith D.; Plajzer-Frick, Ingrid; Akiyama, Jennifer; De Val, Sarah; Afzal, Veena; Black, Brian L.; Couronne, Olivier; Eisen, Michael B.; Rubin, Edward M.

    2006-02-01

    The identification of enhancers with predicted specificities in vertebrate genomes remains a significant challenge that is hampered by a lack of experimentally validated training sets. In this study, we leveraged extreme evolutionary sequence conservation as a filter to identify putative gene regulatory elements and characterized the in vivo enhancer activity of human-fish conserved and ultraconserved noncoding elements on human chromosome 16 as well as such elements from elsewhere in the genome. We initially tested 165 of these extremely conserved sequences in a transgenic mouse enhancer assay and observed that 48 percent (79/165) functioned reproducibly as tissue-specific enhancers of gene expression at embryonic day 11.5. While driving expression in a broad range of anatomical structures in the embryo, the majority of the 79 enhancers drove expression in various regions of the developing nervous system. Studying a set of DNA elements that specifically drove forebrain expression, we identified DNA signatures specifically enriched in these elements and used these parameters to rank all ~3,400 human-fugu conserved noncoding elements in the human genome. The testing of the top predictions in transgenic mice resulted in a three-fold enrichment for sequences with forebrain enhancer activity. These data dramatically expand the catalogue of in vivo-characterized human gene enhancers and illustrate the future utility of such training sets for a variety of biological applications, including decoding the regulatory vocabulary of the human genome.

  13. Automated conserved non-coding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses.

    PubMed

    Turco, Gina; Schnable, James C; Pedersen, Brent; Freeling, Michael

    2013-01-01

    Conserved non-coding sequences (CNS) are islands of non-coding sequence that, like protein coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNS in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of arabidopsis, Japonica rice, foxtail millet, sorghum, brachypodium, and maize.

  14. Inter-specific sequence conservation and intra-individual sequence variation in a spider silk gene.

    PubMed

    Tai, Pei-Ling; Hwang, Guang-Yuh; Tso, I-Min

    2004-10-01

    Currently, studies on major ampullate spidroin 1 (MaSp1) genes of non-orb-weaving spiders are few, and it is not clear whether genes of these organisms exhibit the same characteristics as those of orb-weavers. In addition, many studies have proposed that MaSp1 might be a single gene with allelic variants, but supporting evidence is still lacking. In this study, we compared partial DNA and amino acid sequences of MaSp1 cloned from different spider guilds. We also cloned partial MaSp1 sequences from genomic DNA and cDNA of the same individual spiders, using the same primer combination, to see if different molecular forms existed. In the repetitive region of the partial MaSp1 sequences obtained, GGX, GA and poly-A motifs were present in all Araneomorphae and Mygalomorphae species examined. An extreme similarity in MaSp1 non-repetitive portions was found in sequences of ecribellate, cribellate and Mygalomorphae web-builders, and this result suggested that this sequence might have an important function. A comparison of sequences amplified from the same individual showed that amino acid substitutions occurred in both repetitive and non-repetitive regions, with a much higher variation in the former. These results suggest that the MaSp1 of Araneomorphae spiders exhibits several forms in an individual spider and that it might be either a multiple gene or a single gene with a multiple exon/intron organization.

  15. Sequence-specific DNA binding by MYC/MAX to low-affinity non-E-box motifs

    PubMed Central

    Allevato, Michael; Bolotin, Eugene; Grossman, Mark; Mane-Padros, Daniel; Sladek, Frances M.

    2017-01-01

    The MYC oncoprotein regulates transcription of a large fraction of the genome as an obligatory heterodimer with the transcription factor MAX. The MYC:MAX heterodimer and MAX:MAX homodimer (hereafter MYC/MAX) bind Enhancer box (E-box) DNA elements (CANNTG) and have the greatest affinity for the canonical MYC E-box (CME) CACGTG. However, MYC:MAX also recognizes E-box variants and was reported to bind DNA in a “non-specific” fashion in vitro and in vivo. Here, in order to identify potential additional non-canonical binding sites for MYC/MAX, we employed high throughput in vitro protein-binding microarrays, along with electrophoretic mobility-shift assays and bioinformatic analyses of MYC-bound genomic loci in vivo. We identified all hexameric motifs preferentially bound by MYC/MAX in vitro, which include the low-affinity non-E-box sequence AACGTT, and found that the vast majority (87%) of MYC-bound genomic sites in a human B cell line contain at least one of the top 21 motifs bound by MYC:MAX in vitro. We further show that high MYC/MAX concentrations are needed for specific binding to the low-affinity sequence AACGTT in vitro and that elevated MYC levels in vivo more markedly increase the occupancy of AACGTT sites relative to CME sites, especially at distal intergenic and intragenic loci. Hence, MYC binds diverse DNA motifs with a broad range of affinities in a sequence-specific and dose-dependent manner, suggesting that MYC overexpression has more selective effects on the tumor transcriptome than previously thought. PMID:28719624
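    The motif classes above can be illustrated with a minimal sequence scanner. This Python sketch assumes a plain A/C/G/T string and reports CANNTG E-boxes (flagging the canonical CACGTG) plus the non-E-box AACGTT hexamer. It does not model binding affinity or MYC/MAX concentration, and scan_hexamers is a hypothetical helper name. Because CANNTG and AACGTT are palindromic as patterns, scanning one strand locates every site.

```python
import re

# Hypothetical scanner for the hexamers discussed above. The lookahead
# wrappers let overlapping sites be reported; [ACGT]{2} encodes the "NN".
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")
NON_E_BOX = re.compile(r"(?=(AACGTT))")


def scan_hexamers(seq: str) -> list:
    """Return (0-based start, hexamer, label) tuples sorted by position."""
    seq = seq.upper()
    hits = []
    for m in E_BOX.finditer(seq):
        site = m.group(1)
        label = "canonical E-box (CME)" if site == "CACGTG" else "E-box variant"
        hits.append((m.start(), site, label))
    for m in NON_E_BOX.finditer(seq):
        hits.append((m.start(), m.group(1), "non-E-box AACGTT"))
    return sorted(hits)


if __name__ == "__main__":
    for hit in scan_hexamers("TTCACGTGAAAACGTTGGCAGCTGA"):
        print(hit)
```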

  16. A Glance at Microsatellite Motifs from 454 Sequencing Reads of Watermelon Genomic DNA

    USDA-ARS's Scientific Manuscript database

    A single 454 (Life Sciences Sequencing Technology) run of Charleston Gray watermelon (Citrullus lanatus var. lanatus) genomic DNA was performed and sequence data were assembled. A large scale identification of simple sequence repeat (SSR) was performed and SSR sequence data were used for the develo...

  17. Conservation analysis predicts in vivo occupancy of glucocorticoid receptor-binding sequences at glucocorticoid-induced genes.

    PubMed

    So, Alex Yick-Lun; Cooper, Samantha B; Feldman, Brian J; Manuchehri, Mitra; Yamamoto, Keith R

    2008-04-15

    The glucocorticoid receptor (GR) interacts with specific GR-binding sequences (GBSs) at glucocorticoid response elements (GREs) to orchestrate transcriptional networks. Although the sequences of the GBSs are highly variable among different GREs, the precise sequence within an individual GRE is highly conserved. In this study, we examined whether sequence conservation of sites resembling GBSs is sufficient to predict GR occupancy of GREs at genes responsive to glucocorticoids. Indeed, we found that the level of conservation of these sites at genes up-regulated by glucocorticoids in mouse C3H10T1/2 mesenchymal stem-like cells correlated directly with the extent of occupancy by GR. In striking contrast, we failed to observe GR occupancy of GBSs at genes repressed by glucocorticoids, despite the occurrence of these sites at a frequency similar to that of the induced genes. Thus, GR occupancy of the GBS motif correlates with induction but not repression, and GBS conservation alone is sufficient to predict GR occupancy and GRE function at induced genes.

  18. Fox-2 Splicing Factor Binds to a Conserved Intron Motif to Promote Inclusion of Protein 4.1R Alternative Exon 16

    SciTech Connect

    Ponthier, Julie L.; Schluepen, Christina; Chen, Weiguo; Lersch, Robert A.; Gee, Sherry L.; Hou, Victor C.; Lo, Annie J.; Short, Sarah A.; Chasis, Joel A.; Winkelmann, John C.; Conboy, John G.

    2006-03-01

    Activation of protein 4.1R exon 16 (E16) inclusion during erythropoiesis represents a physiologically important splicing switch that increases 4.1R affinity for spectrin and actin. Previous studies showed that negative regulation of E16 splicing is mediated by the binding of hnRNP A/B proteins to silencer elements in the exon and that downregulation of hnRNP A/B proteins in erythroblasts leads to activation of E16 inclusion. This paper demonstrates that positive regulation of E16 splicing can be mediated by Fox-2 or Fox-1, two closely related splicing factors that possess identical RNA recognition motifs. SELEX experiments with human Fox-1 revealed highly selective binding to the hexamer UGCAUG. Both Fox-1 and Fox-2 were able to bind the conserved UGCAUG elements in the proximal intron downstream of E16, and both could activate E16 splicing in HeLa cell co-transfection assays in a UGCAUG-dependent manner. Conversely, knockdown of Fox-2 expression, achieved with two different siRNA sequences, resulted in decreased E16 splicing. Moreover, immunoblot experiments demonstrate that mouse erythroblasts express Fox-2, but not Fox-1. These findings suggest that Fox-2 is a physiological activator of E16 splicing in differentiating erythroid cells in vivo. Recent experiments show that UGCAUG is present in the proximal intron sequence of many tissue-specific alternative exons, and we propose that the Fox family of splicing enhancers plays an important role in alternative splicing switches during differentiation in metazoan organisms.

  19. A conserved motif in the linker domain of STAT1 transcription factor is required for both recognition and release from high-affinity DNA-binding sites.

    PubMed

    Hüntelmann, Bettina; Staab, Julia; Herrmann-Lingen, Christoph; Meyer, Thomas

    2014-01-01

    Binding to specific palindromic sequences termed gamma-activated sites (GAS) is a hallmark of gene activation by members of the STAT (signal transducer and activator of transcription) family of cytokine-inducible transcription factors. However, the precise molecular mechanisms involved in the signal-dependent finding of target genes by STAT dimers have not yet been very well studied. In this study, we have characterized a sequence motif in the STAT1 linker domain which is highly conserved among the seven human STAT proteins and includes surface-exposed residues in close proximity to the bound DNA. Using site-directed mutagenesis, we have demonstrated that a lysine residue in position 567 of the full-length molecule is required for GAS recognition. The substitution of alanine for this residue completely abolished both binding to high-affinity GAS elements and transcriptional activation of endogenous target genes in cells stimulated with interferon-γ (IFNγ), while the time course of transient nuclear accumulation and tyrosine phosphorylation were virtually unchanged. In contrast, two glutamic acid residues (E559 and E563) on each monomer are important for the dissociation of dimeric STAT1 from DNA and, when mutated to alanine, result in elevated levels of tyrosine-phosphorylated STAT1 as well as prolonged IFNγ-stimulated nuclear accumulation. In conclusion, our data indicate that the kinetics of signal-dependent GAS binding is determined by an array of glutamic acid residues located at the interior surface of the STAT1 dimer. These negatively charged residues appear to align the long axis of the STAT1 dimer in a position perpendicular to the DNA, thereby facilitating the interaction between lysine 567 and the phosphodiester backbone of a bound GAS element, which is a prerequisite for transient gene induction.

  20. Co-conservation of rRNA tetraloop sequences and helix length suggests involvement of the tetraloops in higher-order interactions

    NASA Technical Reports Server (NTRS)

    Hedenstierna, K. O.; Siefert, J. L.; Fox, G. E.; Murgola, E. J.

    2000-01-01

    Terminal loops containing four nucleotides (tetraloops) are common in structural RNAs, and they frequently conform to one of three sequence motifs, GNRA, UNCG, or CUUG. Here we compare available sequences and secondary structures for rRNAs from bacteria, and we show that helices capped by phylogenetically conserved GNRA loops display a strong tendency to be of conserved length. The simplest interpretation of this correlation is that the conserved GNRA loops are involved in higher-order interactions, intramolecular or intermolecular, resulting in a selective pressure for maintaining the lengths of these helices. A small number of conserved UNCG loops were also found to be associated with conserved length helices, consistent with the possibility that this type of tetraloop also takes part in higher-order interactions.
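    The three tetraloop families are written in IUPAC notation (N = any nucleotide, R = purine, A or G). A minimal Python sketch, assuming the loop is supplied as its four nucleotides read 5' to 3', can assign a loop to a family. classify_tetraloop is a hypothetical name, and the sketch says nothing about helix length or the higher-order interactions proposed above.

```python
import re
from typing import Optional

# IUPAC-style patterns for the three common tetraloop families:
# N -> any of A/C/G/U, R -> purine (A or G).
TETRALOOP_FAMILIES = {
    "GNRA": re.compile(r"G[ACGU][AG]A"),
    "UNCG": re.compile(r"U[ACGU]CG"),
    "CUUG": re.compile(r"CUUG"),
}


def classify_tetraloop(loop: str) -> Optional[str]:
    """Return the family name for a 4-nt loop, or None if no family matches."""
    loop = loop.upper().replace("T", "U")  # accept DNA-style input as well
    if len(loop) != 4:
        raise ValueError("a tetraloop must have exactly four nucleotides")
    for family, pattern in TETRALOOP_FAMILIES.items():
        if pattern.fullmatch(loop):  # the whole loop must match the pattern
            return family
    return None


if __name__ == "__main__":
    for loop in ["GAAA", "GCGA", "UUCG", "CUUG", "GAUA"]:
        print(loop, classify_tetraloop(loop))
```

    The first nucleotide already distinguishes the three families (G, U or C), so a loop can match at most one of them.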

  2. Conserved Ser/Arg-rich Motif in PPZ Orthologs from Fungi Is Important for Its Role in Cation Tolerance

    PubMed Central

    Minhas, Anupriya; Sharma, Anupam; Kaur, Harsimran; Rawal, Yashpal; Ganesan, Kaliannan; Mondal, Alok K.

    2012-01-01

    PPZ1 orthologs, novel members of a phosphoprotein phosphatase family of phosphatases, are found only in fungi. They regulate diverse physiological processes in fungi, e.g. ion homeostasis, cell size and cell integrity. Although they are an important determinant of salt tolerance in fungi, their physiological role had remained unexplored in any halotolerant species. In this context, we report here the molecular and functional characterization of DhPPZ1 from Debaryomyces hansenii, which is one of the most halotolerant and osmotolerant species of yeast. Our results showed that the DhPPZ1 knock-out strain displayed higher tolerance to toxic cations and that, unlike in Saccharomyces cerevisiae, the Na+/H+ antiporter appeared to have an important role in this process. Besides salt tolerance, DhPPZ1 also had a role in cell wall integrity and growth in D. hansenii. We have also identified a short, serine-arginine-rich sequence motif in DhPpz1p that is essential for its role in salt tolerance but not in other physiological processes. Taken together, these results underscore a distinct role of DhPpz1p in D. hansenii and illustrate an example of how organisms utilize the same molecular tool box differently to garner adaptive fitness for their respective ecological niches. PMID:22232558

  3. The Ku-binding motif is a conserved module for recruitment and stimulation of non-homologous end-joining proteins

    PubMed Central

    Grundy, Gabrielle J.; Rulten, Stuart L.; Arribas-Bosacoma, Raquel; Davidson, Kathryn; Kozik, Zuzanna; Oliver, Antony W.; Pearl, Laurence H.; Caldecott, Keith W.

    2016-01-01

    The Ku-binding motif (KBM) is a short peptide module first identified in APLF that we now show is also present in Werner syndrome protein (WRN) and in Modulator of retrovirus infection homologue (MRI). We also identify a related but functionally distinct motif in XLF, WRN, MRI and PAXX, which we denote the XLF-like motif. We show that WRN possesses two KBMs; one at the N terminus next to the exonuclease domain and one at the C terminus next to an XLF-like motif. We reveal that the WRN C-terminal KBM and XLF-like motif function cooperatively to bind Ku complexes and that the N-terminal KBM mediates Ku-dependent stimulation of WRN exonuclease activity. We also show that WRN accelerates DSB repair by a mechanism requiring both KBMs, demonstrating the importance of WRN interaction with Ku. These data define a conserved family of KBMs that function as molecular tethers to recruit and/or stimulate enzymes during NHEJ. PMID:27063109

  4. Assessment of the potential contribution of the highly conserved C-terminal motif (C10) of Borrelia burgdorferi outer surface protein C in transmission and infectivity.

    PubMed

    Earnhart, Christopher G; Rhodes, DeLacy V L; Smith, Alexis A; Yang, Xiuli; Tegels, Brittney; Carlyon, Jason A; Pal, Utpal; Marconi, Richard T

    2014-03-01

    OspC is produced by all species of the Borrelia burgdorferi sensu lato complex and is required for infectivity in mammals. To test the hypothesis that the conserved C-terminal motif (C10) of OspC is required for function in vivo, a mutant B. burgdorferi strain (B31::ospCΔC10) was created in which ospC was replaced with an ospC gene lacking the C10 motif. The ability of the mutant to infect mice was investigated using tick transmission and needle inoculation. Infectivity was assessed by cultivation, qRT-PCR, and measurement of IgG antibody responses. B31::ospCΔC10 retained the ability to infect mice by both needle and tick challenge and was competent to survive in ticks after exposure to the blood meal. To determine whether recombinant OspC protein lacking the C-terminal 10 amino acid residues (rOspCΔC10) can bind plasminogen, the only known mammalian-derived ligand for OspC, binding analyses were performed. Deletion of the C10 motif resulted in a statistically significant decrease in plasminogen binding. Although deletion of the C10 motif influenced plasminogen binding, it can be concluded that the C10 motif is not required for OspC to carry out its critical in vivo functions in tick to mouse transmission. © 2013 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.

  5. The Ku-binding motif is a conserved module for recruitment and stimulation of non-homologous end-joining proteins.

    PubMed

    Grundy, Gabrielle J; Rulten, Stuart L; Arribas-Bosacoma, Raquel; Davidson, Kathryn; Kozik, Zuzanna; Oliver, Antony W; Pearl, Laurence H; Caldecott, Keith W

    2016-04-11

    The Ku-binding motif (KBM) is a short peptide module first identified in APLF that we now show is also present in Werner syndrome protein (WRN) and in Modulator of retrovirus infection homologue (MRI). We also identify a related but functionally distinct motif in XLF, WRN, MRI and PAXX, which we denote the XLF-like motif. We show that WRN possesses two KBMs; one at the N terminus next to the exonuclease domain and one at the C terminus next to an XLF-like motif. We reveal that the WRN C-terminal KBM and XLF-like motif function cooperatively to bind Ku complexes and that the N-terminal KBM mediates Ku-dependent stimulation of WRN exonuclease activity. We also show that WRN accelerates DSB repair by a mechanism requiring both KBMs, demonstrating the importance of WRN interaction with Ku. These data define a conserved family of KBMs that function as molecular tethers to recruit and/or stimulate enzymes during NHEJ.

  6. New melanocortin 1 receptor binding motif based on the C-terminal sequence of alpha-melanocyte-stimulating hormone.

    PubMed

    Schiöth, Helgi B; Muceniece, Ruta; Mutule, Ilga; Wikberg, Jarl E S

    2006-10-01

    The C-terminal tripeptide of alpha-melanocyte-stimulating hormone (alpha-MSH11-13) possesses strong anti-inflammatory activity without a known cellular target. In order to better understand the structural requirements for the function of such a motif, we designed, synthesized and tested Trp- and Tyr-containing analogues of alpha-MSH11-13. Seven alpha-MSH11-13 analogues were synthesized and characterized for their binding to the melanocortin receptors recombinantly expressed in insect (Sf9) cells infected with baculovirus carrying the corresponding MC receptor DNA. We also tested these analogues on B16-F1 mouse melanoma cells endogenously expressing the MC1 receptor, for binding and for the ability to increase cAMP levels, as well as on COS-7 cells transfected with the human MC receptors. The data indicate that HS401 (Ac-Tyr-Lys-Pro-Val-NH2) and HS402 (Ac-Lys-Pro-Val-Tyr-NH2) selectively bound to the MC1 receptor and stimulated cAMP generation in a concentration-dependent way, while the other Tyr- and Trp-containing alpha-MSH11-13 analogues neither bound to MC receptors nor stimulated cAMP. We have thus identified a new MC receptor binding motif derived from the C-terminal sequence of alpha-MSH. The tetrapeptides have novel properties, as they both act via MC-ergic pathways and also carry the anti-inflammatory alpha-MSH11-13 message sequence.

  7. Nucleotide sequence conservation in paramyxoviruses; the concept of codon constellation.

    PubMed

    Rima, Bert K

    2015-05-01

    The stability and conservation of the sequences of RNA viruses in the field and the high error rates measured in vitro are paradoxical. The field stability indicates that there are very strong selective constraints on sequence diversity. The nature of these constraints is discussed. Apart from constraints on variation in cis-acting RNA and the amino acid sequences of viral proteins, there are others relating to the presence of specific dinucleotides such as CpG and UpA, as well as the importance of RNA secondary structures and RNA degradation rates. Other recently identified constraints in other RNA viruses, such as effects of secondary RNA structure on protein folding or modification of cellular tRNA complements, are also discussed. Using the family Paramyxoviridae, I show (i) that the codon usage pattern (CUP) is specific for each virus species and (ii) that it is markedly different from that of the host; it does not vary even in vaccine viruses that have been derived by passage in a number of inappropriate host cells. The CUP might thus be an additional constraint on variation, and I propose the concept of codon constellation to indicate the informational content of the sequences of RNA molecules, relating not only to stability and structure but also to the efficiency of translation of a viral mRNA resulting from the CUP and the numbers and positions of rare codons.
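    Two of the sequence properties discussed above can be computed directly. The Python sketch below assumes an in-frame coding sequence and uses one common formulation of the dinucleotide observed/expected ratio, f(XY) / (f(X) f(Y)). Summarizing a codon usage pattern as a per-codon frequency table is a simplifying assumption, not necessarily the author's method, and both function names are hypothetical.

```python
from collections import Counter


def codon_usage(cds: str) -> dict:
    """Relative frequency of each codon in an in-frame coding sequence."""
    seq = cds.upper().replace("T", "U")      # work in RNA alphabet
    usable = len(seq) - len(seq) % 3         # drop a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


def dinucleotide_oe(seq: str, dinuc: str) -> float:
    """Observed/expected ratio f(XY) / (f(X) * f(Y)), e.g. for CG or UA."""
    seq = seq.upper().replace("T", "U")
    dinuc = dinuc.upper().replace("T", "U")
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must be at least two nucleotides long")
    # Frequency of XY among the n - 1 overlapping dinucleotide positions.
    observed = sum(seq[i:i + 2] == dinuc for i in range(n - 1)) / (n - 1)
    # Expected frequency if X and Y were placed independently.
    expected = (seq.count(dinuc[0]) / n) * (seq.count(dinuc[1]) / n)
    return observed / expected if expected else float("nan")


if __name__ == "__main__":
    print(codon_usage("ATGGCTGCTTAA"))
    print(dinucleotide_oe("ATGGCTGCTTAA", "UA"))
```

    A ratio well below 1 for CpG or UpA would indicate that the dinucleotide is under-represented relative to base composition alone.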

  8. Evolutionarily conserved dual lysine motif determines the non-chaperone function of secreted Hsp90alpha in tumour progression.

    PubMed

    Zou, M; Bhatia, A; Dong, H; Jayaprakash, P; Guo, J; Sahu, D; Hou, Y; Tsen, F; Tong, C; O'Brien, K; Situ, A J; Schmidt, T; Chen, M; Ying, Q; Ulmer, T S; Woodley, D T; Li, W

    2017-04-01

    Both intracellular and extracellular heat shock protein-90 (Hsp90) family proteins (α and β) have been shown to support tumour progression. The tumour-supporting activity of the intracellular Hsp90 is attributed to their N-terminal ATPase-driven chaperone function. What molecular entity determines the extracellular function of secreted Hsp90 and the distinction between Hsp90α and Hsp90β was unclear. Here we demonstrate that CRISPR/Cas9 knockout of Hsp90α nullifies tumour cells' ability to migrate, invade and metastasize without affecting cell survival and growth. Knocking out Hsp90β leads to tumour cell death. Extracellular supplementation with recombinant Hsp90α, but not Hsp90β, protein recovers tumourigenicity of the Hsp90α-knockout cells. Sequential mutagenesis identifies two evolutionarily conserved lysine residues, lys-270 and lys-277, in the Hsp90α subfamily that determine the extracellular Hsp90α function. The Hsp90β subfamily lacks the dual lysine motif and the extracellular function. Substitutions of gly-262 and thr-269 in Hsp90β with lysines convert Hsp90β to an Hsp90α-like protein. A newly constructed monoclonal antibody, 1G6-D7, against the dual lysine region of secreted Hsp90α inhibits both de novo tumour formation and expansion of already formed tumours in mice. This study suggests an alternative therapeutic approach to target Hsp90 in cancer, that is, the tumour-secreted Hsp90α, instead of the intracellular Hsp90α and Hsp90β.

  9. Rice MEL2, the RNA recognition motif (RRM) protein, binds in vitro to meiosis-expressed genes containing U-rich RNA consensus sequences in the 3'-UTR.

    PubMed

    Miyazaki, Saori; Sato, Yutaka; Asano, Tomoya; Nagamura, Yoshiaki; Nonomura, Ken-Ichi

    2015-10-01

    Post-transcriptional gene regulation by RNA recognition motif (RRM) proteins through binding to cis-elements in the 3'-untranslated region (3'-UTR) is widely used in eukaryotes to complete various biological processes. Rice MEIOSIS ARRESTED AT LEPTOTENE2 (MEL2) is an RRM protein that functions in the properly timed transition to meiosis. In vitro, the MEL2 RRM preferentially associated with the U-rich RNA consensus UUAGUU[U/A][U/G][A/U/G]U in a sequence-dependent manner and in proportion to the amount of MEL2 protein. The consensus sequences were located in the putative looped structures of the RNA ligand. A genome-wide survey revealed a tendency for the MEL2-binding consensus to appear in the 3'-UTRs of rice genes. Of 249 genes in which the consensus was conserved in the 3'-UTR, 13 were spatiotemporally co-expressed with MEL2 in meiotic flowers, including several genes with putative functions in meiosis, such as Replication protein A and OsMADS3. Proteome analysis revealed that the amounts of small ubiquitin-related modifier-like protein and eukaryotic translation initiation factor3-like protein were dramatically altered in mel2 mutant anthers. Taken together with transcriptome and gene ontology results, we propose that rice MEL2 is involved in the translational regulation of key meiotic genes via their 3'-UTRs to achieve the faithful transition of germ cells to meiosis.
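    The bracketed consensus can be turned into a regular expression for a genome-wide scan of 3'-UTR sequences. This Python sketch assumes each [X/Y] group lists the allowed bases at one position and accepts DNA or RNA input (T is converted to U). It finds sequence matches only and ignores the RNA secondary structure and protein-dosage effects described above. The helper names consensus_to_regex and find_mel2_sites are hypothetical.

```python
import re


def consensus_to_regex(consensus: str):
    """Translate bracket notation such as 'UU[U/A]G' into a compiled regex."""
    # Each [X/Y/...] group becomes a character class [XY...];
    # literal bases outside brackets pass through unchanged.
    body = re.sub(
        r"\[([ACGU/]+)\]",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        consensus,
    )
    # The lookahead wrapper lets overlapping sites be reported.
    return re.compile(f"(?=({body}))")


MEL2_SITE = consensus_to_regex("UUAGUU[U/A][U/G][A/U/G]U")


def find_mel2_sites(utr: str) -> list:
    """Return (0-based start, matched site) for each consensus match."""
    utr = utr.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in MEL2_SITE.finditer(utr)]


if __name__ == "__main__":
    print(MEL2_SITE.pattern)
    print(find_mel2_sites("GGUUAGUUAUGUCC"))
```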

  10. Rapid characterization of CRISPR-Cas9 protospacer adjacent motif sequence elements.

    PubMed

    Karvelis, Tautvydas; Gasiunas, Giedrius; Young, Joshua; Bigelyte, Greta; Silanskas, Arunas; Cigan, Mark; Siksnys, Virginijus

    2015-11-19

    To expand the repertoire of Cas9s available for genome targeting, we present a new in vitro method for the simultaneous examination of guide RNA and protospacer adjacent motif (PAM) requirements. The method relies on the in vitro cleavage of plasmid libraries containing a randomized PAM as a function of Cas9-guide RNA complex concentration. Using this method, we accurately reproduce the canonical PAM preferences for Streptococcus pyogenes, Streptococcus thermophilus CRISPR3 (Sth3), and CRISPR1 (Sth1). Additionally, PAM and sgRNA solutions for a novel Cas9 protein from Brevibacillus laterosporus are provided by the assay and are demonstrated to support functional activity in vitro and in plants.
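    One way to summarize PAM preferences from such an assay is a position frequency matrix over the randomized PAM positions recovered from cleaved library members. The Python sketch below is a simplification under stated assumptions: it tallies bases per position and calls a consensus base wherever one nucleotide reaches a threshold (0.75 here, an arbitrary choice), and it does not model the Cas9-guide RNA concentration series. The function names are hypothetical, and the toy input in the usage note is illustrative only, not data from the study.

```python
from collections import Counter

BASES = "ACGT"


def position_frequency_matrix(pams: list) -> list:
    """Per-position base frequencies across equal-length PAM sequences."""
    pams = [p.upper() for p in pams]
    if not pams:
        raise ValueError("need at least one PAM sequence")
    length = len(pams[0])
    if any(len(p) != length for p in pams):
        raise ValueError("all PAM sequences must have the same length")
    matrix = []
    for pos in range(length):
        counts = Counter(p[pos] for p in pams)
        matrix.append({b: counts[b] / len(pams) for b in BASES})
    return matrix


def call_consensus(matrix: list, threshold: float = 0.75) -> str:
    """Call the dominant base where its frequency >= threshold, else 'N'."""
    out = []
    for freqs in matrix:
        base, freq = max(freqs.items(), key=lambda kv: kv[1])
        out.append(base if freq >= threshold else "N")
    return "".join(out)


if __name__ == "__main__":
    toy_cleaved = ["AGG", "TGG", "CGG", "GGG"]  # illustrative toy input only
    print(call_consensus(position_frequency_matrix(toy_cleaved)))
```

    With these four toy sequences the first position is evenly split and is called N, giving NGG, the familiar S. pyogenes Cas9 pattern.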

  11. Characteristic motifs for families of allergenic proteins

    PubMed Central

    Ivanciuc, Ovidiu; Garcia, Tzintzuni; Torres, Miguel; Schein, Catherine H.; Braun, Werner

    2008-01-01

    The identification of potential allergenic proteins is usually done by scanning a database of allergenic proteins and locating known allergens with a high sequence similarity. However, there is no universally accepted cut-off value for sequence similarity to indicate potential IgE cross-reactivity. Further, overall sequence similarity may be less important than discrete areas of similarity in proteins with homologous structure. To identify such areas, we first classified all allergens and their subdomains in the Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP/) to their closest protein families as defined in Pfam, and identified conserved physicochemical property motifs characteristic of each group of sequences. Allergens populate only a small subset of all known Pfam families, as all allergenic proteins in SDAP could be grouped to only 130 (of 9318 total) Pfams, and 31 families contain more than four allergens. Conserved physicochemical property motifs for the aligned sequences of the most populated Pfam families were identified with the PCPMer program suite and catalogued in the webserver Motif-Mate (http://born.utmb.edu/motifmate/summary.php). We also determined specific motifs for allergenic members of a family that could distinguish them from non-allergenic ones. These allergen specific motifs should be most useful in database searches for potential allergens. We found that sequence motifs unique to the allergens in three families (seed storage proteins, Bet v 1, and tropomyosin) overlap with known IgE epitopes, thus providing evidence that our motif based approach can be used to assess the potential allergenicity of novel proteins. PMID:18951633

  12. Genetic diversity of the conserved motifs of six bacterial leaf blight resistance genes in a set of rice landraces

    PubMed Central

    2014-01-01

    Background Bacterial leaf blight (BLB) caused by the vascular pathogen Xanthomonas oryzae pv. oryzae (Xoo) is one of the most serious diseases leading to crop failure in rice growing countries. A total of 37 resistance genes against Xoo has been identified in rice. Of these, ten BLB resistance genes have been mapped on rice chromosomes, while 6 have been cloned, sequenced and characterized. Diversity analysis at the resistance gene level of this disease is scanty, and the landraces from West Bengal and North Eastern states of India have received little attention so far. The objective of this study was to assess the genetic diversity at conserved domains of 6 BLB resistance genes in a set of 22 rice accessions including landraces and check genotypes collected from the states of Assam, Nagaland, Mizoram and West Bengal. Results In this study 34 pairs of primers were designed from conserved domains of 6 BLB resistance genes; Xa1, xa5, Xa21, Xa21(A1), Xa26 and Xa27. The designed primer pairs were used to generate PCR based polymorphic DNA profiles to detect and elucidate the genetic diversity of the six genes in the 22 diverse rice accessions of known disease phenotype. A total of 140 alleles were identified including 41 rare and 26 null alleles. The average polymorphism information content (PIC) value was 0.56/primer pair. The DNA profiles identified each of the rice landraces unequivocally. The amplified polymorphic DNA bands were used to calculate genetic similarity of the rice landraces in all possible pair combinations. The similarity among the rice accessions ranged from 18% to 89% and the dendrogram produced from the similarity values was divided into 2 major clusters. The conserved domains identified within the sequenced rare alleles include Leucine-Rich Repeat, BED-type zinc finger domain, sugar transferase domain and the domain of the carbohydrate esterase 4 superfamily. Conclusions This study revealed high genetic diversity at conserved domains of six BLB

  13. Conserved Tryptophan Motifs in the Large Tegument Protein pUL36 Are Required for Efficient Secondary Envelopment of Herpes Simplex Virus Capsids

    PubMed Central

    Ivanova, Lyudmila; Buch, Anna; Döhner, Katinka; Pohlmann, Anja; Binz, Anne; Prank, Ute; Sandbaumhüter, Malte

    2016-01-01

    ABSTRACT Herpes simplex virus (HSV) replicates in the skin and mucous membranes, and initiates lytic or latent infections in sensory neurons. Assembly of progeny virions depends on the essential large tegument protein pUL36 of 3,164 amino acid residues that links the capsids to the tegument proteins pUL37 and VP16. Of the 32 tryptophans of HSV-1-pUL36, the tryptophan-acidic motifs (1766)WD(1767) and (1862)WE(1863) are conserved in all HSV-1 and HSV-2 isolates. Here, we characterized the role of these motifs in the HSV life cycle, because rare tryptophans often have unique roles in protein function owing to their large hydrophobic surface. The infectivity of the mutants HSV-1(17+)Lox-pUL36-WD/AA-WE/AA and HSV-1(17+)Lox-CheVP26-pUL36-WD/AA-WE/AA, in which the capsid has been tagged with the fluorescent protein Cherry, was significantly reduced. Quantitative electron microscopy showed a larger number of cytosolic capsids and fewer enveloped virions than in the respective parental strains, indicating a severe impairment in secondary capsid envelopment. The capsids of the mutant viruses accumulated in the perinuclear region around the microtubule-organizing center and were not dispersed to the cell periphery, but still acquired the inner tegument proteins pUL36 and pUL37. Furthermore, cytoplasmic capsids colocalized with tegument protein VP16 and, to some extent, with tegument protein VP22, but not with the envelope glycoprotein gD. These results indicate that the unique conserved tryptophan-acidic motifs in the central region of pUL36 are required for efficient targeting of progeny capsids to the membranes of secondary capsid envelopment and for efficient virion assembly. IMPORTANCE Herpesvirus infections give rise to severe animal and human diseases, especially in young, immunocompromised, and elderly individuals. The structural hallmark of herpesvirus virions is the tegument, which contains evolutionarily conserved proteins that are essential for several

  14. Conserved Tryptophan Motifs in the Large Tegument Protein pUL36 Are Required for Efficient Secondary Envelopment of Herpes Simplex Virus Capsids.

    PubMed

    Ivanova, Lyudmila; Buch, Anna; Döhner, Katinka; Pohlmann, Anja; Binz, Anne; Prank, Ute; Sandbaumhüter, Malte; Bauerfeind, Rudolf; Sodeik, Beate

    2016-06-01

    Herpes simplex virus (HSV) replicates in the skin and mucous membranes, and initiates lytic or latent infections in sensory neurons. Assembly of progeny virions depends on the essential large tegument protein pUL36 of 3,164 amino acid residues that links the capsids to the tegument proteins pUL37 and VP16. Of the 32 tryptophans of HSV-1-pUL36, the tryptophan-acidic motifs (1766)WD(1767) and (1862)WE(1863) are conserved in all HSV-1 and HSV-2 isolates. Here, we characterized the role of these motifs in the HSV life cycle, because rare tryptophans often have unique roles in protein function owing to their large hydrophobic surface. The infectivity of the mutants HSV-1(17(+))Lox-pUL36-WD/AA-WE/AA and HSV-1(17(+))Lox-CheVP26-pUL36-WD/AA-WE/AA, in which the capsid has been tagged with the fluorescent protein Cherry, was significantly reduced. Quantitative electron microscopy showed a larger number of cytosolic capsids and fewer enveloped virions than in the respective parental strains, indicating a severe impairment in secondary capsid envelopment. The capsids of the mutant viruses accumulated in the perinuclear region around the microtubule-organizing center and were not dispersed to the cell periphery, but still acquired the inner tegument proteins pUL36 and pUL37. Furthermore, cytoplasmic capsids colocalized with tegument protein VP16 and, to some extent, with tegument protein VP22, but not with the envelope glycoprotein gD. These results indicate that the unique conserved tryptophan-acidic motifs in the central region of pUL36 are required for efficient targeting of progeny capsids to the membranes of secondary capsid envelopment and for efficient virion assembly. Herpesvirus infections give rise to severe animal and human diseases, especially in young, immunocompromised, and elderly individuals. The structural hallmark of herpesvirus virions is the tegument, which contains evolutionarily conserved proteins that are essential for several stages of

  15. Polymorphism, monomorphism, and sequences in conserved microsatellites in primate species.

    PubMed

    Blanquer-Maumont, A; Crouau-Roy, B

    1995-10-01

    Dimeric short tandem repeats are a source of highly polymorphic markers in the mammalian genome. Genetic variation at these hypervariable loci is used extensively for linkage analysis and for the identification of individuals, and may be useful for interpopulation and interspecies studies. In this paper, we analyze the variability and the sequences of a segment including three microsatellites, first described in humans, in several primate species (chimpanzee, orangutan, gibbon, and macaque) using heterologous (human) primers. This region is located on human chromosome 6p, near the tumor necrosis factor genes, in the major histocompatibility complex. The fact that these primers work in all species studied indicates that their binding sites are conserved throughout the different lineages of the two superfamilies, the Hominoidea and the Cercopithecoidea, the latter represented by the macaques. However, the intervening sequence displays intraspecific and interspecific variability. The sites of base substitutions and insertion/deletion events are not evenly distributed within this region. The data suggest that a minimal number of repeats is necessary to raise the mutation rate sufficiently for polymorphism to develop. In some species, the microsatellites carry single-base variations that reduce the number of contiguous repeats, apparently slowing the rate of additional slippage events. Species with such variations or a low number of repeats are monomorphic. These microsatellite sequences are informative in the comparison of closely related species and reflect the phylogeny of the Old World monkeys, apes, and humans.

  16. Interaction of the N-Terminal Tandem Domains of hnRNP LL with the BCL2 Promoter i-Motif DNA Sequence.

    PubMed

    Lannes, Laurie; Young, Phoebe; Richter, Christian; Morgner, Nina; Schwalbe, Harald

    2017-08-14

    The human genome contains GC-rich sequences able to form tetraplex secondary structures known as the G-quadruplex and i-motif. Such sequences are notably present in the promoter region of oncogenes and are proposed to function as regulatory elements of gene expression. The P1 promoter of BCL2 contains a 39-mer C-rich sequence (Py39wt) that can fold into a hairpin or an i-motif in a pH-dependent manner in vitro. The protein hnRNP LL has been identified as recognising the i-motif rather than the hairpin conformation and as acting as an activating transcription factor. Thus, the Py39wt sequence would act as an ON/OFF switch, according to the secondary structure adopted. Herein, a structural study of the interaction between hnRNP LL and Py39wt is reported. Both N-terminal RNA recognition motifs (RRM12) cooperatively recognise one Py39wt DNA sequence and engage their β-sheet to form a large binding platform. In contrast, the C-terminal RRMs show no binding capacity. It is observed that RRM12 binds to Py39wt regardless of the DNA conformation. We propose that RRM12 recognises a single-stranded CTCCC element present in loop 1 of the i-motif and in the apical loop of the hairpin conformation. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  17. Cloning and mapping of a human gene (TBX2) sharing a highly conserved protein motif with a Drosophila omb gene

    SciTech Connect

    Campbell, C.; Goodrich, K.; Casey, G.; Beatty, B.

    1995-07-20

    We have identified and cloned a human gene (TBX2) that exhibits strong sequence homology within a putative DNA-binding domain to the Drosophila optomotor-blind (omb) gene and lesser homology to the DNA-binding domain of the murine brachyury or T gene. Unlike omb, which is expressed in neural tissue, or T, which is not expressed in adult animals, TBX2 is expressed primarily in adult kidney, lung, and placenta as multiple transcripts of between approximately 2 and 4 kb. At least part of this transcript heterogeneity appears to be due to alternative polyadenylation. This is the first reported human member of a new family of highly evolutionarily conserved DNA-binding proteins, the Tbx or T-box proteins. The human gene has been mapped by somatic cell hybrid mapping and chromosomal in situ hybridization to chromosome 17q23, a region frequently altered in ovarian carcinomas. 19 refs., 6 figs.

  18. A Bayesian hidden Markov model for motif discovery through joint modeling of genomic sequence and ChIP-chip data.

    PubMed

    Gelfond, Jonathan A L; Gupta, Mayetri; Ibrahim, Joseph G

    2009-12-01

    We propose a unified framework for the analysis of chromatin (Ch) immunoprecipitation (IP) microarray (ChIP-chip) data for detecting transcription factor binding sites (TFBSs) or motifs. ChIP-chip assays are used to focus the genome-wide search for TFBSs by isolating a sample of DNA fragments with TFBSs and applying this sample to a microarray with probes corresponding to tiled segments across the genome. Present analytical methods use a two-step approach: (i) analyze array data to estimate IP-enrichment peaks then (ii) analyze the corresponding sequences independently of intensity information. The proposed model integrates peak finding and motif discovery through a unified Bayesian hidden Markov model (HMM) framework that accommodates the inherent uncertainty in both measurements. A Markov chain Monte Carlo algorithm is formulated for parameter estimation, adapting recursive techniques used for HMMs. In simulations and applications to a yeast RAP1 dataset, the proposed method has favorable TFBS discovery performance compared to currently available two-stage procedures in terms of both sensitivity and specificity.

  19. A Bayesian hidden Markov model for motif discovery through joint modeling of genomic sequence and ChIP-chip data

    PubMed Central

    Gelfond, Jonathan A. L.; Gupta, Mayetri; Ibrahim, Joseph G.

    2009-01-01

    SUMMARY We propose a unified framework for the analysis of Chromatin (Ch) Immunoprecipitation (IP) microarray (ChIP-chip) data for detecting transcription factor binding sites (TFBSs) or motifs. ChIP-chip assays are used to focus the genome-wide search for TFBSs by isolating a sample of DNA fragments with TFBSs and applying this sample to a microarray with probes corresponding to tiled segments across the genome. Present analytical methods use a two-step approach: (i) analyze array data to estimate IP enrichment peaks then (ii) analyze the corresponding sequences independently of intensity information. The proposed model integrates peak finding and motif discovery through a unified Bayesian hidden Markov model (HMM) framework that accommodates the inherent uncertainty in both measurements. A Markov Chain Monte Carlo algorithm is formulated for parameter estimation, adapting recursive techniques used for HMMs. In simulations and applications to a yeast RAP1 dataset, the proposed method has favorable TFBS discovery performance compared to currently available two-stage procedures in terms of both sensitivity and specificity. PMID:19210737
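
    The model in this abstract is a Bayesian HMM fitted by Markov chain Monte Carlo, and its full structure is not given here. As a hedged illustration of the kind of HMM recursion such samplers reuse, the sketch below computes the forward log-likelihood of a series of tiled-probe intensities under a generic K-state HMM with Gaussian emissions (for example, state 0 = background, state 1 = IP-enriched). The state labels, the Gaussian emissions, all parameter values and the function names are assumptions, and sequence (motif) information is deliberately left out.

```python
import math


def log_normal_pdf(x, mean, sd):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_loglik(obs, start, trans, means, sds):
    """Log-likelihood of `obs` under a K-state HMM with Gaussian emissions.

    start[s]    : initial state probabilities (must be > 0)
    trans[r][s] : P(state s at t+1 | state r at t) (must be > 0)
    means, sds  : per-state emission parameters
    Uses the forward recursion in log space to avoid underflow.
    """
    k = len(start)
    alpha = [math.log(start[s]) + log_normal_pdf(obs[0], means[s], sds[s]) for s in range(k)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[r] + math.log(trans[r][s]) for r in range(k)])
            + log_normal_pdf(x, means[s], sds[s])
            for s in range(k)
        ]
    return logsumexp(alpha)


# Hypothetical log-ratio intensities along consecutive tiled probes.
obs = [0.1, -0.2, 1.8, 2.1, 1.9, 0.0]
ll = forward_loglik(obs, start=[0.95, 0.05], trans=[[0.9, 0.1], [0.2, 0.8]],
                    means=[0.0, 2.0], sds=[0.5, 0.5])
print(ll)
```

    Working in log space is the usual design choice here, because products of many small emission densities underflow quickly along a genome-wide tiling array.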

  20. Uncharacterized conserved motifs outside the HD-Zip domain in HD-Zip subfamily I transcription factors; a potential source of functional diversity

    PubMed Central

    2011-01-01

    Background Plant HD-Zip transcription factors are modular proteins in which a homeodomain is associated with a leucine zipper. Of the four subfamilies into which they are divided, the tested members of subfamily I bind in vitro the same pseudopalindromic sequence CAAT(A/T)ATTG, and several of them exhibit similar expression patterns. However, most experiments in which HD-Zip I proteins were overexpressed or ectopically expressed under the control of the constitutive 35S CaMV promoter resulted in transgenic plants with clearly different phenotypes. Aiming to elucidate the structural mechanisms underlying this observation, and taking advantage of the increasing information in sequence databases from diverse plant species, an in silico analysis was performed. In addition, some of the results were supported experimentally. Results A phylogenetic tree of 178 HD-Zip I proteins, together with the sequence conservation outside the HD-Zip domains, allowed the distinction of six groups of proteins. A motif-discovery approach enabled the recognition of an activation domain in the carboxy-terminal regions (CTRs) and of some putative regulatory mechanisms, involving sumoylation and phosphorylation, acting in the amino-terminal regions (NTRs) and CTRs. A yeast one-hybrid experiment demonstrated that the activation activity of ATHB1, a member of one of the groups, is located in its CTR. Chimeric constructs were made combining the HD-Zip domain of one member with the CTR of another, and transgenic plants were obtained with these constructs. The phenotype of the chimeric transgenic plants was similar to that observed in transgenic plants bearing the CTR of the donor protein, revealing the importance of this module within the whole protein. Conclusions The bioinformatic results and the experiments conducted in yeast and transgenic plants strongly suggest that the previously poorly analyzed NTRs and CTRs of HD-Zip I proteins play an important role in their function, hence
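
    The shared HD-Zip I site CAAT(A/T)ATTG (CAATWATTG in IUPAC notation) is pseudopalindromic: the reverse complement of CAATAATTG is CAATTATTG, so a scan of one strand already reports sites on both strands. A minimal scanning sketch follows; the function names and the example sequence are hypothetical, and the scan reports exact pattern matches only, not binding affinity.

```python
import re

# CAAT(A/T)ATTG as a regex; the zero-width lookahead lets overlapping hits be reported.
HDZIP_I_SITE = re.compile(r"(?=CAAT[AT]ATTG)")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T string (other characters pass through)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq):
    """0-based start positions of CAAT(A/T)ATTG on the given strand."""
    return [m.start() for m in HDZIP_I_SITE.finditer(seq.upper())]


promoter = "GGCAATAATTGCCCAATTATTGA"  # made-up sequence for illustration
print(find_sites(promoter))  # [2, 13]
```

    Because the pattern set is closed under reverse complementation, `find_sites(reverse_complement(promoter))` returns the same number of hits, just at mirrored positions.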

  1. A common sequence motif determines the Cajal body-specific localization of box H/ACA scaRNAs.

    PubMed

    Richard, Patricia; Darzacq, Xavier; Bertrand, Edouard; Jády, Beáta E; Verheggen, Céline; Kiss, Tamás

    2003-08-15

    Post-transcriptional synthesis of 2'-O-methylated nucleotides and pseudouridines in Sm spliceosomal small nuclear RNAs takes place in the nucleoplasmic Cajal bodies and it is directed by guide RNAs (scaRNAs) that are structurally and functionally indistinguishable from small nucleolar RNAs (snoRNAs) directing rRNA modification in the nucleolus. The scaRNAs are synthesized in the nucleoplasm and specifically targeted to Cajal bodies. Here, mutational analysis of the human U85 box C/D-H/ACA scaRNA, followed by in situ localization, demonstrates that box H/ACA scaRNAs share a common Cajal body-specific localization signal, the CAB box. Two copies of the evolutionarily conserved CAB consensus (UGAG) are located in the terminal loops of the 5' and 3' hairpins of the box H/ACA domains of mammalian, Drosophila and plant scaRNAs. Upon alteration of the CAB boxes, mutant scaRNAs accumulate in the nucleolus. In turn, authentic snoRNAs can be targeted into Cajal bodies by addition of exogenous CAB box motifs. Our results indicate that scaRNAs represent an ancient group of small nuclear RNAs which are localized to Cajal bodies by an evolutionarily conserved mechanism.

  2. De novo computational identification of stress-related sequence motifs and microRNA target sites in untranslated regions of a plant translatome

    PubMed Central

    Munusamy, Prabhakaran; Zolotarov, Yevgen; Meteignier, Louis-Valentin; Moffett, Peter; Strömvik, Martina V.

    2017-01-01

    Gene regulation at the transcriptional and translational level leads to diversity in phenotypes and function in organisms. Regulatory DNA or RNA sequence motifs adjacent to the gene coding sequence act as binding sites for proteins that in turn enable or disable expression of the gene. Whereas the known DNA and RNA binding proteins range in the thousands, only a few motifs have been examined. In this study, we have predicted putative regulatory motifs in groups of untranslated regions from genes regulated at the translational level in Arabidopsis thaliana under normal and stressed conditions. The test group of sequences was divided into random subgroups and subjected to three de novo motif finding algorithms (Seeder, Weeder and MEME). In addition to identifying sequence motifs, using an in silico tool we have predicted microRNA target sites in the 3′ UTRs of the translationally regulated genes, as well as identified upstream open reading frames located in the 5′ UTRs. Our bioinformatics strategy and the knowledge generated contribute to understanding gene regulation during stress, and can be applied to disease and stress resistant plant development. PMID:28276452

  3. A sequence motif enriched in regions bound by the Drosophila dosage compensation complex

    PubMed Central

    2010-01-01

    Background In Drosophila melanogaster, dosage compensation is mediated by the action of the dosage compensation complex (DCC). How the DCC recognizes the fly X chromosome is still poorly understood. Characteristic sequence signatures at all DCC binding sites have not hitherto been found. Results In this study, we compare the known binding sites of the DCC with oligonucleotide profiles that measure the specificity of the sequences of the D. melanogaster X chromosome. We show that the X chromosome regions bound by the DCC are enriched for a particular type of short, repetitive sequences. Their distribution suggests that these sequences contribute to chromosome recognition, the generation of DCC binding sites and/or the local spreading of the complex. Comparative data indicate that the same sequences may be involved in dosage compensation in other Drosophila species. Conclusions These results offer an explanation for the wild-type binding of the DCC along the Drosophila X chromosome, contribute to delineate the forces leading to the establishment of dosage compensation and suggest new experimental approaches to understand the precise biochemical features of the dosage compensation system. PMID:20226017

  4. A comparative analysis of two conserved motifs in bacterial poly(A) polymerase and CCA-adding enzyme.

    PubMed

    Just, Andrea; Butter, Falk; Trenkmann, Michelle; Heitkam, Tony; Mörl, Mario; Betat, Heike

    2008-09-01

    Although they show high sequence similarity, the evolutionarily closely related bacterial poly(A) polymerases (PAP) and CCA-adding enzymes catalyze quite different reactions: PAP adds poly(A) tails to RNA 3'-ends, while CCA-adding enzymes synthesize the sequence CCA at the 3'-terminus of tRNAs. Here, two highly conserved structural elements of the corresponding Escherichia coli enzymes were characterized. The first element is a set of amino acids that was identified in CCA-adding enzymes as a template region determining the enzymes' specificity for CTP and ATP. The same element is also present in PAP, where it confers ATP specificity. The second investigated region corresponds to a flexible loop in CCA-adding enzymes and is involved in the incorporation of the terminal A-residue. Although PAP seems to carry a similar flexible region, the functional relevance of this element in PAP is not known. The presented results show that the template region has an essential function in both enzymes, while the second element is surprisingly dispensable in PAP. The data support the idea that the bacterial PAP descends from CCA-adding enzymes and still carries some of the structural elements required for CCA-addition as an evolutionary relic and is now fixed in a conformation specific for A-addition.

  5. Sequencing of HLA class II genes based on the conserved diversity of the non-coding regions: sequencing based typing of HLA-DRB genes.

    PubMed

    Kotsch, K; Wehling, J; Blasczyk, R

    1999-05-01

    In this paper, we present a novel sequencing-based typing strategy for the HLA-DRB1, 3, 4 and 5 loci. The new approach is based on a group-specific amplification from intron 1 to intron 2 according to the serologically defined antigens. For this purpose, we have determined the 3' 500-bp fragment of intron 1 and the 5' 340-bp fragment of intron 2 of all serological antigens and their most frequent subtypes. We discovered a remarkably conserved diversity characterized by lineage-specific sequence motifs. This lineage specificity of non-coding motifs in the 1st and 2nd intron offered the possibility to establish a clear serology-related amplification strategy. The method allows the complete analysis of the 2nd exon and the definition of the cis/trans linkage of sequence motifs by intron-mediated polymerase chain reaction (PCR)-based separation of the haplotypes in nearly all serologically heterozygous samples. In particular, the non-coding variabilities between the DR52-associated DRB1 groups made their independent amplification possible. Thus, compared to the standard procedures using exon-based amplification primers, the groups DR3, DR12, some DR13 alleles (1301, 1302) and the DR14 group could be amplified by specific primer mixes. DR8 could be amplified with an individual primer mix not co-amplifying DR12. DR11 and DR13 did not show any individual motif in intron 1 or intron 2. To achieve a separate amplification, they had to be amplified by multispecific primer mixes (DR3/11/13/14; DR3/11/13 or DR11/13/14) excluding the other haplotype. Thus, only the alleles in rare DR11/DR13 heterozygotes lacking DRB1*1301 or 1302 could not be amplified separately. Fourteen primer mixes are used to amplify the specificities DR1-14, and 6 primer mixes for the specificities DR51-53. The sequence homology of the 3' end of intron 1 allowed the use of only three different sequencing primers for all DRB alleles.

  6. A novel human gene (SARM) at chromosome 17q11 encodes a protein with a SAM motif and structural similarity to Armadillo/beta-catenin that is conserved in mouse, Drosophila, and Caenorhabditis elegans.

    PubMed

    Mink, M; Fogelgren, B; Olszewski, K; Maroy, P; Csiszar, K

    2001-06-01

    A novel human gene, SARM, encodes the orthologue of a Drosophila protein (CG7915) and contains a unique combination of the sterile alpha (SAM) and the HEAT/Armadillo motifs. The SARM gene was identified on chromosome 17q11, between markers D17S783 and D17S841 on BAC clone AC002094, which also included a HERV repeat and keratin-18-like, MAC30, TNFAIP1, HSPC017, and vitronectin genes in addition to three unknown genes. The mouse SARM gene was located on a mouse chromosome 11 BAC clone (AC002324). The SARM gene is 1.8 kb centromeric to the vitronectin gene, and the two genes share a promoter region that directs a high level of liver-specific expression of both the SARM and the vitronectin genes. In addition to the liver, the SARM gene was highly expressed in the kidney. A 0.4-kb antisense transcript was coordinately expressed with the SARM gene in the kidney and liver, while in the brain and malignant cell lines, it appeared independent of SARM gene transcription. The SARM gene encodes a protein of 690 amino acids. Based on amino acid sequence homology, we have identified a SAM motif within this derived protein. Structure modeling and protein folding recognition studies confirmed the presence of alpha-alpha right-handed superhelix-like folds consistent with the structure of the Armadillo and HEAT repeats of the beta-catenin and importin protein families. Both motifs are known to be involved in protein-protein interactions promoting the formation of diverse protein complexes. We have identified the same conserved SAM/Armadillo motif combination in the mouse, Drosophila, and Caenorhabditis elegans SARM proteins.

  7. The histone chaperone sNASP binds a conserved peptide motif within the globular core of histone H3 through its TPR repeats

    PubMed Central

    Bowman, Andrew; Lercher, Lukas; Singh, Hari R.; Zinne, Daria; Timinszky, Gyula; Carlomagno, Teresa; Ladurner, Andreas G.

    2016-01-01

    Eukaryotic chromatin is a complex yet dynamic structure, which is regulated in part by the assembly and disassembly of nucleosomes. Key to this process is a group of proteins termed histone chaperones that guide the thermodynamic assembly of nucleosomes by interacting with soluble histones. Here we investigate the interaction between the histone chaperone sNASP and its histone H3 substrate. We find that sNASP binds with nanomolar affinity to a conserved heptapeptide motif in the globular domain of H3, close to the C-terminus. Through functional analysis of sNASP homologues we identified point mutations in surface residues within the TPR domain of sNASP that disrupt H3 peptide interaction, but do not completely disrupt binding to full-length H3 in cells, suggesting that sNASP interacts with H3 through additional contacts. Furthermore, chemical shift perturbations from 1H-15N HSQC experiments show that H3 peptide binding maps to the helical groove formed by the stacked TPR motifs of sNASP. Our findings reveal a new mode of interaction between a TPR repeat domain and an evolutionarily conserved peptide motif found in canonical H3 and in all histone H3 variants, including CenpA, and have implications for the mechanism of histone chaperoning within the cell. PMID:26673727

  8. Conserved structural motifs at the C-terminus of baculovirus protein IE0 are important for its functions in transactivation and supporting hr5-mediated DNA replication.

    PubMed

    Luria, Neta; Lu, Liqun; Chejanovsky, Nor

    2012-05-01

    IE0 and IE1 are transactivator proteins of the most studied baculovirus, the Autographa californica multiple nucleopolyhedrovirus (AcMNPV). IE0 is a 72.6 kDa protein identical to IE1 with the exception of its 54 N-terminal amino acid residues. To gain some insight into important structural motifs of IE0, we expressed the protein and its C-terminal mutants under the control of the Drosophila heat shock promoter and studied the transactivation and replication functions of the transiently expressed proteins. IE0 was able to promote replication of a plasmid bearing the hr5 origin of replication of AcMNPV in transient transfections with a battery of eight plasmids expressing the AcMNPV genes dnapol, helicase, lef-1, lef-2, lef-3, p35, ie-2 and lef-7. IE0 transactivated expression of the baculovirus 39K promoter. Both the replication and transactivation functions were lost after introduction of selected mutations in the conserved basic domain II and helix-loop-helix structural motifs in the C-terminus of the protein. These IE0 mutants were unable to translocate to the cell nucleus. Our results point to the important role of some conserved structural motifs in the proper functioning of IE0.

  9. Structural Analysis of a Repetitive Protein Sequence Motif in Strepsirrhine Primate Amelogenin

    PubMed Central

    Bromley, Keith M.; Hacia, Joseph G.; Bromage, Timothy G.; Snead, Malcolm L.; Moradian-Oldak, Janet; Paine, Michael L.

    2011-01-01

    Strepsirrhines are members of a primate suborder that has a distinctive set of features associated with the development of the dentition. Amelogenin (AMEL), the better known of the enamel matrix proteins, forms 90% of the secreted organic matrix during amelogenesis. Although AMEL has been sequenced in numerous mammalian lineages, the only reported strepsirrhine AMEL sequences are those of the ring-tailed lemur and galago, which contain a set of additional proline-rich tandem repeats absent in all other primate species analyzed to date, but present in some non-primate mammals. Here, we first determined that these repeats are present in AMEL from three additional lemur species and thus are likely to be widespread throughout this group. To evaluate the functional relevance of these repeats in strepsirrhines, we engineered a mutated murine amelogenin sequence containing a proline-rich sequence similar to that of Lemur catta. In the monomeric form, the MQP insertions had no influence on the secondary structure or refolding properties, whereas in the assembled form, the insertions increased the hydrodynamic radii. We speculate that increased AMEL nanosphere size may influence enamel formation in strepsirrhine primates. PMID:21437261

  10. CORE-SINEs: Eukaryotic short interspersed retroposing elements with common sequence motifs

    PubMed Central

    Gilbert, Nicolas; Labuda, Damian

    1999-01-01

    A 65-bp “core” sequence is dispersed in hundreds of thousands of copies in the human genome. This sequence was found to constitute the central segment of a group of short interspersed elements (SINEs), referred to as mammalian-wide interspersed repeats, that proliferated before the radiation of placental mammals. Here, we propose that the core identifies an ancient tRNA-like SINE element, which survived in different lineages such as mammals, reptiles, birds, and fish, as well as mollusks, presumably for >550 million years. This element gave rise to a number of sequence families (CORE-SINEs), including mammalian-wide interspersed repeats, whose distinct 3′ ends are shared with different families of long interspersed elements (LINEs). The evolutionary success of the generic CORE-SINE element can be related to the recruitment of the internal promoter from highly transcribed host RNA as well as to its capacity to adapt to changing retropositional opportunities by sequence exchange with actively amplifying LINEs. It reinforces the notion that the very existence of SINEs depends on the cohabitation with both LINEs and the host genome. PMID:10077603

  11. CORE-SINEs: eukaryotic short interspersed retroposing elements with common sequence motifs.

    PubMed

    Gilbert, N; Labuda, D

    1999-03-16

    A 65-bp "core" sequence is dispersed in hundreds of thousands of copies in the human genome. This sequence was found to constitute the central segment of a group of short interspersed elements (SINEs), referred to as mammalian-wide interspersed repeats, that proliferated before the radiation of placental mammals. Here, we propose that the core identifies an ancient tRNA-like SINE element, which survived in different lineages such as mammals, reptiles, birds, and fish, as well as mollusks, presumably for >550 million years. This element gave rise to a number of sequence families (CORE-SINEs), including mammalian-wide interspersed repeats, whose distinct 3' ends are shared with different families of long interspersed elements (LINEs). The evolutionary success of the generic CORE-SINE element can be related to the recruitment of the internal promoter from highly transcribed host RNA as well as to its capacity to adapt to changing retropositional opportunities by sequence exchange with actively amplifying LINEs. It reinforces the notion that the very existence of SINEs depends on the cohabitation with both LINEs and the host genome.

  12. Conserved Motifs within Hepatitis C Virus Envelope (E2) RNA and Protein Independently Inhibit T Cell Activation

    PubMed Central

    Bhattarai, Nirjal; McLinden, James H.; Xiang, Jinhua; Kaufman, Thomas M.; Stapleton, Jack T.

    2015-01-01

    T cell receptor (TCR) signaling is required for T-cell activation, proliferation, differentiation, and effector function. Hepatitis C virus (HCV) infection is associated with impaired T-cell function leading to persistent viremia, delayed and inconsistent antibody responses, and mild immune dysfunction. Although multiple factors appear to contribute to T-cell dysfunction, a role for HCV particles in this process has not been identified. Here, we show that incubation of primary human CD4+ and CD8+ T-cells with HCV RNA-containing serum, HCV RNA-containing extracellular vesicles (EVs), cell culture-derived HCV particles (HCVcc) and HCV envelope pseudotyped retrovirus particles (HCVpp) inhibited TCR-mediated signaling. Since HCVpps contain only E1 and E2, we examined the effect of HCV E2 on TCR signaling pathways. HCV E2 expression recapitulated HCV particle-induced TCR inhibition. A highly conserved, 51 nucleotide (nt) RNA sequence was sufficient to inhibit TCR signaling. Cells expressing the HCV E2 coding RNA contained a short, virus-derived RNA predicted to be a Dicer substrate, which targeted a phosphatase involved in Src-kinase signaling (PTPRE). T-cells and hepatocytes containing HCV E2 RNA had reduced PTPRE protein levels. Mutation of 6 nts abolished the predicted Dicer interactions and restored PTPRE expression and proximal TCR signaling. HCV RNA did not inhibit distal TCR signaling induced by PMA and Ionomycin; however, HCV E2 protein inhibited distal TCR signaling. This inhibition required lymphocyte-specific tyrosine kinase (Lck). Lck phosphorylated HCV E2 at a conserved tyrosine (Y613), and phospho-E2 inhibited nuclear translocation of NFAT. Mutation of Y613 restored distal TCR signaling, even in the context of HCVpps. Thus, HCV particles delivered viral RNA and E2 protein to T-cells, and these inhibited proximal and distal TCR signaling, respectively. These effects of HCV particles likely aid in establishing infection and contribute to viral persistence

  13. Molecular sensing of bacteria in plants. The highly conserved RNA-binding motif RNP-1 of bacterial cold shock proteins is recognized as an elicitor signal in tobacco.

    PubMed

    Felix, Georg; Boller, Thomas

    2003-02-21

    To detect microbial infection, multicellular organisms have evolved sensing systems for pathogen-associated molecular patterns (PAMPs). Here, we identify bacterial cold shock protein (CSP) as a new PAMP of this kind that acts as a highly active elicitor of defense responses in tobacco. Tobacco cells perceive a conserved domain of CSP, and synthetic peptides representing 15 amino acids of this domain induced responses at subnanomolar concentrations. Central to the elicitor-active domain is the RNP-1 motif KGFGFITP, a motif also conserved in many RNA- and DNA-binding proteins of eukaryotes. The peptide csp15-Nsyl, which represents the domain with the highest homology to csp15 in a protein of Nicotiana sylvestris, exhibited only weak activity in tobacco cells. Crystallographic and genetic data from the literature show that the RNP-1 domain of bacterial CSPs resides on a protruding loop and exposes a series of aromatic and basic side chains at the surface that are essential for the nucleotide-binding activity of CSPs. These side chains were also essential for elicitor activity: replacing single residues in csp15 with Ala strongly reduced or abolished activity. Most strikingly, csp15-Ala10, a peptide with the RNP-1 motif modified to KGAGFITP, lacked elicitor activity but acted as a competitive antagonist of CSP-related elicitors. Bacteria commonly have a small family of CSP-like proteins, including both cold-inducible and noninducible members, and CSP-related elicitor activity was detected in extracts from all bacteria tested. Thus, the CSP domain containing the RNP-1 motif provides a structure characteristic of bacteria in general, and tobacco plants have evolved a highly sensitive chemoperception system to detect this bacterial PAMP.
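
    The single-residue Ala replacements described above follow the pattern of an alanine scan. The sketch below, an illustration rather than the authors' procedure, generates every single-Ala variant of the RNP-1 motif; positions are numbered within the 8-residue motif, not within csp15, and the helper name `alanine_scan` is hypothetical.

```python
# Alanine scan: one variant per non-Ala position, with that residue replaced by Ala.
RNP1_MOTIF = "KGFGFITP"  # RNP-1 motif of bacterial CSPs, as given in the abstract


def alanine_scan(peptide: str) -> dict[int, str]:
    """Map each 1-based position of a non-Ala residue to the peptide with that residue set to Ala."""
    variants = {}
    for i, residue in enumerate(peptide):
        if residue != "A":
            variants[i + 1] = peptide[:i] + "A" + peptide[i + 1:]
    return variants


for position, variant in alanine_scan(RNP1_MOTIF).items():
    print(position, variant)
```

    Motif position 3 yields KGAGFITP, the same change the abstract reports for csp15-Ala10 (residue 10 of the 15-residue peptide).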

  14. Tissue-specific DNA methylation is conserved across human, mouse, and rat, and driven by primary sequence conservation.

    PubMed

    Zhou, Jia; Sears, Renee L; Xing, Xiaoyun; Zhang, Bo; Li, Daofeng; Rockweiler, Nicole B; Jang, Hyo Sik; Choudhary, Mayank N K; Lee, Hyung Joo; Lowdon, Rebecca F; Arand, Jason; Tabers, Brianne; Gu, C Charles; Cicero, Theodore J; Wang, Ting

    2017-09-12

    Uncovering mechanisms of epigenome evolution is an essential step towards understanding the evolution of different cellular phenotypes. While studies have confirmed DNA methylation as a conserved epigenetic mechanism in mammalian development, little is known about the conservation of tissue-specific genome-wide DNA methylation patterns. Using a comparative epigenomics approach, we identified and compared the tissue-specific DNA methylation patterns of rat against those of mouse and human across three shared tissue types. We confirmed that tissue-specific differentially methylated regions are strongly associated with tissue-specific regulatory elements. Comparisons between species revealed that at least 11-37% of tissue-specific DNA methylation patterns are conserved, a phenomenon that we define as epigenetic conservation. Conserved DNA methylation is accompanied by conservation of other epigenetic marks, including histone modifications. Although a significant amount of locus-specific methylation is epigenetically conserved, the majority of tissue-specific DNA methylation is not conserved across the species and tissue types that we investigated. Examination of the genetic underpinning of epigenetic conservation suggests that primary sequence conservation is a driving force behind epigenetic conservation. In contrast, the evolutionary dynamics of tissue-specific DNA methylation are best explained by the maintenance or turnover of binding sites for important transcription factors. Our study extends the limited literature on comparative epigenomics and suggests a new paradigm for epigenetic conservation without genetic conservation through analysis of transcription factor binding sites.

  15. i-motif structures in long cytosine-rich sequences found upstream of the promoter region of the SMARCA4 gene.

    PubMed

    Benabou, Sanae; Aviñó, Anna; Lyonnais, S; González, C; Eritja, Ramon; De Juan, Anna; Gargallo, Raimundo

    2017-09-01

    Cytosine-rich oligonucleotides can form complex structures known as i-motifs, whose biological properties are increasingly studied. Sequences prone to forming i-motifs near the promoter regions of genes can be difficult to study, because they not only contain cytosine tracts of disparate length but these tracts may also be separated by loops of varied nature and length. In this work, the formation of intramolecular i-motif structures by a long sequence located upstream of the promoter region of the SMARCA4 gene has been demonstrated using nuclear magnetic resonance, circular dichroism, gel electrophoresis, size-exclusion chromatography, and multivariate analysis. Not only the wild-type sequence (5'-TC3T2GCTATC3TGTC2TGC2TCGC3T2G2TCATGA2C4-3') but also several truncated and mutated sequences have been studied. Despite its apparently complex sequence, the wild-type sequence may form a relatively stable and homogeneous unimolecular i-motif structure with respect to both pH and temperature. The model ligand TMPyP4 destabilizes the structure, whereas 20% (w/v) PEG200 stabilizes it slightly. This finding opens the door to studying the interaction of this kind of i-motif structure with stabilizing ligands or proteins. Copyright © 2017 Elsevier B.V. and Société Française de Biochimie et Biologie Moléculaire (SFBBM). All rights reserved.
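
    The wild-type sequence above is written in run-length shorthand. Assuming, as is conventional, that each digit is a flattened subscript giving the repeat count of the preceding base (C3 = CCC), the sketch below expands it and lists the cytosine tracts that could contribute to i-motif formation. The helper names and the tract-length cutoff are illustrative choices, not part of the study.

```python
import re


def expand_shorthand(seq: str) -> str:
    """Expand run-length shorthand, e.g. 'TC3T2' -> 'TCCCTT' (digit = repeat count of the preceding base)."""
    return re.sub(r"([ACGT])(\d+)", lambda m: m.group(1) * int(m.group(2)), seq)


def cytosine_tracts(seq: str, min_len: int = 2) -> list[tuple[int, int]]:
    """Return (0-based start, length) for every run of at least min_len consecutive cytosines."""
    return [(m.start(), len(m.group())) for m in re.finditer(rf"C{{{min_len},}}", seq)]


wild_type = expand_shorthand("TC3T2GCTATC3TGTC2TGC2TCGC3T2G2TCATGA2C4")
print(wild_type, len(wild_type))
print(cytosine_tracts(wild_type, min_len=3))
```

    Under this reading the wild-type sequence is 44 nt long and contains four tracts of three or more cytosines, separated by loops of different lengths, which matches the difficulty described in the abstract.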

  16. On the Concept of Cis-regulatory Information: From Sequence Motifs to Logic Functions

    NASA Astrophysics Data System (ADS)

    Tarpine, Ryan; Istrail, Sorin

    The regulatory genome is about the “system level organization of the core genomic regulatory apparatus, and how this is the locus of causality underlying the twin phenomena of animal development and animal evolution” (E.H. Davidson. The Regulatory Genome: Gene Regulatory Networks in Development and Evolution, Academic Press, 2006). Information processing in the regulatory genome is done through regulatory states, defined as sets of transcription factors (sequence-specific DNA binding proteins which determine gene expression) that are expressed and active at the same time. The core information processing machinery consists of modular DNA sequence elements, called cis-modules, that interact with transcription factors. The cis-modules “read” the information contained in the regulatory state of the cell through transcription factor binding, “process” it, and directly or indirectly communicate with the basal transcription apparatus to determine gene expression. Endowing each gene with information-receiving capacity through its cis-regulatory modules is essential for responding to every possible regulatory state to which the gene might be exposed during all phases of the life cycle and in all cell types. We present here a set of challenges addressed by our CYRENE research project aimed at studying the cis-regulatory code of the regulatory genome. The CYRENE Project is devoted to (1) the construction of a database, the cis-Lexicon, containing comprehensive information across species about experimentally validated cis-regulatory modules; and (2) the development of a next-generation genome browser, the cis-Browser, specialized for the regulatory genome. The presentation is anchored on three main computational challenges: the Gene Naming Problem, the Consensus Sequence Bottleneck Problem, and the Logic Function Inference Problem.
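
    The Logic Function Inference Problem can be made concrete with a toy model. In the sketch below, an illustrative assumption rather than anything from the CYRENE software, a regulatory state is an on/off assignment of transcription factors, a cis-module's logic function is a Boolean function of that state, and inference is reduced to checking which candidate functions agree with observed (state, expression) pairs. The factor names A, B, C and R are placeholders.

```python
from itertools import product


def truth_table(logic, factors):
    """Evaluate a cis-module logic function on every on/off combination of the given factors."""
    rows = []
    for values in product([False, True], repeat=len(factors)):
        state = dict(zip(factors, values))
        rows.append((state, logic(state)))
    return rows


def module_logic(state):
    # Hypothetical module: ON when activator A is bound together with B or C,
    # unless repressor R is bound. A, B, C and R are placeholder factor names.
    return state["A"] and (state["B"] or state["C"]) and not state["R"]


def simpler(state):
    # Hypothetical alternative candidate that ignores B and C.
    return state["A"] and not state["R"]


def consistent(logic, observations):
    """True if a candidate logic function reproduces every observed (state, output) pair."""
    return all(logic(state) == output for state, output in observations)


table = truth_table(module_logic, ["A", "B", "C", "R"])
for state, output in table:
    if output:
        print(state)

# Inference as consistency checking: the simpler candidate is ruled out by the observations.
print(consistent(module_logic, table), consistent(simpler, table))
```

    Real inference must work from far fewer observations than a full truth table, which is what makes the problem hard; the sketch only shows how candidate functions can be eliminated.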

  17. High-throughput sequencing enhanced phage display enables the identification of patient-specific epitope motifs in serum.

    PubMed

    Christiansen, Anders; Kringelum, Jens V; Hansen, Christian S; Bøgh, Katrine L; Sullivan, Eric; Patel, Jigar; Rigby, Neil M; Eiwegger, Thomas; Szépfalusi, Zsolt; de Masi, Federico; Nielsen, Morten; Lund, Ole; Dufva, Martin

    2015-08-06

    Phage display is a prominent screening technique with a multitude of applications including therapeutic antibody development and mapping of antigen epitopes. In this study, phages were selected based on their interaction with patient serum and exhaustively characterised by high-throughput sequencing. A bioinformatics approach was developed in order to identify peptide motifs of interest based on clustering and contrasting to control samples. Comparison of patient and control samples confirmed a major issue in phage display, namely the selection of unspecific peptides. The potential of the bioinformatic approach was demonstrated by identifying epitopes of a prominent peanut allergen, Ara h 1, in sera from patients with severe peanut allergy. The identified epitopes were confirmed by high-density peptide micro-arrays. The present study demonstrates that high-throughput sequencing can empower phage display by (i) enabling the analysis of complex biological samples, (ii) circumventing the traditional laborious picking and functional testing of individual phage clones and (iii) reducing the number of selection rounds.

  18. CpG island erosion, polycomb occupancy and sequence motif enrichment at bivalent promoters in mammalian embryonic stem cells

    PubMed Central

    Mantsoki, Anna; Devailly, Guillaume; Joshi, Anagha

    2015-01-01

    In embryonic stem (ES) cells, developmental regulators have a characteristic bivalent chromatin signature marked by simultaneous presence of both activation (H3K4me3) and repression (H3K27me3) signals and are thought to be in a ‘poised’ state for subsequent activation or silencing during differentiation. We collected eleven pairs (H3K4me3 and H3K27me3) of ChIP sequencing datasets in human ES cells and eight pairs in murine ES cells, and predicted high-confidence (HC) bivalent promoters. Over 85% of H3K27me3 marked promoters were bivalent in human and mouse ES cells. We found that (i) HC bivalent promoters were enriched for developmental factors and were highly likely to be differentially expressed upon transcription factor perturbation; (ii) murine HC bivalent promoters were occupied by both polycomb repressive component classes (PRC1 and PRC2) and grouped into four distinct clusters with different biological functions; (iii) HC bivalent and active promoters were CpG rich while H3K27me3-only promoters lacked CpG islands. Binding enrichment of distinct sets of regulators distinguished bivalent from active promoters. Moreover, a ‘TCCCC’ sequence motif was specifically enriched in bivalent promoters. Finally, this analysis will serve as a resource for future studies to further understand transcriptional regulation during embryonic development. PMID:26582124
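
    The abstract does not describe how 'TCCCC' enrichment was measured. As a minimal sketch under that caveat, the code below counts exact occurrences on both strands (the reverse complement is GGGGA) and compares the fraction of promoters carrying the motif in two groups. The sequences are invented toy data, and a real analysis would add a background model and a statistical test.

```python
def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_motif(seq: str, motif: str = "TCCCC", both_strands: bool = True) -> int:
    """Count (possibly overlapping) exact occurrences of motif, optionally on the reverse strand too."""
    seq = seq.upper()
    targets = {motif}
    if both_strands:
        targets.add(reverse_complement(motif))  # 'GGGGA' for 'TCCCC'
    k = len(motif)
    return sum(seq[i:i + k] in targets for i in range(len(seq) - k + 1))


def fraction_with_motif(seqs: list[str], motif: str = "TCCCC") -> float:
    """Fraction of sequences containing at least one occurrence of motif."""
    return sum(count_motif(s, motif) > 0 for s in seqs) / len(seqs)


# Hypothetical toy promoters, for illustration only.
bivalent = ["ATCCCCG", "GGGGAT", "ACGT"]
active = ["ACGTACGT", "TTTT"]
print(fraction_with_motif(bivalent), fraction_with_motif(active))
```

    Using a set of target strings means a palindromic motif is never counted twice at the same position.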

  19. CpG island erosion, polycomb occupancy and sequence motif enrichment at bivalent promoters in mammalian embryonic stem cells.

    PubMed

    Mantsoki, Anna; Devailly, Guillaume; Joshi, Anagha

    2015-11-19

    In embryonic stem (ES) cells, developmental regulators have a characteristic bivalent chromatin signature marked by simultaneous presence of both activation (H3K4me3) and repression (H3K27me3) signals and are thought to be in a 'poised' state for subsequent activation or silencing during differentiation. We collected eleven pairs (H3K4me3 and H3K27me3) of ChIP sequencing datasets in human ES cells and eight pairs in murine ES cells, and predicted high-confidence (HC) bivalent promoters. Over 85% of H3K27me3 marked promoters were bivalent in human and mouse ES cells. We found that (i) HC bivalent promoters were enriched for developmental factors and were highly likely to be differentially expressed upon transcription factor perturbation; (ii) murine HC bivalent promoters were occupied by both polycomb repressive component classes (PRC1 and PRC2) and grouped into four distinct clusters with different biological functions; (iii) HC bivalent and active promoters were CpG rich while H3K27me3-only promoters lacked CpG islands. Binding enrichment of distinct sets of regulators distinguished bivalent from active promoters. Moreover, a 'TCCCC' sequence motif was specifically enriched in bivalent promoters. Finally, this analysis will serve as a resource for future studies to further understand transcriptional regulation during embryonic development.

  20. High-throughput sequencing enhanced phage display enables the identification of patient-specific epitope motifs in serum

    PubMed Central

    Christiansen, Anders; Kringelum, Jens V.; Hansen, Christian S.; Bøgh, Katrine L.; Sullivan, Eric; Patel, Jigar; Rigby, Neil M.; Eiwegger, Thomas; Szépfalusi, Zsolt; Masi, Federico de; Nielsen, Morten; Lund, Ole; Dufva, Martin

    2015-01-01

    Phage display is a prominent screening technique with a multitude of applications including therapeutic antibody development and mapping of antigen epitopes. In this study, phages were selected based on their interaction with patient serum and exhaustively characterised by high-throughput sequencing. A bioinformatics approach was developed in order to identify peptide motifs of interest based on clustering and contrasting to control samples. Comparison of patient and control samples confirmed a major issue in phage display, namely the selection of unspecific peptides. The potential of the bioinformatic approach was demonstrated by identifying epitopes of a prominent peanut allergen, Ara h 1, in sera from patients with severe peanut allergy. The identified epitopes were confirmed by high-density peptide micro-arrays. The present study demonstrates that high-throughput sequencing can empower phage display by (i) enabling the analysis of complex biological samples, (ii) circumventing the traditional laborious picking and functional testing of individual phage clones and (iii) reducing the number of selection rounds. PMID:26246327

  1. Peptide sequences identified by phage display are immunodominant functional motifs of Pet and Pic serine proteases secreted by Escherichia coli and Shigella flexneri.

    PubMed

    Hernández-Chiñas, Ulises; Gazarian, Tatiana; Gazarian, Karlen; Mendoza-Hernández, Guillermo; Xicohtencatl-Cortes, Juan; Eslava, Carlos

    2009-12-01

    Plasmid-encoded toxin (Pet) and protein involved in colonization (Pic) are serine protease autotransporters of Enterobacteriaceae (SPATEs) secreted by enteroaggregative Escherichia coli (EAEC), which contain the GDSGSG sequence (the serine motif). Our research aimed to localize functional sites in both proteins using the phage display method. From a 12-mer linear library and a 7-mer cysteine-constrained (C7C) library displayed on the M13 phage pIII protein, we selected different mimotopes using IgG purified from sera of children naturally infected with EAEC producing Pet and Pic proteins, and anti-Pet and anti-Pic IgG purified from rabbits immunized with each of these proteins. IgG from the children selected a homologous group of sequences forming the consensus motif PQPxK, whereas the motifs PGxI/LN and CxPDDSSxC were selected by the rabbit anti-Pet and anti-Pic IgGs, respectively. Analysis of the amino-terminal region of a panel of SPATEs showed that all of them contain sequences matching the PGxI/LN or CxPDDSSxC motifs, and in a three-dimensional model (Modeller 9v2) designed for Pet, both motifs were found in the globular portion of the protein, close to the protease active site GDSGSG. Antibodies induced in mice by mimotopes carrying the three aforementioned motifs were reactive with Pet, Pic, and synthetic peptides carrying the immunogenic mimotope sequences TYPGYINHSKA and LLPQPPKLLLP, confirming that the peptide moiety of the selected phages induced the toxin-specific antibodies. The antibodies induced in mice against the PGxI/LN and CxPDDSSxC mimotopes inhibited two biological activities of Pet: fodrin proteolysis and macrophage chemotaxis. Our results show that phage display can generate mimotopes carrying the sequence motifs PGxI/LN and CxPDDSSxC, and these motifs were identified as functional motifs of Pet, Pic, and other SPATEs involved in their biological activities.
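
    The motif notation in this abstract uses 'x' for any residue and 'I/L' for either of two residues. A minimal sketch, with hypothetical helper names, translates that notation into regular expressions and checks the two synthetic mimotope peptides quoted above against all three motifs.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def motif_to_regex(motif: str) -> re.Pattern:
    """Translate notation such as 'PGxI/LN': 'x' = any residue, 'I/L' = either residue."""
    parts, i = [], 0
    while i < len(motif):
        if i + 2 < len(motif) and motif[i + 1] == "/":
            parts.append(f"[{motif[i]}{motif[i + 2]}]")  # alternative residues
            i += 3
        elif motif[i] == "x":
            parts.append(f"[{AMINO_ACIDS}]")  # any standard amino acid
            i += 1
        else:
            parts.append(motif[i])  # fixed residue
            i += 1
    return re.compile("".join(parts))


def find_motifs(peptide: str, motifs: list[str]) -> dict[str, str]:
    """Return {motif: first matching substring} for each motif found in peptide."""
    hits = {}
    for motif in motifs:
        match = motif_to_regex(motif).search(peptide)
        if match:
            hits[motif] = match.group()
    return hits


MOTIFS = ["PQPxK", "PGxI/LN", "CxPDDSSxC"]
for peptide in ["TYPGYINHSKA", "LLPQPPKLLLP"]:
    print(peptide, find_motifs(peptide, MOTIFS))
```

    TYPGYINHSKA carries PGYIN (a PGxI/LN match) and LLPQPPKLLLP carries PQPPK (a PQPxK match), consistent with the motifs the two peptides were chosen to represent.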

  2. Regulation of DNA replication at the end of the mitochondrial D-loop involves the helicase TWINKLE and a conserved sequence element

    PubMed Central

    Jemt, Elisabeth; Persson, Örjan; Shi, Yonghong; Mehmedovic, Majda; Uhler, Jay P.; Dávila López, Marcela; Freyer, Christoph; Gustafsson, Claes M.; Samuelsson, Tore; Falkenberg, Maria

    2015-01-01

    The majority of mitochondrial DNA replication events are terminated prematurely. The nascent DNA remains stably associated with the template, forming a triple-stranded displacement loop (D-loop) structure. However, the function of the D-loop region of the mitochondrial genome remains poorly understood. Using a comparative genomics approach, we identify two closely related 15-nt sequence motifs in the D-loop that are strongly conserved among vertebrates. One motif is at the D-loop 5′-end and is part of conserved sequence block 1 (CSB1). The other motif, here denoted coreTAS, is at the D-loop 3′-end. Both sequences may prevent transcription across the D-loop region, since light- and heavy-strand transcription is terminated at CSB1 and coreTAS, respectively. Interestingly, replication of the nascent D-loop strand, which proceeds in the direction opposite to heavy-strand transcription, is also terminated at coreTAS, suggesting that coreTAS is involved in terminating both transcription and replication. Finally, we demonstrate that loading of the helicase TWINKLE at coreTAS is reversible, implying that this site is a crucial component of a switch between D-loop formation and full-length mitochondrial DNA replication. PMID:26253742

  3. Frequency, type, and distribution of EST-SSRs from three genotypes of Lolium perenne, and their conservation across orthologous sequences of Festuca arundinacea, Brachypodium distachyon, and Oryza sativa

    PubMed Central

    Asp, Torben; Frei, Ursula K; Didion, Thomas; Nielsen, Klaus K; Lübberstedt, Thomas

    2007-01-01

    Background Simple sequence repeat (SSR) markers are highly informative and widely used for genetic and breeding studies in several plant species. They are used for cultivar identification, variety protection, as anchor markers in genetic mapping, and in marker-assisted breeding. Currently, a limited number of SSR markers are publicly available for perennial ryegrass (Lolium perenne). We report on the exploitation of a comprehensive EST collection in L. perenne for SSR identification. The objectives of this study were 1) to analyse the frequency, type, and distribution of SSR motifs in ESTs derived from three genotypes of L. perenne, 2) to perform a comparative analysis of SSR motif polymorphisms between allelic sequences, 3) to conduct a comparative analysis of SSR motif polymorphisms between orthologous sequences of L. perenne, Festuca arundinacea, Brachypodium distachyon, and O. sativa, 4) to identify functionally associated EST-SSR markers for application in comparative genomics and breeding. Results From 25,744 ESTs, representing 8.53 megabases of nucleotide information from three genotypes of L. perenne, 1,458 ESTs (5.7%) contained one or more SSRs. Of these SSRs, 955 (3.7%) were non-redundant. Tri-nucleotide repeats were the most abundant type of repeats followed by di- and tetra-nucleotide repeats. The EST-SSRs from the three genotypes were analysed for allelic- and/or genotypic SSR motif polymorphisms. Most of the SSR motifs (97.7%) showed no polymorphisms, whereas 22 EST-SSRs showed allelic- and/or genotypic polymorphisms. All polymorphisms identified were changes in the number of repeat units. Comparative analysis of the L. perenne EST-SSRs with sequences of Festuca arundinacea, Brachypodium distachyon, and Oryza sativa identified 19 clusters of orthologous sequences between these four species. Analysis of the clusters showed that the SSR motif generally is conserved in the closely related species F. arundinacea, but often differs in length of the SSR
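
    Because every polymorphism found was a change in the number of repeat units, SSR comparison reduces to locating each repeat and counting its copies. The sketch below finds perfect SSRs with regular-expression backreferences and reports (start, unit, copy number). The minimum-copy thresholds are illustrative assumptions, not the criteria used in the study, and the EST fragment is invented.

```python
import re

# Minimum number of tandem copies per unit length (1 = mono, 2 = di, ...).
# These thresholds are illustrative assumptions, not the study's criteria.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(unit: str) -> bool:
    """False if unit is itself a repeat of a shorter unit (e.g. 'ATAT' or 'AA')."""
    k = len(unit)
    return all(unit != unit[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq: str, min_repeats: dict[int, int] = MIN_REPEATS) -> list[tuple[int, str, int]]:
    """Return (0-based start, repeat unit, number of copies) for each perfect SSR found."""
    seq = seq.upper()
    hits = []
    for k, n_min in sorted(min_repeats.items()):
        # A k-nt unit followed by at least n_min - 1 further exact copies of itself.
        pattern = re.compile(rf"([ACGT]{{{k}}})\1{{{n_min - 1},}}")
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):  # skip e.g. 'ATAT' runs, which are reported as 'AT'
                hits.append((m.start(), unit, len(m.group()) // k))
    return sorted(hits)


est = "TT" + "CAG" * 7 + "TT" + "AC" * 8 + "TT"  # hypothetical EST fragment
print(find_ssrs(est))
```

    Comparing allelic or orthologous sequences then amounts to comparing the copy numbers reported for the same unit at corresponding positions.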

  4. EXTREME: an online EM algorithm for motif discovery

    PubMed Central

    Quang, Daniel; Xie, Xiaohui

    2014-01-01

    Motivation: Identifying regulatory elements is a fundamental problem in the field of gene transcription. Motif discovery—the task of identifying the sequence preference of transcription factor proteins, which bind to these elements—is an important step in this challenge. MEME is a popular motif discovery algorithm. Unfortunately, MEME’s running time scales poorly with the size of the dataset. Experiments such as ChIP-Seq and DNase-Seq are providing a rich amount of information on the binding preference of transcription factors. MEME cannot discover motifs in data from these experiments in a practical amount of time without a compromising strategy such as discarding a majority of the sequences. Results: We present EXTREME, a motif discovery algorithm designed to find DNA-binding motifs in ChIP-Seq and DNase-Seq data. Unlike MEME, which uses the expectation-maximization algorithm for motif discovery, EXTREME uses the online expectation-maximization algorithm to discover motifs. EXTREME can discover motifs in large datasets in a practical amount of time without discarding any sequences. Using EXTREME on ChIP-Seq and DNase-Seq data, we discover many motifs, including some novel and infrequent motifs that can only be discovered by using the entire dataset. Conservation analysis of one of these novel infrequent motifs confirms that it is evolutionarily conserved and possibly functional. Availability and implementation: All source code is available at the Github repository http://github.com/uci-cbcl/EXTREME. Contact: xhx@ics.uci.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24532725
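
    The repository linked above holds the actual EXTREME implementation. The sketch below only illustrates the online expectation-maximization update under simplifying assumptions: one motif occurrence per sequence, forward strand only, a uniform background, a given seed matrix, and a step size gamma_t = step0 * t^(-decay) chosen for illustration. Instead of summing expected counts over the whole dataset before each M-step, the running sufficient statistics are nudged after every sequence, so the motif is refined in a single streaming pass.

```python
import math
import random

ALPHABET = "ACGT"
INDEX = {base: i for i, base in enumerate(ALPHABET)}


def online_em_motif(seqs, width, init_pwm, background=(0.25, 0.25, 0.25, 0.25),
                    step0=0.5, decay=0.6, pseudocount=0.01, passes=2):
    """Online EM for a one-occurrence-per-sequence motif model (sequences: A/C/G/T only).

    After each sequence: s <- (1 - gamma_t) * s + gamma_t * E[counts | sequence, PWM],
    then the PWM is re-estimated from s (M-step).
    """
    pwm = [list(col) for col in init_pwm]
    stats = [list(col) for col in init_pwm]
    t = 0
    for _ in range(passes):
        for seq in seqs:
            t += 1
            gamma = step0 * t ** (-decay)  # decreasing step size
            # E-step for this sequence: log-odds score of every possible motif start.
            scores = [sum(math.log(pwm[j][INDEX[seq[s + j]]] / background[INDEX[seq[s + j]]])
                          for j in range(width))
                      for s in range(len(seq) - width + 1)]
            top = max(scores)
            weights = [math.exp(x - top) for x in scores]  # subtract max for stability
            total = sum(weights)
            expected = [[0.0] * 4 for _ in range(width)]
            for start, w in enumerate(weights):
                for j in range(width):
                    expected[j][INDEX[seq[start + j]]] += w / total
            # Stochastic-approximation update of the sufficient statistics.
            for j in range(width):
                for b in range(4):
                    stats[j][b] = (1 - gamma) * stats[j][b] + gamma * expected[j][b]
            # M-step: renormalize each column with a small pseudocount.
            for j in range(width):
                norm = sum(stats[j]) + 4 * pseudocount
                pwm[j] = [(stats[j][b] + pseudocount) / norm for b in range(4)]
    return pwm


def consensus(pwm):
    return "".join(ALPHABET[max(range(4), key=lambda b: col[b])] for col in pwm)


# Toy data: a hypothetical 8-mer planted once in each random 40-nt sequence.
rng = random.Random(0)
planted = "TGACGTCA"
seqs = []
for _ in range(100):
    bg = "".join(rng.choice(ALPHABET) for _ in range(40))
    pos = rng.randrange(40 - len(planted) + 1)
    seqs.append(bg[:pos] + planted + bg[pos + len(planted):])

# Weakly informative seed matrix: 0.55 on the seed base, 0.15 elsewhere.
init = [[0.55 if b == base else 0.15 for b in ALPHABET] for base in planted]
pwm = online_em_motif(seqs, len(planted), init)
print(consensus(pwm))
```

    Memory use does not grow with the number of sequences, which is the property that lets online EM use the entire dataset rather than a subsample.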

  5. Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast

    PubMed Central

    Tsai, Zing Tsung-Yeh; Shiu, Shin-Han; Tsai, Huai-Kuang

    2015-01-01

    Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as a model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding compared to SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a "TF-agnostic" predictive model based on three DNA “intrinsic properties” (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model allows prediction of binding regions across DNA-binding domain families, it is TF agnostic and likely describes general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF agnostic model for identifying functional regulatory regions in potentially any sequenced genome. PMID:26291518

  6. Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast.

    PubMed

    Tsai, Zing Tsung-Yeh; Shiu, Shin-Han; Tsai, Huai-Kuang

    2015-08-01

    Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as a model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding compared to SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a "TF-agnostic" predictive model based on three DNA "intrinsic properties" (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model allows prediction of binding regions across DNA-binding domain families, it is TF agnostic and likely describes general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF agnostic model for identifying functional regulatory regions in potentially any sequenced genome.

  7. Variable motif utilization in homeotic selector (Hox)-cofactor complex formation controls specificity.

    PubMed

    Lelli, Katherine M; Noro, Barbara; Mann, Richard S

    2011-12-27

    Homeotic selector (Hox) proteins often bind DNA cooperatively with cofactors such as Extradenticle (Exd) and Homothorax (Hth) to achieve functional specificity in vivo. Previous studies identified the Hox YPWM motif as an important Exd interaction motif. Using a comparative approach, we characterize the contribution of this and additional conserved sequence motifs to the regulation of specific target genes for three Drosophila Hox proteins. We find that Sex combs reduced (Scr) uses a simple interaction mechanism, where a single tryptophan-containing motif is necessary for Exd-dependent DNA-binding and in vivo functions. Abdominal-A (AbdA) is more complex, using multiple conserved motifs in a context-dependent manner. Lastly, Ultrabithorax (Ubx) is the most flexible, in that it uses multiple conserved motifs that function in parallel to regulate target genes in vivo. We propose that using different binding mechanisms with the same cofactor may be one strategy to achieve functional specificity in vivo.

  8. A type of nucleotide motif that distinguishes tobamovirus species more efficiently than nucleotide signatures.

    PubMed

    Gibbs, A J; Armstrong, J S; Gibbs, M J

    2004-10-01

    The complete genomic sequences of forty-eight tobamoviruses were classified and found to form at least twelve species clusters. Individual species were not conveniently defined by 'nucleotide signatures' (i.e. strings of one or more nucleotides unique to a taxon) as these were scattered sparsely throughout the genomes and were mostly single nucleotides. By contrast all the species were concisely and uniquely distinguished by short nucleotide motifs consisting of conserved genus-specific sites intercalated with variable sites that provided species-specific combinations of nucleotides (nucleotide combination motifs; NC-motifs). We describe the procedure for finding NC-motifs in a convenient and phylogenetically conserved region of the tobamovirus RNA polymerase gene, the '4404-50 motif'. NC-motifs have been found in other sets of homologous sequences, and are convenient for use in published taxonomic descriptions.
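
    An NC-motif, as defined above, is a short aligned region in which sites conserved across the genus are interleaved with variable sites whose combination of nucleotides is unique to each species. The abstract does not give the search procedure, so the sketch below is a simplified check under stated assumptions: the window is already aligned, every member of a species must share one combination, and all species must differ. The species names and sequences are made up.

```python
def nc_motif(window: dict[str, list[str]]):
    """Split an aligned window into conserved and variable sites and check that the
    nucleotides at the variable sites form a distinct combination for each species.

    window maps species name -> aligned sequences (all the same length) for that region.
    Returns (conserved_sites, variable_sites, combination_per_species), or None if some
    species is inconsistent at the variable sites or two species share a combination.
    """
    seqs = [s for group in window.values() for s in group]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    conserved = [i for i in range(length) if len({s[i] for s in seqs}) == 1]
    variable = [i for i in range(length) if len({s[i] for s in seqs}) > 1]
    combos = {}
    for species, group in window.items():
        combinations = {"".join(s[i] for i in variable) for s in group}
        if len(combinations) != 1:
            return None  # members of one species disagree at a variable site
        combos[species] = combinations.pop()
    if len(set(combos.values())) != len(combos):
        return None  # two species cannot be told apart in this window
    return conserved, variable, combos


# Hypothetical 5-nt aligned window for three made-up species.
window = {"sp1": ["ACGTA", "ACGTA"], "sp2": ["ATGCA"], "sp3": ["ACGCA"]}
print(nc_motif(window))
```

    Sites 0, 2 and 4 act as the conserved genus-specific frame, while the nucleotides at sites 1 and 3 (CT, TC, CC) distinguish the three species; scanning such windows along an alignment would locate candidate NC-motifs.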

  9. Redundant ERF-VII Transcription Factors Bind to an Evolutionarily Conserved cis-Motif to Regulate Hypoxia-Responsive Gene Expression in Arabidopsis

    PubMed Central

    Gasch, Philipp; Fundinger, Moritz; Müller, Jana T.; Lee, Travis; Mustroph, Angelika

    2016-01-01

    The response of Arabidopsis thaliana to low-oxygen stress (hypoxia), such as during shoot submergence or root waterlogging, includes increasing the levels of ∼50 hypoxia-responsive gene transcripts, many of which encode enzymes associated with anaerobic metabolism. Upregulation of over half of these mRNAs involves stabilization of five group VII ethylene response factor (ERF-VII) transcription factors, which are routinely degraded via the N-end rule pathway of proteolysis in an oxygen- and nitric oxide-dependent manner. Despite their importance, neither the quantitative contribution of individual ERF-VIIs nor the cis-regulatory elements they govern are well understood. Here, using single- and double-null mutants, the constitutively synthesized ERF-VIIs RELATED TO APETALA2.2 (RAP2.2) and RAP2.12 are shown to act redundantly as principal activators of hypoxia-responsive genes; constitutively expressed RAP2.3 contributes to this redundancy, whereas the hypoxia-induced HYPOXIA RESPONSIVE ERF1 (HRE1) and HRE2 play minor roles. An evolutionarily conserved 12-bp cis-regulatory motif that binds to and is sufficient for activation by RAP2.2 and RAP2.12 is identified through a comparative phylogenetic motif search, promoter dissection, yeast one-hybrid assays, and chromatin immunopurification. This motif, designated the hypoxia-responsive promoter element, is enriched in promoters of hypoxia-responsive genes in multiple species. PMID:26668304

  10. Modification of Cyclic NGR Tumor Neovasculature-Homing Motif Sequence to Human Plasminogen Kringle 5 Improves Inhibition of Tumor Growth

    PubMed Central

    Jiang, Weiwei; Jin, Guanghui; Ma, Dingyuan; Wang, Feng; Fu, Tong; Chen, Xiao; Chen, Xiwen; Jia, Kunzhi; Marikar, Faiz M. M. T.; Hua, Zichun

    2012-01-01

    Background Blood vessels in tumors express higher levels of aminopeptidase N (APN) than normal tissues. Evidence suggests that the CNGRC motif is an APN ligand which targets tumor vasculature. Increased expression of APN in tumor vascular endothelium, therefore, offers an opportunity for targeted delivery of NGR peptide-linked drugs to tumors. Methods/Principal Findings To determine whether an additional cyclic CNGRC sequence could improve endothelial cell homing and antitumor effect, human plasminogen kringle 5 (hPK5) was modified genetically to introduce a CNGRC motif (NGR-hPK5) and was subsequently expressed in yeast. The biological activity of NGR-hPK5 was assessed and compared with that of wild-type hPK5, in vitro and in vivo. NGR-hPK5 showed more potent antiangiogenic activity than wild-type hPK5: the former had a stronger inhibitory effect on proliferation, migration and cord formation of vascular endothelial cells, and produced a stronger antiangiogenic response in the CAM assay. To evaluate tumor-targeting ability, both wild-type hPK5 and NGR-hPK5 were 99mTc-labeled to track their biodistribution in the in vivo tumor model. By planar imaging and biodistribution analyses of major organs, NGR-hPK5 was found to localize to tumor tissues at an approximately 3-fold higher level than wild-type hPK5. Finally, the effects of wild-type hPK5 and NGR-modified hPK5 on tumor growth were investigated in two tumor model systems. NGR modification improved tumor localization and, as a consequence, effectively inhibited the growth of mouse Lewis lung carcinoma (LLC) and human colorectal adenocarcinoma (Colo 205) cells in tumor-bearing mice. Conclusions/Significance These studies indicated that the addition of an APN-targeting NGR peptide sequence could improve the ability of hPK5 to inhibit angiogenesis and tumor growth. PMID:22590653

  11. Structural motif screening reveals a novel, conserved carbohydrate-binding surface in the pathogenesis-related protein PR-5d

    PubMed Central

    2010-01-01

    Background Aromatic amino acids play a critical role in protein-glycan interactions. Clusters of surface aromatic residues and their features may therefore be useful in distinguishing glycan-binding sites as well as predicting novel glycan-binding proteins. In this work, a structural bioinformatics approach was used to screen the Protein Data Bank (PDB) for coplanar aromatic motifs similar to those found in known glycan-binding proteins. Results The proteins identified in the screen were significantly associated with carbohydrate-related functions according to gene ontology (GO) enrichment analysis, and predicted motifs were found frequently within novel folds and glycan-binding sites not included in the training set. In addition to numerous binding sites predicted in structural genomics proteins of unknown function, one novel prediction was a surface motif (W34/W36/W192) in the tobacco pathogenesis-related protein, PR-5d. Phylogenetic analysis revealed that the surface motif is exclusive to a subfamily of PR-5 proteins from the Solanaceae family of plants, and is completely absent from more distant homologs. To confirm PR-5d's insoluble-polysaccharide binding activity, a cellulose-pulldown assay of tobacco proteins was performed and PR-5d was identified in the cellulose-binding fraction by mass spectrometry. Conclusions Based on the combined results, we propose that the putative binding site in PR-5d may be an evolutionary adaptation of Solanaceae plants including potato, tomato, and tobacco, towards defense against cellulose-containing pathogens such as species of the deadly oomycete genus, Phytophthora. More generally, the results demonstrate that coplanar aromatic clusters on protein surfaces are a structural signature of glycan-binding proteins, and can be used to computationally predict novel glycan-binding proteins from 3D structure. PMID:20678238
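    The abstract describes the screen for coplanar aromatic motifs only at a high level. As a rough illustration of the kind of geometric test involved, the Python sketch below calls two aromatic rings "roughly coplanar" when their ring normals are nearly parallel and the second ring's centroid lies close to the first ring's plane. Assumptions, not taken from the paper: rings are supplied as atom coordinates listed in ring order (for Trp, only the six-membered ring), and the angle, out-of-plane offset and distance cutoffs are hypothetical placeholders.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def ring_centroid(atoms: Sequence[Vec]) -> Vec:
    n = len(atoms)
    return (sum(a[0] for a in atoms) / n,
            sum(a[1] for a in atoms) / n,
            sum(a[2] for a in atoms) / n)


def ring_normal(atoms: Sequence[Vec]) -> Vec:
    """Unit normal of a (near-)planar ring via Newell's method; atoms must be in ring order."""
    nx = ny = nz = 0.0
    for i, (x1, y1, z1) in enumerate(atoms):
        x2, y2, z2 = atoms[(i + 1) % len(atoms)]
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)


def roughly_coplanar(ring_a: Sequence[Vec], ring_b: Sequence[Vec],
                     max_angle_deg: float = 30.0,   # hypothetical cutoff
                     max_offset: float = 1.5,       # hypothetical cutoff, angstroms
                     max_dist: float = 12.0) -> bool:  # hypothetical cutoff, angstroms
    na, nb = ring_normal(ring_a), ring_normal(ring_b)
    # Normals can point either way, so compare |cos(angle)|; clamp against rounding.
    angle = math.degrees(math.acos(min(1.0, abs(_dot(na, nb)))))
    d = _sub(ring_centroid(ring_b), ring_centroid(ring_a))
    offset = abs(_dot(na, d))     # height of ring B's centroid above ring A's plane
    dist = math.sqrt(_dot(d, d))  # centroid-to-centroid distance
    return angle <= max_angle_deg and offset <= max_offset and dist <= max_dist
```

    The out-of-plane offset is what separates side-by-side rings (a flat surface patch) from face-to-face stacked rings, which have parallel normals as well.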

  12. The DNA-binding domain of BenM reveals the structural basis for the recognition of a T-N11-A sequence motif by LysR-type transcriptional regulators.

    PubMed

    Alanazi, Amer M; Neidle, Ellen L; Momany, Cory

    2013-10-01

    LysR-type transcriptional regulators (LTTRs) play critical roles in metabolism and constitute the largest family of bacterial regulators. To understand protein-DNA interactions, atomic structures of the DNA-binding domain and linker-helix regions of a prototypical LTTR, BenM, were determined by X-ray crystallography. BenM structures with and without bound DNA reveal a set of highly conserved amino acids that interact directly with DNA bases. At the N-terminal end of the recognition helix (α3) of a winged-helix-turn-helix DNA-binding motif, several residues create hydrophobic pockets (Pro30, Pro31 and Ser33). These pockets interact with the methyl groups of two thymines in the DNA-recognition motif and its complementary strand, T-N11-A. This motif usually includes some dyad symmetry, as exemplified by a sequence that binds two subunits of a BenM tetramer (ATAC-N7-GTAT). Gln29 forms hydrogen bonds to adenine in the first position of the recognition half-site (ATAC). Another hydrophobic pocket defined by Ala28, Pro30 and Pro31 interacts with the methyl group of thymine, complementary to the base at the third position of the half-site. Arg34 interacts with the complementary base of the 3' position. Arg53, in the wing, provides AT-tract recognition in the minor groove. For DNA recognition, LTTRs use highly conserved interactions between amino acids and nucleotide bases as well as numerous less-conserved secondary interactions.
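    To make the T-N11-A notation concrete, the Python sketch below scans a DNA string for a T, any 11 bases, then an A, and separately for the ATAC-N7-GTAT example quoted in the abstract. It uses only the standard `re` module; a lookahead is used so overlapping sites are all reported. The demo sequence is hypothetical. Because the reverse complement of T-N11-A is again T-N11-A, scanning one strand finds sites on both.

```python
import re

# T-N11-A: a T, any 11 bases, then an A (13 bp in total).
# The lookahead lets overlapping occurrences be reported.
T_N11_A = re.compile(r"(?=(T[ACGT]{11}A))")
# The dyad-symmetric BenM-binding example from the abstract: ATAC-N7-GTAT.
ATAC_N7_GTAT = re.compile(r"(?=(ATAC[ACGT]{7}GTAT))")


def find_sites(pattern, seq):
    """Return (0-based start, matched site) for every, possibly overlapping, match."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    demo = "GGATACGCCTTGCGTATCC"  # hypothetical sequence, not from the paper
    print(find_sites(ATAC_N7_GTAT, demo))
    print(find_sites(T_N11_A, demo))
```

    The ATAC-N7-GTAT site contains a T-N11-A site inside it (its second T and second-to-last A), which is why the two scans report overlapping coordinates.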

  13. Discovering conserved insect microRNAs from expressed sequence tags.

    PubMed

    Jia, Qidong; Lin, Kejian; Liang, Jingdong; Yu, Lun; Li, Fei

    2010-12-01

    MicroRNAs (miRNA) participate in regulating diverse biological pathways by translational repression in animals. They have attracted increasing attention recently. However, little work has been done on the miRNA genes in agriculturally important pests. Because the transcripts of most miRNA genes are the products of RNA polymerase II, pri-miRNAs have a poly(A) tail and appear in expressed sequence tags (ESTs). We developed a computational pipeline to identify miRNA genes from insect ESTs. First, 980,697 ESTs from 63 insects were collected and used to search the nr database. The ESTs which did not share significant similarity with any known protein-coding genes were treated as non-coding ESTs. Next, known mature miRNAs were aligned with the non-coding ESTs. The ESTs which contained the sequence of a mature miRNA were treated as candidate ESTs. Finally, putative precursors were extracted from the regions flanking the mature miRNA in candidate ESTs and evaluated by the Triplet-SVM algorithm. As a result, 86 miRNAs from 30 insect species were found based on a strict criterion, while 330 miRNAs from 51 species were found based on a loose criterion. Evolutionary analysis indicated that mir-467, mir-297 and mir-466 were the most highly conserved miRNA families in insects. To confirm the reliability of putative insect miRNAs, the expression profiles of nine predicted miRNAs in Locusta migratoria were investigated. Eight miRNAs were successfully detected by RT-PCR. Most miRNAs were expressed ubiquitously in all examined tissues and developmental stages, whereas Lmi-mir-509 was specifically expressed in the thorax of the 2nd, 4th and 5th instars and adult locust. In all, our work reported an efficient computational strategy for predicting miRNA genes from insect ESTs and presented tens of miRNAs in diverse insect species which are expected to participate in many important physiological processes.
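    The candidate-EST and precursor-extraction steps of the pipeline can be sketched in a few lines of Python. This is a simplified illustration, not the authors' code: it uses exact string matching instead of a sequence aligner, only the given strand, and a hypothetical flank length; the Triplet-SVM hairpin evaluation is not reproduced.

```python
from typing import Dict, List, Tuple


def to_dna(seq: str) -> str:
    """Mature miRNAs are usually written as RNA; map U to T to compare with EST (DNA) reads."""
    return seq.upper().replace("U", "T")


def candidate_precursors(
    non_coding_ests: Dict[str, str],
    mature_mirnas: Dict[str, str],
    flank: int = 60,  # hypothetical window on each side; not a value from the paper
) -> List[Tuple[str, str, str]]:
    """Return (est_id, mirna_id, putative_precursor) for each EST containing a known mature miRNA.

    Only the first exact match on the given strand is used; a fuller pipeline would
    also scan the reverse complement and tolerate mismatches.
    """
    hits = []
    for est_id, est in non_coding_ests.items():
        est = to_dna(est)
        for mir_id, mature in mature_mirnas.items():
            start = est.find(to_dna(mature))
            if start == -1:
                continue  # not a candidate EST for this miRNA
            end = start + len(mature)
            # Extract the region flanking the mature miRNA, clipped at the EST ends.
            precursor = est[max(0, start - flank):min(len(est), end + flank)]
            hits.append((est_id, mir_id, precursor))
    return hits
```

    Each returned precursor would then go to a hairpin classifier (Triplet-SVM in the study) to decide whether it looks like a genuine pre-miRNA.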

  14. A conserved motif in the ITK PH-domain is required for phosphoinositide binding and TCR signaling but dispensable for adaptor protein interactions.

    PubMed

    Hirve, Nupura; Levytskyy, Roman M; Rigaud, Stephanie; Guimond, David M; Zal, Tomasz; Sauer, Karsten; Tsoukas, Constantine D

    2012-01-01

    Binding of the membrane phospholipid phosphatidylinositol 3,4,5-trisphosphate (PIP(3)) to the Pleckstrin Homology (PH) domain of the Tec family protein tyrosine kinase, Inducible T cell Kinase (ITK), is critical for the recruitment of the kinase to the plasma membrane and its co-localization with the TCR-CD3 molecular complex. Three aromatic residues, termed the FYF motif, located in the inner walls of the phospholipid-binding pocket of the ITK PH domain, are conserved in the PH domains of all Tec kinases, but not in other PH-domain containing proteins, suggesting an important function of the FYF motif in the Tec kinase family. However, the biological significance of the FYF amino acid motif in the ITK-PH domain is unknown. To elucidate it, we have tested the effects of a FYF triple mutant (F26S, Y90F, F92S), henceforth termed FYF-ITK mutant, on ITK function. We found that FYF triple mutation inhibits the TCR-induced production of IL-4 by impairing ITK binding to PIP(3), reducing ITK membrane recruitment, inducing conformational changes at the T cell-APC contact site, and compromising phosphorylation of ITK and subsequent phosphorylation of PLCγ(1). Interestingly, however, the FYF motif is dispensable for the interaction of ITK with two of its signaling partners, SLP-76 and LAT. Thus, the FYF mutation uncouples PIP(3)-mediated ITK membrane recruitment from the interactions of the kinase with key components of the TCR signalosome and abrogates ITK function in T cells.

  15. A Conserved Motif in the ITK PH-Domain Is Required for Phosphoinositide Binding and TCR Signaling but Dispensable for Adaptor Protein Interactions

    PubMed Central

    Rigaud, Stephanie; Guimond, David M.; Zal, Tomasz; Sauer, Karsten; Tsoukas, Constantine D.

    2012-01-01

    Binding of the membrane phospholipid phosphatidylinositol 3,4,5-trisphosphate (PIP3) to the Pleckstrin Homology (PH) domain of the Tec family protein tyrosine kinase, Inducible T cell Kinase (ITK), is critical for the recruitment of the kinase to the plasma membrane and its co-localization with the TCR-CD3 molecular complex. Three aromatic residues, termed the FYF motif, located in the inner walls of the phospholipid-binding pocket of the ITK PH domain, are conserved in the PH domains of all Tec kinases, but not in other PH-domain containing proteins, suggesting an important function of the FYF motif in the Tec kinase family. However, the biological significance of the FYF amino acid motif in the ITK-PH domain is unknown. To elucidate it, we have tested the effects of a FYF triple mutant (F26S, Y90F, F92S), henceforth termed FYF-ITK mutant, on ITK function. We found that FYF triple mutation inhibits the TCR-induced production of IL-4 by impairing ITK binding to PIP3, reducing ITK membrane recruitment, inducing conformational changes at the T cell-APC contact site, and compromising phosphorylation of ITK and subsequent phosphorylation of PLCγ1. Interestingly, however, the FYF motif is dispensable for the interaction of ITK with two of its signaling partners, SLP-76 and LAT. Thus, the FYF mutation uncouples PIP3-mediated ITK membrane recruitment from the interactions of the kinase with key components of the TCR signalosome and abrogates ITK function in T cells. PMID:23028816

  16. Localization and trafficking of an isoform of the AtPRA1 family to the Golgi apparatus depend on both N- and C-terminal sequence motifs.

    PubMed

    Jung, Chan Jin; Lee, Myoung Hui; Min, Myung Ki; Hwang, Inhwan

    2011-02-01

    Prenylated Rab acceptors (PRAs) bind to prenylated Rab proteins and possibly aid in targeting Rabs to their respective compartments. In Arabidopsis, 19 isoforms of PRA1 have been identified and, depending upon the isoforms, they localize to the endoplasmic reticulum (ER), Golgi apparatus and endosomes. Here, we investigated the localization and trafficking of AtPRA1.B6, an isoform of the Arabidopsis PRA1 family. In colocalization experiments with various organellar markers, AtPRA1.B6 tagged with hemagglutinin (HA) at the N-terminus localized to the Golgi apparatus in protoplasts and transgenic plants. The valine residue at the C-terminal end and an EEE motif in the C-terminal cytoplasmic domain were critical for anterograde trafficking from the ER to the Golgi apparatus. The N-terminal region contained a sequence motif for retention of AtPRA1.B6 at the Golgi apparatus. In addition, anterograde trafficking of AtPRA1.B6 from the ER to the Golgi apparatus was highly sensitive to the HA:AtPRA1.B6 level. The region that contains the sequence motif for Golgi retention also conferred the abundance-dependent trafficking inhibition. On the basis of these results, we propose that AtPRA1.B6 localizes to the Golgi apparatus and its ER-to-Golgi trafficking and localization to the Golgi apparatus are regulated by multiple sequence motifs in both the C- and N-terminal cytoplasmic domains.

  17. Automatic annotation of protein motif function with Gene Ontology terms

    PubMed Central

    Lu, Xinghua; Zhai, Chengxiang; Gopalakrishnan, Vanathi; Buchanan, Bruce G

    2004-01-01

    Background Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole-genome level. However, a much needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary, and these annotations serve well as a knowledge base. Results This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification and an information retrieval problem, where PROSITE motifs are used as samples for model training and functional prediction. The mutual information of a motif and GO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria. Conclusions In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical results demonstrated that the methods have great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs. PMID:15345032
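    The abstract names mutual information as the key feature but does not give its exact formulation. One standard reading, sketched below in Python as an assumption, treats "sequence matches the motif" and "sequence is annotated with the GO term" as two binary variables and computes their mutual information (in bits) from a 2x2 table of sequence counts.

```python
import math


def mutual_information(n11: int, n10: int, n01: int, n00: int) -> float:
    """Mutual information (bits) between two binary indicators over N sequences.

    n11: sequences that match the motif AND carry the GO term
    n10: motif only;  n01: GO term only;  n00: neither
    """
    n = n11 + n10 + n01 + n00
    if n == 0:
        return 0.0
    joint = {(1, 1): n11, (1, 0): n10, (0, 1): n01, (0, 0): n00}
    p_motif = {1: (n11 + n10) / n, 0: (n01 + n00) / n}
    p_term = {1: (n11 + n01) / n, 0: (n10 + n00) / n}
    mi = 0.0
    for (m, g), count in joint.items():
        if count == 0:
            continue  # the 0 * log(0) term is taken as 0
        p_mg = count / n
        mi += p_mg * math.log2(p_mg / (p_motif[m] * p_term[g]))
    return mi
```

    A perfectly co-occurring motif/term pair scores 1 bit, an independent pair scores 0. In the approach described above, a value like this would be one input to the logistic regression alongside other frequency-based features.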

  18. Rare k-mer DNA: Identification of sequence motifs and prediction of CpG island and promoter.

    PubMed

    Mohamed Hashim, Ezzeddin Kamil; Abdullah, Rosni

    2015-12-21

    Empirical analysis of k-mer DNA has proven to be an effective tool for finding unique patterns in DNA sequences, which can lead to the discovery of potential sequence motifs. In an extensive study of empirical k-mer DNA in hundreds of organisms, the researchers found that unique multi-modal k-mer spectra occur only in the genomes of organisms from the tetrapod clade, which includes all mammals. The multi-modality is caused by the formation of the two lowest modes, and the k-mers under them are referred to as rare k-mers. The suppression of the two lowest modes (or the rare k-mers) can be attributed to the CG dinucleotides they contain. Apart from that, the rare k-mers are selectively distributed in certain genomic features: CpG islands (CGI), promoters, 5' UTRs, and exons. We correlated the rare k-mers with hundreds of annotated features using several bioinformatic tools, performed further intrinsic rare k-mer analyses within the correlated features, and modeled the elucidated rare k-mer clustering feature into a classifier to predict the correlated CGI and promoter features. Our correlation results show that rare k-mers are highly associated with several annotated features of CGI, promoter, 5' UTR, and open chromatin regions. Our intrinsic results show that rare k-mers have several unique topological, compositional, and clustering properties in CGI and promoter features. Finally, the performance of our RWC (rare-word clustering) method in predicting the CGI and promoter features ranked among the top three in eight of the CGI and promoter evaluations, among eight of the benchmarked datasets. Crown Copyright © 2015. Published by Elsevier Ltd. All rights reserved.
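    A minimal Python sketch of the k-mer bookkeeping behind this kind of analysis: count every overlapping k-mer (keeping the zero counts, since rare k-mers may be absent altogether), take the lowest-count k-mers, and measure how many contain a CG dinucleotide. The paper defines rare k-mers by the two lowest modes of the spectrum; the fixed-fraction cutoff used here is a hypothetical stand-in for that definition.

```python
from itertools import product
from typing import Dict, Iterable, List


def kmer_counts(seq: str, k: int) -> Dict[str, int]:
    """Count overlapping k-mers; every one of the 4**k possible k-mers gets a key (zeros kept)."""
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows containing N or other symbols are skipped
            counts[kmer] += 1
    return counts


def rare_kmers(counts: Dict[str, int], fraction: float = 0.05) -> List[str]:
    """Lowest-count k-mers (ties broken alphabetically).

    The fixed `fraction` is a hypothetical cutoff, not the paper's mode-based definition.
    """
    ranked = sorted(counts, key=lambda km: (counts[km], km))
    return ranked[:max(1, int(len(ranked) * fraction))]


def cg_fraction(kmers: Iterable[str]) -> float:
    """Share of k-mers containing at least one CG dinucleotide."""
    kmers = list(kmers)
    return sum("CG" in km for km in kmers) / len(kmers) if kmers else 0.0
```

    On genome-scale input, comparing `cg_fraction(rare_kmers(counts))` with `cg_fraction(counts)` is one simple way to check for the CG enrichment among rare k-mers described above.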

  19. Characterization of the dead ringer gene identifies a novel, highly conserved family of sequence-specific DNA-binding proteins.

    PubMed Central

    Gregory, S L; Kortschak, R D; Kalionis, B; Saint, R

    1996-01-01

    We reported the identification of a new family of DNA-binding proteins from our characterization of the dead ringer (dri) gene of Drosophila melanogaster. We show that dri encodes a nuclear protein that contains a sequence-specific DNA-binding domain that bears no similarity to known DNA-binding domains. A number of proteins were found to contain sequences homologous to this domain. Other proteins containing the conserved motif include yeast SWI1, two human retinoblastoma binding proteins, and other mammalian regulatory proteins. A mouse B-cell-specific regulator exhibits 75% identity with DRI over the 137-amino-acid DNA-binding domains of these proteins, indicating a high degree of conservation of this domain. Gel retardation and optimal binding site screens revealed that the in vitro sequence specificity of DRI is strikingly similar to that of many homeodomain proteins, although the sequence and predicted secondary structure do not resemble a homeodomain. The early general expression of dri and the similarity of DRI and homeodomain in vitro DNA-binding specificity compound the problem of understanding the in vivo specificity of action of these proteins. Maternally derived dri product is found throughout the embryo until germ band extension, when dri is expressed in a developmentally regulated set of tissues, including salivary gland ducts, parts of the gut, and a subset of neural cells. The discovery of this new, conserved DNA-binding domain offers an explanation for the regulatory activity of several important members of this class and predicts significant regulatory roles for the others. PMID:8622680

  20. Discriminative Motif Finding for Predicting Protein Subcellular Localization

    PubMed Central

    Lin, Tien-ho; Murphy, Robert F.; Bar-Joseph, Ziv

    2010-01-01

    Many methods have been described to predict the subcellular location of proteins from sequence information. However, most of these methods either rely on global sequence properties or use a set of known protein targeting motifs to predict protein localization. Here we develop and test a novel method that identifies potential targeting motifs using a discriminative approach based on hidden Markov models (discriminative HMMs). These models search for motifs that are present in a compartment but absent in other, nearby, compartments by utilizing a hierarchical structure that mimics the protein sorting mechanism. We show that both discriminative motif finding and the hierarchical structure improve localization prediction on a benchmark dataset of yeast proteins. The motifs identified can be mapped to known targeting motifs and they are more conserved than the average protein sequence. Using our motif-based predictions we can identify potential annotation errors in public databases for the location of some of the proteins. A software implementation and the dataset described in this paper are available from http://murphylab.web.cmu.edu/software/2009_TCBB_motif/ PMID:21233524
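    The core idea of discriminative motif finding (favor patterns present in one compartment's proteins but absent from other compartments) can be illustrated with a much simpler technique than the paper's discriminative HMMs. The Python sketch below, offered only as an illustration, ranks k-mers by the log-odds of occurring in target versus background sequences; the pseudocounts, k and the toy sequences are assumptions.

```python
import math
from typing import Dict, List, Set, Tuple


def kmer_set(seq: str, k: int) -> Set[str]:
    """Distinct k-mers in one sequence (presence, not abundance)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def presence_counts(seqs: List[str], k: int) -> Dict[str, int]:
    """Number of sequences in which each k-mer occurs at least once."""
    counts: Dict[str, int] = {}
    for s in seqs:
        for km in kmer_set(s, k):
            counts[km] = counts.get(km, 0) + 1
    return counts


def discriminative_kmers(target: List[str], background: List[str],
                         k: int = 3, top: int = 5) -> List[Tuple[str, float]]:
    """Rank target k-mers by log2 odds of occurring in target vs background sequences."""
    t, b = presence_counts(target, k), presence_counts(background, k)
    scores = {}
    for km, n_t in t.items():
        # +1 / +2 pseudocounts keep the ratio finite for k-mers absent from the background.
        p_t = (n_t + 1) / (len(target) + 2)
        p_b = (b.get(km, 0) + 1) / (len(background) + 2)
        scores[km] = math.log2(p_t / p_b)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
```

    Unlike this flat comparison, the method in the abstract arranges such comparisons hierarchically, contrasting each compartment only with its neighbors in the sorting tree.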

  1. Classification and assessment tools for structural motif discovery algorithms.

    PubMed

    Badr, Ghada; Al-Turaiki, Isra; Mathkour, Hassan

    2013-01-01

    Motif discovery is the problem of finding recurring patterns in biological data. Patterns can be sequential, mainly when discovered in DNA sequences. They can also be structural (e.g. when discovering RNA motifs). Finding common structural patterns helps to gain a better understanding of the mechanism of action (e.g. post-transcriptional regulation). Unlike DNA motifs, which are sequentially conserved, RNA motifs exhibit conservation in structure, which may be common even if the sequences are different. Over the past few years, hundreds of algorithms have been developed to solve the sequential motif discovery problem, while less work has been done for the structural case. In this paper, we survey, classify, and compare different algorithms that solve the structural motif discovery problem, where the underlying sequences may be different. We highlight their strengths and weaknesses. We start by proposing a benchmark dataset and a measurement tool that can be used to evaluate different motif discovery approaches. Then, we proceed by proposing our experimental setup. Finally, results are obtained using the proposed benchmark to compare available tools. To the best of our knowledge, this is the first attempt to compare tools solely designed for structural motif discovery. Results show that the accuracy of discovered motifs is relatively low. The results also suggest a complementary behavior among tools where some tools perform well on simple structures, while other tools are better for complex structures. We have classified and evaluated the performance of available structural motif discovery tools. In addition, we have proposed a benchmark dataset with tools that can be used to evaluate newly developed tools.

  2. The value of position-specific priors in motif discovery using MEME

    PubMed Central

    2010-01-01

    Background Position-specific priors have been shown to be a flexible and elegant way to extend the power of Gibbs sampler-based motif discovery algorithms. Information of many types (including sequence conservation, nucleosome positioning, and negative examples) can be converted into a prior over the location of motif sites, which then guides the sequence motif discovery algorithm. This approach has been shown to confer many of the benefits of conservation-based and discriminative motif discovery approaches on Gibbs sampler-based motif discovery methods, but has not previously been studied with methods based on expectation maximization (EM). Results We extend the popular EM-based MEME algorithm to utilize position-specific priors and demonstrate their effectiveness for discovering transcription factor (TF) motifs in yeast and mouse DNA sequences. Utilizing a discriminative, conservation-based prior dramatically improves MEME's ability to discover motifs in 156 yeast TF ChIP-chip datasets, more than doubling the number of datasets where it finds the correct motif. On these datasets, MEME using the prior has a higher success rate than eight other conservation-based motif discovery approaches. We also show that the same type of prior improves the accuracy of motifs discovered by MEME in mouse TF ChIP-seq data, and that the motifs tend to be of slightly higher quality than those found by a Gibbs sampling algorithm using the same prior. Conclusions We conclude that using position-specific priors can substantially increase the power of EM-based motif discovery algorithms such as MEME. PMID:20380693

  3. The value of position-specific priors in motif discovery using MEME.

    PubMed

    Bailey, Timothy L; Bodén, Mikael; Whitington, Tom; Machanick, Philip

    2010-04-09

    Position-specific priors have been shown to be a flexible and elegant way to extend the power of Gibbs sampler-based motif discovery algorithms. Information of many types (including sequence conservation, nucleosome positioning, and negative examples) can be converted into a prior over the location of motif sites, which then guides the sequence motif discovery algorithm. This approach has been shown to confer many of the benefits of conservation-based and discriminative motif discovery approaches on Gibbs sampler-based motif discovery methods, but has not previously been studied with methods based on expectation maximization (EM). We extend the popular EM-based MEME algorithm to utilize position-specific priors and demonstrate their effectiveness for discovering transcription factor (TF) motifs in yeast and mouse DNA sequences. Utilizing a discriminative, conservation-based prior dramatically improves MEME's ability to discover motifs in 156 yeast TF ChIP-chip datasets, more than doubling the number of datasets where it finds the correct motif. On these datasets, MEME using the prior has a higher success rate than eight other conservation-based motif discovery approaches. We also show that the same type of prior improves the accuracy of motifs discovered by MEME in mouse TF ChIP-seq data, and that the motifs tend to be of slightly higher quality than those found by a Gibbs sampling algorithm using the same prior. We conclude that using position-specific priors can substantially increase the power of EM-based motif discovery algorithms such as MEME.
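    How a position-specific prior "guides" site finding can be seen in a simplified E-step. The Python sketch below assumes one motif occurrence per sequence (an OOPS-style simplification, not MEME's full model), a zero-order background, and an unnormalized prior weight per start position; the posterior for each start is proportional to the prior times the motif-versus-background likelihood ratio.

```python
from typing import Dict, List, Sequence


def site_posteriors(seq: str,
                    pwm: Sequence[Dict[str, float]],
                    background: Dict[str, float],
                    prior: Sequence[float]) -> List[float]:
    """Posterior probability that the single motif site starts at each position.

    pwm: one dict per motif column mapping base -> probability.
    background: zero-order base probabilities.
    prior: non-negative weight per possible start (e.g. derived from conservation).
    """
    w = len(pwm)
    n_starts = len(seq) - w + 1
    scores = []
    for i in range(n_starts):
        ratio = 1.0
        for j, column in enumerate(pwm):
            base = seq[i + j]
            ratio *= column[base] / background[base]  # motif vs background likelihood
        scores.append(prior[i] * ratio)
    total = sum(scores)
    return [s / total for s in scores] if total > 0 else [0.0] * n_starts
```

    With a uniform prior, two equally good matches split the posterior evenly; a prior that favors one of them (for instance because it lies in a conserved region) shifts the posterior, and hence the motif update, toward that site.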

  4. The orphan G protein-coupled receptor GPR139 is activated by the peptides: Adrenocorticotropic hormone (ACTH), α-, and β-melanocyte stimulating hormone (α-MSH, and β-MSH), and the conserved core motif HFRW.

    PubMed

    Nøhr, Anne Cathrine; Shehata, Mohamed A; Hauser, Alexander S; Isberg, Vignir; Mokrosinski, Jacek; Andersen, Kirsten B; Farooqi, I Sadaf; Pedersen, Daniel Sejer; Gloriam, David E; Bräuner-Osborne, Hans

    2017-01-01

    GPR139 is an orphan G protein-coupled receptor that is expressed primarily in the brain. Not much is known regarding the function of GPR139. Recently we have shown that GPR139 is activated by the amino acids l-tryptophan and l-phenylalanine (EC50 values of 220 μM and 320 μM, respectively), as well as di-peptides comprised of aromatic amino acids. This led us to hypothesize that GPR139 may be activated by peptides. Sequence alignment of the binding cavities of all class A GPCRs revealed that the binding pocket of the melanocortin 4 receptor is similar to that of GPR139. Based on the chemogenomics principle "similar targets bind similar ligands", we tested three known endogenous melanocortin 4 receptor agonists, adrenocorticotropic hormone (ACTH) and α- and β-melanocyte stimulating hormone (α-MSH and β-MSH), on CHO-K1 cells stably expressing the human GPR139 in a Fluo-4 Ca(2+)-assay. All three peptides, as well as their conserved core motif HFRW, were found to activate GPR139 in the low micromolar range. Moreover, we found that peptides consisting of nine or ten N-terminal residues of α-MSH activate GPR139 in the submicromolar range. α-MSH1-9 was found to correspond to the product of a predicted cleavage site in the pre-pro-protein pro-opiomelanocortin (POMC). Our results demonstrate that GPR139 is a peptide receptor, activated by ACTH, α-MSH, β-MSH, the conserved core motif HFRW as well as a potential endogenous peptide α-MSH1-9. Further studies are needed to determine the functional relevance of GPR139 mediated signaling by these peptides.

  5. Sequence Conservation, Radial Distance and Packing Density in Spherical Viral Capsids

    PubMed Central

    Lee, Chi-Wen; Huang, Tsun-Tsao; Shih, Chung-Shiuan; Hwang, Jenn-Kang

    2015-01-01

    The conservation level of a residue is a useful measure of the importance of that residue in protein structure and function. Much information about sequence conservation comes from aligning homologous sequences. Profiles showing the variation of the conservation level along the sequence are usually interpreted in evolutionary terms and dictated by site similarities of a proper set of homologous sequences. Here, we report that, for viral icosahedral capsids, the sequence conservation profile can be determined by variations in the distances between residues and the centroid of the capsid (with a direct inverse proportionality between the conservation level and the centroid distance) as well as by the spatial variations in local packing density. Examining both the centroid and the packing density models against a dataset of 51 crystal structures of nonhomologous icosahedral capsids, we found that many global patterns and minor features derived from the viral structures are consistent with those present in the sequence conservation profiles. The quantitative link between the level of conservation and structural features like centroid-distance or packing density allows us to look at residue conservation from a structural viewpoint as well as from an evolutionary viewpoint. PMID:26132081
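    The two structural quantities compared with conservation can be computed from coordinates alone. The Python sketch below assumes one representative coordinate per residue for the fully assembled capsid; the contact-count proxy for packing density and its 10 Å radius are hypothetical choices, not necessarily the paper's definition. A Pearson correlation then relates either quantity to a per-residue conservation score, where a negative correlation with centroid distance would be consistent with the inverse relationship reported above.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def centroid_distances(coords: Sequence[Vec]) -> List[float]:
    """Distance of each residue from the centroid of all residues in the capsid."""
    n = len(coords)
    c = tuple(sum(p[d] for p in coords) / n for d in range(3))
    return [math.dist(p, c) for p in coords]


def contact_counts(coords: Sequence[Vec], radius: float = 10.0) -> List[int]:
    """Packing-density proxy: neighbours within `radius` angstroms (hypothetical cutoff).

    O(n^2); a real capsid with tens of thousands of residues would need a spatial index.
    """
    return [sum(1 for j, q in enumerate(coords) if j != i and math.dist(p, q) <= radius)
            for i, p in enumerate(coords)]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```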

  6. Identification of a novel calcium binding motif based on the detection of sequence insertions in the animal peroxidase domain of bacterial proteins.

    PubMed

    Santamaría-Hernando, Saray; Krell, Tino; Ramos-González, María-Isabel

    2012-01-01

    Proteins of the animal heme peroxidase (ANP) superfamily differ greatly in size since they have either one or two catalytic domains that match profile PS50292. The ORF PP_2561 of Pseudomonas putida KT2440, which we have called PepA, encodes a two-domain ANP. The alignment of these domains with those of PepA homologues revealed a variable number of insertions with the consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D. This motif has also been detected in the structure of pseudopilin (pdb 3G20), where it was found to be involved in Ca(2+) coordination, although a sequence analysis did not reveal the presence of any known calcium binding motifs in this protein. Isothermal titration calorimetry revealed that a peptide containing this consensus motif specifically bound calcium ions with affinities ranging between 33-79 µM depending on the pH. Microcalorimetric titrations of the purified N-terminal ANP-like domain of PepA revealed Ca(2+) binding with a K(D) of 12 µM and stoichiometry of 1.25 calcium ions per protein monomer. This domain exhibited peroxidase activity after its reconstitution with heme. These data led to the definition of a novel calcium binding motif that we have termed PERCAL and which was abundantly present in animal peroxidase-like domains of bacterial proteins. Bacterial heme peroxidases thus possess two different types of calcium binding motifs, namely PERCAL and the related hemolysin type calcium binding motif, with the latter being located outside the catalytic domains and in their C-terminal end. A phylogenetic tree of ANP-like catalytic domains of bacterial proteins with PERCAL motifs, including single domain peroxidases, was divided into two major clusters, representing domains with and without PERCAL motif containing insertions. We have verified that the recently reported classification of bacterial heme peroxidases in two families (cd09819 and cd09821) is unrelated to these insertions. Sequences matching PERCAL were detected in all kingdoms of life.
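    The PERCAL consensus is written in PROSITE-style pattern notation. The Python sketch below converts a simple subset of that notation (single residues, `x`, bracketed alternatives, and repeat counts such as `x(2)` or `x(2,4)`) into a regular expression and scans a protein sequence for it. Exclusions in braces and terminal anchors are deliberately not handled, and the test protein sequence is hypothetical.

```python
import re
from typing import List, Tuple


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern (e.g. 'G-x-D-G') to a Python regex.

    Supported: single residues, 'x' (any residue), '[GN]'-style alternatives,
    and repeat counts like 'x(2)' or 'x(2,4)'. '{...}' exclusions and '<'/'>'
    anchors are not supported.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        token = "." if core == "x" else core
        if lo is not None:
            token += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(token)
    return "".join(parts)


PERCAL = "G-x-D-G-x-x-[GN]-[TN]-x-D-D"


def find_motif(pattern: str, protein: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched segment) for every, possibly overlapping, match."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(protein)]
```

    For PERCAL the converted expression is `G.DG..[GN][TN].DD`, an 11-residue window.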

  7. Mutation of the Conserved Calcium-Binding Motif in Neisseria gonorrhoeae PilC1 Impacts Adhesion but Not Piliation

    PubMed Central

    Cheng, Yuan; Johnson, Michael D. L.; Burillo-Kirch, Christine; Mocny, Jeffrey C.; Anderson, James E.; Garrett, Christopher K.; Redinbo, Matthew R.

    2013-01-01

    Neisseria gonorrhoeae PilC1 is a member of the PilC family of type IV pilus-associated adhesins found in Neisseria species and other type IV pilus-producing genera. Previously, a calcium-binding domain was described in the C-terminal domains of PilY1 of Pseudomonas aeruginosa and in PilC1 and PilC2 of Kingella kingae. Genetic analysis of N. gonorrhoeae revealed a similar calcium-binding motif in PilC1. To evaluate the potential significance of this calcium-binding region in N. gonorrhoeae, we produced recombinant full-length PilC1 and a PilC1 C-terminal domain fragment. We show that, while alterations of the calcium-binding motif disrupted the ability of PilC1 to bind calcium, they did not grossly affect the secondary structure of the protein. Furthermore, we demonstrate that both full-length wild-type PilC1 and full-length calcium-binding-deficient PilC1 inhibited gonococcal adherence to cultured human cervical epithelial cells, unlike the truncated PilC1 C-terminal domain. Similar to PilC1 in K. kingae, but in contrast to the calcium-binding mutant of P. aeruginosa PilY1, an equivalent mutation in N. gonorrhoeae PilC1 produced normal amounts of pili. However, the N. gonorrhoeae PilC1 calcium-binding mutant still had partial defects in gonococcal adhesion to ME180 cells and genetic transformation, which are both essential virulence factors in this human pathogen. Thus, we conclude that calcium binding to PilC1 plays a critical role in pilus function in N. gonorrhoeae. PMID:24002068

  8. Localization of Daucus carota NMCP1 to the nuclear periphery: the role of the N-terminal region and an NLS-linked sequence motif, RYNLRR, in the tail domain

    PubMed Central

    Kimura, Yuta; Fujino, Kaien; Ogawa, Kana; Masuda, Kiyoshi

    2014-01-01

    Recent ultrastructural studies revealed that a structure similar to the vertebrate nuclear lamina exists in the nuclei of higher plants. However, plant genomes lack genes for lamins and intermediate-type filament proteins, and this suggests that plant-specific nuclear coiled-coil proteins make up the lamina-like structure in plants. NMCP1 is a protein, first identified in Daucus carota cells, that localizes exclusively to the nuclear periphery in interphase cells. It has a tripartite structure comprised of head, rod, and tail domains, and includes putative nuclear localization signal (NLS) motifs. We identified the functional NLS of DcNMCP1 (carrot NMCP1) and determined the protein regions required for localizing to the nuclear periphery using EGFP-fused constructs transiently expressed in Apium graveolens epidermal cells. Transcription was driven under a CaMV35S promoter, and the genes were introduced into the epidermal cells by a DNA-coated microprojectile delivery system. Of the NLS motifs, KRRRK and RRHK in the tail domain were highly functional for nuclear localization. Addition of the N-terminal 141 amino acids from DcNMCP1 shifted the localization of a region including these NLSs from the entire nucleus to the nuclear periphery. Using this same construct, the replacement of amino acids in RRHK or its preceding sequence, YNL, with alanine residues abolished localization to the nuclear periphery, while replacement of KRRRK did not affect localization. The sequence R/Q/HYNLRR/H, including YNL and the first part of the sequence of RRHK, is evolutionarily conserved in a subclass of NMCP1 sequences from many plant species. These results show that NMCP1 localizes to the nuclear periphery by a combined action of a sequence composed of R/Q/HYNLRR/H, NLS, and the N-terminal region including the head and a portion of the rod domain, suggesting that more than one binding site is implicated in localization of NMCP1. PMID:24616728

  9. Formation and Dissociation of the Interstrand i-Motif by the Sequences d(XnC4Ym) Monitored with Electrospray Ionization Mass Spectrometry

    NASA Astrophysics Data System (ADS)

    Cao, Yanwei; Qin, Yujiao; Bruist, Michael; Gao, Shang; Wang, Bing; Wang, Huixin; Guo, Xinhua

    2015-06-01

    Formation and dissociation of interstrand i-motifs by DNA of sequence d(XnC4Ym) (X and Y represent thymine, adenine, or guanine, and n, m range from 0 to 2) are studied with electrospray ionization mass spectrometry (ESI-MS), circular dichroism (CD), and UV spectrophotometry. The ion complexes detected in the gas phase and the melting temperatures (Tm) obtained in solution show that a non-C base residue located at the 5' end favors formation of the four-stranded structures, with stabilizing ability in the order T > A > G. No such trend is found when a non-C base is located at the 3' end. Detection of penta- and hexa-stranded ions indicates the formation of i-motifs with more than four strands. In addition, the i-motifs seen in our mass spectra are accompanied by single-, double-, and triple-stranded ions, and the trimeric ions were always less abundant during the annealing and heat-induced dissociation of the DNA strands in solution (pH = 4.5). This provides direct evidence of a strand-by-strand formation and dissociation pathway for the interstrand i-motif and indicates that formation of the triple strands is the rate-limiting step. In contrast, the trimeric ions are abundant when the tetramolecular ions are subjected to collision-induced dissociation (CID) in the gas phase, suggesting different dissociation behaviors of the interstrand i-motif in the gas phase and in solution. Furthermore, hysteretic UV absorption melting and cooling curves reveal an irreversible dissociation and association kinetic process of the interstrand i-motif in solution.
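Detecting single- through hexa-stranded species by ESI-MS relies on the mass-to-charge ratio of each multimer ion. The sketch below rests on several assumptions that the abstract does not state: negative-ion mode, fully deprotonated ions [n·M − z·H]^z−, no counter-ion adducts, and a hypothetical 3000 Da strand. It also shows why some assignments are ambiguous. An n-stranded ion at charge z falls at the same m/z as any other species with the same n/z ratio.

```python
PROTON_MASS = 1.007276  # Da


def multimer_mz(monomer_mass: float, n_strands: int, charge: int) -> float:
    """m/z of a deprotonated n-stranded ion [n*M - z*H]^z- (negative-ion mode, no adducts)."""
    if n_strands < 1 or charge < 1:
        raise ValueError("n_strands and charge must be positive")
    return (n_strands * monomer_mass - charge * PROTON_MASS) / charge


# Hypothetical 3000 Da strand: a tetramer at 8- and a monomer at 2- share n/z = 1/2,
# so the two printed values coincide; other charge states (or isotope spacing, ~1/z)
# are needed to tell such species apart.
M = 3000.0
print(multimer_mz(M, 4, 8), multimer_mz(M, 1, 2))
```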

  10. De Novo Regulatory Motif Discovery Identifies Significant Motifs in Promoters of Five Classes of Plant Dehydrin Genes

    PubMed Central

    Zolotarov, Yevgen; Strömvik, Martina

    2015-01-01

    Plants accumulate dehydrins in response to osmotic stresses. Dehydrins are divided into five different classes, which are thought to be regulated in different manners. To better understand differences in transcriptional regulation of the five dehydrin classes, de novo motif discovery was performed on 350 dehydrin promoter sequences from a total of 51 plant genomes. Overrepresented motifs were identified in the promoters of five dehydrin classes. The Kn dehydrin promoters contain motifs linked with meristem specific expression, as well as motifs linked with cold/dehydration and abscisic acid response. KS dehydrin promoters contain a motif with a GATA core. SKn and YnSKn dehydrin promoters contain motifs that match elements connected with cold/dehydration, abscisic acid and light response. YnKn dehydrin promoters contain motifs that match abscisic acid and light response elements, but not cold/dehydration response elements. Conserved promoter motifs are present in the dehydrin classes and across different plant lineages, indicating that dehydrin gene regulation is likely also conserved. PMID:26114291
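"Overrepresented" means that a motif occurs more often in the promoter set than expected from a background. The sketch below is deliberately naive and is not the de novo discovery tools used in the study. It compares k-mer frequencies between foreground and background sequences with pseudocounts, ignores the reverse strand, and computes no significance statistics. The function names and toy sequences are hypothetical.

```python
from collections import Counter


def kmer_counts(seqs: list[str], k: int) -> Counter:
    """Count every k-mer on the given strand across all sequences."""
    counts: Counter = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def kmer_enrichment(fg: list[str], bg: list[str], k: int, pseudo: float = 1.0) -> dict[str, float]:
    """Foreground/background frequency ratio per k-mer, with pseudocounts over all 4**k k-mers."""
    fc, bc = kmer_counts(fg, k), kmer_counts(bg, k)
    f_total = sum(fc.values()) + pseudo * 4**k
    b_total = sum(bc.values()) + pseudo * 4**k
    return {km: ((fc[km] + pseudo) / f_total) / ((bc[km] + pseudo) / b_total)
            for km in set(fc) | set(bc)}


# Hypothetical toy promoters (foreground) and background sequences.
promoters = ["AAGATAAA", "CCGATACC"]
background = ["ACGTACGT", "TTTTCCCC"]
scores = kmer_enrichment(promoters, background, k=4)
print(max(scores, key=scores.get))  # GATA
```

Real motif-discovery tools instead model degenerate positions (for example with position weight matrices) and assess significance against a background model.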

  11. Multi-layered control of Galectin-8 mediated autophagy during adenovirus cell entry through a conserved PPxY motif in the viral capsid.

    PubMed

    Montespan, Charlotte; Marvin, Shauna A; Austin, Sisley; Burrage, Andrew M; Roger, Benoit; Rayne, Fabienne; Faure, Muriel; Campell, Edward M; Schneider, Carola; Reimer, Rudolph; Grünewald, Kay; Wiethoff, Christopher M; Wodrich, Harald

    2017-02-01

    Cells employ active measures to restrict infection by pathogens, even prior to responses from the innate and humoral immune defenses. In this context selective autophagy is activated upon pathogen induced membrane rupture to sequester and deliver membrane fragments and their pathogen contents for lysosomal degradation. Adenoviruses, which breach the endosome upon entry, escape this fate by penetrating into the cytosol prior to autophagosome sequestration of the ruptured endosome. We show that virus induced membrane damage is recognized through Galectin-8 and sequesters the autophagy receptors NDP52 and p62. We further show that a conserved PPxY motif in the viral membrane lytic protein VI is critical for efficient viral evasion of autophagic sequestration after endosomal lysis. Comparing the wildtype with a PPxY-mutant virus we show that depletion of Galectin-8 or suppression of autophagy in ATG5-/- MEFs rescues infectivity of the PPxY-mutant virus while depletion of the autophagy receptors NDP52, p62 has only minor effects. Furthermore we show that wildtype viruses exploit the autophagic machinery for efficient nuclear genome delivery and control autophagosome formation via the cellular ubiquitin ligase Nedd4.2 resulting in reduced antigenic presentation. Our data thus demonstrate that a short PPxY-peptide motif in the adenoviral capsid permits multi-layered viral control of autophagic processes during entry.

  12. A short conserved motif in ALYREF directs cap- and EJC-dependent assembly of export complexes on spliced mRNAs

    PubMed Central

    Gromadzka, Agnieszka M.; Steckelberg, Anna-Lena; Singh, Kusum K.; Hofmann, Kay; Gehring, Niels H.

    2016-01-01

    The export of messenger RNAs (mRNAs) is the last of several nuclear posttranscriptional steps of gene expression. The formation of export-competent mRNPs involves the recruitment of export factors that are assumed to facilitate transport of the mature mRNAs. Using in vitro splicing assays, we show that a core set of export factors, including ALYREF, UAP56 and DDX39, readily associate with the spliced RNAs in an EJC (exon junction complex)- and cap-dependent manner. In order to elucidate how ALYREF and other export adaptors mediate mRNA export, we conducted a computational analysis and discovered four short, conserved, linear motifs present in RNA-binding proteins. We show that mutation in one of the new motifs (WxHD) in an unstructured region of ALYREF reduced RNA binding and abolished the interaction with eIF4A3 and CBP80. Additionally, the mutation impaired proper localization to nuclear speckles and export of a spliced reporter mRNA. Our results reveal important details of the orchestrated recruitment of export factors during the formation of export-competent mRNPs. PMID:26773052

  13. Multi-layered control of Galectin-8 mediated autophagy during adenovirus cell entry through a conserved PPxY motif in the viral capsid

    PubMed Central

    Montespan, Charlotte; Marvin, Shauna A.; Burrage, Andrew M.; Roger, Benoit; Rayne, Fabienne; Schneider, Carola; Reimer, Rudolph; Wiethoff, Christopher M.

    2017-01-01

    Cells employ active measures to restrict infection by pathogens, even prior to responses from the innate and humoral immune defenses. In this context selective autophagy is activated upon pathogen induced membrane rupture to sequester and deliver membrane fragments and their pathogen contents for lysosomal degradation. Adenoviruses, which breach the endosome upon entry, escape this fate by penetrating into the cytosol prior to autophagosome sequestration of the ruptured endosome. We show that virus induced membrane damage is recognized through Galectin-8 and sequesters the autophagy receptors NDP52 and p62. We further show that a conserved PPxY motif in the viral membrane lytic protein VI is critical for efficient viral evasion of autophagic sequestration after endosomal lysis. Comparing the wildtype with a PPxY-mutant virus we show that depletion of Galectin-8 or suppression of autophagy in ATG5-/- MEFs rescues infectivity of the PPxY-mutant virus while depletion of the autophagy receptors NDP52, p62 has only minor effects. Furthermore we show that wildtype viruses exploit the autophagic machinery for efficient nuclear genome delivery and control autophagosome formation via the cellular ubiquitin ligase Nedd4.2 resulting in reduced antigenic presentation. Our data thus demonstrate that a short PPxY-peptide motif in the adenoviral capsid permits multi-layered viral control of autophagic processes during entry. PMID:28192531

  14. A short conserved motif in ALYREF directs cap- and EJC-dependent assembly of export complexes on spliced mRNAs.

    PubMed

    Gromadzka, Agnieszka M; Steckelberg, Anna-Lena; Singh, Kusum K; Hofmann, Kay; Gehring, Niels H

    2016-03-18

    The export of messenger RNAs (mRNAs) is the last of several nuclear posttranscriptional steps of gene expression. The formation of export-competent mRNPs involves the recruitment of export factors that are assumed to facilitate transport of the mature mRNAs. Using in vitro splicing assays, we show that a core set of export factors, including ALYREF, UAP56 and DDX39, readily associate with the spliced RNAs in an EJC (exon junction complex)- and cap-dependent manner. In order to elucidate how ALYREF and other export adaptors mediate mRNA export, we conducted a computational analysis and discovered four short, conserved, linear motifs present in RNA-binding proteins. We show that mutation in one of the new motifs (WxHD) in an unstructured region of ALYREF reduced RNA binding and abolished the interaction with eIF4A3 and CBP80. Additionally, the mutation impaired proper localization to nuclear speckles and export of a spliced reporter mRNA. Our results reveal important details of the orchestrated recruitment of export factors during the formation of export-competent mRNPs.

  15. Functional interaction between the Fanconi Anemia D2 protein and proliferating cell nuclear antigen (PCNA) via a conserved putative PCNA interaction motif.

    PubMed

    Howlett, Niall G; Harney, Julie A; Rego, Meghan A; Kolling, Frederick W; Glover, Thomas W

    2009-10-16

    Fanconi Anemia (FA) is a rare recessive disease characterized by congenital abnormalities, bone marrow failure, and cancer susceptibility. The FA proteins and the familial breast cancer susceptibility gene products, BRCA1 and FANCD1/BRCA2, function cooperatively in the FA-BRCA pathway to repair damaged DNA and to prevent cellular transformation. Activation of this pathway occurs via the mono-ubiquitination of the FANCD2 protein, targeting it to nuclear foci where it co-localizes with FANCD1/BRCA2, RAD51, and PCNA. The regulation of the mono-ubiquitination of FANCD2, as well as its function in DNA repair remain poorly understood. In this study, we have further characterized the interaction between the FANCD2 and PCNA proteins. We have identified a highly conserved, putative FANCD2 PCNA interaction motif (PIP-box), and demonstrate that mutation of this motif disrupts FANCD2-PCNA binding and precludes the mono-ubiquitination of FANCD2. Consequently, the FANCD2 PIP-box mutant protein fails to correct the mitomycin C hypersensitivity of FA-D2 patient cells. Our results suggest that PCNA may function as a molecular platform to facilitate the mono-ubiquitination of FANCD2 and activation of the FA-BRCA pathway.

  16. The carboxyl-terminal part of the putative Berne virus polymerase is expressed by ribosomal frameshifting and contains sequence motifs which indicate that toro- and coronaviruses are evolutionarily related.

    PubMed Central

    Snijder, E J; den Boon, J A; Bredenbeek, P J; Horzinek, M C; Rijnbrand, R; Spaan, W J

    1990-01-01

    Sequence analysis of the 3' part (8 kb) of the polymerase gene of the torovirus prototype Berne virus (BEV) revealed that this area contains at least two open reading frames (provisionally designated ORF1a and ORF1b) which overlap by 12 nucleotides. The complete sequence of ORF1b (6873 nucleotides) was determined. Like the coronaviruses, BEV was shown to express its ORF1b by ribosomal frameshifting during translation of the genomic RNA. The predicted tertiary RNA structure (a pseudoknot) in the toro- and coronaviral frameshift-directing region is similar. Analysis of the amino acid sequence of the predicted BEV ORF1b translation product revealed homology with the ORF1b product of coronaviruses. Four conserved domains were identified: the putative polymerase domain, an area containing conserved cysteine and histidine residues, a putative helicase motif, and a domain that appears to be unique to toro- and coronaviruses. The data on the 3' part of the polymerase gene of BEV supplement previously observed similarities between toro- and coronaviruses at the level of genome organization and expression. The two virus families are more closely related to each other than to other families of positive-stranded RNA viruses. PMID:2388833

  17. Identification of a human cDNA sequence which encodes a novel membrane-associated protein containing a zinc metalloprotease motif.

    PubMed

    Bao, Ying-Chun; Tsuruga, Hiromichi; Hirai, Momoki; Yasuda, Kazuki; Yokoi, Norihide; Kitamura, Toshio; Kumagai, Hidetoshi

    2003-06-30

    We report the cloning and characterization of a human cDNA predicted to encode a novel hydrophobic protein containing four transmembrane domains and a zinc metalloprotease motif, HEXXH, between the third and fourth transmembrane domains, and have named the molecule metalloprotease-related protein-1 (MPRP-1). The MPRP-1 gene was localized to chromosome 1p32.3 by radiation hybrid mapping, and Northern blot analysis revealed expression in many organs, with strong expression in the heart, skeletal muscle, kidney and liver. Immunohistochemical analysis showed that MPRP-1 was localized in the endoplasmic reticulum (ER), and not in the Golgi compartment. Fragments of DNA encoding a segment homologous to the HEXXH motif of MPRP-1 are widely found in bacteria, yeast, plants, and animals. These results suggest that the MPRP-1 may have highly conserved functions, such as in intracellular proteolytic processing in the ER.

  18. High sequence conservation among cucumber mosaic virus isolates from lily.

    PubMed

    Chen, Y K; Derks, A F; Langeveld, S; Goldbach, R; Prins, M

    2001-08-01

    To classify Cucumber mosaic virus (CMV) isolates from ornamental crops from different geographical areas, the isolates were characterized by comparing the nucleotide sequences of their RNA 4 and the encoded coat proteins. Both subgroups were represented among the CMV isolates infecting ornamentals. CMV isolates of Alstroemeria and crocus were classified as subgroup II isolates, whereas 8 other isolates, from lily, gladiolus, amaranthus, larkspur, and lisianthus, were identified as subgroup I members. In general, nucleotide sequence comparisons correlated well with geographic distribution, with one notable exception: the analyzed nucleotide sequences of 5 lily isolates showed remarkably high homology despite different origins.

  19. A conserved predicted pseudoknot in the NS2A-encoding sequence of West Nile and Japanese encephalitis flaviviruses suggests NS1' may derive from ribosomal frameshifting

    PubMed Central

    Firth, Andrew E; Atkins, John F

    2009-01-01

    Japanese encephalitis, West Nile, Usutu and Murray Valley encephalitis viruses form a tight subgroup within the larger Flavivirus genus. These viruses utilize a single-polyprotein expression strategy, resulting in ~10 mature proteins. Plotting the conservation at synonymous sites along the polyprotein coding sequence reveals strong conservation peaks at the very 5' end of the coding sequence, and also at the 5' end of the sequence encoding the NS2A protein. Such peaks are generally indicative of functionally important non-coding sequence elements. The second peak corresponds to a predicted stable pseudoknot structure whose biological importance is supported by compensatory mutations that preserve the structure. The pseudoknot is preceded by a conserved slippery heptanucleotide (Y CCU UUU), thus forming a classical stimulatory motif for -1 ribosomal frameshifting. We hypothesize, therefore, that the functional importance of the pseudoknot is to stimulate a portion of ribosomes to shift -1 nt into a short (45 codon), conserved, overlapping open reading frame, termed foo. Since cleavage at the NS1-NS2A boundary is known to require synthesis of NS2A in cis, the resulting transframe fusion protein is predicted to be NS1-NS2A(N-term)-FOO. We hypothesize that this may explain the origin of the previously identified NS1 'extension' protein in JEV-group flaviviruses, known as NS1'. PMID:19196463
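A minimal sketch of which codons are decoded when ribosomes slip -1 at a slippery heptanucleotide. It assumes the commonly described X XXY YYZ simultaneous-slippage picture: the zero-frame codons XXY and YYZ re-pair as XXX and YYY, and decoding continues in the -1 frame. The strict `is_slippery` check only accepts the Y = C case of the abstract's Y CCU UUU (that is, C CCU UUU). It ignores frameshift efficiency, which is why only "a portion of ribosomes" shift. All names and the toy mRNA are hypothetical.

```python
def codons(seq: str, start: int) -> list[str]:
    """Consecutive triplets beginning at `start`; an incomplete tail is dropped."""
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]


def is_slippery(hept: str) -> bool:
    """Strict X XXY YYZ check: bases 1-3 identical and bases 4-6 identical."""
    return len(hept) == 7 and hept[0] == hept[1] == hept[2] and hept[3] == hept[4] == hept[5]


def minus1_frameshift(seq: str, p_codon: int) -> list[str]:
    """Codons decoded if the ribosome slips -1 on the zero-frame codon XXY at `p_codon`.

    Zero-frame codons before `p_codon` are read normally; decoding then resumes in
    the -1 frame at p_codon - 1 (XXX, YYY, Z..).
    """
    if p_codon < 1 or p_codon % 3 != 0 or not is_slippery(seq[p_codon - 1:p_codon + 6]):
        raise ValueError("no X XXY YYZ heptamer in register at this position")
    return codons(seq[:p_codon], 0) + codons(seq, p_codon - 1)


# Hypothetical toy mRNA: AUG AAC CCU UUU GGC AAG CUA, with C CCU UUU at 0-based positions 5-11.
mrna = "AUGAACCCUUUUGGCAAGCUA"
print(codons(mrna, 0))             # ['AUG', 'AAC', 'CCU', 'UUU', 'GGC', 'AAG', 'CUA']
print(minus1_frameshift(mrna, 6))  # ['AUG', 'AAC', 'CCC', 'UUU', 'UGG', 'CAA', 'GCU']
```

The base just before the P-site codon (the C at position 5) is read twice, once in each frame. This shared base is what joins the upstream product to the overlapping ORF in a transframe fusion.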

  20. Trypanosoma cruzi Binds to Cytokeratin through Conserved Peptide Motifs Found in the Laminin-G-Like Domain of the gp85/Trans-sialidase Proteins

    PubMed Central

    Teixeira, Andre Azevedo Reis; de Vasconcelos, Veronica de Cássia Sardinha; Colli, Walter; Alves, Maria Júlia Manso; Giordano, Ricardo José

    2015-01-01

    Background Chagas' disease, caused by the protozoan parasite Trypanosoma cruzi, is a disease that affects millions of people, most of them living in South and Central America. There are few treatment options for individuals with Chagas' disease, making it important to understand the molecular details of parasite infection so that novel therapeutic alternatives may be developed for these patients. Here, we investigate the interaction between host cell intermediate filament proteins and the T. cruzi gp85 glycoprotein superfamily, whose hundreds of members have long been implicated in parasite cell invasion. Methodology/Principal Findings An in silico analysis was utilized to identify peptide motifs shared by the gp85 T. cruzi proteins and, using phage display, these selected peptide motifs were screened for their ability to bind to cells. One peptide, named TS9, showed significant cell binding capacity and was selected for further studies. Affinity chromatography, phage display and invasion assays revealed that peptide TS9 binds to cytokeratins and vimentin, and prevents T. cruzi cell infection. Interestingly, peptide TS9 and a previously identified binding site for intermediate filament proteins are disposed in an antiparallel β-sheet fold, present in a conserved laminin-G-like domain shared by all members of the family. Moreover, peptide TS9 overlaps with an immunodominant T cell epitope. Conclusions/Significance Taken together, the present study reinforces previous results from our group implicating the gp85 superfamily of glycoproteins and the intermediate filament proteins cytokeratin and vimentin in the parasite infection process. It also suggests an important role in parasite biology for the conserved laminin-G-like domain, present in all members of this large family of cell surface proteins. PMID:26398185

  1. Small yet effective: the ethylene responsive element binding factor-associated amphiphilic repression (EAR) motif.

    PubMed

    Kagale, Sateesh; Rozwadowski, Kevin

    2010-06-01

    The Ethylene-responsive element binding factor-associated Amphiphilic Repression (EAR) motif is a small yet distinct regulatory motif that is conserved in many plant transcriptional regulator (TR) proteins associated with diverse biological functions. We have previously established a list of high-confidence Arabidopsis EAR repressors, the EAR repressome, comprising 219 TRs belonging to 21 different TR families. This class of proteins and the sequence context of the EAR motif exhibited a high degree of conservation across evolutionarily diverse plant species. Our comprehensive genome-wide analysis enabled us to refine the definition of EAR motifs as comprising either LxLxL or DLNxxP. Comparing the representation of these sequence signatures in TRs to that of other repressor motifs, we show that the EAR motif is the one most frequently represented, detected in 10 to 25% of the TRs from diverse plant species. The mechanisms involved in regulation of EAR motif function and the cellular fates of EAR repressors are currently not well understood. Our earlier analysis had implicated amino acid residues flanking the EAR motifs in regulation of their functionality. Here, we present additional evidence supporting possible regulation of EAR motif function by phosphorylation of integral or adjacent Ser and/or Thr residues. Additionally, we discuss potential novel roles of EAR motifs in plant-pathogen interaction and processes other than transcriptional repression.
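In the notation LxLxL and DLNxxP, a lowercase x stands for any residue. A minimal sketch, assuming that convention, turns both EAR signatures into one regular expression. It uses a zero-width lookahead so that overlapping occurrences are all reported. The helper names and toy sequences are hypothetical. Flanking context and phosphorylation, which the abstract implicates in EAR function, are not modelled, and short degenerate patterns like these can also match by chance.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Translate x-notation into a regex fragment: lowercase x matches any residue."""
    return "".join("." if c == "x" else re.escape(c) for c in motif)


EAR_MOTIFS = ["LxLxL", "DLNxxP"]
# A zero-width lookahead lets finditer report overlapping hits (LALALAL holds two LxLxL).
EAR_RE = re.compile("(?=(" + "|".join(motif_to_regex(m) for m in EAR_MOTIFS) + "))")


def find_ear(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched residues) for every, possibly overlapping, EAR hit."""
    return [(m.start() + 1, m.group(1)) for m in EAR_RE.finditer(protein.upper())]


# Hypothetical toy sequences, not real repressors.
print(find_ear("LALALAL"))  # [(1, 'LALAL'), (3, 'LALAL')]
print(find_ear("GDLNLSP"))  # [(2, 'DLNLSP')]
```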

  2. Molecular cloning of a zinc finger autoantigen transiently associated with interphase nucleolus and mitotic centromeres and midbodies. Orthologous proteins with nine CXXC motifs highly conserved from nematodes to humans.

    PubMed

    Bolívar, J; Díaz, I; Iglesias, C; Valdivia, M M

    1999-12-17

    We have cloned a novel human autoimmune antigen recognized by the serum of a patient suffering from rheumatoid arthritis with high levels of antibodies to the nucleolus organizer regions. Initially the human autoimmune serum was used to select a cDNA encoding 317 amino acids from a hamster expression library. Using the hamster DNA as a probe, we isolated the homologous human cDNA, encoding 320 amino acids. Human and hamster polypeptides share 95% amino acid homology. The deduced 36-kDa protein contains a putative amino-terminal NLS, nine highly conserved cysteine-X-X-cysteine motifs, and a carboxyl-terminal polyacidic region. Several homologous expressed sequence tags have been identified in databases, suggesting that orthologous proteins are present throughout evolution from worms to humans. A Drosophila expressed sequence tag was further sequenced completely, revealing a full-length protein with 60% amino acid identity to the human homologue. Northern blot analysis revealed that this novel protein is widely distributed in human tissues with significantly higher expression levels in heart and skeletal muscle. Specific antibodies to the recombinant protein and transfection experiments demonstrated by immunofluorescence the localization of the protein predominantly but not exclusively to the nucleolus of interphase mammalian cells. In actinomycin D-treated cells the protein remains associated with the nucleolus but is not segregated, like other ribosomal factors such as upstream binding factor. In mitosis the protein was found to be associated with centromeres and concentrated at the midbody in cytokinesis. Transient distribution of this evolutionarily conserved zinc finger nucleolar autoantigen to the mitotic centromeres may provide the means for several aspects of cell cycle control and transcriptional regulation.
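Figures such as "60% amino acid identity" depend on how aligned columns are counted. A minimal sketch of one common convention: count identical residues over aligned columns where neither sequence has a gap. Other conventions (for example dividing by the shorter sequence length) give different numbers, and the abstract does not say which was used. Identity is also narrower than the broader term "homology". The function name and toy fragments are hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must already be aligned to equal length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != gap and y != gap]
    if not columns:
        raise ValueError("alignment has no ungapped columns")
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


# Hypothetical aligned toy fragments containing CXXC-like stretches.
print(percent_identity("MCKC-GEC", "MCRCAGDC"))
```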

  3. Ovodefensins, an Oviduct-Specific Antimicrobial Gene Family, Have Evolved in Birds and Reptiles to Protect the Egg by Both Sequence and Intra-Six-Cysteine Sequence Motif Spacing.

    PubMed

    Whenham, Natasha; Lu, Tian Chee; Maidin, Maisarah B M; Wilson, Peter W; Bain, Maureen M; Stevenson, M Lynn; Stevens, Mark P; Bedford, Michael R; Dunn, Ian C

    2015-06-01

    Ovodefensins are a novel beta defensin-related family of antimicrobial peptides containing conserved glycine and six cysteine residues. Ovodefensins were originally thought to be restricted to the albumen-producing region of the avian oviduct, but expression was found in large quantities in many parts of the oviduct in chicken, turkey, duck, and zebra finch, varying between species and between gene forms in the same species. Using new search strategies, the ovodefensin family now has 35 members, including reptiles, but no representatives outside birds and reptiles have been found. Analysis of their evolution shows that ovodefensins divide into six groups based on the intra-cysteine amino acid spacing, representing a unique mechanism alongside traditional evolution of sequence. The groups have been used as the basis for a nomenclature for the family. Antimicrobial activity for three ovodefensins from chicken and duck was confirmed against Escherichia coli and a pathogenic E. coli strain as well as a Gram-positive organism, Staphylococcus aureus, for the first time. However, activity varied greatly between peptides, with Gallus gallus OvoDA1 being the most potent, suggesting a link with the different structures. Expression of Gallus gallus OvoDA1 (gallin) in the oviduct was increased by estrogen and progesterone and in the reproductive state. Overall, the results support the hypothesis that ovodefensins evolved to protect the egg, but they are not necessarily restricted to the egg white. Therefore, divergent motif structure and sequence present an interesting area of research for antimicrobial peptide design and understanding protection of the cleidoic egg.
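Grouping by intra-cysteine spacing means that each peptide gets a signature: the number of residues between consecutive cysteines. Peptides that share the signature fall into the same group. The sketch below assumes that exact gap tuples define the groups, which is a simplification; the study's own grouping criteria are not spelled out in the abstract. All names and toy peptides are hypothetical and are not real ovodefensin sequences.

```python
from collections import defaultdict


def cysteine_spacing(peptide: str) -> tuple[int, ...]:
    """Number of residues between each pair of consecutive cysteines."""
    positions = [i for i, aa in enumerate(peptide.upper()) if aa == "C"]
    return tuple(b - a - 1 for a, b in zip(positions, positions[1:]))


def group_by_spacing(peptides: dict[str, str]) -> dict[tuple[int, ...], list[str]]:
    """Collect peptide names that share the same cysteine-spacing signature."""
    groups: dict[tuple[int, ...], list[str]] = defaultdict(list)
    for name, seq in peptides.items():
        groups[cysteine_spacing(seq)].append(name)
    return dict(groups)


# Hypothetical toy peptides.
toy = {"p1": "ACGGCAAACC", "p2": "MCKLCTTTCC", "p3": "CC"}
print(group_by_spacing(toy))
```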

  4. The telomere repeat motif of basal Metazoa.

    PubMed

    Traut, Walther; Szczepanowski, Monika; Vítková, Magda; Opitz, Christian; Marec, Frantisek; Zrzavý, Jan

    2007-01-01

    In most eukaryotes the telomeres consist of short DNA tandem repeats and associated proteins. Telomeric repeats are added to the chromosome ends by telomerase, a specialized reverse transcriptase. We examined telomerase activity and telomere repeat sequences in representatives of basal metazoan groups. Our results show that the 'vertebrate' telomere motif (TTAGGG)n is present in all basal metazoan groups, i.e. sponges, Cnidaria, Ctenophora, and Placozoa, and also in the unicellular metazoan sister group, the Choanozoa. Thus it can be considered the ancestral telomere repeat motif of Metazoa. It has been conserved since the metazoan radiation in most animal phylogenetic lineages, and replaced by other motifs, according to our present knowledge, only in two major lineages, Arthropoda and Nematoda.
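A (TTAGGG)n motif is a tandem run of one repeat unit. On the complementary strand the same run reads (CCCTAA)n. The minimal sketch below reports the longest exact run on either strand of a sequence. It tolerates no mismatches and says nothing about telomerase activity, which the study measured separately. The function names and toy inputs are hypothetical.

```python
def revcomp(dna: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_tandem_run(seq: str, unit: str) -> int:
    """Largest number of back-to-back exact copies of `unit`, trying every start offset."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    seq, unit = seq.upper(), unit.upper()
    best = 0
    for start in range(len(seq)):
        copies = 0
        while seq.startswith(unit, start + copies * len(unit)):
            copies += 1
        best = max(best, copies)
    return best


def telomeric_run(seq: str, unit: str = "TTAGGG") -> int:
    """Longest (unit)n run on either strand; for TTAGGG the other strand reads (CCCTAA)n."""
    return max(longest_tandem_run(seq, unit), longest_tandem_run(seq, revcomp(unit.upper())))


# Hypothetical toy read: five perfect repeats, a truncated TTAGG, then two more repeats.
print(telomeric_run("AC" + "TTAGGG" * 5 + "TTAGG" + "TTAGGG" * 2))
```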

  5. Coupling DNA-binding and ATP hydrolysis in Escherichia coli RecQ: role of a highly conserved aromatic-rich sequence.

    PubMed

    Zittel, Morgan C; Keck, James L

    2005-01-01

    RecQ enzymes are broadly conserved Superfamily-2 (SF-2) DNA helicases that play critical roles in DNA metabolism. RecQ proteins use the energy of ATP hydrolysis to drive DNA unwinding; however, the mechanisms by which RecQ links ATPase activity to DNA-binding/unwinding are unknown. In many Superfamily-1 (SF-1) DNA helicases, helicase sequence motif III links these activities by binding both single-stranded (ss) DNA and ATP. However, the ssDNA-binding aromatic-rich element in motif III present in these enzymes is missing from SF-2 helicases, raising the question of how these enzymes link ATP hydrolysis to DNA-binding/unwinding. We show that Escherichia coli RecQ contains a conserved aromatic-rich loop in its helicase domain between motifs II and III. Although placement of the RecQ aromatic-rich loop is topologically distinct relative to the SF-1 enzymes, both loops map to similar tertiary structural positions. We examined the functions of the E. coli RecQ aromatic-rich loop using RecQ variants with single amino acid substitutions within the segment. Our results indicate that the aromatic-rich loop in RecQ is critical for coupling ATPase and DNA-binding/unwinding activities. Our studies also suggest that RecQ's aromatic-rich loop might couple ATP hydrolysis to DNA-binding in a mechanistically distinct manner from SF-1 helicases.

  6. Retinoic acid-induced down-regulation of the interleukin-2 promoter via cis-regulatory sequences containing an octamer motif.

    PubMed Central

    Felli, M P; Vacca, A; Meco, D; Screpanti, I; Farina, A R; Maroder, M; Martinotti, S; Petrangeli, E; Frati, L; Gulino, A

    1991-01-01

    Retinoic acid (RA) is known to influence the proliferation and differentiation of a wide variety of transformed and developing cells. We found that RA and the specific RA receptor (RAR) ligand Ch55 inhibited the phorbol ester and calcium ionophore-induced expression of the T-cell growth factor interleukin-2 (IL-2) gene. Expression of transiently transfected chloramphenicol acetyltransferase vectors containing the 5'-flanking region of the IL-2 gene was also inhibited by RA. RA-induced down-regulation of the IL-2 enhancer is mediated by RAR, since overexpression of transfected RARs increased RA sensitivity of the IL-2 promoter. Functional analysis of chloramphenicol acetyltransferase vectors containing either internal deletion mutants of the region from -317 to +47 bp of the IL-2 enhancer or multimerized cis-regulatory elements showed that the RA-responsive element in the IL-2 promoter mapped to sequences containing an octamer motif. RAR also inhibited the transcriptional activity of the octamer motif of the immunoglobulin heavy chain enhancer. In spite of the transcriptional inhibition of the IL-2 octamer motif, RA did not decrease the in vitro DNA-binding capability of octamer-1 protein. These results identify a regulatory pathway within the IL-2 promoter which involves the octamer motif and RAR. PMID:1652063

  7. Loop Sequence Context Influences the Formation and Stability of the i-Motif for DNA Oligomers of Sequence (CCCXXX)4, where X = A and/or T, under Slightly Acidic Conditions.

    PubMed

    McKim, Mikeal; Buxton, Alexander; Johnson, Courtney; Metz, Amanda; Sheardy, Richard D

    2016-08-11

    The structure and stability of DNA is highly dependent upon the sequence context of the bases (A, G, C, and T) and the environment under which the DNA is prepared (e.g., buffer, temperature, pH, ionic strength). Understanding the factors that influence structure and stability of the i-motif conformation can lead to the design of DNA sequences with highly tunable properties. We have been investigating the influence of pH and temperature on the conformations and stabilities for all permutations of the DNA sequence (CCCXXX)4, where X = A and/or T, using spectroscopic approaches. All oligomers undergo transitions from single-stranded structures at pH 7.0 to i-motif conformations at pH 5.0 as evidenced by circular dichroism (CD) studies. These folded structures possess stacked C:CH(+) base pairs joined by loops of 5'-XXX-3'. Although the pH at the midpoint of the transition (pHmp) varies slightly with loop sequence, the linkage between pH and log K for the proton-induced transition is highly loop sequence dependent. All oligomers also undergo the thermally induced i-motif to single-strand transition at pH 5.0 as the temperature is increased from 25 to 95 °C. The temperature at the midpoint of this transition (Tm) is also highly dependent on loop sequence context effects. For seven of eight possible permutations, the pH-induced and thermally induced transitions appear to be highly cooperative and two-state. Analysis of the CD optical melting profiles via a van't Hoff approach reveals sequence-dependent thermodynamic parameters for the unfolding as well. Together, these data reveal that the i-motif conformation exhibits exquisite sensitivity to loop sequence context with respect to formation and stability.
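With X restricted to A or T and the same XXX loop in all four repeats, there are 2^3 = 8 oligomers, which matches the "seven of eight possible permutations" above. The sketch below enumerates them. It then shows a van't Hoff analysis as a least-squares line through ln K versus 1/T, where the slope is −ΔH/R and the intercept is ΔS/R. Several assumptions apply. The transition is two-state with temperature-independent ΔH (ΔCp = 0), and K is derived from a fraction unfolded θ as θ/(1 − θ); extracting θ from CD baselines is not shown. The test data are synthetic, not values from the study.

```python
import itertools
import math

R = 8.314  # gas constant, J/(mol*K)


def loop_variants() -> dict[str, str]:
    """All (CCCXXX)4 oligomers with X in {A, T}, using the same loop in every repeat."""
    loops = ["".join(p) for p in itertools.product("AT", repeat=3)]
    return {loop: ("CCC" + loop) * 4 for loop in loops}


def k_from_fraction(theta: float) -> float:
    """Two-state unfolding constant from fraction unfolded, 0 < theta < 1."""
    return theta / (1.0 - theta)


def vant_hoff_fit(temps_k: list[float], k_values: list[float]) -> tuple[float, float, float]:
    """Fit ln K = -dH/(R*T) + dS/R; return (dH in J/mol, dS in J/(mol*K), Tm in K)."""
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(k) for k in k_values]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    d_h, d_s = -slope * R, intercept * R
    return d_h, d_s, d_h / d_s  # Tm is where dG = dH - T*dS = 0, i.e. K = 1


for loop, oligo in loop_variants().items():
    print(loop, oligo)
```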

  8. Conserved cytoplasmic motifs that distinguish sub-groups of the polyprenol phosphate:N-acetylhexosamine-1-phosphate transferase family.

    PubMed

    Anderson, M S; Eveland, S S; Price, N P

    2000-10-15

    WecA, MraY and WbcO are conserved members of the polyprenol phosphate:N-acetylhexosamine-1-phosphate transferase family involved in the assembly of bacterial cell walls, and catalyze reactions involving a membrane-associated polyprenol phosphate acceptor substrate and a cytoplasmically located UDP-D-amino sugar donor. MraY, WbcO and WecA purportedly utilize different UDP-sugars, although the molecular basis of this specificity is largely unknown. However, domain variations involved in specificity are predicted to occur on the cytoplasmic side of the membrane, adjacent to conserved domains involved in the mechanistic activity, and with access to the cytoplasmically located sugar nucleotides. Conserved C-terminal domains have been identified that satisfy these criteria. Topological analyses indicate that they form the highly basic, fifth cytoplasmic loop between transmembrane regions IX and X. Four distinct loop types are apparent, one each for MraY, WecA, WbcO and RgpG, which uniquely characterize these sub-groups of the transferase family, and a correlation is evident with the known or implied UDP-sugar specificity.

  9. Analyses of phylogeny, evolution, conserved sequences and genome-wide expression of the ICK/KRP family of plant CDK inhibitors

    PubMed Central

    Torres Acosta, Juan Antonio; Fowke, Larry C.; Wang, Hong

    2011-01-01

    Background and Aims The cell cycle is controlled by cyclin-dependent kinases (CDKs), and CDK inhibitors are major regulators of their activities. The ICK/KRP family of CDK inhibitors has been reported in several plants, with seven members in arabidopsis; however, the phylogenetic relationship among members in different species is unknown. There is also a need to understand how these genes and proteins are regulated, and little information is available on the functional differences among ICK/KRP family members. Methods We searched publicly available databases and identified over 120 unique ICK/KRP protein sequences from more than 60 plant species. Phylogenetic analysis was performed using 101 full-length sequences from 40 species, together with the intron–exon organization of ICK/KRP genes in model species. Conserved sequences and motifs were analysed using ICK/KRP protein sequences from arabidopsis (Arabidopsis thaliana), rice (Oryza sativa) and poplar (Populus trichocarpa). In addition, gene expression was examined using microarray data from arabidopsis, rice and poplar, and further analysed by RT-PCR for arabidopsis. Key Results and Conclusions Phylogenetic analysis showed that plant ICK/KRP proteins can be grouped into three major classes. Whereas the C-class contains sequences from dicotyledons, monocotyledons and gymnosperms, the A- and B-classes contain only sequences from dicotyledons or monocotyledons, respectively, suggesting that the A- and B-classes might have evolved from the C-class. This classification is also supported by exon–intron organization: genes in the A- and B-classes have four exons, whereas genes in the C-class have only three. Analysis of sequences from arabidopsis, rice and poplar identified conserved sequence motifs, some of which had not been described previously, as well as putative functional sites. The distribution of conserved motifs among family members is consistent with the classification. In addition, gene expression analysis

  10. Nuclear Magnetic Resonance Solution Structures of Lacticin Q and Aureocin A53 Reveal a Structural Motif Conserved among Leaderless Bacteriocins with Broad-Spectrum Activity.

    PubMed

    Acedo, Jeella Z; van Belkum, Marco J; Lohans, Christopher T; Towle, Kaitlyn M; Miskolzie, Mark; Vederas, John C

    2016-02-02

    Lacticin Q (LnqQ) and aureocin A53 (AucA) are leaderless bacteriocins from Lactococcus lactis QU5 and Staphylococcus aureus A53, respectively. These bacteriocins are characterized by the absence of an N-terminal leader sequence and are active against a broad rang