Science.gov

Sample records for conserved sequence motif

  1. The BsaHI restriction-modification system: Cloning, sequencing and analysis of conserved motifs

    PubMed Central

    Neely, Robert K; Roberts, Richard J

    2008-01-01

    Background Restriction and modification enzymes typically recognise short DNA sequences of between two and eight bases in length. Understanding the mechanism of this recognition represents a significant challenge that we begin to address for the BsaHI restriction-modification system, which recognises the six base sequence GRCGYC. Results The DNA sequences of the genes for the BsaHI methyltransferase, bsaHIM, and restriction endonuclease, bsaHIR, have been determined (GenBank accession #EU386360), cloned and expressed in E. coli. Both the restriction endonuclease and methyltransferase enzymes share significant similarity with a group of 6 other enzymes comprising the restriction-modification systems HgiDI and HgiGI and the putative HindVP, NlaCORFDP, NpuORFC228P and SplZORFNP restriction-modification systems. A sequence alignment of these homologues shows that their amino acid sequences are largely conserved and highlights several motifs of interest. We target one such conserved motif, reading SPERRFD, at the C-terminal end of the bsaHIR gene. A mutational analysis of these amino acids indicates that the motif is crucial for enzymatic activity. Sequence alignment of the methyltransferase gene reveals a short motif within the target recognition domain that is conserved among enzymes recognising the same sequences. Thus, this motif may be used as a diagnostic tool to define the recognition sequences of the cytosine C5 methyltransferases. Conclusion We have cloned and sequenced the BsaHI restriction and modification enzymes. We have identified a region of the R. BsaHI enzyme that is crucial for its activity. Analysis of the amino acid sequence of the BsaHI methyltransferase enzyme led us to propose two new motifs that can be used in the diagnosis of the recognition sequence of the cytosine C5-methyltransferases. PMID:18479503
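
The recognition sequence GRCGYC quoted above is written in IUPAC degenerate notation (R = A or G, Y = C or T). As a minimal sketch, not taken from the paper, such a site can be expanded into a regular expression and located in a DNA string. The function names `iupac_to_regex` and `find_sites` and the test sequence are illustrative assumptions. Because GRCGYC is its own reverse complement, scanning one strand is sufficient here.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(site):
    """Expand a degenerate site such as GRCGYC into a regex string."""
    return "".join(IUPAC[base] for base in site.upper())

def find_sites(dna, site="GRCGYC"):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # The lookahead lets the scan report overlapping occurrences.
    pattern = re.compile("(?=(" + iupac_to_regex(site) + "))")
    return [m.start() for m in pattern.finditer(dna.upper())]

if __name__ == "__main__":
    # Hypothetical test sequence containing GACGTC and GGCGCC.
    print(find_sites("TTGACGTCAAGGCGCCTT"))
```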

  2. Conserved sequence motifs among bacterial, eukaryotic, and archaeal phosphatases that define a new phosphohydrolase superfamily.

    PubMed

    Thaller, M C; Schippa, S; Rossolini, G M

    1998-07-01

    Members of a new molecular family of bacterial nonspecific acid phosphatases (NSAPs), indicated as class C, were found to share significant sequence similarities to bacterial class B NSAPs and to some plant acid phosphatases, representing the first example of a family of bacterial NSAPs that has a relatively close eukaryotic counterpart. Despite the lack of an overall similarity, conserved sequence motifs were also identified among the above enzyme families (class B and class C bacterial NSAPs, and related plant phosphatases) and several other families of phosphohydrolases, including bacterial phosphoglycolate phosphatases, histidinol-phosphatase domains of the bacterial bifunctional enzymes imidazole-glycerolphosphate dehydratases, and bacterial, eukaryotic, and archaeal phosphoserine phosphatases and trehalose-6-phosphatases. These conserved motifs are clustered within two domains, separated by a variable spacer region, according to the pattern [FILMAVT]-D-[ILFRMVY]-D-[GSNDE]-[TV]-[ILVAM]-[ATSVILMC]-X-{YFWHKR}-X-{YFWHNQ}-X(102,191)-{KRHNQ}-G-D-{FYWHILVMC}-{QNH}-{FWYGP}-D-{PSNQYW}. The dephosphorylating activity common to all these proteins supports the definition of this phosphatase motif and the inclusion of these enzymes into a superfamily of phosphohydrolases that we propose to indicate as "DDDD" after the presence of the four invariant aspartate residues. Database searches retrieved various hypothetical proteins of unknown function containing this or similar motifs, for which a phosphohydrolase activity could be hypothesized.
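
The pattern in this abstract follows PROSITE-style conventions: `[...]` lists allowed residues, `{...}` lists excluded residues, `X` is any residue, and `X(m,n)` is a spacer of m to n residues. The sketch below shows one way such a pattern could be converted into a regular expression for scanning protein sequences. The function name `prosite_to_regex` and the test peptide are assumptions for illustration, and only the unambiguous first part of the published pattern is used.

```python
import re

def prosite_to_regex(pattern):
    """Convert a dash-separated PROSITE-style pattern into a regex string."""
    parts = []
    for element in pattern.split("-"):
        element = element.strip()
        # Split off an optional repeat suffix such as (3) or (102,191).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core in ("x", "X"):
            regex = "."                        # any residue
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"    # excluded residues
        else:
            regex = core                       # literal residue or [allowed set]
        if lo is not None:
            regex += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(regex)
    return "".join(parts)

if __name__ == "__main__":
    first_domain = "[FILMAVT]-D-[ILFRMVY]-D-[GSNDE]-[TV]-[ILVAM]"
    regex = prosite_to_regex(first_domain)
    # Hypothetical peptide, not a real phosphatase sequence.
    print(regex, re.search(regex, "MKAVDIDGTLVD").start())
```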

  3. Recognition of distantly related protein sequences using conserved motifs and neural networks.

    PubMed

    Frishman, D; Argos, P

    1992-12-01

    A sensitive technique for protein sequence motif recognition based on neural networks has been developed. It involves three major steps. (1) At each appropriate alignment position of a set of N matched sequences, a set of N aligned oligopeptides is specified with preselected window length. N neural nets are subsequently and successively trained on N-1 amino acid spans after eliminating each ith oligopeptide. A test for recognition of each of the ith spans is performed. The average neural net recognition over N such trials is used as a measure of conservation for the particular windowed region of the multiple alignment. This process is repeated for all possible spans of given length in the multiple alignment. (2) The M most conserved regions are regarded as motifs and the oligopeptides within each are used to train intensively M individual neural networks. (3) The M networks are then applied in a search for related primary structures in a databank of known protein sequences. The oligopeptide spans in the database sequence with strongest neural net output for each of the M networks are saved and then scored according to the output signals and the proper combination that follows the expected N- to C-terminal sequence order. The motifs from the database with highest similarity scores can then be used to retrain the M neural nets, which can be subsequently utilized for further searches in the databank, thus providing even greater sensitivity to recognize distant familial proteins. This technique was successfully applied to the integrase, DNA-polymerase and immunoglobulin families.

  4. Eukaryotic genomes contain a [2Fe-2S] ferredoxin isoform with a conserved C-terminal sequence motif.

    PubMed

    Seeber, Frank

    2002-11-01

    Apicomplexan protists contain a single mitochondrial [2Fe-2S] ferredoxin sequence (mtFd) with a highly conserved C-terminal motif, VDGxxpxPH, that distinguishes it from other mtFd, which have heterogeneous C-termini. This isoform of mtFd, called 'type II ferredoxin', is widespread in eukaryotes, some species having two isoforms and others possessing only one. Because of the known modulating role of the C-terminus of type I mtFd during association with itself and other interacting proteins, the presence of a conserved C-terminus in type II mtFd suggests it evolved either as a means for optimized homodimerization or to allow interaction with a highly conserved partner(s) that is yet to be defined.

  5. The Runt domain of AML1 (RUNX1) binds a sequence-conserved RNA motif that mimics a DNA element

    PubMed Central

    Fukunaga, Junichi; Nomura, Yusuke; Tanaka, Yoichiro; Amano, Ryo; Tanaka, Taku; Nakamura, Yoshikazu; Kawai, Gota; Sakamoto, Taiichi; Kozu, Tomoko

    2013-01-01

    AML1 (RUNX1) is a key transcription factor for hematopoiesis that binds to the Runt-binding double-stranded DNA element (RDE) of target genes through its N-terminal Runt domain. Aberrations in the AML1 gene are frequently found in human leukemia. To better understand AML1 and its potential utility for diagnosis and therapy, we obtained RNA aptamers that bind specifically to the AML1 Runt domain. Enzymatic probing and NMR analyses revealed that Apt1-S, which is a truncated variant of one of the aptamers, has a CACG tetraloop and two stem regions separated by an internal loop. All the isolated aptamers were found to contain the conserved sequence motif 5′-NNCCAC-3′ and 5′-GCGMGN′N′-3′ (M:A or C; N and N′ form Watson–Crick base pairs). The motif contains one AC mismatch and one base bulged out. Mutational analysis of Apt1-S showed that three guanines of the motif are important for Runt binding as are the three guanines of RDE, which are directly recognized by three arginine residues of the Runt domain. Mutational analyses of the Runt domain revealed that the amino acid residues used for Apt1-S binding were similar to those used for RDE binding. Furthermore, the aptamer competed with RDE for binding to the Runt domain in vitro. These results demonstrated that the Runt domain of the AML1 protein binds to the motif of the aptamer that mimics DNA. Our findings should provide new insights into RNA function and utility in both basic and applied sciences. PMID:23709277

  6. U17/snR30 is a ubiquitous snoRNA with two conserved sequence motifs essential for 18S rRNA production.

    PubMed

    Atzorn, Vera; Fragapane, Paola; Kiss, Tamás

    2004-02-01

    Saccharomyces cerevisiae snR30 is an essential box H/ACA small nucleolar RNA (snoRNA) required for the processing of 18S rRNA. Here, we show that the previously characterized human, reptilian, amphibian, and fish U17 snoRNAs represent the vertebrate homologues of yeast snR30. We also demonstrate that U17/snR30 is present in the fission yeast Schizosaccharomyces pombe and the unicellular ciliated protozoan Tetrahymena thermophila. Evolutionary comparison revealed that the 3'-terminal hairpins of U17/snR30 snoRNAs contain two highly conserved sequence motifs, the m1 (AUAUUCCUA) and m2 (AAACCAU) elements. Mutation analysis of yeast snR30 demonstrated that the m1 and m2 elements are essential for early cleavages of the 35S pre-rRNA and, consequently, for the production of mature 18S rRNA. The m1 and m2 motifs occupy the opposite strands of an internal loop structure, and they are located invariantly 7 nucleotides upstream from the ACA box of U17/snR30 snoRNAs. U17/snR30 is the first identified box H/ACA snoRNA that possesses an evolutionarily conserved role in the nucleolytic processing of eukaryotic pre-rRNA.

  7. A Gibbs sampler for motif detection in phylogenetically close sequences

    NASA Astrophysics Data System (ADS)

    Siddharthan, Rahul; van Nimwegen, Erik; Siggia, Eric

    2004-03-01

    Genes are regulated by transcription factors that bind to DNA upstream of genes and recognize short conserved "motifs" in a random intergenic "background". Motif-finders such as the Gibbs sampler compare the probability of these short sequences being represented by "weight matrices" to the probability of their arising from the background "null model", and explore this space (analogous to a free-energy landscape). But closely related species may show conservation not because of functional sites but simply because they have not had sufficient time to diverge, so conventional methods will fail. We introduce a new Gibbs sampler algorithm that accounts for common ancestry when searching for motifs, while requiring minimal "prior" assumptions on the number and types of motifs, assessing the significance of detected motifs by "tracking" clusters that stay together. We apply this scheme to motif detection in sporulation-cycle genes in the yeast S. cerevisiae, using recent sequences of other closely-related Saccharomyces species.
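
The comparison at the heart of this abstract, the probability of a short sequence under a weight matrix versus under a background null model, can be illustrated with a log-odds score. The sketch below covers only that scoring step. It is not the Gibbs sampler itself and it ignores phylogeny. The example sites, the uniform background, the pseudocount, and all function names are assumptions chosen for illustration.

```python
import math

def build_pwm(sites, alphabet="ACGT", pseudocount=1.0):
    """Estimate per-column base probabilities from aligned sites of equal length."""
    total = len(sites) + pseudocount * len(alphabet)
    pwm = []
    for i in range(len(sites[0])):
        column = [s[i] for s in sites]
        pwm.append({b: (column.count(b) + pseudocount) / total for b in alphabet})
    return pwm

def log_odds(window, pwm, background):
    """log2 of P(window | weight matrix) / P(window | background)."""
    return sum(math.log2(pwm[i][b] / background[b]) for i, b in enumerate(window))

def best_window(seq, pwm, background):
    """Return (score, start) of the highest-scoring window in seq."""
    w = len(pwm)
    return max((log_odds(seq[i:i + w], pwm, background), i)
               for i in range(len(seq) - w + 1))

if __name__ == "__main__":
    # Hypothetical aligned binding sites and a uniform background model.
    sites = ["TGACTC", "TGACTC", "TGAGTC", "TGACTA"]
    background = {b: 0.25 for b in "ACGT"}
    pwm = build_pwm(sites)
    print(best_window("AAAATGACTCAAAA", pwm, background))
```

A positive score means the window is better explained by the weight matrix than by the background. A sampler would repeatedly re-estimate the matrix from windows chosen according to such scores.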

  8. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    SciTech Connect

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by patterns in
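
The relative-abundance comparison described in this abstract can be sketched as a ratio of k-mer frequencies between a foreground set and a background set. This is a simplified illustration under stated assumptions, not the authors' statistical procedure: the function names, the small pseudocount that avoids division by zero, and the toy sequences are all hypothetical.

```python
from collections import Counter

def kmer_frequencies(seqs, k):
    """Relative frequency of every k-mer observed in a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}

def enrichment(foreground, background, k, pseudo=1e-6):
    """Ratio of foreground to background frequency for each foreground k-mer."""
    fg = kmer_frequencies(foreground, k)
    bg = kmer_frequencies(background, k)
    # The pseudocount keeps the ratio finite for k-mers absent from the background.
    return {kmer: f / (bg.get(kmer, 0.0) + pseudo) for kmer, f in fg.items()}

if __name__ == "__main__":
    # Toy sequences standing in for CNS and non-CNS sets.
    ratios = enrichment(["AAAATTTAAA"], ["ACGTACGTAC"], k=2)
    for kmer, r in sorted(ratios.items(), key=lambda item: -item[1]):
        print(kmer, round(r, 2))
```

As the abstract cautions, a high ratio can reflect overall base composition rather than a specific functional motif, which is why the choice of background set matters.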

  9. Comparison of SIV and HIV-1 genomic RNA structures reveals impact of sequence evolution on conserved and non-conserved structural motifs.

    PubMed

    Pollom, Elizabeth; Dang, Kristen K; Potter, E Lake; Gorelick, Robert J; Burch, Christina L; Weeks, Kevin M; Swanstrom, Ronald

    2013-01-01

    RNA secondary structure plays a central role in the replication and metabolism of all RNA viruses, including retroviruses like HIV-1. However, structures with known function represent only a fraction of the secondary structure reported for HIV-1(NL4-3). One tool to assess the importance of RNA structures is to examine their conservation over evolutionary time. To this end, we used SHAPE to model the secondary structure of a second primate lentiviral genome, SIVmac239, which shares only 50% sequence identity at the nucleotide level with HIV-1(NL4-3). Only about half of the paired nucleotides are paired in both genomic RNAs and, across the genome, just 71 base pairs form with the same pairing partner in both genomes. On average the RNA secondary structure is thus evolving at a much faster rate than the sequence. Structure at the Gag-Pro-Pol frameshift site is maintained but in a significantly altered form, while the impact of selection for maintaining a protein binding interaction can be seen in the conservation of pairing partners in the small RRE stems where Rev binds. Structures that are conserved between SIVmac239 and HIV-1(NL4-3) also occur at the 5' polyadenylation sequence, in the plus strand primer sites, PPT and cPPT, and in the stem-loop structure that includes the first splice acceptor site. The two genomes are adenosine-rich and cytidine-poor. The structured regions are enriched in guanosines, while unpaired regions are enriched in adenosines, and functionally important structures have stronger base pairing than nonconserved structures. We conclude that much of the secondary structure is the result of fortuitous pairing in a metastable state that reforms during sequence evolution. However, secondary structure elements with important function are stabilized by higher guanosine content that allows regions of structure to persist as sequence evolution proceeds, and, within the confines of selective pressure, allows structures to evolve. PMID:23593004

  10. Detecting correlations among functional-sequence motifs

    NASA Astrophysics Data System (ADS)

    Pirino, Davide; Rigosa, Jacopo; Ledda, Alice; Ferretti, Luca

    2012-06-01

    Sequence motifs are words of nucleotides in DNA with biological functions, e.g., gene regulation. Identification of such words proceeds through rejection of Markov models on the expected motif frequency along the genome. Additional biological information can be extracted from the correlation structure among patterns of motif occurrences. In this paper a log-linear multivariate intensity Poisson model is estimated via expectation maximization on a set of motifs along the genome of E. coli K12. The proposed approach allows for excitatory as well as inhibitory interactions among motifs and between motifs and other genomic features like gene occurrences. Our findings confirm previous stylized facts about such types of interactions and shed new light on genome-maintenance functions of some particular motifs. We expect these methods to be applicable to a wider set of genomic features.

  11. Networks of motifs from sequences of symbols.

    PubMed

    Sinatra, Roberta; Condorelli, Daniele; Latora, Vito

    2010-10-22

    We introduce a method to convert an ensemble of sequences of symbols into a weighted directed network whose nodes are motifs, while the directed links and their weights are defined from statistically significant co-occurrences of two motifs in the same sequence. The analysis of communities of networks of motifs is shown to be able to correlate sequences with functions in the human proteome database, to detect hot topics from online social dialogs, to characterize trajectories of dynamical systems, and it might find other useful applications to process large amounts of data in various fields.

  12. Networks of Motifs from Sequences of Symbols

    NASA Astrophysics Data System (ADS)

    Sinatra, Roberta; Condorelli, Daniele; Latora, Vito

    2010-10-01

    We introduce a method to convert an ensemble of sequences of symbols into a weighted directed network whose nodes are motifs, while the directed links and their weights are defined from statistically significant co-occurrences of two motifs in the same sequence. The analysis of communities of networks of motifs is shown to be able to correlate sequences with functions in the human proteome database, to detect hot topics from online social dialogs, to characterize trajectories of dynamical systems, and it might find other useful applications to process large amounts of data in various fields.

  13. SCANMOT: searching for similar sequences using a simultaneous scan of multiple sequence motifs.

    PubMed

    Chakrabarti, Saikat; Anand, A Prem; Bhardwaj, Nitin; Pugalenthi, Ganesan; Sowdhamini, R

    2005-07-01

    Establishment of similarities between proteins is very important for the study of the relationship between sequence, structure and function and for the analysis of evolutionary relationships. Motif-based search methods play a crucial role in establishing the connections between proteins that are particularly useful for distant relationships. This paper reports SCANMOT, a web-based server that searches for similarities between proteins by simultaneous matching of multiple motifs. SCANMOT searches for similar sequences in entire sequence databases using multiple conserved regions and utilizes inter-motif spacing as restraints. The SCANMOT server is available via http://www.ncbs.res.in/~faculty/mini/scanmot/scanmot.html.

  14. Discovering Motifs in Biological Sequences Using the Micron Automata Processor.

    PubMed

    Roy, Indranil; Aluru, Srinivas

    2016-01-01

    Finding approximately conserved sequences, called motifs, across multiple DNA or protein sequences is an important problem in computational biology. In this paper, we consider the (l, d) motif search problem of identifying one or more motifs of length l present in at least q of the n given sequences, with each occurrence differing from the motif in at most d substitutions. The problem is known to be NP-complete, and the largest solved instance reported to date is (26,11). We propose a novel algorithm for the (l,d) motif search problem using streaming execution over a large set of non-deterministic finite automata (NFA). This solution is designed to take advantage of the micron automata processor, a new technology close to deployment that can simultaneously execute multiple NFA in parallel. We demonstrate the capability for solving much larger instances of the (l, d) motif search problem using the resources available within a single automata processor board, by estimating run-times for problem instances (39,18) and (40,17). The paper serves as a useful guide to solving problems using this new accelerator technology. PMID:26886735
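
To make the (l, d) problem statement concrete, the following is a brute-force reference sketch. It enumerates every candidate string of length l and keeps those that occur, with at most d substitutions, in at least q of the input sequences. It is exponential in l and is unrelated to the automata-based algorithm of the paper. The function names and the toy input are assumptions for illustration.

```python
from itertools import product

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def occurs(motif, seq, d):
    """True if some window of seq is within d substitutions of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))

def ld_motif_search(seqs, l, d, q=None, alphabet="ACGT"):
    """Return all length-l motifs found (up to d mismatches) in at least q sequences."""
    q = len(seqs) if q is None else q
    hits = []
    for candidate in product(alphabet, repeat=l):   # |alphabet|**l candidates
        motif = "".join(candidate)
        if sum(occurs(motif, s, d) for s in seqs) >= q:
            hits.append(motif)
    return hits

if __name__ == "__main__":
    # Toy input: ACGT planted with at most one substitution per sequence.
    seqs = ["TTACGTTT", "GGACGAGG", "CCTCGTCC"]
    print(ld_motif_search(seqs, l=4, d=1))
```

The candidate space grows as 4^l for DNA, which is why instances such as those quoted in the abstract require specialised algorithms or hardware.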

  15. Sequence conserved for subcellular localization

    PubMed Central

    Nair, Rajesh; Rost, Burkhard

    2002-01-01

    The more proteins diverged in sequence, the more difficult it becomes for bioinformatics to infer similarities of protein function and structure from sequence. The precise thresholds used in automated genome annotations depend on the particular aspect of protein function transferred by homology. Here, we presented the first large-scale analysis of the relation between sequence similarity and identity in subcellular localization. Three results stood out: (1) The subcellular compartment is generally more conserved than what might have been expected given that short sequence motifs like nuclear localization signals can alter the native compartment; (2) the sequence conservation of localization is similar between different compartments; and (3) it is similar to the conservation of structure and enzymatic activity. In particular, we found the transition between the regions of conserved and nonconserved localization to be very sharp, although the thresholds for conservation were less well defined than for structure and enzymatic activity. We found that a simple measure for sequence similarity accounting for pairwise sequence identity and alignment length, the HSSP distance, distinguished accurately between protein pairs of identical and different localizations. In fact, BLAST expectation values outperformed the HSSP distance only for alignments in the subtwilight zone. We succeeded in slightly improving the accuracy of inferring localization through homology by fine tuning the thresholds. Finally, we applied our results to the entire SWISS-PROT database and five entirely sequenced eukaryotes. PMID:12441382

  16. PhyME: a software tool for finding motifs in sets of orthologous sequences.

    PubMed

    Sinha, Saurabh

    2007-01-01

    Discovery of transcription factor binding sites is a crucial and challenging problem in bioinformatics. Several computational tools have been developed for this problem, popularly known as the motif-finding problem. PhyME is an ab initio motif-finding algorithm, which finds overrepresented motifs in input sequences while accounting for their evolutionary conservation in orthologs of those sequences. Here, we describe the usage of this algorithm, publicly available as a Linux-based implementation. PMID:17993682

  17. Motif3D: Relating protein sequence motifs to 3D structure.

    PubMed

    Gaulton, Anna; Attwood, Teresa K

    2003-07-01

    Motif3D is a web-based protein structure viewer designed to allow sequence motifs, and in particular those contained in the fingerprints of the PRINTS database, to be visualised on three-dimensional (3D) structures. Additional functionality is provided for the rhodopsin-like G protein-coupled receptors, enabling fingerprint motifs of any of the receptors in this family to be mapped onto the single structure available, that of bovine rhodopsin. Motif3D can be used via the web interface available at: http://www.bioinf.man.ac.uk/dbbrowser/motif3d/motif3d.html.

  18. Sequence-Based Screening for Rare Enzymes: New Insights into the World of AMDases Reveal a Conserved Motif and 58 Novel Enzymes Clustering in Eight Distinct Families

    PubMed Central

    Maimanakos, Janine; Chow, Jennifer; Gaßmeyer, Sarah K.; Güllert, Simon; Busch, Florian; Kourist, Robert; Streit, Wolfgang R.

    2016-01-01

    Arylmalonate Decarboxylases (AMDases, EC 4.1.1.76) are very rare and mostly underexplored enzymes. Currently only four known and biochemically characterized representatives exist. However, their ability to decarboxylate α-disubstituted malonic acid derivatives to optically pure products without cofactors makes them attractive and promising candidates for the use as biocatalysts in industrial processes. Until now, AMDases could not be separated from other members of the aspartate/glutamate racemase superfamily based on their gene sequences. Within this work, a search algorithm was developed that enables a reliable prediction of AMDase activity for potential candidates. Based on specific sequence patterns and screening methods 58 novel AMDase candidate genes could be identified in this work. Thereby, AMDases with the conserved sequence pattern of Bordetella bronchiseptica’s prototype appeared to be limited to the classes of Alpha-, Beta-, and Gamma-proteobacteria. Amino acid homologies and comparison of gene surrounding sequences enabled the classification of eight enzyme clusters. Particularly striking is the accumulation of genes coding for different transporters of the tripartite tricarboxylate transporters family, TRAP transporters and ABC transporters as well as genes coding for mandelate racemases/muconate lactonizing enzymes that might be involved in substrate uptake or degradation of AMDase products. Further, three novel AMDases were characterized which showed a high enantiomeric excess (>99%) of the (R)-enantiomer of flurbiprofen. These are the recombinant AmdA and AmdV from Variovorax sp. strains HH01 and HH02, originated from soil, and AmdP from Polymorphum gilvum found by a data base search. Altogether our findings give new insights into the class of AMDases and reveal many previously unknown enzyme candidates with high potential for bioindustrial processes. PMID:27610105

  19. Sequence-Based Screening for Rare Enzymes: New Insights into the World of AMDases Reveal a Conserved Motif and 58 Novel Enzymes Clustering in Eight Distinct Families.

    PubMed

    Maimanakos, Janine; Chow, Jennifer; Gaßmeyer, Sarah K; Güllert, Simon; Busch, Florian; Kourist, Robert; Streit, Wolfgang R

    2016-01-01

    Arylmalonate Decarboxylases (AMDases, EC 4.1.1.76) are very rare and mostly underexplored enzymes. Currently only four known and biochemically characterized representatives exist. However, their ability to decarboxylate α-disubstituted malonic acid derivatives to optically pure products without cofactors makes them attractive and promising candidates for the use as biocatalysts in industrial processes. Until now, AMDases could not be separated from other members of the aspartate/glutamate racemase superfamily based on their gene sequences. Within this work, a search algorithm was developed that enables a reliable prediction of AMDase activity for potential candidates. Based on specific sequence patterns and screening methods 58 novel AMDase candidate genes could be identified in this work. Thereby, AMDases with the conserved sequence pattern of Bordetella bronchiseptica's prototype appeared to be limited to the classes of Alpha-, Beta-, and Gamma-proteobacteria. Amino acid homologies and comparison of gene surrounding sequences enabled the classification of eight enzyme clusters. Particularly striking is the accumulation of genes coding for different transporters of the tripartite tricarboxylate transporters family, TRAP transporters and ABC transporters as well as genes coding for mandelate racemases/muconate lactonizing enzymes that might be involved in substrate uptake or degradation of AMDase products. Further, three novel AMDases were characterized which showed a high enantiomeric excess (>99%) of the (R)-enantiomer of flurbiprofen. These are the recombinant AmdA and AmdV from Variovorax sp. strains HH01 and HH02, originated from soil, and AmdP from Polymorphum gilvum found by a data base search. Altogether our findings give new insights into the class of AMDases and reveal many previously unknown enzyme candidates with high potential for bioindustrial processes. PMID:27610105

  1. Conservation defines functional motifs in the squint/nodal-related 1 RNA dorsal localization element

    PubMed Central

    Gilligan, Patrick C.; Kumari, Pooja; Lim, Shimin; Cheong, Albert; Chang, Alex; Sampath, Karuna

    2011-01-01

    RNA localization is emerging as a general principle of sub-cellular protein localization and cellular organization. However, the sequence and structural requirements in many RNA localization elements remain poorly understood. Whereas transcription factor-binding sites in DNA can be recognized as short degenerate motifs, and consensus binding sites readily inferred, protein-binding sites in RNA often contain structural features, and can be difficult to infer. We previously showed that zebrafish squint/nodal-related 1 (sqt/ndr1) RNA localizes to the future dorsal side of the embryo. Interestingly, mammalian nodal RNA can also localize to dorsal when injected into zebrafish embryos, suggesting that the sequence motif(s) may be conserved, even though the fish and mammal UTRs cannot be aligned. To define potential sequence and structural features, we obtained ndr1 3′-UTR sequences from approximately 50 fishes that are closely, or distantly, related to zebrafish, for high-resolution phylogenetic footprinting. We identify conserved sequence and structural motifs within the zebrafish/carp family and catfish. We find that two novel motifs, a single-stranded AGCAC motif and a small stem-loop, are required for efficient sqt RNA localization. These findings show that comparative sequencing in the zebrafish/carp family is an efficient approach for identifying weak consensus binding sites for RNA regulatory proteins. PMID:21149265

  2. Conserved sequence elements associated with exon skipping

    PubMed Central

    Miriami, Elana; Margalit, Hanah; Sperling, Ruth

    2003-01-01

    One of the major forms of alternative splicing, which generates multiple mRNA isoforms differing in the precise combinations of their exon sequences, is exon skipping. While in constitutive splicing all exons are included, in the skipped pattern(s) one or more exons are skipped. The regulation of this process is still not well understood; so far, cis- regulatory elements (such as exonic splicing enhancers) were identified in individual cases. We therefore set to investigate the possibility that exon skipping is controlled by sequences in the adjacent introns. We employed a computer analysis on 54 sequences documented as undergoing exon skipping, and identified two motifs both in the upstream and downstream introns of the skipped exons. One motif is highly enriched in pyrimidines (mostly C residues), and the other motif is highly enriched in purines (mostly G residues). The two motifs differ from the known cis-elements present at the 5′ and 3′ splice site. Interestingly, the two motifs are complementary, and their relative positional order is conserved in the flanking introns. These suggest that base pairing interactions can underlie a mechanism that involves secondary structure to regulate exon skipping. Remarkably, the two motifs are conserved in mouse orthologous genes that undergo exon skipping. PMID:12655015

  3. Occurrence probability of structured motifs in random sequences.

    PubMed

    Robin, S; Daudin, J-J; Richard, H; Sagot, M-F; Schbath, S

    2002-01-01

    The problem of extracting from a set of nucleic acid sequences motifs which may have biological function is more and more important. In this paper, we are interested in particular motifs that may be implicated in the transcription process. These motifs, called structured motifs, are composed of two ordered parts separated by a variable distance and allowing for substitutions. In order to assess their statistical significance, we propose approximations of the probability of occurrences of such a structured motif in a given sequence. An application of our method to evaluate candidate promoters in E. coli and B. subtilis is presented. Simulations show the goodness of the approximations. PMID:12614545
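
    To make the definition of a structured motif concrete, the following Python sketch scans a sequence for two ordered boxes separated by a variable gap, allowing a limited number of substitutions in each box. It only illustrates the motif definition; it does not implement the probability approximations proposed in the paper. The function names, the per-box substitution limit, and the promoter-like boxes TTGACA and TATAAT with a 16 to 18 bp spacer are assumptions chosen for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_structured_motif(seq, left, right, min_gap, max_gap, max_subs=0):
    """Return (left_start, right_start) for every occurrence of
    left + gap + right, where min_gap <= gap <= max_gap and each box
    may differ from its pattern by at most max_subs substitutions."""
    hits = []
    n = len(seq)
    for i in range(n - len(left) + 1):
        if hamming(seq[i:i + len(left)], left) > max_subs:
            continue
        for gap in range(min_gap, max_gap + 1):
            j = i + len(left) + gap
            if j + len(right) > n:
                break  # larger gaps would run past the end of seq
            if hamming(seq[j:j + len(right)], right) <= max_subs:
                hits.append((i, j))
    return hits


# Toy sequence: two boxes separated by a 17-base spacer.
toy = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "CC"
print(find_structured_motif(toy, "TTGACA", "TATAAT", 16, 18))
```

    Raising max_subs quickly increases the number of chance matches, which is the reason a significance assessment such as the one described above is needed.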

  4. iMotifs: an integrated sequence motif visualization and analysis environment

    PubMed Central

    Piipari, Matias; Down, Thomas A.; Saini, Harpreet; Enright, Anton; Hubbard, Tim J.P.

    2010-01-01

    Motivation: Short sequence motifs are an important class of models in molecular biology, used most commonly for describing transcription factor binding site specificity patterns. High-throughput methods have been recently developed for detecting regulatory factor binding sites in vivo and in vitro and consequently high-quality binding site motif data are becoming available for an increasing number of organisms and regulatory factors. Development of intuitive tools for the study of sequence motifs is therefore important. iMotifs is a graphical motif analysis environment that allows visualization of annotated sequence motifs and scored motif hits in sequences. It also offers motif inference with the sensitive NestedMICA algorithm, as well as overrepresentation and pairwise motif matching capabilities. All of the analysis functionality is provided without the need to convert between file formats or learn different command line interfaces. The application includes a bundled and graphically integrated version of the NestedMICA motif inference suite that has no outside dependencies. Problems associated with local deployment of software are therefore avoided. Availability: iMotifs is licensed with the GNU Lesser General Public License v2.0 (LGPL 2.0). The software and its source are available at http://wiki.github.com/mz2/imotifs and can be run on Mac OS X Leopard (Intel/PowerPC). We also provide a cross-platform (Linux, OS X, Windows) LGPL 2.0 licensed library libxms for the Perl, Ruby, R and Objective-C programming languages for input and output of XMS formatted annotated sequence motif set files. Contact: matias.piipari@gmail.com; imotifs@googlegroups.com PMID:20106815

  5. Identifying novel sequence variants of RNA 3D motifs

    PubMed Central

    Zirbel, Craig L.; Roll, James; Sweeney, Blake A.; Petrov, Anton I.; Pirrung, Meg; Leontis, Neocles B.

    2015-01-01

    Predicting RNA 3D structure from sequence is a major challenge in biophysics. An important sub-goal is accurately identifying recurrent 3D motifs from RNA internal and hairpin loop sequences extracted from secondary structure (2D) diagrams. We have developed and validated new probabilistic models for 3D motif sequences based on hybrid Stochastic Context-Free Grammars and Markov Random Fields (SCFG/MRF). The SCFG/MRF models are constructed using atomic-resolution RNA 3D structures. To parameterize each model, we use all instances of each motif found in the RNA 3D Motif Atlas and annotations of pairwise nucleotide interactions generated by the FR3D software. Isostericity relations between non-Watson–Crick basepairs are used in scoring sequence variants. SCFG techniques model nested pairs and insertions, while MRF ideas handle crossing interactions and base triples. We use test sets of randomly-generated sequences to set acceptance and rejection thresholds for each motif group and thus control the false positive rate. Validation was carried out by comparing results for four motif groups to RMDetect. The software developed for sequence scoring (JAR3D) is structured to automatically incorporate new motifs as they accumulate in the RNA 3D Motif Atlas when new structures are solved and is available free for download. PMID:26130723

  6. Sequence Bundles: a novel method for visualising, discovering and exploring sequence motifs

    PubMed Central

    2014-01-01

    Background We introduce Sequence Bundles--a novel data visualisation method for representing multiple sequence alignments (MSAs). We identify and address key limitations of the existing bioinformatics data visualisation methods (i.e. the Sequence Logo) by enabling Sequence Bundles to give salient visual expression to sequence motifs and other data features, which would otherwise remain hidden. Methods For the development of Sequence Bundles we employed research-led information design methodologies. Sequences are encoded as uninterrupted, semi-opaque lines plotted on a 2-dimensional reconfigurable grid. Each line represents a single sequence. The thickness and opacity of the stack at each residue in each position indicates the level of conservation and the lines' curved paths expose patterns in correlation and functionality. Several MSAs can be visualised in a composite image. The Sequence Bundles method is designed to favour a tangible, continuous and intuitive display of information. Results We have developed a software demonstration application for generating a Sequence Bundles visualisation of MSAs provided for the BioVis 2013 redesign contest. A subsequent exploration of the visualised line patterns allowed for the discovery of a number of interesting features in the dataset. Reported features include the extreme conservation of sequences displaying a specific residue and bifurcations of the consensus sequence. Conclusions Sequence Bundles is a novel method for visualisation of MSAs and the discovery of sequence motifs. It can aid in generating new insight and hypothesis making. Sequence Bundles is well disposed for future implementation as an interactive visual analytics software, which can complement existing visualisation tools. PMID:25237395

  7. Function-based classification of carbohydrate-active enzymes by recognition of short, conserved peptide motifs.

    PubMed

    Busk, Peter Kamp; Lange, Lene

    2013-06-01

    Functional prediction of carbohydrate-active enzymes is difficult due to low sequence identity. However, similar enzymes often share a few short motifs, e.g., around the active site, even when the overall sequences are very different. To exploit this notion for functional prediction of carbohydrate-active enzymes, we developed a simple algorithm, peptide pattern recognition (PPR), that can divide proteins into groups of sequences that share a set of short conserved sequences. When this method was used on 118 glycoside hydrolase 5 proteins with 9% average pairwise identity and representing four characterized enzymatic functions, 97% of the proteins were sorted into groups correlating with their enzymatic activity. Furthermore, we analyzed 8,138 glycoside hydrolase 13 proteins including 204 experimentally characterized enzymes with 28 different functions. There was a 91% correlation between group and enzyme activity. These results indicate that the function of carbohydrate-active enzymes can be predicted with high precision by finding short, conserved motifs in their sequences. The glycoside hydrolase 61 family is important for fungal biomass conversion, but only a few proteins of this family have been functionally characterized. Interestingly, PPR divided 743 glycoside hydrolase 61 proteins into 16 subfamilies useful for targeted investigation of the function of these proteins and pinpointed three conserved motifs with putative importance for enzyme activity. Furthermore, the conserved sequences were useful for cloning of new, subfamily-specific glycoside hydrolase 61 proteins from 14 fungi. In conclusion, identification of conserved sequence motifs is a new approach to sequence analysis that can predict carbohydrate-active enzyme functions with high precision. PMID:23524681

  10. [Conserved motifs in the primary and secondary ITS1 structures in bryophytes].

    PubMed

    Milyutina, I A; Ignatov, M S

    2015-01-01

    A study of the ITS1 nucleotide sequences of 1000 moss species of 62 families, 11 liverwort species from five orders, and one hornwort Anthoceros agrestis identified five highly conserved motifs (CM1-CM5), which are presumably involved in pre-rRNA processing. Although the ITS1 sequences substantially differ in length and the extent of divergence, the conserved motifs are found in all of them. ITS1 secondary structures were constructed for 76 mosses, and main regularities at conserved motif positioning were observed. The positions of processing sites in the ITS1 secondary structure of the yeast Saccharomyces cerevisiae were found to be similar to the positions of the conserved motifs in the ITS1 secondary structures of mosses and liverworts. In addition, a potential hairpin formation in the putative secondary structure of a pre-rRNA fragment was considered for the region between ITS1 CM4-CM5 and a highly conserved region between hairpins 49 and 50 (H49 and H50) of the 18S rRNA. PMID:26107892

  11. Transmembrane helix dimerization: beyond the search for sequence motifs.

    PubMed

    Li, Edwin; Wimley, William C; Hristova, Kalina

    2012-02-01

    Studies of the dimerization of transmembrane (TM) helices have been ongoing for many years now, and have provided clues to the fundamental principles behind membrane protein (MP) folding. Our understanding of TM helix dimerization has been dominated by the idea that sequence motifs, simple recognizable amino acid sequences that drive lateral interaction, can be used to explain and predict the lateral interactions between TM helices in membrane proteins. But as more and more unique interacting helices are characterized, it is becoming clear that the sequence motif paradigm is incomplete. Experimental evidence suggests that the search for sequence motifs, as mediators of TM helix dimerization, cannot solve the membrane protein folding problem alone. Here we review the current understanding in the field, as it has evolved from the paradigm of sequence motifs into a view in which the interactions between TM helices are much more complex. This article is part of a Special Issue entitled: Membrane protein structure and function.

  12. Finding the most significant common sequence and structure motifs in a set of RNA sequences.

    PubMed Central

    Gorodkin, J; Heyer, L J; Stormo, G D

    1997-01-01

    We present a computational scheme to locally align a collection of RNA sequences using sequence and structure constraints. In addition, the method searches for the resulting alignments with the most significant common motifs, among all possible collections. The first part utilizes a simplified version of the Sankoff algorithm for simultaneous folding and alignment of RNA sequences, but maintains tractability by constructing multi-sequence alignments from pairwise comparisons. The algorithm finds the multiple alignments using a greedy approach and has similarities to both CLUSTAL and CONSENSUS, but the core algorithm assures that the pairwise alignments are optimized for both sequence and structure conservation. The choice of scoring system and the method of progressively constructing the final solution are important considerations that are discussed. Example solutions, and comparisons with other approaches, are provided. The solutions include finding consensus structures identical to published ones. PMID:9278497

  13. Fission Yeast Hotspot Sequence Motifs Are Also Active in Budding Yeast

    PubMed Central

    Steiner, Walter W.; Steiner, Estelle M.

    2012-01-01

    In most organisms, including humans, meiotic recombination occurs preferentially at a limited number of sites in the genome known as hotspots. There has been substantial progress recently in elucidating the factors determining the location of meiotic recombination hotspots, and it is becoming clear that simple sequence motifs play a significant role. In S. pombe, there are at least five unique sequence motifs that have been shown to produce hotspots of recombination, and it is likely that there are more. In S. cerevisiae, simple sequence motifs have also been shown to produce hotspots or show significant correlations with hotspots. Some of the hotspot motifs in both yeasts are known or suspected to bind transcription factors (TFs), which are required for the activity of those hotspots. Here we show that four of the five hotspot motifs identified in S. pombe also create hotspots in the distantly related budding yeast S. cerevisiae. For one of these hotspots, M26 (also called CRE), we identify TFs, Cst6 and Sko1, that activate and inhibit the hotspot, respectively. In addition, two of the hotspot motifs show significant correlations with naturally occurring hotspots. The conservation of these hotspots between the distantly related fission and budding yeasts suggests that these sequence motifs, and others yet to be discovered, may function widely as hotspots in many diverse organisms. PMID:23300865

  14. Dipeptide frequency/bias analysis identifies conserved sites of nonrandomness shared by cysteine-rich motifs.

    PubMed

    Campion, S R; Ameen, A S; Lai, L; King, J M; Munzenmaier, T N

    2001-08-15

    This report describes the application of a simple computational tool, AAPAIR.TAB, for the systematic analysis of the cysteine-rich EGF, Sushi, and Laminin motif/sequence families at the two-amino acid level. Automated dipeptide frequency/bias analysis detects preferences in the distribution of amino acids in established protein families, by determining which "ordered dipeptides" occur most frequently in comprehensive motif-specific sequence data sets. Graphic display of the dipeptide frequency/bias data revealed family-specific preferences for certain dipeptides, but more importantly detected a shared preference for employment of the ordered dipeptides Gly-Tyr (GY) and Gly-Phe (GF) in all three protein families. The dipeptide Asn-Gly (NG) also exhibited high-frequency and bias in the EGF and Sushi motif families, whereas Asn-Thr (NT) was distinguished in the Laminin family. Evaluation of the distribution of dipeptides identified by frequency/bias analysis subsequently revealed the highly restricted localization of the G(F/Y) and N(G/T) sequence elements at two separate sites of extreme conservation in the consensus sequence of all three sequence families. The similar employment of the high-frequency/bias dipeptides in three distinct protein sequence families was further correlated with the concurrence of these shared molecular determinants at similar positions within the distinctive scaffolds of three structurally divergent, but similarly employed, motif modules.
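
    The idea of ordered-dipeptide frequency/bias analysis can be sketched in a few lines of Python. AAPAIR.TAB itself is not reproduced here: the function name is hypothetical, the toy sequences are made up, and "bias" is assumed to be the ratio of the observed dipeptide frequency to the frequency expected from single-residue composition, which may differ from the definition used by the authors.

```python
from collections import Counter


def dipeptide_stats(seqs):
    """Map each ordered dipeptide to (count, bias), where bias is the
    observed dipeptide frequency divided by the product of the two
    single-residue frequencies (an assumed definition of bias)."""
    pair_counts = Counter()
    residue_counts = Counter()
    for s in seqs:
        residue_counts.update(s)
        pair_counts.update(s[i:i + 2] for i in range(len(s) - 1))
    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())
    stats = {}
    for pair, count in pair_counts.items():
        observed = count / total_pairs
        expected = (residue_counts[pair[0]] / total_residues) * (
            residue_counts[pair[1]] / total_residues
        )
        stats[pair] = (count, observed / expected)
    return stats


# Toy (made-up) sequences in which Gly-Tyr (GY) recurs.
stats = dipeptide_stats(["CGYC", "AGYA"])
for pair, (count, bias) in sorted(stats.items(), key=lambda kv: -kv[1][0]):
    print(pair, count, round(bias, 2))
```

    A bias above 1 indicates that a dipeptide occurs more often than its constituent residues would suggest; on real motif families the counts would come from a curated alignment or sequence set rather than two short strings.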

  15. Annotating RNA motifs in sequences and alignments

    PubMed Central

    Gardner, Paul P.; Eldai, Hisham

    2015-01-01

    RNA performs a diverse array of important functions across all cellular life. These functions include important roles in translation, building translational machinery and maturing messenger RNA. More recent discoveries include the miRNAs and bacterial sRNAs that regulate gene expression, the thermosensors, riboswitches and other cis-regulatory elements that help prokaryotes sense their environment and eukaryotic piRNAs that suppress transposition. However, there can be a long period between the initial discovery of an RNA and determining its function. We present a bioinformatic approach to characterize RNA motifs, which are critical components of many RNA structure–function relationships. These motifs can, in some instances, provide researchers with functional hypotheses for uncharacterized RNAs. Moreover, we introduce a new profile-based database of RNA motifs, RMfam, and illustrate some applications for investigating the evolution and functional characterization of RNA. All the data and scripts associated with this work are available from: https://github.com/ppgardne/RMfam. PMID:25520192

  16. Phosphatidylinositol transfer proteins: sequence motifs in structural and evolutionary analyses

    PubMed Central

    Wyckoff, Gerald J.; Solidar, Ada; Yoden, Marilyn D.

    2016-01-01

    Phosphatidylinositol transfer proteins (PITP) are a family of monomeric proteins that bind and transfer phosphatidylinositol and phosphatidylcholine between membrane compartments. They are required for production of inositol and diacylglycerol second messengers, and are found in most metazoan organisms. While PITPs are known to carry out crucial cell-signaling roles in many organisms, the structure, function and evolution of the majority of family members remain unexplored, primarily because the ubiquity and diversity of the family thwart traditional methods of global alignment. To surmount this obstacle, we instead took a novel approach, using MEME and a parsimony-based analysis to create a cladogram of conserved sequence motifs in 56 PITP family proteins from 26 species. In keeping with previous functional annotations, three clades were supported within our evolutionary analysis: two classes of soluble proteins and a class of membrane-associated proteins. By focusing on conserved regions, the analysis allowed for in-depth queries regarding possible functional roles of PITP proteins in both intra- and extracellular signaling. PMID:27429707

  17. Discovering motifs in ranked lists of DNA sequences.

    PubMed

    Eden, Eran; Lipson, Doron; Yogev, Sivan; Yakhini, Zohar

    2007-03-23

    Computational methods for discovery of sequence elements that are enriched in a target set compared with a background set are fundamental in molecular biology research. One example is the discovery of transcription factor binding motifs that are inferred from ChIP-chip (chromatin immuno-precipitation on a microarray) measurements. Several major challenges in sequence motif discovery still require consideration: (i) the need for a principled approach to partitioning the data into target and background sets; (ii) the lack of rigorous models and of an exact p-value for measuring motif enrichment; (iii) the need for an appropriate framework for accounting for motif multiplicity; (iv) the tendency, in many of the existing methods, to report presumably significant motifs even when applied to randomly generated data. In this paper we present a statistical framework for discovering enriched sequence elements in ranked lists that resolves these four issues. We demonstrate the implementation of this framework in a software application, termed DRIM (discovery of rank imbalanced motifs), which identifies sequence motifs in lists of ranked DNA sequences. We applied DRIM to ChIP-chip and CpG methylation data and obtained the following results. (i) Identification of 50 novel putative transcription factor (TF) binding sites in yeast ChIP-chip data. The biological function of some of them was further investigated to gain new insights on transcription regulation networks in yeast. For example, our discoveries enable the elucidation of the network of the TF ARO80. Another finding concerns a systematic TF binding enhancement to sequences containing CA repeats. (ii) Discovery of novel motifs in human cancer CpG methylation data. Remarkably, most of these motifs are similar to DNA sequence elements bound by the Polycomb complex that promotes histone methylation. Our findings thus support a model in which histone methylation and CpG methylation are mechanistically linked. Overall, we

  19. A Conserved Motif Provides Binding Specificity to the PP2A-B56 Phosphatase.

    PubMed

    Hertz, Emil Peter Thrane; Kruse, Thomas; Davey, Norman E; López-Méndez, Blanca; Sigurðsson, Jón Otti; Montoya, Guillermo; Olsen, Jesper V; Nilsson, Jakob

    2016-08-18

    Dynamic protein phosphorylation is a fundamental mechanism regulating biological processes in all organisms. Protein phosphatase 2A (PP2A) is the main source of phosphatase activity in the cell, but the molecular details of substrate recognition are unknown. Here, we report that a conserved surface-exposed pocket on PP2A regulatory B56 subunits binds to a consensus sequence on interacting proteins, which we term the LxxIxE motif. The composition of the motif modulates the affinity for B56, which in turn determines the phosphorylation status of associated substrates. Phosphorylation of amino acid residues within the motif increases B56 binding, allowing integration of kinase and phosphatase activity. We identify conserved LxxIxE motifs in essential proteins throughout the eukaryotic domain of life and in human viruses, suggesting that the motifs are required for basic cellular function. Our study provides a molecular description of PP2A binding specificity with broad implications for understanding signaling in eukaryotes. PMID:27453045
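
    A consensus such as LxxIxE, where x stands for any residue, maps directly onto a regular expression. The Python sketch below scans a protein sequence for strict matches; the function name and the toy sequence are made up for illustration. It deliberately ignores the affinity-modulating composition and phosphorylation effects described in the abstract, so a match is only a candidate site.

```python
import re

# Lookahead so that overlapping candidate sites are all reported.
LXXIXE = re.compile(r"(?=(L..I.E))")


def find_lxxixe(protein: str):
    """Return (0-based start, matched hexapeptide) for each strict
    LxxIxE match in a one-letter-code protein sequence."""
    return [(m.start(), m.group(1)) for m in LXXIXE.finditer(protein.upper())]


# Toy (made-up) sequence containing one candidate site.
print(find_lxxixe("MKLSPIAEGG"))
```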

  20. Oligonucleotide Sequence Motifs as Nucleosome Positioning Signals

    PubMed Central

    Collings, Clayton K.; Fernandez, Alfonso G.; Pitschka, Chad G.; Hawkins, Troy B.; Anderson, John N.

    2010-01-01

    To gain a better understanding of the sequence patterns that characterize positioned nucleosomes, we first performed an analysis of the periodicities of the 256 tetranucleotides in a yeast genome-wide library of nucleosomal DNA sequences that was prepared by in vitro reconstitution. The approach entailed the identification and analysis of 24 unique tetranucleotides that were defined by 8 consensus sequences. These consensus sequences were shown to be responsible for most if not all of the tetranucleotide and dinucleotide periodicities displayed by the entire library, demonstrating that the periodicities of dinucleotides that characterize the yeast genome are, in actuality, due primarily to the 8 consensus sequences. A novel combination of experimental and bioinformatic approaches was then used to show that these tetranucleotides are important for preferred formation of nucleosomes at specific sites along DNA in vitro. These results were then compared to tetranucleotide patterns in genome-wide in vivo libraries from yeast and C. elegans in order to assess the contributions of DNA sequence in the control of nucleosome residency in the cell. These comparisons revealed striking similarities in the tetranucleotide occurrence profiles that are likely to be involved in nucleosome positioning in both in vitro and in vivo libraries, suggesting that DNA sequence is an important factor in the control of nucleosome placement in vivo. However, the strengths of the tetranucleotide periodicities were 3–4 fold higher in the in vitro as compared to the in vivo libraries, which implies that DNA sequence plays less of a role in dictating nucleosome positions in vivo. The results of this study have important implications for models of sequence-dependent positioning since they suggest that a defined subset of tetranucleotides is involved in preferred nucleosome occupancy and that these tetranucleotides are the major source of the dinucleotide periodicities that are characteristic of

  1. A conserved heptamer motif for ribosomal RNA transcription termination in animal mitochondria.

    PubMed Central

    Valverde, J R; Marco, R; Garesse, R

    1994-01-01

    A search of sequence data bases for a tridecamer transcription termination signal, previously described in human mtDNA as being responsible for the accumulation of mitochondrial ribosomal RNAs (rRNAs) in excess over the rest of mitochondrial genes, has revealed that this termination signal occurs in equivalent positions in a wide variety of organisms from protozoa to mammals. Due to the compact organization of the mtDNA, the tridecamer motif usually appears as part of the 3' adjacent gene sequence. Because in phylogenetically widely separated organisms the mitochondrial genome has experienced many rearrangements, it is interesting that its occurrence near the 3' end of the large rRNA is independent of the adjacent gene. The tridecamer sequence has diverged in phylogenetically widely separated organisms. Nevertheless, a well-conserved heptamer--TGGCAGA, the mitochondrial rRNA termination box--can be defined. Although extending the experimental evidence of its role as a transcription termination signal in humans will be of great interest, its evolutionary conservation strongly suggests that mitochondrial rRNA transcription termination could be a widely conserved mechanism in animals. Furthermore, the conservation of a homologous tridecamer motif in one of the last 3' secondary loops of nonmitochondrial 23S-like rRNAs suggests that the role of the sequence has changed during mitochondrial evolution. PMID:7515499
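
    Locating a short fixed motif such as the TGGCAGA heptamer in a DNA sequence is a simple exact-match scan. The Python sketch below reports hits on both strands; it is an illustration only, with hypothetical function names and a made-up toy sequence, and it is not the database search procedure used by the authors.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(dna: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def find_both_strands(seq: str, motif: str = "TGGCAGA"):
    """Return sorted (0-based start, strand) pairs for exact matches of
    motif on the given strand ('+') and on the opposite strand ('-')."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", revcomp(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


# Toy (made-up) sequence with one hit on each strand.
print(find_both_strands("AATGGCAGATTCCTCTGCCA"))
```

    Minus-strand positions are given in plus-strand coordinates, that is, as the start of the reverse-complemented motif in the input sequence.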

  2. Classification of protein motifs based on subcellular localization uncovers evolutionary relationships at both sequence and functional levels

    PubMed Central

    2013-01-01

    Background Most proteins have evolved in specific cellular compartments that limit their functions and potential interactions. On the other hand, motifs define amino acid arrangements conserved between protein family members and represent powerful tools for assigning function to protein sequences. The ideal motif would identify all members of a protein family but in practice many motifs identify both family members and unrelated proteins, referred to as True Positive (TP) and False Positive (FP) sequences, respectively. Results To address the relationship between protein motifs, protein function and cellular localization, we systematically assigned subcellular localization data to motif sequences from the comprehensive PROSITE sequence motif database. Using this data we analyzed relationships between localization and function. We find that TPs and FPs have a strong tendency to localize in different compartments. When multiple localizations are considered, TPs are usually distributed between related cellular compartments. We also identified cases where FPs are concentrated in particular subcellular regions, indicating possible functional or evolutionary relationships with TP sequences of the same motif. Conclusions Our findings suggest that the systematic examination of subcellular localization has the potential to uncover evolutionary and functional relationships between motif-containing sequences. We believe that this type of analysis complements existing motif annotations and could aid in their interpretation. Our results shed light on the evolution of cellular organelles and potentially establish the basis for new subcellular localization and function prediction algorithms. PMID:23865897

  3. Chaotic motif sampler: detecting motifs from biological sequences by using chaotic neurodynamics

    NASA Astrophysics Data System (ADS)

    Matsuura, Takafumi; Ikeguchi, Tohru

    Identifying a region in biological sequences, known as the motif extraction problem (MEP), is a task addressed in bioinformatics. However, the MEP is NP-hard, so it is almost impossible to obtain an optimal solution within a reasonable time frame. To find near-optimal solutions for NP-hard combinatorial optimization problems such as traveling salesman problems, quadratic assignment problems, and vehicle routing problems, chaotic search, which is one of the deterministic approaches, has been proposed and exhibits better performance than stochastic approaches. In this paper, we propose a new alignment method that employs chaotic dynamics to solve the MEP. It is called the Chaotic Motif Sampler. We show that the performance of the Chaotic Motif Sampler is considerably better than that of conventional methods such as the Gibbs Site Sampler and the Neighborhood Optimization for Multiple Alignment Discovery.

  4. A Conserved Upstream Motif Orchestrates Autonomous, Germline-Enriched Expression of Caenorhabditis elegans piRNAs

    PubMed Central

    Day, Amanda M.; Chun, Sang Young; Khivansara, Vishal; Kim, John K.

    2013-01-01

    Piwi-interacting RNAs (piRNAs) fulfill a critical, conserved role in defending the genome against foreign genetic elements. In many organisms, piRNAs appear to be derived from processing of a long, polycistronic RNA precursor. Here, we establish that each Caenorhabditis elegans piRNA represents a tiny, autonomous transcriptional unit. Remarkably, the minimal C. elegans piRNA cassette requires only a 21 nucleotide (nt) piRNA sequence and an ∼50 nt upstream motif with limited genomic context for expression. Combining computational analyses with a novel, in vivo transgenic system, we demonstrate that this upstream motif is necessary for independent expression of a germline-enriched, Piwi-dependent piRNA. We further show that a single nucleotide position within this motif directs differential germline enrichment. Accordingly, over 70% of C. elegans piRNAs are selectively expressed in male or female germline, and comparison of the genes they target suggests that these two populations have evolved independently. Together, our results indicate that C. elegans piRNA upstream motifs act as independent promoters to specify which sequences are expressed as piRNAs, how abundantly they are expressed, and in what germline. As the genome encodes well over 15,000 unique piRNA sequences, our study reveals that the number of transcriptional units encoding piRNAs rivals the number of mRNA coding genes in the C. elegans genome. PMID:23516384

  5. Functional Analysis of Semi-conserved Transit Peptide Motifs and Mechanistic Implications in Precursor Targeting and Recognition.

    PubMed

    Holbrook, Kristen; Subramanian, Chitra; Chotewutmontri, Prakitchai; Reddick, L Evan; Wright, Sarah; Zhang, Huixia; Moncrief, Lily; Bruce, Barry D

    2016-09-01

    Over 95% of plastid proteins are nuclear-encoded as precursors containing an N-terminal extension known as the transit peptide (TP). Although highly variable, TPs direct the precursors through a conserved, posttranslational mechanism involving translocons in the outer (TOC) and inner (TIC) envelope membranes. The organelle import specificity is mediated by one or more components of the TOC complex. However, the high TP diversity creates a paradox as to how the sequences can be specifically recognized. An emerging model of TP design is that they contain multiple loosely conserved motifs that are recognized at different steps in the targeting and transport process. Bioinformatics has demonstrated that many TPs contain semi-conserved physicochemical motifs, termed FGLK. In order to characterize FGLK motifs in TP recognition and import, we have analyzed two well-studied TPs from the precursor of RuBisCO small subunit (SStp) and ferredoxin (Fdtp). Both SStp and Fdtp contain two FGLK motifs. Analysis of a large set of mutations (∼85) in these two motifs using in vitro, in organello, and in vivo approaches supports a model in which the FGLK domains mediate interaction with TOC34 and possibly other TOC components. In vivo import analysis suggests that multiple FGLK motifs are functionally redundant. Furthermore, we discuss how FGLK motifs are required for efficient precursor protein import and how these elements may permit a convergent function of this highly variable class of targeting sequences. PMID:27378725

  6. An evolutionary analysis of flightin reveals a conserved motif unique and widespread in Pancrustacea.

    PubMed

    Soto-Adames, Felipe N; Alvarez-Ortiz, Pedro; Vigoreaux, Jim O

    2014-01-01

    Flightin is a thick filament protein that in Drosophila melanogaster is uniquely expressed in the asynchronous, indirect flight muscles (IFM). Flightin is required for the structure and function of the IFM and is indispensable for flight in Drosophila. Given the importance of flight acquisition in the evolutionary history of insects, here we study the phylogeny and distribution of flightin. Flightin was identified in 69 species of hexapods in classes Collembola (springtails), Protura, Diplura, and insect orders Thysanura (silverfish), Dictyoptera (roaches), Orthoptera (grasshoppers), Phthiraptera (lice), Hemiptera (true bugs), Coleoptera (beetles), Neuroptera (green lacewing), Hymenoptera (bees, ants, and wasps), Lepidoptera (moths), and Diptera (flies and mosquitoes). Flightin was also found in 14 species of crustaceans in orders Anostraca (brine shrimp), Cladocera (water fleas), Isopoda (pill bugs), Amphipoda (scuds, sideswimmers), and Decapoda (lobsters, crabs, and shrimps). Flightin was not identified in representatives of chelicerates, myriapods, or any species outside Pancrustacea (Tetraconata, sensu Dohle). Alignment of amino acid sequences revealed a conserved region of 52 amino acids, referred to herein as WYR, that is bounded by strictly conserved tryptophan (W) and arginine (R) and an intervening sequence with a high content of tyrosines (Y). This motif has no homologs in GenBank or PROSITE and is unique to flightin and paraflightin, a putative flightin paralog identified in decapods. A third motif of unclear affinities to pancrustacean WYR was observed in chelicerates. Phylogenetic analysis of amino acid sequences of the conserved motif suggests that paraflightin originated before the divergence of amphipods, isopods, and decapods. We conclude that flightin originated de novo in the ancestor of Pancrustacea > 500 MYA, well before the divergence of insects (~400 MYA) and the origin of flight (~325 MYA), and that its IFM-specific function in Drosophila is a more

  8. Conserved motif of CDK5RAP2 mediates its localization to centrosomes and the Golgi complex.

    PubMed

    Wang, Zhe; Wu, Tao; Shi, Lin; Zhang, Lin; Zheng, Wei; Qu, Jianan Y; Niu, Ruifang; Qi, Robert Z

    2010-07-16

    As the primary microtubule-organizing centers, centrosomes require gamma-tubulin for microtubule nucleation and organization. Located in close vicinity to centrosomes, the Golgi complex is another microtubule-organizing organelle in interphase cells. CDK5RAP2 is a gamma-tubulin complex-binding protein and functions in gamma-tubulin attachment to centrosomes. In this study, we find that CDK5RAP2 localizes to the Golgi complex in an ATP- and centrosome-dependent manner and associates with Golgi membranes independently of microtubules. CDK5RAP2 contains a centrosome-targeting domain with its core region highly homologous to the Motif 2 (CM2) of centrosomin, a functionally related protein in Drosophila. This sequence, referred to as the CM2-like motif, is also conserved in related proteins in chicken and zebrafish. Therefore, CDK5RAP2 may undertake a conserved mechanism for centrosomal localization. Using a mutational approach, we demonstrate that the CM2-like motif plays a crucial role in the centrosomal and Golgi localization of CDK5RAP2. Furthermore, the CM2-like motif is essential for the association of the centrosome-targeting domain to pericentrin and AKAP450. The binding with pericentrin is required for the centrosomal and Golgi localization of CDK5RAP2, whereas the binding with AKAP450 is required for the Golgi localization. Although the CM2-like motif possesses the activity of Ca(2+)-independent calmodulin binding, binding of calmodulin to this sequence is dispensable for centrosomal and Golgi association. Altogether, CDK5RAP2 may represent a novel mechanism for centrosomal and Golgi localization. PMID:20466722

  9. MoD Tools: regulatory motif discovery in nucleotide sequences from co-regulated or homologous genes.

    PubMed

    Pavesi, Giulio; Mereghetti, Paolo; Zambelli, Federico; Stefani, Marco; Mauri, Giancarlo; Pesole, Graziano

    2006-07-01

    Understanding the complex mechanisms regulating gene expression at the transcriptional and post-transcriptional levels is one of the greatest challenges of the post-genomic era. The MoD (MOtif Discovery) Tools web server comprises a set of tools for the discovery of novel conserved sequence and structure motifs in nucleotide sequences, motifs that in turn are good candidates for regulatory activity. The server includes the following programs: Weeder, for the discovery of conserved transcription factor binding sites (TFBSs) in nucleotide sequences from co-regulated genes; WeederH, for the discovery of conserved TFBSs and distal regulatory modules in sequences from homologous genes; RNAProfile, for the discovery of conserved secondary structure motifs in unaligned RNA sequences whose secondary structure is not known. In this way, a given gene can be compared with other co-regulated genes or with its homologs, or its mRNA can be analyzed for conserved motifs regulating its post-transcriptional fate. The web server thus provides researchers with different strategies and methods to investigate the regulation of gene expression, at both the transcriptional and post-transcriptional levels. Available at http://www.pesolelab.it/modtools/ and http://www.beacon.unimi.it/modtools/.

  10. Axoneme-specific beta-tubulin specialization: a conserved C-terminal motif specifies the central pair.

    PubMed

    Nielsen, M G; Turner, F R; Hutchens, J A; Raff, E C

    2001-04-01

    Axonemes are ancient organelles that mediate motility of cilia and flagella in animals, plants, and protists. The long evolutionary conservation of axoneme architecture, a cylinder of nine doublet microtubules surrounding a central pair of singlet microtubules, suggests all motile axonemes may share common assembly mechanisms. Consistent with this, alpha- and beta-tubulins utilized in motile axonemes fall among the most conserved tubulin sequences [1, 2], and the beta-tubulins contain a sequence motif at the same position in the carboxyl terminus [3]. Axoneme doublet microtubules are initiated from the corresponding triplet microtubules of the basal body [4], but the large macromolecular "central apparatus" that includes the central pair microtubules and associated structures [5] is a specialization unique to motile axonemes. In Drosophila spermatogenesis, basal bodies and axonemes utilize the same alpha-tubulin but different beta-tubulins [6–13]. Beta 1 is utilized for the centriole/basal body, and beta 2 is utilized for the motile sperm tail axoneme. Beta 2 contains the motile axoneme-specific sequence motif, but beta 1 does not [3]. Here, we show that the "axoneme motif" specifies the central pair. Beta 1 can provide partial function for axoneme assembly but cannot make the central microtubules [14]. Introducing the axoneme motif into the beta 1 carboxyl terminus, a two amino acid change, conferred upon beta 1 the ability to assemble 9 + 2 axonemes. This finding explains the conservation of the axoneme-specific sequence motif through 1.5 billion years of evolution.

  11. Notch signaling from the endosome requires a conserved dileucine motif

    PubMed Central

    Zheng, Li; Saunders, Cosmo A.; Sorensen, Erika B.; Waxmonsky, Nicole C.; Conner, Sean D.

    2013-01-01

    Notch signaling is reliant on γ-secretase–mediated processing, although the subcellular location where γ-secretase cleaves Notch to initiate signaling remains unresolved. Accumulating evidence demonstrates that Notch signaling is modulated by endocytosis and endosomal transport. In this study, we investigated the relationship between Notch transport itinerary and signaling capacity. In doing so, we discovered a highly conserved dileucine sorting signal encoded within the cytoplasmic tail that directs Notch to the limiting membrane of the lysosome for signaling. Mutating the dileucine motif led to receptor accumulation in cation-dependent mannose-phosphate receptor–positive tubular early endosomes and a reduction in Notch signaling capacity. Moreover, truncated receptor forms that mimic activated Notch were readily cleaved by γ-secretase within the endosome; however, the cleavage product was proteasome-sensitive and failed to contribute to robust signaling. Collectively these results indicate that Notch signaling from the lysosome limiting membrane is conserved and that receptor targeting to this compartment is an active process. Moreover, the data support a model in which Notch signaling in mammalian systems is initiated from either the plasma membrane or lysosome, but not the early endosome. PMID:23171551

  12. Sequence-motif Detection of NAD(P)-binding Proteins: Discovery of a Unique Antibacterial Drug Target

    NASA Astrophysics Data System (ADS)

    Hua, Yun Hao; Wu, Chih Yuan; Sargsyan, Karen; Lim, Carmay

    2014-09-01

    Many enzymes use nicotinamide adenine dinucleotide or nicotinamide adenine dinucleotide phosphate (NAD(P)) as essential coenzymes. These enzymes often do not share significant sequence identity and cannot be easily detected by sequence homology. Previously, we determined all distinct locally conserved pyrophosphate-binding structures (3d motifs) from NAD(P)-bound protein structures, from which 1d sequence motifs were derived. Here, we aim to establish the precision with which these 3d and 1d motifs annotate NAD(P)-binding proteins. We show that the pyrophosphate-binding 3d motifs are characteristic of NAD(P)-binding proteins, as they are rarely found in non-NAD(P)-binding proteins. Furthermore, several 1d motifs could distinguish between proteins that bind only NAD and those that bind only NADP. They could also distinguish NAD(P)-binding proteins from non-NAD(P)-binding ones. Interestingly, one of the pyrophosphate-binding 3d motifs and its corresponding 1d motif were found only in enoyl-acyl carrier protein reductases, which are enzymes essential for bacterial fatty acid biosynthesis. This unique 3d motif serves as an attractive novel drug target, as it is conserved across many bacterial species and is not found in human proteins.

  13. Gapped alignment of protein sequence motifs through Monte Carlo optimization of a hidden Markov model

    PubMed Central

    Neuwald, Andrew F; Liu, Jun S

    2004-01-01

    Background Certain protein families are highly conserved across distantly related organisms and belong to large and functionally diverse superfamilies. The patterns of conservation present in these protein sequences presumably are due to selective constraints maintaining important but unknown structural mechanisms with some constraints specific to each family and others shared by a larger subset or by the entire superfamily. To exploit these patterns as a source of functional information, we recently devised a statistically based approach called contrast hierarchical alignment and interaction network (CHAIN) analysis, which infers the strengths of various categories of selective constraints from co-conserved patterns in a multiple alignment. The power of this approach strongly depends on the quality of the multiple alignments, which thus motivated development of theoretical concepts and strategies to improve alignment of conserved motifs within large sets of distantly related sequences. Results Here we describe a hidden Markov model (HMM), an algebraic system, and Markov chain Monte Carlo (MCMC) sampling strategies for alignment of multiple sequence motifs. The MCMC sampling strategies are useful both for alignment optimization and for adjusting position specific background amino acid frequencies for alignment uncertainties. Associated statistical formulations provide an objective measure of alignment quality as well as automatic gap penalty optimization. Improved alignments obtained in this way are compared with PSI-BLAST based alignments within the context of CHAIN analysis of three protein families: Giα subunits, prolyl oligopeptidases, and transitional endoplasmic reticulum (p97) AAA+ ATPases. Conclusion While not entirely replacing PSI-BLAST based alignments, which likewise may be optimized for CHAIN analysis using this approach, these motif-based methods often more accurately align very distantly related sequences and thus can provide a better measure of

  14. Sequence-based classification using discriminatory motif feature selection.

    PubMed

    Xiong, Hao; Capurso, Daniel; Sen, Saunak; Segal, Mark R

    2011-01-01

    Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative) approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k) predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small) set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed) and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated). We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters. A Python pipeline implementing the approach is available at

  15. Nucleotide binding database NBDB – a collection of sequence motifs with specific protein-ligand interactions

    PubMed Central

    Zheng, Zejun; Goncearenco, Alexander; Berezovsky, Igor N.

    2016-01-01

    The NBDB database describes protein motifs, elementary functional loops (EFLs), that are involved in binding of nucleotide-containing ligands and other biologically relevant cofactors/coenzymes, including ATP, AMP, ADP, GMP, GDP, GTP, CTP, PAP, PPS, FMN, FAD(H), NAD(H), NADP, cAMP, cGMP, c-di-AMP and c-di-GMP, ThPP, THD, F-420, ACO, CoA, PLP and SAM. The database is freely available online at http://nbdb.bii.a-star.edu.sg. In total, NBDB contains data on 249 motifs that work in interactions with 24 ligands. Sequence profiles of EFL motifs were derived de novo from nonredundant UniProt proteome sequences. Conserved amino acid residues in the profiles interact specifically with distinct chemical parts of nucleotide-containing ligands, such as nitrogenous bases, phosphate groups, ribose, nicotinamide, and flavin moieties. Each EFL profile in the database is characterized by a pattern of corresponding ligand–protein interactions found in crystallized ligand–protein complexes. The NBDB database helps to explore the determinants of nucleotide and cofactor binding in different protein folds and families. NBDB can also detect fragments that match profiles of particular EFLs in a protein sequence provided by the user. Comprehensive information on sequences, structures, and interactions of EFLs with ligands provides a foundation for experimental and computational efforts on the design of required protein functions. PMID:26507856

  18. Rewiring yeast sugar transporter preference through modifying a conserved protein motif

    PubMed Central

    Young, Eric M.; Tong, Alice; Bui, Hang; Spofford, Caitlin; Alper, Hal S.

    2014-01-01

    Utilization of exogenous sugars found in lignocellulosic biomass hydrolysates, such as xylose, must be improved before yeast can serve as an efficient biofuel and biochemical production platform. In particular, the first step in this process, the molecular transport of xylose into the cell, can serve as a significant flux bottleneck and is highly inhibited by other sugars. Here we demonstrate that sugar transport preference and kinetics can be rewired through the programming of a sequence motif of the general form G-G/F-XXX-G found in the first transmembrane span. By evaluating 46 different heterologously expressed transporters, we find that this motif is conserved among functional transporters and highly enriched in transporters that confer growth on xylose. Through saturation mutagenesis and subsequent rational mutagenesis, four transporter mutants unable to confer growth on glucose but able to sustain growth on xylose were engineered. Specifically, Candida intermedia gxs1 Phe38Ile39Met40, Scheffersomyces stipitis rgt2 Phe38 and Met40, and Saccharomyces cerevisiae hxt7 Ile39Met40Met340 all exhibit this phenotype. In these cases, primary hexose transporters were rewired into xylose transporters. These xylose transporters nevertheless remained inhibited by glucose. Furthermore, in the course of identifying this motif, novel wild-type transporters with superior monosaccharide growth profiles were discovered, namely S. stipitis RGT2 and Debaryomyces hansenii 2D01474. These findings build toward the engineering of efficient pentose utilization in yeast and provide a blueprint for reprogramming transporter properties. PMID:24344268
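    The general form G-G/F-XXX-G quoted above can be scanned for with a regular expression. The following is a minimal sketch, assuming the form reads as glycine, then glycine or phenylalanine, then any three residues, then glycine; the function name and the toy sequences are hypothetical and are not taken from the study.

```python
import re

# Assumed reading of "G-G/F-XXX-G": G, then G or F, then any three
# residues (X), then G. The lookahead lets overlapping hits be reported.
MOTIF = re.compile(r"(?=(G[GF][A-Z]{3}G))")


def find_motif(protein_seq):
    """Return (0-based start, matched residues) for every motif occurrence."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(protein_seq.upper())]


if __name__ == "__main__":
    # Made-up sequence used purely for illustration.
    for start, hit in find_motif("MAGGFIMGLV"):
        print(start, hit)
```

    Because the pattern is wrapped in a lookahead, two occurrences that share a flanking glycine are both reported, which a plain non-overlapping search would miss.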

  19. Sequence motifs of myelin membrane proteins: towards the molecular basis of diseases.

    PubMed

    Sedzik, Jan; Jastrzebski, Jan Pawel; Ikenaka, Kazuhiro

    2013-04-01

    The shortest sequence of amino acids in a protein containing functional and structural information is a "motif." To understand myelin protein functions, we intensively searched for motifs that can be found in myelin proteins. Some myelin proteins had several different motifs or repetitions of the same motif. The most abundant motif found among myelin proteins was a myristoylation motif. Bovine MAG held 11 myristoylation motifs and human myelin basic protein held as many as eight such motifs. PMP22 had the fewest myristoylation motifs, only one; rat PMP22 contained no such motifs. The cholesterol recognition/interaction amino-acid consensus (CRAC) motif was not found in myelin basic protein. P2 protein of different species contained only one CRAC motif, except for P2 of horse, which had no such motifs. MAG, MOG, and P0 were very rich in CRAC, with three to eight motifs per protein. The analysis of motifs in myelin proteins is expected to provide structural insight and refinement of predicted 3D models for which structures are as yet unknown. Analysis of motifs in mutant proteins associated with neurological diseases revealed that some motifs disappeared in P0 carrying mutations found in neurological diseases. There are 2,500 motifs deposited in a databank, but only 21 were found in myelin proteins, which is about 1% of the total known motifs. There was great variability in the number of motifs among proteins from different species. The appearance or disappearance of protein motifs after a point mutation in a protein related to neurological diseases was very interesting. PMID:23339078

  20. Bioinformatic Identification of Conserved Cis-Sequences in Coregulated Genes.

    PubMed

    Bülow, Lorenz; Hehl, Reinhard

    2016-01-01

    Bioinformatics tools can be employed to identify conserved cis-sequences in sets of coregulated plant genes now that more and more gene expression and genomic sequence data are becoming available. Knowledge of the specific cis-sequences, and of their enrichment and arrangement within promoters, facilitates the design of functional synthetic plant promoters that are responsive to specific stresses. The present chapter illustrates an example of the bioinformatic identification of conserved Arabidopsis thaliana cis-sequences enriched in drought stress-responsive genes. This workflow can be applied to the identification of cis-sequences in any set of coregulated genes. The workflow includes detailed protocols to determine sets of coregulated genes, to extract the corresponding promoter sequences, and to install and run a software package that identifies overrepresented motifs. Further bioinformatic analyses that can be performed with the results are discussed. PMID:27557771

  1. Pleiotropic functions of a conserved insect-specific Hox peptide motif.

    PubMed

    Hittinger, Chris Todd; Stern, David L; Carroll, Sean B

    2005-12-01

    The proteins that regulate developmental processes in animals have generally been well conserved during evolution. A few cases are known where protein activities have functionally evolved. These rare examples raise the issue of how highly conserved regulatory proteins with many roles evolve new functions while maintaining old functions. We have investigated this by analyzing the function of the 'QA' peptide motif of the Hox protein Ultrabithorax (Ubx), a motif that has been conserved throughout insect evolution since its establishment early in the lineage. We precisely deleted the QA motif at the endogenous locus via allelic replacement in Drosophila melanogaster. Although the QA motif was originally characterized as involved in the repression of limb formation, we have found that it is highly pleiotropic. Curiously, deleting the QA motif had strong effects in some tissues while barely affecting others, suggesting that QA function is preferentially required for a subset of Ubx target genes. QA deletion homozygotes had a normal complement of limbs, but, at reduced doses of Ubx and the abdominal-A (abd-A) Hox gene, ectopic limb primordia and adult abdominal limbs formed when the QA motif was absent. These results show that redundancy and the additive contributions of activity-regulating peptide motifs play important roles in moderating the phenotypic consequences of Hox protein evolution, and that pleiotropic peptide motifs that contribute quantitatively to several functions are subject to intense purifying selection.

  2. An Automaton for Motifs Recognition in DNA Sequences

    NASA Astrophysics Data System (ADS)

    Perez, Gerardo; Mejia, Yuridia P.; Olmos, Ivan; Gonzalez, Jesus A.; Sánchez, Patricia; Vázquez, Candelario

    In this paper we present a new algorithm to find inexact motifs (which are transformed into a set of exact subsequences) in a DNA sequence. Our algorithm builds an automaton that searches for the set of exact subsequences in the DNA database, which can be very long. It starts with a preprocessing phase in which it builds the finite automaton. This phase also handles the case in which two different subsequences share a substring (in other words, the subsequences might overlap), in a way similar to the KMP algorithm. During the searching phase, the algorithm recognizes all instances of the input subsequences that appear in the DNA sequence. The automaton performs the search phase in time linear in the length of the input sequence. Experimental results show that the proposed algorithm performs better than the Aho-Corasick algorithm, which has been proved to perform better than the naive approach and is itself considered to run in linear time.
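
The two ideas in this abstract, expanding an inexact motif into exact subsequences and then finding all of them in one pass with an automaton, can be sketched as follows. The authors' own automaton is not described in enough detail here to reproduce, so the sketch uses the classic Aho-Corasick construction that the abstract names as its baseline. Representing the inexact motif with IUPAC degenerate codes is an assumption, as are all function names and the test sequence.

```python
from collections import deque
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_motif(motif):
    """Turn one degenerate motif into the set of exact sequences it denotes."""
    return {"".join(p) for p in product(*(IUPAC[c] for c in motif))}


def build_automaton(patterns):
    """Build Aho-Corasick goto, failure and output tables."""
    goto, fail, out = [{}], [0], [set()]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)
    # Breadth-first pass sets failure links (shared prefixes and suffixes).
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]
    return goto, fail, out


def find_all(sequence, patterns):
    """Return sorted (start, pattern) hits in one left-to-right pass."""
    goto, fail, out = build_automaton(patterns)
    hits, state = [], 0
    for i, ch in enumerate(sequence):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return sorted(hits)


exact = expand_motif("GRCGYC")  # four exact hexamers
hits = find_all("TTGACGTCAAGGCGCCTT", exact)
```

The failure links play the role that the abstract attributes to its KMP-like handling of overlapping subsequences: after a mismatch the search resumes from the longest suffix that is still a prefix of some pattern, so no input position is read twice.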

  3. The TAGteam motif facilitates binding of 21 sequence-specific transcription factors in the Drosophila embryo

    PubMed Central

    Satija, Rahul; Bradley, Robert K.

    2012-01-01

    Highly overlapping patterns of genome-wide binding of many distinct transcription factors have been observed in worms, insects, and mammals, but the origins and consequences of this overlapping binding remain unclear. While analyzing chromatin immunoprecipitation data sets from 21 sequence-specific transcription factors active in the Drosophila embryo, we found that binding of all factors exhibits a dose-dependent relationship with “TAGteam” sequence motifs bound by the zinc finger protein Vielfaltig, also known as Zelda, a recently discovered activator of the zygotic genome. TAGteam motifs are present and well conserved in highly bound regions, and are associated with transcription factor binding even in the absence of canonical recognition motifs for these factors. Furthermore, levels of binding in promoters and enhancers of zygotically transcribed genes are correlated with RNA polymerase II occupancy and gene expression levels. Our results suggest that Vielfaltig acts as a master regulator of early development by facilitating the genome-wide establishment of overlapping patterns of binding of diverse transcription factors that drive global gene expression. PMID:22247430

  4. SVM2Motif--Reconstructing Overlapping DNA Sequence Motifs by Mimicking an SVM Predictor.

    PubMed

    Vidovic, Marina M-C; Görnitz, Nico; Müller, Klaus-Robert; Rätsch, Gunnar; Kloft, Marius

    2015-01-01

    Identifying discriminative motifs underlying the functionality and evolution of organisms is a major challenge in computational biology. Machine learning approaches such as support vector machines (SVMs) achieve state-of-the-art performances in genomic discrimination tasks, but--due to their black-box character--the motifs underlying their decision functions are largely unknown. As a remedy, positional oligomer importance matrices (POIMs) allow us to visualize the significance of position-specific subsequences. Although a major step towards the explanation of trained SVM models, they suffer from the fact that their size grows exponentially in the length of the motif, which renders their manual inspection feasible only for comparably small motif sizes, typically k ≤ 5. In this work, we extend the work on positional oligomer importance matrices, by presenting a new machine-learning methodology, entitled motifPOIM, to extract the truly relevant motifs--regardless of their length and complexity--underlying the predictions of a trained SVM model. Our framework thereby considers the motifs as free parameters in a probabilistic model, a task which can be phrased as a non-convex optimization problem. The exponential dependence of the POIM size on the oligomer length poses a major numerical challenge, which we address by an efficient optimization framework that allows us to find possibly overlapping motifs consisting of up to hundreds of nucleotides. We demonstrate the efficacy of our approach on a synthetic data set as well as a real-world human splice site data set. PMID:26690911

  5. TOPDOM: database of conservatively located domains and motifs in proteins

    PubMed Central

    Varga, Julia; Dobson, László; Tusnády, Gábor E.

    2016-01-01

    Summary: The TOPDOM database—originally created as a collection of domains and motifs located consistently on the same side of the membranes in α-helical transmembrane proteins—has been updated and extended by taking into consideration consistently localized domains and motifs in globular proteins, too. By taking advantage of the recently developed CCTOP algorithm to determine the type of a protein and predict topology in case of transmembrane proteins, and by applying a thorough search for domains and motifs as well as utilizing the most up-to-date version of all source databases, we managed to reach a 6-fold increase in the size of the whole database and a 2-fold increase in the number of transmembrane proteins. Availability and implementation: TOPDOM database is available at http://topdom.enzim.hu. The webpage utilizes the common Apache, PHP5 and MySQL software to provide the user interface for accessing and searching the database. The database itself is generated on a high performance computer. Contact: tusnady.gabor@ttk.mta.hu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153630

  6. An evolutionary conserved motif is responsible for immunoglobulin heavy chain packing in the B cell membrane.

    PubMed

    Varriale, Sonia; Merlino, Antonello; Coscia, Maria Rosaria; Mazzarella, Lelio; Oreste, Umberto

    2010-12-01

    All species of vertebrates synthesize immunoglobulin molecules, which differ in a number of aspects but also share a few common features responsible for their function, such as the presence of a transmembrane domain in the membrane bound form of the immunoglobulin heavy chain (IgTMD) that ensures communication with the signal transducing Igα-Igβ peptides. We have analyzed the gene sequence encoding the IgTMD of different heavy chain isotypes of very distant species, from shark to mammals. The IgTMD sequences show a high degree of sequence identity and their encoding nucleotide sequences were shown to be subject to purifying selection at most sites. We have built molecular models of seven IgTMDs from different vertebrate species and have investigated the formation of homodimer in a palmitoyl oleoyl phosphatidylcholine (POPC) lipid bilayer by molecular dynamics simulations. We found that the conserved FXXXFXXS/TXXXS motif, never observed to date in protein transmembrane chains, is responsible for the association of the two heavy chains through two pairs of Phe-Phe hydrophobic interactions and two pairs of Ser/Thr-Ser/Ser hydrogen bonds. This interaction pattern, which stabilizes the dimer conformation in the lipid bilayer, was unique, being different from any other pattern identified in transmembrane helices to date. PMID:20937398
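
Motif notation of the kind used here (X for any residue, S/T for an alternative at one position) translates directly into a regular expression, which is one simple way to scan protein sequences for such a pattern. The sketch below is illustrative only: the helper names are hypothetical, and the test sequence is invented rather than taken from any immunoglobulin.

```python
import re


def motif_to_regex(motif):
    """Translate 'X' wildcards and 'A/B' alternatives into a regex string."""
    parts, i = [], 0
    while i < len(motif):
        choices = motif[i]
        i += 1
        # Collect alternatives written as S/T (or longer, e.g. S/T/A).
        while i + 1 < len(motif) and motif[i] == "/":
            choices += motif[i + 1]
            i += 2
        if choices == "X":
            parts.append(".")
        elif len(choices) > 1:
            parts.append("[" + choices + "]")
        else:
            parts.append(choices)
    return "".join(parts)


def find_motif(sequence, motif):
    """Return (start, matched_text) for every occurrence, overlaps included."""
    # The lookahead lets matches that overlap each other all be reported.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]


regex = motif_to_regex("FXXXFXXS/TXXXS")
# Invented sequence containing one instance of the motif.
hits = find_motif("LLFAAAFGGSLLLSVV", "FXXXFXXS/TXXXS")
```

The same helper applies unchanged to other motifs written in this notation, for example WDXNWD.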

  7. JAR3D Webserver: Scoring and aligning RNA loop sequences to known 3D motifs

    PubMed Central

    Roll, James; Zirbel, Craig L.; Sweeney, Blake; Petrov, Anton I.; Leontis, Neocles

    2016-01-01

    Many non-coding RNAs have been identified and may function by forming 2D and 3D structures. RNA hairpin and internal loops are often represented as unstructured on secondary structure diagrams, but RNA 3D structures show that most such loops are structured by non-Watson–Crick basepairs and base stacking. Moreover, different RNA sequences can form the same RNA 3D motif. JAR3D finds possible 3D geometries for hairpin and internal loops by matching loop sequences to motif groups from the RNA 3D Motif Atlas, by exact sequence match when possible, and by probabilistic scoring and edit distance for novel sequences. The scoring gauges the ability of the sequences to form the same pattern of interactions observed in 3D structures of the motif. The JAR3D webserver at http://rna.bgsu.edu/jar3d/ takes one or many sequences of a single loop as input, or else one or many sequences of longer RNAs with multiple loops. Each sequence is scored against all current motif groups. The output shows the ten best-matching motif groups. Users can align input sequences to each of the motif groups found by JAR3D. JAR3D will be updated with every release of the RNA 3D Motif Atlas, and so its performance is expected to improve over time. PMID:27235417

  8. Weighted sequence motifs as an improved seeding step in microRNA target prediction algorithms.

    PubMed

    Saetrom, Ola; Snøve, Ola; Saetrom, Pål

    2005-07-01

    We present a new microRNA target prediction algorithm called TargetBoost, and show that the algorithm is stable and identifies more true targets than do existing algorithms. TargetBoost uses machine learning on a set of validated microRNA targets in lower organisms to create weighted sequence motifs that capture the binding characteristics between microRNAs and their targets. Existing algorithms require candidates to have (1) near-perfect complementarity between microRNAs' 5' end and their targets; (2) relatively high thermodynamic duplex stability; (3) multiple target sites in the target's 3' UTR; and (4) evolutionary conservation of the target between species. Most algorithms use one of the two first requirements in a seeding step, and use the three others as filters to improve the method's specificity. The initial seeding step determines an algorithm's sensitivity and also influences its specificity. As all algorithms may add filters to increase the specificity, we propose that methods should be compared before such filtering. We show that TargetBoost's weighted sequence motif approach is favorable to using both the duplex stability and the sequence complementarity steps. (TargetBoost is available as a Web tool from http://www.interagon.com/demo/.).

  9. Ser/Thr Motifs in Transmembrane Proteins: Conservation Patterns and Effects on Local Protein Structure and Dynamics

    PubMed Central

    del Val, Coral; White, Stephen H.

    2014-01-01

    We combined systematic bioinformatics analyses and molecular dynamics simulations to assess the conservation patterns of Ser and Thr motifs in membrane proteins, and the effect of such motifs on the structure and dynamics of α-helical transmembrane (TM) segments. We find that Ser/Thr motifs are often present in β-barrel TM proteins. At least one Ser/Thr motif is present in almost half of the sequences of α-helical proteins analyzed here. The extensive bioinformatics analyses and inspection of protein structures led to the identification of molecular transporters with noticeable numbers of Ser/Thr motifs within the TM region. Given the energetic penalty for burying multiple Ser/Thr groups in the membrane hydrophobic core, the observation of transporters with multiple membrane-embedded Ser/Thr is intriguing and raises the question of how the presence of multiple Ser/Thr affects protein local structure and dynamics. Molecular dynamics simulations of four different Ser-containing model TM peptides indicate that backbone hydrogen bonding of membrane-buried Ser/Thr hydroxyl groups can significantly change the local structure and dynamics of the helix. Ser groups located close to the membrane interface can hydrogen bond to solvent water instead of protein backbone, leading to an enhanced local solvation of the peptide. PMID:22836667

  10. Discovery of Recurrent Sequence Motifs in Saccharomyces cerevisiae Cell Wall Proteins

    PubMed Central

    Coronado, Juan E.; Epstein, Susan L.; Qiu, Wei-Gang; Lipke, Peter N.

    2008-01-01

    This paper describes a procedure for the discovery of recurrent substrings in amino acid sequences of proteins, and its application to fungal cell walls. The evolutionary origins of fungal cell walls are an open biological question. This question can be approached by studies of similarity among the sequences and sub-sequences of fungal wall proteins and by comparison to proteins in animals. We describe here how we have discovered building blocks, represented as recurrent sequence motifs (sub-sequences), within fungal cell wall proteins. These motifs have not been systematically identified before, because the low Shannon entropy of the cell wall sequences has hindered searches for local sequence similarities by sequence alignments. Nonetheless, our new, composition-based scoring matrices for local alignment searches now support statistically valid alignments for such low entropy sequences (Coronado et al. 2006. Euk. Cell 5: 628–637). We have now searched for similarities in a set of 171 known and putative cell wall proteins from baker’s yeast, Saccharomyces cerevisiae. The aligned segments were repeatedly subdivided and catalogued to identify 217 recurrent sequence motifs of length 8 amino acids or greater. 95% of these motifs occur in more than one cell wall protein. The median length of the motifs is 22 amino acid residues, considerably shorter than protein domains. For many cell wall proteins, these motifs collectively account for more than half of their amino acids. The prevalence of these motifs supports the idea of fungal cell wall proteins as assemblies of recurrent building blocks. PMID:19430580
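
The abstract attributes the difficulty of aligning these proteins to the low Shannon entropy of their sequences. A minimal sketch of how compositional entropy can be measured, overall and in sliding windows, is given below. This is the textbook definition applied to residue frequencies, not the authors' composition-based scoring matrices, and the function names and example strings are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(sequence):
    """Shannon entropy, in bits per residue, of the residue composition."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def windowed_entropy(sequence, window):
    """Entropy of each window, to locate low-complexity stretches."""
    return [
        shannon_entropy(sequence[i:i + window])
        for i in range(len(sequence) - window + 1)
    ]


# Invented examples: a Ser/Thr-only repeat versus a more varied segment.
low = shannon_entropy("STSTSTST")
high = shannon_entropy("ACDEFGHI")
profile = windowed_entropy("STSTACDE", window=4)
```

A sequence using all 20 amino acids equally would reach log2(20), about 4.32 bits per residue; repetitive, Ser/Thr-rich stretches fall well below that, which is why standard scoring matrices report many spurious local similarities for them.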

  11. Comprehensive analysis of animal TALE homeobox genes: new conserved motifs and cases of accelerated evolution.

    PubMed

    Mukherjee, Krishanu; Bürglin, Thomas R

    2007-08-01

    TALE homeodomain proteins are an ancient subgroup within the group of homeodomain transcription factors that play important roles in animal, plant, and fungal development. We have extracted the full complement of TALE superclass homeobox genes from the genome projects of seven protostomes, seven deuterostomes, and Nematostella. This was supplemented with TALE homeobox genes from additional species and phylogenetic analyses were carried out with 276 sequences. We found 20 homeobox genes and 4 pseudogenes in humans, 21 genes in mouse, 8 genes in Drosophila, and 5 genes plus one truncated gene in Caenorhabditis elegans. Apart from the previously identified TALE classes MEIS, PBC, IRO, and TGIF, a novel class is identified, termed MOHAWK (MKX). Further, we show that the MEIS class can be divided into two families, PREP and MEIS. Prep genes have previously only been described in vertebrates but are lacking in Drosophila. Here we identify orthologues in other insect taxa as well as in the cnidarian Nematostella. In C. elegans, a divergent Prep protein has lost the homeodomain. Full-length multiple sequence alignment of the protostome and deuterostome sequences allowed us to identify several novel conserved motifs within the MKX, TGIF, and MEIS classes. Phylogenetic analyses revealed fast-evolving PBC class genes; in particular, some X-linked PBC genes in nematodes are subject to rapid evolution. In addition, several instances of gene loss were identified. In conclusion, our comprehensive analysis provides a defining framework for the classification of animal TALE homeobox genes and the understanding of their evolution.

  12. Illumina MiSeq sequencing disfavours a sequence motif in the GFP reporter gene.

    PubMed

    Van den Hoecke, Silvie; Verhelst, Judith; Saelens, Xavier

    2016-01-01

    Green fluorescent protein (GFP) is one of the most used reporter genes. We have used next-generation sequencing (NGS) to analyse the genetic diversity of a recombinant influenza A virus that expresses GFP and found a remarkable coverage dip in the GFP coding sequence. This coverage dip was present when virus-derived RT-PCR product or the parental plasmid DNA was used as starting material for NGS and regardless of whether Nextera XT transposase or Covaris shearing was used for DNA fragmentation. Therefore, the sequence coverage dip in the GFP coding sequence was not the result of emerging GFP mutant viruses or a bias introduced by Nextera XT fragmentation. Instead, we found that the Illumina MiSeq sequencing method disfavours the 'CCCGCC' motif in the GFP coding sequence. PMID:27193250

  13. Illumina MiSeq sequencing disfavours a sequence motif in the GFP reporter gene

    PubMed Central

    Van den Hoecke, Silvie; Verhelst, Judith; Saelens, Xavier

    2016-01-01

    Green fluorescent protein (GFP) is one of the most used reporter genes. We have used next-generation sequencing (NGS) to analyse the genetic diversity of a recombinant influenza A virus that expresses GFP and found a remarkable coverage dip in the GFP coding sequence. This coverage dip was present when virus-derived RT-PCR product or the parental plasmid DNA was used as starting material for NGS and regardless of whether Nextera XT transposase or Covaris shearing was used for DNA fragmentation. Therefore, the sequence coverage dip in the GFP coding sequence was not the result of emerging GFP mutant viruses or a bias introduced by Nextera XT fragmentation. Instead, we found that the Illumina MiSeq sequencing method disfavours the ‘CCCGCC’ motif in the GFP coding sequence. PMID:27193250

  14. Sequence motifs and prokaryotic expression of the reptilian paramyxovirus fusion protein

    USGS Publications Warehouse

    Franke, J.; Batts, W.N.; Ahne, W.; Kurath, G.; Winton, J.R.

    2006-01-01

    Fourteen reptilian paramyxovirus isolates were chosen to represent the known extent of genetic diversity among this novel group of viruses. Selected regions of the fusion (F) gene were sequenced, analyzed and compared. The F gene of all isolates contained conserved motifs homologous to those described for other members of the family Paramyxoviridae including: signal peptide, transmembrane domain, furin cleavage site, fusion peptide, N-linked glycosylation sites, and two heptad repeats, the second of which (HRB-LZ) had the characteristics of a leucine zipper. Selected regions of the fusion gene of isolate Gono-GER85 were inserted into a prokaryotic expression system to generate three recombinant protein fragments of various sizes. The longest recombinant protein was cleaved by furin into two fragments of predicted length. Western blot analysis with virus-neutralizing rabbit-antiserum against this isolate demonstrated that only the longest construct reacted with the antiserum. This construct was unique in containing 30 additional C-terminal amino acids that included most of the HRB-LZ. These results indicate that the F genes of reptilian paramyxoviruses contain highly conserved motifs typical of other members of the family and suggest that the HRB-LZ domain of the reptilian paramyxovirus F protein contains a linear antigenic epitope. © Springer-Verlag 2005.

  15. The rnhB gene encoding RNase HII of Streptococcus pneumoniae and evidence of conserved motifs in eucaryotic genes.

    PubMed Central

    Zhang, Y B; Ayalew, S; Lacks, S A

    1997-01-01

    A single RNase H enzyme was detected in extracts of Streptococcus pneumoniae. The gene encoding this enzyme was cloned and expressed in Escherichia coli, as demonstrated by its ability to complement a double-mutant rnhA recC strain. Sequence analysis of the cloned DNA revealed an open reading frame of 290 codons that encodes a polypeptide of 31.9 kDa. The predicted protein exhibits a low level of homology (19% identity of amino acid residues) to RNase HII encoded by rnhB of E. coli. Identification of the S. pneumoniae RNase HII translation start site by amino-terminal sequencing of the protein and of mRNA start sites by primer extension with reverse transcriptase showed that the major transcript encoding rnhB begins at the protein start site. Comparison of the S. pneumoniae and E. coli RNase HII sequences and sequences of other, putative bacterial rnhB gene products surmised from sequencing data revealed three conserved motifs. Use of these motifs to search for homologous genes in eucaryotes demonstrated the presence of rnhB genes in a yeast and a roundworm. Partial rnhB gene sequences were detected among expressed sequences of mouse and human cells. From these data, it appears that RNase HII is universally present in living cells. PMID:9190796

  16. Unique Structural Features and Sequence Motifs of Proline Utilization A (PutA)

    PubMed Central

    Singh, Ranjan K.; Tanner, John J.

    2013-01-01

    Proline utilization A proteins (PutAs) are bifunctional enzymes that catalyze the oxidation of proline to glutamate using spatially separated proline dehydrogenase and pyrroline-5-carboxylate dehydrogenase active sites. Here we use the crystal structure of the minimalist PutA from Bradyrhizobium japonicum (BjPutA) along with sequence analysis to identify unique structural features of PutAs. This analysis shows that PutAs have secondary structural elements and domains not found in the related monofunctional enzymes. Some of these extra features are predicted to be important for substrate channeling in BjPutA. Multiple sequence alignment analysis shows that some PutAs have a 17-residue conserved motif in the C-terminal 20–30 residues of the polypeptide chain. The BjPutA structure shows that this motif helps seal the internal substrate-channeling cavity from the bulk medium. Finally, it is shown that some PutAs have a 100–200 residue domain of unknown function in the C-terminus that is not found in minimalist PutAs. Remote homology detection suggests that this domain is homologous to the oligomerization beta-hairpin and Rossmann fold domain of BjPutA. PMID:22201760

  17. Unique structural features and sequence motifs of proline utilization A (PutA).

    PubMed

    Singh, Ranjan K; Tanner, John J

    2012-01-01

    Proline utilization A proteins (PutAs) are bifunctional enzymes that catalyze the oxidation of proline to glutamate using spatially separated proline dehydrogenase and pyrroline-5-carboxylate dehydrogenase active sites. Here we use the crystal structure of the minimalist PutA from Bradyrhizobium japonicum (BjPutA) along with sequence analysis to identify unique structural features of PutAs. This analysis shows that PutAs have secondary structural elements and domains not found in the related monofunctional enzymes. Some of these extra features are predicted to be important for substrate channeling in BjPutA. Multiple sequence alignment analysis shows that some PutAs have a 17-residue conserved motif in the C-terminal 20-30 residues of the polypeptide chain. The BjPutA structure shows that this motif helps seal the internal substrate-channeling cavity from the bulk medium. Finally, it is shown that some PutAs have a 100-200 residue domain of unknown function in the C-terminus that is not found in minimalist PutAs. Remote homology detection suggests that this domain is homologous to the oligomerization beta-hairpin and Rossmann fold domain of BjPutA. PMID:22201760

  18. A conserved motif mediates both multimer formation and allosteric activation of phosphoglycerate mutase 5.

    PubMed

    Wilkins, Jordan M; McConnell, Cyrus; Tipton, Peter A; Hannink, Mark

    2014-09-01

    Phosphoglycerate mutase 5 (PGAM5) is an atypical mitochondrial Ser/Thr phosphatase that modulates mitochondrial dynamics and participates in both apoptotic and necrotic cell death. The mechanisms that regulate the phosphatase activity of PGAM5 are poorly understood. The C-terminal phosphoglycerate mutase domain of PGAM5 shares homology with the catalytic domains found in other members of the phosphoglycerate mutase family, including a conserved histidine that is absolutely required for catalytic activity. However, this conserved domain is not sufficient for maximal phosphatase activity. We have identified a highly conserved amino acid motif, WDXNWD, located within the unique N-terminal region, which is required for assembly of PGAM5 into large multimeric complexes. Alanine substitutions within the WDXNWD motif abolish the formation of multimeric complexes and markedly reduce phosphatase activity of PGAM5. A peptide containing the WDXNWD motif dissociates the multimeric complex and reduces but does not fully abolish phosphatase activity. Addition of the WDXNWD-containing peptide in trans to a mutant PGAM5 protein lacking the WDXNWD motif markedly increases phosphatase activity of the mutant protein. Our results are consistent with an intermolecular allosteric regulation mechanism for the phosphatase activity of PGAM5, in which the assembly of PGAM5 into multimeric complexes, mediated by the WDXNWD motif, results in maximal activation of phosphatase activity. Our results suggest the possibility of identifying small molecules that function as allosteric regulators of the phosphatase activity of PGAM5. PMID:25012655

  19. Pseudouridine synthases: four families of enzymes containing a putative uridine-binding motif also conserved in dUTPases and dCTP deaminases.

    PubMed

    Koonin, E V

    1996-06-15

    Using a combination of several methods for protein sequence comparison and motif analysis, it is shown that the four recently described pseudouridine synthases with different specificities belong to four distinct families. Three of these families share two conserved motifs that are likely to be directly involved in catalysis. One of these motifs is detected also in two other families of enzymes that specifically bind uridine, namely deoxycytidine triphosphate deaminases and deoxyuridine triphosphatases. It is proposed that this motif is an essential part of the uridine-binding site. Two of the pseudouridine synthases, one of which modifies the anticodon arm of tRNAs and the other is predicted to modify a portion of the large ribosomal subunit RNA belonging to the peptidyltransferase center, are encoded in all extensively sequenced genomes, including the 'minimal' genome of Mycoplasma genitalium. These particular RNA modifications and the respective enzymes are likely to be essential for the functioning of any cell.

  20. Physical-chemical property based sequence motifs and methods regarding same

    DOEpatents

    Braun, Werner; Mathura, Venkatarajan S.; Schein, Catherine H.

    2008-09-09

    A data analysis system, program, and/or method, e.g., a data mining/data exploration method, using physical-chemical property motifs. For example, a sequence database may be searched for identifying segments thereof having physical-chemical properties similar to the physical-chemical property motifs.

  1. SVC: structured visualization of evolutionary sequence conservation.

    PubMed

    Roepcke, S; Fiziev, P; Seeburg, P H; Vingron, M

    2005-07-01

    We have developed a web application for the detailed analysis and visualization of evolutionary sequence conservation in complex vertebrate genes. Given a pair of orthologous genes, the protein-coding sequences are aligned. When these sequences are mapped back onto their encoding exons in the genomes, a scaffold of the conserved gene structure naturally emerges. Sequence similarity between exons and introns is analysed and embedded into the gene structure scaffold. The visualization on the SVC server provides detailed information about evolutionarily conserved features of these genes. It further allows concise representation of complex splice patterns in the context of evolutionary conservation. A particular application of our tool arises from the fact that around mRNA editing sites both exonic and intronic sequences are highly conserved. This aids in delineation of these sites. SVC is available at http://svc.molgen.mpg.de.

  2. The human homolog of a candidate mouse t complex responder gene: conserved motifs and evolution with punctuated equilibria.

    PubMed

    Islam, S D; Pilder, S H; Decker, C L; Cebra-Thomas, J A; Silver, L M

    1993-12-01

    The mouse Tcp-10 gene has been established as a molecular candidate for the t complex responder locus which plays a central role in the transmission ratio distortion phenotype expressed by males heterozygous for a t haplotype. Here we describe a comparison of the mouse and human TCP10 coding sequences. The results show that whole exons have been added or eliminated from the transcripts expressed in each species, suggesting an evolutionary process of punctuated equilibria for this gene. Two of the polypeptide regions that are most conserved between the two species contain specific peptide motifs. The conserved C-terminal region contains a unique nonapeptide repeat of unknown function and the conserved N-terminal region contains a pair of leucine zippers within a region that shows additional similarity to the coiled-coil regions of various cytosolic polypeptides. These results are discussed in terms of the possible function of the TCP10 protein.

  3. Nuclear Magnetic Resonance Structure of a Novel Globular Domain in RBM10 Containing OCRE, the Octamer Repeat Sequence Motif.

    PubMed

    Martin, Bryan T; Serrano, Pedro; Geralt, Michael; Wüthrich, Kurt

    2016-01-01

    The OCtamer REpeat (OCRE) has been annotated as a 42-residue sequence motif with 12 tyrosine residues in the spliceosome trans-regulatory elements RBM5 and RBM10 (RBM [RNA-binding motif]), which are known to regulate alternative splicing of Fas and Bcl-x pre-mRNA transcripts. Nuclear magnetic resonance structure determination showed that the RBM10 OCRE sequence motif is part of a 55-residue globular domain containing 16 aromatic amino acids, which consists of an anti-parallel arrangement of six β strands, with the first five strands containing complete or incomplete Tyr triplets. This OCRE globular domain is a distinctive component of RBM10 and is more widely conserved in RBM10s across the animal kingdom than the ubiquitous RNA recognition components. It is also found in the functionally related RBM5. Thus, it appears that the three-dimensional structure of the globular OCRE domain, rather than the 42-residue OCRE sequence motif alone, confers specificity on RBM10 intermolecular interactions in the spliceosome.

  4. Interaction of MYC with Host Cell Factor-1 is mediated by the evolutionarily-conserved Myc box IV motif

    PubMed Central

    Thomas, Lance R.; Foshage, Audra M.; Weissmiller, April M.; Popay, Tessa M.; Grieb, Brian C.; Qualls, Susan J.; Ng, Victoria; Carboneau, Bethany; Lorey, Shelly; Eischen, Christine M.; Tansey, William P.

    2015-01-01

    The MYC family of oncogenes encodes a set of three related transcription factors that are overexpressed in many human tumors and contribute to the cancer-related deaths of more than 70,000 Americans every year. MYC proteins drive tumorigenesis by interacting with co-factors that enable them to regulate the expression of thousands of genes linked to cell growth, proliferation, metabolism, and genome stability. One effective way to identify critical cofactors required for MYC function has been to focus on sequence motifs within MYC that are conserved throughout evolution, on the assumption that their conservation is driven by protein-protein interactions that are vital for MYC activity. In addition to their DNA-binding domains, MYC proteins carry five regions of high sequence conservation known as Myc boxes (Mb). To date, four of the Myc box motifs (MbI, MbII, MbIIIa, and MbIIIb) have had a molecular function assigned to them, but the precise role of the remaining Myc box, MbIV, and the reason for its preservation in vertebrate Myc proteins, is unknown. Here, we show that MbIV is required for the association of MYC with the abundant transcriptional coregulator host cell factor 1 (HCF-1). We show that the invariant core of MbIV resembles the tetrapeptide HCF-binding motif (HBM) found in many HCF-interaction partners, and demonstrate that MYC interacts with HCF in a manner indistinguishable from the prototypical HBM-containing protein VP16. Finally, we show that rationalized point mutations in MYC that disrupt interaction with HCF-1 attenuate the ability of MYC to drive tumorigenesis in mice. Together, these data expose a molecular function for MbIV and indicate that HCF-1 is an important co-factor for MYC. PMID:26522729

  5. Evolutionarily conserved sequences on human chromosome 21

    SciTech Connect

    Frazer, Kelly A.; Sheehan, John B.; Stokowski, Renee P.; Chen, Xiyin; Hosseini, Roya; Cheng, Jan-Fang; Fodor, Stephen P.A.; Cox, David R.; Patil, Nila

    2001-09-01

    Comparison of human sequences with the DNA of other mammals is an excellent means of identifying functional elements in the human genome. Here we describe the utility of high-density oligonucleotide arrays as a rapid approach for comparing human sequences with the DNA of multiple species whose sequences are not presently available. High-density arrays representing approximately 22.5 Mb of nonrepetitive human chromosome 21 sequence were synthesized and then hybridized with mouse and dog DNA to identify sequences conserved between humans and mice (human-mouse elements) and between humans and dogs (human-dog elements). Our data show that sequence comparison of multiple species provides a powerful empiric method for identifying actively conserved elements in the human genome. A large fraction of these evolutionarily conserved elements are present in regions on chromosome 21 that do not encode known genes.

  6. Analysis of Genomic Sequence Motifs for Deciphering Transcription Factor Binding and Transcriptional Regulation in Eukaryotic Cells

    PubMed Central

    Boeva, Valentina

    2016-01-01

    Eukaryotic genomes contain a variety of structured patterns: repetitive elements, binding sites of DNA and RNA associated proteins, splice sites, and so on. Often, these structured patterns can be formalized as motifs and described using a proper mathematical model such as position weight matrix and IUPAC consensus. Two key tasks are typically carried out for motifs in the context of the analysis of genomic sequences. These are: identification in a set of DNA regions of over-represented motifs from a particular motif database, and de novo discovery of over-represented motifs. Here we describe existing methodology to perform these two tasks for motifs characterizing transcription factor binding. When applied to the output of ChIP-seq and ChIP-exo experiments, or to promoter regions of co-modulated genes, motif analysis techniques allow for the prediction of transcription factor binding events and enable identification of transcriptional regulators and co-regulators. The usefulness of motif analysis is further exemplified in this review by how motif discovery improves peak calling in ChIP-seq and ChIP-exo experiments and, when coupled with information on gene expression, allows insights into physical mechanisms of transcriptional modulation. PMID:26941778
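
    The two motif models named in this abstract, the IUPAC consensus and the position weight matrix, can be illustrated with a short Python sketch. It is not taken from the review: the function names, the pseudocount of 1, and the uniform background of 0.25 are assumptions made for illustration. The consensus GRCGYC is the BsaHI recognition sequence quoted in the first record of this listing, and the four aligned sites are simply its four possible expansions.

```python
import math
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a compiled regular expression."""
    parts = []
    for symbol in consensus.upper():
        bases = IUPAC[symbol]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("".join(parts))


def find_sites(consensus, sequence):
    """Return 0-based start positions of all matches, overlapping ones included."""
    # A zero-width lookahead lets finditer report overlapping occurrences.
    lookahead = re.compile("(?=" + consensus_to_regex(consensus).pattern + ")")
    return [m.start() for m in lookahead.finditer(sequence.upper())]


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds position weight matrix from equal-length aligned sites.

    The pseudocount and the uniform background are illustrative assumptions.
    """
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def score(pwm, window):
    """Sum the per-position log-odds for a window as long as the matrix."""
    return sum(column[base] for column, base in zip(pwm, window))


if __name__ == "__main__":
    print(find_sites("GRCGYC", "TTGACGTCAAGGCGCCTT"))
    pwm = build_pwm(["GACGTC", "GGCGCC", "GACGCC", "GGCGTC"])
    print(round(score(pwm, "GACGTC"), 2), round(score(pwm, "TTTTTT"), 2))
```

    The consensus form gives a yes/no match at each position, whereas the matrix assigns a graded score, which is why the two are usually treated as complementary descriptions of the same motif.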

  7. Analysis of Genomic Sequence Motifs for Deciphering Transcription Factor Binding and Transcriptional Regulation in Eukaryotic Cells.

    PubMed

    Boeva, Valentina

    2016-01-01

    Eukaryotic genomes contain a variety of structured patterns: repetitive elements, binding sites of DNA and RNA associated proteins, splice sites, and so on. Often, these structured patterns can be formalized as motifs and described using a proper mathematical model such as position weight matrix and IUPAC consensus. Two key tasks are typically carried out for motifs in the context of the analysis of genomic sequences. These are: identification in a set of DNA regions of over-represented motifs from a particular motif database, and de novo discovery of over-represented motifs. Here we describe existing methodology to perform these two tasks for motifs characterizing transcription factor binding. When applied to the output of ChIP-seq and ChIP-exo experiments, or to promoter regions of co-modulated genes, motif analysis techniques allow for the prediction of transcription factor binding events and enable identification of transcriptional regulators and co-regulators. The usefulness of motif analysis is further exemplified in this review by how motif discovery improves peak calling in ChIP-seq and ChIP-exo experiments and, when coupled with information on gene expression, allows insights into physical mechanisms of transcriptional modulation.

  8. Repulsive parallel MCMC algorithm for discovering diverse motifs from large sequence sets

    PubMed Central

    Ikebata, Hisaki; Yoshida, Ryo

    2015-01-01

    Motivation: The motif discovery problem consists of finding recurring patterns of short strings in a set of nucleotide sequences. This classical problem is receiving renewed attention as most early motif discovery methods lack the ability to handle large data of recent genome-wide ChIP studies. New ChIP-tailored methods focus on reducing computation time and pay little regard to the accuracy of motif detection. Unlike such methods, our method focuses on increasing the detection accuracy while maintaining the computation efficiency at an acceptable level. The major advantage of our method is that it can mine diverse multiple motifs undetectable by current methods. Results: The repulsive parallel Markov chain Monte Carlo (RPMCMC) algorithm that we propose is a parallel version of the widely used Gibbs motif sampler. RPMCMC is run on parallel interacting motif samplers. A repulsive force is generated when different motifs produced by different samplers near each other. Thus, different samplers explore different motifs. In this way, we can detect much more diverse motifs than conventional methods can. Through application to 228 transcription factor ChIP-seq datasets of the ENCODE project, we show that the RPMCMC algorithm can find many reliable cofactor interacting motifs that existing methods are unable to discover. Availability and implementation: A C++ implementation of RPMCMC and discovered cofactor motifs for the 228 ENCODE ChIP-seq datasets are available from http://daweb.ism.ac.jp/yoshidalab/motif. Contact: ikebata.hisaki@ism.ac.jp, yoshidar@ism.ac.jp Supplementary information: Supplementary data are available from Bioinformatics online. PMID:25583120

  9. V1R promoters are well conserved and exhibit common putative regulatory motifs

    PubMed Central

    Stewart, Robert; Lane, Robert P

    2007-01-01

    Background The mouse vomeronasal organ (VNO) processes chemosensory information, including pheromone signals that influence reproductive behaviors. The sensory neurons of the VNO express two types of chemosensory receptors, V1R and V2R. There are ~165 V1R genes in the mouse genome that have been classified into ~12 divergent subfamilies. Each sensory neuron of the apical compartment of the VNO transcribes only one of the repertoire of V1R genes. A model for mutually exclusive V1R transcription in these cells has been proposed in which each V1R gene might compete stochastically for a single transcriptional complex. This model predicts that the large repertoire of divergent V1R genes in the mouse genome contains common regulatory elements. In this study, we have characterized V1R promoter regions by comparative genomics and by mapping transcription start sites. Results We find that transcription is initiated from ~1 kb promoter regions that are well conserved within V1R subfamilies. While cross-subfamily homology is not evident by traditional methods, we developed a heuristic motif-searching tool, LogoAlign, and applied this tool to identify motifs shared within the promoters of all V1R genes. Our motif-searching tool exhibits rapid convergence to a relatively small number of non-redundant solutions (97% convergence). We also find that the best motifs contain significantly more information than those identified in controls, and that these motifs are more likely to be found in the immediate vicinity of transcription start sites than elsewhere in gene blocks. The best motifs occur near transcription start sites of ~90% of all V1R genes and across all of the divergent subfamilies. Therefore, these motifs are candidate binding sites for transcription factors involved in V1R co-regulation. Conclusion Our analyses show that V1R subfamilies have broad and well conserved promoter regions from which transcription is initiated. 
Results from a new motif-finding algorithm, Logo

  10. A novel 43 kd protein binds a conserved Mammalian caccc motif within the Drosophila ras2/rop bidirectional promoter.

    PubMed

    Lightfoot, K; Duarte, R; Segev, O

    1995-11-01

    The Drosophila ras2 promoter is an authentic bidirectional promoter governing the expression of both the Dras2 and rop genes by a single mechanism. Characterisation of the Dras2/rop promoter has revealed that a unitary complex (M) interacts with two promoter sub-domains (regions A and B). Two distinct transcription factors (factors A and B), which make up the major complex (M), bind regions A and B, respectively. We have analyzed the putative CACCC element and AP-1-like sequence contained within region B (-41 to -20) of the Dras2/rop promoter. It was found that AP-1 is not involved in Dras2 expression as is the case for the human Ha-ras1 gene. The entire CACCC motif (-34 to -21) shares 83% homology with the conserved mammalian element. Detailed mutational analysis has however revealed that the CACCC core sequence (-27 to -23) is vital for Dras2/rop recognition by factor B. The cytosine residues at positions -27, -25, -24 and -23 were observed to play a critical role in factor B recognition. Factor B has been purified as a 43 kD polypeptide as measured by SDS-PAGE and the relative mass was confirmed by photo-chemical crosslinking. Our findings are the first report of the conservation of the mammalian CACCC motif in Drosophila.

  11. Modeling of the Ebola virus delta peptide reveals a potential lytic sequence motif.

    PubMed

    Gallaher, William R; Garry, Robert F

    2015-01-20

    Filoviruses, such as Ebola and Marburg viruses, cause severe outbreaks of human infection, including the extensive epidemic of Ebola virus disease (EVD) in West Africa in 2014. In the course of examining mutations in the glycoprotein gene associated with 2014 Ebola virus (EBOV) sequences, a differential level of conservation was noted between the soluble form of glycoprotein (sGP) and the full length glycoprotein (GP), which are both encoded by the GP gene via RNA editing. In the region of the proteins encoded after the RNA editing site sGP was more conserved than the overlapping region of GP when compared to a distant outlier species, Tai Forest ebolavirus. Half of the amino acids comprising the "delta peptide", a 40 amino acid carboxy-terminal fragment of sGP, were identical between otherwise widely divergent species. A lysine-rich amphipathic peptide motif was noted at the carboxyl terminus of delta peptide with high structural relatedness to the cytolytic peptide of the non-structural protein 4 (NSP4) of rotavirus. EBOV delta peptide is a candidate viroporin, a cationic pore-forming peptide, and may contribute to EBOV pathogenesis.

  12. Modeling of the Ebola Virus Delta Peptide Reveals a Potential Lytic Sequence Motif

    PubMed Central

    Gallaher, William R.; Garry, Robert F.

    2015-01-01

    Filoviruses, such as Ebola and Marburg viruses, cause severe outbreaks of human infection, including the extensive epidemic of Ebola virus disease (EVD) in West Africa in 2014. In the course of examining mutations in the glycoprotein gene associated with 2014 Ebola virus (EBOV) sequences, a differential level of conservation was noted between the soluble form of glycoprotein (sGP) and the full length glycoprotein (GP), which are both encoded by the GP gene via RNA editing. In the region of the proteins encoded after the RNA editing site sGP was more conserved than the overlapping region of GP when compared to a distant outlier species, Tai Forest ebolavirus. Half of the amino acids comprising the “delta peptide”, a 40 amino acid carboxy-terminal fragment of sGP, were identical between otherwise widely divergent species. A lysine-rich amphipathic peptide motif was noted at the carboxyl terminus of delta peptide with high structural relatedness to the cytolytic peptide of the non-structural protein 4 (NSP4) of rotavirus. EBOV delta peptide is a candidate viroporin, a cationic pore-forming peptide, and may contribute to EBOV pathogenesis. PMID:25609303

  13. Sequence fingerprints of microRNA conservation.

    PubMed

    Shi, Bing; Gao, Wei; Wang, Juan

    2012-01-01

    It is known that the conservation of protein-coding genes is associated with their sequences in various species, such as animals and plants. However, the association between microRNA (miRNA) conservation and their sequences in various species remains unexplored. Here we report the association of miRNA conservation with its sequence features, such as base content and cleavage sites, suggesting that miRNA sequences contain the fingerprints for miRNA conservation. More interestingly, different species show different and even opposite patterns between miRNA conservation and sequence features. For example, mammalian miRNAs show a positive/negative correlation between conservation and AU/GC content, whereas plant miRNAs show a negative/positive correlation between conservation and AU/GC content. Further analysis puts forward the hypothesis that the introns of protein-coding genes may be a main driving force for the origin and evolution of mammalian miRNAs. At the 5' end, conserved miRNAs have a preference for base U, while less-conserved miRNAs have a preference for a non-U base in mammals. This difference does not exist in insects and plants, in which both conserved miRNAs and less-conserved miRNAs have a preference for base U at the 5' end. We further revealed that the non-U preference at the 5' end of less-conserved mammalian miRNAs is associated with miRNA function diversity, which may have evolved from the pressure of a highly sophisticated environmental stimulus the mammals encountered during evolution. These results indicated that miRNA sequences contain the fingerprints for conservation, and these fingerprints vary according to species. More importantly, the results suggest that although species share common mechanisms by which miRNAs originate and evolve, mammals may develop a novel mechanism for miRNA origin and evolution. In addition, the fingerprint found in this study can be a predictor of miRNA conservation, and the findings are helpful in achieving a

  14. PROMOT: a FORTRAN program to scan protein sequences against a library of known motifs.

    PubMed

    Sternberg, M J

    1991-04-01

    Information about the three-dimensional structure or function of a newly determined protein sequence can be obtained if the protein is found to contain a characterized motif or pattern of residues. Recently a database (PROSITE) has been established that contains 337 known motifs encoded as a list of allowed residue types at specific positions along the sequence. PROMOT is a FORTRAN computer program that takes a protein sequence and examines if it contains any of the motifs in PROSITE. The program also extends the definitions of patterns beyond those used in PROSITE to provide a simple, yet flexible, method to scan either a PROSITE or a user-defined pattern against a protein sequence database.
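
    The kind of scan described here, a pattern of allowed residue types at specific positions run against a protein sequence, can be sketched in Python by translating the pattern into a regular expression. This is not PROMOT's FORTRAN code and does not reproduce its extended pattern definitions; the function names are hypothetical, and only a commonly used subset of the PROSITE pattern syntax is assumed (x for any residue, [..] for allowed residues, {..} for excluded residues, (n) or (n,m) for repeats, < and > for terminal anchors).

```python
import re


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern such as 'N-{P}-[ST]-{P}' to a regex string."""
    pattern = pattern.rstrip(".")  # published patterns often end with a period
    regex = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count: x(2) or x(2,4).
        repeat = ""
        counted = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if counted:
            element, repeat = counted.group(1), "{" + counted.group(2) + "}"
        if element == "x":  # any residue
            core = "."
        elif element.startswith("{") and element.endswith("}"):  # excluded residues
            core = "[^" + element[1:-1] + "]"
        else:  # a single residue or an [..] set of allowed residues
            core = element
        regex.append(prefix + core + repeat + suffix)
    return "".join(regex)


def scan(sequence, pattern):
    """Return (0-based start, matched text) for every hit, overlapping ones included."""
    compiled = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in compiled.finditer(sequence)]


if __name__ == "__main__":
    # N-{P}-[ST]-{P} is the PROSITE N-glycosylation site pattern;
    # the peptide below is an invented test string, not a real protein.
    print(prosite_to_regex("N-{P}-[ST]-{P}"))
    print(scan("MKNGSAANPSQNVTP", "N-{P}-[ST]-{P}"))
```

    Because the pattern is passed in as a string, the same two functions serve for a library pattern or a user-defined one.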

  15. REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads

    PubMed Central

    Chu, Chong; Nielsen, Rasmus; Wu, Yufeng

    2016-01-01

    Repeat elements are important components of eukaryotic genomes. One limitation in our understanding of repeat elements is that most analyses rely on reference genomes that are incomplete and often contain missing data in highly repetitive regions that are difficult to assemble. To overcome this problem we develop a new method, REPdenovo, which assembles repeat sequences directly from raw shotgun sequencing data. REPdenovo can construct various types of repeats that are highly repetitive and have low sequence divergence within copies. We show that REPdenovo is substantially better than existing methods both in terms of the number and the completeness of the repeat sequences that it recovers. The key advantage of REPdenovo is that it can reconstruct long repeats from sequence reads. We apply the method to human data and discover a number of potentially new repeats sequences that have been missed by previous repeat annotations. Many of these sequences are incorporated into various parasite genomes, possibly because the filtering process for host DNA involved in the sequencing of the parasite genomes failed to exclude the host derived repeat sequences. REPdenovo is a new powerful computational tool for annotating genomes and for addressing questions regarding the evolution of repeat families. The software tool, REPdenovo, is available for download at https://github.com/Reedwarbler/REPdenovo. PMID:26977803

  16. Identification of disease-specific motifs in the antibody specificity repertoire via next-generation sequencing.

    PubMed

    Pantazes, Robert J; Reifert, Jack; Bozekowski, Joel; Ibsen, Kelly N; Murray, Joseph A; Daugherty, Patrick S

    2016-01-01

    Disease-specific antibodies can serve as highly effective biomarkers but have been identified for only a relatively small number of autoimmune diseases. A method was developed to identify disease-specific binding motifs through integration of bacterial display peptide library screening, next-generation sequencing (NGS) and computational analysis. Antibody specificity repertoires were determined by identifying bound peptide library members for each specimen using cell sorting and performing NGS. A computational algorithm, termed Identifying Motifs Using Next-generation sequencing Experiments (IMUNE), was developed and applied to discover disease- and healthy control-specific motifs. IMUNE performs comprehensive pattern searches, identifies patterns statistically enriched in the disease or control groups and clusters the patterns to generate motifs. Using celiac disease sera as a discovery set, IMUNE identified a consensus motif (QPEQPF[PS]E) with high diagnostic sensitivity and specificity in a validation sera set, in addition to novel motifs. Peptide display and sequencing (Display-Seq) coupled with IMUNE analysis may thus be useful to characterize antibody repertoires and identify disease-specific antibody epitopes and biomarkers. PMID:27481573

  17. Identification of disease-specific motifs in the antibody specificity repertoire via next-generation sequencing

    PubMed Central

    Pantazes, Robert J.; Reifert, Jack; Bozekowski, Joel; Ibsen, Kelly N.; Murray, Joseph A.; Daugherty, Patrick S.

    2016-01-01

    Disease-specific antibodies can serve as highly effective biomarkers but have been identified for only a relatively small number of autoimmune diseases. A method was developed to identify disease-specific binding motifs through integration of bacterial display peptide library screening, next-generation sequencing (NGS) and computational analysis. Antibody specificity repertoires were determined by identifying bound peptide library members for each specimen using cell sorting and performing NGS. A computational algorithm, termed Identifying Motifs Using Next-generation sequencing Experiments (IMUNE), was developed and applied to discover disease- and healthy control-specific motifs. IMUNE performs comprehensive pattern searches, identifies patterns statistically enriched in the disease or control groups and clusters the patterns to generate motifs. Using celiac disease sera as a discovery set, IMUNE identified a consensus motif (QPEQPF[PS]E) with high diagnostic sensitivity and specificity in a validation sera set, in addition to novel motifs. Peptide display and sequencing (Display-Seq) coupled with IMUNE analysis may thus be useful to characterize antibody repertoires and identify disease-specific antibody epitopes and biomarkers. PMID:27481573

  18. Conserved Hydration Sites in Pin1 Reveal a Distinctive Water Recognition Motif in Proteins.

    PubMed

    Barman, Arghya; Smitherman, Crystal; Souffrant, Michael; Gadda, Giovanni; Hamelberg, Donald

    2016-01-25

    Structurally conserved water molecules are important for biomolecular stability, flexibility, and function. X-ray crystallographic studies of Pin1 have resolved a number of water molecules around the enzyme, including two highly conserved water molecules within the protein. The functional role of these localized water molecules remains unknown and unexplored. Pin1 catalyzes cis/trans isomerizations of peptidyl prolyl bonds that are preceded by a phosphorylated serine or threonine residue. Pin1 is involved in many subcellular signaling processes and is a potential therapeutic target for the treatment of several life threatening diseases. Here, we investigate the significance of these structurally conserved water molecules in the catalytic domain of Pin1 using molecular dynamics (MD) simulations, free energy calculations, analysis of X-ray crystal structures, and circular dichroism (CD) experiments. MD simulations and free energy calculations suggest the tighter binding water molecule plays a crucial role in maintaining the integrity and stability of a critical hydrogen-bonding network in the active site. The second water molecule is exchangeable with bulk solvent and is found in a distinctive helix-turn-coil motif. Structural bioinformatics analysis of nonredundant X-ray crystallographic protein structures in the Protein Data Bank (PDB) suggest this motif is present in several other proteins and can act as a water site, akin to the calcium EF hand. CD experiments suggest the isolated motif is in a distorted PII conformation and requires the protein environment to fully form the α-helix-turn-coil motif. This study provides valuable insights into the role of hydration in the structural integrity of Pin1 that can be exploited in protein engineering and drug design. PMID:26651388

  19. Roquin promotes constitutive mRNA decay via a conserved class of stem-loop recognition motifs.

    PubMed

    Leppek, Kathrin; Schott, Johanna; Reitter, Sonja; Poetz, Fabian; Hammond, Ming C; Stoecklin, Georg

    2013-05-01

    Tumor necrosis factor-α (TNF-α) is the most potent proinflammatory cytokine in mammals. The degradation of TNF-α mRNA is critical for restricting TNF-α synthesis and involves a constitutive decay element (CDE) in the 3' UTR of the mRNA. Here, we demonstrate that the CDE folds into an RNA stem-loop motif that is specifically recognized by Roquin and Roquin2. Binding of Roquin initiates degradation of TNF-α mRNA and limits TNF-α production in macrophages. Roquin proteins promote mRNA degradation by recruiting the Ccr4-Caf1-Not deadenylase complex. CDE sequences are highly conserved and are found in more than 50 vertebrate mRNAs, many of which encode regulators of development and inflammation. In macrophages, CDE-containing mRNAs were identified as the primary targets of Roquin on a transcriptome-wide scale. Thus, Roquin proteins act broadly as mediators of mRNA deadenylation by recognizing a conserved class of stem-loop RNA degradation motifs.

  20. Sequence Motifs in Transit Peptides Act as Independent Functional Units and Can Be Transferred to New Sequence Contexts.

    PubMed

    Lee, Dong Wook; Woo, Seungjin; Geem, Kyoung Rok; Hwang, Inhwan

    2015-09-01

    A large number of nuclear-encoded proteins are imported into chloroplasts after they are translated in the cytosol. Import is mediated by transit peptides (TPs) at the N termini of these proteins. TPs contain many small motifs, each of which is critical for a specific step in the process of chloroplast protein import; however, it remains unknown how these motifs are organized to give rise to TPs with diverse sequences. In this study, we generated various hybrid TPs by swapping domains between Rubisco small subunit (RbcS) and chlorophyll a/b-binding protein, which have highly divergent sequences, and examined the abilities of the resultant TPs to deliver proteins into chloroplasts. Subsequently, we compared the functionality of sequence motifs in the hybrid TPs with those of wild-type TPs. The sequence motifs in the hybrid TPs exhibited three different modes of functionality, depending on their domain composition, as follows: active in both wild-type and hybrid TPs, active in wild-type TPs but inactive in hybrid TPs, and inactive in wild-type TPs but active in hybrid TPs. Moreover, synthetic TPs, in which only three critical motifs from RbcS or chlorophyll a/b-binding protein TPs were incorporated into an unrelated sequence, were able to deliver clients to chloroplasts with a comparable efficiency to RbcS TP. Based on these results, we propose that diverse sequence motifs in TPs are independent functional units that interact with specific translocon components at various steps during protein import and can be transferred to new sequence contexts. PMID:26149569

  1. Using a color-coded ambigraphic nucleic acid notation to visualize conserved palindromic motifs within and across genomes

    PubMed Central

    2014-01-01

    Background Ambiscript is a graphically-designed nucleic acid notation that uses symbol symmetries to support sequence complementation, highlight biologically-relevant palindromes, and facilitate the analysis of consensus sequences. Although the original Ambiscript notation was designed to easily represent consensus sequences for multiple sequence alignments, the notation’s black-on-white ambiguity characters are unable to reflect the statistical distribution of nucleotides found at each position. We now propose a color-augmented ambigraphic notation to encode the frequency of positional polymorphisms in these consensus sequences. Results We have implemented this color-coding approach by creating an Adobe Flash® application (http://www.ambiscript.org) that shades and colors modified Ambiscript characters according to the prevalence of the encoded nucleotide at each position in the alignment. The resulting graphic helps viewers perceive biologically-relevant patterns in multiple sequence alignments by uniquely combining color, shading, and character symmetries to highlight palindromes and inverted repeats in conserved DNA motifs. Conclusion Juxtaposing an intuitive color scheme over the deliberate character symmetries of an ambigraphic nucleic acid notation yields a highly-functional nucleic acid notation that maximizes information content and successfully embodies key principles of graphic excellence put forth by the statistician and graphic design theorist, Edward Tufte. PMID:24447494
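
    The palindromes this notation is designed to highlight are DNA stretches that read the same as their own reverse complement. A minimal Python sketch of that test follows; it is unrelated to the Ambiscript application itself, the function names and length limits are assumptions, and it handles only unambiguous A, C, G and T.

```python
# Complement table for unambiguous bases only; ambiguity codes are not handled.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_palindromes(seq, min_len=4, max_len=12):
    """Return (0-based start, site) for each window equal to its reverse complement.

    Only even lengths are checked, because an odd-length window would need a
    middle base that is its own complement.
    """
    seq = seq.upper()
    hits = []
    first = min_len + (min_len % 2)  # round an odd minimum up to the next even length
    for length in range(first, max_len + 1, 2):
        for start in range(len(seq) - length + 1):
            window = seq[start:start + length]
            if window == reverse_complement(window):
                hits.append((start, window))
    return hits


if __name__ == "__main__":
    # GAATTC (the EcoRI site) embedded in an invented flanking sequence.
    print(find_palindromes("TTGAATTCAA", min_len=6, max_len=6))
```

    Inverted repeats separated by a spacer would need a further step that compares two arms across a gap, which this sketch does not attempt.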

  2. Discovery of conserved motifs in promoters of orthologous genes in prokaryotes.

    PubMed

    Janky, Rekin's; van Helden, Jacques

    2007-01-01

    We present a method to predict cis-acting elements for a given gene by detecting over-represented motifs in promoters of a set of orthologous genes in prokaryotes (single-gene, multiple-genomes approach). The method has been used successfully to detect regulatory elements at various taxonomical levels in prokaryotes. A web interface is available at the Regulatory Sequence Analysis Tools site (http://rsat.scmbb.ulb.ac.be/rsat/).

  3. Sequence conservation on the Y chromosome

    SciTech Connect

    Gibson, L.H.; Yang-Feng, L.; Lau, C.

    1994-09-01

    The Y chromosome is present in all mammals and is considered to be essential to sex determination. Despite intense genomic research, only a few genes have been identified and mapped to this chromosome in humans. Several of them, such as SRY and ZFY, have been demonstrated to be conserved and Y-located in other mammals. In order to address the issue of sequence conservation on the Y chromosome, we performed fluorescence in situ hybridization (FISH) with DNA from a human Y cosmid library as a probe to study the Y chromosomes from other mammalian species. Total DNA from 3,000-4,500 cosmid pools were labeled with biotinylated-dUTP and hybridized to metaphase chromosomes. For human and primate preparations, human cot1 DNA was included in the hybridization mixture to suppress the hybridization from repeat sequences. FISH signals were detected on the Y chromosomes of human, gorilla, orangutan and baboon (Old World monkey) and were absent on those of squirrel monkey (New World monkey), Indian muntjac, wood lemming, Chinese hamster, rat and mouse. Since sequence analysis suggested that specific genes, e.g. SRY and ZFY, are conserved between these two groups, the lack of detectable hybridization in the latter group implies either that conservation of the human Y sequences is limited to the Y chromosomes of the great apes and Old World monkeys, or that the size of the syntenic segment is too small to be detected under the resolution of FISH, or that homologous sequences have undergone considerable divergence. Further studies with reduced hybridization stringency are currently being conducted. Our results provide some clues as to Y-sequence conservation across species and demonstrate the limitations of FISH across species with total DNA sequences from a particular chromosome.

  4. Mutational analysis of two highly conserved motifs in the silencing suppressor encoded by tomato spotted wilt virus (genus Tospovirus, family Bunyaviridae).

    PubMed

    Zhai, Ying; Bag, Sudeep; Mitter, Neena; Turina, Massimo; Pappu, Hanu R

    2014-06-01

    Tospoviruses cause serious economic losses to a wide range of field and horticultural crops on a global scale. The NSs gene encoded by tospoviruses acts as a suppressor of host plant defense. We identified amino acid motifs that are conserved in all of the NSs proteins of tospoviruses for which the sequence is known. Using tomato spotted wilt virus (TSWV) as a model, the role of these motifs in suppressor activity of NSs was investigated. Using site-directed point mutations in two conserved motifs, glycine, lysine and valine/threonine (GKV/T) at positions 181-183 and tyrosine and leucine (YL) at positions 412-413, and an assay to measure the reversal of gene silencing in Nicotiana benthamiana line 16c, we show that substitutions (K182 to A, and L413 to A) in these motifs abolished suppressor activity of the NSs protein, indicating that these two motifs are essential for the RNAi suppressor function of tospoviruses. PMID:24363189

  5. Viroids: from genotype to phenotype just relying on RNA sequence and structural motifs.

    PubMed

    Flores, Ricardo; Serra, Pedro; Minoia, Sofía; Di Serio, Francesco; Navarro, Beatriz

    2012-01-01

    As a consequence of two unique physical properties, small size and circularity, viroid RNAs do not code for proteins and thus depend on RNA sequence/structural motifs for interacting with host proteins that mediate their invasion, replication, spread, and circumvention of defensive barriers. Viroid genomes fold up on themselves adopting collapsed secondary structures wherein stretches of nucleotides stabilized by Watson-Crick pairs are flanked by apparently unstructured loops. However, compelling data show that they are instead stabilized by alternative non-canonical pairs and that specific loops in the rod-like secondary structure, characteristic of Potato spindle tuber viroid and most other members of the family Pospiviroidae, are critical for replication and systemic trafficking. In contrast, rather than folding into a rod-like secondary structure, most members of the family Avsunviroidae adopt multibranched conformations occasionally stabilized by kissing-loop interactions critical for viroid viability in vivo. Besides these most stable secondary structures, viroid RNAs alternatively adopt during replication transient metastable conformations containing elements of local higher-order structure, prominent among which are the hammerhead ribozymes catalyzing a key replicative step in the family Avsunviroidae, and certain conserved hairpins that also mediate replication steps in the family Pospiviroidae. Therefore, different RNA structures - either global or local - determine different functions, thus highlighting the need for in-depth structural studies on viroid RNAs.

  6. Stanniocalcin 1 binds hemin through a partially conserved heme regulatory motif

    SciTech Connect

    Westberg, Johan A.; Jiang, Ji; Andersson, Leif C.

    2011-06-03

    Highlights: Stanniocalcin 1 (STC1) binds heme through a novel heme binding motif. The central iron atom of heme and cysteine-114 of STC1 are essential for binding. STC1 binds Fe²⁺ and Fe³⁺ heme. An STC1 peptide prevents oxidative decay of heme. Abstract: Hemin (iron protoporphyrin IX) is a necessary component of many proteins, functioning either as a cofactor or an intracellular messenger. Hemoproteins have diverse functions, such as transportation of gases, gas detection, chemical catalysis and electron transfer. Stanniocalcin 1 (STC1) is a protein involved in respiratory responses of the cell but whose mechanism of action is still undetermined. We examined the ability of STC1 to bind hemin in both its reduced and oxidized states and located Cys¹¹⁴ as the axial ligand of the central iron atom of hemin. The amino acid sequence differs from the established (Cys-Pro) heme regulatory motif (HRM) and therefore presents a novel heme binding motif (Cys-Ser). An STC1 peptide containing the heme binding sequence was able to inhibit both spontaneous and H₂O₂-induced decay of hemin. Binding of hemin does not affect the mitochondrial localization of STC1.

  7. Membrane localization of MinD is mediated by a C-terminal motif that is conserved across eubacteria, archaea, and chloroplasts.

    PubMed

    Szeto, Tim H; Rowland, Susan L; Rothfield, Lawrence I; King, Glenn F

    2002-11-26

    MinD is a widely conserved ATPase that has been demonstrated to play a pivotal role in selection of the division site in eubacteria and chloroplasts. It is a member of the large ParA superfamily of ATPases that are characterized by a deviant Walker-type ATP-binding motif. MinD localizes to the cytoplasmic face of the inner membrane in Escherichia coli, and its association with the inner membrane is a prerequisite for membrane recruitment of the septation inhibitor MinC. However, the mechanism by which MinD associates with the membrane has proved enigmatic; it seems to lack a transmembrane domain and the amino acid sequence is devoid of hydrophobic tracts that might predispose the protein to interaction with lipids. In this study, we show that the extreme C-terminal region of MinD contains a highly conserved 8- to 12-residue sequence motif that is essential for membrane localization of the protein. We provide evidence that this motif forms an amphipathic helix that most likely mediates a direct interaction between MinD and membrane phospholipids. A model is proposed whereby the membrane-targeting motif mediates the rapid cycles of membrane attachment-release-reattachment that are presumed to occur during pole-to-pole oscillation of MinD in E. coli. PMID:12424340

  8. Role of conserved intracellular motifs in Serrate signalling, cis-inhibition and endocytosis

    PubMed Central

    Glittenberg, Marcus; Pitsouli, Chrysoula; Garvey, Clare; Delidakis, Christos; Bray, Sarah

    2006-01-01

    Notch is the receptor in a signalling pathway that operates in a diverse spectrum of developmental processes. Its ligands (e.g. Serrate) are transmembrane proteins whose signalling competence is regulated by the endocytosis-promoting E3 ubiquitin ligases, Mindbomb1 and Neuralized. The ligands also inhibit Notch present in the same cell (cis-inhibition). Here, we identify two conserved motifs in the intracellular domain of Serrate that are required for efficient endocytosis. The first, a dileucine motif, is dispensable for trans-activation and cis-inhibition despite the endocytic defect, demonstrating that signalling can be separated from bulk endocytosis. The second, a novel motif, is necessary for interactions with Mindbomb1/Neuralized and is strictly required for Serrate to trans-activate and internalise efficiently but not for it to inhibit Notch signalling. Cis-inhibition is compromised when an ER retention signal is added to Serrate, or when the levels of Neuralized are increased, and together these data indicate that cis-inhibitory interactions occur at the cell surface. The balance of ubiquitinated/unubiquitinated ligand will thus affect the signalling capacity of the cell at several levels. PMID:17006545

  9. Phylum-Level Conservation of Regulatory Information in Nematodes despite Extensive Non-coding Sequence Divergence

    PubMed Central

    Gordon, Kacy L.; Arthur, Robert K.; Ruvinsky, Ilya

    2015-01-01

    Gene regulatory information guides development and shapes the course of evolution. To test conservation of gene regulation within the phylum Nematoda, we compared the functions of putative cis-regulatory sequences of four sets of orthologs (unc-47, unc-25, mec-3 and elt-2) from distantly-related nematode species. These species, Caenorhabditis elegans, its congeneric C. briggsae, and three parasitic species Meloidogyne hapla, Brugia malayi, and Trichinella spiralis, represent four of the five major clades in the phylum Nematoda. Despite the great phylogenetic distances sampled and the extensive sequence divergence of nematode genomes, all but one of the regulatory elements we tested are able to drive at least a subset of the expected gene expression patterns. We show that functionally conserved cis-regulatory elements have no more extended sequence similarity to their C. elegans orthologs than would be expected by chance, but they do harbor motifs that are important for proper expression of the C. elegans genes. These motifs are too short to be distinguished from the background level of sequence similarity, and while identical in sequence they are not conserved in orientation or position. Functional tests reveal that some of these motifs contribute to proper expression. Our results suggest that conserved regulatory circuitry can persist despite considerable turnover within cis elements. PMID:26020930

  10. Phylum-Level Conservation of Regulatory Information in Nematodes despite Extensive Non-coding Sequence Divergence.

    PubMed

    Gordon, Kacy L; Arthur, Robert K; Ruvinsky, Ilya

    2015-05-01

    Gene regulatory information guides development and shapes the course of evolution. To test conservation of gene regulation within the phylum Nematoda, we compared the functions of putative cis-regulatory sequences of four sets of orthologs (unc-47, unc-25, mec-3 and elt-2) from distantly-related nematode species. These species, Caenorhabditis elegans, its congeneric C. briggsae, and three parasitic species Meloidogyne hapla, Brugia malayi, and Trichinella spiralis, represent four of the five major clades in the phylum Nematoda. Despite the great phylogenetic distances sampled and the extensive sequence divergence of nematode genomes, all but one of the regulatory elements we tested are able to drive at least a subset of the expected gene expression patterns. We show that functionally conserved cis-regulatory elements have no more extended sequence similarity to their C. elegans orthologs than would be expected by chance, but they do harbor motifs that are important for proper expression of the C. elegans genes. These motifs are too short to be distinguished from the background level of sequence similarity, and while identical in sequence they are not conserved in orientation or position. Functional tests reveal that some of these motifs contribute to proper expression. Our results suggest that conserved regulatory circuitry can persist despite considerable turnover within cis elements.

  11. Novel hexamerization motif is discovered in a conserved cytoplasmic protein from Salmonella typhimurium.

    SciTech Connect

    Petrova, T.; Cuff, M.; Wu, R.; Kim, Y.; Holzle, D.; Joachimiak, A.; Biosciences Division; Inst. of Mathematical Problems of Biology

    2007-01-01

    The structure of Stm3548, a cytoplasmic protein of unknown function from a strain of Salmonella typhimurium, was determined by X-ray crystallography at a resolution of 2.25 Å. The asymmetric unit contains a hexamer of structurally identical monomers. The monomer is a globular domain with a long beta-hairpin protrusion that distinguishes this structure. This beta-hairpin occupies a central position in the hexamer, and its residues participate in the majority of interactions between subunits of the hexamer. We suggest that the structure of Stm3548 presents a new hexamerization motif. Because the residues participating in interdomain interactions are highly conserved among close members of protein family DUF1355 and the buried solvent-accessible area for the hexamer is significant, the hexamer is most likely conserved as well. A light scattering experiment confirmed the presence of the hexamer in solution.

  12. Discovery of sequence motifs related to coexpression of genes using evolutionary computation

    PubMed Central

    Fogel, Gary B.; Weekes, Dana G.; Varga, Gabor; Dow, Ernst R.; Harlow, Harry B.; Onyia, Jude E.; Su, Chen

    2004-01-01

    Transcription factors are key regulatory elements that control gene expression. Recognition of transcription factor binding site (TFBS) motifs in the upstream region of coexpressed genes is therefore critical to understanding the regulation of gene expression. Discovering eukaryotic TFBSs remains a challenging problem. Here, we demonstrate that evolutionary computation can be used to search for TFBSs in upstream regions of genes known to be coexpressed. Evolutionary computation was used to search for TFBSs of genes regulated by octamer-binding factor and nuclear factor kappa B. The discovered binding sites included experimentally determined known binding motifs as well as lists of putative, previously unknown TFBSs. We believe that this method of efficiently searching nucleotide sequence information for similar motifs will be useful for discovering TFBSs that affect gene regulation. PMID:15266008

  13. A Conserved GPG-Motif in the HIV-1 Nef Core Is Required for Principal Nef-Activities.

    PubMed

    Martínez-Bonet, Marta; Palladino, Claudia; Briz, Veronica; Rudolph, Jochen M; Fackler, Oliver T; Relloso, Miguel; Muñoz-Fernandez, Maria Angeles; Madrid, Ricardo

    2015-01-01

    To identify new determinants required for Nef activity, we performed a functional alanine scanning analysis along a discrete but highly conserved region at the core of HIV-1 Nef. We identified the GPG-motif, located in the 121-137 region of HIV-1 NL4.3 Nef, as a novel protein signature strictly required for the p56Lck dependent Nef-induced CD4-downregulation in T-cells. Since the Nef-GPG motif was dispensable for CD4-downregulation in HeLa-CD4 cells, Nef/AP-1 interaction and Nef-dependent effects on Tf-R trafficking, the observed effects on CD4 downregulation cannot be attributed to structural constraints or to alterations in general protein trafficking. In addition, we found that the GPG-motif was also required for Nef-dependent inhibition of ring actin re-organization upon TCR triggering and MHCI downregulation, suggesting that the GPG-motif could actively cooperate with the Nef PxxP motif for these HIV-1 Nef-related effects. Finally, we observed that the Nef-GPG motif was required for optimal infectivity of those viruses produced in T-cells. According to these findings, we propose the conserved GPG-motif in HIV-1 Nef as a functional region required for HIV-1 infectivity and therefore of potential interest for interfering with Nef activity during HIV-1 infection. PMID:26700863

  14. A Conserved GPG-Motif in the HIV-1 Nef Core Is Required for Principal Nef-Activities

    PubMed Central

    Martínez-Bonet, Marta; Palladino, Claudia; Briz, Veronica; Rudolph, Jochen M.; Fackler, Oliver T.; Relloso, Miguel; Muñoz-Fernandez, Maria Angeles; Madrid, Ricardo

    2015-01-01

    To identify new determinants required for Nef activity, we performed a functional alanine scanning analysis along a discrete but highly conserved region at the core of HIV-1 Nef. We identified the GPG-motif, located in the 121–137 region of HIV-1 NL4.3 Nef, as a novel protein signature strictly required for the p56Lck dependent Nef-induced CD4-downregulation in T-cells. Since the Nef-GPG motif was dispensable for CD4-downregulation in HeLa-CD4 cells, Nef/AP-1 interaction and Nef-dependent effects on Tf-R trafficking, the observed effects on CD4 downregulation cannot be attributed to structural constraints or to alterations in general protein trafficking. In addition, we found that the GPG-motif was also required for Nef-dependent inhibition of ring actin re-organization upon TCR triggering and MHCI downregulation, suggesting that the GPG-motif could actively cooperate with the Nef PxxP motif for these HIV-1 Nef-related effects. Finally, we observed that the Nef-GPG motif was required for optimal infectivity of those viruses produced in T-cells. According to these findings, we propose the conserved GPG-motif in HIV-1 Nef as a functional region required for HIV-1 infectivity and therefore of potential interest for interfering with Nef activity during HIV-1 infection. PMID:26700863

  15. Using machine learning to predict gene expression and discover sequence motifs

    NASA Astrophysics Data System (ADS)

    Li, Xuejing

    Recently, large amounts of experimental data for complex biological systems have become available. We use tools and algorithms from machine learning to build data-driven predictive models. We first present a novel algorithm to discover gene sequence motifs associated with temporal expression patterns of genes. Our algorithm, which is based on partial least squares (PLS) regression, is able to directly model the flow of information, from gene sequence to gene expression, to learn cis regulatory motifs and characterize associated gene expression patterns. Our algorithm outperforms traditional computational methods, e.g. clustering, in motif discovery. We then present a study that extends a machine learning model of transcriptional regulation, predictive of genetic regulatory response, to Caenorhabditis elegans. We show meaningful results both in terms of prediction accuracy on the test experiments and biological information extracted from the regulatory program. The model discovers DNA binding sites ab initio. We also present a case study where we detect a signal of lineage-specific regulation. Finally, we present a comparative study on learning predictive models for motif discovery, based on different boosting algorithms: Adaptive Boosting (AdaBoost), Linear Programming Boosting (LPBoost) and Totally Corrective Boosting (TotalBoost). We evaluate and compare the performance of the three boosting algorithms via both statistical and biological validation, for hypoxia response in Saccharomyces cerevisiae.

  16. A Conserved Three-nucleotide Core Motif Defines Musashi RNA Binding Specificity

    PubMed Central

    Zearfoss, N. Ruth; Deveau, Laura M.; Clingman, Carina C.; Schmidt, Eric; Johnson, Emily S.; Massi, Francesca; Ryder, Sean P.

    2014-01-01

    Musashi (MSI) family proteins control cell proliferation and differentiation in many biological systems. They are overexpressed in tumors of several origins, and their expression level correlates with poor prognosis. MSI proteins control gene expression by binding RNA and regulating its translation. They contain two RNA recognition motif (RRM) domains, which recognize a defined sequence element. The relative contribution of each nucleotide to the binding affinity and specificity is unknown. We analyzed the binding specificity of three MSI family RRM domains using a quantitative fluorescence anisotropy assay. We found that the core element driving recognition is the sequence UAG. Nucleotides outside of this motif have a limited contribution to binding free energy. For mouse MSI1, recognition is determined by the first of the two RRM domains. The second RRM adds affinity but does not contribute to binding specificity. In contrast, the recognition element for Drosophila MSI is more extensive than the mouse homolog, suggesting functional divergence. The short nature of the binding determinant suggests that protein-RNA affinity alone is insufficient to drive target selection by MSI family proteins. PMID:25368328

  17. In planta analysis of a cis-regulatory cytokinin response motif in Arabidopsis and identification of a novel enhancer sequence.

    PubMed

    Ramireddy, Eswarayya; Brenner, Wolfram G; Pfeifer, Andreas; Heyl, Alexander; Schmülling, Thomas

    2013-07-01

    The phytohormone cytokinin plays a key role in regulating plant growth and development, and is involved in numerous physiological responses to environmental changes. The type-B response regulators, which regulate the transcription of cytokinin response genes, are a part of the cytokinin signaling system. Arabidopsis thaliana encodes 11 type-B response regulators (type-B ARRs), and some of them were shown to bind in vitro to the core cytokinin response motif (CRM) 5'-(A/G)GAT(T/C)-3' or, in the case of ARR1, to an extended motif (ECRM), 5'-AAGAT(T/C)TT-3'. Here we obtained in planta proof for the functionality of the latter motif. Promoter deletion analysis of the primary cytokinin response gene ARR6 showed that a combination of two extended motifs within the promoter is required to mediate the full transcriptional activation by ARR1 and other type-B ARRs. CRMs were found to be over-represented in the vicinity of ECRMs in the promoters of cytokinin-regulated genes, suggesting their functional relevance. Moreover, an evolutionarily conserved 27 bp long T-rich region between -220 and -193 bp was identified and shown to be required for the full activation by type-B ARRs and the response to cytokinin. This novel enhancer is not bound by the DNA-binding domain of ARR1, indicating that additional proteins might be involved in mediating the transcriptional cytokinin response. Furthermore, genome-wide expression profiling identified genes, among them ARR16, whose induction by cytokinin depends on both ARR1 and other specific type-B ARRs. This together with the ECRM/CRM sequence clustering indicates cooperative action of different type-B ARRs for the activation of particular target genes. PMID:23620480
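    The two consensus sequences quoted in this record, CRM 5'-(A/G)GAT(T/C)-3' and ECRM 5'-AAGAT(T/C)TT-3', translate directly into regular expressions. The sketch below is illustrative only: the promoter string is made up, the function name `scan` is hypothetical, and only the given strand is searched (the reverse complement is ignored for brevity).

```python
import re

# Lookahead patterns so that overlapping occurrences are all reported.
CRM = re.compile(r"(?=([AG]GAT[TC]))")      # core cytokinin response motif
ECRM = re.compile(r"(?=(AAGAT[TC]TT))")     # extended motif bound by ARR1

def scan(promoter, pattern):
    """Return (0-based start, matched sequence) pairs on the given strand."""
    return [(m.start(), m.group(1))
            for m in pattern.finditer(promoter.upper())]

# Hypothetical promoter fragment for illustration only.
toy_promoter = "ccAAGATTTTgcGGATCaa"
crm_hits = scan(toy_promoter, CRM)
ecrm_hits = scan(toy_promoter, ECRM)
```

    Note that every ECRM contains a CRM by construction, so an ECRM hit always comes with an overlapping CRM hit one position downstream; any analysis of CRM enrichment near ECRMs would need to account for that.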

  18. A Convex Atomic-Norm Approach to Multiple Sequence Alignment and Motif Discovery

    PubMed Central

    Yen, Ian E. H.; Lin, Xin; Zhang, Jiong; Ravikumar, Pradeep; Dhillon, Inderjit S.

    2016-01-01

    Multiple Sequence Alignment and Motif Discovery, both known to be NP-hard problems, are two fundamental tasks in bioinformatics. Existing approaches to these two problems are based either on local search methods such as Expectation Maximization (EM) and Gibbs Sampling or on greedy heuristic methods. In this work, we develop a convex relaxation approach to both problems based on the recent concept of atomic norm and develop a new algorithm, termed Greedy Direction Method of Multiplier, for solving the convex relaxation with two convex atomic constraints. Experiments show that our convex relaxation approach produces solutions of higher quality than the standard tools widely used in the bioinformatics community on the Multiple Sequence Alignment and Motif Discovery problems. PMID:27559428

  19. Immortal coils: conserved dimerization motifs of the Drosophila ovulation prohormone ovulin.

    PubMed

    Wong, Alex; Christopher, Adam B; Buehner, Norene A; Wolfner, Mariana F

    2010-04-01

    Dimerization is an important feature of the function of some proteins, including prohormones. For proteins whose amino acid sequences evolve rapidly, it is unclear how such structural characteristics are retained biochemically. Here we address this question by focusing on ovulin, a prohormone that induces ovulation in Drosophila melanogaster females after mating. Ovulin is known to dimerize, and is one of the most rapidly evolving proteins encoded by the Drosophila genome. We show that residues within a previously hypothesized conserved dimerization domain (a coiled-coil) and a newly identified conserved dimerization domain (YxxxY) within ovulin are necessary for the formation of ovulin dimers. Moreover, dimerization is conserved in ovulin proteins from non-melanogaster species of Drosophila despite up to 80% sequence divergence. We show that heterospecific ovulin dimers can be formed in interspecies hybrid animals and in two-hybrid assays between ovulin proteins that are 15% diverged, indicating conservation of tertiary structure amidst a background of rapid sequence evolution. Our results suggest that because ovulin's self-interaction requires only small conserved domains, the rest of the molecule can be relatively tolerant to mutations. Consistent with this view, in comparisons of 8510 proteins across 6 species of Drosophila we find that rates of amino acid divergence are higher for proteins with coiled-coil protein-interaction domains than for non-coiled-coil proteins.

  20. Integrating bioinformatic resources to predict transcription factors interacting with cis-sequences conserved in co-regulated genes

    PubMed Central

    2014-01-01

    Background Using motif detection programs it is fairly straightforward to identify conserved cis-sequences in promoters of co-regulated genes. In contrast, the identification of the transcription factors (TFs) interacting with these cis-sequences is much more elaborate. To facilitate this, we explore the possibility of using several bioinformatic and experimental approaches for TF identification. This starts with the selection of co-regulated gene sets and leads first to the prediction and then to the experimental validation of TFs interacting with cis-sequences conserved in the promoters of these co-regulated genes. Results Using the PathoPlant database, 32 up-regulated gene groups were identified with microarray data for drought-responsive gene expression from Arabidopsis thaliana. Application of the binding site estimation suite of tools (BEST) discovered 179 conserved sequence motifs within the corresponding promoters. Using the STAMP web-server, 49 sequence motifs were classified into 7 motif families for which similarities with known cis-regulatory sequences were identified. All motifs were subjected to a footprintDB analysis to predict interacting DNA binding domains from plant TF families. Predictions were confirmed by using a yeast-one-hybrid approach to select interacting TFs belonging to the predicted TF families. TF-DNA interactions were further experimentally validated in yeast and with a Physcomitrella patens transient expression system, leading to the discovery of several novel TF-DNA interactions. Conclusions The present work demonstrates the successful integration of several bioinformatic resources with experimental approaches to predict and validate TFs interacting with conserved sequence motifs in co-regulated genes. PMID:24773781

  1. RAD51 interacts with the evolutionarily conserved BRC motifs in the human breast cancer susceptibility gene brca2.

    PubMed

    Wong, A K; Pero, R; Ormonde, P A; Tavtigian, S V; Bartel, P L

    1997-12-19

    Recent work has shown that the murine BRCA2 tumor suppressor protein interacts with the murine RAD51 protein. This interaction suggests that BRCA2 participates in DNA repair. Residues 3196-3232 of the murine BRCA2 protein were shown to be involved in this interaction. Here, we report the detailed mapping of additional domains that are involved in interactions between the human homologs of these two proteins. Through yeast two-hybrid and biochemical assays, we demonstrate that the RAD51 protein interacts specifically with the eight evolutionarily conserved BRC motifs encoded in exon 11 of brca2 and with a similar motif found in a Caenorhabditis elegans hypothetical protein. Deletion analysis demonstrates that residues 98-339 of human RAD51 interact with the 59-residue minimal region that is conserved in all BRC motifs. These data suggest that the BRC repeats function to bind RAD51.

  2. Conserved Intramolecular Interactions Maintain Myosin Interacting-Heads Motifs Explaining Tarantula Muscle Super-Relaxed State Structural Basis.

    PubMed

    Alamo, Lorenzo; Qi, Dan; Wriggers, Willy; Pinto, Antonio; Zhu, Jingui; Bilbao, Aivett; Gillilan, Richard E; Hu, Songnian; Padrón, Raúl

    2016-03-27

    Tarantula striated muscle is an outstanding system for understanding the molecular organization of myosin filaments. Three-dimensional reconstruction based on cryo-electron microscopy images and single-particle image processing revealed that, in a relaxed state, myosin molecules undergo intramolecular head-head interactions, explaining why head activity switches off. The filament model obtained by rigidly docking a chicken smooth muscle myosin structure to the reconstruction was improved by flexibly fitting an atomic model built by mixing structures from different species to a tilt-corrected 2-nm three-dimensional map of frozen-hydrated tarantula thick filament. We used heavy and light chain sequences from tarantula myosin to build a single-species homology model of two heavy meromyosin interacting-heads motifs (IHMs). The flexibly fitted model includes previously missing loops and shows five intramolecular and five intermolecular interactions that keep the IHM in a compact off structure, forming four helical tracks of IHMs around the backbone. The residues involved in these interactions are oppositely charged, and their sequence conservation suggests that IHM is present across animal species. The new model, PDB 3JBH, explains the structural origin of the ATP turnover rates detected in relaxed tarantula muscle by ascribing the very slow rate to docked unphosphorylated heads, the slow rate to phosphorylated docked heads, and the fast rate to phosphorylated undocked heads. The conservation of intramolecular interactions across animal species and the presence of IHM in bilaterians suggest that a super-relaxed state should be maintained, as it plays a role in saving ATP in skeletal, cardiac, and smooth muscles. PMID:26851071

  3. CyanoLyase: a database of phycobilin lyase sequences, motifs and functions.

    PubMed

    Bretaudeau, Anthony; Coste, François; Humily, Florian; Garczarek, Laurence; Le Corguillé, Gildas; Six, Christophe; Ratin, Morgane; Collin, Olivier; Schluchter, Wendy M; Partensky, Frédéric

    2013-01-01

    CyanoLyase (http://cyanolyase.genouest.org/) is a manually curated sequence and motif database of phycobilin lyases and related proteins. These enzymes catalyze the covalent ligation of chromophores (phycobilins) to specific binding sites of phycobiliproteins (PBPs). The latter constitute the building bricks of phycobilisomes, the major light-harvesting systems of cyanobacteria and red algae. Phycobilin lyase sequences are poorly annotated in public databases. Sequences included in CyanoLyase were retrieved from all available genomes of these organisms and a few others by similarity searches using biochemically characterized enzyme sequences and then classified into 3 clans and 32 families. Amino acid motifs were computed for each family using Protomata learner. CyanoLyase also includes BLAST and a novel pattern matching tool (Protomatch) that allow users to rapidly retrieve and annotate lyases from any new genome. In addition, it provides phylogenetic analyses of all phycobilin lyase families, describes their function and their presence/absence in all genomes of the database (phyletic profiles), and predicts the chromophorylation of PBPs in each strain. The site also includes a thorough bibliography about phycobilin lyases and genomes included in the database. This resource should be useful to scientists and companies interested in natural or artificial PBPs, which have a number of biotechnological applications, notably as fluorescent markers.

  4. Sequence motifs associated with hepatotoxicity of locked nucleic acid—modified antisense oligonucleotides

    PubMed Central

    Burdick, Andrew D.; Sciabola, Simone; Mantena, Srinivasa R.; Hollingshead, Brett D.; Stanton, Robert; Warneke, James A.; Zeng, Ming; Martsen, Elena; Medvedev, Alexander; Makarov, Sergei S.; Reed, Lori A.; Davis, John W.; Whiteley, Laurence O.

    2014-01-01

    Fully phosphorothioate antisense oligonucleotides (ASOs) with locked nucleic acids (LNAs) improve target affinity, RNase H activation and stability. LNA-modified ASOs can cause hepatotoxicity, and this risk is currently not fully understood. In vitro cytotoxicity screens have not been reliable predictors of hepatic toxicity in non-clinical testing; however, mice are considered to be a sensitive test species. To better understand the relationship between nucleotide sequence and hepatotoxicity, a structure–toxicity analysis was performed using results from 2-week repeated-dose tolerability studies in mice administered LNA-modified ASOs. ASOs targeting human Apolipoprotein C3 (Apoc3), CREB (cAMP Response Element Binding Protein) Regulated Transcription Coactivator 2 (Crtc2) or Glucocorticoid Receptor (GR, NR3C1) were classified based upon the presence or absence of hepatotoxicity in mice. From these data, a random-decision forest-classification model generated from nucleotide sequence descriptors identified two trinucleotide motifs (TCC and TGC) that were present only in hepatotoxic sequences. We found that motif-containing sequences were more likely to bind to hepatocellular proteins in vitro and increased P53 and NRF2 stress pathway activity in vivo. These results suggest in silico approaches can be utilized to establish structure–toxicity relationships of LNA-modified ASOs and decrease the likelihood of hepatotoxicity in preclinical testing. PMID:24550163
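    A presence/absence check for the two trinucleotide motifs named in this record (TCC and TGC) could be sketched as follows. This is only an illustrative screen, not the random-forest model from the study: the function name `motifs_present` and the example sequences are hypothetical, and a motif hit here should not be read as a toxicity prediction.

```python
# Trinucleotide motifs reported in the abstract.
FLAGGED_MOTIFS = ("TCC", "TGC")

def motifs_present(aso):
    """Return the flagged trinucleotides that occur in an ASO sequence."""
    seq = aso.upper()
    return [motif for motif in FLAGGED_MOTIFS if motif in seq]

# Hypothetical ASO sequences for illustration only.
toy_asos = ["GATCCAGTACGTA", "AAGTGCATTGAC", "AAGTACGATTGA"]
flags = {seq: motifs_present(seq) for seq in toy_asos}
```

    A simple substring test is enough here because the motifs are fixed three-letter strings; descriptors such as motif position or count would need a richer representation.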

  5. A dominant negative mutation in the conserved RNA helicase motif 'SAT' causes splicing factor PRP2 to stall in spliceosomes.

    PubMed Central

    Plumpton, M; McGarvey, M; Beggs, J D

    1994-01-01

    To characterize sequences in the RNA helicase-like PRP2 protein of Saccharomyces cerevisiae that are essential for its function in pre-mRNA splicing, a pool of random PRP2 mutants was generated. A dominant negative allele was isolated which, when overexpressed in a wild-type yeast strain, inhibited cell growth by causing a defect in pre-mRNA splicing. This defect was partially alleviated by simultaneous co-overexpression of wild-type PRP2. The dominant negative PRP2 protein inhibited splicing in vitro and caused the accumulation of stalled splicing complexes. Immunoprecipitation with anti-PRP2 antibodies confirmed that dominant negative PRP2 protein competed with its wild-type counterpart for interaction with spliceosomes, with which the mutant protein remained associated. The PRP2-dn1 mutation led to a single amino acid change within the conserved SAT motif that in the prototype helicase eIF-4A is required for RNA unwinding. Purified dominant negative PRP2 protein had approximately 40% of the wild-type level of RNA-stimulated ATPase activity. As ATPase activity was reduced only slightly, but splicing activity was abolished, we propose that the dominant negative phenotype is due primarily to a defect in the putative RNA helicase activity of PRP2 protein. PMID:8112301

  6. CDR3β sequence motifs regulate autoreactivity of human invariant NKT cell receptors.

    PubMed

    Chamoto, Kenji; Guo, Tingxi; Imataki, Osamu; Tanaka, Makito; Nakatsugawa, Munehide; Ochi, Toshiki; Yamashita, Yuki; Saito, Akiko M; Saito, Toshiki I; Butler, Marcus O; Hirano, Naoto

    2016-04-01

    Invariant natural killer T (iNKT) cells are a subset of T lymphocytes that recognize lipid ligands presented by monomorphic CD1d. The human iNKT T cell receptor (TCR) is largely composed of an invariant Vα24 (Vα24i) TCRα chain and a semi-variant Vβ11 TCRβ chain, where complementarity-determining region (CDR)3β is the sole variable region. One of the characteristic features of iNKT cells is that they retain autoreactivity even after thymic selection. However, the molecular features of human iNKT TCR CDR3β sequences that regulate autoreactivity remain unknown. Since the number of iNKT cells with detectable autoreactivity in peripheral blood is limited, we introduced the Vα24i gene into peripheral T cells and generated a de novo human iNKT TCR repertoire. By stimulating the transfected T cells with artificial antigen presenting cells (aAPCs) presenting self-ligands, we enriched strongly autoreactive iNKT TCRs and isolated a large panel of human iNKT TCRs with a broad range of autoreactivity. From this panel of unique iNKT TCRs, we deciphered three CDR3β sequence motifs frequently encoded by strongly autoreactive iNKT TCRs: a VD region with 2 or more acidic amino acids, usage of the Jβ2-5 allele, and a CDR3β region of 13 amino acids in length. iNKT TCRs encoding 2 or 3 sequence motifs also exhibit higher autoreactivity than those encoding 0 or 1 motifs. These data facilitate our understanding of the molecular basis for human iNKT cell autoreactivity involved in immune responses associated with human disease. PMID:26748722
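    The three CDR3β features listed in this record lend themselves to a simple per-receptor count. The sketch below assumes a hypothetical record layout (`cdr3`, `vd_region`, `j_gene`) and made-up sequences; how the VD region is delimited and how J segments are labelled are assumptions, not details given in the abstract.

```python
from dataclasses import dataclass

@dataclass
class Cdr3Beta:
    cdr3: str       # full CDR3β amino acid sequence
    vd_region: str  # V-D portion of the CDR3β (hypothetical field)
    j_gene: str     # Jβ segment label (hypothetical label format)

def motif_count(rec):
    """Count how many of the three described features a receptor carries."""
    acidic = sum(rec.vd_region.count(aa) for aa in "DE")  # Asp + Glu
    features = [
        acidic >= 2,             # VD region with 2 or more acidic residues
        rec.j_gene == "Jβ2-5",   # usage of Jβ2-5
        len(rec.cdr3) == 13,     # CDR3β of 13 amino acids
    ]
    return sum(features)

# Hypothetical receptors for illustration only.
high = Cdr3Beta(cdr3="CASSEDGGQETQY", vd_region="SEDGG", j_gene="Jβ2-5")
low = Cdr3Beta(cdr3="CASSGTGYEQY", vd_region="SGTG", j_gene="Jβ2-7")
counts = (motif_count(high), motif_count(low))
```

    Under the record's finding, receptors scoring 2 or 3 would be expected to show higher autoreactivity than those scoring 0 or 1; the count itself is only a bookkeeping device.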

  7. Specific Sequence Motifs Direct the Oxygenation and Chlorination of Tryptophan by Myeloperoxidase

    PubMed Central

    Fu, Xiaoyun; Wang, Yi; Kao, Jeffery; Irwin, Angela; d’Avignon, André; Mecham, Robert P.; Parks, William C.; Heinecke, Jay W.

    2008-01-01

    Most studies of protein oxidation have typically focused on the reactivity of single amino acid side chains while ignoring the potential importance of adjacent sequences in directing the reaction pathway. We previously showed that hypochlorous acid (HOCl), a specific product of myeloperoxidase, inactivates matrilysin by modifying adjacent tryptophan and glycine (WG) residues in the catalytic domain. Here, we use model peptides that mimic the region of matrilysin involved in this reaction, VVWGTA, VVWATA and the library VVWXTA, to determine whether specific sequence motifs are targeted for chlorination or oxygenation by myeloperoxidase. Our results demonstrate that HOCl generated by myeloperoxidase or activated neutrophils converts the peptide VVWGTA to a chlorinated product, WG+32(Cl). Tandem mass spectrometry in concert with high resolution 1H and two-dimensional NMR analysis revealed that the modification required cross-linking of the tryptophan to the amide of glycine followed by chlorination of the indole ring of tryptophan. In contrast, when glycine in the peptide was replaced with alanine, the major products were mono- and di-oxygenated tryptophan residues. When the peptide library VVWXTA (where X represents all 20 common amino acids) was exposed to HOCl, only WG produced a high yield of the chloroindolenine derivative. However, when glycine was replaced by other amino acids, oxygenated tryptophan derivatives were the major products. Our observations indicate that WG may represent a specific sequence motif in proteins that is targeted for chlorination by myeloperoxidase. PMID:16548523
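
    The peptide library VVWXTA described above is easy to enumerate. The following sketch, with hypothetical function names, expands the template over the 20 common amino acids and flags the members carrying the adjacent WG pair; it models only the sequence bookkeeping, not the chemistry.

```python
# One-letter codes of the 20 common amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_library(template: str = "VVWXTA") -> list[str]:
    """Expand the single X position of the template into 20 peptides."""
    return [template.replace("X", aa) for aa in AMINO_ACIDS]


def has_wg(peptide: str) -> bool:
    """True if tryptophan is immediately followed by glycine."""
    return "WG" in peptide


library = build_library()
wg_members = [p for p in library if has_wg(p)]
```

    Only one member of the library, VVWGTA, contains the WG pair, which matches the single peptide reported to give a high yield of the chlorinated product.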

  8. Sequence-dependent stability test of a left-handed β-helix motif.

    PubMed

    Hayre, Natha R; Singh, Rajiv R P; Cox, Daniel L

    2012-03-21

    The left-handed β-helix (LHBH) is an intriguing, rare structural pattern in polypeptides that has been implicated in the formation of amyloid aggregates. We used accurate all-atom replica-exchange molecular dynamics (REMD) simulations to study the relative stability of diverse sequences in the LHBH conformation. Ensemble-average coordinates from REMD served as a scoring criterion to identify sequences and threadings optimally suited to the LHBH, as in a fold recognition paradigm. We examined the repeatability of our REMD simulations, finding that single simulations can be reliable to a quantifiable extent. We find expected behavior for the positive and negative control cases of a native LHBH and intrinsically disordered sequences, respectively. Polyglutamine and a designed hexapeptide repeat show remarkable affinity for the LHBH motif. A structural model for misfolded murine prion protein was also considered, and showed intermediate stability under the given conditions. Our technique is found to be an effective probe of LHBH stability, and promises to be scalable to broader studies of this and potentially other novel or rare motifs. The superstable character of the designed hexapeptide repeat suggests theoretical and experimental follow-ups.

  9. Multiple cellular proteins interact with LEDGF/p75 through a conserved unstructured consensus motif.

    PubMed

    Tesina, Petr; Čermáková, Kateřina; Hořejší, Magdalena; Procházková, Kateřina; Fábry, Milan; Sharma, Subhalakshmi; Christ, Frauke; Demeulemeester, Jonas; Debyser, Zeger; De Rijck, Jan; Veverka, Václav; Řezáčová, Pavlína

    2015-01-01

    Lens epithelium-derived growth factor (LEDGF/p75) is an epigenetic reader and attractive therapeutic target involved in HIV integration and the development of mixed lineage leukaemia (MLL1) fusion-driven leukaemia. Besides HIV integrase and the MLL1-menin complex, LEDGF/p75 interacts with various cellular proteins via its integrase binding domain (IBD). Here we present structural characterization of IBD interactions with transcriptional repressor JPO2 and domesticated transposase PogZ, and show that the PogZ interaction is nearly identical to the interaction of LEDGF/p75 with MLL1. The interaction with the IBD is maintained by an intrinsically disordered IBD-binding motif (IBM) common to all known cellular partners of LEDGF/p75. In addition, based on IBM conservation, we identify and validate IWS1 as a novel LEDGF/p75 interaction partner. Our results also reveal how HIV integrase efficiently displaces cellular binding partners from LEDGF/p75. Finally, the similar binding modes of LEDGF/p75 interaction partners represent a new challenge for the development of selective interaction inhibitors.

  10. Conserved motifs reveal details of ancestry and structure in the small TIM chaperones of the mitochondrial intermembrane space.

    PubMed

    Gentle, Ian E; Perry, Andrew J; Alcock, Felicity H; Likić, Vladimir A; Dolezal, Pavel; Ng, Ee Ting; Purcell, Anthony W; McConnville, Malcolm; Naderer, Thomas; Chanez, Anne-Laure; Charrière, Fabien; Aschinger, Caroline; Schneider, André; Tokatlidis, Kostas; Lithgow, Trevor

    2007-05-01

    The mitochondrial inner and outer membranes are composed of a variety of integral membrane proteins, assembled into the membranes posttranslationally. The small translocases of the inner mitochondrial membrane (TIMs) are a group of approximately 10 kDa proteins that function as chaperones to ferry the imported proteins across the mitochondrial intermembrane space to the outer and inner membranes. In yeast, there are 5 small TIM proteins: Tim8, Tim9, Tim10, Tim12, and Tim13, with equivalent proteins reported in humans. Using hidden Markov models, we find that many eukaryotes have proteins equivalent to the Tim8 and Tim13 and the Tim9 and Tim10 subunits. Some eukaryotes provide "snapshots" of evolution, with a single protein showing the features of both Tim8 and Tim13, suggesting that a single progenitor gene has given rise to each of the small TIMs through duplication and modification. We show that no "Tim12" family of proteins exists, but rather that variant forms of the cognate small TIMs have been recently duplicated and modified to provide new functions: the yeast Tim12 is a modified form of Tim10, whereas in humans and some protists variant forms of Tim9, Tim8, and Tim13 are found instead. Sequence motif analysis reveals acidic residues conserved in the Tim10 substrate-binding tentacles, whereas more hydrophobic residues are found in the equivalent substrate-binding region of Tim13. The substrate-binding regions of Tim10 and Tim13 represent structurally independent domains: when the acidic domain from Tim10 is attached to Tim13, the Tim8-Tim13(10) complex becomes essential and the Tim9-Tim10 complex becomes dispensable. The conserved features in the Tim10 and Tim13 subunits provide distinct binding surfaces to accommodate the broad range of substrate proteins delivered to the mitochondrial inner and outer membranes.

  11. Microfluidic affinity and ChIP-seq analyses converge on a conserved FOXP2-binding motif in chimp and human, which enables the detection of evolutionarily novel targets

    PubMed Central

    Nelson, Christopher S.; Fuller, Chris K.; Fordyce, Polly M.; Greninger, Alexander L.; Li, Hao; DeRisi, Joseph L.

    2013-01-01

    The transcription factor forkhead box P2 (FOXP2) is believed to be important in the evolution of human speech. A mutation in its DNA-binding domain causes severe speech impairment. Humans have acquired two coding changes relative to the conserved mammalian sequence. Despite intense interest in FOXP2, it has remained an open question whether the human protein’s DNA-binding specificity and chromatin localization are conserved. Previous in vitro and ChIP-chip studies have provided conflicting consensus sequences for the FOXP2-binding site. Using MITOMI 2.0 microfluidic affinity assays, we describe the binding site of FOXP2 and its affinity profile in base-specific detail for all substitutions of the strongest binding site. We find that human and chimp FOXP2 have similar binding sites that are distinct from previously suggested consensus binding sites. Additionally, through analysis of FOXP2 ChIP-seq data from cultured neurons, we find strong overrepresentation of a motif that matches our in vitro results and identifies a set of genes with FOXP2 binding sites. The FOXP2-binding sites tend to be conserved, yet we identified 38 instances of evolutionarily novel sites in humans. Combined, these data present a comprehensive portrait of FOXP2’s binding properties and imply that although its sequence specificity has been conserved, some of its genomic binding sites are newly evolved. PMID:23625967

  12. Microfluidic affinity and ChIP-seq analyses converge on a conserved FOXP2-binding motif in chimp and human, which enables the detection of evolutionarily novel targets.

    PubMed

    Nelson, Christopher S; Fuller, Chris K; Fordyce, Polly M; Greninger, Alexander L; Li, Hao; DeRisi, Joseph L

    2013-07-01

    The transcription factor forkhead box P2 (FOXP2) is believed to be important in the evolution of human speech. A mutation in its DNA-binding domain causes severe speech impairment. Humans have acquired two coding changes relative to the conserved mammalian sequence. Despite intense interest in FOXP2, it has remained an open question whether the human protein's DNA-binding specificity and chromatin localization are conserved. Previous in vitro and ChIP-chip studies have provided conflicting consensus sequences for the FOXP2-binding site. Using MITOMI 2.0 microfluidic affinity assays, we describe the binding site of FOXP2 and its affinity profile in base-specific detail for all substitutions of the strongest binding site. We find that human and chimp FOXP2 have similar binding sites that are distinct from previously suggested consensus binding sites. Additionally, through analysis of FOXP2 ChIP-seq data from cultured neurons, we find strong overrepresentation of a motif that matches our in vitro results and identifies a set of genes with FOXP2 binding sites. The FOXP2-binding sites tend to be conserved, yet we identified 38 instances of evolutionarily novel sites in humans. Combined, these data present a comprehensive portrait of FOXP2's binding properties and imply that although its sequence specificity has been conserved, some of its genomic binding sites are newly evolved.

  13. Analysis of the evolutionarily conserved repeat motifs in the genome of the highly endangered central Indian swamp deer Cervus duvauceli branderi.

    PubMed

    Ali, S; Ansari, S; Ehtesham, N Z; Azfer, M A; Homkar, U; Gopal, R; Hasnain, S E

    1998-11-26

    We have analyzed the genome of central Indian swamp deer Cervus duvauceli branderi, an inhabitant of the Kanha National Park, a wildlife conservatory in Central India, with a view to providing a genetic basis for their extinction. Evolutionarily conserved repeat sequence motifs (GATA)3.75, TA(GATA)4, (GACA)3.75, (TGG)6 and a set of mouse beta-actin primers were used to uncover the sequence variation within and between related species by employing techniques of hybridization and AP-PCR amplification. The oligo probe carrying the GACA and TGG repeat motifs was found to be positive with Cervus genome, whereas (GATA)3.75, TA(GATA)4 and beta-actin probes did not cross-hybridize with the same. AP-PCR amplification with (GACA)3.75, unlike the (TGG)6 primer, generated distinct bands in the range of 0.37-2.10 kb amongst different genomes including Cervus. A comparative genome analysis of other species using the AP-PCR approach with (GACA)3.75 primer revealed the phylogenetic status of Cervus duvauceli branderi. From the analysis of a very limited number of Cervus DNA samples, we observed a high level of genetic homogeneity that may be a prime reason for the extinction of this species. This study has implications in the context of conservation of this endangered Cervus duvauceli branderi species.

  14. Prediction of Secondary Structures Conserved in Multiple RNA Sequences.

    PubMed

    Xu, Zhenjiang Zech; Mathews, David H

    2016-01-01

    RNA structure is conserved by evolution to a greater extent than sequence. Predicting the conserved structure for multiple homologous sequences can be much more accurate than predicting the structure for a single sequence. RNAstructure is a software package that includes the programs Dynalign, Multilign, TurboFold, and PARTS for predicting conserved RNA secondary structure. This chapter provides protocols for using these programs. PMID:27665591

  15. Conserved function of the lysine-based KXD/E motif in Golgi retention for endomembrane proteins among different organisms

    PubMed Central

    Woo, Cheuk Hang; Gao, Caiji; Yu, Ping; Tu, Linna; Meng, Zhaoyue; Banfield, David K.; Yao, Xiaoqiang; Jiang, Liwen

    2015-01-01

    We recently identified a new COPI-interacting KXD/E motif in the C-terminal cytosolic tail (CT) of Arabidopsis endomembrane protein 12 (AtEMP12) as being a crucial Golgi retention mechanism for AtEMP12. This KXD/E motif is conserved in CTs of all EMPs found in plants, yeast, and humans and is also present in hundreds of other membrane proteins. Here, by cloning selective EMP isoforms from plants, yeast, and mammals, we study the localizations of EMPs in different expression systems, since there are contradictory reports on the localizations of EMPs. We show that the N-terminal and C-terminal GFP-tagged EMP fusions are localized to Golgi and post-Golgi compartments, respectively, in plant, yeast, and mammalian cells. In vitro pull-down assay further proves the interaction of the KXD/E motif with COPI coatomer in yeast. COPI loss of function in yeast and plants causes mislocalization of EMPs or KXD/E motif–containing proteins to the vacuole. Ultrastructural studies further show that RNA interference (RNAi) knockdown of coatomer expression in transgenic Arabidopsis plants causes severe morphological changes in the Golgi. Taken together, our results demonstrate that N-terminal GFP fusions reflect the real localization of EMPs, and KXD/E is a conserved motif in COPI interaction and Golgi retention in eukaryotes. PMID:26378254
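
    A motif written as KXD/E (lysine, any residue, then aspartate or glutamate) maps directly onto a regular expression. The sketch below scans a cytosolic-tail sequence for it; the function name and the example tails in the usage note are invented for illustration, and the abstract does not say where in the tail the motif must sit, so the scan covers the whole string.

```python
import re

# K, then any residue, then D or E. The lookahead makes overlapping
# occurrences visible, e.g. "KKDE" contains both KKD and KDE.
KXDE = re.compile(r"(?=(K[A-Z][DE]))")


def find_kxde(tail: str) -> list[tuple[int, str]]:
    """Return (0-based position, matched tripeptide) for every hit."""
    return [(m.start(), m.group(1)) for m in KXDE.finditer(tail.upper())]
```

    For example, the made-up tail "GSKADLL" yields one hit, (2, "KAD"), while "GSRADLL" yields none.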

  16. Using Weeder, Pscan, and PscanChIP for the Discovery of Enriched Transcription Factor Binding Site Motifs in Nucleotide Sequences.

    PubMed

    Zambelli, Federico; Pesole, Graziano; Pavesi, Giulio

    2014-01-01

    One of the greatest challenges facing modern molecular biology is understanding the complex mechanisms regulating gene expression. A fundamental step in this process requires the characterization of sequence motifs involved in the regulation of gene expression at transcriptional and post-transcriptional levels. In particular, transcription is modulated by the interaction of transcription factors (TFs) with their corresponding binding sites. Weeder, Pscan, and PscanChIP are software tools freely available for noncommercial users as stand-alone or Web-based applications for the automatic discovery of conserved motifs in a set of DNA sequences likely to be bound by the same TFs. Input for the tools can be promoter sequences from co-expressed or co-regulated genes (for which Weeder and Pscan are suitable), or regions identified through genome wide ChIP-seq or similar experiments (Weeder and PscanChIP). The motifs are either found by a de novo approach (Weeder) or by using descriptors of the binding specificity of TFs (Pscan and PscanChIP). PMID:25199791

  17. Functionally conserved enhancers with divergent sequences in distant vertebrates

    SciTech Connect

    Yang, Song; Oksenberg, Nir; Takayama, Sachiko; Heo, Seok -Jin; Poliakov, Alexander; Ahituv, Nadav; Dubchak, Inna; Boffelli, Dario

    2015-10-30

    To examine the contributions of sequence and function conservation in the evolution of enhancers, we systematically identified enhancers whose sequences are not conserved among distant groups of vertebrate species, but have homologous function and are likely to be derived from a common ancestral sequence. In conclusion, our approach combined comparative genomics and epigenomics to identify potential enhancer sequences in the genomes of three groups of distantly related vertebrate species.

  18. A common set of conserved motifs in a vast variety of putative nucleic acid-dependent ATPases including MCM proteins involved in the initiation of eukaryotic DNA replication.

    PubMed

    Koonin, E V

    1993-06-11

    A new superfamily of (putative) DNA-dependent ATPases is described that includes the ATPase domains of prokaryotic NtrC-related transcription regulators, MCM proteins involved in the initiation of eukaryotic DNA replication, and a group of uncharacterized bacterial and chloroplast proteins. MCM proteins are shown to contain a modified form of the ATP-binding motif and are predicted to mediate ATP-dependent opening of double-stranded DNA in the replication origins. In a second line of investigation, it is demonstrated that the products of unidentified open reading frames from Marchantia mitochondria and from yeast, and a domain of a baculovirus protein involved in viral DNA replication are related to the superfamily III of DNA and RNA helicases that previously has been known to include only proteins of small viruses. Comparison of the multiple alignments showed that the proteins of the NtrC superfamily and the helicases of superfamily III share three related sequence motifs tightly packed in the ATPase domain that consists of 100-150 amino acid residues. A similar array of conserved motifs is found in the family of DnaA-related ATPases. It is hypothesized that the three large groups of nucleic acid-dependent ATPases have similar structure of the core ATPase domain and have evolved from a common ancestor.

  19. Temporal motifs reveal homophily, gender-specific patterns, and group talk in call sequences.

    PubMed

    Kovanen, Lauri; Kaski, Kimmo; Kertész, János; Saramäki, Jari

    2013-11-01

    Recent studies on electronic communication records have shown that human communication has complex temporal structure. We study how communication patterns that involve multiple individuals are affected by attributes such as sex and age. To this end, we represent the communication records as a colored temporal network where node color is used to represent individuals' attributes, and identify patterns known as temporal motifs. We then construct a null model for the occurrence of temporal motifs that takes into account the interaction frequencies and connectivity between nodes of different colors. This null model allows us to detect significant patterns in call sequences that cannot be observed in a static network that uses interaction frequencies as link weights. We find sex-related differences in communication patterns in a large dataset of mobile phone records and show the existence of temporal homophily, the tendency of similar individuals to participate in communication patterns beyond what would be expected on the basis of their average interaction frequencies. We also show that temporal patterns differ between dense and sparse neighborhoods in the network. Because this result is also independent of interaction frequencies, it can be seen as an extension of Granovetter's hypothesis to temporal networks. PMID:24145424

  20. QuasiMotiFinder: protein annotation by searching for evolutionarily conserved motif-like patterns.

    PubMed

    Gutman, Roee; Berezin, Carine; Wollman, Roy; Rosenberg, Yossi; Ben-Tal, Nir

    2005-07-01

    Sequence signature databases such as PROSITE, which include amino acid segments that are indicative of a protein's function, are useful for protein annotation. Lamentably, the annotation is not always accurate. A signature may be falsely detected in a protein that does not carry out the associated function (false positive prediction, FP) or may be overlooked in a protein that does carry out the function (false negative prediction, FN). A new approach has emerged in which a signature is replaced with a sequence profile, calculated based on multiple sequence alignment (MSA) of homologous proteins that share the same function. This approach, which is superior to the simple pattern search, essentially searches with the sequence of the query protein against an MSA library. We suggest here an alternative approach, implemented in the QuasiMotiFinder web server (http://quasimotifinder.tau.ac.il/), which is based on a search with an MSA of homologous query proteins against the original PROSITE signatures. The explicit use of the average evolutionary conservation of the signature in the query proteins significantly reduces the rate of FP prediction compared with the simple pattern search. QuasiMotiFinder also has a reduced rate of FN prediction compared with simple pattern searches, since the traditional search for precise signatures has been replaced by a permissive search for signature-like patterns that are physicochemically similar to known signatures. Overall, QuasiMotiFinder and the profile search are comparable to each other in terms of performance. They are also complementary to each other in that signatures that are falsely detected in (or overlooked by) one may be correctly detected by the other.
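
    The "simple pattern search" that this abstract uses as its baseline amounts to matching a PROSITE-style signature against a sequence. The sketch below converts the core PROSITE pattern syntax into a Python regular expression. It is a simplified converter written for illustration (it ignores rarer syntax such as anchors placed inside brackets) and is not the QuasiMotiFinder algorithm, which additionally weighs evolutionary conservation across an MSA.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate core PROSITE syntax into a Python regex string.

    Handles: '-' separators, 'x' wildcards, [..] allowed sets,
    {..} forbidden sets, (n) / (n,m) repeats, '<' and '>' anchors.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Forbidden sets first, so the braces introduced for repeats
        # in the next step are not mistaken for them.
        element = element.replace("{", "[^").replace("}", "]")
        element = element.replace("(", "{").replace(")", "}")
        element = element.replace("x", ".")
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


# PROSITE N-glycosylation site pattern (PS00001).
regex = prosite_to_regex("N-{P}-[ST]-{P}.")
hit = re.search(regex, "AANGSA")
```

    Short patterns such as this one match by chance in many unrelated proteins, which is the false positive problem the abstract sets out to reduce.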

  2. MtHc: a motif-based hierarchical method for clustering massive 16S rRNA sequences into OTUs.

    PubMed

    Wei, Ze-Gang; Zhang, Shao-Wu

    2015-07-01

    The recent sequencing revolution driven by high-throughput technologies has led to rapid accumulation of 16S rRNA sequences for microbial communities. Clustering short sequences into operational taxonomic units (OTUs) is an initial crucial process in analyzing metagenomic data. Although many methods have been proposed for OTU inferences, a major challenge is the balance between inference accuracy and computational efficiency. To address these challenges, we present a novel motif-based hierarchical method (namely MtHc) for clustering massive 16S rRNA sequences into OTUs with high clustering accuracy and low memory usage. Suppose all the 16S rRNA sequences can be used to construct a complete weighted network, where sequences are viewed as nodes, each pair of sequences is connected by an imaginary edge, and the distance of a pair of sequences represents the weight of the edge. MtHc consists of three main phases. First, heuristically search for the motif, defined as an n-node subgraph (in the present study, n = 3, 4, 5) in which the distance between any two nodes is less than a threshold. Second, use the motif as a seed to form candidate clusters by computing the distances of other sequences to the motif. Finally, hierarchically merge the candidate clusters to generate the OTUs by calculating only the distances between the motifs of two clusters. Compared with existing methods on several simulated and real-life metagenomic datasets, we demonstrate that MtHc has higher clustering performance, lower memory usage and robustness to parameter settings, and that it handles large-scale metagenomic datasets more effectively. The MtHc software can be freely downloaded by academic users. PMID:25912934
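
    The first step of the method, finding an n-node motif whose members are all within a distance threshold of one another, can be sketched as follows. This is an illustrative brute-force version, not the MtHc implementation: the abstract describes a heuristic search that is not specified here, and the mismatch-fraction distance is a stand-in assumption for whatever sequence distance the software uses.

```python
from itertools import combinations


def mismatch_fraction(a: str, b: str) -> float:
    """Stand-in distance for equal-length sequences (an assumption)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def find_seed_motif(seqs, n=3, threshold=0.03, dist=mismatch_fraction):
    """Return indices of the first n sequences that are pairwise closer
    than the threshold, or None if no such group exists.

    Exhaustive search over all n-subsets; only practical for small inputs.
    """
    for combo in combinations(range(len(seqs)), n):
        if all(dist(seqs[i], seqs[j]) < threshold
               for i, j in combinations(combo, 2)):
            return combo
    return None


reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "TTTTTTTTTT"]
seed = find_seed_motif(reads, n=3, threshold=0.2)
```

    In the later steps described in the abstract, such a seed would be compared with the remaining sequences to form a candidate cluster, so that merging needs distances between motifs rather than between all pairs of sequences.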

  3. The structure of an endogenous Drosophila centromere reveals the prevalence of tandemly repeated sequences able to form i-motifs

    PubMed Central

    Garavís, Miguel; Méndez-Lago, María; Gabelica, Valérie; Whitehead, Siobhan L.; González, Carlos; Villasante, Alfredo

    2015-01-01

    Centromeres are the chromosomal loci at which spindle microtubules attach to mediate chromosome segregation during mitosis and meiosis. In most eukaryotes, centromeres are made up of highly repetitive DNA sequences (satellite DNA) interspersed with middle repetitive DNA sequences (transposable elements). Despite the efforts to establish complete genomic sequences of eukaryotic organisms, the so-called ‘finished’ genomes are not actually complete because the centromeres have not been assembled due to the intrinsic difficulties in constructing both physical maps and complete sequence assemblies of long stretches of tandemly repetitive DNA. Here we show the first molecular structure of an endogenous Drosophila centromere and the ability of the C-rich dodeca satellite strand to form dimeric i-motifs. The finding of i-motif structures in simple and complex centromeric satellite DNAs leads us to suggest that these centromeric sequences may have been selected not by their primary sequence but by their ability to form noncanonical secondary structures. PMID:26289671

  4. Upstream regions of the human cardiac actin gene that modulate its transcription in muscle cells: presence of an evolutionarily conserved repeated motif.

    PubMed Central

    Minty, A; Kedes, L

    1986-01-01

    Transfection into cultured cell lines was used to investigate the transcriptional regulation of the human cardiac actin gene. We first demonstrated that in both human heart and human skeletal muscle, cardiac actin mRNAs initiate at the identical site and contain the same first exon, which is separated from the first coding exon by an intron of 700 base pairs. A region of 485 base pairs upstream from the transcription initiation site of the human cardiac actin gene directs high-level transient expression of the bacterial chloramphenicol acetyltransferase gene in differentiated myotubes of the mouse C2C12 muscle cell line, but not in mouse L fibroblast or rat PC-G2 pheochromocytoma cells. Deletion analysis of this region showed that at least two physically separated sequence elements are involved, a distal one starting between -443 and -395 and a proximal one starting between -177 and -118, and suggested that these sequences interact with positively acting transcriptional factors in muscle cells. When these two sequence elements are inserted separately upstream of a heterologous (simian virus 40) promoter, they do not affect transcription but do give a small (four- to fivefold) stimulation when tested together. Overall, these regulatory regions upstream of the cap site of the human cardiac actin gene show remarkably high sequence conservation with the equivalent regions of the mouse and chick genes. Furthermore, there is an evolutionarily conserved repeated motif that may be important in the transcriptional regulation of actin and other contractile protein genes. PMID:3785189

  5. ZFP57 recognizes multiple and closely spaced sequence motif variants to maintain repressive epigenetic marks in mouse embryonic stem cells

    PubMed Central

    Anvar, Zahra; Cammisa, Marco; Riso, Vincenzo; Baglivo, Ilaria; Kukreja, Harpreet; Sparago, Angela; Girardot, Michael; Lad, Shraddha; De Feis, Italia; Cerrato, Flavia; Angelini, Claudia; Feil, Robert; Pedone, Paolo V.; Grimaldi, Giovanna; Riccio, Andrea

    2016-01-01

    Imprinting Control Regions (ICRs) need to maintain their parental allele-specific DNA methylation during early embryogenesis despite genome-wide demethylation and subsequent de novo methylation. ZFP57 and KAP1 are both required for maintaining the repressive DNA methylation and H3-lysine-9-trimethylation (H3K9me3) at ICRs. In vitro, ZFP57 binds a specific hexanucleotide motif that is enriched at its genomic binding sites. We now demonstrate in mouse embryonic stem cells (ESCs) that SNPs disrupting closely-spaced hexanucleotide motifs are associated with lack of ZFP57 binding and H3K9me3 enrichment. Through a transgenic approach in mouse ESCs, we further demonstrate that an ICR fragment containing three ZFP57 motif sequences recapitulates the original methylated or unmethylated status when integrated into the genome at an ectopic position. Mutation of Zfp57 or the hexanucleotide motifs led to loss of ZFP57 binding and DNA methylation of the transgene. Finally, we identified a sequence variant of the hexanucleotide motif that interacts with ZFP57 both in vivo and in vitro. The presence of multiple and closely located copies of ZFP57 motif variants emerges as a distinct characteristic that is required for the faithful maintenance of repressive epigenetic marks at ICRs and other ZFP57 binding sites. PMID:26481358

  6. SIRW: A web server for the Simple Indexing and Retrieval System that combines sequence motif searches with keyword searches.

    PubMed

    Ramu, Chenna

    2003-07-01

    SIRW (http://sirw.embl.de/) is a World Wide Web interface to the Simple Indexing and Retrieval System (SIR) that is capable of parsing and indexing various flat file databases. In addition it provides a framework for doing sequence analysis (e.g. motif pattern searches) for selected biological sequences through keyword search. SIRW is an ideal tool for the bioinformatics community for searching as well as analyzing biological sequences of interest.

  7. Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles.

    PubMed

    Gautheret, D; Lambert, A

    2001-11-01

    We present here a new approach to the problem of defining RNA signatures and finding their occurrences in sequence databases. The proposed method is based on "secondary structure profiles". An RNA sequence alignment with secondary structure information is used as an input. Two types of weight matrices/profiles are constructed from this alignment: single strands are represented by a classical lod-scores profile while helical regions are represented by an extended "helical profile" comprising 16 lod-scores per position, one for each of the 16 possible base-pairs. Database searches are then conducted using a simultaneous search for helical profiles and dynamic programming alignment of single strand profiles. The algorithm has been implemented in a new program, ERPIN, that performs both profile construction and database search. Applications are presented for several RNA motifs. The automated use of sequence information in both single-stranded and helical regions yields better sensitivity/specificity ratios than descriptor-based programs. Furthermore, since the translation of alignments into profiles is straightforward with ERPIN, iterative searches can easily be conducted to enrich collections of homologous RNAs.
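
    A "classical lod-scores profile" for a single-stranded region can be illustrated with a short sketch: each alignment column is turned into log-odds of observed versus background base frequencies, and a candidate sequence is scored by summing the matching entries. The uniform background, the pseudocount and the function names are assumptions for illustration, not ERPIN's actual parameters, and gap handling is omitted.

```python
import math
from collections import Counter

BASES = "ACGU"


def lod_profile(alignment, background=None, pseudocount=1.0):
    """Build one dict of log2-odds scores per ungapped alignment column."""
    background = background or {b: 0.25 for b in BASES}
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(BASES)
        profile.append({
            b: math.log2(((counts[b] + pseudocount) / total) / background[b])
            for b in BASES
        })
    return profile


def score(profile, seq):
    """Sum the lod scores of seq against the profile, position by position."""
    return sum(col[base] for col, base in zip(profile, seq))


profile = lod_profile(["GAAA", "GCAA", "GAAA"])
```

    The helical profile described in the abstract follows the same idea, with each position holding 16 scores keyed by base pair instead of 4 keyed by base.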

  8. Exceptional motifs in different Markov chain models for a statistical analysis of DNA sequences.

    PubMed

    Schbath, S; Prum, B; de Turckheim, E

    1995-01-01

    Identifying exceptional motifs is often used for extracting information from long DNA sequences. The two difficulties of the method are the choice of the model that defines the expected frequencies of words and the approximation of the variance of the difference T(W) between the number of occurrences of a word W and its estimation. We consider here different Markov chain models, either with stationary or periodic transition probabilities. We estimate the variance of the difference T(W) by the conditional variance of the number of occurrences of W given the oligonucleotides counts that define the model. Two applications show how to use asymptotically standard normal statistics associated with the counts to describe a given sequence in terms of its outlying words. Sequences of Escherichia coli and of Bacillus subtilis are compared with respect to their exceptional tri- and tetranucleotides. For both bacteria, exceptional 3-words are mainly found in the coding frame. E. coli palindrome counts are analyzed in different models, showing that many overabundant words are one-letter mutations of avoided palindromes. PMID:8521272
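
    As a rough illustration of comparing a word's observed count with its expectation under a Markov model, the sketch below uses the common plug-in estimator for a model of maximal order (order m-2 for a word of length m): the count of the word's prefix times the count of its suffix, divided by the count of the shared core. This is an assumption made for illustration, not necessarily the estimator used by the authors, and the variance needed for a normalised statistic is deliberately not computed. The function names are hypothetical.

```python
from collections import Counter


def count_words(seq, k):
    """Count all overlapping k-letter words in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def expected_count(seq, word):
    """Plug-in estimate of the expected count of `word` (length m >= 3)
    under a Markov model of order m-2 fitted to seq."""
    m = len(word)
    if m < 3:
        raise ValueError("word must have length >= 3")
    shorter = count_words(seq, m - 1)
    core = count_words(seq, m - 2)[word[1:-1]]
    if core == 0:
        return 0.0
    return shorter[word[:-1]] * shorter[word[1:]] / core


def difference(seq, word):
    """Observed count minus estimated expectation, i.e. T(W) in the
    abstract's notation (unnormalised)."""
    observed = count_words(seq, len(word))[word]
    return observed - expected_count(seq, word)


# Toy sequence invented for illustration.
seq = "ACGACGACGT"
exp_acg = expected_count(seq, "ACG")   # N(AC) * N(CG) / N(C)
t_acg = difference(seq, "ACG")
```

    A word would be called exceptional only after dividing this difference by an estimate of its standard deviation, which is the part of the method the abstract identifies as difficult.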

  9. Synthesis, anti-mycobacterial activity and DNA sequence-selectivity of a library of biaryl-motifs containing polyamides.

    PubMed

    Brucoli, Federico; Guzman, Juan D; Maitra, Arundhati; James, Colin H; Fox, Keith R; Bhakta, Sanjib

    2015-07-01

    The alarming rise of extensively drug-resistant tuberculosis (XDR-TB) strains compels the development of new molecules with novel modes of action to control this world health emergency. Distamycin analogues containing N-terminal biaryl-motifs 2(1-5)(1-7) were synthesised using a solution-phase approach and evaluated for their anti-mycobacterial activity and DNA-sequence selectivity. Thiophene dimer motif-containing polyamide 2(2,6) exhibited 10-fold higher inhibitory activity against Mycobacterium tuberculosis compared to distamycin and library member 2(5,7) showed high binding affinity for the 5'-ACATAT-3' sequence.

  10. Function of the PEX19-binding site of human adrenoleukodystrophy protein as targeting motif in man and yeast. PMP targeting is evolutionarily conserved.

    PubMed

    Halbach, André; Lorenzen, Stephan; Landgraf, Christiane; Volkmer-Engert, Rudolf; Erdmann, Ralf; Rottensteiner, Hanspeter

    2005-06-01

    We predicted in human peroxisomal membrane proteins (PMPs) the binding sites for PEX19, a key player in the topogenesis of PMPs, by virtue of an algorithm developed for yeast PMPs. The best scoring PEX19-binding site was found in the adrenoleukodystrophy protein (ALDP). The identified site was indeed bound by human PEX19 and was also recognized by the orthologous yeast PEX19 protein. Likewise, both human and yeast PEX19 bound with comparable affinities to the PEX19-binding site of the yeast PMP Pex13p. Interestingly, the identified PEX19-binding site of ALDP coincided with its previously determined targeting motif. We corroborated the requirement of the ALDP PEX19-binding site for peroxisomal targeting in human fibroblasts and showed that the minimal ALDP fragment targets correctly also in yeast, again in a PEX19-binding site-dependent manner. Furthermore, the human PEX19-binding site of ALDP proved interchangeable with that of yeast Pex13p in an in vivo targeting assay. Finally, we showed in vitro that most of the predicted binding sequences of human PMPs represent true binding sites for human PEX19, indicating that human PMPs harbor common PEX19-binding sites that do resemble those of yeast. Our data clearly revealed a role for PEX19-binding sites as PMP-targeting motifs across species, thereby demonstrating the evolutionary conservation of PMP signal sequences from yeast to man.

  11. Identification of internal transcribed spacer sequence motifs in truffles: a first step toward their DNA bar coding.

    PubMed

    El Karkouri, Khalid; Murat, Claude; Zampieri, Elisa; Bonfante, Paola

    2007-08-01

    This work presents DNA sequence motifs from the internal transcribed spacer (ITS) of the nuclear rRNA repeat unit which are useful for the identification of five European and Asiatic truffles (Tuber magnatum, T. melanosporum, T. indicum, T. aestivum, and T. mesentericum). Truffles are edible mycorrhizal ascomycetes that show similar morphological characteristics but that have distinct organoleptic and economic values. A total of 36 out of 46 ITS1 or ITS2 sequence motifs have allowed an accurate in silico distinction of the five truffles to be made (i.e., by pattern matching and/or BLAST analysis on downloaded GenBank sequences and directly against GenBank databases). The motifs considered the intraspecific genetic variability of each species, including rare haplotypes, and assigned their respective species from either the ascocarps or ectomycorrhizas. The data indicate that short ITS1 or ITS2 motifs (≤ 50 bp in size) can be considered promising tools for truffle species identification. A dot blot hybridization analysis of T. magnatum and T. melanosporum compared with other close relatives or distant lineages allowed at least one highly specific motif to be identified for each species. These results were confirmed in a blind test which included new field isolates. The current work has provided a reliable new tool for a truffle oligonucleotide bar code and identification in ecological and evolutionary studies. PMID:17601808

  12. Analysis of the Campylobacter jejuni Genome by SMRT DNA Sequencing Identifies Restriction-Modification Motifs

    PubMed Central

    O’Loughlin, Jason L.; Eucker, Tyson P.; Chavez, Juan D.; Samuelson, Derrick R.; Neal-McKinney, Jason; Gourley, Christopher R.; Bruce, James E.; Konkel, Michael E.

    2015-01-01

    Campylobacter jejuni is a leading bacterial cause of human gastroenteritis. The goal of this study was to analyze the C. jejuni F38011 strain, recovered from an individual with severe enteritis, at a genomic and proteomic level to gain insight into microbial processes. The C. jejuni F38011 genome is comprised of 1,691,939 bp, with a mol.% (G+C) content of 30.5%. PacBio sequencing coupled with REBASE analysis was used to predict C. jejuni F38011 genomic sites and enzymes that may be involved in DNA restriction-modification. A total of five putative methylation motifs were identified as well as the C. jejuni enzymes that could be responsible for the modifications. Peptides corresponding to the deduced amino acid sequence of the C. jejuni enzymes were identified using proteomics. This work sets the stage for studies to dissect the precise functions of the C. jejuni putative restriction-modification enzymes. Taken together, the data generated in this study contribute to our knowledge of the genomic content, methylation profile, and encoding capacity of C. jejuni. PMID:25695747

  13. Roles of conserved proline and glycosyltransferase motifs of EmbC in biosynthesis of lipoarabinomannan.

    PubMed

    Berg, Stefan; Starbuck, James; Torrelles, Jordi B; Vissa, Varalakshmi D; Crick, Dean C; Chatterjee, Delphi; Brennan, Patrick J

    2005-02-18

    D-Arabinans, composed of D-arabinofuranose (D-Araf), dominate the structure of mycobacterial cell walls in two settings, as part of lipoarabinomannan (LAM) and arabinogalactan, each with markedly different structures and functions. Little is known of the complexity of their biosynthesis. beta-D-Arabinofuranosyl-1-monophosphoryldecaprenol is the only known sugar donor. EmbA, EmbB, and EmbC, products of the paralogous genes embA, embB, and embC, the sites of resistance to the anti-tuberculosis drug ethambutol (EMB), are the only known implicated enzymes. EmbA and -B apparently contribute to the synthesis of arabinogalactan, whereas EmbC is reserved for the synthesis of LAM. The Emb proteins show no overall similarity to any known proteins beyond Mycobacterium and related genera. However, functional motifs, equivalent to a proline-rich motif of several bacterial polysaccharide co-polymerases and a superfamily of glycosyltransferases, were found. Site-directed mutagenesis in glycosyltransferase superfamily C resulted in complete ablation of LAM synthesis. Point mutations in three amino acids of the proline motif of EmbC resulted in marked reduction of LAM-arabinan synthesis and accumulation of an unknown intermediate and of the known precursor lipomannan. Yet the pattern of the differently linked d-Araf units observed in wild type LAM-arabinan was largely retained in the proline motif mutants. The results allow for the presentation of a unique model of arabinan synthesis. PMID:15546869

  14. Structural determinants of Rab and Rab Escort Protein interaction: Rab family motifs define a conserved binding surface.

    PubMed

    Pereira-Leal, José B; Strom, Molly; Godfrey, Richard F; Seabra, Miguel C

    2003-01-31

    Rab proteins are a large family of monomeric GTPases with 60 members identified in the human genome. Rab GTPases require an isoprenyl modification to their C-terminus for membrane association and function in the regulation of vesicular trafficking pathways. This reaction is catalysed by Rab geranylgeranyl transferase, which recognises as protein substrate any given Rab in a 1:1 complex with Rab Escort Protein (REP). REP is therefore able to bind many distinct Rab proteins but the molecular basis for this activity is still unclear. We recently identified conserved motifs in Rabs termed RabF motifs, which we proposed to mediate a conserved mode of interaction between Rabs and REPs. Here, we tested this hypothesis. We first used REP1 as a bait in the yeast two-hybrid system and isolated strictly full-length Rabs, suggesting that REP recognises multiple regions within and properly folded Rabs. We introduced point mutations in Rab3a as a model Rab and assessed the ability of the mutants to interact with REP using the yeast two-hybrid system and an in vitro prenylation assay. We identified several residues that affect REP:Rab binding in the RabF1, RabF3, and RabF4 regions (which include parts of the switch I and II regions), but not other RabF regions. These results support the hypothesis that Rabs bind REP via conserved RabF motifs and provide a molecular explanation for the preferential recognition of the GDP-bound conformation of Rab by REP. PMID:12535645

  15. Sequence, structure, and cooperativity in folding of elementary protein structural motifs.

    PubMed

    Lai, Jason K; Kubelka, Ginka S; Kubelka, Jan

    2015-08-11

    Residue-level unfolding of two helix-turn-helix proteins--one naturally occurring and one de novo designed--is reconstructed from multiple sets of site-specific (13)C isotopically edited infrared (IR) and circular dichroism (CD) data using Ising-like statistical-mechanical models. Several model variants are parameterized to test the importance of sequence-specific interactions (approximated by Miyazawa-Jernigan statistical potentials), local structural flexibility (derived from the ensemble of NMR structures), interhelical hydrogen bonds, and native contacts separated by intervening disordered regions (through the Wako-Saitô-Muñoz-Eaton scheme, which disallows such configurations). The models are optimized by directly simulating experimental observables: CD ellipticity at 222 nm for model proteins and their fragments and (13)C-amide I' bands for multiple isotopologues of each protein. We find that data can be quantitatively reproduced by the model that allows two interacting segments flanking a disordered loop (double sequence approximation) and incorporates flexibility in the native contact maps, but neither sequence-specific interactions nor hydrogen bonds are required. The near-identical free energy profiles as a function of the global order parameter are consistent with expected similar folding kinetics for nearly identical structures. However, the predicted folding mechanism for the two motifs is different, reflecting the order of local stability. We introduce free energy profiles for "experimental" reaction coordinates--namely, the degree of local folding as sensed by site-specific (13)C-edited IR, which highlight folding heterogeneity and contrast its overall, average description with the detailed, local picture.

  16. Sequence, structure, and cooperativity in folding of elementary protein structural motifs

    PubMed Central

    Lai, Jason K.; Kubelka, Ginka S.; Kubelka, Jan

    2015-01-01

    Residue-level unfolding of two helix-turn-helix proteins—one naturally occurring and one de novo designed—is reconstructed from multiple sets of site-specific 13C isotopically edited infrared (IR) and circular dichroism (CD) data using Ising-like statistical-mechanical models. Several model variants are parameterized to test the importance of sequence-specific interactions (approximated by Miyazawa–Jernigan statistical potentials), local structural flexibility (derived from the ensemble of NMR structures), interhelical hydrogen bonds, and native contacts separated by intervening disordered regions (through the Wako–Saitô–Muñoz–Eaton scheme, which disallows such configurations). The models are optimized by directly simulating experimental observables: CD ellipticity at 222 nm for model proteins and their fragments and 13C-amide I′ bands for multiple isotopologues of each protein. We find that data can be quantitatively reproduced by the model that allows two interacting segments flanking a disordered loop (double sequence approximation) and incorporates flexibility in the native contact maps, but neither sequence-specific interactions nor hydrogen bonds are required. The near-identical free energy profiles as a function of the global order parameter are consistent with expected similar folding kinetics for nearly identical structures. However, the predicted folding mechanism for the two motifs is different, reflecting the order of local stability. We introduce free energy profiles for “experimental” reaction coordinates—namely, the degree of local folding as sensed by site-specific 13C-edited IR, which highlight folding heterogeneity and contrast its overall, average description with the detailed, local picture. PMID:26216963

  17. Flow Cytometry-assisted Cloning of Specific Sequence Motifs from Complex 16S ribosomal RNA Gene Libraries.

    SciTech Connect

    Nielsen, J.L.; Schramm, A.; Bernhard, A.E.; van den Engh, G.J.; Stahl, D.A.

    2004-07-21

    A flow cytometry method was developed for rapid screening and recovery of cloned DNA containing common sequence motifs. This approach, termed fluorescence-activated cell sorting-assisted cloning, was used to recover sequences affiliated with a unique lineage within the Bacteroidetes not abundant in a clone library of environmental 16S rRNA genes. Retrieval and sequence analysis of phylogenetically informative genes has become a standard cultivation-independent technique to investigate microbial diversity in nature (7, 18). Genes encoding the 16S rRNA, because of the relative ease of their selective amplification, have been most frequently employed for general diversity surveys (16). Environmental studies have also focused on specific subpopulations affiliated with a phylogenetic group or identified by genes encoding specific metabolic functions (e.g., ammonia oxidation, sulfate respiration, and nitrate reduction) (8, 15, 20). However, specific populations may be of low abundance (1, 23), or the genes encoding specific metabolic functions may be insufficiently conserved to provide priming sites for general PCR amplification. Three general approaches have been used to obtain 16S rRNA sequence information from low-abundance populations: screening hundreds to thousands of clones in a general 16S rRNA gene library (21), flow cytometric sorting of a subpopulation of environmentally derived cells labeled by fluorescent in situ hybridization (FISH) (27), or selective PCR amplification using primers specific for the subpopulation (2, 23). While the first approach is simply time-consuming and tedious, the second has been restricted to fairly large and strongly fluorescent cells from aquatic samples (5, 27). The third approach often generates fragments of only a few hundred bases due to the limited number of specific priming sites. Partial sequence information often degrades analysis, obscuring or distorting the phylogenetic placement of the new sequences (11, 20). A more robust characterization of environ

  18. A Developmental Sequence of Skills Leading to Conservation

    ERIC Educational Resources Information Center

    Walker, Alice A.

    1978-01-01

    Examines the developmental sequence of skills involved in the understanding of relational concepts and in the development of conservation. Fifty kindergarten children participated in the study. (BD/BR)

  19. A conserved motif in JNK/p38-specific MAPK phosphatases as a determinant for JNK1 recognition and inactivation.

    PubMed

    Liu, Xin; Zhang, Chen-Song; Lu, Chang; Lin, Sheng-Cai; Wu, Jia-Wei; Wang, Zhi-Xin

    2016-01-01

    Mitogen-activated protein kinases (MAPKs), important in a large array of signalling pathways, are tightly controlled by a cascade of protein kinases and by MAPK phosphatases (MKPs). MAPK signalling efficiency and specificity is modulated by protein-protein interactions between individual MAPKs and the docking motifs in cognate binding partners. Two types of docking interactions have been identified: D-motif-mediated interaction and FXF-docking interaction. Here we report the crystal structure of JNK1 bound to the catalytic domain of MKP7 at 2.4-Å resolution, providing high-resolution structural insight into the FXF-docking interaction. The (285)FNFL(288) segment in MKP7 directly binds to a hydrophobic site on JNK1 that is near the MAPK insertion and helix αG. Biochemical studies further reveal that this highly conserved structural motif is present in all members of the MKP family, and the interaction mode is universal and critical for the MKP-MAPK recognition and biological function. PMID:26988444

  20. A conserved motif in JNK/p38-specific MAPK phosphatases as a determinant for JNK1 recognition and inactivation

    PubMed Central

    Liu, Xin; Zhang, Chen-Song; Lu, Chang; Lin, Sheng-Cai; Wu, Jia-Wei; Wang, Zhi-Xin

    2016-01-01

    Mitogen-activated protein kinases (MAPKs), important in a large array of signalling pathways, are tightly controlled by a cascade of protein kinases and by MAPK phosphatases (MKPs). MAPK signalling efficiency and specificity is modulated by protein–protein interactions between individual MAPKs and the docking motifs in cognate binding partners. Two types of docking interactions have been identified: D-motif-mediated interaction and FXF-docking interaction. Here we report the crystal structure of JNK1 bound to the catalytic domain of MKP7 at 2.4-Å resolution, providing high-resolution structural insight into the FXF-docking interaction. The 285FNFL288 segment in MKP7 directly binds to a hydrophobic site on JNK1 that is near the MAPK insertion and helix αG. Biochemical studies further reveal that this highly conserved structural motif is present in all members of the MKP family, and the interaction mode is universal and critical for the MKP-MAPK recognition and biological function. PMID:26988444

  1. CSTminer: a web tool for the identification of coding and noncoding conserved sequence tags through cross-species genome comparison.

    PubMed

    Castrignanò, Tiziana; Canali, Alessandro; Grillo, Giorgio; Liuni, Sabino; Mignone, Flavio; Pesole, Graziano

    2004-07-01

    The identification and characterization of genome tracts that are highly conserved across species during evolution may contribute significantly to the functional annotation of whole-genome sequences. Indeed, such sequences are likely to correspond to known or unknown coding exons or regulatory motifs. Here, we present a web server implementing a previously developed algorithm that, by comparing user-submitted genome sequences, is able to identify statistically significant conserved blocks and assess their coding or noncoding nature through the measure of a coding potential score. The web tool, available at http://www.caspur.it/CSTminer/, is dynamically interconnected with the Ensembl genome resources and produces a graphical output showing a map of detected conserved sequences and annotated gene features.

  2. AptaTRACE Elucidates RNA Sequence-Structure Motifs from Selection Trends in HT-SELEX Experiments.

    PubMed

    Dao, Phuong; Hoinka, Jan; Takahashi, Mayumi; Zhou, Jiehua; Ho, Michelle; Wang, Yijie; Costa, Fabrizio; Rossi, John J; Backofen, Rolf; Burnett, John; Przytycka, Teresa M

    2016-07-01

    Aptamers, short RNA or DNA molecules that bind distinct targets with high affinity and specificity, can be identified using high-throughput systematic evolution of ligands by exponential enrichment (HT-SELEX), but scalable analytic tools for understanding sequence-function relationships from diverse HT-SELEX data are not available. Here we present AptaTRACE, a computational approach that leverages the experimental design of the HT-SELEX protocol, RNA secondary structure, and the potential presence of many secondary motifs to identify sequence-structure motifs that show a signature of selection. We apply AptaTRACE to identify nine motifs in C-C chemokine receptor type 7 targeted by aptamers in an in vitro cell-SELEX experiment. We experimentally validate two aptamers whose binding required both sequence and structural features. AptaTRACE can identify low-abundance motifs, and we show through simulations that, because of this, it could lower HT-SELEX cost and time by reducing the number of selection cycles required. PMID:27467247

  3. Finishing and Special Motifs: Lessons Learned from CRISPR Analysis Using Next-Generation Draft Sequences (7th Annual SFAF Meeting, 2012)

    SciTech Connect

    Campbell, Catherine

    2012-06-01

    Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  4. Finishing and Special Motifs: Lessons Learned from CRISPR Analysis Using Next-Generation Draft Sequences (7th Annual SFAF Meeting, 2012)

    ScienceCinema

    Campbell, Catherine [Noblis]

    2016-07-12

    Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  5. Structure of the Brd4 ET domain bound to a C-terminal motif from γ-retroviral integrases reveals a conserved mechanism of interaction

    PubMed Central

    Crowe, Brandon L.; Larue, Ross C.; Yuan, Chunhua; Hess, Sonja; Kvaratskhelia, Mamuka; Foster, Mark P.

    2016-01-01

    The bromodomain and extraterminal domain (BET) protein family are promising therapeutic targets for a range of diseases linked to transcriptional activation, cancer, viral latency, and viral integration. Tandem bromodomains selectively tether BET proteins to chromatin by engaging cognate acetylated histone marks, and the extraterminal (ET) domain is the focal point for recruiting a range of cellular and viral proteins. BET proteins guide γ-retroviral integration to transcription start sites and enhancers through bimodal interaction with chromatin and the γ-retroviral integrase (IN). We report the NMR-derived solution structure of the Brd4 ET domain bound to a conserved peptide sequence from the C terminus of murine leukemia virus (MLV) IN. The complex reveals a protein–protein interaction governed by the binding-coupled folding of disordered regions in both interacting partners to form a well-structured intermolecular three-stranded β sheet. In addition, we show that a peptide comprising the ET binding motif (EBM) of MLV IN can disrupt the cognate interaction of Brd4 with NSD3, and that substitutions of Brd4 ET residues essential for binding MLV IN also impair interaction of Brd4 with a number of cellular partners involved in transcriptional regulation and chromatin remodeling. This suggests that γ-retroviruses have evolved the EBM to mimic a cognate interaction motif to achieve effective integration in host chromatin. Collectively, our findings identify key structural features of the ET domain of Brd4 that allow for interactions with both cellular and viral proteins. PMID:26858406

  6. Structure of the Brd4 ET domain bound to a C-terminal motif from γ-retroviral integrases reveals a conserved mechanism of interaction.

    PubMed

    Crowe, Brandon L; Larue, Ross C; Yuan, Chunhua; Hess, Sonja; Kvaratskhelia, Mamuka; Foster, Mark P

    2016-02-23

    The bromodomain and extraterminal domain (BET) protein family are promising therapeutic targets for a range of diseases linked to transcriptional activation, cancer, viral latency, and viral integration. Tandem bromodomains selectively tether BET proteins to chromatin by engaging cognate acetylated histone marks, and the extraterminal (ET) domain is the focal point for recruiting a range of cellular and viral proteins. BET proteins guide γ-retroviral integration to transcription start sites and enhancers through bimodal interaction with chromatin and the γ-retroviral integrase (IN). We report the NMR-derived solution structure of the Brd4 ET domain bound to a conserved peptide sequence from the C terminus of murine leukemia virus (MLV) IN. The complex reveals a protein-protein interaction governed by the binding-coupled folding of disordered regions in both interacting partners to form a well-structured intermolecular three-stranded β sheet. In addition, we show that a peptide comprising the ET binding motif (EBM) of MLV IN can disrupt the cognate interaction of Brd4 with NSD3, and that substitutions of Brd4 ET residues essential for binding MLV IN also impair interaction of Brd4 with a number of cellular partners involved in transcriptional regulation and chromatin remodeling. This suggests that γ-retroviruses have evolved the EBM to mimic a cognate interaction motif to achieve effective integration in host chromatin. Collectively, our findings identify key structural features of the ET domain of Brd4 that allow for interactions with both cellular and viral proteins.

  7. Structural analysis of the regulatory elements of the type-II procollagen gene. Conservation of promoter and first intron sequences between human and mouse.

    PubMed Central

    Vikkula, M; Metsäranta, M; Syvänen, A C; Ala-Kokko, L; Vuorio, E; Peltonen, L

    1992-01-01

    Transcription of the type-II procollagen gene (COL2A1) is very specifically restricted to a limited number of tissues, particularly cartilages. In order to identify transcription-control motifs we have sequenced the promoter region and the first intron of the human and mouse COL2A1 genes. With the assumption that these motifs should be well conserved during evolution, we have searched for potential elements important for the tissue-specific transcription of the COL2A1 gene by aligning the two sequences with each other and with the available rat type-II procollagen sequence for the promoter. With this approach we could identify specific evolutionarily well-conserved motifs in the promoter area. On the other hand, several suggested regulatory elements in the promoter region did not show evolutionary conservation. In the middle of the first intron we found a cluster of well-conserved transcription-control elements and we conclude that these conserved motifs most probably possess a significant function in the control of the tissue-specific transcription of the COL2A1 gene. We also describe locations of additional, highly conserved nucleotide stretches, which are good candidate regions in the search for binding sites of yet-uncharacterized cartilage-specific transcription regulators of the COL2A1 gene. PMID:1637314

  8. Triazine-Based Sequence-Defined Polymers with Side-Chain Diversity and Backbone-Backbone Interaction Motifs.

    PubMed

    Grate, Jay W; Mo, Kai-For; Daily, Michael D

    2016-03-14

    Sequence control in polymers, well-known in nature, encodes structure and functionality. Here we introduce a new architecture, based on the nucleophilic aromatic substitution chemistry of cyanuric chloride, that creates a new class of sequence-defined polymers dubbed TZPs. Proof of concept is demonstrated with two synthesized hexamers, having neutral and ionizable side chains. Molecular dynamics simulations show backbone-backbone interactions, including H-bonding motifs and pi-pi interactions. This architecture is arguably biomimetic while differing from sequence-defined polymers having peptide bonds. The synthetic methodology supports the structural diversity of side chains known in peptides, as well as backbone-backbone hydrogen-bonding motifs, and will thus enable new macromolecules and materials with useful functions. PMID:26865312

  9. DNA recognition for virus assembly through multiple sequence-independent interactions with a helix-turn-helix motif

    PubMed Central

    Greive, Sandra J.; Fung, Herman K.H.; Chechik, Maria; Jenkins, Huw T.; Weitzel, Stephen E.; Aguiar, Pedro M.; Brentnall, Andrew S.; Glousieau, Matthieu; Gladyshev, Grigory V.; Potts, Jennifer R.; Antson, Alfred A.

    2016-01-01

    The helix-turn-helix (HTH) motif features frequently in protein DNA-binding assemblies. Viral pac site-targeting small terminase proteins possess an unusual architecture in which the HTH motifs are displayed in a ring, distinct from the classical HTH dimer. Here we investigate how such a circular array of HTH motifs enables specific recognition of the viral genome for initiation of DNA packaging during virus assembly. We found, by surface plasmon resonance and analytical ultracentrifugation, that individual HTH motifs of the Bacillus phage SF6 small terminase bind the packaging regions of SF6 and related SPP1 genome weakly, with little local sequence specificity. Nuclear magnetic resonance chemical shift perturbation studies with an arbitrary single-site substrate suggest that the HTH motif contacts DNA similarly to how certain HTH proteins contact DNA non-specifically. Our observations support a model where specificity is generated through conformational selection of an intrinsically bent DNA segment by a ring of HTHs which bind weakly but cooperatively. Such a system would enable viral gene regulation and control of the viral life cycle, with a minimal genome, conferring a major evolutionary advantage for SPP1-like viruses. PMID:26673721

  10. SVM2Motif—Reconstructing Overlapping DNA Sequence Motifs by Mimicking an SVM Predictor

    PubMed Central

    Vidovic, Marina M. -C.; Görnitz, Nico; Müller, Klaus-Robert; Rätsch, Gunnar; Kloft, Marius

    2015-01-01

    Identifying discriminative motifs underlying the functionality and evolution of organisms is a major challenge in computational biology. Machine learning approaches such as support vector machines (SVMs) achieve state-of-the-art performances in genomic discrimination tasks, but—due to its black-box character—motifs underlying its decision function are largely unknown. As a remedy, positional oligomer importance matrices (POIMs) allow us to visualize the significance of position-specific subsequences. Although being a major step towards the explanation of trained SVM models, they suffer from the fact that their size grows exponentially in the length of the motif, which renders their manual inspection feasible only for comparably small motif sizes, typically k ≤ 5. In this work, we extend the work on positional oligomer importance matrices, by presenting a new machine-learning methodology, entitled motifPOIM, to extract the truly relevant motifs—regardless of their length and complexity—underlying the predictions of a trained SVM model. Our framework thereby considers the motifs as free parameters in a probabilistic model, a task which can be phrased as a non-convex optimization problem. The exponential dependence of the POIM size on the oligomer length poses a major numerical challenge, which we address by an efficient optimization framework that allows us to find possibly overlapping motifs consisting of up to hundreds of nucleotides. We demonstrate the efficacy of our approach on a synthetic data set as well as a real-world human splice site data set. PMID:26690911

  11. The mammalian Rab family of small GTPases: definition of family and subfamily sequence motifs suggests a mechanism for functional specificity in the Ras superfamily.

    PubMed

    Pereira-Leal, J B; Seabra, M C

    2000-08-25

    The Rab/Ypt/Sec4 family forms the largest branch of the Ras superfamily of GTPases, acting as essential regulators of vesicular transport pathways. We used the large amount of information in the databases to analyse the mammalian Rab family. We defined Rab-conserved sequences that we designate Rab family (RabF) motifs using the conserved PM and G motifs as "landmarks". The Rab-specific regions were used to identify new Rab proteins in the databases and suggest rules for nomenclature. Surprisingly, we find that RabF regions cluster in and around switch I and switch II regions, i.e. the regions that change conformation upon GDP or GTP binding. This finding suggests that specificity of Rab-effector interaction cannot be conferred solely through the switch regions as is usually inferred. Instead, we propose a model whereby an effector binds to RabF (switch) regions to discriminate between nucleotide-bound states and simultaneously to other regions that confer specificity to the interaction, possibly Rab subfamily (RabSF) specific regions that we also define here. We discuss structural and functional data that support this model and its general applicability to the Ras superfamily of proteins.

  12. Conserved sequence elements in the 5' region of the Ultrabithorax transcription unit

    PubMed Central

    Wilde, C. Deborah; Akam, Michael

    1987-01-01

    Clones homologous to the 5' region of the Ultrabithorax gene of Drosophila melanogaster have been isolated from D. pseudoobscura, D. funebris and Musca domestica. Regions that encode most of the Ubx protein have been sequenced in all three of these species, and the 5' upstream region has been sequenced in D. funebris to a point ~1000 bases upstream of the probable mRNA start site. Here we compare these sequences with those described elsewhere for D. melanogaster. Deduced amino acid sequences of the Ubx protein show 8% (D. pseudoobscura), 15% (D. funebris) and 22% (M. domestica) divergence from D. melanogaster. However, these figures mask very different rates of evolution in different regions of the protein. A glycine-rich ('hinge') region is conserved in each of these species, although its length is variable. Comparison of D. funebris and D. melanogaster sequences in the long 5' untranslated leader region of the mRNA, and in the region immediately upstream of the start point of transcription, reveals tightly conserved elements embedded in an otherwise non-homologous sequence. These conserved elements include a 118-bp region that spans the mRNA start site, an internally repetitive (TAA)n region in the untranslated leader and a short repeated motif immediately upstream of the ATG codon that initiates the major open reading frame of the Ubx protein. Two other conserved elements were identified upstream of the transcription start site; both elements have structural features consistent with a role as recognition sites for regulatory proteins. PMID:16453766

  13. Phylogenetic Analysis of Geographically Diverse Radopholus similis via rDNA Sequence Reveals a Monomorphic Motif

    PubMed Central

    Kaplan, D. T.; Thomas, W. K.; Frisse, L. M.; Sarah, J. L.; Stanton, J. M.; Speijer, P. R.; Marin, D. H.; Opperman, C. H.

    2000-01-01

    The nucleic acid sequences of rDNA ITS1 and the rDNA D2/D3 expansion segment were compared for 57 burrowing nematode isolates collected from Australia, Cameroon, Central America, Cuba, Dominican Republic, Florida, Guadeloupe, Hawaii, Nigeria, Honduras, Indonesia, Ivory Coast, Puerto Rico, South Africa, and Uganda. Of the 57 isolates, 55 were morphologically similar to Radopholus similis and seven were citrus-parasitic. The nucleic acid sequences for PCR-amplified ITS1 and for the D2/D3 expansion segment of the 28S rDNA gene were each identical for all putative R. similis. Sequence divergence for both the ITS1 and the D2/D3 was concordant with morphological differences that distinguish R. similis from other burrowing nematode species. This result substantiates previous observations that the R. similis genome is highly conserved across geographic regions. Autapomorphies that would delimit phylogenetic lineages of non-citrus-parasitic R. similis from those that parasitize citrus were not observed. The data presented herein support the concept that R. similis comprises two pathotypes: one that parasitizes citrus and one that does not. PMID:19270959

  14. Evolutionary conservation of long noncoding RNAs; sequence, structure, function

    PubMed Central

    Johnsson, Per; Lipovich, Leonard; Grandér, Dan; Morris, Kevin V.

    2014-01-01

    Background Recent advances in genome-wide studies have revealed the abundance of long non-coding RNAs (lncRNAs) in mammalian transcriptomes. The ENCODE Consortium has elucidated the prevalence of human lncRNA genes, which are as numerous as protein-coding genes. Surprisingly, many lncRNAs do not show the same pattern of high interspecies conservation as protein-coding genes. The absence of functional studies and the frequent lack of sequence conservation therefore make functional interpretation of these newly discovered transcripts challenging. Many investigators have suggested the presence and importance of secondary structural elements within lncRNAs, but mammalian lncRNA secondary structure remains poorly understood. It is intriguing to speculate that in this group of genes, RNA secondary structures might be preserved throughout evolution and that this might explain the lack of sequence conservation among many lncRNAs. Scope of review Here, we review the extent of interspecies conservation among different lncRNAs, with a focus on a subset of lncRNAs that have been functionally investigated. The function of lncRNAs is widespread and we investigate whether different forms of functionalities may be conserved. Major conclusions Lack of conservation does not imply a lack of function. We highlight several examples of lncRNAs where RNA structure appears to be the main functional unit and evolutionary constraint. We survey existing genome-wide studies of mammalian lncRNA conservation and summarize their limitations. We further review specific human lncRNAs which lack evolutionary conservation beyond primates but have proven to be both functional and therapeutically relevant. General significance Pioneering studies highlight a role in lncRNAs for secondary structures, and possibly the presence of functional “modules”, which are interspersed with longer and less conserved stretches of nucleotide sequences. Taken together, high-throughput analysis of conservation and

  15. Creation of Hybrid Nanorods From Sequences of Natural Trimeric Fibrous Proteins Using the Fibritin Trimerization Motif

    NASA Astrophysics Data System (ADS)

    Papanikolopoulou, Katerina; van Raaij, Mark J.; Mitraki, Anna

    Stable, artificial fibrous proteins that can be functionalized open new avenues in fields such as bionanomaterials design and fiber engineering. Natural fibrous proteins such as collagen, elastin, insect silks, and fibers from phages and viruses are an important source of inspiration for the creation of such proteins. The fibrous parts of this last class of proteins usually adopt trimeric, β-stranded structural folds and are appended to globular, receptor-binding domains. It has been recently shown that the globular domains are essential for correct folding and trimerization and can be successfully substituted by a very small (27-amino acid) trimerization motif from phage T4 fibritin. The hybrid proteins are correctly folded nanorods that can withstand extreme conditions. When the fibrous part derives from the adenovirus fiber shaft, different tissue-targeting specificities can be engineered into the hybrid proteins, which therefore can be used as gene therapy vectors. The integration of such stable nanorods in devices is also a big challenge in the field of biomechanical design. The fibritin foldon domain is a versatile trimerization motif and can be combined with a variety of fibrous motifs, such as coiled-coil, collagenous, and triple β-stranded motifs, provided the appropriate linkers are used. The combination of different motifs within the same fibrous molecule to create stable rods with multiple functions can even be envisioned. We provide a comprehensive overview of the experimental procedures used for designing, creating, and characterizing hybrid fibrous nanorods using the fibritin trimerization motif.

  16. Comparison of loline alkaloid gene clusters across fungal endophytes: predicting the co-regulatory sequence motifs and the evolutionary history.

    PubMed

    Kutil, Brandi L; Greenwald, Charles; Liu, Gang; Spiering, Martin J; Schardl, Christopher L; Wilkinson, Heather H

    2007-10-01

    LOL, a fungal secondary metabolite gene cluster found in Epichloë and Neotyphodium species, is responsible for production of insecticidal loline alkaloids. To analyze the genetic architecture and to predict the evolutionary history of LOL, we compared five clusters from four fungal species (single clusters from Epichloë festucae, Neotyphodium sp. PauTG-1, Neotyphodium coenophialum, and two clusters we previously characterized in Neotyphodium uncinatum). Using PhyloCon to compare putative lol gene promoter regions, we have identified four motifs conserved across the lol genes in all five clusters. Each motif has significant similarity to known fungal transcription factor binding sites in the TRANSFAC database. Conservation of these motifs is further support for the hypothesis that the lol genes are co-regulated. Interestingly, the history of asexual Neotyphodium spp. includes multiple interspecific hybridization events. Comparing clusters from three Neotyphodium species and E. festucae allowed us to determine which Epichloë ancestors are the most likely contributors of LOL in these asexual species. For example, while no present day Epichloë typhina isolates are known to produce lolines, our data support the hypothesis that the E. typhina ancestor(s) of three asexual endophyte species contained a LOL gene cluster. Thus, these data support a model of evolution in which the polymorphism in loline alkaloid production phenotypes among endophyte species is likely due to the loss of the trait over time.

  17. Specific Prenylation of Tomato Rab Proteins by Geranylgeranyl Type-II Transferase Requires a Conserved Cysteine-Cysteine Motif.

    PubMed

    Yalovsky, S.; Loraine, A. E.; Gruissem, W.

    1996-04-01

    Posttranslational isoprenylation of some small GTP-binding proteins is required for their biological activity. Rab geranylgeranyl transferase (Rab GGTase) uses geranylgeranyl pyrophosphate to modify Rab proteins, its only known substrates. Geranylgeranylation of Rabs is believed to promote their association with target membranes and interaction with other proteins. Plants, like other eukaryotes, contain Rab-like proteins that are associated with intracellular membranes. However, to our knowledge, the geranylgeranylation of Rab proteins has not yet been characterized from any plant source. This report presents an activity assay that allows the characterization of prenylation of Rab-like proteins in vitro, by protein extracts prepared from plants. Tomato Rab1 proteins and mammalian Rab1a were modified by geranylgeranyl pyrophosphate but not by farnesyl pyrophosphate. This modification required a conserved cysteine-cysteine motif. A mutant form lacking the cysteine-cysteine motif could not be modified, but inhibited the geranylgeranylation of its wild-type homolog. The tomato Rab proteins were modified in vitro by protein extract prepared from yeast, but failed to become modified when the protein extract was prepared from a yeast strain containing a mutant allele for the α subunit of yeast Rab GGTase (bet4 ts). These results demonstrate that plant cells, like other eukaryotes, contain Rab GGTase-like activity.

  18. Patterns of sequence conservation in presynaptic neural genes

    PubMed Central

    Hadley, Dexter; Murphy, Tara; Valladares, Otto; Hannenhalli, Sridhar; Ungar, Lyle; Kim, Junhyong; Bućan, Maja

    2006-01-01

    Background The neuronal synapse is a fundamental functional unit in the central nervous system of animals. Because synaptic function is evolutionarily conserved, we reasoned that functional sequences of genes and related genomic elements known to play important roles in neurotransmitter release would also be conserved. Results Evolutionary rate analysis revealed that presynaptic proteins evolve slowly, although some members of large gene families exhibit accelerated evolutionary rates relative to other family members. Comparative sequence analysis of 46 megabases spanning 150 presynaptic genes identified more than 26,000 elements that are highly conserved in eight vertebrate species, as well as a small subset of sequences (6%) that are shared among unrelated presynaptic genes. Analysis of large gene families revealed that upstream and intronic regions of closely related family members are extremely divergent. We also identified 504 exceptionally long conserved elements (≥360 base pairs, ≥80% pair-wise identity between human and other mammals) in intergenic and intronic regions of presynaptic genes. Many of these elements form a highly stable stem-loop RNA structure and consequently are candidates for novel regulatory elements, whereas some conserved noncoding elements are shown to correlate with specific gene expression profiles. The SynapseDB online database integrates these findings and other functional genomic resources for synaptic genes. Conclusion Highly conserved elements in nonprotein coding regions of 150 presynaptic genes represent sequences that may be involved in the transcriptional or post-transcriptional regulation of these genes. Furthermore, comparative sequence analysis will facilitate selection of genes and noncoding sequences for future functional studies and analysis of variation studies in neurodevelopmental and psychiatric disorders. PMID:17096848
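
    The thresholds quoted in this abstract (at least 360 base pairs, at least 80% pairwise identity) can be expressed as a simple filter. The sketch below is illustrative only: the abstract does not say how identity was computed, so the handling of gaps and the choice of denominator (all aligned columns) are assumptions, and the function names are hypothetical.

```python
def pairwise_identity(a: str, b: str) -> float:
    """Fraction of aligned columns holding identical, non-gap characters."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        return 0.0
    # Assumption: a gap ('-') never counts as a match, even against another gap.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def is_long_conserved(a: str, b: str, min_len: int = 360,
                      min_identity: float = 0.80) -> bool:
    """Apply the length and identity thresholds quoted in the abstract."""
    return len(a) >= min_len and pairwise_identity(a, b) >= min_identity


if __name__ == "__main__":
    human = "A" * 360
    other = "A" * 300 + "C" * 60   # 300 of 360 columns match
    print(pairwise_identity(human, other), is_long_conserved(human, other))
```

    A real pipeline would start from a whole-genome multiple alignment and would need an explicit policy for gaps and ambiguous bases; this sketch only makes the two numeric criteria concrete.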

  19. Sequence and domain conservation of the coelacanth Hsp40 and Hsp90 chaperones suggests conservation of function.

    PubMed

    Bishop, Özlem Tastan; Edkins, Adrienne Lesley; Blatch, Gregory Lloyd

    2014-09-01

    Molecular chaperones and their associated co-chaperones play an important role in preserving and regulating the active conformational state of cellular proteins. The chaperone complement of the Indonesian Coelacanth, Latimeria menadoensis, was elucidated using transcriptomic sequences. Heat shock protein 90 (Hsp90) and heat shock protein 40 (Hsp40) chaperones, and associated co-chaperones were focused on, and homologous human sequences were used to search the sequence databases. Coelacanth homologs of the cytosolic, mitochondrial and endoplasmic reticulum (ER) homologs of human Hsp90 were identified, as well as all of the major co-chaperones of the cytosolic isoform. Most of the human Hsp40s were found to have coelacanth homologs, and the data suggested that all of the chaperone machinery for protein folding at the ribosome, protein translocation to cellular compartments such as the ER and protein degradation were conserved. Some interesting similarities and differences were identified when interrogating human, mouse, and zebrafish homologs. For example, DnaJB13 is predicted to be a non-functional Hsp40 in humans, mouse, and zebrafish due to a corrupted histidine-proline-aspartic acid (HPD) motif, while the coelacanth homolog has an intact HPD. These and other comparisons enabled important functional and evolutionary questions to be posed for future experimental studies.

  20. Characterization of the mouse DAX-1 gene reveals evolutionary conservation of a unique amino-terminal motif and widespread expression in mouse tissue.

    PubMed

    Bae, D S; Schaefer, M L; Partan, B W; Muglia, L

    1996-09-01

    The human genetic disorder adrenal hypoplasia congenita with hypogonadotropic hypogonadism results from mutations in the recently isolated DAX-1 gene, a member of the nuclear hormone receptor superfamily. To study the role of DAX-1 in adrenal development and activation of the hypothalamic-pituitary-gonadal axis, animal model systems will be essential. Here, we report the isolation and characterization of the mouse DAX-1 gene and its tissue-specific pattern of expression. The mouse DAX-1 gene codes for a 472-amino acid protein, with 75% overall nucleotide sequence homology to its human homolog. The 3.5 amino-terminal repeats of a unique motif with probable DNA-binding activity have been conserved between mouse and human, although highest conservation in the DAX-1 peptide exists in the carboxy-terminal ligand-binding domain. The DAX-1 gene remains X-linked in the mouse, consistent with its potential role in sex determination. We have developed a sensitive reverse transcription-PCR assay that detects DAX-1 messenger RNA in the central nervous system, pituitary, lung, heart, spleen, kidney, and thymus in addition to the adrenal and testis DAX-1 expression noted for the human DAX-1 gene. Future studies using mouse models of altered DAX-1 expression will be critical in defining the role of this factor in tissue- and development-specific gene regulation.

  1. A highly conserved motif at the COOH terminus dictates endoplasmic reticulum exit and cell surface expression of NKCC2.

    PubMed

    Zaarour, Nancy; Demaretz, Sylvie; Defontaine, Nadia; Mordasini, David; Laghmani, Kamel

    2009-08-01

    Mutations in the apically located Na⁺-K⁺-2Cl⁻ co-transporter, NKCC2, lead to type I Bartter syndrome, a life-threatening kidney disorder, yet the mechanisms underlying the regulation of mutated NKCC2 proteins in renal cells have not been investigated. Here, we identified a trihydrophobic motif in the distal COOH terminus of NKCC2 that was required for endoplasmic reticulum (ER) exit and surface expression of the co-transporter. Indeed, microscopic confocal imaging showed that a naturally occurring mutation depriving NKCC2 of its distal COOH-terminal region results in the absence of cell surface expression. Biotinylation assays revealed that lack of cell surface expression was associated with abolition of mature complex-glycosylated NKCC2. Pulse-chase analysis demonstrated that the absence of mature protein was not caused by reduced synthesis or increased rates of degradation of mutant co-transporters. Co-immunolocalization experiments revealed that these mutants co-localized with the ER marker protein-disulfide isomerase, demonstrating that they are retained in the ER. Cell treatment with proteasome or lysosome inhibitors failed to restore the loss of complex-glycosylated NKCC2, further eliminating the possibility that mutant co-transporters were processed by the Golgi apparatus. Serial truncation of the NKCC2 COOH terminus, followed by site-directed mutagenesis, identified hydrophobic residues (1081)LLV(1083) as an ER exit signal necessary for maturation of NKCC2. Mutation of (1081)LLV(1083) to AAA within the context of the full-length protein prevented NKCC2 ER exit independently of the expression system. This trihydrophobic motif is highly conserved in the COOH-terminal tails of all members of the cation-chloride co-transporter family, and thus may function as a common motif mediating their transport from the ER to the cell surface. Taken together, these data are consistent with a model whereby naturally occurring premature terminations that interfere with

  2. Conserved Sequence Preferences Contribute to Substrate Recognition by the Proteasome*

    PubMed Central

    Yu, Houqing; Singh Gautam, Amit K.; Wilmington, Shameika R.; Wylie, Dennis; Martinez-Fonts, Kirby; Kago, Grace; Warburton, Marie; Chavali, Sreenivas; Inobe, Tomonao; Finkelstein, Ilya J.; Babu, M. Madan

    2016-01-01

    The proteasome has pronounced preferences for the amino acid sequence of its substrates at the site where it initiates degradation. Here, we report that modulating these sequences can tune the steady-state abundance of proteins over 2 orders of magnitude in cells. This is the same dynamic range as seen for inducing ubiquitination through a classic N-end rule degron. The stability and abundance of His3 constructs dictated by the initiation site affect survival of yeast cells and show that variation in proteasomal initiation can affect fitness. The proteasome's sequence preferences are linked directly to the affinity of the initiation sites to their receptor on the proteasome and are conserved between Saccharomyces cerevisiae, Schizosaccharomyces pombe, and human cells. These findings establish that the sequence composition of unstructured initiation sites influences protein abundance in vivo in an evolutionarily conserved manner and can affect phenotype and fitness. PMID:27226608

  3. Complete mitochondrial genome of the red drum, Sciaenops ocellatus (Perciformes, Sciaenidae): absence of the typical conserved motif in the origin of the light-strand replication.

    PubMed

    Cheng, Yuanzhi; Shi, Ge; Xu, Tianjun; Li, Haiyan; Sun, Yueyan; Wang, Rixin

    2012-04-01

    In this study, the complete mitochondrial genome of the red drum Sciaenops ocellatus was determined for the first time. The genome was 16,500 bp in length and contained 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes, and 2 main non-coding regions (the control region and the origin of light-strand replication); its gene composition and order were similar to those of most other vertebrates. The overall base composition of the heavy strand was T 25.5%, C 30.7%, A 27.5%, and G 16.3%, with a slight AT bias of 53%. Within the control region, discrete, conserved sequence blocks were identified. The motif 5'-ACCGG-3' rather than 5'-GCCGG-3' was detected in the origin of light-strand replication (O(L)) of red drum, which is rare in the mitogenomes of Sciaenidae species. These results should help elucidate sequence-function relationships of the O(L). PMID:22409755
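
    The AT bias quoted in this abstract follows directly from the reported base composition (A 27.5% plus T 25.5%). The sketch below checks that arithmetic and shows how the same percentages could be computed from a sequence; the helper names are hypothetical, and skipping ambiguous bases such as N is an assumption rather than anything stated in the abstract.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Percentage of A, C, G and T in a sequence, ignoring other symbols."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def at_content(composition: dict) -> float:
    """AT content is the sum of the A and T percentages."""
    return composition["A"] + composition["T"]


if __name__ == "__main__":
    # Heavy-strand percentages as reported in the abstract.
    reported = {"T": 25.5, "C": 30.7, "A": 27.5, "G": 16.3}
    print(at_content(reported))                # 53.0
    print(base_composition("AATTCCGG"))        # 25.0 for each base
```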

  4. A conserved intronic U1 snRNP-binding sequence promotes trans-splicing in Drosophila.

    PubMed

    Gao, Jun-Li; Fan, Yu-Jie; Wang, Xiu-Ye; Zhang, Yu; Pu, Jia; Li, Liang; Shao, Wei; Zhan, Shuai; Hao, Jianjiang; Xu, Yong-Zhen

    2015-04-01

    Unlike typical cis-splicing, trans-splicing joins exons from two separate transcripts to produce chimeric mRNA and has been detected in most eukaryotes. Trans-splicing in trypanosomes and nematodes has been characterized as a spliced leader RNA-facilitated reaction; in contrast, its mechanism in higher eukaryotes remains unclear. Here we investigate mod(mdg4), a classic trans-spliced gene in Drosophila, and report that two critical RNA sequences in the middle of the last 5' intron, TSA and TSB, promote trans-splicing of mod(mdg4). In TSA, a 13-nucleotide (nt) core motif is conserved across Drosophila species and is essential and sufficient for trans-splicing, which binds U1 small nuclear RNP (snRNP) through strong base-pairing with U1 snRNA. In TSB, a conserved secondary structure acts as an enhancer. Deletions of TSA and TSB using the CRISPR/Cas9 system result in developmental defects in flies. Although it is not clear how the 5' intron finds the 3' introns, compensatory changes in U1 snRNA rescue trans-splicing of TSA mutants, demonstrating that U1 recruitment is critical to promote trans-splicing in vivo. Furthermore, TSA core-like motifs are found in many other trans-spliced Drosophila genes, including lola. These findings represent a novel mechanism of trans-splicing, in which RNA motifs in the 5' intron are sufficient to bring separate transcripts into close proximity to promote trans-splicing. PMID:25838544

  5. A conserved intronic U1 snRNP-binding sequence promotes trans-splicing in Drosophila.

    PubMed

    Gao, Jun-Li; Fan, Yu-Jie; Wang, Xiu-Ye; Zhang, Yu; Pu, Jia; Li, Liang; Shao, Wei; Zhan, Shuai; Hao, Jianjiang; Xu, Yong-Zhen

    2015-04-01

    Unlike typical cis-splicing, trans-splicing joins exons from two separate transcripts to produce chimeric mRNA and has been detected in most eukaryotes. Trans-splicing in trypanosomes and nematodes has been characterized as a spliced leader RNA-facilitated reaction; in contrast, its mechanism in higher eukaryotes remains unclear. Here we investigate mod(mdg4), a classic trans-spliced gene in Drosophila, and report that two critical RNA sequences in the middle of the last 5' intron, TSA and TSB, promote trans-splicing of mod(mdg4). In TSA, a 13-nucleotide (nt) core motif is conserved across Drosophila species and is essential and sufficient for trans-splicing, which binds U1 small nuclear RNP (snRNP) through strong base-pairing with U1 snRNA. In TSB, a conserved secondary structure acts as an enhancer. Deletions of TSA and TSB using the CRISPR/Cas9 system result in developmental defects in flies. Although it is not clear how the 5' intron finds the 3' introns, compensatory changes in U1 snRNA rescue trans-splicing of TSA mutants, demonstrating that U1 recruitment is critical to promote trans-splicing in vivo. Furthermore, TSA core-like motifs are found in many other trans-spliced Drosophila genes, including lola. These findings represent a novel mechanism of trans-splicing, in which RNA motifs in the 5' intron are sufficient to bring separate transcripts into close proximity to promote trans-splicing.

  6. A conserved intronic U1 snRNP-binding sequence promotes trans-splicing in Drosophila

    PubMed Central

    Gao, Jun-Li; Fan, Yu-Jie; Wang, Xiu-Ye; Zhang, Yu; Pu, Jia; Li, Liang; Shao, Wei; Zhan, Shuai; Hao, Jianjiang

    2015-01-01

    Unlike typical cis-splicing, trans-splicing joins exons from two separate transcripts to produce chimeric mRNA and has been detected in most eukaryotes. Trans-splicing in trypanosomes and nematodes has been characterized as a spliced leader RNA-facilitated reaction; in contrast, its mechanism in higher eukaryotes remains unclear. Here we investigate mod(mdg4), a classic trans-spliced gene in Drosophila, and report that two critical RNA sequences in the middle of the last 5′ intron, TSA and TSB, promote trans-splicing of mod(mdg4). In TSA, a 13-nucleotide (nt) core motif is conserved across Drosophila species and is essential and sufficient for trans-splicing, which binds U1 small nuclear RNP (snRNP) through strong base-pairing with U1 snRNA. In TSB, a conserved secondary structure acts as an enhancer. Deletions of TSA and TSB using the CRISPR/Cas9 system result in developmental defects in flies. Although it is not clear how the 5′ intron finds the 3′ introns, compensatory changes in U1 snRNA rescue trans-splicing of TSA mutants, demonstrating that U1 recruitment is critical to promote trans-splicing in vivo. Furthermore, TSA core-like motifs are found in many other trans-spliced Drosophila genes, including lola. These findings represent a novel mechanism of trans-splicing, in which RNA motifs in the 5′ intron are sufficient to bring separate transcripts into close proximity to promote trans-splicing. PMID:25838544

  7. Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas.

    PubMed

    Petrov, Anton I; Zirbel, Craig L; Leontis, Neocles B

    2013-10-01

    The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson-Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules, but can also occur at nonhomologous sites in diverse RNAs, and which often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access.

  8. Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas

    PubMed Central

    Petrov, Anton I.; Zirbel, Craig L.; Leontis, Neocles B.

    2013-01-01

    The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson–Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules, but can also occur at nonhomologous sites in diverse RNAs, and which often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access. PMID:23970545

  9. Gibbs motif sampling: detection of bacterial outer membrane protein repeats.

    PubMed Central

    Neuwald, A. F.; Liu, J. S.; Lawrence, C. E.

    1995-01-01

    The detection and alignment of locally conserved regions (motifs) in multiple sequences can provide insight into protein structure, function, and evolution. A new Gibbs sampling algorithm is described that detects motif-encoding regions in sequences and optimally partitions them into distinct motif models; this is illustrated using a set of immunoglobulin fold proteins. When applied to sequences sharing a single motif, the sampler can be used to classify motif regions into related submodels, as is illustrated using helix-turn-helix DNA-binding proteins. Other statistically based procedures are described for searching a database for sequences matching motifs found by the sampler. When applied to a set of 32 very distantly related bacterial integral outer membrane proteins, the sampler revealed that they share a subtle, repetitive motif. Although BLAST (Altschul SF et al., 1990, J Mol Biol 215:403-410) fails to detect significant pairwise similarity between any of the sequences, the repeats present in these outer membrane proteins, taken as a whole, are highly significant (based on a generally applicable statistical test for motifs described here). Analysis of bacterial porins with known trimeric beta-barrel structure and related proteins reveals a similar repetitive motif corresponding to alternating membrane-spanning beta-strands. These beta-strands occur on the membrane interface (as opposed to the trimeric interface) of the beta-barrel. The broad conservation and structural location of these repeats suggests that they play important functional roles. PMID:8520488
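
    To make the idea of Gibbs sampling for motif detection concrete, here is a minimal sketch of a basic site sampler. It is not the algorithm described in this abstract, which additionally partitions motif regions into distinct models; it assumes one motif occurrence per sequence, a fixed motif width, sequences made up only of the given alphabet and at least `width` long, and it multiplies raw probabilities, so it is only suited to short motifs. All names and default values are hypothetical.

```python
import random
from collections import Counter


def gibbs_site_sampler(seqs, width, iters=200, pseudo=1.0,
                       alphabet="ACGT", seed=0):
    """Return one sampled motif start per sequence (simplified site sampler)."""
    rng = random.Random(seed)
    # Start from a random motif position in every sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    # Background letter frequencies estimated from all sequences.
    bg_counts = Counter("".join(seqs))
    bg_total = sum(bg_counts.values()) + pseudo * len(alphabet)
    bg = {a: (bg_counts[a] + pseudo) / bg_total for a in alphabet}
    for _ in range(iters):
        for i, seq in enumerate(seqs):
            # Build position-specific counts from every sequence except i.
            counts = [Counter() for _ in range(width)]
            for j, other in enumerate(seqs):
                if j != i:
                    for p in range(width):
                        counts[p][other[starts[j] + p]] += 1
            denom = (len(seqs) - 1) + pseudo * len(alphabet)
            # Weight each candidate start by its motif/background ratio.
            weights = []
            for s in range(len(seq) - width + 1):
                w = 1.0
                for p in range(width):
                    a = seq[s + p]
                    w *= ((counts[p][a] + pseudo) / denom) / bg[a]
                weights.append(w)
            # Resample the start for sequence i in proportion to the weights.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


if __name__ == "__main__":
    toy = ["ACGTACGTTTGACAGG", "TTGACATTACGCGCGC", "GGCCTTGACAACGTAC"]
    print(gibbs_site_sampler(toy, width=6, iters=100, seed=1))
```

    Because the procedure is stochastic, different seeds can settle on different alignments; practical samplers typically run several restarts and keep the alignment with the best score.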

  10. Defining a Conformational Consensus Motif in Cotransin-Sensitive Signal Sequences: A Proteomic and Site-Directed Mutagenesis Study

    PubMed Central

    Klein, Wolfgang; Westendorf, Carolin; Schmidt, Antje; Conill-Cortés, Mercè; Rutz, Claudia; Blohs, Marcus; Beyermann, Michael; Protze, Jonas; Krause, Gerd; Krause, Eberhard; Schülein, Ralf

    2015-01-01

    The cyclodepsipeptide cotransin was described to inhibit the biosynthesis of a small subset of proteins by a signal sequence-discriminatory mechanism at the Sec61 protein-conducting channel. However, it was not clear how selective cotransin is, i.e. how many proteins are sensitive. Moreover, a consensus motif in signal sequences mediating cotransin sensitivity has not yet been described. To address these questions, we performed a proteomic study using cotransin-treated human hepatocellular carcinoma cells and the stable isotope labelling by amino acids in cell culture technique in combination with quantitative mass spectrometry. We used a saturating concentration of cotransin (30 micromolar) to also identify less-sensitive proteins and to discriminate the latter from completely resistant proteins. We found that the biosynthesis of almost all secreted proteins was cotransin-sensitive under these conditions. In contrast, biosynthesis of the majority of the integral membrane proteins was cotransin-resistant. Cotransin sensitivity of signal sequences was neither related to their length nor to their hydrophobicity. Instead, in the case of signal anchor sequences, we identified for the first time a conformational consensus motif mediating cotransin sensitivity. PMID:25806945

  11. Defining a conformational consensus motif in cotransin-sensitive signal sequences: a proteomic and site-directed mutagenesis study.

    PubMed

    Klein, Wolfgang; Westendorf, Carolin; Schmidt, Antje; Conill-Cortés, Mercè; Rutz, Claudia; Blohs, Marcus; Beyermann, Michael; Protze, Jonas; Krause, Gerd; Krause, Eberhard; Schülein, Ralf

    2015-01-01

    The cyclodepsipeptide cotransin was described to inhibit the biosynthesis of a small subset of proteins by a signal sequence-discriminatory mechanism at the Sec61 protein-conducting channel. However, it was not clear how selective cotransin is, i.e. how many proteins are sensitive. Moreover, a consensus motif in signal sequences mediating cotransin sensitivity has not yet been described. To address these questions, we performed a proteomic study using cotransin-treated human hepatocellular carcinoma cells and the stable isotope labelling by amino acids in cell culture technique in combination with quantitative mass spectrometry. We used a saturating concentration of cotransin (30 micromolar) to also identify less-sensitive proteins and to discriminate the latter from completely resistant proteins. We found that the biosynthesis of almost all secreted proteins was cotransin-sensitive under these conditions. In contrast, biosynthesis of the majority of the integral membrane proteins was cotransin-resistant. Cotransin sensitivity of signal sequences was neither related to their length nor to their hydrophobicity. Instead, in the case of signal anchor sequences, we identified for the first time a conformational consensus motif mediating cotransin sensitivity.

  12. Drosophila melanogaster Hox transcription factors access the RNA polymerase II machinery through direct homeodomain binding to a conserved motif of mediator subunit Med19.

    PubMed

    Boube, Muriel; Hudry, Bruno; Immarigeon, Clément; Carrier, Yannick; Bernat-Fabre, Sandra; Merabet, Samir; Graba, Yacine; Bourbon, Henri-Marc; Cribbs, David L

    2014-05-01

    Hox genes in species across the metazoa encode transcription factors (TFs) containing highly-conserved homeodomains that bind target DNA sequences to regulate batteries of developmental target genes. DNA-bound Hox proteins, together with other TF partners, induce an appropriate transcriptional response by RNA Polymerase II (PolII) and its associated general transcription factors. How the evolutionarily conserved Hox TFs interface with this general machinery to generate finely regulated transcriptional responses remains obscure. One major component of the PolII machinery, the Mediator (MED) transcription complex, is composed of roughly 30 protein subunits organized in modules that bridge the PolII enzyme to DNA-bound TFs. Here, we investigate the physical and functional interplay between Drosophila melanogaster Hox developmental TFs and MED complex proteins. We find that the Med19 subunit directly binds Hox homeodomains, in vitro and in vivo. Loss-of-function Med19 mutations act as dose-sensitive genetic modifiers that synergistically modulate Hox-directed developmental outcomes. Using clonal analysis, we identify a role for Med19 in Hox-dependent target gene activation. We identify a conserved, animal-specific motif that is required for Med19 homeodomain binding, and for activation of a specific Ultrabithorax target. These results provide the first direct molecular link between Hox homeodomain proteins and the general PolII machinery. They support a role for Med19 as a PolII holoenzyme-embedded "co-factor" that acts together with Hox proteins through their homeodomains in regulated developmental transcription.

  13. A conserved MADS-box phosphorylation motif regulates differentiation and mitochondrial function in skeletal, cardiac, and smooth muscle cells.

    PubMed

    Mughal, W; Nguyen, L; Pustylnik, S; da Silva Rosa, S C; Piotrowski, S; Chapman, D; Du, M; Alli, N S; Grigull, J; Halayko, A J; Aliani, M; Topham, M K; Epand, R M; Hatch, G M; Pereira, T J; Kereliuk, S; McDermott, J C; Rampitsch, C; Dolinsky, V W; Gordon, J W

    2015-01-01

    Exposure to metabolic disease during fetal development alters cellular differentiation and perturbs metabolic homeostasis, but the underlying molecular regulators of this phenomenon in muscle cells are not completely understood. To address this, we undertook a computational approach to identify cooperating partners of the myocyte enhancer factor-2 (MEF2) family of transcription factors, known regulators of muscle differentiation and metabolic function. We demonstrate that MEF2 and the serum response factor (SRF) collaboratively regulate the expression of numerous muscle-specific genes, including microRNA-133a (miR-133a). Using tandem mass spectrometry techniques, we identify a conserved phosphorylation motif within the MEF2 and SRF Mcm1 Agamous Deficiens SRF (MADS)-box that regulates miR-133a expression and mitochondrial function in response to a lipotoxic signal. Furthermore, reconstitution of MEF2 function by expression of a neutralizing mutation in this identified phosphorylation motif restores miR-133a expression and mitochondrial membrane potential during lipotoxicity. Mechanistically, we demonstrate that miR-133a regulates mitochondrial function through translational inhibition of a mitophagy and cell death modulating protein, called Nix. Finally, we show that rodents exposed to gestational diabetes during fetal development display muscle diacylglycerol accumulation, concurrent with insulin resistance, reduced miR-133a, and elevated Nix expression, as young adult rats. Given the diverse roles of miR-133a and Nix in regulating mitochondrial function, and proliferation in certain cancers, dysregulation of this genetic pathway may have broad implications involving insulin resistance, cardiovascular disease, and cancer biology. PMID:26512955

  14. Conserved structural motifs in the central pair complex of eukaryotic flagella.

    PubMed

    Carbajal-González, Blanca I; Heuser, Thomas; Fu, Xiaofeng; Lin, Jianfeng; Smith, Brandon W; Mitchell, David R; Nicastro, Daniela

    2013-02-01

    Cilia and flagella are conserved hair-like appendages of eukaryotic cells that function as sensing and motility generating organelles. Motility is driven by thousands of axonemal dyneins that require precise regulation. One essential motility regulator is the central pair complex (CPC) and many CPC defects cause paralysis of cilia/flagella. Several human diseases, such as immotile cilia syndrome, show CPC abnormalities, but little is known about the detailed three-dimensional (3D) structure and function of the CPC. The CPC is located in the center of typical [9+2] cilia/flagella and is composed of two singlet microtubules (MTs), each with a set of associated projections that extend toward the surrounding nine doublet MTs. Using cryo-electron tomography coupled with subtomogram averaging, we visualized and compared the 3D structures of the CPC in both the green alga Chlamydomonas and the sea urchin Strongylocentrotus at the highest resolution published to date. Despite the evolutionary distance between these species, their CPCs exhibit remarkable structural conservation. We identified several new projections, including those that form the elusive sheath, and show that the bridge has a more complex architecture than previously thought. Organism-specific differences include the presence of MT inner proteins in Chlamydomonas, but not Strongylocentrotus, and different overall outlines of the highly connected projection network, which forms a round-shaped cylinder in algae, but is more oval in sea urchin. These differences could be adaptations to the mechanical requirements of the rotating CPC in Chlamydomonas, compared to the Strongylocentrotus CPC which has a fixed orientation. PMID:23281266

  16. A conserved Glu-Arg salt bridge connects coevolved motifs that define the eukaryotic protein kinase fold.

    PubMed

    Yang, Jie; Wu, Jian; Steichen, Jon M; Kornev, Alexandr P; Deal, Michael S; Li, Sheng; Sankaran, Banumathi; Woods, Virgil L; Taylor, Susan S

    2012-01-27

    Eukaryotic protein kinases (EPKs) feature two coevolved structural segments, the Activation segment, which starts with the Asp-Phe-Gly (DFG) and ends with the Ala-Pro-Glu (APE) motifs, and the helical GHI subdomain that comprises αG-αH-αI helices. Eukaryotic-like kinases have a much shorter Activation segment and lack the GHI subdomain. They thus lack the conserved salt bridge interaction between the APE Glu and an Arg from the GHI subdomain, a hallmark signature of EPKs. Although the conservation of this salt bridge in EPKs is well known and its implication in diseases has been illustrated by polymorphism analysis, its function has not been carefully studied. In this work, we use murine cAMP-dependent protein kinase (protein kinase A) as the model enzyme (Glu208 and Arg280) to examine the role of these two residues. We showed that Ala replacement of either residue caused a 40- to 120-fold decrease in catalytic efficiency of the enzyme due to an increase in K(m)(ATP) and a decrease in k(cat). Crystal structures, as well as solution studies, also demonstrate that this ion pair contributes to the hydrophobic network and stability of the enzyme. We show that mutation of either Glu or Arg to Ala renders both mutant proteins less effective substrates for upstream kinase phosphoinositide-dependent kinase 1. We propose that the Glu208-Arg280 pair serves as a center hub of connectivity between these two structurally conserved elements in EPKs. Mutations of either residue disrupt communication not only between the two segments but also within the rest of the molecule, leading to altered catalytic activity and enzyme regulation.

  17. Sequence-Specific Recognition of DNA by Proteins: Binding Motifs Discovered Using a Novel Statistical/Computational Analysis

    PubMed Central

    Jakubec, David; Laskowski, Roman A.; Vondrasek, Jiri

    2016-01-01

    Decades of intensive experimental studies of the recognition of DNA sequences by proteins have provided us with a view of a diverse and complicated world in which few to no features are shared between individual DNA-binding protein families. The originally conceived direct readout of DNA residue sequences by amino acid side chains offers very limited capacity for sequence recognition, while the effects of the dynamic properties of the interacting partners remain difficult to quantify and almost impossible to generalise. In this work we investigated the energetic characteristics of all DNA residue—amino acid side chain combinations in the conformations found at the interaction interface in a very large set of protein—DNA complexes by the means of empirical potential-based calculations. General specificity-defining criteria were derived and utilised to look beyond the binding motifs considered in previous studies. Linking energetic favourability to the observed geometrical preferences, our approach reveals several additional amino acid motifs which can distinguish between individual DNA bases. Our results remained valid in environments with various dielectric properties. PMID:27384774

  18. Sequence-Specific Recognition of DNA by Proteins: Binding Motifs Discovered Using a Novel Statistical/Computational Analysis.

    PubMed

    Jakubec, David; Laskowski, Roman A; Vondrasek, Jiri

    2016-01-01

    Decades of intensive experimental studies of the recognition of DNA sequences by proteins have provided us with a view of a diverse and complicated world in which few to no features are shared between individual DNA-binding protein families. The originally conceived direct readout of DNA residue sequences by amino acid side chains offers very limited capacity for sequence recognition, while the effects of the dynamic properties of the interacting partners remain difficult to quantify and almost impossible to generalise. In this work we investigated the energetic characteristics of all DNA residue-amino acid side chain combinations in the conformations found at the interaction interface in a very large set of protein-DNA complexes by the means of empirical potential-based calculations. General specificity-defining criteria were derived and utilised to look beyond the binding motifs considered in previous studies. Linking energetic favourability to the observed geometrical preferences, our approach reveals several additional amino acid motifs which can distinguish between individual DNA bases. Our results remained valid in environments with various dielectric properties. PMID:27384774

  19. Sepsid even-skipped Enhancers Are Functionally Conserved in Drosophila Despite Lack of Sequence Conservation

    PubMed Central

    Iyer, Venky N.; Meier, Rudolf; Eisen, Michael B.

    2008-01-01

    The gene expression pattern specified by an animal regulatory sequence is generally viewed as arising from the particular arrangement of transcription factor binding sites it contains. However, we demonstrate here that regulatory sequences whose binding sites have been almost completely rearranged can still produce identical outputs. We sequenced the even-skipped locus from six species of scavenger flies (Sepsidae) that are highly diverged from the model species Drosophila melanogaster, but share its basic patterns of developmental gene expression. Although there is little sequence similarity between the sepsid eve enhancers and their well-characterized D. melanogaster counterparts, the sepsid and Drosophila enhancers drive nearly identical expression patterns in transgenic D. melanogaster embryos. We conclude that the molecular machinery that connects regulatory sequences to the transcription apparatus is more flexible than previously appreciated. In exploring this diverse collection of sequences to identify the shared features that account for their similar functions, we found a small number of short (20–30 bp) sequences nearly perfectly conserved among the species. These highly conserved sequences are strongly enriched for pairs of overlapping or adjacent binding sites. Together, these observations suggest that the local arrangement of binding sites relative to each other is more important than their overall arrangement into larger units of cis-regulatory function. PMID:18584029

  20. A Glance at Microsatellite Motifs from 454 Sequencing Reads of Watermelon Genomic DNA

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A single 454 (Life Sciences Sequencing Technology) run of Charleston Gray watermelon (Citrullus lanatus var. lanatus) genomic DNA was performed and sequence data were assembled. A large scale identification of simple sequence repeat (SSR) was performed and SSR sequence data were used for the develo...
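
    The abstract above is truncated and does not say which tool was used to identify the SSRs. As a minimal sketch only, one common way to locate simple sequence repeats is a back-referencing regular expression; the function name `find_ssrs`, the unit-length range of 2 to 6 bases, and the minimum of four copies are illustrative assumptions, not details from the record.

```python
import re

# Hypothetical thresholds: repeat units of 2-6 bases, at least 4 tandem copies.
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{3,}")


def find_ssrs(sequence: str) -> list[tuple[str, int, int]]:
    """Return (repeat_unit, start_index, copy_number) for each SSR found."""
    hits = []
    for match in SSR_PATTERN.finditer(sequence.upper()):
        unit = match.group(1)
        # Skip homopolymer runs such as AAAAAAAA reported as "AA" x 4.
        if len(set(unit)) == 1:
            continue
        copies = len(match.group(0)) // len(unit)
        hits.append((unit, match.start(), copies))
    return hits


# Example on a made-up read containing an (AC)5 repeat.
print(find_ssrs("TTGACACACACACGGT"))
```

    The lazy quantifier makes the regex prefer the shortest repeat unit at a given start position, so an (AC)n tract is reported with unit AC rather than ACAC.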

  1. Epsilon glutathione transferases possess a unique class-conserved subunit interface motif that directly interacts with glutathione in the active site.

    PubMed

    Wongsantichon, Jantana; Robinson, Robert C; Ketterman, Albert J

    2015-10-20

    Epsilon class glutathione transferases (GSTs) have been shown to contribute significantly to insecticide resistance. We report a new Epsilon class protein crystal structure from Drosophila melanogaster for the glutathione transferase DmGSTE6. The structure reveals a novel Epsilon clasp motif that is conserved across hundreds of millions of years of evolution of the insect Diptera order. This histidine-serine motif lies in the subunit interface and appears to contribute to quaternary stability as well as directly connecting the two glutathiones in the active sites of this dimeric enzyme.

  2. Epsilon glutathione transferases possess a unique class-conserved subunit interface motif that directly interacts with glutathione in the active site

    PubMed Central

    Wongsantichon, Jantana; Robinson, Robert C.; Ketterman, Albert J.

    2015-01-01

    Epsilon class glutathione transferases (GSTs) have been shown to contribute significantly to insecticide resistance. We report a new Epsilon class protein crystal structure from Drosophila melanogaster for the glutathione transferase DmGSTE6. The structure reveals a novel Epsilon clasp motif that is conserved across hundreds of millions of years of evolution of the insect Diptera order. This histidine-serine motif lies in the subunit interface and appears to contribute to quaternary stability as well as directly connecting the two glutathiones in the active sites of this dimeric enzyme. PMID:26487708

  3. Properties of Sequence Conservation in Upstream Regulatory and Protein Coding Sequences among Paralogs in Arabidopsis thaliana

    NASA Astrophysics Data System (ADS)

    Richardson, Dale N.; Wiehe, Thomas

    Whole genome duplication (WGD) has catalyzed the formation of new species, genes with novel functions, altered expression patterns, complexified signaling pathways and has provided organisms a level of genetic robustness. We studied the long-term evolution and interrelationships of 5’ upstream regulatory sequences (URSs), protein coding sequences (CDSs) and expression correlations (EC) of duplicated gene pairs in Arabidopsis. Three distinct methods revealed significant evolutionary conservation between paralogous URSs and were highly correlated with microarray-based expression correlation of the respective gene pairs. Positional information on exact matches between sequences unveiled the contribution of micro-chromosomal rearrangements on expression divergence. A three-way rank analysis of URS similarity, CDS divergence and EC uncovered specific gene functional biases. Transcription factor activity was associated with gene pairs exhibiting conserved URSs and divergent CDSs, whereas a broad array of metabolic enzymes was found to be associated with gene pairs showing diverged URSs but conserved CDSs.

  4. Automated conserved non-coding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses

    PubMed Central

    Turco, Gina; Schnable, James C.; Pedersen, Brent; Freeling, Michael

    2013-01-01

    Conserved non-coding sequences (CNS) are islands of non-coding sequence that, like protein coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNS in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of arabidopsis, Japonica rice, foxtail millet, sorghum, brachypodium, and maize. PMID:23874343

  5. Sturgeon conservation genomics: SNP discovery and validation using RAD sequencing.

    PubMed

    Ogden, R; Gharbi, K; Mugue, N; Martinsohn, J; Senn, H; Davey, J W; Pourkazemi, M; McEwing, R; Eland, C; Vidotto, M; Sergeev, A; Congiu, L

    2013-06-01

    Caviar-producing sturgeons belonging to the genus Acipenser are considered to be one of the most endangered species groups in the world. Continued overfishing in spite of increasing legislation, zero catch quotas and extensive aquaculture production have led to the collapse of wild stocks across Europe and Asia. The evolutionary relationships among Adriatic, Russian, Persian and Siberian sturgeons are complex because of past introgression events and remain poorly understood. Conservation management, traceability and enforcement suffer a lack of appropriate DNA markers for the genetic identification of sturgeon at the species, population and individual level. This study employed RAD sequencing to discover and characterize single nucleotide polymorphism (SNP) DNA markers for use in sturgeon conservation in these four tetraploid species over three biological levels, using a single sequencing lane. Four population meta-samples and eight individual samples from one family were barcoded separately before sequencing. Analysis of 14.4 Gb of paired-end RAD data focused on the identification of SNPs in the paired-end contig, with subsequent in silico and empirical validation of candidate markers. Thousands of putatively informative markers were identified including, for the first time, SNPs that show population-wide differentiation between Russian and Persian sturgeons, representing an important advance in our ability to manage these cryptic species. The results highlight the challenges of genotyping-by-sequencing in polyploid taxa, while establishing the potential genetic resources for developing a new range of caviar traceability and enforcement tools. PMID:23473098

  6. Conservation patterns in different functional sequence categories of divergent Drosophila species

    SciTech Connect

    Papatsenko, Dmitri; Kislyuk, Andrey; Levine, Michael; Dubchak, Inna

    2005-10-01

    We have explored the distributions of fully conserved ungapped blocks in genome-wide pairwise alignments of recently completed species of Drosophila: D. yakuba, D. ananassae, D. pseudoobscura, D. virilis and D. mojavensis. Based on these distributions we have found that nearly every functional sequence category possesses its own distinctive conservation pattern, sometimes independent of the overall sequence conservation level. In the coding and regulatory regions, the ungapped blocks were longer than in introns, UTRs and non-functional sequences. At the same time, the blocks in the coding regions carried 3N+2 signature characteristic to synonymic substitutions in the 3rd codon positions. Larger block sizes in transcription regulatory regions can be explained by the presence of conserved arrays of binding sites for transcription factors. We also have shown that the longest ungapped blocks, or 'ultraconserved' sequences, are associated with specific gene groups, including those encoding ion channels and components of the cytoskeleton. We discussed how restrained conservation patterns may help in mapping functional sequence categories and improving genome annotation.
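
    The block-length distributions described above can be illustrated with a short sketch. It assumes a pairwise alignment given as two equal-length strings with `-` for gaps; the function name `conserved_block_lengths` and the toy alignment are hypothetical and are not taken from the study.

```python
from collections import Counter


def conserved_block_lengths(seq_a: str, seq_b: str) -> list[int]:
    """Lengths of maximal runs of identical, ungapped alignment columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    lengths = []
    run = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b and a != "-":
            run += 1          # column is conserved and ungapped
        else:
            if run:
                lengths.append(run)
            run = 0           # a mismatch or gap ends the block
    if run:
        lengths.append(run)
    return lengths


# Toy alignment (made up): one mismatch and one gap split three blocks.
blocks = conserved_block_lengths("ATGCC-GTTAGC", "ATGACAGTTAGC")
print(blocks)             # block lengths in alignment order
print(Counter(blocks))    # distribution of block lengths
```

    Under this sketch, the 3N+2 signature mentioned in the abstract would show up as an excess of block lengths such as 2, 5, 8 and 11 in the histogram for coding regions.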

  8. Conservation patterns in angiosperm rDNA ITS2 sequences.

    PubMed Central

    Hershkovitz, M A; Zimmer, E A

    1996-01-01

    The two internal transcribed spacers (ITS1 and ITS2) of nuclear ribosomal DNA have become commonly exploited sources of informative variation for interspecific-/intergeneric-level phylogenetic analyses among angiosperms and other eukaryotes. We present an alignment in which one-third to one-half of the ITS2 sequence is alignable above the family level in angiosperms and a phenetic analysis showing that ITS2 contains information sufficient to diagnose lineages at several hierarchical levels. Base compositional analysis shows that angiosperm ITS2 is inherently GC-rich, and that the proportion of T is much more variable than that for other bases. We propose a general model of angiosperm ITS2 secondary structure that shows common pairing relationships for most of the conserved sequence tracts. Variations in our secondary structure predictions for sequences from different taxa indicate that compensatory mutation is not limited to paired positions. PMID:8760866
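
    A base compositional analysis of the kind mentioned above reduces to counting bases per sequence. The following is a minimal sketch; the function names and the choice to ignore gaps and ambiguity codes are assumptions made for illustration, not the authors' procedure.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Fraction of A, C, G and T among the unambiguous bases of seq."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {base: counts[base] / total for base in "ACGT"}


def gc_fraction(seq: str) -> float:
    """Combined proportion of G and C."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]


# Made-up fragment; the gap and N are skipped before computing proportions.
fragment = "GGCCAT-N"
print(base_composition(fragment))
print(gc_fraction(fragment))
```

    Comparing the per-base fractions across many taxa would show which base varies most, which is the comparison the abstract reports for T.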

  9. Conservative Patch Algorithm and Mesh Sequencing for PAB3D

    NASA Technical Reports Server (NTRS)

    Pao, S. P.; Abdol-Hamid, K. S.

    2005-01-01

    A mesh-sequencing algorithm and a conservative patched-grid-interface algorithm (hereafter Patch Algorithm) have been incorporated into the PAB3D code, which is a computer program that solves the Navier-Stokes equations for the simulation of subsonic, transonic, or supersonic flows surrounding an aircraft or other complex aerodynamic shapes. These algorithms are efficient, flexible, and have added tremendously to the capabilities of PAB3D. The mesh-sequencing algorithm makes it possible to perform preliminary computations using only a fraction of the grid cells (provided the original cell count is divisible by an integer) along any grid coordinate axis, independently of the other axes. The patch algorithm addresses another critical need in multi-block grid situations, where the cell faces of adjacent grid blocks may not coincide, leading to errors in calculating fluxes of conserved physical quantities across interfaces between the blocks. The patch algorithm, based on the Stokes integral formulation of the applicable conservation laws, effectively matches each of the interfacial cells on one side of the block interface to the corresponding fractional cell area pieces on the other side. This approach is comprehensive and unified such that all interface topology is automatically processed without user intervention. This algorithm is implemented in a preprocessing code that creates a cell-by-cell database that will maintain flux conservation at any level of full or reduced grid density as the user may choose by way of the mesh-sequencing algorithm. These two algorithms have enhanced the numerical accuracy of the code, reduced the time and effort for grid preprocessing, and provided users with the flexibility of performing computations at any desired full or reduced grid resolution to suit their specific computational requirements.

  10. Fox-2 Splicing Factor Binds to a Conserved Intron Motif to Promote Inclusion of Protein 4.1R Alternative Exon 16

    SciTech Connect

    Ponthier, Julie L.; Schluepen, Christina; Chen, Weiguo; Lersch, Robert A.; Gee, Sherry L.; Hou, Victor C.; Lo, Annie J.; Short, Sarah A.; Chasis, Joel A.; Winkelmann, John C.; Conboy, John G.

    2006-03-01

    Activation of protein 4.1R exon 16 (E16) inclusion during erythropoiesis represents a physiologically important splicing switch that increases 4.1R affinity for spectrin and actin. Previous studies showed that negative regulation of E16 splicing is mediated by the binding of hnRNP A/B proteins to silencer elements in the exon and that downregulation of hnRNP A/B proteins in erythroblasts leads to activation of E16 inclusion. This paper demonstrates that positive regulation of E16 splicing can be mediated by Fox-2 or Fox-1, two closely related splicing factors that possess identical RNA recognition motifs. SELEX experiments with human Fox-1 revealed highly selective binding to the hexamer UGCAUG. Both Fox-1 and Fox-2 were able to bind the conserved UGCAUG elements in the proximal intron downstream of E16, and both could activate E16 splicing in HeLa cell co-transfection assays in a UGCAUG-dependent manner. Conversely, knockdown of Fox-2 expression, achieved with two different siRNA sequences resulted in decreased E16 splicing. Moreover, immunoblot experiments demonstrate mouse erythroblasts express Fox-2, but not Fox-1. These findings suggest that Fox-2 is a physiological activator of E16 splicing in differentiating erythroid cells in vivo. Recent experiments show that UGCAUG is present in the proximal intron sequence of many tissue-specific alternative exons, and we propose that the Fox family of splicing enhancers plays an important role in alternative splicing switches during differentiation in metazoan organisms.
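
    Scanning an intron for the UGCAUG hexamer described above is a plain substring search. The sketch below is illustrative only: the function name `find_hexamer` and the example sequence are hypothetical, and DNA input is converted to RNA by replacing T with U.

```python
def find_hexamer(seq: str, motif: str = "UGCAUG") -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) motif hit."""
    rna = seq.upper().replace("T", "U")   # accept DNA or RNA input
    positions = []
    start = rna.find(motif)
    while start != -1:
        positions.append(start)
        # Resume one base later so overlapping occurrences are not missed.
        start = rna.find(motif, start + 1)
    return positions


# Made-up intron fragment with three copies, two of them overlapping.
print(find_hexamer("aaTGCATGccUGCAUGCAUG"))
```

    Applied to the proximal intron downstream of an alternative exon, the returned positions indicate candidate Fox-binding elements; whether any of them is functional still requires the kind of experimental test reported in the abstract.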

  11. Conserved Ser/Arg-rich Motif in PPZ Orthologs from Fungi Is Important for Its Role in Cation Tolerance

    PubMed Central

    Minhas, Anupriya; Sharma, Anupam; Kaur, Harsimran; Rawal, Yashpal; Ganesan, Kaliannan; Mondal, Alok K.

    2012-01-01

    PPZ1 orthologs, novel members of the phosphoprotein phosphatase family, are found only in fungi. They regulate diverse physiological processes in fungi, e.g., ion homeostasis, cell size, and cell integrity. Although they are an important determinant of salt tolerance in fungi, their physiological role has remained unexplored in any halotolerant species. In this context, we report here the molecular and functional characterization of DhPPZ1 from Debaryomyces hansenii, which is one of the most halotolerant and osmotolerant species of yeast. Our results showed that the DhPPZ1 knock-out strain displayed higher tolerance to toxic cations and that, unlike in Saccharomyces cerevisiae, the Na+/H+ antiporter appeared to have an important role in this process. Besides salt tolerance, DhPPZ1 also had a role in cell wall integrity and growth in D. hansenii. We have also identified a short, serine-arginine-rich sequence motif in DhPpz1p that is essential for its role in salt tolerance but not in other physiological processes. Taken together, these results underscore a distinct role of DhPpz1p in D. hansenii and illustrate an example of how organisms utilize the same molecular tool box differently to garner adaptive fitness for their respective ecological niches. PMID:22232558

  12. In Vivo Enhancer Analysis Chromosome 16 Conserved Noncoding Sequences

    SciTech Connect

    Pennacchio, Len A.; Ahituv, Nadav; Moses, Alan M.; Nobrega, Marcelo; Prabhakar, Shyam; Shoukry, Malak; Minovitsky, Simon; Visel, Axel; Dubchak, Inna; Holt, Amy; Lewis, Keith D.; Plajzer-Frick, Ingrid; Akiyama, Jennifer; De Val, Sarah; Afzal, Veena; Black, Brian L.; Couronne, Olivier; Eisen, Michael B.; Rubin, Edward M.

    2006-02-01

    The identification of enhancers with predicted specificities in vertebrate genomes remains a significant challenge that is hampered by a lack of experimentally validated training sets. In this study, we leveraged extreme evolutionary sequence conservation as a filter to identify putative gene regulatory elements and characterized the in vivo enhancer activity of human-fish conserved and ultraconserved noncoding elements on human chromosome 16 as well as such elements from elsewhere in the genome. We initially tested 165 of these extremely conserved sequences in a transgenic mouse enhancer assay and observed that 48 percent (79/165) functioned reproducibly as tissue-specific enhancers of gene expression at embryonic day 11.5. While driving expression in a broad range of anatomical structures in the embryo, the majority of the 79 enhancers drove expression in various regions of the developing nervous system. Studying a set of DNA elements that specifically drove forebrain expression, we identified DNA signatures specifically enriched in these elements and used these parameters to rank all ~3,400 human-fugu conserved noncoding elements in the human genome. The testing of the top predictions in transgenic mice resulted in a three-fold enrichment for sequences with forebrain enhancer activity. These data dramatically expand the catalogue of in vivo-characterized human gene enhancers and illustrate the future utility of such training sets for a variety of biological applications including decoding the regulatory vocabulary of the human genome.

  13. Rapid characterization of CRISPR-Cas9 protospacer adjacent motif sequence elements.

    PubMed

    Karvelis, Tautvydas; Gasiunas, Giedrius; Young, Joshua; Bigelyte, Greta; Silanskas, Arunas; Cigan, Mark; Siksnys, Virginijus

    2015-01-01

    To expand the repertoire of Cas9s available for genome targeting, we present a new in vitro method for the simultaneous examination of guide RNA and protospacer adjacent motif (PAM) requirements. The method relies on the in vitro cleavage of plasmid libraries containing a randomized PAM as a function of Cas9-guide RNA complex concentration. Using this method, we accurately reproduce the canonical PAM preferences for Streptococcus pyogenes, Streptococcus thermophilus CRISPR3 (Sth3), and CRISPR1 (Sth1). Additionally, PAM and sgRNA solutions for a novel Cas9 protein from Brevibacillus laterosporus are provided by the assay and are demonstrated to support functional activity in vitro and in plants. PMID:26585795

  14. Rice MEL2, the RNA recognition motif (RRM) protein, binds in vitro to meiosis-expressed genes containing U-rich RNA consensus sequences in the 3'-UTR.

    PubMed

    Miyazaki, Saori; Sato, Yutaka; Asano, Tomoya; Nagamura, Yoshiaki; Nonomura, Ken-Ichi

    2015-10-01

    Post-transcriptional gene regulation by RNA recognition motif (RRM) proteins through binding to cis-elements in the 3'-untranslated region (3'-UTR) is widely used in eukaryotes to complete various biological processes. Rice MEIOSIS ARRESTED AT LEPTOTENE2 (MEL2) is an RRM protein that functions in the properly timed transition to meiosis. In vitro, the MEL2 RRM preferentially associated with the U-rich RNA consensus UUAGUU[U/A][U/G][A/U/G]U in a sequence-dependent manner and in proportion to the amount of MEL2 protein. The consensus sequences were located in the putative looped structures of the RNA ligand. A genome-wide survey revealed a tendency for the MEL2-binding consensus to appear in the 3'-UTR of rice genes. Of 249 genes that conserved the consensus in their 3'-UTR, 13 genes were spatiotemporally co-expressed with MEL2 in meiotic flowers and included several genes with a presumed function in meiosis, such as Replication protein A and OsMADS3. The proteome analysis revealed that the amounts of small ubiquitin-related modifier-like protein and eukaryotic translation initiation factor3-like protein were dramatically altered in mel2 mutant anthers. Taken together with transcriptome and gene ontology results, we propose that the rice MEL2 is involved in the translational regulation of key meiotic genes on 3'-UTRs to achieve the faithful transition of germ cells to meiosis.
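
    The bracketed consensus quoted in this abstract maps directly onto a regular expression. The sketch below is only an illustration of that mapping, not the authors' genome-wide survey method; the helper names and the example 3'-UTR fragments are hypothetical.

```python
import re


def consensus_to_regex(consensus):
    """Turn a consensus such as 'UUAGUU[U/A][U/G][A/U/G]U' into a compiled regex.

    Each '[X/Y]' group already has character-class brackets, so removing
    the slashes is enough to obtain '[XY]'.
    """
    return re.compile(consensus.replace("/", ""))


MEL2_CONSENSUS = consensus_to_regex("UUAGUU[U/A][U/G][A/U/G]U")


def has_mel2_site(utr):
    """Report whether a 3'-UTR sequence (RNA or DNA letters) contains the consensus."""
    rna = utr.upper().replace("T", "U")
    return MEL2_CONSENSUS.search(rna) is not None


# Hypothetical 3'-UTR fragments: the first carries a site, the second does not.
print(has_mel2_site("CCUUAGUUUUAUGG"), has_mel2_site("CCUUAGUUCUAUGG"))
```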

  15. Armadillo motifs involved in vesicular transport.

    PubMed

    Striegl, Harald; Andrade-Navarro, Miguel A; Heinemann, Udo

    2010-02-01

    Armadillo (ARM) repeat proteins function in various cellular processes including vesicular transport and membrane tethering. They contain an imperfect repeating sequence motif that forms a conserved three-dimensional structure. Recently, structural and functional insight into tethering mediated by the ARM-repeat protein p115 has been provided. Here we describe the p115 ARM-motifs for reasons of clarity and nomenclature and show that both sequence and structure are highly conserved among ARM-repeat proteins. We argue that there is no need to invoke repeat types other than ARM repeats for a proper description of the structure of the p115 globular head region. Additionally, we propose to define a new subfamily of ARM-like proteins and show lack of evidence that the ARM motifs found in p115 are present in other long coiled-coil tethering factors of the golgin family.

  16. The BEN domain is a novel sequence-specific DNA-binding domain conserved in neural transcriptional repressors

    PubMed Central

    Dai, Qi; Ren, Aiming; Westholm, Jakub O.; Serganov, Artem A.; Patel, Dinshaw J.; Lai, Eric C.

    2013-01-01

    We recently reported that Drosophila Insensitive (Insv) promotes sensory organ development and has activity as a nuclear corepressor for the Notch transcription factor Suppressor of Hairless [Su(H)]. Insv lacks domains of known biochemical function but contains a single BEN domain (i.e., a “BEN-solo” protein). Our chromatin immunoprecipitation (ChIP) sequencing (ChIP-seq) analysis confirmed binding of Insensitive to Su(H) target genes in the Enhancer of split gene complex [E(spl)-C]; however, de novo motif analysis revealed a novel site strongly enriched in Insv peaks (TCYAATHRGAA). We validate binding of endogenous Insv to genomic regions bearing such sites, whose associated genes are enriched for neural functions and are functionally repressed by Insv. Unexpectedly, we found that the Insv BEN domain binds specifically to this sequence motif and that Insv directly regulates transcription via this motif. We determined the crystal structure of the BEN–DNA target complex, revealing homodimeric binding of the BEN domain and extensive nucleotide contacts via α helices and a C-terminal loop. Point mutations in key DNA-contacting residues severely impair DNA binding in vitro and capacity for transcriptional regulation in vivo. We further demonstrate DNA-binding and repression activities by the mammalian neural BEN-solo protein BEND5. Altogether, we define novel DNA-binding activity in a conserved family of transcriptional repressors, opening a molecular window on this extensive gene family. PMID:23468431
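
    The site TCYAATHRGAA is written with IUPAC nucleotide codes (Y = C or T, H = A, C or T, R = A or G). A minimal sketch of expanding such a motif into a regular expression follows; the function name and test sequences are hypothetical, only the given strand is scanned, and this is not the de novo motif analysis used in the study.

```python
import re

# Standard IUPAC nucleotide ambiguity codes for DNA.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Expand an IUPAC-coded DNA motif into an equivalent regex pattern string."""
    return "".join(IUPAC[base] for base in motif.upper())


INSV_SITE = re.compile(iupac_to_regex("TCYAATHRGAA"))

# Hypothetical sequences: the first is one concrete instance of the motif.
print(bool(INSV_SITE.search("GGTCCAATTAGAACC")), bool(INSV_SITE.search("GGTCCAATGAGAACC")))
```

    Scanning genomic DNA in practice would also require searching the reverse complement, which is omitted here for brevity.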

  17. Co-conservation of rRNA tetraloop sequences and helix length suggests involvement of the tetraloops in higher-order interactions

    NASA Technical Reports Server (NTRS)

    Hedenstierna, K. O.; Siefert, J. L.; Fox, G. E.; Murgola, E. J.

    2000-01-01

    Terminal loops containing four nucleotides (tetraloops) are common in structural RNAs, and they frequently conform to one of three sequence motifs, GNRA, UNCG, or CUUG. Here we compare available sequences and secondary structures for rRNAs from bacteria, and we show that helices capped by phylogenetically conserved GNRA loops display a strong tendency to be of conserved length. The simplest interpretation of this correlation is that the conserved GNRA loops are involved in higher-order interactions, intramolecular or intermolecular, resulting in a selective pressure for maintaining the lengths of these helices. A small number of conserved UNCG loops were also found to be associated with conserved length helices, consistent with the possibility that this type of tetraloop also takes part in higher-order interactions.
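
    The three tetraloop families named above can be told apart by simple pattern matching, with N standing for any nucleotide and R for a purine (A or G). The sketch below is illustrative only; the function name is hypothetical and it does not reproduce the authors' comparative analysis.

```python
import re

# N = any nucleotide, R = purine (A or G).
TETRALOOP_FAMILIES = {
    "GNRA": re.compile(r"G[ACGU][AG]A"),
    "UNCG": re.compile(r"U[ACGU]CG"),
    "CUUG": re.compile(r"CUUG"),
}


def classify_tetraloop(loop):
    """Return the family name for a four-nucleotide loop, or None if it fits none."""
    loop = loop.upper().replace("T", "U")
    if len(loop) != 4:
        raise ValueError("a tetraloop has exactly four nucleotides")
    for family, pattern in TETRALOOP_FAMILIES.items():
        if pattern.fullmatch(loop):
            return family
    return None


for loop in ("GAAA", "UUCG", "CUUG", "GCUA"):
    print(loop, classify_tetraloop(loop))
```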

  18. Phenotypic consequences of mutations in the conserved motifs of the putative helicase domain of the human Cockayne syndrome group B gene.

    PubMed

    Muftuoglu, Meltem; Selzer, Rebecca; Tuo, Jingsheng; Brosh, Robert M; Bohr, Vilhelm A

    2002-01-23

    Cockayne syndrome (CS) is a human genetic disorder characterized by several neurological and developmental abnormalities. Two genetic complementation groups, CS-A and CS-B, have been identified. The CSB protein belongs to helicase superfamily 2, and to the SWI/SNF family of proteins. The CSB protein is implicated in transcription-coupled repair (TCR), basal transcription and chromatin remodeling. In addition, CS cells undergo UV-induced apoptosis at much lower doses than normal cells. However, the molecular function of the CSB protein in these biological pathways has remained unclear. Evidence indicates that the integrity of the Walker A and B boxes (motifs I and II) is important for CSB function, but the functional significance of the helicase motifs Ia, III--IV has not been previously examined. In this study, single amino acid changes in highly conserved residues of helicase motifs Ia, III, V, VI and a second putative nucleotide-binding motif (NTB) of the CSB protein were generated by site-directed mutagenesis to analyze the genetic function of the CSB protein in survival, RNA synthesis recovery and apoptosis after UV treatment. The survival analysis of these CS-B mutant cell lines was also performed after treatment with the chemical carcinogen, 4-nitroquinoline-1-oxide (4-NQO). The lesions induced by UV light, cyclobutane pyrimidine dimers, are known to be repaired by TCR whereas the lesions induced by 4-NQO are repaired by global genome repair. The results of this study demonstrate that the point mutations in highly conserved residues of helicase motifs Ia, III, V and VI abolished the genetic function of the CSB protein in survival, RNA synthesis recovery and apoptosis after UV treatment. Similarly, the same mutants failed to complement the sensitivity toward 4-NQO. Thus, the integrity of these helicase motifs is important for the biological function of the CSB protein. On the contrary, a point mutation in a C-terminal, second, NTB motif of the CSB protein

  19. Infection of capilloviruses requires subgenomic RNAs whose transcription is controlled by promoter-like sequences conserved among flexiviruses.

    PubMed

    Komatsu, Ken; Hirata, Hisae; Fukagawa, Takako; Yamaji, Yasuyuki; Okano, Yukari; Ishikawa, Kazuya; Adachi, Tatsushi; Maejima, Kensaku; Hashimoto, Masayoshi; Namba, Shigetou

    2012-07-01

    The first open-reading frame (ORF) of apple stem grooving virus (ASGV), of the genus Capillovirus, encodes an apparently chimeric polyprotein containing conserved regions for replicase (Rep) and coat protein (CP). However, our previous study revealed that ASGV mutants with distinct and discontinuous Rep- and CP-coding regions successfully infect plants, indicating that CP expressed via a subgenomic RNA (sgRNA) is sufficient for viability of the virus. Here we identified a transcription start site of the CP sgRNA and revealed that CP translated from the sgRNA is essential for ASGV infection. We mapped the transcription start sites of both the CP and the movement protein (MP) sgRNAs of ASGV and found a hexanucleotide motif, UUAGGU, conserved upstream from both sgRNA transcription start sites. Mutational analysis of the putative CP initiation codon and of the UUAGGU sequence upstream from the transcription start site of CP sgRNA demonstrated their importance for ASGV accumulation. Our results also demonstrated that potato virus T (PVT), an unassigned species closely related to ASGV, produces two sgRNAs putatively deployed for the CP and MP expression and that the same hexanucleotide motif as found in ASGV is located upstream from the transcription start sites of both sgRNAs. This motif, which constituted putative core elements of the sgRNA promoter, is broadly conserved among viruses in the families Alphaflexiviridae and Betaflexiviridae, suggesting that the gene expression strategy of the viruses in both families has been conserved throughout evolution.

  20. Nucleotide sequence conservation in paramyxoviruses; the concept of codon constellation.

    PubMed

    Rima, Bert K

    2015-05-01

    The stability and conservation of the sequences of RNA viruses in the field and the high error rates measured in vitro are paradoxical. The field stability indicates that there are very strong selective constraints on sequence diversity. The nature of these constraints is discussed. Apart from constraints on variation in cis-acting RNA and the amino acid sequences of viral proteins, there are other ones relating to the presence of specific dinucleotides such as CpG and UpA as well as the importance of RNA secondary structures and RNA degradation rates. Other constraints recently identified in other RNA viruses, such as effects of secondary RNA structure on protein folding or modification of cellular tRNA complements, are also discussed. Using the family Paramyxoviridae, I show that the codon usage pattern (CUP) is (i) specific for each virus species and (ii) markedly different from that of the host; it does not vary even in vaccine viruses that have been derived by passage in a number of inappropriate host cells. The CUP might thus be an additional constraint on variation, and I propose the concept of codon constellation to indicate the informational content of the sequences of RNA molecules relating not only to stability and structure but also to the efficiency of translation of a viral mRNA resulting from the CUP and the numbers and position of rare codons.

  1. Genetic diversity of the conserved motifs of six bacterial leaf blight resistance genes in a set of rice landraces

    PubMed Central

    2014-01-01

    Background Bacterial leaf blight (BLB) caused by the vascular pathogen Xanthomonas oryzae pv. oryzae (Xoo) is one of the most serious diseases leading to crop failure in rice growing countries. A total of 37 resistance genes against Xoo has been identified in rice. Of these, ten BLB resistance genes have been mapped on rice chromosomes, while 6 have been cloned, sequenced and characterized. Diversity analysis at the resistance gene level of this disease is scanty, and the landraces from West Bengal and North Eastern states of India have received little attention so far. The objective of this study was to assess the genetic diversity at conserved domains of 6 BLB resistance genes in a set of 22 rice accessions including landraces and check genotypes collected from the states of Assam, Nagaland, Mizoram and West Bengal. Results In this study 34 pairs of primers were designed from conserved domains of 6 BLB resistance genes; Xa1, xa5, Xa21, Xa21(A1), Xa26 and Xa27. The designed primer pairs were used to generate PCR based polymorphic DNA profiles to detect and elucidate the genetic diversity of the six genes in the 22 diverse rice accessions of known disease phenotype. A total of 140 alleles were identified including 41 rare and 26 null alleles. The average polymorphism information content (PIC) value was 0.56/primer pair. The DNA profiles identified each of the rice landraces unequivocally. The amplified polymorphic DNA bands were used to calculate genetic similarity of the rice landraces in all possible pair combinations. The similarity among the rice accessions ranged from 18% to 89% and the dendrogram produced from the similarity values was divided into 2 major clusters. The conserved domains identified within the sequenced rare alleles include Leucine-Rich Repeat, BED-type zinc finger domain, sugar transferase domain and the domain of the carbohydrate esterase 4 superfamily. Conclusions This study revealed high genetic diversity at conserved domains of six BLB

  2. A sequence motif enriched in regions bound by the Drosophila dosage compensation complex

    PubMed Central

    2010-01-01

    Background In Drosophila melanogaster, dosage compensation is mediated by the action of the dosage compensation complex (DCC). How the DCC recognizes the fly X chromosome is still poorly understood. Characteristic sequence signatures at all DCC binding sites have not hitherto been found. Results In this study, we compare the known binding sites of the DCC with oligonucleotide profiles that measure the specificity of the sequences of the D. melanogaster X chromosome. We show that the X chromosome regions bound by the DCC are enriched for a particular type of short, repetitive sequences. Their distribution suggests that these sequences contribute to chromosome recognition, the generation of DCC binding sites and/or the local spreading of the complex. Comparative data indicate that the same sequences may be involved in dosage compensation in other Drosophila species. Conclusions These results offer an explanation for the wild-type binding of the DCC along the Drosophila X chromosome, contribute to delineate the forces leading to the establishment of dosage compensation and suggest new experimental approaches to understand the precise biochemical features of the dosage compensation system. PMID:20226017

  3. A Conserved Motif in the Membrane Proximal C-Terminal Tail of Human Muscarinic M1 Acetylcholine Receptors Affects Plasma Membrane Expression

    PubMed Central

    Ehlert, Frederick J.; Shults, Crystal A.

    2010-01-01

    We investigated the functional role of a conserved motif, F(x)6LL, in the membrane proximal C-tail of the human muscarinic M1 (hM1) receptor. By use of site-directed mutagenesis, several different point mutations were introduced into the C-tail sequence 423FRDTFRLLL431. Wild-type and mutant hM1 receptors were transiently expressed in Chinese hamster ovary cells, and the amount of plasma membrane-expressed receptor was determined by use of intact, whole-cell [3H]N-methylscopolamine binding assays. The plasma membrane expression of hM1 receptors possessing either L430A or L431A or both point mutations was significantly reduced compared with the wild type. The hM1 receptor possessing a L430A/L431A double-point mutation was retained in the endoplasmic reticulum (ER), and atropine treatment caused the redistribution of the mutant receptor from the ER to the plasma membrane. Atropine treatment also caused an increase in the maximal response and potency of carbachol-stimulated phosphoinositide hydrolysis elicited by the L430A/L431A mutant. The effect of atropine on the L430A/L431A receptor mutant suggests that L430 and L431 play a role in folding hM1 receptors, which is necessary for exit from the ER. Using site-directed mutagenesis, we also identified amino acid residues at the base of transmembrane-spanning domain 1 (TM1), V46 and L47, that, when mutated, reduce the plasma membrane expression of hM1 receptors in an atropine-reversible manner. Overall, these mutagenesis data show that amino acid residues in the membrane-proximal C-tail and base of TM1 are necessary for hM1 receptors to achieve a transport-competent state. PMID:19841475
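
    The F(x)6LL notation reads as a phenylalanine, any six residues, then two leucines. The following sketch checks that reading against the C-tail sequence quoted in the abstract (residues 423 to 431); the function name is hypothetical, and the mutant string is simply the quoted sequence with L430 and L431 replaced by alanine.

```python
import re

# F, then any six residues, then two leucines.
F_X6_LL = re.compile(r"F.{6}LL")


def locate_motif(segment, first_residue_number):
    """Return (start, end) residue numbers of the first F(x)6LL match, or None.

    first_residue_number -- sequence number of segment[0] in the full protein
    """
    match = F_X6_LL.search(segment)
    if match is None:
        return None
    start = first_residue_number + match.start()
    end = first_residue_number + match.end() - 1  # inclusive end residue
    return start, end


print(locate_motif("FRDTFRLLL", 423))  # wild-type C-tail segment from the abstract
print(locate_motif("FRDTFRLAA", 423))  # L430A/L431A double mutant
```

    In the wild-type segment the match spans F423 to L431, so the two terminal leucines of the motif are L430 and L431, the residues targeted by the point mutations.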

  4. Structural analysis of a repetitive protein sequence motif in strepsirrhine primate amelogenin.

    PubMed

    Lacruz, Rodrigo S; Lakshminarayanan, Rajamani; Bromley, Keith M; Hacia, Joseph G; Bromage, Timothy G; Snead, Malcolm L; Moradian-Oldak, Janet; Paine, Michael L

    2011-01-01

    Strepsirrhines are members of a primate suborder that has a distinctive set of features associated with the development of the dentition. Amelogenin (AMEL), the better known of the enamel matrix proteins, forms 90% of the secreted organic matrix during amelogenesis. Although AMEL has been sequenced in numerous mammalian lineages, the only reported strepsirrhine AMEL sequences are those of the ring-tailed lemur and galago, which contain a set of additional proline-rich tandem repeats absent in all other primate species analyzed to date, but present in some non-primate mammals. Here, we first determined that these repeats are present in AMEL from three additional lemur species and thus are likely to be widespread throughout this group. To evaluate the functional relevance of these repeats in strepsirrhines, we engineered a mutated murine amelogenin sequence containing a similar proline-rich sequence to that of Lemur catta. In the monomeric form, the MQP insertions had no influence on the secondary structure or refolding properties, whereas in the assembled form, the insertions increased the hydrodynamic radii. We speculate that increased AMEL nanosphere size may influence enamel formation in strepsirrhine primates. PMID:21437261

  5. Structural Analysis of a Repetitive Protein Sequence Motif in Strepsirrhine Primate Amelogenin

    PubMed Central

    Bromley, Keith M.; Hacia, Joseph G.; Bromage, Timothy G.; Snead, Malcolm L.; Moradian-Oldak, Janet; Paine, Michael L.

    2011-01-01

    Strepsirrhines are members of a primate suborder that has a distinctive set of features associated with the development of the dentition. Amelogenin (AMEL), the better known of the enamel matrix proteins, forms 90% of the secreted organic matrix during amelogenesis. Although AMEL has been sequenced in numerous mammalian lineages, the only reported strepsirrhine AMEL sequences are those of the ring-tailed lemur and galago, which contain a set of additional proline-rich tandem repeats absent in all other primate species analyzed to date, but present in some non-primate mammals. Here, we first determined that these repeats are present in AMEL from three additional lemur species and thus are likely to be widespread throughout this group. To evaluate the functional relevance of these repeats in strepsirrhines, we engineered a mutated murine amelogenin sequence containing a similar proline-rich sequence to that of Lemur catta. In the monomeric form, the MQP insertions had no influence on the secondary structure or refolding properties, whereas in the assembled form, the insertions increased the hydrodynamic radii. We speculate that increased AMEL nanosphere size may influence enamel formation in strepsirrhine primates. PMID:21437261

  6. The tryptophan repressor sequence is highly conserved among the Enterobacteriaceae.

    PubMed Central

    Arvidson, D N; Arvidson, C G; Lawson, C L; Miner, J; Adams, C; Youderian, P

    1994-01-01

    Tryptophan biosynthesis in Escherichia coli is regulated by the product of the trpR gene, the tryptophan (Trp) repressor. Trp aporepressor binds the corepressor, L-tryptophan, to form a holorepressor complex, which binds trp operator DNA tightly, and inhibits transcription of the tryptophan biosynthetic operon. The conservation of trp operator sequences among enteric Gram-negative bacteria suggests that trpR genes from other bacterial species can be cloned by complementation in E. coli. To clone trpR homologues, a deletion of the E. coli trpR gene, delta trpR504, was made on a plasmid by site-directed mutagenesis, then crossed onto the E. coli genome. Plasmid clones of the trpR genes of Enterobacter aerogenes and Enterobacter cloacae were isolated by complementation of the delta trpR504 allele, scored as the ability to repress beta-galactosidase synthesis from a prophage-borne trpE-lacZ gene fusion. The predicted amino acid sequences of four enteric TrpR proteins show differences, clustered on the backside of the folded repressor, opposite the DNA-binding helix-turn-helix substructures. These differences are predicted to have little effect on the interactions of the aporepressor with tryptophan, holorepressor with operator DNA, or tandemly bound holorepressor dimers with one another. Although there is some variation observed at the dimer interface, interactions predicted to stabilize the interface are conserved. The phylogenetic relationships revealed by the TrpR amino acid sequence alignment agree with the results of others. PMID:8208606

  7. Regions outside of conserved PxxPxR motifs drive the high affinity interaction of GRB2 with SH3 domain ligands.

    PubMed

    Bartelt, Rebekah R; Light, Jonathan; Vacaflores, Aldo; Butcher, Alayna; Pandian, Madhana; Nash, Piers; Houtman, Jon C D

    2015-10-01

    SH3 domains are evolutionarily conserved protein interaction domains that control nearly all cellular processes in eukaryotes. The current model is that most SH3 domains bind discrete PxxPxR motifs with weak affinity and relatively low selectivity. However, the interactions of full-length SH3 domain-containing proteins with ligands are highly specific and have much stronger affinity. This suggests that regions outside of PxxPxR motifs drive these interactions. In this study, we observed that PxxPxR motifs were required for the binding of the adaptor protein GRB2 to short peptides from its ligand SOS1. Surprisingly, PxxPxR motifs from the proline rich region of SOS1 or CBL were neither necessary nor sufficient for the in vitro or in vivo interaction with full-length GRB2. Together, our findings show that regions outside of the consensus PxxPxR sites drive the high affinity association of GRB2 with SH3 domain ligands, suggesting that the binding mechanism for this and other SH3 domain interactions may be more complex than originally thought.
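
    For readers unfamiliar with the PxxPxR notation, x stands for any residue. A minimal sketch of scanning a peptide for such sites, including overlapping ones, is given below; the function name and the example peptide are invented for illustration and are not sequences from the study.

```python
import re

# A zero-width lookahead lets overlapping PxxPxR sites be reported separately.
PXXPXR = re.compile(r"(?=(P..P.R))")


def find_pxxpxr(peptide):
    """Return (0-based start, matched residues) for every PxxPxR site in a peptide."""
    return [(m.start(), m.group(1)) for m in PXXPXR.finditer(peptide.upper())]


# Hypothetical proline-rich peptide.
print(find_pxxpxr("AAPPVPPRAA"))
```

    As the abstract stresses, the presence of such a site says nothing by itself about binding affinity to the full-length protein; the scan only identifies candidate consensus positions.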

  8. The histone chaperone sNASP binds a conserved peptide motif within the globular core of histone H3 through its TPR repeats.

    PubMed

    Bowman, Andrew; Lercher, Lukas; Singh, Hari R; Zinne, Daria; Timinszky, Gyula; Carlomagno, Teresa; Ladurner, Andreas G

    2016-04-20

    Eukaryotic chromatin is a complex yet dynamic structure, which is regulated in part by the assembly and disassembly of nucleosomes. Key to this process is a group of proteins termed histone chaperones that guide the thermodynamic assembly of nucleosomes by interacting with soluble histones. Here we investigate the interaction between the histone chaperone sNASP and its histone H3 substrate. We find that sNASP binds with nanomolar affinity to a conserved heptapeptide motif in the globular domain of H3, close to the C-terminus. Through functional analysis of sNASP homologues we identified point mutations in surface residues within the TPR domain of sNASP that disrupt H3 peptide interaction, but do not completely disrupt binding to full length H3 in cells, suggesting that sNASP interacts with H3 through additional contacts. Furthermore, chemical shift perturbations from 1H-15N HSQC experiments show that H3 peptide binding maps to the helical groove formed by the stacked TPR motifs of sNASP. Our findings reveal a new mode of interaction between a TPR repeat domain and an evolutionarily conserved peptide motif found in canonical H3 and in all histone H3 variants, including CenpA, and have implications for the mechanism of histone chaperoning within the cell.

  9. A highly conserved sequence in the 3'-untranslated region of the Drosophila Adh gene plays a functional role in Adh expression.

    PubMed Central

    Parsch, J; Stephan, W; Tanda, S

    1999-01-01

    Phylogenetic analysis identified a highly conserved eight-base sequence (AAGGCTGA) within the 3'-untranslated region (UTR) of the Drosophila alcohol dehydrogenase gene, Adh. To examine the functional significance of this conserved motif, we performed in vitro deletion mutagenesis on the D. melanogaster Adh gene followed by P-element-mediated germline transformation. Deletion of all or part of the eight-base sequence leads to a twofold increase in in vivo ADH enzymatic activity. The increase in activity is temporally and spatially general and is the result of an underlying increase in Adh transcript. These results indicate that the conserved 3'-UTR motif plays a functional role in the negative regulation of Adh gene expression. The evolutionary significance of our results may be understood in the context of the amino acid change that produces the ADH-F allele and also leads to a twofold increase in ADH activity. While there is compelling evidence that the amino acid replacement has been a target of positive selection, the conservation of the 3'-UTR sequence suggests that it is under strong purifying selection. The selective difference between these two sequence changes, which have similar effects on ADH activity, may be explained by different metabolic costs associated with the increase in activity. PMID:9927459

  10. Peptide sequences identified by phage display are immunodominant functional motifs of Pet and Pic serine proteases secreted by Escherichia coli and Shigella flexneri.

    PubMed

    Ulises, Hernández-Chiñas; Tatiana, Gazarian; Karlen, Gazarian; Guillermo, Mendoza-Hernández; Juan, Xicohtencatl-Cortes; Carlos, Eslava

    2009-12-01

    Plasmid-encoded toxin (Pet) and protein involved in colonization (Pic) are serine protease autotransporters of Enterobacteriaceae (SPATEs) secreted by enteroaggregative Escherichia coli (EAEC), which display the GDSGSG sequence, or serine motif. Our research was directed to localize functional sites in both proteins using the phage display method. From a 12mer linear library and a 7mer cysteine-constrained (C7C) library displayed on the M13 phage pIII protein, we selected different mimotopes using IgG purified from sera of children naturally infected with EAEC producing Pet and Pic proteins, and anti-Pet and anti-Pic IgG purified from rabbits immunized with each one of these proteins. Children's IgG selected a homologous group of sequences forming the consensus motif PQPxK, and the motifs PGxI/LN and CxPDDSSxC were selected by the rabbit anti-Pet and anti-Pic IgGs, respectively. Analysis of the amino terminal region of a panel of SPATEs showed the presence in all of them of sequences matching the PGxI/LN or CxPDDSSxC motifs, and in a three-dimensional model (Modeller 9v2) designed for Pet, both these motifs were found in the globular portion of the protein, close to the protease active site GDSGSG. Antibodies induced in mice by mimotopes carrying the three aforementioned motifs were reactive with Pet, Pic, and with synthetic peptides carrying the immunogenic mimotope sequences TYPGYINHSKA and LLPQPPKLLLP, thus confirming that the peptide moiety of the selected phages induced the antibodies specific for the toxins. The antibodies induced in mice to the PGxI/LN and CxPDDSSxC mimotopes inhibited fodrin proteolysis and macrophage chemotaxis biological activities of Pet. Our results showed that we were able to generate, by a phage display procedure, mimotopes with sequence motifs PGxI/LN and CxPDDSSxC, and to identify them as functional motifs of the Pet, Pic and other SPATEs involved in their biological activities.
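
    The motif notation in this abstract can be checked against the two synthetic peptide sequences it quotes. The sketch below assumes that x means any residue and that I/L means isoleucine or leucine at that position; both readings are assumptions about the notation, and the function name is hypothetical.

```python
import re

# Assumed reading of the notation: x = any residue, I/L = isoleucine or leucine.
MOTIFS = {
    "PQPxK": re.compile(r"PQP.K"),
    "PGxI/LN": re.compile(r"PG.[IL]N"),
    "CxPDDSSxC": re.compile(r"C.PDDSS.C"),
}


def motifs_in(peptide):
    """Return the names of all motifs found anywhere in the peptide."""
    return [name for name, pattern in MOTIFS.items() if pattern.search(peptide.upper())]


# The two synthetic peptide sequences quoted in the abstract.
for peptide in ("TYPGYINHSKA", "LLPQPPKLLLP"):
    print(peptide, motifs_in(peptide))
```

    Under these assumptions, TYPGYINHSKA carries the PGxI/LN motif (as PGYIN) and LLPQPPKLLLP carries PQPxK (as PQPPK).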

  11. On the Concept of Cis-regulatory Information: From Sequence Motifs to Logic Functions

    NASA Astrophysics Data System (ADS)

    Tarpine, Ryan; Istrail, Sorin

    The regulatory genome is about the “system level organization of the core genomic regulatory apparatus, and how this is the locus of causality underlying the twin phenomena of animal development and animal evolution” (E.H. Davidson. The Regulatory Genome: Gene Regulatory Networks in Development and Evolution, Academic Press, 2006). Information processing in the regulatory genome is done through regulatory states, defined as sets of transcription factors (sequence-specific DNA binding proteins which determine gene expression) that are expressed and active at the same time. The core information processing machinery consists of modular DNA sequence elements, called cis-modules, that interact with transcription factors. The cis-modules “read” the information contained in the regulatory state of the cell through transcription factor binding, “process” it, and directly or indirectly communicate with the basal transcription apparatus to determine gene expression. This endowment of each gene with the information-receiving capacity through their cis-regulatory modules is essential for the response to every possible regulatory state to which it might be exposed during all phases of the life cycle and in all cell types. We present here a set of challenges addressed by our CYRENE research project aimed at studying the cis-regulatory code of the regulatory genome. The CYRENE Project is devoted to (1) the construction of a database, the cis-Lexicon, containing comprehensive information across species about experimentally validated cis-regulatory modules; and (2) the software development of a next-generation genome browser, the cis-Browser, specialized for the regulatory genome. The presentation is anchored on three main computational challenges: the Gene Naming Problem, the Consensus Sequence Bottleneck Problem, and the Logic Function Inference Problem.

  12. High-throughput sequencing enhanced phage display enables the identification of patient-specific epitope motifs in serum

    PubMed Central

    Christiansen, Anders; Kringelum, Jens V.; Hansen, Christian S.; Bøgh, Katrine L.; Sullivan, Eric; Patel, Jigar; Rigby, Neil M.; Eiwegger, Thomas; Szépfalusi, Zsolt; Masi, Federico de; Nielsen, Morten; Lund, Ole; Dufva, Martin

    2015-01-01

    Phage display is a prominent screening technique with a multitude of applications including therapeutic antibody development and mapping of antigen epitopes. In this study, phages were selected based on their interaction with patient serum and exhaustively characterised by high-throughput sequencing. A bioinformatics approach was developed in order to identify peptide motifs of interest based on clustering and contrasting to control samples. Comparison of patient and control samples confirmed a major issue in phage display, namely the selection of unspecific peptides. The potential of the bioinformatic approach was demonstrated by identifying epitopes of a prominent peanut allergen, Ara h 1, in sera from patients with severe peanut allergy. The identified epitopes were confirmed by high-density peptide micro-arrays. The present study demonstrates that high-throughput sequencing can empower phage display by (i) enabling the analysis of complex biological samples, (ii) circumventing the traditional laborious picking and functional testing of individual phage clones and (iii) reducing the number of selection rounds. PMID:26246327

  13. CpG island erosion, polycomb occupancy and sequence motif enrichment at bivalent promoters in mammalian embryonic stem cells

    PubMed Central

    Mantsoki, Anna; Devailly, Guillaume; Joshi, Anagha

    2015-01-01

    In embryonic stem (ES) cells, developmental regulators have a characteristic bivalent chromatin signature marked by simultaneous presence of both activation (H3K4me3) and repression (H3K27me3) signals and are thought to be in a ‘poised’ state for subsequent activation or silencing during differentiation. We collected eleven pairs (H3K4me3 and H3K27me3) of ChIP sequencing datasets in human ES cells and eight pairs in murine ES cells, and predicted high-confidence (HC) bivalent promoters. Over 85% of H3K27me3 marked promoters were bivalent in human and mouse ES cells. We found that (i) HC bivalent promoters were enriched for developmental factors and were highly likely to be differentially expressed upon transcription factor perturbation; (ii) murine HC bivalent promoters were occupied by both polycomb repressive component classes (PRC1 and PRC2) and grouped into four distinct clusters with different biological functions; (iii) HC bivalent and active promoters were CpG rich while H3K27me3-only promoters lacked CpG islands. Binding enrichment of distinct sets of regulators distinguished bivalent from active promoters. Moreover, a ‘TCCCC’ sequence motif was specifically enriched in bivalent promoters. Finally, this analysis will serve as a resource for future studies to further understand transcriptional regulation during embryonic development. PMID:26582124

  14. A highly conserved DNA replication module from Streptococcus thermophilus phages is similar in sequence and topology to a module from Lactococcus lactis phages.

    PubMed

    Desiere, F; Lucchini, S; Bruttin, A; Zwahlen, M C; Brüssow, H

    1997-08-01

    A highly conserved DNA region extending over 5 kb was observed in Streptococcus thermophilus bacteriophages. Comparative sequencing of one temperate and 26 virulent phages demonstrated in the most extreme case an 18% aa difference for a predicted protein, while the majority of the phages showed fewer, if any aa changes. The relative degree of aa conservation was not homogeneous over the DNA segment investigated. Sequence analysis of the conserved segment revealed genes possibly involved in DNA transactions. Three predicted proteins (orf 233, 443, and 382 gene product (gp)) showed nucleoside triphosphate binding motifs. Orf 443 gp showed in addition a DEAH box motif, characteristically found in a subgroup of helicases, and a variant zinc finger motif known from a phage T7 helicase/primase. Tree analysis classified orf 443 gp as a distant member of the helicase superfamily. Orf 382 gp showed similarity to putative plasmid DNA primases. Downstream of orf 382 a noncoding repeat region was identified that showed similarity to a putative minus origin from a cryptic S. thermophilus plasmid. Four predicted proteins showed not only high degrees of aa identity (34 to 63%) with proteins from Lactococcus lactis phages, but their genes showed a similar topological organization. We interpret this as evidence for a horizontal gene transfer event between phages of the two bacterial genera in the distant past. PMID:9268169

  15. Sequence specific protein binding to and activation of the TGF-beta 3 promoter through a repeated TCCC motif.

    PubMed Central

    Lafyatis, R; Denhez, F; Williams, T; Sporn, M; Roberts, A

    1991-01-01

    We have previously characterized the TGF-beta 3 promoter and shown that the activity of this promoter is highly variable in different cell types. Although the promoter contains a proximal cAMP responsive element, which is critical to basal and forskolin-induced promoter activity, this element is not responsible for the variable, cell-specific regulation of the promoter. In this paper, we identify a 25 base pair sequence in the proximal region of the TGF-beta 3 promoter that binds a novel DNA-binding protein. This region includes the sequence TCCCTCCCTCCC (3 x TCCC), and mutation of these TCCC repeats inhibits protein binding. Further, we show that in the cell line A375, which we have previously shown expresses high levels of TGF-beta 3 mRNA, this region is responsible for mediating high level TGF-beta 3 promoter activity. Immediately 3' to the 3 x TCCC sequence is a consensus AP-2 binding site; however, we show that this region does not bind AP-2, and AP-2 does not transactivate the TGF-beta 3 promoter. Therefore, we provide strong evidence that high level expression of TGF-beta 3 in A375 cells results from transactivation of the TGF-beta 3 promoter by a protein that binds to a repeated TCCC motif in the promoter and suggest that this DNA-binding protein likely also regulates aspects of developmental and tissue-specific expression of this cytokine. PMID:1754378
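
    A repeated element such as the 3 x TCCC sequence can be located computationally by searching for tandem copies of the unit. The following is a minimal sketch under stated assumptions: the function name tandem_repeats is hypothetical, and both test sequences are invented for illustration rather than taken from the TGF-beta 3 promoter.

```python
import re


def tandem_repeats(sequence, unit="TCCC", min_copies=3):
    """Return (0-based start, copy count) for each run of >= min_copies tandem units."""
    # Builds a pattern such as (?:TCCC){3,} for the default arguments.
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    return [
        (match.start(), len(match.group()) // len(unit))
        for match in pattern.finditer(sequence)
    ]


# Invented sequences: one carries three tandem TCCC units, the other
# has the first and third units altered so that no run of three remains.
intact = "GGATCCCTCCCTCCCGCC"
mutated = "GGATAACTCCCTAACGCC"

print(tandem_repeats(intact))
print(tandem_repeats(mutated))
```

    The intact sequence yields one run of three copies starting at offset 3, while the mutated sequence yields none, which mirrors the kind of repeat disruption the abstract describes.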

  16. A novel human AP endonuclease with conserved zinc-finger-like motifs involved in DNA strand break responses

    PubMed Central

    Kanno, Shin-ichiro; Kuzuoka, Hiroyuki; Sasao, Shigeru; Hong, Zehui; Lan, Li; Nakajima, Satoshi; Yasui, Akira

    2007-01-01

    DNA damage causes genome instability and cell death, but many of the cellular responses to DNA damage still remain elusive. We here report a human protein, PALF (PNK and APTX-like FHA protein), with an FHA (forkhead-associated) domain and novel zinc-finger-like CYR (cysteine–tyrosine–arginine) motifs that are involved in responses to DNA damage. We found that the CYR motif is widely distributed among DNA repair proteins of higher eukaryotes, and that PALF, as well as a Drosophila protein with tandem CYR motifs, has endo- and exonuclease activities against abasic site and other types of base damage. PALF accumulates rapidly at single-strand breaks in a poly(ADP-ribose) polymerase 1 (PARP1)-dependent manner in human cells. Indeed, PALF interacts directly with PARP1 and is required for its activation and for cellular resistance to methyl-methane sulfonate. PALF also interacts directly with KU86, LIGASEIV and phosphorylated XRCC4 proteins and possesses endo/exonuclease activity at protruding DNA ends. Various treatments that produce double-strand breaks induce formation of PALF foci, which fully coincide with γH2AX foci. Thus, PALF and the CYR motif may play important roles in DNA repair of higher eukaryotes. PMID:17396150

  17. Marker production by PCR amplification with primer pairs from conserved sequences of WRKY genes in chili pepper.

    PubMed

    Kim, Hyoun-Joung; Lee, Heung-Ryul; Han, Jung-Heon; Yeom, Seon-In; Harn, Chee-Hark; Kim, Byung-Dong

    2008-04-30

    Despite increasing awareness of the importance of WRKY genes in plant defense signaling, the locations of these genes in the Capsicum genome have not been established. To develop WRKY-based markers, primer sequences were deduced from the conserved sequences of the DNA binding motif within the WRKY domains of tomato and pepper genes. These primers were derived from upstream and downstream parts of the conserved sequences of the three WRKY groups. Six primer combinations of each WRKY group were tested for polymorphisms between the mapping parents, C. annuum 'CM334' and C. annuum 'Chilsungcho'. DNA fragments amplified by primer pairs deduced from WRKY Group II genes revealed high levels of polymorphism. Using 32 primer pairs to amplify upstream and downstream parts of the WRKY domain of WRKY group II genes, 60 polymorphic bands were detected. Polymorphisms were not detected with primer pairs from downstream parts of WRKY group II genes. Half of these primers were subjected to F2 genotyping to construct a linkage map. Thirty of 41 markers were located evenly spaced on 20 of the 28 linkage groups, without clustering. This linkage map also consisted of 199 AFLP and 26 SSR markers. This WRKY-based marker system is a rapid and simple method for generating sequence-specific markers for plant gene families.

  18. Sequence motifs of human HER-2 proto-oncogene important for peptide binding to HLA-A2.

    PubMed

    Fisk, B; Chesak, B; Ioannides, M; Wharton, J; Ioannides, C

    1994-07-01

    Tumor progression and metastasis are often associated with overexpression of specific cellular proteins. In 1991, we introduced a hypothesis that epitopes of nonmutated overexpressed proteins can be targets of a specific cellular immune response against tumor mediated by T cells (Mol Carcinogen 6: 77-81, 1992) and that, when T cell epitopes are present, distinction between tumor immunity/autoimmunity and unresponsiveness can be predicated on the protein concentration as a limiting factor of epitope supply. In support of this hypothesis, we demonstrated that CTL from patients with ovarian tumors which overexpress HER-2 proto-oncogene can recognize both autologous tumor and synthetic analogs of a specific epitope from HER-2, which was identified based on the convergence of all criteria for selection of HLA-A2 associated epitopes recognized by T cells. In this study, we identified all epitopes in HER-2 containing nonapeptides with HLA-A2 anchors. Of these, analysis of potential amphiphilic sites identified both sequences and specific mutations that positively affected the reactivity of conformationally dependent HLA-A2 specific mAb, which served as an indication of HER-2 peptide binding. We also report the in vitro induction of cellular responses to these peptides by PBMC from healthy HLA-A2+ volunteers as an indication of their ability to stimulate/restimulate pre-existing T cell responses to HER-2. The peptides induced proliferative responses in one of four donors tested and CTL responses (one of three peptides tested in two of three donors). This strategy may allow selection of immunogenic HER-2 peptides and elucidation of mechanisms operating in induction of tolerance to defined epitopes on self-proteins. PMID:21559557

  19. Characterization of the fibronectin-attachment protein of Mycobacterium avium reveals a fibronectin-binding motif conserved among mycobacteria.

    PubMed

    Schorey, J S; Holsti, M A; Ratliff, T L; Allen, P M; Brown, E J

    1996-07-01

    Mycobacterium avium is an intracellular pathogen and a major opportunistic infectious agent observed in patients with acquired immune deficiency syndrome (AIDS). Evidence suggests that the initial portal of infection by M. avium is often the gastrointestinal tract. However, the mechanism by which M. avium crosses the epithelial barrier is unclear. A possible mechanism is suggested by the ability of M. avium to bind fibronectin, an extracellular matrix protein that is a virulence factor for several extracellular pathogenic bacteria which bind to mucosal surfaces. To further characterize fibronectin binding by M. avium, we have cloned the M. avium fibronectin-attachment protein (FAP). The M. avium FAP (FAP-A) has an unusually large number of Pro and Ala residues (40% overall) and is 50% identical to FAP of both Mycobacterium leprae and Mycobacterium tuberculosis. Using recombinant FAP-A and FAP-A peptides, we show that two non-continuous regions in FAP-A bind fibronectin. Peptides from these regions and homologous sequences from M. leprae FAP inhibit fibronectin binding by both M. avium and Mycobacterium bovis Bacillus Calmette-Guerin (BCG). These regions have no homology to eukaryotic fibronectin-binding proteins and are only distantly related to fibronectin-binding peptides of Gram-positive bacteria. Nevertheless, these fibronectin-binding regions are highly conserved among the mycobacterial FAPs, suggesting an essential function for this interaction in mycobacterial infection of their metazoan hosts.

  20. Gene conversion causing human inherited disease: evidence for involvement of non-B-DNA-forming sequences and recombination-promoting motifs in DNA breakage and repair

    PubMed Central

    Chuzhanova, Nadia; Chen, Jian-Min; Bacolla, Albino; Patrinos, George P.; Férec, Claude; Wells, Robert D.; Cooper, David N.

    2009-01-01

    A variety of DNA sequence motifs including inverted repeats, minisatellites, and the χ recombination hotspot, have been reported in association with gene conversion in human genes causing inherited disease. However, no methodical statistically-based analysis has been performed to formalize these observations. We have performed an in silico analysis of the DNA sequence tracts involved in 27 non-overlapping gene conversion events in 19 different genes reported in the context of inherited disease. We found that gene conversion events tend to occur within (C+G)- and CpG-rich regions and that sequences with the potential to form non-B-DNA structures, and which may be involved in the generation of double-strand breaks that could in turn serve to promote gene conversion, occur disproportionately within maximal converted tracts and/or short flanking regions. Maximal converted tracts were also found to be enriched (p<0.01) in a truncated version of the χ-element (a TGGTGG motif), immunoglobulin heavy chain class switch repeats, translin target sites and several novel motifs including (or overlapping) the classical meiotic recombination hotspot, CCTCCCCT. Finally, gene conversions tend to occur in genomic regions that have the potential to fold into stable hairpin conformations. These findings support the concept that recombination-inducing motifs, in association with alternative DNA conformations, can promote recombination in the human genome. PMID:19431182

  1. Redundant ERF-VII Transcription Factors Bind to an Evolutionarily Conserved cis-Motif to Regulate Hypoxia-Responsive Gene Expression in Arabidopsis.

    PubMed

    Gasch, Philipp; Fundinger, Moritz; Müller, Jana T; Lee, Travis; Bailey-Serres, Julia; Mustroph, Angelika

    2016-01-01

    The response of Arabidopsis thaliana to low-oxygen stress (hypoxia), such as during shoot submergence or root waterlogging, includes increasing the levels of ∼50 hypoxia-responsive gene transcripts, many of which encode enzymes associated with anaerobic metabolism. Upregulation of over half of these mRNAs involves stabilization of five group VII ethylene response factor (ERF-VII) transcription factors, which are routinely degraded via the N-end rule pathway of proteolysis in an oxygen- and nitric oxide-dependent manner. Despite their importance, neither the quantitative contribution of individual ERF-VIIs nor the cis-regulatory elements they govern are well understood. Here, using single- and double-null mutants, the constitutively synthesized ERF-VIIs RELATED TO APETALA2.2 (RAP2.2) and RAP2.12 are shown to act redundantly as principal activators of hypoxia-responsive genes; constitutively expressed RAP2.3 contributes to this redundancy, whereas the hypoxia-induced HYPOXIA RESPONSIVE ERF1 (HRE1) and HRE2 play minor roles. An evolutionarily conserved 12-bp cis-regulatory motif that binds to and is sufficient for activation by RAP2.2 and RAP2.12 is identified through a comparative phylogenetic motif search, promoter dissection, yeast one-hybrid assays, and chromatin immunopurification. This motif, designated the hypoxia-responsive promoter element, is enriched in promoters of hypoxia-responsive genes in multiple species. PMID:26668304

  3. Rare k-mer DNA: Identification of sequence motifs and prediction of CpG island and promoter.

    PubMed

    Mohamed Hashim, Ezzeddin Kamil; Abdullah, Rosni

    2015-12-21

    Empirical analysis of k-mer DNA has proven to be an effective tool for finding unique patterns in DNA sequences, which can lead to the discovery of potential sequence motifs. In an extensive study of empirical k-mer DNA on hundreds of organisms, the researchers found that unique multi-modal k-mer spectra occur only in the genomes of organisms from the tetrapod clade, which includes all mammals. The multi-modality is caused by the formation of the two lowest modes, where the k-mers under them are referred to as the rare k-mers. The suppression of the two lowest modes (or the rare k-mers) can be attributed to the CG dinucleotide inclusions in them. Apart from that, the rare k-mers are selectively distributed in certain genomic features of CpG Island (CGI), promoter, 5' UTR, and exon. We correlated the rare k-mers with hundreds of annotated features using several bioinformatic tools, performed further intrinsic rare k-mer analyses within the correlated features, and modeled the elucidated rare k-mer clustering feature into a classifier to predict the correlated CGI and promoter features. Our correlation results show that rare k-mers are highly associated with several annotated features of CGI, promoter, 5' UTR, and open chromatin regions. Our intrinsic results show that rare k-mers have several unique topological, compositional, and clustering properties in CGI and promoter features. Finally, the performances of our RWC (rare-word clustering) method in predicting the CGI and promoter features are ranked among the top three, in eight of the CGI and promoter evaluations, among eight of the benchmarked datasets. PMID:26427337
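
    The k-mer spectrum discussed in this abstract is the histogram of how many distinct k-mers occur a given number of times. The sketch below shows that bookkeeping on a toy string. It is not the RWC method: the fixed cutoff max_count is a simplifying assumption standing in for the paper's mode-based definition of rare k-mers, and all function names are hypothetical.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping k-mer in a DNA string."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_spectrum(counts):
    """Map each occurrence count to the number of distinct k-mers with that count."""
    return Counter(counts.values())


def rare_kmers(counts, max_count):
    """Return k-mers seen at most max_count times (an illustrative cutoff)."""
    return sorted(kmer for kmer, n in counts.items() if n <= max_count)


# Toy sequence: an AT-rich stretch interrupted by a single CG dinucleotide.
toy = "ATATATATCGATATATAT"
counts = kmer_counts(toy, 2)

print(counts)
print(kmer_spectrum(counts))
print(rare_kmers(counts, max_count=1))
```

    In the toy string, AT and TA dominate while CG and the two 2-mers flanking it (TC and GA) each occur once, so they fall into the lowest bin of the spectrum. Real genomes would need a much larger k and a mode-finding step rather than a hand-picked cutoff.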

  4. Identification of Cardiac Troponin I Sequence Motifs Leading to Heart Failure by Inducing Myocardial Inflammation and Fibrosis

    PubMed Central

    Kaya, Ziya; Göser, Stefan; Buss, Sebastian J.; Leuschner, Florian; Öttl, Renate; Li, Jin; Völkers, Mirko; Zittrich, Stefan; Pfitzer, Gabriele; Rose, Noel R.; Katus, Hugo A.

    2009-01-01

    Background Despite the widespread use of cardiac troponins for diagnosis of myocyte injury and risk stratification in acute cardiac disorders, little is known about the long term effects of the released troponins on cardiac function. Recently, we showed that an autoimmune response to cardiac troponin I induces severe inflammation and subsequent fibrosis in the myocardium. This autoimmune disorder predisposes mice to heart failure and cardiac death. Methods and Results To investigate the role of cTnI-specific T-cells, T-cells were isolated from splenocytes of mice immunized with murine cardiac troponin I (mcTnI). WT mice receiving mcTnI-specific T-cells showed high mcTnI-specific antibody titers, increased production of pro-inflammatory cytokines IL-1β and TNF-α, severe inflammation and fibrosis in the myocardium, and reduced fractional shortening. To identify the antigenic determinants of troponin I responsible for the observed inflammation, fibrosis and heart failure, 16 overlapping 16-18mer peptides covering the entire amino acid sequence of mcTnI (211 residues) were synthesized. Only mice immunized with the residues 105-122 of mcTnI developed significant inflammation and fibrosis in the myocardium with increased expression of inflammatory chemokines RANTES, MCP-1, MIP-1α, MIP-1β, MIP-2, TCA-3, eotaxin and chemokine receptors CCR1, CCR2, CCR5. Mice immunized with the corresponding human cTnI residues 104-121 and the mcTnI residues 131-148 developed milder disease. Conclusion Transfer of troponin I-specific T-cells can induce inflammation and fibrosis in WT mice leading to deterioration of contractile function. Furthermore, two sequence motifs of cTnI that induce inflammation and fibrosis in the myocardium are characterized. PMID:18955666
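
    Overlapping peptide libraries of the kind mentioned in this abstract are usually generated by sliding a fixed window along the protein. The sketch below illustrates the arithmetic only. The 18-residue window and 13-residue step are assumptions, chosen because they happen to reproduce the figures quoted in the abstract (16 peptides over 211 residues, including 105-122 and 131-148); the abstract does not state the actual design, and the function name tile_peptides is hypothetical.

```python
def tile_peptides(length, window, step):
    """Return 1-based inclusive (start, end) windows; the last one is clipped to the sequence end."""
    tiles = []
    start = 1
    while start <= length:
        end = min(start + window - 1, length)
        tiles.append((start, end))
        if end == length:
            break  # the sequence is fully covered
        start += step
    return tiles


# Assumed design: 18-residue windows advanced 13 residues at a time.
tiles = tile_peptides(length=211, window=18, step=13)

print(len(tiles))
print(tiles)
```

    Under these assumed parameters the final window is clipped to residues 196-211, a 16-mer, which would be consistent with the "16-18mer" description, though other designs could fit the same numbers.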

  5. Functional consequences of mutations in the conserved SF2 motifs and post-translational phosphorylation of the CSB protein.

    PubMed

    Christiansen, Mette; Stevnsner, Tinna; Modin, Charlotte; Martensen, Pia M; Brosh, Robert M; Bohr, Vilhelm A

    2003-02-01

    The rare inherited human genetic disorder Cockayne syndrome (CS) is characterized by developmental abnormalities, UV sensitivity and premature aging. The cellular and molecular phenotypes of CS include increased sensitivity to UV-induced and oxidative DNA lesions. Two genes are involved: CSA and CSB. The CS group B (CSB) protein has roles in transcription, transcription-coupled repair, and base excision repair. It is a DNA stimulated ATPase and remodels chromatin in vitro. Here, we have analyzed wild-type (wt) and motif II, V and VI mutant CSB proteins. We find that the mutant proteins display different degrees of ATPase activity deficiency, and in contrast to the in vivo complementation studies, the motif II mutant is more defective than motif V and VI CSB mutants. Furthermore, CSB wt ATPase activity was studied with different biologically important DNA cofactors: DNA with different secondary structures and damaged DNA. The results indicate that the state of DNA secondary structure affects the level of CSB ATPase activity. We find that the CSB protein is phosphorylated in untreated cells and that UV irradiation leads to its dephosphorylation. Importantly, dephosphorylation of the protein in vitro results in increased ATPase activity of the protein, suggesting that the activity of the CSB protein is subject to phosphorylation control in vivo. These observations may have significant implications for the function of CSB in vivo. PMID:12560492

  6. Sequence and structural analysis of the Asp-box motif and Asp-box beta-propellers; a widespread propeller-type characteristic of the Vps10 domain family and several glycoside hydrolase families

    PubMed Central

    Quistgaard, Esben M; Thirup, Søren S

    2009-01-01

    Background The Asp-box is a short sequence and structure motif that folds as a well-defined β-hairpin. It is present in different folds, but occurs most prominently as repeats in β-propellers. Asp-box β-propellers are known to be characteristically irregular and to occur in many medically important proteins, most of which are glycosidase enzymes, but they are otherwise not well characterized and are only rarely treated as a distinct β-propeller family. We have analyzed the sequence, structure, function and occurrence of the Asp-box and the s-Asp-box (a related shorter variant), and provide a comprehensive classification and computational analysis of the Asp-box β-propeller family. Results We find that all conserved residues of the Asp-box support its structure, whereas the residues in variable positions are generally used for other purposes. The Asp-box clearly has a structural role in β-propellers and is highly unlikely to be involved in ligand binding. Sequence analysis of the Asp-box β-propeller family reveals it to be very widespread, especially in bacteria, and suggests a wide functional range. Disregarding the Asp-boxes, sequence conservation of the propeller blades is very low, but a distinct pattern of residues with specific properties has been identified. Interestingly, Asp-boxes are occasionally found very close to other propeller-associated repeats in extensive mixed-motif stretches, which strongly suggests the existence of a novel class of hybrid β-propellers. Structural analysis reveals that the top and bottom faces of Asp-box β-propellers have striking and consistently different loop properties; the bottom is structurally conserved whereas the top shows great structural variation. Interestingly, only the top face is used for functional purposes in known structures. A structural analysis of the 10-bladed β-propeller fold, which has so far only been observed in the Asp-box family, reveals that the inner strands of the blades are unusually far apart

  7. Gene sequence, localization, and evolutionary conservation of DAZLA, a candidate male sterility gene.

    PubMed

    Seboun, E; Barbaux, S; Bourgeron, T; Nishi, S; Agulnik, A; Egashira, M; Nikkawa, N; Bishop, C; Fellous, M; McElreavey, K; Kasahara, M; Algonik, A

    1997-04-15

    We have isolated the human homologue of the mouse germ cell-specific transcript Tpx2, which we had previously mapped to mouse chromosome 17. Sequence analysis shows that the human gene is part of the DAZ (Deleted in Azoospermia) family, represents the human homologue of the mouse Dazla and Drosophila boule genes, and is termed DAZLA. Like Dazla and boule, DAZLA is single copy and maps to 3p25. This defines a new region of synteny between mouse chromosome 17 and human chromosome 3. Unlike DAZ, which has multiple DAZ repeats, DAZLA encodes a putative RNA-binding protein with a single RNA-binding motif and a single DAZ repeat. DAZLA is more closely related to Dazla in the mouse than to the Y-linked homologue DAZ (88% identity overall with mouse Dazla compared to 76% identity with the human DAZ protein sequence). Southern blot analysis showed that DAZLA is autosomal in all mammals tested and that DAZ has been recently translocated to the Y chromosome, sometime after the divergence of Old World and New World primates. To investigate the evolutionary relatedness of DAZLA and DAZ further, their partial genomic structures were obtained and compared. This revealed that the genomic organization of both genes in the 5' region is highly conserved. DAZLA is a new member of the DAZ family of genes, which is associated with spermatogenesis and male sterility. Familial cases of male infertility in humans show an autosomal recessive mode of inheritance. It is possible that some of these families may carry mutations in the DAZLA gene.

  8. Characterization, nucleotide sequence, and conserved genomic locations of insertion sequence ISRm5 in Rhizobium meliloti.

    PubMed Central

    Laberge, S; Middleton, A T; Wheatcroft, R

    1995-01-01

    A target for ISRm3 transposition in Rhizobium meliloti IZ450 is another insertion sequence element, named ISRm5. ISRm5 is 1,340 bp in length and possesses terminal inverted repeats of unequal lengths (27 and 28 bp) that contain five mismatches. An open reading frame that spans 89% of the length of one DNA strand encodes a putative transposase with significant similarity to the putative transposases of 11 insertion sequence elements from diverse bacterial species, including ISRm3 from R. meliloti. Multiple copies and variants of ISRm5 occur in the R. meliloti genome, often in close association with ISRm3. Five ISRm5 copies in two strains were studied, and each was found to be located between 8-bp direct repeats. At two of these loci, which were shown to be highly conserved in R. meliloti, the copies of ISRm5 were found to be associated with pairs of short inverted repeats resembling transcription terminators. This structural arrangement not only may provide a conserved niche for ISRm5 but also may be a preferred target for transposition. PMID:7768811

  9. Identification of a novel calcium binding motif based on the detection of sequence insertions in the animal peroxidase domain of bacterial proteins.

    PubMed

    Santamaría-Hernando, Saray; Krell, Tino; Ramos-González, María-Isabel

    2012-01-01

    Proteins of the animal heme peroxidase (ANP) superfamily differ greatly in size since they have either one or two catalytic domains that match profile PS50292. The ORF PP_2561 of Pseudomonas putida KT2440 that we have called PepA encodes a two-domain ANP. The alignment of these domains with those of PepA homologues revealed a variable number of insertions with the consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D. This motif has also been detected in the structure of pseudopilin (pdb 3G20), where it was found to be involved in Ca(2+) coordination although a sequence analysis did not reveal the presence of any known calcium binding motifs in this protein. Isothermal titration calorimetry revealed that a peptide containing this consensus motif specifically bound calcium ions with affinities ranging from 33 to 79 µM depending on the pH. Microcalorimetric titrations of the purified N-terminal ANP-like domain of PepA revealed Ca(2+) binding with a K(D) of 12 µM and stoichiometry of 1.25 calcium ions per protein monomer. This domain exhibited peroxidase activity after its reconstitution with heme. These data led to the definition of a novel calcium binding motif that we have termed PERCAL and which was abundantly present in animal peroxidase-like domains of bacterial proteins. Bacterial heme peroxidases thus possess two different types of calcium binding motifs, namely PERCAL and the related hemolysin type calcium binding motif, with the latter being located outside the catalytic domains and in their C-terminal end. A phylogenetic tree of ANP-like catalytic domains of bacterial proteins with PERCAL motifs, including single domain peroxidases, was divided into two major clusters, representing domains with and without PERCAL motif containing insertions. We have verified that the recently reported classification of bacterial heme peroxidases in two families (cd09819 and cd09821) is unrelated to these insertions. Sequences matching PERCAL were detected in all kingdoms of life. PMID
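
    The consensus quoted above, G-x-D-G-x-x-[GN]-[TN]-x-D-D, is written in a PROSITE-like notation. As a minimal sketch (not the authors' method), it can be translated into a regular expression and used to scan a protein sequence. The function names and the toy sequence are hypothetical and were invented for illustration.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern (elements joined by '-') into a regex.

    Only the features used by the consensus above are handled:
    'x' (any residue), single letters, and [..] residue classes.
    """
    parts = []
    for element in pattern.split("-"):
        if element == "x":
            parts.append(".")       # 'x' matches any residue
        else:
            parts.append(element)   # letters and [..] classes carry over unchanged
    return "".join(parts)

PERCAL = "G-x-D-G-x-x-[GN]-[TN]-x-D-D"   # consensus quoted in the abstract
PERCAL_RE = re.compile(prosite_to_regex(PERCAL))

def find_motif(sequence: str):
    """Return (0-based start, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in PERCAL_RE.finditer(sequence)]

# Hypothetical toy sequence, not taken from PepA or any real protein.
toy = "MKTAGADGSLGTWDDLLK"
print(find_motif(toy))
```

    A regular expression only reports exact matches to the consensus; profile-based searches would also score near matches, which this sketch does not attempt.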

  10. A conserved CATTCCT motif is required for skeletal muscle-specific activity of the cardiac troponin T gene promoter.

    PubMed Central

    Mar, J H; Ordahl, C P

    1988-01-01

    Transcription of the cardiac troponin T (cTNT) gene is restricted to cardiac and embryonic skeletal muscle tissue. A DNA segment containing 129 nucleotides upstream from the cTNT transcription initiation site (cTNT-129) directs expression of a heterologous marker gene in transfected embryonic skeletal muscle cells but is inactive in embryonic cardiac or fibroblast cells. By using chimeric promoter constructions, in which distal and proximal segments of cTNT-129 are fused to reciprocal segments of the herpes simplex virus thymidine kinase (HSV tk) gene promoter, the DNA segment responsible for this cell specificity can be localized to the cTNT distal promoter region, located between 50 and 129 nucleotides upstream of the transcription initiation site. The ability of the cTNT distal promoter region to confer skeletal muscle-specific activity upon a heterologous promoter is abolished when it is displaced 60 nucleotides upstream, indicating that its ability to direct skeletal muscle-specific transcription probably requires proximity to other components of the transcription initiation region. Two copies of the heptamer, CATTCCT ("muscle-CAT" or "M-CAT" motif), reside within the 80-nucleotide cTNT distal promoter region. A 3-nucleotide mutation in one of these copies inactivates the cTNT promoter in skeletal muscle cells. Therefore, the M-CAT motif is a distal promoter element required for expression of the cTNT promoter in embryonic skeletal muscle cells. Since the M-CAT motif is found in other contractile protein gene promoters, it may represent one example of a muscle-specific promoter element. PMID:3413104
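
    Locating copies of a short element such as the CATTCCT heptamer in a promoter is a plain string search, provided both strands are checked. The sketch below is illustrative only; the function names and the toy sequence are hypothetical and do not reproduce the cTNT promoter.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))

def find_sites(seq: str, motif: str = "CATTCCT"):
    """Return sorted (0-based start, strand) pairs for every copy of the motif.

    A '-' hit means the reverse complement of the motif was found
    at that position on the given (forward) strand.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)   # allow overlapping hits
    return sorted(hits)

# Hypothetical toy sequence with one copy on each strand.
toy = "GGCATTCCTAAAGGAATGCC"
print(find_sites(toy))
```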

  11. More robust detection of motifs in coexpressed genes by using phylogenetic information

    PubMed Central

    Monsieurs, Pieter; Thijs, Gert; Fadda, Abeer A; De Keersmaecker, Sigrid CJ; Vanderleyden, Jozef; De Moor, Bart; Marchal, Kathleen

    2006-01-01

    Background Several motif detection algorithms have been developed to discover overrepresented motifs in sets of coexpressed genes. However, in a noisy gene list, the number of genes containing the motif versus the number lacking the motif might not be sufficiently high to allow detection by classical motif detection tools. To still recover motifs which are not significantly enriched but still present, we developed a procedure in which we use phylogenetic footprinting to first delineate all potential motifs in each gene. Then we mutually compare all detected motifs and identify the ones that are shared by at least a few genes in the data set as potential candidates. Results We applied our methodology to a compiled test data set containing known regulatory motifs and to two biological data sets derived from genome wide expression studies. By executing four consecutive steps of 1) identifying conserved regions in orthologous intergenic regions, 2) aligning these conserved regions, 3) clustering the conserved regions containing similar regulatory regions followed by extraction of the regulatory motifs and 4) screening the input intergenic sequences with detected regulatory motif models, our methodology proves to be a powerful tool for detecting regulatory motifs when a low signal to noise ratio is present in the input data set. Comparing our results with two other motif detection algorithms points out the robustness of our algorithm. Conclusion We developed an approach that can reliably identify multiple regulatory motifs lacking a high degree of overrepresentation in a set of coexpressed genes (motifs belonging to sparsely connected hubs in the regulatory network) by exploiting the advantages of using both coexpression and phylogenetic information. PMID:16549017

  12. Formation and Dissociation of the Interstrand i-Motif by the Sequences d(XnC4Ym) Monitored with Electrospray Ionization Mass Spectrometry

    NASA Astrophysics Data System (ADS)

    Cao, Yanwei; Qin, Yujiao; Bruist, Michael; Gao, Shang; Wang, Bing; Wang, Huixin; Guo, Xinhua

    2015-06-01

    Formation and dissociation of the interstrand i-motifs by DNA with the sequence d(XnC4Ym) (X and Y represent thymine, adenine, or guanine, and n, m range from 0 to 2) are studied with electrospray ionization mass spectrometry (ESI-MS), circular dichroism (CD), and UV spectrophotometry. The ion complexes detected in the gas phase and the melting temperatures (Tm) obtained in solution show that a non-C base residue located at the 5' end favors formation of the four-stranded structures, with T > A > G for imparting stability. Comparatively, no rule is found when a non-C base is located at the 3' end. Detection of penta- and hexa-stranded ions indicates the formation of i-motifs with more than four strands. In addition, the i-motifs seen in our mass spectra are accompanied by single-, double-, and triple-stranded ions, and the trimeric ions were always less abundant during the annealing and heat-induced dissociation processes of the DNA strands in solution (pH = 4.5). This provides direct evidence of a strand-by-strand formation and dissociation pathway of the interstrand i-motif and indicates that formation of the triple strands is the rate-limiting step. In contrast, the trimeric ions are abundant when the tetramolecular ions are subjected to collision-induced dissociation (CID) in the gas phase, suggesting different dissociation behaviors of the interstrand i-motif in the gas phase and in solution. Furthermore, hysteretic UV absorption melting and cooling curves reveal an irreversible dissociation and association kinetic process of the interstrand i-motif in solution.

  13. De Novo Regulatory Motif Discovery Identifies Significant Motifs in Promoters of Five Classes of Plant Dehydrin Genes

    PubMed Central

    Zolotarov, Yevgen; Strömvik, Martina

    2015-01-01

    Plants accumulate dehydrins in response to osmotic stresses. Dehydrins are divided into five different classes, which are thought to be regulated in different manners. To better understand differences in transcriptional regulation of the five dehydrin classes, de novo motif discovery was performed on 350 dehydrin promoter sequences from a total of 51 plant genomes. Overrepresented motifs were identified in the promoters of five dehydrin classes. The Kn dehydrin promoters contain motifs linked with meristem specific expression, as well as motifs linked with cold/dehydration and abscisic acid response. KS dehydrin promoters contain a motif with a GATA core. SKn and YnSKn dehydrin promoters contain motifs that match elements connected with cold/dehydration, abscisic acid and light response. YnKn dehydrin promoters contain motifs that match abscisic acid and light response elements, but not cold/dehydration response elements. Conserved promoter motifs are present in the dehydrin classes and across different plant lineages, indicating that dehydrin gene regulation is likely also conserved. PMID:26114291

  14. Localization of Daucus carota NMCP1 to the nuclear periphery: the role of the N-terminal region and an NLS-linked sequence motif, RYNLRR, in the tail domain.

    PubMed

    Kimura, Yuta; Fujino, Kaien; Ogawa, Kana; Masuda, Kiyoshi

    2014-01-01

    Recent ultrastructural studies revealed that a structure similar to the vertebrate nuclear lamina exists in the nuclei of higher plants. However, plant genomes lack genes for lamins and intermediate-type filament proteins, and this suggests that plant-specific nuclear coiled-coil proteins make up the lamina-like structure in plants. NMCP1 is a protein, first identified in Daucus carota cells, that localizes exclusively to the nuclear periphery in interphase cells. It has a tripartite structure comprised of head, rod, and tail domains, and includes putative nuclear localization signal (NLS) motifs. We identified the functional NLS of DcNMCP1 (carrot NMCP1) and determined the protein regions required for localizing to the nuclear periphery using EGFP-fused constructs transiently expressed in Apium graveolens epidermal cells. Transcription was driven under a CaMV35S promoter, and the genes were introduced into the epidermal cells by a DNA-coated microprojectile delivery system. Of the NLS motifs, KRRRK and RRHK in the tail domain were highly functional for nuclear localization. Addition of the N-terminal 141 amino acids from DcNMCP1 shifted the localization of a region including these NLSs from the entire nucleus to the nuclear periphery. Using this same construct, the replacement of amino acids in RRHK or its preceding sequence, YNL, with alanine residues abolished localization to the nuclear periphery, while replacement of KRRRK did not affect localization. The sequence R/Q/HYNLRR/H, including YNL and the first part of the sequence of RRHK, is evolutionarily conserved in a subclass of NMCP1 sequences from many plant species. These results show that NMCP1 localizes to the nuclear periphery by a combined action of a sequence composed of R/Q/HYNLRR/H, NLS, and the N-terminal region including the head and a portion of the rod domain, suggesting that more than one binding site is implicated in localization of NMCP1.

  15. A Conserved 20S Proteasome Assembly Factor Requires a C-terminal HbYX Motif for Proteasomal Precursor Binding

    PubMed Central

    Kusmierczyk, Andrew R.; Kunjappu, Mary J.; Kim, Roger Y.; Hochstrasser, Mark

    2011-01-01

    Dedicated chaperones facilitate eukaryotic proteasome assembly, yet how they function remains largely unknown. Here we demonstrate that a yeast 20S proteasome assembly factor, Pba1–Pba2, requires a previously overlooked C-terminal HbYX (hydrophobic-tyrosine-X) motif for function. HbYX motifs in proteasome activators open the 20S proteasome entry pore, but Pba1–Pba2 instead binds inactive proteasomal precursors. We discovered an archaeal ortholog of this factor, here named PbaA, that also binds preferentially to proteasomal precursors in a HbYX-dependent fashion using the same proteasomal α-ring surface pockets bound by activators. Remarkably, PbaA and the related PbaB protein can be induced to bind mature 20S proteasomes if the active sites in the central chamber are occupied by inhibitors. Our data suggest an allosteric mechanism in which proteasome active-site maturation determines assembly chaperone binding, potentially shielding assembly intermediates or misassembled complexes from non-productive associations until assembly is complete. PMID:21499243
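
    The HbYX motif is defined above only as hydrophobic-tyrosine-X at the C-terminus. A minimal sketch of a check for it follows. The set of residues treated as hydrophobic is an assumption made for this example, not a definition from the abstract, and the function name and test sequences are hypothetical.

```python
import re

# Assumption: residues counted as "hydrophobic" for this sketch.
# The abstract does not specify the set, so adjust as needed.
HYDROPHOBIC = "AVLIMFWC"

# Hydrophobic residue, then tyrosine, then any residue, anchored at the C-terminus.
HBYX_RE = re.compile(f"[{HYDROPHOBIC}]Y.$")

def has_cterm_hbyx(protein: str) -> bool:
    """Return True if the last three residues fit hydrophobic-Tyr-X."""
    return bool(HBYX_RE.search(protein.strip().upper()))

# Hypothetical sequences, invented for illustration.
print(has_cterm_hbyx("MKTLYA"))    # ends in L-Y-A
print(has_cterm_hbyx("MKTLYAG"))   # motif present but not at the C-terminus
```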

  16. Tumor-associated mutations in a conserved structural motif alter physical and biochemical properties of human RAD51 recombinase

    PubMed Central

    Chen, Jianhong; Morrical, Milagros D.; Donigan, Katherine A.; Weidhaas, Joanne B.; Sweasy, Joann B.; Averill, April M.; Tomczak, Jennifer A.; Morrical, Scott W.

    2015-01-01

    Human RAD51 protein catalyzes DNA pairing and strand exchange reactions that are central to homologous recombination and homology-directed DNA repair. Successful recombination/repair requires the formation of a presynaptic filament of RAD51 on ssDNA. Mutations in BRCA2 and other proteins that control RAD51 activity are associated with human cancer. Here we describe a set of mutations associated with human breast tumors that occur in a common structural motif of RAD51. Tumor-associated D149N, R150Q and G151D mutations map to a Schellman loop motif located on the surface of the RecA homology domain of RAD51. All three variants are proficient in DNA strand exchange, but G151D is slightly more sensitive to salt than wild-type (WT). Both G151D and R150Q exhibit markedly lower catalytic efficiency for adenosine triphosphate hydrolysis compared to WT. All three mutations alter the physical properties of RAD51 nucleoprotein filaments, with G151D showing the most dramatic changes. G151D forms mixed nucleoprotein filaments with WT RAD51 that have intermediate properties compared to unmixed filaments. These findings raise the possibility that mutations in RAD51 itself may contribute to genome instability in tumor cells, either directly through changes in recombinase properties, or indirectly through changes in interactions with regulatory proteins. PMID:25539919

  17. Mutation of the Conserved Calcium-Binding Motif in Neisseria gonorrhoeae PilC1 Impacts Adhesion but Not Piliation

    PubMed Central

    Cheng, Yuan; Johnson, Michael D. L.; Burillo-Kirch, Christine; Mocny, Jeffrey C.; Anderson, James E.; Garrett, Christopher K.; Redinbo, Matthew R.

    2013-01-01

    Neisseria gonorrhoeae PilC1 is a member of the PilC family of type IV pilus-associated adhesins found in Neisseria species and other type IV pilus-producing genera. Previously, a calcium-binding domain was described in the C-terminal domains of PilY1 of Pseudomonas aeruginosa and in PilC1 and PilC2 of Kingella kingae. Genetic analysis of N. gonorrhoeae revealed a similar calcium-binding motif in PilC1. To evaluate the potential significance of this calcium-binding region in N. gonorrhoeae, we produced recombinant full-length PilC1 and a PilC1 C-terminal domain fragment. We show that, while alterations of the calcium-binding motif disrupted the ability of PilC1 to bind calcium, they did not grossly affect the secondary structure of the protein. Furthermore, we demonstrate that both full-length wild-type PilC1 and full-length calcium-binding-deficient PilC1 inhibited gonococcal adherence to cultured human cervical epithelial cells, unlike the truncated PilC1 C-terminal domain. Similar to PilC1 in K. kingae, but in contrast to the calcium-binding mutant of P. aeruginosa PilY1, an equivalent mutation in N. gonorrhoeae PilC1 produced normal amounts of pili. However, the N. gonorrhoeae PilC1 calcium-binding mutant still had partial defects in gonococcal adhesion to ME180 cells and genetic transformation, which are both essential virulence factors in this human pathogen. Thus, we conclude that calcium binding to PilC1 plays a critical role in pilus function in N. gonorrhoeae. PMID:24002068

  18. Ovodefensins, an Oviduct-Specific Antimicrobial Gene Family, Have Evolved in Birds and Reptiles to Protect the Egg by Both Sequence and Intra-Six-Cysteine Sequence Motif Spacing.

    PubMed

    Whenham, Natasha; Lu, Tian Chee; Maidin, Maisarah B M; Wilson, Peter W; Bain, Maureen M; Stevenson, M Lynn; Stevens, Mark P; Bedford, Michael R; Dunn, Ian C

    2015-06-01

    Ovodefensins are a novel beta defensin-related family of antimicrobial peptides containing conserved glycine and six cysteine residues. Originally thought to be restricted to the albumen-producing region of the avian oviduct, expression was found in chicken, turkey, duck, and zebra finch in large quantities in many parts of the oviduct, but this varied between species and between gene forms in the same species. Using new search strategies, the ovodefensin family now has 35 members, including reptiles, but no representatives outside birds and reptiles have been found. Analysis of their evolution shows that ovodefensins divide into six groups based on the intra-cysteine amino acid spacing, representing a unique mechanism alongside traditional evolution of sequence. The groups have been used to base a nomenclature for the family. Antimicrobial activity for three ovodefensins from chicken and duck was confirmed against Escherichia coli and a pathogenic E. coli strain as well as a Gram-positive organism, Staphylococcus aureus, for the first time. However, activity varied greatly between peptides, with Gallus gallus OvoDA1 being the most potent, suggesting a link with the different structures. Expression of Gallus gallus OvoDA1 (gallin) in the oviduct was increased by estrogen and progesterone and in the reproductive state. Overall, the results support the hypothesis that ovodefensins evolved to protect the egg, but they are not necessarily restricted to the egg white. Therefore, divergent motif structure and sequence present an interesting area of research for antimicrobial peptide design and understanding protection of the cleidoic egg.

  19. Loop Sequence Context Influences the Formation and Stability of the i-Motif for DNA Oligomers of Sequence (CCCXXX)4, where X = A and/or T, under Slightly Acidic Conditions.

    PubMed

    McKim, Mikeal; Buxton, Alexander; Johnson, Courtney; Metz, Amanda; Sheardy, Richard D

    2016-08-11

    The structure and stability of DNA is highly dependent upon the sequence context of the bases (A, G, C, and T) and the environment under which the DNA is prepared (e.g., buffer, temperature, pH, ionic strength). Understanding the factors that influence structure and stability of the i-motif conformation can lead to the design of DNA sequences with highly tunable properties. We have been investigating the influence of pH and temperature on the conformations and stabilities for all permutations of the DNA sequence (CCCXXX)4, where X = A and/or T, using spectroscopic approaches. All oligomers undergo transitions from single-stranded structures at pH 7.0 to i-motif conformations at pH 5.0 as evidenced by circular dichroism (CD) studies. These folded structures possess stacked C:CH(+) base pairs joined by loops of 5'-XXX-3'. Although the pH at the midpoint of the transition (pHmp) varies slightly with loop sequence, the linkage between pH and log K for the proton induced transition is highly loop sequence dependent. All oligomers also undergo the thermally induced i-motif to single-strand transition at pH 5.0 as the temperature is increased from 25 to 95 °C. The temperature at the midpoint of this transition (Tm) is also highly dependent on loop sequence context effects. For seven of eight possible permutations, the pH induced, and thermally induced transitions appear to be highly cooperative and two state. Analysis of the CD optical melting profiles via a van't Hoff approach reveals sequence-dependent thermodynamic parameters for the unfolding as well. Together, these data reveal that the i-motif conformation exhibits exquisite sensitivity to loop sequence context with respect to formation and stability. PMID:27438583
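
    The "eight possible permutations" mentioned above follow from three loop positions with two choices each (2^3 = 8). The sketch below enumerates them under the assumption that the same XXX loop is repeated in all four units of (CCCXXX)4, which is one reading of the abstract. The function names are hypothetical.

```python
from itertools import product

def loop_variants(bases: str = "AT", length: int = 3):
    """Enumerate every loop sequence XXX with X drawn from the given bases."""
    return ["".join(p) for p in product(bases, repeat=length)]

def oligomer(loop: str, repeats: int = 4) -> str:
    """Build (CCC + loop) repeated, assuming the same loop in every repeat."""
    return ("CCC" + loop) * repeats

for loop in loop_variants():
    print(loop, oligomer(loop))
```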

  20. A short conserved motif in ALYREF directs cap- and EJC-dependent assembly of export complexes on spliced mRNAs.

    PubMed

    Gromadzka, Agnieszka M; Steckelberg, Anna-Lena; Singh, Kusum K; Hofmann, Kay; Gehring, Niels H

    2016-03-18

    The export of messenger RNAs (mRNAs) is the final of several nuclear posttranscriptional steps of gene expression. The formation of export-competent mRNPs involves the recruitment of export factors that are assumed to facilitate transport of the mature mRNAs. Using in vitro splicing assays, we show that a core set of export factors, including ALYREF, UAP56 and DDX39, readily associate with the spliced RNAs in an EJC (exon junction complex)- and cap-dependent manner. In order to elucidate how ALYREF and other export adaptors mediate mRNA export, we conducted a computational analysis and discovered four short, conserved, linear motifs present in RNA-binding proteins. We show that mutation in one of the new motifs (WxHD) in an unstructured region of ALYREF reduced RNA binding and abolished the interaction with eIF4A3 and CBP80. Additionally, the mutation impaired proper localization to nuclear speckles and export of a spliced reporter mRNA. Our results reveal important details of the orchestrated recruitment of export factors during the formation of export competent mRNPs. PMID:26773052

  1. A short conserved motif in ALYREF directs cap- and EJC-dependent assembly of export complexes on spliced mRNAs

    PubMed Central

    Gromadzka, Agnieszka M.; Steckelberg, Anna-Lena; Singh, Kusum K.; Hofmann, Kay; Gehring, Niels H.

    2016-01-01

    The export of messenger RNAs (mRNAs) is the final of several nuclear posttranscriptional steps of gene expression. The formation of export-competent mRNPs involves the recruitment of export factors that are assumed to facilitate transport of the mature mRNAs. Using in vitro splicing assays, we show that a core set of export factors, including ALYREF, UAP56 and DDX39, readily associate with the spliced RNAs in an EJC (exon junction complex)- and cap-dependent manner. In order to elucidate how ALYREF and other export adaptors mediate mRNA export, we conducted a computational analysis and discovered four short, conserved, linear motifs present in RNA-binding proteins. We show that mutation in one of the new motifs (WxHD) in an unstructured region of ALYREF reduced RNA binding and abolished the interaction with eIF4A3 and CBP80. Additionally, the mutation impaired proper localization to nuclear speckles and export of a spliced reporter mRNA. Our results reveal important details of the orchestrated recruitment of export factors during the formation of export competent mRNPs. PMID:26773052

  2. Trypanosoma cruzi Binds to Cytokeratin through Conserved Peptide Motifs Found in the Laminin-G-Like Domain of the gp85/Trans-sialidase Proteins

    PubMed Central

    Teixeira, Andre Azevedo Reis; de Vasconcelos, Veronica de Cássia Sardinha; Colli, Walter; Alves, Maria Júlia Manso; Giordano, Ricardo José

    2015-01-01

    Background Chagas' disease, caused by the protozoan parasite Trypanosoma cruzi, is a disease that affects millions of people, most of them living in South and Central America. There are few treatment options for individuals with Chagas' disease, making it important to understand the molecular details of parasite infection, so novel therapeutic alternatives may be developed for these patients. Here, we investigate the interaction between host cell intermediate filament proteins and the T. cruzi gp85 glycoprotein superfamily with hundreds of members that have long been implicated in parasite cell invasion. Methodology/Principal Findings An in silico analysis was utilized to identify peptide motifs shared by the gp85 T. cruzi proteins and, using phage display, these selected peptide motifs were screened for their ability to bind to cells. One peptide, named TS9, showed significant cell binding capacity and was selected for further studies. Affinity chromatography, phage display and invasion assays revealed that peptide TS9 binds to cytokeratins and vimentin, and prevents T. cruzi cell infection. Interestingly, peptide TS9 and a previously identified binding site for intermediate filament proteins are disposed in an antiparallel β-sheet fold, present in a conserved laminin-G-like domain shared by all members of the family. Moreover, peptide TS9 overlaps with an immunodominant T cell epitope. Conclusions/Significance Taken together, the present study reinforces previous results from our group implicating the gp85 superfamily of glycoproteins and the intermediate filament proteins cytokeratin and vimentin in the parasite infection process. It also suggests an important role in parasite biology for the conserved laminin-G-like domain, present in all members of this large family of cell surface proteins. PMID:26398185

  3. Targeted mutations in a highly conserved motif of the nsp1β protein impair the interferon antagonizing activity of porcine reproductive and respiratory syndrome virus.

    PubMed

    Li, Yanhua; Zhu, Longchao; Lawson, Steven R; Fang, Ying

    2013-09-01

    Non-structural protein 1β (nsp1β) of porcine reproductive and respiratory syndrome virus (PRRSV) contains a papain-like cysteine protease (PLPβ) domain and has been identified as the main viral protein antagonizing the host innate immune response. In this study, nsp1β was determined to suppress the expression of reporter genes as well as to suppress 'self-expression' in transfected cells, and this activity appeared to be associated with its interferon (IFN) antagonist function. To knock down the effect of nsp1β on IFN activity, a panel of site-specific mutations in nsp1β was analysed. Double mutations K130A/R134A (type 1 PRRSV) or K124A/R128A (type 2 PRRSV) targeting a highly conserved motif of nsp1β, GKYLQRRLQ (in bold), impaired the ability of nsp1β to suppress IFN-β and reporter gene expression, as well as to suppress 'self-expression' in vitro. Subsequently, viable recombinant viruses vSD01-08-K130A/R134A and vSD95-21-K124A/R128A, containing double mutations in the GKYLQRRLQ motif were generated using reverse genetics. In comparison with WT viruses, these nsp1β mutants showed impaired growth ability in infected cells, but the PLPβ cleavage function was not directly affected. The expression of selected innate immune genes was determined in vSD95-21-K124A/R128A mutant-infected cells. The results consistently showed that gene expression levels of IFN-α, IFN-β and IFN-stimulated gene 15 were upregulated in cells that were infected with the vSD95-21-K124A/R128A compared with that of WT virus. These data suggest that PRRSV nsp1β may selectively suppress cellular gene expression, including expression of genes involved in the host innate immune function. Modifying the key residues in the highly conserved GKYLQRRLQ motif could attenuate virus growth and improve the cellular innate immune responses. PMID:23761406

  4. A conserved predicted pseudoknot in the NS2A-encoding sequence of West Nile and Japanese encephalitis flaviviruses suggests NS1' may derive from ribosomal frameshifting

    PubMed Central

    Firth, Andrew E; Atkins, John F

    2009-01-01

    Japanese encephalitis, West Nile, Usutu and Murray Valley encephalitis viruses form a tight subgroup within the larger Flavivirus genus. These viruses utilize a single-polyprotein expression strategy, resulting in ~10 mature proteins. Plotting the conservation at synonymous sites along the polyprotein coding sequence reveals strong conservation peaks at the very 5' end of the coding sequence, and also at the 5' end of the sequence encoding the NS2A protein. Such peaks are generally indicative of functionally important non-coding sequence elements. The second peak corresponds to a predicted stable pseudoknot structure whose biological importance is supported by compensatory mutations that preserve the structure. The pseudoknot is preceded by a conserved slippery heptanucleotide (Y CCU UUU), thus forming a classical stimulatory motif for -1 ribosomal frameshifting. We hypothesize, therefore, that the functional importance of the pseudoknot is to stimulate a portion of ribosomes to shift -1 nt into a short (45 codon), conserved, overlapping open reading frame, termed foo. Since cleavage at the NS1-NS2A boundary is known to require synthesis of NS2A in cis, the resulting transframe fusion protein is predicted to be NS1-NS2AN-term-FOO. We hypothesize that this may explain the origin of the previously identified NS1 'extension' protein in JEV-group flaviviruses, known as NS1'. PMID:19196463
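
    The slippery heptanucleotide is written above as Y CCU UUU, where Y is the IUPAC code for a pyrimidine (C or U). As a rough sketch, candidate sites can be found by requiring that CCU and UUU fall on zero-frame codon boundaries. Reading the spacing in "Y CCU UUU" as codon boundaries is an assumption of this example, and the function name and toy sequences are hypothetical.

```python
import re

# Y = C or U; the lookahead reports overlapping candidates as well.
SLIPPERY_RE = re.compile(r"(?=([CU]CCUUUU))")

def slippery_sites(orf_rna: str):
    """Return 0-based starts of Y CCU UUU sites that are in frame.

    The input is assumed to begin at the first base of a zero-frame codon.
    A site counts as in frame when Y is the third base of a codon,
    so that CCU and UUU are complete zero-frame codons.
    """
    rna = orf_rna.upper().replace("T", "U")   # accept DNA or RNA spelling
    return [m.start() for m in SLIPPERY_RE.finditer(rna)
            if (m.start() + 1) % 3 == 0]

# Hypothetical toy sequences, not flavivirus sequence.
print(slippery_sites("AUGGCUCCUUUUGAA"))   # codons AUG GCU CCU UUU GAA
print(slippery_sites("AUGGUCCUUUUGAA"))    # same heptamer, shifted out of frame
```

    Finding the heptamer is only the first step. The abstract's argument also rests on the downstream pseudoknot and on synonymous-site conservation, neither of which this sketch examines.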

  5. PRINTS--a database of protein motif fingerprints.

    PubMed

    Attwood, T K; Beck, M E; Bleasby, A J; Parry-Smith, D J

    1994-09-01

    PRINTS is a compendium of protein motif 'fingerprints'. A fingerprint is defined as a group of motifs excised from conserved regions of a sequence alignment, whose diagnostic power or potency is refined by iterative database scanning (in this case the OWL composite sequence database). Generally, the motifs do not overlap, but are separated along a sequence, though they may be contiguous in 3D-space. The use of groups of independent, linearly- or spatially-distinct motifs allows protein folds and functionalities to be characterised more flexibly and powerfully than conventional single-component patterns or regular expressions. The current version of the database contains 200 entries (encoding 950 motifs), covering a wide range of globular and membrane proteins, modular polypeptides, and so on. The growth of the database is influenced by a number of factors; e.g. the use of multiple motifs; the maximisation of sequence information through iterative database scanning; and the fact that the database searched is a large composite. The information contained within PRINTS is distinct from, but complementary to the consensus expressions stored in the widely-used PROSITE dictionary of patterns.
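
    The idea of a fingerprint as several non-overlapping motifs separated along a sequence can be illustrated with a very simplified sketch. It uses regular expressions as stand-ins for the individual motifs, which is an assumption made for brevity and not how PRINTS itself represents or scans motifs. The function name, patterns and toy sequence are hypothetical.

```python
import re

def match_fingerprint(sequence: str, motifs):
    """Return motif start positions if all motifs occur in order without overlap.

    Each motif is searched for after the end of the previous match,
    taking the leftmost hit each time. Returns None if any motif is missing.
    """
    positions = []
    cursor = 0
    for pattern in motifs:
        match = re.compile(pattern).search(sequence, cursor)
        if match is None:
            return None
        positions.append(match.start())
        cursor = match.end()   # the next motif must start after this one ends
    return positions

# Hypothetical three-motif fingerprint and toy sequence.
fingerprint = ["GKS[TS]", "DEAD", "SAT"]
toy = "AAGKSTLLLDEADQQQSAT"
print(match_fingerprint(toy, fingerprint))
```

    Requiring every motif in a fixed order is stricter than a real fingerprint search needs to be; it is chosen here only to keep the example short.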

  6. Sequence and spatiotemporal expression analysis of CLE-motif containing genes from the reniform nematode (Rotylenchulus reniformis Linford & Oliveira)

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The reniform nematode, Rotylenchulus reniformis, is a sedentary semi-endoparasitic species with a host range that encompasses more than 77 plant families. Nematode effector proteins containing plant-ligand motifs similar to CLAVATA3/ESR (CLE) peptides have been identified in the Heterodera, Globode...

  7. Nuclear Magnetic Resonance Solution Structures of Lacticin Q and Aureocin A53 Reveal a Structural Motif Conserved among Leaderless Bacteriocins with Broad-Spectrum Activity.

    PubMed

    Acedo, Jeella Z; van Belkum, Marco J; Lohans, Christopher T; Towle, Kaitlyn M; Miskolzie, Mark; Vederas, John C

    2016-02-01

    Lacticin Q (LnqQ) and aureocin A53 (AucA) are leaderless bacteriocins from Lactococcus lactis QU5 and Staphylococcus aureus A53, respectively. These bacteriocins are characterized by the absence of an N-terminal leader sequence and are active against a broad range of Gram-positive bacteria. LnqQ and AucA consist of 53 and 51 amino acids, respectively, and have 47% identical sequences. In this study, their three-dimensional structures were elucidated using solution nuclear magnetic resonance and were shown to consist of four α-helices that assume a very similar compact, globular overall fold (root-mean-square deviation of 1.7 Å) with a highly cationic surface and a hydrophobic core. The structures of LnqQ and AucA resemble the shorter two-component leaderless bacteriocins, enterocins 7A and 7B, despite having low levels of sequence identity. Homology modeling revealed that the observed structural motif may be shared among leaderless bacteriocins with broad-spectrum activity against Gram-positive organisms. The elucidated structures of LnqQ and AucA also exhibit some resemblance to circular bacteriocins. Despite their similar overall fold, inhibition studies showed that LnqQ and AucA have different antimicrobial potency against the Gram-positive strains tested, suggesting that sequence disparities play a crucial role in their mechanisms of action.

  8. A methodology for motif discovery employing iterated cluster re-assignment.

    PubMed

    Abul, Osman; Drabløs, Finn; Sandve, Geir Kjetil

    2006-01-01

    Motif discovery is a crucial part of regulatory network identification, and therefore widely studied in the literature. Motif discovery programs search for statistically significant, well-conserved and over-represented patterns in given promoter sequences. When gene expression data is available, there are three main paradigms for motif discovery: cluster-first, regression, and joint probabilistic. The success of motif discovery depends highly on the homogeneity of input sequences, regardless of the paradigm employed. In this work, we propose a methodology for getting homogeneous subsets from input sequences for increased motif discovery performance. It is a unification of cluster-first and regression paradigms based on iterative cluster re-assignment. The experimental results show the effectiveness of the methodology.

  9. Enzymatic activity of poliovirus RNA polymerase mutants with single amino acid changes in the conserved YGDD amino acid motif.

    PubMed

    Jablonski, S A; Luo, M; Morrow, C D

    1991-09-01

    RNA-dependent RNA polymerases contain a highly conserved region of amino acids with a core segment composed of the amino acids YGDD which have been hypothesized to be at or near the catalytic active site of the molecule. Six mutations in this conserved YGDD region of the poliovirus RNA-dependent RNA polymerase were made by using oligonucleotide site-directed DNA mutagenesis of the poliovirus cDNA to substitute A, C, M, P, S, or V for the amino acid G. The mutant polymerase genes were expressed in Escherichia coli, and the purified RNA polymerases were tested for in vitro enzyme activity. Two of the mutant RNA polymerases (those in which the glycine residue was replaced with alanine or serine) exhibited in vitro enzymatic activity ranging from 5 to 20% of wild-type activity, while the remaining mutant RNA polymerases were inactive. Alterations in the in vitro reaction conditions by modification of temperature, metal ion concentration, or pH resulted in no significant differences in the activities of the mutant RNA polymerases relative to that of the wild-type enzyme. An antipeptide antibody directed against the wild-type core amino acid segment containing the YGDD region of the poliovirus polymerase reacted with the wild-type recombinant RNA polymerase and to a limited extent with the two enzymatically active mutant polymerases; the antipeptide antibody did not react with the mutant RNA polymerases which did not have in vitro enzyme activity. These results are discussed in the context of secondary-structure predictions for the core segment containing the conserved YGDD amino acids in the poliovirus RNA polymerase. PMID:1651402

  10. MINER: software for phylogenetic motif identification.

    PubMed

    La, David; Livesay, Dennis R

    2005-07-01

    MINER is web-based software for phylogenetic motif (PM) identification. PMs are sequence regions (fragments) that conserve the overall familial phylogeny. PMs have been shown to correspond to a wide variety of catalytic regions, substrate-binding sites and protein interfaces, making them ideal functional site predictions. The MINER output provides an intuitive interface for interactive PM sequence analysis and structural visualization. The web implementation of MINER is freely available at http://www.pmap.csupomona.edu/MINER/. Source code is available to the academic community on request.

  11. Isolation and comparative analysis of the wheat TaPT2 promoter: identification in silico of new putative regulatory motifs conserved between monocots and dicots.

    PubMed

    Tittarelli, A; Milla, L; Vargas, F; Morales, A; Neupert, C; Meisel, L A; Salvo-G, H; Peñaloza, E; Muñoz, G; Corcuera, L J; Silva, H

    2007-01-01

    Phosphorus deficiency is one of the major nutrient stresses affecting plant growth. Plants respond to phosphate (Pi) deficiency through multiple strategies, including the synthesis of high-affinity Pi transporters. In this study, the expression pattern of one putative wheat high-affinity phosphate transporter, TaPT2, was examined in roots and leaves under Pi-deficient conditions. TaPT2 transcript levels increased in roots of Pi-starved plants. A 579 bp fragment of the TaPT2 promoter is sufficient to drive the expression of the GUS reporter gene specifically in roots of Pi-deprived wheat. This TaPT2 promoter fragment was also able to drive expression of the GUS reporter gene in transgenic Arabidopsis thaliana, under similar growth conditions. Conserved regions and candidate regulatory motifs were detected by comparing this promoter with Pi transporter promoters from barley, rice, and Arabidopsis. Altogether, these results indicate that there are conserved cis-acting elements and trans-acting factors that enable the TaPT2 promoter to be regulated in a tissue-specific and Pi-dependent fashion in both monocots and dicots.

  12. Mutations in a Highly Conserved Motif of nsp1β Protein Attenuate the Innate Immune Suppression Function of Porcine Reproductive and Respiratory Syndrome Virus

    PubMed Central

    Li, Yanhua; Shyu, Duan-Liang; Shang, Pengcheng; Bai, Jianfa; Ouyang, Kang; Dhakal, Santosh; Hiremath, Jagadish; Binjawadagi, Basavaraj

    2016-01-01

    ABSTRACT Porcine reproductive and respiratory syndrome virus (PRRSV) nonstructural protein 1β (nsp1β) is a multifunctional viral protein, which is involved in suppressing the host innate immune response and activating a unique −2/−1 programmed ribosomal frameshifting (PRF) signal for the expression of frameshifting products. In this study, site-directed mutagenesis analysis showed that the R128A or R129A mutation introduced into a highly conserved motif (123GKYLQRRLQ131) reduced the ability of nsp1β to suppress interferon beta (IFN-β) activation and also impaired nsp1β's function as a PRF transactivator. Three recombinant viruses, vR128A, vR129A, and vRR129AA, carrying single or double mutations in the GKYLQRRLQ motif were characterized. In comparison to the wild-type (WT) virus, vR128A and vR129A showed slightly reduced growth abilities, while the vRR129AA mutant had a significantly reduced growth ability in infected cells. Consistent with the attenuated growth phenotype in vitro, pigs infected with nsp1β mutants had lower levels of viremia than did WT virus-infected pigs. Compared to the WT virus in infected cells, all three mutated viruses stimulated high levels of IFN-α expression and exhibited a reduced ability to suppress the mRNA expression of selected interferon-stimulated genes (ISGs). In pigs infected with nsp1β mutants, IFN-α production was increased in the lungs at early time points postinfection, which was correlated with increased innate NK cell function. Furthermore, the augmented innate response was consistent with the increased production of IFN-γ in pigs infected with mutated viruses. These data demonstrate that residues R128 and R129 are critical for nsp1β function and that modifying these key residues in the GKYLQRRLQ motif attenuates virus growth ability and improves the innate and adaptive immune responses in infected animals. IMPORTANCE PRRSV infection induces poor antiviral innate IFN and cytokine responses, which results in

  13. PscanChIP: finding over-represented transcription factor-binding site motifs and their correlations in sequences from ChIP-Seq experiments

    PubMed Central

    Zambelli, Federico; Pesole, Graziano; Pavesi, Giulio

    2013-01-01

    Chromatin immunoprecipitation followed by sequencing with next-generation technologies (ChIP-Seq) has become the de facto standard for building genome-wide maps of regions bound by a given transcription factor (TF). The regions identified, however, have to be further analyzed to determine the actual DNA-binding sites for the TF, as well as sites for other TFs belonging to the same TF complex or in general co-operating or interacting with it in transcription regulation. PscanChIP is a web server that, starting from a collection of genomic regions derived from a ChIP-Seq experiment, scans them using motif descriptors like JASPAR or TRANSFAC position-specific frequency matrices, or descriptors uploaded by users, and it evaluates both motif enrichment and positional bias within the regions according to different measures and criteria. PscanChIP can successfully identify not only the actual binding sites for the TF investigated by a ChIP-Seq experiment but also secondary motifs corresponding to other TFs that tend to bind the same regions, and, if present, precise positional correlations among their respective sites. The web interface is free for use, and there is no login requirement. It is available at http://www.beaconlab.it/pscan_chip_dev. PMID:23748563
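
    The matrix-scanning step mentioned in the abstract above can be illustrated with a minimal sketch. This is not PscanChIP's code: the count matrix, the pseudocount, the uniform background, and the function names are all assumptions made for illustration, and the enrichment and positional-bias statistics computed by the server are not reproduced here.

```python
import math

# Hypothetical 4-column count matrix (rows = bases, columns = motif positions).
# It is NOT taken from JASPAR or TRANSFAC; its consensus is simply "AGCT".
COUNTS = {
    "A": [8, 0, 0, 1],
    "C": [1, 0, 9, 1],
    "G": [0, 10, 0, 1],
    "T": [1, 0, 1, 7],
}


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores.

    Assumes a uniform background frequency and adds a pseudocount so that
    bases never observed at a position still get a finite score.
    """
    width = len(next(iter(counts.values())))
    matrix = []
    for i in range(width):
        total = sum(counts[base][i] for base in "ACGT") + 4 * pseudocount
        matrix.append({
            base: math.log2(((counts[base][i] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def best_hit(sequence, matrix):
    """Return (start, score, window) for the highest-scoring window, or None."""
    width = len(matrix)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if best is None or score > best[1]:
            best = (start, score, window)
    return best


MATRIX = log_odds_matrix(COUNTS)
hit = best_hit("TTAGCTGG", MATRIX)  # toy region, forward strand only
```

    A real analysis would also scan the reverse strand and compare hit scores against a background set of regions; both are omitted to keep the sketch short.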

  14. Sequence motif upstream of the Hendra virus fusion protein cleavage site is not sufficient to promote efficient proteolytic processing

    SciTech Connect

    Craft, Willie Warren; Dutch, Rebecca Ellis (e-mail: rdutc2@uky.edu)

    2005-10-10

    The Hendra virus fusion (HeV F) protein is synthesized as a precursor, F0, and proteolytically cleaved into the mature F1 and F2 heterodimer, following an HDLVDGVK109 motif. This cleavage event is required for fusogenic activity. To determine the amino acid requirements for processing of the HeV F protein, we constructed multiple mutants. Individual and simultaneous alanine substitutions of the eight residues immediately upstream of the cleavage site did not eliminate processing. A chimeric SV5 F protein in which the furin site was substituted for the VDGVK109 motif of the HeV F protein was not processed but was expressed on the cell surface. Another chimeric SV5 F protein containing the HDLVDGVK109 motif of the HeV F protein underwent partial cleavage. These data indicate that the upstream region can play a role in protease recognition, but is neither absolutely required nor sufficient for efficient processing of the HeV F protein.

  15. D-SLIMMER: domain-SLiM interaction motifs miner for sequence based protein-protein interaction data.

    PubMed

    Hugo, Willy; Ng, See-Kiong; Sung, Wing-Kin

    2011-12-01

    Many biologically important protein-protein interactions (PPIs) have been found to be mediated by short linear motifs (SLiMs). These interactions are mediated by the binding of a protein domain, often with a nonlinear interaction interface, to a SLiM. We propose a method called D-SLIMMER to mine for SLiMs in PPI data on the basis of the interaction density between a nonlinear motif (i.e., a protein domain) in one protein and a SLiM in the other protein. Our results on a benchmark of 113 experimentally verified reference SLiMs showed that D-SLIMMER notably outperformed existing methods for discovering domain-SLiM interaction motifs. To illustrate the significance of the SLiMs detected, we highlighted two SLiMs discovered from the PPI data by D-SLIMMER that are variants of the known ELM SLiM, as well as a literature-backed SLiM that is yet to be listed in the reference databases. We also presented a novel SLiM predicted by D-SLIMMER that was strongly supported by the existing biological literature. These examples showed that D-SLIMMER is able to find SLiMs that are biologically relevant.

  16. NNAlign: a web-based prediction method allowing non-expert end-user discovery of sequence motifs in quantitative peptide data.

    PubMed

    Andreatta, Massimo; Schafer-Nielsen, Claus; Lund, Ole; Buus, Søren; Nielsen, Morten

    2011-01-01

    Recent advances in high-throughput technologies have made it possible to generate both gene and protein sequence data at an unprecedented rate and scale, thereby enabling entirely new "omics"-based approaches towards the analysis of complex biological processes. However, the amount and complexity of data that even a single experiment can produce seriously challenge researchers with limited bioinformatics expertise, who need to handle, analyze and interpret the data before it can be understood in a biological context. Thus, there is an unmet need for tools allowing non-bioinformatics users to interpret large data sets. We have recently developed a method, NNAlign, which is generally applicable to any biological problem where quantitative peptide data is available. This method efficiently identifies underlying sequence patterns by simultaneously aligning peptide sequences and identifying motifs associated with quantitative readouts. Here, we provide a web-based implementation of NNAlign allowing non-expert end-users to submit their data (optionally adjusting method parameters), and in return receive a trained method (including a visual representation of the identified motif) that can subsequently be used as a prediction method and applied to unknown proteins/peptides. We have successfully applied this method to several different data sets including peptide microarray-derived sets containing more than 100,000 data points. NNAlign is available online at http://www.cbs.dtu.dk/services/NNAlign.

  17. The Evolutionarily Conserved Tre2/Bub2/Cdc16 (TBC), Lysin Motif (LysM), Domain Catalytic (TLDc) Domain Is Neuroprotective against Oxidative Stress

    PubMed Central

    Finelli, Mattéa J.; Sanchez-Pulido, Luis; Liu, Kevin X; Davies, Kay E.; Oliver, Peter L.

    2016-01-01

    Oxidative stress is a pathological feature of many neurological disorders; therefore, utilizing proteins that are protective against such cellular insults is a potentially valuable therapeutic approach. Oxidation resistance 1 (OXR1) has been shown previously to be critical for oxidative stress resistance in neuronal cells; deletion of this gene causes neurodegeneration in mice, yet conversely, overexpression of OXR1 is protective in cellular and mouse models of amyotrophic lateral sclerosis. However, the molecular mechanisms involved are unclear. OXR1 contains the Tre2/Bub2/Cdc16 (TBC), lysin motif (LysM), domain catalytic (TLDc) domain, a motif present in a family of proteins including TBC1 domain family member 24 (TBC1D24), a protein mutated in a range of disorders characterized by seizures, hearing loss, and neurodegeneration. The TLDc domain is highly conserved across species, although the structure-function relationship is unknown. To understand the role of this domain in the stress response, we carried out systematic analysis of all mammalian TLDc domain-containing proteins, investigating their expression and neuroprotective properties in parallel. In addition, we performed a detailed structural and functional study of this domain in which we identified key residues required for its activity. Finally, we present a new mouse insertional mutant of Oxr1, confirming that specific disruption of the TLDc domain in vivo is sufficient to cause neurodegeneration. Our data demonstrate that the integrity of the TLDc domain is essential for conferring neuroprotection, an important step in understanding the functional significance of all TLDc domain-containing proteins in the cellular stress response and disease. PMID:26668325

  18. The valine and lysine residues in the conserved FxVTxK motif are important for the function of phylogenetically distant plant cellulose synthases.

    PubMed

    Slabaugh, Erin; Scavuzzo-Duggan, Tess; Chaves, Arielle; Wilson, Liza; Wilson, Carmen; Davis, Jonathan K; Cosgrove, Daniel J; Anderson, Charles T; Roberts, Alison W; Haigler, Candace H

    2016-05-01

    Cellulose synthases (CESAs) synthesize the β-1,4-glucan chains that coalesce to form cellulose microfibrils in plant cell walls. In addition to a large cytosolic (catalytic) domain, CESAs have eight predicted transmembrane helices (TMHs). However, analogous to the structure of BcsA, a bacterial CESA, predicted TMH5 in CESA may instead be an interfacial helix. This would place the conserved FxVTxK motif in the plant cell cytosol where it could function as a substrate-gating loop as occurs in BcsA. To define the functional importance of the CESA region containing FxVTxK, we tested five parallel mutations in Arabidopsis thaliana CESA1 and Physcomitrella patens CESA5 in complementation assays of the relevant cesa mutants. In both organisms, the substitution of the valine or lysine residues in FxVTxK severely affected CESA function. In Arabidopsis roots, both changes were correlated with lower cellulose anisotropy, as revealed by Pontamine Fast Scarlet. Analysis of hypocotyl inner cell wall layers by atomic force microscopy showed that two altered versions of Atcesa1 could rescue cell wall phenotypes observed in the mutant background line. Overall, the data show that the FxVTxK motif is functionally important in two phylogenetically distant plant CESAs. The results show that Physcomitrella provides an efficient model for assessing the effects of engineered CESA mutations affecting primary cell wall synthesis and that diverse testing systems can lead to nuanced insights into CESA structure-function relationships. Although CESA membrane topology needs to be experimentally determined, the results support the possibility that the FxVTxK region functions similarly in CESA and BcsA.
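
    A pattern such as FxVTxK, where x stands for any residue, maps directly onto a regular expression. The sketch below is illustrative only: the toy sequence is hypothetical (it is not a CESA sequence), the function name is an assumption, and a lookahead is used so that overlapping occurrences are also reported.

```python
import re

# The 20 standard amino acids, used for the wildcard positions written as "x".
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# FxVTxK as a regular expression. The zero-width lookahead (?=...) lets
# finditer report matches that overlap one another.
FXVTXK = re.compile(rf"(?=(F[{AMINO_ACIDS}]VT[{AMINO_ACIDS}]K))")


def find_fxvtxk(protein):
    """Return (1-based start position, matched residues) for each occurrence."""
    return [(m.start() + 1, m.group(1)) for m in FXVTXK.finditer(protein.upper())]


# Hypothetical toy sequence, NOT a real cellulose synthase.
toy_protein = "MSTAFLVTGKPLLFAVTFKVTAKQ"
hits = find_fxvtxk(toy_protein)
```

    Positions are reported 1-based, following the usual convention for protein residue numbering.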

  19. Molecular characterization of flavanone 3 beta-hydroxylases. Consensus sequence, comparison with related enzymes and the role of conserved histidine residues.

    PubMed

    Britsch, L; Dedio, J; Saedler, H; Forkmann, G

    1993-10-15

    A heterologous cDNA probe from Petunia hybrida was used to isolate flavanone-3 beta-hydroxylase-encoding cDNA clones from carnation (Dianthus caryophyllus), china aster (Callistephus chinensis) and stock (Matthiola incana). The deduced protein sequences together with the known sequences of the enzyme from P. hybrida, barley (Hordeum vulgare) and snapdragon (Antirrhinum majus) enabled the determination of a consensus sequence which revealed an overall 84% similarity (53% identity) of flavanone 3 beta-hydroxylases from the different sources. Alignment with the sequences of other known enzymes of the same class and to related non-heme iron-(II) enzymes demonstrated the strict genetic conservation of 14 amino acids, in particular, of three histidines and an aspartic acid. The conservation of the histidine motifs provides strong support for the possible conservation of structurally similar iron-binding sites in these enzymes. The putative role of histidines as chelators of ferrous ions in the active site of flavanone 3 beta-hydroxylases was corroborated by diethyl-pyrocarbonate modification of the partially purified recombinant Petunia enzyme.

  20. The preference of the mitochondrial endonuclease for a conserved sequence block in mitochondrial DNA is highly conserved during mammalian evolution.

    PubMed Central

    Low, R L; Buzan, J M; Couper, C L

    1988-01-01

    Endonuclease activities identified in crude preparations of rat and human heart mitochondria have each been partially purified and characterized. Both the rat and human activities purify as a single enzyme that closely resembles the endonuclease of bovine-heart mitochondria (Cummings, O.W. et al. (1987) J. Biol. Chem. 262:2005-2015). All three enzymes, for example, elute similarly during gel filtration and DNA-cellulose chromatography, and exhibit similar enzymatic properties. Although the nucleotide sequences of the mtDNAs indicate that an unusual degree of divergence has occurred in the displacement-loop region during mammalian evolution, the nucleotide specificities of the mt endonucleases appear highly conserved and show a striking preference for an evolutionarily-conserved sequence tract that is located upstream from the heavy (H)-strand origin of DNA replication (OriH). PMID:3399407

  1. Conversion of a helix-turn-helix motif sequence-specific DNA binding protein into a site-specific DNA cleavage agent.

    PubMed Central

    Ebright, R H; Ebright, Y W; Pendergrast, P S; Gunasekera, A

    1990-01-01

    Escherichia coli catabolite gene activator protein (CAP) is a helix-turn-helix motif sequence-specific DNA binding protein [de Crombrugghe, B., Busby, S. & Buc, H. (1984) Science 224, 831-838; and Pabo, C. & Sauer, R. (1984) Annu. Rev. Biochem. 53, 293-321]. In this work, CAP has been converted into a site-specific DNA cleavage agent by incorporation of the chelator 1,10-phenanthroline at amino acid 10 of the helix-turn-helix motif. [(N-Acetyl-5-amino-1,10-phenanthroline)-Cys178]CAP binds to a 22-base-pair DNA recognition site with Kobs = 1 × 10^8 M^-1. In the presence of Cu(II) and reducing agent, [(N-acetyl-5-amino-1,10-phenanthroline)-Cys178]CAP cleaves DNA at four adjacent nucleotides on each DNA strand within the DNA recognition site. The DNA cleavage reaction has been demonstrated using 40-base-pair and 7164-base-pair DNA substrates. The DNA cleavage reaction is not inhibited by dam methylation of the DNA substrate. Such semisynthetic site-specific DNA cleavage agents have potential applications in chromosome mapping, cloning, and sequencing. PMID:2158096

  2. Sequence analysis shows that Lifeguard belongs to a new evolutionarily conserved cytoprotective family.

    PubMed

    Reimers, Kerstin; Choi, Claudia Y-U; Mau-Thek, Eddy; Vogt, Peter M

    2006-10-01

    Cellular sensitivity to apoptotic stimuli is determined by several regulatory proteins. The biological and biomedical impact of these regulatory proteins is of fundamental importance for understanding and controlling apoptotic processes. We used a bioinformatic approach to characterise the antiapoptotic protein Lifeguard (LFG). LFG is an evolutionarily well-conserved protein with homologues in many species. Due to its hydrophobic nature it is predicted to reside in cellular membranes, namely the endoplasmic reticulum and the plasma membrane, with seven transmembrane spanners and a small cytoplasmic domain. The consensus motif of a protein family with unknown function, UPF0005, was found in the C-terminus. The structure of Lifeguard resembles the antiapoptotic protein Bax Inhibitor-1 (BI-1). Concordantly, it was shown that Bax co-immunoprecipitates with LFG. Our results indicate that LFG belongs to a new cytoprotective family with evolutionarily conserved functions in the prevention of programmed cell death.

  3. Identification of a conserved sequence in the non-coding regions of many human genes.

    PubMed Central

    Donehower, L A; Slagle, B L; Wilde, M; Darlington, G; Butel, J S

    1989-01-01

    We have analyzed a sequence of approximately 70 base pairs (bp) that shows a high degree of similarity to sequences present in the non-coding regions of a number of human and other mammalian genes. The sequence was discovered in a fragment of human genomic DNA adjacent to an integrated hepatitis B virus genome in cells derived from human hepatocellular carcinoma tissue. When one of the viral flanking sequences was compared to nucleotide sequences in GenBank, more than thirty human genes were identified that contained a similar sequence in their non-coding regions. The sequence element was usually found once or twice in a gene, either in an intron or in the 5' or 3' flanking regions. It did not share any similarities with known short interspersed nucleotide elements (SINEs) or presently known gene regulatory elements. This element was highly conserved at the same position within the corresponding human and mouse genes for myoglobin and N-myc, indicating evolutionary conservation and possible functional importance. Preliminary DNase I footprinting data suggested that the element or its adjacent sequences may bind nuclear factors to generate specific DNase I hypersensitive sites. The size, structure, and evolutionary conservation of this sequence indicate that it is distinct from other types of short interspersed repetitive elements. It is possible that the element may have a cis-acting functional role in the genome. PMID:2536922

  4. Loop 7 of E2 enzymes: an ancestral conserved functional motif involved in the E2-mediated steps of the ubiquitination cascade.

    PubMed

    Papaleo, Elena; Casiraghi, Nicola; Arrigoni, Alberto; Vanoni, Marco; Coccetti, Paola; De Gioia, Luca

    2012-01-01

    The ubiquitin (Ub) system controls almost every aspect of eukaryotic cell biology. Protein ubiquitination depends on the sequential action of three classes of enzymes (E1, E2 and E3). E2 Ub-conjugating enzymes have a central role in the ubiquitination pathway, interacting with both E1 and E3, and influencing the ultimate fate of the substrates. Several E2s are characterized by an extended acidic insertion in loop 7 (L7), which if mutated is known to impair the proper E2-related functions. In the present contribution, we show that the acidic loop is a conserved ancestral motif in E2s, relying on the presence of alternating hydrophobic and acidic residues. Moreover, the dynamic properties of a subset of family 3 E2s, as well as their binary and ternary complexes with Ub and the cognate E3, have been investigated. Here we provide a model of the role of L7 in the different steps of the ubiquitination cascade of family 3 E2s. The L7 hydrophobic residues turned out to be the main determinant for the stabilization of the E2 inactive conformations by a tight network of interactions in the catalytic cleft. Moreover, phosphorylation is known from previous studies to promote E2 competent conformations for Ub charging, inducing electrostatic repulsion and acting on the L7 acidic residues. Here we show that these active conformations are stabilized by a network of hydrophobic interactions between L7 and L4, the latter being a conserved interface for E3-recruitment in several E2s. In the successive steps, L7 conserved acidic residues also provide an interaction interface for both Ub and the Rbx1 RING subdomain of the cognate E3. Our data therefore suggest a crucial role for L7 of family 3 E2s in all the E2-mediated steps of the ubiquitination cascade. Its different functions are exploited thanks to its conserved hydrophobic and acidic residues in a finely orchestrated mechanism.

  5. RNAMotifScanX: a graph alignment approach for RNA structural motif identification.

    PubMed

    Zhong, Cuncong; Zhang, Shaojie

    2015-03-01

    RNA structural motifs are recurrent three-dimensional (3D) components found in the RNA architecture. These RNA structural motifs play important structural or functional roles and usually exhibit highly conserved 3D geometries and base-interaction patterns. Analysis of the RNA 3D structures and elucidation of their molecular functions heavily rely on efficient and accurate identification of these motifs. However, efficient RNA structural motif search tools are lacking due to the high complexity of these motifs. In this work, we present RNAMotifScanX, a motif search tool based on a base-interaction graph alignment algorithm. This novel algorithm enables automatic identification of both partially and fully matched motif instances. RNAMotifScanX considers noncanonical base-pairing interactions, base-stacking interactions, and sequence conservation of the motifs, which leads to significantly improved sensitivity and specificity as compared with other state-of-the-art search tools. RNAMotifScanX also adopts a carefully designed branch-and-bound technique, which enables ultra-fast search of large kink-turn motifs against a 23S rRNA. The software package RNAMotifScanX is implemented using GNU C++, and is freely available from http://genome.ucf.edu/RNAMotifScanX.

  6. Relation between mRNA expression and sequence information in Desulfovibrio vulgaris: Combinatorial contributions of upstream regulatory motifs and coding sequence features to variations in mRNA abundance

    SciTech Connect

    Wu, Gang; Nie, Lei; Zhang, Weiwen

    2006-05-26

    The context-dependent expression of genes is at the core of biological activities, and significant attention has been given to the identification of various factors contributing to gene expression at genomic scale. However, so far this type of analysis has focused either on the relation between mRNA expression and non-coding sequence features such as upstream regulatory motifs or on the correlation between mRNA abundance and non-random features in coding sequences (e.g. codon usage and amino acid usage). In this study multiple regression analyses of the mRNA abundance and all sequence information in Desulfovibrio vulgaris were performed, with the goal of investigating how much coding and non-coding sequence features contribute to the variations in mRNA expression, and in what manner they act together...

  7. Detection of Weakly Conserved Ancestral Mammalian Regulatory Sequences by Primate Comparisons

    SciTech Connect

    Wang, Qian-fei; Prabhakar, Shyam; Chanan, Sumita; Cheng, Jan-Fang; Rubin, Edward M.; Boffelli, Dario

    2006-06-01

    Genomic comparisons between human and distant, non-primate mammals are commonly used to identify cis-regulatory elements based on constrained sequence evolution. However, these methods fail to detect cryptic functional elements, which are too weakly conserved among mammals to distinguish from nonfunctional DNA. To address this problem, we explored the potential of deep intra-primate sequence comparisons. We sequenced the orthologs of 558 kb of human genomic sequence, covering multiple loci involved in cholesterol homeostasis, in 6 nonhuman primates. Our analysis identified 6 noncoding DNA elements displaying significant conservation among primates, but undetectable in more distant comparisons. In vitro and in vivo tests revealed that at least three of these 6 elements have regulatory function. Notably, the mouse orthologs of these three functional human sequences had regulatory activity despite their lack of significant sequence conservation, indicating that they are cryptic ancestral cis-regulatory elements. These regulatory elements could still be detected in a smaller set of three primate species including human, rhesus and marmoset. Since the human and rhesus genome sequences are already available, and the marmoset genome is actively being sequenced, the primate-specific conservation analysis described here can be applied in the near future on a whole-genome scale, to complement the annotation provided by more distant species comparisons.

  8. Accelerated Evolution of Conserved Noncoding Sequences in the Human Genome

    SciTech Connect

    Prabhakar, Shyam; Noonan, James P.; Paabo, Svante; Rubin, Edward M.

    2006-07-06

    Genomic comparisons between human and distant, non-primate mammals are commonly used to identify cis-regulatory elements based on constrained sequence evolution. However, these methods fail to detect "cryptic" functional elements, which are too weakly conserved among mammals to distinguish from nonfunctional DNA. To address this problem, we explored the potential of deep intra-primate sequence comparisons. We sequenced the orthologs of 558 kb of human genomic sequence, covering multiple loci involved in cholesterol homeostasis, in 6 nonhuman primates. Our analysis identified 6 noncoding DNA elements displaying significant conservation among primates, but undetectable in more distant comparisons. In vitro and in vivo tests revealed that at least three of these 6 elements have regulatory function. Notably, the mouse orthologs of these three functional human sequences had regulatory activity despite their lack of significant sequence conservation, indicating that they are cryptic ancestral cis-regulatory elements. These regulatory elements could still be detected in a smaller set of three primate species including human, rhesus and marmoset. Since the human and rhesus genome sequences are already available, and the marmoset genome is actively being sequenced, the primate-specific conservation analysis described here can be applied in the near future on a whole-genome scale, to complement the annotation provided by more distant species comparisons.

  9. Phylogenetic reconstruction using secondary structures and sequence motifs of ITS2 rDNA of Paragonimus westermani (Kerbert, 1878) Braun, 1899 (Digenea: Paragonimidae) and related species

    PubMed Central

    2009-01-01

    motifs allowed an accurate in-silico distinction of lung flukes. Conclusion Data indicate that ITS2 motifs (≤ 50 bp in size) can be considered a promising tool for trematode species identification. RNA secondary structure analysis could be a valuable tool for distinguishing new species and completing Paragonimus systematics, more so because ITS2 secondary structure contains more information than the usual primary sequence alignment. PMID:19958489

  10. Comparative genomic analysis of upstream miRNA regulatory motifs in Caenorhabditis.

    PubMed

    Jovelin, Richard; Krizus, Aldis; Taghizada, Bakhtiyar; Gray, Jeremy C; Phillips, Patrick C; Claycomb, Julie M; Cutter, Asher D

    2016-07-01

    MicroRNAs (miRNAs) comprise a class of short noncoding RNA molecules that play diverse developmental and physiological roles by controlling mRNA abundance and protein output of the vast majority of transcripts. Despite the importance of miRNAs in regulating gene function, we still lack a complete understanding of how miRNAs themselves are transcriptionally regulated. To fill this gap, we predicted regulatory sequences by searching for abundant short motifs located upstream of miRNAs in eight species of Caenorhabditis nematodes. We identified three conserved motifs across the Caenorhabditis phylogeny that show clear signatures of purifying selection from comparative genomics, patterns of nucleotide changes in motifs of orthologous miRNAs, and correlation between motif incidence and miRNA expression. We then validated our predictions with transgenic green fluorescent protein reporters and site-directed mutagenesis for a subset of motifs located in an enhancer region upstream of let-7. We demonstrate that a CT-dinucleotide motif is sufficient for proper expression of GFP in the seam cells of adult C. elegans, and that two other motifs play incremental roles in combination with the CT-rich motif. Thus, functional tests of sequence motifs identified through analysis of molecular evolutionary signatures provide a powerful path for efficiently characterizing the transcriptional regulation of miRNA genes. PMID:27140965

  11. Conservation of sequence and function in fertilization of the cortical granule serine protease in echinoderms.

    PubMed

    Oulhen, Nathalie; Xu, Dongdong; Wessel, Gary M

    2014-08-01

    Conservation of the cortical granule serine protease during fertilization in echinoderms was tested both functionally in sea stars, and computationally throughout the echinoderm phylum. We find that the inhibitor of serine protease (soybean trypsin inhibitor) effectively blocks proper transition of the sea star fertilization envelope into a protective sperm repellent, whereas inhibitors of the other main types of proteases had no effect. Scanning the transcriptomes of 15 different echinoderm ovaries revealed sequences of high conservation to the originally identified sea urchin cortical serine protease, CGSP1. These conserved sequences contained the catalytic triad necessary for enzymatic activity, and the tandemly repeated LDLr-like repeats. We conclude that the protease involved in the slow block to polyspermy is an essential and conserved element of fertilization in echinoderms, and may provide an important reagent for identification and testing of the cell surface proteins in eggs necessary for sperm binding.

  12. Conservation of the human telomere sequence (TTAGGG)n among vertebrates.

    PubMed Central

    Meyne, J; Ratliff, R L; Moyzis, R K

    1989-01-01

    To determine the evolutionary origin of the human telomere sequence (TTAGGG)n, biotinylated oligodeoxynucleotides of this sequence were hybridized to metaphase spreads from 91 different species, including representative orders of bony fish, reptiles, amphibians, birds, and mammals. Under stringent hybridization conditions, fluorescent signals were detected at the telomeres of all chromosomes, in all 91 species. The conservation of the (TTAGGG)n sequence and its telomeric location, in species thought to share a common ancestor over 400 million years ago, strongly suggest that this sequence is the functional vertebrate telomere. PMID:2780561
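
    The study above detected (TTAGGG)n by in situ hybridization, not by sequence scanning. As a rough in-silico analogue only, the sketch below looks for tandem runs of the repeat unit in a DNA string. The function name `find_telomere_runs` and the `min_repeats` threshold are illustrative assumptions and do not come from the record.

```python
import re

# Repeat unit of the vertebrate telomere sequence named in the abstract.
TELOMERE_UNIT = "TTAGGG"


def find_telomere_runs(seq, min_repeats=3):
    """Return (start, end, n_repeats) for each tandem run of TTAGGG.

    min_repeats is an arbitrary illustrative cutoff, not a value from the study.
    Coordinates are 0-based, end-exclusive, on the strand given.
    """
    pattern = re.compile("(?:%s){%d,}" % (TELOMERE_UNIT, min_repeats))
    runs = []
    for match in pattern.finditer(seq.upper()):
        length = match.end() - match.start()
        runs.append((match.start(), match.end(), length // len(TELOMERE_UNIT)))
    return runs


if __name__ == "__main__":
    toy = "ACGT" + "TTAGGG" * 4 + "ACGT"  # made-up sequence
    print(find_telomere_runs(toy))
```

    Only the forward strand is scanned here; a fuller version would presumably also search for the reverse complement, CCCTAA.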

  13. Mutagenesis and biochemical studies on AuaA confirmed the importance of the two conserved aspartate-rich motifs and suggested difference in the amino acids for substrate binding in membrane-bound prenyltransferases.

    PubMed

    Stec, Edyta; Li, Shu-Ming

    2012-07-01

    AuaA is a membrane-bound farnesyltransferase from the myxobacterium Stigmatella aurantiaca involved in the biosynthesis of aurachins. Like other known membrane-bound aromatic prenyltransferases, AuaA contains two conserved aspartate-rich motifs. Several amino acids in the first motif NXxxDxxxD were proposed to be responsible for prenyl diphosphate binding via metal ions like Mg(2+). Site-directed mutagenesis experiments demonstrated in this study that asparagine, but not the arginine residue in NRxxDxxxD, is important for the enzyme activity of AuaA, differing from the importance of NQ or ND residues in the NQxxDxxxD or NDxxDxxxD motifs observed in some membrane-bound prenyltransferases. The second motif of known membrane-bound prenyltransferases was proposed to be involved in the binding of their aromatic substrates. KDIxDxEGD, also found in AuaA, had been previously speculated to be characteristic for binding of flavonoids or homogentisate. Site-directed mutagenesis experiments with AuaA showed that KDIxDxEGD was critical for the enzyme activity. However, this motif is very likely not specific for flavonoid or homogentisate prenyltransferases, because none of the tested flavonoids was accepted by AuaA or its mutant R53A in the presence of farnesyl, geranyl or dimethylallyl diphosphate.
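
    The two aspartate-rich motifs named above translate directly into regular expressions if "x" (and "X") is read as "any residue". The sketch below is a minimal illustration of that reading; the function name `find_motifs` and the toy sequence are made up for the example and are not taken from AuaA.

```python
import re

# Motif strings from the abstract, with x/X read as "any single residue".
MOTIF_PATTERNS = {
    "NXxxDxxxD": "N.{3}D.{3}D",
    "KDIxDxEGD": "KDI.D.EGD",
}


def find_motifs(protein):
    """Return (motif_name, start, matched_text) for every hit, 0-based."""
    hits = []
    seq = protein.upper()
    for name, pattern in MOTIF_PATTERNS.items():
        # A lookahead lets overlapping occurrences be reported as well.
        for match in re.finditer("(?=(%s))" % pattern, seq):
            hits.append((name, match.start(), match.group(1)))
    return hits


if __name__ == "__main__":
    toy = "MANRAADGGGDLLKDIADAEGDK"  # invented sequence, not AuaA
    for hit in find_motifs(toy):
        print(hit)
```

    A hit from such a scan only flags a candidate site; the abstract's point is that the functional role of each residue still had to be tested by mutagenesis.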

  14. Evolution of conserved non-coding sequences within the vertebrate Hox clusters through the two-round whole genome duplications revealed by phylogenetic footprinting analysis.

    PubMed

    Matsunami, Masatoshi; Sumiyama, Kenta; Saitou, Naruya

    2010-12-01

    As a result of two-round whole genome duplications, four or more paralogous Hox clusters exist in vertebrate genomes. The paralogous genes in the Hox clusters show similar expression patterns, implying shared regulatory mechanisms for expression of these genes. Previous studies partly revealed the expression mechanisms of Hox genes. However, cis-regulatory elements that control these paralogous gene expression are still poorly understood. Toward solving this problem, the authors searched conserved non-coding sequences (CNSs), which are candidates of cis-regulatory elements. When comparing orthologous Hox clusters of 19 vertebrate species, 208 intergenic conserved regions were found. The authors then searched for CNSs that were conserved not only between orthologous clusters but also among the four paralogous Hox clusters. The authors found three regions that are conserved among all the four clusters and eight regions that are conserved between intergenic regions of two paralogous Hox clusters. In total, 28 CNSs were identified in the paralogous Hox clusters, and nine of them were newly found in this study. One of these novel regions bears a RARE motif. These CNSs are candidates for gene expression regulatory regions among paralogous Hox clusters. The authors also compared vertebrate CNSs with amphioxus CNSs within the Hox cluster, and found that two CNSs in the HoxA and HoxB clusters retain homology with amphioxus CNSs through the two-round whole genome duplications.

  15. Avian retroviral RNA encapsidation: reexamination of functional 5' RNA sequences and the role of nucleocapsid Cys-His motifs.

    PubMed Central

    Aronoff, R; Hajjar, A M; Linial, M L

    1993-01-01

    RNA packaging signals (psi) from the 5' ends of murine and avian retroviral genomes have previously been shown to direct encapsidation of heterologous mRNA into the retroviral virion. The avian 5' packaging region has now been further characterized, and we have defined a 270-nucleotide sequence, A psi, which is sufficient to direct packaging of heterologous RNA. Identification of the A psi sequence suggests that several retroviral cis-acting sequences contained in psi+ (the primer binding site, the putative dimer linkage sequence, and the splice donor site) are dispensable for specific RNA encapsidation. Subgenomic env mRNA is not efficiently encapsidated into particles, even though the A psi sequence is present in this RNA. In contrast, spliced heterologous psi-containing RNA is packaged into virions as efficiently as unspliced species; thus splicing per se is not responsible for the failure of env mRNA to be encapsidated. We also found that an avian retroviral mutant deleted for both nucleocapsid Cys-His boxes retains the capacity to encapsidate RNA containing psi sequences, although this RNA is unstable and is thus difficult to detect in mature particles. Electron microscopy reveals that virions produced by this mutant lack a condensed core, which may allow the RNA to be accessible to nucleases. PMID:8380070

  16. An alternative oxidase monoclonal antibody recognises a highly conserved sequence among alternative oxidase subunits.

    PubMed

    Finnegan, P M; Wooding, A R; Day, D A

    1999-03-19

    The alternative oxidase is found in the inner mitochondrial membranes of plants and some fungi and protists. A monoclonal antibody raised against the alternative oxidase from the aroid lily Sauromatum guttatum has been used extensively to detect the enzyme in these organisms. Using an immunoblotting strategy, the antibody binding site has been localised to the sequence RADEAHHRDVNH within the soybean alternative oxidase 2 protein. Examination of sequence variants showed that A2 and residues C-terminal to H7 are required for recognition by the monoclonal antibody raised against the alternative oxidase. The recognition sequence is highly conserved among all alternative oxidase proteins and is absolutely conserved in 12 of 14 higher plant sequences, suggesting that this antibody will continue to be extremely useful in studying the expression and synthesis of the alternative oxidase.

  17. A comprehensive analysis of the La-motif protein superfamily.

    PubMed

    Bousquet-Antonelli, Cécile; Deragon, Jean-Marc

    2009-05-01

    The extremely well-conserved La motif (LAM), in synergy with the immediately following RNA recognition motif (RRM), allows direct binding of the (genuine) La autoantigen to RNA polymerase III primary transcripts. This motif is not only found on La homologs, but also on La-related proteins (LARPs) of unrelated function. LARPs are widely found amongst eukaryotes and, although poorly characterized, appear to be RNA-binding proteins fulfilling crucial cellular functions. We searched the fully sequenced genomes of 83 eukaryotic species scattered along the tree of life for the presence of LAM-containing proteins. We observed that these proteins are absent from archaea and present in all eukaryotes (except protists from the Plasmodium genus), strongly suggesting that the LAM is an ancestral motif that emerged early after the archaea-eukarya radiation. A complete evolutionary and structural analysis of these proteins resulted in their classification into five families: the genuine La homologs and four LARP families. Unexpectedly, in each family a conserved domain representing either a classical RRM or an RRM-like motif immediately follows the LAM of most proteins. An evolutionary analysis of the LAM-RRM/RRM-L regions shows that these motifs co-evolved and should be used as a single entity to define the functional region of interaction of LARPs with their substrates. We also found two extremely well conserved motifs, named LSA and DM15, shared by LARP6 and LARP1 family members, respectively. We suggest that members of the same family are functional homologs and/or share a common molecular mode of action on different RNA baits.

  18. Distinct XPPX sequence motifs induce ribosome stalling, which is rescued by the translation elongation factor EF-P

    PubMed Central

    Peil, Lauri; Starosta, Agata L.; Lassak, Jürgen; Atkinson, Gemma C.; Virumäe, Kai; Spitzer, Michaela; Tenson, Tanel; Jung, Kirsten; Remme, Jaanus; Wilson, Daniel N.

    2013-01-01

    Ribosomes are the protein synthesizing factories of the cell, polymerizing polypeptide chains from their constituent amino acids. However, distinct combinations of amino acids, such as polyproline stretches, cannot be efficiently polymerized by ribosomes, leading to translational stalling. The stalled ribosomes are rescued by the translational elongation factor P (EF-P), which by stimulating peptide-bond formation allows translation to resume. Using metabolic stable isotope labeling and mass spectrometry, we demonstrate in vivo that EF-P is important for expression of not only polyproline-containing proteins, but also for specific subsets of proteins containing diprolyl motifs (XPP/PPX). Together with a systematic in vitro and in vivo analysis, we provide a distinct hierarchy of stalling triplets, ranging from strong stallers, such as PPP, DPP, and PPN to weak stallers, such as CPP, PPR, and PPH, all of which are substrates for EF-P. These findings provide mechanistic insight into how the characteristics of the specific amino acid substrates influence the fundamentals of peptide bond formation. PMID:24003132
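
    The stalling hierarchy described above can be illustrated with a simple sliding-window scan. The sketch below classifies only the six example triplets the abstract names (the full hierarchy is not given in this record); the function name `classify_triplets` and the toy sequence are illustrative assumptions.

```python
# Example triplets named in the abstract; the complete hierarchy is not listed there.
STRONG_STALLERS = {"PPP", "DPP", "PPN"}
WEAK_STALLERS = {"CPP", "PPR", "PPH"}


def classify_triplets(protein):
    """Return (position, triplet, class) for each listed triplet, 0-based."""
    seq = protein.upper()
    hits = []
    # Slide a window of three residues along the sequence.
    for i in range(len(seq) - 2):
        triplet = seq[i:i + 3]
        if triplet in STRONG_STALLERS:
            hits.append((i, triplet, "strong"))
        elif triplet in WEAK_STALLERS:
            hits.append((i, triplet, "weak"))
    return hits


if __name__ == "__main__":
    toy = "MADPPNKCPPRA"  # invented sequence for illustration
    for hit in classify_triplets(toy):
        print(hit)
```

    Overlapping triplets are reported separately, so a stretch such as DPPN yields both DPP and PPN.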

  19. Sequence conservation of the 12D3 gene in Mexican isolates of Babesia bovis.

    PubMed

    Perez, J; Javier Perez, J; Vargas, P; Antonio Alvarez, J; Rojas, C; Figueroa, J V

    2010-04-01

    The 12D3 antigen present in Babesia bovis has been evaluated as a recombinant vaccine candidate and the 12d3 coding sequence has been reported for an Australian and a USA (Texas) isolate of B. bovis. However, no approach has been conducted to perform analysis of 12d3 sequence conservation on a larger number of B. bovis isolates. This could provide important information to determine whether a recombinant vaccine containing this antigen could be widely used. This study reports the cloning and sequencing analysis of the 12d3 coding region in 20 different B. bovis isolates collected from various geographical regions in the tropics and subtropics of Mexico. Comparative analysis of the consensus nucleotide sequences obtained for each isolate revealed a high degree of conservation (94-99% sequence identity) among the 12d3 alleles present in the Mexican isolates when compared with the 12d3 ORF sequences from the Texan (T2Bo) B. bovis isolate. Similarly, BLASTX sequence homology search showed a high percent identity (93-99%) of the deduced amino acid 12D3 sequence as compared with the T2Bo isolate sequence. The high level of sequence conservation in 12d3 among the 20 B. bovis isolates collected from geographically distant locations in Mexico suggests that there exists a minimal bovine-host immunological pressure which could be translated into antigenic diversity or variation, and most probably this is reflected in the non-immunodominant characteristic of the 12D3 antigen as it has been previously described in the literature. 12D3 antigen can be considered as a viable candidate for inclusion in a recombinant vaccine for cattle babesiosis caused by B. bovis in Mexico.

  20. Homologous recombination enhancement conferred by the Z-DNA motif d(TG)30 is abrogated by simian virus 40 T antigen binding to adjacent DNA sequences.

    PubMed

    Wahls, W P; Moore, P D

    1990-02-01

    The Z-DNA motif polydeoxythymidylic-guanylic [d(TG)].polydeoxyadenylic-cytidylic acid [d(AC)], present throughout eucaryotic genomes, is capable of readily forming left-handed Z-DNA in vitro and has been shown to promote homologous recombination. The effects of simian virus 40 T-antigen-dependent substrate replication upon the stimulation of recombination conferred by the Z-DNA motif d(TG)30 were analyzed. Presence of d(TG)30 adjacent to a T-antigen-binding site I can stimulate homologous recombination between nonreplicating plasmids, providing that T antigen is absent, in both simian CV-1 cells and human EJ cells (W. P. Wahls, L. J. Wallace, and P. D. Moore, Mol. Cell. Biol. 10:785-793). It has also been shown elsewhere that the presence of d(TG)n not adjacent to the T-antigen-binding site can stimulate homologous recombination in simian virus 40 molecules replicating in the presence of T antigen (P. Bullock, J. Miller, and M. Botchan, Mol. Cell. Biol. 6:3948-3953, 1986). However, it is demonstrated here that d(TG)30 nine base pairs distant from a T-antigen-binding site bound with T antigen does not stimulate recombination between either replicating or nonreplicating substrates in somatic cells. The bound T antigen either prevents the d(TG)30 sequence from acquiring a recombinogenic configuration (such as left-handed Z-DNA), or it prevents the interaction of recombinase proteins with the sequence by steric hindrance. PMID:2153923

  1. Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins.

    PubMed

    Foulk, Michael S; Urban, John M; Casella, Cinzia; Gerbi, Susan A

    2015-05-01

    Nascent strand sequencing (NS-seq) is used to discover DNA replication origins genome-wide, allowing identification of features for their specification. NS-seq depends on the ability of lambda exonuclease (λ-exo) to efficiently digest parental DNA while leaving RNA-primer protected nascent strands intact. We used genomics and biochemical approaches to determine if λ-exo digests all parental DNA sequences equally. We report that λ-exo does not efficiently digest G-quadruplex (G4) structures in a plasmid. Moreover, λ-exo digestion of nonreplicating genomic DNA (LexoG0) enriches GC-rich DNA and G4 motifs genome-wide. We used LexoG0 data to control for nascent strand-independent λ-exo biases in NS-seq and validated this approach at the rDNA locus. The λ-exo-controlled NS-seq peaks are not GC-rich, and only 35.5% overlap with 6.8% of all G4s, suggesting that G4s are not general determinants for origin specification but may play a role for a subset. Interestingly, we observed a periodic spacing of G4 motifs and nucleosomes around the peak summits, suggesting that G4s may position nucleosomes at this subset of origins. Finally, we demonstrate that use of Na(+) instead of K(+) in the λ-exo digestion buffer reduced the effect of G4s on λ-exo digestion and discuss ways to increase both the sensitivity and specificity of NS-seq.

  2. Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins

    PubMed Central

    Foulk, Michael S.; Urban, John M.; Casella, Cinzia; Gerbi, Susan A.

    2015-01-01

    Nascent strand sequencing (NS-seq) is used to discover DNA replication origins genome-wide, allowing identification of features for their specification. NS-seq depends on the ability of lambda exonuclease (λ-exo) to efficiently digest parental DNA while leaving RNA-primer protected nascent strands intact. We used genomics and biochemical approaches to determine if λ-exo digests all parental DNA sequences equally. We report that λ-exo does not efficiently digest G-quadruplex (G4) structures in a plasmid. Moreover, λ-exo digestion of nonreplicating genomic DNA (LexoG0) enriches GC-rich DNA and G4 motifs genome-wide. We used LexoG0 data to control for nascent strand–independent λ-exo biases in NS-seq and validated this approach at the rDNA locus. The λ-exo–controlled NS-seq peaks are not GC-rich, and only 35.5% overlap with 6.8% of all G4s, suggesting that G4s are not general determinants for origin specification but may play a role for a subset. Interestingly, we observed a periodic spacing of G4 motifs and nucleosomes around the peak summits, suggesting that G4s may position nucleosomes at this subset of origins. Finally, we demonstrate that use of Na+ instead of K+ in the λ-exo digestion buffer reduced the effect of G4s on λ-exo digestion and discuss ways to increase both the sensitivity and specificity of NS-seq. PMID:25695952

  3. Limb body wall complex, amniotic band sequence, or new syndrome caused by mutation in IQ Motif containing K (IQCK)?

    PubMed Central

    Kruszka, Paul; Uwineza, Annette; Mutesa, Leon; Martinez, Ariel F; Abe, Yu; Zackai, Elaine H; Ganetzky, Rebecca; Chung, Brian; Stevenson, Roger E; Adelstein, Robert S; Ma, Xuefei; Mullikin, James C; Hong, Sung-Kook; Muenke, Maximilian

    2015-01-01

    Limb body wall complex (LBWC) and amniotic band sequence (ABS) are multiple congenital anomaly conditions with craniofacial, limb, and ventral wall defects. LBWC and ABS are considered separate entities by some, and a continuum of severity of the same condition by others. The etiology of LBWC/ABS remains unknown and multiple hypotheses have been proposed. One individual with features of LBWC and his unaffected parents were whole exome sequenced and Sanger sequenced as confirmation of the mutation. Functional studies were conducted using morpholino knockdown studies followed by human mRNA rescue experiments. Using whole exome sequencing, a de novo heterozygous mutation was found in the gene IQCK: c.667C>G; p.Q223E and confirmed by Sanger sequencing in an individual with LBWC. Morpholino knockdown of iqck mRNA in the zebrafish showed ventral defects including failure of ventral fin to develop and cardiac edema. Human wild-type IQCK mRNA rescued the zebrafish phenotype, whereas human p.Q223E IQCK mRNA did not, but worsened the phenotype of the morpholino knockdown zebrafish. This study supports a genetic etiology for LBWC/ABS, or potentially a new syndrome. PMID:26436108

  4. Mapping the structure of folding cores in TIM barrel proteins by hydrogen exchange mass spectrometry: the roles of motif and sequence for the indole-3-glycerol phosphate synthase from Sulfolobus solfataricus.

    PubMed

    Gu, Zhenyu; Zitzewitz, Jill A; Matthews, C Robert

    2007-04-27

    To test the roles of motif and amino acid sequence in the folding mechanisms of TIM barrel proteins, hydrogen-deuterium exchange was used to explore the structure of the stable folding intermediates for the indole-3-glycerol phosphate synthase from Sulfolobus solfataricus (sIGPS). Previous studies of the urea denaturation of sIGPS revealed the presence of an intermediate that is highly populated at approximately 4.5 M urea and contains approximately 50% of the secondary structure of the native (N) state. Kinetic studies showed that this apparent equilibrium intermediate is actually comprised of two thermodynamically distinct species, I(a) and I(b). To probe the location of the secondary structure in this pair of stable on-pathway intermediates, the equilibrium unfolding process of sIGPS was monitored by hydrogen-deuterium exchange mass spectrometry. The intact protein and pepsin-digested fragments were studied at various concentrations of urea by electrospray and matrix-assisted laser desorption ionization time-of-flight mass spectrometry, respectively. Intact sIGPS strongly protects at least 54 amide protons from hydrogen-deuterium exchange in the intermediate states, demonstrating the presence of stable folded cores. When the protection patterns and the exchange mechanisms for the peptides are considered with the proposed folding mechanism, the results can be interpreted to define the structural boundaries of I(a) and I(b). Comparison of these results with previous hydrogen-deuterium exchange studies on another TIM barrel protein of low sequence identity, alpha-tryptophan synthase (alphaTS), indicates that the thermodynamic states corresponding to the folding intermediates are better conserved than their structures. Although the TIM barrel motif appears to define the basic features of the folding free energy surface, the structures of the partially folded states that appear during the folding reaction depend on the amino acid sequence. Markedly, the good

  5. PTS-Mediated Regulation of the Transcription Activator MtlR from Different Species: Surprising Differences despite Strong Sequence Conservation.

    PubMed

    Joyet, Philippe; Derkaoui, Meriem; Bouraoui, Houda; Deutscher, Josef

    2015-01-01

    The hexitol D-mannitol is transported by many bacteria via a phosphoenolpyruvate (PEP):carbohydrate phosphotransferase system (PTS). In most Firmicutes, the transcription activator MtlR controls the expression of the genes encoding the D-mannitol-specific PTS components and D-mannitol-1-P dehydrogenase. MtlR contains an N-terminal helix-turn-helix motif followed by an Mga-like domain, two PTS regulation domains (PRDs), an EIIB(Gat)- and an EIIA(Mtl)-like domain. The four regulatory domains are the target of phosphorylation by PTS components. Despite strong sequence conservation, the mechanisms controlling the activity of MtlR from Lactobacillus casei, Bacillus subtilis and Geobacillus stearothermophilus are quite different. Owing to the presence of a tyrosine in place of the second conserved histidine (His) in PRD2, L. casei MtlR is not phosphorylated by Enzyme I (EI) and HPr. When the corresponding His in PRD2 of MtlR from B. subtilis and G. stearothermophilus was replaced with alanine, the transcription regulator was no longer phosphorylated and remained inactive. Surprisingly, L. casei MtlR functions without phosphorylation in PRD2 because in a ptsI (EI) mutant MtlR is constitutively active. EI inactivation prevents not only phosphorylation of HPr, but also of the PTS(Mtl) components, which inactivate MtlR by phosphorylating its EIIB(Gat)- or EIIA(Mtl)-like domain. This explains the constitutive phenotype of the ptsI mutant. The absence of EIIB(Mtl)-mediated phosphorylation leads to induction of the L. casei mtl operon. This mechanism resembles mtlARFD induction in G. stearothermophilus, but differs from EIIA(Mtl)-mediated induction in B. subtilis. In contrast to B. subtilis MtlR, L. casei MtlR activation does not require sequestration to the membrane via the unphosphorylated EIIB(Mtl) domain. PMID:26159071

  6. Highly conserved d-loop sequences in woolly mouse opossums Marmosa (Micoureus).

    PubMed

    Rocha, Rita Gomes; Leite, Yuri Luiz Reis; Ferreira, Eduardo; Justino, Juliana; Costa, Leonora Pires

    2012-04-01

    This study reports the occurrence of highly conserved d-loop sequences in the mitochondrial genome of the woolly mouse opossum genus Marmosa subgenus Micoureus (Mammalia, Didelphimorphia, Didelphidae). Sixty-six sequences of Marmosa (Micoureus) demerarae, Marmosa (Micoureus) constantiae, and Marmosa (Micoureus) paraguayanus were amplified using universal d-loop primers and virtually no genetic differences were detected within and among species. These sequences matched the control region of the mitochondrial marsupial genome. Analyses of qualitative aspects of these sequences revealed that their structural composition is very similar to the d-loop region of other didelphid species. However, the total lack of variability has not been reported from other closely related species. The data analyzed here support the occurrence of highly conserved d-loop sequences, and we found no support for the hypothesis that these sequences are d-loop-like nuclear pseudogenes. Furthermore, the control and flanking regions obtained with different primers corroborate the lack of variability of the d-loop sequences in the mitochondrial genome of Marmosa (Micoureus).

  7. ConSurf 2010: calculating evolutionary conservation in sequence and structure of proteins and nucleic acids.

    PubMed

    Ashkenazy, Haim; Erez, Elana; Martz, Eric; Pupko, Tal; Ben-Tal, Nir

    2010-07-01

    It is informative to detect highly conserved positions in proteins and nucleic acid sequence/structure since they are often indicative of structural and/or functional importance. ConSurf (http://consurf.tau.ac.il) and ConSeq (http://conseq.tau.ac.il) are two well-established web servers for calculating the evolutionary conservation of amino acid positions in proteins using an empirical Bayesian inference, starting from protein structure and sequence, respectively. Here, we present the new version of the ConSurf web server that combines the two independent servers, providing an easier and more intuitive step-by-step interface, while offering the user more flexibility during the process. In addition, the new version of ConSurf calculates the evolutionary rates for nucleic acid sequences. The new version is freely available at: http://consurf.tau.ac.il/.

  8. Significance of satellite DNA revealed by conservation of a widespread repeat DNA sequence among angiosperms.

    PubMed

    Mehrotra, Shweta; Goel, Shailendra; Raina, Soom Nath; Rajpal, Vijay Rani

    2014-08-01

    The analysis of plant genome structure and evolution requires comprehensive characterization of repetitive sequences that make up the majority of plant nuclear DNA. In the present study, we analyzed the nature of pCtKpnI-I and pCtKpnI-II tandem repeated sequences, reported earlier in Carthamus tinctorius. Interestingly, a homolog of the pCtKpnI-I repeat sequence was also found to be present in widely divergent families of angiosperms. pCtKpnI-I showed high sequence similarity but low copy number among various taxa of different families of angiosperms analyzed. In comparison, pCtKpnI-II was specific to the genus Carthamus and was not present in any other taxa analyzed. The molecular structure of pCtKpnI-I was analyzed in various unrelated taxa of angiosperms to decipher the evolutionarily conserved nature of the sequence and its possible functional role.

  9. Studying RNA Homology and Conservation with Infernal: From Single Sequences to RNA Families.

    PubMed

    Barquist, Lars; Burge, Sarah W; Gardner, Paul P

    2016-01-01

    Emerging high-throughput technologies have led to a deluge of putative non-coding RNA (ncRNA) sequences identified in a wide variety of organisms. Systematic characterization of these transcripts will be a tremendous challenge. Homology detection is critical to making maximal use of functional information gathered about ncRNAs: identifying homologous sequence allows us to transfer information gathered in one organism to another quickly and with a high degree of confidence. ncRNA presents a challenge for homology detection, as the primary sequence is often poorly conserved and de novo secondary structure prediction and search remain difficult. This unit introduces methods developed by the Rfam database for identifying "families" of homologous ncRNAs starting from single "seed" sequences, using manually curated sequence alignments to build powerful statistical models of sequence and structure conservation known as covariance models (CMs), implemented in the Infernal software package. We provide a step-by-step iterative protocol for identifying ncRNA homologs and then constructing an alignment and corresponding CM. We also work through an example for the bacterial small RNA MicA, discovering a previously unreported family of divergent MicA homologs in genus Xenorhabdus in the process. © 2016 by John Wiley & Sons, Inc. PMID:27322404

  10. A Conserved Interaction between a C-Terminal Motif in Norovirus VPg and the HEAT-1 Domain of eIF4G Is Essential for Translation Initiation.

    PubMed

    Leen, Eoin N; Sorgeloos, Frédéric; Correia, Samantha; Chaudhry, Yasmin; Cannac, Fabien; Pastore, Chiara; Xu, Yingqi; Graham, Stephen C; Matthews, Stephen J; Goodfellow, Ian G; Curry, Stephen

    2016-01-01

    Translation initiation is a critical early step in the replication cycle of the positive-sense, single-stranded RNA genome of noroviruses, a major cause of gastroenteritis in humans. Norovirus RNA, which has neither a 5´ m7G cap nor an internal ribosome entry site (IRES), adopts an unusual mechanism to initiate protein synthesis that relies on interactions between the VPg protein covalently attached to the 5´-end of the viral RNA and eukaryotic initiation factors (eIFs) in the host cell. For murine norovirus (MNV) we previously showed that VPg binds to the middle fragment of eIF4G (4GM; residues 652-1132). Here we have used pull-down assays, fluorescence anisotropy, and isothermal titration calorimetry (ITC) to demonstrate that a stretch of ~20 amino acids at the C terminus of MNV VPg mediates direct and specific binding to the HEAT-1 domain within the 4GM fragment of eIF4G. Our analysis further reveals that the MNV C terminus binds to eIF4G HEAT-1 via a motif that is conserved in all known noroviruses. Fine mutagenic mapping suggests that the MNV VPg C terminus may interact with eIF4G in a helical conformation. NMR spectroscopy was used to define the VPg binding site on eIF4G HEAT-1, which was confirmed by mutagenesis and binding assays. We have found that this site is non-overlapping with the binding site for eIF4A on eIF4G HEAT-1 by demonstrating that norovirus VPg can form ternary VPg-eIF4G-eIF4A complexes. The functional significance of the VPg-eIF4G interaction was shown by the ability of fusion proteins containing the C-terminal peptide of MNV VPg to inhibit in vitro translation of norovirus RNA but not cap- or IRES-dependent translation. These observations define important structural details of a functional interaction between norovirus VPg and eIF4G and reveal a binding interface that might be exploited as a target for antiviral therapy.

  11. A Conserved Interaction between a C-Terminal Motif in Norovirus VPg and the HEAT-1 Domain of eIF4G Is Essential for Translation Initiation

    PubMed Central

    Leen, Eoin N.; Sorgeloos, Frédéric; Correia, Samantha; Chaudhry, Yasmin; Cannac, Fabien; Pastore, Chiara; Xu, Yingqi; Graham, Stephen C.; Matthews, Stephen J.; Goodfellow, Ian G.; Curry, Stephen

    2016-01-01

    Translation initiation is a critical early step in the replication cycle of the positive-sense, single-stranded RNA genome of noroviruses, a major cause of gastroenteritis in humans. Norovirus RNA, which has neither a 5´ m7G cap nor an internal ribosome entry site (IRES), adopts an unusual mechanism to initiate protein synthesis that relies on interactions between the VPg protein covalently attached to the 5´-end of the viral RNA and eukaryotic initiation factors (eIFs) in the host cell. For murine norovirus (MNV) we previously showed that VPg binds to the middle fragment of eIF4G (4GM; residues 652–1132). Here we have used pull-down assays, fluorescence anisotropy, and isothermal titration calorimetry (ITC) to demonstrate that a stretch of ~20 amino acids at the C terminus of MNV VPg mediates direct and specific binding to the HEAT-1 domain within the 4GM fragment of eIF4G. Our analysis further reveals that the MNV C terminus binds to eIF4G HEAT-1 via a motif that is conserved in all known noroviruses. Fine mutagenic mapping suggests that the MNV VPg C terminus may interact with eIF4G in a helical conformation. NMR spectroscopy was used to define the VPg binding site on eIF4G HEAT-1, which was confirmed by mutagenesis and binding assays. We have found that this site is non-overlapping with the binding site for eIF4A on eIF4G HEAT-1 by demonstrating that norovirus VPg can form ternary VPg-eIF4G-eIF4A complexes. The functional significance of the VPg-eIF4G interaction was shown by the ability of fusion proteins containing the C-terminal peptide of MNV VPg to inhibit in vitro translation of norovirus RNA but not cap- or IRES-dependent translation. These observations define important structural details of a functional interaction between norovirus VPg and eIF4G and reveal a binding interface that might be exploited as a target for antiviral therapy. PMID:26734730

  12. Mouse Brca1: localization sequence analysis and identification of evolutionarily conserved domains.

    PubMed

    Abel, K J; Xu, J; Yin, G Y; Lyons, R H; Meisler, M H; Weber, B L

    1995-12-01

    The human gene BRCA1, conferring susceptibility to early-onset breast and ovarian cancer, has recently been isolated. Here we describe isolation of cDNAs, sequence analysis, and genomic localization of the murine homolog, Brca1. The mouse cDNA sequence predicts a protein of 1812 amino acids; a number of small gaps account for the 51 fewer residues in the mouse protein relative to human BRCA1. While the predicted mouse and human proteins display on the whole a high level of homology (58% identity, 73% similarity), the regions of greatest homology are at the respective amino and carboxyl termini. Most reported disease-associated missense mutations in human BRCA1 occurred within these more highly conserved terminal regions. A predicted zinc-binding RING finger domain near the amino terminus lies within a 50 amino acid stretch that is perfectly conserved in both species. The strong conservation during mammalian evolution argues for the importance of this domain, perhaps mediating a role for BRCA1 in DNA and/or protein binding. We have also identified a conserved highly acidic domain in the carboxyl terminal half of the BRCA1 protein resembling acidic transactivation domains of certain transcription factors. Using an interspecific backcross panel, Brca1 was mapped to a region of mouse chromosome 11 that exhibits conserved linkage with 17q21. The sequence and isolated cDNAs will provide useful reagents for studying the expression of Brca1 in the mouse, and for testing the importance of the evolutionarily conserved domains.
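
    A percent-identity figure such as the 58% quoted in this record can be illustrated with a short sketch. The abstract does not say which convention was used, so the choice below (identical residues divided by the number of columns where neither sequence has a gap) is an assumption, and the two aligned strings are toy data, not BRCA1 sequence.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumed convention: gapped columns are ignored entirely. Other common
    conventions divide by the full alignment length or by the shorter sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy alignment (hypothetical residues): 6 ungapped columns, 5 identical.
    print(round(percent_identity("MKT-LLAV", "MRTALL-V"), 1))
```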

  13. Co-evolution of segregation guide DNA motifs and the FtsK translocase in bacteria: identification of the atypical Lactococcus lactis KOPS motif

    PubMed Central

    Nolivos, Sophie; Touzain, Fabrice; Pages, Carine; Coddeville, Michele; Rousseau, Philippe; El Karoui, Meriem; Le Bourgeois, Pascal; Cornet, François

    2012-01-01

    Bacteria use the global bipolarization of their chromosomes into replichores to control the dynamics and segregation of their genome during the cell cycle. This involves the control of protein activities by recognition of specific short DNA motifs whose orientation along the chromosome is highly skewed. The KOPS motifs act in chromosome segregation by orienting the activity of the FtsK DNA translocase towards the terminal replichore junction. KOPS motifs have been identified in γ-Proteobacteria and in Bacillus subtilis as closely related G-rich octamers. We have identified the KOPS motif of Lactococcus lactis, a model bacterium of the Streptococcaceae family harbouring a compact and low GC% genome. This motif, 5′-GAAGAAG-3′, was predicted in silico using the occurrence and skew characteristics of known KOPS motifs. We show that it is specifically recognized by L. lactis FtsK in vitro and controls its activity in vivo. L. lactis KOPS is thus an A-rich heptamer motif. Our results show that KOPS-controlled chromosome segregation is conserved in Streptococcaceae but that KOPS may show important variation in sequence and length between bacterial families. This suggests that FtsK adapts to its host genome by selecting motifs with convenient occurrence frequencies and orientation skews to orient its activity. PMID:22373923
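
    The occurrence and skew criteria mentioned in this record can be illustrated with a minimal sketch that counts a motif on both strands of a sequence and reports the fraction found on the given strand. The function names and the toy sequence are hypothetical, and the authors' actual prediction procedure, which evaluates orientation relative to each replichore, is not reproduced here.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps."""
    count = 0
    start = 0
    while True:
        idx = seq.find(motif, start)
        if idx == -1:
            return count
        count += 1
        start = idx + 1  # step by one so overlapping hits are counted


def motif_skew(seq: str, motif: str):
    """Return (forward count, reverse-strand count, forward fraction)."""
    seq, motif = seq.upper(), motif.upper()
    fwd = count_overlapping(seq, motif)
    rev = count_overlapping(seq, reverse_complement(motif))
    total = fwd + rev
    skew = fwd / total if total else None  # None when the motif is absent
    return fwd, rev, skew


if __name__ == "__main__":
    toy = "GAAGAAGAAG" + "TTT" + "CTTCTTC"  # toy sequence, not L. lactis DNA
    print(motif_skew(toy, "GAAGAAG"))
```

    Applied separately to each replichore of a real chromosome, a forward fraction far from 0.5 would indicate the kind of orientation bias the record describes.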

  14. Identification of polymorphic, conserved simple sequence repeats (SSRs) in cultivated Brassica species.

    PubMed

    Szewc-McFadden, A K; Kresovich, S; Bliek, S M; Mitchell, S E; McFerson, J R

    1996-09-01

    The application of simple sequence repeat (SSR) genotyping for the characterization of genetic variation in crop plants has been hindered by the lack of ready access to useful primer pairs and potentially limited conservation of the repeat sequences among related species. In this phase of work, we report on the identification and characterization of SSRs that are conserved in Brassica napus L. (rapeseed) and its putative progenitors, B. oleracea L. (cabbage, and related vegetable types) and B. rapa (vegetable and oil types). Approximately 140 clones from a size-fractionated genomic library of B. napus were sequenced, and primer pairs were designed for 21 dinucleotide SSRs. Seventeen primer pairs amplified products in the three species and, among these, 13 detected variation between and within species. Unlike findings on SSR information content in humans, no relationship could be established between the number of tandem repeats within the target sequence and heterozygosity. All primer pairs have been designed to work under identical amplification conditions; therefore, single-reaction, multiplex polymerase chain reaction (PCR) with these SSRs is possible. Once moderate numbers of primer pairs are accessible to the user community, SSR genotyping may provide a useful method for the characterization, conservation, and utilization of agricultural crop diversity.

  15. Metazoan remaining genes for essential amino acid biosynthesis: sequence conservation and evolutionary analyses.

    PubMed

    Costa, Igor R; Thompson, Julie D; Ortega, José Miguel; Prosdocimi, Francisco

    2014-12-24

    Essential amino acids (EAA) consist of a group of nine amino acids that animals are unable to synthesize via de novo pathways. Recently, it has been found that most metazoans lack the same set of enzymes responsible for the de novo EAA biosynthesis. Here we investigate the sequence conservation and evolution of all the metazoan remaining genes for EAA pathways. Initially, the set of all 49 enzymes responsible for the EAA de novo biosynthesis in yeast was retrieved. These enzymes were used as BLAST queries to search for similar sequences in a database containing 10 complete metazoan genomes. Eight enzymes typically attributed to EAA pathways were found to be ubiquitous in metazoan genomes, suggesting a conserved functional role. In this study, we address the question of how these genes evolved after losing their pathway partners. To do this, we compared metazoan genes with their fungal and plant orthologs. Using phylogenetic analysis with maximum likelihood, we found that acetolactate synthase (ALS) and betaine-homocysteine S-methyltransferase (BHMT) diverged from the expected Tree of Life (ToL) relationships. High sequence conservation in the paraphyletic group Plant-Fungi was identified for these two genes using a newly developed Python algorithm. Selective pressure analysis of ALS and BHMT protein sequences showed higher non-synonymous mutation ratios in comparisons between metazoans/fungi and metazoans/plants, supporting the hypothesis that these two genes have undergone non-ToL evolution in animals.

  16. The Bordetella type III secretion system effector BteA contains a conserved N-terminal motif that guides bacterial virulence factors to lipid rafts.

    PubMed

    French, Christopher T; Panina, Ekaterina M; Yeh, Sylvia H; Griffith, Natasha; Arambula, Diego G; Miller, Jeff F

    2009-12-01

    The Bordetella type III secretion system (T3SS) effector protein BteA is necessary and sufficient for rapid cytotoxicity in a wide range of mammalian cells. We show that BteA is highly conserved and functionally interchangeable between Bordetella bronchiseptica, Bordetella pertussis and Bordetella parapertussis. The identification of BteA sequences required for cytotoxicity allowed the construction of non-cytotoxic mutants for localization studies. BteA derivatives were targeted to lipid rafts and showed clear colocalization with cortical actin, ezrin and the lipid raft marker GM1. We hypothesized that BteA associates with the cytoplasmic face of lipid rafts to locally modulate host cell responses to Bordetella attachment. B. bronchiseptica adhered to host cells almost exclusively to GM1-enriched lipid raft microdomains and BteA colocalized to these same sites following T3SS-mediated translocation. Disruption of lipid rafts with methyl-beta-cyclodextrin protected cells from T3SS-induced cytotoxicity. Localization to lipid rafts was mediated by a 130-amino-acid lipid raft targeting domain at the N-terminus of BteA, and homologous domains were identified in virulence factors from other bacterial species. Lipid raft targeting sequences from a T3SS effector (Plu4750) and an RTX-type toxin (Plu3217) from Photorhabdus luminescens directed fusion proteins to lipid rafts in a manner identical to the N-terminus of BteA. PMID:19650828

  17. RNA 3D Structural Motifs: Definition, Identification, Annotation, and Database Searching

    NASA Astrophysics Data System (ADS)

    Nasalean, Lorena; Stombaugh, Jesse; Zirbel, Craig L.; Leontis, Neocles B.

    Structured RNA molecules resemble proteins in the hierarchical organization of their global structures, folding and broad range of functions. Structured RNAs are composed of recurrent modular motifs that play specific functional roles. Some motifs direct the folding of the RNA or stabilize the folded structure through tertiary interactions. Others bind ligands or proteins or catalyze chemical reactions. Therefore, it is desirable, starting from the RNA sequence, to be able to predict the locations of recurrent motifs in RNA molecules. Conversely, the potential occurrence of one or more known 3D RNA motifs may indicate that a genomic sequence codes for a structured RNA molecule. To identify known RNA structural motifs in new RNA sequences, precise structure-based definitions are needed that specify the core nucleotides of each motif and their conserved interactions. By comparing instances of each recurrent motif and applying base pair isostericity relations, one can identify neutral mutations that preserve its structure and function in the contexts in which it occurs.

  18. Evolutionary Analysis and Classification of OATs, OCTs, OCTNs, and Other SLC22 Transporters: Structure-Function Implications and Analysis of Sequence Motifs

    PubMed Central

    Date, Rishabh C.; Bush, Kevin T.; Springer, Stevan A.; Saier, Milton H.; Wu, Wei; Nigam, Sanjay K.

    2015-01-01

    The SLC22 family includes organic anion transporters (OATs), organic cation transporters (OCTs) and organic carnitine and zwitterion transporters (OCTNs). These are often referred to as drug transporters even though they interact with many endogenous metabolites and signaling molecules (Nigam, S.K., Nature Reviews Drug Discovery, 14:29–44, 2015). Phylogenetic analysis of SLC22 supports the view that these transporters may have evolved over 450 million years ago. Many OAT members were found to appear after a major expansion of the SLC22 family in mammals, suggesting a physiological and/or toxicological role during the mammalian radiation. Putative SLC22 orthologs exist in worms, sea urchins, flies, and ciona. At least six groups of SLC22 exist. OATs and OCTs form two major clades of SLC22, within which (apart from Oat and Oct subclades), there are also clear Oat-like, Octn, and Oct-related subclades, as well as a distantly related group we term “Oat-related” (which may have different functions). Based on available data, it is arguable whether SLC22A18, which is related to bacterial drug-proton antiporters, should be assigned to SLC22. Disease-causing mutations, single nucleotide polymorphisms (SNPs) and other functionally analyzed mutations in OAT1, OAT3, URAT1, OCT1, OCT2, OCTN1, and OCTN2 map to the first extracellular domain, the large central intracellular domain, and transmembrane domains 9 and 10. These regions are highly conserved within subclades, but not between subclades, and may be necessary for SLC22 transporter function and functional diversification. Our results not only link function to evolutionarily conserved motifs but indicate the need for a revised sub-classification of SLC22. PMID:26536134

  19. Ancient conserved regions in new gene sequences and the protein databases

    SciTech Connect

    Green, P.; Hillier, L.; Waterston, R.; Lipman, D.; States, D.; Claverie, J.M.

    1993-03-19

    Sets of new gene sequences from human, nematode, and yeast were compared with each other and with a set of Escherichia coli genes in order to detect ancient evolutionarily conserved regions (ACRs) in the encoded proteins. Nearly all of the ACRs so identified were found to be homologous to sequences in the protein databases. This suggests that currently known proteins may already include representatives of most ACRs and that new sequences not similar to any database sequence are unlikely to contain ACRs. Preliminary analyses indicate that moderately expressed genes may be more likely to contain ACRs than rarely expressed genes. It is estimated that there are fewer than 900 ACRs in all. 20 refs., 2 figs., 4 tabs.

  20. Identification of conserved and novel microRNAs in Aquilaria sinensis based on small RNA sequencing and transcriptome sequence data.

    PubMed

    Gao, Zhi-Hui; Wei, Jian-He; Yang, Yun; Zhang, Zheng; Xiong, Huan-Ying; Zhao, Wen-Ting

    2012-08-15

    Agarwood is in great demand for its high value in medicine, incense, and perfume across Asia, the Middle East, and Europe. As agarwood is formed only when the Aquilaria trees are wounded or infected by some microbes, overharvesting and habitat loss are threatening some populations of agarwood-producing species. Aquilaria sinensis is one such economically significant tree species. To promote the production efficiency and protect the resource of A. sinensis, it would be critical to reveal the regulation mechanisms of stress-induced agarwood formation. MicroRNAs (miRNAs), key gene expression regulators involved in various plant stress responses and metabolic processes, might function in agarwood formation, but no report concerning miRNAs in Aquilaria is available. In this study, small RNA high-throughput sequencing and 454 transcriptome data were used to identify both conserved and novel miRNAs in A. sinensis. Deep sequencing showed that the small RNA (sRNA) population of A. sinensis was complex and the length of sRNAs varied. By in silico analysis of the small RNA deep sequencing data and transcriptome data, we discovered 27 novel miRNAs in A. sinensis. Based on the mature miRNA sequence conservation, we identified 74 putative conserved miRNAs from A. sinensis, and 10 of them were confirmed with a hairpin-forming precursor. Interestingly, a novel miRNA sequence was determined to be the miRNA* of asi-miR408, but with accumulation much higher than asi-miR408. The expression levels of ten stress-responsive miRNAs were examined during the time-course after wound treatment. Eight were shown to be wound-responsive. This not only shows the existence of miRNAs in this economically significant Asian tree species but also indicates their critical role in stress-induced agarwood formation. The highly accumulated miRNA* of asi-miR408 implies that miRNA*s, as well as miRNAs, can be functional in plants.

  1. Newly identified motifs in Candida albicans Cdr1 protein nucleotide binding domains are pleiotropic drug resistance subfamily-specific and functionally asymmetric

    PubMed Central

    Rawal, Manpreet Kaur; Banerjee, Atanu; Shah, Abdul Haseeb; Khan, Mohammad Firoz; Sen, Sobhan; Saxena, Ajay Kumar; Monk, Brian C.; Cannon, Richard D.; Bhatnagar, Rakesh; Mondal, Alok Kumar; Prasad, Rajendra

    2016-01-01

    An analysis of Candida albicans ABC transporters identified conserved related α-helical sequence motifs immediately C-terminal of each Walker A sequence. Despite the occurrence of these motifs in ABC subfamilies of other yeasts and higher eukaryotes, their roles in protein function remained unexplored. In this study we have examined the functional significance of these motifs in the C. albicans PDR transporter Cdr1p. The motifs present in NBD1 and NBD2 were subjected to alanine scanning mutagenesis, deletion, or replacement of an entire motif. Systematic replacement of individual motif residues with alanine did not affect the function of Cdr1p but deletion of the M1-motif in NBD1 (M1-Del) resulted in Cdr1p being trapped within the endoplasmic reticulum. In contrast, deletion of the M2-motif in NBD2 (M2-Del) yielded a non-functional protein with normal plasma membrane localization. Replacement of the motif in M1-Del with six alanines (M1-Ala) significantly improved localization of the protein and partially restored function. Conversely, replacement of the motif in M2-Del with six alanines (M2-Ala) did not reverse the phenotype and susceptibility to antifungal substrates of Cdr1p was unchanged. Together, the M1 and M2 motifs contribute to the functional asymmetry of NBDs and are important for maturation of Cdr1p and ATP catalysis, respectively. PMID:27251950

  2. Acidic/IQ Motif Regulator of Calmodulin*

    PubMed Central

    Putkey, John A.; Waxham, M. Neal; Gaertner, Tara R.; Brewer, Kari J.; Goldsmith, Michael; Kubota, Yoshihisa; Kleerekoper, Quinn K.

    2013-01-01

    The small IQ motif proteins PEP-19 (62 amino acids) and RC3 (78 amino acids) greatly accelerate the rates of Ca2+ binding to sites III and IV in the C-domain of calmodulin (CaM). We show here that PEP-19 decreases the degree of cooperativity of Ca2+ binding to sites III and IV, and we present a model showing that this could increase Ca2+ binding rate constants. Comparative sequence analysis showed that residues 28 to 58 from PEP-19 are conserved in other proteins. This region includes the IQ motif (amino acids 39–62), and an adjacent acidic cluster of amino acids (amino acids 28–40). A synthetic peptide spanning residues 28–62 faithfully mimics intact PEP-19 with respect to increasing the rates of Ca2+ association and dissociation, as well as binding preferentially to the C-domain of CaM. In contrast, a peptide encoding only the core IQ motif does not modulate Ca2+ binding, and binds to multiple sites on CaM. A peptide that includes only the acidic region does not bind to CaM. These results show that PEP-19 has a novel acidic/IQ CaM regulatory motif in which the IQ sequence provides a targeting function that allows binding of PEP-19 to CaM, whereas the acidic residues modify the nature of this interaction, and are essential for modulating Ca2+ binding to the C-domain of CaM. PMID:17991744

  3. Strong evolutionary conservation of neuropeptide Y: sequences of chicken, goldfish, and Torpedo marmorata DNA clones.

    PubMed Central

    Blomqvist, A G; Söderberg, C; Lundell, I; Milner, R J; Larhammar, D

    1992-01-01

    Neuropeptide Y (NPY) is an abundant and widespread neuropeptide in the nervous system of mammals. NPY belongs to a family of 36-amino acid peptides that also includes pancreatic polypeptide and the endocrine gut peptide YY as well as the fish pancreatic peptide Y. To study the evolution of this peptide family, we have isolated clones encoding NPY from central nervous system cDNA libraries of chicken, goldfish, and the ray Torpedo marmorata, as well as from a chicken genomic library. The predicted chicken NPY amino acid sequence differs from that of rat at only one position. The goldfish sequence differs at five positions and shows that bony fishes have a true NPY peptide in addition to their pancreatic peptide Y. The Torpedo sequence differs from that of rat at three positions. As Torpedo NPY has no unique positions when compared with the other sequences, it seems to be identical to the NPY of the common ancestor of cartilaginous fishes, bony fishes, and tetrapods after 420 million years of evolution. The 30-amino acid carboxyl-terminal extension of the NPY precursor also displays considerable sequence conservation. These results show that NPY is one of the most highly conserved neuroendocrine peptides. PMID:1549597

  4. Strong evolutionary conservation of neuropeptide Y: sequences of chicken, goldfish, and Torpedo marmorata DNA clones.

    PubMed

    Blomqvist, A G; Söderberg, C; Lundell, I; Milner, R J; Larhammar, D

    1992-03-15

    Neuropeptide Y (NPY) is an abundant and widespread neuropeptide in the nervous system of mammals. NPY belongs to a family of 36-amino acid peptides that also includes pancreatic polypeptide and the endocrine gut peptide YY as well as the fish pancreatic peptide Y. To study the evolution of this peptide family, we have isolated clones encoding NPY from central nervous system cDNA libraries of chicken, goldfish, and the ray Torpedo marmorata, as well as from a chicken genomic library. The predicted chicken NPY amino acid sequence differs from that of rat at only one position. The goldfish sequence differs at five positions and shows that bony fishes have a true NPY peptide in addition to their pancreatic peptide Y. The Torpedo sequence differs from that of rat at three positions. As Torpedo NPY has no unique positions when compared with the other sequences, it seems to be identical to the NPY of the common ancestor of cartilaginous fishes, bony fishes, and tetrapods after 420 million years of evolution. The 30-amino acid carboxyl-terminal extension of the NPY precursor also displays considerable sequence conservation. These results show that NPY is one of the most highly conserved neuroendocrine peptides.

  5. Sequence-related human proteins cluster by degree of evolutionary conservation.

    PubMed

    Mrowka, Ralf; Patzak, Andreas; Herzel, Hanspeter; Holste, Dirk

    2004-11-01

    Gene duplication followed by adaptive evolution is thought to be a central mechanism for the emergence of novel genes. To illuminate the contribution of duplicated protein-coding sequences to the complexity of the human genome, we study the connectivity of pairwise sequence-related human proteins and construct a network (N) of linked protein sequences with shared similarities. We find that (i) the connectivity distribution P(k) for k sequence-related proteins decays as a power law P(k) ∼ k^(−γ) with γ ≈ 1.2, (ii) the top rank of N consists of a single large cluster of proteins (≈70%), while bottom ranks consist of multiple isolated clusters, and (iii) structural characteristics of N show both a high degree of clustering and an intermediate connectivity ("small-world" features). We gain further insight into structural properties of N by studying the relationship between the connectivity distribution and the phylogenetic conservation of proteins in bacteria, plants, invertebrates, and vertebrates. We find that (iv) the proportion of sequence-related proteins increases with increasing extent of evolutionary conservation. Our results support that small-world network properties constitute a footprint of an evolutionary mechanism and extend the traditional interpretation of protein families.
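
    The power-law decay of the connectivity distribution reported in this record can be illustrated with a rough sketch: compute the degree distribution of an undirected network from its edge list, then estimate the exponent γ as the negative slope of a least-squares line through (log k, log P(k)). The function names are hypothetical, each protein pair is assumed to be listed once, and a log-log regression is only a crude estimator; the abstract does not state which fitting method the authors used.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Map each degree k to the fraction of nodes having that degree.

    Assumes an undirected edge list in which each pair appears once.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    n_nodes = len(degree)
    counts = Counter(degree.values())
    return {k: c / n_nodes for k, c in sorted(counts.items())}


def fit_power_law_exponent(pk):
    """Estimate gamma in P(k) ~ k^(-gamma) by least squares in log-log space."""
    xs = [math.log(k) for k in pk]
    ys = [math.log(p) for p in pk.values()]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two distinct degrees")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # slope is -gamma


if __name__ == "__main__":
    # Synthetic distribution with a known exponent, used only as a sanity check.
    synthetic = {k: k ** -1.2 for k in range(1, 50)}
    print(round(fit_power_law_exponent(synthetic), 3))
```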

  6. Sequence-related human proteins cluster by degree of evolutionary conservation

    NASA Astrophysics Data System (ADS)

    Mrowka, Ralf; Patzak, Andreas; Herzel, Hanspeter; Holste, Dirk

    2004-11-01

    Gene duplication followed by adaptive evolution is thought to be a central mechanism for the emergence of novel genes. To illuminate the contribution of duplicated protein-coding sequences to the complexity of the human genome, we study the connectivity of pairwise sequence-related human proteins and construct a network (N) of linked protein sequences with shared similarities. We find that (i) the connectivity distribution P(k) for k sequence-related proteins decays as a power law P(k) ∼ k^(−γ) with γ ≈ 1.2, (ii) the top rank of N consists of a single large cluster of proteins (≈70%), while bottom ranks consist of multiple isolated clusters, and (iii) structural characteristics of N show both a high degree of clustering and an intermediate connectivity (“small-world” features). We gain further insight into structural properties of N by studying the relationship between the connectivity distribution and the phylogenetic conservation of proteins in bacteria, plants, invertebrates, and vertebrates. We find that (iv) the proportion of sequence-related proteins increases with increasing extent of evolutionary conservation. Our results support that small-world network properties constitute a footprint of an evolutionary mechanism and extend the traditional interpretation of protein families.

  7. The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans.

    PubMed

    Shoguchi, Eiichi; Shinzato, Chuya; Hisata, Kanako; Satoh, Nori; Mungpakdee, Sutada

    2015-08-01

    Even though mitochondrial genomes, which characterize eukaryotic cells, were first discovered more than 50 years ago, mitochondrial genomics remains an important topic in molecular biology and genome sciences. The Phylum Alveolata comprises three major groups (ciliates, apicomplexans, and dinoflagellates), the mitochondrial genomes of which have diverged widely. Even though the gene content of dinoflagellate mitochondrial genomes is reportedly comparable to that of apicomplexans, the highly fragmented and rearranged genome structures of dinoflagellates have frustrated whole genomic analysis. Consequently, noncoding sequences and gene arrangements of dinoflagellate mitochondrial genomes have not been well characterized. Here we report that the continuous assembled genome (∼326 kb) of the dinoflagellate, Symbiodinium minutum, is AT-rich (∼64.3%) and that it contains three protein-coding genes. Based upon in silico analysis, the remaining 99% of the genome comprises transcriptomic noncoding sequences. RNA edited sites and unique, possible start and stop codons clarify conserved regions among dinoflagellates. Our massive transcriptome analysis shows that almost all regions of the genome are transcribed, including 27 possible fragmented ribosomal RNA genes and 12 uncharacterized small RNAs that are similar to mitochondrial RNA genes of the malarial parasite, Plasmodium falciparum. Gene map comparisons show that gene order is only slightly conserved between S. minutum and P. falciparum. However, small RNAs and intergenic sequences share sequence similarities with P. falciparum, suggesting that the function of noncoding sequences has been preserved despite development of very different genome structures.

  8. The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans

    PubMed Central

    Shoguchi, Eiichi; Shinzato, Chuya; Hisata, Kanako; Satoh, Nori; Mungpakdee, Sutada

    2015-01-01

    Even though mitochondrial genomes, which characterize eukaryotic cells, were first discovered more than 50 years ago, mitochondrial genomics remains an important topic in molecular biology and genome sciences. The Phylum Alveolata comprises three major groups (ciliates, apicomplexans, and dinoflagellates), the mitochondrial genomes of which have diverged widely. Even though the gene content of dinoflagellate mitochondrial genomes is reportedly comparable to that of apicomplexans, the highly fragmented and rearranged genome structures of dinoflagellates have frustrated whole genomic analysis. Consequently, noncoding sequences and gene arrangements of dinoflagellate mitochondrial genomes have not been well characterized. Here we report that the continuous assembled genome (∼326 kb) of the dinoflagellate, Symbiodinium minutum, is AT-rich (∼64.3%) and that it contains three protein-coding genes. Based upon in silico analysis, the remaining 99% of the genome comprises transcriptomic noncoding sequences. RNA edited sites and unique, possible start and stop codons clarify conserved regions among dinoflagellates. Our massive transcriptome analysis shows that almost all regions of the genome are transcribed, including 27 possible fragmented ribosomal RNA genes and 12 uncharacterized small RNAs that are similar to mitochondrial RNA genes of the malarial parasite, Plasmodium falciparum. Gene map comparisons show that gene order is only slightly conserved between S. minutum and P. falciparum. However, small RNAs and intergenic sequences share sequence similarities with P. falciparum, suggesting that the function of noncoding sequences has been preserved despite development of very different genome structures. PMID:26199191

  9. Auditory sequence processing reveals evolutionarily conserved regions of frontal cortex in macaques and humans

    PubMed Central

    Wilson, Benjamin; Kikuchi, Yukiko; Sun, Li; Hunter, David; Dick, Frederic; Smith, Kenny; Thiele, Alexander; Griffiths, Timothy D.; Marslen-Wilson, William D.; Petkov, Christopher I.

    2015-01-01

    An evolutionary account of human language as a neurobiological system must distinguish between human-unique neurocognitive processes supporting language and evolutionarily conserved, domain-general processes that can be traced back to our primate ancestors. Neuroimaging studies across species may determine whether candidate neural processes are supported by homologous, functionally conserved brain areas or by different neurobiological substrates. Here we use functional magnetic resonance imaging in Rhesus macaques and humans to examine the brain regions involved in processing the ordering relationships between auditory nonsense words in rule-based sequences. We find that key regions in the human ventral frontal and opercular cortex have functional counterparts in the monkey brain. These regions are also known to be associated with initial stages of human syntactic processing. This study raises the possibility that certain ventral frontal neural systems, which play a significant role in language function in modern humans, originally evolved to support domain-general abilities involved in sequence processing. PMID:26573340

  10. A comparative analysis of distribution and conservation of microsatellites in the transcripts of sequenced Fusarium species and development of genic-SSR markers for polymorphism analysis.

    PubMed

    Mahfooz, Sahil; Srivastava, Arpita; Srivastava, Alok K; Arora, Dilip K

    2015-09-01

    We used an in silico approach to survey and compare microsatellites in transcript sequences of four sequenced members of the genus Fusarium. G + C content of transcripts was found to be positively correlated with the frequency of SSRs. Our analysis revealed that, in all four transcript sequence sets studied, the occurrence, relative abundance and density of microsatellites varied and were not influenced by transcript size. No correlation between relative abundance and transcript size was observed. The relative abundance and density of microsatellites were highest in the transcripts of Fusarium solani when compared with F. graminearum, F. verticillioides and F. oxysporum. Trinucleotide repeats were the most frequent SSRs across all four sequence sets (67.8%), whereas dinucleotide repeats represented <1%. Among all classes of repeats, 36.5% of motifs were found to be conserved within Fusarium species. In order to study polymorphism within Fusarium isolates, 11 polymorphic genic-SSR markers were developed. Of the 11 markers, 5 were from F. oxysporum and the remaining 6 were from F. solani. SSR markers from F. oxysporum were found to be more polymorphic (38%) than those from F. solani (26%). The eleven polymorphic markers obtained in this study clearly demonstrate the utility of newly developed SSR markers in establishing genetic relationships among different isolates of Fusarium.
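
The kind of in silico microsatellite survey described above can be illustrated with a regular-expression scan for perfect tandem repeats. This is a minimal sketch, not the authors' pipeline: the function name `find_ssrs` and the minimum repeat counts (6 for dinucleotides, 5 for trinucleotides) are assumptions chosen for illustration, since the abstract does not state the thresholds used.

```python
import re

def find_ssrs(seq, min_repeats=None):
    """Return (start, unit, repeat_count) for perfect microsatellites in seq."""
    # Hypothetical thresholds: minimum number of repeats per unit length.
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5}
    seq = seq.upper()
    hits = []
    for unit_len, min_n in sorted(min_repeats.items()):
        # One unit of unit_len bases followed by at least min_n - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_n - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units made of a single repeated base (e.g. AAA).
            if len(set(unit)) == 1:
                continue
            hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return hits

# Toy transcript with one trinucleotide and one dinucleotide repeat.
transcript = "TT" + "CAG" * 6 + "GG" + "AT" * 7 + "CC"
ssrs = find_ssrs(transcript)
```

Dividing the number of hits by the total transcript length would give a relative-abundance figure of the sort compared across species in the abstract.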

  11. IFN-γ in turtle: conservation in sequence and signalling and role in inhibiting iridovirus replication in Chinese soft-shelled turtle Pelodiscus sinensis.

    PubMed

    Fu, Jian Ping; Chen, Shan Nan; Zou, Peng Fei; Huang, Bei; Guo, Zheng; Zeng, Ling Bing; Qin, Qi Wei; Nie, Pin

    2014-03-01

    The IFN-γ gene was identified in a turtle, the Chinese soft-shelled turtle, Pelodiscus sinensis, with its genome consisting of 4 exons and 3 introns. The deduced amino acid sequence of this gene contains a signal peptide, an IFN-γ family signature motif (130)IQRKAVNELFPT, an NLS motif (155)KRKR and three potential N-glycosylation sites. As revealed by real-time quantitative PCR, the gene was constitutively expressed in all tested organs/tissues, with higher level observed in blood, intestine and thymus. An induced expression of IFN-γ at mRNA level was observed in peripheral blood leucocytes (PBLs) in response to in vitro stimulation of LPS and PolyI:C. The overexpression of IFN-γ in the Chinese soft-shelled turtle artery (STA) cell line resulted in the increase in the expression of transcriptional regulators, such as IRF1, IRF7 and STAT1, and antiviral genes, such as Mx, PKR, implying possibly the existence of a conserved signalling network and role for IFN-γ in the turtle. Furthermore, the infection of soft-shelled turtle iridovirus (STIV) in the cell line transfected with IFN-γ may cause the cell death as demonstrated with the elevated lactate dehydrogenase (LDH) level and cell mortality. However, the mechanism involved in the antiviral activity may require further investigation.

  12. Motif types, motif locations and base composition patterns around the RNA polyadenylation site in microorganisms, plants and animals

    PubMed Central

    2014-01-01

    Background The polyadenylation of RNA is critical for gene functioning, but the conserved sequence motifs (often called signal or signature motifs), motif locations and abundances, and base composition patterns around mRNA polyadenylation [poly(A)] sites are still uncharacterized in most species. The evolutionary tendency for poly(A) site selection is still largely unknown. Results We analyzed the poly(A) site regions of 31 species or phyla. Different groups of species showed different poly(A) signal motifs: UUACUU at the poly(A) site in the parasite Trypanosoma cruzi; UGUAAC (approximately 13 bases upstream of the site) in the alga Chlamydomonas reinhardtii; UGUUUG (or UGUUUGUU) at mainly the fourth base downstream of the poly(A) site in the parasite Blastocystis hominis; and AAUAAA at approximately 16 bases and approximately 19 bases upstream of the poly(A) site in animals and plants, respectively. Polyadenylation signal motifs are usually several hundred times more abundant around poly(A) sites than in whole genomes. These predominant motifs usually had very specific locations, whether upstream of, at, or downstream of poly(A) sites, depending on the species or phylum. The poly(A) site was usually an adenosine (A) in all analyzed species except for B. hominis, and there was weak A predominance in C. reinhardtii. Fungi, animals, plants, and the protist Phytophthora infestans shared a general base abundance pattern (or base composition pattern) of “U-rich—A-rich—U-rich—Poly(A) site—U-rich regions”, or U-A-U-A-U for short, with some variation for each kingdom or subkingdom. Conclusion This study identified the poly(A) signal motifs, motif locations, and base composition patterns around mRNA poly(A) sites in protists, fungi, plants, and animals and provided insight into poly(A) site evolution. PMID:25052519
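
The positional analysis described above (for example, AAUAAA lying roughly 16 bases upstream of the poly(A) site in animals) can be sketched as a count of motif start offsets. This is an illustrative sketch only: the function name `motif_offsets` and the convention that each input region ends immediately before the poly(A) site are assumptions, and the paper may measure distances differently.

```python
from collections import Counter

def motif_offsets(upstream_regions, motif="AAUAAA"):
    """Count how far upstream of the poly(A) site each motif occurrence starts.

    Each region is assumed to end immediately before the poly(A) site,
    so an offset of 16 means the motif starts 16 bases upstream of it.
    """
    counts = Counter()
    for region in upstream_regions:
        # Accept DNA-style input by converting T to U.
        region = region.upper().replace("T", "U")
        start = region.find(motif)
        while start != -1:
            counts[len(region) - start] += 1
            start = region.find(motif, start + 1)
    return counts

regions = [
    "GGGG" + "AAUAAA" + "C" * 10,   # motif starts 16 bases upstream
    "CC" + "AATAAA" + "G" * 10,     # DNA alphabet, also 16 bases upstream
    "AAUAAA" + "C" * 13,            # motif starts 19 bases upstream
]
offsets = motif_offsets(regions)
```

A sharp peak in the resulting histogram corresponds to the "very specific locations" the abstract reports for predominant motifs.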

  13. A phylogenetically conserved sequence within viral 3' untranslated RNA pseudoknots regulates translation.

    PubMed Central

    Leathers, V; Tanguay, R; Kobayashi, M; Gallie, D R

    1993-01-01

    Both the 68-base 5' leader (omega) and the 205-base 3' untranslated region (UTR) of tobacco mosaic virus (TMV) promote efficient translation. A 35-base region within omega is necessary and sufficient for the regulation. Within the 3' UTR, a 52-base region, composed of two RNA pseudoknots, is required for regulation. These pseudoknots are phylogenetically conserved among seven viruses from two different viral groups and one satellite virus. The pseudoknots contained significant conservation at the secondary and tertiary levels and at several positions at the primary sequence level. Mutational analysis of the sequences determined that the primary sequence in several conserved positions, particularly within the third pseudoknot, was essential for function. The higher-order structure of the pseudoknots was also required. Both the leader and the pseudoknot region were specifically recognized by, and competed for, the same proteins in extracts made from carrot cell suspension cells and wheat germ. Binding of the proteins is much stronger to omega than the pseudoknot region. Synergism was observed between the TMV 3' UTR and the cap and to a lesser extent between omega and the 3' UTR. The functional synergism and the protein binding data suggest that the cap, TMV 5' leader, and 3' UTR interact to establish an efficient level of translation. PMID:8355685

  14. Conservation of uORF repressiveness and sequence features in mouse, human and zebrafish

    PubMed Central

    Chew, Guo-Liang; Pauli, Andrea; Schier, Alexander F.

    2016-01-01

    Upstream open reading frames (uORFs) are ubiquitous repressive genetic elements in vertebrate mRNAs. While much is known about the regulation of individual genes by their uORFs, the range of uORF-mediated translational repression in vertebrate genomes is largely unexplored. Moreover, it is unclear whether the repressive effects of uORFs are conserved across species. To address these questions, we analyse transcript sequences and ribosome profiling data from human, mouse and zebrafish. We find that uORFs are depleted near coding sequences (CDSes) and have initiation contexts that diminish their translation. Linear modelling reveals that sequence features at both uORFs and CDSes modulate the translation of CDSes. Moreover, the ratio of translation over 5′ leaders and CDSes is conserved between human and mouse, and correlates with the number of uORFs. These observations suggest that the prevalence of vertebrate uORFs may be explained by their conserved role in repressing CDS translation. PMID:27216465
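
To make the notion of a uORF concrete, the sketch below scans a 5′ leader for AUG-initiated reading frames that also terminate within the leader. It is a simplified illustration, not the authors' method: the function name `find_uorfs` is hypothetical, only AUG starts are considered, and uORFs that run past the end of the leader into the CDS are ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_uorfs(leader):
    """Return (start, end) pairs for ATG-initiated ORFs contained in the leader.

    Coordinates are 0-based; end is exclusive and includes the stop codon.
    """
    leader = leader.upper().replace("U", "T")
    uorfs = []
    for start in range(len(leader) - 2):
        if leader[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame set by this start codon.
        for pos in range(start + 3, len(leader) - 2, 3):
            if leader[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3))
                break
    return uorfs

# Toy leader: only the first ATG has an in-frame stop within the leader.
leader = "GGATGAAATGACCATGCCC"
uorfs = find_uorfs(leader)
```

Counting the entries returned per transcript gives the "number of uORFs" feature that the abstract relates to translation of the downstream CDS.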

  15. Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure.

    PubMed

    Capra, John A; Laskowski, Roman A; Thornton, Janet M; Singh, Mona; Funkhouser, Thomas A

    2009-12-01

    Identifying a protein's functional sites is an important step towards characterizing its molecular function. Numerous structure- and sequence-based methods have been developed for this problem. Here we introduce ConCavity, a small molecule binding site prediction algorithm that integrates evolutionary sequence conservation estimates with structure-based methods for identifying protein surface cavities. In large-scale testing on a diverse set of single- and multi-chain protein structures, we show that ConCavity substantially outperforms existing methods for identifying both 3D ligand binding pockets and individual ligand binding residues. As part of our testing, we perform one of the first direct comparisons of conservation-based and structure-based methods. We find that the two approaches provide largely complementary information, which can be combined to improve upon either approach alone. We also demonstrate that ConCavity has state-of-the-art performance in predicting catalytic sites and drug binding pockets. Overall, the algorithms and analysis presented here significantly improve our ability to identify ligand binding sites and further advance our understanding of the relationship between evolutionary sequence conservation and structural and functional attributes of proteins. Data, source code, and prediction visualizations are available on the ConCavity web site (http://compbio.cs.princeton.edu/concavity/).

  16. An Algorithm for Motif Discovery with Iteration on Lengths of Motifs.

    PubMed

    Fan, Yetian; Wu, Wei; Yang, Jie; Yang, Wenyu; Liu, Rongrong

    2015-01-01

    Analysis of DNA sequence motifs is becoming increasingly important in the study of gene regulation, and the identification of motifs in DNA sequences is a complex problem in computational biology. Motif discovery has attracted the attention of more and more researchers, and a variety of algorithms have been proposed. Most existing motif discovery algorithms fix the motif's length as one of the input parameters. In this paper, a novel method is proposed to identify the optimal length of the motif and the optimal motif with that length, through an iteration process over increasing motif lengths. For each fixed length, a modified genetic algorithm (GA) is used to find the optimal motif with that length. Three operators are used in the modified GA: Mutation, which is similar to the one used in a standard GA but is modified to avoid local optima in our case, and Addition and Deletion, which are proposed by us for this problem. A criterion is given for singling out the optimal length among the increasing motif lengths. We call this method AMDILM (an algorithm for motif discovery with iteration on lengths of motifs). Experiments on simulated data and real biological data show that AMDILM can accurately identify the optimal motif length. Moreover, the optimal motifs discovered by AMDILM are consistent with the real ones and are similar to the motifs obtained by three well-known methods: Gibbs Sampler, MEME and Weeder. PMID:26357084

  17. Conservation and antigenicity of N-terminal sequences of GP185 from different Plasmodium falciparum isolates.

    PubMed

    Howard, R F; Ardeshir, F; Reese, R T

    1986-01-01

    Complementary DNA (cDNA) clones for GP185, a major antigenically diverse glycoprotein of Plasmodium falciparum, were isolated from a cDNA library of the Honduras I/CDC (Honduras I) isolate, and 1052 bp were sequenced. The expression of cDNA fragments in Escherichia coli using the vector pCQV2 allowed verification of the reading frame. This GP185 cDNA sequence, like the cDNA sequence for a homologous gene of the K1 isolate [Hall et al., Nature 311 (1984) 379-382], codes for a polypeptide which is truncated due to multiple, in-frame stop codons. This polypeptide corresponds to the N-terminal 15% of the proposed coding region of the GP185 gene [Holder et al., Nature 317 (1985) 270-273]. Comparison of the nucleotide sequences for the GP185 gene of Honduras I and five other isolates indicated that there are two areas of conserved DNA sequence, one of 310 bp (beginning 181 bp upstream from the proposed initiation codon) and the other of greater than or equal to 360 bp (located entirely within the coding region), separated by a region encoding isolate-specific tandem amino acid repeats. Rat antiserum was raised to a fusion protein derived from the conserved regions and the intervening repeat region of this Honduras I protein. This antiserum bound GP185 on immunoblots of the homologous Honduras I isolate and the heterologous K1 isolate, which has different tandem repeats. Serum from owl monkeys and humans previously infected with P. falciparum reacted with the fusion protein on immunoblots demonstrating that determinants in the N-terminal 15% of GP185 were immunogenic in infected individuals and suggesting that some of these sites are conserved among isolates. (ABSTRACT TRUNCATED AT 250 WORDS)

  18. Variation in conserved non-coding sequences on chromosome 5q and susceptibility to asthma and atopy

    SciTech Connect

    Donfack, Joseph; Schneider, Daniel H.; Tan, Zheng; Kurz, Thorsten; Dubchak, Inna; Frazer, Kelly A.; Ober, Carole

    2005-09-10

    Background: Evolutionarily conserved sequences likely have biological function. Methods: To determine whether variation in conserved sequences in non-coding DNA contributes to risk for human disease, we studied six conserved non-coding elements in the Th2 cytokine cluster on human chromosome 5q31 in a large Hutterite pedigree and in samples of outbred European American and African American asthma cases and controls. Results: Among six conserved non-coding elements (>100 bp, >70 percent identity; human-mouse comparison), we identified one single nucleotide polymorphism (SNP) in each of two conserved elements and six SNPs in the flanking regions of three conserved elements. We genotyped our samples for four of these SNPs and an additional three SNPs each in the IL13 and IL4 genes. While there was only modest evidence for association with single SNPs in the Hutterite and European American samples (P<0.05), there were highly significant associations in European Americans between asthma and haplotypes comprised of SNPs in the IL4 gene (P<0.001), including a SNP in a conserved non-coding element. Furthermore, variation in the IL13 gene was strongly associated with total IgE (P = 0.00022) and allergic sensitization to mold allergens (P = 0.00076) in the Hutterites, and more modestly associated with sensitization to molds in the European Americans and African Americans (P<0.01). Conclusion: These results indicate that there is overall little variation in the conserved non-coding elements on 5q31, but variation in IL4 and IL13, including possibly one SNP in a conserved element, influence asthma and atopic phenotypes in diverse populations.

  19. Characterization of G protein coupling mediated by the conserved D134^3.49 of DRY motif, M241^6.34, and F251^6.44 residues on human CXCR1

    PubMed Central

    Han, Xinbing; Feng, Yan; Chen, Xinhua; Gerard, Craig; Boisvert, William A.

    2015-01-01

    CXCR1, a receptor for interleukin-8 (IL-8), plays an important role in defending against pathogen invasion during neutrophil-mediated innate immune response. Human CXCR1 is a G protein-coupled receptor (GPCR) with its characteristic seven transmembrane domains (TMs). Functional and structural analyses of several GPCRs have revealed that conserved residues on TM3 (including the highly conserved Asp-Arg-Tyr (DRY) motif) and TM6 near intracellular loops contain domains critical for G protein coupling as well as GPCR activation. The objective of this study was to elucidate the role of critical amino acid residues on TM3 near intracellular loop 2 (i2) and TM6 near intracellular loop 3 (i3), including S132^3.47 (Baldwin location), D134^3.49, M241^6.34, and F251^6.44, in G protein coupling and CXCR1 activation. The results demonstrate that mutations of D134^3.49 at DRY motif of CXCR1 (D134N and D134V) completely abolished the ligand binding and functional response of the receptor. Additionally, point mutations at positions 241 and 251 between TM6 and i3 loop generated mutant receptors with modest constitutive activity via Gα15 signaling activation. Our results show that D134^3.49 on the highly conserved DRY motif has a distinct role for CXCR1 compared to its homologues (CXCR2 and KSHV-GPCR) in G protein coupling and receptor activation. In addition, M241^6.34 and F251^6.44 along with our previously identified V247^6.40 on TM6 are spatially located in a “hot spot” likely essential for CXCR1 activation. Identification of these amino acid residues may be useful for elucidating mechanism of CXCR1 activation and designing specific antagonists for the treatment of CXCR1-mediated diseases. PMID:25834784

  20. Conserved elements in the 3' untranslated region of flavivirus RNAs and potential cyclization sequences.

    PubMed

    Hahn, C S; Hahn, Y S; Rice, C M; Lee, E; Dalgarno, L; Strauss, E G; Strauss, J H

    1987-11-01

    We have isolated a cDNA clone after reverse transcription of the genomic RNA of Asibi yellow fever virus whose structure suggests it was formed by self-priming from a 3'-terminal hairpin of 87 nucleotides in the genomic RNA. We have also isolated a clone from cDNA made to Murray Valley encephalitis virus RNA that also appears to have arisen by self-priming from a 3'-terminal structure very similar or identical to that of yellow fever. In addition, 3'-terminal sequencing of the S1 strain of dengue 2 RNA shows that this RNA is also capable of forming a 3'-terminal hairpin of 79 nucleotides. Furthermore, we have identified two 20-nucleotide sequence elements which are present in the 3' untranslated region of all three viruses; one of these sequence elements is repeated in Murray Valley encephalitis and dengue 2 RNA but not in yellow fever RNA. In all three viruses, which represent the three major serological subgroups of the mosquito-borne flaviviruses, the 3'-proximal conserved sequence element, which is found immediately adjacent to the potential 3'-terminal hairpin, is complementary to another conserved domain near the 5' end of the viral RNAs, suggesting that flavivirus RNAs can cyclize (calculated delta G less than -11 kcal; 1 kcal = 4.184 kJ).
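
The complementarity between a 3'-proximal element and a 5'-proximal domain that underlies the proposed cyclization can be checked with a simple search for the longest stretch able to base-pair. This is a rough sketch under stated assumptions: `longest_pairing` is a hypothetical helper, only Watson-Crick pairs are considered (no G-U wobble, no mismatches or bulges), and no free energy is computed, so it does not reproduce the delta G estimate quoted in the abstract. The input sequences are toy strings, not viral sequences.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna):
    """Reverse complement of an RNA string (T is treated as U)."""
    return rna.upper().replace("T", "U").translate(COMPLEMENT)[::-1]

def longest_pairing(five_prime, three_prime):
    """Longest stretch of three_prime that can Watson-Crick pair with five_prime.

    Returns (length, position in five_prime, position in three_prime).
    """
    five = five_prime.upper().replace("T", "U")
    three = three_prime.upper().replace("T", "U")
    best = (0, -1, -1)
    for i in range(len(three)):
        # Only try stretches longer than the best one found so far.
        for j in range(i + best[0] + 1, len(three) + 1):
            pos = five.find(reverse_complement(three[i:j]))
            if pos == -1:
                break  # a longer stretch from i cannot pair either
            best = (j - i, pos, i)
    return best

# Toy example: a 7-nt element in the 3' region pairs with the 5' region.
result = longest_pairing("CCGCAUCAGCC", "CCCUGAUGCCC")
```

A real analysis would feed the paired regions into an RNA folding tool to estimate stability rather than relying on length alone.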

  1. A Short Sequence Motif in the 5′ Leader of the HIV-1 Genome Modulates Extended RNA Dimer Formation and Virus Replication

    PubMed Central

    van Bel, Nikki; Das, Atze T.; Cornelissen, Marion; Abbink, Truus E. M.; Berkhout, Ben

    2014-01-01

    The 5′ leader of the HIV-1 RNA genome encodes signals that control various steps in the replication cycle, including the dimerization initiation signal (DIS) that triggers RNA dimerization. The DIS folds a hairpin structure with a palindromic sequence in the loop that allows RNA dimerization via intermolecular kissing loop (KL) base pairing. The KL dimer can be stabilized by including the DIS stem nucleotides in the intermolecular base pairing, forming an extended dimer (ED). The role of the ED RNA dimer in HIV-1 replication has hardly been addressed because of technical challenges. We analyzed a set of leader mutants with a stabilized DIS hairpin for in vitro RNA dimerization and virus replication in T cells. In agreement with previous observations, DIS hairpin stability modulated KL and ED dimerization. An unexpected previous finding was that mutation of three nucleotides immediately upstream of the DIS hairpin significantly reduced in vitro ED formation. In this study, we tested such mutants in vivo for the importance of the ED in HIV-1 biology. Mutants with a stabilized DIS hairpin replicated less efficiently than WT HIV-1. This defect was most severe when the upstream sequence motif was altered. Virus evolution experiments with the defective mutants yielded fast replicating HIV-1 variants with second site mutations that (partially) restored the WT hairpin stability. Characterization of the mutant and revertant RNA molecules and the corresponding viruses confirmed the correlation between in vitro ED RNA dimer formation and efficient virus replication, thus indicating that the ED structure is important for HIV-1 replication. PMID:25368321

  2. The crystal structure of the extracellular 11-heme cytochrome UndA reveals a conserved 10-heme motif and defined binding site for soluble iron chelates.

    PubMed

    Edwards, Marcus J; Hall, Andrea; Shi, Liang; Fredrickson, James K; Zachara, John M; Butt, Julea N; Richardson, David J; Clarke, Thomas A

    2012-07-01

    Members of the genus Shewanella translocate deca- or undeca-heme cytochromes to the external cell surface thus enabling respiration using extracellular minerals and polynuclear Fe(III) chelates. The high resolution structure of the first undeca-heme outer membrane cytochrome, UndA, reveals a crossed heme chain with four potential electron ingress/egress sites arranged within four domains. Sequence and structural alignment of UndA and the deca-heme MtrF reveals the extra heme of UndA is inserted between MtrF hemes 6 and 7. The remaining UndA hemes can be superposed over the heme chain of the decaheme MtrF, suggesting that a ten heme core is conserved between outer membrane cytochromes. The UndA structure has also been crystallographically resolved in complex with substrates, an Fe(III)-nitrilotriacetate dimer or an Fe(III)-citrate trimer. The structural resolution of these UndA-Fe(III)-chelate complexes provides a rationale for previous kinetic measurements on UndA and other outer membrane cytochromes.

  3. The Crystal Structure of the Extracellular 11-heme Cytochrome UndA Reveals a Conserved 10-heme Motif and Defined Binding Site for Soluble Iron Chelates.

    SciTech Connect

    Edwards, Marcus; Hall, Andrea; Shi, Liang; Fredrickson, Jim K.; Zachara, John M.; Butt, Julea N.; Richardson, David; Clarke, Thomas A.

    2012-07-03

    Members of the genus Shewanella translocate deca- or undeca-heme cytochromes to the external cell surface thus enabling respiration using extracellular minerals and polynuclear Fe(III) chelates. The high resolution structure of the first undeca-heme outer membrane cytochrome, UndA, reveals a crossed heme chain with four potential electron ingress/egress sites arranged within four domains. Sequence and structural alignment of UndA and the deca-heme MtrF reveals the extra heme of UndA is inserted between MtrF hemes 6 and 7. The remaining UndA hemes can be superposed over the heme chain of the decaheme MtrF, suggesting that a ten heme core is conserved between outer membrane cytochromes. The UndA structure is the first outer membrane cytochrome to be crystallographically resolved in complex with substrates, an Fe(III)-nitrilotriacetate dimer or an Fe(III)-citrate trimer. The structural resolution of these UndA-Fe(III)-chelate complexes provides a rationale for previous kinetic measurements on UndA and other outer membrane cytochromes.

  4. Sample sequencing of vascular plants demonstrates widespread conservation and divergence of microRNAs.

    PubMed

    Chávez Montes, Ricardo A; de Fátima Rosas-Cárdenas, Flor; De Paoli, Emanuele; Accerbi, Monica; Rymarquis, Linda A; Mahalingam, Gayathri; Marsch-Martínez, Nayelli; Meyers, Blake C; Green, Pamela J; de Folter, Stefan

    2014-04-23

    Small RNAs are pivotal regulators of gene expression that guide transcriptional and post-transcriptional silencing mechanisms in eukaryotes, including plants. Here we report a comprehensive atlas of sRNA and miRNA from 3 species of algae and 31 representative species across vascular plants, including non-model plants. We sequence and quantify sRNAs from 99 different tissues or treatments across species, resulting in a data set of over 132 million distinct sequences. Using miRBase mature sequences as a reference, we identify the miRNA sequences present in these libraries. We apply diverse profiling methods to examine critical sRNA and miRNA features, such as size distribution, tissue-specific regulation and sequence conservation between species, as well as to predict putative new miRNA sequences. We also develop database resources, computational analysis tools and a dedicated website, http://smallrna.udel.edu/. This study provides new insights on plant sRNAs and miRNAs, and a foundation for future studies.

  5. Sample sequencing of vascular plants demonstrates widespread conservation and divergence of microRNAs.

    PubMed

    Chávez Montes, Ricardo A; de Fátima Rosas-Cárdenas, Flor; De Paoli, Emanuele; Accerbi, Monica; Rymarquis, Linda A; Mahalingam, Gayathri; Marsch-Martínez, Nayelli; Meyers, Blake C; Green, Pamela J; de Folter, Stefan

    2014-01-01

    Small RNAs are pivotal regulators of gene expression that guide transcriptional and post-transcriptional silencing mechanisms in eukaryotes, including plants. Here we report a comprehensive atlas of sRNA and miRNA from 3 species of algae and 31 representative species across vascular plants, including non-model plants. We sequence and quantify sRNAs from 99 different tissues or treatments across species, resulting in a data set of over 132 million distinct sequences. Using miRBase mature sequences as a reference, we identify the miRNA sequences present in these libraries. We apply diverse profiling methods to examine critical sRNA and miRNA features, such as size distribution, tissue-specific regulation and sequence conservation between species, as well as to predict putative new miRNA sequences. We also develop database resources, computational analysis tools and a dedicated website, http://smallrna.udel.edu/. This study provides new insights on plant sRNAs and miRNAs, and a foundation for future studies. PMID:24759728

  6. Highly conserved D-loop-like nuclear mitochondrial sequences (Numts) in tiger (Panthera tigris).

    PubMed

    Zhang, Wenping; Zhang, Zhihe; Shen, Fujun; Hou, Rong; Lv, Xiaoping; Yue, Bisong

    2006-08-01

    Using oligonucleotide primers designed to match hypervariable segments I (HVS-1) of Panthera tigris mitochondrial DNA (mtDNA), we amplified two different PCR products (500 bp and 287 bp) in the tiger (Panthera tigris), but got only one PCR product (287 bp) in the leopard (Panthera pardus). Sequence analyses indicated that the sequence of 287 bp was a D-loop-like nuclear mitochondrial sequence (Numts), indicating a nuclear transfer that occurred approximately 4.8-17 million years ago in the tiger and 4.6-16 million years ago in the leopard. Although the mtDNA D-loop sequence has a rapid rate of evolution, the 287-bp Numts are highly conserved; they are nearly identical in tiger subspecies and only 1.742% different between tiger and leopard. Thus, such sequences represent molecular 'fossils' that can shed light on evolution of the mitochondrial genome and may be the most appropriate outgroup for phylogenetic analysis. This is also proved by comparing the phylogenetic trees reconstructed using the D-loop sequence of snow leopard and the 287-bp Numts as outgroup.

  7. Highly conserved D-loop-like nuclear mitochondrial sequences (Numts) in tiger (Panthera tigris).

    PubMed

    Zhang, Wenping; Zhang, Zhihe; Shen, Fujun; Hou, Rong; Lv, Xiaoping; Yue, Bisong

    2006-08-01

    Using oligonucleotide primers designed to match hypervariable segments I (HVS-1) of Panthera tigris mitochondrial DNA (mtDNA), we amplified two different PCR products (500 bp and 287 bp) in the tiger (Panthera tigris), but got only one PCR product (287 bp) in the leopard (Panthera pardus). Sequence analyses indicated that the sequence of 287 bp was a D-loop-like nuclear mitochondrial sequence (Numts), indicating a nuclear transfer that occurred approximately 4.8-17 million years ago in the tiger and 4.6-16 million years ago in the leopard. Although the mtDNA D-loop sequence has a rapid rate of evolution, the 287-bp Numts are highly conserved; they are nearly identical in tiger subspecies and only 1.742% different between tiger and leopard. Thus, such sequences represent molecular 'fossils' that can shed light on evolution of the mitochondrial genome and may be the most appropriate outgroup for phylogenetic analysis. This is also proved by comparing the phylogenetic trees reconstructed using the D-loop sequence of snow leopard and the 287-bp Numts as outgroup. PMID:17072079

  8. Coagulase and Efb of Staphylococcus aureus Have a Common Fibrinogen Binding Motif

    PubMed Central

    Ko, Ya-Ping; Kang, Mingsong; Ganesh, Vannakambadi K.; Ravirajan, Dharmanand; Li, Bin

    2016-01-01

    Coagulase (Coa) and Efb, secreted Staphylococcus aureus proteins, are important virulence factors in staphylococcal infections. Coa interacts with fibrinogen (Fg) and induces the formation of fibrin(ogen) clots through activation of prothrombin. Efb attracts Fg to the bacterial surface and forms a shield to protect the bacteria from phagocytic clearance. This communication describes the use of an array of synthetic peptides to identify variants of a linear Fg binding motif present in Coa and Efb which are responsible for the Fg binding activities of these proteins. This motif represents the first Fg binding motif identified for any microbial protein. We initially located the Fg binding sites to Coa’s C-terminal disordered segment containing tandem repeats by using recombinant fragments of Coa in enzyme-linked immunosorbent assay-type binding experiments. Sequence analyses revealed that this Coa region contained shorter segments with sequences similar to the Fg binding segments in Efb. An alanine scanning approach allowed us to identify the residues in Coa and Efb that are critical for Fg binding and to define the Fg binding motifs in the two proteins. In these motifs, the residues required for Fg binding are largely conserved, and they therefore constitute variants of a common Fg binding motif which binds to Fg with high affinity. Defining a specific motif also allowed us to identify a functional Fg binding register for the Coa repeats that is different from the repeat unit previously proposed. PMID:26733070

  9. [A primary study of evolution of hepatitis B virus based on motif discovery].

    PubMed

    Ma, Lei; Yi, Qing-Qing; Zhang, Qi; He, Jian-Feng

    2014-01-01

    Hepatitis B is a serious infectious disease worldwide, and hepatitis B virus (HBV) is the direct cause of this disease. In recent years, as an essential part of its evolutionary process, HBV mutation has been extensively studied domestically and globally. However, the study of conserved sequences in HBV is still in its infancy. In this study, we applied the multiple EM for motif elicitation (MEME) algorithm to discover HBV motifs and proposed a new metric, the conservative index (CI), to carry out phylogenetic analysis based on HBV sequences. Then, the constructed phylogenetic tree was subjected to reliability assessment. The results demonstrated that the new metric CI combined with the MEME algorithm can effectively help to discover motifs in HBV sequences, to construct a phylogenetic tree based on them and to analyze the evolutionary relationship between HBV sequences; in addition, the possible ancestral sequences of samples may be obtained by conservative analysis. The proposed method is valuable for the exploratory study of large HBV sequence data sets. PMID:24772892

  10. MotifMiner: A Table Driven Greedy Algorithm for DNA Motif Mining

    NASA Astrophysics Data System (ADS)

    Seeja, K. R.; Alam, M. A.; Jain, S. K.

    DNA motif discovery is a much-explored problem in functional genomics. This paper describes a table-driven greedy algorithm for discovering regulatory motifs in the promoter sequences of co-expressed genes. The proposed algorithm searches both DNA strands for common patterns, or motifs. The inputs to the algorithm are a set of promoter sequences, the motif length and the minimum Information Content. The algorithm generates subsequences of the given length from the shortest input promoter sequence. It stores these subsequences and their reverse complements in a table. Then it searches the remaining sequences for good matches of these subsequences. The Information Content score is used to measure the goodness of the motifs. The algorithm has been tested with synthetic data and real data, and the results are promising. The algorithm could discover meaningful motifs from the muscle-specific regulatory sequences.
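
    The steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of Hamming distance to define a "good match", and the Information Content formula (2 bits per column with a uniform background and no small-sample correction) are all assumptions made for the example.

```python
import math
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def best_site(seed, seed_rc, seq):
    """Best-matching window of seq on either strand, oriented like the seed."""
    candidates = []
    for window in kmers(seq, len(seed)):
        candidates.append((hamming(window, seed), window))
        # A match to the stored reverse complement is a minus-strand site.
        candidates.append((hamming(window, seed_rc), reverse_complement(window)))
    return min(candidates)[1]

def information_content(sites):
    """Sum over columns of 2 + sum(p * log2 p), assuming a uniform background."""
    n = len(sites)
    total = 0.0
    for column in zip(*sites):
        freqs = [count / n for count in Counter(column).values()]
        total += 2.0 + sum(p * math.log2(p) for p in freqs)
    return total

def mine_motifs(sequences, k, min_ic):
    """Greedy search seeded from the shortest sequence (hypothetical helper)."""
    shortest = min(range(len(sequences)), key=lambda i: len(sequences[i]))
    others = [s for i, s in enumerate(sequences) if i != shortest]
    # Table of candidate subsequences and their reverse complements.
    table = {seed: reverse_complement(seed) for seed in kmers(sequences[shortest], k)}
    motifs = []
    for seed, seed_rc in table.items():
        sites = [seed] + [best_site(seed, seed_rc, s) for s in others]
        ic = information_content(sites)
        if ic >= min_ic:
            motifs.append((seed, ic, sites))
    return sorted(motifs, key=lambda m: m[1], reverse=True)
```

    Calling mine_motifs(promoters, 6, 11.5) on a list of upper-case A/C/G/T strings would return the 6-mers from the shortest promoter whose best matches in every other promoter are nearly identical, together with their aligned sites.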

  11. Mitochondrial genome sequences illuminate maternal lineages of conservation concern in a rare carnivore

    PubMed Central

    2011-01-01

    Background Science-based wildlife management relies on genetic information to infer population connectivity and identify conservation units. The most commonly used genetic marker for characterizing animal biodiversity and identifying maternal lineages is the mitochondrial genome. Mitochondrial genotyping figures prominently in conservation and management plans, with much of the attention focused on the non-coding displacement ("D") loop. We used massively parallel multiplexed sequencing to sequence complete mitochondrial genomes from 40 fishers, a threatened carnivore that possesses low mitogenomic diversity. This allowed us to test a key assumption of conservation genetics, specifically, that the D-loop accurately reflects genealogical relationships and variation of the larger mitochondrial genome. Results Overall mitogenomic divergence in fishers is exceedingly low, with 66 segregating sites and an average pairwise distance between genomes of 0.00088 across their aligned length (16,290 bp). Estimates of variation and genealogical relationships from the displacement (D) loop region (299 bp) are contradicted by the complete mitochondrial genome, as well as the protein coding fraction of the mitochondrial genome. The sources of this contradiction trace primarily to the near-absence of mutations marking the D-loop region of one of the most divergent lineages, and secondarily to independent (recurrent) mutations at two nucleotide positions in the D-loop amplicon. Conclusions Our study has two important implications. First, inferred genealogical reconstructions based on the fisher D-loop region contradict inferences based on the entire mitogenome to the point that the populations of greatest conservation concern cannot be accurately resolved. Whole-genome analysis identifies Californian haplotypes from the northern-most populations as highly distinctive, with a significant excess of amino acid changes that may be indicative of molecular adaptation; D-loop sequences fail

  12. 16S ribosomal RNA pseudouridine synthase RsuA of Escherichia coli: deletion, mutation of the conserved Asp102 residue, and sequence comparison among all other pseudouridine synthases.

    PubMed

    Conrad, J; Niu, L; Rudd, K; Lane, B G; Ofengand, J

    1999-06-01

    The gene for RsuA, the pseudouridine synthase that converts U516 to pseudouridine in 16S ribosomal RNA of Escherichia coli, has been deleted in strains MG1655 and BL21/DE3. Deletion of this gene resulted in the specific loss of pseudouridine516 in both cell lines, and replacement of the gene in trans on a plasmid restored the pseudouridine. Therefore, rsuA is the only gene in E. coli with the ability to produce a protein capable of forming pseudouridine516. There was no effect on the growth rate of rsuA- MG1655 either in rich or minimal medium at either 24, 37, or 42 degrees C. Plasmid rescue of the BL21/DE3 rsuA- strain using pET15b containing an rsuA gene with aspartate102 replaced by asparagine or threonine demonstrated that neither mutant was active in vivo. This result supports a role for this aspartate, located in a unique GRLD sequence in this gene, at the catalytic center of the synthase. Induction of wild-type and the two mutant synthases in strain BL21/DE3 from genes in pET15b yielded a strong overexpression of all three proteins in approximately equal amounts showing that the mutations did not affect production of the protein in vivo and thus that the lack of activity was not due to a failure to produce a gene product. Aspartate102 is found in a conserved motif present in many pseudouridine synthases. The conservation and distribution of this motif in nature was assessed.

  14. Staufen1 dimerizes through a conserved motif and a degenerate dsRNA-binding domain to promote mRNA decay.

    PubMed

    Gleghorn, Michael L; Gong, Chenguang; Kielkopf, Clara L; Maquat, Lynne E

    2013-04-01

    Staufen1 (STAU1)-mediated mRNA decay (SMD) degrades mammalian-cell mRNAs that bind the double-stranded RNA (dsRNA)-binding protein STAU1 in their 3' untranslated region. We report a new motif, which typifies STAU homologs from all vertebrate classes, that is responsible for human STAU1 (hSTAU1) homodimerization. Our crystal structure and mutagenesis analyses reveal that this motif, which we named the Staufen-swapping motif (SSM), and the dsRNA-binding domain 5 ('RBD'5) mediate protein dimerization: the two SSM α-helices of one molecule interact primarily through a hydrophobic patch with the two 'RBD'5 α-helices of a second molecule. 'RBD'5 adopts the canonical α-β-β-β-α fold of a functional RBD, but it lacks residues and features required to bind duplex RNA. In cells, SSM-mediated hSTAU1 dimerization increases the efficiency of SMD by augmenting hSTAU1 binding to the ATP-dependent RNA helicase hUPF1. Dimerization regulates keratinocyte-mediated wound healing and many other cellular processes. PMID:23524536

  15. Cytochrome Oxidase I (COI) sequence conservation and variation patterns in the yellowfin and longtail tunas.

    PubMed

    Kunal, Swaraj Priyaranjan; Kumar, Girish

    2013-01-01

    Tunas are a commercially important fishery resource worldwide. There are at least 13 species of tuna belonging to three genera, of which the genus Thunnus has the most, with eight species. On the basis of where they are found, they can be characterised as oceanic, such as Thunnus albacares (yellowfin tuna), or coastal, such as Thunnus tonggol (longtail tuna). Although these two are different species, morphological differentiation can only be seen in mature individuals; hence misidentification may result in erroneous data sets, which ultimately affect conservation strategies. The mitochondrial DNA cytochrome oxidase c subunit 1 (COI) gene is one of the most popular markers for population genetic and phylogeographic studies across the animal kingdom. The present study examines sequence conservation and variation in mitochondrial Cytochrome Oxidase I (COI) between these two species of tuna. COI sequence analysis of yellowfin and longtail tuna revealed the close relationship between them within the genus Thunnus. The present study is the first direct comparison of mitochondrial COI sequences of these two tuna species. PMID:23649742

  16. An rRNA variable region has an evolutionarily conserved essential role despite sequence divergence.

    PubMed Central

    Sweeney, R; Chen, L; Yao, M C

    1994-01-01

    Regions extremely variable in size and sequence occur at conserved locations in eukaryotic rRNAs. The functional importance of one such region was determined by gene reconstruction and replacement in Tetrahymena thermophila. Deletion of the D8 region of the large-subunit rRNA inactivates T. thermophila rRNA genes (rDNA): transformants containing only this type of rDNA are unable to grow. Replacement with an unrelated sequence of similar size or a variable region from a different position in the rRNA also inactivated the rDNA. Mutant rRNAs resulting from such constructs were present only in precursor forms, suggesting that these rRNAs are deficient in either processing or stabilization of the mature form. Replacement with D8 regions from three other organisms restored function, even though the sequences are very different. Thus, these D8 regions share an essential functional feature that is not reflected in their primary sequences. Similar tertiary structures may be the quality these sequences share that allows them to function interchangeably. Images PMID:8196658

  17. Substitution of a conserved cysteine-996 in a cysteine-rich motif of the laminin α2-chain in congenital muscular dystrophy with partial deficiency of the protein

    SciTech Connect

    Nissinen, M.; Xu Zhang; Tryggvason, K.

    1996-06-01

    Congenital muscular dystrophies (CMDs) are autosomal recessive muscle disorders of early onset. Approximately half of CMD patients present laminin α2-chain (merosin) deficiency in muscle biopsies, and the disease locus has been mapped to the region of the LAMA2 gene (6q22-23) in several families. Recently, two nonsense mutations in the laminin α2-chain gene were identified in CMD patients exhibiting complete deficiency of the laminin α2-chain in muscle biopsies. However, a subset of CMD patients with linkage to LAMA2 show only partial absence of the laminin α2-chain around muscle fibers, by immunocytochemical analysis. In the present study we have identified a homozygous missense mutation in the α2-chain gene of a consanguineous Turkish family with partial laminin α2-chain deficiency. The T→C transition at position 3035 in the cDNA sequence results in a Cys996→Arg substitution. The mutation that affects one of the conserved cysteine-rich repeats in the short arm of the laminin α2-chain should result in normal synthesis of the chain and in formation and secretion of a heterotrimeric laminin molecule. Muscular dysfunction is possibly caused either by abnormal disulfide cross-links and folding of the laminin repeat, leading to the disturbance of an as yet unknown binding function of the laminin α2-chain and to shorter half-life of the muscle-specific laminin-2 and laminin-4 isoforms, or by increased proteolytic sensitivity, leading to truncation of the short arm. 42 refs., 7 figs.

  18. A Collection of Conserved Noncoding Sequences to Study Gene Regulation in Flowering Plants.

    PubMed

    Van de Velde, Jan; Van Bel, Michiel; Vaneechoutte, Dries; Vandepoele, Klaas

    2016-08-01

    Transcription factors (TFs) regulate gene expression by binding cis-regulatory elements, of which the identification remains an ongoing challenge owing to the prevalence of large numbers of nonfunctional TF binding sites. Powerful comparative genomics methods, such as phylogenetic footprinting, can be used for the detection of conserved noncoding sequences (CNSs), which are functionally constrained and can greatly help in reducing the number of false-positive elements. In this study, we applied a phylogenetic footprinting approach for the identification of CNSs in 10 dicot plants, yielding 1,032,291 CNSs associated with 243,187 genes. To annotate CNSs with TF binding sites, we made use of binding site information for 642 TFs originating from 35 TF families in Arabidopsis (Arabidopsis thaliana). In three species, the identified CNSs were evaluated using TF chromatin immunoprecipitation sequencing data, resulting in significant overlap for the majority of data sets. To identify ultraconserved CNSs, we included genomes of additional plant families and identified 715 binding sites for 501 genes conserved in dicots, monocots, mosses, and green algae. Additionally, we found that genes that are part of conserved mini-regulons have a higher coherence in their expression profile than other divergent gene pairs. All identified CNSs were integrated in the PLAZA 3.0 Dicots comparative genomics platform (http://bioinformatics.psb.ugent.be/plaza/versions/plaza_v3_dicots/) together with new functionalities facilitating the exploration of conserved cis-regulatory elements and their associated genes. The availability of this data set in a user-friendly platform enables the exploration of functional noncoding DNA to study gene regulation in a variety of plant species, including crops. PMID:27261064

  19. A Collection of Conserved Noncoding Sequences to Study Gene Regulation in Flowering Plants

    PubMed Central

    2016-01-01

    Transcription factors (TFs) regulate gene expression by binding cis-regulatory elements, of which the identification remains an ongoing challenge owing to the prevalence of large numbers of nonfunctional TF binding sites. Powerful comparative genomics methods, such as phylogenetic footprinting, can be used for the detection of conserved noncoding sequences (CNSs), which are functionally constrained and can greatly help in reducing the number of false-positive elements. In this study, we applied a phylogenetic footprinting approach for the identification of CNSs in 10 dicot plants, yielding 1,032,291 CNSs associated with 243,187 genes. To annotate CNSs with TF binding sites, we made use of binding site information for 642 TFs originating from 35 TF families in Arabidopsis (Arabidopsis thaliana). In three species, the identified CNSs were evaluated using TF chromatin immunoprecipitation sequencing data, resulting in significant overlap for the majority of data sets. To identify ultraconserved CNSs, we included genomes of additional plant families and identified 715 binding sites for 501 genes conserved in dicots, monocots, mosses, and green algae. Additionally, we found that genes that are part of conserved mini-regulons have a higher coherence in their expression profile than other divergent gene pairs. All identified CNSs were integrated in the PLAZA 3.0 Dicots comparative genomics platform (http://bioinformatics.psb.ugent.be/plaza/versions/plaza_v3_dicots/) together with new functionalities facilitating the exploration of conserved cis-regulatory elements and their associated genes. The availability of this data set in a user-friendly platform enables the exploration of functional noncoding DNA to study gene regulation in a variety of plant species, including crops. PMID:27261064

  20. A highly conserved repeated chromosomal sequence in the radioresistant bacterium Deinococcus radiodurans SARK.

    PubMed

    Lennon, E; Gutman, P D; Yao, H L; Minton, K W

    1991-03-01

    A DNA fragment containing a portion of a DNA damage-inducible gene from Deinococcus radiodurans SARK hybridized to numerous fragments of SARK genomic DNA because of a highly conserved repetitive chromosomal element. The element is of variable length, ranging from 150 to 192 bp, depending on the absence or presence of one or two 21-bp sequences located internally. A putative translational start site of the damage-inducible gene is within the reiterated element. The element contains dyad symmetries that suggest modes of transcriptional and/or translational control.

  1. SA-Mot: a web server for the identification of motifs of interest extracted from protein loops.

    PubMed

    Regad, Leslie; Saladin, Adrien; Maupetit, Julien; Geneix, Colette; Camproux, Anne-Claude

    2011-07-01

    The detection of functional motifs is an important step for the determination of protein functions. We present here a new web server SA-Mot (Structural Alphabet Motif) for the extraction and location of structural motifs of interest from protein loops. Contrary to other methods, SA-Mot does not focus only on functional motifs, but it extracts recurrent and conserved structural motifs involved in structural redundancy of loops. SA-Mot uses the structural word notion to extract all structural motifs from uni-dimensional sequences corresponding to loop structures. Then, SA-Mot provides a description of these structural motifs using statistics computed in the loop data set and in SCOP superfamily, sequence and structural parameters. SA-Mot results correspond to an interactive table listing all structural motifs extracted from a target structure and their associated descriptors. Using this information, the users can easily locate loop regions that are important for the protein folding and function. The SA-Mot web server is available at http://sa-mot.mti.univ-paris-diderot.fr.

  2. Sequence recombination and conservation of Varroa destructor virus-1 and deformed wing virus in field collected honey bees (Apis mellifera).

    PubMed

    Wang, Hui; Xie, Jiazheng; Shreeve, Tim G; Ma, Jinmin; Pallett, Denise W; King, Linda A; Possee, Robert D

    2013-01-01

    We sequenced small (s) RNAs from field collected honeybees (Apis mellifera) and bumblebees (Bombus pascuorum) using the Illumina technology. The sRNA reads were assembled and resulting contigs were used to search for virus homologues in GenBank. Matches with Varroa destructor virus-1 (VDV1) and Deformed wing virus (DWV) genomic sequences were obtained for A. mellifera but not B. pascuorum. Further analyses suggested that the prevalent virus population was composed of VDV-1 and a chimera of 5'-DWV-VDV1-DWV-3'. The recombination junctions in the chimera genomes were confirmed by using RT-PCR, cDNA cloning and Sanger sequencing. We then focused on conserved short fragments (CSF, size > 25 nt) in the virus genomes by using GenBank sequences and the deep sequencing data obtained in this study. The majority of CSF sites confirmed conservation at both between-species (GenBank sequences) and within-population (dataset of this study) levels. However, conserved nucleotide positions in the GenBank sequences might be variable at the within-population level. High mutation rates (Pi>10%) were observed at a number of sites using the deep sequencing data, suggesting that sequence conservation might not always be maintained at the population level. Virus-host interactions and strategies for developing RNAi treatments against VDV1/DWV infections are discussed.

  3. Conserved DNA sequences adjacent to chromosome fragmentation and telomere addition sites in Euplotes crassus.

    PubMed

    Klobutcher, L A; Gygax, S E; Podoloff, J D; Vermeesch, J R; Price, C M; Tebeau, C M; Jahn, C L

    1998-09-15

    During the formation of a new macronucleus in the ciliate Euplotes crassus, micronuclear chromosomes are reproducibly broken at approximately 10 000 sites. This chromosome fragmentation process is tightly coupled with de novo telomere synthesis by the telomerase ribonucleoprotein complex, generating short linear macronuclear DNA molecules. In this study, the sequences of 58 macronuclear DNA termini and eight regions of the micronuclear genome containing chromosome fragmentation/telomere addition sites were determined. Through a statistically based analysis of these data, along with previously published sequences, we have defined a 10 bp conserved sequence element (E-Cbs, 5'-HATTGAAaHH-3', H = A, C or T) near chromosome fragmentation sites. The E-Cbs typically resides within the DNA destined to form a macronuclear DNA molecule, but can also reside within flanking micronuclear DNA that is eliminated during macronuclear development. The location of the E-Cbs in macronuclear-destined versus flanking micronuclear DNA leads us to propose a model of chromosome fragmentation that involves a 6 bp staggered cut in the chromosome. The identification of adjacent macronuclear-destined sequences that overlap by 6 bp provides support for the model. Finally, our data provide evidence that telomerase is able to differentiate between newly generated ends that contain partial telomeric repeats and those that do not in vivo.
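
    A degenerate consensus such as the E-Cbs element can be located in a sequence by expanding its IUPAC codes into a regular expression. The sketch below is illustrative only: the function names are hypothetical, the lowercase "a" in 5'-HATTGAAaHH-3' is simply treated as A (the abstract does not say what the lowercase denotes), and only the given strand is scanned.

```python
import re

# Standard IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(consensus):
    """Translate an IUPAC consensus into a regex pattern string."""
    return "".join(IUPAC[base] for base in consensus.upper())

def find_sites(consensus, seq):
    """Return (start, matched_text) for every match, including overlapping ones."""
    # The lookahead lets matches that overlap each other all be reported.
    pattern = re.compile("(?=(" + iupac_to_regex(consensus) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]

E_CBS = "HATTGAAaHH"  # consensus as written in the abstract; 'a' treated as A
```

    For example, find_sites(E_CBS, "GGCATTGAAACTGG") reports one site starting at index 2, whereas a G in the first degenerate position (H excludes G) yields no match.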

  4. Effects of Mutation in the Conserved GTSRH Sequence of the Motor Protein Prestin on Its Characteristics

    NASA Astrophysics Data System (ADS)

    Kumano, Shun; Iida, Koji; Murakoshi, Michio; Naito, Naoyuki; Tsumoto, Kouhei; Ikeda, Katsuhisa; Kumagai, Izumi; Kobayashi, Toshimitsu; Wada, Hiroshi

    Prestin is a motor protein responsible for the outer hair cell (OHC) electromotility which amplifies the vibration of the organ of Corti in the inner ear. Identification of the functional significance of particular amino acids is necessary to characterize prestin. In this study, an attempt was made to clarify the role of the GTSRH sequence at positions 127-131 in prestin conserved in six proteins of the solute carrier (SLC) 26 family of which prestin is a member. To elucidate what role that sequence plays in the characteristics of prestin, mutations were introduced into the sequence and the characteristics of the constructed point mutants were investigated by Western blotting, immunofluorescence experiments and the whole-cell patch-clamp technique. The localization of T128A was altered, the anion transport function of H131A and that of S129T were lost and such functions of G127A, T128A, S129A and R130A declined. These results suggest that the GTSRH sequence plays an important role in the localization of prestin, as well as in its anion transport function.

  5. Mining, compressing and classifying with extensible motifs

    PubMed Central

    Apostolico, Alberto; Comin, Matteo; Parida, Laxmi

    2006-01-01

    Background Motif patterns of maximal saturation emerged originally in contexts of pattern discovery in biomolecular sequences and have recently proven a valuable notion also in the design of data compression schemes. Informally, a motif is a string of intermittently solid and wild characters that recurs more or less frequently in an input sequence or family of sequences. Motif discovery techniques and tools tend to be computationally imposing; however, special classes of "rigid" motifs have been identified whose discovery is affordable in low polynomial time. Results In the present work, "extensible" motifs are considered such that each sequence of gaps comes endowed with some elasticity, whereby the same pattern may be stretched to fit segments of the source that match all the solid characters but are otherwise of different lengths. A few applications of this notion are then described. In applications of data compression by textual substitution, extensible motifs are seen to bring savings on the size of the codebook, and hence to improve compression. In germane contexts, in which compressibility is used in its dual role as a basis for structural inference and classification, extensible motifs are seen to support unsupervised classification and phylogeny reconstruction. Conclusion Off-line compression based on extensible motifs can be used advantageously to compress and classify biological sequences. PMID:16722593
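
    The difference between a rigid and an extensible motif can be illustrated with a small matcher. This is a sketch under stated assumptions, not the paper's discovery algorithm: the token notation (solid strings interleaved with (min_gap, max_gap) tuples) and the function names are invented for the example, and matching is delegated to Python regular expressions.

```python
import re

def compile_extensible(tokens):
    """tokens: solid strings and (min_gap, max_gap) tuples, in order."""
    parts = []
    for token in tokens:
        if isinstance(token, str):
            parts.append(re.escape(token))          # solid characters
        else:
            low, high = token
            parts.append(".{%d,%d}?" % (low, high))  # elastic run of wildcards
    # Lookahead so that overlapping occurrences are all reported.
    return re.compile("(?=(" + "".join(parts) + "))")

def occurrences(tokens, text):
    """Return (start, matched_text) for each start position that matches."""
    return [(m.start(), m.group(1)) for m in compile_extensible(tokens).finditer(text)]

text = "ACTGTCCACTTTGT"
rigid = ["AC", (1, 1), "GT"]        # gap of exactly one wildcard
extensible = ["AC", (1, 3), "GT"]   # gap may stretch from one to three
```

    On this text the rigid pattern matches only at position 0, while the extensible pattern also covers the stretched occurrence at position 7, so a single codebook entry accounts for both segments.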

  6. Discriminative motif optimization based on perceptron training

    PubMed Central

    Patel, Ronak Y.; Stormo, Gary D.

    2014-01-01

    Motivation: Generating accurate transcription factor (TF) binding site motifs from data generated using next-generation sequencing, especially ChIP-seq, is challenging. The challenge arises because a typical experiment reports a large number of sequences bound by a TF, and the length of each sequence is relatively long. Most traditional motif finders are slow in handling such enormous amounts of data. To overcome this limitation, tools have been developed that trade accuracy for speed by using heuristic discrete search strategies or limited optimization of identified seed motifs. However, such strategies may not fully use the information in input sequences to generate motifs. Such motifs often form good seeds and can be further improved with appropriate scoring functions and rapid optimization. Results: We report a tool named discriminative motif optimizer (DiMO). DiMO takes a seed motif along with a positive and a negative database and improves the motif based on a discriminative strategy. We use area under receiver-operating characteristic curve (AUC) as a measure of discriminating power of motifs and a strategy based on perceptron training that maximizes AUC rapidly in a discriminative manner. Using DiMO, on a large test set of 87 TFs from human, drosophila and yeast, we show that it is possible to significantly improve motifs identified by nine motif finders. The motifs are generated/optimized using training sets and evaluated on test sets. The AUC is improved for almost 90% of the TFs on test sets and the magnitude of increase is up to 39%. Availability and implementation: DiMO is available at http://stormo.wustl.edu/DiMO Contact: rpatel@genetics.wustl.edu, ronakypatel@gmail.com PMID:24369152
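
    The idea of scoring a motif by AUC and nudging it with perceptron-style updates can be sketched as below. This is not DiMO's code: the abstract does not give its scoring function or update rule, so the additive weight matrix, the best-window sequence score and the pairwise update applied to misranked positive/negative pairs are assumptions chosen to keep the example small.

```python
from itertools import product

BASES = "ACGT"

def seed_matrix(consensus):
    """Weight matrix scoring 1.0 for the seed base at each position, else 0.0."""
    return [{b: (1.0 if b == c else 0.0) for b in BASES} for c in consensus]

def best_window(weights, seq):
    """Return (score, window) for the highest-scoring window of seq."""
    k = len(weights)
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    scored = [(sum(col[b] for col, b in zip(weights, win)), win) for win in windows]
    return max(scored, key=lambda pair: pair[0])

def auc(weights, positives, negatives):
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [best_window(weights, s)[0] for s in positives]
    neg = [best_window(weights, s)[0] for s in negatives]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

def perceptron_epoch(weights, positives, negatives, lr=1.0):
    """One pass of pairwise updates over misranked pairs (assumed rule)."""
    for p, n in product(positives, negatives):
        score_p, win_p = best_window(weights, p)
        score_n, win_n = best_window(weights, n)
        if score_p <= score_n:
            # Reward the positive site and penalise the negative site.
            for col, base_p, base_n in zip(weights, win_p, win_n):
                col[base_p] += lr
                col[base_n] -= lr
    return weights
```

    Starting from seed_matrix("ACT"), with positives that contain ACG and a negative that contains ACT, one epoch shifts weight from T to G in the third column, and the AUC on that toy set rises from 0.0 to 1.0. Real use would evaluate the AUC on a held-out test set, as the abstract describes.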

  7. Genome-Wide Analysis of Ethylene-Responsive Element Binding Factor-Associated Amphiphilic Repression Motif-Containing Transcriptional Regulators in Arabidopsis

    PubMed Central

    Kagale, Sateesh; Links, Matthew G.; Rozwadowski, Kevin

    2010-01-01

    The ethylene-responsive element binding factor-associated amphiphilic repression (EAR) motif is a transcriptional regulatory motif identified in members of the ethylene-responsive element binding factor, C2H2, and auxin/indole-3-acetic acid families of transcriptional regulators. Sequence comparison of the core EAR motif sites from these proteins revealed two distinct conservation patterns: LxLxL and DLNxxP. Proteins containing these motifs play key roles in diverse biological functions by negatively regulating genes involved in developmental, hormonal, and stress signaling pathways. Through a genome-wide bioinformatics analysis, we have identified the complete repertoire of the EAR repressome in Arabidopsis (Arabidopsis thaliana) comprising 219 proteins belonging to 21 different transcriptional regulator families. Approximately 72% of these proteins contain a LxLxL type of EAR motif, 22% contain a DLNxxP type of EAR motif, and the remaining 6% have a motif where LxLxL and DLNxxP are overlapping. Published in vitro and in planta investigations support approximately 40% of these proteins functioning as negative regulators of gene expression. Comparative sequence analysis of EAR motif sites and adjoining regions has identified additional preferred residues and potential posttranslational modification sites that may influence the functionality of the EAR motif. Homology searches against protein databases of poplar (Populus trichocarpa), grapevine (Vitis vinifera), rice (Oryza sativa), and sorghum (Sorghum bicolor) revealed that the EAR motif is conserved across these diverse plant species. This genome-wide analysis represents the most extensive survey of EAR motif-containing proteins in Arabidopsis to date and provides a resource enabling investigations into their biological roles and the mechanism of EAR motif-mediated transcriptional regulation. PMID:20097792
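
    The two conservation patterns named in this abstract, LxLxL and DLNxxP (with x standing for any residue), translate directly into regular expressions. The sketch below only shows how such patterns could be scanned for in a protein sequence; the function name and the example sequence are hypothetical, and the authors' genome-wide pipeline is not described in enough detail here to reproduce.

```python
import re

# 'x' in the motif notation means any amino acid, written '.' in a regex.
# Lookaheads allow overlapping hits to be reported.
EAR_PATTERNS = {
    "LxLxL": re.compile(r"(?=(L.L.L))"),
    "DLNxxP": re.compile(r"(?=(DLN..P))"),
}

def find_ear_motifs(protein):
    """Return (pattern_name, start, matched_text) tuples ordered by position."""
    hits = []
    for name, pattern in EAR_PATTERNS.items():
        for m in pattern.finditer(protein.upper()):
            hits.append((name, m.start(), m.group(1)))
    return sorted(hits, key=lambda hit: hit[1])
```

    For the made-up sequence "MKLDLNLAPGG", the scan reports an LxLxL site at index 2 and a DLNxxP site at index 3, an instance of the overlapping arrangement that the abstract reports for about 6% of the proteins.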

  8. Peptide Vocabulary Analysis Reveals Ultra-Conservation and Homonymity in Protein Sequences

    PubMed Central

    Gatherer, Derek

    2007-01-01

    A new algorithm is presented for vocabulary analysis (word detection) in texts of human origin. It performs at 60%–70% overall accuracy and greater than 80% accuracy for longer words, and approximately 85% sensitivity on Alice in Wonderland, a considerable improvement on previous methods. When applied to protein sequences, it detects short sequences analogous to words in human texts, i.e. intolerant to changes in spelling (mutation), and relatively context-independent in their meaning (function). Some of these are homonyms of up to 7 amino acids, which can assume different structures in different proteins. Others are ultra-conserved stretches of up to 18 amino acids within proteins of less than 40% overall identity, reflecting extreme constraint or convergent evolution. Different species are found to have qualitatively different major peptide vocabularies, e.g. some are dominated by large gene families, while others are rich in simple repeats or dominated by internally repetitive proteins. This suggests the possibility of a peptide vocabulary signature, analogous to genome signatures in DNA. Homonyms may be useful in detecting convergent evolution and positive selection in protein evolution. Ultra-conserved words may be useful in identifying structures intolerant to substitution over long periods of evolutionary time. PMID:20066129

  9. Identification of conservative microRNAs in Saanen dairy goat testis through deep sequencing.

    PubMed

    Wu, J; Zhu, H; Song, W; Li, M; Liu, C; Li, N; Tang, F; Mu, H; Liao, M; Li, X; Guan, W; Li, X; Hua, J

    2014-02-01

    MicroRNAs (miRNAs) are small non-coding RNA molecules that function as important gene expression regulators by targeting messenger RNAs for post-transcriptional endonucleolytic cleavage or translational inhibition. In this study, small RNA libraries were constructed based on adult dairy goat testicular tissues and sequenced using the Illumina high-throughput sequencing technology. By BLAST comparison against miRNAs of cow and sheep in miRBase 19.0, 373 conserved miRNAs were identified in dairy goat testis and 91 novel paired-miRNAs were found. Expression of miRNAs in the dairy goat testis (miR-10b, miR-126-3p, miR-126-5p, miR-34c, miR-449b and miR-1468) was confirmed by qRT-PCR. In addition, 128 conserved miRNAs were found by comparing the miRNA expression profiles in dairy goat testis with those in cow and mouse, all of which might be involved in dairy goat testis development and meiosis. This study reveals the first miRNA profile related to the biology of testis in the dairy goat. The characterization of these miRNAs could contribute to a better understanding of the molecular mechanisms of reproductive physiology and development in the dairy goat.

  10. Conserved Noncoding Sequences Regulate lhx5 Expression in the Zebrafish Forebrain

    PubMed Central

    Sun, Liu; Chen, Fengjiao; Peng, Gang

    2015-01-01

    The LIM homeobox family protein Lhx5 plays important roles in forebrain development in the vertebrates. The lhx5 gene exhibits complex temporal and spatial expression patterns during early development but its transcriptional regulation mechanisms are not well understood. Here, we have used transgenesis in zebrafish in order to define regulatory elements that drive lhx5 expression in the forebrain. Through comparative genomic analysis we identified 10 non-coding sequences conserved in five teleost species. We next examined the enhancer activities of these conserved non-coding sequences with Tol2 transposon mediated transgenesis. We found a proximately located enhancer gave rise to robust reporter EGFP expression in the forebrain regions. In addition, we identified an enhancer located at approximately 50 kb upstream of lhx5 coding region that is responsible for reporter gene expression in the hypothalamus. We also identify an enhancer located approximately 40 kb upstream of the lhx5 coding region that is required for expression in the prethalamus (ventral thalamus). Together our results suggest discrete enhancer elements control lhx5 expression in different regions of the forebrain. PMID:26147098

  11. Conservation.

    ERIC Educational Resources Information Center

    National Audubon Society, New York, NY.

    This set of teaching aids consists of seven Audubon Nature Bulletins, providing the teacher and student with informational reading on various topics in conservation. The bulletins have these titles: Plants as Makers of Soil, Water Pollution Control, The Ground Water Table, Conservation--To Keep This Earth Habitable, Our Threatened Air Supply,…

  12. Discriminative motif discovery via simulated evolution and random under-sampling.

    PubMed

    Song, Tao; Gu, Hong

    2014-01-01

    Conserved motifs in biological sequences are closely related to their structure and functions. Recently, discriminative motif discovery methods have attracted more and more attention. However, little attention has been devoted to the data imbalance problem, which is one of the main reasons affecting the performance of the discriminative models. In this article, a simulated evolution method is applied to solve the multi-class imbalance problem at the stage of data preprocessing, and at the stage of Hidden Markov Models (HMMs) training, a random under-sampling method is introduced for the imbalance between the positive and negative datasets. It is shown that, in the task of discovering targeting motifs of nine subcellular compartments, the motifs found by our method are more conserved than the methods without considering data imbalance problem and recover the most known targeting motifs from Minimotif Miner and InterPro. Meanwhile, we use the found motifs to predict protein subcellular localization and achieve higher prediction precision and recall for the minority classes.

  13. Discriminative Motif Discovery via Simulated Evolution and Random Under-Sampling

    PubMed Central

    Song, Tao; Gu, Hong

    2014-01-01

    Conserved motifs in biological sequences are closely related to their structure and functions. Recently, discriminative motif discovery methods have attracted more and more attention. However, little attention has been devoted to the data imbalance problem, which is one of the main reasons affecting the performance of the discriminative models. In this article, a simulated evolution method is applied to solve the multi-class imbalance problem at the stage of data preprocessing, and at the stage of Hidden Markov Models (HMMs) training, a random under-sampling method is introduced for the imbalance between the positive and negative datasets. It is shown that, in the task of discovering targeting motifs of nine subcellular compartments, the motifs found by our method are more conserved than the methods without considering data imbalance problem and recover the most known targeting motifs from Minimotif Miner and InterPro. Meanwhile, we use the found motifs to predict protein subcellular localization and achieve higher prediction precision and recall for the minority classes. PMID:24551063
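
    The abstract above names random under-sampling as its remedy for the imbalance between positive and negative training sets but does not describe the procedure. A minimal generic sketch of the technique follows; the function name, the seed handling, and the toy sequences are illustrative assumptions, not the authors' implementation.

```python
import random


def random_undersample(positives, negatives, seed=0):
    """Randomly cut the larger class down to the size of the smaller one.

    Generic random under-sampling; not taken from the cited paper.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = min(len(positives), len(negatives))
    # rng.sample draws without replacement, so no sequence is duplicated
    pos = rng.sample(positives, n) if len(positives) > n else list(positives)
    neg = rng.sample(negatives, n) if len(negatives) > n else list(negatives)
    return pos, neg


# Hypothetical toy data: 3 positive sequences against 30 negative ones
positives = ["MKRLL", "MSKLA", "MARLS"]
negatives = [f"SEQ{i}" for i in range(30)]

pos, neg = random_undersample(positives, negatives)
print(len(pos), len(neg))
```

    Under-sampling discards data from the majority class, so in practice it is often repeated with different seeds and the resulting models compared or combined.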

  14. Phylogeny, sequence conservation, and functional complementation of the SBDS protein family.

    PubMed

    Boocock, G R B; Marit, M R; Rommens, J M

    2006-06-01

    The Shwachman-Bodian-Diamond syndrome (SBDS) protein family occurs widely in nature, although its function has not been determined. Comprehensive database searches revealed SBDS homologues from 159 species, including examples from all sequenced archaeal and eukaryotic genomes and all eukaryotic kingdoms. Sequence alignment with ClustalX and MUSCLE algorithms led to the identification of conserved residues that occurred predominantly in the amino-terminal FYSH domain where they appeared to contribute to protein folding or stability. Only SBDS residue Gly91 was invariant in all species. Four distantly related protists were found to have two divergent SBDS genes in their genomes. In each case, phylogenetic analyses and the identification of shared sequence features suggested that one gene was derived from lateral gene transfer. We also identified a shared C-terminal zinc finger domain fusion in flowering plants and chromalveolates that may shed light on the function of the protein family and the evolutionary histories of these kingdoms. To assess the extent of SBDS functional conservation, we carried out complementation studies of SBDS homologues and interspecies chimeras in Saccharomyces cerevisiae. We determined that the FYSH domain was widely interchangeable among eukaryotes, while domain 2 imparted species specificity to protein function. Domain 3 was largely dispensable for function in our yeast complementation assay. Overall, the phylogeny of SBDS was shared with a group of proteins that were markedly enriched for RNA metabolism and/or ribosome-associated functions. These findings link Shwachman-Diamond syndrome to other bone marrow failure syndromes with defects in nucleolus-associated processes, including Diamond-Blackfan anemia, cartilage-hair hypoplasia, and dyskeratosis congenita.

  15. PETcofold: predicting conserved interactions and structures of two multiple alignments of RNA sequences

    PubMed Central

    Seemann, Stefan E.; Richter, Andreas S.; Gesell, Tanja; Backofen, Rolf; Gorodkin, Jan

    2011-01-01

    Motivation: Predicting RNA–RNA interactions is essential for determining the function of putative non-coding RNAs. Existing methods for the prediction of interactions are all based on single sequences. Since comparative methods have already been useful in RNA structure determination, we assume that conserved RNA–RNA interactions also imply conserved function. Of these, we further assume that a non-negligible amount of the existing RNA–RNA interactions have also acquired compensating base changes throughout evolution. We implement a method, PETcofold, that can take covariance information in intra-molecular and inter-molecular base pairs into account to predict interactions and secondary structures of two multiple alignments of RNA sequences. Results: PETcofold's ability to predict RNA–RNA interactions was evaluated on a carefully curated dataset of 32 bacterial small RNAs and their targets, which was manually extracted from the literature. For evaluation of both RNA–RNA interaction and structure prediction, we were able to extract only a few high-quality examples: one vertebrate small nucleolar RNA and four bacterial small RNAs. For these we show that the prediction can be improved by our comparative approach. Furthermore, PETcofold was evaluated on controlled data with phylogenetically simulated sequences enriched for covariance patterns at the interaction sites. We observed increased performance with increased amounts of covariance. Availability: The program PETcofold is available as source code and can be downloaded from http://rth.dk/resources/petcofold. Contact: gorodkin@rth.dk; backofen@informatik.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21088024

  16. The C-Terminal Sequence and PI motif of the Orchid (Oncidium Gower Ramsey) PISTILLATA (PI) Ortholog Determine its Ability to Bind AP3 Orthologs and Enter the Nucleus to Regulate Downstream Genes Controlling Petal and Stamen Formation.

    PubMed

    Mao, Wan-Ting; Hsu, Hsing-Fun; Hsu, Wei-Han; Li, Jen-Ying; Lee, Yung-I; Yang, Chang-Hsien

    2015-11-01

    This study focused on the investigation of the effects of the PI motif and C-terminus of the Oncidium Gower Ramsey MADS box gene 8 (OMADS8), a PISTILLATA (PI) ortholog, on floral organ formation. 35S::OMADS8 completely rescued and 35S::OMADS8-PI (with the PI motif deleted) partially rescued petal/stamen formation, whereas these deficiencies were not rescued by 35S::OMADS8-C (C-terminal 29 amino acids deleted) in pi-1 mutants. OMADS8 could interact with Arabidopsis APETALA3 (AP3) and enter the nucleus. The nuclear entry efficiency was reduced for OMADS8-PI/AP3 and OMADS8-C/AP3. OMADS8 could also interact with OMADS5/OMADS9 (the Oncidium AP3 ortholog) and enter the nucleus with an efficiency only slightly affected by the deletion of the C-terminal sequence or PI motif. However, the stability of the OMADS8/OMADS5 and OMADS8/OMADS9 complexes was significantly reduced by deletion of the C-terminal sequence or PI motif. Further analysis indicated that the expression of genes downstream of AP3/PI (BNQ1/BNQ2/GNC/At4g30270) was compensated by 35S::OMADS8 and 35S::OMADS8-PI to a level similar to wild-type plants but was not affected by 35S::OMADS8-C in the pi-1 mutants. A similar FRET (fluorescence resonance energy transfer) efficiency was observed for Arabidopsis AGAMOUS (AG) and the Oncidium AG ortholog OMADS4 for OMADS8, OMADS8-PI and OMADS8-C. These results indicated that the OMADS8 PI motif and C-terminus were valuable for the interaction of OMADS8 with the AP3 orthologs to form higher order heterotetrameric complexes that regulated petal/stamen formation in both Oncidium orchids and transgenic Arabidopsis. However, the C-terminal sequence and PI motif were dispensable for the interaction of OMADS8 with the AG orthologs.

  17. A motif present in the main cytoplasmic loop of nicotinic acetylcholine receptors and catalases.

    PubMed

    Morgado-Valle, C; García-Colunga, J; Miledi, R; Díaz-Muñoz, M

    2001-05-01

    A motif containing five conserved amino acids (RXPXTH(X)14P) was detected in 111 proteins, including 82 nicotinic acetylcholine receptor (nAChR) subunits and 20 catalases. To explore possible functional roles of this motif in nAChRs two approaches were used: first, the motif sequences in nAChR subunits and catalases were analysed and compared; and, second, deletions in the rat alpha2 and beta4 nAChR subunits expressed in Xenopus oocytes were analysed. Compared to the three-dimensional structure of bovine hepatic catalase, structural coincidences were found in the motif of catalases and nAChRs. On the other hand, partial deletions of the motif in the alpha2 or beta4 subunits and injection of the mutants into oocytes was followed by a very weak expression of functional nAChRs; oocytes injected with alpha2 and beta4 subunits in which the entire motif had been deleted failed to elicit any acetylcholine currents. The results suggest that the motif may play a role in the activation of nAChRs. PMID:11370971
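
    The pattern RXPXTH(X)14P quoted above can be read as a regular expression in which X stands for any residue. The sketch below scans a sequence for it; the helper name and the toy sequence are hypothetical and are not taken from the study.

```python
import re

# RXPXTH(X)14P: R, any residue, P, any residue, T, H, 14 arbitrary residues, P
MOTIF = re.compile(r"R.P.TH.{14}P")


def find_motif(seq):
    """Return (start, matched_text) for each non-overlapping occurrence."""
    return [(m.start(), m.group()) for m in MOTIF.finditer(seq)]


# Hypothetical toy sequence carrying one copy of the pattern
toy = "MK" + "RAPGTH" + "A" * 14 + "P" + "LL"
hits = find_motif(toy)
print(hits)
```

    Note that `finditer` reports non-overlapping matches only; a lookahead-based pattern would be needed if overlapping occurrences mattered.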

  18. Evolutionary conservation of sequence and secondary structures in CRISPR repeats

    SciTech Connect

    Kunin, Victor; Sorek, Rotem; Hugenholtz, Philip

    2006-09-01

    Clustered Regularly Interspaced Palindromic Repeats (CRISPRs) are a novel class of direct repeats, separated by unique spacer sequences of similar length, that are present in approximately 40% of bacterial and all archaeal genomes analyzed to date. More than 40 gene families, called CRISPR-associated sequences (CAS), appear in conjunction with these repeats and are thought to be involved in the propagation and functioning of CRISPRs. It has been proposed that the CRISPR/CAS system samples, maintains a record of, and inactivates invasive DNA that the cell has encountered, and therefore constitutes a prokaryotic analog of an immune system. Here we analyze CRISPR repeats identified in 195 microbial genomes and show that they can be organized into multiple clusters based on sequence similarity. All individual repeats in any given cluster were inferred to form characteristic RNA secondary structure, ranging from non-existent to pronounced. Stable secondary structures included G:U base pairs and exhibited multiple compensatory base changes in the stem region, indicating evolutionary conservation and functional importance. We also show that the repeat-based classification corresponds to, and expands upon, a previously reported CAS gene-based classification including specific relationships between CRISPR and CAS subtypes.

  19. Signature motifs of GDP polyribonucleotidyltransferase, a non-segmented negative strand RNA viral mRNA capping enzyme, domain in the L protein are required for covalent enzyme-pRNA intermediate formation.

    PubMed

    Neubauer, Julie; Ogino, Minako; Green, Todd J; Ogino, Tomoaki

    2016-01-01

    The unconventional mRNA capping enzyme (GDP polyribonucleotidyltransferase, PRNTase; block V) domain in RNA polymerase L proteins of non-segmented negative strand (NNS) RNA viruses (e.g. rabies, measles, Ebola) contains five collinear sequence elements, Rx(3)Wx(3-8)ΦxGxζx(P/A) (motif A; Φ, hydrophobic; ζ, hydrophilic), (Y/W)ΦGSxT (motif B), W (motif C), HR (motif D) and ζxxΦx(F/Y)QxxΦ (motif E). We performed site-directed mutagenesis of the L protein of vesicular stomatitis virus (VSV, a prototypic NNS RNA virus) to examine participation of these motifs in mRNA capping. Similar to the catalytic residues in motif D, G1100 in motif A, T1157 in motif B, W1188 in motif C, and F1269 and Q1270 in motif E were found to be essential or important for the PRNTase activity in the step of the covalent L-pRNA intermediate formation, but not for the GTPase activity that generates GDP (pRNA acceptor). Cap defective mutations in these residues induced termination of mRNA synthesis at position +40 followed by aberrant stop-start transcription, and abolished virus gene expression in host cells. These results suggest that the conserved motifs constitute the active site of the PRNTase domain and the L-pRNA intermediate formation followed by the cap formation is essential for successful synthesis of full-length mRNAs.

  20. Amino acid binding by the class I aminoacyl-tRNA synthetases: role for a conserved proline in the signature sequence.

    PubMed Central

    Burbaum, J. J.; Schimmel, P.

    1992-01-01

    Although partial or complete three-dimensional structures are known for three Class I aminoacyl-tRNA synthetases, the amino acid-binding sites in these proteins remain poorly characterized. To explore the methionine binding site of Escherichia coli methionyl-tRNA synthetase, we chose to study a specific, randomly generated methionine auxotroph that contains a mutant methionyl-tRNA synthetase whose defect is manifested in an elevated Km for methionine (Barker, D.G., Ebel, J.-P., Jakes, R.C., & Bruton, C.J., 1982, Eur. J. Biochem. 127, 449-457), and employed the polymerase chain reaction to sequence this mutant synthetase directly. We identified a Pro 14 to Ser replacement (P14S), which accounts for a greater than 300-fold elevation in Km for methionine and has little effect on either the Km for ATP or the kcat of the amino acid activation reaction. This mutation destabilizes the protein in vivo, which may partly account for the observed auxotrophy. The altered proline is found in the "signature sequence" of the Class I synthetases and is conserved. This sequence motif is 1 of 2 found in the 10 Class I aminoacyl-tRNA synthetases and, in the known structures, it is in the nucleotide-binding fold as part of a loop between the end of a beta-strand and the start of an alpha-helix. The phenotype of the mutant and the stability and affinity for methionine of the wild-type and mutant enzymes are influenced by the amino acid that is 25 residues beyond the C-terminus of the signature sequence.(ABSTRACT TRUNCATED AT 250 WORDS) PMID:1304356

  1. Stochastic motif extraction using hidden Markov model

    SciTech Connect

    Fujiwara, Yukiko; Asogawa, Minoru; Konagaya, Akihiko

    1994-12-31

    In this paper, we study the application of an HMM (hidden Markov model) to the problem of representing protein sequences by a stochastic motif. A stochastic protein motif represents the small segments of protein sequences that have a certain function or structure. The stochastic motif, represented by an HMM, has conditional probabilities to deal with the stochastic nature of the motif. This HMM directly reflects the characteristics of the motif, such as a protein periodical structure or grouping. In order to obtain the optimal HMM, we developed the "iterative duplication method" for HMM topology learning. It starts from a small fully-connected network and iterates the network generation and parameter optimization until it achieves sufficient discrimination accuracy. Using this method, we obtained an HMM for a leucine zipper motif. Compared with a symbolic pattern representation, which reached an accuracy of 14.8 percent, the HMM achieved 79.3 percent in prediction. Additionally, the method can obtain an HMM for various types of zinc finger motifs, and it might separate the mixed data. We demonstrated that this approach is applicable to the validation of the protein databases; a constructed HMM has indicated that one protein sequence annotated as "leucine-zipper like sequence" in the database is quite different from other leucine-zipper sequences in terms of likelihood, and we found this discrimination is plausible.

  2. T-cell recognition is shaped by epitope sequence conservation in the host proteome and microbiome.

    PubMed

    Bresciani, Anne; Paul, Sinu; Schommer, Nina; Dillon, Myles B; Bancroft, Tara; Greenbaum, Jason; Sette, Alessandro; Nielsen, Morten; Peters, Bjoern

    2016-05-01

    Several mechanisms exist to avoid or suppress inflammatory T-cell immune responses that could prove harmful to the host due to targeting self-antigens or commensal microbes. We hypothesized that these mechanisms could become evident when comparing the immunogenicity of a peptide from a pathogen or allergen with the conservation of its sequence in the human proteome or the healthy human microbiome. Indeed, performing such comparisons on large sets of validated T-cell epitopes, we found that epitopes that are similar with self-antigens above a certain threshold showed lower immunogenicity, presumably as a result of negative selection of T cells capable of recognizing such peptides. Moreover, we also found a reduced level of immune recognition for epitopes conserved in the commensal microbiome, presumably as a result of peripheral tolerance. These findings indicate that the existence (and potentially the polarization) of T-cell responses to a given epitope is influenced and to some extent predictable based on its similarity to self-antigens and commensal antigens.

  3. Position-specific prediction of methylation sites from sequence conservation based on information theory.

    PubMed

    Shi, Yinan; Guo, Yanzhi; Hu, Yayun; Li, Menglong

    2015-07-23

    Protein methylation plays vital roles in many biological processes and has been implicated in various human diseases. To fully understand the mechanisms underlying methylation for use in drug design and work in methylation-related diseases, an initial but crucial step is to identify methylation sites. The use of high-throughput bioinformatics methods has become imperative to predict methylation sites. In this study, we developed a novel method that is based only on sequence conservation to predict protein methylation sites. Conservation difference profiles between methylated and non-methylated peptides were constructed by the information entropy (IE) in a wider neighbor interval around the methylation sites that fully incorporated all of the environmental information. Then, the distinctive neighbor residues were identified by the importance scores of information gain (IG). The most representative model was constructed by support vector machine (SVM) for Arginine and Lysine methylation, respectively. This model yielded a promising result on both the benchmark dataset and independent test set. The model was used to screen the entire human proteome, and many unknown substrates were identified. These results indicate that our method can serve as a useful supplement to elucidate the mechanism of protein methylation and facilitate hypothesis-driven experimental design and validation.
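
    The abstract above builds its conservation profiles from information entropy but gives no formula. One common reading is the per-position Shannon entropy of aligned, equal-length peptides, with the difference between the methylated and non-methylated profiles taken position by position. The sketch below works under that assumption; the function names and the toy peptides are hypothetical, and the paper's exact definition may differ.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues observed at one position."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_profile(peptides):
    """Per-position entropy for a list of equal-length peptides."""
    return [column_entropy(col) for col in zip(*peptides)]


def entropy_difference(set_a, set_b):
    """Position-wise entropy difference between two peptide sets."""
    return [a - b for a, b in zip(entropy_profile(set_a), entropy_profile(set_b))]


# Hypothetical toy peptides, all of length 5
peptides = ["GRGAK", "SRGAR", "ARGGK", "TRGSR"]
profile = entropy_profile(peptides)
print(profile)
```

    A position with entropy 0 is fully conserved in the sample, and higher values indicate more variable positions.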

  4. Intronic motif pairs cooperate across exons to promote pre-mRNA splicing

    PubMed Central

    2010-01-01

    Background A very early step in splice site recognition is exon definition, a process that is as yet poorly understood. Communication between the two ends of an exon is thought to be required for this step. We report genome-wide evidence for exons being defined through the combinatorial activity of motifs located in flanking intronic regions. Results Strongly co-occurring motifs were found to specifically reside in four intronic regions surrounding a large number of human exons. These paired motifs occur around constitutive and alternative exons but not pseudo exons. Most co-occurring motifs are limited to intronic regions within 100 nucleotides of the exon. They are preferentially associated with weaker exons. Their pairing is conserved in evolution and they exhibit a lower frequency of single nucleotide polymorphism when paired. Paired motifs display specificity with respect to distance from the exon borders and in constitutive versus alternative splicing. Many resemble binding sites for heterogeneous nuclear ribonucleoproteins. Specific pairs are associated with tissue-specific genes, the higher expression of which coincides with that of the pertinent RNA binding proteins. Tested pairs acted synergistically to enhance exon inclusion, and this enhancement was found to be exon-specific. Conclusions The exon-flanking sequence pairs identified here by genomic analysis promote exon inclusion and may play a role in the exon definition step in pre-mRNA splicing. We propose a model in which multiple concerted interactions are required between exonic sequences and flanking intronic sequences to effect exon definition. PMID:20704715

  5. Structural Basis for WDR5 Interaction (Win) Motif Recognition in Human SET1 Family Histone Methyltransferases

    PubMed Central

    Dharmarajan, Venkatasubramanian; Lee, Jeong-Heon; Patel, Anamika; Skalnik, David G.; Cosgrove, Michael S.

    2012-01-01

    Translocations and amplifications of the mixed lineage leukemia-1 (MLL1) gene are associated with aggressive myeloid and lymphocytic leukemias in humans. MLL1 is a member of the SET1 family of histone H3 lysine 4 (H3K4) methyltransferases, which are required for transcription of genes involved in hematopoiesis and development. MLL1 associates with a subcomplex containing WDR5, RbBP5, Ash2L, and DPY-30 (WRAD), which together form the MLL1 core complex that is required for sequential mono- and dimethylation of H3K4. We previously demonstrated that WDR5 binds the conserved WDR5 interaction (Win) motif of MLL1 in vitro, an interaction that is required for the H3K4 dimethylation activity of the MLL1 core complex. In this investigation, we demonstrate that arginine 3765 of the MLL1 Win motif is required to co-immunoprecipitate WRAD from mammalian cells, suggesting that the WDR5-Win motif interaction is important for the assembly of the MLL1 core complex in vivo. We also demonstrate that peptides that mimic SET1 family Win motif sequences inhibit H3K4 dimethylation by the MLL1 core complex with varying degrees of efficiency. To understand the structural basis for these differences, we determined structures of WDR5 bound to six different naturally occurring Win motif sequences at resolutions ranging from 1.9 to 1.2 Å. Our results reveal that binding energy differences result from interactions between non-conserved residues C-terminal to the Win motif and to a lesser extent from subtle variation of residues within the Win motif. These results highlight a new class of methylation inhibitors that may be useful for the treatment of MLL1-related malignancies. PMID:22665483

  6. Structural basis for WDR5 interaction (Win) motif recognition in human SET1 family histone methyltransferases.

    PubMed

    Dharmarajan, Venkatasubramanian; Lee, Jeong-Heon; Patel, Anamika; Skalnik, David G; Cosgrove, Michael S

    2012-08-10

    Translocations and amplifications of the mixed lineage leukemia-1 (MLL1) gene are associated with aggressive myeloid and lymphocytic leukemias in humans. MLL1 is a member of the SET1 family of histone H3 lysine 4 (H3K4) methyltransferases, which are required for transcription of genes involved in hematopoiesis and development. MLL1 associates with a subcomplex containing WDR5, RbBP5, Ash2L, and DPY-30 (WRAD), which together form the MLL1 core complex that is required for sequential mono- and dimethylation of H3K4. We previously demonstrated that WDR5 binds the conserved WDR5 interaction (Win) motif of MLL1 in vitro, an interaction that is required for the H3K4 dimethylation activity of the MLL1 core complex. In this investigation, we demonstrate that arginine 3765 of the MLL1 Win motif is required to co-immunoprecipitate WRAD from mammalian cells, suggesting that the WDR5-Win motif interaction is important for the assembly of the MLL1 core complex in vivo. We also demonstrate that peptides that mimic SET1 family Win motif sequences inhibit H3K4 dimethylation by the MLL1 core complex with varying degrees of efficiency. To understand the structural basis for these differences, we determined structures of WDR5 bound to six different naturally occurring Win motif sequences at resolutions ranging from 1.9 to 1.2 Å. Our results reveal that binding energy differences result from interactions between non-conserved residues C-terminal to the Win motif and to a lesser extent from subtle variation of residues within the Win motif. These results highlight a new class of methylation inhibitors that may be useful for the treatment of MLL1-related malignancies. PMID:22665483

  7. Mouse annexin V chromosomal localization, cDNA sequence conservation, and molecular evolution

    SciTech Connect

    Rodriguez-Garcia, M.I.; Morgan, R.O.; Kozak, C.A.

    1996-01-15

    A full-length cDNA encoding mouse annexin V (ANX5) was cloned, sequenced, and utilized for chromosomal mapping. The gene lies on mouse chromosome 3 in close linkage with the fibroblast growth factor 2 (basic) gene and is syntenic with other genes known to have orthologous counterparts on human chromosome 4q. The open reading frame encoded a protein of 319 amino acids (aa), with 92-96% identity to ANX5 in other species. Internal repeat 3 of mouse ANX5 exhibited the highest level of nonconservative aa replacements with respect to other annexin subfamilies, but the greatest sequence conservation among ANX5 species members. This region may thus contain features that distinguish ANX5 from other annexins in properties or function. Phylogenetic analysis and homology testing of ANX5 members indicated that the 34-kDa annexin from Torpedo marmorata may also belong to this subfamily. Comparison of nine species of ANX5 led to an estimation of the unit evolutionary mutation rate at 1% aa replacements every 8 million years, comparable to other annexins. 46 refs., 4 figs.

  8. High Throughput Sequencing of T Cell Antigen Receptors Reveals a Conserved TCR Repertoire

    PubMed Central

    Hou, Xianliang; Lu, Chong; Chen, Sisi; Xie, Qian; Cui, Guangying; Chen, Jianing; Chen, Zhi; Wu, Zhongwen; Ding, Yulong; Ye, Ping; Dai, Yong; Diao, Hongyan

    2016-01-01

    The T-cell receptor (TCR) repertoire is a mirror of the human immune system that reflects processes caused by infections, cancer, autoimmunity, and aging. Next-generation sequencing has become a powerful tool for deep TCR profiling. Herein, we used this technology to study the repertoire features of TCR beta chain in the blood of healthy individuals. Peripheral blood samples were collected from 10 healthy donors. T cells were isolated with anti-human CD3 magnetic beads according to the manufacturer's protocol. We then combined multiplex-PCR, Illumina sequencing, and IMGT/High V-QUEST to analyze the characteristics and polymorphisms of the TCR. Most of the individual T cell clones were present at very low frequencies, suggesting that they had not undergone clonal expansion. The usage frequencies of the TCR beta variable, beta joining, and beta diversity gene segments were similar among T cells from different individuals. Notably, the usage frequency of individual nucleotides and amino acids within complementarity-determining region (CDR3) intervals was remarkably consistent between individuals. Moreover, our data show that terminal deoxynucleotidyl transferase activity was biased toward the insertion of G (31.92%) and C (27.14%) over A (21.82%) and T (19.12%) nucleotides. Some conserved features could be observed in the composition of CDR3, which may inform future studies of human TCR gene recombination. PMID:26962778

  9. Host species-specific conservation of a family of repeated DNA sequences in the genome of a fungal plant pathogen.

    PubMed Central

    Hamer, J E; Farrall, L; Orbach, M J; Valent, B; Chumley, F G

    1989-01-01

    We have identified a family of dispersed repetitive DNA sequences in the genome of Magnaporthe grisea, the fungus that causes rice blast disease. We have named this family of DNA sequences "MGR" for M. grisea repeat. Analysis of five MGR clones demonstrates that MGR sequences are highly polymorphic. The segregation of MGR sequences in genetic crosses and hybridization of MGR probes to separated, chromosome-size DNA molecules of M. grisea shows that this family of sequences is distributed among the M. grisea chromosomes. MGR sequences also hybridize to discrete poly(A)+ RNAs. Southern blot analysis using a MGR probe can distinguish rice pathogens from various sources. However, MGR sequences are not highly conserved in the genomes of M. grisea field isolates that do not infect rice. These results suggest that host selection for a specific pathogen genotype has occurred during the breeding and cultivation of rice. PMID:2602385

  10. Mining Conditional Phosphorylation Motifs.

    PubMed

    Liu, Xiaoqing; Wu, Jun; Gong, Haipeng; Deng, Shengchun; He, Zengyou

    2014-01-01

    Phosphorylation motifs represent position-specific amino acid patterns around the phosphorylation sites in the set of phosphopeptides. Several algorithms have been proposed to uncover phosphorylation motifs, whereas the problem of efficiently discovering a set of significant motifs with sufficiently high coverage and non-redundancy still remains unsolved. Here we present a novel notion called conditional phosphorylation motifs. Through this new concept, the motifs whose over-expressiveness mainly benefits from its constituting parts can be filtered out effectively. To discover conditional phosphorylation motifs, we propose an algorithm called C-Motif for a non-redundant identification of significant phosphorylation motifs. C-Motif is implemented under the Apriori framework, and it tests the statistical significance together with the frequency of candidate motifs in a single stage. Experiments demonstrate that C-Motif outperforms some current algorithms such as MMFPh and Motif-All in terms of coverage and non-redundancy of the results and efficiency of the execution. The source code of C-Motif is available at: https://sourceforge.net/projects/cmotif/. PMID:26356863

  11. Detecting DNA regulatory motifs by incorporating positional trends in information content

    SciTech Connect

    Kechris, Katherina J.; van Zwet, Erik; Bickel, Peter J.; Eisen, Michael B.

    2004-05-04

    On the basis of the observation that conserved positions in transcription factor binding sites are often clustered together, we propose a simple extension to the model-based motif discovery methods. We assign position-specific prior distributions to the frequency parameters of the model, penalizing deviations from a specified conservation profile. Examples with both simulated and real data show that this extension helps discover motifs as the data become noisier or when there is a competing false motif.
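
    The "information content" in this record's title is commonly measured per position of a set of aligned binding sites. The sketch below computes that standard quantity against a uniform background. It does not implement the authors' position-specific priors, and the function name and the optional pseudocount parameter are assumptions.

```python
import math
from collections import Counter


def information_content(sites, pseudocount=0.0):
    """Per-column information content, in bits, of equal-length DNA sites.

    Assumes a uniform background over A, C, G, T, so a fully conserved
    column scores 2 bits and a uniformly mixed column scores 0 bits.
    """
    ic = []
    for column in zip(*sites):
        counts = Counter(column)
        total = len(column) + 4 * pseudocount
        bits = 2.0  # log2(4), the maximum for a four-letter alphabet
        for base in "ACGT":
            p = (counts[base] + pseudocount) / total
            if p > 0:
                bits += p * math.log2(p)  # adds a negative entropy term
        ic.append(bits)
    return ic


# Hypothetical aligned sites; column 0 is invariant, column 2 is a 50/50 mix.
toy_sites = ["ACGT", "ACGA", "ACCA", "ATCA"]
profile = information_content(toy_sites)
```

    Plotting such a profile along a motif shows whether highly conserved positions cluster together, which is the observation the abstract starts from.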

  12. Identifying Conserved and Novel MicroRNAs in Developing Seeds of Brassica napus Using Deep Sequencing

    PubMed Central

    Körbes, Ana Paula; Machado, Ronei Dorneles; Guzman, Frank; Almerão, Mauricio Pereira; de Oliveira, Luiz Felipe Valter; Loss-Morais, Guilherme; Turchetto-Zolet, Andreia Carina; Cagliari, Alexandro; dos Santos Maraschin, Felipe; Margis-Pinheiro, Marcia; Margis, Rogerio

    2012-01-01

    MicroRNAs (miRNAs) are important post-transcriptional regulators of plant development and seed formation. In Brassica napus, an important edible oil crop, valuable lipids are synthesized and stored in specific seed tissues during embryogenesis. The miRNA transcriptome of B. napus is currently poorly characterized, especially at different seed developmental stages. This work aims to describe the miRNAome of developing seeds of B. napus by identifying plant-conserved and novel miRNAs and comparing miRNA abundance in mature versus developing seeds. Members of 59 miRNA families were detected through a computational analysis of a large number of reads obtained from deep sequencing two small RNA and two RNA-seq libraries of (i) pooled immature developing stages and (ii) mature B. napus seeds. Among these miRNA families, 17 families are currently known to exist in B. napus; additionally 29 families not reported in B. napus but conserved in other plant species were identified by alignment with known plant mature miRNAs. Assembled mRNA-seq contigs allowed for a search of putative new precursors and led to the identification of 13 novel miRNA families. Analysis of miRNA population between libraries reveals that several miRNAs and isomiRNAs have different abundance in developing stages compared to mature seeds. The predicted miRNA target genes encode a broad range of proteins related to seed development and energy storage. This work presents a comparative study of the miRNA transcriptome of mature and developing B. napus seeds and provides a basis for future research on individual miRNAs and their functions in embryogenesis, seed maturation and lipid accumulation in B. napus. PMID:23226347

  13. Designing synthetic RNAs to determine the relevance of structural motifs in picornavirus IRES elements

    NASA Astrophysics Data System (ADS)

    Fernandez-Chamorro, Javier; Lozano, Gloria; Garcia-Martin, Juan Antonio; Ramajo, Jorge; Dotu, Ivan; Clote, Peter; Martinez-Salas, Encarnacion

    2016-04-01

    The function of Internal Ribosome Entry Site (IRES) elements is intimately linked to their RNA structure. Viral IRES elements are organized in modular domains consisting of one or more stem-loops that harbor conserved RNA motifs critical for internal initiation of translation. A conserved motif is the pyrimidine-tract located upstream of the functional initiation codon in type I and II picornavirus IRES. By computationally designing synthetic RNAs to fold into a structure that sequesters the polypyrimidine tract in a hairpin, we establish a correlation between predicted inaccessibility of the pyrimidine tract and IRES activity, as determined in both in vitro and in vivo systems. Our data supports the hypothesis that structural sequestration of the pyrimidine-tract within a stable hairpin inactivates IRES activity, since the stronger the stability of the hairpin the higher the inhibition of protein synthesis. Destabilization of the stem-loop immediately upstream of the pyrimidine-tract also decreases IRES activity. Our work introduces a hybrid computational/experimental method to determine the importance of structural motifs for biological function. Specifically, we show the feasibility of using the software RNAiFold to design synthetic RNAs with particular sequence and structural motifs that permit subsequent experimental determination of the importance of such motifs for biological function.

  14. Designing synthetic RNAs to determine the relevance of structural motifs in picornavirus IRES elements

    PubMed Central

    Fernandez-Chamorro, Javier; Lozano, Gloria; Garcia-Martin, Juan Antonio; Ramajo, Jorge; Dotu, Ivan; Clote, Peter; Martinez-Salas, Encarnacion

    2016-01-01

    The function of Internal Ribosome Entry Site (IRES) elements is intimately linked to their RNA structure. Viral IRES elements are organized in modular domains consisting of one or more stem-loops that harbor conserved RNA motifs critical for internal initiation of translation. A conserved motif is the pyrimidine-tract located upstream of the functional initiation codon in type I and II picornavirus IRES. By computationally designing synthetic RNAs to fold into a structure that sequesters the polypyrimidine tract in a hairpin, we establish a correlation between predicted inaccessibility of the pyrimidine tract and IRES activity, as determined in both in vitro and in vivo systems. Our data supports the hypothesis that structural sequestration of the pyrimidine-tract within a stable hairpin inactivates IRES activity, since the stronger the stability of the hairpin the higher the inhibition of protein synthesis. Destabilization of the stem-loop immediately upstream of the pyrimidine-tract also decreases IRES activity. Our work introduces a hybrid computational/experimental method to determine the importance of structural motifs for biological function. Specifically, we show the feasibility of using the software RNAiFold to design synthetic RNAs with particular sequence and structural motifs that permit subsequent experimental determination of the importance of such motifs for biological function. PMID:27053355

  15. The 'helix clamp' in HIV-1 reverse transcriptase: a new nucleic acid binding motif common in nucleic acid polymerases.

    PubMed Central

    Hermann, T; Meier, T; Götte, M; Heumann, H

    1994-01-01

    Amino acid sequences homologous to 259KLVGKL (X)16KLLR284 of human immunodeficiency virus type 1 reverse transcriptase (HIV-1 RT) are conserved in several nucleotide polymerizing enzymes. This amino acid motif has been identified in the crystal structure model as an element of the enzyme's nucleic acid binding apparatus. It is part of the helix-turn-helix structure, alpha H-turn-alpha I, within the 'thumb' region of HIV-1 RT. The motif grasps the complexed nucleic acid at one side. Molecular modeling studies on HIV-1 RT in complex with a nucleic acid fragment suggest that the motif has binding function in the p66 subunit as well as in the p51 subunit, acting as a kind of 'helix clamp'. Given its wide distribution within the nucleic acid polymerases, the helix clamp motif is assumed to be a structure of general significance for nucleic acid binding. PMID:7527138

  16. Phylogenetic Inference From Conserved Sites Alignments

    SciTech Connect

    Grundy, W.N.; Naylor, G.J.P.

    1999-08-15

    Molecular sequences provide a rich source of data for inferring the phylogenetic relationships among species. However, recent work indicates that even an accurate multiple alignment of a large sequence set may yield an incorrect phylogeny and that the quality of the phylogenetic tree improves when the input consists only of the highly conserved, motif regions of the alignment. This work introduces two methods of producing multiple alignments that include only the conserved regions of the initial alignment. The first method retains conserved motifs, whereas the second retains individual conserved sites in the initial alignment. Using parsimony analysis on a mitochondrial data set containing 19 species among which the phylogenetic relationships are widely accepted, both conserved alignment methods produce better phylogenetic trees than the complete alignment. Unlike any of the 19 inference methods used before to analyze this data, both methods produce trees that are completely consistent with the known phylogeny. The motif-based method employs far fewer alignment sites for comparable error rates. For a larger data set containing mitochondrial sequences from 39 species, the site-based method produces a phylogenetic tree that is largely consistent with known phylogenetic relationships and suggests several novel placements.
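
    The site-based reduction described here can be sketched as a column filter over a multiple alignment. The abstract does not state the conservation criterion, so the threshold rule below (fraction of sequences sharing the most common residue, with gapped columns discarded) and both function names are assumptions.

```python
from collections import Counter


def conserved_columns(alignment, min_fraction=1.0, gap="-"):
    """Return indices of gap-free columns whose most common residue
    is present in at least min_fraction of the sequences."""
    keep = []
    for idx, column in enumerate(zip(*alignment)):
        if gap in column:
            continue  # assumption: any gap disqualifies the column
        top_count = Counter(column).most_common(1)[0][1]
        if top_count / len(column) >= min_fraction:
            keep.append(idx)
    return keep


def reduce_alignment(alignment, columns):
    """Build a new alignment containing only the selected columns."""
    return ["".join(seq[i] for i in columns) for seq in alignment]


# Hypothetical three-sequence alignment.
toy_alignment = ["ACGT-A", "ACCTTA", "ATGTTA"]
cols = conserved_columns(toy_alignment, min_fraction=0.6)
reduced = reduce_alignment(toy_alignment, cols)
```

    Columns that are perfectly invariant carry no signal for parsimony analysis, so a threshold below 1.0 is the more useful setting when the reduced alignment is meant for tree building.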

  17. THE GRK4 SUBFAMILY OF G PROTEIN-COUPLED RECEPTOR KINASES: ALTERNATIVE SPLICING, GENE ORGANIZATION, AND SEQUENCE CONSERVATION

    EPA Science Inventory

    The GRK4 subfamily of G protein-coupled receptor kinases. Alternative splicing, gene organization, and sequence conservation.

    Premont RT, Macrae AD, Aparicio SA, Kendall HE, Welch JE, Lefkowitz RJ.

    Department of Medicine, Howard Hughes Medical Institute, Duke Univer...

  18. Functional importance of motif I of pseudouridine synthases: mutagenesis of aligned lysine and proline residues.

    PubMed

    Spedaliere, C J; Hamilton, C S; Mueller, E G

    2000-08-01

    On the basis of sequence alignments, the pseudouridine synthases were grouped into four families that share no statistically significant global sequence similarity, though some common sequence motifs were discovered [Koonin, E. V. (1996) Nucleic Acids. Res. 24, 2411-2415; Gustafsson, C., Reid, R., Greene, P. J., and Santi, D. V. (1996) Nucleic Acids Res. 24, 3756-3762]. We have investigated the functional significance of these alignments by substituting the nearly invariant lysine and proline residues in Motif I of RluA and TruB, pseudouridine synthases belonging to different families. Contrary to our expectations, the altered enzymes display only very mild kinetic impairment. Substitution of the aligned lysine and proline residues does, however, reduce structural stability, consistent with a temperature sensitive phenotype that results from substitution of the cognate proline residue in Cbf5p, a yeast homologue of TruB [Zerbarjadian, Y., King, T., Fournier, M. J., Clarke, L., and Carbon, J. (1999) Mol. Cell. Biol. 19, 7461-7472]. Together, our data support a functional role for Motif I, as predicted by sequence alignments, though the effect of substituting the highly conserved residues was milder than we anticipated. By extrapolation, our findings also support the assignment of pseudouridine synthase function to certain physiologically important eukaryotic proteins that contain Motif I, including the human protein dyskerin, alteration of which leads to the disease dyskeratosis congenita.

  19. Worldwide sequence conservation of transmission-blocking vaccine candidate Pvs230 in Plasmodium vivax

    PubMed Central

    Doi, Masanori; Tanabe, Kazuyuki; Tachibana, Shin-Ichiro; Hamai, Meiko; Tachibana, Mayumi; Mita, Toshihiro; Yagi, Masanori; Zeyrek, Fadile Yildiz; Ferreira, Marcelo U.; Ohmae, Hiroshi; Kaneko, Akira; Randrianarivelojosia, Milijaona; Sattabongkot, Jetsumon; Cao, Ya-Ming; Horii, Toshihiro; Torii, Motomi; Tsuboi, Takafumi

    2011-01-01

    Pfs230, a surface protein of the gametocyte/gamete of the human malaria parasite Plasmodium falciparum, is a prime candidate for a malaria transmission-blocking vaccine. P. vivax has an ortholog of Pfs230 (Pvs230); however, there has been no study of any aspect of Pvs230 to date. To investigate whether Pvs230 can be a vivax malaria transmission-blocking vaccine, we performed evolutionary and population genetic analysis of the Pvs230 gene (pvs230: PVX_003905). Our analysis of Pvs230 and its orthologs in seven Plasmodium species revealed two distinctive parts: an interspecies variable part (IVP) containing species-specific oligopeptide repeats at the N-terminus and a 7.5 kb interspecies conserved part (ICP) containing 14 cysteine-rich domains. Pvs230 was closely related to its orthologs, Pks230 and Pcys230, in monkey malaria parasites. Analysis of 113 pvs230 sequences obtained worldwide showed that nucleotide diversity is remarkably low in the non-repeat 8-kb region of pvs230 (θπ = 0.00118), with 77 polymorphic nucleotide sites, 40 of which result in amino acid replacements. A signature of purifying selection but not of balancing selection was seen on pvs230. Functional and/or structural constraints may limit the level of polymorphism in pvs230. The observed limited polymorphism in pvs230 supports the use of Pvs230 as an effective transmission-blocking vaccine. PMID:21514344
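
    Nucleotide diversity (θπ) as reported above is conventionally the average number of pairwise differences per site across all sequence pairs. The sketch below shows that textbook calculation on aligned, gap-free sequences. The function name and toy data are assumptions, and the authors' software may treat gaps and missing data differently.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site for aligned, equal-length,
    gap-free sequences (no correction for missing data)."""
    n = len(sequences)
    length = len(sequences[0])
    total_diff = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(sequences, 2)
    )
    n_pairs = n * (n - 1) / 2
    return total_diff / n_pairs / length


# Hypothetical sample of four 8-bp sequences with two polymorphic sites.
toy_sample = ["ACGTACGT", "ACGTACGA", "ACGTACGT", "TCGTACGT"]
pi = nucleotide_diversity(toy_sample)  # 6 differences / 6 pairs / 8 sites = 0.125
```

    For scale, a value of 0.00118 corresponds to roughly one difference per 850 sites between two randomly chosen sequences.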

  20. Genotyping by sequencing resolves shallow population structure to inform conservation of Chinook salmon (Oncorhynchus tshawytscha)

    PubMed Central

    Larson, Wesley A; Seeb, Lisa W; Everett, Meredith V; Waples, Ryan K; Templin, William D; Seeb, James E

    2014-01-01

    Recent advances in population genomics have made it possible to detect previously unidentified structure, obtain more accurate estimates of demographic parameters, and explore adaptive divergence, potentially revolutionizing the way genetic data are used to manage wild populations. Here, we identified 10 944 single-nucleotide polymorphisms using restriction-site-associated DNA (RAD) sequencing to explore population structure, demography, and adaptive divergence in five populations of Chinook salmon (Oncorhynchus tshawytscha) from western Alaska. Patterns of population structure were similar to those of past studies, but our ability to assign individuals back to their region of origin was greatly improved (>90% accuracy for all populations). We also calculated effective size with and without removing physically linked loci identified from a linkage map, a novel method for nonmodel organisms. Estimates of effective size were generally above 1000 and were biased downward when physically linked loci were not removed. Outlier tests based on genetic differentiation identified 733 loci and three genomic regions under putative selection. These markers and genomic regions are excellent candidates for future research and can be used to create high-resolution panels for genetic monitoring and population assignment. This work demonstrates the utility of genomic data to inform conservation in highly exploited species with shallow population structure. PMID:24665338

  1. Conserved Non-Coding Sequences are Associated with Rates of mRNA Decay in Arabidopsis.

    PubMed

    Spangler, Jacob B; Feltus, Frank Alex

    2013-01-01

    Steady-state mRNA levels are tightly regulated through a combination of transcriptional and post-transcriptional control mechanisms. The discovery of cis-acting DNA elements that encode these control mechanisms is of high importance. We have investigated the influence of conserved non-coding sequences (CNSs), DNA patterns retained after an ancient whole genome duplication event, on the breadth of gene expression and the rates of mRNA decay in Arabidopsis thaliana. The absence of CNSs near α duplicate genes was associated with a decrease in breadth of gene expression and slower mRNA decay rates, while the presence of CNSs near α duplicates was associated with an increase in breadth of gene expression and faster mRNA decay rates. The observed difference in mRNA decay rate was fastest in genes with CNSs in both non-transcribed and transcribed regions, albeit through an unknown mechanism. This study supports the notion that some Arabidopsis CNSs regulate the steady-state mRNA levels through post-transcriptional control mechanisms and that CNSs also play a role in controlling the breadth of gene expression.

  2. A highly conserved sequence is a novel gene involved in de novo vitamin B6 biosynthesis

    PubMed Central

    Ehrenshaft, Marilyn; Bilski, Piotr; Li, Ming Y.; Chignell, Colin F.; Daub, Margaret E.

    1999-01-01

    The Cercospora nicotianae SOR1 (singlet oxygen resistance) gene was identified previously as a gene involved in resistance of this fungus to singlet-oxygen-generating phototoxins. Although homologues to SOR1 occur in organisms in four kingdoms and encode one of the most highly conserved proteins yet identified, the precise function of this protein has, until now, remained unknown. We show that SOR1 is essential in pyridoxine (vitamin B6) synthesis in C. nicotianae and Aspergillus flavus, although it shows no homology to previously identified pyridoxine synthesis genes identified in Escherichia coli. Sequence database analysis demonstrated that organisms encode either SOR1 or E. coli pyridoxine biosynthesis genes, but not both, suggesting that there are two divergent pathways for de novo pyridoxine biosynthesis in nature. Pathway divergence appears to have occurred during the evolution of the eubacteria. We also present data showing that pyridoxine quenches singlet oxygen at a rate comparable to that of vitamins C and E, two of the most highly efficient biological antioxidants, suggesting a previously unknown role for pyridoxine in active oxygen resistance. PMID:10430950

  3. The VQ Motif-Containing Protein Family of Plant-Specific Transcriptional Regulators

    PubMed Central

    Jing, Yanjun; Lin, Rongcheng

    2015-01-01

    The VQ motif-containing proteins (designated as VQ proteins) are a class of plant-specific proteins with a conserved and single short FxxhVQxhTG amino acid sequence motif. VQ proteins regulate diverse developmental processes, including responses to biotic and abiotic stresses, seed development, and photomorphogenesis. In this Update, we summarize and discuss recent advances in our understanding of the regulation and function of VQ proteins and the role of the VQ motif in mediating transcriptional regulation and protein-protein interactions in signaling pathways. Based on the accumulated evidence, we propose a general mechanism of action for the VQ protein family, which likely defines a novel class of transcriptional regulators specific to plants. PMID:26220951
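
    A consensus such as FxxhVQxhTG translates directly into a regular expression. The sketch below assumes that x stands for any residue and h for a hydrophobic residue, which the abstract itself does not spell out. The particular set of residues treated as hydrophobic, the function name and the test sequence are also assumptions.

```python
import re

# Assumption: residues counted as hydrophobic for the "h" positions.
HYDROPHOBIC = "AVILMFWY"

# F, two arbitrary residues, one hydrophobic, VQ, one arbitrary,
# one hydrophobic, then TG.
VQ_PATTERN = re.compile(rf"F..[{HYDROPHOBIC}]VQ.[{HYDROPHOBIC}]TG")


def find_vq_motifs(protein):
    """Return (start_index, matched_text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in VQ_PATTERN.finditer(protein)]


# Hypothetical protein fragment containing one motif instance.
hits = find_vq_motifs("MSSFRALVQELTGKK")  # [(3, 'FRALVQELTG')]
```

    A plain regular expression only reports exact matches to the consensus. Scanning real proteomes for divergent family members would normally call for a profile-based search instead.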

  4. Sampling Motif-Constrained Ensembles of Networks

    NASA Astrophysics Data System (ADS)

    Fischer, Rico; Leitão, Jorge C.; Peixoto, Tiago P.; Altmann, Eduardo G.

    2015-10-01

    The statistical significance of network properties is conditioned on null models which satisfy specified properties but that are otherwise random. Exponential random graph models are a principled theoretical framework to generate such constrained ensembles, but which often fail in practice, either due to model inconsistency or due to the impossibility to sample networks from them. These problems affect the important case of networks with prescribed clustering coefficient or number of small connected subgraphs (motifs). In this Letter we use the Wang-Landau method to obtain a multicanonical sampling that overcomes both these problems. We sample, in polynomial time, networks with arbitrary degree sequences from ensembles with imposed motifs counts. Applying this method to social networks, we investigate the relation between transitivity and homophily, and we quantify the correlation between different types of motifs, finding that single motifs can explain up to 60% of the variation of motif profiles.

  5. Naturally processed HLA class II peptides reveal highly conserved immunogenic flanking region sequence preferences that reflect antigen processing rather than peptide-MHC interactions.

    PubMed

    Godkin, A J; Smith, K J; Willis, A; Tejada-Simon, M V; Zhang, J; Elliott, T; Hill, A V

    2001-06-01

    MHC class II heterodimers bind peptides 12-20 aa in length. The peptide flanking residues (PFRs) of these ligands extend from a central binding core consisting of nine amino acids. Increasing evidence suggests that the PFRs can alter the immunogenicity of T cell epitopes. We have previously noted that eluted peptide pool sequence data derived from an MHC class II Ag reflect patterns of enrichment not only in the core binding region but also in the PFRs. We sought to distinguish whether these enrichments reflect cellular processes or direct MHC-peptide interactions. Using the multiple sclerosis-associated allele HLA-DR2, pool sequence data from naturally processed ligands were compared with the patterns of enrichment obtained by binding semicombinatorial peptide libraries to empty HLA-DR2 molecules. Naturally processed ligands revealed patterns of enrichment reflecting both the binding motif of HLA-DR2 (position (P)1, aliphatic; P4, bulky hydrophobic; and P6, polar) as well as the nonbound flanking regions, including acidic residues at the N terminus and basic residues at the C terminus. These PFR enrichments were independent of MHC-peptide interactions. Further studies revealed similar patterns in nine other HLA alleles, with the C-terminal basic residues being as highly conserved as the previously described N-terminal prolines of MHC class II ligands. There is evidence that addition of C-terminal basic PFRs to known peptide epitopes is able to enhance both processing as well as T cell activation. Recognition of these allele-transcending patterns in the PFRs may prove useful in epitope identification and vaccine design.

  6. Conserved Patterns of Microbial Immune Escape: Pathogenic Microbes of Diverse Origin Target the Human Terminal Complement Inhibitor Vitronectin via a Single Common Motif

    PubMed Central

    Kraiczy, Peter; Hammerschmidt, Sven; Skerka, Christine; Zipfel, Peter F.; Riesbeck, Kristian

    2016-01-01

    Pathogenicity of many microbes relies on their capacity to resist innate immunity, and to survive and persist in an immunocompetent human host microbes have developed highly efficient and sophisticated complement evasion strategies. Here we show that different human pathogens including Gram-negative and Gram-positive bacteria, as well as the fungal pathogen Candida albicans, acquire the human terminal complement regulator vitronectin to their surface. By using truncated vitronectin fragments we found that all analyzed microbial pathogens (n = 13) bound human vitronectin via the same C-terminal heparin-binding domain (amino acids 352–374). This specific interaction leaves the terminal complement complex (TCC) regulatory region of vitronectin accessible, allowing inhibition of C5b-7 membrane insertion and C9 polymerization. Vitronectin complexed with the various microbes and corresponding proteins was thus functionally active and inhibited complement-mediated C5b-9 deposition. Taken together, diverse microbial pathogens expressing different structurally unrelated vitronectin-binding molecules interact with host vitronectin via the same conserved region to allow versatile control of the host innate immune response. PMID:26808444

  7. A survey of motif finding Web tools for detecting binding site motifs in ChIP-Seq data

    PubMed Central

    2014-01-01

    ChIP-Seq (chromatin immunoprecipitation sequencing) has an advantage for motif finding because ChIP-Seq experiments narrow the search down to binding site locations. Recent motif finding tools facilitate motif detection by providing user-friendly Web interfaces. In this work, we reviewed nine motif finding Web tools that are capable of detecting binding site motifs in ChIP-Seq data. We showed that each motif finding Web tool has its own advantages for detecting motifs that other tools may not discover. We recommended that users apply multiple motif finding Web tools that implement different algorithms to obtain significant motifs, overlapping similar motifs, and non-overlapping motifs. Finally, we provided our suggestions for the future development of motif finding Web tools that better assist researchers in finding motifs in ChIP-Seq data. Reviewers: This article was reviewed by Prof. Sandor Pongor, Dr. Yuriy Gusev, and Dr. Shyam Prabhakar (nominated by Prof. Limsoon Wong). PMID:24555784

  8. Conserved primary sequences of the DNA terminal proteins of five different human adenovirus groups.

    PubMed

    Green, M; Brackmann, K; Wold, W S; Cartas, M; Thornton, H; Elder, J H

    1979-09-01

    The 31 human adenoviruses (Ad) form five groups (A-E) whose DNAs are <20% homologous by molecular hybridization. Ad5 (group C) DNA contains a 55,000-dalton protein probably covalently bound to each 5' terminus. This covalently bound protein may be analogous to polypeptides found in other viral and nonviral systems that are covalently bound to genomic DNAs or RNAs and that are thought to function in DNA or RNA replication. Because of the importance of proteins linked to nucleic acids, we have investigated whether DNAs from all five groups of human adenoviruses have terminal proteins, as well as the peptide relationships among the different terminal proteins. We show here that DNAs from Ad12, 7, 2, 19, and 4, representing Ad groups A-E, respectively, all contain covalently bound proteins of about 55,000 daltons. To investigate the peptide relatedness among the terminal proteins, we prepared microgram quantities of covalently bound protein from Ads in groups A-E and compared their chymotryptic and tryptic (125)I-labeled peptide maps. We find that the covalently bound protein maps of the five Ad groups are highly related and possibly identical. On the other hand, the tryptic and chymotryptic peptide maps of the major virion protein II and the core proteins V and VII of groups B, C, and E Ads show considerable heterology. Assuming that the covalently bound protein is virally coded, the conserved primary sequence of these proteins suggests a major functional role for the protein in Ad replication. Because the genetic origin of the Ad covalently bound proteins is not established, our data are also consistent with the possibility that the protein is coded by a cellular gene.

  9. Sequence and expression pattern of pax-6 are highly conserved between zebrafish and mice.

    PubMed

    Püschel, A W; Gruss, P; Westerfield, M

    1992-03-01

    Despite obvious differences in the patterns of early embryonic development, vertebrates share a number of developmental mechanisms and control genes, suggesting that they use similar genetic programs at some stages of development. To examine this idea, we isolated and characterized one such gene, pax-6, a member of the pax gene family, from the zebrafish Brachydanio rerio and determined the evolutionary conservation in the structure and expression of this gene by comparison to its homolog in mice. We found two alternatively spliced forms of the zebrafish pax-6 message. Sequence and expression pattern of the zebrafish pax-6 gene are remarkably similar to its murine homolog. pax-6 expression begins during early neurulation. A stripe of cells in the neuroectoderm, including the prospective diencephalon and a part of the telencephalon, expresses pax-6 as well as the hindbrain and the ventral spinal cord extending from the level of the first rhombomere to the posterior end of the CNS. During later development more limited regions of the brain including the eye, the olfactory bulb and the pituitary gland express pax-6. Cells at the midbrain-hindbrain junction express eng genes and are separated from the neighboring pax-6 regions by several cells that express neither gene, indicating a complex subdivision of this region. pax-6 expression appears during processes when cell-to-cell signalling is thought to be important, for example during induction of the eye and regionalization of the spinal cord and brain, suggesting that it may be one component mediating the response to inductive interactions.

  10. Novel sequences encoding venom C-type lectins are conserved in phylogenetically and geographically distinct Echis and Bitis viper species.

    PubMed

    Harrison, R A; Oliver, J; Hasson, S S; Bharati, K; Theakston, R D G

    2003-10-01

    Envenoming by Echis saw scaled vipers and Bitis arietans puff adders is the leading cause of death and morbidity in Africa due to snake bite. Despite their medical importance, the composition and constituent functionality of venoms from these vipers remains poorly understood. Here, we report the cloning of cDNA sequences encoding seven clusters or isoforms of the haemostasis-disruptive C-type lectin (CTL) proteins from the venom glands of Echis ocellatus, E. pyramidum leakeyi, E. carinatus sochureki and B. arietans. All these CTL sequences encoded the cysteine scaffold that defines the carbohydrate-recognition domain of mammalian CTLs. All but one of the Echis and Bitis CTL sequences showed greater sequence similarity to the beta than alpha CTL subunits in venoms of related Asian and American vipers. Four of the new CTL clusters showed marked inter-cluster sequence conservation across all four viper species which were significantly different from that of previously published viper CTLs. The other three Echis and Bitis CTL clusters showed varying degrees of sequence similarity to published viper venom CTLs. Because viper venom CTLs exhibit a high degree of sequence similarity and yet exert profoundly different effects on the mammalian haemostatic system, no attempt was made to assign functionality to the new Echis and Bitis CTLs on the basis of sequence alone. The extraordinary level of inter-specific and inter-generic sequence conservation exhibited by the Echis and Bitis CTLs leads us to speculate that antibodies to representative molecules should neutralise the biological function of this important group of venom toxins in vipers that are distributed throughout Africa, the Middle East and the Indian subcontinent. PMID:14557069

  11. Sequence conservation of homeologous bacterial artificial chromosomes and transcription of homeologous genes in soybean (Glycine max L. Merr.).

    PubMed

    Schlueter, Jessica A; Scheffler, Brian E; Schlueter, Shannon D; Shoemaker, Randy C

    2006-10-01

    The paleopolyploid soybean genome was investigated by sequencing homeologous BAC clones anchored by duplicate N-hydroxycinnamoyl/benzoyltransferase (HCBT) genes. The homeologous BACs were genetically mapped to linkage groups C1 and C2. Annotation of the 173,747- and 98,760-bp BACs showed that gene conservation in both order and orientation is high between homeologous regions with only a single gene insertion/deletion and local tandem duplications differing between the regions. The nucleotide sequence conservation extends into intergenic regions as well, probably due to conserved regulatory sequences. Most of the homeologs appear to have a role in either transcription/DNA binding or cellular signaling, suggesting a potential preference for retention of duplicate genes with these functions. Reverse transcriptase-PCR analysis of homeologs showed that in the tissues sampled, most homeologs have not diverged greatly in their transcription profiles. However, four cases of changes in transcription were identified, primarily in the HCBT gene cluster. Because a mapped locus corresponds to a soybean cyst nematode (SCN) QTL, the potential role of HCBT genes in response to SCN is discussed. These results are the first sequence-based analysis of homeologous BACs in soybean, a diploidized paleopolyploid. PMID:16888343

  12. Use of a Drosophila Genome-Wide Conserved Sequence Database to Identify Functionally Related cis-Regulatory Enhancers

    PubMed Central

    Brody, Thomas; Yavatkar, Amarendra S; Kuzin, Alexander; Kundu, Mukta; Tyson, Leonard J; Ross, Jermaine; Lin, Tzu-Yang; Lee, Chi-Hon; Awasaki, Takeshi; Lee, Tzumin; Odenwald, Ward F

    2012-01-01

    Background: Phylogenetic footprinting has revealed that cis-regulatory enhancers consist of conserved DNA sequence clusters (CSCs). Currently, there is no systematic approach for enhancer discovery and analysis that takes full advantage of the sequence information within enhancer CSCs. Results: We have generated a Drosophila genome-wide database of conserved DNA consisting of >100,000 CSCs derived from EvoPrints spanning over 90% of the genome. cis-Decoder database search and alignment algorithms enable the discovery of functionally related enhancers. The program first identifies conserved repeat elements within an input enhancer and then searches the database for CSCs that score highly against the input CSC. Scoring is based on shared repeats as well as uniquely shared matches, and includes measures of the balance of shared elements, a diagnostic that has proven to be useful in predicting cis-regulatory function. To demonstrate the utility of these tools, a temporally-restricted CNS neuroblast enhancer was used to identify other functionally related enhancers and analyze their structural organization. Conclusions: cis-Decoder reveals that co-regulating enhancers consist of combinations of overlapping shared sequence elements, providing insights into the mode of integration of multiple regulating transcription factors. The database and accompanying algorithms should prove useful in the discovery and analysis of enhancers involved in any developmental process. Developmental Dynamics 241:169–189, 2012. © 2011 Wiley Periodicals, Inc. Key findings: A genome-wide catalog of Drosophila conserved DNA sequence clusters. cis-Decoder discovers functionally related enhancers. Functionally related enhancers share balanced sequence element copy numbers. Many enhancers function during multiple phases of development. PMID:22174086

  13. Conserved Amino Acid Sequence Features in the α Subunits of MoFe, VFe, and FeFe Nitrogenases

    PubMed Central

    Glazer, Alexander N.; Kechris, Katerina J.

    2009-01-01

    Background This study examines the structural features and phylogeny of the α subunits of 69 full-length NifD (MoFe subunit), VnfD (VFe subunit), and AnfD (FeFe subunit) sequences. Methodology/Principal Findings The analyses of this set of sequences included BLAST scores, multiple sequence alignment, examination of patterns of covariant residues, phylogenetic analysis and comparison of the sequences flanking the conserved Cys and His residues that attach the FeMo cofactor to NifD and that are also conserved in the alternative nitrogenases. The results show that NifD nitrogenases fall into two distinct groups. Group I includes NifD sequences from many genera within Bacteria, including all nitrogen-fixing aerobes examined, as well as strict anaerobes and some facultative anaerobes, but no archaeal sequences. In contrast, Group II NifD sequences were limited to a small number of archaeal and bacterial sequences from strict anaerobes. The VnfD and AnfD sequences fall into two separate groups, more closely related to Group II NifD than to Group I NifD. The pattern of perfectly conserved residues, distributed along the full length of the Group I and II NifD, VnfD, and AnfD, confirms unambiguously that these polypeptides are derived from a common ancestral sequence. Conclusions/Significance There is no indication of a relationship between the patterns of covariant residues specific to each of the four groups discussed above that would give indications of an evolutionary pathway leading from one type of nitrogenase to another. Rather the totality of the data, along with the phylogenetic analysis, is consistent with a radiation of Group I and II NifDs, VnfD and AnfD from a common ancestral sequence. All the data presented here strongly support the suggestion made by some earlier investigators that the nitrogenase family had already evolved in the last common ancestor of the Archaea and Bacteria. PMID:19578539

  14. MISAE: a new approach for regulatory motif extraction.

    PubMed

    Sun, Zhaohui; Yang, Jingyi; Deogun, Jitender S

    2004-01-01

    The recognition of regulatory motifs of co-regulated genes is essential for understanding the regulatory mechanisms. However, the automatic extraction of regulatory motifs from a given data set of the upstream non-coding DNA sequences of a family of co-regulated genes is difficult because regulatory motifs are often subtle and inexact. This problem is further complicated by the corruption of the data sets. In this paper, a new approach called Mismatch-allowed Probabilistic Suffix Tree Motif Extraction (MISAE) is proposed. It combines the mismatch-allowed probabilistic suffix tree that is a probabilistic model and local prediction for the extraction of regulatory motifs. The proposed approach is tested on 15 co-regulated gene families and compares favorably with other state-of-the-art approaches. Moreover, MISAE performs well on "corrupted" data sets. It is able to extract the motif from a "corrupted" data set with less than one fourth of the sequences containing the real motif.
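
    The idea of tolerating mismatches when looking for an inexact motif in a partly "corrupted" data set can be illustrated with a much simpler stand-in than the probabilistic suffix tree the abstract describes. The sketch below is not MISAE: it only counts how many sequences contain a candidate motif within a given Hamming distance. The function names, the candidate motif and the toy sequences are all assumptions made for illustration.

```python
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def contains_with_mismatches(seq: str, motif: str, max_mismatches: int) -> bool:
    """True if some window of seq matches motif with at most max_mismatches substitutions."""
    k = len(motif)
    return any(
        hamming(seq[i:i + k], motif) <= max_mismatches
        for i in range(len(seq) - k + 1)
    )


def support(seqs: Sequence[str], motif: str, max_mismatches: int) -> float:
    """Fraction of sequences that contain an approximate occurrence of motif."""
    hits = sum(contains_with_mismatches(s, motif, max_mismatches) for s in seqs)
    return hits / len(seqs)


# Toy upstream sequences, invented for illustration (not data from the paper).
upstream = ["TTGACGTCAAT", "CCGACGTTAGG", "AAAAAAAAAAA", "GGTACGTCACC"]

print(support(upstream, "GACGTCA", 0))  # exact matching only
print(support(upstream, "GACGTCA", 1))  # one substitution allowed
```

    In this toy set, allowing a single substitution raises the fraction of sequences that support the candidate motif, which is the kind of effect a mismatch-allowed model is meant to capture.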

  15. Nucleotide sequence analysis of HLA-B*1523 and B*8101. Dominant alpha-helical motifs produce complex serologic recognition patterns for the HLA-B"DT" and HLA-B"NM5" antigens.

    PubMed

    Ellexson, M E; Zhang, G; Stewart, D; Lau, M; Teresi, G; Terasaki, P; Roe, B; Hildebrand, W

    1995-10-01

    Assigning a precise serologic specificity to the class I HLA-B"NM5" and HLA-B"DT" molecules has proven difficult, with patterns of serologic cross-reactivity suggesting that NM5 is most like antigens in the B5 CREG and that DT is either B7 or B40 like. To better understand the relationship these antigens share with other HLA-B molecules we determined the nucleotide sequence of the alleles encoding HLA-B"NM5" and HLA-B"DT". Sequencing results show that NM5 shares the most overall sequence homology with the B70 antigens and that differences at the alpha-helical Bw4/Bw6 epitope preclude serologic cross-reactivity between NM5 and the B70 antigens. Accordingly, NM5 has been assigned the name B*1523. The strong serologic impact of helical sequence conservations and variations is reiterated for the class I HLA-B"DT" molecule. Comparative analysis demonstrates that sequence conservations in the first domain's alpha-helix stimulate cross-reactivity between HLA-B"DT" and HLA-B7, whereas epitopes conserved in the second domain's alpha-helix impel cross-reactivity between HLA-B"DT" and HLA-B48. To convey the unique lineage of this hybrid B7/B48 molecule the name HLA-B*8101 has been assigned to HLA-B"DT".

  16. The Thiamin Pyrophosphate-Motif

    NASA Technical Reports Server (NTRS)

    Dominiak, Paulina M.; Ciszak, Ewa M.

    2003-01-01

    Using databases, the authors have identified a common thiamin pyrophosphate (TPP)-motif in the family of functionally diverse TPP-dependent enzymes. This common motif consists of multimeric organization of subunits, two catalytic centers, common amino acid sequence, and specific contacts to provide a flip-flop, or alternate site, mechanism of action. Each catalytic center [PP:PYR] is formed at the interface of the PP-domain binding the magnesium ion, pyrophosphate and aminopyrimidine ring of TPP, and the PYR-domain binding the aminopyrimidine ring of that cofactor. A pair of these catalytic centers constitutes the catalytic core [PP:PYR]* within these enzymes. Analysis of the structural elements of this catalytic core reveals a novel definition of the common amino acid sequences, which are GX@&(G)@XXGQ, and GDGX25-30 within the PP-domain, and the E&(G)@XXG@ within the PYR-domain, where Q, corresponds to a hydrophobic amino acid. This TPP-motif provides a novel tool for annotation of TPP-dependent enzymes useful in advancing functional proteomics.

  17. An update on cell surface proteins containing extensin-motifs.

    PubMed

    Borassi, Cecilia; Sede, Ana R; Mecchia, Martin A; Salgado Salter, Juan D; Marzol, Eliana; Muschietti, Jorge P; Estevez, Jose M

    2016-01-01

    In recent years it has become clear that there are several molecular links that interconnect the plant cell surface continuum, which is highly important in many biological processes such as plant growth, development, and interaction with the environment. The plant cell surface continuum can be defined as the space that contains and interlinks the cell wall, plasma membrane and cytoskeleton compartments. In this review, we provide an updated view of cell surface proteins that include modular domains with an extensin (EXT)-motif followed by a cytoplasmic kinase-like domain, known as PERKs (for proline-rich extensin-like receptor kinases); with an EXT-motif and an actin binding domain, known as formins; and with extracellular hybrid-EXTs. We focus our attention on the EXT-motifs with the short sequence Ser-Pro(3-5), which is found in several different protein contexts within the same extracellular space, highlighting a putative conserved structural and functional role. A closer understanding of the dynamic regulation of plant cell surface continuum and its relationship with the downstream signalling cascade is a crucial forthcoming challenge.
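
    As a rough illustration of what scanning for the short Ser-Pro(3-5) sequence could look like, the sketch below searches a one-letter-code protein string for a serine followed by three to five prolines. The regular expression, the helper name `find_ext_motifs` and the toy sequence are assumptions for illustration only. Real extensin prolines are often hydroxylated, which a plain sequence scan does not see.

```python
import re

# Ser followed by three to five Pro residues, in one-letter amino acid code.
# The quantifier is greedy, so the longest run (up to five Pro) is reported.
EXT_MOTIF = re.compile(r"SP{3,5}")


def find_ext_motifs(protein: str):
    """Return (start_index, matched_text) for each non-overlapping Ser-Pro(3-5) run."""
    return [(m.start(), m.group()) for m in EXT_MOTIF.finditer(protein.upper())]


# Hypothetical toy sequence, not a real protein.
toy = "MASPPPPKYSPPPAVSPPGSPPPPPH"
for start, text in find_ext_motifs(toy):
    print(start, text)
```

    The "SPP" run in the toy sequence is skipped because it has only two prolines.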

  18. Regulatory motifs in Chk1

    PubMed Central

    Caparelli, Michael L.; O’Connell, Matthew J.

    2013-01-01

    Chk1 is the effector kinase of the G2 DNA damage checkpoint. Chk1 homologs possess a highly conserved N-terminal kinase domain and a less conserved C-terminal regulatory domain. In response to DNA damage, Chk1 is recruited to mediator proteins assembled at lesions on replication protein A (RPA)-coated single-stranded DNA (ssDNA). Chk1 is then activated by phosphorylation on S345 in the C-terminal regulatory domain by the PI3 kinase-related kinases ATM and ATR to enforce a G2 cell cycle arrest to allow time for DNA repair. Models have emerged in which this C-terminal phosphorylation relieves auto-inhibitory regulation of the kinase domain by the regulatory domain. However, experiments in fission yeast have shown that deletion of this putative auto-inhibitory domain actually inactivates Chk1 function. We show here that Chk1 homologs possess a kinase-associated 1 (KA1) domain that possesses residues previously implicated in Chk1 auto-inhibition. In addition, all Chk1 homologs have a small and highly conserved C-terminal extension (CTE domain). In fission yeast, both of these motifs are essential for Chk1 activation through interaction with the mediator protein Crb2, the homolog of human 53BP1. Thus, through different intra- and intermolecular interactions, these motifs explain why the regulatory domain exerts both positive and negative control over Chk1 activation. Such motifs may provide alternative targets to the ATP-binding pocket on which to dock Chk1 inhibitors as anticancer therapeutics. PMID:23422000

  19. Dominant sequences of human major histocompatibility complex conserved extended haplotypes from HLA-DQA2 to DAXX.

    PubMed

    Larsen, Charles E; Alford, Dennis R; Trautwein, Michael R; Jalloh, Yanoh K; Tarnacki, Jennifer L; Kunnenkeri, Sushruta K; Fici, Dolores A; Yunis, Edmond J; Awdeh, Zuheir L; Alper, Chester A

    2014-10-01

    We resequenced and phased 27 kb of DNA within 580 kb of the MHC class II region in 158 population chromosomes, most of which were conserved extended haplotypes (CEHs) of European descent or contained their centromeric fragments. We determined the single nucleotide polymorphism and deletion-insertion polymorphism alleles of the dominant sequences from HLA-DQA2 to DAXX for these CEHs. Nine of 13 CEHs remained sufficiently intact to possess a dominant sequence extending at least to DAXX, 230 kb centromeric to HLA-DPB1. We identified the regions centromeric to HLA-DQB1 within which single instances of eight "common" European MHC haplotypes previously sequenced by the MHC Haplotype Project (MHP) were representative of those dominant CEH sequences. Only two MHP haplotypes had a dominant CEH sequence throughout the centromeric and extended class II region and one MHP haplotype did not represent a known European CEH anywhere in the region. We identified the centromeric recombination transition points of other MHP sequences from CEH representation to non-representation. Several CEH pairs or groups shared sequence identity in small blocks but had significantly different (although still conserved for each separate CEH) sequences in surrounding regions. These patterns partly explain strong calculated linkage disequilibrium over only short (tens to hundreds of kilobases) distances in the context of a finite number of observed megabase-length CEHs comprising half a population's haplotypes. Our results provide a clearer picture of European CEH class II allelic structure and population haplotype architecture, improved regional CEH markers, and raise questions concerning regional recombination hotspots. PMID:25299700

  20. Dominant Sequences of Human Major Histocompatibility Complex Conserved Extended Haplotypes from HLA-DQA2 to DAXX

    PubMed Central

    Larsen, Charles E.; Alford, Dennis R.; Trautwein, Michael R.; Jalloh, Yanoh K.; Tarnacki, Jennifer L.; Kunnenkeri, Sushruta K.; Fici, Dolores A.; Yunis, Edmond J.; Awdeh, Zuheir L.; Alper, Chester A.

    2014-01-01

    We resequenced and phased 27 kb of DNA within 580 kb of the MHC class II region in 158 population chromosomes, most of which were conserved extended haplotypes (CEHs) of European descent or contained their centromeric fragments. We determined the single nucleotide polymorphism and deletion-insertion polymorphism alleles of the dominant sequences from HLA-DQA2 to DAXX for these CEHs. Nine of 13 CEHs remained sufficiently intact to possess a dominant sequence extending at least to DAXX, 230 kb centromeric to HLA-DPB1. We identified the regions centromeric to HLA-DQB1 within which single instances of eight “common” European MHC haplotypes previously sequenced by the MHC Haplotype Project (MHP) were representative of those dominant CEH sequences. Only two MHP haplotypes had a dominant CEH sequence throughout the centromeric and extended class II region and one MHP haplotype did not represent a known European CEH anywhere in the region. We identified the centromeric recombination transition points of other MHP sequences from CEH representation to non-representation. Several CEH pairs or groups shared sequence identity in small blocks but had significantly different (although still conserved for each separate CEH) sequences in surrounding regions. These patterns partly explain strong calculated linkage disequilibrium over only short (tens to hundreds of kilobases) distances in the context of a finite number of observed megabase-length CEHs comprising half a population's haplotypes. Our results provide a clearer picture of European CEH class II allelic structure and population haplotype architecture, improved regional CEH markers, and raise questions concerning regional recombination hotspots. PMID:25299700

  1. Crystallization and Preliminary X-ray Diffraction Analysis of motif N from Saccharomyces cerevisiae Dbf4

    SciTech Connect

    Matthews, L.; Duong, A; Prasad, A; Duncker, B; Guarne, A

    2009-01-01

    The Cdc7-Dbf4 complex plays an instrumental role in the initiation of DNA replication and is a target of replication-checkpoint responses in Saccharomyces cerevisiae. Cdc7 is a conserved serine/threonine kinase whose activity depends on association with its regulatory subunit, Dbf4. A conserved sequence near the N-terminus of Dbf4 (motif N) is necessary for the interaction of Cdc7-Dbf4 with the checkpoint kinase Rad53. To understand the role of the Cdc7-Dbf4 complex in checkpoint responses, a fragment of Saccharomyces cerevisiae Dbf4 encompassing motif N was isolated, overproduced and crystallized. A complete native data set was collected at 100 K from crystals that diffracted X-rays to 2.75 Å resolution and structure determination is currently under way.

  2. Phylogenetic conservation of the 3' cryptic recombination signal sequence (3'cRSS) in the VH genes of jawed vertebrates.

    PubMed

    Sun, Yi; Liu, Zhancai; Li, Zhaoyong; Lian, Zhengxing; Zhao, Yaofeng

    2012-01-01

    The VH replacement process is a RAG-mediated secondary recombination in which the variable region of a rearranged VHDJH is replaced by a different germline VH gene. In almost all human and mouse VH genes, two sequence features appear to be crucial for VH replacement. First, an embedded heptamer, which is located near the 3' end of the rearranged VH gene, serves as a cryptic recombination signal sequence (3'cRSS) for the VH replacement process. Second, a short stretch of nucleotides located downstream of the 3'cRSS serves as a footprint of the original VH region, frequently encoding charged amino acids. In this review, we show that both of these features are conserved in the VH genes of all jawed vertebrates, which suggests that the VH replacement process may be a conserved mechanism.

  3. High-throughput genomic sequencing of cassava bacterial blight strains identifies conserved effectors to target for durable resistance.

    PubMed

    Bart, Rebecca; Cohn, Megan; Kassen, Andrew; McCallum, Emily J; Shybut, Mikel; Petriello, Annalise; Krasileva, Ksenia; Dahlbeck, Douglas; Medina, Cesar; Alicai, Titus; Kumar, Lava; Moreira, Leandro M; Rodrigues Neto, Júlio; Verdier, Valerie; Santana, María Angélica; Kositcharoenkul, Nuttima; Vanderschuren, Hervé; Gruissem, Wilhelm; Bernal, Adriana; Staskawicz, Brian J

    2012-07-10

    Cassava bacterial blight (CBB), incited by Xanthomonas axonopodis pv. manihotis (Xam), is the most important bacterial disease of cassava, a staple food source for millions of people in developing countries. Here we present a widely applicable strategy for elucidating the virulence components of a pathogen population. We report Illumina-based draft genomes for 65 Xam strains and deduce the phylogenetic relatedness of Xam across the areas where cassava is grown. Using an extensive database of effector proteins from animal and plant pathogens, we identify the effector repertoire for each sequenced strain and use a comparative sequence analysis to deduce the least polymorphic of the conserved effectors. These highly conserved effectors have been maintained over 11 countries, three continents, and 70 y of evolution and as such represent ideal targets for developing resistance strategies. PMID:22699502

  4. Identification of conserved genomic regions and variation therein amongst Cetartiodactyla species using next generation sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Background Next Generation Sequencing has created an opportunity to genetically characterize an individual both inexpensively and comprehensively. In earlier work produced in our collaboration [1], it was demonstrated that, for animals without a reference genome, their Next Generation Sequence data ...

  5. Genome-wide discovery and differential regulation of conserved and novel microRNAs in chickpea via deep sequencing.

    PubMed

    Jain, Mukesh; Chevala, V V S Narayana; Garg, Rohini

    2014-11-01

    MicroRNAs (miRNAs) are essential components of complex gene regulatory networks that orchestrate plant development. Although several genomic resources have been developed for the legume crop chickpea, miRNAs have not been discovered until now. For genome-wide discovery of miRNAs in chickpea (Cicer arietinum), we sequenced the small RNA content from seven major tissues/organs employing Illumina technology. About 154 million reads were generated, which represented more than 20 million distinct small RNA sequences. We identified a total of 440 conserved miRNAs in chickpea based on sequence similarity with known miRNAs in other plants. In addition, 178 novel miRNAs were identified using a miRDeep pipeline with plant-specific scoring. Some of the conserved and novel miRNAs with significant sequence similarity were grouped into families. The chickpea miRNAs targeted a wide range of mRNAs involved in diverse cellular processes, including transcriptional regulation (transcription factors), protein modification and turnover, signal transduction, and metabolism. Our analysis revealed several miRNAs with differential spatial expression. Many of the chickpea miRNAs were expressed in a tissue-specific manner. The conserved and differential expression of members of the same miRNA family in different tissues was also observed. Some of the same family members were predicted to target different chickpea mRNAs, which suggested the specificity and complexity of miRNA-mediated developmental regulation. This study, for the first time, reveals a comprehensive set of conserved and novel miRNAs along with their expression patterns and putative targets in chickpea, and provides a framework for understanding regulation of developmental processes in legumes.

  6. Genome sequence conservation of Hendra virus isolates during spillover to horses, Australia.

    PubMed

    Marsh, Glenn A; Todd, Shawn; Foord, Adam; Hansson, Eric; Davies, Kelly; Wright, Lynda; Morrissy, Chris; Halpin, Kim; Middleton, Deborah; Field, Hume E; Daniels, Peter; Wang, Lin-Fa

    2010-11-01

    Bat-to-horse transmission of Hendra virus has occurred at least 14 times. Although clinical signs in horses have differed, genome sequencing has demonstrated little variation among the isolates. Our sequencing of 5 isolates from recent Hendra virus outbreaks in horses found no correlation between sequences and time or geographic location of outbreaks.

  7. ConBind: motif-aware cross-species alignment for the identification of functional transcription factor binding sites

    PubMed Central

    Lelieveld, Stefan H.; Schütte, Judith; Dijkstra, Maurits J.J.; Bawono, Punto; Kinston, Sarah J.; Göttgens, Berthold; Heringa, Jaap; Bonzanni, Nicola

    2016-01-01

    Eukaryotic gene expression is regulated by transcription factors (TFs) binding to promoter as well as distal enhancers. TFs recognize short, but specific binding sites (TFBSs) that are located within the promoter and enhancer regions. Functionally relevant TFBSs are often highly conserved during evolution leaving a strong phylogenetic signal. While multiple sequence alignment (MSA) is a potent tool to detect the phylogenetic signal, the current MSA implementations are optimized to align the maximum number of identical nucleotides. This approach might result in the omission of conserved motifs that contain interchangeable nucleotides such as the ETS motif (IUPAC code: GGAW). Here, we introduce ConBind, a novel method to enhance alignment of short motifs, even if their mutual sequence similarity is only partial. ConBind improves the identification of conserved TFBSs by improving the alignment accuracy of TFBS families within orthologous DNA sequences. Functional validation of the Gfi1b + 13 enhancer reveals that ConBind identifies additional functionally important ETS binding sites that were missed by all other tested alignment tools. In addition to the analysis of known regulatory regions, our web tool is useful for the analysis of TFBSs on so far unknown DNA regions identified through ChIP-sequencing. PMID:26721389
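
    The point about interchangeable nucleotides can be made concrete with the ETS example, GGAW, where W stands for A or T in the IUPAC nucleotide code. The sketch below is not ConBind's alignment method. It only shows, under that IUPAC reading, how two sites that differ at one base can both be instances of the same degenerate motif. The helper names and the two toy "ortholog" sequences are assumptions for illustration.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif: str):
    """Compile an IUPAC motif into a regex; the lookahead also reports overlapping sites."""
    body = "".join(IUPAC[ch] for ch in motif.upper())
    return re.compile(f"(?=({body}))")


def find_sites(seq: str, motif: str):
    """Return (start_index, matched_site) for every occurrence of motif on the given strand."""
    pattern = iupac_to_regex(motif)
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


# Hypothetical orthologous fragments that differ at a single position.
orthologs = {"species_a": "TTGGAAGTC", "species_b": "TTGGATGTC"}
for name, seq in orthologs.items():
    print(name, find_sites(seq, "GGAW"))
```

    An identity-based comparison would score the A/T difference as a mismatch, while both fragments carry a GGAW site at the same offset. Only the forward strand is scanned here.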

  8. ConBind: motif-aware cross-species alignment for the identification of functional transcription factor binding sites.

    PubMed

    Lelieveld, Stefan H; Schütte, Judith; Dijkstra, Maurits J J; Bawono, Punto; Kinston, Sarah J; Göttgens, Berthold; Heringa, Jaap; Bonzanni, Nicola

    2016-05-01

    Eukaryotic gene expression is regulated by transcription factors (TFs) binding to promoter as well as distal enhancers. TFs recognize short, but specific binding sites (TFBSs) that are located within the promoter and enhancer regions. Functionally relevant TFBSs are often highly conserved during evolution leaving a strong phylogenetic signal. While multiple sequence alignment (MSA) is a potent tool to detect the phylogenetic signal, the current MSA implementations are optimized to align the maximum number of identical nucleotides. This approach might result in the omission of conserved motifs that contain interchangeable nucleotides such as the ETS motif (IUPAC code: GGAW). Here, we introduce ConBind, a novel method to enhance alignment of short motifs, even if their mutual sequence similarity is only partial. ConBind improves the identification of conserved TFBSs by improving the alignment accuracy of TFBS families within orthologous DNA sequences. Functional validation of the Gfi1b + 13 enhancer reveals that ConBind identifies additional functionally important ETS binding sites that were missed by all other tested alignment tools. In addition to the analysis of known regulatory regions, our web tool is useful for the analysis of TFBSs on so far unknown DNA regions identified through ChIP-sequencing.

  9. The RXL motif of the African cassava mosaic virus Rep protein is necessary for rereplication of yeast DNA and viral infection in plants

    SciTech Connect

    Hipp, Katharina; Rau, Peter; Schäfer, Benjamin; Gronenborn, Bruno; Jeske, Holger

    2014-08-15

    Geminiviruses, single-stranded DNA plant viruses, encode a replication-initiator protein (Rep) that is indispensable for virus replication. A potential cyclin interaction motif (RXL) in the sequence of African cassava mosaic virus Rep may be an alternative link to cell cycle controls to the known interaction with plant homologs of retinoblastoma protein (pRBR). Mutation of this motif abrogated rereplication in fission yeast induced by expression of wildtype Rep suggesting that Rep interacts via its RXL motif with one or several yeast proteins. The RXL motif is essential for viral infection of Nicotiana benthamiana plants, since mutation of this motif in infectious clones prevented any symptomatic infection. The cell-cycle link (Clink) protein of a nanovirus (faba bean necrotic yellows virus) was investigated that activates the cell cycle by binding via its LXCXE motif to pRBR. Expression of wildtype Clink and a Clink mutant deficient in pRBR-binding did not trigger rereplication in fission yeast. - Highlights: • A potential cyclin interaction motif is conserved in geminivirus Rep proteins. • In ACMV Rep, this motif (RXL) is essential for rereplication of fission yeast DNA. • Mutating RXL abrogated viral infection completely in Nicotiana benthamiana. • Expression of a nanovirus Clink protein in yeast did not induce rereplication. • Plant viruses may have evolved multiple routes to exploit host DNA synthesis.

  10. Conserved sequence-specific lincRNA-steroid receptor interactions drive transcriptional repression and direct cell fate

    SciTech Connect

    Hudson, William H.; Pickard, Mark R.; de Vera, Ian Mitchelle S.; Kuiper, Emily G.; Mourtada-Maarabouni, Mirna; Conn, Graeme L.; Kojetin, Douglas J.; Williams, Gwyn T.; Ortlund, Eric A.

    2014-12-23

    The majority of the eukaryotic genome is transcribed, generating a significant number of long intergenic noncoding RNAs (lincRNAs). Although lincRNAs represent the most poorly understood product of transcription, recent work has shown lincRNAs fulfill important cellular functions. In addition to low sequence conservation, poor understanding of structural mechanisms driving lincRNA biology hinders systematic prediction of their function. Here we report the molecular requirements for the recognition of steroid receptors (SRs) by the lincRNA growth arrest-specific 5 (Gas5), which regulates steroid-mediated transcriptional regulation, growth arrest and apoptosis. We identify the functional Gas5-SR interface and generate point mutations that ablate the SR-Gas5 lincRNA interaction, altering Gas5-driven apoptosis in cancer cell lines. Further, we find that the Gas5 SR-recognition sequence is conserved among haplorhines, with its evolutionary origin as a splice acceptor site. This study demonstrates that lincRNAs can recognize protein targets in a conserved, sequence-specific manner in order to affect critical cell functions.

  11. Identification and characterization of novel and conserved microRNAs in radish (Raphanus sativus L.) using high-throughput sequencing.

    PubMed

    Xu, Liang; Wang, Yan; Xu, Yuanyuan; Wang, Liangju; Zhai, Lulu; Zhu, Xianwen; Gong, Yiqin; Ye, Shan; Liu, Liwang

    2013-03-01

    MicroRNAs (miRNAs) are endogenous, non-coding, small RNAs that play significant regulatory roles in plant growth, development, and biotic and abiotic stress responses. To date, a great number of conserved and species-specific miRNAs have been identified in many important plant species such as Arabidopsis, rice and poplar. However, little is known about identification of miRNAs and their target genes in radish (Raphanus sativus L.). In the present study, a small RNA library from radish root was constructed and sequenced using the high-throughput Solexa sequencing. Through sequence alignment and secondary structure prediction, a total of 545 conserved miRNA families as well as 15 novel (with their miRNA* strand) and 64 potentially novel miRNAs were identified. Quantitative real-time PCR (qRT-PCR) analysis confirmed that both conserved and novel miRNAs were expressed in radish, and some of them were preferentially expressed in certain tissues. A total of 196 potential target genes were predicted for 42 novel radish miRNAs. Gene ontology (GO) analysis showed that most of the targets were involved in plant growth, development, metabolism and stress responses. This study represents a first large-scale identification and characterization of radish miRNAs and their potential target genes. These results could lead to the further identification of radish miRNAs and enhance our understanding of radish miRNA regulatory mechanisms in diverse biological and metabolic processes.

  12. Conserved sequence-specific lincRNA-steroid receptor interactions drive transcriptional repression and direct cell fate

    PubMed Central

    Hudson, William H.; Pickard, Mark R.; de Vera, Ian Mitchelle S.; Kuiper, Emily G.; Mourtada-Maarabouni, Mirna; Conn, Graeme L.; Kojetin, Douglas J.; Williams, Gwyn T.; Ortlund, Eric A.

    2014-01-01

    The majority of the eukaryotic genome is transcribed, generating a significant number of long intergenic non-coding RNAs (lincRNAs). While lincRNAs represent the most poorly understood product of transcription, recent work has shown lincRNAs fulfill important cellular functions. In addition to low sequence conservation, poor understanding of structural mechanisms driving lincRNA biology hinders systematic prediction of their function. Here, we report the molecular requirements for the recognition of steroid receptors (SRs) by the lincRNA Gas5, which regulates steroid-mediated transcriptional regulation, growth arrest, and apoptosis. We identify the functional Gas5-SR interface and generate point mutations that ablate the SR-Gas5 lincRNA interaction, altering Gas5-driven apoptosis in cancer cell lines. Further, we find that the Gas5 SR-recognition sequence is conserved among haplorhines, with its evolutionary origin as a splice acceptor site. This study demonstrates that lincRNAs can recognize protein targets in a conserved, sequence-specific manner in order to affect critical cell functions. PMID:25377354

  13. Using SCOPE to identify potential regulatory motifs in coregulated genes.

    PubMed

    Martyanov, Viktor; Gross, Robert H

    2011-05-31

    SCOPE is an ensemble motif finder that uses three component algorithms in parallel to identify potential regulatory motifs by over-representation and motif position preference. Each component algorithm is optimized to find a different kind of motif. By taking the best of these three approaches, SCOPE performs better than any single algorithm, even in the presence of noisy data. In this article, we utilize a web version of SCOPE to examine genes that are involved in telomere maintenance. SCOPE has been incorporated into at least two other motif finding programs and has been used in other studies. The three algorithms that comprise SCOPE are BEAM, which finds non-degenerate motifs (ACCGGT), PRISM, which finds degenerate motifs (ASCGWT), and SPACER, which finds longer bipartite motifs (ACCnnnnnnnnGGT). These three algorithms have been optimized to find their corresponding type of motif. Together, they allow SCOPE to perform extremely well. Once a gene set has been analyzed and candidate motifs identified, SCOPE can look for other genes that contain the motif which, when added to the original set, will improve the motif score. This can occur through over-representation or motif position preference. Working with partial gene sets that have biologically verified transcription factor binding sites, SCOPE was able to identify most of the rest of the genes also regulated by the given transcription factor. Output from SCOPE shows candidate motifs, their significance, and other information both as a table and as a graphical motif map. FAQs and video tutorials are available at the SCOPE web site which also includes a "Sample Search" button that allows the user to perform a trial run. SCOPE has a very friendly user interface that enables novice users to access the algorithm's full power without having to become an expert in the bioinformatics of motif finding. As input, SCOPE can take a list of genes, or FASTA sequences. These can be entered in browser text fields, or read from

  14. Motifs in brain networks.

    PubMed

    Sporns, Olaf; Kötter, Rolf

    2004-11-01

    Complex brains have evolved a highly efficient network architecture whose structural connectivity is capable of generating a large repertoire of functional states. We detect characteristic network building blocks (structural and functional motifs) in neuroanatomical data sets and identify a small set of structural motifs that occur in significantly increased numbers. Our analysis suggests the hypothesis that brain networks maximize both the number and the diversity of functional motifs, while the repertoire of structural motifs remains small. Using functional motif number as a cost function in an optimization algorithm, we obtain network topologies that resemble real brain networks across a broad spectrum of structural measures, including small-world attributes. These results are consistent with the hypothesis that highly evolved neural architectures are organized to maximize functional repertoires and to support highly efficient integration of information.

  15. Motifs in Brain Networks

    PubMed Central

    2004-01-01

    Complex brains have evolved a highly efficient network architecture whose structural connectivity is capable of generating a large repertoire of functional states. We detect characteristic network building blocks (structural and functional motifs) in neuroanatomical data sets and identify a small set of structural motifs that occur in significantly increased numbers. Our analysis suggests the hypothesis that brain networks maximize both the number and the diversity of functional motifs, while the repertoire of structural motifs remains small. Using functional motif number as a cost function in an optimization algorithm, we obtain network topologies that resemble real brain networks across a broad spectrum of structural measures, including small-world attributes. These results are consistent with the hypothesis that highly evolved neural architectures are organized to maximize functional repertoires and to support highly efficient integration of information. PMID:15510229

  16. Specific motifs in the external loops of connexin proteins can determine gap junction formation between chick heart myocytes.

    PubMed Central

    Warner, A; Clements, D K; Parikh, S; Evans, W H; DeHaan, R L

    1995-01-01

    1. Gap junction formation was compared in the absence and presence of small peptides containing extracellular loop sequences of gap junction (connexin) proteins by measuring the time taken for pairs of spontaneously beating embryonic chick heart myoballs to synchronize beat rates. Test peptides were derived from connexin 32. Non-homologous peptides were used as controls. Control pairs took 42 ± 0.5 min (mean ± S.E.M.; n = 1088) to synchronize. 2. Connexins 32 and 43, but not 26, were detected in gap junction plaques. The density and distribution of connexin immunolabelling varied between myoballs. 3. Peptides containing conserved motifs from extracellular loops 1 and 2 delayed gap junction formation. The steep portion of the dose-response relation lay between 30 and 300 µM peptide. 4. In loop 1, the conserved motifs QPG and SHVR were identified as being involved in junction formation. In loop 2, the conserved SRPTEK motif was important. The ability of peptides containing the SRPTEK motif to interfere with the formation of gap junctions was enhanced by amino acids from the putative membrane-spanning region. 5. Peptides from loop 1 and loop 2 were equivalently effective; there was no synergism between them. 6. The inclusion of conserved cysteines in test peptides did not make them more effective in the competition assay. PMID:8576861

  17. The GA motif: an RNA element common to bacterial antitermination systems, rRNA, and eukaryotic RNAs.

    PubMed Central

    Winkler, W C; Grundy, F J; Murphy, B A; Henkin, T M

    2001-01-01

    Two different transcription termination control mechanisms, the T box and S box systems, are used to regulate transcription of many bacterial aminoacyl-tRNA synthetase, amino acid biosynthesis, and amino acid transport genes. Both of these regulatory mechanisms involve an untranslated mRNA leader region capable of adopting alternate structural conformations that result in transcription termination or transcription elongation into the downstream region. Comparative analyses revealed a small RNA secondary structural element, designated the GA motif, that is highly conserved in both T box and S box leader sequences. The motif consists of two short helices separated by an asymmetric internal loop, with highly conserved GA dinucleotide sequences on either side of the internal loop. Site-directed mutagenesis of this motif in model T and S box leader sequences indicated that it is essential for transcriptional regulation in both systems. This motif is similar to the binding site of yeast ribosomal protein L30, the Snu13p binding sites found in U4 snRNA and box C/D snoRNAs, and two elements in 23S rRNA. PMID:11497434

  18. Characterization of DNA sequences that mediate nuclear protein binding to the regulatory region of the Pisum sativum (pea) chlorophyl a/b binding protein gene AB80: identification of a repeated heptamer motif.

    PubMed

    Argüello, G; García-Hernández, E; Sánchez, M; Gariglio, P; Herrera-Estrella, L; Simpson, J

    1992-05-01

    Two protein factors binding to the regulatory region of the pea chlorophyll a/b binding protein gene AB80 have been identified. One of these factors is found only in green tissue, not in etiolated or root tissue. The second factor (designated ABF-2) binds to a DNA sequence element that contains a direct heptamer repeat, TCTCAAA. The presence of both repeats was found to be essential for binding. ABF-2 is present in green tissue, etiolated tissue, and roots, and factors analogous to ABF-2 are present in several plant species. Computer analysis showed that the TCTCAAA motif is present in the regulatory region of several plant genes. PMID:1303797
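
    A computer scan for the heptamer repeat described above might look like the following minimal Python sketch. It reports pairs of non-overlapping TCTCAAA copies on the same strand. The function name `find_direct_repeats`, the `max_gap` window, and the example sequence are hypothetical and are not taken from the paper.

```python
def find_direct_repeats(seq, motif="TCTCAAA", max_gap=20):
    """Return (start1, start2) pairs of non-overlapping motif copies.

    Two copies form a pair when the second starts after the first ends
    and at most `max_gap` bases separate them (0-based start positions).
    """
    seq = seq.upper()
    k = len(motif)
    starts = [i for i in range(len(seq) - k + 1) if seq[i:i + k] == motif]
    pairs = []
    for a in starts:
        for b in starts:
            gap = b - (a + k)  # bases between the end of copy 1 and start of copy 2
            if 0 <= gap <= max_gap:
                pairs.append((a, b))
    return pairs

# Made-up sequence with two copies separated by four bases.
example = "GGATCTCAAACCGTTCTCAAATTG"
print(find_direct_repeats(example))
```

    Requiring a pair rather than a single hit mirrors the observation that both repeats are needed for binding; the gap limit is an arbitrary choice for the sketch.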

  19. A novel secondary structure based on fused five-membered rings motif

    PubMed Central

    Dhar, Jesmita; Kishore, Raghuvansh; Chakrabarti, Pinak

    2016-01-01

    An analysis of protein structures indicates the existence of a novel, fused five-membered rings motif, comprising two residues (i and i + 1), stabilized by interresidue N(i+1)–H∙∙∙N(i) and intraresidue N(i+1)–H∙∙∙O=C(i+1) hydrogen bonds. Fused-rings geometry is the common thread running through many commonly occurring motifs, such as β-turn, β-bulge, Asx-turn, Ser/Thr-turn, Schellman motif, and points to its structural robustness. A location close to the beginning of a β-strand is rather common for the motif. Devoid of a side chain, Gly seems to be a key player in this motif, occurring at i, for which the backbone torsion angles cluster at ~(−90°, −10°) and (70°, 20°). The fused-rings structures, distant from each other in sequence, can hydrogen bond with each other, and the two segments, aligned to each other in a parallel fashion, give rise to a novel secondary structure, topi, which is quite common in proteins and distinct from the two major secondary structures, α-helix and β-sheet. The majority of the peptide segments making topi are identified as aggregation-prone and the residues tend to be conserved among homologous proteins. PMID:27511362

  20. Septal localization by membrane targeting sequences and a conserved sequence essential for activity at the COOH-terminus of Bacillus subtilis cardiolipin synthase.

    PubMed

    Kusaka, Jin; Shuto, Satoshi; Imai, Yukiko; Ishikawa, Kazuki; Saito, Tomo; Natori, Kohei; Matsuoka, Satoshi; Hara, Hiroshi; Matsumoto, Kouji

    2016-04-01

    The acidic phospholipid cardiolipin (CL) is localized on polar and septal membranes and plays an important physiological role in Bacillus subtilis cells. ClsA, the enzyme responsible for CL synthesis, is also localized on septal membranes. We found that GFP fusion proteins of the enzyme with NH2-terminal and internal deletions retained septal localization. However, derivatives with deletions starting from the COOH-terminus (Leu482) ceased to localize to the septum once the deletion passed the Ile residue at 448, indicating that the sequence responsible for septal localization is confined within a short distance from the COOH-terminus. Two sequences, Ile436-Leu450 and Leu466-Leu478, are predicted to individually form an amphipathic α-helix. This configuration is known as a membrane targeting sequence (MTS) and we therefore refer to them as MTS2 and MTS1, respectively. Either one has the ability to affect septal localization, and each of these sequences by itself localizes to the septum. Membrane association of the constructs of this enzyme containing the MTSs was verified by subcellular fractionation of the cells. CL synthesis, in contrast, was abolished after deleting just the last residue, Leu482, in the COOH-terminal four amino acid residue sequence, Ser-Pro-Ile-Leu, which is highly conserved among bacterial CL synthases.

  1. PCR-based study of conserved and variable DNA sequences of Tritrichomonas foetus isolates from Saskatchewan, Canada.

    PubMed Central

    Riley, D E; Wagner, B; Polley, L; Krieger, J N

    1995-01-01

    The protozoan parasite Tritrichomonas foetus causes infertility and spontaneous abortion in cattle. In Saskatchewan, Canada, the culture prevalence of trichomonads was 65 of 1,048 bulls (6%) tested within a 1-year period ending in April 1994. Saskatchewan was previously thought to be free of the parasite. To confirm the culture results, possible T. foetus DNA presence was determined by the PCR. All of the 16 culture-positive isolates tested were PCR positive by a single-band test, but one PCR product was weak. DNA fingerprinting by both T17 PCR and randomly amplified polymorphic DNA PCR revealed genetic variation or polymorphism among the T. foetus isolates. T17 PCR also revealed conserved loci that distinguished these T. foetus isolates from Trichomonas vaginalis, from a variety of other protozoa, and from prokaryotes. TCO-1 PCR, a PCR test designed to sample DNA sequence homologous to the 5' flank of a highly conserved cell division control gene, detected genetic polymorphism at low stringency and a conserved, single locus at higher stringency. These findings suggested that T. foetus isolates exhibit both conserved genetic loci and polymorphic loci detectable by independent PCR methods. Both conserved and polymorphic genetic loci may prove useful for improved clinical diagnosis of T. foetus. The polymorphic loci detected by PCR suggested either a long history of infection or multiple lines of T. foetus infection in Saskatchewan. Polymorphic loci detected by PCR may provide data for epidemiologic studies of T. foetus. PMID:7615746

  2. The complete mitochondrial genome sequence of the liverwort Pleurozia purpurea reveals extremely conservative mitochondrial genome evolution in liverworts.

    PubMed

    Wang, Bin; Xue, Jiayu; Li, Libo; Liu, Yang; Qiu, Yin-Long

    2009-12-01

    Plant mitochondrial genomes have been known to be highly unusual in their large sizes, frequent intra-genomic rearrangement, and generally conservative sequence evolution. Recent studies show that in early land plants the mitochondrial genomes exhibit a mixed mode of conservative yet dynamic evolution. Here, we report the completely sequenced mitochondrial genome from the liverwort Pleurozia purpurea. The circular genome has a size of 168,526 base pairs, containing 43 protein-coding genes, 3 rRNA genes, 25 tRNA genes, and 31 group I or II introns. It differs from the Marchantia polymorpha mitochondrial genome, the only other liverwort chondriome that has been sequenced, in lacking two genes (trnRucg and trnTggu) and one intron (rrn18i1065gII). The two genomes have identical gene orders and highly similar sequences in exons, introns, and intergenic spacers. Finally, a comparative analysis of duplicated trnRucu and other trnR genes from the two liverworts and several other organisms identified the recent lateral origin of trnRucg in Marchantia mtDNA through modification of a duplicated trnRucu. This study shows that the mitochondrial genomes evolve extremely slowly in liverworts, the earliest-diverging lineage of extant land plants, in stark contrast to what is known of highly dynamic evolution of mitochondrial genomes in seed plants.

  3. Mapping the transcription start points of the Staphylococcus aureus eap, emp, and vwb promoters reveals a conserved octanucleotide sequence that is essential for expression of these genes.

    PubMed

    Harraghy, Niamh; Homerova, Dagmar; Herrmann, Mathias; Kormanec, Jan

    2008-01-01

    Mapping the transcription start points of the eap, emp, and vwb promoters revealed a conserved octanucleotide sequence (COS). Deleting this sequence abolished the expression of eap, emp, and vwb. However, electrophoretic mobility shift assays gave no evidence that this sequence was a binding site for SarA or SaeR, known regulators of eap and emp.

  4. Helix-packing motifs in membrane proteins.

    PubMed

    Walters, R F S; DeGrado, W F

    2006-09-12

    The fold of a helical membrane protein is largely determined by interactions between membrane-imbedded helices. To elucidate recurring helix-helix interaction motifs, we dissected the crystallographic structures of membrane proteins into a library of interacting helical pairs. The pairs were clustered according to their three-dimensional similarity (rmsd motifs whose structural features can be understood in terms of simple principles of helix-helix packing. Thus, the universe of common transmembrane helix-pairing motifs is relatively simple. The largest cluster, which comprises 29% of the library members, consists of an antiparallel motif with left-handed packing angles, and it is frequently stabilized by packing of small side chains occurring every seven residues in the sequence. Right-handed parallel and antiparallel structures show a similar tendency to segregate small residues to the helix-helix interface but spaced at four-residue intervals. Position-specific sequence propensities were derived for the most populated motifs. These structural and sequential motifs should be quite useful for the design and structural prediction of membrane proteins.

  5. Structural Conservation Predominates Over Sequence Variability in the Crown of HIV Type 1's V3 Loop

    PubMed Central

    Almond, David; Kimura, Tetsuya; Kong, XiangPeng; Swetnam, James; Zolla-Pazner, Susan

    2010-01-01

    The diversity of HIV-1 is a confounding problem for vaccine design, as the human immune response appears to favor poor or strain-specific responses to any given HIV-1 virus strain. A significant portion of this diversity is manifested as sequence variability in the loops of HIV-1's surface envelope glycoprotein. Here we show that the most variable sequence positions in the third variable (V3) loop crown cluster to a small zone on the surface of one face of the V3 loop β-hairpin conformation. These results provide a novel visualization of the gp120 V3 loop, specifically demonstrating a surprising preponderance of conserved three-dimensional structure in a highly sequence-variable region. From a structural point of view, there appears to be less diversity in this region of the HIV-1 “principal neutralizing domain” than previously appreciated. PMID:20560796

  6. High-throughput sequencing discovery of conserved and novel microRNAs in Chinese cabbage (Brassica rapa L. ssp. pekinensis).

    PubMed

    Wang, Fengde; Li, Libin; Liu, Lifeng; Li, Huayin; Zhang, Yihui; Yao, Yingyin; Ni, Zhongfu; Gao, Jianwei

    2012-07-01

    MicroRNAs (miRNAs) are a class of 21-24 nucleotide non-coding RNAs that down-regulate gene expression by cleaving or inhibiting the translation of target gene transcripts. miRNAs have been extensively analyzed in a few model plant species such as Arabidopsis, rice and Populus, and partially investigated in other non-model plant species. However, only a few conserved miRNAs have been identified in Chinese cabbage, a common and economically important crop in Asia. To identify novel and conserved miRNAs in Chinese cabbage (Brassica rapa L. ssp. pekinensis) we constructed a small RNA library. Using high-throughput Solexa sequencing to identify microRNAs we found 11,210 unique sequences belonging to 321 conserved miRNA families and 228 novel miRNAs. We ran a BLAST search with these sequences against the Chinese cabbage mRNA database and found 2,308 and 736 potential target genes for 221 conserved and 125 novel miRNAs, respectively. The BlastX search against the Arabidopsis genome and GO analysis suggested most of the targets were involved in plant growth, metabolism, development and stress response. This study provides the first large-scale cloning and characterization of Chinese cabbage miRNAs and their potential targets. These miRNAs add to the growing database of new miRNAs, prompt further study on Chinese cabbage miRNA regulation mechanisms, and help toward a greater understanding of the important roles of miRNAs in Chinese cabbage.

  7. Overlapping ETS and CRE Motifs ((G/C)CGGAAGTGACGTCA) preferentially bound by GABPα and CREB proteins.

    PubMed

    Chatterjee, Raghunath; Zhao, Jianfei; He, Ximiao; Shlyakhtenko, Andrey; Mann, Ishminder; Waterfall, Joshua J; Meltzer, Paul; Sathyanarayana, B K; FitzGerald, Peter C; Vinson, Charles

    2012-10-01

    Previously, we identified 8-bps long DNA sequences (8-mers) that localize in human proximal promoters and grouped them into known transcription factor binding sites (TFBS). We now examine split 8-mers consisting of two 4-mers separated by 1-bp to 30-bps (X(4)-N(1-30)-X(4)) to identify pairs of TFBS that localize in proximal promoters at a precise distance. These include two overlapping TFBS: the ETS⇔ETS motif ((C/G)CCGGAAGCGGAA) and the ETS⇔CRE motif ((C/G)CGGAAGTGACGTCAC). The nucleotides in bold are part of both TFBS. Molecular modeling shows that the ETS⇔CRE motif can be bound simultaneously by both the ETS and the B-ZIP domains without protein-protein clashes. The electrophoretic mobility shift assay (EMSA) shows that the ETS protein GABPα and the B-ZIP protein CREB preferentially bind to the ETS⇔CRE motif only when the two TFBS overlap precisely. In contrast, the ETS domain of ETV5 and CREB interfere with each other for binding the ETS⇔CRE. The 11-mer (CGGAAGTGACG), the conserved part of the ETS⇔CRE motif, occurs 226 times in the human genome and 83% are in known regulatory regions. In vivo GABPα and CREB ChIP-seq peaks identified the ETS⇔CRE as the most enriched motif occurring in promoters of genes involved in mRNA processing, cellular catabolic processes, and stress response, suggesting that a specific class of genes is regulated by this composite motif. PMID:23050235
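
    The split 8-mer scheme X(4)-N(1-30)-X(4) and the genome-wide count of the 11-mer can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: the function names `split_8mers` and `count_both_strands` and the toy sequences are hypothetical, and a real analysis would run over promoter or genome sequence rather than short strings.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]

def split_8mers(seq, min_gap=1, max_gap=30):
    """Count (left 4-mer, gap length, right 4-mer) triples, i.e. X4-N(gap)-X4."""
    counts = Counter()
    n = len(seq)
    for i in range(n - 8):
        left = seq[i:i + 4]
        for gap in range(min_gap, max_gap + 1):
            j = i + 4 + gap          # start of the right-hand 4-mer
            if j + 4 > n:
                break                # longer gaps would run off the sequence
            counts[(left, gap, seq[j:j + 4])] += 1
    return counts

def count_both_strands(seq, motif):
    """Count positions where `motif` or its reverse complement occurs."""
    rc = reverse_complement(motif)
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in (motif, rc))

# Toy sequence: the 11-mer once on each strand, separated by two bases.
core = "CGGAAGTGACG"
toy = core + "TT" + reverse_complement(core)
print(count_both_strands(toy, core))
print(split_8mers("AAAACGGGG")[("AAAA", 1, "GGGG")])
```

    Keeping the gap length in the key is what lets pairs of 4-mers be compared at each precise spacing, which is the property the abstract relies on.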

  9. Retroposition and evolution of the DNA-binding motifs of YY1, YY2 and REX1.

    PubMed

    Kim, Jeong Do; Faulk, Christopher; Kim, Joomyeong

    2007-01-01

    YY1 is a DNA-binding transcription factor found in both vertebrates and invertebrates. Database searches identified 62 YY1 related sequences from all the available genome sequences ranging from flying insects to human. These sequences are characterized by high levels of sequence conservation, ranging from 66% to 100% similarity, in the zinc finger DNA-binding domain of the predicted proteins. Phylogenetic analyses uncovered duplication events of YY1 in several different lineages, including flies, fish and mammals. Retroposition is responsible for generating one duplicate in flies, PHOL from PHO, and two duplicates in placental mammals, YY2 and Reduced Expression 1 (REX1) from YY1. DNA-binding motif studies have demonstrated that YY2 still binds to the same consensus sequence as YY1 but with much lower affinity. In contrast, REX1 binds to DNA motifs divergent from YY1, but the binding motifs of REX1 and YY1 share some similarity at their core regions (5'-CCAT-3'). This suggests that the two duplicates, YY2 and REX1, although generated through similar retroposition events have undergone different selection schemes to adapt to new roles in placental mammals. Overall, the conservation of YY2 and REX1 in all placental mammals predicts that each duplicate has co-evolved with some unique features of eutherian mammals. PMID:17478514

  10. Characterization of the RNA motif responsible for the specific interaction of potato spindle tuber viroid RNA (PSTVd) and the tomato protein Virp1

    PubMed Central

    Gozmanova, Mariyana; Denti, Michela Alessandra; Minkov, Ivan Nikiforov; Tsagris, Mina; Tabler, Martin

    2003-01-01

    Viroids are small non-coding parasitic RNAs that are able to infect their host plants systemically. This circular naked RNA makes use of host proteins to accomplish its proliferation. Here we analyze the specific binding of the tomato protein Virp1 to the terminal right domain of potato spindle tuber viroid RNA (PSTVd). We find that two asymmetric internal loops within the PSTVd (+) RNA, each composed of the sequence elements 5′-ACAGG and CUCUUCC-5′, are responsible for the specific RNA–protein interaction. In view of the nucleotide composition we call this structural element an ‘RY motif’. The RY motif located close to the terminal right hairpin loop of the PSTVd secondary structure has an ∼5-fold stronger binding affinity than the more centrally located RY motif. Simultaneous sequence alterations in both RY motifs abolished the specific binding to Virp1. Mutations in either of the two RY motifs resulted in non-infectious viroid RNA, with the exception of one case, where reversion to the wild-type sequence took place. In contrast, the simultaneous exchange of two nucleotides within the terminal right hairpin loop of PSTVd had only a moderate influence on the binding to Virp1. This variant was infectious and sequence changes were maintained in the progeny. The relevance of the phylogenetic conservation of the RY motif, and sequence elements therein, amongst various genera of the family Pospiviroidae is discussed. PMID:14500815

  11. Listeriolysin genes: complete sequence of ilo from Listeria ivanovii and of lso from Listeria seeligeri.

    PubMed

    Haas, A; Dumbsky, M; Kreft, J

    1992-02-28

    The complete DNA sequences coding for the thiol-activated cytolysins ivanolysin O (ILO) from Listeria ivanovii and seeligerolysin O (LSO) from Listeria seeligeri have been determined. The deduced amino acid sequences revealed that: (i) the primary translation products comprise 528 (ILO) and 530 (LSO) amino acids, respectively, (ii) ILO contains two cysteines, LSO has a substitution in the conserved cysteine motif.

  12. The Thiamin Pyrophosphate-Motif

    NASA Technical Reports Server (NTRS)

    Dominiak, P.; Ciszak, E.

    2003-01-01

    Using databases, the authors have identified a common thiamin pyrophosphate (TPP)-motif in the family of functionally diverse TPP-dependent enzymes. This common motif consists of multimeric organization of subunits and two catalytic centers. Each catalytic center (PP:PYR) is formed at the interface of the PP-domain binding the magnesium ion, pyrophosphate and aminopyrimidine ring of TPP, and the PYR-domain binding the aminopyrimidine ring of that cofactor. A pair of these catalytic centers constitutes the catalytic core (PP:PYR)(sub 2) within these enzymes. Analysis of the structural elements of this catalytic core reveals a novel definition of the common amino acid sequences, which are GXPhiX(sub 4)(G)PhiXXGQ and GDGX(sub 25-30)NN in the PP-domain, and the EX(sub 4)(G)PhiXXGPhi in the PYR-domain, where Phi corresponds to a hydrophobic amino acid. This TPP-motif provides a novel tool for annotation of TPP-dependent enzymes useful in advancing functional proteomics.
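
    To use sequence patterns such as those above for annotation, one option is to translate the shorthand into regular expressions. The Python sketch below does this under two assumptions that the abstract does not confirm: the residues treated as hydrophobic (`HYDROPHOBIC`) are an illustrative choice, and a parenthesized residue such as (G) is read as optional. The function name `motif_to_regex` is hypothetical.

```python
import re

# Assumption: the abstract does not say which residues count as Phi.
HYDROPHOBIC = "AVILMFWY"

# One token per alternative: Phi, X with a repeat count, plain X,
# a parenthesized residue, or a literal residue.
TOKEN = re.compile(r"Phi|X\(sub (\d+)(?:-(\d+))?\)|X|\(([A-Z])\)|[A-Z]")

def motif_to_regex(motif):
    """Translate shorthand like 'GDGX(sub 25-30)NN' into a regex string."""
    parts = []
    for m in TOKEN.finditer(motif):
        text = m.group(0)
        if text == "Phi":
            parts.append(f"[{HYDROPHOBIC}]")
        elif text.startswith("X(sub"):
            low, high = m.group(1), m.group(2)
            parts.append(f".{{{low},{high}}}" if high else f".{{{low}}}")
        elif text == "X":
            parts.append(".")                 # any single residue
        elif m.group(3):
            parts.append(f"{m.group(3)}?")    # assumption: (G) means optional G
        else:
            parts.append(text)                # literal residue
    return "".join(parts)

pp_pattern = motif_to_regex("GDGX(sub 25-30)NN")
print(pp_pattern)
print(bool(re.search(pp_pattern, "MK" + "GDG" + "A" * 27 + "NN" + "LE")))
```

    A hit from such a pattern would only be a candidate; the abstract defines the motif in terms of structure as well as sequence, so a regex match alone is not evidence of a TPP-binding site.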

  13. The Thiamine-Pyrophosphate-Motif

    NASA Technical Reports Server (NTRS)

    Ciszak, Ewa; Dominiak, Paulina

    2004-01-01

    Thiamin pyrophosphate (TPP), a derivative of vitamin B1, is a cofactor for enzymes performing catalysis in pathways of energy production including the well-known decarboxylation of α-keto acid dehydrogenases followed by transketolation. TPP-dependent enzymes constitute a structurally and functionally diverse group exhibiting multimeric subunit organization, multiple domains and two chemically equivalent catalytic centers. Annotation of functional TPP-dependent enzymes, therefore, has not been trivial due to low sequence similarity related to this complex organization. Our approach to analysis of structures of known TPP-dependent enzymes reveals for the first time features common to this group, which we have termed the TPP-motif. The TPP-motif consists of specific spatial arrangements of structural elements and their specific contacts to provide for a flip-flop, or alternate site, enzymatic mechanism of action. Analysis of structural elements entrained in the flip-flop action displayed by TPP-dependent enzymes reveals a novel definition of the common amino acid sequences. These sequences allow for annotation of TPP-dependent enzymes, thus advancing functional proteomics. Further details of three-dimensional structures of TPP-dependent enzymes will be discussed.

  14. CLIMP: Clustering Motifs via Maximal Cliques with Parallel Computing Design.

    PubMed

    Zhang, Shaoqiang; Chen, Yong

    2016-01-01

    A set of conserved binding sites recognized by a transcription factor is called a motif, which can be found by many applications of comparative genomics for identifying over-represented segments. Moreover, when numerous putative motifs are predicted from a collection of genome-wide data, their similarity data can be represented as a large graph, where these motifs are connected to one another. However, an efficient clustering algorithm is desired for clustering the motifs that belong to the same groups and separating the motifs that belong to different groups, or even deleting an amount of spurious ones. In this work, a new motif clustering algorithm, CLIMP, is proposed by using maximal cliques and sped up by parallelizing its program. When a synthetic motif dataset from the database JASPAR, a set of putative motifs from a phylogenetic foot-printing dataset, and a set of putative motifs from a ChIP dataset are used to compare the performances of CLIMP and two other high-performance algorithms, the results demonstrate that CLIMP mostly outperforms the two algorithms on the three datasets for motif clustering, so that it can be a useful complement of the clustering procedures in some genome-wide motif prediction pipelines. CLIMP is available at http://sqzhang.cn/climp.html. PMID:27487245
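
    The abstract does not describe how CLIMP enumerates cliques, so the following Python sketch only illustrates the general idea of finding maximal cliques in a motif similarity graph. It uses the basic Bron-Kerbosch recursion without pivoting or parallelism, which is a stand-in and not CLIMP's implementation. The function name `maximal_cliques` and the motif labels `m1` to `m5` are hypothetical.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques of an undirected graph (basic Bron-Kerbosch).

    `adj` maps each node to the set of its neighbours.
    Returns a sorted list of cliques, each a sorted list of nodes.
    """
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(sorted(current))   # nothing can extend this clique
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}     # v has been fully explored
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)

# Hypothetical similarity graph: an edge means two putative motifs are similar.
similar = {
    "m1": {"m2", "m3"},
    "m2": {"m1", "m3"},
    "m3": {"m1", "m2", "m4"},
    "m4": {"m3"},
    "m5": set(),
}
print(maximal_cliques(similar))
```

    In this toy graph the isolated node m5 forms its own clique, which is how a spurious motif with no similar partners would show up before any filtering step.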

  16. CLIMP: Clustering Motifs via Maximal Cliques with Parallel Computing Design

    PubMed Central

    Chen, Yong

    2016-01-01

    A set of conserved binding sites recognized by a transcription factor is called a motif, which can be found by many applications of comparative genomics for identifying over-represented segments. Moreover, when numerous putative motifs are predicted from a collection of genome-wide data, their similarity data can be represented as a large graph, where these motifs are connected to one another. However, an efficient clustering algorithm is desired for clustering the motifs that belong to the same groups and separating the motifs that belong to different groups, or even deleting an amount of spurious ones. In this work, a new motif clustering algorithm, CLIMP, is proposed by using maximal cliques and sped up by parallelizing its program. When a synthetic motif dataset from the database JASPAR, a set of putative motifs from a phylogenetic foot-printing dataset, and a set of putative motifs from a ChIP dataset are used to compare the performances of CLIMP and two other high-performance algorithms, the results demonstrate that CLIMP mostly outperforms the two algorithms on the three datasets for motif clustering, so that it can be a useful complement of the clustering procedures in some genome-wide motif prediction pipelines. CLIMP is available at http://sqzhang.cn/climp.html. PMID:27487245

  17. Domains in microbial beta-1, 4-glycanases: sequence conservation, function, and enzyme families.

    PubMed Central

    Gilkes, N R; Henrissat, B; Kilburn, D G; Miller, R C; Warren, R A

    1991-01-01

    Several types of domain occur in beta-1, 4-glycanases. The best characterized of these are the catalytic domains and the cellulose-binding domains. The domains may be joined by linker sequences rich in proline or hydroxyamino acids or both. Some of the enzymes contain repeated sequences up to 150 amino acids in length. The enzymes can be grouped into families on the basis of sequence similarities between the catalytic domains. There are sequence similarities between the cellulose-binding domains, of which two types have been identified, and also between some domains of unknown function. The beta-1, 4-glycanases appear to have arisen by the shuffling of a relatively small number of progenitor sequences. PMID:1886523

  18. S6:S18 ribosomal protein complex interacts with a structural motif present in its own mRNA

    PubMed Central

    Matelska, Dorota; Purta, Elzbieta; Panek, Sylwia; Boniecki, Michal J.; Bujnicki, Janusz M.; Dunin-Horkawicz, Stanislaw

    2013-01-01

    Prokaryotic ribosomal protein genes are typically grouped within highly conserved operons. In many cases, one or more of the encoded proteins not only bind to a specific site in the ribosomal RNA, but also to a motif localized within their own mRNA, and thereby regulate expression of the operon. In this study, we computationally predicted an RNA motif present in many bacterial phyla within the 5′ untranslated region of operons encoding ribosomal proteins S6 and S18. We demonstrated that the S6:S18 complex binds to this motif, which we hereafter refer to as the S6:S18 complex-binding motif (S6S18CBM). This motif is a conserved CCG sequence presented in a bulge flanked by a stem and a hairpin structure. A similar structure containing a CCG trinucleotide forms the S6:S18 complex binding site in 16S ribosomal RNA. We have constructed a 3D structural model of a S6:S18 complex with S6S18CBM, which suggests that the CCG trinucleotide in a specific structural context may be specifically recognized by the S18 protein. This prediction was supported by site-directed mutagenesis of both RNA and protein components. These results provide a molecular basis for understanding protein-RNA recognition and suggest that the S6S18CBM is involved in an auto-regulatory mechanism. PMID:23980204

  19. Structural Motifs of Gold Nanoparticles.

    NASA Astrophysics Data System (ADS)

    Cleveland, C. L.; Luedtke, W. D.; Landman, Uzi

    1996-03-01

    Through an extensive search, involving energy minimization using embedded atom potentials, we found (R.L. Whetten et al., submitted to Nature (1995)) that the energetically optimal sequence for AuN clusters (30 <= N <= 3000 atoms) consists of fcc crystallites, with a truncated-octahedral (TO) morphological motif, and variants thereof. These predictions for bare gold particles, and for particles coated by self-assembled thiol monolayers, are discussed in light of recent experiments on the preparation and characterization (including mass spectrometry, electron microscopy, and X-ray diffraction) of nanocrystalline gold molecules (see Ref. 2).

  20. Multi-species sequence comparison reveals conservation of ghrelin gene-derived splice variants encoding a truncated ghrelin peptide.

    PubMed

    Seim, Inge; Jeffery, Penny L; Thomas, Patrick B; Walpole, Carina M; Maugham, Michelle; Fung, Jenny N T; Yap, Pei-Yi; O'Keeffe, Angela J; Lai, John; Whiteside, Eliza J; Herington, Adrian C; Chopin, Lisa K

    2016-06-01

    The peptide hormone ghrelin is a potent orexigen produced predominantly in the stomach. It has a number of other biological actions, including roles in appetite stimulation, energy balance, the stimulation of growth hormone release and the regulation of cell proliferation. Recently, several ghrelin gene splice variants have been described. Here, we attempted to identify conserved alternative splicing of the ghrelin gene by cross-species sequence comparisons. We identified a novel human exon 2-deleted variant and provide preliminary evidence that this splice variant and in1-ghrelin encode a C-terminally truncated form of the ghrelin peptide, termed minighrelin. These variants are expressed in humans and mice, demonstrating conservation of alternative splicing spanning 90 million years. Minighrelin appears to have similar actions to full-length ghrelin, as treatment with exogenous minighrelin peptide stimulates appetite and feeding in mice. Forced expression of the exon 2-deleted preproghrelin variant mirrors the effect of the canonical preproghrelin, stimulating cell proliferation and migration in the PC3 prostate cancer cell line. This is the first study to characterise an exon 2-deleted preproghrelin variant and to demonstrate sequence conservation of ghrelin gene-derived splice variants that encode a truncated ghrelin peptide. This adds further impetus for studies into the alternative splicing of the ghrelin gene and the function of novel ghrelin peptides in vertebrates.

  1. Export of malaria proteins requires co-translational processing of the PEXEL motif independent of phosphatidylinositol-3-phosphate binding

    PubMed Central

    Boddey, Justin A.; O'Neill, Matthew T.; Lopaticki, Sash; Carvalho, Teresa G.; Hodder, Anthony N.; Nebl, Thomas; Wawra, Stephan; van West, Pieter; Ebrahimzadeh, Zeinab; Richard, Dave; Flemming, Sven; Spielmann, Tobias; Przyborski, Jude; Babon, Jeff J.; Cowman, Alan F.

    2016-01-01

    Plasmodium falciparum exports proteins into erythrocytes using the Plasmodium export element (PEXEL) motif, which is cleaved in the endoplasmic reticulum (ER) by plasmepsin V (PMV). A recent study reported that phosphatidylinositol-3-phosphate (PI(3)P) concentrated in the ER binds to PEXEL motifs and is required for export independent of PMV, and that PEXEL motifs are functionally interchangeable with RxLR motifs of oomycete effectors. Here we show that the PEXEL does not bind PI(3)P, and that this lipid is not concentrated in the ER. We find that RxLR motifs cannot mediate export in P. falciparum. Parasites expressing a mutated version of KAHRP, with the PEXEL motif repositioned near the signal sequence, prevented PMV cleavage. This mutant possessed the putative PI(3)P-binding residues but is not exported. Reinstatement of PEXEL to its original location restores processing by PMV and export. These results challenge the PI(3)P hypothesis and provide evidence that PEXEL position is conserved for co-translational processing and export. PMID:26832821

  2. A hydrophobic proline-rich motif is involved in the intracellular targeting of temperature-induced lipocalin.

    PubMed

    Hernández-Gras, Francesc; Boronat, Albert

    2015-06-01

    Temperature-induced lipocalins (TILs) play an essential role in the response of plants to different abiotic stresses. In agreement with their proposed role in protecting membrane lipids, TILs have been reported to be associated with cell membranes. However, TILs show an overall hydrophilic character and do not contain any signal for membrane targeting nor hydrophobic sequences that could represent transmembrane domains. Arabidopsis TIL (AtTIL) is considered the ortholog of human ApoD, a protein known to associate with membranes through a short hydrophobic loop protruding from strands 5 and 6 of the lipocalin β-barrel. An equivalent loop (referred to as HPR motif) is also present between β-strands 5 and 6 of TILs. The HPR motif, which is highly conserved among TIL proteins, extends over a short stretch of eight amino acids and contains four invariant proline residues. Subcellular localization studies have shown that TILs are targeted to a variety of cell membranes and organelles. We have also found that the HPR motif is necessary and sufficient for the intracellular targeting of TILs. Modeling studies suggest that the HPR motif may directly anchor TILs to cell membranes, favoring in this way further contact with the polar group of membrane lipids. However, some particular features of the HPR motif open the possibility that targeting of TILs to cell membranes could be mediated by interaction with other proteins. The functional analysis of the HPR motif unveils the existence of novel mechanisms involved in the intracellular targeting of proteins in plants.

  3. Conservation of plastid sequences in the plant nuclear genome for millions of years facilitates endosymbiotic evolution.

    PubMed

    Rousseau-Gueutin, Mathieu; Ayliffe