Science.gov

Sample records for integrated sequence motif

  1. Mining protein sequences for motifs.

    PubMed

    Narasimhan, Giri; Bu, Changsong; Gao, Yuan; Wang, Xuning; Xu, Ning; Mathee, Kalai

    2002-01-01

    We use methods from Data Mining and Knowledge Discovery to design an algorithm for detecting motifs in protein sequences. The algorithm assumes that a motif is constituted by the presence of a "good" combination of residues in appropriate locations of the motif. The algorithm attempts to compile such good combinations into a "pattern dictionary" by processing an aligned training set of protein sequences. The dictionary is subsequently used to detect motifs in new protein sequences. Statistical significance of the detection results is ensured by statistically determining the various parameters of the algorithm. Based on this approach, we have implemented a program called GYM. The Helix-Turn-Helix motif was used as a model system on which to test our program. The program was also extended to detect Homeodomain motifs. The detection results for the two motifs compare favorably with existing programs. In addition, the GYM program provides a lot of useful information about a given protein sequence. PMID:12487759

  2. Detecting correlations among functional-sequence motifs

    NASA Astrophysics Data System (ADS)

    Pirino, Davide; Rigosa, Jacopo; Ledda, Alice; Ferretti, Luca

    2012-06-01

    Sequence motifs are words of nucleotides in DNA with biological functions, e.g., gene regulation. Identification of such words proceeds through rejection of Markov models on the expected motif frequency along the genome. Additional biological information can be extracted from the correlation structure among patterns of motif occurrences. In this paper a log-linear multivariate intensity Poisson model is estimated via expectation maximization on a set of motifs along the genome of E. coli K12. The proposed approach allows for excitatory as well as inhibitory interactions among motifs and between motifs and other genomic features like gene occurrences. Our findings confirm previous stylized facts about such types of interactions and shed new light on genome-maintenance functions of some particular motifs. We expect these methods to be applicable to a wider set of genomic features.
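
    The idea of comparing a word's observed frequency with what a background model expects can be sketched as follows. This is a minimal illustration that assumes the simplest possible background, a zeroth-order model with independent bases whose frequencies are estimated from the sequence itself; it is not the authors' Poisson intensity model, and the function names are hypothetical.

```python
from collections import Counter


def observed_count(seq, word):
    """Count occurrences of word in seq, allowing overlaps."""
    return sum(seq.startswith(word, i) for i in range(len(seq) - len(word) + 1))


def expected_count_order0(seq, word):
    """Expected count of word under an independent-bases (order-0) background.

    Base probabilities are estimated from seq itself (an assumption made
    for this sketch).
    """
    freqs = Counter(seq)
    n = len(seq)
    p = 1.0
    for base in word:
        p *= freqs[base] / n
    # Number of possible start positions times the per-position probability.
    return (n - len(word) + 1) * p


if __name__ == "__main__":
    genome = "ACGTACGTACGT"  # toy sequence, not real data
    print(observed_count(genome, "ACG"), expected_count_order0(genome, "ACG"))
```

    A word whose observed count greatly exceeds the expected count is a candidate for rejecting the background model; a real analysis would use a higher-order Markov model and a proper significance test.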

  3. Detecting correlations among functional-sequence motifs.

    PubMed

    Pirino, Davide; Rigosa, Jacopo; Ledda, Alice; Ferretti, Luca

    2012-06-01

    Sequence motifs are words of nucleotides in DNA with biological functions, e.g., gene regulation. Identification of such words proceeds through rejection of Markov models on the expected motif frequency along the genome. Additional biological information can be extracted from the correlation structure among patterns of motif occurrences. In this paper a log-linear multivariate intensity Poisson model is estimated via expectation maximization on a set of motifs along the genome of E. coli K12. The proposed approach allows for excitatory as well as inhibitory interactions among motifs and between motifs and other genomic features like gene occurrences. Our findings confirm previous stylized facts about such types of interactions and shed new light on genome-maintenance functions of some particular motifs. We expect these methods to be applicable to a wider set of genomic features. PMID:23005179

  4. Detecting seeded motifs in DNA sequences.

    PubMed

    Pizzi, Cinzia; Bortoluzzi, Stefania; Bisognin, Andrea; Coppe, Alessandro; Danieli, Gian Antonio

    2005-01-01

    The problem of detecting DNA motifs with functional relevance in real biological sequences is difficult due to a number of biological, statistical and computational issues and also because of the lack of knowledge about the structure of searched patterns. Many algorithms are implemented in fully automated processes, which are often based upon a guess of input parameters from the user at the very first step. In this paper, we present a novel method for the detection of seeded DNA motifs, composed of regions with a different extent of variability. The method is based on a multi-step approach, which was implemented in a motif searching web tool (MOST). Overrepresented exact patterns are extracted from input sequences and clustered to produce motif core regions, which are then extended and scored to generate seeded motifs. The combination of automated pattern discovery algorithms and different display tools for the evaluation and selection of results at several analysis steps can potentially lead to much more meaningful results than complete automation can produce. Experimental results on different yeast and human real datasets proved the methodology to be a promising solution for finding seeded motifs. MOST web tool is freely available at http://telethon.bio.unipd.it/bioinfo/MOST. PMID:16141193

  5. Detecting seeded motifs in DNA sequences

    PubMed Central

    Pizzi, Cinzia; Bortoluzzi, Stefania; Bisognin, Andrea; Coppe, Alessandro; Danieli, Gian Antonio

    2005-01-01

    The problem of detecting DNA motifs with functional relevance in real biological sequences is difficult due to a number of biological, statistical and computational issues and also because of the lack of knowledge about the structure of searched patterns. Many algorithms are implemented in fully automated processes, which are often based upon a guess of input parameters from the user at the very first step. In this paper, we present a novel method for the detection of seeded DNA motifs, composed of regions with a different extent of variability. The method is based on a multi-step approach, which was implemented in a motif searching web tool (MOST). Overrepresented exact patterns are extracted from input sequences and clustered to produce motif core regions, which are then extended and scored to generate seeded motifs. The combination of automated pattern discovery algorithms and different display tools for the evaluation and selection of results at several analysis steps can potentially lead to much more meaningful results than complete automation can produce. Experimental results on different yeast and human real datasets proved the methodology to be a promising solution for finding seeded motifs. MOST web tool is freely available at . PMID:16141193

  6. MIDDAS-M: Motif-Independent De Novo Detection of Secondary Metabolite Gene Clusters through the Integration of Genome Sequencing and Transcriptome Data

    PubMed Central

    Umemura, Myco; Koike, Hideaki; Nagano, Nozomi; Ishii, Tomoko; Kawano, Jin; Yamane, Noriko; Kozone, Ikuko; Horimoto, Katsuhisa; Shin-ya, Kazuo; Asai, Kiyoshi; Yu, Jiujiang; Bennett, Joan W.; Machida, Masayuki

    2013-01-01

    Many bioactive natural products are produced as “secondary metabolites” by plants, bacteria, and fungi. During the middle of the 20th century, several secondary metabolites from fungi revolutionized the pharmaceutical industry, for example, penicillin, lovastatin, and cyclosporine. They are generally biosynthesized by enzymes encoded by clusters of coordinately regulated genes, and several motif-based methods have been developed to detect secondary metabolite biosynthetic (SMB) gene clusters using the sequence information of typical SMB core genes such as polyketide synthases (PKS) and non-ribosomal peptide synthetases (NRPS). However, no detection method exists for SMB gene clusters that are functional and do not include core SMB genes at present. To advance the exploration of SMB gene clusters, especially those without known core genes, we developed MIDDAS-M, a motif-independent de novo detection algorithm for SMB gene clusters. We integrated virtual gene cluster generation in an annotated genome sequence with highly sensitive scoring of the cooperative transcriptional regulation of cluster member genes. MIDDAS-M accurately predicted 38 SMB gene clusters that have been experimentally confirmed and/or predicted by other motif-based methods in 3 fungal strains. MIDDAS-M further identified a new SMB gene cluster for ustiloxin B, which was experimentally validated. Sequence analysis of the cluster genes indicated a novel mechanism for peptide biosynthesis independent of NRPS. Because it is fully computational and independent of empirical knowledge about SMB core genes, MIDDAS-M allows a large-scale, comprehensive analysis of SMB gene clusters, including those with novel biosynthetic mechanisms that do not contain any functionally characterized genes. PMID:24391870

  7. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    SciTech Connect

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by patterns in

  8. CodingMotif: exact determination of overrepresented nucleotide motifs in coding sequences

    PubMed Central

    2012-01-01

    Background: It has been increasingly appreciated that coding sequences harbor regulatory sequence motifs in addition to encoding for protein. These sequence motifs are expected to be overrepresented in nucleotide sequences bound by a common protein or small RNA. However, detecting overrepresented motifs has been difficult because of interference by constraints at the protein level. Sampling-based approaches to solve this problem based on codon-shuffling have been limited to exploring only an infinitesimal fraction of the sequence space and by their use of parametric approximations. Results: We present a novel O(N (log N)^2)-time algorithm, CodingMotif, to identify nucleotide-level motifs of unusual copy number in protein-coding regions. Using a new dynamic programming algorithm we are able to exhaustively calculate the distribution of the number of occurrences of a motif over all possible coding sequences that encode the same amino acid sequence, given a background model for codon usage and dinucleotide biases. Our method takes advantage of the sparseness of loci where a given motif can occur, greatly speeding up the required convolution calculations. Knowledge of the distribution allows one to assess the exact non-parametric p-value of whether a given motif is over- or under-represented. We demonstrate that our method identifies known functional motifs more accurately than sampling and parametric-based approaches in a variety of coding datasets of various size, including ChIP-seq data for the transcription factors NRSF and GABP. Conclusions: CodingMotif provides a theoretically and empirically-demonstrated advance for the detection of motifs overrepresented in coding sequences. We expect CodingMotif to be useful for identifying motifs in functional genomic datasets such as DNA-protein binding, RNA-protein binding, or microRNA-RNA binding within coding regions. A software implementation is available at http://bioinformatics.bc.edu/chuanglab/codingmotif.tar PMID

  9. Occurrence probability of structured motifs in random sequences.

    PubMed

    Robin, S; Daudin, J-J; Richard, H; Sagot, M-F; Schbath, S

    2002-01-01

    The problem of extracting from a set of nucleic acid sequences motifs which may have biological function is more and more important. In this paper, we are interested in particular motifs that may be implicated in the transcription process. These motifs, called structured motifs, are composed of two ordered parts separated by a variable distance and allowing for substitutions. In order to assess their statistical significance, we propose approximations of the probability of occurrences of such a structured motif in a given sequence. An application of our method to evaluate candidate promoters in E. coli and B. subtilis is presented. Simulations show the goodness of the approximations. PMID:12614545
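
    The definition of a structured motif given above (two ordered parts, a variable spacer, and a tolerance for substitutions) can be made concrete with a small search routine. This is a minimal sketch and not the authors' probability approximation; the function names, the per-box substitution limit, and the toy promoter-like sequence are assumptions introduced for illustration.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_structured_motif(seq, box1, box2, min_gap, max_gap, max_subs):
    """Return (start1, start2) pairs for occurrences of a structured motif.

    box1 must be followed by box2 after min_gap..max_gap spacer bases, and
    each box may differ from its pattern by at most max_subs substitutions.
    """
    hits = []
    for i in range(len(seq) - len(box1) + 1):
        if hamming(seq[i:i + len(box1)], box1) > max_subs:
            continue
        for gap in range(min_gap, max_gap + 1):
            j = i + len(box1) + gap
            if j + len(box2) > len(seq):
                break  # longer gaps would run past the end of the sequence
            if hamming(seq[j:j + len(box2)], box2) <= max_subs:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    # Toy sequence: two boxes separated by a 17-base spacer.
    toy = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "CC"
    print(find_structured_motif(toy, "TTGACA", "TATAAT", 16, 18, 0))
```

    Counting such hits in real sequences is only half of the problem described in the abstract; judging whether the count is surprising requires the occurrence probability in random sequences, which this sketch does not compute.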

  10. DynaMIT: the dynamic motif integration toolkit

    PubMed Central

    Dassi, Erik; Quattrone, Alessandro

    2016-01-01

    De-novo motif search is a frequently applied bioinformatics procedure to identify and prioritize recurrent elements in sequence sets for biological investigation, such as the ones derived from high-throughput differential expression experiments. Several algorithms have been developed to perform motif search, employing widely different approaches and often giving divergent results. In order to maximize the power of these investigations and ultimately be able to draft solid biological hypotheses, there is a need to apply multiple tools to the same sequences and merge the obtained results. However, motif reporting formats and statistical evaluation methods currently make such an integration task difficult to perform and mostly restricted to specific scenarios. We thus introduce here the Dynamic Motif Integration Toolkit (DynaMIT), an extremely flexible platform that allows users to identify motifs employing multiple algorithms, integrate them by means of a user-selected strategy and visualize results in several ways; furthermore, the platform is user-extendible in all its aspects. DynaMIT is freely available at http://cibioltg.bitbucket.org. PMID:26253738

  11. RSAT::Plants: Motif Discovery Within Clusters of Upstream Sequences in Plant Genomes.

    PubMed

    Contreras-Moreira, Bruno; Castro-Mondragon, Jaime A; Rioualen, Claire; Cantalapiedra, Carlos P; van Helden, Jacques

    2016-01-01

    The plant-dedicated mirror of the Regulatory Sequence Analysis Tools (RSAT, http://plants.rsat.eu) offers specialized options for researchers dealing with plant transcriptional regulation. The website contains whole-sequenced genomes from species regularly updated from Ensembl Plants and other sources (currently 40), and supports an array of tasks frequently required for the analysis of regulatory sequences, such as retrieving upstream sequences, motif discovery, motif comparison, and pattern matching. RSAT::Plants also integrates the footprintDB collection of DNA motifs. This protocol explains step-by-step how to discover DNA motifs in regulatory regions of clusters of co-expressed genes in plants. It also explains how to empirically control the significance of the result, and how to associate the discovered motifs with putative binding factors. PMID:27557774

  12. Identifying novel sequence variants of RNA 3D motifs

    PubMed Central

    Zirbel, Craig L.; Roll, James; Sweeney, Blake A.; Petrov, Anton I.; Pirrung, Meg; Leontis, Neocles B.

    2015-01-01

    Predicting RNA 3D structure from sequence is a major challenge in biophysics. An important sub-goal is accurately identifying recurrent 3D motifs from RNA internal and hairpin loop sequences extracted from secondary structure (2D) diagrams. We have developed and validated new probabilistic models for 3D motif sequences based on hybrid Stochastic Context-Free Grammars and Markov Random Fields (SCFG/MRF). The SCFG/MRF models are constructed using atomic-resolution RNA 3D structures. To parameterize each model, we use all instances of each motif found in the RNA 3D Motif Atlas and annotations of pairwise nucleotide interactions generated by the FR3D software. Isostericity relations between non-Watson–Crick basepairs are used in scoring sequence variants. SCFG techniques model nested pairs and insertions, while MRF ideas handle crossing interactions and base triples. We use test sets of randomly-generated sequences to set acceptance and rejection thresholds for each motif group and thus control the false positive rate. Validation was carried out by comparing results for four motif groups to RMDetect. The software developed for sequence scoring (JAR3D) is structured to automatically incorporate new motifs as they accumulate in the RNA 3D Motif Atlas when new structures are solved and is available free for download. PMID:26130723

  13. Identifying novel sequence variants of RNA 3D motifs.

    PubMed

    Zirbel, Craig L; Roll, James; Sweeney, Blake A; Petrov, Anton I; Pirrung, Meg; Leontis, Neocles B

    2015-09-01

    Predicting RNA 3D structure from sequence is a major challenge in biophysics. An important sub-goal is accurately identifying recurrent 3D motifs from RNA internal and hairpin loop sequences extracted from secondary structure (2D) diagrams. We have developed and validated new probabilistic models for 3D motif sequences based on hybrid Stochastic Context-Free Grammars and Markov Random Fields (SCFG/MRF). The SCFG/MRF models are constructed using atomic-resolution RNA 3D structures. To parameterize each model, we use all instances of each motif found in the RNA 3D Motif Atlas and annotations of pairwise nucleotide interactions generated by the FR3D software. Isostericity relations between non-Watson-Crick basepairs are used in scoring sequence variants. SCFG techniques model nested pairs and insertions, while MRF ideas handle crossing interactions and base triples. We use test sets of randomly-generated sequences to set acceptance and rejection thresholds for each motif group and thus control the false positive rate. Validation was carried out by comparing results for four motif groups to RMDetect. The software developed for sequence scoring (JAR3D) is structured to automatically incorporate new motifs as they accumulate in the RNA 3D Motif Atlas when new structures are solved and is available free for download. PMID:26130723

  14. BlockLogo: visualization of peptide and sequence motif conservation.

    PubMed

    Olsen, Lars Rønn; Kudahl, Ulrich Johan; Simon, Christian; Sun, Jing; Schönbach, Christian; Reinherz, Ellis L; Zhang, Guang Lan; Brusic, Vladimir

    2013-12-31

    BlockLogo is a web-server application for the visualization of protein and nucleotide fragments, continuous protein sequence motifs, and discontinuous sequence motifs using calculation of block entropy from multiple sequence alignments. The user input consists of a multiple sequence alignment, selection of motif positions, type of sequence, and output format definition. The output has BlockLogo along with the sequence logo, and a table of motif frequencies. We deployed BlockLogo as an online application and have demonstrated its utility through examples that show visualization of T-cell epitopes and B-cell epitopes (both continuous and discontinuous). Our additional example shows a visualization and analysis of structural motifs that determine the specificity of peptide binding to HLA-DR molecules. The BlockLogo server also employs selected experimentally validated prediction algorithms to enable on-the-fly prediction of MHC binding affinity to 15 common HLA class I and class II alleles as well as visual analysis of discontinuous epitopes from multiple sequence alignments. It enables the visualization and analysis of structural and functional motifs that are usually described as regular expressions. It provides a compact view of discontinuous motifs composed of distant positions within biological sequences. BlockLogo is available at: http://research4.dfci.harvard.edu/cvc/blocklogo/ and http://met-hilab.bu.edu/blocklogo/. PMID:24001880
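
    The notion of block entropy over selected alignment positions can be illustrated with a short calculation. This sketch assumes that a "block" is the string formed by the residues at the chosen columns of each aligned sequence and that entropy is the Shannon entropy (in bits) of the block frequencies; the abstract does not state BlockLogo's exact formula, so both points, along with the function name and toy alignment, are assumptions.

```python
import math
from collections import Counter


def block_entropy(alignment, positions):
    """Return (entropy_in_bits, block_counts) for the chosen columns.

    alignment: list of equal-length aligned sequences.
    positions: 0-based column indices; they need not be contiguous, which
    is how a discontinuous motif would be described.
    """
    blocks = Counter("".join(seq[p] for p in positions) for seq in alignment)
    total = sum(blocks.values())
    entropy = -sum((n / total) * math.log2(n / total) for n in blocks.values())
    return entropy, blocks


if __name__ == "__main__":
    toy_alignment = ["ACDE", "ACDF", "GCDE", "ACDE"]  # made-up sequences
    entropy, counts = block_entropy(toy_alignment, [0, 3])
    print(entropy, counts.most_common())
```

    The block counts double as the kind of motif frequency table the abstract mentions; a fully conserved block gives an entropy of zero.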

  15. A Gibbs sampler for motif detection in phylogenetically close sequences

    NASA Astrophysics Data System (ADS)

    Siddharthan, Rahul; van Nimwegen, Erik; Siggia, Eric

    2004-03-01

    Genes are regulated by transcription factors that bind to DNA upstream of genes and recognize short conserved "motifs" in a random intergenic "background". Motif-finders such as the Gibbs sampler compare the probability of these short sequences being represented by "weight matrices" to the probability of their arising from the background "null model", and explore this space (analogous to a free-energy landscape). But closely related species may show conservation not because of functional sites but simply because they have not had sufficient time to diverge, so conventional methods will fail. We introduce a new Gibbs sampler algorithm that accounts for common ancestry when searching for motifs, while requiring minimal "prior" assumptions on the number and types of motifs, assessing the significance of detected motifs by "tracking" clusters that stay together. We apply this scheme to motif detection in sporulation-cycle genes in the yeast S. cerevisiae, using recent sequences of other closely-related Saccharomyces species.

  16. Discovering Motifs in Biological Sequences Using the Micron Automata Processor.

    PubMed

    Roy, Indranil; Aluru, Srinivas

    2016-01-01

    Finding approximately conserved sequences, called motifs, across multiple DNA or protein sequences is an important problem in computational biology. In this paper, we consider the (l, d) motif search problem of identifying one or more motifs of length l present in at least q of the n given sequences, with each occurrence differing from the motif in at most d substitutions. The problem is known to be NP-complete, and the largest solved instance reported to date is (26,11). We propose a novel algorithm for the (l,d) motif search problem using streaming execution over a large set of non-deterministic finite automata (NFA). This solution is designed to take advantage of the micron automata processor, a new technology close to deployment that can simultaneously execute multiple NFA in parallel. We demonstrate the capability for solving much larger instances of the (l, d) motif search problem using the resources available within a single automata processor board, by estimating run-times for problem instances (39,18) and (40,17). The paper serves as a useful guide to solving problems using this new accelerator technology. PMID:26886735
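
    The (l, d) motif search problem stated above can be written down directly as an exhaustive search, which makes the definition concrete even though it is only practical for very small l. This is a plain brute-force sketch in software, not the NFA-based automata-processor algorithm of the paper; the function names and toy sequences are assumptions.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, seq, d):
    """True if some window of seq differs from motif in at most d positions."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))


def ld_motif_search(seqs, l, d, q, alphabet="ACGT"):
    """Return every length-l motif found within d substitutions in >= q sequences.

    Enumerates all len(alphabet)**l candidates, so the cost grows
    exponentially with l.
    """
    found = []
    for candidate in product(alphabet, repeat=l):
        motif = "".join(candidate)
        if sum(occurs_within(motif, s, d) for s in seqs) >= q:
            found.append(motif)
    return found


if __name__ == "__main__":
    toy = ["AAACGTAA", "CCACGTCC", "GGACGTGG"]  # made-up sequences
    print(ld_motif_search(toy, l=4, d=0, q=3))
```

    The exponential candidate space is what makes large instances such as (26,11) hard, and it is the part that the paper proposes to parallelize with many automata running at once.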

  17. Discovering Motifs in Ranked Lists of DNA Sequences

    PubMed Central

    Eden, Eran; Lipson, Doron; Yogev, Sivan; Yakhini, Zohar

    2007-01-01

    Computational methods for discovery of sequence elements that are enriched in a target set compared with a background set are fundamental in molecular biology research. One example is the discovery of transcription factor binding motifs that are inferred from ChIP–chip (chromatin immuno-precipitation on a microarray) measurements. Several major challenges in sequence motif discovery still require consideration: (i) the need for a principled approach to partitioning the data into target and background sets; (ii) the lack of rigorous models and of an exact p-value for measuring motif enrichment; (iii) the need for an appropriate framework for accounting for motif multiplicity; (iv) the tendency, in many of the existing methods, to report presumably significant motifs even when applied to randomly generated data. In this paper we present a statistical framework for discovering enriched sequence elements in ranked lists that resolves these four issues. We demonstrate the implementation of this framework in a software application, termed DRIM (discovery of rank imbalanced motifs), which identifies sequence motifs in lists of ranked DNA sequences. We applied DRIM to ChIP–chip and CpG methylation data and obtained the following results. (i) Identification of 50 novel putative transcription factor (TF) binding sites in yeast ChIP–chip data. The biological function of some of them was further investigated to gain new insights on transcription regulation networks in yeast. For example, our discoveries enable the elucidation of the network of the TF ARO80. Another finding concerns a systematic TF binding enhancement to sequences containing CA repeats. (ii) Discovery of novel motifs in human cancer CpG methylation data. Remarkably, most of these motifs are similar to DNA sequence elements bound by the Polycomb complex that promotes histone methylation. Our findings thus support a model in which histone methylation and CpG methylation are mechanistically linked. Overall

  18. Oligonucleotide Sequence Motifs as Nucleosome Positioning Signals

    PubMed Central

    Collings, Clayton K.; Fernandez, Alfonso G.; Pitschka, Chad G.; Hawkins, Troy B.; Anderson, John N.

    2010-01-01

    To gain a better understanding of the sequence patterns that characterize positioned nucleosomes, we first performed an analysis of the periodicities of the 256 tetranucleotides in a yeast genome-wide library of nucleosomal DNA sequences that was prepared by in vitro reconstitution. The approach entailed the identification and analysis of 24 unique tetranucleotides that were defined by 8 consensus sequences. These consensus sequences were shown to be responsible for most if not all of the tetranucleotide and dinucleotide periodicities displayed by the entire library, demonstrating that the periodicities of dinucleotides that characterize the yeast genome are, in actuality, due primarily to the 8 consensus sequences. A novel combination of experimental and bioinformatic approaches was then used to show that these tetranucleotides are important for preferred formation of nucleosomes at specific sites along DNA in vitro. These results were then compared to tetranucleotide patterns in genome-wide in vivo libraries from yeast and C. elegans in order to assess the contributions of DNA sequence in the control of nucleosome residency in the cell. These comparisons revealed striking similarities in the tetranucleotide occurrence profiles that are likely to be involved in nucleosome positioning in both in vitro and in vivo libraries, suggesting that DNA sequence is an important factor in the control of nucleosome placement in vivo. However, the strengths of the tetranucleotide periodicities were 3–4 fold higher in the in vitro as compared to the in vivo libraries, which implies that DNA sequence plays less of a role in dictating nucleosome positions in vivo. The results of this study have important implications for models of sequence-dependent positioning since they suggest that a defined subset of tetranucleotides is involved in preferred nucleosome occupancy and that these tetranucleotides are the major source of the dinucleotide periodicities that are characteristic of

  19. Discovering common stem–loop motifs in unaligned RNA sequences

    PubMed Central

    Gorodkin, Jan; Stricklin, Shawn L.; Stormo, Gary D.

    2001-01-01

    Post-transcriptional regulation of gene expression is often accomplished by proteins binding to specific sequence motifs in mRNA molecules, to affect their translation or stability. The motifs are often composed of a combination of sequence and structural constraints such that the overall structure is preserved even though much of the primary sequence is variable. While several methods exist to discover transcriptional regulatory sites in the DNA sequences of coregulated genes, the RNA motif discovery problem is much more difficult because of covariation in the positions. We describe the combined use of two approaches for RNA structure prediction, FOLDALIGN and COVE, that together can discover and model stem–loop RNA motifs in unaligned sequences, such as UTRs from post-transcriptionally coregulated genes. We evaluate the method on two datasets, one a section of rRNA genes with randomly truncated ends so that a global alignment is not possible, and the other a hyper-variable collection of IRE-like elements that were inserted into randomized UTR sequences. In both cases the combined method identified the motifs correctly, and in the rRNA example we show that it is capable of determining the structure, which includes bulge and internal loops as well as a variable length hairpin loop. Those automated results are quantitatively evaluated and found to agree closely with structures contained in curated databases, with correlation coefficients up to 0.9. A basic server, Stem–Loop Align SearcH (SLASH), which will perform stem–loop searches in unaligned RNA sequences, is available at http://www.bioinf.au.dk/slash/. PMID:11353083

  20. Identification of disease-specific motifs in the antibody specificity repertoire via next-generation sequencing.

    PubMed

    Pantazes, Robert J; Reifert, Jack; Bozekowski, Joel; Ibsen, Kelly N; Murray, Joseph A; Daugherty, Patrick S

    2016-01-01

    Disease-specific antibodies can serve as highly effective biomarkers but have been identified for only a relatively small number of autoimmune diseases. A method was developed to identify disease-specific binding motifs through integration of bacterial display peptide library screening, next-generation sequencing (NGS) and computational analysis. Antibody specificity repertoires were determined by identifying bound peptide library members for each specimen using cell sorting and performing NGS. A computational algorithm, termed Identifying Motifs Using Next-generation sequencing Experiments (IMUNE), was developed and applied to discover disease- and healthy control-specific motifs. IMUNE performs comprehensive pattern searches, identifies patterns statistically enriched in the disease or control groups and clusters the patterns to generate motifs. Using celiac disease sera as a discovery set, IMUNE identified a consensus motif (QPEQPF[PS]E) with high diagnostic sensitivity and specificity in a validation sera set, in addition to novel motifs. Peptide display and sequencing (Display-Seq) coupled with IMUNE analysis may thus be useful to characterize antibody repertoires and identify disease-specific antibody epitopes and biomarkers. PMID:27481573
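
    The consensus motif QPEQPF[PS]E is written in regular-expression style, where [PS] means either proline or serine at that position. A minimal sketch of scanning peptide sequences for it is shown below; the function name and the example peptides are hypothetical and are not taken from the study's data.

```python
import re

# The consensus motif from the abstract; [PS] matches either P or S.
CONSENSUS = re.compile(r"QPEQPF[PS]E")


def find_consensus(peptide):
    """Return (start_index, matched_text) for each motif hit in the peptide."""
    return [(m.start(), m.group()) for m in CONSENSUS.finditer(peptide)]


if __name__ == "__main__":
    for peptide in ["AAQPEQPFPEKK", "QPEQPFSE", "QPEQPFAE"]:  # made-up peptides
        print(peptide, find_consensus(peptide))
```

    This only checks for exact matches to the final consensus; the enrichment statistics and pattern clustering that IMUNE uses to arrive at such a motif are not reproduced here.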

  1. Identification of disease-specific motifs in the antibody specificity repertoire via next-generation sequencing

    PubMed Central

    Pantazes, Robert J.; Reifert, Jack; Bozekowski, Joel; Ibsen, Kelly N.; Murray, Joseph A.; Daugherty, Patrick S.

    2016-01-01

    Disease-specific antibodies can serve as highly effective biomarkers but have been identified for only a relatively small number of autoimmune diseases. A method was developed to identify disease-specific binding motifs through integration of bacterial display peptide library screening, next-generation sequencing (NGS) and computational analysis. Antibody specificity repertoires were determined by identifying bound peptide library members for each specimen using cell sorting and performing NGS. A computational algorithm, termed Identifying Motifs Using Next-generation sequencing Experiments (IMUNE), was developed and applied to discover disease- and healthy control-specific motifs. IMUNE performs comprehensive pattern searches, identifies patterns statistically enriched in the disease or control groups and clusters the patterns to generate motifs. Using celiac disease sera as a discovery set, IMUNE identified a consensus motif (QPEQPF[PS]E) with high diagnostic sensitivity and specificity in a validation sera set, in addition to novel motifs. Peptide display and sequencing (Display-Seq) coupled with IMUNE analysis may thus be useful to characterize antibody repertoires and identify disease-specific antibody epitopes and biomarkers. PMID:27481573

  2. Computing distribution of scale independent motifs in biological sequences

    PubMed Central

    Almeida, Jonas S; Vinga, Susana

    2006-01-01

    The use of Chaos Game Representation (CGR) or its generalization, Universal Sequence Maps (USM), to describe the distribution of biological sequences has been found objectionable because of the fractal structure of that coordinate system. Consequently, the investigation of distribution of symbolic motifs at multiple scales is hampered by an inexact association between distance and sequence dissimilarity. A solution to this problem could unleash the use of iterative maps as phase-state representation of sequences where its statistical properties can be conveniently investigated. In this study a family of kernel density functions is described that accommodates the fractal nature of iterative function representations of symbolic sequences and, consequently, enables the exact investigation of sequence motifs of arbitrary lengths in that scale-independent representation. Furthermore, the proposed kernel density includes both Markovian succession and currently used alignment-free sequence dissimilarity metrics as special solutions. Therefore, the fractal kernel described is in fact a generalization that provides a common framework for a diverse suite of sequence analysis techniques. PMID:17049089
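
    For orientation, the classic Chaos Game Representation that the abstract builds on can be sketched in a few lines; this is the standard iterated map, not the kernel density family proposed in the paper. The assignment of nucleotides to the corners of the unit square is one common convention and is an assumption here.

```python
# Corner assignment is an assumed convention; other layouts are also in use.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Map a DNA sequence to CGR coordinates.

    Starting from the centre of the unit square, each nucleotide moves the
    current point halfway towards that nucleotide's corner.
    """
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]  # raises KeyError on symbols outside ACGT
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


print(cgr_points("ACG"))
```

    Because each step halves the distance to a corner, the last k symbols of a sequence fix the sub-square of side 2^-k in which its point lands, which is the scale structure the abstract refers to.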

  3. Do short, frequent DNA sequence motifs mould the epigenome?

    PubMed

    Quante, Timo; Bird, Adrian

    2016-04-01

    'Epigenome' refers to the panoply of chemical modifications borne by DNA and its associated proteins that locally affect genome function. Epigenomic patterns are thought to be determined by external constraints resulting from development, disease and the environment, but DNA sequence is also a potential influence. We propose that domains of relatively uniform DNA base composition may modulate the epigenome through cell type-specific proteins that recognize short, frequent sequence motifs. Differential recruitment of epigenomic modifiers may adjust gene expression in multigene blocks as an alternative to tuning the activity of each gene separately, thus simplifying gene expression programming. PMID:26837845

  4. Sequence-Based Classification Using Discriminatory Motif Feature Selection

    PubMed Central

    Xiong, Hao; Capurso, Daniel; Sen, Śaunak; Segal, Mark R.

    2011-01-01

    Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative) approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k) predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small) set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed) and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated). We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters. A Python pipeline implementing the approach is available at http
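
    The three-level partitioning described above (discovery, training, validation) might be set up roughly as follows. This is only a sketch under stated assumptions: the split fractions, the seed and the function name three_way_split are hypothetical and are not taken from the authors' pipeline.

```python
import random


def three_way_split(items, fractions=(0.2, 0.6, 0.2), seed=0):
    """Shuffle items and cut them into discovery, training and validation sets.

    The fractions are assumed values for illustration; the abstract does not
    state which proportions the authors used.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_discovery = round(len(shuffled) * fractions[0])
    n_training = round(len(shuffled) * fractions[1])
    discovery = shuffled[:n_discovery]
    training = shuffled[n_discovery:n_discovery + n_training]
    validation = shuffled[n_discovery + n_training:]
    return discovery, training, validation


# Hypothetical sequence identifiers.
sequence_ids = [f"seq{i}" for i in range(10)]
discovery, training, validation = three_way_split(sequence_ids)
print(len(discovery), len(training), len(validation))
```

    Motif finding would then see only the discovery set, the classifier only the training set, and performance would be reported on the validation set, so that no sequence informs more than one stage.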

  5. Identification of imine reductase-specific sequence motifs.

    PubMed

    Fademrecht, Silvia; Scheller, Philipp N; Nestl, Bettina M; Hauer, Bernhard; Pleiss, Jürgen

    2016-05-01

    Chiral amines are valuable building blocks for the production of a variety of pharmaceuticals, agrochemicals and other specialty chemicals. Only recently, imine reductases (IREDs) were discovered which catalyze the stereoselective reduction of imines to chiral amines. Although several IREDs were biochemically characterized in the last few years, knowledge of the reaction mechanism and the molecular basis of substrate specificity and stereoselectivity is limited. To gain further insights into the sequence-function relationships, the Imine Reductase Engineering Database (www.IRED.BioCatNet.de) was established and a systematic analysis of 530 putative IREDs was performed. A standard numbering scheme based on R-IRED-Sk was introduced to facilitate the identification and communication of structurally equivalent positions in different proteins. A conservation analysis revealed a highly conserved cofactor binding region and a predominantly hydrophobic substrate binding cleft. Two IRED-specific motifs were identified, the cofactor binding motif GLGxMGx5 [ATS]x4 Gx4 [VIL]WNR[TS]x2 [KR] and the active site motif Gx[DE]x[GDA]x[APS]x3 {K}x[ASL]x[LMVIAG]. Our results indicate a preference toward NADPH for all IREDs and explain why, despite their sequence similarity to β-hydroxyacid dehydrogenases (β-HADs), no conversion of β-hydroxyacids has been observed. Superfamily-specific conservations were investigated to explore the molecular basis of their stereopreference. Based on our analysis and previous experimental results on IRED mutants, an exclusive role of standard position 187 for stereoselectivity is excluded. Alternatively, two standard positions 139 and 194 were identified which are superfamily-specifically conserved and differ in R- and S-selective enzymes. Proteins 2016; 84:600-610. © 2016 Wiley Periodicals, Inc. PMID:26857686
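
    The two IRED motifs above appear to follow a PROSITE-like notation. Assuming that x is any residue, xN is a run of N arbitrary residues, [..] lists allowed residues and {..} lists forbidden residues, they can be translated into regular expressions as sketched below. This reading of the notation is an assumption, and the helper motif_to_regex is hypothetical.

```python
import re

# One token per motif position: [allowed], {forbidden}, x or xN, or a literal.
TOKEN = re.compile(r"\[[A-Z]+\]|\{[A-Z]+\}|x\d*|[A-Z]")


def motif_to_regex(motif):
    """Translate a PROSITE-like motif string into a regular expression."""
    compact = motif.replace(" ", "")
    parts = []
    position = 0
    for token in TOKEN.finditer(compact):
        if token.start() != position:
            raise ValueError(f"unrecognised text at offset {position}")
        position = token.end()
        text = token.group()
        if text.startswith("["):
            parts.append(text)                      # allowed residues
        elif text.startswith("{"):
            parts.append("[^" + text[1:-1] + "]")   # forbidden residues
        elif text.startswith("x"):
            count = text[1:]
            parts.append("." if count == "" else ".{" + count + "}")
        else:
            parts.append(text)                      # literal residue
    if position != len(compact):
        raise ValueError(f"unrecognised text at offset {position}")
    return "".join(parts)


cofactor = motif_to_regex("GLGxMGx5 [ATS]x4 Gx4 [VIL]WNR[TS]x2 [KR]")
active_site = motif_to_regex("Gx[DE]x[GDA]x[APS]x3 {K}x[ASL]x[LMVIAG]")
print(cofactor)
print(active_site)
```

    The resulting patterns can be passed to re.search to locate candidate motif occurrences in a protein sequence.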

  6. Phosphotyrosine Substrate Sequence Motifs for Dual Specificity Phosphatases

    PubMed Central

    Zhao, Bryan M.; Keasey, Sarah L.; Tropea, Joseph E.; Lountos, George T.; Dyas, Beverly K.; Cherry, Scott; Raran-Kurussi, Sreejith; Waugh, David S.; Ulrich, Robert G.

    2015-01-01

    Protein tyrosine phosphatases dephosphorylate tyrosine residues of proteins, whereas dual specificity phosphatases (DUSPs) are a subgroup of protein tyrosine phosphatases that dephosphorylate not only Tyr(P) residues but also the Ser(P) and Thr(P) residues of proteins. The DUSPs are linked to the regulation of many cellular functions and signaling pathways. Though many cellular targets of DUSPs are known, the relationship between catalytic activity and substrate specificity is poorly defined. We investigated the interactions of peptide substrates with select DUSPs of four types: MAP kinases (DUSP1 and DUSP7), atypical (DUSP3, DUSP14, DUSP22 and DUSP27), viral (variola VH1), and Cdc25 (A-C). Phosphatase recognition sites were experimentally determined by measuring dephosphorylation of 6,218 microarrayed Tyr(P) peptides representing confirmed and theoretical phosphorylation motifs from the cellular proteome. A broad continuum of dephosphorylation was observed across the microarrayed peptide substrates for all phosphatases, suggesting a complex relationship between substrate sequence recognition and optimal activity. Further analysis of peptide dephosphorylation by hierarchical clustering indicated that DUSPs could be organized by substrate sequence motifs, and peptide-specificities by phylogenetic relationships among the catalytic domains. The most highly dephosphorylated peptides represented proteins from 29 cell-signaling pathways, greatly expanding the list of potential targets of DUSPs. These newly identified DUSP substrates will be important for examining structure-activity relationships with physiologically relevant targets. PMID:26302245

  7. Functional roles of short sequence motifs in the endocytosis of membrane receptors

    PubMed Central

    Pandey, Kailash N.

    2009-01-01

    Internalization and trafficking of cell-surface membrane receptors and proteins into subcellular compartments is mediated by specific short-sequence signal motifs, which are usually located within the cytoplasmic domains of these receptor and protein molecules. The signals usually consist of short linear amino acid sequences, which are recognized by adaptor coat proteins along the endocytic and sorting pathways. The complex arrays of signals and recognition proteins ensure the dynamic movement, accurate trafficking, and designated distribution of transmembrane receptors and ligands into intracellular compartments, particularly to the endosomal-lysosomal system. This review summarizes the new information and concepts, integrating them with the current and established views of endocytosis, intracellular trafficking, and sorting of membrane receptors and proteins. Particular emphasis has been given to the functional roles of short-sequence signal motifs responsible for the itinerary and destination of membrane receptors and proteins moving into the subcellular compartments. The specific characteristics and functions of short-sequence motifs, including various tyrosine-based, dileucine-type, and other short-sequence signals in the trafficking and sorting of membrane receptors and membrane proteins are presented and discussed. PMID:19482617

  8. A Monte Carlo-based framework enhances the discovery and interpretation of regulatory sequence motifs

    PubMed Central

    2012-01-01

    Background Discovery of functionally significant short, statistically overrepresented subsequence patterns (motifs) in a set of sequences is a challenging problem in bioinformatics. Oftentimes, not all sequences in the set contain a motif. These non-motif-containing sequences complicate the algorithmic discovery of motifs. Filtering the non-motif-containing sequences from the larger set of sequences while simultaneously determining the identity of the motif is, therefore, desirable and a non-trivial problem in motif discovery research. Results We describe MotifCatcher, a framework that extends the sensitivity of existing motif-finding tools by employing random sampling to effectively remove non-motif-containing sequences from the motif search. We developed two implementations of our algorithm; each built around a commonly used motif-finding tool, and applied our algorithm to three diverse chromatin immunoprecipitation (ChIP) data sets. In each case, the motif finder with the MotifCatcher extension demonstrated improved sensitivity over the motif finder alone. Our approach organizes candidate functionally significant discovered motifs into a tree, which allowed us to make additional insights. In all cases, we were able to support our findings with experimental work from the literature. Conclusions Our framework demonstrates that additional processing at the sequence entry level can significantly improve the performance of existing motif-finding tools. For each biological data set tested, we were able to propose novel biological hypotheses supported by experimental work from the literature. Specifically, in Escherichia coli, we suggested binding site motifs for 6 non-traditional LexA protein binding sites; in Saccharomyces cerevisiae, we hypothesize 2 disparate mechanisms for novel binding sites of the Cse4p protein; and in Halobacterium sp. NRC-1, we discovered subtle differences in a general transcription factor (GTF) binding site motif across several data sets.
We

  9. SVM2Motif--Reconstructing Overlapping DNA Sequence Motifs by Mimicking an SVM Predictor.

    PubMed

    Vidovic, Marina M-C; Görnitz, Nico; Müller, Klaus-Robert; Rätsch, Gunnar; Kloft, Marius

    2015-01-01

    Identifying discriminative motifs underlying the functionality and evolution of organisms is a major challenge in computational biology. Machine learning approaches such as support vector machines (SVMs) achieve state-of-the-art performances in genomic discrimination tasks, but--due to their black-box character--motifs underlying their decision functions are largely unknown. As a remedy, positional oligomer importance matrices (POIMs) allow us to visualize the significance of position-specific subsequences. Although a major step towards the explanation of trained SVM models, they suffer from the fact that their size grows exponentially in the length of the motif, which renders their manual inspection feasible only for comparably small motif sizes, typically k ≤ 5. In this work, we extend the work on positional oligomer importance matrices, by presenting a new machine-learning methodology, entitled motifPOIM, to extract the truly relevant motifs--regardless of their length and complexity--underlying the predictions of a trained SVM model. Our framework thereby considers the motifs as free parameters in a probabilistic model, a task which can be phrased as a non-convex optimization problem. The exponential dependence of the POIM size on the oligomer length poses a major numerical challenge, which we address by an efficient optimization framework that allows us to find possibly overlapping motifs consisting of up to hundreds of nucleotides. We demonstrate the efficacy of our approach on a synthetic data set as well as a real-world human splice site data set. PMID:26690911

  10. JAR3D Webserver: Scoring and aligning RNA loop sequences to known 3D motifs.

    PubMed

    Roll, James; Zirbel, Craig L; Sweeney, Blake; Petrov, Anton I; Leontis, Neocles

    2016-07-01

    Many non-coding RNAs have been identified and may function by forming 2D and 3D structures. RNA hairpin and internal loops are often represented as unstructured on secondary structure diagrams, but RNA 3D structures show that most such loops are structured by non-Watson-Crick basepairs and base stacking. Moreover, different RNA sequences can form the same RNA 3D motif. JAR3D finds possible 3D geometries for hairpin and internal loops by matching loop sequences to motif groups from the RNA 3D Motif Atlas, by exact sequence match when possible, and by probabilistic scoring and edit distance for novel sequences. The scoring gauges the ability of the sequences to form the same pattern of interactions observed in 3D structures of the motif. The JAR3D webserver at http://rna.bgsu.edu/jar3d/ takes one or many sequences of a single loop as input, or else one or many sequences of longer RNAs with multiple loops. Each sequence is scored against all current motif groups. The output shows the ten best-matching motif groups. Users can align input sequences to each of the motif groups found by JAR3D. JAR3D will be updated with every release of the RNA 3D Motif Atlas, and so its performance is expected to improve over time. PMID:27235417
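
    The abstract mentions edit distance as one ingredient for matching novel loop sequences. A plain Levenshtein distance, sketched below, shows the general idea; how JAR3D actually combines edit distance with its probabilistic scoring is not described here, and the loop sequences and group names used are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


# Hypothetical motif groups, each represented by one example loop sequence.
known_loops = {"group_1": "GAAA", "group_2": "UUCG", "group_3": "CUUGAG"}
query = "GCAA"
closest = min(known_loops, key=lambda name: edit_distance(query, known_loops[name]))
print(closest, edit_distance(query, known_loops[closest]))
```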

  11. Sequence Motifs in MADS Transcription Factors Responsible for Specificity and Diversification of Protein-Protein Interaction

    PubMed Central

    van Dijk, Aalt D. J.; Morabito, Giuseppa; Fiers, Martijn; van Ham, Roeland C. H. J.; Angenent, Gerco C.; Immink, Richard G. H.

    2010-01-01

    Protein sequences encompass tertiary structures and contain information about specific molecular interactions, which in turn determine biological functions of proteins. Knowledge about how protein sequences define interaction specificity is largely missing, in particular for paralogous protein families with high sequence similarity, such as the plant MADS domain transcription factor family. In comparison to the situation in mammalian species, this important family of transcription regulators has expanded enormously in plant species and contains over 100 members in the model plant species Arabidopsis thaliana. Here, we provide insight into the mechanisms that determine protein-protein interaction specificity for the Arabidopsis MADS domain transcription factor family, using an integrated computational and experimental approach. Plant MADS proteins have highly similar amino acid sequences, but their dimerization patterns vary substantially. Our computational analysis uncovered small sequence regions that explain observed differences in dimerization patterns with reasonable accuracy. Furthermore, we show the usefulness of the method for prediction of MADS domain transcription factor interaction networks in other plant species. Introduction of mutations in the predicted interaction motifs demonstrated that single amino acid mutations can have a large effect and lead to loss or gain of specific interactions. In addition, various performed bioinformatics analyses shed light on the way evolution has shaped MADS domain transcription factor interaction specificity. Identified protein-protein interaction motifs appeared to be strongly conserved among orthologs, indicating their evolutionary importance. We also provide evidence that mutations in these motifs can be a source for sub- or neo-functionalization. The analyses presented here take us a step forward in understanding protein-protein interactions and the interplay between protein sequences and network evolution. 
PMID

  12. False occurrences of functional motifs in protein sequences highlight evolutionary constraints

    PubMed Central

    Via, Allegra; Gherardini, Pier Federico; Ferraro, Enrico; Ausiello, Gabriele; Scalia Tomba, Gianpaolo; Helmer-Citterich, Manuela

    2007-01-01

    Background False occurrences of functional motifs in protein sequences can be considered as random events due solely to the sequence composition of a proteome. Here we use a numerical approach to investigate the random appearance of functional motifs with the aim of addressing biological questions such as: How are organisms protected from undesirable occurrences of motifs otherwise selected for their functionality? Has the random appearance of functional motifs in protein sequences been affected during evolution? Results Here we analyse the occurrence of functional motifs in random sequences and compare it to that observed in biological proteomes; the behaviour of random motifs is also studied. Most motifs exhibit a number of false positives significantly similar to the number of times they appear in randomized proteomes (=expected number of false positives). Interestingly, about 3% of the analysed motifs show a different kind of behaviour and appear in biological proteomes less than they do in random sequences. In some of these cases, a mechanism of evolutionary negative selection is apparent; this helps to prevent unwanted functionalities which could interfere with cellular mechanisms. Conclusion Our thorough statistical and biological analysis showed that there are several mechanisms and evolutionary constraints both of which affect the appearance of functional motifs in protein sequences. PMID:17331242

  13. Illumina MiSeq sequencing disfavours a sequence motif in the GFP reporter gene

    PubMed Central

    Van den Hoecke, Silvie; Verhelst, Judith; Saelens, Xavier

    2016-01-01

    Green fluorescent protein (GFP) is one of the most used reporter genes. We have used next-generation sequencing (NGS) to analyse the genetic diversity of a recombinant influenza A virus that expresses GFP and found a remarkable coverage dip in the GFP coding sequence. This coverage dip was present when virus-derived RT-PCR product or the parental plasmid DNA was used as starting material for NGS and regardless of whether Nextera XT transposase or Covaris shearing was used for DNA fragmentation. Therefore, the sequence coverage dip in the GFP coding sequence was not the result of emerging GFP mutant viruses or a bias introduced by Nextera XT fragmentation. Instead, we found that the Illumina MiSeq sequencing method disfavours the ‘CCCGCC’ motif in the GFP coding sequence. PMID:27193250

  14. Illumina MiSeq sequencing disfavours a sequence motif in the GFP reporter gene.

    PubMed

    Van den Hoecke, Silvie; Verhelst, Judith; Saelens, Xavier

    2016-01-01

    Green fluorescent protein (GFP) is one of the most used reporter genes. We have used next-generation sequencing (NGS) to analyse the genetic diversity of a recombinant influenza A virus that expresses GFP and found a remarkable coverage dip in the GFP coding sequence. This coverage dip was present when virus-derived RT-PCR product or the parental plasmid DNA was used as starting material for NGS and regardless of whether Nextera XT transposase or Covaris shearing was used for DNA fragmentation. Therefore, the sequence coverage dip in the GFP coding sequence was not the result of emerging GFP mutant viruses or a bias introduced by Nextera XT fragmentation. Instead, we found that the Illumina MiSeq sequencing method disfavours the 'CCCGCC' motif in the GFP coding sequence. PMID:27193250

  15. Phosphatidylinositol transfer proteins: sequence motifs in structural and evolutionary analyses

    PubMed Central

    Wyckoff, Gerald J.; Solidar, Ada; Yoden, Marilyn D.

    2016-01-01

    Phosphatidylinositol transfer proteins (PITP) are a family of monomeric proteins that bind and transfer phosphatidylinositol and phosphatidylcholine between membrane compartments. They are required for production of inositol and diacylglycerol second messengers, and are found in most metazoan organisms. While PITPs are known to carry out crucial cell-signaling roles in many organisms, the structure, function and evolution of the majority of family members remain unexplored, primarily because the ubiquity and diversity of the family thwart traditional methods of global alignment. To surmount this obstacle, we instead took a novel approach, using MEME and a parsimony-based analysis to create a cladogram of conserved sequence motifs in 56 PITP family proteins from 26 species. In keeping with previous functional annotations, three clades were supported within our evolutionary analysis: two classes of soluble proteins and a class of membrane-associated proteins. By focusing on conserved regions, the analysis allowed for in-depth queries regarding possible functional roles of PITP proteins in both intra- and extracellular signaling.

  16. ZFP57 recognizes multiple and closely spaced sequence motif variants to maintain repressive epigenetic marks in mouse embryonic stem cells

    PubMed Central

    Anvar, Zahra; Cammisa, Marco; Riso, Vincenzo; Baglivo, Ilaria; Kukreja, Harpreet; Sparago, Angela; Girardot, Michael; Lad, Shraddha; De Feis, Italia; Cerrato, Flavia; Angelini, Claudia; Feil, Robert; Pedone, Paolo V.; Grimaldi, Giovanna; Riccio, Andrea

    2016-01-01

    Imprinting Control Regions (ICRs) need to maintain their parental allele-specific DNA methylation during early embryogenesis despite genome-wide demethylation and subsequent de novo methylation. ZFP57 and KAP1 are both required for maintaining the repressive DNA methylation and H3-lysine-9-trimethylation (H3K9me3) at ICRs. In vitro, ZFP57 binds a specific hexanucleotide motif that is enriched at its genomic binding sites. We now demonstrate in mouse embryonic stem cells (ESCs) that SNPs disrupting closely-spaced hexanucleotide motifs are associated with lack of ZFP57 binding and H3K9me3 enrichment. Through a transgenic approach in mouse ESCs, we further demonstrate that an ICR fragment containing three ZFP57 motif sequences recapitulates the original methylated or unmethylated status when integrated into the genome at an ectopic position. Mutation of Zfp57 or the hexanucleotide motifs led to loss of ZFP57 binding and DNA methylation of the transgene. Finally, we identified a sequence variant of the hexanucleotide motif that interacts with ZFP57 both in vivo and in vitro. The presence of multiple and closely located copies of ZFP57 motif variants emerges as a distinct characteristic that is required for the faithful maintenance of repressive epigenetic marks at ICRs and other ZFP57 binding sites. PMID:26481358

  17. Physical-chemical property based sequence motifs and methods regarding same

    DOEpatents

    Braun, Werner; Mathura, Venkatarajan S.; Schein, Catherine H.

    2008-09-09

    A data analysis system, program, and/or method, e.g., a data mining/data exploration method, using physical-chemical property motifs. For example, a sequence database may be searched for identifying segments thereof having physical-chemical properties similar to the physical-chemical property motifs.

  18. Bioinformatic identification of novel regulatory DNA sequence motifs in Streptomyces coelicolor

    PubMed Central

    Studholme, David J; Bentley, Stephen D; Kormanec, Jan

    2004-01-01

    Background Streptomyces coelicolor is a bacterium with a vast repertoire of metabolic functions and complex systems of cellular development. Its genome sequence is rich in genes that encode regulatory proteins to control these processes in response to its changing environment. We wished to apply a recently published bioinformatic method for identifying novel regulatory sequence signals to gain new insights into regulation in S. coelicolor. Results The method involved production of position-specific weight matrices from alignments of over-represented words of DNA sequence. We generated 2497 weight matrices, each representing a candidate regulatory DNA sequence motif. We scanned the genome sequence of S. coelicolor against each of these matrices. A DNA sequence motif represented by one of the matrices was found preferentially in non-coding sequences immediately upstream of genes involved in polysaccharide degradation, including several that encode chitinases. This motif (TGGTCTAGACCA) was also found upstream of genes encoding components of the phosphoenolpyruvate phosphotransfer system (PTS). We hypothesise that this DNA sequence motif represents a regulatory element that is responsive to availability of carbon-sources. Other motifs of potential biological significance were found upstream of genes implicated in secondary metabolism (TTAGGTtAGgCTaACCTAA), sigma factors (TGACN19TGAC), DNA replication and repair (ttgtCAGTGN13TGGA), nucleotide conversions (CTACgcNCGTAG), and ArsR (TCAGN12TCAG). A motif found upstream of genes involved in chromosome replication (TGTCagtgcN7Tagg) was similar to a previously described motif found in UV-responsive promoters. Conclusions We successfully applied a recently published in silico method to identify conserved sequence motifs in S. coelicolor that may be biologically significant as regulatory elements. Our data are broadly consistent with and further extend data from previously published studies. We invite experimental testing of
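
    The method summarised above builds position-specific weight matrices from aligned words and scans a genome with them. A generic log-odds version of that idea is sketched below; the pseudocount, the uniform background frequency and the example words are assumptions for illustration and do not reproduce the published procedure.

```python
import math


def build_pwm(aligned_words, alphabet="ACGT", pseudocount=1.0, background=0.25):
    """Build a log2-odds weight matrix from equal-length aligned words."""
    width = len(aligned_words[0])
    if any(len(word) != width for word in aligned_words):
        raise ValueError("all words must have the same length")
    pwm = []
    for i in range(width):
        column = [word[i] for word in aligned_words]
        total = len(column) + pseudocount * len(alphabet)
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in alphabet
        })
    return pwm


def best_hit(pwm, sequence):
    """Return (start, score) of the highest-scoring window in the sequence.

    The sequence may contain only symbols from the matrix alphabet.
    """
    width = len(pwm)
    scored = [
        (sum(pwm[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    score, start = max(scored)
    return start, score


# Hypothetical aligned words and target sequence.
words = ["TGAC", "TGAC", "TGAT", "TGAC"]
pwm = build_pwm(words)
start, score = best_hit(pwm, "AAATGACAAA")
print(start, round(score, 2))
```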

  19. Over-represented localized sequence motifs in ribosomal protein gene promoters of basal metazoans.

    PubMed

    Perina, Drago; Korolija, Marina; Roller, Maša; Harcet, Matija; Jeličić, Branka; Mikoč, Andreja; Cetković, Helena

    2011-07-01

    Equimolecular presence of ribosomal proteins (RPs) in the cell is needed for ribosome assembly and is achieved by synchronized expression of ribosomal protein genes (RPGs) with promoters of similar strengths. Over-represented motifs of RPG promoter regions are identified as targets for specific transcription factors. Unlike RPs, those motifs are not conserved between mammals, Drosophila, and yeast. We analyzed RPG proximal promoter regions of three basal metazoans with sequenced genomes (sponge, cnidarian, and placozoan) and found common features, such as 5'-terminal oligopyrimidine tracts and TATA-boxes. Furthermore, we identified over-represented motifs, some of which displayed the highest similarity to motifs abundant in human RPG promoters and not present in Drosophila or yeast. Our results indicate that human over-represented motifs, as well as corresponding domains of transcription factors, were established very early in metazoan evolution. The fast-evolving nature of the RPG regulatory network leads to formation of other, lineage-specific, over-represented motifs. PMID:21457775

  20. Fast and Accurate Discovery of Degenerate Linear Motifs in Protein Sequences

    PubMed Central

    Levy, Emmanuel D.; Michnick, Stephen W.

    2014-01-01

    Linear motifs mediate a wide variety of cellular functions, which makes their characterization in protein sequences crucial to understanding cellular systems. However, the short length and degenerate nature of linear motifs make their discovery a difficult problem. Here, we introduce MotifHound, an algorithm particularly suited for the discovery of small and degenerate linear motifs. MotifHound performs an exact and exhaustive enumeration of all motifs present in proteins of interest, including all of their degenerate forms, and scores the overrepresentation of each motif based on its occurrence in proteins of interest relative to a background (e.g., proteome) using the hypergeometric distribution. To assess MotifHound, we benchmarked it together with state-of-the-art algorithms. The benchmark consists of 11,880 sets of proteins from S. cerevisiae; in each set, we artificially spiked-in one motif varying in terms of three key parameters, (i) number of occurrences, (ii) length and (iii) the number of degenerate or “wildcard” positions. The benchmark enabled the evaluation of the impact of these three properties on the performance of the different algorithms. The results showed that MotifHound and SLiMFinder were the most accurate in detecting degenerate linear motifs. Interestingly, MotifHound was 15 to 20 times faster at comparable accuracy and performed best in the discovery of highly degenerate motifs. We complemented the benchmark by an analysis of proteins experimentally shown to bind the FUS1 SH3 domain from S. cerevisiae. Using the full-length protein partners as sole information, MotifHound recapitulated most experimentally determined motifs binding to the FUS1 SH3 domain. Moreover, these motifs exhibited properties typical of SH3 binding peptides, e.g., high intrinsic disorder and evolutionary conservation, despite the fact that none of these properties were used as prior information. MotifHound is available (http://michnick.bcm.umontreal.ca or http
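
    Scoring over-representation with the hypergeometric distribution, as described above, amounts to asking how likely it is to see at least k motif-containing proteins in a set of n drawn from a background of N proteins, K of which contain the motif. The sketch below computes that upper tail exactly; the variable names and the numbers in the example are hypothetical, and MotifHound's own scoring may differ in detail.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n proteins are drawn without replacement from N,
    of which K contain the motif."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical counts: background of 10 proteins, 3 with the motif;
# set of interest of 4 proteins, 2 with the motif.
p_value = hypergeom_upper_tail(k=2, N=10, K=3, n=4)
print(p_value)
```

    A small p-value indicates that the motif occurs in the proteins of interest more often than its background frequency would suggest.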

  1. Characterization of the tandem CWCH2 sequence motif: a hallmark of inter-zinc finger interactions

    PubMed Central

    2010-01-01

    Background The C2H2 zinc finger (ZF) domain is widely conserved among eukaryotic proteins. In Zic/Gli/Zap1 C2H2 ZF proteins, the two N-terminal ZFs form a single structural unit by sharing a hydrophobic core. This structural unit defines a new motif comprised of two tryptophan side chains at the center of the hydrophobic core. Because each tryptophan residue is located between the two cysteine residues of the C2H2 motif, we have named this structure the tandem CWCH2 (tCWCH2) motif. Results Here, we characterized 587 tCWCH2-containing genes using data derived from public databases. We categorized genes into 11 classes including Zic/Gli/Glis, Arid2/Rsc9, PacC, Mizf, Aebp2, Zap1/ZafA, Fungl, Zfp106, Twincl, Clr1, and Fungl-4ZF, based on sequence similarity, domain organization, and functional similarities. tCWCH2 motifs are mostly found in organisms belonging to the Opisthokonta (metazoa, fungi, and choanoflagellates) and Amoebozoa (amoeba, Dictyostelium discoideum). By comparison, the C2H2 ZF motif is distributed widely among the eukaryotes. The structure and organization of the tCWCH2 motif, its phylogenetic distribution, and molecular phylogenetic analysis suggest that prototypical tCWCH2 genes existed in the Opisthokonta ancestor. Within-group or between-group comparisons of the tCWCH2 amino acid sequence identified three additional sequence features (site-specific amino acid frequencies, longer linker sequence between two C2H2 ZFs, and frequent extra-sequences within C2H2 ZF motifs). Conclusion These features suggest that the tCWCH2 motif is a specialized motif involved in inter-zinc finger interactions. PMID:20167128

  2. Analysis of Genomic Sequence Motifs for Deciphering Transcription Factor Binding and Transcriptional Regulation in Eukaryotic Cells

    PubMed Central

    Boeva, Valentina

    2016-01-01

    Eukaryotic genomes contain a variety of structured patterns: repetitive elements, binding sites of DNA and RNA associated proteins, splice sites, and so on. Often, these structured patterns can be formalized as motifs and described using a proper mathematical model such as position weight matrix and IUPAC consensus. Two key tasks are typically carried out for motifs in the context of the analysis of genomic sequences. These are: identification in a set of DNA regions of over-represented motifs from a particular motif database, and de novo discovery of over-represented motifs. Here we describe existing methodology to perform these two tasks for motifs characterizing transcription factor binding. When applied to the output of ChIP-seq and ChIP-exo experiments, or to promoter regions of co-modulated genes, motif analysis techniques allow for the prediction of transcription factor binding events and enable identification of transcriptional regulators and co-regulators. The usefulness of motif analysis is further exemplified in this review by how motif discovery improves peak calling in ChIP-seq and ChIP-exo experiments and, when coupled with information on gene expression, allows insights into physical mechanisms of transcriptional modulation. PMID:26941778
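    The position weight matrix model mentioned in this abstract can be sketched in a few lines. This is an illustrative example only, not code from the review: the function names, the uniform background, the pseudocount of 1, and the toy binding sites are all assumptions.

```python
import math

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Build a log-odds position weight matrix from equal-length aligned sites.

    A uniform background (0.25 per base) is assumed for simplicity.
    """
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / 0.25)
            for base in ALPHABET
        })
    return pwm


def scan(sequence, pwm):
    """Score every window of the sequence; return (start, score) pairs."""
    width = len(pwm)
    return [
        (start, sum(pwm[j][sequence[start + j]] for j in range(width)))
        for start in range(len(sequence) - width + 1)
    ]


# Toy aligned sites (hypothetical, chosen only to make the example concrete).
sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
pwm = build_pwm(sites)
best_start, best_score = max(scan("GGCGTATAATGCGC", pwm), key=lambda hit: hit[1])
print(best_start, round(best_score, 2))
```

    Each matrix entry is the log2 ratio of the observed base frequency to the background frequency, so a window's score is the sum of per-position log-odds values.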

  3. Analysis of Genomic Sequence Motifs for Deciphering Transcription Factor Binding and Transcriptional Regulation in Eukaryotic Cells.

    PubMed

    Boeva, Valentina

    2016-01-01

    Eukaryotic genomes contain a variety of structured patterns: repetitive elements, binding sites of DNA and RNA associated proteins, splice sites, and so on. Often, these structured patterns can be formalized as motifs and described using a proper mathematical model such as position weight matrix and IUPAC consensus. Two key tasks are typically carried out for motifs in the context of the analysis of genomic sequences. These are: identification in a set of DNA regions of over-represented motifs from a particular motif database, and de novo discovery of over-represented motifs. Here we describe existing methodology to perform these two tasks for motifs characterizing transcription factor binding. When applied to the output of ChIP-seq and ChIP-exo experiments, or to promoter regions of co-modulated genes, motif analysis techniques allow for the prediction of transcription factor binding events and enable identification of transcriptional regulators and co-regulators. The usefulness of motif analysis is further exemplified in this review by how motif discovery improves peak calling in ChIP-seq and ChIP-exo experiments and, when coupled with information on gene expression, allows insights into physical mechanisms of transcriptional modulation. PMID:26941778

  4. Repulsive parallel MCMC algorithm for discovering diverse motifs from large sequence sets

    PubMed Central

    Ikebata, Hisaki; Yoshida, Ryo

    2015-01-01

    Motivation: The motif discovery problem consists of finding recurring patterns of short strings in a set of nucleotide sequences. This classical problem is receiving renewed attention as most early motif discovery methods lack the ability to handle large data of recent genome-wide ChIP studies. New ChIP-tailored methods focus on reducing computation time and pay little regard to the accuracy of motif detection. Unlike such methods, our method focuses on increasing the detection accuracy while maintaining the computation efficiency at an acceptable level. The major advantage of our method is that it can mine diverse multiple motifs undetectable by current methods. Results: The repulsive parallel Markov chain Monte Carlo (RPMCMC) algorithm that we propose is a parallel version of the widely used Gibbs motif sampler. RPMCMC is run on parallel interacting motif samplers. A repulsive force is generated when motifs produced by different samplers come near each other. Thus, different samplers explore different motifs. In this way, we can detect much more diverse motifs than conventional methods can. Through application to 228 transcription factor ChIP-seq datasets of the ENCODE project, we show that the RPMCMC algorithm can find many reliable cofactor interacting motifs that existing methods are unable to discover. Availability and implementation: A C++ implementation of RPMCMC and discovered cofactor motifs for the 228 ENCODE ChIP-seq datasets are available from http://daweb.ism.ac.jp/yoshidalab/motif. Contact: ikebata.hisaki@ism.ac.jp, yoshidar@ism.ac.jp Supplementary information: Supplementary data are available from Bioinformatics online. PMID:25583120

  5. An artificial intelligence approach to motif discovery in protein sequences: application to steroid dehydrogenases.

    PubMed

    Bailey, T L; Baker, M E; Elkan, C P

    1997-05-01

    MEME (Multiple Expectation-maximization for Motif Elicitation) is a unique new software tool that uses artificial intelligence techniques to discover motifs shared by a set of protein sequences in a fully automated manner. This paper is the first detailed study of the use of MEME to analyse a large, biologically relevant set of sequences, and to evaluate the sensitivity and accuracy of MEME in identifying structurally important motifs. For this purpose, we chose the short-chain alcohol dehydrogenase superfamily because it is large and phylogenetically diverse, providing a test of how well MEME can work on sequences with low amino acid similarity. Moreover, this dataset contains enzymes of biological importance, and because several enzymes have known X-ray crystallographic structures, we can test the usefulness of MEME for structural analysis. The first six motifs from MEME map onto structurally important alpha-helices and beta-strands on Streptomyces hydrogenans 20beta-hydroxysteroid dehydrogenase. We also describe MAST (Motif Alignment Search Tool), which conveniently uses output from MEME for searching databases such as SWISS-PROT and Genpept. MAST provides statistical measures that permit a rigorous evaluation of the significance of database searches with individual motifs or groups of motifs. A database search of Genpept90 by MAST with the log-odds matrix of the first six motifs obtained from MEME yields a bimodal output, demonstrating the selectivity of MAST. We show for the first time, using primary sequence analysis, that bacterial sugar epimerases are homologs of short-chain dehydrogenases. MEME and MAST will be increasingly useful as genome sequencing provides large datasets of phylogenetically divergent sequences of biomedical interest. PMID:9366496

  6. The BsaHI restriction-modification system: Cloning, sequencing and analysis of conserved motifs

    PubMed Central

    Neely, Robert K; Roberts, Richard J

    2008-01-01

    Background: Restriction and modification enzymes typically recognise short DNA sequences of between two and eight bases in length. Understanding the mechanism of this recognition represents a significant challenge that we begin to address for the BsaHI restriction-modification system, which recognises the six base sequence GRCGYC. Results: The DNA sequences of the genes for the BsaHI methyltransferase, bsaHIM, and restriction endonuclease, bsaHIR, have been determined (GenBank accession #EU386360), cloned and expressed in E. coli. Both the restriction endonuclease and methyltransferase enzymes share significant similarity with a group of 6 other enzymes comprising the restriction-modification systems HgiDI and HgiGI and the putative HindVP, NlaCORFDP, NpuORFC228P and SplZORFNP restriction-modification systems. A sequence alignment of these homologues shows that their amino acid sequences are largely conserved and highlights several motifs of interest. We target one such conserved motif, reading SPERRFD, at the C-terminal end of the bsaHIR gene. A mutational analysis of these amino acids indicates that the motif is crucial for enzymatic activity. Sequence alignment of the methyltransferase gene reveals a short motif within the target recognition domain that is conserved among enzymes recognising the same sequences. Thus, this motif may be used as a diagnostic tool to define the recognition sequences of the cytosine C5 methyltransferases. Conclusion: We have cloned and sequenced the BsaHI restriction and modification enzymes. We have identified a region of the R. BsaHI enzyme that is crucial for its activity. Analysis of the amino acid sequence of the BsaHI methyltransferase enzyme led us to propose two new motifs that can be used in the diagnosis of the recognition sequence of the cytosine C5-methyltransferases. PMID:18479503
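    The recognition sequence GRCGYC is written in IUPAC degenerate notation (R = A or G, Y = C or T). As a hedged illustration of what such a site matches, the sketch below expands an IUPAC string into a regular expression and reports every occurrence in a DNA string. The function names and the test sequence are hypothetical and are not part of the study.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(site):
    """Translate an IUPAC-coded site such as GRCGYC into a regex string."""
    return "".join(IUPAC[code] for code in site.upper())


def find_sites(dna, site):
    """Return (start, matched_text) for every occurrence, overlaps included."""
    # A lookahead is used so that overlapping matches are not skipped.
    pattern = re.compile("(?=(" + iupac_to_regex(site) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(dna.upper())]


# Made-up test sequence containing two sites.
hits = find_sites("TTGACGTCAAGGCGCCTT", "GRCGYC")
print(hits)
```

    Because GRCGYC reads the same on the reverse complement strand, scanning one strand is enough for this particular site; a non-palindromic site would also need a reverse-complement scan.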

  7. Predicting candidate genomic sequences that correspond to synthetic functional RNA motifs

    PubMed Central

    Laserson, Uri; Gan, Hin Hark; Schlick, Tamar

    2005-01-01

    Riboswitches and RNA interference are important emerging mechanisms found in many organisms to control gene expression. To enhance our understanding of such RNA roles, finding small regulatory motifs in genomes presents a challenge on a wide scale. Many simple functional RNA motifs have been found by in vitro selection experiments, which produce synthetic target-binding aptamers as well as catalytic RNAs, including the hammerhead ribozyme. Motivated by the prediction of Piganeau and Schroeder [(2003) Chem. Biol., 10, 103–104] that synthetic RNAs may have natural counterparts, we develop and apply an efficient computational protocol for identifying aptamer-like motifs in genomes. We define motifs from the sequence and structural information of synthetic aptamers, search for sequences in genomes that will produce motif matches, and then evaluate the structural stability and statistical significance of the potential hits. Our application to aptamers for streptomycin, chloramphenicol, neomycin B and ATP identifies 37 candidate sequences (in coding and non-coding regions) that fold to the target aptamer structures in bacterial and archaeal genomes. Further energetic screening reveals that several candidates exhibit energetic properties and sequence conservation patterns that are characteristic of functional motifs. Besides providing candidates for experimental testing, our computational protocol offers an avenue for expanding natural RNA's functional repertoire. PMID:16254081

  8. REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads

    PubMed Central

    Chu, Chong; Nielsen, Rasmus; Wu, Yufeng

    2016-01-01

    Repeat elements are important components of eukaryotic genomes. One limitation in our understanding of repeat elements is that most analyses rely on reference genomes that are incomplete and often contain missing data in highly repetitive regions that are difficult to assemble. To overcome this problem we develop a new method, REPdenovo, which assembles repeat sequences directly from raw shotgun sequencing data. REPdenovo can construct various types of repeats that are highly repetitive and have low sequence divergence within copies. We show that REPdenovo is substantially better than existing methods both in terms of the number and the completeness of the repeat sequences that it recovers. The key advantage of REPdenovo is that it can reconstruct long repeats from sequence reads. We apply the method to human data and discover a number of potentially new repeat sequences that have been missed by previous repeat annotations. Many of these sequences are incorporated into various parasite genomes, possibly because the filtering process for host DNA involved in the sequencing of the parasite genomes failed to exclude the host derived repeat sequences. REPdenovo is a new powerful computational tool for annotating genomes and for addressing questions regarding the evolution of repeat families. The software tool, REPdenovo, is available for download at https://github.com/Reedwarbler/REPdenovo. PMID:26977803

  9. Sequence Motifs in Transit Peptides Act as Independent Functional Units and Can Be Transferred to New Sequence Contexts.

    PubMed

    Lee, Dong Wook; Woo, Seungjin; Geem, Kyoung Rok; Hwang, Inhwan

    2015-09-01

    A large number of nuclear-encoded proteins are imported into chloroplasts after they are translated in the cytosol. Import is mediated by transit peptides (TPs) at the N termini of these proteins. TPs contain many small motifs, each of which is critical for a specific step in the process of chloroplast protein import; however, it remains unknown how these motifs are organized to give rise to TPs with diverse sequences. In this study, we generated various hybrid TPs by swapping domains between Rubisco small subunit (RbcS) and chlorophyll a/b-binding protein, which have highly divergent sequences, and examined the abilities of the resultant TPs to deliver proteins into chloroplasts. Subsequently, we compared the functionality of sequence motifs in the hybrid TPs with those of wild-type TPs. The sequence motifs in the hybrid TPs exhibited three different modes of functionality, depending on their domain composition, as follows: active in both wild-type and hybrid TPs, active in wild-type TPs but inactive in hybrid TPs, and inactive in wild-type TPs but active in hybrid TPs. Moreover, synthetic TPs, in which only three critical motifs from RbcS or chlorophyll a/b-binding protein TPs were incorporated into an unrelated sequence, were able to deliver clients to chloroplasts with a comparable efficiency to RbcS TP. Based on these results, we propose that diverse sequence motifs in TPs are independent functional units that interact with specific translocon components at various steps during protein import and can be transferred to new sequence contexts. PMID:26149569

  10. Improved detection of helix-turn-helix DNA-binding motifs in protein sequences.

    PubMed Central

    Dodd, I B; Egan, J B

    1990-01-01

    We present an update of our method for systematic detection and evaluation of potential helix-turn-helix DNA-binding motifs in protein sequences [Dodd, I. and Egan, J. B. (1987) J. Mol. Biol. 194, 557-564]. The new method is considerably more powerful, detecting approximately 50% more likely helix-turn-helix sequences without an increase in false predictions. This improvement is due almost entirely to the use of a much larger reference set of 91 presumed helix-turn-helix sequences. The scoring matrix derived from this reference set has been calibrated against a large protein sequence database so that the score obtained by a sequence can be used to give a practical estimation of the probability that the sequence is a helix-turn-helix motif. PMID:2402433

  11. Creation of Hybrid Nanorods From Sequences of Natural Trimeric Fibrous Proteins Using the Fibritin Trimerization Motif

    NASA Astrophysics Data System (ADS)

    Papanikolopoulou, Katerina; van Raaij, Mark J.; Mitraki, Anna

    Stable, artificial fibrous proteins that can be functionalized open new avenues in fields such as bionanomaterials design and fiber engineering. An important source of inspiration for the creation of such proteins are natural fibrous proteins such as collagen, elastin, insect silks, and fibers from phages and viruses. The fibrous parts of this last class of proteins usually adopt trimeric, β-stranded structural folds and are appended to globular, receptor-binding domains. It has been recently shown that the globular domains are essential for correct folding and trimerization and can be successfully substituted by a very small (27-amino acid) trimerization motif from phage T4 fibritin. The hybrid proteins are correctly folded nanorods that can withstand extreme conditions. When the fibrous part derives from the adenovirus fiber shaft, different tissue-targeting specificities can be engineered into the hybrid proteins, which therefore can be used as gene therapy vectors. The integration of such stable nanorods in devices is also a big challenge in the field of biomechanical design. The fibritin foldon domain is a versatile trimerization motif and can be combined with a variety of fibrous motifs, such as coiled-coil, collagenous, and triple β-stranded motifs, provided the appropriate linkers are used. The combination of different motifs within the same fibrous molecule to create stable rods with multiple functions can even be envisioned. We provide a comprehensive overview of the experimental procedures used for designing, creating, and characterizing hybrid fibrous nanorods using the fibritin trimerization motif.

  12. Classification of protein motifs based on subcellular localization uncovers evolutionary relationships at both sequence and functional levels

    PubMed Central

    2013-01-01

    Background: Most proteins have evolved in specific cellular compartments that limit their functions and potential interactions. On the other hand, motifs define amino acid arrangements conserved between protein family members and represent powerful tools for assigning function to protein sequences. The ideal motif would identify all members of a protein family but in practice many motifs identify both family members and unrelated proteins, referred to as True Positive (TP) and False Positive (FP) sequences, respectively. Results: To address the relationship between protein motifs, protein function and cellular localization, we systematically assigned subcellular localization data to motif sequences from the comprehensive PROSITE sequence motif database. Using this data we analyzed relationships between localization and function. We find that TPs and FPs have a strong tendency to localize in different compartments. When multiple localizations are considered, TPs are usually distributed between related cellular compartments. We also identified cases where FPs are concentrated in particular subcellular regions, indicating possible functional or evolutionary relationships with TP sequences of the same motif. Conclusions: Our findings suggest that the systematic examination of subcellular localization has the potential to uncover evolutionary and functional relationships between motif-containing sequences. We believe that this type of analysis complements existing motif annotations and could aid in their interpretation. Our results shed light on the evolution of cellular organelles and potentially establish the basis for new subcellular localization and function prediction algorithms. PMID:23865897

  13. Discovering active motifs in sets of related protein sequences and using them for classification.

    PubMed Central

    Wang, J T; Marr, T G; Shasha, D; Shapiro, B A; Chirn, G W

    1994-01-01

    We describe a method for discovering active motifs in a set of related protein sequences. The method is an automatic two-step process: (1) find candidate motifs in a small sample of the sequences; (2) test whether these motifs are approximately present in all the sequences. To reduce the running time, we develop two optimization heuristics based on statistical estimation and pattern matching techniques. Experimental results obtained by running these algorithms on generated data and functionally related proteins demonstrate the good performance of the presented method compared with the visual method of O'Farrell and Leopold. By combining the discovered motifs with an existing fingerprint technique, we develop a protein classifier. When we apply the classifier to the 698 groups of related proteins in the PROSITE catalog, it gives information that is complementary to the BLOCKS protein classifier of Henikoff and Henikoff. Thus, using our classifier in conjunction with theirs, one can obtain high confidence classifications (if BLOCKS and our classifier agree) or suggest a new hypothesis (if the two disagree). PMID:8052532
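    The two-step process described in this abstract (collect candidates from a small sample, then test approximate presence in all sequences) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: candidates are fixed-width substrings, "approximately present" is taken to mean a Hamming distance within a chosen limit, the sample is simply the first few sequences, and the optimization heuristics from the paper are omitted. All names and the toy sequences are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def approx_present(motif, sequence, max_mismatches):
    """True if some window of the sequence is within max_mismatches of motif."""
    width = len(motif)
    return any(
        hamming(motif, sequence[i:i + width]) <= max_mismatches
        for i in range(len(sequence) - width + 1)
    )


def discover(sequences, width, sample_size, max_mismatches):
    """Step 1: candidates from a sample. Step 2: keep those found everywhere."""
    sample = sequences[:sample_size]
    candidates = {
        seq[i:i + width]
        for seq in sample
        for i in range(len(seq) - width + 1)
    }
    return sorted(
        motif for motif in candidates
        if all(approx_present(motif, seq, max_mismatches) for seq in sequences)
    )


# Toy sequences (made up for illustration).
toy = ["MKHELLW", "AAHELIWQ", "HQLLWPP"]
print(discover(toy, width=5, sample_size=1, max_mismatches=1))
```

    Restricting candidate generation to a sample keeps the candidate set small, and the second pass over all sequences filters out candidates that are specific to the sample.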

  14. Identification of potential regulatory motifs in odorant receptor genes by analysis of promoter sequences

    PubMed Central

    Michaloski, Jussara S.; Galante, Pedro A.F.

    2006-01-01

    Mouse odorant receptors (ORs) are encoded by >1000 genes dispersed throughout the genome. Each olfactory neuron expresses one single OR gene, while the rest of the genes remain silent. The mechanisms underlying OR gene expression are poorly understood. Here, we investigated if OR genes share common cis-regulatory sequences in their promoter regions. We carried out a comprehensive analysis in which the upstream regions of a large number of OR genes were compared. First, using RLM-RACE, we generated cDNAs containing the complete 5′-untranslated regions (5′-UTRs) for a total number of 198 mouse OR genes. Then, we aligned these cDNA sequences to the mouse genome so that the 5′ structure and transcription start sites (TSSs) of the OR genes could be precisely determined. Sequences upstream of the TSSs were retrieved and browsed for common elements. We found DNA sequence motifs that are overrepresented in the promoter regions of the OR genes. Most motifs resemble O/E-like sites and are preferentially localized within 200 bp upstream of the TSSs. Finally, we show that these motifs specifically interact with proteins extracted from nuclei prepared from the olfactory epithelium, but not from brain or liver. Our results show that the OR genes share common promoter elements. The present strategy should provide information on the role played by cis-regulatory sequences in OR gene regulation. PMID:16902085

  15. Defining a Conformational Consensus Motif in Cotransin-Sensitive Signal Sequences: A Proteomic and Site-Directed Mutagenesis Study

    PubMed Central

    Klein, Wolfgang; Westendorf, Carolin; Schmidt, Antje; Conill-Cortés, Mercè; Rutz, Claudia; Blohs, Marcus; Beyermann, Michael; Protze, Jonas; Krause, Gerd; Krause, Eberhard; Schülein, Ralf

    2015-01-01

    The cyclodepsipeptide cotransin was described to inhibit the biosynthesis of a small subset of proteins by a signal sequence-discriminatory mechanism at the Sec61 protein-conducting channel. However, it was not clear how selective cotransin is, i.e. how many proteins are sensitive. Moreover, a consensus motif in signal sequences mediating cotransin sensitivity has not yet been described. To address these questions, we performed a proteomic study using cotransin-treated human hepatocellular carcinoma cells and the stable isotope labelling by amino acids in cell culture technique in combination with quantitative mass spectrometry. We used a saturating concentration of cotransin (30 micromolar) to also identify less-sensitive proteins and to discriminate the latter from completely resistant proteins. We found that the biosynthesis of almost all secreted proteins was cotransin-sensitive under these conditions. In contrast, biosynthesis of the majority of the integral membrane proteins was cotransin-resistant. Cotransin sensitivity of signal sequences was neither related to their length nor to their hydrophobicity. Instead, in the case of signal anchor sequences, we identified for the first time a conformational consensus motif mediating cotransin sensitivity. PMID:25806945

  16. Using machine learning to predict gene expression and discover sequence motifs

    NASA Astrophysics Data System (ADS)

    Li, Xuejing

    Recently, large amounts of experimental data for complex biological systems have become available. We use tools and algorithms from machine learning to build data-driven predictive models. We first present a novel algorithm to discover gene sequence motifs associated with temporal expression patterns of genes. Our algorithm, which is based on partial least squares (PLS) regression, is able to directly model the flow of information, from gene sequence to gene expression, to learn cis regulatory motifs and characterize associated gene expression patterns. Our algorithm outperforms traditional computational methods e.g. clustering in motif discovery. We then present a study of extending a machine learning model for transcriptional regulation predictive of genetic regulatory response to Caenorhabditis elegans. We show meaningful results both in terms of prediction accuracy on the test experiments and biological information extracted from the regulatory program. The model discovers DNA binding sites ab initio. We also present a case study where we detect a signal of lineage-specific regulation. Finally we present a comparative study on learning predictive models for motif discovery, based on different boosting algorithms: Adaptive Boosting (AdaBoost), Linear Programming Boosting (LPBoost) and Totally Corrective Boosting (TotalBoost). We evaluate and compare the performance of the three boosting algorithms via both statistical and biological validation, for hypoxia response in Saccharomyces cerevisiae.

  17. Sequence-specific intramembrane proteolysis: identification of a recognition motif in rhomboid substrates.

    PubMed

    Strisovsky, Kvido; Sharpe, Hayley J; Freeman, Matthew

    2009-12-25

    Members of the widespread rhomboid family of intramembrane proteases cleave transmembrane domain (TMD) proteins to regulate processes as diverse as EGF receptor signaling, mitochondrial dynamics, and invasion by apicomplexan parasites. However, lack of information about their substrates means that the biological role of most rhomboids remains obscure. Knowledge of how rhomboids recognize their substrates would illuminate their mechanism and might also allow substrate prediction. Previous work has suggested that rhomboid substrates are specified by helical instability in their TMD. Here we demonstrate that rhomboids instead primarily recognize a specific sequence surrounding the cleavage site. This recognition motif is necessary for substrate cleavage, it determines the cleavage site, and it is more strictly required than TM helix-destabilizing residues. Our work demonstrates that intramembrane proteases can be sequence specific and that genome-wide substrate prediction based on their recognition motifs is feasible. PMID:20064469

  18. A Convex Atomic-Norm Approach to Multiple Sequence Alignment and Motif Discovery

    PubMed Central

    Yen, Ian E. H.; Lin, Xin; Zhang, Jiong; Ravikumar, Pradeep; Dhillon, Inderjit S.

    2016-01-01

    Multiple Sequence Alignment and Motif Discovery, both known to be NP-hard problems, are two fundamental tasks in Bioinformatics. Existing approaches to these two problems are based on either local search methods, such as Expectation Maximization (EM) and Gibbs Sampling, or greedy heuristic methods. In this work, we develop a convex relaxation approach to both problems based on the recent concept of atomic norm and develop a new algorithm, termed Greedy Direction Method of Multiplier, for solving the convex relaxation with two convex atomic constraints. Experiments show that our convex relaxation approach produces solutions of higher quality than the standard tools widely used in the Bioinformatics community on the Multiple Sequence Alignment and Motif Discovery problems. PMID:27559428

  19. CyanoLyase: a database of phycobilin lyase sequences, motifs and functions

    PubMed Central

    Bretaudeau, Anthony; Coste, François; Humily, Florian; Garczarek, Laurence; Le Corguillé, Gildas; Six, Christophe; Ratin, Morgane; Collin, Olivier; Schluchter, Wendy M.; Partensky, Frédéric

    2013-01-01

    CyanoLyase (http://cyanolyase.genouest.org/) is a manually curated sequence and motif database of phycobilin lyases and related proteins. These enzymes catalyze the covalent ligation of chromophores (phycobilins) to specific binding sites of phycobiliproteins (PBPs). The latter constitute the building bricks of phycobilisomes, the major light-harvesting systems of cyanobacteria and red algae. Phycobilin lyase sequences are poorly annotated in public databases. Sequences included in CyanoLyase were retrieved from all available genomes of these organisms and a few others by similarity searches using biochemically characterized enzyme sequences and then classified into 3 clans and 32 families. Amino acid motifs were computed for each family using Protomata learner. CyanoLyase also includes BLAST and a novel pattern matching tool (Protomatch) that allow users to rapidly retrieve and annotate lyases from any new genome. In addition, it provides phylogenetic analyses of all phycobilin lyase families, describes their function, their presence/absence in all genomes of the database (phyletic profiles) and predicts the chromophorylation of PBPs in each strain. The site also includes a thorough bibliography about phycobilin lyases and genomes included in the database. This resource should be useful to scientists and companies interested in natural or artificial PBPs, which have a number of biotechnological applications, notably as fluorescent markers. PMID:23175607

  20. Gapped alignment of protein sequence motifs through Monte Carlo optimization of a hidden Markov model

    PubMed Central

    Neuwald, Andrew F; Liu, Jun S

    2004-01-01

    Background: Certain protein families are highly conserved across distantly related organisms and belong to large and functionally diverse superfamilies. The patterns of conservation present in these protein sequences presumably are due to selective constraints maintaining important but unknown structural mechanisms with some constraints specific to each family and others shared by a larger subset or by the entire superfamily. To exploit these patterns as a source of functional information, we recently devised a statistically based approach called contrast hierarchical alignment and interaction network (CHAIN) analysis, which infers the strengths of various categories of selective constraints from co-conserved patterns in a multiple alignment. The power of this approach strongly depends on the quality of the multiple alignments, which thus motivated development of theoretical concepts and strategies to improve alignment of conserved motifs within large sets of distantly related sequences. Results: Here we describe a hidden Markov model (HMM), an algebraic system, and Markov chain Monte Carlo (MCMC) sampling strategies for alignment of multiple sequence motifs. The MCMC sampling strategies are useful both for alignment optimization and for adjusting position specific background amino acid frequencies for alignment uncertainties. Associated statistical formulations provide an objective measure of alignment quality as well as automatic gap penalty optimization. Improved alignments obtained in this way are compared with PSI-BLAST based alignments within the context of CHAIN analysis of three protein families: Giα subunits, prolyl oligopeptidases, and transitional endoplasmic reticulum (p97) AAA+ ATPases. Conclusion: While not entirely replacing PSI-BLAST based alignments, which likewise may be optimized for CHAIN analysis using this approach, these motif-based methods often more accurately align very distantly related sequences and thus can provide a better measure of

  1. CDR3β sequence motifs regulate autoreactivity of human invariant NKT cell receptors.

    PubMed

    Chamoto, Kenji; Guo, Tingxi; Imataki, Osamu; Tanaka, Makito; Nakatsugawa, Munehide; Ochi, Toshiki; Yamashita, Yuki; Saito, Akiko M; Saito, Toshiki I; Butler, Marcus O; Hirano, Naoto

    2016-04-01

    Invariant natural killer T (iNKT) cells are a subset of T lymphocytes that recognize lipid ligands presented by monomorphic CD1d. Human iNKT T cell receptor (TCR) is largely composed of invariant Vα24 (Vα24i) TCRα chain and semi-variant Vβ11 TCRβ chain, where complementarity-determining region (CDR)3β is the sole variable region. One of the characteristic features of iNKT cells is that they retain autoreactivity even after the thymic selection. However, the molecular features of human iNKT TCR CDR3β sequences that regulate autoreactivity remain unknown. Since the number of iNKT cells with detectable autoreactivity in peripheral blood is limited, we introduced the Vα24i gene into peripheral T cells and generated a de novo human iNKT TCR repertoire. By stimulating the transfected T cells with artificial antigen presenting cells (aAPCs) presenting self-ligands, we enriched strongly autoreactive iNKT TCRs and isolated a large panel of human iNKT TCRs with a broad range of autoreactivity. From this panel of unique iNKT TCRs, we deciphered three CDR3β sequence motifs frequently encoded by strongly-autoreactive iNKT TCRs: a VD region with 2 or more acidic amino acids, usage of the Jβ2-5 allele, and a CDR3β region of 13 amino acids in length. iNKT TCRs encoding 2 or 3 sequence motifs also exhibit higher autoreactivity than those encoding 0 or 1 motifs. These data facilitate our understanding of the molecular basis for human iNKT cell autoreactivity involved in immune responses associated with human disease. PMID:26748722
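    The three sequence features listed in this abstract lend themselves to a simple per-receptor count. The sketch below is only an illustration of that bookkeeping: the function name, the input fields (a pre-extracted VD region, a J segment label, and the CDR3β string), and the example values are assumptions, and how the VD region is delimited is not specified here.

```python
ACIDIC = set("DE")  # aspartate and glutamate


def count_motifs(vd_region, j_segment, cdr3_beta):
    """Count how many of the three reported features a receptor carries."""
    features = [
        sum(residue in ACIDIC for residue in vd_region.upper()) >= 2,
        j_segment == "Jβ2-5",
        len(cdr3_beta) == 13,
    ]
    return sum(features)


# Made-up inputs, not sequences from the study.
print(count_motifs("EDGG", "Jβ2-5", "CASSEDGGTQYFG"))
print(count_motifs("GG", "Jβ2-1", "CASSGGTQYF"))
```

    Receptors could then be grouped by this count (0 or 1 versus 2 or 3) to mirror the comparison reported in the abstract.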

  2. Identification of sequence motifs involved in Dengue virus-host interactions.

    PubMed

    Asnet Mary, J; Paramasivan, R; Shenbagarathai, R

    2016-03-01

    Dengue fever is a rapidly spreading mosquito-borne virus infection, which remains a serious global public health problem. As there is no specific treatment or commercial vaccine available for effective control of the disease, attempts to develop novel control strategies are underway. Viruses utilize the surface receptor proteins of the host to enter cells. Though various proteins were said to be receptors of Dengue virus (DENV) using Virus Overlay Protein Binding Assay, the precise interaction between DENV and host has not been explored. Understanding the structural features of domain III envelope glycoprotein would help in developing efficient antiviral inhibitors. Therefore, an attempt was made to identify the sequence motifs present in domain III envelope glycoprotein of Dengue virus. Computational analysis revealed that the NGR motif is present in the domain III envelope glycoprotein of DENV-1 and DENV-3. Similarly, DENV-1, DENV-2 and DENV-4 were found to contain the Yxxphi motif, which is a tyrosine-based sorting signal responsible for the interaction with a mu subunit of adaptor protein complex. High-throughput virtual screening resulted in five compounds as lead molecules based on glide score, which ranges from -4.664 to -6.52 kcal/mol. This computational prediction provides an additional tool for understanding the virus-host interactions and helps to identify potential targets in the host. Further, experimental evidence is warranted to confirm the virus-host interactions and also the inhibitory activity of the reported lead compounds. PMID:25905427
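
    The motif scan described in this abstract can be illustrated with a short pattern-matching sketch. This is not the authors' pipeline: the function name scan_motifs is hypothetical, and the residue class used for "phi" (bulky hydrophobic residues L, I, M, F, V) is an assumption based on the usual definition of the Yxxphi sorting signal.

```python
import re

# Illustrative patterns only. The residue class chosen for "phi" is an
# assumption (bulky hydrophobic residues), not taken from the abstract.
MOTIFS = {
    "NGR": re.compile(r"(?=NGR)"),
    "Yxxphi": re.compile(r"(?=Y[A-Z]{2}[LIMFV])"),
}


def scan_motifs(protein):
    """Return {motif name: 0-based start positions}, overlapping hits included."""
    seq = protein.upper()
    return {
        name: [m.start() for m in pattern.finditer(seq)]
        for name, pattern in MOTIFS.items()
    }


if __name__ == "__main__":
    # Toy peptide, not a real DENV envelope sequence.
    print(scan_motifs("MNGRAYQRLK"))
```

    The zero-width lookahead lets overlapping occurrences be reported, which matters for short motifs in repetitive regions.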

  3. Sequence-motif Detection of NAD(P)-binding Proteins: Discovery of a Unique Antibacterial Drug Target

    NASA Astrophysics Data System (ADS)

    Hua, Yun Hao; Wu, Chih Yuan; Sargsyan, Karen; Lim, Carmay

    2014-09-01

    Many enzymes use nicotinamide adenine dinucleotide or nicotinamide adenine dinucleotide phosphate (NAD(P)) as essential coenzymes. These enzymes often do not share significant sequence identity and cannot be easily detected by sequence homology. Previously, we determined all distinct locally conserved pyrophosphate-binding structures (3d motifs) from NAD(P)-bound protein structures, from which 1d sequence motifs were derived. Here, we aim to establish the precision of these 3d and 1d motifs to annotate NAD(P)-binding proteins. We show that the pyrophosphate-binding 3d motifs are characteristic of NAD(P)-binding proteins, as they are rarely found in nonNAD(P)-binding proteins. Furthermore, several 1d motifs could distinguish between proteins that bind only NAD and those that bind only NADP. They could also distinguish NAD(P)-binding proteins from nonNAD(P)-binding ones. Interestingly, one of the pyrophosphate-binding 3d and corresponding 1d motifs was found only in enoyl-acyl carrier protein reductases, which are enzymes essential for bacterial fatty acid biosynthesis. This unique 3d motif serves as an attractive novel drug target, as it is conserved across many bacterial species and is not found in human proteins.

  4. Sequence-motif Detection of NAD(P)-binding Proteins: Discovery of a Unique Antibacterial Drug Target

    PubMed Central

    Hua, Yun Hao; Wu, Chih Yuan; Sargsyan, Karen; Lim, Carmay

    2014-01-01

    Many enzymes use nicotinamide adenine dinucleotide or nicotinamide adenine dinucleotide phosphate (NAD(P)) as essential coenzymes. These enzymes often do not share significant sequence identity and cannot be easily detected by sequence homology. Previously, we determined all distinct locally conserved pyrophosphate-binding structures (3d motifs) from NAD(P)-bound protein structures, from which 1d sequence motifs were derived. Here, we aim to establish the precision of these 3d and 1d motifs to annotate NAD(P)-binding proteins. We show that the pyrophosphate-binding 3d motifs are characteristic of NAD(P)-binding proteins, as they are rarely found in nonNAD(P)-binding proteins. Furthermore, several 1d motifs could distinguish between proteins that bind only NAD and those that bind only NADP. They could also distinguish NAD(P)-binding proteins from nonNAD(P)-binding ones. Interestingly, one of the pyrophosphate-binding 3d and corresponding 1d motifs was found only in enoyl-acyl carrier protein reductases, which are enzymes essential for bacterial fatty acid biosynthesis. This unique 3d motif serves as an attractive novel drug target, as it is conserved across many bacterial species and is not found in human proteins. PMID:25253464

  5. Role of two sequence motifs of mesencephalic astrocyte-derived neurotrophic factor in its survival-promoting activity

    PubMed Central

    Mätlik, K; Yu, Li-ying; Eesmaa, A; Hellman, M; Lindholm, P; Peränen, J; Galli, E; Anttila, J; Saarma, M; Permi, P; Airavaara, M; Arumäe, U

    2015-01-01

    Mesencephalic astrocyte-derived neurotrophic factor (MANF) is a prosurvival protein that protects the cells when applied intracellularly in vitro or extracellularly in vivo. Its protective mechanisms are poorly known. Here we studied the role of two short sequence motifs within the carboxy-(C) terminal domain of MANF in its neuroprotective activity: the CKGC sequence (a CXXC motif) that could be involved in redox reactions, and the C-terminal RTDL sequence, an endoplasmic reticulum (ER) retention signal. We mutated these motifs and analyzed the antiapoptotic effect and intracellular localization of these mutants of MANF when overexpressed in cultured sympathetic or sensory neurons. As an in vivo model for studying the effect of these mutants after their extracellular application, we used the rat model of cerebral ischemia. Even though we found no evidence for oxidoreductase activity of MANF, the mutation of CXXC motif completely abolished its protective effect, showing that this motif is crucial for both MANF's intracellular and extracellular activity. The RTDL motif was not needed for the neuroprotective activity of MANF after its extracellular application in the stroke model in vivo. However, in vitro the deletion of RTDL motif inactivated MANF in the sympathetic neurons where the mutant protein localized to Golgi, but not in the sensory neurons where the mutant localized to the ER, showing that intracellular MANF protects these peripheral neurons in vitro only when localized to the ER. PMID:26720341

  6. qPMS7: a fast algorithm for finding (ℓ, d)-motifs in DNA and protein sequences.

    PubMed

    Dinh, Hieu; Rajasekaran, Sanguthevar; Davila, Jaime

    2012-01-01

    Detection of rare events happening in a set of DNA/protein sequences could lead to new biological discoveries. One such kind of rare event is the presence of patterns called motifs in DNA/protein sequences. Finding motifs is a challenging problem since the general version of motif search has been proven to be intractable. Motif discovery is an important problem in biology. For example, it is useful in the detection of transcription factor binding sites and transcriptional regulatory elements that are crucial in understanding gene function, human disease, drug design, etc. Many versions of the motif search problem have been proposed in the literature. One such is the (ℓ, d)-motif search (or Planted Motif Search (PMS)). A generalized version of the PMS problem, namely, Quorum Planted Motif Search (qPMS), is shown to accurately model motifs in real data. However, solving the qPMS problem is an extremely difficult task because a special case of it, the PMS Problem, is already NP-hard, which means that any algorithm solving it can be expected to take exponential time in the worst-case scenario. In this paper, we propose a novel algorithm named qPMS7 that tackles the qPMS problem on real data as well as challenging instances. Experimental results show that our Algorithm qPMS7 is on average 5 times faster than the state-of-the-art algorithm. The executable program of Algorithm qPMS7 is freely available on the web at http://pms.engr.uconn.edu/downloads/qPMS7.zip. Our online motif discovery tools that use Algorithm qPMS7 are freely available at http://pms.engr.uconn.edu or http://motifsearch.com. PMID:22848493
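
    To make the (ℓ, d)-motif and quorum definitions concrete, here is a naive exhaustive sketch. It is not qPMS7 and makes no attempt at its speed; it only spells out the problem statement. The function names and the integer quorum parameter q (minimum number of sequences that must contain an occurrence) are illustrative assumptions.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def neighbors(word, d, alphabet="ACGT"):
    """All strings within Hamming distance d of word."""
    result = set()
    for k in range(d + 1):
        for positions in combinations(range(len(word)), k):
            for letters in product(alphabet, repeat=k):
                candidate = list(word)
                for pos, letter in zip(positions, letters):
                    candidate[pos] = letter
                result.add("".join(candidate))
    return result


def occurs_within(motif, seq, d):
    """True if some window of seq is within distance d of motif."""
    width = len(motif)
    return any(
        hamming(motif, seq[i:i + width]) <= d
        for i in range(len(seq) - width + 1)
    )


def planted_motifs(seqs, length, d, q=None, alphabet="ACGT"):
    """Brute-force (length, d)-motifs present in at least q sequences.

    q defaults to len(seqs), which gives the plain PMS problem.
    Running time is exponential in d; this is a definition, not a solver.
    """
    if q is None:
        q = len(seqs)
    candidates = set()
    for seq in seqs:
        for i in range(len(seq) - length + 1):
            candidates |= neighbors(seq[i:i + length], d, alphabet)
    return sorted(
        m for m in candidates
        if sum(occurs_within(m, s, d) for s in seqs) >= q
    )


if __name__ == "__main__":
    toy = ["TTACGTAA", "GGACCTCC", "CAACGAGT"]
    print("ACGT" in planted_motifs(toy, length=4, d=1))
```

    Every valid motif lies within distance d of some window of the input, so enumerating window neighborhoods is sufficient to find all of them; practical algorithms prune this candidate set far more aggressively.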

  7. Unique Structural Features and Sequence Motifs of Proline Utilization A (PutA)

    PubMed Central

    Singh, Ranjan K.; Tanner, John J.

    2013-01-01

    Proline utilization A proteins (PutAs) are bifunctional enzymes that catalyze the oxidation of proline to glutamate using spatially separated proline dehydrogenase and pyrroline-5-carboxylate dehydrogenase active sites. Here we use the crystal structure of the minimalist PutA from Bradyrhizobium japonicum (BjPutA) along with sequence analysis to identify unique structural features of PutAs. This analysis shows that PutAs have secondary structural elements and domains not found in the related monofunctional enzymes. Some of these extra features are predicted to be important for substrate channeling in BjPutA. Multiple sequence alignment analysis shows that some PutAs have a 17-residue conserved motif in the C-terminal 20–30 residues of the polypeptide chain. The BjPutA structure shows that this motif helps seal the internal substrate-channeling cavity from the bulk medium. Finally, it is shown that some PutAs have a 100–200 residue domain of unknown function in the C-terminus that is not found in minimalist PutAs. Remote homology detection suggests that this domain is homologous to the oligomerization beta-hairpin and Rossmann fold domain of BjPutA. PMID:22201760

  8. Unique structural features and sequence motifs of proline utilization A (PutA).

    PubMed

    Singh, Ranjan K; Tanner, John J

    2012-01-01

    Proline utilization A proteins (PutAs) are bifunctional enzymes that catalyze the oxidation of proline to glutamate using spatially separated proline dehydrogenase and pyrroline-5-carboxylate dehydrogenase active sites. Here we use the crystal structure of the minimalist PutA from Bradyrhizobium japonicum (BjPutA) along with sequence analysis to identify unique structural features of PutAs. This analysis shows that PutAs have secondary structural elements and domains not found in the related monofunctional enzymes. Some of these extra features are predicted to be important for substrate channeling in BjPutA. Multiple sequence alignment analysis shows that some PutAs have a 17-residue conserved motif in the C-terminal 20-30 residues of the polypeptide chain. The BjPutA structure shows that this motif helps seal the internal substrate-channeling cavity from the bulk medium. Finally, it is shown that some PutAs have a 100-200 residue domain of unknown function in the C-terminus that is not found in minimalist PutAs. Remote homology detection suggests that this domain is homologous to the oligomerization beta-hairpin and Rossmann fold domain of BjPutA. PMID:22201760

  9. Quadfinder: server for identification and analysis of quadruplex-forming motifs in nucleotide sequences

    PubMed Central

    Scaria, Vinod; Hariharan, Manoj; Arora, Amit; Maiti, Souvik

    2006-01-01

    G-quadruplex secondary structures, which play a structural role in repetitive DNA such as telomeres, may also play a functional role at other genomic locations as targetable regulatory elements which control gene expression. The recent interest in application of quadruplexes in biological systems prompted us to develop a tool for the identification and analysis of quadruplex-forming nucleotide sequences, especially in RNA. Here we present Quadfinder, an online server for prediction and bioinformatics of uni-molecular quadruplex-forming nucleotide sequences. The server is designed to be user-friendly and needs minimal intervention by the user, while providing flexibility in defining the variants of the motif. The server is freely available at URL . PMID:16845097
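
    A quadruplex-forming motif search of this kind is often expressed as a regular expression over four G-tracts separated by loops. The sketch below assumes the widely used consensus of four runs of at least three G's with loops of 1 to 7 nucleotides; the abstract does not state Quadfinder's actual defaults, so the parameter names and default values here are assumptions.

```python
import re


def quadruplex_regex(min_g=3, min_loop=1, max_loop=7):
    """Build a pattern for four G-tracts separated by three loops.

    Defaults follow a commonly used consensus and are assumptions,
    not Quadfinder's documented settings.
    """
    g_tract = "G{%d,}" % min_g
    loop = "[ACGTU]{%d,%d}" % (min_loop, max_loop)
    return re.compile(g_tract + (loop + g_tract) * 3)


def find_quadruplex_motifs(seq, **kwargs):
    """Return (start, end, matched text) for non-overlapping hits in DNA or RNA."""
    pattern = quadruplex_regex(**kwargs)
    return [
        (m.start(), m.end(), m.group())
        for m in pattern.finditer(seq.upper())
    ]


if __name__ == "__main__":
    # Human telomeric repeat flanked by two A's on each side.
    print(find_quadruplex_motifs("aagggttagggttagggttagggaa"))
```

    Exposing the tract and loop lengths as parameters mirrors the abstract's point about letting the user define variants of the motif.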

  10. Sequence Analysis and Domain Motifs in the Porcine Skin Decorin Glycosaminoglycan Chain

    PubMed Central

    Zhao, Xue; Yang, Bo; Solakylidirim, Kemal; Joo, Eun Ji; Toida, Toshihiko; Higashi, Kyohei; Linhardt, Robert J.; Li, Lingyun

    2013-01-01

    Decorin proteoglycan is composed of a core protein containing a single O-linked dermatan sulfate/chondroitin sulfate glycosaminoglycan (GAG) chain. Although the sequence of the decorin core protein is determined by the gene encoding its structure, the structure of its GAG chain is determined in the Golgi. The recent application of modern MS to bikunin, a far simpler chondroitin sulfate proteoglycan, suggests that it has a single or small number of defined sequences. On this basis, a similar approach was undertaken to sequence the much larger and more structurally complex dermatan sulfate/chondroitin sulfate GAG chain of porcine skin decorin. This approach resulted in information on the consistency/variability of its linkage region at the reducing end of the GAG chain, its iduronic acid-rich domain, glucuronic acid-rich domain, and non-reducing end. A general motif for the porcine skin decorin GAG chain was established. A single small decorin GAG chain was sequenced using MS/MS analysis. The data obtained in the study suggest that the decorin GAG chain has a small or a limited number of sequences. PMID:23423381

  11. A search for small noncoding RNAs in Staphylococcus aureus reveals a conserved sequence motif for regulation

    PubMed Central

    Geissmann, Thomas; Chevalier, Clément; Cros, Marie-Josée; Boisset, Sandrine; Fechter, Pierre; Noirot, Céline; Schrenzel, Jacques; François, Patrice; Vandenesch, François; Gaspin, Christine; Romby, Pascale

    2009-01-01

    Bioinformatic analysis of the intergenic regions of Staphylococcus aureus predicted multiple regulatory regions. From this analysis, we characterized 11 novel noncoding RNAs (RsaA-K) that are expressed in several S. aureus strains under different experimental conditions. Many of them accumulate in the late-exponential phase of growth. All ncRNAs are stable and their expression is Hfq-independent. The transcription of several of them is regulated by the alternative sigma B factor (RsaA, D and F) while the expression of RsaE is agrA-dependent. Six of these ncRNAs are specific to S. aureus, four are conserved in other Staphylococci, and RsaE is also present in Bacillaceae. Transcriptomic and proteomic analysis indicated that RsaE regulates the synthesis of proteins involved in various metabolic pathways. Phylogenetic analysis combined with RNA structure probing, searches for RsaE-mRNA base pairing, and toeprinting assays indicate that a conserved and unpaired UCCC sequence motif of RsaE binds to target mRNAs and prevents the formation of the ribosomal initiation complex. This study unexpectedly shows that most of the novel ncRNAs carry the conserved C-rich motif, suggesting that they are members of a class of ncRNAs that target mRNAs by a shared mechanism. PMID:19786493

  12. Sequence motifs and prokaryotic expression of the reptilian paramyxovirus fusion protein

    USGS Publications Warehouse

    Franke, J.; Batts, W.N.; Ahne, W.; Kurath, G.; Winton, J.R.

    2006-01-01

    Fourteen reptilian paramyxovirus isolates were chosen to represent the known extent of genetic diversity among this novel group of viruses. Selected regions of the fusion (F) gene were sequenced, analyzed and compared. The F gene of all isolates contained conserved motifs homologous to those described for other members of the family Paramyxoviridae including: signal peptide, transmembrane domain, furin cleavage site, fusion peptide, N-linked glycosylation sites, and two heptad repeats, the second of which (HRB-LZ) had the characteristics of a leucine zipper. Selected regions of the fusion gene of isolate Gono-GER85 were inserted into a prokaryotic expression system to generate three recombinant protein fragments of various sizes. The longest recombinant protein was cleaved by furin into two fragments of predicted length. Western blot analysis with virus-neutralizing rabbit-antiserum against this isolate demonstrated that only the longest construct reacted with the antiserum. This construct was unique in containing 30 additional C-terminal amino acids that included most of the HRB-LZ. These results indicate that the F genes of reptilian paramyxoviruses contain highly conserved motifs typical of other members of the family and suggest that the HRB-LZ domain of the reptilian paramyxovirus F protein contains a linear antigenic epitope. © Springer-Verlag 2005.

  13. Temporal motifs reveal homophily, gender-specific patterns, and group talk in call sequences.

    PubMed

    Kovanen, Lauri; Kaski, Kimmo; Kertész, János; Saramäki, Jari

    2013-11-01

    Recent studies on electronic communication records have shown that human communication has complex temporal structure. We study how communication patterns that involve multiple individuals are affected by attributes such as sex and age. To this end, we represent the communication records as a colored temporal network where node color is used to represent individuals' attributes, and identify patterns known as temporal motifs. We then construct a null model for the occurrence of temporal motifs that takes into account the interaction frequencies and connectivity between nodes of different colors. This null model allows us to detect significant patterns in call sequences that cannot be observed in a static network that uses interaction frequencies as link weights. We find sex-related differences in communication patterns in a large dataset of mobile phone records and show the existence of temporal homophily, the tendency of similar individuals to participate in communication patterns beyond what would be expected on the basis of their average interaction frequencies. We also show that temporal patterns differ between dense and sparse neighborhoods in the network. Because this result is also independent of interaction frequencies, it can be seen as an extension of Granovetter's hypothesis to temporal networks. PMID:24145424

  14. Exceptional motifs in different Markov chain models for a statistical analysis of DNA sequences.

    PubMed

    Schbath, S; Prum, B; de Turckheim, E

    1995-01-01

    Identifying exceptional motifs is often used for extracting information from long DNA sequences. The two difficulties of the method are the choice of the model that defines the expected frequencies of words and the approximation of the variance of the difference T(W) between the number of occurrences of a word W and its estimation. We consider here different Markov chain models, either with stationary or periodic transition probabilities. We estimate the variance of the difference T(W) by the conditional variance of the number of occurrences of W given the oligonucleotides counts that define the model. Two applications show how to use asymptotically standard normal statistics associated with the counts to describe a given sequence in terms of its outlying words. Sequences of Escherichia coli and of Bacillus subtilis are compared with respect to their exceptional tri- and tetranucleotides. For both bacteria, exceptional 3-words are mainly found in the coding frame. E. coli palindrome counts are analyzed in different models, showing that many overabundant words are one-letter mutations of avoided palindromes. PMID:8521272
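
    The idea of comparing a word's observed count with its expectation under a Markov chain model can be sketched as follows. This uses the standard plug-in estimator for the maximal-order model (order |W| - 2), in which the expected count of W = w1...wk is N(w1...wk-1) N(w2...wk) / N(w2...wk-1). It is an assumption that this matches the estimator used in the paper, and the variance term needed for the normalized statistic is deliberately left out. Function names are illustrative.

```python
def count_overlapping(seq, word):
    """Count occurrences of word in seq, allowing overlaps."""
    return sum(
        seq.startswith(word, i)
        for i in range(len(seq) - len(word) + 1)
    )


def expected_count(seq, word):
    """Plug-in expected count of word under a Markov model of order len(word) - 2.

    E(W) = N(prefix) * N(suffix) / N(core), where prefix and suffix drop
    the last and first letter of W and core drops both.
    """
    if len(word) < 3:
        raise ValueError("word must have at least 3 letters")
    prefix, suffix, core = word[:-1], word[1:], word[1:-1]
    n_core = count_overlapping(seq, core)
    if n_core == 0:
        return 0.0
    return count_overlapping(seq, prefix) * count_overlapping(seq, suffix) / n_core


def observed_minus_expected(seq, word):
    """Unnormalized difference between observed and expected counts."""
    return count_overlapping(seq, word) - expected_count(seq, word)


if __name__ == "__main__":
    toy = "ACGACGACG"
    print(count_overlapping(toy, "ACG"), expected_count(toy, "ACG"))
```

    A word is called exceptional only after this difference is divided by an estimate of its standard deviation, which is the part of the method the abstract identifies as difficult.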

  15. Platelet immunoreceptor tyrosine-based activation motif (ITAM) signaling and vascular integrity.

    PubMed

    Boulaftali, Yacine; Hess, Paul R; Kahn, Mark L; Bergmeier, Wolfgang

    2014-03-28

    Platelets are well-known for their critical role in hemostasis, that is, the prevention of blood loss at sites of mechanical vessel injury. Inappropriate platelet activation and adhesion, however, can lead to thrombotic complications, such as myocardial infarction and stroke. To fulfill its role in hemostasis, the platelet is equipped with various G protein-coupled receptors that mediate the response to soluble agonists such as thrombin, ADP, and thromboxane A2. In addition to G protein-coupled receptors, platelets express 3 glycoproteins that belong to the family of immunoreceptor tyrosine-based activation motif receptors: Fc receptor γ chain, which is noncovalently associated with the glycoprotein VI collagen receptor, C-type lectin 2, the receptor for podoplanin, and Fc receptor γII A, a low-affinity receptor for immune complexes. Although both genetic and chemical approaches have documented a critical role for platelet G protein-coupled receptors in hemostasis, the contribution of immunoreceptor tyrosine-based activation motif receptors to this process is less defined. Studies performed during the past decade, however, have identified new roles for platelet immunoreceptor tyrosine-based activation motif signaling in vascular integrity in utero and at sites of inflammation. The purpose of this review is to summarize recent findings on how platelet immunoreceptor tyrosine-based activation motif signaling controls vascular integrity, both in the presence and absence of mechanical injury. PMID:24677237

  16. The nature of actinomycin D binding to d(AACCAXYG) sequence motifs

    PubMed Central

    Chen, Fu-Ming; Sha, Feng; Chin, Ko-Hsin; Chou, Shan-Ho

    2004-01-01

    Earlier studies by others had indicated that actinomycin D (ACTD) binds well to d(AACCATAG) and the end sequence TAG-3′ is essential for its strong binding. In an effort to verify these assertions and to uncover other possible strong ACTD binding sequences as well as to elucidate the nature of their binding, systematic studies have been carried out with oligomers of d(AACCAXYG) sequence motifs, where X and Y can be any DNA base. The results indicate that in addition to TAG-3′, oligomers ending with XAG-3′ and XCG-3′ all provide binding constants ≥1 × 10^7 M^-1 and even sequences ending with XTG-3′ and XGG-3′ exhibit binding affinities in the range 1–8 × 10^6 M^-1. The nature of the strong ACTD affinity of the sequences d(A1A2C3C4A5X6Y7G8) was delineated via comparative binding studies of d(AACCAAAG), d(AGCCAAAG) and their base substituted derivatives. Two binding modes are proposed to coexist, with the major component consisting of the 3′-terminus G base folding back to base pair with C4 and the ACTD inserting at A2C3C4 by looping out the C3 while both faces of the chromophore are stacked by A and G bases, respectively. The minor mode is for the G to base pair with C3 and to have the same A/chromophore/G stacking but without a looped out base. These assertions are supported by induced circular dichroic and fluorescence spectral measurements. PMID:14715925

  17. Exploiting topological constraints to reveal buried sequence motifs in the membrane-bound N-linked oligosaccharyl transferases.

    PubMed

    Jaffee, Marcie B; Imperiali, Barbara

    2011-09-01

    The central enzyme in N-linked glycosylation is the oligosaccharyl transferase (OTase), which catalyzes glycan transfer from a polyprenyldiphosphate-linked carrier to select asparagines within acceptor proteins. PglB from Campylobacter jejuni is a single-subunit OTase with homology to the Stt3 subunit of the complex multimeric yeast OTase. Sequence identity between PglB and Stt3 is low (17.9%); however, both have a similar predicted architecture and contain the conserved WWDxG motif. To investigate the relationship between PglB and other Stt3 proteins, sequence analysis was performed using 28 homologues from evolutionarily distant organisms. Since detection of small conserved motifs within large membrane-associated proteins is complicated by divergent sequences surrounding the motifs, we developed a program to parse sequences according to predicted topology and then analyze topologically related regions. This approach identified three conserved motifs that served as the basis for subsequent mutagenesis and functional studies. This work reveals that several inter-transmembrane loop regions of PglB/Stt3 contain strictly conserved motifs that are essential for PglB function. The recent publication of a 3.4 Å resolution structure of full-length C. lari OTase provides clear structural evidence that these loops play a fundamental role in catalysis [Lizak, C.; (2011) Nature 474, 350-355]. The current study provides biochemical support for the role of the inter-transmembrane domain loops in OTase catalysis and demonstrates the utility of combining topology prediction and sequence analysis for exposing buried pockets of homology in large membrane proteins. The described approach allowed detection of the catalytic motifs prior to availability of structural data and reveals additional catalytically relevant residues that are not predicted by structural data alone. PMID:21812456

  18. Identification of Internal Transcribed Spacer Sequence Motifs in Truffles: a First Step toward Their DNA Bar Coding

    PubMed Central

    El Karkouri, Khalid; Murat, Claude; Zampieri, Elisa; Bonfante, Paola

    2007-01-01

    This work presents DNA sequence motifs from the internal transcribed spacer (ITS) of the nuclear rRNA repeat unit which are useful for the identification of five European and Asiatic truffles (Tuber magnatum, T. melanosporum, T. indicum, T. aestivum, and T. mesentericum). Truffles are edible mycorrhizal ascomycetes that show similar morphological characteristics but that have distinct organoleptic and economic values. A total of 36 out of 46 ITS1 or ITS2 sequence motifs have allowed an accurate in silico distinction of the five truffles to be made (i.e., by pattern matching and/or BLAST analysis on downloaded GenBank sequences and directly against GenBank databases). The motifs considered the intraspecific genetic variability of each species, including rare haplotypes, and assigned their respective species from either the ascocarps or ectomycorrhizas. The data indicate that short ITS1 or ITS2 motifs (≤50 bp in size) can be considered promising tools for truffle species identification. A dot blot hybridization analysis of T. magnatum and T. melanosporum compared with other close relatives or distant lineages allowed at least one highly specific motif to be identified for each species. These results were confirmed in a blind test which included new field isolates. The current work has provided a reliable new tool for a truffle oligonucleotide bar code and identification in ecological and evolutionary studies. PMID:17601808

  19. Identification of internal transcribed spacer sequence motifs in truffles: a first step toward their DNA bar coding.

    PubMed

    El Karkouri, Khalid; Murat, Claude; Zampieri, Elisa; Bonfante, Paola

    2007-08-01

    This work presents DNA sequence motifs from the internal transcribed spacer (ITS) of the nuclear rRNA repeat unit which are useful for the identification of five European and Asiatic truffles (Tuber magnatum, T. melanosporum, T. indicum, T. aestivum, and T. mesentericum). Truffles are edible mycorrhizal ascomycetes that show similar morphological characteristics but that have distinct organoleptic and economic values. A total of 36 out of 46 ITS1 or ITS2 sequence motifs have allowed an accurate in silico distinction of the five truffles to be made (i.e., by pattern matching and/or BLAST analysis on downloaded GenBank sequences and directly against GenBank databases). The motifs considered the intraspecific genetic variability of each species, including rare haplotypes, and assigned their respective species from either the ascocarps or ectomycorrhizas. The data indicate that short ITS1 or ITS2 motifs (≤50 bp in size) can be considered promising tools for truffle species identification. A dot blot hybridization analysis of T. magnatum and T. melanosporum compared with other close relatives or distant lineages allowed at least one highly specific motif to be identified for each species. These results were confirmed in a blind test which included new field isolates. The current work has provided a reliable new tool for a truffle oligonucleotide bar code and identification in ecological and evolutionary studies. PMID:17601808

  20. Protospacer Adjacent Motif (PAM)-Distal Sequences Engage CRISPR Cas9 DNA Target Cleavage

    PubMed Central

    Ethier, Sylvain; Schmeing, T. Martin; Dostie, Josée; Pelletier, Jerry

    2014-01-01

    The clustered regularly interspaced short palindromic repeat (CRISPR)-associated enzyme Cas9 is an RNA-guided nuclease that has been widely adapted for genome editing in eukaryotic cells. However, the in vivo target specificity of Cas9 is poorly understood and most studies rely on in silico predictions to define the potential off-target editing spectrum. Using chromatin immunoprecipitation followed by sequencing (ChIP-seq), we delineate the genome-wide binding panorama of catalytically inactive Cas9 directed by two different single guide (sg) RNAs targeting the Trp53 locus. Cas9:sgRNA complexes are able to load onto multiple sites with short seed regions adjacent to 5′NGG3′ protospacer adjacent motifs (PAM). Yet among 43 ChIP-seq sites harboring seed regions analyzed for mutational status, we find editing only at the intended on-target locus and one off-target site. In vitro analysis of target site recognition revealed that interactions between the 5′ end of the guide and PAM-distal target sequences are necessary to efficiently engage Cas9 nucleolytic activity, providing an explanation for why off-target editing is significantly lower than expected from ChIP-seq data. PMID:25275497
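
    The notion of a short seed region adjacent to a 5′NGG3′ PAM can be illustrated with a simple scan. This is a sketch under stated assumptions, not the authors' analysis: the seed is taken to be the last seed_len bases of the guide (the PAM-proximal end), the default seed length of 5 is arbitrary, only the forward strand is searched, and the function name is hypothetical.

```python
import re


def seed_pam_sites(genome, guide, seed_len=5):
    """Return 0-based start positions of seed matches followed by an NGG PAM.

    Assumptions: the seed is the 3' (PAM-proximal) end of the guide, and
    only the given strand is scanned. Overlapping sites are reported.
    """
    seed = guide[-seed_len:].upper()
    pattern = re.compile("(?=%s[ACGT]GG)" % re.escape(seed))
    return [m.start() for m in pattern.finditer(genome.upper())]


if __name__ == "__main__":
    # Toy sequences, not taken from the Trp53 locus.
    print(seed_pam_sites("TTTACGTAAGGCCACGTACGG", "GGGGACGTA"))
```

    A scan like this lists where a Cas9:sgRNA complex could plausibly load; the abstract's point is that binding at such sites is far more common than cleavage, which also requires pairing with PAM-distal target sequence.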

  1. Identification of an Electrostatic Ruler Motif for Sequence-Specific Binding of Collagenase to Collagen.

    PubMed

    Subramanian, Sundar Raman; Singam, Ettayapuram Ramaprasad Azhagiya; Berinski, Michael; Subramanian, Venkatesan; Wade, Rebecca C

    2016-08-25

    Sequence-specific cleavage of collagen by mammalian collagenase plays a pivotal role in cell function. Collagenases are matrix metalloproteinases that cleave the peptide bond at a specific position on fibrillar collagen. The collagenase Hemopexin-like (HPX) domain has been proposed to be responsible for substrate recognition, but the mechanism by which collagenases identify the cleavage site on fibrillar collagen is not clearly understood. In this study, Brownian dynamics simulations coupled with atomic-detail and coarse-grained molecular dynamics simulations were performed to dock matrix metalloproteinase-1 (MMP-1) on a collagen IIIα1 triple helical peptide. We find that the HPX domain recognizes the collagen triple helix at a conserved R-X11-R motif C-terminal to the cleavage site to which the HPX domain of collagenase is guided electrostatically. The binding of the HPX domain between the two arginine residues is energetically stabilized by hydrophobic contacts with collagen. From the simulations and analysis of the sequences and structural flexibility of collagen and collagenase, a mechanistic scheme by which MMP-1 can recognize and bind collagen for proteolysis is proposed. PMID:27245212

  2. Sequence, structure, and cooperativity in folding of elementary protein structural motifs

    PubMed Central

    Lai, Jason K.; Kubelka, Ginka S.; Kubelka, Jan

    2015-01-01

    Residue-level unfolding of two helix-turn-helix proteins—one naturally occurring and one de novo designed—is reconstructed from multiple sets of site-specific 13C isotopically edited infrared (IR) and circular dichroism (CD) data using Ising-like statistical-mechanical models. Several model variants are parameterized to test the importance of sequence-specific interactions (approximated by Miyazawa–Jernigan statistical potentials), local structural flexibility (derived from the ensemble of NMR structures), interhelical hydrogen bonds, and native contacts separated by intervening disordered regions (through the Wako–Saitô–Muñoz–Eaton scheme, which disallows such configurations). The models are optimized by directly simulating experimental observables: CD ellipticity at 222 nm for model proteins and their fragments and 13C-amide I′ bands for multiple isotopologues of each protein. We find that data can be quantitatively reproduced by the model that allows two interacting segments flanking a disordered loop (double sequence approximation) and incorporates flexibility in the native contact maps, but neither sequence-specific interactions nor hydrogen bonds are required. The near-identical free energy profiles as a function of the global order parameter are consistent with expected similar folding kinetics for nearly identical structures. However, the predicted folding mechanism for the two motifs is different, reflecting the order of local stability. We introduce free energy profiles for “experimental” reaction coordinates—namely, the degree of local folding as sensed by site-specific 13C-edited IR, which highlight folding heterogeneity and contrast its overall, average description with the detailed, local picture. PMID:26216963

  3. Sequence, structure, and cooperativity in folding of elementary protein structural motifs.

    PubMed

    Lai, Jason K; Kubelka, Ginka S; Kubelka, Jan

    2015-08-11

    Residue-level unfolding of two helix-turn-helix proteins--one naturally occurring and one de novo designed--is reconstructed from multiple sets of site-specific (13)C isotopically edited infrared (IR) and circular dichroism (CD) data using Ising-like statistical-mechanical models. Several model variants are parameterized to test the importance of sequence-specific interactions (approximated by Miyazawa-Jernigan statistical potentials), local structural flexibility (derived from the ensemble of NMR structures), interhelical hydrogen bonds, and native contacts separated by intervening disordered regions (through the Wako-Saitô-Muñoz-Eaton scheme, which disallows such configurations). The models are optimized by directly simulating experimental observables: CD ellipticity at 222 nm for model proteins and their fragments and (13)C-amide I' bands for multiple isotopologues of each protein. We find that data can be quantitatively reproduced by the model that allows two interacting segments flanking a disordered loop (double sequence approximation) and incorporates flexibility in the native contact maps, but neither sequence-specific interactions nor hydrogen bonds are required. The near-identical free energy profiles as a function of the global order parameter are consistent with expected similar folding kinetics for nearly identical structures. However, the predicted folding mechanism for the two motifs is different, reflecting the order of local stability. We introduce free energy profiles for "experimental" reaction coordinates--namely, the degree of local folding as sensed by site-specific (13)C-edited IR, which highlight folding heterogeneity and contrast its overall, average description with the detailed, local picture. PMID:26216963

  4. AptaTRACE Elucidates RNA Sequence-Structure Motifs from Selection Trends in HT-SELEX Experiments.

    PubMed

    Dao, Phuong; Hoinka, Jan; Takahashi, Mayumi; Zhou, Jiehua; Ho, Michelle; Wang, Yijie; Costa, Fabrizio; Rossi, John J; Backofen, Rolf; Burnett, John; Przytycka, Teresa M

    2016-07-01

    Aptamers, short RNA or DNA molecules that bind distinct targets with high affinity and specificity, can be identified using high-throughput systematic evolution of ligands by exponential enrichment (HT-SELEX), but scalable analytic tools for understanding sequence-function relationships from diverse HT-SELEX data are not available. Here we present AptaTRACE, a computational approach that leverages the experimental design of the HT-SELEX protocol, RNA secondary structure, and the potential presence of many secondary motifs to identify sequence-structure motifs that show a signature of selection. We apply AptaTRACE to identify nine motifs in C-C chemokine receptor type 7 targeted by aptamers in an in vitro cell-SELEX experiment. We experimentally validate two aptamers whose binding required both sequence and structural features. AptaTRACE can identify low-abundance motifs, and we show through simulations that, because of this, it could lower HT-SELEX cost and time by reducing the number of selection cycles required. PMID:27467247

  5. Finishing and Special Motifs: Lessons Learned from CRISPR Analysis Using Next-Generation Draft Sequences (7th Annual SFAF Meeting, 2012)

    ScienceCinema

    Campbell, Catherine [Noblis]

    2013-03-22

    Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  6. Finishing and Special Motifs: Lessons Learned from CRISPR Analysis Using Next-Generation Draft Sequences (7th Annual SFAF Meeting, 2012)

    SciTech Connect

    Campbell, Catherine

    2012-06-01

    Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  7. Endocytosis and Trafficking of Natriuretic Peptide Receptor-A: Potential Role of Short Sequence Motifs

    PubMed Central

    Pandey, Kailash N.

    2015-01-01

    The targeted endocytosis and redistribution of transmembrane receptors among membrane-bound subcellular organelles are vital for their correct signaling and physiological functions. Membrane receptors committed for internalization and trafficking pathways are sorted into coated vesicles. Cardiac hormones, atrial and brain natriuretic peptides (ANP and BNP) bind to guanylyl cyclase/natriuretic peptide receptor-A (GC-A/NPRA) and elicit the generation of intracellular second messenger cyclic guanosine 3',5'-monophosphate (cGMP), which lowers blood pressure and incidence of heart failure. After ligand binding, the receptor is rapidly internalized, sequestrated, and redistributed into intracellular locations. Thus, NPRA is considered a dynamic cellular macromolecule that traverses different subcellular locations through its lifetime. The utilization of pharmacologic and molecular perturbants has helped in delineating the pathways of endocytosis, trafficking, down-regulation, and degradation of membrane receptors in intact cells. This review describes the investigation of the mechanisms of internalization, trafficking, and redistribution of NPRA compared with other cell surface receptors from the plasma membrane into the cell interior. The roles of different short-signal peptide sequence motifs in the internalization and trafficking of other membrane receptors have been briefly reviewed and their potential significance in the internalization and trafficking of NPRA is discussed. PMID:26151885

  8. Viroids: From Genotype to Phenotype Just Relying on RNA Sequence and Structural Motifs

    PubMed Central

    Flores, Ricardo; Serra, Pedro; Minoia, Sofía; Di Serio, Francesco; Navarro, Beatriz

    2012-01-01

    As a consequence of two unique physical properties, small size and circularity, viroid RNAs do not code for proteins and thus depend on RNA sequence/structural motifs for interacting with host proteins that mediate their invasion, replication, spread, and circumvention of defensive barriers. Viroid genomes fold up on themselves adopting collapsed secondary structures wherein stretches of nucleotides stabilized by Watson–Crick pairs are flanked by apparently unstructured loops. However, compelling data show that they are instead stabilized by alternative non-canonical pairs and that specific loops in the rod-like secondary structure, characteristic of Potato spindle tuber viroid and most other members of the family Pospiviroidae, are critical for replication and systemic trafficking. In contrast, rather than folding into a rod-like secondary structure, most members of the family Avsunviroidae adopt multibranched conformations occasionally stabilized by kissing-loop interactions critical for viroid viability in vivo. Besides these most stable secondary structures, viroid RNAs alternatively adopt during replication transient metastable conformations containing elements of local higher-order structure, prominent among which are the hammerhead ribozymes catalyzing a key replicative step in the family Avsunviroidae, and certain conserved hairpins that also mediate replication steps in the family Pospiviroidae. Therefore, different RNA structures – either global or local – determine different functions, thus highlighting the need for in-depth structural studies on viroid RNAs. PMID:22719735

  9. Modeling of the Ebola virus delta peptide reveals a potential lytic sequence motif.

    PubMed

    Gallaher, William R; Garry, Robert F

    2015-01-01

    Filoviruses, such as Ebola and Marburg viruses, cause severe outbreaks of human infection, including the extensive epidemic of Ebola virus disease (EVD) in West Africa in 2014. In the course of examining mutations in the glycoprotein gene associated with 2014 Ebola virus (EBOV) sequences, a differential level of conservation was noted between the soluble form of glycoprotein (sGP) and the full length glycoprotein (GP), which are both encoded by the GP gene via RNA editing. In the region of the proteins encoded after the RNA editing site sGP was more conserved than the overlapping region of GP when compared to a distant outlier species, Tai Forest ebolavirus. Half of the amino acids comprising the "delta peptide", a 40 amino acid carboxy-terminal fragment of sGP, were identical between otherwise widely divergent species. A lysine-rich amphipathic peptide motif was noted at the carboxyl terminus of delta peptide with high structural relatedness to the cytolytic peptide of the non-structural protein 4 (NSP4) of rotavirus. EBOV delta peptide is a candidate viroporin, a cationic pore-forming peptide, and may contribute to EBOV pathogenesis. PMID:25609303

  10. DNA recognition for virus assembly through multiple sequence-independent interactions with a helix-turn-helix motif.

    PubMed

    Greive, Sandra J; Fung, Herman K H; Chechik, Maria; Jenkins, Huw T; Weitzel, Stephen E; Aguiar, Pedro M; Brentnall, Andrew S; Glousieau, Matthieu; Gladyshev, Grigory V; Potts, Jennifer R; Antson, Alfred A

    2016-01-29

    The helix-turn-helix (HTH) motif features frequently in protein DNA-binding assemblies. Viral pac site-targeting small terminase proteins possess an unusual architecture in which the HTH motifs are displayed in a ring, distinct from the classical HTH dimer. Here we investigate how such a circular array of HTH motifs enables specific recognition of the viral genome for initiation of DNA packaging during virus assembly. We found, by surface plasmon resonance and analytical ultracentrifugation, that individual HTH motifs of the Bacillus phage SF6 small terminase bind the packaging regions of SF6 and related SPP1 genome weakly, with little local sequence specificity. Nuclear magnetic resonance chemical shift perturbation studies with an arbitrary single-site substrate suggest that the HTH motif contacts DNA similarly to how certain HTH proteins contact DNA non-specifically. Our observations support a model where specificity is generated through conformational selection of an intrinsically bent DNA segment by a ring of HTHs which bind weakly but cooperatively. Such a system would enable viral gene regulation and control of the viral life cycle, with a minimal genome, conferring a major evolutionary advantage for SPP1-like viruses. PMID:26673721

  11. DNA recognition for virus assembly through multiple sequence-independent interactions with a helix-turn-helix motif

    PubMed Central

    Greive, Sandra J.; Fung, Herman K.H.; Chechik, Maria; Jenkins, Huw T.; Weitzel, Stephen E.; Aguiar, Pedro M.; Brentnall, Andrew S.; Glousieau, Matthieu; Gladyshev, Grigory V.; Potts, Jennifer R.; Antson, Alfred A.

    2016-01-01

    The helix-turn-helix (HTH) motif features frequently in protein DNA-binding assemblies. Viral pac site-targeting small terminase proteins possess an unusual architecture in which the HTH motifs are displayed in a ring, distinct from the classical HTH dimer. Here we investigate how such a circular array of HTH motifs enables specific recognition of the viral genome for initiation of DNA packaging during virus assembly. We found, by surface plasmon resonance and analytical ultracentrifugation, that individual HTH motifs of the Bacillus phage SF6 small terminase bind the packaging regions of SF6 and related SPP1 genome weakly, with little local sequence specificity. Nuclear magnetic resonance chemical shift perturbation studies with an arbitrary single-site substrate suggest that the HTH motif contacts DNA similarly to how certain HTH proteins contact DNA non-specifically. Our observations support a model where specificity is generated through conformational selection of an intrinsically bent DNA segment by a ring of HTHs which bind weakly but cooperatively. Such a system would enable viral gene regulation and control of the viral life cycle, with a minimal genome, conferring a major evolutionary advantage for SPP1-like viruses. PMID:26673721

  12. Defining RNA motif-aminoglycoside interactions via two-dimensional combinatorial screening and structure-activity relationships through sequencing.

    PubMed

    Velagapudi, Sai Pradeep; Disney, Matthew D

    2013-10-15

    RNA is an extremely important target for the development of chemical probes of function or small molecule therapeutics. Aminoglycosides are the best-studied class of small molecules to target RNA. However, the RNA motifs outside of the bacterial rRNA A-site that are likely to be bound by these compounds in biological systems are largely unknown. If such information were known, it could allow for aminoglycosides to be exploited to target other RNAs and, in addition, could provide invaluable insights into potential bystander targets of these clinically used drugs. We utilized two-dimensional combinatorial screening (2DCS), a library-versus-library screening approach, to select the motifs displayed in a 3×3 nucleotide internal loop library and in a 6-nucleotide hairpin library that bind with high affinity and selectivity to six aminoglycoside derivatives. The selected RNA motifs were then analyzed using structure-activity relationships through sequencing (StARTS), a statistical approach that defines the privileged RNA motif space that binds a small molecule. StARTS allowed for the facile annotation of the selected RNA motif-aminoglycoside interactions in terms of affinity and selectivity. The interactions selected by 2DCS generally have nanomolar affinities, which is higher affinity than the binding of aminoglycosides to a mimic of their therapeutic target, the bacterial rRNA A-site. PMID:23719281

  13. A survey of DNA motif finding algorithms

    PubMed Central

    Das, Modan K; Dai, Ho-Kwok

    2007-01-01

    Background: Unraveling the mechanisms that regulate gene expression is a major challenge in biology. An important task in this challenge is to identify regulatory elements, especially the binding sites in deoxyribonucleic acid (DNA) for transcription factors. These binding sites are short DNA segments that are called motifs. Recent advances in genome sequence availability and in high-throughput gene expression analysis technologies have allowed for the development of computational methods for motif finding. As a result, a large number of motif finding algorithms have been implemented and applied to various motif models over the past decade. This survey reviews the latest developments in DNA motif finding algorithms. Results: Earlier algorithms use promoter sequences of coregulated genes from a single genome and search for statistically overrepresented motifs. Recent algorithms are designed to use phylogenetic footprinting or orthologous sequences and also an integrated approach where promoter sequences of coregulated genes and phylogenetic footprinting are used. All the algorithms studied have been reported to correctly detect the motifs that have been previously detected by laboratory experimental approaches, and some algorithms were able to find novel motifs. However, most of these motif finding algorithms have been shown to work successfully in yeast and other lower organisms, but perform significantly worse in higher organisms. Conclusion: Despite considerable efforts to date, DNA motif finding remains a complex challenge for biologists and computer scientists. Researchers have taken many different approaches in developing motif discovery tools and the progress made in this area of research is very encouraging. Performance comparison of different motif finding tools and identification of the best tools have proven to be a difficult task because tools are designed based on algorithms and motif models that are diverse and complex and our incomplete understanding of

  14. Fine Scale Analysis of Crossover and Non-Crossover and Detection of Recombination Sequence Motifs in the Honeybee (Apis mellifera)

    PubMed Central

    Bessoltane, Nadia; Toffano-Nioche, Claire; Solignac, Michel; Mougel, Florence

    2012-01-01

    Background: Meiotic exchanges are non-uniformly distributed across the genome of most studied organisms. This uneven distribution suggests that recombination is initiated by specific signals and/or regulations. Some of these signals were recently identified in humans and mice. However, it is unclear whether or not sequence signals are also involved in chromosomal recombination of insects. Methodology: We analyzed recombination frequencies in the honeybee, in which genome sequencing provided a large number of SNPs spread over the entire set of chromosomes. As the genome sequences were obtained from a pool of haploid males, which were the progeny of a single queen, an oocyte method (study of recombination on haploid males that develop from unfertilized eggs and hence directly reflect the haplotypes of female gametes) was developed to detect recombined pairs of SNP sites. Sequences were further compared between recombinant and non-recombinant fragments to detect recombination-specific motifs. Conclusions: Recombination events between adjacent SNP sites were detected at an average distance of 92 bp and revealed the existence of high rates of recombination events. This study also shows the presence of conversion without crossover (i.e., non-crossover) events, which greatly outnumber crossover events. Furthermore, the comparison of sequences that have undergone recombination with sequences that have not led to the discovery of sequence motifs (CGCA, GCCGC, CCGCA), which may correspond to recombination signals. PMID:22567142
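
    The comparison of recombinant and non-recombinant fragments described above rests on counting motif occurrences in each set. The following is a minimal sketch of that counting step only, under stated assumptions: the function names and the toy fragment are hypothetical, and the statistical comparison used in the study is not reproduced here.

```python
# The three motifs reported in the abstract.
MOTIFS = ("CGCA", "GCCGC", "CCGCA")


def count_overlapping(sequence, motif):
    """Count occurrences of motif in sequence, allowing overlaps."""
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        # Advance by one base so overlapping hits are not skipped.
        start = sequence.find(motif, start + 1)
    return count


def motif_counts(sequence):
    """Return a dict mapping each motif to its count in one fragment."""
    seq = sequence.upper()
    return {motif: count_overlapping(seq, motif) for motif in MOTIFS}


# Hypothetical toy fragment: GCCGC at 0, CCGCA at 1, CGCA at 2.
print(motif_counts("GCCGCA"))
```

    Note that CCGCA contains CGCA, so the counts for the three motifs are not independent; any enrichment test built on them would need to account for that.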

  15. Sequence and Spatiotemporal Expression Analysis of CLE-Motif Containing Genes from the Reniform Nematode (Rotylenchulus reniformis Linford & Oliveira)

    PubMed Central

    Wubben, Martin J.; Gavilano, Lily; Baum, Thomas J.; Davis, Eric L.

    2015-01-01

    The reniform nematode, Rotylenchulus reniformis, is a sedentary semi-endoparasitic species with a host range that encompasses more than 77 plant families. Nematode effector proteins containing plant-ligand motifs similar to CLAVATA3/ESR (CLE) peptides have been identified in the Heterodera, Globodera, and Meloidogyne genera of sedentary endoparasites. Here, we describe the isolation, sequence analysis, and spatiotemporal expression of three R. reniformis genes encoding putative CLE motifs named Rr-cle-1, Rr-cle-2, and Rr-cle-3. The Rr-cle cDNAs showed >98% identity with each other and the predicted peptides were identical with the exception of a short stretch of residues at the carboxy(C)-terminus of the variable domain (VD). Each RrCLE peptide possessed an amino-terminal signal peptide for secretion and a single C-terminal CLE motif that was most similar to Heterodera CLE motifs. Aligning the Rr-cle cDNAs with their corresponding genomic sequences showed three exons with an intron separating the signal peptide from the VD and a second intron separating the VD from the CLE motif. An alignment of the RrCLE1 peptide with Heterodera glycines and Heterodera schachtii CLE proteins revealed a high level of homology within the VD region associated with regulating in planta trafficking of the processed CLE peptide. Quantitative RT-PCR (qRT-PCR) showed similar expression profiles for each Rr-cle transcript across the R. reniformis life-cycle with the greatest transcript abundance being in sedentary parasitic female nematodes. In situ hybridization showed specific Rr-cle expression within the dorsal esophageal gland cell of sedentary parasitic females. PMID:26170479

  16. Sequence and Spatiotemporal Expression Analysis of CLE-Motif Containing Genes from the Reniform Nematode (Rotylenchulus reniformis Linford & Oliveira).

    PubMed

    Wubben, Martin J; Gavilano, Lily; Baum, Thomas J; Davis, Eric L

    2015-06-01

    The reniform nematode, Rotylenchulus reniformis, is a sedentary semi-endoparasitic species with a host range that encompasses more than 77 plant families. Nematode effector proteins containing plant-ligand motifs similar to CLAVATA3/ESR (CLE) peptides have been identified in the Heterodera, Globodera, and Meloidogyne genera of sedentary endoparasites. Here, we describe the isolation, sequence analysis, and spatiotemporal expression of three R. reniformis genes encoding putative CLE motifs named Rr-cle-1, Rr-cle-2, and Rr-cle-3. The Rr-cle cDNAs showed >98% identity with each other and the predicted peptides were identical with the exception of a short stretch of residues at the carboxy(C)-terminus of the variable domain (VD). Each RrCLE peptide possessed an amino-terminal signal peptide for secretion and a single C-terminal CLE motif that was most similar to Heterodera CLE motifs. Aligning the Rr-cle cDNAs with their corresponding genomic sequences showed three exons with an intron separating the signal peptide from the VD and a second intron separating the VD from the CLE motif. An alignment of the RrCLE1 peptide with Heterodera glycines and Heterodera schachtii CLE proteins revealed a high level of homology within the VD region associated with regulating in planta trafficking of the processed CLE peptide. Quantitative RT-PCR (qRT-PCR) showed similar expression profiles for each Rr-cle transcript across the R. reniformis life-cycle with the greatest transcript abundance being in sedentary parasitic female nematodes. In situ hybridization showed specific Rr-cle expression within the dorsal esophageal gland cell of sedentary parasitic females. PMID:26170479

  17. Application of PCR amplicon sequencing using a single primer pair in PCR amplification to assess variations in Helicobacter pylori CagA EPIYA tyrosine phosphorylation motifs

    PubMed Central

    2010-01-01

    Background: The presence of various EPIYA tyrosine phosphorylation motifs in the CagA protein of Helicobacter pylori has been suggested to contribute to pathogenesis in adults. In this study, a unique PCR assay and sequencing strategy was developed to establish the number and variation of cagA EPIYA motifs. Findings: MDA-DNA derived from gastric biopsy specimens from eleven subjects with gastritis was used with M13- and T7-sequence-tagged primers for amplification of the cagA EPIYA motif region. Automated capillary electrophoresis using a high resolution kit and amplicon sequencing confirmed variations in the cagA EPIYA motif region. In nine cases, sequencing revealed the presence of AB, ABC, or ABCC (Western type) cagA EPIYA motif, respectively. In two cases, double cagA EPIYA motifs were detected (ABC/ABCC or ABC/AB), indicating the presence of two H. pylori strains in the same biopsy. Conclusion: Automated capillary electrophoresis and amplicon sequencing using a single M13- and T7-sequence-tagged primer pair in PCR amplification enabled rapid molecular typing of cagA EPIYA motifs. Moreover, the techniques described allowed for rapid detection of mixed H. pylori strains present in the same biopsy specimen. PMID:20181142
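
    Once an amplicon has been sequenced and translated, the number of EPIYA motifs can be read off the protein string directly. The following is a minimal sketch of that final step, assuming an already translated sequence; the function name and the toy fragment are hypothetical, and assigning each hit to an A, B, or C segment would additionally require the flanking residues, which this abstract does not give.

```python
import re


def epiya_positions(protein):
    """Return 0-based start positions of each EPIYA motif in a protein string."""
    return [m.start() for m in re.finditer("EPIYA", protein.upper())]


# Hypothetical toy fragment (not a real CagA sequence) with two EPIYA motifs.
toy_fragment = "AAEPIYAQVNEPIYATT"
print(epiya_positions(toy_fragment))  # [2, 10]
```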

  18. A self-assembling peptide RADA16-I integrated with spider fibroin uncrystalline motifs

    PubMed Central

    Sun, Lijuan; Zhao, Xiaojun

    2012-01-01

    The mechanical strength of nanofiber scaffolds formed by the self-assembling peptide RADA16-I or its derivatives is limited, which restricts their application. To address this problem, we inserted the spidroin uncrystalline motifs GGAGGS or GPGGY, which confer incomparable elasticity and hydrophobicity to spider silk, into the C-terminus of RADA16-I to design two new peptides: R3 (n-RADARADARADARADA-GGAGGS-c) and R4 (n-RADARADARADARADA-GPGGY-c), and then observed the effect of these motifs on biophysical properties of the peptide. Atomic force microscopy, transmission electron microscopy, and circular dichroism spectroscopy confirm that R3 and R4 display β-sheet structure and self-assemble into long nanofibers. Compared with R3, the β-sheet structure and nanofibers formed by R4 are more stable; they change to random coil and unordered aggregation at higher temperature. Rheology measurements indicate that the novel peptides form hydrogels when induced by DMEM, and the storage modulus of R3 and R4 hydrogel is 0.5 times and 3 times higher than that of RADA16-I, respectively. Furthermore, R4 hydrogel remarkably promotes growth of liver cell L02 and liver cancer cell SMCC7721 compared with 2D culture, as determined by MTT assay. The novel peptides also have potential as hydrophobic drug carriers; they can stabilize pyrene microcrystals in aqueous solution and deliver them into a lipophilic environment, as identified by fluorescence emission spectra. Altogether, the spider fibroin motif GPGGY most effectively enhances mechanical strength and hydrophobicity of the peptide. This study provides a new method in the design of nanobiomaterials and helps us to understand the role of the amino acid sequence in nanofiber formation. PMID:22346352

  19. An approach to delineate primers for a group of poorly conserved sequences incorporating the common motif region.

    PubMed

    Sahu, Mousumi; Sahu, Jagajjit; Sahoo, Smita; Dehury, Budheswar; Sarma, Kishore; Sarmah, Ranjan; Sen, Priyabrata; Modi, Mahendra Kumar; Barooah, Madhumita

    2012-01-01

    Glutathione synthetase (gshB) has previously been reported to confer tolerance to acidic soil conditions in Rhizobium species. Cloning the gene coding for this enzyme necessitates the design of proper primer sets, which in turn depends on the identification of high quality sequence similarity in multiple global alignments. In this experiment, a group of homologous gene sequences related to the gshB gene (accession no: gi-86355669:327589-328536) of Rhizobium etli CFN 42 were extracted from NCBI nucleotide sequence databases using BLASTN and were analyzed for designing degenerate primers. However, the T-coffee multiple global alignment results did not show any block of conserved region for the above sequence set to design the primers. Therefore, we attempted to identify the location of common motif regions based on multiple local alignments employing the MEME algorithm supported with MAST and Primer3. The results revealed some common motif regions that enabled us to design the primer sets for related gshB gene sequences. The result will be validated in the wet lab. PMID:22419837

  20. Composite motifs integrating multiple protein structures increase sensitivity for function prediction.

    PubMed

    Chen, Brian Y; Bryant, Drew H; Cruess, Amanda E; Bylund, Joseph H; Fofanov, Viacheslav Y; Kristensen, David M; Kimmel, Marek; Lichtarge, Olivier; Kavraki, Lydia E

    2007-01-01

    The study of disease often hinges on the biological function of proteins, but determining protein function is a difficult experimental process. To minimize duplicated effort, algorithms for function prediction seek characteristics indicative of possible protein function. One approach is to identify substructural matches of geometric and chemical similarity between motifs representing known active sites and target protein structures with unknown function. In earlier work, statistically significant matches of certain effective motifs have identified functionally related active sites. Effective motifs must be carefully designed to maintain similarity to functionally related sites (sensitivity) and avoid incidental similarities to functionally unrelated protein geometry (specificity). Existing motif design techniques use the geometry of a single protein structure. Poor selection of this structure can limit motif effectiveness if the selected functional site lacks similarity to functionally related sites. To address this problem, this paper presents composite motifs, which combine structures of functionally related active sites to potentially increase sensitivity. Our experimentation compares the effectiveness of composite motifs with simple motifs designed from single protein structures. On six distinct families of functionally related proteins, leave-one-out testing showed that composite motifs had sensitivity comparable to the most sensitive of all simple motifs and specificity comparable to the average simple motif. On our data set, we observed that composite motifs simultaneously capture variations in active site conformation, diminish the problem of selecting motif structures, and enable the fusion of protein structures from diverse data sources. PMID:17951837

  1. Bayesian Markov models consistently outperform PWMs at predicting motifs in nucleotide sequences.

    PubMed

    Siebert, Matthias; Söding, Johannes

    2016-07-27

    Position weight matrices (PWMs) are the standard model for DNA and RNA regulatory motifs. In PWMs nucleotide probabilities are independent of nucleotides at other positions. Models that account for dependencies need many parameters and are prone to overfitting. We have developed a Bayesian approach for motif discovery using Markov models in which conditional probabilities of order k - 1 act as priors for those of order k. This Bayesian Markov model (BaMM) training automatically adapts model complexity to the amount of available data. We also derive an EM algorithm for de-novo discovery of enriched motifs. For transcription factor binding, BaMMs achieve significantly (P = 1/16) higher cross-validated partial AUC than PWMs in 97% of 446 ChIP-seq ENCODE datasets and improve performance by 36% on average. BaMMs also learn complex multipartite motifs, improving predictions of transcription start sites, polyadenylation sites, bacterial pause sites, and RNA binding sites by 26-101%. BaMMs never performed worse than PWMs. These robust improvements argue in favour of generally replacing PWMs by BaMMs. PMID:27288444
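
    The independence assumption stated above means a PWM score is simply a sum of per-position terms. The following is a minimal sketch of log-odds scoring with a PWM, shown only to make that assumption concrete; it is not the BaMM software, and the function name, the uniform background of 0.25, and the two-column toy matrix are all hypothetical.

```python
import math


def pwm_log_odds(sequence, pwm, background=0.25):
    """Score a sequence as the sum of log2(p / background) over positions.

    pwm is a list of dicts, one per motif position, mapping each base to its
    probability. Each position contributes independently of its neighbours,
    which is exactly the PWM assumption.
    """
    if len(sequence) != len(pwm):
        raise ValueError("sequence length must equal the number of PWM columns")
    return sum(
        math.log2(column[base] / background)
        for base, column in zip(sequence.upper(), pwm)
    )


# Hypothetical two-column toy matrix (each column sums to 1).
toy_pwm = [
    {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125},
    {"A": 0.125, "C": 0.5, "G": 0.25, "T": 0.125},
]
print(pwm_log_odds("AC", toy_pwm))  # 2.0
print(pwm_log_odds("GG", toy_pwm))  # -1.0
```

    A Markov model of order k would replace each `column[base]` lookup with a probability conditioned on the k preceding bases, which is where the additional parameters and the risk of overfitting mentioned in the abstract come from.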

  2. Two structurally distinct κB sequence motifs cooperatively control LPS-induced KC gene transcription in mouse macrophages

    SciTech Connect

    Ohmori, Y.; Fukumoto, S.; Hamilton, T.A.

    1995-10-01

    The mouse KC gene is an α-chemokine gene whose transcription is induced in mononuclear phagocytes by LPS. DNA sequences necessary for transcriptional control of KC by LPS were identified in the region flanking the transcription start site. Transient transfection analysis in macrophages using deletion mutants of a 1.5-kb sequence placed in front of the chloramphenicol acetyl transferase (CAT) gene identified an LPS-responsive region between residues -104 and +30. This region contained two κB sequence motifs. The first motif (position -70 to -59, κB1) is highly conserved in all three human GRO genes and in the mouse macrophage inflammatory protein-2 (MIP-2) gene. The second κB motif (position -89 to -78, κB2) was conserved only between the mouse and the rat KC genes. Consistent with previous reports, the highly conserved κB site (κB1) was essential for LPS inducibility. Surprisingly, the distal κB site (κB2) was also necessary for optimal response; mutation of either κB site markedly reduced sensitivity to LPS in RAW264.7 cells and to TNF-α in NIH 3T3 fibroblasts. Although both κB1 and κB2 sequences were able to bind members of the Rel homology family, including NFκB1 (p50), RelA (p65), and c-Rel, the κB1 site bound these factors with higher affinity and functioned more effectively than the κB2 site in a heterologous promoter. These findings demonstrate that transcriptional control of the KC gene requires cooperation between two κB sites and is thus distinct from that of the three human GRO genes and the mouse MIP-2 gene. 71 refs., 8 figs.

  3. Sequence-Specific Recognition of DNA by Proteins: Binding Motifs Discovered Using a Novel Statistical/Computational Analysis.

    PubMed

    Jakubec, David; Laskowski, Roman A; Vondrasek, Jiri

    2016-01-01

    Decades of intensive experimental studies of the recognition of DNA sequences by proteins have provided us with a view of a diverse and complicated world in which few to no features are shared between individual DNA-binding protein families. The originally conceived direct readout of DNA residue sequences by amino acid side chains offers very limited capacity for sequence recognition, while the effects of the dynamic properties of the interacting partners remain difficult to quantify and almost impossible to generalise. In this work we investigated the energetic characteristics of all DNA residue-amino acid side chain combinations in the conformations found at the interaction interface in a very large set of protein-DNA complexes by the means of empirical potential-based calculations. General specificity-defining criteria were derived and utilised to look beyond the binding motifs considered in previous studies. Linking energetic favourability to the observed geometrical preferences, our approach reveals several additional amino acid motifs which can distinguish between individual DNA bases. Our results remained valid in environments with various dielectric properties. PMID:27384774

  4. Sequence-Specific Recognition of DNA by Proteins: Binding Motifs Discovered Using a Novel Statistical/Computational Analysis

    PubMed Central

    Jakubec, David; Laskowski, Roman A.; Vondrasek, Jiri

    2016-01-01

    Decades of intensive experimental studies of the recognition of DNA sequences by proteins have provided us with a view of a diverse and complicated world in which few to no features are shared between individual DNA-binding protein families. The originally conceived direct readout of DNA residue sequences by amino acid side chains offers very limited capacity for sequence recognition, while the effects of the dynamic properties of the interacting partners remain difficult to quantify and almost impossible to generalise. In this work we investigated the energetic characteristics of all DNA residue—amino acid side chain combinations in the conformations found at the interaction interface in a very large set of protein—DNA complexes by the means of empirical potential-based calculations. General specificity-defining criteria were derived and utilised to look beyond the binding motifs considered in previous studies. Linking energetic favourability to the observed geometrical preferences, our approach reveals several additional amino acid motifs which can distinguish between individual DNA bases. Our results remained valid in environments with various dielectric properties. PMID:27384774

  5. Incorporating substrate sequence motifs and spatial amino acid composition to identify kinase-specific phosphorylation sites on protein three-dimensional structures

    PubMed Central

    2013-01-01

    sites with similar sequenced motifs, this work also integrates the 3D structural information to improve the cross classifying specificity. PMID:24564522

  6. A Glance at Microsatellite Motifs from 454 Sequencing Reads of Watermelon Genomic DNA

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A single 454 (Life Sciences Sequencing Technology) run of Charleston Gray watermelon (Citrullus lanatus var. lanatus) genomic DNA was performed and sequence data were assembled. A large scale identification of simple sequence repeat (SSR) was performed and SSR sequence data were used for the develo...

  7. In planta analysis of a cis-regulatory cytokinin response motif in Arabidopsis and identification of a novel enhancer sequence.

    PubMed

    Ramireddy, Eswarayya; Brenner, Wolfram G; Pfeifer, Andreas; Heyl, Alexander; Schmülling, Thomas

    2013-07-01

    The phytohormone cytokinin plays a key role in regulating plant growth and development, and is involved in numerous physiological responses to environmental changes. The type-B response regulators, which regulate the transcription of cytokinin response genes, are a part of the cytokinin signaling system. Arabidopsis thaliana encodes 11 type-B response regulators (type-B ARRs), and some of them were shown to bind in vitro to the core cytokinin response motif (CRM) 5'-(A/G)GAT(T/C)-3' or, in the case of ARR1, to an extended motif (ECRM), 5'-AAGAT(T/C)TT-3'. Here we obtained in planta proof for the functionality of the latter motif. Promoter deletion analysis of the primary cytokinin response gene ARR6 showed that a combination of two extended motifs within the promoter is required to mediate the full transcriptional activation by ARR1 and other type-B ARRs. CRMs were found to be over-represented in the vicinity of ECRMs in the promoters of cytokinin-regulated genes, suggesting their functional relevance. Moreover, an evolutionarily conserved 27 bp long T-rich region between -220 and -193 bp was identified and shown to be required for the full activation by type-B ARRs and the response to cytokinin. This novel enhancer is not bound by the DNA-binding domain of ARR1, indicating that additional proteins might be involved in mediating the transcriptional cytokinin response. Furthermore, genome-wide expression profiling identified genes, among them ARR16, whose induction by cytokinin depends on both ARR1 and other specific type-B ARRs. This together with the ECRM/CRM sequence clustering indicates cooperative action of different type-B ARRs for the activation of particular target genes. PMID:23620480
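
    The two motifs quoted in this abstract translate directly into pattern searches. The following is a minimal sketch, assuming a plain A/C/G/T promoter string; the function names and the example sequence are made up for illustration and are not taken from the study. Lookahead groups are used so that overlapping sites are reported, and the reverse strand is scanned by searching the reverse complement.

```python
import re

# Motifs as written in the abstract, expressed as regular expressions.
# The lookahead (?=(...)) lets overlapping sites be reported.
CRM = re.compile(r"(?=([AG]GAT[TC]))")    # core motif 5'-(A/G)GAT(T/C)-3'
ECRM = re.compile(r"(?=(AAGAT[TC]TT))")   # extended motif 5'-AAGAT(T/C)TT-3'

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(pattern, seq):
    """Return sorted (start, strand, site) tuples for both strands.

    start is 0-based and always refers to the forward strand;
    site is the matched sequence read 5'->3' on its own strand.
    """
    seq = seq.upper()
    n = len(seq)
    hits = [(m.start(1), "+", m.group(1)) for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        site = m.group(1)
        # Convert the reverse-strand offset back to forward coordinates.
        hits.append((n - (m.start(1) + len(site)), "-", site))
    return sorted(hits)


# Hypothetical promoter fragment, not real ARR6 sequence.
promoter = "TTAAGATCTTGGAATCCA"
crm_sites = find_sites(CRM, promoter)
ecrm_sites = find_sites(ECRM, promoter)
```

    Because 5'-AAGATCTT-3' is its own reverse complement, that form of the extended motif is reported once per strand at the same position.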

  8. Rapid characterization of CRISPR-Cas9 protospacer adjacent motif sequence elements.

    PubMed

    Karvelis, Tautvydas; Gasiunas, Giedrius; Young, Joshua; Bigelyte, Greta; Silanskas, Arunas; Cigan, Mark; Siksnys, Virginijus

    2015-01-01

    To expand the repertoire of Cas9s available for genome targeting, we present a new in vitro method for the simultaneous examination of guide RNA and protospacer adjacent motif (PAM) requirements. The method relies on the in vitro cleavage of plasmid libraries containing a randomized PAM as a function of Cas9-guide RNA complex concentration. Using this method, we accurately reproduce the canonical PAM preferences for Streptococcus pyogenes, Streptococcus thermophilus CRISPR3 (Sth3), and CRISPR1 (Sth1). Additionally, PAM and sgRNA solutions for a novel Cas9 protein from Brevibacillus laterosporus are provided by the assay and are demonstrated to support functional activity in vitro and in plants. PMID:26585795

  9. An apoptosis-inhibiting gene from a nuclear polyhedrosis virus encoding a polypeptide with Cys/His sequence motifs.

    PubMed Central

    Birnbaum, M J; Clem, R J; Miller, L K

    1994-01-01

    Two different baculovirus genes are known to be able to block apoptosis triggered upon infection of Spodoptera frugiperda cells with p35 mutants of the insect baculovirus Autographa californica nuclear polyhedrosis virus (AcMNPV): p35 (P35-encoding gene) of AcMNPV (R. J. Clem, M. Fechheimer, and L. K. Miller, Science 254:1388-1390, 1991) and iap (inhibitor of apoptosis gene) of Cydia pomonella granulosis virus (CpGV) (N. E. Crook, R. J. Clem, and L. K. Miller, J. Virol. 67:2168-2174, 1993). Using a genetic complementation assay to identify additional genes which inhibit apoptosis during infection with a p35 mutant, we have isolated a gene from Orgyia pseudotsugata NPV (OpMNPV) that was able to functionally substitute for AcMNPV p35. The nucleotide sequence of this gene, Op-iap, predicted a 30-kDa polypeptide product with approximately 58% amino acid sequence identity to the product of CpGV iap, Cp-IAP. Like Cp-IAP, the predicted product of Op-iap has a carboxy-terminal C3HC4 zinc finger-like motif. In addition, a pair of additional cysteine/histidine motifs were found in the N-terminal regions of both polypeptide sequences. Recombinant p35 mutant viruses carrying either Op-iap or Cp-iap appeared to have a normal phenotype in S. frugiperda cells. Thus, Cp-IAP and Op-IAP appear to be functionally analogous to P35 but are likely to block apoptosis by a different mechanism which may involve direct interaction with DNA. PMID:8139034

  10. An apoptosis-inhibiting gene from a nuclear polyhedrosis virus encoding a polypeptide with Cys/His sequence motifs.

    PubMed

    Birnbaum, M J; Clem, R J; Miller, L K

    1994-04-01

    Two different baculovirus genes are known to be able to block apoptosis triggered upon infection of Spodoptera frugiperda cells with p35 mutants of the insect baculovirus Autographa californica nuclear polyhedrosis virus (AcMNPV): p35 (P35-encoding gene) of AcMNPV (R. J. Clem, M. Fechheimer, and L. K. Miller, Science 254:1388-1390, 1991) and iap (inhibitor of apoptosis gene) of Cydia pomonella granulosis virus (CpGV) (N. E. Crook, R. J. Clem, and L. K. Miller, J. Virol. 67:2168-2174, 1993). Using a genetic complementation assay to identify additional genes which inhibit apoptosis during infection with a p35 mutant, we have isolated a gene from Orgyia pseudotsugata NPV (OpMNPV) that was able to functionally substitute for AcMNPV p35. The nucleotide sequence of this gene, Op-iap, predicted a 30-kDa polypeptide product with approximately 58% amino acid sequence identity to the product of CpGV iap, Cp-IAP. Like Cp-IAP, the predicted product of Op-iap has a carboxy-terminal C3HC4 zinc finger-like motif. In addition, a pair of additional cysteine/histidine motifs were found in the N-terminal regions of both polypeptide sequences. Recombinant p35 mutant viruses carrying either Op-iap or Cp-iap appeared to have a normal phenotype in S. frugiperda cells. Thus, Cp-IAP and Op-IAP appear to be functionally analogous to P35 but are likely to block apoptosis by a different mechanism which may involve direct interaction with DNA. PMID:8139034

  11. Identification of Promoter Motifs Involved in the Network of Phytochrome A-Regulated Gene Expression by Combined Analysis of Genomic Sequence and Microarray Data

    PubMed Central

    Hudson, Matthew E.; Quail, Peter H.

    2003-01-01

    Several hundred Arabidopsis genes, transcriptionally regulated by phytochrome A (phyA), were previously identified using an oligonucleotide microarray. We have now identified, in silico, conserved sequence motifs in the promoters of these genes by comparing the promoter sequences to those of all the genes present on the microarray from which they were sampled. This was done using a Perl script (called Sift) that identifies over-represented motifs using an enumerative approach. The utility of Sift was verified by analysis of circadian-regulated promoters known to contain a biologically significant motif. Several elements were then identified in phyA-responsive promoters by their over-representation. Five previously undescribed motifs were detected in the promoters of phyA-induced genes. Four novel motifs were found in phyA-repressed promoters, plus a motif that strongly resembles the DE1 element. The G-box, CACGTG, was a prominent hit in both induced and repressed phyA-responsive promoters. Intriguingly, two distinct flanking consensus sequences were observed adjacent to the G-box core sequence: one predominating in phyA-induced promoters, the other in phyA-repressed promoters. Such different conserved flanking nucleotides around the core motif in these two sets of promoters may indicate that different members of the same family of DNA-binding proteins mediate phyA induction and repression. An increased abundance of G-box sequences was observed in the most rapidly phyA-responsive genes and in the promoters of phyA-regulated transcription factors, indicating that G-box-binding transcription factors are upstream components in a transcriptional cascade that mediates phyA-regulated development. PMID:14681527
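
    The enumerative idea described here (list every short word, then ask whether it occurs in more of the regulated promoters than expected from the full set on the array) can be sketched in a few lines. This is not the Sift Perl script: the hypergeometric test, the function names, the word length and the toy sequences are all assumptions chosen for illustration, and no multiple-testing correction is applied.

```python
from math import comb


def kmers_present(seq, k):
    """Set of distinct k-mers occurring in a sequence (presence, not count)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hypergeom_tail(k_obs, n_draw, n_success, n_total):
    """P(X >= k_obs) when drawing n_draw items without replacement from
    n_total items of which n_success are 'successes'."""
    top = min(n_draw, n_success)
    ways = sum(
        comb(n_success, i) * comb(n_total - n_success, n_draw - i)
        for i in range(k_obs, top + 1)
    )
    return ways / comb(n_total, n_draw)


def overrepresented(background, foreground_names, k=6, alpha=0.01):
    """Rank k-mers found in more foreground promoters than expected.

    background: dict name -> promoter sequence (all genes on the array)
    foreground_names: set of names that form the regulated subset
    Returns (kmer, foreground_hits, background_hits, p_value) sorted by p.
    """
    fg_hits, bg_hits = {}, {}
    for name, seq in background.items():
        for word in kmers_present(seq, k):
            bg_hits[word] = bg_hits.get(word, 0) + 1
            if name in foreground_names:
                fg_hits[word] = fg_hits.get(word, 0) + 1
    n_total, n_draw = len(background), len(foreground_names)
    results = []
    for word, hits in fg_hits.items():
        p = hypergeom_tail(hits, n_draw, bg_hits[word], n_total)
        if p <= alpha:
            results.append((word, hits, bg_hits[word], p))
    return sorted(results, key=lambda r: (r[3], r[0]))


# Toy data: three "regulated" promoters sharing the G-box core CACGTG.
toy_background = {
    "g1": "AACACGTGTT", "g2": "GGCACGTGAA", "g3": "TCACGTGC",
    "g4": "AAAAAAAAAA", "g5": "CCCCCCCCCC", "g6": "GTGTGTGTGT",
}
toy_hits = overrepresented(toy_background, {"g1", "g2", "g3"}, k=6, alpha=0.1)
```

    With realistic data the same word is tested thousands of times over, so a correction such as Bonferroni would normally be layered on top of the raw p-values.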

  12. Gibbs motif sampling: detection of bacterial outer membrane protein repeats.

    PubMed Central

    Neuwald, A. F.; Liu, J. S.; Lawrence, C. E.

    1995-01-01

    The detection and alignment of locally conserved regions (motifs) in multiple sequences can provide insight into protein structure, function, and evolution. A new Gibbs sampling algorithm is described that detects motif-encoding regions in sequences and optimally partitions them into distinct motif models; this is illustrated using a set of immunoglobulin fold proteins. When applied to sequences sharing a single motif, the sampler can be used to classify motif regions into related submodels, as is illustrated using helix-turn-helix DNA-binding proteins. Other statistically based procedures are described for searching a database for sequences matching motifs found by the sampler. When applied to a set of 32 very distantly related bacterial integral outer membrane proteins, the sampler revealed that they share a subtle, repetitive motif. Although BLAST (Altschul SF et al., 1990, J Mol Biol 215:403-410) fails to detect significant pairwise similarity between any of the sequences, the repeats present in these outer membrane proteins, taken as a whole, are highly significant (based on a generally applicable statistical test for motifs described here). Analysis of bacterial porins with known trimeric beta-barrel structure and related proteins reveals a similar repetitive motif corresponding to alternating membrane-spanning beta-strands. These beta-strands occur on the membrane interface (as opposed to the trimeric interface) of the beta-barrel. The broad conservation and structural location of these repeats suggests that they play important functional roles. PMID:8520488
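
    For orientation, the basic single-motif site sampler that this family of methods builds on can be written compactly. The sketch below is a simplified illustration, not the algorithm of the paper: it assumes exactly one motif occurrence per sequence, a fixed motif width, a uniform background, and a hypothetical function name and parameters; it does not partition sites into multiple motif models.

```python
import random


def gibbs_site_sampler(seqs, width, iters=200, pseudo=1.0, seed=0):
    """Basic Gibbs site sampler: one motif start per sequence.

    Repeatedly holds one sequence out, builds a position-specific count
    model from the current sites in the others, and resamples the
    held-out site in proportion to its probability under that model.
    Returns the list of 0-based start positions.
    """
    rng = random.Random(seed)
    alphabet = sorted(set("".join(seqs)))
    pos = [rng.randrange(len(s) - width + 1) for s in seqs]
    denom = (len(seqs) - 1) + pseudo * len(alphabet)
    for _ in range(iters):
        for i, s in enumerate(seqs):
            # Count residues per motif column, excluding sequence i.
            counts = [{a: pseudo for a in alphabet} for _ in range(width)]
            for j, t in enumerate(seqs):
                if j != i:
                    for c in range(width):
                        counts[c][t[pos[j] + c]] += 1
            # Weight every candidate start in sequence i by the model.
            # With a uniform background the usual likelihood ratio
            # differs from this only by a constant factor.
            weights = []
            for start in range(len(s) - width + 1):
                w = 1.0
                for c in range(width):
                    w *= counts[c][s[start + c]] / denom
                weights.append(w)
            pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos


# Made-up sequences with a planted word; sampling is stochastic, so the
# planted sites are likely but not guaranteed to be recovered.
example = ["GGGTTGACAGG", "CCTTGACACCC", "ATTGACATATA", "GCGCGTTGACA"]
starts = gibbs_site_sampler(example, width=6)
```

    The pseudocounts keep every weight positive, so the sampler can always move away from a poor initial alignment.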

  13. Cross-reactivity between the rheumatoid arthritis-associated motif EQKRAA and structurally related sequences found in Proteus mirabilis.

    PubMed

    Tiwana, H; Wilson, C; Alvarez, A; Abuknesha, R; Bansal, S; Ebringer, A

    1999-06-01

    Cross-reactivity or molecular mimicry may be one of the underlying mechanisms involved in the etiopathogenesis of rheumatoid arthritis (RA). Antiserum against the RA susceptibility sequence EQKRAA was shown to bind to a similar peptide ESRRAL present in the hemolysin of the gram-negative bacterium Proteus mirabilis, and an anti-ESRRAL serum reacted with EQKRAA. There was no reactivity with either anti-EQKRAA or anti-ESRRAL to a peptide containing the EDERAA sequence which is present in HLA-DRB1*0402, an allele not associated with RA. Furthermore, the EQKRAA and ESRRAL antisera bound to a mouse fibroblast transfectant cell line (Dap.3) expressing HLA-DRB1*0401 but not to DRB1*0402. However, peptide sequences structurally related to the RA susceptibility motif LEIEKDFTTYGEE (P. mirabilis urease), VEIRAEGNRFTY (collagen type II) and DELSPETSPYVKE (collagen type XI) did not bind significantly to cell lines expressing HLA-DRB1*0401 or HLA-DRB1*0402 compared to the control peptide YASGASGASGAS. It is suggested here that molecular mimicry between HLA alleles associated with RA and P. mirabilis may be relevant in the etiopathogenesis of the disease. PMID:10338479

  14. Structural Analysis of a Repetitive Protein Sequence Motif in Strepsirrhine Primate Amelogenin

    PubMed Central

    Bromley, Keith M.; Hacia, Joseph G.; Bromage, Timothy G.; Snead, Malcolm L.; Moradian-Oldak, Janet; Paine, Michael L.

    2011-01-01

    Strepsirrhines are members of a primate suborder that has a distinctive set of features associated with the development of the dentition. Amelogenin (AMEL), the better known of the enamel matrix proteins, forms 90% of the secreted organic matrix during amelogenesis. Although AMEL has been sequenced in numerous mammalian lineages, the only reported strepsirrhine AMEL sequences are those of the ring-tailed lemur and galago, which contain a set of additional proline-rich tandem repeats absent in all other primate species analyzed to date, but present in some non-primate mammals. Here, we first determined that these repeats are present in AMEL from three additional lemur species and thus are likely to be widespread throughout this group. To evaluate the functional relevance of these repeats in strepsirrhines, we engineered a mutated murine amelogenin sequence containing a similar proline-rich sequence to that of Lemur catta. In the monomeric form, the MQP insertions had no influence on the secondary structure or refolding properties, whereas in the assembled form, the insertions increased the hydrodynamic radii. We speculate that increased AMEL nanosphere size may influence enamel formation in strepsirrhine primates. PMID:21437261

  15. Structural analysis of a repetitive protein sequence motif in strepsirrhine primate amelogenin.

    PubMed

    Lacruz, Rodrigo S; Lakshminarayanan, Rajamani; Bromley, Keith M; Hacia, Joseph G; Bromage, Timothy G; Snead, Malcolm L; Moradian-Oldak, Janet; Paine, Michael L

    2011-01-01

    Strepsirrhines are members of a primate suborder that has a distinctive set of features associated with the development of the dentition. Amelogenin (AMEL), the better known of the enamel matrix proteins, forms 90% of the secreted organic matrix during amelogenesis. Although AMEL has been sequenced in numerous mammalian lineages, the only reported strepsirrhine AMEL sequences are those of the ring-tailed lemur and galago, which contain a set of additional proline-rich tandem repeats absent in all other primate species analyzed to date, but present in some non-primate mammals. Here, we first determined that these repeats are present in AMEL from three additional lemur species and thus are likely to be widespread throughout this group. To evaluate the functional relevance of these repeats in strepsirrhines, we engineered a mutated murine amelogenin sequence containing a similar proline-rich sequence to that of Lemur catta. In the monomeric form, the MQP insertions had no influence on the secondary structure or refolding properties, whereas in the assembled form, the insertions increased the hydrodynamic radii. We speculate that increased AMEL nanosphere size may influence enamel formation in strepsirrhine primates. PMID:21437261

  16. MPS Editor - An Integrated Sequencing Environment

    NASA Technical Reports Server (NTRS)

    Streiffert, Barbara A.; O'Reilly, Taifun; Schrock, Mitchell; Catchen, Jaime

    2010-01-01

    In today's operations environment, the teams are smaller and need to be more efficient while still ensuring the safety and success of the mission. In addition, teams often begin working on a mission in its early development phases and continue on the team through actual operations. For these reasons the operations teams want to be presented with a software environment that integrates multiple needed software applications as well as providing them with context sensitive editing support for entering commands and sequences of commands. At Jet Propulsion Laboratory, the Multi-Mission Planning and Sequencing (MPS) Editor provided by the Multi-Mission Ground Systems and Services (MGSS) supports those operational needs.

  17. High-throughput sequencing enhanced phage display enables the identification of patient-specific epitope motifs in serum

    PubMed Central

    Christiansen, Anders; Kringelum, Jens V.; Hansen, Christian S.; Bøgh, Katrine L.; Sullivan, Eric; Patel, Jigar; Rigby, Neil M.; Eiwegger, Thomas; Szépfalusi, Zsolt; Masi, Federico de; Nielsen, Morten; Lund, Ole; Dufva, Martin

    2015-01-01

    Phage display is a prominent screening technique with a multitude of applications including therapeutic antibody development and mapping of antigen epitopes. In this study, phages were selected based on their interaction with patient serum and exhaustively characterised by high-throughput sequencing. A bioinformatics approach was developed in order to identify peptide motifs of interest based on clustering and contrasting to control samples. Comparison of patient and control samples confirmed a major issue in phage display, namely the selection of unspecific peptides. The potential of the bioinformatic approach was demonstrated by identifying epitopes of a prominent peanut allergen, Ara h 1, in sera from patients with severe peanut allergy. The identified epitopes were confirmed by high-density peptide micro-arrays. The present study demonstrates that high-throughput sequencing can empower phage display by (i) enabling the analysis of complex biological samples, (ii) circumventing the traditional laborious picking and functional testing of individual phage clones and (iii) reducing the number of selection rounds. PMID:26246327

  18. CpG island erosion, polycomb occupancy and sequence motif enrichment at bivalent promoters in mammalian embryonic stem cells

    PubMed Central

    Mantsoki, Anna; Devailly, Guillaume; Joshi, Anagha

    2015-01-01

    In embryonic stem (ES) cells, developmental regulators have a characteristic bivalent chromatin signature marked by simultaneous presence of both activation (H3K4me3) and repression (H3K27me3) signals and are thought to be in a ‘poised’ state for subsequent activation or silencing during differentiation. We collected eleven pairs (H3K4me3 and H3K27me3) of ChIP sequencing datasets in human ES cells and eight pairs in murine ES cells, and predicted high-confidence (HC) bivalent promoters. Over 85% of H3K27me3 marked promoters were bivalent in human and mouse ES cells. We found that (i) HC bivalent promoters were enriched for developmental factors and were highly likely to be differentially expressed upon transcription factor perturbation; (ii) murine HC bivalent promoters were occupied by both polycomb repressive component classes (PRC1 and PRC2) and grouped into four distinct clusters with different biological functions; (iii) HC bivalent and active promoters were CpG rich while H3K27me3-only promoters lacked CpG islands. Binding enrichment of distinct sets of regulators distinguished bivalent from active promoters. Moreover, a ‘TCCCC’ sequence motif was specifically enriched in bivalent promoters. Finally, this analysis will serve as a resource for future studies to further understand transcriptional regulation during embryonic development. PMID:26582124

  19. On the Concept of Cis-regulatory Information: From Sequence Motifs to Logic Functions

    NASA Astrophysics Data System (ADS)

    Tarpine, Ryan; Istrail, Sorin

    The regulatory genome is about the “system level organization of the core genomic regulatory apparatus, and how this is the locus of causality underlying the twin phenomena of animal development and animal evolution” (E.H. Davidson. The Regulatory Genome: Gene Regulatory Networks in Development and Evolution, Academic Press, 2006). Information processing in the regulatory genome is done through regulatory states, defined as sets of transcription factors (sequence-specific DNA binding proteins which determine gene expression) that are expressed and active at the same time. The core information processing machinery consists of modular DNA sequence elements, called cis-modules, that interact with transcription factors. The cis-modules “read” the information contained in the regulatory state of the cell through transcription factor binding, “process” it, and directly or indirectly communicate with the basal transcription apparatus to determine gene expression. This endowment of each gene with the information-receiving capacity through their cis-regulatory modules is essential for the response to every possible regulatory state to which it might be exposed during all phases of the life cycle and in all cell types. We present here a set of challenges addressed by our CYRENE research project aimed at studying the cis-regulatory code of the regulatory genome. The CYRENE Project is devoted to (1) the construction of a database, the cis-Lexicon, containing comprehensive information across species about experimentally validated cis-regulatory modules; and (2) the software development of a next-generation genome browser, the cis-Browser, specialized for the regulatory genome. The presentation is anchored on three main computational challenges: the Gene Naming Problem, the Consensus Sequence Bottleneck Problem, and the Logic Function Inference Problem.

  20. Integrating bioinformatic resources to predict transcription factors interacting with cis-sequences conserved in co-regulated genes

    PubMed Central

    2014-01-01

    Background Using motif detection programs it is fairly straightforward to identify conserved cis-sequences in promoters of co-regulated genes. In contrast, the identification of the transcription factors (TFs) interacting with these cis-sequences is much more elaborate. To facilitate this, we explore the possibility of using several bioinformatic and experimental approaches for TF identification. This starts with the selection of co-regulated gene sets and leads first to the prediction and then to the experimental validation of TFs interacting with cis-sequences conserved in the promoters of these co-regulated genes. Results Using the PathoPlant database, 32 up-regulated gene groups were identified with microarray data for drought-responsive gene expression from Arabidopsis thaliana. Application of the binding site estimation suite of tools (BEST) discovered 179 conserved sequence motifs within the corresponding promoters. Using the STAMP web-server, 49 sequence motifs were classified into 7 motif families for which similarities with known cis-regulatory sequences were identified. All motifs were subjected to a footprintDB analysis to predict interacting DNA binding domains from plant TF families. Predictions were confirmed by using a yeast-one-hybrid approach to select interacting TFs belonging to the predicted TF families. TF-DNA interactions were further experimentally validated in yeast and with a Physcomitrella patens transient expression system, leading to the discovery of several novel TF-DNA interactions. Conclusions The present work demonstrates the successful integration of several bioinformatic resources with experimental approaches to predict and validate TFs interacting with conserved sequence motifs in co-regulated genes. PMID:24773781

  1. Flow Cytometry-assisted Cloning of Specific Sequence Motifs from Complex 16S ribosomal RNA Gene Libraries.

    SciTech Connect

    Nielsen, J.L.; Schramm, A.; Bernhard, A.E.; van den Engh, G.J.; Stahl, D.A.

    2004-07-21

    A flow cytometry method was developed for rapid screening and recovery of cloned DNA containing common sequence motifs. This approach, termed fluorescence-activated cell sorting-assisted cloning, was used to recover sequences affiliated with a unique lineage within the Bacteroidetes not abundant in a clone library of environmental 16S rRNA genes. Retrieval and sequence analysis of phylogenetically informative genes has become a standard cultivation-independent technique to investigate microbial diversity in nature (7, 18). Genes encoding the 16S rRNA, because of the relative ease of their selective amplification, have been most frequently employed for general diversity surveys (16). Environmental studies have also focused on specific subpopulations affiliated with a phylogenetic group or identified by genes encoding specific metabolic functions (e.g., ammonia oxidation, sulfate respiration, and nitrate reduction) (8, 15, 20). However, specific populations may be of low abundance (1, 23), or the genes encoding specific metabolic functions may be insufficiently conserved to provide priming sites for general PCR amplification. Three general approaches have been used to obtain 16S rRNA sequence information from low-abundance populations: screening hundreds to thousands of clones in a general 16S rRNA gene library (21), flow cytometric sorting of a subpopulation of environmentally derived cells labeled by fluorescent in situ hybridization (FISH) (27), or selective PCR amplification using primers specific for the subpopulation (2, 23). While the first approach is simply time-consuming and tedious, the second has been restricted to fairly large and strongly fluorescent cells from aquatic samples (5, 27). The third approach often generates fragments of only a few hundred bases due to the limited number of specific priming sites. Partial sequence information often degrades analysis, obscuring or distorting the phylogenetic placement of the new sequences (11, 20). A more robust characterization of environ

  2. Rare k-mer DNA: Identification of sequence motifs and prediction of CpG island and promoter.

    PubMed

    Mohamed Hashim, Ezzeddin Kamil; Abdullah, Rosni

    2015-12-21

    Empirical analysis of k-mer DNA has proven to be an effective tool for finding unique patterns in DNA sequences, which can lead to the discovery of potential sequence motifs. In an extensive study of empirical k-mer DNA on hundreds of organisms, the researchers found that unique multi-modal k-mer spectra occur only in the genomes of organisms from the tetrapod clade, which includes all mammals. The multi-modality is caused by the formation of the two lowest modes, where the k-mers under them are referred to as the rare k-mers. The suppression of the two lowest modes (or the rare k-mers) can be attributed to the CG dinucleotide inclusions in them. Apart from that, the rare k-mers are selectively distributed in certain genomic features of CpG Island (CGI), promoter, 5' UTR, and exon. We correlated the rare k-mers with hundreds of annotated features using several bioinformatic tools, performed further intrinsic rare k-mer analyses within the correlated features, and modeled the elucidated rare k-mer clustering feature into a classifier to predict the correlated CGI and promoter features. Our correlation results show that rare k-mers are highly associated with several annotated features of CGI, promoter, 5' UTR, and open chromatin regions. Our intrinsic results show that rare k-mers have several unique topological, compositional, and clustering properties in CGI and promoter features. Finally, the performances of our RWC (rare-word clustering) method in predicting the CGI and promoter features are ranked among the top three, in eight of the CGI and promoter evaluations, among eight of the benchmarked datasets. PMID:26427337
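
    A k-mer spectrum in the sense used here is a histogram of how many distinct k-mers occur a given number of times. The sketch below shows one way to compute it and to check what share of the low-count k-mers contain a CG dinucleotide; the function names, the cutoff parameter and the toy sequence are assumptions for illustration and do not reproduce the RWC classifier.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every k-mer made only of A/C/G/T in a sequence."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):   # skip windows containing N etc.
            counts[word] += 1
    return counts


def kmer_spectrum(counts):
    """Map occurrence count -> number of distinct k-mers with that count."""
    return Counter(counts.values())


def cg_fraction(counts, max_count):
    """Fraction of distinct k-mers occurring at most max_count times
    that contain the dinucleotide CG."""
    rare = [word for word, c in counts.items() if c <= max_count]
    if not rare:
        return 0.0
    return sum("CG" in word for word in rare) / len(rare)


# Toy example; a real spectrum would use a whole genome and larger k.
toy_counts = kmer_counts("ACGACGTT", 3)
toy_spectrum = kmer_spectrum(toy_counts)
toy_rare_cg = cg_fraction(toy_counts, max_count=1)
```

    On genome-scale input, plotting the spectrum on logarithmic axes is the usual way to look for the separate modes the abstract describes.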

  3. Identification of G and P genotype-specific motifs in the predicted VP7 and VP4 amino acid sequences.

    PubMed

    Ma, Yongping

    2015-12-01

    Equine rotavirus (ERV) strain L338 (G13P[18]) has a unique G and P genotype. However, the evolutionary relationship of L338 with other ERVs is still unknown. Here whole genome analysis of the L338 ERV strain was independently performed. Its genotype constellations were determined as G13-P[18]-I6-R9-C9-M6-A6-N9-T12-E14-H11, confirming previous genotype assignments. The L338 strain only shared the P[18] and I6 genotypes with other ERVs. The nucleotide sequences of the other 9 RNA segments were different from those of cognate genes of all other group A rotavirus (RVA) strains including ERVs and formed unique phylogenetic lineages. The L338 evolutionary footprints were tentatively identified in both VP7 and VP4 amino acid sequences: two regions were found in VP7 and twelve in VP4. The conserved regions shared between L338 and other group A rotavirus strains (RVAs) indicated that L338 was more closely related genomically to animal and human RVAs other than ERVs, suggesting that L338 may not be an endogenous equine RV but may have emerged as an interspecies reassortant with other RVA strains. Furthermore, genotype-specific motifs of all 27 G and 37 P types were identified in regions 7-1a (aa 91-100) of VP7 and regions 8-1 (aa 146-151) and 8-3 (aa 113-118 and 125-135) of VP4 (VP8*). PMID:26321159

  4. Integrative visual analysis of protein sequence mutations

    PubMed Central

    2014-01-01

    Background An important aspect of studying the relationship between protein sequence, structure and function is the molecular characterization of the effect of protein mutations. To understand the functional impact of amino acid changes, the multiple biological properties of protein residues have to be considered together. Results Here, we present a novel visual approach for analyzing residue mutations. It combines different biological visualizations and integrates them with molecular data derived from external resources. To show various aspects of the biological information on different scales, our approach includes one-dimensional sequence views, three-dimensional protein structure views and two-dimensional views of residue interaction networks as well as aggregated views. The views are linked tightly and synchronized to reduce the cognitive load of the user when switching between them. In particular, the protein mutations are mapped onto the views together with further functional and structural information. We also assess the impact of individual amino acid changes by the detailed analysis and visualization of the involved residue interactions. We demonstrate the effectiveness of our approach and the developed software on the data provided for the BioVis 2013 data contest. Conclusions Our visual approach and software greatly facilitate the integrative and interactive analysis of protein mutations based on complementary visualizations. The different data views offered to the user are enriched with information about molecular properties of amino acid residues and further biological knowledge. PMID:25237389

  5. Identification of a Novel Calcium Binding Motif Based on the Detection of Sequence Insertions in the Animal Peroxidase Domain of Bacterial Proteins

    PubMed Central

    Santamaría-Hernando, Saray

    2012-01-01

    Proteins of the animal heme peroxidase (ANP) superfamily differ greatly in size since they have either one or two catalytic domains that match profile PS50292. The orf PP_2561 of Pseudomonas putida KT2440 that we have called PepA encodes a two-domain ANP. The alignment of these domains with those of PepA homologues revealed a variable number of insertions with the consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D. This motif has also been detected in the structure of pseudopilin (pdb 3G20), where it was found to be involved in Ca2+ coordination although a sequence analysis did not reveal the presence of any known calcium binding motifs in this protein. Isothermal titration calorimetry revealed that a peptide containing this consensus motif bound specifically calcium ions with affinities ranging between 33–79 µM depending on the pH. Microcalorimetric titrations of the purified N-terminal ANP-like domain of PepA revealed Ca2+ binding with a KD of 12 µM and stoichiometry of 1.25 calcium ions per protein monomer. This domain exhibited peroxidase activity after its reconstitution with heme. These data led to the definition of a novel calcium binding motif that we have termed PERCAL and which was abundantly present in animal peroxidase-like domains of bacterial proteins. Bacterial heme peroxidases thus possess two different types of calcium binding motifs, namely PERCAL and the related hemolysin type calcium binding motif, with the latter being located outside the catalytic domains and in their C-terminal end. A phylogenetic tree of ANP-like catalytic domains of bacterial proteins with PERCAL motifs, including single domain peroxidases, was divided into two major clusters, representing domains with and without PERCAL motif containing insertions. We have verified that the recently reported classification of bacterial heme peroxidases in two families (cd09819 and cd09821) is unrelated to these insertions. Sequences matching PERCAL were detected in all kingdoms of life. PMID

  6. Identification of a novel calcium binding motif based on the detection of sequence insertions in the animal peroxidase domain of bacterial proteins.

    PubMed

    Santamaría-Hernando, Saray; Krell, Tino; Ramos-González, María-Isabel

    2012-01-01

    Proteins of the animal heme peroxidase (ANP) superfamily differ greatly in size since they have either one or two catalytic domains that match profile PS50292. The orf PP_2561 of Pseudomonas putida KT2440 that we have called PepA encodes a two-domain ANP. The alignment of these domains with those of PepA homologues revealed a variable number of insertions with the consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D. This motif has also been detected in the structure of pseudopilin (pdb 3G20), where it was found to be involved in Ca(2+) coordination although a sequence analysis did not reveal the presence of any known calcium binding motifs in this protein. Isothermal titration calorimetry revealed that a peptide containing this consensus motif bound specifically calcium ions with affinities ranging between 33-79 µM depending on the pH. Microcalorimetric titrations of the purified N-terminal ANP-like domain of PepA revealed Ca(2+) binding with a K(D) of 12 µM and stoichiometry of 1.25 calcium ions per protein monomer. This domain exhibited peroxidase activity after its reconstitution with heme. These data led to the definition of a novel calcium binding motif that we have termed PERCAL and which was abundantly present in animal peroxidase-like domains of bacterial proteins. Bacterial heme peroxidases thus possess two different types of calcium binding motifs, namely PERCAL and the related hemolysin type calcium binding motif, with the latter being located outside the catalytic domains and in their C-terminal end. A phylogenetic tree of ANP-like catalytic domains of bacterial proteins with PERCAL motifs, including single domain peroxidases, was divided into two major clusters, representing domains with and without PERCAL motif containing insertions. We have verified that the recently reported classification of bacterial heme peroxidases in two families (cd09819 and cd09821) is unrelated to these insertions. Sequences matching PERCAL were detected in all kingdoms of life. PMID
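
    The consensus quoted in this abstract, G-x-D-G-x-x-[GN]-[TN]-x-D-D, can be searched for with an ordinary regular expression. The following is a minimal sketch, assuming the consensus follows the usual dash-separated convention in which x stands for any residue and brackets list the allowed residues. The function names and the toy sequence are hypothetical illustrations and do not come from the paper.

```python
import re

# Consensus exactly as quoted in the abstract.
PERCAL_CONSENSUS = "G-x-D-G-x-x-[GN]-[TN]-x-D-D"


def consensus_to_regex(consensus: str) -> str:
    """Translate a dash-separated consensus into a regular expression.

    Assumed notation: 'x' is any residue, '[..]' lists allowed residues,
    and a single letter must match exactly.
    """
    parts = []
    for position in consensus.split("-"):
        if position.lower() == "x":
            parts.append("[A-Z]")
        else:
            parts.append(position)  # literal residue or a [..] class
    return "".join(parts)


def find_motif(sequence: str, consensus: str = PERCAL_CONSENSUS):
    """Return (start, matched_text) for every match, 0-based.

    A lookahead is used so that overlapping matches are also reported.
    """
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical toy sequence, not taken from PepA or any real protein.
toy = "MKTGADGSLGTWDDAAL"
print(find_motif(toy))
```

    A plain pattern match of this kind only flags candidate sites; it says nothing about whether a matching segment actually binds calcium, which the authors established experimentally.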

  7. Temporal Integration Windows for Naturalistic Visual Sequences

    PubMed Central

    Fairhall, Scott L.; Albi, Angela; Melcher, David

    2014-01-01

    There is increasing evidence that the brain possesses mechanisms to integrate incoming sensory information as it unfolds over time-periods of 2–3 seconds. The ubiquity of this mechanism across modalities, tasks, perception and production has led to the proposal that it may underlie our experience of the subjective present. A critical test of this claim is that this phenomenon should be apparent in naturalistic visual experiences. We tested this using movie-clips as a surrogate for our day-to-day experience, temporally scrambling them to require (re-) integration within and beyond the hypothesized 2–3 second interval. Two independent experiments demonstrate a step-wise increase in the difficulty to follow stimuli at the hypothesized 2–3 second scrambling condition. Moreover, only this difference could not be accounted for by low-level visual properties. This provides the first evidence that this 2–3 second integration window extends to complex, naturalistic visual sequences more consistent with our experience of the subjective present. PMID:25010517

  8. Formation and Dissociation of the Interstrand i-Motif by the Sequences d(XnC4Ym) Monitored with Electrospray Ionization Mass Spectrometry

    NASA Astrophysics Data System (ADS)

    Cao, Yanwei; Qin, Yujiao; Bruist, Michael; Gao, Shang; Wang, Bing; Wang, Huixin; Guo, Xinhua

    2015-06-01

    Formation and dissociation of the interstrand i-motifs by DNA with the sequence d(XnC4Ym) (X and Y represent thymine, adenine, or guanine, and n, m range from 0 to 2) are studied with electrospray ionization mass spectrometry (ESI-MS), circular dichroism (CD), and UV spectrophotometry. The ion complexes detected in the gas phase and the melting temperatures (Tm) obtained in solution show that a non-C base residue located at the 5' end favors formation of the four-stranded structures, with T > A > G for imparting stability. By comparison, no rule is found when a non-C base is located at the 3' end. Detection of penta- and hexa-stranded ions indicates the formation of i-motifs with more than four strands. In addition, the i-motifs seen in our mass spectra are accompanied by single-, double-, and triple-stranded ions, and the trimeric ions were always less abundant during the annealing and heat-induced dissociation processes of the DNA strands in solution (pH = 4.5). This provides direct evidence of a strand-by-strand formation and dissociation pathway of the interstrand i-motif and suggests that formation of the triple strands is the rate-limiting step. In contrast, the trimeric ions are abundant when the tetramolecular ions are subjected to collision-induced dissociation (CID) in the gas phase, suggesting different dissociation behaviors of the interstrand i-motif in the gas phase and in solution. Furthermore, hysteretic UV absorption melting and cooling curves reveal an irreversible dissociation and association kinetic process of the interstrand i-motif in solution.

  9. Integration of Cyanine, Merocyanine and Styryl Dye Motifs with Synthetic Bacteriochlorins.

    PubMed

    Yang, Eunkyung; Zhang, Nuonuo; Krayer, Michael; Taniguchi, Masahiko; Diers, James R; Kirmaier, Christine; Lindsey, Jonathan S; Bocian, David F; Holten, Dewey

    2016-01-01

    Understanding the effects of substituents on spectral properties is essential for the rational design of tailored bacteriochlorins for light-harvesting and other applications. Toward this goal, three new bacteriochlorins containing previously unexplored conjugating substituents have been prepared and characterized. The conjugating substituents include two positively charged species, 2-(N-ethyl 2-quinolinium)vinyl- (B-1) and 2-(N-ethyl 4-pyridinium)vinyl- (B-2), and a neutral group, acroleinyl- (B-3); the charged species resemble cyanine (or styryl) dye motifs whereas the neutral unit resembles a merocyanine dye motif. The three bacteriochlorins are examined by static and time-resolved absorption and emission spectroscopy and density functional theoretical calculations. B-1 and B-2 have Qy absorption bathochromically shifted well into the NIR region (822 and 852 nm), farther than B-3 (793 nm) and other 3,13-disubstituted bacteriochlorins studied previously. B-1 and B-2 have broad Qy absorption and fluorescence features with large peak separation (Stokes shift), low fluorescence yields, and shortened S1 (Qy) excited-state lifetimes (~700 ps and ~100 ps). More typical spectra and S1 lifetime (~2.3 ns) are found for B-3. The combined photophysical and molecular-orbital characteristics suggest the altered spectra and enhanced nonradiative S1 decay of B-1 and B-2 derive from excited-state configurations in which electron density is shifted between the macrocycle and the substituents. PMID:26505265

  10. Modeling and analysis of MH1 domain of Smads and their interaction with promoter DNA sequence motif.

    PubMed

    Makkar, Pooja; Metpally, Raghu Prasad R; Sangadala, Sreedhara; Reddy, Boojala Vijay B

    2009-04-01

    The Smads are a group of related intracellular proteins critical for transmitting the signals to the nucleus from the transforming growth factor-beta (TGF-beta) superfamily of proteins at the cell surface. The prototypic members of the Smad family, Mad and Sma, were first described in Drosophila and Caenorhabditis elegans, respectively. Related proteins in Xenopus, humans, mice and rats were subsequently identified, and are now known as Smads. Smad protein family members act downstream in the TGF-beta signaling pathway mediating various biological processes, including cell growth, differentiation, matrix production, apoptosis and development. Smads range from about 400-500 amino acids in length and are grouped into the receptor-regulated Smads (R-Smads), the common Smads (Co-Smads) and the inhibitory Smads (I-Smads). There are eight Smads in mammals: Smad1/5/8 (bone morphogenetic protein regulated) and Smad2/3 (TGF-beta/activin regulated) are termed R-Smads, Smad4 is denoted as Co-Smad and Smad6/7 are inhibitory Smads. A typical Smad consists of a conserved N-terminal Mad Homology 1 (MH1) domain and a C-terminal Mad Homology 2 (MH2) domain connected by a proline-rich linker. The MH1 domain plays a key role in DNA recognition and also facilitates the binding of Smad4 to the phosphorylated C-terminus of R-Smads to form an activated complex. The MH2 domain exhibits transcriptional activation properties. In order to understand the structural basis of interaction of various Smads with their target proteins and the promoter DNA, we modeled the MH1 domain of the remaining mammalian Smads based on known crystal structures of the Smad3-MH1 domain bound to the GTCT Smad box DNA sequence (1OZJ). We generated a B-DNA structure using average base-pair parameters of Twist, Tilt, Roll and base Slide angles. We then modeled the interaction pose of the MH1 domain of Smad1/5/8 with their corresponding DNA sequence motif GCCG. These models provide the structural basis towards understanding functional

  11. Using Weeder, Pscan, and PscanChIP for the Discovery of Enriched Transcription Factor Binding Site Motifs in Nucleotide Sequences.

    PubMed

    Zambelli, Federico; Pesole, Graziano; Pavesi, Giulio

    2014-01-01

    One of the greatest challenges facing modern molecular biology is understanding the complex mechanisms regulating gene expression. A fundamental step in this process requires the characterization of sequence motifs involved in the regulation of gene expression at transcriptional and post-transcriptional levels. In particular, transcription is modulated by the interaction of transcription factors (TFs) with their corresponding binding sites. Weeder, Pscan, and PscanChIP are software tools freely available for noncommercial users as a stand-alone or Web-based applications for the automatic discovery of conserved motifs in a set of DNA sequences likely to be bound by the same TFs. Input for the tools can be promoter sequences from co-expressed or co-regulated genes (for which Weeder and Pscan are suitable), or regions identified through genome wide ChIP-seq or similar experiments (Weeder and PscanChIP). The motifs are either found by a de novo approach (Weeder) or by using descriptors of the binding specificity of TFs (Pscan and PscanChIP). PMID:25199791

  12. Loop Sequence Context Influences the Formation and Stability of the i-Motif for DNA Oligomers of Sequence (CCCXXX)4, where X = A and/or T, under Slightly Acidic Conditions.

    PubMed

    McKim, Mikeal; Buxton, Alexander; Johnson, Courtney; Metz, Amanda; Sheardy, Richard D

    2016-08-11

    The structure and stability of DNA is highly dependent upon the sequence context of the bases (A, G, C, and T) and the environment under which the DNA is prepared (e.g., buffer, temperature, pH, ionic strength). Understanding the factors that influence structure and stability of the i-motif conformation can lead to the design of DNA sequences with highly tunable properties. We have been investigating the influence of pH and temperature on the conformations and stabilities for all permutations of the DNA sequence (CCCXXX)4, where X = A and/or T, using spectroscopic approaches. All oligomers undergo transitions from single-stranded structures at pH 7.0 to i-motif conformations at pH 5.0 as evidenced by circular dichroism (CD) studies. These folded structures possess stacked C:CH(+) base pairs joined by loops of 5'-XXX-3'. Although the pH at the midpoint of the transition (pHmp) varies slightly with loop sequence, the linkage between pH and log K for the proton induced transition is highly loop sequence dependent. All oligomers also undergo the thermally induced i-motif to single-strand transition at pH 5.0 as the temperature is increased from 25 to 95 °C. The temperature at the midpoint of this transition (Tm) is also highly dependent on loop sequence context effects. For seven of eight possible permutations, the pH induced, and thermally induced transitions appear to be highly cooperative and two state. Analysis of the CD optical melting profiles via a van't Hoff approach reveals sequence-dependent thermodynamic parameters for the unfolding as well. Together, these data reveal that the i-motif conformation exhibits exquisite sensitivity to loop sequence context with respect to formation and stability. PMID:27438583
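
    The "eight possible permutations" mentioned above follow from three loop positions with two choices each (2^3 = 8). The following is a minimal sketch that enumerates them, assuming the same XXX loop is repeated in all four CCCXXX units, as the notation (CCCXXX)4 suggests. The function names are hypothetical.

```python
from itertools import product


def loop_variants(bases: str = "AT", loop_length: int = 3):
    """List every loop sequence of the given length over the given bases."""
    return ["".join(combo) for combo in product(bases, repeat=loop_length)]


def oligomer(loop: str, repeats: int = 4) -> str:
    """Build (CCC + loop) repeated `repeats` times, written 5' to 3'."""
    return ("CCC" + loop) * repeats


variants = loop_variants()
print(len(variants))  # 2 ** 3 loop sequences
for loop in variants:
    print(loop, oligomer(loop))
```

    The abstract does not say which of the eight loops is the exception to two-state behavior, so the sketch only lists the candidates.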

  13. Secondary structure model of the Mason-Pfizer monkey virus 5' leader sequence: identification of a structural motif common to a variety of retroviruses.

    PubMed Central

    Harrison, G P; Hunter, E; Lever, A M

    1995-01-01

    A stable secondary structure model is presented for the region 3' of the primer-binding site to 130 bases into the gag sequence of the prototype type D retrovirus Mason-Pfizer monkey virus. Using biochemical probing of RNA from this region in association with free energy minimization, we have identified a stem-loop structure in the region, which from other studies has been shown to be important for genomic RNA encapsidation. The structure involves a highly stable stem of five G-C pairs terminating in a heptaloop. Comparison of the Mason-Pfizer monkey virus structure with one predicted for squirrel monkey retrovirus demonstrates an identical stem and a common ACC motif in the loop. Free energy studies of the secondary structure of the 5' regions of eight other retroviruses predict stem loops which have similar GAYC motifs. We believe this may represent a common structural and sequence motif which among other functions may be involved in genomic RNA packaging in these viruses. PMID:7884866

  14. Identification of the First Prokaryotic Collagen Sequence Motif That Mediates Binding to Human Collagen Receptors, Integrins α2β1 and α11β1

    PubMed Central

    Caswell, Clayton C.; Barczyk, Malgorzata; Keene, Douglas R.; Lukomska, Ewa; Gullberg, Donald E.; Lukomski, Slawomir

    2008-01-01

    Many pathogenic bacteria interact with human integrins to enter host cells and to augment host colonization. Group A Streptococcus (GAS) employs molecular mimicry by direct interactions between the cell surface streptococcal collagen-like protein-1 (Scl1) and the human collagen receptor, integrin α2β1. The collagen-like (CL) region of the Scl1 protein mediates integrin binding, although the integrin-binding motif was not defined. Here, we used molecular cloning and site-directed mutagenesis to identify the GLPGER sequence as the α2β1 and the α11β1 binding motif. Electron microscopy experiments mapped binding sites of the recombinant α2-integrin-inserted domain to the GLPGER motif of the recombinant Scl (rScl) protein. rScl proteins and a synthetic peptide harboring the GLPGER motif mediated the attachment of C2C12-α2+ myoblasts expressing the α2β1 integrin as the sole collagen receptor. The C2C12-α11+ myoblasts expressing the α11β1 integrin also attached to GLPGER-harboring rScl proteins. Furthermore, the C2C12-α11+ cells attached to rScl1 more efficiently than C2C12-α2+ cells, suggesting that the α11β1 integrin may have a higher binding affinity for the GLPGER sequence. Human endothelial cells and dermal fibroblasts adhered to rScl proteins, indicating that multiple cell types may recognize and bind the Scl proteins via their collagen receptors. This work is a stepping stone toward defining the utilization of collagen receptors by microbial collagen-like proteins that are expressed by pathogenic bacteria. PMID:18990704

  15. Functional analysis reveals the possible role of the C-terminal sequences and PI motif in the function of lily (Lilium longiflorum) PISTILLATA (PI) orthologues

    PubMed Central

    Chen, Ming-Kun; Hsieh, Wen-Ping; Yang, Chang-Hsien

    2012-01-01

    Two lily (Lilium longiflorum) PISTILLATA (PI) genes, Lily MADS Box Gene 8 and 9 (LMADS8/9), were characterized. LMADS9 lacked 29 C-terminal amino acids including the PI motif that was present in LMADS8. Both LMADS8/9 mRNAs were prevalent in the first and second whorl tepals during all stages of development and were expressed in the stamen only in young flower buds. LMADS8/9 could both form homodimers, but the ability of LMADS8 homodimers to bind to CArG1 was relatively stronger than that of LMADS9 homodimers. 35S:LMADS8 completely, and 35S:LMADS9 only partially, rescued the second whorl petal formation and partially converted the first whorl sepal into a petal-like structure in Arabidopsis pi-1 mutants. Ectopic expression of LMADS8-C (with deletion of the 29 amino acids of the C-terminal sequence) or LMADS8-PI (with only the PI motif deleted) only partially rescued petal formation in pi mutants, which was similar to what was observed in 35S:LMADS9/pi plants. In contrast, 35S:LMADS9+L8C (with the addition of the 29 amino acids of the LMADS8 C-terminal sequence) or 35S:LMADS9+L8PI (with the addition of the LMADS8 PI motif) demonstrated an increased ability to rescue petal formation in pi mutants, which was similar to what was observed in 35S:LMADS8/pi plants. Furthermore, ectopic expression of LMADS8-M (with the MADS domain truncated) generated more severe dominant negative phenotypes than those seen in 35S:LMADS9-M flowers. These results revealed that the 29 amino acids including the PI motif in the C-terminal region of the lily PI orthologue are valuable for its function in regulating perianth organ formation. PMID:22068145

  16. Sequence and spatiotemporal expression analysis of CLE-motif containing genes from the reniform nematode (Rotylenchulus reniformis Linford & Oliveira)

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The reniform nematode, Rotylenchulus reniformis, is a sedentary semi-endoparasitic species with a host range that encompasses more than 77 plant families. Nematode effector proteins containing plant-ligand motifs similar to CLAVATA3/ESR (CLE) peptides have been identified in the Heterodera, Globode...

  17. Highly Divergent Integration Profile of Adeno-Associated Virus Serotype 5 Revealed by High-Throughput Sequencing

    PubMed Central

    Janovitz, Tyler; Oliveira, Thiago; Sadelain, Michel

    2014-01-01

    Abstract: Adeno-associated virus serotype 5 (AAV-5) is a human parvovirus that infects a high percentage of the population. It is the most divergent AAV: the DNA sequence cleaved by the viral endonuclease is distinct from all other described serotypes and, uniquely, AAV-5 does not cross-complement the replication of other serotypes. In contrast to the well-characterized integration of AAV-2, no published studies have investigated the genomic integration of AAV-5. In this study, we analyzed more than 660,000 AAV-5 integration junctions using high-throughput integrant capture sequencing of infected human cells. The integration activity of AAV-5 was 99.7% distinct from AAV-2 and favored intronic sequences. Genome-wide integration was highly correlated with viral replication protein binding and endonuclease sites, and a 39-bp consensus integration motif was revealed that included these features. Algorithmic scanning identified 126 AAV-5 hot spots, the largest of which encompassed 3.3% of all integration events. The unique aspects of AAV-5 integration may provide novel tools for biotechnology and gene therapy. Importance: Viral integration into the host genome is an important aspect of virus host cell biology. Genomic integration studies of the small single-stranded AAVs have largely focused on site preferential integration of AAV-2, which depends on the viral replication protein (Rep). We have now established the first genome wide integration profile of the highly divergent AAV-5 serotype. Using integrant capture sequencing, more than 600,000 AAV-5 integration junctions in human cells were analyzed. AAV-5 integration hot spots were 99.7% distinct from AAV-2. Integration favored intronic sequences, occurred on all chromosomes, and integration hot spot distribution was correlated with human genomic GAGC repeats and transcriptional activity. These features support expansion of AAV-5 based vectors for gene transfer considerations. PMID:24335317

  18. PscanChIP: finding over-represented transcription factor-binding site motifs and their correlations in sequences from ChIP-Seq experiments

    PubMed Central

    Zambelli, Federico; Pesole, Graziano; Pavesi, Giulio

    2013-01-01

    Chromatin immunoprecipitation followed by sequencing with next-generation technologies (ChIP-Seq) has become the de facto standard for building genome-wide maps of regions bound by a given transcription factor (TF). The regions identified, however, have to be further analyzed to determine the actual DNA-binding sites for the TF, as well as sites for other TFs belonging to the same TF complex or in general co-operating or interacting with it in transcription regulation. PscanChIP is a web server that, starting from a collection of genomic regions derived from a ChIP-Seq experiment, scans them using motif descriptors like JASPAR or TRANSFAC position-specific frequency matrices, or descriptors uploaded by users, and it evaluates both motif enrichment and positional bias within the regions according to different measures and criteria. PscanChIP can successfully identify not only the actual binding sites for the TF investigated by a ChIP-Seq experiment but also secondary motifs corresponding to other TFs that tend to bind the same regions, and, if present, precise positional correlations among their respective sites. The web interface is free for use, and there is no login requirement. It is available at http://www.beaconlab.it/pscan_chip_dev. PMID:23748563
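
    Scanning a region with a position-specific frequency matrix, as described above, amounts to scoring every window of the motif's width. The following is a minimal sketch of that general idea using log-odds scores against a uniform background. It is not PscanChIP's own scoring scheme, and it does not cover the enrichment or positional-bias statistics the abstract mentions. The matrix values, the pseudocount, the background frequency and all names are hypothetical.

```python
import math

BASES = "ACGT"


def log_odds_matrix(frequencies, background=0.25, pseudocount=0.01):
    """Convert per-position base frequencies into log2-odds scores.

    `frequencies` is a list of dicts, one per motif position, mapping
    each base to its observed frequency. A small pseudocount avoids
    taking the logarithm of zero.
    """
    matrix = []
    for column in frequencies:
        total = sum(column[base] + pseudocount for base in BASES)
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in BASES
        })
    return matrix


def scan(sequence, matrix):
    """Score every window of the motif's width; returns (start, score) pairs.

    The sequence must contain only A, C, G and T (other symbols raise KeyError).
    """
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits


def best_hit(sequence, matrix):
    """Return the highest-scoring (start, score); needs len(sequence) >= motif width."""
    return max(scan(sequence, matrix), key=lambda hit: hit[1])


# Hypothetical 4-column matrix that prefers T, G, A, C in that order.
toy_frequencies = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
]
toy_matrix = log_odds_matrix(toy_frequencies)
print(best_hit("AATGACGG", toy_matrix))
```

    A real analysis would also scan the reverse strand and compare scores against a background set of regions before calling a motif over-represented.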

  19. A unique transactivation sequence motif is found in the carboxyl-terminal domain of the single-strand-binding protein FBP.

    PubMed Central

    Duncan, R; Collins, I; Tomonaga, T; Zhang, T; Levens, D

    1996-01-01

    The far-upstream element-binding protein (FBP) is one of several recently described factors which bind to a single strand of DNA in the 5' region of the c-myc gene. Although cotransfection of FBP increases expression from a far-upstream element-bearing c-myc promoter reporter, the mechanism of this stimulation is heretofore unknown. Can a single-strand-binding protein function as a classical transactivator, or are these proteins restricted to stabilizing or altering the conformation of DNA in an architectural role? Using chimeric GAL4-FBP fusion proteins we have shown that the carboxyl-terminal region (residues 448 to 644) is a potent transcriptional activation domain. This region contains three copies of a unique amino acid sequence motif containing tyrosine diads. Analysis of deletion mutants demonstrated that a single tyrosine motif alone (residues 609 to 644) was capable of activating transcription. The activation property of the C-terminal domain is repressed by the N-terminal 107 amino acids of FBP. These results show that FBP contains a transactivation domain which can function alone, suggesting that FBP contributes directly to c-myc transcription while bound to a single-strand site. Furthermore, activation is mediated by a new motif which can be negatively regulated by a repression domain of FBP. PMID:8628294

  20. Integration of bioinformatics and synthetic promoters leads to the discovery of novel elicitor-responsive cis-regulatory sequences in Arabidopsis.

    PubMed

    Koschmann, Jeannette; Machens, Fabian; Becker, Marlies; Niemeyer, Julia; Schulze, Jutta; Bülow, Lorenz; Stahl, Dietmar J; Hehl, Reinhard

    2012-09-01

    A combination of bioinformatic tools, high-throughput gene expression profiles, and the use of synthetic promoters is a powerful approach to discover and evaluate novel cis-sequences in response to specific stimuli. With Arabidopsis (Arabidopsis thaliana) microarray data annotated to the PathoPlant database, 732 different queries with a focus on fungal and oomycete pathogens were performed, leading to 510 up-regulated gene groups. Using the binding site estimation suite of tools, BEST, 407 conserved sequence motifs were identified in promoter regions of these coregulated gene sets. Motif similarities were determined with STAMP, classifying the 407 sequence motifs into 37 families. A comparative analysis of these 37 families with the AthaMap, PLACE, and AGRIS databases revealed similarities to known cis-elements but also led to the discovery of cis-sequences not yet implicated in pathogen response. Using a parsley (Petroselinum crispum) protoplast system and a modified reporter gene vector with an internal transformation control, 25 elicitor-responsive cis-sequences from 10 different motif families were identified. Many of the elicitor-responsive cis-sequences also drive reporter gene expression in an Agrobacterium tumefaciens infection assay in Nicotiana benthamiana. This work significantly increases the number of known elicitor-responsive cis-sequences and demonstrates the successful integration of a diverse set of bioinformatic resources combined with synthetic promoter analysis for data mining and functional screening in plant-pathogen interaction. PMID:22744985

  1. Integration of Bioinformatics and Synthetic Promoters Leads to the Discovery of Novel Elicitor-Responsive cis-Regulatory Sequences in Arabidopsis

    PubMed Central

    Koschmann, Jeannette; Machens, Fabian; Becker, Marlies; Niemeyer, Julia; Schulze, Jutta; Bülow, Lorenz; Stahl, Dietmar J.; Hehl, Reinhard

    2012-01-01

    A combination of bioinformatic tools, high-throughput gene expression profiles, and the use of synthetic promoters is a powerful approach to discover and evaluate novel cis-sequences in response to specific stimuli. With Arabidopsis (Arabidopsis thaliana) microarray data annotated to the PathoPlant database, 732 different queries with a focus on fungal and oomycete pathogens were performed, leading to 510 up-regulated gene groups. Using the binding site estimation suite of tools, BEST, 407 conserved sequence motifs were identified in promoter regions of these coregulated gene sets. Motif similarities were determined with STAMP, classifying the 407 sequence motifs into 37 families. A comparative analysis of these 37 families with the AthaMap, PLACE, and AGRIS databases revealed similarities to known cis-elements but also led to the discovery of cis-sequences not yet implicated in pathogen response. Using a parsley (Petroselinum crispum) protoplast system and a modified reporter gene vector with an internal transformation control, 25 elicitor-responsive cis-sequences from 10 different motif families were identified. Many of the elicitor-responsive cis-sequences also drive reporter gene expression in an Agrobacterium tumefaciens infection assay in Nicotiana benthamiana. This work significantly increases the number of known elicitor-responsive cis-sequences and demonstrates the successful integration of a diverse set of bioinformatic resources combined with synthetic promoter analysis for data mining and functional screening in plant-pathogen interaction. PMID:22744985

  2. Sequence and peptide-binding motif for a variant of HLA-A*0214 (A*02142) in an HIV-1-resistant individual from the Nairobi Sex Worker cohort.

    PubMed

    Luscher, M A; MacDonald, K S; Bwayo, J J; Plummer, F A; Barber, B H

    2001-02-01

    As part of the ongoing study of natural HIV-1 resistance in the women of the Nairobi Sex Workers' study, we have examined a resistance-associated HLA class I allele at the molecular level. Typing by polymerase chain reaction using sequence-specific primers determined that this molecule is closely related to HLA-A*0214, one of a family of HLA-A2 supertype alleles which correlate with HIV-1 resistance in this population. Direct nucleotide sequencing shows that this molecule differs from A*0214, having a silent nucleotide substitution. We therefore propose to designate it HLA-A*02142. We have determined the peptide-binding motif of HLA-A*0214/02142 by peptide elution and bulk Edman degradative sequencing. The resulting motif, X-[Q,V]-X-X-X-K-X-X-[V,L], includes lysine as an anchor at position 6. The data complement available information on the peptide-binding characteristics of this molecule, and will be of use in identifying antigenic peptides from HIV-1 and other pathogens. PMID:11261925

  3. Sequence motif upstream of the Hendra virus fusion protein cleavage site is not sufficient to promote efficient proteolytic processing

    SciTech Connect

    Craft, Willie Warren; Dutch, Rebecca Ellis

    2005-10-10

    The Hendra virus fusion (HeV F) protein is synthesized as a precursor, F(0), and proteolytically cleaved into the mature F(1) and F(2) heterodimer, following an HDLVDGVK(109) motif. This cleavage event is required for fusogenic activity. To determine the amino acid requirements for processing of the HeV F protein, we constructed multiple mutants. Individual and simultaneous alanine substitutions of the eight residues immediately upstream of the cleavage site did not eliminate processing. A chimeric SV5 F protein in which the furin site was substituted for the VDGVK(109) motif of the HeV F protein was not processed but was expressed on the cell surface. Another chimeric SV5 F protein containing the HDLVDGVK(109) motif of the HeV F protein underwent partial cleavage. These data indicate that the upstream region can play a role in protease recognition, but is neither absolutely required nor sufficient for efficient processing of the HeV F protein.

  4. GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions.

    PubMed

    Besemer, J; Lomsadze, A; Borodovsky, M

    2001-06-15

    Improving the accuracy of prediction of gene starts is one of a few remaining open problems in computer prediction of prokaryotic genes. Its difficulty is caused by the absence of relatively strong sequence patterns identifying true translation initiation sites. In the current paper we show that the accuracy of gene start prediction can be improved by combining models of protein-coding and non-coding regions and models of regulatory sites near gene start within an iterative Hidden Markov model based algorithm. The new gene prediction method, called GeneMarkS, utilizes a non-supervised training procedure and can be used for a newly sequenced prokaryotic genome with no prior knowledge of any protein or rRNA genes. The GeneMarkS implementation uses an improved version of the gene finding program GeneMark.hmm, heuristic Markov models of coding and non-coding regions and the Gibbs sampling multiple alignment program. GeneMarkS predicted precisely 83.2% of the translation starts of GenBank annotated Bacillus subtilis genes and 94.4% of translation starts in an experimentally validated set of Escherichia coli genes. We have also observed that GeneMarkS detects prokaryotic genes, in terms of identifying open reading frames containing real genes, with an accuracy matching the level of the best currently used gene detection methods. Accurate translation start prediction, in addition to the refinement of protein sequence N-terminal data, provides the benefit of precise positioning of the sequence region situated upstream to a gene start. Therefore, sequence motifs related to transcription and translation regulatory sites can be revealed and analyzed with higher precision. These motifs were shown to possess a significant variability, the functional and evolutionary connections of which are discussed. PMID:11410670

  5. Analysis of BAC-end sequences in common bean (Phaseolus vulgaris L.) towards the development and characterization of long motifs SSRs.

    PubMed

    Müller, Bárbara Salomão de Faria; Sakamoto, Tetsu; de Menezes, Ivandilson Pessoa Pinto; Prado, Guilherme Souza; Martins, Wellington Santos; Brondani, Claudio; de Barros, Everaldo Gonçalves; Vianello, Rosana Pereira

    2014-11-01

    The increasing volume of genomic data on Phaseolus vulgaris has contributed to its importance as a model genetic species and positively affected the investigation of other legumes of scientific and economic value. To expand and gain a more in-depth knowledge of the common bean genome, the ends of a number of bacterial artificial chromosomes (BACs) were sequenced and annotated, and the presence of repetitive sequences was determined. In total, 52,270 BESs (BAC-end sequences), equivalent to 32 Mbp (~6 %) of the genome, were processed. In total, 3,789 BES-SSRs were identified, with a distribution of one SSR (simple sequence repeat) per 8.36 kbp, and 2,000 were suitable for the development of SSRs, of which 194 were evaluated in low-resolution screening. Of 40 BES-SSRs based on long-motif SSRs (≥ trinucleotides) analyzed in high-resolution genotyping, 34 showed equally good amplification for the Andean and Mesoamerican gene pools, exhibiting an average gene diversity (HE) of 0.490 and 5.59 alleles/locus; six of these, classified as Class I, showed HE ≥ 0.7. The PCoA and structure analyses made it possible to discriminate the gene pools (K = 2, FST = 0.733). Of the 52,270 BESs, 2 % corresponded to transcription factors and 3 % to transposable elements. Putative functions were identified for 24,321 BESs, and functional categories (gene ontology) were assigned to 19,363. This study identified highly polymorphic BES-SSRs containing tri- to hexanucleotide motifs and brought together relevant genetic characteristics useful for breeding programs. Additionally, the BESs were incorporated into the international genome-sequencing project for the common bean. PMID:25164100
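
    To illustrate what a search for tri- to hexanucleotide SSRs involves, the sketch below finds perfect tandem repeats with a repeat unit of 3 to 6 bases. It is not the pipeline used in the study: the function name, the minimum of four repeats and the exclusion of single-base runs are assumptions made for this example.

```python
import re

def find_ssrs(seq, min_unit=3, max_unit=6, min_repeats=4):
    """Return (0-based start, repeat unit, repeat count) for perfect SSRs.

    min_repeats is an assumed threshold, not a value taken from the abstract.
    """
    seq = seq.upper()
    # A lazily matched unit of 3-6 bases followed by at least
    # (min_repeats - 1) further copies of the same unit.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        if len(set(unit)) == 1:
            continue  # skip single-base runs such as AAAAAAAAAAAA
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits

# Hypothetical toy sequence with five CAG repeats.
print(find_ssrs("GGCTT" + "CAG" * 5 + "TTGACC"))
```

    Because the unit is matched lazily, a repeat such as CAGCAGCAG... is reported with the shortest unit (CAG) rather than as a hexanucleotide repeat.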

  6. Structural alphabet motif discovery and a structural motif database.

    PubMed

    Ku, Shih-Yen; Hu, Yuh-Jyh

    2012-01-01

    This study proposes a general framework for structural motif discovery. The framework is based on a modular design in which the system components can be modified or replaced independently to increase its applicability to various studies. It is a two-stage approach that first converts protein 3D structures into structural alphabet sequences, and then applies a sequence motif-finding tool to these sequences to detect conserved motifs. We named the structural motif database we built the SA-Motifbase, which provides the structural information conserved at different hierarchical levels in SCOP. For each motif, SA-Motifbase presents its 3D view; alphabet letter preference; alphabet letter frequency distribution; and the significance. SA-Motifbase is available at http://bioinfo.cis.nctu.edu.tw/samotifbase/. PMID:22099701

  7. Role of GxxxG Motifs in Transmembrane Domain Interactions.

    PubMed

    Teese, Mark G; Langosch, Dieter

    2015-08-25

    Transmembrane (TM) helices of integral membrane proteins can facilitate strong and specific noncovalent protein-protein interactions. Mutagenesis and structural analyses have revealed numerous examples in which the interaction between TM helices of single-pass membrane proteins is dependent on a GxxxG or (small)xxx(small) motif. It is therefore tempting to use the presence of these simple motifs as an indicator of TM helix interactions. In this Current Topic review, we point out that these motifs are quite common, with more than 50% of single-pass TM domains containing a (small)xxx(small) motif. However, the actual interaction strength of motif-containing helices depends strongly on sequence context and membrane properties. In addition, recent studies have revealed several GxxxG-containing TM domains that interact via alternative interfaces involving hydrophobic, polar, aromatic, or even ionizable residues that do not form recognizable motifs. In multipass membrane proteins, GxxxG motifs can be important for protein folding, and not just oligomerization. Our current knowledge thus suggests that the presence of a GxxxG motif alone is a weak predictor of protein dimerization in the membrane. PMID:26244771
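
    The review's point that these motifs are common can be checked on any set of TM segments by simple pattern counting. The sketch below is one way to do that. It assumes that "small" means glycine, alanine or serine (the abstract does not define the set), and the segments shown are invented toy sequences rather than real TM domains.

```python
import re

SMALL = "GAS"  # assumed definition of "small" residues for this example

# Zero-width lookaheads so that overlapping motif positions are all reported.
GXXXG = re.compile(r"(?=G...G)")
SMALL_XXX_SMALL = re.compile(r"(?=[%s]...[%s])" % (SMALL, SMALL))

def motif_positions(segment, pattern):
    """Return the 0-based start of every motif occurrence in one segment."""
    return [m.start() for m in pattern.finditer(segment.upper())]

def fraction_with_motif(segments, pattern):
    """Fraction of segments that contain the motif at least once."""
    if not segments:
        return 0.0
    hits = sum(1 for s in segments if pattern.search(s.upper()))
    return hits / len(segments)

# Hypothetical toy segments.
toy_segments = ["LLIGVLAGLLSL", "LLLLVVVVIIII", "ILASLLVALLFL"]
print(motif_positions(toy_segments[0], GXXXG))
print(fraction_with_motif(toy_segments, SMALL_XXX_SMALL))
```

    As the abstract stresses, a hit from such a scan is only a weak indicator: interaction strength depends on sequence context and membrane properties.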

  8. Relation between mRNA expression and sequence information in Desulfovibrio vulgaris: Combinatorial contributions of upstream regulatory motifs and coding sequence features to variations in mRNA abundance

    SciTech Connect

    Wu, Gang; Nie, Lei; Zhang, Weiwen

    2006-05-26

    The context-dependent expression of genes is at the core of biological activities, and significant attention has been given to identifying the various factors contributing to gene expression at the genomic scale. However, so far this type of analysis has focused either on the relation between mRNA expression and non-coding sequence features such as upstream regulatory motifs, or on the correlation between mRNA abundance and non-random features in coding sequences (e.g. codon usage and amino acid usage). In this study, multiple regression analyses of mRNA abundance and all sequence information in Desulfovibrio vulgaris were performed, with the goal of investigating how much coding and non-coding sequence features contribute to the variations in mRNA expression, and in what manner they act together...

  9. Clinical Integration of Next Generation Sequencing Technology

    PubMed Central

    Gullapalli, R.R.; Lyons-Weiler, M.; Petrosko, P.; Dhir, R.; Becich, M.J.; LaFramboise, W.A.

    2012-01-01

    Recent technological advances in Next Generation Sequencing (NGS) methods have substantially reduced cost and operational complexity leading to the production of bench top sequencers and commercial software solutions for implementation in small research and clinical laboratories. This chapter summarizes requirements and hurdles to the successful implementation of these systems including 1) calibration, validation and optimization of the instrumentation, experimental paradigm and primary readout, 2) secure transfer, storage and secondary processing of the data, 3) implementation of software tools for targeted analysis, and 4) training of research and clinical personnel to evaluate data fidelity and interpret the molecular significance of the genomic output. In light of the commercial and technological impetus to bring NGS technology into the clinical domain, it is critical that novel tests incorporate rigid protocols with built-in calibration standards and that data transfer and processing occur under exacting security measures for interpretation by clinicians with specialized training in molecular diagnostics. PMID:23078661

  10. Identification of a Novel Sequence Motif Recognized by the Ankyrin Repeat Domain of zDHHC17/13 S-Acyltransferases.

    PubMed

    Lemonidis, Kimon; Sanchez-Perez, Maria C; Chamberlain, Luke H

    2015-09-01

    S-Acylation is a major post-translational modification affecting several cellular processes. It is particularly important for neuronal functions. This modification is catalyzed by a family of transmembrane S-acyltransferases that contain a conserved zinc finger DHHC (zDHHC) domain. Typically, eukaryote genomes encode 7-24 distinct zDHHC enzymes, with two members also harboring an ankyrin repeat (AR) domain at their cytosolic N termini. The AR domain of zDHHC enzymes is predicted to engage in numerous interactions and facilitates both substrate recruitment and S-acylation-independent functions; however, the sequence/structural features recognized by this module remain unknown. The two mammalian AR-containing S-acyltransferases are the Golgi-localized zDHHC17 and zDHHC13, also known as Huntingtin-interacting proteins 14 and 14-like, respectively; they are highly expressed in brain, and their loss in mice leads to neuropathological deficits that are reminiscent of Huntington's disease. Here, we report that zDHHC17 and zDHHC13 recognize, via their AR domain, evolutionarily conserved and closely related sequences of a [VIAP][VIT]XXQP consensus in SNAP25, SNAP23, cysteine string protein, Huntingtin, cytoplasmic linker protein 3, and microtubule-associated protein 6. This novel AR-binding sequence motif is found in regions predicted to be unstructured and is present in a number of zDHHC17 substrates and zDHHC17/13-interacting S-acylated proteins. This is the first study to identify a motif recognized by AR-containing zDHHCs. PMID:26198635

  11. Integrating Sequence Evolution into Probabilistic Orthology Analysis.

    PubMed

    Ullah, Ikram; Sjöstrand, Joel; Andersson, Peter; Sennblad, Bengt; Lagergren, Jens

    2015-11-01

    Orthology analysis, that is, finding out whether a pair of homologous genes are orthologs - stemming from a speciation - or paralogs - stemming from a gene duplication - is of central importance in computational biology, genome annotation, and phylogenetic inference. In particular, an orthologous relationship makes functional equivalence of the two genes highly likely. A major approach to orthology analysis is to reconcile a gene tree to the corresponding species tree, (most commonly performed using the most parsimonious reconciliation, MPR). However, most such phylogenetic orthology methods infer the gene tree without considering the constraints implied by the species tree and, perhaps even more importantly, only allow the gene sequences to influence the orthology analysis through the a priori reconstructed gene tree. We propose a sound, comprehensive Bayesian Markov chain Monte Carlo-based method, DLRSOrthology, to compute orthology probabilities. It efficiently sums over the possible gene trees and jointly takes into account the current gene tree, all possible reconciliations to the species tree, and the, typically strong, signal conveyed by the sequences. We compare our method with PrIME-GEM, a probabilistic orthology approach built on a probabilistic duplication-loss model, and MrBayesMPR, a probabilistic orthology approach that is based on conventional Bayesian inference coupled with MPR. We find that DLRSOrthology outperforms these competing approaches on synthetic data as well as on biological data sets and is robust to incomplete taxon sampling artifacts. PMID:26130236

  12. Efficient exact motif discovery

    PubMed Central

    Marschall, Tobias; Rahmann, Sven

    2009-01-01

    Motivation: The motif discovery problem consists of finding over-represented patterns in a collection of biosequences. It is one of the classical sequence analysis problems, but still has not been satisfactorily solved in an exact and efficient manner. This is partly due to the large number of possibilities of defining the motif search space and the notion of over-representation. Even for well-defined formalizations, the problem is frequently solved in an ad hoc manner with heuristics that do not guarantee to find the best motif. Results: We show how to solve the motif discovery problem (almost) exactly on a practically relevant space of IUPAC generalized string patterns, using the p-value with respect to an i.i.d. model or a Markov model as the measure of over-representation. In particular, (i) we use a highly accurate compound Poisson approximation for the null distribution of the number of motif occurrences. We show how to compute the exact clump size distribution using a recently introduced device called probabilistic arithmetic automaton (PAA). (ii) We define two p-value scores for over-representation, the first one based on the total number of motif occurrences, the second one based on the number of sequences in a collection with at least one occurrence. (iii) We describe an algorithm to discover the optimal pattern with respect to either of the scores. The method exploits monotonicity properties of the compound Poisson approximation and is by orders of magnitude faster than exhaustive enumeration of IUPAC strings (11.8 h compared with an extrapolated runtime of 4.8 years). (iv) We justify the use of the proposed scores for motif discovery by showing our method to outperform other motif discovery algorithms (e.g. MEME, Weeder) on benchmark datasets. We also propose new motifs on Mycobacterium tuberculosis. Availability and Implementation: The method has been implemented in Java. It can be obtained from http://ls11-www
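
    The two counts that underlie the scores in (ii), total occurrences and number of sequences with at least one occurrence, are easy to compute for a single IUPAC pattern. The sketch below does so and adds a deliberately simplified significance estimate: a plain Poisson tail under a uniform i.i.d. base model. This is not the compound Poisson approximation of the paper (it ignores clumping of overlapping occurrences), and all function names are hypothetical.

```python
import math
import re

# Standard IUPAC nucleotide codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def iupac_regex(pattern):
    """Compile an IUPAC string into a regex that reports overlapping hits."""
    body = "".join("[%s]" % IUPAC[c] for c in pattern.upper())
    return re.compile("(?=(%s))" % body)

def count_scores(pattern, sequences):
    """Return (total occurrences, number of sequences with >= 1 occurrence)."""
    rx = iupac_regex(pattern)
    per_seq = [sum(1 for _ in rx.finditer(s.upper())) for s in sequences]
    return sum(per_seq), sum(1 for n in per_seq if n > 0)

def expected_count(pattern, sequences):
    """Expected occurrences assuming uniform, independent bases (an assumption)."""
    p = 1.0
    for c in pattern.upper():
        p *= len(IUPAC[c]) / 4.0
    windows = sum(max(len(s) - len(pattern) + 1, 0) for s in sequences)
    return p * windows

def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam); a simplification of the paper's model."""
    return 1.0 - sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))

# Hypothetical toy collection.
seqs = ["ACGTACGT", "TTTTTTTT", "AAGTAGGT"]
total, n_seqs = count_scores("MGT", seqs)
print(total, n_seqs, poisson_tail(total, expected_count("MGT", seqs)))
```

    A real search would also need the paper's pruning strategy to avoid enumerating every IUPAC string; the sketch scores one pattern at a time.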

  13. Distinct XPPX sequence motifs induce ribosome stalling, which is rescued by the translation elongation factor EF-P

    PubMed Central

    Peil, Lauri; Starosta, Agata L.; Lassak, Jürgen; Atkinson, Gemma C.; Virumäe, Kai; Spitzer, Michaela; Tenson, Tanel; Jung, Kirsten; Remme, Jaanus; Wilson, Daniel N.

    2013-01-01

    Ribosomes are the protein synthesizing factories of the cell, polymerizing polypeptide chains from their constituent amino acids. However, distinct combinations of amino acids, such as polyproline stretches, cannot be efficiently polymerized by ribosomes, leading to translational stalling. The stalled ribosomes are rescued by the translational elongation factor P (EF-P), which by stimulating peptide-bond formation allows translation to resume. Using metabolic stable isotope labeling and mass spectrometry, we demonstrate in vivo that EF-P is important for expression of not only polyproline-containing proteins, but also for specific subsets of proteins containing diprolyl motifs (XPP/PPX). Together with a systematic in vitro and in vivo analysis, we provide a distinct hierarchy of stalling triplets, ranging from strong stallers, such as PPP, DPP, and PPN to weak stallers, such as CPP, PPR, and PPH, all of which are substrates for EF-P. These findings provide mechanistic insight into how the characteristics of the specific amino acid substrates influence the fundamentals of peptide bond formation. PMID:24003132
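
    The hierarchy of stalling triplets lends itself to a simple sequence scan. The sketch below flags only the six triplets named in the abstract as examples; the full hierarchy from the paper is not reproduced here, and the function name and toy sequence are hypothetical.

```python
# Example triplets named in the abstract; the lists are not exhaustive.
STRONG = {"PPP", "DPP", "PPN"}
WEAK = {"CPP", "PPR", "PPH"}

def stalling_triplets(protein):
    """Return (0-based start, triplet, class) for each listed stalling triplet."""
    protein = protein.upper()
    hits = []
    for i in range(len(protein) - 2):
        tri = protein[i:i + 3]
        if tri in STRONG:
            hits.append((i, tri, "strong"))
        elif tri in WEAK:
            hits.append((i, tri, "weak"))
    return hits

# Hypothetical toy sequence.
print(stalling_triplets("MKDPPNAACPPR"))
```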

  14. Nucleophosmin integrates within the nucleolus via multi-modal interactions with proteins displaying R-rich linear motifs and rRNA

    PubMed Central

    Mitrea, Diana M; Cika, Jaclyn A; Guy, Clifford S; Ban, David; Banerjee, Priya R; Stanley, Christopher B; Nourse, Amanda; Deniz, Ashok A; Kriwacki, Richard W

    2016-01-01

    The nucleolus is a membrane-less organelle formed through liquid-liquid phase separation of its components from the surrounding nucleoplasm. Here, we show that nucleophosmin (NPM1) integrates within the nucleolus via a multi-modal mechanism involving multivalent interactions with proteins containing arginine-rich linear motifs (R-motifs) and ribosomal RNA (rRNA). Importantly, these R-motifs are found in canonical nucleolar localization signals. Based on a novel combination of biophysical approaches, we propose a model for the molecular organization within liquid-like droplets formed by the N-terminal domain of NPM1 and R-motif peptides, thus providing insights into the structural organization of the nucleolus. We identify multivalency of acidic tracts and folded nucleic acid binding domains, mediated by N-terminal domain oligomerization, as structural features required for phase separation of NPM1 with other nucleolar components in vitro and for localization within mammalian nucleoli. We propose that one mechanism of nucleolar localization involves phase separation of proteins within the nucleolus. DOI: http://dx.doi.org/10.7554/eLife.13571.001 PMID:26836305

  15. Nucleophosmin integrates within the nucleolus via multi-modal interactions with proteins displaying R-rich linear motifs and rRNA

    DOE PAGESBeta

    Mitrea, Diana M.; Cika, Jaclyn A.; Guy, Clifford S.; Ban, David; Banerjee, Priya R.; Stanley, Christopher B.; Nourse, Amanda; Deniz, Ashok A.; Kriwacki, Richard W.

    2016-02-02

    The nucleolus is a membrane-less organelle formed through liquid-liquid phase separation of its components from the surrounding nucleoplasm. Here, we show that nucleophosmin (NPM1) integrates within the nucleolus via a multi-modal mechanism involving multivalent interactions with proteins containing arginine-rich linear motifs (R-motifs) and ribosomal RNA (rRNA). Importantly, these R-motifs are found in canonical nucleolar localization signals. Based on a novel combination of biophysical approaches, we propose a model for the molecular organization within liquid-like droplets formed by the N-terminal domain of NPM1 and R-motif peptides, thus providing insights into the structural organization of the nucleolus. We identify multivalency of acidic tracts and folded nucleic acid binding domains, mediated by N-terminal domain oligomerization, as structural features required for phase separation of NPM1 with other nucleolar components in vitro and for localization within mammalian nucleoli. We propose that one mechanism of nucleolar localization involves phase separation of proteins within the nucleolus.

  16. Nucleophosmin integrates within the nucleolus via multi-modal interactions with proteins displaying R-rich linear motifs and rRNA.

    PubMed

    Mitrea, Diana M; Cika, Jaclyn A; Guy, Clifford S; Ban, David; Banerjee, Priya R; Stanley, Christopher B; Nourse, Amanda; Deniz, Ashok A; Kriwacki, Richard W

    2016-01-01

    The nucleolus is a membrane-less organelle formed through liquid-liquid phase separation of its components from the surrounding nucleoplasm. Here, we show that nucleophosmin (NPM1) integrates within the nucleolus via a multi-modal mechanism involving multivalent interactions with proteins containing arginine-rich linear motifs (R-motifs) and ribosomal RNA (rRNA). Importantly, these R-motifs are found in canonical nucleolar localization signals. Based on a novel combination of biophysical approaches, we propose a model for the molecular organization within liquid-like droplets formed by the N-terminal domain of NPM1 and R-motif peptides, thus providing insights into the structural organization of the nucleolus. We identify multivalency of acidic tracts and folded nucleic acid binding domains, mediated by N-terminal domain oligomerization, as structural features required for phase separation of NPM1 with other nucleolar components in vitro and for localization within mammalian nucleoli. We propose that one mechanism of nucleolar localization involves phase separation of proteins within the nucleolus. PMID:26836305

  17. Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins

    PubMed Central

    Foulk, Michael S.; Urban, John M.; Casella, Cinzia; Gerbi, Susan A.

    2015-01-01

    Nascent strand sequencing (NS-seq) is used to discover DNA replication origins genome-wide, allowing identification of features for their specification. NS-seq depends on the ability of lambda exonuclease (λ-exo) to efficiently digest parental DNA while leaving RNA-primer protected nascent strands intact. We used genomics and biochemical approaches to determine if λ-exo digests all parental DNA sequences equally. We report that λ-exo does not efficiently digest G-quadruplex (G4) structures in a plasmid. Moreover, λ-exo digestion of nonreplicating genomic DNA (LexoG0) enriches GC-rich DNA and G4 motifs genome-wide. We used LexoG0 data to control for nascent strand–independent λ-exo biases in NS-seq and validated this approach at the rDNA locus. The λ-exo–controlled NS-seq peaks are not GC-rich, and only 35.5% overlap with 6.8% of all G4s, suggesting that G4s are not general determinants for origin specification but may play a role for a subset. Interestingly, we observed a periodic spacing of G4 motifs and nucleosomes around the peak summits, suggesting that G4s may position nucleosomes at this subset of origins. Finally, we demonstrate that use of Na+ instead of K+ in the λ-exo digestion buffer reduced the effect of G4s on λ-exo digestion and discuss ways to increase both the sensitivity and specificity of NS-seq. PMID:25695952

  18. Limb body wall complex, amniotic band sequence, or new syndrome caused by mutation in IQ Motif containing K (IQCK)?

    PubMed

    Kruszka, Paul; Uwineza, Annette; Mutesa, Leon; Martinez, Ariel F; Abe, Yu; Zackai, Elaine H; Ganetzky, Rebecca; Chung, Brian; Stevenson, Roger E; Adelstein, Robert S; Ma, Xuefei; Mullikin, James C; Hong, Sung-Kook; Muenke, Maximilian

    2015-09-01

    Limb body wall complex (LBWC) and amniotic band sequence (ABS) are multiple congenital anomaly conditions with craniofacial, limb, and ventral wall defects. LBWC and ABS are considered separate entities by some, and a continuum of severity of the same condition by others. The etiology of LBWC/ABS remains unknown and multiple hypotheses have been proposed. One individual with features of LBWC and his unaffected parents were whole exome sequenced and Sanger sequenced as confirmation of the mutation. Functional studies were conducted using morpholino knockdown studies followed by human mRNA rescue experiments. Using whole exome sequencing, a de novo heterozygous mutation was found in the gene IQCK: c.667C>G; p.Q223E and confirmed by Sanger sequencing in an individual with LBWC. Morpholino knockdown of iqck mRNA in the zebrafish showed ventral defects including failure of ventral fin to develop and cardiac edema. Human wild-type IQCK mRNA rescued the zebrafish phenotype, whereas human p.Q223E IQCK mRNA did not, but worsened the phenotype of the morpholino knockdown zebrafish. This study supports a genetic etiology for LBWC/ABS, or potentially a new syndrome. PMID:26436108

  19. Limb body wall complex, amniotic band sequence, or new syndrome caused by mutation in IQ Motif containing K (IQCK)?

    PubMed Central

    Kruszka, Paul; Uwineza, Annette; Mutesa, Leon; Martinez, Ariel F; Abe, Yu; Zackai, Elaine H; Ganetzky, Rebecca; Chung, Brian; Stevenson, Roger E; Adelstein, Robert S; Ma, Xuefei; Mullikin, James C; Hong, Sung-Kook; Muenke, Maximilian

    2015-01-01

    Limb body wall complex (LBWC) and amniotic band sequence (ABS) are multiple congenital anomaly conditions with craniofacial, limb, and ventral wall defects. LBWC and ABS are considered separate entities by some, and a continuum of severity of the same condition by others. The etiology of LBWC/ABS remains unknown and multiple hypotheses have been proposed. One individual with features of LBWC and his unaffected parents were whole exome sequenced and Sanger sequenced as confirmation of the mutation. Functional studies were conducted using morpholino knockdown studies followed by human mRNA rescue experiments. Using whole exome sequencing, a de novo heterozygous mutation was found in the gene IQCK: c.667C>G; p.Q223E and confirmed by Sanger sequencing in an individual with LBWC. Morpholino knockdown of iqck mRNA in the zebrafish showed ventral defects including failure of ventral fin to develop and cardiac edema. Human wild-type IQCK mRNA rescued the zebrafish phenotype, whereas human p.Q223E IQCK mRNA did not, but worsened the phenotype of the morpholino knockdown zebrafish. This study supports a genetic etiology for LBWC/ABS, or potentially a new syndrome. PMID:26436108

  20. RNAPattMatch: a web server for RNA sequence/structure motif detection based on pattern matching with flexible gaps

    PubMed Central

    Drory Retwitzer, Matan; Polishchuk, Maya; Churkin, Elena; Kifer, Ilona; Yakhini, Zohar; Barash, Danny

    2015-01-01

    Searching for RNA sequence-structure patterns is becoming an essential tool for RNA practitioners. Novel discoveries of regulatory non-coding RNAs in targeted organisms and the motivation to find them across a wide range of organisms have prompted the use of computational RNA pattern matching as an enhancement to sequence similarity. State-of-the-art programs differ by the flexibility of patterns allowed as queries and by their simplicity of use. In particular, no existing method is available as a user-friendly web server. A general program that searches for RNA sequence-structure patterns is RNA Structator. However, it is not available as a web server and does not provide the option to allow flexible gap pattern representation with an upper bound of the gap length being specified at any position in the sequence. Here, we introduce RNAPattMatch, a web-based application that is user friendly and makes sequence/structure RNA queries accessible to practitioners of various backgrounds and proficiency. It also extends RNA Structator and allows a more flexible variable-gap representation, in addition to analysis of results using energy minimization methods. The RNAPattMatch service is available at http://www.cs.bgu.ac.il/rnapattmatch. A standalone version of the search tool is also available to download at the site. PMID:25940619

  1. A survey of motif finding Web tools for detecting binding site motifs in ChIP-Seq data.

    PubMed

    Tran, Ngoc Tam L; Huang, Chun-Hsi

    2014-01-01

    ChIP-Seq (chromatin immunoprecipitation sequencing) has made motif finding easier, because ChIP-Seq experiments narrow the search down to binding site locations. Recent motif finding tools facilitate motif detection by providing user-friendly Web interfaces. In this work, we reviewed nine motif finding Web tools that are capable of detecting binding site motifs in ChIP-Seq data. We showed that each motif finding Web tool has its own advantages for detecting motifs that other tools may not discover. We recommend that users run multiple motif finding Web tools that implement different algorithms to obtain significant motifs, overlapping similar motifs, and non-overlapping motifs. Finally, we provide suggestions for the future development of motif finding Web tools that better assist researchers in finding motifs in ChIP-Seq data. PMID:24555784

  2. Nucleotide sequence and organization of the human S-protein gene: repeating peptide motifs in the pexin family and a model for their evolution

    SciTech Connect

    Jenne, D.; Stanley, K.K.

    1987-10-20

    The S-protein/vitronectin gene was isolated from a human genomic DNA library, and its sequence of about 5.3 kilobases including the adjacent 5' and 3' flanking regions was established. Alignment of the genomic DNA nucleotide sequence and the cDNA sequence indicated that the gene consisted of eight exons and seven introns. The intron positions in the S-protein gene and their phase type were compared to those in the hemopexin gene, which shares amino acid sequence homologies with transin and the S-protein. Three introns have been found at equivalent positions; two other introns are very close to these positions and are interpreted as cases of intron sliding. Introns 3-7 occur at a conserved glycine residue within repeating peptide segments, whereas introns 1 and 2 are at the boundaries of the Somatomedin B domain of S-protein. The analysis of the exon structure in relation to repeating peptide motifs within the S-protein strongly suggests that it contains only seven repeats, one less than the hemopexin molecule. A repeat pattern very similar to that in hemopexin is also shown to be present in two other related proteins, transin and interstitial collagenase. An evolutionary model for the generation of the repeat pattern in the S-protein and the other members of this novel pexin gene family is proposed, and the sequence modifications for some of the repeats during divergent evolution are discussed in relation to known unique functional properties of hemopexin and S-protein.

  3. Construction of an integrated database to support genomic sequence analysis

    SciTech Connect

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  4. The role of context in RNA structure: flanking sequences reconfigure CAG motif folding in huntingtin exon 1 transcripts

    PubMed Central

    Busan, Steven; Weeks, Kevin M.

    2016-01-01

    The length of the CAG repeat region in the huntingtin messenger RNA is predictive of Huntington’s disease. Structural studies of CAG repeat-containing RNAs suggest that these sequences form simple hairpin structures; however, in the context of the full-length huntingtin mRNA, CAG repeats may form complex structures that could be targeted for therapeutic intervention. We examined the structures of transcripts spanning the first exon of the huntingtin mRNA with both healthy and disease-prone repeat lengths. In transcripts with 17 to 70 repeats, the CAG sequences base paired extensively with bases in the 5′ UTR and with a conserved region downstream of the CCG repeat region. In huntingtin transcripts with healthy numbers of repeats, the previously observed CAG hairpin was either absent or short. In contrast, in transcripts with disease-associated numbers of repeats, a CAG hairpin was present and extended from a three-helix junction. Our findings demonstrate the profound importance of sequence context in RNA folding and identify specific structural differences between healthy and disease-inducing huntingtin alleles that may be targets for therapeutic intervention. PMID:24199621
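
    Since the abstract turns on the number of CAG repeats in a transcript, a small sketch of how the longest uninterrupted CAG run might be measured is given here. The function name and toy sequence are hypothetical, no healthy/disease threshold is applied because the abstract does not state one, and counting repeats says nothing about the RNA structure itself.

```python
import re

def longest_cag_run(seq):
    """Return the repeat count of the longest uninterrupted CAG run (0 if none)."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(r) // 3 for r in runs), default=0)

# Hypothetical toy transcript fragment with 17 CAG repeats followed by CCG repeats.
toy = "GCC" + "CAG" * 17 + "CCGCCG"
print(longest_cag_run(toy))
```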

  5. Sequence databases: integrated information retrieval and data submission.

    PubMed

    Weisemann, J M; Boguski, M S; Ouellette, B F

    2001-05-01

    This unit describes the NCBI's Entrez database browser. Entrez integrates DNA and protein sequence data, three dimensional structures, and taxonomic information with its associated abstracts and citations contained in PubMed (MEDLINE). It is possible to search the Entrez information space using conventional search queries (authors, gene names, map location) as well as by bibliographic associations (articles that are related to one another) and sequence homology. Also described are the procedures for submission of new data, updates, and corrections to the sequence databases. PMID:18428302

  6. An integrated semiconductor device enabling non-optical genome sequencing.

    PubMed

    Rothberg, Jonathan M; Hinz, Wolfgang; Rearick, Todd M; Schultz, Jonathan; Mileski, William; Davey, Mel; Leamon, John H; Johnson, Kim; Milgrew, Mark J; Edwards, Matthew; Hoon, Jeremy; Simons, Jan F; Marran, David; Myers, Jason W; Davidson, John F; Branting, Annika; Nobile, John R; Puc, Bernard P; Light, David; Clark, Travis A; Huber, Martin; Branciforte, Jeffrey T; Stoner, Isaac B; Cawley, Simon E; Lyons, Michael; Fu, Yutao; Homer, Nils; Sedova, Marina; Miao, Xin; Reed, Brian; Sabina, Jeffrey; Feierstein, Erika; Schorn, Michelle; Alanjary, Mohammad; Dimalanta, Eileen; Dressman, Devin; Kasinskas, Rachel; Sokolsky, Tanya; Fidanza, Jacqueline A; Namsaraev, Eugeni; McKernan, Kevin J; Williams, Alan; Roth, G Thomas; Bustillo, James

    2011-07-21

    The seminal importance of DNA sequencing to the life sciences, biotechnology and medicine has driven the search for more scalable and lower-cost solutions. Here we describe a DNA sequencing technology in which scalable, low-cost semiconductor manufacturing techniques are used to make an integrated circuit able to directly perform non-optical DNA sequencing of genomes. Sequence data are obtained by directly sensing the ions produced by template-directed DNA polymerase synthesis using all-natural nucleotides on this massively parallel semiconductor-sensing device or ion chip. The ion chip contains ion-sensitive, field-effect transistor-based sensors in perfect register with 1.2 million wells, which provide confinement and allow parallel, simultaneous detection of independent sequencing reactions. Use of the most widely used technology for constructing integrated circuits, the complementary metal-oxide semiconductor (CMOS) process, allows for low-cost, large-scale production and scaling of the device to higher densities and larger array sizes. We show the performance of the system by sequencing three bacterial genomes, its robustness and scalability by producing ion chips with up to 10 times as many sensors and sequencing a human genome. PMID:21776081

  7. Core sequence in the RNA motif recognized by the ErmE methyltransferase revealed by relaxing the fidelity of the enzyme for its target.

    PubMed Central

    Hansen, L H; Vester, B; Douthwaite, S

    1999-01-01

    Under physiological conditions, the ErmE methyltransferase specifically modifies a single adenosine within ribosomal RNA (rRNA), and thereby confers resistance to multiple antibiotics. The adenosine (A2058 in Escherichia coli 23S rRNA) lies within a highly conserved structure, and is methylated efficiently, and with equally high fidelity, in rRNAs from phylogenetically diverse bacteria. However, the fidelity of ErmE is reduced when magnesium is removed, and over twenty new sites of ErmE methylation appear in E. coli 16S and 23S rRNAs. These sites show widely different degrees of reactivity to ErmE. The canonical A2058 site is largely unaffected by magnesium depletion and remains the most reactive site in the rRNA. This suggests that methylation at the new sites results from changes in the RNA substrate rather than the methyltransferase. Chemical probing confirms that the rRNA structure opens upon magnesium depletion, exposing potential new interaction sites to the enzyme. The new ErmE sites show homology with the canonical A2058 site, and have the consensus sequence aNNNcgGAHAg (ErmE methylation occurs exclusively at adenosines (underlined); these are preceded by a guanosine, equivalent to G2057; there is a high preference for the adenosine equivalent to A2060; H is any nucleotide except G; N is any nucleotide; and there are slight preferences for the nucleotides shown in lower case). This consensus is believed to represent the core of the motif that Erm methyltransferases recognize at their canonical A2058 site. The data also reveal constraints on the higher order structure of the motif that affect methyltransferase recognition. PMID:9917069
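    The consensus quoted in this abstract can be turned into a search pattern. The sketch below is illustrative only and is not taken from the paper: the function names and the `strict` switch are hypothetical, and it assumes that lower-case positions (described as slight preferences) may be treated as "any nucleotide" unless strict matching is requested.

```python
import re

# IUPAC codes appearing in the consensus, over the RNA alphabet.
# H = any nucleotide except G; N = any nucleotide.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "H": "[ACU]", "N": "[ACGU]"}


def consensus_to_regex(consensus: str, strict: bool = False) -> re.Pattern:
    """Build a regex from a consensus such as 'aNNNcgGAHAg'.

    Assumption: lower-case letters mark weak preferences, so they are
    relaxed to any nucleotide unless strict=True.
    """
    parts = []
    for ch in consensus:
        if ch.islower() and not strict:
            parts.append("[ACGU]")
        else:
            parts.append(IUPAC[ch.upper()])
    # A lookahead lets overlapping candidate sites be reported.
    return re.compile("(?=(" + "".join(parts) + "))")


def find_sites(rna: str, consensus: str = "aNNNcgGAHAg", strict: bool = False):
    """Return 0-based start positions of consensus matches (T is read as U)."""
    pattern = consensus_to_regex(consensus, strict)
    return [m.start() for m in pattern.finditer(rna.upper().replace("T", "U"))]
```

    Calling `find_sites(sequence, strict=True)` would additionally require the weakly preferred nucleotides; whether that is biologically appropriate is a judgment the abstract does not settle.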

  8. Rice MEL2, the RNA recognition motif (RRM) protein, binds in vitro to meiosis-expressed genes containing U-rich RNA consensus sequences in the 3'-UTR.

    PubMed

    Miyazaki, Saori; Sato, Yutaka; Asano, Tomoya; Nagamura, Yoshiaki; Nonomura, Ken-Ichi

    2015-10-01

    Post-transcriptional gene regulation by RNA recognition motif (RRM) proteins through binding to cis-elements in the 3'-untranslated region (3'-UTR) is widely used in eukaryotes to complete various biological processes. Rice MEIOSIS ARRESTED AT LEPTOTENE2 (MEL2) is the RRM protein that functions in the transition to meiosis in proper timing. The MEL2 RRM preferentially associated with the U-rich RNA consensus, UUAGUU[U/A][U/G][A/U/G]U, dependently on sequences and proportionally to MEL2 protein amounts in vitro. The consensus sequences were located in the putative looped structures of the RNA ligand. A genome-wide survey revealed a tendency of MEL2-binding consensus appearing in 3'-UTR of rice genes. Of 249 genes that conserved the consensus in their 3'-UTR, 13 genes spatiotemporally co-expressed with MEL2 in meiotic flowers, and included several genes whose function was supposed in meiosis; such as Replication protein A and OsMADS3. The proteome analysis revealed that the amounts of small ubiquitin-related modifier-like protein and eukaryotic translation initiation factor3-like protein were dramatically altered in mel2 mutant anthers. Taken together with transcriptome and gene ontology results, we propose that the rice MEL2 is involved in the translational regulation of key meiotic genes on 3'-UTRs to achieve the faithful transition of germ cells to meiosis. PMID:26319516
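    As a rough illustration of the genome-wide survey described above, the consensus UUAGUU[U/A][U/G][A/U/G]U maps directly onto a regular expression. This is a minimal sketch, not the authors' pipeline: the function name and the dictionary of 3'-UTR sequences are hypothetical inputs.

```python
import re

# UUAGUU[U/A][U/G][A/U/G]U written as a regex; the lookahead allows
# overlapping occurrences to be reported.
MEL2_CONSENSUS = re.compile(r"(?=(UUAGUU[UA][UG][AUG]U))")


def utrs_with_consensus(utrs):
    """Map gene name -> 0-based positions of the consensus in its 3'-UTR.

    `utrs` is a hypothetical dict of gene name -> 3'-UTR sequence;
    DNA-style input is accepted by reading T as U.
    """
    hits = {}
    for gene, seq in utrs.items():
        rna = seq.upper().replace("T", "U")
        positions = [m.start() for m in MEL2_CONSENSUS.finditer(rna)]
        if positions:
            hits[gene] = positions
    return hits
```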

  9. Radiation Desiccation Response Motif-Like Sequences Are Involved in Transcriptional Activation of the Deinococcal ssb Gene by Ionizing Radiation but Not by Desiccation

    PubMed Central

    Ujaoney, Aman Kumar; Potnis, Akhilesh A.; Kane, Pratiksha; Mukhopadhyaya, Rita; Apte, Shree Kumar

    2010-01-01

    Single-stranded-DNA binding protein (SSB) levels during poststress recovery of Deinococcus radiodurans were significantly enhanced by 60Co gamma rays or mitomycin C treatment but not by exposure to UV rays, hydrogen peroxide (H2O2), or desiccation. Addition of rifampin prior to postirradiation recovery blocked such induction. In silico analysis of the ssb promoter region revealed a 17-bp palindromic radiation/desiccation response motif (RDRM1) at bp −114 to −98 and a somewhat similar sequence (RDRM2) at bp −213 to −197, upstream of the ssb open reading frame. Involvement of these cis elements in radiation-responsive ssb gene expression was assessed by constructing transcriptional fusions of edited versions of the ssb promoter region with a nonspecific acid phosphatase encoding reporter gene, phoN. Recombinant D. radiodurans strains carrying such constructs clearly revealed (i) transcriptional induction of the ssb promoter upon irradiation and mitomycin C treatment but not upon UV or H2O2 treatment and (ii) involvement of both RDRM-like sequences in such activation of SSB expression, in an additive manner. PMID:20802034

  10. Sequence analysis of mouse vomeronasal receptor gene clusters reveals common promoter motifs and a history of recent expansion

    PubMed Central

    Lane, Robert P.; Cutforth, Tyler; Axel, Richard; Hood, Leroy; Trask, Barbara J.

    2002-01-01

    We have analyzed the organization and sequence of 73 V1R genes encoding putative pheromone receptors to identify regulatory features and characterize the evolutionary history of the V1R family. The 73 V1Rs arose from seven ancestral genes around the time of mouse–rat speciation through large local duplications, and this expansion may contribute to speciation events. Orthologous V1R genes appear to have been lost during primate evolution. Exceptional noncoding homology is observed across four V1R subfamilies at one cluster and thus may be important for locus-specific transcriptional regulation. PMID:11752409

  11. Cooperative Hybridization of γPNA Miniprobes to a Repeating Sequence Motif and Application to Telomere Analysis

    PubMed Central

    Sureshkumar, Gopalsamy; Ly, Danith H.; Opresko, Patricia L.; Armitage, Bruce A.

    2014-01-01

    GammaPNA oligomers having one or two repeats of the sequence AATCCC were designed to hybridize to DNA having one or more repeats of the complementary TTAGGG sequence found in the human telomere. UV melting curves and surface plasmon resonance experiments demonstrate high affinity and cooperativity for hybridization of these miniprobes to DNA having multiple complementary repeats. Fluorescence spectroscopy for Cy3-labeled miniprobes demonstrates increases in fluorescence intensity for assembling multiple short probes on a DNA target compared with fewer longer probes. The fluorescent γPNA miniprobes were then used to stain telomeres in metaphase chromosomes derived from U2OS cells possessing heterogeneous long telomeres and Jurkat cells harboring homogenous short telomeres. The miniprobes yielded comparable fluorescence intensity to a commercially available PNA 18mer probe in U2OS cells, but significantly brighter fluorescence was observed for telomeres in Jurkat cells. These results suggest that γPNA miniprobes can be effective telomere-staining reagents with applications toward analysis of critically short telomeres, which have been implicated in a range of human diseases. PMID:25115693
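    The relationship between the probe sequence and the telomere repeat can be checked in a few lines. This is only a sketch with hypothetical helper names; it verifies base-by-base complementarity and counts tandem repeats in a plain DNA string, and it makes no claim about strand orientation or hybridization behaviour.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(dna: str) -> str:
    """Base-by-base complement, without reversing the sequence."""
    return dna.upper().translate(_COMPLEMENT)


def longest_tandem_run(dna: str, unit: str = "TTAGGG") -> int:
    """Number of copies of `unit` in the longest uninterrupted tandem run."""
    runs = re.findall(f"(?:{re.escape(unit)})+", dna.upper())
    return max((len(run) // len(unit) for run in runs), default=0)
```

    For example, `complement("TTAGGG")` gives the AATCCC probe unit quoted in the abstract.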

  12. Identification of amino acid sequence motifs in desmocollin, a desmosomal glycoprotein, that are required for plakoglobin binding and plaque formation.

    PubMed

    Troyanovsky, S M; Troyanovsky, R B; Eshkind, L G; Leube, R E; Franke, W W

    1994-11-01

    By transfecting epithelial cells with gene constructs encoding chimeric proteins of the transmembrane part of the gap junction protein connexin 32 in combination with various segments of the cytoplasmic part of the desmosomal cadherin desmocollin 1a, we have determined that a relatively short sequence element is necessary for the formation of desmosome-like plaques and for the specific anchorage of bundles of intermediate-sized filaments (IFs). Deletion of as little as the carboxyl-terminal 37 aa resulted in a lack of IF anchorage and binding of the plaque protein plakoglobin, as shown by immunolocalization and immunoprecipitation experiments. In addition, we show that the sequence requirements for the recruitment of desmoplakin, another desmosomal plaque protein, differ and that a short (10 aa) segment of the desmocollin 1a tail, located close to the plasma membrane, is also required for the binding of plakoglobin, as well as of desmoplakin, and also for IF anchorage. The importance of the carboxyl-terminal domain, homologous in diverse types of cadherins, is emphasized, as it must harbor, in a mutually exclusive pattern, the information for assembly of the IF-anchoring desmosomal plaque in desmocollins and for formation of the alpha-/beta-catenin- and vinculin-containing, actin filament-anchoring plaque in E- and N-cadherin. PMID:7971964

  13. An Integrated Enzyme Kinetics Laboratory Sequence for Undergraduates.

    ERIC Educational Resources Information Center

    Bucholtz, Michael L.

    1988-01-01

    Describes a three-week sequence to take undergraduate students through the study of enzyme kinetics in an integrated manner that reinforces the basic concepts of initial velocity and the effects of varying operational parameters. Discusses laboratory sessions and the use of a microcomputer in instruction. (CW)

  14. Pegasys: software for executing and integrating analyses of biological sequences

    PubMed Central

    Shah, Sohrab P; He, David YM; Sawkins, Jessica N; Druce, Jeffrey C; Quon, Gerald; Lett, Drew; Zheng, Grace XY; Xu, Tao; Ouellette, BF Francis

    2004-01-01

    Background We present Pegasys – a flexible, modular and customizable software system that facilitates the execution and data integration from heterogeneous biological sequence analysis tools. Results The Pegasys system includes numerous tools for pair-wise and multiple sequence alignment, ab initio gene prediction, RNA gene detection, masking repetitive sequences in genomic DNA as well as filters for database formatting and processing raw output from various analysis tools. We introduce a novel data structure for creating workflows of sequence analyses and a unified data model to store its results. The software allows users to dynamically create analysis workflows at run-time by manipulating a graphical user interface. All non-serial dependent analyses are executed in parallel on a compute cluster for efficiency of data generation. The uniform data model and backend relational database management system of Pegasys allow for results of heterogeneous programs included in the workflow to be integrated and exported into General Feature Format for further analyses in GFF-dependent tools, or GAME XML for import into the Apollo genome editor. The modularity of the design allows for new tools to be added to the system with little programmer overhead. The database application programming interface allows programmatic access to the data stored in the backend through SQL queries. Conclusions The Pegasys system enables biologists and bioinformaticians to create and manage sequence analysis workflows. The software is released under the Open Source GNU General Public License. All source code and documentation is available for download at . PMID:15096276

  15. DILIMOT: discovery of linear motifs in proteins.

    PubMed

    Neduva, Victor; Russell, Robert B

    2006-07-01

    Discovery of protein functional motifs is critical in modern biology. Small segments of 3-10 residues play critical roles in protein interactions, post-translational modifications and trafficking. DILIMOT (DIscovery of LInear MOTifs) is a server for the prediction of these short linear motifs within a set of proteins. Given a set of sequences sharing a common functional feature (e.g. interaction partner or localization) the method finds statistically over-represented motifs likely to be responsible for it. The input sequences are first passed through a set of filters to remove regions unlikely to contain instances of linear motifs. Motifs are then found in the remaining sequence and ranked according to a statistic that measures over-representation and conservation across homologues in related species. The results are displayed via a visual interface for easy perusal. The server is available at http://dilimot.embl.de. PMID:16845024

  16. Function of a unique sequence motif in the long terminal repeat of feline leukemia virus isolated from an unusual set of naturally occurring tumors.

    PubMed

    Athas, G B; Lobelle-Rich, P; Levy, L S

    1995-06-01

    Feline leukemia virus (FeLV) proviruses have been characterized from naturally occurring non-B-cell, non-T-cell tumors occurring in the spleens of infected cats. These proviruses exhibit a unique sequence motif in the long terminal repeat (LTR), namely, a 21-bp tandem triplication beginning 25 bp downstream of the enhancer. The repeated finding of the triplication-containing LTR in non-B-cell, non-T-cell lymphomas of the spleen suggests that the unique LTR is an essential participant in the development of tumors of this particular phenotype. The nucleotide sequence of the triplication-containing LTR most closely resembles that of FeLV subgroup C. Studies performed to measure the ability of the triplication-containing LTR to modulate gene expression indicate that the 21-bp triplication provides transcriptional enhancer function to the LTR that contains it and that it substitutes at least in part for the duplication of the enhancer. The 21-bp triplication confers a bona fide enhancer function upon LTR-directed reporter gene expression; however, the possibility of a spacer function was not eliminated. The studies demonstrate further that the triplication-containing LTR acts preferentially in a cell-type-specific manner, i.e., it is 12-fold more active in K-562 cells than is an LTR lacking the triplication. A recombinant, infectious FeLV bearing the 21-bp triplication in U3 was constructed. Cells infected with the recombinant were shown to accumulate higher levels of viral RNA transcripts and virus particles in culture supernatants than did cells infected with the parental type. The triplication-containing LTR is implicated in the induction of tumors of a particular phenotype, perhaps through transcriptional regulation of the virus and/or adjacent cellular genes, in the appropriate target cell. PMID:7745680

  17. A Short Sequence Motif in the 5′ Leader of the HIV-1 Genome Modulates Extended RNA Dimer Formation and Virus Replication

    PubMed Central

    van Bel, Nikki; Das, Atze T.; Cornelissen, Marion; Abbink, Truus E. M.; Berkhout, Ben

    2014-01-01

    The 5′ leader of the HIV-1 RNA genome encodes signals that control various steps in the replication cycle, including the dimerization initiation signal (DIS) that triggers RNA dimerization. The DIS folds a hairpin structure with a palindromic sequence in the loop that allows RNA dimerization via intermolecular kissing loop (KL) base pairing. The KL dimer can be stabilized by including the DIS stem nucleotides in the intermolecular base pairing, forming an extended dimer (ED). The role of the ED RNA dimer in HIV-1 replication has hardly been addressed because of technical challenges. We analyzed a set of leader mutants with a stabilized DIS hairpin for in vitro RNA dimerization and virus replication in T cells. In agreement with previous observations, DIS hairpin stability modulated KL and ED dimerization. An unexpected previous finding was that mutation of three nucleotides immediately upstream of the DIS hairpin significantly reduced in vitro ED formation. In this study, we tested such mutants in vivo for the importance of the ED in HIV-1 biology. Mutants with a stabilized DIS hairpin replicated less efficiently than WT HIV-1. This defect was most severe when the upstream sequence motif was altered. Virus evolution experiments with the defective mutants yielded fast replicating HIV-1 variants with second site mutations that (partially) restored the WT hairpin stability. Characterization of the mutant and revertant RNA molecules and the corresponding viruses confirmed the correlation between in vitro ED RNA dimer formation and efficient virus replication, thus indicating that the ED structure is important for HIV-1 replication. PMID:25368321

  18. Binding of Actinomycin D to Single-Stranded DNA of Sequence Motifs d(TGTCTnG) and d(TGTnGTCT)

    PubMed Central

    Chen, Fu-Ming; Sha, Feng; Chin, Ko-Hsin; Chou, Shan-Ho

    2003-01-01

    Our recent binding studies with oligomers derived from base replacements on d(CGTCGTCG) had led to the finding that actinomycin D (ACTD) binds strongly to d(TGTCATTG) of apparent single-stranded conformation without GpC sequence. A fold-back binding model was speculated in which the planar phenoxazone inserts at the GTC site with a loop-out T base whereas the G base at the 3′-terminus folds back to form a basepair with the internal C and stacks on the opposite face of the chromophore. To provide a more concrete support for such a model, ACTD equilibrium binding studies were carried out and the results are reported herein on oligomers of sequence motifs d(TGTCTnG) and d(TGTnGTC). These oligomers are not expected to form dimeric duplexes and contain no canonical GpC sequences. It was found that ACTD binds strongly to d(TGTCTTTTG), d(TGTTTTGTC), and d(TGTTTTTGTC), all exhibiting 1:1 drug/strand binding stoichiometry. The fold-back binding model with displaced T base is further supported by the finding that appending TC and TCA at the 3′-terminus of d(TGTCTTTTG) results in oligomers that exhibit enhanced ACTD affinities, consequence of the added basepairing to facilitate the hairpin formation of d(TGTCTTTTGTC) and d(TGTCTTTTGTCA) in stabilizing the GTC/GTC binding site for juxtaposing the two G bases for easy stacking on both faces of the phenoxazone chromophore. Further support comes from the observation of considerable reduction in ACTD affinity when GTC is replaced by GTTC in an oligomer, in line with the reasoning that displacing two T bases to form a bulge for ACTD binding is more difficult than displacing a single base. Based on the elucidated binding principle of phenoxazone ring requiring its opposite faces to be stacked by the 3′-sides of two G bases for tight ACTD binding, several oligonucleotide sequences have been designed and found to bind well. PMID:12524296

  19. MotifMiner: A Table Driven Greedy Algorithm for DNA Motif Mining

    NASA Astrophysics Data System (ADS)

    Seeja, K. R.; Alam, M. A.; Jain, S. K.

    DNA motif discovery is a much explored problem in functional genomics. This paper describes a table driven greedy algorithm for discovering regulatory motifs in the promoter sequences of co-expressed genes. The proposed algorithm searches both DNA strands for the common patterns or motifs. The inputs to the algorithm are set of promoter sequences, the motif length and minimum Information Content. The algorithm generates subsequences of given length from the shortest input promoter sequence. It stores these subsequences and their reverse complements in a table. Then it searches the remaining sequences for good matches of these subsequences. The Information Content score is used to measure the goodness of the motifs. The algorithm has been tested with synthetic data and real data. The results are found promising. The algorithm could discover meaningful motifs from the muscle specific regulatory sequences.
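    The steps listed in this abstract (seed subsequences from the shortest promoter, a table of seeds with their reverse complements, a search of the remaining sequences, and an Information Content score) can be sketched as follows. This is not the authors' MotifMiner code. All function names are hypothetical, "good match" is assumed to mean the window with the smallest Hamming distance, and Information Content is computed against a uniform background of 2 bits per position.

```python
import math
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def best_match(seed: str, seq: str) -> str:
    """Window of `seq` closest to `seed` (assumed definition of a good match)."""
    k = len(seed)
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return min(windows, key=lambda w: hamming(seed, w))


def information_content(sites) -> float:
    """Sum over columns of 2 + sum(p * log2 p), uniform background assumed."""
    n = len(sites)
    total = 0.0
    for column in zip(*sites):
        counts = Counter(column)
        total += 2.0 + sum((c / n) * math.log2(c / n) for c in counts.values())
    return total


def mine_motifs(promoters, length: int, min_ic: float):
    """Return (ic, seed, sites) tuples with ic >= min_ic, best first."""
    if any(len(p) < length for p in promoters):
        raise ValueError("every promoter must be at least `length` long")
    idx = min(range(len(promoters)), key=lambda i: len(promoters[i]))
    shortest = promoters[idx]
    others = promoters[:idx] + promoters[idx + 1:]
    # Table of seed subsequences and their reverse complements.
    table = {shortest[i:i + length]: reverse_complement(shortest[i:i + length])
             for i in range(len(shortest) - length + 1)}
    results = []
    for seed, rc in table.items():
        sites = [seed]
        for seq in others:
            fwd, rev = best_match(seed, seq), best_match(rc, seq)
            # Keep whichever strand matches better, stored in seed orientation.
            if hamming(seed, fwd) <= hamming(rc, rev):
                sites.append(fwd)
            else:
                sites.append(reverse_complement(rev))
        ic = information_content(sites)
        if ic >= min_ic:
            results.append((ic, seed, sites))
    return sorted(results, key=lambda r: r[0], reverse=True)
```

    Scoring every seed against one best window per sequence keeps the sketch short; the published algorithm may select and extend matches differently.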

  20. Papillomavirus sequences integrate near cellular oncogenes in some cervical carcinomas

    SciTech Connect

    Duerst, M.; Croce, C.M.; Gissmann, L.; Schwarz, E.; Huebner, K.

    1987-02-01

    The chromosomal locations of cellular sequences flanking integrated papillomavirus DNA in four cervical cell lines and a primary cervical carcinoma have been determined. The two human papillomavirus (HPV) 16 flanking sequences derived from the tumor were localized to chromosome regions 20pter→20q13 and 3p25→3qter, regions that also contain the protooncogenes c-src-1 and c-raf-1, respectively. The HPV 16 integration site in the SiHa cervical carcinoma-derived cell line is in chromosome region 13q14→13q32. The HPV 18 integration site in SW756 cervical carcinoma cells is in chromosome 12 but is not closely linked to the Ki-ras2 gene. Finally, in two cervical carcinoma cell lines, HeLa and C4-I, HPV 18 DNA is integrated in chromosome 8, 5' of the c-myc gene. The HeLa HPV 18 integration site is within 40 kilobases 5' of the c-myc gene, inside the HL60 amplification unit surrounding and including the c-myc gene. Additionally, steady-state levels of c-myc mRNA are elevated in HeLa and C4-I cells relative to other cervical carcinoma cell lines. Thus, in at least some genital tumors, cis-activation of cellular oncogenes by HPV may be involved in malignant transformation of cervical cells.

  1. Perspectives of integrative cancer genomics in next generation sequencing era.

    PubMed

    Kwon, So Mee; Cho, Hyunwoo; Choi, Ji Hye; Jee, Byul A; Jo, Yuna; Woo, Hyun Goo

    2012-06-01

    The explosive development of genomics technologies including microarrays and next generation sequencing (NGS) has provided comprehensive maps of cancer genomes, including the expression of mRNAs and microRNAs, DNA copy numbers, sequence variations, and epigenetic changes. These genome-wide profiles of the genetic aberrations could reveal the candidates for diagnostic and/or prognostic biomarkers as well as mechanistic insights into tumor development and progression. Recent efforts to establish the huge cancer genome compendium and integrative omics analyses, so-called "integromics", have extended our understanding on the cancer genome, showing its daunting complexity and heterogeneity. However, the challenges of the structured integration, sharing, and interpretation of the big omics data still remain to be resolved. Here, we review several issues raised in cancer omics data analysis, including NGS, focusing particularly on the study design and analysis strategies. This might be helpful to understand the current trends and strategies of the rapidly evolving cancer genomics research. PMID:23105932

  2. Music and language perception: expectations, structural integration, and cognitive sequencing.

    PubMed

    Tillmann, Barbara

    2012-10-01

    Music can be described as sequences of events that are structured in pitch and time. Studying music processing provides insight into how complex event sequences are learned, perceived, and represented by the brain. Given the temporal nature of sound, expectations, structural integration, and cognitive sequencing are central in music perception (i.e., which sounds are most likely to come next and at what moment should they occur?). This paper focuses on similarities in music and language cognition research, showing that music cognition research provides insight into the understanding of not only music processing but also language processing and the processing of other structured stimuli. The hypothesis of shared resources between music and language processing and of domain-general dynamic attention has motivated the development of research to test music as a means to stimulate sensory, cognitive, and motor processes. PMID:22760955

  3. Selection against spurious promoter motifs correlates with translational efficiency across bacteria

    SciTech Connect

    Froula, Jeffrey L.; Francino, M. Pilar

    2007-05-01

    Because binding of RNAP to misplaced sites could compromise the efficiency of transcription, natural selection for the optimization of gene expression should regulate the distribution of DNA motifs capable of RNAP-binding across the genome. Here we analyze the distribution of the -10 promoter motifs that bind the σ⁷⁰ subunit of RNAP in 42 bacterial genomes. We show that selection on these motifs operates across the genome, maintaining an over-representation of -10 motifs in regulatory sequences while eliminating them from the nonfunctional and, in most cases, from the protein coding regions. In some genomes, however, -10 sites are over-represented in the coding sequences; these sites could induce pauses effecting regulatory roles throughout the length of a transcriptional unit. For nonfunctional sequences, the extent of motif under-representation varies across genomes in a manner that broadly correlates with the number of tRNA genes, a good indicator of translational speed and growth rate. This suggests that minimizing the time invested in gene transcription is an important selective pressure against spurious binding. However, selection against spurious binding is detectable in the reduced genomes of host-restricted bacteria that grow at slow rates, indicating that components of efficiency other than speed may also be important. Minimizing the number of RNAP molecules per cell required for transcription, and the corresponding energetic expense, may be most relevant in slow growers. These results indicate that genome-level properties affecting the efficiency of transcription and translation can respond in an integrated manner to optimize gene expression. The detection of selection against promoter motifs in nonfunctional regions also implies that no sequence may evolve free of selective constraints, at least in the relatively small and unstructured genomes of bacteria.

  4. MEME Suite: tools for motif discovery and searching

    PubMed Central

    Bailey, Timothy L.; Boden, Mikael; Buske, Fabian A.; Frith, Martin; Grant, Charles E.; Clementi, Luca; Ren, Jingyuan; Li, Wilfred W.; Noble, William S.

    2009-01-01

    The MEME Suite web server provides a unified portal for online discovery and analysis of sequence motifs representing features such as DNA binding sites and protein interaction domains. The popular MEME motif discovery algorithm is now complemented by the GLAM2 algorithm which allows discovery of motifs containing gaps. Three sequence scanning algorithms—MAST, FIMO and GLAM2SCAN—allow scanning numerous DNA and protein sequence databases for motifs discovered by MEME and GLAM2. Transcription factor motifs (including those discovered using MEME) can be compared with motifs in many popular motif databases using the motif database scanning algorithm Tomtom. Transcription factor motifs can be further analyzed for putative function by association with Gene Ontology (GO) terms using the motif-GO term association tool GOMO. MEME output now contains sequence LOGOS for each discovered motif, as well as buttons to allow motifs to be conveniently submitted to the sequence and motif database scanning algorithms (MAST, FIMO and Tomtom), or to GOMO, for further analysis. GLAM2 output similarly contains buttons for further analysis using GLAM2SCAN and for rerunning GLAM2 with different parameters. All of the motif-based tools are now implemented as web services via Opal. Source code, binaries and a web server are freely available for noncommercial use at http://meme.nbcr.net. PMID:19458158

  5. The Annotation of RNA Motifs

    PubMed Central

    2002-01-01

    The recent deluge of new RNA structures, including complete atomic-resolution views of both subunits of the ribosome, has on the one hand literally overwhelmed our individual abilities to comprehend the diversity of RNA structure, and on the other hand presented us with new opportunities for comprehensive use of RNA sequences for comparative genetic, evolutionary and phylogenetic studies. Two concepts are key to understanding RNA structure: hierarchical organization of global structure and isostericity of local interactions. Global structure changes extremely slowly, as it relies on conserved long-range tertiary interactions. Tertiary RNA–RNA and quaternary RNA–protein interactions are mediated by RNA motifs, defined as recurrent and ordered arrays of non-Watson–Crick base-pairs. A single RNA motif comprises a family of sequences, all of which can fold into the same three-dimensional structure and can mediate the same interaction(s). The chemistry and geometry of base pairing constrain the evolution of motifs in such a way that random mutations that occur within motifs are accepted or rejected insofar as they can mediate a similar ordered array of interactions. The steps involved in the analysis and annotation of RNA motifs in 3D structures are: (a) decomposition of each motif into non-Watson–Crick base-pairs; (b) geometric classification of each basepair; (c) identification of isosteric substitutions for each basepair by comparison to isostericity matrices; (d) alignment of homologous sequences using the isostericity matrices to identify corresponding positions in the crystal structure; (e) acceptance or rejection of the null hypothesis that the motif is conserved. PMID:18629252

  6. DNA motifs determining the accuracy of repeat duplication during CRISPR adaptation in Haloarcula hispanica

    PubMed Central

    Wang, Rui; Li, Ming; Gong, Luyao; Hu, Songnian; Xiang, Hua

    2016-01-01

    Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) acquire new spacers to generate adaptive immunity in prokaryotes. During spacer integration, the leader-preceded repeat is always accurately duplicated, leading to speculations of a repeat-length ruler. Here in Haloarcula hispanica, we demonstrate that the accurate duplication of its 30-bp repeat requires two conserved mid-repeat motifs, AACCC and GTGGG. The AACCC motif was essential and needed to be ∼10 bp downstream from the leader-repeat junction site, where duplication consistently started. Interestingly, repeat duplication terminated sequence-independently and usually with a specific distance from the GTGGG motif, which seemingly served as an anchor site for a molecular ruler. Accordingly, altering the spacing between the two motifs led to an aberrant duplication size (29, 31, 32 or 33 bp). We propose the adaptation complex may recognize these mid-repeat elements to enable measuring the repeat DNA for spacer integration. PMID:27085805
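
    The spacing relationship described in this abstract can be illustrated with a minimal Python sketch that locates the two mid-repeat motifs in a repeat and reports the gap between them. The 30-nt string below is an invented toy sequence, not the Haloarcula hispanica repeat, and the function names are hypothetical.

```python
def motif_positions(seq, motif):
    """Return 0-based start indices of every occurrence of motif (overlaps allowed)."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def gap_between(seq, upstream, downstream):
    """Bases between the first `upstream` hit and the first `downstream` hit after it.

    Returns None when either motif is missing or they occur in the wrong order.
    """
    up_hits = motif_positions(seq, upstream)
    if not up_hits:
        return None
    up_end = up_hits[0] + len(upstream)
    for pos in motif_positions(seq, downstream):
        if pos >= up_end:
            return pos - up_end
    return None


# Toy 30-nt repeat, invented for illustration only.
TOY_REPEAT = "TTTTTTTTTTAACCCTTTGTGGGTTTTTTT"

if __name__ == "__main__":
    print(motif_positions(TOY_REPEAT, "AACCC"))
    print(gap_between(TOY_REPEAT, "AACCC", "GTGGG"))
```

    Changing the number of bases between the two motifs in the toy string changes the value returned by gap_between, which is the quantity the spacing experiments in the abstract vary.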

  7. DNA motifs determining the accuracy of repeat duplication during CRISPR adaptation in Haloarcula hispanica.

    PubMed

    Wang, Rui; Li, Ming; Gong, Luyao; Hu, Songnian; Xiang, Hua

    2016-05-19

    Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) acquire new spacers to generate adaptive immunity in prokaryotes. During spacer integration, the leader-preceded repeat is always accurately duplicated, leading to speculations of a repeat-length ruler. Here in Haloarcula hispanica, we demonstrate that the accurate duplication of its 30-bp repeat requires two conserved mid-repeat motifs, AACCC and GTGGG. The AACCC motif was essential and needed to be ∼10 bp downstream from the leader-repeat junction site, where duplication consistently started. Interestingly, repeat duplication terminated sequence-independently and usually with a specific distance from the GTGGG motif, which seemingly served as an anchor site for a molecular ruler. Accordingly, altering the spacing between the two motifs led to an aberrant duplication size (29, 31, 32 or 33 bp). We propose the adaptation complex may recognize these mid-repeat elements to enable measuring the repeat DNA for spacer integration. PMID:27085805

  8. The Pichia pastoris PER6 gene product is a peroxisomal integral membrane protein essential for peroxisome biogenesis and has sequence similarity to the Zellweger syndrome protein PAF-1.

    PubMed Central

    Waterham, H R; de Vries, Y; Russel, K A; Xie, W; Veenhuis, M; Cregg, J M

    1996-01-01

    We report the cloning of PER6, a gene essential for peroxisome biogenesis in the methylotrophic yeast Pichia pastoris. The PER6 sequence predicts that its product Per6p is a 52-kDa polypeptide with the cysteine-rich C3HC4 motif. Per6p has significant overall sequence similarity with the human peroxisome assembly factor PAF-1, a protein that is defective in certain patients suffering from the peroxisomal disorder Zellweger syndrome, and with car1, a protein required for peroxisome biogenesis and caryogamy in the filamentous fungus Podospora anserina. In addition, the C3HC4 motif and two of the three membrane-spanning segments predicted for Per6p align with the C3HC4 motifs and the two membrane-spanning segments predicted for PAF-1 and car1. Like PAF-1, Per6p is a peroxisomal integral membrane protein. In methanol- or oleic acid-induced cells of per6 mutants, morphologically recognizable peroxisomes are absent. Instead, peroxisomal remnants are observed. In addition, peroxisomal matrix proteins are synthesized but located in the cytosol. The similarities between Per6p and PAF-1 in amino acid sequence and biochemical properties, and between mutants defective in their respective genes, suggest that Per6p is the putative yeast homolog of PAF-1. PMID:8628321

  9. Ovodefensins, an Oviduct-Specific Antimicrobial Gene Family, Have Evolved in Birds and Reptiles to Protect the Egg by Both Sequence and Intra-Six-Cysteine Sequence Motif Spacing.

    PubMed

    Whenham, Natasha; Lu, Tian Chee; Maidin, Maisarah B M; Wilson, Peter W; Bain, Maureen M; Stevenson, M Lynn; Stevens, Mark P; Bedford, Michael R; Dunn, Ian C

    2015-06-01

    Ovodefensins are a novel beta defensin-related family of antimicrobial peptides containing conserved glycine and six cysteine residues. Originally thought to be restricted to the albumen-producing region of the avian oviduct, expression was found in chicken, turkey, duck, and zebra finch in large quantities in many parts of the oviduct, but this varied between species and between gene forms in the same species. Using new search strategies, the ovodefensin family now has 35 members, including reptiles, but no representatives outside birds and reptiles have been found. Analysis of their evolution shows that ovodefensins divide into six groups based on the intra-cysteine amino acid spacing, representing a unique mechanism alongside traditional evolution of sequence. The groups have been used to base a nomenclature for the family. Antimicrobial activity for three ovodefensins from chicken and duck was confirmed against Escherichia coli and a pathogenic E. coli strain as well as a Gram-positive organism, Staphylococcus aureus, for the first time. However, activity varied greatly between peptides, with Gallus gallus OvoDA1 being the most potent, suggesting a link with the different structures. Expression of Gallus gallus OvoDA1 (gallin) in the oviduct was increased by estrogen and progesterone and in the reproductive state. Overall, the results support the hypothesis that ovodefensins evolved to protect the egg, but they are not necessarily restricted to the egg white. Therefore, divergent motif structure and sequence present an interesting area of research for antimicrobial peptide design and understanding protection of the cleidoic egg. PMID:25972010

  10. Stochastic motif extraction using hidden Markov model

    SciTech Connect

    Fujiwara, Yukiko; Asogawa, Minoru; Konagaya, Akihiko

    1994-12-31

    In this paper, we study the application of an HMM (hidden Markov model) to the problem of representing protein sequences by a stochastic motif. A stochastic protein motif represents the small segments of protein sequences that have a certain function or structure. The stochastic motif, represented by an HMM, has conditional probabilities to deal with the stochastic nature of the motif. This HMM directly reflects the characteristics of the motif, such as a protein periodical structure or grouping. In order to obtain the optimal HMM, we developed the "iterative duplication method" for HMM topology learning. It starts from a small fully-connected network and iterates the network generation and parameter optimization until it achieves sufficient discrimination accuracy. Using this method, we obtained an HMM for a leucine zipper motif. Compared to a symbolic pattern representation, which achieved a prediction accuracy of 14.8 percent, the HMM achieved 79.3 percent. Additionally, the method can obtain an HMM for various types of zinc finger motifs, and it might separate the mixed data. We demonstrated that this approach is applicable to the validation of the protein databases; a constructed HMM has indicated that one protein sequence annotated as "leucine-zipper like sequence" in the database is quite different from other leucine-zipper sequences in terms of likelihood, and we found this discrimination is plausible.
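
    To make the notion of sequence likelihood under an HMM concrete, here is a minimal Python sketch of the standard forward algorithm. It is not the authors' topology-learning method; the two-state model, the reduced alphabet ("L" for leucine, "x" for any other residue), and every probability below are invented for illustration.

```python
def forward_likelihood(seq, start, trans, emit):
    """Probability of `seq` under an HMM, summed over all state paths (forward algorithm)."""
    states = list(start)
    if not seq:
        return 1.0
    # alpha[s] = P(observations so far, current state = s)
    alpha = {s: start[s] * emit[s][seq[0]] for s in states}
    for symbol in seq[1:]:
        alpha = {
            s: emit[s][symbol] * sum(alpha[r] * trans[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical two-state model; all numbers are assumptions of this sketch.
START = {"motif": 0.5, "background": 0.5}
TRANS = {
    "motif": {"motif": 0.9, "background": 0.1},
    "background": {"motif": 0.1, "background": 0.9},
}
EMIT = {
    "motif": {"L": 0.6, "x": 0.4},
    "background": {"L": 0.1, "x": 0.9},
}

if __name__ == "__main__":
    for candidate in ("LLLL", "xxxx"):
        print(candidate, forward_likelihood(candidate, START, TRANS, EMIT))
```

    Comparing such likelihoods across annotated sequences is one plausible way to flag an entry that fits a motif model much worse than its peers, in the spirit of the database validation described above.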

  11. Temporal motifs in time-dependent networks

    NASA Astrophysics Data System (ADS)

    Kovanen, Lauri; Karsai, Márton; Kaski, Kimmo; Kertész, János; Saramäki, Jari

    2011-11-01

    Temporal networks are commonly used to represent systems where connections between elements are active only for restricted periods of time, such as telecommunication, neural signal processing, biochemical reaction and human social interaction networks. We introduce the framework of temporal motifs to study the mesoscale topological-temporal structure of temporal networks in which the events of nodes do not overlap in time. Temporal motifs are classes of similar event sequences, where the similarity refers not only to topology but also to the temporal order of the events. We provide a mapping from event sequences to coloured directed graphs that enables an efficient algorithm for identifying temporal motifs. We discuss some aspects of temporal motifs, including causality and null models, and present basic statistics of temporal motifs in a large mobile call network.

  12. Integrated sequence and immunology filovirus database at Los Alamos

    SciTech Connect

    Yusim, Karina; Yoon, Hyejin; Foley, Brian; Feng, Shihai; Macke, Jennifer; Dimitrijevic, Mira; Abfalterer, Werner; Szinger, James; Fischer, Will; Kuiken, Carla; Korber, Bette

    2016-01-01

    The Ebola outbreak of 2013–15 infected more than 28,000 people and claimed more lives than all previous filovirus outbreaks combined. Governmental agencies, clinical teams, and the world scientific community pulled together in a multifaceted response ranging from prevention and disease control, to evaluating vaccines and therapeutics in human trials. We report that as this epidemic is finally coming to a close, refocusing on long-term prevention strategies becomes paramount. Given the very real threat of future filovirus outbreaks, and the inherent uncertainty of the next outbreak virus and geographic location, it is prudent to consider the extent and implications of known natural diversity in advancing vaccines and therapeutic approaches. To facilitate such consideration, we have updated and enhanced the content of the filovirus portion of Los Alamos Hemorrhagic Fever Viruses Database. We have integrated and performed baseline analysis of all family Filoviridae sequences deposited into GenBank, with associated immune response data, and metadata, and we have added new computational tools with web-interfaces to assist users with analysis. Here, we (i) describe the main features of updated database, (ii) provide integrated views and some basic analyses summarizing evolutionary patterns as they relate to geo-temporal data captured in the database and (iii) highlight the most conserved regions in the proteome that may be useful for a T cell vaccine strategy.

  13. Integrated sequence and immunology filovirus database at Los Alamos

    DOE PAGESBeta

    Yusim, Karina; Yoon, Hyejin; Foley, Brian; Feng, Shihai; Macke, Jennifer; Dimitrijevic, Mira; Abfalterer, Werner; Szinger, James; Fischer, Will; Kuiken, Carla; et al

    2016-01-01

    The Ebola outbreak of 2013–15 infected more than 28,000 people and claimed more lives than all previous filovirus outbreaks combined. Governmental agencies, clinical teams, and the world scientific community pulled together in a multifaceted response ranging from prevention and disease control, to evaluating vaccines and therapeutics in human trials. We report that as this epidemic is finally coming to a close, refocusing on long-term prevention strategies becomes paramount. Given the very real threat of future filovirus outbreaks, and the inherent uncertainty of the next outbreak virus and geographic location, it is prudent to consider the extent and implications of known natural diversity in advancing vaccines and therapeutic approaches. To facilitate such consideration, we have updated and enhanced the content of the filovirus portion of Los Alamos Hemorrhagic Fever Viruses Database. We have integrated and performed baseline analysis of all family Filoviridae sequences deposited into GenBank, with associated immune response data, and metadata, and we have added new computational tools with web-interfaces to assist users with analysis. Here, we (i) describe the main features of updated database, (ii) provide integrated views and some basic analyses summarizing evolutionary patterns as they relate to geo-temporal data captured in the database and (iii) highlight the most conserved regions in the proteome that may be useful for a T cell vaccine strategy.

  14. Integrated sequence and immunology filovirus database at Los Alamos

    PubMed Central

    Yoon, Hyejin; Foley, Brian; Feng, Shihai; Macke, Jennifer; Dimitrijevic, Mira; Abfalterer, Werner; Szinger, James; Fischer, Will; Kuiken, Carla; Korber, Bette

    2016-01-01

    The Ebola outbreak of 2013–15 infected more than 28 000 people and claimed more lives than all previous filovirus outbreaks combined. Governmental agencies, clinical teams, and the world scientific community pulled together in a multifaceted response ranging from prevention and disease control, to evaluating vaccines and therapeutics in human trials. As this epidemic is finally coming to a close, refocusing on long-term prevention strategies becomes paramount. Given the very real threat of future filovirus outbreaks, and the inherent uncertainty of the next outbreak virus and geographic location, it is prudent to consider the extent and implications of known natural diversity in advancing vaccines and therapeutic approaches. To facilitate such consideration, we have updated and enhanced the content of the filovirus portion of Los Alamos Hemorrhagic Fever Viruses Database. We have integrated and performed baseline analysis of all family Filoviridae sequences deposited into GenBank, with associated immune response data, and metadata, and we have added new computational tools with web-interfaces to assist users with analysis. Here, we (i) describe the main features of updated database, (ii) provide integrated views and some basic analyses summarizing evolutionary patterns as they relate to geo-temporal data captured in the database and (iii) highlight the most conserved regions in the proteome that may be useful for a T cell vaccine strategy. Database URL: www.hfv.lanl.gov PMID:27103629

  15. Single base pair differences in a shared motif determine differential Rhodopsin expression

    PubMed Central

    Rister, Jens; Razzaq, Ansa; Boodram, Pamela; Desai, Nisha; Tsanis, Cleopatra; Chen, Hongtao; Jukam, David; Desplan, Claude

    2016-01-01

    The final identity and functional properties of a neuron are specified by terminal differentiation genes, which are controlled by specific motifs in compact regulatory regions. To determine how these sequences integrate inputs from transcription factors that specify cell types, we compared the regulatory mechanism of Drosophila Rhodopsin genes that are expressed in subsets of photoreceptors to that of phototransduction genes that are expressed broadly, in all photoreceptors. Both sets of genes share an 11bp activator motif. Broadly expressed genes contain a palindromic version that mediates expression in all photoreceptors. In contrast, each Rhodopsin exhibits unique single bp substitutions that break the symmetry of the palindrome and generate activator or repressor motifs critical for restricting expression to photoreceptor subsets. Novel sensory neuron subtypes can therefore evolve through single base pair changes in short regulatory motifs, allowing the discrimination of a wide spectrum of stimuli. PMID:26785491
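
    The effect of a single substitution on palindromic symmetry can be checked with a short Python sketch. The 11-bp site below is a made-up example, not a motif from the study, and ignoring the unpaired central base of an odd-length site is an assumption of this sketch.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def broken_pairs(site):
    """Count mirrored position pairs that are not reverse-complementary.

    For an odd-length site the central base has no partner and is ignored.
    """
    n = len(site)
    return sum(
        1 for i in range(n // 2) if site[i] != COMPLEMENT[site[n - 1 - i]]
    )


def substitute(site, index, base):
    """Return a copy of `site` with the base at `index` replaced."""
    return site[:index] + base + site[index + 1:]


# Hypothetical 11-bp site whose two arms are reverse complements of each other.
TOY_SITE = "TAATCCGATTA"

if __name__ == "__main__":
    print(broken_pairs(TOY_SITE))
    print(broken_pairs(substitute(TOY_SITE, 1, "G")))
```

    A count of zero means the two arms are symmetric; any single substitution outside the central position breaks exactly one mirrored pair.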

  16. Mining Conditional Phosphorylation Motifs.

    PubMed

    Liu, Xiaoqing; Wu, Jun; Gong, Haipeng; Deng, Shengchun; He, Zengyou

    2014-01-01

    Phosphorylation motifs represent position-specific amino acid patterns around the phosphorylation sites in the set of phosphopeptides. Several algorithms have been proposed to uncover phosphorylation motifs, whereas the problem of efficiently discovering a set of significant motifs with sufficiently high coverage and non-redundancy still remains unsolved. Here we present a novel notion called conditional phosphorylation motifs. Through this new concept, the motifs whose over-expressiveness mainly benefits from its constituting parts can be filtered out effectively. To discover conditional phosphorylation motifs, we propose an algorithm called C-Motif for a non-redundant identification of significant phosphorylation motifs. C-Motif is implemented under the Apriori framework, and it tests the statistical significance together with the frequency of candidate motifs in a single stage. Experiments demonstrate that C-Motif outperforms some current algorithms such as MMFPh and Motif-All in terms of coverage and non-redundancy of the results and efficiency of the execution. The source code of C-Motif is available at: https://sourceforge.net/projects/cmotif/. PMID:26356863

  17. RSAT peak-motifs: motif analysis in full-size ChIP-seq datasets.

    PubMed

    Thomas-Chollier, Morgane; Herrmann, Carl; Defrance, Matthieu; Sand, Olivier; Thieffry, Denis; van Helden, Jacques

    2012-02-01

    ChIP-seq is increasingly used to characterize transcription factor binding and chromatin marks at a genomic scale. Various tools are now available to extract binding motifs from peak data sets. However, most approaches are only available as command-line programs, or via a website but with size restrictions. We present peak-motifs, a computational pipeline that discovers motifs in peak sequences, compares them with databases, exports putative binding sites for visualization in the UCSC genome browser and generates an extensive report suited for both naive and expert users. It relies on time- and memory-efficient algorithms enabling the treatment of several thousand peaks within minutes. Regarding time efficiency, peak-motifs outperforms all comparable tools by several orders of magnitude. We demonstrate its accuracy by analyzing data sets ranging from 4000 to 128,000 peaks for 12 embryonic stem cell-specific transcription factors. In all cases, the program finds the expected motifs and returns additional motifs potentially bound by cofactors. We further apply peak-motifs to discover tissue-specific motifs in peak collections for the p300 transcriptional co-activator. To our knowledge, peak-motifs is the only tool that performs a complete motif analysis and offers a user-friendly web interface without any restriction on sequence size or number of peaks. PMID:22156162

  18. RegulonDB version 9.0: high-level integration of gene regulation, coexpression, motif clustering and beyond

    PubMed Central

    Gama-Castro, Socorro; Salgado, Heladia; Santos-Zavaleta, Alberto; Ledezma-Tejeida, Daniela; Muñiz-Rascado, Luis; García-Sotelo, Jair Santiago; Alquicira-Hernández, Kevin; Martínez-Flores, Irma; Pannier, Lucia; Castro-Mondragón, Jaime Abraham; Medina-Rivera, Alejandra; Solano-Lira, Hilda; Bonavides-Martínez, César; Pérez-Rueda, Ernesto; Alquicira-Hernández, Shirley; Porrón-Sotelo, Liliana; López-Fuentes, Alejandra; Hernández-Koutoucheva, Anastasia; Moral-Chávez, Víctor Del; Rinaldi, Fabio; Collado-Vides, Julio

    2016-01-01

    RegulonDB (http://regulondb.ccg.unam.mx) is one of the most useful and important resources on bacterial gene regulation, as it integrates the scattered scientific knowledge of the best-characterized organism, Escherichia coli K-12, in a database that organizes large amounts of data. Its electronic format enables researchers to compare their results with the legacy of previous knowledge and supports bioinformatics tools and model building. Here, we summarize our progress with RegulonDB since our last Nucleic Acids Research publication describing RegulonDB, in 2013. In addition to maintaining curation up-to-date, we report a collection of 232 interactions with small RNAs affecting 192 genes, and the complete repertoire of 189 Elementary Genetic Sensory-Response units (GENSOR units), integrating the signal, regulatory interactions, and metabolic pathways they govern. These additions represent major progress to a higher level of understanding of regulated processes. We have updated the computationally predicted transcription factors, which total 304 (184 with experimental evidence and 120 from computational predictions); we updated our position-weight matrices and have included tools for clustering them in evolutionary families. We describe our semiautomatic strategy to accelerate curation, including datasets from high-throughput experiments, a novel coexpression distance to search for ‘neighborhood’ genes to known operons and regulons, and computational developments. PMID:26527724

  19. RegulonDB version 9.0: high-level integration of gene regulation, coexpression, motif clustering and beyond.

    PubMed

    Gama-Castro, Socorro; Salgado, Heladia; Santos-Zavaleta, Alberto; Ledezma-Tejeida, Daniela; Muñiz-Rascado, Luis; García-Sotelo, Jair Santiago; Alquicira-Hernández, Kevin; Martínez-Flores, Irma; Pannier, Lucia; Castro-Mondragón, Jaime Abraham; Medina-Rivera, Alejandra; Solano-Lira, Hilda; Bonavides-Martínez, César; Pérez-Rueda, Ernesto; Alquicira-Hernández, Shirley; Porrón-Sotelo, Liliana; López-Fuentes, Alejandra; Hernández-Koutoucheva, Anastasia; Del Moral-Chávez, Víctor; Rinaldi, Fabio; Collado-Vides, Julio

    2016-01-01

    RegulonDB (http://regulondb.ccg.unam.mx) is one of the most useful and important resources on bacterial gene regulation, as it integrates the scattered scientific knowledge of the best-characterized organism, Escherichia coli K-12, in a database that organizes large amounts of data. Its electronic format enables researchers to compare their results with the legacy of previous knowledge and supports bioinformatics tools and model building. Here, we summarize our progress with RegulonDB since our last Nucleic Acids Research publication describing RegulonDB, in 2013. In addition to maintaining curation up-to-date, we report a collection of 232 interactions with small RNAs affecting 192 genes, and the complete repertoire of 189 Elementary Genetic Sensory-Response units (GENSOR units), integrating the signal, regulatory interactions, and metabolic pathways they govern. These additions represent major progress to a higher level of understanding of regulated processes. We have updated the computationally predicted transcription factors, which total 304 (184 with experimental evidence and 120 from computational predictions); we updated our position-weight matrices and have included tools for clustering them in evolutionary families. We describe our semiautomatic strategy to accelerate curation, including datasets from high-throughput experiments, a novel coexpression distance to search for 'neighborhood' genes to known operons and regulons, and computational developments. PMID:26527724

  20. Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas

    PubMed Central

    Petrov, Anton I.; Zirbel, Craig L.; Leontis, Neocles B.

    2013-01-01

    The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson–Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules, but can also occur at nonhomologous sites in diverse RNAs, and which often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access. PMID:23970545

  1. Localization of Daucus carota NMCP1 to the nuclear periphery: the role of the N-terminal region and an NLS-linked sequence motif, RYNLRR, in the tail domain

    PubMed Central

    Kimura, Yuta; Fujino, Kaien; Ogawa, Kana; Masuda, Kiyoshi

    2014-01-01

    Recent ultrastructural studies revealed that a structure similar to the vertebrate nuclear lamina exists in the nuclei of higher plants. However, plant genomes lack genes for lamins and intermediate-type filament proteins, and this suggests that plant-specific nuclear coiled-coil proteins make up the lamina-like structure in plants. NMCP1 is a protein, first identified in Daucus carota cells, that localizes exclusively to the nuclear periphery in interphase cells. It has a tripartite structure comprised of head, rod, and tail domains, and includes putative nuclear localization signal (NLS) motifs. We identified the functional NLS of DcNMCP1 (carrot NMCP1) and determined the protein regions required for localizing to the nuclear periphery using EGFP-fused constructs transiently expressed in Apium graveolens epidermal cells. Transcription was driven under a CaMV35S promoter, and the genes were introduced into the epidermal cells by a DNA-coated microprojectile delivery system. Of the NLS motifs, KRRRK and RRHK in the tail domain were highly functional for nuclear localization. Addition of the N-terminal 141 amino acids from DcNMCP1 shifted the localization of a region including these NLSs from the entire nucleus to the nuclear periphery. Using this same construct, the replacement of amino acids in RRHK or its preceding sequence, YNL, with alanine residues abolished localization to the nuclear periphery, while replacement of KRRRK did not affect localization. The sequence R/Q/HYNLRR/H, including YNL and the first part of the sequence of RRHK, is evolutionarily conserved in a subclass of NMCP1 sequences from many plant species. These results show that NMCP1 localizes to the nuclear periphery by a combined action of a sequence composed of R/Q/HYNLRR/H, NLS, and the N-terminal region including the head and a portion of the rod domain, suggesting that more than one binding site is implicated in localization of NMCP1. PMID:24616728

  2. Integration of Temporal and Ordinal Information During Serial Interception Sequence Learning

    PubMed Central

    Gobel, Eric W.; Sanchez, Daniel J.; Reber, Paul J.

    2011-01-01

    The expression of expert motor skills typically involves learning to perform a precisely timed sequence of movements (e.g., language production, music performance, athletic skills). Research examining incidental sequence learning has previously relied on a perceptually-cued task that gives participants exposure to repeating motor sequences but does not require timing of responses for accuracy. Using a novel perceptual-motor sequence learning task, learning a precisely timed cued sequence of motor actions is shown to occur without explicit instruction. Participants learned a repeating sequence through practice and showed sequence-specific knowledge via a performance decrement when switched to an unfamiliar sequence. In a second experiment, the integration of representation of action order and timing sequence knowledge was examined. When either action order or timing sequence information was selectively disrupted, performance was reduced to levels similar to completely novel sequences. Unlike prior sequence-learning research that has found timing information to be secondary to learning action sequences, when the task demands require accurate action and timing information, an integrated representation of these types of information is acquired. These results provide the first evidence for incidental learning of fully integrated action and timing sequence information in the absence of an independent representation of action order, and suggest that this integrative mechanism may play a material role in the acquisition of complex motor skills. PMID:21417511

  3. Integration of temporal and ordinal information during serial interception sequence learning.

    PubMed

    Gobel, Eric W; Sanchez, Daniel J; Reber, Paul J

    2011-07-01

    The expression of expert motor skills typically involves learning to perform a precisely timed sequence of movements. Research examining incidental sequence learning has relied on a perceptually cued task that gives participants exposure to repeating motor sequences but does not require timing of responses for accuracy. In the 1st experiment, a novel perceptual-motor sequence learning task was used, and learning a precisely timed cued sequence of motor actions was shown to occur without explicit instruction. Participants learned a repeating sequence through practice and showed sequence-specific knowledge via a performance decrement when switched to an unfamiliar sequence. In the 2nd experiment, the integration of representation of action order and timing sequence knowledge was examined. When either action order or timing sequence information was selectively disrupted, performance was reduced to levels similar to completely novel sequences. Unlike prior sequence-learning research that has found timing information to be secondary to learning action sequences, when the task demands require accurate action and timing information, an integrated representation of these types of information is acquired. These results provide the first evidence for incidental learning of fully integrated action and timing sequence information in the absence of an independent representation of action order and suggest that this integrative mechanism may play a material role in the acquisition of complex motor skills. PMID:21417511

  4. Integrated and Independent Learning of Hand-Related Constituent Sequences

    ERIC Educational Resources Information Center

    Berner, Michael P.; Hoffmann, Joachim

    2009-01-01

    In almost all daily activities fingers of both hands are used in coordinated succession. The present experiments explored whether learning in such tasks pertains not only to the overall sequence spanning both hands but also to the constituent sequences of each hand. In a serial reaction time task, 2 repeating hand-related sequences were…

  5. Sampling Motif-Constrained Ensembles of Networks

    NASA Astrophysics Data System (ADS)

    Fischer, Rico; Leitão, Jorge C.; Peixoto, Tiago P.; Altmann, Eduardo G.

    2015-10-01

    The statistical significance of network properties is conditioned on null models which satisfy specified properties but that are otherwise random. Exponential random graph models are a principled theoretical framework to generate such constrained ensembles, but which often fail in practice, either due to model inconsistency or due to the impossibility to sample networks from them. These problems affect the important case of networks with prescribed clustering coefficient or number of small connected subgraphs (motifs). In this Letter we use the Wang-Landau method to obtain a multicanonical sampling that overcomes both these problems. We sample, in polynomial time, networks with arbitrary degree sequences from ensembles with imposed motifs counts. Applying this method to social networks, we investigate the relation between transitivity and homophily, and we quantify the correlation between different types of motifs, finding that single motifs can explain up to 60% of the variation of motif profiles.

  6. Deep sequencing of the hepatitis B virus in hepatocellular carcinoma patients reveals enriched integration events, structural alterations and sequence variations.

    PubMed

    Toh, Soo Ting; Jin, Yu; Liu, Lizhen; Wang, Jingbo; Babrzadeh, Farbod; Gharizadeh, Baback; Ronaghi, Mostafa; Toh, Han Chong; Chow, Pierce Kah-Hoe; Chung, Alexander Y-F; Ooi, London L-P-J; Lee, Caroline G-L

    2013-04-01

    Chronic hepatitis B virus (HBV) infection is epidemiologically associated with hepatocellular carcinoma (HCC), but its role in HCC remains poorly understood due to technological limitations. In this study, we systematically characterize HBV in HCC patients. HBV sequences were enriched from 48 HCC patients using an oligo-bead-based strategy, pooled together and sequenced using the FLX-Genome-Sequencer. In the tumors, preferential integration of HBV into promoters of genes (P < 0.001) and significant enrichment of integration into chromosome 10 (P < 0.01) were observed. Integration into chromosome 10 was significantly associated with poorly differentiated tumors (P < 0.05). Notably, in the tumors, recurrent integration into the promoter of the human telomerase reverse transcriptase (TERT) gene was found to correlate with increased TERT expression. The preferred region within the HBV genome involved in integration and viral structural alteration is at the 3'-end of hepatitis B virus X protein (HBx), where viral replication/transcription initiates. Upon integration, the 3'-end of the HBx is often deleted. HBx-human chimeric transcripts, the most common type of chimeric transcripts, can be expressed as chimeric proteins. Sequence variations resulting in non-conservative amino acid substitutions are commonly observed in the HBV genome. This study highlights HBV as highly mutable in HCC patients with preferential regions within the host and virus genome for HBV integration/structural alterations. PMID:23276797

  7. ATtRACT—a database of RNA-binding proteins and associated motifs

    PubMed Central

    Giudice, Girolamo; Sánchez-Cabo, Fátima; Torroja, Carlos; Lara-Pezzi, Enrique

    2016-01-01

    RNA-binding proteins (RBPs) play a crucial role in key cellular processes, including RNA transport, splicing, polyadenylation and stability. Understanding the interaction between RBPs and RNA is key to improving our knowledge of RNA processing, localization and regulation in a global manner. Despite advances in recent years, a unified non-redundant resource that includes information on experimentally validated motifs, RBPs and integrated tools to exploit this information is lacking. Here, we developed a database named ATtRACT (available at http://attract.cnic.es) that compiles information on 370 RBPs and 1583 RBP consensus binding motifs, 192 of which are not present in any other database. To populate ATtRACT we (i) extracted and hand-curated experimentally validated data from the CISBP-RNA, SpliceAid–F and RBPDB databases, (ii) integrated and updated the unavailable ASD database and (iii) extracted information from Protein-RNA complexes present in the Protein Data Bank database through computational analyses. ATtRACT also provides efficient algorithms to search for a specific motif and scan one or more RNA sequences at a time. It also allows discovering de novo motifs enriched in a set of related sequences and comparing them with the motifs included in the database. Database URL: http://attract.cnic.es PMID:27055826

  8. A survey of motif finding Web tools for detecting binding site motifs in ChIP-Seq data

    PubMed Central

    2014-01-01

    Abstract ChIP-Seq (chromatin immunoprecipitation sequencing) has an advantage for motif finding because ChIP-Seq experiments narrow the search down to binding site locations. Recent motif finding tools facilitate motif detection by providing user-friendly Web interfaces. In this work, we reviewed nine motif finding Web tools that are capable of detecting binding site motifs in ChIP-Seq data. We showed that each motif finding Web tool has its own advantages for detecting motifs that other tools may not discover. We recommend that users use multiple motif finding Web tools that implement different algorithms to obtain significant motifs, overlapping similar motifs, and non-overlapping motifs. Finally, we provided our suggestions for the future development of motif finding Web tools that better assist researchers in finding motifs in ChIP-Seq data. Reviewers This article was reviewed by Prof. Sandor Pongor, Dr. Yuriy Gusev, and Dr. Shyam Prabhakar (nominated by Prof. Limsoon Wong). PMID:24555784

  9. Sequence and structural analysis of the Asp-box motif and Asp-box beta-propellers; a widespread propeller-type characteristic of the Vps10 domain family and several glycoside hydrolase families

    PubMed Central

    Quistgaard, Esben M; Thirup, Søren S

    2009-01-01

    Background The Asp-box is a short sequence and structure motif that folds as a well-defined β-hairpin. It is present in different folds, but occurs most prominently as repeats in β-propellers. Asp-box β-propellers are known to be characteristically irregular and to occur in many medically important proteins, most of which are glycosidase enzymes, but they are otherwise not well characterized and are only rarely treated as a distinct β-propeller family. We have analyzed the sequence, structure, function and occurrence of the Asp-box and the s-Asp-box, a related shorter variant, and provide a comprehensive classification and computational analysis of the Asp-box β-propeller family. Results We find that all conserved residues of the Asp-box support its structure, whereas the residues in variable positions are generally used for other purposes. The Asp-box clearly has a structural role in β-propellers and is highly unlikely to be involved in ligand binding. Sequence analysis of the Asp-box β-propeller family reveals it to be very widespread, especially in bacteria, and suggests a wide functional range. Disregarding the Asp-boxes, sequence conservation of the propeller blades is very low, but a distinct pattern of residues with specific properties has been identified. Interestingly, Asp-boxes are occasionally found very close to other propeller-associated repeats in extensive mixed-motif stretches, which strongly suggests the existence of a novel class of hybrid β-propellers. Structural analysis reveals that the top and bottom faces of Asp-box β-propellers have striking and consistently different loop properties; the bottom is structurally conserved whereas the top shows great structural variation. Interestingly, only the top face is used for functional purposes in known structures. A structural analysis of the 10-bladed β-propeller fold, which has so far only been observed in the Asp-box family, reveals that the inner strands of the blades are unusually far apart

  10. DNA Motif Databases and Their Uses.

    PubMed

    Stormo, Gary D

    2015-01-01

    Transcription factors (TFs) recognize and bind to specific DNA sequences. The specificity of a TF is usually represented as a position weight matrix (PWM). Several databases of DNA motifs exist and are used in biological research to address important biological questions. This overview describes PWMs and some of the most commonly used motif databases, as well as a few of their common applications. PMID:26334922
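
    The position weight matrix mentioned in the record above can be illustrated with a minimal sketch. It assumes a uniform background of 0.25 per base and a pseudocount of 1; the aligned sites, the scanned sequence, and the function names (build_pwm, score, best_hit) are made up for illustration and are not taken from any motif database.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length aligned DNA sites.

    The pseudocount and uniform background are illustrative assumptions.
    """
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm

def score(pwm, window):
    """Sum the per-position log-odds weights for one window."""
    return sum(col[base] for col, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    hits = [(score(pwm, sequence[i:i + width]), i)
            for i in range(len(sequence) - width + 1)]
    return max(hits)

# Hypothetical aligned binding sites, invented for this example.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
pwm = build_pwm(sites)
best_score, best_pos = best_hit(pwm, "GGCGTATAATGCGC")
print(best_pos, round(best_score, 2))
```

    Log-odds scores are used here so that a window resembling the sites scores above zero and a window resembling background scores below it; other scoring conventions exist.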

  11. Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification

    PubMed Central

    Borozan, Ivan; Watt, Stuart; Ferretti, Vincent

    2015-01-01

    Motivation: Alignment-based sequence similarity searches, while accurate for some types of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim of improving the accuracy with which DNA and protein sequences are characterized. Results: Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. Availability and implementation: All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. Contact: ivan.borozan@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25573913
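
    The idea of combining several similarity measures into one weighted score can be sketched as follows. This is not the authors' classifier: the k-mer cosine measure, the fixed weights, and the names kmer_counts, cosine_similarity and combined_score are assumptions chosen for illustration, whereas the record describes weights that are determined adaptively per sequence.

```python
from collections import Counter
import math

def kmer_counts(seq, k=3):
    """Count overlapping k-mers; a simple alignment-free representation."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def cosine_similarity(a, b):
    """Cosine similarity between two k-mer count vectors."""
    dot = sum(a[kmer] * b[kmer] for kmer in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def combined_score(scores, weights):
    """Weighted average of named similarity scores (weights are assumed given)."""
    total = sum(weights.values())
    return sum(weights[name] * scores[name] for name in scores) / total

# Toy sequences and a placeholder alignment score, invented for this example.
query, reference = "ACGTACGTGA", "ACGTACGTGA"
scores = {
    "alignment": 0.8,  # hypothetical value standing in for an alignment-based measure
    "kmer": cosine_similarity(kmer_counts(query), kmer_counts(reference)),
}
weights = {"alignment": 3.0, "kmer": 1.0}
print(combined_score(scores, weights))
```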

  12. The Thiamin Pyrophosphate-Motif

    NASA Technical Reports Server (NTRS)

    Dominiak, Paulina M.; Ciszak, Ewa M.

    2003-01-01

    Using databases the authors have identified a common thiamin pyrophosphate (TPP)-motif in the family of functionally diverse TPP-dependent enzymes. This common motif consists of multimeric organization of subunits, two catalytic centers, common amino acid sequence, and specific contacts to provide a flip-flop, or alternate site, mechanism of action. Each catalytic center [PP:PYR] is formed at the interface of the PP-domain binding the magnesium ion, pyrophosphate and aminopyrimidine ring of TPP, and the PYR-domain binding the aminopyrimidine ring of that cofactor. A pair of these catalytic centers constitutes the catalytic core [PP:PYR]* within these enzymes. Analysis of the structural elements of this catalytic core reveals a novel definition of the common amino acid sequences, which are GX@&(G)@XXGQ, and GDGX25-30 within the PP-domain, and the E&(G)@XXG@ within the PYR-domain, where Q, corresponds to a hydrophobic amino acid. This TPP-motif provides a novel tool for annotation of TPP-dependent enzymes useful in advancing functional proteomics.

  13. NG6: Integrated next generation sequencing storage and processing environment

    PubMed Central

    2012-01-01

    Background Next generation sequencing platforms are now well established in sequencing centres and some laboratories. Upcoming smaller scale machines such as the 454 junior from Roche or the MiSeq from Illumina will increase the number of laboratories hosting a sequencer. In such a context, it is important to provide these teams with an easily manageable environment to store and process the produced reads. Results We describe a user-friendly information system able to manage large sets of sequencing data. It includes, on one hand, a workflow environment already containing pipelines adapted to different input formats (sff, fasta, fastq and qseq), different sequencers (Roche 454, Illumina HiSeq) and various analyses (quality control, assembly, alignment, diversity studies,…) and, on the other hand, a secured web site giving access to the results. The connected user will be able to download raw and processed data and browse through the analysis result statistics. The provided workflows can easily be modified or extended and new ones can be added. Ergatis is used as a workflow building, running and monitoring system. The analyses can be run locally or in a cluster environment using Sun Grid Engine. Conclusions NG6 is a complete information system designed to answer the needs of a sequencing platform. It provides a user-friendly interface to process, store and download high-throughput sequencing data. PMID:22958229

  14. Expression and characterization of EF-hand I loop mutants of aequorin replaced with other loop sequences of Ca2+-binding proteins: an approach to studying the EF-hand motif of proteins.

    PubMed

    Inouye, Satoshi; Sahara-Miura, Yuiko

    2016-07-01

    The binding properties of Ca(2+) to EF-hand I of aequorin (AQ) were characterized by replacing the loop sequence of EF-hand I (AQ[I]) with other known loop sequences of Ca(2+)-binding proteins, including photoproteins (aequorin, clytin-I, clytin-II and mitrocomin), Renilla luciferin-binding protein (RLBP) and calmodulin (CaM). For evaluation of the binding affinity of Ca(2+) to AQ[I] mutants, the half-decay time of the maximum intensity in the luminescence reaction triggered by Ca(2+) was used as an indicator and 22 kinds of AQ[I] mutants were expressed in Escherichia coli cells. AQ[I] mutants replaced with the EF-hand I and EF-hand III from photoproteins showed sufficient luminescence activity, but it was not shown by other EF-hands from RLBP and CaM. An AQ[I] mutant with a lysine or arginine residue at the second position of the non-conserved amino acid residue showed a slow-decay pattern of luminescence, indicating that the Ca(2+)-binding affinity to aequorin was reduced by a positive charge at the second position of the loop sequence. The specific loop sequence of the EF-hand I motif in aequorin caused the specific Ca(2+)-triggered luminescence pattern. PMID:26896488

  15. Space-related pharma-motifs for fast search of protein binding motifs and polypharmacological targets

    PubMed Central

    2012-01-01

    Background Discovering a compound that inhibits multiple proteins (i.e. polypharmacological targets) is a new paradigm for treating complex diseases (e.g. cancers and diabetes). In general, polypharmacological proteins often share similar local binding environments and motifs. With the exponential growth in the number of protein structures, finding similar structural binding motifs (pharma-motifs) is an urgent task for drug discovery (e.g. side effects and new uses for old drugs) and for understanding protein functions. Results We have developed a Space-Related Pharmamotifs method (called SRPmotif) to recognize binding motifs by searching against a protein structure database. SRPmotif is able to recognize conserved binding environments containing spatially discontinuous pharma-motifs, which are often short conserved peptides with specific physico-chemical properties for protein functions. Among 356 pharma-motifs, 56.5% of interacting residues are highly conserved. Experimental results indicate that 81.1% and 92.7% of the polypharmacological targets of each protein-ligand complex are annotated with the same biological process (BP) and molecular function (MF) terms, respectively, based on Gene Ontology (GO). Our experimental results show that the identified pharma-motifs often consist of key residues in functional (active) sites and play key roles in protein functions. The SRPmotif is available at http://gemdock.life.nctu.edu.tw/SRP/. Conclusions SRPmotif is able to identify similar pharma-interfaces and pharma-motifs sharing similar binding environments for polypharmacological targets by rapidly searching against the protein structure database. Pharma-motifs describe the conservation of binding environments for drug discovery and protein functions. Additionally, these pharma-motifs provide clues for discovering new sequence-based motifs to predict protein functions from protein sequence databases. We believe that SRPmotif is useful for elucidating protein functions and drug discovery

  16. A systematic evaluation of sorting motifs in the sodium-iodide symporter (NIS).

    PubMed

    Darrouzet, Elisabeth; Graslin, Fanny; Marcellin, Didier; Tcheremisinova, Iulia; Marchetti, Charles; Salleron, Lisa; Pognonec, Philippe; Pourcher, Thierry

    2016-04-01

    The sodium-iodide symporter (NIS) is an integral membrane protein that plays a crucial role in iodide accumulation, especially in the thyroid. As for many other membrane proteins, its intracellular sorting and distribution have a tremendous effect on its function, and constitute an important aspect of its regulation. Many short sequences have been shown to contribute to protein trafficking along the sorting or endocytic pathways. Using bioinformatics tools, we identified such potential sites on human NIS [tyrosine-based motifs, SH2-(Src homology 2), SH3- and PDZ (post-synaptic density-95/discs large tumour suppressor/zonula occludens-1)-binding motifs, and diacidic, dibasic and dileucine motifs] and analysed their roles using mutagenesis. We found that several of these sites play a role in protein stability and/or targeting to the membrane. Aside from the mutation at position 178 (SH2 plus tyrosine-based motif) that affects iodide uptake, the most drastic effect is associated with the mutation of an internal PDZ-binding motif at position 121 that completely abolishes NIS expression at the plasma membrane. Mutating the sites located on the C-terminal domain of the protein has no effect except for the creation of a diacidic motif that decreases the total NIS protein level without affecting its expression at the plasma membrane. PMID:26831514

  17. Integrating sequence, evolution and functional genomics in regulatory genomics

    PubMed Central

    Vingron, Martin; Brazma, Alvis; Coulson, Richard; van Helden, Jacques; Manke, Thomas; Palin, Kimmo; Sand, Olivier; Ukkonen, Esko

    2009-01-01

    With genome analysis expanding from the study of genes to the study of gene regulation, 'regulatory genomics' utilizes sequence information, evolution and functional genomics measurements to unravel how regulatory information is encoded in the genome. PMID:19226437

  18. Selection against spurious promoter motifs correlates with translational efficiency across bacteria.

    PubMed

    Froula, Jeffrey L; Francino, M Pilar

    2007-01-01

    Because binding of RNAP to misplaced sites could compromise the efficiency of transcription, natural selection for the optimization of gene expression should regulate the distribution of DNA motifs capable of RNAP-binding across the genome. Here we analyze the distribution of the -10 promoter motifs that bind the sigma(70) subunit of RNAP in 42 bacterial genomes. We show that selection on these motifs operates across the genome, maintaining an over-representation of -10 motifs in regulatory sequences while eliminating them from the nonfunctional and, in most cases, from the protein coding regions. In some genomes, however, -10 sites are over-represented in the coding sequences; these sites could induce pauses effecting regulatory roles throughout the length of a transcriptional unit. For nonfunctional sequences, the extent of motif under-representation varies across genomes in a manner that broadly correlates with the number of tRNA genes, a good indicator of translational speed and growth rate. This suggests that minimizing the time invested in gene transcription is an important selective pressure against spurious binding. However, selection against spurious binding is detectable in the reduced genomes of host-restricted bacteria that grow at slow rates, indicating that components of efficiency other than speed may also be important. Minimizing the number of RNAP molecules per cell required for transcription, and the corresponding energetic expense, may be most relevant in slow growers. These results indicate that genome-level properties affecting the efficiency of transcription and translation can respond in an integrated manner to optimize gene expression. The detection of selection against promoter motifs in nonfunctional regions also confirms previous results indicating that no sequence may evolve free of selective constraints, at least in the relatively small and unstructured genomes of bacteria. PMID:17710145

  19. Characterization of DNA sequences that mediate nuclear protein binding to the regulatory region of the Pisum sativum (pea) chlorophyl a/b binding protein gene AB80: identification of a repeated heptamer motif.

    PubMed

    Argüello, G; García-Hernández, E; Sánchez, M; Gariglio, P; Herrera-Estrella, L; Simpson, J

    1992-05-01

    Two protein factors binding to the regulatory region of the pea chlorophyll a/b binding protein gene AB80 have been identified. One of these factors is found only in green tissue but not in etiolated or root tissue. The second factor (denominated ABF-2) binds to a DNA sequence element that contains a direct heptamer repeat TCTCAAA. It was found that the presence of both repeats is essential for binding. ABF-2 is present in both green and etiolated tissue and in roots, and factors analogous to ABF-2 are present in several plant species. Computer analysis showed that the TCTCAAA motif is present in the regulatory region of several plant genes. PMID:1303797
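
    A computer search for a direct repeat of the TCTCAAA heptamer, of the kind the record above alludes to, might look like the following sketch. The example region is invented, and the max_gap parameter and the function names are assumptions; the record does not state how far apart the two repeats are.

```python
def find_all(sequence, motif):
    """Return the start positions of every (possibly overlapping) motif match."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions

def direct_repeat_pairs(sequence, motif, max_gap):
    """Return pairs of non-overlapping same-strand copies separated by at most max_gap bases."""
    hits = find_all(sequence, motif)
    pairs = []
    for i, first in enumerate(hits):
        for second in hits[i + 1:]:
            gap = second - (first + len(motif))
            if 0 <= gap <= max_gap:
                pairs.append((first, second))
    return pairs

# Made-up regulatory region containing two copies of the heptamer.
region = "GGATCTCAAACGTTCTCAAAGGC"
print(find_all(region, "TCTCAAA"))
print(direct_repeat_pairs(region, "TCTCAAA", max_gap=10))
```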

  20. Fast approximate motif statistics.

    PubMed

    Nicodème, P

    2001-01-01

    We present in this article a fast approximate method for computing the statistics of a number of non-self-overlapping matches of motifs in a random text in the nonuniform Bernoulli model. This method is well suited for protein motifs where the probability of self-overlap of motifs is small. For 96% of the PROSITE motifs, the expectations of occurrences of the motifs in a 7-million-amino-acids random database are computed by the approximate method with less than 1% error when compared with the exact method. Processing of the whole PROSITE takes about 30 seconds with the approximate method. We apply this new method to a comparison of the C. elegans and S. cerevisiae proteomes. PMID:11535175
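
    For orientation, the expected number of matches of a fixed-length pattern in a random text under a nonuniform Bernoulli model follows from linearity of expectation. The sketch below computes only that first-order quantity for a pattern given as one set of allowed letters per position; it is not the approximate method of the article, and the toy alphabet, letter frequencies and function name are assumptions.

```python
def expected_occurrences(pattern, freqs, text_length):
    """Expected count of pattern matches in an i.i.d. random text.

    pattern: list of sets, each holding the letters allowed at that position.
    freqs: letter -> probability under the (assumed) Bernoulli model.
    """
    match_probability = 1.0
    for allowed in pattern:
        match_probability *= sum(freqs[letter] for letter in allowed)
    start_positions = max(text_length - len(pattern) + 1, 0)
    return start_positions * match_probability

# Toy four-letter alphabet with nonuniform frequencies (illustrative only).
freqs = {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125}
pattern = [{"A"}, {"C", "G"}, {"A"}]  # matches ACA or AGA
print(expected_occurrences(pattern, freqs, text_length=1002))
```

    Because this counts every start position, it includes overlapping matches; the record above concerns non-self-overlapping matches, for which the two notions coincide only when the pattern cannot overlap itself.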

  1. MannDB – A microbial database of automated protein sequence analyses and evidence integration for protein characterization

    PubMed Central

    Zhou, Carol L Ecale; Lam, Marisa W; Smith, Jason R; Zemla, Adam T; Dyer, Matthew D; Kuczmarski, Thomas A; Vitalis, Elizabeth A; Slezak, Thomas R

    2006-01-01

    Background MannDB was created to meet a need for rapid, comprehensive automated protein sequence analyses to support selection of proteins suitable as targets for driving the development of reagents for pathogen or protein toxin detection. Because a large number of open-source tools were needed, it was necessary to produce a software system to scale the computations for whole-proteome analysis. Thus, we built a fully automated system for executing software tools and for storage, integration, and display of automated protein sequence analysis and annotation data. Description MannDB is a relational database that organizes data resulting from fully automated, high-throughput protein-sequence analyses using open-source tools. Types of analyses provided include predictions of cleavage, chemical properties, classification, features, functional assignment, post-translational modifications, motifs, antigenicity, and secondary structure. Proteomes (lists of hypothetical and known proteins) are downloaded and parsed from Genbank and then inserted into MannDB, and annotations from SwissProt are downloaded when identifiers are found in the Genbank entry or when identical sequences are identified. Currently 36 open-source tools are run against MannDB protein sequences either on local systems or by means of batch submission to external servers. In addition, BLAST against protein entries in MvirDB, our database of microbial virulence factors, is performed. A web client browser enables viewing of computational results and downloaded annotations, and a query tool enables structured and free-text search capabilities. When available, links to external databases, including MvirDB, are provided. MannDB contains whole-proteome analyses for at least one representative organism from each category of biological threat organism listed by APHIS, CDC, HHS, NIAID, USDA, USFDA, and WHO. Conclusion MannDB comprises a large number of genomes and comprehensive protein sequence analyses

  2. Control of Integrated Task Sequences Shapes Components of Reaching.

    PubMed

    Viswanathan, Priya; Whitall, Jill; Kagerer, Florian A

    2016-01-01

    Reaching toward an object usually consists of a sequence of elemental actions. Using a reaching task sequence, the authors investigated how task elements of that sequence affected feedforward and feedback components of the reaching phase of the movement. Nine right-handed adults performed, with their dominant and nondominant hands, 4 tasks of different complexities: a simple reaching task; a reach-to-grasp task; a reach-to-grasp and lift object task; and a reach-to-grasp, lift, and place object task. Results showed that in the reach-to-grasp and lift object task more time was allocated to the feedforward component of the reach phase, while latency between the task elements decreased. We also found between-hand differences, supporting previous findings of increased efficiency of processing planning-related information in the preferred hand. The presence of task-related modifications supports the concept of contextual effects when planning a movement. PMID:27254601

  3. The highly conserved amino acid sequence motif Tyr-Gly-Asp-Thr-Asp-Ser in alpha-like DNA polymerases is required by phage phi 29 DNA polymerase for protein-primed initiation and polymerization.

    PubMed Central

    Bernad, A; Lázaro, J M; Salas, M; Blanco, L

    1990-01-01

    The alpha-like DNA polymerases from bacteriophage phi 29 and other viruses, prokaryotes and eukaryotes contain an amino acid consensus sequence that has been proposed to form part of the dNTP binding site. We have used site-directed mutants to study five of the six highly conserved consecutive amino acids corresponding to the most conserved C-terminal segment (Tyr-Gly-Asp-Thr-Asp-Ser). Our results indicate that in phi 29 DNA polymerase this consensus sequence, although irrelevant for the 3'→5' exonuclease activity, is essential for initiation and elongation. Based on these results and on its homology with known or putative metal-binding amino acid sequences, we propose that in phi 29 DNA polymerase the Tyr-Gly-Asp-Thr-Asp-Ser consensus motif is part of the dNTP binding site, involved in the synthetic activities of the polymerase (i.e., initiation and polymerization), and that it is involved particularly in the metal binding associated with the dNTP site. PMID:2191296
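
    Locating the Tyr-Gly-Asp-Thr-Asp-Ser consensus (YGDTDS in one-letter code) in a protein sequence can be sketched as a sliding-window scan that tolerates a small number of mismatches. The toy protein string and the max_mismatches threshold are assumptions for illustration; they are not data from the study.

```python
CONSENSUS = "YGDTDS"  # Tyr-Gly-Asp-Thr-Asp-Ser in one-letter code

def scan(protein, consensus=CONSENSUS, max_mismatches=1):
    """Return (start, window, mismatches) for windows close to the consensus."""
    hits = []
    width = len(consensus)
    for i in range(len(protein) - width + 1):
        window = protein[i:i + width]
        mismatches = sum(a != b for a, b in zip(window, consensus))
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits

# Invented sequence with one exact copy and one single-substitution variant.
toy_protein = "MKLAYGDTDSIFVAAYGDTESQ"
for start, window, mismatches in scan(toy_protein):
    print(start, window, mismatches)
```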

  4. Common sequence motifs coding for higher-plant and prokaryotic O-acetylserine (thiol)-lyases: bacterial origin of a chloroplast transit peptide?

    PubMed

    Rolland, N; Job, D; Douce, R

    1993-08-01

    A comparison of the amino acid sequence of O-acetylserine (thiol)-lyase (EC 4.2.99.8) from Escherichia coli and the isoforms of this enzyme found in the cytosolic and chloroplastic compartments of spinach (Spinacia oleracea) leaf cells allows the essential lysine residue involved in the binding of the pyridoxal 5'-phosphate cofactor to be identified. The results of further sequence comparison of cDNAs coding for these proteins are discussed in the frame of the endosymbiotic theory of chloroplast evolution. The results are compatible with a mechanism in which the chloroplast enzyme originated from the cytosolic enzyme and both plant genes originated from a common prokaryotic ancestor. The comparison also suggests that the 5'-non-coding sequence of the bacterial gene was transferred to the plant cell nucleus and that it has been used to create the N-terminal portions of both plant enzymes, and possibly the transit peptide of the chloroplast enzyme. PMID:7916619

  5. Integrating Information Literacy with a Sequenced English Composition Curriculum

    ERIC Educational Resources Information Center

    Holliday, Wendy; Fagerheim, Britt

    2006-01-01

    This article details the process of implementing a sequenced information literacy program for two core English composition courses at Utah State University. An extensive needs assessment guided the project, leading to a curriculum design process with the goal of building a foundation for deeper critical thinking skills. The curriculum development…

  6. Comparative analysis of the full genome sequence of European bat lyssavirus type 1 and type 2 with other lyssaviruses and evidence for a conserved transcription termination and polyadenylation motif in the G-L 3' non-translated region.

    PubMed

    Marston, D A; McElhinney, L M; Johnson, N; Müller, T; Conzelmann, K K; Tordo, N; Fooks, A R

    2007-04-01

    We report the first full-length genomic sequences for European bat lyssavirus type-1 (EBLV-1) and type-2 (EBLV-2). The EBLV-1 genomic sequence was derived from a virus isolated from a serotine bat in Hamburg, Germany, in 1968 and the EBLV-2 sequence was derived from a virus isolate from a human case of rabies that occurred in Scotland in 2002. A long-distance PCR strategy was used to amplify the open reading frames (ORFs), followed by standard and modified RACE (rapid amplification of cDNA ends) techniques to amplify the 3' and 5' ends. The lengths of each complete viral genome for EBLV-1 and EBLV-2 were 11 966 and 11 930 base pairs, respectively, and follow the standard rhabdovirus genome organization of five viral proteins. Comparison with other lyssavirus sequences demonstrates variation in degrees of homology, with the genomic termini showing a high degree of complementarity. The nucleoprotein was the most conserved, both intra- and intergenotypically, followed by the polymerase (L), matrix and glycoproteins, with the phosphoprotein being the most variable. In addition, we have shown that the two EBLVs utilize a conserved transcription termination and polyadenylation (TTP) motif, approximately 50 nt upstream of the L gene start codon. All available lyssavirus sequences to date, with the exception of Pasteur virus (PV) and PV-derived isolates, use the second TTP site. This observation may explain differences in pathogenicity between lyssavirus strains, dependent on the length of the untranslated region, which might affect transcriptional activity and RNA stability. PMID:17374776

  7. Cloning, Expression, and Sequencing of a Cell Surface Antigen Containing a Leucine-Rich Repeat Motif from Bacteroides forsythus ATCC 43037

    PubMed Central

    Sharma, Ashu; Sojar, Hakimuddin T.; Glurich, Ingrid; Honma, Kiyonobu; Kuramitsu, Howard K.; Genco, Robert J.

    1998-01-01

    Bacteroides forsythus is a recently recognized human periodontopathogen associated with advanced, as well as recurrent, periodontitis. However, very little is known about the mechanism of pathogenesis of this organism. The present study was undertaken to identify the surface molecules of this bacterium that may play roles in its adherence to oral tissues or triggering of a host immune response(s). The gene (bspA) encoding a cell surface-associated protein of B. forsythus with an apparent molecular mass of 98 kDa was isolated by immunoscreening of a B. forsythus gene library constructed in a lambda ZAP II vector. The encoded 98-kDa protein (BspA) contains 14 complete repeats of 23 amino acid residues that show partial homology to leucine-rich repeat motifs. A recombinant protein containing the repeat region was expressed in Escherichia coli, purified, and utilized for antibody production, as well as in vitro binding studies. The purified recombinant protein bound strongly to fibronectin and fibrinogen in a dose-dependent manner and further inhibited the binding of B. forsythus cells to these extracellular matrix (ECM) components. In addition, adult patients with B. forsythus-associated periodontitis expressed specific antibodies against the BspA protein. We report here the cloning and expression of an immunogenic cell surface-associated protein (BspA) of B. forsythus and speculate that it mediates the binding of bacteria to ECM components and clotting factors (fibronectin and fibrinogen, respectively), which may be important in the colonization of the oral cavity by this bacterium and is also a target for the host immune response. PMID:9826345

  8. Envelope formation is blocked by mutation of a sequence related to the HKD phospholipid metabolism motif in the vaccinia virus F13L protein.

    PubMed

    Roper, R L; Moss, B

    1999-02-01

    The outer envelope of the extracellular form of vaccinia virus is derived from Golgi membranes that have been modified by the insertion of specific viral proteins, of which the major component is the 37-kDa, palmitylated, nonglycosylated product of the F13L gene. The F13L protein contains a variant of the HKD (His-Lys-Asp) motif, which is conserved in numerous enzymes of phospholipid metabolism. Vaccinia virus mutants with a conservative substitution of either the K (K314R) or the D (D319E) residue of the F13L protein formed only tiny plaques similar to those produced by an F13L deletion mutant, were unable to produce extracellular enveloped virions, and failed to mediate low-pH-induced fusion of infected cells. Membrane-wrapped forms of intracellular virus were rarely detected in electron microscopic images of cells infected with either of the mutants. Western blotting and pulse-chase experiments demonstrated that the D319E protein was less stable than either the K314R or wild-type F13L protein. Most striking, however, was the failure of either of the two mutated proteins to concentrate in the Golgi compartment. Palmitylation, oleation, and partitioning of the F13L protein in Triton X-114 detergent were unaffected by the K314R substitution. These results indicated that the F13L protein must retain the K314 and D319 for it to localize in the Golgi compartment and function in membrane envelopment of vaccinia virus. PMID:9882312

  9. Evolutionary Analysis and Classification of OATs, OCTs, OCTNs, and Other SLC22 Transporters: Structure-Function Implications and Analysis of Sequence Motifs

    PubMed Central

    Date, Rishabh C.; Bush, Kevin T.; Springer, Stevan A.; Saier, Milton H.; Wu, Wei; Nigam, Sanjay K.

    2015-01-01

    The SLC22 family includes organic anion transporters (OATs), organic cation transporters (OCTs) and organic carnitine and zwitterion transporters (OCTNs). These are often referred to as drug transporters even though they interact with many endogenous metabolites and signaling molecules (Nigam, S.K., Nature Reviews Drug Discovery, 14:29–44, 2015). Phylogenetic analysis of SLC22 supports the view that these transporters may have evolved over 450 million years ago. Many OAT members were found to appear after a major expansion of the SLC22 family in mammals, suggesting a physiological and/or toxicological role during the mammalian radiation. Putative SLC22 orthologs exist in worms, sea urchins, flies, and ciona. At least six groups of SLC22 exist. OATs and OCTs form two major clades of SLC22, within which (apart from Oat and Oct subclades), there are also clear Oat-like, Octn, and Oct-related subclades, as well as a distantly related group we term “Oat-related” (which may have different functions). Based on available data, it is arguable whether SLC22A18, which is related to bacterial drug-proton antiporters, should be assigned to SLC22. Disease-causing mutations, single nucleotide polymorphisms (SNPs) and other functionally analyzed mutations in OAT1, OAT3, URAT1, OCT1, OCT2, OCTN1, and OCTN2 map to the first extracellular domain, the large central intracellular domain, and transmembrane domains 9 and 10. These regions are highly conserved within subclades, but not between subclades, and may be necessary for SLC22 transporter function and functional diversification. Our results not only link function to evolutionarily conserved motifs but indicate the need for a revised sub-classification of SLC22. PMID:26536134

  10. Integration of new alternative reference strain genome sequences into the Saccharomyces genome database.

    PubMed

    Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C; Dalusag, Kyla; Demeter, Janos; Engel, Stacia; Hellerstedt, Sage T; Karra, Kalpana; Hitz, Benjamin C; Nash, Robert S; Paskov, Kelley; Sheppard, Travis; Skrzypek, Marek; Weng, Shuai; Wong, Edith; Michael Cherry, J

    2016-01-01

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. To provide a wider scope of genetic and phenotypic variation in yeast, the genome sequences and their corresponding annotations from 11 alternative S. cerevisiae reference strains have been integrated into SGD. Genomic and protein sequence information for genes from these strains is now available on the Sequence and Protein tab of the corresponding Locus Summary pages. We illustrate how these genome sequences can be utilized to aid our understanding of strain-specific functional and phenotypic differences. Database URL: www.yeastgenome.org. PMID:27252399

  11. Integration of new alternative reference strain genome sequences into the Saccharomyces genome database

    PubMed Central

    Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C.; Dalusag, Kyla; Demeter, Janos; Engel, Stacia; Hellerstedt, Sage T.; Karra, Kalpana; Hitz, Benjamin C.; Nash, Robert S.; Paskov, Kelley; Sheppard, Travis; Skrzypek, Marek; Weng, Shuai; Wong, Edith; Michael Cherry, J.

    2016-01-01

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. To provide a wider scope of genetic and phenotypic variation in yeast, the genome sequences and their corresponding annotations from 11 alternative S. cerevisiae reference strains have been integrated into SGD. Genomic and protein sequence information for genes from these strains is now available on the Sequence and Protein tab of the corresponding Locus Summary pages. We illustrate how these genome sequences can be utilized to aid our understanding of strain-specific functional and phenotypic differences. Database URL: www.yeastgenome.org PMID:27252399

  12. The Thiamin Pyrophosphate-Motif

    NASA Technical Reports Server (NTRS)

    Dominiak, P.; Ciszak, E.

    2003-01-01

    Using databases the authors have identified a common thiamin pyrophosphate (TPP)-motif in the family of functionally diverse TPP-dependent enzymes. This common motif consists of multimeric organization of subunits and two catalytic centers. Each catalytic center (PP:PYR) is formed at the interface of the PP-domain binding the magnesium ion, pyrophosphate and aminopyrimidine ring of TPP, and the PYR-domain binding the aminopyrimidine ring of that cofactor. A pair of these catalytic centers constitutes the catalytic core (PP:PYR)(sub 2) within these enzymes. Analysis of the structural elements of this catalytic core reveals a novel definition of the common amino acid sequences, which are GXPhiX(sub 4)(G)PhiXXGQ and GDGX(sub 25-30)NN in the PP-domain, and the EX(sub 4)(G)PhiXXGPhi in the PYR-domain, where Phi corresponds to a hydrophobic amino acid. This TPP-motif provides a novel tool for annotation of TPP-dependent enzymes useful in advancing functional proteomics.
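
    One of the patterns in this record, GDGX(sub 25-30)NN, reads naturally as a regular expression with a bounded gap. The sketch below assumes that X means any residue and that the subscript range gives the allowed gap length. The helper name `has_gdg_nn` and the synthetic test strings are hypothetical and are not data from the report. The patterns containing Phi are left out because the record does not list which residues count as hydrophobic.

```python
import re

# Assumed reading of GDGX(sub 25-30)NN:
# "GDG", then 25 to 30 residues of any kind, then "NN".
GDG_NN = re.compile(r"GDG.{25,30}NN")


def has_gdg_nn(sequence: str) -> bool:
    """Return True if the sequence contains the GDG...NN pattern."""
    return GDG_NN.search(sequence) is not None


# Synthetic strings built only to exercise the gap bounds (not real proteins).
inside_range = "M" + "GDG" + "A" * 27 + "NN" + "K"
gap_too_short = "GDG" + "A" * 10 + "NN"

print(has_gdg_nn(inside_range))
print(has_gdg_nn(gap_too_short))
```

    A match of this kind only flags a candidate; the record describes the motif in terms of structural elements, which a sequence pattern alone does not capture.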

  13. Comparison of SIV and HIV-1 genomic RNA structures reveals impact of sequence evolution on conserved and non-conserved structural motifs.

    PubMed

    Pollom, Elizabeth; Dang, Kristen K; Potter, E Lake; Gorelick, Robert J; Burch, Christina L; Weeks, Kevin M; Swanstrom, Ronald

    2013-01-01

    RNA secondary structure plays a central role in the replication and metabolism of all RNA viruses, including retroviruses like HIV-1. However, structures with known function represent only a fraction of the secondary structure reported for HIV-1(NL4-3). One tool to assess the importance of RNA structures is to examine their conservation over evolutionary time. To this end, we used SHAPE to model the secondary structure of a second primate lentiviral genome, SIVmac239, which shares only 50% sequence identity at the nucleotide level with HIV-1(NL4-3). Only about half of the paired nucleotides are paired in both genomic RNAs and, across the genome, just 71 base pairs form with the same pairing partner in both genomes. On average the RNA secondary structure is thus evolving at a much faster rate than the sequence. Structure at the Gag-Pro-Pol frameshift site is maintained but in a significantly altered form, while the impact of selection for maintaining a protein binding interaction can be seen in the conservation of pairing partners in the small RRE stems where Rev binds. Structures that are conserved between SIVmac239 and HIV-1(NL4-3) also occur at the 5' polyadenylation sequence, in the plus strand primer sites, PPT and cPPT, and in the stem-loop structure that includes the first splice acceptor site. The two genomes are adenosine-rich and cytidine-poor. The structured regions are enriched in guanosines, while unpaired regions are enriched in adenosines, and functionally important structures have stronger base pairing than nonconserved structures. We conclude that much of the secondary structure is the result of fortuitous pairing in a metastable state that reforms during sequence evolution. However, secondary structure elements with important function are stabilized by higher guanosine content that allows regions of structure to persist as sequence evolution proceeds, and, within the confines of selective pressure, allows structures to evolve. PMID:23593004

  14. The Thiamine-Pyrophosphate-Motif

    NASA Technical Reports Server (NTRS)

    Ciszak, Ewa; Dominiak, Paulina

    2004-01-01

    Thiamin pyrophosphate (TPP), a derivative of vitamin B1, is a cofactor for enzymes performing catalysis in pathways of energy production including the well known decarboxylation of α-keto acid dehydrogenases followed by transketolation. TPP-dependent enzymes constitute a structurally and functionally diverse group exhibiting multimeric subunit organization, multiple domains and two chemically equivalent catalytic centers. Annotation of functional TPP-dependent enzymes, therefore, has not been trivial due to low sequence similarity related to this complex organization. Our approach to analysis of structures of known TPP-dependent enzymes reveals for the first time features common to this group, which we have termed the TPP-motif. The TPP-motif consists of specific spatial arrangements of structural elements and their specific contacts to provide for a flip-flop, or alternate site, enzymatic mechanism of action. Analysis of structural elements entrained in the flip-flop action displayed by TPP-dependent enzymes reveals a novel definition of the common amino acid sequences. These sequences allow for annotation of TPP-dependent enzymes, thus advancing functional proteomics. Further details of three-dimensional structures of TPP-dependent enzymes will be discussed.

  15. Process sequence optimization for digital microfluidic integration using EWOD technique

    NASA Astrophysics Data System (ADS)

    Yadav, Supriya; Joyce, Robin; Sharma, Akash Kumar; Sharma, Himani; Sharma, Niti Nipun; Varghese, Soney; Akhtar, Jamil

    2016-04-01

    Micro/nano-fluidic MEMS biosensors are devices that detect biomolecules. The emerging micro/nano-fluidic devices provide high throughput and high repeatability with very low response time and reduced device cost as compared to traditional devices. This article presents the experimental details for process sequence optimization of digital microfluidics (DMF) using "electrowetting-on-dielectric" (EWOD). Stress-free thick-film deposition of silicon dioxide using PECVD and the subsequent process for EWOD techniques have been optimized in this work.

  16. Integrated visual analysis of protein structures, sequences, and feature data

    PubMed Central

    2015-01-01

    Background To understand the molecular mechanisms that give rise to a protein's function, biologists often need to (i) find and access all related atomic-resolution 3D structures, and (ii) map sequence-based features (e.g., domains, single-nucleotide polymorphisms, post-translational modifications) onto these structures. Results To streamline these processes we recently developed Aquaria, a resource offering unprecedented access to protein structure information based on an all-against-all comparison of SwissProt and PDB sequences. In this work, we provide a requirements analysis for several frequently occurring tasks in molecular biology and describe how design choices in Aquaria meet these requirements. Finally, we show how the interface can be used to explore features of a protein and gain biologically meaningful insights in two case studies conducted by domain experts. Conclusions The user interface design of Aquaria enables biologists to gain unprecedented access to molecular structures and simplifies the generation of insight. The tasks involved in mapping sequence features onto structures can be conducted more easily and quickly using Aquaria. PMID:26329268

  17. Overlapping CRE and E Box Motifs in the Enhancer Sequences of the Bovine Leukemia Virus 5′ Long Terminal Repeat Are Critical for Basal and Acetylation-Dependent Transcriptional Activity of the Viral Promoter: Implications for Viral Latency

    PubMed Central

    Calomme, Claire; Dekoninck, Ann; Nizet, Séverine; Adam, Emmanuelle; Nguyên, Thi Liên-Anh; Van Den Broeke, Anne; Willems, Luc; Kettmann, Richard; Burny, Arsène; Lint, Carine Van

    2004-01-01

    Bovine leukemia virus (BLV) infection is characterized by viral latency in a large proportion of cells containing an integrated provirus. In this study, we postulated that mechanisms directing the recruitment of deacetylases to the BLV 5′ long terminal repeat (LTR) could explain the transcriptional repression of viral expression in vivo. Accordingly, we showed that BLV promoter activity was induced by several deacetylase inhibitors (such as trichostatin A [TSA]) in the context of episomal LTR constructs and in the context of an integrated BLV provirus. Moreover, treatment of BLV-infected cells with TSA increased H4 acetylation at the viral promoter, showing a close correlation between the level of histone acetylation and transcriptional activation of the BLV LTR. Among the known cis-regulatory DNA elements located in the 5′ LTR, three E box motifs overlapping cyclic AMP responsive elements (CREs) in U3 were shown to be involved in transcriptional repression of BLV basal gene expression. Importantly, the combined mutations of these three E box motifs markedly reduced the inducibility of the BLV promoter by TSA. E boxes are susceptible to recognition by transcriptional repressors such as Max-Mad-mSin3 complexes that repress transcription by recruiting deacetylases. However, our in vitro binding studies failed to reveal the presence of Mad-Max proteins in the BLV LTR E box-specific complexes. Remarkably, TSA increased the occupancy of the CREs by CREB/ATF. Therefore, we postulated that the E box-specific complexes exerted their negative cooperative effect on BLV transcription by steric hindrance with the activators CREB/ATF and/or their transcriptional coactivators possessing acetyltransferase activities. Our results thus suggest that the overlapping CRE and E box elements in the BLV LTR were selected during evolution as a novel strategy for BLV to allow better silencing of viral transcription and to escape from the host immune response. PMID:15564493

  18. Sequence-Based Screening for Rare Enzymes: New Insights into the World of AMDases Reveal a Conserved Motif and 58 Novel Enzymes Clustering in Eight Distinct Families

    PubMed Central

    Maimanakos, Janine; Chow, Jennifer; Gaßmeyer, Sarah K.; Güllert, Simon; Busch, Florian; Kourist, Robert; Streit, Wolfgang R.

    2016-01-01

    Arylmalonate Decarboxylases (AMDases, EC 4.1.1.76) are very rare and mostly underexplored enzymes. Currently only four known and biochemically characterized representatives exist. However, their ability to decarboxylate α-disubstituted malonic acid derivatives to optically pure products without cofactors makes them attractive and promising candidates for the use as biocatalysts in industrial processes. Until now, AMDases could not be separated from other members of the aspartate/glutamate racemase superfamily based on their gene sequences. Within this work, a search algorithm was developed that enables a reliable prediction of AMDase activity for potential candidates. Based on specific sequence patterns and screening methods 58 novel AMDase candidate genes could be identified in this work. Thereby, AMDases with the conserved sequence pattern of Bordetella bronchiseptica’s prototype appeared to be limited to the classes of Alpha-, Beta-, and Gamma-proteobacteria. Amino acid homologies and comparison of gene surrounding sequences enabled the classification of eight enzyme clusters. Particularly striking is the accumulation of genes coding for different transporters of the tripartite tricarboxylate transporters family, TRAP transporters and ABC transporters as well as genes coding for mandelate racemases/muconate lactonizing enzymes that might be involved in substrate uptake or degradation of AMDase products. Further, three novel AMDases were characterized which showed a high enantiomeric excess (>99%) of the (R)-enantiomer of flurbiprofen. These are the recombinant AmdA and AmdV from Variovorax sp. strains HH01 and HH02, originated from soil, and AmdP from Polymorphum gilvum found by a data base search. Altogether our findings give new insights into the class of AMDases and reveal many previously unknown enzyme candidates with high potential for bioindustrial processes. PMID:27610105

  19. Sequence-Based Screening for Rare Enzymes: New Insights into the World of AMDases Reveal a Conserved Motif and 58 Novel Enzymes Clustering in Eight Distinct Families.

    PubMed

    Maimanakos, Janine; Chow, Jennifer; Gaßmeyer, Sarah K; Güllert, Simon; Busch, Florian; Kourist, Robert; Streit, Wolfgang R

    2016-01-01

    Arylmalonate Decarboxylases (AMDases, EC 4.1.1.76) are very rare and mostly underexplored enzymes. Currently only four known and biochemically characterized representatives exist. However, their ability to decarboxylate α-disubstituted malonic acid derivatives to optically pure products without cofactors makes them attractive and promising candidates for the use as biocatalysts in industrial processes. Until now, AMDases could not be separated from other members of the aspartate/glutamate racemase superfamily based on their gene sequences. Within this work, a search algorithm was developed that enables a reliable prediction of AMDase activity for potential candidates. Based on specific sequence patterns and screening methods 58 novel AMDase candidate genes could be identified in this work. Thereby, AMDases with the conserved sequence pattern of Bordetella bronchiseptica's prototype appeared to be limited to the classes of Alpha-, Beta-, and Gamma-proteobacteria. Amino acid homologies and comparison of gene surrounding sequences enabled the classification of eight enzyme clusters. Particularly striking is the accumulation of genes coding for different transporters of the tripartite tricarboxylate transporters family, TRAP transporters and ABC transporters as well as genes coding for mandelate racemases/muconate lactonizing enzymes that might be involved in substrate uptake or degradation of AMDase products. Further, three novel AMDases were characterized which showed a high enantiomeric excess (>99%) of the (R)-enantiomer of flurbiprofen. These are the recombinant AmdA and AmdV from Variovorax sp. strains HH01 and HH02, originated from soil, and AmdP from Polymorphum gilvum found by a data base search. Altogether our findings give new insights into the class of AMDases and reveal many previously unknown enzyme candidates with high potential for bioindustrial processes. PMID:27610105

  20. Self-association motifs in the enteroaggregative Escherichia coli heat-resistant agglutinin 1.

    PubMed

    Glaubman, Jessica; Hofmann, Jennifer; Bonney, Megan E; Park, Sumin; Thomas, Jessica M; Kokona, Bashkim; Ramos Falcón, Laura I; Chung, Yoonjie K; Fairman, Robert; Okeke, Iruka N

    2016-07-01

    The heat-resistant agglutinin 1 (Hra1) is an integral outer membrane protein found in strains of Escherichia coli that are exceptional colonizers. Hra1 from enteroaggregative E. coli strain 042 is sufficient to confer adherence to human epithelial cells and to cause bacterial autoaggregation. Hra1 is closely related to the Tia invasin, which also confers adherence, but not autoaggregation. Here, we have demonstrated that Hra1 mediates autoaggregation by self-association and we hypothesize that at least some surface-exposed amino acid sequences that are present in Hra1, but absent in Tia, represent autoaggregation motifs. We inserted FLAG tags along the length of Hra1 and used immune-dot blots to verify that four in silico-predicted outer loops were indeed surface exposed. In Hra1 we swapped nine candidate motifs in three of these loops, ranging from one to ten amino acids in length, to the corresponding sequences in Tia. Three of the motifs were required for Hra1-mediated autoaggregation. The database was searched for other surface proteins containing these motifs; the GGXWRDDXK motif was also present in a surface-exposed region of Rck, a Salmonella enterica serotype Typhimurium complement resistance protein. Cloning and site-specific mutagenesis demonstrated that Rck can confer weak, GGXWRDDXK-dependent autoaggregation by self-association. Hra1 and Rck appear to form heterologous associations and GGXWRDDXK is required on both molecules for Hra1-Rck association. However, a GGYWRDDLKE peptide was not sufficient to interfere with Hra1-mediated autoaggregation. In the present study, three autoaggregation motifs in an integral outer membrane protein have been identified and it was demonstrated that at least one of them works in the context of a different cell surface. PMID:27166217
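
    As a minimal sketch of how a search for the GGXWRDDXK motif might be expressed, the snippet below treats X as any single residue and scans a sequence with a regular expression. The function names `motif_to_regex` and `find_motif` and the X-as-wildcard convention are assumptions for illustration and do not describe the search actually used in the study. The only sequence used is the GGYWRDDLKE peptide named in the abstract.

```python
import re


def motif_to_regex(motif: str) -> "re.Pattern[str]":
    """Compile a motif in which 'X' stands for any one residue (assumed convention)."""
    return re.compile("".join("." if ch == "X" else re.escape(ch) for ch in motif))


def find_motif(sequence: str, motif: str = "GGXWRDDXK") -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping match."""
    pattern = motif_to_regex(motif)
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]


peptide = "GGYWRDDLKE"  # peptide named in the abstract
print(find_motif(peptide))  # [(0, 'GGYWRDDLK')]
```

    Scanning a full protein set would apply `find_motif` to each sequence in turn; whether a hit lies in a surface-exposed region still has to be established separately, as the abstract describes.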

  1. ATtRACT-a database of RNA-binding proteins and associated motifs.

    PubMed

    Giudice, Girolamo; Sánchez-Cabo, Fátima; Torroja, Carlos; Lara-Pezzi, Enrique

    2016-01-01

    RNA-binding proteins (RBPs) play a crucial role in key cellular processes, including RNA transport, splicing, polyadenylation and stability. Understanding the interaction between RBPs and RNA is key to improving our knowledge of RNA processing, localization and regulation in a global manner. Despite advances in recent years, a unified non-redundant resource that includes information on experimentally validated motifs, RBPs and integrated tools to exploit this information is lacking. Here, we developed a database named ATtRACT (available at http://attract.cnic.es) that compiles information on 370 RBPs and 1583 RBP consensus binding motifs, 192 of which are not present in any other database. To populate ATtRACT we (i) extracted and hand-curated experimentally validated data from CISBP-RNA, SpliceAid-F, RBPDB databases, (ii) integrated and updated the unavailable ASD database and (iii) extracted information from Protein-RNA complexes present in Protein Data Bank database through computational analyses. ATtRACT also provides efficient algorithms to search a specific motif and scan one or more RNA sequences at a time. It also allows discovering de novo motifs enriched in a set of related sequences and comparing them with the motifs included in the database. Database URL: http://attract.cnic.es. PMID:27055826

  2. Evidence for multiple mechanisms for membrane binding and integration via carboxyl-terminal insertion sequences.

    PubMed

    Kim, P K; Janiak-Spens, F; Trimble, W S; Leber, B; Andrews, D W

    1997-07-22

    Subcellular localization of proteins with carboxyl-terminal insertion sequences requires the molecule be both targeted to and integrated into the correct membrane. The mechanism of membrane integration of cytochrome b5 has been shown to be promiscuous, spontaneous, nonsaturable, and independent of membrane proteins. Thus endoplasmic reticulum localization for cytochrome b5 depends primarily on accurate targeting to the appropriate membrane. Here direct comparison of this mechanism with that of three other proteins integrated into membranes via carboxyl-terminal insertion sequences [vesicle-associated membrane protein 1(Vamp1), polyomavirus middle-T antigen, and Bcl-2] revealed that, unlike cytochrome b5, membrane selectivity for these molecules is conferred at least in part by the mechanisms of membrane integration. Bcl-2 membrane integration was similar to that of cytochrome b5 except that insertion into lipid vesicles was inefficient. Unlike cytochrome b5 and Bcl-2, Vamp1 binding to canine pancreatic microsomes was saturable, ATP-dependent, and abolished by mild trypsin treatment of microsomes. Surprisingly, although the insertion sequence of polyomavirus middle-T antigen was sufficient to mediate electrostatic binding to membranes, binding did not lead to integration into the bilayer. Together these results demonstrate that there are at least two different mechanisms for correct membrane integration of proteins with insertion sequences, one mediated primarily by targeting and one relying on factors in the target membrane to mediate selective integration. Our results also demonstrate that, contrary to expectation, hydrophobicity is not sufficient for insertion sequence-mediated membrane integration. We suggest that the structure of the insertion sequence determines whether or not specific membrane-bound receptor proteins are required for membrane integration. PMID:9220974

  3. Clinical integration of next generation sequencing: a policy analysis.

    PubMed

    Kaufman, David; Curnutte, Margaret; McGuire, Amy L

    2014-01-01

    Clinical next generation sequencing (NGS) technologies are challenging existing regulatory paradigms. We advocate a coordinate policy approach, which first requires a comprehensive understanding of the existing regulatory and legal structures. This paper introduces four key policy domains - including quality assurance, insurance coverage, intellectual property management, and data sharing - that must be addressed to ensure high quality clinical NGS. In bringing these policy issues into conversation through this special issue for the Journal of Law, Medicine & Ethics, we hope to lay the foundation for further discussion by a range of stakeholder groups with diverse and strong interests in the governance of NGS. PMID:25298287

  4. Encoded Expansion: An Efficient Algorithm to Discover Identical String Motifs

    PubMed Central

    Azmi, Aqil M.; Al-Ssulami, Abdulrakeeb

    2014-01-01

    A major task in computational biology is the discovery of short recurring string patterns known as motifs. Most of the schemes to discover motifs are either stochastic or combinatorial in nature. Stochastic approaches do not guarantee finding the correct motifs, while the combinatorial schemes tend to have an exponential time complexity with respect to motif length. To alleviate the cost, the combinatorial approach exploits dynamic data structures such as trees or graphs. Recently, Karci (2009; Efficient automatic exact motif discovery algorithms for biological sequences, Expert Systems with Applications 36:7952–7963) devised a deterministic algorithm that finds all the identical copies of string motifs of all sizes in theoretical time complexity of and a space complexity of where is the length of the input sequence and is the length of the longest possible string motif. In this paper, we present a significant improvement on Karci's original algorithm. The algorithm that we propose reports all identical string motifs of sizes that occur at least times. Our algorithm starts with string motifs of size 2, and at each iteration it expands the candidate string motifs by one symbol, throwing out those that occur less than times in the entire input sequence. We use a simple array and data encoding to achieve theoretical worst-case time complexity of and a space complexity of Encoding of the substrings can speed up the process of comparison between string motifs. Experimental results on random and real biological sequences confirm that our algorithm has indeed a linear time complexity and it is more scalable in terms of sequence length than the existing algorithms. PMID:24871320
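
    The expansion step described in this abstract can be sketched as follows: begin with every substring of length 2, keep those that occur at least a threshold number of times, then extend the survivors by one symbol per iteration. This is a simplified illustration that uses a dictionary of start positions. It does not reproduce the authors' array layout or substring encoding, makes no claim about their stated complexity bounds, and the names `expand_motifs` and `min_count` are assumptions.

```python
from collections import defaultdict


def expand_motifs(seq: str, min_count: int) -> dict[str, list[int]]:
    """Return every substring of length >= 2 that occurs at least min_count
    times (overlapping occurrences count), mapped to its 0-based start positions."""
    # Seed the candidates with all substrings of length 2.
    seeds: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - 1):
        seeds[seq[i:i + 2]].append(i)
    current = {m: p for m, p in seeds.items() if len(p) >= min_count}

    found: dict[str, list[int]] = {}
    length = 2
    while current:
        found.update(current)
        # Extend each surviving occurrence by one symbol to the right.
        longer: dict[str, list[int]] = defaultdict(list)
        for positions in current.values():
            for i in positions:
                if i + length < len(seq):
                    longer[seq[i:i + length + 1]].append(i)
        # Throw out candidates that now occur fewer than min_count times.
        current = {m: p for m, p in longer.items() if len(p) >= min_count}
        length += 1
    return found


motifs = expand_motifs("ACGTACGTAC", 2)
print(max(motifs, key=len), motifs[max(motifs, key=len)])
```

    Pruning is safe because any motif of length n+1 that meets the threshold has a length-n prefix that also meets it, so no qualifying motif is lost when its prefix survives each round.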

  5. Integration of hepatitis B virus DNA in chromosome-specific satellite sequences

    SciTech Connect

    Shaul, Y.; Garcia, P.D.; Schonberg, S.; Rutter, W.J.

    1986-09-01

    The authors previously reported the cloning and detailed analysis of the integrated hepatitis B virus sequences in a human hepatoma cell line. They report here the integration of at least one of hepatitis B virus at human satellite DNA sequences. The majority of the cellular sequences identified by this satellite were organized as a multimeric composition of a 0.6-kilobase EcoRI fragment. This clone hybridized in situ almost exclusively to the centromeric heterochromatin of chromosomes 1 and 16 and to a lower extent to chromosome 2 and to the heterochromatic region of the Y chromosome. The immediate flanking host sequence appeared as a hierarchy of repeating units which were almost identical to a previously reported human satellite III DNA sequence.

  6. Integrative analysis of next generation sequencing for small non-coding RNAs and transcriptional regulation in Myelodysplastic Syndromes

    PubMed Central

    2011-01-01

    Background Myelodysplastic Syndromes (MDS) are pre-leukemic disorders with increasing incidence rates worldwide, but very limited treatment options. Little is known about small regulatory RNAs and how they contribute to pathogenesis, progression and transcriptome changes in MDS. Methods Patients' primary marrow cells were screened for short RNAs (RNA-seq) using next generation sequencing. Exon arrays from the same cells were used to profile gene expression and additional measures on 98 patients obtained. Integrative bioinformatics algorithms were proposed, and pathway and ontology analysis performed. Results In low-grade MDS, observations implied extensive post-transcriptional regulation via microRNAs (miRNA) and the recently discovered Piwi interacting RNAs (piRNA). Large expression differences were found for MDS-associated and novel miRNAs, including 48 sequences matching to miRNA star (miRNA*) motifs. The detected species were predicted to regulate disease stage specific molecular functions and pathways, including apoptosis and response to DNA damage. In high-grade MDS, results suggested extensive post-translation editing via transfer RNAs (tRNAs), providing a potential link for reduced apoptosis, a hallmark for this disease stage. Bioinformatics analysis confirmed important regulatory roles for MDS linked miRNAs and TFs, and strengthened the biological significance of miRNA*. The "RNA polymerase II promoters" were identified as the tightest controlled biological function. We suggest their control by a miRNA dominated feedback loop, which might be linked to the dramatically different miRNA amounts seen between low and high-grade MDS. Discussion The presented results provide novel findings that build a basis of further investigations of diagnostic biomarkers, targeted therapies and studies on MDS pathogenesis. PMID:21342535

  7. Phage randomization in a charybdotoxin scaffold leads to CD4-mimetic recognition motifs that bind HIV-1 envelope through non-aromatic sequences.

    PubMed

    Li, C; Dowd, C S; Zhang, W; Chaiken, I M

    2001-06-01

    Binding of HIV-1 gp120 to T-cell receptor CD4 initiates conformational changes in the viral envelope that trigger viral entry into host cells. Phage epitope randomization of a beta-turn loop of a charybdotoxin-based miniprotein scaffold was used to identify peptides that can bind gp120 and block the gp120-CD4 interaction. We describe here the display of the charybdotoxin scaffold on the filamentous phage fUSE5, its use to construct a beta-turn library, and miniprotein sequences identified through library panning with immobilized Env gp120. Competition enzyme-linked immunosorbent assay (ELISA) identified high-frequency phage selectants for which specific gp120 binding was competed by sCD4. Several of these selectants contain hydrophobic residues in place of the Phe that occurs in the gp120-binding beta-turns of both CD4 and previously identified scorpion toxin CD4 mimetics. One of these selectants, denoted TXM[24GQTL27], contains GQTL in place of the CD4 beta-turn sequence 40QGSF43. TXM[24GQTL27] peptide was prepared using solid-phase chemical synthesis, its binding to gp120 demonstrated by optical biosensor kinetics analysis and its affinity for the CD4 binding site of gp120 confirmed by competition ELISA. The results demonstrate that aromatic-less loop-containing CD4 recognition mimetics can be formed with detectable envelope protein binding within a beta-turn of the charybdotoxin miniprotein scaffold. The results of this work establish a methodology for phage display of a charybdotoxin miniprotein scaffold and point to the potential value of phage-based epitope randomization of this miniprotein for identifying novel CD4 mimetics. The latter are potentially useful in deconvoluting structural determinants of CD4-HIV envelope recognition and possibly in designing antagonists of viral entry. PMID:11437954

  8. Integrated sequence and immunology filovirus database at Los Alamos.

    PubMed

    Yusim, Karina; Yoon, Hyejin; Foley, Brian; Feng, Shihai; Macke, Jennifer; Dimitrijevic, Mira; Abfalterer, Werner; Szinger, James; Fischer, Will; Kuiken, Carla; Korber, Bette

    2016-01-01

    The Ebola outbreak of 2013-15 infected more than 28 000 people and claimed more lives than all previous filovirus outbreaks combined. Governmental agencies, clinical teams, and the world scientific community pulled together in a multifaceted response ranging from prevention and disease control, to evaluating vaccines and therapeutics in human trials. As this epidemic is finally coming to a close, refocusing on long-term prevention strategies becomes paramount. Given the very real threat of future filovirus outbreaks, and the inherent uncertainty of the next outbreak virus and geographic location, it is prudent to consider the extent and implications of known natural diversity in advancing vaccines and therapeutic approaches. To facilitate such consideration, we have updated and enhanced the content of the filovirus portion of Los Alamos Hemorrhagic Fever Viruses Database. We have integrated and performed baseline analysis of all family Filoviridae sequences deposited into GenBank, with associated immune response data, and metadata, and we have added new computational tools with web-interfaces to assist users with analysis. Here, we (i) describe the main features of updated database, (ii) provide integrated views and some basic analyses summarizing evolutionary patterns as they relate to geo-temporal data captured in the database and (iii) highlight the most conserved regions in the proteome that may be useful for a T cell vaccine strategy. Database URL: www.hfv.lanl.gov. PMID:27103629

  9. A Conserved Motif Provides Binding Specificity to the PP2A-B56 Phosphatase.

    PubMed

    Hertz, Emil Peter Thrane; Kruse, Thomas; Davey, Norman E; López-Méndez, Blanca; Sigurðsson, Jón Otti; Montoya, Guillermo; Olsen, Jesper V; Nilsson, Jakob

    2016-08-18

    Dynamic protein phosphorylation is a fundamental mechanism regulating biological processes in all organisms. Protein phosphatase 2A (PP2A) is the main source of phosphatase activity in the cell, but the molecular details of substrate recognition are unknown. Here, we report that a conserved surface-exposed pocket on PP2A regulatory B56 subunits binds to a consensus sequence on interacting proteins, which we term the LxxIxE motif. The composition of the motif modulates the affinity for B56, which in turn determines the phosphorylation status of associated substrates. Phosphorylation of amino acid residues within the motif increases B56 binding, allowing integration of kinase and phosphatase activity. We identify conserved LxxIxE motifs in essential proteins throughout the eukaryotic domain of life and in human viruses, suggesting that the motifs are required for basic cellular function. Our study provides a molecular description of PP2A binding specificity with broad implications for understanding signaling in eukaryotes. PMID:27453045
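
    A consensus such as LxxIxE can be read as a simple positional pattern, and the sketch below shows one way to scan a protein sequence for it. It assumes that each "x" stands for any of the 20 standard amino acids; the study may constrain some positions further, and the abstract does not give those details. The function name find_lxxixe and the example sequence are hypothetical and are not taken from the paper.

```python
import re

# Assumption: "x" is any of the 20 standard amino acids. The published
# motif definition may be stricter at some positions.
ANY_RESIDUE = "[ACDEFGHIKLMNPQRSTVWY]"

# A lookahead lets overlapping candidate sites be reported as well.
LXXIXE = re.compile(
    "(?=(L" + ANY_RESIDUE + ANY_RESIDUE + "I" + ANY_RESIDUE + "E))"
)


def find_lxxixe(sequence):
    """Return (1-based start position, matched residues) for each LxxIxE-like site."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in LXXIXE.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical sequence, used only to exercise the pattern.
    for start, site in find_lxxixe("MKTLSPIEEAAGLDKIAEQ"):
        print(start, site)
```

    A match found this way is only a candidate site. Per the abstract, the composition of the motif and phosphorylation of residues within it modulate B56 binding, which a plain pattern match does not capture.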

  10. RMOD: a tool for regulatory motif detection in signaling network.

    PubMed

    Kim, Jinki; Yi, Gwan-Su

    2013-01-01

    Regulatory motifs are patterns of activation and inhibition that appear repeatedly in various signaling networks and that show specific regulatory properties. However, the network structures of regulatory motifs are highly diverse and complex, rendering their identification difficult. Here, we present RMOD, a web-based system for the identification of regulatory motifs and their properties in signaling networks. RMOD finds various network structures of regulatory motifs by compressing the signaling network and detecting the compressed forms of regulatory motifs. To apply it to a large-scale signaling network, it adopts a new subgraph search algorithm using a novel data structure called path-tree, which is a tree structure composed of isomorphic graphs of query regulatory motifs. This algorithm was evaluated using various sizes of signaling networks generated from the integration of various human signaling pathways, and the evaluation showed that the speed and scalability of this algorithm outperform those of other algorithms. RMOD includes interactive analysis and auxiliary tools that make it possible to manage the whole process, from building the signaling network and query regulatory motifs to analyzing regulatory motifs with graphical illustration and summarized descriptions. As a result, RMOD provides an integrated view of the regulatory motifs and the mechanisms underlying their activities within the signaling network. RMOD is freely accessible online at the following URL: http://pks.kaist.ac.kr/rmod. PMID:23874612