integrated sequence motif: Topics by Science.gov

Sample records for integrated sequence motif

DynaMIT: the dynamic motif integration toolkit

PubMed Central

Dassi, Erik; Quattrone, Alessandro

2016-01-01

De-novo motif search is a frequently applied bioinformatics procedure to identify and prioritize recurrent elements in sequences sets for biological investigation, such as the ones derived from high-throughput differential expression experiments. Several algorithms have been developed to perform motif search, employing widely different approaches and often giving divergent results. In order to maximize the power of these investigations and ultimately be able to draft solid biological hypotheses, there is the need for applying multiple tools on the same sequences and merge the obtained results. However, motif reporting formats and statistical evaluation methods currently make such an integration task difficult to perform and mostly restricted to specific scenarios. We thus introduce here the Dynamic Motif Integration Toolkit (DynaMIT), an extremely flexible platform allowing to identify motifs employing multiple algorithms, integrate them by means of a user-selected strategy and visualize results in several ways; furthermore, the platform is user-extendible in all its aspects. DynaMIT is freely available at http://cibioltg.bitbucket.org. PMID:26253738
DLocalMotif: a discriminative approach for discovering local motifs in protein sequences.

PubMed

Mehdi, Ahmed M; Sehgal, Muhammad Shoaib B; Kobe, Bostjan; Bailey, Timothy L; Bodén, Mikael

2013-01-01

Local motifs are patterns of DNA or protein sequences that occur within a sequence interval relative to a biologically defined anchor or landmark. Current protein motif discovery methods do not adequately consider such constraints to identify biologically significant motifs that are only weakly over-represented but spatially confined. Using negatives, i.e. sequences known to not contain a local motif, can further increase the specificity of their discovery. This article introduces the method DLocalMotif that makes use of positional information and negative data for local motif discovery in protein sequences. DLocalMotif combines three scoring functions, measuring degrees of motif over-representation, entropy and spatial confinement, specifically designed to discriminatively exploit the availability of negative data. The method is shown to outperform current methods that use only a subset of these motif characteristics. We apply the method to several biological datasets. The analysis of peroxisomal targeting signals uncovers several novel motifs that occur immediately upstream of the dominant peroxisomal targeting signal-1 signal. The analysis of proline-tyrosine nuclear localization signals uncovers multiple novel motifs that overlap with C2H2 zinc finger domains. We also evaluate the method on classical nuclear localization signals and endoplasmic reticulum retention signals and find that DLocalMotif successfully recovers biologically relevant sequence properties. http://bioinf.scmb.uq.edu.au/dlocalmotif/
DMINDA: an integrated web server for DNA motif identification and analyses

PubMed Central

Ma, Qin; Zhang, Hanyuan; Mao, Xizeng; Zhou, Chuan; Liu, Bingqiang; Chen, Xin; Xu, Ying

2014-01-01

DMINDA (DNA motif identification and analyses) is an integrated web server for DNA motif identification and analyses, which is accessible at http://csbl.bmb.uga.edu/DMINDA/. This web site is freely available to all users and there is no login requirement. This server provides a suite of cis-regulatory motif analysis functions on DNA sequences, which are important to elucidation of the mechanisms of transcriptional regulation: (i) de novo motif finding for a given set of promoter sequences along with statistical scores for the predicted motifs derived based on information extracted from a control set, (ii) scanning motif instances of a query motif in provided genomic sequences, (iii) motif comparison and clustering of identified motifs, and (iv) co-occurrence analyses of query motifs in given promoter sequences. The server is powered by a backend computer cluster with over 150 computing nodes, and is particularly useful for motif prediction and analyses in prokaryotic genomes. We believe that DMINDA, as a new and comprehensive web server for cis-regulatory motif finding and analyses, will benefit the genomic research community in general and prokaryotic genome researchers in particular. PMID:24753419
DMINDA: an integrated web server for DNA motif identification and analyses.

PubMed

Ma, Qin; Zhang, Hanyuan; Mao, Xizeng; Zhou, Chuan; Liu, Bingqiang; Chen, Xin; Xu, Ying

2014-07-01

DMINDA (DNA motif identification and analyses) is an integrated web server for DNA motif identification and analyses, which is accessible at http://csbl.bmb.uga.edu/DMINDA/. This web site is freely available to all users and there is no login requirement. This server provides a suite of cis-regulatory motif analysis functions on DNA sequences, which are important to elucidation of the mechanisms of transcriptional regulation: (i) de novo motif finding for a given set of promoter sequences along with statistical scores for the predicted motifs derived based on information extracted from a control set, (ii) scanning motif instances of a query motif in provided genomic sequences, (iii) motif comparison and clustering of identified motifs, and (iv) co-occurrence analyses of query motifs in given promoter sequences. The server is powered by a backend computer cluster with over 150 computing nodes, and is particularly useful for motif prediction and analyses in prokaryotic genomes. We believe that DMINDA, as a new and comprehensive web server for cis-regulatory motif finding and analyses, will benefit the genomic research community in general and prokaryotic genome researchers in particular. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Sequence information gain based motif analysis.

PubMed

Maynou, Joan; Pairó, Erola; Marco, Santiago; Perera, Alexandre

2015-11-09

The detection of regulatory regions in candidate sequences is essential for the understanding of the regulation of a particular gene and the mechanisms involved. This paper proposes a novel methodology based on information theoretic metrics for finding regulatory sequences in promoter regions. This methodology (SIGMA) has been tested on genomic sequence data for Homo sapiens and Mus musculus. SIGMA has been compared with different publicly available alternatives for motif detection, such as MEME/MAST, Biostrings (Bioconductor package), MotifRegressor, and previous work such Qresiduals projections or information theoretic based detectors. Comparative results, in the form of Receiver Operating Characteristic curves, show how, in 70% of the studied Transcription Factor Binding Sites, the SIGMA detector has a better performance and behaves more robustly than the methods compared, while having a similar computational time. The performance of SIGMA can be explained by its parametric simplicity in the modelling of the non-linear co-variability in the binding motif positions. Sequence Information Gain based Motif Analysis is a generalisation of a non-linear model of the cis-regulatory sequences detection based on Information Theory. This generalisation allows us to detect transcription factor binding sites with maximum performance disregarding the covariability observed in the positions of the training set of sequences. SIGMA is freely available to the public at http://b2slab.upc.edu.
Discovering Sequence Motifs with Arbitrary Insertions and Deletions

PubMed Central

Frith, Martin C.; Saunders, Neil F. W.; Kobe, Bostjan; Bailey, Timothy L.

2008-01-01

Biology is encoded in molecular sequences: deciphering this encoding remains a grand scientific challenge. Functional regions of DNA, RNA, and protein sequences often exhibit characteristic but subtle motifs; thus, computational discovery of motifs in sequences is a fundamental and much-studied problem. However, most current algorithms do not allow for insertions or deletions (indels) within motifs, and the few that do have other limitations. We present a method, GLAM2 (Gapped Local Alignment of Motifs), for discovering motifs allowing indels in a fully general manner, and a companion method GLAM2SCAN for searching sequence databases using such motifs. glam2 is a generalization of the gapless Gibbs sampling algorithm. It re-discovers variable-width protein motifs from the PROSITE database significantly more accurately than the alternative methods PRATT and SAM-T2K. Furthermore, it usefully refines protein motifs from the ELM database: in some cases, the refined motifs make orders of magnitude fewer overpredictions than the original ELM regular expressions. GLAM2 performs respectably on the BAliBASE multiple alignment benchmark, and may be superior to leading multiple alignment methods for “motif-like” alignments with N- and C-terminal extensions. Finally, we demonstrate the use of GLAM2 to discover protein kinase substrate motifs and a gapped DNA motif for the LIM-only transcriptional regulatory complex: using GLAM2SCAN, we identify promising targets for the latter. GLAM2 is especially promising for short protein motifs, and it should improve our ability to identify the protein cleavage sites, interaction sites, post-translational modification attachment sites, etc., that underlie much of biology. It may be equally useful for arbitrarily gapped motifs in DNA and RNA, although fewer examples of such motifs are known at present. GLAM2 is public domain software, available for download at http://bioinformatics.org.au/glam2. PMID:18437229
BlockLogo: visualization of peptide and sequence motif conservation

PubMed Central

Olsen, Lars Rønn; Kudahl, Ulrich Johan; Simon, Christian; Sun, Jing; Schönbach, Christian; Reinherz, Ellis L.; Zhang, Guang Lan; Brusic, Vladimir

2013-01-01

BlockLogo is a web-server application for visualization of protein and nucleotide fragments, continuous protein sequence motifs, and discontinuous sequence motifs using calculation of block entropy from multiple sequence alignments. The user input consists of a multiple sequence alignment, selection of motif positions, type of sequence, and output format definition. The output has BlockLogo along with the sequence logo, and a table of motif frequencies. We deployed BlockLogo as an online application and have demonstrated its utility through examples that show visualization of T-cell epitopes and B-cell epitopes (both continuous and discontinuous). Our additional example shows a visualization and analysis of structural motifs that determine specificity of peptide binding to HLA-DR molecules. The BlockLogo server also employs selected experimentally validated prediction algorithms to enable on-the-fly prediction of MHC binding affinity to 15 common HLA class I and class II alleles as well as visual analysis of discontinuous epitopes from multiple sequence alignments. It enables the visualization and analysis of structural and functional motifs that are usually described as regular expressions. It provides a compact view of discontinuous motifs composed of distant positions within biological sequences. BlockLogo is available at: http://research4.dfci.harvard.edu/cvc/blocklogo/ and http://methilab.bu.edu/blocklogo/ PMID:24001880
Identifying novel sequence variants of RNA 3D motifs

PubMed Central

Zirbel, Craig L.; Roll, James; Sweeney, Blake A.; Petrov, Anton I.; Pirrung, Meg; Leontis, Neocles B.

2015-01-01

Predicting RNA 3D structure from sequence is a major challenge in biophysics. An important sub-goal is accurately identifying recurrent 3D motifs from RNA internal and hairpin loop sequences extracted from secondary structure (2D) diagrams. We have developed and validated new probabilistic models for 3D motif sequences based on hybrid Stochastic Context-Free Grammars and Markov Random Fields (SCFG/MRF). The SCFG/MRF models are constructed using atomic-resolution RNA 3D structures. To parameterize each model, we use all instances of each motif found in the RNA 3D Motif Atlas and annotations of pairwise nucleotide interactions generated by the FR3D software. Isostericity relations between non-Watson–Crick basepairs are used in scoring sequence variants. SCFG techniques model nested pairs and insertions, while MRF ideas handle crossing interactions and base triples. We use test sets of randomly-generated sequences to set acceptance and rejection thresholds for each motif group and thus control the false positive rate. Validation was carried out by comparing results for four motif groups to RMDetect. The software developed for sequence scoring (JAR3D) is structured to automatically incorporate new motifs as they accumulate in the RNA 3D Motif Atlas when new structures are solved and is available free for download. PMID:26130723
Occurrence probability of structured motifs in random sequences.

PubMed

Robin, S; Daudin, J-J; Richard, H; Sagot, M-F; Schbath, S

2002-01-01

The problem of extracting from a set of nucleic acid sequences motifs which may have biological function is more and more important. In this paper, we are interested in particular motifs that may be implicated in the transcription process. These motifs, called structured motifs, are composed of two ordered parts separated by a variable distance and allowing for substitutions. In order to assess their statistical significance, we propose approximations of the probability of occurrences of such a structured motif in a given sequence. An application of our method to evaluate candidate promoters in E. coli and B. subtilis is presented. Simulations show the goodness of the approximations.
SSMART: Sequence-structure motif identification for RNA-binding proteins.

PubMed

Munteanu, Alina; Mukherjee, Neelanjan; Ohler, Uwe

2018-06-11

RNA-binding proteins (RBPs) regulate every aspect of RNA metabolism and function. There are hundreds of RBPs encoded in the eukaryotic genomes, and each recognize its RNA targets through a specific mixture of RNA sequence and structure properties. For most RBPs, however, only a primary sequence motif has been determined, while the structure of the binding sites is uncharacterized. We developed SSMART, an RNA motif finder that simultaneously models the primary sequence and the structural properties of the RNA targets sites. The sequence-structure motifs are represented as consensus strings over a degenerate alphabet, extending the IUPAC codes for nucleotides to account for secondary structure preferences. Evaluation on synthetic data showed that SSMART is able to recover both sequence and structure motifs implanted into 3'UTR-like sequences, for various degrees of structured/unstructured binding sites. In addition, we successfully used SSMART on high-throughput in vivo and in vitro data, showing that we not only recover the known sequence motif, but also gain insight into the structural preferences of the RBP. Availability: SSMART is freely available at https://ohlerlab.mdc-berlin.de/software/SSMART_137/. Supplementary data are available at Bioinformatics online.
Identification of sequence motifs significantly associated with antisense activity.

PubMed

McQuisten, Kyle A; Peek, Andrew S

2007-06-07

Predicting the suppression activity of antisense oligonucleotide sequences is the main goal of the rational design of nucleic acids. To create an effective predictive model, it is important to know what properties of an oligonucleotide sequence associate significantly with antisense activity. Also, for the model to be efficient we must know what properties do not associate significantly and can be omitted from the model. This paper will discuss the results of a randomization procedure to find motifs that associate significantly with either high or low antisense suppression activity, analysis of their properties, as well as the results of support vector machine modelling using these significant motifs as features. We discovered 155 motifs that associate significantly with high antisense suppression activity and 202 motifs that associate significantly with low suppression activity. The motifs range in length from 2 to 5 bases, contain several motifs that have been previously discovered as associating highly with antisense activity, and have thermodynamic properties consistent with previous work associating thermodynamic properties of sequences with their antisense activity. Statistical analysis revealed no correlation between a motif's position within an antisense sequence and that sequences antisense activity. Also, many significant motifs existed as subwords of other significant motifs. Support vector regression experiments indicated that the feature set of significant motifs increased correlation compared to all possible motifs as well as several subsets of the significant motifs. The thermodynamic properties of the significantly associated motifs support existing data correlating the thermodynamic properties of the antisense oligonucleotide with antisense efficiency, reinforcing our hypothesis that antisense suppression is strongly associated with probe/target thermodynamics, as there are no enzymatic mediators to speed the process along like the RNA Induced
An integrative and applicable phylogenetic footprinting framework for cis-regulatory motifs identification in prokaryotic genomes.

PubMed

Liu, Bingqiang; Zhang, Hanyuan; Zhou, Chuan; Li, Guojun; Fennell, Anne; Wang, Guanghui; Kang, Yu; Liu, Qi; Ma, Qin

2016-08-09

Phylogenetic footprinting is an important computational technique for identifying cis-regulatory motifs in orthologous regulatory regions from multiple genomes, as motifs tend to evolve slower than their surrounding non-functional sequences. Its application, however, has several difficulties for optimizing the selection of orthologous data and reducing the false positives in motif prediction. Here we present an integrative phylogenetic footprinting framework for accurate motif predictions in prokaryotic genomes (MP(3)). The framework includes a new orthologous data preparation procedure, an additional promoter scoring and pruning method and an integration of six existing motif finding algorithms as basic motif search engines. Specifically, we collected orthologous genes from available prokaryotic genomes and built the orthologous regulatory regions based on sequence similarity of promoter regions. This procedure made full use of the large-scale genomic data and taxonomy information and filtered out the promoters with limited contribution to produce a high quality orthologous promoter set. The promoter scoring and pruning is implemented through motif voting by a set of complementary predicting tools that mine as many motif candidates as possible and simultaneously eliminate the effect of random noise. We have applied the framework to Escherichia coli k12 genome and evaluated the prediction performance through comparison with seven existing programs. This evaluation was systematically carried out at the nucleotide and binding site level, and the results showed that MP(3) consistently outperformed other popular motif finding tools. We have integrated MP(3) into our motif identification and analysis server DMINDA, allowing users to efficiently identify and analyze motifs in 2,072 completely sequenced prokaryotic genomes. The performance evaluation indicated that MP(3) is effective for predicting regulatory motifs in prokaryotic genomes. Its application may enhance
Mining for class-specific motifs in protein sequence classification

PubMed Central

2013-01-01

Background In protein sequence classification, identification of the sequence motifs or n-grams that can precisely discriminate between classes is a more interesting scientific question than the classification itself. A number of classification methods aim at accurate classification but fail to explain which sequence features indeed contribute to the accuracy. We hypothesize that sequences in lower denominations (n-grams) can be used to explore the sequence landscape and to identify class-specific motifs that discriminate between classes during classification. Discriminative n-grams are short peptide sequences that are highly frequent in one class but are either minimally present or absent in other classes. In this study, we present a new substitution-based scoring function for identifying discriminative n-grams that are highly specific to a class. Results We present a scoring function based on discriminative n-grams that can effectively discriminate between classes. The scoring function, initially, harvests the entire set of 4- to 8-grams from the protein sequences of different classes in the dataset. Similar n-grams of the same size are combined to form new n-grams, where the similarity is defined by positive amino acid substitution scores in the BLOSUM62 matrix. Substitution has resulted in a large increase in the number of discriminatory n-grams harvested. Due to the unbalanced nature of the dataset, the frequencies of the n-grams are normalized using a dampening factor, which gives more weightage to the n-grams that appear in fewer classes and vice-versa. After the n-grams are normalized, the scoring function identifies discriminative 4- to 8-grams for each class that are frequent enough to be above a selection threshold. By mapping these discriminative n-grams back to the protein sequences, we obtained contiguous n-grams that represent short class-specific motifs in protein sequences. Our method fared well compared to an existing motif finding method known as
Learning cellular sorting pathways using protein interactions and sequence motifs.

PubMed

Lin, Tien-Ho; Bar-Joseph, Ziv; Murphy, Robert F

2011-11-01

Proper subcellular localization is critical for proteins to perform their roles in cellular functions. Proteins are transported by different cellular sorting pathways, some of which take a protein through several intermediate locations until reaching its final destination. The pathway a protein is transported through is determined by carrier proteins that bind to specific sequence motifs. In this article, we present a new method that integrates protein interaction and sequence motif data to model how proteins are sorted through these sorting pathways. We use a hidden Markov model (HMM) to represent protein sorting pathways. The model is able to determine intermediate sorting states and to assign carrier proteins and motifs to the sorting pathways. In simulation studies, we show that the method can accurately recover an underlying sorting model. Using data for yeast, we show that our model leads to accurate prediction of subcellular localization. We also show that the pathways learned by our model recover many known sorting pathways and correctly assign proteins to the path they utilize. The learned model identified new pathways and their putative carriers and motifs and these may represent novel protein sorting mechanisms. Supplementary results and software implementation are available from http://murphylab.web.cmu.edu/software/2010_RECOMB_pathways/.
Learning Cellular Sorting Pathways Using Protein Interactions and Sequence Motifs

PubMed Central

Lin, Tien-Ho; Bar-Joseph, Ziv

2011-01-01

Abstract Proper subcellular localization is critical for proteins to perform their roles in cellular functions. Proteins are transported by different cellular sorting pathways, some of which take a protein through several intermediate locations until reaching its final destination. The pathway a protein is transported through is determined by carrier proteins that bind to specific sequence motifs. In this article, we present a new method that integrates protein interaction and sequence motif data to model how proteins are sorted through these sorting pathways. We use a hidden Markov model (HMM) to represent protein sorting pathways. The model is able to determine intermediate sorting states and to assign carrier proteins and motifs to the sorting pathways. In simulation studies, we show that the method can accurately recover an underlying sorting model. Using data for yeast, we show that our model leads to accurate prediction of subcellular localization. We also show that the pathways learned by our model recover many known sorting pathways and correctly assign proteins to the path they utilize. The learned model identified new pathways and their putative carriers and motifs and these may represent novel protein sorting mechanisms. Supplementary results and software implementation are available from http://murphylab.web.cmu.edu/software/2010_RECOMB_pathways/. PMID:21999284
Identification of the sequence motif of glycoside hydrolase 13 family members

PubMed Central

Kumar, Vikash

2011-01-01

A bioinformatics analysis of sequences of enzymes of the glycoside hydrolase (GH) 13 family members such as α-amylase, cyclodextrin glycosyltransferase (CGTase), branching enzyme and cyclomaltodextrinase has been carried out in order to find out the sequence motifs that govern the reactions specificities of these enzymes by using hidden Markov model (HMM) profile. This analysis suggests the existence of such sequence motifs and residues of these motifs constituting the −1 to +3 catalytic subsites of the enzyme. Hence, by introducing mutations in the residues of these four subsites, one can change the reaction specificities of the enzymes. In general it has been observed that α -amylase sequence motif have low sequence conservation than rest of the motifs of the GH13 family members. PMID:21544166
WebMOTIFS: automated discovery, filtering and scoring of DNA sequence motifs using multiple programs and Bayesian approaches

PubMed Central

Romer, Katherine A.; Kayombya, Guy-Richard; Fraenkel, Ernest

2007-01-01

WebMOTIFS provides a web interface that facilitates the discovery and analysis of DNA-sequence motifs. Several studies have shown that the accuracy of motif discovery can be significantly improved by using multiple de novo motif discovery programs and using randomized control calculations to identify the most significant motifs or by using Bayesian approaches. WebMOTIFS makes it easy to apply these strategies. Using a single submission form, users can run several motif discovery programs and score, cluster and visualize the results. In addition, the Bayesian motif discovery program THEME can be used to determine the class of transcription factors that is most likely to regulate a set of sequences. Input can be provided as a list of gene or probe identifiers. Used with the default settings, WebMOTIFS accurately identifies biologically relevant motifs from diverse data in several species. WebMOTIFS is freely available at http://fraenkel.mit.edu/webmotifs. PMID:17584794
Memetic algorithms for de novo motif-finding in biomedical sequences.

PubMed

Bi, Chengpeng

2012-09-01

The objectives of this study are to design and implement a new memetic algorithm for de novo motif discovery, which is then applied to detect important signals hidden in various biomedical molecular sequences. In this paper, memetic algorithms are developed and tested in de novo motif-finding problems. Several strategies in the algorithm design are employed that are to not only efficiently explore the multiple sequence local alignment space, but also effectively uncover the molecular signals. As a result, there are a number of key features in the implementation of the memetic motif-finding algorithm (MaMotif), including a chromosome replacement operator, a chromosome alteration-aware local search operator, a truncated local search strategy, and a stochastic operation of local search imposed on individual learning. To test the new algorithm, we compare MaMotif with a few of other similar algorithms using simulated and experimental data including genomic DNA, primary microRNA sequences (let-7 family), and transmembrane protein sequences. The new memetic motif-finding algorithm is successfully implemented in C++, and exhaustively tested with various simulated and real biological sequences. In the simulation, it shows that MaMotif is the most time-efficient algorithm compared with others, that is, it runs 2 times faster than the expectation maximization (EM) method and 16 times faster than the genetic algorithm-based EM hybrid. In both simulated and experimental testing, results show that the new algorithm is compared favorably or superior to other algorithms. Notably, MaMotif is able to successfully discover the transcription factors' binding sites in the chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq) data, correctly uncover the RNA splicing signals in gene expression, and precisely find the highly conserved helix motif in the transmembrane protein sequences, as well as rightly detect the palindromic segments in the primary micro
Discovering Motifs in Biological Sequences Using the Micron Automata Processor.

PubMed

Roy, Indranil; Aluru, Srinivas

2016-01-01

Finding approximately conserved sequences, called motifs, across multiple DNA or protein sequences is an important problem in computational biology. In this paper, we consider the (l, d) motif search problem of identifying one or more motifs of length l present in at least q of the n given sequences, with each occurrence differing from the motif in at most d substitutions. The problem is known to be NP-complete, and the largest solved instance reported to date is (26,11). We propose a novel algorithm for the (l,d) motif search problem using streaming execution over a large set of non-deterministic finite automata (NFA). This solution is designed to take advantage of the micron automata processor, a new technology close to deployment that can simultaneously execute multiple NFA in parallel. We demonstrate the capability for solving much larger instances of the (l, d) motif search problem using the resources available within a single automata processor board, by estimating run-times for problem instances (39,18) and (40,17). The paper serves as a useful guide to solving problems using this new accelerator technology.
A Monte Carlo-based framework enhances the discovery and interpretation of regulatory sequence motifs

PubMed Central

2012-01-01

Background Discovery of functionally significant short, statistically overrepresented subsequence patterns (motifs) in a set of sequences is a challenging problem in bioinformatics. Oftentimes, not all sequences in the set contain a motif. These non-motif-containing sequences complicate the algorithmic discovery of motifs. Filtering the non-motif-containing sequences from the larger set of sequences while simultaneously determining the identity of the motif is, therefore, desirable and a non-trivial problem in motif discovery research. Results We describe MotifCatcher, a framework that extends the sensitivity of existing motif-finding tools by employing random sampling to effectively remove non-motif-containing sequences from the motif search. We developed two implementations of our algorithm; each built around a commonly used motif-finding tool, and applied our algorithm to three diverse chromatin immunoprecipitation (ChIP) data sets. In each case, the motif finder with the MotifCatcher extension demonstrated improved sensitivity over the motif finder alone. Our approach organizes candidate functionally significant discovered motifs into a tree, which allowed us to make additional insights. In all cases, we were able to support our findings with experimental work from the literature. Conclusions Our framework demonstrates that additional processing at the sequence entry level can significantly improve the performance of existing motif-finding tools. For each biological data set tested, we were able to propose novel biological hypotheses supported by experimental work from the literature. Specifically, in Escherichia coli, we suggested binding site motifs for 6 non-traditional LexA protein binding sites; in Saccharomyces cerevisiae, we hypothesize 2 disparate mechanisms for novel binding sites of the Cse4p protein; and in Halobacterium sp. NRC-1, we discoverd subtle differences in a general transcription factor (GTF) binding site motif across several data sets. We

A Gibbs sampler for motif detection in phylogenetically close sequences

NASA Astrophysics Data System (ADS)

Siddharthan, Rahul; van Nimwegen, Erik; Siggia, Eric

2004-03-01

Genes are regulated by transcription factors that bind to DNA upstream of genes and recognize short conserved ``motifs'' in a random intergenic ``background''. Motif-finders such as the Gibbs sampler compare the probability of these short sequences being represented by ``weight matrices'' to the probability of their arising from the background ``null model'', and explore this space (analogous to a free-energy landscape). But closely related species may show conservation not because of functional sites but simply because they have not had sufficient time to diverge, so conventional methods will fail. We introduce a new Gibbs sampler algorithm that accounts for common ancestry when searching for motifs, while requiring minimal ``prior'' assumptions on the number and types of motifs, assessing the significance of detected motifs by ``tracking'' clusters that stay together. We apply this scheme to motif detection in sporulation-cycle genes in the yeast S. cerevisiae, using recent sequences of other closely-related Saccharomyces species.
ssHMM: extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data

PubMed Central

Krestel, Ralf; Ohler, Uwe; Vingron, Martin; Marsico, Annalisa

2017-01-01

Abstract RNA-binding proteins (RBPs) play an important role in RNA post-transcriptional regulation and recognize target RNAs via sequence-structure motifs. The extent to which RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. Existing RNA motif finders either take the structure of the RNA only partially into account, or employ models which are not directly interpretable as sequence-structure motifs. We developed ssHMM, an RNA motif finder based on a hidden Markov model (HMM) and Gibbs sampling which fully captures the relationship between RNA sequence and secondary structure preference of a given RBP. Compared to previous methods which output separate logos for sequence and structure, it directly produces a combined sequence-structure motif when trained on a large set of sequences. ssHMM’s model is visualized intuitively as a graph and facilitates biological interpretation. ssHMM can be used to find novel bona fide sequence-structure motifs of uncharacterized RBPs, such as the one presented here for the YY1 protein. ssHMM reaches a high motif recovery rate on synthetic data, it recovers known RBP motifs from CLIP-Seq data, and scales linearly on the input size, being considerably faster than MEMERIS and RNAcontext on large datasets while being on par with GraphProt. It is freely available on Github and as a Docker image. PMID:28977546
LDsplit: screening for cis-regulatory motifs stimulating meiotic recombination hotspots by analysis of DNA sequence polymorphisms.

PubMed

Yang, Peng; Wu, Min; Guo, Jing; Kwoh, Chee Keong; Przytycka, Teresa M; Zheng, Jie

2014-02-17

As a fundamental genomic element, meiotic recombination hotspot plays important roles in life sciences. Thus uncovering its regulatory mechanisms has broad impact on biomedical research. Despite the recent identification of the zinc finger protein PRDM9 and its 13-mer binding motif as major regulators for meiotic recombination hotspots, other regulators remain to be discovered. Existing methods for finding DNA sequence motifs of recombination hotspots often rely on the enrichment of co-localizations between hotspots and short DNA patterns, which ignore the cross-individual variation of recombination rates and sequence polymorphisms in the population. Our objective in this paper is to capture signals encoded in genetic variations for the discovery of recombination-associated DNA motifs. Recently, an algorithm called "LDsplit" has been designed to detect the association between single nucleotide polymorphisms (SNPs) and proximal meiotic recombination hotspots. The association is measured by the difference of population recombination rates at a hotspot between two alleles of a candidate SNP. Here we present an open source software tool of LDsplit, with integrative data visualization for recombination hotspots and their proximal SNPs. Applying LDsplit on SNPs inside an established 7-mer motif bound by PRDM9 we observed that SNP alleles preserving the original motif tend to have higher recombination rates than the opposite alleles that disrupt the motif. Running on SNP windows around hotspots each containing an occurrence of the 7-mer motif, LDsplit is able to guide the established motif finding algorithm of MEME to recover the 7-mer motif. In contrast, without LDsplit the 7-mer motif could not be identified. LDsplit is a software tool for the discovery of cis-regulatory DNA sequence motifs stimulating meiotic recombination hotspots by screening and narrowing down to hotspot associated SNPs. It is the first computational method that utilizes the genetic variation of
LDsplit: screening for cis-regulatory motifs stimulating meiotic recombination hotspots by analysis of DNA sequence polymorphisms

PubMed Central

2014-01-01

Background As a fundamental genomic element, meiotic recombination hotspot plays important roles in life sciences. Thus uncovering its regulatory mechanisms has broad impact on biomedical research. Despite the recent identification of the zinc finger protein PRDM9 and its 13-mer binding motif as major regulators for meiotic recombination hotspots, other regulators remain to be discovered. Existing methods for finding DNA sequence motifs of recombination hotspots often rely on the enrichment of co-localizations between hotspots and short DNA patterns, which ignore the cross-individual variation of recombination rates and sequence polymorphisms in the population. Our objective in this paper is to capture signals encoded in genetic variations for the discovery of recombination-associated DNA motifs. Results Recently, an algorithm called “LDsplit” has been designed to detect the association between single nucleotide polymorphisms (SNPs) and proximal meiotic recombination hotspots. The association is measured by the difference of population recombination rates at a hotspot between two alleles of a candidate SNP. Here we present an open source software tool of LDsplit, with integrative data visualization for recombination hotspots and their proximal SNPs. Applying LDsplit on SNPs inside an established 7-mer motif bound by PRDM9 we observed that SNP alleles preserving the original motif tend to have higher recombination rates than the opposite alleles that disrupt the motif. Running on SNP windows around hotspots each containing an occurrence of the 7-mer motif, LDsplit is able to guide the established motif finding algorithm of MEME to recover the 7-mer motif. In contrast, without LDsplit the 7-mer motif could not be identified. Conclusions LDsplit is a software tool for the discovery of cis-regulatory DNA sequence motifs stimulating meiotic recombination hotspots by screening and narrowing down to hotspot associated SNPs. It is the first computational method that
PISMA: A Visual Representation of Motif Distribution in DNA Sequences.

PubMed

Alcántara-Silva, Rogelio; Alvarado-Hermida, Moisés; Díaz-Contreras, Gibrán; Sánchez-Barrios, Martha; Carrera, Samantha; Galván, Silvia Carolina

2017-01-01

Because the graphical presentation and analysis of motif distribution can provide insights for experimental hypothesis, PISMA aims at identifying motifs on DNA sequences, counting and showing them graphically. The motif length ranges from 2 to 10 bases, and the DNA sequences range up to 10 kb. The motif distribution is shown as a bar-code-like, as a gene-map-like, and as a transcript scheme. We obtained graphical schemes of the CpG site distribution from 91 human papillomavirus genomes. Also, we present 2 analyses: one of DNA motifs associated with either methylation-resistant or methylation-sensitive CpG islands and another analysis of motifs associated with exosome RNA secretion. PISMA is developed in Java; it is executable in any type of hardware and in diverse operating systems. PISMA is freely available to noncommercial users. The English version and the User Manual are provided in Supplementary Files 1 and 2, and a Spanish version is available at www.biomedicas.unam.mx/wp-content/software/pisma.zip and www.biomedicas.unam.mx/wp-content/pdf/manual/pisma.pdf.
CoSMoS: Conserved Sequence Motif Search in the proteome

PubMed Central

Liu, Xiao I; Korde, Neeraj; Jakob, Ursula; Leichert, Lars I

2006-01-01

Background With the ever-increasing number of gene sequences in the public databases, generating and analyzing multiple sequence alignments becomes increasingly time consuming. Nevertheless it is a task performed on a regular basis by researchers in many labs. Results We have now created a database called CoSMoS to find the occurrences and at the same time evaluate the significance of sequence motifs and amino acids encoded in the whole genome of the model organism Escherichia coli K12. We provide a precomputed set of multiple sequence alignments for each individual E. coli protein with all of its homologues in the RefSeq database. The alignments themselves, information about the occurrence of sequence motifs together with information on the conservation of each of the more than 1.3 million amino acids encoded in the E. coli genome can be accessed via the web interface of CoSMoS. Conclusion CoSMoS is a valuable tool to identify highly conserved sequence motifs, to find regions suitable for mutational studies in functional analyses and to predict important structural features in E. coli proteins. PMID:16433915
A motif detection and classification method for peptide sequences using genetic programming.

PubMed

Tomita, Yasuyuki; Kato, Ryuji; Okochi, Mina; Honda, Hiroyuki

2008-08-01

An exploration of common rules (property motifs) in amino acid sequences has been required for the design of novel sequences and elucidation of the interactions between molecules controlled by the structural or physical environment. In the present study, we developed a new method to search property motifs that are common in peptide sequence data. Our method comprises the following two characteristics: (i) the automatic determination of the position and length of common property motifs by calculating the physicochemical similarity of amino acids, and (ii) the quick and effective exploration of motif candidates that discriminates the positives and negatives by the introduction of genetic programming (GP). Our method was evaluated by two types of model data sets. First, the intentionally buried property motifs were searched in the artificially derived peptide data containing intentionally buried property motifs. As a result, the expected property motifs were correctly extracted by our algorithm. Second, the peptide data that interact with MHC class II molecules were analyzed as one of the models of biologically active peptides with buried motifs in various lengths. Twofold MHC class II binding peptides were identified with the rule using our method, compared to the existing scoring matrix method. In conclusion, our GP based motif searching approach enabled to obtain knowledge of functional aspects of the peptides without any prior knowledge.
PISMA: A Visual Representation of Motif Distribution in DNA Sequences

PubMed Central

Alcántara-Silva, Rogelio; Alvarado-Hermida, Moisés; Díaz-Contreras, Gibrán; Sánchez-Barrios, Martha; Carrera, Samantha; Galván, Silvia Carolina

2017-01-01

Background: Because the graphical presentation and analysis of motif distribution can provide insights for experimental hypothesis, PISMA aims at identifying motifs on DNA sequences, counting and showing them graphically. The motif length ranges from 2 to 10 bases, and the DNA sequences range up to 10 kb. The motif distribution is shown as a bar-code–like, as a gene-map–like, and as a transcript scheme. Results: We obtained graphical schemes of the CpG site distribution from 91 human papillomavirus genomes. Also, we present 2 analyses: one of DNA motifs associated with either methylation-resistant or methylation-sensitive CpG islands and another analysis of motifs associated with exosome RNA secretion. Availability and Implementation: PISMA is developed in Java; it is executable in any type of hardware and in diverse operating systems. PISMA is freely available to noncommercial users. The English version and the User Manual are provided in Supplementary Files 1 and 2, and a Spanish version is available at www.biomedicas.unam.mx/wp-content/software/pisma.zip and www.biomedicas.unam.mx/wp-content/pdf/manual/pisma.pdf. PMID:28469418
The BaMM web server for de-novo motif discovery and regulatory sequence analysis.

PubMed

Kiesel, Anja; Roth, Christian; Ge, Wanwan; Wess, Maximilian; Meier, Markus; Söding, Johannes

2018-05-28

The BaMM web server offers four tools: (i) de-novo discovery of enriched motifs in a set of nucleotide sequences, (ii) scanning a set of nucleotide sequences with motifs to find motif occurrences, (iii) searching with an input motif for similar motifs in our BaMM database with motifs for >1000 transcription factors, trained from the GTRD ChIP-seq database and (iv) browsing and keyword searching the motif database. In contrast to most other servers, we represent sequence motifs not by position weight matrices (PWMs) but by Bayesian Markov Models (BaMMs) of order 4, which we showed previously to perform substantially better in ROC analyses than PWMs or first order models. To address the inadequacy of P- and E-values as measures of motif quality, we introduce the AvRec score, the average recall over the TP-to-FP ratio between 1 and 100. The BaMM server is freely accessible without registration at https://bammmotif.mpibpc.mpg.de.
CompariMotif: quick and easy comparisons of sequence motifs.

PubMed

Edwards, Richard J; Davey, Norman E; Shields, Denis C

2008-05-15

CompariMotif is a novel tool for making motif-motif comparisons, identifying and describing similarities between regular expression motifs. CompariMotif can identify a number of different relationships between motifs, including exact matches, variants of degenerate motifs and complex overlapping motifs. Motif relationships are scored using shared information content, allowing the best matches to be easily identified in large comparisons. Many input and search options are available, enabling a list of motifs to be compared to itself (to identify recurring motifs) or to datasets of known motifs. CompariMotif can be run online at http://bioware.ucd.ie/ and is freely available for academic use as a set of open source Python modules under a GNU General Public License from http://bioinformatics.ucd.ie/shields/software/comparimotif/
Physical-chemical property based sequence motifs and methods regarding same

DOEpatents

Braun, Werner [Friendswood, TX; Mathura, Venkatarajan S [Sarasota, FL; Schein, Catherine H [Friendswood, TX

2008-09-09

A data analysis system, program, and/or method, e.g., a data mining/data exploration method, using physical-chemical property motifs. For example, a sequence database may be searched for identifying segments thereof having physical-chemical properties similar to the physical-chemical property motifs.
Sequence Bundles: a novel method for visualising, discovering and exploring sequence motifs

PubMed Central

2014-01-01

Background We introduce Sequence Bundles--a novel data visualisation method for representing multiple sequence alignments (MSAs). We identify and address key limitations of the existing bioinformatics data visualisation methods (i.e. the Sequence Logo) by enabling Sequence Bundles to give salient visual expression to sequence motifs and other data features, which would otherwise remain hidden. Methods For the development of Sequence Bundles we employed research-led information design methodologies. Sequences are encoded as uninterrupted, semi-opaque lines plotted on a 2-dimensional reconfigurable grid. Each line represents a single sequence. The thickness and opacity of the stack at each residue in each position indicates the level of conservation and the lines' curved paths expose patterns in correlation and functionality. Several MSAs can be visualised in a composite image. The Sequence Bundles method is designed to favour a tangible, continuous and intuitive display of information. Results We have developed a software demonstration application for generating a Sequence Bundles visualisation of MSAs provided for the BioVis 2013 redesign contest. A subsequent exploration of the visualised line patterns allowed for the discovery of a number of interesting features in the dataset. Reported features include the extreme conservation of sequences displaying a specific residue and bifurcations of the consensus sequence. Conclusions Sequence Bundles is a novel method for visualisation of MSAs and the discovery of sequence motifs. It can aid in generating new insight and hypothesis making. Sequence Bundles is well disposed for future implementation as an interactive visual analytics software, which can complement existing visualisation tools. PMID:25237395
The CGTCA sequence motif is essential for biological activity of the vasoactive intestinal peptide gene cAMP-regulated enhancer.

PubMed Central

Fink, J S; Verhave, M; Kasper, S; Tsukada, T; Mandel, G; Goodman, R H

1988-01-01

cAMP-regulated transcription of the human vasoactive intestinal peptide gene is dependent upon a 17-base-pair DNA element located 70 base pairs upstream from the transcriptional initiation site. This element is similar to sequences in other genes known to be regulated by cAMP and to sequences in several viral enhancers. We have demonstrated that the vasoactive intestinal peptide regulatory element is an enhancer that depends upon the integrity of two CGTCA sequence motifs for biological activity. Mutations in either of the CGTCA motifs diminish the ability of the element to respond to cAMP. Enhancers containing the CGTCA motif from the somatostatin and adenovirus genes compete for binding of nuclear proteins from C6 glioma and PC12 cells to the vasoactive intestinal peptide enhancer, suggesting that CGTCA-containing enhancers interact with similar transacting factors. Images PMID:2842787
Statistical Methods for Identifying Sequence Motifs Affecting Point Mutations

PubMed Central

Zhu, Yicheng; Neeman, Teresa; Yap, Von Bing; Huttley, Gavin A.

2017-01-01

Mutation processes differ between types of point mutation, genomic locations, cells, and biological species. For some point mutations, specific neighboring bases are known to be mechanistically influential. Beyond these cases, numerous questions remain unresolved, including: what are the sequence motifs that affect point mutations? How large are the motifs? Are they strand symmetric? And, do they vary between samples? We present new log-linear models that allow explicit examination of these questions, along with sequence logo style visualization to enable identifying specific motifs. We demonstrate the performance of these methods by analyzing mutation processes in human germline and malignant melanoma. We recapitulate the known CpG effect, and identify novel motifs, including a highly significant motif associated with A→G mutations. We show that major effects of neighbors on germline mutation lie within ±2 of the mutating base. Models are also presented for contrasting the entire mutation spectra (the distribution of the different point mutations). We show the spectra vary significantly between autosomes and X-chromosome, with a difference in T→C transition dominating. Analyses of malignant melanoma confirmed reported characteristic features of this cancer, including statistically significant strand asymmetry, and markedly different neighboring influences. The methods we present are made freely available as a Python library https://bitbucket.org/pycogent3/mutationmotif. PMID:27974498
Simultaneously learning DNA motif along with its position and sequence rank preferences through expectation maximization algorithm.

PubMed

Zhang, ZhiZhuo; Chang, Cheng Wei; Hugo, Willy; Cheung, Edwin; Sung, Wing-Kin

2013-03-01

Although de novo motifs can be discovered through mining over-represented sequence patterns, this approach misses some real motifs and generates many false positives. To improve accuracy, one solution is to consider some additional binding features (i.e., position preference and sequence rank preference). This information is usually required from the user. This article presents a de novo motif discovery algorithm called SEME (sampling with expectation maximization for motif elicitation), which uses pure probabilistic mixture model to model the motif's binding features and uses expectation maximization (EM) algorithms to simultaneously learn the sequence motif, position, and sequence rank preferences without asking for any prior knowledge from the user. SEME is both efficient and accurate thanks to two important techniques: the variable motif length extension and importance sampling. Using 75 large-scale synthetic datasets, 32 metazoan compendium benchmark datasets, and 164 chromatin immunoprecipitation sequencing (ChIP-Seq) libraries, we demonstrated the superior performance of SEME over existing programs in finding transcription factor (TF) binding sites. SEME is further applied to a more difficult problem of finding the co-regulated TF (coTF) motifs in 15 ChIP-Seq libraries. It identified significantly more correct coTF motifs and, at the same time, predicted coTF motifs with better matching to the known motifs. Finally, we show that the learned position and sequence rank preferences of each coTF reveals potential interaction mechanisms between the primary TF and the coTF within these sites. Some of these findings were further validated by the ChIP-Seq experiments of the coTFs. The application is available online.
MotifMark: Finding regulatory motifs in DNA sequences.

PubMed

Hassanzadeh, Hamid Reza; Kolhe, Pushkar; Isbell, Charles L; Wang, May D

2017-07-01

The interaction between proteins and DNA is a key driving force in a significant number of biological processes such as transcriptional regulation, repair, recombination, splicing, and DNA modification. The identification of DNA-binding sites and the specificity of target proteins in binding to these regions are two important steps in understanding the mechanisms of these biological activities. A number of high-throughput technologies have recently emerged that try to quantify the affinity between proteins and DNA motifs. Despite their success, these technologies have their own limitations and fall short in precise characterization of motifs, and as a result, require further downstream analysis to extract useful and interpretable information from a haystack of noisy and inaccurate data. Here we propose MotifMark, a new algorithm based on graph theory and machine learning, that can find binding sites on candidate probes and rank their specificity in regard to the underlying transcription factor. We developed a pipeline to analyze experimental data derived from compact universal protein binding microarrays and benchmarked it against two of the most accurate motif search methods. Our results indicate that MotifMark can be a viable alternative technique for prediction of motif from protein binding microarrays and possibly other related high-throughput techniques.
BayesMotif: de novo protein sorting motif discovery from impure datasets.

PubMed

Hu, Jianjun; Zhang, Fan

2010-01-18

Protein sorting is the process that newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals are amino acid sub-sequences usually located at the N-terminals or C-terminals of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals is needed to improve the understanding of protein sorting mechanisms. We formulated the protein sorting motif discovery problem as a classification problem and proposed a Bayesian classifier based algorithm (BayesMotif) for de novo identification of a common type of protein sorting motifs in which a highly conserved anchor is present along with a less conserved motif regions. A false positive removal procedure is developed to iteratively remove sequences that are unlikely to contain true motifs so that the algorithm can identify motifs from impure input sequences. Experiments on both implanted motif datasets and real-world datasets showed that the enhanced BayesMotif algorithm can identify anchored sorting motifs from pure or impure protein sequence dataset. It also shows that the false positive removal procedure can help to identify true motifs even when there is only 20% of the input sequences containing true motif instances. We proposed BayesMotif, a novel Bayesian classification based algorithm for de novo discovery of a special category of anchored protein sorting motifs from impure datasets. Compared to conventional motif discovery algorithms such as MEME, our algorithm can find less-conserved motifs with short highly conserved anchors. Our algorithm also has the advantage of easy incorporation of additional meta-sequence features such as hydrophobicity or charge of the motifs which may help to overcome the limitations of
Identification of sequence-structure RNA binding motifs for SELEX-derived aptamers.

PubMed

Hoinka, Jan; Zotenko, Elena; Friedman, Adam; Sauna, Zuben E; Przytycka, Teresa M

2012-06-15

Systematic Evolution of Ligands by EXponential Enrichment (SELEX) represents a state-of-the-art technology to isolate single-stranded (ribo)nucleic acid fragments, named aptamers, which bind to a molecule (or molecules) of interest via specific structural regions induced by their sequence-dependent fold. This powerful method has applications in designing protein inhibitors, molecular detection systems, therapeutic drugs and antibody replacement among others. However, full understanding and consequently optimal utilization of the process has lagged behind its wide application due to the lack of dedicated computational approaches. At the same time, the combination of SELEX with novel sequencing technologies is beginning to provide the data that will allow the examination of a variety of properties of the selection process. To close this gap we developed, Aptamotif, a computational method for the identification of sequence-structure motifs in SELEX-derived aptamers. To increase the chances of identifying functional motifs, Aptamotif uses an ensemble-based approach. We validated the method using two published aptamer datasets containing experimentally determined motifs of increasing complexity. We were able to recreate the author's findings to a high degree, thus proving the capability of our approach to identify binding motifs in SELEX data. Additionally, using our new experimental dataset, we illustrate the application of Aptamotif to elucidate several properties of the selection process.
Exploitation of peptide motif sequences and their use in nanobiotechnology.

PubMed

Shiba, Kiyotaka

2010-08-01

Short amino acid sequences extracted from natural proteins or created using in vitro evolution systems are sometimes associated with particular biological functions. These peptides, called peptide motifs, can serve as functional units for the creation of various tools for nanobiotechnology. In particular, peptide motifs that have the ability to specifically recognize the surfaces of solid materials and to mineralize certain inorganic materials have been linking biological science to material science. Here, I review how these peptide motifs have been isolated from natural proteins or created using in vitro evolution systems, and how they have been used in the nanobiotechnology field. Copyright © 2010 Elsevier Ltd. All rights reserved.
Recurring sequence-structure motifs in (βα)8-barrel proteins and experimental optimization of a chimeric protein designed based on such motifs.

PubMed

Wang, Jichao; Zhang, Tongchuan; Liu, Ruicun; Song, Meilin; Wang, Juncheng; Hong, Jiong; Chen, Quan; Liu, Haiyan

2017-02-01

An interesting way of generating novel artificial proteins is to combine sequence motifs from natural proteins, mimicking the evolutionary path suggested by natural proteins comprising recurring motifs. We analyzed the βα and αβ modules of TIM barrel proteins by structure alignment-based sequence clustering. A number of preferred motifs were identified. A chimeric TIM was designed by using recurring elements as mutually compatible interfaces. The foldability of the designed TIM protein was then significantly improved by six rounds of directed evolution. The melting temperature has been improved by more than 20°C. A variety of characteristics suggested that the resulting protein is well-folded. Our analysis provided a library of peptide motifs that is potentially useful for different protein engineering studies. The protein engineering strategy of using recurring motifs as interfaces to connect partial natural proteins may be applied to other protein folds. Copyright © 2016 Elsevier B.V. All rights reserved.

A private DNA motif finding algorithm.

PubMed

Chen, Rui; Peng, Yun; Choi, Byron; Xu, Jianliang; Hu, Haibo

2014-08-01

With the increasing availability of genomic sequence data, numerous methods have been proposed for finding DNA motifs. The discovery of DNA motifs serves a critical step in many biological applications. However, the privacy implication of DNA analysis is normally neglected in the existing methods. In this work, we propose a private DNA motif finding algorithm in which a DNA owner's privacy is protected by a rigorous privacy model, known as ∊-differential privacy. It provides provable privacy guarantees that are independent of adversaries' background knowledge. Our algorithm makes use of the n-gram model and is optimized for processing large-scale DNA sequences. We evaluate the performance of our algorithm over real-life genomic data and demonstrate the promise of integrating privacy into DNA motif finding. Copyright © 2014 Elsevier Inc. All rights reserved.
Identification of sequence motifs in oligonucleotides whose presence is correlated with antisense activity

PubMed Central

Matveeva, O. V.; Tsodikov, A. D.; Giddings, M.; Freier, S. M.; Wyatt, J. R.; Spiridonov, A. N.; Shabalina, S. A.; Gesteland, R. F.; Atkins, J. F.

2000-01-01

Design of antisense oligonucleotides targeting any mRNA can be much more efficient when several activity-enhancing motifs are included and activity-decreasing motifs are avoided. This conclusion was made after statistical analysis of data collected from >1000 experiments with phosphorothioate-modified oligonucleotides. Highly significant positive correlation between the presence of motifs CCAC, TCCC, ACTC, GCCA and CTCT in the oligonucleotide and its antisense efficiency was demonstrated. In addition, negative correlation was revealed for the motifs GGGG, ACTG, AAA and TAA. It was found that the likelihood of activity of an oligonucleotide against a desired mRNA target is sequence motif content dependent. PMID:10908347
DEEP MOTIF DASHBOARD: VISUALIZING AND UNDERSTANDING GENOMIC SEQUENCES USING DEEP NEURAL NETWORKS.

PubMed

Lanchantin, Jack; Singh, Ritambhara; Wang, Beilun; Qi, Yanjun

2017-01-01

Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding (TFBS) site classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence's saliency map which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs as well as dependencies among them.
Defining a Conformational Consensus Motif in Cotransin-Sensitive Signal Sequences: A Proteomic and Site-Directed Mutagenesis Study

PubMed Central

Klein, Wolfgang; Westendorf, Carolin; Schmidt, Antje; Conill-Cortés, Mercè; Rutz, Claudia; Blohs, Marcus; Beyermann, Michael; Protze, Jonas; Krause, Gerd; Krause, Eberhard; Schülein, Ralf

2015-01-01

The cyclodepsipeptide cotransin was described to inhibit the biosynthesis of a small subset of proteins by a signal sequence-discriminatory mechanism at the Sec61 protein-conducting channel. However, it was not clear how selective cotransin is, i.e. how many proteins are sensitive. Moreover, a consensus motif in signal sequences mediating cotransin sensitivity has yet not been described. To address these questions, we performed a proteomic study using cotransin-treated human hepatocellular carcinoma cells and the stable isotope labelling by amino acids in cell culture technique in combination with quantitative mass spectrometry. We used a saturating concentration of cotransin (30 micromolar) to identify also less-sensitive proteins and to discriminate the latter from completely resistant proteins. We found that the biosynthesis of almost all secreted proteins was cotransin-sensitive under these conditions. In contrast, biosynthesis of the majority of the integral membrane proteins was cotransin-resistant. Cotransin sensitivity of signal sequences was neither related to their length nor to their hydrophobicity. Instead, in the case of signal anchor sequences, we identified for the first time a conformational consensus motif mediating cotransin sensitivity. PMID:25806945
Composite Structural Motifs of Binding Sites for Delineating Biological Functions of Proteins

PubMed Central

Kinjo, Akira R.; Nakamura, Haruki

2012-01-01

Most biological processes are described as a series of interactions between proteins and other molecules, and interactions are in turn described in terms of atomic structures. To annotate protein functions as sets of interaction states at atomic resolution, and thereby to better understand the relation between protein interactions and biological functions, we conducted exhaustive all-against-all atomic structure comparisons of all known binding sites for ligands including small molecules, proteins and nucleic acids, and identified recurring elementary motifs. By integrating the elementary motifs associated with each subunit, we defined composite motifs that represent context-dependent combinations of elementary motifs. It is demonstrated that function similarity can be better inferred from composite motif similarity compared to the similarity of protein sequences or of individual binding sites. By integrating the composite motifs associated with each protein function, we define meta-composite motifs each of which is regarded as a time-independent diagrammatic representation of a biological process. It is shown that meta-composite motifs provide richer annotations of biological processes than sequence clusters. The present results serve as a basis for bridging atomic structures to higher-order biological phenomena by classification and integration of binding site structures. PMID:22347478
Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks

PubMed Central

Lanchantin, Jack; Singh, Ritambhara; Wang, Beilun; Qi, Yanjun

2018-01-01

Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding (TFBS) site classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence’s saliency map which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs as well as dependencies among them. PMID:27896980
RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach.

PubMed

Pan, Xiaoyong; Shen, Hong-Bin

2017-02-28

RNAs play key roles in cells through the interactions with proteins known as the RNA-binding proteins (RBP) and their binding motifs enable crucial understanding of the post-transcriptional regulation of RNAs. How the RBPs correctly recognize the target RNAs and why they bind specific positions is still far from clear. Machine learning-based algorithms are widely acknowledged to be capable of speeding up this process. Although many automatic tools have been developed to predict the RNA-protein binding sites from the rapidly growing multi-resource data, e.g. sequence, structure, their domain specific features and formats have posed significant computational challenges. One of current difficulties is that the cross-source shared common knowledge is at a higher abstraction level beyond the observed data, resulting in a low efficiency of direct integration of observed data across domains. The other difficulty is how to interpret the prediction results. Existing approaches tend to terminate after outputting the potential discrete binding sites on the sequences, but how to assemble them into the meaningful binding motifs is a topic worth of further investigation. In viewing of these challenges, we propose a deep learning-based framework (iDeep) by using a novel hybrid convolutional neural network and deep belief network to predict the RBP interaction sites and motifs on RNAs. This new protocol is featured by transforming the original observed data into a high-level abstraction feature space using multiple layers of learning blocks, where the shared representations across different domains are integrated. To validate our iDeep method, we performed experiments on 31 large-scale CLIP-seq datasets, and our results show that by integrating multiple sources of data, the average AUC can be improved by 8% compared to the best single-source-based predictor; and through cross-domain knowledge integration at an abstraction level, it outperforms the state-of-the-art predictors by 6
[Screening specific recognition motif of RNA-binding proteins by SELEX in combination with next-generation sequencing technique].

PubMed

Zhang, Lu; Xu, Jinhao; Ma, Jinbiao

2016-07-25

RNA-binding protein exerts important biological function by specifically recognizing RNA motif. SELEX (Systematic evolution of ligands by exponential enrichment), an in vitro selection method, can obtain consensus motif with high-affinity and specificity for many target molecules from DNA or RNA libraries. Here, we combined SELEX with next-generation sequencing to study the protein-RNA interaction in vitro. A pool of RNAs with 20 bp random sequences were transcribed by T7 promoter, and target protein was inserted into plasmid containing SBP-tag, which can be captured by streptavidin beads. Through only one cycle, the specific RNA motif can be obtained, which dramatically improved the selection efficiency. Using this method, we found that human hnRNP A1 RRMs domain (UP1 domain) bound RNA motifs containing AGG and AG sequences. The EMSA experiment indicated that hnRNP A1 RRMs could bind the obtained RNA motif. Taken together, this method provides a rapid and effective method to study the RNA binding specificity of proteins.
Gibbs motif sampling: detection of bacterial outer membrane protein repeats.

PubMed Central

Neuwald, A. F.; Liu, J. S.; Lawrence, C. E.

1995-01-01

The detection and alignment of locally conserved regions (motifs) in multiple sequences can provide insight into protein structure, function, and evolution. A new Gibbs sampling algorithm is described that detects motif-encoding regions in sequences and optimally partitions them into distinct motif models; this is illustrated using a set of immunoglobulin fold proteins. When applied to sequences sharing a single motif, the sampler can be used to classify motif regions into related submodels, as is illustrated using helix-turn-helix DNA-binding proteins. Other statistically based procedures are described for searching a database for sequences matching motifs found by the sampler. When applied to a set of 32 very distantly related bacterial integral outer membrane proteins, the sampler revealed that they share a subtle, repetitive motif. Although BLAST (Altschul SF et al., 1990, J Mol Biol 215:403-410) fails to detect significant pairwise similarity between any of the sequences, the repeats present in these outer membrane proteins, taken as a whole, are highly significant (based on a generally applicable statistical test for motifs described here). Analysis of bacterial porins with known trimeric beta-barrel structure and related proteins reveals a similar repetitive motif corresponding to alternating membrane-spanning beta-strands. These beta-strands occur on the membrane interface (as opposed to the trimeric interface) of the beta-barrel. The broad conservation and structural location of these repeats suggests that they play important functional roles. PMID:8520488
MicroRNA categorization using sequence motifs and k-mers.

PubMed

Yousef, Malik; Khalifa, Waleed; Acar, İlhan Erkin; Allmer, Jens

2017-03-14

Post-transcriptional gene dysregulation can be a hallmark of diseases like cancer and microRNAs (miRNAs) play a key role in the modulation of translation efficiency. Known pre-miRNAs are listed in miRBase, and they have been discovered in a variety of organisms ranging from viruses and microbes to eukaryotic organisms. The computational detection of pre-miRNAs is of great interest, and such approaches usually employ machine learning to discriminate between miRNAs and other sequences. Many features have been proposed describing pre-miRNAs, and we have previously introduced the use of sequence motifs and k-mers as useful ones. There have been reports of xeno-miRNAs detected via next generation sequencing. However, they may be contaminations and to aid that important decision-making process, we aimed to establish a means to differentiate pre-miRNAs from different species. To achieve distinction into species, we used one species' pre-miRNAs as the positive and another species' pre-miRNAs as the negative training and test data for the establishment of machine learned models based on sequence motifs and k-mers as features. This approach resulted in higher accuracy values between distantly related species while species with closer relation produced lower accuracy values. We were able to differentiate among species with increasing success when the evolutionary distance increases. This conclusion is supported by previous reports of fast evolutionary changes in miRNAs since even in relatively closely related species a fairly good discrimination was possible.
Distance-dependent duplex DNA destabilization proximal to G-quadruplex/i-motif sequences

PubMed Central

König, Sebastian L. B.; Huppert, Julian L.; Sigel, Roland K. O.; Evans, Amanda C.

2013-01-01

G-quadruplexes and i-motifs are complementary examples of non-canonical nucleic acid substructure conformations. G-quadruplex thermodynamic stability has been extensively studied for a variety of base sequences, but the degree of duplex destabilization that adjacent quadruplex structure formation can cause has yet to be fully addressed. Stable in vivo formation of these alternative nucleic acid structures is likely to be highly dependent on whether sufficient spacing exists between neighbouring duplex- and quadruplex-/i-motif-forming regions to accommodate quadruplexes or i-motifs without disrupting duplex stability. Prediction of putative G-quadruplex-forming regions is likely to be assisted by further understanding of what distance (number of base pairs) is required for duplexes to remain stable as quadruplexes or i-motifs form. Using oligonucleotide constructs derived from precedented G-quadruplexes and i-motif-forming bcl-2 P1 promoter region, initial biophysical stability studies indicate that the formation of G-quadruplex and i-motif conformations do destabilize proximal duplex regions. The undermining effect that quadruplex formation can have on duplex stability is mitigated with increased distance from the duplex region: a spacing of five base pairs or more is sufficient to maintain duplex stability proximal to predicted quadruplex/i-motif-forming regions. PMID:23771141
Discovery of T Cell Receptor β Motifs Specific to HLA-B27-Positive Ankylosing Spondylitis by Deep Repertoire Sequence Analysis.

PubMed

Faham, Malek; Carlton, Victoria; Moorhead, Martin; Zheng, Jianbiao; Klinger, Mark; Pepin, Francois; Asbury, Thomas; Vignali, Marissa; Emerson, Ryan O; Robins, Harlan S; Ireland, James; Baechler-Gillespie, Emily; Inman, Robert D

2017-04-01

Ankylosing spondylitis (AS), a chronic inflammatory disorder, has a notable association with HLA-B27. One hypothesis suggests that a common antigen that binds to HLA-B27 is important for AS disease pathogenesis. This study was undertaken to determine sequences and motifs that are shared among HLA-B27-positive AS patients, using T cell repertoire next-generation sequencing. To identify motifs enriched among B27-positive AS patients, we performed T cell receptor β (TCRβ) repertoire sequencing on samples from 191 B27-positive AS patients, 43 B27-negative AS patients, and 227 controls, and we obtained >77 million TCRβ clonotype sequences. First, we assessed whether any of 50 previously published sequences were enriched in B27-positive AS patients. We then used training and test cohorts to identify discovered motifs that were enriched in B27-positive AS patients versus controls. Six previously published and 11 discovered motifs were enriched in the B27-positive AS samples as compared to controls. After combining motifs related by sequence, we identified a total of 15 independent motifs. Both the full set of 15 motifs and a set of 6 published motifs were enriched in the B27-positive AS patients as compared to B27-positive healthy individuals (P = 0.049 and P = 0.001, respectively). Using an independent cohort, we validated that at least some of these motifs were associated with AS, and not simply with B27-positive status. We identified TCRβ motifs that are enriched in B27-positive AS patients as compared to B27-positive healthy controls. This suggests that a common antigen, presented by HLA-B27 and detected by CD8+ T cells, may be associated with AS disease pathogenesis. © 2016, American College of Rheumatology.
SMARTIV: combined sequence and structure de-novo motif discovery for in-vivo RNA binding data.

PubMed

Polishchuk, Maya; Paz, Inbal; Yakhini, Zohar; Mandel-Gutfreund, Yael

2018-05-25

Gene expression regulation is highly dependent on binding of RNA-binding proteins (RBPs) to their RNA targets. Growing evidence supports the notion that both RNA primary sequence and its local secondary structure play a role in specific Protein-RNA recognition and binding. Despite the great advance in high-throughput experimental methods for identifying sequence targets of RBPs, predicting the specific sequence and structure binding preferences of RBPs remains a major challenge. We present a novel webserver, SMARTIV, designed for discovering and visualizing combined RNA sequence and structure motifs from high-throughput RNA-binding data, generated from in-vivo experiments. The uniqueness of SMARTIV is that it predicts motifs from enriched k-mers that combine information from ranked RNA sequences and their predicted secondary structure, obtained using various folding methods. Consequently, SMARTIV generates Position Weight Matrices (PWMs) in a combined sequence and structure alphabet with assigned P-values. SMARTIV concisely represents the sequence and structure motif content as a single graphical logo, which is informative and easy for visual perception. SMARTIV was examined extensively on a variety of high-throughput binding experiments for RBPs from different families, generated from different technologies, showing consistent and accurate results. Finally, SMARTIV is a user-friendly webserver, highly efficient in run-time and freely accessible via http://smartiv.technion.ac.il/.
Triazine-based sequence-defined polymers with side-chain diversity and backbone-backbone interaction motifs

DOE PAGES

Grate, Jay W.; Mo, Kai -For; Daily, Michael D.

2016-02-10

Sequence control in polymers, well-known in nature, encodes structure and functionality. Here we introduce a new architecture, based on the nucleophilic aromatic substitution chemistry of cyanuric chloride, that creates a new class of sequence-defined polymers dubbed TZPs. Proof of concept is demonstrated with two synthesized hexamers, having neutral and ionizable side chains. Molecular dynamics simulations show backbone–backbone interactions, including H-bonding motifs and pi–pi interactions. This architecture is arguably biomimetic while differing from sequence-defined polymers having peptide bonds. In conclusion, the synthetic methodology supports the structural diversity of side chains known in peptides, as well as backbone–backbone hydrogen-bonding motifs, and willmore » thus enable new macromolecules and materials with useful functions.« less
Triazine-Based Sequence-Defined Polymers with Side-Chain Diversity and Backbone-Backbone Interaction Motifs.

PubMed

Grate, Jay W; Mo, Kai-For; Daily, Michael D

2016-03-14

Sequence control in polymers, well-known in nature, encodes structure and functionality. Here we introduce a new architecture, based on the nucleophilic aromatic substitution chemistry of cyanuric chloride, that creates a new class of sequence-defined polymers dubbed TZPs. Proof of concept is demonstrated with two synthesized hexamers, having neutral and ionizable side chains. Molecular dynamics simulations show backbone-backbone interactions, including H-bonding motifs and pi-pi interactions. This architecture is arguably biomimetic while differing from sequence-defined polymers having peptide bonds. The synthetic methodology supports the structural diversity of side chains known in peptides, as well as backbone-backbone hydrogen-bonding motifs, and will thus enable new macromolecules and materials with useful functions. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Triazine-based sequence-defined polymers with side-chain diversity and backbone-backbone interaction motifs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grate, Jay W.; Mo, Kai -For; Daily, Michael D.

Sequence control in polymers, well-known in nature, encodes structure and functionality. Here we introduce a new architecture, based on the nucleophilic aromatic substitution chemistry of cyanuric chloride, that creates a new class of sequence-defined polymers dubbed TZPs. Proof of concept is demonstrated with two synthesized hexamers, having neutral and ionizable side chains. Molecular dynamics simulations show backbone–backbone interactions, including H-bonding motifs and pi–pi interactions. This architecture is arguably biomimetic while differing from sequence-defined polymers having peptide bonds. In conclusion, the synthetic methodology supports the structural diversity of side chains known in peptides, as well as backbone–backbone hydrogen-bonding motifs, and willmore » thus enable new macromolecules and materials with useful functions.« less
An effective approach for annotation of protein families with low sequence similarity and conserved motifs: identifying GDSL hydrolases across the plant kingdom.

PubMed

Vujaklija, Ivan; Bielen, Ana; Paradžik, Tina; Biđin, Siniša; Goldstein, Pavle; Vujaklija, Dušica

2016-02-18

The massive accumulation of protein sequences arising from the rapid development of high-throughput sequencing, coupled with automatic annotation, results in high levels of incorrect annotations. In this study, we describe an approach to decrease annotation errors of protein families characterized by low overall sequence similarity. The GDSL lipolytic family comprises proteins with multifunctional properties and high potential for pharmaceutical and industrial applications. The number of proteins assigned to this family has increased rapidly over the last few years. In particular, the natural abundance of GDSL enzymes reported recently in plants indicates that they could be a good source of novel GDSL enzymes. We noticed that a significant proportion of annotated sequences lack specific GDSL motif(s) or catalytic residue(s). Here, we applied motif-based sequence analyses to identify enzymes possessing conserved GDSL motifs in selected proteomes across the plant kingdom. Motif-based HMM scanning (Viterbi decoding-VD and posterior decoding-PD) and the here described PD/VD protocol were successfully applied on 12 selected plant proteomes to identify sequences with GDSL motifs. A significant number of identified GDSL sequences were novel. Moreover, our scanning approach successfully detected protein sequences lacking at least one of the essential motifs (171/820) annotated by Pfam profile search (PfamA) as GDSL. Based on these analyses we provide a curated list of GDSL enzymes from the selected plants. CLANS clustering and phylogenetic analysis helped us to gain a better insight into the evolutionary relationship of all identified GDSL sequences. Three novel GDSL subfamilies as well as unreported variations in GDSL motifs were discovered in this study. In addition, analyses of selected proteomes showed a remarkable expansion of GDSL enzymes in the lycophyte, Selaginella moellendorffii. Finally, we provide a general motif-HMM scanner which is easily accessible through
SamSelect: a sample sequence selection algorithm for quorum planted motif search on large DNA datasets.

PubMed

Yu, Qiang; Wei, Dingbang; Huo, Hongwei

2018-06-18

Given a set of t n-length DNA sequences, q satisfying 0 < q ≤ 1, and l and d satisfying 0 ≤ d < l < n, the quorum planted motif search (qPMS) finds l-length strings that occur in at least qt input sequences with up to d mismatches and is mainly used to locate transcription factor binding sites in DNA sequences. Existing qPMS algorithms have been able to efficiently process small standard datasets (e.g., t = 20 and n = 600), but they are too time consuming to process large DNA datasets, such as ChIP-seq datasets that contain thousands of sequences or more. We analyze the effects of t and q on the time performance of qPMS algorithms and find that a large t or a small q causes a longer computation time. Based on this information, we improve the time performance of existing qPMS algorithms by selecting a sample sequence set D' with a small t and a large q from the large input dataset D and then executing qPMS algorithms on D'. A sample sequence selection algorithm named SamSelect is proposed. The experimental results on both simulated and real data show (1) that SamSelect can select D' efficiently and (2) that the qPMS algorithms executed on D' can find implanted or real motifs in a significantly shorter time than when executed on D. We improve the ability of existing qPMS algorithms to process large DNA datasets from the perspective of selecting high-quality sample sequence sets so that the qPMS algorithms can find motifs in a short time in the selected sample sequence set D', rather than take an unfeasibly long time to search the original sequence set D. Our motif discovery method is an approximate algorithm.
TrawlerWeb: an online de novo motif discovery tool for next-generation sequencing datasets.

PubMed

Dang, Louis T; Tondl, Markus; Chiu, Man Ho H; Revote, Jerico; Paten, Benedict; Tano, Vincent; Tokolyi, Alex; Besse, Florence; Quaife-Ryan, Greg; Cumming, Helen; Drvodelic, Mark J; Eichenlaub, Michael P; Hallab, Jeannette C; Stolper, Julian S; Rossello, Fernando J; Bogoyevitch, Marie A; Jans, David A; Nim, Hieu T; Porrello, Enzo R; Hudson, James E; Ramialison, Mirana

2018-04-05

A strong focus of the post-genomic era is mining of the non-coding regulatory genome in order to unravel the function of regulatory elements that coordinate gene expression (Nat 489:57-74, 2012; Nat 507:462-70, 2014; Nat 507:455-61, 2014; Nat 518:317-30, 2015). Whole-genome approaches based on next-generation sequencing (NGS) have provided insight into the genomic location of regulatory elements throughout different cell types, organs and organisms. These technologies are now widespread and commonly used in laboratories from various fields of research. This highlights the need for fast and user-friendly software tools dedicated to extracting cis-regulatory information contained in these regulatory regions; for instance transcription factor binding site (TFBS) composition. Ideally, such tools should not require prior programming knowledge to ensure they are accessible for all users. We present TrawlerWeb, a web-based version of the Trawler_standalone tool (Nat Methods 4:563-5, 2007; Nat Protoc 5:323-34, 2010), to allow for the identification of enriched motifs in DNA sequences obtained from next-generation sequencing experiments in order to predict their TFBS composition. TrawlerWeb is designed for online queries with standard options common to web-based motif discovery tools. In addition, TrawlerWeb provides three unique new features: 1) TrawlerWeb allows the input of BED files directly generated from NGS experiments, 2) it automatically generates an input-matched biologically relevant background, and 3) it displays resulting conservation scores for each instance of the motif found in the input sequences, which assists the researcher in prioritising the motifs to validate experimentally. Finally, to date, this web-based version of Trawler_standalone remains the fastest online de novo motif discovery tool compared to other popular web-based software, while generating predictions with high accuracy. TrawlerWeb provides users with a fast, simple and easy-to-use web
Hairpin structures with conserved sequence motifs determine the 3' ends of non-polyadenylated invertebrate iridovirus transcripts.

PubMed

İnce, İkbal Agah; Pijlman, Gorben P; Vlak, Just M; van Oers, Monique M

2017-11-01

Previously, we observed that the transcripts of Invertebrate iridescent virus 6 (IIV6) are not polyadenylated, in line with the absence of canonical poly(A) motifs (AATAAA) downstream of the open reading frames (ORFs) in the genome. Here, we determined the 3' ends of the transcripts of fifty-four IIV6 virion protein genes in infected Drosophila Schneider 2 (S2) cells. By using ligation-based amplification of cDNA ends (LACE) it was shown that the IIV6 mRNAs often ended with a CAUUA motif. In silico analysis showed that the 3'-untranslated regions of IIV6 genes have the ability to form hairpin structures (22-56 nt in length) and that for about half of all IIV6 genes these 3' sequences contained complementary TAATG and CATTA motifs. We also show that a hairpin in the 3' flanking region with conserved sequence motifs is a conserved feature in invertebrate-infecting iridoviruses (genus Iridovirus and Chloriridovirus). Copyright © 2017 Elsevier Inc. All rights reserved.

Motif finding in DNA sequences based on skipping nonconserved positions in background Markov chains.

PubMed

Zhao, Xiaoyan; Sze, Sing-Hoi

2011-05-01

One strategy to identify transcription factor binding sites is through motif finding in upstream DNA sequences of potentially co-regulated genes. Despite extensive efforts, none of the existing algorithms perform very well. We consider a string representation that allows arbitrary ignored positions within the nonconserved portion of single motifs, and use O(2(l)) Markov chains to model the background distributions of motifs of length l while skipping these positions within each Markov chain. By focusing initially on positions that have fixed nucleotides to define core occurrences, we develop an algorithm to identify motifs of moderate lengths. We compare the performance of our algorithm to other motif finding algorithms on a few benchmark data sets, and show that significant improvement in accuracy can be obtained when the sites are sufficiently conserved within a given sample, while comparable performance is obtained when the site conservation rate is low. A software program (PosMotif ) and detailed results are available online at http://faculty.cse.tamu.edu/shsze/posmotif.
Statistical tests to compare motif count exceptionalities

PubMed Central

Robin, Stéphane; Schbath, Sophie; Vandewalle, Vincent

2007-01-01

Background Finding over- or under-represented motifs in biological sequences is now a common task in genomics. Thanks to p-value calculation for motif counts, exceptional motifs are identified and represent candidate functional motifs. The present work addresses the related question of comparing the exceptionality of one motif in two different sequences. Just comparing the motif count p-values in each sequence is indeed not sufficient to decide if this motif is significantly more exceptional in one sequence compared to the other one. A statistical test is required. Results We develop and analyze two statistical tests, an exact binomial one and an asymptotic likelihood ratio test, to decide whether the exceptionality of a given motif is equivalent or significantly different in two sequences of interest. For that purpose, motif occurrences are modeled by Poisson processes, with a special care for overlapping motifs. Both tests can take the sequence compositions into account. As an illustration, we compare the octamer exceptionalities in the Escherichia coli K-12 backbone versus variable strain-specific loops. Conclusion The exact binomial test is particularly adapted for small counts. For large counts, we advise to use the likelihood ratio test which is asymptotic but strongly correlated with the exact binomial test and very simple to use. PMID:17346349
Rapid motif compliance scoring with match weight sets.

PubMed

Venezia, D; O'Hara, P J

1993-02-01

Most current implementations of motif matching in biological sequences have sacrificed the generality of weight matrix scoring for shorter runtimes. The program MOTIF incorporates a weight matrix and a rapid, backtracking tree-search algorithm to score motif compliance with greatly enhanced performance while placing no constraints on the motif. In addition, any positions within a motif can be marked as 'inviolate', thereby requiring an exact match. MOTIF allows a choice of regular expression formats and can use both motif and sequence libraries as either targets or queries. Nucleic acid sequences can optionally be translated by MOTIF in any frame(s) and used against peptide motifs.
Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium.

PubMed

Catania, Francesco; Lynch, Michael

2010-05-04

In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties to generate reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal genes expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes.
Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium

PubMed Central

2010-01-01

Background In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties to generate reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. Results By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Conclusions Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal genes expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes. PMID:20441586
The effect of orthology and coregulation on detecting regulatory motifs.

PubMed

Storms, Valerie; Claeys, Marleen; Sanchez, Aminael; De Moor, Bart; Verstuyf, Annemieke; Marchal, Kathleen

2010-02-03

Computational de novo discovery of transcription factor binding sites is still a challenging problem. The growing number of sequenced genomes allows integrating orthology evidence with coregulation information when searching for motifs. Moreover, the more advanced motif detection algorithms explicitly model the phylogenetic relatedness between the orthologous input sequences and thus should be well adapted towards using orthologous information. In this study, we evaluated the conditions under which complementing coregulation with orthologous information improves motif detection for the class of probabilistic motif detection algorithms with an explicit evolutionary model. We designed datasets (real and synthetic) covering different degrees of coregulation and orthologous information to test how well Phylogibbs and Phylogenetic sampler, as representatives of the motif detection algorithms with evolutionary model performed as compared to MEME, a more classical motif detection algorithm that treats orthologs independently. Under certain conditions detecting motifs in the combined coregulation-orthology space is indeed more efficient than using each space separately, but this is not always the case. Moreover, the difference in success rate between the advanced algorithms and MEME is still marginal. The success rate of motif detection depends on the complex interplay between the added information and the specificities of the applied algorithms. Insights in this relation provide information useful to both developers and users. All benchmark datasets are available at http://homes.esat.kuleuven.be/~kmarchal/Supplementary_Storms_Valerie_PlosONE.
A sequence-specific transcription activator motif and powerful synthetic variants that bind Mediator using a fuzzy protein interface.

PubMed

Warfield, Linda; Tuttle, Lisa M; Pacheco, Derek; Klevit, Rachel E; Hahn, Steven

2014-08-26

Although many transcription activators contact the same set of coactivator complexes, the mechanism and specificity of these interactions have been unclear. For example, do intrinsically disordered transcription activation domains (ADs) use sequence-specific motifs, or do ADs of seemingly different sequence have common properties that encode activation function? We find that the central activation domain (cAD) of the yeast activator Gcn4 functions through a short, conserved sequence-specific motif. Optimizing the residues surrounding this short motif by inserting additional hydrophobic residues creates very powerful ADs that bind the Mediator subunit Gal11/Med15 with high affinity via a "fuzzy" protein interface. In contrast to Gcn4, the activity of these synthetic ADs is not strongly dependent on any one residue of the AD, and this redundancy is similar to that of some natural ADs in which few if any sequence-specific residues have been identified. The additional hydrophobic residues in the synthetic ADs likely allow multiple faces of the AD helix to interact with the Gal11 activator-binding domain, effectively forming a fuzzier interface than that of the wild-type cAD.
Unique Structural Features and Sequence Motifs of Proline Utilization A (PutA)

PubMed Central

Singh, Ranjan K.; Tanner, John J.

2013-01-01

Proline utilization A proteins (PutAs) are bifunctional enzymes that catalyze the oxidation of proline to glutamate using spatially separated proline dehydrogenase and pyrroline-5-carboxylate dehydrogenase active sites. Here we use the crystal structure of the minimalist PutA from Bradyrhizobium japonicum (BjPutA) along with sequence analysis to identify unique structural features of PutAs. This analysis shows that PutAs have secondary structural elements and domains not found in the related monofunctional enzymes. Some of these extra features are predicted to be important for substrate channeling in BjPutA. Multiple sequence alignment analysis shows that some PutAs have a 17-residue conserved motif in the C-terminal 20–30 residues of the polypeptide chain. The BjPutA structure shows that this motif helps seal the internal substrate-channeling cavity from the bulk medium. Finally, it is shown that some PutAs have a 100–200 residue domain of unknown function in the C-terminus that is not found in minimalist PutAs. Remote homology detection suggests that this domain is homologous to the oligomerization beta-hairpin and Rossmann fold domain of BjPutA. PMID:22201760
Sequence-specific DNA binding by MYC/MAX to low-affinity non-E-box motifs.

PubMed

Allevato, Michael; Bolotin, Eugene; Grossman, Mark; Mane-Padros, Daniel; Sladek, Frances M; Martinez, Ernest

2017-01-01

The MYC oncoprotein regulates transcription of a large fraction of the genome as an obligatory heterodimer with the transcription factor MAX. The MYC:MAX heterodimer and MAX:MAX homodimer (hereafter MYC/MAX) bind Enhancer box (E-box) DNA elements (CANNTG) and have the greatest affinity for the canonical MYC E-box (CME) CACGTG. However, MYC:MAX also recognizes E-box variants and was reported to bind DNA in a "non-specific" fashion in vitro and in vivo. Here, in order to identify potential additional non-canonical binding sites for MYC/MAX, we employed high throughput in vitro protein-binding microarrays, along with electrophoretic mobility-shift assays and bioinformatic analyses of MYC-bound genomic loci in vivo. We identified all hexameric motifs preferentially bound by MYC/MAX in vitro, which include the low-affinity non-E-box sequence AACGTT, and found that the vast majority (87%) of MYC-bound genomic sites in a human B cell line contain at least one of the top 21 motifs bound by MYC:MAX in vitro. We further show that high MYC/MAX concentrations are needed for specific binding to the low-affinity sequence AACGTT in vitro and that elevated MYC levels in vivo more markedly increase the occupancy of AACGTT sites relative to CME sites, especially at distal intergenic and intragenic loci. Hence, MYC binds diverse DNA motifs with a broad range of affinities in a sequence-specific and dose-dependent manner, suggesting that MYC overexpression has more selective effects on the tumor transcriptome than previously thought.
ATtRACT-a database of RNA-binding proteins and associated motifs.

PubMed

Giudice, Girolamo; Sánchez-Cabo, Fátima; Torroja, Carlos; Lara-Pezzi, Enrique

2016-01-01

RNA-binding proteins (RBPs) play a crucial role in key cellular processes, including RNA transport, splicing, polyadenylation and stability. Understanding the interaction between RBPs and RNA is key to improve our knowledge of RNA processing, localization and regulation in a global manner. Despite advances in recent years, a unified non-redundant resource that includes information on experimentally validated motifs, RBPs and integrated tools to exploit this information is lacking. Here, we developed a database named ATtRACT (available athttp://attract.cnic.es) that compiles information on 370 RBPs and 1583 RBP consensus binding motifs, 192 of which are not present in any other database. To populate ATtRACT we (i) extracted and hand-curated experimentally validated data from CISBP-RNA, SpliceAid-F, RBPDB databases, (ii) integrated and updated the unavailable ASD database and (iii) extracted information from Protein-RNA complexes present in Protein Data Bank database through computational analyses. ATtRACT provides also efficient algorithms to search a specific motif and scan one or more RNA sequences at a time. It also allows discoveringde novomotifs enriched in a set of related sequences and compare them with the motifs included in the database.Database URL:http:// attract. cnic. es. © The Author(s) 2016. Published by Oxford University Press.
The Effect of Orthology and Coregulation on Detecting Regulatory Motifs

PubMed Central

Storms, Valerie; Claeys, Marleen; Sanchez, Aminael; De Moor, Bart; Verstuyf, Annemieke; Marchal, Kathleen

2010-01-01

Background Computational de novo discovery of transcription factor binding sites is still a challenging problem. The growing number of sequenced genomes allows integrating orthology evidence with coregulation information when searching for motifs. Moreover, the more advanced motif detection algorithms explicitly model the phylogenetic relatedness between the orthologous input sequences and thus should be well adapted towards using orthologous information. In this study, we evaluated the conditions under which complementing coregulation with orthologous information improves motif detection for the class of probabilistic motif detection algorithms with an explicit evolutionary model. Methodology We designed datasets (real and synthetic) covering different degrees of coregulation and orthologous information to test how well Phylogibbs and Phylogenetic sampler, as representatives of the motif detection algorithms with evolutionary model performed as compared to MEME, a more classical motif detection algorithm that treats orthologs independently. Results and Conclusions Under certain conditions detecting motifs in the combined coregulation-orthology space is indeed more efficient than using each space separately, but this is not always the case. Moreover, the difference in success rate between the advanced algorithms and MEME is still marginal. The success rate of motif detection depends on the complex interplay between the added information and the specificities of the applied algorithms. Insights in this relation provide information useful to both developers and users. All benchmark datasets are available at http://homes.esat.kuleuven.be/~kmarchal/Supplementary_Storms_Valerie_PlosONE. PMID:20140085
Discriminative motif optimization based on perceptron training

PubMed Central

Patel, Ronak Y.; Stormo, Gary D.

2014-01-01

Motivation: Generating accurate transcription factor (TF) binding site motifs from data generated using the next-generation sequencing, especially ChIP-seq, is challenging. The challenge arises because a typical experiment reports a large number of sequences bound by a TF, and the length of each sequence is relatively long. Most traditional motif finders are slow in handling such enormous amount of data. To overcome this limitation, tools have been developed that compromise accuracy with speed by using heuristic discrete search strategies or limited optimization of identified seed motifs. However, such strategies may not fully use the information in input sequences to generate motifs. Such motifs often form good seeds and can be further improved with appropriate scoring functions and rapid optimization. Results: We report a tool named discriminative motif optimizer (DiMO). DiMO takes a seed motif along with a positive and a negative database and improves the motif based on a discriminative strategy. We use area under receiver-operating characteristic curve (AUC) as a measure of discriminating power of motifs and a strategy based on perceptron training that maximizes AUC rapidly in a discriminative manner. Using DiMO, on a large test set of 87 TFs from human, drosophila and yeast, we show that it is possible to significantly improve motifs identified by nine motif finders. The motifs are generated/optimized using training sets and evaluated on test sets. The AUC is improved for almost 90% of the TFs on test sets and the magnitude of increase is up to 39%. Availability and implementation: DiMO is available at http://stormo.wustl.edu/DiMO Contact: rpatel@genetics.wustl.edu, ronakypatel@gmail.com PMID:24369152
Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas

PubMed Central

Petrov, Anton I.; Zirbel, Craig L.; Leontis, Neocles B.

2013-01-01

The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson–Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules, but can also occur at nonhomologous sites in diverse RNAs, and which often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access. PMID:23970545
Identification of 15 candidate structured noncoding RNA motifs in fungi by comparative genomics.

PubMed

Li, Sanshu; Breaker, Ronald R

2017-10-13

With the development of rapid and inexpensive DNA sequencing, the genome sequences of more than 100 fungal species have been made available. This dataset provides an excellent resource for comparative genomics analyses, which can be used to discover genetic elements, including noncoding RNAs (ncRNAs). Bioinformatics tools similar to those used to uncover novel ncRNAs in bacteria, likewise, should be useful for searching fungal genomic sequences, and the relative ease of genetic experiments with some model fungal species could facilitate experimental validation studies. We have adapted a bioinformatics pipeline for discovering bacterial ncRNAs to systematically analyze many fungal genomes. This comparative genomics pipeline integrates information on conserved RNA sequence and structural features with alternative splicing information to reveal fungal RNA motifs that are candidate regulatory domains, or that might have other possible functions. A total of 15 prominent classes of structured ncRNA candidates were identified, including variant HDV self-cleaving ribozyme representatives, atypical snoRNA candidates, and possible structured antisense RNA motifs. Candidate regulatory motifs were also found associated with genes for ribosomal proteins, S-adenosylmethionine decarboxylase (SDC), amidase, and HexA protein involved in Woronin body formation. We experimentally confirm that the variant HDV ribozymes undergo rapid self-cleavage, and we demonstrate that the SDC RNA motif reduces the expression of SAM decarboxylase by translational repression. Furthermore, we provide evidence that several other motifs discovered in this study are likely to be functional ncRNA elements. Systematic screening of fungal genomes using a computational discovery pipeline has revealed the existence of a variety of novel structured ncRNAs. Genome contexts and similarities to known ncRNA motifs provide strong evidence for the biological and biochemical functions of some newly found ncRNA motifs
cWINNOWER Algorithm for Finding Fuzzy DNA Motifs

NASA Technical Reports Server (NTRS)

Liang, Shoudan

2003-01-01

The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if multiple mutated copies of the motif (i.e., the signals) are present in the DNA sequence in sufficient abundance. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum number of detectable motifs qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc, by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12000 for (l,d) = (15,4).
Counting of oligomers in sequences generated by markov chains for DNA motif discovery.

PubMed

Shan, Gao; Zheng, Wei-Mou

2009-02-01

By means of the technique of the imbedded Markov chain, an efficient algorithm is proposed to exactly calculate first, second moments of word counts and the probability for a word to occur at least once in random texts generated by a Markov chain. A generating function is introduced directly from the imbedded Markov chain to derive asymptotic approximations for the problem. Two Z-scores, one based on the number of sequences with hits and the other on the total number of word hits in a set of sequences, are examined for discovery of motifs on a set of promoter sequences extracted from A. thaliana genome. Source code is available at http://www.itp.ac.cn/zheng/oligo.c.
Characteristic motifs for families of allergenic proteins

PubMed Central

Ivanciuc, Ovidiu; Garcia, Tzintzuni; Torres, Miguel; Schein, Catherine H.; Braun, Werner

2008-01-01

The identification of potential allergenic proteins is usually done by scanning a database of allergenic proteins and locating known allergens with a high sequence similarity. However, there is no universally accepted cut-off value for sequence similarity to indicate potential IgE cross-reactivity. Further, overall sequence similarity may be less important than discrete areas of similarity in proteins with homologous structure. To identify such areas, we first classified all allergens and their subdomains in the Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP/) to their closest protein families as defined in Pfam, and identified conserved physicochemical property motifs characteristic of each group of sequences. Allergens populate only a small subset of all known Pfam families, as all allergenic proteins in SDAP could be grouped to only 130 (of 9318 total) Pfams, and 31 families contain more than four allergens. Conserved physicochemical property motifs for the aligned sequences of the most populated Pfam families were identified with the PCPMer program suite and catalogued in the webserver Motif-Mate (http://born.utmb.edu/motifmate/summary.php). We also determined specific motifs for allergenic members of a family that could distinguish them from non-allergenic ones. These allergen specific motifs should be most useful in database searches for potential allergens. We found that sequence motifs unique to the allergens in three families (seed storage proteins, Bet v 1, and tropomyosin) overlap with known IgE epitopes, thus providing evidence that our motif based approach can be used to assess the potential allergenicity of novel proteins. PMID:18951633
A sequence upstream of canonical PDZ-binding motif within CFTR COOH-terminus enhances NHERF1 interaction.

PubMed

Sharma, Neeraj; LaRusch, Jessica; Sosnay, Patrick R; Gottschalk, Laura B; Lopez, Andrea P; Pellicore, Matthew J; Evans, Taylor; Davis, Emily; Atalar, Melis; Na, Chan-Hyun; Rosson, Gedge D; Belchis, Deborah; Milewski, Michal; Pandey, Akhilesh; Cutting, Garry R

2016-12-01

The development of cystic fibrosis transmembrane conductance regulator (CFTR) targeted therapy for cystic fibrosis has generated interest in maximizing membrane residence of mutant forms of CFTR by manipulating interactions with scaffold proteins, such as sodium/hydrogen exchange regulatory factor-1 (NHERF1). In this study, we explored whether COOH-terminal sequences in CFTR beyond the PDZ-binding motif influence its interaction with NHERF1. NHERF1 displayed minimal self-association in blot overlays (NHERF1, K d = 1,382 ± 61.1 nM) at concentrations well above physiological levels, estimated at 240 nM from RNA-sequencing and 260 nM by liquid chromatography tandem mass spectrometry in sweat gland, a key site of CFTR function in vivo. However, NHERF1 oligomerized at considerably lower concentrations (10 nM) in the presence of the last 111 amino acids of CFTR (20 nM) in blot overlays and cross-linking assays and in coimmunoprecipitations using differently tagged versions of NHERF1. Deletion and alanine mutagenesis revealed that a six-amino acid sequence 1417 EENKVR 1422 and the terminal 1478 TRL 1480 (PDZ-binding motif) in the COOH-terminus were essential for the enhanced oligomerization of NHERF1. Full-length CFTR stably expressed in Madin-Darby canine kidney epithelial cells fostered NHERF1 oligomerization that was substantially reduced (∼5-fold) on alanine substitution of EEN, KVR, or EENKVR residues or deletion of the TRL motif. Confocal fluorescent microscopy revealed that the EENKVR and TRL sequences contribute to preferential localization of CFTR to the apical membrane. Together, these results indicate that COOH-terminal sequences mediate enhanced NHERF1 interaction and facilitate the localization of CFTR, a property that could be manipulated to stabilize mutant forms of CFTR at the apical surface to maximize the effect of CFTR-targeted therapeutics. Copyright © 2016 the American Physiological Society.
A sequence upstream of canonical PDZ-binding motif within CFTR COOH-terminus enhances NHERF1 interaction

PubMed Central

Sharma, Neeraj; LaRusch, Jessica; Sosnay, Patrick R.; Gottschalk, Laura B.; Lopez, Andrea P.; Pellicore, Matthew J.; Evans, Taylor; Davis, Emily; Atalar, Melis; Na, Chan-Hyun; Rosson, Gedge D.; Belchis, Deborah; Milewski, Michal; Pandey, Akhilesh

2016-01-01

The development of cystic fibrosis transmembrane conductance regulator (CFTR) targeted therapy for cystic fibrosis has generated interest in maximizing membrane residence of mutant forms of CFTR by manipulating interactions with scaffold proteins, such as sodium/hydrogen exchange regulatory factor-1 (NHERF1). In this study, we explored whether COOH-terminal sequences in CFTR beyond the PDZ-binding motif influence its interaction with NHERF1. NHERF1 displayed minimal self-association in blot overlays (NHERF1, Kd = 1,382 ± 61.1 nM) at concentrations well above physiological levels, estimated at 240 nM from RNA-sequencing and 260 nM by liquid chromatography tandem mass spectrometry in sweat gland, a key site of CFTR function in vivo. However, NHERF1 oligomerized at considerably lower concentrations (10 nM) in the presence of the last 111 amino acids of CFTR (20 nM) in blot overlays and cross-linking assays and in coimmunoprecipitations using differently tagged versions of NHERF1. Deletion and alanine mutagenesis revealed that a six-amino acid sequence 1417EENKVR1422 and the terminal 1478TRL1480 (PDZ-binding motif) in the COOH-terminus were essential for the enhanced oligomerization of NHERF1. Full-length CFTR stably expressed in Madin-Darby canine kidney epithelial cells fostered NHERF1 oligomerization that was substantially reduced (∼5-fold) on alanine substitution of EEN, KVR, or EENKVR residues or deletion of the TRL motif. Confocal fluorescent microscopy revealed that the EENKVR and TRL sequences contribute to preferential localization of CFTR to the apical membrane. Together, these results indicate that COOH-terminal sequences mediate enhanced NHERF1 interaction and facilitate the localization of CFTR, a property that could be manipulated to stabilize mutant forms of CFTR at the apical surface to maximize the effect of CFTR-targeted therapeutics. PMID:27793802
The nonamer UUAUUUAUU is the key AU-rich sequence motif that mediates mRNA degradation.

PubMed Central

Zubiaga, A M; Belasco, J G; Greenberg, M E

1995-01-01

Labile mRNAs that encode cytokine and immediate-early gene products often contain AU-rich sequences within their 3' untranslated region (UTR). These AU-rich sequences appear to be key determinants of the short half-lives of these mRNAs, although the sequence features of these elements and the mechanism by which they target mRNAs for rapid decay have not been fully defined. We have examined the features of AU-rich elements (AREs) that are crucial for their function as determinants of mRNA instability in mammalian cells by testing the ability of various mutant c-fos AREs and synthetic AREs to direct rapid mRNA deadenylation and decay when inserted within the 3' UTR of the normally stable beta-globin mRNA. Evidence is presented that the pentamer AUUUA, which previously was suggested to be the minimal determinant of instability present in mammalian AREs, cannot direct rapid mRNA deadenylation and decay. Instead, the nonomer UUAUUUAUU is the elemental AU-rich sequence motif that destabilizes mRNA. Removal of one uridine residue from either end of the nonamer (UUAUUUAU or UAUUUAUU) results in a decrease of potency of the element, while removal of a uridine residue from both ends of the nonamer (UAUUUAU) eliminates detectable destabilizing activity. The inclusion of an additional uridine residue at both ends of the nonamer (UUUAUUUAUUU) does not further increase the efficacy of the element. Taken together, these findings suggest that the nonamer UUAUUUAUU is the minimal AU-rich motif that effectively destabilizes mRNA. Additional ARE potency is achieved by combining multiple copies of this nonamer in a single mRNA 3' UTR. Furthermore, analysis of poly(A) shortening rates for ARE-containing mRNAs reveals that the UUAUUUAUU sequence also accelerates mRNA deadenylation and suggests that the UUAUUUAUU motif targets mRNA for rapid deadenylation as an early step in the mRNA decay process. PMID:7891716

Integration of Bioinformatics and Synthetic Promoters Leads to the Discovery of Novel Elicitor-Responsive cis-Regulatory Sequences in Arabidopsis1[C][W][OA

PubMed Central

Koschmann, Jeannette; Machens, Fabian; Becker, Marlies; Niemeyer, Julia; Schulze, Jutta; Bülow, Lorenz; Stahl, Dietmar J.; Hehl, Reinhard

2012-01-01

A combination of bioinformatic tools, high-throughput gene expression profiles, and the use of synthetic promoters is a powerful approach to discover and evaluate novel cis-sequences in response to specific stimuli. With Arabidopsis (Arabidopsis thaliana) microarray data annotated to the PathoPlant database, 732 different queries with a focus on fungal and oomycete pathogens were performed, leading to 510 up-regulated gene groups. Using the binding site estimation suite of tools, BEST, 407 conserved sequence motifs were identified in promoter regions of these coregulated gene sets. Motif similarities were determined with STAMP, classifying the 407 sequence motifs into 37 families. A comparative analysis of these 37 families with the AthaMap, PLACE, and AGRIS databases revealed similarities to known cis-elements but also led to the discovery of cis-sequences not yet implicated in pathogen response. Using a parsley (Petroselinum crispum) protoplast system and a modified reporter gene vector with an internal transformation control, 25 elicitor-responsive cis-sequences from 10 different motif families were identified. Many of the elicitor-responsive cis-sequences also drive reporter gene expression in an Agrobacterium tumefaciens infection assay in Nicotiana benthamiana. This work significantly increases the number of known elicitor-responsive cis-sequences and demonstrates the successful integration of a diverse set of bioinformatic resources combined with synthetic promoter analysis for data mining and functional screening in plant-pathogen interaction. PMID:22744985
Sequence Analysis and Domain Motifs in the Porcine Skin Decorin Glycosaminoglycan Chain*

PubMed Central

Zhao, Xue; Yang, Bo; Solakylidirim, Kemal; Joo, Eun Ji; Toida, Toshihiko; Higashi, Kyohei; Linhardt, Robert J.; Li, Lingyun

2013-01-01

Decorin proteoglycan is comprised of a core protein containing a single O-linked dermatan sulfate/chondroitin sulfate glycosaminoglycan (GAG) chain. Although the sequence of the decorin core protein is determined by the gene encoding its structure, the structure of its GAG chain is determined in the Golgi. The recent application of modern MS to bikunin, a far simpler chondroitin sulfate proteoglycans, suggests that it has a single or small number of defined sequences. On this basis, a similar approach to sequence the decorin of porcine skin much larger and more structurally complex dermatan sulfate/chondroitin sulfate GAG chain was undertaken. This approach resulted in information on the consistency/variability of its linkage region at the reducing end of the GAG chain, its iduronic acid-rich domain, glucuronic acid-rich domain, and non-reducing end. A general motif for the porcine skin decorin GAG chain was established. A single small decorin GAG chain was sequenced using MS/MS analysis. The data obtained in the study suggest that the decorin GAG chain has a small or a limited number of sequences. PMID:23423381
Sequence-Specific Recognition of DNA by Proteins: Binding Motifs Discovered Using a Novel Statistical/Computational Analysis

PubMed Central

Jakubec, David; Laskowski, Roman A.; Vondrasek, Jiri

2016-01-01

Decades of intensive experimental studies of the recognition of DNA sequences by proteins have provided us with a view of a diverse and complicated world in which few to no features are shared between individual DNA-binding protein families. The originally conceived direct readout of DNA residue sequences by amino acid side chains offers very limited capacity for sequence recognition, while the effects of the dynamic properties of the interacting partners remain difficult to quantify and almost impossible to generalise. In this work we investigated the energetic characteristics of all DNA residue—amino acid side chain combinations in the conformations found at the interaction interface in a very large set of protein—DNA complexes by the means of empirical potential-based calculations. General specificity-defining criteria were derived and utilised to look beyond the binding motifs considered in previous studies. Linking energetic favourability to the observed geometrical preferences, our approach reveals several additional amino acid motifs which can distinguish between individual DNA bases. Our results remained valid in environments with various dielectric properties. PMID:27384774
The amino acid motif L/IIxxFE defines a novel actin-binding sequence in PDZ-RhoGEF

PubMed Central

Banerjee, Jayashree; Fischer, Christopher C.; Wedegaertner, Philip B.

2009-01-01

PDZ-RhoGEF is a member of the regulator of G protein signaling (RGS) domain-containing RhoGEFs (RGS-RhoGEFs) that link activated heterotrimeric G protein α subunits of the G12 family to activation of the small GTPase RhoA. Unique among the RGS-RhoGEFs, PDZ-RhoGEF contains a short sequence that localizes the protein to the actin cytoskeleton. In this report, we demonstrate that the actin-binding domain, located between amino acids 561–585, directly binds to F-actin in vitro. Extensive mutagenesis identifies isoleucine 568, isoleucine 569, phenylalanine 572, and glutamic acid 573 as necessary for binding to actin and for co-localization with the actin cytoskeleton in cells. These results define a novel actin-binding sequence in PDZ-RhoGEF with a critical amino acid motif of IIxxFE. Moreover, sequence analysis identifies a similar actin-binding motif in the N-terminus of the RhoGEF frabin, and, as with PDZ-RhoGEF, mutagenesis and actin interaction experiments demonstrate a motif of LIxxFE, consisting of the key amino acids leucine 23, isoleucine 24, phenylalanine 27, and glutamic acid 28. Taken together, results with PDZ-RhoGEF and frabin identify a novel actin binding sequence. Lastly, inducible dimerization of the actin-binding region of PDZ-RhoGEF revealed a dimerization-dependent actin bundling activity in vitro. PDZ-RhoGEF exists in cells as a dimer, raising the possibility that PDZ-RhoGEF could influence actin structure independent of its ability to activate RhoA. PMID:19618964
Computational Tools for the Identification and Interpretation of Sequence Motifs in Immunopeptidomes.

PubMed

Alvarez, Bruno; Barra, Carolina; Nielsen, Morten; Andreatta, Massimo

2018-01-12

Recent advances in proteomics and mass-spectrometry have widely expanded the detectable peptide repertoire presented by major histocompatibility complex (MHC) molecules on the cell surface, collectively known as the immunopeptidome. Finely characterizing the immunopeptidome brings about important basic insights into the mechanisms of antigen presentation, but can also reveal promising targets for vaccine development and cancer immunotherapy. This report describes a number of practical and efficient approaches to analyze immunopeptidomics data, discussing the identification of meaningful sequence motifs in various scenarios and considering current limitations. Guidelines are provided for the filtering of false hits and contaminants, and to address the problem of motif deconvolution in cell lines expressing multiple MHC alleles, both for the MHC class I and class II systems. Finally, it is demonstrated how machine learning can be readily employed by non-expert users to generate accurate prediction models directly from mass-spectrometry eluted ligand data sets. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Motif discovery and motif finding from genome-mapped DNase footprint data.

PubMed

Kulakovskiy, Ivan V; Favorov, Alexander V; Makeev, Vsevolod J

2009-09-15

Footprint data is an important source of information on transcription factor recognition motifs. However, a footprinting fragment can contain no sequences similar to known protein recognition sites. Inspection of genome fragments nearby can help to identify missing site positions. Genome fragments containing footprints were supplied to a pipeline that constructed a position weight matrix (PWM) for different motif lengths and selected the optimal PWM. Fragments were aligned with the SeSiMCMC sampler and a new heuristic algorithm, Bigfoot. Footprints with missing hits were found for approximately 50% of factors. Adding only 2 bp on both sides of a footprinting fragment recovered most hits. We automatically constructed motifs for 41 Drosophila factors. New motifs can recognize footprints with a greater sensitivity at the same false positive rate than existing models. Also we discuss possible overfitting of constructed motifs. Software and the collection of regulatory motifs are freely available at http://line.imb.ac.ru/DMMPMM.
Motif discovery with data mining in 3D protein structure databases: discovery, validation and prediction of the U-shape zinc binding ("Huf-Zinc") motif.

PubMed

Maurer-Stroh, Sebastian; Gao, He; Han, Hao; Baeten, Lies; Schymkowitz, Joost; Rousseau, Frederic; Zhang, Louxin; Eisenhaber, Frank

2013-02-01

Data mining in protein databases, derivatives from more fundamental protein 3D structure and sequence databases, has considerable unearthed potential for the discovery of sequence motif--structural motif--function relationships as the finding of the U-shape (Huf-Zinc) motif, originally a small student's project, exemplifies. The metal ion zinc is critically involved in universal biological processes, ranging from protein-DNA complexes and transcription regulation to enzymatic catalysis and metabolic pathways. Proteins have evolved a series of motifs to specifically recognize and bind zinc ions. Many of these, so called zinc fingers, are structurally independent globular domains with discontinuous binding motifs made up of residues mostly far apart in sequence. Through a systematic approach starting from the BRIX structure fragment database, we discovered that there exists another predictable subset of zinc-binding motifs that not only have a conserved continuous sequence pattern but also share a characteristic local conformation, despite being included in totally different overall folds. While this does not allow general prediction of all Zn binding motifs, a HMM-based web server, Huf-Zinc, is available for prediction of these novel, as well as conventional, zinc finger motifs in protein sequences. The Huf-Zinc webserver can be freely accessed through this URL (http://mendel.bii.a-star.edu.sg/METHODS/hufzinc/).
MotifNet: a web-server for network motif analysis.

PubMed

Smoly, Ilan Y; Lerman, Eugene; Ziv-Ukelson, Michal; Yeger-Lotem, Esti

2017-06-15

Network motifs are small topological patterns that recur in a network significantly more often than expected by chance. Their identification emerged as a powerful approach for uncovering the design principles underlying complex networks. However, available tools for network motif analysis typically require download and execution of computationally intensive software on a local computer. We present MotifNet, the first open-access web-server for network motif analysis. MotifNet allows researchers to analyze integrated networks, where nodes and edges may be labeled, and to search for motifs of up to eight nodes. The output motifs are presented graphically and the user can interactively filter them by their significance, number of instances, node and edge labels, and node identities, and view their instances. MotifNet also allows the user to distinguish between motifs that are centered on specific nodes and motifs that recur in distinct parts of the network. MotifNet is freely available at http://netbio.bgu.ac.il/motifnet . The website was implemented using ReactJs and supports all major browsers. The server interface was implemented in Python with data stored on a MySQL database. estiyl@bgu.ac.il or michaluz@cs.bgu.ac.il. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Finishing and Special Motifs: Lessons Learned from CRISPR Analysis Using Next-Generation Draft Sequences (7th Annual SFAF Meeting, 2012)

ScienceCinema

Campbell, Catherine

2018-01-22

Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.
Finishing and Special Motifs: Lessons Learned from CRISPR Analysis Using Next-Generation Draft Sequences (7th Annual SFAF Meeting, 2012)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Campbell, Catherine

Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.
Automatic annotation of protein motif function with Gene Ontology terms.

PubMed

Lu, Xinghua; Zhai, Chengxiang; Gopalakrishnan, Vanathi; Buchanan, Bruce G

2004-09-02

Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist foridentifying candidate protein motifs at the whole genome level. However, a much needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary and these annotations serve well as a knowledge base. This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification and information retrieval problem, where PROSITE motifs are used as samples for mode training and functional prediction. The mutual information of a motif and aGO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria. In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical result demonstrated that the methods have a great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.
Space-related pharma-motifs for fast search of protein binding motifs and polypharmacological targets

PubMed Central

2012-01-01

Background To discover a compound inhibiting multiple proteins (i.e. polypharmacological targets) is a new paradigm for the complex diseases (e.g. cancers and diabetes). In general, the polypharmacological proteins often share similar local binding environments and motifs. As the exponential growth of the number of protein structures, to find the similar structural binding motifs (pharma-motifs) is an emergency task for drug discovery (e.g. side effects and new uses for old drugs) and protein functions. Results We have developed a Space-Related Pharmamotifs (called SRPmotif) method to recognize the binding motifs by searching against protein structure database. SRPmotif is able to recognize conserved binding environments containing spatially discontinuous pharma-motifs which are often short conserved peptides with specific physico-chemical properties for protein functions. Among 356 pharma-motifs, 56.5% interacting residues are highly conserved. Experimental results indicate that 81.1% and 92.7% polypharmacological targets of each protein-ligand complex are annotated with same biological process (BP) and molecular function (MF) terms, respectively, based on Gene Ontology (GO). Our experimental results show that the identified pharma-motifs often consist of key residues in functional (active) sites and play the key roles for protein functions. The SRPmotif is available at http://gemdock.life.nctu.edu.tw/SRP/. Conclusions SRPmotif is able to identify similar pharma-interfaces and pharma-motifs sharing similar binding environments for polypharmacological targets by rapidly searching against the protein structure database. Pharma-motifs describe the conservations of binding environments for drug discovery and protein functions. Additionally, these pharma-motifs provide the clues for discovering new sequence-based motifs to predict protein functions from protein sequence databases. We believe that SRPmotif is useful for elucidating protein functions and drug discovery
Space-related pharma-motifs for fast search of protein binding motifs and polypharmacological targets.

PubMed

Chiu, Yi-Yuan; Lin, Chun-Yu; Lin, Chih-Ta; Hsu, Kai-Cheng; Chang, Li-Zen; Yang, Jinn-Moon

2012-01-01

To discover a compound inhibiting multiple proteins (i.e. polypharmacological targets) is a new paradigm for the complex diseases (e.g. cancers and diabetes). In general, the polypharmacological proteins often share similar local binding environments and motifs. As the exponential growth of the number of protein structures, to find the similar structural binding motifs (pharma-motifs) is an emergency task for drug discovery (e.g. side effects and new uses for old drugs) and protein functions. We have developed a Space-Related Pharmamotifs (called SRPmotif) method to recognize the binding motifs by searching against protein structure database. SRPmotif is able to recognize conserved binding environments containing spatially discontinuous pharma-motifs which are often short conserved peptides with specific physico-chemical properties for protein functions. Among 356 pharma-motifs, 56.5% interacting residues are highly conserved. Experimental results indicate that 81.1% and 92.7% polypharmacological targets of each protein-ligand complex are annotated with same biological process (BP) and molecular function (MF) terms, respectively, based on Gene Ontology (GO). Our experimental results show that the identified pharma-motifs often consist of key residues in functional (active) sites and play the key roles for protein functions. The SRPmotif is available at http://gemdock.life.nctu.edu.tw/SRP/. SRPmotif is able to identify similar pharma-interfaces and pharma-motifs sharing similar binding environments for polypharmacological targets by rapidly searching against the protein structure database. Pharma-motifs describe the conservations of binding environments for drug discovery and protein functions. Additionally, these pharma-motifs provide the clues for discovering new sequence-based motifs to predict protein functions from protein sequence databases. We believe that SRPmotif is useful for elucidating protein functions and drug discovery.
DNA motifs determining the accuracy of repeat duplication during CRISPR adaptation in Haloarcula hispanica

PubMed Central

Wang, Rui; Li, Ming; Gong, Luyao; Hu, Songnian; Xiang, Hua

2016-01-01

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) acquire new spacers to generate adaptive immunity in prokaryotes. During spacer integration, the leader-preceded repeat is always accurately duplicated, leading to speculations of a repeat-length ruler. Here in Haloarcula hispanica, we demonstrate that the accurate duplication of its 30-bp repeat requires two conserved mid-repeat motifs, AACCC and GTGGG. The AACCC motif was essential and needed to be ∼10 bp downstream from the leader-repeat junction site, where duplication consistently started. Interestingly, repeat duplication terminated sequence-independently and usually with a specific distance from the GTGGG motif, which seemingly served as an anchor site for a molecular ruler. Accordingly, altering the spacing between the two motifs led to an aberrant duplication size (29, 31, 32 or 33 bp). We propose the adaptation complex may recognize these mid-repeat elements to enable measuring the repeat DNA for spacer integration. PMID:27085805
Discovery of phosphorylation motif mixtures in phosphoproteomics data

PubMed Central

Ritz, Anna; Shakhnarovich, Gregory; Salomon, Arthur R.; Raphael, Benjamin J.

2009-01-01

Motivation: Modification of proteins via phosphorylation is a primary mechanism for signal transduction in cells. Phosphorylation sites on proteins are determined in part through particular patterns, or motifs, present in the amino acid sequence. Results: We describe an algorithm that simultaneously discovers multiple motifs in a set of peptides that were phosphorylated by several different kinases. Such sets of peptides are routinely produced in proteomics experiments.Our motif-finding algorithm uses the principle of minimum description length to determine a mixture of sequence motifs that distinguish a foreground set of phosphopeptides from a background set of unphosphorylated peptides. We show that our algorithm outperforms existing motif-finding algorithms on synthetic datasets consisting of mixtures of known phosphorylation sites. We also derive a motif specificity score that quantifies whether or not the phosphoproteins containing an instance of a motif have a significant number of known interactions. Application of our motif-finding algorithm to recently published human and mouse proteomic studies recovers several known phosphorylation motifs and reveals a number of novel motifs that are enriched for interactions with a particular kinase or phosphatase. Our tools provide a new approach for uncovering the sequence specificities of uncharacterized kinases or phosphatases. Availability: Software is available at http:/cs.brown.edu/people/braphael/software.html. Contact: aritz@cs.brown.edu; braphael@cs.brown.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:18996944
Structural polymorphism of a cytosine-rich DNA sequence forming i-motif structure: Exploring pH based biosensors.

PubMed

Ahmed, Saami; Kaushik, Mahima; Chaudhary, Swati; Kukreti, Shrikant

2018-05-01

Sequence recognition and conformational polymorphism enable DNA to emerge out as a substantial tool in fabricating the devices within nano-dimensions. These DNA associated nano devices work on the principle of conformational switches, which can be facilitated by many factors like sequence of DNA/RNA strand, change in pH or temperature, enzyme or ligand interactions etc. Thus, controlling these DNA conformational changes to acquire the desired function is significant for evolving DNA hybridization biosensor, used in genetic screening and molecular diagnosis. For exploring this conformational switching ability of cytosine-rich DNA oligonucleotides as a function of pH for their potential usage as biosensors, this study has been designed. A C-rich stretch of DNA sequence (5'-TCCCCCAATTAATTCCCCCA-3'; SG20c) has been investigated using UV-Thermal denaturation, poly-acrylamide gel electrophoresis and CD spectroscopy. The SG20c sequence is shown to adopt various topologies of i-motif structure at low pH. This pH dependent transition of SG20c from unstructured single strand to unimolecular and bimolecular i-motif structures can further be exploited for its utilization as switching on/off pH-based biosensors. Copyright © 2018. Published by Elsevier B.V.
Presence of a consensus DNA motif at nearby DNA sequence of the mutation susceptible CG nucleotides.

PubMed

Chowdhury, Kaushik; Kumar, Suresh; Sharma, Tanu; Sharma, Ankit; Bhagat, Meenakshi; Kamai, Asangla; Ford, Bridget M; Asthana, Shailendra; Mandal, Chandi C

2018-01-10

Complexity in tissues affected by cancer arises from somatic mutations and epigenetic modifications in the genome. The mutation susceptible hotspots present within the genome indicate a non-random nature and/or a position specific selection of mutation. An association exists between the occurrence of mutations and epigenetic DNA methylation. This study is primarily aimed at determining mutation status, and identifying a signature for predicting mutation prone zones of tumor suppressor (TS) genes. Nearby sequences from the top five positions having a higher mutation frequency in each gene of 42 TS genes were selected from a cosmic database and were considered as mutation prone zones. The conserved motifs present in the mutation prone DNA fragments were identified. Molecular docking studies were done to determine putative interactions between the identified conserved motifs and enzyme methyltransferase DNMT1. Collective analysis of 42 TS genes found GC as the most commonly replaced and AT as the most commonly formed residues after mutation. Analysis of the top 5 mutated positions of each gene (210 DNA segments for 42 TS genes) identified that CG nucleotides of the amino acid codons (e.g., Arginine) are most susceptible to mutation, and found a consensus DNA "T/AGC/GAGGA/TG" sequence present in these mutation prone DNA segments. Similar to TS genes, analysis of 54 oncogenes not only found CG nucleotides of the amino acid Arg as the most susceptible to mutation, but also identified the presence of similar consensus DNA motifs in the mutation prone DNA fragments (270 DNA segments for 54 oncogenes) of oncogenes. Docking studies depicted that, upon binding of DNMT1 methylates to this consensus DNA motif (C residues of CpG islands), mutation was likely to occur. Thus, this study proposes that DNMT1 mediated methylation in chromosomal DNA may decrease if a foreign DNA segment containing this consensus sequence along with CG nucleotides is exogenously introduced to dividing
oPOSSUM: integrated tools for analysis of regulatory motif over-representation

PubMed Central

Ho Sui, Shannan J.; Fulton, Debra L.; Arenillas, David J.; Kwon, Andrew T.; Wasserman, Wyeth W.

2007-01-01

The identification of over-represented transcription factor binding sites from sets of co-expressed genes provides insights into the mechanisms of regulation for diverse biological contexts. oPOSSUM, an internet-based system for such studies of regulation, has been improved and expanded in this new release. New features include a worm-specific version for investigating binding sites conserved between Caenorhabditis elegans and C. briggsae, as well as a yeast-specific version for the analysis of co-expressed sets of Saccharomyces cerevisiae genes. The human and mouse applications feature improvements in ortholog mapping, sequence alignments and the delineation of multiple alternative promoters. oPOSSUM2, introduced for the analysis of over-represented combinations of motifs in human and mouse genes, has been integrated with the original oPOSSUM system. Analysis using user-defined background gene sets is now supported. The transcription factor binding site models have been updated to include new profiles from the JASPAR database. oPOSSUM is available at http://www.cisreg.ca/oPOSSUM/ PMID:17576675
cWINNOWER algorithm for finding fuzzy dna motifs

NASA Technical Reports Server (NTRS)

Liang, S.; Samanta, M. P.; Biegel, B. A.

2004-01-01

The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if a clique consisting of a sufficiently large number of mutated copies of the motif (i.e., the signals) is present in the DNA sequence. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum detectable clique size qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12,000 for (l, d) = (15, 4). Copyright Imperial College Press.
DNA motifs associated with aberrant CpG island methylation.

PubMed

Feltus, F Alex; Lee, Eva K; Costello, Joseph F; Plass, Christoph; Vertino, Paula M

2006-05-01

Epigenetic silencing involving the aberrant methylation of promoter region CpG islands is widely recognized as a tumor suppressor silencing mechanism in cancer. However, the molecular pathways underlying aberrant DNA methylation remain elusive. Recently we showed that, on a genome-wide level, CpG island loci differ in their intrinsic susceptibility to aberrant methylation and that this susceptibility can be predicted based on underlying sequence context. These data suggest that there are sequence/structural features that contribute to the protection from or susceptibility to aberrant methylation. Here we use motif elicitation coupled with classification techniques to identify DNA sequence motifs that selectively define methylation-prone or methylation-resistant CpG islands. Motifs common to 28 methylation-prone or 47 methylation-resistant CpG island-containing genomic fragments were determined using the MEME and MAST algorithms (). The five most discriminatory motifs derived from methylation-prone sequences were found to be associated with CpG islands in general and were nonrandomly distributed throughout the genome. In contrast, the eight most discriminatory motifs derived from the methylation-resistant CpG islands were randomly distributed throughout the genome. Interestingly, this latter group tended to associate with Alu and other repetitive sequences. Used together, the frequency of occurrence of these motifs successfully discriminated methylation-prone and methylation-resistant CpG island groups with an accuracy of 87% after 10-fold cross-validation. The motifs identified here are candidate methylation-targeting or methylation-protection DNA sequences.

Extracting DNA words based on the sequence features: non-uniform distribution and integrity.

PubMed

Li, Zhi; Cao, Hongyan; Cui, Yuehua; Zhang, Yanbo

2016-01-25

DNA sequence can be viewed as an unknown language with words as its functional units. Given that most sequence alignment algorithms such as the motif discovery algorithms depend on the quality of background information about sequences, it is necessary to develop an ab initio algorithm for extracting the "words" based only on the DNA sequences. We considered that non-uniform distribution and integrity were two important features of a word, based on which we developed an ab initio algorithm to extract "DNA words" that have potential functional meaning. A Kolmogorov-Smirnov test was used for consistency test of uniform distribution of DNA sequences, and the integrity was judged by the sequence and position alignment. Two random base sequences were adopted as negative control, and an English book was used as positive control to verify our algorithm. We applied our algorithm to the genomes of Saccharomyces cerevisiae and 10 strains of Escherichia coli to show the utility of the methods. The results provide strong evidences that the algorithm is a promising tool for ab initio building a DNA dictionary. Our method provides a fast way for large scale screening of important DNA elements and offers potential insights into the understanding of a genome.
RNA motif search with data-driven element ordering.

PubMed

Rampášek, Ladislav; Jimenez, Randi M; Lupták, Andrej; Vinař, Tomáš; Brejová, Broňa

2016-05-18

In this paper, we study the problem of RNA motif search in long genomic sequences. This approach uses a combination of sequence and structure constraints to uncover new distant homologs of known functional RNAs. The problem is NP-hard and is traditionally solved by backtracking algorithms. We have designed a new algorithm for RNA motif search and implemented a new motif search tool RNArobo. The tool enhances the RNAbob descriptor language, allowing insertions in helices, which enables better characterization of ribozymes and aptamers. A typical RNA motif consists of multiple elements and the running time of the algorithm is highly dependent on their ordering. By approaching the element ordering problem in a principled way, we demonstrate more than 100-fold speedup of the search for complex motifs compared to previously published tools. We have developed a new method for RNA motif search that allows for a significant speedup of the search of complex motifs that include pseudoknots. Such speed improvements are crucial at a time when the rate of DNA sequencing outpaces growth in computing. RNArobo is available at http://compbio.fmph.uniba.sk/rnarobo .
A survey of motif finding Web tools for detecting binding site motifs in ChIP-Seq data

PubMed Central

2014-01-01

Abstract ChIP-Seq (chromatin immunoprecipitation sequencing) has provided the advantage for finding motifs as ChIP-Seq experiments narrow down the motif finding to binding site locations. Recent motif finding tools facilitate the motif detection by providing user-friendly Web interface. In this work, we reviewed nine motif finding Web tools that are capable for detecting binding site motifs in ChIP-Seq data. We showed each motif finding Web tool has its own advantages for detecting motifs that other tools may not discover. We recommended the users to use multiple motif finding Web tools that implement different algorithms for obtaining significant motifs, overlapping resemble motifs, and non-overlapping motifs. Finally, we provided our suggestions for future development of motif finding Web tool that better assists researchers for finding motifs in ChIP-Seq data. Reviewers This article was reviewed by Prof. Sandor Pongor, Dr. Yuriy Gusev, and Dr. Shyam Prabhakar (nominated by Prof. Limsoon Wong). PMID:24555784
kpLogo: positional k-mer analysis reveals hidden specificity in biological sequences

PubMed Central

2017-01-01

Abstract Motifs of only 1–4 letters can play important roles when present at key locations within macromolecules. Because existing motif-discovery tools typically miss these position-specific short motifs, we developed kpLogo, a probability-based logo tool for integrated detection and visualization of position-specific ultra-short motifs from a set of aligned sequences. kpLogo also overcomes the limitations of conventional motif-visualization tools in handling positional interdependencies and utilizing ranked or weighted sequences increasingly available from high-throughput assays. kpLogo can be found at http://kplogo.wi.mit.edu/. PMID:28460012
Defining RNA motif-aminoglycoside interactions via two-dimensional combinatorial screening and structure-activity relationships through sequencing.

PubMed

Velagapudi, Sai Pradeep; Disney, Matthew D

2013-10-15

RNA is an extremely important target for the development of chemical probes of function or small molecule therapeutics. Aminoglycosides are the most well studied class of small molecules to target RNA. However, the RNA motifs outside of the bacterial rRNA A-site that are likely to be bound by these compounds in biological systems is largely unknown. If such information were known, it could allow for aminoglycosides to be exploited to target other RNAs and, in addition, could provide invaluable insights into potential bystander targets of these clinically used drugs. We utilized two-dimensional combinatorial screening (2DCS), a library-versus-library screening approach, to select the motifs displayed in a 3×3 nucleotide internal loop library and in a 6-nucleotide hairpin library that bind with high affinity and selectivity to six aminoglycoside derivatives. The selected RNA motifs were then analyzed using structure-activity relationships through sequencing (StARTS), a statistical approach that defines the privileged RNA motif space that binds a small molecule. StARTS allowed for the facile annotation of the selected RNA motif-aminoglycoside interactions in terms of affinity and selectivity. The interactions selected by 2DCS generally have nanomolar affinities, which is higher affinity than the binding of aminoglycosides to a mimic of their therapeutic target, the bacterial rRNA A-site. Copyright © 2013 Elsevier Ltd. All rights reserved.
SIRW: A web server for the Simple Indexing and Retrieval System that combines sequence motif searches with keyword searches.

PubMed

Ramu, Chenna

2003-07-01

SIRW (http://sirw.embl.de/) is a World Wide Web interface to the Simple Indexing and Retrieval System (SIR) that is capable of parsing and indexing various flat file databases. In addition it provides a framework for doing sequence analysis (e.g. motif pattern searches) for selected biological sequences through keyword search. SIRW is an ideal tool for the bioinformatics community for searching as well as analyzing biological sequences of interest.
Methods and statistics for combining motif match scores.

PubMed

Bailey, T L; Gribskov, M

1998-01-01

Position-specific scoring matrices are useful for representing and searching for protein sequence motifs. A sequence family can often be described by a group of one or more motifs, and an effective search must combine the scores for matching a sequence to each of the motifs in the group. We describe three methods for combining match scores and estimating the statistical significance of the combined scores and evaluate the search quality (classification accuracy) and the accuracy of the estimate of statistical significance of each. The three methods are: 1) sum of scores, 2) sum of reduced variates, 3) product of score p-values. We show that method 3) is superior to the other two methods in both regards, and that combining motif scores indeed gives better search accuracy. The MAST sequence homology search algorithm utilizing the product of p-values scoring method is available for interactive use and downloading at URL http:/(/)www.sdsc.edu/MEME.
Efficient exact motif discovery.

PubMed

Marschall, Tobias; Rahmann, Sven

2009-06-15

The motif discovery problem consists of finding over-represented patterns in a collection of biosequences. It is one of the classical sequence analysis problems, but still has not been satisfactorily solved in an exact and efficient manner. This is partly due to the large number of possibilities of defining the motif search space and the notion of over-representation. Even for well-defined formalizations, the problem is frequently solved in an ad hoc manner with heuristics that do not guarantee to find the best motif. We show how to solve the motif discovery problem (almost) exactly on a practically relevant space of IUPAC generalized string patterns, using the p-value with respect to an i.i.d. model or a Markov model as the measure of over-representation. In particular, (i) we use a highly accurate compound Poisson approximation for the null distribution of the number of motif occurrences. We show how to compute the exact clump size distribution using a recently introduced device called probabilistic arithmetic automaton (PAA). (ii) We define two p-value scores for over-representation, the first one based on the total number of motif occurrences, the second one based on the number of sequences in a collection with at least one occurrence. (iii) We describe an algorithm to discover the optimal pattern with respect to either of the scores. The method exploits monotonicity properties of the compound Poisson approximation and is by orders of magnitude faster than exhaustive enumeration of IUPAC strings (11.8 h compared with an extrapolated runtime of 4.8 years). (iv) We justify the use of the proposed scores for motif discovery by showing our method to outperform other motif discovery algorithms (e.g. MEME, Weeder) on benchmark datasets. We also propose new motifs on Mycobacterium tuberculosis. The method has been implemented in Java. It can be obtained from http://ls11-www.cs.tu-dortmund.de/people/marschal/paa_md/.
Sequence motifs and prokaryotic expression of the reptilian paramyxovirus fusion protein

USGS Publications Warehouse

Franke, J.; Batts, W.N.; Ahne, W.; Kurath, G.; Winton, J.R.

2006-01-01

Fourteen reptilian paramyxovirus isolates were chosen to represent the known extent of genetic diversity among this novel group of viruses. Selected regions of the fusion (F) gene were sequenced, analyzed and compared. The F gene of all isolates contained conserved motifs homologous to those described for other members of the family Paramyxoviridae including: signal peptide, transmembrane domain, furin cleavage site, fusion peptide, N-linked glycosylation sites, and two heptad repeats, the second of which (HRB-LZ) had the characteristics of a leucine zipper. Selected regions of the fusion gene of isolate Gono-GER85 were inserted into a prokaryotic expression system to generate three recombinant protein fragments of various sizes. The longest recombinant protein was cleaved by furin into two fragments of predicted length. Western blot analysis with virus-neutralizing rabbit-antiserum against this isolate demonstrated that only the longest construct reacted with the antiserum. This construct was unique in containing 30 additional C-terminal amino acids that included most of the HRB-LZ. These results indicate that the F genes of reptilian paramyxoviruses contain highly conserved motifs typical of other members of the family and suggest that the HRB-LZ domain of the reptilian paramyxovirus F protein contains a linear antigenic epitope. ?? Springer-Verlag 2005.
A generic motif discovery algorithm for sequential data.

PubMed

Jensen, Kyle L; Styczynski, Mark P; Rigoutsos, Isidore; Stephanopoulos, Gregory N

2006-01-01

Motif discovery in sequential data is a problem of great interest and with many applications. However, previous methods have been unable to combine exhaustive search with complex motif representations and are each typically only applicable to a certain class of problems. Here we present a generic motif discovery algorithm (Gemoda) for sequential data. Gemoda can be applied to any dataset with a sequential character, including both categorical and real-valued data. As we show, Gemoda deterministically discovers motifs that are maximal in composition and length. As well, the algorithm allows any choice of similarity metric for finding motifs. Finally, Gemoda's output motifs are representation-agnostic: they can be represented using regular expressions, position weight matrices or any number of other models for any type of sequential data. We demonstrate a number of applications of the algorithm, including the discovery of motifs in amino acids sequences, a new solution to the (l,d)-motif problem in DNA sequences and the discovery of conserved protein substructures. Gemoda is freely available at http://web.mit.edu/bamel/gemoda
SCOPE: a web server for practical de novo motif discovery.

PubMed

Carlson, Jonathan M; Chakravarty, Arijit; DeZiel, Charles E; Gross, Robert H

2007-07-01

SCOPE is a novel parameter-free method for the de novo identification of potential regulatory motifs in sets of coordinately regulated genes. The SCOPE algorithm combines the output of three component algorithms, each designed to identify a particular class of motifs. Using an ensemble learning approach, SCOPE identifies the best candidate motifs from its component algorithms. In tests on experimentally determined datasets, SCOPE identified motifs with a significantly higher level of accuracy than a number of other web-based motif finders run with their default parameters. Because SCOPE has no adjustable parameters, the web server has an intuitive interface, requiring only a set of gene names or FASTA sequences and a choice of species. The most significant motifs found by SCOPE are displayed graphically on the main results page with a table containing summary statistics for each motif. Detailed motif information, including the sequence logo, PWM, consensus sequence and specific matching sites can be viewed through a single click on a motif. SCOPE's efficient, parameter-free search strategy has enabled the development of a web server that is readily accessible to the practising biologist while providing results that compare favorably with those of other motif finders. The SCOPE web server is at .
The Thiamin Pyrophosphate-Motif

NASA Technical Reports Server (NTRS)

Dominiak, Paulina M.; Ciszak, Ewa M.

2003-01-01

Using databases the authors have identified a common thiamin pyrophosphate (TPP)-motif in the family of functionally diverse TPP-dependent enzymes. This common motif consists of multimeric organization of subunits, two catalytic centers, common amino acid sequence, and specific contacts to provide a flip-flop, or alternate site, mechanism of action. Each catalytic center [PP:PYR] is formed at the interface of the PP-domain binding the magnesium ion, pyrophosphate and aminopyrimidine ring of TPP, and the PYR-domain binding the aminopyrimidine ring of that cofactor. A pair of these catalytic centers constitutes the catalytic core [PP:PYR]* within these enzymes. Analysis of the structural elements of this catalytic core reveals novel definition of the common amino acid sequences, which are GX@&(G)@XXGQ, and GDGX25-30 within the PP- domain, and the E&(G)@XXG@ within the PYR-domain, where Q, corresponds to a hydrophobic amino acid. This TPP-motif provides a novel tool for annotation of TPP-dependent enzymes useful in advancing functional proteomics.
Hybrid DNA i-motif: Aminoethylprolyl-PNA (pC5) enhance the stability of DNA (dC5) i-motif structure.

PubMed

Gade, Chandrasekhar Reddy; Sharma, Nagendra K

2017-12-15

This report describes the synthesis of C-rich sequence, cytosine pentamer, of aep-PNA and its biophysical studies for the formation of hybrid DNA:aep-PNAi-motif structure with DNA cytosine pentamer (dC 5 ) under acidic pH conditions. Herein, the CD/UV/NMR/ESI-Mass studies strongly support the formation of stable hybrid DNA i-motif structure with aep-PNA even near acidic conditions. Hence aep-PNA C-rich sequence cytosine could be considered as potential DNA i-motif stabilizing agents in vivo conditions. Copyright © 2017 Elsevier Ltd. All rights reserved.
Creation of hybrid nanorods from sequences of natural trimeric fibrous proteins using the fibritin trimerization motif.

PubMed

Papanikolopoulou, Katerina; van Raaij, Mark J; Mitraki, Anna

2008-01-01

Stable, artificial fibrous proteins that can be functionalized open new avenues in fields such as bionanomaterials design and fiber engineering. An important source of inspiration for the creation of such proteins are natural fibrous proteins such as collagen, elastin, insect silks, and fibers from phages and viruses. The fibrous parts of this last class of proteins usually adopt trimeric, beta-stranded structural folds and are appended to globular, receptor-binding domains. It has been recently shown that the globular domains are essential for correct folding and trimerization and can be successfully substituted by a very small (27-amino acid) trimerization motif from phage T4 fibritin. The hybrid proteins are correctly folded nanorods that can withstand extreme conditions. When the fibrous part derives from the adenovirus fiber shaft, different tissue-targeting specificities can be engineered into the hybrid proteins, which therefore can be used as gene therapy vectors. The integration of such stable nanorods in devices is also a big challenge in the field of biomechanical design. The fibritin foldon domain is a versatile trimerization motif and can be combined with a variety of fibrous motifs, such as coiled-coil, collagenous, and triple beta-stranded motifs, provided the appropriate linkers are used. The combination of different motifs within the same fibrous molecule to create stable rods with multiple functions can even be envisioned. We provide a comprehensive overview of the experimental procedures used for designing, creating, and characterizing hybrid fibrous nanorods using the fibritin trimerization motif.
Creation of Hybrid Nanorods From Sequences of Natural Trimeric Fibrous Proteins Using the Fibritin Trimerization Motif

NASA Astrophysics Data System (ADS)

Papanikolopoulou, Katerina; van Raaij, Mark J.; Mitraki, Anna

Stable, artificial fibrous proteins that can be functionalized open new avenues in fields such as bionanomaterials design and fiber engineering. An important source of inspiration for the creation of such proteins are natural fibrous proteins such as collagen, elastin, insect silks, and fibers from phages and viruses. The fibrous parts of this last class of proteins usually adopt trimeric, β-stranded structural folds and are appended to globular, receptor-binding domains. It has been recently shown that the globular domains are essential for correct folding and trimerization and can be successfully substituted by a very small (27-amino acid) trimerization motif from phage T4 fibritin. The hybrid proteins are correctly folded nanorods that can withstand extreme conditions. When the fibrous part derives from the adenovirus fiber shaft, different tissue-targeting specificities can be engineered into the hybrid proteins, which therefore can be used as gene therapy vectors. The integration of such stable nanorods in devices is also a big challenge in the field of biomechanical design. The fibritin foldon domain is a versatile trimerization motif and can be combined with a variety of fibrous motifs, such as coiled-coil, collagenous, and triple β-stranded motifs, provided the appropriate linkers are used. The combination of different motifs within the same fibrous molecule to create stable rods with multiple functions can even be envisioned. We provide a comprehensive overview of the experimental procedures used for designing, creating, and characterizing hybrid fibrous nanorods using the fibritin trimerization motif.
Limitations and potentials of current motif discovery algorithms

PubMed Central

Hu, Jianjun; Li, Bin; Kihara, Daisuke

2005-01-01

Computational methods for de novo identification of gene regulation elements, such as transcription factor binding sites, have proved to be useful for deciphering genetic regulatory networks. However, despite the availability of a large number of algorithms, their strengths and weaknesses are not sufficiently understood. Here, we designed a comprehensive set of performance measures and benchmarked five modern sequence-based motif discovery algorithms using large datasets generated from Escherichia coli RegulonDB. Factors that affect the prediction accuracy, scalability and reliability are characterized. It is revealed that the nucleotide and the binding site level accuracy are very low, while the motif level accuracy is relatively high, which indicates that the algorithms can usually capture at least one correct motif in an input sequence. To exploit diverse predictions from multiple runs of one or more algorithms, a consensus ensemble algorithm has been developed, which achieved 6–45% improvement over the base algorithms by increasing both the sensitivity and specificity. Our study illustrates limitations and potentials of existing sequence-based motif discovery algorithms. Taking advantage of the revealed potentials, several promising directions for further improvements are discussed. Since the sequence-based algorithms are the baseline of most of the modern motif discovery algorithms, this paper suggests substantial improvements would be possible for them. PMID:16284194
An analysis of the positional distribution of DNA motifs in promoter regions and its biological relevance.

PubMed

Casimiro, Ana C; Vinga, Susana; Freitas, Ana T; Oliveira, Arlindo L

2008-02-07

Motif finding algorithms have developed in their ability to use computationally efficient methods to detect patterns in biological sequences. However the posterior classification of the output still suffers from some limitations, which makes it difficult to assess the biological significance of the motifs found. Previous work has highlighted the existence of positional bias of motifs in the DNA sequences, which might indicate not only that the pattern is important, but also provide hints of the positions where these patterns occur preferentially. We propose to integrate position uniformity tests and over-representation tests to improve the accuracy of the classification of motifs. Using artificial data, we have compared three different statistical tests (Chi-Square, Kolmogorov-Smirnov and a Chi-Square bootstrap) to assess whether a given motif occurs uniformly in the promoter region of a gene. Using the test that performed better in this dataset, we proceeded to study the positional distribution of several well known cis-regulatory elements, in the promoter sequences of different organisms (S. cerevisiae, H. sapiens, D. melanogaster, E. coli and several Dicotyledons plants). The results show that position conservation is relevant for the transcriptional machinery. We conclude that many biologically relevant motifs appear heterogeneously distributed in the promoter region of genes, and therefore, that non-uniformity is a good indicator of biological relevance and can be used to complement over-representation tests commonly used. In this article we present the results obtained for the S. cerevisiae data sets.
Helix-packing motifs in membrane proteins.

PubMed

Walters, R F S; DeGrado, W F

2006-09-12

The fold of a helical membrane protein is largely determined by interactions between membrane-imbedded helices. To elucidate recurring helix-helix interaction motifs, we dissected the crystallographic structures of membrane proteins into a library of interacting helical pairs. The pairs were clustered according to their three-dimensional similarity (rmsd motifs whose structural features can be understood in terms of simple principles of helix-helix packing. Thus, the universe of common transmembrane helix-pairing motifs is relatively simple. The largest cluster, which comprises 29% of the library members, consists of an antiparallel motif with left-handed packing angles, and it is frequently stabilized by packing of small side chains occurring every seven residues in the sequence. Right-handed parallel and antiparallel structures show a similar tendency to segregate small residues to the helix-helix interface but spaced at four-residue intervals. Position-specific sequence propensities were derived for the most populated motifs. These structural and sequential motifs should be quite useful for the design and structural prediction of membrane proteins.
RNA 3D Structural Motifs: Definition, Identification, Annotation, and Database Searching

NASA Astrophysics Data System (ADS)

Nasalean, Lorena; Stombaugh, Jesse; Zirbel, Craig L.; Leontis, Neocles B.

Structured RNA molecules resemble proteins in the hierarchical organization of their global structures, folding and broad range of functions. Structured RNAs are composed of recurrent modular motifs that play specific functional roles. Some motifs direct the folding of the RNA or stabilize the folded structure through tertiary interactions. Others bind ligands or proteins or catalyze chemical reactions. Therefore, it is desirable, starting from the RNA sequence, to be able to predict the locations of recurrent motifs in RNA molecules. Conversely, the potential occurrence of one or more known 3D RNA motifs may indicate that a genomic sequence codes for a structured RNA molecule. To identify known RNA structural motifs in new RNA sequences, precise structure-based definitions are needed that specify the core nucleotides of each motif and their conserved interactions. By comparing instances of each recurrent motif and applying base pair isosteriCity relations, one can identify neutral mutations that preserve its structure and function in the contexts in which it occurs.
Classification and assessment tools for structural motif discovery algorithms.

PubMed

Badr, Ghada; Al-Turaiki, Isra; Mathkour, Hassan

2013-01-01

Motif discovery is the problem of finding recurring patterns in biological data. Patterns can be sequential, mainly when discovered in DNA sequences. They can also be structural (e.g. when discovering RNA motifs). Finding common structural patterns helps to gain a better understanding of the mechanism of action (e.g. post-transcriptional regulation). Unlike DNA motifs, which are sequentially conserved, RNA motifs exhibit conservation in structure, which may be common even if the sequences are different. Over the past few years, hundreds of algorithms have been developed to solve the sequential motif discovery problem, while less work has been done for the structural case. In this paper, we survey, classify, and compare different algorithms that solve the structural motif discovery problem, where the underlying sequences may be different. We highlight their strengths and weaknesses. We start by proposing a benchmark dataset and a measurement tool that can be used to evaluate different motif discovery approaches. Then, we proceed by proposing our experimental setup. Finally, results are obtained using the proposed benchmark to compare available tools. To the best of our knowledge, this is the first attempt to compare tools solely designed for structural motif discovery. Results show that the accuracy of discovered motifs is relatively low. The results also suggest a complementary behavior among tools where some tools perform well on simple structures, while other tools are better for complex structures. We have classified and evaluated the performance of available structural motif discovery tools. In addition, we have proposed a benchmark dataset with tools that can be used to evaluate newly developed tools.

Immune Selection In Vitro Reveals Human Immunodeficiency Virus Type 1 Nef Sequence Motifs Important for Its Immune Evasion Function In Vivo

PubMed Central

Lee, Patricia; Ng, Hwee L.; Yang, Otto O.

2012-01-01

Human immunodeficiency virus type 1 (HIV-1) Nef downregulates major histocompatibility complex class I (MHC-I), impairing the clearance of infected cells by CD8+ cytotoxic T lymphocytes (CTLs). While sequence motifs mediating this function have been determined by in vitro mutagenesis studies of laboratory-adapted HIV-1 molecular clones, it is unclear whether the highly variable Nef sequences of primary isolates in vivo rely on the same sequence motifs. To address this issue, nef quasispecies from nine chronically HIV-1-infected persons were examined for sequence evolution and altered MHC-I downregulatory function under Gag-specific CTL immune pressure in vitro. This selection resulted in decreased nef diversity and strong purifying selection. Site-by-site analysis identified 13 codons undergoing purifying selection and 1 undergoing positive selection. Of the former, only 6 have been reported to have roles in Nef function, including 4 associated with MHC-I downregulation. Functional testing of naturally occurring in vivo polymorphisms at the 7 sites with no previously known functional role revealed 3 mutations (A84D, Y135F, and G140R) that ablated MHC-I downregulation and 3 (N52A, S169I, and V180E) that partially impaired MHC-I downregulation. Globally, the CTL pressure in vitro selected functional Nef from the in vivo quasispecies mixtures that predominately lacked MHC-I downregulatory function at the baseline. Overall, these data demonstrate that CTL pressure exerts a strong purifying selective pressure for MHC-I downregulation and identifies novel functional motifs present in Nef sequences in vivo. PMID:22553319
iFORM: Incorporating Find Occurrence of Regulatory Motifs.

PubMed

Ren, Chao; Chen, Hebing; Yang, Bite; Liu, Feng; Ouyang, Zhangyi; Bo, Xiaochen; Shu, Wenjie

2016-01-01

Accurately identifying the binding sites of transcription factors (TFs) is crucial to understanding the mechanisms of transcriptional regulation and human disease. We present incorporating Find Occurrence of Regulatory Motifs (iFORM), an easy-to-use and efficient tool for scanning DNA sequences with TF motifs described as position weight matrices (PWMs). Both performance assessment with a receiver operating characteristic (ROC) curve and a correlation-based approach demonstrated that iFORM achieves higher accuracy and sensitivity by integrating five classical motif discovery programs using Fisher's combined probability test. We have used iFORM to provide accurate results on a variety of data in the ENCODE Project and the NIH Roadmap Epigenomics Project, and the tool has demonstrated its utility in further elucidating individual roles of functional elements. Both the source and binary codes for iFORM can be freely accessed at https://github.com/wenjiegroup/iFORM. The identified TF binding sites across human cell and tissue types using iFORM have been deposited in the Gene Expression Omnibus under the accession ID GSE53962.
Unitary circular code motifs in genomes of eukaryotes.

PubMed

El Soufi, Karim; Michel, Christian J

A set X of 20 trinucleotides was identified in genes of bacteria, eukaryotes, plasmids and viruses, which has in average the highest occurrence in reading frame compared to its two shifted frames (Michel, 2015; Arquès and Michel, 1996). This set X has an interesting mathematical property as X is a circular code (Arquès and Michel, 1996). Thus, the motifs from this circular code X, called X motifs, have the property to always retrieve, synchronize and maintain the reading frame in genes. The origin of this circular code X in genes is an open problem since its discovery in 1996. Here, we first show that the unitary circular codes (UCC), i.e. sets of one word, allow to generate unitary circular code motifs (UCC motifs), i.e. a concatenation of the same motif (simple repeats) leading to low complexity DNA. Three classes of UCC motifs are studied here: repeated dinucleotides (D + motifs), repeated trinucleotides (T + motifs) and repeated tetranucleotides (T + motifs). Thus, the D + , T + and T + motifs allow to retrieve, synchronize and maintain a frame modulo 2, modulo 3 and modulo 4, respectively, and their shifted frames (1 modulo 2; 1 and 2 modulo 3; 1, 2 and 3 modulo 4 according to the C 2 , C 3 and C 4 properties, respectively) in the DNA sequences. The statistical distribution of the D + , T + and T + motifs is analyzed in the genomes of eukaryotes. A UCC motif and its comp lementary UCC motif have the same distribution in the eukaryotic genomes. Furthermore, a UCC motif and its complementary UCC motif have increasing occurrences contrary to their number of hydrogen bonds, very significant with the T + motifs. The longest D + , T + and T + motifs in the studied eukaryotic genomes are also given. Surprisingly, a scarcity of repeated trinucleotides (T + motifs) in the large eukaryotic genomes is observed compared to the D + and T + motifs. This result has been investigated and may be explained by two outcomes. Repeated trinucleotides (T + motifs) are identified
De Novo Regulatory Motif Discovery Identifies Significant Motifs in Promoters of Five Classes of Plant Dehydrin Genes.

PubMed

Zolotarov, Yevgen; Strömvik, Martina

2015-01-01

Plants accumulate dehydrins in response to osmotic stresses. Dehydrins are divided into five different classes, which are thought to be regulated in different manners. To better understand differences in transcriptional regulation of the five dehydrin classes, de novo motif discovery was performed on 350 dehydrin promoter sequences from a total of 51 plant genomes. Overrepresented motifs were identified in the promoters of five dehydrin classes. The Kn dehydrin promoters contain motifs linked with meristem specific expression, as well as motifs linked with cold/dehydration and abscisic acid response. KS dehydrin promoters contain a motif with a GATA core. SKn and YnSKn dehydrin promoters contain motifs that match elements connected with cold/dehydration, abscisic acid and light response. YnKn dehydrin promoters contain motifs that match abscisic acid and light response elements, but not cold/dehydration response elements. Conserved promoter motifs are present in the dehydrin classes and across different plant lineages, indicating that dehydrin gene regulation is likely also conserved.
CircularLogo: A lightweight web application to visualize intra-motif dependencies.

PubMed

Ye, Zhenqing; Ma, Tao; Kalmbach, Michael T; Dasari, Surendra; Kocher, Jean-Pierre A; Wang, Liguo

2017-05-22

The sequence logo has been widely used to represent DNA or RNA motifs for more than three decades. Despite its intelligibility and intuitiveness, the traditional sequence logo is unable to display the intra-motif dependencies and therefore is insufficient to fully characterize nucleotide motifs. Many methods have been developed to quantify the intra-motif dependencies, but fewer tools are available for visualization. We developed CircularLogo, a web-based interactive application, which is able to not only visualize the position-specific nucleotide consensus and diversity but also display the intra-motif dependencies. Applying CircularLogo to HNF6 binding sites and tRNA sequences demonstrated its ability to show intra-motif dependencies and intuitively reveal biomolecular structure. CircularLogo is implemented in JavaScript and Python based on the Django web framework. The program's source code and user's manual are freely available at http://circularlogo.sourceforge.net . CircularLogo web server can be accessed from http://bioinformaticstools.mayo.edu/circularlogo/index.html . CircularLogo is an innovative web application that is specifically designed to visualize and interactively explore intra-motif dependencies.
A Bioinformatics Approach for Detecting Repetitive Nested Motifs using Pattern Matching.

PubMed

Romero, José R; Carballido, Jessica A; Garbus, Ingrid; Echenique, Viviana C; Ponzoni, Ignacio

2016-01-01

The identification of nested motifs in genomic sequences is a complex computational problem. The detection of these patterns is important to allow the discovery of transposable element (TE) insertions, incomplete reverse transcripts, deletions, and/or mutations. In this study, a de novo strategy for detecting patterns that represent nested motifs was designed based on exhaustive searches for pairs of motifs and combinatorial pattern analysis. These patterns can be grouped into three categories, motifs within other motifs, motifs flanked by other motifs, and motifs of large size. The methodology used in this study, applied to genomic sequences from the plant species Aegilops tauschii and Oryza sativa , revealed that it is possible to identify putative nested TEs by detecting these three types of patterns. The results were validated through BLAST alignments, which revealed the efficacy and usefulness of the new method, which is called Mamushka.
Transcriptional regulation of human eosinophil RNases by an evolutionary- conserved sequence motif in primate genome

PubMed Central

Wang, Hsiu-Yu; Chang, Hao-Teng; Pai, Tun-Wen; Wu, Chung-I; Lee, Yuan-Hung; Chang, Yen-Hsin; Tai, Hsiu-Ling; Tang, Chuan-Yi; Chou, Wei-Yao; Chang, Margaret Dah-Tsyr

2007-01-01

Background Human eosinophil-derived neurotoxin (edn) and eosinophil cationic protein (ecp) are members of a subfamily of primate ribonuclease (rnase) genes. Although they are generated by gene duplication event, distinct edn and ecp expression profile in various tissues have been reported. Results In this study, we obtained the upstream promoter sequences of several representative primate eosinophil rnases. Bioinformatic analysis revealed the presence of a shared 34-nucleotide (nt) sequence stretch located at -81 to -48 in all edn promoters and macaque ecp promoter. Such a unique sequence motif constituted a region essential for transactivation of human edn in hepatocellular carcinoma cells. Gel electrophoretic mobility shift assay, transient transfection and scanning mutagenesis experiments allowed us to identify binding sites for two transcription factors, Myc-associated zinc finger protein (MAZ) and SV-40 protein-1 (Sp1), within the 34-nt segment. Subsequent in vitro and in vivo binding assays demonstrated a direct molecular interaction between this 34-nt region and MAZ and Sp1. Interestingly, overexpression of MAZ and Sp1 respectively repressed and enhanced edn promoter activity. The regulatory transactivation motif was mapped to the evolutionarily conserved -74/-65 region of the edn promoter, which was guanidine-rich and critical for recognition by both transcription factors. Conclusion Our results provide the first direct evidence that MAZ and Sp1 play important roles on the transcriptional activation of the human edn promoter through specific binding to a 34-nt segment present in representative primate eosinophil rnase promoters. PMID:17927842
SARNAclust: Semi-automatic detection of RNA protein binding motifs from immunoprecipitation data

PubMed Central

Dotu, Ivan; Adamson, Scott I.; Coleman, Benjamin; Fournier, Cyril; Ricart-Altimiras, Emma; Eyras, Eduardo

2018-01-01

RNA-protein binding is critical to gene regulation, controlling fundamental processes including splicing, translation, localization and stability, and aberrant RNA-protein interactions are known to play a role in a wide variety of diseases. However, molecular understanding of RNA-protein interactions remains limited; in particular, identification of RNA motifs that bind proteins has long been challenging, especially when such motifs depend on both sequence and structure. Moreover, although RNA binding proteins (RBPs) often contain more than one binding domain, algorithms capable of identifying more than one binding motif simultaneously have not been developed. In this paper we present a novel pipeline to determine binding peaks in crosslinking immunoprecipitation (CLIP) data, to discover multiple possible RNA sequence/structure motifs among them, and to experimentally validate such motifs. At the core is a new semi-automatic algorithm SARNAclust, the first unsupervised method to identify and deconvolve multiple sequence/structure motifs simultaneously. SARNAclust computes similarity between sequence/structure objects using a graph kernel, providing the ability to isolate the impact of specific features through the bulge graph formalism. Application of SARNAclust to synthetic data shows its capability of clustering 5 motifs at once with a V-measure value of over 0.95, while GraphClust achieves only a V-measure of 0.083 and RNAcontext cannot detect any of the motifs. When applied to existing eCLIP sets, SARNAclust finds known motifs for SLBP and HNRNPC and novel motifs for several other RBPs such as AGGF1, AKAP8L and ILF3. We demonstrate an experimental validation protocol, a targeted Bind-n-Seq-like high-throughput sequencing approach that relies on RNA inverse folding for oligo pool design, that can validate the components within the SLBP motif. Finally, we use this protocol to experimentally interrogate the SARNAclust motif predictions for protein ILF3. Our
Identification of a novel calcium binding motif based on the detection of sequence insertions in the animal peroxidase domain of bacterial proteins.

PubMed

Santamaría-Hernando, Saray; Krell, Tino; Ramos-González, María-Isabel

2012-01-01

Proteins of the animal heme peroxidase (ANP) superfamily differ greatly in size since they have either one or two catalytic domains that match profile PS50292. The orf PP_2561 of Pseudomonas putida KT2440 that we have called PepA encodes a two-domain ANP. The alignment of these domains with those of PepA homologues revealed a variable number of insertions with the consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D. This motif has also been detected in the structure of pseudopilin (pdb 3G20), where it was found to be involved in Ca(2+) coordination although a sequence analysis did not reveal the presence of any known calcium binding motifs in this protein. Isothermal titration calorimetry revealed that a peptide containing this consensus motif bound specifically calcium ions with affinities ranging between 33-79 µM depending on the pH. Microcalorimetric titrations of the purified N-terminal ANP-like domain of PepA revealed Ca(2+) binding with a K(D) of 12 µM and stoichiometry of 1.25 calcium ions per protein monomer. This domain exhibited peroxidase activity after its reconstitution with heme. These data led to the definition of a novel calcium binding motif that we have termed PERCAL and which was abundantly present in animal peroxidase-like domains of bacterial proteins. Bacterial heme peroxidases thus possess two different types of calcium binding motifs, namely PERCAL and the related hemolysin type calcium binding motif, with the latter being located outside the catalytic domains and in their C-terminal end. A phylogenetic tree of ANP-like catalytic domains of bacterial proteins with PERCAL motifs, including single domain peroxidases, was divided into two major clusters, representing domains with and without PERCAL motif containing insertions. We have verified that the recently reported classification of bacterial heme peroxidases in two families (cd09819 and cd09821) is unrelated to these insertions. Sequences matching PERCAL were detected in all kingdoms of life.
Selection of functional 2A sequences within foot-and-mouth disease virus; requirements for the NPGP motif with a distinct codon bias.

PubMed

Kjær, Jonas; Belsham, Graham J

2018-01-01

Foot-and-mouth disease virus (FMDV) has a positive-sense ssRNA genome including a single, large, open reading frame. Splitting of the encoded polyprotein at the 2A/2B junction is mediated by the 2A peptide (18 residues long), which induces a nonproteolytic, cotranslational "cleavage" at its own C terminus. A conserved feature among variants of 2A is the C-terminal motif N 16 P 17 G 18 /P 19 , where P 19 is the first residue of 2B. It has been shown previously that certain amino acid substitutions can be tolerated at residues E 14 , S 15 , and N 16 within the 2A sequence of infectious FMDVs, but no variants at residues P 17 , G 18 , or P 19 have been identified. In this study, using highly degenerate primers, we analyzed if any other residues can be present at each position of the NPG/P motif within infectious FMDV. No alternative forms of this motif were found to be encoded by rescued FMDVs after two, three, or four passages. However, surprisingly, a clear codon preference for the wt nucleotide sequence encoding the NPGP motif within these viruses was observed. Indeed, the codons selected to code for P 17 and P 19 within this motif were distinct; thus the synonymous codons are not equivalent. © 2018 Kjær and Belsham; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
MOCCS: Clarifying DNA-binding motif ambiguity using ChIP-Seq data.

PubMed

Ozaki, Haruka; Iwasaki, Wataru

2016-08-01

As a key mechanism of gene regulation, transcription factors (TFs) bind to DNA by recognizing specific short sequence patterns that are called DNA-binding motifs. A single TF can accept ambiguity within its DNA-binding motifs, which comprise both canonical (typical) and non-canonical motifs. Clarification of such DNA-binding motif ambiguity is crucial for revealing gene regulatory networks and evaluating mutations in cis-regulatory elements. Although chromatin immunoprecipitation sequencing (ChIP-seq) now provides abundant data on the genomic sequences to which a given TF binds, existing motif discovery methods are unable to directly answer whether a given TF can bind to a specific DNA-binding motif. Here, we report a method for clarifying the DNA-binding motif ambiguity, MOCCS. Given ChIP-Seq data of any TF, MOCCS comprehensively analyzes and describes every k-mer to which that TF binds. Analysis of simulated datasets revealed that MOCCS is applicable to various ChIP-Seq datasets, requiring only a few minutes per dataset. Application to the ENCODE ChIP-Seq datasets proved that MOCCS directly evaluates whether a given TF binds to each DNA-binding motif, even if known position weight matrix models do not provide sufficient information on DNA-binding motif ambiguity. Furthermore, users are not required to provide numerous parameters or background genomic sequence models that are typically unavailable. MOCCS is implemented in Perl and R and is freely available via https://github.com/yuifu/moccs. By complementing existing motif-discovery software, MOCCS will contribute to the basic understanding of how the genome controls diverse cellular processes via DNA-protein interactions. Copyright © 2016 Elsevier Ltd. All rights reserved.
Temporal motifs reveal homophily, gender-specific patterns, and group talk in call sequences.

PubMed

Kovanen, Lauri; Kaski, Kimmo; Kertész, János; Saramäki, Jari

2013-11-05

Recent studies on electronic communication records have shown that human communication has complex temporal structure. We study how communication patterns that involve multiple individuals are affected by attributes such as sex and age. To this end, we represent the communication records as a colored temporal network where node color is used to represent individuals' attributes, and identify patterns known as temporal motifs. We then construct a null model for the occurrence of temporal motifs that takes into account the interaction frequencies and connectivity between nodes of different colors. This null model allows us to detect significant patterns in call sequences that cannot be observed in a static network that uses interaction frequencies as link weights. We find sex-related differences in communication patterns in a large dataset of mobile phone records and show the existence of temporal homophily, the tendency of similar individuals to participate in communication patterns beyond what would be expected on the basis of their average interaction frequencies. We also show that temporal patterns differ between dense and sparse neighborhoods in the network. Because also this result is independent of interaction frequencies, it can be seen as an extension of Granovetter's hypothesis to temporal networks.
Searching RNA motifs and their intermolecular contacts with constraint networks.

PubMed

Thébault, P; de Givry, S; Schiex, T; Gaspin, C

2006-09-01

Searching RNA gene occurrences in genomic sequences is a task whose importance has been renewed by the recent discovery of numerous functional RNA, often interacting with other ligands. Even if several programs exist for RNA motif search, none exists that can represent and solve the problem of searching for occurrences of RNA motifs in interaction with other molecules. We present a constraint network formulation of this problem. RNA are represented as structured motifs that can occur on more than one sequence and which are related together by possible hybridization. The implemented tool MilPat is used to search for several sRNA families in genomic sequences. Results show that MilPat allows to efficiently search for interacting motifs in large genomic sequences and offers a simple and extensible framework to solve such problems. New and known sRNA are identified as H/ACA candidates in Methanocaldococcus jannaschii. http://carlit.toulouse.inra.fr/MilPaT/MilPat.pl.
Fast social-like learning of complex behaviors based on motor motifs.

PubMed

Calvo Tapia, Carlos; Tyukin, Ivan Y; Makarov, Valeri A

2018-05-01

Social learning is widely observed in many species. Less experienced agents copy successful behaviors exhibited by more experienced individuals. Nevertheless, the dynamical mechanisms behind this process remain largely unknown. Here we assume that a complex behavior can be decomposed into a sequence of n motor motifs. Then a neural network capable of activating motor motifs in a given sequence can drive an agent. To account for (n-1)! possible sequences of motifs in a neural network, we employ the winnerless competition approach. We then consider a teacher-learner situation: one agent exhibits a complex movement, while another one aims at mimicking the teacher's behavior. Despite the huge variety of possible motif sequences we show that the learner, equipped with the provided learning model, can rewire "on the fly" its synaptic couplings in no more than (n-1) learning cycles and converge exponentially to the durations of the teacher's motifs. We validate the learning model on mobile robots. Experimental results show that the learner is indeed capable of copying the teacher's behavior composed of six motor motifs in a few learning cycles. The reported mechanism of learning is general and can be used for replicating different functions, including, for example, sound patterns or speech.
Fast social-like learning of complex behaviors based on motor motifs

NASA Astrophysics Data System (ADS)

Calvo Tapia, Carlos; Tyukin, Ivan Y.; Makarov, Valeri A.

2018-05-01

Social learning is widely observed in many species. Less experienced agents copy successful behaviors exhibited by more experienced individuals. Nevertheless, the dynamical mechanisms behind this process remain largely unknown. Here we assume that a complex behavior can be decomposed into a sequence of n motor motifs. Then a neural network capable of activating motor motifs in a given sequence can drive an agent. To account for (n -1 )! possible sequences of motifs in a neural network, we employ the winnerless competition approach. We then consider a teacher-learner situation: one agent exhibits a complex movement, while another one aims at mimicking the teacher's behavior. Despite the huge variety of possible motif sequences we show that the learner, equipped with the provided learning model, can rewire "on the fly" its synaptic couplings in no more than (n -1 ) learning cycles and converge exponentially to the durations of the teacher's motifs. We validate the learning model on mobile robots. Experimental results show that the learner is indeed capable of copying the teacher's behavior composed of six motor motifs in a few learning cycles. The reported mechanism of learning is general and can be used for replicating different functions, including, for example, sound patterns or speech.
RSAT 2018: regulatory sequence analysis tools 20th anniversary.

PubMed

Nguyen, Nga Thi Thuy; Contreras-Moreira, Bruno; Castro-Mondragon, Jaime A; Santana-Garcia, Walter; Ossio, Raul; Robles-Espinoza, Carla Daniela; Bahin, Mathieu; Collombet, Samuel; Vincens, Pierre; Thieffry, Denis; van Helden, Jacques; Medina-Rivera, Alejandra; Thomas-Chollier, Morgane

2018-05-02

RSAT (Regulatory Sequence Analysis Tools) is a suite of modular tools for the detection and the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, including from genome-wide datasets like ChIP-seq/ATAC-seq, (ii) motif scanning, (iii) motif analysis (quality assessment, comparisons and clustering), (iv) analysis of regulatory variations, (v) comparative genomics. Six public servers jointly support 10 000 genomes from all kingdoms. Six novel or refactored programs have been added since the 2015 NAR Web Software Issue, including updated programs to analyse regulatory variants (retrieve-variation-seq, variation-scan, convert-variations), along with tools to extract sequences from a list of coordinates (retrieve-seq-bed), to select motifs from motif collections (retrieve-matrix), and to extract orthologs based on Ensembl Compara (get-orthologs-compara). Three use cases illustrate the integration of new and refactored tools to the suite. This Anniversary update gives a 20-year perspective on the software suite. RSAT is well-documented and available through Web sites, SOAP/WSDL (Simple Object Access Protocol/Web Services Description Language) web services, virtual machines and stand-alone programs at http://www.rsat.eu/.
Efficient farnesylation of an extended C-terminal C(x)3X sequence motif expands the scope of the prenylated proteome.

PubMed

Blanden, Melanie J; Suazo, Kiall F; Hildebrandt, Emily R; Hardgrove, Daniel S; Patel, Meet; Saunders, William P; Distefano, Mark D; Schmidt, Walter K; Hougland, James L

2018-02-23

Protein prenylation is a post-translational modification that has been most commonly associated with enabling protein trafficking to and interaction with cellular membranes. In this process, an isoprenoid group is attached to a cysteine near the C terminus of a substrate protein by protein farnesyltransferase (FTase) or protein geranylgeranyltransferase type I or II (GGTase-I and GGTase-II). FTase and GGTase-I have long been proposed to specifically recognize a four-amino acid C AAX C-terminal sequence within their substrates. Surprisingly, genetic screening reveals that yeast FTase can modify sequences longer than the canonical C AAX sequence, specifically C( x ) 3 X sequences with four amino acids downstream of the cysteine. Biochemical and cell-based studies using both peptide and protein substrates reveal that mammalian FTase orthologs can also prenylate C( x ) 3 X sequences. As the search to identify physiologically relevant C( x ) 3 X proteins begins, this new prenylation motif nearly doubles the number of proteins within the yeast and human proteomes that can be explored as potential FTase substrates. This work expands our understanding of prenylation's impact within the proteome, establishes the biologically relevant reactivity possible with this new motif, and opens new frontiers in determining the impact of non-canonically prenylated proteins on cell function. © 2018 by The American Society for Biochemistry and Molecular Biology, Inc.
Exceptional motifs in different Markov chain models for a statistical analysis of DNA sequences.

PubMed

Schbath, S; Prum, B; de Turckheim, E

1995-01-01

Identifying exceptional motifs is often used for extracting information from long DNA sequences. The two difficulties of the method are the choice of the model that defines the expected frequencies of words and the approximation of the variance of the difference T(W) between the number of occurrences of a word W and its estimation. We consider here different Markov chain models, either with stationary or periodic transition probabilities. We estimate the variance of the difference T(W) by the conditional variance of the number of occurrences of W given the oligonucleotides counts that define the model. Two applications show how to use asymptotically standard normal statistics associated with the counts to describe a given sequence in terms of its outlying words. Sequences of Escherichia coli and of Bacillus subtilis are compared with respect to their exceptional tri- and tetranucleotides. For both bacteria, exceptional 3-words are mainly found in the coding frame. E. coli palindrome counts are analyzed in different models, showing that many overabundant words are one-letter mutations of avoided palindromes.
Temporal motifs reveal homophily, gender-specific patterns, and group talk in call sequences

PubMed Central

Kovanen, Lauri; Kaski, Kimmo; Kertész, János; Saramäki, Jari

2013-01-01

Recent studies on electronic communication records have shown that human communication has complex temporal structure. We study how communication patterns that involve multiple individuals are affected by attributes such as sex and age. To this end, we represent the communication records as a colored temporal network where node color is used to represent individuals’ attributes, and identify patterns known as temporal motifs. We then construct a null model for the occurrence of temporal motifs that takes into account the interaction frequencies and connectivity between nodes of different colors. This null model allows us to detect significant patterns in call sequences that cannot be observed in a static network that uses interaction frequencies as link weights. We find sex-related differences in communication patterns in a large dataset of mobile phone records and show the existence of temporal homophily, the tendency of similar individuals to participate in communication patterns beyond what would be expected on the basis of their average interaction frequencies. We also show that temporal patterns differ between dense and sparse neighborhoods in the network. Because also this result is independent of interaction frequencies, it can be seen as an extension of Granovetter’s hypothesis to temporal networks. PMID:24145424
Deciphering functional glycosaminoglycan motifs in development.

PubMed

Townley, Robert A; Bülow, Hannes E

2018-03-23

Glycosaminoglycans (GAGs) such as heparan sulfate, chondroitin/dermatan sulfate, and keratan sulfate are linear glycans, which when attached to protein backbones form proteoglycans. GAGs are essential components of the extracellular space in metazoans. Extensive modifications of the glycans such as sulfation, deacetylation and epimerization create structural GAG motifs. These motifs regulate protein-protein interactions and are thereby repsonsible for many of the essential functions of GAGs. This review focusses on recent genetic approaches to characterize GAG motifs and their function in defined signaling pathways during development. We discuss a coding approach for GAGs that would enable computational analyses of GAG sequences such as alignments and the computation of position weight matrices to describe GAG motifs. Copyright © 2018 Elsevier Ltd. All rights reserved.

SALAD database: a motif-based database of protein annotations for plant comparative genomics

PubMed Central

Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi

2010-01-01

Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209 529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named ‘SALAD on ARRAYs’ to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis. PMID:19854933
SALAD database: a motif-based database of protein annotations for plant comparative genomics.

PubMed

Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi

2010-01-01

Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209,529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named 'SALAD on ARRAYs' to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis.
Convergent evolution and mimicry of protein linear motifs in host-pathogen interactions.

PubMed

Chemes, Lucía Beatriz; de Prat-Gay, Gonzalo; Sánchez, Ignacio Enrique

2015-06-01

Pathogen linear motif mimics are highly evolvable elements that facilitate rewiring of host protein interaction networks. Host linear motifs and pathogen mimics differ in sequence, leading to thermodynamic and structural differences in the resulting protein-protein interactions. Moreover, the functional output of a mimic depends on the motif and domain repertoire of the pathogen protein. Regulatory evolution mediated by linear motifs can be understood by measuring evolutionary rates, quantifying positive and negative selection and performing phylogenetic reconstructions of linear motif natural history. Convergent evolution of linear motif mimics is widespread among unrelated proteins from viral, prokaryotic and eukaryotic pathogens and can also take place within individual protein phylogenies. Statistics, biochemistry and laboratory models of infection link pathogen linear motifs to phenotypic traits such as tropism, virulence and oncogenicity. In vitro evolution experiments and analysis of natural sequences suggest that changes in linear motif composition underlie pathogen adaptation to a changing environment. Copyright © 2015 Elsevier Ltd. All rights reserved.
Finding specific RNA motifs: Function in a zeptomole world?

PubMed Central

KNIGHT, ROB; YARUS, MICHAEL

2003-01-01

We have developed a new method for estimating the abundance of any modular (piecewise) RNA motif within a longer random region. We have used this method to estimate the size of the active motifs available to modern SELEX experiments (picomoles of unique sequences) and to a plausible RNA World (zeptomoles of unique sequences: 1 zmole = 602 sequences). Unexpectedly, activities such as specific isoleucine binding are almost certainly present in zeptomoles of molecules, and even ribozymes such as self-cleavage motifs may appear (depending on assumptions about the minimal structures). The number of specified nucleotides is not the only important determinant of a motif’s rarity: The number of modules into which it is divided, and the details of this division, are also crucial. We propose three maxims for easily isolated motifs: the Maxim of Minimization, the Maxim of Multiplicity, and the Maxim of the Median. These maxims together state that selected motifs should be small and composed of as many separate, equally sized modules as possible. For evenly divided motifs with four modules, the largest accessible activity in picomole scale (1–1000 pmole) pools of length 100 is about 34 nucleotides; while for zeptomole scale (1–1000 zmole) pools it is about 20 specific nucleotides (50% probability of occurrence). This latter figure includes some ribozymes and aptamers. Consequently, an RNA metabolism apparently could have begun with only zeptomoles of RNA molecules. PMID:12554865
D-MATRIX: A web tool for constructing weight matrix of conserved DNA motifs

PubMed Central

Sen, Naresh; Mishra, Manoj; Khan, Feroz; Meena, Abha; Sharma, Ashok

2009-01-01

Despite considerable efforts to date, DNA motif prediction in whole genome remains a challenge for researchers. Currently the genome wide motif prediction tools required either direct pattern sequence (for single motif) or weight matrix (for multiple motifs). Although there are known motif pattern databases and tools for genome level prediction but no tool for weight matrix construction. Considering this, we developed a D-MATRIX tool which predicts the different types of weight matrix based on user defined aligned motif sequence set and motif width. For retrieval of known motif sequences user can access the commonly used databases such as TFD, RegulonDB, DBTBS, Transfac. DMATRIX program uses a simple statistical approach for weight matrix construction, which can be converted into different file formats according to user requirement. It provides the possibility to identify the conserved motifs in the coregulated genes or whole genome. As example, we successfully constructed the weight matrix of LexA transcription factor binding site with the help of known sosbox cisregulatory elements in Deinococcus radiodurans genome. The algorithm is implemented in C-Sharp and wrapped in ASP.Net to maintain a user friendly web interface. DMATRIX tool is accessible through the CIMAP domain network. Availability http://203.190.147.116/dmatrix/ PMID:19759861
D-MATRIX: a web tool for constructing weight matrix of conserved DNA motifs.

PubMed

Sen, Naresh; Mishra, Manoj; Khan, Feroz; Meena, Abha; Sharma, Ashok

2009-07-27

Despite considerable efforts to date, DNA motif prediction in whole genome remains a challenge for researchers. Currently the genome wide motif prediction tools required either direct pattern sequence (for single motif) or weight matrix (for multiple motifs). Although there are known motif pattern databases and tools for genome level prediction but no tool for weight matrix construction. Considering this, we developed a D-MATRIX tool which predicts the different types of weight matrix based on user defined aligned motif sequence set and motif width. For retrieval of known motif sequences user can access the commonly used databases such as TFD, RegulonDB, DBTBS, Transfac. D-MATRIX program uses a simple statistical approach for weight matrix construction, which can be converted into different file formats according to user requirement. It provides the possibility to identify the conserved motifs in the co-regulated genes or whole genome. As example, we successfully constructed the weight matrix of LexA transcription factor binding site with the help of known sos-box cis-regulatory elements in Deinococcus radiodurans genome. The algorithm is implemented in C-Sharp and wrapped in ASP.Net to maintain a user friendly web interface. D-MATRIX tool is accessible through the CIMAP domain network. http://203.190.147.116/dmatrix/
Identification of cancer-specific motifs in mimotope profiles of serum antibody repertoire.

PubMed

Gerasimov, Ekaterina; Zelikovsky, Alex; Măndoiu, Ion; Ionov, Yurij

2017-06-07

For fighting cancer, earlier detection is crucial. Circulating auto-antibodies produced by the patient's own immune system after exposure to cancer proteins are promising bio-markers for the early detection of cancer. Since an antibody recognizes not the whole antigen but 4-7 critical amino acids within the antigenic determinant (epitope), the whole proteome can be represented by a random peptide phage display library. This opens the possibility to develop an early cancer detection test based on a set of peptide sequences identified by comparing cancer patients' and healthy donors' global peptide profiles of antibody specificities. Due to the enormously large number of peptide sequences contained in global peptide profiles generated by next generation sequencing, the large number of cancer and control sera is required to identify cancer-specific peptides with high degree of statistical significance. To decrease the number of peptides in profiles generated by nextgen sequencing without losing cancer-specific sequences we used for generation of profiles the phage library enriched by panning on the pool of cancer sera. To further decrease the complexity of profiles we used computational methods for transforming a list of peptides constituting the mimotope profiles to the list motifs formed by similar peptide sequences. We have shown that the amino-acid order is meaningful in mimotope motifs since they contain significantly more peptides than motifs among peptides where amino-acids are randomly permuted. Also the single sample motifs significantly differ from motifs in peptides drawn from multiple samples. Finally, multiple cancer-specific motifs have been identified.
Evidence for the Concerted Evolution between Short Linear Protein Motifs and Their Flanking Regions

PubMed Central

Chica, Claudia; Diella, Francesca; Gibson, Toby J.

2009-01-01

Background Linear motifs are short modules of protein sequences that play a crucial role in mediating and regulating many protein–protein interactions. The function of linear motifs strongly depends on the context, e.g. functional instances mainly occur inside flexible regions that are accessible for interaction. Sometimes linear motifs appear as isolated islands of conservation in multiple sequence alignments. However, they also occur in larger blocks of sequence conservation, suggesting an active role for the neighbouring amino acids. Results The evolution of regions flanking 116 functional linear motif instances was studied. The conservation of the amino acid sequence and order/disorder tendency of those regions was related to presence/absence of the instance. For the majority of the analysed instances, the pairs of sequences conserving the linear motif were also observed to maintain a similar local structural tendency and/or to have higher local sequence conservation when compared to pairs of sequences where one is missing the linear motif. Furthermore, those instances have a higher chance to co–evolve with the neighbouring residues in comparison to the distant ones. Those findings are supported by examples where the regulation of the linear motif–mediated interaction has been shown to depend on the modifications (e.g. phosphorylation) at neighbouring positions or is thought to benefit from the binding versatility of disordered regions. Conclusion The results suggest that flanking regions are relevant for linear motif–mediated interactions, both at the structural and sequence level. More interestingly, they indicate that the prediction of linear motif instances can be enriched with contextual information by performing a sequence analysis similar to the one presented here. This can facilitate the understanding of the role of these predicted instances in determining the protein function inside the broader context of the cellular network where they arise
Factoring local sequence composition in motif significance analysis.

PubMed

Ng, Patrick; Keich, Uri

2008-01-01

We recently introduced a biologically realistic and reliable significance analysis of the output of a popular class of motif finders. In this paper we further improve our significance analysis by incorporating local base composition information. Relying on realistic biological data simulation, as well as on FDR analysis applied to real data, we show that our method is significantly better than the increasingly popular practice of using the normal approximation to estimate the significance of a finder's output. Finally we turn to leveraging our reliable significance analysis to improve the actual motif finding task. Specifically, endowing a variant of the Gibbs Sampler with our improved significance analysis we demonstrate that de novo finders can perform better than has been perceived. Significantly, our new variant outperforms all the finders reviewed in a recently published comprehensive analysis of the Harbison genome-wide binding location data. Interestingly, many of these finders incorporate additional information such as nucleosome positioning and the significance of binding data.
The Thiamine-Pyrophosphate-Motif

NASA Technical Reports Server (NTRS)

Ciszak, Ewa; Dominiak, Paulina

2004-01-01

Thiamin pyrophosphate (TPP), a derivative of vitamin B1, is a cofactor for enzymes performing catalysis in pathways of energy production including the well known decarboxylation of a-keto acid dehydrogenases followed by transketolation. TPP-dependent enzymes constitute a structurally and functionally diverse group exhibiting multimeric subunit organization, multiple domains and two chemically equivalent catalytic centers. Annotation of functional TPP-dependcnt enzymes, therefore, has not been trivial due to low sequence similarity related to this complex organization. Our approach to analysis of structures of known TPP-dependent enzymes reveals for the first time features common to this group, which we have termed the TPP-motif. The TPP-motif consists of specific spatial arrangements of structural elements and their specific contacts to provide for a flip-flop, or alternate site, enzymatic mechanism of action. Analysis of structural elements entrained in the flip-flop action displayed by TPP-dependent enzymes reveals a novel definition of the common amino acid sequences. These sequences allow for annotation of TPP-dependent enzymes, thus advancing functional proteomics. Further details of three-dimensional structures of TPP-dependent enzymes will be discussed.
A three-dimensional RNA motif in Potato spindle tuber viroid mediates trafficking from palisade mesophyll to spongy mesophyll in Nicotiana benthamiana.

PubMed

Takeda, Ryuta; Petrov, Anton I; Leontis, Neocles B; Ding, Biao

2011-01-01

Cell-to-cell trafficking of RNA is an emerging biological principle that integrates systemic gene regulation, viral infection, antiviral response, and cell-to-cell communication. A key mechanistic question is how an RNA is specifically selected for trafficking from one type of cell into another type. Here, we report the identification of an RNA motif in Potato spindle tuber viroid (PSTVd) required for trafficking from palisade mesophyll to spongy mesophyll in Nicotiana benthamiana leaves. This motif, called loop 6, has the sequence 5'-CGA-3'...5'-GAC-3' flanked on both sides by cis Watson-Crick G/C and G/U wobble base pairs. We present a three-dimensional (3D) structural model of loop 6 that specifies all non-Watson-Crick base pair interactions, derived by isostericity-based sequence comparisons with 3D RNA motifs from the RNA x-ray crystal structure database. The model is supported by available chemical modification patterns, natural sequence conservation/variations in PSTVd isolates and related species, and functional characterization of all possible mutants for each of the loop 6 base pairs. Our findings and approaches have broad implications for studying the 3D RNA structural motifs mediating trafficking of diverse RNA species across specific cellular boundaries and for studying the structure-function relationships of RNA motifs in other biological processes.
NNAlign: A Web-Based Prediction Method Allowing Non-Expert End-User Discovery of Sequence Motifs in Quantitative Peptide Data

PubMed Central

Andreatta, Massimo; Schafer-Nielsen, Claus; Lund, Ole; Buus, Søren; Nielsen, Morten

2011-01-01

Recent advances in high-throughput technologies have made it possible to generate both gene and protein sequence data at an unprecedented rate and scale thereby enabling entirely new “omics”-based approaches towards the analysis of complex biological processes. However, the amount and complexity of data that even a single experiment can produce seriously challenges researchers with limited bioinformatics expertise, who need to handle, analyze and interpret the data before it can be understood in a biological context. Thus, there is an unmet need for tools allowing non-bioinformatics users to interpret large data sets. We have recently developed a method, NNAlign, which is generally applicable to any biological problem where quantitative peptide data is available. This method efficiently identifies underlying sequence patterns by simultaneously aligning peptide sequences and identifying motifs associated with quantitative readouts. Here, we provide a web-based implementation of NNAlign allowing non-expert end-users to submit their data (optionally adjusting method parameters), and in return receive a trained method (including a visual representation of the identified motif) that subsequently can be used as prediction method and applied to unknown proteins/peptides. We have successfully applied this method to several different data sets including peptide microarray-derived sets containing more than 100,000 data points. NNAlign is available online at http://www.cbs.dtu.dk/services/NNAlign. PMID:22073191
NNAlign: a web-based prediction method allowing non-expert end-user discovery of sequence motifs in quantitative peptide data.

PubMed

Andreatta, Massimo; Schafer-Nielsen, Claus; Lund, Ole; Buus, Søren; Nielsen, Morten

2011-01-01

Recent advances in high-throughput technologies have made it possible to generate both gene and protein sequence data at an unprecedented rate and scale thereby enabling entirely new "omics"-based approaches towards the analysis of complex biological processes. However, the amount and complexity of data that even a single experiment can produce seriously challenges researchers with limited bioinformatics expertise, who need to handle, analyze and interpret the data before it can be understood in a biological context. Thus, there is an unmet need for tools allowing non-bioinformatics users to interpret large data sets. We have recently developed a method, NNAlign, which is generally applicable to any biological problem where quantitative peptide data is available. This method efficiently identifies underlying sequence patterns by simultaneously aligning peptide sequences and identifying motifs associated with quantitative readouts. Here, we provide a web-based implementation of NNAlign allowing non-expert end-users to submit their data (optionally adjusting method parameters), and in return receive a trained method (including a visual representation of the identified motif) that subsequently can be used as prediction method and applied to unknown proteins/peptides. We have successfully applied this method to several different data sets including peptide microarray-derived sets containing more than 100,000 data points. NNAlign is available online at http://www.cbs.dtu.dk/services/NNAlign.
UTRdb and UTRsite: a collection of sequences and regulatory motifs of the untranslated regions of eukaryotic mRNAs

PubMed Central

Mignone, Flavio; Grillo, Giorgio; Licciulli, Flavio; Iacono, Michele; Liuni, Sabino; Kersey, Paul J.; Duarte, Jorge; Saccone, Cecilia; Pesole, Graziano

2005-01-01

The 5′ and 3′ untranslated regions of eukaryotic mRNAs play crucial roles in the post-transcriptional regulation of gene expression through the modulation of nucleo-cytoplasmic mRNA transport, translation efficiency, subcellular localization and message stability. UTRdb is a curated database of 5′ and 3′ untranslated sequences of eukaryotic mRNAs, derived from several sources of primary data. Experimentally validated functional motifs are annotated (and also collated as the UTRsite database) and cross-links to genomic and protein data are provided. The integration of UTRdb with genomic and protein data has allowed the implementation of a powerful retrieval resource for the selection and extraction of UTR subsets based on their genomic coordinates and/or features of the protein encoded by the relevant mRNA (e.g. GO term, PFAM domain, etc.). All internet resources implemented for retrieval and functional analysis of 5′ and 3′ untranslated regions of eukaryotic mRNAs are accessible at http://www.ba.itb.cnr.it/UTR/. PMID:15608165
Analysis of alkaptonuria (AKU) mutations and polymorphisms reveals that the CCC sequence motif is a mutational hot spot in the homogentisate 1,2 dioxygenase gene (HGO).

PubMed Central

Beltrán-Valero de Bernabé, D; Jimenez, F J; Aquaron, R; Rodríguez de Córdoba, S

1999-01-01

We recently showed that alkaptonuria (AKU) is caused by loss-of-function mutations in the homogentisate 1,2 dioxygenase gene (HGO). Herein we describe haplotype and mutational analyses of HGO in seven new AKU pedigrees. These analyses identified two novel single-nucleotide polymorphisms (INV4+31A-->G and INV11+18A-->G) and six novel AKU mutations (INV1-1G-->A, W60G, Y62C, A122D, P230T, and D291E), which further illustrates the remarkable allelic heterogeneity found in AKU. Reexamination of all 29 mutations and polymorphisms thus far described in HGO shows that these nucleotide changes are not randomly distributed; the CCC sequence motif and its inverted complement, GGG, are preferentially mutated. These analyses also demonstrated that the nucleotide substitutions in HGO do not involve CpG dinucleotides, which illustrates important differences between HGO and other genes for the occurrence of mutation at specific short-sequence motifs. Because the CCC sequence motifs comprise a significant proportion (34.5%) of all mutated bases that have been observed in HGO, we conclude that the CCC triplet is a mutational hot spot in HGO. PMID:10205262
Direct AUC optimization of regulatory motifs.

PubMed

Zhu, Lin; Zhang, Hong-Bo; Huang, De-Shuang

2017-07-15

The discovery of transcription factor binding site (TFBS) motifs is essential for untangling the complex mechanism of genetic variation under different developmental and environmental conditions. Among the huge amount of computational approaches for de novo identification of TFBS motifs, discriminative motif learning (DML) methods have been proven to be promising for harnessing the discovery power of accumulated huge amount of high-throughput binding data. However, they have to sacrifice accuracy for speed and could fail to fully utilize the information of the input sequences. We propose a novel algorithm called CDAUC for optimizing DML-learned motifs based on the area under the receiver-operating characteristic curve (AUC) criterion, which has been widely used in the literature to evaluate the significance of extracted motifs. We show that when the considered AUC loss function is optimized in a coordinate-wise manner, the cost function of each resultant sub-problem is a piece-wise constant function, whose optimal value can be found exactly and efficiently. Further, a key step of each iteration of CDAUC can be efficiently solved as a computational geometry problem. Experimental results on real world high-throughput datasets illustrate that CDAUC outperforms competing methods for refining DML motifs, while being one order of magnitude faster. Meanwhile, preliminary results also show that CDAUC may also be useful for improving the interpretability of convolutional kernels generated by the emerging deep learning approaches for predicting TF sequences specificities. CDAUC is available at: https://drive.google.com/drive/folders/0BxOW5MtIZbJjNFpCeHlBVWJHeW8 . dshuang@tongji.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
CisSERS: Customizable in silico sequence evaluation for restriction sites

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sharpe, Richard M.; Koepke, Tyson; Harper, Artemus

High-throughput sequencing continues to produce an immense volume of information that is processed and assembled into mature sequence data. Here, data analysis tools are urgently needed that leverage the embedded DNA sequence polymorphisms and consequent changes to restriction sites or sequence motifs in a high-throughput manner to enable biological experimentation. CisSERS was developed as a standalone open source tool to analyze sequence datasets and provide biologists with individual or comparative genome organization information in terms of presence and frequency of patterns or motifs such as restriction enzymes. Predicted agarose gel visualization of the custom analyses results was also integrated tomore » enhance the usefulness of the software. CisSERS offers several novel functionalities, such as handling of large and multiple datasets in parallel, multiple restriction enzyme site detection and custom motif detection features, which are seamlessly integrated with real time agarose gel visualization. Using a simple fasta-formatted file as input, CisSERS utilizes the REBASE enzyme database. Results from CisSERSenable the user to make decisions for designing genotyping by sequencing experiments, reduced representation sequencing, 3’UTR sequencing, and cleaved amplified polymorphic sequence (CAPS) molecular markers for large sample sets. CisSERS is a java based graphical user interface built around a perl backbone. Several of the applications of CisSERS including CAPS molecular marker development were successfully validated using wet-lab experimentation. Here, we present the tool CisSERSand results from in-silico and corresponding wet-lab analyses demonstrating that CisSERS is a technology platform solution that facilitates efficient data utilization in genomics and genetics studies.« less
CisSERS: Customizable in silico sequence evaluation for restriction sites

DOE PAGES

Sharpe, Richard M.; Koepke, Tyson; Harper, Artemus; ...

2016-04-12

High-throughput sequencing continues to produce an immense volume of information that is processed and assembled into mature sequence data. Here, data analysis tools are urgently needed that leverage the embedded DNA sequence polymorphisms and consequent changes to restriction sites or sequence motifs in a high-throughput manner to enable biological experimentation. CisSERS was developed as a standalone open source tool to analyze sequence datasets and provide biologists with individual or comparative genome organization information in terms of presence and frequency of patterns or motifs such as restriction enzymes. Predicted agarose gel visualization of the custom analyses results was also integrated tomore » enhance the usefulness of the software. CisSERS offers several novel functionalities, such as handling of large and multiple datasets in parallel, multiple restriction enzyme site detection and custom motif detection features, which are seamlessly integrated with real time agarose gel visualization. Using a simple fasta-formatted file as input, CisSERS utilizes the REBASE enzyme database. Results from CisSERSenable the user to make decisions for designing genotyping by sequencing experiments, reduced representation sequencing, 3’UTR sequencing, and cleaved amplified polymorphic sequence (CAPS) molecular markers for large sample sets. CisSERS is a java based graphical user interface built around a perl backbone. Several of the applications of CisSERS including CAPS molecular marker development were successfully validated using wet-lab experimentation. Here, we present the tool CisSERSand results from in-silico and corresponding wet-lab analyses demonstrating that CisSERS is a technology platform solution that facilitates efficient data utilization in genomics and genetics studies.« less
SVM2Motif—Reconstructing Overlapping DNA Sequence Motifs by Mimicking an SVM Predictor

PubMed Central

Vidovic, Marina M. -C.; Görnitz, Nico; Müller, Klaus-Robert; Rätsch, Gunnar; Kloft, Marius

2015-01-01

Identifying discriminative motifs underlying the functionality and evolution of organisms is a major challenge in computational biology. Machine learning approaches such as support vector machines (SVMs) achieve state-of-the-art performances in genomic discrimination tasks, but—due to its black-box character—motifs underlying its decision function are largely unknown. As a remedy, positional oligomer importance matrices (POIMs) allow us to visualize the significance of position-specific subsequences. Although being a major step towards the explanation of trained SVM models, they suffer from the fact that their size grows exponentially in the length of the motif, which renders their manual inspection feasible only for comparably small motif sizes, typically k ≤ 5. In this work, we extend the work on positional oligomer importance matrices, by presenting a new machine-learning methodology, entitled motifPOIM, to extract the truly relevant motifs—regardless of their length and complexity—underlying the predictions of a trained SVM model. Our framework thereby considers the motifs as free parameters in a probabilistic model, a task which can be phrased as a non-convex optimization problem. The exponential dependence of the POIM size on the oligomer length poses a major numerical challenge, which we address by an efficient optimization framework that allows us to find possibly overlapping motifs consisting of up to hundreds of nucleotides. We demonstrate the efficacy of our approach on a synthetic data set as well as a real-world human splice site data set. PMID:26690911
qPMS9: An Efficient Algorithm for Quorum Planted Motif Search

NASA Astrophysics Data System (ADS)

Nicolae, Marius; Rajasekaran, Sanguthevar

2015-01-01

Discovering patterns in biological sequences is a crucial problem. For example, the identification of patterns in DNA sequences has resulted in the determination of open reading frames, identification of gene promoter elements, intron/exon splicing sites, and SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have led to domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, discovery of short functional motifs, etc. In this paper we focus on the identification of an important class of patterns, namely, motifs. We study the (l, d) motif search problem or Planted Motif Search (PMS). PMS receives as input n strings and two integers l and d. It returns all sequences M of length l that occur in each input string, where each occurrence differs from M in at most d positions. Another formulation is quorum PMS (qPMS), where the motif appears in at least q% of the strings. We introduce qPMS9, a parallel exact qPMS algorithm that offers significant runtime improvements on DNA and protein datasets. qPMS9 solves the challenging DNA (l, d)-instances (28, 12) and (30, 13). The source code is available at https://code.google.com/p/qpms9/.

DNA motif alignment by evolving a population of Markov chains.

PubMed

Bi, Chengpeng

2009-01-30

Deciphering cis-regulatory elements or de novo motif-finding in genomes still remains elusive although much algorithmic effort has been expended. The Markov chain Monte Carlo (MCMC) method such as Gibbs motif samplers has been widely employed to solve the de novo motif-finding problem through sequence local alignment. Nonetheless, the MCMC-based motif samplers still suffer from local maxima like EM. Therefore, as a prerequisite for finding good local alignments, these motif algorithms are often independently run a multitude of times, but without information exchange between different chains. Hence it would be worth a new algorithm design enabling such information exchange. This paper presents a novel motif-finding algorithm by evolving a population of Markov chains with information exchange (PMC), each of which is initialized as a random alignment and run by the Metropolis-Hastings sampler (MHS). It is progressively updated through a series of local alignments stochastically sampled. Explicitly, the PMC motif algorithm performs stochastic sampling as specified by a population-based proposal distribution rather than individual ones, and adaptively evolves the population as a whole towards a global maximum. The alignment information exchange is accomplished by taking advantage of the pooled motif site distributions. A distinct method for running multiple independent Markov chains (IMC) without information exchange, or dubbed as the IMC motif algorithm, is also devised to compare with its PMC counterpart. Experimental studies demonstrate that the performance could be improved if pooled information were used to run a population of motif samplers. The new PMC algorithm was able to improve the convergence and outperformed other popular algorithms tested using simulated and biological motif sequences.
A Three-Dimensional RNA Motif in Potato spindle tuber viroid Mediates Trafficking from Palisade Mesophyll to Spongy Mesophyll in Nicotiana benthamiana[W

PubMed Central

Takeda, Ryuta; Petrov, Anton I.; Leontis, Neocles B.; Ding, Biao

2011-01-01

Cell-to-cell trafficking of RNA is an emerging biological principle that integrates systemic gene regulation, viral infection, antiviral response, and cell-to-cell communication. A key mechanistic question is how an RNA is specifically selected for trafficking from one type of cell into another type. Here, we report the identification of an RNA motif in Potato spindle tuber viroid (PSTVd) required for trafficking from palisade mesophyll to spongy mesophyll in Nicotiana benthamiana leaves. This motif, called loop 6, has the sequence 5′-CGA-3′...5′-GAC-3′ flanked on both sides by cis Watson-Crick G/C and G/U wobble base pairs. We present a three-dimensional (3D) structural model of loop 6 that specifies all non-Watson-Crick base pair interactions, derived by isostericity-based sequence comparisons with 3D RNA motifs from the RNA x-ray crystal structure database. The model is supported by available chemical modification patterns, natural sequence conservation/variations in PSTVd isolates and related species, and functional characterization of all possible mutants for each of the loop 6 base pairs. Our findings and approaches have broad implications for studying the 3D RNA structural motifs mediating trafficking of diverse RNA species across specific cellular boundaries and for studying the structure-function relationships of RNA motifs in other biological processes. PMID:21258006
A dinucleotide motif in oligonucleotides shows potent immunomodulatory activity and overrides species-specific recognition observed with CpG motif.

PubMed

Kandimalla, Ekambar R; Bhagat, Lakshmi; Zhu, Fu-Gang; Yu, Dong; Cong, Yan-Ping; Wang, Daqing; Tang, Jimmy X; Tang, Jin-Yan; Knetter, Cathrine F; Lien, Egil; Agrawal, Sudhir

2003-11-25

Bacterial and synthetic DNAs containing CpG dinucleotides in specific sequence contexts activate the vertebrate immune system through Toll-like receptor 9 (TLR9). In the present study, we used a synthetic nucleoside with a bicyclic heterobase [1-(2'-deoxy-beta-d-ribofuranosyl)-2-oxo-7-deaza-8-methyl-purine; R] to replace the C in CpG, resulting in an RpG dinucleotide. The RpG dinucleotide was incorporated in mouse- and human-specific motifs in oligodeoxynucleotides (oligos) and 3'-3-linked oligos, referred to as immunomers. Oligos containing the RpG motif induced cytokine secretion in mouse spleen-cell cultures. Immunomers containing RpG dinucleotides showed activity in transfected-HEK293 cells stably expressing mouse TLR9, suggesting direct involvement of TLR9 in the recognition of RpG motif. In J774 macrophages, RpG motifs activated NF-kappa B and mitogen-activated protein kinase pathways. Immunomers containing the RpG dinucleotide induced high levels of IL-12 and IFN-gamma, but lower IL-6 in time- and concentration-dependent fashion in mouse spleen-cell cultures costimulated with IL-2. Importantly, immunomers containing GTRGTT and GARGTT motifs were recognized to a similar extent by both mouse and human immune systems. Additionally, both mouse- and human-specific RpG immunomers potently stimulated proliferation of peripheral blood mononuclear cells obtained from diverse vertebrate species, including monkey, pig, horse, sheep, goat, rat, and chicken. An immunomer containing GTRGTT motif prevented conalbumin-induced and ragweed allergen-induced allergic inflammation in mice. We show that a synthetic bicyclic nucleotide is recognized in the C position of a CpG dinucleotide by immune cells from diverse vertebrate species without bias for flanking sequences, suggesting a divergent nucleotide motif recognition pattern of TLR9.
The Thiamin Pyrophosphate-Motif

NASA Technical Reports Server (NTRS)

Dominiak, P.; Ciszak, E.

2003-01-01

Using databases the authors have identified a common thiamin pyrophosphate (TPP)-motif in the family of functionally diverse TPP-dependent enzymes. This common motif consists of multimeric organization of subunits and two catalytic centers. Each catalytic center (PP:PYR) is formed at the interface of the PP-domain binding the magnesium ion, pyrophosphate and amhopyrimidine ring of TPP, and the PYR-domain binding the aminopyrimidine ring of that cofactor. A pair of these catalytic centers constitutes the catalytic core (PP:PYR)(sub 2) within these enzymes. Analysis of the structural elements of this catalytic core reveals novel definition of the common amino acid sequences, which are GXPhiX(sub 4)(G)PhiXXGQ and GDGX(sub 25-30)NN in the PP-domain, and the EX(sub 4)(G)PhiXXGPhi in the PYR-domain, where Phi corresponds to a hydrophobic amino acid. This TPP-motif provides a novel tool for annotation of TPP-dependent enzymes useful in advancing functional proteomics.
Binding properties of SUMO-interacting motifs (SIMs) in yeast.

PubMed

Jardin, Christophe; Horn, Anselm H C; Sticht, Heinrich

2015-03-01

Small ubiquitin-like modifier (SUMO) conjugation and interaction play an essential role in many cellular processes. A large number of yeast proteins is known to interact non-covalently with SUMO via short SUMO-interacting motifs (SIMs), but the structural details of this interaction are yet poorly characterized. In the present work, sequence analysis of a large dataset of 148 yeast SIMs revealed the existence of a hydrophobic core binding motif and a preference for acidic residues either within or adjacent to the core motif. Thus the sequence properties of yeast SIMs are highly similar to those described for human. Molecular dynamics simulations were performed to investigate the binding preferences for four representative SIM peptides differing in the number and distribution of acidic residues. Furthermore, the relative stability of two previously observed alternative binding orientations (parallel, antiparallel) was assessed. For all SIMs investigated, the antiparallel binding mode remained stable in the simulations and the SIMs were tightly bound via their hydrophobic core residues supplemented by polar interactions of the acidic residues. In contrary, the stability of the parallel binding mode is more dependent on the sequence features of the SIM motif like the number and position of acidic residues or the presence of additional adjacent interaction motifs. This information should be helpful to enhance the prediction of SIMs and their binding properties in different organisms to facilitate the reconstruction of the SUMO interactome.
Simple Shared Motifs (SSM) in conserved region of promoters: a new approach to identify co-regulation patterns.

PubMed

Gruel, Jérémy; LeBorgne, Michel; LeMeur, Nolwenn; Théret, Nathalie

2011-09-12

Regulation of gene expression plays a pivotal role in cellular functions. However, understanding the dynamics of transcription remains a challenging task. A host of computational approaches have been developed to identify regulatory motifs, mainly based on the recognition of DNA sequences for transcription factor binding sites. Recent integration of additional data from genomic analyses or phylogenetic footprinting has significantly improved these methods. Here, we propose a different approach based on the compilation of Simple Shared Motifs (SSM), groups of sequences defined by their length and similarity and present in conserved sequences of gene promoters. We developed an original algorithm to search and count SSM in pairs of genes. An exceptional number of SSM is considered as a common regulatory pattern. The SSM approach is applied to a sample set of genes and validated using functional gene-set enrichment analyses. We demonstrate that the SSM approach selects genes that are over-represented in specific biological categories (Ontology and Pathways) and are enriched in co-expressed genes. Finally we show that genes co-expressed in the same tissue or involved in the same biological pathway have increased SSM values. Using unbiased clustering of genes, Simple Shared Motifs analysis constitutes an original contribution to provide a clearer definition of expression networks.
Simple Shared Motifs (SSM) in conserved region of promoters: a new approach to identify co-regulation patterns

PubMed Central

2011-01-01

Background Regulation of gene expression plays a pivotal role in cellular functions. However, understanding the dynamics of transcription remains a challenging task. A host of computational approaches have been developed to identify regulatory motifs, mainly based on the recognition of DNA sequences for transcription factor binding sites. Recent integration of additional data from genomic analyses or phylogenetic footprinting has significantly improved these methods. Results Here, we propose a different approach based on the compilation of Simple Shared Motifs (SSM), groups of sequences defined by their length and similarity and present in conserved sequences of gene promoters. We developed an original algorithm to search and count SSM in pairs of genes. An exceptional number of SSM is considered as a common regulatory pattern. The SSM approach is applied to a sample set of genes and validated using functional gene-set enrichment analyses. We demonstrate that the SSM approach selects genes that are over-represented in specific biological categories (Ontology and Pathways) and are enriched in co-expressed genes. Finally we show that genes co-expressed in the same tissue or involved in the same biological pathway have increased SSM values. Conclusions Using unbiased clustering of genes, Simple Shared Motifs analysis constitutes an original contribution to provide a clearer definition of expression networks. PMID:21910886
Interactions of HIPPI, a molecular partner of Huntingtin interacting protein HIP1, with the specific motif present at the putative promoter sequence of the caspase-1, caspase-8 and caspase-10 genes.

PubMed

Majumder, P; Choudhury, A; Banerjee, M; Lahiri, A; Bhattacharyya, N P

2007-08-01

To investigate the mechanism of increased expression of caspase-1 caused by exogenous Hippi, observed earlier in HeLa and Neuro2A cells, in this work we identified a specific motif AAAGACATG (- 101 to - 93) at the caspase-1 gene upstream sequence where HIPPI could bind. Various mutations in this specific sequence compromised the interaction, showing the specificity of the interactions. In the luciferase reporter assay, when the reporter gene was driven by caspase-1 gene upstream sequences (- 151 to - 92) with the mutation G to T at position - 98, luciferase activity was decreased significantly in green fluorescent protein-Hippi-expressing HeLa cells in comparison to that obtained with the wild-type caspase-1 gene 60 bp upstream sequence, indicating the biological significance of such binding. It was observed that the C-terminal 'pseudo' death effector domain of HIPPI interacted with the 60 bp (- 151 to - 92) upstream sequence of the caspase-1 gene containing the motif. We further observed that expression of caspase-8 and caspase-10 was increased in green fluorescent protein-Hippi-expressing HeLa cells. In addition, HIPPI interacted in vitro with putative promoter sequences of these genes, containing a similar motif. In summary, we identified a novel function of HIPPI; it binds to specific upstream sequences of the caspase-1, caspase-8 and caspase-10 genes and alters the expression of the genes. This result showed the motif-specific interaction of HIPPI with DNA, and indicates that it could act as transcription regulator.
Assessment of composite motif discovery methods.

PubMed

Klepper, Kjetil; Sandve, Geir K; Abul, Osman; Johansen, Jostein; Drablos, Finn

2008-02-26

Computational discovery of regulatory elements is an important area of bioinformatics research and more than a hundred motif discovery methods have been published. Traditionally, most of these methods have addressed the problem of single motif discovery - discovering binding motifs for individual transcription factors. In higher organisms, however, transcription factors usually act in combination with nearby bound factors to induce specific regulatory behaviours. Hence, recent focus has shifted from single motifs to the discovery of sets of motifs bound by multiple cooperating transcription factors, so called composite motifs or cis-regulatory modules. Given the large number and diversity of methods available, independent assessment of methods becomes important. Although there have been several benchmark studies of single motif discovery, no similar studies have previously been conducted concerning composite motif discovery. We have developed a benchmarking framework for composite motif discovery and used it to evaluate the performance of eight published module discovery tools. Benchmark datasets were constructed based on real genomic sequences containing experimentally verified regulatory modules, and the module discovery programs were asked to predict both the locations of these modules and to specify the single motifs involved. To aid the programs in their search, we provided position weight matrices corresponding to the binding motifs of the transcription factors involved. In addition, selections of decoy matrices were mixed with the genuine matrices on one dataset to test the response of programs to varying levels of noise. Although some of the methods tested tended to score somewhat better than others overall, there were still large variations between individual datasets and no single method performed consistently better than the rest in all situations. The variation in performance on individual datasets also shows that the new benchmark datasets represents a
Protospacer Adjacent Motif (PAM)-Distal Sequences Engage CRISPR Cas9 DNA Target Cleavage

PubMed Central

Ethier, Sylvain; Schmeing, T. Martin; Dostie, Josée; Pelletier, Jerry

2014-01-01

The clustered regularly interspaced short palindromic repeat (CRISPR)-associated enzyme Cas9 is an RNA-guided nuclease that has been widely adapted for genome editing in eukaryotic cells. However, the in vivo target specificity of Cas9 is poorly understood and most studies rely on in silico predictions to define the potential off-target editing spectrum. Using chromatin immunoprecipitation followed by sequencing (ChIP-seq), we delineate the genome-wide binding panorama of catalytically inactive Cas9 directed by two different single guide (sg) RNAs targeting the Trp53 locus. Cas9:sgRNA complexes are able to load onto multiple sites with short seed regions adjacent to 5′NGG3′ protospacer adjacent motifs (PAM). Yet among 43 ChIP-seq sites harboring seed regions analyzed for mutational status, we find editing only at the intended on-target locus and one off-target site. In vitro analysis of target site recognition revealed that interactions between the 5′ end of the guide and PAM-distal target sequences are necessary to efficiently engage Cas9 nucleolytic activity, providing an explanation for why off-target editing is significantly lower than expected from ChIP-seq data. PMID:25275497
Identification of an Electrostatic Ruler Motif for Sequence-Specific Binding of Collagenase to Collagen.

PubMed

Subramanian, Sundar Raman; Singam, Ettayapuram Ramaprasad Azhagiya; Berinski, Michael; Subramanian, Venkatesan; Wade, Rebecca C

2016-08-25

Sequence-specific cleavage of collagen by mammalian collagenase plays a pivotal role in cell function. Collagenases are matrix metalloproteinases that cleave the peptide bond at a specific position on fibrillar collagen. The collagenase Hemopexin-like (HPX) domain has been proposed to be responsible for substrate recognition, but the mechanism by which collagenases identify the cleavage site on fibrillar collagen is not clearly understood. In this study, Brownian dynamics simulations coupled with atomic-detail and coarse-grained molecular dynamics simulations were performed to dock matrix metalloproteinase-1 (MMP-1) on a collagen IIIα1 triple helical peptide. We find that the HPX domain recognizes the collagen triple helix at a conserved R-X11-R motif C-terminal to the cleavage site to which the HPX domain of collagen is guided electrostatically. The binding of the HPX domain between the two arginine residues is energetically stabilized by hydrophobic contacts with collagen. From the simulations and analysis of the sequences and structural flexibility of collagen and collagenase, a mechanistic scheme by which MMP-1 can recognize and bind collagen for proteolysis is proposed.
Amino acid sequence motifs essential for P0-mediated suppression of RNA silencing in an isolate of potato leafroll virus from Inner Mongolia.

PubMed

Zhuo, Tao; Li, Yuan-Yuan; Xiang, Hai-Ying; Wu, Zhan-Yu; Wang, Xian-Bin; Wang, Ying; Zhang, Yong-Liang; Li, Da-Wei; Yu, Jia-Lin; Han, Cheng-Gui

2014-06-01

Polerovirus P0 suppressors of host gene silencing contain a consensus F-box-like motif with Leu/Pro (L/P) requirements for suppressor activity. The Inner Mongolian Potato leafroll virus (PLRV) P0 protein (P0(PL-IM)) has an unusual F-box-like motif that contains a Trp/Gly (W/G) sequence and an additional GW/WG-like motif (G139/W140/G141) that is lacking in other P0 proteins. We used Agrobacterium infiltration-mediated RNA silencing assays to establish that P0(PL-IM) has a strong suppressor activity. Mutagenesis experiments demonstrated that the P0(PL-IM) F-box-like motif encompasses amino acids 76-LPRHLHYECLEWGLLCG THP-95, and that the suppressor activity is abolished by L76A, W87A, or G88A substitution. The suppressor activity is also weakened substantially by mutations within the G139/W140/G141 region and is eliminated by a mutation (F220R) in a C-terminal conserved sequence of P0(PL-IM). As has been observed with other P0 proteins, P0(PL-IM) suppression is correlated with reduced accumulation of the host AGO1-silencing complex protein. However, P0(PL-IM) fails to bind SKP1, which functions in a proteasome pathway that may be involved in AGO1 degradation. These results suggest that P0(PL-IM) may suppress RNA silencing by using an alternative pathway to target AGO1 for degradation. Our results help improve our understanding of the molecular mechanisms involved in PLRV infection.
Argo_CUDA: Exhaustive GPU based approach for motif discovery in large DNA datasets.

PubMed

Vishnevsky, Oleg V; Bocharnikov, Andrey V; Kolchanov, Nikolay A

2018-02-01

The development of chromatin immunoprecipitation sequencing (ChIP-seq) technology has revolutionized the genetic analysis of the basic mechanisms underlying transcription regulation and led to accumulation of information about a huge amount of DNA sequences. There are a lot of web services which are currently available for de novo motif discovery in datasets containing information about DNA/protein binding. An enormous motif diversity makes their finding challenging. In order to avoid the difficulties, researchers use different stochastic approaches. Unfortunately, the efficiency of the motif discovery programs dramatically declines with the query set size increase. This leads to the fact that only a fraction of top "peak" ChIP-Seq segments can be analyzed or the area of analysis should be narrowed. Thus, the motif discovery in massive datasets remains a challenging issue. Argo_Compute Unified Device Architecture (CUDA) web service is designed to process the massive DNA data. It is a program for the detection of degenerate oligonucleotide motifs of fixed length written in 15-letter IUPAC code. Argo_CUDA is a full-exhaustive approach based on the high-performance GPU technologies. Compared with the existing motif discovery web services, Argo_CUDA shows good prediction quality on simulated sets. The analysis of ChIP-Seq sequences revealed the motifs which correspond to known transcription factor binding sites.
Prediction of virus-host protein-protein interactions mediated by short linear motifs.

PubMed

Becerra, Andrés; Bucheli, Victor A; Moreno, Pedro A

2017-03-09

Short linear motifs in host organisms proteins can be mimicked by viruses to create protein-protein interactions that disable or control metabolic pathways. Given that viral linear motif instances of host motif regular expressions can be found by chance, it is necessary to develop filtering methods of functional linear motifs. We conduct a systematic comparison of linear motifs filtering methods to develop a computational approach for predicting motif-mediated protein-protein interactions between human and the human immunodeficiency virus 1 (HIV-1). We implemented three filtering methods to obtain linear motif sets: 1) conserved in viral proteins (C), 2) located in disordered regions (D) and 3) rare or scarce in a set of randomized viral sequences (R). The sets C,D,R are united and intersected. The resulting sets are compared by the number of protein-protein interactions correctly inferred with them - with experimental validation. The comparison is done with HIV-1 sequences and interactions from the National Institute of Allergy and Infectious Diseases (NIAID). The number of correctly inferred interactions allows to rank the interactions by the sets used to deduce them: D∪R and C. The ordering of the sets is descending on the probability of capturing functional interactions. With respect to HIV-1, the sets C∪R, D∪R, C∪D∪R infer all known interactions between HIV1 and human proteins mediated by linear motifs. We found that the majority of conserved linear motifs in the virus are located in disordered regions. We have developed a method for predicting protein-protein interactions mediated by linear motifs between HIV-1 and human proteins. The method only use protein sequences as inputs. We can extend the software developed to any other eukaryotic virus and host in order to find and rank candidate interactions. In future works we will use it to explore possible viral attack mechanisms based on linear motif mimicry.
Using SCOPE to identify potential regulatory motifs in coregulated genes.

PubMed

Martyanov, Viktor; Gross, Robert H

2011-05-31

SCOPE is an ensemble motif finder that uses three component algorithms in parallel to identify potential regulatory motifs by over-representation and motif position preference. Each component algorithm is optimized to find a different kind of motif. By taking the best of these three approaches, SCOPE performs better than any single algorithm, even in the presence of noisy data. In this article, we utilize a web version of SCOPE to examine genes that are involved in telomere maintenance. SCOPE has been incorporated into at least two other motif finding programs and has been used in other studies. The three algorithms that comprise SCOPE are BEAM, which finds non-degenerate motifs (ACCGGT), PRISM, which finds degenerate motifs (ASCGWT), and SPACER, which finds longer bipartite motifs (ACCnnnnnnnnGGT). These three algorithms have been optimized to find their corresponding type of motif. Together, they allow SCOPE to perform extremely well. Once a gene set has been analyzed and candidate motifs identified, SCOPE can look for other genes that contain the motif which, when added to the original set, will improve the motif score. This can occur through over-representation or motif position preference. Working with partial gene sets that have biologically verified transcription factor binding sites, SCOPE was able to identify most of the rest of the genes also regulated by the given transcription factor. Output from SCOPE shows candidate motifs, their significance, and other information both as a table and as a graphical motif map. FAQs and video tutorials are available at the SCOPE web site which also includes a "Sample Search" button that allows the user to perform a trial run. Scope has a very friendly user interface that enables novice users to access the algorithm's full power without having to become an expert in the bioinformatics of motif finding. As input, SCOPE can take a list of genes, or FASTA sequences. These can be entered in browser text fields, or read from
A +1 ribosomal frameshifting motif prevalent among plant amalgaviruses.

PubMed

Nibert, Max L; Pyle, Jesse D; Firth, Andrew E

2016-11-01

Sequence accessions attributable to novel plant amalgaviruses have been found in the Transcriptome Shotgun Assembly database. Sixteen accessions, derived from 12 different plant species, appear to encompass the complete protein-coding regions of the proposed amalgaviruses, which would substantially expand the size of genus Amalgavirus from 4 current species. Other findings include evidence for UUU_CGN as a +1 ribosomal frameshifting motif prevalent among plant amalgaviruses; for a variant version of this motif found thus far in only two amalgaviruses from solanaceous plants; for a region of α-helical coiled coil propensity conserved in a central region of the ORF1 translation product of plant amalgaviruses; and for conserved sequences in a C-terminal region of the ORF2 translation product (RNA-dependent RNA polymerase) of plant amalgaviruses, seemingly beyond the region of conserved polymerase motifs. These results additionally illustrate the value of mining the TSA database and others for novel viral sequences for comparative analyses. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Function, dynamics and evolution of network motif modules in integrated gene regulatory networks of worm and plant.

PubMed

Defoort, Jonas; Van de Peer, Yves; Vermeirssen, Vanessa

2018-06-05

Gene regulatory networks (GRNs) consist of different molecular interactions that closely work together to establish proper gene expression in time and space. Especially in higher eukaryotes, many questions remain on how these interactions collectively coordinate gene regulation. We study high quality GRNs consisting of undirected protein-protein, genetic and homologous interactions, and directed protein-DNA, regulatory and miRNA-mRNA interactions in the worm Caenorhabditis elegans and the plant Arabidopsis thaliana. Our data-integration framework integrates interactions in composite network motifs, clusters these in biologically relevant, higher-order topological network motif modules, overlays these with gene expression profiles and discovers novel connections between modules and regulators. Similar modules exist in the integrated GRNs of worm and plant. We show how experimental or computational methodologies underlying a certain data type impact network topology. Through phylogenetic decomposition, we found that proteins of worm and plant tend to functionally interact with proteins of a similar age, while at the regulatory level TFs favor same age, but also older target genes. Despite some influence of the duplication mode difference, we also observe at the motif and module level for both species a preference for age homogeneity for undirected and age heterogeneity for directed interactions. This leads to a model where novel genes are added together to the GRNs in a specific biological functional context, regulated by one or more TFs that also target older genes in the GRNs. Overall, we detected topological, functional and evolutionary properties of GRNs that are potentially universal in all species.
Nucleophosmin integrates within the nucleolus via multi-modal interactions with proteins displaying R-rich linear motifs and rRNA.

PubMed

Mitrea, Diana M; Cika, Jaclyn A; Guy, Clifford S; Ban, David; Banerjee, Priya R; Stanley, Christopher B; Nourse, Amanda; Deniz, Ashok A; Kriwacki, Richard W

2016-02-02

The nucleolus is a membrane-less organelle formed through liquid-liquid phase separation of its components from the surrounding nucleoplasm. Here, we show that nucleophosmin (NPM1) integrates within the nucleolus via a multi-modal mechanism involving multivalent interactions with proteins containing arginine-rich linear motifs (R-motifs) and ribosomal RNA (rRNA). Importantly, these R-motifs are found in canonical nucleolar localization signals. Based on a novel combination of biophysical approaches, we propose a model for the molecular organization within liquid-like droplets formed by the N-terminal domain of NPM1 and R-motif peptides, thus providing insights into the structural organization of the nucleolus. We identify multivalency of acidic tracts and folded nucleic acid binding domains, mediated by N-terminal domain oligomerization, as structural features required for phase separation of NPM1 with other nucleolar components in vitro and for localization within mammalian nucleoli. We propose that one mechanism of nucleolar localization involves phase separation of proteins within the nucleolus.
A self-assembling peptide RADA16-I integrated with spider fibroin uncrystalline motifs

PubMed Central

Sun, Lijuan; Zhao, Xiaojun

2012-01-01

Mechanical strength of nanofiber scaffolds formed by the self-assembling peptide RADA16-I or its derivatives is not very good and limits their application. To address this problem, we inserted spidroin uncrystalline motifs, which confer incomparable elasticity and hydrophobicity to spider silk GGAGGS or GPGGY, into the C-terminus of RADA16-I to newly design two peptides: R3 (n-RADARADARADARADA-GGAGGS-c) and R4 (n-RADARADARADARADA-GPGGY-c), and then observed the effect of these motifs on biophysical properties of the peptide. Atomic force microscopy, transmitting electron microscopy, and circular dichroism spectroscopy confirm that R3 and R4 display β-sheet structure and self-assemble into long nanofibers. Compared with R3, the β-sheet structure and nanofibers formed by R4 are more stable; they change to random coil and unordered aggregation at higher temperature. Rheology measurements indicate that novel peptides form hydrogel when induced by DMEM, and the storage modulus of R3 and R4 hydrogel is 0.5 times and 3 times higher than that of RADA16-I, respectively. Furthermore, R4 hydrogel remarkably promotes growth of liver cell L02 and liver cancer cell SMCC7721 compared with 2D culture, determined by MTT assay. Novel peptides still have potential as hydrophobic drug carriers; they can stabilize pyrene microcrystals in aqueous solution and deliver this into a lipophilic environment, identified by fluorescence emission spectra. Altogether, the spider fibroin motif GPGGY most effectively enhances mechanical strength and hydrophobicity of the peptide. This study provides a new method in the design of nanobiomaterials and helps us to understand the role of the amino acid sequence in nanofiber formation. PMID:22346352
Seed storage protein gene promoters contain conserved DNA motifs in Brassicaceae, Fabaceae and Poaceae

PubMed Central

Fauteux, François; Strömvik, Martina V

2009-01-01

Background Accurate computational identification of cis-regulatory motifs is difficult, particularly in eukaryotic promoters, which typically contain multiple short and degenerate DNA sequences bound by several interacting factors. Enrichment in combinations of rare motifs in the promoter sequence of functionally or evolutionarily related genes among several species is an indicator of conserved transcriptional regulatory mechanisms. This provides a basis for the computational identification of cis-regulatory motifs. Results We have used a discriminative seeding DNA motif discovery algorithm for an in-depth analysis of 54 seed storage protein (SSP) gene promoters from three plant families, namely Brassicaceae (mustards), Fabaceae (legumes) and Poaceae (grasses) using backgrounds based on complete sets of promoters from a representative species in each family, namely Arabidopsis (Arabidopsis thaliana (L.) Heynh.), soybean (Glycine max (L.) Merr.) and rice (Oryza sativa L.) respectively. We have identified three conserved motifs (two RY-like and one ACGT-like) in Brassicaceae and Fabaceae SSP gene promoters that are similar to experimentally characterized seed-specific cis-regulatory elements. Fabaceae SSP gene promoter sequences are also enriched in a novel, seed-specific E2Fb-like motif. Conserved motifs identified in Poaceae SSP gene promoters include a GCN4-like motif, two prolamin-box-like motifs and an Skn-1-like motif. Evidence of the presence of a variant of the TATA-box is found in the SSP gene promoters from the three plant families. Motifs discovered in SSP gene promoters were used to score whole-genome sets of promoters from Arabidopsis, soybean and rice. The highest-scoring promoters are associated with genes coding for different subunits or precursors of seed storage proteins. Conclusion Seed storage protein gene promoter motifs are conserved in diverse species, and different plant families are characterized by a distinct combination of conserved motifs

DNA nanotechnology based on i-motif structures.

PubMed

Dong, Yuanchen; Yang, Zhongqiang; Liu, Dongsheng

2014-06-17

CONSPECTUS: Most biological processes happen at the nanometer scale, and understanding the energy transformations and material transportation mechanisms within living organisms has proved challenging. To better understand the secrets of life, researchers have investigated artificial molecular motors and devices over the past decade because such systems can mimic certain biological processes. DNA nanotechnology based on i-motif structures is one system that has played an important role in these investigations. In this Account, we summarize recent advances in functional DNA nanotechnology based on i-motif structures. The i-motif is a DNA quadruplex that occurs as four stretches of cytosine repeat sequences form C·CH(+) base pairs, and their stabilization requires slightly acidic conditions. This unique property has produced the first DNA molecular motor driven by pH changes. The motor is reliable, and studies show that it is capable of millisecond running speeds, comparable to the speed of natural protein motors. With careful design, the output of these types of motors was combined to drive micrometer-sized cantilevers bend. Using established DNA nanostructure assembly and functionalization methods, researchers can easily integrate the motor within other DNA assembled structures and functional units, producing DNA molecular devices with new functions such as suprahydrophobic/suprahydrophilic smart surfaces that switch, intelligent nanopores triggered by pH changes, molecular logic gates, and DNA nanosprings. Recently, researchers have produced motors driven by light and electricity, which have allowed DNA motors to be integrated within silicon-based nanodevices. Moreover, some devices based on i-motif structures have proven useful for investigating processes within living cells. The pH-responsiveness of the i-motif structure also provides a way to control the stepwise assembly of DNA nanostructures. In addition, because of the stability of the i-motif, this
A novel approach to identifying regulatory motifs in distantly related genomes

PubMed Central

Van Hellemont, Ruth; Monsieurs, Pieter; Thijs, Gert; De Moor, Bart; Van de Peer, Yves; Marchal, Kathleen

2005-01-01

Although proven successful in the identification of regulatory motifs, phylogenetic footprinting methods still show some shortcomings. To assess these difficulties, most apparent when applying phylogenetic footprinting to distantly related organisms, we developed a two-step procedure that combines the advantages of sequence alignment and motif detection approaches. The results on well-studied benchmark datasets indicate that the presented method outperforms other methods when the sequences become either too long or too heterogeneous in size. PMID:16420672
The Runt domain of AML1 (RUNX1) binds a sequence-conserved RNA motif that mimics a DNA element.

PubMed

Fukunaga, Junichi; Nomura, Yusuke; Tanaka, Yoichiro; Amano, Ryo; Tanaka, Taku; Nakamura, Yoshikazu; Kawai, Gota; Sakamoto, Taiichi; Kozu, Tomoko

2013-07-01

AML1 (RUNX1) is a key transcription factor for hematopoiesis that binds to the Runt-binding double-stranded DNA element (RDE) of target genes through its N-terminal Runt domain. Aberrations in the AML1 gene are frequently found in human leukemia. To better understand AML1 and its potential utility for diagnosis and therapy, we obtained RNA aptamers that bind specifically to the AML1 Runt domain. Enzymatic probing and NMR analyses revealed that Apt1-S, which is a truncated variant of one of the aptamers, has a CACG tetraloop and two stem regions separated by an internal loop. All the isolated aptamers were found to contain the conserved sequence motif 5'-NNCCAC-3' and 5'-GCGMGN'N'-3' (M:A or C; N and N' form Watson-Crick base pairs). The motif contains one AC mismatch and one base bulged out. Mutational analysis of Apt1-S showed that three guanines of the motif are important for Runt binding as are the three guanines of RDE, which are directly recognized by three arginine residues of the Runt domain. Mutational analyses of the Runt domain revealed that the amino acid residues used for Apt1-S binding were similar to those used for RDE binding. Furthermore, the aptamer competed with RDE for binding to the Runt domain in vitro. These results demonstrated that the Runt domain of the AML1 protein binds to the motif of the aptamer that mimics DNA. Our findings should provide new insights into RNA function and utility in both basic and applied sciences.
A flexible motif search technique based on generalized profiles.

PubMed

Bucher, P; Karplus, K; Moeri, N; Hofmann, K

1996-03-01

A flexible motif search technique is presented which has two major components: (1) a generalized profile syntax serving as a motif definition language; and (2) a motif search method specifically adapted to the problem of finding multiple instances of a motif in the same sequence. The new profile structure, which is the core of the generalized profile syntax, combines the functions of a variety of motif descriptors implemented in other methods, including regular expression-like patterns, weight matrices, previously used profiles, and certain types of hidden Markov models (HMMs). The relationship between generalized profiles and other biomolecular motif descriptors is analyzed in detail, with special attention to HMMs. Generalized profiles are shown to be equivalent to a particular class of HMMs, and conversion procedures in both directions are given. The conversion procedures provide an interpretation for local alignment in the framework of stochastic models, allowing for clear, simple significance tests. A mathematical statement of the motif search problem defines the new method exactly without linking it to a specific algorithmic solution. Part of the definition includes a new definition of disjointness of alignments.
Identification of a DNA sequence motif required for expression of iron-regulated genes in pseudomonads.

PubMed

Rombel, I T; McMorran, B J; Lamont, I L

1995-02-20

Many bacteria respond to a lack of iron in the environment by synthesizing siderophores, which act as iron-scavenging compounds. Fluorescent pseudomonads synthesize strain-specific but chemically related siderophores called pyoverdines or pseudobactins. We have investigated the mechanisms by which iron controls expression of genes involved in pyoverdine metabolism in Pseudomonas aeruginosa. Transcription of these genes is repressed by the presence of iron in the growth medium. Three promoters from these genes were cloned and the activities of the promoters were dependent on the amounts of iron in the growth media. Two of the promoters were sequenced and the transcriptional start site were identified by S1 nuclease analysis. Sequences similar to the consensus binding site for the Fur repressor protein, which controls expression of iron-repressible genes in several gram-negative species, were not present in the promoters, suggesting that they are unlikely to have a high affinity for Fur. However, comparison of the promoter sequences with those of iron-regulated genes from other Pseudomonas species and also the iron-regulated exotoxin gene of P. aeruginosa allowed identification of a shared sequence element, with the consensus sequence (G/C)CTAAAT-CCC, which is likely to act as a binding site for a transcriptional activator protein. Mutations in this sequence greatly reduced the activities of the promoters characterized here as well as those of other iron-regulated promoters. The requirement for this motif in the promoters of iron-regulated genes of different Pseudomonas species indicates that similar mechanisms are likely to be involved in controlling expression of a range of iron-regulated genes in pseudomonads.
RNA Bricks—a database of RNA 3D motifs and their interactions

PubMed Central

Chojnowski, Grzegorz; Waleń, Tomasz; Bujnicki, Janusz M.

2014-01-01

The RNA Bricks database (http://iimcb.genesilico.pl/rnabricks), stores information about recurrent RNA 3D motifs and their interactions, found in experimentally determined RNA structures and in RNA–protein complexes. In contrast to other similar tools (RNA 3D Motif Atlas, RNA Frabase, Rloom) RNA motifs, i.e. ‘RNA bricks’ are presented in the molecular environment, in which they were determined, including RNA, protein, metal ions, water molecules and ligands. All nucleotide residues in RNA bricks are annotated with structural quality scores that describe real-space correlation coefficients with the electron density data (if available), backbone geometry and possible steric conflicts, which can be used to identify poorly modeled residues. The database is also equipped with an algorithm for 3D motif search and comparison. The algorithm compares spatial positions of backbone atoms of the user-provided query structure and of stored RNA motifs, without relying on sequence or secondary structure information. This enables the identification of local structural similarities among evolutionarily related and unrelated RNA molecules. Besides, the search utility enables searching ‘RNA bricks’ according to sequence similarity, and makes it possible to identify motifs with modified ribonucleotide residues at specific positions. PMID:24220091
High-throughput sequencing enhanced phage display enables the identification of patient-specific epitope motifs in serum.

PubMed

Christiansen, Anders; Kringelum, Jens V; Hansen, Christian S; Bøgh, Katrine L; Sullivan, Eric; Patel, Jigar; Rigby, Neil M; Eiwegger, Thomas; Szépfalusi, Zsolt; de Masi, Federico; Nielsen, Morten; Lund, Ole; Dufva, Martin

2015-08-06

Phage display is a prominent screening technique with a multitude of applications including therapeutic antibody development and mapping of antigen epitopes. In this study, phages were selected based on their interaction with patient serum and exhaustively characterised by high-throughput sequencing. A bioinformatics approach was developed in order to identify peptide motifs of interest based on clustering and contrasting to control samples. Comparison of patient and control samples confirmed a major issue in phage display, namely the selection of unspecific peptides. The potential of the bioinformatic approach was demonstrated by identifying epitopes of a prominent peanut allergen, Ara h 1, in sera from patients with severe peanut allergy. The identified epitopes were confirmed by high-density peptide micro-arrays. The present study demonstrates that high-throughput sequencing can empower phage display by (i) enabling the analysis of complex biological samples, (ii) circumventing the traditional laborious picking and functional testing of individual phage clones and (iii) reducing the number of selection rounds.
Nucleophosmin integrates within the nucleolus via multi-modal interactions with proteins displaying R-rich linear motifs and rRNA

PubMed Central

Mitrea, Diana M; Cika, Jaclyn A; Guy, Clifford S; Ban, David; Banerjee, Priya R; Stanley, Christopher B; Nourse, Amanda; Deniz, Ashok A; Kriwacki, Richard W

2016-01-01

The nucleolus is a membrane-less organelle formed through liquid-liquid phase separation of its components from the surrounding nucleoplasm. Here, we show that nucleophosmin (NPM1) integrates within the nucleolus via a multi-modal mechanism involving multivalent interactions with proteins containing arginine-rich linear motifs (R-motifs) and ribosomal RNA (rRNA). Importantly, these R-motifs are found in canonical nucleolar localization signals. Based on a novel combination of biophysical approaches, we propose a model for the molecular organization within liquid-like droplets formed by the N-terminal domain of NPM1 and R-motif peptides, thus providing insights into the structural organization of the nucleolus. We identify multivalency of acidic tracts and folded nucleic acid binding domains, mediated by N-terminal domain oligomerization, as structural features required for phase separation of NPM1 with other nucleolar components in vitro and for localization within mammalian nucleoli. We propose that one mechanism of nucleolar localization involves phase separation of proteins within the nucleolus. DOI: http://dx.doi.org/10.7554/eLife.13571.001 PMID:26836305
Nucleophosmin integrates within the nucleolus via multi-modal interactions with proteins displaying R-rich linear motifs and rRNA

DOE PAGES

Mitrea, Diana M.; Cika, Jaclyn A.; Guy, Clifford S.; ...

2016-02-02

In this study, the nucleolus is a membrane-less organelle formed through liquid-liquid phase separation of its components from the surrounding nucleoplasm. Here, we show that nucleophosmin (NPM1) integrates within the nucleolus via a multi-modal mechanism involving multivalent interactions with proteins containing arginine-rich linear motifs (R-motifs) and ribosomal RNA (rRNA). Importantly, these R-motifs are found in canonical nucleolar localization signals. Based on a novel combination of biophysical approaches, we propose a model for the molecular organization within liquid-like droplets formed by the N-terminal domain of NPM1 and R-motif peptides, thus providing insights into the structural organization of the nucleolus. We identifymore » multivalency of acidic tracts and folded nucleic acid binding domains, mediated by N-terminal domain oligomerization, as structural features required for phase separation of NPM1 with other nucleolar components in vitro and for localization within mammalian nucleoli. We propose that one mechanism of nucleolar localization involves phase separation of proteins within the nucleolus.« less
The Runt domain of AML1 (RUNX1) binds a sequence-conserved RNA motif that mimics a DNA element

PubMed Central

Fukunaga, Junichi; Nomura, Yusuke; Tanaka, Yoichiro; Amano, Ryo; Tanaka, Taku; Nakamura, Yoshikazu; Kawai, Gota; Sakamoto, Taiichi; Kozu, Tomoko

2013-01-01

AML1 (RUNX1) is a key transcription factor for hematopoiesis that binds to the Runt-binding double-stranded DNA element (RDE) of target genes through its N-terminal Runt domain. Aberrations in the AML1 gene are frequently found in human leukemia. To better understand AML1 and its potential utility for diagnosis and therapy, we obtained RNA aptamers that bind specifically to the AML1 Runt domain. Enzymatic probing and NMR analyses revealed that Apt1-S, which is a truncated variant of one of the aptamers, has a CACG tetraloop and two stem regions separated by an internal loop. All the isolated aptamers were found to contain the conserved sequence motif 5′-NNCCAC-3′ and 5′-GCGMGN′N′-3′ (M:A or C; N and N′ form Watson–Crick base pairs). The motif contains one AC mismatch and one base bulged out. Mutational analysis of Apt1-S showed that three guanines of the motif are important for Runt binding as are the three guanines of RDE, which are directly recognized by three arginine residues of the Runt domain. Mutational analyses of the Runt domain revealed that the amino acid residues used for Apt1-S binding were similar to those used for RDE binding. Furthermore, the aptamer competed with RDE for binding to the Runt domain in vitro. These results demonstrated that the Runt domain of the AML1 protein binds to the motif of the aptamer that mimics DNA. Our findings should provide new insights into RNA function and utility in both basic and applied sciences. PMID:23709277
Dipeptide frequency/bias analysis identifies conserved sites of nonrandomness shared by cysteine-rich motifs.

PubMed

Campion, S R; Ameen, A S; Lai, L; King, J M; Munzenmaier, T N

2001-08-15

This report describes the application of a simple computational tool, AAPAIR.TAB, for the systematic analysis of the cysteine-rich EGF, Sushi, and Laminin motif/sequence families at the two-amino acid level. Automated dipeptide frequency/bias analysis detects preferences in the distribution of amino acids in established protein families, by determining which "ordered dipeptides" occur most frequently in comprehensive motif-specific sequence data sets. Graphic display of the dipeptide frequency/bias data revealed family-specific preferences for certain dipeptides, but more importantly detected a shared preference for employment of the ordered dipeptides Gly-Tyr (GY) and Gly-Phe (GF) in all three protein families. The dipeptide Asn-Gly (NG) also exhibited high-frequency and bias in the EGF and Sushi motif families, whereas Asn-Thr (NT) was distinguished in the Laminin family. Evaluation of the distribution of dipeptides identified by frequency/bias analysis subsequently revealed the highly restricted localization of the G(F/Y) and N(G/T) sequence elements at two separate sites of extreme conservation in the consensus sequence of all three sequence families. The similar employment of the high-frequency/bias dipeptides in three distinct protein sequence families was further correlated with the concurrence of these shared molecular determinants at similar positions within the distinctive scaffolds of three structurally divergent, but similarly employed, motif modules.
Motif-based analysis of large nucleotide data sets using MEME-ChIP

PubMed Central

Ma, Wenxiu; Noble, William S; Bailey, Timothy L

2014-01-01

MEME-ChIP is a web-based tool for analyzing motifs in large DNA or RNA data sets. It can analyze peak regions identified by ChIP-seq, cross-linking sites identified by cLIP-seq and related assays, as well as sets of genomic regions selected using other criteria. MEME-ChIP performs de novo motif discovery, motif enrichment analysis, motif location analysis and motif clustering, providing a comprehensive picture of the DNA or RNA motifs that are enriched in the input sequences. MEME-ChIP performs two complementary types of de novo motif discovery: weight matrix–based discovery for high accuracy; and word-based discovery for high sensitivity. Motif enrichment analysis using DNA or RNA motifs from human, mouse, worm, fly and other model organisms provides even greater sensitivity. MEME-ChIP’s interactive HTML output groups and aligns significant motifs to ease interpretation. this protocol takes less than 3 h, and it provides motif discovery approaches that are distinct and complementary to other online methods. PMID:24853928
Motif mismatches in microsatellites: insights from genome-wide investigation among 20 insect species.

PubMed

Behura, Susanta K; Severson, David W

2015-02-01

We present a detailed genome-wide comparative study of motif mismatches of microsatellites among 20 insect species representing five taxonomic orders. The results show that varying proportions (∼15-46%) of microsatellites identified in these species are imperfect in motif structure, and that they also vary in chromosomal distribution within genomes. It was observed that the genomic abundance of imperfect repeats is significantly associated with the length and number of motif mismatches of microsatellites. Furthermore, microsatellites with a higher number of mismatches tend to have lower abundance in the genome, suggesting that sequence heterogeneity of repeat motifs is a key determinant of genomic abundance of microsatellites. This relationship seems to be a general feature of microsatellites even in unrelated species such as yeast, roundworm, mouse and human. We provide a mechanistic explanation of the evolutionary link between motif heterogeneity and genomic abundance of microsatellites by examining the patterns of motif mismatches and allele sequences of single-nucleotide polymorphisms identified within microsatellite loci. Using Drosophila Reference Genetic Panel data, we further show that pattern of allelic variation modulates motif heterogeneity of microsatellites, and provide estimates of allele age of specific imperfect microsatellites found within protein-coding genes. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
AMP-acetyl CoA synthetase from Leishmania donovani: identification and functional analysis of 'PX4GK' motif.

PubMed

Soumya, Neelagiri; Kumar, I Sravan; Shivaprasad, S; Gorakh, Landage Nitin; Dinesh, Neeradi; Swamy, Kayala Kambagiri; Singh, Sushma

2015-04-01

An adenosine monophosphate forming acetyl CoA synthetase (AceCS) which is the key enzyme involved in the conversion of acetate to acetyl CoA has been identified from Leishmania donovani for the first time. Sequence analysis of L. donovani AceCS (LdAceCS) revealed the presence of a 'PX4GK' motif which is highly conserved throughout organisms with higher sequence identity (96%) to lower sequence identity (38%). A ∼ 77 kDa heterologous protein with C-terminal 6X His-tag was expressed in Escherichia coli. Expression of LdAceCS in promastigotes was confirmed by western blot and RT-PCR analysis. Immunolocalization studies revealed that it is a cytosolic protein. We also report the kinetic characterization of recombinant LdAceCS with acetate, adenosine 5'-triphosphate, coenzyme A and propionate as substrates. Site directed mutagenesis of residues in conserved PX4GK motif of LdAceCS was performed to gain insight into its potential role in substrate binding, catalysis and its role in maintaining structural integrity of the protein. P646A, G651A and K652R exhibited more than 90% loss in activity signifying its indispensible role in the enzyme activity. Substitution of other residues in this motif resulted in altered substrate specificity and catalysis. However, none of them had any role in modulation of the secondary structure of the protein except G651A mutant. Copyright © 2015 Elsevier B.V. All rights reserved.
TFBSshape: a motif database for DNA shape features of transcription factor binding sites.

PubMed

Yang, Lin; Zhou, Tianyin; Dror, Iris; Mathelier, Anthony; Wasserman, Wyeth W; Gordân, Raluca; Rohs, Remo

2014-01-01

Transcription factor binding sites (TFBSs) are most commonly characterized by the nucleotide preferences at each position of the DNA target. Whereas these sequence motifs are quite accurate descriptions of DNA binding specificities of transcription factors (TFs), proteins recognize DNA as a three-dimensional object. DNA structural features refine the description of TF binding specificities and provide mechanistic insights into protein-DNA recognition. Existing motif databases contain extensive nucleotide sequences identified in binding experiments based on their selection by a TF. To utilize DNA shape information when analysing the DNA binding specificities of TFs, we developed a new tool, the TFBSshape database (available at http://rohslab.cmb.usc.edu/TFBSshape/), for calculating DNA structural features from nucleotide sequences provided by motif databases. The TFBSshape database can be used to generate heat maps and quantitative data for DNA structural features (i.e., minor groove width, roll, propeller twist and helix twist) for 739 TF datasets from 23 different species derived from the motif databases JASPAR and UniPROBE. As demonstrated for the basic helix-loop-helix and homeodomain TF families, our TFBSshape database can be used to compare, qualitatively and quantitatively, the DNA binding specificities of closely related TFs and, thus, uncover differential DNA binding specificities that are not apparent from nucleotide sequence alone.
TFBSshape: a motif database for DNA shape features of transcription factor binding sites

PubMed Central

Yang, Lin; Zhou, Tianyin; Dror, Iris; Mathelier, Anthony; Wasserman, Wyeth W.; Gordân, Raluca; Rohs, Remo

2014-01-01

Transcription factor binding sites (TFBSs) are most commonly characterized by the nucleotide preferences at each position of the DNA target. Whereas these sequence motifs are quite accurate descriptions of DNA binding specificities of transcription factors (TFs), proteins recognize DNA as a three-dimensional object. DNA structural features refine the description of TF binding specificities and provide mechanistic insights into protein–DNA recognition. Existing motif databases contain extensive nucleotide sequences identified in binding experiments based on their selection by a TF. To utilize DNA shape information when analysing the DNA binding specificities of TFs, we developed a new tool, the TFBSshape database (available at http://rohslab.cmb.usc.edu/TFBSshape/), for calculating DNA structural features from nucleotide sequences provided by motif databases. The TFBSshape database can be used to generate heat maps and quantitative data for DNA structural features (i.e., minor groove width, roll, propeller twist and helix twist) for 739 TF datasets from 23 different species derived from the motif databases JASPAR and UniPROBE. As demonstrated for the basic helix-loop-helix and homeodomain TF families, our TFBSshape database can be used to compare, qualitatively and quantitatively, the DNA binding specificities of closely related TFs and, thus, uncover differential DNA binding specificities that are not apparent from nucleotide sequence alone. PMID:24214955
SA-Mot: a web server for the identification of motifs of interest extracted from protein loops

PubMed Central

Regad, Leslie; Saladin, Adrien; Maupetit, Julien; Geneix, Colette; Camproux, Anne-Claude

2011-01-01

The detection of functional motifs is an important step for the determination of protein functions. We present here a new web server SA-Mot (Structural Alphabet Motif) for the extraction and location of structural motifs of interest from protein loops. Contrary to other methods, SA-Mot does not focus only on functional motifs, but it extracts recurrent and conserved structural motifs involved in structural redundancy of loops. SA-Mot uses the structural word notion to extract all structural motifs from uni-dimensional sequences corresponding to loop structures. Then, SA-Mot provides a description of these structural motifs using statistics computed in the loop data set and in SCOP superfamily, sequence and structural parameters. SA-Mot results correspond to an interactive table listing all structural motifs extracted from a target structure and their associated descriptors. Using this information, the users can easily locate loop regions that are important for the protein folding and function. The SA-Mot web server is available at http://sa-mot.mti.univ-paris-diderot.fr. PMID:21665924
SA-Mot: a web server for the identification of motifs of interest extracted from protein loops.

PubMed

Regad, Leslie; Saladin, Adrien; Maupetit, Julien; Geneix, Colette; Camproux, Anne-Claude

2011-07-01

The detection of functional motifs is an important step for the determination of protein functions. We present here a new web server SA-Mot (Structural Alphabet Motif) for the extraction and location of structural motifs of interest from protein loops. Contrary to other methods, SA-Mot does not focus only on functional motifs, but it extracts recurrent and conserved structural motifs involved in structural redundancy of loops. SA-Mot uses the structural word notion to extract all structural motifs from uni-dimensional sequences corresponding to loop structures. Then, SA-Mot provides a description of these structural motifs using statistics computed in the loop data set and in SCOP superfamily, sequence and structural parameters. SA-Mot results correspond to an interactive table listing all structural motifs extracted from a target structure and their associated descriptors. Using this information, the users can easily locate loop regions that are important for the protein folding and function. The SA-Mot web server is available at http://sa-mot.mti.univ-paris-diderot.fr.
Discriminative motif discovery via simulated evolution and random under-sampling.

PubMed

Song, Tao; Gu, Hong

2014-01-01

Conserved motifs in biological sequences are closely related to their structure and functions. Recently, discriminative motif discovery methods have attracted more and more attention. However, little attention has been devoted to the data imbalance problem, which is one of the main reasons affecting the performance of the discriminative models. In this article, a simulated evolution method is applied to solve the multi-class imbalance problem at the stage of data preprocessing, and at the stage of Hidden Markov Models (HMMs) training, a random under-sampling method is introduced for the imbalance between the positive and negative datasets. It is shown that, in the task of discovering targeting motifs of nine subcellular compartments, the motifs found by our method are more conserved than the methods without considering data imbalance problem and recover the most known targeting motifs from Minimotif Miner and InterPro. Meanwhile, we use the found motifs to predict protein subcellular localization and achieve higher prediction precision and recall for the minority classes.
Function-based classification of carbohydrate-active enzymes by recognition of short, conserved peptide motifs.

PubMed

Busk, Peter Kamp; Lange, Lene

2013-06-01

Functional prediction of carbohydrate-active enzymes is difficult due to low sequence identity. However, similar enzymes often share a few short motifs, e.g., around the active site, even when the overall sequences are very different. To exploit this notion for functional prediction of carbohydrate-active enzymes, we developed a simple algorithm, peptide pattern recognition (PPR), that can divide proteins into groups of sequences that share a set of short conserved sequences. When this method was used on 118 glycoside hydrolase 5 proteins with 9% average pairwise identity and representing four characterized enzymatic functions, 97% of the proteins were sorted into groups correlating with their enzymatic activity. Furthermore, we analyzed 8,138 glycoside hydrolase 13 proteins including 204 experimentally characterized enzymes with 28 different functions. There was a 91% correlation between group and enzyme activity. These results indicate that the function of carbohydrate-active enzymes can be predicted with high precision by finding short, conserved motifs in their sequences. The glycoside hydrolase 61 family is important for fungal biomass conversion, but only a few proteins of this family have been functionally characterized. Interestingly, PPR divided 743 glycoside hydrolase 61 proteins into 16 subfamilies useful for targeted investigation of the function of these proteins and pinpointed three conserved motifs with putative importance for enzyme activity. Furthermore, the conserved sequences were useful for cloning of new, subfamily-specific glycoside hydrolase 61 proteins from 14 fungi. In conclusion, identification of conserved sequence motifs is a new approach to sequence analysis that can predict carbohydrate-active enzyme functions with high precision.

Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs.

PubMed

Regad, Leslie; Martin, Juliette; Camproux, Anne-Claude

2011-06-20

One of the strategies for protein function annotation is to search particular structural motifs that are known to be shared by proteins with a given function. Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.
Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs

PubMed Central

2011-01-01

Background One of the strategies for protein function annotation is to search particular structural motifs that are known to be shared by proteins with a given function. Results Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Conclusions Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins. PMID:21689388
STEME: A Robust, Accurate Motif Finder for Large Data Sets

PubMed Central

Reid, John E.; Wernisch, Lorenz

2014-01-01

Motif finding is a difficult problem that has been studied for over 20 years. Some older popular motif finders are not suitable for analysis of the large data sets generated by next-generation sequencing. We recently published an efficient approximation (STEME) to the EM algorithm that is at the core of many motif finders such as MEME. This approximation allows the EM algorithm to be applied to large data sets. In this work we describe several efficient extensions to STEME that are based on the MEME algorithm. Together with the original STEME EM approximation, these extensions make STEME a fully-fledged motif finder with similar properties to MEME. We discuss the difficulty of objectively comparing motif finders. We show that STEME performs comparably to existing prominent discriminative motif finders, DREME and Trawler, on 13 sets of transcription factor binding data in mouse ES cells. We demonstrate the ability of STEME to find long degenerate motifs which these discriminative motif finders do not find. As part of our method, we extend an earlier method due to Nagarajan et al. for the efficient calculation of motif E-values. STEME's source code is available under an open source license and STEME is available via a web interface. PMID:24625410
Motif types, motif locations and base composition patterns around the RNA polyadenylation site in microorganisms, plants and animals

PubMed Central

2014-01-01

Background The polyadenylation of RNA is critical for gene functioning, but the conserved sequence motifs (often called signal or signature motifs), motif locations and abundances, and base composition patterns around mRNA polyadenylation [poly(A)] sites are still uncharacterized in most species. The evolutionary tendency for poly(A) site selection is still largely unknown. Results We analyzed the poly(A) site regions of 31 species or phyla. Different groups of species showed different poly(A) signal motifs: UUACUU at the poly(A) site in the parasite Trypanosoma cruzi; UGUAAC (approximately 13 bases upstream of the site) in the alga Chlamydomonas reinhardtii; UGUUUG (or UGUUUGUU) at mainly the fourth base downstream of the poly(A) site in the parasite Blastocystis hominis; and AAUAAA at approximately 16 bases and approximately 19 bases upstream of the poly(A) site in animals and plants, respectively. Polyadenylation signal motifs are usually several hundred times more abundant around poly(A) sites than in whole genomes. These predominant motifs usually had very specific locations, whether upstream of, at, or downstream of poly(A) sites, depending on the species or phylum. The poly(A) site was usually an adenosine (A) in all analyzed species except for B. hominis, and there was weak A predominance in C. reinhardtii. Fungi, animals, plants, and the protist Phytophthora infestans shared a general base abundance pattern (or base composition pattern) of “U-rich—A-rich—U-rich—Poly(A) site—U-rich regions”, or U-A-U-A-U for short, with some variation for each kingdom or subkingdom. Conclusion This study identified the poly(A) signal motifs, motif locations, and base composition patterns around mRNA poly(A) sites in protists, fungi, plants, and animals and provided insight into poly(A) site evolution. PMID:25052519
Functional analysis reveals the possible role of the C-terminal sequences and PI motif in the function of lily (Lilium longiflorum) PISTILLATA (PI) orthologues

PubMed Central

Chen, Ming-Kun; Hsieh, Wen-Ping; Yang, Chang-Hsien

2012-01-01

Two lily (Lilium longiflorum) PISTILLATA (PI) genes, Lily MADS Box Gene 8 and 9 (LMADS8/9), were characterized. LMADS9 lacked 29 C-terminal amino acids including the PI motif that was present in LMADS8. Both LMADS8/9 mRNAs were prevalent in the first and second whorl tepals during all stages of development and were expressed in the stamen only in young flower buds. LMADS8/9 could both form homodimers, but the ability of LMADS8 homodimers to bind to CArG1 was relatively stronger than that of LMADS9 homodimers. 35S:LMADS8 completely, and 35S:LMADS9 only partially, rescued the second whorl petal formation and partially converted the first whorl sepal into a petal-like structure in Arabidopsis pi-1 mutants. Ectopic expression of LMADS8-C (with deletion of the 29 amino acids of the C-terminal sequence) or LMADS8-PI (with only the PI motif deleted) only partially rescued petal formation in pi mutants, which was similar to what was observed in 35S:LMADS9/pi plants. In contrast, 35:LMADS9+L8C (with the addition of the 29 amino acids of the LMADS8 C-terminal sequence) or 35S:LMADS9+L8PI (with the addition of the LMADS8 PI motif) demonstrated an increased ability to rescue petal formation in pi mutants, which was similar to what was observed in 35S:LMADS8/pi plants. Furthermore, ectopic expression of LMADS8-M (with the MADS domain truncated) generated more severe dominant negative phenotypes than those seen in 35S:LMADS9-M flowers. These results revealed that the 29 amino acids including the PI motif in the C-terminal region of the lily PI orthologue are valuable for its function in regulating perianth organ formation. PMID:22068145
Effector prediction in host-pathogen interaction based on a Markov model of a ubiquitous EPIYA motif

PubMed Central

2010-01-01

Background Effector secretion is a common strategy of pathogen in mediating host-pathogen interaction. Eight EPIYA-motif containing effectors have recently been discovered in six pathogens. Once these effectors enter host cells through type III/IV secretion systems (T3SS/T4SS), tyrosine in the EPIYA motif is phosphorylated, which triggers effectors binding other proteins to manipulate host-cell functions. The objectives of this study are to evaluate the distribution pattern of EPIYA motif in broad biological species, to predict potential effectors with EPIYA motif, and to suggest roles and biological functions of potential effectors in host-pathogen interactions. Results A hidden Markov model (HMM) of five amino acids was built for the EPIYA-motif based on the eight known effectors. Using this HMM to search the non-redundant protein database containing 9,216,047 sequences, we obtained 107,231 sequences with at least one EPIYA motif occurrence and 3115 sequences with multiple repeats of the EPIYA motif. Although the EPIYA motif exists among broad species, it is significantly over-represented in some particular groups of species. For those proteins containing at least four copies of EPIYA motif, most of them are from intracellular bacteria, extracellular bacteria with T3SS or T4SS or intracellular protozoan parasites. By combining the EPIYA motif and the adjacent SH2 binding motifs (KK, R4, Tarp and Tir), we built HMMs of nine amino acids and predicted many potential effectors in bacteria and protista by the HMMs. Some potential effectors for pathogens (such as Lawsonia intracellularis, Plasmodium falciparum and Leishmania major) are suggested. Conclusions Our study indicates that the EPIYA motif may be a ubiquitous functional site for effectors that play an important pathogenicity role in mediating host-pathogen interactions. We suggest that some intracellular protozoan parasites could secrete EPIYA-motif containing effectors through secretion systems similar to the
Isosteric And Non-Isosteric Base Pairs In RNA Motifs: Molecular Dynamics And Bioinformatics Study Of The Sarcin-Ricin Internal Loop

PubMed Central

Havrila, Marek; Réblová, Kamila; Zirbel, Craig L.; Leontis, Neocles B.; Šponer, Jiří

2013-01-01

The Sarcin-Ricin RNA motif (SR motif) is one of the most prominent recurrent RNA building blocks that occurs in many different RNA contexts and folds autonomously, i.e., in a context-independent manner. In this study, we combined bioinformatics analysis with explicit-solvent molecular dynamics (MD) simulations to better understand the relation between the RNA sequence and the evolutionary patterns of SR motif. SHAPE probing experiment was also performed to confirm fidelity of MD simulations. We identified 57 instances of the SR motif in a non-redundant subset of the RNA X-ray structure database and analyzed their basepairing, base-phosphate, and backbone-backbone interactions. We extracted sequences aligned to these instances from large ribosomal RNA alignments to determine frequency of occurrence for different sequence variants. We then used a simple scoring scheme based on isostericity to suggest 10 sequence variants with highly variable expected degree of compatibility with the SR motif 3D structure. We carried out MD simulations of SR motifs with these base substitutions. Non isosteric base substitutions led to unstable structures, but so did isosteric substitutions which were unable to make key base-phosphate interactions. MD technique explains why some potentially isosteric SR motifs are not realized during evolution. We also found that inability to form stable cWW geometry is an important factor in case of the first base pair of the flexible region of the SR motif. Comparison of structural, bioinformatics, SHAPE probing and MD simulation data reveals that explicit solvent MD simulations neatly reflect viability of different sequence variants of the SR motif. Thus, MD simulations can efficiently complement bioinformatics tools in studies of conservation patterns of RNA motifs and provide atomistic insight into the role of their different signature interactions. PMID:24144333
GPUmotif: An Ultra-Fast and Energy-Efficient Motif Analysis Program Using Graphics Processing Units

PubMed Central

Zandevakili, Pooya; Hu, Ming; Qin, Zhaohui

2012-01-01

Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a “fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/ PMID:22662128
GPUmotif: an ultra-fast and energy-efficient motif analysis program using graphics processing units.

PubMed

Zandevakili, Pooya; Hu, Ming; Qin, Zhaohui

2012-01-01

Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a "fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/
ELM: the status of the 2010 eukaryotic linear motif resource

PubMed Central

Gould, Cathryn M.; Diella, Francesca; Via, Allegra; Puntervoll, Pål; Gemünd, Christine; Chabanis-Davidson, Sophie; Michael, Sushama; Sayadi, Ahmed; Bryne, Jan Christian; Chica, Claudia; Seiler, Markus; Davey, Norman E.; Haslam, Niall; Weatheritt, Robert J.; Budd, Aidan; Hughes, Tim; Paś, Jakub; Rychlewski, Leszek; Travé, Gilles; Aasland, Rein; Helmer-Citterich, Manuela; Linding, Rune; Gibson, Toby J.

2010-01-01

Linear motifs are short segments of multidomain proteins that provide regulatory functions independently of protein tertiary structure. Much of intracellular signalling passes through protein modifications at linear motifs. Many thousands of linear motif instances, most notably phosphorylation sites, have now been reported. Although clearly very abundant, linear motifs are difficult to predict de novo in protein sequences due to the difficulty of obtaining robust statistical assessments. The ELM resource at http://elm.eu.org/ provides an expanding knowledge base, currently covering 146 known motifs, with annotation that includes >1300 experimentally reported instances. ELM is also an exploratory tool for suggesting new candidates of known linear motifs in proteins of interest. Information about protein domains, protein structure and native disorder, cellular and taxonomic contexts is used to reduce or deprecate false positive matches. Results are graphically displayed in a ‘Bar Code’ format, which also displays known instances from homologous proteins through a novel ‘Instance Mapper’ protocol based on PHI-BLAST. ELM server output provides links to the ELM annotation as well as to a number of remote resources. Using the links, researchers can explore the motifs, proteins, complex structures and associated literature to evaluate whether candidate motifs might be worth experimental investigation. PMID:19920119
Subtle Changes in Motif Positioning Cause Tissue-Specific Effects on Robustness of an Enhancer's Activity

PubMed Central

Erceg, Jelena; Saunders, Timothy E.; Girardot, Charles; Devos, Damien P.; Hufnagel, Lars; Furlong, Eileen E. M.

2014-01-01

Deciphering the specific contribution of individual motifs within cis-regulatory modules (CRMs) is crucial to understanding how gene expression is regulated and how this process is affected by sequence variation. But despite vast improvements in the ability to identify where transcription factors (TFs) bind throughout the genome, we are limited in our ability to relate information on motif occupancy to function from sequence alone. Here, we engineered 63 synthetic CRMs to systematically assess the relationship between variation in the content and spacing of motifs within CRMs to CRM activity during development using Drosophila transgenic embryos. In over half the cases, very simple elements containing only one or two types of TF binding motifs were capable of driving specific spatio-temporal patterns during development. Different motif organizations provide different degrees of robustness to enhancer activity, ranging from binary on-off responses to more subtle effects including embryo-to-embryo and within-embryo variation. By quantifying the effects of subtle changes in motif organization, we were able to model biophysical rules that explain CRM behavior and may contribute to the spatial positioning of CRM activity in vivo. For the same enhancer, the effects of small differences in motif positions varied in developmentally related tissues, suggesting that gene expression may be more susceptible to sequence variation in one tissue compared to another. This result has important implications for human eQTL studies in which many associated mutations are found in cis-regulatory regions, though the mechanism for how they affect tissue-specific gene expression is often not understood. PMID:24391522
Modeling of DNA local parameters predicts encrypted architectural motifs in Xenopus laevis ribosomal gene promoter

PubMed Central

Roux-Rouquie, Magali; Marilley, Monique

2000-01-01

We have modeled local DNA sequence parameters to search for DNA architectural motifs involved in transcription regulation and promotion within the Xenopus laevis ribosomal gene promoter and the intergenic spacer (IGS) sequences. The IGS was found to be shaped into distinct topological domains. First, intrinsic bends split the IGS into domains of common but different helical features. Local parameters at inter-domain junctions exhibit a high variability with respect to intrinsic curvature, bendability and thermal stability. Secondly, the repeated sequence blocks of the IGS exhibit right-handed supercoiled structures which could be related to their enhancer properties. Thirdly, the gene promoter presents both inherent curvature and minor groove narrowing which may be viewed as motifs of a structural code for protein recognition and binding. Such pre-existing deformations could simply be remodeled during the binding of the transcription complex. Alternatively, these deformations could pre-shape the promoter in such a way that further remodeling is facilitated. Mutations shown to abolish promoter curvature as well as intrinsic minor groove narrowing, in a variant which maintained full transcriptional activity, bring circumstantial evidence for structurally-preorganized motifs in relation to transcription regulation and promotion. Using well documented X.laevis rDNA regulatory sequences we showed that computer modeling may be of invaluable assistance in assessing encrypted architectural motifs. The evidence of these DNA topological motifs with respect to the concept of structural code is discussed. PMID:10982860
Modeling of DNA local parameters predicts encrypted architectural motifs in Xenopus laevis ribosomal gene promoter.

PubMed

Roux-Rouquie, M; Marilley, M

2000-09-15

We have modeled local DNA sequence parameters to search for DNA architectural motifs involved in transcription regulation and promotion within the Xenopus laevis ribosomal gene promoter and the intergenic spacer (IGS) sequences. The IGS was found to be shaped into distinct topological domains. First, intrinsic bends split the IGS into domains of common but different helical features. Local parameters at inter-domain junctions exhibit a high variability with respect to intrinsic curvature, bendability and thermal stability. Secondly, the repeated sequence blocks of the IGS exhibit right-handed supercoiled structures which could be related to their enhancer properties. Thirdly, the gene promoter presents both inherent curvature and minor groove narrowing which may be viewed as motifs of a structural code for protein recognition and binding. Such pre-existing deformations could simply be remodeled during the binding of the transcription complex. Alternatively, these deformations could pre-shape the promoter in such a way that further remodeling is facilitated. Mutations shown to abolish promoter curvature as well as intrinsic minor groove narrowing, in a variant which maintained full transcriptional activity, bring circumstantial evidence for structurally-preorganized motifs in relation to transcription regulation and promotion. Using well documented X. laevis rDNA regulatory sequences we showed that computer modeling may be of invaluable assistance in assessing encrypted architectural motifs. The evidence of these DNA topological motifs with respect to the concept of structural code is discussed.
A novel Arg H52/Tyr H33 conservative motif in antibodies: A correlation between sequence of antibodies and antigen binding.

PubMed

Petrov, Artem; Arzhanik, Vladimir; Makarov, Gennady; Koliasnikov, Oleg

2016-08-01

Antibodies are the family of proteins, which are responsible for antigen recognition. The computational modeling of interaction between an antigen and an antibody is very important when crystallographic structure is unavailable. In this research, we have discovered the correlation between the amino acid sequence of antibody and its specific binding characteristics on the example of the novel conservative binding motif, which consists of four residues: Arg H52, Tyr H33, Thr H59, and Glu H61. These residues are specifically oriented in the binding site and interact with each other in a specific manner. The residues of the binding motif are involved in interaction strictly with negatively charged groups of antigens, and form a binding complex. Mechanism of interaction and characteristics of the complex were also discovered. The results of this research can be used to increase the accuracy of computational antibody-antigen interaction modeling and for post-modeling quality control of the modeled structures.
A structural-alphabet-based strategy for finding structural motifs across protein families

PubMed Central

Wu, Chih Yuan; Chen, Yao Chi; Lim, Carmay

2010-01-01

Proteins with insignificant sequence and overall structure similarity may still share locally conserved contiguous structural segments; i.e. structural/3D motifs. Most methods for finding 3D motifs require a known motif to search for other similar structures or functionally/structurally crucial residues. Here, without requiring a query motif or essential residues, a fully automated method for discovering 3D motifs of various sizes across protein families with different folds based on a 16-letter structural alphabet is presented. It was applied to structurally non-redundant proteins bound to DNA, RNA, obligate/non-obligate proteins as well as free DNA-binding proteins (DBPs) and proteins with known structures but unknown function. Its usefulness was illustrated by analyzing the 3D motifs found in DBPs. A non-specific motif was found with a ‘corner’ architecture that confers a stable scaffold and enables diverse interactions, making it suitable for binding not only DNA but also RNA and proteins. Furthermore, DNA-specific motifs present ‘only’ in DBPs were discovered. The motifs found can provide useful guidelines in detecting binding sites and computational protein redesign. PMID:20525797
Non-B DB: a database of predicted non-B DNA-forming motifs in mammalian genomes.

PubMed

Cer, Regina Z; Bruce, Kevin H; Mudunuri, Uma S; Yi, Ming; Volfovsky, Natalia; Luke, Brian T; Bacolla, Albino; Collins, Jack R; Stephens, Robert M

2011-01-01

Although the capability of DNA to form a variety of non-canonical (non-B) structures has long been recognized, the overall significance of these alternate conformations in biology has only recently become accepted en masse. In order to provide access to genome-wide locations of these classes of predicted structures, we have developed non-B DB, a database integrating annotations and analysis of non-B DNA-forming sequence motifs. The database provides the most complete list of alternative DNA structure predictions available, including Z-DNA motifs, quadruplex-forming motifs, inverted repeats, mirror repeats and direct repeats and their associated subsets of cruciforms, triplex and slipped structures, respectively. The database also contains motifs predicted to form static DNA bends, short tandem repeats and homo(purine•pyrimidine) tracts that have been associated with disease. The database has been built using the latest releases of the human, chimp, dog, macaque and mouse genomes, so that the results can be compared directly with other data sources. In order to make the data interpretable in a genomic context, features such as genes, single-nucleotide polymorphisms and repetitive elements (SINE, LINE, etc.) have also been incorporated. The database is accessed through query pages that produce results with links to the UCSC browser and a GBrowse-based genomic viewer. It is freely accessible at http://nonb.abcc.ncifcrf.gov.
Identification and preliminary characterization of a protein motif related to the zinc finger.

PubMed Central

Lovering, R; Hanson, I M; Borden, K L; Martin, S; O'Reilly, N J; Evan, G I; Rahman, D; Pappin, D J; Trowsdale, J; Freemont, P S

1993-01-01

We have identified a protein motif, related to the zinc finger, which defines a newly discovered family of proteins. The motif was found in the sequence of the human RING1 gene, which is proximal to the major histocompatibility complex region on chromosome six. We propose naming this motif the "RING finger" and it is found in 27 proteins, all of which have putative DNA binding functions. We have synthesized a peptide corresponding to the RING1 motif and examined a number of properties, including metal and DNA binding. We provide evidence to support the suggestion that the RING finger motif is the DNA binding domain of this newly defined family of proteins. Images Fig. 1 Fig. 4 PMID:7681583
Association of a Model Transmembrane Peptide Containing Gly in a Heptad Sequence Motif

PubMed Central

Lear, James D.; Stouffer, Amanda L.; Gratkowski, Holly; Nanda, Vikas; DeGrado, William F.

2004-01-01

A peptide containing glycine at a and d positions of a heptad motif was synthesized to investigate the possibility that membrane-soluble peptides with a Gly-based, left-handed helical packing motif would associate. Based on analytical ultracentrifugation in C14-betaine detergent micelles, the peptide did associate in a monomer-dimer equilibrium, although the association constant was significantly less than that reported for the right-handed dimer of the glycophorin A transmembrane peptide in similar detergents. Fluorescence resonance energy transfer (FRET) experiments conducted on peptides labeled at their N-termini with either tetramethylrhodamine (TMR) or 7-nitrobenz-2-oxa-1,3-diazole (NBD) also indicated association. However, analysis of the FRET data using the usual assumption of complete quenching for NBD-TMR pairs in the dimer could not be quantitatively reconciled with the analytical ultracentrifugation-measured dimerization constant. This led us to develop a general treatment for the association of helices to either parallel or antiparallel structures of any aggregation state. Applying this treatment to the FRET data, constraining the dimerization constant to be within experimental uncertainty of that measured by analytical ultracentrifugation, we found the data could be well described by a monomer-dimer equilibrium with only partial quenching of the dimer, suggesting that the helices are most probably antiparallel. These results also suggest that a left-handed Gly heptad repeat motif can drive membrane helix association, but the affinity is likely to be less strong than the previously reported right-handed motif described for glycophorin A. PMID:15315956
Anion induced conformational preference of Cα NN motif residues in functional proteins.

PubMed

Patra, Piya; Ghosh, Mahua; Banerjee, Raja; Chakrabarti, Jaydeb

2017-12-01

Among different ligand binding motifs, anion binding C α NN motif consisting of peptide backbone atoms of three consecutive residues are observed to be important for recognition of free anions, like sulphate or biphosphate and participate in different key functions. Here we study the interaction of sulphate and biphosphate with C α NN motif present in different proteins. Instead of total protein, a peptide fragment has been studied keeping C α NN motif flanked in between other residues. We use classical force field based molecular dynamics simulations to understand the stability of this motif. Our data indicate fluctuations in conformational preferences of the motif residues in absence of the anion. The anion gives stability to one of these conformations. However, the anion induced conformational preferences are highly sequence dependent and specific to the type of anion. In particular, the polar residues are more favourable compared to the other residues for recognising the anion. © 2017 Wiley Periodicals, Inc.
SLiMSearch 2.0: biological context for short linear motifs in proteins

PubMed Central

Davey, Norman E.; Haslam, Niall J.; Shields, Denis C.

2011-01-01

Short, linear motifs (SLiMs) play a critical role in many biological processes. The SLiMSearch 2.0 (Short, Linear Motif Search) web server allows researchers to identify occurrences of a user-defined SLiM in a proteome, using conservation and protein disorder context statistics to rank occurrences. User-friendly output and visualizations of motif context allow the user to quickly gain insight into the validity of a putatively functional motif occurrence. For each motif occurrence, overlapping UniProt features and annotated SLiMs are displayed. Visualization also includes annotated multiple sequence alignments surrounding each occurrence, showing conservation and protein disorder statistics in addition to known and predicted SLiMs, protein domains and known post-translational modifications. In addition, enrichment of Gene Ontology terms and protein interaction partners are provided as indicators of possible motif function. All web server results are available for download. Users can search motifs against the human proteome or a subset thereof defined by Uniprot accession numbers or GO term. The SLiMSearch server is available at: http://bioware.ucd.ie/slimsearch2.html. PMID:21622654

Promoter Motifs in NCLDVs: An Evolutionary Perspective

PubMed Central

Oliveira, Graziele Pereira; Andrade, Ana Cláudia dos Santos Pereira; Rodrigues, Rodrigo Araújo Lima; Arantes, Thalita Souza; Boratto, Paulo Victor Miranda; Silva, Ludmila Karen dos Santos; Dornas, Fábio Pio; Trindade, Giliane de Souza; Drumond, Betânia Paiva; La Scola, Bernard; Kroon, Erna Geessien; Abrahão, Jônatas Santos

2017-01-01

For many years, gene expression in the three cellular domains has been studied in an attempt to discover sequences associated with the regulation of the transcription process. Some specific transcriptional features were described in viruses, although few studies have been devoted to understanding the evolutionary aspects related to the spread of promoter motifs through related viral families. The discovery of giant viruses and the proposition of the new viral order Megavirales that comprise a monophyletic group, named nucleo-cytoplasmic large DNA viruses (NCLDV), raised new questions in the field. Some putative promoter sequences have already been described for some NCLDV members, bringing new insights into the evolutionary history of these complex microorganisms. In this review, we summarize the main aspects of the transcription regulation process in the three domains of life, followed by a systematic description of what is currently known about promoter regions in several NCLDVs. We also discuss how the analysis of the promoter sequences could bring new ideas about the giant viruses’ evolution. Finally, considering a possible common ancestor for the NCLDV group, we discussed possible promoters’ evolutionary scenarios and propose the term “MEGA-box” to designate an ancestor promoter motif (‘TATATAAAATTGA’) that could be evolved gradually by nucleotides’ gain and loss and point mutations. PMID:28117683
Rules for the recognition of dilysine retrieval motifs by coatomer

PubMed Central

Ma, Wenfu; Goldberg, Jonathan

2013-01-01

Cytoplasmic dilysine motifs on transmembrane proteins are captured by coatomer α-COP and β′-COP subunits and packaged into COPI-coated vesicles for Golgi-to-ER retrieval. Numerous ER/Golgi proteins contain K(x)Kxx motifs, but the rules for their recognition are unclear. We present crystal structures of α-COP and β′-COP bound to a series of naturally occurring retrieval motifs—encompassing KKxx, KxKxx and non-canonical RKxx and viral KxHxx sequences. Binding experiments show that α-COP and β′-COP have generally the same specificity for KKxx and KxKxx, but only β′-COP recognizes the RKxx signal. Dilysine motif recognition involves lysine side-chain interactions with two acidic patches. Surprisingly, however, KKxx and KxKxx motifs bind differently, with their lysine residues transposed at the binding patches. We derive rules for retrieval motif recognition from key structural features: the reversed binding modes, the recognition of the C-terminal carboxylate group which enforces lysine positional context, and the tolerance of the acidic patches for non-lysine residues. PMID:23481256
RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections

PubMed Central

Jaeger, Sébastien; Thieffry, Denis

2017-01-01

Abstract Transcription factor (TF) databases contain multitudes of binding motifs (TFBMs) from various sources, from which non-redundant collections are derived by manual curation. The advent of high-throughput methods stimulated the production of novel collections with increasing numbers of motifs. Meta-databases, built by merging these collections, contain redundant versions, because available tools are not suited to automatically identify and explore biologically relevant clusters among thousands of motifs. Motif discovery from genome-scale data sets (e.g. ChIP-seq) also produces redundant motifs, hampering the interpretation of results. We present matrix-clustering, a versatile tool that clusters similar TFBMs into multiple trees, and automatically creates non-redundant TFBM collections. A feature unique to matrix-clustering is its dynamic visualisation of aligned TFBMs, and its capability to simultaneously treat multiple collections from various sources. We demonstrate that matrix-clustering considerably simplifies the interpretation of combined results from multiple motif discovery tools, and highlights biologically relevant variations of similar motifs. We also ran a large-scale application to cluster ∼11 000 motifs from 24 entire databases, showing that matrix-clustering correctly groups motifs belonging to the same TF families, and drastically reduced motif redundancy. matrix-clustering is integrated within the RSAT suite (http://rsat.eu/), accessible through a user-friendly web interface or command-line for its integration in pipelines. PMID:28591841
Modeling of the Ebola Virus Delta Peptide Reveals a Potential Lytic Sequence Motif

PubMed Central

Gallaher, William R.; Garry, Robert F.

2015-01-01

Filoviruses, such as Ebola and Marburg viruses, cause severe outbreaks of human infection, including the extensive epidemic of Ebola virus disease (EVD) in West Africa in 2014. In the course of examining mutations in the glycoprotein gene associated with 2014 Ebola virus (EBOV) sequences, a differential level of conservation was noted between the soluble form of glycoprotein (sGP) and the full length glycoprotein (GP), which are both encoded by the GP gene via RNA editing. In the region of the proteins encoded after the RNA editing site sGP was more conserved than the overlapping region of GP when compared to a distant outlier species, Tai Forest ebolavirus. Half of the amino acids comprising the “delta peptide”, a 40 amino acid carboxy-terminal fragment of sGP, were identical between otherwise widely divergent species. A lysine-rich amphipathic peptide motif was noted at the carboxyl terminus of delta peptide with high structural relatedness to the cytolytic peptide of the non-structural protein 4 (NSP4) of rotavirus. EBOV delta peptide is a candidate viroporin, a cationic pore-forming peptide, and may contribute to EBOV pathogenesis. PMID:25609303
Analysis of septins across kingdoms reveals orthology and new motifs.

PubMed

Pan, Fangfang; Malmberg, Russell L; Momany, Michelle

2007-07-01

Septins are cytoskeletal GTPase proteins first discovered in the fungus Saccharomyces cerevisiae where they organize the septum and link nuclear division with cell division. More recently septins have been found in animals where they are important in processes ranging from actin and microtubule organization to embryonic patterning and where defects in septins have been implicated in human disease. Previous studies suggested that many animal septins fell into independent evolutionary groups, confounding cross-kingdom comparison. In the current work, we identified 162 septins from fungi, microsporidia and animals and analyzed their phylogenetic relationships. There was support for five groups of septins with orthology between kingdoms. Group 1 (which includes S. cerevisiae Cdc10p and human Sept9) and Group 2 (which includes S. cerevisiae Cdc3p and human Sept7) contain sequences from fungi and animals. Group 3 (which includes S. cerevisiae Cdc11p) and Group 4 (which includes S. cerevisiae Cdc12p) contain sequences from fungi and microsporidia. Group 5 (which includes Aspergillus nidulans AspE) contains sequences from filamentous fungi. We suggest a modified nomenclature based on these phylogenetic relationships. Comparative sequence alignments revealed septin derivatives of already known G1, G3 and G4 GTPase motifs, four new motifs from two to twelve amino acids long and six conserved single amino acid positions. One of these new motifs is septin-specific and several are group specific. Our studies provide an evolutionary history for this important family of proteins and a framework and consistent nomenclature for comparison of septin orthologs across kingdoms.
Motivated Proteins: A web application for studying small three-dimensional protein motifs

PubMed Central

Leader, David P; Milner-White, E James

2009-01-01

Background Small loop-shaped motifs are common constituents of the three-dimensional structure of proteins. Typically they comprise between three and seven amino acid residues, and are defined by a combination of dihedral angles and hydrogen bonding partners. The most abundant of these are αβ-motifs, asx-motifs, asx-turns, β-bulges, β-bulge loops, β-turns, nests, niches, Schellmann loops, ST-motifs, ST-staples and ST-turns. We have constructed a database of such motifs from a range of high-quality protein structures and built a web application as a visual interface to this. Description The web application, Motivated Proteins, provides access to these 12 motifs (with 48 sub-categories) in a database of over 400 representative proteins. Queries can be made for specific categories or sub-categories of motif, motifs in the vicinity of ligands, motifs which include part of an enzyme active site, overlapping motifs, or motifs which include a particular amino acid sequence. Individual proteins can be specified, or, where appropriate, motifs for all proteins listed. The results of queries are presented in textual form as an (X)HTML table, and may be saved as parsable plain text or XML. Motifs can be viewed and manipulated either individually or in the context of the protein in the Jmol applet structural viewer. Cartoons of the motifs imposed on a linear representation of protein secondary structure are also provided. Summary information for the motifs is available, as are histograms of amino acid distribution, and graphs of dihedral angles at individual positions in the motifs. Conclusion Motivated Proteins is a publicly and freely accessible web application that enables protein scientists to study small three-dimensional motifs without requiring knowledge of either Structured Query Language or the underlying database schema. PMID:19210785
Informative priors based on transcription factor structural class improve de novo motif discovery.

PubMed

Narlikar, Leelavati; Gordân, Raluca; Ohler, Uwe; Hartemink, Alexander J

2006-07-15

An important problem in molecular biology is to identify the locations at which a transcription factor (TF) binds to DNA, given a set of DNA sequences believed to be bound by that TF. In previous work, we showed that information in the DNA sequence of a binding site is sufficient to predict the structural class of the TF that binds it. In particular, this suggests that we can predict which locations in any DNA sequence are more likely to be bound by certain classes of TFs than others. Here, we argue that traditional methods for de novo motif finding can be significantly improved by adopting an informative prior probability that a TF binding site occurs at each sequence location. To demonstrate the utility of such an approach, we present priority, a powerful new de novo motif finding algorithm. Using data from TRANSFAC, we train three classifiers to recognize binding sites of basic leucine zipper, forkhead, and basic helix loop helix TFs. These classifiers are used to equip priority with three class-specific priors, in addition to a default prior to handle TFs of other classes. We apply priority and a number of popular motif finding programs to sets of yeast intergenic regions that are reported by ChIP-chip to be bound by particular TFs. priority identifies motifs the other methods fail to identify, and correctly predicts the structural class of the TF recognizing the identified binding sites. Supplementary material and code can be found at http://www.cs.duke.edu/~amink/.
info-gibbs: a motif discovery algorithm that directly optimizes information content during sampling.

PubMed

Defrance, Matthieu; van Helden, Jacques

2009-10-15

Discovering cis-regulatory elements in genome sequence remains a challenging issue. Several methods rely on the optimization of some target scoring function. The information content (IC) or relative entropy of the motif has proven to be a good estimator of transcription factor DNA binding affinity. However, these information-based metrics are usually used as a posteriori statistics rather than during the motif search process itself. We introduce here info-gibbs, a Gibbs sampling algorithm that efficiently optimizes the IC or the log-likelihood ratio (LLR) of the motif while keeping computation time low. The method compares well with existing methods like MEME, BioProspector, Gibbs or GAME on both synthetic and biological datasets. Our study shows that motif discovery techniques can be enhanced by directly focusing the search on the motif IC or the motif LLR. http://rsat.ulb.ac.be/rsat/info-gibbs
Cations Form Sequence Selective Motifs within DNA Grooves via a Combination of Cation-Pi and Ion-Dipole/Hydrogen Bond Interactions

PubMed Central

Stewart, Mikaela; Dunlap, Tori; Dourlain, Elizabeth; Grant, Bryce; McFail-Isom, Lori

2013-01-01

The fine conformational subtleties of DNA structure modulate many fundamental cellular processes including gene activation/repression, cellular division, and DNA repair. Most of these cellular processes rely on the conformational heterogeneity of specific DNA sequences. Factors including those structural characteristics inherent in the particular base sequence as well as those induced through interaction with solvent components combine to produce fine DNA structural variation including helical flexibility and conformation. Cation-pi interactions between solvent cations or their first hydration shell waters and the faces of DNA bases form sequence selectively and contribute to DNA structural heterogeneity. In this paper, we detect and characterize the binding patterns found in cation-pi interactions between solvent cations and DNA bases in a set of high resolution x-ray crystal structures. Specifically, we found that monovalent cations (Tl+) and the polarized first hydration shell waters of divalent cations (Mg2+, Ca2+) form cation-pi interactions with DNA bases stabilizing unstacked conformations. When these cation-pi interactions are combined with electrostatic interactions a pattern of specific binding motifs is formed within the grooves. PMID:23940752
Cations form sequence selective motifs within DNA grooves via a combination of cation-pi and ion-dipole/hydrogen bond interactions.

PubMed

Stewart, Mikaela; Dunlap, Tori; Dourlain, Elizabeth; Grant, Bryce; McFail-Isom, Lori

2013-01-01

The fine conformational subtleties of DNA structure modulate many fundamental cellular processes including gene activation/repression, cellular division, and DNA repair. Most of these cellular processes rely on the conformational heterogeneity of specific DNA sequences. Factors including those structural characteristics inherent in the particular base sequence as well as those induced through interaction with solvent components combine to produce fine DNA structural variation including helical flexibility and conformation. Cation-pi interactions between solvent cations or their first hydration shell waters and the faces of DNA bases form sequence selectively and contribute to DNA structural heterogeneity. In this paper, we detect and characterize the binding patterns found in cation-pi interactions between solvent cations and DNA bases in a set of high resolution x-ray crystal structures. Specifically, we found that monovalent cations (Tl⁺) and the polarized first hydration shell waters of divalent cations (Mg²⁺, Ca²⁺) form cation-pi interactions with DNA bases stabilizing unstacked conformations. When these cation-pi interactions are combined with electrostatic interactions a pattern of specific binding motifs is formed within the grooves.
Prediction of host - pathogen protein interactions between Mycobacterium tuberculosis and Homo sapiens using sequence motifs.

PubMed

Huo, Tong; Liu, Wei; Guo, Yu; Yang, Cheng; Lin, Jianping; Rao, Zihe

2015-03-26

Emergence of multiple drug resistant strains of M. tuberculosis (MDR-TB) threatens to derail global efforts aimed at reigning in the pathogen. Co-infections of M. tuberculosis with HIV are difficult to treat. To counter these new challenges, it is essential to study the interactions between M. tuberculosis and the host to learn how these bacteria cause disease. We report a systematic flow to predict the host pathogen interactions (HPIs) between M. tuberculosis and Homo sapiens based on sequence motifs. First, protein sequences were used as initial input for identifying the HPIs by 'interolog' method. HPIs were further filtered by prediction of domain-domain interactions (DDIs). Functional annotations of protein and publicly available experimental results were applied to filter the remaining HPIs. Using such a strategy, 118 pairs of HPIs were identified, which involve 43 proteins from M. tuberculosis and 48 proteins from Homo sapiens. A biological interaction network between M. tuberculosis and Homo sapiens was then constructed using the predicted inter- and intra-species interactions based on the 118 pairs of HPIs. Finally, a web accessible database named PATH (Protein interactions of M. tuberculosis and Human) was constructed to store these predicted interactions and proteins. This interaction network will facilitate the research on host-pathogen protein-protein interactions, and may throw light on how M. tuberculosis interacts with its host.
The LAM-PCR Method to Sequence LV Integration Sites.

PubMed

Wang, Wei; Bartholomae, Cynthia C; Gabriel, Richard; Deichmann, Annette; Schmidt, Manfred

2016-01-01

Integrating viral gene transfer vectors are commonly used gene delivery tools in clinical gene therapy trials providing stable integration and continuous gene expression of the transgene in the treated host cell. However, integration of the reverse-transcribed vector DNA into the host genome is a potentially mutagenic event that may directly contribute to unwanted side effects. A comprehensive and accurate analysis of the integration site (IS) repertoire is indispensable to study clonality in transduced cells obtained from patients undergoing gene therapy and to identify potential in vivo selection of affected cell clones. To date, next-generation sequencing (NGS) of vector-genome junctions allows sophisticated studies on the integration repertoire in vitro and in vivo. We have explored the use of the Illumina MiSeq Personal Sequencer platform to sequence vector ISs amplified by non-restrictive linear amplification-mediated PCR (nrLAM-PCR) and LAM-PCR. MiSeq-based high-quality IS sequence retrieval is accomplished by the introduction of a double-barcode strategy that substantially minimizes the frequency of IS sequence collisions compared to the conventionally used single-barcode protocol. Here, we present an updated protocol of (nr)LAM-PCR for the analysis of lentiviral IS using a double-barcode system and followed by deep sequencing using the MiSeq device.
RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections.

PubMed

Castro-Mondragon, Jaime Abraham; Jaeger, Sébastien; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

2017-07-27

Transcription factor (TF) databases contain multitudes of binding motifs (TFBMs) from various sources, from which non-redundant collections are derived by manual curation. The advent of high-throughput methods stimulated the production of novel collections with increasing numbers of motifs. Meta-databases, built by merging these collections, contain redundant versions, because available tools are not suited to automatically identify and explore biologically relevant clusters among thousands of motifs. Motif discovery from genome-scale data sets (e.g. ChIP-seq) also produces redundant motifs, hampering the interpretation of results. We present matrix-clustering, a versatile tool that clusters similar TFBMs into multiple trees, and automatically creates non-redundant TFBM collections. A feature unique to matrix-clustering is its dynamic visualisation of aligned TFBMs, and its capability to simultaneously treat multiple collections from various sources. We demonstrate that matrix-clustering considerably simplifies the interpretation of combined results from multiple motif discovery tools, and highlights biologically relevant variations of similar motifs. We also ran a large-scale application to cluster ∼11 000 motifs from 24 entire databases, showing that matrix-clustering correctly groups motifs belonging to the same TF families, and drastically reduced motif redundancy. matrix-clustering is integrated within the RSAT suite (http://rsat.eu/), accessible through a user-friendly web interface or command-line for its integration in pipelines. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Onco-Regulon: an integrated database and software suite for site specific targeting of transcription factors of cancer genes

PubMed Central

Tomar, Navneet; Mishra, Akhilesh; Mrinal, Nirotpal; Jayaram, B.

2016-01-01

Transcription factors (TFs) bind at multiple sites in the genome and regulate expression of many genes. Regulating TF binding in a gene specific manner remains a formidable challenge in drug discovery because the same binding motif may be present at multiple locations in the genome. Here, we present Onco-Regulon (http://www.scfbio-iitd.res.in/software/onco/NavSite/index.htm), an integrated database of regulatory motifs of cancer genes clubbed with Unique Sequence-Predictor (USP) a software suite that identifies unique sequences for each of these regulatory DNA motifs at the specified position in the genome. USP works by extending a given DNA motif, in 5′→3′, 3′ →5′ or both directions by adding one nucleotide at each step, and calculates the frequency of each extended motif in the genome by Frequency Counter programme. This step is iterated till the frequency of the extended motif becomes unity in the genome. Thus, for each given motif, we get three possible unique sequences. Closest Sequence Finder program predicts off-target drug binding in the genome. Inclusion of DNA-Protein structural information further makes Onco-Regulon a highly informative repository for gene specific drug development. We believe that Onco-Regulon will help researchers to design drugs which will bind to an exclusive site in the genome with no off-target effects, theoretically. Database URL: http://www.scfbio-iitd.res.in/software/onco/NavSite/index.htm PMID:27515825
Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion

PubMed Central

Thomsen, Martin Christen Frølund; Nielsen, Morten

2012-01-01

Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2logo aims at resolving these issues allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for low number of observations and different logotype representations each capturing different aspects related to amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed). PMID:22638583
Identification of sequence–structure RNA binding motifs for SELEX-derived aptamers

PubMed Central

Hoinka, Jan; Zotenko, Elena; Friedman, Adam; Sauna, Zuben E.; Przytycka, Teresa M.

2012-01-01

Motivation: Systematic Evolution of Ligands by EXponential Enrichment (SELEX) represents a state-of-the-art technology to isolate single-stranded (ribo)nucleic acid fragments, named aptamers, which bind to a molecule (or molecules) of interest via specific structural regions induced by their sequence-dependent fold. This powerful method has applications in designing protein inhibitors, molecular detection systems, therapeutic drugs and antibody replacement among others. However, full understanding and consequently optimal utilization of the process has lagged behind its wide application due to the lack of dedicated computational approaches. At the same time, the combination of SELEX with novel sequencing technologies is beginning to provide the data that will allow the examination of a variety of properties of the selection process. Results: To close this gap we developed, Aptamotif, a computational method for the identification of sequence–structure motifs in SELEX-derived aptamers. To increase the chances of identifying functional motifs, Aptamotif uses an ensemble-based approach. We validated the method using two published aptamer datasets containing experimentally determined motifs of increasing complexity. We were able to recreate the author's findings to a high degree, thus proving the capability of our approach to identify binding motifs in SELEX data. Additionally, using our new experimental dataset, we illustrate the application of Aptamotif to elucidate several properties of the selection process. Contact: przytyck@ncbi.nlm.nih.gov, Zuben.Sauna@fda.hhs.gov PMID:22689764
Evaluation and integration of existing methods for computational prediction of allergens.

PubMed

Wang, Jing; Yu, Yabin; Zhao, Yunan; Zhang, Dabing; Li, Jing

2013-01-01

Allergy involves a series of complex reactions and factors that contribute to the development of the disease and triggering of the symptoms, including rhinitis, asthma, atopic eczema, skin sensitivity, even acute and fatal anaphylactic shock. Prediction and evaluation of the potential allergenicity is of importance for safety evaluation of foods and other environment factors. Although several computational approaches for assessing the potential allergenicity of proteins have been developed, their performance and relative merits and shortcomings have not been compared systematically. To evaluate and improve the existing methods for allergen prediction, we collected an up-to-date definitive dataset consisting of 989 known allergens and massive putative non-allergens. The three most widely used allergen computational prediction approaches including sequence-, motif- and SVM-based (Support Vector Machine) methods were systematically compared using the defined parameters and we found that SVM-based method outperformed the other two methods with higher accuracy and specificity. The sequence-based method with the criteria defined by FAO/WHO (FAO: Food and Agriculture Organization of the United Nations; WHO: World Health Organization) has higher sensitivity of over 98%, but having a low specificity. The advantage of motif-based method is the ability to visualize the key motif within the allergen. Notably, the performances of the sequence-based method defined by FAO/WHO and motif eliciting strategy could be improved by the optimization of parameters. To facilitate the allergen prediction, we integrated these three methods in a web-based application proAP, which provides the global search of the known allergens and a powerful tool for allergen predication. Flexible parameter setting and batch prediction were also implemented. The proAP can be accessed at http://gmobl.sjtu.edu.cn/proAP/main.html. This study comprehensively evaluated sequence-, motif- and SVM
DoOPSearch: a web-based tool for finding and analysing common conserved motifs in the promoter regions of different chordate and plant genes

PubMed Central

Sebestyén, Endre; Nagy, Tibor; Suhai, Sándor; Barta, Endre

2009-01-01

Background The comparative genomic analysis of a large number of orthologous promoter regions of the chordate and plant genes from the DoOP databases shows thousands of conserved motifs. Most of these motifs differ from any known transcription factor binding site (TFBS). To identify common conserved motifs, we need a specific tool to be able to search amongst them. Since conserved motifs from the DoOP databases are linked to genes, the result of such a search can give a list of genes that are potentially regulated by the same transcription factor(s). Results We have developed a new tool called DoOPSearch for the analysis of the conserved motifs in the promoter regions of chordate or plant genes. We used the orthologous promoters of the DoOP database to extract thousands of conserved motifs from different taxonomic groups. The advantage of this approach is that different sets of conserved motifs might be found depending on how broad the taxonomic coverage of the underlying orthologous promoter sequence collection is (consider e.g. primates vs. mammals or Brassicaceae vs. Viridiplantae). The DoOPSearch tool allows the users to search these motif collections or the promoter regions of DoOP with user supplied query sequences or any of the conserved motifs from the DoOP database. To find overrepresented gene ontologies, the gene lists obtained can be analysed further using a modified version of the GeneMerge program. Conclusion We present here a comparative genomics based promoter analysis tool. Our system is based on a unique collection of conserved promoter motifs characteristic of different taxonomic groups. We offer both a command line and a web-based tool for searching in these motif collections using user specified queries. These can be either short promoter sequences or consensus sequences of known transcription factor binding sites. The GeneMerge analysis of the search results allows the user to identify statistically overrepresented Gene Ontology terms that
Multiple TPR motifs characterize the Fanconi anemia FANCG protein.

PubMed

Blom, Eric; van de Vrugt, Henri J; de Vries, Yne; de Winter, Johan P; Arwert, Fré; Joenje, Hans

2004-01-05

The genome protection pathway that is defective in patients with Fanconi anemia (FA) is controlled by at least eight genes, including BRCA2. A key step in the pathway involves the monoubiquitylation of FANCD2, which critically depends on a multi-subunit nuclear 'core complex' of at least six FANC proteins (FANCA, -C, -E, -F, -G, and -L). Except for FANCL, which has WD40 repeats and a RING finger domain, no significant domain structure has so far been recognized in any of the core complex proteins. By using a homology search strategy comparing the human FANCG protein sequence with its ortholog sequences in Oryzias latipes (Japanese rice fish) and Danio rerio (zebrafish) we identified at least seven tetratricopeptide repeat motifs (TPRs) covering a major part of this protein. TPRs are degenerate 34-amino acid repeat motifs which function as scaffolds mediating protein-protein interactions, often found in multiprotein complexes. In four out of five TPR motifs tested (TPR1, -2, -5, and -6), targeted missense mutagenesis disrupting the motifs at the critical position 8 of each TPR caused complete or partial loss of FANCG function. Loss of function was evident from failure of the mutant proteins to complement the cellular FA phenotype in FA-G lymphoblasts, which was correlated with loss of binding to FANCA. Although the TPR4 mutant fully complemented the cells, it showed a reduced interaction with FANCA, suggesting that this TPR may also be of functional importance. The recognition of FANCG as a typical TPR protein predicts this protein to play a key role in the assembly and/or stabilization of the nuclear FA protein core complex.
Integration of retinal image sequences

NASA Astrophysics Data System (ADS)

Ballerini, Lucia

1998-10-01

In this paper a method for noise reduction in ocular fundus image sequences is described. The eye is the only part of the human body where the capillary network can be observed along with the arterial and venous circulation using a non invasive technique. The study of the retinal vessels is very important both for the study of the local pathology (retinal disease) and for the large amount of information it offers on systematic haemodynamics, such as hypertension, arteriosclerosis, and diabetes. In this paper a method for image integration of ocular fundus image sequences is described. The procedure can be divided in two step: registration and fusion. First we describe an automatic alignment algorithm for registration of ocular fundus images. In order to enhance vessel structures, we used a spatially oriented bank of filters designed to match the properties of the objects of interest. To evaluate interframe misalignment we adopted a fast cross-correlation algorithm. The performances of the alignment method have been estimated by simulating shifts between image pairs and by using a cross-validation approach. Then we propose a temporal integration technique of image sequences so as to compute enhanced pictures of the overall capillary network. Image registration is combined with image enhancement by fusing subsequent frames of a same region. To evaluate the attainable results, the signal-to-noise ratio was estimated before and after integration. Experimental results on synthetic images of vessel-like structures with different kind of Gaussian additive noise as well as on real fundus images are reported.

Motifs, modules and games in bacteria.

PubMed

Wolf, Denise M; Arkin, Adam P

2003-04-01

Global explorations of regulatory network dynamics, organization and evolution have become tractable thanks to high-throughput sequencing and molecular measurement of bacterial physiology. From these, a nascent conceptual framework is developing, that views the principles of regulation in term of motifs, modules and games. Motifs are small, repeated, and conserved biological units ranging from molecular domains to small reaction networks. They are arranged into functional modules, genetically dissectible cellular functions such as the cell cycle, or different stress responses. The dynamical functioning of modules defines the organism's strategy to survive in a game, pitting cell against cell, and cell against environment. Placing pathway structure and dynamics into an evolutionary context begins to allow discrimination between those physical and molecular features that particularize a species to its surroundings, and those that provide core physiological function. This approach promises to generate a higher level understanding of cellular design, pathway evolution and cellular bioengineering.
[Conserved motifs in voltage sensing proteins].

PubMed

Wang, Chang-He; Xie, Zhen-Li; Lv, Jian-Wei; Yu, Zhi-Dan; Shao, Shu-Li

2012-08-25

This paper was aimed to study conserved motifs of voltage sensing proteins (VSPs) and establish a voltage sensing model. All VSPs were collected from the Uniprot database using a comprehensive keyword search followed by manual curation, and the results indicated that there are only two types of known VSPs, voltage gated ion channels and voltage dependent phosphatases. All the VSPs have a common domain of four helical transmembrane segments (TMS, S1-S4), which constitute the voltage sensing module of the VSPs. The S1 segment was shown to be responsible for membrane targeting and insertion of these proteins, while S2-S4 segments, which can sense membrane potential, for protein properties. Conserved motifs/residues and their functional significance of each TMS were identified using profile-to-profile sequence alignments. Conserved motifs in these four segments are strikingly similar for all VSPs, especially, the conserved motif [RK]-X(2)-R-X(2)-R-X(2)-[RK] was presented in all the S4 segments, with positively charged arginine (R) alternating with two hydrophobic or uncharged residues. Movement of these arginines across the membrane electric field is the core mechanism by which the VSPs detect changes in membrane potential. The negatively charged aspartate (D) in the S3 segment is universally conserved in all the VSPs, suggesting that the aspartate residue may be involved in voltage sensing properties of VSPs as well as the electrostatic interactions with the positively charged residues in the S4 segment, which may enhance the thermodynamic stability of the S4 segments in plasma membrane.
RSAT 2015: Regulatory Sequence Analysis Tools

PubMed Central

Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A.; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M.; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

2015-01-01

RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632
SLIDER: a generic metaheuristic for the discovery of correlated motifs in protein-protein interaction networks.

PubMed

Boyen, Peter; Van Dyck, Dries; Neven, Frank; van Ham, Roeland C H J; van Dijk, Aalt D J

2011-01-01

Correlated motif mining (cmm) is the problem of finding overrepresented pairs of patterns, called motifs, in sequences of interacting proteins. Algorithmic solutions for cmm thereby provide a computational method for predicting binding sites for protein interaction. In this paper, we adopt a motif-driven approach where the support of candidate motif pairs is evaluated in the network. We experimentally establish the superiority of the Chi-square-based support measure over other support measures. Furthermore, we obtain that cmm is an np-hard problem for a large class of support measures (including Chi-square) and reformulate the search for correlated motifs as a combinatorial optimization problem. We then present the generic metaheuristic slider which uses steepest ascent with a neighborhood function based on sliding motifs and employs the Chi-square-based support measure. We show that slider outperforms existing motif-driven cmm methods and scales to large protein-protein interaction networks. The slider-implementation and the data used in the experiments are available on http://bioinformatics.uhasselt.be.
Cloud-based MOTIFSIM: Detecting Similarity in Large DNA Motif Data Sets.

PubMed

Tran, Ngoc Tam L; Huang, Chun-Hsi

2017-05-01

We developed the cloud-based MOTIFSIM on Amazon Web Services (AWS) cloud. The tool is an extended version from our web-based tool version 2.0, which was developed based on a novel algorithm for detecting similarity in multiple DNA motif data sets. This cloud-based version further allows researchers to exploit the computing resources available from AWS to detect similarity in multiple large-scale DNA motif data sets resulting from the next-generation sequencing technology. The tool is highly scalable with expandable AWS.
A putative N-terminal nuclear export sequence is sufficient for Mps1 nuclear exclusion during interphase.

PubMed

Jia, Haiwei; Zhang, Xiaojuan; Wang, Wenjun; Bai, Yuanyuan; Ling, Youguo; Cao, Cheng; Ma, Runlin Z; Zhong, Hui; Wang, Xue; Xu, Quanbin

2015-02-27

Mps1, an essential component of the mitotic checkpoint, is also an important interphase regulator and has roles in DNA damage response, cytokinesis and centrosome duplication. Mps1 predominantly resides in the cytoplasm and relocates into the nucleus at the late G2 phase. So far, the mechanism underlying the Mps1 translocation between the cytoplasm and nucleus has been unclear. In this work, a dynamic export process of Mps1 from the nucleus to cytoplasm in interphase was revealed- a process blocked by the Crm1 inhibitor, Leptomycin B, suggesting that export of Mps1 is Crm1 dependent. Consistent with this speculation, a direct association between Mps1 and Crm1 was found. Furthermore, a putative nuclear export sequence (pNES) motif at the N-terminal of Mps1 was identified by analyzing the motif of Mps1. This motif shows a high sequence similarity to the classic NES, a fusion of this motif with EGFP results in dramatic exclusion of the fusion protein from the nucleus. Additionally, Mps1 mutant loss of pNES integrity was shown by replacing leucine with alanine which produced a diffused subcellular distribution, compared to the wild type protein which resides predominantly in cytoplasm. Taken these findings together, it was concluded that the pNES sequence is sufficient for the Mps1 export from nucleus during interphase.
A computational approach to candidate gene prioritization for X-linked mental retardation using annotation-based binary filtering and motif-based linear discriminatory analysis

PubMed Central

2011-01-01

Background Several computational candidate gene selection and prioritization methods have recently been developed. These in silico selection and prioritization techniques are usually based on two central approaches - the examination of similarities to known disease genes and/or the evaluation of functional annotation of genes. Each of these approaches has its own caveats. Here we employ a previously described method of candidate gene prioritization based mainly on gene annotation, in accompaniment with a technique based on the evaluation of pertinent sequence motifs or signatures, in an attempt to refine the gene prioritization approach. We apply this approach to X-linked mental retardation (XLMR), a group of heterogeneous disorders for which some of the underlying genetics is known. Results The gene annotation-based binary filtering method yielded a ranked list of putative XLMR candidate genes with good plausibility of being associated with the development of mental retardation. In parallel, a motif finding approach based on linear discriminatory analysis (LDA) was employed to identify short sequence patterns that may discriminate XLMR from non-XLMR genes. High rates (>80%) of correct classification was achieved, suggesting that the identification of these motifs effectively captures genomic signals associated with XLMR vs. non-XLMR genes. The computational tools developed for the motif-based LDA is integrated into the freely available genomic analysis portal Galaxy (http://main.g2.bx.psu.edu/). Nine genes (APLN, ZC4H2, MAGED4, MAGED4B, RAP2C, FAM156A, FAM156B, TBL1X, and UXT) were highlighted as highly-ranked XLMR methods. Conclusions The combination of gene annotation information and sequence motif-orientated computational candidate gene prediction methods highlight an added benefit in generating a list of plausible candidate genes, as has been demonstrated for XLMR. Reviewers: This article was reviewed by Dr Barbara Bardoni (nominated by Prof Juergen Brosius
Discovery and validation of information theory-based transcription factor and cofactor binding site motifs.

PubMed

Lu, Ruipeng; Mucaki, Eliseos J; Rogan, Peter K

2017-03-17

Data from ChIP-seq experiments can derive the genome-wide binding specificities of transcription factors (TFs) and other regulatory proteins. We analyzed 765 ENCODE ChIP-seq peak datasets of 207 human TFs with a novel motif discovery pipeline based on recursive, thresholded entropy minimization. This approach, while obviating the need to compensate for skewed nucleotide composition, distinguishes true binding motifs from noise, quantifies the strengths of individual binding sites based on computed affinity and detects adjacent cofactor binding sites that coordinate with the targets of primary, immunoprecipitated TFs. We obtained contiguous and bipartite information theory-based position weight matrices (iPWMs) for 93 sequence-specific TFs, discovered 23 cofactor motifs for 127 TFs and revealed six high-confidence novel motifs. The reliability and accuracy of these iPWMs were determined via four independent validation methods, including the detection of experimentally proven binding sites, explanation of effects of characterized SNPs, comparison with previously published motifs and statistical analyses. We also predict previously unreported TF coregulatory interactions (e.g. TF complexes). These iPWMs constitute a powerful tool for predicting the effects of sequence variants in known binding sites, performing mutation analysis on regulatory SNPs and predicting previously unrecognized binding sites and target genes. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Mass spectrometry-based protein identification by integrating de novo sequencing with database searching.

PubMed

Wang, Penghao; Wilson, Susan R

2013-01-01

Mass spectrometry-based protein identification is a very challenging task. The main identification approaches include de novo sequencing and database searching. Both approaches have shortcomings, so an integrative approach has been developed. The integrative approach firstly infers partial peptide sequences, known as tags, directly from tandem spectra through de novo sequencing, and then puts these sequences into a database search to see if a close peptide match can be found. However the current implementation of this integrative approach has several limitations. Firstly, simplistic de novo sequencing is applied and only very short sequence tags are used. Secondly, most integrative methods apply an algorithm similar to BLAST to search for exact sequence matches and do not accommodate sequence errors well. Thirdly, by applying these methods the integrated de novo sequencing makes a limited contribution to the scoring model which is still largely based on database searching. We have developed a new integrative protein identification method which can integrate de novo sequencing more efficiently into database searching. Evaluated on large real datasets, our method outperforms popular identification methods.
Integration of Temporal and Ordinal Information During Serial Interception Sequence Learning

PubMed Central

Gobel, Eric W.; Sanchez, Daniel J.; Reber, Paul J.

2011-01-01

The expression of expert motor skills typically involves learning to perform a precisely timed sequence of movements (e.g., language production, music performance, athletic skills). Research examining incidental sequence learning has previously relied on a perceptually-cued task that gives participants exposure to repeating motor sequences but does not require timing of responses for accuracy. Using a novel perceptual-motor sequence learning task, learning a precisely timed cued sequence of motor actions is shown to occur without explicit instruction. Participants learned a repeating sequence through practice and showed sequence-specific knowledge via a performance decrement when switched to an unfamiliar sequence. In a second experiment, the integration of representation of action order and timing sequence knowledge was examined. When either action order or timing sequence information was selectively disrupted, performance was reduced to levels similar to completely novel sequences. Unlike prior sequence-learning research that has found timing information to be secondary to learning action sequences, when the task demands require accurate action and timing information, an integrated representation of these types of information is acquired. These results provide the first evidence for incidental learning of fully integrated action and timing sequence information in the absence of an independent representation of action order, and suggest that this integrative mechanism may play a material role in the acquisition of complex motor skills. PMID:21417511
A Motif in the Clathrin Heavy Chain Required for the Hsc70/Auxilin Uncoating Reaction

PubMed Central

Rapoport, Iris; Boll, Werner; Yu, Anan; Böcking, Till

2008-01-01

The 70-kDa heat-shock cognate protein (Hsc70) chaperone is an ATP-dependent “disassembly enzyme” for many subcellular structures, including clathrin-coated vesicles where it functions as an uncoating ATPase. Hsc70, and its cochaperone auxilin together catalyze coat disassembly. Like other members of the Hsp70 chaperone family, it is thought that ATP-bound Hsc70 recognizes the clathrin triskelion through an unfolded exposed hydrophobic segment. The best candidate is the unstructured C terminus (residues 1631–1675) of the heavy chain at the foot of the tripod below the hub, containing the sequence motif QLMLT, closely related to the sequence bound preferentially by the substrate groove of Hsc70 (Fotin et al., 2004b). To test this hypothesis, we generated in insect cells recombinant mammalian triskelions that in vitro form clathrin cages and clathrin/AP-2 coats exactly like those assembled from native clathrin. We show that coats assembled from recombinant clathrin are good substrates for ATP- and auxilin-dependent, Hsc70-catalyzed uncoating. Finally, we show that this uncoating reaction proceeds normally when the coats contain recombinant heavy chains truncated C-terminal to the QLMLT motif, but very inefficiently when the motif is absent. Thus, the QLMLT motif is required for Hsc-70–facilitated uncoating, consistent with the proposal that this sequence is a specific target of the chaperone. PMID:17978091
Salt-bridging effects on short amphiphilic helical structure and introducing sequence-based short beta-turn motifs.

PubMed

Guarracino, Danielle A; Gentile, Kayla; Grossman, Alec; Li, Evan; Refai, Nader; Mohnot, Joy; King, Daniel

2018-02-01

Determining the minimal sequence necessary to induce protein folding is beneficial in understanding the role of protein-protein interactions in biological systems, as their three-dimensional structures often dictate their activity. Proteins are generally comprised of discrete secondary structures, from α-helices to β-turns and larger β-sheets, each of which is influenced by its primary structure. Manipulating the sequence of short, moderately helical peptides can help elucidate the influences on folding. We created two new scaffolds based on a modestly helical eight-residue peptide, PT3, we previously published. Using circular dichroism (CD) spectroscopy and changing the possible salt-bridging residues to new combinations of Lys, Arg, Glu, and Asp, we found that our most helical improvements came from the Arg-Glu combination, whereas the Lys-Asp was not significantly different from the Lys-Glu of the parent scaffold, PT3. The marked 3 10 -helical contributions in PT3 were lessened in the Arg-Glu-containing peptide with the beginning of cooperative unfolding seen through a thermal denaturation. However, a unique and unexpected signature was seen for the denaturation of the Lys-Asp peptide which could help elucidate the stages of folding between the 3 10 and α-helix. In addition, we developed a short six-residue peptide with β-turn/sheet CD signature, again to help study minimal sequences needed for folding. Overall, the results indicate that improvements made to short peptide scaffolds by fine-tuning the salt-bridging residues can enhance scaffold structure. Likewise, with the results from the new, short β-turn motif, these can help impact future peptidomimetic designs in creating biologically useful, short, structured β-sheet-forming peptides.
Dimeric PROP1 binding to diverse palindromic TAAT sequences promotes its transcriptional activity.

PubMed

Nakayama, Michie; Kato, Takako; Susa, Takao; Sano, Akiko; Kitahara, Kousuke; Kato, Yukio

2009-08-13

Mutations in the Prop1 gene are responsible for murine Ames dwarfism and human combined pituitary hormone deficiency with hypogonadism. Recently, we reported that PROP1 is a possible transcription factor for gonadotropin subunit genes through plural cis-acting sites composed of AT-rich sequences containing a TAAT motif which differs from its consensus binding sequence known as PRDQ9 (TAATTGAATTA). This study aimed to verify the binding specificity and sequence of PROP1 by applying the method of SELEX (Systematic Evolution of Ligands by EXponential enrichment), EMSA (electrophoretic mobility shift assay) and transient transfection assay. SELEX, after 5, 7 and 9 generations of selection using a random sequence library, showed that nucleotides containing one or two TAAT motifs were accumulated and accounted for 98.5% at the 9th generation. Aligned sequences and EMSA demonstrated that PROP1 binds preferentially to 11 nucleotides composed of an inverted TAAT motif separated by 3 nucleotides with variation in the half site of palindromic TAAT motifs and with preferential requirement of T at the nucleotide number 5 immediately 3' to a TAAT motif. Transient transfection assay demonstrated first that dimeric binding of PROP1 to an inverted TAAT motif and its cognates resulted in transcriptional activation, whereas monomeric binding of PROP1 to a single TAAT motif and an inverted ATTA motif did not mediate activation. Thus, this study demonstrated that dimeric binding of PROP1 is able to recognize diverse palindromic TAAT sequences separated by 3 nucleotides and to exhibit its transcriptional activity.
Motifs, modules and games in bacteria

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wolf, Denise M.; Arkin, Adam P.

2003-04-01

Global explorations of regulatory network dynamics, organization and evolution have become tractable thanks to high-throughput sequencing and molecular measurement of bacterial physiology. From these, a nascent conceptual framework is developing, that views the principles of regulation in term of motifs, modules and games. Motifs are small, repeated, and conserved biological units ranging from molecular domains to small reaction networks. They are arranged into functional modules, genetically dissectible cellular functions such as the cell cycle, or different stress responses. The dynamical functioning of modules defines the organism's strategy to survive in a game, pitting cell against cell, and cell against environment.more » Placing pathway structure and dynamics into an evolutionary context begins to allow discrimination between those physical and molecular features that particularize a species to its surroundings, and those that provide core physiological function. This approach promises to generate a higher level understanding of cellular design, pathway evolution and cellular bioengineering.« less
BEAM web server: a tool for structural RNA motif discovery.

PubMed

Pietrosanto, Marco; Adinolfi, Marta; Casula, Riccardo; Ausiello, Gabriele; Ferrè, Fabrizio; Helmer-Citterich, Manuela

2018-03-15

RNA structural motif finding is a relevant problem that becomes computationally hard when working on high-throughput data (e.g. eCLIP, PAR-CLIP), often represented by thousands of RNA molecules. Currently, the BEAM server is the only web tool capable to handle tens of thousands of RNA in input with a motif discovery procedure that is only limited by the current secondary structure prediction accuracies. The recently developed method BEAM (BEAr Motifs finder) can analyze tens of thousands of RNA molecules and identify RNA secondary structure motifs associated to a measure of their statistical significance. BEAM is extremely fast thanks to the BEAR encoding that transforms each RNA secondary structure in a string of characters. BEAM also exploits the evolutionary knowledge contained in a substitution matrix of secondary structure elements, extracted from the RFAM database of families of homologous RNAs. The BEAM web server has been designed to streamline data pre-processing by automatically handling folding and encoding of RNA sequences, giving users a choice for the preferred folding program. The server provides an intuitive and informative results page with the list of secondary structure motifs identified, the logo of each motif, its significance, graphic representation and information about its position in the RNA molecules sharing it. The web server is freely available at http://beam.uniroma2.it/ and it is implemented in NodeJS and Python with all major browsers supported. marco.pietrosanto@uniroma2.it. Supplementary data are available at Bioinformatics online.
GibbsCluster: unsupervised clustering and alignment of peptide sequences.

PubMed

Andreatta, Massimo; Alvarez, Bruno; Nielsen, Morten

2017-07-03

Receptor interactions with short linear peptide fragments (ligands) are at the base of many biological signaling processes. Conserved and information-rich amino acid patterns, commonly called sequence motifs, shape and regulate these interactions. Because of the properties of a receptor-ligand system or of the assay used to interrogate it, experimental data often contain multiple sequence motifs. GibbsCluster is a powerful tool for unsupervised motif discovery because it can simultaneously cluster and align peptide data. The GibbsCluster 2.0 presented here is an improved version incorporating insertion and deletions accounting for variations in motif length in the peptide input. In basic terms, the program takes as input a set of peptide sequences and clusters them into meaningful groups. It returns the optimal number of clusters it identified, together with the sequence alignment and sequence motif characterizing each cluster. Several parameters are available to customize cluster analysis, including adjustable penalties for small clusters and overlapping groups and a trash cluster to remove outliers. As an example application, we used the server to deconvolute multiple specificities in large-scale peptidome data generated by mass spectrometry. The server is available at http://www.cbs.dtu.dk/services/GibbsCluster-2.0. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Promzea: a pipeline for discovery of co-regulatory motifs in maize and other plant species and its application to the anthocyanin and phlobaphene biosynthetic pathways and the Maize Development Atlas.

PubMed

Liseron-Monfils, Christophe; Lewis, Tim; Ashlock, Daniel; McNicholas, Paul D; Fauteux, François; Strömvik, Martina; Raizada, Manish N

2013-03-15

The discovery of genetic networks and cis-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (Zea mays L.) has facilitated in silico searches for regulatory motifs. Several algorithms exist to predict cis-acting elements, but none have been adapted for maize. A benchmark data set was used to evaluate the accuracy of three motif discovery programs: BioProspector, Weeder and MEME. Analysis showed that each motif discovery tool had limited accuracy and appeared to retrieve a distinct set of motifs. Therefore, using the benchmark, statistical filters were optimized to reduce the false discovery ratio, and then remaining motifs from all programs were combined to improve motif prediction. These principles were integrated into a user-friendly pipeline for motif discovery in maize called Promzea, available at http://www.promzea.org and on the Discovery Environment of the iPlant Collaborative website. Promzea was subsequently expanded to include rice and Arabidopsis. Within Promzea, a user enters cDNA sequences or gene IDs; corresponding upstream sequences are retrieved from the maize genome. Predicted motifs are filtered, combined and ranked. Promzea searches the chosen plant genome for genes containing each candidate motif, providing the user with the gene list and corresponding gene annotations. Promzea was validated in silico using a benchmark data set: the Promzea pipeline showed a 22% increase in nucleotide sensitivity compared to the best standalone program tool, Weeder, with equivalent nucleotide specificity. Promzea was also validated by its ability to retrieve the experimentally defined binding sites of transcription factors that regulate the maize anthocyanin and phlobaphene biosynthetic pathways. Promzea predicted additional promoter motifs, and genome-wide motif searches by Promzea identified 127 non-anthocyanin/phlobaphene genes that each contained all five predicted promoter
A common structural motif in immunopotentiating peptides with sequences present in human autoantigens. Elicitation of a response mediated by monocytes and Th1 cells.

PubMed

López-Moratalla, N; Ruíz, E; López-Zabalza, M J; Santiago, E

1996-12-16

We have found a common structural motif in human autoantigens, heat shock proteins and viral proteins. Peptides modelled after sequences present in those molecules were synthesized and immunomodulating properties tested. They share a core of 15 amino acid residues and a common pattern ('2-6-11' motif) characterized by requirements at fixed positions with respect to a Pro (position 6); an apolar residue or a Lys at position 2; and a Glu, Asp or Lys at position 11. Any of these peptides, when added to cultures of lymphomononuclear cells, caused the activation of monocytes manifested by a release of IL-1 alpha, IL-1 beta and TNF alpha. A release of INF gamma and IL-2 took also place; this release was abolished by anti-DR antibodies. Neither IL-4 nor IL-5 could be detected. This suggests a presentation by APCs and the appearance of cells with a Th1 phenotype. Monocytes and Th1 cells freshly obtained from 12 patients of Graves' disease, 8 of Hashimoto's disease and 8 of primary biliary cirrhosis exhibited activation features similar to those found in cells from healthy subjects incubated in the presence of peptides with a "2-6-11' motif and representing fragments of autoantigens. Their immunopotentiating properties suggest their involvement in the initiation or progression of the autoimmune response mediated by activated monocytes and Th1 cells.
Overlapping activation-induced cytidine deaminase hotspot motifs in Ig class-switch recombination

PubMed Central

Han, Li; Masani, Shahnaz; Yu, Kefei

2011-01-01

Ig class-switch recombination (CSR) is directed by the long and repetitive switch regions and requires activation-induced cytidine deaminase (AID). One of the conserved switch-region sequence motifs (AGCT) is a preferred site for AID-mediated DNA-cytosine deamination. By using somatic gene targeting and recombinase-mediated cassette exchange, we established a cell line-based CSR assay that allows manipulation of switch sequences at the endogenous locus. We show that AGCT is only one of a family of four WGCW motifs in the switch region that can facilitate CSR. We go on to show that it is the overlap of AID hotspots at WGCW sites on the top and bottom strands that is critical. This finding leads to a much clearer model for the difference between CSR and somatic hypermutation. PMID:21709240
RSAT 2015: Regulatory Sequence Analysis Tools.

PubMed

Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

2015-07-01

RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

Sequences characterization of microsatellite DNA sequences in Pacific abalone ( Haliotis discus hannai)

NASA Astrophysics Data System (ADS)

Li, Qi; Akihiro, Kijima

2007-01-01

The microsatellite-enriched library was constructed using magnetic bead hybridization selection method, and the microsatellite DNA sequences were analyzed in Pacific abalone Haliotis discus hannai. Three hundred and fifty white colonies were screened using PCR-based technique, and 84 clones were identified to potentially contain microsatellite repeat motif. The 84 clones were sequenced, and 42 microsatellites and 4 minisatellites with a minimum of five repeats were found (13.1% of white colonies screened). Besides the motif of CA contained in the oligoprobe, we also found other 16 types of microsatellite repeats including a dinucleotide repeat, two tetranucleotide repeats, twelve pentanucleotide repeats and a hexanucleotide repeat. According to Weber (1990), the microsatellite sequences obtained could be categorized structurally into perfect repeats (73.3%), imperfect repeats (13.3%), and compound repeats (13.4%). Among the microsatellite repeats, relatively short arrays (<20 repeats) were most abundant, accounting for 75.0%. The largest length of microsatellites was 48 repeats, and the average number of repeats was 13.4. The data on the composition and length distribution of microsatellites obtained in the present study can be useful for choosing the repeat motifs for microsatellite isolation in other abalone species.
Reversible conformational switching of i-motif DNA studied by fluorescence spectroscopy.

PubMed

Choi, Jungkweon; Majima, Tetsuro

2013-01-01

Non-B DNAs, which can form unique structures other than double helix of B-DNA, have attracted considerable attention from scientists in various fields including biology, chemistry and physics etc. Among them, i-motif DNA, which is formed from cytosine (C)-rich sequences found in telomeric DNA and the promoter region of oncogenes, has been extensively investigated as a signpost and controller for the oncogene expression at the transcription level and as a promising material in nanotechnology. Fluorescence techniques such as fluorescence resonance energy transfer (FRET) and the fluorescence quenching are important for studying DNA and in particular for the visualization of reversible conformational switching of i-motif DNA that is triggered by the protonation. Here, we review the latest studies on the conformational dynamics of i-motif DNA as well as the application of FRET and fluorescence quenching techniques to the visualization of reversible conformational switching of i-motif DNA in nano-biotechnology. © 2013 Wiley Periodicals, Inc. Photochemistry and Photobiology © 2013 The American Society of Photobiology.
Evaluation and integration of existing methods for computational prediction of allergens

PubMed Central

2013-01-01

Background Allergy involves a series of complex reactions and factors that contribute to the development of the disease and triggering of the symptoms, including rhinitis, asthma, atopic eczema, skin sensitivity, even acute and fatal anaphylactic shock. Prediction and evaluation of the potential allergenicity is of importance for safety evaluation of foods and other environment factors. Although several computational approaches for assessing the potential allergenicity of proteins have been developed, their performance and relative merits and shortcomings have not been compared systematically. Results To evaluate and improve the existing methods for allergen prediction, we collected an up-to-date definitive dataset consisting of 989 known allergens and massive putative non-allergens. The three most widely used allergen computational prediction approaches including sequence-, motif- and SVM-based (Support Vector Machine) methods were systematically compared using the defined parameters and we found that SVM-based method outperformed the other two methods with higher accuracy and specificity. The sequence-based method with the criteria defined by FAO/WHO (FAO: Food and Agriculture Organization of the United Nations; WHO: World Health Organization) has higher sensitivity of over 98%, but having a low specificity. The advantage of motif-based method is the ability to visualize the key motif within the allergen. Notably, the performances of the sequence-based method defined by FAO/WHO and motif eliciting strategy could be improved by the optimization of parameters. To facilitate the allergen prediction, we integrated these three methods in a web-based application proAP, which provides the global search of the known allergens and a powerful tool for allergen predication. Flexible parameter setting and batch prediction were also implemented. The proAP can be accessed at http://gmobl.sjtu.edu.cn/proAP/main.html. Conclusions This study comprehensively evaluated sequence
Endocytosis and Trafficking of Natriuretic Peptide Receptor-A: Potential Role of Short Sequence Motifs

PubMed Central

Pandey, Kailash N.

2015-01-01

The targeted endocytosis and redistribution of transmembrane receptors among membrane-bound subcellular organelles are vital for their correct signaling and physiological functions. Membrane receptors committed for internalization and trafficking pathways are sorted into coated vesicles. Cardiac hormones, atrial and brain natriuretic peptides (ANP and BNP) bind to guanylyl cyclase/natriuretic peptide receptor-A (GC-A/NPRA) and elicit the generation of intracellular second messenger cyclic guanosine 3',5'-monophosphate (cGMP), which lowers blood pressure and incidence of heart failure. After ligand binding, the receptor is rapidly internalized, sequestrated, and redistributed into intracellular locations. Thus, NPRA is considered a dynamic cellular macromolecule that traverses different subcellular locations through its lifetime. The utilization of pharmacologic and molecular perturbants has helped in delineating the pathways of endocytosis, trafficking, down-regulation, and degradation of membrane receptors in intact cells. This review describes the investigation of the mechanisms of internalization, trafficking, and redistribution of NPRA compared with other cell surface receptors from the plasma membrane into the cell interior. The roles of different short-signal peptide sequence motifs in the internalization and trafficking of other membrane receptors have been briefly reviewed and their potential significance in the internalization and trafficking of NPRA is discussed. PMID:26151885
Interaction of the Spo20 membrane-sensor motif with phosphatidic acid and other anionic lipids, and influence of the membrane environment.

PubMed

Horchani, Habib; de Saint-Jean, Maud; Barelli, Hélène; Antonny, Bruno

2014-01-01

The yeast protein Spo20 contains a regulatory amphipathic motif that has been suggested to recognize phosphatidic acid, a lipid involved in signal transduction, lipid metabolism and membrane fusion. We have investigated the interaction of the Spo20 amphipathic motif with lipid membranes using a bioprobe strategy that consists in appending this motif to the end of a long coiled-coil, which can be coupled to a GFP reporter for visualization in cells. The resulting construct is amenable to in vitro and in vivo experiments and allows unbiased comparison between amphipathic helices of different chemistry. In vitro, the Spo20 bioprobe responded to small variations in the amount of phosphatidic acid. However, this response was not specific. The membrane binding of the probe depended on the presence of phosphatidylethanolamine and also integrated the contribution of other anionic lipids, including phosphatidylserine and phosphatidyl-inositol-(4,5)bisphosphate. Inverting the sequence of the Spo20 motif neither affected the ability of the probe to interact with anionic liposomes nor did it modify its cellular localization, making a stereo-specific mode of phosphatidic acid recognition unlikely. Nevertheless, the lipid binding properties and the cellular localization of the Spo20 alpha-helix differed markedly from that of another amphipathic motif, Amphipathic Lipid Packing Sensor (ALPS), suggesting that even in the absence of stereo specific interactions, amphipathic helices can act as subcellular membrane targeting determinants in a cellular context.
[Prediction of Promoter Motifs in Virophages].

PubMed

Gong, Chaowen; Zhou, Xuewen; Pan, Yingjie; Wang, Yongjie

2015-07-01

Virophages have crucial roles in ecosystems and are the transport vectors of genetic materials. To shed light on regulation and control mechanisms in virophage--host systems as well as evolution between virophages and their hosts, the promoter motifs of virophages were predicted on the upstream regions of start codons using an analytical tool for prediction of promoter motifs: Multiple EM for Motif Elicitation. Seventeen potential promoter motifs were identified based on the E-value, location, number and length of promoters in genomes. Sputnik and zamilon motif 2 with AT-rich regions were distributed widely on genomes, suggesting that these motifs may be associated with regulation of the expression of various genes. Motifs containing the TCTA box were predicted to be late promoter motif in mavirus; motifs containing the ATCT box were the potential late promoter motif in the Ace Lake mavirus . AT-rich regions were identified on motif 2 in the Organic Lake virophage, motif 3 in Yellowstone Lake virophage (YSLV)1 and 2, motif 1 in YSLV3, and motif 1 and 2 in YSLV4, respectively. AT-rich regions were distributed widely on the genomes of virophages. All of these motifs may be promoter motifs of virophages. Our results provide insights into further exploration of temporal expression of genes in virophages as well as associations between virophages and giant viruses.
Spectroscopic studies on peptides and proteins with cysteine-containing heme regulatory motifs (HRM).

PubMed

Schubert, Erik; Florin, Nicole; Duthie, Fraser; Henning Brewitz, H; Kühl, Toni; Imhof, Diana; Hagelueken, Gregor; Schiemann, Olav

2015-07-01

The role of heme as a cofactor in enzymatic reactions has been studied for a long time and in great detail. Recently it was discovered that heme can also serve as a signalling molecule in cells but so far only few examples of this regulation have been studied. In order to discover new potentially heme-regulated proteins, we screened protein sequence databases for bacterial proteins that contain sequence features like a Cysteine-Proline (CP) motif, which is known for its heme-binding propensity. Based on this search we synthesized a series of these potential heme regulatory motifs (HRMs). We used cw EPR spectroscopy to investigate whether these sequences do indeed bind to heme and if the spin state of heme is changed upon interaction with the peptides. The corresponding proteins of two potential HRMs, FeoB and GlpF, were expressed and purified and their interaction with heme was studied by cw EPR and UV-Visible (UV-Vis) spectroscopy. Copyright © 2015 Elsevier Inc. All rights reserved.
Single-molecule study of thymidine glycol and i-motif through the alpha-hemolysin ion channel

NASA Astrophysics Data System (ADS)

He, Lidong

Nanopore-based devices have emerged as a single-molecule detection and analysis tool for a wide range of applications. Through electrophoretically driving DNA molecules across a nanosized pore, a lot of information can be received, including unfolding kinetics and DNA-protein interactions. This single-molecule method has the potential to sequence kilobase length DNA polymers without amplification or labeling, approaching "the third generation" genome sequencing for around $1000 within 24 hours. alpha-Hemolysin biological nanopores have the advantages of excellent stability, low-noise level, and precise site-directed mutagenesis for engineering this protein nanopore. The first work presented in this thesis established the current signal of the thymidine glycol lesion in DNA oligomers through an immobilization experiment. The thymidine glycol enantiomers were differentiated from each other by different current blockage levels. Also, the effect of bulky hydrophobic adducts to the current blockage was investigated. Secondly, the alpha-hemolysin nanopore was used to study the human telomere i-motif and RET oncogene i-motif at a single-molecule level. In Chapter 3, it was demonstrated that the alpha-hemolysin nanopore can differentiate an i-motif form and single-strand DNA form at different pH values based on the same sequence. In addition, it shows potential to differentiate the folding topologies generated from the same DNA sequence.
Biophysical Analysis of the Binding of WW Domains of YAP2 Transcriptional Regulator to PPXY Motifs within WBP1 and WBP2 Adaptors

PubMed Central

McDonald, Caleb B.; McIntosh, Samantha K. N.; Mikles, David C.; Bhat, Vikas; Deegan, Brian J.; Seldeen, Kenneth L.; Saeed, Ali M.; Buffa, Laura; Sudol, Marius; Nawaz, Zafar; Farooq, Amjad

2011-01-01

YAP2 transcriptional regulator mediates a plethora of cellular functions, including the newly discovered Hippo tumor suppressor pathway, by virtue of its ability to recognize WBP1 and WBP2 signaling adaptors among a wide variety of other ligands. Herein, using isothermal titration calorimery (ITC) and circular dichroism (CD) in combination with molecular modeling (MM) and molecular dynamics (MD), we provide evidence that the WW1 and WW2 domains of YAP2 recognize various PPXY motifs within WBP1 and WBP2 in a highly promiscuous and subtle manner. Thus, although both WW domains strictly require the integrity of the consensus PPXY sequence, non-consensus residues within and flanking this motif are not critical for high-affinity binding, implying that they most likely play a role in stabilizing the polyproline type II (PPII) helical conformation of the PPXY ligands. Of particular interest is the observation that both WW domains bind to a PPXYXG motif with highest affinity, implicating a preference for a non-bulky and flexible glycine one-residue C-terminal to the consensus tyrosine. Importantly, a large set of residues within both WW domains and the PPXY motifs appear to undergo rapid fluctuations on a nanosecond time scale, arguing that WW-ligand interactions are highly dynamic and that such conformational entropy may be an integral part of the reversible and temporal nature of cellular signaling cascades. Collectively, our study sheds light on the molecular determinants of a key WW-ligand interaction pertinent to cellular functions in health and disease. PMID:21981024
Biophysical analysis of binding of WW domains of the YAP2 transcriptional regulator to PPXY motifs within WBP1 and WBP2 adaptors.

PubMed

McDonald, Caleb B; McIntosh, Samantha K N; Mikles, David C; Bhat, Vikas; Deegan, Brian J; Seldeen, Kenneth L; Saeed, Ali M; Buffa, Laura; Sudol, Marius; Nawaz, Zafar; Farooq, Amjad

2011-11-08

The YAP2 transcriptional regulator mediates a plethora of cellular functions, including the newly discovered Hippo tumor suppressor pathway, by virtue of its ability to recognize WBP1 and WBP2 signaling adaptors among a wide variety of other ligands. Herein, using isothermal titration calorimery and circular dichroism in combination with molecular modeling and molecular dynamics, we provide evidence that the WW1 and WW2 domains of YAP2 recognize various PPXY motifs within WBP1 and WBP2 in a highly promiscuous and subtle manner. Thus, although both WW domains strictly require the integrity of the consensus PPXY sequence, nonconsensus residues within and flanking this motif are not critical for high-affinity binding, implying that they most likely play a role in stabilizing the polyproline type II helical conformation of the PPXY ligands. Of particular interest is the observation that both WW domains bind to a PPXYXG motif with highest affinity, implicating a preference for a nonbulky and flexible glycine one residue to the C-terminal side of the consensus tyrosine. Importantly, a large set of residues within both WW domains and the PPXY motifs appear to undergo rapid fluctuations on a nanosecond time scale, suggesting that WW-ligand interactions are highly dynamic and that such conformational entropy may be an integral part of the reversible and temporal nature of cellular signaling cascades. Collectively, our study sheds light on the molecular determinants of a key WW-ligand interaction pertinent to cellular functions in health and disease.
The ARTT motif and a unified structural understanding of substraterecognition in ADP ribosylating bacterial toxins and eukaryotic ADPribosyltransferases

DOE Office of Scientific and Technical Information (OSTI.GOV)

Han, S.; Tainer, J.A.

2001-08-01

ADP-ribosylation is a widely occurring and biologically critical covalent chemical modification process in pathogenic mechanisms, intracellular signaling systems, DNA repair, and cell division. The reaction is catalyzed by ADP-ribosyltransferases, which transfer the ADP-ribose moiety of NAD to a target protein with nicotinamide release. A family of bacterial toxins and eukaryotic enzymes has been termed the mono-ADP-ribosyltransferases, in distinction to the poly-ADP-ribosyltransferases, which catalyze the addition of multiple ADP-ribose groups to the carboxyl terminus of eukaryotic nucleoproteins. Despite the limited primary sequence homology among the different ADP-ribosyltransferases, a central cleft bearing NAD-binding pocket formed by the two perpendicular b-sheet core hasmore » been remarkably conserved between bacterial toxins and eukaryotic mono- and poly-ADP-ribosyltransferases. The majority of bacterial toxins and eukaryotic mono-ADP-ribosyltransferases are characterized by conserved His and catalytic Glu residues. In contrast, Diphtheria toxin, Pseudomonas exotoxin A, and eukaryotic poly-ADP-ribosyltransferases are characterized by conserved Arg and catalytic Glu residues. The NAD-binding core of a binary toxin and a C3-like toxin family identified an ARTT motif (ADP-ribosylating turn-turn motif) that is implicated in substrate specificity and recognition by structural and mutagenic studies. Here we apply structure-based sequence alignment and comparative structural analyses of all known structures of ADP-ribosyltransfeases to suggest that this ARTT motif is functionally important in many ADP-ribosylating enzymes that bear a NAD binding cleft as characterized by conserved Arg and catalytic Glu residues. Overall, structure-based sequence analysis reveals common core structures and conserved active sites of ADP-ribosyltransferases to support similar NAD binding mechanisms but differing mechanisms of target protein binding via sequence variations within
Identification of GATC- and CCGG- recognizing Type II REases and their putative specificity-determining positions using Scan2S—a novel motif scan algorithm with optional secondary structure constraints

PubMed Central

Niv, Masha Y.; Skrabanek, Lucy; Roberts, Richard J.; Scheraga, Harold A.; Weinstein, Harel

2008-01-01

Restriction endonucleases (REases) are DNA-cleaving enzymes that have become indispensable tools in molecular biology. Type II REases are highly divergent in sequence despite their common structural core, function and, in some cases, common specificities towards DNA sequences. This makes it difficult to identify and classify them functionally based on sequence, and has hampered the efforts of specificity-engineering. Here, we define novel REase sequence motifs, which extend beyond the PD-(D/E)XK hallmark, and incorporate secondary structure information. The automated search using these motifs is carried out with a newly developed fast regular expression matching algorithm that accommodates long patterns with optional secondary structure constraints. Using this new tool, named Scan2S, motifs derived from REases with specificity towards GATC- and CGGG-containing DNA sequences successfully identify REases of the same specificity. Notably, some of these sequences are not identified by standard sequence detection tools. The new motifs highlight potential specificity-determining positions that do not fully overlap for the GATC- and the CCGG-recognizing REases and are candidates for specificity re-engineering. PMID:17972284
Identification of GATC- and CCGG-recognizing Type II REases and their putative specificity-determining positions using Scan2S--a novel motif scan algorithm with optional secondary structure constraints.

PubMed

Niv, Masha Y; Skrabanek, Lucy; Roberts, Richard J; Scheraga, Harold A; Weinstein, Harel

2008-05-01

Restriction endonucleases (REases) are DNA-cleaving enzymes that have become indispensable tools in molecular biology. Type II REases are highly divergent in sequence despite their common structural core, function and, in some cases, common specificities towards DNA sequences. This makes it difficult to identify and classify them functionally based on sequence, and has hampered the efforts of specificity-engineering. Here, we define novel REase sequence motifs, which extend beyond the PD-(D/E)XK hallmark, and incorporate secondary structure information. The automated search using these motifs is carried out with a newly developed fast regular expression matching algorithm that accommodates long patterns with optional secondary structure constraints. Using this new tool, named Scan2S, motifs derived from REases with specificity towards GATC- and CGGG-containing DNA sequences successfully identify REases of the same specificity. Notably, some of these sequences are not identified by standard sequence detection tools. The new motifs highlight potential specificity-determining positions that do not fully overlap for the GATC- and the CCGG-recognizing REases and are candidates for specificity re-engineering.
Analysis of network motifs in cellular regulation: Structural similarities, input-output relations and signal integration.

PubMed

Straube, Ronny

2017-12-01

Much of the complexity of regulatory networks derives from the necessity to integrate multiple signals and to avoid malfunction due to cross-talk or harmful perturbations. Hence, one may expect that the input-output behavior of larger networks is not necessarily more complex than that of smaller network motifs which suggests that both can, under certain conditions, be described by similar equations. In this review, we illustrate this approach by discussing the similarities that exist in the steady state descriptions of a simple bimolecular reaction, covalent modification cycles and bacterial two-component systems. Interestingly, in all three systems fundamental input-output characteristics such as thresholds, ultrasensitivity or concentration robustness are described by structurally similar equations. Depending on the system the meaning of the parameters can differ ranging from protein concentrations and affinity constants to complex parameter combinations which allows for a quantitative understanding of signal integration in these systems. We argue that this approach may also be extended to larger regulatory networks. Copyright © 2017 Elsevier B.V. All rights reserved.
Recoding method that removes inhibitory sequences and improves HIV gene expression

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rabadan, Raul; Krasnitz, Michael; Robins, Harlan

The invention relates to inhibitory nucleotide signal sequences or "INS" sequences in the genomes of lentiviruses. In particular the invention relates to the AGG motif present in all viral genomes. The AGG motif may have an inhibitory effect on a virus, for example by reducing the levels of, or maintaining low steady-state levels of, viral RNAs in host cells, and inducing and/or maintaining in viral latency. In one aspect, the invention provides vaccines that contain, or are produced from, viral nucleic acids in which the AGG sequences have been mutated. In another aspect, the invention provides methods and compositions formore » affecting the function of the AGG motif, and methods for identifying other INS sequences in viral genomes.« less
BS-virus-finder: virus integration calling using bisulfite sequencing data.

PubMed

Gao, Shengjie; Hu, Xuesong; Xu, Fengping; Gao, Changduo; Xiong, Kai; Zhao, Xiao; Chen, Haixiao; Zhao, Shancen; Wang, Mengyao; Fu, Dongke; Zhao, Xiaohui; Bai, Jie; Mao, Likai; Li, Bo; Wu, Song; Wang, Jian; Li, Shengbin; Yang, Huangming; Bolund, Lars; Pedersen, Christian N S

2018-01-01

DNA methylation plays a key role in the regulation of gene expression and carcinogenesis. Bisulfite sequencing studies mainly focus on calling single nucleotide polymorphism, different methylation region, and find allele-specific DNA methylation. Until now, only a few software tools have focused on virus integration using bisulfite sequencing data. We have developed a new and easy-to-use software tool, named BS-virus-finder (BSVF, RRID:SCR_015727), to detect viral integration breakpoints in whole human genomes. The tool is hosted at https://github.com/BGI-SZ/BSVF. BS-virus-finder demonstrates high sensitivity and specificity. It is useful in epigenetic studies and to reveal the relationship between viral integration and DNA methylation. BS-virus-finder is the first software tool to detect virus integration loci by using bisulfite sequencing data. © The Authors 2017. Published by Oxford University Press.
Cytosine-phosphate-guanine oligodeoxynucleotides containing GACGTT motifs enhance the immune responses elicited by keyhole limpet hemocyanin antigen in dairy cattle.

PubMed

Chu, Chun-Yen; Lee, Shang-Chun; Liu, Shyh-Shyan; Lin, Yu-Ming; Shen, Perng-Chi; Yu, Chi; Lee, Kuo-Hua; Zhao, Xin; Lee, Jai-Wei

2011-10-01

Adjuvants are important components of vaccine formulations. Effective adjuvants line innate and adaptive immunity by signaling through pathogen recognition receptors. Synthetic cytosine-phosphate-guanine (CpG) oligodeoxynucleotides (ODNs) have been shown to have potentials as adjuvants for vaccines. However, the immunostimulatory effect of CpG is species-specific and depends on the sequence of CpG motifs. A CpG ODN (2135), containing 3 identical copies of GTCGTT motif, was previously reported to have the strongest effects on bovine peripheral blood mononuclear cells (PBMC). Based on the sequence of 2135, we replaced the GTCGTT motif with 11 other sequences containing CG and investigated their effects on bovine lymphocyte proliferation. Results showed that the CpG ODNs containing 3 copies of GACGTT motif had the highest lymphocyte stimulation index (7.91±1.18), which was significantly (P<0.05) higher than that of 2135 (4.25±0.56). The CpG ODNs containing 3 copies of GACGTT motif also significantly increased the mRNA expression of interferon (IFN)-α, interleukin (IL)-12, and IL-21 in bovine PBMC. When dairy cows were immunized with the keyhole limpet hemocyanin (KLH) antigen formulated with CpG ODNs containing 3 copies of GACGTT, production of KLH-specific antibodies in serum and in milk whey was significantly (P<0.05) enhanced. IFN-γ in whole blood stimulated by KLH was also significantly (P<0.05) increased in cows immunized with KLH plus CpG ODNs. Our results indicate that CpG ODNs containing 3 copies of the GACGTT motifs is a potential adjuvant for bovine vaccines.
VISA--Vector Integration Site Analysis server: a web-based server to rapidly identify retroviral integration sites from next-generation sequencing.

PubMed

Hocum, Jonah D; Battrell, Logan R; Maynard, Ryan; Adair, Jennifer E; Beard, Brian C; Rawlings, David J; Kiem, Hans-Peter; Miller, Daniel G; Trobridge, Grant D

2015-07-07

Analyzing the integration profile of retroviral vectors is a vital step in determining their potential genotoxic effects and developing safer vectors for therapeutic use. Identifying retroviral vector integration sites is also important for retroviral mutagenesis screens. We developed VISA, a vector integration site analysis server, to analyze next-generation sequencing data for retroviral vector integration sites. Sequence reads that contain a provirus are mapped to the human genome, sequence reads that cannot be localized to a unique location in the genome are filtered out, and then unique retroviral vector integration sites are determined based on the alignment scores of the remaining sequence reads. VISA offers a simple web interface to upload sequence files and results are returned in a concise tabular format to allow rapid analysis of retroviral vector integration sites.
Biological network motif detection and evaluation

PubMed Central

2011-01-01

Background Molecular level of biological data can be constructed into system level of data as biological networks. Network motifs are defined as over-represented small connected subgraphs in networks and they have been used for many biological applications. Since network motif discovery involves computationally challenging processes, previous algorithms have focused on computational efficiency. However, we believe that the biological quality of network motifs is also very important. Results We define biological network motifs as biologically significant subgraphs and traditional network motifs are differentiated as structural network motifs in this paper. We develop five algorithms, namely, EDGEGO-BNM, EDGEBETWEENNESS-BNM, NMF-BNM, NMFGO-BNM and VOLTAGE-BNM, for efficient detection of biological network motifs, and introduce several evaluation measures including motifs included in complex, motifs included in functional module and GO term clustering score in this paper. Experimental results show that EDGEGO-BNM and EDGEBETWEENNESS-BNM perform better than existing algorithms and all of our algorithms are applicable to find structural network motifs as well. Conclusion We provide new approaches to finding network motifs in biological networks. Our algorithms efficiently detect biological network motifs and further improve existing algorithms to find high quality structural network motifs, which would be impossible using existing algorithms. The performances of the algorithms are compared based on our new evaluation measures in biological contexts. We believe that our work gives some guidelines of network motifs research for the biological networks. PMID:22784624
Molecular Characterization of Transgene Integration by Next-Generation Sequencing in Transgenic Cattle

PubMed Central

Zhang, Ran; Yin, Yinliang; Zhang, Yujun; Li, Kexin; Zhu, Hongxia; Gong, Qin; Wang, Jianwu; Hu, Xiaoxiang; Li, Ning

2012-01-01

As the number of transgenic livestock increases, reliable detection and molecular characterization of transgene integration sites and copy number are crucial not only for interpreting the relationship between the integration site and the specific phenotype but also for commercial and economic demands. However, the ability of conventional PCR techniques to detect incomplete and multiple integration events is limited, making it technically challenging to characterize transgenes. Next-generation sequencing has enabled cost-effective, routine and widespread high-throughput genomic analysis. Here, we demonstrate the use of next-generation sequencing to extensively characterize cattle harboring a 150-kb human lactoferrin transgene that was initially analyzed by chromosome walking without success. Using this approach, the sites upstream and downstream of the target gene integration site in the host genome were identified at the single nucleotide level. The sequencing result was verified by event-specific PCR for the integration sites and FISH for the chromosomal location. Sequencing depth analysis revealed that multiple copies of the incomplete target gene and the vector backbone were present in the host genome. Upon integration, complex recombination was also observed between the target gene and the vector backbone. These findings indicate that next-generation sequencing is a reliable and accurate approach for the molecular characterization of the transgene sequence, integration sites and copy number in transgenic species. PMID:23185606

An integrated semiconductor device enabling non-optical genome sequencing.

PubMed

Rothberg, Jonathan M; Hinz, Wolfgang; Rearick, Todd M; Schultz, Jonathan; Mileski, William; Davey, Mel; Leamon, John H; Johnson, Kim; Milgrew, Mark J; Edwards, Matthew; Hoon, Jeremy; Simons, Jan F; Marran, David; Myers, Jason W; Davidson, John F; Branting, Annika; Nobile, John R; Puc, Bernard P; Light, David; Clark, Travis A; Huber, Martin; Branciforte, Jeffrey T; Stoner, Isaac B; Cawley, Simon E; Lyons, Michael; Fu, Yutao; Homer, Nils; Sedova, Marina; Miao, Xin; Reed, Brian; Sabina, Jeffrey; Feierstein, Erika; Schorn, Michelle; Alanjary, Mohammad; Dimalanta, Eileen; Dressman, Devin; Kasinskas, Rachel; Sokolsky, Tanya; Fidanza, Jacqueline A; Namsaraev, Eugeni; McKernan, Kevin J; Williams, Alan; Roth, G Thomas; Bustillo, James

2011-07-20

The seminal importance of DNA sequencing to the life sciences, biotechnology and medicine has driven the search for more scalable and lower-cost solutions. Here we describe a DNA sequencing technology in which scalable, low-cost semiconductor manufacturing techniques are used to make an integrated circuit able to directly perform non-optical DNA sequencing of genomes. Sequence data are obtained by directly sensing the ions produced by template-directed DNA polymerase synthesis using all-natural nucleotides on this massively parallel semiconductor-sensing device or ion chip. The ion chip contains ion-sensitive, field-effect transistor-based sensors in perfect register with 1.2 million wells, which provide confinement and allow parallel, simultaneous detection of independent sequencing reactions. Use of the most widely used technology for constructing integrated circuits, the complementary metal-oxide semiconductor (CMOS) process, allows for low-cost, large-scale production and scaling of the device to higher densities and larger array sizes. We show the performance of the system by sequencing three bacterial genomes, its robustness and scalability by producing ion chips with up to 10 times as many sensors and sequencing a human genome.
The binding of histone deacetylases and the integrity of zinc finger-like motifs of the E7 protein are essential for the life cycle of human papillomavirus type 31.

PubMed

Longworth, Michelle S; Laimins, Laimonis A

2004-04-01

The E7 oncoprotein of high-risk human papillomaviruses (HPVs) binds to and alters the action of cell cycle regulatory proteins such as members of the retinoblastoma (Rb) family of proteins as well as the histone deacetylases (HDACs). To examine the significance of the binding of E7 to HDACs in the viral life cycle, a mutational analysis of the E7 open reading frame was performed in the context of the complete HPV type 31 (HPV-31) genome. Human foreskin keratinocytes were transfected with wild-type HPV-31 genomes or HPV-31 genomes containing mutations in HDAC binding sequences as well as in the C-terminal zinc finger-like domain, and stable cell lines were isolated. All mutant genomes, except those with E7 mutations in the HDAC binding site, were found to be stably maintained extrachromosomally at an early passage following transfection. Upon further passage in culture, genomes containing mutations to the Rb binding domain as well as the zinc finger-like region quickly lost the ability to maintain episomal genomes. Genomes containing mutations abolishing E7 binding to HDACs or to Rb or mutations to the zinc finger-like motifs failed to extend the life span of transfected keratinocytes and caused cells to arrest at the same time as the untransfected keratinocytes. When induced to differentiate by suspension in methylcellulose, cells maintaining genomes with mutations in the Rb binding domain or the zinc finger-like motifs were impaired in their abilities to activate late viral functions. This study demonstrates that the interaction of E7 with HDACs and the integrity of the zinc finger-like motifs are essential for extending the life span of keratinocytes and for stable maintenance of viral genomes.
Conformational Preference of ‘CαNN’ Short Peptide Motif towards Recognition of Anions

PubMed Central

Banerjee, Raja

2013-01-01

Among several ‘anion binding motifs’, the recently described ‘CαNN’ motif occurring in the loop regions preceding a helix, is conserved through evolution both in sequence and its conformation. To establish the significance of the conserved sequence and their intrinsic affinity for anions, a series of peptides containing the naturally occurring ‘CαNN’ motif at the N-terminus of a designed helix, have been modeled and studied in a context free system using computational techniques. Appearance of a single interacting site with negative binding free-energy for both the sulfate and phosphate ions, as evidenced in docking experiments, establishes that the ‘CαNN’ segment has an intrinsic affinity for anions. Molecular Dynamics (MD) simulation studies reveal that interaction with anion triggers a conformational switch from non-helical to helical state at the ‘CαNN’ segment, which extends the length of the anchoring-helix by one turn at the N-terminus. Computational experiments substantiate the significance of sequence/structural context and justify the conserved nature of the ‘CαNN’ sequence for anion recognition through “local” interaction. PMID:23516403
Sequencing Data Discovery and Integration for Earth System Science with MetaSeek

NASA Astrophysics Data System (ADS)

Hoarfrost, A.; Brown, N.; Arnosti, C.

2017-12-01

Microbial communities play a central role in biogeochemical cycles. Sequencing data resources from environmental sources have grown exponentially in recent years, and represent a singular opportunity to investigate microbial interactions with Earth system processes. Carrying out such meta-analyses depends on our ability to discover and curate sequencing data into large-scale integrated datasets. However, such integration efforts are currently challenging and time-consuming, with sequencing data scattered across multiple repositories and metadata that is not easily or comprehensively searchable. MetaSeek is a sequencing data discovery tool that integrates sequencing metadata from all the major data repositories, allowing the user to search and filter on datasets in a lightweight application with an intuitive, easy-to-use web-based interface. Users can save and share curated datasets, while other users can browse these data integrations or use them as a jumping off point for their own curation. Missing and/or erroneous metadata are inferred automatically where possible, and where not possible, users are prompted to contribute to the improvement of the sequencing metadata pool by correcting and amending metadata errors. Once an integrated dataset has been curated, users can follow simple instructions to download their raw data and quickly begin their investigations. In addition to the online interface, the MetaSeek database is easily queryable via an open API, further enabling users and facilitating integrations of MetaSeek with other data curation tools. This tool lowers the barriers to curation and integration of environmental sequencing data, clearing the path forward to illuminating the ecosystem-scale interactions between biological and abiotic processes.
Mutually Exclusive Formation of G-Quadruplex and i-Motif Is a General Phenomenon Governed by Steric Hindrance in Duplex DNA.

PubMed

Cui, Yunxi; Kong, Deming; Ghimire, Chiran; Xu, Cuixia; Mao, Hanbin

2016-04-19

G-Quadruplex and i-motif are tetraplex structures that may form in opposite strands at the same location of a duplex DNA. Recent discoveries have indicated that the two tetraplex structures can have conflicting biological activities, which poses a challenge for cells to coordinate. Here, by performing innovative population analysis on mechanical unfolding profiles of tetraplex structures in double-stranded DNA, we found that formations of G-quadruplex and i-motif in the two complementary strands are mutually exclusive in a variety of DNA templates, which include human telomere and promoter fragments of hINS and hTERT genes. To explain this behavior, we placed G-quadruplex- and i-motif-hosting sequences in an offset fashion in the two complementary telomeric DNA strands. We found simultaneous formation of the G-quadruplex and i-motif in opposite strands, suggesting that mutual exclusivity between the two tetraplexes is controlled by steric hindrance. This conclusion was corroborated in the BCL-2 promoter sequence, in which simultaneous formation of two tetraplexes was observed due to possible offset arrangements between G-quadruplex and i-motif in opposite strands. The mutual exclusivity revealed here sets a molecular basis for cells to efficiently coordinate opposite biological activities of G-quadruplex and i-motif at the same dsDNA location.
Identification of helix capping and β-turn motifs from NMR chemical shifts

PubMed Central

Shen, Yang; Bax, Ad

2012-01-01

We present an empirical method for identification of distinct structural motifs in proteins on the basis of experimentally determined backbone and 13Cβ chemical shifts. Elements identified include the N-terminal and C-terminal helix capping motifs and five types of β-turns: I, II, I′, II′ and VIII. Using a database of proteins of known structure, the NMR chemical shifts, together with the PDB-extracted amino acid preference of the helix capping and β-turn motifs are used as input data for training an artificial neural network algorithm, which outputs the statistical probability of finding each motif at any given position in the protein. The trained neural networks, contained in the MICS (motif identification from chemical shifts) program, also provide a confidence level for each of their predictions, and values ranging from ca 0.7–0.9 for the Matthews correlation coefficient of its predictions far exceed that attainable by sequence analysis. MICS is anticipated to be useful both in the conventional NMR structure determination process and for enhancing on-going efforts to determine protein structures solely on the basis of chemical shift information, where it can aid in identifying protein database fragments suitable for use in building such structures. PMID:22314702
Targeting MED1 LxxLL Motifs for Tissue-Selective Treatment of Human Breast Cancer

DTIC Science & Technology

2013-09-01

colleagues have successfully conjugated malachite green aptamer to RNA nanoparticles characterized by a 3WJ pRNA motif. The in vitro experiment indi- cated...DNA/RNA sequence FIGURE 19.5 Diagram of RNA nanoparticle harboring malachite green aptamer, survivin siRNA and folate-DNA/RNA sequence for targeting...of RNA Aptamer to RNA Nanoparticles (Figure 19.5; Shu et al. 2011). The sequence for the malachite green aptamer nanoparticle was rationally designed
Targeting MED1 LxxLL Motifs for Tissue-Selective Treatment of Human Breast Cancer

DTIC Science & Technology

2014-09-01

his colleagues have successfully conjugated malachite green aptamer to RNA nanoparticles characterized by a 3WJ pRNA motif. The in vitro experiment...Folate-DNA/RNA sequence FIGURE 19.5 Diagram of RNA nanoparticle harboring malachite green aptamer, survivin siRNA and folate-DNA/RNA sequence for...405Conjugation of RNA Aptamer to RNA Nanoparticles (Figure 19.5; Shu et al. 2011). The sequence for the malachite green aptamer nanoparticle was rationally
Large-Scale Collection and Analysis of Full-Length cDNAs from Brachypodium distachyon and Integration with Pooideae Sequence Resources

PubMed Central

Mochida, Keiichi; Uehara-Yamaguchi, Yukiko; Takahashi, Fuminori; Yoshida, Takuhiro; Sakurai, Tetsuya; Shinozaki, Kazuo

2013-01-01

A comprehensive collection of full-length cDNAs is essential for correct structural gene annotation and functional analyses of genes. We constructed a mixed full-length cDNA library from 21 different tissues of Brachypodium distachyon Bd21, and obtained 78,163 high quality expressed sequence tags (ESTs) from both ends of ca. 40,000 clones (including 16,079 contigs). We updated gene structure annotations of Brachypodium genes based on full-length cDNA sequences in comparison with the latest publicly available annotations. About 10,000 non-redundant gene models were supported by full-length cDNAs; ca. 6,000 showed some transcription unit modifications. We also found ca. 580 novel gene models, including 362 newly identified in Bd21. Using the updated transcription start sites, we searched a total of 580 plant cis-motifs in the −3 kb promoter regions and determined a genome-wide Brachypodium promoter architecture. Furthermore, we integrated the Brachypodium full-length cDNAs and updated gene structures with available sequence resources in wheat and barley in a web-accessible database, the RIKEN Brachypodium FL cDNA database. The database represents a “one-stop” information resource for all genomic information in the Pooideae, facilitating functional analysis of genes in this model grass plant and seamless knowledge transfer to the Triticeae crops. PMID:24130698
A study on the application of topic models to motif finding algorithms.

PubMed

Basha Gutierrez, Josep; Nakai, Kenta

2016-12-22

Topic models are statistical algorithms which try to discover the structure of a set of documents according to the abstract topics contained in them. Here we try to apply this approach to the discovery of the structure of the transcription factor binding sites (TFBS) contained in a set of biological sequences, which is a fundamental problem in molecular biology research for the understanding of transcriptional regulation. Here we present two methods that make use of topic models for motif finding. First, we developed an algorithm in which first a set of biological sequences are treated as text documents, and the k-mers contained in them as words, to then build a correlated topic model (CTM) and iteratively reduce its perplexity. We also used the perplexity measurement of CTMs to improve our previous algorithm based on a genetic algorithm and several statistical coefficients. The algorithms were tested with 56 data sets from four different species and compared to 14 other methods by the use of several coefficients both at nucleotide and site level. The results of our first approach showed a performance comparable to the other methods studied, especially at site level and in sensitivity scores, in which it scored better than any of the 14 existing tools. In the case of our previous algorithm, the new approach with the addition of the perplexity measurement clearly outperformed all of the other methods in sensitivity, both at nucleotide and site level, and in overall performance at site level. The statistics obtained show that the performance of a motif finding method based on the use of a CTM is satisfying enough to conclude that the application of topic models is a valid method for developing motif finding algorithms. Moreover, the addition of topic models to a previously developed method dramatically increased its performance, suggesting that this combined algorithm can be a useful tool to successfully predict motifs in different kinds of sets of DNA sequences.
Conserved binding of GCAC motifs by MEC-8, couch potato, and the RBPMS protein family

PubMed Central

Soufari, Heddy

2017-01-01

Precise regulation of mRNA processing, translation, localization, and stability relies on specific interactions with RNA-binding proteins whose biological function and target preference are dictated by their preferred RNA motifs. The RBPMS family of RNA-binding proteins is defined by a conserved RNA recognition motif (RRM) domain found in metazoan RBPMS/Hermes and RBPMS2, Drosophila couch potato, and MEC-8 from Caenorhabditis elegans. In order to determine the parameters of RNA sequence recognition by the RBPMS family, we have first used the N-terminal domain from MEC-8 in binding assays and have demonstrated a preference for two GCAC motifs optimally separated by >6 nucleotides (nt). We have also determined the crystal structure of the dimeric N-terminal RRM domain from MEC-8 in the unbound form, and in complex with an oligonucleotide harboring two copies of the optimal GCAC motif. The atomic details reveal the molecular network that provides specificity to all four bases in the motif, including multiple hydrogen bonds to the initial guanine. Further studies with human RBPMS, as well as Drosophila couch potato, confirm a general preference for this double GCAC motif by other members of the protein family and the presence of this motif in known targets. PMID:28003515
Canonical Bcl-2 motifs of the Na+/K+ pump revealed by the BH3 mimetic chelerythrine: early signal transducers of apoptosis?

PubMed

Lauf, Peter K; Heiny, Judith; Meller, Jarek; Lepera, Michael A; Koikov, Leonid; Alter, Gerald M; Brown, Thomas L; Adragna, Norma C

2013-01-01

Chelerythrine [CET], a protein kinase C [PKC] inhibitor, is a prop-apoptotic BH3-mimetic binding to BH1-like motifs of Bcl-2 proteins. CET action was examined on PKC phosphorylation-dependent membrane transporters (Na+/K+ pump/ATPase [NKP, NKA], Na+-K+-2Cl+ [NKCC] and K+-Cl- [KCC] cotransporters, and channel-supported K+ loss) in human lens epithelial cells [LECs]. K+ loss and K+ uptake, using Rb+ as congener, were measured by atomic absorption/emission spectrophotometry with NKP and NKCC inhibitors, and Cl- replacement by NO3ˉ to determine KCC. 3H-Ouabain binding was performed on a pig renal NKA in the presence and absence of CET. Bcl-2 protein and NKA sequences were aligned and motifs identified and mapped using PROSITE in conjunction with BLAST alignments and analysis of conservation and structural similarity based on prediction of secondary and crystal structures. CET inhibited NKP and NKCC by >90% (IC50 values ~35 and ~15 μM, respectively) without significant KCC activity change, and stimulated K+ loss by ~35% at 10-30 μM. Neither ATP levels nor phosphorylation of the NKA α1 subunit changed. 3H-ouabain was displaced from pig renal NKA only at 100 fold higher CET concentrations than the ligand. Sequence alignments of NKA with BH1- and BH3-like motifs containing pro-survival Bcl-2 and BclXl proteins showed more than one BH1-like motif within NKA for interaction with CET or with BH3 motifs. One NKA BH1-like motif (ARAAEILARDGPN) was also found in all P-type ATPases. Also, NKA possessed a second motif similar to that near the BH3 region of Bcl-2. Findings support the hypothesis that CET inhibits NKP by binding to BH1-like motifs and disrupting the α1 subunit catalytic activity through conformational changes. By interacting with Bcl-2 proteins through their complementary BH1- or BH3-like-motifs, NKP proteins may be sensors of normal and pathological cell functions, becoming important yet unrecognized signal transducers in the initial phases of apoptosis. CET
A novel swarm intelligence algorithm for finding DNA motifs.

PubMed

Lei, Chengwei; Ruan, Jianhua

2009-01-01

Discovering DNA motifs from co-expressed or co-regulated genes is an important step towards deciphering complex gene regulatory networks and understanding gene functions. Despite significant improvement in the last decade, it still remains one of the most challenging problems in computational molecular biology. In this work, we propose a novel motif finding algorithm that finds consensus patterns using a population-based stochastic optimisation technique called Particle Swarm Optimisation (PSO), which has been shown to be effective in optimising difficult multidimensional problems in continuous domains. We propose to use a word dissimilarity graph to remap the neighborhood structure of the solution space of DNA motifs, and propose a modification of the naive PSO algorithm to accommodate discrete variables. In order to improve efficiency, we also propose several strategies for escaping from local optima and for automatically determining the termination criteria. Experimental results on simulated challenge problems show that our method is both more efficient and more accurate than several existing algorithms. Applications to several sets of real promoter sequences also show that our approach is able to detect known transcription factor binding sites, and outperforms two of the most popular existing algorithms.
SELEX and SHAPE reveal that sequence motifs and an extended hairpin in the 5' portion of Turnip crinkle virus satellite RNA C mediate fitness in plants.

PubMed

Bayne, Charlie F; Widawski, Max E; Gao, Feng; Masab, Mohammed H; Chattopadhyay, Maitreyi; Murawski, Allison M; Sansevere, Robert M; Lerner, Bryan D; Castillo, Rinaldys J; Griesman, Trevor; Fu, Jiantao; Hibben, Jennifer K; Garcia-Perez, Alma D; Simon, Anne E; Kushner, David B

2018-07-01

Noncoding RNAs use their sequence and/or structure to mediate function(s). The 5' portion (166 nt) of the 356-nt noncoding satellite RNA C (satC) of Turnip crinkle virus (TCV) was previously modeled to contain a central region with two stem-loops (H6 and H7) and a large connecting hairpin (H2). We now report that in vivo functional selection (SELEX) experiments assessing sequence/structure requirements in H2, H6, and H7 reveal that H6 loop sequence motifs were recovered at nonrandom rates and only some residues are proposed to base-pair with accessible complementary sequences within the 5' central region. In vitro SHAPE of SELEX winners indicates that the central region is heavily base-paired, such that along with the lower stem and H2 region, one extensive hairpin exists composing the entire 5' region. As these SELEX winners are highly fit, these characteristics facilitate satRNA amplification in association with TCV in plants. Copyright © 2018 Elsevier Inc. All rights reserved.
Counting motifs in dynamic networks.

PubMed

Mukherjee, Kingshuk; Hasan, Md Mahmudul; Boucher, Christina; Kahveci, Tamer

2018-04-11

A network motif is a sub-network that occurs frequently in a given network. Detection of such motifs is important since they uncover functions and local properties of the given biological network. Finding motifs is however a computationally challenging task as it requires solving the costly subgraph isomorphism problem. Moreover, the topology of biological networks change over time. These changing networks are called dynamic biological networks. As the network evolves, frequency of each motif in the network also changes. Computing the frequency of a given motif from scratch in a dynamic network as the network topology evolves is infeasible, particularly for large and fast evolving networks. In this article, we design and develop a scalable method for counting the number of motifs in a dynamic biological network. Our method incrementally updates the frequency of each motif as the underlying network's topology evolves. Our experiments demonstrate that our method can update the frequency of each motif in orders of magnitude faster than counting the motif embeddings every time the network changes. If the network evolves more frequently, the margin with which our method outperforms the existing static methods, increases. We evaluated our method extensively using synthetic and real datasets, and show that our method is highly accurate(≥ 96%) and that it can be scaled to large dense networks. The results on real data demonstrate the utility of our method in revealing interesting insights on the evolution of biological processes.
Local Renyi entropic profiles of DNA sequences.

PubMed

Vinga, Susana; Almeida, Jonas S

2007-10-16

In a recent report the authors presented a new measure of continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition therein explored was based on the Rényi entropy of probability density estimation (pdf) using the Parzen's window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concepts of continuous entropy by defining DNA sequence entropic profiles using the new pdf estimations to refine the density estimation of motifs. The new methodology enables two results. On the one hand it shows that the entropic profiles are directly related with the statistical significance of motifs, allowing the study of under and over-representation of segments. On the other hand, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables and additional material and examples are made publicly available at http://kdbio.inesc-id.pt/~svinga/ep/. The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur in multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and inference of motif structures.
Local Renyi entropic profiles of DNA sequences

PubMed Central

Vinga, Susana; Almeida, Jonas S

2007-01-01

Background In a recent report the authors presented a new measure of continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition therein explored was based on the Rényi entropy of probability density estimation (pdf) using the Parzen's window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concepts of continuous entropy by defining DNA sequence entropic profiles using the new pdf estimations to refine the density estimation of motifs. Results The new methodology enables two results. On the one hand it shows that the entropic profiles are directly related with the statistical significance of motifs, allowing the study of under and over-representation of segments. On the other hand, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables and additional material and examples are made publicly available at . Conclusion The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur in multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and inference of motif structures. PMID:17939871
Yeast One-Hybrid Gγ Recruitment System for Identification of Protein Lipidation Motifs

PubMed Central

Fukuda, Nobuo; Doi, Motomichi; Honda, Shinya

2013-01-01

Fatty acids and isoprenoids can be covalently attached to a variety of proteins. These lipid modifications regulate protein structure, localization and function. Here, we describe a yeast one-hybrid approach based on the Gγ recruitment system that is useful for identifying sequence motifs those influence lipid modification to recruit proteins to the plasma membrane. Our approach facilitates the isolation of yeast cells expressing lipid-modified proteins via a simple and easy growth selection assay utilizing G-protein signaling that induces diploid formation. In the current study, we selected the N-terminal sequence of Gα subunits as a model case to investigate dual lipid modification, i.e., myristoylation and palmitoylation, a modification that is widely conserved from yeast to higher eukaryotes. Our results suggest that both lipid modifications are required for restoration of G-protein signaling. Although we could not differentiate between myristoylation and palmitoylation, N-terminal position 7 and 8 play some critical role. Moreover, we tested the preference for specific amino-acid residues at position 7 and 8 using library-based screening. This new approach will be useful to explore protein-lipid associations and to determine the corresponding sequence motifs. PMID:23922919
The LINKS motif zippers trans-acyltransferase polyketide synthase assembly lines into a biosynthetic megacomplex

PubMed Central

Gay, Darren C.; Wagner, Drew T.; Meinke, Jessica L.; Zogzas, Charles E.; Gay, Glen R.; Keatinge-Clay, Adrian T.

2016-01-01

Polyketides such as the clinically-valuable antibacterial agent mupirocin are constructed by architecturally-sophisticated assembly lines known as trans-acyltransferase polyketide synthases. Organelle-sized megacomplexes composed of several copies of trans-acyltransferase polyketide synthase assembly lines have been observed by others through transmission electron microscopy to be located at the Bacillus subtilis plasma membrane, where the synthesis and export of the antibacterial polyketide bacillaene takes place. In this work we analyze ten crystal structures of trans-acyltransferase polyketide synthases ketosynthase domains, seven of which are reported here for the first time, to characterize a motif capable of zippering assembly lines into a megacomplex. While each of the three-helix LINKS (Laterally-INteracting Ketosynthase Sequence) motifs is observed to similarly dock with a spatially-reversed copy of itself through hydrophobic and ionic interactions, the amino acid sequences of this motif are not conserved. Such a code is appropriate for mediating homotypic contacts between assembly lines to ensure the ordered self-assembly of a noncovalent, yet tightly-knit, enzymatic network. LINKS-mediated lateral interactions would also have the effect of bolstering the vertical association of the polypeptides that comprise a polyketide synthase assembly line. PMID:26724270
Sequence and conformational preferences at termini of α-helices in membrane proteins: role of the helix environment.

PubMed

Shelar, Ashish; Bansal, Manju

2014-12-01

α-Helices are amongst the most common secondary structural elements seen in membrane proteins and are packed in the form of helix bundles. These α-helices encounter varying external environments (hydrophobic, hydrophilic) that may influence the sequence preferences at their N and C-termini. The role of the external environment in stabilization of the helix termini in membrane proteins is still unknown. Here we analyze α-helices in a high-resolution dataset of integral α-helical membrane proteins and establish that their sequence and conformational preferences differ from those in globular proteins. We specifically examine these preferences at the N and C-termini in helices initiating/terminating inside the membrane core as well as in linkers connecting these transmembrane helices. We find that the sequence preferences and structural motifs at capping (Ncap and Ccap) and near-helical (N' and C') positions are influenced by a combination of features including the membrane environment and the innate helix initiation and termination property of residues forming structural motifs. We also find that a large number of helix termini which do not form any particular capping motif are stabilized by formation of hydrogen bonds and hydrophobic interactions contributed from the neighboring helices in the membrane protein. We further validate the sequence preferences obtained from our analysis with data from an ultradeep sequencing study that identifies evolutionarily conserved amino acids in the rat neurotensin receptor. The results from our analysis provide insights for the secondary structure prediction, modeling and design of membrane proteins. © 2014 Wiley Periodicals, Inc.

Sequence, Structure, and Context Preferences of Human RNA Binding Proteins.

PubMed

Dominguez, Daniel; Freese, Peter; Alexis, Maria S; Su, Amanda; Hochman, Myles; Palden, Tsultrim; Bazile, Cassandra; Lambert, Nicole J; Van Nostrand, Eric L; Pratt, Gabriel A; Yeo, Gene W; Graveley, Brenton R; Burge, Christopher B

2018-06-07

RNA binding proteins (RBPs) orchestrate the production, processing, and function of mRNAs. Here, we present the affinity landscapes of 78 human RBPs using an unbiased assay that determines the sequence, structure, and context preferences of these proteins in vitro by deep sequencing of bound RNAs. These data enable construction of "RNA maps" of RBP activity without requiring crosslinking-based assays. We found an unexpectedly low diversity of RNA motifs, implying frequent convergence of binding specificity toward a relatively small set of RNA motifs, many with low compositional complexity. Offsetting this trend, however, we observed extensive preferences for contextual features distinct from short linear RNA motifs, including spaced "bipartite" motifs, biased flanking nucleotide composition, and bias away from or toward RNA structure. Our results emphasize the importance of contextual features in RNA recognition, which likely enable targeting of distinct subsets of transcripts by different RBPs that recognize the same linear motif. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
THGS: a web-based database of Transmembrane Helices in Genome Sequences

PubMed Central

Fernando, S. A.; Selvarani, P.; Das, Soma; Kumar, Ch. Kiran; Mondal, Sukanta; Ramakumar, S.; Sekar, K.

2004-01-01

Transmembrane Helices in Genome Sequences (THGS) is an interactive web-based database, developed to search the transmembrane helices in the user-interested gene sequences available in the Genome Database (GDB). The proposed database has provision to search sequence motifs in transmembrane and globular proteins. In addition, the motif can be searched in the other sequence databases (Swiss-Prot and PIR) or in the macromolecular structure database, Protein Data Bank (PDB). Further, the 3D structure of the corresponding queried motif, if it is available in the solved protein structures deposited in the Protein Data Bank, can also be visualized using the widely used graphics package RASMOL. All the sequence databases used in the present work are updated frequently and hence the results produced are up to date. The database THGS is freely available via the world wide web and can be accessed at http://pranag.physics.iisc.ernet.in/thgs/ or http://144.16.71.10/thgs/. PMID:14681375
Rare k-mer DNA: Identification of sequence motifs and prediction of CpG island and promoter.

PubMed

Mohamed Hashim, Ezzeddin Kamil; Abdullah, Rosni

2015-12-21

Empirical analysis on k-mer DNA has been proven as an effective tool in finding unique patterns in DNA sequences which can lead to the discovery of potential sequence motifs. In an extensive study of empirical k-mer DNA on hundreds of organisms, the researchers found unique multi-modal k-mer spectra occur in the genomes of organisms from the tetrapod clade only which includes all mammals. The multi-modality is caused by the formation of the two lowest modes where k-mers under them are referred as the rare k-mers. The suppression of the two lowest modes (or the rare k-mers) can be attributed to the CG dinucleotide inclusions in them. Apart from that, the rare k-mers are selectively distributed in certain genomic features of CpG Island (CGI), promoter, 5' UTR, and exon. We correlated the rare k-mers with hundreds of annotated features using several bioinformatic tools, performed further intrinsic rare k-mer analyses within the correlated features, and modeled the elucidated rare k-mer clustering feature into a classifier to predict the correlated CGI and promoter features. Our correlation results show that rare k-mers are highly associated with several annotated features of CGI, promoter, 5' UTR, and open chromatin regions. Our intrinsic results show that rare k-mers have several unique topological, compositional, and clustering properties in CGI and promoter features. Finally, the performances of our RWC (rare-word clustering) method in predicting the CGI and promoter features are ranked among the top three, in eight of the CGI and promoter evaluations, among eight of the benchmarked datasets. Crown Copyright © 2015. Published by Elsevier Ltd. All rights reserved.
Computational and experimental analysis of short peptide motifs for enzyme inhibition.

PubMed

Fu, Jinglin; Larini, Luca; Cooper, Anthony J; Whittaker, John W; Ahmed, Azka; Dong, Junhao; Lee, Minyoung; Zhang, Ting

2017-01-01

The metabolism of living systems involves many enzymes that play key roles as catalysts and are essential to biological function. Searching ligands with the ability to modulate enzyme activities is central to diagnosis and therapeutics. Peptides represent a promising class of potential enzyme modulators due to the large chemical diversity, and well-established methods for library synthesis. Peptides and their derivatives are found to play critical roles in modulating enzymes and mediating cellular uptakes, which are increasingly valuable in therapeutics. We present a methodology that uses molecular dynamics (MD) and point-variant screening to identify short peptide motifs that are critical for inhibiting β-galactosidase (β-Gal). MD was used to simulate the conformations of peptides and to suggest short motifs that were most populated in simulated conformations. The function of the simulated motifs was further validated by the experimental point-variant screening as critical segments for inhibiting the enzyme. Based on the validated motifs, we eventually identified a 7-mer short peptide for inhibiting an enzyme with low μM IC50. The advantage of our methodology is the relatively simplified simulation that is informative enough to identify the critical sequence of a peptide inhibitor, with a precision comparable to truncation and alanine scanning experiments. Our combined experimental and computational approach does not rely on a detailed understanding of mechanistic and structural details. The MD simulation suggests the populated motifs that are consistent with the results of the experimental alanine and truncation scanning. This approach appears to be applicable to both natural and artificial peptides. With more discovered short motifs in the future, they could be exploited for modulating biocatalysis, and developing new medicine.
An RNA motif that binds ATP

NASA Technical Reports Server (NTRS)

Sassanfar, M.; Szostak, J. W.

1993-01-01

RNAs that contain specific high-affinity binding sites for small molecule ligands immobilized on a solid support are present at a frequency of roughly one in 10(10)-10(11) in pools of random sequence RNA molecules. Here we describe a new in vitro selection procedure designed to ensure the isolation of RNAs that bind the ligand of interest in solution as well as on a solid support. We have used this method to isolate a remarkably small RNA motif that binds ATP, a substrate in numerous biological reactions and the universal biological high-energy intermediate. The selected ATP-binding RNAs contain a consensus sequence, embedded in a common secondary structure. The binding properties of ATP analogues and modified RNAs show that the binding interaction is characterized by a large number of close contacts between the ATP and RNA, and by a change in the conformation of the RNA.
Helix–hairpin–helix motifs confer salt resistance and processivity on chimeric DNA polymerases

PubMed Central

Pavlov, Andrey R.; Belova, Galina I.; Kozyavkin, Sergei A.; Slesarev, Alexei I.

2002-01-01

Helix–hairpin–helix (HhH) is a widespread motif involved in sequence-nonspecific DNA binding. The majority of HhH motifs function as DNA-binding modules with typical occurrence of one HhH motif or one or two (HhH)2 domains in proteins. We recently identified 24 HhH motifs in DNA topoisomerase V (Topo V). Although these motifs are dispensable for the topoisomerase activity of Topo V, their removal narrows the salt concentration range for topoisomerase activity tenfold. Here, we demonstrate the utility of Topo V's HhH motifs for modulating DNA-binding properties of the Stoffel fragment of TaqDNA polymerase and Pfu DNA polymerase. Different HhH cassettes fused with either NH2 terminus or COOH terminus of DNA polymerases broaden the salt concentration range of the polymerase activity significantly (up to 0.5 M NaCl or 1.8 M potassium glutamate). We found that anions play a major role in the inhibition of DNA polymerase activity. The resistance of initial extension rates and the processivity of chimeric polymerases to salts depend on the structure of added HhH motifs. Regardless of the type of the construct, the thermal stability of chimeric Taq polymerases increases under the optimal ionic conditions, as compared with that of TaqDNA polymerase or its Stoffel fragment. Our approach to raise the salt tolerance, processivity, and thermostability of Taq and Pfu DNA polymerases may be applied to all pol1- and polB-type polymerases, as well as to other DNA processing enzymes. PMID:12368475
Regulatory sequence analysis tools.

PubMed

van Helden, Jacques

2003-07-01

The web resource Regulatory Sequence Analysis Tools (RSAT) (http://rsat.ulb.ac.be/rsat) offers a collection of software tools dedicated to the prediction of regulatory sites in non-coding DNA sequences. These tools include sequence retrieval, pattern discovery, pattern matching, genome-scale pattern matching, feature-map drawing, random sequence generation and other utilities. Alternative formats are supported for the representation of regulatory motifs (strings or position-specific scoring matrices) and several algorithms are proposed for pattern discovery. RSAT currently holds >100 fully sequenced genomes and these data are regularly updated from GenBank.
Overlapping ETS and CRE Motifs (G/CCGGAAGTGACGTCA) Preferentially Bound by GABPα and CREB Proteins

PubMed Central

Chatterjee, Raghunath; Zhao, Jianfei; He, Ximiao; Shlyakhtenko, Andrey; Mann, Ishminder; Waterfall, Joshua J.; Meltzer, Paul; Sathyanarayana, B. K.; FitzGerald, Peter C.; Vinson, Charles

2012-01-01

Previously, we identified 8-bps long DNA sequences (8-mers) that localize in human proximal promoters and grouped them into known transcription factor binding sites (TFBS). We now examine split 8-mers consisting of two 4-mers separated by 1-bp to 30-bps (X4-N1-30-X4) to identify pairs of TFBS that localize in proximal promoters at a precise distance. These include two overlapping TFBS: the ETS⇔ETS motif (C/GCCGGAAGCGGAA) and the ETS⇔CRE motif (C/GCGGAAGTGACGTCAC). The nucleotides in bold are part of both TFBS. Molecular modeling shows that the ETS⇔CRE motif can be bound simultaneously by both the ETS and the B-ZIP domains without protein-protein clashes. The electrophoretic mobility shift assay (EMSA) shows that the ETS protein GABPα and the B-ZIP protein CREB preferentially bind to the ETS⇔CRE motif only when the two TFBS overlap precisely. In contrast, the ETS domain of ETV5 and CREB interfere with each other for binding the ETS⇔CRE. The 11-mer (CGGAAGTGACG), the conserved part of the ETS⇔CRE motif, occurs 226 times in the human genome and 83% are in known regulatory regions. In vivo GABPα and CREB ChIP-seq peaks identified the ETS⇔CRE as the most enriched motif occurring in promoters of genes involved in mRNA processing, cellular catabolic processes, and stress response, suggesting that a specific class of genes is regulated by this composite motif. PMID:23050235
BioWord: A sequence manipulation suite for Microsoft Word

PubMed Central

2012-01-01

Background The ability to manipulate, edit and process DNA and protein sequences has rapidly become a necessary skill for practicing biologists across a wide swath of disciplines. In spite of this, most everyday sequence manipulation tools are distributed across several programs and web servers, sometimes requiring installation and typically involving frequent switching between applications. To address this problem, here we have developed BioWord, a macro-enabled self-installing template for Microsoft Word documents that integrates an extensive suite of DNA and protein sequence manipulation tools. Results BioWord is distributed as a single macro-enabled template that self-installs with a single click. After installation, BioWord will open as a tab in the Office ribbon. Biologists can then easily manipulate DNA and protein sequences using a familiar interface and minimize the need to switch between applications. Beyond simple sequence manipulation, BioWord integrates functionality ranging from dyad search and consensus logos to motif discovery and pair-wise alignment. Written in Visual Basic for Applications (VBA) as an open source, object-oriented project, BioWord allows users with varying programming experience to expand and customize the program to better meet their own needs. Conclusions BioWord integrates a powerful set of tools for biological sequence manipulation within a handy, user-friendly tab in a widely used word processing software package. The use of a simple scripting language and an object-oriented scheme facilitates customization by users and provides a very accessible educational platform for introducing students to basic bioinformatics algorithms. PMID:22676326
BioWord: a sequence manipulation suite for Microsoft Word.

PubMed

Anzaldi, Laura J; Muñoz-Fernández, Daniel; Erill, Ivan

2012-06-07

The ability to manipulate, edit and process DNA and protein sequences has rapidly become a necessary skill for practicing biologists across a wide swath of disciplines. In spite of this, most everyday sequence manipulation tools are distributed across several programs and web servers, sometimes requiring installation and typically involving frequent switching between applications. To address this problem, here we have developed BioWord, a macro-enabled self-installing template for Microsoft Word documents that integrates an extensive suite of DNA and protein sequence manipulation tools. BioWord is distributed as a single macro-enabled template that self-installs with a single click. After installation, BioWord will open as a tab in the Office ribbon. Biologists can then easily manipulate DNA and protein sequences using a familiar interface and minimize the need to switch between applications. Beyond simple sequence manipulation, BioWord integrates functionality ranging from dyad search and consensus logos to motif discovery and pair-wise alignment. Written in Visual Basic for Applications (VBA) as an open source, object-oriented project, BioWord allows users with varying programming experience to expand and customize the program to better meet their own needs. BioWord integrates a powerful set of tools for biological sequence manipulation within a handy, user-friendly tab in a widely used word processing software package. The use of a simple scripting language and an object-oriented scheme facilitates customization by users and provides a very accessible educational platform for introducing students to basic bioinformatics algorithms.
Lentivector Integration Sites in Ependymal Cells From a Model of Metachromatic Leukodystrophy: Non-B DNA as a New Factor Influencing Integration

PubMed Central

McAllister, Robert G; Liu, Jiahui; Woods, Matthew W; Tom, Sean K; Rupar, C Anthony; Barr, Stephen D

2014-01-01

The blood–brain barrier controls the passage of molecules from the blood into the central nervous system (CNS) and is a major challenge for treatment of neurological diseases. Metachromatic leukodystrophy is a neurodegenerative lysosomal storage disease caused by loss of arylsulfatase A (ARSA) activity. Gene therapy via intraventricular injection of a lentiviral vector is a potential approach to rapidly and permanently deliver therapeutic levels of ARSA to the CNS. We present the distribution of integration sites of a lentiviral vector encoding human ARSA (LV-ARSA) in murine brain choroid plexus and ependymal cells, administered via a single intracranial injection into the CNS. LV-ARSA did not exhibit a strong preference for integration in or near actively transcribed genes, but exhibited a strong preference for integration in or near satellite DNA. We identified several genomic hotspots for LV-ARSA integration and identified a consensus target site sequence characterized by two G-quadruplex-forming motifs flanking the integration site. In addition, our analysis identified several other non-B DNA motifs as new factors that potentially influence lentivirus integration, including human immunodeficiency virus type-1 in human cells. Together, our data demonstrate a clinically favorable integration site profile in the murine brain and identify non-B DNA as a potential new host factor that influences lentiviral integration in murine and human cells. PMID:25158091
Functional Analysis of Light-harvesting-like Protein 3 (LIL3) and Its Light-harvesting Chlorophyll-binding Motif in Arabidopsis*

PubMed Central

Takahashi, Kaori; Takabayashi, Atsushi; Tanaka, Ayumi; Tanaka, Ryouichi

2014-01-01

The light-harvesting complex (LHC) constitutes the major light-harvesting antenna of photosynthetic eukaryotes. LHC contains a characteristic sequence motif, termed LHC motif, consisting of 25–30 mostly hydrophobic amino acids. This motif is shared by a number of transmembrane proteins from oxygenic photoautotrophs that are termed light-harvesting-like (LIL) proteins. To gain insights into the functions of LIL proteins and their LHC motifs, we functionally characterized a plant LIL protein, LIL3. This protein has been shown previously to stabilize geranylgeranyl reductase (GGR), a key enzyme in phytol biosynthesis. It is hypothesized that LIL3 functions to anchor GGR to membranes. First, we conjugated the transmembrane domain of LIL3 or that of ascorbate peroxidase to GGR and expressed these chimeric proteins in an Arabidopsis mutant lacking LIL3 protein. As a result, the transgenic plants restored phytol-synthesizing activity. These results indicate that GGR is active as long as it is anchored to membranes, even in the absence of LIL3. Subsequently, we addressed the question why the LHC motif is conserved in the LIL3 sequences. We modified the transmembrane domain of LIL3, which contains the LHC motif, by substituting its conserved amino acids (Glu-171, Asn-174, and Asp-189) with alanine. As a result, the Arabidopsis transgenic plants partly recovered the phytol-biosynthesizing activity. However, in these transgenic plants, the LIL3-GGR complexes were partially dissociated. Collectively, these results indicate that the LHC motif of LIL3 is involved in the complex formation of LIL3 and GGR, which might contribute to the GGR reaction. PMID:24275650
The MARVEL transmembrane motif of occludin mediates oligomerization and targeting to the basolateral surface in epithelia.

PubMed

Yaffe, Yakey; Shepshelovitch, Jeanne; Nevo-Yassaf, Inbar; Yeheskel, Adva; Shmerling, Hedva; Kwiatek, Joanna M; Gaus, Katharina; Pasmanik-Chor, Metsada; Hirschberg, Koret

2012-08-01

Occludin (Ocln), a MARVEL-motif-containing protein, is found in all tight junctions. MARVEL motifs are comprised of four transmembrane helices associated with the localization to or formation of diverse membrane subdomains by interacting with the proximal lipid environment. The functions of the Ocln MARVEL motif are unknown. Bioinformatics sequence- and structure-based analyses demonstrated that the MARVEL domain of Ocln family proteins has distinct evolutionarily conserved sequence features that are consistent with its basolateral membrane localization. Live-cell microscopy, fluorescence resonance energy transfer (FRET) and bimolecular fluorescence complementation (BiFC) were used to analyze the intracellular distribution and self-association of fluorescent-protein-tagged full-length human Ocln or the Ocln MARVEL motif excluding the cytosolic C- and N-termini (amino acids 60-269, FP-MARVEL-Ocln). FP-MARVEL-Ocln efficiently arrived at the plasma membrane (PM) and was sorted to the basolateral PM in filter-grown polarized MDCK cells. A series of conserved aromatic amino acids within the MARVEL domain were found to be associated with Ocln dimerization using BiFC. FP-MARVEL-Ocln inhibited membrane pore growth during Triton-X-100-induced solubilization and was shown to increase the membrane-ordered state using Laurdan, a lipid dye. These data demonstrate that the Ocln MARVEL domain mediates self-association and correct sorting to the basolateral membrane.
Quantitative statistical analysis of cis-regulatory sequences in ABA/VP1- and CBF/DREB1-regulated genes of Arabidopsis.

PubMed

Suzuki, Masaharu; Ketterling, Matthew G; McCarty, Donald R

2005-09-01

We have developed a simple quantitative computational approach for objective analysis of cis-regulatory sequences in promoters of coregulated genes. The program, designated MotifFinder, identifies oligo sequences that are overrepresented in promoters of coregulated genes. We used this approach to analyze promoter sequences of Viviparous1 (VP1)/abscisic acid (ABA)-regulated genes and cold-regulated genes, respectively, of Arabidopsis (Arabidopsis thaliana). We detected significantly enriched sequences in up-regulated genes but not in down-regulated genes. This result suggests that gene activation but not repression is mediated by specific and common sequence elements in promoters. The enriched motifs include several known cis-regulatory sequences as well as previously unidentified motifs. With respect to known cis-elements, we dissected the flanking nucleotides of the core sequences of Sph element, ABA response elements (ABREs), and the C repeat/dehydration-responsive element. This analysis identified the motif variants that may correlate with qualitative and quantitative differences in gene expression. While both VP1 and cold responses are mediated in part by ABA signaling via ABREs, these responses correlate with unique ABRE variants distinguished by nucleotides flanking the ACGT core. ABRE and Sph motifs are tightly associated uniquely in the coregulated set of genes showing a strict dependence on VP1 and ABA signaling. Finally, analysis of distribution of the enriched sequences revealed a striking concentration of enriched motifs in a proximal 200-base region of VP1/ABA and cold-regulated promoters. Overall, each class of coregulated genes possesses a discrete set of the enriched motifs with unique distributions in their promoters that may account for the specificity of gene regulation.
Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins

PubMed Central

Foulk, Michael S.; Urban, John M.; Casella, Cinzia; Gerbi, Susan A.

2015-01-01

Nascent strand sequencing (NS-seq) is used to discover DNA replication origins genome-wide, allowing identification of features for their specification. NS-seq depends on the ability of lambda exonuclease (λ-exo) to efficiently digest parental DNA while leaving RNA-primer protected nascent strands intact. We used genomics and biochemical approaches to determine if λ-exo digests all parental DNA sequences equally. We report that λ-exo does not efficiently digest G-quadruplex (G4) structures in a plasmid. Moreover, λ-exo digestion of nonreplicating genomic DNA (LexoG0) enriches GC-rich DNA and G4 motifs genome-wide. We used LexoG0 data to control for nascent strand–independent λ-exo biases in NS-seq and validated this approach at the rDNA locus. The λ-exo–controlled NS-seq peaks are not GC-rich, and only 35.5% overlap with 6.8% of all G4s, suggesting that G4s are not general determinants for origin specification but may play a role for a subset. Interestingly, we observed a periodic spacing of G4 motifs and nucleosomes around the peak summits, suggesting that G4s may position nucleosomes at this subset of origins. Finally, we demonstrate that use of Na+ instead of K+ in the λ-exo digestion buffer reduced the effect of G4s on λ-exo digestion and discuss ways to increase both the sensitivity and specificity of NS-seq. PMID:25695952
Effective Feature Selection for Classification of Promoter Sequences.

PubMed

K, Kouser; P G, Lavanya; Rangarajan, Lalitha; K, Acharya Kshitish

2016-01-01

Exploring novel computational methods in making sense of biological data has not only been a necessity, but also productive. A part of this trend is the search for more efficient in silico methods/tools for analysis of promoters, which are parts of DNA sequences that are involved in regulation of expression of genes into other functional molecules. Promoter regions vary greatly in their function based on the sequence of nucleotides and the arrangement of protein-binding short-regions called motifs. In fact, the regulatory nature of the promoters seems to be largely driven by the selective presence and/or the arrangement of these motifs. Here, we explore computational classification of promoter sequences based on the pattern of motif distributions, as such classification can pave a new way of functional analysis of promoters and to discover the functionally crucial motifs. We make use of Position Specific Motif Matrix (PSMM) features for exploring the possibility of accurately classifying promoter sequences using some of the popular classification techniques. The classification results on the complete feature set are low, perhaps due to the huge number of features. We propose two ways of reducing features. Our test results show improvement in the classification output after the reduction of features. The results also show that decision trees outperform SVM (Support Vector Machine), KNN (K Nearest Neighbor) and ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform some of the popular feature transformation methods such as PCA and SVD. Also, the methods proposed are as accurate as MRMR (feature selection method) but much faster than MRMR. Such methods could be useful to categorize new promoters and explore regulatory mechanisms of gene expressions in complex eukaryotic species.
The LINKS motif zippers trans-acyltransferase polyketide synthase assembly lines into a biosynthetic megacomplex.

PubMed

Gay, Darren C; Wagner, Drew T; Meinke, Jessica L; Zogzas, Charles E; Gay, Glen R; Keatinge-Clay, Adrian T

2016-03-01

Polyketides such as the clinically-valuable antibacterial agent mupirocin are constructed by architecturally-sophisticated assembly lines known as trans-acyltransferase polyketide synthases. Organelle-sized megacomplexes composed of several copies of trans-acyltransferase polyketide synthase assembly lines have been observed by others through transmission electron microscopy to be located at the Bacillus subtilis plasma membrane, where the synthesis and export of the antibacterial polyketide bacillaene takes place. In this work we analyze ten crystal structures of trans-acyltransferase polyketide synthases ketosynthase domains, seven of which are reported here for the first time, to characterize a motif capable of zippering assembly lines into a megacomplex. While each of the three-helix LINKS (Laterally-INteracting Ketosynthase Sequence) motifs is observed to similarly dock with a spatially-reversed copy of itself through hydrophobic and ionic interactions, the amino acid sequences of this motif are not conserved. Such a code is appropriate for mediating homotypic contacts between assembly lines to ensure the ordered self-assembly of a noncovalent, yet tightly-knit, enzymatic network. LINKS-mediated lateral interactions would also have the effect of bolstering the vertical association of the polypeptides that comprise a polyketide synthase assembly line. Copyright © 2015 Elsevier Inc. All rights reserved.
Golden Ratio Versus Pi as Random Sequence Sources for Monte Carlo Integration

NASA Technical Reports Server (NTRS)

Sen, S. K.; Agarwal, Ravi P.; Shaykhian, Gholam Ali

2007-01-01

We discuss here the relative merits of these numbers as possible random sequence sources. The quality of these sequences is not judged directly based on the outcome of all known tests for the randomness of a sequence. Instead, it is determined implicitly by the accuracy of the Monte Carlo integration in a statistical sense. Since our main motive of using a random sequence is to solve real world problems, it is more desirable if we compare the quality of the sequences based on their performances for these problems in terms of quality/accuracy of the output. We also compare these sources against those generated by a popular pseudo-random generator, viz., the Matlab rand and the quasi-random generator ha/ton both in terms of error and time complexity. Our study demonstrates that consecutive blocks of digits of each of these numbers produce a good random sequence source. It is observed that randomly chosen blocks of digits do not have any remarkable advantage over consecutive blocks for the accuracy of the Monte Carlo integration. Also, it reveals that pi is a better source of a random sequence than theta when the accuracy of the integration is concerned.
Deletion of transcription factor binding motifs using the CRISPR/spCas9 system in the β-globin LCR.

PubMed

Kim, Yea Woon; Kim, AeRi

2017-07-20

Transcription factors play roles in gene transcription through direct binding to their motifs in genome, and inhibiting this binding provides an effective strategy for studying their roles. Here we applied the CRISPR/spCas9 system to mutate the binding motifs of transcription factors. Binding motifs for erythroid specific transcription factors were mutated in the locus control region hypersensitive sites of the human β-globin locus. Guide RNAs targeting binding motifs were cloned into lentiviral CRISPR vector containing the spCas9 gene, and transduced into MEL/ch11 cells carrying a human chromosome 11. DNA mutations in clonal cells were initially screened by quantitative PCR in genomic DNA and then clarified by sequencing. Mutations in binding motifs reduced occupancy by transcription factors in a chromatin environment. Characterization of mutations revealed that the CRISPR/spCas9 system mainly induced deletions in short regions of <20 bp and preferentially deleted nucleotides around the fifth nucleotide upstream of Protospacer adjacent motifs. These results indicate that the CRISPR/Cas9 system is suitable for mutating the binding motifs of transcription factors, and, consequently, would contribute to elucidate the direct roles of transcription factors. ©2017 The Author(s).
Sequence-specific DNA binding activity of the cross-brace zinc finger motif of the piggyBac transposase

PubMed Central

Morellet, Nelly; Li, Xianghong; Wieninger, Silke A; Taylor, Jennifer L; Bischerour, Julien; Moriau, Séverine; Lescop, Ewen; Bardiaux, Benjamin; Mathy, Nathalie; Assrir, Nadine; Bétermier, Mireille; Nilges, Michael; Hickman, Alison B; Dyda, Fred; Craig, Nancy L; Guittet, Eric

2018-01-01

Abstract The piggyBac transposase (PB) is distinguished by its activity and utility in genome engineering, especially in humans where it has highly promising therapeutic potential. Little is known, however, about the structure–function relationships of the different domains of PB. Here, we demonstrate in vitro and in vivo that its C-terminal Cysteine-Rich Domain (CRD) is essential for DNA breakage, joining and transposition and that it binds to specific DNA sequences in the left and right transposon ends, and to an additional unexpectedly internal site at the left end. Using NMR, we show that the CRD adopts the specific fold of the cross-brace zinc finger protein family. We determine the interaction interfaces between the CRD and its target, the 5′-TGCGT-3′/3′-ACGCA-5′ motifs found in the left, left internal and right transposon ends, and use NMR results to propose docking models for the complex, which are consistent with our site-directed mutagenesis data. Our results provide support for a model of the PB/DNA interactions in the context of the transpososome, which will be useful for the rational design of PB mutants with increased activity. PMID:29385532

CombiMotif: A new algorithm for network motifs discovery in protein-protein interaction networks

NASA Astrophysics Data System (ADS)

Luo, Jiawei; Li, Guanghui; Song, Dan; Liang, Cheng

2014-12-01

Discovering motifs in protein-protein interaction networks is becoming a current major challenge in computational biology, since the distribution of the number of network motifs can reveal significant systemic differences among species. However, this task can be computationally expensive because of the involvement of graph isomorphic detection. In this paper, we present a new algorithm (CombiMotif) that incorporates combinatorial techniques to count non-induced occurrences of subgraph topologies in the form of trees. The efficiency of our algorithm is demonstrated by comparing the obtained results with the current state-of-the art subgraph counting algorithms. We also show major differences between unicellular and multicellular organisms. The datasets and source code of CombiMotif are freely available upon request.
Characterization of tannase protein sequences of bacteria and fungi: an in silico study.

PubMed

Banerjee, Amrita; Jana, Arijit; Pati, Bikash R; Mondal, Keshab C; Das Mohapatra, Pradeep K

2012-04-01

The tannase protein sequences of 149 bacteria and 36 fungi were retrieved from NCBI database. Among them only 77 bacterial and 31 fungal tannase sequences were taken which have different amino acid compositions. These sequences were analysed for different physical and chemical properties, superfamily search, multiple sequence alignment, phylogenetic tree construction and motif finding to find out the functional motif and the evolutionary relationship among them. The superfamily search for these tannase exposed the occurrence of proline iminopeptidase-like, biotin biosynthesis protein BioH, O-acetyltransferase, carboxylesterase/thioesterase 1, carbon-carbon bond hydrolase, haloperoxidase, prolyl oligopeptidase, C-terminal domain and mycobacterial antigens families and alpha/beta hydrolase superfamily. Some bacterial and fungal sequence showed similarity with different families individually. The multiple sequence alignment of these tannase protein sequences showed conserved regions at different stretches with maximum homology from amino acid residues 389-469 and 482-523 which could be used for designing degenerate primers or probes specific for tannase producing bacterial and fungal species. Phylogenetic tree showed two different clusters; one has only bacteria and another have both fungi and bacteria showing some relationship between these different genera. Although in second cluster near about all fungal species were found together in a corner which indicates the sequence level similarity among fungal genera. The distributions of fourteen motifs analysis revealed Motif 1 with a signature amino acid sequence of 29 amino acids, i.e. GCSTGGREALKQAQRWPHDYDGIIANNPA, was uniformly observed in 83.3 % of studied tannase sequences representing its participation with the structure and enzymatic function.
Structural and functional analysis of the GABARAP interaction motif (GIM)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rogov, Vladimir V.; Stolz, Alexandra; Ravichandran, Arvind C.

Through the canonical LC3 interaction motif (LIR), [W/F/Y]–X 1–X 2[I/L/V], protein complexes are recruited to autophagosomes to perform their functions as either autophagy adaptors or receptors. How these adaptors/receptors selectively interact with either LC3 or GABARAP families remains unclear. Herein, we determine the range of selectivity of 30 known core LIR motifs towards individual LC3s and GABARAPs. From these, we define a GABARAP Interaction Motif (GIM) sequence ([W/F]–[V/I]–X 2–V) that the adaptor protein PLEKHM1 tightly conforms to. Using biophysical and structural approaches, we show that the PLEKHM1–LIR is indeed 11–fold more specific for GABARAP than LC3B. Selective mutation of themore » X 1 and X 2 positions either completely abolished the interaction with all LC3 and GABARAPs or increased PLEKHM1–GIM selectivity 20–fold towards LC3B. Finally, we show that conversion of p62/SQSTM1, FUNDC1 and FIP200 LIRs into our newly defined GIM, by introducing two valine residues, enhances their interaction with endogenous GABARAP over LC3B. In conclusion, the identification of a GABARAP–specific interaction motif will aid the identification and characterization of the expanding array of autophagy receptor and adaptor proteins and their in vivo functions.« less
Structural and functional analysis of the GABARAP interaction motif (GIM)

DOE PAGES

Rogov, Vladimir V.; Stolz, Alexandra; Ravichandran, Arvind C.; ...

2017-06-27

Through the canonical LC3 interaction motif (LIR), [W/F/Y]–X 1–X 2[I/L/V], protein complexes are recruited to autophagosomes to perform their functions as either autophagy adaptors or receptors. How these adaptors/receptors selectively interact with either LC3 or GABARAP families remains unclear. Herein, we determine the range of selectivity of 30 known core LIR motifs towards individual LC3s and GABARAPs. From these, we define a GABARAP Interaction Motif (GIM) sequence ([W/F]–[V/I]–X 2–V) that the adaptor protein PLEKHM1 tightly conforms to. Using biophysical and structural approaches, we show that the PLEKHM1–LIR is indeed 11–fold more specific for GABARAP than LC3B. Selective mutation of themore » X 1 and X 2 positions either completely abolished the interaction with all LC3 and GABARAPs or increased PLEKHM1–GIM selectivity 20–fold towards LC3B. Finally, we show that conversion of p62/SQSTM1, FUNDC1 and FIP200 LIRs into our newly defined GIM, by introducing two valine residues, enhances their interaction with endogenous GABARAP over LC3B. In conclusion, the identification of a GABARAP–specific interaction motif will aid the identification and characterization of the expanding array of autophagy receptor and adaptor proteins and their in vivo functions.« less
Detection and Preliminary Analysis of Motifs in Promoters of Anaerobically Induced Genes of Different Plant Species

PubMed Central

MOHANTY, BIJAYALAXMI; KRISHNAN, S. P. T.; SWARUP, SANJAY; BAJIC, VLADIMIR B.

2005-01-01

• Background and Aims Plants can suffer from oxygen limitation during flooding or more complete submergence and may therefore switch from Kreb's cycle respiration to fermentation in association with the expression of anaerobically inducible genes coding for enzymes involved in glycolysis and fermentation. The aim of this study was to clarify mechanisms of transcriptional regulation of these anaerobic genes by identifying motifs shared by their promoter regions. • Methods Statistically significant motifs were detected by an in silico method from 13 promoters of anaerobic genes. The selected motifs were common for the majority of analysed promoters. Their significance was evaluated by searching for their presence in transcription factor-binding site databases (TRANSFAC, PlantCARE and PLACE). Using several negative control data sets, it was tested whether the motifs found were specific to the anaerobic group. • Key Results Previously, anaerobic response elements have been identified in maize (Zea mays) and arabidopsis (Arabidopsis thaliana) genes. Known functional motifs were detected, such as GT and GC motifs, but also other motifs shared by most of the genes examined. Five motifs detected have not been found in plants hitherto but are present in the promoters of animal genes with various functions. The consensus sequences of these novel motifs are 5′-AAACAAA-3′, 5′-AGCAGC-3′, 5′-TCATCAC-3′, 5′-GTTT(A/C/T)GCAA-3′ and 5′-TTCCCTGTT-3′. • Conclusions It is believed that the promoter motifs identified could be functional by conferring anaerobic sensitivity to the genes that possess them. This proposal now requires experimental verification. PMID:16027132
PhyloGibbs-MP: Module Prediction and Discriminative Motif-Finding by Gibbs Sampling

PubMed Central

Siddharthan, Rahul

2008-01-01

PhyloGibbs, our recent Gibbs-sampling motif-finder, takes phylogeny into account in detecting binding sites for transcription factors in DNA and assigns posterior probabilities to its predictions obtained by sampling the entire configuration space. Here, in an extension called PhyloGibbs-MP, we widen the scope of the program, addressing two major problems in computational regulatory genomics. First, PhyloGibbs-MP can localise predictions to small, undetermined regions of a large input sequence, thus effectively predicting cis-regulatory modules (CRMs) ab initio while simultaneously predicting binding sites in those modules—tasks that are usually done by two separate programs. PhyloGibbs-MP's performance at such ab initio CRM prediction is comparable with or superior to dedicated module-prediction software that use prior knowledge of previously characterised transcription factors. Second, PhyloGibbs-MP can predict motifs that differentiate between two (or more) different groups of regulatory regions, that is, motifs that occur preferentially in one group over the others. While other “discriminative motif-finders” have been published in the literature, PhyloGibbs-MP's implementation has some unique features and flexibility. Benchmarks on synthetic and actual genomic data show that this algorithm is successful at enhancing predictions of differentiating sites and suppressing predictions of common sites and compares with or outperforms other discriminative motif-finders on actual genomic data. Additional enhancements include significant performance and speed improvements, the ability to use “informative priors” on known transcription factors, and the ability to output annotations in a format that can be visualised with the Generic Genome Browser. In stand-alone motif-finding, PhyloGibbs-MP remains competitive, outperforming PhyloGibbs-1.0 and other programs on benchmark data. PMID:18769735
Designing synthetic RNAs to determine the relevance of structural motifs in picornavirus IRES elements

NASA Astrophysics Data System (ADS)

Fernandez-Chamorro, Javier; Lozano, Gloria; Garcia-Martin, Juan Antonio; Ramajo, Jorge; Dotu, Ivan; Clote, Peter; Martinez-Salas, Encarnacion

2016-04-01

The function of Internal Ribosome Entry Site (IRES) elements is intimately linked to their RNA structure. Viral IRES elements are organized in modular domains consisting of one or more stem-loops that harbor conserved RNA motifs critical for internal initiation of translation. A conserved motif is the pyrimidine-tract located upstream of the functional initiation codon in type I and II picornavirus IRES. By computationally designing synthetic RNAs to fold into a structure that sequesters the polypyrimidine tract in a hairpin, we establish a correlation between predicted inaccessibility of the pyrimidine tract and IRES activity, as determined in both in vitro and in vivo systems. Our data supports the hypothesis that structural sequestration of the pyrimidine-tract within a stable hairpin inactivates IRES activity, since the stronger the stability of the hairpin the higher the inhibition of protein synthesis. Destabilization of the stem-loop immediately upstream of the pyrimidine-tract also decreases IRES activity. Our work introduces a hybrid computational/experimental method to determine the importance of structural motifs for biological function. Specifically, we show the feasibility of using the software RNAiFold to design synthetic RNAs with particular sequence and structural motifs that permit subsequent experimental determination of the importance of such motifs for biological function.
The C-terminal portion of the cleaved HT motif is necessary and sufficient to mediate export of proteins from the malaria parasite into its host cell

PubMed Central

Tarr, Sarah J; Cryar, Adam; Thalassinos, Konstantinos; Haldar, Kasturi; Osborne, Andrew R

2013-01-01

The malaria parasite exports proteins across its plasma membrane and a surrounding parasitophorous vacuole membrane, into its host erythrocyte. Most exported proteins contain a Host Targeting motif (HT motif) that targets them for export. In the parasite secretory pathway, the HT motif is cleaved by the protease plasmepsin V, but the role of the newly generated N-terminal sequence in protein export is unclear. Using a model protein that is cleaved by an exogenous viral protease, we show that the new N-terminal sequence, normally generated by plasmepsin V cleavage, is sufficient to target a protein for export, and that cleavage by plasmepsin V is not coupled directly to the transfer of a protein to the next component in the export pathway. Mutation of the fourth and fifth positions of the HT motif, as well as amino acids further downstream, block or affect the efficiency of protein export indicating that this region is necessary for efficient export. We also show that the fifth position of the HT motif is important for plasmepsin V cleavage. Our results indicate that plasmepsin V cleavage is required to generate a new N-terminal sequence that is necessary and sufficient to mediate protein export by the malaria parasite. PMID:23279267
Molecular Signaling Network Motifs Provide a Mechanistic Basis for Cellular Threshold Responses

PubMed Central

Bhattacharya, Sudin; Conolly, Rory B.; Clewell, Harvey J.; Kaminski, Norbert E.; Andersen, Melvin E.

2014-01-01

Background: Increasingly, there is a move toward using in vitro toxicity testing to assess human health risk due to chemical exposure. As with in vivo toxicity testing, an important question for in vitro results is whether there are thresholds for adverse cellular responses. Empirical evaluations may show consistency with thresholds, but the main evidence has to come from mechanistic considerations. Objectives: Cellular response behaviors depend on the molecular pathway and circuitry in the cell and the manner in which chemicals perturb these circuits. Understanding circuit structures that are inherently capable of resisting small perturbations and producing threshold responses is an important step towards mechanistically interpreting in vitro testing data. Methods: Here we have examined dose–response characteristics for several biochemical network motifs. These network motifs are basic building blocks of molecular circuits underpinning a variety of cellular functions, including adaptation, homeostasis, proliferation, differentiation, and apoptosis. For each motif, we present biological examples and models to illustrate how thresholds arise from specific network structures. Discussion and Conclusion: Integral feedback, feedforward, and transcritical bifurcation motifs can generate thresholds. Other motifs (e.g., proportional feedback and ultrasensitivity)produce responses where the slope in the low-dose region is small and stays close to the baseline. Feedforward control may lead to nonmonotonic or hormetic responses. We conclude that network motifs provide a basis for understanding thresholds for cellular responses. Computational pathway modeling of these motifs and their combinations occurring in molecular signaling networks will be a key element in new risk assessment approaches based on in vitro cellular assays. Citation: Zhang Q, Bhattacharya S, Conolly RB, Clewell HJ III, Kaminski NE, Andersen ME. 2014. Molecular signaling network motifs provide a
Unusual occurrence of a DAG motif in the Ipomovirus Cassava brown streak virus and implications for its vector transmission.

PubMed

Ateka, Elijah; Alicai, Titus; Ndunguru, Joseph; Tairo, Fred; Sseruwagi, Peter; Kiarie, Samuel; Makori, Timothy; Kehoe, Monica A; Boykin, Laura M

2017-01-01

Cassava is the main staple food for over 800 million people globally. Its production in eastern Africa is being constrained by two devastating Ipomoviruses that cause cassava brown streak disease (CBSD); Cassava brown streak virus (CBSV) and Ugandan cassava brown streak virus (UCBSV), with up to 100% yield loss for smallholder farmers in the region. To date, vector studies have not resulted in reproducible and highly efficient transmission of CBSV and UCBSV. Most virus transmission studies have used Bemisia tabaci (whitefly), but a maximum of 41% U/CBSV transmission efficiency has been documented for this vector. With the advent of next generation sequencing, researchers are generating whole genome sequences for both CBSV and UCBSV from throughout eastern Africa. Our initial goal for this study was to characterize U/CBSV whole genomes from CBSD symptomatic cassava plants sampled in Kenya. We have generated 8 new whole genomes (3 CBSV and 5 UCBSV) from Kenya, and in the process of analyzing these genomes together with 26 previously published sequences, we uncovered the aphid transmission associated DAG motif within coat protein genes of all CBSV whole genomes at amino acid positions 52-54, but not in UCBSV. Upon further investigation, the DAG motif was also found at the same positions in two other Ipomoviruses: Squash vein yellowing virus (SqVYV), Coccinia mottle virus (CocMoV). Until this study, the highly-conserved DAG motif, which is associated with aphid transmission was only noticed once, in SqVYV but discounted as being of minimal importance. This study represents the first comprehensive look at Ipomovirus genomes to determine the extent of DAG motif presence and significance for vector relations. The presence of this motif suggests that aphids could potentially be a vector of CBSV, SqVYV and CocMov. Further transmission and ipomoviral protein evolutionary studies are needed to confirm this hypothesis.
Solution structure and base pair opening kinetics of the i-motif dimer of d(5mCCTTTACC): a noncanonical structure with possible roles in chromosome stability.

PubMed

Nonin, S; Phan, A T; Leroy, J L

1997-09-15

Repetitive cytosine-rich DNA sequences have been identified in telomeres and centromeres of eukaryotic chromosomes. These sequences play a role in maintaining chromosome stability during replication and may be involved in chromosome pairing during meiosis. The C-rich repeats can fold into an 'i-motif' structure, in which two parallel-stranded duplexes with hemiprotonated C.C+ pairs are intercalated. Previous NMR studies of naturally occurring repeats have produced poor NMR spectra. This led us to investigate oligonucleotides, based on natural sequences, to produce higher quality spectra and thus provide further information as to the structure and possible biological function of the i-motif. NMR spectroscopy has shown that d(5mCCTTTACC) forms an i-motif dimer of symmetry-related and intercalated folded strands. The high-definition structure is computed on the basis of the build-up rates of 29 intraresidue and 35 interresidue nuclear Overhauser effect (NOE) connectivities. The i-motif core includes intercalated interstrand C.C+ pairs stacked in the order 2*.8/1.7*/1*.7/2.8* (where one strand is distinguished by an asterisk and the numbers relate to the base positions within the repeat). The TTTA sequences form two loops which span the two wide grooves on opposite sides of the i-motif core; the i-motif core is extended at both ends by the stacking of A6 onto C2.C8+. The lifetimes of pairs C2.C8+ and 5mC1.C7+ are 1 ms and 1 s, respectively, at 15 degrees C. Anomalous exchange properties of the T3 imino proton indicate hydrogen bonding to A6 N7 via a water bridge. The d(5mCCTTTTCC) deoxyoligonucleotide, in which position 6 is occupied by a thymidine instead of an adenine, also forms a symmetric i-motif dimer. However, in this structure the two TTTT loops are located on the same side of the i-motif core and the C.C+ pairs are formed by equivalent cytidines stacked in the order 8*.8/1.1*/7*.7/2.2*. Oligodeoxynucleotides containing two C-rich repeats can fold and dimerize
The combinatorial PP1-binding consensus Motif (R/K)x( (0,1))V/IxFxx(R/K)x(R/K) is a new apoptotic signature.

PubMed

Godet, Angélique N; Guergnon, Julien; Maire, Virginie; Croset, Amélie; Garcia, Alphonse

2010-04-01

Previous studies established that PP1 is a target for Bcl-2 proteins and an important regulator of apoptosis. The two distinct functional PP1 consensus docking motifs, R/Kx((0,1))V/IxF and FxxR/KxR/K, involved in PP1 binding and cell death were previously characterized in the BH1 and BH3 domains of some Bcl-2 proteins. In this study, we demonstrate that DPT-AIF(1), a peptide containing the AIF(562-571) sequence located in a c-terminal domain of AIF, is a new PP1 interacting and cell penetrating molecule. We also showed that DPT-AIF(1) provoked apoptosis in several human cell lines. Furthermore, DPT-APAF(1) a bi-partite cell penetrating peptide containing APAF-1(122-131), a non penetrating sequence from APAF-1 protein, linked to our previously described DPT-sh1 peptide shuttle, is also a PP1-interacting death molecule. Both AIF(562-571) and APAF-1(122-131) sequences contain a common R/Kx((0,1))V/IxFxxR/KxR/K motif, shared by several proteins involved in control of cell survival pathways. This motif combines the two distinct PP1c consensus docking motifs initially identified in some Bcl-2 proteins. Interestingly DPT-AIF(2) and DPT-APAF(2) that carry a F to A mutation within this combinatorial motif, no longer exhibited any PP1c binding or apoptotic effects. Moreover the F to A mutation in DPT-AIF(2) also suppressed cell penetration. These results indicate that the combinatorial PP1c docking motif R/Kx((0,1))V/IxFxxR/KxR/K, deduced from AIF(562-571) and APAF-1(122-131) sequences, is a new PP1c-dependent Apoptotic Signature. This motif is also a new tool for drug design that could be used to characterize potential anti-tumour molecules.
The Regulatory Factor ZFHX3 Modifies Circadian Function in SCN via an AT Motif-Driven Axis

PubMed Central

Parsons, Michael J.; Brancaccio, Marco; Sethi, Siddharth; Maywood, Elizabeth S.; Satija, Rahul; Edwards, Jessica K.; Jagannath, Aarti; Couch, Yvonne; Finelli, Mattéa J.; Smyllie, Nicola J.; Esapa, Christopher; Butler, Rachel; Barnard, Alun R.; Chesham, Johanna E.; Saito, Shoko; Joynson, Greg; Wells, Sara; Foster, Russell G.; Oliver, Peter L.; Simon, Michelle M.; Mallon, Ann-Marie; Hastings, Michael H.; Nolan, Patrick M.

2015-01-01

Summary We identified a dominant missense mutation in the SCN transcription factor Zfhx3, termed short circuit (Zfhx3Sci), which accelerates circadian locomotor rhythms in mice. ZFHX3 regulates transcription via direct interaction with predicted AT motifs in target genes. The mutant protein has a decreased ability to activate consensus AT motifs in vitro. Using RNA sequencing, we found minimal effects on core clock genes in Zfhx3Sci/+ SCN, whereas the expression of neuropeptides critical for SCN intercellular signaling was significantly disturbed. Moreover, mutant ZFHX3 had a decreased ability to activate AT motifs in the promoters of these neuropeptide genes. Lentiviral transduction of SCN slices showed that the ZFHX3-mediated activation of AT motifs is circadian, with decreased amplitude and robustness of these oscillations in Zfhx3Sci/+ SCN slices. In conclusion, by cloning Zfhx3Sci, we have uncovered a circadian transcriptional axis that determines the period and robustness of behavioral and SCN molecular rhythms. PMID:26232227
Distinct cagA EPIYA motifs are associated with ethnic diversity in Malaysia and Singapore.

PubMed

Schmidt, Heather-Marie A; Goh, Khean-Lee; Fock, Kwong Ming; Hilmi, Ida; Dhamodaran, Subbiah; Forman, David; Mitchell, Hazel

2009-08-01

In vitro studies have shown that the biologic activity of CagA is influenced by the number and class of EPIYA motifs present in its variable region as these motifs correspond to the CagA phosphorylation sites. It has been hypothesized that strains possessing specific combinations of these motifs may be responsible for gastric cancer development. This study investigated the prevalence of cagA and the EPIYA motifs with regard to number, class, and patterns in strains from the three major ethnic groups within the Malaysian and Singaporean populations in relation to disease development. Helicobacter pylori isolates from 49 Chinese, 43 Indian, and 14 Malay patients with functional dyspepsia (FD) and 21 gastric cancer (GC) cases were analyzed using polymerase chain reaction for the presence of cagA and the number, type, and pattern of EPIYA motifs. Additionally, the EPIYA motifs of 47 isolates were sequenced. All 126 isolates possessed cagA, with the majority encoding EPIYA-A (97.6%) and all encoding EPIYA-B. However, while the cagA of 93.0% of Indian FD isolates encoded EPIYA-C as the third motif, 91.8% of Chinese FD isolates and 81.7% of Chinese GC isolates encoded EPIYA-D (p < .001). Of Malay FD isolates, 61.5% and 38.5% possessed EPIYA-C and EPIYA-D, respectively. The majority of isolates possessed three EPIYA motifs; however, Indian isolates were significantly more likely to have four or more (p < .05). Although, H. pylori strains with distinct cagA-types are circulating within the primary ethnic groups resident in Malaysia and Singapore, these genotypes appear unassociated with the development of GC in the ethnic Chinese population. The phenomenon of distinct strains circulating within different ethnic groups, in combination with host and certain environmental factors, may help to explain the rates of GC development in Malaysia.
Placing a Disrupted Degradation Motif at the C Terminus of Proteasome Substrates Attenuates Degradation without Impairing Ubiquitylation*

PubMed Central

Alfassy, Omri S.; Cohen, Itamar; Reiss, Yuval; Tirosh, Boaz; Ravid, Tommer

2013-01-01

Protein elimination by the ubiquitin-proteasome system requires the presence of a cis-acting degradation signal. Efforts to discern degradation signals of misfolded proteasome substrates thus far revealed a general mechanism whereby the exposure of cryptic hydrophobic motifs provides a degradation determinant. We have previously characterized such a determinant, employing the yeast kinetochore protein Ndc10 as a model substrate. Ndc10 is essentially a stable protein that is rapidly degraded upon exposure of a hydrophobic motif located at the C-terminal region. The degradation motif comprises two distinct and essential elements: DegA, encompassing two amphipathic helices, and DegB, a hydrophobic sequence within the loosely structured C-terminal tail of Ndc10. Here we show that the hydrophobic nature of DegB is irrelevant for the ubiquitylation of substrates containing the Ndc10 degradation motif, but is essential for proteasomal degradation. Mutant DegB, in which the hydrophobic sequence was disrupted, acted as a dominant degradation inhibitory element when expressed at the C-terminal regions of ubiquitin-dependent and -independent substrates of the 26S proteasome. This mutant stabilized substrates in both yeast and mammalian cells, indicative of a modular recognition moiety. The dominant function of the mutant DegB provides a powerful experimental tool for evaluating the physiological implications of stabilization of specific proteasome substrates in intact cells and for studying the associated pathological effects. PMID:23519465
The heptanucleotide motif GAGACGC is a key component of a cis-acting promoter element that is critical for SnSAG1 expression in Sarcocystis neurona.

PubMed

Gaji, Rajshekhar Y; Howe, Daniel K

2009-07-01

The apicomplexan parasite Sarcocystis neurona undergoes a complex process of intracellular development, during which many genes are temporally regulated. The described study was undertaken to begin identifying the basic promoter elements that control gene expression in S. neurona. Sequence analysis of the 5'-flanking region of five S. neurona genes revealed a conserved heptanucleotide motif GAGACGC that is similar to the WGAGACG motif described upstream of multiple genes in Toxoplasma gondii. The promoter region for the major surface antigen gene SnSAG1, which contains three heptanucleotide motifs within 135 bases of the transcription start site, was dissected by functional analysis using a dual luciferase reporter assay. These analyses revealed that a minimal promoter fragment containing all three motifs was sufficient to drive reporter molecule expression, with the presence and orientation of the 5'-most heptanucleotide motif being absolutely critical for promoter function. Further studies should help to identify additional sequence elements important for promoter function and for controlling gene expression during intracellular development by this apicomplexan pathogen.
Fossil rhabdoviral sequences integrated into arthropod genomes: ontogeny, evolution, and potential functionality.

PubMed

Fort, Philippe; Albertini, Aurélie; Van-Hua, Aurélie; Berthomieu, Arnaud; Roche, Stéphane; Delsuc, Frédéric; Pasteur, Nicole; Capy, Pierre; Gaudin, Yves; Weill, Mylène

2012-01-01

Retroelements represent a considerable fraction of many eukaryotic genomes and are considered major drives for adaptive genetic innovations. Recent discoveries showed that despite not normally using DNA intermediates like retroviruses do, Mononegaviruses (i.e., viruses with nonsegmented, negative-sense RNA genomes) can integrate gene fragments into the genomes of their hosts. This was shown for Bornaviridae and Filoviridae, the sequences of which have been found integrated into the germ line cells of many vertebrate hosts. Here, we show that Rhabdoviridae sequences, the major Mononegavirales family, have integrated only into the genomes of arthropod species. We identified 185 integrated rhabdoviral elements (IREs) coding for nucleoproteins, glycoproteins, or RNA-dependent RNA polymerases; they were mostly found in the genomes of the mosquito Aedes aegypti and the blacklegged tick Ixodes scapularis. Phylogenetic analyses showed that most IREs in A. aegypti derived from multiple independent integration events. Since RNA viruses are submitted to much higher substitution rates as compared with their hosts, IREs thus represent fossil traces of the diversity of extinct Rhabdoviruses. Furthermore, analyses of orthologous IREs in A. aegypti field mosquitoes sampled worldwide identified an integrated polymerase IRE fragment that appeared under purifying selection within several million years, which supports a functional role in the host's biology. These results show that A. aegypti was subjected to repeated Rhabdovirus infectious episodes during its evolution history, which led to the accumulation of many integrated sequences. They also suggest that like retroviruses, integrated rhabdoviral sequences may participate actively in the evolution of their hosts.
Systematic analysis of phosphotyrosine antibodies recognizing single phosphorylated EPIYA-motifs in CagA of East Asian-type Helicobacter pylori strains.

PubMed

Lind, Judith; Backert, Steffen; Hoffmann, Rebecca; Eichler, Jutta; Yamaoka, Yoshio; Perez-Perez, Guillermo I; Torres, Javier; Sticht, Heinrich; Tegtmeyer, Nicole

2016-09-02

Highly virulent strains of the gastric pathogen Helicobacter pylori encode a type IV secretion system (T4SS) that delivers the effector protein CagA into gastric epithelial cells. Translocated CagA undergoes tyrosine phosphorylation by members of the oncogenic c-Src and c-Abl host kinases at EPIYA-sequence motifs A, B and D in East Asian-type strains. These phosphorylated EPIYA-motifs serve as recognition sites for various SH2-domains containing human proteins, mediating interactions of CagA with host signaling factors to manipulate signal transduction pathways. Recognition of phospho-CagA is mainly based on the use of commercial pan-phosphotyrosine antibodies that were originally designed to detect phosphotyrosines in mammalian proteins. Specific anti-phospho-EPIYA antibodies for each of the three sites in CagA are not forthcoming. This study was designed to systematically analyze the detection preferences of each phosphorylated East Asian CagA EPIYA-motif by pan-phosphotyrosine antibodies and to determine a minimal recognition sequence. We synthesized phospho- and non-phosphopeptides derived from each predominant EPIYA-site, and determined the recognition patterns by seven different pan-phosphotyrosine antibodies using Western blotting, and also investigated representative East Asian H. pylori isolates during infection. The results indicate that a total of only 9-11 amino acids containing the phosphorylated East Asian EPIYA-types are required and sufficient to detect the phosphopeptides with high specificity. However, the sequence recognition by the different antibodies was found to bear high variability. From the seven antibodies used, only four recognized all three phosphorylated EPIYA-motifs A, B and D similarly well. Two of the phosphotyrosine antibodies preferentially bound primarily to the phosphorylated motif A and D, while the seventh antibody failed to react with any of the phosphorylated EPIYA-motifs. Control experiments confirmed that none of the
Identity and functions of CxxC-derived motifs.

PubMed

Fomenko, Dmitri E; Gladyshev, Vadim N

2003-09-30

Two cysteines separated by two other residues (the CxxC motif) are employed by many redox proteins for formation, isomerization, and reduction of disulfide bonds and for other redox functions. The place of the C-terminal cysteine in this motif may be occupied by serine (the CxxS motif), modifying the functional repertoire of redox proteins. Here we found that the CxxC motif may also give rise to a motif, in which the C-terminal cysteine is replaced with threonine (the CxxT motif). Moreover, in contrast to a view that the N-terminal cysteine in the CxxC motif always serves as a nucleophilic attacking group, this residue could also be replaced with threonine (the TxxC motif), serine (the SxxC motif), or other residues. In each of these CxxC-derived motifs, the presence of a downstream alpha-helix was strongly favored. A search for conserved CxxC-derived motif/helix patterns in four complete genomes representing bacteria, archaea, and eukaryotes identified known redox proteins and suggested possible redox functions for several additional proteins. Catalytic sites in peroxiredoxins were major representatives of the TxxC motif, whereas those in glutathione peroxidases represented the CxxT motif. Structural assessments indicated that threonines in these enzymes could stabilize catalytic thiolates, suggesting revisions to previously proposed catalytic triads. Each of the CxxC-derived motifs was also observed in natural selenium-containing proteins, in which selenocysteine was present in place of a catalytic cysteine.
Sequential immunization with V3 peptides from primary human immunodeficiency virus type 1 produces cross-neutralizing antibodies against primary isolates with a matching narrow-neutralization sequence motif.

PubMed

Eda, Yasuyuki; Takizawa, Mari; Murakami, Toshio; Maeda, Hiroaki; Kimachi, Kazuhiko; Yonemura, Hiroshi; Koyanagi, Satoshi; Shiosaki, Kouichi; Higuchi, Hirofumi; Makizumi, Keiichi; Nakashima, Toshihiro; Osatomi, Kiyoshi; Tokiyoshi, Sachio; Matsushita, Shuzo; Yamamoto, Naoki; Honda, Mitsuo

2006-06-01

An antibody response capable of neutralizing not only homologous but also heterologous forms of the CXCR4-tropic human immunodeficiency virus type 1 (HIV-1) MNp and CCR5-tropic primary isolate HIV-1 JR-CSF was achieved through sequential immunization with a combination of synthetic peptides representing HIV-1 Env V3 sequences from field and laboratory HIV-1 clade B isolates. In contrast, repeated immunization with a single V3 peptide generated antibodies that neutralized only type-specific laboratory-adapted homologous viruses. To determine whether the cross-neutralization response could be attributed to a cross-reactive antibody in the immunized animals, we isolated a monoclonal antibody, C25, which neutralized the heterologous primary viruses of HIV-1 clade B. Furthermore, we generated a humanized monoclonal antibody, KD-247, by transferring the genes of the complementary determining region of C25 into genes of the human V region of the antibody. KD-247 bound with high affinity to the "PGR" motif within the HIV-1 Env V3 tip region, and, among the established reference antibodies, it most effectively neutralized primary HIV-1 field isolates possessing the matching neutralization sequence motif, suggesting its promise for clinical applications involving passive immunizations. These results demonstrate that sequential immunization with B-cell epitope peptides may contribute to a humoral immune-based HIV vaccine strategy. Indeed, they help lay the groundwork for the development of HIV-1 vaccine strategies that use sequential immunization with biologically relevant peptides to overcome difficulties associated with otherwise poorly immunogenic epitopes.

Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins.

PubMed

Foulk, Michael S; Urban, John M; Casella, Cinzia; Gerbi, Susan A

2015-05-01

Nascent strand sequencing (NS-seq) is used to discover DNA replication origins genome-wide, allowing identification of features for their specification. NS-seq depends on the ability of lambda exonuclease (λ-exo) to efficiently digest parental DNA while leaving RNA-primer protected nascent strands intact. We used genomics and biochemical approaches to determine if λ-exo digests all parental DNA sequences equally. We report that λ-exo does not efficiently digest G-quadruplex (G4) structures in a plasmid. Moreover, λ-exo digestion of nonreplicating genomic DNA (LexoG0) enriches GC-rich DNA and G4 motifs genome-wide. We used LexoG0 data to control for nascent strand-independent λ-exo biases in NS-seq and validated this approach at the rDNA locus. The λ-exo-controlled NS-seq peaks are not GC-rich, and only 35.5% overlap with 6.8% of all G4s, suggesting that G4s are not general determinants for origin specification but may play a role for a subset. Interestingly, we observed a periodic spacing of G4 motifs and nucleosomes around the peak summits, suggesting that G4s may position nucleosomes at this subset of origins. Finally, we demonstrate that use of Na(+) instead of K(+) in the λ-exo digestion buffer reduced the effect of G4s on λ-exo digestion and discuss ways to increase both the sensitivity and specificity of NS-seq. © 2015 Foulk et al.; Published by Cold Spring Harbor Laboratory Press.
Sequential Immunization with V3 Peptides from Primary Human Immunodeficiency Virus Type 1 Produces Cross-Neutralizing Antibodies against Primary Isolates with a Matching Narrow-Neutralization Sequence Motif

PubMed Central

Eda, Yasuyuki; Takizawa, Mari; Murakami, Toshio; Maeda, Hiroaki; Kimachi, Kazuhiko; Yonemura, Hiroshi; Koyanagi, Satoshi; Shiosaki, Kouichi; Higuchi, Hirofumi; Makizumi, Keiichi; Nakashima, Toshihiro; Osatomi, Kiyoshi; Tokiyoshi, Sachio; Matsushita, Shuzo; Yamamoto, Naoki; Honda, Mitsuo

2006-01-01

An antibody response capable of neutralizing not only homologous but also heterologous forms of the CXCR4-tropic human immunodeficiency virus type 1 (HIV-1) MNp and CCR5-tropic primary isolate HIV-1 JR-CSF was achieved through sequential immunization with a combination of synthetic peptides representing HIV-1 Env V3 sequences from field and laboratory HIV-1 clade B isolates. In contrast, repeated immunization with a single V3 peptide generated antibodies that neutralized only type-specific laboratory-adapted homologous viruses. To determine whether the cross-neutralization response could be attributed to a cross-reactive antibody in the immunized animals, we isolated a monoclonal antibody, C25, which neutralized the heterologous primary viruses of HIV-1 clade B. Furthermore, we generated a humanized monoclonal antibody, KD-247, by transferring the genes of the complementary determining region of C25 into genes of the human V region of the antibody. KD-247 bound with high affinity to the “PGR” motif within the HIV-1 Env V3 tip region, and, among the established reference antibodies, it most effectively neutralized primary HIV-1 field isolates possessing the matching neutralization sequence motif, suggesting its promise for clinical applications involving passive immunizations. These results demonstrate that sequential immunization with B-cell epitope peptides may contribute to a humoral immune-based HIV vaccine strategy. Indeed, they help lay the groundwork for the development of HIV-1 vaccine strategies that use sequential immunization with biologically relevant peptides to overcome difficulties associated with otherwise poorly immunogenic epitopes. PMID:16699036
Web server to identify similarity of amino acid motifs to compounds (SAAMCO).

PubMed

Casey, Fergal P; Davey, Norman E; Baran, Ivan; Varekova, Radka Svobodova; Shields, Denis C

2008-07-01

Protein-protein interactions are fundamental in mediating biological processes including metabolism, cell growth, and signaling. To be able to selectively inhibit or induce protein activity or complex formation is a key feature in controlling disease. For those situations in which protein-protein interactions derive substantial affinity from short linear peptide sequences, or motifs, we can develop search algorithms for peptidomimetic compounds that resemble the short peptide's structure but are not compromised by poor pharmacological properties. SAAMCO is a Web service ( http://bioware.ucd.ie/ approximately saamco) that facilitates the screening of motifs with known structures against bioactive compound databases. It is built on an algorithm that defines compound similarity based on the presence of appropriate amino acid side chain fragments and a favorable Root Mean Squared Deviation (RMSD) between compound and motif structure. The methodology is efficient as the available compound databases are preprocessed and fast regular expression searches filter potential matches before time-intensive 3D superposition is performed. The required input information is minimal, and the compound databases have been selected to maximize the availability of information on biological activity. "Hits" are accompanied with a visualization window and links to source database entries. Motif matching can be defined on partial or full similarity which will increase or reduce respectively the number of potential mimetic compounds. The Web server provides the functionality for rapid screening of known or putative interaction motifs against prepared compound libraries using a novel search algorithm. The tabulated results can be analyzed by linking to appropriate databases and by visualization.
Non-B DB v2.0: a database of predicted non-B DNA-forming motifs and its associated tools.

PubMed

Cer, Regina Z; Donohue, Duncan E; Mudunuri, Uma S; Temiz, Nuri A; Loss, Michael A; Starner, Nathan J; Halusa, Goran N; Volfovsky, Natalia; Yi, Ming; Luke, Brian T; Bacolla, Albino; Collins, Jack R; Stephens, Robert M

2013-01-01

The non-B DB, available at http://nonb.abcc.ncifcrf.gov, catalogs predicted non-B DNA-forming sequence motifs, including Z-DNA, G-quadruplex, A-phased repeats, inverted repeats, mirror repeats, direct repeats and their corresponding subsets: cruciforms, triplexes and slipped structures, in several genomes. Version 2.0 of the database revises and re-implements the motif discovery algorithms to better align with accepted definitions and thresholds for motifs, expands the non-B DNA-forming motifs coverage by including short tandem repeats and adds key visualization tools to compare motif locations relative to other genomic annotations. Non-B DB v2.0 extends the ability for comparative genomics by including re-annotation of the five organisms reported in non-B DB v1.0, human, chimpanzee, dog, macaque and mouse, and adds seven additional organisms: orangutan, rat, cow, pig, horse, platypus and Arabidopsis thaliana. Additionally, the non-B DB v2.0 provides an overall improved graphical user interface and faster query performance.
Identification and biochemical characterization of a GDSL-motif carboxylester hydrolase from Carica papaya latex.

PubMed

Abdelkafi, Slim; Ogata, Hiroyuki; Barouh, Nathalie; Fouquet, Benjamin; Lebrun, Régine; Pina, Michel; Scheirlinckx, Frantz; Villeneuve, Pierre; Carrière, Frédéric

2009-11-01

An esterase (CpEst) showing high specific activities on tributyrin and short chain vinyl esters was obtained from Carica papaya latex after an extraction step with zwitterionic detergent and sonication, followed by gel filtration chromatography. Although the protein could not be purified to complete homogeneity due to its presence in high molecular mass aggregates, a major protein band with an apparent molecular mass of 41 kDa was obtained by SDS-PAGE. This material was digested with trypsin and the amino acid sequences of the tryptic peptides were determined by LC/ESI/MS/MS. These sequences were used to identify a partial cDNA (679 bp) from expressed sequence tags (ESTs) of C. papaya. Based upon EST sequences, a full-length gene was identified in the genome of C. papaya, with an open reading frame of 1029 bp encoding a protein of 343 amino acid residues, with a theoretical molecular mass of 38 kDa. From sequence analysis, CpEst was identified as a GDSL-motif carboxylester hydrolase belonging to the SGNH protein family and four potential N-glycosylation sites were identified. The putative catalytic triad was localised (Ser(35)-Asp(307)-His(310)) with the nucleophile serine being part of the GDSL-motif. A 3D-model of CpEst was built from known X-ray structures and sequence alignments and the catalytic triad was found to be exposed at the surface of the molecule, thus confirming the results of CpEst inhibition by tetrahydrolipstatin suggesting a direct accessibility of the inhibitor to the active site.
Targeting MED1 LxxLL Motifs for Tissue-Selective Treatment of Human Breast Cancer

DTIC Science & Technology

2012-09-01

his colleagues have successfully conjugated malachite green (MG) aptamer to RNA nanoparticles characterized by a three-way junction (3WJ) pRNA motif...nanoparticle harboring malachite green (MG) aptamer, survivin siRNA and folate-DNA/RNA sequence for targeting delivery, using 3WJ-pRNA as scaffolds. Figure
Unlocking hidden genomic sequence

PubMed Central

Keith, Jonathan M.; Cochran, Duncan A. E.; Lala, Gita H.; Adams, Peter; Bryant, Darryn; Mitchelson, Keith R.

2004-01-01

Despite the success of conventional Sanger sequencing, significant regions of many genomes still present major obstacles to sequencing. Here we propose a novel approach with the potential to alleviate a wide range of sequencing difficulties. The technique involves extracting target DNA sequence from variants generated by introduction of random mutations. The introduction of mutations does not destroy original sequence information, but distributes it amongst multiple variants. Some of these variants lack problematic features of the target and are more amenable to conventional sequencing. The technique has been successfully demonstrated with mutation levels up to an average 18% base substitution and has been used to read previously intractable poly(A), AT-rich and GC-rich motifs. PMID:14973330
Identification of early zygotic genes in the yellow fever mosquito Aedes aegypti and discovery of a motif involved in early zygotic genome activation.

PubMed

Biedler, James K; Hu, Wanqi; Tae, Hongseok; Tu, Zhijian

2012-01-01

During early embryogenesis the zygotic genome is transcriptionally silent and all mRNAs present are of maternal origin. The maternal-zygotic transition marks the time over which embryogenesis changes its dependence from maternal RNAs to zygotically transcribed RNAs. Here we present the first systematic investigation of early zygotic genes (EZGs) in a mosquito species and focus on genes involved in the onset of transcription during 2-4 hr. We used transcriptome sequencing to identify the "pure" (without maternal expression) EZGs by analyzing transcripts from four embryonic time ranges of 0-2, 2-4, 4-8, and 8-12 hr, which includes the time of cellular blastoderm formation and up to the start of gastrulation. Blast of 16,789 annotated transcripts vs. the transcriptome reads revealed evidence for 63 (P<0.001) and 143 (P<0.05) nonmaternally derived transcripts having a significant increase in expression at 2-4 hr. One third of the 63 EZG transcripts do not have predicted introns compared to 10% of all Ae. aegypti genes. We have confirmed by RT-PCR that zygotic transcription starts as early as 2-3 hours. A degenerate motif VBRGGTA was found to be overrepresented in the upstream sequences of the identified EZGs using a motif identification software called SCOPE. We find evidence for homology between this motif and the TAGteam motif found in Drosophila that has been implicated in EZG activation. A 38 bp sequence in the proximal upstream sequence of a kinesin light chain EZG (KLC2.1) contains two copies of the mosquito motif. This sequence was shown to support EZG transcription by luciferase reporter assays performed on injected early embryos, and confers early zygotic activity to a heterologous promoter from a divergent mosquito species. The results of these studies are consistent with the model of early zygotic genome activation via transcriptional activators, similar to what has been found recently in Drosophila.
Mapping the distribution of packing topologies within protein interiors shows predominant preference for specific packing motifs

PubMed Central

2011-01-01

Background Mapping protein primary sequences to their three dimensional folds referred to as the 'second genetic code' remains an unsolved scientific problem. A crucial part of the problem concerns the geometrical specificity in side chain association leading to densely packed protein cores, a hallmark of correctly folded native structures. Thus, any model of packing within proteins should constitute an indispensable component of protein folding and design. Results In this study an attempt has been made to find, characterize and classify recurring patterns in the packing of side chain atoms within a protein which sustains its native fold. The interaction of side chain atoms within the protein core has been represented as a contact network based on the surface complementarity and overlap between associating side chain surfaces. Some network topologies definitely appear to be preferred and they have been termed 'packing motifs', analogous to super secondary structures in proteins. Study of the distribution of these motifs reveals the ubiquitous presence of typical smaller graphs, which appear to get linked or coalesce to give larger graphs, reminiscent of the nucleation-condensation model in protein folding. One such frequently occurring motif, also envisaged as the unit of clustering, the three residue clique was invariably found in regions of dense packing. Finally, topological measures based on surface contact networks appeared to be effective in discriminating sequences native to a specific fold amongst a set of decoys. Conclusions Out of innumerable topological possibilities, only a finite number of specific packing motifs are actually realized in proteins. This small number of motifs could serve as a basis set in the construction of larger networks. Of these, the triplet clique exhibits distinct preference both in terms of composition and geometry. PMID:21605466
Design of character-based DNA barcode motif for species identification: A computational approach and its validation in fishes.

PubMed

Chakraborty, Mohua; Dhar, Bishal; Ghosh, Sankar Kumar

2017-11-01

The DNA barcodes are generally interpreted using distance-based and character-based methods. The former uses clustering of comparable groups, based on the relative genetic distance, while the latter is based on the presence or absence of discrete nucleotide substitutions. The distance-based approach has a limitation in defining a universal species boundary across the taxa as the rate of mtDNA evolution is not constant throughout the taxa. However, character-based approach more accurately defines this using a unique set of nucleotide characters. The character-based analysis of full-length barcode has some inherent limitations, like sequencing of the full-length barcode, use of a sparse-data matrix and lack of a uniform diagnostic position for each group. A short continuous stretch of a fragment can be used to resolve the limitations. Here, we observe that a 154-bp fragment, from the transversion-rich domain of 1367 COI barcode sequences can successfully delimit species in the three most diverse orders of freshwater fishes. This fragment is used to design species-specific barcode motifs for 109 species by the character-based method, which successfully identifies the correct species using a pattern-matching program. The motifs also correctly identify geographically isolated population of the Cypriniformes species. Further, this region is validated as a species-specific mini-barcode for freshwater fishes by successful PCR amplification and sequencing of the motif (154 bp) using the designed primers. We anticipate that use of such motifs will enhance the diagnostic power of DNA barcode, and the mini-barcode approach will greatly benefit the field-based system of rapid species identification. © 2017 John Wiley & Sons Ltd.
[Personal motif in art].

PubMed

Gerevich, József

2015-01-01

One of the basic questions of the art psychology is whether a personal motif is to be found behind works of art and if so, how openly or indirectly it appears in the work itself. Analysis of examples and documents from the fine arts and literature allow us to conclude that the personal motif that can be identified by the viewer through symbols, at times easily at others with more difficulty, gives an emotional plus to the artistic product. The personal motif may be found in traumatic experiences, in communication to the model or with other emotionally important persons (mourning, disappointment, revenge, hatred, rivalry, revolt etc.), in self-searching, or self-analysis. The emotions are expressed in artistic activity either directly or indirectly. The intention nourished by the artist's identity (Kunstwollen) may stand in the way of spontaneous self-expression, channelling it into hidden paths. Under the influence of certain circumstances, the artist may arouse in the viewer, consciously or unconsciously, an illusionary, misleading image of himself. An examination of the personal motif is one of the important research areas of art therapy.
A comprehensive analysis of Helicobacter pylori plasticity zones reveals that they are integrating conjugative elements with intermediate integration specificity.

PubMed

Fischer, Wolfgang; Breithaupt, Ute; Kern, Beate; Smith, Stella I; Spicher, Carolin; Haas, Rainer

2014-04-27

The human gastric pathogen Helicobacter pylori is a paradigm for chronic bacterial infections. Its persistence in the stomach mucosa is facilitated by several mechanisms of immune evasion and immune modulation, but also by an unusual genetic variability which might account for the capability to adapt to changing environmental conditions during long-term colonization. This variability is reflected by the fact that almost each infected individual is colonized by a genetically unique strain. Strain-specific genes are dispersed throughout the genome, but clusters of genes organized as genomic islands may also collectively be present or absent. We have comparatively analysed such clusters, which are commonly termed plasticity zones, in a high number of H. pylori strains of varying geographical origin. We show that these regions contain fixed gene sets, rather than being true regions of genome plasticity, but two different types and several subtypes with partly diverging gene content can be distinguished. Their genetic diversity is incongruent with variations in the rest of the genome, suggesting that they are subject to horizontal gene transfer within H. pylori populations. We identified 40 distinct integration sites in 45 genome sequences, with a conserved heptanucleotide motif that seems to be the minimal requirement for integration. The significant number of possible integration sites, together with the requirement for a short conserved integration motif and the high level of gene conservation, indicates that these elements are best described as integrating conjugative elements (ICEs) with an intermediate integration site specificity.
Interaction of Tsg101 with Marburg Virus VP40 Depends on the PPPY Motif, but Not the PT/SAP Motif as in the Case of Ebola Virus, and Tsg101 Plays a Critical Role in the Budding of Marburg Virus-Like Particles Induced by VP40, NP, and GP▿

PubMed Central

Urata, Shuzo; Noda, Takeshi; Kawaoka, Yoshihiro; Morikawa, Shigeru; Yokosawa, Hideyoshi; Yasuda, Jiro

2007-01-01

Marburg virus (MARV) VP40 is a matrix protein that can be released from mammalian cells in the form of virus-like particles (VLPs) and contains the PPPY sequence, which is an L-domain motif. Here, we demonstrate that the PPPY motif is important for VP40-induced VLP budding and that VLP production is significantly enhanced by coexpression of NP and GP. We show that Tsg101 interacts with VP40 depending on the presence of the PPPY motif, but not the PT/SAP motif as in the case of Ebola virus, and plays an important role in VLP budding. These findings provide new insights into the mechanism of MARV budding. PMID:17301151
Efficient T-cell receptor signaling requires a high-affinity interaction between the Gads C-SH3 domain and the SLP-76 RxxK motif.

PubMed

Seet, Bruce T; Berry, Donna M; Maltzman, Jonathan S; Shabason, Jacob; Raina, Monica; Koretzky, Gary A; McGlade, C Jane; Pawson, Tony

2007-02-07

The relationship between the binding affinity and specificity of modular interaction domains is potentially important in determining biological signaling responses. In signaling from the T-cell receptor (TCR), the Gads C-terminal SH3 domain binds a core RxxK sequence motif in the SLP-76 scaffold. We show that residues surrounding this motif are largely optimized for binding the Gads C-SH3 domain resulting in a high-affinity interaction (K(D)=8-20 nM) that is essential for efficient TCR signaling in Jurkat T cells, since Gads-mediated signaling declines with decreasing affinity. Furthermore, the SLP-76 RxxK motif has evolved a very high specificity for the Gads C-SH3 domain. However, TCR signaling in Jurkat cells is tolerant of potential SLP-76 crossreactivity, provided that very high-affinity binding to the Gads C-SH3 domain is maintained. These data provide a quantitative argument that the affinity of the Gads C-SH3 domain for SLP-76 is physiologically important and suggest that the integrity of TCR signaling in vivo is sustained both by strong selection of SLP-76 for the Gads C-SH3 domain and by a capacity to buffer intrinsic crossreactivity.
Ratiometric fluorescent sensing of pH values in living cells by dual-fluorophore-labeled i-motif nanoprobes.

PubMed

Huang, Jin; Ying, Le; Yang, Xiaohai; Yang, Yanjing; Quan, Ke; Wang, He; Xie, Nuli; Ou, Min; Zhou, Qifeng; Wang, Kemin

2015-09-01

We designed a new ratiometric fluorescent nanoprobe for sensing pH values in living cells. Briefly, the nanoprobe consists of a gold nanoparticle (AuNP), short single-stranded oligonucleotides, and dual-fluorophore-labeled i-motif sequences. The short oligonucleotides are designed to bind with the i-motif sequences and immobilized on the AuNP surface via Au-S bond. At neutral pH, the dual fluorophores are separated, resulting in very low fluorescence resonance energy transfer (FRET) efficiency. At acidic pH, the i-motif strands fold into a quadruplex structure and leave the AuNP, bringing the dual fluorophores into close proximity, resulting in high FRET efficiency, which could be used as a signal for pH sensing. The nanoprobe possesses abilities of cellular transfection, enzymatic protection, fast response and quantitative pH detection. The in vitro and intracellular applications of the nanoprobe were demonstrated, which showed excellent response in the physiological pH range. Furthermore, our experimental results suggested that the nanoprobe showed excellent spatial and temporal resolution in living cells. We think that the ratiometric sensing strategy could potentially be applied to create a variety of new multicolor sensors for intracellular detection.
Motif enrichment tool.

PubMed

Blatti, Charles; Sinha, Saurabh

2014-07-01

The Motif Enrichment Tool (MET) provides an online interface that enables users to find major transcriptional regulators of their gene sets of interest. MET searches the appropriate regulatory region around each gene and identifies which transcription factor DNA-binding specificities (motifs) are statistically overrepresented. Motif enrichment analysis is currently available for many metazoan species including human, mouse, fruit fly, planaria and flowering plants. MET also leverages high-throughput experimental data such as ChIP-seq and DNase-seq from ENCODE and ModENCODE to identify the regulatory targets of a transcription factor with greater precision. The results from MET are produced in real time and are linked to a genome browser for easy follow-up analysis. Use of the web tool is free and open to all, and there is no login requirement. ADDRESS: http://veda.cs.uiuc.edu/MET/. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Edge usage, motifs, and regulatory logic for cell cycling genetic networks

NASA Astrophysics Data System (ADS)

Zagorski, M.; Krzywicki, A.; Martin, O. C.

2013-01-01

The cell cycle is a tightly controlled process, yet it shows marked differences across species. Which of its structural features follow solely from the ability to control gene expression? We tackle this question in silico by examining the ensemble of all regulatory networks which satisfy the constraint of producing a given sequence of gene expressions. We focus on three cell cycle profiles coming from baker's yeast, fission yeast, and mammals. First, we show that the networks in each of the ensembles use just a few interactions that are repeatedly reused as building blocks. Second, we find an enrichment in network motifs that is similar in the two yeast cell cycle systems investigated. These motifs do not have autonomous functions, yet they reveal a regulatory logic for cell cycling based on a feed-forward cascade of activating interactions.
Evolutionary relationships in the ilarviruses: nucleotide sequence of prunus necrotic ringspot virus RNA 3.

PubMed

Sánchez-Navarro, J A; Pallás, V

1997-01-01

The complete nucleotide sequence of an isolate of prunus necrotic ringspot virus (PNRSV) RNA 3 has been determined. Elucidation of the amino acid sequence of the proteins encoded by the two large open reading frames (ORFs) allowed us to carry out comparative and phylogenetic studies on the movement (MP) and coat (CP) proteins in the ilarvirus group. Amino acid sequence comparison of the MP revealed a highly conserved basic sequence motif with an amphipathic alpha-helical structure preceding the conserved motif of the '30K superfamily' proposed by Mushegian and Koonin [26] for MP's. Within this '30K' motif a strictly conserved transmembrane domain is present in all ilarviruses sequenced so far. At the amino-terminal end, prune dwarf virus (PDV) has an extension not present in other ilarviruses but which is observed in all bromo- and cucumoviruses, suggesting a common ancestor or a recombinational event in the Bromoviridae family. Examination of the N-terminus of the CP's of all ilarviruses revealed a highly basic region, part of which resembles the Arg-rich motif that has been characterized in the RNA-binding protein family. This motif has also been found in the other members of the Bromoviridae family, suggesting its involvement in a structural function. Furthermore this region is required for infectivity in ilarviruses. The similarities found in this Arg-rich motif are discussed in terms of this process known as genome activation. Finally, phylogenetic analysis of both the MP and CP proteins revealed a higher relationship of A1MV to PNRSV, apple mosaic virus (ApMV) and PDV than any other member of the ilarvirus group. In that sense, A1MV should be considered as a true ilarvirus instead of forming a distinct group of viruses.
The Verrucomicrobia LexA-Binding Motif: Insights into the Evolutionary Dynamics of the SOS Response.

PubMed

Erill, Ivan; Campoy, Susana; Kılıç, Sefa; Barbé, Jordi

2016-01-01

The SOS response is the primary bacterial mechanism to address DNA damage, coordinating multiple cellular processes that include DNA repair, cell division, and translesion synthesis. In contrast to other regulatory systems, the composition of the SOS genetic network and the binding motif of its transcriptional repressor, LexA, have been shown to vary greatly across bacterial clades, making it an ideal system to study the co-evolution of transcription factors and their regulons. Leveraging comparative genomics approaches and prior knowledge on the core SOS regulon, here we define the binding motif of the Verrucomicrobia, a recently described phylum of emerging interest due to its association with eukaryotic hosts. Site directed mutagenesis of the Verrucomicrobium spinosum recA promoter confirms that LexA binds a 14 bp palindromic motif with consensus sequence TGTTC-N4-GAACA. Computational analyses suggest that recognition of this novel motif is determined primarily by changes in base-contacting residues of the third alpha helix of the LexA helix-turn-helix DNA binding motif. In conjunction with comparative genomics analysis of the LexA regulon in the Verrucomicrobia phylum, electrophoretic shift assays reveal that LexA binds to operators in the promoter region of DNA repair genes and a mutagenesis cassette in this organism, and identify previously unreported components of the SOS response. The identification of tandem LexA-binding sites generating instances of other LexA-binding motifs in the lexA gene promoter of Verrucomicrobia species leads us to postulate a novel mechanism for LexA-binding motif evolution. This model, based on gene duplication, successfully addresses outstanding questions in the intricate co-evolution of the LexA protein, its binding motif and the regulatory network it controls.
ViennaNGS: A toolbox for building efficient next- generation sequencing analysis pipelines

PubMed Central

Wolfinger, Michael T.; Fallmann, Jörg; Eggenhofer, Florian; Amman, Fabian

2015-01-01

Recent achievements in next-generation sequencing (NGS) technologies lead to a high demand for reuseable software components to easily compile customized analysis workflows for big genomics data. We present ViennaNGS, an integrated collection of Perl modules focused on building efficient pipelines for NGS data processing. It comes with functionality for extracting and converting features from common NGS file formats, computation and evaluation of read mapping statistics, as well as normalization of RNA abundance. Moreover, ViennaNGS provides software components for identification and characterization of splice junctions from RNA-seq data, parsing and condensing sequence motif data, automated construction of Assembly and Track Hubs for the UCSC genome browser, as well as wrapper routines for a set of commonly used NGS command line tools. PMID:26236465

Integrated sequencing of exome and mRNA of large-sized single cells.

PubMed

Wang, Lily Yan; Guo, Jiajie; Cao, Wei; Zhang, Meng; He, Jiankui; Li, Zhoufang

2018-01-10

Current approaches of single cell DNA-RNA integrated sequencing are difficult to call SNPs, because a large amount of DNA and RNA is lost during DNA-RNA separation. Here, we performed simultaneous single-cell exome and transcriptome sequencing on individual mouse oocytes. Using microinjection, we kept the nuclei intact to avoid DNA loss, while retaining the cytoplasm inside the cell membrane, to maximize the amount of DNA and RNA captured from the single cell. We then conducted exome-sequencing on the isolated nuclei and mRNA-sequencing on the enucleated cytoplasm. For single oocytes, exome-seq can cover up to 92% of exome region with an average sequencing depth of 10+, while mRNA-sequencing reveals more than 10,000 expressed genes in enucleated cytoplasm, with similar performance for intact oocytes. This approach provides unprecedented opportunities to study DNA-RNA regulation, such as RNA editing at single nucleotide level in oocytes. In future, this method can also be applied to other large cells, including neurons, large dendritic cells and large tumour cells for integrated exome and transcriptome sequencing.
Nucleic Acid i-Motif Structures in Analytical Chemistry.

PubMed

Alba, Joan Josep; Sadurní, Anna; Gargallo, Raimundo

2016-09-02

Under the appropriate experimental conditions of pH and temperature, cytosine-rich segments in DNA or RNA sequences may produce a characteristic folded structure known as an i-motif. Besides its potential role in vivo, which is still under investigation, this structure has attracted increasing interest in other fields due to its sharp, fast and reversible pH-driven conformational changes. This "on/off" switch at molecular level is being used in nanotechnology and analytical chemistry to develop nanomachines and sensors, respectively. This paper presents a review of the latest applications of this structure in the field of chemical analysis.
A R/K-rich motif in the C-terminal of the homeodomain is required for complete translocating of NKX2.5 protein into nucleus.

PubMed

Ouyang, Ping; Zhang, He; Fan, Zhaolan; Wei, Pei; Huang, Zhigang; Wang, Sen; Li, Tao

2016-11-05

NKX2.5 plays important roles in heart development. Being a transcription factor, NKX2.5 exerts its biological functions in nucleus. However, the sequence motif that localize NKX2.5 into nucleus is still not clear. Here, we found a R/K-rich sequence motif from Q187 to R197 (QNRRYKCKRQR) was required for exclusive nuclear localization of NKX2.5. Eight truncated plasmids (E109X, Q149X, Q170X, Q187X, Q198X, Y256X, Y259X, and C264X) which were associated with congenital heart disease (CHD) were constructed. Compared with the wild type NKX2.5, the proteins E109X, Q149X, Q170X, Q187X without intact homeodomain (HD) showed no transcriptional activity while Q198X, Y256X, Y259X and C264X with intact HD showed 50 to 66% transcriptional activity. E109X, Q149X, Q170X, Q187X without intact HD localized in the cytoplasm and nucleus simultaneously and Q198X, Y256X, Y259X and C264X with intact HD localized completely in nucleus. These results inferred the indispensability of 187QNRRYKCKRQR197 in exclusive nucleus localization. Additionally, this sequence motif was very conservative among human, mouse and rat, indicating this motif was important for NKX2.5 function. Thus, we concluded that R/K-rich sequence motif 187QNRRYKCKRQR197 played a central role for NKX2.5 nuclear localization. Our findings provided a clue to understand the mechanisms between the truncated NKX2.5 mutants and CHD. Copyright © 2016 Elsevier B.V. All rights reserved.
MethylViewer: computational analysis and editing for bisulfite sequencing and methyltransferase accessibility protocol for individual templates (MAPit) projects.

PubMed

Pardo, Carolina E; Carr, Ian M; Hoffman, Christopher J; Darst, Russell P; Markham, Alexander F; Bonthron, David T; Kladde, Michael P

2011-01-01

Bisulfite sequencing is a widely-used technique for examining cytosine DNA methylation at nucleotide resolution along single DNA strands. Probing with cytosine DNA methyltransferases followed by bisulfite sequencing (MAPit) is an effective technique for mapping protein-DNA interactions. Here, MAPit methylation footprinting with M.CviPI, a GC methyltransferase we previously cloned and characterized, was used to probe hMLH1 chromatin in HCT116 and RKO colorectal cancer cells. Because M.CviPI-probed samples contain both CG and GC methylation, we developed a versatile, visually-intuitive program, called MethylViewer, for evaluating the bisulfite sequencing results. Uniquely, MethylViewer can simultaneously query cytosine methylation status in bisulfite-converted sequences at as many as four different user-defined motifs, e.g. CG, GC, etc., including motifs with degenerate bases. Data can also be exported for statistical analysis and as publication-quality images. Analysis of hMLH1 MAPit data with MethylViewer showed that endogenous CG methylation and accessible GC sites were both mapped on single molecules at high resolution. Disruption of positioned nucleosomes on single molecules of the PHO5 promoter was detected in budding yeast using M.CviPII, increasing the number of enzymes available for probing protein-DNA interactions. MethylViewer provides an integrated solution for primer design and rapid, accurate and detailed analysis of bisulfite sequencing or MAPit datasets from virtually any biological or biochemical system.
Cross-reactions vs co-sensitization evaluated by in silico motifs and in vitro IgE microarray testing.

PubMed

Pfiffner, P; Stadler, B M; Rasi, C; Scala, E; Mari, A

2012-02-01

Using an in silico allergen clustering method, we have recently shown that allergen extracts are highly cross-reactive. Here we used serological data from a multi-array IgE test based on recombinant or highly purified natural allergens to evaluate whether co-reactions are true cross-reactions or co-sensitizations by allergens with the same motifs. The serum database consisted of 3142 samples, each tested against 103 highly purified natural or recombinant allergens. Cross-reactivity was predicted by an iterative motif-finding algorithm through sequence motifs identified in 2708 known allergens. Allergen proteins containing the same motifs cross-reacted as predicted. However, proteins with identical motifs revealed a hierarchy in the degree of cross-reaction: The more frequent an allergen was positive in the allergic population, the less frequently it was cross-reacting and vice versa. Co-sensitization was analyzed by splitting the dataset into patient groups that were most likely sensitized through geographical occurrence of allergens. Interestingly, most co-reactions are cross-reactions but not co-sensitizations. The observed hierarchy of cross-reactivity may play an important role for the future management of allergic diseases. © 2011 John Wiley & Sons A/S.
Brickworx builds recurrent RNA and DNA structural motifs into medium- and low-resolution electron-density maps

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chojnowski, Grzegorz, E-mail: gchojnowski@genesilico.pl; Waleń, Tomasz; University of Warsaw, Banacha 2, 02-097 Warsaw

2015-03-01

A computer program that builds crystal structure models of nucleic acid molecules is presented. Brickworx is a computer program that builds crystal structure models of nucleic acid molecules using recurrent motifs including double-stranded helices. In a first step, the program searches for electron-density peaks that may correspond to phosphate groups; it may also take into account phosphate-group positions provided by the user. Subsequently, comparing the three-dimensional patterns of the P atoms with a database of nucleic acid fragments, it finds the matching positions of the double-stranded helical motifs (A-RNA or B-DNA) in the unit cell. If the target structure ismore » RNA, the helical fragments are further extended with recurrent RNA motifs from a fragment library that contains single-stranded segments. Finally, the matched motifs are merged and refined in real space to find the most likely conformations, including a fit of the sequence to the electron-density map. The Brickworx program is available for download and as a web server at http://iimcb.genesilico.pl/brickworx.« less
Regulation of amyloid precursor protein processing by its KFERQ motif.

PubMed

Park, Ji-Seon; Kim, Dong-Hou; Yoon, Seung-Yong

2016-06-01

Understanding of trafficking, processing, and degradation mechanisms of amyloid precursor protein (APP) is important because APP can be processed to produce β-amyloid (Aβ), a key pathogenic molecule in Alzheimer's disease (AD). Here, we found that APP contains KFERQ motif at its C-terminus, a consensus sequence for chaperone-mediated autophagy (CMA) or microautophagy which are another types of autophagy for degradation of pathogenic molecules in neurodegenerative diseases. Deletion of KFERQ in APP increased C-terminal fragments (CTFs) and secreted N-terminal fragments of APP and kept it away from lysosomes. KFERQ deletion did not abolish the interaction of APP or its cleaved products with heat shock cognate protein 70 (Hsc70), a protein necessary for CMA or microautophagy. These findings suggest that KFERQ motif is important for normal processing and degradation of APP to preclude the accumulation of APP-CTFs although it may not be important for CMA or microautophagy. [BMB Reports 2016; 49(6): 337-342].
Transcription factor ThWRKY4 binds to a novel WLS motif and a RAV1A element in addition to the W-box to regulate gene expression.

PubMed

Xu, Hongyun; Shi, Xinxin; Wang, Zhibo; Gao, Caiqiu; Wang, Chao; Wang, Yucheng

2017-08-01

WRKY transcription factors play important roles in many biological processes, and mainly bind to the W-box element to regulate gene expression. Previously, we characterized a WRKY gene from Tamarix hispida, ThWRKY4, in response to abiotic stress, and showed that it bound to the W-box motif. However, whether ThWRKY4 could bind to other motifs remains unknown. In this study, we employed a Transcription Factor-Centered Yeast one Hybrid (TF-Centered Y1H) screen to study the motifs recognized by ThWRKY4. In addition to the W-box core cis-element (termed W-box), we identified that ThWRKY4 could bind to two other motifs: the RAV1A element (CAACA) and a novel motif with sequence of GTCTA (W-box like sequence, WLS). The distributions of these motifs were screened in the promoter regions of genes regulated by some WRKYs. The results showed that the W-box, RAV1A, and WLS motifs were all present in high numbers, suggesting that they play key roles in gene expression mediated by WRKYs. Furthermore, five WRKY proteins from different WRKY subfamilies in Arabidopsis thaliana were selected and confirmed to bind to the RAV1A and WLS motifs, indicating that they are recognized commonly by WRKYs. These findings will help to further reveal the functions of WRKY proteins. Copyright © 2017 Elsevier B.V. All rights reserved.
Transterm—extended search facilities and improved integration with other databases

PubMed Central

Jacobs, Grant H.; Stockwell, Peter A.; Tate, Warren P.; Brown, Chris M.

2006-01-01

Transterm has now been publicly available for >10 years. Major changes have been made since its last description in this database issue in 2002. The current database provides data for key regions of mRNA sequences, a curated database of mRNA motifs and tools to allow users to investigate their own motifs or mRNA sequences. The key mRNA regions database is derived computationally from Genbank. It contains 3′ and 5′ flanking regions, the initiation and termination signal context and coding sequence for annotated CDS features from Genbank and RefSeq. The database is non-redundant, enabling summary files and statistics to be prepared for each species. Advances include providing extended search facilities, the database may now be searched by BLAST in addition to regular expressions (patterns) allowing users to search for motifs such as known miRNA sequences, and the inclusion of RefSeq data. The database contains >40 motifs or structural patterns important for translational control. In this release, patterns from UTRsite and Rfam are also incorporated with cross-referencing. Users may search their sequence data with Transterm or user-defined patterns. The system is accessible at . PMID:16381889
Classification of proteins with shared motifs and internal repeats in the ECOD database

PubMed Central

Kinch, Lisa N.; Liao, Yuxing

2016-01-01

Abstract Proteins and their domains evolve by a set of events commonly including the duplication and divergence of small motifs. The presence of short repetitive regions in domains has generally constituted a difficult case for structural domain classifications and their hierarchies. We developed the Evolutionary Classification Of protein Domains (ECOD) in part to implement a new schema for the classification of these types of proteins. Here we document the ways in which ECOD classifies proteins with small internal repeats, widespread functional motifs, and assemblies of small domain‐like fragments in its evolutionary schema. We illustrate the ways in which the structural genomics project impacted the classification and characterization of new structural domains and sequence families over the decade. PMID:26833690
Distribution of CpG Motifs in Upstream Gene Domains in a Reef Coral and Sea Anemone: Implications for Epigenetics in Cnidarians.

PubMed

Marsh, Adam G; Hoadley, Kenneth D; Warner, Mark E

2016-01-01

Coral reefs are under assault from stressors including global warming, ocean acidification, and urbanization. Knowing how these factors impact the future fate of reefs requires delineating stress responses across ecological, organismal and cellular scales. Recent advances in coral reef biology have integrated molecular processes with ecological fitness and have identified putative suites of temperature acclimation genes in a Scleractinian coral Acropora hyacinthus. We wondered what unique characteristics of these genes determined their coordinate expression in response to temperature acclimation, and whether or not other corals and cnidarians would likewise possess these features. Here, we focus on cytosine methylation as an epigenetic DNA modification that is responsive to environmental stressors. We identify common conserved patterns of cytosine-guanosine dinucleotide (CpG) motif frequencies in upstream promoter domains of different functional gene groups in two cnidarian genomes: a coral (Acropora digitifera) and an anemone (Nematostella vectensis). Our analyses show that CpG motif frequencies are prominent in the promoter domains of functional genes associated with environmental adaptation, particularly those identified in A. hyacinthus. Densities of CpG sites in upstream promoter domains near the transcriptional start site (TSS) are 1.38x higher than genomic background levels upstream of -2000 bp from the TSS. The increase in CpG usage suggests selection to allow for DNA methylation events to occur more frequently within 1 kb of the TSS. In addition, observed shifts in CpG densities among functional groups of genes suggests a potential role for epigenetic DNA methylation within promoter domains to impact functional gene expression responses in A. digitifera and N. vectensis. Identifying promoter epigenetic sequence motifs among genes within specific functional groups establishes an approach to describe integrated cellular responses to environmental stress in
Ser/Thr Motifs in Transmembrane Proteins: Conservation Patterns and Effects on Local Protein Structure and Dynamics

PubMed Central

del Val, Coral; White, Stephen H.

2014-01-01

We combined systematic bioinformatics analyses and molecular dynamics simulations to assess the conservation patterns of Ser and Thr motifs in membrane proteins, and the effect of such motifs on the structure and dynamics of α-helical transmembrane (TM) segments. We find that Ser/Thr motifs are often present in β-barrel TM proteins. At least one Ser/Thr motif is present in almost half of the sequences of α-helical proteins analyzed here. The extensive bioinformatics analyses and inspection of protein structures led to the identification of molecular transporters with noticeable numbers of Ser/Thr motifs within the TM region. Given the energetic penalty for burying multiple Ser/Thr groups in the membrane hydrophobic core, the observation of transporters with multiple membrane-embedded Ser/Thr is intriguing and raises the question of how the presence of multiple Ser/Thr affects protein local structure and dynamics. Molecular dynamics simulations of four different Ser-containing model TM peptides indicate that backbone hydrogen bonding of membrane-buried Ser/Thr hydroxyl groups can significantly change the local structure and dynamics of the helix. Ser groups located close to the membrane interface can hydrogen bond to solvent water instead of protein backbone, leading to an enhanced local solvation of the peptide. PMID:22836667
The Encapsidated Genome of Microplitis demolitor Bracovirus Integrates into the Host Pseudoplusia includens ▿ ‡

PubMed Central

Beck, Markus H.; Zhang, Shu; Bitra, Kavita; Burke, Gaelen R.; Strand, Michael R.

2011-01-01

Polydnaviruses (PDVs) are symbionts of parasitoid wasps that function as gene delivery vehicles in the insects (hosts) that the wasps parasitize. PDVs persist in wasps as integrated proviruses but are packaged as circularized and segmented double-stranded DNAs into the virions that wasps inject into hosts. In contrast, little is known about how PDV genomic DNAs persist in host cells. Microplitis demolitor carries Microplitis demolitor bracovirus (MdBV) and parasitizes the host Pseudoplusia includens. MdBV infects primarily host hemocytes and also infects a hemocyte-derived cell line from P. includens called CiE1 cells. Here we report that all 15 genomic segments of the MdBV encapsidated genome exhibited long-term persistence in CiE1 cells. Most MdBV genes expressed in hemocytes were persistently expressed in CiE1 cells, including members of the glc gene family whose products transformed CiE1 cells into a suspension culture. PCR-based integration assays combined with cloning and sequencing of host-virus junctions confirmed that genomic segments J and C persisted in CiE1 cells by integration. These genomic DNAs also rapidly integrated into parasitized P. includens. Sequence analysis of wasp-viral junction clones showed that the integration of proviral segments in M. demolitor was associated with a wasp excision/integration motif (WIM) known from other bracoviruses. However, integration into host cells occurred in association with a previously unknown domain that we named the host integration motif (HIM). The presence of HIMs in most MdBV genomic DNAs suggests that the integration of each genomic segment into host cells occurs through a shared mechanism. PMID:21880747
miRNA Enriched in Human Neuroblast Nuclei Bind the MAZ Transcription Factor and Their Precursors Contain the MAZ Consensus Motif.

PubMed

Goldie, Belinda J; Fitzsimmons, Chantel; Weidenhofer, Judith; Atkins, Joshua R; Wang, Dan O; Cairns, Murray J

2017-01-01

While the cytoplasmic function of microRNA (miRNA) as post-transcriptional regulators of mRNA has been the subject of significant research effort, their activity in the nucleus is less well characterized. Here we use a human neuronal cell model to show that some mature miRNA are preferentially enriched in the nucleus. These molecules were predominantly primate-specific and contained a sequence motif with homology to the consensus MAZ transcription factor binding element. Precursor miRNA containing this motif were shown to have affinity for MAZ protein in nuclear extract. We then used Ago1/2 RIP-Seq to explore nuclear miRNA-associated mRNA targets. Interestingly, the genes for Ago2-associated transcripts were also significantly enriched with MAZ binding sites and neural function, whereas Ago1-transcripts were associated with general metabolic processes and localized with SC35 spliceosomes. These findings suggest the MAZ transcription factor is associated with miRNA in the nucleus and may influence the regulation of neuronal development through Ago2-associated miRNA induced silencing complexes. The MAZ transcription factor may therefore be important for organizing higher order integration of transcriptional and post-transcriptional processes in primate neurons.
Crystal structure of yeast allantoicase reveals a repeated jelly roll motif.

PubMed

Leulliot, Nicolas; Quevillon-Cheruel, Sophie; Sorel, Isabelle; Graille, Marc; Meyer, Philippe; Liger, Dominique; Blondeau, Karine; Janin, Joël; van Tilbeurgh, Herman

2004-05-28

Allantoicase (EC 3.5.3.4) catalyzes the conversion of allantoate into ureidoglycolate and urea, one of the final steps in the degradation of purines to urea. The mechanism of most enzymes involved in this pathway, which has been known for a long time, is unknown. In this paper we describe the three-dimensional crystal structure of the yeast allantoicase determined at a resolution of 2.6 A by single anomalous diffraction. This constitutes the first structure for an enzyme of this pathway. The structure reveals a repeated jelly roll beta-sheet motif, also present in proteins of unrelated biochemical function. Allantoicase has a hexameric arrangement in the crystal (dimer of trimers). Analysis of the protein sequence against the structural data reveals the presence of two totally conserved surface patches, one on each jelly roll motif. The hexameric packing concentrates these patches into conserved pockets that probably constitute the active site.
The MIGenAS integrated bioinformatics toolkit for web-based sequence analysis

PubMed Central

Rampp, Markus; Soddemann, Thomas; Lederer, Hermann

2006-01-01

We describe a versatile and extensible integrated bioinformatics toolkit for the analysis of biological sequences over the Internet. The web portal offers convenient interactive access to a growing pool of chainable bioinformatics software tools and databases that are centrally installed and maintained by the RZG. Currently, supported tasks comprise sequence similarity searches in public or user-supplied databases, computation and validation of multiple sequence alignments, phylogenetic analysis and protein–structure prediction. Individual tools can be seamlessly chained into pipelines allowing the user to conveniently process complex workflows without the necessity to take care of any format conversions or tedious parsing of intermediate results. The toolkit is part of the Max-Planck Integrated Gene Analysis System (MIGenAS) of the Max Planck Society available at (click ‘Start Toolkit’). PMID:16844980
Clustering and Candidate Motif Detection in Exosomal miRNAs by Application of Machine Learning Algorithms.

PubMed

Gaur, Pallavi; Chaturvedi, Anoop

2017-07-22

The clustering pattern and motifs give immense information about any biological data. An application of machine learning algorithms for clustering and candidate motif detection in miRNAs derived from exosomes is depicted in this paper. Recent progress in the field of exosome research and more particularly regarding exosomal miRNAs has led much bioinformatic-based research to come into existence. The information on clustering pattern and candidate motifs in miRNAs of exosomal origin would help in analyzing existing, as well as newly discovered miRNAs within exosomes. Along with obtaining clustering pattern and candidate motifs in exosomal miRNAs, this work also elaborates the usefulness of the machine learning algorithms that can be efficiently used and executed on various programming languages/platforms. Data were clustered and sequence candidate motifs were detected successfully. The results were compared and validated with some available web tools such as 'BLASTN' and 'MEME suite'. The machine learning algorithms for aforementioned objectives were applied successfully. This work elaborated utility of machine learning algorithms and language platforms to achieve the tasks of clustering and candidate motif detection in exosomal miRNAs. With the information on mentioned objectives, deeper insight would be gained for analyses of newly discovered miRNAs in exosomes which are considered to be circulating biomarkers. In addition, the execution of machine learning algorithms on various language platforms gives more flexibility to users to try multiple iterations according to their requirements. This approach can be applied to other biological data-mining tasks as well.
Cellular microRNAs up-regulate transcription via interaction with promoter TATA-box motifs.

PubMed

Zhang, Yijun; Fan, Miaomiao; Zhang, Xue; Huang, Feng; Wu, Kang; Zhang, Junsong; Liu, Jun; Huang, Zhuoqiong; Luo, Haihua; Tao, Liang; Zhang, Hui

2014-12-01

The TATA box represents one of the most prevalent core promoters where the pre-initiation complexes (PICs) for gene transcription are assembled. This assembly is crucial for transcription initiation and well regulated. Here we show that some cellular microRNAs (miRNAs) are associated with RNA polymerase II (Pol II) and TATA box-binding protein (TBP) in human peripheral blood mononuclear cells (PBMCs). Among them, let-7i sequence specifically binds to the TATA-box motif of interleukin-2 (IL-2) gene and elevates IL-2 mRNA and protein production in CD4(+) T-lymphocytes in vitro and in vivo. Through direct interaction with the TATA-box motif, let-7i facilitates the PIC assembly and transcription initiation of IL-2 promoter. Several other cellular miRNAs, such as mir-138, mir-92a or mir-181d, also enhance the promoter activities via binding to the TATA-box motifs of insulin, calcitonin or c-myc, respectively. In agreement with the finding that an HIV-1-encoded miRNA could enhance viral replication through targeting the viral promoter TATA-box motif, our data demonstrate that the interaction with core transcription machinery is a novel mechanism for miRNAs to regulate gene expression. © 2014 Zhang et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
An integrated SNP mining and utilization (ISMU) pipeline for next generation sequencing data.

PubMed

Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A V S K; Varshney, Rajeev K

2014-01-01

Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly requires working knowledge of command line interface, massive computational resources and expertise which is a daunting task for biologists. Further, the SNP information generated may not be readily used for downstream processes such as genotyping. Hence, a comprehensive pipeline has been developed by integrating several open source next generation sequencing (NGS) tools along with a graphical user interface called Integrated SNP Mining and Utilization (ISMU) for SNP discovery and their utilization by developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction (SAMtools/SOAPsnp/CNS2snp and CbCC) methods and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs in standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data at a fast speed. The pipeline is very useful for plant genetics and breeding community with no computational expertise in order to discover SNPs and utilize in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge datasets of next generation sequencing. It has been developed in Java language and is available at http://hpc.icrisat.cgiar.org/ISMU as a standalone
An Integrated SNP Mining and Utilization (ISMU) Pipeline for Next Generation Sequencing Data

PubMed Central

Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M.; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A. V. S. K.; Varshney, Rajeev K.

2014-01-01

Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly requires working knowledge of command line interface, massive computational resources and expertise which is a daunting task for biologists. Further, the SNP information generated may not be readily used for downstream processes such as genotyping. Hence, a comprehensive pipeline has been developed by integrating several open source next generation sequencing (NGS) tools along with a graphical user interface called Integrated SNP Mining and Utilization (ISMU) for SNP discovery and their utilization by developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction (SAMtools/SOAPsnp/CNS2snp and CbCC) methods and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs in standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data at a fast speed. The pipeline is very useful for plant genetics and breeding community with no computational expertise in order to discover SNPs and utilize in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge datasets of next generation sequencing. It has been developed in Java language and is available at http://hpc.icrisat.cgiar.org/ISMU as a standalone

Evolution subverting essentiality: Dispensability of the cell attachment Arg-Gly-Asp motif in multiply passaged foot-and-mouth disease virus

PubMed Central

Martínez, Miguel A.; Verdaguer, Nuria; Mateu, Mauricio G.; Domingo, Esteban

1997-01-01

Aphthoviruses use a conserved Arg-Gly-Asp triplet for attachment to host cells and this motif is believed to be essential for virus viability. Here we report that this triplet—which is also a widespread motif involved in cell-to-cell adhesion—can become dispensable upon short-term evolution of the virus harboring it. Foot-and-mouth disease virus (FMDV), which was multiply passaged in cell culture, showed an altered repertoire of antigenic variants resistant to a neutralizing monoclonal antibody. The altered repertoire includes variants with substitutions at the Arg-Gly-Asp motif. Mutants lacking this sequence replicated normally in cell culture and were indistinguishable from the parental virus. Studies with individual FMDV clones indicate that amino acid replacements on the capsid surface located around the loop harboring the Arg-Gly-Asp triplet may mediate in the dispensability of this motif. The results show that FMDV quasispecies evolving in a constant biological environment have the capability of rendering totally dispensable a receptor recognition motif previously invariant, and to ensure an alternative pathway for normal viral replication. Thus, variability of highly conserved motifs, even those that viruses have adapted from functional cellular motifs, can contribute to phenotypic flexibility of RNA viruses in nature. PMID:9192645
A discrete artificial bee colony algorithm for detecting transcription factor binding sites in DNA sequences.

PubMed

Karaboga, D; Aslan, S

2016-04-27

The great majority of biological sequences share significant similarity with other sequences as a result of evolutionary processes, and identifying these sequence similarities is one of the most challenging problems in bioinformatics. In this paper, we present a discrete artificial bee colony (ABC) algorithm, which is inspired by the intelligent foraging behavior of real honey bees, for the detection of highly conserved residue patterns or motifs within sequences. Experimental studies on three different data sets showed that the proposed discrete model, by adhering to the fundamental scheme of the ABC algorithm, produced competitive or better results than other metaheuristic motif discovery techniques.
Alignment-free sequence comparison (II): theoretical power of comparison statistics.

PubMed

Wan, Lin; Reinert, Gesine; Sun, Fengzhu; Waterman, Michael S

2010-11-01

Rapid methods for alignment-free sequence comparison make large-scale comparisons between sequences increasingly feasible. Here we study the power of the statistic D2, which counts the number of matching k-tuples between two sequences, as well as D2*, which uses centralized counts, and D2S, which is a self-standardized version, both from a theoretical viewpoint and numerically, providing an easy to use program. The power is assessed under two alternative hidden Markov models; the first one assumes that the two sequences share a common motif, whereas the second model is a pattern transfer model; the null model is that the two sequences are composed of independent and identically distributed letters and they are independent. Under the first alternative model, the means of the tuple counts in the individual sequences change, whereas under the second alternative model, the marginal means are the same as under the null model. Using the limit distributions of the count statistics under the null and the alternative models, we find that generally, asymptotically D2S has the largest power, followed by D2*, whereas the power of D2 can even be zero in some cases. In contrast, even for sequences of length 140,000 bp, in simulations D2* generally has the largest power. Under the first alternative model of a shared motif, the power of D2*approaches 100% when sufficiently many motifs are shared, and we recommend the use of D2* for such practical applications. Under the second alternative model of pattern transfer,the power for all three count statistics does not increase with sequence length when the sequence is sufficiently long, and hence none of the three statistics under consideration canbe recommended in such a situation. We illustrate the approach on 323 transcription factor binding motifs with length at most 10 from JASPAR CORE (October 12, 2009 version),verifying that D2* is generally more powerful than D2. The program to calculate the power of D2, D2* and D2S can be
Green oxidations of furans--initiated by molecular oxygen--that give key natural product motifs.

PubMed

Montagnon, Tamsyn; Noutsias, Dimitris; Alexopoulou, Ioanna; Tofi, Maria; Vassilikogiannakis, Georgios

2011-04-07

In this article, we explore how changes in the positioning of pendant hydroxyl functionalities in the photooxygenation substrate dramatically alter the course of furan oxidations that are initiated by singlet oxygen; and, how these different reactivities can be harnessed through cascade reaction sequences to access, rapidly and effectively, a broad range of important natural product motifs.
Targeting of Arabidopsis KNL2 to Centromeres Depends on the Conserved CENPC-k Motif in Its C Terminus.

PubMed

Sandmann, Michael; Talbert, Paul; Demidov, Dmitri; Kuhlmann, Markus; Rutten, Twan; Conrad, Udo; Lermontova, Inna

2017-01-01

KINETOCHORE NULL2 (KNL2) is involved in recognition of centromeres and in centromeric localization of the centromere-specific histone cenH3. Our study revealed a cenH3 nucleosome binding CENPC-k motif at the C terminus of Arabidopsis thaliana KNL2, which is conserved among a wide spectrum of eukaryotes. Centromeric localization of KNL2 is abolished by deletion of the CENPC-k motif and by mutating single conserved amino acids, but can be restored by insertion of the corresponding motif of Arabidopsis CENP-C. We showed by electrophoretic mobility shift assay that the C terminus of KNL2 binds DNA sequence-independently and interacts with the centromeric transcripts in vitro. Chromatin immunoprecipitation with anti-KNL2 antibodies indicated that in vivo KNL2 is preferentially associated with the centromeric repeat pAL1 Complete deletion of the CENPC-k motif did not influence its ability to interact with DNA in vitro. Therefore, we suggest that KNL2 recognizes centromeric nucleosomes, similar to CENP-C, via the CENPC-k motif and binds adjoining DNA. © 2017 American Society of Plant Biologists. All rights reserved.
Systematic and fully automated identification of protein sequence patterns.

PubMed

Hart, R K; Royyuru, A K; Stolovitzky, G; Califano, A

2000-01-01

We present an efficient algorithm to systematically and automatically identify patterns in protein sequence families. The procedure is based on the Splash deterministic pattern discovery algorithm and on a framework to assess the statistical significance of patterns. We demonstrate its application to the fully automated discovery of patterns in 974 PROSITE families (the complete subset of PROSITE families which are defined by patterns and contain DR records). Splash generates patterns with better specificity and undiminished sensitivity, or vice versa, in 28% of the families; identical statistics were obtained in 48% of the families, worse statistics in 15%, and mixed behavior in the remaining 9%. In about 75% of the cases, Splash patterns identify sequence sites that overlap more than 50% with the corresponding PROSITE pattern. The procedure is sufficiently rapid to enable its use for daily curation of existing motif and profile databases. Third, our results show that the statistical significance of discovered patterns correlates well with their biological significance. The trypsin subfamily of serine proteases is used to illustrate this method's ability to exhaustively discover all motifs in a family that are statistically and biologically significant. Finally, we discuss applications of sequence patterns to multiple sequence alignment and the training of more sensitive score-based motif models, akin to the procedure used by PSI-BLAST. All results are available at httpl//www.research.ibm.com/spat/.
Uniform, optimal signal processing of mapped deep-sequencing data.

PubMed

Kumar, Vibhor; Muratani, Masafumi; Rayan, Nirmala Arul; Kraus, Petra; Lufkin, Thomas; Ng, Huck Hui; Prabhakar, Shyam

2013-07-01

Despite their apparent diversity, many problems in the analysis of high-throughput sequencing data are merely special cases of two general problems, signal detection and signal estimation. Here we adapt formally optimal solutions from signal processing theory to analyze signals of DNA sequence reads mapped to a genome. We describe DFilter, a detection algorithm that identifies regulatory features in ChIP-seq, DNase-seq and FAIRE-seq data more accurately than assay-specific algorithms. We also describe EFilter, an estimation algorithm that accurately predicts mRNA levels from as few as 1-2 histone profiles (R ∼0.9). Notably, the presence of regulatory motifs in promoters correlates more with histone modifications than with mRNA levels, suggesting that histone profiles are more predictive of cis-regulatory mechanisms. We show by applying DFilter and EFilter to embryonic forebrain ChIP-seq data that regulatory protein identification and functional annotation are feasible despite tissue heterogeneity. The mathematical formalism underlying our tools facilitates integrative analysis of data from virtually any sequencing-based functional profile.
Using RSAT oligo-analysis and dyad-analysis tools to discover regulatory signals in nucleic sequences.

PubMed

Defrance, Matthieu; Janky, Rekin's; Sand, Olivier; van Helden, Jacques

2008-01-01

This protocol explains how to discover functional signals in genomic sequences by detecting over- or under-represented oligonucleotides (words) or spaced pairs thereof (dyads) with the Regulatory Sequence Analysis Tools (http://rsat.ulb.ac.be/rsat/). Two typical applications are presented: (i) predicting transcription factor-binding motifs in promoters of coregulated genes and (ii) discovering phylogenetic footprints in promoters of orthologous genes. The steps of this protocol include purging genomic sequences to discard redundant fragments, discovering over-represented patterns and assembling them to obtain degenerate motifs, scanning sequences and drawing feature maps. The main strength of the method is its statistical ground: the binomial significance provides an efficient control on the rate of false positives. In contrast with optimization-based pattern discovery algorithms, the method supports the detection of under- as well as over-represented motifs. Computation times vary from seconds (gene clusters) to minutes (whole genomes). The execution of the whole protocol should take approximately 1 h.
OSR1 regulates a subset of inward rectifier potassium channels via a binding motif variant.

PubMed

Taylor, Clinton A; An, Sung-Wan; Kankanamalage, Sachith Gallolu; Stippec, Steve; Earnest, Svetlana; Trivedi, Ashesh T; Yang, Jonathan Zijiang; Mirzaei, Hamid; Huang, Chou-Long; Cobb, Melanie H

2018-04-10

The with-no-lysine (K) (WNK) signaling pathway to STE20/SPS1-related proline- and alanine-rich kinase (SPAK) and oxidative stress-responsive 1 (OSR1) kinase is an important mediator of cell volume and ion transport. SPAK and OSR1 associate with upstream kinases WNK 1-4, substrates, and other proteins through their C-terminal domains which interact with linear R-F-x-V/I sequence motifs. In this study we find that SPAK and OSR1 also interact with similar affinity with a motif variant, R-x-F-x-V/I. Eight of 16 human inward rectifier K + channels have an R-x-F-x-V motif. We demonstrate that two of these channels, Kir2.1 and Kir2.3, are activated by OSR1, while Kir4.1, which does not contain the motif, is not sensitive to changes in OSR1 or WNK activity. Mutation of the motif prevents activation of Kir2.3 by OSR1. Both siRNA knockdown of OSR1 and chemical inhibition of WNK activity disrupt NaCl-induced plasma membrane localization of Kir2.3. Our results suggest a mechanism by which WNK-OSR1 enhance Kir2.1 and Kir2.3 channel activity by increasing their plasma membrane localization. Regulation of members of the inward rectifier K + channel family adds functional and mechanistic insight into the physiological impact of the WNK pathway.
Efficient Identification of Murine M2 Macrophage Peptide Targeting Ligands by Phage Display and Next-Generation Sequencing.

PubMed

Liu, Gary W; Livesay, Brynn R; Kacherovsky, Nataly A; Cieslewicz, Maryelise; Lutz, Emi; Waalkes, Adam; Jensen, Michael C; Salipante, Stephen J; Pun, Suzie H

2015-08-19

Peptide ligands are used to increase the specificity of drug carriers to their target cells and to facilitate intracellular delivery. One method to identify such peptide ligands, phage display, enables high-throughput screening of peptide libraries for ligands binding to therapeutic targets of interest. However, conventional methods for identifying target binders in a library by Sanger sequencing are low-throughput, labor-intensive, and provide a limited perspective (<0.01%) of the complete sequence space. Moreover, the small sample space can be dominated by nonspecific, preferentially amplifying "parasitic sequences" and plastic-binding sequences, which may lead to the identification of false positives or exclude the identification of target-binding sequences. To overcome these challenges, we employed next-generation Illumina sequencing to couple high-throughput screening and high-throughput sequencing, enabling more comprehensive access to the phage display library sequence space. In this work, we define the hallmarks of binding sequences in next-generation sequencing data, and develop a method that identifies several target-binding phage clones for murine, alternatively activated M2 macrophages with a high (100%) success rate: sequences and binding motifs were reproducibly present across biological replicates; binding motifs were identified across multiple unique sequences; and an unselected, amplified library accurately filtered out parasitic sequences. In addition, we validate the Multiple Em for Motif Elicitation tool as an efficient and principled means of discovering binding sequences.
BIPAD: A web server for modeling bipartite sequence elements

PubMed Central

Bi, Chengpeng; Rogan, Peter K

2006-01-01

Background Many dimeric protein complexes bind cooperatively to families of bipartite nucleic acid sequence elements, which consist of pairs of conserved half-site sequences separated by intervening distances that vary among individual sites. Results We introduce the Bipad Server [1], a web interface to predict sequence elements embedded within unaligned sequences. Either a bipartite model, consisting of a pair of one-block position weight matrices (PWM's) with a gap distribution, or a single PWM matrix for contiguous single block motifs may be produced. The Bipad program performs multiple local alignment by entropy minimization and cyclic refinement using a stochastic greedy search strategy. The best models are refined by maximizing incremental information contents among a set of potential models with varying half site and gap lengths. Conclusion The web service generates information positional weight matrices, identifies binding site motifs, graphically represents the set of discovered elements as a sequence logo, and depicts the gap distribution as a histogram. Server performance was evaluated by generating a collection of bipartite models for distinct DNA binding proteins. PMID:16503993
ISRNA: an integrative online toolkit for short reads from high-throughput sequencing data.

PubMed

Luo, Guan-Zheng; Yang, Wei; Ma, Ying-Ke; Wang, Xiu-Jie

2014-02-01

Integrative Short Reads NAvigator (ISRNA) is an online toolkit for analyzing high-throughput small RNA sequencing data. Besides the high-speed genome mapping function, ISRNA provides statistics for genomic location, length distribution and nucleotide composition bias analysis of sequence reads. Number of reads mapped to known microRNAs and other classes of short non-coding RNAs, coverage of short reads on genes, expression abundance of sequence reads as well as some other analysis functions are also supported. The versatile search functions enable users to select sequence reads according to their sub-sequences, expression abundance, genomic location, relationship to genes, etc. A specialized genome browser is integrated to visualize the genomic distribution of short reads. ISRNA also supports management and comparison among multiple datasets. ISRNA is implemented in Java/C++/Perl/MySQL and can be freely accessed at http://omicslab.genetics.ac.cn/ISRNA/.
Functional region prediction with a set of appropriate homologous sequences-an index for sequence selection by integrating structure and sequence information with spatial statistics

PubMed Central

2012-01-01

Background The detection of conserved residue clusters on a protein structure is one of the effective strategies for the prediction of functional protein regions. Various methods, such as Evolutionary Trace, have been developed based on this strategy. In such approaches, the conserved residues are identified through comparisons of homologous amino acid sequences. Therefore, the selection of homologous sequences is a critical step. It is empirically known that a certain degree of sequence divergence in the set of homologous sequences is required for the identification of conserved residues. However, the development of a method to select homologous sequences appropriate for the identification of conserved residues has not been sufficiently addressed. An objective and general method to select appropriate homologous sequences is desired for the efficient prediction of functional regions. Results We have developed a novel index to select the sequences appropriate for the identification of conserved residues, and implemented the index within our method to predict the functional regions of a protein. The implementation of the index improved the performance of the functional region prediction. The index represents the degree of conserved residue clustering on the tertiary structure of the protein. For this purpose, the structure and sequence information were integrated within the index by the application of spatial statistics. Spatial statistics is a field of statistics in which not only the attributes but also the geometrical coordinates of the data are considered simultaneously. Higher degrees of clustering generate larger index scores. We adopted the set of homologous sequences with the highest index score, under the assumption that the best prediction accuracy is obtained when the degree of clustering is the maximum. The set of sequences selected by the index led to higher functional region prediction performance than the sets of sequences selected by other sequence
Self-assembly of multi-stranded RNA motifs into lattices and tubular structures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stewart, Jaimie Marie; Subramanian, Hari K. K.; Franco, Elisa

Rational design of nucleic acidmolecules yields selfassembling scaffolds with increasing complexity, size and functionality. It is an open question whether design methods tailored to build DNA nanostructures can be adapted to build RNA nanostructures with comparable features. We demonstrate the formation of RNA lattices and tubular assemblies from double crossover (DX) tiles, a canonical motif in DNA nanotechnology. Tubular structures can exceed 1 m in length, suggesting that this DX motif can produce very robust lattices. Some of these tubes spontaneously form with left-handed chirality. We obtain assemblies by using two methods: a protocol where gel-extracted RNA strands are slowlymore » annealed, and a one-pot transcription and anneal procedure. We then identify the tile nick position as a structural requirement for lattice formation. These results demonstrate that stable RNA structures can be obtained with design tools imported from DNA nanotechnology. These large assemblies could be potentially integrated with a variety of functional RNA motifs for drug or nanoparticle delivery, or for colocalization of cellular components.« less
Self-assembly of multi-stranded RNA motifs into lattices and tubular structures

DOE PAGES

Stewart, Jaimie Marie; Subramanian, Hari K. K.; Franco, Elisa

2017-02-16

Rational design of nucleic acidmolecules yields selfassembling scaffolds with increasing complexity, size and functionality. It is an open question whether design methods tailored to build DNA nanostructures can be adapted to build RNA nanostructures with comparable features. We demonstrate the formation of RNA lattices and tubular assemblies from double crossover (DX) tiles, a canonical motif in DNA nanotechnology. Tubular structures can exceed 1 m in length, suggesting that this DX motif can produce very robust lattices. Some of these tubes spontaneously form with left-handed chirality. We obtain assemblies by using two methods: a protocol where gel-extracted RNA strands are slowlymore » annealed, and a one-pot transcription and anneal procedure. We then identify the tile nick position as a structural requirement for lattice formation. These results demonstrate that stable RNA structures can be obtained with design tools imported from DNA nanotechnology. These large assemblies could be potentially integrated with a variety of functional RNA motifs for drug or nanoparticle delivery, or for colocalization of cellular components.« less
Self-assembly of multi-stranded RNA motifs into lattices and tubular structures

PubMed Central

Stewart, Jaimie Marie; Subramanian, Hari K. K.

2017-01-01

Abstract Rational design of nucleic acid molecules yields self-assembling scaffolds with increasing complexity, size and functionality. It is an open question whether design methods tailored to build DNA nanostructures can be adapted to build RNA nanostructures with comparable features. Here we demonstrate the formation of RNA lattices and tubular assemblies from double crossover (DX) tiles, a canonical motif in DNA nanotechnology. Tubular structures can exceed 1 μm in length, suggesting that this DX motif can produce very robust lattices. Some of these tubes spontaneously form with left-handed chirality. We obtain assemblies by using two methods: a protocol where gel-extracted RNA strands are slowly annealed, and a one-pot transcription and anneal procedure. We identify the tile nick position as a structural requirement for lattice formation. Our results demonstrate that stable RNA structures can be obtained with design tools imported from DNA nanotechnology. These large assemblies could be potentially integrated with a variety of functional RNA motifs for drug or nanoparticle delivery, or for colocalization of cellular components. PMID:28204562
Discovery of a Regulatory Motif for Human Satellite DNA Transcription in Response to BATF2 Overexpression.

PubMed

Bai, Xuejia; Huang, Wenqiu; Zhang, Chenguang; Niu, Jing; Ding, Wei

2016-03-01

One of the basic leucine zipper transcription factors, BATF2, has been found to suppress cancer growth and migration. However, little is known about the genes downstream of BATF2. HeLa cells were stably transfected with BATF2, then chromatin immunoprecipitation-sequencing was employed to identify the DNA motifs responsive to BATF2. Comprehensive bioinformatics analyses indicated that the most significant motif discovered as TTCCATT[CT]GATTCCATTC[AG]AT was primarily distributed among the chromosome centromere regions and mostly within human type II satellite DNA. Such motifs were able to prime the transcription of type II satellite DNA in a directional and asymmetrical manner. Consistently, satellite II transcription was up-regulated in BATF2-overexpressing cells. The present study provides insight into understanding the role of BATF2 in tumours and the importance of satellite DNA in the maintenance of genomic stability. Copyright© 2016 International Institute of Anticancer Research (Dr. John G. Delinassios), All rights reserved.
Chaotic Motifs in Gene Regulatory Networks

PubMed Central

Zhang, Zhaoyang; Ye, Weiming; Qian, Yu; Zheng, Zhigang; Huang, Xuhui; Hu, Gang

2012-01-01

Chaos should occur often in gene regulatory networks (GRNs) which have been widely described by nonlinear coupled ordinary differential equations, if their dimensions are no less than 3. It is therefore puzzling that chaos has never been reported in GRNs in nature and is also extremely rare in models of GRNs. On the other hand, the topic of motifs has attracted great attention in studying biological networks, and network motifs are suggested to be elementary building blocks that carry out some key functions in the network. In this paper, chaotic motifs (subnetworks with chaos) in GRNs are systematically investigated. The conclusion is that: (i) chaos can only appear through competitions between different oscillatory modes with rivaling intensities. Conditions required for chaotic GRNs are found to be very strict, which make chaotic GRNs extremely rare. (ii) Chaotic motifs are explored as the simplest few-node structures capable of producing chaos, and serve as the intrinsic source of chaos of random few-node GRNs. Several optimal motifs causing chaos with atypically high probability are figured out. (iii) Moreover, we discovered that a number of special oscillators can never produce chaos. These structures bring some advantages on rhythmic functions and may help us understand the robustness of diverse biological rhythms. (iv) The methods of dominant phase-advanced driving (DPAD) and DPAD time fraction are proposed to quantitatively identify chaotic motifs and to explain the origin of chaotic behaviors in GRNs. PMID:22792171
Molecular dynamics analysis of stabilities of the telomeric Watson-Crick duplex and the associated i-motif as a function of pH and temperature.

PubMed

Panczyk, Tomasz; Wolski, Pawel

2018-06-01

This work deals with a molecular dynamics analysis of the protonated and deprotonated states of the natural sequence d[(CCCTAA) 3 CCCT] of the telomeric DNA forming the intercalated i-motif or paired with the sequence d[(CCCTAA) 3 CCCT] and forming the Watson-Crick (WC) duplex. By utilizing the amber force field for nucleic acids we built the i-motif and the WC duplex either with native cytosines or using their protonated forms. We studied, by applying molecular dynamics simulations, the role of hydrogen bonds between cytosines or in cytosine-guanine pairs in the stabilization of both structures in the physiological fluid. We found that hydrogen bonds exist in the case of protonated i-motif and in the standard form of the WC duplex. They, however, vanish in the case of the deprotonated i-motif and protonated form of the WC duplex. By determining potentials of mean force in the enforced unwrapping of these structures we found that the protonated i-motif is thermodynamically the most stable. Its deprotonation leads to spontaneous and observed directly in the unbiased calculations unfolding of the i-motif to the hairpin structure at normal temperature. The WC duplex is stable in its standard form and its slight destabilization is observed at the acidic pH. However, the protonated WC duplex unwraps very slowly at 310 K and its decomposition was not observed in the unbiased calculations. At higher temperatures (ca. 400 K or more) the WC duplex unwraps spontaneously. Copyright © 2018. Published by Elsevier B.V.
Automated Recognition of RNA Structure Motifs by Their SHAPE Data Signatures.

PubMed

Radecki, Pierce; Ledda, Mirko; Aviran, Sharon

2018-06-14

High-throughput structure profiling (SP) experiments that provide information at nucleotide resolution are revolutionizing our ability to study RNA structures. Of particular interest are RNA elements whose underlying structures are necessary for their biological functions. We previously introduced patteRNA , an algorithm for rapidly mining SP data for patterns characteristic of such motifs. This work provided a proof-of-concept for the detection of motifs and the capability of distinguishing structures displaying pronounced conformational changes. Here, we describe several improvements and automation routines to patteRNA . We then consider more elaborate biological situations starting with the comparison or integration of results from searches for distinct motifs and across datasets. To facilitate such analyses, we characterize patteRNA ’s outputs and describe a normalization framework that regularizes results. We then demonstrate that our algorithm successfully discerns between highly similar structural variants of the human immunodeficiency virus type 1 (HIV-1) Rev response element (RRE) and readily identifies its exact location in whole-genome structure profiles of HIV-1. This work highlights the breadth of information that can be gleaned from SP data and broadens the utility of data-driven methods as tools for the detection of novel RNA elements.

A Comparison Study for DNA Motif Modeling on Protein Binding Microarray.

PubMed

Wong, Ka-Chun; Li, Yue; Peng, Chengbin; Wong, Hau-San

2016-01-01

Transcription factor binding sites (TFBSs) are relatively short (5-15 bp) and degenerate. Identifying them is a computationally challenging task. In particular, protein binding microarray (PBM) is a high-throughput platform that can measure the DNA binding preference of a protein in a comprehensive and unbiased manner; for instance, a typical PBM experiment can measure binding signal intensities of a protein to all possible DNA k-mers (k = 8∼10). Since proteins can often bind to DNA with different binding intensities, one of the major challenges is to build TFBS (also known as DNA motif) models which can fully capture the quantitative binding affinity data. To learn DNA motif models from the non-convex objective function landscape, several optimization methods are compared and applied to the PBM motif model building problem. In particular, representative methods from different optimization paradigms have been chosen for modeling performance comparison on hundreds of PBM datasets. The results suggest that the multimodal optimization methods are very effective for capturing the binding preference information from PBM data. In particular, we observe a general performance improvement if choosing di-nucleotide modeling over mono-nucleotide modeling. In addition, the models learned by the best-performing method are applied to two independent applications: PBM probe rotation testing and ChIP-Seq peak sequence prediction, demonstrating its biological applicability.
Sequence requirement of the ade6-4095 meiotic recombination hotspot in Schizosaccharomyces pombe.

PubMed

Foulis, Steven J; Fowler, Kyle R; Steiner, Walter W

2018-02-01

Homologous recombination occurs at a greatly elevated frequency in meiosis compared to mitosis and is initiated by programmed double-strand DNA breaks (DSBs). DSBs do not occur at uniform frequency throughout the genome in most organisms, but occur preferentially at a limited number of sites referred to as hotspots. The location of hotspots have been determined at nucleotide-level resolution in both the budding and fission yeasts, and while several patterns have emerged regarding preferred locations for DSB hotspots, it remains unclear why particular sites experience DSBs at much higher frequency than other sites with seemingly similar properties. Short sequence motifs, which are often sites for binding of transcription factors, are known to be responsible for a number of hotspots. In this study we identified the minimum sequence required for activity of one of such motif identified in a screen of random sequences capable of producing recombination hotspots. The experimentally determined sequence, GGTCTRGACC, closely matches the previously inferred sequence. Full hotspot activity requires an effective sequence length of 9.5 bp, whereas moderate activity requires an effective sequence length of approximately 8.2 bp and shows significant association with DSB hotspots. In combination with our previous work, this result is consistent with a large number of different sequence motifs capable of producing recombination hotspots, and supports a model in which hotspots can be rapidly regenerated by mutation as they are lost through recombination.
Sequential visibility-graph motifs

NASA Astrophysics Data System (ADS)

Iacovacci, Jacopo; Lacasa, Lucas

2016-04-01

Visibility algorithms transform time series into graphs and encode dynamical information in their topology, paving the way for graph-theoretical time series analysis as well as building a bridge between nonlinear dynamics and network science. In this work we introduce and study the concept of sequential visibility-graph motifs, smaller substructures of n consecutive nodes that appear with characteristic frequencies. We develop a theory to compute in an exact way the motif profiles associated with general classes of deterministic and stochastic dynamics. We find that this simple property is indeed a highly informative and computationally efficient feature capable of distinguishing among different dynamics and robust against noise contamination. We finally confirm that it can be used in practice to perform unsupervised learning, by extracting motif profiles from experimental heart-rate series and being able, accordingly, to disentangle meditative from other relaxation states. Applications of this general theory include the automatic classification and description of physical, biological, and financial time series.
Members of the Meloidogyne avirulence protein family contain multiple plant ligand-like motifs.

PubMed

Rutter, William B; Hewezi, Tarek; Maier, Tom R; Mitchum, Melissa G; Davis, Eric L; Hussey, Richard S; Baum, Thomas J

2014-08-01

Sedentary plant-parasitic nematodes engage in complex interactions with their host plants by secreting effector proteins. Some effectors of both root-knot nematodes (Meloidogyne spp.) and cyst nematodes (Heterodera and Globodera spp.) mimic plant ligand proteins. Most prominently, cyst nematodes secrete effectors that mimic plant CLAVATA3/ESR-related (CLE) ligand proteins. However, only cyst nematodes have been shown to secrete such effectors and to utilize CLE ligand mimicry in their interactions with host plants. Here, we document the presence of ligand-like motifs in bona fide root-knot nematode effectors that are most similar to CLE peptides from plants and cyst nematodes. We have identified multiple tandem CLE-like motifs conserved within the previously identified Meloidogyne avirulence protein (MAP) family that are secreted from root-knot nematodes and have been shown to function in planta. By searching all 12 MAP family members from multiple Meloidogyne spp., we identified 43 repetitive CLE-like motifs composing 14 unique variants. At least one CLE-like motif was conserved in each MAP family member. Furthermore, we documented the presence of other conserved sequences that resemble the variable domains described in Heterodera and Globodera CLE effectors. These findings document that root-knot nematodes appear to use CLE ligand mimicry and point toward a common host node targeted by two evolutionarily diverse groups of nematodes. As a consequence, it is likely that CLE signaling pathways are important in other phytonematode pathosystems as well.
Comparison of simple sequence repeats in 19 Archaea.

PubMed

Trivedi, S

2006-12-05

All organisms that have been studied until now have been found to have differential distribution of simple sequence repeats (SSRs), with more SSRs in intergenic than in coding sequences. SSR distribution was investigated in Archaea genomes where complete chromosome sequences of 19 Archaea were analyzed with the program SPUTNIK to find di- to penta-nucleotide repeats. The number of repeats was determined for the complete chromosome sequences and for the coding and non-coding sequences. Different from what has been found for other groups of organisms, there is an abundance of SSRs in coding regions of the genome of some Archaea. Dinucleotide repeats were rare and CG repeats were found in only two Archaea. In general, trinucleotide repeats are the most abundant SSR motifs; however, pentanucleotide repeats are abundant in some Archaea. Some of the tetranucleotide and pentanucleotide repeat motifs are organism specific. In general, repeats are short and CG-rich repeats are present in Archaea having a CG-rich genome. Among the 19 Archaea, SSR density was not correlated with genome size or with optimum growth temperature. Pentanucleotide density had an inverse correlation with the CG content of the genome.
Sequence analysis reveals genomic factors affecting EST-SSR primer performance and polymorphism

USDA-ARS?s Scientific Manuscript database

Search for simple sequence repeat (SSR) motifs and design of flanking primers in expressed sequence tag (EST) sequences can be easily done at a large scale using bioinformatics programs. However, failed amplification and/or detection, along with lack of polymorphism, is often seen among randomly sel...
Role of sequence encoded κB DNA geometry in gene regulation by Dorsal

PubMed Central

Mrinal, Nirotpal; Tomar, Archana; Nagaraju, Javaregowda

2011-01-01

Many proteins of the Rel family can act as both transcriptional activators and repressors. However, mechanism that discerns the ‘activator/repressor’ functions of Rel-proteins such as Dorsal (Drosophila homologue of mammalian NFκB) is not understood. Using genomic, biophysical and biochemical approaches, we demonstrate that the underlying principle of this functional specificity lies in the ‘sequence-encoded structure’ of the κB-DNA. We show that Dorsal-binding motifs exist in distinct activator and repressor conformations. Molecular dynamics of DNA-Dorsal complexes revealed that repressor κB-motifs typically have A-tract and flexible conformation that facilitates interaction with co-repressors. Deformable structure of repressor motifs, is due to changes in the hydrogen bonding in A:T pair in the ‘A-tract’ core. The sixth nucleotide in the nonameric κB-motif, ‘A’ (A6) in the repressor motifs and ‘T’ (T6) in the activator motifs, is critical to confer this functional specificity as A6 → T6 mutation transformed flexible repressor conformation into a rigid activator conformation. These results highlight that ‘sequence encoded κB DNA-geometry’ regulates gene expression by exerting allosteric effect on binding of Rel proteins which in turn regulates interaction with co-regulators. Further, we identified and characterized putative repressor motifs in Dl-target genes, which can potentially aid in functional annotation of Dorsal gene regulatory network. PMID:21890896
Pathogen recognition of a novel C-type lectin from Marsupenaeus japonicus reveals the divergent sugar-binding specificity of QAP motif.

PubMed

Alenton, Rod Russel R; Koiwai, Keiichiro; Miyaguchi, Kohei; Kondo, Hidehiro; Hirono, Ikuo

2017-04-04

C-type lectins (CTLs) are calcium-dependent carbohydrate-binding proteins known to assist the innate immune system as pattern recognition receptors (PRRs). The binding specificity of CTLs lies in the motif of their carbohydrate recognition domain (CRD), the tripeptide motifs EPN and QPD bind to mannose and galactose, respectively. However, variants of these motifs were discovered including a QAP sequence reported in shrimp believed to have the same carbohydrate specificity as QPD. Here, we characterized a novel C-type lectin (MjGCTL) possessing a CRD with a QAP motif. The recombinant MjGCTL has a calcium-dependent agglutinating capability against both Gram-negative and Gram-positive bacteria, and its sugar specificity did not involve either mannose or galactose. In an encapsulation assay, agarose beads coated with rMjGCTL were immediately encapsulated from 0 h followed by melanization at 4 h post-incubation with hemocytes. These results confirm that MjGCTL functions as a classical CTL. The structure of QAP motif and carbohydrate-specificity of rMjGCTL was found to be different to both EPN and QPD, suggesting that QAP is a new motif. Furthermore, MjGCTL acts as a PRR binding to hemocytes to activate their adherent state and initiate encapsulation.
Pathogen recognition of a novel C-type lectin from Marsupenaeus japonicus reveals the divergent sugar-binding specificity of QAP motif

PubMed Central

Alenton, Rod Russel R.; Koiwai, Keiichiro; Miyaguchi, Kohei; Kondo, Hidehiro; Hirono, Ikuo

2017-01-01

C-type lectins (CTLs) are calcium-dependent carbohydrate-binding proteins known to assist the innate immune system as pattern recognition receptors (PRRs). The binding specificity of CTLs lies in the motif of their carbohydrate recognition domain (CRD), the tripeptide motifs EPN and QPD bind to mannose and galactose, respectively. However, variants of these motifs were discovered including a QAP sequence reported in shrimp believed to have the same carbohydrate specificity as QPD. Here, we characterized a novel C-type lectin (MjGCTL) possessing a CRD with a QAP motif. The recombinant MjGCTL has a calcium-dependent agglutinating capability against both Gram-negative and Gram-positive bacteria, and its sugar specificity did not involve either mannose or galactose. In an encapsulation assay, agarose beads coated with rMjGCTL were immediately encapsulated from 0 h followed by melanization at 4 h post-incubation with hemocytes. These results confirm that MjGCTL functions as a classical CTL. The structure of QAP motif and carbohydrate-specificity of rMjGCTL was found to be different to both EPN and QPD, suggesting that QAP is a new motif. Furthermore, MjGCTL acts as a PRR binding to hemocytes to activate their adherent state and initiate encapsulation. PMID:28374848
Multiple Dileucine-like Motifs Direct VGLUT1 Trafficking

PubMed Central

Foss, Sarah M.; Li, Haiyan; Santos, Magda S.; Edwards, Robert H.

2013-01-01

The vesicular glutamate transporters (VGLUTs) package glutamate into synaptic vesicles, and the two principal isoforms VGLUT1 and VGLUT2 have been suggested to influence the properties of release. To understand how a VGLUT isoform might influence transmitter release, we have studied their trafficking and previously identified a dileucine-like endocytic motif in the C terminus of VGLUT1. Disruption of this motif impairs the activity-dependent recycling of VGLUT1, but does not eliminate its endocytosis. We now report the identification of two additional dileucine-like motifs in the N terminus of VGLUT1 that are not well conserved in the other isoforms. In the absence of all three motifs, rat VGLUT1 shows limited accumulation at synaptic sites and no longer responds to stimulation. In addition, shRNA-mediated knockdown of clathrin adaptor proteins AP-1 and AP-2 shows that the C-terminal motif acts largely via AP-2, whereas the N-terminal motifs use AP-1. Without the C-terminal motif, knockdown of AP-1 reduces the proportion of VGLUT1 that responds to stimulation. VGLUT1 thus contains multiple sorting signals that engage distinct trafficking mechanisms. In contrast to VGLUT1, the trafficking of VGLUT2 depends almost entirely on the conserved C-terminal dileucine-like motif: without this motif, a substantial fraction of VGLUT2 redistributes to the plasma membrane and the transporter's synaptic localization is disrupted. Consistent with these differences in trafficking signals, wild-type VGLUT1 and VGLUT2 differ in their response to stimulation. PMID:23804088
Multiple dileucine-like motifs direct VGLUT1 trafficking.

PubMed

Foss, Sarah M; Li, Haiyan; Santos, Magda S; Edwards, Robert H; Voglmaier, Susan M

2013-06-26

The vesicular glutamate transporters (VGLUTs) package glutamate into synaptic vesicles, and the two principal isoforms VGLUT1 and VGLUT2 have been suggested to influence the properties of release. To understand how a VGLUT isoform might influence transmitter release, we have studied their trafficking and previously identified a dileucine-like endocytic motif in the C terminus of VGLUT1. Disruption of this motif impairs the activity-dependent recycling of VGLUT1, but does not eliminate its endocytosis. We now report the identification of two additional dileucine-like motifs in the N terminus of VGLUT1 that are not well conserved in the other isoforms. In the absence of all three motifs, rat VGLUT1 shows limited accumulation at synaptic sites and no longer responds to stimulation. In addition, shRNA-mediated knockdown of clathrin adaptor proteins AP-1 and AP-2 shows that the C-terminal motif acts largely via AP-2, whereas the N-terminal motifs use AP-1. Without the C-terminal motif, knockdown of AP-1 reduces the proportion of VGLUT1 that responds to stimulation. VGLUT1 thus contains multiple sorting signals that engage distinct trafficking mechanisms. In contrast to VGLUT1, the trafficking of VGLUT2 depends almost entirely on the conserved C-terminal dileucine-like motif: without this motif, a substantial fraction of VGLUT2 redistributes to the plasma membrane and the transporter's synaptic localization is disrupted. Consistent with these differences in trafficking signals, wild-type VGLUT1 and VGLUT2 differ in their response to stimulation.
MIR@NT@N: a framework integrating transcription factors, microRNAs and their targets to identify sub-network motifs in a meta-regulation network model

PubMed Central

2011-01-01

Background To understand biological processes and diseases, it is crucial to unravel the concerted interplay of transcription factors (TFs), microRNAs (miRNAs) and their targets within regulatory networks and fundamental sub-networks. An integrative computational resource generating a comprehensive view of these regulatory molecular interactions at a genome-wide scale would be of great interest to biologists, but is not available to date. Results To identify and analyze molecular interaction networks, we developed MIR@NT@N, an integrative approach based on a meta-regulation network model and a large-scale database. MIR@NT@N uses a graph-based approach to predict novel molecular actors across multiple regulatory processes (i.e. TFs acting on protein-coding or miRNA genes, or miRNAs acting on messenger RNAs). Exploiting these predictions, the user can generate networks and further analyze them to identify sub-networks, including motifs such as feedback and feedforward loops (FBL and FFL). In addition, networks can be built from lists of molecular actors with an a priori role in a given biological process to predict novel and unanticipated interactions. Analyses can be contextualized and filtered by integrating additional information such as microarray expression data. All results, including generated graphs, can be visualized, saved and exported into various formats. MIR@NT@N performances have been evaluated using published data and then applied to the regulatory program underlying epithelium to mesenchyme transition (EMT), an evolutionary-conserved process which is implicated in embryonic development and disease. Conclusions MIR@NT@N is an effective computational approach to identify novel molecular regulations and to predict gene regulatory networks and sub-networks including conserved motifs within a given biological context. Taking advantage of the M@IA environment, MIR@NT@N is a user-friendly web resource freely available at http://mironton.uni.lu which will be
DNA Sequences from Formalin-Fixed Nematodes: Integrating Molecular and Morphological Approaches to Taxonomy

PubMed Central

Thomas, W. Kelley; Vida, J. T.; Frisse, Linda M.; Mundo, Manuel; Baldwin, James G.

1997-01-01

To effectively integrate DNA sequence analysis and classical nematode taxonomy, we must be able to obtain DNA sequences from formalin-fixed specimens. Microdissected sections of nematodes were removed from specimens fixed in formalin, using standard protocols and without destroying morphological features. The fixed sections provided sufficient template for multiple polymerase chain reaction-based DNA sequence analyses. PMID:19274156
Fragile DNA Motifs Trigger Mutagenesis at Distant Chromosomal Loci in Saccharomyces cerevisiae

PubMed Central

Saini, Natalie; Zhang, Yu; Nishida, Yuri; Sheng, Ziwei; Choudhury, Shilpa; Mieczkowski, Piotr; Lobachev, Kirill S.

2013-01-01

DNA sequences capable of adopting non-canonical secondary structures have been associated with gross-chromosomal rearrangements in humans and model organisms. Previously, we have shown that long inverted repeats that form hairpin and cruciform structures and triplex-forming GAA/TTC repeats induce the formation of double-strand breaks which trigger genome instability in yeast. In this study, we demonstrate that breakage at both inverted repeats and GAA/TTC repeats is augmented by defects in DNA replication. Increased fragility is associated with increased mutation levels in the reporter genes located as far as 8 kb from both sides of the repeats. The increase in mutations was dependent on the presence of inverted or GAA/TTC repeats and activity of the translesion polymerase Polζ. Mutagenesis induced by inverted repeats also required Sae2 which opens hairpin-capped breaks and initiates end resection. The amount of breakage at the repeats is an important determinant of mutations as a perfect palindromic sequence with inherently increased fragility was also found to elevate mutation rates even in replication-proficient strains. We hypothesize that the underlying mechanism for mutagenesis induced by fragile motifs involves the formation of long single-stranded regions in the broken chromosome, invasion of the undamaged sister chromatid for repair, and faulty DNA synthesis employing Polζ. These data demonstrate that repeat-mediated breaks pose a dual threat to eukaryotic genome integrity by inducing chromosomal aberrations as well as mutations in flanking genes. PMID:23785298
Ca2+-Induced Rigidity Change of the Myosin VIIa IQ Motif-Single α Helix Lever Arm Extension.

PubMed

Li, Jianchao; Chen, Yiyun; Deng, Yisong; Unarta, Ilona Christy; Lu, Qing; Huang, Xuhui; Zhang, Mingjie

2017-04-04

Several unconventional myosins contain a highly charged single α helix (SAH) immediately following the calmodulin (CaM) binding IQ motifs, functioning to extend lever arms of these myosins. How such SAH is connected to the IQ motifs and whether the conformation of the IQ motifs-SAH segments are regulated by Ca 2+ fluctuations are not known. Here, we demonstrate by solving its crystal structure that the predicted SAH of myosin VIIa (Myo7a) forms a stable SAH. The structure of Myo7a IQ5-SAH segment in complex with apo-CaM reveals that the SAH sequence can extend the length of the Myo7a lever arm. Although Ca 2+ -CaM remains bound to IQ5-SAH, the Ca 2+ -induced CaM binding mode change softens the conformation of the IQ5-SAH junction, revealing a Ca 2+ -induced lever arm flexibility change for Myo7a. We further demonstrate that the last IQ motif of several other myosins also binds to both apo- and Ca 2+ -CaM, suggesting a common Ca 2+ -induced conformational regulation mechanism. Copyright © 2017 Elsevier Ltd. All rights reserved.
Structural basis for genome wide recognition of 5-bp GC motifs by SMAD transcription factors.

PubMed

Martin-Malpartida, Pau; Batet, Marta; Kaczmarska, Zuzanna; Freier, Regina; Gomes, Tiago; Aragón, Eric; Zou, Yilong; Wang, Qiong; Xi, Qiaoran; Ruiz, Lidia; Vea, Angela; Márquez, José A; Massagué, Joan; Macias, Maria J

2017-12-12

Smad transcription factors activated by TGF-β or by BMP receptors form trimeric complexes with Smad4 to target specific genes for cell fate regulation. The CAGAC motif has been considered as the main binding element for Smad2/3/4, whereas Smad1/5/8 have been thought to preferentially bind GC-rich elements. However, chromatin immunoprecipitation analysis in embryonic stem cells showed extensive binding of Smad2/3/4 to GC-rich cis-regulatory elements. Here, we present the structural basis for specific binding of Smad3 and Smad4 to GC-rich motifs in the goosecoid promoter, a nodal-regulated differentiation gene. The structures revealed a 5-bp consensus sequence GGC(GC)|(CG) as the binding site for both TGF-β and BMP-activated Smads and for Smad4. These 5GC motifs are highly represented as clusters in Smad-bound regions genome-wide. Our results provide a basis for understanding the functional adaptability of Smads in different cellular contexts, and their dependence on lineage-determining transcription factors to target specific genes in TGF-β and BMP pathways.
CORE-SINEs: eukaryotic short interspersed retroposing elements with common sequence motifs.

PubMed

Gilbert, N; Labuda, D

1999-03-16

A 65-bp "core" sequence is dispersed in hundreds of thousands copies in the human genome. This sequence was found to constitute the central segment of a group of short interspersed elements (SINEs), referred to as mammalian-wide interspersed repeats, that proliferated before the radiation of placental mammals. Here, we propose that the core identifies an ancient tRNA-like SINE element, which survived in different lineages such as mammals, reptiles, birds, and fish, as well as mollusks, presumably for >550 million years. This element gave rise to a number of sequence families (CORE-SINEs), including mammalian-wide interspersed repeats, whose distinct 3' ends are shared with different families of long interspersed elements (LINEs). The evolutionary success of the generic CORE-SINE element can be related to the recruitment of the internal promoter from highly transcribed host RNA as well as to its capacity to adapt to changing retropositional opportunities by sequence exchange with actively amplifying LINEs. It reinforces the notion that the very existence of SINEs depends on the cohabitation with both LINEs and the host genome.
CORE-SINEs: Eukaryotic short interspersed retroposing elements with common sequence motifs

PubMed Central

Gilbert, Nicolas; Labuda, Damian

1999-01-01

A 65-bp “core” sequence is dispersed in hundreds of thousands copies in the human genome. This sequence was found to constitute the central segment of a group of short interspersed elements (SINEs), referred to as mammalian-wide interspersed repeats, that proliferated before the radiation of placental mammals. Here, we propose that the core identifies an ancient tRNA-like SINE element, which survived in different lineages such as mammals, reptiles, birds, and fish, as well as mollusks, presumably for >550 million years. This element gave rise to a number of sequence families (CORE-SINEs), including mammalian-wide interspersed repeats, whose distinct 3′ ends are shared with different families of long interspersed elements (LINEs). The evolutionary success of the generic CORE-SINE element can be related to the recruitment of the internal promoter from highly transcribed host RNA as well as to its capacity to adapt to changing retropositional opportunities by sequence exchange with actively amplifying LINEs. It reinforces the notion that the very existence of SINEs depends on the cohabitation with both LINEs and the host genome. PMID:10077603
Simple sequence repeat marker development from bacterial artificial chromosome end sequences and expressed sequence tags of flax (Linum usitatissimum L.).

PubMed

Cloutier, Sylvie; Miranda, Evelyn; Ward, Kerry; Radovanovic, Natasa; Reimer, Elsa; Walichnowski, Andrzej; Datla, Raju; Rowland, Gordon; Duguid, Scott; Ragupathy, Raja

2012-08-01

Flax is an important oilseed crop in North America and is mostly grown as a fibre crop in Europe. As a self-pollinated diploid with a small estimated genome size of ~370 Mb, flax is well suited for fast progress in genomics. In the last few years, important genetic resources have been developed for this crop. Here, we describe the assessment and comparative analyses of 1,506 putative simple sequence repeats (SSRs) of which, 1,164 were derived from BAC-end sequences (BESs) and 342 from expressed sequence tags (ESTs). The SSRs were assessed on a panel of 16 flax accessions with 673 (58 %) and 145 (42 %) primer pairs being polymorphic in the BESs and ESTs, respectively. With 818 novel polymorphic SSR primer pairs reported in this study, the repertoire of available SSRs in flax has more than doubled from the combined total of 508 of all previous reports. Among nucleotide motifs, trinucleotides were the most abundant irrespective of the class, but dinucleotides were the most polymorphic. SSR length was also positively correlated with polymorphism. Two dinucleotide (AT/TA and AG/GA) and two trinucleotide (AAT/ATA/TAA and GAA/AGA/AAG) motifs and their iterations, different from those reported in many other crops, accounted for more than half of all the SSRs and were also more polymorphic (63.4 %) than the rest of the markers (42.7 %). This improved resource promises to be useful in genetic, quantitative trait loci (QTL) and association mapping as well as for anchoring the physical/genetic map with the whole genome shotgun reference sequence of flax.
Motif-independent prediction of a secondary metabolism gene cluster using comparative genomics: application to sequenced genomes of Aspergillus and ten other filamentous fungal species.

PubMed

Takeda, Itaru; Umemura, Myco; Koike, Hideaki; Asai, Kiyoshi; Machida, Masayuki

2014-08-01

Despite their biological importance, a significant number of genes for secondary metabolite biosynthesis (SMB) remain undetected due largely to the fact that they are highly diverse and are not expressed under a variety of cultivation conditions. Several software tools including SMURF and antiSMASH have been developed to predict fungal SMB gene clusters by finding core genes encoding polyketide synthase, nonribosomal peptide synthetase and dimethylallyltryptophan synthase as well as several others typically present in the cluster. In this work, we have devised a novel comparative genomics method to identify SMB gene clusters that is independent of motif information of the known SMB genes. The method detects SMB gene clusters by searching for a similar order of genes and their presence in nonsyntenic blocks. With this method, we were able to identify many known SMB gene clusters with the core genes in the genomic sequences of 10 filamentous fungi. Furthermore, we have also detected SMB gene clusters without core genes, including the kojic acid biosynthesis gene cluster of Aspergillus oryzae. By varying the detection parameters of the method, a significant difference in the sequence characteristics was detected between the genes residing inside the clusters and those outside the clusters. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

A Conserved Metal Binding Motif in the Bacillus subtilis Competence Protein ComFA Enhances Transformation.

PubMed

Chilton, Scott S; Falbel, Tanya G; Hromada, Susan; Burton, Briana M

2017-08-01

Genetic competence is a process in which cells are able to take up DNA from their environment, resulting in horizontal gene transfer, a major mechanism for generating diversity in bacteria. Many bacteria carry homologs of the central DNA uptake machinery that has been well characterized in Bacillus subtilis It has been postulated that the B. subtilis competence helicase ComFA belongs to the DEAD box family of helicases/translocases. Here, we made a series of mutants to analyze conserved amino acid motifs in several regions of B. subtilis ComFA. First, we confirmed that ComFA activity requires amino acid residues conserved among the DEAD box helicases, and second, we show that a zinc finger-like motif consisting of four cysteines is required for efficient transformation. Each cysteine in the motif is important, and mutation of at least two of the cysteines dramatically reduces transformation efficiency. Further, combining multiple cysteine mutations with the helicase mutations shows an additive phenotype. Our results suggest that the helicase and metal binding functions are two distinct activities important for ComFA function during transformation. IMPORTANCE ComFA is a highly conserved protein that has a role in DNA uptake during natural competence, a mechanism for horizontal gene transfer observed in many bacteria. Investigation of the details of the DNA uptake mechanism is important for understanding the ways in which bacteria gain new traits from their environment, such as drug resistance. To dissect the role of ComFA in the DNA uptake machinery, we introduced point mutations into several motifs in the protein sequence. We demonstrate that several amino acid motifs conserved among ComFA proteins are important for efficient transformation. This report is the first to demonstrate the functional requirement of an amino-terminal cysteine motif in ComFA. Copyright © 2017 American Society for Microbiology.
Identification of high-efficiency 3'GG gRNA motifs in indexed FASTA files with ngg2.

PubMed

Roberson, Elisha D O

CRISPR/Cas9 is emerging as one of the most-used methods of genome modification in organisms ranging from bacteria to human cells. However, the efficiency of editing varies tremendously site-to-site. A recent report identified a novel motif, called the 3'GG motif, which substantially increases the efficiency of editing at all sites tested in C. elegans . Furthermore, they highlighted that previously published gRNAs with high editing efficiency also had this motif. I designed a python command-line tool, ngg2, to identify 3'GG gRNA sites from indexed FASTA files. As a proof-of-concept, I screened for these motifs in six model genomes: Saccharomyces cerevisiae , Caenorhabditis elegans , Drosophila melanogaster , Danio rerio , Mus musculus , and Homo sapiens. I also scanned the genomes of pig ( Sus scrofa ) and African elephant ( Loxodonta africana ) to demonstrate the utility in non-model organisms. I identified more than 60 million single match 3'GG motifs in these genomes. Greater than 61% of all protein coding genes in the reference genomes had at least one unique 3'GG gRNA site overlapping an exon. In particular, more than 96% of mouse and 93% of human protein coding genes have at least one unique, overlapping 3'GG gRNA. These identified sites can be used as a starting point in gRNA selection, and the ngg2 tool provides an important ability to identify 3'GG editing sites in any species with an available genome sequence.
Functional structural motifs for protein-ligand, protein-protein, and protein-nucleic acid interactions and their connection to supersecondary structures.

PubMed

Kinjo, Akira R; Nakamura, Haruki

2013-01-01

Protein functions are mediated by interactions between proteins and other molecules. One useful approach to analyze protein functions is to compare and classify the structures of interaction interfaces of proteins. Here, we describe the procedures for compiling a database of interface structures and efficiently comparing the interface structures. To do so requires a good understanding of the data structures of the Protein Data Bank (PDB). Therefore, we also provide a detailed account of the PDB exchange dictionary necessary for extracting data that are relevant for analyzing interaction interfaces and secondary structures. We identify recurring structural motifs by classifying similar interface structures, and we define a coarse-grained representation of supersecondary structures (SSS) which represents a sequence of two or three secondary structure elements including their relative orientations as a string of four to seven letters. By examining the correspondence between structural motifs and SSS strings, we show that no SSS string has particularly high propensity to be found interaction interfaces in general, indicating any SSS can be used as a binding interface. When individual structural motifs are examined, there are some SSS strings that have high propensity for particular groups of structural motifs. In addition, it is shown that while the SSS strings found in particular structural motifs for nonpolymer and protein interfaces are as abundant as in other structural motifs that belong to the same subunit, structural motifs for nucleic acid interfaces exhibit somewhat stronger preference for SSS strings. In regard to protein folds, many motif-specific SSS strings were found across many folds, suggesting that SSS may be a useful description to investigate the universality of ligand binding modes.
An evolutionarily conserved motif in the TAB1 C-terminal region is necessary for interaction with and activation of TAK1 MAPKKK.

PubMed

Ono, K; Ohtomo, T; Sato, S; Sugamata, Y; Suzuki, M; Hisamoto, N; Ninomiya-Tsuji, J; Tsuchiya, M; Matsumoto, K

2001-06-29

TAK1, a member of the MAPKKK family, is involved in the intracellular signaling pathways mediated by transforming growth factor beta, interleukin 1, and Wnt. TAK1 kinase activity is specifically activated by the TAK1-binding protein TAB1. The C-terminal 68-amino acid sequence of TAB1 (TAB1-C68) is sufficient for TAK1 interaction and activation. Analysis of various truncated versions of TAB1-C68 defined a C-terminal 30-amino acid sequence (TAB1-C30) necessary for TAK1 binding and activation. NMR studies revealed that the TAB1-C30 region has a unique alpha-helical structure. We identified a conserved sequence motif, PYVDXA/TXF, in the C-terminal domain of mammalian TAB1, Xenopus TAB1, and its Caenorhabditis elegans homolog TAP-1, suggesting that this motif constitutes a specific TAK1 docking site. Alanine substitution mutagenesis showed that TAB1 Phe-484, located in the conserved motif, is crucial for TAK1 binding and activation. The C. elegans homolog of TAB1, TAP-1, was able to interact with and activate the C. elegans homolog of TAK1, MOM-4. However, the site in TAP-1 corresponding to Phe-484 of TAB1 is an alanine residue (Ala-364), and changing this residue to Phe abrogates the ability of TAP-1 to interact with and activate MOM-4. These results suggest that the Phe or Ala residue within the conserved motif of the TAB1-related proteins is important for interaction with and activation of specific TAK1 MAPKKK family members in vivo.
Mapping and analysis of Caenorhabditis elegans transcription factor sequence specificities

PubMed Central

Narasimhan, Kamesh; Lambert, Samuel A; Yang, Ally WH; Riddell, Jeremy; Mnaimneh, Sanie; Zheng, Hong; Albu, Mihai; Najafabadi, Hamed S; Reece-Hoyes, John S; Fuxman Bass, Juan I; Walhout, Albertha JM; Weirauch, Matthew T; Hughes, Timothy R

2015-01-01

Caenorhabditis elegans is a powerful model for studying gene regulation, as it has a compact genome and a wealth of genomic tools. However, identification of regulatory elements has been limited, as DNA-binding motifs are known for only 71 of the estimated 763 sequence-specific transcription factors (TFs). To address this problem, we performed protein binding microarray experiments on representatives of canonical TF families in C. elegans, obtaining motifs for 129 TFs. Additionally, we predict motifs for many TFs that have DNA-binding domains similar to those already characterized, increasing coverage of binding specificities to 292 C. elegans TFs (∼40%). These data highlight the diversification of binding motifs for the nuclear hormone receptor and C2H2 zinc finger families and reveal unexpected diversity of motifs for T-box and DM families. Motif enrichment in promoters of functionally related genes is consistent with known biology and also identifies putative regulatory roles for unstudied TFs. DOI: http://dx.doi.org/10.7554/eLife.06967.001 PMID:25905672
SiteBinder: an improved approach for comparing multiple protein structural motifs.

PubMed

Sehnal, David; Vařeková, Radka Svobodová; Huber, Heinrich J; Geidl, Stanislav; Ionescu, Crina-Maria; Wimmerová, Michaela; Koča, Jaroslav

2012-02-27

There is a paramount need to develop new techniques and tools that will extract as much information as possible from the ever growing repository of protein 3D structures. We report here on the development of a software tool for the multiple superimposition of large sets of protein structural motifs. Our superimposition methodology performs a systematic search for the atom pairing that provides the best fit. During this search, the RMSD values for all chemically relevant pairings are calculated by quaternion algebra. The number of evaluated pairings is markedly decreased by using PDB annotations for atoms. This approach guarantees that the best fit will be found and can be applied even when sequence similarity is low or does not exist at all. We have implemented this methodology in the Web application SiteBinder, which is able to process up to thousands of protein structural motifs in a very short time, and which provides an intuitive and user-friendly interface. Our benchmarking analysis has shown the robustness, efficiency, and versatility of our methodology and its implementation by the successful superimposition of 1000 experimentally determined structures for each of 32 eukaryotic linear motifs. We also demonstrate the applicability of SiteBinder using three case studies. We first compared the structures of 61 PA-IIL sugar binding sites containing nine different sugars, and we found that the sugar binding sites of PA-IIL and its mutants have a conserved structure despite their binding different sugars. We then superimposed over 300 zinc finger central motifs and revealed that the molecular structure in the vicinity of the Zn atom is highly conserved. Finally, we superimposed 12 BH3 domains from pro-apoptotic proteins. Our findings come to support the hypothesis that there is a structural basis for the functional segregation of BH3-only proteins into activators and enablers.
The presence of the ancestral insect telomeric motif in kissing bugs (Triatominae) rules out the hypothesis of its loss in evolutionarily advanced Heteroptera (Cimicomorpha)

PubMed Central

Pita, Sebastián; Panzera, Francisco; Mora, Pablo; Vela, Jesús; Palomeque, Teresa; Lorite, Pedro

2016-01-01

Abstract Next-generation sequencing data analysis on Triatoma infestans Klug, 1834 (Heteroptera, Cimicomorpha, Reduviidae) revealed the presence of the ancestral insect (TTAGG)n telomeric motif in its genome. Fluorescence in situ hybridization confirms that chromosomes bear this telomeric sequence in their chromosomal ends. Furthermore, motif amount estimation was about 0.03% of the total genome, so that the average telomere length in each chromosomal end is almost 18 kb long. We also detected the presence of (TTAGG)n telomeric repeat in mitotic and meiotic chromosomes in other three species of Triatominae: Triatoma dimidiata Latreille, 1811, Dipetalogaster maxima Uhler, 1894, and Rhodnius prolixus Ståhl, 1859. This is the first report of the (TTAGG)n telomeric repeat in the infraorder Cimicomorpha, contradicting the currently accepted hypothesis that evolutionarily recent heteropterans lack this ancestral insect telomeric sequence. PMID:27830050
A Glance at Microsatellite Motifs from 454 Sequencing Reads of Watermelon Genomic DNA

USDA-ARS?s Scientific Manuscript database

A single 454 (Life Sciences Sequencing Technology) run of Charleston Gray watermelon (Citrullus lanatus var. lanatus) genomic DNA was performed and sequence data were assembled. A large scale identification of simple sequence repeat (SSR) was performed and SSR sequence data were used for the develo...
PUTATIVE GENE PROMOTER SEQUENCES IN THE CHLORELLA VIRUSES

PubMed Central

Fitzgerald, Lisa A.; Boucher, Philip T.; Yanai-Balser, Giane; Suhre, Karsten; Graves, Michael V.; Van Etten, James L.

2008-01-01

Three short (7 to 9 nucleotides) highly conserved nucleotide sequences were identified in the putative promoter regions (150 bp upstream and 50 bp downstream of the ATG translation start site) of three members of the genus Chlorovirus, family Phycodnaviridae. Most of these sequences occurred in similar locations within the defined promoter regions. The sequence and location of the motifs were often conserved among homologous ORFs within the Chlorovirus family. One of these conserved sequences (AATGACA) is predominately associated with genes expressed early in virus replication. PMID:18768195
Structural Integrity of the Greek Key Motif in βγ-Crystallins Is Vital for Central Eye Lens Transparency

PubMed Central

Vendra, Venkata Pulla Rao; Agarwal, Garima; Chandani, Sushil; Talla, Venu; Srinivasan, Narayanaswamy; Balasubramanian, Dorairajan

2013-01-01

Background We highlight an unrecognized physiological role for the Greek key motif, an evolutionarily conserved super-secondary structural topology of the βγ-crystallins. These proteins constitute the bulk of the human eye lens, packed at very high concentrations in a compact, globular, short-range order, generating transparency. Congenital cataract (affecting 400,000 newborns yearly worldwide), associated with 54 mutations in βγ-crystallins, occurs in two major phenotypes nuclear cataract, which blocks the central visual axis, hampering the development of the growing eye and demanding earliest intervention, and the milder peripheral progressive cataract where surgery can wait. In order to understand this phenotypic dichotomy at the molecular level, we have studied the structural and aggregation features of representative mutations. Methods Wild type and several representative mutant proteins were cloned, expressed and purified and their secondary and tertiary structural details, as well as structural stability, were compared in solution, using spectroscopy. Their tendencies to aggregate in vitro and in cellulo were also compared. In addition, we analyzed their structural differences by molecular modeling in silico. Results Based on their properties, mutants are seen to fall into two classes. Mutants A36P, L45PL54P, R140X, and G165fs display lowered solubility and structural stability, expose several buried residues to the surface, aggregate in vitro and in cellulo, and disturb/distort the Greek key motif. And they are associated with nuclear cataract. In contrast, mutants P24T and R77S, associated with peripheral cataract, behave quite similar to the wild type molecule, and do not affect the Greek key topology. Conclusion When a mutation distorts even one of the four Greek key motifs, the protein readily self-aggregates and precipitates, consistent with the phenotype of nuclear cataract, while mutations not affecting the motif display ‘native state
Methods for sequencing GC-rich and CCT repeat DNA templates

DOEpatents

Robinson, Donna L.

2007-02-20

The present invention is directed to a PCR-based method of cycle sequencing DNA and other polynucleotide sequences having high CG content and regions of high GC content, and includes for example DNA strands with a high Cytosine and/or Guanosine content and repeated motifs such as CCT repeats.
Exact calculation of distributions on integers, with application to sequence alignment.

PubMed

Newberg, Lee A; Lawrence, Charles E

2009-01-01

Computational biology is replete with high-dimensional discrete prediction and inference problems. Dynamic programming recursions can be applied to several of the most important of these, including sequence alignment, RNA secondary-structure prediction, phylogenetic inference, and motif finding. In these problems, attention is frequently focused on some scalar quantity of interest, a score, such as an alignment score or the free energy of an RNA secondary structure. In many cases, score is naturally defined on integers, such as a count of the number of pairing differences between two sequence alignments, or else an integer score has been adopted for computational reasons, such as in the test of significance of motif scores. The probability distribution of the score under an appropriate probabilistic model is of interest, such as in tests of significance of motif scores, or in calculation of Bayesian confidence limits around an alignment. Here we present three algorithms for calculating the exact distribution of a score of this type; then, in the context of pairwise local sequence alignments, we apply the approach so as to find the alignment score distribution and Bayesian confidence limits.
Conserved DNA motifs in the type II-A CRISPR leader region.

PubMed

Van Orden, Mason J; Klein, Peter; Babu, Kesavan; Najar, Fares Z; Rajan, Rakhi

2017-01-01

The Clustered Regularly Interspaced Short Palindromic Repeats associated (CRISPR-Cas) systems consist of RNA-protein complexes that provide bacteria and archaea with sequence-specific immunity against bacteriophages, plasmids, and other mobile genetic elements. Bacteria and archaea become immune to phage or plasmid infections by inserting short pieces of the intruder DNA (spacer) site-specifically into the leader-repeat junction in a process called adaptation. Previous studies have shown that parts of the leader region, especially the 3' end of the leader, are indispensable for adaptation. However, a comprehensive analysis of leader ends remains absent. Here, we have analyzed the leader, repeat, and Cas proteins from 167 type II-A CRISPR loci. Our results indicate two distinct conserved DNA motifs at the 3' leader end: ATTTGAG (noted previously in the CRISPR1 locus of Streptococcus thermophilus DGCC7710) and a newly defined CTRCGAG, associated with the CRISPR3 locus of S. thermophilus DGCC7710. A third group with a very short CG DNA conservation at the 3' leader end is observed mostly in lactobacilli. Analysis of the repeats and Cas proteins revealed clustering of these CRISPR components that mirrors the leader motif clustering, in agreement with the coevolution of CRISPR-Cas components. Based on our analysis of the type II-A CRISPR loci, we implicate leader end sequences that could confer site-specificity for the adaptation-machinery in the different subsets of type II-A CRISPR loci.
Conserved DNA motifs in the type II-A CRISPR leader region

PubMed Central

Babu, Kesavan; Najar, Fares Z.

2017-01-01

The Clustered Regularly Interspaced Short Palindromic Repeats associated (CRISPR-Cas) systems consist of RNA-protein complexes that provide bacteria and archaea with sequence-specific immunity against bacteriophages, plasmids, and other mobile genetic elements. Bacteria and archaea become immune to phage or plasmid infections by inserting short pieces of the intruder DNA (spacer) site-specifically into the leader-repeat junction in a process called adaptation. Previous studies have shown that parts of the leader region, especially the 3′ end of the leader, are indispensable for adaptation. However, a comprehensive analysis of leader ends remains absent. Here, we have analyzed the leader, repeat, and Cas proteins from 167 type II-A CRISPR loci. Our results indicate two distinct conserved DNA motifs at the 3′ leader end: ATTTGAG (noted previously in the CRISPR1 locus of Streptococcus thermophilus DGCC7710) and a newly defined CTRCGAG, associated with the CRISPR3 locus of S. thermophilus DGCC7710. A third group with a very short CG DNA conservation at the 3′ leader end is observed mostly in lactobacilli. Analysis of the repeats and Cas proteins revealed clustering of these CRISPR components that mirrors the leader motif clustering, in agreement with the coevolution of CRISPR-Cas components. Based on our analysis of the type II-A CRISPR loci, we implicate leader end sequences that could confer site-specificity for the adaptation-machinery in the different subsets of type II-A CRISPR loci. PMID:28392985
Use of Limited Proteolysis and Mutagenesis To Identify Folding Domains and Sequence Motifs Critical for Wax Ester Synthase/Acyl Coenzyme A:Diacylglycerol Acyltransferase Activity

PubMed Central

Villa, Juan A.; Cabezas, Matilde; de la Cruz, Fernando

2014-01-01

Triacylglycerols and wax esters are synthesized as energy storage molecules by some proteobacteria and actinobacteria under stress. The enzyme responsible for neutral lipid accumulation is the bifunctional wax ester synthase/acyl-coenzyme A (CoA):diacylglycerol acyltransferase (WS/DGAT). Structural modeling of WS/DGAT suggests that it can adopt an acyl-CoA-dependent acyltransferase fold with the N-terminal and C-terminal domains connected by a helical linker, an architecture demonstrated experimentally by limited proteolysis. Moreover, we found that both domains form an active complex when coexpressed as independent polypeptides. The structural prediction and sequence alignment of different WS/DGAT proteins indicated catalytically important motifs in the enzyme. Their role was probed by measuring the activities of a series of alanine scanning mutants. Our study underscores the structural understanding of this protein family and paves the way for their modification to improve the production of neutral lipids. PMID:24296496
The RXL motif of the African cassava mosaic virus Rep protein is necessary for rereplication of yeast DNA and viral infection in plants

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hipp, Katharina; Rau, Peter; Schäfer, Benjamin

Geminiviruses, single-stranded DNA plant viruses, encode a replication-initiator protein (Rep) that is indispensable for virus replication. A potential cyclin interaction motif (RXL) in the sequence of African cassava mosaic virus Rep may be an alternative link to cell cycle controls to the known interaction with plant homologs of retinoblastoma protein (pRBR). Mutation of this motif abrogated rereplication in fission yeast induced by expression of wildtype Rep suggesting that Rep interacts via its RXL motif with one or several yeast proteins. The RXL motif is essential for viral infection of Nicotiana benthamiana plants, since mutation of this motif in infectious clonesmore » prevented any symptomatic infection. The cell-cycle link (Clink) protein of a nanovirus (faba bean necrotic yellows virus) was investigated that activates the cell cycle by binding via its LXCXE motif to pRBR. Expression of wildtype Clink and a Clink mutant deficient in pRBR-binding did not trigger rereplication in fission yeast. - Highlights: • A potential cyclin interaction motif is conserved in geminivirus Rep proteins. • In ACMV Rep, this motif (RXL) is essential for rereplication of fission yeast DNA. • Mutating RXL abrogated viral infection completely in Nicotiana benthamiana. • Expression of a nanovirus Clink protein in yeast did not induce rereplication. • Plant viruses may have evolved multiple routes to exploit host DNA synthesis.« less
Exploring Short Linear Motifs Using the ELM Database and Tools.

PubMed

Gouw, Marc; Sámano-Sánchez, Hugo; Van Roey, Kim; Diella, Francesca; Gibson, Toby J; Dinkel, Holger

2017-06-27

The Eukaryotic Linear Motif (ELM) resource is dedicated to the characterization and prediction of short linear motifs (SLiMs). SLiMs are compact, degenerate peptide segments found in many proteins and essential to almost all cellular processes. However, despite their abundance, SLiMs remain largely uncharacterized. The ELM database is a collection of manually annotated SLiM instances curated from experimental literature. In this article we illustrate how to browse and search the database for curated SLiM data, and cover the different types of data integrated in the resource. We also cover how to use this resource in order to predict SLiMs in known as well as novel proteins, and how to interpret the results generated by the ELM prediction pipeline. The ELM database is a very rich resource, and in the following protocols we give helpful examples to demonstrate how this knowledge can be used to improve your own research. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.
Phylum-Level Conservation of Regulatory Information in Nematodes despite Extensive Non-coding Sequence Divergence

PubMed Central

Gordon, Kacy L.; Arthur, Robert K.; Ruvinsky, Ilya

2015-01-01

Gene regulatory information guides development and shapes the course of evolution. To test conservation of gene regulation within the phylum Nematoda, we compared the functions of putative cis-regulatory sequences of four sets of orthologs (unc-47, unc-25, mec-3 and elt-2) from distantly-related nematode species. These species, Caenorhabditis elegans, its congeneric C. briggsae, and three parasitic species Meloidogyne hapla, Brugia malayi, and Trichinella spiralis, represent four of the five major clades in the phylum Nematoda. Despite the great phylogenetic distances sampled and the extensive sequence divergence of nematode genomes, all but one of the regulatory elements we tested are able to drive at least a subset of the expected gene expression patterns. We show that functionally conserved cis-regulatory elements have no more extended sequence similarity to their C. elegans orthologs than would be expected by chance, but they do harbor motifs that are important for proper expression of the C. elegans genes. These motifs are too short to be distinguished from the background level of sequence similarity, and while identical in sequence they are not conserved in orientation or position. Functional tests reveal that some of these motifs contribute to proper expression. Our results suggest that conserved regulatory circuitry can persist despite considerable turnover within cis elements. PMID:26020930
Proteome-wide search for functional motifs altered in tumors: Prediction of nuclear export signals inactivated by cancer-related mutations

PubMed Central

Prieto, Gorka; Fullaondo, Asier; Rodríguez, Jose A.

2016-01-01

Large-scale sequencing projects are uncovering a growing number of missense mutations in human tumors. Understanding the phenotypic consequences of these alterations represents a formidable challenge. In silico prediction of functionally relevant amino acid motifs disrupted by cancer mutations could provide insight into the potential impact of a mutation, and guide functional tests. We have previously described Wregex, a tool for the identification of potential functional motifs, such as nuclear export signals (NESs), in proteins. Here, we present an improved version that allows motif prediction to be combined with data from large repositories, such as the Catalogue of Somatic Mutations in Cancer (COSMIC), and to be applied to a whole proteome scale. As an example, we have searched the human proteome for candidate NES motifs that could be altered by cancer-related mutations included in the COSMIC database. A subset of the candidate NESs identified was experimentally tested using an in vivo nuclear export assay. A significant proportion of the selected motifs exhibited nuclear export activity, which was abrogated by the COSMIC mutations. In addition, our search identified a cancer mutation that inactivates the NES of the human deubiquitinase USP21, and leads to the aberrant accumulation of this protein in the nucleus. PMID:27174732
Linear Lepidopteran ambidensovirus 1 sequences drive random integration of a reporter gene in transfected Spodoptera frugiperda cells.

PubMed

Rizk, Francine; Laverdure, Sylvain; d'Alençon, Emmanuelle; Bossin, Hervé; Dupressoir, Thierry

2018-01-01

The Lepidopteran ambidensovirus 1 isolated from Junonia coenia (hereafter JcDV) is an invertebrate parvovirus considered as a viral transduction vector as well as a potential tool for the biological control of insect pests. Previous works showed that JcDV-based circular plasmids experimentally integrate into insect cells genomic DNA. In order to approach the natural conditions of infection and possible integration, we generated linear JcDV- gfp based molecules which were transfected into non permissive Spodoptera frugiperda ( Sf9 ) cultured cells. Cells were monitored for the expression of green fluorescent protein (GFP) and DNA was analyzed for integration of transduced viral sequences. Non-structural protein modulation of the VP-gene cassette promoter activity was additionally assayed. We show that linear JcDV-derived molecules are capable of long term genomic integration and sustained transgene expression in Sf9 cells. As expected, only the deletion of both inverted terminal repeats (ITR) or the polyadenylation signals of NS and VP genes dramatically impairs the global transduction/expression efficiency. However, all the integrated viral sequences we characterized appear "scrambled" whatever the viral content of the transfected vector. Despite a strong GFP expression, we were unable to recover any full sequence of the original constructs and found rearranged viral and non-viral sequences as well. Cellular flanking sequences were identified as non-coding ones. On the other hand, the kinetics of GFP expression over time led us to investigate the apparent down-regulation by non-structural proteins of the VP-gene cassette promoter. Altogether, our results show that JcDV-derived sequences included in linear DNA molecules are able to drive efficiently the integration and expression of a foreign gene into the genome of insect cells, whatever their composition, provided that at least one ITR is present. However, the transfected sequences were extensively rearranged with

NullSeq: A Tool for Generating Random Coding Sequences with Desired Amino Acid and GC Contents.

PubMed

Liu, Sophia S; Hockenberry, Adam J; Lancichinetti, Andrea; Jewett, Michael C; Amaral, Luís A N

2016-11-01

The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. In order to accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. While many tools have been developed to create random nucleotide sequences, protein coding sequences are subject to a unique set of constraints that complicates the process of generating appropriate null models. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content for the purpose of hypothesis testing. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content, which we have developed into a python package. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. Furthermore, this approach can easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs which will lead to a better understanding of biological processes as well as more effective engineering of biological systems.
Cancer-related marketing centrality motifs acting as pivot units in the human signaling network and mediating cross-talk between biological pathways.

PubMed

Li, Wan; Chen, Lina; Li, Xia; Jia, Xu; Feng, Chenchen; Zhang, Liangcai; He, Weiming; Lv, Junjie; He, Yuehan; Li, Weiguo; Qu, Xiaoli; Zhou, Yanyan; Shi, Yuchen

2013-12-01

Network motifs in central positions are considered to not only have more in-coming and out-going connections but are also localized in an area where more paths reach the networks. These central motifs have been extensively investigated to determine their consistent functions or associations with specific function categories. However, their functional potentials in the maintenance of cross-talk between different functional communities are unclear. In this paper, we constructed an integrated human signaling network from the Pathway Interaction Database. We identified 39 essential cancer-related motifs in central roles, which we called cancer-related marketing centrality motifs, using combined centrality indices on the system level. Our results demonstrated that these cancer-related marketing centrality motifs were pivotal units in the signaling network, and could mediate cross-talk between 61 biological pathways (25 could be mediated by one motif on average), most of which were cancer-related pathways. Further analysis showed that molecules of most marketing centrality motifs were in the same or adjacent subcellular localizations, such as the motif containing PI3K, PDK1 and AKT1 in the plasma membrane, to mediate signal transduction between 32 cancer-related pathways. Finally, we analyzed the pivotal roles of cancer genes in these marketing centrality motifs in the pathogenesis of cancers, and found that non-cancer genes were potential cancer-related genes.
A kinesin-1 binding motif in vaccinia virus that is widespread throughout the human genome

PubMed Central

Dodding, Mark P; Mitter, Richard; Humphries, Ashley C; Way, Michael

2011-01-01

Transport of cargoes by kinesin-1 is essential for many cellular processes. Nevertheless, the number of proteins known to recruit kinesin-1 via its cargo binding light chain (KLC) is still quite small. We also know relatively little about the molecular features that define kinesin-1 binding. We now show that a bipartite tryptophan-based kinesin-1 binding motif, originally identified in Calsyntenin is present in A36, a vaccinia integral membrane protein. This bipartite motif in A36 is required for kinesin-1-dependent transport of the virus to the cell periphery. Bioinformatic analysis reveals that related bipartite tryptophan-based motifs are present in over 450 human proteins. Using vaccinia as a surrogate cargo, we show that regions of proteins containing this motif can function to recruit KLC and promote virus transport in the absence of A36. These proteins interact with the kinesin light chain outside the context of infection and have distinct preferences for KLC1 and KLC2. Our observations demonstrate that KLC binding can be conferred by a common set of features that are found in a wide range of proteins associated with diverse cellular functions and human diseases. PMID:21915095
Rewiring yeast sugar transporter preference through modifying a conserved protein motif.

PubMed

Young, Eric M; Tong, Alice; Bui, Hang; Spofford, Caitlin; Alper, Hal S

2014-01-07

Utilization of exogenous sugars found in lignocellulosic biomass hydrolysates, such as xylose, must be improved before yeast can serve as an efficient biofuel and biochemical production platform. In particular, the first step in this process, the molecular transport of xylose into the cell, can serve as a significant flux bottleneck and is highly inhibited by other sugars. Here we demonstrate that sugar transport preference and kinetics can be rewired through the programming of a sequence motif of the general form G-G/F-XXX-G found in the first transmembrane span. By evaluating 46 different heterologously expressed transporters, we find that this motif is conserved among functional transporters and highly enriched in transporters that confer growth on xylose. Through saturation mutagenesis and subsequent rational mutagenesis, four transporter mutants unable to confer growth on glucose but able to sustain growth on xylose were engineered. Specifically, Candida intermedia gxs1 Phe(38)Ile(39)Met(40), Scheffersomyces stipitis rgt2 Phe(38) and Met(40), and Saccharomyces cerevisiae hxt7 Ile(39)Met(40)Met(340) all exhibit this phenotype. In these cases, primary hexose transporters were rewired into xylose transporters. These xylose transporters nevertheless remained inhibited by glucose. Furthermore, in the course of identifying this motif, novel wild-type transporters with superior monosaccharide growth profiles were discovered, namely S. stipitis RGT2 and Debaryomyces hansenii 2D01474. These findings build toward the engineering of efficient pentose utilization in yeast and provide a blueprint for reprogramming transporter properties.
Modeling gene regulatory network motifs using statecharts

PubMed Central

2012-01-01

Background Gene regulatory networks are widely used by biologists to describe the interactions among genes, proteins and other components at the intra-cellular level. Recently, a great effort has been devoted to give gene regulatory networks a formal semantics based on existing computational frameworks. For this purpose, we consider Statecharts, which are a modular, hierarchical and executable formal model widely used to represent software systems. We use Statecharts for modeling small and recurring patterns of interactions in gene regulatory networks, called motifs. Results We present an improved method for modeling gene regulatory network motifs using Statecharts and we describe the successful modeling of several motifs, including those which could not be modeled or whose models could not be distinguished using the method of a previous proposal. We model motifs in an easy and intuitive way by taking advantage of the visual features of Statecharts. Our modeling approach is able to simulate some interesting temporal properties of gene regulatory network motifs: the delay in the activation and the deactivation of the "output" gene in the coherent type-1 feedforward loop, the pulse in the incoherent type-1 feedforward loop, the bistability nature of double positive and double negative feedback loops, the oscillatory behavior of the negative feedback loop, and the "lock-in" effect of positive autoregulation. Conclusions We present a Statecharts-based approach for the modeling of gene regulatory network motifs in biological systems. The basic motifs used to build more complex networks (that is, simple regulation, reciprocal regulation, feedback loop, feedforward loop, and autoregulation) can be faithfully described and their temporal dynamics can be analyzed. PMID:22536967
Predictive regulatory models in Drosophila melanogaster by integrative inference of transcriptional networks

PubMed Central

Marbach, Daniel; Roy, Sushmita; Ay, Ferhat; Meyer, Patrick E.; Candeias, Rogerio; Kahveci, Tamer; Bristow, Christopher A.; Kellis, Manolis

2012-01-01

Gaining insights on gene regulation from large-scale functional data sets is a grand challenge in systems biology. In this article, we develop and apply methods for transcriptional regulatory network inference from diverse functional genomics data sets and demonstrate their value for gene function and gene expression prediction. We formulate the network inference problem in a machine-learning framework and use both supervised and unsupervised methods to predict regulatory edges by integrating transcription factor (TF) binding, evolutionarily conserved sequence motifs, gene expression, and chromatin modification data sets as input features. Applying these methods to Drosophila melanogaster, we predict ∼300,000 regulatory edges in a network of ∼600 TFs and 12,000 target genes. We validate our predictions using known regulatory interactions, gene functional annotations, tissue-specific expression, protein–protein interactions, and three-dimensional maps of chromosome conformation. We use the inferred network to identify putative functions for hundreds of previously uncharacterized genes, including many in nervous system development, which are independently confirmed based on their tissue-specific expression patterns. Last, we use the regulatory network to predict target gene expression levels as a function of TF expression, and find significantly higher predictive power for integrative networks than for motif or ChIP-based networks. Our work reveals the complementarity between physical evidence of regulatory interactions (TF binding, motif conservation) and functional evidence (coordinated expression or chromatin patterns) and demonstrates the power of data integration for network inference and studies of gene regulation at the systems level. PMID:22456606
FPGA implementation of motifs-based neuronal network and synchronization analysis

NASA Astrophysics Data System (ADS)

Deng, Bin; Zhu, Zechen; Yang, Shuangming; Wei, Xile; Wang, Jiang; Yu, Haitao

2016-06-01

Motifs in complex networks play a crucial role in determining the brain functions. In this paper, 13 kinds of motifs are implemented with Field Programmable Gate Array (FPGA) to investigate the relationships between the networks properties and motifs properties. We use discretization method and pipelined architecture to construct various motifs with Hindmarsh-Rose (HR) neuron as the node model. We also build a small-world network based on these motifs and conduct the synchronization analysis of motifs as well as the constructed network. We find that the synchronization properties of motif determine that of motif-based small-world network, which demonstrates effectiveness of our proposed hardware simulation platform. By imitation of some vital nuclei in the brain to generate normal discharges, our proposed FPGA-based artificial neuronal networks have the potential to replace the injured nuclei to complete the brain function in the treatment of Parkinson's disease and epilepsy.
Imperfect duplicate insertions type of mutations in plasmepsin V modulates binding properties of PEXEL motifs of export proteins in Indian Plasmodium vivax.

PubMed

Rawat, Manmeet; Vijay, Sonam; Gupta, Yash; Tiwari, Pramod Kumar; Sharma, Arun

2013-01-01

Plasmepsin V (PM-V) have functionally conserved orthologues across the Plasmodium genus who's binding and antigenic processing at the PEXEL motifs for export about 200-300 essential proteins is important for the virulence and viability of the causative Plasmodium species. This study was undertaken to determine P. vivax plasmepsin V Ind (PvPM-V-Ind) PEXEL motif export pathway for pathogenicity-related proteins/antigens export thereby altering plasmodium exportome during erythrocytic stages. We identify and characterize Plasmodium vivax plasmepsin-V-Ind (mutant) gene by cloning, sequence analysis, in silico bioinformatic protocols and structural modeling predictions based on docking studies on binding capacity with PEXEL motifs processing in terms of binding and accessibility of export proteins. Cloning and sequence analysis for genetic diversity demonstrates PvPM-V-Ind (mutant) gene is highly conserved among all isolates from different geographical regions of India. Imperfect duplicate insertion types of mutations (SVSE from 246-249 AA and SLSE from 266-269 AA) were identified among all Indian isolates in comparison to P.vivax Sal-1 (PvPM-V-Sal 1) isolate. In silico bioinformatics interaction studies of PEXEL peptide and active enzyme reveal that PvPM-V-Ind (mutant) is only active in endoplasmic reticulum lumen and membrane embedding is essential for activation of plasmepsin V. Structural modeling predictions based on docking studies with PEXEL motif show significant variation in substrate protein binding of these imperfect mutations with data mined PEXEL sequences. The predicted variation in the docking score and interacting amino acids of PvPM-V-Ind (mutant) proteins with PEXEL and lopinavir suggests a modulation in the activity of PvPM-V in terms of binding and accessibility at these sites. Our functional modeled validation of PvPM-V-Ind (mutant) imperfect duplicate insertions with data mined PEXEL sequences leading to altered binding and substrate accessibility
Imperfect Duplicate Insertions Type of Mutations in Plasmepsin V Modulates Binding Properties of PEXEL Motifs of Export Proteins in Indian Plasmodium vivax

PubMed Central

Rawat, Manmeet; Vijay, Sonam; Gupta, Yash; Tiwari, Pramod Kumar; Sharma, Arun

2013-01-01

Introduction Plasmepsin V (PM-V) have functionally conserved orthologues across the Plasmodium genus who's binding and antigenic processing at the PEXEL motifs for export about 200–300 essential proteins is important for the virulence and viability of the causative Plasmodium species. This study was undertaken to determine P. vivax plasmepsin V Ind (PvPM-V-Ind) PEXEL motif export pathway for pathogenicity-related proteins/antigens export thereby altering plasmodium exportome during erythrocytic stages. Method We identify and characterize Plasmodium vivax plasmepsin-V-Ind (mutant) gene by cloning, sequence analysis, in silico bioinformatic protocols and structural modeling predictions based on docking studies on binding capacity with PEXEL motifs processing in terms of binding and accessibility of export proteins. Results Cloning and sequence analysis for genetic diversity demonstrates PvPM-V-Ind (mutant) gene is highly conserved among all isolates from different geographical regions of India. Imperfect duplicate insertion types of mutations (SVSE from 246–249 AA and SLSE from 266–269 AA) were identified among all Indian isolates in comparison to P.vivax Sal-1 (PvPM-V-Sal 1) isolate. In silico bioinformatics interaction studies of PEXEL peptide and active enzyme reveal that PvPM-V-Ind (mutant) is only active in endoplasmic reticulum lumen and membrane embedding is essential for activation of plasmepsin V. Structural modeling predictions based on docking studies with PEXEL motif show significant variation in substrate protein binding of these imperfect mutations with data mined PEXEL sequences. The predicted variation in the docking score and interacting amino acids of PvPM-V-Ind (mutant) proteins with PEXEL and lopinavir suggests a modulation in the activity of PvPM-V in terms of binding and accessibility at these sites. Conclusion/Significance Our functional modeled validation of PvPM-V-Ind (mutant) imperfect duplicate insertions with data mined PEXEL
Solution structure of a DNA mimicking motif of an RNA aptamer against transcription factor AML1 Runt domain.

PubMed

Nomura, Yusuke; Tanaka, Yoichiro; Fukunaga, Jun-ichi; Fujiwara, Kazuya; Chiba, Manabu; Iibuchi, Hiroaki; Tanaka, Taku; Nakamura, Yoshikazu; Kawai, Gota; Kozu, Tomoko; Sakamoto, Taiichi

2013-12-01

AML1/RUNX1 is an essential transcription factor involved in the differentiation of hematopoietic cells. AML1 binds to the Runt-binding double-stranded DNA element (RDE) of target genes through its N-terminal Runt domain. In a previous study, we obtained RNA aptamers against the AML1 Runt domain by systematic evolution of ligands by exponential enrichment and revealed that RNA aptamers exhibit higher affinity for the Runt domain than that for RDE and possess the 5'-GCGMGNN-3' and 5'-N'N'CCAC-3' conserved motif (M: A or C; N and N' form Watson-Crick base pairs) that is important for Runt domain binding. In this study, to understand the structural basis of recognition of the Runt domain by the aptamer motif, the solution structure of a 22-mer RNA was determined using nuclear magnetic resonance. The motif contains the AH(+)-C mismatch and base triple and adopts an unusual backbone structure. Structural analysis of the aptamer motif indicated that the aptamer binds to the Runt domain by mimicking the RDE sequence and structure. Our data should enhance the understanding of the structural basis of DNA mimicry by RNA molecules.
Automated extraction and classification of RNA tertiary structure cyclic motifs

PubMed Central

Lemieux, Sébastien; Major, François

2006-01-01

A minimum cycle basis of the tertiary structure of a large ribosomal subunit (LSU) X-ray crystal structure was analyzed. Most cycles are small, as they are composed of 3- to 5 nt, and repeated across the LSU tertiary structure. We used hierarchical clustering to quantify and classify the 4 nt cycles. One class is defined by the GNRA tetraloop motif. The inspection of the GNRA class revealed peculiar instances in sequence. First is the presence of UA, CA, UC and CC base pairs that substitute the usual sheared GA base pair. Second is the revelation of GNR(Xn)A tetraloops, where Xn is bulged out of the classical GNRA structure, and of GN/RA formed by the two strands of interior-loops. We were able to unambiguously characterize the cycle classes using base stacking and base pairing annotations. The cycles identified correspond to small and cyclic motifs that compose most of the LSU RNA tertiary structure and contribute to its thermodynamic stability. Consequently, the RNA minimum cycles could well be used as the basic elements of RNA tertiary structure prediction methods. PMID:16679452
Conservation of the PTEN catalytic motif in the bacterial undecaprenyl pyrophosphate phosphatase, BacA/UppP.

PubMed

Bickford, Justin S; Nick, Harry S

2013-12-01

Isoprenoid lipid carriers are essential in protein glycosylation and bacterial cell envelope biosynthesis. The enzymes involved in their metabolism (synthases, kinases and phosphatases) are therefore critical to cell viability. In this review, we focus on two broad groups of isoprenoid pyrophosphate phosphatases. One group, containing phosphatidic acid phosphatase motifs, includes the eukaryotic dolichyl pyrophosphate phosphatases and proposed recycling bacterial undecaprenol pyrophosphate phosphatases, PgpB, YbjB and YeiU/LpxT. The second group comprises the bacterial undecaprenol pyrophosphate phosphatase, BacA/UppP, responsible for initial formation of undecaprenyl phosphate, which we predict contains a tyrosine phosphate phosphatase motif resembling that of the tumour suppressor, phosphatase and tensin homologue (PTEN). Based on protein sequence alignments across species and 2D structure predictions, we propose catalytic and lipid recognition motifs unique to BacA/UppP enzymes. The verification of our proposed active-site residues would provide new strategies for the development of substrate-specific inhibitors which mimic both the lipid and pyrophosphate moieties, leading to the development of novel antimicrobial agents.
Triadic motifs in the dependence networks of virtual societies.

PubMed

Xie, Wen-Jie; Li, Ming-Xia; Jiang, Zhi-Qiang; Zhou, Wei-Xing

2014-06-10

In friendship networks, individuals have different numbers of friends, and the closeness or intimacy between an individual and her friends is heterogeneous. Using a statistical filtering method to identify relationships about who depends on whom, we construct dependence networks (which are directed) from weighted friendship networks of avatars in more than two hundred virtual societies of a massively multiplayer online role-playing game (MMORPG). We investigate the evolution of triadic motifs in dependence networks. Several metrics show that the virtual societies evolved through a transient stage in the first two to three weeks and reached a relatively stable stage. We find that the unidirectional loop motif (M9) is underrepresented and does not appear, open motifs are also underrepresented, while other close motifs are overrepresented. We also find that, for most motifs, the overall level difference of the three avatars in the same motif is significantly lower than average, whereas the sum of ranks is only slightly larger than average. Our findings show that avatars' social status plays an important role in the formation of triadic motifs.
Triadic motifs in the dependence networks of virtual societies

NASA Astrophysics Data System (ADS)

Xie, Wen-Jie; Li, Ming-Xia; Jiang, Zhi-Qiang; Zhou, Wei-Xing

2014-06-01

In friendship networks, individuals have different numbers of friends, and the closeness or intimacy between an individual and her friends is heterogeneous. Using a statistical filtering method to identify relationships about who depends on whom, we construct dependence networks (which are directed) from weighted friendship networks of avatars in more than two hundred virtual societies of a massively multiplayer online role-playing game (MMORPG). We investigate the evolution of triadic motifs in dependence networks. Several metrics show that the virtual societies evolved through a transient stage in the first two to three weeks and reached a relatively stable stage. We find that the unidirectional loop motif (M9) is underrepresented and does not appear, open motifs are also underrepresented, while other close motifs are overrepresented. We also find that, for most motifs, the overall level difference of the three avatars in the same motif is significantly lower than average, whereas the sum of ranks is only slightly larger than average. Our findings show that avatars' social status plays an important role in the formation of triadic motifs.
Triadic motifs in the dependence networks of virtual societies

PubMed Central

Xie, Wen-Jie; Li, Ming-Xia; Jiang, Zhi-Qiang; Zhou, Wei-Xing

2014-01-01

In friendship networks, individuals have different numbers of friends, and the closeness or intimacy between an individual and her friends is heterogeneous. Using a statistical filtering method to identify relationships about who depends on whom, we construct dependence networks (which are directed) from weighted friendship networks of avatars in more than two hundred virtual societies of a massively multiplayer online role-playing game (MMORPG). We investigate the evolution of triadic motifs in dependence networks. Several metrics show that the virtual societies evolved through a transient stage in the first two to three weeks and reached a relatively stable stage. We find that the unidirectional loop motif (M9) is underrepresented and does not appear, open motifs are also underrepresented, while other close motifs are overrepresented. We also find that, for most motifs, the overall level difference of the three avatars in the same motif is significantly lower than average, whereas the sum of ranks is only slightly larger than average. Our findings show that avatars' social status plays an important role in the formation of triadic motifs. PMID:24912755
CRISPR-spacer integration reporter plasmids reveal distinct genuine acquisition specificities among CRISPR-Cas I-E variants of Escherichia coli

PubMed Central

Díez-Villaseñor, César; Guzmán, Noemí M.; Almendros, Cristóbal; García-Martínez, Jesús; Mojica, Francisco J.M.

2013-01-01

Prokaryotes immunize themselves against transmissible genetic elements by the integration (acquisition) in clustered regularly interspaced short palindromic repeats (CRISPR) loci of spacers homologous to invader nucleic acids, defined as protospacers. Following acquisition, mono-spacer CRISPR RNAs (termed crRNAs) guide CRISPR-associated (Cas) proteins to degrade (interference) protospacers flanked by an adjacent motif in extrachomosomal DNA. During acquisition, selection of spacer-precursors adjoining the protospacer motif and proper orientation of the integrated fragment with respect to the leader (sequence leading transcription of the flanking CRISPR array) grant efficient interference by at least some CRISPR-Cas systems. This adaptive stage of the CRISPR action is poorly characterized, mainly due to the lack of appropriate genetic strategies to address its study and, at least in Escherichia coli, the need of Cas overproduction for insertion detection. In this work, we describe the development and application in Escherichia coli strains of an interference-independent assay based on engineered selectable CRISPR-spacer integration reporter plasmids. By using this tool without the constraint of interference or cas overexpression, we confirmed fundamental aspects of this process such as the critical requirement of Cas1 and Cas2 and the identity of the CTT protospacer motif for the E. coli K12 system. In addition, we defined the CWT motif for a non-K12 CRISPR-Cas variant, and obtained data supporting the implication of the leader in spacer orientation, the preferred acquisition from plasmids harboring cas genes and the occurrence of a sequential cleavage at the insertion site by a ruler mechanism. PMID:23445770
CRISPR-spacer integration reporter plasmids reveal distinct genuine acquisition specificities among CRISPR-Cas I-E variants of Escherichia coli.

PubMed

Díez-Villaseñor, César; Guzmán, Noemí M; Almendros, Cristóbal; García-Martínez, Jesús; Mojica, Francisco J M

2013-05-01

Prokaryotes immunize themselves against transmissible genetic elements by the integration (acquisition) in clustered regularly interspaced short palindromic repeats (CRISPR) loci of spacers homologous to invader nucleic acids, defined as protospacers. Following acquisition, mono-spacer CRISPR RNAs (termed crRNAs) guide CRISPR-associated (Cas) proteins to degrade (interference) protospacers flanked by an adjacent motif in extrachomosomal DNA. During acquisition, selection of spacer-precursors adjoining the protospacer motif and proper orientation of the integrated fragment with respect to the leader (sequence leading transcription of the flanking CRISPR array) grant efficient interference by at least some CRISPR-Cas systems. This adaptive stage of the CRISPR action is poorly characterized, mainly due to the lack of appropriate genetic strategies to address its study and, at least in Escherichia coli, the need of Cas overproduction for insertion detection. In this work, we describe the development and application in Escherichia coli strains of an interference-independent assay based on engineered selectable CRISPR-spacer integration reporter plasmids. By using this tool without the constraint of interference or cas overexpression, we confirmed fundamental aspects of this process such as the critical requirement of Cas1 and Cas2 and the identity of the CTT protospacer motif for the E. coli K12 system. In addition, we defined the CWT motif for a non-K12 CRISPR-Cas variant, and obtained data supporting the implication of the leader in spacer orientation, the preferred acquisition from plasmids harboring cas genes and the occurrence of a sequential cleavage at the insertion site by a ruler mechanism.
Unravelling daily human mobility motifs

PubMed Central

Schneider, Christian M.; Belik, Vitaly; Couronné, Thomas; Smoreda, Zbigniew; González, Marta C.

2013-01-01

Human mobility is differentiated by time scales. While the mechanism for long time scales has been studied, the underlying mechanism on the daily scale is still unrevealed. Here, we uncover the mechanism responsible for the daily mobility patterns by analysing the temporal and spatial trajectories of thousands of persons as individual networks. Using the concept of motifs from network theory, we find only 17 unique networks are present in daily mobility and they follow simple rules. These networks, called here motifs, are sufficient to capture up to 90 per cent of the population in surveys and mobile phone datasets for different countries. Each individual exhibits a characteristic motif, which seems to be stable over several months. Consequently, daily human mobility can be reproduced by an analytically tractable framework for Markov chains by modelling periods of high-frequency trips followed by periods of lower activity as the key ingredient. PMID:23658117
Encryption of agonistic motifs for TLR4 into artificial antigens augmented the maturation of antigen-presenting cells.

PubMed

Ito, Masaki; Hayashi, Kazumi; Minamisawa, Tamiko; Homma, Sadamu; Koido, Shigeo; Shiba, Kiyotaka

2017-01-01

Adjuvants are indispensable for achieving a sufficient immune response from vaccinations. From a functional viewpoint, adjuvants are classified into two categories: "physical adjuvants" increase the efficacy of antigen presentation by antigen-presenting cells (APC) and "signal adjuvants" induce the maturation of APC. Our previous study has demonstrated that a physical adjuvant can be encrypted into proteinous antigens by creating artificial proteins from combinatorial assemblages of epitope peptides and those peptide sequences having propensities to form certain protein structures (motif programming). However, the artificial antigens still require a signal adjuvant to maturate the APC; for example, co-administration of the Toll-like receptor 4 (TLR4) agonist monophosphoryl lipid A (MPLA) was required to induce an in vivo immunoreaction. In this study, we further modified the previous artificial antigens by appending the peptide motifs, which have been reported to have agonistic activity for TLR4, to create "adjuvant-free" antigens. The created antigens with triple TLR4 agonistic motifs in their C-terminus have activated NF-κB signaling pathways through TLR4. These proteins also induced the production of the inflammatory cytokine TNF-α, and the expression of the co-stimulatory molecule CD40 in APC, supporting the maturation of APC in vitro. Unexpectedly, these signal adjuvant-encrypted proteins have lost their ability to be physical adjuvants because they did not induce cytotoxic T lymphocytes (CTL) in vivo, while the parental proteins induced CTL. These results confirmed that the manifestation of a motif's function is context-dependent and simple addition does not always work for motif-programing. Further optimization of the molecular context of the TLR4 agonistic motifs in antigens should be required to create adjuvant-free antigens.
Unexpected Inheritance: Multiple Integrations of Ancient Bornavirus and Ebolavirus/Marburgvirus Sequences in Vertebrate Genomes

PubMed Central

Belyi, Vladimir A.; Levine, Arnold J.; Skalka, Anna Marie

2010-01-01

Vertebrate genomes contain numerous copies of retroviral sequences, acquired over the course of evolution. Until recently they were thought to be the only type of RNA viruses to be so represented, because integration of a DNA copy of their genome is required for their replication. In this study, an extensive sequence comparison was conducted in which 5,666 viral genes from all known non-retroviral families with single-stranded RNA genomes were matched against the germline genomes of 48 vertebrate species, to determine if such viruses could also contribute to the vertebrate genetic heritage. In 19 of the tested vertebrate species, we discovered as many as 80 high-confidence examples of genomic DNA sequences that appear to be derived, as long ago as 40 million years, from ancestral members of 4 currently circulating virus families with single strand RNA genomes. Surprisingly, almost all of the sequences are related to only two families in the Order Mononegavirales: the Bornaviruses and the Filoviruses, which cause lethal neurological disease and hemorrhagic fevers, respectively. Based on signature landmarks some, and perhaps all, of the endogenous virus-like DNA sequences appear to be LINE element-facilitated integrations derived from viral mRNAs. The integrations represent genes that encode viral nucleocapsid, RNA-dependent-RNA-polymerase, matrix and, possibly, glycoproteins. Integrations are generally limited to one or very few copies of a related viral gene per species, suggesting that once the initial germline integration was obtained (or selected), later integrations failed or provided little advantage to the host. The conservation of relatively long open reading frames for several of the endogenous sequences, the virus-like protein regions represented, and a potential correlation between their presence and a species' resistance to the diseases caused by these pathogens, are consistent with the notion that their products provide some important biological

Unexpected inheritance: multiple integrations of ancient bornavirus and ebolavirus/marburgvirus sequences in vertebrate genomes.

PubMed

Belyi, Vladimir A; Levine, Arnold J; Skalka, Anna Marie

2010-07-29

Vertebrate genomes contain numerous copies of retroviral sequences, acquired over the course of evolution. Until recently they were thought to be the only type of RNA viruses to be so represented, because integration of a DNA copy of their genome is required for their replication. In this study, an extensive sequence comparison was conducted in which 5,666 viral genes from all known non-retroviral families with single-stranded RNA genomes were matched against the germline genomes of 48 vertebrate species, to determine if such viruses could also contribute to the vertebrate genetic heritage. In 19 of the tested vertebrate species, we discovered as many as 80 high-confidence examples of genomic DNA sequences that appear to be derived, as long ago as 40 million years, from ancestral members of 4 currently circulating virus families with single strand RNA genomes. Surprisingly, almost all of the sequences are related to only two families in the Order Mononegavirales: the Bornaviruses and the Filoviruses, which cause lethal neurological disease and hemorrhagic fevers, respectively. Based on signature landmarks some, and perhaps all, of the endogenous virus-like DNA sequences appear to be LINE element-facilitated integrations derived from viral mRNAs. The integrations represent genes that encode viral nucleocapsid, RNA-dependent-RNA-polymerase, matrix and, possibly, glycoproteins. Integrations are generally limited to one or very few copies of a related viral gene per species, suggesting that once the initial germline integration was obtained (or selected), later integrations failed or provided little advantage to the host. The conservation of relatively long open reading frames for several of the endogenous sequences, the virus-like protein regions represented, and a potential correlation between their presence and a species' resistance to the diseases caused by these pathogens, are consistent with the notion that their products provide some important biological
Integrated databanks access and sequence/structure analysis services at the PBIL.

PubMed

Perrière, Guy; Combet, Christophe; Penel, Simon; Blanchet, Christophe; Thioulouse, Jean; Geourjon, Christophe; Grassot, Julien; Charavay, Céline; Gouy, Manolo; Duret, Laurent; Deléage, Gilbert

2003-07-01

The World Wide Web server of the PBIL (Pôle Bioinformatique Lyonnais) provides on-line access to sequence databanks and to many tools of nucleic acid and protein sequence analyses. This server allows to query nucleotide sequence banks in the EMBL and GenBank formats and protein sequence banks in the SWISS-PROT and PIR formats. The query engine on which our data bank access is based is the ACNUC system. It allows the possibility to build complex queries to access functional zones of biological interest and to retrieve large sequence sets. Of special interest are the unique features provided by this system to query the data banks of gene families developed at the PBIL. The server also provides access to a wide range of sequence analysis methods: similarity search programs, multiple alignments, protein structure prediction and multivariate statistics. An originality of this server is the integration of these two aspects: sequence retrieval and sequence analysis. Indeed, thanks to the introduction of re-usable lists, it is possible to perform treatments on large sets of data. The PBIL server can be reached at: http://pbil.univ-lyon1.fr.
DNA methylation at hepatitis B viral integrants is associated with methylation at flanking human genomic sequences

PubMed Central

Watanabe, Yoshiyuki; Yamamoto, Hiroyuki; Oikawa, Ritsuko; Toyota, Minoru; Yamamoto, Masakazu; Kokudo, Norihiro; Tanaka, Shinji; Arii, Shigeki; Yotsuyanagi, Hiroshi; Koike, Kazuhiko; Itoh, Fumio

2015-01-01

Integration of DNA viruses into the human genome plays an important role in various types of tumors, including hepatitis B virus (HBV)–related hepatocellular carcinoma. However, the molecular details and clinical impact of HBV integration on either human or HBV epigenomes are unknown. Here, we show that methylation of the integrated HBV DNA is related to the methylation status of the flanking human genome. We developed a next-generation sequencing-based method for structural methylation analysis of integrated viral genomes (denoted G-NaVI). This method is a novel approach that enables enrichment of viral fragments for sequencing using unique baits based on the sequence of the HBV genome. We detected integrated HBV sequences in the genome of the PLC/PRF/5 cell line and found variable levels of methylation within the integrated HBV genomes. Allele-specific methylation analysis revealed that the HBV genome often became significantly methylated when integrated into highly methylated host sites. After integration into unmethylated human genome regions such as promoters, however, the HBV DNA remains unmethylated and may eventually play an important role in tumorigenesis. The observed dynamic changes in DNA methylation of the host and viral genomes may functionally affect the biological behavior of HBV. These findings may impact public health given that millions of people worldwide are carriers of HBV. We also believe our assay will be a powerful tool to increase our understanding of the various types of DNA virus-associated tumorigenesis. PMID:25653310
FY*A silencing by the GATA-motif variant FY*A(-69C) in a Caucasian family.

PubMed

Písačka, Martin; Marinov, Iuri; Králová, Miroslava; Králová, Jana; Kořánová, Michaela; Bohoněk, Miloš; Sood, Chhavi; Ochoa-Garay, Gorka

2015-11-01

The c.1-67C variant polymorphism in a GATA motif of the FY promoter is known to result in erythroid-specific FY silencing, that is, in Fy(a-) and Fy(b-) phenotypes. A Caucasian donor presented with the very rare Fy(a-b-) phenotype and was further investigated. Genomic DNA was analyzed by sequencing to identify the cause of the Fy(a-b-) phenotype. Samples were collected from some of his relatives to establish a correlation between the serology and genotyping results. Red blood cells were analyzed by gel column agglutination and flow cytometry. Genomic DNA was analyzed on genotyping microarrays, by DNA sequencing and by allele-specific PCR. In the donor, a single-nucleotide polymorphism T>C within the GATA motif was found at Position c.1-69 of the FY promoter and shown to occur in the FY*A allele. His genotype was found to be FY*A(-69C), FY*BW.01. In six FY*A/FY*B heterozygous members of the family, a perfect correlation was found between the presence vs. absence of the FY*A(-69C) variant allele and a Fy(a-) vs. Fy(a+) phenotype. The location of the c.1-69C polymorphism in a GATA motif whose disruption is known to result in a Fy null phenotype, together with the perfect correlation between the presence of the FY*A(-69C) allele and the Fy(a-) phenotype support a cause-effect relationship between the two. © 2015 AABB.
Molecular cloning of actin genes in Trichomonas vaginalis and phylogeny inferred from actin sequences.

PubMed

Bricheux, G; Brugerolle, G

1997-08-01

The parasitic protozoan Trichomonas vaginalis is known to contain the ubiquitous and highly conserved protein actin. A genomic library and a cDNA library have been screened to identify and clone the actin gene(s) of T. vaginalis. The nucleotide sequence of one gene and its flanking regions have been determined. The open reading frame encodes a protein of 376 amino acids. The sequence is not interrupted by any introns and the promoter could be represented by a 10 bp motif close to a consensus motif also found upstream of most sequenced T. vaginalis genes. The five different clones isolated from the cDNA library have similar sequences and encode three actin proteins differing only by one or two amino acids. A phylogenetic analysis of 31 actin sequences by distance matrix and parsimony methods, using centractin as outgroup, gives congruent trees with Parabasala branching above Diplomonadida.
Identification of a novel mitotic phosphorylation motif associated with protein localization to the mitotic apparatus

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yang, Feng; Camp, David G.; Gritsenko, Marina A.

2007-11-16

The chromosomal passenger complex (CPC) is a critical regulator of chromosome, cytoskeleton and membrane dynamics during mitosis. Here, we identified phosphopeptides and phosphoprotein complexes recognized by a phosphorylation specific antibody that labels the CPC using liquid chromatography coupled to mass spectrometry. A mitotic phosphorylation motif (PX{G/T/S}{L/M}[pS]P or WGL[pS]P) was identified in 11 proteins including Fzr/Cdh1 and RIC-8, two proteins with potential links to the CPC. Phosphoprotein complexes contained known CPC components INCENP, Aurora-B and TD-60, as well as SMAD2, 14-3-3 proteins, PP2A, and Cdk1, a likely kinase for this motif. Protein sequence analysis identified phosphorylation motifs in additional proteins includingmore » SMAD2, Plk3 and INCENP. Mitotic SMAD2 and Plk3 phosphorylation was confirmed using phosphorylation specific antibodies, and in the case of Plk3, phosphorylation correlates with its localization to the mitotic apparatus. A mutagenesis approach was used to show INCENP phosphorylation is required for midbody localization. These results provide evidence for a shared phosphorylation event that regulates localization of critical proteins during mitosis.« less
Discovery of candidate KEN-box motifs using cell cycle keyword enrichment combined with native disorder prediction and motif conservation.

PubMed

Michael, Sushama; Travé, Gilles; Ramu, Chenna; Chica, Claudia; Gibson, Toby J

2008-02-15

KEN-box-mediated target selection is one of the mechanisms used in the proteasomal destruction of mitotic cell cycle proteins via the APC/C complex. While annotating the Eukaryotic Linear Motif resource (ELM, http://elm.eu.org/), we found that KEN motifs were significantly enriched in human protein entries with cell cycle keywords in the UniProt/Swiss-Prot database-implying that KEN-boxes might be more common than reported. Matches to short linear motifs in protein database searches are not, per se, significant. KEN-box enrichment with cell cycle Gene Ontology terms suggests that collectively these motifs are functional but does not prove that any given instance is so. Candidates were surveyed for native disorder prediction using GlobPlot and IUPred and for motif conservation in homologues. Among >25 strong new candidates, the most notable are human HIPK2, CHFR, CDC27, Dab2, Upf2, kinesin Eg5, DNA Topoisomerase 1 and yeast Cdc5 and Swi5. A similar number of weaker candidates were present. These proteins have yet to be tested for APC/C targeted destruction, providing potential new avenues of research.
Importance of α–helix N–capping motif in stabilization of ββα fold

PubMed Central

Koscielska-Kasprzak, Katarzyna; Cierpicki, Tomasz; Otlewski, Jacek

2003-01-01

FSD-1 (full sequence design 1) is a protein folded in a ββα motif, designed on the basis of the second zinc finger domain of Zif268 by a substitution of its metal coordination site with a hydrophobic core. In this work, we analyzed the possibility of introducing the DNA recognition motif of the template zinc finger (S13RSDH17) into FSD-1 sequence in order to obtain a small DNA-binding module devoid of cross-link(s) or metal cofactors. The hybrid protein was unfolded, as judged by CD and NMR criteria. To reveal the role of each of the five amino acids, which form the N-capping motif of the α-helix, we analyzed conformational and stability properties of eight FSD-1 mutants. We used a shielded methyl group of Leu 18 and a CD signal at 215 nm as a convenient measure of the folded state. Glu 17→His substitution at the N3 in S13NEKE17 variant decreased the folded structure content from 90% to 25% (equivalent to 1.8 kcal • mole−1 destabilization) by disruption of N-capping interactions, and had the most significant effect among single mutants studied here. The Ncap Asn 14 substitution with Arg considerably decreased stability, reducing structure content from 90% to 40% (1.4 kcal • mole−1 destabilization) by disruption of a helix-capping hydrogen bond and destabilization of a helix macrodipole. The N1 Glu 15→Ser mutation also produced a considerable effect (1.0 kcal • mole−1 destabilization), again emphasizing the significance of electrostatic interactions in α-helix stabilization. PMID:12761399
Systematic Analysis of Intracellular Trafficking Motifs Located within the Cytoplasmic Domain of Simian Immunodeficiency Virus Glycoprotein gp41

PubMed Central

Postler, Thomas S.; Bixby, Jacqueline G.; Desrosiers, Ronald C.; Yuste, Eloísa

2014-01-01

Previous studies have shown that truncation of the cytoplasmic-domain sequences of the simian immunodeficiency virus (SIV) envelope glycoprotein (Env) just prior to a potential intracellular-trafficking signal of the sequence YIHF can strongly increase Env protein expression on the cell surface, Env incorporation into virions and, at least in some contexts, virion infectivity. Here, all 12 potential intracellular-trafficking motifs (YXXΦ or LL/LI/IL) in the gp41 cytoplasmic domain (gp41CD) of SIVmac239 were analyzed by systematic mutagenesis. One single and 7 sequential combination mutants in this cytoplasmic domain were characterized. Cell-surface levels of Env were not significantly affected by any of the mutations. Most combination mutations resulted in moderate 3- to 8-fold increases in Env incorporation into virions. However, mutation of all 12 potential sites actually decreased Env incorporation into virions. Variant forms with 11 or 12 mutated sites exhibited 3-fold lower levels of inherent infectivity, while none of the other single or combination mutations that were studied significantly affected the inherent infectivity of SIVmac239. These minor effects of mutations in trafficking motifs form a stark contrast to the strong increases in cell-surface expression and Env incorporation which have previously been reported for large truncations of gp41CD. Surprisingly, mutation of potential trafficking motifs in gp41CD of SIVmac316, which differs by only one residue from gp41CD of SIVmac239, effectively recapitulated the increases in Env incorporation into virions observed with gp41CD truncations. Our results indicate that increases in Env surface expression and virion incorporation associated with truncation of SIVmac239 gp41CD are not fully explained by loss of consensus trafficking motifs. PMID:25479017
Positive evolutionary selection of an HD motif on Alzheimer precursor protein orthologues suggests a functional role.

PubMed

Miklós, István; Zádori, Zoltán

2012-02-01

HD amino acid duplex has been found in the active center of many different enzymes. The dyad plays remarkably different roles in their catalytic processes that usually involve metal coordination. An HD motif is positioned directly on the amyloid beta fragment (Aβ) and on the carboxy-terminal region of the extracellular domain (CAED) of the human amyloid precursor protein (APP) and a taxonomically well defined group of APP orthologues (APPOs). In human Aβ HD is part of a presumed, RGD-like integrin-binding motif RHD; however, neither RHD nor RXD demonstrates reasonable conservation in APPOs. The sequences of CAEDs and the position of the HD are not particularly conserved either, yet we show with a novel statistical method using evolutionary modeling that the presence of HD on CAEDs cannot be the result of neutral evolutionary forces (p<0.0001). The motif is positively selected along the evolutionary process in the majority of APPOs, despite the fact that HD motif is underrepresented in the proteomes of all species of the animal kingdom. Position migration can be explained by high probability occurrence of multiple copies of HD on intermediate sequences, from which only one is kept by selective evolutionary forces, in a similar way as in the case of the "transcription binding site turnover." CAED of all APP orthologues and homologues are predicted to bind metal ions including Amyloid-like protein 1 (APLP1) and Amyloid-like protein 2 (APLP2). Our results suggest that HDs on the CAEDs are most probably key components of metal-binding domains, which facilitate and/or regulate inter- or intra-molecular interactions in a metal ion-dependent or metal ion concentration-dependent manner. The involvement of naturally occurring mutations of HD (Tottori (D7N) and English (H6R) mutations) in early onset Alzheimer's disease gives additional support to our finding that HD has an evolutionary preserved function on APPOs.
Positive Evolutionary Selection of an HD Motif on Alzheimer Precursor Protein Orthologues Suggests a Functional Role

PubMed Central

Miklós, István; Zádori, Zoltán

2012-01-01

HD amino acid duplex has been found in the active center of many different enzymes. The dyad plays remarkably different roles in their catalytic processes that usually involve metal coordination. An HD motif is positioned directly on the amyloid beta fragment (Aβ) and on the carboxy-terminal region of the extracellular domain (CAED) of the human amyloid precursor protein (APP) and a taxonomically well defined group of APP orthologues (APPOs). In human Aβ HD is part of a presumed, RGD-like integrin-binding motif RHD; however, neither RHD nor RXD demonstrates reasonable conservation in APPOs. The sequences of CAEDs and the position of the HD are not particularly conserved either, yet we show with a novel statistical method using evolutionary modeling that the presence of HD on CAEDs cannot be the result of neutral evolutionary forces (p<0.0001). The motif is positively selected along the evolutionary process in the majority of APPOs, despite the fact that HD motif is underrepresented in the proteomes of all species of the animal kingdom. Position migration can be explained by high probability occurrence of multiple copies of HD on intermediate sequences, from which only one is kept by selective evolutionary forces, in a similar way as in the case of the “transcription binding site turnover.” CAED of all APP orthologues and homologues are predicted to bind metal ions including Amyloid-like protein 1 (APLP1) and Amyloid-like protein 2 (APLP2). Our results suggest that HDs on the CAEDs are most probably key components of metal-binding domains, which facilitate and/or regulate inter- or intra-molecular interactions in a metal ion-dependent or metal ion concentration-dependent manner. The involvement of naturally occurring mutations of HD (Tottori (D7N) and English (H6R) mutations) in early onset Alzheimer's disease gives additional support to our finding that HD has an evolutionary preserved function on APPOs. PMID:22319430
Characterization of the genetic elements required for site-specific integration of plasmid pSE211 in Saccharopolyspora erythraea.

PubMed Central

Brown, D P; Idler, K B; Katz, L

1990-01-01

The 18.1-kilobase plasmid pSE211 integrates into the chromosome of Saccharopolyspora erythraea at a specific attB site. Restriction analysis of the integrated plasmid, pSE211int, and adjacent chromosomal sequences allowed identification of attP, the plasmid attachment site. Nucleotide sequencing of attP, attB, attL, and attR revealed a 57-base-pair sequence common to all sites with no duplications of adjacent plasmid or chromosomal sequences in the integrated state, indicating that integration takes place through conservative, reciprocal strand exchange. An analysis of the sequences indicated the presence of a putative gene for Phe-tRNA at attB which is preserved at attL after integration has occurred. A comparison of the attB site for a number of actinomycete plasmids is presented. Integration at attB was also observed when a 2.4-kilobase segment of pSE211 containing attP and the adjacent plasmid sequence was used to transform a pSE211- host. Nucleotide sequencing of this segment revealed the presence of two complete open reading frames (ORFs) and a segment of a third ORF. The ORF adjacent to attP encodes a putative polypeptide 437 amino acids in length that shows similarity, at its C-terminal domain, to sequences of site-specific recombinases of the integrase family. The adjacent ORF encodes a putative 98-amino-acid basic polypeptide that contains a helix-turn-helix motif at its N terminus which corresponds to domains in the Xis proteins of a number of bacteriophages. A proposal for the function of this polypeptide is presented. The deduced amino acid sequence of the third ORF did not reveal similarities to polypeptide sequences in the current data banks. Images FIG. 2 FIG. 3 PMID:2180909
Enhancing genome assemblies by integrating non-sequence based data

PubMed Central

2011-01-01

Introduction Many genome projects were underway before the advent of high-throughput sequencing and have thus been supported by a wealth of genome information from other technologies. Such information frequently takes the form of linkage and physical maps, both of which can provide a substantial amount of data useful in de novo sequencing projects. Furthermore, the recent abundance of genome resources enables the use of conserved synteny maps identified in related species to further enhance genome assemblies. Methods The tammar wallaby (Macropus eugenii) is a model marsupial mammal with a low coverage genome. However, we have access to extensive comparative maps containing over 14,000 markers constructed through the physical mapping of conserved loci, chromosome painting and comprehensive linkage maps. Using a custom Bioperl pipeline, information from the maps was aligned to assembled tammar wallaby contigs using BLAT. This data was used to construct pseudo paired-end libraries with intervals ranging from 5-10 MB. We then used Bambus (a program designed to scaffold eukaryotic genomes by ordering and orienting contigs through the use of paired-end data) to scaffold our libraries. To determine how map data compares to sequence based approaches to enhance assemblies, we repeated the experiment using a 0.5× coverage of unique reads from 4 KB and 8 KB Illumina paired-end libraries. Finally, we combined both the sequence and non-sequence-based data to determine how a combined approach could further enhance the quality of the low coverage de novo reconstruction of the tammar wallaby genome. Results Using the map data alone, we were able order 2.2% of the initial contigs into scaffolds, and increase the N50 scaffold size to 39 KB (36 KB in the original assembly). Using only the 0.5× paired-end sequence based data, 53% of the initial contigs were assigned to scaffolds. Combining both data sets resulted in a further 2% increase in the number of initial contigs integrated
Enhancing genome assemblies by integrating non-sequence based data.

PubMed

Heider, Thomas N; Lindsay, James; Wang, Chenwei; O'Neill, Rachel J; Pask, Andrew J

2011-05-28

Many genome projects were underway before the advent of high-throughput sequencing and have thus been supported by a wealth of genome information from other technologies. Such information frequently takes the form of linkage and physical maps, both of which can provide a substantial amount of data useful in de novo sequencing projects. Furthermore, the recent abundance of genome resources enables the use of conserved synteny maps identified in related species to further enhance genome assemblies. The tammar wallaby (Macropus eugenii) is a model marsupial mammal with a low coverage genome. However, we have access to extensive comparative maps containing over 14,000 markers constructed through the physical mapping of conserved loci, chromosome painting and comprehensive linkage maps. Using a custom Bioperl pipeline, information from the maps was aligned to assembled tammar wallaby contigs using BLAT. This data was used to construct pseudo paired-end libraries with intervals ranging from 5-10 MB. We then used Bambus (a program designed to scaffold eukaryotic genomes by ordering and orienting contigs through the use of paired-end data) to scaffold our libraries. To determine how map data compares to sequence based approaches to enhance assemblies, we repeated the experiment using a 0.5× coverage of unique reads from 4 KB and 8 KB Illumina paired-end libraries. Finally, we combined both the sequence and non-sequence-based data to determine how a combined approach could further enhance the quality of the low coverage de novo reconstruction of the tammar wallaby genome. Using the map data alone, we were able order 2.2% of the initial contigs into scaffolds, and increase the N50 scaffold size to 39 KB (36 KB in the original assembly). Using only the 0.5× paired-end sequence based data, 53% of the initial contigs were assigned to scaffolds. Combining both data sets resulted in a further 2% increase in the number of initial contigs integrated into a scaffold (55% total
Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics.

PubMed

Deutsch, Eric W; Sun, Zhi; Campbell, David S; Binz, Pierre-Alain; Farrah, Terry; Shteynberg, David; Mendoza, Luis; Omenn, Gilbert S; Moritz, Robert L

2016-11-04

The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances-a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted gene and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ∼20,000 primary isoforms plus contaminants to a very large database that includes almost all nonredundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study. We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the
Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics

PubMed Central

Deutsch, Eric W.; Sun, Zhi; Campbell, David S.; Binz, Pierre-Alain; Farrah, Terry; Shteynberg, David; Mendoza, Luis; Omenn, Gilbert S.; Moritz, Robert L.

2016-01-01

The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances – a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted gene and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ~20,000 primary isoforms plus contaminants to a very large database that includes almost all non-redundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study. We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the
DNA motif elucidation using belief propagation.

PubMed

Wong, Ka-Chun; Chan, Tak-Ming; Peng, Chengbin; Li, Yue; Zhang, Zhaolei

2013-09-01

Protein-binding microarray (PBM) is a high-throughout platform that can measure the DNA-binding preference of a protein in a comprehensive and unbiased manner. A typical PBM experiment can measure binding signal intensities of a protein to all the possible DNA k-mers (k=8∼10); such comprehensive binding affinity data usually need to be reduced and represented as motif models before they can be further analyzed and applied. Since proteins can often bind to DNA in multiple modes, one of the major challenges is to decompose the comprehensive affinity data into multimodal motif representations. Here, we describe a new algorithm that uses Hidden Markov Models (HMMs) and can derive precise and multimodal motifs using belief propagations. We describe an HMM-based approach using belief propagations (kmerHMM), which accepts and preprocesses PBM probe raw data into median-binding intensities of individual k-mers. The k-mers are ranked and aligned for training an HMM as the underlying motif representation. Multiple motifs are then extracted from the HMM using belief propagations. Comparisons of kmerHMM with other leading methods on several data sets demonstrated its effectiveness and uniqueness. Especially, it achieved the best performance on more than half of the data sets. In addition, the multiple binding modes derived by kmerHMM are biologically meaningful and will be useful in interpreting other genome-wide data such as those generated from ChIP-seq. The executables and source codes are available at the authors' websites: e.g. http://www.cs.toronto.edu/∼wkc/kmerHMM.
INCENP Centromere and Spindle Targeting: Identification of Essential Conserved Motifs and Involvement of Heterochromatin Protein HP1

PubMed Central

Ainsztein, Alexandra M.; Kandels-Lewis, Stefanie E.; Mackay, Alastair M.; Earnshaw, William C.

1998-01-01

The inner centromere protein (INCENP) has a modular organization, with domains required for chromosomal and cytoskeletal functions concentrated near the amino and carboxyl termini, respectively. In this study we have identified an autonomous centromere- and midbody-targeting module in the amino-terminal 68 amino acids of INCENP. Within this module, we have identified two evolutionarily conserved amino acid sequence motifs: a 13–amino acid motif that is required for targeting to centromeres and transfer to the spindle, and an 11–amino acid motif that is required for transfer to the spindle by molecules that have targeted previously to the centromere. To begin to understand the mechanisms of INCENP function in mitosis, we have performed a yeast two-hybrid screen for interacting proteins. These and subsequent in vitro binding experiments identify a physical interaction between INCENP and heterochromatin protein HP1Hsα. Surprisingly, this interaction does not appear to be involved in targeting INCENP to the centromeric heterochromatin, but may instead have a role in its transfer from the chromosomes to the anaphase spindle. PMID:9864353
Systematic elucidation and in vivo validation of sequences enriched in hindbrain transcriptional control

PubMed Central

Burzynski, Grzegorz M.; Reed, Xylena; Taher, Leila; Stine, Zachary E.; Matsui, Takeshi; Ovcharenko, Ivan; McCallion, Andrew S.

2012-01-01

Illuminating the primary sequence encryption of enhancers is central to understanding the regulatory architecture of genomes. We have developed a machine learning approach to decipher motif patterns of hindbrain enhancers and identify 40,000 sequences in the human genome that we predict display regulatory control that includes the hindbrain. Consistent with their roles in hindbrain patterning, MEIS1, NKX6-1, as well as HOX and POU family binding motifs contributed strongly to this enhancer model. Predicted hindbrain enhancers are overrepresented at genes expressed in hindbrain and associated with nervous system development, and primarily reside in the areas of open chromatin. In addition, 77 (0.2%) of these predictions are identified as hindbrain enhancers on the VISTA Enhancer Browser, and 26,000 (60%) overlap enhancer marks (H3K4me1 or H3K27ac). To validate these putative hindbrain enhancers, we selected 55 elements distributed throughout our predictions and six low scoring controls for evaluation in a zebrafish transgenic assay. When assayed in mosaic transgenic embryos, 51/55 elements directed expression in the central nervous system. Furthermore, 30/34 (88%) predicted enhancers analyzed in stable zebrafish transgenic lines directed expression in the larval zebrafish hindbrain. Subsequent analysis of sequence fragments selected based upon motif clustering further confirmed the critical role of the motifs contributing to the classifier. Our results demonstrate the existence of a primary sequence code characteristic to hindbrain enhancers. This code can be accurately extracted using machine-learning approaches and applied successfully for de novo identification of hindbrain enhancers. This study represents a critical step toward the dissection of regulatory control in specific neuronal subtypes. PMID:22759862
Neural Encoding and Integration of Learned Probabilistic Sequences in Avian Sensory-Motor Circuitry

PubMed Central

Brainard, Michael S.

2013-01-01

Many complex behaviors, such as human speech and birdsong, reflect a set of categorical actions that can be flexibly organized into variable sequences. However, little is known about how the brain encodes the probabilities of such sequences. Behavioral sequences are typically characterized by the probability of transitioning from a given action to any subsequent action (which we term “divergence probability”). In contrast, we hypothesized that neural circuits might encode the probability of transitioning to a given action from any preceding action (which we term “convergence probability”). The convergence probability of repeatedly experienced sequences could naturally become encoded by Hebbian plasticity operating on the patterns of neural activity associated with those sequences. To determine whether convergence probability is encoded in the nervous system, we investigated how auditory-motor neurons in vocal premotor nucleus HVC of songbirds encode different probabilistic characterizations of produced syllable sequences. We recorded responses to auditory playback of pseudorandomly sequenced syllables from the bird's repertoire, and found that variations in responses to a given syllable could be explained by a positive linear dependence on the convergence probability of preceding sequences. Furthermore, convergence probability accounted for more response variation than other probabilistic characterizations, including divergence probability. Finally, we found that responses integrated over >7–10 syllables (∼700–1000 ms) with the sign, gain, and temporal extent of integration depending on convergence probability. Our results demonstrate that convergence probability is encoded in sensory-motor circuitry of the song-system, and suggest that encoding of convergence probability is a general feature of sensory-motor circuits. PMID:24198363

The N-terminal leucine-zipper motif in PTRF/cavin-1 is essential and sufficient for its caveolae-association

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wei, Zhuang; Laboratory of System Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031; Zou, Xinle

2015-01-16

Highlight: • The N-terminal leucine-zipper motif in PTRF/cavin-1 determines caveolar association. • Different cellular localization of PTRF/cavin-1 influences its serine 389 and 391 phosphorylation state. • PTRF/cavin-1 regulates cell motility via its caveolar association. - Abstract: PTRF/cavin-1 is a protein of two lives. Its reported functions in ribosomal RNA synthesis and in caveolae formation happen in two different cellular locations: nucleus vs. plasma membrane. Here, we identified that the N-terminal leucine-zipper motif in PTRF/cavin-1 was essential for the protein to be associated with caveolae in plasma membrane. It could counteract the effect of nuclear localization sequence in the molecule (AAmore » 235–251). Deletion of this leucine-zipper motif from PTRF/cavin-1 caused the mutant to be exclusively localized in nuclei. The fusion of this leucine-zipper motif with histone 2A, which is a nuclear protein, could induce the fusion protein to be exported from nucleus. Cell migration was greatly inhibited in PTRF/cavin-1{sup −/−} mouse embryonic fibroblasts (MEFs). The inhibited cell motility could only be rescued by exogenous cavin-1 but not the leucine-zipper motif deleted cavin-1 mutant. Plasma membrane dynamics is an important factor in cell motility control. Our results suggested that the membrane dynamics in cell migration is affected by caveolae associated PTRF/cavin-1.« less
Combined sequence and structure analysis of the fungal laccase family.

PubMed

Kumar, S V Suresh; Phale, Prashant S; Durani, S; Wangikar, Pramod P

2003-08-20

Plant and fungal laccases belong to the family of multi-copper oxidases and show much broader substrate specificity than other members of the family. Laccases have consequently been of interest for potential industrial applications. We have analyzed the essential sequence features of fungal laccases based on multiple sequence alignments of more than 100 laccases. This has resulted in identification of a set of four ungapped sequence regions, L1-L4, as the overall signature sequences that can be used to identify the laccases, distinguishing them within the broader class of multi-copper oxidases. The 12 amino acid residues in the enzymes serving as the copper ligands are housed within these four identified conserved regions, of which L2 and L4 conform to the earlier reported copper signature sequences of multi-copper oxidases while L1 and L3 are distinctive to the laccases. The mapping of regions L1-L4 on to the three-dimensional structure of the Coprinus cinerius laccase indicates that many of the non-copper-ligating residues of the conserved regions could be critical in maintaining a specific, more or less C-2 symmetric, protein conformational motif characterizing the active site apparatus of the enzymes. The observed intraprotein homologies between L1 and L3 and between L2 and L4 at both the structure and the sequence levels suggest that the quasi C-2 symmetric active site conformational motif may have arisen from a structural duplication event that neither the sequence homology analysis nor the structure homology analysis alone would have unraveled. Although the sequence and structure homology is not detectable in the rest of the protein, the relative orientation of region L1 with L2 is similar to that of L3 with L4. The structure duplication of first-shell and second-shell residues has become cryptic because the intraprotein sequence homology noticeable for a given laccase becomes significant only after comparing the conservation pattern in several fungal
APRICOT: an integrated computational pipeline for the sequence-based identification and characterization of RNA-binding proteins.

PubMed

Sharan, Malvika; Förstner, Konrad U; Eulalio, Ana; Vogel, Jörg

2017-06-20

RNA-binding proteins (RBPs) have been established as core components of several post-transcriptional gene regulation mechanisms. Experimental techniques such as cross-linking and co-immunoprecipitation have enabled the identification of RBPs, RNA-binding domains (RBDs) and their regulatory roles in the eukaryotic species such as human and yeast in large-scale. In contrast, our knowledge of the number and potential diversity of RBPs in bacteria is poorer due to the technical challenges associated with the existing global screening approaches. We introduce APRICOT, a computational pipeline for the sequence-based identification and characterization of proteins using RBDs known from experimental studies. The pipeline identifies functional motifs in protein sequences using position-specific scoring matrices and Hidden Markov Models of the functional domains and statistically scores them based on a series of sequence-based features. Subsequently, APRICOT identifies putative RBPs and characterizes them by several biological properties. Here we demonstrate the application and adaptability of the pipeline on large-scale protein sets, including the bacterial proteome of Escherichia coli. APRICOT showed better performance on various datasets compared to other existing tools for the sequence-based prediction of RBPs by achieving an average sensitivity and specificity of 0.90 and 0.91 respectively. The command-line tool and its documentation are available at https://pypi.python.org/pypi/bio-apricot. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Role of the Box C/D Motif in Localization of Small Nucleolar RNAs to Coiled Bodies and Nucleoli

PubMed Central

Narayanan, Aarthi; Speckmann, Wayne; Terns, Rebecca; Terns, Michael P.

1999-01-01

Small nucleolar RNAs (snoRNAs) are a large family of eukaryotic RNAs that function within the nucleolus in the biogenesis of ribosomes. One major class of snoRNAs is the box C/D snoRNAs named for their conserved box C and box D sequence elements. We have investigated the involvement of cis-acting sequences and intranuclear structures in the localization of box C/D snoRNAs to the nucleolus by assaying the intranuclear distribution of fluorescently labeled U3, U8, and U14 snoRNAs injected into Xenopus oocyte nuclei. Analysis of an extensive panel of U3 RNA variants showed that the box C/D motif, comprised of box C′, box D, and the 3′ terminal stem of U3, is necessary and sufficient for the nucleolar localization of U3 snoRNA. Disruption of the elements of the box C/D motif of U8 and U14 snoRNAs also prevented nucleolar localization, indicating that all box C/D snoRNAs use a common nucleolar-targeting mechanism. Finally, we found that wild-type box C/D snoRNAs transiently associate with coiled bodies before they localize to nucleoli and that variant RNAs that lack an intact box C/D motif are detained within coiled bodies. These results suggest that coiled bodies play a role in the biogenesis and/or intranuclear transport of box C/D snoRNAs. PMID:10397754
CDinFusion – Submission-Ready, On-Line Integration of Sequence and Contextual Data

PubMed Central

Hankeln, Wolfgang; Wendel, Norma Johanna; Gerken, Jan; Waldmann, Jost; Buttigieg, Pier Luigi; Kostadinov, Ivaylo; Kottmann, Renzo; Yilmaz, Pelin; Glöckner, Frank Oliver

2011-01-01

State of the art (DNA) sequencing methods applied in “Omics” studies grant insight into the ‘blueprints’ of organisms from all domains of life. Sequencing is carried out around the globe and the data is submitted to the public repositories of the International Nucleotide Sequence Database Collaboration. However, the context in which these studies are conducted often gets lost, because experimental data, as well as information about the environment are rarely submitted along with the sequence data. If these contextual or metadata are missing, key opportunities of comparison and analysis across studies and habitats are hampered or even impossible. To address this problem, the Genomic Standards Consortium (GSC) promotes checklists and standards to better describe our sequence data collection and to promote the capturing, exchange and integration of sequence data with contextual data. In a recent community effort the GSC has developed a series of recommendations for contextual data that should be submitted along with sequence data. To support the scientific community to significantly enhance the quality and quantity of contextual data in the public sequence data repositories, specialized software tools are needed. In this work we present CDinFusion, a web-based tool to integrate contextual and sequence data in (Multi)FASTA format prior to submission. The tool is open source and available under the Lesser GNU Public License 3. A public installation is hosted and maintained at the Max Planck Institute for Marine Microbiology at http://www.megx.net/cdinfusion. The tool may also be installed locally using the open source code available at http://code.google.com/p/cdinfusion. PMID:21935468
Junctions between i-motif tetramers in supramolecular structures

PubMed Central

Guittet, Eric; Renciuk, Daniel; Leroy, Jean-Louis

2012-01-01

The symmetry of i-motif tetramers gives to cytidine-rich oligonucleotides the capacity to associate into supramolecular structures (sms). In order to determine how the tetramers are linked together in such structures, we have measured by gel filtration chromatography and NMR the formation and dissociation kinetics of sms built by oligonucleotides containing two short C stretches separated by a non-cytidine-base. We show that a stretch of only two cytidines either at the 3′- or 5′-end is long enough to link the tetramers into sms. The analysis of the properties of sms formed by oligonucleotides differing by the length of the oligo-C stretches, the sequence orientation and the nature of the non-C base provides a model of the junction connecting the tetramers in sms. PMID:22362739
DNA containing CpG motifs induces angiogenesis

NASA Astrophysics Data System (ADS)

Zheng, Mei; Klinman, Dennis M.; Gierynska, Malgorzata; Rouse, Barry T.

2002-06-01

New blood vessel formation in the cornea is an essential step in the pathogenesis of a blinding immunoinflammatory reaction caused by ocular infection with herpes simplex virus (HSV). By using a murine corneal micropocket assay, we found that HSV DNA (which contains a significant excess of potentially bioactive "CpG" motifs when compared with mammalian DNA) induces angiogenesis. Moreover, synthetic oligodeoxynucleotides containing CpG motifs attract inflammatory cells and stimulate the release of vascular endothelial growth factor (VEGF), which in turn triggers new blood vessel formation. In vitro, CpG DNA induces the J774A.1 murine macrophage cell line to produce VEGF. In vivo CpG-induced angiogenesis was blocked by the administration of anti-mVEGF Ab or the inclusion of "neutralizing" oligodeoxynucleotides that specifically oppose the stimulatory activity of CpG DNA. These findings establish that DNA containing bioactive CpG motifs induces angiogenesis, and suggest that CpG motifs in HSV DNA may contribute to the blinding lesions of stromal keratitis.
Ca2+-binding Motif of βγ-Crystallins*

PubMed Central

Srivastava, Shanti Swaroop; Mishra, Amita; Krishnan, Bal; Sharma, Yogendra

2014-01-01

βγ-Crystallin-type double clamp (N/D)(N/D)XX(S/T)S motif is an established but sparsely investigated motif for Ca2+ binding. A βγ-crystallin domain is formed of two Greek key motifs, accommodating two Ca2+-binding sites. βγ-Crystallins make a separate class of Ca2+-binding proteins (CaBP), apparently a major group of CaBP in bacteria. Paralleling the diversity in βγ-crystallin domains, these motifs also show great diversity, both in structure and in function. Although the expression of some of them has been associated with stress, virulence, and adhesion, the functional implications of Ca2+ binding to βγ-crystallins in mediating biological processes are yet to be elucidated. PMID:24567326
Relevance of CARC and CRAC Cholesterol-Recognition Motifs in the Nicotinic Acetylcholine Receptor and Other Membrane-Bound Receptors.

PubMed

Di Scala, Coralie; Baier, Carlos J; Evans, Luke S; Williamson, Philip T F; Fantini, Jacques; Barrantes, Francisco J

2017-01-01

Cholesterol is a ubiquitous neutral lipid, which finely tunes the activity of a wide range of membrane proteins, including neurotransmitter and hormone receptors and ion channels. Given the scarcity of available X-ray crystallographic structures and the even fewer in which cholesterol sites have been directly visualized, application of in silico computational methods remains a valid alternative for the detection and thermodynamic characterization of cholesterol-specific sites in functionally important membrane proteins. The membrane-embedded segments of the paradigm neurotransmitter receptor for acetylcholine display a series of cholesterol consensus domains (which we have coined "CARC"). The CARC motif exhibits a preference for the outer membrane leaflet and its mirror motif, CRAC, for the inner one. Some membrane proteins possess the double CARC-CRAC sequences within the same transmembrane domain. In addition to in silico molecular modeling, the affinity, concentration dependence, and specificity of the cholesterol-recognition motif-protein interaction have recently found experimental validation in other biophysical approaches like monolayer techniques and nuclear magnetic resonance spectroscopy. From the combined studies, it becomes apparent that the CARC motif is now more firmly established as a high-affinity cholesterol-binding domain for membrane-bound receptors and remarkably conserved along phylogenetic evolution. © 2017 Elsevier Inc. All rights reserved.
The Contribution of Short Repeats of Low Sequence Complexity to Large Conifer Genomes

Treesearch

A. Schmidt; R.L. Doudrick; J.S. Heslop-Harrison; T. Schmidt

2000-01-01

Abstract: The abundance and genomic organization of six simple sequence repeats, consisting of di-, tri-, and tetranucleotide sequence motifs, and a minisatellite repeat have been analyzed in different gymnosperms by Southern hybridization. Within the gymnosperm genomes investigated, the abundance and genomic organization of micro- and...
New Structural and Functional Contexts of the Dx[DN]xDG Linear Motif: Insights into Evolution of Calcium-Binding Proteins

PubMed Central

Rigden, Daniel J.; Woodhead, Duncan D.; Wong, Prudence W. H.; Galperin, Michael Y.

2011-01-01

Binding of calcium ions (Ca2+) to proteins can have profound effects on their structure and function. Common roles of calcium binding include structure stabilization and regulation of activity. It is known that diverse families – EF-hands being one of at least twelve – use a Dx[DN]xDG linear motif to bind calcium in near-identical fashion. Here, four novel structural contexts for the motif are described. Existing experimental data for one of them, a thermophilic archaeal subtilisin, demonstrate for the first time a role for Dx[DN]xDG-bound calcium in protein folding. An integrin-like embedding of the motif in the blade of a β-propeller fold – here named the calcium blade – is discovered in structures of bacterial and fungal proteins. Furthermore, sensitive database searches suggest a common origin for the calcium blade in β-propeller structures of different sizes and a pan-kingdom distribution of these proteins. Factors favouring the multiple convergent evolution of the motif appear to include its general Asp-richness, the regular spacing of the Asp residues and the fact that change of Asp into Gly and vice versa can occur though a single nucleotide change. Among the known structural contexts for the Dx[DN]xDG motif, only the calcium blade and the EF-hand are currently found intracellularly in large numbers, perhaps because the higher extracellular concentration of Ca2+ allows for easier fixing of newly evolved motifs that have acquired useful functions. The analysis presented here will inform ongoing efforts toward prediction of similar calcium-binding motifs from sequence information alone. PMID:21720552
Ultrasensitive response motifs: basic amplifiers in molecular signalling networks

PubMed Central

Zhang, Qiang; Bhattacharya, Sudin; Andersen, Melvin E.

2013-01-01

Multi-component signal transduction pathways and gene regulatory circuits underpin integrated cellular responses to perturbations. A recurring set of network motifs serve as the basic building blocks of these molecular signalling networks. This review focuses on ultrasensitive response motifs (URMs) that amplify small percentage changes in the input signal into larger percentage changes in the output response. URMs generally possess a sigmoid input–output relationship that is steeper than the Michaelis–Menten type of response and is often approximated by the Hill function. Six types of URMs can be commonly found in intracellular molecular networks and each has a distinct kinetic mechanism for signal amplification. These URMs are: (i) positive cooperative binding, (ii) homo-multimerization, (iii) multistep signalling, (iv) molecular titration, (v) zero-order covalent modification cycle and (vi) positive feedback. Multiple URMs can be combined to generate highly switch-like responses. Serving as basic signal amplifiers, these URMs are essential for molecular circuits to produce complex nonlinear dynamics, including multistability, robust adaptation and oscillation. These dynamic properties are in turn responsible for higher-level cellular behaviours, such as cell fate determination, homeostasis and biological rhythm. PMID:23615029
A frequent, GxxxG-mediated, transmembrane association motif is optimized for the formation of interhelical Cα–H hydrogen bonds

PubMed Central

Mueller, Benjamin K.; Subramaniam, Sabareesh; Senes, Alessandro

2014-01-01

Carbon hydrogen bonds between Cα–H donors and carbonyl acceptors are frequently observed between transmembrane helices (Cα–H···O=C). Networks of these interactions occur often at helix−helix interfaces mediated by GxxxG and similar patterns. Cα–H hydrogen bonds have been hypothesized to be important in membrane protein folding and association, but evidence that they are major determinants of helix association is still lacking. Here we present a comprehensive geometric analysis of homodimeric helices that demonstrates the existence of a single region in conformational space with high propensity for Cα–H···O=C hydrogen bond formation. This region corresponds to the most frequent motif for parallel dimers, GASright, whose best-known example is glycophorin A. The finding suggests a causal link between the high frequency of occurrence of GASright and its propensity for carbon hydrogen bond formation. Investigation of the sequence dependency of the motif determined that Gly residues are required at specific positions where only Gly can act as a donor with its “side chain” Hα. Gly also reduces the steric barrier for non-Gly amino acids at other positions to act as Cα donors, promoting the formation of cooperative hydrogen bonding networks. These findings offer a structural rationale for the occurrence of GxxxG patterns at the GASright interface. The analysis identified the conformational space and the sequence requirement of Cα–H···O=C mediated motifs; we took advantage of these results to develop a structural prediction method. The resulting program, CATM, predicts ab initio the known high-resolution structures of homodimeric GASright motifs at near-atomic level. PMID:24569864
Crammed signaling motifs in the T-cell receptor.

PubMed

Borroto, Aldo; Abia, David; Alarcón, Balbino

2014-09-01

Although the T cell antigen receptor (TCR) is long known to contain multiple signaling subunits (CD3γ, CD3δ, CD3ɛ and CD3ζ), their role in signal transduction is still not well understood. The presence of at least one immunoreceptor tyrosine-based activation motif (ITAM) in each CD3 subunit has led to the idea that the multiplication of such elements essentially serves to amplify signals. However, the evolutionary conservation of non-ITAM sequences suggests that each CD3 subunit is likely to have specific non-redundant roles at some stage of development or in mature T cell function. The CD3ɛ subunit is paradigmatic because in a relatively short cytoplasmic sequence (∼55 amino acids) it contains several docking sites for proteins involved in intracellular trafficking and signaling, proteins whose relevance in T cell activation is slowly starting to be revealed. In this review we will summarize our current knowledge on the signaling effectors that bind directly to the TCR and we will propose a hierarchy in their response to TCR triggering. Copyright © 2014 Elsevier B.V. All rights reserved.
MDD-carb: a combinatorial model for the identification of protein carbonylation sites with substrate motifs.

PubMed

Kao, Hui-Ju; Weng, Shun-Long; Huang, Kai-Yao; Kaunang, Fergie Joanda; Hsu, Justin Bo-Kai; Huang, Chien-Hsun; Lee, Tzong-Yi

2017-12-21

Carbonylation, which takes place through oxidation of reactive oxygen species (ROS) on specific residues, is an irreversibly oxidative modification of proteins. It has been reported that the carbonylation is related to a number of metabolic or aging diseases including diabetes, chronic lung disease, Parkinson's disease, and Alzheimer's disease. Due to the lack of computational methods dedicated to exploring motif signatures of protein carbonylation sites, we were motivated to exploit an iterative statistical method to characterize and identify carbonylated sites with motif signatures. By manually curating experimental data from research articles, we obtained 332, 144, 135, and 140 verified substrate sites for K (lysine), R (arginine), T (threonine), and P (proline) residues, respectively, from 241 carbonylated proteins. In order to examine the informative attributes for classifying between carbonylated and non-carbonylated sites, multifarious features including composition of twenty amino acids (AAC), composition of amino acid pairs (AAPC), position-specific scoring matrix (PSSM), and positional weighted matrix (PWM) were investigated in this study. Additionally, in an attempt to explore the motif signatures of carbonylation sites, an iterative statistical method was adopted to detect statistically significant dependencies of amino acid compositions between specific positions around substrate sites. Profile hidden Markov model (HMM) was then utilized to train a predictive model from each motif signature. Moreover, based on the method of support vector machine (SVM), we adopted it to construct an integrative model by combining the values of bit scores obtained from profile HMMs. The combinatorial model could provide an enhanced performance with evenly predictive sensitivity and specificity in the evaluation of cross-validation and independent testing. This study provides a new scheme for exploring potential motif signatures at substrate sites of protein
Computational study of the fibril organization of polyglutamine repeats reveals a common motif identified in beta-helices.

PubMed

Zanuy, David; Gunasekaran, Kannan; Lesk, Arthur M; Nussinov, Ruth

2006-04-21

The formation of fibril aggregates by long polyglutamine sequences is assumed to play a major role in neurodegenerative diseases such as Huntington. Here, we model peptides rich in glutamine, through a series of molecular dynamics simulations. Starting from a rigid nanotube-like conformation, we have obtained a new conformational template that shares structural features of a tubular helix and of a beta-helix conformational organization. Our new model can be described as a super-helical arrangement of flat beta-sheet segments linked by planar turns or bends. Interestingly, our comprehensive analysis of the Protein Data Bank reveals that this is a common motif in beta-helices (termed beta-bend), although it has not been identified so far. The motif is based on the alternation of beta-sheet and helical conformation as the protein sequence is followed from the N to the C termini (beta-alpha(R)-beta-polyPro-beta). We further identify this motif in the ssNMR structure of the protofibril of the amyloidogenic peptide Abeta(1-40). The recurrence of the beta-bend suggests a general mode of connecting long parallel beta-sheet segments that would allow the growth of partially ordered fibril structures. The design allows the peptide backbone to change direction with a minimal loss of main chain hydrogen bonds. The identification of a coherent organization beyond that of the beta-sheet segments in different folds rich in parallel beta-sheets suggests a higher degree of ordered structure in protein fibrils, in agreement with their low solubility and dense molecular packing.
Binding of the cSH3 domain of Grb2 adaptor to two distinct RXXK motifs within Gab1 docker employs differential mechanisms.

PubMed

McDonald, Caleb B; Seldeen, Kenneth L; Deegan, Brian J; Bhat, Vikas; Farooq, Amjad

2011-01-01

A ubiquitous component of cellular signaling machinery, Gab1 docker plays a pivotal role in routing extracellular information in the form of growth factors and cytokines to downstream targets such as transcription factors within the nucleus. Here, using isothermal titration calorimetry (ITC) in combination with macromolecular modeling (MM), we show that although Gab1 contains four distinct RXXK motifs, designated G1, G2, G3, and G4, only G1 and G2 motifs bind to the cSH3 domain of Grb2 adaptor and do so with distinct mechanisms. Thus, while the G1 motif strictly requires the PPRPPKP consensus sequence for high-affinity binding to the cSH3 domain, the G2 motif displays preference for the PXVXRXLKPXR consensus. Such sequential differences in the binding of G1 and G2 motifs arise from their ability to adopt distinct polyproline type II (PPII)- and 3(10) -helical conformations upon binding to the cSH3 domain, respectively. Collectively, our study provides detailed biophysical insights into a key protein-protein interaction involved in a diverse array of signaling cascades central to health and disease. Copyright © 2010 John Wiley & Sons, Ltd.
Binding of the cSH3 Domain of Grb2 Adaptor to Two Distinct RXXK Motifs within Gab1 Docker Employs Differential Mechanisms

PubMed Central

McDonald, Caleb B.; Seldeen, Kenneth L.; Deegan, Brian J.; Bhat, Vikas; Farooq, Amjad

2010-01-01

A ubiquitous component of cellular signaling machinery, Gab1 docker plays a pivotal role in routing extracellular information in the form of growth factors and cytokines to downstream targets such as transcription factors within the nucleus. Here, using isothermal titration calorimetry (ITC) in combination with macromolecular modeling (MM), we show that although Gab1 contains four distinct RXXK motifs, designated G1, G2, G3 and G4, only G1 and G2 motifs bind to the cSH3 domain of Grb2 adaptor and do so with distinct mechanisms. Thus, while the G1 motif strictly requires the PPRPPKP consensus sequence for high-affinity binding to the cSH3 domain, the G2 motif displays preference for the PXVXRXLKPXR consensus. Such sequential differences in the binding of G1 and G2 motifs arise from their ability to adopt distinct polyproline type II (PPII)- and 310-helical conformations upon binding to the cSH3 domain, respectively. Collectively, our study provides detailed biophysical insights into a key protein-protein interaction involved in a diverse array of signaling cascades central to health and disease. PMID:21472810
Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning.

PubMed

Adhikari, Badri; Hou, Jie; Cheng, Jianlin

2018-03-01

In this study, we report the evaluation of the residue-residue contacts predicted by our three different methods in the CASP12 experiment, focusing on studying the impact of multiple sequence alignment, residue coevolution, and machine learning on contact prediction. The first method (MULTICOM-NOVEL) uses only traditional features (sequence profile, secondary structure, and solvent accessibility) with deep learning to predict contacts and serves as a baseline. The second method (MULTICOM-CONSTRUCT) uses our new alignment algorithm to generate deep multiple sequence alignment to derive coevolution-based features, which are integrated by a neural network method to predict contacts. The third method (MULTICOM-CLUSTER) is a consensus combination of the predictions of the first two methods. We evaluated our methods on 94 CASP12 domains. On a subset of 38 free-modeling domains, our methods achieved an average precision of up to 41.7% for top L/5 long-range contact predictions. The comparison of the three methods shows that the quality and effective depth of multiple sequence alignments, coevolution-based features, and machine learning integration of coevolution-based features and traditional features drive the quality of predicted protein contacts. On the full CASP12 dataset, the coevolution-based features alone can improve the average precision from 28.4% to 41.6%, and the machine learning integration of all the features further raises the precision to 56.3%, when top L/5 predicted long-range contacts are evaluated. And the correlation between the precision of contact prediction and the logarithm of the number of effective sequences in alignments is 0.66. © 2017 Wiley Periodicals, Inc.
Development of expressed sequence tag-simple sequence repeat markers for genetic characterization and population structure analysis of Praxelis clematidea (Asteraceae).

PubMed

Wang, Q Z; Huang, M; Downie, S R; Chen, Z X

2016-05-23

Invasive plants tend to spread aggressively in new habitats and an understanding of their genetic diversity and population structure is useful for their management. In this study, expressed sequence tag-simple sequence repeat (EST-SSR) markers were developed for the invasive plant species Praxelis clematidea (Asteraceae) from 5548 Stevia rebaudiana (Asteraceae) expressed sequence tags (ESTs). A total of 133 microsatellite-containing ESTs (2.4%) were identified, of which 56 (42.1%) were hexanucleotide repeat motifs and 50 (37.6%) were trinucleotide repeat motifs. Of the 24 primer pairs designed from these 133 ESTs, 7 (29.2%) resulted in significant polymorphisms. The number of alleles per locus ranged from 5 to 9. The relatively high genetic diversity (H = 0.2667, I = 0.4212, and P = 100%) of P. clematidea was related to high gene flow (Nm = 1.4996) among populations. The coefficient of population differentiation (GST = 0.2500) indicated that most genetic variation occurred within populations. A Mantel test suggested that there was significant correlation between genetic distance and geographical distribution (r = 0.3192, P = 0.012). These results further support the transferability of EST-SSR markers between closely related genera of the same family.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.