motif finding problem: Topics by Science.gov

Sample records for motif finding problem

Discovering Motifs in Biological Sequences Using the Micron Automata Processor.

PubMed

Roy, Indranil; Aluru, Srinivas

2016-01-01

Finding approximately conserved sequences, called motifs, across multiple DNA or protein sequences is an important problem in computational biology. In this paper, we consider the (l, d) motif search problem of identifying one or more motifs of length l present in at least q of the n given sequences, with each occurrence differing from the motif in at most d substitutions. The problem is known to be NP-complete, and the largest solved instance reported to date is (26,11). We propose a novel algorithm for the (l, d) motif search problem using streaming execution over a large set of non-deterministic finite automata (NFA). This solution is designed to take advantage of the Micron Automata Processor, a new technology close to deployment that can execute multiple NFA in parallel. We demonstrate the capability for solving much larger instances of the (l, d) motif search problem using the resources available within a single automata processor board, by estimating run-times for problem instances (39,18) and (40,17). The paper serves as a useful guide to solving problems using this new accelerator technology.
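To make the (l, d) problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the automata-based algorithm of the paper; it enumerates every candidate of length l, so it is only usable for very small l. The function names (`hamming`, `occurs`, `ld_motif_search`) and the default DNA alphabet are assumptions chosen for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs(motif, seq, d):
    """True if some window of seq is within d substitutions of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def ld_motif_search(seqs, l, d, q=None, alphabet="ACGT"):
    """Return all l-mers that occur, with at most d substitutions,
    in at least q of the input sequences (q defaults to all of them).

    Exhaustive enumeration over len(alphabet)**l candidates:
    an illustration of the definition, not a practical solver.
    """
    q = len(seqs) if q is None else q
    found = []
    for cand in product(alphabet, repeat=l):
        motif = "".join(cand)
        if sum(occurs(motif, s, d) for s in seqs) >= q:
            found.append(motif)
    return found
```

For example, with `seqs = ["ACGTACGT", "TTACGATT", "GGACGTCC"]`, the call `ld_motif_search(seqs, 4, 1)` includes `"ACGT"`, because the second sequence contains `ACGA`, which differs from it by one substitution.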
A generic motif discovery algorithm for sequential data.

PubMed

Jensen, Kyle L; Styczynski, Mark P; Rigoutsos, Isidore; Stephanopoulos, Gregory N

2006-01-01

Motif discovery in sequential data is a problem of great interest and with many applications. However, previous methods have been unable to combine exhaustive search with complex motif representations and are each typically only applicable to a certain class of problems. Here we present a generic motif discovery algorithm (Gemoda) for sequential data. Gemoda can be applied to any dataset with a sequential character, including both categorical and real-valued data. As we show, Gemoda deterministically discovers motifs that are maximal in composition and length. As well, the algorithm allows any choice of similarity metric for finding motifs. Finally, Gemoda's output motifs are representation-agnostic: they can be represented using regular expressions, position weight matrices or any number of other models for any type of sequential data. We demonstrate a number of applications of the algorithm, including the discovery of motifs in amino acids sequences, a new solution to the (l,d)-motif problem in DNA sequences and the discovery of conserved protein substructures. Gemoda is freely available at http://web.mit.edu/bamel/gemoda
A Monte Carlo-based framework enhances the discovery and interpretation of regulatory sequence motifs

PubMed Central

2012-01-01

Background: Discovery of functionally significant short, statistically overrepresented subsequence patterns (motifs) in a set of sequences is a challenging problem in bioinformatics. Oftentimes, not all sequences in the set contain a motif. These non-motif-containing sequences complicate the algorithmic discovery of motifs. Filtering the non-motif-containing sequences from the larger set of sequences while simultaneously determining the identity of the motif is, therefore, desirable and a non-trivial problem in motif discovery research. Results: We describe MotifCatcher, a framework that extends the sensitivity of existing motif-finding tools by employing random sampling to effectively remove non-motif-containing sequences from the motif search. We developed two implementations of our algorithm, each built around a commonly used motif-finding tool, and applied our algorithm to three diverse chromatin immunoprecipitation (ChIP) data sets. In each case, the motif finder with the MotifCatcher extension demonstrated improved sensitivity over the motif finder alone. Our approach organizes candidate functionally significant discovered motifs into a tree, which allowed us to make additional insights. In all cases, we were able to support our findings with experimental work from the literature. Conclusions: Our framework demonstrates that additional processing at the sequence entry level can significantly improve the performance of existing motif-finding tools. For each biological data set tested, we were able to propose novel biological hypotheses supported by experimental work from the literature. Specifically, in Escherichia coli, we suggested binding site motifs for 6 non-traditional LexA protein binding sites; in Saccharomyces cerevisiae, we hypothesize 2 disparate mechanisms for novel binding sites of the Cse4p protein; and in Halobacterium sp. NRC-1, we discovered subtle differences in a general transcription factor (GTF) binding site motif across several data sets. We suggest that small differences in our discovered motif could confer specificity for one or more homologous GTF proteins. We offer a free implementation of the MotifCatcher software package at http://www.bme.ucdavis.edu/facciotti/resources_data/software/. PMID:23181585
A relational extension of the notion of motifs: application to the common 3D protein substructures searching problem.

PubMed

Pisanti, Nadia; Soldano, Henry; Carpentier, Mathilde; Pothier, Joel

2009-12-01

The geometrical configurations of atoms in protein structures can be viewed as approximate relations among them. Then, finding similar common substructures within a set of protein structures belongs to a new class of problems that generalizes that of finding repeated motifs. The novelty lies in the addition of constraints on the motifs in terms of relations that must hold between pairs of positions of the motifs. We will hence denote them as relational motifs. For this class of problems, we present an algorithm that is a suitable extension of the KMR paradigm and, in particular, of the KMRC as it uses a degenerate alphabet. Our algorithm contains several improvements that become especially useful when, as is required for relational motifs, the inference is made by partially overlapping shorter motifs, rather than concatenating them. The efficiency, correctness and completeness of the algorithm are ensured by several non-trivial properties that are proven in this paper. The algorithm has been applied in the important field of protein common 3D substructure searching. The methods implemented have been tested on several examples of protein families such as serine proteases, globins and cytochromes P450. The detected motifs have been compared to those found by multiple structural alignment methods.
SLIDER: a generic metaheuristic for the discovery of correlated motifs in protein-protein interaction networks.

PubMed

Boyen, Peter; Van Dyck, Dries; Neven, Frank; van Ham, Roeland C H J; van Dijk, Aalt D J

2011-01-01

Correlated motif mining (CMM) is the problem of finding overrepresented pairs of patterns, called motifs, in sequences of interacting proteins. Algorithmic solutions for CMM thereby provide a computational method for predicting binding sites for protein interaction. In this paper, we adopt a motif-driven approach where the support of candidate motif pairs is evaluated in the network. We experimentally establish the superiority of the Chi-square-based support measure over other support measures. Furthermore, we show that CMM is an NP-hard problem for a large class of support measures (including Chi-square) and reformulate the search for correlated motifs as a combinatorial optimization problem. We then present the generic metaheuristic SLIDER, which uses steepest ascent with a neighborhood function based on sliding motifs and employs the Chi-square-based support measure. We show that SLIDER outperforms existing motif-driven CMM methods and scales to large protein-protein interaction networks. The SLIDER implementation and the data used in the experiments are available at http://bioinformatics.uhasselt.be.
Classification and assessment tools for structural motif discovery algorithms.

PubMed

Badr, Ghada; Al-Turaiki, Isra; Mathkour, Hassan

2013-01-01

Motif discovery is the problem of finding recurring patterns in biological data. Patterns can be sequential, mainly when discovered in DNA sequences. They can also be structural (e.g. when discovering RNA motifs). Finding common structural patterns helps to gain a better understanding of the mechanism of action (e.g. post-transcriptional regulation). Unlike DNA motifs, which are sequentially conserved, RNA motifs exhibit conservation in structure, which may be common even if the sequences are different. Over the past few years, hundreds of algorithms have been developed to solve the sequential motif discovery problem, while less work has been done for the structural case. In this paper, we survey, classify, and compare different algorithms that solve the structural motif discovery problem, where the underlying sequences may be different. We highlight their strengths and weaknesses. We start by proposing a benchmark dataset and a measurement tool that can be used to evaluate different motif discovery approaches. Then, we proceed by proposing our experimental setup. Finally, results are obtained using the proposed benchmark to compare available tools. To the best of our knowledge, this is the first attempt to compare tools solely designed for structural motif discovery. Results show that the accuracy of discovered motifs is relatively low. The results also suggest a complementary behavior among tools where some tools perform well on simple structures, while other tools are better for complex structures. We have classified and evaluated the performance of available structural motif discovery tools. In addition, we have proposed a benchmark dataset with tools that can be used to evaluate newly developed tools.
Efficient exact motif discovery.

PubMed

Marschall, Tobias; Rahmann, Sven

2009-06-15

The motif discovery problem consists of finding over-represented patterns in a collection of biosequences. It is one of the classical sequence analysis problems, but still has not been satisfactorily solved in an exact and efficient manner. This is partly due to the large number of possibilities of defining the motif search space and the notion of over-representation. Even for well-defined formalizations, the problem is frequently solved in an ad hoc manner with heuristics that do not guarantee to find the best motif. We show how to solve the motif discovery problem (almost) exactly on a practically relevant space of IUPAC generalized string patterns, using the p-value with respect to an i.i.d. model or a Markov model as the measure of over-representation. In particular, (i) we use a highly accurate compound Poisson approximation for the null distribution of the number of motif occurrences. We show how to compute the exact clump size distribution using a recently introduced device called probabilistic arithmetic automaton (PAA). (ii) We define two p-value scores for over-representation, the first one based on the total number of motif occurrences, the second one based on the number of sequences in a collection with at least one occurrence. (iii) We describe an algorithm to discover the optimal pattern with respect to either of the scores. The method exploits monotonicity properties of the compound Poisson approximation and is by orders of magnitude faster than exhaustive enumeration of IUPAC strings (11.8 h compared with an extrapolated runtime of 4.8 years). (iv) We justify the use of the proposed scores for motif discovery by showing our method to outperform other motif discovery algorithms (e.g. MEME, Weeder) on benchmark datasets. We also propose new motifs on Mycobacterium tuberculosis. The method has been implemented in Java. It can be obtained from http://ls11-www.cs.tu-dortmund.de/people/marschal/paa_md/.
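The two occurrence statistics named in (ii), the total number of motif occurrences and the number of sequences with at least one occurrence, can be sketched for IUPAC patterns as follows. This Python sketch is an illustration only: the function names are hypothetical, the background is assumed to be i.i.d., and the tail probability uses a plain Poisson distribution as a cruder stand-in for the compound Poisson approximation described in the abstract (a plain Poisson ignores the clumping of overlapping occurrences).

```python
import math

# Standard IUPAC nucleotide codes and the bases each one admits.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(pattern, window):
    """True if every base of window is admitted by the IUPAC pattern."""
    return all(ch in IUPAC[p] for p, ch in zip(pattern, window))


def count_occurrences(pattern, seqs):
    """Return (total occurrences, number of sequences with >= 1 occurrence)."""
    m = len(pattern)
    total, seqs_hit = 0, 0
    for s in seqs:
        hits = sum(matches(pattern, s[i:i + m])
                   for i in range(len(s) - m + 1))
        total += hits
        seqs_hit += hits > 0
    return total, seqs_hit


def expected_occurrences(pattern, seqs, background):
    """Expected total count under an i.i.d. background (dict base -> prob)."""
    p = math.prod(sum(background[b] for b in IUPAC[c]) for c in pattern)
    positions = sum(max(len(s) - len(pattern) + 1, 0) for s in seqs)
    return positions * p


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam); a simplification, see lead-in."""
    return 1.0 - sum(math.exp(-lam) * lam ** i / math.factorial(i)
                     for i in range(k))
```

A p-value-style score for the total count would then be `poisson_tail(total, expected_occurrences(pattern, seqs, background))`, with the caveat that the method in the abstract replaces this step with a more accurate null distribution.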
Efficient sequential and parallel algorithms for finding edit distance based motifs.

PubMed

Pal, Soumitra; Xiao, Peng; Rajasekaran, Sanguthevar

2016-08-18

Motif search is an important step in extracting meaningful patterns from biological data. The general problem of motif search is intractable and there is a pressing need to develop efficient, exact and approximation algorithms to solve this problem. In this paper, we present several novel, exact, sequential and parallel algorithms for solving the (l,d) Edit-distance-based Motif Search (EMS) problem: given two integers l,d and n biological strings, find all strings of length l that appear in each input string with at most d errors of types substitution, insertion and deletion. One popular technique to solve the problem is to explore for each input string the set of all possible l-mers that belong to the d-neighborhood of any substring of the input string and output those which are common for all input strings. We introduce a novel and provably efficient neighborhood exploration technique. We show that it is enough to consider the candidates in the neighborhood which are at a distance exactly d. We compactly represent these candidate motifs using wildcard characters and efficiently explore them with very few repetitions. Our sequential algorithm uses a trie-based data structure to efficiently store and sort the candidate motifs. Our parallel algorithm in a multi-core shared memory setting uses arrays for storing and a novel modification of radix-sort for sorting the candidate motifs. The algorithms for EMS are customarily evaluated on several challenging instances such as (8,1), (12,2), (16,3), (20,4), and so on. The best previously known algorithm, EMS1, is sequential and in an estimated 3 days solves up to instance (16,3). Our sequential algorithms are more than 20 times faster on (16,3). On other hard instances such as (9,2), (11,3), (13,4), our algorithms are much faster. Our parallel algorithm has more than 600% scaling performance while using 16 threads. Our algorithms have pushed up the state-of-the-art of EMS solvers and we believe that the techniques introduced in this paper are also applicable to other motif search problems such as Planted Motif Search (PMS) and Simple Motif Search (SMS).
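The EMS definition can be illustrated with a small brute-force sketch in Python. It does not implement the neighborhood-exploration, trie, or radix-sort techniques of the paper; instead it enumerates every candidate l-mer and tests it against each input string with the classical approximate string matching dynamic program (free start position in the text). The function names are hypothetical and the DNA alphabet is an assumption.

```python
from itertools import product


def min_edit_distance_to_substring(pattern, text):
    """Smallest edit distance between pattern and any substring of text.

    Column-by-column dynamic program: prev[i] holds the best distance
    between pattern[:i] and some suffix of the text read so far.
    """
    prev = list(range(len(pattern) + 1))   # empty text: delete i characters
    best = prev[-1]
    for ch in text:
        cur = [0]                          # a match may start at any position
        for i, p in enumerate(pattern, 1):
            cur.append(min(prev[i - 1] + (p != ch),  # match or substitution
                           prev[i] + 1,              # extra character in text
                           cur[i - 1] + 1))          # pattern character missing
        best = min(best, cur[-1])
        prev = cur
    return best


def ems_bruteforce(seqs, l, d, alphabet="ACGT"):
    """All l-mers within edit distance d of a substring of every input string.

    Exhaustive over len(alphabet)**l candidates; for tiny instances only.
    """
    motifs = []
    for cand in product(alphabet, repeat=l):
        m = "".join(cand)
        if all(min_edit_distance_to_substring(m, s) <= d for s in seqs):
            motifs.append(m)
    return motifs
```

With `seqs = ["ACGT", "AGT"]`, the candidate `"ACG"` is reported for d = 1: it occurs exactly in the first string and matches the substring `AG` of the second after one deletion.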
A novel swarm intelligence algorithm for finding DNA motifs.

PubMed

Lei, Chengwei; Ruan, Jianhua

2009-01-01

Discovering DNA motifs from co-expressed or co-regulated genes is an important step towards deciphering complex gene regulatory networks and understanding gene functions. Despite significant improvement in the last decade, it still remains one of the most challenging problems in computational molecular biology. In this work, we propose a novel motif finding algorithm that finds consensus patterns using a population-based stochastic optimisation technique called Particle Swarm Optimisation (PSO), which has been shown to be effective in optimising difficult multidimensional problems in continuous domains. We propose to use a word dissimilarity graph to remap the neighborhood structure of the solution space of DNA motifs, and propose a modification of the naive PSO algorithm to accommodate discrete variables. In order to improve efficiency, we also propose several strategies for escaping from local optima and for automatically determining the termination criteria. Experimental results on simulated challenge problems show that our method is both more efficient and more accurate than several existing algorithms. Applications to several sets of real promoter sequences also show that our approach is able to detect known transcription factor binding sites, and outperforms two of the most popular existing algorithms.
STEME: A Robust, Accurate Motif Finder for Large Data Sets

PubMed Central

Reid, John E.; Wernisch, Lorenz

2014-01-01

Motif finding is a difficult problem that has been studied for over 20 years. Some older popular motif finders are not suitable for analysis of the large data sets generated by next-generation sequencing. We recently published an efficient approximation (STEME) to the EM algorithm that is at the core of many motif finders such as MEME. This approximation allows the EM algorithm to be applied to large data sets. In this work we describe several efficient extensions to STEME that are based on the MEME algorithm. Together with the original STEME EM approximation, these extensions make STEME a fully-fledged motif finder with similar properties to MEME. We discuss the difficulty of objectively comparing motif finders. We show that STEME performs comparably to existing prominent discriminative motif finders, DREME and Trawler, on 13 sets of transcription factor binding data in mouse ES cells. We demonstrate the ability of STEME to find long degenerate motifs which these discriminative motif finders do not find. As part of our method, we extend an earlier method due to Nagarajan et al. for the efficient calculation of motif E-values. STEME's source code is available under an open source license and STEME is available via a web interface. PMID:24625410
DNA motif alignment by evolving a population of Markov chains.

PubMed

Bi, Chengpeng

2009-01-30

Deciphering cis-regulatory elements or de novo motif-finding in genomes still remains elusive although much algorithmic effort has been expended. Markov chain Monte Carlo (MCMC) methods such as Gibbs motif samplers have been widely employed to solve the de novo motif-finding problem through sequence local alignment. Nonetheless, the MCMC-based motif samplers still suffer from local maxima like EM. Therefore, as a prerequisite for finding good local alignments, these motif algorithms are often independently run a multitude of times, but without information exchange between different chains. A new algorithm design that enables such information exchange would therefore be worthwhile. This paper presents a novel motif-finding algorithm by evolving a population of Markov chains with information exchange (PMC), each of which is initialized as a random alignment and run by the Metropolis-Hastings sampler (MHS). It is progressively updated through a series of local alignments stochastically sampled. Explicitly, the PMC motif algorithm performs stochastic sampling as specified by a population-based proposal distribution rather than individual ones, and adaptively evolves the population as a whole towards a global maximum. The alignment information exchange is accomplished by taking advantage of the pooled motif site distributions. A distinct method for running multiple independent Markov chains (IMC) without information exchange, dubbed the IMC motif algorithm, is also devised to compare with its PMC counterpart. Experimental studies demonstrate that the performance could be improved if pooled information were used to run a population of motif samplers. The new PMC algorithm was able to improve the convergence and outperformed other popular algorithms tested using simulated and biological motif sequences.
A flexible motif search technique based on generalized profiles.

PubMed

Bucher, P; Karplus, K; Moeri, N; Hofmann, K

1996-03-01

A flexible motif search technique is presented which has two major components: (1) a generalized profile syntax serving as a motif definition language; and (2) a motif search method specifically adapted to the problem of finding multiple instances of a motif in the same sequence. The new profile structure, which is the core of the generalized profile syntax, combines the functions of a variety of motif descriptors implemented in other methods, including regular expression-like patterns, weight matrices, previously used profiles, and certain types of hidden Markov models (HMMs). The relationship between generalized profiles and other biomolecular motif descriptors is analyzed in detail, with special attention to HMMs. Generalized profiles are shown to be equivalent to a particular class of HMMs, and conversion procedures in both directions are given. The conversion procedures provide an interpretation for local alignment in the framework of stochastic models, allowing for clear, simple significance tests. A mathematical statement of the motif search problem defines the new method exactly without linking it to a specific algorithmic solution. Part of the definition includes a new definition of disjointness of alignments.
Simultaneously learning DNA motif along with its position and sequence rank preferences through expectation maximization algorithm.

PubMed

Zhang, ZhiZhuo; Chang, Cheng Wei; Hugo, Willy; Cheung, Edwin; Sung, Wing-Kin

2013-03-01

Although de novo motifs can be discovered through mining over-represented sequence patterns, this approach misses some real motifs and generates many false positives. To improve accuracy, one solution is to consider some additional binding features (i.e., position preference and sequence rank preference). This information is usually required from the user. This article presents a de novo motif discovery algorithm called SEME (sampling with expectation maximization for motif elicitation), which uses a pure probabilistic mixture model to model the motif's binding features and uses expectation maximization (EM) algorithms to simultaneously learn the sequence motif, position, and sequence rank preferences without asking for any prior knowledge from the user. SEME is both efficient and accurate thanks to two important techniques: the variable motif length extension and importance sampling. Using 75 large-scale synthetic datasets, 32 metazoan compendium benchmark datasets, and 164 chromatin immunoprecipitation sequencing (ChIP-Seq) libraries, we demonstrated the superior performance of SEME over existing programs in finding transcription factor (TF) binding sites. SEME is further applied to a more difficult problem of finding the co-regulated TF (coTF) motifs in 15 ChIP-Seq libraries. It identified significantly more correct coTF motifs and, at the same time, predicted coTF motifs with better matching to the known motifs. Finally, we show that the learned position and sequence rank preferences of each coTF reveal potential interaction mechanisms between the primary TF and the coTF within these sites. Some of these findings were further validated by the ChIP-Seq experiments of the coTFs. The application is available online.
Memetic algorithms for de novo motif-finding in biomedical sequences.

PubMed

Bi, Chengpeng

2012-09-01

The objectives of this study are to design and implement a new memetic algorithm for de novo motif discovery, which is then applied to detect important signals hidden in various biomedical molecular sequences. In this paper, memetic algorithms are developed and tested in de novo motif-finding problems. Several strategies are employed in the algorithm design, not only to explore the multiple sequence local alignment space efficiently but also to uncover the molecular signals effectively. As a result, there are a number of key features in the implementation of the memetic motif-finding algorithm (MaMotif), including a chromosome replacement operator, a chromosome alteration-aware local search operator, a truncated local search strategy, and a stochastic operation of local search imposed on individual learning. To test the new algorithm, we compare MaMotif with a few other similar algorithms using simulated and experimental data including genomic DNA, primary microRNA sequences (let-7 family), and transmembrane protein sequences. The new memetic motif-finding algorithm is successfully implemented in C++, and exhaustively tested with various simulated and real biological sequences. The simulation shows that MaMotif is the most time-efficient algorithm compared with others, that is, it runs 2 times faster than the expectation maximization (EM) method and 16 times faster than the genetic algorithm-based EM hybrid. In both simulated and experimental testing, results show that the new algorithm compares favorably with, or is superior to, other algorithms. Notably, MaMotif is able to successfully discover the transcription factors' binding sites in the chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq) data, correctly uncover the RNA splicing signals in gene expression, and precisely find the highly conserved helix motif in the transmembrane protein sequences, as well as rightly detect the palindromic segments in the primary microRNA sequences. The memetic motif-finding algorithm is effectively designed and implemented, and its applications demonstrate that it is not only time-efficient, but also exhibits excellent performance when compared with other popular algorithms.
BayesMotif: de novo protein sorting motif discovery from impure datasets.

PubMed

Hu, Jianjun; Zhang, Fan

2010-01-18

Protein sorting is the process by which newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals are amino acid sub-sequences usually located at the N-terminals or C-terminals of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals are needed to improve the understanding of protein sorting mechanisms. We formulated the protein sorting motif discovery problem as a classification problem and proposed a Bayesian classifier based algorithm (BayesMotif) for de novo identification of a common type of protein sorting motifs in which a highly conserved anchor is present along with less conserved motif regions. A false positive removal procedure is developed to iteratively remove sequences that are unlikely to contain true motifs so that the algorithm can identify motifs from impure input sequences. Experiments on both implanted motif datasets and real-world datasets showed that the enhanced BayesMotif algorithm can identify anchored sorting motifs from pure or impure protein sequence datasets. It also shows that the false positive removal procedure can help to identify true motifs even when only 20% of the input sequences contain true motif instances. We proposed BayesMotif, a novel Bayesian classification based algorithm for de novo discovery of a special category of anchored protein sorting motifs from impure datasets. Compared to conventional motif discovery algorithms such as MEME, our algorithm can find less-conserved motifs with short highly conserved anchors. Our algorithm also has the advantage of easy incorporation of additional meta-sequence features such as hydrophobicity or charge of the motifs which may help to overcome the limitations of the PWM (position weight matrix) motif model.
Detection of core-periphery structure in networks based on 3-tuple motifs

NASA Astrophysics Data System (ADS)

Ma, Chuang; Xiang, Bing-Bing; Chen, Han-Shuang; Small, Michael; Zhang, Hai-Feng

2018-05-01

Detecting mesoscale structure, such as community structure, is of vital importance for analyzing complex networks. Recently, a new mesoscale structure, core-periphery (CP) structure, has been identified in many real-world systems. In this paper, we propose an effective algorithm for detecting CP structure based on a 3-tuple motif. In this algorithm, we first define a 3-tuple motif in terms of the patterns of edges as well as the property of nodes, and then a motif adjacency matrix is constructed based on the 3-tuple motif. Finally, the problem is converted to find a cluster that minimizes the smallest motif conductance. Our algorithm works well in different CP structures: including single or multiple CP structure, and local or global CP structures. Results on the synthetic and the empirical networks validate the high performance of our method.
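The pipeline outlined in the abstract (define a motif, build a motif adjacency matrix, then look for a node set with low motif conductance) can be illustrated in Python with a simpler stand-in motif. The sketch below uses the plain undirected triangle, not the paper's 3-tuple motif, which also depends on node properties that the abstract does not specify. The function names are hypothetical, and the conductance is evaluated on the motif-weighted graph for one given node set rather than minimized.

```python
def triangle_motif_adjacency(adj):
    """Weight each edge (i, j) by the number of triangles it belongs to.

    adj is a symmetric 0/1 adjacency matrix (list of lists), no self-loops.
    """
    n = len(adj)
    W = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and adj[i][j]:
                # common neighbours of i and j close a triangle over (i, j)
                W[i][j] = sum(1 for k in range(n) if adj[i][k] and adj[k][j])
    return W


def conductance(W, S):
    """Conductance of node set S in the weighted graph W:
    cut(S, rest) / min(vol(S), vol(rest))."""
    S = set(S)
    n = len(W)
    cut = sum(W[i][j] for i in S for j in range(n) if j not in S)
    vol_S = sum(W[i][j] for i in S for j in range(n))
    vol_rest = sum(W[i][j] for i in range(n) if i not in S for j in range(n))
    denom = min(vol_S, vol_rest)
    return cut / denom if denom else float("inf")
```

In a graph made of two triangles joined by a single bridge edge, the bridge belongs to no triangle, so it receives weight 0 and either triangle has conductance 0 in the motif-weighted graph.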
A survey of motif finding Web tools for detecting binding site motifs in ChIP-Seq data

PubMed Central

2014-01-01

ChIP-Seq (chromatin immunoprecipitation sequencing) has provided an advantage for finding motifs, as ChIP-Seq experiments narrow down the motif finding to binding site locations. Recent motif finding tools facilitate motif detection by providing a user-friendly Web interface. In this work, we reviewed nine motif finding Web tools that are capable of detecting binding site motifs in ChIP-Seq data. We showed that each motif finding Web tool has its own advantages for detecting motifs that other tools may not discover. We recommended that users use multiple motif finding Web tools that implement different algorithms for obtaining significant motifs, overlapping similar motifs, and non-overlapping motifs. Finally, we provided our suggestions for future development of motif finding Web tools that better assist researchers in finding motifs in ChIP-Seq data. Reviewers: This article was reviewed by Prof. Sandor Pongor, Dr. Yuriy Gusev, and Dr. Shyam Prabhakar (nominated by Prof. Limsoon Wong). PMID:24555784
ProMotE: an efficient algorithm for counting independent motifs in uncertain network topologies.

PubMed

Ren, Yuanfang; Sarkar, Aisharjya; Kahveci, Tamer

2018-06-26

Identifying motifs in biological networks is essential in uncovering key functions served by these networks. Finding non-overlapping motif instances is however a computationally challenging task. The fact that biological interactions are uncertain events further complicates the problem, as it makes the existence of an embedding of a given motif an uncertain event as well. In this paper, we develop a novel method, ProMotE (Probabilistic Motif Embedding), to count non-overlapping embeddings of a given motif in probabilistic networks. We utilize a polynomial model to capture the uncertainty. We develop three strategies to scale our algorithm to large networks. Our experiments demonstrate that our method scales to large networks in practical time with high accuracy where existing methods fail. Moreover, our experiments on cancer and degenerative disease networks show that our method helps in uncovering key functional characteristics of biological networks.
Exact calculation of distributions on integers, with application to sequence alignment.

PubMed

Newberg, Lee A; Lawrence, Charles E

2009-01-01

Computational biology is replete with high-dimensional discrete prediction and inference problems. Dynamic programming recursions can be applied to several of the most important of these, including sequence alignment, RNA secondary-structure prediction, phylogenetic inference, and motif finding. In these problems, attention is frequently focused on some scalar quantity of interest, a score, such as an alignment score or the free energy of an RNA secondary structure. In many cases, score is naturally defined on integers, such as a count of the number of pairing differences between two sequence alignments, or else an integer score has been adopted for computational reasons, such as in the test of significance of motif scores. The probability distribution of the score under an appropriate probabilistic model is of interest, such as in tests of significance of motif scores, or in calculation of Bayesian confidence limits around an alignment. Here we present three algorithms for calculating the exact distribution of a score of this type; then, in the context of pairwise local sequence alignments, we apply the approach so as to find the alignment score distribution and Bayesian confidence limits.
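One of the cases mentioned above, the significance of an integer motif score, lends itself to a short worked example in Python. The sketch below is not one of the three algorithms of the paper; it is a textbook convolution that builds the exact distribution of an integer position-weight-matrix score under an assumed i.i.d. background, one motif column at a time. The function names and the input layout (a list of per-column dictionaries) are assumptions for illustration.

```python
from collections import defaultdict


def score_distribution(int_pwm, background):
    """Exact distribution of the total integer score of a random site.

    int_pwm:    list of columns, each a dict base -> integer score
    background: dict base -> probability (i.i.d. model, assumed)
    Returns a dict score -> probability.
    """
    dist = {0: 1.0}                      # score 0 with certainty before any column
    for column in int_pwm:
        nxt = defaultdict(float)
        for s, p in dist.items():        # extend every partial score by one base
            for base, w in column.items():
                nxt[s + w] += p * background[base]
        dist = dict(nxt)
    return dist


def p_value(dist, threshold):
    """Probability that a random site scores at least threshold."""
    return sum(p for s, p in dist.items() if s >= threshold)
```

Because scores are integers, the number of distinct partial scores stays small, so the table grows with the score range rather than with the 4^L possible sites of length L.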
De-novo discovery of differentially abundant transcription factor binding sites including their positional preference.

PubMed

Keilwagen, Jens; Grau, Jan; Paponov, Ivan A; Posch, Stefan; Strickert, Marc; Grosse, Ivo

2011-02-10

Transcription factors are a main component of gene regulation as they activate or repress gene expression by binding to specific binding sites in promoters. The de-novo discovery of transcription factor binding sites in target regions obtained by wet-lab experiments is a challenging problem in computational biology, which has not been fully solved yet. Here, we present a de-novo motif discovery tool called Dispom for finding differentially abundant transcription factor binding sites that models existing positional preferences of binding sites and adjusts the length of the motif in the learning process. Evaluating Dispom, we find that its prediction performance is superior to existing tools for de-novo motif discovery for 18 benchmark data sets with planted binding sites, and for a metazoan compendium based on experimental data from micro-array, ChIP-chip, ChIP-DSL, and DamID as well as Gene Ontology data. Finally, we apply Dispom to find binding sites differentially abundant in promoters of auxin-responsive genes extracted from Arabidopsis thaliana microarray data, and we find a motif that can be interpreted as a refined auxin responsive element predominately positioned in the 250-bp region upstream of the transcription start site. Using an independent data set of auxin-responsive genes, we find in genome-wide predictions that the refined motif is more specific for auxin-responsive genes than the canonical auxin-responsive element. In general, Dispom can be used to find differentially abundant motifs in sequences of any origin. However, the positional distribution learned by Dispom is especially beneficial if all sequences are aligned to some anchor point like the transcription start site in case of promoter sequences. We demonstrate that the combination of searching for differentially abundant motifs and inferring a position distribution from the data is beneficial for de-novo motif discovery. 
Hence, we make the tool freely available as a component of the open-source Java framework Jstacs and as a stand-alone application at http://www.jstacs.de/index.php/Dispom.

Motif formation and industry specific topologies in the Japanese business firm network

NASA Astrophysics Data System (ADS)

Maluck, Julian; Donner, Reik V.; Takayasu, Hideki; Takayasu, Misako

2017-05-01

Motifs and roles are basic quantities for the characterization of interactions among 3-node subsets in complex networks. In this work, we investigate how the distribution of 3-node motifs can be influenced by modifying the rules of an evolving network model while keeping the statistics of simpler network characteristics, such as the link density and the degree distribution, invariant. We exemplify this problem for the special case of the Japanese Business Firm Network, where a well-studied and relatively simple yet realistic evolving network model is available, and compare the resulting motif distribution in the real-world and simulated networks. To better approximate the motif distribution of the real-world network in the model, we introduce both subgraph dependent and global additional rules. We find that a specific rule that allows only for the merging process between nodes with similar link directionality patterns reduces the observed excess of densely connected motifs with bidirectional links. Our study improves the mechanistic understanding of motif formation in evolving network models to better describe the characteristic features of real-world networks with a scale-free topology.
Automated Design Framework for Synthetic Biology Exploiting Pareto Optimality.

PubMed

Otero-Muras, Irene; Banga, Julio R

2017-07-21

In this work we consider Pareto optimality for automated design in synthetic biology. We present a generalized framework based on a mixed-integer dynamic optimization formulation that, given design specifications, allows the computation of Pareto optimal sets of designs, that is, the set of best trade-offs for the metrics of interest. We show how this framework can be used for (i) forward design, that is, finding the Pareto optimal set of synthetic designs for implementation, and (ii) reverse design, that is, analyzing and inferring motifs and/or design principles of gene regulatory networks from the Pareto set of optimal circuits. Finally, we illustrate the capabilities and performance of this framework considering four case studies. In the first problem we consider the forward design of an oscillator. In the remaining problems, we illustrate how to apply the reverse design approach to find motifs for stripe formation, rapid adaption, and fold-change detection, respectively.
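As a minimal illustration of the "Pareto optimal set" idea in this record (not the authors' mixed-integer dynamic optimization framework), the sketch below filters candidate designs down to the non-dominated ones. It assumes every metric is to be minimized; the function names and the example metric pairs are hypothetical.

```python
from typing import List, Sequence, Tuple

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every metric and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(designs: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return the designs that no other design dominates, in input order."""
    return [d for d in designs
            if not any(dominates(other, d) for other in designs if other != d)]

# Hypothetical (cost, response_time) pairs for four candidate circuits.
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
front = pareto_front(candidates)  # (3.0, 4.0) is dominated by (2.0, 3.0)
```

The pairwise comparison is quadratic in the number of designs, which is adequate for small candidate sets but would need a smarter scheme for large ones.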
Counting motifs in dynamic networks.

PubMed

Mukherjee, Kingshuk; Hasan, Md Mahmudul; Boucher, Christina; Kahveci, Tamer

2018-04-11

A network motif is a sub-network that occurs frequently in a given network. Detection of such motifs is important since they uncover functions and local properties of the given biological network. Finding motifs is however a computationally challenging task as it requires solving the costly subgraph isomorphism problem. Moreover, the topology of biological networks changes over time. These changing networks are called dynamic biological networks. As the network evolves, the frequency of each motif in the network also changes. Computing the frequency of a given motif from scratch in a dynamic network as the network topology evolves is infeasible, particularly for large and fast evolving networks. In this article, we design and develop a scalable method for counting the number of motifs in a dynamic biological network. Our method incrementally updates the frequency of each motif as the underlying network's topology evolves. Our experiments demonstrate that our method can update the frequency of each motif orders of magnitude faster than counting the motif embeddings every time the network changes. If the network evolves more frequently, the margin by which our method outperforms the existing static methods increases. We evaluated our method extensively using synthetic and real datasets, and show that our method is highly accurate (≥ 96%) and that it can be scaled to large dense networks. The results on real data demonstrate the utility of our method in revealing interesting insights on the evolution of biological processes.
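The incremental idea in this record can be sketched for the simplest possible motif, an undirected triangle. This is only an illustration under that assumption, not the authors' method: when an edge (u, v) is inserted or deleted, the triangle count changes by exactly the number of common neighbours of u and v, so no recount from scratch is needed. The class name `TriangleCounter` is hypothetical.

```python
from collections import defaultdict

class TriangleCounter:
    """Maintains the triangle count of an undirected simple graph under edge updates."""

    def __init__(self):
        self.adj = defaultdict(set)
        self.triangles = 0

    def add_edge(self, u, v):
        if u == v or v in self.adj[u]:
            return  # ignore self-loops and duplicate edges
        # Each common neighbour of u and v closes exactly one new triangle.
        self.triangles += len(self.adj[u] & self.adj[v])
        self.adj[u].add(v)
        self.adj[v].add(u)

    def remove_edge(self, u, v):
        if v not in self.adj[u]:
            return  # edge is not present
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        # Each common neighbour of u and v formed one triangle with the removed edge.
        self.triangles -= len(self.adj[u] & self.adj[v])

counter = TriangleCounter()
for edge in [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]:
    counter.add_edge(*edge)
count_after_inserts = counter.triangles   # triangles {1,2,3} and {2,3,4}
counter.remove_edge(2, 3)                 # both triangles share this edge
count_after_removal = counter.triangles
```

Each update costs one set intersection rather than a full enumeration, which is the kind of saving the abstract describes for larger motifs.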
Rapid search for tertiary fragments reveals protein sequence–structure relationships

PubMed Central

Zhou, Jianfu; Grigoryan, Gevorg

2015-01-01

Finding backbone substructures from the Protein Data Bank that match an arbitrary query structural motif, composed of multiple disjoint segments, is a problem of growing relevance in structure prediction and protein design. Although numerous protein structure search approaches have been proposed, methods that address this specific task without additional restrictions and on practical time scales are generally lacking. Here, we propose a solution, dubbed MASTER, that is both rapid, enabling searches over the Protein Data Bank in a matter of seconds, and provably correct, finding all matches below a user-specified root-mean-square deviation cutoff. We show that despite the potentially exponential time complexity of the problem, running times in practice are modest even for queries with many segments. The ability to explore naturally plausible structural and sequence variations around a given motif has the potential to synthesize its design principles in an automated manner; so we go on to illustrate the utility of MASTER to protein structural biology. We demonstrate its capacity to rapidly establish structure–sequence relationships, uncover the native designability landscapes of tertiary structural motifs, identify structural signatures of binding, and automatically rewire protein topologies. Given the broad utility of protein tertiary fragment searches, we hope that providing MASTER in an open-source format will enable novel advances in understanding, predicting, and designing protein structure. PMID:25420575
RNA motif search with data-driven element ordering.

PubMed

Rampášek, Ladislav; Jimenez, Randi M; Lupták, Andrej; Vinař, Tomáš; Brejová, Broňa

2016-05-18

In this paper, we study the problem of RNA motif search in long genomic sequences. This approach uses a combination of sequence and structure constraints to uncover new distant homologs of known functional RNAs. The problem is NP-hard and is traditionally solved by backtracking algorithms. We have designed a new algorithm for RNA motif search and implemented a new motif search tool RNArobo. The tool enhances the RNAbob descriptor language, allowing insertions in helices, which enables better characterization of ribozymes and aptamers. A typical RNA motif consists of multiple elements and the running time of the algorithm is highly dependent on their ordering. By approaching the element ordering problem in a principled way, we demonstrate more than 100-fold speedup of the search for complex motifs compared to previously published tools. We have developed a new method for RNA motif search that allows for a significant speedup of the search of complex motifs that include pseudoknots. Such speed improvements are crucial at a time when the rate of DNA sequencing outpaces growth in computing. RNArobo is available at http://compbio.fmph.uniba.sk/rnarobo .
Informative priors based on transcription factor structural class improve de novo motif discovery.

PubMed

Narlikar, Leelavati; Gordân, Raluca; Ohler, Uwe; Hartemink, Alexander J

2006-07-15

An important problem in molecular biology is to identify the locations at which a transcription factor (TF) binds to DNA, given a set of DNA sequences believed to be bound by that TF. In previous work, we showed that information in the DNA sequence of a binding site is sufficient to predict the structural class of the TF that binds it. In particular, this suggests that we can predict which locations in any DNA sequence are more likely to be bound by certain classes of TFs than others. Here, we argue that traditional methods for de novo motif finding can be significantly improved by adopting an informative prior probability that a TF binding site occurs at each sequence location. To demonstrate the utility of such an approach, we present priority, a powerful new de novo motif finding algorithm. Using data from TRANSFAC, we train three classifiers to recognize binding sites of basic leucine zipper, forkhead, and basic helix loop helix TFs. These classifiers are used to equip priority with three class-specific priors, in addition to a default prior to handle TFs of other classes. We apply priority and a number of popular motif finding programs to sets of yeast intergenic regions that are reported by ChIP-chip to be bound by particular TFs. priority identifies motifs the other methods fail to identify, and correctly predicts the structural class of the TF recognizing the identified binding sites. Supplementary material and code can be found at http://www.cs.duke.edu/~amink/.
PhyloGibbs-MP: Module Prediction and Discriminative Motif-Finding by Gibbs Sampling

PubMed Central

Siddharthan, Rahul

2008-01-01

PhyloGibbs, our recent Gibbs-sampling motif-finder, takes phylogeny into account in detecting binding sites for transcription factors in DNA and assigns posterior probabilities to its predictions obtained by sampling the entire configuration space. Here, in an extension called PhyloGibbs-MP, we widen the scope of the program, addressing two major problems in computational regulatory genomics. First, PhyloGibbs-MP can localise predictions to small, undetermined regions of a large input sequence, thus effectively predicting cis-regulatory modules (CRMs) ab initio while simultaneously predicting binding sites in those modules—tasks that are usually done by two separate programs. PhyloGibbs-MP's performance at such ab initio CRM prediction is comparable with or superior to dedicated module-prediction software that use prior knowledge of previously characterised transcription factors. Second, PhyloGibbs-MP can predict motifs that differentiate between two (or more) different groups of regulatory regions, that is, motifs that occur preferentially in one group over the others. While other “discriminative motif-finders” have been published in the literature, PhyloGibbs-MP's implementation has some unique features and flexibility. Benchmarks on synthetic and actual genomic data show that this algorithm is successful at enhancing predictions of differentiating sites and suppressing predictions of common sites and compares with or outperforms other discriminative motif-finders on actual genomic data. Additional enhancements include significant performance and speed improvements, the ability to use “informative priors” on known transcription factors, and the ability to output annotations in a format that can be visualised with the Generic Genome Browser. In stand-alone motif-finding, PhyloGibbs-MP remains competitive, outperforming PhyloGibbs-1.0 and other programs on benchmark data. PMID:18769735
Discriminative motif discovery via simulated evolution and random under-sampling.

PubMed

Song, Tao; Gu, Hong

2014-01-01

Conserved motifs in biological sequences are closely related to their structure and functions. Recently, discriminative motif discovery methods have attracted more and more attention. However, little attention has been devoted to the data imbalance problem, which is one of the main reasons affecting the performance of the discriminative models. In this article, a simulated evolution method is applied to solve the multi-class imbalance problem at the stage of data preprocessing, and at the stage of Hidden Markov Models (HMMs) training, a random under-sampling method is introduced for the imbalance between the positive and negative datasets. It is shown that, in the task of discovering targeting motifs of nine subcellular compartments, the motifs found by our method are more conserved than the methods without considering data imbalance problem and recover the most known targeting motifs from Minimotif Miner and InterPro. Meanwhile, we use the found motifs to predict protein subcellular localization and achieve higher prediction precision and recall for the minority classes.
Using SCOPE to identify potential regulatory motifs in coregulated genes.

PubMed

Martyanov, Viktor; Gross, Robert H

2011-05-31

SCOPE is an ensemble motif finder that uses three component algorithms in parallel to identify potential regulatory motifs by over-representation and motif position preference. Each component algorithm is optimized to find a different kind of motif. By taking the best of these three approaches, SCOPE performs better than any single algorithm, even in the presence of noisy data. In this article, we utilize a web version of SCOPE to examine genes that are involved in telomere maintenance. SCOPE has been incorporated into at least two other motif finding programs and has been used in other studies. The three algorithms that comprise SCOPE are BEAM, which finds non-degenerate motifs (ACCGGT), PRISM, which finds degenerate motifs (ASCGWT), and SPACER, which finds longer bipartite motifs (ACCnnnnnnnnGGT). These three algorithms have been optimized to find their corresponding type of motif. Together, they allow SCOPE to perform extremely well. Once a gene set has been analyzed and candidate motifs identified, SCOPE can look for other genes that contain the motif which, when added to the original set, will improve the motif score. This can occur through over-representation or motif position preference. Working with partial gene sets that have biologically verified transcription factor binding sites, SCOPE was able to identify most of the rest of the genes also regulated by the given transcription factor. Output from SCOPE shows candidate motifs, their significance, and other information both as a table and as a graphical motif map. FAQs and video tutorials are available at the SCOPE web site which also includes a "Sample Search" button that allows the user to perform a trial run. SCOPE has a very friendly user interface that enables novice users to access the algorithm's full power without having to become an expert in the bioinformatics of motif finding. As input, SCOPE can take a list of genes, or FASTA sequences. These can be entered in browser text fields, or read from a file. The output from SCOPE contains a list of all identified motifs with their scores, number of occurrences, fraction of genes containing the motif, and the algorithm used to identify the motif. For each motif, result details include a consensus representation of the motif, a sequence logo, a position weight matrix, and a list of instances for every motif occurrence (with exact positions and "strand" indicated). Results are returned in a browser window and also optionally by email. Previous papers describe the SCOPE algorithms in detail.
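The three motif classes named in this record (non-degenerate ACCGGT, degenerate ASCGWT, bipartite ACCnnnnnnnnGGT) can all be written in the standard IUPAC nucleotide alphabet. The sketch below is not part of SCOPE; it only shows, under the assumption of forward-strand matching on plain A/C/G/T sequences, how such consensus strings can be expanded into regular expressions and located. The function names and the example sequence are hypothetical.

```python
import re

# Standard IUPAC nucleotide ambiguity codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def motif_to_regex(motif: str) -> str:
    """Translate an IUPAC consensus string into a regex body."""
    return "".join(IUPAC[base] for base in motif.upper())

def find_sites(motif: str, sequence: str) -> list:
    """Return 0-based start positions of all matches, including overlapping ones."""
    lookahead = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() for m in lookahead.finditer(sequence.upper())]

# Hypothetical sequence assembled from one instance of each motif class.
sequence = "TT" + "ACCGGT" + "AA" + "AGCGAT" + "ACC" + "TTTTTTTT" + "GGT"
exact_sites = find_sites("ACCGGT", sequence)
degenerate_sites = find_sites("ASCGWT", sequence)
bipartite_sites = find_sites("ACCnnnnnnnnGGT", sequence)
```

Wrapping the pattern in a lookahead reports overlapping occurrences, which a plain `finditer` on the pattern itself would skip.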
Mapping the distribution of packing topologies within protein interiors shows predominant preference for specific packing motifs

PubMed Central

2011-01-01

Background: Mapping protein primary sequences to their three-dimensional folds, referred to as the 'second genetic code', remains an unsolved scientific problem. A crucial part of the problem concerns the geometrical specificity in side chain association leading to densely packed protein cores, a hallmark of correctly folded native structures. Thus, any model of packing within proteins should constitute an indispensable component of protein folding and design. Results: In this study an attempt has been made to find, characterize and classify recurring patterns in the packing of side chain atoms within a protein which sustains its native fold. The interaction of side chain atoms within the protein core has been represented as a contact network based on the surface complementarity and overlap between associating side chain surfaces. Some network topologies definitely appear to be preferred and they have been termed 'packing motifs', analogous to super secondary structures in proteins. Study of the distribution of these motifs reveals the ubiquitous presence of typical smaller graphs, which appear to get linked or coalesce to give larger graphs, reminiscent of the nucleation-condensation model in protein folding. One such frequently occurring motif, also envisaged as the unit of clustering, the three residue clique, was invariably found in regions of dense packing. Finally, topological measures based on surface contact networks appeared to be effective in discriminating sequences native to a specific fold amongst a set of decoys. Conclusions: Out of innumerable topological possibilities, only a finite number of specific packing motifs are actually realized in proteins. This small number of motifs could serve as a basis set in the construction of larger networks. Of these, the triplet clique exhibits distinct preference both in terms of composition and geometry. PMID:21605466
Identification of Predictive Cis-Regulatory Elements Using a Discriminative Objective Function and a Dynamic Search Space

PubMed Central

Karnik, Rahul; Beer, Michael A.

2015-01-01

The generation of genomic binding or accessibility data from massively parallel sequencing technologies such as ChIP-seq and DNase-seq continues to accelerate. Yet state-of-the-art computational approaches for the identification of DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called MotifSpec, designed to find predictive motifs, in contrast to over-represented sequence elements. The key distinguishing feature of this algorithm is that it uses a dynamic search space and a learned threshold to find discriminative motifs in combination with the modeling of motifs using a full PWM (position weight matrix) rather than k-mer words or regular expressions. We demonstrate that our approach finds motifs corresponding to known binding specificities in several mammalian ChIP-seq datasets, and that our PWMs classify the ChIP-seq signals with accuracy comparable to, or marginally better than motifs from the best existing algorithms. In other datasets, our algorithm identifies novel motifs where other methods fail. Finally, we apply this algorithm to detect motifs from expression datasets in C. elegans using a dynamic expression similarity metric rather than fixed expression clusters, and find novel predictive motifs. PMID:26465884
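To make the contrast between a full PWM and k-mer words concrete, the sketch below scores sequence windows against a log-odds position weight matrix. It is not MotifSpec: the count matrix, the pseudocount, the uniform background of 0.25 and the function names are all assumptions chosen for illustration, and only the forward strand of A/C/G/T sequences is handled.

```python
import math

BASES = "ACGT"

def counts_to_log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def best_hit(pwm, sequence):
    """Return (score, start) of the best-scoring window, or None if the sequence is too short."""
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(pwm, window))
        if best is None or score > best[0]:
            best = (score, start)
    return best

# Hypothetical counts from eight aligned sites resembling TGAC.
site_counts = [{"T": 8}, {"G": 8}, {"A": 7, "C": 1}, {"C": 8}]
pwm = counts_to_log_odds(site_counts)
best_score, best_start = best_hit(pwm, "AATGACGT")
```

A sequence would then be called positive when its best window score exceeds a threshold; the abstract states that MotifSpec learns such a threshold, but how it does so is not shown here.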
SVM2Motif—Reconstructing Overlapping DNA Sequence Motifs by Mimicking an SVM Predictor

PubMed Central

Vidovic, Marina M. -C.; Görnitz, Nico; Müller, Klaus-Robert; Rätsch, Gunnar; Kloft, Marius

2015-01-01

Identifying discriminative motifs underlying the functionality and evolution of organisms is a major challenge in computational biology. Machine learning approaches such as support vector machines (SVMs) achieve state-of-the-art performances in genomic discrimination tasks, but—due to its black-box character—motifs underlying its decision function are largely unknown. As a remedy, positional oligomer importance matrices (POIMs) allow us to visualize the significance of position-specific subsequences. Although being a major step towards the explanation of trained SVM models, they suffer from the fact that their size grows exponentially in the length of the motif, which renders their manual inspection feasible only for comparably small motif sizes, typically k ≤ 5. In this work, we extend the work on positional oligomer importance matrices, by presenting a new machine-learning methodology, entitled motifPOIM, to extract the truly relevant motifs—regardless of their length and complexity—underlying the predictions of a trained SVM model. Our framework thereby considers the motifs as free parameters in a probabilistic model, a task which can be phrased as a non-convex optimization problem. The exponential dependence of the POIM size on the oligomer length poses a major numerical challenge, which we address by an efficient optimization framework that allows us to find possibly overlapping motifs consisting of up to hundreds of nucleotides. We demonstrate the efficacy of our approach on a synthetic data set as well as a real-world human splice site data set. PMID:26690911
KIRMES: kernel-based identification of regulatory modules in euchromatic sequences.

PubMed

Schultheiss, Sebastian J; Busch, Wolfgang; Lohmann, Jan U; Kohlbacher, Oliver; Rätsch, Gunnar

2009-08-15

Understanding transcriptional regulation is one of the main challenges in computational biology. An important problem is the identification of transcription factor (TF) binding sites in promoter regions of potential TF target genes. It is typically approached by position weight matrix-based motif identification algorithms using Gibbs sampling, or heuristics to extend seed oligos. Such algorithms succeed in identifying single, relatively well-conserved binding sites, but tend to fail when it comes to the identification of combinations of several degenerate binding sites, as those often found in cis-regulatory modules. We propose a new algorithm that combines the benefits of existing motif finding with the ones of support vector machines (SVMs) to find degenerate motifs in order to improve the modeling of regulatory modules. In experiments on microarray data from Arabidopsis thaliana, we were able to show that the newly developed strategy significantly improves the recognition of TF targets. The python source code (open source-licensed under GPL), the data for the experiments and a Galaxy-based web service are available at http://www.fml.mpg.de/raetsch/suppl/kirmes/.
A study on the application of topic models to motif finding algorithms.

PubMed

Basha Gutierrez, Josep; Nakai, Kenta

2016-12-22

Topic models are statistical algorithms which try to discover the structure of a set of documents according to the abstract topics contained in them. Here we try to apply this approach to the discovery of the structure of the transcription factor binding sites (TFBS) contained in a set of biological sequences, which is a fundamental problem in molecular biology research for the understanding of transcriptional regulation. Here we present two methods that make use of topic models for motif finding. First, we developed an algorithm in which first a set of biological sequences are treated as text documents, and the k-mers contained in them as words, to then build a correlated topic model (CTM) and iteratively reduce its perplexity. We also used the perplexity measurement of CTMs to improve our previous algorithm based on a genetic algorithm and several statistical coefficients. The algorithms were tested with 56 data sets from four different species and compared to 14 other methods by the use of several coefficients both at nucleotide and site level. The results of our first approach showed a performance comparable to the other methods studied, especially at site level and in sensitivity scores, in which it scored better than any of the 14 existing tools. In the case of our previous algorithm, the new approach with the addition of the perplexity measurement clearly outperformed all of the other methods in sensitivity, both at nucleotide and site level, and in overall performance at site level. The statistics obtained show that the performance of a motif finding method based on the use of a CTM is satisfying enough to conclude that the application of topic models is a valid method for developing motif finding algorithms. Moreover, the addition of topic models to a previously developed method dramatically increased its performance, suggesting that this combined algorithm can be a useful tool to successfully predict motifs in different kinds of sets of DNA sequences.
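The first step described in this record, treating sequences as documents and their k-mers as words, can be sketched as follows. Only the bag-of-words construction is shown; fitting a correlated topic model and tracking its perplexity are outside this sketch, and the function names and toy sequences are hypothetical.

```python
from collections import Counter

def kmer_document(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in one sequence (one 'document')."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def build_corpus(sequences, k):
    """Return (vocabulary, matrix), where matrix[d][w] is the count of word w in document d."""
    docs = [kmer_document(s, k) for s in sequences]
    vocabulary = sorted(set().union(*docs))
    matrix = [[doc[word] for word in vocabulary] for doc in docs]
    return vocabulary, matrix

vocabulary, matrix = build_corpus(["ACGACG", "TTTACG"], k=3)
```

The resulting document-term matrix is the usual input format for topic-model libraries.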
Biological network motif detection and evaluation

PubMed Central

2011-01-01

Background: Molecular-level biological data can be assembled into system-level data in the form of biological networks. Network motifs are defined as over-represented small connected subgraphs in networks and they have been used for many biological applications. Since network motif discovery involves computationally challenging processes, previous algorithms have focused on computational efficiency. However, we believe that the biological quality of network motifs is also very important. Results: We define biological network motifs as biologically significant subgraphs, and traditional network motifs are differentiated as structural network motifs in this paper. We develop five algorithms, namely, EDGEGO-BNM, EDGEBETWEENNESS-BNM, NMF-BNM, NMFGO-BNM and VOLTAGE-BNM, for efficient detection of biological network motifs, and introduce several evaluation measures including motifs included in complex, motifs included in functional module and GO term clustering score in this paper. Experimental results show that EDGEGO-BNM and EDGEBETWEENNESS-BNM perform better than existing algorithms and all of our algorithms are applicable to find structural network motifs as well. Conclusion: We provide new approaches to finding network motifs in biological networks. Our algorithms efficiently detect biological network motifs and further improve existing algorithms to find high quality structural network motifs, which would be impossible using existing algorithms. The performances of the algorithms are compared based on our new evaluation measures in biological contexts. We believe that our work gives some guidelines of network motifs research for the biological networks. PMID:22784624
BEAM web server: a tool for structural RNA motif discovery.

PubMed

Pietrosanto, Marco; Adinolfi, Marta; Casula, Riccardo; Ausiello, Gabriele; Ferrè, Fabrizio; Helmer-Citterich, Manuela

2018-03-15

RNA structural motif finding is a relevant problem that becomes computationally hard when working on high-throughput data (e.g. eCLIP, PAR-CLIP), often represented by thousands of RNA molecules. Currently, the BEAM server is the only web tool capable of handling tens of thousands of RNAs as input with a motif discovery procedure that is only limited by the current secondary structure prediction accuracies. The recently developed method BEAM (BEAr Motifs finder) can analyze tens of thousands of RNA molecules and identify RNA secondary structure motifs associated with a measure of their statistical significance. BEAM is extremely fast thanks to the BEAR encoding that transforms each RNA secondary structure into a string of characters. BEAM also exploits the evolutionary knowledge contained in a substitution matrix of secondary structure elements, extracted from the RFAM database of families of homologous RNAs. The BEAM web server has been designed to streamline data pre-processing by automatically handling folding and encoding of RNA sequences, giving users a choice for the preferred folding program. The server provides an intuitive and informative results page with the list of secondary structure motifs identified, the logo of each motif, its significance, graphic representation and information about its position in the RNA molecules sharing it. The web server is freely available at http://beam.uniroma2.it/ and it is implemented in NodeJS and Python with all major browsers supported. Contact: marco.pietrosanto@uniroma2.it. Supplementary information: Supplementary data are available at Bioinformatics online.
QuateXelero: An Accelerated Exact Network Motif Detection Algorithm

PubMed Central

Khakabimamaghani, Sahand; Sharafuddin, Iman; Dichter, Norbert; Koch, Ina; Masoudi-Nejad, Ali

2013-01-01

Finding motifs in biological, social, technological, and other types of networks has become a widespread method to gain more knowledge about these networks’ structure and function. However, this task is very computationally demanding, because it is closely tied to graph isomorphism, a problem in NP that is not yet known to be in P or to be NP-complete. Accordingly, this research endeavors to decrease the need to call the NAUTY isomorphism detection method, which is the most time-consuming step in many existing algorithms. The work provides an extremely fast motif detection algorithm called QuateXelero, which has a Quaternary Tree data structure at its heart. The proposed algorithm is based on the well-known ESU (FANMOD) motif detection algorithm. The results of experiments on some standard model networks confirm the overall superiority of the proposed algorithm, namely QuateXelero, compared with two of the fastest existing algorithms, G-Tries and Kavosh. QuateXelero is especially fast at constructing the central data structure of the algorithm from scratch based on the input network. PMID:23874498
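The ESU enumeration that this record names as its starting point can be sketched compactly. The code below covers only the enumeration step for an undirected graph with comparable vertex labels; the isomorphism classification (NAUTY, or the quaternary tree that QuateXelero introduces) is not shown, and the function name `esu` and the example graph are hypothetical.

```python
def esu(adj, k):
    """Yield each connected k-vertex subgraph (as a frozenset of vertices) exactly once.

    adj maps every vertex to the set of its neighbours (undirected graph).
    """
    def extend(sub, ext, root):
        if len(sub) == k:
            yield frozenset(sub)
            return
        ext = set(ext)  # work on a copy so the caller's set is untouched
        while ext:
            w = ext.pop()
            # Vertices already in the subgraph or adjacent to it.
            closed = sub | set().union(*(adj[u] for u in sub))
            # Add only the exclusive neighbours of w whose label exceeds the root's.
            new_ext = ext | {u for u in adj[w] if u > root and u not in closed}
            yield from extend(sub | {w}, new_ext, root)

    for v in adj:
        yield from extend({v}, {u for u in adj[v] if u > v}, v)

# Hypothetical graph: a triangle 1-2-3 with a pendant vertex 4 attached to 3.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
found = list(esu(graph, 3))
```

Restricting extensions to labels larger than the root and to exclusive neighbours is what guarantees that no vertex set is reported twice.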
Searching RNA motifs and their intermolecular contacts with constraint networks.

PubMed

Thébault, P; de Givry, S; Schiex, T; Gaspin, C

2006-09-01

Searching RNA gene occurrences in genomic sequences is a task whose importance has been renewed by the recent discovery of numerous functional RNAs, often interacting with other ligands. Even if several programs exist for RNA motif search, none exists that can represent and solve the problem of searching for occurrences of RNA motifs in interaction with other molecules. We present a constraint network formulation of this problem. RNAs are represented as structured motifs that can occur on more than one sequence and which are related together by possible hybridization. The implemented tool MilPat is used to search for several sRNA families in genomic sequences. Results show that MilPat can efficiently search for interacting motifs in large genomic sequences and offers a simple and extensible framework to solve such problems. New and known sRNAs are identified as H/ACA candidates in Methanocaldococcus jannaschii. http://carlit.toulouse.inra.fr/MilPaT/MilPat.pl.
DMINDA: an integrated web server for DNA motif identification and analyses

PubMed Central

Ma, Qin; Zhang, Hanyuan; Mao, Xizeng; Zhou, Chuan; Liu, Bingqiang; Chen, Xin; Xu, Ying

2014-01-01

DMINDA (DNA motif identification and analyses) is an integrated web server for DNA motif identification and analyses, which is accessible at http://csbl.bmb.uga.edu/DMINDA/. This web site is freely available to all users and there is no login requirement. This server provides a suite of cis-regulatory motif analysis functions on DNA sequences, which are important to elucidation of the mechanisms of transcriptional regulation: (i) de novo motif finding for a given set of promoter sequences along with statistical scores for the predicted motifs derived based on information extracted from a control set, (ii) scanning motif instances of a query motif in provided genomic sequences, (iii) motif comparison and clustering of identified motifs, and (iv) co-occurrence analyses of query motifs in given promoter sequences. The server is powered by a backend computer cluster with over 150 computing nodes, and is particularly useful for motif prediction and analyses in prokaryotic genomes. We believe that DMINDA, as a new and comprehensive web server for cis-regulatory motif finding and analyses, will benefit the genomic research community in general and prokaryotic genome researchers in particular. PMID:24753419

A private DNA motif finding algorithm.

PubMed

Chen, Rui; Peng, Yun; Choi, Byron; Xu, Jianliang; Hu, Haibo

2014-08-01

With the increasing availability of genomic sequence data, numerous methods have been proposed for finding DNA motifs. The discovery of DNA motifs serves as a critical step in many biological applications. However, the privacy implications of DNA analysis are normally neglected in the existing methods. In this work, we propose a private DNA motif finding algorithm in which a DNA owner's privacy is protected by a rigorous privacy model, known as ε-differential privacy. It provides provable privacy guarantees that are independent of adversaries' background knowledge. Our algorithm makes use of the n-gram model and is optimized for processing large-scale DNA sequences. We evaluate the performance of our algorithm over real-life genomic data and demonstrate the promise of integrating privacy into DNA motif finding. Copyright © 2014 Elsevier Inc. All rights reserved.
Discovery of phosphorylation motif mixtures in phosphoproteomics data

PubMed Central

Ritz, Anna; Shakhnarovich, Gregory; Salomon, Arthur R.; Raphael, Benjamin J.

2009-01-01

Motivation: Modification of proteins via phosphorylation is a primary mechanism for signal transduction in cells. Phosphorylation sites on proteins are determined in part through particular patterns, or motifs, present in the amino acid sequence. Results: We describe an algorithm that simultaneously discovers multiple motifs in a set of peptides that were phosphorylated by several different kinases. Such sets of peptides are routinely produced in proteomics experiments. Our motif-finding algorithm uses the principle of minimum description length to determine a mixture of sequence motifs that distinguish a foreground set of phosphopeptides from a background set of unphosphorylated peptides. We show that our algorithm outperforms existing motif-finding algorithms on synthetic datasets consisting of mixtures of known phosphorylation sites. We also derive a motif specificity score that quantifies whether or not the phosphoproteins containing an instance of a motif have a significant number of known interactions. Application of our motif-finding algorithm to recently published human and mouse proteomic studies recovers several known phosphorylation motifs and reveals a number of novel motifs that are enriched for interactions with a particular kinase or phosphatase. Our tools provide a new approach for uncovering the sequence specificities of uncharacterized kinases or phosphatases. Availability: Software is available at http://cs.brown.edu/people/braphael/software.html. Contact: aritz@cs.brown.edu; braphael@cs.brown.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:18996944
Triadic motifs in the dependence networks of virtual societies.

PubMed

Xie, Wen-Jie; Li, Ming-Xia; Jiang, Zhi-Qiang; Zhou, Wei-Xing

2014-06-10

In friendship networks, individuals have different numbers of friends, and the closeness or intimacy between an individual and her friends is heterogeneous. Using a statistical filtering method to identify relationships about who depends on whom, we construct dependence networks (which are directed) from weighted friendship networks of avatars in more than two hundred virtual societies of a massively multiplayer online role-playing game (MMORPG). We investigate the evolution of triadic motifs in dependence networks. Several metrics show that the virtual societies evolved through a transient stage in the first two to three weeks and reached a relatively stable stage. We find that the unidirectional loop motif (M9) is underrepresented and does not appear, open motifs are also underrepresented, while other close motifs are overrepresented. We also find that, for most motifs, the overall level difference of the three avatars in the same motif is significantly lower than average, whereas the sum of ranks is only slightly larger than average. Our findings show that avatars' social status plays an important role in the formation of triadic motifs.
Triadic motifs in the dependence networks of virtual societies

NASA Astrophysics Data System (ADS)

Xie, Wen-Jie; Li, Ming-Xia; Jiang, Zhi-Qiang; Zhou, Wei-Xing

2014-06-01

In friendship networks, individuals have different numbers of friends, and the closeness or intimacy between an individual and her friends is heterogeneous. Using a statistical filtering method to identify relationships about who depends on whom, we construct dependence networks (which are directed) from weighted friendship networks of avatars in more than two hundred virtual societies of a massively multiplayer online role-playing game (MMORPG). We investigate the evolution of triadic motifs in dependence networks. Several metrics show that the virtual societies evolved through a transient stage in the first two to three weeks and reached a relatively stable stage. We find that the unidirectional loop motif (M9) is underrepresented and does not appear, open motifs are also underrepresented, while other close motifs are overrepresented. We also find that, for most motifs, the overall level difference of the three avatars in the same motif is significantly lower than average, whereas the sum of ranks is only slightly larger than average. Our findings show that avatars' social status plays an important role in the formation of triadic motifs.
Triadic motifs in the dependence networks of virtual societies

PubMed Central

Xie, Wen-Jie; Li, Ming-Xia; Jiang, Zhi-Qiang; Zhou, Wei-Xing

2014-01-01

In friendship networks, individuals have different numbers of friends, and the closeness or intimacy between an individual and her friends is heterogeneous. Using a statistical filtering method to identify relationships about who depends on whom, we construct dependence networks (which are directed) from weighted friendship networks of avatars in more than two hundred virtual societies of a massively multiplayer online role-playing game (MMORPG). We investigate the evolution of triadic motifs in dependence networks. Several metrics show that the virtual societies evolved through a transient stage in the first two to three weeks and reached a relatively stable stage. We find that the unidirectional loop motif (M9) is underrepresented and does not appear, open motifs are also underrepresented, while other close motifs are overrepresented. We also find that, for most motifs, the overall level difference of the three avatars in the same motif is significantly lower than average, whereas the sum of ranks is only slightly larger than average. Our findings show that avatars' social status plays an important role in the formation of triadic motifs. PMID:24912755
DMINDA: an integrated web server for DNA motif identification and analyses.

PubMed

Ma, Qin; Zhang, Hanyuan; Mao, Xizeng; Zhou, Chuan; Liu, Bingqiang; Chen, Xin; Xu, Ying

2014-07-01

DMINDA (DNA motif identification and analyses) is an integrated web server for DNA motif identification and analyses, which is accessible at http://csbl.bmb.uga.edu/DMINDA/. This web site is freely available to all users and there is no login requirement. This server provides a suite of cis-regulatory motif analysis functions on DNA sequences, which are important to elucidation of the mechanisms of transcriptional regulation: (i) de novo motif finding for a given set of promoter sequences along with statistical scores for the predicted motifs derived based on information extracted from a control set, (ii) scanning motif instances of a query motif in provided genomic sequences, (iii) motif comparison and clustering of identified motifs, and (iv) co-occurrence analyses of query motifs in given promoter sequences. The server is powered by a backend computer cluster with over 150 computing nodes, and is particularly useful for motif prediction and analyses in prokaryotic genomes. We believe that DMINDA, as a new and comprehensive web server for cis-regulatory motif finding and analyses, will benefit the genomic research community in general and prokaryotic genome researchers in particular. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
A structural-alphabet-based strategy for finding structural motifs across protein families

PubMed Central

Wu, Chih Yuan; Chen, Yao Chi; Lim, Carmay

2010-01-01

Proteins with insignificant sequence and overall structure similarity may still share locally conserved contiguous structural segments; i.e. structural/3D motifs. Most methods for finding 3D motifs require a known motif to search for other similar structures or functionally/structurally crucial residues. Here, without requiring a query motif or essential residues, a fully automated method for discovering 3D motifs of various sizes across protein families with different folds based on a 16-letter structural alphabet is presented. It was applied to structurally non-redundant proteins bound to DNA, RNA, obligate/non-obligate proteins as well as free DNA-binding proteins (DBPs) and proteins with known structures but unknown function. Its usefulness was illustrated by analyzing the 3D motifs found in DBPs. A non-specific motif was found with a ‘corner’ architecture that confers a stable scaffold and enables diverse interactions, making it suitable for binding not only DNA but also RNA and proteins. Furthermore, DNA-specific motifs present ‘only’ in DBPs were discovered. The motifs found can provide useful guidelines in detecting binding sites and computational protein redesign. PMID:20525797
Finding Hidden Location Patterns of Two Competitive Supermarkets in Thailand

NASA Astrophysics Data System (ADS)

Khumsri, Jinattaporn; Fujihara, Akihiro

There are two famous supermarket chains in Thailand: Big C and Lotus. They are the most competitive supermarkets and hold the largest market share, which they maintain through frequent promotions and by gathering convenience services such as banking and restaurants in their stores. In recent years, they have gradually expanded their stores, and they take a similar strategy to determine where to locate a store. It is important for them to consider store allocation to obtain new customers efficiently. To study this, we gather the geographical locations of these supermarkets from Twitter using the Twitter API. We gathered tweets containing these supermarket names and geotags for seven months. To extract hidden location patterns from the gathered data, we introduce the location motif, a directed subgraph whose edges link each node to its shortest-distance opponent node. We investigate every possible configuration of location motifs with a small number of nodes and find that the number of configurations increases exponentially. We also visualize location motifs generated from the gathered data on a map of Thailand and count the frequency of observed location motifs. As a result, we find that even though the number of possible location motifs increases exponentially as the number of nodes grows, only a limited set of location motifs is observed. Using location motifs, we find evidence of biased store allocation in reality.
Occurrence probability of structured motifs in random sequences.

PubMed

Robin, S; Daudin, J-J; Richard, H; Sagot, M-F; Schbath, S

2002-01-01

The problem of extracting from a set of nucleic acid sequences motifs which may have biological function is more and more important. In this paper, we are interested in particular motifs that may be implicated in the transcription process. These motifs, called structured motifs, are composed of two ordered parts separated by a variable distance and allowing for substitutions. In order to assess their statistical significance, we propose approximations of the probability of occurrences of such a structured motif in a given sequence. An application of our method to evaluate candidate promoters in E. coli and B. subtilis is presented. Simulations show the goodness of the approximations.
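To make the definition of a structured motif concrete, the following minimal Python sketch scans one sequence for two ordered parts separated by a variable gap, allowing substitutions in each part. It only locates occurrences and does not compute the occurrence probabilities studied in the paper. The function names, the per-part substitution limit, and the example boxes are assumptions made for illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_structured_motif(seq, box1, box2, min_gap, max_gap, max_subs):
    """Return (start1, start2) pairs where box1 is followed by box2.

    The gap between the end of box1 and the start of box2 must lie in
    [min_gap, max_gap]; each box may differ from its occurrence in at
    most max_subs positions (an assumed, per-box limit).
    """
    hits = []
    n1, n2 = len(box1), len(box2)
    for i in range(len(seq) - n1 + 1):
        if hamming(seq[i:i + n1], box1) > max_subs:
            continue
        for gap in range(min_gap, max_gap + 1):
            j = i + n1 + gap
            if j + n2 > len(seq):
                break  # larger gaps would run past the end of the sequence
            if hamming(seq[j:j + n2], box2) <= max_subs:
                hits.append((i, j))
    return hits


# Toy sequence: "TTGACA", a 5-base spacer, then "TATAAT".
toy = "CC" + "TTGACA" + "GGGGG" + "TATAAT" + "CC"
print(find_structured_motif(toy, "TTGACA", "TATAAT", 4, 6, 0))
```

The scan is quadratic in the worst case, which is acceptable for short promoter regions but would need indexing for genome-scale input.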
Motif finding in DNA sequences based on skipping nonconserved positions in background Markov chains.

PubMed

Zhao, Xiaoyan; Sze, Sing-Hoi

2011-05-01

One strategy to identify transcription factor binding sites is through motif finding in upstream DNA sequences of potentially co-regulated genes. Despite extensive efforts, none of the existing algorithms perform very well. We consider a string representation that allows arbitrary ignored positions within the nonconserved portion of single motifs, and use O(2^l) Markov chains to model the background distributions of motifs of length l while skipping these positions within each Markov chain. By focusing initially on positions that have fixed nucleotides to define core occurrences, we develop an algorithm to identify motifs of moderate lengths. We compare the performance of our algorithm to other motif finding algorithms on a few benchmark data sets, and show that significant improvement in accuracy can be obtained when the sites are sufficiently conserved within a given sample, while comparable performance is obtained when the site conservation rate is low. A software program (PosMotif) and detailed results are available online at http://faculty.cse.tamu.edu/shsze/posmotif.
A Bioinformatics Approach for Detecting Repetitive Nested Motifs using Pattern Matching.

PubMed

Romero, José R; Carballido, Jessica A; Garbus, Ingrid; Echenique, Viviana C; Ponzoni, Ignacio

2016-01-01

The identification of nested motifs in genomic sequences is a complex computational problem. The detection of these patterns is important to allow the discovery of transposable element (TE) insertions, incomplete reverse transcripts, deletions, and/or mutations. In this study, a de novo strategy for detecting patterns that represent nested motifs was designed based on exhaustive searches for pairs of motifs and combinatorial pattern analysis. These patterns can be grouped into three categories: motifs within other motifs, motifs flanked by other motifs, and motifs of large size. The methodology used in this study, applied to genomic sequences from the plant species Aegilops tauschii and Oryza sativa, revealed that it is possible to identify putative nested TEs by detecting these three types of patterns. The results were validated through BLAST alignments, which revealed the efficacy and usefulness of the new method, which is called Mamushka.
LDsplit: screening for cis-regulatory motifs stimulating meiotic recombination hotspots by analysis of DNA sequence polymorphisms.

PubMed

Yang, Peng; Wu, Min; Guo, Jing; Kwoh, Chee Keong; Przytycka, Teresa M; Zheng, Jie

2014-02-17

As a fundamental genomic element, meiotic recombination hotspot plays important roles in life sciences. Thus uncovering its regulatory mechanisms has broad impact on biomedical research. Despite the recent identification of the zinc finger protein PRDM9 and its 13-mer binding motif as major regulators for meiotic recombination hotspots, other regulators remain to be discovered. Existing methods for finding DNA sequence motifs of recombination hotspots often rely on the enrichment of co-localizations between hotspots and short DNA patterns, which ignore the cross-individual variation of recombination rates and sequence polymorphisms in the population. Our objective in this paper is to capture signals encoded in genetic variations for the discovery of recombination-associated DNA motifs. Recently, an algorithm called "LDsplit" has been designed to detect the association between single nucleotide polymorphisms (SNPs) and proximal meiotic recombination hotspots. The association is measured by the difference of population recombination rates at a hotspot between two alleles of a candidate SNP. Here we present an open source software tool of LDsplit, with integrative data visualization for recombination hotspots and their proximal SNPs. Applying LDsplit on SNPs inside an established 7-mer motif bound by PRDM9 we observed that SNP alleles preserving the original motif tend to have higher recombination rates than the opposite alleles that disrupt the motif. Running on SNP windows around hotspots each containing an occurrence of the 7-mer motif, LDsplit is able to guide the established motif finding algorithm of MEME to recover the 7-mer motif. In contrast, without LDsplit the 7-mer motif could not be identified. LDsplit is a software tool for the discovery of cis-regulatory DNA sequence motifs stimulating meiotic recombination hotspots by screening and narrowing down to hotspot associated SNPs. It is the first computational method that utilizes the genetic variation of recombination hotspots among individuals, opening a new avenue for motif finding. Tested on an established motif and simulated datasets, LDsplit shows promise to discover novel DNA motifs for meiotic recombination hotspots.
LDsplit: screening for cis-regulatory motifs stimulating meiotic recombination hotspots by analysis of DNA sequence polymorphisms

PubMed Central

2014-01-01

Background: As a fundamental genomic element, meiotic recombination hotspot plays important roles in life sciences. Thus uncovering its regulatory mechanisms has broad impact on biomedical research. Despite the recent identification of the zinc finger protein PRDM9 and its 13-mer binding motif as major regulators for meiotic recombination hotspots, other regulators remain to be discovered. Existing methods for finding DNA sequence motifs of recombination hotspots often rely on the enrichment of co-localizations between hotspots and short DNA patterns, which ignore the cross-individual variation of recombination rates and sequence polymorphisms in the population. Our objective in this paper is to capture signals encoded in genetic variations for the discovery of recombination-associated DNA motifs. Results: Recently, an algorithm called “LDsplit” has been designed to detect the association between single nucleotide polymorphisms (SNPs) and proximal meiotic recombination hotspots. The association is measured by the difference of population recombination rates at a hotspot between two alleles of a candidate SNP. Here we present an open source software tool of LDsplit, with integrative data visualization for recombination hotspots and their proximal SNPs. Applying LDsplit on SNPs inside an established 7-mer motif bound by PRDM9 we observed that SNP alleles preserving the original motif tend to have higher recombination rates than the opposite alleles that disrupt the motif. Running on SNP windows around hotspots each containing an occurrence of the 7-mer motif, LDsplit is able to guide the established motif finding algorithm of MEME to recover the 7-mer motif. In contrast, without LDsplit the 7-mer motif could not be identified. Conclusions: LDsplit is a software tool for the discovery of cis-regulatory DNA sequence motifs stimulating meiotic recombination hotspots by screening and narrowing down to hotspot associated SNPs. It is the first computational method that utilizes the genetic variation of recombination hotspots among individuals, opening a new avenue for motif finding. Tested on an established motif and simulated datasets, LDsplit shows promise to discover novel DNA motifs for meiotic recombination hotspots. PMID:24533858
The Proliferating Cell Nuclear Antigen (PCNA)-interacting Protein (PIP) Motif of DNA Polymerase η Mediates Its Interaction with the C-terminal Domain of Rev1

PubMed Central

Boehm, Elizabeth M.; Powers, Kyle T.; Kondratick, Christine M.; Spies, Maria; Houtman, Jon C. D.; Washington, M. Todd

2016-01-01

Y-family DNA polymerases, such as polymerase η, polymerase ι, and polymerase κ, catalyze the bypass of DNA damage during translesion synthesis. These enzymes are recruited to sites of DNA damage by interacting with the essential replication accessory protein proliferating cell nuclear antigen (PCNA) and the scaffold protein Rev1. In most Y-family polymerases, these interactions are mediated by one or more conserved PCNA-interacting protein (PIP) motifs that bind in a hydrophobic pocket on the front side of PCNA as well as by conserved Rev1-interacting region (RIR) motifs that bind in a hydrophobic pocket on the C-terminal domain of Rev1. Yeast polymerase η, a prototypical translesion synthesis polymerase, binds both PCNA and Rev1. It possesses a single PIP motif but not an RIR motif. Here we show that the PIP motif of yeast polymerase η mediates its interactions both with PCNA and with Rev1. Moreover, the PIP motif of polymerase η binds in the hydrophobic pocket on the Rev1 C-terminal domain. We also show that the RIR motif of human polymerase κ and the PIP motif of yeast Msh6 bind both PCNA and Rev1. Overall, these findings demonstrate that PIP motifs and RIR motifs have overlapping specificities and can interact with both PCNA and Rev1 in structurally similar ways. These findings also suggest that PIP motifs are a more versatile protein interaction motif than previously believed. PMID:26903512
Finding the target sites of RNA-binding proteins

PubMed Central

Li, Xiao; Kazan, Hilal; Lipshitz, Howard D; Morris, Quaid D

2014-01-01

RNA–protein interactions differ from DNA–protein interactions because of the central role of RNA secondary structure. Some RNA-binding domains (RBDs) recognize their target sites mainly by their shape and geometry and others are sequence-specific but are sensitive to secondary structure context. A number of small- and large-scale experimental approaches have been developed to measure RNAs associated in vitro and in vivo with RNA-binding proteins (RBPs). Generalizing outside of the experimental conditions tested by these assays requires computational motif finding. Often RBP motif finding is done by adapting DNA motif finding methods; but modeling secondary structure context leads to better recovery of RBP-binding preferences. Genome-wide assessment of mRNA secondary structure has recently become possible, but these data must be combined with computational predictions of secondary structure before they add value in predicting in vivo binding. There are two main approaches to incorporating structural information into motif models: supplementing primary sequence motif models with preferred secondary structure contexts (e.g., MEMERIS and RNAcontext) and directly modeling secondary structure recognized by the RBP using stochastic context-free grammars (e.g., CMfinder and RNApromo). The former better reconstruct known binding preferences for sequence-specific RBPs but are not suitable for modeling RBPs that recognize shape and geometry of RNAs. Future work in RBP motif finding should incorporate interactions between multiple RBDs and multiple RBPs in binding to RNA. WIREs RNA 2014, 5:111–130. doi: 10.1002/wrna.1201 PMID:24217996
MotifMark: Finding regulatory motifs in DNA sequences.

PubMed

Hassanzadeh, Hamid Reza; Kolhe, Pushkar; Isbell, Charles L; Wang, May D

2017-07-01

The interaction between proteins and DNA is a key driving force in a significant number of biological processes such as transcriptional regulation, repair, recombination, splicing, and DNA modification. The identification of DNA-binding sites and the specificity of target proteins in binding to these regions are two important steps in understanding the mechanisms of these biological activities. A number of high-throughput technologies have recently emerged that try to quantify the affinity between proteins and DNA motifs. Despite their success, these technologies have their own limitations and fall short in precise characterization of motifs, and as a result, require further downstream analysis to extract useful and interpretable information from a haystack of noisy and inaccurate data. Here we propose MotifMark, a new algorithm based on graph theory and machine learning, that can find binding sites on candidate probes and rank their specificity in regard to the underlying transcription factor. We developed a pipeline to analyze experimental data derived from compact universal protein binding microarrays and benchmarked it against two of the most accurate motif search methods. Our results indicate that MotifMark can be a viable alternative technique for prediction of motif from protein binding microarrays and possibly other related high-throughput techniques.
cWINNOWER Algorithm for Finding Fuzzy DNA Motifs

NASA Technical Reports Server (NTRS)

Liang, Shoudan

2003-01-01

The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if multiple mutated copies of the motif (i.e., the signals) are present in the DNA sequence in sufficient abundance. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum number of detectable motifs qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc, by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12000 for (l,d) = (15,4).
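The signal definition above can be illustrated with a short Python sketch. It is not an implementation of cWINNOWER; it only shows what it means for a window to lie within d mutations of a motif of length l, and why two signals of the same motif can differ from each other in at most 2d positions (by the triangle inequality on Hamming distance). All function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def signal_positions(sequence, motif, d):
    """Start positions of windows that differ from `motif` in at most d places."""
    l = len(motif)
    return [i for i in range(len(sequence) - l + 1)
            if hamming(sequence[i:i + l], motif) <= d]


def may_share_motif(signal_a, signal_b, d):
    """Necessary condition: two signals of one motif differ in at most 2d places."""
    return hamming(signal_a, signal_b) <= 2 * d


# Toy instance with (l, d) = (6, 1); the planted motif is "ACGTAC".
print(signal_positions("TTACGTACTTACCTACTT", "ACGTAC", 1))
```

In a real search the motif is unknown, so only a pairwise test such as `may_share_motif` is available up front; the counting of sub-cliques described in the abstract operates on relationships of that kind.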
General method to find the attractors of discrete dynamic models of biological systems.

PubMed

Gan, Xiao; Albert, Réka

2018-04-01

Analyzing the long-term behaviors (attractors) of dynamic models of biological networks can provide valuable insight. We propose a general method that can find the attractors of multilevel discrete dynamical systems by extending a method that finds the attractors of a Boolean network model. The previous method is based on finding stable motifs, subgraphs whose nodes' states can stabilize on their own. We extend the framework from binary states to any finite discrete levels by creating a virtual node for each level of a multilevel node, and describing each virtual node with a quasi-Boolean function. We then create an expanded representation of the multilevel network, find multilevel stable motifs and oscillating motifs, and identify attractors by successive network reduction. In this way, we find both fixed point attractors and complex attractors. We implemented an algorithm, which we test and validate on representative synthetic networks and on published multilevel models of biological networks. Despite its primary motivation to analyze biological networks, our motif-based method is general and can be applied to any finite discrete dynamical system.
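As a point of reference for what a fixed-point attractor is, the sketch below enumerates every state of a small Boolean network and keeps those that no update rule changes. This exhaustive enumeration is not the stable-motif method described in the abstract and scales exponentially with the number of nodes; the function name and the two-node example networks are assumptions for illustration.

```python
from itertools import product


def fixed_points(update_rules):
    """Return all states that every update rule leaves unchanged.

    `update_rules` maps a node name to a function that takes the full
    state (a dict of node name -> bool) and returns the node's next value.
    """
    nodes = sorted(update_rules)
    found = []
    for values in product([False, True], repeat=len(nodes)):
        state = dict(zip(nodes, values))
        if all(update_rules[n](state) == state[n] for n in nodes):
            found.append(state)
    return found


# Mutual activation: a positive feedback loop that can sustain itself
# in either the all-off or the all-on state.
mutual_activation = {"A": lambda s: s["B"], "B": lambda s: s["A"]}
print(fixed_points(mutual_activation))
```

Fixed points do not depend on whether nodes are updated synchronously or asynchronously, which is why this check needs no update schedule; complex (oscillating) attractors are not detected by it.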
General method to find the attractors of discrete dynamic models of biological systems

NASA Astrophysics Data System (ADS)

Gan, Xiao; Albert, Réka

2018-04-01

Analyzing the long-term behaviors (attractors) of dynamic models of biological networks can provide valuable insight. We propose a general method that can find the attractors of multilevel discrete dynamical systems by extending a method that finds the attractors of a Boolean network model. The previous method is based on finding stable motifs, subgraphs whose nodes' states can stabilize on their own. We extend the framework from binary states to any finite discrete levels by creating a virtual node for each level of a multilevel node, and describing each virtual node with a quasi-Boolean function. We then create an expanded representation of the multilevel network, find multilevel stable motifs and oscillating motifs, and identify attractors by successive network reduction. In this way, we find both fixed point attractors and complex attractors. We implemented an algorithm, which we test and validate on representative synthetic networks and on published multilevel models of biological networks. Despite its primary motivation to analyze biological networks, our motif-based method is general and can be applied to any finite discrete dynamical system.
qPMS9: An Efficient Algorithm for Quorum Planted Motif Search

NASA Astrophysics Data System (ADS)

Nicolae, Marius; Rajasekaran, Sanguthevar

2015-01-01

Discovering patterns in biological sequences is a crucial problem. For example, the identification of patterns in DNA sequences has resulted in the determination of open reading frames, identification of gene promoter elements, intron/exon splicing sites, and SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have led to domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, discovery of short functional motifs, etc. In this paper we focus on the identification of an important class of patterns, namely, motifs. We study the (l, d) motif search problem or Planted Motif Search (PMS). PMS receives as input n strings and two integers l and d. It returns all sequences M of length l that occur in each input string, where each occurrence differs from M in at most d positions. Another formulation is quorum PMS (qPMS), where the motif appears in at least q% of the strings. We introduce qPMS9, a parallel exact qPMS algorithm that offers significant runtime improvements on DNA and protein datasets. qPMS9 solves the challenging DNA (l, d)-instances (28, 12) and (30, 13). The source code is available at https://code.google.com/p/qpms9/.
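The PMS and qPMS problem statements can be written down directly as a brute-force search. The Python sketch below tries every string of length l over the alphabet and is therefore only usable for tiny instances; it is not qPMS9 and shares none of its pruning or parallelism. The function names are hypothetical, and the quorum is expressed here as a count of strings rather than a percentage.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, s, d):
    """True if some window of s differs from motif in at most d positions."""
    l = len(motif)
    return any(hamming(motif, s[i:i + l]) <= d
               for i in range(len(s) - l + 1))


def qpms_bruteforce(strings, l, d, quorum, alphabet="ACGT"):
    """All l-mers that occur, with at most d mismatches, in >= quorum strings."""
    motifs = []
    for candidate in product(alphabet, repeat=l):
        m = "".join(candidate)
        if sum(occurs_within(m, s, d) for s in strings) >= quorum:
            motifs.append(m)
    return motifs


strings = ["TTACGTT", "GGACGAG", "CCACCTC"]
# Setting quorum to len(strings) gives the plain PMS formulation.
print("ACGT" in qpms_bruteforce(strings, l=4, d=1, quorum=3))
```

The search examines |alphabet|^l candidates, which is what makes instances such as (28, 12) challenging and motivates specialised exact algorithms.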

Local Higher-Order Graph Clustering

PubMed Central

Yin, Hao; Benson, Austin R.; Leskovec, Jure; Gleich, David F.

2018-01-01

Local graph clustering methods aim to find a cluster of nodes by exploring a small region of the graph. These methods are attractive because they enable targeted clustering around a given seed node and are faster than traditional global graph clustering methods because their runtime does not depend on the size of the input graph. However, current local graph partitioning methods are not designed to account for the higher-order structures crucial to the network, nor can they effectively handle directed networks. Here we introduce a new class of local graph clustering methods that address these issues by incorporating higher-order network information captured by small subgraphs, also called network motifs. We develop the Motif-based Approximate Personalized PageRank (MAPPR) algorithm that finds clusters containing a seed node with minimal motif conductance, a generalization of the conductance metric for network motifs. We generalize existing theory to prove the fast running time (independent of the size of the graph) and obtain theoretical guarantees on the cluster quality (in terms of motif conductance). We also develop a theory of node neighborhoods for finding sets that have small motif conductance, and apply these results to the case of finding good seed nodes to use as input to the MAPPR algorithm. Experimental validation on community detection tasks in both synthetic and real-world networks, shows that our new framework MAPPR outperforms the current edge-based personalized PageRank methodology. PMID:29770258
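The quantity that MAPPR seeks to minimise can be computed directly for a given node set. The sketch below evaluates motif conductance for the triangle motif on an undirected graph, taking the motif cut as the number of triangles with nodes on both sides of the partition and the motif volume of a set as the number of triangle endpoints inside it. It does not implement the personalized PageRank procedure, and the function names and toy graph are assumptions for illustration.

```python
def triangles(edges):
    """Return the set of triangles (as frozensets) in an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    found = set()
    for u, v in edges:
        for w in adj[u] & adj[v]:  # common neighbours close a triangle
            found.add(frozenset((u, v, w)))
    return found


def triangle_conductance(edges, cluster):
    """Motif conductance of `cluster` with respect to the triangle motif."""
    inside = set(cluster)
    tris = triangles(edges)
    # Triangles that have at least one node on each side of the partition.
    cut = sum(1 for t in tris if 0 < len(t & inside) < len(t))
    vol_inside = sum(len(t & inside) for t in tris)
    vol_outside = sum(len(t - inside) for t in tris)
    denom = min(vol_inside, vol_outside)
    return cut / denom if denom else float("inf")


# Three triangles chained together: {1,2,3}, {3,4,5}, {4,5,6}.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5), (4, 6), (5, 6)]
print(triangle_conductance(edges, {1, 2, 3}))
```

A low value means few triangles straddle the boundary relative to the triangle participation of the smaller side, which is the sense in which a cluster preserves higher-order structure.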
Combinatorial Histone Acetylation Patterns Are Generated by Motif-Specific Reactions.

PubMed

Blasi, Thomas; Feller, Christian; Feigelman, Justin; Hasenauer, Jan; Imhof, Axel; Theis, Fabian J; Becker, Peter B; Marr, Carsten

2016-01-27

Post-translational modifications (PTMs) are pivotal to cellular information processing, but how combinatorial PTM patterns ("motifs") are set remains elusive. We develop a computational framework, which we provide as open source code, to investigate the design principles generating the combinatorial acetylation patterns on histone H4 in Drosophila melanogaster. We find that models assuming purely unspecific or lysine site-specific acetylation rates were insufficient to explain the experimentally determined motif abundances. Rather, these abundances were best described by an ensemble of models with acetylation rates that were specific to motifs. The model ensemble converged upon four acetylation pathways; we validated three of these using independent data from a systematic enzyme depletion study. Our findings suggest that histone acetylation patterns originate through specific pathways involving motif-specific acetylation activity. Copyright © 2016 Elsevier Inc. All rights reserved.
libFLASM: a software library for fixed-length approximate string matching.

PubMed

Ayad, Lorraine A K; Pissis, Solon P P; Retha, Ahmad

2016-11-10

Approximate string matching is the problem of finding all factors of a given text that are at a distance at most k from a given pattern. Fixed-length approximate string matching is the problem of finding all factors of a text of length n that are at a distance at most k from any factor of length ℓ of a pattern of length m. There exist bit-vector techniques to solve the fixed-length approximate string matching problem in time [Formula: see text] and space [Formula: see text] under the edit and Hamming distance models, where w is the size of the computer word; as such these techniques are independent of the distance threshold k or the alphabet size. Fixed-length approximate string matching is a generalisation of approximate string matching and, hence, has numerous direct applications in computational molecular biology and elsewhere. We present and make available libFLASM, a free open-source C++ software library for solving fixed-length approximate string matching under both the edit and the Hamming distance models. Moreover we describe how fixed-length approximate string matching is applied to solve real problems by incorporating libFLASM into established applications for multiple circular sequence alignment as well as single and structured motif extraction. Specifically, we describe how it can be used to improve the accuracy of multiple circular sequence alignment in terms of the inferred likelihood-based phylogenies; and we also describe how it is used to efficiently find motifs in molecular sequences representing regulatory or functional regions. The comparison of the performance of the library to other algorithms shows that it is competitive, especially with increasing distance thresholds. Fixed-length approximate string matching is a generalisation of the classic approximate string matching problem. We present libFLASM, a free open-source C++ software library for solving fixed-length approximate string matching. The extensive experimental results presented here suggest that other applications could benefit from using libFLASM, and thus further maintenance and development of libFLASM is desirable.
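The problem statement above can be illustrated with a naive reference sketch under the Hamming distance model. It compares every length-ℓ factor of the text against every length-ℓ factor of the pattern, so it runs in time proportional to n·m·ℓ; it is not the bit-vector technique the abstract refers to and does not use the libFLASM API. All function names here are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def fixed_length_matches(text, pattern, ell, k):
    """Return (text_pos, pattern_pos, distance) for every pair of
    length-ell factors whose Hamming distance is at most k."""
    hits = []
    for i in range(len(text) - ell + 1):
        for j in range(len(pattern) - ell + 1):
            d = hamming(text[i:i + ell], pattern[j:j + ell])
            if d <= k:
                hits.append((i, j, d))
    return hits


# Factors of length 3 of the text within one mismatch of a factor of the pattern.
print(fixed_length_matches("ACGTACGA", "CGTT", 3, 1))
```

A brute-force version like this is mainly useful as a correctness oracle when testing a faster implementation on small inputs.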
Novel Strategy for Discrimination of Transcription Factor Binding Motifs Employing Mathematical Neural Network

NASA Astrophysics Data System (ADS)

Sugimoto, Asuka; Sumi, Takuya; Kang, Jiyoung; Tateno, Masaru

2017-07-01

Recognition in biological macromolecular systems, such as DNA-protein recognition, is one of the most crucial problems to solve toward understanding the fundamental mechanisms of various biological processes. Since specific base sequences of genome DNA are discriminated by proteins, such as transcription factors (TFs), finding TF binding motifs (TFBMs) in whole genome DNA sequences is currently a central issue in interdisciplinary biophysical and information sciences. In the present study, a novel strategy to create a discriminant function for discrimination of TFBMs by constituting mathematical neural networks (NNs) is proposed, together with a method to determine the boundary of signals (TFBMs) and noise in the NN-score (output) space. This analysis also leads to the mathematical limitation of discrimination in the recognition of features representing TFBMs, in an information geometrical manifold. Thus, the present strategy enables the identification of the whole space of TFBMs, right up to the noise boundary.
Identification of family-specific residue packing motifs and their use for structure-based protein function prediction: I. Method development.

PubMed

Bandyopadhyay, Deepak; Huan, Jun; Prins, Jan; Snoeyink, Jack; Wang, Wei; Tropsha, Alexander

2009-11-01

Protein function prediction is one of the central problems in computational biology. We present a novel automated protein structure-based function prediction method using libraries of local residue packing patterns that are common to most proteins in a known functional family. Critical to this approach is the representation of a protein structure as a graph where residue vertices (residue name used as a vertex label) are connected by geometrical proximity edges. The approach employs two steps. First, it uses a fast subgraph mining algorithm to find all occurrences of family-specific labeled subgraphs for all well characterized protein structural and functional families. Second, it queries a new structure for occurrences of a set of motifs characteristic of a known family, using a graph index to speed up Ullman's subgraph isomorphism algorithm. The confidence of function inference from structure depends on the number of family-specific motifs found in the query structure compared with their distribution in a large non-redundant database of proteins. This method can assign a new structure to a specific functional family in cases where sequence alignments, sequence patterns, structural superposition and active site templates fail to provide accurate annotation.
FISim: A new similarity measure between transcription factor binding sites based on the fuzzy integral

PubMed Central

Garcia, Fernando; Lopez, Francisco J; Cano, Carlos; Blanco, Armando

2009-01-01

Background: Regulatory motifs describe sets of related transcription factor binding sites (TFBSs) and can be represented as position frequency matrices (PFMs). De novo identification of TFBSs is a crucial problem in computational biology which includes the issue of comparing putative motifs with one another and with motifs that are already known. The relative importance of each nucleotide within a given position in the PFMs should be considered in order to compute PFM similarities. Furthermore, biological data are inherently noisy and imprecise. Fuzzy set theory is particularly suitable for modeling imprecise data, whereas fuzzy integrals are highly appropriate for representing the interaction among different information sources. Results: We propose FISim, a new similarity measure between PFMs, based on the fuzzy integral of the distance of the nucleotides with respect to the information content of the positions. Unlike existing methods, FISim is designed to consider the higher contribution of better conserved positions to the binding affinity. FISim provides excellent results when dealing with sets of randomly generated motifs, and outperforms the remaining methods when handling real datasets of related motifs. Furthermore, we propose a new cluster methodology based on kernel theory together with FISim to obtain groups of related motifs potentially bound by the same TFs, providing more robust results than existing approaches. Conclusion: FISim corrects a design flaw of the most popular methods, whose measures favour similarity of low information content positions. We use our measure to successfully identify motifs that describe binding sites for the same TF and to solve real-life problems. In this study the reliability of fuzzy technology for motif comparison tasks is proven. PMID:19615102
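The "information content of the positions" mentioned above can be sketched as follows. This assumes the standard per-column definition for DNA (2 bits minus the Shannon entropy of the column, with a uniform background); it is only the quantity that a measure such as FISim weights by, not the FISim fuzzy integral itself. The function name and the example matrix are illustrative.

```python
import math


def column_information(counts, pseudocount=0.0):
    """Information content (in bits) of one PFM column over A, C, G, T."""
    total = sum(counts.get(b, 0) + pseudocount for b in "ACGT")
    bits = 2.0  # log2(4): the maximum for a four-letter alphabet
    for b in "ACGT":
        p = (counts.get(b, 0) + pseudocount) / total
        if p > 0:
            bits += p * math.log2(p)  # subtracts the column entropy
    return bits


# Hypothetical PFM: a fully conserved, a two-base, and an uninformative column.
pfm = [
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 5, "C": 5, "G": 0, "T": 0},
    {"A": 5, "C": 5, "G": 5, "T": 5},
]
print([column_information(col) for col in pfm])
```

Well-conserved columns score close to 2 bits and uninformative columns close to 0, which is why a similarity measure that ignores this weighting can be dominated by low-information positions.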
FPGA implementation of motifs-based neuronal network and synchronization analysis

NASA Astrophysics Data System (ADS)

Deng, Bin; Zhu, Zechen; Yang, Shuangming; Wei, Xile; Wang, Jiang; Yu, Haitao

2016-06-01

Motifs in complex networks play a crucial role in determining brain function. In this paper, 13 kinds of motifs are implemented on a Field Programmable Gate Array (FPGA) to investigate the relationships between network properties and motif properties. We use a discretization method and a pipelined architecture to construct various motifs with the Hindmarsh-Rose (HR) neuron as the node model. We also build a small-world network based on these motifs and conduct a synchronization analysis of the motifs as well as the constructed network. We find that the synchronization properties of a motif determine those of the motif-based small-world network, which demonstrates the effectiveness of our proposed hardware simulation platform. By imitating some vital nuclei in the brain to generate normal discharges, our proposed FPGA-based artificial neuronal networks have the potential to replace injured nuclei and restore brain function in the treatment of Parkinson's disease and epilepsy.
The BaMM web server for de-novo motif discovery and regulatory sequence analysis.

PubMed

Kiesel, Anja; Roth, Christian; Ge, Wanwan; Wess, Maximilian; Meier, Markus; Söding, Johannes

2018-05-28

The BaMM web server offers four tools: (i) de-novo discovery of enriched motifs in a set of nucleotide sequences, (ii) scanning a set of nucleotide sequences with motifs to find motif occurrences, (iii) searching with an input motif for similar motifs in our BaMM database with motifs for >1000 transcription factors, trained from the GTRD ChIP-seq database and (iv) browsing and keyword searching the motif database. In contrast to most other servers, we represent sequence motifs not by position weight matrices (PWMs) but by Bayesian Markov Models (BaMMs) of order 4, which we showed previously to perform substantially better in ROC analyses than PWMs or first order models. To address the inadequacy of P- and E-values as measures of motif quality, we introduce the AvRec score, the average recall over the TP-to-FP ratio between 1 and 100. The BaMM server is freely accessible without registration at https://bammmotif.mpibpc.mpg.de.
An integrative and applicable phylogenetic footprinting framework for cis-regulatory motifs identification in prokaryotic genomes.

PubMed

Liu, Bingqiang; Zhang, Hanyuan; Zhou, Chuan; Li, Guojun; Fennell, Anne; Wang, Guanghui; Kang, Yu; Liu, Qi; Ma, Qin

2016-08-09

Phylogenetic footprinting is an important computational technique for identifying cis-regulatory motifs in orthologous regulatory regions from multiple genomes, as motifs tend to evolve slower than their surrounding non-functional sequences. Its application, however, has several difficulties for optimizing the selection of orthologous data and reducing the false positives in motif prediction. Here we present an integrative phylogenetic footprinting framework for accurate motif predictions in prokaryotic genomes (MP(3)). The framework includes a new orthologous data preparation procedure, an additional promoter scoring and pruning method and an integration of six existing motif finding algorithms as basic motif search engines. Specifically, we collected orthologous genes from available prokaryotic genomes and built the orthologous regulatory regions based on sequence similarity of promoter regions. This procedure made full use of the large-scale genomic data and taxonomy information and filtered out the promoters with limited contribution to produce a high quality orthologous promoter set. The promoter scoring and pruning is implemented through motif voting by a set of complementary predicting tools that mine as many motif candidates as possible and simultaneously eliminate the effect of random noise. We have applied the framework to Escherichia coli k12 genome and evaluated the prediction performance through comparison with seven existing programs. This evaluation was systematically carried out at the nucleotide and binding site level, and the results showed that MP(3) consistently outperformed other popular motif finding tools. We have integrated MP(3) into our motif identification and analysis server DMINDA, allowing users to efficiently identify and analyze motifs in 2,072 completely sequenced prokaryotic genomes. The performance evaluation indicated that MP(3) is effective for predicting regulatory motifs in prokaryotic genomes. 
Its application may enhance progress in elucidating transcription regulation mechanisms, thus benefiting the genomic research community and prokaryotic genome researchers in particular.
Three computational mise-en-scènes of red- and blue-shifted hydrogen bonding motifs: Concept of negative intramolecular coupling-What else?

NASA Astrophysics Data System (ADS)

Kryachko, Eugene S.

This work attempts to rethink several problems related to blue-shifted "hydrogen bonds" that have remained not fully resolved over the past decade. The impetus for this rethinking comes from three computational mise-en-scènes of red- and blue-shifted hydrogen bonding motifs, which this work aims to study thoroughly and thereby resolve these problems.
cWINNOWER algorithm for finding fuzzy DNA motifs

NASA Technical Reports Server (NTRS)

Liang, S.; Samanta, M. P.; Biegel, B. A.

2004-01-01

The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if a clique consisting of a sufficiently large number of mutated copies of the motif (i.e., the signals) is present in the DNA sequence. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum detectable clique size qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12,000 for (l, d) = (15, 4). Copyright Imperial College Press.
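The clique view of the problem can be sketched as a graph construction. The sketch assumes the usual observation behind winnower-style methods: two signals that each differ from the same motif in at most d positions differ from each other in at most 2d positions, so candidate signals are joined by an edge when their Hamming distance is at most 2d. The multi-sequence input, the function names, and the restriction of edges to windows from different sequences are assumptions made for illustration; the consensus constraint and sub-clique counting of cWINNOWER are not implemented here.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def signal_graph(seqs, l, d):
    """Nodes are (sequence index, start) windows of length l; an edge joins
    windows from different sequences that differ in at most 2*d positions."""
    nodes = [(s, i) for s, seq in enumerate(seqs)
             for i in range(len(seq) - l + 1)]
    edges = set()
    for (s1, i1), (s2, i2) in combinations(nodes, 2):
        if s1 == s2:
            continue  # assumed: at most one signal per sequence
        if hamming(seqs[s1][i1:i1 + l], seqs[s2][i2:i2 + l]) <= 2 * d:
            edges.add(((s1, i1), (s2, i2)))
    return nodes, edges


nodes, edges = signal_graph(["AAAA", "AAAT", "CCCC"], l=4, d=1)
print(edges)
```

A planted motif then shows up as a large clique in this graph, while pruning steps remove edges that cannot belong to any sufficiently large clique.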
A Conserved GPG-Motif in the HIV-1 Nef Core Is Required for Principal Nef-Activities

PubMed Central

Martínez-Bonet, Marta; Palladino, Claudia; Briz, Veronica; Rudolph, Jochen M.; Fackler, Oliver T.; Relloso, Miguel; Muñoz-Fernandez, Maria Angeles; Madrid, Ricardo

2015-01-01

To identify new determinants required for Nef activity, we performed a functional alanine scanning analysis along a discrete but highly conserved region at the core of HIV-1 Nef. We identified the GPG-motif, located at the 121–137 region of HIV-1 NL4.3 Nef, as a novel protein signature strictly required for the p56Lck dependent Nef-induced CD4-downregulation in T-cells. Since the Nef-GPG motif was dispensable for CD4-downregulation in HeLa-CD4 cells, Nef/AP-1 interaction and Nef-dependent effects on Tf-R trafficking, the observed effects on CD4 downregulation cannot be attributed to structural constraints or to alterations of general protein trafficking. Besides, we found that the GPG-motif was also required for Nef-dependent inhibition of ring actin re-organization upon TCR triggering and MHCI downregulation, suggesting that the GPG-motif could actively cooperate with the Nef PxxP motif for these HIV-1 Nef-related effects. Finally, we observed that the Nef-GPG motif was required for optimal infectivity of those viruses produced in T-cells. According to these findings, we propose the conserved GPG-motif in HIV-1 Nef as a functional region required for HIV-1 infectivity and therefore of potential interest for interfering with Nef activity during HIV-1 infection. PMID:26700863
DLocalMotif: a discriminative approach for discovering local motifs in protein sequences.

PubMed

Mehdi, Ahmed M; Sehgal, Muhammad Shoaib B; Kobe, Bostjan; Bailey, Timothy L; Bodén, Mikael

2013-01-01

Local motifs are patterns of DNA or protein sequences that occur within a sequence interval relative to a biologically defined anchor or landmark. Current protein motif discovery methods do not adequately consider such constraints to identify biologically significant motifs that are only weakly over-represented but spatially confined. Using negatives, i.e. sequences known to not contain a local motif, can further increase the specificity of their discovery. This article introduces the method DLocalMotif that makes use of positional information and negative data for local motif discovery in protein sequences. DLocalMotif combines three scoring functions, measuring degrees of motif over-representation, entropy and spatial confinement, specifically designed to discriminatively exploit the availability of negative data. The method is shown to outperform current methods that use only a subset of these motif characteristics. We apply the method to several biological datasets. The analysis of peroxisomal targeting signals uncovers several novel motifs that occur immediately upstream of the dominant peroxisomal targeting signal-1 signal. The analysis of proline-tyrosine nuclear localization signals uncovers multiple novel motifs that overlap with C2H2 zinc finger domains. We also evaluate the method on classical nuclear localization signals and endoplasmic reticulum retention signals and find that DLocalMotif successfully recovers biologically relevant sequence properties. http://bioinf.scmb.uq.edu.au/dlocalmotif/
Motif discovery and motif finding from genome-mapped DNase footprint data.

PubMed

Kulakovskiy, Ivan V; Favorov, Alexander V; Makeev, Vsevolod J

2009-09-15

Footprint data is an important source of information on transcription factor recognition motifs. However, a footprinting fragment can contain no sequences similar to known protein recognition sites. Inspection of genome fragments nearby can help to identify missing site positions. Genome fragments containing footprints were supplied to a pipeline that constructed a position weight matrix (PWM) for different motif lengths and selected the optimal PWM. Fragments were aligned with the SeSiMCMC sampler and a new heuristic algorithm, Bigfoot. Footprints with missing hits were found for approximately 50% of factors. Adding only 2 bp on both sides of a footprinting fragment recovered most hits. We automatically constructed motifs for 41 Drosophila factors. New motifs can recognize footprints with a greater sensitivity at the same false positive rate than existing models. Also we discuss possible overfitting of constructed motifs. Software and the collection of regulatory motifs are freely available at http://line.imb.ac.ru/DMMPMM.
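As a rough illustration of constructing a position weight matrix from aligned sites and using it to locate hits, the sketch below builds a log-odds PWM with pseudocounts against a uniform background and scans a sequence for the best-scoring window. The pseudocount, the background, and all names are assumptions for illustration; this is not the SeSiMCMC or Bigfoot procedure described in the abstract.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds PWM (one dict per column) from equal-length aligned sites."""
    pwm = []
    for j in range(len(sites[0])):
        counts = {b: pseudocount for b in "ACGT"}
        for site in sites:
            counts[site[j]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background)
                    for b in "ACGT"})
    return pwm


def best_hit(pwm, seq):
    """Return (score, start) of the highest-scoring window in seq."""
    width = len(pwm)
    scored = [(sum(pwm[j][seq[i + j]] for j in range(width)), i)
              for i in range(len(seq) - width + 1)]
    return max(scored)


# Hypothetical aligned sites and a fragment to scan.
pwm = build_pwm(["TATA", "TATA", "TACA"])
score, start = best_hit(pwm, "GGTATAGG")
print(start, round(score, 2))
```

Extending the scanned fragment by a few bases on each side, as the abstract describes for footprints, only changes the `seq` argument passed to `best_hit`.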
Organization of feed-forward loop motifs reveals architectural principles in natural and engineered networks.

PubMed

Gorochowski, Thomas E; Grierson, Claire S; di Bernardo, Mario

2018-03-01

Network motifs are significantly overrepresented subgraphs that have been proposed as building blocks for natural and engineered networks. Detailed functional analysis has been performed for many types of motif in isolation, but less is known about how motifs work together to perform complex tasks. To address this issue, we measure the aggregation of network motifs via methods that extract precisely how these structures are connected. Applying this approach to a broad spectrum of networked systems and focusing on the widespread feed-forward loop motif, we uncover striking differences in motif organization. The types of connection are often highly constrained, differ between domains, and clearly capture architectural principles. We show how this information can be used to effectively predict functionally important nodes in the metabolic network of Escherichia coli. Our findings have implications for understanding how networked systems are constructed from motif parts and elucidate constraints that guide their evolution.
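For readers unfamiliar with the feed-forward loop, the sketch below enumerates its instances in a directed graph: triples (x, y, z) with edges x→y, y→z and x→z. It is a minimal enumeration only and does not reproduce the aggregation analysis of the paper; the function name and the toy network are illustrative, and instances are reported even when extra edges exist among the three nodes.

```python
def feed_forward_loops(edges):
    """Return sorted (x, y, z) triples with edges x->y, y->z and x->z."""
    succ = {}
    for u, v in edges:
        if u != v:  # ignore self-loops
            succ.setdefault(u, set()).add(v)
    loops = []
    for x, targets in succ.items():
        for y in targets:
            for z in succ.get(y, ()):
                # z must also be a direct target of x to close the loop.
                if z != x and z in targets:
                    loops.append((x, y, z))
    return sorted(loops)


# Toy regulatory network: a regulates b and c, b regulates c, c regulates d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(feed_forward_loops(edges))
```

Once instances are enumerated, the nodes they share can be compared to see how loops are connected to one another.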
Organization of feed-forward loop motifs reveals architectural principles in natural and engineered networks

PubMed Central

Grierson, Claire S.

2018-01-01

Network motifs are significantly overrepresented subgraphs that have been proposed as building blocks for natural and engineered networks. Detailed functional analysis has been performed for many types of motif in isolation, but less is known about how motifs work together to perform complex tasks. To address this issue, we measure the aggregation of network motifs via methods that extract precisely how these structures are connected. Applying this approach to a broad spectrum of networked systems and focusing on the widespread feed-forward loop motif, we uncover striking differences in motif organization. The types of connection are often highly constrained, differ between domains, and clearly capture architectural principles. We show how this information can be used to effectively predict functionally important nodes in the metabolic network of Escherichia coli. Our findings have implications for understanding how networked systems are constructed from motif parts and elucidate constraints that guide their evolution. PMID:29670941
Combinatorics of feedback in cellular uptake and metabolism of small molecules.

PubMed

Krishna, Sandeep; Semsey, Szabolcs; Sneppen, Kim

2007-12-26

We analyze the connection between structure and function for regulatory motifs associated with cellular uptake and usage of small molecules. Based on the boolean logic of the feedback we suggest four classes: the socialist, consumer, fashion, and collector motifs. We find that the socialist motif is good for homeostasis of a useful but potentially poisonous molecule, whereas the consumer motif is optimal for nutrition molecules. Accordingly, examples of these motifs are found in, respectively, the iron homeostasis system in various organisms and in the uptake of sugar molecules in bacteria. The remaining two motifs have no obvious analogs in small molecule regulation, but we illustrate their behavior using analogies to fashion and obesity. These extreme motifs could inspire construction of synthetic systems that exhibit bistable, history-dependent states, and homeostasis of flux (rather than concentration).
MOTIFSIM 2.1: An Enhanced Software Platform for Detecting Similarity in Multiple DNA Motif Data Sets

PubMed Central

Huang, Chun-Hsi

2017-01-01

Finding binding site motifs plays an important role in bioinformatics as it reveals the transcription factors that control gene expression. The development of motif finders has flourished in the past years, with many tools introduced to the research community. Although these tools possess exceptional features for detecting motifs, they report different results for an identical data set. Hence, using multiple tools is recommended because motifs reported by several tools are likely biologically significant. However, the results from multiple tools need to be compared to obtain common significant motifs. The MOTIFSIM web tool and command-line tool were developed for this purpose. In this work, we present several technical improvements as well as additional features to further support motif analysis in our new release, MOTIFSIM 2.1. PMID:28632401
Direct AUC optimization of regulatory motifs.

PubMed

Zhu, Lin; Zhang, Hong-Bo; Huang, De-Shuang

2017-07-15

The discovery of transcription factor binding site (TFBS) motifs is essential for untangling the complex mechanism of genetic variation under different developmental and environmental conditions. Among the large number of computational approaches for de novo identification of TFBS motifs, discriminative motif learning (DML) methods have proven promising for harnessing the large amount of accumulated high-throughput binding data. However, they have to sacrifice accuracy for speed and could fail to fully utilize the information of the input sequences. We propose a novel algorithm called CDAUC for optimizing DML-learned motifs based on the area under the receiver-operating characteristic curve (AUC) criterion, which has been widely used in the literature to evaluate the significance of extracted motifs. We show that when the considered AUC loss function is optimized in a coordinate-wise manner, the cost function of each resultant sub-problem is a piece-wise constant function, whose optimal value can be found exactly and efficiently. Further, a key step of each iteration of CDAUC can be efficiently solved as a computational geometry problem. Experimental results on real world high-throughput datasets illustrate that CDAUC outperforms competing methods for refining DML motifs, while being one order of magnitude faster. Meanwhile, preliminary results also show that CDAUC may be useful for improving the interpretability of convolutional kernels generated by the emerging deep learning approaches for predicting TF sequence specificities. CDAUC is available at https://drive.google.com/drive/folders/0BxOW5MtIZbJjNFpCeHlBVWJHeW8 (contact: dshuang@tongji.edu.cn). Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved.
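The AUC criterion itself can be sketched in a few lines. The version below uses the pairwise (Mann-Whitney) formulation: the fraction of positive/negative sequence pairs in which the positive sequence receives the higher motif score, with ties counted as one half. The scores are made-up numbers and the function name is hypothetical; the coordinate-wise optimization of CDAUC is not shown.

```python
def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical motif scores for bound (positive) and unbound (negative) sequences.
positives = [0.9, 0.8, 0.4]
negatives = [0.5, 0.3]
print(auc(positives, negatives))
```

Because this quantity depends only on the ordering of scores, it stays unchanged while a single motif parameter varies until some pair of scores swaps order, which is consistent with the piece-wise constant behaviour described in the abstract.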
Mapping and analysis of Caenorhabditis elegans transcription factor sequence specificities

PubMed Central

Narasimhan, Kamesh; Lambert, Samuel A; Yang, Ally WH; Riddell, Jeremy; Mnaimneh, Sanie; Zheng, Hong; Albu, Mihai; Najafabadi, Hamed S; Reece-Hoyes, John S; Fuxman Bass, Juan I; Walhout, Albertha JM; Weirauch, Matthew T; Hughes, Timothy R

2015-01-01

Caenorhabditis elegans is a powerful model for studying gene regulation, as it has a compact genome and a wealth of genomic tools. However, identification of regulatory elements has been limited, as DNA-binding motifs are known for only 71 of the estimated 763 sequence-specific transcription factors (TFs). To address this problem, we performed protein binding microarray experiments on representatives of canonical TF families in C. elegans, obtaining motifs for 129 TFs. Additionally, we predict motifs for many TFs that have DNA-binding domains similar to those already characterized, increasing coverage of binding specificities to 292 C. elegans TFs (∼40%). These data highlight the diversification of binding motifs for the nuclear hormone receptor and C2H2 zinc finger families and reveal unexpected diversity of motifs for T-box and DM families. Motif enrichment in promoters of functionally related genes is consistent with known biology and also identifies putative regulatory roles for unstudied TFs. DOI: http://dx.doi.org/10.7554/eLife.06967.001 PMID:25905672

DoOPSearch: a web-based tool for finding and analysing common conserved motifs in the promoter regions of different chordate and plant genes

PubMed Central

Sebestyén, Endre; Nagy, Tibor; Suhai, Sándor; Barta, Endre

2009-01-01

Background: The comparative genomic analysis of a large number of orthologous promoter regions of the chordate and plant genes from the DoOP databases shows thousands of conserved motifs. Most of these motifs differ from any known transcription factor binding site (TFBS). To identify common conserved motifs, we need a specific tool to be able to search amongst them. Since conserved motifs from the DoOP databases are linked to genes, the result of such a search can give a list of genes that are potentially regulated by the same transcription factor(s). Results: We have developed a new tool called DoOPSearch for the analysis of the conserved motifs in the promoter regions of chordate or plant genes. We used the orthologous promoters of the DoOP database to extract thousands of conserved motifs from different taxonomic groups. The advantage of this approach is that different sets of conserved motifs might be found depending on how broad the taxonomic coverage of the underlying orthologous promoter sequence collection is (consider e.g. primates vs. mammals or Brassicaceae vs. Viridiplantae). The DoOPSearch tool allows the users to search these motif collections or the promoter regions of DoOP with user supplied query sequences or any of the conserved motifs from the DoOP database. To find overrepresented gene ontologies, the gene lists obtained can be analysed further using a modified version of the GeneMerge program. Conclusion: We present here a comparative genomics based promoter analysis tool. Our system is based on a unique collection of conserved promoter motifs characteristic of different taxonomic groups. We offer both a command line and a web-based tool for searching in these motif collections using user specified queries. These can be either short promoter sequences or consensus sequences of known transcription factor binding sites. The GeneMerge analysis of the search results allows the user to identify statistically overrepresented Gene Ontology terms that might provide a clue on the function of the motifs and genes. PMID:19534755
Discovering Sequence Motifs with Arbitrary Insertions and Deletions

PubMed Central

Frith, Martin C.; Saunders, Neil F. W.; Kobe, Bostjan; Bailey, Timothy L.

2008-01-01

Biology is encoded in molecular sequences: deciphering this encoding remains a grand scientific challenge. Functional regions of DNA, RNA, and protein sequences often exhibit characteristic but subtle motifs; thus, computational discovery of motifs in sequences is a fundamental and much-studied problem. However, most current algorithms do not allow for insertions or deletions (indels) within motifs, and the few that do have other limitations. We present a method, GLAM2 (Gapped Local Alignment of Motifs), for discovering motifs allowing indels in a fully general manner, and a companion method GLAM2SCAN for searching sequence databases using such motifs. glam2 is a generalization of the gapless Gibbs sampling algorithm. It re-discovers variable-width protein motifs from the PROSITE database significantly more accurately than the alternative methods PRATT and SAM-T2K. Furthermore, it usefully refines protein motifs from the ELM database: in some cases, the refined motifs make orders of magnitude fewer overpredictions than the original ELM regular expressions. GLAM2 performs respectably on the BAliBASE multiple alignment benchmark, and may be superior to leading multiple alignment methods for “motif-like” alignments with N- and C-terminal extensions. Finally, we demonstrate the use of GLAM2 to discover protein kinase substrate motifs and a gapped DNA motif for the LIM-only transcriptional regulatory complex: using GLAM2SCAN, we identify promising targets for the latter. GLAM2 is especially promising for short protein motifs, and it should improve our ability to identify the protein cleavage sites, interaction sites, post-translational modification attachment sites, etc., that underlie much of biology. It may be equally useful for arbitrarily gapped motifs in DNA and RNA, although fewer examples of such motifs are known at present. GLAM2 is public domain software, available for download at http://bioinformatics.org.au/glam2. PMID:18437229
Identifying DNA-binding proteins using structural motifs and the electrostatic potential

PubMed Central

Shanahan, Hugh P.; Garcia, Mario A.; Jones, Susan; Thornton, Janet M.

2004-01-01

Robust methods to detect DNA-binding proteins from structures of unknown function are important for structural biology. This paper describes a method for identifying such proteins that (i) have a solvent accessible structural motif necessary for DNA-binding and (ii) have a positive electrostatic potential in the vicinity of the binding region. We focus on three structural motifs: helix–turn-helix (HTH), helix–hairpin–helix (HhH) and helix–loop–helix (HLH). We find that the combination of these variables detects 78% of proteins with an HTH motif, which is a substantial improvement over previous work based purely on structural templates and is comparable to more complex methods of identifying DNA-binding proteins. Similar true positive fractions are achieved for the HhH and HLH motifs. We see evidence of wide evolutionary diversity for DNA-binding proteins with an HTH motif, and much smaller diversity for those with an HhH or HLH motif. PMID:15356290
Identification of the sequence motif of glycoside hydrolase 13 family members

PubMed Central

Kumar, Vikash

2011-01-01

A bioinformatics analysis of the sequences of glycoside hydrolase (GH) 13 family enzymes, such as α-amylase, cyclodextrin glycosyltransferase (CGTase), branching enzyme and cyclomaltodextrinase, has been carried out using a hidden Markov model (HMM) profile in order to identify the sequence motifs that govern the reaction specificities of these enzymes. This analysis suggests the existence of such sequence motifs, with residues of these motifs constituting the −1 to +3 catalytic subsites of the enzyme. Hence, by introducing mutations in the residues of these four subsites, one can change the reaction specificities of the enzymes. In general, it has been observed that the α-amylase sequence motif has lower sequence conservation than the other motifs of the GH13 family members. PMID:21544166
Visualizing frequent patterns in large multivariate time series

NASA Astrophysics Data System (ADS)

Hao, M.; Marwah, M.; Janetzko, H.; Sharma, R.; Keim, D. A.; Dayal, U.; Patnaik, D.; Ramakrishnan, N.

2011-01-01

The detection of previously unknown, frequently occurring patterns in time series, often called motifs, has been recognized as an important task. However, it is difficult to discover and visualize these motifs as their numbers increase, especially in large multivariate time series. To find frequent motifs, we use several temporal data mining and event encoding techniques to cluster and convert a multivariate time series to a sequence of events. Then we quantify the efficiency of the discovered motifs by linking them with a performance metric. To visualize frequent patterns in a large time series with potentially hundreds of nested motifs on a single display, we introduce three novel visual analytics methods: (1) motif layout, using colored rectangles for visualizing the occurrences and hierarchical relationships of motifs in a multivariate time series, (2) motif distortion, for enlarging or shrinking motifs as appropriate for easy analysis and (3) motif merging, to combine a number of identical adjacent motif instances without cluttering the display. Analysts can interactively optimize the degree of distortion and merging to get the best possible view. A specific motif (e.g., the most efficient or least efficient motif) can be quickly detected from a large time series for further investigation. We have applied these methods to two real-world data sets: data center cooling and oil well production. The results provide important new insights into the recurring patterns.
Systematic comparison of the response properties of protein and RNA mediated gene regulatory motifs.

PubMed

Iyengar, Bharat Ravi; Pillai, Beena; Venkatesh, K V; Gadgil, Chetan J

2017-05-30

We present a framework enabling the dissection of the effects of motif structure (feedback or feedforward), the nature of the controller (RNA or protein), and the regulation mode (transcriptional, post-transcriptional or translational) on the response to a step change in the input. We have used a common model framework for gene expression where both motif structures have an activating input and repressing regulator, with the same set of parameters, to enable a comparison of the responses. We studied the global sensitivity of the system properties, such as steady-state gain, overshoot, peak time, and peak duration, to parameters. We find that, in all motifs, overshoot correlated negatively whereas peak duration varied concavely with peak time. Differences in the other system properties were found to be mainly dependent on the nature of the controller rather than the motif structure. Protein mediated motifs showed a higher degree of adaptation i.e. a tendency to return to baseline levels; in particular, feedforward motifs exhibited perfect adaptation. RNA mediated motifs had a mild regulatory effect; they also exhibited a lower peaking tendency and mean overshoot. Protein mediated feedforward motifs showed higher overshoot and lower peak time compared to the corresponding feedback motifs.
The effect of orthology and coregulation on detecting regulatory motifs.

PubMed

Storms, Valerie; Claeys, Marleen; Sanchez, Aminael; De Moor, Bart; Verstuyf, Annemieke; Marchal, Kathleen

2010-02-03

Computational de novo discovery of transcription factor binding sites is still a challenging problem. The growing number of sequenced genomes allows integrating orthology evidence with coregulation information when searching for motifs. Moreover, the more advanced motif detection algorithms explicitly model the phylogenetic relatedness between the orthologous input sequences and thus should be well adapted towards using orthologous information. In this study, we evaluated the conditions under which complementing coregulation with orthologous information improves motif detection for the class of probabilistic motif detection algorithms with an explicit evolutionary model. We designed datasets (real and synthetic) covering different degrees of coregulation and orthologous information to test how well Phylogibbs and Phylogenetic sampler, as representatives of the motif detection algorithms with evolutionary model performed as compared to MEME, a more classical motif detection algorithm that treats orthologs independently. Under certain conditions detecting motifs in the combined coregulation-orthology space is indeed more efficient than using each space separately, but this is not always the case. Moreover, the difference in success rate between the advanced algorithms and MEME is still marginal. The success rate of motif detection depends on the complex interplay between the added information and the specificities of the applied algorithms. Insights in this relation provide information useful to both developers and users. All benchmark datasets are available at http://homes.esat.kuleuven.be/~kmarchal/Supplementary_Storms_Valerie_PlosONE.
The Effect of Orthology and Coregulation on Detecting Regulatory Motifs

PubMed Central

Storms, Valerie; Claeys, Marleen; Sanchez, Aminael; De Moor, Bart; Verstuyf, Annemieke; Marchal, Kathleen

2010-01-01

Background Computational de novo discovery of transcription factor binding sites is still a challenging problem. The growing number of sequenced genomes allows integrating orthology evidence with coregulation information when searching for motifs. Moreover, the more advanced motif detection algorithms explicitly model the phylogenetic relatedness between the orthologous input sequences and thus should be well adapted towards using orthologous information. In this study, we evaluated the conditions under which complementing coregulation with orthologous information improves motif detection for the class of probabilistic motif detection algorithms with an explicit evolutionary model. Methodology We designed datasets (real and synthetic) covering different degrees of coregulation and orthologous information to test how well Phylogibbs and Phylogenetic sampler, as representatives of the motif detection algorithms with evolutionary model performed as compared to MEME, a more classical motif detection algorithm that treats orthologs independently. Results and Conclusions Under certain conditions detecting motifs in the combined coregulation-orthology space is indeed more efficient than using each space separately, but this is not always the case. Moreover, the difference in success rate between the advanced algorithms and MEME is still marginal. The success rate of motif detection depends on the complex interplay between the added information and the specificities of the applied algorithms. Insights in this relation provide information useful to both developers and users. All benchmark datasets are available at http://homes.esat.kuleuven.be/~kmarchal/Supplementary_Storms_Valerie_PlosONE. PMID:20140085
Statistical tests to compare motif count exceptionalities

PubMed Central

Robin, Stéphane; Schbath, Sophie; Vandewalle, Vincent

2007-01-01

Background Finding over- or under-represented motifs in biological sequences is now a common task in genomics. Thanks to p-value calculation for motif counts, exceptional motifs are identified and represent candidate functional motifs. The present work addresses the related question of comparing the exceptionality of one motif in two different sequences. Simply comparing the motif count p-values in each sequence is not sufficient to decide whether this motif is significantly more exceptional in one sequence than in the other. A statistical test is required. Results We develop and analyze two statistical tests, an exact binomial one and an asymptotic likelihood ratio test, to decide whether the exceptionality of a given motif is equivalent or significantly different in two sequences of interest. For that purpose, motif occurrences are modeled by Poisson processes, with special care for overlapping motifs. Both tests can take the sequence compositions into account. As an illustration, we compare the octamer exceptionalities in the Escherichia coli K-12 backbone versus variable strain-specific loops. Conclusion The exact binomial test is particularly suited to small counts. For large counts, we advise using the likelihood ratio test, which is asymptotic but strongly correlated with the exact binomial test and very simple to use. PMID:17346349
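The idea of an exact binomial comparison of two motif counts can be illustrated with a minimal sketch. It assumes the textbook conditional test for two Poisson counts: given the total count, the count in the first sequence is binomial with success probability e1 / (e1 + e2), where e1 and e2 are the expected counts under each sequence's background model. The function names and inputs are hypothetical, and the overlap correction mentioned in the abstract is not modeled, so this is not necessarily the authors' exact formulation.

```python
from math import comb


def binomial_tail_ge(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def compare_exceptionality(n1, n2, e1, e2):
    """One-sided p-value that the motif is more exceptional in sequence 1.

    n1, n2: observed motif counts in the two sequences.
    e1, e2: expected counts under each background model (assumed inputs).
    Under the null hypothesis of equal exceptionality, and conditional on
    n1 + n2, the count n1 is Binomial(n1 + n2, e1 / (e1 + e2)).
    """
    total = n1 + n2
    p_null = e1 / (e1 + e2)
    return binomial_tail_ge(n1, total, p_null)


if __name__ == "__main__":
    # Same expected counts, but 8 occurrences versus 2.
    print(compare_exceptionality(8, 2, 3.0, 3.0))
```

A small p-value suggests that the motif is over-represented in the first sequence relative to the second, beyond what the two background models explain.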
Assessment of composite motif discovery methods.

PubMed

Klepper, Kjetil; Sandve, Geir K; Abul, Osman; Johansen, Jostein; Drablos, Finn

2008-02-26

Computational discovery of regulatory elements is an important area of bioinformatics research and more than a hundred motif discovery methods have been published. Traditionally, most of these methods have addressed the problem of single motif discovery - discovering binding motifs for individual transcription factors. In higher organisms, however, transcription factors usually act in combination with nearby bound factors to induce specific regulatory behaviours. Hence, recent focus has shifted from single motifs to the discovery of sets of motifs bound by multiple cooperating transcription factors, so called composite motifs or cis-regulatory modules. Given the large number and diversity of methods available, independent assessment of methods becomes important. Although there have been several benchmark studies of single motif discovery, no similar studies have previously been conducted concerning composite motif discovery. We have developed a benchmarking framework for composite motif discovery and used it to evaluate the performance of eight published module discovery tools. Benchmark datasets were constructed based on real genomic sequences containing experimentally verified regulatory modules, and the module discovery programs were asked to predict both the locations of these modules and to specify the single motifs involved. To aid the programs in their search, we provided position weight matrices corresponding to the binding motifs of the transcription factors involved. In addition, selections of decoy matrices were mixed with the genuine matrices on one dataset to test the response of programs to varying levels of noise. Although some of the methods tested tended to score somewhat better than others overall, there were still large variations between individual datasets and no single method performed consistently better than the rest in all situations. 
The variation in performance on individual datasets also shows that the new benchmark datasets represent a suitable variety of challenges to most methods for module discovery.
Automatic annotation of protein motif function with Gene Ontology terms.

PubMed

Lu, Xinghua; Zhai, Chengxiang; Gopalakrishnan, Vanathi; Buchanan, Bruce G

2004-09-02

Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole genome level. However, a much needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary and these annotations serve well as a knowledge base. This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification and information retrieval problem, where PROSITE motifs are used as samples for model training and functional prediction. The mutual information of a motif and a GO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria. In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical results demonstrated that the methods have great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.
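The mutual-information feature described above can be sketched from a 2x2 contingency table of sequences. The example uses the standard definition of mutual information between two binary events (motif matched, GO term assigned). The function name and the count layout are assumptions made for illustration, and the paper's exact estimator may differ.

```python
from math import log2


def mutual_information(n11, n10, n01, n00):
    """Mutual information (in bits) between motif presence and GO term assignment.

    n11: sequences matching the motif and annotated with the term
    n10: sequences matching the motif without the term
    n01: sequences with the term but not matching the motif
    n00: sequences with neither
    """
    total = n11 + n10 + n01 + n00
    joint = {(1, 1): n11, (1, 0): n10, (0, 1): n01, (0, 0): n00}
    p_motif = {1: (n11 + n10) / total, 0: (n01 + n00) / total}
    p_term = {1: (n11 + n01) / total, 0: (n10 + n00) / total}
    mi = 0.0
    for (m, t), count in joint.items():
        if count == 0:
            continue  # a zero cell contributes nothing to the sum
        p_joint = count / total
        mi += p_joint * log2(p_joint / (p_motif[m] * p_term[t]))
    return mi


if __name__ == "__main__":
    print(mutual_information(40, 10, 5, 945))
```

A value near zero indicates that motif matches and term annotations are nearly independent, while larger values indicate a stronger association that a classifier could use as a feature.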
Brickworx builds recurrent RNA and DNA structural motifs into medium- and low-resolution electron-density maps

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chojnowski, Grzegorz; Waleń, Tomasz

2015-03-01

A computer program that builds crystal structure models of nucleic acid molecules is presented. Brickworx is a computer program that builds crystal structure models of nucleic acid molecules using recurrent motifs including double-stranded helices. In a first step, the program searches for electron-density peaks that may correspond to phosphate groups; it may also take into account phosphate-group positions provided by the user. Subsequently, comparing the three-dimensional patterns of the P atoms with a database of nucleic acid fragments, it finds the matching positions of the double-stranded helical motifs (A-RNA or B-DNA) in the unit cell. If the target structure is RNA, the helical fragments are further extended with recurrent RNA motifs from a fragment library that contains single-stranded segments. Finally, the matched motifs are merged and refined in real space to find the most likely conformations, including a fit of the sequence to the electron-density map. The Brickworx program is available for download and as a web server at http://iimcb.genesilico.pl/brickworx.
Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs.

PubMed

Regad, Leslie; Martin, Juliette; Camproux, Anne-Claude

2011-06-20

One of the strategies for protein function annotation is to search for particular structural motifs that are known to be shared by proteins with a given function. Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs such as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.
Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs

PubMed Central

2011-01-01

Background One of the strategies for protein function annotation is to search for particular structural motifs that are known to be shared by proteins with a given function. Results Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs such as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Conclusions Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins. PMID:21689388
An novel frequent probability pattern mining algorithm based on circuit simulation method in uncertain biological networks.

PubMed

He, Jieyue; Wang, Chunyan; Qiu, Kunpu; Zhong, Wei

2014-01-01

Motif mining has always been a hot research topic in bioinformatics. Most current research on biological networks focuses on exact motif mining. However, because of inevitable experimental error and noisy data, biological network data represented as a probability model could better reflect authenticity and biological significance; it is therefore more biologically meaningful to discover probability motifs in uncertain biological networks. One of the key steps in probability motif mining is frequent pattern discovery, which is usually based on the possible world model and has a relatively high computational complexity. In this paper, we present a novel method for detecting frequent probability patterns based on circuit simulation in uncertain biological networks. First, partition-based efficient search is applied to non-tree-like subgraph mining, where the probability of occurrence in random networks is small. Then, a probability isomorphism algorithm based on circuit simulation is proposed. It combines the analysis of circuit topology with the related physical properties of voltage in order to evaluate the probability isomorphism between probability subgraphs, and it avoids using the traditional possible world model. Finally, based on the probability subgraph isomorphism algorithm, a two-step hierarchical clustering method is used to cluster subgraphs and discover frequent probability patterns from the clusters. The experimental results on data sets of the Protein-Protein Interaction (PPI) networks and the transcriptional regulatory networks of E. coli and S. cerevisiae show that the proposed method can efficiently discover the frequent probability subgraphs. The discovered subgraphs in our study contain all probability motifs reported in the experiments published in other related papers. The probability graph isomorphism evaluation algorithm based on circuit simulation excludes most subgraphs that are not probability-isomorphic and reduces the search space of probability-isomorphic subgraphs using the mismatch values in the node voltage set. It is an innovative way to find frequent probability patterns and can be efficiently applied to probability motif discovery problems in further studies.
An novel frequent probability pattern mining algorithm based on circuit simulation method in uncertain biological networks

PubMed Central

2014-01-01

Background Motif mining has always been a hot research topic in bioinformatics. Most current research on biological networks focuses on exact motif mining. However, because of inevitable experimental error and noisy data, biological network data represented as a probability model could better reflect authenticity and biological significance; it is therefore more biologically meaningful to discover probability motifs in uncertain biological networks. One of the key steps in probability motif mining is frequent pattern discovery, which is usually based on the possible world model and has a relatively high computational complexity. Methods In this paper, we present a novel method for detecting frequent probability patterns based on circuit simulation in uncertain biological networks. First, partition-based efficient search is applied to non-tree-like subgraph mining, where the probability of occurrence in random networks is small. Then, a probability isomorphism algorithm based on circuit simulation is proposed. It combines the analysis of circuit topology with the related physical properties of voltage in order to evaluate the probability isomorphism between probability subgraphs, and it avoids using the traditional possible world model. Finally, based on the probability subgraph isomorphism algorithm, a two-step hierarchical clustering method is used to cluster subgraphs and discover frequent probability patterns from the clusters. Results The experimental results on data sets of the Protein-Protein Interaction (PPI) networks and the transcriptional regulatory networks of E. coli and S. cerevisiae show that the proposed method can efficiently discover the frequent probability subgraphs. The discovered subgraphs in our study contain all probability motifs reported in the experiments published in other related papers. Conclusions The probability graph isomorphism evaluation algorithm based on circuit simulation excludes most subgraphs that are not probability-isomorphic and reduces the search space of probability-isomorphic subgraphs using the mismatch values in the node voltage set. It is an innovative way to find frequent probability patterns and can be efficiently applied to probability motif discovery problems in further studies. PMID:25350277
Unitary circular code motifs in genomes of eukaryotes.

PubMed

El Soufi, Karim; Michel, Christian J

A set X of 20 trinucleotides was identified in genes of bacteria, eukaryotes, plasmids and viruses, which has on average the highest occurrence in reading frame compared to its two shifted frames (Michel, 2015; Arquès and Michel, 1996). This set X has an interesting mathematical property as X is a circular code (Arquès and Michel, 1996). Thus, the motifs from this circular code X, called X motifs, have the property to always retrieve, synchronize and maintain the reading frame in genes. The origin of this circular code X in genes is an open problem since its discovery in 1996. Here, we first show that the unitary circular codes (UCC), i.e. sets of one word, allow to generate unitary circular code motifs (UCC motifs), i.e. a concatenation of the same motif (simple repeats) leading to low complexity DNA. Three classes of UCC motifs are studied here: repeated dinucleotides (D+ motifs), repeated trinucleotides (T+ motifs) and repeated tetranucleotides (T+ motifs). Thus, the D+, T+ and T+ motifs allow to retrieve, synchronize and maintain a frame modulo 2, modulo 3 and modulo 4, respectively, and their shifted frames (1 modulo 2; 1 and 2 modulo 3; 1, 2 and 3 modulo 4 according to the C2, C3 and C4 properties, respectively) in the DNA sequences. The statistical distribution of the D+, T+ and T+ motifs is analyzed in the genomes of eukaryotes. A UCC motif and its complementary UCC motif have the same distribution in the eukaryotic genomes. Furthermore, a UCC motif and its complementary UCC motif have increasing occurrences contrary to their number of hydrogen bonds, very significant with the T+ motifs. The longest D+, T+ and T+ motifs in the studied eukaryotic genomes are also given. Surprisingly, a scarcity of repeated trinucleotides (T+ motifs) in the large eukaryotic genomes is observed compared to the D+ and T+ motifs. This result has been investigated and may be explained by two outcomes. Repeated trinucleotides (T+ motifs) are identified in the X motifs of low composition (cardinality less than 10) in the genomes of eukaryotes. Furthermore, identical trinucleotide pairs of the circular code X are preferentially used in the gene sequences of eukaryotes. These two results suggest that the unitary circular codes of trinucleotides may have been involved in the formation of the trinucleotide circular code X. Indeed, repeated trinucleotides in the X motifs in the genomes of eukaryotes may represent an intermediary evolution from repeated trinucleotides of cardinality 1 (T+ motifs) in the genomes of eukaryotes up to the X motifs of cardinality 20 in the gene sequences of eukaryotes. Copyright © 2017 Elsevier B.V. All rights reserved.
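The search for the longest repeated dinucleotide, trinucleotide or tetranucleotide run in a sequence can be sketched with a regular expression. This is an illustration only, not the authors' pipeline. The function names are hypothetical, and the filter that discards units which are themselves repetitions of a shorter word (such as AA or ACAC) is a working assumption about which one-word sets qualify as unitary circular codes.

```python
import re


def is_primitive(word):
    """True if word is not a repetition of a shorter word ('AC' yes, 'AA' or 'ACAC' no)."""
    n = len(word)
    return not any(n % d == 0 and word[:d] * (n // d) == word for d in range(1, n))


def longest_repeat(seq, unit_len, min_copies=2):
    """Return (unit, copies, start) for the longest tandem run of a primitive unit.

    Scans every start position with a lookahead so overlapping runs are seen.
    Returns None when no run with at least min_copies copies exists.
    """
    pattern = re.compile(r"(?=(([ACGT]{%d})\2{%d,}))" % (unit_len, min_copies - 1))
    best = None
    for match in pattern.finditer(seq):
        unit = match.group(2)
        if not is_primitive(unit):
            continue  # skip homopolymer-like units such as 'AA' (assumption)
        copies = len(match.group(1)) // unit_len
        if best is None or copies > best[1]:
            best = (unit, copies, match.start())
    return best


if __name__ == "__main__":
    dna = "GGACACACACTTCAGCAGCAGAAAA"
    print(longest_repeat(dna, 2))
    print(longest_repeat(dna, 3))
```

On genome-scale input a single linear scan per unit length would be preferable to a lookahead at every position; the regular expression is used here only to keep the sketch short.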
Interaction of Tsg101 with Marburg Virus VP40 Depends on the PPPY Motif, but Not the PT/SAP Motif as in the Case of Ebola Virus, and Tsg101 Plays a Critical Role in the Budding of Marburg Virus-Like Particles Induced by VP40, NP, and GP

PubMed Central

Urata, Shuzo; Noda, Takeshi; Kawaoka, Yoshihiro; Morikawa, Shigeru; Yokosawa, Hideyoshi; Yasuda, Jiro

2007-01-01

Marburg virus (MARV) VP40 is a matrix protein that can be released from mammalian cells in the form of virus-like particles (VLPs) and contains the PPPY sequence, which is an L-domain motif. Here, we demonstrate that the PPPY motif is important for VP40-induced VLP budding and that VLP production is significantly enhanced by coexpression of NP and GP. We show that Tsg101 interacts with VP40 depending on the presence of the PPPY motif, but not the PT/SAP motif as in the case of Ebola virus, and plays an important role in VLP budding. These findings provide new insights into the mechanism of MARV budding. PMID:17301151
Symmetry compression method for discovering network motifs.

PubMed

Wang, Jianxin; Huang, Yuannan; Wu, Fang-Xiang; Pan, Yi

2012-01-01

Discovering network motifs could provide a significant insight into systems biology. Interestingly, many biological networks have been found to have a high degree of symmetry (automorphism), which is inherent in biological network topologies. The symmetry due to the large number of basic symmetric subgraphs (BSSs) causes a certain redundant calculation in discovering network motifs. Therefore, we compress all basic symmetric subgraphs before extracting compressed subgraphs and propose an efficient decompression algorithm to decompress all compressed subgraphs without loss of any information. In contrast to previous approaches, the novel Symmetry Compression method for Motif Detection, named as SCMD, eliminates most redundant calculations caused by widespread symmetry of biological networks. We use SCMD to improve three notable exact algorithms and two efficient sampling algorithms. Results of all exact algorithms with SCMD are the same as those of the original algorithms, since SCMD is a lossless method. The sampling results show that the use of SCMD almost does not affect the quality of sampling results. For highly symmetric networks, we find that SCMD used in both exact and sampling algorithms can help get a remarkable speedup. Furthermore, SCMD enables us to find larger motifs in biological networks with notable symmetry than previously possible.
Motif discovery with data mining in 3D protein structure databases: discovery, validation and prediction of the U-shape zinc binding ("Huf-Zinc") motif.

PubMed

Maurer-Stroh, Sebastian; Gao, He; Han, Hao; Baeten, Lies; Schymkowitz, Joost; Rousseau, Frederic; Zhang, Louxin; Eisenhaber, Frank

2013-02-01

Data mining in protein databases, derivatives from more fundamental protein 3D structure and sequence databases, has considerable unearthed potential for the discovery of sequence motif--structural motif--function relationships as the finding of the U-shape (Huf-Zinc) motif, originally a small student's project, exemplifies. The metal ion zinc is critically involved in universal biological processes, ranging from protein-DNA complexes and transcription regulation to enzymatic catalysis and metabolic pathways. Proteins have evolved a series of motifs to specifically recognize and bind zinc ions. Many of these, so called zinc fingers, are structurally independent globular domains with discontinuous binding motifs made up of residues mostly far apart in sequence. Through a systematic approach starting from the BRIX structure fragment database, we discovered that there exists another predictable subset of zinc-binding motifs that not only have a conserved continuous sequence pattern but also share a characteristic local conformation, despite being included in totally different overall folds. While this does not allow general prediction of all Zn binding motifs, a HMM-based web server, Huf-Zinc, is available for prediction of these novel, as well as conventional, zinc finger motifs in protein sequences. The Huf-Zinc webserver can be freely accessed through this URL (http://mendel.bii.a-star.edu.sg/METHODS/hufzinc/).

ssHMM: extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data

PubMed Central

Krestel, Ralf; Ohler, Uwe; Vingron, Martin; Marsico, Annalisa

2017-01-01

RNA-binding proteins (RBPs) play an important role in RNA post-transcriptional regulation and recognize target RNAs via sequence-structure motifs. The extent to which RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. Existing RNA motif finders either take the structure of the RNA only partially into account, or employ models which are not directly interpretable as sequence-structure motifs. We developed ssHMM, an RNA motif finder based on a hidden Markov model (HMM) and Gibbs sampling which fully captures the relationship between RNA sequence and secondary structure preference of a given RBP. Compared to previous methods which output separate logos for sequence and structure, it directly produces a combined sequence-structure motif when trained on a large set of sequences. ssHMM’s model is visualized intuitively as a graph and facilitates biological interpretation. ssHMM can be used to find novel bona fide sequence-structure motifs of uncharacterized RBPs, such as the one presented here for the YY1 protein. ssHMM reaches a high motif recovery rate on synthetic data, it recovers known RBP motifs from CLIP-Seq data, and scales linearly on the input size, being considerably faster than MEMERIS and RNAcontext on large datasets while being on par with GraphProt. It is freely available on Github and as a Docker image. PMID:28977546
Dynamic motifs in socio-economic networks

NASA Astrophysics Data System (ADS)

Zhang, Xin; Shao, Shuai; Stanley, H. Eugene; Havlin, Shlomo

2014-12-01

Socio-economic networks are of central importance in economic life. We develop a method of identifying and studying motifs in socio-economic networks by focusing on “dynamic motifs,” i.e., evolutionary connection patterns that, because of “node acquaintances” in the network, occur much more frequently than random patterns. We examine two evolving bi-partite networks: i) the world-wide commercial ship chartering market and ii) the ship build-to-order market. We find similar dynamic motifs in both bipartite networks, even though they describe different economic activities. We also find that “influence” and “persistence” are strong factors in the interaction behavior of organizations. When two companies are doing business with the same customer, it is highly probable that another customer who currently only has business relationship with one of these two companies, will become customer of the second in the future. This is the effect of influence. Persistence means that companies with close business ties to customers tend to maintain their relationships over a long period of time.
iFORM: Incorporating Find Occurrence of Regulatory Motifs.

PubMed

Ren, Chao; Chen, Hebing; Yang, Bite; Liu, Feng; Ouyang, Zhangyi; Bo, Xiaochen; Shu, Wenjie

2016-01-01

Accurately identifying the binding sites of transcription factors (TFs) is crucial to understanding the mechanisms of transcriptional regulation and human disease. We present incorporating Find Occurrence of Regulatory Motifs (iFORM), an easy-to-use and efficient tool for scanning DNA sequences with TF motifs described as position weight matrices (PWMs). Both performance assessment with a receiver operating characteristic (ROC) curve and a correlation-based approach demonstrated that iFORM achieves higher accuracy and sensitivity by integrating five classical motif discovery programs using Fisher's combined probability test. We have used iFORM to provide accurate results on a variety of data in the ENCODE Project and the NIH Roadmap Epigenomics Project, and the tool has demonstrated its utility in further elucidating individual roles of functional elements. Both the source and binary codes for iFORM can be freely accessed at https://github.com/wenjiegroup/iFORM. The identified TF binding sites across human cell and tissue types using iFORM have been deposited in the Gene Expression Omnibus under the accession ID GSE53962.
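The abstract above says iFORM integrates five motif programs using Fisher's combined probability test. The following is a minimal sketch of that test in general, not iFORM's own code: the function name `fisher_combined_pvalue` is hypothetical, and the p-values are assumed to be independent. The statistic X = -2 Σ ln p_i is compared against a chi-square distribution with 2k degrees of freedom, whose tail probability has a closed form when the degrees of freedom are even.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine k independent p-values with Fisher's method.

    Hypothetical helper for illustration; not taken from iFORM.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # Chi-square survival function for 2k degrees of freedom:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


if __name__ == "__main__":
    # Five made-up p-values, one per scanning program.
    print(fisher_combined_pvalue([0.04, 0.10, 0.02, 0.30, 0.07]))
```

With a single p-value the function returns that p-value unchanged, which is a convenient sanity check on the closed-form tail.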
Space-related pharma-motifs for fast search of protein binding motifs and polypharmacological targets

PubMed Central

2012-01-01

Background To discover a compound inhibiting multiple proteins (i.e. polypharmacological targets) is a new paradigm for the complex diseases (e.g. cancers and diabetes). In general, the polypharmacological proteins often share similar local binding environments and motifs. As the exponential growth of the number of protein structures, to find the similar structural binding motifs (pharma-motifs) is an emergency task for drug discovery (e.g. side effects and new uses for old drugs) and protein functions. Results We have developed a Space-Related Pharmamotifs (called SRPmotif) method to recognize the binding motifs by searching against protein structure database. SRPmotif is able to recognize conserved binding environments containing spatially discontinuous pharma-motifs which are often short conserved peptides with specific physico-chemical properties for protein functions. Among 356 pharma-motifs, 56.5% interacting residues are highly conserved. Experimental results indicate that 81.1% and 92.7% polypharmacological targets of each protein-ligand complex are annotated with same biological process (BP) and molecular function (MF) terms, respectively, based on Gene Ontology (GO). Our experimental results show that the identified pharma-motifs often consist of key residues in functional (active) sites and play the key roles for protein functions. The SRPmotif is available at http://gemdock.life.nctu.edu.tw/SRP/. Conclusions SRPmotif is able to identify similar pharma-interfaces and pharma-motifs sharing similar binding environments for polypharmacological targets by rapidly searching against the protein structure database. Pharma-motifs describe the conservations of binding environments for drug discovery and protein functions. Additionally, these pharma-motifs provide the clues for discovering new sequence-based motifs to predict protein functions from protein sequence databases. We believe that SRPmotif is useful for elucidating protein functions and drug discovery. 
PMID:23281852
Space-related pharma-motifs for fast search of protein binding motifs and polypharmacological targets.

PubMed

Chiu, Yi-Yuan; Lin, Chun-Yu; Lin, Chih-Ta; Hsu, Kai-Cheng; Chang, Li-Zen; Yang, Jinn-Moon

2012-01-01

To discover a compound inhibiting multiple proteins (i.e. polypharmacological targets) is a new paradigm for the complex diseases (e.g. cancers and diabetes). In general, the polypharmacological proteins often share similar local binding environments and motifs. As the exponential growth of the number of protein structures, to find the similar structural binding motifs (pharma-motifs) is an emergency task for drug discovery (e.g. side effects and new uses for old drugs) and protein functions. We have developed a Space-Related Pharmamotifs (called SRPmotif) method to recognize the binding motifs by searching against protein structure database. SRPmotif is able to recognize conserved binding environments containing spatially discontinuous pharma-motifs which are often short conserved peptides with specific physico-chemical properties for protein functions. Among 356 pharma-motifs, 56.5% interacting residues are highly conserved. Experimental results indicate that 81.1% and 92.7% polypharmacological targets of each protein-ligand complex are annotated with same biological process (BP) and molecular function (MF) terms, respectively, based on Gene Ontology (GO). Our experimental results show that the identified pharma-motifs often consist of key residues in functional (active) sites and play the key roles for protein functions. The SRPmotif is available at http://gemdock.life.nctu.edu.tw/SRP/. SRPmotif is able to identify similar pharma-interfaces and pharma-motifs sharing similar binding environments for polypharmacological targets by rapidly searching against the protein structure database. Pharma-motifs describe the conservations of binding environments for drug discovery and protein functions. Additionally, these pharma-motifs provide the clues for discovering new sequence-based motifs to predict protein functions from protein sequence databases. We believe that SRPmotif is useful for elucidating protein functions and drug discovery.
An experimental test of a fundamental food web motif.

PubMed

Rip, Jason M K; McCann, Kevin S; Lynn, Denis H; Fawcett, Sonia

2010-06-07

Large-scale changes to the world's ecosystem are resulting in the deterioration of biostructure-the complex web of species interactions that make up ecological communities. A difficult, yet crucial task is to identify food web structures, or food web motifs, that are the building blocks of this baroque network of interactions. Once identified, these food web motifs can then be examined through experiments and theory to provide mechanistic explanations for how structure governs ecosystem stability. Here, we synthesize recent ecological research to show that generalist consumers coupling resources with different interaction strengths, is one such motif. This motif amazingly occurs across an enormous range of spatial scales, and so acts to distribute coupled weak and strong interactions throughout food webs. We then perform an experiment that illustrates the importance of this motif to ecological stability. We find that weak interactions coupled to strong interactions by generalist consumers dampen strong interaction strengths and increase community stability. This study takes a critical step by isolating a common food web motif and through clear, experimental manipulation, identifies the fundamental stabilizing consequences of this structure for ecological communities.
Oscillatory Protein Expression Dynamics Endows Stem Cells with Robust Differentiation Potential

PubMed Central

Kaneko, Kunihiko

2011-01-01

The lack of understanding of stem cell differentiation and proliferation is a fundamental problem in developmental biology. Although gene regulatory networks (GRNs) for stem cell differentiation have been partially identified, the nature of differentiation dynamics and their regulation leading to robust development remain unclear. Herein, using a dynamical system modeling cell approach, we performed simulations of the developmental process using all possible GRNs with a few genes, and screened GRNs that could generate cell type diversity through cell-cell interactions. We found that model stem cells that both proliferated and differentiated always exhibited oscillatory expression dynamics, and the differentiation frequency of such stem cells was regulated, resulting in a robust number distribution. Moreover, we uncovered the common regulatory motifs for stem cell differentiation, in which a combination of regulatory motifs that generated oscillatory expression dynamics and stabilized distinct cellular states played an essential role. These findings may explain the recently observed heterogeneity and dynamic equilibrium in cellular states of stem cells, and can be used to predict regulatory networks responsible for differentiation in stem cell systems. PMID:22073296
Dynamics of brain activity underlying working memory for music in a naturalistic condition.

PubMed

Burunat, Iballa; Alluri, Vinoo; Toiviainen, Petri; Numminen, Jussi; Brattico, Elvira

2014-08-01

We aimed at determining the functional neuroanatomy of working memory (WM) recognition of musical motifs that occurs while listening to music by adopting a non-standard procedure. Western tonal music provides naturally occurring repetition and variation of motifs. These serve as WM triggers, thus allowing us to study the phenomenon of motif tracking within real music. Adopting a modern tango as stimulus, a behavioural test helped to identify the stimulus motifs and build a time-course regressor of WM neural responses. This regressor was then correlated with the participants' (musicians') functional magnetic resonance imaging (fMRI) signal obtained during a continuous listening condition. In order to fine-tune the identification of WM processes in the brain, the variance accounted for by the sensory processing of a set of the stimulus' acoustic features was pruned from participants' neurovascular responses to music. Motivic repetitions activated prefrontal and motor cortical areas, basal ganglia, medial temporal lobe (MTL) structures, and cerebellum. The findings suggest that WM processing of motifs while listening to music emerges from the integration of neural activity distributed over cognitive, motor and limbic subsystems. The recruitment of the hippocampus stands as a novel finding in auditory WM. Effective connectivity and agglomerative hierarchical clustering analyses indicate that the hippocampal connectivity is modulated by motif repetitions, showing strong connections with WM-relevant areas (dorsolateral prefrontal cortex - dlPFC, supplementary motor area - SMA, and cerebellum), which supports the role of the hippocampus in the encoding of the musical motifs in WM, and may evidence long-term memory (LTM) formation, enabled by the use of a realistic listening condition. Copyright © 2014 Elsevier Ltd. All rights reserved.
Parallel arrangements of positive feedback loops limit cell-to-cell variability in differentiation.

PubMed

Dey, Anupam; Barik, Debashis

2017-01-01

Cellular differentiations are often regulated by bistable switches resulting from specific arrangements of multiple positive feedback loops (PFL) fused to one another. Although bistability generates digital responses at the cellular level, stochasticity in chemical reactions causes population heterogeneity in terms of its differentiated states. We hypothesized that the specific arrangements of PFLs may have evolved to minimize the cellular heterogeneity in differentiation. In order to test this we investigated variability in cellular differentiation controlled either by parallel or serial arrangements of multiple PFLs having similar average properties under extrinsic and intrinsic noises. We find that motifs with PFLs fused in parallel to one another around a central regulator are less susceptible to noise as compared to the motifs with PFLs arranged serially. Our calculations suggest that the increased resistance to noise in parallel motifs originate from the less sensitivity of bifurcation points to the extrinsic noise. Whereas estimation of mean residence times indicate that stable branches of bifurcations are robust to intrinsic noise in parallel motifs as compared to serial motifs. Model conclusions are consistent both in AND- and OR-gate input signal configurations and also with two different modeling strategies. Our investigations provide some insight into recent findings that differentiation of preadipocyte to mature adipocyte is controlled by network of parallel PFLs.
SARNAclust: Semi-automatic detection of RNA protein binding motifs from immunoprecipitation data

PubMed Central

Dotu, Ivan; Adamson, Scott I.; Coleman, Benjamin; Fournier, Cyril; Ricart-Altimiras, Emma; Eyras, Eduardo

2018-01-01

RNA-protein binding is critical to gene regulation, controlling fundamental processes including splicing, translation, localization and stability, and aberrant RNA-protein interactions are known to play a role in a wide variety of diseases. However, molecular understanding of RNA-protein interactions remains limited; in particular, identification of RNA motifs that bind proteins has long been challenging, especially when such motifs depend on both sequence and structure. Moreover, although RNA binding proteins (RBPs) often contain more than one binding domain, algorithms capable of identifying more than one binding motif simultaneously have not been developed. In this paper we present a novel pipeline to determine binding peaks in crosslinking immunoprecipitation (CLIP) data, to discover multiple possible RNA sequence/structure motifs among them, and to experimentally validate such motifs. At the core is a new semi-automatic algorithm SARNAclust, the first unsupervised method to identify and deconvolve multiple sequence/structure motifs simultaneously. SARNAclust computes similarity between sequence/structure objects using a graph kernel, providing the ability to isolate the impact of specific features through the bulge graph formalism. Application of SARNAclust to synthetic data shows its capability of clustering 5 motifs at once with a V-measure value of over 0.95, while GraphClust achieves only a V-measure of 0.083 and RNAcontext cannot detect any of the motifs. When applied to existing eCLIP sets, SARNAclust finds known motifs for SLBP and HNRNPC and novel motifs for several other RBPs such as AGGF1, AKAP8L and ILF3. We demonstrate an experimental validation protocol, a targeted Bind-n-Seq-like high-throughput sequencing approach that relies on RNA inverse folding for oligo pool design, that can validate the components within the SLBP motif. Finally, we use this protocol to experimentally interrogate the SARNAclust motif predictions for protein ILF3. 
Our results support a newly identified partially double-stranded UUUUUGAGA motif similar to that known for the splicing factor HNRNPC. PMID:29596423
Advantages and disadvantages in usage of bioinformatic programs in promoter region analysis

NASA Astrophysics Data System (ADS)

Pawełkowicz, Magdalena E.; Skarzyńska, Agnieszka; Posyniak, Kacper; Ziąbska, Karolina; Pląder, Wojciech; Przybecki, Zbigniew

2015-09-01

An important computational challenge is finding the regulatory elements across the promoter region. In this work we present the advantages and disadvantages of applying different bioinformatics programs to localize transcription factor binding sites in the upstream regions of genes connected with sex determination in cucumber. We use PlantCARE, PlantPAN and SignalScan to find motifs in the promoter regions. The results are compared and the possible functions of selected motifs are described.
Argo_CUDA: Exhaustive GPU based approach for motif discovery in large DNA datasets.

PubMed

Vishnevsky, Oleg V; Bocharnikov, Andrey V; Kolchanov, Nikolay A

2018-02-01

The development of chromatin immunoprecipitation sequencing (ChIP-seq) technology has revolutionized the genetic analysis of the basic mechanisms underlying transcription regulation and led to accumulation of information about a huge amount of DNA sequences. There are a lot of web services which are currently available for de novo motif discovery in datasets containing information about DNA/protein binding. An enormous motif diversity makes their finding challenging. In order to avoid the difficulties, researchers use different stochastic approaches. Unfortunately, the efficiency of the motif discovery programs dramatically declines with the query set size increase. This leads to the fact that only a fraction of top "peak" ChIP-Seq segments can be analyzed or the area of analysis should be narrowed. Thus, the motif discovery in massive datasets remains a challenging issue. Argo_Compute Unified Device Architecture (CUDA) web service is designed to process the massive DNA data. It is a program for the detection of degenerate oligonucleotide motifs of fixed length written in 15-letter IUPAC code. Argo_CUDA is a full-exhaustive approach based on the high-performance GPU technologies. Compared with the existing motif discovery web services, Argo_CUDA shows good prediction quality on simulated sets. The analysis of ChIP-Seq sequences revealed the motifs which correspond to known transcription factor binding sites.
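To make the "degenerate oligonucleotide motifs ... written in 15-letter IUPAC code" concrete, here is a small CPU-only sketch of matching such a motif against a DNA sequence. It is not the Argo_CUDA algorithm (which searches exhaustively on a GPU); the function name `find_iupac_motif` and the example motif are assumptions chosen for illustration. The table holds the standard 15 IUPAC nucleotide codes.

```python
# Standard IUPAC nucleotide codes: 4 bases plus 11 ambiguity symbols.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_iupac_motif(motif, sequence):
    """Return 0-based start positions where the degenerate motif matches.

    Hypothetical helper for illustration; it scans one strand only.
    """
    motif = motif.upper()
    sequence = sequence.upper()
    width = len(motif)
    if width == 0:
        return []
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Every sequence base must belong to the set its motif code allows.
        if all(base in IUPAC[code] for code, base in zip(motif, window)):
            hits.append(start)
    return hits


if __name__ == "__main__":
    # "S" stands for C or G, so both TGACTCA and TGAGTCA match.
    print(find_iupac_motif("TGASTCA", "AATGACTCATTTGAGTCAGG"))  # [2, 11]
```

A fixed-length motif over 15 symbols gives 15^l candidate patterns for length l, which is why an exhaustive search of this space benefits from GPU parallelism.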
Convolutional neural network architectures for predicting DNA–protein binding

PubMed Central

Zeng, Haoyang; Edwards, Matthew D.; Liu, Ge; Gifford, David K.

2016-01-01

Motivation: Convolutional neural networks (CNN) have outperformed conventional methods in modeling the sequence specificity of DNA–protein binding. Yet inappropriate CNN architectures can yield poorer performance than simpler models. Thus an in-depth understanding of how to match CNN architecture to a given task is needed to fully harness the power of CNNs for computational biology applications. Results: We present a systematic exploration of CNN architectures for predicting DNA sequence binding using a large compendium of transcription factor datasets. We identify the best-performing architectures by varying CNN width, depth and pooling designs. We find that adding convolutional kernels to a network is important for motif-based tasks. We show the benefits of CNNs in learning rich higher-order sequence features, such as secondary motifs and local sequence context, by comparing network performance on multiple modeling tasks ranging in difficulty. We also demonstrate how careful construction of sequence benchmark datasets, using approaches that control potentially confounding effects like positional or motif strength bias, is critical in making fair comparisons between competing methods. We explore how to establish the sufficiency of training data for these learning tasks, and we have created a flexible cloud-based framework that permits the rapid exploration of alternative neural network architectures for problems in computational biology. Availability and Implementation: All the models analyzed are available at http://cnn.csail.mit.edu. Contact: gifford@mit.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307608
Predictive regulatory models in Drosophila melanogaster by integrative inference of transcriptional networks

PubMed Central

Marbach, Daniel; Roy, Sushmita; Ay, Ferhat; Meyer, Patrick E.; Candeias, Rogerio; Kahveci, Tamer; Bristow, Christopher A.; Kellis, Manolis

2012-01-01

Gaining insights on gene regulation from large-scale functional data sets is a grand challenge in systems biology. In this article, we develop and apply methods for transcriptional regulatory network inference from diverse functional genomics data sets and demonstrate their value for gene function and gene expression prediction. We formulate the network inference problem in a machine-learning framework and use both supervised and unsupervised methods to predict regulatory edges by integrating transcription factor (TF) binding, evolutionarily conserved sequence motifs, gene expression, and chromatin modification data sets as input features. Applying these methods to Drosophila melanogaster, we predict ∼300,000 regulatory edges in a network of ∼600 TFs and 12,000 target genes. We validate our predictions using known regulatory interactions, gene functional annotations, tissue-specific expression, protein–protein interactions, and three-dimensional maps of chromosome conformation. We use the inferred network to identify putative functions for hundreds of previously uncharacterized genes, including many in nervous system development, which are independently confirmed based on their tissue-specific expression patterns. Last, we use the regulatory network to predict target gene expression levels as a function of TF expression, and find significantly higher predictive power for integrative networks than for motif or ChIP-based networks. Our work reveals the complementarity between physical evidence of regulatory interactions (TF binding, motif conservation) and functional evidence (coordinated expression or chromatin patterns) and demonstrates the power of data integration for network inference and studies of gene regulation at the systems level. PMID:22456606
GPUmotif: An Ultra-Fast and Energy-Efficient Motif Analysis Program Using Graphics Processing Units

PubMed Central

Zandevakili, Pooya; Hu, Ming; Qin, Zhaohui

2012-01-01

Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a “fragmentation” technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/ PMID:22662128
GPUmotif: an ultra-fast and energy-efficient motif analysis program using graphics processing units.

PubMed

Zandevakili, Pooya; Hu, Ming; Qin, Zhaohui

2012-01-01

Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a "fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/
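The cost the abstract attributes to calculating matching scores "position-by-position" can be seen in a plain sequential motif scan. The sketch below is a generic log-odds position weight matrix scan, not the HMS or GPUmotif implementation; `pwm_log_odds`, `scan`, the uniform background, the pseudocount and the toy counts are all assumptions. Each window is scored independently of the others, which is the property that makes this kind of scan a natural fit for GPU parallelism.

```python
import math


def pwm_log_odds(counts, background=0.25, pseudocount=1.0):
    """Turn per-position base counts into a log-odds position weight matrix."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in "ACGT"
        })
    return pwm


def scan(sequence, pwm):
    """Score every window of an upper-case ACGT sequence; return (start, score)."""
    width = len(pwm)
    scores = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(pwm, window))
        scores.append((start, score))
    return scores


if __name__ == "__main__":
    # Toy counts for a 3-bp motif that is almost always "TGA".
    pwm = pwm_log_odds([{"T": 10}, {"G": 10}, {"A": 10}])
    best = max(scan("CCTGACC", pwm), key=lambda pair: pair[1])
    print(best[0])  # start of the best-scoring window
```

The scan performs one table lookup per motif column per window, so the work grows with sequence length times motif width.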
Binding Modes of Teixobactin to Lipid II: Molecular Dynamics Study.

PubMed

Liu, Yang; Liu, Yaxin; Chan-Park, Mary B; Mu, Yuguang

2017-12-08

Teixobactin (TXB) is a newly discovered antibiotic targeting the bacterial cell wall precursor Lipid II (L II). In the present work, four binding modes of TXB on L II were identified by a contact-map based clustering method. The highly flexible binary complex ensemble was generated by parallel tempering metadynamics simulation in a well-tempered ensemble (PTMetaD-WTE). In agreement with experimental findings, the pyrophosphate group and the attached first sugar subunit of L II are found to be the minimal motif for stable TXB binding. Three of the four binding modes involve the ring structure of TXB and have relatively higher binding affinities, indicating the importance of the ring motif of TXB in L II recognition. TXB-L II complexes with a ratio of 2:1 are also predicted with configurations such that the ring motif of two TXB molecules bound to the pyrophosphate-MurNAc moiety and the glutamic acid residue of one L II, respectively. Our findings disclose that the ring motif of TXB is critical to L II binding and novel antibiotics can be designed based on its mimetics.
CoSMoS: Conserved Sequence Motif Search in the proteome

PubMed Central

Liu, Xiao I; Korde, Neeraj; Jakob, Ursula; Leichert, Lars I

2006-01-01

Background With the ever-increasing number of gene sequences in the public databases, generating and analyzing multiple sequence alignments becomes increasingly time consuming. Nevertheless it is a task performed on a regular basis by researchers in many labs. Results We have now created a database called CoSMoS to find the occurrences and at the same time evaluate the significance of sequence motifs and amino acids encoded in the whole genome of the model organism Escherichia coli K12. We provide a precomputed set of multiple sequence alignments for each individual E. coli protein with all of its homologues in the RefSeq database. The alignments themselves, information about the occurrence of sequence motifs together with information on the conservation of each of the more than 1.3 million amino acids encoded in the E. coli genome can be accessed via the web interface of CoSMoS. Conclusion CoSMoS is a valuable tool to identify highly conserved sequence motifs, to find regions suitable for mutational studies in functional analyses and to predict important structural features in E. coli proteins. PMID:16433915
Identification of sequence motifs significantly associated with antisense activity.

PubMed

McQuisten, Kyle A; Peek, Andrew S

2007-06-07

Predicting the suppression activity of antisense oligonucleotide sequences is the main goal of the rational design of nucleic acids. To create an effective predictive model, it is important to know what properties of an oligonucleotide sequence associate significantly with antisense activity. Also, for the model to be efficient we must know what properties do not associate significantly and can be omitted from the model. This paper will discuss the results of a randomization procedure to find motifs that associate significantly with either high or low antisense suppression activity, analysis of their properties, as well as the results of support vector machine modelling using these significant motifs as features. We discovered 155 motifs that associate significantly with high antisense suppression activity and 202 motifs that associate significantly with low suppression activity. The motifs range in length from 2 to 5 bases, contain several motifs that have been previously discovered as associating highly with antisense activity, and have thermodynamic properties consistent with previous work associating thermodynamic properties of sequences with their antisense activity. Statistical analysis revealed no correlation between a motif's position within an antisense sequence and that sequence's antisense activity. Also, many significant motifs existed as subwords of other significant motifs. Support vector regression experiments indicated that the feature set of significant motifs increased correlation compared to all possible motifs as well as several subsets of the significant motifs.
The thermodynamic properties of the significantly associated motifs support existing data correlating the thermodynamic properties of the antisense oligonucleotide with antisense efficiency, reinforcing our hypothesis that antisense suppression is strongly associated with probe/target thermodynamics, as there are no enzymatic mediators to speed the process along like the RNA Induced Silencing Complex (RISC) in RNAi. The independence of motif position and antisense activity also allows us to bypass consideration of this feature in the modelling process, promoting model efficiency and reducing the chance of overfitting when predicting antisense activity. The increase in SVR correlation with significant features compared to nearest-neighbour features indicates that thermodynamics alone is likely not the only factor in determining antisense efficiency.
Transcription factor ThWRKY4 binds to a novel WLS motif and a RAV1A element in addition to the W-box to regulate gene expression.

PubMed

Xu, Hongyun; Shi, Xinxin; Wang, Zhibo; Gao, Caiqiu; Wang, Chao; Wang, Yucheng

2017-08-01

WRKY transcription factors play important roles in many biological processes, and mainly bind to the W-box element to regulate gene expression. Previously, we characterized a WRKY gene from Tamarix hispida, ThWRKY4, in response to abiotic stress, and showed that it bound to the W-box motif. However, whether ThWRKY4 could bind to other motifs remains unknown. In this study, we employed a Transcription Factor-Centered Yeast one Hybrid (TF-Centered Y1H) screen to study the motifs recognized by ThWRKY4. In addition to the W-box core cis-element (termed W-box), we identified that ThWRKY4 could bind to two other motifs: the RAV1A element (CAACA) and a novel motif with sequence of GTCTA (W-box like sequence, WLS). The distributions of these motifs were screened in the promoter regions of genes regulated by some WRKYs. The results showed that the W-box, RAV1A, and WLS motifs were all present in high numbers, suggesting that they play key roles in gene expression mediated by WRKYs. Furthermore, five WRKY proteins from different WRKY subfamilies in Arabidopsis thaliana were selected and confirmed to bind to the RAV1A and WLS motifs, indicating that they are recognized commonly by WRKYs. These findings will help to further reveal the functions of WRKY proteins. Copyright © 2017 Elsevier B.V. All rights reserved.
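The promoter screen described above amounts to counting motif occurrences in upstream sequences. A minimal sketch follows, using only the two motif sequences the abstract spells out (RAV1A, CAACA, and WLS, GTCTA); the W-box is left out because its sequence is not given here. The function names, the choice to count both strands, and the promoter string are assumptions made for illustration, not data or methods from the study.

```python
# Motif sequences as stated in the abstract.
MOTIFS = {"RAV1A": "CAACA", "WLS": "GTCTA"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_on_strand(seq, motif):
    """Count possibly overlapping occurrences of motif in seq."""
    return sum(
        1 for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)
    )


def count_motifs(promoter, motifs=MOTIFS):
    """Count each motif on both strands of a promoter sequence."""
    promoter = promoter.upper()
    counts = {}
    for name, motif in motifs.items():
        n = count_on_strand(promoter, motif)
        rc = reverse_complement(motif)
        if rc != motif:  # do not double-count palindromic motifs
            n += count_on_strand(promoter, rc)
        counts[name] = n
    return counts


if __name__ == "__main__":
    # Made-up promoter fragment, not a real gene.
    print(count_motifs("ACAACAGGTCTATTTGTTGAGTAGACC"))
```

Counting the reverse complement of the motif on the given strand is equivalent to searching the opposite strand, so the promoter itself never needs to be reversed.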

Crystal Structure Predictions Using Adaptive Genetic Algorithm and Motif Search methods

NASA Astrophysics Data System (ADS)

Ho, K. M.; Wang, C. Z.; Zhao, X.; Wu, S.; Lyu, X.; Zhu, Z.; Nguyen, M. C.; Umemoto, K.; Wentzcovitch, R. M. M.

2017-12-01

Material informatics is a new initiative which has attracted a lot of attention in recent scientific research. The basic strategy is to construct comprehensive data sets and use machine learning to solve a wide variety of problems in material design and discovery. In pursuit of this goal, a key element is the quality and completeness of the databases used. Recent advance in the development of crystal structure prediction algorithms has made it a complementary and more efficient approach to explore the structure/phase space in materials using computers. In this talk, we discuss the importance of the structural motifs and motif-networks in crystal structure predictions. Correspondingly, powerful methods are developed to improve the sampling of the low-energy structure landscape.
Nanotube Interactions with Nanoparticles and Peptides

DTIC Science & Technology

2008-01-01

combinatorial phage display technique. We find a tryptophan rich binding motif to nanotubes on solid silicon substrates. The motif resembles an alpha helix...
T:G mismatch-specific thymine-DNA glycosylase (TDG) as a coregulator of transcription interacts with SRC1 family members through a novel tyrosine repeat motif

PubMed Central

Lucey, Marie J.; Chen, Dongsheng; Lopez-Garcia, Jorge; Hart, Stephen M.; Phoenix, Fladia; Al-Jehani, Rajai; Alao, John P.; White, Roger; Kindle, Karin B.; Losson, Régine; Chambon, Pierre; Parker, Malcolm G.; Schär, Primo; Heery, David M.; Buluwela, Lakjaya; Ali, Simak

2005-01-01

Gene activation involves protein complexes with diverse enzymatic activities, some of which are involved in chromatin modification. We have shown previously that the base excision repair enzyme thymine DNA glycosylase (TDG) acts as a potent coactivator for estrogen receptor-α. To further understand how TDG acts in this context, we studied its interaction with known coactivators of nuclear receptors. We find that TDG interacts in vitro and in vivo with the p160 coactivator SRC1, with the interaction being mediated by a previously undescribed motif encoding four equally spaced tyrosine residues in TDG, each tyrosine being separated by three amino acids. This is found to interact with two motifs in SRC1 also containing tyrosine residues separated by three amino acids. Site-directed mutagenesis shows that the tyrosines encoded in these motifs are critical for the interaction. The related p160 protein TIF2 does not interact with TDG and has the altered sequence, F-X-X-X-Y, at the equivalent positions relative to SRC1. Substitution of the phenylalanines to tyrosines is sufficient to bring about interaction of TIF2 with TDG. These findings highlight a new protein–protein interaction motif based on Y-X-X-X-Y and provide new insight into the interaction of diverse proteins in coactivator complexes. PMID:16282588
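The Y-X-X-X-Y pattern described in this abstract (tyrosines separated by three arbitrary residues) can be scanned for with a regular expression. The following is a minimal sketch, not the authors' method: the function name `find_spaced_motif` and the toy sequence are hypothetical, and the sequence is not a real TDG, SRC1, or TIF2 fragment. A lookahead is used so that overlapping hits, such as the repeated tyrosines in a run of equally spaced residues, are all reported.

```python
import re


def find_spaced_motif(seq, first="Y", last="Y", gap=3):
    """Return (0-based start, matched text) for every first-X{gap}-last hit.

    A zero-width lookahead lets overlapping occurrences be reported,
    e.g. consecutive hits inside Y-X-X-X-Y-X-X-X-Y.
    """
    pattern = re.compile(rf"(?=({first}[A-Z]{{{gap}}}{last}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


# Hypothetical toy sequence for illustration only.
toy = "MKYAAAYGGGYLLLYQFSTAY"

tyrosine_hits = find_spaced_motif(toy)                 # Y-X-X-X-Y
altered_hits = find_spaced_motif(toy, first="F")       # F-X-X-X-Y variant
print(tyrosine_hits)
print(altered_hits)
```

On the toy sequence, the first call reports three overlapping Y-X-X-X-Y hits and the second reports the single F-X-X-X-Y occurrence, mirroring the distinction the abstract draws between the SRC1-type and TIF2-type sequences.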
Temporal motifs reveal collaboration patterns in online task-oriented networks

NASA Astrophysics Data System (ADS)

Xuan, Qi; Fang, Huiting; Fu, Chenbo; Filkov, Vladimir

2015-05-01

Real networks feature layers of interactions and complexity. In them, different types of nodes can interact with each other via a variety of events. Examples of this complexity are task-oriented social networks (TOSNs), where teams of people share tasks towards creating a quality artifact, such as academic research papers or software development in commercial or open source environments. Accomplishing those tasks involves both work, e.g., writing the papers or code, and communication, to discuss and coordinate. Taking into account the different types of activities and how they alternate over time can result in much more precise understanding of the TOSNs behaviors and outcomes. That calls for modeling techniques that can accommodate both node and link heterogeneity as well as temporal change. In this paper, we report on methodology for finding temporal motifs in TOSNs, limited to a system of two people and an artifact. We apply the methods to publicly available data of TOSNs from 31 Open Source Software projects. We find that these temporal motifs are enriched in the observed data. When applied to software development outcome, temporal motifs reveal a distinct dependency between collaboration and communication in the code writing process. Moreover, we show that models based on temporal motifs can be used to more precisely relate both individual developer centrality and team cohesion to programmer productivity than models based on aggregated TOSNs.
Temporal motifs reveal collaboration patterns in online task-oriented networks.

PubMed

Xuan, Qi; Fang, Huiting; Fu, Chenbo; Filkov, Vladimir

2015-05-01

Real networks feature layers of interactions and complexity. In them, different types of nodes can interact with each other via a variety of events. Examples of this complexity are task-oriented social networks (TOSNs), where teams of people share tasks towards creating a quality artifact, such as academic research papers or software development in commercial or open source environments. Accomplishing those tasks involves both work, e.g., writing the papers or code, and communication, to discuss and coordinate. Taking into account the different types of activities and how they alternate over time can result in much more precise understanding of the TOSNs behaviors and outcomes. That calls for modeling techniques that can accommodate both node and link heterogeneity as well as temporal change. In this paper, we report on methodology for finding temporal motifs in TOSNs, limited to a system of two people and an artifact. We apply the methods to publicly available data of TOSNs from 31 Open Source Software projects. We find that these temporal motifs are enriched in the observed data. When applied to software development outcome, temporal motifs reveal a distinct dependency between collaboration and communication in the code writing process. Moreover, we show that models based on temporal motifs can be used to more precisely relate both individual developer centrality and team cohesion to programmer productivity than models based on aggregated TOSNs.
SamSelect: a sample sequence selection algorithm for quorum planted motif search on large DNA datasets.

PubMed

Yu, Qiang; Wei, Dingbang; Huo, Hongwei

2018-06-18

Given a set of t n-length DNA sequences, q satisfying 0 < q ≤ 1, and l and d satisfying 0 ≤ d < l < n, the quorum planted motif search (qPMS) finds l-length strings that occur in at least qt input sequences with up to d mismatches and is mainly used to locate transcription factor binding sites in DNA sequences. Existing qPMS algorithms have been able to efficiently process small standard datasets (e.g., t = 20 and n = 600), but they are too time consuming to process large DNA datasets, such as ChIP-seq datasets that contain thousands of sequences or more. We analyze the effects of t and q on the time performance of qPMS algorithms and find that a large t or a small q causes a longer computation time. Based on this information, we improve the time performance of existing qPMS algorithms by selecting a sample sequence set D' with a small t and a large q from the large input dataset D and then executing qPMS algorithms on D'. A sample sequence selection algorithm named SamSelect is proposed. The experimental results on both simulated and real data show (1) that SamSelect can select D' efficiently and (2) that the qPMS algorithms executed on D' can find implanted or real motifs in a significantly shorter time than when executed on D. We improve the ability of existing qPMS algorithms to process large DNA datasets from the perspective of selecting high-quality sample sequence sets so that the qPMS algorithms can find motifs in a short time in the selected sample sequence set D', rather than take an unfeasibly long time to search the original sequence set D. Our motif discovery method is an approximate algorithm.
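The qPMS definition above can be made concrete with an exhaustive search over all 4^l candidate strings. This is only an illustrative sketch of the problem statement, assuming the DNA alphabet ACGT; it is not SamSelect and not any of the existing qPMS algorithms the abstract refers to, and the function names are hypothetical. Its cost grows as 4^l, so it is usable only for very small l.

```python
from itertools import product
from math import ceil


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs(motif, seq, d):
    """True if some l-length window of seq is within d mismatches of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def qpms_bruteforce(seqs, l, d, q):
    """Return every l-mer occurring (up to d mismatches) in at least q*t sequences."""
    need = ceil(q * len(seqs))          # quorum: at least qt of the t sequences
    found = []
    for cand in product("ACGT", repeat=l):
        motif = "".join(cand)
        if sum(occurs(motif, s, d) for s in seqs) >= need:
            found.append(motif)
    return found


# Toy input with ACGT planted in every sequence (hypothetical data).
seqs = ["TTACGTAA", "GGACGTCC", "ACGTTTTT"]
exact = qpms_bruteforce(seqs, l=4, d=0, q=1.0)
relaxed = qpms_bruteforce(seqs, l=4, d=1, q=1.0)
print(exact, len(relaxed))
```

Raising d or lowering q enlarges the result set, which gives some intuition for why the abstract reports that parameter choices strongly affect running time.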
Prediction of virus-host protein-protein interactions mediated by short linear motifs.

PubMed

Becerra, Andrés; Bucheli, Victor A; Moreno, Pedro A

2017-03-09

Short linear motifs in host-organism proteins can be mimicked by viruses to create protein-protein interactions that disable or control metabolic pathways. Given that viral linear motif instances of host motif regular expressions can be found by chance, it is necessary to develop methods for filtering functional linear motifs. We conduct a systematic comparison of linear motif filtering methods to develop a computational approach for predicting motif-mediated protein-protein interactions between human and the human immunodeficiency virus 1 (HIV-1). We implemented three filtering methods to obtain linear motif sets: 1) conserved in viral proteins (C), 2) located in disordered regions (D) and 3) rare or scarce in a set of randomized viral sequences (R). The sets C, D, and R are united and intersected. The resulting sets are compared by the number of experimentally validated protein-protein interactions correctly inferred with them. The comparison is done with HIV-1 sequences and interactions from the National Institute of Allergy and Infectious Diseases (NIAID). The number of correctly inferred interactions allows us to rank the interactions by the sets used to deduce them: D∪R and C. The sets are ordered by descending probability of capturing functional interactions. With respect to HIV-1, the sets C∪R, D∪R, and C∪D∪R infer all known interactions between HIV-1 and human proteins mediated by linear motifs. We found that the majority of conserved linear motifs in the virus are located in disordered regions. We have developed a method for predicting protein-protein interactions mediated by linear motifs between HIV-1 and human proteins. The method uses only protein sequences as inputs. We can extend the software developed to any other eukaryotic virus and host in order to find and rank candidate interactions. In future work we will use it to explore possible viral attack mechanisms based on linear motif mimicry.
Phosphate Sink Containing Two-Component Signaling Systems as Tunable Threshold Devices

PubMed Central

Amin, Munia; Kothamachu, Varun B.; Feliu, Elisenda; Scharf, Birgit E.; Porter, Steven L.; Soyer, Orkun S.

2014-01-01

Synthetic biology aims to design de novo biological systems and reengineer existing ones. These efforts have mostly focused on transcriptional circuits, with reengineering of signaling circuits hampered by limited understanding of their systems dynamics and experimental challenges. Bacterial two-component signaling systems offer a rich diversity of sensory systems that are built around a core phosphotransfer reaction between histidine kinases and their output response regulator proteins, and thus are a good target for reengineering through synthetic biology. Here, we explore the signal-response relationship arising from a specific motif found in two-component signaling. In this motif, a single histidine kinase (HK) phosphotransfers reversibly to two separate output response regulator (RR) proteins. We show that, under the experimentally observed parameters from bacteria and yeast, this motif not only allows rapid signal termination, whereby one of the RRs acts as a phosphate sink towards the other RR (i.e. the output RR), but also implements a sigmoidal signal-response relationship. We identify two mathematical conditions on system parameters that are necessary for sigmoidal signal-response relationships and define key parameters that control threshold levels and sensitivity of the signal-response curve. We confirm these findings experimentally, by in vitro reconstitution of the one HK-two RR motif found in the Sinorhizobium meliloti chemotaxis pathway and measuring the resulting signal-response curve. We find that the level of sigmoidality in this system can be experimentally controlled by the presence of the sink RR, and also through an auxiliary protein that is shown to bind to the HK (yielding Hill coefficients of above 7). These findings show that the one HK-two RR motif allows bacteria and yeast to implement tunable switch-like signal processing and provides an ideal basis for developing threshold devices for synthetic biology applications. PMID:25357192
Multiple Binding Modes between HNF4α and the LXXLL Motifs of PGC-1α Lead to Full Activation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rha, Geun Bae; Wu, Guangteng; Shoelson, Steven E.

2010-04-15

Hepatocyte nuclear factor 4α (HNF4α) is a novel nuclear receptor that participates in a hierarchical network of transcription factors regulating the development and physiology of such vital organs as the liver, pancreas, and kidney. Among the various transcriptional coregulators with which HNF4α interacts, peroxisome proliferator-activated receptor γ (PPARγ) coactivator 1α (PGC-1α) represents a novel coactivator whose activation is unusually robust and whose binding mode appears to be distinct from that of canonical coactivators such as NCoA/SRC/p160 family members. To elucidate the potentially unique molecular mechanism of PGC-1α recruitment, we have determined the crystal structure of HNF4α in complex with a fragment of PGC-1α containing all three of its LXXLL motifs. Despite the presence of all three LXXLL motifs available for interactions, only one is bound at the canonical binding site, with no additional contacts observed between the two proteins. However, a close inspection of the electron density map indicates that the bound LXXLL motif is not a selected one but an averaged structure of more than one LXXLL motif. Further biochemical and functional studies show that the individual LXXLL motifs can bind but drive only minimal transactivation. Only when more than one LXXLL motif is involved can significant transcriptional activity be measured, and full activation requires all three LXXLL motifs. These findings led us to propose a model wherein each LXXLL motif has an additive effect, and the multiple binding modes by HNF4α toward the LXXLL motifs of PGC-1α could account for the apparent robust activation by providing a flexible mechanism for combinatorial recruitment of additional coactivators and mediators.
DNA containing CpG motifs induces angiogenesis

NASA Astrophysics Data System (ADS)

Zheng, Mei; Klinman, Dennis M.; Gierynska, Malgorzata; Rouse, Barry T.

2002-06-01

New blood vessel formation in the cornea is an essential step in the pathogenesis of a blinding immunoinflammatory reaction caused by ocular infection with herpes simplex virus (HSV). By using a murine corneal micropocket assay, we found that HSV DNA (which contains a significant excess of potentially bioactive "CpG" motifs when compared with mammalian DNA) induces angiogenesis. Moreover, synthetic oligodeoxynucleotides containing CpG motifs attract inflammatory cells and stimulate the release of vascular endothelial growth factor (VEGF), which in turn triggers new blood vessel formation. In vitro, CpG DNA induces the J774A.1 murine macrophage cell line to produce VEGF. In vivo CpG-induced angiogenesis was blocked by the administration of anti-mVEGF Ab or the inclusion of "neutralizing" oligodeoxynucleotides that specifically oppose the stimulatory activity of CpG DNA. These findings establish that DNA containing bioactive CpG motifs induces angiogenesis, and suggest that CpG motifs in HSV DNA may contribute to the blinding lesions of stromal keratitis.
Assessing local structure motifs using order parameters for motif recognition, interstitial identification, and diffusion path characterization

NASA Astrophysics Data System (ADS)

Zimmermann, Nils E. R.; Horton, Matthew K.; Jain, Anubhav; Haranczyk, Maciej

2017-11-01

Structure-property relationships form the basis of many design rules in materials science, including synthesizability and long-term stability of catalysts, control of electrical and optoelectronic behavior in semiconductors, as well as the capacity of and transport properties in cathode materials for rechargeable batteries. The immediate atomic environments (i.e., the first coordination shells) of a few atomic sites are often a key factor in achieving a desired property. Some of the most frequently encountered coordination patterns are tetrahedra, octahedra, body- and face-centered cubic as well as hexagonal close-packed-like environments. Here, we showcase the usefulness of local order parameters to identify these basic structural motifs in inorganic solid materials by developing classification criteria. We introduce a systematic testing framework, the Einstein crystal test rig, that probes the response of order parameters to distortions in perfect motifs to validate our approach. Subsequently, we highlight three important application cases. First, we map basic crystal structure information of a large materials database in an intuitive manner by screening the Materials Project (MP) database (61,422 compounds) for element-specific motif distributions. Second, we use the structure-motif recognition capabilities to automatically find interstitials in metal, semiconductor, and insulator materials. Our Interstitialcy Finding Tool (InFiT) facilitates high-throughput screenings of defect properties. Third, the order parameters are reliable and compact quantitative structure descriptors for characterizing diffusion hops of intercalants, as our example of magnesium in MnO2-spinel indicates. Finally, the tools developed in our work are readily and freely available as software implementations in the pymatgen library, and we expect them to be further applied to machine-learning approaches for emerging applications in materials science.
Assessing Local Structure Motifs Using Order Parameters for Motif Recognition, Interstitial Identification, and Diffusion Path Characterization

DOE PAGES

Zimmermann, Nils E. R.; Horton, Matthew K.; Jain, Anubhav; ...

2017-11-13

Structure–property relationships form the basis of many design rules in materials science, including synthesizability and long-term stability of catalysts, control of electrical and optoelectronic behavior in semiconductors, as well as the capacity of and transport properties in cathode materials for rechargeable batteries. The immediate atomic environments (i.e., the first coordination shells) of a few atomic sites are often a key factor in achieving a desired property. Some of the most frequently encountered coordination patterns are tetrahedra, octahedra, body- and face-centered cubic as well as hexagonal close-packed-like environments. Here, we showcase the usefulness of local order parameters to identify these basic structural motifs in inorganic solid materials by developing classification criteria. We introduce a systematic testing framework, the Einstein crystal test rig, that probes the response of order parameters to distortions in perfect motifs to validate our approach. Subsequently, we highlight three important application cases. First, we map basic crystal structure information of a large materials database in an intuitive manner by screening the Materials Project (MP) database (61,422 compounds) for element-specific motif distributions. Second, we use the structure-motif recognition capabilities to automatically find interstitials in metal, semiconductor, and insulator materials. Our Interstitialcy Finding Tool (InFiT) facilitates high-throughput screenings of defect properties. Third, the order parameters are reliable and compact quantitative structure descriptors for characterizing diffusion hops of intercalants, as our example of magnesium in MnO2-spinel indicates. Finally, the tools developed in our work are readily and freely available as software implementations in the pymatgen library, and we expect them to be further applied to machine-learning approaches for emerging applications in materials science.
Assessing Local Structure Motifs Using Order Parameters for Motif Recognition, Interstitial Identification, and Diffusion Path Characterization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zimmermann, Nils E. R.; Horton, Matthew K.; Jain, Anubhav

Structure–property relationships form the basis of many design rules in materials science, including synthesizability and long-term stability of catalysts, control of electrical and optoelectronic behavior in semiconductors, as well as the capacity of and transport properties in cathode materials for rechargeable batteries. The immediate atomic environments (i.e., the first coordination shells) of a few atomic sites are often a key factor in achieving a desired property. Some of the most frequently encountered coordination patterns are tetrahedra, octahedra, body- and face-centered cubic as well as hexagonal close-packed-like environments. Here, we showcase the usefulness of local order parameters to identify these basic structural motifs in inorganic solid materials by developing classification criteria. We introduce a systematic testing framework, the Einstein crystal test rig, that probes the response of order parameters to distortions in perfect motifs to validate our approach. Subsequently, we highlight three important application cases. First, we map basic crystal structure information of a large materials database in an intuitive manner by screening the Materials Project (MP) database (61,422 compounds) for element-specific motif distributions. Second, we use the structure-motif recognition capabilities to automatically find interstitials in metal, semiconductor, and insulator materials. Our Interstitialcy Finding Tool (InFiT) facilitates high-throughput screenings of defect properties. Third, the order parameters are reliable and compact quantitative structure descriptors for characterizing diffusion hops of intercalants, as our example of magnesium in MnO2-spinel indicates. Finally, the tools developed in our work are readily and freely available as software implementations in the pymatgen library, and we expect them to be further applied to machine-learning approaches for emerging applications in materials science.
Dynamic changes in Sox2 spatio-temporal expression promote the second cell fate decision through Fgf4/Fgfr2 signaling in preimplantation mouse embryos.

PubMed

Mistri, Tapan Kumar; Arindrarto, Wibowo; Ng, Wei Ping; Wang, Choayang; Lim, Leng Hiong; Sun, Lili; Chambers, Ian; Wohland, Thorsten; Robson, Paul

2018-03-20

Oct4 and Sox2 regulate the expression of target genes such as Nanog, Fgf4, and Utf1, by binding to their respective regulatory motifs. Their functional cooperation is reflected in their ability to heterodimerize on adjacent cis regulatory motifs, the composite Sox/Oct motif. Given that Oct4 and Sox2 regulate many developmental genes, a quantitative analysis of their synergistic action on different Sox/Oct motifs would yield valuable insights into the mechanisms of early embryonic development. In the present study, we measured binding affinities of Oct4 and Sox2 to different Sox/Oct motifs using fluorescence correlation spectroscopy. We found that the synergistic binding interaction is driven mainly by the level of Sox2 in the case of the Fgf4 Sox/Oct motif. Taking into account that Sox2 expression levels fluctuate more than those of Oct4, our finding provides an explanation of how Sox2 controls the segregation of the epiblast and primitive endoderm populations within the inner cell mass of the developing rodent blastocyst. © 2018 The Author(s). Published by Portland Press Limited on behalf of the Biochemical Society.
Cross-reactions vs co-sensitization evaluated by in silico motifs and in vitro IgE microarray testing.

PubMed

Pfiffner, P; Stadler, B M; Rasi, C; Scala, E; Mari, A

2012-02-01

Using an in silico allergen clustering method, we have recently shown that allergen extracts are highly cross-reactive. Here we used serological data from a multi-array IgE test based on recombinant or highly purified natural allergens to evaluate whether co-reactions are true cross-reactions or co-sensitizations by allergens with the same motifs. The serum database consisted of 3142 samples, each tested against 103 highly purified natural or recombinant allergens. Cross-reactivity was predicted by an iterative motif-finding algorithm through sequence motifs identified in 2708 known allergens. Allergen proteins containing the same motifs cross-reacted as predicted. However, proteins with identical motifs revealed a hierarchy in the degree of cross-reaction: the more frequently an allergen was positive in the allergic population, the less frequently it cross-reacted, and vice versa. Co-sensitization was analyzed by splitting the dataset into patient groups that were most likely sensitized through geographical occurrence of allergens. Interestingly, most co-reactions are cross-reactions but not co-sensitizations. The observed hierarchy of cross-reactivity may play an important role for the future management of allergic diseases. © 2011 John Wiley & Sons A/S.
Evolution of Drosophila ribosomal protein gene core promoters.

PubMed

Ma, Xiaotu; Zhang, Kangyu; Li, Xiaoman

2009-03-01

The coordinated expression of ribosomal protein genes (RPGs) has been well documented in many species. Previous analyses of RPG promoters focus only on Fungi and mammals. Recognizing this gap and using a comparative genomics approach, we utilize a motif-finding algorithm that incorporates cross-species conservation to identify several significant motifs in Drosophila RPG promoters. As a result, significant differences in the enriched motifs in RPG promoters are found among Drosophila, Fungi, and mammals, demonstrating the evolutionary dynamics of the ribosomal gene regulatory network. We also report a motif present in similar numbers of RPGs among Drosophila species which does not appear to be conserved at the individual RPG gene level. A module-wise stabilizing selection theory is proposed to explain this observation. Overall, our results provide significant insight into the fast-evolving nature of transcriptional regulation in the RPG module.
Evolution of Drosophila ribosomal protein gene core promoters

PubMed Central

Ma, Xiaotu; Zhang, Kangyu; Li, Xiaoman

2011-01-01

The coordinated expression of ribosomal protein genes (RPGs) has been well documented in many species. Previous analyses of RPG promoters focus only on Fungi and mammals. Recognizing this gap and using a comparative genomics approach, we utilize a motif-finding algorithm that incorporates cross-species conservation to identify several significant motifs in Drosophila RPG promoters. As a result, significant differences in the enriched motifs in RPG promoters are found among Drosophila, Fungi, and mammals, demonstrating the evolutionary dynamics of the ribosomal gene regulatory network. We also report a motif present in similar numbers of RPGs among Drosophila species which does not appear to be conserved at the individual RPG gene level. A module-wise stabilizing selection theory is proposed to explain this observation. Overall, our results provide significant insight into the fast-evolving nature of transcriptional regulation in the RPG module. PMID:19059316
Core signalling motif displaying multistability through multi-state enzymes.

PubMed

Feng, Song; Sáez, Meritxell; Wiuf, Carsten; Feliu, Elisenda; Soyer, Orkun S

2016-10-01

Bistability, and more generally multistability, is a key system dynamics feature enabling decision-making and memory in cells. Deciphering the molecular determinants of multistability is thus crucial for a better understanding of cellular pathways and their (re)engineering in synthetic biology. Here, we show that a key motif found predominantly in eukaryotic signalling systems, namely a futile signalling cycle, can display bistability when featuring a two-state kinase. We provide necessary and sufficient mathematical conditions on the kinetic parameters of this motif that guarantee the existence of multiple steady states. These conditions foster the intuition that bistability arises as a consequence of competition between the two states of the kinase. Extending from this result, we find that increasing the number of kinase states linearly translates into an increase in the number of steady states in the system. These findings reveal, to our knowledge, a new mechanism for the generation of bistability and multistability in cellular signalling systems. Further the futile cycle featuring a two-state kinase is among the smallest bistable signalling motifs. We show that multi-state kinases and the described competition-based motif are part of several natural signalling systems and thereby could enable them to implement complex information processing through multistability. These results indicate that multi-state kinases in signalling systems are readily exploited by natural evolution and could equally be used by synthetic approaches for the generation of multistable information processing systems at the cellular level. © 2016 The Authors.
Finding specific RNA motifs: Function in a zeptomole world?

PubMed Central

KNIGHT, ROB; YARUS, MICHAEL

2003-01-01

We have developed a new method for estimating the abundance of any modular (piecewise) RNA motif within a longer random region. We have used this method to estimate the size of the active motifs available to modern SELEX experiments (picomoles of unique sequences) and to a plausible RNA World (zeptomoles of unique sequences: 1 zmole = 602 sequences). Unexpectedly, activities such as specific isoleucine binding are almost certainly present in zeptomoles of molecules, and even ribozymes such as self-cleavage motifs may appear (depending on assumptions about the minimal structures). The number of specified nucleotides is not the only important determinant of a motif’s rarity: The number of modules into which it is divided, and the details of this division, are also crucial. We propose three maxims for easily isolated motifs: the Maxim of Minimization, the Maxim of Multiplicity, and the Maxim of the Median. These maxims together state that selected motifs should be small and composed of as many separate, equally sized modules as possible. For evenly divided motifs with four modules, the largest accessible activity in picomole scale (1–1000 pmole) pools of length 100 is about 34 nucleotides; while for zeptomole scale (1–1000 zmole) pools it is about 20 specific nucleotides (50% probability of occurrence). This latter figure includes some ribozymes and aptamers. Consequently, an RNA metabolism apparently could have begun with only zeptomoles of RNA molecules. PMID:12554865
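The conversion quoted in this abstract (1 zmole = 602 sequences) follows directly from Avogadro's number. A small sketch of the arithmetic, with hypothetical helper names, makes the picomole and zeptomole pool sizes easy to compare; it only restates the unit conversion and does not reproduce the authors' abundance estimates.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI definition)

# SI prefixes used in the abstract, expressed as fractions of a mole.
PREFIX = {"pico": 1e-12, "zepto": 1e-21}


def molecules(amount, prefix):
    """Number of molecules in `amount` pico- or zeptomoles."""
    return amount * PREFIX[prefix] * AVOGADRO


one_zmol = molecules(1, "zepto")     # about 602 molecules
one_pmol = molecules(1, "pico")      # about 6.02e11 molecules
print(round(one_zmol), f"{one_pmol:.3g}")
```

A picomole pool therefore holds about a billion times more molecules than a zeptomole pool, which is the gap between the SELEX and RNA World scenarios contrasted in the abstract.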
Probabilistic generation of random networks taking into account information on motifs occurrence.

PubMed

Bois, Frederic Y; Gayraud, Ghislaine

2015-01-01

Because of the huge number of graphs possible even with a small number of nodes, inference on network structure is known to be a challenging problem. Generating large random directed graphs with prescribed probabilities of occurrences of some meaningful patterns (motifs) is also difficult. We show how to generate such random graphs according to a formal probabilistic representation, using fast Markov chain Monte Carlo methods to sample them. As an illustration, we generate realistic graphs with several hundred nodes mimicking a gene transcription interaction network in Escherichia coli.

Probabilistic Generation of Random Networks Taking into Account Information on Motifs Occurrence

PubMed Central

Bois, Frederic Y.

2015-01-01

Because of the huge number of graphs possible even with a small number of nodes, inference on network structure is known to be a challenging problem. Generating large random directed graphs with prescribed probabilities of occurrences of some meaningful patterns (motifs) is also difficult. We show how to generate such random graphs according to a formal probabilistic representation, using fast Markov chain Monte Carlo methods to sample them. As an illustration, we generate realistic graphs with several hundred nodes mimicking a gene transcription interaction network in Escherichia coli. PMID:25493547
Solving Constraint-Satisfaction Problems with Distributed Neocortical-Like Neuronal Networks.

PubMed

Rutishauser, Ueli; Slotine, Jean-Jacques; Douglas, Rodney J

2018-05-01

Finding actions that satisfy the constraints imposed by both external inputs and internal representations is central to decision making. We demonstrate that some important classes of constraint satisfaction problems (CSPs) can be solved by networks composed of homogeneous cooperative-competitive modules that have connectivity similar to motifs observed in the superficial layers of neocortex. The winner-take-all modules are sparsely coupled by programming neurons that embed the constraints onto the otherwise homogeneous modular computational substrate. We show rules that embed any instance of the CSPs planar four-color graph coloring, maximum independent set, and sudoku on this substrate and provide mathematical proofs that guarantee these graph coloring problems will converge to a solution. The network is composed of nonsaturating linear threshold neurons. Their lack of right saturation allows the overall network to explore the problem space driven through the unstable dynamics generated by recurrent excitation. The direction of exploration is steered by the constraint neurons. While many problems can be solved using only linear inhibitory constraints, network performance on hard problems benefits significantly when these negative constraints are implemented by nonlinear multiplicative inhibition. Overall, our results demonstrate the importance of instability rather than stability in network computation and offer insight into the computational role of dual inhibitory mechanisms in neural circuits.
Sequential visibility-graph motifs

NASA Astrophysics Data System (ADS)

Iacovacci, Jacopo; Lacasa, Lucas

2016-04-01

Visibility algorithms transform time series into graphs and encode dynamical information in their topology, paving the way for graph-theoretical time series analysis as well as building a bridge between nonlinear dynamics and network science. In this work we introduce and study the concept of sequential visibility-graph motifs, smaller substructures of n consecutive nodes that appear with characteristic frequencies. We develop a theory to compute in an exact way the motif profiles associated with general classes of deterministic and stochastic dynamics. We find that this simple property is indeed a highly informative and computationally efficient feature capable of distinguishing among different dynamics and robust against noise contamination. We finally confirm that it can be used in practice to perform unsupervised learning, by extracting motif profiles from experimental heart-rate series and being able, accordingly, to disentangle meditative from other relaxation states. Applications of this general theory include the automatic classification and description of physical, biological, and financial time series.
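The motif-profile idea in the record above can be illustrated with a minimal sketch. It assumes the horizontal visibility criterion (two points are linked when every point between them is lower than both) and a window of n = 4 consecutive nodes; the function names `hvg_edges` and `motif_profile` are hypothetical and are not taken from the paper.

```python
from collections import Counter

def hvg_edges(window):
    """Edges (i, j) of the horizontal visibility graph of a short series.

    Assumption: horizontal visibility, i.e. i and j are linked when all
    intermediate values are strictly lower than both endpoints.
    """
    n = len(window)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            lowest_end = min(window[i], window[j])
            if all(window[k] < lowest_end for k in range(i + 1, j)):
                edges.append((i, j))
    return tuple(edges)

def motif_profile(series, n=4):
    """Relative frequency of each n-node sequential motif along the series."""
    counts = Counter(
        hvg_edges(series[start:start + n])
        for start in range(len(series) - n + 1)
    )
    total = sum(counts.values())
    return {motif: count / total for motif, count in counts.items()}
```

Because horizontal visibility between two points depends only on the values lying between them, each window can be processed on its own, which keeps the profile cheap to compute. A motif is identified here simply by its edge tuple.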
Unravelling daily human mobility motifs

PubMed Central

Schneider, Christian M.; Belik, Vitaly; Couronné, Thomas; Smoreda, Zbigniew; González, Marta C.

2013-01-01

Human mobility is differentiated by time scales. While the mechanism for long time scales has been studied, the underlying mechanism on the daily scale is still unrevealed. Here, we uncover the mechanism responsible for the daily mobility patterns by analysing the temporal and spatial trajectories of thousands of persons as individual networks. Using the concept of motifs from network theory, we find only 17 unique networks are present in daily mobility and they follow simple rules. These networks, called here motifs, are sufficient to capture up to 90 per cent of the population in surveys and mobile phone datasets for different countries. Each individual exhibits a characteristic motif, which seems to be stable over several months. Consequently, daily human mobility can be reproduced by an analytically tractable framework for Markov chains by modelling periods of high-frequency trips followed by periods of lower activity as the key ingredient. PMID:23658117
Motif enrichment tool.

PubMed

Blatti, Charles; Sinha, Saurabh

2014-07-01

The Motif Enrichment Tool (MET) provides an online interface that enables users to find major transcriptional regulators of their gene sets of interest. MET searches the appropriate regulatory region around each gene and identifies which transcription factor DNA-binding specificities (motifs) are statistically overrepresented. Motif enrichment analysis is currently available for many metazoan species including human, mouse, fruit fly, planaria and flowering plants. MET also leverages high-throughput experimental data such as ChIP-seq and DNase-seq from ENCODE and ModENCODE to identify the regulatory targets of a transcription factor with greater precision. The results from MET are produced in real time and are linked to a genome browser for easy follow-up analysis. Use of the web tool is free and open to all, and there is no login requirement. ADDRESS: http://veda.cs.uiuc.edu/MET/. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
CPI motif interaction is necessary for capping protein function in cells

PubMed Central

Edwards, Marc; McConnell, Patrick; Schafer, Dorothy A.; Cooper, John A.

2015-01-01

Capping protein (CP) has critical roles in actin assembly in vivo and in vitro. CP binds with high affinity to the barbed end of actin filaments, blocking the addition and loss of actin subunits. Heretofore, models for actin assembly in cells generally assumed that CP is constitutively active, diffusing freely to find and cap barbed ends. However, CP can be regulated by binding of the ‘capping protein interaction’ (CPI) motif, found in a diverse and otherwise unrelated set of proteins that decreases, but does not abolish, the actin-capping activity of CP and promotes uncapping in biochemical experiments. Here, we report that CP localization and the ability of CP to function in cells requires interaction with a CPI-motif-containing protein. Our discovery shows that cells target and/or modulate the capping activity of CP via CPI motif interactions in order for CP to localize and function in cells. PMID:26412145
Designing a stochastic genetic switch by coupling chaos and bistability.

PubMed

Zhao, Xiang; Ouyang, Qi; Wang, Hongli

2015-11-01

In stem cell differentiation, a pluripotent stem cell becomes progressively specialized and generates specific cell types through a series of epigenetic processes. How cells can precisely determine their fate in a fluctuating environment is a currently unsolved problem. In this paper, we suggest an abstract gene regulatory network to describe mathematically the differentiation phenomenon featuring stochasticity, divergent cell fates, and robustness. The network consists of three functional motifs: an upstream chaotic motif, a buffering motif of incoherent feed forward loop capable of generating a pulse, and a downstream motif which is bistable. The dynamic behavior is typically a transient chaos with fractal basin boundaries. The trajectories take transiently chaotic journeys before divergently settling down to the bistable states. The ratio of the probability that the high state is achieved to the probability that the low state is reached can maintain a constant in a population of cells with varied molecular fluctuations. The ratio can be turned up or down when proper parameters are adjusted. The model suggests a possible mechanism for the robustness against fluctuations that is prominently featured in pluripotent cell differentiations and developmental phenomena.
Functional Motifs Responsible for Human Metapneumovirus M2-2-mediated Innate Immune Evasion

PubMed Central

Chen, Yu; Deng, Xiaoling; Deng, Junfang; Zhou, Jiehua; Ren, Yuping; Liu, Shengxuan; Prusak, Deborah J.; Wood, Thomas G.; Bao, Xiaoyong

2016-01-01

Human metapneumovirus (hMPV) is a major cause of lower respiratory infection in young children. Repeated infections occur throughout life, but its immune evasion mechanisms are largely unknown. We recently found that hMPV M2-2 protein elicits immune evasion by targeting mitochondrial antiviral-signaling protein (MAVS), an antiviral signaling molecule. However, the molecular mechanisms underlying such inhibition are not known. Our mutagenesis studies revealed that PDZ-binding motifs, 29-DEMI-32 and 39-KEALSDGI-46, located in an immune inhibitory region of M2-2, are responsible for M2-2-mediated immune evasion. We also found that both motifs prevent TRAF5 and TRAF6, the MAVS downstream adaptors, from being recruited to MAVS, while the motif 39-KEALSDGI-46 also blocks TRAF3 from migrating to MAVS. In parallel, these TRAFs are important in activating transcription factors NF-kB and/or IRF-3 by hMPV. Our findings collectively demonstrate that M2-2 uses its PDZ motifs to launch the hMPV immune evasion through blocking the interaction of MAVS and its downstream TRAFs. PMID:27743962
Functional motifs responsible for human metapneumovirus M2-2-mediated innate immune evasion.

PubMed

Chen, Yu; Deng, Xiaoling; Deng, Junfang; Zhou, Jiehua; Ren, Yuping; Liu, Shengxuan; Prusak, Deborah J; Wood, Thomas G; Bao, Xiaoyong

2016-12-01

Human metapneumovirus (hMPV) is a major cause of lower respiratory infection in young children. Repeated infections occur throughout life, but its immune evasion mechanisms are largely unknown. We recently found that hMPV M2-2 protein elicits immune evasion by targeting mitochondrial antiviral-signaling protein (MAVS), an antiviral signaling molecule. However, the molecular mechanisms underlying such inhibition are not known. Our mutagenesis studies revealed that PDZ-binding motifs, 29-DEMI-32 and 39-KEALSDGI-46, located in an immune inhibitory region of M2-2, are responsible for M2-2-mediated immune evasion. We also found that both motifs prevent TRAF5 and TRAF6, the MAVS downstream adaptors, from being recruited to MAVS, while the motif 39-KEALSDGI-46 also blocks TRAF3 from migrating to MAVS. In parallel, these TRAFs are important in activating transcription factors NF-kB and/or IRF-3 by hMPV. Our findings collectively demonstrate that M2-2 uses its PDZ motifs to launch the hMPV immune evasion through blocking the interaction of MAVS and its downstream TRAFs. Copyright © 2016 Elsevier Inc. All rights reserved.
Improving the Accuracy and Scalability of Discriminative Learning Methods for Markov Logic Networks

DTIC Science & Technology

2011-05-01

...positive examples. Aleph allows users to customize each of these steps, and thereby supports a variety of specific algorithms. 2.3 MLNs and Alchemy An...tural motifs. By limiting the search to each unique motif, LSM is able to find good clauses in an efficient manner. Alchemy (Kok, Singla, Richardson
Missing link in the evolution of Hox clusters.

PubMed

Ogishima, Soichi; Tanaka, Hiroshi

2007-01-31

The Hox cluster has key roles in regulating the patterning of the antero-posterior axis in a metazoan embryo. It consists of the anterior, central and posterior genes; the central genes have been identified only in bilaterians, but not in cnidarians, and are responsible for achieving morphological complexity in bilaterian development. However, their evolutionary history has not been revealed, that is, there has been a "missing link". Here we show the evolutionary history of Hox clusters of 18 bilaterians and 2 cnidarians by using a new method, "motif-based reconstruction", examining the gain/loss processes of evolutionarily conserved sequences, "motifs", outside the homeodomain. We successfully identified the missing link in the evolution of Hox clusters between the cnidarian-bilaterian ancestor and the bilaterians as the ancestor of the central genes, which we call the proto-central gene. Searching for a gene corresponding to the proto-central gene, we found that one of the acoela Hox genes has the same motif repertory as that of the proto-central gene. This interesting finding suggests that the acoela Hox cluster corresponds with the missing link in the evolution of the Hox cluster between the cnidarian-bilaterian ancestor and the bilaterians. Our findings suggest that motif gains/diversifications led to the explosive diversity of the bilaterian body plan.
Identification of helix capping and β-turn motifs from NMR chemical shifts

PubMed Central

Shen, Yang; Bax, Ad

2012-01-01

We present an empirical method for identification of distinct structural motifs in proteins on the basis of experimentally determined backbone and 13Cβ chemical shifts. Elements identified include the N-terminal and C-terminal helix capping motifs and five types of β-turns: I, II, I′, II′ and VIII. Using a database of proteins of known structure, the NMR chemical shifts, together with the PDB-extracted amino acid preference of the helix capping and β-turn motifs are used as input data for training an artificial neural network algorithm, which outputs the statistical probability of finding each motif at any given position in the protein. The trained neural networks, contained in the MICS (motif identification from chemical shifts) program, also provide a confidence level for each of their predictions, and values ranging from ca 0.7–0.9 for the Matthews correlation coefficient of its predictions far exceed that attainable by sequence analysis. MICS is anticipated to be useful both in the conventional NMR structure determination process and for enhancing on-going efforts to determine protein structures solely on the basis of chemical shift information, where it can aid in identifying protein database fragments suitable for use in building such structures. PMID:22314702
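The Matthews correlation coefficient quoted in the record above is computed from the four cells of a binary confusion matrix. The sketch below shows the standard formula only; the function name is illustrative, and returning 0.0 when the denominator vanishes is a common convention assumed here, not something stated in the abstract.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1
    (perfect prediction). Returns 0.0 when any marginal sum is zero,
    which is an assumed convention for the undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

Unlike plain accuracy, this score stays informative when a motif is rare, because all four cells of the confusion matrix enter the formula.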
Ser/Thr Motifs in Transmembrane Proteins: Conservation Patterns and Effects on Local Protein Structure and Dynamics

PubMed Central

del Val, Coral; White, Stephen H.

2014-01-01

We combined systematic bioinformatics analyses and molecular dynamics simulations to assess the conservation patterns of Ser and Thr motifs in membrane proteins, and the effect of such motifs on the structure and dynamics of α-helical transmembrane (TM) segments. We find that Ser/Thr motifs are often present in β-barrel TM proteins. At least one Ser/Thr motif is present in almost half of the sequences of α-helical proteins analyzed here. The extensive bioinformatics analyses and inspection of protein structures led to the identification of molecular transporters with noticeable numbers of Ser/Thr motifs within the TM region. Given the energetic penalty for burying multiple Ser/Thr groups in the membrane hydrophobic core, the observation of transporters with multiple membrane-embedded Ser/Thr is intriguing and raises the question of how the presence of multiple Ser/Thr affects protein local structure and dynamics. Molecular dynamics simulations of four different Ser-containing model TM peptides indicate that backbone hydrogen bonding of membrane-buried Ser/Thr hydroxyl groups can significantly change the local structure and dynamics of the helix. Ser groups located close to the membrane interface can hydrogen bond to solvent water instead of protein backbone, leading to an enhanced local solvation of the peptide. PMID:22836667
Improved prediction of MHC class I and class II epitopes using a novel Gibbs sampling approach.

PubMed

Nielsen, Morten; Lundegaard, Claus; Worning, Peder; Hvid, Christina Sylvester; Lamberth, Kasper; Buus, Søren; Brunak, Søren; Lund, Ole

2004-06-12

Prediction of which peptides will bind a specific major histocompatibility complex (MHC) constitutes an important step in identifying potential T-cell epitopes suitable as vaccine candidates. MHC class II binding peptides have a broad length distribution complicating such predictions. Thus, identifying the correct alignment is a crucial part of identifying the core of an MHC class II binding motif. In this context, we wish to describe a novel Gibbs motif sampler method ideally suited for recognizing such weak sequence motifs. The method is based on the Gibbs sampling method, and it incorporates novel features optimized for the task of recognizing the binding motif of MHC classes I and II. The method locates the binding motif in a set of sequences and characterizes the motif in terms of a weight-matrix. Subsequently, the weight-matrix can be applied to identifying effectively potential MHC binding peptides and to guiding the process of rational vaccine design. We apply the motif sampler method to the complex problem of MHC class II binding. The input to the method is amino acid peptide sequences extracted from the public databases of SYFPEITHI and MHCPEP and known to bind to the MHC class II complex HLA-DR4(B1*0401). Prior identification of information-rich (anchor) positions in the binding motif is shown to improve the predictive performance of the Gibbs sampler. Similarly, a consensus solution obtained from an ensemble average over suboptimal solutions is shown to outperform the use of a single optimal solution. In a large-scale benchmark calculation, the performance is quantified using relative operating characteristics curve (ROC) plots and we make a detailed comparison of the performance with that of both the TEPITOPE method and a weight-matrix derived using the conventional alignment algorithm of ClustalW. The calculation demonstrates that the predictive performance of the Gibbs sampler is higher than that of ClustalW and in most cases also higher than that of the TEPITOPE method.
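For readers unfamiliar with Gibbs motif sampling, the following is a minimal textbook-style site sampler, not the authors' method: it omits the novel features the abstract mentions (anchor-position weighting, ensemble averaging) and assumes a uniform background. The function name, parameters and default values are all hypothetical.

```python
import random

def gibbs_motif_sampler(seqs, w, iters=200, alphabet="ACGT", pseudo=1.0, seed=0):
    """Sample one motif start per sequence for a motif of width w.

    Assumptions: every sequence is at least w long and uses only
    characters from `alphabet`; the background model is uniform.
    """
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - w + 1) for s in seqs]
    for _ in range(iters):
        for held in range(len(seqs)):
            # Column-wise counts from every sequence except the held-out one.
            counts = [{a: pseudo for a in alphabet} for _ in range(w)]
            for idx, (seq, start) in enumerate(zip(seqs, starts)):
                if idx != held:
                    for col in range(w):
                        counts[col][seq[start + col]] += 1
            norm = len(seqs) - 1 + pseudo * len(alphabet)
            # Probability of each window of the held-out sequence under the matrix.
            target = seqs[held]
            weights = []
            for start in range(len(target) - w + 1):
                prob = 1.0
                for col in range(w):
                    prob *= counts[col][target[start + col]] / norm
                weights.append(prob)
            # Resample the held-out start in proportion to those probabilities.
            starts[held] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

The returned start positions define an alignment from which a weight matrix can be built by counting residues per column. For peptide data the `alphabet` argument would be the 20 amino acid letters.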
The condition-dependent transcriptional network in Escherichia coli.

PubMed

Lemmens, Karen; De Bie, Tijl; Dhollander, Thomas; Monsieurs, Pieter; De Moor, Bart; Collado-Vides, Julio; Engelen, Kristof; Marchal, Kathleen

2009-03-01

Thanks to the availability of high-throughput omics data, bioinformatics approaches are able to hypothesize thus-far undocumented genetic interactions. However, due to the amount of noise in these data, inferences based on a single data source are often unreliable. A popular approach to overcome this problem is to integrate different data sources. In this study, we describe DISTILLER, a novel framework for data integration that simultaneously analyzes microarray and motif information to find modules that consist of genes that are co-expressed in a subset of conditions, and their corresponding regulators. By applying our method on publicly available data, we evaluated the condition-specific transcriptional network of Escherichia coli. DISTILLER confirmed 62% of 736 interactions described in RegulonDB, and 278 novel interactions were predicted.
Crystallographic and Computational Studies of a Class II MHC Complex with a Nonconforming Peptide: HLA-DRA/DRB3*0101

NASA Astrophysics Data System (ADS)

Parry, Christian S.; Gorski, Jack; Stern, Lawrence J.

2003-03-01

The stable binding of processed foreign peptide to a class II major histocompatibility (MHC) molecule and subsequent presentation to a T cell receptor is a central event in immune recognition and regulation. Polymorphic residues on the floor of the peptide binding site form pockets that anchor peptide side chains. These and other residues in the helical wall of the groove determine the specificity of each allele and define a motif. Allele specific motifs allow the prediction of epitopes from the sequence of pathogens. There are, however, known epitopes that do not satisfy these motifs: anchor motifs are not adequate for predicting epitopes as there are apparently major and minor motifs. We present crystallographic studies into the nature of the interactions that govern the binding of these so called nonconforming peptides. We would like to understand the role of the P10 pocket and find out whether the peptides that do not obey the consensus anchor motif bind in the canonical conformation observed in prior structures of class II MHC-peptide complexes. HLA-DRB3*0101 complexed with peptide crystallized in unit cell 92.10 x 92.10 x 248.30 (90, 90, 90), P41212, and the diffraction data are reliable to 2.2 Å. We are complementing our studies with dynamical long time simulations to answer these questions, particularly the interplay of the anchor motifs in peptide binding, the range of protein and ligand conformations, and water hydration structures.
An analysis of the positional distribution of DNA motifs in promoter regions and its biological relevance.

PubMed

Casimiro, Ana C; Vinga, Susana; Freitas, Ana T; Oliveira, Arlindo L

2008-02-07

Motif finding algorithms have developed in their ability to use computationally efficient methods to detect patterns in biological sequences. However the posterior classification of the output still suffers from some limitations, which makes it difficult to assess the biological significance of the motifs found. Previous work has highlighted the existence of positional bias of motifs in the DNA sequences, which might indicate not only that the pattern is important, but also provide hints of the positions where these patterns occur preferentially. We propose to integrate position uniformity tests and over-representation tests to improve the accuracy of the classification of motifs. Using artificial data, we have compared three different statistical tests (Chi-Square, Kolmogorov-Smirnov and a Chi-Square bootstrap) to assess whether a given motif occurs uniformly in the promoter region of a gene. Using the test that performed better in this dataset, we proceeded to study the positional distribution of several well known cis-regulatory elements, in the promoter sequences of different organisms (S. cerevisiae, H. sapiens, D. melanogaster, E. coli and several Dicotyledons plants). The results show that position conservation is relevant for the transcriptional machinery. We conclude that many biologically relevant motifs appear heterogeneously distributed in the promoter region of genes, and therefore, that non-uniformity is a good indicator of biological relevance and can be used to complement over-representation tests commonly used. In this article we present the results obtained for the S. cerevisiae data sets.
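A position uniformity test of the kind described above can be sketched as a chi-square goodness-of-fit statistic over equal-width bins of the promoter region. This is an illustration under stated assumptions, not the authors' procedure: the bin count, the 0-based position convention and the function name are all hypothetical, and the region length is assumed to be divisible by the number of bins so that expected counts are equal.

```python
def chi_square_uniformity(positions, region_length, bins=10):
    """Chi-square statistic for uniformity of motif start positions.

    positions: non-empty list of 0-based start positions within the region.
    Returns (statistic, degrees_of_freedom); a large statistic relative to
    the chi-square distribution with bins - 1 degrees of freedom suggests
    positional bias.
    """
    observed = [0] * bins
    for pos in positions:
        # Map the position to its bin, clamping the last position range.
        observed[min(pos * bins // region_length, bins - 1)] += 1
    expected = len(positions) / bins
    statistic = sum((obs - expected) ** 2 / expected for obs in observed)
    return statistic, bins - 1
```

Turning the statistic into a p-value requires the chi-square survival function, which is left out here to keep the sketch free of third-party dependencies.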
B Cell Receptor Activation Predominantly Regulates AKT-mTORC1/2 Substrates Functionally Related to RNA Processing

PubMed Central

Mohammad, Dara K.; Ali, Raja H.; Turunen, Janne J.; Nore, Beston F.; Smith, C. I. Edvard

2016-01-01

Protein kinase B (AKT) phosphorylates numerous substrates on the consensus motif RXRXXpS/T, a docking site for 14-3-3 interactions. To identify novel AKT-induced phosphorylation events following B cell receptor (BCR) activation, we performed proteomics, biochemical and bioinformatics analyses. Phosphorylated consensus motif-specific antibody enrichment, followed by tandem mass spectrometry, identified 446 proteins, containing 186 novel phosphorylation events. Moreover, we found 85 proteins with up regulated phosphorylation, while in 277 it was down regulated following stimulation. Up regulation was mainly in proteins involved in ribosomal and translational regulation, DNA binding and transcription regulation. Conversely, down regulation was preferentially in RNA binding, mRNA splicing and mRNP export proteins. Immunoblotting of two identified RNA regulatory proteins, RBM25 and MEF-2D, confirmed the proteomics data. Consistent with these findings, the AKT-inhibitor (MK-2206) dramatically reduced, while the mTORC-inhibitor PP242 totally blocked phosphorylation on the RXRXXpS/T motif. This demonstrates that this motif, previously suggested as an AKT target sequence, also is a substrate for mTORC1/2. Proteins with PDZ, PH and/or SH3 domains contained the consensus motif, whereas in those with an HMG-box, H15 domains and/or NF-X1-zinc-fingers, the motif was absent. Proteins carrying the consensus motif were found in all eukaryotic clades indicating that they regulate a phylogenetically conserved set of proteins. PMID:27487157
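Candidate sites for the RXRXXpS/T consensus can be located in a protein sequence with a regular expression, where X is any residue and the final position is Ser or Thr. The sketch below is illustrative only: the names are hypothetical, and a sequence scan finds candidate sites, not phosphorylation, which has to be established experimentally as in the study above.

```python
import re

# R-X-R-X-X-[S/T]; the lookahead lets overlapping candidates be reported.
AKT_CONSENSUS = re.compile(r"(?=(R.R..[ST]))")

def find_consensus_sites(protein):
    """Return (index_of_S_or_T, matched_hexamer) for each candidate site.

    Indices are 0-based positions of the Ser/Thr residue in `protein`.
    """
    return [
        (match.start() + 5, match.group(1))
        for match in AKT_CONSENSUS.finditer(protein)
    ]
```

For example, `find_consensus_sites("MARPRAASGG")` reports the serine at index 7 inside the hexamer `RPRAAS`.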
IndeCut evaluates performance of network motif discovery algorithms.

PubMed

Ansariola, Mitra; Megraw, Molly; Koslicki, David

2018-05-01

Genomic networks represent a complex map of molecular interactions which are descriptive of the biological processes occurring in living cells. Identifying the small over-represented circuitry patterns in these networks helps generate hypotheses about the functional basis of such complex processes. Network motif discovery is a systematic way of achieving this goal. However, a reliable network motif discovery outcome requires generating random background networks which are the result of a uniform and independent graph sampling method. To date, there has been no method to numerically evaluate whether any network motif discovery algorithm performs as intended on realistically sized datasets; thus it was not possible to assess the validity of resulting network motifs. In this work, we present IndeCut, the first method to date that characterizes network motif finding algorithm performance in terms of uniform sampling on realistically sized networks. We demonstrate that it is critical to use IndeCut prior to running any network motif finder for two reasons. First, IndeCut indicates the number of samples needed for a tool to produce an outcome that is both reproducible and accurate. Second, IndeCut allows users to choose the tool that generates samples in the most independent fashion for their network of interest among many available options. The open source software package is available at https://github.com/megrawlab/IndeCut. Contact: megrawm@science.oregonstate.edu or david.koslicki@math.oregonstate.edu. Supplementary data are available at Bioinformatics online.
Mechanisms of Zero-Lag Synchronization in Cortical Motifs

PubMed Central

Gollo, Leonardo L.; Mirasso, Claudio; Sporns, Olaf; Breakspear, Michael

2014-01-01

Zero-lag synchronization between distant cortical areas has been observed in a diversity of experimental data sets and between many different regions of the brain. Several computational mechanisms have been proposed to account for such isochronous synchronization in the presence of long conduction delays: Of these, the phenomenon of “dynamical relaying” – a mechanism that relies on a specific network motif – has proven to be the most robust with respect to parameter mismatch and system noise. Surprisingly, despite a contrary belief in the community, the common driving motif is an unreliable means of establishing zero-lag synchrony. Although dynamical relaying has been validated in empirical and computational studies, the deeper dynamical mechanisms and comparison to dynamics on other motifs are lacking. By systematically comparing synchronization on a variety of small motifs, we establish that the presence of a single reciprocally connected pair – a “resonance pair” – plays a crucial role in disambiguating those motifs that foster zero-lag synchrony in the presence of conduction delays (such as dynamical relaying) from those that do not (such as the common driving triad). Remarkably, minor structural changes to the common driving motif that incorporate a reciprocal pair recover robust zero-lag synchrony. The findings are observed in computational models of spiking neurons, populations of spiking neurons and neural mass models, and arise whether the oscillatory systems are periodic, chaotic, noise-free or driven by stochastic inputs. The influence of the resonance pair is also robust to parameter mismatch and asymmetrical time delays amongst the elements of the motif. We call this manner of facilitating zero-lag synchrony resonance-induced synchronization, outline the conditions for its occurrence, and propose that it may be a general mechanism to promote zero-lag synchrony in the brain. PMID:24763382

Allergen cross reactions: a problem greater than ever thought?

PubMed

Pfiffner, P; Truffer, R; Matsson, P; Rasi, C; Mari, A; Stadler, B M

2010-12-01

Cross reactions are an often observed phenomenon in patients with allergy. Sensitization against some allergens may cause reactions against other seemingly unrelated allergens. Today, cross reactions are being investigated on a per-case basis, analyzing blood serum specific IgE (sIgE) levels and clinical features of patients suffering from cross reactions. In this study, we evaluated the level of sIgE compared to patients' total IgE assuming epitope specificity is a consequence of sequence similarity. Our objective was to evaluate our recently published model of molecular sequence similarities underlying cross reactivity using serum-derived data from IgE determinations of standard laboratory tests. We calculated the probabilities of protein cross reactivity based on conserved sequence motifs and compared these in silico predictions to a database consisting of 5362 sera with sIgE determinations. Cumulating sIgE values of a patient resulted in a median of 25-30% total IgE. Comparing motif cross reactivity predictions to sIgE levels showed that on average three times fewer motifs than extracts were recognized in a given serum (correlation coefficient: 0.967). Extracts belonging to the same motif group co-reacted in a high percentage of sera (up to 80% for some motifs). Cumulated sIgE levels are exaggerated because of a high level of observed cross reactions. Thus, not only bioinformatic prediction of allergenic motifs, but also serological routine testing of allergic patients implies that the immune system may recognize only a small number of allergenic structures. © 2010 John Wiley & Sons A/S.
One motif to bind them: A small-XXX-small motif affects transmembrane domain 1 oligomerization, function, localization, and cross-talk between two yeast GPCRs.

PubMed

Lock, Antonia; Forfar, Rachel; Weston, Cathryn; Bowsher, Leo; Upton, Graham J G; Reynolds, Christopher A; Ladds, Graham; Dixon, Ann M

2014-12-01

G protein-coupled receptors (GPCRs) are the largest family of cell-surface receptors in mammals and facilitate a range of physiological responses triggered by a variety of ligands. GPCRs were thought to function as monomers, however it is now accepted that GPCR homo- and hetero-oligomers also exist and influence receptor properties. The Schizosaccharomyces pombe GPCR Mam2 is a pheromone-sensing receptor involved in mating and has previously been shown to form oligomers in vivo. The first transmembrane domain (TMD) of Mam2 contains a small-XXX-small motif, overrepresented in membrane proteins and well-known for promoting helix-helix interactions. An ortholog of Mam2 in Saccharomyces cerevisiae, Ste2, contains an analogous small-XXX-small motif which has been shown to contribute to receptor homo-oligomerization, localization and function. Here we have used experimental and computational techniques to characterize the role of the small-XXX-small motif in function and assembly of Mam2 for the first time. We find that disruption of the motif via mutagenesis leads to reduction of Mam2 TMD1 homo-oligomerization and pheromone-responsive cellular signaling of the full-length protein. It also impairs correct targeting to the plasma membrane. Mutation of the analogous motif in Ste2 yielded similar results, suggesting a conserved mechanism for assembly. Using co-expression of the two fungal receptors in conjunction with computational models, we demonstrate a functional change in G protein specificity and propose that this is brought about through hetero-dimeric interactions of Mam2 with Ste2 via the complementary small-XXX-small motifs. This highlights the potential of these motifs to affect a range of properties that can be investigated in other GPCRs. Copyright © 2014. Published by Elsevier B.V.
A Three-Dimensional RNA Motif in Potato spindle tuber viroid Mediates Trafficking from Palisade Mesophyll to Spongy Mesophyll in Nicotiana benthamiana

PubMed Central

Takeda, Ryuta; Petrov, Anton I.; Leontis, Neocles B.; Ding, Biao

2011-01-01

Cell-to-cell trafficking of RNA is an emerging biological principle that integrates systemic gene regulation, viral infection, antiviral response, and cell-to-cell communication. A key mechanistic question is how an RNA is specifically selected for trafficking from one type of cell into another type. Here, we report the identification of an RNA motif in Potato spindle tuber viroid (PSTVd) required for trafficking from palisade mesophyll to spongy mesophyll in Nicotiana benthamiana leaves. This motif, called loop 6, has the sequence 5′-CGA-3′...5′-GAC-3′ flanked on both sides by cis Watson-Crick G/C and G/U wobble base pairs. We present a three-dimensional (3D) structural model of loop 6 that specifies all non-Watson-Crick base pair interactions, derived by isostericity-based sequence comparisons with 3D RNA motifs from the RNA x-ray crystal structure database. The model is supported by available chemical modification patterns, natural sequence conservation/variations in PSTVd isolates and related species, and functional characterization of all possible mutants for each of the loop 6 base pairs. Our findings and approaches have broad implications for studying the 3D RNA structural motifs mediating trafficking of diverse RNA species across specific cellular boundaries and for studying the structure-function relationships of RNA motifs in other biological processes. PMID:21258006
A three-dimensional RNA motif in Potato spindle tuber viroid mediates trafficking from palisade mesophyll to spongy mesophyll in Nicotiana benthamiana.

PubMed

Takeda, Ryuta; Petrov, Anton I; Leontis, Neocles B; Ding, Biao

2011-01-01

Cell-to-cell trafficking of RNA is an emerging biological principle that integrates systemic gene regulation, viral infection, antiviral response, and cell-to-cell communication. A key mechanistic question is how an RNA is specifically selected for trafficking from one type of cell into another type. Here, we report the identification of an RNA motif in Potato spindle tuber viroid (PSTVd) required for trafficking from palisade mesophyll to spongy mesophyll in Nicotiana benthamiana leaves. This motif, called loop 6, has the sequence 5'-CGA-3'...5'-GAC-3' flanked on both sides by cis Watson-Crick G/C and G/U wobble base pairs. We present a three-dimensional (3D) structural model of loop 6 that specifies all non-Watson-Crick base pair interactions, derived by isostericity-based sequence comparisons with 3D RNA motifs from the RNA x-ray crystal structure database. The model is supported by available chemical modification patterns, natural sequence conservation/variations in PSTVd isolates and related species, and functional characterization of all possible mutants for each of the loop 6 base pairs. Our findings and approaches have broad implications for studying the 3D RNA structural motifs mediating trafficking of diverse RNA species across specific cellular boundaries and for studying the structure-function relationships of RNA motifs in other biological processes.
Do motifs reflect evolved function?--No convergent evolution of genetic regulatory network subgraph topologies.

PubMed

Knabe, Johannes F; Nehaniv, Chrystopher L; Schilstra, Maria J

2008-01-01

Methods that analyse the topological structure of networks have recently become quite popular. Whether motifs (subgraph patterns that occur more often than in randomized networks) have specific functions as elementary computational circuits has been cause for debate. As the question is difficult to resolve with currently available biological data, we approach the issue using networks that abstractly model natural genetic regulatory networks (GRNs) which are evolved to show dynamical behaviors. Specifically one group of networks was evolved to be capable of exhibiting two different behaviors ("differentiation") in contrast to a group with a single target behavior. In both groups we find motif distribution differences within the groups to be larger than differences between them, indicating that evolutionary niches (target functions) do not necessarily mold network structure uniquely. These results show that variability operators can have a stronger influence on network topologies than selection pressures, especially when many topologies can create similar dynamics. Moreover, analysis of motif functional relevance by lesioning did not suggest that motifs were of greater importance to the functioning of the network than arbitrary subgraph patterns. Only when drastically restricting network size, so that one motif corresponds to a whole functionally evolved network, was preference for particular connection patterns found. This suggests that in non-restricted, bigger networks, entanglement with the rest of the network hinders topological subgraph analysis.
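The definition of a motif used in this abstract (a subgraph pattern that occurs more often than in randomized networks) can be illustrated with a minimal sketch. It is not the authors' method: the choice of the feed-forward loop as the pattern, the function names, and the crude null model (random directed graphs with the same number of nodes and edges, nodes labelled 0..n-1) are all assumptions made for illustration.

```python
import random
from itertools import permutations


def count_ffl(edges):
    """Count feed-forward loops: a->b, b->c and a->c with a, b, c distinct."""
    edge_set = set(edges)
    nodes = {n for e in edge_set for n in e}
    return sum(
        1
        for a, b, c in permutations(nodes, 3)
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set
    )


def random_graph(n_nodes, n_edges, rng):
    """Draw a random directed graph without self-loops (assumed null model)."""
    possible = [(i, j) for i in range(n_nodes) for j in range(n_nodes) if i != j]
    return rng.sample(possible, n_edges)


def ffl_enrichment(edges, n_random=200, seed=0):
    """Return the observed FFL count and the mean count over random graphs."""
    rng = random.Random(seed)
    edge_set = set(edges)
    n_nodes = len({n for e in edge_set for n in e})
    observed = count_ffl(edge_set)
    null_counts = [
        count_ffl(random_graph(n_nodes, len(edge_set), rng))
        for _ in range(n_random)
    ]
    return observed, sum(null_counts) / n_random


# Hypothetical toy network with two feed-forward loops: (0,1,2) and (1,2,3).
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (1, 3)]
observed, null_mean = ffl_enrichment(toy_edges)
```

A pattern would be called a motif in this sense only if `observed` clearly exceeds the null distribution; a degree-preserving randomization is the more common null model and would replace `random_graph` in a serious analysis.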
DEEP MOTIF DASHBOARD: VISUALIZING AND UNDERSTANDING GENOMIC SEQUENCES USING DEEP NEURAL NETWORKS.

PubMed

Lanchantin, Jack; Singh, Ritambhara; Wang, Beilun; Qi, Yanjun

2017-01-01

Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence's saliency map which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs as well as dependencies among them.
Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks

PubMed Central

Lanchantin, Jack; Singh, Ritambhara; Wang, Beilun; Qi, Yanjun

2018-01-01

Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence’s saliency map which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs as well as dependencies among them. PMID:27896980
TRIM67 Protein Negatively Regulates Ras Activity through Degradation of 80K-H and Induces Neuritogenesis*

PubMed Central

Yaguchi, Hiroaki; Okumura, Fumihiko; Takahashi, Hidehisa; Kano, Takahiro; Kameda, Hiroyuki; Uchigashima, Motokazu; Tanaka, Shinya; Watanabe, Masahiko; Sasaki, Hidenao; Hatakeyama, Shigetsugu

2012-01-01

Tripartite motif (TRIM)-containing proteins, which are defined by the presence of a common domain structure composed of a RING finger, one or two B-box motifs and a coiled-coil motif, are involved in many biological processes including innate immunity, viral infection, carcinogenesis, and development. Here we show that TRIM67, which has a TRIM motif, an FN3 domain and a SPRY domain, is highly expressed in the cerebellum and that TRIM67 interacts with PRG-1 and 80K-H, which is involved in the Ras-mediated signaling pathway. Ectopic expression of TRIM67 results in degradation of endogenous 80K-H and attenuation of cell proliferation and enhances neuritogenesis in the neuroblastoma cell line N1E-115. Furthermore, morphological and biological changes caused by knockdown of 80K-H are similar to those observed by overexpression of TRIM67. These findings suggest that TRIM67 regulates Ras signaling via degradation of 80K-H, leading to neural differentiation including neuritogenesis. PMID:22337885
TRIM67 protein negatively regulates Ras activity through degradation of 80K-H and induces neuritogenesis.

PubMed

Yaguchi, Hiroaki; Okumura, Fumihiko; Takahashi, Hidehisa; Kano, Takahiro; Kameda, Hiroyuki; Uchigashima, Motokazu; Tanaka, Shinya; Watanabe, Masahiko; Sasaki, Hidenao; Hatakeyama, Shigetsugu

2012-04-06

Tripartite motif (TRIM)-containing proteins, which are defined by the presence of a common domain structure composed of a RING finger, one or two B-box motifs and a coiled-coil motif, are involved in many biological processes including innate immunity, viral infection, carcinogenesis, and development. Here we show that TRIM67, which has a TRIM motif, an FN3 domain and a SPRY domain, is highly expressed in the cerebellum and that TRIM67 interacts with PRG-1 and 80K-H, which is involved in the Ras-mediated signaling pathway. Ectopic expression of TRIM67 results in degradation of endogenous 80K-H and attenuation of cell proliferation and enhances neuritogenesis in the neuroblastoma cell line N1E-115. Furthermore, morphological and biological changes caused by knockdown of 80K-H are similar to those observed by overexpression of TRIM67. These findings suggest that TRIM67 regulates Ras signaling via degradation of 80K-H, leading to neural differentiation including neuritogenesis.
A +1 ribosomal frameshifting motif prevalent among plant amalgaviruses.

PubMed

Nibert, Max L; Pyle, Jesse D; Firth, Andrew E

2016-11-01

Sequence accessions attributable to novel plant amalgaviruses have been found in the Transcriptome Shotgun Assembly database. Sixteen accessions, derived from 12 different plant species, appear to encompass the complete protein-coding regions of the proposed amalgaviruses, which would substantially expand the size of genus Amalgavirus from 4 current species. Other findings include evidence for UUU_CGN as a +1 ribosomal frameshifting motif prevalent among plant amalgaviruses; for a variant version of this motif found thus far in only two amalgaviruses from solanaceous plants; for a region of α-helical coiled coil propensity conserved in a central region of the ORF1 translation product of plant amalgaviruses; and for conserved sequences in a C-terminal region of the ORF2 translation product (RNA-dependent RNA polymerase) of plant amalgaviruses, seemingly beyond the region of conserved polymerase motifs. These results additionally illustrate the value of mining the TSA database and others for novel viral sequences for comparative analyses.
Cytopathogenesis of vesicular stomatitis virus is regulated by the PSAP motif of M protein in a species-dependent manner.

PubMed

Irie, Takashi; Liu, Yuliang; Drolet, Barbara S; Carnero, Elena; García-Sastre, Adolfo; Harty, Ronald N

2012-09-01

Vesicular stomatitis virus (VSV) is an important vector-borne pathogen of bovine and equine species, causing a reportable vesicular disease. The matrix (M) protein of VSV is multifunctional and plays a key role in cytopathogenesis, apoptosis, host protein shut-off, and virion assembly/budding. Our previous findings indicated that mutations of residues flanking the (37)PSAP(40) motif within the M protein resulted in VSV recombinants having attenuated phenotypes in mice. In this report, we characterize the phenotype of VSV recombinant PS > A4 (which harbors four alanines (AAAA) in place of the PSAP motif without disruption of flanking residues) in both mice, and in Aedes albopictus C6/36 mosquito and Culicoides sonorensis KC cell lines. The PS > A4 recombinant displayed an attenuated phenotype in infected mice as judged by weight loss, mortality, and viral titers measured from lung and brain samples of infected animals. However, unexpectedly, the PS > A4 recombinant displayed a robust cytopathic phenotype in insect C6/36 cells compared to that observed with control viruses. Notably, titers of recombinant PS > A4 were approximately 10-fold greater than those of control viruses in infected C6/36 cells and in KC cells from Culicoides sonorensis, a known VSV vector species. In addition, recombinant PS > A4 induced a 25-fold increase in the level of C3 caspase activity in infected C6/36 cells. These findings indicate that the PSAP motif plays a direct role in regulating cytopathogenicity in a species-dependent manner, and suggest that the intact PSAP motif may be important for maintaining persistence of VSV in an insect host.
A frequent, GxxxG-mediated, transmembrane association motif is optimized for the formation of interhelical Cα–H hydrogen bonds

PubMed Central

Mueller, Benjamin K.; Subramaniam, Sabareesh; Senes, Alessandro

2014-01-01

Carbon hydrogen bonds between Cα–H donors and carbonyl acceptors are frequently observed between transmembrane helices (Cα–H···O=C). Networks of these interactions occur often at helix−helix interfaces mediated by GxxxG and similar patterns. Cα–H hydrogen bonds have been hypothesized to be important in membrane protein folding and association, but evidence that they are major determinants of helix association is still lacking. Here we present a comprehensive geometric analysis of homodimeric helices that demonstrates the existence of a single region in conformational space with high propensity for Cα–H···O=C hydrogen bond formation. This region corresponds to the most frequent motif for parallel dimers, GASright, whose best-known example is glycophorin A. The finding suggests a causal link between the high frequency of occurrence of GASright and its propensity for carbon hydrogen bond formation. Investigation of the sequence dependency of the motif determined that Gly residues are required at specific positions where only Gly can act as a donor with its “side chain” Hα. Gly also reduces the steric barrier for non-Gly amino acids at other positions to act as Cα donors, promoting the formation of cooperative hydrogen bonding networks. These findings offer a structural rationale for the occurrence of GxxxG patterns at the GASright interface. The analysis identified the conformational space and the sequence requirement of Cα–H···O=C mediated motifs; we took advantage of these results to develop a structural prediction method. The resulting program, CATM, predicts ab initio the known high-resolution structures of homodimeric GASright motifs at near-atomic level. PMID:24569864
Distribution and diversity of ribosome binding sites in prokaryotic genomes.

PubMed

Omotajo, Damilola; Tate, Travis; Cho, Hyuk; Choudhary, Madhusudan

2015-08-14

Prokaryotic translation initiation involves the proper docking, anchoring, and accommodation of mRNA to the 30S ribosomal subunit. Three initiation factors (IF1, IF2, and IF3) and some ribosomal proteins mediate the assembly and activation of the translation initiation complex. Although the interaction between the Shine-Dalgarno (SD) sequence and its complementary sequence in the 16S rRNA is important in initiation, some genes lacking an SD ribosome binding site (RBS) are still well expressed. The objective of this study is to examine the pattern of distribution and diversity of RBS in fully sequenced bacterial genomes. The following three hypotheses were tested: SD motifs are prevalent in bacterial genomes; all previously identified SD motifs are uniformly distributed across prokaryotes; and genes with specific cluster of orthologous gene (COG) functions differ in their use of SD motifs. Data for 2,458 bacterial genomes, previously generated by Prodigal (PROkaryotic DYnamic programming Gene-finding ALgorithm) and currently available at the National Center for Biotechnology Information (NCBI), were analyzed. Of the total genes examined, ~77.0% use an SD RBS, while ~23.0% have no RBS. The majority of the genes with the most common SD motifs are distributed in a manner that is representative of their abundance for each COG functional category, while motifs 13 (5'-GGA-3'/5'-GAG-3'/5'-AGG-3') and 27 (5'-AGGAGG-3') appear to be predominantly used by genes for information storage and processing, and translation and ribosome biogenesis, respectively. These findings suggest that an SD sequence is not obligatory for translation initiation; instead, other signals, such as the RBS spacer, may have an overarching influence on translation of mRNAs. Subsequent analyses of the 5' secondary structure of these mRNAs may provide further insight into the translation initiation mechanism.
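As a rough illustration of how a gene might be classified as using or lacking an SD RBS, the following sketch scans the sequence immediately upstream of a start codon for SD-like motifs and reports the spacer length. It is not Prodigal's procedure: the function name, the short motif list (limited to the motifs quoted in the abstract), and the exact-match rule are assumptions.

```python
# Motifs quoted in the abstract; the real motif catalogue is larger.
SD_MOTIFS = ("AGGAGG", "GGA", "GAG", "AGG")


def find_sd(upstream, motifs=SD_MOTIFS):
    """Return (motif, spacer) for the longest motif found, or None.

    upstream: DNA immediately 5' of the start codon, with its last base
              adjacent to the start codon.
    spacer:   number of bases between the motif end and the start codon.
    """
    upstream = upstream.upper()
    # Try longer motifs first so AGGAGG wins over its 3-mer substrings.
    for motif in sorted(motifs, key=len, reverse=True):
        pos = upstream.rfind(motif)  # occurrence closest to the start codon
        if pos != -1:
            return motif, len(upstream) - (pos + len(motif))
    return None


# Hypothetical upstream regions, not taken from any real genome.
with_sd = find_sd("TTTAGGAGGTTTTTTT")
without_sd = find_sd("TTTTTTTTTT")
```

Returning the spacer alongside the motif reflects the abstract's suggestion that spacer length may matter as much as the motif itself.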
Analysis of zinc binding sites in protein crystal structures.

PubMed

Alberts, I L; Nadassy, K; Wodak, S J

1998-08-01

The geometrical properties of zinc binding sites in a dataset of high quality protein crystal structures deposited in the Protein Data Bank have been examined to identify important differences between zinc sites that are directly involved in catalysis and those that play a structural role. Coordination angles in the zinc primary coordination sphere are compared with ideal values for each coordination geometry, and zinc coordination distances are compared with those in small zinc complexes from the Cambridge Structural Database as a guide of expected trends. We find that distances and angles in the primary coordination sphere are in general close to the expected (or ideal) values. Deviations occur primarily for oxygen coordinating atoms and are found to be mainly due to H-bonding of the oxygen coordinating ligand to protein residues, bidentate binding arrangements, and multi-zinc sites. We find that H-bonding of oxygen containing residues (or water) to zinc bound histidines is almost universal in our dataset and defines the elec-His-Zn motif. Analysis of the stereochemistry shows that carboxyl elec-His-Zn motifs are geometrically rigid, while water elec-His-Zn motifs show the most geometrical variation. As catalytic motifs have a higher proportion of carboxyl elec atoms than structural motifs, they provide a more rigid framework for zinc binding. This is understood biologically, as a small distortion in the zinc position in an enzyme can have serious consequences on the enzymatic reaction. We also analyze the sequence pattern of the zinc ligands and residues that provide elecs, and identify conserved hydrophobic residues in the endopeptidases that also appear to contribute to stabilizing the catalytic zinc site. A zinc binding template in protein crystal structures is derived from these observations.
Efficient budding of the tacaribe virus matrix protein z requires the nucleoprotein.

PubMed

Groseth, Allison; Wolff, Svenja; Strecker, Thomas; Hoenen, Thomas; Becker, Stephan

2010-04-01

The Z protein has been shown for several arenaviruses to serve as the viral matrix protein. As such, Z provides the principal force for the budding of virus particles and is capable of forming virus-like particles (VLPs) when expressed alone. For most arenaviruses, this activity has been shown to be linked to the presence of proline-rich late-domain motifs in the C terminus; however, for the New World arenavirus Tacaribe virus (TCRV), no such motif exists within Z. It was recently demonstrated that while TCRV Z is still capable of functioning as a matrix protein to induce the formation of VLPs, neither its ASAP motif, which replaces a canonical PT/SAP motif in related viruses, nor its YxxL motif is involved in budding, leading to the suggestion that TCRV uses a novel budding mechanism. Here we show that in comparison to its closest relative, Junin virus (JUNV), TCRV Z buds only weakly when expressed in isolation. While this budding activity is independent of the ASAP or YxxL motif, it is significantly enhanced by coexpression with the nucleoprotein (NP), an effect not seen with JUNV Z. Interestingly, both the ASAP and YxxL motifs of Z appear to be critical for the recruitment of NP into VLPs, as well as for the enhancement of TCRV Z-mediated budding. While it is known that TCRV budding remains dependent on the endosomal sorting complex required for transport, our findings provide further evidence that TCRV uses a budding mechanism distinct from that of other known arenaviruses and suggest an essential role for NP in this process.
SUMOylation target sites at the C terminus protect Axin from ubiquitination and confer protein stability

PubMed Central

Kim, Min Jung; Chia, Ian V.; Costantini, Frank

2008-01-01

Axin is a scaffold protein for the β-catenin destruction complex, and a negative regulator of canonical Wnt signaling. Previous studies implicated the six C-terminal amino acids (C6 motif) in the ability of Axin to activate c-Jun N-terminal kinase, and identified them as a SUMOylation target. Deletion of the C6 motif of mouse Axin in vivo reduced the steady-state protein level, which caused embryonic lethality. Here, we report that this deletion (Axin-ΔC6) causes a reduced half-life in mouse embryonic fibroblasts and an increased susceptibility to ubiquitination in HEK 293T cells. We confirmed the C6 motif as a SUMOylation target in vitro, and found that mutating the C-terminal SUMOylation target residues increased the susceptibility of Axin to polyubiquitination and reduced its steady-state level. Heterologous SUMOylation target sites could replace C6 in providing this protective effect. These findings suggest that SUMOylation of the C6 motif may prevent polyubiquitination, thus increasing the stability of Axin. Although C6 deletion also caused increased association of Axin with Dvl-1, this interaction was not altered by mutating the lysine residues in C6, nor could heterologous SUMOylation motifs replace the C6 motif in this assay. Therefore, some other specific property of the C6 motif seems to reduce the interaction of Axin with Dvl-1.—Kim, M. J., Chia, I. V., Costantini, F. SUMOylation target sites at the C terminus protect Axin from ubiquitination and confer protein stability. PMID:18632848
Microprocessor depends on hemin to recognize the apical loop of primary microRNA

PubMed Central

Park, Joha; Dang, Thi Lieu; Choi, Yeon-Gil; Kim, V Narry

2018-01-01

Microprocessor, which consists of a ribonuclease III DROSHA and its cofactor DGCR8, initiates microRNA (miRNA) maturation by cleaving primary miRNA transcripts (pri-miRNAs). We recently demonstrated that the DGCR8 dimer recognizes the apical elements of pri-miRNAs, including the UGU motif, to accurately locate and orient Microprocessor on pri-miRNAs. However, the mechanism underlying the selective RNA binding remains unknown. In this study, we find that hemin, a ferric ion-containing porphyrin, enhances the specific interaction between the apical UGU motif and the DGCR8 dimer, allowing Microprocessor to achieve high efficiency and fidelity of pri-miRNA processing in vitro. Furthermore, by generating a DGCR8 mutant cell line and carrying out rescue experiments, we discover that hemin preferentially stimulates the expression of miRNAs possessing the UGU motif, thereby conferring differential regulation of miRNA maturation. Our findings reveal the molecular action mechanism of hemin in pri-miRNA processing and establish a novel function of hemin in inducing specific RNA-protein interaction. PMID:29750274
Microprocessor depends on hemin to recognize the apical loop of primary microRNA.

PubMed

Nguyen, Tuan Anh; Park, Joha; Dang, Thi Lieu; Choi, Yeon-Gil; Kim, V Narry

2018-06-20

Microprocessor, which consists of a ribonuclease III DROSHA and its cofactor DGCR8, initiates microRNA (miRNA) maturation by cleaving primary miRNA transcripts (pri-miRNAs). We recently demonstrated that the DGCR8 dimer recognizes the apical elements of pri-miRNAs, including the UGU motif, to accurately locate and orient Microprocessor on pri-miRNAs. However, the mechanism underlying the selective RNA binding remains unknown. In this study, we find that hemin, a ferric ion-containing porphyrin, enhances the specific interaction between the apical UGU motif and the DGCR8 dimer, allowing Microprocessor to achieve high efficiency and fidelity of pri-miRNA processing in vitro. Furthermore, by generating a DGCR8 mutant cell line and carrying out rescue experiments, we discover that hemin preferentially stimulates the expression of miRNAs possessing the UGU motif, thereby conferring differential regulation of miRNA maturation. Our findings reveal the molecular action mechanism of hemin in pri-miRNA processing and establish a novel function of hemin in inducing specific RNA-protein interaction.
Canonical Bcl-2 motifs of the Na+/K+ pump revealed by the BH3 mimetic chelerythrine: early signal transducers of apoptosis?

PubMed

Lauf, Peter K; Heiny, Judith; Meller, Jarek; Lepera, Michael A; Koikov, Leonid; Alter, Gerald M; Brown, Thomas L; Adragna, Norma C

2013-01-01

Chelerythrine [CET], a protein kinase C [PKC] inhibitor, is a pro-apoptotic BH3-mimetic binding to BH1-like motifs of Bcl-2 proteins. CET action was examined on PKC phosphorylation-dependent membrane transporters (Na+/K+ pump/ATPase [NKP, NKA], Na+-K+-2Cl- [NKCC] and K+-Cl- [KCC] cotransporters, and channel-supported K+ loss) in human lens epithelial cells [LECs]. K+ loss and K+ uptake, using Rb+ as congener, were measured by atomic absorption/emission spectrophotometry with NKP and NKCC inhibitors, and Cl- replacement by NO3- to determine KCC. 3H-Ouabain binding was performed on a pig renal NKA in the presence and absence of CET. Bcl-2 protein and NKA sequences were aligned and motifs identified and mapped using PROSITE in conjunction with BLAST alignments and analysis of conservation and structural similarity based on prediction of secondary and crystal structures. CET inhibited NKP and NKCC by >90% (IC50 values ~35 and ~15 μM, respectively) without significant KCC activity change, and stimulated K+ loss by ~35% at 10-30 μM. Neither ATP levels nor phosphorylation of the NKA α1 subunit changed. 3H-ouabain was displaced from pig renal NKA only at 100-fold higher CET concentrations than the ligand. Sequence alignments of NKA with BH1- and BH3-like motifs containing pro-survival Bcl-2 and BclXl proteins showed more than one BH1-like motif within NKA for interaction with CET or with BH3 motifs. One NKA BH1-like motif (ARAAEILARDGPN) was also found in all P-type ATPases. Also, NKA possessed a second motif similar to that near the BH3 region of Bcl-2. Findings support the hypothesis that CET inhibits NKP by binding to BH1-like motifs and disrupting the α1 subunit catalytic activity through conformational changes. By interacting with Bcl-2 proteins through their complementary BH1- or BH3-like-motifs, NKP proteins may be sensors of normal and pathological cell functions, becoming important yet unrecognized signal transducers in the initial phases of apoptosis. CET action on NKCC1 and K+ channels may involve PKC-regulated mechanisms; however, limited sequence homologies to BH1-like motifs cannot exclude direct effects.
A discrete artificial bee colony algorithm for detecting transcription factor binding sites in DNA sequences.

PubMed

Karaboga, D; Aslan, S

2016-04-27

The great majority of biological sequences share significant similarity with other sequences as a result of evolutionary processes, and identifying these sequence similarities is one of the most challenging problems in bioinformatics. In this paper, we present a discrete artificial bee colony (ABC) algorithm, which is inspired by the intelligent foraging behavior of real honey bees, for the detection of highly conserved residue patterns or motifs within sequences. Experimental studies on three different data sets showed that the proposed discrete model, by adhering to the fundamental scheme of the ABC algorithm, produced competitive or better results than other metaheuristic motif discovery techniques.
Biclustering sparse binary genomic data.

PubMed

van Uitert, Miranda; Meuleman, Wouter; Wessels, Lodewyk

2008-12-01

Genomic datasets often consist of large, binary, sparse data matrices. In such a dataset, one is often interested in finding contiguous blocks that (mostly) contain ones. This is a biclustering problem, and while many algorithms have been proposed to deal with gene expression data, only two algorithms have been proposed that specifically deal with binary matrices. None of the gene expression biclustering algorithms can handle the large number of zeros in sparse binary matrices. The two proposed binary algorithms failed to produce meaningful results. In this article, we present a new algorithm that is able to extract biclusters from sparse, binary datasets. A powerful feature is that biclusters with different numbers of rows and columns can be detected, varying from many rows to few columns and few rows to many columns. It allows the user to guide the search towards biclusters of specific dimensions. When applying our algorithm to an input matrix derived from TRANSFAC, we find transcription factors with distinctly dissimilar binding motifs, but a clear set of common targets that are significantly enriched for GO categories.
OSR1 regulates a subset of inward rectifier potassium channels via a binding motif variant.

PubMed

Taylor, Clinton A; An, Sung-Wan; Kankanamalage, Sachith Gallolu; Stippec, Steve; Earnest, Svetlana; Trivedi, Ashesh T; Yang, Jonathan Zijiang; Mirzaei, Hamid; Huang, Chou-Long; Cobb, Melanie H

2018-04-10

The with-no-lysine (K) (WNK) signaling pathway to STE20/SPS1-related proline- and alanine-rich kinase (SPAK) and oxidative stress-responsive 1 (OSR1) kinase is an important mediator of cell volume and ion transport. SPAK and OSR1 associate with upstream kinases WNK 1-4, substrates, and other proteins through their C-terminal domains which interact with linear R-F-x-V/I sequence motifs. In this study we find that SPAK and OSR1 also interact with similar affinity with a motif variant, R-x-F-x-V/I. Eight of 16 human inward rectifier K+ channels have an R-x-F-x-V motif. We demonstrate that two of these channels, Kir2.1 and Kir2.3, are activated by OSR1, while Kir4.1, which does not contain the motif, is not sensitive to changes in OSR1 or WNK activity. Mutation of the motif prevents activation of Kir2.3 by OSR1. Both siRNA knockdown of OSR1 and chemical inhibition of WNK activity disrupt NaCl-induced plasma membrane localization of Kir2.3. Our results suggest a mechanism by which WNK-OSR1 enhance Kir2.1 and Kir2.3 channel activity by increasing their plasma membrane localization. Regulation of members of the inward rectifier K+ channel family adds functional and mechanistic insight into the physiological impact of the WNK pathway.
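The two linear motifs named in this abstract, R-F-x-V/I and its variant R-x-F-x-V/I, translate directly into regular expressions over one-letter amino acid codes. The sketch below is only an illustration of such a scan, not the screening procedure used in the study; the function name and the toy sequences are hypothetical.

```python
import re

# Lookahead wrappers let overlapping matches be reported as well.
CANONICAL = re.compile(r"(?=(RF.[VI]))")   # R-F-x-V/I
VARIANT = re.compile(r"(?=(R.F.[VI]))")    # R-x-F-x-V/I


def find_motifs(seq):
    """Return (label, 1-based start position, matched residues) tuples."""
    seq = seq.upper()
    hits = []
    for label, pattern in (("R-F-x-V/I", CANONICAL), ("R-x-F-x-V/I", VARIANT)):
        for match in pattern.finditer(seq):
            hits.append((label, match.start() + 1, match.group(1)))
    return hits


# Hypothetical toy peptides, not real channel sequences.
canonical_hits = find_motifs("MARFQVLK")
variant_hits = find_motifs("AARSFTIGG")
```

A sequence match of this kind only nominates candidates; the abstract's conclusions rest on binding and mutation experiments, which a pattern scan cannot replace.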
Comparative qualitative phosphoproteomics analysis identifies shared phosphorylation motifs and associated biological processes in evolutionary divergent plants.

PubMed

Al-Momani, Shireen; Qi, Da; Ren, Zhe; Jones, Andrew R

2018-06-15

Phosphorylation is one of the most prevalent post-translational modifications and plays a key role in regulating cellular processes. We carried out a bioinformatics analysis of pre-existing phosphoproteomics data, to profile two model species representing the largest subclasses in flowering plants, the dicot Arabidopsis thaliana and the monocot Oryza sativa, to understand the extent to which phosphorylation signaling and function is conserved across evolutionary divergent plants. We identified 6537 phosphopeptides from 3189 phosphoproteins in Arabidopsis and 2307 phosphopeptides from 1613 phosphoproteins in rice. We identified phosphorylation motifs, finding nineteen pS motifs and two pT motifs shared in rice and Arabidopsis. The majority of shared motif-containing proteins were mapped to the same biological processes with similar patterns of fold enrichment, indicating high functional conservation. We also identified shared patterns of crosstalk between phosphoserines with enrichment for motifs pSXpS, pSXXpS and pSXXXpS, where X is any amino acid. Lastly, our results identified several pairs of motifs that are significantly enriched to co-occur in Arabidopsis proteins, indicating cross-talk between different sites, but this was not observed in rice. Our results demonstrate that there are evolutionary conserved mechanisms of phosphorylation-mediated signaling in plants, via analysis of high-throughput phosphorylation proteomics data from key monocot and dicot species: rice and Arabidopsis thaliana. The results also suggest that there is increased crosstalk between phosphorylation sites in A. thaliana compared with rice. The results are important for our general understanding of cell signaling in plants, and the ability to use A. thaliana as a general model for plant biology.
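The crosstalk motifs pSXpS, pSXXpS and pSXXXpS describe pairs of phosphoserines separated by one, two or three residues. A minimal sketch of how such pairs could be tallied from a protein sequence and a list of observed phosphosites follows; the function name, the input format (1-based site positions) and the toy peptide are assumptions, and the enrichment statistics used in the study are not reproduced.

```python
def phosphoserine_pairs(seq, phosphosites, max_gap=3):
    """Count pS...pS pairs with 1..max_gap intervening residues.

    seq:          protein sequence in one-letter code
    phosphosites: 1-based positions of observed phosphorylation sites
    """
    # Keep only sites that fall on a serine.
    ps = sorted(p for p in set(phosphosites) if seq[p - 1] == "S")
    ps_set = set(ps)
    counts = {f"pS{'X' * gap}pS": 0 for gap in range(1, max_gap + 1)}
    for p in ps:
        for gap in range(1, max_gap + 1):
            if p + gap + 1 in ps_set:
                counts[f"pS{'X' * gap}pS"] += 1
    return counts


# Hypothetical peptide with phosphoserines at positions 2, 4 and 7.
pair_counts = phosphoserine_pairs("ASQSAASK", [2, 4, 7])
```

Raw counts like these would still need to be compared against a background expectation before any motif could be called enriched.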
Real-space observation of magnetic excitations and avalanche behavior in artificial quasicrystal lattices

DOE PAGES

Brajuskovic, V.; Barrows, F.; Phatak, C.; ...

2016-10-03

Artificial spin ice lattices have emerged as model systems for studying magnetic frustration in recent years. Most work to date has looked at periodic artificial spin ice lattices. In this paper, we observe frustration effects in quasicrystal artificial spin ice lattices that lack translational symmetry and contain vertices with different numbers of interacting elements. We find that as the lattice state changes following demagnetizing and annealing, specific vertex motifs retain low-energy configurations, which excites other motifs into higher energy configurations. In addition, we find that unlike the magnetization reversal process for periodic artificial spin ice lattices, which occurs through 1D avalanches, quasicrystal lattices undergo reversal through a dendritic 2D avalanche mechanism.
Real-space observation of magnetic excitations and avalanche behavior in artificial quasicrystal lattices

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brajuskovic, V.; Barrows, F.; Phatak, C.

Artificial spin ice lattices have emerged as model systems for studying magnetic frustration in recent years. Most work to date has looked at periodic artificial spin ice lattices. In this paper, we observe frustration effects in quasicrystal artificial spin ice lattices that lack translational symmetry and contain vertices with different numbers of interacting elements. We find that as the lattice state changes following demagnetizing and annealing, specific vertex motifs retain low-energy configurations, which excites other motifs into higher energy configurations. In addition, we find that unlike the magnetization reversal process for periodic artificial spin ice lattices, which occurs through 1D avalanches, quasicrystal lattices undergo reversal through a dendritic 2D avalanche mechanism.
Process-driven inference of biological network structure: feasibility, minimality, and multiplicity

NASA Astrophysics Data System (ADS)

Zeng, Chen

2012-02-01

For a given dynamic process, identifying the putative interaction networks that achieve it is the inference problem. In this talk, we address the computational complexity of the inference problem in the context of Boolean networks under the dominant inhibition condition. First, we prove that the feasibility problem (is there a network that explains the dynamics?) can be solved in polynomial time. Second, while the minimality problem (what is the smallest network that explains the dynamics?) is shown to be NP-hard, a simple polynomial-time heuristic is shown to produce near-minimal solutions, as demonstrated by simulation. Third, the theoretical framework also leads to a fast polynomial-time heuristic to estimate the number of network solutions with reasonable accuracy. We will apply these approaches to two simplified Boolean network models for the cell cycle process of budding yeast (Li 2004) and fission yeast (Davidich 2008). Our results demonstrate that each of these networks contains a giant backbone motif spanning all the network nodes that provides the desired main functionality, while the remaining edges in the network form smaller motifs whose role is to confer stability properties rather than provide function. Moreover, we show that the bioprocesses of these two cell cycle models differ considerably from a typically generated process and are intrinsically cascade-like.
Sequence information gain based motif analysis.

PubMed

Maynou, Joan; Pairó, Erola; Marco, Santiago; Perera, Alexandre

2015-11-09

The detection of regulatory regions in candidate sequences is essential for the understanding of the regulation of a particular gene and the mechanisms involved. This paper proposes a novel methodology based on information theoretic metrics for finding regulatory sequences in promoter regions. This methodology (SIGMA) has been tested on genomic sequence data for Homo sapiens and Mus musculus. SIGMA has been compared with different publicly available alternatives for motif detection, such as MEME/MAST, Biostrings (Bioconductor package), MotifRegressor, and previous work such as Qresiduals projections or information theoretic based detectors. Comparative results, in the form of Receiver Operating Characteristic curves, show that, in 70% of the studied Transcription Factor Binding Sites, the SIGMA detector performs better and behaves more robustly than the methods compared, while having a similar computational time. The performance of SIGMA can be explained by its parametric simplicity in the modelling of the non-linear co-variability in the binding motif positions. Sequence Information Gain based Motif Analysis is a generalisation of a non-linear model of the cis-regulatory sequences detection based on Information Theory. This generalisation allows us to detect transcription factor binding sites with maximum performance disregarding the covariability observed in the positions of the training set of sequences. SIGMA is freely available to the public at http://b2slab.upc.edu.
Evolutionary analysis of FAM83H in vertebrates.

PubMed

Huang, Wushuang; Yang, Mei; Wang, Changning; Song, Yaling

2017-01-01

Amelogenesis imperfecta is a group of disorders causing abnormalities in enamel formation in various phenotypes. Many mutations in the FAM83H gene have been identified to result in autosomal dominant hypocalcified amelogenesis imperfecta in different populations. However, the structure and function of FAM83H and its pathological mechanism have yet to be further explored. Evolutionary analysis is an alternative for revealing residues or motifs that are important for protein function. In the present study, we chose 50 vertebrate species in public databases representative of approximately 230 million years of evolution, including 1 amphibian, 2 fishes, 7 sauropsids and 40 mammals, and we performed evolutionary analysis on the FAM83H protein. By sequence alignment, conserved residues and motifs were indicated, and the loss of important residues and motifs in five special species (Malayan pangolin, platypus, minke whale, nine-banded armadillo and aardvark) was discovered. A phylogenetic time tree showed the divergence process of FAM83H. Positive selection sites in the C-terminus suggested that the C-terminus of FAM83H played certain adaptive roles during evolution. The results confirmed some important motifs reported in previous findings and identified some new highly conserved residues and motifs that need further investigation. The results suggest that the C-terminus of FAM83H contains key conserved regions critical to enamel formation and calcification.
Identification of sequence-structure RNA binding motifs for SELEX-derived aptamers.

PubMed

Hoinka, Jan; Zotenko, Elena; Friedman, Adam; Sauna, Zuben E; Przytycka, Teresa M

2012-06-15

Systematic Evolution of Ligands by EXponential Enrichment (SELEX) represents a state-of-the-art technology to isolate single-stranded (ribo)nucleic acid fragments, named aptamers, which bind to a molecule (or molecules) of interest via specific structural regions induced by their sequence-dependent fold. This powerful method has applications in designing protein inhibitors, molecular detection systems, therapeutic drugs and antibody replacement among others. However, full understanding and consequently optimal utilization of the process has lagged behind its wide application due to the lack of dedicated computational approaches. At the same time, the combination of SELEX with novel sequencing technologies is beginning to provide the data that will allow the examination of a variety of properties of the selection process. To close this gap we developed Aptamotif, a computational method for the identification of sequence-structure motifs in SELEX-derived aptamers. To increase the chances of identifying functional motifs, Aptamotif uses an ensemble-based approach. We validated the method using two published aptamer datasets containing experimentally determined motifs of increasing complexity. We were able to recreate the authors' findings to a high degree, thus proving the capability of our approach to identify binding motifs in SELEX data. Additionally, using our new experimental dataset, we illustrate the application of Aptamotif to elucidate several properties of the selection process.
Conservation of the Human Integrin-Type Beta-Propeller Domain in Bacteria

PubMed Central

Chouhan, Bhanupratap; Denesyuk, Alexander; Heino, Jyrki; Johnson, Mark S.; Denessiouk, Konstantin

2011-01-01

Integrins are heterodimeric cell-surface receptors with key functions in cell-cell and cell-matrix adhesion. Integrin α and β subunits are present throughout the metazoans, but it is unclear whether the subunits predate the origin of multicellular organisms. Several component domains have been detected in bacteria, one of which, a specific 7-bladed β-propeller domain, is a unique feature of the integrin α subunits. Here, we describe a structure-derived motif, which incorporates key features of each blade from the X-ray structures of human αIIbβ3 and αVβ3, includes elements of the FG-GAP/Cage and Ca2+-binding motifs, and is specific only for the metazoan integrin domains. Separately, we searched all available sequences from bacteria and unicellular eukaryotic organisms for metazoan integrin-type β-propeller domains, requiring seven repeats, corresponding to the seven blades of the β-propeller domain, with the newly found structure-derived motif present in every repeat. As a result, among 47 available genomes of unicellular eukaryotes we could not find a single instance of seven repeats with the motif. Several sequences contained three repeats, a predicted transmembrane segment, and a short cytoplasmic motif associated with some integrins, but otherwise differ from the metazoan integrin α subunits. Among the available bacterial sequences, we found five examples containing seven sequential metazoan integrin-specific motifs within the seven repeats. The motifs differ in having one Ca2+-binding site per repeat, whereas metazoan integrins have three or four sites. The bacterial sequences are more conserved in terms of motif conservation and loop length, suggesting that the structure is more regular and compact than those example structures from human integrins. Although the bacterial examples are not full-length integrins, the full-length metazoan-type 7-bladed β-propeller domains are present, and sometimes two tandem copies are found.
PMID:22022374
Evidence for the Concerted Evolution between Short Linear Protein Motifs and Their Flanking Regions

PubMed Central

Chica, Claudia; Diella, Francesca; Gibson, Toby J.

2009-01-01

Background: Linear motifs are short modules of protein sequences that play a crucial role in mediating and regulating many protein–protein interactions. The function of linear motifs strongly depends on the context, e.g. functional instances mainly occur inside flexible regions that are accessible for interaction. Sometimes linear motifs appear as isolated islands of conservation in multiple sequence alignments. However, they also occur in larger blocks of sequence conservation, suggesting an active role for the neighbouring amino acids. Results: The evolution of regions flanking 116 functional linear motif instances was studied. The conservation of the amino acid sequence and order/disorder tendency of those regions was related to presence/absence of the instance. For the majority of the analysed instances, the pairs of sequences conserving the linear motif were also observed to maintain a similar local structural tendency and/or to have higher local sequence conservation when compared to pairs of sequences where one is missing the linear motif. Furthermore, those instances have a higher chance to co–evolve with the neighbouring residues in comparison to the distant ones. Those findings are supported by examples where the regulation of the linear motif–mediated interaction has been shown to depend on the modifications (e.g. phosphorylation) at neighbouring positions or is thought to benefit from the binding versatility of disordered regions. Conclusion: The results suggest that flanking regions are relevant for linear motif–mediated interactions, both at the structural and sequence level. More interestingly, they indicate that the prediction of linear motif instances can be enriched with contextual information by performing a sequence analysis similar to the one presented here. This can facilitate the understanding of the role of these predicted instances in determining the protein function inside the broader context of the cellular network where they arise.
PMID:19584925
Biomimetic trapping cocktail to screen reactive metabolites: use of an amino acid and DNA motif mixture as light/heavy isotope pairs differing in mass shift.

PubMed

Hosaka, Shuto; Honda, Takuto; Lee, Seon Hwa; Oe, Tomoyuki

2018-06-01

Candidate drugs that can be metabolically transformed into reactive electrophilic products, such as epoxides, quinones, and nitroso compounds, are of special concern because subsequent covalent binding to bio-macromolecules can cause adverse drug reactions, such as allergic reactions, hepatotoxicity, and genotoxicity. Several strategies have been reported for screening reactive metabolites, such as a covalent binding assay with radioisotope-labeled drugs and a trapping method followed by LC-MS/MS analyses. Of these, a trapping method using glutathione is the most common, especially at the early stage of drug development. However, the cysteine of glutathione is not the only nucleophilic site in vivo; lysine, histidine, arginine, and DNA bases are also nucleophilic. Indeed, the glutathione trapping method tends to overlook several types of reactive metabolites, such as aldehydes, acylglucuronides, and nitroso compounds. Here, we introduce an alternate way for screening reactive metabolites as follows: A mixture of the light and heavy isotopes of simplified amino acid motifs and a DNA motif is used as a biomimetic trapping cocktail. This mixture consists of [²H₀]/[²H₃]-1-methylguanidine (arginine motif, Δ3 Da), [²H₀]/[²H₄]-2-mercaptoethanol (cysteine motif, Δ4 Da), [²H₀]/[²H₅]-4-methylimidazole (histidine motif, Δ5 Da), [²H₀]/[²H₉]-n-butylamine (lysine motif, Δ9 Da), and [¹³C₀,¹⁵N₀]/[¹³C₁,¹⁵N₂]-2'-deoxyguanosine (DNA motif, Δ3 Da). Mass tag triggered data-dependent acquisition is used to find the characteristic doublet peaks, followed by specific identification of the light isotope peak using MS/MS. Forty-two model drugs were examined using an in vitro microsome experiment to validate the strategy.
Computational Tools for the Identification and Interpretation of Sequence Motifs in Immunopeptidomes.

PubMed

Alvarez, Bruno; Barra, Carolina; Nielsen, Morten; Andreatta, Massimo

2018-01-12

Recent advances in proteomics and mass-spectrometry have widely expanded the detectable peptide repertoire presented by major histocompatibility complex (MHC) molecules on the cell surface, collectively known as the immunopeptidome. Finely characterizing the immunopeptidome brings about important basic insights into the mechanisms of antigen presentation, but can also reveal promising targets for vaccine development and cancer immunotherapy. This report describes a number of practical and efficient approaches to analyze immunopeptidomics data, discussing the identification of meaningful sequence motifs in various scenarios and considering current limitations. Guidelines are provided for the filtering of false hits and contaminants, and to address the problem of motif deconvolution in cell lines expressing multiple MHC alleles, both for the MHC class I and class II systems. Finally, it is demonstrated how machine learning can be readily employed by non-expert users to generate accurate prediction models directly from mass-spectrometry eluted ligand data sets. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Rewiring yeast sugar transporter preference through modifying a conserved protein motif.

PubMed

Young, Eric M; Tong, Alice; Bui, Hang; Spofford, Caitlin; Alper, Hal S

2014-01-07

Utilization of exogenous sugars found in lignocellulosic biomass hydrolysates, such as xylose, must be improved before yeast can serve as an efficient biofuel and biochemical production platform. In particular, the first step in this process, the molecular transport of xylose into the cell, can serve as a significant flux bottleneck and is highly inhibited by other sugars. Here we demonstrate that sugar transport preference and kinetics can be rewired through the programming of a sequence motif of the general form G-G/F-XXX-G found in the first transmembrane span. By evaluating 46 different heterologously expressed transporters, we find that this motif is conserved among functional transporters and highly enriched in transporters that confer growth on xylose. Through saturation mutagenesis and subsequent rational mutagenesis, four transporter mutants unable to confer growth on glucose but able to sustain growth on xylose were engineered. Specifically, Candida intermedia gxs1 Phe(38)Ile(39)Met(40), Scheffersomyces stipitis rgt2 Phe(38) and Met(40), and Saccharomyces cerevisiae hxt7 Ile(39)Met(40)Met(340) all exhibit this phenotype. In these cases, primary hexose transporters were rewired into xylose transporters. These xylose transporters nevertheless remained inhibited by glucose. Furthermore, in the course of identifying this motif, novel wild-type transporters with superior monosaccharide growth profiles were discovered, namely S. stipitis RGT2 and Debaryomyces hansenii 2D01474. These findings build toward the engineering of efficient pentose utilization in yeast and provide a blueprint for reprogramming transporter properties.
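The motif form G-G/F-XXX-G quoted in this abstract maps naturally onto a regular expression. The sketch below assumes the reading "Gly, then Gly or Phe, then any three residues, then Gly"; the function name and the example sequence are made up for illustration and do not come from the study's transporters.

```python
import re

# Assumed reading of "G-G/F-XXX-G": G, then G or F, then any three residues, then G.
# The lookahead wrapper lets overlapping occurrences be reported as well.
MOTIF = re.compile(r"(?=(G[GF].{3}G))")

def find_motif(protein):
    """Return (1-based start, matched segment) for every motif occurrence."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(protein.upper())]

# Hypothetical sequence, not a real transporter.
hits = find_motif("AAGFIMSGAAGGTTLGK")
# hits == [(3, "GFIMSG"), (11, "GGTTLG")]
```

A scan like this only locates candidate positions; whether a given occurrence lies in the first transmembrane span would need separate topology information.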
A Comparison Study for DNA Motif Modeling on Protein Binding Microarray.

PubMed

Wong, Ka-Chun; Li, Yue; Peng, Chengbin; Wong, Hau-San

2016-01-01

Transcription factor binding sites (TFBSs) are relatively short (5-15 bp) and degenerate. Identifying them is a computationally challenging task. In particular, protein binding microarray (PBM) is a high-throughput platform that can measure the DNA binding preference of a protein in a comprehensive and unbiased manner; for instance, a typical PBM experiment can measure binding signal intensities of a protein to all possible DNA k-mers (k = 8∼10). Since proteins can often bind to DNA with different binding intensities, one of the major challenges is to build TFBS (also known as DNA motif) models which can fully capture the quantitative binding affinity data. To learn DNA motif models from the non-convex objective function landscape, several optimization methods are compared and applied to the PBM motif model building problem. In particular, representative methods from different optimization paradigms have been chosen for modeling performance comparison on hundreds of PBM datasets. The results suggest that the multimodal optimization methods are very effective for capturing the binding preference information from PBM data. In particular, we observe a general performance improvement if choosing di-nucleotide modeling over mono-nucleotide modeling. In addition, the models learned by the best-performing method are applied to two independent applications: PBM probe rotation testing and ChIP-Seq peak sequence prediction, demonstrating its biological applicability.
Ubiquitous presence of the hammerhead ribozyme motif along the tree of life

PubMed Central

de la Peña, Marcos; García-Robles, Inmaculada

2010-01-01

Examples of small self-cleaving RNAs embedded in noncoding regions already have been found to be involved in the control of gene expression, although their origin remains uncertain. In this work, we show the widespread occurrence of the hammerhead ribozyme (HHR) motif among genomes from the Bacteria, Chromalveolata, Plantae, and Metazoa kingdoms. Intergenic HHRs were detected in three different bacterial genomes, whereas metagenomic data from Galapagos Islands showed the occurrence of similar ribozymes that could be regarded as direct relics from the RNA world. Among eukaryotes, HHRs were detected in the genomes of three water molds as well as 20 plant species, ranging from unicellular algae to vascular plants. These HHRs were very similar to those previously described in small RNA plant pathogens and, in some cases, appeared as close tandem repetitions. A parallel situation of tandemly repeated HHR motifs was also detected in the genomes of lower metazoans from cnidarians to invertebrates, with special emphasis among hematophagous and parasitic organisms. Altogether, these findings unveil the HHR as a widespread motif in DNA genomes, which would be involved in new forms of retrotransposable elements. PMID:20705646
T-Reg Comparator: an analysis tool for the comparison of position weight matrices

PubMed Central

Roepcke, Stefan; Grossmann, Steffen; Rahmann, Sven; Vingron, Martin

2005-01-01

T-Reg Comparator is a novel software tool designed to support research into transcriptional regulation. Sequence motifs representing transcription factor binding sites are usually encoded as position weight matrices. The user inputs a set of such weight matrices or binding site sequences and our program matches them against the T-Reg database, which is presently built on data from the Transfac [E. Wingender (2004) In Silico Biol., 4, 55–61] and Jaspar [A. Sandelin, W. Alkema, P. Engstrom, W. W. Wasserman and B. Lenhard (2004) Nucleic Acids Res., 32, D91–D94]. Our tool delivers a detailed report on similarities between user-supplied motifs and motifs in the database. Apart from simple one-to-one relationships, T-Reg Comparator is also able to detect similarities between submatrices. In addition, we provide a user interface to a program for sequence scanning with weight matrices. Typical areas of application for T-Reg Comparator are motif and regulatory module finding and annotation of regulatory genomic regions. T-Reg Comparator is available at http://treg.molgen.mpg.de. PMID:15980506
T-Reg Comparator: an analysis tool for the comparison of position weight matrices.

PubMed

Roepcke, Stefan; Grossmann, Steffen; Rahmann, Sven; Vingron, Martin

2005-07-01

T-Reg Comparator is a novel software tool designed to support research into transcriptional regulation. Sequence motifs representing transcription factor binding sites are usually encoded as position weight matrices. The user inputs a set of such weight matrices or binding site sequences and our program matches them against the T-Reg database, which is presently built on data from the Transfac [E. Wingender (2004) In Silico Biol., 4, 55-61] and Jaspar [A. Sandelin, W. Alkema, P. Engstrom, W. W. Wasserman and B. Lenhard (2004) Nucleic Acids Res., 32, D91-D94]. Our tool delivers a detailed report on similarities between user-supplied motifs and motifs in the database. Apart from simple one-to-one relationships, T-Reg Comparator is also able to detect similarities between submatrices. In addition, we provide a user interface to a program for sequence scanning with weight matrices. Typical areas of application for T-Reg Comparator are motif and regulatory module finding and annotation of regulatory genomic regions. T-Reg Comparator is available at http://treg.molgen.mpg.de.
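Sequence scanning with a position weight matrix, as mentioned in this abstract, can be sketched in a few lines. This is a generic log-odds scan and not T-Reg Comparator's own code: the uniform background, the pseudocount, the function names and the toy count matrix are all assumptions, and the input is assumed to contain only A, C, G and T.

```python
import math

def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into a log2-odds weight matrix.

    counts -- list of dicts, one per motif position, mapping A/C/G/T to counts
    """
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2(((col.get(b, 0) + pseudocount) / total) / background)
            for b in "ACGT"
        })
    return pwm

def scan(sequence, pwm):
    """Score every window of the sequence; returns (0-based offset, score) pairs."""
    width = len(pwm)
    seq = sequence.upper()
    return [
        (i, sum(pwm[j][seq[i + j]] for j in range(width)))
        for i in range(len(seq) - width + 1)
    ]

def best_hit(sequence, pwm):
    """Return the highest-scoring (offset, score) window."""
    return max(scan(sequence, pwm), key=lambda hit: hit[1])

# Toy matrix built from ten hypothetical TATA sites.
toy_pwm = pwm_from_counts([{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}])
offset, score = best_hit("GGCTATAGC", toy_pwm)  # offset is 3, where TATA begins
```

In practice a score threshold, often derived from a background distribution, decides which windows count as hits; the sketch simply reports the best one.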
A discriminative method for family-based protein remote homology detection that combines inductive logic programming and propositional models

PubMed Central

2011-01-01

Background: Remote homology detection is a hard computational problem. Most approaches have trained computational models by using either full protein sequences or multiple sequence alignments (MSA), including all positions. However, when we deal with proteins in the "twilight zone" we can observe that only some segments of sequences (motifs) are conserved. We introduce a novel logical representation that allows us to represent physico-chemical properties of sequences, conserved amino acid positions and conserved physico-chemical positions in the MSA. From this, Inductive Logic Programming (ILP) finds the most frequent patterns (motifs) and uses them to train propositional models, such as decision trees and support vector machines (SVM). Results: We use the SCOP database to perform our experiments by evaluating protein recognition within the same superfamily. Our results show that our methodology when using SVM performs significantly better than some of the state-of-the-art methods, and comparably to others. However, our method provides a comprehensible set of logical rules that can help to understand what determines a protein function. Conclusions: The strategy of selecting only the most frequent patterns is effective for remote homology detection. This is possible through a suitable first-order logical representation of homologous properties, and through a set of frequent patterns, found by an ILP system, that summarizes essential features of protein functions. PMID:21429187
A discriminative method for family-based protein remote homology detection that combines inductive logic programming and propositional models.

PubMed

Bernardes, Juliana S; Carbone, Alessandra; Zaverucha, Gerson

2011-03-23

Remote homology detection is a hard computational problem. Most approaches have trained computational models by using either full protein sequences or multiple sequence alignments (MSA), including all positions. However, when we deal with proteins in the "twilight zone" we can observe that only some segments of sequences (motifs) are conserved. We introduce a novel logical representation that allows us to represent physico-chemical properties of sequences, conserved amino acid positions and conserved physico-chemical positions in the MSA. From this, Inductive Logic Programming (ILP) finds the most frequent patterns (motifs) and uses them to train propositional models, such as decision trees and support vector machines (SVM). We use the SCOP database to perform our experiments by evaluating protein recognition within the same superfamily. Our results show that our methodology when using SVM performs significantly better than some of the state-of-the-art methods, and comparably to others. However, our method provides a comprehensible set of logical rules that can help to understand what determines a protein function. The strategy of selecting only the most frequent patterns is effective for remote homology detection. This is possible through a suitable first-order logical representation of homologous properties, and through a set of frequent patterns, found by an ILP system, that summarizes essential features of protein functions.

A computational proposal for designing structured RNA pools for in vitro selection of RNAs.

PubMed

Kim, Namhee; Gan, Hin Hark; Schlick, Tamar

2007-04-01

Although in vitro selection technology is a versatile experimental tool for discovering novel synthetic RNA molecules, finding complex RNA molecules is difficult because most RNAs identified from random sequence pools are simple motifs, consistent with recent computational analysis of such sequence pools. Thus, enriching in vitro selection pools with complex structures could increase the probability of discovering novel RNAs. Here we develop an approach for engineering sequence pools that links RNA sequence space regions with corresponding structural distributions via a "mixing matrix" approach combined with a graph theory analysis. We define five classes of mixing matrices motivated by covariance mutations in RNA; these constructs define nucleotide transition rates and are applied to chosen starting sequences to yield specific nonrandom pools. We examine the coverage of sequence space as a function of the mixing matrix and starting sequence via clustering analysis. We show that, in contrast to random sequences, which are associated only with a local region of sequence space, our designed pools, including a structured pool for GTP aptamers, can target specific motifs. It follows that experimental synthesis of designed pools can benefit from using optimized starting sequences, mixing matrices, and pool fractions associated with each of our constructed pools as a guide. Automation of our approach could provide practical tools for pool design applications for in vitro selection of RNAs and related problems.
Edge usage, motifs, and regulatory logic for cell cycling genetic networks

NASA Astrophysics Data System (ADS)

Zagorski, M.; Krzywicki, A.; Martin, O. C.

2013-01-01

The cell cycle is a tightly controlled process, yet it shows marked differences across species. Which of its structural features follow solely from the ability to control gene expression? We tackle this question in silico by examining the ensemble of all regulatory networks which satisfy the constraint of producing a given sequence of gene expressions. We focus on three cell cycle profiles coming from baker's yeast, fission yeast, and mammals. First, we show that the networks in each of the ensembles use just a few interactions that are repeatedly reused as building blocks. Second, we find an enrichment in network motifs that is similar in the two yeast cell cycle systems investigated. These motifs do not have autonomous functions, yet they reveal a regulatory logic for cell cycling based on a feed-forward cascade of activating interactions.
Exploration of tetrahedral structures in silicate cathodes using a motif-network scheme

PubMed Central

Zhao, Xin; Wu, Shunqing; Lv, Xiaobao; Nguyen, Manh Cuong; Wang, Cai-Zhuang; Lin, Zijing; Zhu, Zi-Zhong; Ho, Kai-Ming

2015-01-01

Using a motif-network search scheme, we studied the tetrahedral structures of the dilithium/disodium transition metal orthosilicates A2MSiO4 with A = Li or Na and M = Mn, Fe or Co. In addition to finding all previously reported structures, we discovered many other different tetrahedral-network-based crystal structures which are highly degenerate in energy. These structures can be classified into structures with 1D, 2D and 3D M-Si-O frameworks. A clear trend of the structural preference in different systems was revealed and possible indicators that affect the structure stabilities were introduced. For the case of Na systems which have been much less investigated in the literature relative to the Li systems, we predicted their ground state structures and found evidence for the existence of new structural motifs. PMID:26497381
Overlapping activation-induced cytidine deaminase hotspot motifs in Ig class-switch recombination

PubMed Central

Han, Li; Masani, Shahnaz; Yu, Kefei

2011-01-01

Ig class-switch recombination (CSR) is directed by the long and repetitive switch regions and requires activation-induced cytidine deaminase (AID). One of the conserved switch-region sequence motifs (AGCT) is a preferred site for AID-mediated DNA-cytosine deamination. By using somatic gene targeting and recombinase-mediated cassette exchange, we established a cell line-based CSR assay that allows manipulation of switch sequences at the endogenous locus. We show that AGCT is only one of a family of four WGCW motifs in the switch region that can facilitate CSR. We go on to show that it is the overlap of AID hotspots at WGCW sites on the top and bottom strands that is critical. This finding leads to a much clearer model for the difference between CSR and somatic hypermutation. PMID:21709240
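The WGCW family mentioned in this abstract (W stands for A or T in IUPAC notation) has a property that a short sketch makes concrete: the reverse complement of any WGCW site is again a WGCW site, so a site on the top strand always has a counterpart at the same location on the bottom strand. The code below is only an illustration of that sequence property; the function names and the example sequence are hypothetical and the assay itself is not modeled.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T string."""
    return dna.translate(COMPLEMENT)[::-1]

# W = A or T; the lookahead allows overlapping sites to be reported.
WGCW = re.compile(r"(?=([AT]GC[AT]))")

def wgcw_sites(dna):
    """Return (0-based offset, site) for each WGCW occurrence on the given strand."""
    return [(m.start(), m.group(1)) for m in WGCW.finditer(dna.upper())]

# The four family members map onto each other under reverse complementation.
FAMILY = {"AGCA", "AGCT", "TGCA", "TGCT"}
closed = all(reverse_complement(site) in FAMILY for site in FAMILY)  # True

# Hypothetical switch-region-like fragment.
sites = wgcw_sites("GGAGCTGGTGCAAGCA")
# sites == [(2, "AGCT"), (8, "TGCA"), (12, "AGCA")]
```

AGCT and TGCA are their own reverse complements, while AGCA and TGCT are reverse complements of each other.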
In silico analysis of a therapeutic target in Leishmania infantum: the guanosine-diphospho-D-mannose pyrophosphorylase.

PubMed

Pomel, S; Rodrigo, J; Hendra, F; Cavé, C; Loiseau, P M

2012-02-01

Leishmaniases are tropical and sub-tropical diseases for which classical drugs (i.e. antimonials) are limited by toxicity and drug resistance. This situation calls for new chemical series with antileishmanial activity. This work analyzes the structure of a validated target in Leishmania: the GDP-mannose pyrophosphorylase (GDP-MP), an enzyme involved in glycosylation and essential for amastigote survival. By comparing both human and L. infantum GDP-MP 3D homology models, we identified (i) a common motif of amino acids that binds to the mannose moiety of the substrate and, interestingly, (ii) a motif that is specific to the catalytic site of the parasite enzyme. This motif could then be used to design compounds that specifically inhibit the leishmanial GDP-MP, without any effect on the human homolog.
Sequential dynamics in the motif of excitatory coupled elements

NASA Astrophysics Data System (ADS)

Korotkov, Alexander G.; Kazakov, Alexey O.; Osipov, Grigory V.

2015-11-01

In this article a new model of a motif (small ensemble) of neuron-like elements is proposed. It is built using the generalized Lotka-Volterra model with excitatory couplings. The main motivation for this work comes from problems in neuroscience, where excitatory couplings have been shown to be the predominant type of interaction between neurons of the brain. In this paper it is shown that there are two modes depending on the type of coupling between the elements: the mode with a stable heteroclinic cycle and the mode with a stable limit cycle. Our second goal is to examine the chaotic dynamics of the generalized three-dimensional Lotka-Volterra model.
Gating-signal propagation by a feed-forward neural motif

NASA Astrophysics Data System (ADS)

Liang, Xiaoming; Yanchuk, Serhiy; Zhao, Liang

2013-07-01

We study the signal propagation in a feed-forward motif consisting of three bistable neurons: Two input neurons receive input signals and the third output neuron generates the output. We find that a weak input signal can be propagated from the input neurons to the output neuron without amplitude attenuation. We further reveal that the initial states of the input neurons and the coupling strength act as signal gates and determine whether the propagation is enhanced or not. We also investigate the effect of the input signal frequency on enhanced signal propagation.
BLSSpeller: exhaustive comparative discovery of conserved cis-regulatory elements.

PubMed

De Witte, Dieter; Van de Velde, Jan; Decap, Dries; Van Bel, Michiel; Audenaert, Pieter; Demeester, Piet; Dhoedt, Bart; Vandepoele, Klaas; Fostier, Jan

2015-12-01

The accurate discovery and annotation of regulatory elements remains a challenging problem. The growing number of sequenced genomes creates new opportunities for comparative approaches to motif discovery. Putative binding sites are then considered to be functional if they are conserved in orthologous promoter sequences of multiple related species. Existing methods for comparative motif discovery usually rely on pregenerated multiple sequence alignments, which are difficult to obtain for more diverged species such as plants. As a consequence, misaligned regulatory elements often remain undetected. We present a novel algorithm that supports both alignment-free and alignment-based motif discovery in the promoter sequences of related species. Putative motifs are exhaustively enumerated as words over the IUPAC alphabet and screened for conservation using the branch length score. Additionally, a confidence score is established in a genome-wide fashion. In order to take advantage of a cloud computing infrastructure, the MapReduce programming model is adopted. The method is applied to four monocotyledon plant species and it is shown that high-scoring motifs are significantly enriched for open chromatin regions in Oryza sativa and for transcription factor binding sites inferred through protein-binding microarrays in O. sativa and Zea mays. Furthermore, the method is shown to recover experimentally profiled ga2ox1-like KN1 binding sites in Z. mays. BLSSpeller was written in Java. Source code and manual are available at http://bioinformatics.intec.ugent.be/blsspeller. Contact: Klaas.Vandepoele@psb.vib-ugent.be or jan.fostier@intec.ugent.be. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
BLSSpeller: exhaustive comparative discovery of conserved cis-regulatory elements

PubMed Central

De Witte, Dieter; Van de Velde, Jan; Decap, Dries; Van Bel, Michiel; Audenaert, Pieter; Demeester, Piet; Dhoedt, Bart; Vandepoele, Klaas; Fostier, Jan

2015-01-01

Motivation: The accurate discovery and annotation of regulatory elements remains a challenging problem. The growing number of sequenced genomes creates new opportunities for comparative approaches to motif discovery. Putative binding sites are then considered to be functional if they are conserved in orthologous promoter sequences of multiple related species. Existing methods for comparative motif discovery usually rely on pregenerated multiple sequence alignments, which are difficult to obtain for more diverged species such as plants. As a consequence, misaligned regulatory elements often remain undetected. Results: We present a novel algorithm that supports both alignment-free and alignment-based motif discovery in the promoter sequences of related species. Putative motifs are exhaustively enumerated as words over the IUPAC alphabet and screened for conservation using the branch length score. Additionally, a confidence score is established in a genome-wide fashion. In order to take advantage of a cloud computing infrastructure, the MapReduce programming model is adopted. The method is applied to four monocotyledon plant species and it is shown that high-scoring motifs are significantly enriched for open chromatin regions in Oryza sativa and for transcription factor binding sites inferred through protein-binding microarrays in O.sativa and Zea mays. Furthermore, the method is shown to recover experimentally profiled ga2ox1-like KN1 binding sites in Z.mays. Availability and implementation: BLSSpeller was written in Java. Source code and manual are available at http://bioinformatics.intec.ugent.be/blsspeller Contact: Klaas.Vandepoele@psb.vib-ugent.be or jan.fostier@intec.ugent.be Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26254488
Characterization of topological structure on complex networks.

PubMed

Nakamura, Ikuo

2003-10-01

Characterizing the topological structure of complex networks is a significant problem, especially from the viewpoint of data mining on the World Wide Web. "Page rank", used in the commercial search engine Google, is such a measure of authority to rank all the nodes matching a given query. We have investigated the page-rank distribution of the real Web and a growing network model, both of which have directed links and exhibit power-law distributions of in-degree (the number of incoming links to the node) and out-degree (the number of outgoing links from the node), respectively. We find a concentration of page rank on a small number of nodes and low page rank on high degree regimes in the real Web, which can be explained by topological properties of the network, e.g., network motifs, and connectivities of nearest neighbors.
Cellular microRNAs up-regulate transcription via interaction with promoter TATA-box motifs.

PubMed

Zhang, Yijun; Fan, Miaomiao; Zhang, Xue; Huang, Feng; Wu, Kang; Zhang, Junsong; Liu, Jun; Huang, Zhuoqiong; Luo, Haihua; Tao, Liang; Zhang, Hui

2014-12-01

The TATA box represents one of the most prevalent core promoters where the pre-initiation complexes (PICs) for gene transcription are assembled. This assembly is crucial for transcription initiation and well regulated. Here we show that some cellular microRNAs (miRNAs) are associated with RNA polymerase II (Pol II) and TATA box-binding protein (TBP) in human peripheral blood mononuclear cells (PBMCs). Among them, let-7i sequence specifically binds to the TATA-box motif of interleukin-2 (IL-2) gene and elevates IL-2 mRNA and protein production in CD4(+) T-lymphocytes in vitro and in vivo. Through direct interaction with the TATA-box motif, let-7i facilitates the PIC assembly and transcription initiation of IL-2 promoter. Several other cellular miRNAs, such as mir-138, mir-92a or mir-181d, also enhance the promoter activities via binding to the TATA-box motifs of insulin, calcitonin or c-myc, respectively. In agreement with the finding that an HIV-1-encoded miRNA could enhance viral replication through targeting the viral promoter TATA-box motif, our data demonstrate that the interaction with core transcription machinery is a novel mechanism for miRNAs to regulate gene expression. © 2014 Zhang et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Identification of sequence–structure RNA binding motifs for SELEX-derived aptamers

PubMed Central

Hoinka, Jan; Zotenko, Elena; Friedman, Adam; Sauna, Zuben E.; Przytycka, Teresa M.

2012-01-01

Motivation: Systematic Evolution of Ligands by EXponential Enrichment (SELEX) represents a state-of-the-art technology to isolate single-stranded (ribo)nucleic acid fragments, named aptamers, which bind to a molecule (or molecules) of interest via specific structural regions induced by their sequence-dependent fold. This powerful method has applications in designing protein inhibitors, molecular detection systems, therapeutic drugs and antibody replacement among others. However, full understanding and consequently optimal utilization of the process has lagged behind its wide application due to the lack of dedicated computational approaches. At the same time, the combination of SELEX with novel sequencing technologies is beginning to provide the data that will allow the examination of a variety of properties of the selection process. Results: To close this gap we developed Aptamotif, a computational method for the identification of sequence–structure motifs in SELEX-derived aptamers. To increase the chances of identifying functional motifs, Aptamotif uses an ensemble-based approach. We validated the method using two published aptamer datasets containing experimentally determined motifs of increasing complexity. We were able to recreate the authors' findings to a high degree, thus proving the capability of our approach to identify binding motifs in SELEX data. Additionally, using our new experimental dataset, we illustrate the application of Aptamotif to elucidate several properties of the selection process. Contact: przytyck@ncbi.nlm.nih.gov, Zuben.Sauna@fda.hhs.gov PMID:22689764
Characterization of tannase protein sequences of bacteria and fungi: an in silico study.

PubMed

Banerjee, Amrita; Jana, Arijit; Pati, Bikash R; Mondal, Keshab C; Das Mohapatra, Pradeep K

2012-04-01

The tannase protein sequences of 149 bacteria and 36 fungi were retrieved from the NCBI database. Of these, only the 77 bacterial and 31 fungal tannase sequences with different amino acid compositions were retained. These sequences were analysed for physical and chemical properties, superfamily membership, multiple sequence alignment, phylogenetic tree construction and motif finding, in order to identify functional motifs and the evolutionary relationships among them. The superfamily search for these tannases revealed the proline iminopeptidase-like, biotin biosynthesis protein BioH, O-acetyltransferase, carboxylesterase/thioesterase 1, carbon-carbon bond hydrolase, haloperoxidase, prolyl oligopeptidase, C-terminal domain and mycobacterial antigens families and the alpha/beta hydrolase superfamily. Some bacterial and fungal sequences showed similarity with different families individually. The multiple sequence alignment of these tannase protein sequences showed conserved regions at different stretches, with maximum homology at amino acid residues 389-469 and 482-523, which could be used for designing degenerate primers or probes specific for tannase-producing bacterial and fungal species. The phylogenetic tree showed two clusters: one containing only bacteria and another containing both fungi and bacteria, indicating some relationship between these genera. In the second cluster, however, nearly all fungal species were grouped together, which indicates sequence-level similarity among fungal genera. Analysis of the distribution of fourteen motifs revealed that Motif 1, with a signature sequence of 29 amino acids (GCSTGGREALKQAQRWPHDYDGIIANNPA), was uniformly observed in 83.3% of the studied tannase sequences, suggesting its participation in structure and enzymatic function.
The role of symmetry in the regulation of brain dynamics

NASA Astrophysics Data System (ADS)

Tang, Evelyn; Giusti, Chad; Cieslak, Matthew; Grafton, Scott; Bassett, Danielle

Synchronous neural processes regulate a wide range of behaviors from attention to learning. Yet structural constraints on these processes are far from understood. We draw on new theoretical links between structural symmetries and the control of synchronous function, to offer a reconceptualization of the relationships between brain structure and function in human and non-human primates. By classifying 3-node motifs in macaque connectivity data, we find the most prevalent motifs can theoretically ensure a diversity of function including strict synchrony as well as control to arbitrary states. The least prevalent motifs are theoretically controllable to arbitrary states, which may not be desirable in a biological system. In humans, regions with high topological similarity of connections (a continuous notion related to symmetry) are most commonly found in fronto-parietal systems, which may account for their critical role in cognitive control. Collectively, our work underscores the role of symmetry and topological similarity in regulating dynamics of brain function.
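The classification of 3-node motifs mentioned above can be sketched as follows. This is a brute-force illustration under stated assumptions, not the authors' pipeline: every node triple of a directed graph is mapped to a canonical label (the smallest relabeled edge list over all six node orderings), so that isomorphic subgraphs are counted together. The function names and the toy edge list are hypothetical.

```python
from collections import Counter
from itertools import combinations, permutations

def canonical(edges, nodes):
    """Canonical label of a directed subgraph on exactly three nodes.

    The label is the smallest sorted edge tuple over all 6 relabelings,
    so two isomorphic subgraphs receive the same label.
    """
    forms = []
    for perm in permutations(range(3)):
        relabel = dict(zip(nodes, perm))
        forms.append(tuple(sorted((relabel[u], relabel[v]) for u, v in edges)))
    return min(forms)

def count_3node_motifs(edge_list):
    """Count weakly connected 3-node induced subgraphs by isomorphism class."""
    edge_set = set(edge_list)
    nodes = sorted({n for edge in edge_set for n in edge})
    counts = Counter()
    for trio in combinations(nodes, 3):
        sub = [(u, v) for u in trio for v in trio
               if u != v and (u, v) in edge_set]
        # Three nodes are weakly connected iff at least two of the three
        # possible node pairs are linked in some direction.
        if len({frozenset(e) for e in sub}) >= 2:
            counts[canonical(sub, trio)] += 1
    return counts

# Toy graph: a feed-forward loop (0, 1, 2) and a 3-cycle (3, 4, 5).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (5, 3)]
motif_counts = count_3node_motifs(toy_edges)
```

The enumeration is cubic in the number of nodes, which is acceptable for connectivity matrices with a few hundred regions; larger graphs would need a dedicated subgraph-census method.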
Exploration of tetrahedral structures in silicate cathodes using a motif-network scheme

DOE PAGES

Zhao, Xin; Wu, Shunqing; Lv, Xiaobao; ...

2015-10-26

Using a motif-network search scheme, we studied the tetrahedral structures of the dilithium/disodium transition metal orthosilicates A2MSiO4 with A = Li or Na and M = Mn, Fe or Co. In addition to finding all previously reported structures, we discovered many other different tetrahedral-network-based crystal structures which are highly degenerate in energy. In addition, these structures can be classified into structures with 1D, 2D and 3D M-Si-O frameworks. A clear trend of the structural preference in different systems was revealed and possible indicators that affect the structure stabilities were introduced. For the case of Na systems, which have been much less investigated in the literature relative to the Li systems, we predicted their ground state structures and found evidence for the existence of new structural motifs.
High-resolution profiling of linear B-cell epitopes from mucin-associated surface proteins (MASPs) of Trypanosoma cruzi during human infections

PubMed Central

Durante, Ignacio M.; La Spina, Pablo E.; Carmona, Santiago J.; Agüero, Fernán

2017-01-01

Background The Trypanosoma cruzi genome bears a huge family of genes and pseudogenes coding for Mucin-Associated Surface Proteins (MASPs). MASP molecules display a ‘mosaic’ structure, with highly conserved flanking regions and a strikingly variable central and mature domain made up of different combinations of a large repertoire of short sequence motifs. MASP molecules are highly expressed in mammal-dwelling stages of T. cruzi and may be involved in parasite-host interactions and/or in diverting the immune response. Methods/Principle findings High-density microarrays composed of fully overlapped 15mer peptides spanning the entire sequences of 232 non-redundant MASPs (~25% of the total MASP content) were screened with chronic Chagasic sera. This strategy led to the identification of 86 antigenic motifs, each one likely representing a single linear B-cell epitope, which were mapped to 69 different MASPs. These motifs could be further grouped into 31 clusters of structurally- and likely antigenically-related sequences, and fully characterized. In contrast to previous reports, we show that MASP antigenic motifs are restricted to the central and mature region of MASP polypeptides, consistent with their intracellular processing. The antigenicity of these motifs displayed significant positive correlation with their genome dosage and their relative position within the MASP polypeptide. In addition, we verified the biased genetic co-occurrence of certain antigenic motifs within MASP polypeptides, compatible with proposed intra-family recombination events underlying the evolution of their coding genes. Sequences spanning 7 MASP antigenic motifs were further evaluated using distinct synthesis/display approaches and a large panel of serum samples. Overall, the serological recognition of MASP antigenic motifs exhibited a remarkable non normal distribution among the T. cruzi seropositive population, thus reducing their applicability in conventional serodiagnosis. 
As previously observed in in vitro and animal infection models, immune signatures supported the concurrent expression of several MASPs during human infection. Conclusions/Significance In spite of their conspicuous expression and potential roles in parasite biology, this study constitutes the first unbiased, high-resolution profiling of linear B-cell epitopes from T. cruzi MASPs during human infection. PMID:28961244
A GXXXA motif in the transmembrane domain of the Ebola virus glycoprotein is required for tetherin antagonism.

PubMed

González-Hernández, Mariana; Hoffmann, Markus; Brinkmann, Constantin; Nehls, Julia; Winkler, Michael; Schindler, Michael; Pöhlmann, Stefan

2018-04-18

The interferon-induced antiviral host cell protein tetherin can inhibit the release of several enveloped viruses from infected cells. The Ebola virus (EBOV) glycoprotein (GP) antagonizes tetherin but the domains and amino acids in GP that are required for tetherin antagonism have not been fully defined. A GXXXA motif within the transmembrane domain (TMD) of EBOV-GP was previously shown to be important for GP-mediated cellular detachment. Here, we investigated whether this motif also contributes to tetherin antagonism. Mutation of the GXXXA motif did not impact GP expression or particle incorporation and only modestly reduced EBOV-GP-driven entry. In contrast, the GXXXA motif was required for tetherin antagonism in transfected cells. Moreover, alteration of the GXXXA motif increased tetherin-sensitivity of a replication-competent vesicular stomatitis virus (VSV) chimera encoding EBOV-GP. Although these results await confirmation with authentic EBOV, they indicate that a GXXXA motif in the TMD of EBOV-GP is important for tetherin antagonism. Moreover, they provide the first evidence that GP can antagonize tetherin in the context of an infectious EBOV surrogate. IMPORTANCE The glycoprotein (GP) of Ebola virus (EBOV) inhibits the antiviral host cell protein tetherin and may promote viral spread in tetherin-positive cells. However, tetherin antagonism by GP has so far only been demonstrated using virus-like particles and it is unknown whether GP can block tetherin in infected cells. Moreover, a mutation in GP that selectively abrogates tetherin antagonism is unknown. Here, we show that a GXXXA motif in the transmembrane domain of EBOV-GP, which was previously reported to be required for GP-mediated cell rounding, is also important for tetherin counteraction. Moreover, analysis of this mutation in the context of vesicular stomatitis virus chimeras encoding EBOV-GP revealed that GP-mediated tetherin counteraction is operative in infected cells. 
To our knowledge, these findings demonstrate for the first time that GP can antagonize tetherin in infected cells and provide a tool to study the impact of GP-dependent tetherin counteraction on EBOV spread. Copyright © 2018 American Society for Microbiology.
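A GXXXA motif (glycine, any three residues, alanine) can be located in a protein sequence with a short regular-expression scan, sketched below. The helper name and both sequences are hypothetical placeholders; they are not the EBOV-GP transmembrane domain or the mutant used in the study.

```python
import re

def find_gxxxa(seq):
    """0-based start positions of every G-x-x-x-A window, overlaps included."""
    # A lookahead is used so that overlapping windows are all reported.
    return [m.start() for m in re.finditer(r"(?=G[A-Z]{3}A)", seq)]

wild_type = "LLGVVIAGLLAAL"   # hypothetical helix with two GXXXA windows
mutant = "LLLVVILLLLAAL"      # hypothetical variant with the glycines and first alanine replaced by leucine

wt_sites = find_gxxxa(wild_type)
mut_sites = find_gxxxa(mutant)
```

Comparing the two result lists is a quick way to confirm that a designed substitution actually removes every instance of the motif.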
Plasticity of signaling and mate choice in a trilling species of the Mecopoda complex (Orthoptera: Tettigoniidae).

PubMed

Krobath, I; Römer, H; Hartbauer, M

2017-01-01

Males of a trilling species in the Mecopoda complex produce conspicuous calling songs that consist of two motifs: an amplitude-modulated motif with alternating loud and soft segments (AM-motif) and a continuous, high-intensity trill. The function of these song motifs for female attraction and competition between males was investigated. We tested the hypothesis that males modify their signaling behavior depending on the social environment (presence/absence of females or rival males) when they compete for mates. Therefore, we analyzed acoustic signaling of males in three different situations: (1) solo singing, (2) acoustic interaction with another male, and (3) singing in the presence of a female. In addition, the preference of females for these song motifs and further song parameters was studied in two-choice experiments. As expected, females showed a preference for conspicuous and loud song elements, but nevertheless, males increased the proportion of the AM-motif in the presence of a female. In acoustic interactions, males reduced bout duration significantly compared to both other situations. However, song bouts in this situation still overlapped more than expected by chance, which indicates intentionally simultaneous singing. A multivariate statistical analysis revealed that the proportion of the AM-motif and the duration of loud segments within the AM-motif allow a reliable prediction of whether males sing in isolation, compete with another male, or sing in the presence of a female. These results indicate that the AM-motif plays a dominant role especially in close-range courtship and that males are challenged in finding a balance between attracting females and saving energy during repeated acoustic interactions. Males of acoustic insects often produce conspicuous calling songs that have a dual function in male-male competition and mate attraction. High signal amplitudes and signal rates are associated with high energetic costs for signal production. 
We would therefore predict that males adjust their signaling behavior according to their perception of the social context. Here we studied signal production and mate choice in a katydid, where males switch between loud and soft song segments in a dynamic way. Additionally, we examined the attractiveness of different song elements in female choice tests. Our results show how males of this katydid deal with the conflict of remaining attractive for females and competing with a costly signal with rivals.
Blind prediction of noncanonical RNA structure at atomic accuracy.

PubMed

Watkins, Andrew M; Geniesse, Caleb; Kladwang, Wipapat; Zakrevsky, Paul; Jaeger, Luc; Das, Rhiju

2018-05-01

Prediction of RNA structure from nucleotide sequence remains an unsolved grand challenge of biochemistry and requires distinct concepts from protein structure prediction. Despite extensive algorithmic development in recent years, modeling of noncanonical base pairs of new RNA structural motifs has not been achieved in blind challenges. We report a stepwise Monte Carlo (SWM) method with a unique add-and-delete move set that enables predictions of noncanonical base pairs of complex RNA structures. A benchmark of 82 diverse motifs establishes the method's general ability to recover noncanonical pairs ab initio, including multistrand motifs that have been refractory to prior approaches. In a blind challenge, SWM models predicted nucleotide-resolution chemical mapping and compensatory mutagenesis experiments for three in vitro selected tetraloop/receptors with previously unsolved structures (C7.2, C7.10, and R1). As a final test, SWM blindly and correctly predicted all noncanonical pairs of a Zika virus double pseudoknot during a recent community-wide RNA-Puzzle. Stepwise structure formation, as encoded in the SWM method, enables modeling of noncanonical RNA structure in a variety of previously intractable problems.
A sequence-specific transcription activator motif and powerful synthetic variants that bind Mediator using a fuzzy protein interface.

PubMed

Warfield, Linda; Tuttle, Lisa M; Pacheco, Derek; Klevit, Rachel E; Hahn, Steven

2014-08-26

Although many transcription activators contact the same set of coactivator complexes, the mechanism and specificity of these interactions have been unclear. For example, do intrinsically disordered transcription activation domains (ADs) use sequence-specific motifs, or do ADs of seemingly different sequence have common properties that encode activation function? We find that the central activation domain (cAD) of the yeast activator Gcn4 functions through a short, conserved sequence-specific motif. Optimizing the residues surrounding this short motif by inserting additional hydrophobic residues creates very powerful ADs that bind the Mediator subunit Gal11/Med15 with high affinity via a "fuzzy" protein interface. In contrast to Gcn4, the activity of these synthetic ADs is not strongly dependent on any one residue of the AD, and this redundancy is similar to that of some natural ADs in which few if any sequence-specific residues have been identified. The additional hydrophobic residues in the synthetic ADs likely allow multiple faces of the AD helix to interact with the Gal11 activator-binding domain, effectively forming a fuzzier interface than that of the wild-type cAD.

Counting of oligomers in sequences generated by Markov chains for DNA motif discovery.

PubMed

Shan, Gao; Zheng, Wei-Mou

2009-02-01

By means of the technique of the imbedded Markov chain, an efficient algorithm is proposed to exactly calculate the first and second moments of word counts and the probability for a word to occur at least once in random texts generated by a Markov chain. A generating function is introduced directly from the imbedded Markov chain to derive asymptotic approximations for the problem. Two Z-scores, one based on the number of sequences with hits and the other on the total number of word hits in a set of sequences, are examined for discovery of motifs on a set of promoter sequences extracted from the A. thaliana genome. Source code is available at http://www.itp.ac.cn/zheng/oligo.c.
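The two quantities named above can be sketched for a first-order Markov chain as follows. This is an independent illustration and not the authors' oligo.c: the expected count assumes the initial distribution `init` is stationary, and the hit probability is computed by dynamic programming over pairs (matched prefix length, last symbol), which is one way to embed the word-matching automaton in the chain. All function names and parameters are assumptions made for this example.

```python
from collections import defaultdict

def expected_count(word, length, init, trans):
    """Expected number of (overlapping) occurrences of word in a text.

    Assumes init is the stationary distribution of the chain, so every
    position has the same marginal distribution.
    """
    p = init[word[0]]
    for a, b in zip(word, word[1:]):
        p *= trans[a][b]
    return max(0, length - len(word) + 1) * p

def build_automaton(word, alphabet):
    """delta[s][c]: longest prefix of word that is a suffix of word[:s] + c."""
    k = len(word)
    delta = [{} for _ in range(k)]
    for s in range(k):
        for c in alphabet:
            text = word[:s] + c
            j = min(k, len(text))
            while j > 0 and text[-j:] != word[:j]:
                j -= 1
            delta[s][c] = j
    return delta

def prob_at_least_once(word, length, init, trans):
    """Probability that word occurs at least once in a text of given length >= 1."""
    alphabet = sorted(init)
    k = len(word)
    delta = build_automaton(word, alphabet)
    hit = 0.0                      # probability mass that has already matched
    state = defaultdict(float)     # (prefix length, last symbol) -> probability
    for c in alphabet:
        s = delta[0][c]
        if s == k:
            hit += init[c]
        else:
            state[(s, c)] += init[c]
    for _ in range(length - 1):
        nxt = defaultdict(float)
        for (s, a), p in state.items():
            for b in alphabet:
                q = p * trans[a][b]
                t = delta[s][b]
                if t == k:
                    hit += q       # a full match is absorbing
                else:
                    nxt[(t, b)] += q
        state = nxt
    return hit

# Sanity check on a uniform, memoryless chain (a special case of a Markov chain).
uniform = {c: 0.25 for c in "ACGT"}
trans = {c: dict(uniform) for c in "ACGT"}
e = expected_count("ACGT", 100, uniform, trans)
p = prob_at_least_once("AA", 3, uniform, trans)
```

A Z-score of the kind described in the abstract would additionally need the variance of the count, which depends on the word's self-overlaps and is not computed here.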
Mining for class-specific motifs in protein sequence classification

PubMed Central

2013-01-01

Background In protein sequence classification, identification of the sequence motifs or n-grams that can precisely discriminate between classes is a more interesting scientific question than the classification itself. A number of classification methods aim at accurate classification but fail to explain which sequence features indeed contribute to the accuracy. We hypothesize that sequences in lower denominations (n-grams) can be used to explore the sequence landscape and to identify class-specific motifs that discriminate between classes during classification. Discriminative n-grams are short peptide sequences that are highly frequent in one class but are either minimally present or absent in other classes. In this study, we present a new substitution-based scoring function for identifying discriminative n-grams that are highly specific to a class. Results We present a scoring function based on discriminative n-grams that can effectively discriminate between classes. The scoring function, initially, harvests the entire set of 4- to 8-grams from the protein sequences of different classes in the dataset. Similar n-grams of the same size are combined to form new n-grams, where the similarity is defined by positive amino acid substitution scores in the BLOSUM62 matrix. Substitution has resulted in a large increase in the number of discriminatory n-grams harvested. Due to the unbalanced nature of the dataset, the frequencies of the n-grams are normalized using a dampening factor, which gives more weightage to the n-grams that appear in fewer classes and vice-versa. After the n-grams are normalized, the scoring function identifies discriminative 4- to 8-grams for each class that are frequent enough to be above a selection threshold. By mapping these discriminative n-grams back to the protein sequences, we obtained contiguous n-grams that represent short class-specific motifs in protein sequences. 
Our method fared well compared to an existing motif finding method known as Wordspy. We have validated our enriched set of class-specific motifs against the functionally important motifs obtained from the NLSdb, Prosite and ELM databases. We demonstrate that this method is very generic; thus can be widely applied to detect class-specific motifs in many protein sequence classification tasks. Conclusion The proposed scoring function and methodology is able to identify class-specific motifs using discriminative n-grams derived from the protein sequences. The implementation of amino acid substitution scores for similarity detection, and the dampening factor to normalize the unbalanced datasets have significant effect on the performance of the scoring function. Our multipronged validation tests demonstrate that this method can detect class-specific motifs from a wide variety of protein sequence classes with a potential application to detecting proteome-specific motifs of different organisms. PMID:23496846
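The harvesting and normalization steps described above can be sketched as follows. This is a simplified illustration under stated assumptions: the BLOSUM62-based merging of similar n-grams is omitted, and the dampening factor is taken to be the reciprocal of the number of classes in which an n-gram occurs, which is a stand-in because the abstract does not give the exact formula. The function names, class labels and sequences are hypothetical.

```python
from collections import Counter

def ngrams(seq, n_min=4, n_max=8):
    """Yield every contiguous n-gram of seq for n_min <= n <= n_max."""
    for n in range(n_min, n_max + 1):
        for i in range(len(seq) - n + 1):
            yield seq[i:i + n]

def discriminative_ngrams(classes, threshold):
    """Return, per class, the n-grams whose dampened frequency >= threshold.

    classes maps a class name to a list of protein sequences.
    """
    # Fraction of sequences in each class that contain each n-gram.
    freq = {}
    for name, seqs in classes.items():
        counts = Counter()
        for s in seqs:
            counts.update(set(ngrams(s)))
        freq[name] = {g: c / len(seqs) for g, c in counts.items()}
    # Number of classes in which each n-gram appears at all.
    class_count = Counter(g for f in freq.values() for g in f)
    result = {}
    for name, f in freq.items():
        # Assumed dampening: divide by the number of classes sharing the n-gram.
        scored = {g: v / class_count[g] for g, v in f.items()}
        result[name] = {g: s for g, s in scored.items() if s >= threshold}
    return result

toy = {  # hypothetical two-class dataset
    "nuclear": ["MKRRKRAA", "GGKRRKRLL"],
    "other": ["MAAAALLLL", "GGAAAALLV"],
}
selected = discriminative_ngrams(toy, threshold=0.9)
```

Mapping the selected n-grams back onto the sequences and merging overlapping hits would then give the contiguous class-specific motifs the abstract refers to.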
A R/K-rich motif in the C-terminal of the homeodomain is required for complete translocating of NKX2.5 protein into nucleus.

PubMed

Ouyang, Ping; Zhang, He; Fan, Zhaolan; Wei, Pei; Huang, Zhigang; Wang, Sen; Li, Tao

2016-11-05

NKX2.5 plays important roles in heart development. Being a transcription factor, NKX2.5 exerts its biological functions in the nucleus. However, the sequence motif that localizes NKX2.5 to the nucleus is still not clear. Here, we found that an R/K-rich sequence motif from Q187 to R197 (QNRRYKCKRQR) was required for exclusive nuclear localization of NKX2.5. Eight truncated plasmids (E109X, Q149X, Q170X, Q187X, Q198X, Y256X, Y259X, and C264X) associated with congenital heart disease (CHD) were constructed. Compared with wild-type NKX2.5, the proteins E109X, Q149X, Q170X and Q187X, which lack an intact homeodomain (HD), showed no transcriptional activity, while Q198X, Y256X, Y259X and C264X, which have an intact HD, showed 50 to 66% transcriptional activity. E109X, Q149X, Q170X and Q187X localized in both the cytoplasm and the nucleus, whereas Q198X, Y256X, Y259X and C264X localized completely in the nucleus. These results implied that 187QNRRYKCKRQR197 is indispensable for exclusive nuclear localization. Additionally, this sequence motif was highly conserved among human, mouse and rat, indicating that this motif is important for NKX2.5 function. Thus, we concluded that the R/K-rich sequence motif 187QNRRYKCKRQR197 plays a central role in NKX2.5 nuclear localization. Our findings provide a clue to understanding the mechanisms linking the truncated NKX2.5 mutants and CHD. Copyright © 2016 Elsevier B.V. All rights reserved.
Modification of Titanium Substrates with Chimeric Peptides Comprising Antimicrobial and Titanium-Binding Motifs Connected by Linkers To Inhibit Biofilm Formation.

PubMed

Liu, Zihao; Ma, Shiqing; Duan, Shun; Xuliang, Deng; Sun, Yingchun; Zhang, Xi; Xu, Xinhua; Guan, Binbin; Wang, Chao; Hu, Meilin; Qi, Xingying; Zhang, Xu; Gao, Ping

2016-03-02

Bacterial adhesion and biofilm formation are the primary causes of implant-associated infection, which is difficult to eliminate and may induce failure in dental implants. Chimeric peptides with both binding and antimicrobial motifs may provide a promising alternative to inhibit biofilm formation on titanium surfaces. In this study, chimeric peptides were designed by connecting an antimicrobial motif (JH8194: KRLFRRWQWRMKKY) with a binding motif (minTBP-1: RKLPDA) directly or via flexible/rigid linkers to modify Ti surfaces. We evaluated the binding behavior of peptides using quartz crystal microbalance (QCM) and atomic force microscopy (AFM) techniques and investigated the effect of the modification of titanium surfaces with these peptides on the bioactivity of Streptococcus gordonii (S. gordonii) and Streptococcus sanguis (S. sanguis). Compared with the flexible linker (GGGGS), the rigid linker (PAPAP) significantly increased the adsorption of the chimeric peptide on titanium surfaces (p < 0.05). Concentration-dependent adsorption is consistent with a single Langmuir model, whereas time-dependent adsorption is in line with a two-domain Langmuir model. Additionally, the chimeric peptide with the rigid linker exhibited more effective antimicrobial ability than the peptide with the flexible linker. This finding was ascribed to the ability of the rigid linker to separate functional domains and reduce their interference to the maximum extent. Consequently, the performance of chimeric peptides with specific titanium-binding motifs and antimicrobial motifs against bacteria can be optimized by the proper selection of linkers. This rational design of chimeric peptides provides a promising alternative to inhibit the formation of biofilms on titanium surfaces with the potential to prevent peri-implantitis and peri-implant mucositis.
CompariMotif: quick and easy comparisons of sequence motifs.

PubMed

Edwards, Richard J; Davey, Norman E; Shields, Denis C

2008-05-15

CompariMotif is a novel tool for making motif-motif comparisons, identifying and describing similarities between regular expression motifs. CompariMotif can identify a number of different relationships between motifs, including exact matches, variants of degenerate motifs and complex overlapping motifs. Motif relationships are scored using shared information content, allowing the best matches to be easily identified in large comparisons. Many input and search options are available, enabling a list of motifs to be compared to itself (to identify recurring motifs) or to datasets of known motifs. CompariMotif can be run online at http://bioware.ucd.ie/ and is freely available for academic use as a set of open source Python modules under a GNU General Public License from http://bioinformatics.ucd.ie/shields/software/comparimotif/
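The idea of scoring regular-expression motifs by information content can be illustrated with a minimal Python sketch. It is not CompariMotif's code: the function names, the uniform 20-residue background, and the normalisation (a fixed position scores 1.0, a wildcard 0.0) are assumptions made for this example only.

```python
import math
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed uniform 20-letter background


def parse_motif(motif):
    """Split a simple motif into per-position sets of allowed residues.

    Supports fixed residues (K), wildcards (.) and residue classes ([KR]).
    """
    tokens = re.findall(r"\[[A-Z]+\]|[A-Z.]", motif)
    if "".join(tokens) != motif:
        raise ValueError(f"unsupported motif syntax: {motif!r}")
    positions = []
    for tok in tokens:
        if tok == ".":
            positions.append(set(AMINO_ACIDS))
        elif tok.startswith("["):
            positions.append(set(tok[1:-1]))
        else:
            positions.append({tok})
    return positions


def position_ic(residues, alphabet_size=20):
    """Information content of one position, scaled to the range 0..1."""
    return -math.log(len(residues) / alphabet_size, alphabet_size)


def motif_ic(motif):
    """Sum the per-position information content of a motif."""
    return sum(position_ic(p) for p in parse_motif(motif))


if __name__ == "__main__":
    for m in ("R.LR", "[KR].[ST]P"):
        print(m, round(motif_ic(m), 3))
```

A shared-information score between two motifs could be built on top of this by comparing the residue sets position by position, but how CompariMotif aligns and weights overlapping positions is not described in the abstract.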
Retinal phenotype-genotype correlation of pediatric patients expressing mutations in the Norrie disease gene.

PubMed

Wu, Wei-Chi; Drenser, Kimberly; Trese, Michael; Capone, Antonio; Dailey, Wendy

2007-02-01

To correlate the ophthalmic findings of patients with pediatric vitreoretinopathies with mutations occurring in the Norrie disease gene (NDP). One hundred nine subjects with diverse pediatric vitreoretinopathies and 54 control subjects were enrolled in the study. Diagnoses were based on retinal findings at each patient's first examination. Samples of DNA from each patient underwent polymerase chain reaction amplification and direct sequencing of the NDP gene. Eleven male patients expressing mutations in the NDP gene were identified in the test group, whereas the controls demonstrated wild-type NDP. All patients diagnosed as having Norrie disease had mutations in the NDP gene. Four of the patients with Norrie disease had mutations involving a cysteine residue in the cysteine-knot motif. Four patients diagnosed as having familial exudative vitreoretinopathy were found to have noncysteine mutations. One patient with retinopathy of prematurity had a 14-base deletion in the 5' untranslated region (exon 1), and 1 patient with bilateral persistent fetal vasculature syndrome expressed a noncysteine mutation in the second exon. Mutations disrupting the cysteine-knot motif corresponded to severe retinal dysgenesis, whereas patients with noncysteine mutations had varying degrees of avascular peripheral retina, extraretinal vasculature, and subretinal exudate. Patients exhibiting severe retinal dysgenesis should be suspected of carrying a mutation that disrupts the cysteine-knot motif in the NDP gene.
Ciliate telomerase RNA loop IV nucleotides promote hierarchical RNP assembly and holoenzyme stability.

PubMed

Robart, Aaron R; O'Connor, Catherine M; Collins, Kathleen

2010-03-01

Telomerase adds simple-sequence repeats to chromosome 3' ends to compensate for the loss of repeats with each round of genome replication. To accomplish this de novo DNA synthesis, telomerase uses a template within its integral RNA component. In addition to providing the template, the telomerase RNA subunit (TER) also harbors nontemplate motifs that contribute to the specialized telomerase catalytic cycle of reiterative repeat synthesis. Most nontemplate TER motifs function through linkage with the template, but in ciliate and vertebrate telomerases, a stem-loop motif binds telomerase reverse transcriptase (TERT) and reconstitutes full activity of the minimal recombinant TERT+TER RNP, even when physically separated from the template. Here, we resolve the functional requirements for this motif of ciliate TER in physiological RNP context using the Tetrahymena thermophila p65-TER-TERT core RNP reconstituted in vitro and the holoenzyme reconstituted in vivo. Contrary to expectation based on assays of the minimal recombinant RNP, we find that none of a panel of individual loop IV nucleotide substitutions impacts the profile of telomerase product synthesis when reconstituted as physiological core RNP or holoenzyme RNP. However, loop IV nucleotide substitutions do variably reduce assembly of TERT with the p65-TER complex in vitro and reduce the accumulation and stability of telomerase RNP in endogenous holoenzyme context. Our results point to a unifying model of a conformational activation role for this TER motif in the telomerase RNP enzyme.
Children's Drawings About "Radiation"—Before and After Fukushima

NASA Astrophysics Data System (ADS)

Neumann, Susanne; Hopf, Martin

2013-08-01

Although the term "radiation" has a fixed place in everyday life as well as in the media, there is very little empirical research on students' conceptions about this topic. In our study we wanted to find out what students associate with this term. In 2009, we asked 509 students (between grade 4 and grade 6) from seven different schools to draw pictures related to "radiation". This method of children's drawings was supported by short interviews (n = 74). The motifs appearing in the drawings were analysed, and we investigated whether or not the age and the sex of the children had any influence on the choice of motifs. One major result was that the older the students were, the more likely they were to choose sources of invisible radiation (nuclear power plants, mobile phones) as their motifs. Nine months after the tragic events in Fukushima (and at the same time 2 years after the 2009 data collection), we replicated the study. This time, we received 516 drawings from the same schools as in the 2009 study (supported by 33 interviews). This replicative trend study made it possible to compare the choice of motifs and discover possible differences. The results of this analysis showed that the drawings of 2011 included significantly more motifs related to radioactivity. This difference was prevalent in the drawings regardless of sex or age differences. Direct references to the Fukushima accident could be found in both the drawings and interviews.
Members of the Meloidogyne avirulence protein family contain multiple plant ligand-like motifs.

PubMed

Rutter, William B; Hewezi, Tarek; Maier, Tom R; Mitchum, Melissa G; Davis, Eric L; Hussey, Richard S; Baum, Thomas J

2014-08-01

Sedentary plant-parasitic nematodes engage in complex interactions with their host plants by secreting effector proteins. Some effectors of both root-knot nematodes (Meloidogyne spp.) and cyst nematodes (Heterodera and Globodera spp.) mimic plant ligand proteins. Most prominently, cyst nematodes secrete effectors that mimic plant CLAVATA3/ESR-related (CLE) ligand proteins. However, only cyst nematodes have been shown to secrete such effectors and to utilize CLE ligand mimicry in their interactions with host plants. Here, we document the presence of ligand-like motifs in bona fide root-knot nematode effectors that are most similar to CLE peptides from plants and cyst nematodes. We have identified multiple tandem CLE-like motifs conserved within the previously identified Meloidogyne avirulence protein (MAP) family that are secreted from root-knot nematodes and have been shown to function in planta. By searching all 12 MAP family members from multiple Meloidogyne spp., we identified 43 repetitive CLE-like motifs composing 14 unique variants. At least one CLE-like motif was conserved in each MAP family member. Furthermore, we documented the presence of other conserved sequences that resemble the variable domains described in Heterodera and Globodera CLE effectors. These findings document that root-knot nematodes appear to use CLE ligand mimicry and point toward a common host node targeted by two evolutionarily diverse groups of nematodes. As a consequence, it is likely that CLE signaling pathways are important in other phytonematode pathosystems as well.
21st International Conference on DNA Computing and Molecular Programming

DTIC Science & Technology

2016-05-24

ratio [24], which allows plants to ration starch reserves during seasonally changing nights. 28 N. Dalchau et al. We specify the division problem...design of leak-resistant DSD systems. This motif forms the basis of a number of DSD schemes that do not rely on toehold sequestration alone to prevent
RNAfbinv: an interactive Java application for fragment-based design of RNA sequences.

PubMed

Weinbrand, Lina; Avihoo, Assaf; Barash, Danny

2013-11-15

In RNA design problems, it is plausible to assume that the user would be interested in preserving a particular RNA secondary structure motif, or fragment, for biological reasons. The preservation could be in structure or sequence, or both. Thus, the inverse RNA folding problem could benefit from considering fragment constraints. We have developed a new interactive Java application called RNA fragment-based inverse that allows users to insert an RNA secondary structure in dot-bracket notation. It then performs sequence design that conforms to the shape of the input secondary structure, the specified thermodynamic stability, the specified mutational robustness and the user-selected fragment after shape decomposition. In this shape-based design approach, specific RNA structural motifs with known biological functions are strictly enforced, while others can possess more flexibility in their structure in favor of preserving physical attributes and additional constraints. RNAfbinv is freely available for download on the web at http://www.cs.bgu.ac.il/~RNAexinv/RNAfbinv. The site contains a help file with an explanation regarding the exact use.
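The dot-bracket notation mentioned above encodes each base pair as a matching "(" and ")" and each unpaired base as ".". A minimal Python sketch of how such a string can be turned into a list of base pairs follows; the function name is hypothetical and this is not part of RNAfbinv, which is a Java application.

```python
def dot_bracket_pairs(structure):
    """Return the sorted list of (i, j) base pairs in a dot-bracket string.

    Positions are 0-based. Only '(', ')' and '.' are accepted, so
    pseudoknot symbols such as '[' or '{' raise ValueError.
    """
    stack = []   # indices of '(' still waiting for a partner
    pairs = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    # A small hairpin: two stacked pairs closing a three-base loop.
    print(dot_bracket_pairs("..((...)).."))
```

The sketch checks only that brackets balance; it does not enforce biological constraints such as a minimum hairpin loop length.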
Anomalous diffusion in neutral evolution of model proteins.

PubMed

Nelson, Erik D; Grishin, Nick V

2015-06-01

Protein evolution is frequently explored using minimalist polymer models; however, little attention has been given to the problem of structural drift, or diffusion. Here, we study neutral evolution of small protein motifs using an off-lattice heteropolymer model in which individual monomers interact as low-resolution amino acids. In contrast to most earlier models, both the length and folded structure of the polymers are permitted to change. To describe structural change, we compute the mean-square distance (MSD) between monomers in homologous folds separated by n neutral mutations. We find that structural change is episodic, and, averaged over lineages (for example, those extending from a single sequence), exhibits a power-law dependence on n. We show that this exponent depends on the alignment method used, and we analyze the distribution of waiting times between neutral mutations. The latter are more disperse than for models required to maintain a specific fold, but exhibit a similar power-law tail.
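The power-law dependence described in this abstract can be written compactly as below. The exponent symbol \alpha is an assumption introduced here; the abstract does not name the exponent or report its value.

```latex
\mathrm{MSD}(n) \propto n^{\alpha}
\quad\Longleftrightarrow\quad
\log \mathrm{MSD}(n) = \alpha \log n + \text{const}
```

In this form \alpha is the slope of a log-log plot of MSD against the number of neutral mutations n. By analogy with diffusion in time, \alpha = 1 would correspond to ordinary diffusion and \alpha \neq 1 to anomalous diffusion.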
Anomalous diffusion in neutral evolution of model proteins

NASA Astrophysics Data System (ADS)

Nelson, Erik D.; Grishin, Nick V.

2015-06-01

Protein evolution is frequently explored using minimalist polymer models; however, little attention has been given to the problem of structural drift, or diffusion. Here, we study neutral evolution of small protein motifs using an off-lattice heteropolymer model in which individual monomers interact as low-resolution amino acids. In contrast to most earlier models, both the length and folded structure of the polymers are permitted to change. To describe structural change, we compute the mean-square distance (MSD) between monomers in homologous folds separated by n neutral mutations. We find that structural change is episodic, and, averaged over lineages (for example, those extending from a single sequence), exhibits a power-law dependence on n. We show that this exponent depends on the alignment method used, and we analyze the distribution of waiting times between neutral mutations. The latter are more disperse than for models required to maintain a specific fold, but exhibit a similar power-law tail.
Loss of a Tyrosine-Dependent Trafficking Motif in the Simian Immunodeficiency Virus Envelope Cytoplasmic Tail Spares Mucosal CD4 Cells but Does Not Prevent Disease Progression

PubMed Central

Breed, Matthew W.; Jordan, Andrea P. O.; Aye, Pyone P.; Lichtveld, Cornelis F.; Midkiff, Cecily C.; Schiro, Faith R.; Haggarty, Beth S.; Sugimoto, Chie; Alvarez, Xavier; Sandler, Netanya G.; Douek, Daniel C.; Kuroda, Marcelo J.; Pahar, Bapi; Piatak, Michael; Lifson, Jeffrey D.; Keele, Brandon F.; Hoxie, James A.

2013-01-01

A hallmark of pathogenic simian immunodeficiency virus (SIV) and human immunodeficiency virus (HIV) infections is the rapid and near-complete depletion of mucosal CD4+ T lymphocytes from the gastrointestinal tract. Loss of these cells and disruption of epithelial barrier function are associated with microbial translocation, which has been proposed to drive chronic systemic immune activation and disease progression. Here, we evaluate in rhesus macaques a novel attenuated variant of pathogenic SIVmac239, termed ΔGY, which contains a deletion of a Tyr and a proximal Gly from a highly conserved YxxØ trafficking motif in the envelope cytoplasmic tail. Compared to SIVmac239, ΔGY established a comparable acute peak of viremia but only transiently infected lamina propria and caused little or no acute depletion of mucosal CD4+ T cells and no detectable microbial translocation. Nonetheless, these animals developed T-cell activation and declining peripheral blood CD4+ T cells and ultimately progressed with clinical or pathological features of AIDS. ΔGY-infected animals also showed no infection of macrophages or central nervous system tissues even in late-stage disease. Although the ΔGY mutation persisted, novel mutations evolved, including the formation of new YxxØ motifs in two of four animals. These findings indicate that disruption of this trafficking motif by the ΔGY mutation leads to a striking alteration in anatomic distribution of virus with sparing of lamina propria and a lack of microbial translocation. Because these animals exhibited wild-type levels of acute viremia and immune activation, our findings indicate that these pathological events are dissociable and that immune activation unrelated to gut damage can be sufficient for the development of AIDS. PMID:23152518
Swellix: a computational tool to explore RNA conformational space.

PubMed

Sloat, Nathan; Liu, Jui-Wen; Schroeder, Susan J

2017-11-21

The sequence of nucleotides in an RNA determines the possible base pairs for an RNA fold and thus also determines the overall shape and function of an RNA. The Swellix program presented here combines a helix abstraction with a combinatorial approach to the RNA folding problem in order to compute all possible non-pseudoknotted RNA structures for RNA sequences. The Swellix program builds on the Crumple program and can include experimental constraints on global RNA structures such as the minimum number and lengths of helices from crystallography, cryoelectron microscopy, or in vivo crosslinking and chemical probing methods. The conceptual advance in Swellix is to count helices and generate all possible combinations of helices rather than counting and combining base pairs. Swellix bundles similar helices and includes improvements in memory use and efficient parallelization. Biological applications of Swellix are demonstrated by computing the reduction in conformational space and entropy due to naturally modified nucleotides in tRNA sequences and by motif searches in Human Endogenous Retroviral (HERV) RNA sequences. The Swellix motif search reveals occurrences of protein and drug binding motifs in the HERV RNA ensemble that do not occur in minimum free energy or centroid predicted structures. Swellix presents significant improvements over Crumple in terms of efficiency and memory use. The efficient parallelization of Swellix enables the computation of sequences as long as 418 nucleotides with sufficient experimental constraints. Thus, Swellix provides a practical alternative to free energy minimization tools when multiple structures, kinetically determined structures, or complex RNA-RNA and RNA-protein interactions are present in an RNA folding problem.
Complex lasso: new entangled motifs in proteins

NASA Astrophysics Data System (ADS)

Niemyska, Wanda; Dabrowski-Tumanski, Pawel; Kadlof, Michal; Haglund, Ellinor; Sułkowski, Piotr; Sulkowska, Joanna I.

2016-11-01

We identify new entangled motifs in proteins that we call complex lassos. Lassos arise in proteins with disulfide bridges (or in proteins with amide linkages), when termini of a protein backbone pierce through an auxiliary surface of minimal area, spanned on a covalent loop. We find that as many as 18% of all proteins with disulfide bridges in a non-redundant subset of PDB form complex lassos, and classify them into six distinct geometric classes, one of which resembles supercoiling known from DNA. Based on biological classification of proteins we find that lassos are much more common in viruses, plants and fungi than in other kingdoms of life. We also discuss how changes in the oxidation/reduction potential may affect the function of proteins with lassos. Lassos and their associated surfaces of minimal area provide new and interesting geometric characteristics, with many potential applications, not only of proteins but also of other biomolecules.
[Prediction of Promoter Motifs in Virophages].

PubMed

Gong, Chaowen; Zhou, Xuewen; Pan, Yingjie; Wang, Yongjie

2015-07-01

Virophages have crucial roles in ecosystems and are the transport vectors of genetic materials. To shed light on regulation and control mechanisms in virophage-host systems as well as evolution between virophages and their hosts, the promoter motifs of virophages were predicted on the upstream regions of start codons using an analytical tool for prediction of promoter motifs: Multiple EM for Motif Elicitation. Seventeen potential promoter motifs were identified based on the E-value, location, number and length of promoters in genomes. Sputnik and zamilon motif 2 with AT-rich regions were distributed widely on genomes, suggesting that these motifs may be associated with regulation of the expression of various genes. Motifs containing the TCTA box were predicted to be late promoter motifs in mavirus; motifs containing the ATCT box were the potential late promoter motifs in the Ace Lake mavirus. AT-rich regions were identified on motif 2 in the Organic Lake virophage, motif 3 in Yellowstone Lake virophage (YSLV)1 and 2, motif 1 in YSLV3, and motifs 1 and 2 in YSLV4, respectively. AT-rich regions were distributed widely on the genomes of virophages. All of these motifs may be promoter motifs of virophages. Our results provide insights into further exploration of temporal expression of genes in virophages as well as associations between virophages and giant viruses.
Finding functional features in Saccharomyces genomes by phylogenetic footprinting.

PubMed

Cliften, Paul; Sudarsanam, Priya; Desikan, Ashwin; Fulton, Lucinda; Fulton, Bob; Majors, John; Waterston, Robert; Cohen, Barak A; Johnston, Mark

2003-07-04

The sifting and winnowing of DNA sequence that occur during evolution cause nonfunctional sequences to diverge, leaving phylogenetic footprints of functional sequence elements in comparisons of genome sequences. We searched for such footprints among the genome sequences of six Saccharomyces species and identified potentially functional sequences. Comparison of these sequences allowed us to revise the catalog of yeast genes and identify sequence motifs that may be targets of transcriptional regulatory proteins. Some of these conserved sequence motifs reside upstream of genes with similar functional annotations or similar expression patterns or those bound by the same transcription factor and are thus good candidates for functional regulatory sequences.
Optimized mixed Markov models for motif identification

PubMed Central

Huang, Weichun; Umbach, David M; Ohler, Uwe; Li, Leping

2006-01-01

Background: Identifying functional elements, such as transcription factor binding sites, is a fundamental step in reconstructing gene regulatory networks and remains a challenging issue, largely due to limited availability of training samples. Results: We introduce a novel and flexible model, the Optimized Mixture Markov model (OMiMa), and related methods to allow adjustment of model complexity for different motifs. In comparison with other leading methods, OMiMa can incorporate more than the NNSplice's pairwise dependencies; OMiMa avoids model over-fitting better than the Permuted Variable Length Markov Model (PVLMM); and OMiMa requires smaller training samples than the Maximum Entropy Model (MEM). Testing on both simulated and actual data (regulatory cis-elements and splice sites), we found OMiMa's performance superior to the other leading methods in terms of prediction accuracy, required size of training data or computational time. Our OMiMa system, to our knowledge, is the only motif finding tool that incorporates automatic selection of the best model. OMiMa is freely available at [1]. Conclusion: Our optimized mixture of Markov models represents an alternative to the existing methods for modeling dependent structures within a biological motif. Our model is conceptually simple and effective, and can improve prediction accuracy and/or computational speed over other leading methods. PMID:16749929
Functional analysis of a viroid RNA motif mediating cell-to-cell movement in Nicotiana benthamiana.

PubMed

Jiang, Dongmei; Wang, Meng; Li, Shifang

2017-01-01

Cell-to-cell trafficking through different cellular layers is a key process for various RNAs including those of plant viruses and viroids, but the regulatory mechanisms involved are still not fully elucidated and good model systems are important. Here, we analyse the function of a simple RNA motif (termed 'loop19') in potato spindle tuber viroid (PSTVd) which is required for trafficking in Nicotiana benthamiana leaves. Northern blotting, reverse transcriptase PCR (RT-PCR) and in situ hybridization analyses demonstrated that unlike wild-type PSTVd, which was present in the nuclei in all cell types, the trafficking-defective loop19 mutants were visible only in the nuclei of upper epidermal and palisade mesophyll cells, which shows that PSTVd loop19 plays a role in mediating RNA trafficking from palisade to spongy mesophyll cells in N. benthamiana leaves. Our findings and approaches have broad implications for studying the RNA motifs mediating trafficking of RNAs across specific cellular boundaries in other biological systems.

Regulation of amyloid precursor protein processing by its KFERQ motif.

PubMed

Park, Ji-Seon; Kim, Dong-Hou; Yoon, Seung-Yong

2016-06-01

Understanding of trafficking, processing, and degradation mechanisms of amyloid precursor protein (APP) is important because APP can be processed to produce β-amyloid (Aβ), a key pathogenic molecule in Alzheimer's disease (AD). Here, we found that APP contains a KFERQ motif at its C-terminus, a consensus sequence for chaperone-mediated autophagy (CMA) or microautophagy, which are other types of autophagy for degradation of pathogenic molecules in neurodegenerative diseases. Deletion of KFERQ in APP increased C-terminal fragments (CTFs) and secreted N-terminal fragments of APP and kept it away from lysosomes. KFERQ deletion did not abolish the interaction of APP or its cleaved products with heat shock cognate protein 70 (Hsc70), a protein necessary for CMA or microautophagy. These findings suggest that the KFERQ motif is important for normal processing and degradation of APP to preclude the accumulation of APP-CTFs, although it may not be important for CMA or microautophagy. [BMB Reports 2016; 49(6): 337-342].
Transcutaneous immunization with tetanus toxoid and mutants of Escherichia coli heat-labile enterotoxin as adjuvants elicits strong protective antibody responses.

PubMed

Tierney, Rob; Beignon, Anne-Sophie; Rappuoli, Rino; Muller, Sylviane; Sesardic, Dorothea; Partidos, Charalambos D

2003-09-01

In this study, the adjuvanticity of 2 nontoxic derivatives (LTK63 and LTR72) of heat-labile enterotoxin of Escherichia coli (LT) was evaluated and was compared with that of a cytosine phosphodiester-guanine (CpG) motif, after transcutaneous immunization with tetanus toxoid (TT). TT plus LTR72 elicited the strongest antibody responses, compared with those elicited by the other vaccines (TT, TT plus LTK63, TT plus CpG, and TT plus LTK63 plus CpG); it neutralized the toxin and conferred full protection after passive transfer in mice. Preexisting immunity to LT mutants did not adversely affect their adjuvant potency. Both LTK63 and LTR72 promoted the induction of IgG1 antibodies. In contrast, mice receiving either CpG motif alone or CpG motif plus LTK63 produced strong IgG2a anti-TT antibody responses. Overall, these findings demonstrate that mutants of enterotoxins with reduced toxicity are effective adjuvants for transcutaneous immunization.
Motif-Based Text Mining of Microbial Metagenome Redundancy Profiling Data for Disease Classification.

PubMed

Wang, Yin; Li, Rudong; Zhou, Yuhua; Ling, Zongxin; Guo, Xiaokui; Xie, Lu; Liu, Lei

2016-01-01

Text data of 16S rRNA are informative for classifications of microbiota-associated diseases. However, the raw text data need to be systematically processed so that features for classification can be defined/extracted; moreover, the high-dimension feature spaces generated by the text data also pose an additional difficulty. Here we present a Phylogenetic Tree-Based Motif Finding algorithm (PMF) to analyze 16S rRNA text data. By integrating phylogenetic rules and other statistical indexes for classification, we can effectively reduce the dimension of the large feature spaces generated by the text datasets. Using the retrieved motifs in combination with common classification methods, we can discriminate different samples of both pneumonia and dental caries better than other existing methods. We extend the phylogenetic approaches to perform supervised learning on microbiota text data to discriminate the pathological states for pneumonia and dental caries. The results have shown that PMF may enhance the efficiency and reliability in analyzing high-dimension text data.
Factoring local sequence composition in motif significance analysis.

PubMed

Ng, Patrick; Keich, Uri

2008-01-01

We recently introduced a biologically realistic and reliable significance analysis of the output of a popular class of motif finders. In this paper we further improve our significance analysis by incorporating local base composition information. Relying on realistic biological data simulation, as well as on FDR analysis applied to real data, we show that our method is significantly better than the increasingly popular practice of using the normal approximation to estimate the significance of a finder's output. Finally we turn to leveraging our reliable significance analysis to improve the actual motif finding task. Specifically, endowing a variant of the Gibbs Sampler with our improved significance analysis we demonstrate that de novo finders can perform better than has been perceived. Significantly, our new variant outperforms all the finders reviewed in a recently published comprehensive analysis of the Harbison genome-wide binding location data. Interestingly, many of these finders incorporate additional information such as nucleosome positioning and the significance of binding data.
External lipid PI3P mediates entry of eukaryotic pathogen effectors into plant and animal host cells.

PubMed

Kale, Shiv D; Gu, Biao; Capelluto, Daniel G S; Dou, Daolong; Feldman, Emily; Rumore, Amanda; Arredondo, Felipe D; Hanlon, Regina; Fudal, Isabelle; Rouxel, Thierry; Lawrence, Christopher B; Shan, Weixing; Tyler, Brett M

2010-07-23

Pathogens of plants and animals produce effector proteins that are transferred into the cytoplasm of host cells to suppress host defenses. One type of plant pathogens, oomycetes, produces effector proteins with N-terminal RXLR and dEER motifs that enable entry into host cells. We show here that effectors of another pathogen type, fungi, contain functional variants of the RXLR motif, and that the oomycete and fungal RXLR motifs enable binding to the phospholipid, phosphatidylinositol-3-phosphate (PI3P). We find that PI3P is abundant on the outer surface of plant cell plasma membranes and, furthermore, on some animal cells. All effectors could also enter human cells, suggesting that PI3P-mediated effector entry may be very widespread in plant, animal and human pathogenesis. Entry into both plant and animal cells involves lipid raft-mediated endocytosis. Blocking PI3P binding inhibited effector entry, suggesting new therapeutic avenues. Copyright 2010 Elsevier Inc. All rights reserved.
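As a small illustration of what searching for an N-terminal RXLR motif involves, the Python sketch below scans the start of a protein sequence for arginine, any residue, leucine, arginine. The function name, the 100-residue window, and the example sequence are hypothetical choices for this sketch; the abstract does not define where the N-terminal region ends, and the fungal RXLR variants it mentions are not covered by this pattern.

```python
import re

# R, any residue, L, R. The lookahead lets overlapping matches be reported.
RXLR_PATTERN = re.compile(r"(?=(R[A-Z]LR))")


def find_rxlr(protein, n_terminal_window=100):
    """Return (position, match) tuples for RXLR hits near the N-terminus.

    Positions are 0-based. n_terminal_window is an assumed cutoff,
    not a value taken from the abstract.
    """
    region = protein[:n_terminal_window]
    return [(m.start(), m.group(1)) for m in RXLR_PATTERN.finditer(region)]


if __name__ == "__main__":
    made_up_sequence = "MKLSRALRTTRFLRQ"  # invented for illustration only
    print(find_rxlr(made_up_sequence))
```

A sequence match alone says nothing about function; the study summarised above relied on binding and cell-entry experiments to establish that the motifs are functional.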
Analysis of transactivation potential of rice (Oryza sativa L.) heat shock factors.

PubMed

Lavania, Dhruv; Dhingra, Anuradha; Grover, Anil

2018-06-01

Based on yeast one-hybrid assays, we show that the presence of C-terminal AHA motifs is not a prerequisite for transactivation potential in rice heat shock factors. Transcriptional activation or transactivation (TA) of heat stress responsive genes takes place by binding of heat shock factors (Hsfs) to heat shock elements. Analysis of TA potential of thirteen rice (Oryza sativa L.) Hsfs (OsHsfs) carried out in this study by yeast one-hybrid assay showed that OsHsfA3 possesses strong TA potential while OsHsfs A1a, A2a, A2b, A4a, A4d, A5, A7b, B1, B2a, B2b, B2c and B4d lack TA potential. From a near complete picture of TA potential of the OsHsf family (comprising 25 members) emerging from this study and an earlier report from our group (Mittal et al. in FEBS J 278(17):3076-3085, 2011), it is concluded that (1) overall, six OsHsfs, namely A3, A6a, A6b, A8, C1a and C1b possess TA potential; (2) four class A OsHsfs, namely A3, A6a, A6b and A8 have TA potential, out of which A6a and A6b contain AHA motifs while A3 and A8 lack AHA motifs; (3) nine class A OsHsfs, namely A1a, A2a, A2b, A2e, A4a, A4d, A5, A7a and A7b containing AHA motif(s) lack TA function in the yeast assay system; (4) all class B OsHsfs lack AHA motifs and TA potential (B4a not analyzed) and (5) though all class C OsHsf members lack AHA motifs, two members C1a and C1b possess TA function, while one member C2a lacks TA potential (C2b not analyzed). Thus, the presence or absence of the AHA motif is possibly not the only factor determining TA potential of OsHsfs. Our findings will help to identify the transcriptional activators of rice heat shock response.
The Fantastic Tale for Children: Its Literary and Educational Problems.

ERIC Educational Resources Information Center

Klingberg, Goete

1967-01-01

The fantastic tale is a genre of children's literature in which magic and reality are found side by side in a superficially plausible story with a definite historical setting. Motifs characteristic of the tale are living toys, strange children, modern witches, space and time displacements, "doors" to the wonderland, the mythical world itself, and…
Rapid motif compliance scoring with match weight sets.

PubMed

Venezia, D; O'Hara, P J

1993-02-01

Most current implementations of motif matching in biological sequences have sacrificed the generality of weight matrix scoring for shorter runtimes. The program MOTIF incorporates a weight matrix and a rapid, backtracking tree-search algorithm to score motif compliance with greatly enhanced performance while placing no constraints on the motif. In addition, any positions within a motif can be marked as 'inviolate', thereby requiring an exact match. MOTIF allows a choice of regular expression formats and can use both motif and sequence libraries as either targets or queries. Nucleic acid sequences can optionally be translated by MOTIF in any frame(s) and used against peptide motifs.
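Weight-matrix scoring with 'inviolate' positions, as described above, can be sketched with a plain sliding-window scan in Python. This is an illustration only: it does not reproduce the MOTIF program's backtracking tree-search, and the function name, the data layout (one dict of residue weights per motif position), and the example weights are assumptions.

```python
def scan_weight_matrix(sequence, matrix, threshold, inviolate=None):
    """Score every window of a sequence against a weight matrix.

    matrix    -- list of dicts, one per motif position, residue -> weight;
                 residues missing from a column score 0
    threshold -- minimum total score for a window to be reported
    inviolate -- optional dict, motif position -> residue that must match
                 exactly; windows violating it are skipped
    Returns a list of (start, window, score) tuples, 0-based starts.
    """
    inviolate = inviolate or {}
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Exact-match positions are checked before any scoring is done.
        if any(window[pos] != res for pos, res in inviolate.items()):
            continue
        score = sum(col.get(ch, 0) for col, ch in zip(matrix, window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    # Invented three-position matrix for a DNA example.
    weights = [{"A": 2, "C": 0}, {"C": 2, "G": 1}, {"G": 2}]
    print(scan_weight_matrix("TACGAGG", weights, threshold=5))
    print(scan_weight_matrix("TACGAGG", weights, threshold=5,
                             inviolate={1: "C"}))
```

The scan costs time proportional to sequence length times motif width. A tree search of the kind the abstract describes would presumably prune windows early instead of scoring each one in full, which this sketch does not attempt.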
WebMOTIFS: automated discovery, filtering and scoring of DNA sequence motifs using multiple programs and Bayesian approaches

PubMed Central

Romer, Katherine A.; Kayombya, Guy-Richard; Fraenkel, Ernest

2007-01-01

WebMOTIFS provides a web interface that facilitates the discovery and analysis of DNA-sequence motifs. Several studies have shown that the accuracy of motif discovery can be significantly improved by using multiple de novo motif discovery programs and using randomized control calculations to identify the most significant motifs or by using Bayesian approaches. WebMOTIFS makes it easy to apply these strategies. Using a single submission form, users can run several motif discovery programs and score, cluster and visualize the results. In addition, the Bayesian motif discovery program THEME can be used to determine the class of transcription factors that is most likely to regulate a set of sequences. Input can be provided as a list of gene or probe identifiers. Used with the default settings, WebMOTIFS accurately identifies biologically relevant motifs from diverse data in several species. WebMOTIFS is freely available at http://fraenkel.mit.edu/webmotifs. PMID:17584794
The Reconstruction of Condition-Specific Transcriptional Modules Provides New Insights in the Evolution of Yeast AP-1 Proteins

PubMed Central

Goudot, Christel; Etchebest, Catherine

2011-01-01

AP-1 proteins are transcription factors (TFs) that belong to the basic leucine zipper family, one of the largest families of TFs in eukaryotic cells. Despite high homology between their DNA binding domains, these proteins are able to recognize diverse DNA motifs. In yeasts, these motifs are referred to as YRE (Yap Response Element) and are either seven (YRE-Overlap) or eight (YRE-Adjacent) base pairs long. It has been proposed that the AP-1 DNA binding motif preference relies on a single change in the amino acid sequence of the yeast AP-1 TFs (an arginine in the YRE-O binding factors being replaced by a lysine in the YRE-A binding Yaps). We developed a computational approach to infer condition-specific transcriptional modules associated with the orthologous AP-1 proteins Yap1p, Cgap1p and Cap1p, in three yeast species: the model yeast Saccharomyces cerevisiae and two pathogenic species Candida glabrata and Candida albicans. Exploitation of these modules in terms of predictions of the protein/DNA regulatory interactions changed our vision of AP-1 protein evolution. Cis-regulatory motif analyses revealed the presence of a conserved adenine in 5′ position of the canonical YRE sites. While Yap1p, Cgap1p and Cap1p shared a remarkably low number of target genes, an impressive conservation was observed in the YRE sequences identified by Yap1p and Cap1p. In Candida glabrata, we found that Cgap1p, unlike Yap1p and Cap1p, recognizes YRE-O and YRE-A motifs. These findings were supported by structural data available for the transcription factor Pap1p (Schizosaccharomyces pombe). Thus, whereas arginine and lysine substitutions in Cgap1p and Yap1p proteins were reported as responsible for a specific YRE-O or YRE-A preference, our analyses rather suggest that the ancestral yeast AP-1 protein could recognize both YRE-O and YRE-A motifs and that the arginine/lysine exchange is not the only determinant of the specialization of modern Yaps for one motif or another. PMID:21695268
Function-based classification of carbohydrate-active enzymes by recognition of short, conserved peptide motifs.

PubMed

Busk, Peter Kamp; Lange, Lene

2013-06-01

Functional prediction of carbohydrate-active enzymes is difficult due to low sequence identity. However, similar enzymes often share a few short motifs, e.g., around the active site, even when the overall sequences are very different. To exploit this notion for functional prediction of carbohydrate-active enzymes, we developed a simple algorithm, peptide pattern recognition (PPR), that can divide proteins into groups of sequences that share a set of short conserved sequences. When this method was used on 118 glycoside hydrolase 5 proteins with 9% average pairwise identity and representing four characterized enzymatic functions, 97% of the proteins were sorted into groups correlating with their enzymatic activity. Furthermore, we analyzed 8,138 glycoside hydrolase 13 proteins including 204 experimentally characterized enzymes with 28 different functions. There was a 91% correlation between group and enzyme activity. These results indicate that the function of carbohydrate-active enzymes can be predicted with high precision by finding short, conserved motifs in their sequences. The glycoside hydrolase 61 family is important for fungal biomass conversion, but only a few proteins of this family have been functionally characterized. Interestingly, PPR divided 743 glycoside hydrolase 61 proteins into 16 subfamilies useful for targeted investigation of the function of these proteins and pinpointed three conserved motifs with putative importance for enzyme activity. Furthermore, the conserved sequences were useful for cloning of new, subfamily-specific glycoside hydrolase 61 proteins from 14 fungi. In conclusion, identification of conserved sequence motifs is a new approach to sequence analysis that can predict carbohydrate-active enzyme functions with high precision.
Loads Bias Genetic and Signaling Switches in Synthetic and Natural Systems

PubMed Central

Medford, June; Prasad, Ashok

2014-01-01

Biological protein interaction networks such as signal transduction or gene transcription networks are often treated as modular, allowing motifs to be analyzed in isolation from the rest of the network. Modularity is also a key assumption in synthetic biology, where it is similarly expected that when network motifs are combined, they do not lose their essential characteristics. However, the interactions that a network module has with downstream elements change the dynamical equations describing the upstream module and thus may change the dynamic and static properties of the upstream circuit even without explicit feedback. In this work we analyze the behavior of a ubiquitous motif in gene transcription and signal transduction circuits: the switch. We show that adding a downstream component to the simple genetic toggle switch changes its dynamical properties by changing the underlying potential energy landscape and skewing it in favor of the unloaded side, and that in some situations adding loads to the genetic switch can also abrogate bistable behavior. We find that an additional positive feedback motif found in naturally occurring toggle switches could tune the potential energy landscape in a desirable manner. We also analyze autocatalytic signal transduction switches and show that a ubiquitous positive feedback switch can lose its switch-like properties when connected to a downstream load. Our analysis underscores the necessity of incorporating the effects of downstream components when understanding the physics of biochemical network motifs, and raises the question as to how these effects are managed in real biological systems. This analysis is particularly important when scaling synthetic networks to more complex organisms. PMID:24676102
Genome-Wide Prediction of the Polymorphic Ser Gene Family in Tetrahymena thermophila Based on Motif Analysis

PubMed Central

Ponsuwanna, Patrath; Kümpornsin, Krittikorn; Chookajorn, Thanat

2014-01-01

Even though antigenic variation is employed among parasitic protozoa for host immune evasion, Tetrahymena thermophila, a free-living ciliate, can also change its surface protein antigens. These cysteine-rich glycosylphosphatidylinositol (GPI)-linked surface proteins are encoded by a family of polymorphic Ser genes. Despite the availability of the T. thermophila genome, a comprehensive analysis of the Ser family is limited by its high degree of polymorphism. In order to overcome this problem, a new approach was adopted by searching for Ser candidates with common motif sequences, namely a length-specific repetitive cysteine pattern and a GPI anchor site. The candidate genes were phylogenetically compared with the previously identified Ser genes and classified into subtypes. Ser candidates were often found to be located as tandem arrays of the same subtypes on several chromosomal scaffolds. Certain Ser candidates located in the same chromosomal arrays were transcriptionally expressed at specific T. thermophila developmental stages. These Ser candidates selected by the motif analysis approach can form the foundation for a systematic identification of the entire Ser gene family, which will contribute to the understanding of their function and the basis of T. thermophila antigenic variation. PMID:25133747
De novo truncating variants in the AHDC1 gene encoding the AT-hook DNA-binding motif-containing protein 1 are associated with intellectual disability and developmental delay.

PubMed

Yang, Hui; Douglas, Ganka; Monaghan, Kristin G; Retterer, Kyle; Cho, Megan T; Escobar, Luis F; Tucker, Megan E; Stoler, Joan; Rodan, Lance H; Stein, Diane; Marks, Warren; Enns, Gregory M; Platt, Julia; Cox, Rachel; Wheeler, Patricia G; Crain, Carrie; Calhoun, Amy; Tryon, Rebecca; Richard, Gabriele; Vitazka, Patrik; Chung, Wendy K

2015-10-01

Whole-exome sequencing (WES) represents a significant breakthrough in clinical genetics, and identifies a genetic etiology in up to 30% of cases of intellectual disability (ID). Using WES, we identified seven unrelated patients with a similar clinical phenotype of severe intellectual disability or neurodevelopmental delay who were all heterozygous for de novo truncating variants in the AT-hook DNA-binding motif-containing protein 1 (AHDC1). The patients were all minimally verbal or nonverbal and had variable neurological problems including spastic quadriplegia, ataxia, nystagmus, seizures, autism, and self-injurious behaviors. Additional common clinical features include dysmorphic facial features and feeding difficulties associated with failure to thrive and short stature. The AHDC1 gene has only one coding exon, and the protein contains conserved regions including AT-hook motifs and a PDZ binding domain. We postulate that all seven variants detected in these patients result in a truncated protein missing critical functional domains, disrupting interactions with other proteins important for brain development. Our study demonstrates that truncating variants in AHDC1 are associated with ID and are primarily associated with a neurodevelopmental phenotype.
Combinatorial action of Grainyhead, Extradenticle and Notch in regulating Hox mediated apoptosis in Drosophila larval CNS

PubMed Central

Khandelwal, Risha; Govinda Rajan, Sriivatsan; Kumar, Raviranjan

2017-01-01

Hox mediated neuroblast apoptosis is a prevalent way to pattern the larval central nervous system (CNS) by different Hox genes, but the mechanism of this apoptosis is not understood. Our studies with Abdominal-A (Abd-A) mediated larval neuroblast (pNB) apoptosis suggest that AbdA, its cofactor Extradenticle (Exd), a helix-loop-helix transcription factor Grainyhead (Grh), and Notch signaling transcriptionally contribute to expression of the RHG family of apoptotic genes. We find that Grh, AbdA, and Exd function together at multiple motifs on the apoptotic enhancer. In vivo mutagenesis of these motifs suggests that they are important for the maintenance of the activity of the enhancer rather than its initiation. We also find that Exd function is independent of its known partner homothorax in this apoptosis. We extend some of our findings to the Deformed expressing region of the sub-esophageal ganglia, where pNBs undergo a similar Hox dependent apoptosis. We propose a mechanism where common players like Exd-Grh-Notch work with different Hox genes through region specific enhancers to pattern respective segments of the larval central nervous system. PMID:29023471
Combinatorial action of Grainyhead, Extradenticle and Notch in regulating Hox mediated apoptosis in Drosophila larval CNS.

PubMed

Khandelwal, Risha; Sipani, Rashmi; Govinda Rajan, Sriivatsan; Kumar, Raviranjan; Joshi, Rohit

2017-10-01

Hox mediated neuroblast apoptosis is a prevalent way to pattern the larval central nervous system (CNS) by different Hox genes, but the mechanism of this apoptosis is not understood. Our studies with Abdominal-A (Abd-A) mediated larval neuroblast (pNB) apoptosis suggest that AbdA, its cofactor Extradenticle (Exd), a helix-loop-helix transcription factor Grainyhead (Grh), and Notch signaling transcriptionally contribute to expression of the RHG family of apoptotic genes. We find that Grh, AbdA, and Exd function together at multiple motifs on the apoptotic enhancer. In vivo mutagenesis of these motifs suggests that they are important for the maintenance of the activity of the enhancer rather than its initiation. We also find that Exd function is independent of its known partner homothorax in this apoptosis. We extend some of our findings to the Deformed expressing region of the sub-esophageal ganglia, where pNBs undergo a similar Hox dependent apoptosis. We propose a mechanism where common players like Exd-Grh-Notch work with different Hox genes through region specific enhancers to pattern respective segments of the larval central nervous system.
Purifying Selection on Exonic Splice Enhancers in Intronless Genes

PubMed Central

Savisaar, Rosina; Hurst, Laurence D.

2016-01-01

Exonic splice enhancers (ESEs) are short nucleotide motifs, enriched near exon ends, that enhance the recognition of the splice site and thus promote splicing. Are intronless genes under selection to avoid these motifs so as not to attract the splicing machinery to an mRNA that should not be spliced, thereby preventing the production of an aberrant transcript? Consistent with this possibility, we find that ESEs in putative recent retrocopies are at a higher density and evolving faster than those in other intronless genes, suggesting that they are being lost. Moreover, intronless genes are less dense in putative ESEs than intron-containing ones. However, this latter difference is likely due to the skewed base composition of intronless sequences, a skew that is in line with the general GC richness of few exon genes. Indeed, after controlling for such biases, we find that both intronless and intron-containing genes are denser in ESEs than expected by chance. Importantly, nucleotide-controlled analysis of evolutionary rates at synonymous sites in ESEs indicates that the ESEs in intronless genes are under purifying selection in both human and mouse. We conclude that on the loss of introns, some but not all, ESE motifs are lost, the remainder having functions beyond a role in splice promotion. These results have implications for the design of intronless transgenes and for understanding the causes of selection on synonymous sites. PMID:26802218
Structural and Functional Investigations of the N-Terminal Ubiquitin Binding Region of Usp25.

PubMed

Yang, Yuanyuan; Shi, Li; Ding, Yiluan; Shi, Yanhong; Hu, Hong-Yu; Wen, Yi; Zhang, Naixia

2017-05-23

Ubiquitin-specific protease 25 (Usp25) is a deubiquitinase that is involved in multiple biological processes. The N-terminal ubiquitin-binding region (UBR) of Usp25 contains one ubiquitin-associated domain, one small ubiquitin-like modifier (SUMO)-interacting motif and two ubiquitin-interacting motifs. Previous studies suggest that the covalent sumoylation in the UBR of Usp25 impairs its enzymatic activity. Here, we raise the hypothesis that non-covalent binding of SUMO, a prerequisite for efficient sumoylation, will impair Usp25's catalytic activity as well. To test our hypothesis and elucidate the underlying molecular mechanism, we investigated the structure and function of the Usp25 N-terminal UBR. The solution structure of Usp25 1-146 is obtained, and the key residues responsible for recognition of ubiquitin and SUMO2 are identified. Our data suggest inhibition of Usp25's catalytic activity upon the non-covalent binding of SUMO2 to the Usp25 SUMO-interacting motif. We also find that SUMO2 can competitively block the interaction between the Usp25 UBR and its ubiquitin substrates. Based on our findings, we have proposed a working model to depict the regulatory role of the Usp25 UBR in the functional display of the enzyme. Copyright © 2017 Biophysical Society. Published by Elsevier Inc. All rights reserved.
Building a dictionary for genomes: Identification of presumptive regulatory sites by statistical analysis

PubMed Central

Bussemaker, Harmen J.; Li, Hao; Siggia, Eric D.

2000-01-01

The availability of complete genome sequences and mRNA expression data for all genes creates new opportunities and challenges for identifying DNA sequence motifs that control gene expression. An algorithm, “MobyDick,” is presented that decomposes a set of DNA sequences into the most probable dictionary of motifs or words. This method is applicable to any set of DNA sequences: for example, all upstream regions in a genome or all genes expressed under certain conditions. Identification of words is based on a probabilistic segmentation model in which the significance of longer words is deduced from the frequency of shorter ones of various lengths, eliminating the need for a separate set of reference data to define probabilities. We have built a dictionary with 1,200 words for the 6,000 upstream regulatory regions in the yeast genome; the 500 most significant words (some with as few as 10 copies in all of the upstream regions) match 114 of 443 experimentally determined sites (a significance level of 18 standard deviations). When analyzing all of the genes up-regulated during sporulation as a group, we find many motifs in addition to the few previously identified by analyzing the expression subclusters individually. Applying MobyDick to the genes derepressed when the general repressor Tup1 is deleted, we find known as well as putative binding sites for its regulatory partners. PMID:10944202
Motivated Proteins: A web application for studying small three-dimensional protein motifs

PubMed Central

Leader, David P; Milner-White, E James

2009-01-01

Background: Small loop-shaped motifs are common constituents of the three-dimensional structure of proteins. Typically they comprise between three and seven amino acid residues, and are defined by a combination of dihedral angles and hydrogen bonding partners. The most abundant of these are αβ-motifs, asx-motifs, asx-turns, β-bulges, β-bulge loops, β-turns, nests, niches, Schellmann loops, ST-motifs, ST-staples and ST-turns. We have constructed a database of such motifs from a range of high-quality protein structures and built a web application as a visual interface to this. Description: The web application, Motivated Proteins, provides access to these 12 motifs (with 48 sub-categories) in a database of over 400 representative proteins. Queries can be made for specific categories or sub-categories of motif, motifs in the vicinity of ligands, motifs which include part of an enzyme active site, overlapping motifs, or motifs which include a particular amino acid sequence. Individual proteins can be specified, or, where appropriate, motifs for all proteins listed. The results of queries are presented in textual form as an (X)HTML table, and may be saved as parsable plain text or XML. Motifs can be viewed and manipulated either individually or in the context of the protein in the Jmol applet structural viewer. Cartoons of the motifs imposed on a linear representation of protein secondary structure are also provided. Summary information for the motifs is available, as are histograms of amino acid distribution, and graphs of dihedral angles at individual positions in the motifs. Conclusion: Motivated Proteins is a publicly and freely accessible web application that enables protein scientists to study small three-dimensional motifs without requiring knowledge of either Structured Query Language or the underlying database schema. PMID:19210785

A computational approach to candidate gene prioritization for X-linked mental retardation using annotation-based binary filtering and motif-based linear discriminatory analysis

PubMed Central

2011-01-01

Background: Several computational candidate gene selection and prioritization methods have recently been developed. These in silico selection and prioritization techniques are usually based on two central approaches: the examination of similarities to known disease genes and/or the evaluation of functional annotation of genes. Each of these approaches has its own caveats. Here we employ a previously described method of candidate gene prioritization based mainly on gene annotation, together with a technique based on the evaluation of pertinent sequence motifs or signatures, in an attempt to refine the gene prioritization approach. We apply this approach to X-linked mental retardation (XLMR), a group of heterogeneous disorders for which some of the underlying genetics is known. Results: The gene annotation-based binary filtering method yielded a ranked list of putative XLMR candidate genes with good plausibility of being associated with the development of mental retardation. In parallel, a motif finding approach based on linear discriminatory analysis (LDA) was employed to identify short sequence patterns that may discriminate XLMR from non-XLMR genes. High rates (>80%) of correct classification were achieved, suggesting that the identification of these motifs effectively captures genomic signals associated with XLMR vs. non-XLMR genes. The computational tools developed for the motif-based LDA are integrated into the freely available genomic analysis portal Galaxy (http://main.g2.bx.psu.edu/). Nine genes (APLN, ZC4H2, MAGED4, MAGED4B, RAP2C, FAM156A, FAM156B, TBL1X, and UXT) were highlighted as highly ranked XLMR candidates. Conclusions: The combination of gene annotation information and sequence motif-orientated computational candidate gene prediction methods highlights an added benefit in generating a list of plausible candidate genes, as has been demonstrated for XLMR.
Reviewers: This article was reviewed by Dr Barbara Bardoni (nominated by Prof Juergen Brosius); Prof Neil Smalheiser and Dr Dustin Holloway (nominated by Prof Charles DeLisi). PMID:21668950
Identification of early zygotic genes in the yellow fever mosquito Aedes aegypti and discovery of a motif involved in early zygotic genome activation.

PubMed

Biedler, James K; Hu, Wanqi; Tae, Hongseok; Tu, Zhijian

2012-01-01

During early embryogenesis the zygotic genome is transcriptionally silent and all mRNAs present are of maternal origin. The maternal-zygotic transition marks the time over which embryogenesis changes its dependence from maternal RNAs to zygotically transcribed RNAs. Here we present the first systematic investigation of early zygotic genes (EZGs) in a mosquito species and focus on genes involved in the onset of transcription during 2-4 hr. We used transcriptome sequencing to identify the "pure" (without maternal expression) EZGs by analyzing transcripts from four embryonic time ranges of 0-2, 2-4, 4-8, and 8-12 hr, which includes the time of cellular blastoderm formation and up to the start of gastrulation. Blast of 16,789 annotated transcripts vs. the transcriptome reads revealed evidence for 63 (P<0.001) and 143 (P<0.05) nonmaternally derived transcripts having a significant increase in expression at 2-4 hr. One third of the 63 EZG transcripts do not have predicted introns compared to 10% of all Ae. aegypti genes. We have confirmed by RT-PCR that zygotic transcription starts as early as 2-3 hours. A degenerate motif VBRGGTA was found to be overrepresented in the upstream sequences of the identified EZGs using a motif identification software called SCOPE. We find evidence for homology between this motif and the TAGteam motif found in Drosophila that has been implicated in EZG activation. A 38 bp sequence in the proximal upstream sequence of a kinesin light chain EZG (KLC2.1) contains two copies of the mosquito motif. This sequence was shown to support EZG transcription by luciferase reporter assays performed on injected early embryos, and confers early zygotic activity to a heterologous promoter from a divergent mosquito species. The results of these studies are consistent with the model of early zygotic genome activation via transcriptional activators, similar to what has been found recently in Drosophila.
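The degenerate motif VBRGGTA is written in IUPAC nucleotide ambiguity codes (V = A/C/G, B = C/G/T, R = A/G). As a rough illustration of how such a motif can be located in upstream sequences, the sketch below expands the codes into a regular expression and reports overlapping forward-strand matches. It is not the SCOPE software used in the study; the function names and the toy sequence are hypothetical.

```python
import re
from typing import List, Tuple

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(motif: str) -> str:
    """Expand a degenerate motif such as VBRGGTA into a regex string."""
    return "".join(IUPAC[ch] for ch in motif.upper())


def find_sites(seq: str, motif: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched site) pairs on the forward strand.

    The lookahead wrapper lets overlapping occurrences be reported.
    """
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


# Invented sequence for illustration; not taken from Aedes aegypti.
toy_upstream = "TTACAGGTAGGCGGGTACCATGGTA"
sites = find_sites(toy_upstream, "VBRGGTA")
print(iupac_to_regex("VBRGGTA"))
print(sites)
```

In the toy sequence the final `CATGGTA` is not reported because B excludes A at the second position. A fuller analysis would also scan the reverse complement and compare counts against a background set to assess overrepresentation.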
Hydrophobic motif site-phosphorylated protein kinase CβII between mTORC2 and Akt regulates high glucose-induced mesangial cell hypertrophy.

PubMed

Das, Falguni; Ghosh-Choudhury, Nandini; Mariappan, Meenalakshmi M; Kasinath, Balakuntalam S; Choudhury, Goutam Ghosh

2016-04-01

PKCβII controls the pathologic features of diabetic nephropathy, including glomerular mesangial cell hypertrophy. PKCβII contains the COOH-terminal hydrophobic motif site Ser-660. Whether this hydrophobic motif phosphorylation contributes to high glucose-induced mesangial cell hypertrophy has not been determined. Here we show that, in mesangial cells, high glucose increased phosphorylation of PKCβII at Ser-660 in a phosphatidylinositol 3-kinase (PI3-kinase)-dependent manner. Using siRNAs to downregulate PKCβII, dominant negative PKCβII, and PKCβII hydrophobic motif phosphorylation-deficient mutant, we found that PKCβII regulates activation of mechanistic target of rapamycin complex 1 (mTORC1) and mesangial cell hypertrophy by high glucose. PKCβII via its phosphorylation at Ser-660 regulated phosphorylation of Akt at both catalytic loop and hydrophobic motif sites, resulting in phosphorylation and inactivation of its substrate PRAS40. Specific inhibition of mTORC2 increased mTORC1 activity and induced mesangial cell hypertrophy. In contrast, inhibition of mTORC2 decreased the phosphorylation of PKCβII and Akt, leading to inhibition of PRAS40 phosphorylation and mTORC1 activity and prevented mesangial cell hypertrophy in response to high glucose; expression of constitutively active Akt or mTORC1 restored mesangial cell hypertrophy. Moreover, constitutively active PKCβII reversed the inhibition of high glucose-stimulated Akt phosphorylation and mesangial cell hypertrophy induced by suppression of mTORC2. Finally, using renal cortexes from type 1 diabetic mice, we found that increased phosphorylation of PKCβII at Ser-660 was associated with enhanced Akt phosphorylation and mTORC1 activation. Collectively, our findings identify a signaling route connecting PI3-kinase to mTORC2 to phosphorylate PKCβII at the hydrophobic motif site necessary for Akt phosphorylation and mTORC1 activation, leading to mesangial cell hypertrophy.
Hydrophobic motif site-phosphorylated protein kinase CβII between mTORC2 and Akt regulates high glucose-induced mesangial cell hypertrophy

PubMed Central

Das, Falguni; Mariappan, Meenalakshmi M.; Kasinath, Balakuntalam S.; Choudhury, Goutam Ghosh

2016-01-01

PKCβII controls the pathologic features of diabetic nephropathy, including glomerular mesangial cell hypertrophy. PKCβII contains the COOH-terminal hydrophobic motif site Ser-660. Whether this hydrophobic motif phosphorylation contributes to high glucose-induced mesangial cell hypertrophy has not been determined. Here we show that, in mesangial cells, high glucose increased phosphorylation of PKCβII at Ser-660 in a phosphatidylinositol 3-kinase (PI3-kinase)-dependent manner. Using siRNAs to downregulate PKCβII, dominant negative PKCβII, and PKCβII hydrophobic motif phosphorylation-deficient mutant, we found that PKCβII regulates activation of mechanistic target of rapamycin complex 1 (mTORC1) and mesangial cell hypertrophy by high glucose. PKCβII via its phosphorylation at Ser-660 regulated phosphorylation of Akt at both catalytic loop and hydrophobic motif sites, resulting in phosphorylation and inactivation of its substrate PRAS40. Specific inhibition of mTORC2 increased mTORC1 activity and induced mesangial cell hypertrophy. In contrast, inhibition of mTORC2 decreased the phosphorylation of PKCβII and Akt, leading to inhibition of PRAS40 phosphorylation and mTORC1 activity and prevented mesangial cell hypertrophy in response to high glucose; expression of constitutively active Akt or mTORC1 restored mesangial cell hypertrophy. Moreover, constitutively active PKCβII reversed the inhibition of high glucose-stimulated Akt phosphorylation and mesangial cell hypertrophy induced by suppression of mTORC2. Finally, using renal cortexes from type 1 diabetic mice, we found that increased phosphorylation of PKCβII at Ser-660 was associated with enhanced Akt phosphorylation and mTORC1 activation. Collectively, our findings identify a signaling route connecting PI3-kinase to mTORC2 to phosphorylate PKCβII at the hydrophobic motif site necessary for Akt phosphorylation and mTORC1 activation, leading to mesangial cell hypertrophy. PMID:26739493
Motif-based analysis of large nucleotide data sets using MEME-ChIP

PubMed Central

Ma, Wenxiu; Noble, William S; Bailey, Timothy L

2014-01-01

MEME-ChIP is a web-based tool for analyzing motifs in large DNA or RNA data sets. It can analyze peak regions identified by ChIP-seq, cross-linking sites identified by cLIP-seq and related assays, as well as sets of genomic regions selected using other criteria. MEME-ChIP performs de novo motif discovery, motif enrichment analysis, motif location analysis and motif clustering, providing a comprehensive picture of the DNA or RNA motifs that are enriched in the input sequences. MEME-ChIP performs two complementary types of de novo motif discovery: weight matrix–based discovery for high accuracy; and word-based discovery for high sensitivity. Motif enrichment analysis using DNA or RNA motifs from human, mouse, worm, fly and other model organisms provides even greater sensitivity. MEME-ChIP’s interactive HTML output groups and aligns significant motifs to ease interpretation. This protocol takes less than 3 h, and it provides motif discovery approaches that are distinct and complementary to other online methods. PMID:24853928
PromoterCAD: data-driven design of plant regulatory DNA

PubMed Central

Cox, Robert Sidney; Nishikata, Koro; Shimoyama, Sayoko; Yoshida, Yuko; Matsui, Minami; Makita, Yuko; Toyoda, Tetsuro

2013-01-01

Synthetic promoters can control the timing, location and amount of gene expression for any organism. PromoterCAD is a web application for designing synthetic promoters with altered transcriptional regulation. We use a data-first approach, using published high-throughput expression and motif data from Arabidopsis thaliana to guide DNA design. We demonstrate data mining tools for finding motifs related to circadian oscillations and tissue-specific expression patterns. PromoterCAD is built on the LinkData open platform for data publication and rapid web application development, allowing new data to be easily added, and the source code modified to add new functionality. PromoterCAD URL: http://promotercad.org. LinkData URL: http://linkdata.org. PMID:23766287
Identifying the scale-dependent motifs in atmospheric surface layer by ordinal pattern analysis

NASA Astrophysics Data System (ADS)

Li, Qinglei; Fu, Zuntao

2018-07-01

Ramp-like structures in various atmospheric surface layer time series have long been studied, but the presence of finer-scale motifs embedded within larger-scale ramp-like structures has largely been overlooked in the literature. Here a novel, objective and well-adapted methodology, ordinal pattern analysis, is adopted to study the finer-scale motifs in atmospheric boundary-layer (ABL) time series. The results show that the motifs represented by different ordinal patterns exhibit clustering, and that 6 dominant motifs out of the 24 possible account for about 45% of the time series at particular scales, which indicates a higher contribution of finer-scale motifs to the series. Further analysis indicates that motif statistics are similar for stable and unstable conditions at larger scales, but large discrepancies are found at smaller scales, and the frequencies of motifs "1234" and/or "4321" are somewhat higher under stable conditions than under unstable conditions. Under stable conditions, the occurrence frequencies of motifs "1234" and "4321" change greatly: the occurrence frequency of motif "1234" decreases from nearly 24% to 4.5% as the scale factor increases, and the occurrence frequency of motif "4321" changes nonlinearly with increasing scale. These large differences in how the dominant motifs change with scale can be taken as an indicator to quantify flow structure changes under different stability conditions, and a motif entropy can be defined from only the 6 dominant motifs to quantify this time-scale independent property of the motifs. All these results suggest that the defined scale of the finer-scale motifs should be carefully taken into consideration in the interpretation of turbulence coherent structures.
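The 24 motifs mentioned in this abstract correspond to the 4! possible rank orderings of four sampled values, with "1234" denoting a monotonic rise and "4321" a monotonic fall. A minimal sketch of counting such ordinal patterns follows. It assumes the scale factor acts as a sampling lag between the four points, which the abstract does not specify, and the function names and toy series are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def ordinal_pattern(window: Sequence[float]) -> str:
    """Encode a window by the rank of each value (1 = smallest).

    Ties are broken by order of appearance, because sorted() is stable.
    """
    order = sorted(range(len(window)), key=lambda i: window[i])
    ranks = [0] * len(window)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return "".join(str(r) for r in ranks)


def pattern_frequencies(series: Sequence[float], order: int = 4,
                        lag: int = 1) -> Dict[str, float]:
    """Relative frequency of each ordinal pattern at a given lag.

    Assumption: lag plays the role of the scale factor.
    """
    span = (order - 1) * lag
    counts = Counter(
        ordinal_pattern(series[t:t + span + 1:lag])
        for t in range(len(series) - span)
    )
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}


# Toy ramp-like series: a slow rise followed by a fall.
toy = [1.0, 2.0, 3.0, 4.0, 3.5, 2.5, 1.5, 0.5]
freqs = pattern_frequencies(toy, order=4, lag=1)
print(freqs)
```

Repeating the call with larger `lag` values would show how the share of "1234" and "4321" changes with scale, which is the kind of comparison the abstract reports.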
Identity and functions of CxxC-derived motifs.

PubMed

Fomenko, Dmitri E; Gladyshev, Vadim N

2003-09-30

Two cysteines separated by two other residues (the CxxC motif) are employed by many redox proteins for formation, isomerization, and reduction of disulfide bonds and for other redox functions. The place of the C-terminal cysteine in this motif may be occupied by serine (the CxxS motif), modifying the functional repertoire of redox proteins. Here we found that the CxxC motif may also give rise to a motif, in which the C-terminal cysteine is replaced with threonine (the CxxT motif). Moreover, in contrast to a view that the N-terminal cysteine in the CxxC motif always serves as a nucleophilic attacking group, this residue could also be replaced with threonine (the TxxC motif), serine (the SxxC motif), or other residues. In each of these CxxC-derived motifs, the presence of a downstream alpha-helix was strongly favored. A search for conserved CxxC-derived motif/helix patterns in four complete genomes representing bacteria, archaea, and eukaryotes identified known redox proteins and suggested possible redox functions for several additional proteins. Catalytic sites in peroxiredoxins were major representatives of the TxxC motif, whereas those in glutathione peroxidases represented the CxxT motif. Structural assessments indicated that threonines in these enzymes could stabilize catalytic thiolates, suggesting revisions to previously proposed catalytic triads. Each of the CxxC-derived motifs was also observed in natural selenium-containing proteins, in which selenocysteine was present in place of a catalytic cysteine.
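As a rough illustration of the sequence side of such a search, the CxxC-derived patterns named in the abstract can be located with regular expressions. The sketch below is an assumption-laden toy: the function name and test sequence are hypothetical, and it ignores the downstream alpha-helix requirement that the study used to filter candidates.

```python
import re

# Pattern names follow the abstract; "x" stands for any residue.
DERIVED_MOTIFS = {
    "CxxC": "C..C",
    "CxxS": "C..S",
    "CxxT": "C..T",
    "TxxC": "T..C",
    "SxxC": "S..C",
}

def find_cxxc_derived(protein):
    """Return sorted (position, motif name, matched residues) tuples."""
    hits = []
    for name, pattern in DERIVED_MOTIFS.items():
        # A lookahead reports overlapping occurrences as well.
        for m in re.finditer(f"(?={pattern})", protein):
            hits.append((m.start(), name, protein[m.start():m.start() + 4]))
    return sorted(hits)

toy_protein = "MACGPCKTAACS"  # hypothetical sequence, not from the paper
hits = find_cxxc_derived(toy_protein)
```

Positions are 0-based offsets into the sequence; a real screen would additionally need secondary-structure predictions to test for the helix.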
Regions of extreme synonymous codon selection in mammalian genes

PubMed Central

Schattner, Peter; Diekhans, Mark

2006-01-01

Recently there has been increasing evidence that purifying selection occurs among synonymous codons in mammalian genes. This selection appears to be a consequence of either cis-regulatory motifs, such as exonic splicing enhancers (ESEs), or mRNA secondary structures, being superimposed on the coding sequence of the gene. We have developed a program to identify regions likely to be enriched for such motifs by searching for extended regions of extreme codon conservation between homologous genes of related species. Here we present the results of applying this approach to five mammalian species (human, chimpanzee, mouse, rat and dog). Even with very conservative selection criteria, we find over 200 regions of extreme codon conservation, ranging in length from 60 to 178 codons. The regions are often found within genes involved in DNA-binding, RNA-binding or zinc-ion-binding. They are highly depleted for synonymous single nucleotide polymorphisms (SNPs) but not for non-synonymous SNPs, further indicating that the observed codon conservation is being driven by negative selection. Forty-three percent of the regions overlap conserved alternative transcript isoforms and are enriched for known ESEs. Other regions are enriched for TpA dinucleotides and may contain conserved motifs/structures relating to mRNA stability and/or degradation. We anticipate that this tool will be useful for detecting regions enriched in other classes of coding-sequence motifs and structures as well. PMID:16556911
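The idea of scanning homologous genes for extended runs of conserved codons can be sketched in a few lines. This is only a simplified illustration: it treats "extreme conservation" as exact codon identity between two pre-aligned, in-frame, gap-free coding sequences, whereas the abstract does not state the program's actual selection criteria. The function name and the `min_codons` threshold are hypothetical.

```python
def conserved_codon_runs(cds_a, cds_b, min_codons=3):
    """Return (start, end) codon index pairs for runs of identical codons.

    Assumes both sequences are aligned, in frame and gap-free; `end` is
    exclusive. Only runs of at least `min_codons` codons are reported.
    """
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    n_codons = len(cds_a) // 3
    runs, start = [], None
    for i in range(n_codons + 1):
        same = i < n_codons and cds_a[3 * i:3 * i + 3] == cds_b[3 * i:3 * i + 3]
        if same and start is None:
            start = i                      # a new run begins
        elif not same and start is not None:
            if i - start >= min_codons:    # the run just ended; keep if long enough
                runs.append((start, i))
            start = None
    return runs

# Toy pair differing only by a synonymous change in the fourth codon (GGG vs GGA).
gene_a = "ATGGCCAAAGGGTTTCCC"
gene_b = "ATGGCCAAAGGATTTCCC"
runs = conserved_codon_runs(gene_a, gene_b, min_codons=3)
```

In the study the reported regions span 60 to 178 codons across five species, so a realistic threshold would be far larger than the toy value used here.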
A putative N-terminal nuclear export sequence is sufficient for Mps1 nuclear exclusion during interphase.

PubMed

Jia, Haiwei; Zhang, Xiaojuan; Wang, Wenjun; Bai, Yuanyuan; Ling, Youguo; Cao, Cheng; Ma, Runlin Z; Zhong, Hui; Wang, Xue; Xu, Quanbin

2015-02-27

Mps1, an essential component of the mitotic checkpoint, is also an important interphase regulator and has roles in DNA damage response, cytokinesis and centrosome duplication. Mps1 predominantly resides in the cytoplasm and relocates into the nucleus at the late G2 phase. So far, the mechanism underlying the Mps1 translocation between the cytoplasm and nucleus has been unclear. In this work, a dynamic export process of Mps1 from the nucleus to the cytoplasm in interphase was revealed, a process blocked by the Crm1 inhibitor Leptomycin B, suggesting that export of Mps1 is Crm1 dependent. Consistent with this speculation, a direct association between Mps1 and Crm1 was found. Furthermore, a putative nuclear export sequence (pNES) motif at the N-terminus of Mps1 was identified by motif analysis of Mps1. This motif shows high sequence similarity to the classic NES, and a fusion of this motif with EGFP results in dramatic exclusion of the fusion protein from the nucleus. Additionally, an Mps1 mutant in which pNES integrity was lost by replacing leucine with alanine showed a diffuse subcellular distribution, compared to the wild-type protein, which resides predominantly in the cytoplasm. Taking these findings together, it was concluded that the pNES sequence is sufficient for Mps1 export from the nucleus during interphase.
Identification and classification of hubs in brain networks.

PubMed

Sporns, Olaf; Honey, Christopher J; Kötter, Rolf

2007-10-17

Brain regions in the mammalian cerebral cortex are linked by a complex network of fiber bundles. These inter-regional networks have previously been analyzed in terms of their node degree, structural motif, path length and clustering coefficient distributions. In this paper we focus on the identification and classification of hub regions, which are thought to play pivotal roles in the coordination of information flow. We identify hubs and characterize their network contributions by examining motif fingerprints and centrality indices for all regions within the cerebral cortices of both the cat and the macaque. Motif fingerprints capture the statistics of local connection patterns, while measures of centrality identify regions that lie on many of the shortest paths between parts of the network. Within both cat and macaque networks, we find that a combination of degree, motif participation, betweenness centrality and closeness centrality allows for reliable identification of hub regions, many of which have previously been functionally classified as polysensory or multimodal. We then classify hubs as either provincial (intra-cluster) hubs or connector (inter-cluster) hubs, and proceed to show that lesioning hubs of each type from the network produces opposite effects on the small-world index. Our study presents an approach to the identification and classification of putative hub regions in brain networks on the basis of multiple network attributes and charts potential links between the structural embedding of such regions and their functional roles.
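Two of the attributes mentioned above, degree and closeness centrality, can be computed with a breadth-first search. The sketch below is a toy under stated assumptions: it uses an undirected, unweighted graph with made-up node names, and it omits the motif participation and betweenness measures that the study also combines.

```python
from collections import deque

def closeness(adj, source):
    """Closeness centrality: reachable nodes divided by summed path lengths."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def rank_hubs(adj):
    """Order nodes by (degree, closeness), highest first."""
    scores = {n: (len(adj[n]), closeness(adj, n)) for n in adj}
    return sorted(scores, key=lambda n: scores[n], reverse=True)

# Hypothetical network: two triangles that share the single node "H".
adj = {
    "A": ["B", "H"],
    "B": ["A", "H"],
    "C": ["D", "H"],
    "D": ["C", "H"],
    "H": ["A", "B", "C", "D"],
}
ranking = rank_hubs(adj)
```

Here "H" links two otherwise separate clusters, which corresponds to what the abstract calls a connector (inter-cluster) hub.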
Viral infection and human disease - insights from minimotifs

PubMed Central

Kadaveru, Krishna; Vyas, Jay; Schiller, Martin R.

2008-01-01

Short functional peptide motifs cooperate in many molecular functions including protein interactions, protein trafficking, and posttranslational modifications. Viruses exploit these motifs as a principal mechanism for hijacking cells and many motifs are necessary for the viral life-cycle. A virus can accommodate many short motifs in its small genome size providing a plethora of ways for the virus to acquire host molecular machinery. Host enzymes that act on motifs such as kinases, proteases, and lipidation enzymes, as well as protein interaction domains, are commonly mutated in human disease, suggesting that the short peptide motif targets of these enzymes may also be mutated in disease; however, this is not observed. How can we explain why viruses have evolved to be so dependent on motifs, yet these motifs, in general do not seem to be as necessary for human viability? We propose that short motifs are used at the system level. This system architecture allows viruses to exploit a motif, whereas the viability of the host is not affected by mutation of a single motif. PMID:18508672
A single amino-acid change in a highly conserved motif of gp41 elicits HIV-1 neutralization and protects against CD4 depletion.

PubMed

Petitdemange, Caroline; Achour, Abla; Dispinseri, Stefania; Malet, Isabelle; Sennepin, Alexis; Ho Tsong Fang, Raphaël; Crouzet, Joël; Marcelin, Anne-Geneviève; Calvez, Vincent; Scarlatti, Gabriella; Debré, Patrice; Vieillard, Vincent

2013-09-01

The induction of neutralizing antibodies against conserved regions of the human immunodeficiency virus type 1 (HIV-1) envelope protein is a major goal of vaccine strategies. We previously identified 3S, a critical conserved motif of gp41 that induces the NKp44L ligand of an activating NK receptor. In vivo, anti-3S antibodies protect against the natural killer (NK) cell-mediated CD4 depletion that occurs without efficient viral neutralization. Specific substitutions within the 3S peptide motif were prepared by directed mutagenesis. Virus production was monitored by measuring the p24 production. Neutralization assays were performed with immune-purified antibodies from immunized mice and a cohort of HIV-infected patients. Expression of NKp44L on CD4(+) T cells and degranulation assay on activating NK cells were both performed by flow cytometry. Here, we show that specific substitutions in the 3S motif reduce viral infection without affecting gp41 production, while decreasing both its capacity to induce NKp44L expression on CD4(+) T cells and its sensitivity to autologous NK cells. Generation of antibodies in mice against the W614 specific position in the 3S motif elicited a capacity to neutralize cross-clade viruses, notable in its magnitude, breadth, and durability. Antibodies against this 3S variant were also detected in sera from some HIV-1-infected patients, demonstrating both neutralization activity and protection against CD4 depletion. These findings suggest that a specific substitution in a 3S-based immunogen might allow the generation of specific antibodies, providing a foundation for a rational vaccine that combines a capacity to neutralize HIV-1 and to protect CD4(+) T cells.
A Chromatin Insulator-Like Element in the Herpes Simplex Virus Type 1 Latency-Associated Transcript Region Binds CCCTC-Binding Factor and Displays Enhancer-Blocking and Silencing Activities

PubMed Central

Amelio, Antonio L.; McAnany, Peterjon K.; Bloom, David C.

2006-01-01

A previous study demonstrated that the latency-associated transcript (LAT) promoter and the LAT enhancer/reactivation critical region (rcr) are enriched in acetyl histone H3 (K9, K14) during herpes simplex virus type 1 (HSV-1) latency, whereas all lytic genes analyzed (ICP0, UL54, ICP4, and DNA polymerase) are not (N. J. Kubat, R. K. Tran, P. McAnany, and D. C. Bloom, J. Virol. 78:1139-1149, 2004). This suggests that the HSV-1 latent genome is organized into histone H3 (K9, K14) hyperacetylated and hypoacetylated regions corresponding to transcriptionally permissive and transcriptionally repressed chromatin domains, respectively. Such an organization implies that chromatin insulators, similar to those of cellular chromosomes, may separate distinct transcriptional domains of the HSV-1 latent genome. In the present study, we sought to identify cis elements that could partition the HSV-1 genome into distinct chromatin domains. Sequence analysis coupled with chromatin immunoprecipitation and luciferase reporter assays revealed that (i) the long and short repeats and the unique-short region of the HSV-1 genome contain clustered CTCF (CCCTC-binding factor) motifs, (ii) CTCF motif clusters similar to those in HSV-1 are conserved in other alphaherpesviruses, (iii) CTCF binds to these motifs on latent HSV-1 genomes in vivo, and (iv) a 1.5-kb region containing the CTCF motif cluster in the LAT region possesses insulator activities, specifically, enhancer blocking and silencing. The finding that CTCF, a cellular protein associated with chromatin insulators, binds to motifs on the latent genome and insulates the LAT enhancer suggests that CTCF may facilitate the formation of distinct chromatin boundaries during herpesvirus latency. PMID:16474142
The Runt domain of AML1 (RUNX1) binds a sequence-conserved RNA motif that mimics a DNA element.

PubMed

Fukunaga, Junichi; Nomura, Yusuke; Tanaka, Yoichiro; Amano, Ryo; Tanaka, Taku; Nakamura, Yoshikazu; Kawai, Gota; Sakamoto, Taiichi; Kozu, Tomoko

2013-07-01

AML1 (RUNX1) is a key transcription factor for hematopoiesis that binds to the Runt-binding double-stranded DNA element (RDE) of target genes through its N-terminal Runt domain. Aberrations in the AML1 gene are frequently found in human leukemia. To better understand AML1 and its potential utility for diagnosis and therapy, we obtained RNA aptamers that bind specifically to the AML1 Runt domain. Enzymatic probing and NMR analyses revealed that Apt1-S, which is a truncated variant of one of the aptamers, has a CACG tetraloop and two stem regions separated by an internal loop. All the isolated aptamers were found to contain the conserved sequence motif 5'-NNCCAC-3' and 5'-GCGMGN'N'-3' (M:A or C; N and N' form Watson-Crick base pairs). The motif contains one AC mismatch and one base bulged out. Mutational analysis of Apt1-S showed that three guanines of the motif are important for Runt binding as are the three guanines of RDE, which are directly recognized by three arginine residues of the Runt domain. Mutational analyses of the Runt domain revealed that the amino acid residues used for Apt1-S binding were similar to those used for RDE binding. Furthermore, the aptamer competed with RDE for binding to the Runt domain in vitro. These results demonstrated that the Runt domain of the AML1 protein binds to the motif of the aptamer that mimics DNA. Our findings should provide new insights into RNA function and utility in both basic and applied sciences.
The Runt domain of AML1 (RUNX1) binds a sequence-conserved RNA motif that mimics a DNA element

PubMed Central

Fukunaga, Junichi; Nomura, Yusuke; Tanaka, Yoichiro; Amano, Ryo; Tanaka, Taku; Nakamura, Yoshikazu; Kawai, Gota; Sakamoto, Taiichi; Kozu, Tomoko

2013-01-01

AML1 (RUNX1) is a key transcription factor for hematopoiesis that binds to the Runt-binding double-stranded DNA element (RDE) of target genes through its N-terminal Runt domain. Aberrations in the AML1 gene are frequently found in human leukemia. To better understand AML1 and its potential utility for diagnosis and therapy, we obtained RNA aptamers that bind specifically to the AML1 Runt domain. Enzymatic probing and NMR analyses revealed that Apt1-S, which is a truncated variant of one of the aptamers, has a CACG tetraloop and two stem regions separated by an internal loop. All the isolated aptamers were found to contain the conserved sequence motif 5′-NNCCAC-3′ and 5′-GCGMGN′N′-3′ (M:A or C; N and N′ form Watson–Crick base pairs). The motif contains one AC mismatch and one base bulged out. Mutational analysis of Apt1-S showed that three guanines of the motif are important for Runt binding as are the three guanines of RDE, which are directly recognized by three arginine residues of the Runt domain. Mutational analyses of the Runt domain revealed that the amino acid residues used for Apt1-S binding were similar to those used for RDE binding. Furthermore, the aptamer competed with RDE for binding to the Runt domain in vitro. These results demonstrated that the Runt domain of the AML1 protein binds to the motif of the aptamer that mimics DNA. Our findings should provide new insights into RNA function and utility in both basic and applied sciences. PMID:23709277
Analysis of cagA in Helicobacter pylori strains from Colombian populations with contrasting gastric cancer risk reveals a biomarker for disease severity

PubMed Central

Loh, John T.; Shaffer, Carrie L.; Piazuelo, M. Blanca; Bravo, Luis E.; McClain, Mark S.; Correa, Pelayo; Cover, Timothy L.

2011-01-01

BACKGROUND Helicobacter pylori infection is a risk factor for the development of gastric cancer, and the bacterial oncoprotein CagA contributes to gastric carcinogenesis. METHODS We analyzed H. pylori isolates from persons in Colombia and observed that there was marked variation among strains in levels of CagA expression. To elucidate the basis for this variation, we analyzed sequences upstream from the CagA translational initiation site in each strain. RESULTS A DNA motif (AATAAGATA) upstream of the translational initiation site of CagA was associated with high levels of CagA expression. Experimental studies showed that this motif was necessary but not sufficient for high-level CagA expression. H. pylori strains from a region of Colombia with high gastric cancer rates expressed higher levels of CagA than did strains from a region with lower gastric cancer rates, and Colombian strains of European phylogeographic origin expressed higher levels of CagA than did strains of African origin. Histopathological analysis of gastric biopsy specimens revealed that strains expressing high levels of CagA or containing the AATAAGATA motif were associated with more advanced precancerous lesions than those found in persons infected with strains expressing low levels of CagA or lacking the AATAAGATA motif. CONCLUSIONS CagA expression varies greatly among H. pylori strains. The DNA motif identified in this study is associated with high levels of CagA expression, and may be a useful biomarker to predict gastric cancer risk. IMPACT These findings help to explain why some persons infected with cagA-positive H. pylori develop gastric cancer and others do not. PMID:21859954
In-silico mining, type and frequency analysis of genic microsatellites of finger millet (Eleusine coracana (L.) Gaertn.): a comparative genomic analysis of NBS-LRR regions of finger millet with rice.

PubMed

Kalyana Babu, B; Pandey, Dinesh; Agrawal, P K; Sood, Salej; Kumar, Anil

2014-05-01

In recent years, the increased availability of DNA sequences has made it possible to develop and explore expressed sequence tag (EST)-derived SSR markers. In the present study, a total of 1956 ESTs of finger millet were used to determine microsatellite type, distribution and frequency, and a total of 545 primer pairs were developed from these ESTs. Thirty-two EST sequences had more than two microsatellites and 1357 sequences did not have any SSR repeats. The most frequent repeat type was the trimeric motif, followed by the dimeric motif and then tetra-, hexa- and penta-repeat motifs. The most common dimer repeat motif was GA, and among trimeric SSRs it was CGG. The EST sequences of the NBS-LRR region of finger millet and rice showed higher synteny and were found at nearly the same positions on the rice chromosome map. Eight of 15 EST-based SSR primers were polymorphic among the selected resistant and susceptible finger millet genotypes. The primer FMBLEST5 was able to differentiate them into resistant and susceptible genotypes. The alleles specific to the resistant and susceptible genotypes were sequenced using the ABI 3130XL genetic analyzer; they showed similarity to NBS-LRR regions of rice and finger millet and contained the characteristic kinase-2 and kinase 3a motifs of plant R-genes belonging to the NBS-LRR region. The in-silico and comparative analysis showed that the genes responsible for blast resistance can be identified, mapped and further introgressed through molecular breeding approaches for enhancing blast resistance in finger millet.
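Mining microsatellites of this kind amounts to finding short units (2 to 6 bases) repeated in tandem. A minimal sketch follows; the function name, the minimum of four repeats and the toy sequence are assumptions for illustration, since the abstract does not give the thresholds or the tool used in the study.

```python
import re

def find_ssrs(seq, min_repeats=4):
    """Return (start, unit, repeat count) for di- to hexanucleotide SSRs.

    The lazy quantifier makes the regex prefer the shortest repeating
    unit at each position, so GAGAGAGA is reported as (GA)4, not (GAGA)2.
    """
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        if len(set(unit)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits

# Hypothetical EST fragment containing a GA dimer repeat and a CGG trimer repeat.
est = "TTGAGAGAGAGACCCGGCGGCGGCGGTT"
ssrs = find_ssrs(est)
```

Tallying the unit lengths over all ESTs would give the repeat-type frequencies (dimeric, trimeric and so on) reported in the abstract.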
Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas

PubMed Central

Petrov, Anton I.; Zirbel, Craig L.; Leontis, Neocles B.

2013-01-01

The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson–Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules, but can also occur at nonhomologous sites in diverse RNAs, and which often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access. PMID:23970545
Homeostasis in a feed forward loop gene regulatory motif.

PubMed

Antoneli, Fernando; Golubitsky, Martin; Stewart, Ian

2018-05-14

The internal state of a cell is affected by inputs from the extra-cellular environment such as external temperature. If some output, such as the concentration of a target protein, remains approximately constant as inputs vary, the system exhibits homeostasis. Special sub-networks called motifs are unusually common in gene regulatory networks (GRNs), suggesting that they may have a significant biological function. Potentially, one such function is homeostasis. In support of this hypothesis, we show that the feed-forward loop GRN produces homeostasis. Here the inputs are subsumed into a single parameter that affects only the first node in the motif, and the output is the concentration of a target protein. The analysis uses the notion of infinitesimal homeostasis, which occurs when the input-output map has a critical point (zero derivative). In model equations such points can be located using implicit differentiation. If the second derivative of the input-output map also vanishes, the critical point is a chair: the output rises roughly linearly, then flattens out (the homeostasis region or plateau), and then starts to rise again. Chair points are a common cause of homeostasis. In more complicated equations or networks, numerical exploration would have to augment analysis. Thus, in terms of finding chairs, this paper presents a proof of concept. We apply this method to a standard family of differential equations modeling the feed-forward loop GRN, and deduce that chair points occur. This function determines the production of a particular mRNA and the resulting chair points are found analytically. The same method can potentially be used to find homeostasis regions in other GRNs. In the discussion and conclusion section, we also discuss why homeostasis in the motif may persist even when the rest of the network is taken into account. Copyright © 2018 Elsevier Ltd. All rights reserved.
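The conditions described in this abstract can be written out explicitly. The notation below (input parameter \mathcal{I}, input-output map x_o, equilibrium equations F and state X) is assumed for illustration and may differ from the paper's; the non-vanishing third derivative is the usual nondegeneracy condition for a chair and is likewise stated here as an assumption.

```latex
% Equilibria X(\mathcal{I}) satisfy F(X, \mathcal{I}) = 0; implicit differentiation gives
F\bigl(X(\mathcal{I}), \mathcal{I}\bigr) = 0
\;\Longrightarrow\;
D_X F \, \frac{dX}{d\mathcal{I}} + \frac{\partial F}{\partial \mathcal{I}} = 0 .

% Infinitesimal homeostasis at \mathcal{I}_0: the input-output map has a critical point.
\frac{dx_o}{d\mathcal{I}}(\mathcal{I}_0) = 0 .

% Chair point: the second derivative vanishes as well.
\frac{dx_o}{d\mathcal{I}}(\mathcal{I}_0)
  = \frac{d^2 x_o}{d\mathcal{I}^2}(\mathcal{I}_0) = 0 ,
\qquad
\frac{d^3 x_o}{d\mathcal{I}^3}(\mathcal{I}_0) \neq 0 .
```

A simple function with this shape is x_o(\mathcal{I}) = (\mathcal{I} - \mathcal{I}_0)^3 + c, which rises, flattens near \mathcal{I}_0 (the plateau) and then rises again.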

The Drosophila hnRNP F/H Homolog Glorund Uses Two Distinct RNA-Binding Modes to Diversify Target Recognition.

PubMed

Tamayo, Joel V; Teramoto, Takamasa; Chatterjee, Seema; Hall, Traci M Tanaka; Gavis, Elizabeth R

2017-04-04

The Drosophila hnRNP F/H homolog, Glorund (Glo), regulates nanos mRNA translation by interacting with a structured UA-rich motif in the nanos 3' untranslated region. Glo regulates additional RNAs, however, and mammalian homologs bind G-tract sequences to regulate alternative splicing, suggesting that Glo also recognizes G-tract RNA. To gain insight into how Glo recognizes both structured UA-rich and G-tract RNAs, we used mutational analysis guided by crystal structures of Glo's RNA-binding domains and identified two discrete RNA-binding surfaces that allow Glo to recognize both RNA motifs. By engineering Glo variants that favor a single RNA-binding mode, we show that a subset of Glo's functions in vivo is mediated solely by the G-tract binding mode, whereas regulation of nanos requires both recognition modes. Our findings suggest a molecular mechanism for the evolution of dual RNA motif recognition in Glo that may be applied to understanding the functional diversity of other RNA-binding proteins. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.
Evolution of the tRNALeu (UAA) Intron and Congruence of Genetic Markers in Lichen-Symbiotic Nostoc

PubMed Central

Kaasalainen, Ulla; Olsson, Sanna; Rikkinen, Jouko

2015-01-01

The group I intron interrupting the tRNALeu UAA gene (trnL) is present in most cyanobacterial genomes as well as in the plastids of many eukaryotic algae and all green plants. In lichen symbiotic Nostoc, the P6b stem-loop of trnL intron always involves one of two different repeat motifs, either Class I or Class II, both with unresolved evolutionary histories. Here we attempt to resolve the complex evolution of the two different trnL P6b region types. Our analysis indicates that the Class II repeat motif most likely appeared first and that independent and unidirectional shifts to the Class I motif have since taken place repeatedly. In addition, we compare our results with those obtained with other genetic markers and find strong evidence of recombination in the 16S rRNA gene, a marker widely used in phylogenetic studies on Bacteria. The congruence of the different genetic markers is successfully evaluated with the recently published software Saguaro, which has not previously been utilized in comparable studies. PMID:26098760
Evolution of the tRNALeu (UAA) Intron and Congruence of Genetic Markers in Lichen-Symbiotic Nostoc.

PubMed

Kaasalainen, Ulla; Olsson, Sanna; Rikkinen, Jouko

2015-01-01

The group I intron interrupting the tRNALeu UAA gene (trnL) is present in most cyanobacterial genomes as well as in the plastids of many eukaryotic algae and all green plants. In lichen symbiotic Nostoc, the P6b stem-loop of trnL intron always involves one of two different repeat motifs, either Class I or Class II, both with unresolved evolutionary histories. Here we attempt to resolve the complex evolution of the two different trnL P6b region types. Our analysis indicates that the Class II repeat motif most likely appeared first and that independent and unidirectional shifts to the Class I motif have since taken place repeatedly. In addition, we compare our results with those obtained with other genetic markers and find strong evidence of recombination in the 16S rRNA gene, a marker widely used in phylogenetic studies on Bacteria. The congruence of the different genetic markers is successfully evaluated with the recently published software Saguaro, which has not previously been utilized in comparable studies.
The Drosophila hnRNP F/H homolog glorund uses two distinct RNA-binding modes to diversify target recognition

DOE PAGES

Tamayo, Joel V.; Teramoto, Takamasa; Chatterjee, Seema; ...

2017-04-04

The Drosophila hnRNP F/H homolog, Glorund (Glo), regulates nanos mRNA translation by interacting with a structured UA-rich motif in the nanos 3' untranslated region. Glo regulates additional RNAs, however, and mammalian homologs bind G-tract sequences to regulate alternative splicing, suggesting that Glo also recognizes G-tract RNA. To gain insight into how Glo recognizes both structured UA-rich and G-tract RNAs, we used mutational analysis guided by crystal structures of Glo’s RNA-binding domains and identified two discrete RNA-binding surfaces that allow Glo to recognize both RNA motifs. By engineering Glo variants that favor a single RNA-binding mode, we show that a subset of Glo’s functions in vivo is mediated solely by the G-tract binding mode, whereas regulation of nanos requires both recognition modes. Lastly, our findings suggest a molecular mechanism for the evolution of dual RNA motif recognition in Glo that may be applied to understanding the functional diversity of other RNA-binding proteins.
Arrestin Scaffolds NHERF1 to the P2Y12 Receptor to Regulate Receptor Internalization

PubMed Central

Nisar, Shaista P.; Cunningham, Margaret; Saxena, Kunal; Pope, Robert J.; Kelly, Eamonn; Mundell, Stuart J.

2012-01-01

We have recently shown in a patient with mild bleeding that the PDZ-binding motif of the platelet G protein-coupled P2Y12 receptor (P2Y12R) is required for effective receptor traffic in human platelets. In this study we show for the first time that the PDZ motif-binding protein NHERF1 exerts a major role in potentiating G protein-coupled receptor (GPCR) internalization. NHERF1 interacts with the C-tail of the P2Y12R and unlike many other GPCRs, NHERF1 interaction is required for effective P2Y12R internalization. In vitro and prior to agonist stimulation P2Y12R/NHERF1 interaction requires the intact PDZ binding motif of this receptor. Interestingly on receptor stimulation NHERF1 no longer interacts directly with the receptor but instead binds to the receptor via the endocytic scaffolding protein arrestin. These findings suggest a novel model by which arrestin can serve as an adaptor to promote NHERF1 interaction with a GPCR to facilitate effective NHERF1-dependent receptor internalization. PMID:22610101
Intrinsically disordered proteins drive enamel formation via an evolutionarily conserved self-assembly motif.

PubMed

Wald, Tomas; Spoutil, Frantisek; Osickova, Adriana; Prochazkova, Michaela; Benada, Oldrich; Kasparek, Petr; Bumba, Ladislav; Klein, Ophir D; Sedlacek, Radislav; Sebo, Peter; Prochazka, Jan; Osicka, Radim

2017-02-28

The formation of mineralized tissues is governed by extracellular matrix proteins that assemble into a 3D organic matrix directing the deposition of hydroxyapatite. Although the formation of bones and dentin depends on the self-assembly of type I collagen via the Gly-X-Y motif, the molecular mechanism by which enamel matrix proteins (EMPs) assemble into the organic matrix remains poorly understood. Here we identified a Y/F-x-x-Y/L/F-x-Y/F motif, evolutionarily conserved from the first tetrapods to man, that is crucial for higher order structure self-assembly of the key intrinsically disordered EMPs, ameloblastin and amelogenin. Using targeted mutations in mice and high-resolution imaging, we show that impairment of ameloblastin self-assembly causes disorganization of the enamel organic matrix and yields enamel with disordered hydroxyapatite crystallites. These findings define a paradigm for the molecular mechanism by which the EMPs self-assemble into supramolecular structures and demonstrate that this process is crucial for organization of the organic matrix and formation of properly structured enamel.
The Drosophila hnRNP F/H Homolog Glorund Uses Two Distinct RNA-Binding Modes to Diversify Target Recognition

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tamayo, Joel V.; Teramoto, Takamasa; Chatterjee, Seema

The Drosophila hnRNP F/H homolog, Glorund (Glo), regulates nanos mRNA translation by interacting with a structured UA-rich motif in the nanos 3' untranslated region. Glo regulates additional RNAs, however, and mammalian homologs bind G-tract sequences to regulate alternative splicing, suggesting that Glo also recognizes G-tract RNA. To gain insight into how Glo recognizes both structured UA-rich and G-tract RNAs, we used mutational analysis guided by crystal structures of Glo’s RNA-binding domains and identified two discrete RNA-binding surfaces that allow Glo to recognize both RNA motifs. By engineering Glo variants that favor a single RNA-binding mode, we show that a subset of Glo’s functions in vivo is mediated solely by the G-tract binding mode, whereas regulation of nanos requires both recognition modes. Our findings suggest a molecular mechanism for the evolution of dual RNA motif recognition in Glo that may be applied to understanding the functional diversity of other RNA-binding proteins.
Arrestin scaffolds NHERF1 to the P2Y12 receptor to regulate receptor internalization.

PubMed

Nisar, Shaista P; Cunningham, Margaret; Saxena, Kunal; Pope, Robert J; Kelly, Eamonn; Mundell, Stuart J

2012-07-13

We have recently shown in a patient with mild bleeding that the PDZ-binding motif of the platelet G protein-coupled P2Y(12) receptor (P2Y(12)R) is required for effective receptor traffic in human platelets. In this study we show for the first time that the PDZ motif-binding protein NHERF1 exerts a major role in potentiating G protein-coupled receptor (GPCR) internalization. NHERF1 interacts with the C-tail of the P2Y(12)R and unlike many other GPCRs, NHERF1 interaction is required for effective P2Y(12)R internalization. In vitro and prior to agonist stimulation P2Y(12)R/NHERF1 interaction requires the intact PDZ binding motif of this receptor. Interestingly on receptor stimulation NHERF1 no longer interacts directly with the receptor but instead binds to the receptor via the endocytic scaffolding protein arrestin. These findings suggest a novel model by which arrestin can serve as an adaptor to promote NHERF1 interaction with a GPCR to facilitate effective NHERF1-dependent receptor internalization.
Hyperactive antifreeze proteins from longhorn beetles: some structural insights.

PubMed

Kristiansen, Erlend; Wilkens, Casper; Vincents, Bjarne; Friis, Dennis; Lorentzen, Anders Blomkild; Jenssen, Håvard; Løbner-Olesen, Anders; Ramløv, Hans

2012-11-01

This study reports on structural characteristics of hyperactive antifreeze proteins (AFPs) from two species of longhorn beetles. In Rhagium mordax, eight unique mRNAs coding for five different mature AFPs were identified from cold-hardy individuals. These AFPs are apparently homologous to a previously characterized AFP from the closely related species Rhagium inquisitor, and consist of six identifiable repeats of a putative ice-binding motif TxTxTxT spaced irregularly apart by segments varying in length from 13 to 20 residues. Circular dichroism spectra show that the AFPs from both species have a high content of β-sheet and low levels of α-helix and random coil. Theoretical predictions of residue-specific secondary structure locate these β-sheets within the putative ice-binding motifs and the central parts of the segments separating them, consistent with an overall β-helical structure with the ice-binding motifs stacked in a β-sheet on one side of the coil. Molecular dynamics models based on these findings show that these AFPs would be energetically stable in a β-helical conformation. Copyright © 2012 Elsevier Ltd. All rights reserved.
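As an illustration of the TxTxTxT pattern described in this record, the following minimal Python sketch scans a protein sequence for every occurrence, including overlapping ones. The function name `find_txtxtxt` and the toy sequence are hypothetical and do not come from the study; `x` is taken to mean any single residue.

```python
import re

# "T.T.T.T" encodes TxTxTxT with x = any residue (an assumption about the notation).
# The lookahead lets overlapping occurrences all be reported.
ICE_BINDING = re.compile(r"(?=(T.T.T.T))")


def find_txtxtxt(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for every TxTxTxT occurrence."""
    return [(m.start(), m.group(1)) for m in ICE_BINDING.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real antifreeze protein.
    toy = "MKTATCTVTGGSAAGTSTQTETAA"
    for start, segment in find_txtxtxt(toy):
        print(start, segment)
```

A scan like this only locates candidate repeats; the spacing between them (13 to 20 residues in the record above) would have to be checked separately.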
Swarm intelligence in bioinformatics: methods and implementations for discovering patterns of multiple sequences.

PubMed

Cui, Zhihua; Zhang, Yi

2014-02-01

As a promising and innovative research field, bioinformatics has attracted increasing attention recently. Among the enormous number of open problems in this field, one fundamental issue is the need for accurate and efficient computational methods that can deal with tremendous amounts of data. In this paper, we survey some applications of swarm intelligence to discovering patterns in multiple sequences. To provide deeper insight, ant colony optimization, particle swarm optimization, artificial bee colony and artificial fish swarm algorithms are selected, and their applications to multiple sequence alignment and the motif detection problem are discussed.
MotifNet: a web-server for network motif analysis.

PubMed

Smoly, Ilan Y; Lerman, Eugene; Ziv-Ukelson, Michal; Yeger-Lotem, Esti

2017-06-15

Network motifs are small topological patterns that recur in a network significantly more often than expected by chance. Their identification emerged as a powerful approach for uncovering the design principles underlying complex networks. However, available tools for network motif analysis typically require download and execution of computationally intensive software on a local computer. We present MotifNet, the first open-access web-server for network motif analysis. MotifNet allows researchers to analyze integrated networks, where nodes and edges may be labeled, and to search for motifs of up to eight nodes. The output motifs are presented graphically and the user can interactively filter them by their significance, number of instances, node and edge labels, and node identities, and view their instances. MotifNet also allows the user to distinguish between motifs that are centered on specific nodes and motifs that recur in distinct parts of the network. MotifNet is freely available at http://netbio.bgu.ac.il/motifnet. The website was implemented using ReactJs and supports all major browsers. The server interface was implemented in Python with data stored on a MySQL database. Contact: estiyl@bgu.ac.il or michaluz@cs.bgu.ac.il. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
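To make the notion of a network motif concrete, here is a small Python sketch that counts instances of one classic three-node pattern, the feed-forward loop, in a directed edge list. It is not MotifNet's algorithm: the function name and the toy network are hypothetical, the count is of non-induced instances, and deciding whether a count is "significantly more often than expected by chance" would additionally require comparison against randomized networks, which is omitted here.

```python
from itertools import permutations


def count_feed_forward_loops(edges):
    """Count node triples (a, b, c) with directed edges a->b, b->c and a->c.

    Each instance is counted once because a, b and c play distinct roles
    (source, intermediate, target). Extra edges among the three nodes are
    ignored, so this counts non-induced instances.
    """
    edge_set = set(edges)
    nodes = {node for edge in edge_set for node in edge}
    return sum(
        1
        for a, b, c in permutations(nodes, 3)
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set
    )


if __name__ == "__main__":
    # Hypothetical toy network: x regulates y and z, y regulates z, z regulates w.
    toy_edges = [("x", "y"), ("y", "z"), ("x", "z"), ("z", "w")]
    print(count_feed_forward_loops(toy_edges))
```

The brute-force enumeration is cubic in the number of nodes, which is fine for a toy example but not for large networks.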
Chinese lexical networks: The structure, function and formation

NASA Astrophysics Data System (ADS)

Li, Jianyu; Zhou, Jie; Luo, Xiaoyue; Yang, Zhanxin

2012-11-01

In this paper, Chinese phrases are modeled using complex network theory. We analyze the statistical properties of the networks and find that phrase networks display several important features: not only the small-world property and a power-law distribution, but also hierarchical structure and disassortative mixing. These statistical traits reflect the global organization of Chinese phrases. The origin and formation of such traits are analyzed from the macroscopic perspective of Chinese culture and philosophy, which, interestingly, may shape the formation and structure of Chinese phrases. To uncover the structural design principles of the networks, network motif patterns are studied. They are shown to serve as basic building blocks of the whole phrase networks; in particular, triad 38 (the feed-forward loop) plays an important role in forming most of the phrases and other motifs. This distinct structure may not only keep the networks stable and robust, but also be helpful for information processing. The results give some insight into Chinese language learning and acquisition and strengthen the idea that learning the phrases helps in understanding Chinese culture. Conversely, understanding Chinese culture and philosophy helps in learning Chinese phrases. The hub nodes in the networks show a close relationship with Chinese culture and philosophy. Learning or teaching the hub characters, hub-linking phrases, and phrases that are related in meaning according to motif features should be useful for Chinese learning and acquisition.
A conserved intronic U1 snRNP-binding sequence promotes trans-splicing in Drosophila

PubMed Central

Gao, Jun-Li; Fan, Yu-Jie; Wang, Xiu-Ye; Zhang, Yu; Pu, Jia; Li, Liang; Shao, Wei; Zhan, Shuai; Hao, Jianjiang

2015-01-01

Unlike typical cis-splicing, trans-splicing joins exons from two separate transcripts to produce chimeric mRNA and has been detected in most eukaryotes. Trans-splicing in trypanosomes and nematodes has been characterized as a spliced leader RNA-facilitated reaction; in contrast, its mechanism in higher eukaryotes remains unclear. Here we investigate mod(mdg4), a classic trans-spliced gene in Drosophila, and report that two critical RNA sequences in the middle of the last 5′ intron, TSA and TSB, promote trans-splicing of mod(mdg4). In TSA, a 13-nucleotide (nt) core motif is conserved across Drosophila species and is essential and sufficient for trans-splicing, which binds U1 small nuclear RNP (snRNP) through strong base-pairing with U1 snRNA. In TSB, a conserved secondary structure acts as an enhancer. Deletions of TSA and TSB using the CRISPR/Cas9 system result in developmental defects in flies. Although it is not clear how the 5′ intron finds the 3′ introns, compensatory changes in U1 snRNA rescue trans-splicing of TSA mutants, demonstrating that U1 recruitment is critical to promote trans-splicing in vivo. Furthermore, TSA core-like motifs are found in many other trans-spliced Drosophila genes, including lola. These findings represent a novel mechanism of trans-splicing, in which RNA motifs in the 5′ intron are sufficient to bring separate transcripts into close proximity to promote trans-splicing. PMID:25838544
De Novo Regulatory Motif Discovery Identifies Significant Motifs in Promoters of Five Classes of Plant Dehydrin Genes.

PubMed

Zolotarov, Yevgen; Strömvik, Martina

2015-01-01

Plants accumulate dehydrins in response to osmotic stresses. Dehydrins are divided into five different classes, which are thought to be regulated in different manners. To better understand differences in transcriptional regulation of the five dehydrin classes, de novo motif discovery was performed on 350 dehydrin promoter sequences from a total of 51 plant genomes. Overrepresented motifs were identified in the promoters of five dehydrin classes. The Kn dehydrin promoters contain motifs linked with meristem specific expression, as well as motifs linked with cold/dehydration and abscisic acid response. KS dehydrin promoters contain a motif with a GATA core. SKn and YnSKn dehydrin promoters contain motifs that match elements connected with cold/dehydration, abscisic acid and light response. YnKn dehydrin promoters contain motifs that match abscisic acid and light response elements, but not cold/dehydration response elements. Conserved promoter motifs are present in the dehydrin classes and across different plant lineages, indicating that dehydrin gene regulation is likely also conserved.
A general strategy to solve the phase problem in RNA crystallography

PubMed Central

Keel, Amanda Y.; Rambo, Robert P.; Batey, Robert T.; Kieft, Jeffrey S.

2007-01-01

SUMMARY X-ray crystallography of biologically important RNA molecules has been hampered by technical challenges, including finding a heavy-atom derivative to obtain high-quality experimental phase information. Existing techniques have drawbacks, severely limiting the rate at which important new structures are solved. To address this need, we have developed a reliable means to localize heavy atoms specifically to virtually any RNA. By solving the crystal structures of thirteen variants of the G·U wobble pair cation binding motif we have identified an optimal version that when inserted into an RNA helix introduces a high-occupancy cation binding site suitable for phasing. This “directed soaking” strategy can be integrated fully into existing RNA and crystallography methods, potentially increasing the rate at which important structures are solved and facilitating routine solving of structures using Cu-Kα radiation. The success of this method has been proven in that it has already been used to solve several novel crystal structures. PMID:17637337
Transcriptome analyses reveal SR45 to be a neutral splicing regulator and a suppressor of innate immunity in Arabidopsis thaliana.

PubMed

Zhang, Xiao-Ning; Shi, Yifei; Powers, Jordan J; Gowda, Nikhil B; Zhang, Chong; Ibrahim, Heba M M; Ball, Hannah B; Chen, Samuel L; Lu, Hua; Mount, Stephen M

2017-10-11

Regulation of pre-mRNA splicing diversifies protein products and affects many biological processes. Arabidopsis thaliana Serine/Arginine-rich 45 (SR45) regulates pre-mRNA splicing by interacting with other regulatory proteins and spliceosomal subunits. Although SR45 has orthologs in diverse eukaryotes, including human RNPS1, the sr45-1 null mutant is viable. Narrow flower petals and reduced seed formation suggest that SR45 regulates genes involved in diverse processes, including reproduction. To understand how SR45 is involved in the regulation of reproductive processes, we studied mRNA from the wild-type and sr45-1 inflorescences using RNA-seq, and identified SR45-bound RNAs by immunoprecipitation. Using a variety of bioinformatics tools, we identified a total of 358 SR45 differentially regulated (SDR) genes, 542 SR45-dependent alternative splicing (SAS) events, and 1812 SR45-associated RNAs (SARs). There is little overlap between SDR genes and SAS genes, and neither set of genes is enriched for flower or seed development. However, transcripts from reproductive process genes are significantly overrepresented in SARs. In exploring the fate of SARs, we found that a total of 81 SARs are subject to alternative splicing, while 14 of them are known Nonsense-Mediated Decay (NMD) targets. Motifs related to GGNGG are enriched both in SARs and near different types of SAS events, suggesting that SR45 recognizes this motif directly. Genes involved in plant defense are significantly over-represented among genes whose expression is suppressed by SR45, and sr45-1 plants do indeed show enhanced immunity. We find that SR45 is a suppressor of innate immunity. We find that a single motif (GGNGG) is highly enriched in both RNAs bound by SR45 and in sequences near SR45-dependent alternative splicing events in inflorescence tissue.
We find that the alternative splicing events regulated by SR45 are enriched for this motif whether the effect of SR45 is activation or repression of the particular event. Thus, our data suggest that SR45 acts to control splice site choice in a way that defies simple categorization as an activator or repressor of splicing.
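The GGNGG motif in this record uses the IUPAC code N for "any nucleotide". The Python sketch below shows one simple way to count occurrences of such a degenerate motif in a sequence, including overlapping ones. It is an assumption-laden illustration, not the authors' pipeline: the helper name `count_motif` and the test strings are hypothetical, only the symbols A, C, G, T, U and N are supported, and an enrichment claim would further require comparing counts against a suitable background set.

```python
import re

# Minimal IUPAC subset: plain bases plus N (any base). Both T and U are
# accepted so DNA and RNA spellings can be scanned with the same motif.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "U": "U", "N": "[ACGTU]"}


def count_motif(sequence: str, motif: str = "GGNGG") -> int:
    """Count all (possibly overlapping) occurrences of a degenerate motif."""
    body = "".join(IUPAC[symbol] for symbol in motif.upper())
    pattern = re.compile("(?=" + body + ")")  # lookahead keeps overlaps
    return sum(1 for _ in pattern.finditer(sequence.upper()))


if __name__ == "__main__":
    # Hypothetical toy sequences.
    print(count_motif("GGAGG"))
    print(count_motif("GGAGGTGGCGGGGG"))
```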
A self-assembling peptide RADA16-I integrated with spider fibroin uncrystalline motifs

PubMed Central

Sun, Lijuan; Zhao, Xiaojun

2012-01-01

The mechanical strength of nanofiber scaffolds formed by the self-assembling peptide RADA16-I or its derivatives is limited, which restricts their application. To address this problem, we inserted the spidroin uncrystalline motifs GGAGGS or GPGGY, which confer incomparable elasticity and hydrophobicity to spider silk, into the C-terminus of RADA16-I to design two new peptides: R3 (n-RADARADARADARADA-GGAGGS-c) and R4 (n-RADARADARADARADA-GPGGY-c), and then observed the effect of these motifs on the biophysical properties of the peptide. Atomic force microscopy, transmission electron microscopy, and circular dichroism spectroscopy confirm that R3 and R4 display β-sheet structure and self-assemble into long nanofibers. Compared with R3, the β-sheet structure and nanofibers formed by R4 are more stable; they change to random coil and unordered aggregation at higher temperature. Rheology measurements indicate that the novel peptides form hydrogels when induced by DMEM, and the storage modulus of R3 and R4 hydrogel is 0.5 times and 3 times higher than that of RADA16-I, respectively. Furthermore, R4 hydrogel remarkably promotes growth of liver cell L02 and liver cancer cell SMCC7721 compared with 2D culture, as determined by MTT assay. The novel peptides still have potential as hydrophobic drug carriers; they can stabilize pyrene microcrystals in aqueous solution and deliver them into a lipophilic environment, as identified by fluorescence emission spectra. Altogether, the spider fibroin motif GPGGY most effectively enhances mechanical strength and hydrophobicity of the peptide. This study provides a new method in the design of nanobiomaterials and helps us to understand the role of the amino acid sequence in nanofiber formation. PMID:22346352
Phosphorylation of PPP(S/T)P motif of the free LRP6 intracellular domain is not required to activate the Wnt/beta-catenin pathway and attenuate GSK3beta activity.

PubMed

Beagle, Brandon; Mi, Kaihong; Johnson, Gail V W

2009-11-01

The canonical Wnt/beta-catenin signaling pathway plays a critical role in numerous physiological and pathological processes. LRP6 is an essential co-receptor for Wnt/beta-catenin signaling, as transduction of the Wnt signal is strongly dependent upon GSK3beta-mediated phosphorylation of multiple PPP(S/T)P motifs within the membrane-anchored LRP6 intracellular domain. Previously, we showed that the free LRP6 intracellular domain (LRP6-ICD) can activate the Wnt/beta-catenin pathway in a beta-catenin and TCF/LEF-1 dependent manner, as well as interact with and attenuate GSK3beta activity. However, it is unknown if the ability of LRP6-ICD to attenuate GSK3beta activity and modulate activation of the Wnt/beta-catenin pathway requires phosphorylation of the LRP6-ICD PPP(S/T)P motifs, in a manner similar to the membrane-anchored LRP6 intracellular domain. Here we provide evidence that the LRP6-ICD does not have to be phosphorylated at its PPP(S/T)P motif by GSK3beta to stabilize endogenous cytosolic beta-catenin, resulting in activation of TCF/LEF-1 and the Wnt/beta-catenin pathway. LRP6-ICD and a mutant in which all 5 PPP(S/T)P motifs were changed to PPP(A)P motifs equivalently interacted with and attenuated GSK3beta activity in vitro, and both constructs inhibited the in situ GSK3beta-mediated phosphorylation of beta-catenin and tau to the same extent. These data indicate that the LRP6-ICD attenuates GSK3beta activity similarly to other GSK3beta binding proteins, and that this attenuation is not a result of it being a GSK3beta substrate. Our findings suggest the functional and regulatory mechanisms governing the free LRP6-ICD may be distinct from membrane-anchored LRP6, and that release of the LRP6-ICD may provide a complementary signaling cascade capable of modulating Wnt-dependent gene expression. (c) 2009 Wiley-Liss, Inc.
The snoRNA domain of vertebrate telomerase RNA functions to localize the RNA within the nucleus.

PubMed Central

Lukowiak, A A; Narayanan, A; Li, Z H; Terns, R M; Terns, M P

2001-01-01

Telomerase RNA is an essential component of the ribonucleoprotein enzyme involved in telomere length maintenance, a process implicated in cellular senescence and cancer. Vertebrate telomerase RNAs contain a box H/ACA snoRNA motif that is not required for telomerase activity in vitro but is essential in vivo. Using the Xenopus oocyte system, we have found that the box H/ACA motif functions in the subcellular localization of telomerase RNA. We have characterized the transport and biogenesis of telomerase RNA by injecting labeled wild-type and variant RNAs into Xenopus oocytes and assaying nucleocytoplasmic distribution, intranuclear localization, modification, and protein binding. Although yeast telomerase RNA shares characteristics of spliceosomal snRNAs, we show that human telomerase RNA is not associated with Sm proteins or efficiently imported into the nucleus. In contrast, the transport properties of vertebrate telomerase RNA resemble those of snoRNAs; telomerase RNA is retained in the nucleus and targeted to nucleoli. Furthermore, both nuclear retention and nucleolar localization depend on the box H/ACA motif. Our findings suggest that the H/ACA motif confers functional localization of vertebrate telomerase RNAs to the nucleus, the compartment where telomeres are synthesized. We have also found that telomerase RNA localizes to Cajal bodies, intranuclear structures where it is thought that assembly of various cellular RNPs takes place. Our results identify the Cajal body as a potential site of telomerase RNP biogenesis. PMID:11780638

Specific primary sequence requirements for Aurora B kinase-mediated phosphorylation and subcellular localization of TMAP during mitosis.

PubMed

Kim, Hyun-Jun; Kwon, Hye-Rim; Bae, Chang-Dae; Park, Joobae; Hong, Kyung U

2010-05-15

During mitosis, regulation of protein structures and functions by phosphorylation plays critical roles in orchestrating a series of complex events essential for the cell division process. Tumor-associated microtubule-associated protein (TMAP), also known as cytoskeleton-associated protein 2 (CKAP2), is a novel player in spindle assembly and chromosome segregation. We have previously reported that TMAP is phosphorylated at multiple residues specifically during mitosis. However, the mechanisms and functional importance of phosphorylation at most of the sites identified are currently unknown. Here, we report that TMAP is a novel substrate of the Aurora B kinase. Ser627 of TMAP was specifically phosphorylated by Aurora B both in vitro and in vivo. Ser627 and neighboring conserved residues were strictly required for efficient phosphorylation of TMAP by Aurora B, as even minor amino acid substitutions of the phosphorylation motif significantly diminished the efficiency of the substrate phosphorylation. Nearly all mutations at the phosphorylation motif had dramatic effects on the subcellular localization of TMAP. Instead of being localized to the chromosome region during late mitosis, the mutants remained associated with microtubules and centrosomes throughout mitosis. However, the changes in the subcellular localization of these mutants could not be completely explained by the phosphorylation status on Ser627. Our findings suggest that the motif surrounding Ser627 ((625) RRSRRL (630)) is a critical part of a functionally important sequence motif which not only governs the kinase-substrate recognition, but also regulates the subcellular localization of TMAP during mitosis.
GrammarViz 3.0: Interactive Discovery of Variable-Length Time Series Patterns

DOE PAGES

Senin, Pavel; Lin, Jessica; Wang, Xing; ...

2018-02-23

The problems of recurrent and anomalous pattern discovery in time series, e.g., motifs and discords, respectively, have received a lot of attention from researchers in the past decade. However, since the pattern search space is usually intractable, most existing detection algorithms require that the patterns have discriminative characteristics and that their length be known in advance and provided as input, which is an unreasonable requirement for many real-world problems. In addition, patterns of similar structure but of different lengths may co-exist in a time series. In order to address these issues, we have developed algorithms for variable-length time series pattern discovery that are based on symbolic discretization and grammar inference, two techniques whose combination enables the structured reduction of the search space and discovery of the candidate patterns in linear time. In this work, we present GrammarViz 3.0, a software package that provides implementations of the proposed algorithms and a graphical user interface for interactive variable-length time series pattern discovery. The current version of the software provides an alternative grammar inference algorithm that improves the time series motif discovery workflow, and introduces an experimental procedure for automated discretization parameter selection that builds upon the minimum cardinality maximum cover principle and aids the time series recurrent and anomalous pattern discovery.
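The abstract does not spell out its symbolic discretization step, so the following Python sketch shows one common scheme of that kind, a SAX-style transform: z-normalize the series, average it over equal-width segments, and map each average to a letter using breakpoints that divide a standard normal distribution into equiprobable regions. The function name `sax_word`, the four-letter alphabet, and the requirement that the series length be divisible by the segment count are all simplifying assumptions made here, not details taken from GrammarViz.

```python
import statistics

# Breakpoints splitting a standard normal into 4 equiprobable regions
# (approximate quartiles of N(0, 1)); the alphabet size is an assumption.
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def sax_word(series, segments):
    """Convert a numeric series into a short symbolic word (SAX-style sketch)."""
    if len(series) % segments != 0:
        raise ValueError("series length must be divisible by segments")
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    # z-normalize; a constant series maps to all zeros
    z = [0.0 if std == 0 else (x - mean) / std for x in series]
    size = len(series) // segments
    # piecewise aggregate approximation: mean of each equal-width chunk
    paa = [statistics.fmean(z[i * size:(i + 1) * size]) for i in range(segments)]
    # letter index = number of breakpoints strictly below the chunk mean
    return "".join(ALPHABET[sum(value > b for b in BREAKPOINTS)] for value in paa)


if __name__ == "__main__":
    print(sax_word([1, 2, 3, 4, 5, 6, 7, 8], segments=4))
```

Once a series has been reduced to words like this, repeated words (or grammar rules over them) can serve as candidate motifs, which is the general idea the record describes.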
SCOPE: a web server for practical de novo motif discovery.

PubMed

Carlson, Jonathan M; Chakravarty, Arijit; DeZiel, Charles E; Gross, Robert H

2007-07-01

SCOPE is a novel parameter-free method for the de novo identification of potential regulatory motifs in sets of coordinately regulated genes. The SCOPE algorithm combines the output of three component algorithms, each designed to identify a particular class of motifs. Using an ensemble learning approach, SCOPE identifies the best candidate motifs from its component algorithms. In tests on experimentally determined datasets, SCOPE identified motifs with a significantly higher level of accuracy than a number of other web-based motif finders run with their default parameters. Because SCOPE has no adjustable parameters, the web server has an intuitive interface, requiring only a set of gene names or FASTA sequences and a choice of species. The most significant motifs found by SCOPE are displayed graphically on the main results page with a table containing summary statistics for each motif. Detailed motif information, including the sequence logo, PWM, consensus sequence and specific matching sites can be viewed through a single click on a motif. SCOPE's efficient, parameter-free search strategy has enabled the development of a web server that is readily accessible to the practising biologist while providing results that compare favorably with those of other motif finders. The SCOPE web server is at .
Convergent evolution and mimicry of protein linear motifs in host-pathogen interactions.

PubMed

Chemes, Lucía Beatriz; de Prat-Gay, Gonzalo; Sánchez, Ignacio Enrique

2015-06-01

Pathogen linear motif mimics are highly evolvable elements that facilitate rewiring of host protein interaction networks. Host linear motifs and pathogen mimics differ in sequence, leading to thermodynamic and structural differences in the resulting protein-protein interactions. Moreover, the functional output of a mimic depends on the motif and domain repertoire of the pathogen protein. Regulatory evolution mediated by linear motifs can be understood by measuring evolutionary rates, quantifying positive and negative selection and performing phylogenetic reconstructions of linear motif natural history. Convergent evolution of linear motif mimics is widespread among unrelated proteins from viral, prokaryotic and eukaryotic pathogens and can also take place within individual protein phylogenies. Statistics, biochemistry and laboratory models of infection link pathogen linear motifs to phenotypic traits such as tropism, virulence and oncogenicity. In vitro evolution experiments and analysis of natural sequences suggest that changes in linear motif composition underlie pathogen adaptation to a changing environment. Copyright © 2015 Elsevier Ltd. All rights reserved.
Enrichment of Circular Code Motifs in the Genes of the Yeast Saccharomyces cerevisiae.

PubMed

Michel, Christian J; Ngoune, Viviane Nguefack; Poch, Olivier; Ripp, Raymond; Thompson, Julie D

2017-12-03

A set X of 20 trinucleotides has been found to have the highest average occurrence in the reading frame, compared to the two shifted frames, of genes of bacteria, archaea, eukaryotes, plasmids and viruses. This set X has an interesting mathematical property, since X is a maximal C3 self-complementary trinucleotide circular code. Furthermore, any motif obtained from this circular code X has the capacity to retrieve, maintain and synchronize the original (reading) frame. Since 1996, the theory of circular codes in genes has mainly been developed by analysing the properties of the 20 trinucleotides of X, using combinatorics and statistical approaches. For the first time, we test this theory by analysing the X motifs, i.e., motifs from the circular code X, in the complete genome of the yeast Saccharomyces cerevisiae. Several properties of X motifs are identified by basic statistics (at the frequency level), and evaluated by comparison to R motifs, i.e., random motifs generated from 30 different random codes R. We first show that the frequency of X motifs is significantly greater than that of R motifs in the genome of S. cerevisiae. We then verify that no significant difference is observed between the frequencies of X and R motifs in the non-coding regions of S. cerevisiae, but that the occurrence number of X motifs is significantly higher than R motifs in the genes (protein-coding regions). This property is true for all cardinalities of X motifs (from 4 to 20) and for all 16 chromosomes. We further investigate the distribution of X motifs in the three frames of S. cerevisiae genes and show that they occur more frequently in the reading frame, regardless of their cardinality or their length. Finally, the ratio of X genes, i.e., genes with at least one X motif, to non-X genes, in the set of verified genes is significantly different to that observed in the set of putative or dubious genes with no experimental evidence.
These results, taken together, represent the first evidence for a significant enrichment of X motifs in the genes of an extant organism. They raise two hypotheses: the X motifs may be evolutionary relics of the primitive codes used for translation, or they may continue to play a functional role in the complex processes of genome decoding and protein synthesis.
Discriminative motif optimization based on perceptron training

PubMed Central

Patel, Ronak Y.; Stormo, Gary D.

2014-01-01

Motivation: Generating accurate transcription factor (TF) binding site motifs from data generated using next-generation sequencing, especially ChIP-seq, is challenging. The challenge arises because a typical experiment reports a large number of sequences bound by a TF, and the length of each sequence is relatively long. Most traditional motif finders are slow in handling such enormous amounts of data. To overcome this limitation, tools have been developed that trade accuracy for speed by using heuristic discrete search strategies or limited optimization of identified seed motifs. However, such strategies may not fully use the information in input sequences to generate motifs. Such motifs often form good seeds and can be further improved with appropriate scoring functions and rapid optimization. Results: We report a tool named discriminative motif optimizer (DiMO). DiMO takes a seed motif along with a positive and a negative database and improves the motif based on a discriminative strategy. We use area under receiver-operating characteristic curve (AUC) as a measure of discriminating power of motifs and a strategy based on perceptron training that maximizes AUC rapidly in a discriminative manner. Using DiMO, on a large test set of 87 TFs from human, drosophila and yeast, we show that it is possible to significantly improve motifs identified by nine motif finders. The motifs are generated/optimized using training sets and evaluated on test sets. The AUC is improved for almost 90% of the TFs on test sets and the magnitude of increase is up to 39%. Availability and implementation: DiMO is available at http://stormo.wustl.edu/DiMO Contact: rpatel@genetics.wustl.edu, ronakypatel@gmail.com PMID:24369152
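The AUC used above as a measure of discriminating power can be read as the probability that a randomly chosen positive sequence receives a higher motif score than a randomly chosen negative one. The Python sketch below computes it directly from that definition; it is not DiMO's implementation, it leaves out the perceptron-based optimization entirely, and the function name `auc` and the example scores are hypothetical.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the positive
    scores higher, with ties counted as half a win.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical best-site motif scores for bound (positive) and
    # unbound (negative) sequences.
    bound = [7.2, 6.1, 5.5, 3.0]
    unbound = [4.8, 2.9, 2.2, 1.0]
    print(auc(bound, unbound))
```

The pairwise loop is quadratic in the number of sequences; for ChIP-seq-sized sets a rank-based computation would be preferable, but the value it produces is the same.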
Identification and Targeting of an Interaction between a Tyrosine Motif within Hepatitis C Virus Core Protein and AP2M1 Essential for Viral Assembly

PubMed Central

Ziv-Av, Amotz; Gerber, Doron; Jacob, Yves; Einav, Shirit

2012-01-01

Novel therapies are urgently needed against hepatitis C virus (HCV) infection, a major global health problem. The current model of infectious virus production suggests that HCV virions are assembled on or near the surface of lipid droplets, acquire their envelope at the ER, and egress through the secretory pathway. The mechanisms of HCV assembly and particularly the role of viral-host protein-protein interactions in mediating this process are, however, poorly understood. We identified a conserved, heretofore unrecognized YXXΦ motif (Φ is a bulky hydrophobic residue) within the core protein. This motif is homologous to sorting signals within host cargo proteins known to mediate binding of AP2M1, the μ subunit of clathrin adaptor protein complex 2 (AP-2), and intracellular trafficking. Using microfluidics affinity analysis, protein-fragment complementation assays, and co-immunoprecipitations in infected cells, we show that this motif mediates core binding to AP2M1. YXXΦ mutations, silencing AP2M1 expression or overexpressing a dominant negative AP2M1 mutant had no effect on HCV RNA replication; however, they dramatically inhibited intra- and extracellular infectivity, consistent with a defect in viral assembly. Quantitative confocal immunofluorescence analysis revealed that core's YXXΦ motif mediates recruitment of AP2M1 to lipid droplets and that the observed defect in HCV assembly following disruption of core-AP2M1 binding correlates with accumulation of core on lipid droplets, reduced core colocalization with E2 and reduced core localization to trans-Golgi network (TGN), the presumed site of viral particle maturation. Furthermore, AAK1 and GAK, serine/threonine kinases known to stimulate binding of AP2M1 to host cargo proteins, regulate core-AP2M1 binding and are essential for HCV assembly. Last, approved anti-cancer drugs that inhibit AAK1 or GAK not only disrupt core-AP2M1 binding, but also significantly inhibit HCV assembly and infectious virus production. These results validate viral-host interactions essential for HCV assembly and yield compounds for pharmaceutical development. PMID:22916011
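A short sequence pattern such as YXXΦ can be located with a regular expression. The sketch below is illustrative only. It assumes that Φ means one of L, I, M, F or V (the abstract says only "bulky hydrophobic residue"), and the peptide `toy` is invented rather than taken from the HCV core protein. The function name `find_yxxphi` is likewise hypothetical.

```python
import re

# Assumption: Φ is taken to be one of L, I, M, F or V.
# The lookahead lets overlapping candidates be reported as well.
YXXPHI = re.compile(r"(?=(Y[A-Z]{2}[LIMFV]))")


def find_yxxphi(protein):
    """Return (1-based start, tetrapeptide) for every YXXΦ candidate."""
    return [(m.start() + 1, m.group(1)) for m in YXXPHI.finditer(protein)]


toy = "MSTYAPLKKGYYRVYQQL"  # invented peptide, for illustration only
print(find_yxxphi(toy))  # [(4, 'YAPL'), (11, 'YYRV'), (15, 'YQQL')]
```

A pattern match of this kind only nominates candidate sites. The abstract establishes function experimentally, through binding assays and mutagenesis.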
RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections

PubMed Central

Jaeger, Sébastien; Thieffry, Denis

2017-01-01

Transcription factor (TF) databases contain multitudes of binding motifs (TFBMs) from various sources, from which non-redundant collections are derived by manual curation. The advent of high-throughput methods stimulated the production of novel collections with increasing numbers of motifs. Meta-databases, built by merging these collections, contain redundant versions, because available tools are not suited to automatically identify and explore biologically relevant clusters among thousands of motifs. Motif discovery from genome-scale data sets (e.g. ChIP-seq) also produces redundant motifs, hampering the interpretation of results. We present matrix-clustering, a versatile tool that clusters similar TFBMs into multiple trees, and automatically creates non-redundant TFBM collections. A feature unique to matrix-clustering is its dynamic visualisation of aligned TFBMs, and its capability to simultaneously treat multiple collections from various sources. We demonstrate that matrix-clustering considerably simplifies the interpretation of combined results from multiple motif discovery tools, and highlights biologically relevant variations of similar motifs. We also ran a large-scale application to cluster ∼11 000 motifs from 24 entire databases, showing that matrix-clustering correctly groups motifs belonging to the same TF families, and drastically reduces motif redundancy. matrix-clustering is integrated within the RSAT suite (http://rsat.eu/), accessible through a user-friendly web interface or command-line for its integration in pipelines. PMID:28591841
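The idea of grouping redundant motifs can be sketched in a few lines. This is not the matrix-clustering algorithm: it assumes equal-width frequency matrices that are already aligned on the same strand, uses plain Pearson correlation as the similarity, and groups motifs by single linkage at an arbitrary threshold of 0.8. The three toy matrices and all function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def flatten(matrix):
    """Concatenate the columns of a motif matrix into one vector."""
    return [value for column in matrix for value in column]


def cluster(motifs, threshold=0.8):
    """Single linkage: a motif joins every cluster holding a similar member."""
    clusters = []
    for name, matrix in motifs.items():
        vec = flatten(matrix)
        hits = [
            c for c in clusters
            if any(pearson(vec, flatten(motifs[m])) >= threshold for m in c)
        ]
        merged = []
        for c in hits:
            merged.extend(c)
            clusters.remove(c)
        merged.append(name)
        clusters.append(merged)
    return clusters


# Toy motifs: each column lists frequencies for A, C, G, T.
motifs = {
    "m1": [[0.9, 0.03, 0.03, 0.04], [0.05, 0.05, 0.85, 0.05], [0.1, 0.1, 0.1, 0.7]],
    "m2": [[0.8, 0.1, 0.05, 0.05], [0.1, 0.1, 0.7, 0.1], [0.05, 0.05, 0.1, 0.8]],
    "m3": [[0.05, 0.85, 0.05, 0.05], [0.05, 0.85, 0.05, 0.05], [0.7, 0.1, 0.1, 0.1]],
}
print(cluster(motifs))  # [['m1', 'm2'], ['m3']]
```

A real tool additionally has to align motifs of different widths, consider both strands and build trees rather than flat groups, which is what the abstract describes.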
Engineering the shape and structure of materials by fractal cut.

PubMed

Cho, Yigil; Shin, Joong-Ho; Costa, Avelino; Kim, Tae Ann; Kunin, Valentin; Li, Ju; Lee, Su Yeon; Yang, Shu; Han, Heung Nam; Choi, In-Suk; Srolovitz, David J

2014-12-09

In this paper we discuss the transformation of a sheet of material into a wide range of desired shapes and patterns by introducing a set of simple cuts in a multilevel hierarchy with different motifs. Each choice of hierarchical cut motif and cut level allows the material to expand into a unique structure with a unique set of properties. We can reverse-engineer the desired expanded geometries to find the requisite cut pattern to produce it without changing the physical properties of the initial material. The concept was experimentally realized and applied to create an electrode that expands to >800% the original area with only very minor stretching of the underlying material. The generality of our approach greatly expands the design space for materials so that they can be tuned for diverse applications.
Cellular Localization and Characterization of Cytosolic Binding Partners for Gla Domain-containing Proteins PRRG4 and PRRG2*

PubMed Central

Yazicioglu, Mustafa N.; Monaldini, Luca; Chu, Kirk; Khazi, Fayaz R.; Murphy, Samuel L.; Huang, Heshu; Margaritis, Paris; High, Katherine A.

2013-01-01

The genes encoding a family of proteins termed proline-rich γ-carboxyglutamic acid (PRRG) proteins were identified and characterized more than a decade ago, but their functions remain unknown. These novel membrane proteins have an extracellular γ-carboxyglutamic acid (Gla) protein domain and cytosolic WW binding motifs. We screened WW domain arrays for cytosolic binding partners for PRRG4 and identified novel protein-protein interactions for the protein. We also uncovered a new WW binding motif in PRRG4 that is essential for these newly found protein-protein interactions. Several of the PRRG-interacting proteins we identified are essential for a variety of physiologic processes. Our findings indicate possible novel and previously unidentified functions for PRRG proteins. PMID:23873930
TGN38 is maintained in the trans-Golgi network by a tyrosine-containing motif in the cytoplasmic domain.

PubMed Central

Bos, K; Wraight, C; Stanley, K K

1993-01-01

Sorting of proteins destined for different plasma membrane domains, lysosomes and secretory pathways takes place in the trans-Golgi network (TGN). TGN38 is an integral membrane protein found in this intracellular compartment. We show that TGN38 contains an autonomous targeting signal within its cytoplasmic domain which determines its intracellular location. Deletion analysis and site-directed mutagenesis of this domain demonstrate that a tyrosine motif homologous to the internalization signal of surface receptors is necessary and sufficient for correct localization. These findings suggest that TGN38 is maintained in the TGN by retrieval from the plasma membrane and employs a different mechanism for retention from that of the transferase enzymes of the trans-Golgi. PMID:8491209
Sequence and conformational preferences at termini of α-helices in membrane proteins: role of the helix environment.

PubMed

Shelar, Ashish; Bansal, Manju

2014-12-01

α-Helices are amongst the most common secondary structural elements seen in membrane proteins and are packed in the form of helix bundles. These α-helices encounter varying external environments (hydrophobic, hydrophilic) that may influence the sequence preferences at their N and C-termini. The role of the external environment in stabilization of the helix termini in membrane proteins is still unknown. Here we analyze α-helices in a high-resolution dataset of integral α-helical membrane proteins and establish that their sequence and conformational preferences differ from those in globular proteins. We specifically examine these preferences at the N and C-termini in helices initiating/terminating inside the membrane core as well as in linkers connecting these transmembrane helices. We find that the sequence preferences and structural motifs at capping (Ncap and Ccap) and near-helical (N' and C') positions are influenced by a combination of features including the membrane environment and the innate helix initiation and termination property of residues forming structural motifs. We also find that a large number of helix termini which do not form any particular capping motif are stabilized by formation of hydrogen bonds and hydrophobic interactions contributed from the neighboring helices in the membrane protein. We further validate the sequence preferences obtained from our analysis with data from an ultradeep sequencing study that identifies evolutionarily conserved amino acids in the rat neurotensin receptor. The results from our analysis provide insights for the secondary structure prediction, modeling and design of membrane proteins. © 2014 Wiley Periodicals, Inc.
Seed storage protein gene promoters contain conserved DNA motifs in Brassicaceae, Fabaceae and Poaceae

PubMed Central

Fauteux, François; Strömvik, Martina V

2009-01-01

Background: Accurate computational identification of cis-regulatory motifs is difficult, particularly in eukaryotic promoters, which typically contain multiple short and degenerate DNA sequences bound by several interacting factors. Enrichment in combinations of rare motifs in the promoter sequence of functionally or evolutionarily related genes among several species is an indicator of conserved transcriptional regulatory mechanisms. This provides a basis for the computational identification of cis-regulatory motifs. Results: We have used a discriminative seeding DNA motif discovery algorithm for an in-depth analysis of 54 seed storage protein (SSP) gene promoters from three plant families, namely Brassicaceae (mustards), Fabaceae (legumes) and Poaceae (grasses) using backgrounds based on complete sets of promoters from a representative species in each family, namely Arabidopsis (Arabidopsis thaliana (L.) Heynh.), soybean (Glycine max (L.) Merr.) and rice (Oryza sativa L.) respectively. We have identified three conserved motifs (two RY-like and one ACGT-like) in Brassicaceae and Fabaceae SSP gene promoters that are similar to experimentally characterized seed-specific cis-regulatory elements. Fabaceae SSP gene promoter sequences are also enriched in a novel, seed-specific E2Fb-like motif. Conserved motifs identified in Poaceae SSP gene promoters include a GCN4-like motif, two prolamin-box-like motifs and an Skn-1-like motif. Evidence of the presence of a variant of the TATA-box is found in the SSP gene promoters from the three plant families. Motifs discovered in SSP gene promoters were used to score whole-genome sets of promoters from Arabidopsis, soybean and rice. The highest-scoring promoters are associated with genes coding for different subunits or precursors of seed storage proteins. Conclusion: Seed storage protein gene promoter motifs are conserved in diverse species, and different plant families are characterized by a distinct combination of conserved motifs. The majority of discovered motifs match experimentally characterized cis-regulatory elements. These results provide a good starting point for further experimental analysis of plant seed-specific promoters and our methodology can be used to unravel more transcriptional regulatory mechanisms in plants and other eukaryotes. PMID:19843335
Temporal motifs reveal homophily, gender-specific patterns, and group talk in call sequences.

PubMed

Kovanen, Lauri; Kaski, Kimmo; Kertész, János; Saramäki, Jari

2013-11-05

Recent studies on electronic communication records have shown that human communication has complex temporal structure. We study how communication patterns that involve multiple individuals are affected by attributes such as sex and age. To this end, we represent the communication records as a colored temporal network where node color is used to represent individuals' attributes, and identify patterns known as temporal motifs. We then construct a null model for the occurrence of temporal motifs that takes into account the interaction frequencies and connectivity between nodes of different colors. This null model allows us to detect significant patterns in call sequences that cannot be observed in a static network that uses interaction frequencies as link weights. We find sex-related differences in communication patterns in a large dataset of mobile phone records and show the existence of temporal homophily, the tendency of similar individuals to participate in communication patterns beyond what would be expected on the basis of their average interaction frequencies. We also show that temporal patterns differ between dense and sparse neighborhoods in the network. Because this result is also independent of interaction frequencies, it can be seen as an extension of Granovetter's hypothesis to temporal networks.
Temporal motifs reveal homophily, gender-specific patterns, and group talk in call sequences

PubMed Central

Kovanen, Lauri; Kaski, Kimmo; Kertész, János; Saramäki, Jari

2013-01-01

Recent studies on electronic communication records have shown that human communication has complex temporal structure. We study how communication patterns that involve multiple individuals are affected by attributes such as sex and age. To this end, we represent the communication records as a colored temporal network where node color is used to represent individuals’ attributes, and identify patterns known as temporal motifs. We then construct a null model for the occurrence of temporal motifs that takes into account the interaction frequencies and connectivity between nodes of different colors. This null model allows us to detect significant patterns in call sequences that cannot be observed in a static network that uses interaction frequencies as link weights. We find sex-related differences in communication patterns in a large dataset of mobile phone records and show the existence of temporal homophily, the tendency of similar individuals to participate in communication patterns beyond what would be expected on the basis of their average interaction frequencies. We also show that temporal patterns differ between dense and sparse neighborhoods in the network. Because this result is also independent of interaction frequencies, it can be seen as an extension of Granovetter’s hypothesis to temporal networks. PMID:24145424
Using Maximum Entropy to Find Patterns in Genomes

NASA Astrophysics Data System (ADS)

Liu, Sophia; Hockenberry, Adam; Lancichinetti, Andrea; Jewett, Michael; Amaral, Luis

The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. To accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. This approach can also easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs which will lead to a better understanding of biological processes. National Institute of General Medical Science, Northwestern University Presidential Fellowship, National Science Foundation, David and Lucile Packard Foundation, Camille Dreyfus Teacher Scholar Award.
The fibronectin synergy site re-enforces cell adhesion and mediates a crosstalk between integrin classes

PubMed Central

Benito-Jardón, Maria; Klapproth, Sarah; Gimeno-LLuch, Irene; Petzold, Tobias; Bharadwaj, Mitasha; Müller, Daniel J; Zuchtriegel, Gabriele; Reichel, Christoph A; Costell, Mercedes

2017-01-01

Fibronectin (FN), a major extracellular matrix component, enables integrin-mediated cell adhesion via binding of α5β1, αIIbβ3 and αv-class integrins to an RGD-motif. An additional linkage for α5 and αIIb is the synergy site located in close proximity to the RGD motif. We report that mice with a dysfunctional FN-synergy motif (Fn1syn/syn) suffer from surprisingly mild platelet adhesion and bleeding defects due to delayed thrombus formation after vessel injury. Additional loss of β3 integrins dramatically aggravates the bleedings and severely compromises smooth muscle cell coverage of the vasculature leading to embryonic lethality. Cell-based studies revealed that the synergy site is dispensable for the initial contact of α5β1 with the RGD, but essential to re-enforce the binding of α5β1/αIIbβ3 to FN. Our findings demonstrate a critical role for the FN synergy site when external forces exceed a certain threshold or when αvβ3 integrin levels decrease below a critical level. DOI: http://dx.doi.org/10.7554/eLife.22264.001 PMID:28092265
Robust shifts in S100a9 expression with aging: a novel mechanism for chronic inflammation.

PubMed

Swindell, William R; Johnston, Andrew; Xing, Xianying; Little, Andrew; Robichaud, Patrick; Voorhees, John J; Fisher, Gary; Gudjonsson, Johann E

2013-01-01

The S100a8 and S100a9 genes encode a pro-inflammatory protein (calgranulin) that has been implicated in multiple diseases. However, involvement of S100a8/a9 in the basic mechanisms of intrinsic aging has not been established. In this study, we show that shifts in the abundance of S100a8 and S100a9 mRNA are a robust feature of aging in mammalian tissues, involving a range of cell types including the central nervous system. To identify transcription factors that control S100a9 expression, we performed a large-scale transcriptome analysis of 62 mouse and human cell types. We identified cell type-specific trends, as well as robust associations linking S100a9 coexpression to elevated frequency of ETS family motifs, and in particular, to motifs recognized by the transcription factor SPI1/PU.1. Sparse occurrence of SATB1 motifs was also a strong predictor of S100a9 coexpression. These findings offer support for a novel mechanism by which a SPI1/PU.1-S100a9 axis sustains chronic inflammation during aging.
Quaking and PTB control overlapping splicing regulatory networks during muscle cell differentiation

PubMed Central

Hall, Megan P.; Nagel, Roland J.; Fagg, W. Samuel; Shiue, Lily; Cline, Melissa S.; Perriman, Rhonda J.; Donohue, John Paul; Ares, Manuel

2013-01-01

Alternative splicing contributes to muscle development, but a complete set of muscle-splicing factors and their combinatorial interactions are unknown. Previous work identified ACUAA (“STAR” motif) as an enriched intron sequence near muscle-specific alternative exons such as Capzb exon 9. Mass spectrometry of myoblast proteins selected by the Capzb exon 9 intron via RNA affinity chromatography identifies Quaking (QK), a protein known to regulate mRNA function through ACUAA motifs in 3′ UTRs. We find that QK promotes inclusion of Capzb exon 9 in opposition to repression by polypyrimidine tract-binding protein (PTB). QK depletion alters inclusion of 406 cassette exons whose adjacent intron sequences are also enriched in ACUAA motifs. During differentiation of myoblasts to myotubes, QK levels increase two- to threefold, suggesting a mechanism for QK-responsive exon regulation. Combined analysis of the PTB- and QK-splicing regulatory networks during myogenesis suggests that 39% of regulated exons are under the control of one or both of these splicing factors. This work provides the first evidence that QK is a global regulator of splicing during muscle development in vertebrates and shows how overlapping splicing regulatory networks contribute to gene expression programs during differentiation. PMID:23525800
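Enrichment of a short motif such as ACUAA near regulated exons rests on a simple count. The sketch below shows one way to compute motif density in two sets of sequences. The sequences are invented toy strings, the function names are hypothetical, and no statistical test is included.

```python
def count_motif(rna, motif="ACUAA"):
    """Count occurrences of motif in rna, including overlapping ones."""
    width = len(motif)
    return sum(
        1 for i in range(len(rna) - width + 1) if rna[i:i + width] == motif
    )


def hits_per_kb(sequences, motif="ACUAA"):
    """Motif occurrences per 1000 nucleotides across a set of sequences."""
    total_hits = sum(count_motif(s, motif) for s in sequences)
    total_length = sum(len(s) for s in sequences)
    return 1000.0 * total_hits / total_length


# Invented toy "intron" sequences, for illustration only.
regulated = ["GGACUAACUAACC", "UUACUAAGG"]
control = ["GGGCCCAUAUGG", "ACUGACUGACUG"]

print(hits_per_kb(regulated), hits_per_kb(control))
```

The explicit window scan is deliberate: ACUAA can overlap itself by one base (ACUAACUAA holds two copies), and Python's `str.count` would report only one of them.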

Utilizing Teacher Leadership as a Catalyst for Change in Schools

ERIC Educational Resources Information Center

Ankrum, Raymond J.

2016-01-01

School leaders are constantly trying to find alternative ways to leverage and explore teacher leadership potential in their school building(s). Teacher leaders are those who are willing to go above and beyond their general duties. They are the type of educators that fall under the motif of potentially taking on additive responsibilities that…
SA-Mot: a web server for the identification of motifs of interest extracted from protein loops

PubMed Central

Regad, Leslie; Saladin, Adrien; Maupetit, Julien; Geneix, Colette; Camproux, Anne-Claude

2011-01-01

The detection of functional motifs is an important step for the determination of protein functions. We present here a new web server SA-Mot (Structural Alphabet Motif) for the extraction and location of structural motifs of interest from protein loops. Contrary to other methods, SA-Mot does not focus only on functional motifs, but it extracts recurrent and conserved structural motifs involved in structural redundancy of loops. SA-Mot uses the structural word notion to extract all structural motifs from uni-dimensional sequences corresponding to loop structures. Then, SA-Mot provides a description of these structural motifs using statistics computed in the loop data set and in SCOP superfamily, sequence and structural parameters. SA-Mot results correspond to an interactive table listing all structural motifs extracted from a target structure and their associated descriptors. Using this information, the users can easily locate loop regions that are important for the protein folding and function. The SA-Mot web server is available at http://sa-mot.mti.univ-paris-diderot.fr. PMID:21665924
SA-Mot: a web server for the identification of motifs of interest extracted from protein loops.

PubMed

Regad, Leslie; Saladin, Adrien; Maupetit, Julien; Geneix, Colette; Camproux, Anne-Claude

2011-07-01

The detection of functional motifs is an important step for the determination of protein functions. We present here a new web server SA-Mot (Structural Alphabet Motif) for the extraction and location of structural motifs of interest from protein loops. Contrary to other methods, SA-Mot does not focus only on functional motifs, but it extracts recurrent and conserved structural motifs involved in structural redundancy of loops. SA-Mot uses the structural word notion to extract all structural motifs from uni-dimensional sequences corresponding to loop structures. Then, SA-Mot provides a description of these structural motifs using statistics computed in the loop data set and in SCOP superfamily, sequence and structural parameters. SA-Mot results correspond to an interactive table listing all structural motifs extracted from a target structure and their associated descriptors. Using this information, the users can easily locate loop regions that are important for the protein folding and function. The SA-Mot web server is available at http://sa-mot.mti.univ-paris-diderot.fr.
Promzea: a pipeline for discovery of co-regulatory motifs in maize and other plant species and its application to the anthocyanin and phlobaphene biosynthetic pathways and the Maize Development Atlas.

PubMed

Liseron-Monfils, Christophe; Lewis, Tim; Ashlock, Daniel; McNicholas, Paul D; Fauteux, François; Strömvik, Martina; Raizada, Manish N

2013-03-15

The discovery of genetic networks and cis-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (Zea mays L.) has facilitated in silico searches for regulatory motifs. Several algorithms exist to predict cis-acting elements, but none have been adapted for maize. A benchmark data set was used to evaluate the accuracy of three motif discovery programs: BioProspector, Weeder and MEME. Analysis showed that each motif discovery tool had limited accuracy and appeared to retrieve a distinct set of motifs. Therefore, using the benchmark, statistical filters were optimized to reduce the false discovery ratio, and then remaining motifs from all programs were combined to improve motif prediction. These principles were integrated into a user-friendly pipeline for motif discovery in maize called Promzea, available at http://www.promzea.org and on the Discovery Environment of the iPlant Collaborative website. Promzea was subsequently expanded to include rice and Arabidopsis. Within Promzea, a user enters cDNA sequences or gene IDs; corresponding upstream sequences are retrieved from the maize genome. Predicted motifs are filtered, combined and ranked. Promzea searches the chosen plant genome for genes containing each candidate motif, providing the user with the gene list and corresponding gene annotations. Promzea was validated in silico using a benchmark data set: the Promzea pipeline showed a 22% increase in nucleotide sensitivity compared to the best standalone program tool, Weeder, with equivalent nucleotide specificity. Promzea was also validated by its ability to retrieve the experimentally defined binding sites of transcription factors that regulate the maize anthocyanin and phlobaphene biosynthetic pathways. 
Promzea predicted additional promoter motifs, and genome-wide motif searches by Promzea identified 127 non-anthocyanin/phlobaphene genes that each contained all five predicted promoter motifs in their promoters, perhaps uncovering a broader co-regulated gene network. Promzea was also tested against tissue-specific microarray data from maize. An online tool customized for promoter motif discovery in plants has been generated called Promzea. Promzea was validated in silico by its ability to retrieve benchmark motifs and experimentally defined motifs and was tested using tissue-specific microarray data. Promzea predicted broader networks of gene regulation associated with the historic anthocyanin and phlobaphene biosynthetic pathways. Promzea is a new bioinformatics tool for understanding transcriptional gene regulation in maize and has been expanded to include rice and Arabidopsis.
NNAlign: A Web-Based Prediction Method Allowing Non-Expert End-User Discovery of Sequence Motifs in Quantitative Peptide Data

PubMed Central

Andreatta, Massimo; Schafer-Nielsen, Claus; Lund, Ole; Buus, Søren; Nielsen, Morten

2011-01-01

Recent advances in high-throughput technologies have made it possible to generate both gene and protein sequence data at an unprecedented rate and scale thereby enabling entirely new “omics”-based approaches towards the analysis of complex biological processes. However, the amount and complexity of data that even a single experiment can produce seriously challenges researchers with limited bioinformatics expertise, who need to handle, analyze and interpret the data before it can be understood in a biological context. Thus, there is an unmet need for tools allowing non-bioinformatics users to interpret large data sets. We have recently developed a method, NNAlign, which is generally applicable to any biological problem where quantitative peptide data is available. This method efficiently identifies underlying sequence patterns by simultaneously aligning peptide sequences and identifying motifs associated with quantitative readouts. Here, we provide a web-based implementation of NNAlign allowing non-expert end-users to submit their data (optionally adjusting method parameters), and in return receive a trained method (including a visual representation of the identified motif) that subsequently can be used as prediction method and applied to unknown proteins/peptides. We have successfully applied this method to several different data sets including peptide microarray-derived sets containing more than 100,000 data points. NNAlign is available online at http://www.cbs.dtu.dk/services/NNAlign. PMID:22073191
NNAlign: a web-based prediction method allowing non-expert end-user discovery of sequence motifs in quantitative peptide data.

PubMed

Andreatta, Massimo; Schafer-Nielsen, Claus; Lund, Ole; Buus, Søren; Nielsen, Morten

2011-01-01

Recent advances in high-throughput technologies have made it possible to generate both gene and protein sequence data at an unprecedented rate and scale thereby enabling entirely new "omics"-based approaches towards the analysis of complex biological processes. However, the amount and complexity of data that even a single experiment can produce seriously challenges researchers with limited bioinformatics expertise, who need to handle, analyze and interpret the data before it can be understood in a biological context. Thus, there is an unmet need for tools allowing non-bioinformatics users to interpret large data sets. We have recently developed a method, NNAlign, which is generally applicable to any biological problem where quantitative peptide data is available. This method efficiently identifies underlying sequence patterns by simultaneously aligning peptide sequences and identifying motifs associated with quantitative readouts. Here, we provide a web-based implementation of NNAlign allowing non-expert end-users to submit their data (optionally adjusting method parameters), and in return receive a trained method (including a visual representation of the identified motif) that subsequently can be used as prediction method and applied to unknown proteins/peptides. We have successfully applied this method to several different data sets including peptide microarray-derived sets containing more than 100,000 data points. NNAlign is available online at http://www.cbs.dtu.dk/services/NNAlign.
Hepatitis B virus genotyping among chronic hepatitis B patients with resistance to treatment with lamivudine in the City of Ribeirão Preto, State of São Paulo.

PubMed

Haddad, Rodrigo; Martinelli, Ana de Lourdes Candolo; Uyemura, Sérgio Akira; Yokosawa, Jonny

2010-01-01

Lamivudine is a nucleoside analogue that is used clinically for treating chronic hepatitis B infection. However, the main problem with prolonged use of lamivudine is the development of viral resistance to the treatment. Mutations in the YMDD motif of the hepatitis B virus DNA polymerase gene have been associated with resistance to drug therapy. So far, there have not been many studies in Brazil reporting on genotype-dependent development of resistance to lamivudine. Thus, the aim of the present study was to determine the possible correlation between a certain genotype and increased development of resistance to lamivudine among chronic hepatitis B patients. HBV DNA in samples from 50 patients under lamivudine treatment was amplified by means of conventional PCR. Samples were collected at Hospital das Clínicas, FMRP-USP. The products were then sequenced and phylogenetic analysis was performed. Phylogenetic analysis revealed that 29 (58%) patients were infected with genotype D, 20 (40%) with genotype A and one (2%) with genotype F. Mutations in the YMDD motif occurred in 20% of the patients with genotype A and 27.6% of the patients with genotype D. Despite the small number of samples, our results indicated that mutations in the YMDD motif were 1.38 times more frequent in genotype D than in genotype A.
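The reported ratio of 1.38 can be checked from the figures in the abstract. The patient counts per genotype and the percentages are taken from the text. The numbers of patients with YMDD mutations (4 and 8) are not stated there and are inferred here by rounding, so they should be read as an assumption.

```python
patients = {"A": 20, "D": 29}             # genotype counts from the abstract
reported_rate = {"A": 0.20, "D": 0.276}   # YMDD mutation rates from the abstract

# Inferred (not stated in the abstract): patients with mutations per genotype.
implied_mutants = {g: round(patients[g] * reported_rate[g]) for g in patients}

rate = {g: implied_mutants[g] / patients[g] for g in patients}
ratio = rate["D"] / rate["A"]

print(implied_mutants)   # {'A': 4, 'D': 8}
print(round(ratio, 2))   # 1.38
```

With only about a dozen patients carrying mutations, the ratio is sensitive to a single case, which matches the abstract's own caveat about the small number of samples.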
The primary growth of laryngeal squamous cell carcinoma cells in vitro is effectively supported by paired cancer-associated fibroblasts alone.

PubMed

Wang, Mei; Wu, Chunping; Guo, Yu; Cao, Xiaojuan; Zheng, Wenwei; Fan, Guo-Kang

2017-05-01

Most primarily cultured laryngeal squamous cell carcinoma cells are difficult to propagate in vitro and have a low survival rate. However, in our previous work to establish a laryngeal squamous cell carcinoma cell line, we found that laryngeal cancer-associated fibroblasts appeared to strongly inhibit the apoptosis of primarily cultured laryngeal squamous cell carcinoma cells in vitro. In this study, we investigated whether paired laryngeal cancer-associated fibroblasts alone can effectively support the growth of primarily cultured laryngeal squamous cell carcinoma cells in vitro. In all, 29 laryngeal squamous cell carcinoma specimens were collected and primarily cultured. The laryngeal squamous cell carcinoma cells were separated from cancer-associated fibroblasts by differential trypsinization and continuously subcultured. Morphological changes of the cultured laryngeal squamous cell carcinoma cells were observed. Immunocytofluorescence was used to authenticate the identity of the cancer-associated fibroblasts and laryngeal squamous cell carcinoma cells. Flow cytometry was used to quantify the proportion of apoptotic cells. Western blot was used to detect the protein levels of caspase-3. Enzyme-linked immunosorbent assay was used to detect the levels of chemokine (C-X-C motif) ligand 12, chemokine (C-X-C motif) ligand 7, hepatocyte growth factor, and fibroblast growth factor 1 in the supernatants of the laryngeal squamous cell carcinoma and control cells. AMD3100 (a chemokine (C-X-C motif) receptor 4 antagonist) and an anti-chemokine (C-X-C motif) ligand 7 antibody were used to block the tumor-supporting capacity of cancer-associated fibroblasts. Significant apoptotic changes were detected in the morphology of laryngeal squamous cell carcinoma cells detached from cancer-associated fibroblasts. The percentage of apoptotic laryngeal squamous cell carcinoma cells and the protein levels of caspase-3 increased gradually in subsequent subcultures. 
In contrast, no significant differences in the proliferation capacity of laryngeal squamous cell carcinoma cells cocultured with cancer-associated fibroblasts were detected during subculturing. A high level of chemokine (C-X-C motif) ligand 12 was detected in the culture supernatant of cancer-associated fibroblasts. The tumor-supporting effect of cancer-associated fibroblasts was significantly inhibited by AMD3100. Our findings demonstrate that the paired laryngeal cancer-associated fibroblasts alone are sufficient to support the primary growth of laryngeal squamous cell carcinoma cells in vitro and that the chemokine (C-X-C motif) ligand 12/chemokine (C-X-C motif) receptor 4 axis is one of the major contributors.
Transcriptome Analysis of an Insecticide Resistant Housefly Strain: Insights about SNPs and Regulatory Elements in Cytochrome P450 Genes.

PubMed

Mahmood, Khalid; Højland, Dorte H; Asp, Torben; Kristensen, Michael

2016-01-01

Insecticide resistance in the housefly, Musca domestica, has been investigated for more than 60 years. It will enter a new era after the recent publication of the housefly genome and the development of multiple next generation sequencing technologies. The genetic background of the xenobiotic response can now be investigated in greater detail. Here, we investigate the 454-pyrosequencing transcriptome of the spinosad-resistant 791spin strain in relation to the housefly genome, with a focus on P450 genes. The de novo assembly of clean reads gave 35,834 contigs consisting of 21,780 sequences of the spinosad-resistant strain. A total of 3,648 sequences were annotated with an Enzyme Commission (EC) number and mapped to 124 KEGG pathways, with metabolic processes as the most highly represented pathway. One hundred and twenty contigs were annotated as P450s, covering 44 different housefly P450 genes. Eight differentially expressed P450 genes were identified and investigated for SNPs, CpG islands and common regulatory motifs in promoter and coding regions. Functional annotation clustering of metabolism-related genes and motif analysis of P450s revealed their association with epigenetic, transcription and gene expression related functions. The sequence variation analysis identified 12 SNPs, eight of which were found in cyp6d1. The location, size and frequency of CpG islands varied, and specific motifs were also identified in these P450s. Moreover, the identified motifs were associated with GO terms and transcription factors using bioinformatic tools. Together with genome data, transcriptome data from a spinosad-resistant strain provide fundamental support for future research to understand the evolution of resistance in houseflies. Here, we report for the first time the SNPs, CpG islands and common regulatory motifs in differentially expressed P450s. Taken together, our findings will serve as a stepping stone to advance understanding of the mechanism and role of P450s in xenobiotic detoxification.
Prediction of GCRV virus-host protein interactome based on structural motif-domain interactions.

PubMed

Zhang, Aidi; He, Libo; Wang, Yaping

2017-03-02

Grass carp hemorrhagic disease, caused by grass carp reovirus (GCRV), is the most fatal causative agent in grass carp aquaculture. Protein-protein interactions between virus and host are one avenue through which GCRV can trigger infection and induce disease. Experimental approaches for the detection of the host-virus interactome have many inherent limitations, and studies on protein-protein interactions between GCRV and its host remain rare. In this study, based on known motif-domain interaction information, we systematically predicted the GCRV virus-host protein interactome using a motif-domain interaction pair searching strategy. These proteins derived from different domain families and were predicted to interact with different motif patterns in GCRV. JAM-A protein was successfully predicted to interact with motifs of the GCRV Sigma1-like protein, and shared a similar binding mode with that of orthoreovirus. Differentially expressed genes during the GCRV infection process were extracted and mapped to our predicted interactome. The overlapping genes displayed different tissue expression distributions; overall, the expression level in the intestine was higher than in the other three tissues, which may suggest that these genes are more active in the intestine. Function annotation and pathway enrichment analysis revealed that the host targets were largely involved in signaling and immune pathways, such as the interferon-gamma signaling pathway, VEGF signaling pathway, EGF receptor signaling pathway, B cell activation, and T cell activation. Although the predicted PPIs may contain some false positives due to limited data resources and a poor research background in non-model species, the computational method still provides a reasonable number of interactions, which can be further validated by high-throughput experiments.
The findings of this work will contribute to the development of system biology for GCRV infectious diseases, and help guide the identification of novel receptors of GCRV in its host.
Positive evolutionary selection of an HD motif on Alzheimer precursor protein orthologues suggests a functional role.

PubMed

Miklós, István; Zádori, Zoltán

2012-02-01

HD amino acid duplex has been found in the active center of many different enzymes. The dyad plays remarkably different roles in their catalytic processes that usually involve metal coordination. An HD motif is positioned directly on the amyloid beta fragment (Aβ) and on the carboxy-terminal region of the extracellular domain (CAED) of the human amyloid precursor protein (APP) and a taxonomically well defined group of APP orthologues (APPOs). In human Aβ HD is part of a presumed, RGD-like integrin-binding motif RHD; however, neither RHD nor RXD demonstrates reasonable conservation in APPOs. The sequences of CAEDs and the position of the HD are not particularly conserved either, yet we show with a novel statistical method using evolutionary modeling that the presence of HD on CAEDs cannot be the result of neutral evolutionary forces (p<0.0001). The motif is positively selected along the evolutionary process in the majority of APPOs, despite the fact that HD motif is underrepresented in the proteomes of all species of the animal kingdom. Position migration can be explained by high probability occurrence of multiple copies of HD on intermediate sequences, from which only one is kept by selective evolutionary forces, in a similar way as in the case of the "transcription binding site turnover." CAED of all APP orthologues and homologues are predicted to bind metal ions including Amyloid-like protein 1 (APLP1) and Amyloid-like protein 2 (APLP2). Our results suggest that HDs on the CAEDs are most probably key components of metal-binding domains, which facilitate and/or regulate inter- or intra-molecular interactions in a metal ion-dependent or metal ion concentration-dependent manner. The involvement of naturally occurring mutations of HD (Tottori (D7N) and English (H6R) mutations) in early onset Alzheimer's disease gives additional support to our finding that HD has an evolutionary preserved function on APPOs.
Positive Evolutionary Selection of an HD Motif on Alzheimer Precursor Protein Orthologues Suggests a Functional Role

PubMed Central

Miklós, István; Zádori, Zoltán

2012-01-01

HD amino acid duplex has been found in the active center of many different enzymes. The dyad plays remarkably different roles in their catalytic processes that usually involve metal coordination. An HD motif is positioned directly on the amyloid beta fragment (Aβ) and on the carboxy-terminal region of the extracellular domain (CAED) of the human amyloid precursor protein (APP) and a taxonomically well defined group of APP orthologues (APPOs). In human Aβ HD is part of a presumed, RGD-like integrin-binding motif RHD; however, neither RHD nor RXD demonstrates reasonable conservation in APPOs. The sequences of CAEDs and the position of the HD are not particularly conserved either, yet we show with a novel statistical method using evolutionary modeling that the presence of HD on CAEDs cannot be the result of neutral evolutionary forces (p<0.0001). The motif is positively selected along the evolutionary process in the majority of APPOs, despite the fact that HD motif is underrepresented in the proteomes of all species of the animal kingdom. Position migration can be explained by high probability occurrence of multiple copies of HD on intermediate sequences, from which only one is kept by selective evolutionary forces, in a similar way as in the case of the “transcription binding site turnover.” CAED of all APP orthologues and homologues are predicted to bind metal ions including Amyloid-like protein 1 (APLP1) and Amyloid-like protein 2 (APLP2). Our results suggest that HDs on the CAEDs are most probably key components of metal-binding domains, which facilitate and/or regulate inter- or intra-molecular interactions in a metal ion-dependent or metal ion concentration-dependent manner. The involvement of naturally occurring mutations of HD (Tottori (D7N) and English (H6R) mutations) in early onset Alzheimer's disease gives additional support to our finding that HD has an evolutionary preserved function on APPOs. PMID:22319430
SiteBinder: an improved approach for comparing multiple protein structural motifs.

PubMed

Sehnal, David; Vařeková, Radka Svobodová; Huber, Heinrich J; Geidl, Stanislav; Ionescu, Crina-Maria; Wimmerová, Michaela; Koča, Jaroslav

2012-02-27

There is a paramount need to develop new techniques and tools that will extract as much information as possible from the ever growing repository of protein 3D structures. We report here on the development of a software tool for the multiple superimposition of large sets of protein structural motifs. Our superimposition methodology performs a systematic search for the atom pairing that provides the best fit. During this search, the RMSD values for all chemically relevant pairings are calculated by quaternion algebra. The number of evaluated pairings is markedly decreased by using PDB annotations for atoms. This approach guarantees that the best fit will be found and can be applied even when sequence similarity is low or does not exist at all. We have implemented this methodology in the Web application SiteBinder, which is able to process up to thousands of protein structural motifs in a very short time, and which provides an intuitive and user-friendly interface. Our benchmarking analysis has shown the robustness, efficiency, and versatility of our methodology and its implementation by the successful superimposition of 1000 experimentally determined structures for each of 32 eukaryotic linear motifs. We also demonstrate the applicability of SiteBinder using three case studies. We first compared the structures of 61 PA-IIL sugar binding sites containing nine different sugars, and we found that the sugar binding sites of PA-IIL and its mutants have a conserved structure despite their binding different sugars. We then superimposed over 300 zinc finger central motifs and revealed that the molecular structure in the vicinity of the Zn atom is highly conserved. Finally, we superimposed 12 BH3 domains from pro-apoptotic proteins. Our findings come to support the hypothesis that there is a structural basis for the functional segregation of BH3-only proteins into activators and enablers.
Frnakenstein: multiple target inverse RNA folding.

PubMed

Lyngsø, Rune B; Anderson, James W J; Sizikova, Elena; Badugu, Amarendra; Hyland, Tomas; Hein, Jotun

2012-10-09

RNA secondary structure prediction, or folding, is a classic problem in bioinformatics: given a sequence of nucleotides, the aim is to predict the base pairs formed in its three dimensional conformation. The inverse problem of designing a sequence folding into a particular target structure has only more recently received notable interest. With a growing appreciation and understanding of the functional and structural properties of RNA motifs, and a growing interest in utilising biomolecules in nano-scale designs, the interest in the inverse RNA folding problem is bound to increase. However, whereas the RNA folding problem from an algorithmic viewpoint has an elegant and efficient solution, the inverse RNA folding problem appears to be hard. In this paper we present a genetic algorithm approach to solve the inverse folding problem. The main aims of the development were to address the hitherto mostly ignored extension of solving the inverse folding problem, the multi-target inverse folding problem, while simultaneously designing a method with superior performance when measured on the quality of designed sequences. The genetic algorithm has been implemented as a Python program called Frnakenstein. It was benchmarked against four existing methods and several data sets totalling 769 real and predicted single structure targets, and on 292 two structure targets. It performed as well as or better at finding sequences which folded in silico into the target structure than all existing methods, without the heavy bias towards CG base pairs that was observed for all other top performing methods. On the two structure targets it also performed well, generating a perfect design for about 80% of the targets. Our method illustrates that successful designs for the inverse RNA folding problem do not necessarily have to rely on heavy biases in base pair and unpaired base distributions.
The design problem seems to become more difficult on larger structures when the target structures are real structures, while no deterioration was observed for predicted structures. Design for two structure targets is considerably more difficult, but far from impossible, demonstrating the feasibility of automated design of artificial riboswitches. The Python implementation is available at http://www.stats.ox.ac.uk/research/genome/software/frnakenstein.
Frnakenstein: multiple target inverse RNA folding

PubMed Central

2012-01-01

Background: RNA secondary structure prediction, or folding, is a classic problem in bioinformatics: given a sequence of nucleotides, the aim is to predict the base pairs formed in its three dimensional conformation. The inverse problem of designing a sequence folding into a particular target structure has only more recently received notable interest. With a growing appreciation and understanding of the functional and structural properties of RNA motifs, and a growing interest in utilising biomolecules in nano-scale designs, the interest in the inverse RNA folding problem is bound to increase. However, whereas the RNA folding problem from an algorithmic viewpoint has an elegant and efficient solution, the inverse RNA folding problem appears to be hard. Results: In this paper we present a genetic algorithm approach to solve the inverse folding problem. The main aims of the development were to address the hitherto mostly ignored extension of solving the inverse folding problem, the multi-target inverse folding problem, while simultaneously designing a method with superior performance when measured on the quality of designed sequences. The genetic algorithm has been implemented as a Python program called Frnakenstein. It was benchmarked against four existing methods and several data sets totalling 769 real and predicted single structure targets, and on 292 two structure targets. It performed as well as or better at finding sequences which folded in silico into the target structure than all existing methods, without the heavy bias towards CG base pairs that was observed for all other top performing methods. On the two structure targets it also performed well, generating a perfect design for about 80% of the targets. Conclusions: Our method illustrates that successful designs for the inverse RNA folding problem do not necessarily have to rely on heavy biases in base pair and unpaired base distributions.
The design problem seems to become more difficult on larger structures when the target structures are real structures, while no deterioration was observed for predicted structures. Design for two structure targets is considerably more difficult, but far from impossible, demonstrating the feasibility of automated design of artificial riboswitches. The Python implementation is available at http://www.stats.ox.ac.uk/research/genome/software/frnakenstein. PMID:23043260
D-MATRIX: A web tool for constructing weight matrix of conserved DNA motifs

PubMed Central

Sen, Naresh; Mishra, Manoj; Khan, Feroz; Meena, Abha; Sharma, Ashok

2009-01-01

Despite considerable efforts to date, DNA motif prediction in whole genomes remains a challenge for researchers. Currently, genome-wide motif prediction tools require either a direct pattern sequence (for a single motif) or a weight matrix (for multiple motifs). Although there are known motif pattern databases and tools for genome-level prediction, there is no tool for weight matrix construction. Considering this, we developed the D-MATRIX tool, which predicts the different types of weight matrix based on a user-defined aligned motif sequence set and motif width. For retrieval of known motif sequences, users can access commonly used databases such as TFD, RegulonDB, DBTBS and Transfac. The D-MATRIX program uses a simple statistical approach for weight matrix construction, and the result can be converted into different file formats according to user requirements. It provides the possibility of identifying conserved motifs in co-regulated genes or whole genomes. As an example, we successfully constructed the weight matrix of the LexA transcription factor binding site with the help of known sos-box cis-regulatory elements in the Deinococcus radiodurans genome. The algorithm is implemented in C-Sharp and wrapped in ASP.Net to maintain a user-friendly web interface. The D-MATRIX tool is accessible through the CIMAP domain network. Availability: http://203.190.147.116/dmatrix/ PMID:19759861
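The abstract above does not spell out its "simple statistical approach", so the following is only a generic sketch of how a weight matrix is commonly derived from aligned sites: count bases per column, then convert to log-odds scores. The function names, the uniform 0.25 background, the pseudocount of 1 and the example sites are all assumptions made for illustration and are not taken from D-MATRIX.

```python
import math

def position_frequency_matrix(sites, alphabet="ACGT"):
    """Count each base at each column of equal-length aligned sites."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all aligned sites must have the same width")
    counts = {base: [0] * width for base in alphabet}
    for site in sites:
        for pos, base in enumerate(site.upper()):
            counts[base][pos] += 1
    return counts

def log_odds_matrix(counts, background=0.25, pseudocount=1.0):
    """Convert counts to log2(p / background), adding a pseudocount per base."""
    bases = list(counts)
    width = len(counts[bases[0]])
    n_sites = sum(counts[b][0] for b in bases)
    denom = n_sites + pseudocount * len(bases)
    return {
        b: [math.log2((counts[b][i] + pseudocount) / denom / background)
            for i in range(width)]
        for b in bases
    }

def score(pwm, window):
    """Sum the per-position log-odds scores for one candidate window."""
    return sum(pwm[base][i] for i, base in enumerate(window.upper()))

# Hypothetical aligned sites, invented for illustration only.
sites = ["TACGAT", "TATAAT", "TATGAT", "GATACT"]
pfm = position_frequency_matrix(sites)
pwm = log_odds_matrix(pfm)
print(score(pwm, "TATAAT"), score(pwm, "CCCCCC"))
```

Sliding `score` over every window of a longer sequence and keeping the windows above a chosen threshold is the usual way such a matrix is then applied to co-regulated genes or a whole genome. The pseudocount keeps unseen bases from producing a log of zero.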
D-MATRIX: a web tool for constructing weight matrix of conserved DNA motifs.

PubMed

Sen, Naresh; Mishra, Manoj; Khan, Feroz; Meena, Abha; Sharma, Ashok

2009-07-27

Despite considerable efforts to date, DNA motif prediction in whole genomes remains a challenge for researchers. Currently, genome-wide motif prediction tools require either a direct pattern sequence (for a single motif) or a weight matrix (for multiple motifs). Although there are known motif pattern databases and tools for genome-level prediction, there is no tool for weight matrix construction. Considering this, we developed the D-MATRIX tool, which predicts the different types of weight matrix based on a user-defined aligned motif sequence set and motif width. For retrieval of known motif sequences, users can access commonly used databases such as TFD, RegulonDB, DBTBS and Transfac. The D-MATRIX program uses a simple statistical approach for weight matrix construction, and the result can be converted into different file formats according to user requirements. It provides the possibility of identifying conserved motifs in co-regulated genes or whole genomes. As an example, we successfully constructed the weight matrix of the LexA transcription factor binding site with the help of known sos-box cis-regulatory elements in the Deinococcus radiodurans genome. The algorithm is implemented in C-Sharp and wrapped in ASP.Net to maintain a user-friendly web interface. The D-MATRIX tool is accessible through the CIMAP domain network. http://203.190.147.116/dmatrix/
A motif detection and classification method for peptide sequences using genetic programming.

PubMed

Tomita, Yasuyuki; Kato, Ryuji; Okochi, Mina; Honda, Hiroyuki

2008-08-01

An exploration of common rules (property motifs) in amino acid sequences has been required for the design of novel sequences and the elucidation of interactions between molecules controlled by the structural or physical environment. In the present study, we developed a new method to search for property motifs that are common in peptide sequence data. Our method has the following two characteristics: (i) automatic determination of the position and length of common property motifs by calculating the physicochemical similarity of amino acids, and (ii) quick and effective exploration of motif candidates that discriminate positives from negatives through the introduction of genetic programming (GP). Our method was evaluated on two types of model data sets. First, intentionally buried property motifs were searched for in artificially derived peptide data. As a result, the expected property motifs were correctly extracted by our algorithm. Second, peptide data for peptides that interact with MHC class II molecules were analyzed as a model of biologically active peptides with buried motifs of various lengths. With the rule obtained using our method, twice as many MHC class II binding peptides were identified as with the existing scoring matrix method. In conclusion, our GP-based motif searching approach made it possible to obtain knowledge of functional aspects of the peptides without any prior knowledge.
Disparate requirements for the Walker A and B ATPase motifs of human RAD51D in homologous recombination.

PubMed

Wiese, Claudia; Hinz, John M; Tebbs, Robert S; Nham, Peter B; Urbin, Salustra S; Collins, David W; Thompson, Larry H; Schild, David

2006-01-01

In vertebrates, homologous recombinational repair (HRR) requires RAD51 and five RAD51 paralogs (XRCC2, XRCC3, RAD51B, RAD51C and RAD51D) that all contain conserved Walker A and B ATPase motifs. In human RAD51D we examined the requirement for these motifs in interactions with XRCC2 and RAD51C, and for survival of cells in response to DNA interstrand crosslinks (ICLs). Ectopic expression of wild-type human RAD51D or mutants having a non-functional A or B motif was used to test for complementation of a rad51d knockout hamster CHO cell line. Although A-motif mutants complement very efficiently, B-motif mutants do not. Consistent with these results, experiments using the yeast two- and three-hybrid systems show that the interactions between RAD51D and its XRCC2 and RAD51C partners also require a functional RAD51D B motif, but not motif A. Similarly, hamster Xrcc2 is unable to bind to the non-complementing human RAD51D B-motif mutants in co-immunoprecipitation assays. We conclude that a functional Walker B motif, but not A motif, is necessary for RAD51D's interactions with other paralogs and for efficient HRR. We present a model in which ATPase sites are formed in a bipartite manner between RAD51D and other RAD51 paralogs.
RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections.

PubMed

Castro-Mondragon, Jaime Abraham; Jaeger, Sébastien; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

2017-07-27

Transcription factor (TF) databases contain multitudes of binding motifs (TFBMs) from various sources, from which non-redundant collections are derived by manual curation. The advent of high-throughput methods stimulated the production of novel collections with increasing numbers of motifs. Meta-databases, built by merging these collections, contain redundant versions, because available tools are not suited to automatically identify and explore biologically relevant clusters among thousands of motifs. Motif discovery from genome-scale data sets (e.g. ChIP-seq) also produces redundant motifs, hampering the interpretation of results. We present matrix-clustering, a versatile tool that clusters similar TFBMs into multiple trees, and automatically creates non-redundant TFBM collections. A feature unique to matrix-clustering is its dynamic visualisation of aligned TFBMs, and its capability to simultaneously treat multiple collections from various sources. We demonstrate that matrix-clustering considerably simplifies the interpretation of combined results from multiple motif discovery tools, and highlights biologically relevant variations of similar motifs. We also ran a large-scale application to cluster ∼11 000 motifs from 24 entire databases, showing that matrix-clustering correctly groups motifs belonging to the same TF families, and drastically reduced motif redundancy. matrix-clustering is integrated within the RSAT suite (http://rsat.eu/), accessible through a user-friendly web interface or command-line for its integration in pipelines. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

Form and function in gene regulatory networks: the structure of network motifs determines fundamental properties of their dynamical state space.

PubMed

Ahnert, S E; Fink, T M A

2016-07-01

Network motifs have been studied extensively over the past decade, and certain motifs, such as the feed-forward loop, play an important role in regulatory networks. Recent studies have used Boolean network motifs to explore the link between form and function in gene regulatory networks and have found that the structure of a motif does not strongly determine its function, if this is defined in terms of the gene expression patterns the motif can produce. Here, we offer a different, higher-level definition of the 'function' of a motif, in terms of two fundamental properties of its dynamical state space as a Boolean network. One is the basin entropy, which is a complexity measure of the dynamics of Boolean networks. The other is the diversity of cyclic attractor lengths that a given motif can produce. Using these two measures, we examine all 104 topologically distinct three-node motifs and show that the structural properties of a motif, such as the presence of feedback loops and feed-forward loops, predict fundamental characteristics of its dynamical state space, which in turn determine aspects of its functional versatility. We also show that these higher-level properties have a direct bearing on real regulatory networks, as both basin entropy and cycle length diversity show a close correspondence with the prevalence, in neural and genetic regulatory networks, of the 13 connected motifs without self-interactions that have been studied extensively in the literature. © 2016 The Authors.
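The two measures named in the abstract above can be illustrated on small networks by enumerating the whole state space. The sketch below assumes synchronous updating and takes basin entropy to be the Shannon entropy (in bits) of the basin-size fractions. The two update rules are hypothetical examples chosen for illustration, not motifs or rules taken from the paper.

```python
import math
from itertools import product

def attractors_and_basins(update, n_nodes):
    """Follow every state of a synchronous Boolean network to its attractor.

    Returns a dict mapping each attractor (a frozenset of states) to the
    size of its basin, counting the attractor's own states as well.
    """
    basins = {}
    for state in product((0, 1), repeat=n_nodes):
        seen = {}   # state -> index of first visit along this trajectory
        path = []
        while state not in seen:
            seen[state] = len(path)
            path.append(state)
            state = update(state)
        cycle = frozenset(path[seen[state]:])  # states from first repeat on
        basins[cycle] = basins.get(cycle, 0) + 1
    return basins

def basin_entropy(basins):
    """Shannon entropy (bits) of the distribution of basin sizes."""
    total = sum(basins.values())
    return -sum((s / total) * math.log2(s / total) for s in basins.values())

def feed_forward_loop(state):
    """Hypothetical rules: A holds its value, B copies A, C = A AND B."""
    a, b, _ = state
    return (a, a, a & b)

def negative_ring(state):
    """Hypothetical rules: a three-node ring with one inverting link."""
    a, b, c = state
    return (1 - c, a, b)

ffl = attractors_and_basins(feed_forward_loop, 3)
ring = attractors_and_basins(negative_ring, 3)
print(sorted(len(c) for c in ffl), basin_entropy(ffl))
print(sorted(len(c) for c in ring), basin_entropy(ring))
```

In this toy comparison the feed-forward rules give two fixed points with equal basins, while the ring with feedback gives cycles of two different lengths, which is the kind of structural difference the two measures are meant to capture. Exhaustive enumeration is only practical for small networks because the state space grows as 2^n.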
RNA 3D Structural Motifs: Definition, Identification, Annotation, and Database Searching

NASA Astrophysics Data System (ADS)

Nasalean, Lorena; Stombaugh, Jesse; Zirbel, Craig L.; Leontis, Neocles B.

Structured RNA molecules resemble proteins in the hierarchical organization of their global structures, folding and broad range of functions. Structured RNAs are composed of recurrent modular motifs that play specific functional roles. Some motifs direct the folding of the RNA or stabilize the folded structure through tertiary interactions. Others bind ligands or proteins or catalyze chemical reactions. Therefore, it is desirable, starting from the RNA sequence, to be able to predict the locations of recurrent motifs in RNA molecules. Conversely, the potential occurrence of one or more known 3D RNA motifs may indicate that a genomic sequence codes for a structured RNA molecule. To identify known RNA structural motifs in new RNA sequences, precise structure-based definitions are needed that specify the core nucleotides of each motif and their conserved interactions. By comparing instances of each recurrent motif and applying base pair isostericity relations, one can identify neutral mutations that preserve its structure and function in the contexts in which it occurs.
Human lysozyme possesses novel antimicrobial peptides within its N-terminal domain that target bacterial respiration.

PubMed

Ibrahim, Hisham R; Imazato, Kenta; Ono, Hajime

2011-09-28

Human milk lysozyme is thought to be a key defense factor in protecting the gastrointestinal tract of newborns against bacterial infection. Recently, evidence was found that pepsin, under conditions relevant to the newborn stomach, cleaves chicken lysozyme (cLZ) at specific loops to generate five antimicrobial peptide motifs. This study explores the antimicrobial role of the corresponding peptides of human lysozyme (hLZ), the actual protein in breast milk. Five peptide motifs of hLZ, one helix-loop-helix (HLH), its two helices (H1 and H2), and two helix-sheet motifs, H2-β-strands 1-2 (H2-S12) or H2-β-strands 1-3 (H2-S13), were synthesized and examined for antimicrobial action. The five peptides of hLZ exhibit microbicidal activity to various degrees against several bacterial strains. The HLH peptide and its N-terminal helix (H1) were significantly the most potent bactericidal agents against Gram-positive and Gram-negative bacteria and the fungus Candida albicans. Outer and inner membrane permeabilization studies, as well as measurements of transmembrane electrochemical potentials, provided evidence that the HLH peptide and its N-terminal helix (H1) kill bacteria by crossing the outer membrane of Gram-negative bacteria via self-promoted uptake and are able to dissipate the membrane potential-dependent respiration of Gram-positive bacteria. This finding is the first to describe that hLZ possesses multiple antimicrobial peptide motifs within its N-terminal domain, providing insight into new classes of antibiotic peptides with potential use in the treatment of infectious diseases.
Calcium binding studies of peptides of human phospholipid scramblases 1 to 4 suggest that scramblases are new class of calcium binding proteins in the cell.

PubMed

Sahu, Santosh Kumar; Aradhyam, Gopala Krishna; Gummadi, Sathyanarayana N

2009-10-01

Phospholipid scramblases are a group of four homologous proteins conserved from C. elegans to human. In human, two members of the scramblase family, hPLSCR1 and hPLSCR3, are known to bring about Ca2+-dependent translocation of phosphatidylserine and cardiolipin, respectively, during apoptotic processes. However, the affinities of Ca2+/Mg2+ binding to human scramblases and the conformational changes taking place in them remain unknown. In the present study, we analyzed the Ca2+ and Mg2+ binding to the calcium binding motifs of hPLSCR1-4 and hPLSCR1 by spectroscopic methods and isothermal titration calorimetry. The results in this study show that (i) affinities of the peptides are in the order hPLSCR1>hPLSCR3>hPLSCR2>hPLSCR4 for Ca2+ and in the order hPLSCR1>hPLSCR2>hPLSCR3>hPLSCR4 for Mg2+, and (ii) binding of ions brings about a conformational change in the secondary structure of the peptides. The affinity of Ca2+ and Mg2+ binding to protein hPLSCR1 was similar to that of peptide I. A sequence comparison shows the existence of scramblase-like motifs among other protein families. Based on the above results, we hypothesize that the Ca2+ binding motif of hPLSCR1 is a novel type of Ca2+ binding motif. Our findings will be relevant in understanding the calcium-dependent scrambling activity of hPLSCRs and their biological function.
A low-temperature-responsive element involved in the regulation of the Arabidopsis thaliana At1g71850/At1g71860 divergent gene pair.

PubMed

Liu, Shijuan; Chen, Huiqing; Li, Xiulan; Zhang, Wei

2016-08-01

The bidirectional promoter of the Arabidopsis thaliana gene pair At1g71850/At1g71860 harbors low-temperature-responsive elements, which participate in anti-correlated transcription regulation of the driving genes in response to environmental low temperature. A divergent gene pair is defined as two adjacent genes organized head to head in opposite orientation, sharing a common promoter region. Divergent gene pairs are mainly coexpressed, but some display opposite regulation. The mechanistic basis of such anti-correlated regulation is not well understood. Here, the regulation of the Arabidopsis thaliana gene pair At1g71850/At1g71860 was investigated. Semi-quantitative RT-PCR and Genevestigator analyses showed that while one of the pair was upregulated by exposure to low temperature, the same treatment downregulated the other. Promoter::GUS fusion transgenes were used to show that this behavior was driven by a bidirectional promoter, which harbored an as-1 motif, associated with the low-temperature response; mutation of this sequence produced a significant decrease in cold-responsive expression. With regard to the as-1 motif in the native orientation repressing the promoter's low-temperature responsiveness, the same as-1 motif introduced in the reverse direction showed a slight enhancement in the promoter's responsiveness to low-temperature exposure, indicating that the orientation of the motif was important for the promoter's activity. These findings provide new insights into the complex transcriptional regulation of bidirectional gene pairs as well as plant stress response.
Entropic Profiler – detection of conservation in genomes using information theory

PubMed Central

Fernandes, Francisco; Freitas, Ana T; Almeida, Jonas S; Vinga, Susana

2009-01-01

Background: In the last decades, with the successive availability of whole genome sequences, many research efforts have been made to mathematically model DNA. Entropic Profiles (EP) were proposed recently as a new measure of continuous entropy of genome sequences. EP represent local information plots related to DNA randomness and are based on information theory and statistical concepts. They express the weighed relative abundance of motifs for each position in genomes. Their study is very relevant because under- or over-represented segments are often associated with significant biological meaning. Findings: The Entropic Profiler application presented here is a new tool designed to detect and extract under- and over-represented DNA segments in genomes by using EP. It allows their computation in a very efficient way by resorting to improved algorithms and data structures, which include modified suffix trees. Available through a web interface and as downloadable source code, it allows users to study positions and to search for motifs inside the whole sequence or within a specified range. DNA sequences can be entered from different sources, including FASTA files, pre-loaded examples or previously saved work. Besides the EP value plots, p-values and z-scores for each motif are also computed, along with the Chaos Game Representation of the sequence. Conclusion: EP are directly related with the statistical significance of motifs and can be considered as a new method to extract and classify significant regions in genomes and estimate local scales in DNA. The present implementation establishes an efficient and useful tool for whole genome analysis. PMID:19416538
Multilayer motif analysis of brain networks

NASA Astrophysics Data System (ADS)

Battiston, Federico; Nicosia, Vincenzo; Chavez, Mario; Latora, Vito

2017-04-01

In the last decade, network science has shed new light both on the structural (anatomical) and on the functional (correlations in the activity) connectivity among the different areas of the human brain. The analysis of brain networks has made it possible to detect the central areas of a neural system and to identify its building blocks by looking at overabundant small subgraphs, known as motifs. However, network analysis of the brain has so far mainly focused on anatomical and functional networks as separate entities. The recently developed mathematical framework of multi-layer networks allows us to perform an analysis of the human brain where the structural and functional layers are considered together. In this work, we describe how to classify the subgraphs of a multiplex network, and we extend the motif analysis to networks with an arbitrary number of layers. We then extract multi-layer motifs in brain networks of healthy subjects by considering networks with two layers, anatomical and functional, respectively, obtained from diffusion and functional magnetic resonance imaging. Results indicate that subgraphs in which the presence of a physical connection between brain areas (links at the structural layer) coexists with a non-trivial positive correlation in their activities are statistically overabundant. Finally, we investigate the existence of a reinforcement mechanism between the two layers by looking at how the probability to find a link in one layer depends on the intensity of the connection in the other one. Showing that functional connectivity is non-trivially constrained by the underlying anatomical network, our work contributes to a better understanding of the interplay between the structure and function in the human brain.
Complete Genome Analysis of Three Novel Picornaviruses from Diverse Bat Species

PubMed Central

Lau, Susanna K. P.; Woo, Patrick C. Y.; Lai, Kenneth K. Y.; Huang, Yi; Yip, Cyril C. Y.; Shek, Chung-Tong; Lee, Paul; Lam, Carol S. F.; Chan, Kwok-Hung; Yuen, Kwok-Yung

2011-01-01

Although bats are important reservoirs of diverse viruses that can cause human epidemics, little is known about the presence of picornaviruses in these flying mammals. Among 1,108 bats of 18 species studied, three novel picornaviruses (groups 1, 2, and 3) were identified from alimentary specimens of 12 bats from five species and four genera. Two complete genomes, each from the three picornaviruses, were sequenced. Phylogenetic analysis showed that they fell into three distinct clusters in the Picornaviridae family, with low homologies to known picornaviruses, especially in leader and 2A proteins. Moreover, group 1 and 2 viruses are more closely related to each other than to group 3 viruses, which exhibit genome features distinct from those of the former two virus groups. In particular, the group 3 virus genome contains the shortest leader protein within Picornaviridae, a putative type I internal ribosome entry site (IRES) in the 5′-untranslated region instead of the type IV IRES found in group 1 and 2 viruses, one instead of two GXCG motifs in 2A, an L→V substitution in the DDLXQ motif in 2C helicase, and a conserved GXH motif in 3C protease. Group 1 and 2 viruses are unique among picornaviruses in having AMH instead of the GXH motif in 3Cpro. These findings suggest that the three picornaviruses belong to two novel genera in the Picornaviridae family. This report describes the discovery and complete genome analysis of three picornaviruses in bats, and their presence in diverse bat genera/species suggests the ability to cross the species barrier. PMID:21697464
A comparative hidden Markov model analysis pipeline identifies proteins characteristic of cereal-infecting fungi

PubMed Central

2013-01-01

Background: Fungal pathogens cause devastating losses in economically important cereal crops by utilising pathogen proteins to infect host plants. Secreted pathogen proteins are referred to as effectors and have thus far been identified by selecting small, cysteine-rich peptides from the secretome despite increasing evidence that not all effectors share these attributes. Results: We take advantage of the availability of sequenced fungal genomes and present an unbiased method for finding putative pathogen proteins and secreted effectors in a query genome via comparative hidden Markov model analyses followed by unsupervised protein clustering. Our method returns experimentally validated fungal effectors in Stagonospora nodorum and Fusarium oxysporum as well as the N-terminal Y/F/WxC-motif from the barley powdery mildew pathogen. Application to the cereal pathogen Fusarium graminearum reveals a secreted phosphorylcholine phosphatase that is characteristic of hemibiotrophic and necrotrophic cereal pathogens and shares an ancient selection process with bacterial plant pathogens. Three F. graminearum protein clusters are found with an enriched secretion signal. One of these putative effector clusters contains proteins that share a [SG]-P-C-[KR]-P sequence motif in the N-terminal and show features not commonly associated with fungal effectors. This motif is conserved in secreted pathogenic Fusarium proteins and a prime candidate for functional testing. Conclusions: Our pipeline has successfully uncovered conservation patterns, putative effectors and motifs of fungal pathogens that would have been overlooked by existing approaches that identify effectors as small, secreted, cysteine-rich peptides. It can be applied to any pathogenic proteome data, such as microbial pathogen data of plants and other organisms. PMID:24252298
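As an illustration of how a bracketed pattern such as [SG]-P-C-[KR]-P can be searched for in protein sequences, the following is a minimal sketch that converts a simple PROSITE-style pattern into a regular expression and scans the N-terminal part of a sequence. It is not the pipeline described in the abstract; the function names, the 50-residue N-terminal window and the toy sequence are assumptions made for the example.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern (e.g. '[SG]-P-C-[KR]-P') to a regex.

    Only literal residues, bracketed alternatives and 'x' wildcards are handled.
    """
    parts = []
    for element in pattern.split("-"):
        if element.lower() == "x":
            parts.append(".")      # 'x' means any residue
        else:
            parts.append(element)  # literal residue or [..] character class
    return "".join(parts)


def find_motif(sequence: str, pattern: str, n_terminal: int = 50) -> list:
    """Return 0-based start positions of the motif within the first n_terminal residues.

    The window size is a hypothetical choice, not a value taken from the abstract.
    """
    regex = re.compile(prosite_to_regex(pattern))
    return [match.start() for match in regex.finditer(sequence[:n_terminal])]


# Hypothetical toy sequence, invented for illustration only.
toy_sequence = "MKLSPCKPTTAAGPCRPLL"
print(find_motif(toy_sequence, "[SG]-P-C-[KR]-P"))
```

The same helper accepts a wildcard pattern such as [YFW]-x-C, which is one way to write the Y/F/WxC motif mentioned above.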
Characterization of various promoter regions of the human DNA helicase-encoding genes and identification of duplicated ets (GGAA) motifs as an essential transcription regulatory element.

PubMed

Uchiumi, Fumiaki; Watanabe, Takeshi; Tanuma, Sei-ichi

2010-05-15

DNA helicases are important in the regulation of DNA transaction and thereby various cellular functions. In this study, we developed a cost-effective multiple DNA transfection assay with DEAE-dextran reagent and analyzed the promoter activities of the human DNA helicases. The 5'-flanking regions of the human DNA helicase-encoding genes were isolated and subcloned into luciferase (Luc) expression plasmids. They were coated onto 96-well plate and used for co-transfection with a renilla-Luc expression vector into various cells, and dual-Luc assays were performed. The profiles of promoter activities were dependent on cell lines used. Among these human DNA helicase genes, XPB, RecQL5, and RTEL promoters were activated during TPA-induced HL-60 cell differentiation. Interestingly, duplicated ets (GGAA) elements are commonly located around the transcription start sites of these genes. The duplicated GGAA motifs are also found in the promoters of DNA replication/repair synthesis factor genes including PARG, ATR, TERC, and Rb1. Mutation analyses suggested that the duplicated GGAA-motifs are necessary for the basal promoter activity in various cells and some of them positively respond to TPA in HL-60 cells. TPA-induced response of 44-bp in the RTEL promoter was attenuated by co-transfection of the PU.1 expression vector. These findings suggest that the duplicated ets motifs regulate DNA-repair associated gene expressions during macrophage-like differentiation of HL-60 cells. Copyright 2010 Elsevier Inc. All rights reserved.
Competition between drum and quasi-planar structures in RhB18-: motifs for metallo-boronanotubes and metallo-borophenes.

PubMed

Jian, Tian; Li, Wan-Lu; Chen, Xin; Chen, Teng-Teng; Lopez, Gary V; Li, Jun; Wang, Lai-Sheng

2016-12-01

Metal-doped boron clusters provide new opportunities to design nanoclusters with interesting structures and bonding. A cobalt-doped boron cluster, CoB18-, has been observed recently to be planar and can be viewed as a motif for metallo-borophenes, whereas the D9d drum isomer as a motif for metallo-boronanotubes is found to be much higher in energy. Hence, whether larger doped boron drums are possible is still an open question. Here we report that for RhB18- the drum and quasi-planar structures become much closer in energy and co-exist experimentally, revealing a competition between the metallo-boronanotube and metallo-borophene structures. Photoelectron spectroscopy of RhB18- shows a complicated spectral pattern, suggesting the presence of two isomers. Quantum chemistry studies indicate that the D9d drum isomer and a quasi-planar isomer (Cs) compete for the global minimum. The enhanced stability of the drum isomer in RhB18- is due to the less contracted Rh 4d orbitals, which can have favorable interactions with the B18 drum motif. Chemical bonding analyses show that the quasi-planar isomer of RhB18- is aromatic with 10 π electrons, whereas the observed RhB18- drum cluster sets a new record for coordination number of eighteen among metal complexes. The current finding shows that the size of the boron drum can be tuned by appropriate metal dopants, suggesting that even larger boron drums with 5d, 6d transition metal, lanthanide or actinide metal atoms are possible.
Effector prediction in host-pathogen interaction based on a Markov model of a ubiquitous EPIYA motif

PubMed Central

2010-01-01

Background: Effector secretion is a common strategy of pathogen in mediating host-pathogen interaction. Eight EPIYA-motif containing effectors have recently been discovered in six pathogens. Once these effectors enter host cells through type III/IV secretion systems (T3SS/T4SS), tyrosine in the EPIYA motif is phosphorylated, which triggers effectors binding other proteins to manipulate host-cell functions. The objectives of this study are to evaluate the distribution pattern of EPIYA motif in broad biological species, to predict potential effectors with EPIYA motif, and to suggest roles and biological functions of potential effectors in host-pathogen interactions. Results: A hidden Markov model (HMM) of five amino acids was built for the EPIYA-motif based on the eight known effectors. Using this HMM to search the non-redundant protein database containing 9,216,047 sequences, we obtained 107,231 sequences with at least one EPIYA motif occurrence and 3115 sequences with multiple repeats of the EPIYA motif. Although the EPIYA motif exists among broad species, it is significantly over-represented in some particular groups of species. For those proteins containing at least four copies of EPIYA motif, most of them are from intracellular bacteria, extracellular bacteria with T3SS or T4SS or intracellular protozoan parasites. By combining the EPIYA motif and the adjacent SH2 binding motifs (KK, R4, Tarp and Tir), we built HMMs of nine amino acids and predicted many potential effectors in bacteria and protista by the HMMs. Some potential effectors for pathogens (such as Lawsonia intracellularis, Plasmodium falciparum and Leishmania major) are suggested. Conclusions: Our study indicates that the EPIYA motif may be a ubiquitous functional site for effectors that play an important pathogenicity role in mediating host-pathogen interactions. We suggest that some intracellular protozoan parasites could secrete EPIYA-motif containing effectors through secretion systems similar to the T3SS/T4SS in bacteria. Our predicted effectors provide useful hypotheses for further studies. PMID:21143776
Identification of sequence motifs in oligonucleotides whose presence is correlated with antisense activity

PubMed Central

Matveeva, O. V.; Tsodikov, A. D.; Giddings, M.; Freier, S. M.; Wyatt, J. R.; Spiridonov, A. N.; Shabalina, S. A.; Gesteland, R. F.; Atkins, J. F.

2000-01-01

Design of antisense oligonucleotides targeting any mRNA can be much more efficient when several activity-enhancing motifs are included and activity-decreasing motifs are avoided. This conclusion was made after statistical analysis of data collected from >1000 experiments with phosphorothioate-modified oligonucleotides. Highly significant positive correlation between the presence of motifs CCAC, TCCC, ACTC, GCCA and CTCT in the oligonucleotide and its antisense efficiency was demonstrated. In addition, negative correlation was revealed for the motifs GGGG, ACTG, AAA and TAA. It was found that the likelihood of activity of an oligonucleotide against a desired mRNA target is sequence motif content dependent. PMID:10908347
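To make the idea of motif content concrete, here is a minimal sketch that tallies how many activity-enhancing and activity-decreasing motifs from the abstract occur in a candidate oligonucleotide. The simple occurrence tally is an assumption made for illustration; it is not the statistical analysis used in the study, and the function names and example oligonucleotides are hypothetical.

```python
# Motifs reported in the abstract as positively or negatively correlated
# with antisense activity.
POSITIVE_MOTIFS = ("CCAC", "TCCC", "ACTC", "GCCA", "CTCT")
NEGATIVE_MOTIFS = ("GGGG", "ACTG", "AAA", "TAA")


def count_overlapping(sequence: str, motif: str) -> int:
    """Count occurrences of motif in sequence, allowing overlaps."""
    return sum(
        1
        for i in range(len(sequence) - len(motif) + 1)
        if sequence.startswith(motif, i)
    )


def motif_tally(oligo: str) -> tuple:
    """Return (positive_count, negative_count) for one oligonucleotide.

    The raw tally is an illustrative heuristic, not the paper's method.
    """
    oligo = oligo.upper()
    positive = sum(count_overlapping(oligo, m) for m in POSITIVE_MOTIFS)
    negative = sum(count_overlapping(oligo, m) for m in NEGATIVE_MOTIFS)
    return positive, negative


# Hypothetical oligonucleotides, invented for illustration only.
for candidate in ("TCCCACTCTT", "GGGGGTAAA"):
    print(candidate, motif_tally(candidate))
```

Overlapping matches are counted because short motifs such as AAA can occur several times within a single run of the same base.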
Efficacy of function specific 3D-motifs in enzyme classification according to their EC-numbers.

PubMed

Rahimi, Amir; Madadkar-Sobhani, Armin; Touserkani, Rouzbeh; Goliaei, Bahram

2013-11-07

Due to the increasing number of protein structures with unknown function originating from structural genomics projects, protein function prediction has become an important subject in bioinformatics. Among diverse function prediction methods, exploring known 3D-motifs, which are associated with functional elements, in unknown protein structures is one of the most biologically meaningful methods. Homologous enzymes inherit such motifs in their active sites from common ancestors. However, slight differences in the properties of these motifs result in variation in the reactions and substrates of the enzymes. In this study, we examined the possibility of discriminating highly related active site patterns according to their EC-numbers by 3D-motifs. For each EC-number, the spatial arrangement of an active site, which has minimum average distance to other active sites with the same function, was selected as a representative 3D-motif. In order to characterize the motifs, various points in active site elements were tested. The results demonstrated the possibility of predicting full EC-number of enzymes by 3D-motifs. However, the discriminating power of 3D-motifs varies among different enzyme families and depends on selecting the appropriate points and features. © 2013 Elsevier Ltd. All rights reserved.
ELM: the status of the 2010 eukaryotic linear motif resource

PubMed Central

Gould, Cathryn M.; Diella, Francesca; Via, Allegra; Puntervoll, Pål; Gemünd, Christine; Chabanis-Davidson, Sophie; Michael, Sushama; Sayadi, Ahmed; Bryne, Jan Christian; Chica, Claudia; Seiler, Markus; Davey, Norman E.; Haslam, Niall; Weatheritt, Robert J.; Budd, Aidan; Hughes, Tim; Paś, Jakub; Rychlewski, Leszek; Travé, Gilles; Aasland, Rein; Helmer-Citterich, Manuela; Linding, Rune; Gibson, Toby J.

2010-01-01

Linear motifs are short segments of multidomain proteins that provide regulatory functions independently of protein tertiary structure. Much of intracellular signalling passes through protein modifications at linear motifs. Many thousands of linear motif instances, most notably phosphorylation sites, have now been reported. Although clearly very abundant, linear motifs are difficult to predict de novo in protein sequences due to the difficulty of obtaining robust statistical assessments. The ELM resource at http://elm.eu.org/ provides an expanding knowledge base, currently covering 146 known motifs, with annotation that includes >1300 experimentally reported instances. ELM is also an exploratory tool for suggesting new candidates of known linear motifs in proteins of interest. Information about protein domains, protein structure and native disorder, cellular and taxonomic contexts is used to reduce or deprecate false positive matches. Results are graphically displayed in a ‘Bar Code’ format, which also displays known instances from homologous proteins through a novel ‘Instance Mapper’ protocol based on PHI-BLAST. ELM server output provides links to the ELM annotation as well as to a number of remote resources. Using the links, researchers can explore the motifs, proteins, complex structures and associated literature to evaluate whether candidate motifs might be worth experimental investigation. PMID:19920119
Composite Structural Motifs of Binding Sites for Delineating Biological Functions of Proteins

PubMed Central

Kinjo, Akira R.; Nakamura, Haruki

2012-01-01

Most biological processes are described as a series of interactions between proteins and other molecules, and interactions are in turn described in terms of atomic structures. To annotate protein functions as sets of interaction states at atomic resolution, and thereby to better understand the relation between protein interactions and biological functions, we conducted exhaustive all-against-all atomic structure comparisons of all known binding sites for ligands including small molecules, proteins and nucleic acids, and identified recurring elementary motifs. By integrating the elementary motifs associated with each subunit, we defined composite motifs that represent context-dependent combinations of elementary motifs. It is demonstrated that function similarity can be better inferred from composite motif similarity compared to the similarity of protein sequences or of individual binding sites. By integrating the composite motifs associated with each protein function, we define meta-composite motifs each of which is regarded as a time-independent diagrammatic representation of a biological process. It is shown that meta-composite motifs provide richer annotations of biological processes than sequence clusters. The present results serve as a basis for bridging atomic structures to higher-order biological phenomena by classification and integration of binding site structures. PMID:22347478
World Color Survey color naming reveals universal motifs and their within-language diversity

PubMed Central

Lindsey, Delwin T.; Brown, Angela M.

2009-01-01

We analyzed the color terms in the World Color Survey (WCS) (www.icsi.berkeley.edu/wcs/), a large color-naming database obtained from informants of mostly unwritten languages spoken in preindustrialized cultures that have had limited contact with modern, industrialized society. The color naming idiolects of 2,367 WCS informants fall into three to six “motifs,” where each motif is a different color-naming system based on a subset of a universal glossary of 11 color terms. These motifs are universal in that they occur worldwide, with some individual variation, in completely unrelated languages. Strikingly, these few motifs are distributed across the WCS informants in such a way that multiple motifs occur in most languages. Thus, the culture a speaker comes from does not completely determine how he or she will use color terms. An analysis of the modern patterns of motif usage in the WCS languages, based on the assumption that they reflect historical patterns of color term evolution, suggests that color lexicons have changed over time in a complex but orderly way. The worldwide distribution of the motifs and the cooccurrence of multiple motifs within languages suggest that universal processes control the naming of colors. PMID:19901327
TRStalker: an efficient heuristic for finding fuzzy tandem repeats.

PubMed

Pellegrini, Marco; Renda, M Elena; Vecchio, Alessio

2010-06-15

Genomes in higher eukaryotic organisms contain a substantial amount of repeated sequences. Tandem Repeats (TRs) constitute a large class of repetitive sequences that are originated via phenomena such as replication slippage and are characterized by close spatial contiguity. They play an important role in several molecular regulatory mechanisms, and also in several diseases (e.g. in the group of trinucleotide repeat disorders). While for TRs with a low or medium level of divergence the current methods are rather effective, the problem of detecting TRs with higher divergence (fuzzy TRs) is still open. The detection of fuzzy TRs is propaedeutic to enriching our view of their role in regulatory mechanisms and diseases. Fuzzy TRs are also important as tools to shed light on the evolutionary history of the genome, where higher divergence correlates with more remote duplication events. We have developed an algorithm (christened TRStalker) with the aim of detecting efficiently TRs that are hard to detect because of their inherent fuzziness, due to high levels of base substitutions, insertions and deletions. To attain this goal, we developed heuristics to solve a Steiner version of the problem for which the fuzziness is measured with respect to a motif string not necessarily present in the input string. This problem is akin to the 'generalized median string' that is known to be an NP-hard problem. Experiments with both synthetic and biological sequences demonstrate that our method performs better than current state of the art for fuzzy TRs and that the fuzzy TRs of the type we detect are indeed present in important biological sequences. TRStalker will be integrated in the web-based TRs Discovery Service (TReaDS) at bioalgo.iit.cnr.it. Supplementary data are available at Bioinformatics online.
PSSMSearch: a server for modeling, visualization, proteome-wide discovery and annotation of protein motif specificity determinants.

PubMed

Krystkowiak, Izabella; Manguy, Jean; Davey, Norman E

2018-06-05

There is a pressing need for in silico tools that can aid in the identification of the complete repertoire of protein binding (SLiMs, MoRFs, miniMotifs) and modification (moiety attachment/removal, isomerization, cleavage) motifs. We have created PSSMSearch, an interactive web-based tool for rapid statistical modeling, visualization, discovery and annotation of protein motif specificity determinants to discover novel motifs in a proteome-wide manner. PSSMSearch analyses proteomes for regions with significant similarity to a motif specificity determinant model built from a set of aligned motif-containing peptides. Multiple scoring methods are available to build a position-specific scoring matrix (PSSM) describing the motif specificity determinant model. This model can then be modified by a user to add prior knowledge of specificity determinants through an interactive PSSM heatmap. PSSMSearch includes a statistical framework to calculate the significance of specificity determinant model matches against a proteome of interest. PSSMSearch also includes the SLiMSearch framework's annotation, motif functional analysis and filtering tools to highlight relevant discriminatory information. Additional tools to annotate statistically significant shared keywords and GO terms, or experimental evidence of interaction with a motif-recognizing protein have been added. Finally, PSSM-based conservation metrics have been created for taxonomic range analyses. The PSSMSearch web server is available at http://slim.ucd.ie/pssmsearch/.
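The general idea of building a position-specific scoring matrix from aligned peptides and scanning a sequence with it can be sketched as follows. This is a generic log-odds PSSM with a uniform background and a pseudocount, chosen here as assumptions; it is not one of PSSMSearch's own scoring methods, and the function names, peptides and query sequence are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(peptides, pseudocount=1.0):
    """Build a log-odds PSSM (one dict per column) from equal-length peptides.

    Assumes a uniform background over the 20 standard amino acids.
    """
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for position in range(length):
        column = [p[position] for p in peptides]
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_window(pssm, window):
    """Sum the column scores for one window (standard residues only)."""
    return sum(column[aa] for column, aa in zip(pssm, window))


def best_hit(pssm, sequence):
    """Return (start, score) of the highest-scoring window in the sequence."""
    width = len(pssm)
    scored = [
        (score_window(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    score, start = max(scored)
    return start, score


# Hypothetical aligned peptides and query sequence, invented for illustration.
peptides = ["RSPAK", "RTPAK", "KSPAR", "RSPSK"]
pssm = build_pssm(peptides)
print(best_hit(pssm, "MGGRSPAKLL"))
```

A real analysis would also need a significance estimate for each match, which this sketch leaves out.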
QuadBase2: web server for multiplexed guanine quadruplex mining and visualization

PubMed Central

Dhapola, Parashar; Chowdhury, Shantanu

2016-01-01

DNA guanine quadruplexes or G4s are non-canonical DNA secondary structures which affect genomic processes like replication, transcription and recombination. G4s are computationally identified by specific nucleotide motifs which are also called putative G4 (PG4) motifs. Despite the general relevance of these structures, there is currently no tool available that can allow batch queries and genome-wide analysis of these motifs in a user-friendly interface. QuadBase2 (quadbase.igib.res.in) presents a completely reinvented web server version of the previously published QuadBase database. QuadBase2 enables users to mine PG4 motifs in up to 178 eukaryotes through the EuQuad module. This module interfaces with the Ensembl Compara database to allow users to mine PG4 motifs in the orthologues of genes of interest across eukaryotes. PG4 motifs can be mined across genes and their promoter sequences in 1719 prokaryotes through the ProQuad module. This module includes a feature that allows genome-wide mining of PG4 motifs and their visualization as circular histograms. TetraplexFinder, the module for mining PG4 motifs in user-provided sequences, is now capable of handling up to 20 MB of data. QuadBase2 is a comprehensive PG4 motif mining tool that further expands the configurations and algorithms for mining PG4 motifs in a user-friendly way. PMID:27185890
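A putative G4 motif search of the kind described can be sketched with a regular expression. The pattern below (four runs of at least three guanines separated by loops of one to seven bases) is a commonly used definition and is an assumption here, since the abstract does not state QuadBase2's exact configuration; the function names and the test sequence are likewise illustrative, and only the given strand is scanned.

```python
import re


def pg4_regex(min_stem=3, loop_min=1, loop_max=7):
    """Compile a regex for four G-runs separated by three loops.

    Default stem and loop lengths are assumed values, not QuadBase2 settings.
    """
    stem = f"G{{{min_stem},}}"
    loop = f"[ACGT]{{{loop_min},{loop_max}}}"
    return re.compile(stem + (loop + stem) * 3)


def find_pg4(sequence, **kwargs):
    """Return (start, end) spans of non-overlapping PG4 matches on the given strand."""
    regex = pg4_regex(**kwargs)
    return [match.span() for match in regex.finditer(sequence.upper())]


# Telomere-like repeat flanked by two bases on each side (illustrative input).
print(find_pg4("AAGGGTTAGGGTTAGGGTTAGGGAA"))
```

Scanning the reverse complement as well would be needed to cover G-rich motifs on the opposite strand.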
Cancer-related marketing centrality motifs acting as pivot units in the human signaling network and mediating cross-talk between biological pathways.

PubMed

Li, Wan; Chen, Lina; Li, Xia; Jia, Xu; Feng, Chenchen; Zhang, Liangcai; He, Weiming; Lv, Junjie; He, Yuehan; Li, Weiguo; Qu, Xiaoli; Zhou, Yanyan; Shi, Yuchen

2013-12-01

Network motifs in central positions are considered to not only have more in-coming and out-going connections but are also localized in an area where more paths reach the networks. These central motifs have been extensively investigated to determine their consistent functions or associations with specific function categories. However, their functional potentials in the maintenance of cross-talk between different functional communities are unclear. In this paper, we constructed an integrated human signaling network from the Pathway Interaction Database. We identified 39 essential cancer-related motifs in central roles, which we called cancer-related marketing centrality motifs, using combined centrality indices on the system level. Our results demonstrated that these cancer-related marketing centrality motifs were pivotal units in the signaling network, and could mediate cross-talk between 61 biological pathways (25 could be mediated by one motif on average), most of which were cancer-related pathways. Further analysis showed that molecules of most marketing centrality motifs were in the same or adjacent subcellular localizations, such as the motif containing PI3K, PDK1 and AKT1 in the plasma membrane, to mediate signal transduction between 32 cancer-related pathways. Finally, we analyzed the pivotal roles of cancer genes in these marketing centrality motifs in the pathogenesis of cancers, and found that non-cancer genes were potential cancer-related genes.
Gibbs motif sampling: detection of bacterial outer membrane protein repeats.

PubMed Central

Neuwald, A. F.; Liu, J. S.; Lawrence, C. E.

1995-01-01

The detection and alignment of locally conserved regions (motifs) in multiple sequences can provide insight into protein structure, function, and evolution. A new Gibbs sampling algorithm is described that detects motif-encoding regions in sequences and optimally partitions them into distinct motif models; this is illustrated using a set of immunoglobulin fold proteins. When applied to sequences sharing a single motif, the sampler can be used to classify motif regions into related submodels, as is illustrated using helix-turn-helix DNA-binding proteins. Other statistically based procedures are described for searching a database for sequences matching motifs found by the sampler. When applied to a set of 32 very distantly related bacterial integral outer membrane proteins, the sampler revealed that they share a subtle, repetitive motif. Although BLAST (Altschul SF et al., 1990, J Mol Biol 215:403-410) fails to detect significant pairwise similarity between any of the sequences, the repeats present in these outer membrane proteins, taken as a whole, are highly significant (based on a generally applicable statistical test for motifs described here). Analysis of bacterial porins with known trimeric beta-barrel structure and related proteins reveals a similar repetitive motif corresponding to alternating membrane-spanning beta-strands. These beta-strands occur on the membrane interface (as opposed to the trimeric interface) of the beta-barrel. The broad conservation and structural location of these repeats suggests that they play important functional roles. PMID:8520488
Anion-π Catalysis of Enolate Chemistry: Rigidified Leonard Turns as a General Motif to Run Reactions on Aromatic Surfaces.

PubMed

Cotelle, Yoann; Benz, Sebastian; Avestro, Alyssa-Jennifer; Ward, Thomas R; Sakai, Naomi; Matile, Stefan

2016-03-18

To integrate anion-π, cation-π, and ion pair-π interactions in catalysis, the fundamental challenge is to run reactions reliably on aromatic surfaces. Addressing a specific question concerning enolate addition to nitroolefins, this study elaborates on Leonard turns to tackle this problem in a general manner. Increasingly refined turns are constructed to position malonate half thioesters as close as possible on π-acidic surfaces. The resulting preorganization of reactive intermediates is shown to support the disfavored addition to enolate acceptors to an absolutely unexpected extent. This decisive impact on anion-π catalysis increases with the rigidity of the turns. The new, rigidified Leonard turns are most effective with weak anion-π interactions, whereas stronger interactions do not require such ideal substrate positioning to operate well. The stunning simplicity of the motif and its surprisingly strong relevance for function should render the introduced approach generally useful. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Changes of Water Hydrogen Bond Network with Different Externalities

PubMed Central

Zhao, Lin; Ma, Kai; Yang, Zi

2015-01-01

Understanding water clusters and structural motifs is crucial for gaining insight into the many anomalies of water. In this context, analyzing the factors that influence water structure is one way to shed light on the nature of water clusters. Water structure has been tentatively explained within different frameworks of structural models. Based on a comprehensive analysis and summary of studies on the response of water to four externalities (i.e., temperature, pressure, solutes and external fields), this work puts forward the changing trends of water structure and a deduced intrinsic structural motif. The variations in the physicochemical and biological effects of water induced by each externality are also discussed to emphasize the role of water in daily life. On this basis, open problems that need further study are formulated by pointing out the limitations of current study techniques and by outlining prominent recent studies. PMID:25884333
Engineering monolayer poration for rapid exfoliation of microbial membranes.

PubMed

Pyne, Alice; Pfeil, Marc-Philipp; Bennett, Isabel; Ravi, Jascindra; Iavicoli, Patrizia; Lamarre, Baptiste; Roethke, Anita; Ray, Santanu; Jiang, Haibo; Bella, Angelo; Reisinger, Bernd; Yin, Daniel; Little, Benjamin; Muñoz-García, Juan C; Cerasoli, Eleonora; Judge, Peter J; Faruqui, Nilofar; Calzolai, Luigi; Henrion, Andre; Martyna, Glenn J; Grovenor, Chris R M; Crain, Jason; Hoogenboom, Bart W; Watts, Anthony; Ryadnov, Maxim G

2017-02-01

The spread of bacterial resistance to traditional antibiotics continues to stimulate the search for alternative antimicrobial strategies. All forms of life, from bacteria to humans, are postulated to rely on a fundamental host defense mechanism, which exploits the formation of open pores in microbial phospholipid bilayers. Here we predict that transmembrane poration is not necessary for antimicrobial activity and reveal a distinct poration mechanism that targets the outer leaflet of phospholipid bilayers. Using a combination of molecular-scale and real-time imaging, spectroscopy and spectrometry approaches, we introduce a structural motif with a universal insertion mode in reconstituted membranes and live bacteria. We demonstrate that this motif rapidly assembles into monolayer pits that coalesce during progressive membrane exfoliation, leading to bacterial cell death within minutes. The findings offer a new physical basis for designing effective antibiotics.
Ethnomathematics Exploration of the Toba Community: Elements of Geometry Transformation Contained in Gorga (Ornament on Bataks House)

NASA Astrophysics Data System (ADS)

Ditasona, C.

2018-04-01

Gorga is an ornament known to the Batak community. As a work of art in the form of carvings, gorga has become an icon of Batak society. Long before the Batak people knew formal education, they were making gorga, as evidenced by several historical sources. Gorga not only has artistic value but also contains mathematical elements. Many mathematical principles are used in the process of making gorga, and the principle of geometric transformation is very prominent in gorga motifs. This article reports ethnomathematics research on the thinking process involved in making gorga. Observations and interviews with gorga craftsmen (pande) were conducted to find out how the principles of rotation, translation, dilation and reflection are used in making gorga motifs.
New Protein Mimetics: The Zinc Finger Motif as a Locked-In Tertiary Fold.

PubMed

Tuchscherer, Gabriele; Lehmann, Christian; Mathieu, Marc

1998-11-16

The principle of a molecular kit is used for the covalent assembly of secondary structure forming peptide blocks to predetermined packing topologies. The resulting locked-in folds (LIFs; depicted schematically) are readily accessible and bypass the intriguing folding problem of linear peptide chains. This strategy allows, for example, mimicking of the essential structural and functional features of zinc finger proteins. © 1998 WILEY-VCH Verlag GmbH, Weinheim, Fed. Rep. of Germany.
info-gibbs: a motif discovery algorithm that directly optimizes information content during sampling.

PubMed

Defrance, Matthieu; van Helden, Jacques

2009-10-15

Discovering cis-regulatory elements in genome sequence remains a challenging issue. Several methods rely on the optimization of some target scoring function. The information content (IC) or relative entropy of the motif has proven to be a good estimator of transcription factor DNA binding affinity. However, these information-based metrics are usually used as a posteriori statistics rather than during the motif search process itself. We introduce here info-gibbs, a Gibbs sampling algorithm that efficiently optimizes the IC or the log-likelihood ratio (LLR) of the motif while keeping computation time low. The method compares well with existing methods like MEME, BioProspector, Gibbs or GAME on both synthetic and biological datasets. Our study shows that motif discovery techniques can be enhanced by directly focusing the search on the motif IC or the motif LLR. http://rsat.ulb.ac.be/rsat/info-gibbs
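For readers unfamiliar with the quantity being optimized, here is a minimal sketch of the textbook information content (relative entropy) of a set of aligned sites. It is not the info-gibbs implementation; the function name, the uniform background and the absence of pseudocounts are assumptions made for this example.

```python
import math

def information_content(sites, background=None):
    """Total information content, in bits, of equal-length aligned DNA sites."""
    alphabet = "ACGT"
    if background is None:
        # Assumed uniform background; a real analysis would estimate it from data.
        background = {b: 0.25 for b in alphabet}
    n = len(sites)
    width = len(sites[0])
    total = 0.0
    for j in range(width):
        column = [s[j] for s in sites]
        for b in alphabet:
            f = column.count(b) / n
            if f > 0:  # 0 * log(0) is taken as 0
                total += f * math.log2(f / background[b])
    return total

# Hypothetical sites: three fully conserved columns and one variable column.
sites = ["ACGT", "ACGT", "ACGA", "ACGC"]
ic = information_content(sites)
```

Each fully conserved column contributes 2 bits against a uniform background, so a sampler that maximizes this score is pushed toward sharply conserved alignments.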
A Gibbs sampler for motif detection in phylogenetically close sequences

NASA Astrophysics Data System (ADS)

Siddharthan, Rahul; van Nimwegen, Erik; Siggia, Eric

2004-03-01

Genes are regulated by transcription factors that bind to DNA upstream of genes and recognize short conserved "motifs" in a random intergenic "background". Motif-finders such as the Gibbs sampler compare the probability of these short sequences being represented by "weight matrices" to the probability of their arising from the background "null model", and explore this space (analogous to a free-energy landscape). But closely related species may show conservation not because of functional sites but simply because they have not had sufficient time to diverge, so conventional methods will fail. We introduce a new Gibbs sampler algorithm that accounts for common ancestry when searching for motifs, while requiring minimal "prior" assumptions on the number and types of motifs, assessing the significance of detected motifs by "tracking" clusters that stay together. We apply this scheme to motif detection in sporulation-cycle genes in the yeast S. cerevisiae, using recent sequences of other closely-related Saccharomyces species.
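The comparison between a weight matrix and a background null model can be sketched as a log-likelihood ratio for one candidate site. This example covers only that conventional scoring step and does not model common ancestry; the function name, matrix values and background are hypothetical.

```python
import math

def log_likelihood_ratio(site, pwm, background):
    """log2 of P(site | weight matrix) / P(site | background)."""
    return sum(math.log2(pwm[j][b] / background[b]) for j, b in enumerate(site))

# Hypothetical two-column weight matrix and uniform background.
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
pwm = [
    {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]
score = log_likelihood_ratio("AC", pwm, background)
```

A positive score means the site is better explained by the weight matrix than by the background.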
CircularLogo: A lightweight web application to visualize intra-motif dependencies.

PubMed

Ye, Zhenqing; Ma, Tao; Kalmbach, Michael T; Dasari, Surendra; Kocher, Jean-Pierre A; Wang, Liguo

2017-05-22

The sequence logo has been widely used to represent DNA or RNA motifs for more than three decades. Despite its intelligibility and intuitiveness, the traditional sequence logo is unable to display the intra-motif dependencies and therefore is insufficient to fully characterize nucleotide motifs. Many methods have been developed to quantify the intra-motif dependencies, but fewer tools are available for visualization. We developed CircularLogo, a web-based interactive application, which is able to not only visualize the position-specific nucleotide consensus and diversity but also display the intra-motif dependencies. Applying CircularLogo to HNF6 binding sites and tRNA sequences demonstrated its ability to show intra-motif dependencies and intuitively reveal biomolecular structure. CircularLogo is implemented in JavaScript and Python based on the Django web framework. The program's source code and user's manual are freely available at http://circularlogo.sourceforge.net . CircularLogo web server can be accessed from http://bioinformaticstools.mayo.edu/circularlogo/index.html . CircularLogo is an innovative web application that is specifically designed to visualize and interactively explore intra-motif dependencies.
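One common way to quantify a dependency between two motif positions is mutual information, sketched below. The abstract does not say which statistic CircularLogo uses, so this is an illustrative choice, and the function name and toy sites are assumptions.

```python
import math
from collections import Counter

def mutual_information(sites, i, j):
    """Mutual information, in bits, between columns i and j of aligned sites."""
    n = len(sites)
    pi = Counter(s[i] for s in sites)
    pj = Counter(s[j] for s in sites)
    pij = Counter((s[i], s[j]) for s in sites)
    mi = 0.0
    for (a, b), c in pij.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi

# Hypothetical sites in which the second base is fully determined by the first.
dependent = ["AT", "AT", "CG", "CG"]
independent = ["AT", "AG", "CT", "CG"]
mi_dep = mutual_information(dependent, 0, 1)
mi_ind = mutual_information(independent, 0, 1)
```

A traditional sequence logo would draw both columns of the first example as equally uncertain, which is the information loss the abstract refers to.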
SLiMSearch 2.0: biological context for short linear motifs in proteins

PubMed Central

Davey, Norman E.; Haslam, Niall J.; Shields, Denis C.

2011-01-01

Short, linear motifs (SLiMs) play a critical role in many biological processes. The SLiMSearch 2.0 (Short, Linear Motif Search) web server allows researchers to identify occurrences of a user-defined SLiM in a proteome, using conservation and protein disorder context statistics to rank occurrences. User-friendly output and visualizations of motif context allow the user to quickly gain insight into the validity of a putatively functional motif occurrence. For each motif occurrence, overlapping UniProt features and annotated SLiMs are displayed. Visualization also includes annotated multiple sequence alignments surrounding each occurrence, showing conservation and protein disorder statistics in addition to known and predicted SLiMs, protein domains and known post-translational modifications. In addition, enrichment of Gene Ontology terms and protein interaction partners are provided as indicators of possible motif function. All web server results are available for download. Users can search motifs against the human proteome or a subset thereof defined by Uniprot accession numbers or GO term. The SLiMSearch server is available at: http://bioware.ucd.ie/slimsearch2.html. PMID:21622654
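The core search step, finding occurrences of a user-defined motif in a set of proteins, can be sketched with a regular expression. This is not the SLiMSearch code and it omits the conservation and disorder ranking; the function name, the pattern and the sequences are hypothetical.

```python
import re

def find_slim(pattern, proteins):
    """Return (protein_id, 1-based start, matched text) for every occurrence."""
    # A zero-width lookahead lets overlapping occurrences be reported too.
    regex = re.compile(f"(?=({pattern}))")
    hits = []
    for pid, seq in proteins.items():
        for m in regex.finditer(seq):
            hits.append((pid, m.start() + 1, m.group(1)))
    return hits

# Hypothetical proteome and motif definition.
proteins = {"protA": "MKRALSFPQ", "protB": "GGGGSGGG"}
hits = find_slim(r"[RK].L.[FY]", proteins)
```

Short patterns like this match often by chance, which is why context statistics are needed to rank the raw hits.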
Identifying novel sequence variants of RNA 3D motifs

PubMed Central

Zirbel, Craig L.; Roll, James; Sweeney, Blake A.; Petrov, Anton I.; Pirrung, Meg; Leontis, Neocles B.

2015-01-01

Predicting RNA 3D structure from sequence is a major challenge in biophysics. An important sub-goal is accurately identifying recurrent 3D motifs from RNA internal and hairpin loop sequences extracted from secondary structure (2D) diagrams. We have developed and validated new probabilistic models for 3D motif sequences based on hybrid Stochastic Context-Free Grammars and Markov Random Fields (SCFG/MRF). The SCFG/MRF models are constructed using atomic-resolution RNA 3D structures. To parameterize each model, we use all instances of each motif found in the RNA 3D Motif Atlas and annotations of pairwise nucleotide interactions generated by the FR3D software. Isostericity relations between non-Watson–Crick basepairs are used in scoring sequence variants. SCFG techniques model nested pairs and insertions, while MRF ideas handle crossing interactions and base triples. We use test sets of randomly-generated sequences to set acceptance and rejection thresholds for each motif group and thus control the false positive rate. Validation was carried out by comparing results for four motif groups to RMDetect. The software developed for sequence scoring (JAR3D) is structured to automatically incorporate new motifs as they accumulate in the RNA 3D Motif Atlas when new structures are solved and is available free for download. PMID:26130723
Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium.

PubMed

Catania, Francesco; Lynch, Michael

2010-05-04

In protozoa, the identification of preserved motifs by comparative genomics is often impeded by the difficulty of generating reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal gene expression in Paramecium; and 3) reveal a largely unexplored (and presumably not restricted to Paramecium) association between the presence/absence of a DNA motif and the evolutionary fate of its host genes.
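The screening step can be pictured as asking what fraction of a gene set carries a given 8-mer in its 3' UTR. The sketch below assumes that simple presence/absence view; the function name, the 8-mer and the sequences are hypothetical and are not taken from the study.

```python
def fraction_with_kmer(utrs, kmer):
    """Fraction of sequences that contain the k-mer at least once."""
    if not utrs:
        return 0.0
    return sum(kmer in s for s in utrs) / len(utrs)

# Hypothetical 3' UTR sets and candidate 8-mer.
kmer = "TGTAAATA"
ribosomal_utrs = ["AAATGTAAATAGG", "CCTGTAAATACC", "GGGGGGGGGG"]
other_utrs = ["ACGTACGTACGT", "TTTTTTTTTTTT"]
frac_ribosomal = fraction_with_kmer(ribosomal_utrs, kmer)
frac_other = fraction_with_kmer(other_utrs, kmer)
```

A large gap between the two fractions would flag the 8-mer as a candidate for closer study.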
Multiple Dileucine-like Motifs Direct VGLUT1 Trafficking

PubMed Central

Foss, Sarah M.; Li, Haiyan; Santos, Magda S.; Edwards, Robert H.

2013-01-01

The vesicular glutamate transporters (VGLUTs) package glutamate into synaptic vesicles, and the two principal isoforms VGLUT1 and VGLUT2 have been suggested to influence the properties of release. To understand how a VGLUT isoform might influence transmitter release, we have studied their trafficking and previously identified a dileucine-like endocytic motif in the C terminus of VGLUT1. Disruption of this motif impairs the activity-dependent recycling of VGLUT1, but does not eliminate its endocytosis. We now report the identification of two additional dileucine-like motifs in the N terminus of VGLUT1 that are not well conserved in the other isoforms. In the absence of all three motifs, rat VGLUT1 shows limited accumulation at synaptic sites and no longer responds to stimulation. In addition, shRNA-mediated knockdown of clathrin adaptor proteins AP-1 and AP-2 shows that the C-terminal motif acts largely via AP-2, whereas the N-terminal motifs use AP-1. Without the C-terminal motif, knockdown of AP-1 reduces the proportion of VGLUT1 that responds to stimulation. VGLUT1 thus contains multiple sorting signals that engage distinct trafficking mechanisms. In contrast to VGLUT1, the trafficking of VGLUT2 depends almost entirely on the conserved C-terminal dileucine-like motif: without this motif, a substantial fraction of VGLUT2 redistributes to the plasma membrane and the transporter's synaptic localization is disrupted. Consistent with these differences in trafficking signals, wild-type VGLUT1 and VGLUT2 differ in their response to stimulation. PMID:23804088
Multiple dileucine-like motifs direct VGLUT1 trafficking.

PubMed

Foss, Sarah M; Li, Haiyan; Santos, Magda S; Edwards, Robert H; Voglmaier, Susan M

2013-06-26

The vesicular glutamate transporters (VGLUTs) package glutamate into synaptic vesicles, and the two principal isoforms VGLUT1 and VGLUT2 have been suggested to influence the properties of release. To understand how a VGLUT isoform might influence transmitter release, we have studied their trafficking and previously identified a dileucine-like endocytic motif in the C terminus of VGLUT1. Disruption of this motif impairs the activity-dependent recycling of VGLUT1, but does not eliminate its endocytosis. We now report the identification of two additional dileucine-like motifs in the N terminus of VGLUT1 that are not well conserved in the other isoforms. In the absence of all three motifs, rat VGLUT1 shows limited accumulation at synaptic sites and no longer responds to stimulation. In addition, shRNA-mediated knockdown of clathrin adaptor proteins AP-1 and AP-2 shows that the C-terminal motif acts largely via AP-2, whereas the N-terminal motifs use AP-1. Without the C-terminal motif, knockdown of AP-1 reduces the proportion of VGLUT1 that responds to stimulation. VGLUT1 thus contains multiple sorting signals that engage distinct trafficking mechanisms. In contrast to VGLUT1, the trafficking of VGLUT2 depends almost entirely on the conserved C-terminal dileucine-like motif: without this motif, a substantial fraction of VGLUT2 redistributes to the plasma membrane and the transporter's synaptic localization is disrupted. Consistent with these differences in trafficking signals, wild-type VGLUT1 and VGLUT2 differ in their response to stimulation.
Overexpression of TRIM25 in Lung Cancer Regulates Tumor Cell Progression.

PubMed

Qin, Ying; Cui, He; Zhang, Hua

2016-10-01

Lung cancer is one of the most common causes of cancer-related deaths worldwide. Although great effort has been invested and progress made in the study of lung cancer in recent decades, the mechanism of lung cancer formation remains elusive. To establish effective therapeutic methods, new targets implicated in lung cancer processes have to be identified. Tripartite motif-containing 25 has been associated with ovarian and breast cancer and is thought to promote cell growth by targeting the cell cycle. However, whether tripartite motif-containing 25 has a function in lung cancer development remains unknown. In this study, we found that tripartite motif-containing 25 was overexpressed in human lung cancer tissues. Expression of tripartite motif-containing 25 in lung cancer cells is important for cell proliferation and migration. Knockdown of tripartite motif-containing 25 markedly reduced proliferation of lung cancer cells both in vitro and in vivo and reduced migration of lung cancer cells in vitro. Meanwhile, tripartite motif-containing 25 silencing also increased sensitivity to doxorubicin: knockdown of tripartite motif-containing 25 significantly increased doxorubicin-induced death and apoptosis of lung cancer cells. We also observed that tripartite motif-containing 25 formed a complex with p53 and mouse double minute 2 homolog (MDM2) both in human lung cancer tissues and in lung cancer cells, and that tripartite motif-containing 25 silencing increased the expression of p53. These results provide evidence that tripartite motif-containing 25 contributes to the pathogenesis of lung cancer, probably by promoting proliferation and migration of lung cancer cells. Therefore, targeting tripartite motif-containing 25 may provide a potential therapeutic intervention for lung cancer. © The Author(s) 2015.
Isosteric And Non-Isosteric Base Pairs In RNA Motifs: Molecular Dynamics And Bioinformatics Study Of The Sarcin-Ricin Internal Loop

PubMed Central

Havrila, Marek; Réblová, Kamila; Zirbel, Craig L.; Leontis, Neocles B.; Šponer, Jiří

2013-01-01

The Sarcin-Ricin RNA motif (SR motif) is one of the most prominent recurrent RNA building blocks that occurs in many different RNA contexts and folds autonomously, i.e., in a context-independent manner. In this study, we combined bioinformatics analysis with explicit-solvent molecular dynamics (MD) simulations to better understand the relation between the RNA sequence and the evolutionary patterns of SR motif. SHAPE probing experiment was also performed to confirm fidelity of MD simulations. We identified 57 instances of the SR motif in a non-redundant subset of the RNA X-ray structure database and analyzed their basepairing, base-phosphate, and backbone-backbone interactions. We extracted sequences aligned to these instances from large ribosomal RNA alignments to determine frequency of occurrence for different sequence variants. We then used a simple scoring scheme based on isostericity to suggest 10 sequence variants with highly variable expected degree of compatibility with the SR motif 3D structure. We carried out MD simulations of SR motifs with these base substitutions. Non isosteric base substitutions led to unstable structures, but so did isosteric substitutions which were unable to make key base-phosphate interactions. MD technique explains why some potentially isosteric SR motifs are not realized during evolution. We also found that inability to form stable cWW geometry is an important factor in case of the first base pair of the flexible region of the SR motif. Comparison of structural, bioinformatics, SHAPE probing and MD simulation data reveals that explicit solvent MD simulations neatly reflect viability of different sequence variants of the SR motif. Thus, MD simulations can efficiently complement bioinformatics tools in studies of conservation patterns of RNA motifs and provide atomistic insight into the role of their different signature interactions. PMID:24144333
Cellular automata simulation of topological effects on the dynamics of feed-forward motifs

PubMed Central

Apte, Advait A; Cain, John W; Bonchev, Danail G; Fong, Stephen S

2008-01-01

Background: Feed-forward motifs are important functional modules in biological and other complex networks. The functionality of feed-forward motifs and other network motifs is largely dictated by the connectivity of the individual network components. While studies on the dynamics of motifs and networks are usually devoted to the temporal or spatial description of processes, this study focuses on the relationship between the specific architecture and the overall rate of the processes of the feed-forward family of motifs, including double and triple feed-forward loops. The search for the most efficient network architecture could be of particular interest for regulatory or signaling pathways in biology, as well as in computational and communication systems. Results: Feed-forward motif dynamics were studied using cellular automata and compared with differential equation modeling. The number of cellular automata iterations needed for a 100% conversion of a substrate into a target product was used as an inverse measure of the transformation rate. Several basic topological patterns were identified that order the specific feed-forward constructions according to the rate of dynamics they enable. At the same number of network nodes and constant other parameters, the bi-parallel and tri-parallel motifs provide higher network efficacy than single feed-forward motifs. Additionally, a topological property of isodynamicity was identified for feed-forward motifs where different network architectures resulted in the same overall rate of the target production. Conclusion: It was shown for classes of structural motifs with feed-forward architecture that network topology affects the overall rate of a process in a quantitatively predictable manner. These fundamental results can be used as a basis for simulating larger networks as combinations of smaller network modules with implications on studying synthetic gene circuits, small regulatory systems, and eventually dynamic whole-cell models. PMID:18304325
MOCCS: Clarifying DNA-binding motif ambiguity using ChIP-Seq data.

PubMed

Ozaki, Haruka; Iwasaki, Wataru

2016-08-01

As a key mechanism of gene regulation, transcription factors (TFs) bind to DNA by recognizing specific short sequence patterns that are called DNA-binding motifs. A single TF can accept ambiguity within its DNA-binding motifs, which comprise both canonical (typical) and non-canonical motifs. Clarification of such DNA-binding motif ambiguity is crucial for revealing gene regulatory networks and evaluating mutations in cis-regulatory elements. Although chromatin immunoprecipitation sequencing (ChIP-seq) now provides abundant data on the genomic sequences to which a given TF binds, existing motif discovery methods are unable to directly answer whether a given TF can bind to a specific DNA-binding motif. Here, we report a method for clarifying the DNA-binding motif ambiguity, MOCCS. Given ChIP-Seq data of any TF, MOCCS comprehensively analyzes and describes every k-mer to which that TF binds. Analysis of simulated datasets revealed that MOCCS is applicable to various ChIP-Seq datasets, requiring only a few minutes per dataset. Application to the ENCODE ChIP-Seq datasets proved that MOCCS directly evaluates whether a given TF binds to each DNA-binding motif, even if known position weight matrix models do not provide sufficient information on DNA-binding motif ambiguity. Furthermore, users are not required to provide numerous parameters or background genomic sequence models that are typically unavailable. MOCCS is implemented in Perl and R and is freely available via https://github.com/yuifu/moccs. By complementing existing motif-discovery software, MOCCS will contribute to the basic understanding of how the genome controls diverse cellular processes via DNA-protein interactions. Copyright © 2016 Elsevier Ltd. All rights reserved.
Detection and Preliminary Analysis of Motifs in Promoters of Anaerobically Induced Genes of Different Plant Species

PubMed Central

Mohanty, Bijayalaxmi; Krishnan, S. P. T.; Swarup, Sanjay; Bajic, Vladimir B.

2005-01-01

• Background and Aims Plants can suffer from oxygen limitation during flooding or more complete submergence and may therefore switch from Krebs cycle respiration to fermentation in association with the expression of anaerobically inducible genes coding for enzymes involved in glycolysis and fermentation. The aim of this study was to clarify mechanisms of transcriptional regulation of these anaerobic genes by identifying motifs shared by their promoter regions. • Methods Statistically significant motifs were detected by an in silico method from 13 promoters of anaerobic genes. The selected motifs were common for the majority of analysed promoters. Their significance was evaluated by searching for their presence in transcription factor-binding site databases (TRANSFAC, PlantCARE and PLACE). Using several negative control data sets, it was tested whether the motifs found were specific to the anaerobic group. • Key Results Previously, anaerobic response elements have been identified in maize (Zea mays) and arabidopsis (Arabidopsis thaliana) genes. Known functional motifs were detected, such as GT and GC motifs, but also other motifs shared by most of the genes examined. Five motifs detected have not been found in plants hitherto but are present in the promoters of animal genes with various functions. The consensus sequences of these novel motifs are 5′-AAACAAA-3′, 5′-AGCAGC-3′, 5′-TCATCAC-3′, 5′-GTTT(A/C/T)GCAA-3′ and 5′-TTCCCTGTT-3′. • Conclusions It is believed that the promoter motifs identified could be functional by conferring anaerobic sensitivity to the genes that possess them. This proposal now requires experimental verification. PMID:16027132
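The five consensus sequences listed in the abstract can be scanned for in a promoter by turning the "(A/C/T)" notation into a regular-expression character class. This is a minimal sketch rather than the in silico method of the study: it searches the forward strand only, and the helper names and the promoter string are hypothetical.

```python
import re

# Consensus sequences as given in the abstract (5' to 3').
MOTIFS = ["AAACAAA", "AGCAGC", "TCATCAC", "GTTT(A/C/T)GCAA", "TTCCCTGTT"]

def consensus_to_regex(consensus):
    """Rewrite alternatives such as "(A/C/T)" as the character class "[ACT]"."""
    return re.sub(
        r"\(([ACGT](?:/[ACGT])*)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        consensus,
    )

def motif_positions(promoter, consensus):
    """0-based start positions of every (possibly overlapping) match."""
    regex = re.compile(f"(?=({consensus_to_regex(consensus)}))")
    return [m.start() for m in regex.finditer(promoter)]

# Hypothetical promoter fragment used only to exercise the functions.
promoter = "CCGTTTCGCAATTAAACAAAGG"
found = {motif: motif_positions(promoter, motif) for motif in MOTIFS}
```

A fuller scan would also search the reverse complement, since regulatory motifs can act in either orientation.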

A reduced-dimensionality approach to uncovering dyadic modes of body motion in conversations.

PubMed

Gaziv, Guy; Noy, Lior; Liron, Yuvalal; Alon, Uri

2017-01-01

Face-to-face conversations are central to human communication and a fascinating example of joint action. Beyond verbal content, one of the primary ways in which information is conveyed in conversations is body language. Body motion in natural conversations has been difficult to study precisely due to the large number of coordinates at play. There is need for fresh approaches to analyze and understand the data, in order to ask whether dyads show basic building blocks of coupled motion. Here we present a method for analyzing body motion during joint action using depth-sensing cameras, and use it to analyze a sample of scientific conversations. Our method consists of three steps: defining modes of body motion of individual participants, defining dyadic modes made of combinations of these individual modes, and lastly defining motion motifs as dyadic modes that occur significantly more often than expected given the single-person motion statistics. As a proof-of-concept, we analyze the motion of 12 dyads of scientists measured using two Microsoft Kinect cameras. In our sample, we find that out of many possible modes, only two were motion motifs: synchronized parallel torso motion in which the participants swayed from side to side in sync, and still segments where neither person moved. We find evidence of dyad individuality in the use of motion modes. For a randomly selected subset of 5 dyads, this individuality was maintained for at least 6 months. The present approach to simplify complex motion data and to define motion motifs may be used to understand other joint tasks and interactions. The analysis tools developed here and the motion dataset are publicly available.
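The third step, comparing how often a dyadic mode occurs with what single-person statistics alone would predict, can be sketched as an observed-to-expected ratio. The abstract does not give the authors' significance test, so this shows only the ratio; the function name and the mode labels are hypothetical.

```python
from collections import Counter

def dyadic_enrichment(modes_a, modes_b):
    """Observed / expected frequency of each joint mode, assuming independence."""
    n = len(modes_a)
    count_a, count_b = Counter(modes_a), Counter(modes_b)
    joint = Counter(zip(modes_a, modes_b))
    return {
        pair: (c / n) / ((count_a[pair[0]] / n) * (count_b[pair[1]] / n))
        for pair, c in joint.items()
    }

# Hypothetical per-frame mode labels for two participants.
person_a = ["sway", "sway", "still", "still"]
person_b = ["sway", "sway", "still", "still"]
ratios = dyadic_enrichment(person_a, person_b)
```

A ratio well above 1 marks a candidate motion motif, which would then need a proper statistical test against shuffled or resampled data.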
Improved accuracy of supervised CRM discovery with interpolated Markov models and cross-species comparison

PubMed Central

Kazemian, Majid; Zhu, Qiyun; Halfon, Marc S.; Sinha, Saurabh

2011-01-01

Despite recent advances in experimental approaches for identifying transcriptional cis-regulatory modules (CRMs, ‘enhancers’), direct empirical discovery of CRMs for all genes in all cell types and environmental conditions is likely to remain an elusive goal. Effective methods for computational CRM discovery are thus a critically needed complement to empirical approaches. However, existing computational methods that search for clusters of putative binding sites are ineffective if the relevant TFs and/or their binding specificities are unknown. Here, we provide a significantly improved method for ‘motif-blind’ CRM discovery that does not depend on knowledge or accurate prediction of TF-binding motifs and is effective when limited knowledge of functional CRMs is available to ‘supervise’ the search. We propose a new statistical method, based on ‘Interpolated Markov Models’, for motif-blind, genome-wide CRM discovery. It captures the statistical profile of variable length words in known CRMs of a regulatory network and finds candidate CRMs that match this profile. The method also uses orthologs of the known CRMs from closely related genomes. We perform in silico evaluation of predicted CRMs by assessing whether their neighboring genes are enriched for the expected expression patterns. This assessment uses a novel statistical test that extends the widely used Hypergeometric test of gene set enrichment to account for variability in intergenic lengths. We find that the new CRM prediction method is superior to existing methods. Finally, we experimentally validate 12 new CRM predictions by examining their regulatory activity in vivo in Drosophila; 10 of the tested CRMs were found to be functional, while 6 of the top 7 predictions showed the expected activity patterns. We make our program available as downloadable source code, and as a plugin for a genome browser installed on our servers. PMID:21821659
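For reference, the standard hypergeometric enrichment test that the authors extend can be sketched as an upper-tail probability. This is the textbook test only, not the extension that accounts for intergenic lengths; the function and parameter names are chosen for this example.

```python
from math import comb

def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when drawing `draws` items without replacement from a
    population of size `population` that contains `successes` marked items."""
    upper = min(successes, draws)
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(population, draws)

# Hypothetical case: 10 genes, 5 with the expected expression pattern,
# 5 genes next to predicted CRMs, all 5 of which show the pattern.
p_value = hypergeom_upper_tail(5, 10, 5, 5)
```

A small tail probability indicates that the genes neighboring the predictions are enriched for the expected pattern beyond what chance would give.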
Protease-activated Receptor-4 Signaling and Trafficking Is Regulated by the Clathrin Adaptor Protein Complex-2 Independent of β-Arrestins

PubMed Central

Smith, Thomas H.; Coronel, Luisa J.; Li, Julia G.; Dores, Michael R.; Nieman, Marvin T.; Trejo, JoAnn

2016-01-01

Protease-activated receptor-4 (PAR4) is a G protein-coupled receptor (GPCR) for thrombin and is proteolytically activated, similar to the prototypical PAR1. Due to the irreversible activation of PAR1, receptor trafficking is intimately linked to signal regulation. However, unlike PAR1, the mechanisms that control PAR4 trafficking are not known. Here, we sought to define the mechanisms that control PAR4 trafficking and signaling. In HeLa cells depleted of clathrin by siRNA, activated PAR4 failed to internalize. Consistent with clathrin-mediated endocytosis, expression of a dynamin dominant-negative K44A mutant also blocked activated PAR4 internalization. However, unlike most GPCRs, PAR4 internalization occurred independently of β-arrestins and the receptor's C-tail domain. Rather, we discovered a highly conserved tyrosine-based motif in the third intracellular loop of PAR4 and found that the clathrin adaptor protein complex-2 (AP-2) is important for internalization. Depletion of AP-2 inhibited PAR4 internalization induced by agonist. In addition, mutation of the critical residues of the tyrosine-based motif disrupted agonist-induced PAR4 internalization. Using Dami megakaryocytic cells, we confirmed that AP-2 is required for agonist-induced internalization of endogenous PAR4. Moreover, inhibition of activated PAR4 internalization enhanced ERK1/2 signaling, whereas Akt signaling was markedly diminished. These findings indicate that activated PAR4 internalization requires AP-2 and a tyrosine-based motif and occurs independent of β-arrestins, unlike most classical GPCRs. Moreover, these findings are the first to show that internalization of activated PAR4 is linked to proper ERK1/2 and Akt activation. PMID:27402844
A reduced-dimensionality approach to uncovering dyadic modes of body motion in conversations

PubMed Central

Noy, Lior; Liron, Yuvalal; Alon, Uri

2017-01-01

Face-to-face conversations are central to human communication and a fascinating example of joint action. Beyond verbal content, one of the primary ways in which information is conveyed in conversations is body language. Body motion in natural conversations has been difficult to study precisely due to the large number of coordinates at play. There is a need for fresh approaches to analyze and understand the data, in order to ask whether dyads show basic building blocks of coupled motion. Here we present a method for analyzing body motion during joint action using depth-sensing cameras, and use it to analyze a sample of scientific conversations. Our method consists of three steps: defining modes of body motion of individual participants, defining dyadic modes made of combinations of these individual modes, and lastly defining motion motifs as dyadic modes that occur significantly more often than expected given the single-person motion statistics. As a proof-of-concept, we analyze the motion of 12 dyads of scientists measured using two Microsoft Kinect cameras. In our sample, we find that out of many possible modes, only two were motion motifs: synchronized parallel torso motion in which the participants swayed from side to side in sync, and still segments where neither person moved. We find evidence of dyad individuality in the use of motion modes. For a randomly selected subset of 5 dyads, this individuality was maintained for at least 6 months. The present approach to simplifying complex motion data and defining motion motifs may be used to understand other joint tasks and interactions. The analysis tools developed here and the motion dataset are publicly available. PMID:28141861
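The third step of the method above, calling a dyadic mode a motif when it occurs more often than the single-person statistics predict, can be illustrated with a small sketch. It assumes each participant's motion has already been discretized into per-frame mode labels and uses a plain observed/expected ratio under independence; the abstract does not describe the significance test actually used, and the labels and the function name `dyadic_enrichment` are hypothetical.

```python
from collections import Counter

def dyadic_enrichment(modes_a, modes_b):
    """Return {(mode_a, mode_b): observed / expected} for each co-occurring pair.

    The expected frequency assumes the two participants move independently,
    i.e. P(a, b) = P(a) * P(b), taken from the single-person statistics.
    """
    if len(modes_a) != len(modes_b):
        raise ValueError("both label sequences must cover the same frames")
    n = len(modes_a)
    count_a = Counter(modes_a)
    count_b = Counter(modes_b)
    count_pair = Counter(zip(modes_a, modes_b))
    return {
        pair: (observed / n) / ((count_a[pair[0]] / n) * (count_b[pair[1]] / n))
        for pair, observed in count_pair.items()
    }

# Hypothetical per-frame mode labels for the two members of one dyad.
a = ["sway", "sway", "still", "still", "lean", "sway"]
b = ["sway", "sway", "still", "still", "sway", "lean"]
ratios = dyadic_enrichment(a, b)
```

A ratio well above 1 marks a candidate motif; a real analysis would still need a significance test against shuffled or surrogate data before calling it one.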
Extensive T-Cell Epitope Repertoire Sharing among Human Proteome, Gastrointestinal Microbiome, and Pathogenic Bacteria: Implications for the Definition of Self

PubMed Central

Bremel, Robert D.; Homan, E. Jane

2015-01-01

T-cell receptor binding to MHC-bound peptides plays a key role in discrimination between self and non-self. Only a subset, typically a pentamer, of amino acids in an MHC-bound peptide forms the motif exposed to the T-cell receptor. We categorize and compare the T-cell exposed amino acid motif repertoire of the total proteomes of two groups of bacteria, comprising pathogens and gastrointestinal microbiome organisms, with the human proteome and immunoglobulins. Given the maximum of 20^5, or 3.2 million, such motifs that bind T-cell receptors, there is considerable overlap in motif usage. We show that the human proteome, exclusive of immunoglobulins, comprises only three quarters of the possible motifs, of which 65.3% are also present in both composite bacterial proteomes. Very few motifs are unique to the human proteome. Immunoglobulin variable regions carry a broad diversity of T-cell exposed motifs (TCEMs) that provides a stratified random sample of the motifs found in pathogens, microbiome, and the human proteome. Individual bacterial genera and species vary in the content of immunoglobulin and human proteome matched motifs that they carry. Mycobacteria and Burkholderia spp carry a particularly high content of such matched motifs. Some bacteria retain a unique motif signature and motif sharing pattern with the human proteome. The implication is that distinguishing self from non-self does not depend on individual TCEMs, but on a complex and dynamic overlay of signals wherein the same TCEM may play different roles in different organisms, and the frequency with which a particular TCEM appears influences its effect. The patterns observed provide clues to bacterial immune evasion and to strategies for intervention, including vaccine design. The breadth and distinct frequency patterns of the immunoglobulin-derived peptides suggest a role of immunoglobulins in maintaining a broadly responsive T-cell repertoire. PMID:26557118
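The motif counts quoted in this abstract follow from simple arithmetic: 20 standard amino acids at each of five exposed positions. The sketch below works through the numbers; treating "three quarters" as exactly 75% is an assumption made only for illustration, and the variable names are hypothetical.

```python
AMINO_ACIDS = 20        # standard amino acids
EXPOSED_POSITIONS = 5   # pentamer exposed to the T-cell receptor

# Every possible T-cell exposed pentamer motif: 20^5.
possible_motifs = AMINO_ACIDS ** EXPOSED_POSITIONS

# "Three quarters" of the possible motifs occur in the human proteome
# (assumed here to be exactly 75%).
human_motifs = possible_motifs * 3 // 4

# 65.3% of those are also present in both composite bacterial proteomes.
shared_with_bacteria = round(human_motifs * 0.653)

print(possible_motifs, human_motifs, shared_with_bacteria)
```

Under these assumptions roughly 1.57 million of the 3.2 million possible motifs are shared between the human proteome and both bacterial groups.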
CombiMotif: A new algorithm for network motifs discovery in protein-protein interaction networks

NASA Astrophysics Data System (ADS)

Luo, Jiawei; Li, Guanghui; Song, Dan; Liang, Cheng

2014-12-01

Discovering motifs in protein-protein interaction networks has become a major challenge in computational biology, since the distribution of the number of network motifs can reveal significant systemic differences among species. However, this task can be computationally expensive because it involves graph isomorphism detection. In this paper, we present a new algorithm (CombiMotif) that incorporates combinatorial techniques to count non-induced occurrences of subgraph topologies in the form of trees. The efficiency of our algorithm is demonstrated by comparing the obtained results with the current state-of-the-art subgraph counting algorithms. We also show major differences between unicellular and multicellular organisms. The datasets and source code of CombiMotif are freely available upon request.
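To make "combinatorial counting of non-induced tree occurrences" concrete, the sketch below handles the simplest tree shape, a star, for which the count is a closed-form sum over node degrees and no isomorphism test is needed. This is not the CombiMotif algorithm, whose details the abstract does not give; the function name and the toy network are hypothetical, and the edge list is assumed to describe a simple undirected graph with no duplicate edges or self-loops.

```python
from math import comb

def count_stars(edges, leaves):
    """Count non-induced occurrences of a star tree with `leaves` leaves.

    A star is fixed by its centre and a choice of `leaves` distinct
    neighbours, so the total is the sum over nodes of C(degree, leaves).
    Requires leaves >= 2, because a single edge has no unique centre.
    """
    if leaves < 2:
        raise ValueError("leaves must be at least 2")
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return sum(comb(d, leaves) for d in degree.values())

# Hypothetical toy network: a triangle A-B-C plus a pendant edge C-D.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
paths_3 = count_stars(toy_edges, 2)  # 3-node paths (2-leaf stars)
stars_4 = count_stars(toy_edges, 3)  # 4-node stars (3-leaf stars)
```

Because the count is non-induced, the two paths inside the triangle that pass through A and B are included even though their end nodes are also connected.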
Identification and preliminary characterization of a protein motif related to the zinc finger.

PubMed Central

Lovering, R; Hanson, I M; Borden, K L; Martin, S; O'Reilly, N J; Evan, G I; Rahman, D; Pappin, D J; Trowsdale, J; Freemont, P S

1993-01-01

We have identified a protein motif, related to the zinc finger, which defines a newly discovered family of proteins. The motif was found in the sequence of the human RING1 gene, which is proximal to the major histocompatibility complex region on chromosome six. We propose naming this motif the "RING finger" and it is found in 27 proteins, all of which have putative DNA binding functions. We have synthesized a peptide corresponding to the RING1 motif and examined a number of properties, including metal and DNA binding. We provide evidence to support the suggestion that the RING finger motif is the DNA binding domain of this newly defined family of proteins. PMID:7681583
Genomic characterization and phylogenetic analysis of Zika virus circulating in the Americas.

PubMed

Ye, Qing; Liu, Zhong-Yu; Han, Jian-Feng; Jiang, Tao; Li, Xiao-Feng; Qin, Cheng-Feng

2016-09-01

The rapid spread and potential link with birth defects have made Zika virus (ZIKV) a global public health problem. The virus was discovered 70 years ago, yet its genomic structure and the genetic variations associated with the current explosive ZIKV epidemics are not fully understood. In this review, the genome organization, especially the conserved terminal structures of the ZIKV genome, was characterized and compared with that of other mosquito-borne flaviviruses. Sequence comparison and prediction of functional motifs in viral proteins suggest that the major viral proteins of ZIKV share high structural and functional similarity with those of other known flaviviruses. Phylogenetic analysis demonstrated that all ZIKV strains circulating in the Americas form a unique clade within the Asian lineage. Furthermore, we identified a series of conserved amino acid residues that differentiate the Asian strains, including the currently circulating American strains, from the ancient African strains. Overall, our findings provide an overview of ZIKV genome characterization and evolutionary dynamics in the Americas and point out critical clues for future virological and epidemiological studies. Copyright © 2016 Elsevier B.V. All rights reserved.
Dynamic Fluctuations of Protein-Carbohydrate Interactions Promote Protein Aggregation

PubMed Central

Voynov, Vladimir; Chennamsetty, Naresh; Kayser, Veysel; Helk, Bernhard; Forrer, Kurt; Zhang, Heidi; Fritsch, Cornelius; Heine, Holger; Trout, Bernhardt L.

2009-01-01

Protein-carbohydrate interactions are important for glycoprotein structure and function. Antibodies of the IgG class, with increasing significance as therapeutics, are glycosylated at a conserved site in the constant Fc region. We hypothesized that disruption of protein-carbohydrate interactions in the glycosylated domain of antibodies leads to the exposure of aggregation-prone motifs. Aggregation is one of the main problems in protein-based therapeutics because of immunogenicity concerns and decreased efficacy. To explore the significance of intramolecular interactions between aromatic amino acids and carbohydrates in the IgG glycosylated domain, we utilized computer simulations, fluorescence analysis, and site-directed mutagenesis. We find that the surface exposure of one aromatic amino acid increases due to dynamic fluctuations. Moreover, protein-carbohydrate interactions decrease upon stress, while protein-protein and carbohydrate-carbohydrate interactions increase. Substitution of the carbohydrate-interacting aromatic amino acids with non-aromatic residues leads to a significantly lower stability than wild type, and to compromised binding to Fc receptors. Our results support a mechanism for antibody aggregation via decreased protein-carbohydrate interactions, leading to the exposure of aggregation-prone regions, and to aggregation. PMID:20037630
Telobox motifs recruit CLF/SWN-PRC2 for H3K27me3 deposition via TRB factors in Arabidopsis.

PubMed

Zhou, Yue; Wang, Yuejun; Krause, Kristin; Yang, Tingting; Dongus, Joram A; Zhang, Yijing; Turck, Franziska

2018-05-01

Polycomb repressive complexes (PRCs) control organismic development in higher eukaryotes through epigenetic gene repression (refs. 1-4). PRC proteins do not contain DNA-binding domains, thus prompting questions regarding how PRCs find their target loci (ref. 5). Here we present genome-wide evidence of PRC2 recruitment by telomere-repeat-binding factors (TRBs) through telobox-related motifs in Arabidopsis. A triple trb1-2, trb2-1, and trb3-2 (trb1/2/3) mutant with a developmental phenotype and a transcriptome strikingly similar to those of strong PRC2 mutants showed redistribution of trimethyl histone H3 Lys27 (H3K27me3) marks and lower H3K27me3 levels, which were correlated with derepression of TRB1-target genes. TRB1-3 physically interacted with the PRC2 proteins CLF and SWN. A SEP3 reporter gene with a telobox mutation showed ectopic expression, which was correlated with H3K27me3 depletion, whereas tethering TRB1 to the mutated cis element partially restored repression. We propose that telobox-related motifs recruit PRC2 through the interaction between TRBs and CLF/SWN, a mechanism essential for H3K27me3 deposition at a subset of target genes.
miRNA Enriched in Human Neuroblast Nuclei Bind the MAZ Transcription Factor and Their Precursors Contain the MAZ Consensus Motif.

PubMed

Goldie, Belinda J; Fitzsimmons, Chantel; Weidenhofer, Judith; Atkins, Joshua R; Wang, Dan O; Cairns, Murray J

2017-01-01

While the cytoplasmic function of microRNA (miRNA) as post-transcriptional regulators of mRNA has been the subject of significant research effort, their activity in the nucleus is less well characterized. Here we use a human neuronal cell model to show that some mature miRNA are preferentially enriched in the nucleus. These molecules were predominantly primate-specific and contained a sequence motif with homology to the consensus MAZ transcription factor binding element. Precursor miRNA containing this motif were shown to have affinity for MAZ protein in nuclear extract. We then used Ago1/2 RIP-Seq to explore nuclear miRNA-associated mRNA targets. Interestingly, the genes for Ago2-associated transcripts were also significantly enriched with MAZ binding sites and neural function, whereas Ago1-transcripts were associated with general metabolic processes and localized with SC35 spliceosomes. These findings suggest the MAZ transcription factor is associated with miRNA in the nucleus and may influence the regulation of neuronal development through Ago2-associated miRNA induced silencing complexes. The MAZ transcription factor may therefore be important for organizing higher order integration of transcriptional and post-transcriptional processes in primate neurons.
Layered structures of organic/inorganic hybrid halide perovskites

NASA Astrophysics Data System (ADS)

Huan, Tran Doan; Tuoc, Vu Ngoc; Minh, Nguyen Viet

2016-03-01

Organic-inorganic hybrid halide perovskites, in which the A cations of an ABX3 perovskite are replaced by organic cations, may be used for photovoltaic and solar thermoelectric applications. In this contribution, we systematically study three lead-free hybrid perovskites, i.e., methylammonium tin iodide CH3NH3SnI3, ammonium tin iodide NH4SnI3, and formamidinium tin iodide HC(NH2)2SnI3, by first-principles calculations. We find that in addition to the commonly known motif in which the corner-shared SnI6 octahedra form a three-dimensional network, these materials may also favor a two-dimensional (layered) motif formed by alternating layers of the SnI6 octahedra and the organic cations. These two motifs are nearly equal in free energy and are separated by low barriers. The layered structures feature many flat electronic bands near the band edges, making their electronic structures significantly different from those of the structural phases composed of three-dimensional networks of SnI6 octahedra. Furthermore, because the electronic structures of HC(NH2)2SnI3 are found to be rather similar to those of CH3NH3SnI3, formamidinium tin iodide may also be promising for the applications of methylammonium tin iodide.
Are Long-Range Structural Correlations Behind the Aggregration Phenomena of Polyglutamine Diseases?

PubMed Central

Moradi, Mahmoud; Babin, Volodymyr; Roland, Christopher; Sagui, Celeste

2012-01-01

We have characterized the conformational ensembles of polyglutamine peptides of various lengths (ranging from to ), both with and without the presence of a C-terminal polyproline hexapeptide. For this, we used state-of-the-art molecular dynamics simulations combined with a novel statistical analysis to characterize the various properties of the backbone dihedral angles and secondary structural motifs of the glutamine residues. For (i.e., just above the pathological length for Huntington's disease), the equilibrium conformations of the monomer consist primarily of disordered, compact structures with non-negligible -helical and turn content. We also observed a relatively small population of extended structures suitable for forming aggregates including - and -strands, and - and -hairpins. Most importantly, for we find that there exists a long-range correlation (ranging for at least residues) among the backbone dihedral angles of the Q residues. For polyglutamine peptides below the pathological length, the population of the extended strands and hairpins is considerably smaller, and the correlations are short-range (at most residues apart). Adding a C-terminal hexaproline to suppresses both the population of these rare motifs and the long-range correlation of the dihedral angles. We argue that the long-range correlation of the polyglutamine homopeptide, along with the presence of these rare motifs, could be responsible for its aggregation phenomena. PMID:22577357
Auxiliary KChIP4a Suppresses A-type K+ Current through Endoplasmic Reticulum (ER) Retention and Promoting Closed-state Inactivation of Kv4 Channels

PubMed Central

Tang, Yi-Quan; Liang, Ping; Zhou, Jingheng; Lu, Yanxin; Lei, Lei; Bian, Xiling; Wang, KeWei

2013-01-01

In the brain and heart, auxiliary Kv channel-interacting proteins (KChIPs) co-assemble with pore-forming Kv4 α-subunits to form a native K+ channel complex and regulate the expression and gating properties of Kv4 currents. Among the KChIP1–4 members, KChIP4a exhibits a unique N terminus that is known to suppress Kv4 function, but the underlying mechanism of Kv4 inhibition remains unknown. Using a combination of confocal imaging, surface biotinylation, and electrophysiological recordings, we identified a novel endoplasmic reticulum (ER) retention motif, consisting of six hydrophobic and aliphatic residues, 12–17 (LIVIVL), within the KChIP4a N-terminal KID, that functions to reduce surface expression of Kv4-KChIP complexes. This ER retention capacity is transferable and depends on its flanking location. In addition, adjacent to the ER retention motif, the residues 19–21 (VKL motif) directly promote closed-state inactivation of Kv4.3, thus leading to an inhibition of channel current. Taken together, our findings demonstrate that KChIP4a suppresses A-type Kv4 current via ER retention and enhancement of Kv4 closed-state inactivation. PMID:23576435
Discrete Determinants in ArfGAP2/3 Conferring Golgi Localization and Regulation by the COPI Coat

PubMed Central

Kliouchnikov, Lena; Bigay, Joëlle; Mesmin, Bruno; Parnis, Anna; Rawet, Moran; Goldfeder, Noga; Antonny, Bruno

2009-01-01

From yeast to mammals, two types of GTPase-activating proteins, ArfGAP1 and ArfGAP2/3, control guanosine triphosphate (GTP) hydrolysis on the small G protein ADP-ribosylation factor (Arf) 1 at the Golgi apparatus. Although functionally interchangeable, they display little similarity outside the catalytic GTPase-activating protein (GAP) domain, suggesting differential regulation. ArfGAP1 is controlled by membrane curvature through its amphipathic lipid packing sensor motifs, whereas Golgi targeting of ArfGAP2 depends on coatomer, the building block of the COPI coat. Using a reporter fusion approach and in vitro assays, we identified several functional elements in ArfGAP2/3. We show that the Golgi localization of ArfGAP3 depends on both a central basic stretch and a carboxy-amphipathic motif. The basic stretch interacts directly with coatomer, which we found essential for the catalytic activity of ArfGAP3 on Arf1-GTP, whereas the carboxy-amphipathic motif interacts directly with lipid membranes but has a minor role in the regulation of ArfGAP3 activity. Our findings indicate that the two types of ArfGAP proteins that reside at the Golgi use a different combination of protein–protein and protein–lipid interactions to promote GTP hydrolysis in Arf1-GTP. PMID:19109418
Transcription Factor Binding Profiles Reveal Cyclic Expression of Human Protein-coding Genes and Non-coding RNAs

PubMed Central

Cheng, Chao; Ung, Matthew; Grant, Gavin D.; Whitfield, Michael L.

2013-01-01

The cell cycle is a complex and highly supervised process that must proceed with regulatory precision to achieve successful cellular division. Despite their wide application, microarray time course experiments have several limitations in identifying cell cycle genes. We thus propose a computational model to predict human cell cycle genes based on transcription factor (TF) binding and regulatory motif information in their promoters. We utilize ENCODE ChIP-seq data and motif information as predictors to discriminate cell cycle from non-cell cycle genes. Our results show that both the trans-TF features and the cis-motif features are predictive of cell cycle genes, and a combination of the two types of features can further improve prediction accuracy. We apply our model to a complete list of GENCODE promoters to predict novel cell cycle driving promoters for both protein-coding genes and non-coding RNAs such as lincRNAs. We find that a similar percentage of lincRNAs are cell cycle regulated as protein-coding genes, suggesting the importance of non-coding RNAs in cell cycle division. The model we propose here provides not only a practical tool for identifying novel cell cycle genes with high accuracy, but also new insights on cell cycle regulation by TFs and cis-regulatory elements. PMID:23874175
Regulation of HTLV-1 Gag budding by Vps4A, Vps4B, and AIP1/Alix

PubMed Central

Urata, Shuzo; Yokosawa, Hideyoshi; Yasuda, Jiro

2007-01-01

Background: HTLV-1 Gag protein is a matrix protein that contains the PTAP and PPPY sequences as L-domain motifs and which can be released from mammalian cells in the form of virus-like particles (VLPs). The cellular factors Tsg101 and Nedd4.1 interact with PTAP and PPPY, respectively, within the HTLV-1 Gag polyprotein. Tsg101 forms a complex with Vps28 and Vps37 (ESCRT-I complex) and plays an important role in the class E Vps pathway, which mediates protein sorting and invagination of vesicles into multivesicular bodies. Nedd4.1 is an E3 ubiquitin ligase that binds to the PPPY motif through its WW motif, but its function is still unknown. In the present study, to investigate the mechanism of HTLV-1 budding in detail, we analyzed HTLV-1 budding using dominant negative (DN) forms of the class E proteins. Results: Here, we report that DN forms of Vps4A, Vps4B, and AIP1 inhibit HTLV-1 budding. Conclusion: These findings suggest that HTLV-1 budding utilizes the MVB pathway and that these class E proteins may be targets for prevention of mother-to-infant vertical transmission of the virus. PMID:17601348
Identification of a conserved B-cell epitope on the GapC protein of Streptococcus dysgalactiae.

PubMed

Zhang, Limeng; Zhou, Xue; Fan, Ziyao; Tang, Wei; Chen, Liang; Dai, Jian; Wei, Yuhua; Zhang, Jianxin; Yang, Xuan; Yang, Xijing; Liu, Daolong; Yu, Liquan; Zhang, Hua; Wu, Zhijun; Yu, Yongzhong; Sun, Hunan; Cui, Yudong

2015-01-01

Streptococcus dysgalactiae (S. dysgalactiae) GapC is a highly conserved surface dehydrogenase among Streptococcus spp., which is responsible for inducing protective antibody immune responses in animals. However, the B-cell epitopes of S. dysgalactiae GapC have not been well characterized. In this study, a monoclonal antibody 1F2 (mAb1F2) against S. dysgalactiae GapC was generated by the hybridoma technique and used to screen a phage-displayed 12-mer random peptide library (Ph.D.-12) for mapping the linear B-cell epitope. The mAb1F2 recognized phages displaying peptides with the consensus motif TRINDLT. The amino acid sequence of the motif exactly matched (30)TRINDLT(36) of the S. dysgalactiae GapC. Subsequently, site-directed mutagenic analysis further demonstrated that residues R31, I32, N33, D34 and L35 formed the core of (30)TRINDLT(36), and this core motif was the minimal determinant of the B-cell epitope recognized by the mAb1F2. The epitope (30)TRINDLT(36) showed high homology among different Streptococcus species. Overall, our findings characterized a conserved B-cell epitope, which will be useful for the further study of epitope-based vaccines. Copyright © 2015 Elsevier Ltd. All rights reserved.
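Mapping a phage-display consensus motif back onto the antigen amounts to an exact substring search reported in 1-based residue numbering, which is how the (30)TRINDLT(36) notation above reads. A minimal sketch follows; the GapC sequence is not given in the abstract, so `toy_protein` is a made-up stand-in that merely places the motif at residues 30 to 36, and `find_motif` is a hypothetical helper.

```python
def find_motif(sequence, motif):
    """Return 1-based (start, end) residue positions of every motif occurrence."""
    hits = []
    start = sequence.find(motif)
    while start != -1:
        hits.append((start + 1, start + len(motif)))
        # Step one residue forward so overlapping occurrences are also found.
        start = sequence.find(motif, start + 1)
    return hits

# Hypothetical sequence: 29 filler residues, the consensus motif, then a short tail.
toy_protein = "M" + "A" * 28 + "TRINDLT" + "GKV"
hits = find_motif(toy_protein, "TRINDLT")
print(hits)
```

Running the same search over GapC homologs from other streptococci would be one simple way to check the conservation the abstract reports.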
Auxiliary KChIP4a suppresses A-type K+ current through endoplasmic reticulum (ER) retention and promoting closed-state inactivation of Kv4 channels.

PubMed

Tang, Yi-Quan; Liang, Ping; Zhou, Jingheng; Lu, Yanxin; Lei, Lei; Bian, Xiling; Wang, KeWei

2013-05-24

In the brain and heart, auxiliary Kv channel-interacting proteins (KChIPs) co-assemble with pore-forming Kv4 α-subunits to form a native K(+) channel complex and regulate the expression and gating properties of Kv4 currents. Among the KChIP1-4 members, KChIP4a exhibits a unique N terminus that is known to suppress Kv4 function, but the underlying mechanism of Kv4 inhibition remains unknown. Using a combination of confocal imaging, surface biotinylation, and electrophysiological recordings, we identified a novel endoplasmic reticulum (ER) retention motif, consisting of six hydrophobic and aliphatic residues, 12-17 (LIVIVL), within the KChIP4a N-terminal KID, that functions to reduce surface expression of Kv4-KChIP complexes. This ER retention capacity is transferable and depends on its flanking location. In addition, adjacent to the ER retention motif, the residues 19-21 (VKL motif) directly promote closed-state inactivation of Kv4.3, thus leading to an inhibition of channel current. Taken together, our findings demonstrate that KChIP4a suppresses A-type Kv4 current via ER retention and enhancement of Kv4 closed-state inactivation.
Mechanism for recognition of polyubiquitin chains: balancing affinity through interplay between multivalent binding and dynamics.

PubMed

Markin, Craig J; Xiao, Wei; Spyracopoulos, Leo

2010-08-18

RAP80 plays a key role in signal transduction in the DNA damage response by recruiting proteins to DNA damage foci by binding K63-polyubiquitin chains with two tandem ubiquitin-interacting motifs (tUIM). It is generally recognized that the typically weak interaction between ubiquitin (Ub) and various recognition motifs is intensified by themes such as tandem recognition motifs and Ub polymerization to achieve biological relevance. However, it remains an intricate problem to develop a detailed molecular mechanism to describe the process that leads to amplification of the Ub signal. A battery of solution-state NMR methods and molecular dynamics simulations were used to demonstrate that RAP80-tUIM employs mono- and multivalent interactions with polyUb chains to achieve enhanced affinity in comparison to monoUb interactions for signal amplification. The enhanced affinity is balanced by unfavorable entropic effects that include partial quenching of rapid reorientation between individual UIM domains and individual Ub domains in the bound state. For the RAP80-tUIM-polyUb interaction, increases in affinity with increasing chain length are a result of increased numbers of mono- and multivalent binding sites in the longer polyUb chains. The mono- and multivalent interactions are characterized by intrinsically weak binding and fast off-rates; these weak interactions with fast kinetics may be an important factor underlying the transient nature of protein-protein interactions that comprise DNA damage foci.

Multiple Copies of a Simple MYB-Binding Site Confers Trans-regulation by Specific Flavonoid-Related R2R3 MYBs in Diverse Species.

PubMed

Brendolise, Cyril; Espley, Richard V; Lin-Wang, Kui; Laing, William; Peng, Yongyan; McGhie, Tony; Dejnoprat, Supinya; Tomes, Sumathi; Hellens, Roger P; Allan, Andrew C

2017-01-01

In apple, the MYB transcription factor MYB10 controls the accumulation of anthocyanins. MYB10 is able to auto-activate its expression by binding its own promoter at a specific motif, the R1 motif. In some apple accessions a natural mutation, termed R6, has more copies of this motif within the MYB10 promoter, resulting in stronger auto-activation and elevated anthocyanins. Here we show that other anthocyanin-related MYBs selected from apple, pear, strawberry, petunia, kiwifruit and Arabidopsis are able to activate promoters containing the R6 motif. To examine the specificity of this motif, members of the R2R3 MYB family were screened against a promoter harboring the R6 mutation. Only MYBs from subgroups 5 and 6 activate expression by binding the R6 motif, with these MYBs sharing conserved residues in their R2R3 DNA binding domains. Insertion of the apple R6 motif into orthologous promoters of MYB10 in pear (PcMYB10) and Arabidopsis (AtMYB75) elevated anthocyanin levels. Introduction of the R6 motif into the promoter region of an anthocyanin biosynthetic enzyme, F3'5'H of kiwifruit, imparts regulation by MYB10. This results in elevated levels of delphinidin in both tobacco and kiwifruit. Finally, an R6 motif inserted into the promoter of the vitamin C biosynthesis gene GDP-L-Gal phosphorylase increases vitamin C content in a MYB10-dependent manner. This motif therefore provides a tool to re-engineer novel MYB-regulated responses in plants.
Combinations of various CpG motifs cloned into plasmid backbone modulate and enhance protective immunity of viral replicon DNA anthrax vaccines.

PubMed

Yu, Yun-Zhou; Ma, Yao; Xu, Wen-Hui; Wang, Shuang; Sun, Zhi-Wei

2015-08-01

DNA vaccines are generally weak stimulators of the immune system. Fortunately, their efficacy can be improved using a viral replicon vector or by the addition of immunostimulatory CpG motifs, although the design of these engineered DNA vectors requires optimization. Our results clearly suggest that multiple copies of three types of CpG motifs or combinations of various types of CpG motifs cloned into a viral replicon vector backbone with strong immunostimulatory activities on human PBMC are efficient adjuvants for these DNA vaccines to modulate and enhance protective immunity against anthrax, although modifications with these different CpG forms in vivo elicited inconsistent immune response profiles. Modification with more copies of CpG motifs elicited more potent adjuvant effects leading to the generation of enhanced immunity, which indicated a CpG motif dose-dependent enhancement of antigen-specific immune responses. Notably, the enhanced and/or synchronous adjuvant effects were observed in modification with combinations of two different types of CpG motifs, which provides not only a contribution to the knowledge base on the adjuvant activities of CpG motifs combinations but also implications for the rational design of optimal DNA vaccines with combinations of CpG motifs as "built-in" adjuvants. We describe an efficient strategy to design and optimize DNA vaccines by the addition of combined immunostimulatory CpG motifs in a viral replicon DNA plasmid to produce strong immune responses, which indicates that the CpG-modified viral replicon DNA plasmid may be desirable for use as vector of DNA vaccines.
Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium

PubMed Central

2010-01-01

Background: In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties to generate reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. Results: By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Conclusions: Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal genes expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes. PMID:20441586
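Screening a set of 3' UTRs against a list of known 8-mers needs no alignment, which is the practical appeal of the approach described above. The sketch below computes, for each candidate 8-mer, the fraction of UTRs that contain it. The abstract does not give the conserved 8-mer or any sequences, so the k-mers, gene names and UTR strings here are invented for illustration, and `kmer_presence` is a hypothetical helper.

```python
def kmer_presence(utrs, kmers):
    """Map each k-mer to the fraction of UTR sequences that contain it."""
    return {
        kmer: sum(kmer in seq for seq in utrs.values()) / len(utrs)
        for kmer in kmers
    }

# Hypothetical candidate 8-mers and 3' UTR sequences.
candidate_8mers = ["TGTAAATA", "GCACTTTA"]
ribosomal_utrs = {
    "geneA": "AATTTGTAAATATTTA",
    "geneB": "TTTGTAAATAAAA",
    "geneC": "AAAATTTTAAAA",
}
presence = kmer_presence(ribosomal_utrs, candidate_8mers)
print(presence)
```

Comparing this fraction between ribosomal and non-ribosomal gene sets is one simple way to spot an 8-mer that is distinctly enriched in one class of genes.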
Dienogest inhibits C-C motif chemokine ligand 20 expression in human endometriotic epithelial cells.

PubMed

Mita, Shizuka; Nakakuki, Masanori; Ichioka, Masayuki; Shimizu, Yutaka; Hashiba, Masamichi; Miyazaki, Hiroyasu; Kyo, Satoru

2017-07-01

C-C motif chemokine ligand 20 is thought to contribute to the development of endometriosis by recruiting Th17 lymphocytes into endometriotic foci. The present study investigated the effects of dienogest, a progesterone receptor agonist used to treat endometriosis, on C-C motif chemokine ligand 20 expression by endometriotic cells. Effects of dienogest on mRNA expression and protein secretion of C-C motif chemokine ligand 20 induced by interleukin 1β were assessed in three immortalized endometriotic epithelial cell lines, parental cells (EMosis-CC/TERT1), and stably expressing human progesterone receptor isoform A (EMosis-CC/TERT1/PRA+) or isoform B (EMosis-CC/TERT1/PRA-/PRB+). Dienogest markedly inhibited interleukin 1β-stimulated C-C motif chemokine ligand 20 mRNA expression and protein secretion in EMosis-CC/TERT1/PRA-/PRB+, which was abrogated by the progesterone receptor antagonist RU486. In EMosis-CC/TERT1/PRA+, dienogest slightly inhibited C-C motif chemokine ligand 20 mRNA and protein. In EMosis-CC/TERT1, dienogest slightly inhibited C-C motif chemokine ligand 20 mRNA, but had no effect on C-C motif chemokine ligand 20 protein. Dienogest inhibited interleukin 1β-induced up-regulation of C-C motif chemokine ligand 20 in endometriotic epithelial cells, mainly mediated by progesterone receptor B. Copyright © 2017 Elsevier B.V. All rights reserved.
Transcriptional regulation of Saccharomyces cerevisiae CYS3 encoding cystathionine γ-lyase

PubMed Central

Hiraishi, Hiroyuki; Miyake, Tsuyoshi

2008-01-01

In studying the regulation of GSH11, the structural gene of the high-affinity glutathione transporter (GSH-P1) in Saccharomyces cerevisiae, a cis-acting cysteine responsive element, CCGCCACAC (CCG motif), was detected. Like GSH-P1, the cystathionine γ-lyase encoded by CYS3 is induced by sulfur starvation and repressed by addition of cysteine to the growth medium. We detected a CCG motif (−311 to −303) and a CGC motif (CGCCACAC; −193 to −186), which is one base shorter than the CCG motif, in the 5′-upstream region of CYS3. One copy of the centromere determining element 1, CDE1 (TCACGTGA; −217 to −210), being responsible for regulation of the sulfate assimilation pathway genes, was also detected. We tested the roles of these three elements in the regulation of CYS3. Using a lacZ-reporter assay system, we found that the CCG/CGC motif is required for activation of CYS3, as well as for its repression by cysteine. In contrast, the CDE1 motif was responsible for only activation of CYS3. We also found that two transcription factors, Met4 and VDE, are responsible for activation of CYS3 through the CCG/CGC and CDE1 motifs. These observations suggest a dual regulation of CYS3 by factors that interact with the CDE1 motif and the CCG/CGC motifs. PMID:18317767
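The element coordinates quoted in this abstract can be reproduced with a simple string scan. The following is a minimal sketch, not the authors' method: it assumes the upstream region is supplied as a plain string whose last base sits at position −1, and the function name `locate_elements` and the toy sequence are hypothetical.

```python
# Element sequences as quoted in the abstract.
ELEMENTS = {
    "CCG motif": "CCGCCACAC",
    "CGC motif": "CGCCACAC",   # one base shorter than the CCG motif
    "CDE1": "TCACGTGA",
}

def locate_elements(upstream):
    """Return (name, first_pos, last_pos) for every exact match.

    Positions are negative and count back from the end of `upstream`,
    so the last base of the string is -1 (an assumed coordinate origin).
    Because CGCCACAC is a substring of CCGCCACAC, every CCG motif hit
    is also reported as a CGC motif hit one base further downstream.
    """
    n = len(upstream)
    hits = []
    for name, site in ELEMENTS.items():
        start = upstream.find(site)
        while start != -1:
            hits.append((name, start - n, start - n + len(site) - 1))
            start = upstream.find(site, start + 1)
    return hits

# Hypothetical toy sequence, not the real CYS3 upstream region.
toy_upstream = "AA" + "TCACGTGA" + "TTTT" + "CGCCACAC" + "GG"
for hit in locate_elements(toy_upstream):
    print(hit)
```

Exact matching is used here only because the abstract quotes fixed sequences; a real promoter scan would usually also check the reverse strand.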
Noncoding RNA danger motifs bridge innate and adaptive immunity and are potent adjuvants for vaccination

PubMed Central

Wang, Lilin; Smith, Dan; Bot, Simona; Dellamary, Luis; Bloom, Amy; Bot, Adrian

2002-01-01

The adaptive immune response is triggered by recognition of T and B cell epitopes and is influenced by “danger” motifs that act via innate immune receptors. This study shows that motifs associated with noncoding RNA are essential features in the immune response reminiscent of viral infection, mediating rapid induction of proinflammatory chemokine expression, recruitment and activation of antigen-presenting cells, modulation of regulatory cytokines, subsequent differentiation of Th1 cells, isotype switching, and stimulation of cross-priming. The heterogeneity of RNA-associated motifs results in differential binding to cellular receptors, and specifically impacts the immune profile. Naturally occurring double-stranded RNA (dsRNA) triggered activation of dendritic cells and enhancement of specific immunity, similar to selected synthetic dsRNA motifs. Based on the ability of specific RNA motifs to block tolerance induction and effectively organize the immune defense during viral infection, we conclude that such RNA species are potent danger motifs. We also demonstrate the feasibility of using selected RNA motifs as adjuvants in the context of novel aerosol carriers for optimizing the immune response to subunit vaccines. In conclusion, RNA-associated motifs produced during viral infection bridge the early response with the late adaptive phase, regulating the activation and differentiation of antigen-specific B and T cells, in addition to a short-term impact on innate immunity. PMID:12393853
The Methionine-aromatic Motif Plays a Unique Role in Stabilizing Protein Structure*

PubMed Central

Valley, Christopher C.; Cembran, Alessandro; Perlmutter, Jason D.; Lewis, Andrew K.; Labello, Nicholas P.; Gao, Jiali; Sachs, Jonathan N.

2012-01-01

Of the 20 amino acids, the precise function of methionine (Met) remains among the least well understood. To establish a determining characteristic of methionine that fundamentally differentiates it from purely hydrophobic residues, we have used in vitro cellular experiments, molecular simulations, quantum calculations, and a bioinformatics screen of the Protein Data Bank. We show that approximately one-third of all known protein structures contain an energetically stabilizing Met-aromatic motif and, remarkably, that greater than 10,000 structures contain this motif more than 10 times. Critically, we show that as compared with a purely hydrophobic interaction, the Met-aromatic motif yields an additional stabilization of 1–1.5 kcal/mol. To highlight its importance and to dissect the energetic underpinnings of this motif, we have studied two clinically relevant TNF ligand-receptor complexes, namely TRAIL-DR5 and LTα-TNFR1. In both cases, we show that the motif is necessary for high affinity ligand binding as well as function. Additionally, we highlight previously overlooked instances of the motif in several disease-related Met mutations. Our results strongly suggest that the Met-aromatic motif should be exploited in the rational design of therapeutics targeting a range of proteins. PMID:22859300
Deletion of transcription factor binding motifs using the CRISPR/spCas9 system in the β-globin LCR.

PubMed

Kim, Yea Woon; Kim, AeRi

2017-07-20

Transcription factors play roles in gene transcription through direct binding to their motifs in the genome, and inhibiting this binding provides an effective strategy for studying their roles. Here we applied the CRISPR/spCas9 system to mutate the binding motifs of transcription factors. Binding motifs for erythroid specific transcription factors were mutated in the locus control region hypersensitive sites of the human β-globin locus. Guide RNAs targeting binding motifs were cloned into a lentiviral CRISPR vector containing the spCas9 gene, and transduced into MEL/ch11 cells carrying a human chromosome 11. DNA mutations in clonal cells were initially screened by quantitative PCR in genomic DNA and then confirmed by sequencing. Mutations in binding motifs reduced occupancy by transcription factors in a chromatin environment. Characterization of mutations revealed that the CRISPR/spCas9 system mainly induced deletions in short regions of <20 bp and preferentially deleted nucleotides around the fifth nucleotide upstream of protospacer adjacent motifs. These results indicate that the CRISPR/Cas9 system is suitable for mutating the binding motifs of transcription factors and, consequently, should help elucidate the direct roles of transcription factors. ©2017 The Author(s).
Mutational Analysis of the QRRQ Motif in the Yeast Hig1 Type 2 Protein Rcf1 Reveals a Regulatory Role for the Cytochrome c Oxidase Complex*

PubMed Central

Garlich, Joshua; Strecker, Valentina; Wittig, Ilka; Stuart, Rosemary A.

2017-01-01

The yeast Rcf1 protein is a member of the conserved family of proteins termed the hypoxia-induced gene (domain) 1 (Hig1 or HIGD1) family. Rcf1 interacts with components of the mitochondrial oxidative phosphorylation system, in particular the cytochrome bc1 (complex III)-cytochrome c oxidase (complex IV) supercomplex (termed III-IV) and the ADP/ATP carrier proteins. Rcf1 plays a role in the assembly and modulation of the activity of complex IV; however, the molecular basis for how Rcf1 influences the activity of complex IV is currently unknown. Hig1 type 2 isoforms, which include the Rcf1 protein, are characterized in part by the presence of a conserved motif, (Q/I)X3(R/H)XRX3Q, termed here the QRRQ motif. We show that mutation of conserved residues within the Rcf1 QRRQ motif alters the interactions between Rcf1 and partner proteins and results in the destabilization of complex IV and alteration of its enzymatic properties. Our findings indicate that Rcf1 does not serve as a stoichiometric component, i.e. as a subunit of complex IV, to support its activity. Rather, we propose that Rcf1 serves to dynamically interact with complex IV during its assembly process and, in doing so, regulates a late maturation step of complex IV. We speculate that the Rcf1/Hig1 proteins play a role in the incorporation and/or remodeling of lipids, in particular cardiolipin, into complex IV and, possibly, other mitochondrial proteins such as ADP/ATP carrier proteins. PMID:28167530
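The consensus (Q/I)X3(R/H)XRX3Q quoted above translates directly into a regular expression. The sketch below is illustrative only: it assumes X means "any residue", and the helper name `find_qrrq` and the toy sequence are hypothetical rather than taken from the paper.

```python
import re

# (Q/I)X3(R/H)XRX3Q over one-letter amino acid codes.
# Reading X as "any single residue" is an assumption.
QRRQ_PATTERN = re.compile(r"[QI].{3}[RH].R.{3}Q")

def find_qrrq(sequence):
    """Return (start, matched_text) for each non-overlapping match (0-based start)."""
    return [(m.start(), m.group()) for m in QRRQ_PATTERN.finditer(sequence)]

# Hypothetical toy sequence, not the real Rcf1 sequence.
toy_protein = "MSAQLVERARGSLQKK"
print(find_qrrq(toy_protein))
```

`finditer` reports non-overlapping matches only, which is usually adequate for an 11-residue motif; overlapping occurrences would need a lookahead pattern instead.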
Novel DNA Motif Binding Activity Observed In Vivo With an Estrogen Receptor α Mutant Mouse

PubMed Central

Li, Leping; Grimm, Sara A.; Winuthayanon, Wipawee; Hamilton, Katherine J.; Pockette, Brianna; Rubel, Cory A.; Pedersen, Lars C.; Fargo, David; Lanz, Rainer B.; DeMayo, Francesco J.; Schütz, Günther; Korach, Kenneth S.

2014-01-01

Estrogen receptor α (ERα) interacts with DNA directly or indirectly via other transcription factors, referred to as “tethering.” Evidence for tethering is based on in vitro studies and a widely used “KIKO” mouse model containing mutations that prevent direct estrogen response element DNA-binding. KIKO mice are infertile, due in part to the inability of estradiol (E2) to induce uterine epithelial proliferation. To elucidate the molecular events that prevent KIKO uterine growth, regulation of the pro-proliferative E2 target gene Klf4 and of Klf15, a progesterone (P4) target gene that opposes the pro-proliferative activity of KLF4, was evaluated. Klf4 induction was impaired in KIKO uteri; however, Klf15 was induced by E2 rather than by P4. Whole uterine chromatin immunoprecipitation-sequencing revealed enrichment of KIKO ERα binding to hormone response element (HRE) motifs. KIKO binding to HRE motifs was verified using reporter gene and DNA-binding assays. Because the KIKO ERα has HRE DNA-binding activity, we evaluated the “EAAE” ERα, which has more severe DNA-binding domain mutations, and demonstrated a lack of estrogen response element or HRE reporter gene induction or DNA-binding. The EAAE mouse has an ERα null–like phenotype, with impaired uterine growth and transcriptional activity. Our findings demonstrate that the KIKO mouse model, which has been used by numerous investigators, cannot be used to establish biological functions for ERα tethering, because KIKO ERα effectively stimulates transcription using HRE motifs. The EAAE-ERα DNA-binding domain mutant mouse demonstrates that ERα DNA-binding is crucial for biological and transcriptional processes in reproductive tissues and that ERα tethering may not contribute to estrogen responsiveness in vivo. PMID:24713037
Semaphorin4D Drives CD8+ T-Cell Lesional Trafficking in Oral Lichen Planus via CXCL9/CXCL10 Upregulations in Oral Keratinocytes.

PubMed

Ke, Yao; Dang, Erle; Shen, Shengxian; Zhang, Tongmei; Qiao, Hongjiang; Chang, Yuqian; Liu, Qing; Wang, Gang

2017-11-01

Chemokine-mediated CD8+ T-cell recruitment is an essential but not well-established event for the persistence of oral lichen planus (OLP). Semaphorin 4D (Sema4D)/CD100 is implicated in immune dysfunction, chemokine modulation, and cell migration, which are critical aspects for OLP progression, but its implication in OLP pathogenesis has not been determined. In this study, we sought to explicate the effect of Sema4D on human oral keratinocytes and its capacity to drive CD8+ T-cell lesional trafficking via chemokine modulation. We found that upregulations of sSema4D in OLP tissues and blood were positively correlated with disease severity and activity. In vitro observation revealed that Sema4D induced C-X-C motif chemokine ligand 9/C-X-C motif chemokine ligand 10 production by binding to plexin-B1 via protein kinase B-NF-κB cascade in human oral keratinocytes, which elicited OLP CD8+ T-cell migration. We also confirmed using clinical samples that elevated C-X-C motif chemokine ligand 9/C-X-C motif chemokine ligand 10 levels were positively correlated with sSema4D levels in OLP lesions and serum. Notably, we determined matrix metalloproteinase-9 as a new proteolytic enzyme for the cleavage of sSema4D from the T-cell surface, which may contribute to the high levels of sSema4D in OLP lesions and serum. Our findings conclusively revealed an amplification feedback loop involving T cells, chemokines, and Sema4D-dependent signal that promotes OLP progression. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Molecular origin of the binding of WWOX tumor suppressor to ErbB4 receptor tyrosine kinase.

PubMed

Schuchardt, Brett J; Bhat, Vikas; Mikles, David C; McDonald, Caleb B; Sudol, Marius; Farooq, Amjad

2013-12-23

The ability of WWOX tumor suppressor to physically associate with the intracellular domain (ICD) of ErbB4 receptor tyrosine kinase is believed to play a central role in downregulating the transcriptional function of the latter. Herein, using various biophysical methods, we show that while the WW1 domain of WWOX binds to PPXY motifs located within the ICD of ErbB4 in a physiologically relevant manner, the WW2 domain does not. Importantly, while the WW1 domain absolutely requires the integrity of the PPXY consensus sequence, nonconsensus residues within and flanking this motif do not appear to be critical for binding. This strongly suggests that the WW1 domain of WWOX is rather promiscuous toward its cellular partners. We also provide evidence that the lack of binding of the WW2 domain of WWOX to PPXY motifs is due to the replacement of a signature tryptophan, lining the hydrophobic ligand binding groove, with tyrosine (Y85). Consistent with this notion, the Y85W substitution within the WW2 domain exquisitely restores its binding to PPXY motifs in a manner akin to the binding of the WW1 domain of WWOX. Of particular significance is the observation that the WW2 domain augments the binding of the WW1 domain to ErbB4, implying that the former serves as a chaperone within the context of the WW1-WW2 tandem module of WWOX in agreement with our findings reported previously. Altogether, our study sheds new light on the molecular basis of an important WW-ligand interaction involved in mediating a plethora of cellular processes.
Molecular Origin of the Binding of WWOX Tumor Suppressor to ErbB4 Receptor Tyrosine Kinase

PubMed Central

Schuchardt, Brett J.; Bhat, Vikas; Mikles, David C.; McDonald, Caleb B.; Sudol, Marius; Farooq, Amjad

2014-01-01

The ability of WWOX tumor suppressor to physically associate with the intracellular domain (ICD) of ErbB4 receptor tyrosine kinase is believed to play a central role in down-regulating the transcriptional function of the latter. Herein, using various biophysical methods, we show that while the WW1 domain of WWOX binds to PPXY motifs located within the ICD of ErbB4 in a physiologically-relevant manner, the WW2 domain does not. Importantly, while the WW1 domain absolutely requires the integrity of the PPXY consensus sequence, non-consensus residues within and flanking this motif do not appear to be critical for binding. This strongly suggests that the WW1 domain of WWOX is rather promiscuous toward its cellular partners. We also provide evidence that the lack of binding of WW2 domain of WWOX to PPXY motifs is due to the replacement of a signature tryptophan, lining the hydrophobic ligand binding groove, with tyrosine (Y85). Consistent with this notion, the Y85W substitution within the WW2 domain exquisitely restores its binding to PPXY motifs in a manner akin to the binding of WW1 domain of WWOX. Of particular significance is the observation that WW2 domain augments the binding of WW1 domain to ErbB4, implying that the former serves as a chaperone within the context of the WW1–WW2 tandem module of WWOX in agreement with our findings reported previously. Taken together, our study sheds new light on the molecular basis of an important WW-ligand interaction involved in mediating a plethora of cellular processes. PMID:24308844
Targeting cysteine-mediated dimerization of the MUC1-C oncoprotein in human cancer cells

PubMed Central

RAINA, DEEPAK; AHMAD, REHAN; RAJABI, HASAN; PANCHAMOORTHY, GOVIND; KHARBANDA, SURENDER; KUFE, DONALD

2012-01-01

The MUC1 heterodimeric protein is aberrantly overexpressed in diverse human carcinomas and contributes to the malignant phenotype. The MUC1-C transmembrane subunit contains a CQC motif in the cytoplasmic domain that has been implicated in the formation of dimers and in its oncogenic function. The present study demonstrates that MUC1-C forms dimers in human breast and lung cancer cells. MUC1-C dimerization was detectable in the cytoplasm and was independent of MUC1-N, the N-terminal mucin subunit that extends outside the cell. We show that the MUC1-C cytoplasmic domain forms dimers in vitro that are disrupted by reducing agents. Moreover, dimerization of the MUC1-C subunit in cancer cells was blocked by reducing agents and increased by oxidative stress, supporting involvement of the CQC motif in forming disulfide bonds. In support of these observations, mutation of the MUC1-C CQC motif to AQA completely blocked MUC1-C dimerization. Importantly, this study was performed with MUC1-C devoid of fluorescent proteins, such as GFP, CFP and YFP. In this regard, we show that GFP, CFP and YFP themselves form dimers that are readily detectable with cross-linking agents. The present results further demonstrate that a cell-penetrating peptide that targets the MUC1-C CQC cysteines blocks MUC1-C dimerization in cancer cells. These findings provide definitive evidence that: i) the MUC1-C cytoplasmic domain cysteines are necessary and sufficient for MUC1-C dimerization, and ii) these CQC motif cysteines represent an Achilles’ heel for targeting MUC1-C function. PMID:22200620
GIV/Girdin activates Gαi and inhibits Gαs via the same motif

PubMed Central

Gupta, Vijay; Bhandari, Deepali; Leyme, Anthony; Aznar, Nicolas; Midde, Krishna K.; Lo, I-Chung; Ear, Jason; Niesman, Ingrid; López-Sánchez, Inmaculada; Blanco-Canosa, Juan Bautista; von Zastrow, Mark; Garcia-Marcos, Mikel; Farquhar, Marilyn G.; Ghosh, Pradipta

2016-01-01

We previously showed that guanine nucleotide-binding (G) protein α subunit (Gα)-interacting vesicle-associated protein (GIV), a guanine-nucleotide exchange factor (GEF), transactivates Gα activity-inhibiting polypeptide 1 (Gαi) proteins in response to growth factors, such as EGF, using a short C-terminal motif. Subsequent work demonstrated that GIV also binds Gαs and that inactive Gαs promotes maturation of endosomes and shuts down mitogenic MAPK–ERK1/2 signals from endosomes. However, the mechanism and consequences of dual coupling of GIV to two G proteins, Gαi and Gαs, remained unknown. Here we report that GIV is a bifunctional modulator of G proteins; it serves as a guanine nucleotide dissociation inhibitor (GDI) for Gαs using the same motif that allows it to serve as a GEF for Gαi. Upon EGF stimulation, GIV modulates Gαi and Gαs sequentially: first, a key phosphomodification favors the assembly of GIV–Gαi complexes and activates GIV’s GEF function; then a second phosphomodification terminates GIV’s GEF function, triggers the assembly of GIV–Gαs complexes, and activates GIV’s GDI function. By comparing WT and GIV mutants, we demonstrate that GIV inhibits Gαs activity in cells responding to EGF. Consequently, the cAMP→PKA→cAMP response element-binding protein signaling axis is inhibited, the transit time of EGF receptor through early endosomes is accelerated, mitogenic MAPK–ERK1/2 signals are rapidly terminated, and proliferation is suppressed. These insights define a paradigm in G-protein signaling in which a pleiotropically acting modulator uses the same motif both to activate and to inhibit G proteins. Our findings also illuminate how such modulation of two opposing Gα proteins integrates downstream signals and cellular responses. PMID:27621449
PEBP1, a RAF kinase inhibitory protein, negatively regulates starvation-induced autophagy by direct interaction with LC3.

PubMed

Noh, Hae Sook; Hah, Young-Sool; Zada, Sahib; Ha, Ji Hye; Sim, Gyujin; Hwang, Jin Seok; Lai, Trang Huyen; Nguyen, Huynh Quoc; Park, Jae-Yong; Kim, Hyun Joon; Byun, June-Ho; Hahm, Jong Ryeal; Kang, Kee Ryeon; Kim, Deok Ryong

2016-11-01

Autophagy plays a critical role in maintaining cell homeostasis in response to various stressors through protein conjugation and activation of lysosome-dependent degradation. MAP1LC3B/LC3B (microtubule-associated protein 1 light chain 3 β) is conjugated with phosphatidylethanolamine (PE) in the membranes and regulates initiation of autophagy through interaction with many autophagy-related proteins possessing an LC3-interacting region (LIR) motif, which is composed of 2 hydrophobic amino acids (tryptophan and leucine) separated by 2 non-conserved amino acids (WXXL). In this study, we identified a new putative LIR motif in PEBP1/RKIP (phosphatidylethanolamine binding protein 1) that was originally isolated as a PE-binding protein and also a cellular inhibitor of MAPK/ERK signaling. PEBP1 was specifically bound to PE-unconjugated LC3 in cells, and mutation (WXXL mutated to AXXA) of this LIR motif disrupted its interaction with LC3 proteins. Interestingly, overexpression of PEBP1 significantly inhibited starvation-induced autophagy by activating the AKT and MTORC1 (mechanistic target of rapamycin [serine/threonine kinase] complex 1) signaling pathway and consequently suppressing the ULK1 (unc-51 like autophagy activating kinase 1) activity. In contrast, ablation of PEBP1 expression dramatically promoted the autophagic process under starvation conditions. Furthermore, PEBP1 lacking the LIR motif highly stimulated starvation-induced autophagy through the AKT-MTORC1-dependent pathway. PEBP1 phosphorylation at Ser153 caused dissociation of LC3 from the PEBP1-LC3 complex for autophagy induction. PEBP1-dependent suppression of autophagy was not associated with the MAPK pathway. These findings suggest that PEBP1 can act as a negative mediator in autophagy through stimulation of the AKT-MTORC1 pathway and direct interaction with LC3.
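The WXXL pattern and the WXXL-to-AXXA substitution described in this abstract can be illustrated with a short sketch. It is a toy model of the sequence edit only, assuming X stands for any residue; the function names and the example peptide are hypothetical and are not the PEBP1 sequence.

```python
import re

# W-x-x-L: tryptophan and leucine separated by two arbitrary residues.
LIR_PATTERN = re.compile(r"W..L")

def find_lir(sequence):
    """Return the 0-based start of each non-overlapping W-x-x-L match."""
    return [m.start() for m in LIR_PATTERN.finditer(sequence)]

def mutate_lir(sequence):
    """Replace W and L of each match with A, keeping the two middle residues."""
    return LIR_PATTERN.sub(lambda m: "A" + m.group()[1:3] + "A", sequence)

# Hypothetical toy peptide used only to show the substitution.
toy_peptide = "GKWDELSP"
print(find_lir(toy_peptide))
print(mutate_lir(toy_peptide))
```

A plain pattern match of this kind only flags candidate sites; the abstract's evidence for a functional LIR motif comes from binding experiments, not from sequence alone.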
Assembly mechanism of FCT region type 1 pili in serotype M6 Streptococcus pyogenes.

PubMed

Nakata, Masanobu; Kimura, Keiji Richard; Sumitomo, Tomoko; Wada, Satoshi; Sugauchi, Akinari; Oiki, Eiji; Higashino, Miharu; Kreikemeyer, Bernd; Podbielski, Andreas; Okahashi, Nobuo; Hamada, Shigeyuki; Isoda, Ryutaro; Terao, Yutaka; Kawabata, Shigetada

2011-10-28

The human pathogen Streptococcus pyogenes produces diverse pili depending on the serotype. We investigated the assembly mechanism of FCT type 1 pili in a serotype M6 strain. The pili were found to be assembled from two precursor proteins, the backbone protein T6 and ancillary protein FctX, and anchored to the cell wall in a manner that requires both a housekeeping sortase enzyme (SrtA) and pilus-associated sortase enzyme (SrtB). SrtB is primarily required for efficient formation of the T6 and FctX complex and subsequent polymerization of T6, whereas proper anchoring of the pili to the cell wall is mainly mediated by SrtA. Because motifs essential for polymerization of pilus backbone proteins in other Gram-positive bacteria are not present in T6, we sought to identify the functional residues involved in this process. Our results showed that T6 encompasses the novel VAKS pilin motif conserved in streptococcal T6 homologues and that the lysine residue (Lys-175) within the motif and cell wall sorting signal of T6 are prerequisites for isopeptide linkage of T6 molecules. Because Lys-175 and the cell wall sorting signal of FctX are indispensable for substantial incorporation of FctX into the T6 pilus shaft, FctX is suggested to be located at the pilus tip, which was also implied by immunogold electron microscopy findings. Thus, the elaborate assembly of FCT type 1 pili is potentially organized by sortase-mediated cross-linking between sorting signals and the amino group of Lys-175 positioned in the VAKS motif of T6, thereby displaying T6 and FctX in a temporospatial manner.
Toxic and nontoxic components of botulinum neurotoxin complex are evolved from a common ancestral zinc protein

DOE Office of Scientific and Technical Information (OSTI.GOV)

Inui, Ken; Japan Society for the Promotion of Science, 1-8 Chiyoda-ku, Tokyo 102-8472; Sagane, Yoshimasa

2012-03-16

Highlights: BoNT and NTNHA proteins share a similar protein architecture. NTNHA and BoNT were both identified as zinc-binding proteins. NTNHA does not have a classical HEXXH zinc-coordinating motif similar to that found in all serotypes of BoNT. Homology modeling implied probable key residues involved in zinc coordination. Abstract: Zinc atoms play an essential role in a number of enzymes. Botulinum neurotoxin (BoNT), the most potent toxin known in nature, is a zinc-dependent endopeptidase. Here we identify the nontoxic nonhemagglutinin (NTNHA), one of the BoNT-complex constituents, as a zinc-binding protein, along with BoNT. A protein structure classification database search indicated that BoNT and NTNHA share a similar domain architecture, comprising a zinc-dependent metalloproteinase-like, BoNT coiled-coil motif and concanavalin A-like domains. Inductively coupled plasma-mass spectrometry analysis demonstrated that every single NTNHA molecule contains a single zinc atom. This is the first demonstration of a zinc atom in this protein, as far as we know. However, the NTNHA molecule does not possess any known zinc-coordinating motif, whereas all BoNT serotypes possess the classical HEXXH motif. Homology modeling of the NTNHA structure implied that a consensus K-C-L-I-K-X35-D sequence common among all NTNHA serotype molecules appears to coordinate a single zinc atom. These findings lead us to propose that NTNHA and BoNT may have evolved distinct functional specializations following their branching out from a common ancestral zinc protein.
Characterization of Novel Calmodulin Binding Domains within IQ Motifs of IQGAP1

PubMed Central

Jang, Deok-Jin; Ban, Byungkwan; Lee, Jin-A

2011-01-01

IQ motif-containing GTPase-activating protein 1 (IQGAP1), which is a well-known calmodulin (CaM) binding protein, is involved in a wide range of cellular processes including cell proliferation, tumorigenesis, adhesion, and migration. Interaction of IQGAP1 with CaM is important for its cellular functions. Although each IQ domain of IQGAP1 for CaM binding has been characterized in a Ca2+-dependent or -independent manner, it was not clear which IQ motifs are physiologically relevant for CaM binding in the cells. In this study, we performed immunoprecipitation using 3xFLAGhCaM in mammalian cell lines to characterize the domains of IQGAP1 that are key for CaM binding under physiological conditions. Interestingly, using this method, we identified two novel domains, IQ(2.7-3) and IQ(3.5-4.4), within IQGAP1 that were involved in Ca2+-independent or -dependent CaM binding, respectively. Mutant analysis clearly showed that the hydrophobic regions within IQ(2.7-3) were mainly involved in apoCaM binding, while the basic amino acids and hydrophobic region of IQ(3.5-4.4) were required for Ca2+/CaM binding. Finally, we showed that IQ(2.7-3) was the main apoCaM binding domain and both IQ(2.7-3) and IQ(3.5-4.4) were required for Ca2+/CaM binding within IQ(1-2-3-4). Thus, we identified and characterized novel direct CaM binding motifs essential for IQGAP1. This finding indicates that IQGAP1 plays a dynamic role via direct interactions with CaM in a Ca2+-dependent or -independent manner. PMID:22080369
Specific repression of β-globin promoter activity by nuclear ferritin

PubMed Central

Broyles, Robert H.; Belegu, Visar; DeWitt, Christina R.; Shah, Sandeep N.; Stewart, Charles A.; Pye, Quentin N.; Floyd, Robert A.

2001-01-01

Developmental hemoglobin switching involves sequential globin gene activations and repressions that are incompletely understood. Earlier observations, described herein, led us to hypothesize that nuclear ferritin is a repressor of the adult β-globin gene in embryonic erythroid cells. Our data show that a ferritin-family protein in K562 cell nuclear extracts binds specifically to a highly conserved CAGTGC motif in the β-globin promoter at −153 to −148 bp from the cap site, and mutation of the CAGTGC motif reduces binding 20-fold in competition gel-shift assays. Purified human ferritin that is enriched in ferritin-H chains also binds the CAGTGC promoter segment. Expression clones of ferritin-H markedly repress β-globin promoter-driven reporter gene expression in cotransfected CV-1 cells in which the β-promoter has been stimulated with the transcription activator erythroid Krüppel-like factor (EKLF). We have constructed chloramphenicol acetyltransferase reporter plasmids containing either a wild-type or mutant β-globin promoter for the −150 CAGTGC motif and have compared the constructs for susceptibility to repression by ferritin-H in cotransfection assays. We find that stimulation by cotransfected EKLF is retained with the mutant promoter, whereas repression by ferritin-H is lost. Thus, mutation of the −150 CAGTGC motif not only markedly reduces in vitro binding of nuclear ferritin but also abrogates the ability of expressed ferritin-H to repress this promoter in our cell transfection assay, providing a strong link between DNA binding and function, and strong support for our proposal that nuclear ferritin-H is a repressor of the human β-globin gene. Such a repressor could be helpful in treating sickle cell and other genetic diseases. PMID:11481480

Recurring sequence-structure motifs in (βα)8-barrel proteins and experimental optimization of a chimeric protein designed based on such motifs.

PubMed

Wang, Jichao; Zhang, Tongchuan; Liu, Ruicun; Song, Meilin; Wang, Juncheng; Hong, Jiong; Chen, Quan; Liu, Haiyan

2017-02-01

An interesting way of generating novel artificial proteins is to combine sequence motifs from natural proteins, mimicking the evolutionary path suggested by natural proteins comprising recurring motifs. We analyzed the βα and αβ modules of TIM barrel proteins by structure alignment-based sequence clustering. A number of preferred motifs were identified. A chimeric TIM was designed by using recurring elements as mutually compatible interfaces. The foldability of the designed TIM protein was then significantly improved by six rounds of directed evolution. The melting temperature has been improved by more than 20°C. A variety of characteristics suggested that the resulting protein is well-folded. Our analysis provided a library of peptide motifs that is potentially useful for different protein engineering studies. The protein engineering strategy of using recurring motifs as interfaces to connect partial natural proteins may be applied to other protein folds. Copyright © 2016 Elsevier B.V. All rights reserved.
Identifying the preferred RNA motifs and chemotypes that interact by probing millions of combinations.

PubMed

Tran, Tuan; Disney, Matthew D

2012-01-01

RNA is an important therapeutic target but information about RNA-ligand interactions is limited. Here, we report a screening method that probes over 3,000,000 combinations of RNA motif-small molecule interactions to identify the privileged RNA structures and chemical spaces that interact. Specifically, a small molecule library biased for binding RNA was probed for binding to over 70,000 unique RNA motifs in a high throughput solution-based screen. The RNA motifs that specifically bind each small molecule were identified by microarray-based selection. In this library-versus-library or multidimensional combinatorial screening approach, hairpin loops (among a variety of RNA motifs) were the preferred RNA motif space that binds small molecules. Furthermore, it was shown that indole, 2-phenyl indole, 2-phenyl benzimidazole and pyridinium chemotypes allow for specific recognition of RNA motifs. As targeting RNA with small molecules is an extremely challenging area, these studies provide new information on RNA-ligand interactions that has many potential uses.
Identifying the Preferred RNA Motifs and Chemotypes that Interact by Probing Millions of Combinations

PubMed Central

Tran, Tuan; Disney, Matthew D.

2012-01-01

RNA is an important therapeutic target but information about RNA-ligand interactions is limited. Here we report a screening method that probes over 3,000,000 combinations of RNA motif-small molecule interactions to identify the privileged RNA structures and chemical spaces that interact. Specifically, a small molecule library biased for binding RNA was probed for binding to over 70,000 unique RNA motifs in a high throughput solution-based screen. The RNA motifs that specifically bind each small molecule were identified by microarray-based selection. In this library-versus-library or multidimensional combinatorial screening approach, hairpin loops (amongst a variety of RNA motifs) were the preferred RNA motif space that binds small molecules. Furthermore, it was shown that indole, 2-phenyl indole, 2-phenyl benzimidazole, and pyridinium chemotypes allow for specific recognition of RNA motifs. Since targeting RNA with small molecules is an extremely challenging area, these studies provide new information on RNA-ligand interactions that has many potential uses. PMID:23047683
Effect of C(60) fullerene on the duplex formation of i-motif DNA with complementary DNA in solution.

PubMed

Jin, Kyeong Sik; Shin, Su Ryon; Ahn, Byungcheol; Jin, Sangwoo; Rho, Yecheol; Kim, Heesoo; Kim, Seon Jeong; Ree, Moonhor

2010-04-15

The structural effects of fullerene on i-motif DNA were investigated by characterizing the structures of fullerene-free and fullerene-bound i-motif DNA, in the presence of cDNA and in solutions of varying pH, using circular dichroism and synchrotron small-angle X-ray scattering. To facilitate a direct structural comparison between the i-motif and duplex structures in response to pH stimulus, we developed atomic scale structural models for the duplex and i-motif DNA structures, and for the C(60)/i-motif DNA hybrid associated with the cDNA strand, assuming that the DNA strands are present in an ideal right-handed helical conformation. We found that fullerene shifted the pH-induced conformational transition between the i-motif and the duplex structure, possibly due to the hydrophobic interactions between the terminal fullerenes and between the terminal fullerenes and an internal TAA loop in the DNA strand. The hybrid structure showed a dramatic reduction in cyclic hysteresis.
Anion induced conformational preference of Cα NN motif residues in functional proteins.

PubMed

Patra, Piya; Ghosh, Mahua; Banerjee, Raja; Chakrabarti, Jaydeb

2017-12-01

Among different ligand binding motifs, the anion binding Cα NN motif, consisting of peptide backbone atoms of three consecutive residues, is observed to be important for recognition of free anions, like sulphate or biphosphate, and participates in different key functions. Here we study the interaction of sulphate and biphosphate with the Cα NN motif present in different proteins. Instead of the total protein, a peptide fragment has been studied, keeping the Cα NN motif flanked by other residues. We use classical force field based molecular dynamics simulations to understand the stability of this motif. Our data indicate fluctuations in conformational preferences of the motif residues in the absence of the anion. The anion gives stability to one of these conformations. However, the anion induced conformational preferences are highly sequence dependent and specific to the type of anion. In particular, the polar residues are more favourable compared to the other residues for recognising the anion. © 2017 Wiley Periodicals, Inc.
Modeling protein homopolymeric repeats: possible polyglutamine structural motifs for Huntington's disease.

PubMed

Lathrop, R H; Casale, M; Tobias, D J; Marsh, J L; Thompson, L M

1998-01-01

We describe a prototype system (Poly-X) for assisting an expert user in modeling protein repeats. Poly-X reduces the large number of degrees of freedom required to specify a protein motif in complete atomic detail. The result is a small number of parameters that are easily understood by, and under the direct control of, a domain expert. The system was applied to the polyglutamine (poly-Q) repeat in the first exon of huntingtin, the gene implicated in Huntington's disease. We present four poly-Q structural motifs: two poly-Q beta-sheet motifs (parallel and antiparallel) that constitute plausible alternatives to a similar previously published poly-Q beta-sheet motif, and two novel poly-Q helix motifs (alpha-helix and pi-helix). To our knowledge, helical forms of polyglutamine have not been proposed before. The motifs suggest that there may be several plausible aggregation structures for the intranuclear inclusion bodies which have been found in diseased neurons, and may help in the effort to understand the structural basis for Huntington's disease.
Structural insight into the interaction of proteins containing NPF, DPF, and GPF motifs with the C-terminal EH-domain of EHD1

PubMed Central

Kieken, Fabien; Jović, Marko; Tonelli, Marco; Naslavsky, Naava; Caplan, Steve; Sorgen, Paul L

2009-01-01

Eps15 homology (EH)-domain containing proteins are regulators of endocytic membrane trafficking. EH-domain binding to proteins containing the tripeptide NPF has been well characterized, but recent studies have shown that EH-domains are also able to interact with ligands containing DPF or GPF motifs. We demonstrate that the three motifs interact in a similar way with the EH-domain of EHD1, with the NPF motif having the highest affinity due to the presence of an intermolecular hydrogen bond. The weaker affinity for the DPF and GPF motifs suggests that if complex formation occurs in vivo, they may require high ligand concentrations, the presence of successive motifs and/or specific flanking residues. PMID:19798736
RosettaRemodel: A Generalized Framework for Flexible Backbone Protein Design

PubMed Central

Huang, Po-Ssu; Ban, Yih-En Andrew; Richter, Florian; Andre, Ingemar; Vernon, Robert; Schief, William R.; Baker, David

2011-01-01

We describe RosettaRemodel, a generalized framework for flexible protein design that provides a versatile and convenient interface to the Rosetta modeling suite. RosettaRemodel employs a unified interface, called a blueprint, which allows detailed control over many aspects of flexible backbone protein design calculations. RosettaRemodel allows the construction and elaboration of customized protocols for a wide range of design problems ranging from loop insertion and deletion, disulfide engineering, domain assembly, loop remodeling, motif grafting, symmetrical units, to de novo structure modeling. PMID:21909381
kpLogo: positional k-mer analysis reveals hidden specificity in biological sequences

PubMed Central

2017-01-01

Motifs of only 1–4 letters can play important roles when present at key locations within macromolecules. Because existing motif-discovery tools typically miss these position-specific short motifs, we developed kpLogo, a probability-based logo tool for integrated detection and visualization of position-specific ultra-short motifs from a set of aligned sequences. kpLogo also overcomes the limitations of conventional motif-visualization tools in handling positional interdependencies and utilizing ranked or weighted sequences increasingly available from high-throughput assays. kpLogo can be found at http://kplogo.wi.mit.edu/. PMID:28460012
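As a rough illustration of what position-specific k-mer counting over aligned sequences involves, the sketch below tallies every k-mer at each alignment position. It is not kpLogo's implementation, which the abstract describes as probability-based; the function name `positional_kmer_counts` and the toy sequences are hypothetical.

```python
from collections import Counter

def positional_kmer_counts(seqs, k):
    """Count k-mers at each start position of equal-length aligned sequences."""
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    # One Counter per possible k-mer start position.
    counts = [Counter() for _ in range(length - k + 1)]
    for s in seqs:
        for pos in range(length - k + 1):
            counts[pos][s[pos:pos + k]] += 1
    return counts

# Toy aligned sequences (hypothetical): "GA" occurs in every sequence at position 2.
seqs = ["ACGAT", "TTGAC", "CAGAA", "GCGAT"]
counts = positional_kmer_counts(seqs, 2)
top_kmer, top_n = counts[2].most_common(1)[0]
```

A real tool would go on to test each positional count against a background expectation; raw counts alone do not establish significance.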
A G-quadruplex-containing RNA activates fluorescence in a GFP-like fluorophore

DOE Office of Scientific and Technical Information (OSTI.GOV)

Huang, Hao; Suslov, Nikolai B.; Li, Nan-Sheng

2014-08-21

Spinach is an in vitro–selected RNA aptamer that binds a GFP-like ligand and activates its green fluorescence. Spinach is thus an RNA analog of GFP and has potentially widespread applications for in vivo labeling and imaging. We used antibody-assisted crystallography to determine the structures of Spinach both with and without bound fluorophore at 2.2-Å and 2.4-Å resolution, respectively. Spinach RNA has an elongated structure containing two helical domains separated by an internal bulge that folds into a G-quadruplex motif of unusual topology. The G-quadruplex motif and adjacent nucleotides comprise a partially preformed binding site for the fluorophore. The fluorophore binds in a planar conformation and makes extensive aromatic stacking and hydrogen bond interactions with the RNA. Our findings provide a foundation for structure-based engineering of new fluorophore-binding RNA aptamers.
G = MAT: linking transcription factor expression and DNA binding data.

PubMed

Tretyakov, Konstantin; Laur, Sven; Vilo, Jaak

2011-01-31

Transcription factors are proteins that bind to motifs on the DNA and thus affect gene expression regulation. The qualitative description of the corresponding processes is therefore important for a better understanding of essential biological mechanisms. However, wet lab experiments targeted at the discovery of the regulatory interplay between transcription factors and binding sites are expensive. We propose a new, purely computational method for finding putative associations between transcription factors and motifs. This method is based on a linear model that combines sequence information with expression data. We present various methods for model parameter estimation and show, via experiments on simulated data, that these methods are reliable. Finally, we examine the performance of this model on biological data and conclude that it can indeed be used to discover meaningful associations. The developed software is available as a web tool and Scilab source code at http://biit.cs.ut.ee/gmat/.
G = MAT: Linking Transcription Factor Expression and DNA Binding Data

PubMed Central

Tretyakov, Konstantin; Laur, Sven; Vilo, Jaak

2011-01-01

Transcription factors are proteins that bind to motifs on the DNA and thus affect gene expression regulation. The qualitative description of the corresponding processes is therefore important for a better understanding of essential biological mechanisms. However, wet lab experiments targeted at the discovery of the regulatory interplay between transcription factors and binding sites are expensive. We propose a new, purely computational method for finding putative associations between transcription factors and motifs. This method is based on a linear model that combines sequence information with expression data. We present various methods for model parameter estimation and show, via experiments on simulated data, that these methods are reliable. Finally, we examine the performance of this model on biological data and conclude that it can indeed be used to discover meaningful associations. The developed software is available as a web tool and Scilab source code at http://biit.cs.ut.ee/gmat/. PMID:21297945
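The linear model named in the title can be written out as follows. This is a sketch under assumptions: the abstract does not state the matrix dimensions or the estimator, so the shapes and the plain least-squares solution below are illustrative, and the symbols n_g, n_a, n_m and n_t are introduced here only for the example.

```latex
% Assumed shapes: G is genes x arrays (expression), M is genes x motifs,
% A is motifs x TFs (the unknown associations), T is TFs x arrays.
\[
  G \approx M A T, \qquad
  G \in \mathbb{R}^{n_g \times n_a},\;
  M \in \mathbb{R}^{n_g \times n_m},\;
  A \in \mathbb{R}^{n_m \times n_t},\;
  T \in \mathbb{R}^{n_t \times n_a}
\]
\[
  \hat{A} = \arg\min_{A} \lVert G - M A T \rVert_F^2
\]
% If M has full column rank and T has full row rank, the normal equations
% M^T M A T T^T = M^T G T^T give the closed form
\[
  \hat{A} = (M^{\top} M)^{-1} M^{\top} \, G \, T^{\top} (T T^{\top})^{-1}
\]
```

The abstract mentions several parameter estimation methods; unregularized least squares is only the simplest candidate and may not be among those the authors evaluated.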
Selection of the simplest RNA that binds isoleucine

PubMed Central

LOZUPONE, CATHERINE; CHANGAYIL, SHANKAR; MAJERFELD, IRENE; YARUS, MICHAEL

2003-01-01

We have identified the simplest RNA binding site for isoleucine using selection-amplification (SELEX), by shrinking the size of the randomized region until affinity selection is extinguished. Such a protocol can be useful because selection does not necessarily make the simplest active motif most prominent, as is often assumed. We find an isoleucine binding site that behaves exactly as predicted for the site that requires fewest nucleotides. This UAUU motif (16 highly conserved positions; 27 total), is also the most abundant site in successful selections on short random tracts. The UAUU site, now isolated independently at least 63 times, is a small asymmetric internal loop. Conserved loop sequences include isoleucine codon and anticodon triplets, whose nucleotides are required for amino acid binding. This reproducible association between isoleucine and its coding sequences supports the idea that the genetic code is, at least in part, a stereochemical residue of the most easily isolated RNA–amino acid binding structures. PMID:14561881
Game story space of professional sports: Australian rules football

NASA Astrophysics Data System (ADS)

Kiley, Dilan Patrick; Reagan, Andrew J.; Mitchell, Lewis; Danforth, Christopher M.; Dodds, Peter Sheridan

2016-05-01

Sports are spontaneous generators of stories. Through skill and chance, the script of each game is dynamically written in real time by players acting out possible trajectories allowed by a sport's rules. By properly characterizing a given sport's ecology of "game stories," we are able to capture the sport's capacity for unfolding interesting narratives, in part by contrasting them with random walks. Here we explore the game story space afforded by a data set of 1310 Australian Football League (AFL) score lines. We find that AFL games exhibit a continuous spectrum of stories rather than distinct clusters. We show how coarse graining reveals identifiable motifs ranging from last-minute comeback wins to one-sided blowouts. Through an extensive comparison with biased random walks, we show that real AFL games deliver a broader array of motifs than null models, and we provide consequent insights into the narrative appeal of real games.
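To make the null-model comparison concrete, the sketch below generates a biased random walk as a stand-in for a score differential and counts its lead changes. It is a simplification under stated assumptions: steps are ±1 rather than AFL goals and behinds, and the function names, step count and bias value are hypothetical rather than taken from the paper.

```python
import random

def biased_random_walk(n_steps, p_up, rng):
    """Running differential of a walk stepping +1 with probability p_up, else -1."""
    position = 0
    path = [position]
    for _ in range(n_steps):
        position += 1 if rng.random() < p_up else -1
        path.append(position)
    return path

def lead_changes(path):
    """Count sign flips of the differential; ties (zeros) are ignored."""
    changes = 0
    last_sign = 0
    for x in path:
        sign = (x > 0) - (x < 0)
        if sign != 0:
            if last_sign != 0 and sign != last_sign:
                changes += 1
            last_sign = sign
    return changes

rng = random.Random(0)                      # fixed seed for reproducibility
walk = biased_random_walk(200, 0.55, rng)   # hypothetical length and bias
n_changes = lead_changes(walk)
```

Summary statistics such as `lead_changes` computed over many simulated walks could then be compared with the same statistics from real score lines.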
Male social workers working with men who batter: dilemmas in gender identity.

PubMed

Bailey, Benjamin; Buchbinder, Eli; Eisikovits, Zvi

2011-06-01

Research into the impact of dealing with intimate partner violence has focused mainly on women who treated victims. The present article explores the interaction between male social workers and battering men. The sample included 15 male social workers who worked with battering men in social services. Data collection was performed through semistructured interviews. The main theme emerging from the interviews describes the reconstruction and renegotiation of the worker's professional and personal self in light of his experiences with violent clients. Two major motifs describing their experience emerged: The first is self-doubt arising from adopting a broad definition of violence, thus creating increased sensitization to and inclusion of a wide range of behaviors under the term violence. The second motif is related to compromising with reality by renegotiating their identity as aggressive, at times, but not violent. Findings were discussed in the light of the constructionist perspective.
A G-Quadruplex-Containing RNA Activates Fluorescence in a GFP-Like Fluorophore

PubMed Central

Huang, Hao; Suslov, Nikolai B.; Li, Nan-Sheng; Shelke, Sandip A.; Evans, Molly E.; Koldobskaya, Yelena; Rice, Phoebe A.; Piccirilli, Joseph A.

2014-01-01

Spinach is an in vitro selected RNA aptamer that binds a GFP-like ligand and activates its green fluorescence. Spinach is thus an RNA analog of GFP, and has potentially widespread applications for in vivo labeling and imaging. We used antibody-assisted crystallography to determine the structures of Spinach both with and without bound fluorophore at 2.2 and 2.4 Å resolution, respectively. Spinach RNA has an elongated structure containing two helical domains separated by an internal bulge that folds into a G-quadruplex motif of unusual topology. The G-quadruplex motif and adjacent nucleotides comprise a partially pre-formed binding site for the fluorophore. The fluorophore binds in a planar conformation and makes extensive aromatic stacking and hydrogen bond interactions with the RNA. Our findings provide a foundation for structure-based engineering of new fluorophore-binding RNA aptamers. PMID:24952597
Gene regulatory and signaling networks exhibit distinct topological distributions of motifs

NASA Astrophysics Data System (ADS)

Ferreira, Gustavo Rodrigues; Nakaya, Helder Imoto; Costa, Luciano da Fontoura

2018-04-01

The biological processes of cellular decision making and differentiation involve a plethora of signaling pathways and gene regulatory circuits. These networks in turn exhibit a multitude of motifs playing crucial parts in regulating network activity. Here we compare the topological placement of motifs in gene regulatory and signaling networks and observe that it suggests different evolutionary strategies in motif distribution for distinct cellular subnetworks.
Discovery of T Cell Receptor β Motifs Specific to HLA-B27-Positive Ankylosing Spondylitis by Deep Repertoire Sequence Analysis.

PubMed

Faham, Malek; Carlton, Victoria; Moorhead, Martin; Zheng, Jianbiao; Klinger, Mark; Pepin, Francois; Asbury, Thomas; Vignali, Marissa; Emerson, Ryan O; Robins, Harlan S; Ireland, James; Baechler-Gillespie, Emily; Inman, Robert D

2017-04-01

Ankylosing spondylitis (AS), a chronic inflammatory disorder, has a notable association with HLA-B27. One hypothesis suggests that a common antigen that binds to HLA-B27 is important for AS disease pathogenesis. This study was undertaken to determine sequences and motifs that are shared among HLA-B27-positive AS patients, using T cell repertoire next-generation sequencing. To identify motifs enriched among B27-positive AS patients, we performed T cell receptor β (TCRβ) repertoire sequencing on samples from 191 B27-positive AS patients, 43 B27-negative AS patients, and 227 controls, and we obtained >77 million TCRβ clonotype sequences. First, we assessed whether any of 50 previously published sequences were enriched in B27-positive AS patients. We then used training and test cohorts to identify discovered motifs that were enriched in B27-positive AS patients versus controls. Six previously published and 11 discovered motifs were enriched in the B27-positive AS samples as compared to controls. After combining motifs related by sequence, we identified a total of 15 independent motifs. Both the full set of 15 motifs and a set of 6 published motifs were enriched in the B27-positive AS patients as compared to B27-positive healthy individuals (P = 0.049 and P = 0.001, respectively). Using an independent cohort, we validated that at least some of these motifs were associated with AS, and not simply with B27-positive status. We identified TCRβ motifs that are enriched in B27-positive AS patients as compared to B27-positive healthy controls. This suggests that a common antigen, presented by HLA-B27 and detected by CD8+ T cells, may be associated with AS disease pathogenesis. © 2016, American College of Rheumatology.
The PDZ-binding motif of Yes-associated protein is required for its co-activation of TEAD-mediated CTGF transcription and oncogenic cell transforming activity

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shimomura, Tadanori; Miyamura, Norio; Hata, Shoji

2014-01-17

Highlights: • Loss of the PDZ-binding motif inhibits constitutively active YAP (5SA)-induced oncogenic cell transformation. • The PDZ-binding motif of YAP promotes its nuclear localization in cultured cells and mouse liver. • Loss of the PDZ-binding motif inhibits YAP (5SA)-induced CTGF transcription in cultured cells and mouse liver. -- Abstract: YAP is a transcriptional co-activator that acts downstream of the Hippo signaling pathway and regulates multiple cellular processes, including proliferation. Hippo pathway-dependent phosphorylation of YAP negatively regulates its function. Conversely, attenuation of Hippo-mediated phosphorylation of YAP increases its ability to stimulate proliferation and eventually induces oncogenic transformation. The C-terminus of YAP contains a highly conserved PDZ-binding motif that regulates YAP’s functions in multiple ways. However, to date, the importance of the PDZ-binding motif to the oncogenic cell transforming activity of YAP has not been determined. In this study, we disrupted the PDZ-binding motif in the YAP (5SA) protein, in which the sites normally targeted by Hippo pathway-dependent phosphorylation are mutated. We found that loss of the PDZ-binding motif significantly inhibited the oncogenic transformation of cultured cells induced by YAP (5SA). In addition, the increased nuclear localization of YAP (5SA) and its enhanced activation of TEAD-dependent transcription of the cell proliferation gene CTGF were strongly reduced when the PDZ-binding motif was deleted. Similarly, in mouse liver, deletion of the PDZ-binding motif suppressed nuclear localization of YAP (5SA) and YAP (5SA)-induced CTGF expression. Taken together, our results indicate that the PDZ-binding motif of YAP is critical for YAP-mediated oncogenesis, and that this effect is mediated by YAP’s co-activation of TEAD-mediated CTGF transcription.
Multiple Copies of a Simple MYB-Binding Site Confers Trans-regulation by Specific Flavonoid-Related R2R3 MYBs in Diverse Species

PubMed Central

Brendolise, Cyril; Espley, Richard V.; Lin-Wang, Kui; Laing, William; Peng, Yongyan; McGhie, Tony; Dejnoprat, Supinya; Tomes, Sumathi; Hellens, Roger P.; Allan, Andrew C.

2017-01-01

In apple, the MYB transcription factor MYB10 controls the accumulation of anthocyanins. MYB10 is able to auto-activate its expression by binding its own promoter at a specific motif, the R1 motif. In some apple accessions a natural mutation, termed R6, has more copies of this motif within the MYB10 promoter, resulting in stronger auto-activation and elevated anthocyanins. Here we show that other anthocyanin-related MYBs selected from apple, pear, strawberry, petunia, kiwifruit and Arabidopsis are able to activate promoters containing the R6 motif. To examine the specificity of this motif, members of the R2R3 MYB family were screened against a promoter harboring the R6 mutation. Only MYBs from subgroups 5 and 6 activate expression by binding the R6 motif, with these MYBs sharing conserved residues in their R2R3 DNA binding domains. Insertion of the apple R6 motif into orthologous promoters of MYB10 in pear (PcMYB10) and Arabidopsis (AtMYB75) elevated anthocyanin levels. Introduction of the R6 motif into the promoter region of an anthocyanin biosynthetic enzyme F3′5′H of kiwifruit imparts regulation by MYB10. This results in elevated levels of delphinidin in both tobacco and kiwifruit. Finally, an R6 motif inserted into the promoter of the vitamin C biosynthesis gene GDP-L-Gal phosphorylase increases vitamin C content in a MYB10-dependent manner. This motif therefore provides a tool to re-engineer novel MYB-regulated responses in plants. PMID:29163590

Transcriptome Analysis of an Insecticide Resistant Housefly Strain: Insights about SNPs and Regulatory Elements in Cytochrome P450 Genes

PubMed Central

Asp, Torben; Kristensen, Michael

2016-01-01

Background: Insecticide resistance in the housefly, Musca domestica, has been investigated for more than 60 years. It will enter a new era after the recent publication of the housefly genome and the development of multiple next generation sequencing technologies. The genetic background of the xenobiotic response can now be investigated in greater detail. Here, we investigate the 454-pyrosequencing transcriptome of the spinosad-resistant 791spin strain in relation to the housefly genome with focus on P450 genes. Results: The de novo assembly of clean reads gave 35,834 contigs consisting of 21,780 sequences of the spinosad resistant strain. The 3,648 sequences were annotated with an enzyme code EC number and were mapped to 124 KEGG pathways with metabolic processes as most highly represented pathway. One hundred and twenty contigs were annotated as P450s covering 44 different P450 genes of housefly. Eight differentially expressed P450s genes were identified and investigated for SNPs, CpG islands and common regulatory motifs in promoter and coding regions. Functional annotation clustering of metabolic related genes and motif analysis of P450s revealed their association with epigenetic, transcription and gene expression related functions. The sequence variation analysis resulted in 12 SNPs and eight of them found in cyp6d1. There is variation in location, size and frequency of CpG islands and specific motifs were also identified in these P450s. Moreover, identified motifs were associated to GO terms and transcription factors using bioinformatic tools. Conclusion: Transcriptome data of a spinosad resistant strain provide together with genome data fundamental support for future research to understand evolution of resistance in houseflies. Here, we report for the first time the SNPs, CpG islands and common regulatory motifs in differentially expressed P450s. Taken together our findings will serve as a stepping stone to advance understanding of the mechanism and role of P450s in xenobiotic detoxification. PMID:27019205
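The abstract does not say how CpG islands were called. As one common way such a scan is set up, the sketch below computes the two statistics used by the classic Gardiner-Garden and Frommer definition: GC content and the observed/expected CpG ratio. Using that definition here is an assumption, and `cpg_stats` is a hypothetical helper name.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")            # "CG" cannot overlap itself, so count() is exact
    gc_content = (c + g) / n
    # Expected CpG count under independence is (c * g) / n.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp

gc, ratio = cpg_stats("CGCGCGCG")    # toy sequence, not housefly data
```

Under the assumed definition, a window of at least 200 bp is typically flagged when GC content exceeds 0.5 and the ratio exceeds 0.6; those thresholds would be applied by sliding `cpg_stats` along the promoter and coding regions.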
Evaluation and integration of existing methods for computational prediction of allergens

PubMed Central

2013-01-01

Background: Allergy involves a series of complex reactions and factors that contribute to the development of the disease and triggering of the symptoms, including rhinitis, asthma, atopic eczema, skin sensitivity, even acute and fatal anaphylactic shock. Prediction and evaluation of the potential allergenicity is of importance for safety evaluation of foods and other environment factors. Although several computational approaches for assessing the potential allergenicity of proteins have been developed, their performance and relative merits and shortcomings have not been compared systematically. Results: To evaluate and improve the existing methods for allergen prediction, we collected an up-to-date definitive dataset consisting of 989 known allergens and massive putative non-allergens. The three most widely used allergen computational prediction approaches including sequence-, motif- and SVM-based (Support Vector Machine) methods were systematically compared using the defined parameters and we found that SVM-based method outperformed the other two methods with higher accuracy and specificity. The sequence-based method with the criteria defined by FAO/WHO (FAO: Food and Agriculture Organization of the United Nations; WHO: World Health Organization) has higher sensitivity of over 98%, but low specificity. The advantage of motif-based method is the ability to visualize the key motif within the allergen. Notably, the performances of the sequence-based method defined by FAO/WHO and motif eliciting strategy could be improved by the optimization of parameters. To facilitate the allergen prediction, we integrated these three methods in a web-based application proAP, which provides the global search of the known allergens and a powerful tool for allergen prediction. Flexible parameter setting and batch prediction were also implemented. The proAP can be accessed at http://gmobl.sjtu.edu.cn/proAP/main.html. Conclusions: This study comprehensively evaluated sequence-, motif- and SVM-based computational prediction approaches for allergens and optimized their parameters to obtain better performance. These findings may provide helpful guidance for the researchers in allergen-prediction. Furthermore, we integrated these methods into a web application proAP, greatly facilitating users to do customizable allergen search and prediction. PMID:23514097
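The comparison above rests on sensitivity and specificity. The sketch below shows how both are computed from confusion-matrix counts; the counts in the example are hypothetical and are not results from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    sensitivity = tp / (tp + fn)   # fraction of true allergens that are flagged
    specificity = tn / (tn + fp)   # fraction of non-allergens that are cleared
    return sensitivity, specificity

# Hypothetical counts illustrating a high-sensitivity, low-specificity predictor.
sens, spec = sensitivity_specificity(tp=98, fn=2, tn=60, fp=40)
```

A method tuned to miss almost no allergens, as the sequence-based criteria are reported to be, pays for it in false positives, which is what a low specificity reflects.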
Evaluation and integration of existing methods for computational prediction of allergens.

PubMed

Wang, Jing; Yu, Yabin; Zhao, Yunan; Zhang, Dabing; Li, Jing

2013-01-01

Allergy involves a series of complex reactions and factors that contribute to the development of the disease and triggering of the symptoms, including rhinitis, asthma, atopic eczema, skin sensitivity, even acute and fatal anaphylactic shock. Prediction and evaluation of the potential allergenicity is of importance for safety evaluation of foods and other environment factors. Although several computational approaches for assessing the potential allergenicity of proteins have been developed, their performance and relative merits and shortcomings have not been compared systematically. To evaluate and improve the existing methods for allergen prediction, we collected an up-to-date definitive dataset consisting of 989 known allergens and massive putative non-allergens. The three most widely used allergen computational prediction approaches including sequence-, motif- and SVM-based (Support Vector Machine) methods were systematically compared using the defined parameters and we found that SVM-based method outperformed the other two methods with higher accuracy and specificity. The sequence-based method with the criteria defined by FAO/WHO (FAO: Food and Agriculture Organization of the United Nations; WHO: World Health Organization) has higher sensitivity of over 98%, but low specificity. The advantage of motif-based method is the ability to visualize the key motif within the allergen. Notably, the performances of the sequence-based method defined by FAO/WHO and motif eliciting strategy could be improved by the optimization of parameters. To facilitate the allergen prediction, we integrated these three methods in a web-based application proAP, which provides the global search of the known allergens and a powerful tool for allergen prediction. Flexible parameter setting and batch prediction were also implemented. The proAP can be accessed at http://gmobl.sjtu.edu.cn/proAP/main.html. This study comprehensively evaluated sequence-, motif- and SVM-based computational prediction approaches for allergens and optimized their parameters to obtain better performance. These findings may provide helpful guidance for the researchers in allergen-prediction. Furthermore, we integrated these methods into a web application proAP, greatly facilitating users to do customizable allergen search and prediction.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Sekiyama, Naotaka; Arthanari, Haribabu; Papadopoulos, Evangelos

The eIF4E-binding protein (4E-BP) is a phosphorylation-dependent regulator of protein synthesis. The nonphosphorylated or minimally phosphorylated form binds translation initiation factor 4E (eIF4E), preventing binding of eIF4G and the recruitment of the small ribosomal subunit. Signaling events stimulate serial phosphorylation of 4E-BP, primarily by mammalian target of rapamycin complex 1 (mTORC1) at residues T37/T46, followed by T70 and S65. Hyperphosphorylated 4E-BP dissociates from eIF4E, allowing eIF4E to interact with eIF4G and translation initiation to resume. Because overexpression of eIF4E is linked to cellular transformation, 4E-BP is a tumor suppressor, and up-regulation of its activity is a goal of interest for cancer therapy. A recently discovered small molecule, eIF4E/eIF4G interaction inhibitor 1 (4EGI-1), disrupts the eIF4E/eIF4G interaction and promotes binding of 4E-BP1 to eIF4E. Structures of 14- to 16-residue 4E-BP fragments bound to eIF4E contain the eIF4E consensus binding motif, 54YXXXXLΦ60 (motif 1) but lack known phosphorylation sites. We report in this paper a 2.1-Å crystal structure of mouse eIF4E in complex with m7GTP and with a fragment of human 4E-BP1, extended C-terminally from the consensus-binding motif (4E-BP1 50–84). The extension, which includes a proline-turn-helix segment (motif 2) followed by a loop of irregular structure, reveals the location of two phosphorylation sites (S65 and T70). Our major finding is that the C-terminal extension (motif 3) is critical to 4E-BP1–mediated cell cycle arrest and that it partially overlaps with the binding site of 4EGI-1. Finally, the binding of 4E-BP1 and 4EGI-1 to eIF4E is therefore not mutually exclusive, and both ligands contribute to shift the equilibrium toward the inhibition of translation initiation.
ALIX Rescues Budding of a Double PTAP/PPEY L-Domain Deletion Mutant of Ebola VP40: A Role for ALIX in Ebola Virus Egress.

PubMed

Han, Ziying; Madara, Jonathan J; Liu, Yuliang; Liu, Wenbo; Ruthel, Gordon; Freedman, Bruce D; Harty, Ronald N

2015-10-01

Ebola (EBOV) is an enveloped, negative-sense RNA virus belonging to the family Filoviridae that causes hemorrhagic fever syndromes with high-mortality rates. To date, there are no licensed vaccines or therapeutics to control EBOV infection and prevent transmission. Consequently, the need to better understand the mechanisms that regulate virus transmission is critical to developing countermeasures. The EBOV VP40 matrix protein plays a central role in late stages of virion assembly and egress, and independent expression of VP40 leads to the production of virus-like particles (VLPs) by a mechanism that accurately mimics budding of live virus. VP40 late (L) budding domains mediate efficient virus-cell separation by recruiting host ESCRT and ESCRT-associated proteins to complete the membrane fission process. L-domains consist of core consensus amino acid motifs including PPxY, P(T/S)AP, and YPx(n)L/I, and EBOV VP40 contains overlapping PPxY and PTAP motifs whose interactions with Nedd4 and Tsg101, respectively, have been characterized extensively. Here, we present data demonstrating for the first time that EBOV VP40 possesses a third L-domain YPx(n)L/I consensus motif that interacts with the ESCRT-III protein Alix. We show that the YPx(n)L/I motif mapping to amino acids 18-26 of EBOV VP40 interacts with the Alix Bro1-V fragment, and that siRNA knockdown of endogenous Alix expression inhibits EBOV VP40 VLP egress. Furthermore, overexpression of Alix Bro1-V rescues VLP production of the budding deficient EBOV VP40 double PTAP/PPEY L-domain deletion mutant to wild-type levels. Together, these findings demonstrate that EBOV VP40 recruits host Alix via a YPx(n)L/I motif that can function as an alternative L-domain to promote virus egress. © The Author 2015. Published by Oxford University Press on behalf of the Infectious Diseases Society of America. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
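The three L-domain consensus motifs named above translate directly into regular expressions. The sketch below scans a protein string for them, under two assumptions: the spacer length n in YPx(n)L/I is taken to be 1 to 3 residues, which the abstract does not specify, and the toy sequence is invented for illustration and is not the VP40 sequence. `find_l_domains` is a hypothetical helper name.

```python
import re

L_DOMAIN_PATTERNS = {
    "PPxY": r"PP.Y",
    "P(T/S)AP": r"P[TS]AP",
    # Assumption for this sketch: spacer x(n) spans 1 to 3 residues.
    "YPx(n)L/I": r"YP.{1,3}[LI]",
}

def find_l_domains(protein):
    """Return (motif name, 1-based start, matched text) for every occurrence."""
    hits = []
    for name, pattern in L_DOMAIN_PATTERNS.items():
        # A lookahead wrapper lets overlapping occurrences all be reported.
        for m in re.finditer(f"(?=({pattern}))", protein):
            hits.append((name, m.start() + 1, m.group(1)))
    return sorted(hits, key=lambda h: h[1])

toy = "MAPTAPPEYGGSYPAALQ"   # invented sequence with overlapping PTAP/PPEY
hits = find_l_domains(toy)
```

The lookahead matters because PTAP and PPEY share a proline in the toy sequence, mirroring the overlapping arrangement the abstract describes.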
The RXL motif of the African cassava mosaic virus Rep protein is necessary for rereplication of yeast DNA and viral infection in plants

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hipp, Katharina; Rau, Peter; Schäfer, Benjamin

Geminiviruses, single-stranded DNA plant viruses, encode a replication-initiator protein (Rep) that is indispensable for virus replication. A potential cyclin interaction motif (RXL) in the sequence of African cassava mosaic virus Rep may be an alternative link to cell cycle controls to the known interaction with plant homologs of retinoblastoma protein (pRBR). Mutation of this motif abrogated rereplication in fission yeast induced by expression of wildtype Rep suggesting that Rep interacts via its RXL motif with one or several yeast proteins. The RXL motif is essential for viral infection of Nicotiana benthamiana plants, since mutation of this motif in infectious clones prevented any symptomatic infection. The cell-cycle link (Clink) protein of a nanovirus (faba bean necrotic yellows virus) was investigated that activates the cell cycle by binding via its LXCXE motif to pRBR. Expression of wildtype Clink and a Clink mutant deficient in pRBR-binding did not trigger rereplication in fission yeast. - Highlights: • A potential cyclin interaction motif is conserved in geminivirus Rep proteins. • In ACMV Rep, this motif (RXL) is essential for rereplication of fission yeast DNA. • Mutating RXL abrogated viral infection completely in Nicotiana benthamiana. • Expression of a nanovirus Clink protein in yeast did not induce rereplication. • Plant viruses may have evolved multiple routes to exploit host DNA synthesis.
Distance-dependent duplex DNA destabilization proximal to G-quadruplex/i-motif sequences

PubMed Central

König, Sebastian L. B.; Huppert, Julian L.; Sigel, Roland K. O.; Evans, Amanda C.

2013-01-01

G-quadruplexes and i-motifs are complementary examples of non-canonical nucleic acid substructure conformations. G-quadruplex thermodynamic stability has been extensively studied for a variety of base sequences, but the degree of duplex destabilization that adjacent quadruplex structure formation can cause has yet to be fully addressed. Stable in vivo formation of these alternative nucleic acid structures is likely to be highly dependent on whether sufficient spacing exists between neighbouring duplex- and quadruplex-/i-motif-forming regions to accommodate quadruplexes or i-motifs without disrupting duplex stability. Prediction of putative G-quadruplex-forming regions is likely to be assisted by further understanding of what distance (number of base pairs) is required for duplexes to remain stable as quadruplexes or i-motifs form. Using oligonucleotide constructs derived from precedented G-quadruplexes and i-motif-forming bcl-2 P1 promoter region, initial biophysical stability studies indicate that the formation of G-quadruplex and i-motif conformations do destabilize proximal duplex regions. The undermining effect that quadruplex formation can have on duplex stability is mitigated with increased distance from the duplex region: a spacing of five base pairs or more is sufficient to maintain duplex stability proximal to predicted quadruplex/i-motif-forming regions. PMID:23771141
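The spacing result in this abstract lends itself to a small check: locate putative G-quadruplex-forming stretches and test whether a neighbouring duplex region lies at least five base pairs away. The sketch below is illustrative only. The regular expression (four runs of three or more guanines separated by loops of 1 to 7 nucleotides) is a widely used heuristic and an assumption here, since the abstract does not name a predictor; the function names and the demonstration sequence are also hypothetical.

```python
import re

# Heuristic for putative G-quadruplex-forming sequences: four runs of three
# or more guanines separated by loops of 1 to 7 nucleotides.
# Assumption: this pattern is not taken from the abstract.
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)

MIN_SPACING_BP = 5  # spacing the abstract reports as sufficient

def putative_g4_spans(seq):
    """Return 0-based, half-open (start, end) spans of non-overlapping matches."""
    return [m.span() for m in G4_PATTERN.finditer(seq.upper())]

def gap_between(span_a, span_b):
    """Number of bases separating two spans (0 if they touch or overlap)."""
    (_, first_end), (second_start, _) = sorted([span_a, span_b])
    return max(0, second_start - first_end)

def duplex_is_spaced(duplex_span, g4_span, min_gap=MIN_SPACING_BP):
    """True if the duplex region is at least min_gap bases from the G4 span."""
    return gap_between(duplex_span, g4_span) >= min_gap

# Hypothetical sequence: a G-rich tract at positions 6-20 and flanking DNA.
demo_seq = "ATATATGGGTGGGTGGGTGGGACGTACCCATGGCATG"
g4_spans = putative_g4_spans(demo_seq)
print(g4_spans)
print(duplex_is_spaced((27, 37), g4_spans[0]))  # duplex starting 6 bases away
print(duplex_is_spaced((23, 33), g4_spans[0]))  # duplex starting 2 bases away
```

The same scan applied to the complementary strand (or with C in place of G) would flag candidate i-motif-forming tracts.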
Evolution subverting essentiality: Dispensability of the cell attachment Arg-Gly-Asp motif in multiply passaged foot-and-mouth disease virus

PubMed Central

Martínez, Miguel A.; Verdaguer, Nuria; Mateu, Mauricio G.; Domingo, Esteban

1997-01-01

Aphthoviruses use a conserved Arg-Gly-Asp triplet for attachment to host cells and this motif is believed to be essential for virus viability. Here we report that this triplet—which is also a widespread motif involved in cell-to-cell adhesion—can become dispensable upon short-term evolution of the virus harboring it. Foot-and-mouth disease virus (FMDV), which was multiply passaged in cell culture, showed an altered repertoire of antigenic variants resistant to a neutralizing monoclonal antibody. The altered repertoire includes variants with substitutions at the Arg-Gly-Asp motif. Mutants lacking this sequence replicated normally in cell culture and were indistinguishable from the parental virus. Studies with individual FMDV clones indicate that amino acid replacements on the capsid surface located around the loop harboring the Arg-Gly-Asp triplet may mediate in the dispensability of this motif. The results show that FMDV quasispecies evolving in a constant biological environment have the capability of rendering totally dispensable a receptor recognition motif previously invariant, and to ensure an alternative pathway for normal viral replication. Thus, variability of highly conserved motifs, even those that viruses have adapted from functional cellular motifs, can contribute to phenotypic flexibility of RNA viruses in nature. PMID:9192645
A Novel Protein Interaction between Nucleotide Binding Domain of Hsp70 and p53 Motif

PubMed Central

Elengoe, Asita; Naser, Mohammed Abu; Hamdan, Salehhuddin

2015-01-01

Currently, protein interaction of Homo sapiens nucleotide binding domain (NBD) of heat shock 70 kDa protein (PDB: 1HJO) with p53 motif remains to be elucidated. The NBD-p53 motif complex enhances the p53 stabilization, thereby increasing the tumor suppression activity in cancer treatment. Therefore, we identified the interaction between NBD and p53 using STRING version 9.1 program. Then, we modeled the three-dimensional structure of p53 motif through homology modeling and determined the binding affinity and stability of NBD-p53 motif complex structure via molecular docking and dynamics (MD) simulation. Human DNA binding domain of p53 motif (SCMGGMNR) retrieved from UniProt (UniProtKB: P04637) was docked with the NBD protein, using the Autodock version 4.2 program. The binding energy and intermolecular energy for the NBD-p53 motif complex were −0.44 Kcal/mol and −9.90 Kcal/mol, respectively. Moreover, RMSD, RMSF, hydrogen bonds, salt bridge, and secondary structure analyses revealed that the NBD protein had a strong bond with p53 motif and the protein-ligand complex was stable. Thus, the current data would be highly encouraging for designing Hsp70 structure based drug in cancer therapy. PMID:26098630
Motif mismatches in microsatellites: insights from genome-wide investigation among 20 insect species.

PubMed

Behura, Susanta K; Severson, David W

2015-02-01

We present a detailed genome-wide comparative study of motif mismatches of microsatellites among 20 insect species representing five taxonomic orders. The results show that varying proportions (∼15-46%) of microsatellites identified in these species are imperfect in motif structure, and that they also vary in chromosomal distribution within genomes. It was observed that the genomic abundance of imperfect repeats is significantly associated with the length and number of motif mismatches of microsatellites. Furthermore, microsatellites with a higher number of mismatches tend to have lower abundance in the genome, suggesting that sequence heterogeneity of repeat motifs is a key determinant of genomic abundance of microsatellites. This relationship seems to be a general feature of microsatellites even in unrelated species such as yeast, roundworm, mouse and human. We provide a mechanistic explanation of the evolutionary link between motif heterogeneity and genomic abundance of microsatellites by examining the patterns of motif mismatches and allele sequences of single-nucleotide polymorphisms identified within microsatellite loci. Using Drosophila Reference Genetic Panel data, we further show that pattern of allelic variation modulates motif heterogeneity of microsatellites, and provide estimates of allele age of specific imperfect microsatellites found within protein-coding genes. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
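To make the notion of a motif mismatch concrete, a repeat tract can be compared position by position against a perfect repetition of its motif. This is a minimal sketch under stated assumptions, not the procedure used in the study: it handles substitutions only (no insertions or deletions), assumes the tract starts in phase with the motif, and uses a hypothetical function name and example tract.

```python
def motif_mismatches(tract, motif):
    """Return 0-based positions where the tract differs from a perfect repeat.

    Assumptions: the tract begins in phase with the motif, and only
    substitutions are considered (indels would shift the phase).
    """
    repeats_needed = len(tract) // len(motif) + 1
    perfect = (motif * repeats_needed)[:len(tract)]
    return [i for i, (a, b) in enumerate(zip(tract, perfect)) if a != b]

def is_imperfect(tract, motif):
    """A microsatellite is called imperfect here if it has any mismatch."""
    return bool(motif_mismatches(tract, motif))

# Hypothetical (CA)n tract with a single A-to-G substitution.
demo_tract = "CACACGCACA"
print(motif_mismatches(demo_tract, "CA"))
print(is_imperfect(demo_tract, "CA"))
```

Counting the returned positions per locus gives the "number of motif mismatches" that the abstract relates to genomic abundance.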
Conserved binding of GCAC motifs by MEC-8, couch potato, and the RBPMS protein family

PubMed Central

Soufari, Heddy

2017-01-01

Precise regulation of mRNA processing, translation, localization, and stability relies on specific interactions with RNA-binding proteins whose biological function and target preference are dictated by their preferred RNA motifs. The RBPMS family of RNA-binding proteins is defined by a conserved RNA recognition motif (RRM) domain found in metazoan RBPMS/Hermes and RBPMS2, Drosophila couch potato, and MEC-8 from Caenorhabditis elegans. In order to determine the parameters of RNA sequence recognition by the RBPMS family, we have first used the N-terminal domain from MEC-8 in binding assays and have demonstrated a preference for two GCAC motifs optimally separated by >6 nucleotides (nt). We have also determined the crystal structure of the dimeric N-terminal RRM domain from MEC-8 in the unbound form, and in complex with an oligonucleotide harboring two copies of the optimal GCAC motif. The atomic details reveal the molecular network that provides specificity to all four bases in the motif, including multiple hydrogen bonds to the initial guanine. Further studies with human RBPMS, as well as Drosophila couch potato, confirm a general preference for this double GCAC motif by other members of the protein family and the presence of this motif in known targets. PMID:28003515
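The binding preference reported here, two GCAC motifs separated by more than 6 nucleotides, can be expressed as a simple pairwise scan over an RNA sequence. The sketch below is illustrative: the function name, the upper spacer limit, and the demonstration sequence are assumptions, since the abstract gives only the lower bound on the separation.

```python
def gcac_pairs(rna, min_spacer=7, max_spacer=30):
    """Return (first_start, second_start, spacer) for pairs of GCAC sites.

    The spacer is the number of nucleotides between the end of the first
    GCAC and the start of the second. min_spacer=7 encodes ">6 nt" from the
    abstract; max_spacer is a hypothetical cutoff added for illustration.
    """
    site_len = len("GCAC")
    sites = [i for i in range(len(rna) - site_len + 1)
             if rna[i:i + site_len] == "GCAC"]
    pairs = []
    for first in sites:
        for second in sites:
            spacer = second - (first + site_len)
            if min_spacer <= spacer <= max_spacer:
                pairs.append((first, second, spacer))
    return pairs

# Hypothetical RNA with GCAC sites at 0-based positions 2, 14 and 20.
demo_rna = "UUGCACUUUUUUUUGCACAAGCACUU"
print(gcac_pairs(demo_rna))
```

The sites at positions 14 and 20 are only 2 nucleotides apart, so that pair is not reported.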
BlockLogo: visualization of peptide and sequence motif conservation

PubMed Central

Olsen, Lars Rønn; Kudahl, Ulrich Johan; Simon, Christian; Sun, Jing; Schönbach, Christian; Reinherz, Ellis L.; Zhang, Guang Lan; Brusic, Vladimir

2013-01-01

BlockLogo is a web-server application for visualization of protein and nucleotide fragments, continuous protein sequence motifs, and discontinuous sequence motifs using calculation of block entropy from multiple sequence alignments. The user input consists of a multiple sequence alignment, selection of motif positions, type of sequence, and output format definition. The output has BlockLogo along with the sequence logo, and a table of motif frequencies. We deployed BlockLogo as an online application and have demonstrated its utility through examples that show visualization of T-cell epitopes and B-cell epitopes (both continuous and discontinuous). Our additional example shows a visualization and analysis of structural motifs that determine specificity of peptide binding to HLA-DR molecules. The BlockLogo server also employs selected experimentally validated prediction algorithms to enable on-the-fly prediction of MHC binding affinity to 15 common HLA class I and class II alleles as well as visual analysis of discontinuous epitopes from multiple sequence alignments. It enables the visualization and analysis of structural and functional motifs that are usually described as regular expressions. It provides a compact view of discontinuous motifs composed of distant positions within biological sequences. BlockLogo is available at: http://research4.dfci.harvard.edu/cvc/blocklogo/ and http://methilab.bu.edu/blocklogo/ PMID:24001880
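A rough sketch of the block-entropy idea follows. It assumes that a "block" is the string of residues found at the selected alignment positions in one sequence and that block entropy is the Shannon entropy of the block frequencies; the function names and the toy alignment are hypothetical, and the server's actual calculation may differ in detail.

```python
import math
from collections import Counter

def block_frequencies(alignment, positions):
    """Frequency of each block, i.e. the residues at the selected columns.

    positions are 0-based column indices and need not be contiguous, which
    covers discontinuous motifs.
    """
    blocks = ["".join(seq[p] for p in positions) for seq in alignment]
    total = len(blocks)
    return {block: count / total for block, count in Counter(blocks).items()}

def block_entropy(alignment, positions):
    """Shannon entropy (in bits) of the block frequency distribution."""
    freqs = block_frequencies(alignment, positions)
    return -sum(p * math.log2(p) for p in freqs.values())

# Toy alignment (hypothetical) and a discontinuous motif at columns 0, 1, 3.
demo_alignment = ["ACDEF", "ACDEF", "AGDEF", "TCDKF"]
demo_positions = [0, 1, 3]
print(block_frequencies(demo_alignment, demo_positions))
print(block_entropy(demo_alignment, demo_positions))
```

The frequency dictionary corresponds to the "table of motif frequencies" mentioned in the abstract; a fully conserved block has entropy 0.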
Hybrid DNA i-motif: Aminoethylprolyl-PNA (pC5) enhance the stability of DNA (dC5) i-motif structure.

PubMed

Gade, Chandrasekhar Reddy; Sharma, Nagendra K

2017-12-15

This report describes the synthesis of a C-rich sequence, the cytosine pentamer, of aep-PNA and biophysical studies of its formation of a hybrid DNA:aep-PNA i-motif structure with the DNA cytosine pentamer (dC5) under acidic pH conditions. The CD/UV/NMR/ESI-mass studies strongly support the formation of a stable hybrid DNA i-motif structure with aep-PNA even near acidic conditions. Hence, C-rich aep-PNA cytosine sequences could be considered as potential DNA i-motif stabilizing agents under in vivo conditions. Copyright © 2017 Elsevier Ltd. All rights reserved.
PISMA: A Visual Representation of Motif Distribution in DNA Sequences

PubMed Central

Alcántara-Silva, Rogelio; Alvarado-Hermida, Moisés; Díaz-Contreras, Gibrán; Sánchez-Barrios, Martha; Carrera, Samantha; Galván, Silvia Carolina

2017-01-01

Background: Because the graphical presentation and analysis of motif distribution can provide insights for experimental hypothesis, PISMA aims at identifying motifs on DNA sequences, counting and showing them graphically. The motif length ranges from 2 to 10 bases, and the DNA sequences range up to 10 kb. The motif distribution is shown as a bar-code–like, as a gene-map–like, and as a transcript scheme. Results: We obtained graphical schemes of the CpG site distribution from 91 human papillomavirus genomes. Also, we present 2 analyses: one of DNA motifs associated with either methylation-resistant or methylation-sensitive CpG islands and another analysis of motifs associated with exosome RNA secretion. Availability and Implementation: PISMA is developed in Java; it is executable in any type of hardware and in diverse operating systems. PISMA is freely available to noncommercial users. The English version and the User Manual are provided in Supplementary Files 1 and 2, and a Spanish version is available at www.biomedicas.unam.mx/wp-content/software/pisma.zip and www.biomedicas.unam.mx/wp-content/pdf/manual/pisma.pdf. PMID:28469418
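The counting and bar-code-like display described above can be sketched in a few lines. This is not PISMA itself (which the abstract says is written in Java); it is a hypothetical Python illustration that reuses the limits stated in the abstract (motifs of 2 to 10 bases, sequences up to 10 kb). The function names and the text rendering are assumptions.

```python
def motif_positions(seq, motif):
    """Return 0-based start positions of all (overlapping) motif occurrences.

    The length limits mirror those stated in the abstract: motifs of 2 to 10
    bases and sequences of up to 10 kb.
    """
    if not 2 <= len(motif) <= 10:
        raise ValueError("motif length must be between 2 and 10 bases")
    if len(seq) > 10_000:
        raise ValueError("sequence must be at most 10 kb")
    seq, motif = seq.upper(), motif.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)  # step by 1 to keep overlaps
    return positions

def barcode(seq_len, positions, width=40):
    """Render positions as a one-line, bar-code-like text track."""
    track = ["-"] * width
    for p in positions:
        track[p * width // seq_len] = "|"
    return "".join(track)

# Hypothetical sequence; CpG sites are the example used in the abstract.
demo_seq = "ACGCGTTACGAT"
cpg_sites = motif_positions(demo_seq, "CG")
print(len(cpg_sites), cpg_sites)
print(barcode(len(demo_seq), cpg_sites, width=12))
```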
An Interactive Tool for Discrete Phase Analysis in Two-Phase Flows

NASA Technical Reports Server (NTRS)

Dejong, Frederik J.; Thoren, Stephen J.

1993-01-01

Under a NASA MSFC SBIR Phase 1 effort an interactive software package has been developed for the analysis of discrete (particulate) phase dynamics in two-phase flows in which the discrete phase does not significantly affect the continuous phase. This package contains a Graphical User Interface (based on the X Window system and the Motif tool kit) coupled to a particle tracing program, which allows the user to interactively set up and run a case for which a continuous phase grid and flow field are available. The software has been applied to a solid rocket motor problem, to demonstrate its ease of use and its suitability for problems of engineering interest, and has been delivered to NASA Marshall Space Flight Center.
Methods for Identifying Ligands that Target Nucleic Acid Molecules and Nucleic Acid Structural Motifs

NASA Technical Reports Server (NTRS)

Childs-Disney, Jessica L. (Inventor); Disney, Matthew D. (Inventor)

2017-01-01

Disclosed are methods for identifying a nucleic acid (e.g., RNA, DNA, etc.) motif which interacts with a ligand. The method includes providing a plurality of ligands immobilized on a support, wherein each particular ligand is immobilized at a discrete location on the support; contacting the plurality of immobilized ligands with a nucleic acid motif library under conditions effective for one or more members of the nucleic acid motif library to bind with the immobilized ligands; and identifying members of the nucleic acid motif library that are bound to a particular immobilized ligand. Also disclosed are methods for selecting, from a plurality of candidate ligands, one or more ligands that have increased likelihood of binding to a nucleic acid molecule comprising a particular nucleic acid motif, as well as methods for identifying a nucleic acid which interacts with a ligand.
I-motif DNA structures are formed in the nuclei of human cells

NASA Astrophysics Data System (ADS)

Zeraati, Mahdi; Langley, David B.; Schofield, Peter; Moye, Aaron L.; Rouet, Romain; Hughes, William E.; Bryan, Tracy M.; Dinger, Marcel E.; Christ, Daniel

2018-06-01

Human genome function is underpinned by the primary storage of genetic information in canonical B-form DNA, with a second layer of DNA structure providing regulatory control. I-motif structures are thought to form in cytosine-rich regions of the genome and to have regulatory functions; however, in vivo evidence for the existence of such structures has so far remained elusive. Here we report the generation and characterization of an antibody fragment (iMab) that recognizes i-motif structures with high selectivity and affinity, enabling the detection of i-motifs in the nuclei of human cells. We demonstrate that the in vivo formation of such structures is cell-cycle and pH dependent. Furthermore, we provide evidence that i-motif structures are formed in regulatory regions of the human genome, including promoters and telomeric regions. Our results support the notion that i-motif structures provide key regulatory roles in the genome.
"Without Contraries Is No Progression": Dust as an All-Inclusive, Multifunctional Metaphor in Philip Pullman's "His Dark Materials."

ERIC Educational Resources Information Center

Bird, Anne-Marie

2001-01-01

Draws on Milton's "Paradise Lost" and on motifs found within Gnostic mythology and the poetry of William Blake to explore how Philip Pullman reworks the Judeo-Christian myth of the Fall in his trilogy, "His Dark Materials." Finds at its center "Dust": a conventional metaphor for human physicality in which good and evil, and spirit and matter…

Self-assembly of virus-like particles of canine parvovirus capsid protein expressed from Escherichia coli and application as virus-like particle vaccine.

PubMed

Xu, Jin; Guo, Hui-Chen; Wei, Yan-Quan; Dong, Hu; Han, Shi-Chong; Ao, Da; Sun, De-Hui; Wang, Hai-Ming; Cao, Sui-Zhong; Sun, Shi-Qi

2014-04-01

Canine parvovirus disease is an acute infectious disease caused by canine parvovirus (CPV). Current commercial vaccines are mainly attenuated and inactivated; as such, problems concerning safety may occur. To resolve this problem, researchers developed virus-like particles (VLPs) as biological nanoparticles resembling natural virions and showing high bio-safety. This property allows the use of VLPs for vaccine development and mechanism studies of viral infections. Tissue-specific drug delivery also employs VLPs as biological nanomaterials. Therefore, VLPs derived from CPV have a great potential in medicine and diagnostics. In this study, small ubiquitin-like modifier (SUMO) fusion motif was utilized to express a whole, natural VP2 protein of CPV in Escherichia coli. After the cleavage of the fusion motif, the CPV VP2 protein self-assembled into VLPs. The VLPs had a size and shape that resembled the authentic virus capsid. However, the self-assembly efficiency of VLPs can be affected by different pH levels and ionic strengths. Mice were vaccinated subcutaneously with CPV VLPs, and their CPV-specific immune responses were compared with those of mice immunized with the natural virus. This result showed that VLPs can induce anti-CPV specific antibody and lymphocyte proliferation as effectively as the whole virus. This result further suggested that the antigen epitope of CPV was correctly present on VLPs, thereby showing the potential application of a VLP-based CPV vaccine.
Core regulatory network motif underlies the ocellar complex patterning in Drosophila melanogaster

NASA Astrophysics Data System (ADS)

Aguilar-Hidalgo, D.; Lemos, M. C.; Córdoba, A.

2015-03-01

During organogenesis, developmental programs governed by Gene Regulatory Networks (GRN) define the functionality, size and shape of the different constituents of living organisms. Robustness, thus, is an essential characteristic that GRNs need to fulfill in order to maintain viability and reproducibility in a species. In the present work we analyze the robustness of the patterning for the ocellar complex formation in the fly Drosophila melanogaster. We have systematically pruned the GRN that drives the development of this visual system to obtain the minimum pathway able to satisfy this pattern. We found that the mechanism underlying the patterning obeys the dynamics of a 3-node network motif with a double negative feedback loop fed by a morphogenetic gradient that triggers the inhibition in a French flag problem fashion. A Boolean modeling of the GRN confirms robustness in the patterning mechanism showing the same result for different network complexity levels. Interestingly, the network provides a steady state solution in the interocellar part of the patterning and an oscillatory regime in the ocelli. This theoretical result predicts that the ocellar pattern may underlie oscillatory dynamics in its genetic regulation.
The Transcriptional Complex Between the BCL2 i-Motif and hnRNP LL Is a Molecular Switch for Control of Gene Expression That Can Be Modulated by Small Molecules

PubMed Central

2015-01-01

In a companion paper (DOI: 10.1021/ja410934b) we demonstrate that the C-rich strand of the cis-regulatory element in the BCL2 promoter element is highly dynamic in nature and can form either an i-motif or a flexible hairpin. Under physiological conditions these two secondary DNA structures are found in an equilibrium mixture, which can be shifted by the addition of small molecules that trap out either the i-motif (IMC-48) or the flexible hairpin (IMC-76). In cellular experiments we demonstrate that the addition of these molecules has opposite effects on BCL2 gene expression and furthermore that these effects are antagonistic. In this contribution we have identified a transcriptional factor that recognizes and binds to the BCL2 i-motif to activate transcription. The molecular basis for the recognition of the i-motif by hnRNP LL is determined, and we demonstrate that the protein unfolds the i-motif structure to form a stable single-stranded complex. In subsequent experiments we show that IMC-48 and IMC-76 have opposite, antagonistic effects on the formation of the hnRNP LL–i-motif complex as well as on the transcription factor occupancy at the BCL2 promoter. For the first time we propose that the i-motif acts as a molecular switch that controls gene expression and that small molecules that target the dynamic equilibrium of the i-motif and the flexible hairpin can differentially modulate gene expression. PMID:24559432
Genome Wide Identification, Evolutionary, and Expression Analysis of VQ Genes from Two Pyrus Species.

PubMed

Cao, Yunpeng; Meng, Dandan; Abdullah, Muhammad; Jin, Qing; Lin, Yi; Cai, Yongping

2018-04-23

The VQ motif-containing gene, a member of the plant-specific genes, is involved in the plant developmental process and various stress responses. The VQ motif-containing gene family has been studied in several plants, such as rice (Oryza sativa), maize (Zea mays), and Arabidopsis (Arabidopsis thaliana). However, no systematic study has been performed in Pyrus species, which have important economic value. In our study, we identified 41 and 28 VQ motif-containing genes in Pyrus bretschneideri and Pyrus communis, respectively. Phylogenetic trees were calculated using A. thaliana and O. sativa VQ motif-containing genes as a template, allowing us to categorize these genes into nine subfamilies. Thirty-two and eight paralogous of VQ motif-containing genes were found in P. bretschneideri and P. communis, respectively, showing that the VQ motif-containing genes had a more remarkable expansion in P. bretschneideri than in P. communis. A total of 31 orthologous pairs were identified from the P. bretschneideri and P. communis VQ motif-containing genes. Additionally, among the paralogs, we found that these duplication gene pairs probably derived from segmental duplication/whole-genome duplication (WGD) events in the genomes of P. bretschneideri and P. communis, respectively. The gene expression profiles in both P. bretschneideri and P. communis fruits suggested functional redundancy for some orthologous gene pairs derived from a common ancestry, and sub-functionalization or neo-functionalization for some of them. Our study provided the first systematic evolutionary analysis of the VQ motif-containing genes in Pyrus, and highlighted the diversification and duplication of VQ motif-containing genes in both P. bretschneideri and P. communis.
An effective approach for annotation of protein families with low sequence similarity and conserved motifs: identifying GDSL hydrolases across the plant kingdom.

PubMed

Vujaklija, Ivan; Bielen, Ana; Paradžik, Tina; Biđin, Siniša; Goldstein, Pavle; Vujaklija, Dušica

2016-02-18

The massive accumulation of protein sequences arising from the rapid development of high-throughput sequencing, coupled with automatic annotation, results in high levels of incorrect annotations. In this study, we describe an approach to decrease annotation errors of protein families characterized by low overall sequence similarity. The GDSL lipolytic family comprises proteins with multifunctional properties and high potential for pharmaceutical and industrial applications. The number of proteins assigned to this family has increased rapidly over the last few years. In particular, the natural abundance of GDSL enzymes reported recently in plants indicates that they could be a good source of novel GDSL enzymes. We noticed that a significant proportion of annotated sequences lack specific GDSL motif(s) or catalytic residue(s). Here, we applied motif-based sequence analyses to identify enzymes possessing conserved GDSL motifs in selected proteomes across the plant kingdom. Motif-based HMM scanning (Viterbi decoding-VD and posterior decoding-PD) and the here described PD/VD protocol were successfully applied on 12 selected plant proteomes to identify sequences with GDSL motifs. A significant number of identified GDSL sequences were novel. Moreover, our scanning approach successfully detected protein sequences lacking at least one of the essential motifs (171/820) annotated by Pfam profile search (PfamA) as GDSL. Based on these analyses we provide a curated list of GDSL enzymes from the selected plants. CLANS clustering and phylogenetic analysis helped us to gain a better insight into the evolutionary relationship of all identified GDSL sequences. Three novel GDSL subfamilies as well as unreported variations in GDSL motifs were discovered in this study. In addition, analyses of selected proteomes showed a remarkable expansion of GDSL enzymes in the lycophyte, Selaginella moellendorffii. 
Finally, we provide a general motif-HMM scanner which is easily accessible through the graphical user interface (http://compbio.math.hr/). Our results show that scanning with a carefully parameterized motif-HMM is an effective approach for annotation of protein families with low sequence similarity and conserved motifs. The results of this study expand current knowledge and provide new insights into the evolution of the large GDSL-lipase family in land plants.
Detecting DNA regulatory motifs by incorporating positional trends in information content

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kechris, Katherina J.; van Zwet, Erik; Bickel, Peter J.

2004-05-04

On the basis of the observation that conserved positions in transcription factor binding sites are often clustered together, we propose a simple extension to the model-based motif discovery methods. We assign position-specific prior distributions to the frequency parameters of the model, penalizing deviations from a specified conservation profile. Examples with both simulated and real data show that this extension helps discover motifs as the data become noisier or when there is a competing false motif.
Molecular Mechanisms Controlling GLUT4 Intracellular Retention

PubMed Central

Blot, Vincent

2008-01-01

In basal adipocytes, glucose transporter 4 (GLUT4) is sequestered intracellularly by an insulin-reversible retention mechanism. Here, we analyze the roles of three GLUT4 trafficking motifs (FQQI, TELEY, and LL), providing molecular links between insulin signaling, cellular trafficking machinery, and the motifs in the specialized trafficking of GLUT4. Our results support a GLUT4 retention model that involves two linked intracellular cycles: one between endosomes and a retention compartment, and the other between endosomes and specialized GLUT4 transport vesicles. Targeting of GLUT4 to the former is dependent on the FQQI motif and its targeting to the latter is dependent on the TELEY motif. These two motifs act independently in retention, with the TELEY-dependent step being under the control of signaling downstream of the AS160 rab GTPase activating protein. Segregation of GLUT4 from endosomes, although positively correlated with the degree of basal retention, does not completely account for GLUT4 retention or insulin-responsiveness. Mutation of the LL motif slows return to basal intracellular retention after insulin withdrawal. Knockdown of clathrin adaptin protein complex-1 (AP-1) causes a delay in the return to intracellular retention after insulin withdrawal. The effects of mutating the LL motif and knockdown of AP-1 were not additive, establishing that AP-1 regulation of GLUT4 trafficking requires the LL motif. PMID:18550797
A systems wide mass spectrometric based linear motif screen to identify dominant in-vivo interacting proteins for the ubiquitin ligase MDM2.

PubMed

Nicholson, Judith; Scherl, Alex; Way, Luke; Blackburn, Elizabeth A; Walkinshaw, Malcolm D; Ball, Kathryn L; Hupp, Ted R

2014-06-01

Linear motifs mediate protein-protein interactions (PPI) that allow expansion of a target protein interactome at a systems level. This study uses a proteomics approach and linear motif sub-stratifications to expand on PPIs of MDM2. MDM2 is a multi-functional protein with over one hundred known binding partners not stratified by hierarchy or function. A new linear motif based on a MDM2 interaction consensus is used to select novel MDM2 interactors based on Nutlin-3 responsiveness in a cell-based proteomics screen. MDM2 binds a subset of peptide motifs corresponding to real proteins with a range of allosteric responses to MDM2 ligands. We validate cyclophilin B as a novel protein with a consensus MDM2 binding motif that is stabilised by Nutlin-3 in vivo, thus identifying one of the few known interactors of MDM2 that is stabilised by Nutlin-3. These data invoke two modes of peptide binding at the MDM2 N-terminus that rely on a consensus core motif to control the equilibrium between MDM2 binding proteins. This approach stratifies MDM2 interacting proteins based on the linear motif feature and provides a new biomarker assay to define clinically relevant Nutlin-3 responsive MDM2 interactors. Copyright © 2014 Elsevier Inc. All rights reserved.
Mixotrophy and intraguild predation - dynamic consequences of shifts between food web motifs

NASA Astrophysics Data System (ADS)

Karnatak, Rajat; Wollrab, Sabine

2017-06-01

Mixotrophy is ubiquitous in microbial communities of aquatic systems with many flagellates being able to use autotroph as well as heterotroph pathways for energy acquisition. The usage of one over the other pathway is associated with resource availability and the coupling of alternative pathways has strong implications for system stability. We investigated the impact of dominance of different energy pathways related to relative resource availability on system dynamics in the setting of a tritrophic food web motif. This motif consists of a mixotroph feeding on a purely autotroph species while competing for a shared resource. In addition, the autotroph can use an additional exclusive food source. By changing the relative abundance of shared vs. exclusive food source, we shift the food web motif from an intraguild predation motif to a food chain motif. We analyzed the dependence of system dynamics on absolute and relative resource availability. In general, the system exhibits a transition from stable to oscillatory dynamics with increasing nutrient availability. However, this transition occurs at a much lower nutrient level for the food chain in comparison to the intraguild predation motif. A similar transition is also observed with variations in the relative abundance of food sources for a range of nutrient levels. We expect this shift in food web motifs to occur frequently in microbial communities and therefore the results from our study are highly relevant for natural systems.
A naturally occurring, noncanonical GTP aptamer made of simple tandem repeats

PubMed Central

Curtis, Edward A; Liu, David R

2014-01-01

Recently, we used in vitro selection to identify a new class of naturally occurring GTP aptamer called the G motif. Here we report the discovery and characterization of a second class of naturally occurring GTP aptamer, the “CA motif.” The primary sequence of this aptamer is unusual in that it consists entirely of tandem repeats of CA-rich motifs as short as three nucleotides. Several active variants of the CA motif aptamer lack the ability to form consecutive Watson-Crick base pairs in any register, while others consist of repeats containing only cytidine and adenosine residues, indicating that noncanonical interactions play important roles in its structure. The circular dichroism spectrum of the CA motif aptamer is distinct from that of A-form RNA and other major classes of nucleic acid structures. Bioinformatic searches indicate that the CA motif is absent from most archaeal and bacterial genomes, but occurs in at least 70 percent of approximately 400 eukaryotic genomes examined. These searches also uncovered several phylogenetically conserved examples of the CA motif in rodent (mouse and rat) genomes. Together, these results reveal the existence of a second class of naturally occurring GTP aptamer whose sequence requirements, like that of the G motif, are not consistent with those of a canonical secondary structure. They also indicate a new and unexpected potential biochemical activity of certain naturally occurring tandem repeats. PMID:24824832
Coiled-Coil Hydrogels. Effect of Grafted Copolymer Composition and Cyclization on Gelation

PubMed Central

Dušek, Karel; Dušková-Smrčková, Miroslava; Yang, Jiyuan; Kopeček, Jindřich

2009-01-01

A mean-field theoretical approach was developed to model gelation of solutions of hydrophilic polymers with grafted peptide motifs capable of forming associates of coiled-coil type. The model addresses the competition between associates engaged in branching and cyclization. It yields the relative concentrations of intra- and intermolecular associates as a function of associate strength and motif concentration. The cyclization probability is derived from the model of equivalent Gaussian chain and takes into account all possible paths connecting the interacting motifs. Examination of the association-dissociation equilibria, controlled by the equilibrium constant for association taken as input information, determines the fractions of inter- and intramolecularly associated motifs. The gelation model is based on the statistical theory of branching processes and in combination with the cyclization model predicts the critical concentration delimiting the regions of gelled and liquid states of the system. A comparison between predictions of the model and experimental data available for aqueous solutions of poly[N-(2-hydroxypropyl)methacrylamide] grafted with oppositely charged pentaheptad peptides, CCE and CCK, indicates that the association constant of grafted motifs is four orders of magnitude lower than that of free motifs. It is predicted that at the critical concentration of each motif of about 6×10⁻⁷ mol/cm³, about half of the motifs in the associated state are engaged in intramolecular bonds. PMID:20160932
Characterization of a novel androgen receptor (AR) coregulator RIPK1 and related chemicals that suppress AR-mediated prostate cancer growth via peptide and chemical screening.

PubMed

Hsu, Cheng-Lung; Liu, Jai-Shin; Lin, Ting-Wei; Chang, Ying-Hsu; Kuo, Yung-Chia; Lin, An-Chi; Ting, Huei-Ju; Pang, See-Tong; Lee, Li-Yu; Ma, Wen-Lung; Lin, Chun-Cheng; Wu, Wen-Guey

2017-09-19

Using bicalutamide-androgen receptor (AR) DNA binding domain-ligand binding domain as bait, we observed enrichment of FxxFY motif-containing peptides. Protein database searches revealed the presence of receptor-interacting protein kinase 1 (RIPK1) harboring one FxxFY motif. RIPK1 interacted directly with AR and suppressed AR transactivation in a dose-dependent manner. Domain mapping experiments showed that the FxxFY motif in RIPK1 is critical for interactions with AR and the death domain of RIPK1 plays a crucial role in its inhibitory effect on transactivation. In terms of tissue expression, RIPK1 levels were markedly higher in benign prostate hyperplasia and non-cancerous tissue regions relative to the tumor area. With the aid of computer modeling for screening of chemicals targeting activation function 2 (AF-2) of AR, we identified oxadiazole derivatives as good candidates and subsequently generated a small library of these compounds. A number of candidates could effectively suppress AR transactivation and AR-related functions in vitro and in vivo with tolerable toxicity via inhibiting AR-peptide, AR-coregulator and AR N-C interactions. Combination of these chemicals with antiandrogen had an additive suppressive effect on AR transcriptional activity. Our collective findings may pave the way in creating new strategies for the development and design of anti-AR drugs.
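A motif written as FxxFY is conventionally read as F, then any two residues, then F and Y. The following is a minimal sketch of how such a pattern could be located in a protein sequence; the function name `find_fxxfy` and the toy sequence are hypothetical and are not taken from the study, whose database search procedure is not described in the abstract.

```python
import re

# "x" stands for any residue, so FxxFY becomes F, two arbitrary letters, F, Y.
# The lookahead wrapper lets overlapping occurrences be reported as well.
FXXFY = re.compile(r"(?=(F[A-Z]{2}FY))")

def find_fxxfy(sequence):
    """Return (1-based start position, matched peptide) for each FxxFY hit."""
    return [(m.start() + 1, m.group(1)) for m in FXXFY.finditer(sequence.upper())]

# Hypothetical toy sequence for illustration only (not the RIPK1 sequence).
toy = "MKFQAFYLLSFDEFYG"
print(find_fxxfy(toy))  # [(3, 'FQAFY'), (11, 'FDEFY')]
```

A plain pattern match like this only flags candidates; the abstract relies on domain mapping experiments to show that the motif matters for the interaction.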
VE-cadherin RGD motifs promote metastasis and constitute a potential therapeutic target in melanoma and breast cancers.

PubMed

Bartolomé, Rubén A; Torres, Sofía; Isern de Val, Soledad; Escudero-Paniagua, Beatriz; Calviño, Eva; Teixidó, Joaquín; Casal, J Ignacio

2017-01-03

We have investigated the role of vascular-endothelial (VE)-cadherin in melanoma and breast cancer metastasis. We found that VE-cadherin is expressed in highly aggressive melanoma and breast cancer cell lines. Remarkably, inactivation of VE-cadherin triggered a significant loss of malignant traits (proliferation, adhesion, invasion and transendothelial migration) in melanoma and breast cancer cells. These effects, except transendothelial migration, were induced by the VE-cadherin RGD motifs. Co-immunoprecipitation experiments demonstrated an interaction between VE-cadherin and α2β1 integrin, with the RGD motifs found to directly affect β1 integrin activation. VE-cadherin-mediated integrin signaling occurred through specific activation of SRC, ERK and JNK, including AKT in melanoma. Knocking down VE-cadherin suppressed lung colonization capacity of melanoma or breast cancer cells inoculated in mice, while pre-incubation with VE-cadherin RGD peptides promoted lung metastasis for both cancer types. Finally, an in silico study revealed the association of high VE-cadherin expression with poor survival in a subset of melanoma patients and breast cancer patients showing low CD34 expression. These findings support a general role for VE-cadherin and other RGD cadherins as critical regulators of lung and liver metastasis in multiple solid tumours. These results pave the way for cadherin-specific RGD targeted therapies to control disseminated metastasis in multiple cancers.
Analysis of the interaction with the hepatitis C virus mRNA reveals an alternative mode of RNA recognition by the human La protein.

PubMed

Martino, Luigi; Pennell, Simon; Kelly, Geoff; Bui, Tam T T; Kotik-Kogan, Olga; Smerdon, Stephen J; Drake, Alex F; Curry, Stephen; Conte, Maria R

2012-02-01

Human La protein is an essential factor in the biology of both coding and non-coding RNAs. In the nucleus, La binds primarily to 3' oligoU containing RNAs, while in the cytoplasm La interacts with an array of different mRNAs lacking a 3' UUU(OH) trailer. An example of the latter is the binding of La to the IRES domain IV of the hepatitis C virus (HCV) RNA, which is associated with viral translation stimulation. By systematic biophysical investigations, we have found that La binds to domain IV using an RNA recognition that is quite distinct from its mode of binding to RNAs with a 3' UUU(OH) trailer: although the La motif and first RNA recognition motif (RRM1) are sufficient for high-affinity binding to 3' oligoU, recognition of HCV domain IV requires the La motif and RRM1 to work in concert with the atypical RRM2 which has not previously been shown to have a significant role in RNA binding. This new mode of binding does not appear sequence specific, but recognizes structural features of the RNA, in particular a double-stranded stem flanked by single-stranded extensions. These findings pave the way for a better understanding of the role of La in viral translation initiation.
Identification of 15 candidate structured noncoding RNA motifs in fungi by comparative genomics.

PubMed

Li, Sanshu; Breaker, Ronald R

2017-10-13

With the development of rapid and inexpensive DNA sequencing, the genome sequences of more than 100 fungal species have been made available. This dataset provides an excellent resource for comparative genomics analyses, which can be used to discover genetic elements, including noncoding RNAs (ncRNAs). Bioinformatics tools similar to those used to uncover novel ncRNAs in bacteria, likewise, should be useful for searching fungal genomic sequences, and the relative ease of genetic experiments with some model fungal species could facilitate experimental validation studies. We have adapted a bioinformatics pipeline for discovering bacterial ncRNAs to systematically analyze many fungal genomes. This comparative genomics pipeline integrates information on conserved RNA sequence and structural features with alternative splicing information to reveal fungal RNA motifs that are candidate regulatory domains, or that might have other possible functions. A total of 15 prominent classes of structured ncRNA candidates were identified, including variant HDV self-cleaving ribozyme representatives, atypical snoRNA candidates, and possible structured antisense RNA motifs. Candidate regulatory motifs were also found associated with genes for ribosomal proteins, S-adenosylmethionine decarboxylase (SDC), amidase, and HexA protein involved in Woronin body formation. We experimentally confirm that the variant HDV ribozymes undergo rapid self-cleavage, and we demonstrate that the SDC RNA motif reduces the expression of SAM decarboxylase by translational repression. Furthermore, we provide evidence that several other motifs discovered in this study are likely to be functional ncRNA elements. Systematic screening of fungal genomes using a computational discovery pipeline has revealed the existence of a variety of novel structured ncRNAs. 
Genome contexts and similarities to known ncRNA motifs provide strong evidence for the biological and biochemical functions of some newly found ncRNA motifs. Although initial examinations of several motifs provide evidence for their likely functions, other motifs will require more in-depth analysis to reveal their functions.
Membrane Curvature Sensing by Amphipathic Helices Is Modulated by the Surrounding Protein Backbone.

PubMed

Doucet, Christine M; Esmery, Nina; de Saint-Jean, Maud; Antonny, Bruno

2015-01-01

Membrane curvature is involved in numerous biological pathways like vesicle trafficking, endocytosis or nuclear pore complex assembly. In addition to its topological role, membrane curvature is sensed by specific proteins, enabling the coordination of biological processes in space and time. Amongst membrane curvature sensors are the ALPS (Amphipathic Lipid Packing Sensors). ALPS motifs are short peptides with peculiar amphipathic properties. They are found in proteins targeted to distinct curved membranes, mostly in the early secretory pathway. For instance, the ALPS motif of the golgin GMAP210 binds trafficking vesicles, while the ALPS motif of Nup133 targets nuclear pores. It is not clear if, besides curvature sensitivity, ALPS motifs also provide target specificity, or if other domains in the surrounding protein backbone are involved. To elucidate this aspect, we studied the subcellular localization of ALPS motifs outside their natural protein context. The ALPS motifs of GMAP210 or Nup133 were grafted on artificial fluorescent probes. Importantly, ALPS motifs are held in different positions and these contrasting architectures were mimicked by the fluorescent probes. The resulting chimeras recapitulated the original proteins' localization, indicating that ALPS motifs are sufficient to specifically localize proteins. Modulating the electrostatic or hydrophobic content of the Nup133 ALPS motif modified its avidity for cellular membranes but did not change its organelle targeting properties. In contrast, the structure of the backbone surrounding the helix strongly influenced targeting. In particular, introducing an artificial coiled-coil between ALPS and the fluorescent protein increased membrane curvature sensitivity. This coiled-coil domain also provided membrane curvature sensitivity to the amphipathic helix of Sar1. The degree of curvature sensitivity within the coiled-coil context remains correlated to the natural curvature sensitivity of the helices. This suggests that the chemistry of ALPS motifs is a key parameter for membrane curvature sensitivity, which can be further modulated by the surrounding protein backbone.
Motif types, motif locations and base composition patterns around the RNA polyadenylation site in microorganisms, plants and animals

PubMed Central

2014-01-01

Background The polyadenylation of RNA is critical for gene functioning, but the conserved sequence motifs (often called signal or signature motifs), motif locations and abundances, and base composition patterns around mRNA polyadenylation [poly(A)] sites are still uncharacterized in most species. The evolutionary tendency for poly(A) site selection is still largely unknown. Results We analyzed the poly(A) site regions of 31 species or phyla. Different groups of species showed different poly(A) signal motifs: UUACUU at the poly(A) site in the parasite Trypanosoma cruzi; UGUAAC (approximately 13 bases upstream of the site) in the alga Chlamydomonas reinhardtii; UGUUUG (or UGUUUGUU) at mainly the fourth base downstream of the poly(A) site in the parasite Blastocystis hominis; and AAUAAA at approximately 16 bases and approximately 19 bases upstream of the poly(A) site in animals and plants, respectively. Polyadenylation signal motifs are usually several hundred times more abundant around poly(A) sites than in whole genomes. These predominant motifs usually had very specific locations, whether upstream of, at, or downstream of poly(A) sites, depending on the species or phylum. The poly(A) site was usually an adenosine (A) in all analyzed species except for B. hominis, and there was weak A predominance in C. reinhardtii. Fungi, animals, plants, and the protist Phytophthora infestans shared a general base abundance pattern (or base composition pattern) of “U-rich—A-rich—U-rich—Poly(A) site—U-rich regions”, or U-A-U-A-U for short, with some variation for each kingdom or subkingdom. Conclusion This study identified the poly(A) signal motifs, motif locations, and base composition patterns around mRNA poly(A) sites in protists, fungi, plants, and animals and provided insight into poly(A) site evolution. PMID:25052519
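The positional statements in this abstract (for example, AAUAAA at approximately 16 bases upstream of the site) can be illustrated with a small tally of where a signal hexamer starts relative to the poly(A) site. This is a sketch under stated assumptions: the function name `signal_offsets`, the toy sequences, and the convention that the offset is counted from the first base of the signal to the site are all illustrative choices, not the authors' pipeline or their coordinate definition.

```python
from collections import Counter

def signal_offsets(upstream_seqs, signal="AAUAAA"):
    """Tally how many bases upstream of the poly(A) site each signal copy starts.

    Each input is assumed to be the RNA sequence immediately upstream of a
    poly(A) site, so its last base sits one position before the site.
    """
    offsets = Counter()
    for seq in upstream_seqs:
        seq = seq.upper().replace("T", "U")  # accept DNA-style input too
        start = seq.find(signal)
        while start != -1:
            offsets[len(seq) - start] += 1  # distance from signal start to site
            start = seq.find(signal, start + 1)
    return offsets

# Hypothetical toy sequences, constructed for illustration only.
toy = [
    "GCGCAAUAAAGCUGCUGCUG",
    "CCAAUAAAGGCCGGCCGG",
    "AAUAAAGCGCGCGCGCGCG",
]
print(signal_offsets(toy).most_common())  # [(16, 2), (19, 1)]
```

Comparing such a tally with the hexamer's frequency in the whole genome would give the kind of local over-representation the abstract reports, although the abstract does not say how that ratio was computed.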
Fast social-like learning of complex behaviors based on motor motifs.

PubMed

Calvo Tapia, Carlos; Tyukin, Ivan Y; Makarov, Valeri A

2018-05-01

Social learning is widely observed in many species. Less experienced agents copy successful behaviors exhibited by more experienced individuals. Nevertheless, the dynamical mechanisms behind this process remain largely unknown. Here we assume that a complex behavior can be decomposed into a sequence of n motor motifs. Then a neural network capable of activating motor motifs in a given sequence can drive an agent. To account for (n-1)! possible sequences of motifs in a neural network, we employ the winnerless competition approach. We then consider a teacher-learner situation: one agent exhibits a complex movement, while another one aims at mimicking the teacher's behavior. Despite the huge variety of possible motif sequences we show that the learner, equipped with the provided learning model, can rewire "on the fly" its synaptic couplings in no more than (n-1) learning cycles and converge exponentially to the durations of the teacher's motifs. We validate the learning model on mobile robots. Experimental results show that the learner is indeed capable of copying the teacher's behavior composed of six motor motifs in a few learning cycles. The reported mechanism of learning is general and can be used for replicating different functions, including, for example, sound patterns or speech.
Transient α-helices in the disordered RPEL motifs of the serum response factor coactivator MKL1

NASA Astrophysics Data System (ADS)

Mizuguchi, Mineyuki; Fuju, Takahiro; Obita, Takayuki; Ishikawa, Mitsuru; Tsuda, Masaaki; Tabuchi, Akiko

2014-06-01

The megakaryoblastic leukemia 1 (MKL1) protein functions as a transcriptional coactivator of the serum response factor. MKL1 has three RPEL motifs (RPEL1, RPEL2, and RPEL3) in its N-terminal region. MKL1 binds to monomeric G-actin through RPEL motifs, and the dissociation of MKL1 from G-actin promotes the translocation of MKL1 to the nucleus. Although structural data are available for RPEL motifs of MKL1 in complex with G-actin, the structural characteristics of RPEL motifs in the free state have been poorly defined. Here we characterized the structures of free RPEL motifs using NMR and CD spectroscopy. NMR and CD measurements showed that free RPEL motifs are largely unstructured in solution. However, NMR analysis identified transient α-helices in the regions where helices α1 and α2 are induced upon binding to G-actin. Proline mutagenesis showed that the transient α-helices are locally formed without helix-helix interactions. The helix content is higher in the order of RPEL1, RPEL2, and RPEL3. The amount of preformed structure may correlate with the binding affinity between the intrinsically disordered protein and its target molecule.
RNA Bricks—a database of RNA 3D motifs and their interactions

PubMed Central

Chojnowski, Grzegorz; Waleń, Tomasz; Bujnicki, Janusz M.

2014-01-01

The RNA Bricks database (http://iimcb.genesilico.pl/rnabricks) stores information about recurrent RNA 3D motifs and their interactions, found in experimentally determined RNA structures and in RNA–protein complexes. In contrast to other similar tools (RNA 3D Motif Atlas, RNA Frabase, Rloom), RNA motifs, i.e. ‘RNA bricks’, are presented in the molecular environment in which they were determined, including RNA, protein, metal ions, water molecules and ligands. All nucleotide residues in RNA bricks are annotated with structural quality scores that describe real-space correlation coefficients with the electron density data (if available), backbone geometry and possible steric conflicts, which can be used to identify poorly modeled residues. The database is also equipped with an algorithm for 3D motif search and comparison. The algorithm compares spatial positions of backbone atoms of the user-provided query structure and of stored RNA motifs, without relying on sequence or secondary structure information. This enables the identification of local structural similarities among evolutionarily related and unrelated RNA molecules. In addition, the search utility enables searching ‘RNA bricks’ according to sequence similarity, and makes it possible to identify motifs with modified ribonucleotide residues at specific positions. PMID:24220091

Helix-packing motifs in membrane proteins.

PubMed

Walters, R F S; DeGrado, W F

2006-09-12

The fold of a helical membrane protein is largely determined by interactions between membrane-imbedded helices. To elucidate recurring helix-helix interaction motifs, we dissected the crystallographic structures of membrane proteins into a library of interacting helical pairs. The pairs were clustered according to their three-dimensional similarity (rmsd
Discovery of candidate KEN-box motifs using cell cycle keyword enrichment combined with native disorder prediction and motif conservation.

PubMed

Michael, Sushama; Travé, Gilles; Ramu, Chenna; Chica, Claudia; Gibson, Toby J

2008-02-15

KEN-box-mediated target selection is one of the mechanisms used in the proteasomal destruction of mitotic cell cycle proteins via the APC/C complex. While annotating the Eukaryotic Linear Motif resource (ELM, http://elm.eu.org/), we found that KEN motifs were significantly enriched in human protein entries with cell cycle keywords in the UniProt/Swiss-Prot database, implying that KEN-boxes might be more common than reported. Matches to short linear motifs in protein database searches are not, per se, significant. KEN-box enrichment with cell cycle Gene Ontology terms suggests that collectively these motifs are functional but does not prove that any given instance is so. Candidates were surveyed for native disorder prediction using GlobPlot and IUPred and for motif conservation in homologues. Among >25 strong new candidates, the most notable are human HIPK2, CHFR, CDC27, Dab2, Upf2, kinesin Eg5, DNA Topoisomerase 1 and yeast Cdc5 and Swi5. A similar number of weaker candidates were present. These proteins have yet to be tested for APC/C targeted destruction, providing potential new avenues of research.
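The remark that matches to short linear motifs are not significant in themselves can be made concrete with a back-of-the-envelope estimate. The sketch below assumes uniform, independent residue frequencies over 20 amino acids and treats start positions as independent, which is a simplification; the function name `chance_match_probability` is hypothetical and this is not the statistic used in the study.

```python
def chance_match_probability(motif_length, protein_length, alphabet_size=20):
    """Approximate P(at least one exact match) for a fixed motif by chance.

    Assumes uniform, independent residues and independent start positions.
    """
    positions = protein_length - motif_length + 1
    if positions <= 0:
        return 0.0
    p_site = (1 / alphabet_size) ** motif_length  # chance of a match at one start
    return 1 - (1 - p_site) ** positions

# A three-residue motif such as KEN in a protein of 500 residues:
print(f"{chance_match_probability(3, 500):.3f}")
```

Under these assumptions roughly 6% of 500-residue proteins would contain an exact three-residue match by chance, which is why the abstract turns to keyword enrichment, disorder prediction and conservation to prioritise candidates.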
Fast social-like learning of complex behaviors based on motor motifs

NASA Astrophysics Data System (ADS)

Calvo Tapia, Carlos; Tyukin, Ivan Y.; Makarov, Valeri A.

2018-05-01

Social learning is widely observed in many species. Less experienced agents copy successful behaviors exhibited by more experienced individuals. Nevertheless, the dynamical mechanisms behind this process remain largely unknown. Here we assume that a complex behavior can be decomposed into a sequence of n motor motifs. Then a neural network capable of activating motor motifs in a given sequence can drive an agent. To account for (n-1)! possible sequences of motifs in a neural network, we employ the winnerless competition approach. We then consider a teacher-learner situation: one agent exhibits a complex movement, while another one aims at mimicking the teacher's behavior. Despite the huge variety of possible motif sequences we show that the learner, equipped with the provided learning model, can rewire "on the fly" its synaptic couplings in no more than (n-1) learning cycles and converge exponentially to the durations of the teacher's motifs. We validate the learning model on mobile robots. Experimental results show that the learner is indeed capable of copying the teacher's behavior composed of six motor motifs in a few learning cycles. The reported mechanism of learning is general and can be used for replicating different functions, including, for example, sound patterns or speech.
De novo discovery of structural motifs in RNA 3D structures through clustering.

PubMed

Ge, Ping; Islam, Shahidul; Zhong, Cuncong; Zhang, Shaojie

2018-05-18

As functional components in three-dimensional (3D) conformation of an RNA, the RNA structural motifs provide an easy way to associate the molecular architectures with their biological mechanisms. In the past years, many computational tools have been developed to search motif instances by using the existing knowledge of well-studied families. Recently, with the rapidly increasing number of resolved RNA 3D structures, there is an urgent need to discover novel motifs with the newly presented information. In this work, we classify all the loops in non-redundant RNA 3D structures to detect plausible RNA structural motif families by using a clustering pipeline. Compared with other clustering approaches, our method has two benefits: first, the underlying alignment algorithm is tolerant to the variations in 3D structures. Second, sophisticated downstream analysis has been performed to ensure the clusters are valid and easily applied to further research. The final clustering results contain many interesting new variants of known motif families, such as GNAA tetraloop, kink-turn, sarcin-ricin and T-loop. We have also discovered potential novel functional motifs conserved in ribosomal RNA, sgRNA, SRP RNA, riboswitch and ribozyme.
A dinucleotide motif in oligonucleotides shows potent immunomodulatory activity and overrides species-specific recognition observed with CpG motif.

PubMed

Kandimalla, Ekambar R; Bhagat, Lakshmi; Zhu, Fu-Gang; Yu, Dong; Cong, Yan-Ping; Wang, Daqing; Tang, Jimmy X; Tang, Jin-Yan; Knetter, Cathrine F; Lien, Egil; Agrawal, Sudhir

2003-11-25

Bacterial and synthetic DNAs containing CpG dinucleotides in specific sequence contexts activate the vertebrate immune system through Toll-like receptor 9 (TLR9). In the present study, we used a synthetic nucleoside with a bicyclic heterobase [1-(2'-deoxy-beta-d-ribofuranosyl)-2-oxo-7-deaza-8-methyl-purine; R] to replace the C in CpG, resulting in an RpG dinucleotide. The RpG dinucleotide was incorporated in mouse- and human-specific motifs in oligodeoxynucleotides (oligos) and 3'-3-linked oligos, referred to as immunomers. Oligos containing the RpG motif induced cytokine secretion in mouse spleen-cell cultures. Immunomers containing RpG dinucleotides showed activity in transfected-HEK293 cells stably expressing mouse TLR9, suggesting direct involvement of TLR9 in the recognition of RpG motif. In J774 macrophages, RpG motifs activated NF-kappa B and mitogen-activated protein kinase pathways. Immunomers containing the RpG dinucleotide induced high levels of IL-12 and IFN-gamma, but lower IL-6 in time- and concentration-dependent fashion in mouse spleen-cell cultures costimulated with IL-2. Importantly, immunomers containing GTRGTT and GARGTT motifs were recognized to a similar extent by both mouse and human immune systems. Additionally, both mouse- and human-specific RpG immunomers potently stimulated proliferation of peripheral blood mononuclear cells obtained from diverse vertebrate species, including monkey, pig, horse, sheep, goat, rat, and chicken. An immunomer containing GTRGTT motif prevented conalbumin-induced and ragweed allergen-induced allergic inflammation in mice. We show that a synthetic bicyclic nucleotide is recognized in the C position of a CpG dinucleotide by immune cells from diverse vertebrate species without bias for flanking sequences, suggesting a divergent nucleotide motif recognition pattern of TLR9.
Overexpression of TRIM44 is related to invasive potential and malignant outcomes in esophageal squamous cell carcinoma.

PubMed

Kawaguchi, Tsutomu; Komatsu, Shuhei; Ichikawa, Daisuke; Hirajima, Shoji; Nishimura, Yukihisa; Konishi, Hirotaka; Shiozaki, Atsushi; Fujiwara, Hitoshi; Okamoto, Kazuma; Tsuda, Hitoshi; Otsuji, Eigo

2017-06-01

Recent studies have shown that some members of the tripartite motif-containing protein family function as important regulators for carcinogenesis. In this study, we investigated whether tripartite motif-containing protein 44 acts as a cancer-promoting gene through its overexpression in esophageal squamous cell carcinoma. We analyzed esophageal squamous cell carcinoma cell lines to evaluate malignant potential and also analyzed 68 primary tumors to evaluate clinical relevance of tripartite motif-containing protein 44 protein in esophageal squamous cell carcinoma patients. Expression of the tripartite motif-containing protein 44 protein was detected in esophageal squamous cell carcinoma cell lines (8/14 cell lines; 57%) and primary tumor samples of esophageal squamous cell carcinoma (39/68 cases; 57%). Knockdown of tripartite motif-containing protein 44 expression in esophageal squamous cell carcinoma cells using several specific small interfering RNAs inhibited cell migration and invasion, but not cell proliferation. Immunohistochemical analysis demonstrated that the overexpression of the tripartite motif-containing protein 44 protein in the tumor infiltrated region was associated with the status of lymph node metastasis (p = 0.049), and the overall survival rates were significantly worse among patients with tripartite motif-containing protein 44-overexpressing tumors than among those with non-expressing tumors (p = 0.029). Moreover, a multivariate Cox regression model identified overexpression of the tripartite motif-containing protein 44 protein as an independent predictor of worse prognosis (hazard ratio = 2.815; p = 0.041), as was lymphatic invasion (hazard ratio = 2.735; p = 0.037). These results suggest that tripartite motif-containing protein 44 protein could play a crucial role in tumor invasion through its overexpression and highlight its usefulness as a predictor and potential therapeutic target in esophageal squamous cell carcinoma.
Genome Wide Identification, Evolutionary, and Expression Analysis of VQ Genes from Two Pyrus Species

PubMed Central

Meng, Dandan; Abdullah, Muhammad; Jin, Qing; Lin, Yi; Cai, Yongping

2018-01-01

VQ motif-containing genes, a plant-specific gene family, are involved in plant developmental processes and various stress responses. The VQ motif-containing gene family has been studied in several plants, such as rice (Oryza sativa), maize (Zea mays), and Arabidopsis (Arabidopsis thaliana). However, no systematic study has been performed in Pyrus species, which have important economic value. In our study, we identified 41 and 28 VQ motif-containing genes in Pyrus bretschneideri and Pyrus communis, respectively. Phylogenetic trees were calculated using A. thaliana and O. sativa VQ motif-containing genes as a template, allowing us to categorize these genes into nine subfamilies. Thirty-two and eight paralogous VQ motif-containing genes were found in P. bretschneideri and P. communis, respectively, showing that the VQ motif-containing genes had a more remarkable expansion in P. bretschneideri than in P. communis. A total of 31 orthologous pairs were identified from the P. bretschneideri and P. communis VQ motif-containing genes. Additionally, among the paralogs, we found that these duplicated gene pairs probably derived from segmental duplication/whole-genome duplication (WGD) events in the genomes of P. bretschneideri and P. communis, respectively. The gene expression profiles in both P. bretschneideri and P. communis fruits suggested functional redundancy for some orthologous gene pairs derived from a common ancestor, and sub-functionalization or neo-functionalization for others. Our study provided the first systematic evolutionary analysis of the VQ motif-containing genes in Pyrus, and highlighted the diversification and duplication of VQ motif-containing genes in both P. bretschneideri and P. communis. PMID:29690608
Functional structural motifs for protein-ligand, protein-protein, and protein-nucleic acid interactions and their connection to supersecondary structures.

PubMed

Kinjo, Akira R; Nakamura, Haruki

2013-01-01

Protein functions are mediated by interactions between proteins and other molecules. One useful approach to analyzing protein functions is to compare and classify the structures of interaction interfaces of proteins. Here, we describe the procedures for compiling a database of interface structures and efficiently comparing the interface structures. Doing so requires a good understanding of the data structures of the Protein Data Bank (PDB). Therefore, we also provide a detailed account of the PDB exchange dictionary necessary for extracting data that are relevant for analyzing interaction interfaces and secondary structures. We identify recurring structural motifs by classifying similar interface structures, and we define a coarse-grained representation of supersecondary structures (SSS) which represents a sequence of two or three secondary structure elements, including their relative orientations, as a string of four to seven letters. By examining the correspondence between structural motifs and SSS strings, we show that no SSS string has a particularly high propensity to be found in interaction interfaces in general, indicating that any SSS can be used as a binding interface. When individual structural motifs are examined, there are some SSS strings that have a high propensity for particular groups of structural motifs. In addition, it is shown that while the SSS strings found in particular structural motifs for nonpolymer and protein interfaces are as abundant as in other structural motifs that belong to the same subunit, structural motifs for nucleic acid interfaces exhibit a somewhat stronger preference for SSS strings. In regard to protein folds, many motif-specific SSS strings were found across many folds, suggesting that SSS may be a useful description to investigate the universality of ligand binding modes.
The Verrucomicrobia LexA-Binding Motif: Insights into the Evolutionary Dynamics of the SOS Response.

PubMed

Erill, Ivan; Campoy, Susana; Kılıç, Sefa; Barbé, Jordi

2016-01-01

The SOS response is the primary bacterial mechanism to address DNA damage, coordinating multiple cellular processes that include DNA repair, cell division, and translesion synthesis. In contrast to other regulatory systems, the composition of the SOS genetic network and the binding motif of its transcriptional repressor, LexA, have been shown to vary greatly across bacterial clades, making it an ideal system to study the co-evolution of transcription factors and their regulons. Leveraging comparative genomics approaches and prior knowledge on the core SOS regulon, here we define the binding motif of the Verrucomicrobia, a recently described phylum of emerging interest due to its association with eukaryotic hosts. Site directed mutagenesis of the Verrucomicrobium spinosum recA promoter confirms that LexA binds a 14 bp palindromic motif with consensus sequence TGTTC-N4-GAACA. Computational analyses suggest that recognition of this novel motif is determined primarily by changes in base-contacting residues of the third alpha helix of the LexA helix-turn-helix DNA binding motif. In conjunction with comparative genomics analysis of the LexA regulon in the Verrucomicrobia phylum, electrophoretic shift assays reveal that LexA binds to operators in the promoter region of DNA repair genes and a mutagenesis cassette in this organism, and identify previously unreported components of the SOS response. The identification of tandem LexA-binding sites generating instances of other LexA-binding motifs in the lexA gene promoter of Verrucomicrobia species leads us to postulate a novel mechanism for LexA-binding motif evolution. This model, based on gene duplication, successfully addresses outstanding questions in the intricate co-evolution of the LexA protein, its binding motif and the regulatory network it controls.
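As a small illustration of how the reported consensus TGTTC-N4-GAACA could be searched for in a sequence, the sketch below scans for exact matches only. It is not the authors' method: real binding-site searches normally tolerate mismatches and use scoring matrices, and the function names and the test fragment are hypothetical, invented for this example.

```python
import re

# Consensus from the abstract: TGTTC, four arbitrary bases, GAACA (14 bp).
# The lookahead lets overlapping matches be reported as well.
LEXA_CONSENSUS = re.compile(r"(?=(TGTTC[ACGT]{4}GAACA))")

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_lexa_sites(seq):
    """Return (0-based start, site) for every exact consensus match."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in LEXA_CONSENSUS.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical fragment, made up for illustration (not a real promoter).
    fragment = "AATTGTTCATATGAACAGG"
    print(find_lexa_sites(fragment))
    # The two half-sites are reverse complements of each other.
    print(reverse_complement("TGTTC") == "GAACA")
```

Because the half-sites are reverse complements, an exact match on one strand implies a match on the other strand whenever the 4-base spacer is itself palindromic; in general both strands should be scanned.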
Deciphering functional glycosaminoglycan motifs in development.

PubMed

Townley, Robert A; Bülow, Hannes E

2018-03-23

Glycosaminoglycans (GAGs) such as heparan sulfate, chondroitin/dermatan sulfate, and keratan sulfate are linear glycans, which when attached to protein backbones form proteoglycans. GAGs are essential components of the extracellular space in metazoans. Extensive modifications of the glycans such as sulfation, deacetylation and epimerization create structural GAG motifs. These motifs regulate protein-protein interactions and are thereby responsible for many of the essential functions of GAGs. This review focusses on recent genetic approaches to characterize GAG motifs and their function in defined signaling pathways during development. We discuss a coding approach for GAGs that would enable computational analyses of GAG sequences such as alignments and the computation of position weight matrices to describe GAG motifs.
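To make the idea of a position weight matrix concrete, the following sketch builds a log-odds matrix from a set of aligned, equal-length strings. The review's actual GAG coding scheme is not given in this abstract, so the alphabet "ABC" and the example sequences are placeholders; the uniform background and the pseudocount of 1 are also assumptions.

```python
import math
from collections import Counter


def position_weight_matrix(aligned, alphabet, pseudocount=1.0):
    """Build a log2-odds PWM from equal-length aligned strings.

    Returns one dict per alignment column, mapping each symbol to
    log2(frequency / background), with a uniform background.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must have the same length")
    background = 1.0 / len(alphabet)
    total = len(aligned) + pseudocount * len(alphabet)
    pwm = []
    for i in range(length):
        counts = Counter(s[i] for s in aligned)
        pwm.append({
            sym: math.log2(((counts[sym] + pseudocount) / total) / background)
            for sym in alphabet
        })
    return pwm


def score(pwm, candidate):
    """Sum the per-position log-odds for a candidate of matching length."""
    return sum(column[sym] for column, sym in zip(pwm, candidate))


if __name__ == "__main__":
    # Placeholder symbols standing in for encoded GAG units.
    pwm = position_weight_matrix(["ABA", "ABB", "ACA"], "ABC")
    print(score(pwm, "ABA"), score(pwm, "CCC"))
```

A candidate that resembles the aligned examples receives a positive score, while one built from rarely observed symbols scores negative.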
Ca2+-binding Motif of βγ-Crystallins

PubMed Central

Srivastava, Shanti Swaroop; Mishra, Amita; Krishnan, Bal; Sharma, Yogendra

2014-01-01

βγ-Crystallin-type double clamp (N/D)(N/D)XX(S/T)S motif is an established but sparsely investigated motif for Ca2+ binding. A βγ-crystallin domain is formed of two Greek key motifs, accommodating two Ca2+-binding sites. βγ-Crystallins make a separate class of Ca2+-binding proteins (CaBP), apparently a major group of CaBP in bacteria. Paralleling the diversity in βγ-crystallin domains, these motifs also show great diversity, both in structure and in function. Although the expression of some of them has been associated with stress, virulence, and adhesion, the functional implications of Ca2+ binding to βγ-crystallins in mediating biological processes are yet to be elucidated. PMID:24567326
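The motif notation used above, (N/D)(N/D)XX(S/T)S, can be translated mechanically into a regular expression for scanning protein sequences. The sketch below assumes that "(A/B)" means either residue, "X" means any residue and every other letter is literal; the helper names and the test peptide are hypothetical, and a sequence match alone does not establish Ca2+ binding.

```python
import re


def motif_to_regex(motif):
    """Translate '(N/D)(N/D)XX(S/T)S'-style notation into a regex string."""
    # "(N/D)" becomes the character class "[ND]".
    pattern = re.sub(
        r"\(([A-Z](?:/[A-Z])+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        motif,
    )
    # "X" stands for any residue (any upper-case letter here).
    return pattern.replace("X", "[A-Z]")


def find_motif(protein, motif):
    """Return (0-based start, matched segment), including overlapping hits."""
    regex = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(protein.upper())]


if __name__ == "__main__":
    # Invented peptide, for illustration only.
    print(motif_to_regex("(N/D)(N/D)XX(S/T)S"))
    print(find_motif("MKNDAGTSLL", "(N/D)(N/D)XX(S/T)S"))
```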
Redemptive Rhetoric: The Continuity Motif in the Rhetoric of Right to Life.

ERIC Educational Resources Information Center

Solomon, Martha

1980-01-01

Traces the use of the "continuity" motif in the Right to Life movement's rhetoric and its influence on the depiction of the abortion controversy. Analyzes how the motif functions rhetorically to aid the movement in defining its activities and involvement. (PD)
Dual Functions of Lip6 and Its Regulation of Lipid Metabolism in the Oleaginous Fungus Mucor circinelloides.

PubMed

Zan, Xinyi; Tang, Xin; Chu, Linfang; Song, Yuanda

2018-03-21

Although multiple roles of lipases have been reported in yeasts and microalgae, the functions of lipases have not been studied in oleaginous filamentous fungi. Lipase Lip6 has been reported in the oleaginous filamentous fungus Mucor circinelloides with the consensus lipase motif GXSXG and the typical acyltransferase motif of H-(X)4-D. To demonstrate that Lip6 might play dual roles as a lipase and an acyltransferase, we performed site-directed mutagenesis in the lipase motif and the acyltransferase motif of Lip6. Mutation in the lipase motif increased cell biomass by 12%-18% and promoted lipid accumulation by 9%-24%, while mutation in the acyltransferase motif induced lipid degradation. In vitro, purified Lip6 had a slight lipase activity but had a stronger phospholipid:DAG acyltransferase activity. Enzyme activity assays in vivo and phospholipid synthesis pathway analysis suggested that phosphatidyl serine and phosphatidyl ethanolamine can be the supplier of a fatty acyl moiety to form TAG in M. circinelloides.
Feature extraction using gray-level co-occurrence matrix of wavelet coefficients and texture matching for batik motif recognition

NASA Astrophysics Data System (ADS)

Suciati, Nanik; Herumurti, Darlis; Wijaya, Arya Yudhi

2017-02-01

Batik is one of Indonesia's traditional cloths. The motif or pattern drawn on a piece of batik fabric has a specific name and philosophy. Although batik cloths are widely used in everyday life, only a few people understand their motifs and philosophy. This research is intended to develop a batik motif recognition system that can identify the motif of a batik image automatically. First, a batik image is decomposed into sub-images using a wavelet transform. Six texture descriptors, i.e. max probability, correlation, contrast, uniformity, homogeneity and entropy, are extracted from the gray-level co-occurrence matrix of each sub-image. The texture features are then matched to the template features using the Canberra distance. The experiment is performed on the Batik Dataset, consisting of 1088 batik images grouped into seven motifs. The best recognition rate, 92.1%, is achieved using a feature extraction process with 5-level wavelet decomposition and a 4-directional gray-level co-occurrence matrix.
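The matching step described above can be sketched as a nearest-template search under the Canberra distance, d(x, y) = sum over i of |x_i - y_i| / (|x_i| + |y_i|). This is a minimal illustration, not the authors' implementation: the wavelet and co-occurrence feature extraction is omitted, and the template names and feature values are made up. Terms whose denominator is zero are treated as contributing zero, which is a common convention but an assumption here.

```python
def canberra(x, y):
    """Canberra distance between two equal-length feature vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom:  # skip 0/0 terms, treating them as zero
            total += abs(a - b) / denom
    return total


def nearest_template(features, templates):
    """Return the name of the template closest to the feature vector."""
    return min(templates, key=lambda name: canberra(features, templates[name]))


if __name__ == "__main__":
    # Hypothetical template features for two motif classes.
    templates = {
        "motif_a": [0.20, 0.80, 1.50],
        "motif_b": [0.90, 0.10, 0.30],
    }
    print(nearest_template([0.25, 0.70, 1.40], templates))
```

Because each term is normalised by the magnitude of the two values, descriptors on very different scales contribute comparably, which may be one reason the measure suits mixed texture features.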
[Screening specific recognition motif of RNA-binding proteins by SELEX in combination with next-generation sequencing technique].

PubMed

Zhang, Lu; Xu, Jinhao; Ma, Jinbiao

2016-07-25

RNA-binding proteins exert important biological functions by specifically recognizing RNA motifs. SELEX (systematic evolution of ligands by exponential enrichment), an in vitro selection method, can obtain consensus motifs with high affinity and specificity for many target molecules from DNA or RNA libraries. Here, we combined SELEX with next-generation sequencing to study protein-RNA interactions in vitro. A pool of RNAs with 20 bp random sequences was transcribed from a T7 promoter, and the target protein was inserted into a plasmid containing an SBP-tag, which can be captured by streptavidin beads. The specific RNA motif can be obtained through only one cycle, which dramatically improves the selection efficiency. Using this method, we found that the human hnRNP A1 RRMs domain (UP1 domain) bound RNA motifs containing AGG and AG sequences. The EMSA experiment indicated that hnRNP A1 RRMs could bind the obtained RNA motif. Taken together, this approach provides a rapid and effective way to study the RNA binding specificity of proteins.
DynaMIT: the dynamic motif integration toolkit

PubMed Central

Dassi, Erik; Quattrone, Alessandro

2016-01-01

De-novo motif search is a frequently applied bioinformatics procedure to identify and prioritize recurrent elements in sequence sets for biological investigation, such as those derived from high-throughput differential expression experiments. Several algorithms have been developed to perform motif search, employing widely different approaches and often giving divergent results. In order to maximize the power of these investigations and ultimately be able to draft solid biological hypotheses, there is a need to apply multiple tools to the same sequences and merge the obtained results. However, motif reporting formats and statistical evaluation methods currently make such an integration task difficult to perform and mostly restricted to specific scenarios. We thus introduce here the Dynamic Motif Integration Toolkit (DynaMIT), an extremely flexible platform that allows users to identify motifs employing multiple algorithms, integrate them by means of a user-selected strategy and visualize results in several ways; furthermore, the platform is user-extendible in all its aspects. DynaMIT is freely available at http://cibioltg.bitbucket.org. PMID:26253738
Non-B DB v2.0: a database of predicted non-B DNA-forming motifs and its associated tools.

PubMed

Cer, Regina Z; Donohue, Duncan E; Mudunuri, Uma S; Temiz, Nuri A; Loss, Michael A; Starner, Nathan J; Halusa, Goran N; Volfovsky, Natalia; Yi, Ming; Luke, Brian T; Bacolla, Albino; Collins, Jack R; Stephens, Robert M

2013-01-01

The non-B DB, available at http://nonb.abcc.ncifcrf.gov, catalogs predicted non-B DNA-forming sequence motifs, including Z-DNA, G-quadruplex, A-phased repeats, inverted repeats, mirror repeats, direct repeats and their corresponding subsets: cruciforms, triplexes and slipped structures, in several genomes. Version 2.0 of the database revises and re-implements the motif discovery algorithms to better align with accepted definitions and thresholds for motifs, expands the non-B DNA-forming motifs coverage by including short tandem repeats and adds key visualization tools to compare motif locations relative to other genomic annotations. Non-B DB v2.0 extends the ability for comparative genomics by including re-annotation of the five organisms reported in non-B DB v1.0, human, chimpanzee, dog, macaque and mouse, and adds seven additional organisms: orangutan, rat, cow, pig, horse, platypus and Arabidopsis thaliana. Additionally, the non-B DB v2.0 provides an overall improved graphical user interface and faster query performance.
The Role of BRCA1 Domains and Motifs in Tumor Suppression

DTIC Science & Technology

2011-08-01

Predicted taxonomic patterns in pheromone production by longhorned beetles

NASA Astrophysics Data System (ADS)

Ray, Ann M.; Lacey, Emerson S.; Hanks, Lawrence M.

2006-11-01

Males of five species of three tribes in the longhorned beetle subfamily Cerambycinae produce volatile pheromones that share a structural motif (hydroxyl or carbonyl groups at carbons two and three in straight-chains of six, eight, or ten carbons). Pheromone gland pores are present on the prothoraces of males, but are absent in females, suggesting that male-specific gland pores could provide a convenient morphological indication that a species uses volatile pheromones. In this article, we assess the taxonomic distribution of gland pores within the Cerambycinae by examining males and females of 65 species in 24 tribes using scanning electron microscopy. Gland pores were present in males and absent in females of 49 species, but absent in both sexes of the remaining 16 species. Pores were confined to indentations in the cuticle. Among the species that had male-specific gland pores were four species already known to produce volatile compounds consistent with the structural motif. These findings support the initial assumption that gland pores are associated with the production of pheromones by males. There were apparently no taxonomic patterns in the presence of gland pores. These findings suggest that volatile pheromones play an important role in reproduction for many species of the Cerambycinae, and that the trait is evolutionarily labile.
Understanding system dynamics of an adaptive enzyme network from globally profiled kinetic parameters.

PubMed

Chiang, Austin W T; Liu, Wei-Chung; Charusanti, Pep; Hwang, Ming-Jing

2014-01-15

A major challenge in mathematical modeling of biological systems is to determine how model parameters contribute to systems dynamics. As biological processes are often complex in nature, it is desirable to address this issue using a systematic approach. Here, we propose a simple methodology that first performs an enrichment test to find patterns in the values of globally profiled kinetic parameters with which a model can produce the required system dynamics; this is then followed by a statistical test to elucidate the association between individual parameters and different parts of the system's dynamics. We demonstrate our methodology on a prototype biological system of perfect adaptation dynamics, namely the chemotaxis model for Escherichia coli. Our results agreed well with those derived from experimental data and theoretical studies in the literature. Using this model system, we showed that there are motifs in kinetic parameters and that these motifs are governed by constraints of the specified system dynamics. A systematic approach based on enrichment statistical tests has been developed to elucidate the relationships between model parameters and the roles they play in affecting system dynamics of a prototype biological network. The proposed approach is generally applicable and therefore can find wide use in systems biology modeling research.
