Science.gov

Sample records for integrated sequence motif

  1. iMotifs: an integrated sequence motif visualization and analysis environment

    PubMed Central

    Piipari, Matias; Down, Thomas A.; Saini, Harpreet; Enright, Anton; Hubbard, Tim J.P.

    2010-01-01

    Motivation: Short sequence motifs are an important class of models in molecular biology, used most commonly for describing transcription factor binding site specificity patterns. High-throughput methods have been recently developed for detecting regulatory factor binding sites in vivo and in vitro and consequently high-quality binding site motif data are becoming available for an increasing number of organisms and regulatory factors. Development of intuitive tools for the study of sequence motifs is therefore important. iMotifs is a graphical motif analysis environment that allows visualization of annotated sequence motifs and scored motif hits in sequences. It also offers motif inference with the sensitive NestedMICA algorithm, as well as overrepresentation and pairwise motif matching capabilities. All of the analysis functionality is provided without the need to convert between file formats or learn different command line interfaces. The application includes a bundled and graphically integrated version of the NestedMICA motif inference suite that has no outside dependencies. Problems associated with local deployment of software are therefore avoided. Availability: iMotifs is licensed with the GNU Lesser General Public License v2.0 (LGPL 2.0). The software and its source are available at http://wiki.github.com/mz2/imotifs and can be run on Mac OS X Leopard (Intel/PowerPC). We also provide a cross-platform (Linux, OS X, Windows) LGPL 2.0 licensed library libxms for the Perl, Ruby, R and Objective-C programming languages for input and output of XMS formatted annotated sequence motif set files. Contact: matias.piipari@gmail.com; imotifs@googlegroups.com PMID:20106815

  2. AliBiMotif: integrating alignment and biclustering to unravel transcription factor binding sites in DNA sequences.

    PubMed

    Gonçalves, Joana P; Moreau, Yves; Madeira, Sara C

    2012-01-01

    Transcription Factors (TFs) control transcription by binding to specific sites in the promoter regions of the target genes, which can be modelled by structured motifs. In this paper we propose AliBiMotif, a method combining sequence alignment and a biclustering approach based on efficient string matching techniques using suffix trees to unravel approximately conserved sets of blocks (structured motifs) while straightforwardly disregarding non-conserved stretches in-between. The ability to ignore the width of non-conserved regions is a major advantage of the proposed method over other motif finders, as the lengths of the binding sites are usually easier to estimate than the separating distances.

  3. Motif Yggdrasil: sampling sequence motifs from a tree mixture model.

    PubMed

    Andersson, Samuel A; Lagergren, Jens

    2007-06-01

    In phylogenetic foot-printing, putative regulatory elements are found in upstream regions of orthologous genes by searching for common motifs. Motifs in different upstream sequences are subject to mutations along the edges of the corresponding phylogenetic tree; consequently, taking advantage of the tree in the motif search is an appealing idea. We describe the Motif Yggdrasil sampler, the first Gibbs sampler based on a general tree that uses unaligned sequences. Previous tree-based Gibbs samplers have assumed a star-shaped tree or partially aligned upstream regions. We give a probabilistic model (MY model) describing upstream sequences with regulatory elements and build a Gibbs sampler with respect to this model. The model allows toggling, i.e., the restriction of a position to a subset of nucleotides, but does not require aligned sequences nor edge lengths, which may be difficult to come by. We apply the collapsing technique to eliminate the need to sample nuisance parameters, and give a derivation of the predictive update formula. We show that the MY model improves the modeling of difficult motif instances and that the use of the tree achieves a substantial increase in nucleotide level correlation coefficient both for synthetic data and 37 bacterial lexA genes. We investigate the sensitivity to errors in the tree and show that using random trees the MY sampler still has a performance similar to the original version.

  4. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    SciTech Connect

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by patterns in

  5. Occurrence probability of structured motifs in random sequences.

    PubMed

    Robin, S; Daudin, J-J; Richard, H; Sagot, M-F; Schbath, S

    2002-01-01

    The problem of extracting from a set of nucleic acid sequences motifs which may have biological function is more and more important. In this paper, we are interested in particular motifs that may be implicated in the transcription process. These motifs, called structured motifs, are composed of two ordered parts separated by a variable distance and allowing for substitutions. In order to assess their statistical significance, we propose approximations of the probability of occurrences of such a structured motif in a given sequence. An application of our method to evaluate candidate promoters in E. coli and B. subtilis is presented. Simulations show the goodness of the approximations.

  6. Probabilistic models for semisupervised discriminative motif discovery in DNA sequences.

    PubMed

    Kim, Jong Kyoung; Choi, Seungjin

    2011-01-01

    Methods for discriminative motif discovery in DNA sequences identify transcription factor binding sites (TFBSs), searching only for patterns that differentiate two sets (positive and negative sets) of sequences. On one hand, discriminative methods increase the sensitivity and specificity of motif discovery, compared to generative models. On the other hand, generative models can easily exploit unlabeled sequences to better detect functional motifs when labeled training samples are limited. In this paper, we develop a hybrid generative/discriminative model which enables us to make use of unlabeled sequences in the framework of discriminative motif discovery, leading to semisupervised discriminative motif discovery. Numerical experiments on yeast ChIP-chip data for discovering DNA motifs demonstrate that the best performance is obtained between the purely-generative and the purely-discriminative models, and that semisupervised learning improves performance when labeled sequences are limited.

  7. Retroviruses integrate into a shared, non-palindromic DNA motif.

    PubMed

    Kirk, Paul D W; Huvet, Maxime; Melamed, Anat; Maertens, Goedele N; Bangham, Charles R M

    2016-11-14

    Many DNA-binding factors, such as transcription factors, form oligomeric complexes with structural symmetry that bind to palindromic DNA sequences(1). Palindromic consensus nucleotide sequences are also found at the genomic integration sites of retroviruses(2-6) and other transposable elements(7-9), and it has been suggested that this palindromic consensus arises as a consequence of the structural symmetry in the integrase complex(2,3). However, we show here that the palindromic consensus sequence is not present in individual integration sites of human T-cell lymphotropic virus type 1 (HTLV-1) and human immunodeficiency virus type 1 (HIV-1), but arises in the population average as a consequence of the existence of a non-palindromic nucleotide motif that occurs in approximately equal proportions on the plus strand and the minus strand of the host genome. We develop a generally applicable algorithm to sort the individual integration site sequences into plus-strand and minus-strand subpopulations, and use this to identify the integration site nucleotide motifs of five retroviruses of different genera: HTLV-1, HIV-1, murine leukaemia virus (MLV), avian sarcoma leucosis virus (ASLV) and prototype foamy virus (PFV). The results reveal a non-palindromic motif that is shared between these retroviruses.
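    The averaging effect described in this abstract can be illustrated with a small sketch. It is not the authors' sorting algorithm: it only shows, for a made-up non-palindromic site (`ACGGTA`, chosen purely for illustration), that pooling equal numbers of plus-strand and minus-strand copies yields position counts that are reverse-complement symmetric, i.e. a palindromic population average.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(site):
    """Reverse complement of a DNA string over A/C/G/T."""
    return "".join(COMPLEMENT[b] for b in reversed(site))

def column_counts(sites):
    """Per-position nucleotide counts for equal-length sites."""
    length = len(sites[0])
    return [{b: sum(s[i] == b for s in sites) for b in "ACGT"}
            for i in range(length)]

def is_palindromic(counts):
    """True if the count matrix equals its own reverse complement."""
    n = len(counts)
    return all(counts[i][b] == counts[n - 1 - i][COMPLEMENT[b]]
               for i in range(n) for b in "ACGT")

# Hypothetical data: the same non-palindromic site seen on both strands.
plus_sites = ["ACGGTA"] * 50
minus_sites = [reverse_complement(s) for s in plus_sites]

print(is_palindromic(column_counts(plus_sites)))                # one strand only
print(is_palindromic(column_counts(plus_sites + minus_sites)))  # pooled strands
```

    Real integration-site data are noisy, so the symmetry would only hold approximately there; the exact equality above is a consequence of the idealized input.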

  8. MISCORE: a new scoring function for characterizing DNA regulatory motifs in promoter sequences

    PubMed Central

    2012-01-01

    Background Computational approaches for finding DNA regulatory motifs in promoter sequences are useful to biologists in terms of reducing the experimental costs and speeding up the discovery process of de novo binding sites. It is important for rule-based or clustering-based motif searching schemes to effectively and efficiently evaluate the similarity between a k-mer (a k-length subsequence) and a motif model, without assuming the independence of nucleotides in motif models or without employing computationally expensive Markov chain models to estimate the background probabilities of k-mers. Also, it is interesting and beneficial to use a priori knowledge in developing advanced searching tools. Results This paper presents a new scoring function, termed MISCORE, for functional motif characterization and evaluation. Our MISCORE is free from: (i) any assumption on model dependency; and (ii) the use of Markov chain model for background modeling. It integrates the compositional complexity of motif instances into the function. Performance evaluations with comparison to the well-known Maximum a Posteriori (MAP) score and Information Content (IC) have shown that MISCORE has promising capabilities to separate and recognize functional DNA motifs and their instances from non-functional ones. Conclusions MISCORE is a fast computational tool for candidate motif characterization, evaluation and selection. It allows a priori known motif models to be embedded for computing motif-to-motif similarity, which is more advantageous than IC and MAP score. In addition to these merits mentioned above, MISCORE can automatically filter out some repetitive k-mers from a motif model due to the introduction of the compositional complexity in the function. Consequently, the merits of our proposed MISCORE in terms of both motif signal modeling power and computational efficiency will make it more applicable in the development of computational motif discovery tools. PMID:23282090

  9. The distribution of RNA motifs in natural sequences.

    PubMed

    Bourdeau, V; Ferbeyre, G; Pageau, M; Paquin, B; Cedergren, R

    1999-11-15

    Functional analysis of genome sequences has largely ignored RNA genes and their structures. We introduce here the notion of 'ribonomics' to describe the search for the distribution of and eventually the determination of the physiological roles of these RNA structures found in the sequence databases. The utility of this approach is illustrated here by the identification in the GenBank database of RNA motifs having known binding or chemical activity. The frequency of these motifs indicates that most have originated from evolutionary drift and are selectively neutral. On the other hand, their distribution among species and their location within genes suggest that the destiny of these motifs may be more elaborate. For example, the hammerhead motif has a skewed organismal presence, is phylogenetically stable and recent work on a schistosome version confirms its in vivo biological activity. The under-representation of the valine-binding motif and the Rev-binding element in GenBank hints at a detrimental effect on cell growth or viability. Data on the presence and the location of these motifs may provide critical guidance in the design of experiments directed towards the understanding and the manipulation of RNA complexes and activities in vivo.

  10. Chaotic motif sampler: detecting motifs from biological sequences by using chaotic neurodynamics

    NASA Astrophysics Data System (ADS)

    Matsuura, Takafumi; Ikeguchi, Tohru

    Identifying a region in biological sequences, the motif extraction problem (MEP), is a task addressed in bioinformatics. However, the MEP is an NP-hard problem. Therefore, it is almost impossible to obtain an optimal solution within a reasonable time frame. To find near optimal solutions for NP-hard combinatorial optimization problems such as traveling salesman problems, quadratic assignment problems, and vehicle routing problems, chaotic search, which is one of the deterministic approaches, has been proposed and exhibits better performance than stochastic approaches. In this paper, we propose a new alignment method that employs chaotic dynamics to solve the MEPs. It is called the Chaotic Motif Sampler. We show that the performance of the Chaotic Motif Sampler is considerably better than that of the conventional methods such as the Gibbs Site Sampler and the Neighborhood Optimization for Multiple Alignment Discovery.

  11. Identification of disease-specific motifs in the antibody specificity repertoire via next-generation sequencing

    PubMed Central

    Pantazes, Robert J.; Reifert, Jack; Bozekowski, Joel; Ibsen, Kelly N.; Murray, Joseph A.; Daugherty, Patrick S.

    2016-01-01

    Disease-specific antibodies can serve as highly effective biomarkers but have been identified for only a relatively small number of autoimmune diseases. A method was developed to identify disease-specific binding motifs through integration of bacterial display peptide library screening, next-generation sequencing (NGS) and computational analysis. Antibody specificity repertoires were determined by identifying bound peptide library members for each specimen using cell sorting and performing NGS. A computational algorithm, termed Identifying Motifs Using Next-generation sequencing Experiments (IMUNE), was developed and applied to discover disease- and healthy control-specific motifs. IMUNE performs comprehensive pattern searches, identifies patterns statistically enriched in the disease or control groups and clusters the patterns to generate motifs. Using celiac disease sera as a discovery set, IMUNE identified a consensus motif (QPEQPF[PS]E) with high diagnostic sensitivity and specificity in a validation sera set, in addition to novel motifs. Peptide display and sequencing (Display-Seq) coupled with IMUNE analysis may thus be useful to characterize antibody repertoires and identify disease-specific antibody epitopes and biomarkers. PMID:27481573
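    As a small illustration of how a consensus motif such as QPEQPF[PS]E can be applied to peptide sequences, the sketch below searches for it with a regular expression. It assumes the bracket notation means "P or S at this position", as in regular expressions; it is not part of IMUNE, and the example peptides are invented.

```python
import re

# Consensus motif quoted in the abstract, used directly as a regular expression.
# Assumption: [PS] denotes "either P or S" at that position.
CELIAC_MOTIF = re.compile(r"QPEQPF[PS]E")

def find_motif(peptide):
    """Return (0-based start, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in CELIAC_MOTIF.finditer(peptide)]

# Hypothetical peptides, for illustration only.
peptides = ["AAQPEQPFPEGG", "QPEQPFSE", "QPEQPFAE"]
hits = {p: find_motif(p) for p in peptides}

for peptide, found in hits.items():
    print(peptide, found)
```

    The third peptide has A at the variable position and is therefore not reported as a hit.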

  12. WildSpan: mining structured motifs from protein sequences

    PubMed Central

    2011-01-01

    Background Automatic extraction of motifs from biological sequences is an important research problem in the study of molecular biology. For proteins, it is desired to discover sequence motifs containing a large number of wildcard symbols, as the residues associated with functional sites are usually largely separated in sequences. Discovering such patterns is time-consuming because abundant combinations exist when long gaps (a gap consists of one or more successive wildcards) are considered. Mining algorithms often employ constraints to narrow down the search space in order to increase efficiency. However, improper constraint models might degrade the sensitivity and specificity of the motifs discovered by computational methods. We previously proposed a new constraint model to handle large wildcard regions for discovering functional motifs of proteins. The patterns that satisfy the proposed constraint model are called W-patterns. A W-pattern is a structured motif that groups motif symbols into pattern blocks interleaved with large irregular gaps. Considering large gaps reflects the fact that functional residues are not always from a single region of protein sequences, and restricting motif symbols into clusters corresponds to the observation that short motifs are frequently present within protein families. To efficiently discover W-patterns for large-scale sequence annotation and function prediction, this paper first formally introduces the problem to solve and proposes an algorithm named WildSpan (sequential pattern mining across large wildcard regions) that incorporates several pruning strategies to largely reduce the mining cost. Results WildSpan is shown to efficiently find W-patterns containing conserved residues that are far separated in sequences. We conducted experiments with two mining strategies, protein-based and family-based mining, to evaluate the usefulness of W-patterns and performance of WildSpan. The protein-based mining mode of WildSpan is developed for

  13. A generalized profile syntax for biomolecular sequence motifs and its function in automatic sequence interpretation

    SciTech Connect

    Bucher, P.; Bairoch, A.

    1994-12-31

    A general syntax for expressing biomolecular sequence motifs is described, which will be used in future releases of the PROSITE data bank and in a similar collection of nucleic acid sequence motifs currently under development. The central part of the syntax is a regular structure which can be viewed as a generalization of the profiles introduced by Gribskov and coworkers. Accessory features implement specific motif search strategies and provide information helpful for the interpretation of predicted matches. Two contrasting examples, representing E. coli promoters and SH3 domains respectively, are shown to demonstrate the versatility of the syntax, and its compatibility with diverse motif search methods. It is argued that a comprehensive machine-readable motif collection based on the new syntax, in conjunction with a standard search program, can serve as a general-purpose sequence interpretation and function prediction tool.

  14. Finding sequence motifs in groups of functionally related proteins.

    PubMed

    Smith, H O; Annau, T M; Chandrasegaran, S

    1990-01-01

    We have developed a method for rapidly finding patterns of conserved amino acid residues (motifs) in groups of functionally related proteins. All 3-amino acid patterns in a group of proteins of the type aa1 d1 aa2 d2 aa3, where d1 and d2 are distances that can be varied in a range up to 24 residues, are accumulated into an array. Segments of the proteins containing those patterns that occur most frequently are aligned on each other by a scoring method that obtains an average relatedness value for all the amino acids in each column of the aligned sequence block based on the Dayhoff relatedness odds matrix. The automated method successfully finds and displays nearly all of the sequence motifs that have been previously reported to occur in 33 reverse transcriptases, 18 DNA integrases, and 30 DNA methyltransferases.
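    The pattern-accumulation step described above (all patterns of the form aa1 d1 aa2 d2 aa3) can be sketched as follows. This is a minimal illustration, not the authors' program: it assumes that a distance is the offset between residue positions (d = 1 means adjacent residues) and that each pattern is counted once per protein. Neither detail is stated in the abstract, and the alignment and Dayhoff scoring steps are not covered.

```python
from collections import Counter

def triplet_patterns(seq, max_d=24):
    """Yield (aa1, d1, aa2, d2, aa3) for every residue triple in seq.

    Assumption: d1 and d2 are position offsets, each in 1..max_d.
    """
    n = len(seq)
    for i in range(n):
        for d1 in range(1, max_d + 1):
            j = i + d1
            if j >= n:
                break
            for d2 in range(1, max_d + 1):
                k = j + d2
                if k >= n:
                    break
                yield (seq[i], d1, seq[j], d2, seq[k])

def count_patterns(proteins, max_d=24):
    """Count the number of proteins in which each pattern occurs."""
    counts = Counter()
    for seq in proteins:
        # set() so that a pattern is counted at most once per protein
        counts.update(set(triplet_patterns(seq, max_d)))
    return counts

# Hypothetical toy sequences, for illustration only.
proteins = ["ACDEF", "ACD", "AXCDXF"]
counts = count_patterns(proteins, max_d=3)
print(counts.most_common(3))
```

    Patterns shared by the most proteins would then be the candidates for the alignment step that the abstract describes.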

  15. Computational definition of sequence motifs governing constitutive exon splicing.

    PubMed

    Zhang, Xiang H-F; Chasin, Lawrence A

    2004-06-01

    We have searched for sequence motifs that contribute to the recognition of human pre-mRNA splice sites by comparing the frequency of 8-mers in internal noncoding exons versus unspliced pseudo exons and 5' untranslated regions (UTRs) of transcripts of intronless genes. This type of comparison avoids the isolation of sequences that are distinguished by their protein-coding information. We classified sequence families comprising 2069 putative exonic enhancers and 974 putative exonic silencers. Representatives of each class functioned as enhancers or silencers when inserted into a test exon and assayed in transfected mammalian cells. As a class, the enhancer sequences were more prevalent and the silencer elements less prevalent in all exons compared with introns. A survey of 58 reported exonic splicing mutations showed good agreement between the splicing phenotype and the effect of the mutation on the motifs defined here. The large number of effective sequences implied by these results suggests that sequences that influence splicing may be very abundant in pre-mRNA.
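    The core comparison here, k-mer frequencies in one sequence set versus another, can be sketched in a few lines. The abstract does not give the statistic the authors used, so the log2 frequency ratio with a pseudocount below is a stand-in chosen for illustration, and the function names are hypothetical.

```python
from collections import Counter
from math import log2

def kmer_counts(seqs, k=8):
    """Count all overlapping k-mers across a collection of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def log2_enrichment(foreground, background, k=8, pseudo=1.0):
    """log2 ratio of relative k-mer frequencies, foreground over background.

    Assumption: a simple additive pseudocount stands in for the
    (unspecified) statistical test used in the study.
    """
    fg, bg = kmer_counts(foreground, k), kmer_counts(background, k)
    kmers = set(fg) | set(bg)
    fg_total = sum(fg.values()) + pseudo * len(kmers)
    bg_total = sum(bg.values()) + pseudo * len(kmers)
    return {w: log2(((fg[w] + pseudo) / fg_total) /
                    ((bg[w] + pseudo) / bg_total))
            for w in kmers}

# Hypothetical toy input with k=3 so the numbers are easy to follow.
scores = log2_enrichment(["AAAAAA"], ["AAACCC"], k=3)
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(word, round(score, 3))
```

    Positive scores mark words that are relatively more frequent in the foreground set (candidate enhancers in the study's setting), negative scores the reverse.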

  16. Identification of imine reductase-specific sequence motifs.

    PubMed

    Fademrecht, Silvia; Scheller, Philipp N; Nestl, Bettina M; Hauer, Bernhard; Pleiss, Jürgen

    2016-05-01

    Chiral amines are valuable building blocks for the production of a variety of pharmaceuticals, agrochemicals and other specialty chemicals. Only recently, imine reductases (IREDs) were discovered which catalyze the stereoselective reduction of imines to chiral amines. Although several IREDs were biochemically characterized in the last few years, knowledge of the reaction mechanism and the molecular basis of substrate specificity and stereoselectivity is limited. To gain further insights into the sequence-function relationships, the Imine Reductase Engineering Database (www.IRED.BioCatNet.de) was established and a systematic analysis of 530 putative IREDs was performed. A standard numbering scheme based on R-IRED-Sk was introduced to facilitate the identification and communication of structurally equivalent positions in different proteins. A conservation analysis revealed a highly conserved cofactor binding region and a predominantly hydrophobic substrate binding cleft. Two IRED-specific motifs were identified, the cofactor binding motif GLGxMGx(5)[ATS]x(4)Gx(4)[VIL]WNR[TS]x(2)[KR] and the active site motif Gx[DE]x[GDA]x[APS]x(3){K}x[ASL]x[LMVIAG]. Our results indicate a preference toward NADPH for all IREDs and explain why, despite their sequence similarity to β-hydroxyacid dehydrogenases (β-HADs), no conversion of β-hydroxyacids has been observed. Superfamily-specific conservations were investigated to explore the molecular basis of their stereopreference. Based on our analysis and previous experimental results on IRED mutants, an exclusive role of standard position 187 for stereoselectivity is excluded. Alternatively, two standard positions 139 and 194 were identified which are superfamily-specifically conserved and differ in R- and S-selective enzymes.
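    To scan protein sequences for the two motifs quoted in this abstract, they can be translated into regular expressions. The sketch below assumes PROSITE-like conventions, which the abstract does not spell out: `x` is any residue, `x(n)` is n arbitrary residues, `[...]` lists allowed residues and `{...}` lists excluded residues. The helper name `motif_to_regex` and the test peptide are hypothetical.

```python
import re

def motif_to_regex(motif):
    """Translate a PROSITE-like motif string into a regular expression.

    Assumed conventions: x = any residue, (n) = repeat count,
    [..] = allowed residues, {..} = excluded residues.
    """
    motif = motif.replace(" ", "")  # tolerate stray spaces in the source text
    out, i = [], 0
    while i < len(motif):
        c = motif[i]
        if c == "x":
            out.append(".")
            i += 1
        elif c == "[":
            j = motif.index("]", i)
            out.append(motif[i:j + 1])
            i = j + 1
        elif c == "{":
            j = motif.index("}", i)
            out.append("[^" + motif[i + 1:j] + "]")
            i = j + 1
        elif c == "(":
            j = motif.index(")", i)
            out.append("{" + motif[i + 1:j] + "}")
            i = j + 1
        elif c.isupper():
            out.append(c)
            i += 1
        else:
            raise ValueError("unexpected character %r in motif" % c)
    return "".join(out)

COFACTOR = motif_to_regex("GLGxMGx(5)[ATS]x(4)Gx(4)[VIL]WNR[TS]x(2)[KR]")
ACTIVE_SITE = motif_to_regex("Gx[DE]x[GDA]x[APS]x(3){K}x[ASL]x[LMVIAG]")

# Hypothetical peptide built to satisfy the active site motif.
print(ACTIVE_SITE)
print(bool(re.search(ACTIVE_SITE, "MMGADAGAAAAARAAALMM")))
```

    Using `re.search` reports a hit anywhere in the sequence; `re.finditer` would give the positions of all non-overlapping hits.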

  17. Phosphotyrosine Substrate Sequence Motifs for Dual Specificity Phosphatases

    PubMed Central

    Zhao, Bryan M.; Keasey, Sarah L.; Tropea, Joseph E.; Lountos, George T.; Dyas, Beverly K.; Cherry, Scott; Raran-Kurussi, Sreejith; Waugh, David S.; Ulrich, Robert G.

    2015-01-01

    Protein tyrosine phosphatases dephosphorylate tyrosine residues of proteins, whereas, dual specificity phosphatases (DUSPs) are a subgroup of protein tyrosine phosphatases that dephosphorylate not only Tyr(P) residue, but also the Ser(P) and Thr(P) residues of proteins. The DUSPs are linked to the regulation of many cellular functions and signaling pathways. Though many cellular targets of DUSPs are known, the relationship between catalytic activity and substrate specificity is poorly defined. We investigated the interactions of peptide substrates with select DUSPs of four types: MAP kinases (DUSP1 and DUSP7), atypical (DUSP3, DUSP14, DUSP22 and DUSP27), viral (variola VH1), and Cdc25 (A-C). Phosphatase recognition sites were experimentally determined by measuring dephosphorylation of 6,218 microarrayed Tyr(P) peptides representing confirmed and theoretical phosphorylation motifs from the cellular proteome. A broad continuum of dephosphorylation was observed across the microarrayed peptide substrates for all phosphatases, suggesting a complex relationship between substrate sequence recognition and optimal activity. Further analysis of peptide dephosphorylation by hierarchical clustering indicated that DUSPs could be organized by substrate sequence motifs, and peptide-specificities by phylogenetic relationships among the catalytic domains. The most highly dephosphorylated peptides represented proteins from 29 cell-signaling pathways, greatly expanding the list of potential targets of DUSPs. These newly identified DUSP substrates will be important for examining structure-activity relationships with physiologically relevant targets. PMID:26302245

  18. cisExpress: motif detection in DNA sequences

    PubMed Central

    Triska, Martin; Grocutt, David; Southern, James; Murphy, Denis J.; Tatarinova, Tatiana

    2013-01-01

    Motivation: One of the major challenges for contemporary bioinformatics is the analysis and accurate annotation of genomic datasets to enable extraction of useful information about the functional role of DNA sequences. This article describes a novel genome-wide statistical approach to the detection of specific DNA sequence motifs based on similarities between the promoters of similarly expressed genes. This new tool, cisExpress, is especially designed for use with large datasets, such as those generated by publicly accessible whole genome and transcriptome projects. cisExpress uses a task farming algorithm to exploit all available computational cores within a shared memory node. We demonstrate the robust nature and validity of the proposed method. It is applicable for use with a wide range of genomic databases for any species of interest. Availability: cisExpress is available at www.cisexpress.org. Contact: tatiana.tatarinova@usc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23793750

  19. Characterizing regulatory path motifs in integrated networks using perturbational data

    PubMed Central

    2010-01-01

    We introduce Pathicular (http://bioinformatics.psb.ugent.be/software/details/Pathicular), a Cytoscape plugin for studying the cellular response to perturbations of transcription factors by integrating perturbational expression data with transcriptional, protein-protein and phosphorylation networks. Pathicular searches for 'regulatory path motifs', short paths in the integrated physical networks which occur significantly more often than expected between transcription factors and their targets in the perturbational data. A case study in Saccharomyces cerevisiae identifies eight regulatory path motifs and demonstrates their biological significance. PMID:20230615

  20. Improved K-means clustering algorithm for exploring local protein sequence motifs representing common structural property.

    PubMed

    Zhong, Wei; Altun, Gulsah; Harrison, Robert; Tai, Phang C; Pan, Yi

    2005-09-01

    Information about local protein sequence motifs is very important to the analysis of biologically significant conserved regions of protein sequences. These conserved regions can potentially determine the diverse conformation and activities of proteins. In this work, recurring sequence motifs of proteins are explored with an improved K-means clustering algorithm on a new dataset. The structural similarity of these recurring sequence clusters to produce sequence motifs is studied in order to evaluate the relationship between sequence motifs and their structures. To the best of our knowledge, the dataset used by our research is the most updated dataset among similar studies for sequence motifs. A new greedy initialization method for the K-means algorithm is proposed to improve traditional K-means clustering techniques. The new initialization method tries to choose suitable initial points, which are well separated and have the potential to form high-quality clusters. Our experiments indicate that the improved K-means algorithm satisfactorily increases the percentage of sequence segments belonging to clusters with high structural similarity. Careful comparison of sequence motifs obtained by the improved and traditional algorithms also suggests that the improved K-means clustering algorithm may discover some relatively weak and subtle sequence motifs, which are undetectable by the traditional K-means algorithms. Many biochemical tests reported in the literature show that these sequence motifs are biologically meaningful. Experimental results also indicate that the improved K-means algorithm generates more detailed sequence motifs representing common structures than previous research. Furthermore, these motifs are universally conserved sequence patterns across protein families, overcoming some weak points of other popular sequence motifs. The satisfactory result of the experiment suggests that this new K-means algorithm may be applied to other areas of bioinformatics

  1. Illumina MiSeq sequencing disfavours a sequence motif in the GFP reporter gene

    PubMed Central

    Van den Hoecke, Silvie; Verhelst, Judith; Saelens, Xavier

    2016-01-01

    Green fluorescent protein (GFP) is one of the most used reporter genes. We have used next-generation sequencing (NGS) to analyse the genetic diversity of a recombinant influenza A virus that expresses GFP and found a remarkable coverage dip in the GFP coding sequence. This coverage dip was present when virus-derived RT-PCR product or the parental plasmid DNA was used as starting material for NGS and regardless of whether Nextera XT transposase or Covaris shearing was used for DNA fragmentation. Therefore, the sequence coverage dip in the GFP coding sequence was not the result of emerging GFP mutant viruses or a bias introduced by Nextera XT fragmentation. Instead, we found that the Illumina MiSeq sequencing method disfavours the ‘CCCGCC’ motif in the GFP coding sequence. PMID:27193250
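    A practical consequence of this finding is that one may want to check a construct for the 'CCCGCC' motif before interpreting a local coverage dip. The sketch below locates the motif in a DNA string. Also reporting the reverse complement is an assumption added here for completeness, not something the abstract states, and the example sequence is invented.

```python
def find_all(seq, motif):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    starts = []
    i = seq.find(motif)
    while i != -1:
        starts.append(i)
        i = seq.find(motif, i + 1)
    return starts

def reverse_complement(seq):
    """Reverse complement of an upper-case A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

MOTIF = "CCCGCC"  # motif named in the abstract

# Hypothetical sequence, for illustration only.
seq = "ATGCCCGCCAAGGCGGGTT"
forward_hits = find_all(seq, MOTIF)
reverse_hits = find_all(seq, reverse_complement(MOTIF))
print(forward_hits, reverse_hits)
```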

  2. Exhaustive Search for Over-represented DNA Sequence Motifs with CisFinder

    PubMed Central

    Sharov, Alexei A.; Ko, Minoru S.H.

    2009-01-01

    We present CisFinder software, which generates a comprehensive list of motifs enriched in a set of DNA sequences and describes them with position frequency matrices (PFMs). A new algorithm was designed to estimate PFMs directly from counts of n-mer words with and without gaps; then PFMs are extended over gaps and flanking regions and clustered to generate non-redundant sets of motifs. The algorithm successfully identified binding motifs for 12 transcription factors (TFs) in embryonic stem cells based on published chromatin immunoprecipitation sequencing data. Furthermore, CisFinder successfully identified alternative binding motifs of TFs (e.g. POU5F1, ESRRB, and CTCF) and motifs for known and unknown co-factors of genes associated with the pluripotent state of ES cells. CisFinder also showed robust performance in the identification of motifs that were only slightly enriched in a set of DNA sequences. PMID:19740934
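    For readers unfamiliar with position frequency matrices, the sketch below builds one from counts of equal-length words. It is only a minimal illustration of the data structure: CisFinder's actual estimation procedure (gapped words, extension over flanks, clustering) is not described in enough detail in the abstract to reproduce, and the word counts are invented.

```python
def pfm_from_words(word_counts):
    """Build a position frequency matrix from {word: count}.

    All words must have the same length; each column is a dict of
    nucleotide counts weighted by how often the word was seen.
    """
    length = len(next(iter(word_counts)))
    pfm = [{b: 0 for b in "ACGT"} for _ in range(length)]
    for word, n in word_counts.items():
        if len(word) != length:
            raise ValueError("all words must have the same length")
        for i, base in enumerate(word):
            pfm[i][base] += n
    return pfm

def normalize(pfm):
    """Convert column counts to column frequencies."""
    return [{b: c / sum(col.values()) for b, c in col.items()} for col in pfm]

# Hypothetical word counts, for illustration only.
words = {"ACGT": 3, "ACGA": 1}
pfm = pfm_from_words(words)
freqs = normalize(pfm)
for i, col in enumerate(freqs):
    print(i, col)
```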

  3. A survey of motif discovery methods in an integrated framework

    PubMed Central

    Sandve, Geir Kjetil; Drabløs, Finn

    2006-01-01

    Background: There has been a growing interest in computational discovery of regulatory elements, and a multitude of motif discovery methods have been proposed. Computational motif discovery has been used with some success in simple organisms like yeast. However, as we move to higher organisms with more complex genomes, more sensitive methods are needed. Several recent methods try to integrate additional sources of information, including microarray experiments (gene expression and ChIP-chip). There is also a growing awareness that regulatory elements work in combination, and that this combinatorial behavior must be modeled for successful motif discovery. However, the multitude of methods and approaches makes it difficult to get a good understanding of the current status of the field. Results: This paper presents a survey of methods for motif discovery in DNA, based on a structured and well defined framework that integrates all relevant elements. Existing methods are discussed according to this framework. Conclusion: The survey shows that although no single method takes all relevant elements into consideration, a very large number of different models treating the various elements separately have been tried. Very often the choices that have been made are not explicitly stated, making it difficult to compare different implementations. Also, the tests that have been used are often not comparable. Therefore, a stringent framework and improved test methods are needed to evaluate the different approaches in order to conclude which ones are most promising. Reviewers: This article was reviewed by Eugene V. Koonin, Philipp Bucher (nominated by Mikhail Gelfand) and Frank Eisenhaber. PMID:16600018

  4. Computational generation and screening of RNA motifs in large nucleotide sequence pools

    PubMed Central

    Kim, Namhee; Izzo, Joseph A.; Elmetwaly, Shereef; Gan, Hin Hark; Schlick, Tamar

    2010-01-01

    Although identification of active motifs in large random sequence pools is central to RNA in vitro selection, no systematic computational equivalent of this process has yet been developed. We develop a computational approach that combines target pool generation, motif scanning and motif screening using secondary structure analysis for applications to 10^12–10^14-sequence pools; large pool sizes are made possible using program redesign and supercomputing resources. We use the new protocol to search for aptamer and ribozyme motifs in pools up to experimental pool size (10^14 sequences). We show that motif scanning, structure matching and flanking sequence analysis, respectively, reduce the initial sequence pool by 6–8, 1–2 and 1 orders of magnitude, consistent with the rare occurrence of active motifs in random pools. The final yields match the theoretical yields from probability theory for simple motifs and overestimate experimental yields, which constitute lower bounds, for aptamers because screening analyses beyond secondary structure information are not considered systematically. We also show that designed pools using our nucleotide transition probability matrices can produce higher yields for RNA ligase motifs than random pools. Our methods for generating, analyzing and designing large pools can help improve RNA design via simulation of aspects of in vitro selection. PMID:20448026
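    The successive reductions quoted in this abstract can be combined with simple exponent arithmetic. The sketch below starts from a pool of ten to the power 14 sequences (the experimental pool size) and subtracts the reported ranges of 6–8, 1–2 and 1 orders of magnitude. Treating the three ranges as independent, so that their extremes simply add, is an assumption made for illustration, and the function name is hypothetical.

```python
def surviving_exponents(pool_exponent, reductions):
    """Return (lowest, highest) base-10 exponent of the pool after all steps.

    `reductions` is a list of (min_orders, max_orders) removed per step.
    """
    max_total = sum(high for _, high in reductions)  # strongest filtering
    min_total = sum(low for low, _ in reductions)    # weakest filtering
    return pool_exponent - max_total, pool_exponent - min_total


# Motif scanning, structure matching, flanking sequence analysis.
steps = [(6, 8), (1, 2), (1, 1)]
low, high = surviving_exponents(14, steps)
print(f"10^{low} to 10^{high} sequences remain")  # 10^3 to 10^6 sequences remain
```

    Under these assumptions the three filters together remove 8 to 11 orders of magnitude.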

  5. Mitoxantrone and Analogues Bind and Stabilize i-Motif Forming DNA Sequences

    PubMed Central

    Wright, Elisé P.; Day, Henry A.; Ibrahim, Ali M.; Kumar, Jeethendra; Boswell, Leo J. E.; Huguin, Camille; Stevenson, Clare E. M.; Pors, Klaus; Waller, Zoë A. E.

    2016-01-01

    There are hundreds of ligands which can interact with G-quadruplex DNA, yet very few which target i-motif. To appreciate an understanding between the dynamics between these structures and how they can be affected by intervention with small molecule ligands, more i-motif binding compounds are required. Herein we describe how the drug mitoxantrone can bind, induce folding of and stabilise i-motif forming DNA sequences, even at physiological pH. Additionally, mitoxantrone was found to bind i-motif forming sequences preferentially over double helical DNA. We also describe the stabilisation properties of analogues of mitoxantrone. This offers a new family of ligands with potential for use in experiments into the structure and function of i-motif forming DNA sequences. PMID:28004744

  6. Mitoxantrone and Analogues Bind and Stabilize i-Motif Forming DNA Sequences

    NASA Astrophysics Data System (ADS)

    Wright, Elisé P.; Day, Henry A.; Ibrahim, Ali M.; Kumar, Jeethendra; Boswell, Leo J. E.; Huguin, Camille; Stevenson, Clare E. M.; Pors, Klaus; Waller, Zoë A. E.

    2016-12-01

    There are hundreds of ligands which can interact with G-quadruplex DNA, yet very few which target i-motif. To appreciate an understanding between the dynamics between these structures and how they can be affected by intervention with small molecule ligands, more i-motif binding compounds are required. Herein we describe how the drug mitoxantrone can bind, induce folding of and stabilise i-motif forming DNA sequences, even at physiological pH. Additionally, mitoxantrone was found to bind i-motif forming sequences preferentially over double helical DNA. We also describe the stabilisation properties of analogues of mitoxantrone. This offers a new family of ligands with potential for use in experiments into the structure and function of i-motif forming DNA sequences.

  7. Physical-chemical property based sequence motifs and methods regarding same

    DOEpatents

    Braun, Werner; Mathura, Venkatarajan S.; Schein, Catherine H.

    2008-09-09

    A data analysis system, program, and/or method, e.g., a data mining/data exploration method, using physical-chemical property motifs. For example, a sequence database may be searched for identifying segments thereof having physical-chemical properties similar to the physical-chemical property motifs.

  8. Fast and Accurate Discovery of Degenerate Linear Motifs in Protein Sequences

    PubMed Central

    Levy, Emmanuel D.; Michnick, Stephen W.

    2014-01-01

    Linear motifs mediate a wide variety of cellular functions, which makes their characterization in protein sequences crucial to understanding cellular systems. However, the short length and degenerate nature of linear motifs make their discovery a difficult problem. Here, we introduce MotifHound, an algorithm particularly suited for the discovery of small and degenerate linear motifs. MotifHound performs an exact and exhaustive enumeration of all motifs present in proteins of interest, including all of their degenerate forms, and scores the overrepresentation of each motif based on its occurrence in proteins of interest relative to a background (e.g., proteome) using the hypergeometric distribution. To assess MotifHound, we benchmarked it together with state-of-the-art algorithms. The benchmark consists of 11,880 sets of proteins from S. cerevisiae; in each set, we artificially spiked-in one motif varying in terms of three key parameters, (i) number of occurrences, (ii) length and (iii) the number of degenerate or “wildcard” positions. The benchmark enabled the evaluation of the impact of these three properties on the performance of the different algorithms. The results showed that MotifHound and SLiMFinder were the most accurate in detecting degenerate linear motifs. Interestingly, MotifHound was 15 to 20 times faster at comparable accuracy and performed best in the discovery of highly degenerate motifs. We complemented the benchmark by an analysis of proteins experimentally shown to bind the FUS1 SH3 domain from S. cerevisiae. Using the full-length protein partners as sole information, MotifHound recapitulated most experimentally determined motifs binding to the FUS1 SH3 domain. Moreover, these motifs exhibited properties typical of SH3 binding peptides, e.g., high intrinsic disorder and evolutionary conservation, despite the fact that none of these properties were used as prior information. MotifHound is available (http://michnick.bcm.umontreal.ca or http
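    The over-representation score described above rests on the upper tail of the hypergeometric distribution. The following sketch computes that tail probability with exact integer binomials from the standard library. It illustrates only the statistic, not MotifHound's enumeration of degenerate motifs; the function name, argument names and the numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for X ~ Hypergeometric.

    N: proteins in the background (e.g. proteome)
    K: background proteins containing the motif
    n: proteins in the set of interest
    k: proteins of interest containing the motif
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    upper = min(K, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible combinations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(max(k, 0), upper + 1))
    return favourable / comb(N, n)


# Hypothetical numbers: motif in 40 of 6000 background proteins,
# and in 5 of 20 proteins of interest.
print(hypergeom_upper_tail(k=5, K=40, n=20, N=6000))
```

    A small tail probability indicates that the motif occurs in the proteins of interest more often than expected from the background.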

  9. Factoring local sequence composition in motif significance analysis.

    PubMed

    Ng, Patrick; Keich, Uri

    2008-01-01

    We recently introduced a biologically realistic and reliable significance analysis of the output of a popular class of motif finders. In this paper we further improve our significance analysis by incorporating local base composition information. Relying on realistic biological data simulation, as well as on FDR analysis applied to real data, we show that our method is significantly better than the increasingly popular practice of using the normal approximation to estimate the significance of a finder's output. Finally we turn to leveraging our reliable significance analysis to improve the actual motif finding task. Specifically, endowing a variant of the Gibbs Sampler with our improved significance analysis we demonstrate that de novo finders can perform better than has been perceived. Significantly, our new variant outperforms all the finders reviewed in a recently published comprehensive analysis of the Harbison genome-wide binding location data. Interestingly, many of these finders incorporate additional information such as nucleosome positioning and the significance of binding data.

  10. Analysis of Genomic Sequence Motifs for Deciphering Transcription Factor Binding and Transcriptional Regulation in Eukaryotic Cells

    PubMed Central

    Boeva, Valentina

    2016-01-01

    Eukaryotic genomes contain a variety of structured patterns: repetitive elements, binding sites of DNA and RNA associated proteins, splice sites, and so on. Often, these structured patterns can be formalized as motifs and described using a proper mathematical model such as position weight matrix and IUPAC consensus. Two key tasks are typically carried out for motifs in the context of the analysis of genomic sequences. These are: identification in a set of DNA regions of over-represented motifs from a particular motif database, and de novo discovery of over-represented motifs. Here we describe existing methodology to perform these two tasks for motifs characterizing transcription factor binding. When applied to the output of ChIP-seq and ChIP-exo experiments, or to promoter regions of co-modulated genes, motif analysis techniques allow for the prediction of transcription factor binding events and enable identification of transcriptional regulators and co-regulators. The usefulness of motif analysis is further exemplified in this review by how motif discovery improves peak calling in ChIP-seq and ChIP-exo experiments and, when coupled with information on gene expression, allows insights into physical mechanisms of transcriptional modulation. PMID:26941778
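    To make the IUPAC consensus model mentioned above concrete, this minimal sketch expands an IUPAC nucleotide consensus into a regular expression and reports all (possibly overlapping) matches on the forward strand. The IUPAC code table is standard; the function names, the example consensus and the example sequence are hypothetical and are not taken from the review.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(consensus: str) -> str:
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[symbol] for symbol in consensus.upper())


def scan(seq: str, consensus: str) -> list:
    """Return 0-based start positions of all matches, overlaps included."""
    # A lookahead lets the regex engine report overlapping occurrences.
    pattern = re.compile("(?=(" + iupac_to_regex(consensus) + "))")
    return [match.start() for match in pattern.finditer(seq.upper())]


print(iupac_to_regex("TGASTCA"))             # TGA[CG]TCA
print(scan("AATGACTCATTGAGTCA", "TGASTCA"))  # [2, 10]
```

    A position weight matrix scan would replace the exact regular-expression match with a per-position score and a threshold.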

  11. Predicting candidate genomic sequences that correspond to synthetic functional RNA motifs

    PubMed Central

    Laserson, Uri; Gan, Hin Hark; Schlick, Tamar

    2005-01-01

    Riboswitches and RNA interference are important emerging mechanisms found in many organisms to control gene expression. To enhance our understanding of such RNA roles, finding small regulatory motifs in genomes presents a challenge on a wide scale. Many simple functional RNA motifs have been found by in vitro selection experiments, which produce synthetic target-binding aptamers as well as catalytic RNAs, including the hammerhead ribozyme. Motivated by the prediction of Piganeau and Schroeder [(2003) Chem. Biol., 10, 103–104] that synthetic RNAs may have natural counterparts, we develop and apply an efficient computational protocol for identifying aptamer-like motifs in genomes. We define motifs from the sequence and structural information of synthetic aptamers, search for sequences in genomes that will produce motif matches, and then evaluate the structural stability and statistical significance of the potential hits. Our application to aptamers for streptomycin, chloramphenicol, neomycin B and ATP identifies 37 candidate sequences (in coding and non-coding regions) that fold to the target aptamer structures in bacterial and archaeal genomes. Further energetic screening reveals that several candidates exhibit energetic properties and sequence conservation patterns that are characteristic of functional motifs. Besides providing candidates for experimental testing, our computational protocol offers an avenue for expanding natural RNA's functional repertoire. PMID:16254081

  12. GOmotif: A web server for investigating the biological role of protein sequence motifs

    PubMed Central

    2011-01-01

    Background: Many proteins contain conserved sequence patterns (motifs) that contribute to their functionality. The process of experimentally identifying and validating novel protein motifs can be difficult, expensive, and time consuming. A means for helping to identify in advance the possible function of a novel motif is important to test hypotheses concerning the biological relevance of these motifs, thus reducing experimental trial-and-error. Results: GOmotif accepts PROSITE and regular expression formatted motifs as input and searches a Gene Ontology annotated protein database using motif search tools. The search returns the set of proteins containing matching motifs and their associated Gene Ontology terms. These results are presented as: 1) a hierarchical, navigable tree separated into the three Gene Ontology biological domains - biological process, cellular component, and molecular function; 2) corresponding pie charts indicating raw and statistically adjusted distributions of the results, and 3) an interactive graphical network view depicting the location of the results in the Gene Ontology. Conclusions: GOmotif is a web-based tool designed to assist researchers in investigating the biological role of novel protein motifs. GOmotif can be freely accessed at http://www.gomotif.ca PMID:21943350
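    Since the server accepts both PROSITE and regular expression formatted motifs, it may help to see how the two notations relate. The sketch below converts a subset of PROSITE pattern syntax (single residues, x wildcards, [allowed] and {excluded} residue sets, (n) and (n,m) repeats, and the < and > terminal anchors) into a Python regular expression. It is an illustrative converter written for this listing, not GOmotif's implementation, and the function name is hypothetical.

```python
import re

# One PROSITE element: a core symbol plus an optional repeat such as (3) or (2,4).
_ELEMENT = re.compile(
    r"(x|X|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style pattern (supported subset) to a regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        prefix = suffix = ""
        if element.startswith("<"):        # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):          # C-terminal anchor
            suffix, element = "$", element[:-1]
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core in ("x", "X"):             # any residue
            core = "."
        elif core.startswith("{"):         # excluded residues
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


print(prosite_to_regex("N-{P}-[ST]-{P}"))  # N[^P][ST][^P]
print(prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"))
```

    The resulting string can be passed to `re.search` or `re.finditer` to scan protein sequences.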

  13. Recurring sequence-structure motifs in (βα)8-barrel proteins and experimental optimization of a chimeric protein designed based on such motifs.

    PubMed

    Wang, Jichao; Zhang, Tongchuan; Liu, Ruicun; Song, Meilin; Wang, Juncheng; Hong, Jiong; Chen, Quan; Liu, Haiyan

    2017-02-01

    An interesting way of generating novel artificial proteins is to combine sequence motifs from natural proteins, mimicking the evolutionary path suggested by natural proteins comprising recurring motifs. We analyzed the βα and αβ modules of TIM barrel proteins by structure alignment-based sequence clustering. A number of preferred motifs were identified. A chimeric TIM was designed by using recurring elements as mutually compatible interfaces. The foldability of the designed TIM protein was then significantly improved by six rounds of directed evolution. The melting temperature has been improved by more than 20°C. A variety of characteristics suggested that the resulting protein is well-folded. Our analysis provided a library of peptide motifs that is potentially useful for different protein engineering studies. The protein engineering strategy of using recurring motifs as interfaces to connect partial natural proteins may be applied to other protein folds.

  14. Creation of Hybrid Nanorods From Sequences of Natural Trimeric Fibrous Proteins Using the Fibritin Trimerization Motif

    NASA Astrophysics Data System (ADS)

    Papanikolopoulou, Katerina; van Raaij, Mark J.; Mitraki, Anna

    Stable, artificial fibrous proteins that can be functionalized open new avenues in fields such as bionanomaterials design and fiber engineering. An important source of inspiration for the creation of such proteins are natural fibrous proteins such as collagen, elastin, insect silks, and fibers from phages and viruses. The fibrous parts of this last class of proteins usually adopt trimeric, β-stranded structural folds and are appended to globular, receptor-binding domains. It has been recently shown that the globular domains are essential for correct folding and trimerization and can be successfully substituted by a very small (27-amino acid) trimerization motif from phage T4 fibritin. The hybrid proteins are correctly folded nanorods that can withstand extreme conditions. When the fibrous part derives from the adenovirus fiber shaft, different tissue-targeting specificities can be engineered into the hybrid proteins, which therefore can be used as gene therapy vectors. The integration of such stable nanorods in devices is also a big challenge in the field of biomechanical design. The fibritin foldon domain is a versatile trimerization motif and can be combined with a variety of fibrous motifs, such as coiled-coil, collagenous, and triple β-stranded motifs, provided the appropriate linkers are used. The combination of different motifs within the same fibrous molecule to create stable rods with multiple functions can even be envisioned. We provide a comprehensive overview of the experimental procedures used for designing, creating, and characterizing hybrid fibrous nanorods using the fibritin trimerization motif.

  15. Identification of an oligodeoxynucleotide sequence motif that specifically inhibits phosphorylation by protein tyrosine kinases.

    PubMed

    Krieg, A M; Matson, S; Cheng, K; Fisher, E; Koretzky, G A; Koland, J G

    1997-04-01

    Protein tyrosine kinases (PTKs) have central roles in cellular signal transduction. We have identified a sequence motif (CGT[C]GA) in phosphorothioate-modified oligodeoxynucleotides (ODNs) that specifically inhibits the enzymatic activity of recombinant or immunoprecipitated PTK in vitro. Hexamer ODNs containing this motif block both substrate and autophosphorylation of at least four different PTKs but have no apparent effect on the enzymatic activity of a serine/threonine protein kinase. These data suggest possible new applications for ODNs and have implications for the design and interpretation of experiments using antisense or triplex ODNs.

  16. Classification of protein motifs based on subcellular localization uncovers evolutionary relationships at both sequence and functional levels

    PubMed Central

    2013-01-01

    Background: Most proteins have evolved in specific cellular compartments that limit their functions and potential interactions. On the other hand, motifs define amino acid arrangements conserved between protein family members and represent powerful tools for assigning function to protein sequences. The ideal motif would identify all members of a protein family but in practice many motifs identify both family members and unrelated proteins, referred to as True Positive (TP) and False Positive (FP) sequences, respectively. Results: To address the relationship between protein motifs, protein function and cellular localization, we systematically assigned subcellular localization data to motif sequences from the comprehensive PROSITE sequence motif database. Using this data we analyzed relationships between localization and function. We find that TPs and FPs have a strong tendency to localize in different compartments. When multiple localizations are considered, TPs are usually distributed between related cellular compartments. We also identified cases where FPs are concentrated in particular subcellular regions, indicating possible functional or evolutionary relationships with TP sequences of the same motif. Conclusions: Our findings suggest that the systematic examination of subcellular localization has the potential to uncover evolutionary and functional relationships between motif-containing sequences. We believe that this type of analysis complements existing motif annotations and could aid in their interpretation. Our results shed light on the evolution of cellular organelles and potentially establish the basis for new subcellular localization and function prediction algorithms. PMID:23865897

  17. Species-Specific Minimal Sequence Motif for Oligodeoxyribonucleotides Activating Mouse TLR9.

    PubMed

    Pohar, Jelka; Lainšček, Duško; Fukui, Ryutaro; Yamamoto, Chikako; Miyake, Kensuke; Jerala, Roman; Benčina, Mojca

    2015-11-01

    Synthetic oligodeoxyribonucleotides (ODNs) containing unmethylated CpG recapitulate the activation of TLR9 by microbial DNA. ODNs are potent stimulators of the immune response in cells expressing TLR9. Despite extensive use of mice as experimental animals in basic and applied immunological research, the key sequence determinants that govern the activation of mouse TLR9 by ODNs have not been well defined. We performed a systematic investigation of the sequence motif of B class phosphodiester ODNs to identify the sequence properties that govern mouse TLR9 activation. In contrast to ODNs activating human TLR9, where the minimal sequence motif for the receptor activation comprises a pair of closely positioned CpGs, we found that the mouse TLR9 requires a single CpG positioned 4-6 nt from the 5'-end. Activation is augmented by a 5'TCC sequence one to three nucleotides from the CG. The distance of the CG dinucleotide of four to six nucleotides from the 5'-end and the ODN's length fine-tune activation of mouse macrophages. Length of the ODN <23 and >29 nt decreases activation of dendritic cells. The ODNs with minimal sequence induce Th1-type cytokine synthesis in dendritic cells and confirm the expression of cell surface markers in B cells. Identification of the minimal sequence provides an insight into the sequence selectivity of mouse TLR9 and points to the differences in the receptor selectivity between species probably as a result of differences in the receptor binding sites.

  18. Using machine learning to predict gene expression and discover sequence motifs

    NASA Astrophysics Data System (ADS)

    Li, Xuejing

    Recently, large amounts of experimental data for complex biological systems have become available. We use tools and algorithms from machine learning to build data-driven predictive models. We first present a novel algorithm to discover gene sequence motifs associated with temporal expression patterns of genes. Our algorithm, which is based on partial least squares (PLS) regression, is able to directly model the flow of information, from gene sequence to gene expression, to learn cis regulatory motifs and characterize associated gene expression patterns. Our algorithm outperforms traditional computational methods e.g. clustering in motif discovery. We then present a study of extending a machine learning model for transcriptional regulation predictive of genetic regulatory response to Caenorhabditis elegans. We show meaningful results both in terms of prediction accuracy on the test experiments and biological information extracted from the regulatory program. The model discovers DNA binding sites ab initio. We also present a case study where we detect a signal of lineage-specific regulation. Finally we present a comparative study on learning predictive models for motif discovery, based on different boosting algorithms: Adaptive Boosting (AdaBoost), Linear Programming Boosting (LPBoost) and Totally Corrective Boosting (TotalBoost). We evaluate and compare the performance of the three boosting algorithms via both statistical and biological validation, for hypoxia response in Saccharomyces cerevisiae.

  19. Conserved sequence motifs among bacterial, eukaryotic, and archaeal phosphatases that define a new phosphohydrolase superfamily.

    PubMed Central

    Thaller, M. C.; Schippa, S.; Rossolini, G. M.

    1998-01-01

    Members of a new molecular family of bacterial nonspecific acid phosphatases (NSAPs), indicated as class C, were found to share significant sequence similarities to bacterial class B NSAPs and to some plant acid phosphatases, representing the first example of a family of bacterial NSAPs that has a relatively close eukaryotic counterpart. Despite the lack of an overall similarity, conserved sequence motifs were also identified among the above enzyme families (class B and class C bacterial NSAPs, and related plant phosphatases) and several other families of phosphohydrolases, including bacterial phosphoglycolate phosphatases, histidinol-phosphatase domains of the bacterial bifunctional enzymes imidazole-glycerolphosphate dehydratases, and bacterial, eukaryotic, and archaeal phosphoserine phosphatases and trehalose-6-phosphatases. These conserved motifs are clustered within two domains, separated by a variable spacer region, according to the pattern [FILMAVT]-D-[ILFRMVY]-D-[GSNDE]-[TV]-[ILVAM]-[ATSVILMC]-X-{YFWHKR}-X-{YFWHNQ}-X(102,191)-{KRHNQ}-G-D-{FYWHILVMC}-{QNH}-{FWYGP}-D-{PSNQYW}. The dephosphorylating activity common to all these proteins supports the definition of this phosphatase motif and the inclusion of these enzymes into a superfamily of phosphohydrolases that we propose to indicate as "DDDD" after the presence of the four invariant aspartate residues. Database searches retrieved various hypothetical proteins of unknown function containing this or similar motifs, for which a phosphohydrolase activity could be hypothesized. PMID:9684901

  20. Nucleotide binding database NBDB – a collection of sequence motifs with specific protein-ligand interactions

    PubMed Central

    Zheng, Zejun; Goncearenco, Alexander; Berezovsky, Igor N.

    2016-01-01

    NBDB database describes protein motifs, elementary functional loops (EFLs) that are involved in binding of nucleotide-containing ligands and other biologically relevant cofactors/coenzymes, including ATP, AMP, ATP, GMP, GDP, GTP, CTP, PAP, PPS, FMN, FAD(H), NAD(H), NADP, cAMP, cGMP, c-di-AMP and c-di-GMP, ThPP, THD, F-420, ACO, CoA, PLP and SAM. The database is freely available online at http://nbdb.bii.a-star.edu.sg. In total, NBDB contains data on 249 motifs that work in interactions with 24 ligands. Sequence profiles of EFL motifs were derived de novo from nonredundant Uniprot proteome sequences. Conserved amino acid residues in the profiles interact specifically with distinct chemical parts of nucleotide-containing ligands, such as nitrogenous bases, phosphate groups, ribose, nicotinamide, and flavin moieties. Each EFL profile in the database is characterized by a pattern of corresponding ligand–protein interactions found in crystallized ligand–protein complexes. NBDB database helps to explore the determinants of nucleotide and cofactor binding in different protein folds and families. NBDB can also detect fragments that match to profiles of particular EFLs in the protein sequence provided by user. Comprehensive information on sequence, structures, and interactions of EFLs with ligands provides a foundation for experimental and computational efforts on design of required protein functions. PMID:26507856

  1. A Convex Atomic-Norm Approach to Multiple Sequence Alignment and Motif Discovery

    PubMed Central

    Yen, Ian E. H.; Lin, Xin; Zhang, Jiong; Ravikumar, Pradeep; Dhillon, Inderjit S.

    2016-01-01

    Multiple Sequence Alignment and Motif Discovery, known as NP-hard problems, are two fundamental tasks in Bioinformatics. Existing approaches to these two problems are based on either local search methods such as Expectation Maximization (EM), Gibbs Sampling or greedy heuristic methods. In this work, we develop a convex relaxation approach to both problems based on the recent concept of atomic norm and develop a new algorithm, termed Greedy Direction Method of Multiplier, for solving the convex relaxation with two convex atomic constraints. Experiments show that our convex relaxation approach produces solutions of higher quality than those standard tools widely-used in Bioinformatics community on the Multiple Sequence Alignment and Motif Discovery problems. PMID:27559428

  2. 'Size leap' algorithm: an efficient extraction of the longest common motifs from a molecular sequence set. Application to the DNA sequence reconstruction.

    PubMed

    Danckaert, A; Chappey, C; Hazout, S

    1991-10-01

    We propose a new method, called 'size leap' algorithm, of search for motifs of maximum size and common to two fragments at least. It allows the creation of a reduced database of motifs from a set of sequences whose size obeys the series of Fibonacci numbers. The convenience lies in the efficiency of the motif extraction. It can be applied in the establishment of overlap regions for DNA sequence reconstruction and multiple alignment of biological sequences. The method of complete DNA sequence reconstruction by extraction of the longest motifs ('anchor motifs') is presented as an application of the size leap algorithm. The details of a reconstruction from three sequenced fragments are given as an example.

  3. The tungsten formylmethanofuran dehydrogenase from Methanobacterium thermoautotrophicum contains sequence motifs characteristic for enzymes containing molybdopterin dinucleotide.

    PubMed

    Hochheimer, A; Schmitz, R A; Thauer, R K; Hedderich, R

    1995-12-15

    Formylmethanofuran dehydrogenases are molybdenum or tungsten iron-sulfur proteins containing a pterin dinucleotide cofactor. We report here on the primary structures of the four subunits FwdABCD of the tungsten enzyme from Methanobacterium thermoautotrophicum which were determined by cloning and sequencing the encoding genes fwdABCD. FwdB was found to contain sequence motifs characteristic for molybdopterin-dinucleotide-containing enzymes indicating that this subunit harbors the active site. FwdA, FwdC and FwdD showed no significant sequence similarity to proteins in the data bases. Northern blot analysis revealed that the four fwd genes form a transcription unit together with three additional genes designated fwdE, fwdF and fwdG. A 17.8-kDa protein and an 8.6-kDa protein, both containing two [4Fe-4S] cluster binding motifs, were deduced from fwdE and fwdG. The open reading frame fwdF encodes a 38.6-kDa protein containing eight binding motifs for [4Fe-4S] clusters suggesting the gene product to be a novel polyferredoxin. All seven fwd genes were expressed in Escherichia coli yielding proteins of the expected size. The fwd operon was found to be located in a region of the M. thermoautotrophicum genome encoding molybdenum enzymes and proteins involved in molybdopterin biosynthesis.

  4. Peptide sequence motif analysis of tandem MS data with the SALSA algorithm.

    PubMed

    Liebler, Daniel C; Hansen, Beau T; Davey, Sean W; Tiscareno, Laura; Mason, Daniel E

    2002-01-01

    We have developed a pattern recognition algorithm called SALSA (scoring algorithm for spectral analysis) for the detection of specific features in tandem MS (MS-MS) spectra. Application of the SALSA algorithm to the detection of peptide MS-MS ion series enables identification of MS-MS spectra displaying characteristics of specific peptide sequences. SALSA analysis scores MS-MS spectra based on correspondence between theoretical ion series for peptide sequence motifs and actual MS-MS product ion series, regardless of their absolute positions on the m/z axis. Analyses of tryptic digests of bovine serum albumin (BSA) by LC-MS-MS followed by SALSA analysis detected MS-MS spectra for both unmodified and multiple modified forms of several BSA tryptic peptides. SALSA analysis of MS-MS data from mixtures of BSA and human serum albumin (HSA) tryptic digests indicated that ion series searches with BSA peptide sequence motifs identified MS-MS spectra for both BSA and closely related HSA peptides. Optimal discrimination between MS-MS spectra of variant peptide forms is achieved when the SALSA search criteria are optimized to the target peptide. Application of SALSA to LC-MS-MS proteome analysis will facilitate the characterization of modified and sequence variant proteins.

  5. CyanoLyase: a database of phycobilin lyase sequences, motifs and functions

    PubMed Central

    Bretaudeau, Anthony; Coste, François; Humily, Florian; Garczarek, Laurence; Le Corguillé, Gildas; Six, Christophe; Ratin, Morgane; Collin, Olivier; Schluchter, Wendy M.; Partensky, Frédéric

    2013-01-01

    CyanoLyase (http://cyanolyase.genouest.org/) is a manually curated sequence and motif database of phycobilin lyases and related proteins. These enzymes catalyze the covalent ligation of chromophores (phycobilins) to specific binding sites of phycobiliproteins (PBPs). The latter constitute the building bricks of phycobilisomes, the major light-harvesting systems of cyanobacteria and red algae. Phycobilin lyases sequences are poorly annotated in public databases. Sequences included in CyanoLyase were retrieved from all available genomes of these organisms and a few others by similarity searches using biochemically characterized enzyme sequences and then classified into 3 clans and 32 families. Amino acid motifs were computed for each family using Protomata learner. CyanoLyase also includes BLAST and a novel pattern matching tool (Protomatch) that allow users to rapidly retrieve and annotate lyases from any new genome. In addition, it provides phylogenetic analyses of all phycobilin lyases families, describes their function, their presence/absence in all genomes of the database (phyletic profiles) and predicts the chromophorylation of PBPs in each strain. The site also includes a thorough bibliography about phycobilin lyases and genomes included in the database. This resource should be useful to scientists and companies interested in natural or artificial PBPs, which have a number of biotechnological applications, notably as fluorescent markers. PMID:23175607

  6. Network motifs in integrated cellular networks of transcription-regulation and protein-protein interaction

    NASA Astrophysics Data System (ADS)

    Yeger-Lotem, Esti; Sattath, Shmuel; Kashtan, Nadav; Itzkovitz, Shalev; Milo, Ron; Pinter, Ron Y.; Alon, Uri; Margalit, Hanah

    2004-04-01

    Genes and proteins generate molecular circuitry that enables the cell to process information and respond to stimuli. A major challenge is to identify characteristic patterns in this network of interactions that may shed light on basic cellular mechanisms. Previous studies have analyzed aspects of this network, concentrating on either transcription-regulation or protein-protein interactions. Here we search for composite network motifs: characteristic network patterns consisting of both transcription-regulation and protein-protein interactions that recur significantly more often than in random networks. To this end we developed algorithms for detecting motifs in networks with two or more types of interactions and applied them to an integrated data set of protein-protein interactions and transcription regulation in Saccharomyces cerevisiae. We found a two-protein mixed-feedback loop motif, five types of three-protein motifs exhibiting coregulation and complex formation, and many motifs involving four proteins. Virtually all four-protein motifs consisted of combinations of smaller motifs. This study presents a basic framework for detecting the building blocks of networks with multiple types of interactions.
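
    The two-protein mixed-feedback loop mentioned above can be illustrated with a minimal sketch: a pair in which one protein transcriptionally regulates the other and the two also interact at the protein level. The function name, node labels and toy edge lists below are hypothetical, and the comparison against randomized networks that the abstract uses to establish significance is not reproduced here.

```python
def find_mixed_feedback_loops(regulation, ppi):
    """Return (regulator, target) pairs that are linked by a directed
    transcription-regulation edge AND an undirected protein-protein
    interaction edge."""
    # Store interactions as unordered pairs so (A, B) and (B, A) are equal.
    ppi_pairs = {frozenset(edge) for edge in ppi}
    return sorted(
        (regulator, target)
        for regulator, target in set(regulation)
        if regulator != target and frozenset((regulator, target)) in ppi_pairs
    )


# Hypothetical toy network, for illustration only.
regulation = [("A", "B"), ("A", "C"), ("D", "A")]  # directed: regulator -> target
ppi = [("B", "A"), ("C", "D")]                     # undirected interactions

loops = find_mixed_feedback_loops(regulation, ppi)
print(loops)
```

    In a real analysis the same count would be repeated on many randomized networks to judge whether the pattern recurs more often than expected by chance.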

  7. QGRS-H Predictor: a web server for predicting homologous quadruplex forming G-rich sequence motifs in nucleotide sequences

    PubMed Central

    Menendez, Camille; Frees, Scott; Bagga, Paramjeet S.

    2012-01-01

    Naturally occurring G-quadruplex structural motifs, formed by guanine-rich nucleic acids, have been reported in telomeric, promoter and transcribed regions of mammalian genomes. G-quadruplex structures have received significant attention because of growing evidence for their role in important biological processes, human disease and as therapeutic targets. Lately, there has been much interest in the potential roles of RNA G-quadruplexes as cis-regulatory elements of post-transcriptional gene expression. Large-scale computational genomics studies on G-quadruplexes have difficulty validating their predictions without laborious testing in ‘wet’ labs. We have developed a bioinformatics tool, QGRS-H Predictor that can map and analyze conserved putative Quadruplex forming 'G'-Rich Sequences (QGRS) in mRNAs, ncRNAs and other nucleotide sequences, e.g. promoter, telomeric and gene flanking regions. Identifying conserved regulatory motifs helps validate computations and enhances accuracy of predictions. The QGRS-H Predictor is particularly useful for mapping homologous G-quadruplex forming sequences as cis-regulatory elements in the context of 5′- and 3′-untranslated regions, and CDS sections of aligned mRNA sequences. QGRS-H Predictor features highly interactive graphic representation of the data. It is a unique and user-friendly application that provides many options for defining and studying G-quadruplexes. The QGRS-H Predictor can be freely accessed at: http://quadruplex.ramapo.edu/qgrs/app/start. PMID:22576365

  8. Computational Prediction of Phylogenetically Conserved Sequence Motifs for Five Different Candidate Genes in Type II Diabetic Nephropathy

    PubMed Central

    Sindhu, T; Rajamanikandan, S; Srinivasan, P

    2012-01-01

    Background: Computational identification of phylogenetic motifs helps to characterize known functional features, including catalytic sites, substrate binding epitopes, and protein-protein interfaces. Furthermore, they are strongly conserved among orthologs, indicating their evolutionary importance. The study aimed to analyze five candidate genes involved in type II diabetic nephropathy and to predict phylogenetic motifs from their corresponding orthologous protein sequences. Methods: AKR1B1, APOE, ENPP1, ELMO1 and IGFBP1 are genes that have been identified as important targets for type II diabetic nephropathy through experimental studies. Their corresponding protein sequences, structures and orthologous sequences were retrieved from UniprotKB, PDB, and the PHOG database, respectively. Multiple sequence alignments were constructed using ClustalW and phylogenetic motifs were identified using MINER. The occurrence of amino acids in the obtained phylogenetic motifs was generated using WebLogo and false positive expectations were calculated against phylogenetic similarity. Results: In total, 17 phylogenetic motifs were identified from the five proteins; residues such as glycine, leucine, tryptophan and aspartic acid were found at appreciable frequency, whereas arginine was identified in all the predicted phylogenetic motifs (PMs). The result implies that these residues can be important to the functional and structural role of the proteins, and the calculated false positive expectations imply that they were generally conserved in the traditional sense. Conclusion: The prediction of phylogenetic motifs is an accurate method for detecting functionally important conserved residues. The conserved motifs can be used as a potential drug target for type II diabetic nephropathy. PMID:23113206

  9. Identification of sequence motifs involved in Dengue virus-host interactions.

    PubMed

    Asnet Mary, J; Paramasivan, R; Shenbagarathai, R

    2016-01-01

    Dengue fever is a rapidly spreading mosquito-borne virus infection, which remains a serious global public health problem. As there is no specific treatment or commercial vaccine available for effective control of the disease, attempts to develop novel control strategies are underway. Viruses utilize the surface receptor proteins of the host to enter cells. Though various proteins have been reported as receptors of Dengue virus (DENV) using the Virus Overlay Protein Binding Assay, the precise interaction between DENV and the host has not been explored. Understanding the structural features of the domain III envelope glycoprotein would help in developing efficient antiviral inhibitors. Therefore, an attempt was made to identify the sequence motifs present in the domain III envelope glycoprotein of Dengue virus. Computational analysis revealed that the NGR motif is present in the domain III envelope glycoprotein of DENV-1 and DENV-3. Similarly, DENV-1, DENV-2 and DENV-4 were found to contain the Yxxphi motif, which is a tyrosine-based sorting signal responsible for the interaction with a mu subunit of the adaptor protein complex. High-throughput virtual screening resulted in five compounds as lead molecules based on glide score, which ranges from -4.664 to -6.52 kcal/mol. This computational prediction provides an additional tool for understanding the virus-host interactions and helps to identify potential targets in the host. Further experimental evidence is warranted to confirm the virus-host interactions and the inhibitory activity of the reported lead compounds.

  10. Sequence-motif Detection of NAD(P)-binding Proteins: Discovery of a Unique Antibacterial Drug Target

    NASA Astrophysics Data System (ADS)

    Hua, Yun Hao; Wu, Chih Yuan; Sargsyan, Karen; Lim, Carmay

    2014-09-01

    Many enzymes use nicotinamide adenine dinucleotide or nicotinamide adenine dinucleotide phosphate (NAD(P)) as essential coenzymes. These enzymes often do not share significant sequence identity and cannot be easily detected by sequence homology. Previously, we determined all distinct locally conserved pyrophosphate-binding structures (3d motifs) from NAD(P)-bound protein structures, from which 1d sequence motifs were derived. Here, we aim to establish the precision of these 3d and 1d motifs to annotate NAD(P)-binding proteins. We show that the pyrophosphate-binding 3d motifs are characteristic of NAD(P)-binding proteins, as they are rarely found in nonNAD(P)-binding proteins. Furthermore, several 1d motifs could distinguish between proteins that bind only NAD and those that bind only NADP. They could also distinguish NAD(P)-binding proteins from nonNAD(P)-binding ones. Interestingly, one of the pyrophosphate-binding 3d and corresponding 1d motifs was found only in enoyl-acyl carrier protein reductases, which are enzymes essential for bacterial fatty acid biosynthesis. This unique 3d motif serves as an attractive novel drug target, as it is conserved across many bacterial species and is not found in human proteins.

  11. Quadfinder: server for identification and analysis of quadruplex-forming motifs in nucleotide sequences

    PubMed Central

    Scaria, Vinod; Hariharan, Manoj; Arora, Amit; Maiti, Souvik

    2006-01-01

    G-quadruplex secondary structures, which play a structural role in repetitive DNA such as telomeres, may also play a functional role at other genomic locations as targetable regulatory elements which control gene expression. The recent interest in application of quadruplexes in biological systems prompted us to develop a tool for the identification and analysis of quadruplex-forming nucleotide sequences especially in the RNA. Here we present Quadfinder, an online server for prediction and bioinformatics of uni-molecular quadruplex-forming nucleotide sequences. The server is designed to be user-friendly and needs minimal intervention by the user, while providing flexibility of defining the variants of the motif. The server is freely available at URL . PMID:16845097
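
    As a rough illustration of what a sequence-based search for uni-molecular quadruplex-forming motifs with user-definable variants can look like, the sketch below scans for four G-tracts separated by short loops. The pattern (G-tracts of at least 3 bases, loops of 1 to 7 bases) is a commonly used convention and an assumption here; it is not taken from the abstract and should not be read as Quadfinder's actual algorithm. The function and parameter names are hypothetical.

```python
import re


def find_quadruplex_motifs(seq, min_g=3, loop_min=1, loop_max=7):
    """Return (start, matched_text) for putative quadruplex-forming motifs:
    four G-tracts of at least `min_g` bases joined by three loops of
    `loop_min`..`loop_max` bases. DNA or RNA letters are accepted."""
    g_tract = "G{%d,}" % min_g
    loop = "[ACGTU]{%d,%d}?" % (loop_min, loop_max)
    # The lookahead lets the scan report a match at every start position,
    # so overlapping candidates are not skipped.
    pattern = re.compile("(?=(%s(?:%s%s){3}))" % (g_tract, loop, g_tract))
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


# Hypothetical example sequence.
hits = find_quadruplex_motifs("TTGGGAGGGTGGGAGGGTT")
print(hits)
```

    Changing `min_g`, `loop_min` and `loop_max` is one simple way to express motif variants; a matched pattern only indicates a candidate and says nothing about whether the structure actually forms.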

  12. A Bayesian hidden Markov model for motif discovery through joint modeling of genomic sequence and ChIP-chip data.

    PubMed

    Gelfond, Jonathan A L; Gupta, Mayetri; Ibrahim, Joseph G

    2009-12-01

    We propose a unified framework for the analysis of chromatin (Ch) immunoprecipitation (IP) microarray (ChIP-chip) data for detecting transcription factor binding sites (TFBSs) or motifs. ChIP-chip assays are used to focus the genome-wide search for TFBSs by isolating a sample of DNA fragments with TFBSs and applying this sample to a microarray with probes corresponding to tiled segments across the genome. Present analytical methods use a two-step approach: (i) analyze array data to estimate IP-enrichment peaks then (ii) analyze the corresponding sequences independently of intensity information. The proposed model integrates peak finding and motif discovery through a unified Bayesian hidden Markov model (HMM) framework that accommodates the inherent uncertainty in both measurements. A Markov chain Monte Carlo algorithm is formulated for parameter estimation, adapting recursive techniques used for HMMs. In simulations and applications to a yeast RAP1 dataset, the proposed method has favorable TFBS discovery performance compared to currently available two-stage procedures in terms of both sensitivity and specificity.

  13. A Bayesian hidden Markov model for motif discovery through joint modeling of genomic sequence and ChIP-chip data

    PubMed Central

    Gelfond, Jonathan A. L.; Gupta, Mayetri; Ibrahim, Joseph G.

    2009-01-01

    SUMMARY We propose a unified framework for the analysis of Chromatin (Ch) Immunoprecipitation (IP) microarray (ChIP-chip) data for detecting transcription factor binding sites (TFBSs) or motifs. ChIP-chip assays are used to focus the genome-wide search for TFBSs by isolating a sample of DNA fragments with TFBSs and applying this sample to a microarray with probes corresponding to tiled segments across the genome. Present analytical methods use a two-step approach: (i) analyze array data to estimate IP enrichment peaks then (ii) analyze the corresponding sequences independently of intensity information. The proposed model integrates peak finding and motif discovery through a unified Bayesian hidden Markov model (HMM) framework that accommodates the inherent uncertainty in both measurements. A Markov Chain Monte Carlo algorithm is formulated for parameter estimation, adapting recursive techniques used for HMMs. In simulations and applications to a yeast RAP1 dataset, the proposed method has favorable TFBS discovery performance compared to currently available two-stage procedures in terms of both sensitivity and specificity. PMID:19210737

  14. Functional characterization of sequence motifs in the transit peptide of Arabidopsis small subunit of rubisco.

    PubMed

    Lee, Dong Wook; Lee, Sookjin; Lee, Gil-Je; Lee, Kwang Hee; Kim, Sanguk; Cheong, Gang-Won; Hwang, Inhwan

    2006-02-01

    The transit peptides of nuclear-encoded chloroplast proteins are necessary and sufficient for targeting and import of proteins into chloroplasts. However, the sequence information encoded by transit peptides is not fully understood. In this study, we investigated sequence motifs in the transit peptide of the small subunit of the Rubisco complex by examining the ability of various mutant transit peptides to target green fluorescent protein reporter proteins to chloroplasts in Arabidopsis (Arabidopsis thaliana) leaf protoplasts. We divided the transit peptide into eight blocks (T1 through T8), each consisting of eight or 10 amino acids, and generated mutants that had alanine (Ala) substitutions or deletions, of one or two T blocks in the transit peptide. In addition, we generated mutants that had the original sequence partially restored in single- or double-T-block Ala (A) substitution mutants. Analysis of chloroplast import of these mutants revealed several interesting observations. Single-T-block mutations did not noticeably affect targeting efficiency, except in T1 and T4 mutations. However, double-T mutants, T2A/T4A, T3A/T6A, T3A/T7A, T4A/T6A, and T4A/T7A, caused a 50% to 100% loss in targeting ability. T3A/T6A and T4A/T6A mutants produced only precursor proteins, whereas T2A/T4A and T4A/T7A mutants produced only a 37-kD protein. Detailed analyses revealed that sequence motifs ML in T1, LKSSA in T3, FP and RK in T4, CMQVW in T6, and KKFET in T7 play important roles in chloroplast targeting. In T1, the hydrophobicity of ML is important for targeting. LKSSA in T3 is functionally equivalent to CMQVW in T6 and KKFET in T7. Furthermore, subcellular fractionation revealed that Ala substitution in T1, T3, and T6 produced soluble precursors, whereas Ala substitution in T4 and T7 produced intermediates that were tightly associated with membranes. These results demonstrate that the transit peptide contains multiple motifs and that some of them act in concert or

  15. Sequence motifs and prokaryotic expression of the reptilian paramyxovirus fusion protein

    USGS Publications Warehouse

    Franke, J.; Batts, W.N.; Ahne, W.; Kurath, G.; Winton, J.R.

    2006-01-01

    Fourteen reptilian paramyxovirus isolates were chosen to represent the known extent of genetic diversity among this novel group of viruses. Selected regions of the fusion (F) gene were sequenced, analyzed and compared. The F gene of all isolates contained conserved motifs homologous to those described for other members of the family Paramyxoviridae including: signal peptide, transmembrane domain, furin cleavage site, fusion peptide, N-linked glycosylation sites, and two heptad repeats, the second of which (HRB-LZ) had the characteristics of a leucine zipper. Selected regions of the fusion gene of isolate Gono-GER85 were inserted into a prokaryotic expression system to generate three recombinant protein fragments of various sizes. The longest recombinant protein was cleaved by furin into two fragments of predicted length. Western blot analysis with virus-neutralizing rabbit-antiserum against this isolate demonstrated that only the longest construct reacted with the antiserum. This construct was unique in containing 30 additional C-terminal amino acids that included most of the HRB-LZ. These results indicate that the F genes of reptilian paramyxoviruses contain highly conserved motifs typical of other members of the family and suggest that the HRB-LZ domain of the reptilian paramyxovirus F protein contains a linear antigenic epitope. © Springer-Verlag 2005.

  16. The Motif Tool Assessment Platform (MTAP) for sequence-based transcription factor binding site prediction tools.

    PubMed

    Quest, Daniel; Ali, Hesham

    2010-01-01

    Predicting transcription factor binding sites (TFBS) from sequence is one of the most challenging problems in computational biology. The development of (semi-)automated computer-assisted prediction methods is needed to find TFBS over an entire genome, which is a first step in reconstructing mechanisms that control gene activity. Bioinformatics journals continue to publish diverse methods for predicting TFBS on a monthly basis. To help practitioners in deciding which method to use to predict for a particular TFBS, we provide a platform to assess the quality and applicability of the available methods. Assessment tools allow researchers to determine how methods can be expected to perform on specific organisms or on specific transcription factor families. This chapter introduces the TFBS detection problem and reviews current strategies for evaluating algorithm effectiveness. In this chapter, a novel and robust assessment tool, the Motif Tool Assessment Platform (MTAP), is introduced and discussed.

  17. The structure of an endogenous Drosophila centromere reveals the prevalence of tandemly repeated sequences able to form i-motifs

    PubMed Central

    Garavís, Miguel; Méndez-Lago, María; Gabelica, Valérie; Whitehead, Siobhan L.; González, Carlos; Villasante, Alfredo

    2015-01-01

    Centromeres are the chromosomal loci at which spindle microtubules attach to mediate chromosome segregation during mitosis and meiosis. In most eukaryotes, centromeres are made up of highly repetitive DNA sequences (satellite DNA) interspersed with middle repetitive DNA sequences (transposable elements). Despite the efforts to establish complete genomic sequences of eukaryotic organisms, the so-called ‘finished’ genomes are not actually complete because the centromeres have not been assembled due to the intrinsic difficulties in constructing both physical maps and complete sequence assemblies of long stretches of tandemly repetitive DNA. Here we show the first molecular structure of an endogenous Drosophila centromere and the ability of the C-rich dodeca satellite strand to form dimeric i-motifs. The finding of i-motif structures in simple and complex centromeric satellite DNAs leads us to suggest that these centromeric sequences may have been selected not by their primary sequence but by their ability to form noncanonical secondary structures. PMID:26289671

  18. Integration of retinal image sequences

    NASA Astrophysics Data System (ADS)

    Ballerini, Lucia

    1998-10-01

    In this paper a method for noise reduction in ocular fundus image sequences is described. The eye is the only part of the human body where the capillary network can be observed along with the arterial and venous circulation using a noninvasive technique. The study of the retinal vessels is very important both for the study of the local pathology (retinal disease) and for the large amount of information it offers on systemic haemodynamics, such as hypertension, arteriosclerosis, and diabetes. The method integrates ocular fundus image sequences in two steps: registration and fusion. First we describe an automatic alignment algorithm for registration of ocular fundus images. In order to enhance vessel structures, we used a spatially oriented bank of filters designed to match the properties of the objects of interest. To evaluate interframe misalignment we adopted a fast cross-correlation algorithm. The performance of the alignment method has been estimated by simulating shifts between image pairs and by using a cross-validation approach. Then we propose a temporal integration technique for image sequences so as to compute enhanced pictures of the overall capillary network. Image registration is combined with image enhancement by fusing subsequent frames of the same region. To evaluate the attainable results, the signal-to-noise ratio was estimated before and after integration. Experimental results on synthetic images of vessel-like structures with different kinds of Gaussian additive noise as well as on real fundus images are reported.

  19. Sequence-specific high mobility group box factors recognize 10-12-base pair minor groove motifs.

    PubMed

    van Beest, M; Dooijes, D; van De Wetering, M; Kjaerulff, S; Bonvin, A; Nielsen, O; Clevers, H

    2000-09-01

    Sequence-specific high mobility group (HMG) box factors bind and bend DNA via interactions in the minor groove. Three-dimensional NMR analyses have provided the structural basis for this interaction. The cognate HMG domain DNA motif is generally believed to span 6-8 bases. However, alignment of promoter elements controlled by the yeast genes ste11 and Rox1 has indicated strict conservation of a larger DNA motif. By site selection, we identify a highly specific 12-base pair motif for Ste11, AGAACAAAGAAA. Similarly, we show that Tcf1, MatMc, and Sox4 bind unique, highly specific DNA motifs of 12, 12, and 10 base pairs, respectively. Footprinting with a deletion mutant of Ste11 reveals a novel interaction between the 3' base pairs of the extended DNA motif and amino acids C-terminal to the HMG domain. The sequence-specific interaction of Ste11 with these 3' base pairs contributes significantly to binding and bending of the DNA motif.
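
    For readers who want to look for the reported Ste11 site in their own sequences, a minimal sketch of an exact-match scan on both strands follows. Only the 12-base motif AGAACAAAGAAA comes from the abstract; the function names and the test sequence are hypothetical, and exact matching ignores any tolerated base variation, which the abstract does not specify.

```python
STE11_MOTIF = "AGAACAAAGAAA"  # 12-base pair motif quoted in the abstract


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(seq, motif=STE11_MOTIF):
    """Return sorted (start, strand) hits; '-' hits are reported at the
    position where the motif's reverse complement occurs in `seq`."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


# Hypothetical sequence with one site on each strand.
example = "CC" + STE11_MOTIF + "GG" + reverse_complement(STE11_MOTIF) + "A"
sites = find_motif(example)
print(sites)
```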

  20. Platelet immunoreceptor tyrosine-based activation motif (ITAM) signaling and vascular integrity.

    PubMed

    Boulaftali, Yacine; Hess, Paul R; Kahn, Mark L; Bergmeier, Wolfgang

    2014-03-28

    Platelets are well-known for their critical role in hemostasis, that is, the prevention of blood loss at sites of mechanical vessel injury. Inappropriate platelet activation and adhesion, however, can lead to thrombotic complications, such as myocardial infarction and stroke. To fulfill its role in hemostasis, the platelet is equipped with various G protein-coupled receptors that mediate the response to soluble agonists such as thrombin, ADP, and thromboxane A2. In addition to G protein-coupled receptors, platelets express 3 glycoproteins that belong to the family of immunoreceptor tyrosine-based activation motif receptors: Fc receptor γ chain, which is noncovalently associated with the glycoprotein VI collagen receptor, C-type lectin 2, the receptor for podoplanin, and Fc receptor γII A, a low-affinity receptor for immune complexes. Although both genetic and chemical approaches have documented a critical role for platelet G protein-coupled receptors in hemostasis, the contribution of immunoreceptor tyrosine-based activation motif receptors to this process is less defined. Studies performed during the past decade, however, have identified new roles for platelet immunoreceptor tyrosine-based activation motif signaling in vascular integrity in utero and at sites of inflammation. The purpose of this review is to summarize recent findings on how platelet immunoreceptor tyrosine-based activation motif signaling controls vascular integrity, both in the presence and absence of mechanical injury.

  1. Exceptional motifs in different Markov chain models for a statistical analysis of DNA sequences.

    PubMed

    Schbath, S; Prum, B; de Turckheim, E

    1995-01-01

    Identifying exceptional motifs is often used for extracting information from long DNA sequences. The two difficulties of the method are the choice of the model that defines the expected frequencies of words and the approximation of the variance of the difference T(W) between the number of occurrences of a word W and its estimation. We consider here different Markov chain models, either with stationary or periodic transition probabilities. We estimate the variance of the difference T(W) by the conditional variance of the number of occurrences of W given the oligonucleotides counts that define the model. Two applications show how to use asymptotically standard normal statistics associated with the counts to describe a given sequence in terms of its outlying words. Sequences of Escherichia coli and of Bacillus subtilis are compared with respect to their exceptional tri- and tetranucleotides. For both bacteria, exceptional 3-words are mainly found in the coding frame. E. coli palindrome counts are analyzed in different models, showing that many overabundant words are one-letter mutations of avoided palindromes.
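
    The idea of comparing an observed word count with its expectation under a Markov chain model can be sketched as follows. The estimator used here, N(w1..w(k-1)) x N(w2..wk) / N(w2..w(k-1)), is the standard plug-in estimate for a word of length k under a Markov model of order k-2; treating it as the model in question is an assumption, and the conditional variance that the abstract uses to standardize the difference T(W) is not reproduced. The function names and the toy sequence are hypothetical.

```python
from collections import Counter


def count_words(seq, k):
    """Count all overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def expected_count(seq, word):
    """Plug-in estimate of the count of `word` under a Markov model of
    order len(word) - 2, built from the observed shorter-word counts."""
    k = len(word)
    if k < 3:
        raise ValueError("word must have at least 3 letters")
    shorter = count_words(seq, k - 1)
    core = count_words(seq, k - 2)[word[1:-1]]
    if core == 0:
        return 0.0
    return shorter[word[:-1]] * shorter[word[1:]] / core


def count_difference(seq, word):
    """Observed minus expected count (unstandardized analogue of T(W))."""
    return count_words(seq, len(word))[word] - expected_count(seq, word)


# Hypothetical toy sequence.
toy = "ACGACGACGT"
print(expected_count(toy, "ACG"), count_difference(toy, "ACG"))
```

    Dividing the difference by an estimate of its standard deviation would give the asymptotically standard normal statistic described in the abstract; that step is left out because the variance formula is not given there.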

  2. Synthesis, anti-mycobacterial activity and DNA sequence-selectivity of a library of biaryl-motifs containing polyamides.

    PubMed

    Brucoli, Federico; Guzman, Juan D; Maitra, Arundhati; James, Colin H; Fox, Keith R; Bhakta, Sanjib

    2015-07-01

    The alarming rise of extensively drug-resistant tuberculosis (XDR-TB) strains compels the development of new molecules with novel modes of action to control this world health emergency. Distamycin analogues containing N-terminal biaryl-motifs 2(1-5)(1-7) were synthesised using a solution-phase approach and evaluated for their anti-mycobacterial activity and DNA-sequence selectivity. Thiophene dimer motif-containing polyamide 2(2,6) exhibited 10-fold higher inhibitory activity against Mycobacterium tuberculosis compared to distamycin and library member 2(5,7) showed high binding affinity for the 5'-ACATAT-3' sequence.

  3. Engineering Proteins with Enhanced Mechanical Stability by Force Specific Sequence Motifs

    PubMed Central

    Lu, Wenzhe; Negi, Surendra; Oberhauser, Andres F.; Braun, Werner

    2012-01-01

    Use of atomic force microscopy (AFM) has recently led to a better understanding of the molecular mechanisms of the unfolding process by mechanical forces; however, the rational design of novel proteins with specific mechanical strength remains challenging. We have approached this problem from a new perspective that generates linear physical-chemical properties (PCP) motifs from a limited AFM data set. Guided by our linear sequence analysis we designed and analyzed four new mutants of the titin I1 domain with the goal of increasing the domain's mechanical strength. All four mutants could be cloned and expressed as soluble proteins. AFM data indicate that at least two of the mutants have increased molecular mechanical strength. This observation suggests that the PCP method is useful to graft sequences specific for high mechanical stability to weak proteins to increase their mechanical stability, and represents an additional tool in the design of novel proteins besides steered molecular dynamics calculations, coarse grained simulations and phi-value analysis of the transition state. PMID:22274941

  4. Detecting Remote Sequence Homology in Disordered Proteins: Discovery of Conserved Motifs in the N-Termini of Mononegavirales phosphoproteins

    PubMed Central

    Karlin, David; Belshaw, Robert

    2012-01-01

    Paramyxovirinae are a large group of viruses that includes measles virus and parainfluenza viruses. The viral Phosphoprotein (P) plays a central role in viral replication. It is composed of a highly variable, disordered N-terminus and a conserved C-terminus. A second viral protein alternatively expressed, the V protein, also contains the N-terminus of P, fused to a zinc finger. We suspected that, despite their high variability, the N-termini of P/V might all be homologous; however, using standard approaches, we could previously identify sequence conservation only in some Paramyxovirinae. We now compared the N-termini using sensitive sequence similarity search programs, able to detect residual similarities unnoticeable by conventional approaches. We discovered that all Paramyxovirinae share a short sequence motif in their first 40 amino acids, which we called soyuz1. Despite its short length (11–16aa), several arguments allow us to conclude that soyuz1 probably evolved by homologous descent, unlike linear motifs. Conservation across such evolutionary distances suggests that soyuz1 plays a crucial role and experimental data suggest that it binds the viral nucleoprotein to prevent its illegitimate self-assembly. In some Paramyxovirinae, the N-terminus of P/V contains a second motif, soyuz2, which might play a role in blocking interferon signaling. Finally, we discovered that the P of related Mononegavirales contain similarly overlooked motifs in their N-termini, and that their C-termini share a previously unnoticed structural similarity suggesting a common origin. Our results suggest several testable hypotheses regarding the replication of Mononegavirales and suggest that disordered regions with little overall sequence similarity, common in viral and eukaryotic proteins, might contain currently overlooked motifs (intermediate in length between linear motifs and disordered domains) that could be detected simply by comparing orthologous proteins. PMID:22403617

  5. Nuclear Magnetic Resonance Structure of a Novel Globular Domain in RBM10 Containing OCRE, the Octamer Repeat Sequence Motif.

    PubMed

    Martin, Bryan T; Serrano, Pedro; Geralt, Michael; Wüthrich, Kurt

    2016-01-05

    The OCtamer REpeat (OCRE) has been annotated as a 42-residue sequence motif with 12 tyrosine residues in the spliceosome trans-regulatory elements RBM5 and RBM10 (RBM [RNA-binding motif]), which are known to regulate alternative splicing of Fas and Bcl-x pre-mRNA transcripts. Nuclear magnetic resonance structure determination showed that the RBM10 OCRE sequence motif is part of a 55-residue globular domain containing 16 aromatic amino acids, which consists of an anti-parallel arrangement of six β strands, with the first five strands containing complete or incomplete Tyr triplets. This OCRE globular domain is a distinctive component of RBM10 and is more widely conserved in RBM10s across the animal kingdom than the ubiquitous RNA recognition components. It is also found in the functionally related RBM5. Thus, it appears that the three-dimensional structure of the globular OCRE domain, rather than the 42-residue OCRE sequence motif alone, confers specificity on RBM10 intermolecular interactions in the spliceosome.

  6. Endocytosis and Trafficking of Natriuretic Peptide Receptor-A: Potential Role of Short Sequence Motifs

    PubMed Central

    Pandey, Kailash N.

    2015-01-01

    The targeted endocytosis and redistribution of transmembrane receptors among membrane-bound subcellular organelles are vital for their correct signaling and physiological functions. Membrane receptors committed for internalization and trafficking pathways are sorted into coated vesicles. Cardiac hormones, atrial and brain natriuretic peptides (ANP and BNP) bind to guanylyl cyclase/natriuretic peptide receptor-A (GC-A/NPRA) and elicit the generation of intracellular second messenger cyclic guanosine 3',5'-monophosphate (cGMP), which lowers blood pressure and incidence of heart failure. After ligand binding, the receptor is rapidly internalized, sequestrated, and redistributed into intracellular locations. Thus, NPRA is considered a dynamic cellular macromolecule that traverses different subcellular locations through its lifetime. The utilization of pharmacologic and molecular perturbants has helped in delineating the pathways of endocytosis, trafficking, down-regulation, and degradation of membrane receptors in intact cells. This review describes the investigation of the mechanisms of internalization, trafficking, and redistribution of NPRA compared with other cell surface receptors from the plasma membrane into the cell interior. The roles of different short-signal peptide sequence motifs in the internalization and trafficking of other membrane receptors have been briefly reviewed and their potential significance in the internalization and trafficking of NPRA is discussed. PMID:26151885

  7. Finishing and Special Motifs: Lessons Learned from CRISPR Analysis Using Next-Generation Draft Sequences (7th Annual SFAF Meeting, 2012)

    SciTech Connect

    Campbell, Catherine

    2012-06-01

    Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  8. Finishing and Special Motifs: Lessons Learned from CRISPR Analysis Using Next-Generation Draft Sequences (7th Annual SFAF Meeting, 2012)

    ScienceCinema

    Campbell, Catherine [Noblis]

    2016-07-12

    Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  9. Triazine-Based Sequence-Defined Polymers with Side-Chain Diversity and Backbone-Backbone Interaction Motifs.

    PubMed

    Grate, Jay W; Mo, Kai-For; Daily, Michael D

    2016-03-14

    Sequence control in polymers, well-known in nature, encodes structure and functionality. Here we introduce a new architecture, based on the nucleophilic aromatic substitution chemistry of cyanuric chloride, that creates a new class of sequence-defined polymers dubbed TZPs. Proof of concept is demonstrated with two synthesized hexamers, having neutral and ionizable side chains. Molecular dynamics simulations show backbone-backbone interactions, including H-bonding motifs and pi-pi interactions. This architecture is arguably biomimetic while differing from sequence-defined polymers having peptide bonds. The synthetic methodology supports the structural diversity of side chains known in peptides, as well as backbone-backbone hydrogen-bonding motifs, and will thus enable new macromolecules and materials with useful functions.

  10. DNA recognition for virus assembly through multiple sequence-independent interactions with a helix-turn-helix motif

    PubMed Central

    Greive, Sandra J.; Fung, Herman K.H.; Chechik, Maria; Jenkins, Huw T.; Weitzel, Stephen E.; Aguiar, Pedro M.; Brentnall, Andrew S.; Glousieau, Matthieu; Gladyshev, Grigory V.; Potts, Jennifer R.; Antson, Alfred A.

    2016-01-01

    The helix-turn-helix (HTH) motif features frequently in protein DNA-binding assemblies. Viral pac site-targeting small terminase proteins possess an unusual architecture in which the HTH motifs are displayed in a ring, distinct from the classical HTH dimer. Here we investigate how such a circular array of HTH motifs enables specific recognition of the viral genome for initiation of DNA packaging during virus assembly. We found, by surface plasmon resonance and analytical ultracentrifugation, that individual HTH motifs of the Bacillus phage SF6 small terminase bind the packaging regions of SF6 and related SPP1 genome weakly, with little local sequence specificity. Nuclear magnetic resonance chemical shift perturbation studies with an arbitrary single-site substrate suggest that the HTH motif contacts DNA similarly to how certain HTH proteins contact DNA non-specifically. Our observations support a model where specificity is generated through conformational selection of an intrinsically bent DNA segment by a ring of HTHs which bind weakly but cooperatively. Such a system would enable viral gene regulation and control of the viral life cycle, with a minimal genome, conferring a major evolutionary advantage for SPP1-like viruses. PMID:26673721

  11. Defining RNA motif-aminoglycoside interactions via two-dimensional combinatorial screening and structure-activity relationships through sequencing.

    PubMed

    Velagapudi, Sai Pradeep; Disney, Matthew D

    2013-10-15

    RNA is an extremely important target for the development of chemical probes of function or small molecule therapeutics. Aminoglycosides are the best-studied class of small molecules to target RNA. However, the RNA motifs outside of the bacterial rRNA A-site that are likely to be bound by these compounds in biological systems are largely unknown. If such information were known, it could allow for aminoglycosides to be exploited to target other RNAs and, in addition, could provide invaluable insights into potential bystander targets of these clinically used drugs. We utilized two-dimensional combinatorial screening (2DCS), a library-versus-library screening approach, to select the motifs displayed in a 3×3 nucleotide internal loop library and in a 6-nucleotide hairpin library that bind with high affinity and selectivity to six aminoglycoside derivatives. The selected RNA motifs were then analyzed using structure-activity relationships through sequencing (StARTS), a statistical approach that defines the privileged RNA motif space that binds a small molecule. StARTS allowed for the facile annotation of the selected RNA motif-aminoglycoside interactions in terms of affinity and selectivity. The interactions selected by 2DCS generally have nanomolar affinities, which is higher affinity than the binding of aminoglycosides to a mimic of their therapeutic target, the bacterial rRNA A-site.

  12. DNA consensus sequence motif for binding response regulator PhoP, a virulence regulator of Mycobacterium tuberculosis.

    PubMed

    He, Xiaoyuan; Wang, Shuishu

    2014-12-30

    Tuberculosis has reemerged as a serious threat to human health because of the increasing prevalence of drug-resistant strains and synergetic infection with HIV, prompting an urgent need for new and more efficient treatments. The PhoP-PhoR two-component system of Mycobacterium tuberculosis plays an important role in the virulence of the pathogen and thus represents a potential drug target. To study the mechanism of gene transcription regulation by response regulator PhoP, we identified a high-affinity DNA sequence for PhoP binding using systematic evolution of ligands by exponential enrichment. The sequence contains a direct repeat of two 7 bp motifs separated by a 4 bp spacer, TCACAGC(N4)TCACAGC. The specificity of the direct-repeat sequence for PhoP binding was confirmed by isothermal titration calorimetry and electrophoretic mobility shift assays. PhoP binds to the direct repeat as a dimer in a highly cooperative manner. We found many genes previously identified to be regulated by PhoP that contain the direct-repeat motif in their promoter sequences. Synthetic DNA fragments at the putative promoter-binding sites bind PhoP with variable affinity, which is related to the number of mismatches in the 7 bp motifs, the positions of the mismatches, and the spacer and flanking sequences. Phosphorylation of PhoP increases the affinity but does not change the specificity of DNA binding. Overall, our results confirm the direct-repeat sequence as the consensus motif for PhoP binding and thus pave the way for identification of PhoP directly regulated genes in different mycobacterial genomes.
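
    The direct-repeat consensus described above, TCACAGC(N4)TCACAGC, lends itself to a simple sequence scan. The following is a minimal sketch, not the authors' method: the function name `find_direct_repeats` and the `max_mismatches` parameter are hypothetical, and the scan counts mismatches in the two 7 bp half-sites only, ignoring the spacer and flanking sequences that the abstract says also affect affinity.

```python
# Direct-repeat consensus from the abstract: two 7 bp motifs and a 4 bp spacer.
HALF_SITE = "TCACAGC"
SPACER_LEN = 4


def find_direct_repeats(seq, max_mismatches=0):
    """Return (start, mismatches) for windows resembling TCACAGC(N4)TCACAGC.

    Hypothetical helper: only the 14 half-site positions are compared;
    the 4 spacer positions may be any base.
    """
    seq = seq.upper()
    half = len(HALF_SITE)
    width = 2 * half + SPACER_LEN
    hits = []
    for start in range(len(seq) - width + 1):
        first = seq[start:start + half]
        second = seq[start + half + SPACER_LEN:start + width]
        # Count positions that differ from the consensus in both half-sites.
        mismatches = sum(a != b for a, b in zip(first + second, HALF_SITE * 2))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits
```

    Raising `max_mismatches` gives a crude way to list weaker candidate sites; it says nothing about the measured binding affinities reported in the study.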

  13. SVM2Motif—Reconstructing Overlapping DNA Sequence Motifs by Mimicking an SVM Predictor

    PubMed Central

    Vidovic, Marina M. -C.; Görnitz, Nico; Müller, Klaus-Robert; Rätsch, Gunnar; Kloft, Marius

    2015-01-01

    Identifying discriminative motifs underlying the functionality and evolution of organisms is a major challenge in computational biology. Machine learning approaches such as support vector machines (SVMs) achieve state-of-the-art performance in genomic discrimination tasks, but, owing to their black-box character, the motifs underlying their decision functions are largely unknown. As a remedy, positional oligomer importance matrices (POIMs) allow us to visualize the significance of position-specific subsequences. Although they are a major step towards the explanation of trained SVM models, they suffer from the fact that their size grows exponentially in the length of the motif, which renders their manual inspection feasible only for comparably small motif sizes, typically k ≤ 5. In this work, we extend the work on positional oligomer importance matrices by presenting a new machine-learning methodology, entitled motifPOIM, to extract the truly relevant motifs, regardless of their length and complexity, underlying the predictions of a trained SVM model. Our framework thereby considers the motifs as free parameters in a probabilistic model, a task which can be phrased as a non-convex optimization problem. The exponential dependence of the POIM size on the oligomer length poses a major numerical challenge, which we address by an efficient optimization framework that allows us to find possibly overlapping motifs consisting of up to hundreds of nucleotides. We demonstrate the efficacy of our approach on a synthetic data set as well as a real-world human splice site data set. PMID:26690911

  14. A combined sequence and structure based method for discovering enriched motifs in RNA from in vivo binding data.

    PubMed

    Polishchuk, Maya; Paz, Inbal; Kohen, Refael; Mesika, Rona; Yakhini, Zohar; Mandel-Gutfreund, Yael

    2017-03-06

    RNA binding proteins (RBPs) play an important role in regulating many processes in the cell. RBPs often recognize their RNA targets in a specific manner. In addition to the RNA primary sequence, the structure of the RNA has been shown to play a central role in RNA recognition by RBPs. In recent years, many experimental approaches, both in vitro and in vivo, were developed and employed to identify and characterize RBP targets and extract their binding specificities. In vivo binding techniques, such as CrossLinking and ImmunoPrecipitation (CLIP)-based methods, enable the characterization of protein binding sites on RNA targets. However, these methods do not provide information regarding the structural preferences of the protein. While methods to obtain the structure of RNA are available, inferring both the sequence and the structure preferences of RBPs remains a challenge. Here we present SMARTIV, a novel computational tool for discovering combined sequence and structure binding motifs from in vivo RNA binding data relying on the sequences of the target sites, the ranking of their binding scores and their predicted secondary structure. The combined motifs are provided in a unified representation that is informative and easy for visual perception. We tested the method on CLIP-seq data from different platforms for a variety of RBPs. Overall, we show that our results are highly consistent with known binding motifs of RBPs, offering additional information on their structural preferences.

  15. A survey of DNA motif finding algorithms

    PubMed Central

    Das, Modan K; Dai, Ho-Kwok

    2007-01-01

    Background: Unraveling the mechanisms that regulate gene expression is a major challenge in biology. An important task in this challenge is to identify regulatory elements, especially the binding sites in deoxyribonucleic acid (DNA) for transcription factors. These binding sites are short DNA segments that are called motifs. Recent advances in genome sequence availability and in high-throughput gene expression analysis technologies have allowed for the development of computational methods for motif finding. As a result, a large number of motif finding algorithms have been implemented and applied to various motif models over the past decade. This survey reviews the latest developments in DNA motif finding algorithms. Results: Earlier algorithms use promoter sequences of coregulated genes from a single genome and search for statistically overrepresented motifs. Recent algorithms are designed to use phylogenetic footprinting or orthologous sequences and also an integrated approach where promoter sequences of coregulated genes and phylogenetic footprinting are used. All the algorithms studied have been reported to correctly detect the motifs that have been previously detected by laboratory experimental approaches, and some algorithms were able to find novel motifs. However, most of these motif finding algorithms have been shown to work successfully in yeast and other lower organisms, but perform significantly worse in higher organisms. Conclusion: Despite considerable efforts to date, DNA motif finding remains a complex challenge for biologists and computer scientists. Researchers have taken many different approaches in developing motif discovery tools and the progress made in this area of research is very encouraging. Performance comparison of different motif finding tools and identification of the best tools have proven to be a difficult task because tools are designed based on algorithms and motif models that are diverse and complex and our incomplete understanding of

  16. MSDmotif: exploring protein sites and motifs

    PubMed Central

    Golovin, Adel; Henrick, Kim

    2008-01-01

    Background: Protein structures have conserved features (motifs), which have a sufficient influence on the protein function. These motifs can be found in sequence as well as in 3D space. Understanding these fragments is essential for 3D structure prediction, modelling and drug design. The Protein Data Bank (PDB) is the source of this information; however, present search tools have limited 3D options to integrate protein sequence with its 3D structure. Results: We describe here a web application for querying the PDB for ligands, binding sites, small 3D structural and sequence motifs and the underlying database. Novel algorithms for chemical fragments, 3D motifs, ϕ/ψ sequences, super-secondary structure motifs and for small 3D structural motif association searches are incorporated. The interface provides functionality for visualization, search criteria creation, sequence and 3D multiple alignment options. MSDmotif is an integrated system where a results page is also a search form. A set of motif statistics is available for analysis. This set includes molecule and motif binding statistics, distribution of motif sequences, occurrence of an amino acid within a motif, correlation of amino-acid side-chain charges within a motif and Ramachandran plots for each residue. The binding statistics are presented in association with properties that include a ligand fragment library. Access is also provided through the Distributed Annotation System (DAS) protocol. An additional entry point facilitates XML requests with XML responses. Conclusion: MSDmotif is unique in combining chemical, sequence and 3D data in a single search engine with a range of search and visualisation options. It provides multiple views of data found in the PDB archive for exploring protein structures. PMID:18637174

  17. A DNA-binding protein containing two widely separated zinc finger motifs that recognize the same DNA sequence.

    PubMed

    Fan, C M; Maniatis, T

    1990-01-01

    We have isolated a full-length cDNA clone encoding a protein (PRDII-BF1) that binds specifically to a positive regulatory domain (PRDII) of the human IFN-beta gene promoter, and to a similar sequence present in a number of other promoters and enhancers. The sequence of this protein reveals two novel structural features. First, it is the largest sequence-specific DNA-binding protein reported to date (298 kD). Second, it contains two widely separated sets of C2-H2-type zinc fingers. Remarkably, each set of zinc fingers binds to the same DNA sequence motif with similar affinities and methylation interference patterns. Thus, this protein may act by binding simultaneously to reiterated copies of the same recognition sequence. Although the function of PRDII-BF1 is not known, the level of its mRNA is inducible by serum and virus, albeit with different kinetics.

  18. An amino acid sequence motif sufficient for subnuclear localization of an arginine/serine-rich splicing factor.

    PubMed

    Hedley, M L; Amrein, H; Maniatis, T

    1995-12-05

    We have identified an amino acid sequence in the Drosophila Transformer (Tra) protein that is capable of directing a heterologous protein to nuclear speckles, regions of the nucleus previously shown to contain high concentrations of spliceosomal small nuclear RNAs and splicing factors. This sequence contains a nucleoplasmin-like bipartite nuclear localization signal (NLS) and a repeating arginine/serine (RS) dipeptide sequence adjacent to a short stretch of basic amino acids. Sequence comparisons from a number of other splicing factors that colocalize to nuclear speckles reveal the presence of one or more copies of this motif. We propose a two-step subnuclear localization mechanism for splicing factors. The first step is transport across the nuclear envelope via the nucleoplasmin-like NLS, while the second step is association with components in the speckled domain via the RS dipeptide sequence.

  19. microRNAs with AAGUGC seed motif constitute an integral part of an oncogenic signaling network

    PubMed Central

    Zhou, Y; Frings, O; Branca, R M; Boekel, J; le Sage, C; Fredlund, E; Agami, R; Orre, L M

    2017-01-01

    microRNA (miRNA) dysregulation is a common feature of cancer cells, but the complex roles of miRNAs in cancer are not fully elucidated. Here, we used functional genomics to identify oncogenic miRNAs in non-small cell lung cancer and evaluate their impact on response to epidermal growth factor (EGFR)-targeting therapy. Our data demonstrate that miRNAs with an AAGUGC motif in their seed sequence increase both cancer cell proliferation and sensitivity to EGFR inhibitors. Global transcriptomics, proteomics and target prediction resulted in the identification of several tumor suppressors involved in the G1/S transition as AAGUGC-miRNA targets. The clinical implications of our findings were evaluated by analysis of AAGUGC-miRNA expression in multiple cancer types, supporting the link between this miRNA seed family, their tumor suppressor targets and cancer cell proliferation. In conclusion, we propose the AAGUGC seed motif as an oncomotif and that oncomotif-miRNAs promote cancer cell proliferation. These findings have potential therapeutic implications, especially in selecting patients for EGFR-targeting therapy. PMID:27477696

  20. A sequence upstream of canonical PDZ-binding motif within CFTR COOH-terminus enhances NHERF1 interaction.

    PubMed

    Sharma, Neeraj; LaRusch, Jessica; Sosnay, Patrick R; Gottschalk, Laura B; Lopez, Andrea P; Pellicore, Matthew J; Evans, Taylor; Davis, Emily; Atalar, Melis; Na, Chan-Hyun; Rosson, Gedge D; Belchis, Deborah; Milewski, Michal; Pandey, Akhilesh; Cutting, Garry R

    2016-12-01

    The development of cystic fibrosis transmembrane conductance regulator (CFTR) targeted therapy for cystic fibrosis has generated interest in maximizing membrane residence of mutant forms of CFTR by manipulating interactions with scaffold proteins, such as sodium/hydrogen exchange regulatory factor-1 (NHERF1). In this study, we explored whether COOH-terminal sequences in CFTR beyond the PDZ-binding motif influence its interaction with NHERF1. NHERF1 displayed minimal self-association in blot overlays (NHERF1, Kd = 1,382 ± 61.1 nM) at concentrations well above physiological levels, estimated at 240 nM from RNA-sequencing and 260 nM by liquid chromatography tandem mass spectrometry in sweat gland, a key site of CFTR function in vivo. However, NHERF1 oligomerized at considerably lower concentrations (10 nM) in the presence of the last 111 amino acids of CFTR (20 nM) in blot overlays and cross-linking assays and in coimmunoprecipitations using differently tagged versions of NHERF1. Deletion and alanine mutagenesis revealed that a six-amino acid sequence (1417)EENKVR(1422) and the terminal (1478)TRL(1480) (PDZ-binding motif) in the COOH-terminus were essential for the enhanced oligomerization of NHERF1. Full-length CFTR stably expressed in Madin-Darby canine kidney epithelial cells fostered NHERF1 oligomerization that was substantially reduced (∼5-fold) on alanine substitution of EEN, KVR, or EENKVR residues or deletion of the TRL motif. Confocal fluorescent microscopy revealed that the EENKVR and TRL sequences contribute to preferential localization of CFTR to the apical membrane. Together, these results indicate that COOH-terminal sequences mediate enhanced NHERF1 interaction and facilitate the localization of CFTR, a property that could be manipulated to stabilize mutant forms of CFTR at the apical surface to maximize the effect of CFTR-targeted therapeutics.

  1. ML2Motif—Reliable extraction of discriminative sequence motifs from learning machines

    PubMed Central

    Kloft, Marius; Müller, Klaus-Robert; Görnitz, Nico

    2017-01-01

    High prediction accuracies are not the only objective to consider when solving problems using machine learning. Instead, particular scientific applications require some explanation of the learned prediction function. For computational biology, positional oligomer importance matrices (POIMs) have been successfully applied to explain the decision of support vector machines (SVMs) using weighted-degree (WD) kernels. To extract relevant biological motifs from POIMs, the motifPOIM method has been devised and showed promising results on real-world data. Our contribution in this paper is twofold: as an extension to POIMs, we propose gPOIM, a general measure of feature importance for arbitrary learning machines and feature sets (including, but not limited to, SVMs and CNNs) and devise a sampling strategy for efficient computation. As a second contribution, we derive a convex formulation of motifPOIMs that leads to more reliable motif extraction from gPOIMs. Empirical evaluations confirm the usefulness of our approach on artificially generated data as well as on real-world datasets. PMID:28346487

  2. Bayesian Markov models consistently outperform PWMs at predicting motifs in nucleotide sequences

    PubMed Central

    Siebert, Matthias; Söding, Johannes

    2016-01-01

    Position weight matrices (PWMs) are the standard model for DNA and RNA regulatory motifs. In PWMs nucleotide probabilities are independent of nucleotides at other positions. Models that account for dependencies need many parameters and are prone to overfitting. We have developed a Bayesian approach for motif discovery using Markov models in which conditional probabilities of order k − 1 act as priors for those of order k. This Bayesian Markov model (BaMM) training automatically adapts model complexity to the amount of available data. We also derive an EM algorithm for de-novo discovery of enriched motifs. For transcription factor binding, BaMMs achieve significantly (P = 1/16) higher cross-validated partial AUC than PWMs in 97% of 446 ChIP-seq ENCODE datasets and improve performance by 36% on average. BaMMs also learn complex multipartite motifs, improving predictions of transcription start sites, polyadenylation sites, bacterial pause sites, and RNA binding sites by 26–101%. BaMMs never performed worse than PWMs. These robust improvements argue in favour of generally replacing PWMs by BaMMs. PMID:27288444
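
    The idea that lower-order probabilities act as priors for higher-order ones can be illustrated with a small interpolation sketch. This is an assumption-laden toy, not the BaMM implementation: it is position-independent, stops at order 1, and the function names, the `alpha` prior strength and the pseudocount are all hypothetical choices made for illustration.

```python
from collections import Counter

ALPHABET = "ACGT"


def order0_probs(seqs, pseudocount=1.0):
    """Order-0 (independent) nucleotide probabilities with a flat pseudocount."""
    counts = Counter(base for s in seqs for base in s)
    total = sum(counts[b] for b in ALPHABET) + pseudocount * len(ALPHABET)
    return {b: (counts[b] + pseudocount) / total for b in ALPHABET}


def order1_probs(seqs, alpha=2.0):
    """P(x | previous base), using the order-0 distribution as the prior.

    Illustrative assumption: the prior enters as alpha pseudo-observations,
    so rarely seen contexts fall back towards the order-0 probabilities.
    """
    p0 = order0_probs(seqs)
    pair_counts = Counter()
    context_counts = Counter()
    for s in seqs:
        for prev, cur in zip(s, s[1:]):
            pair_counts[prev, cur] += 1
            context_counts[prev] += 1
    return {
        prev: {
            x: (pair_counts[prev, x] + alpha * p0[x]) / (context_counts[prev] + alpha)
            for x in ALPHABET
        }
        for prev in ALPHABET
    }
```

    With few observations of a context the estimate stays close to the order-0 model, and with many it approaches the raw conditional frequencies, which is one simple way for model complexity to adapt to the amount of data.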

  3. Definition of the tempo of sequence diversity across an alignment and automatic identification of sequence motifs: Application to protein homologous families and superfamilies

    PubMed Central

    May, Alex C.W.

    2002-01-01

    It is often possible to identify sequence motifs that characterize a protein family in terms of its fold and/or function from aligned protein sequences. Such motifs can be used to search for new family members. Partitioning of sequence alignments into regions of similar amino acid variability is usually done by hand. Here, I present a completely automatic method for this purpose: one that is guaranteed to produce globally optimal solutions at all levels of partition granularity. The method is used to compare the tempo of sequence diversity across reliable three-dimensional (3D) structure-based alignments of 209 protein families (HOMSTRAD) and that for 69 superfamilies (CAMPASS). (The mean alignment lengths for HOMSTRAD and CAMPASS are very similar.) Surprisingly, the optimal segmentation distributions for the closely related proteins and distantly related ones are found to be very similar. Also, optimal segmentation identifies an unusual protein superfamily. Finally, protein 3D structure clues from the tempo of sequence diversity across alignments are examined. The method is general, and could be applied to any area of comparative biological sequence and 3D structure analysis where the constraint of the inherent linear organization of the data imposes an ordering on the set of objects to be clustered. PMID:12441381
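
    A globally optimal partition of linearly ordered data into contiguous segments can be found by dynamic programming. The sketch below shows that general technique only; it is not the published method. The function name `optimal_segmentation`, the per-column variability scores it takes as input, and the squared-deviation cost are assumptions chosen for illustration.

```python
def optimal_segmentation(values, k):
    """Split `values` into k contiguous segments with minimal total squared
    deviation from each segment's mean.

    Returns (cost, starts), where `starts` lists the start index of each segment.
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums of values and squared values give O(1) segment costs.
    s1, s2 = [0.0], [0.0]
    for v in values:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i, j):
        """Squared deviation of values[i:j] from its mean (requires j > i)."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[m][j]: minimal cost of splitting values[:j] into m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                candidate = best[m - 1][i] + cost(i, j)
                if candidate < best[m][j]:
                    best[m][j] = candidate
                    back[m][j] = i

    # Trace back the segment boundaries from the last segment to the first.
    starts, j = [], n
    for m in range(k, 0, -1):
        j = back[m][j]
        starts.append(j)
    starts.reverse()
    return best[k][n], starts
```

    Calling the function for k = 1, 2, 3 and so on yields the optimal partition at each level of granularity, at a cost of roughly k·n² segment evaluations per call.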

  4. Conserved sequence motifs upstream from the co-ordinately expressed vitellogenin and apoVLDLII genes of chicken.

    PubMed

    van het Schip, F; Strijker, R; Samallo, J; Gruber, M; Geert, A B

    1986-11-11

    The vitellogenin and apoVLDLII yolk protein genes of chicken are transcribed in the liver upon estrogenization. To get information on putative regulatory elements, we compared more than 2 kb of their 5' flanking DNA sequences. Common sequence motifs were found in regions exhibiting estrogen-induced changes in chromatin structure. Stretches of alternating pyrimidines and purines about 30 nucleotides long are present at roughly similar positions. A distinct box of sequence homology in the chicken genes also appears to be present at a similar position in front of the vitellogenin genes of Xenopus laevis, but is absent from the estrogen-responsive egg-white protein genes expressed in the oviduct. In front of the vitellogenin (position -595) and the VLDLII gene (position -548), a DNA element of about 300 base-pairs was found, which possesses structural characteristics of a mobile genetic element and bears homology to the transposon-like Vi element of Xenopus laevis.

  5. A Glance at Microsatellite Motifs from 454 Sequencing Reads of Watermelon Genomic DNA

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A single 454 (Life Sciences Sequencing Technology) run of Charleston Gray watermelon (Citrullus lanatus var. lanatus) genomic DNA was performed and sequence data were assembled. A large scale identification of simple sequence repeat (SSR) was performed and SSR sequence data were used for the develo...

  6. The Runt domain of AML1 (RUNX1) binds a sequence-conserved RNA motif that mimics a DNA element

    PubMed Central

    Fukunaga, Junichi; Nomura, Yusuke; Tanaka, Yoichiro; Amano, Ryo; Tanaka, Taku; Nakamura, Yoshikazu; Kawai, Gota; Sakamoto, Taiichi; Kozu, Tomoko

    2013-01-01

    AML1 (RUNX1) is a key transcription factor for hematopoiesis that binds to the Runt-binding double-stranded DNA element (RDE) of target genes through its N-terminal Runt domain. Aberrations in the AML1 gene are frequently found in human leukemia. To better understand AML1 and its potential utility for diagnosis and therapy, we obtained RNA aptamers that bind specifically to the AML1 Runt domain. Enzymatic probing and NMR analyses revealed that Apt1-S, which is a truncated variant of one of the aptamers, has a CACG tetraloop and two stem regions separated by an internal loop. All the isolated aptamers were found to contain the conserved sequence motif 5′-NNCCAC-3′ and 5′-GCGMGN′N′-3′ (M:A or C; N and N′ form Watson–Crick base pairs). The motif contains one AC mismatch and one base bulged out. Mutational analysis of Apt1-S showed that three guanines of the motif are important for Runt binding as are the three guanines of RDE, which are directly recognized by three arginine residues of the Runt domain. Mutational analyses of the Runt domain revealed that the amino acid residues used for Apt1-S binding were similar to those used for RDE binding. Furthermore, the aptamer competed with RDE for binding to the Runt domain in vitro. These results demonstrated that the Runt domain of the AML1 protein binds to the motif of the aptamer that mimics DNA. Our findings should provide new insights into RNA function and utility in both basic and applied sciences. PMID:23709277

  7. Analysis of Cytochrome P450 Conserved Sequence Motifs between Helices E and H: Prediction of Critical Motifs and Residues in Enzyme Functions

    PubMed Central

    Oezguen, Numan; Kumar, Santosh

    2014-01-01

    Rational approaches have been extensively used to investigate the role of active site residues in cytochrome P450 (CYP) functions. However, recent studies using random mutagenesis suggest an important role for non-active site residues in CYP functions. Meta-analysis of the random mutants showed that 75% of the functionally important non-active site residues are present in 20% of the entire protein between helices E and H (E-H) and conserved sequence motif (CSM) between 7 and 11. The CSM approach was developed recently to investigate the functional role of non-active site residues in CYP2B4. Furthermore, we identified and analyzed the CSM in multiple CYP families and subfamilies in the E-H region. Results from CSM analysis showed that CSM 7, 8, 10, and 11 are conserved in CYP1, CYP2, and CYP3 families, while CSM 9 is conserved only in CYP2 family. Analysis of different CYP2 subfamilies showed that CYP2B and CYP2C have similar characteristics in the CSM, while the characteristics of CYP2A and CYP2D subfamilies are different. Finally, we analyzed CSM 7, 8, 10, and 11, which are common in all the CYP families/subfamilies analyzed, in fifteen important drug-metabolizing CYPs. The results showed that while CSM 8 is most conserved among these CYPs, CSM 7, 9, and 10 have significant variations. We suggest that CSM 8 has a common role in all the CYPs that have been analyzed, while CSM 7, 10, and 11 may have a relatively specific role within the subfamily. We further suggest that these CSM play an important role in opening and closing of the substrate access/egress channel by modulating the flexible/plastic region of the protein. Thus, site-directed mutagenesis of these CSM can be used to study structure-function and dynamic/plasticity-function relationships and to design CYP biocatalysts. PMID:25426333

  8. Viral sequences integrated into plant genomes.

    PubMed

    Harper, Glyn; Hull, Roger; Lockhart, Ben; Olszewski, Neil

    2002-01-01

    Sequences of various DNA plant viruses have been found integrated into the host genome. There are two forms of integrant, those that can form episomal viral infections and those that cannot. Integrants of three pararetroviruses, Banana streak virus (BSV), Tobacco vein clearing virus (TVCV), and Petunia vein clearing virus (PVCV), can generate episomal infections in certain hybrid plant hosts in response to stress. In the case of BSV and TVCV, one of the parents contains the integrant but it has not been seen to be activated in that parent; the other parent does not contain the integrant. The number of integrant loci is low for BSV and PVCV and high in TVCV. The structure of the integrants is complex, and it is thought that episomal virus is released by recombination and/or reverse transcription. Geminiviral and pararetroviral sequences are found in plant genomes although not so far associated with a virus disease. It appears that integration of viral sequences is widespread in the plant kingdom and has been occurring for a long period of time.

  9. Examination of the transcription factor NtcA-binding motif by in vitro selection of DNA sequences from a random library.

    PubMed

    Jiang, F; Wisén, S; Widersten, M; Bergman, B; Mannervik, B

    2000-08-25

    A recursive in vitro selection among random DNA sequences was used for analysis of the cyanobacterial transcription factor NtcA-binding motifs. An eight-base palindromic sequence, TGTA-(N(8))-TACA, was found to be the optimal NtcA-binding sequence. The more divergent the binding sequences, compared to this consensus sequence, the lower the NtcA affinity. The second and third bases in each four-nucleotide half of the consensus sequence were crucial for NtcA binding, and they were in general highly conserved. The most frequently occurring sequence in the middle weakly conserved region was similar to that of the NtcA-binding motif of the Anabaena sp. strain PCC 7120 glnA gene, previously known to have high affinity for NtcA. This indicates that the middle sequences were selected for high NtcA affinity. Analysis of natural NtcA-binding motifs showed that these could be classified into two groups based on differences in recognition consensus sequences. It is suggested that NtcA naturally recognizes different DNA-binding motifs, or has differential affinities to these sequences under different physiological conditions.
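
    As an illustration of the consensus reported above, TGTA-(N(8))-TACA, the following minimal Python sketch scans a DNA string for exact matches and counts half-site mismatches for a candidate 16-mer. The function names and the example sequence are hypothetical and are not taken from the cited study; mismatch counting is only a crude stand-in for the affinity trend the abstract describes.

```python
import re

# Consensus from the abstract: TGTA, eight unspecified bases, TACA.
# The lookahead lets overlapping matches be reported as well.
NTCA_CONSENSUS = re.compile(r"(?=(TGTA[ACGT]{8}TACA))")

def find_ntca_sites(seq):
    """Return (start, site) pairs for exact consensus matches in seq."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in NTCA_CONSENSUS.finditer(seq)]

def half_site_mismatches(site):
    """Count mismatches to TGTA/TACA in the two half-sites of a 16-mer."""
    if len(site) != 16:
        raise ValueError("expected a 16-base site")
    site = site.upper()
    observed = site[:4] + site[12:]
    return sum(a != b for a, b in zip(observed, "TGTATACA"))

# Hypothetical example sequence (not from the paper).
example = "CCGTGTAACGTACGTTACAGG"
print(find_ntca_sites(example))
```

    A position weight matrix would capture the graded conservation described in the abstract more faithfully than a fixed pattern; the regular expression is used here only because it is the simplest reading of the consensus string.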

  10. New melanocortin 1 receptor binding motif based on the C-terminal sequence of alpha-melanocyte-stimulating hormone.

    PubMed

    Schiöth, Helgi B; Muceniece, Ruta; Mutule, Ilga; Wikberg, Jarl E S

    2006-10-01

    The C-terminal tripeptide of the alpha-melanocyte stimulating hormone (alpha-MSH11-13) possesses strong antiinflammatory activity without a known cellular target. In order to better understand the structural requirements for function of such a motif, we designed, synthesized and tested Trp- and Tyr-containing analogues of the alpha-MSH11-13. Seven alpha-MSH11-13 analogues were synthesized and characterized for their binding to the melanocortin receptors recombinantly expressed in insect (Sf9) cells, infected with baculovirus carrying corresponding MC receptor DNA. We also tested these analogues on B16-F1 mouse melanoma cells endogenously expressing the MC1 receptor for binding and for ability to increase cAMP levels as well as on COS-7 cells transfected with the human MC receptors. The data indicate that HS401 (Ac-Tyr-Lys-Pro-Val-NH2) and HS402 (Ac-Lys-Pro-Val-Tyr-NH2) selectively bound to the MC1 receptor and stimulated cAMP generation in a concentration-dependent way while the other Tyr- and Trp-containing alpha-MSH11-13 analogues neither bound to MC receptors nor stimulated cAMP. We have thus identified a new MC receptor binding motif derived from the C-terminal sequence of alpha-MSH. The tetrapeptides have novel properties as they both act via MC-ergic pathways and also carry the anti-inflammatory alpha-MSH11-13 message sequence.

  11. A conserved sequence extending motif III of the motor domain in the Snf2-family DNA translocase Rad54 is critical for ATPase activity.

    PubMed

    Zhang, Xiao-Ping; Janke, Ryan; Kingsley, James; Luo, Jerry; Fasching, Clare; Ehmsen, Kirk T; Heyer, Wolf-Dietrich

    2013-01-01

    Rad54 is a dsDNA-dependent ATPase that translocates on duplex DNA. Its ATPase function is essential for homologous recombination, a pathway critical for meiotic chromosome segregation, repair of complex DNA damage, and recovery of stalled or broken replication forks. In recombination, Rad54 cooperates with Rad51 protein and is required to dissociate Rad51 from heteroduplex DNA to allow access by DNA polymerases for recombination-associated DNA synthesis. Sequence analysis revealed that Rad54 contains a perfect match to the consensus PIP box sequence, a widely spread PCNA interaction motif. Indeed, Rad54 interacts directly with PCNA, but this interaction is not mediated by the Rad54 PIP box-like sequence. This sequence is located as an extension of motif III of the Rad54 motor domain and is essential for full Rad54 ATPase activity. Mutations in this motif render Rad54 non-functional in vivo and severely compromise its activities in vitro. Further analysis demonstrated that such mutations affect dsDNA binding, consistent with the location of this sequence motif on the surface of the cleft formed by two RecA-like domains, which likely forms the dsDNA binding site of Rad54. Our study identified a novel sequence motif critical for Rad54 function and showed that even perfect matches to the PIP box consensus may not necessarily identify PCNA interaction sites.

  12. Functional importance of GGXG sequence motifs in putative reentrant loops of 2HCT and ESS transport proteins.

    PubMed

    Dobrowolski, Adam; Lolkema, Juke S

    2009-08-11

    The 2HCT and ESS families are two families of secondary transporters. Members of the two families are unrelated in amino acid sequence but share similar hydropathy profiles, which suggest a similar folding of the proteins in membranes. Structural models show two homologous domains containing five transmembrane segments (TMSs) each, with a reentrant or pore loop between the fourth and fifth TMSs in each domain. Here we show that GGXG sequence motifs present in the putative reentrant loops are important for the activity of the transporters. Mutation of the conserved Gly residues to Cys in the motifs of the Na(+)-citrate transporter CitS in the 2HCT family and the Na(+)-glutamate transporter GltS in the ESS family resulted in strongly reduced transport activity. Similarly, mutation of the variable residue "X" to Cys in the N-terminal half of GltS essentially inactivated the transporter. The corresponding mutations in the N- and C-terminal halves of CitS reduced transport activity to 60 and 25% of that of the wild type, respectively. Residual activity of any of the mutants could be further reduced by treatment with the membrane permeable thiol reagent N-ethylmaleimide (NEM). The X to Cys mutation (S405C) in the cytoplasmic loop in the C-terminal half of CitS rendered the protein sensitive to the bulky, membrane impermeable thiol reagent 4-acetamido-4'-maleimidylstilbene-2,2'-disulfonic acid (AmdiS) added at the periplasmic side of the membrane, providing further evidence that this part of the loop is positioned between the transmembrane segments. The putative reentrant loop in the C-terminal half of the ESS family does not contain the GGXG motif, but a conserved stretch rich in Gly residues. Cysteine-scanning mutagenesis of a stretch of 18 residues in the GltS protein revealed two residues important for function. Mutant N356C was completely inactivated by treatment with NEM, and mutant P351C appeared to be the counterpart of mutant S405C of CitS; the mutant was

  13. Rapid characterization of CRISPR-Cas9 protospacer adjacent motif sequence elements.

    PubMed

    Karvelis, Tautvydas; Gasiunas, Giedrius; Young, Joshua; Bigelyte, Greta; Silanskas, Arunas; Cigan, Mark; Siksnys, Virginijus

    2015-11-19

    To expand the repertoire of Cas9s available for genome targeting, we present a new in vitro method for the simultaneous examination of guide RNA and protospacer adjacent motif (PAM) requirements. The method relies on the in vitro cleavage of plasmid libraries containing a randomized PAM as a function of Cas9-guide RNA complex concentration. Using this method, we accurately reproduce the canonical PAM preferences for Streptococcus pyogenes, Streptococcus thermophilus CRISPR3 (Sth3), and CRISPR1 (Sth1). Additionally, PAM and sgRNA solutions for a novel Cas9 protein from Brevibacillus laterosporus are provided by the assay and are demonstrated to support functional activity in vitro and in plants.

  14. Sequence motifs associated with paternal transmission of mitochondrial DNA in the horse mussel, Modiolus modiolus (Bivalvia: Mytilidae).

    PubMed

    Robicheau, Brent M; Breton, Sophie; Stewart, Donald T

    2017-03-20

    In the majority of metazoans paternal mitochondria represent evolutionary dead-ends. In many bivalves, however, this paradigm does not hold true; both maternal and paternal mitochondria are inherited. Herein, we characterize maternal and paternal mitochondrial control regions of the horse mussel, Modiolus modiolus (Bivalvia: Mytilidae). The maternal control region is 808bp long, while the paternal control region is longer at 2.3kb. We hypothesize that the size difference is due to a combination of repeated duplications within the control region of the paternal mtDNA genome, as well as an evolutionarily ancient recombination event between two sex-associated mtDNA genomes that led to the insertion of a second control region sequence in the genome that is now transmitted via males. In a comparison to other mytilid male control regions, we identified two evolutionarily Conserved Motifs, CMA and CMB, associated with paternal transmission of mitochondrial DNA. CMA is characterized by a conserved purine/pyrimidine pattern, while CMB exhibits a specific 13bp nucleotide string within a stem and loop structure. The identification of motifs CMA and CMB in M. modiolus extends our understanding of Sperm Transmission Elements (STEs) that have recently been identified as being associated with the paternal transmission of mitochondria in marine bivalves.

  15. De novo computational identification of stress-related sequence motifs and microRNA target sites in untranslated regions of a plant translatome

    PubMed Central

    Munusamy, Prabhakaran; Zolotarov, Yevgen; Meteignier, Louis-Valentin; Moffett, Peter; Strömvik, Martina V.

    2017-01-01

    Gene regulation at the transcriptional and translational level leads to diversity in phenotypes and function in organisms. Regulatory DNA or RNA sequence motifs adjacent to the gene coding sequence act as binding sites for proteins that in turn enable or disable expression of the gene. Whereas the known DNA and RNA binding proteins range in the thousands, only a few motifs have been examined. In this study, we have predicted putative regulatory motifs in groups of untranslated regions from genes regulated at the translational level in Arabidopsis thaliana under normal and stressed conditions. The test group of sequences was divided into random subgroups and subjected to three de novo motif finding algorithms (Seeder, Weeder and MEME). In addition to identifying sequence motifs, using an in silico tool we have predicted microRNA target sites in the 3′ UTRs of the translationally regulated genes, as well as identified upstream open reading frames located in the 5′ UTRs. Our bioinformatics strategy and the knowledge generated contribute to understanding gene regulation during stress, and can be applied to disease and stress resistant plant development. PMID:28276452

  16. Alignment of U3 region sequences of mammalian type C viruses: identification of highly conserved motifs and implications for enhancer design.

    PubMed Central

    Golemis, E A; Speck, N A; Hopkins, N

    1990-01-01

    We aligned published sequences for the U3 region of 35 type C mammalian retroviruses. The alignment reveals that certain sequence motifs within the U3 region are strikingly conserved. A number of these motifs correspond to previously identified sites. In particular, we found that the enhancer region of most of the viruses examined contains a binding site for leukemia virus factor b, a viral corelike element, the consensus motif for nuclear factor 1, and the glucocorticoid response element. Most viruses containing more than one copy of enhancer sequences include these binding sites in both copies of the repeat. We consider this set of binding sites to constitute a framework for the enhancers of this set of viruses. Other highly conserved motifs in the U3 region include the retrovirus inverted repeat sequence, a negative regulatory element, and the CCAAT and TATA boxes. In addition, we identified two novel motifs in the promoter region that were exceptionally highly conserved but have not been previously described. PMID:2153223

  17. WeederH: an algorithm for finding conserved regulatory motifs and regions in homologous sequences

    PubMed Central

    Pavesi, Giulio; Zambelli, Federico; Pesole, Graziano

    2007-01-01

    Background: This work addresses the problem of detecting conserved transcription factor binding sites and in general regulatory regions through the analysis of sequences from homologous genes, an approach that is becoming more and more widely used given the ever increasing amount of genomic data available. Results: We present an algorithm that identifies conserved transcription factor binding sites in a given sequence by comparing it to one or more homologs, adapting a framework we previously introduced for the discovery of sites in sequences from co-regulated genes. Differently from the most commonly used methods, the approach we present does not need or compute an alignment of the sequences investigated, nor resorts to descriptors of the binding specificity of known transcription factors. The main novel idea we introduce is a relative measure of conservation, assuming that true functional elements should present a higher level of conservation with respect to the rest of the sequence surrounding them. We present tests where we applied the algorithm to the identification of conserved annotated sites in homologous promoters, as well as in distal regions like enhancers. Conclusion: Results of the tests show how the algorithm can provide fast and reliable predictions of conserved transcription factor binding sites regulating the transcription of a gene, with better performances than other available methods for the same task. We also show examples on how the algorithm can be successfully employed when promoter annotations of the genes investigated are missing, or when regulatory sites and regions are located far away from the genes. PMID:17286865

  18. A sequence motif enriched in regions bound by the Drosophila dosage compensation complex

    PubMed Central

    2010-01-01

    Background: In Drosophila melanogaster, dosage compensation is mediated by the action of the dosage compensation complex (DCC). How the DCC recognizes the fly X chromosome is still poorly understood. Characteristic sequence signatures at all DCC binding sites have not hitherto been found. Results: In this study, we compare the known binding sites of the DCC with oligonucleotide profiles that measure the specificity of the sequences of the D. melanogaster X chromosome. We show that the X chromosome regions bound by the DCC are enriched for a particular type of short, repetitive sequences. Their distribution suggests that these sequences contribute to chromosome recognition, the generation of DCC binding sites and/or the local spreading of the complex. Comparative data indicate that the same sequences may be involved in dosage compensation in other Drosophila species. Conclusions: These results offer an explanation for the wild-type binding of the DCC along the Drosophila X chromosome, contribute to delineate the forces leading to the establishment of dosage compensation and suggest new experimental approaches to understand the precise biochemical features of the dosage compensation system. PMID:20226017

  19. MPS Editor - An Integrated Sequencing Environment

    NASA Technical Reports Server (NTRS)

    Streiffert, Barbara A.; O'Reilly, Taifun; Schrock, Mitchell; Catchen, Jaime

    2010-01-01

    In today's operations environment, the teams are smaller and need to be more efficient while still ensuring the safety and success of the mission. In addition, teams often begin working on a mission in its early development phases and continue on the team through actual operations. For these reasons the operations teams want to be presented with a software environment that integrates multiple needed software applications as well as providing them with context sensitive editing support for entering commands and sequences of commands. At Jet Propulsion Laboratory, the Multi-Mission Planning and Sequencing (MPS) Editor provided by the Multi-Mission Ground Systems and Services (MGSS) supports those operational needs.

  20. Structural Analysis of a Repetitive Protein Sequence Motif in Strepsirrhine Primate Amelogenin

    PubMed Central

    Bromley, Keith M.; Hacia, Joseph G.; Bromage, Timothy G.; Snead, Malcolm L.; Moradian-Oldak, Janet; Paine, Michael L.

    2011-01-01

    Strepsirrhines are members of a primate suborder that has a distinctive set of features associated with the development of the dentition. Amelogenin (AMEL), the better known of the enamel matrix proteins, forms 90% of the secreted organic matrix during amelogenesis. Although AMEL has been sequenced in numerous mammalian lineages, the only reported strepsirrhine AMEL sequences are those of the ring-tailed lemur and galago, which contain a set of additional proline-rich tandem repeats absent in all other primates species analyzed to date, but present in some non-primate mammals. Here, we first determined that these repeats are present in AMEL from three additional lemur species and thus are likely to be widespread throughout this group. To evaluate the functional relevance of these repeats in strepsirrhines, we engineered a mutated murine amelogenin sequence containing a similar proline-rich sequence to that of Lemur catta. In the monomeric form, the MQP insertions had no influence on the secondary structure or refolding properties, whereas in the assembled form, the insertions increased the hydrodynamic radii. We speculate that increased AMEL nanosphere size may influence enamel formation in strepsirrhine primates. PMID:21437261

  1. Phylogenetic Analysis of Geographically Diverse Radopholus similis via rDNA Sequence Reveals a Monomorphic Motif.

    PubMed

    Kaplan, D T; Thomas, W K; Frisse, L M; Sarah, J L; Stanton, J M; Speijer, P R; Marin, D H; Opperman, C H

    2000-06-01

    The nucleic acid sequences of rDNA ITS1 and the rDNA D2/D3 expansion segment were compared for 57 burrowing nematode isolates collected from Australia, Cameroon, Central America, Cuba, Dominican Republic, Florida, Guadeloupe, Hawaii, Nigeria, Honduras, Indonesia, Ivory Coast, Puerto Rico, South Africa, and Uganda. Of the 57 isolates, 55 were morphologically similar to Radopholus similis and seven were citrus-parasitic. The nucleic acid sequences for PCR-amplified ITS1 and for the D2/D3 expansion segment of the 28S rDNA gene were each identical for all putative R. similis. Sequence divergence for both the ITS1 and the D2/D3 was concordant with morphological differences that distinguish R. similis from other burrowing nematode species. This result substantiates previous observations that the R. similis genome is highly conserved across geographic regions. Autapomorphies that would delimit phylogenetic lineages of non-citrus-parasitic R. similis from those that parasitize citrus were not observed. The data presented herein support the concept that R. similis is comprised of two pathotypes: one that parasitizes citrus and one that does not.

  2. Gibbs motif sampling: detection of bacterial outer membrane protein repeats.

    PubMed Central

    Neuwald, A. F.; Liu, J. S.; Lawrence, C. E.

    1995-01-01

    The detection and alignment of locally conserved regions (motifs) in multiple sequences can provide insight into protein structure, function, and evolution. A new Gibbs sampling algorithm is described that detects motif-encoding regions in sequences and optimally partitions them into distinct motif models; this is illustrated using a set of immunoglobulin fold proteins. When applied to sequences sharing a single motif, the sampler can be used to classify motif regions into related submodels, as is illustrated using helix-turn-helix DNA-binding proteins. Other statistically based procedures are described for searching a database for sequences matching motifs found by the sampler. When applied to a set of 32 very distantly related bacterial integral outer membrane proteins, the sampler revealed that they share a subtle, repetitive motif. Although BLAST (Altschul SF et al., 1990, J Mol Biol 215:403-410) fails to detect significant pairwise similarity between any of the sequences, the repeats present in these outer membrane proteins, taken as a whole, are highly significant (based on a generally applicable statistical test for motifs described here). Analysis of bacterial porins with known trimeric beta-barrel structure and related proteins reveals a similar repetitive motif corresponding to alternating membrane-spanning beta-strands. These beta-strands occur on the membrane interface (as opposed to the trimeric interface) of the beta-barrel. The broad conservation and structural location of these repeats suggests that they play important functional roles. PMID:8520488
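
    To make the sampling idea concrete, here is a minimal Python sketch of a basic Gibbs site sampler with one motif occurrence per sequence and a uniform background. It is not the algorithm described in the abstract, which additionally partitions motif regions into distinct models and provides a significance test; the function name, parameters, and toy sequences are all assumptions made for illustration.

```python
import random

def gibbs_site_sampler(seqs, width, iters=200, pseudo=1.0,
                       alphabet="ACGT", seed=0):
    """Sample one motif start position per sequence.

    Assumes every sequence is at least `width` long and uses only
    characters from `alphabet` (other characters raise KeyError).
    """
    rng = random.Random(seed)
    # Start from one randomly placed site per sequence.
    pos = [rng.randrange(len(s) - width + 1) for s in seqs]
    # Column total is the same for every column: held-in sites + pseudocounts.
    total = len(seqs) - 1 + pseudo * len(alphabet)
    for _ in range(iters):
        for i, s in enumerate(seqs):
            # Column counts from the current sites of all other sequences.
            counts = [{a: pseudo for a in alphabet} for _ in range(width)]
            for j, t in enumerate(seqs):
                if j != i:
                    for k in range(width):
                        counts[k][t[pos[j] + k]] += 1
            # Weight each candidate start in the held-out sequence by its
            # probability under the current model (uniform background).
            weights = []
            for start in range(len(s) - width + 1):
                w = 1.0
                for k in range(width):
                    w *= counts[k][s[start + k]] / total
                weights.append(w)
            # Resample the held-out site in proportion to those weights.
            pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos

# Toy sequences with a shared ACGT stretch (hypothetical data).
toy = ["TTTTACGTTT", "GGACGTGGGG", "CCCCCACGTC"]
print(gibbs_site_sampler(toy, width=4))
```

    Because the procedure is stochastic, the returned positions depend on the seed and the number of iterations; in practice several restarts are usually compared rather than trusting a single run.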

  3. On the Concept of Cis-regulatory Information: From Sequence Motifs to Logic Functions

    NASA Astrophysics Data System (ADS)

    Tarpine, Ryan; Istrail, Sorin

    The regulatory genome is about the “system level organization of the core genomic regulatory apparatus, and how this is the locus of causality underlying the twin phenomena of animal development and animal evolution” (E.H. Davidson. The Regulatory Genome: Gene Regulatory Networks in Development and Evolution, Academic Press, 2006). Information processing in the regulatory genome is done through regulatory states, defined as sets of transcription factors (sequence-specific DNA binding proteins which determine gene expression) that are expressed and active at the same time. The core information processing machinery consists of modular DNA sequence elements, called cis-modules, that interact with transcription factors. The cis-modules “read” the information contained in the regulatory state of the cell through transcription factor binding, “process” it, and directly or indirectly communicate with the basal transcription apparatus to determine gene expression. This endowment of each gene with the information-receiving capacity through their cis-regulatory modules is essential for the response to every possible regulatory state to which it might be exposed during all phases of the life cycle and in all cell types. We present here a set of challenges addressed by our CYRENE research project aimed at studying the cis-regulatory code of the regulatory genome. The CYRENE Project is devoted to (1) the construction of a database, the cis-Lexicon, containing comprehensive information across species about experimentally validated cis-regulatory modules; and (2) the software development of a next-generation genome browser, the cis-Browser, specialized for the regulatory genome. The presentation is anchored on three main computational challenges: the Gene Naming Problem, the Consensus Sequence Bottleneck Problem, and the Logic Function Inference Problem.

  4. Peptide sequences identified by phage display are immunodominant functional motifs of Pet and Pic serine proteases secreted by Escherichia coli and Shigella flexneri.

    PubMed

    Ulises, Hernández-Chiñas; Tatiana, Gazarian; Karlen, Gazarian; Guillermo, Mendoza-Hernández; Juan, Xicohtencatl-Cortes; Carlos, Eslava

    2009-12-01

    Plasmid-encoded toxin (Pet) and protein involved in colonization (Pic) are serine protease autotransporters of Enterobacteriaceae (SPATEs) secreted by enteroaggregative Escherichia coli (EAEC), which display the GDSGSG sequence or the serine motif. Our research was directed to localize functional sites in both proteins using the phage display method. From 12mer linear and 7mer cysteine-constrained (C7C) libraries displayed on the M13 phage pIII protein we selected different mimotopes using IgG purified from sera of children naturally infected with EAEC producing Pet and Pic proteins, and anti-Pet and anti-Pic IgG purified from rabbits immunized with each one of these proteins. Children's IgG selected a homologous group of sequences forming the consensus sequence motif PQPxK, and the motifs PGxI/LN and CxPDDSSxC were selected by the rabbit anti-Pet and anti-Pic IgGs, respectively. Analysis of the amino terminal region of a panel of SPATEs showed the presence in all of them of sequences matching the PGxI/LN or CxPDDSSxC motifs, and in a three-dimensional model (Modeller 9v2) designed for Pet, both these motifs were found in the globular portion of the protein, close to the protease active site GDSGSG. Antibodies induced in mice by mimotopes carrying the three aforementioned motifs were reactive with Pet, Pic, and with synthetic peptides carrying the immunogenic mimotope sequences TYPGYINHSKA and LLPQPPKLLLP, thus confirming that the peptide moiety of the selected phages induced the antibodies specific for the toxins. The antibodies induced in mice to the PGxI/LN and CxPDDSSxC mimotopes inhibited fodrin proteolysis and macrophage chemotaxis biological activities of Pet. Our results showed that we were able to generate, by a phage display procedure, mimotopes with sequence motifs PGxI/LN and CxPDDSSxC, and to identify them as functional motifs of the Pet, Pic and other SPATEs involved in their biological activities.

  5. High-throughput sequencing enhanced phage display enables the identification of patient-specific epitope motifs in serum

    PubMed Central

    Christiansen, Anders; Kringelum, Jens V.; Hansen, Christian S.; Bøgh, Katrine L.; Sullivan, Eric; Patel, Jigar; Rigby, Neil M.; Eiwegger, Thomas; Szépfalusi, Zsolt; Masi, Federico de; Nielsen, Morten; Lund, Ole; Dufva, Martin

    2015-01-01

    Phage display is a prominent screening technique with a multitude of applications including therapeutic antibody development and mapping of antigen epitopes. In this study, phages were selected based on their interaction with patient serum and exhaustively characterised by high-throughput sequencing. A bioinformatics approach was developed in order to identify peptide motifs of interest based on clustering and contrasting to control samples. Comparison of patient and control samples confirmed a major issue in phage display, namely the selection of unspecific peptides. The potential of the bioinformatic approach was demonstrated by identifying epitopes of a prominent peanut allergen, Ara h 1, in sera from patients with severe peanut allergy. The identified epitopes were confirmed by high-density peptide micro-arrays. The present study demonstrates that high-throughput sequencing can empower phage display by (i) enabling the analysis of complex biological samples, (ii) circumventing the traditional laborious picking and functional testing of individual phage clones and (iii) reducing the number of selection rounds. PMID:26246327

  6. CpG island erosion, polycomb occupancy and sequence motif enrichment at bivalent promoters in mammalian embryonic stem cells.

    PubMed

    Mantsoki, Anna; Devailly, Guillaume; Joshi, Anagha

    2015-11-19

    In embryonic stem (ES) cells, developmental regulators have a characteristic bivalent chromatin signature marked by simultaneous presence of both activation (H3K4me3) and repression (H3K27me3) signals and are thought to be in a 'poised' state for subsequent activation or silencing during differentiation. We collected eleven pairs (H3K4me3 and H3K27me3) of ChIP sequencing datasets in human ES cells and eight pairs in murine ES cells, and predicted high-confidence (HC) bivalent promoters. Over 85% of H3K27me3 marked promoters were bivalent in human and mouse ES cells. We found that (i) HC bivalent promoters were enriched for developmental factors and were highly likely to be differentially expressed upon transcription factor perturbation; (ii) murine HC bivalent promoters were occupied by both polycomb repressive component classes (PRC1 and PRC2) and grouped into four distinct clusters with different biological functions; (iii) HC bivalent and active promoters were CpG rich while H3K27me3-only promoters lacked CpG islands. Binding enrichment of distinct sets of regulators distinguished bivalent from active promoters. Moreover, a 'TCCCC' sequence motif was specifically enriched in bivalent promoters. Finally, this analysis will serve as a resource for future studies to further understand transcriptional regulation during embryonic development.

  7. CpG island erosion, polycomb occupancy and sequence motif enrichment at bivalent promoters in mammalian embryonic stem cells

    PubMed Central

    Mantsoki, Anna; Devailly, Guillaume; Joshi, Anagha

    2015-01-01

    In embryonic stem (ES) cells, developmental regulators have a characteristic bivalent chromatin signature marked by simultaneous presence of both activation (H3K4me3) and repression (H3K27me3) signals and are thought to be in a ‘poised’ state for subsequent activation or silencing during differentiation. We collected eleven pairs (H3K4me3 and H3K27me3) of ChIP sequencing datasets in human ES cells and eight pairs in murine ES cells, and predicted high-confidence (HC) bivalent promoters. Over 85% of H3K27me3 marked promoters were bivalent in human and mouse ES cells. We found that (i) HC bivalent promoters were enriched for developmental factors and were highly likely to be differentially expressed upon transcription factor perturbation; (ii) murine HC bivalent promoters were occupied by both polycomb repressive component classes (PRC1 and PRC2) and grouped into four distinct clusters with different biological functions; (iii) HC bivalent and active promoters were CpG rich while H3K27me3-only promoters lacked CpG islands. Binding enrichment of distinct sets of regulators distinguished bivalent from active promoters. Moreover, a ‘TCCCC’ sequence motif was specifically enriched in bivalent promoters. Finally, this analysis will serve as a resource for future studies to further understand transcriptional regulation during embryonic development. PMID:26582124
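
    The abstract does not state how enrichment of the TCCCC motif was measured. As one simple possibility, the Python sketch below compares the fraction of sequences containing the motif on either strand in a foreground set against a background set. The function names and the toy sequences are hypothetical, and a real analysis would add a significance test.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def fraction_with_motif(seqs, motif="TCCCC"):
    """Fraction of sequences containing the motif on either strand."""
    if not seqs:
        raise ValueError("need at least one sequence")
    rc = reverse_complement(motif)
    hits = sum(1 for s in seqs if motif in s.upper() or rc in s.upper())
    return hits / len(seqs)

def enrichment_ratio(foreground, background, motif="TCCCC"):
    """Ratio of motif-containing fractions, foreground over background."""
    bg = fraction_with_motif(background, motif)
    if bg == 0:
        raise ValueError("motif absent from background; ratio undefined")
    return fraction_with_motif(foreground, motif) / bg

# Toy promoter sets (hypothetical data, not from the study).
bivalent = ["AATCCCCA", "GGGGAT", "ACGT"]
active = ["ACGTACGT", "TTCCCCTT", "AAAA", "CGCG"]
print(enrichment_ratio(bivalent, active))
```

    Counting each sequence at most once keeps long promoters from dominating the ratio; counting occurrences per kilobase would be an equally reasonable choice under different assumptions.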

  8. Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast.

    PubMed

    Tsai, Zing Tsung-Yeh; Shiu, Shin-Han; Tsai, Huai-Kuang

    2015-08-01

    Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding compared to SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a "TF-agnostic" predictive model based on three DNA "intrinsic properties" (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model allows prediction of binding regions across DNA-binding domain families, it is TF agnostic and likely describes general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF agnostic model for identifying functional regulatory regions in potentially any sequenced genome.

  9. Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast

    PubMed Central

    Tsai, Zing Tsung-Yeh; Shiu, Shin-Han; Tsai, Huai-Kuang

    2015-01-01

    Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as a model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding compared to SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a “TF-agnostic” predictive model based on three DNA “intrinsic properties” (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model allows prediction of binding regions across DNA-binding domain families, it is TF-agnostic and likely describes the general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF-agnostic model for identifying functional regulatory regions in potentially any sequenced genome. PMID:26291518

  10. Flow Cytometry-assisted Cloning of Specific Sequence Motifs from Complex 16S ribosomal RNA Gene Libraries.

    SciTech Connect

    Nielsen, J.L.; Schramm, A.; Bernhard, A.E.; van den Engh, G.J.; Stahl, D.A.

    2004-07-21

    A flow cytometry method was developed for rapid screening and recovery of cloned DNA containing common sequence motifs. This approach, termed fluorescence-activated cell sorting-assisted cloning, was used to recover sequences affiliated with a unique lineage within the Bacteroidetes not abundant in a clone library of environmental 16S rRNA genes. Retrieval and sequence analysis of phylogenetically informative genes has become a standard cultivation-independent technique to investigate microbial diversity in nature (7, 18). Genes encoding the 16S rRNA, because of the relative ease of their selective amplification, have been most frequently employed for general diversity surveys (16). Environmental studies have also focused on specific subpopulations affiliated with a phylogenetic group or identified by genes encoding specific metabolic functions (e.g., ammonia oxidation, sulfate respiration, and nitrate reduction) (8, 15, 20). However, specific populations may be of low abundance (1, 23), or the genes encoding specific metabolic functions may be insufficiently conserved to provide priming sites for general PCR amplification. Three general approaches have been used to obtain 16S rRNA sequence information from low-abundance populations: screening hundreds to thousands of clones in a general 16S rRNA gene library (21), flow cytometric sorting of a subpopulation of environmentally derived cells labeled by fluorescent in situ hybridization (FISH) (27), or selective PCR amplification using primers specific for the subpopulation (2, 23). While the first approach is simply time-consuming and tedious, the second has been restricted to fairly large and strongly fluorescent cells from aquatic samples (5, 27). The third approach often generates fragments of only a few hundred bases due to the limited number of specific priming sites. Partial sequence information often degrades analysis, obscuring or distorting the phylogenetic placement of the new sequences (11, 20). A more robust characterization of environ

  11. Localization and trafficking of an isoform of the AtPRA1 family to the Golgi apparatus depend on both N- and C-terminal sequence motifs.

    PubMed

    Jung, Chan Jin; Lee, Myoung Hui; Min, Myung Ki; Hwang, Inhwan

    2011-02-01

    Prenylated Rab acceptors (PRAs) bind to prenylated Rab proteins and possibly aid in targeting Rabs to their respective compartments. In Arabidopsis, 19 isoforms of PRA1 have been identified and, depending upon the isoforms, they localize to the endoplasmic reticulum (ER), Golgi apparatus and endosomes. Here, we investigated the localization and trafficking of AtPRA1.B6, an isoform of the Arabidopsis PRA1 family. In colocalization experiments with various organellar markers, AtPRA1.B6 tagged with hemagglutinin (HA) at the N-terminus localized to the Golgi apparatus in protoplasts and transgenic plants. The valine residue at the C-terminal end and an EEE motif in the C-terminal cytoplasmic domain were critical for anterograde trafficking from the ER to the Golgi apparatus. The N-terminal region contained a sequence motif for retention of AtPRA1.B6 at the Golgi apparatus. In addition, anterograde trafficking of AtPRA1.B6 from the ER to the Golgi apparatus was highly sensitive to the HA:AtPRA1.B6 level. The region that contains the sequence motif for Golgi retention also conferred the abundance-dependent trafficking inhibition. On the basis of these results, we propose that AtPRA1.B6 localizes to the Golgi apparatus and its ER-to-Golgi trafficking and localization to the Golgi apparatus are regulated by multiple sequence motifs in both the C- and N-terminal cytoplasmic domains.

  12. Protospacer recognition motifs

    PubMed Central

    Shah, Shiraz A.; Erdmann, Susanne; Mojica, Francisco J.M.; Garrett, Roger A.

    2013-01-01

    Protospacer adjacent motifs (PAMs) were originally characterized for CRISPR-Cas systems that were classified on the basis of their CRISPR repeat sequences. A few short 2–5 bp sequences were identified adjacent to one end of the protospacers. Experimental and bioinformatical results linked the motif to the excision of protospacers and their insertion into CRISPR loci. Subsequently, evidence accumulated from different virus- and plasmid-targeting assays, suggesting that these motifs were also recognized during DNA interference, at least for the recently classified type I and type II CRISPR-based systems. The two processes, spacer acquisition and protospacer interference, employ different molecular mechanisms, and there is increasing evidence to suggest that the sequence motifs that are recognized, while overlapping, are unlikely to be identical. In this article, we consider the properties of PAM sequences and summarize the evidence for their dual functional roles. It is proposed to use the term protospacer associated motif (PAM) for the conserved DNA sequence and to employ spacer acquisition motif (SAM) and target interference motif (TIM), respectively, for acquisition and interference recognition sites. PMID:23403393

  13. Bases of motifs for generating repeated patterns with wild cards.

    PubMed

    Pisanti, Nadia; Crochemore, Maxime; Grossi, Roberto; Sagot, Marie-France

    2005-01-01

    Motif inference represents one of the most important areas of research in computational biology, and one of its oldest ones. Despite this, the problem remains very much open in the sense that no existing definition is fully satisfying, either in formal terms, or in relation to the biological questions that involve finding such motifs. Two main types of motifs have been considered in the literature: matrices (of letter frequency per position in the motif) and patterns. There is no conclusive evidence in favor of either, and recent work has attempted to integrate the two types into a single model. In this paper, we address the formal issue in relation to motifs as patterns. This is essential to get at a better understanding of motifs in general. In particular, we consider a promising idea that was recently proposed, which attempted to avoid the combinatorial explosion in the number of motifs by means of a generator set for the motifs. Instead of exhibiting a complete list of motifs satisfying some input constraints, what is produced is a basis of such motifs from which all the other ones can be generated. We study the computational cost of determining such a basis of repeated motifs with wild cards in a sequence. We give new upper and lower bounds on such a cost, introducing a notion of basis that is provably contained in (and, thus, smaller than) previously defined ones. Our basis can be computed in less time and space, and is still able to generate the same set of motifs. We also prove that the number of motifs in all bases defined so far grows exponentially with the quorum, that is, with the minimal number of times a motif must appear in a sequence, something unnoticed in previous work. We show that there is no hope to efficiently compute such bases unless the quorum is fixed.
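
    As a rough illustration of the terms used in this abstract, the sketch below counts occurrences of a pattern with wild cards in a sequence and checks them against a quorum. It is not the basis-construction algorithm of the paper; the function names, the "." wild-card symbol, and the example sequence are assumptions made for illustration only.

```python
def occurrences(pattern: str, sequence: str, wildcard: str = ".") -> list[int]:
    """Return every start position where `pattern` matches `sequence`.

    A wildcard position matches any letter; all other positions must match
    exactly. Overlapping occurrences are reported.
    """
    width = len(pattern)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if all(p == wildcard or p == c for p, c in zip(pattern, window)):
            hits.append(start)
    return hits


def satisfies_quorum(pattern: str, sequence: str, quorum: int) -> bool:
    """True if the pattern occurs at least `quorum` times in the sequence."""
    return len(occurrences(pattern, sequence)) >= quorum


# Hypothetical example sequence, chosen only to show the behaviour.
example = "ATCAGCATC"
positions = occurrences("A.C", example)
```

    With this made-up sequence the pattern A.C occurs three times, so it meets a quorum of 3 but not a quorum of 4. Enumerating all such patterns naively is what leads to the combinatorial explosion that a basis is meant to avoid.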

  14. Identification of a novel calcium binding motif based on the detection of sequence insertions in the animal peroxidase domain of bacterial proteins.

    PubMed

    Santamaría-Hernando, Saray; Krell, Tino; Ramos-González, María-Isabel

    2012-01-01

    Proteins of the animal heme peroxidase (ANP) superfamily differ greatly in size since they have either one or two catalytic domains that match profile PS50292. The orf PP_2561 of Pseudomonas putida KT2440 that we have called PepA encodes a two-domain ANP. The alignment of these domains with those of PepA homologues revealed a variable number of insertions with the consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D. This motif has also been detected in the structure of pseudopilin (pdb 3G20), where it was found to be involved in Ca(2+) coordination although a sequence analysis did not reveal the presence of any known calcium binding motifs in this protein. Isothermal titration calorimetry revealed that a peptide containing this consensus motif specifically bound calcium ions with affinities ranging from 33 to 79 µM depending on the pH. Microcalorimetric titrations of the purified N-terminal ANP-like domain of PepA revealed Ca(2+) binding with a K(D) of 12 µM and stoichiometry of 1.25 calcium ions per protein monomer. This domain exhibited peroxidase activity after its reconstitution with heme. These data led to the definition of a novel calcium binding motif that we have termed PERCAL and which was abundantly present in animal peroxidase-like domains of bacterial proteins. Bacterial heme peroxidases thus possess two different types of calcium binding motifs, namely PERCAL and the related hemolysin type calcium binding motif, with the latter being located outside the catalytic domains and in their C-terminal end. A phylogenetic tree of ANP-like catalytic domains of bacterial proteins with PERCAL motifs, including single domain peroxidases, was divided into two major clusters, representing domains with and without PERCAL motif containing insertions. We have verified that the recently reported classification of bacterial heme peroxidases in two families (cd09819 and cd09821) is unrelated to these insertions. Sequences matching PERCAL were detected in all kingdoms of life.
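
    The consensus quoted in this abstract can be read as a simple pattern in which "x" stands for any residue and a bracketed group lists the residues allowed at that position. A minimal sketch of searching a protein sequence for it with a regular expression follows. This is not the authors' search procedure; the helper names and the test peptide are hypothetical and were written only to satisfy the consensus.

```python
import re

# Consensus exactly as given in the abstract.
PERCAL_CONSENSUS = "G-x-D-G-x-x-[GN]-[TN]-x-D-D"


def consensus_to_regex(consensus: str) -> str:
    """Translate a dash-separated consensus into a regular expression string.

    'x' becomes '.', meaning any single residue; bracketed groups such as
    '[GN]' and single letters are already valid regex syntax.
    """
    return "".join("." if element == "x" else element
                   for element in consensus.split("-"))


def find_motif(consensus: str, protein: str) -> list[int]:
    """Return 0-based start positions of all matches, including overlaps."""
    # A lookahead lets the search report overlapping matches as well.
    pattern = re.compile(f"(?=({consensus_to_regex(consensus)}))")
    return [match.start() for match in pattern.finditer(protein)]


# Hypothetical peptide (not a real protein), built to contain one match.
peptide = "MKGADGSLGTPDDKL"
starts = find_motif(PERCAL_CONSENSUS, peptide)
```

    In the made-up peptide the only match is GADGSLGTPDD, starting at index 2. A regular expression of this kind only tests sequence identity at each position, so a hit says nothing about whether a site actually coordinates calcium.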

  15. Integrating motif, DNA accessibility and gene expression data to build regulatory maps in an organism

    PubMed Central

    Blatti, Charles; Kazemian, Majid; Wolfe, Scot; Brodsky, Michael; Sinha, Saurabh

    2015-01-01

    Characterization of cell type specific regulatory networks and elements is a major challenge in genomics, and emerging strategies frequently employ high-throughput genome-wide assays of transcription factor (TF) to DNA binding, histone modifications or chromatin state. However, these experiments remain too difficult/expensive for many laboratories to apply comprehensively to their system of interest. Here, we explore the potential of elucidating regulatory systems in varied cell types using computational techniques that rely on only data of gene expression, low-resolution chromatin accessibility, and TF–DNA binding specificities (‘motifs’). We show that static computational motif scans overlaid with chromatin accessibility data reasonably approximate experimentally measured TF–DNA binding. We demonstrate that predicted binding profiles and expression patterns of hundreds of TFs are sufficient to identify major regulators of ∼200 spatiotemporal expression domains in the Drosophila embryo. We are then able to learn reliable statistical models of enhancer activity for over 70 expression domains and apply those models to annotate domain specific enhancers genome-wide. Throughout this work, we apply our motif and accessibility based approach to comprehensively characterize the regulatory network of fruitfly embryonic development and show that the accuracy of our computational method compares favorably to approaches that rely on data from many experimental assays. PMID:25791631
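
    The idea of a static motif scan overlaid with accessibility data can be sketched as follows: score every window of a sequence against a position weight matrix, then keep only hits that fall inside accessible regions. This is a minimal sketch under stated assumptions, not the pipeline used in the paper. The count matrix, threshold, uniform background, forward-strand-only scan, and region coordinates are all hypothetical.

```python
import math


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores.

    `counts` is a list of dicts, one per motif position, mapping a base to
    its observed count. A uniform background is assumed.
    """
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2((column.get(base, 0) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, score) for forward-strand windows scoring >= threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = sum(matrix[i][sequence[start + i]] for i in range(width))
        if score >= threshold:
            hits.append((start, score))
    return hits


def accessible_hits(hits, width, open_regions):
    """Keep hits lying entirely inside a half-open [start, end) open region."""
    return [(start, score) for start, score in hits
            if any(lo <= start and start + width <= hi for lo, hi in open_regions)]


# Hypothetical 3-bp motif "TGA" observed 8 times, and a toy sequence.
toy_counts = [{"T": 8}, {"G": 8}, {"A": 8}]
pwm = log_odds_matrix(toy_counts)
all_hits = scan("ACTGACCTGAT", pwm, threshold=4.0)
kept = accessible_hits(all_hits, width=len(pwm), open_regions=[(0, 6)])
```

    In this toy case the scan finds the motif at positions 2 and 7, and only the hit at position 2 survives the accessibility filter because the sole open region covers coordinates 0 to 6. A realistic analysis would also scan the reverse strand and use a background model estimated from the genome.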

  16. Formation and Dissociation of the Interstrand i-Motif by the Sequences d(XnC4Ym) Monitored with Electrospray Ionization Mass Spectrometry

    NASA Astrophysics Data System (ADS)

    Cao, Yanwei; Qin, Yujiao; Bruist, Michael; Gao, Shang; Wang, Bing; Wang, Huixin; Guo, Xinhua

    2015-06-01

    Formation and dissociation of the interstrand i-motifs by DNA with the sequence d(XnC4Ym) (X and Y represent thymine, adenine, or guanine, and n, m range from 0 to 2) are studied with electrospray ionization mass spectrometry (ESI-MS), circular dichroism (CD), and UV spectrophotometry. The ion complexes detected in the gas phase and the melting temperatures (Tm) obtained in solution show that a non-C base residue located at the 5' end favors formation of the four-stranded structures, with T > A > G for imparting stability. Comparatively, no rule is found when a non-C base is located at the 3' end. Detection of penta- and hexa-stranded ions indicates the formation of i-motifs with more than four strands. In addition, the i-motifs seen in our mass spectra are accompanied by single-, double-, and triple-stranded ions, and the trimeric ions were always less abundant during the annealing and heat-induced dissociation processes of the DNA strands in solution (pH = 4.5). This provides direct evidence of a strand-by-strand formation and dissociation pathway of the interstrand i-motif and indicates that formation of the triple strands is the rate-limiting step. In contrast, the trimeric ions are abundant when the tetramolecular ions are subjected to collision-induced dissociation (CID) in the gas phase, suggesting different dissociation behaviors of the interstrand i-motif in the gas phase and in solution. Furthermore, hysteretic UV absorption melting and cooling curves reveal an irreversible dissociation and association kinetic process of the interstrand i-motif in solution.

  17. Modeling and analysis of MH1 domain of Smads and their interaction with promoter DNA sequence motif.

    PubMed

    Makkar, Pooja; Metpally, Raghu Prasad R; Sangadala, Sreedhara; Reddy, Boojala Vijay B

    2009-04-01

    The Smads are a group of related intracellular proteins critical for transmitting the signals to the nucleus from the transforming growth factor-beta (TGF-beta) superfamily of proteins at the cell surface. The prototypic members of the Smad family, Mad and Sma, were first described in Drosophila and Caenorhabditis elegans, respectively. Related proteins in Xenopus, humans, mice and rats were subsequently identified, and are now known as Smads. Smad protein family members act downstream in the TGF-beta signaling pathway mediating various biological processes, including cell growth, differentiation, matrix production, apoptosis and development. Smads range from about 400 to 500 amino acids in length and are grouped into the receptor-regulated Smads (R-Smads), the common Smads (Co-Smads) and the inhibitory Smads (I-Smads). There are eight Smads in mammals: Smad1/5/8 (bone morphogenetic protein regulated) and Smad2/3 (TGF-beta/activin regulated) are termed R-Smads, Smad4 is denoted as Co-Smad and Smad6/7 are inhibitory Smads. A typical Smad consists of a conserved N-terminal Mad Homology 1 (MH1) domain and a C-terminal Mad Homology 2 (MH2) domain connected by a proline rich linker. The MH1 domain plays a key role in DNA recognition and also facilitates the binding of Smad4 to the phosphorylated C-terminus of R-Smads to form an activated complex. The MH2 domain exhibits transcriptional activation properties. In order to understand the structural basis of interaction of various Smads with their target proteins and the promoter DNA, we modeled the MH1 domain of the remaining mammalian Smads based on known crystal structures of the Smad3-MH1 domain bound to the GTCT Smad box DNA sequence (1OZJ). We generated a B-DNA structure using average base-pair parameters of Twist, Tilt, Roll and base Slide angles. We then modeled the interaction pose of the MH1 domain of Smad1/5/8 with their corresponding DNA sequence motif GCCG. These models provide the structural basis towards understanding functional

  18. Using Weeder, Pscan, and PscanChIP for the Discovery of Enriched Transcription Factor Binding Site Motifs in Nucleotide Sequences.

    PubMed

    Zambelli, Federico; Pesole, Graziano; Pavesi, Giulio

    2014-09-08

    One of the greatest challenges facing modern molecular biology is understanding the complex mechanisms regulating gene expression. A fundamental step in this process requires the characterization of sequence motifs involved in the regulation of gene expression at transcriptional and post-transcriptional levels. In particular, transcription is modulated by the interaction of transcription factors (TFs) with their corresponding binding sites. Weeder, Pscan, and PscanChIP are software tools freely available for noncommercial users as stand-alone or Web-based applications for the automatic discovery of conserved motifs in a set of DNA sequences likely to be bound by the same TFs. Input for the tools can be promoter sequences from co-expressed or co-regulated genes (for which Weeder and Pscan are suitable), or regions identified through genome-wide ChIP-seq or similar experiments (Weeder and PscanChIP). The motifs are either found by a de novo approach (Weeder) or by using descriptors of the binding specificity of TFs (Pscan and PscanChIP).

  19. Retinoic acid-induced down-regulation of the interleukin-2 promoter via cis-regulatory sequences containing an octamer motif.

    PubMed Central

    Felli, M P; Vacca, A; Meco, D; Screpanti, I; Farina, A R; Maroder, M; Martinotti, S; Petrangeli, E; Frati, L; Gulino, A

    1991-01-01

    Retinoic acid (RA) is known to influence the proliferation and differentiation of a wide variety of transformed and developing cells. We found that RA and the specific RA receptor (RAR) ligand Ch55 inhibited the phorbol ester and calcium ionophore-induced expression of the T-cell growth factor interleukin-2 (IL-2) gene. Expression of transiently transfected chloramphenicol acetyltransferase vectors containing the 5'-flanking region of the IL-2 gene was also inhibited by RA. RA-induced down-regulation of the IL-2 enhancer is mediated by RAR, since overexpression of transfected RARs increased RA sensitivity of the IL-2 promoter. Functional analysis of chloramphenicol acetyltransferase vectors containing either internal deletion mutants of the region from -317 to +47 bp of the IL-2 enhancer or multimerized cis-regulatory elements showed that the RA-responsive element in the IL-2 promoter mapped to sequences containing an octamer motif. RAR also inhibited the transcriptional activity of the octamer motif of the immunoglobulin heavy chain enhancer. In spite of the transcriptional inhibition of the IL-2 octamer motif, RA did not decrease the in vitro DNA-binding capability of octamer-1 protein. These results identify a regulatory pathway within the IL-2 promoter which involves the octamer motif and RAR. PMID:1652063

  20. The stabilized supralinear network: a unifying circuit motif underlying multi-input integration in sensory cortex.

    PubMed

    Rubin, Daniel B; Van Hooser, Stephen D; Miller, Kenneth D

    2015-01-21

    Neurons in sensory cortex integrate multiple influences to parse objects and support perception. Across multiple cortical areas, integration is characterized by two neuronal response properties: (1) surround suppression--modulatory contextual stimuli suppress responses to driving stimuli; and (2) "normalization"--responses to multiple driving stimuli add sublinearly. These depend on input strength: for weak driving stimuli, contextual influences facilitate or more weakly suppress and summation becomes linear or supralinear. Understanding the circuit operations underlying integration is critical to understanding cortical function and disease. We present a simple, general theory. A wealth of integrative properties, including the above, emerge robustly from four cortical circuit properties: (1) supralinear neuronal input/output functions; (2) sufficiently strong recurrent excitation; (3) feedback inhibition; and (4) simple spatial properties of intracortical connections. Integrative properties emerge dynamically as circuit properties, with excitatory and inhibitory neurons showing similar behaviors. In new recordings in visual cortex, we confirm key model predictions.

  1. The stabilized supralinear network: A unifying circuit motif underlying multi-input integration in sensory cortex

    PubMed Central

    Rubin, Daniel B.; Van Hooser, Stephen D.; Miller, Kenneth D.

    2014-01-01

    Summary: Neurons in sensory cortex integrate multiple influences to parse objects and support perception. Across multiple cortical areas, integration is characterized by two neuronal response properties: (1) surround suppression: modulatory contextual stimuli suppress responses to driving stimuli; (2) “normalization”: responses to multiple driving stimuli add sublinearly. These properties depend on input strength: for weak driving stimuli, contextual influences more weakly suppress or facilitate and summation becomes linear or supralinear. Understanding the circuit operations underlying integration is critical to understanding cortical function and disease. We present a simple, general theory. A wealth of integrative properties including the above emerge robustly from four properties of cortical circuitry: (1) supralinear neuronal input/output functions; (2) sufficiently strong recurrent excitation; (3) feedback inhibition; (4) simple spatial properties of intracortical connections. Integrative properties emerge dynamically as circuit properties, with excitatory and inhibitory neurons showing similar behaviors. In new recordings in visual cortex, we confirm key model predictions. PMID:25611511

  2. Nuclear Magnetic Resonance Structural Mapping Reveals Promiscuous Interactions between Clathrin-Box Motif Sequences and the N-Terminal Domain of the Clathrin Heavy Chain

    PubMed Central

    2016-01-01

    The recruitment and organization of clathrin at endocytic sites first to form coated pits and then clathrin-coated vesicles depend on interactions between the clathrin N-terminal domain (TD) and multiple clathrin binding sequences on the cargo adaptor and accessory proteins that are concentrated at such sites. Up to four distinct protein binding sites have been proposed to be present on the clathrin TD, with each site proposed to interact with a distinct clathrin binding motif. However, an understanding of how such interactions contribute to clathrin coat assembly must take into account observations that any three of these four sites on clathrin TD can be mutationally ablated without causing loss of clathrin-mediated endocytosis. To take an unbiased approach to mapping binding sites for clathrin-box motifs on clathrin TD, we used isothermal titration calorimetry (ITC) and nuclear magnetic resonance spectroscopy. Our ITC experiments revealed that a canonical clathrin-box motif peptide from the AP-2 adaptor binds to clathrin TD with a stoichiometry of 3:1. Assignment of 90% of the total visible amide resonances in the TROSY-HSQC spectrum of 13C-, 2H-, and 15N-labeled TD40 allowed us to map these three binding sites by analyzing the chemical shift changes as clathrin-box motif peptides were titrated into clathrin TD. We found that three different clathrin-box motif peptides can each simultaneously bind not only to the previously characterized clathrin-box site but also to the W-box site and the β-arrestin splice loop site on a single TD. The promiscuity of these binding sites can help explain why their mutation does not lead to larger effects on clathrin function and suggests a mechanism by which clathrin may be transferred between different proteins during the course of an endocytic event. PMID:25844500

  3. Stably integrated mouse mammary tumor virus long terminal repeat DNA requires the octamer motifs for basal promoter activity.

    PubMed Central

    Buetti, E

    1994-01-01

    In the mouse mammary tumor virus promoter, a tandem of octamer motifs, recognized by ubiquitous and tissue-restricted Oct transcription factors, is located upstream of the TATA box and next to a binding site for the transcription factor nuclear factor I (NF-I). Their function was investigated with mutant long terminal repeats under different transfection conditions in mouse Ltk- cells and quantitative S1 nuclease mapping of the transcripts. In stable transfectants, which are most representative of the state of proviral DNA with respect to both number of integrated DNA templates and chromatin organization, a long terminal repeat mutant of both octamer sites showed an average 50-fold reduction of the basal transcription level, while the dexamethasone-stimulated level was unaffected. DNase I in vitro footprinting assays with L-cell nuclear protein extracts showed that the mutant DNA was unable to bind octamer factors but had a normal footprint in the NF-I site. I conclude that mouse mammary tumor virus employs the tandem octamer motifs of the viral promoter, recognized by the ubiquitous transcription factor Oct-1, for its basal transcriptional activity and the NF-I binding site, as previously shown, for glucocorticoid-stimulated transcription. A deletion mutant with only one octamer site showed a marked base-level reduction at high copy number but little reduction at low copies of integrated plasmids. The observed transcription levels may depend both on the relative ratio of transcription factors to DNA templates and on the relative affinity of binding sites, as determined by oligonucleotide competition footprinting. PMID:8289800

  4. Crystal structure of interleukin-21 receptor (IL-21R) bound to IL-21 reveals that sugar chain interacting with WSXWS motif is integral part of IL-21R.

    PubMed

    Hamming, Ole J; Kang, Lishan; Svensson, Anders; Karlsen, Jesper L; Rahbek-Nielsen, Henrik; Paludan, Søren R; Hjorth, Siv A; Bondensgaard, Kent; Hartmann, Rune

    2012-03-16

    IL-21 is a class I cytokine that exerts pleiotropic effects on both innate and adaptive immune responses. It signals through a heterodimeric receptor complex consisting of the IL-21 receptor (IL-21R) and the common γ-chain. A hallmark of the class I cytokine receptors is the class I cytokine receptor signature motif (WSXWS). The exact role of this motif has not been determined yet; however, it has been implicated in diverse functions, including ligand binding, receptor internalization, proper folding, and export, as well as signal transduction. Furthermore, the WXXW motif is known to be a consensus sequence for C-mannosylation. Here, we present the crystal structure of IL-21 bound to IL-21R and reveal that the WSXWS motif of IL-21R is C-mannosylated at the first tryptophan. We furthermore demonstrate that a sugar chain bridges the two fibronectin domains that constitute the extracellular domain of IL-21R and anchors at the WSXWS motif through an extensive hydrogen bonding network, including mannosylation. The glycan thus transforms the V-shaped receptor into an A-frame. This finding offers a novel structural explanation of the role of the class I cytokine signature motif.

  5. Crystal Structure of Interleukin-21 Receptor (IL-21R) Bound to IL-21 Reveals That Sugar Chain Interacting with WSXWS Motif Is Integral Part of IL-21R

    PubMed Central

    Hamming, Ole J.; Kang, Lishan; Svensson, Anders; Karlsen, Jesper L.; Rahbek-Nielsen, Henrik; Paludan, Søren R.; Hjorth, Siv A.; Bondensgaard, Kent; Hartmann, Rune

    2012-01-01

    IL-21 is a class I cytokine that exerts pleiotropic effects on both innate and adaptive immune responses. It signals through a heterodimeric receptor complex consisting of the IL-21 receptor (IL-21R) and the common γ-chain. A hallmark of the class I cytokine receptors is the class I cytokine receptor signature motif (WSXWS). The exact role of this motif has not been determined yet; however, it has been implicated in diverse functions, including ligand binding, receptor internalization, proper folding, and export, as well as signal transduction. Furthermore, the WXXW motif is known to be a consensus sequence for C-mannosylation. Here, we present the crystal structure of IL-21 bound to IL-21R and reveal that the WSXWS motif of IL-21R is C-mannosylated at the first tryptophan. We furthermore demonstrate that a sugar chain bridges the two fibronectin domains that constitute the extracellular domain of IL-21R and anchors at the WSXWS motif through an extensive hydrogen bonding network, including mannosylation. The glycan thus transforms the V-shaped receptor into an A-frame. This finding offers a novel structural explanation of the role of the class I cytokine signature motif. PMID:22235133

  6. Sequence and spatiotemporal expression analysis of CLE-motif containing genes from the reniform nematode (Rotylenchulus reniformis Linford & Oliveira)

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The reniform nematode, Rotylenchulus reniformis, is a sedentary semi-endoparasitic species with a host range that encompasses more than 77 plant families. Nematode effector proteins containing plant-ligand motifs similar to CLAVATA3/ESR (CLE) peptides have been identified in the Heterodera, Globode...

  7. Helicobacter pylori CagA: analysis of sequence diversity in relation to phosphorylation motifs and implications for the role of CagA as a virulence factor.

    PubMed

    Evans, D J; Evans, D G

    2001-09-01

    CagA is transported into host target cells and subsequently phosphorylated. Clearly this is a mechanism by which Helicobacter pylori could take control of one or more host cell signal transduction pathways. Presumably the end result of this interaction favors survival of H. pylori, irrespective of eventual damage to the host cell. CagA is noted for its amino acid (AA) sequence diversity, both within and outside the variable region of the molecule. The primary purpose of this review is to examine how variation in the type and number of CagA phosphorylation sites might determine the outcome of infection by different strains of H. pylori. The answer to this question could help to explain the widely disparate results obtained when H. pylori CagA status has been compared to type and severity of disease outcome in different populations, that is, in different countries. Analysis of all available CagA sequences revealed that CagA contains both tyrosine phosphorylation motifs (TPMs) and cyclic-AMP-dependent phosphorylation motifs (CPMs). There are two potential CPMs near the N-terminus of CagA and at least two in the repeat region; these are not all equally well conserved. We also defined a 48-residue AA sequence, which includes the N-terminal TPM at tyrosine (Y)-122, which distinguishes between Eastern (Hong Kong-Taiwan-Japan-Thailand) H. pylori isolates and those from the West (Europe-Africa-the Americas-Australia). All 28 of the Eastern type CagA proteins have a functional N-terminal TPM whereas 11 of 47 (23.4%) of the Western type contain an inactive motif, with threonine (T) replacing the critical aspartic acid (D) residue. Only 13 of 24 (54%) known CagA sequences have an active TPM in the repeat region and only one has two TPMs in this region. The potential TPM near the C-terminus of CagA is not likely to be important since only 3 of 24 (12.5%) sequences were found to be intact. Protein database searches revealed that the AA sequence immediately following the TPM at Y

  8. Large Putative PEST-like Sequence Motif at the Carboxyl Tail of Human Calcium Receptor Directs Lysosomal Degradation and Regulates Cell Surface Receptor Level

    PubMed Central

    Zhuang, Xiaolei; Northup, John K.; Ray, Kausik

    2012-01-01

    A deletion between amino acid residues Ser895 and Val1075 in the carboxyl terminus of the human calcium receptor (hCaR), which causes autosomal dominant hypocalcemia, showed enhanced signaling activity and increased cell surface expression in HEK293 cells (Lienhardt, A., Garabédian, M. G., Bai, M., Sinding, C., Zhang, Z., Lagarde, J. P., Boulesteix, J., Rigaud, M., Brown, E. M., and Kottler, M. L. (2000) J. Clin. Endocrinol. Metab. 85, 1695–1702). To identify the underlying mechanism(s) for these increases, we investigated the effects of carboxyl tail truncation and deletion in hCaR mutants using a combination of biochemical and cell imaging approaches to define motifs that participate in regulating cell surface numbers of this G protein-coupled receptor. Our data indicate a rapid constitutive receptor internalization of the cell surface hCaR, accumulating in early (Rab7 positive) and late endosomal (LAMP1 positive) sorting compartments, before targeting to lysosomes for degradation. Recycling of hCaR back to the cell surface was also evident. Truncation and deletion mapping defined a 51-amino acid sequence between residues 920 and 970 that is required for targeting to lysosomes and degradation but not for internalization or recycling of the receptor. No singular sequence motif was identified; instead, the required sequence elements seem to be distributed throughout this entire interval. This interval includes a high proportion of acidic and hydroxylated amino acid residues, suggesting a similarity to a PEST-like degradation motif (PESTfind score of +10), and several glutamine repeats. The results define a novel large PEST-like sequence that participates in the sorting of internalized hCaR routed to the lysosomal/degradation pathway that regulates cell surface receptor numbers. PMID:22158862

  9. Localization of proteins to the 1,2-propanediol utilization microcompartment by non-native signal sequences is mediated by a common hydrophobic motif.

    PubMed

    Jakobson, Christopher M; Kim, Edward Y; Slininger, Marilyn F; Chien, Alex; Tullman-Ercek, Danielle

    2015-10-02

    Various bacteria localize metabolic pathways to proteinaceous organelles known as bacterial microcompartments (MCPs), enabling the metabolism of carbon sources to enhance survival and pathogenicity in the gut. There is considerable interest in exploiting bacterial MCPs for metabolic engineering applications, but little is known about the interactions between MCP signal sequences and the protein shells of different MCP systems. We found that the N-terminal sequences from the ethanolamine utilization (Eut) and glycyl radical-generating protein MCPs are able to target reporter proteins to the 1,2-propanediol utilization (Pdu) MCP, and that this localization is mediated by a conserved hydrophobic residue motif. Recapitulation of this motif by the addition of a single amino acid conferred targeting function on an N-terminal sequence from the ethanol utilization MCP system that previously did not act as a Pdu signal sequence. Moreover, the Pdu-localized signal sequences competed with native Pdu targeting sequences for encapsulation in the Pdu MCP. Salmonella enterica natively possesses both the Pdu and Eut operons, and our results suggest that Eut proteins might be localized to the Pdu MCP in vivo. We further demonstrate that S. enterica LT2 retained the ability to grow on 1,2-propanediol as the sole carbon source when a Pdu enzyme was replaced with its Eut homolog. Although the relevance of this finding to the native system remains to be explored, we show that the Pdu-localized signal sequences described herein allow control over the ratio of heterologous proteins encapsulated within Pdu MCPs.

  10. Linear array of conserved sequence motifs to discriminate protein subfamilies: study on pyridine nucleotide-disulfide reductases

    PubMed Central

    Avila, César L; Rapisarda, Viviana A; Farías, Ricardo N; De Las Rivas, Javier; Chehín, Rosana

    2007-01-01

    Background: The pyridine nucleotide disulfide reductase (PNDR) is a large and heterogeneous protein family divided into two classes (I and II), which reflect the divergent evolution of its characteristic disulfide redox active site. However, not all the PNDR members fit into these categories and this suggests the need for further studies to achieve a more comprehensive classification of this complex family. Results: A workflow to improve the clusterization of protein families based on the array of linear conserved motifs is designed. The method is applied to the PNDR large family finding two main groups, which correspond to PNDR classes I and II. However, two other separate protein clusters, previously classified as class I in most databases, are outgrouped: the peroxide reductases (NAOX, NAPE) and the type II NADH dehydrogenases (NDH-2). In this way, two novel PNDR classes III and IV for NAOX/NAPE and NDH-2 respectively are proposed. By knowledge-driven biochemical and functional data analyses done on the new class IV, a linear array of motifs putatively related to Cu(II)-reductase activity is detected in a specific subset of NDH-2. Conclusion: The results presented are a novel contribution to the classification of the complex and large PNDR protein family, supporting its reclusterization into four classes. The linear array of motifs detected within the class IV PNDR subfamily could be useful as a signature for a particular subgroup of NDH-2. PMID:17367536

  11. Comparison of loline alkaloid gene clusters across fungal endophytes: predicting the co-regulatory sequence motifs and the evolutionary history.

    PubMed

    Kutil, Brandi L; Greenwald, Charles; Liu, Gang; Spiering, Martin J; Schardl, Christopher L; Wilkinson, Heather H

    2007-10-01

    LOL, a fungal secondary metabolite gene cluster found in Epichloë and Neotyphodium species, is responsible for production of insecticidal loline alkaloids. To analyze the genetic architecture and to predict the evolutionary history of LOL, we compared five clusters from four fungal species (single clusters from Epichloë festucae, Neotyphodium sp. PauTG-1, Neotyphodium coenophialum, and two clusters we previously characterized in Neotyphodium uncinatum). Using PhyloCon to compare putative lol gene promoter regions, we have identified four motifs conserved across the lol genes in all five clusters. Each motif has significant similarity to known fungal transcription factor binding sites in the TRANSFAC database. Conservation of these motifs is further support for the hypothesis that the lol genes are co-regulated. Interestingly, the history of asexual Neotyphodium spp. includes multiple interspecific hybridization events. Comparing clusters from three Neotyphodium species and E. festucae allowed us to determine which Epichloë ancestors are the most likely contributors of LOL in these asexual species. For example, while no present day Epichloë typhina isolates are known to produce lolines, our data support the hypothesis that the E. typhina ancestor(s) of three asexual endophyte species contained a LOL gene cluster. Thus, these data support a model of evolution in which the polymorphism in loline alkaloid production phenotypes among endophyte species is likely due to the loss of the trait over time.

  12. Analysis of sequences involved in IE2 transactivation of a baculovirus immediate-early gene promoter and identification of a new regulatory motif.

    PubMed

    Shippam-Brett, C E; Willis, L G; Theilmann, D A

    2001-05-01

    Opep-2 is a unique baculovirus early gene that has only been identified in the Orgyia pseudotsugata multiple capsid nucleopolyhedrovirus (OpMNPV). Previous analyses have shown this gene is expressed at very early times post-infection (p.i.) but is shut down by 36-48 h p.i. The promoter of opep-2, therefore, represents a class of early genes that is temporally regulated. In this study, a detailed analysis of the opep-2 promoter is performed to analyze the role individual motifs play in early gene expression. A new 13 base pair regulatory element was identified and shown to be essential in controlling high-level expression of this gene. In addition, mutational analysis revealed that GATA and CACGTG motifs, which have been shown to bind cellular factors in Sf9 and Ld652Y cells, played minor roles in influencing opep-2 expression in the absence of other viral factors. The OpMNPV transactivator IE2 causes a significant activation of the opep-2 promoter. Cotransfection of an extensive number of promoter deletions and mutations did not show any sequence specificity for IE2 transactivation. This is the first detailed analysis of the sequence requirements for IE2 transactivation, and these results suggest that IE2 does not bind directly to specific elements in the opep-2 promoter.

  13. Building dictionaries of 1D and 3D motifs by mining the Unaligned 1D sequences of 17 archaeal and bacterial genomes.

    PubMed

    Rigoutsos, I; Gao, Y; Floratos, A; Parida, L

    1999-01-01

    We have used the Teiresias algorithm to carry out unsupervised pattern discovery in a database containing the unaligned ORFs from the 17 publicly available complete archaeal and bacterial genomes and build a 1D dictionary of motifs. These motifs, which we refer to as seqlets, account for and cover 97.88% of this genomic input at the level of amino acid positions. Each of the seqlets in this 1D dictionary was located among the sequences in Release 38.0 of the Protein Data Bank, and the structural fragments corresponding to each seqlet's instances were identified and aligned in three dimensions: those of the seqlets that resulted in RMSD errors below a pre-selected threshold of 2.5 Angstroms were entered in a 3D dictionary of structurally conserved seqlets. These two dictionaries can be thought of as cross-indices that facilitate the tackling of tasks such as automated functional annotation of genomic sequences, local homology identification, local structure characterization, comparative genomics, etc.

  14. Microfabricated bioprocessor for integrated nanoliter-scale Sanger DNA sequencing.

    PubMed

    Blazej, Robert G; Kumaresan, Palani; Mathies, Richard A

    2006-05-09

    An efficient, nanoliter-scale microfabricated bioprocessor integrating all three Sanger sequencing steps, thermal cycling, sample purification, and capillary electrophoresis, has been developed and evaluated. Hybrid glass-polydimethylsiloxane (PDMS) wafer-scale construction is used to combine 250-nl reactors, affinity-capture purification chambers, high-performance capillary electrophoresis channels, and pneumatic valves and pumps onto a single microfabricated device. Lab-on-a-chip-level integration enables complete Sanger sequencing from only 1 fmol of DNA template. Up to 556 continuous bases were sequenced with 99% accuracy, demonstrating read lengths required for de novo sequencing of human and other complex genomes. The performance of this miniaturized DNA sequencer provides a benchmark for predicting the ultimate cost and efficiency limits of Sanger sequencing.

  15. Platelet immunoreceptor tyrosine-based activation motif (ITAM) and hemITAM signaling and vascular integrity in inflammation and development.

    PubMed

    Lee, R H; Bergmeier, W

    2016-04-01

    Platelets are essential for maintaining hemostasis following mechanical injury to the vasculature. Besides this established function, novel roles of platelets are becoming increasingly recognized, which are critical in non-injury settings to maintain vascular barrier integrity. For example, during embryogenesis platelets act to support the proper separation of blood and lymphatic vessels. This role continues beyond birth, where platelets prevent leakage of blood into the lymphatic vessel network. During the course of inflammation, platelets are necessary to prevent local hemorrhage due to neutrophil diapedesis and disruption of endothelial cell-cell junctions. Surprisingly, platelets also work to secure tumor-associated blood vessels, inhibiting excessive vessel permeability and intra-tumor hemorrhaging. Interestingly, many of these novel platelet functions depend on immunoreceptor tyrosine-based activation motif (ITAM) signaling but not on signaling via G protein-coupled receptors, which plays a crucial role in platelet plug formation at sites of mechanical injury. Murine platelets express two ITAM-containing receptors: the Fc receptor γ-chain (FcRγ), which functionally associates with the collagen receptor GPVI, and the C-type lectin-like 2 (CLEC-2) receptor, a hemITAM receptor for the mucin-type glycoprotein podoplanin. Human platelets express an additional ITAM receptor, FcγRIIA. These receptors share common downstream effectors, including Syk, SLP-76 and PLCγ2. Here we will review the recent literature that highlights a critical role for platelet GPVI/FcRγ and CLEC-2 in vascular integrity during development and inflammation in mice and discuss the relevance to human disease.

  16. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion

    PubMed Central

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-01-01

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2logo aims at resolving these issues allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for low number of observations and different logotype representations each capturing different aspects related to amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed). PMID:22638583
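
    The two-sided idea described above (letters for enriched residues above the axis, depleted residues below) can be illustrated with a small Kullback-Leibler style calculation. This is only a sketch under stated assumptions, not Seq2Logo's implementation: it uses a flat pseudo count and a uniform background, applies no sequence weighting, and the peptides and function names are invented for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_frequencies(column, pseudo=1.0):
    """Residue frequencies for one alignment column with a flat pseudo count.

    Assumption: a flat pseudo count is a simplification chosen for this
    sketch. Only the 20 standard residues are counted.
    """
    counts = Counter(column)
    observed = sum(counts[aa] for aa in AMINO_ACIDS)
    total = observed + pseudo * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudo) / total for aa in AMINO_ACIDS}

def kl_letter_heights(peptides, pseudo=1.0):
    """Per-position letter heights p * log2(p / q) against a uniform background q.

    Positive heights mark enrichment, negative heights mark depletion.
    Assumption: peptides are ungapped and of equal length.
    """
    background = 1.0 / len(AMINO_ACIDS)
    heights = []
    for column in zip(*peptides):
        freqs = column_frequencies(column, pseudo)
        heights.append(
            {aa: p * math.log2(p / background) for aa, p in freqs.items()}
        )
    return heights

# Hypothetical aligned peptides, invented for illustration.
toy_peptides = ["ALKDV", "ALRDV", "GLKEV", "ALKDI"]
heights = kl_letter_heights(toy_peptides)

# Position 2 is always L in the toy set, so L is enriched there while a
# residue that never occurs (for example W) gets a negative height.
for aa in ("L", "W"):
    print(aa, round(heights[1][aa], 3))
```

    With so few observations the pseudo count dominates the frequencies, which is the small-sample problem the abstract refers to.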

  17. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion.

    PubMed

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-07-01

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2logo aims at resolving these issues allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for low number of observations and different logotype representations each capturing different aspects related to amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed).

  18. Relation between mRNA expression and sequence information in Desulfovibrio vulgaris: Combinatorial contributions of upstream regulatory motifs and coding sequence features to variations in mRNA abundance

    SciTech Connect

    Wu, Gang; Nie, Lei; Zhang, Weiwen

    2006-05-26

    The context-dependent expression of genes is the core of biological activities, and significant attention has been given to identification of various factors contributing to gene expression at genomic scale. However, so far this type of analysis has focused either on the relation between mRNA expression and non-coding sequence features such as upstream regulatory motifs or on the correlation between mRNA abundance and non-random features in coding sequences (e.g. codon usage and amino acid usage). In this study multiple regression analyses of the mRNA abundance and all sequence information in Desulfovibrio vulgaris were performed, with the goal of investigating how much coding and non-coding sequence features contribute to the variations in mRNA expression, and in what manner they act together...

  19. Integrating Sequence Evolution into Probabilistic Orthology Analysis.

    PubMed

    Ullah, Ikram; Sjöstrand, Joel; Andersson, Peter; Sennblad, Bengt; Lagergren, Jens

    2015-11-01

    Orthology analysis, that is, finding out whether a pair of homologous genes are orthologs - stemming from a speciation - or paralogs - stemming from a gene duplication - is of central importance in computational biology, genome annotation, and phylogenetic inference. In particular, an orthologous relationship makes functional equivalence of the two genes highly likely. A major approach to orthology analysis is to reconcile a gene tree to the corresponding species tree (most commonly performed using the most parsimonious reconciliation, MPR). However, most such phylogenetic orthology methods infer the gene tree without considering the constraints implied by the species tree and, perhaps even more importantly, only allow the gene sequences to influence the orthology analysis through the a priori reconstructed gene tree. We propose a sound, comprehensive Bayesian Markov chain Monte Carlo-based method, DLRSOrthology, to compute orthology probabilities. It efficiently sums over the possible gene trees and jointly takes into account the current gene tree, all possible reconciliations to the species tree, and the, typically strong, signal conveyed by the sequences. We compare our method with PrIME-GEM, a probabilistic orthology approach built on a probabilistic duplication-loss model, and MrBayesMPR, a probabilistic orthology approach that is based on conventional Bayesian inference coupled with MPR. We find that DLRSOrthology outperforms these competing approaches on synthetic data as well as on biological data sets and is robust to incomplete taxon sampling artifacts.

  20. ICAP-1, a Novel β1 Integrin Cytoplasmic Domain–associated Protein, Binds to a Conserved and Functionally Important NPXY Sequence Motif of β1 Integrin

    PubMed Central

    Chang, David D.; Wong, Carol; Smith, Healy; Liu, Jenny

    1997-01-01

    The cytoplasmic domains of integrins are essential for cell adhesion. We report identification of a novel protein, ICAP-1 (integrin cytoplasmic domain–associated protein-1), which binds to the β1 integrin cytoplasmic domain. The interaction between ICAP-1 and β1 integrins is highly specific, as demonstrated by the lack of interaction between ICAP-1 and the cytoplasmic domains of other β integrins, and requires a conserved and functionally important NPXY sequence motif found in the COOH-terminal region of the β1 integrin cytoplasmic domain. Mutational studies reveal that Asn and Tyr of the NPXY motif and a Val residue located NH2-terminal to this motif are critical for the ICAP-1 binding. Two isoforms of ICAP-1, a 200–amino acid protein (ICAP-1α) and a shorter 150–amino acid protein (ICAP-1β), derived from alternatively spliced mRNA, are expressed in most cells. ICAP-1α is a phosphoprotein and the extent of its phosphorylation is regulated by the cell–matrix interaction. First, an enhancement of ICAP-1α phosphorylation is observed when cells were plated on fibronectin-coated but not on nonspecific poly-l-lysine–coated surface. Second, the expression of a constitutively activated RhoA protein that disrupts the cell–matrix interaction results in dephosphorylation of ICAP-1α. The regulation of ICAP-1α phosphorylation by the cell–matrix interaction suggests an important role of ICAP-1 during integrin-dependent cell adhesion. PMID:9281591

  1. Identification of a Novel Sequence Motif Recognized by the Ankyrin Repeat Domain of zDHHC17/13 S-Acyltransferases

    PubMed Central

    Lemonidis, Kimon; Sanchez-Perez, Maria C.; Chamberlain, Luke H.

    2015-01-01

    S-Acylation is a major post-translational modification affecting several cellular processes. It is particularly important for neuronal functions. This modification is catalyzed by a family of transmembrane S-acyltransferases that contain a conserved zinc finger DHHC (zDHHC) domain. Typically, eukaryote genomes encode 7–24 distinct zDHHC enzymes, with two members also harboring an ankyrin repeat (AR) domain at their cytosolic N termini. The AR domain of zDHHC enzymes is predicted to engage in numerous interactions and facilitates both substrate recruitment and S-acylation-independent functions; however, the sequence/structural features recognized by this module remain unknown. The two mammalian AR-containing S-acyltransferases are the Golgi-localized zDHHC17 and zDHHC13, also known as Huntingtin-interacting proteins 14 and 14-like, respectively; they are highly expressed in brain, and their loss in mice leads to neuropathological deficits that are reminiscent of Huntington's disease. Here, we report that zDHHC17 and zDHHC13 recognize, via their AR domain, evolutionarily conserved and closely related sequences of a [VIAP][VIT]XXQP consensus in SNAP25, SNAP23, cysteine string protein, Huntingtin, cytoplasmic linker protein 3, and microtubule-associated protein 6. This novel AR-binding sequence motif is found in regions predicted to be unstructured and is present in a number of zDHHC17 substrates and zDHHC17/13-interacting S-acylated proteins. This is the first study to identify a motif recognized by AR-containing zDHHCs. PMID:26198635
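
    A bracketed consensus such as [VIAP][VIT]XXQP maps almost directly onto a regular expression, with X read as "any residue". The sketch below shows one way to scan a sequence for it; the helper names and the toy peptide are hypothetical and are not taken from the study.

```python
import re

def consensus_to_regex(consensus):
    """Translate a bracketed consensus such as [VIAP][VIT]XXQP into a regex.

    Bracketed groups are kept as character classes, 'X' (any residue)
    becomes '.', and every other letter must match literally.
    """
    parts = []
    i = 0
    while i < len(consensus):
        ch = consensus[i]
        if ch == "[":
            end = consensus.index("]", i)
            parts.append(consensus[i:end + 1])
            i = end + 1
        else:
            parts.append("." if ch.upper() == "X" else re.escape(ch))
            i += 1
    return "".join(parts)

def find_motif(sequence, consensus):
    """Return (1-based start, matched text) for every hit, overlaps included."""
    # The lookahead lets the scan report overlapping occurrences.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]

# Hypothetical peptide, invented for illustration; not a real protein sequence.
toy_peptide = "MKAVIGEQPSTLLPTDDQPA"
hits = find_motif(toy_peptide, "[VIAP][VIT]XXQP")
print(hits)  # [(4, 'VIGEQP'), (14, 'PTDDQP')]
```

    A pattern match of this kind only flags candidate sites; whether a given site is actually bound is an experimental question.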

  2. Small yet effective: the ethylene responsive element binding factor-associated amphiphilic repression (EAR) motif.

    PubMed

    Kagale, Sateesh; Rozwadowski, Kevin

    2010-06-01

    The Ethylene-responsive element binding factor-associated Amphiphilic Repression (EAR) motif is a small yet distinct regulatory motif that is conserved in many plant transcriptional regulator (TR) proteins associated with diverse biological functions. We have previously established a list of high-confidence Arabidopsis EAR repressors, the EAR repressome, comprising 219 TRs belonging to 21 different TR families. This class of proteins and the sequence context of the EAR motif exhibited a high degree of conservation across evolutionarily diverse plant species. Our comprehensive genome-wide analysis enabled refining EAR motifs as comprising either LxLxL or DLNxxP. Comparing the representation of these sequence signatures in TRs to that of other repressor motifs we show that the EAR motif is the one most frequently represented, detected in 10 to 25% of the TRs from diverse plant species. The mechanisms involved in regulation of EAR motif function and the cellular fates of EAR repressors are currently not well understood. Our earlier analysis had implicated amino acid residues flanking the EAR motifs in regulation of their functionality. Here, we present additional evidence supporting possible regulation of EAR motif function by phosphorylation of integral or adjacent Ser and/or Thr residues. Additionally, we discuss potential novel roles of EAR motifs in plant-pathogen interaction and processes other than transcriptional repression.
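
    As a rough illustration of how the two sequence signatures (LxLxL or DLNxxP) could be counted across a set of proteins, the sketch below reports which signature each sequence carries and the fraction of sequences with at least one. The sequences and identifiers are invented, and the simple pattern match ignores the flanking-residue context that the abstract notes can matter.

```python
import re

# The two EAR signatures named in the abstract, with x read as any residue.
EAR_PATTERNS = {
    "LxLxL": re.compile(r"L.L.L"),
    "DLNxxP": re.compile(r"DLN..P"),
}

def ear_motif_types(protein):
    """Return the names of the EAR signatures found in one protein sequence."""
    seq = protein.upper()
    return [name for name, pattern in EAR_PATTERNS.items() if pattern.search(seq)]

def fraction_with_ear(proteins):
    """Fraction of sequences in a {name: sequence} mapping with any EAR signature."""
    if not proteins:
        return 0.0
    carrying = sum(1 for seq in proteins.values() if ear_motif_types(seq))
    return carrying / len(proteins)

# Hypothetical sequences, invented for illustration only.
toy_regulators = {
    "toy_TR1": "MSSLDLNLNPPA",  # carries both signatures
    "toy_TR2": "MAKDLNQAPGG",   # DLNxxP only
    "toy_TR3": "MGGSAATTKKR",   # neither
    "toy_TR4": "MPLELELSS",     # LxLxL only
}

for name, seq in toy_regulators.items():
    print(name, ear_motif_types(seq))
print(fraction_with_ear(toy_regulators))  # 0.75
```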

  3. A common sequence motif determines the Cajal body-specific localization of box H/ACA scaRNAs.

    PubMed

    Richard, Patricia; Darzacq, Xavier; Bertrand, Edouard; Jády, Beáta E; Verheggen, Céline; Kiss, Tamás

    2003-08-15

    Post-transcriptional synthesis of 2'-O-methylated nucleotides and pseudouridines in Sm spliceosomal small nuclear RNAs takes place in the nucleoplasmic Cajal bodies and it is directed by guide RNAs (scaRNAs) that are structurally and functionally indistinguishable from small nucleolar RNAs (snoRNAs) directing rRNA modification in the nucleolus. The scaRNAs are synthesized in the nucleoplasm and specifically targeted to Cajal bodies. Here, mutational analysis of the human U85 box C/D-H/ACA scaRNA, followed by in situ localization, demonstrates that box H/ACA scaRNAs share a common Cajal body-specific localization signal, the CAB box. Two copies of the evolutionarily conserved CAB consensus (UGAG) are located in the terminal loops of the 5' and 3' hairpins of the box H/ACA domains of mammalian, Drosophila and plant scaRNAs. Upon alteration of the CAB boxes, mutant scaRNAs accumulate in the nucleolus. In turn, authentic snoRNAs can be targeted into Cajal bodies by addition of exogenous CAB box motifs. Our results indicate that scaRNAs represent an ancient group of small nuclear RNAs which are localized to Cajal bodies by an evolutionarily conserved mechanism.

  4. Nucleophosmin integrates within the nucleolus via multi-modal interactions with proteins displaying R-rich linear motifs and rRNA.

    PubMed

    Mitrea, Diana M; Cika, Jaclyn A; Guy, Clifford S; Ban, David; Banerjee, Priya R; Stanley, Christopher B; Nourse, Amanda; Deniz, Ashok A; Kriwacki, Richard W

    2016-02-02

    The nucleolus is a membrane-less organelle formed through liquid-liquid phase separation of its components from the surrounding nucleoplasm. Here, we show that nucleophosmin (NPM1) integrates within the nucleolus via a multi-modal mechanism involving multivalent interactions with proteins containing arginine-rich linear motifs (R-motifs) and ribosomal RNA (rRNA). Importantly, these R-motifs are found in canonical nucleolar localization signals. Based on a novel combination of biophysical approaches, we propose a model for the molecular organization within liquid-like droplets formed by the N-terminal domain of NPM1 and R-motif peptides, thus providing insights into the structural organization of the nucleolus. We identify multivalency of acidic tracts and folded nucleic acid binding domains, mediated by N-terminal domain oligomerization, as structural features required for phase separation of NPM1 with other nucleolar components in vitro and for localization within mammalian nucleoli. We propose that one mechanism of nucleolar localization involves phase separation of proteins within the nucleolus.

  5. Nucleophosmin integrates within the nucleolus via multi-modal interactions with proteins displaying R-rich linear motifs and rRNA

    DOE PAGES

    Mitrea, Diana M.; Cika, Jaclyn A.; Guy, Clifford S.; ...

    2016-02-02

    The nucleolus is a membrane-less organelle formed through liquid-liquid phase separation of its components from the surrounding nucleoplasm. Here, we show that nucleophosmin (NPM1) integrates within the nucleolus via a multi-modal mechanism involving multivalent interactions with proteins containing arginine-rich linear motifs (R-motifs) and ribosomal RNA (rRNA). Importantly, these R-motifs are found in canonical nucleolar localization signals. Based on a novel combination of biophysical approaches, we propose a model for the molecular organization within liquid-like droplets formed by the N-terminal domain of NPM1 and R-motif peptides, thus providing insights into the structural organization of the nucleolus. We identify multivalency of acidic tracts and folded nucleic acid binding domains, mediated by N-terminal domain oligomerization, as structural features required for phase separation of NPM1 with other nucleolar components in vitro and for localization within mammalian nucleoli. We propose that one mechanism of nucleolar localization involves phase separation of proteins within the nucleolus.

  6. Nucleophosmin integrates within the nucleolus via multi-modal interactions with proteins displaying R-rich linear motifs and rRNA

    PubMed Central

    Mitrea, Diana M; Cika, Jaclyn A; Guy, Clifford S; Ban, David; Banerjee, Priya R; Stanley, Christopher B; Nourse, Amanda; Deniz, Ashok A; Kriwacki, Richard W

    2016-01-01

    The nucleolus is a membrane-less organelle formed through liquid-liquid phase separation of its components from the surrounding nucleoplasm. Here, we show that nucleophosmin (NPM1) integrates within the nucleolus via a multi-modal mechanism involving multivalent interactions with proteins containing arginine-rich linear motifs (R-motifs) and ribosomal RNA (rRNA). Importantly, these R-motifs are found in canonical nucleolar localization signals. Based on a novel combination of biophysical approaches, we propose a model for the molecular organization within liquid-like droplets formed by the N-terminal domain of NPM1 and R-motif peptides, thus providing insights into the structural organization of the nucleolus. We identify multivalency of acidic tracts and folded nucleic acid binding domains, mediated by N-terminal domain oligomerization, as structural features required for phase separation of NPM1 with other nucleolar components in vitro and for localization within mammalian nucleoli. We propose that one mechanism of nucleolar localization involves phase separation of proteins within the nucleolus. DOI: http://dx.doi.org/10.7554/eLife.13571.001 PMID:26836305

  7. RIKEN Integrated Sequence Analysis (RISA) System—384-Format Sequencing Pipeline with 384 Multicapillary Sequencer

    PubMed Central

    Shibata, Kazuhiro; Itoh, Masayoshi; Aizawa, Katsunori; Nagaoka, Sumiharu; Sasaki, Nobuya; Carninci, Piero; Konno, Hideaki; Akiyama, Junichi; Nishi, Katsuo; Kitsunai, Tokuji; Tashiro, Hideo; Itoh, Mari; Sumi, Noriko; Ishii, Yoshiyuki; Nakamura, Shin; Hazama, Makoto; Nishine, Tsutomu; Harada, Akira; Yamamoto, Rintaro; Matsumoto, Hiroyuki; Sakaguchi, Sumito; Ikegami, Takashi; Kashiwagi, Katsuya; Fujiwake, Syuji; Inoue, Kouji; Togawa, Yoshiyuki; Izawa, Masaki; Ohara, Eiji; Watahiki, Masanori; Yoneda, Yuko; Ishikawa, Tomokazu; Ozawa, Kaori; Tanaka, Takumi; Matsuura, Shuji; Kawai, Jun; Okazaki, Yasushi; Muramatsu, Masami; Inoue, Yorinao; Kira, Akira; Hayashizaki, Yoshihide

    2000-01-01

    The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto sequencer (RISA sequencer), a 384-multicapillary array assembler (CAS), and a 384-multicapillary casting device. The RISA sequencer can simultaneously analyze 384 independent sequencing products. The optical system is a scanning system chosen after careful comparison with an image detection system for the simultaneous detection of the 384-capillary array. This scanning system can be used with any fluorescent-labeled sequencing reaction (chain termination reaction), including transcriptional sequencing based on RNA polymerase, which was originally developed by us, and cycle sequencing based on thermostable DNA polymerase. For long-read sequencing, 380 out of 384 sequences (99.2%) were successfully analyzed and the average read length, with more than 99% accuracy, was 654.4 bp. A single RISA sequencer can analyze 216 kb with >99% accuracy in 2.7 h (90 kb/h). For short-read sequencing to cluster the 3′ end and 5′ end sequencing by reading 350 bp, 384 samples can be analyzed in 1.5 h. We have also developed a RISA inoculator, RISA filtrator and densitometer, RISA plasmid preparator which can handle throughput of 40,000 samples in 17.5 h, and a high-throughput RISA thermal cycler which has four 384-well sites. The combination of these technologies allowed us to construct the RISA system consisting of 16 RISA sequencers, which can process 50,000 DNA samples per day. One haploid genome shotgun sequence of a higher organism, such as human, mouse, rat, domestic animals, and plants, can

  8. Role of repetitive nine-residue sequence motifs in secretion, enzymatic activity, and protein conformation of a family I.3 lipase.

    PubMed

    Kwon, Hyun-Ju; Haruki, Mitsuru; Morikawa, Masaaki; Omori, Kenji; Kanaya, Shigenori

    2002-01-01

    A family I.3 lipase from Pseudomonas sp. MIS38 (PML) contains 12 repeats of a nine-residue sequence motif in the C-terminal region. To elucidate the role of these repetitive sequences, mutant proteins PML5, PML4, PML1, and PML0, in which 7, 8, 11, and all 12 of the repetitive sequences are deleted, and PMLdelta19, in which 19 C-terminal residues are truncated, were constructed. Escherichia coli DH5 cells carrying the Serratia marcescens Lip system permitted the secretion of the wild-type and all of the mutant proteins except for PMLdelta19, although they were partially accumulated in the cells in an insoluble form as well. Both the secretion level and cellular content of the proteins decreased in the order PML > PML5 > PML4 > PML1 > PML0, indicating that repetitive sequences are not required for secretion of PML but are important for its stability in the cells. All the mutant proteins were purified in a refolded form and their biochemical properties were characterized. CD spectra, the Ca2+ contents, and susceptibility to chymotryptic digestion strongly suggested that the five repetitive sequences remaining in PML5 are sufficient to form a beta-roll structure, whereas the four in PML4 are not. PML5 and PMLdelta19 showed both lipase and esterase activities, whereas PML4, PML1, and PML0 were inactive. These results suggest that the enzymatic activity of PML is not seriously affected by a deletion or truncation at the C-terminal region as long as a succession of repetitive sequences can build a beta-roll structure.

  9. Limb body wall complex, amniotic band sequence, or new syndrome caused by mutation in IQ Motif containing K (IQCK)?

    PubMed Central

    Kruszka, Paul; Uwineza, Annette; Mutesa, Leon; Martinez, Ariel F; Abe, Yu; Zackai, Elaine H; Ganetzky, Rebecca; Chung, Brian; Stevenson, Roger E; Adelstein, Robert S; Ma, Xuefei; Mullikin, James C; Hong, Sung-Kook; Muenke, Maximilian

    2015-01-01

    Limb body wall complex (LBWC) and amniotic band sequence (ABS) are multiple congenital anomaly conditions with craniofacial, limb, and ventral wall defects. LBWC and ABS are considered separate entities by some, and a continuum of severity of the same condition by others. The etiology of LBWC/ABS remains unknown and multiple hypotheses have been proposed. One individual with features of LBWC and his unaffected parents were whole exome sequenced and Sanger sequenced as confirmation of the mutation. Functional studies were conducted using morpholino knockdown studies followed by human mRNA rescue experiments. Using whole exome sequencing, a de novo heterozygous mutation was found in the gene IQCK: c.667C>G; p.Q223E and confirmed by Sanger sequencing in an individual with LBWC. Morpholino knockdown of iqck mRNA in the zebrafish showed ventral defects including failure of ventral fin to develop and cardiac edema. Human wild-type IQCK mRNA rescued the zebrafish phenotype, whereas human p.Q223E IQCK mRNA did not, but worsened the phenotype of the morpholino knockdown zebrafish. This study supports a genetic etiology for LBWC/ABS, or potentially a new syndrome. PMID:26436108
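
    The variant notation in this abstract (c.667C>G; p.Q223E) can be checked with simple codon arithmetic. The sketch below is an illustration only: the helper name codon_position is hypothetical, and because the abstract does not say which glutamine codon IQCK uses at residue 223, both standard Gln codons are tried.

```python
# Only the four standard-genetic-code entries needed for this check
# (glutamine: CAA, CAG; glutamate: GAA, GAG).
CODON_TO_AA = {"CAA": "Q", "CAG": "Q", "GAA": "E", "GAG": "E"}


def codon_position(cdna_pos):
    """Map a 1-based coding-DNA position to (codon number, 1-based offset in codon)."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1


codon_number, offset = codon_position(667)
print(codon_number, offset)  # 223 1

# Apply C>G at that offset to each possible glutamine codon and translate.
for ref_codon in ("CAA", "CAG"):
    alt_codon = ref_codon[:offset - 1] + "G" + ref_codon[offset:]
    print(ref_codon, "->", alt_codon, ":",
          CODON_TO_AA[ref_codon], "->", CODON_TO_AA[alt_codon])
```

    Position 667 is the first base of codon 223, and a C>G change at the first base of either glutamine codon yields a glutamate codon, which is consistent with the reported p.Q223E.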

  10. Spectrometric study of the folding process of i-motif-forming DNA sequences upstream of the c-kit transcription initiation site.

    PubMed

    Bucek, Pavel; Gargallo, Raimundo; Kudrev, Andrei

    2010-12-17

    The c-kit oncogene shows a cytosine-rich DNA region upstream of the transcription initiation site which forms an i-motif structure at slightly acidic pH values (Bucek et al. [5]). In the present study, the pH-induced formation of the i-motif-forming sequences 5'-CCC CTC CCT CGC GCC CGC CCG-3' (ckitC1, native), 5'-CCC TTC CCT TGT GCC CGC CCG-3' (ckitC2) and 5'-CCCTT CCC TTTTT CCC T CCC T-3' (ckitC3) was studied by spectroscopic techniques, such as UV molecular absorption and circular dichroism (CD), in tandem with two multivariate data analysis methods, the hard modelling-based matrix method and the soft modelling-based MCR-ALS approach. Use of the hard chemical modelling enabled us to propose the equilibrium model, which describes spectral changes as functions of solution acidity. Additionally, the intrinsic protonation constant, K(in), and the cooperativity parameters, ω(c), and ω(a), were calculated from the fitting procedure of the coupled CD and molecular absorption spectra. In the case of ckitC2 and ckitC3, the hard model correctly reproduced the spectral variations observed experimentally. The results indicated that folding was accompanied by a cooperative process, i.e. the enhancement of protonated structure stability upon protonation. In contrast, unfolding was accompanied by an anticooperative process. Finally, folding of the native sequence, ckitC1, seemed to follow a more complex mechanism.

  11. A Novel Bayesian DNA Motif Comparison Method for Clustering and Retrieval

    PubMed Central

    Margalit, Hanah; Friedman, Nir

    2008-01-01

    Characterizing the DNA-binding specificities of transcription factors is a key problem in computational biology that has been addressed by multiple algorithms. These usually take as input sequences that are putatively bound by the same factor and output one or more DNA motifs. A common practice is to apply several such algorithms simultaneously to improve coverage at the price of redundancy. In interpreting such results, two tasks are crucial: clustering of redundant motifs, and attributing the motifs to transcription factors by retrieval of similar motifs from previously characterized motif libraries. Both tasks inherently involve motif comparison. Here we present a novel method for comparing and merging motifs, based on Bayesian probabilistic principles. This method takes into account both the similarity in positional nucleotide distributions of the two motifs and their dissimilarity to the background distribution. We demonstrate the use of the new comparison method as a basis for motif clustering and retrieval procedures, and compare it to several commonly used alternatives. Our results show that the new method outperforms other available methods in accuracy and sensitivity. We incorporated the resulting motif clustering and retrieval procedures in a large-scale automated pipeline for analyzing DNA motifs. This pipeline integrates the results of various DNA motif discovery algorithms and automatically merges redundant motifs from multiple training sets into a coherent annotated library of motifs. Application of this pipeline to recent genome-wide transcription factor location data in S. cerevisiae successfully identified DNA motifs in a manner that is as good as semi-automated analysis reported in the literature. Moreover, we show how this analysis elucidates the mechanisms of condition-specific preferences of transcription factors. PMID:18463706
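
    The abstract does not give the scoring formula, so the sketch below is not the authors' Bayesian method. It is a simplified stand-in based on Jensen-Shannon divergence that captures the same two ingredients the abstract names: agreement between the positional nucleotide distributions of two motifs, and their shared departure from the background. The function names, the example matrices, and the assumption that the two motifs are already aligned and of equal length are all hypothetical.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js(p, q):
    """Jensen-Shannon divergence in bits (symmetric, between 0 and 1)."""
    mid = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


def column_score(p, q, background):
    """High when the averaged column is far from background and p, q agree."""
    merged = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return js(merged, background) - js(p, q)


def motif_score(motif_a, motif_b, background):
    """Sum of column scores over two pre-aligned, equal-length motifs."""
    return sum(column_score(p, q, background)
               for p, q in zip(motif_a, motif_b))


# Hypothetical two-column motifs over (A, C, G, T), made up for illustration.
background = [0.25, 0.25, 0.25, 0.25]
motif_a = [[0.97, 0.01, 0.01, 0.01], [0.01, 0.97, 0.01, 0.01]]
motif_b = [[0.90, 0.04, 0.03, 0.03], [0.05, 0.85, 0.05, 0.05]]
motif_c = [[0.01, 0.01, 0.01, 0.97], [0.01, 0.01, 0.97, 0.01]]

print(motif_score(motif_a, motif_b, background))  # similar motifs
print(motif_score(motif_a, motif_c, background))  # dissimilar motifs
```

    A score of this kind could serve as the similarity input to a clustering or retrieval step, but any threshold would have to be calibrated on real motif libraries.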

  12. Construction of an integrated database to support genomic sequence analysis

    SciTech Connect

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  13. A survey of motif finding Web tools for detecting binding site motifs in ChIP-Seq data.

    PubMed

    Tran, Ngoc Tam L; Huang, Chun-Hsi

    2014-02-20

    ChIP-Seq (chromatin immunoprecipitation sequencing) offers an advantage for motif finding because ChIP-Seq experiments narrow the search down to binding site locations. Recent motif finding tools facilitate motif detection by providing user-friendly Web interfaces. In this work, we reviewed nine motif finding Web tools that are capable of detecting binding site motifs in ChIP-Seq data. We showed that each motif finding Web tool has its own advantages for detecting motifs that other tools may not discover. We recommend that users apply multiple motif finding Web tools that implement different algorithms to obtain significant motifs, overlapping similar motifs, and non-overlapping motifs. Finally, we provide suggestions for the future development of motif finding Web tools that better assist researchers in finding motifs in ChIP-Seq data.

  14. Rice MEL2, the RNA recognition motif (RRM) protein, binds in vitro to meiosis-expressed genes containing U-rich RNA consensus sequences in the 3'-UTR.

    PubMed

    Miyazaki, Saori; Sato, Yutaka; Asano, Tomoya; Nagamura, Yoshiaki; Nonomura, Ken-Ichi

    2015-10-01

    Post-transcriptional gene regulation by RNA recognition motif (RRM) proteins through binding to cis-elements in the 3'-untranslated region (3'-UTR) is widely used in eukaryotes to complete various biological processes. Rice MEIOSIS ARRESTED AT LEPTOTENE2 (MEL2) is the RRM protein that functions in the properly timed transition to meiosis. The MEL2 RRM preferentially associated with the U-rich RNA consensus, UUAGUU[U/A][U/G][A/U/G]U, in a sequence-dependent manner and in proportion to MEL2 protein amounts in vitro. The consensus sequences were located in the putative looped structures of the RNA ligand. A genome-wide survey revealed that the MEL2-binding consensus tends to appear in the 3'-UTR of rice genes. Of 249 genes that conserved the consensus in their 3'-UTR, 13 genes were spatiotemporally co-expressed with MEL2 in meiotic flowers, and these included several genes with a presumed function in meiosis, such as Replication protein A and OsMADS3. The proteome analysis revealed that the amounts of small ubiquitin-related modifier-like protein and eukaryotic translation initiation factor3-like protein were dramatically altered in mel2 mutant anthers. Taken together with transcriptome and gene ontology results, we propose that the rice MEL2 is involved in the translational regulation of key meiotic genes on 3'-UTRs to achieve the faithful transition of germ cells to meiosis.
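
    The bracketed consensus in this abstract, UUAGUU[U/A][U/G][A/U/G]U, translates directly into a regular expression. The sketch below shows one way to scan a 3'-UTR for it; the function name find_consensus and the example sequence are hypothetical and are not taken from the study.

```python
import re

# UUAGUU[U/A][U/G][A/U/G]U written as a regex. Wrapping it in a lookahead
# lets finditer report hits that overlap one another.
MEL2_CONSENSUS = re.compile(r"(?=(UUAGUU[UA][UG][AUG]U))")


def find_consensus(utr):
    """Return (0-based start, matched 10-mer) for each hit.

    Accepts RNA or DNA letters; T is converted to U before matching.
    """
    rna = utr.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in MEL2_CONSENSUS.finditer(rna)]


# Hypothetical 3'-UTR fragment, made up for illustration.
example_utr = "GCAUUAGUUUGAUCCGUUAGUUAUUUAA"
for start, hit in find_consensus(example_utr):
    print(start, hit)
```

    A genome-wide survey like the one described would apply such a scan to every annotated 3'-UTR and compare hit frequencies with other transcript regions; that comparison is outside this sketch.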

  15. A mechanism of immunoreceptor tyrosine-based activation motif (ITAM)-like sequences in the capsid protein VP2 in viral growth and pathogenesis of Coxsackievirus B3.

    PubMed

    Kim, Dae-Sun; Park, Jung-Hyun; Kim, Joo-Young; Kim, Dokeun; Nam, Jae-Hwan

    2012-04-01

    Coxsackievirus B3 (CVB3) is an RNA virus that mainly causes myocarditis. We have reported previously that immunoreceptor tyrosine-based activation motif (ITAM)-like sequences are contained in the capsid protein VP2 of CVB3. The substitution of two tyrosines for phenylalanines in the ITAM-like region causes attenuation of CVB3, possibly via defective viral assembly. In this study, we found that Syk, a downstream molecule of ITAM, interacts with the wild-type (WT) CVB3 VP0 protein, but not with the mutant CVB3 VP0 (called YYFF), and that an inhibitor of Syk reduced the growth of CVB3. The WT CVB3 activated nuclear factor kappa B (NF-κB), a protein activated by ITAM, and eventually induced the production of interleukin-6 (IL-6), one of the proinflammatory cytokines induced by NF-κB, in macrophages. However, the YYFF form did not. In addition, viral VP2 protein may be dependent on the phosphorylation of an ITAM-like region that affected the activation of NF-κB. Taken together, these results suggest that the ITAM-like sequences in CVB3 VP2 can not only affect viral structure but also act as signals in pathogenesis.

  16. An Integrated Enzyme Kinetics Laboratory Sequence for Undergraduates.

    ERIC Educational Resources Information Center

    Bucholtz, Michael L.

    1988-01-01

    Describes a three-week sequence to take undergraduate students through the study of enzyme kinetics in an integrated manner that reinforces the basic concepts of initial velocity and the effects of varying operational parameters. Discusses laboratory sessions and the use of a microcomputer in instruction. (CW)

  17. Papillomavirus sequences integrate near cellular oncogenes in some cervical carcinomas

    SciTech Connect

    Duerst, M.; Croce, C.M.; Gissmann, L.; Schwarz, E.; Huebner, K.

    1987-02-01

    The chromosomal locations of cellular sequences flanking integrated papillomavirus DNA in four cervical cell lines and a primary cervical carcinoma have been determined. The two human papillomavirus (HPV) 16 flanking sequences derived from the tumor were localized to chromosome regions 20pter→20q13 and 3p25→3qter, regions that also contain the protooncogenes c-src-1 and c-raf-1, respectively. The HPV 16 integration site in the SiHa cervical carcinoma-derived cell line is in chromosome region 13q14→13q32. The HPV 18 integration site in SW756 cervical carcinoma cells is in chromosome 12 but is not closely linked to the Ki-ras2 gene. Finally, in two cervical carcinoma cell lines, HeLa and C4-I, HPV 18 DNA is integrated in chromosome 8, 5' of the c-myc gene. The HeLa HPV 18 integration site is within 40 kilobases 5' of the c-myc gene, inside the HL60 amplification unit surrounding and including the c-myc gene. Additionally, steady-state levels of c-myc mRNA are elevated in HeLa and C4-I cells relative to other cervical carcinoma cell lines. Thus, in at least some genital tumors, cis-activation of cellular oncogenes by HPV may be involved in malignant transformation of cervical cells.

  18. Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics.

    PubMed

    Deutsch, Eric W; Sun, Zhi; Campbell, David S; Binz, Pierre-Alain; Farrah, Terry; Shteynberg, David; Mendoza, Luis; Omenn, Gilbert S; Moritz, Robert L

    2016-11-04

    The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances, a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted gene and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ∼20,000 primary isoforms plus contaminants to a very large database that includes almost all nonredundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study. We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the

  19. Music and language perception: expectations, structural integration, and cognitive sequencing.

    PubMed

    Tillmann, Barbara

    2012-10-01

    Music can be described as sequences of events that are structured in pitch and time. Studying music processing provides insight into how complex event sequences are learned, perceived, and represented by the brain. Given the temporal nature of sound, expectations, structural integration, and cognitive sequencing are central in music perception (i.e., which sounds are most likely to come next and at what moment should they occur?). This paper focuses on similarities in music and language cognition research, showing that music cognition research provides insight into the understanding of not only music processing but also language processing and the processing of other structured stimuli. The hypothesis of shared resources between music and language processing and of domain-general dynamic attention has motivated the development of research to test music as a means to stimulate sensory, cognitive, and motor processes.

  20. A short sequence motif in the 5' leader of the HIV-1 genome modulates extended RNA dimer formation and virus replication.

    PubMed

    van Bel, Nikki; Das, Atze T; Cornelissen, Marion; Abbink, Truus E M; Berkhout, Ben

    2014-12-19

    The 5' leader of the HIV-1 RNA genome encodes signals that control various steps in the replication cycle, including the dimerization initiation signal (DIS) that triggers RNA dimerization. The DIS folds a hairpin structure with a palindromic sequence in the loop that allows RNA dimerization via intermolecular kissing loop (KL) base pairing. The KL dimer can be stabilized by including the DIS stem nucleotides in the intermolecular base pairing, forming an extended dimer (ED). The role of the ED RNA dimer in HIV-1 replication has hardly been addressed because of technical challenges. We analyzed a set of leader mutants with a stabilized DIS hairpin for in vitro RNA dimerization and virus replication in T cells. In agreement with previous observations, DIS hairpin stability modulated KL and ED dimerization. An unexpected previous finding was that mutation of three nucleotides immediately upstream of the DIS hairpin significantly reduced in vitro ED formation. In this study, we tested such mutants in vivo for the importance of the ED in HIV-1 biology. Mutants with a stabilized DIS hairpin replicated less efficiently than WT HIV-1. This defect was most severe when the upstream sequence motif was altered. Virus evolution experiments with the defective mutants yielded fast replicating HIV-1 variants with second site mutations that (partially) restored the WT hairpin stability. Characterization of the mutant and revertant RNA molecules and the corresponding viruses confirmed the correlation between in vitro ED RNA dimer formation and efficient virus replication, thus indicating that the ED structure is important for HIV-1 replication.

  1. Motif-based construction of a functional map for mammalian olfactory receptors.

    PubMed

    Liu, Agatha H; Zhang, Xinmin; Stolovitzky, Gustavo A; Califano, Andrea; Firestein, Stuart J

    2003-05-01

    We applied an automatic and unsupervised system to a nearly complete database of mammalian odor receptor genes. The generated motifs and gene classification were subjected to extensive and systematic downstream analysis to obtain biological insights. Two major results from this analysis were: (1) a map of sequence motifs that may correlate with function and (2) the corresponding receptor classes in which members of each class are likely to share specific functions. We have discovered motifs that have been implicated in structural integrity and posttranslational modification, as well as motifs very likely to be directly involved in ligand binding. We further propose a combinatorial molecular hypothesis, based on unique combinations of the observed motifs, that provides a foundation for understanding the generation of a large number of ligand binding sites.

  2. Selection against spurious promoter motifs correlates with translational efficiency across bacteria

    SciTech Connect

    Froula, Jeffrey L.; Francino, M. Pilar

    2007-05-01

    Because binding of RNAP to misplaced sites could compromise the efficiency of transcription, natural selection for the optimization of gene expression should regulate the distribution of DNA motifs capable of RNAP-binding across the genome. Here we analyze the distribution of the -10 promoter motifs that bind the σ70 subunit of RNAP in 42 bacterial genomes. We show that selection on these motifs operates across the genome, maintaining an over-representation of -10 motifs in regulatory sequences while eliminating them from the nonfunctional and, in most cases, from the protein coding regions. In some genomes, however, -10 sites are over-represented in the coding sequences; these sites could induce pauses effecting regulatory roles throughout the length of a transcriptional unit. For nonfunctional sequences, the extent of motif under-representation varies across genomes in a manner that broadly correlates with the number of tRNA genes, a good indicator of translational speed and growth rate. This suggests that minimizing the time invested in gene transcription is an important selective pressure against spurious binding. However, selection against spurious binding is detectable in the reduced genomes of host-restricted bacteria that grow at slow rates, indicating that components of efficiency other than speed may also be important. Minimizing the number of RNAP molecules per cell required for transcription, and the corresponding energetic expense, may be most relevant in slow growers. These results indicate that genome-level properties affecting the efficiency of transcription and translation can respond in an integrated manner to optimize gene expression. The detection of selection against promoter motifs in nonfunctional regions also implies that no sequence may evolve free of selective constraints, at least in the relatively small and unstructured genomes of bacteria.
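
    Over- and under-representation of a motif is commonly expressed as the ratio of its observed count to the count expected by chance. The sketch below is a minimal illustration under loud assumptions: the abstract does not state which -10 sequences or which null model were used, so the canonical TATAAT hexamer, a base-composition-only null model, and single-strand counting are all stand-ins, and every function name is hypothetical.

```python
from collections import Counter


def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps (one strand only)."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def expected_count(seq, motif):
    """Expected hits if bases were independent with the sequence's own composition."""
    freqs = Counter(seq)
    n = len(seq)
    prob = 1.0
    for base in motif:
        prob *= freqs[base] / n
    return prob * (n - len(motif) + 1)


def representation_ratio(seq, motif="TATAAT"):
    """Observed / expected; below 1 suggests depletion, above 1 enrichment."""
    expected = expected_count(seq, motif)
    if expected == 0:
        return float("nan")
    return count_overlapping(seq, motif) / expected


# Hypothetical toy sequence, made up for illustration.
toy = "TATAAT" * 3 + "GC" * 10
print(count_overlapping(toy, "TATAAT"), representation_ratio(toy))
```

    Comparing this ratio between regulatory, coding, and nonfunctional regions of a genome is the kind of contrast the abstract describes, although a real analysis would need a more careful background model and both strands.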

  3. Graph-based sequence annotation using a data integration approach.

    PubMed

    Pesch, Robert; Lysenko, Artem; Hindle, Matthew; Hassani-Pak, Keywan; Thiele, Ralf; Rawlings, Christopher; Köhler, Jacob; Taubert, Jan

    2008-08-25

    The automated annotation of data from high throughput sequencing and genomics experiments is a significant challenge for bioinformatics. Most current approaches rely on sequential pipelines of gene finding and gene function prediction methods that annotate a gene with information from different reference data sources. Each function prediction method contributes evidence supporting a functional assignment. Such approaches generally ignore the links between the information in the reference datasets. These links, however, are valuable for assessing the plausibility of a function assignment and can be used to evaluate the confidence in a prediction. We are working towards a novel annotation system that uses the network of information supporting the function assignment to enrich the annotation process for use by expert curators and predicting the function of previously unannotated genes. In this paper we describe our success in the first stages of this development. We present the data integration steps that are needed to create the core database of integrated reference databases (UniProt, PFAM, PDB, GO and the pathway database Ara-Cyc) which has been established in the ONDEX data integration system. We also present a comparison between different methods for integration of GO terms as part of the function assignment pipeline and discuss the consequences of this analysis for improving the accuracy of gene function annotation. The methods and algorithms presented in this publication are an integral part of the ONDEX system which is freely available from http://ondex.sf.net/.

  4. Efficient motif search in ranked lists and applications to variable gap motifs.

    PubMed

    Leibovich, Limor; Yakhini, Zohar

    2012-07-01

    Sequence elements at all levels (DNA, RNA and protein) play a central role in mediating molecular recognition and thereby molecular regulation and signaling. Studies that focus on measuring and investigating sequence-based recognition make use of statistical and computational tools, including approaches to searching sequence motifs. State-of-the-art motif searching tools are limited in their coverage and ability to address large motif spaces. We develop and present statistical and algorithmic approaches that take as input ranked lists of sequences and return significant motifs. The efficiency of our approach, based on suffix trees, allows searches over motif spaces that are not covered by existing tools. This includes searching variable gap motifs (two half sites with a flexible length gap in between) and searching long motifs over large alphabets. We used our approach to analyze several high-throughput measurement data sets and report some validation results as well as novel suggested motifs and motif refinements. We suggest a refinement of the known estrogen receptor 1 motif in humans, where we observe gaps other than three nucleotides that also serve as significant recognition sites, as well as a variable length motif related to potential tyrosine phosphorylation.
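
    A variable gap motif, as described here, is two fixed half sites separated by a gap whose length may vary. The sketch below illustrates only the matching step with a regular expression; it does not reproduce the suffix-tree search or the ranked-list statistics of the paper. The half sites AGGTCA and TGACCT are the textbook estrogen response element halves and are used as an assumed example; the function names and the test sequence are hypothetical.

```python
import re


def gapped_motif(half_a, half_b, min_gap, max_gap):
    """Compile a regex for half_a, a gap of min_gap..max_gap bases, then half_b.

    The lookahead allows overlapping hits; the lazy quantifier reports the
    shortest admissible gap for each start position.
    """
    pattern = "(?=({a}([ACGT]{{{lo},{hi}}}?){b}))".format(
        a=re.escape(half_a), b=re.escape(half_b), lo=min_gap, hi=max_gap)
    return re.compile(pattern)


def find_gapped(seq, regex):
    """Return (0-based start, gap length) for each hit in seq."""
    return [(m.start(), len(m.group(2))) for m in regex.finditer(seq.upper())]


# Hypothetical DNA fragment containing one 3-nt gap site and one 1-nt gap site.
ere_like = gapped_motif("AGGTCA", "TGACCT", 0, 5)
example = "CCAGGTCACAGTGACCTGGAGGTCATTGACCTAA"
print(find_gapped(example, ere_like))
```

    Tallying gap lengths across the top of a ranked sequence list, and comparing them with the bottom of the list, is one simple way to see whether gaps other than three nucleotides are enriched.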

  5. Integrative analysis of environmental sequences using MEGAN4

    PubMed Central

    Huson, Daniel H.; Mitra, Suparna; Ruscheweyh, Hans-Joachim; Weber, Nico; Schuster, Stephan C.

    2011-01-01

    A major challenge in the analysis of environmental sequences is data integration. The question is how to analyze different types of data in a unified approach, addressing both the taxonomic and functional aspects. To facilitate such analyses, we have substantially extended MEGAN, a widely used taxonomic analysis program. The new program, MEGAN4, provides an integrated approach to the taxonomic and functional analysis of metagenomic, metatranscriptomic, metaproteomic, and rRNA data. While taxonomic analysis is performed based on the NCBI taxonomy, functional analysis is performed using the SEED classification of subsystems and functional roles or the KEGG classification of pathways and enzymes. A number of examples illustrate how such analyses can be performed, and show that one can also import and compare classification results obtained using others' tools. MEGAN4 is freely available for academic purposes, and installers for all three major operating systems can be downloaded from www-ab.informatik.uni-tuebingen.de/software/megan. PMID:21690186

  6. Mutation of the aspartic acid residues of the GDD sequence motif of poliovirus RNA-dependent RNA polymerase results in enzymes with altered metal ion requirements for activity.

    PubMed Central

    Jablonski, S A; Morrow, C D

    1995-01-01

    The poliovirus RNA-dependent RNA polymerase, 3Dpol, is known to share a region of sequence homology with all RNA polymerases centered at the GDD amino acid motif. The two aspartic acids have been postulated to be involved in the catalytic activity and metal ion coordination of the enzyme. To test this hypothesis, we have utilized oligonucleotide site-directed mutagenesis to generate defined mutations in the aspartic acids of the GDD motif of the 3Dpol gene. The codon for the first aspartate (3D-D-328 [D refers to the single amino acid change, and the number refers to its position in the polymerase]) was changed to that for glutamic acid, histidine, asparagine, or glutamine; the codons for both aspartic acids were simultaneously changed to those for glutamic acids; and the codon for the second aspartic acid (3D-D-329) was changed to that for glutamic acid or asparagine. The mutant enzymes were expressed in Escherichia coli, and the in vitro poly(U) polymerase activity was characterized. All of the mutant 3Dpol enzymes were enzymatically inactive in vitro when tested over a range of Mg2+ concentrations. However, when Mn2+ was substituted for Mg2+ in the in vitro assays, the mutant that substituted the second aspartic acid for asparagine (3D-N-329) was active. To further substantiate this finding, a series of different transition metal ions were substituted for Mg2+ in the poly(U) polymerase assay. The wild-type enzyme was active with all metals except Ca2+, while the 3D-N-329 mutant was active only when FeC6H7O5 was used in the reaction. To determine the effects of the mutations on poliovirus replication, the mutant 3Dpol genes were subcloned into an infectious cDNA of poliovirus. The cDNAs containing the mutant 3Dpol genes did not produce infectious virus when transfected into tissue culture cells under standard conditions. Because of the activity of the 3D-N-329 mutant in the presence of Fe2+ and Mn2+, transfections were also performed in the presence of the

  7. The Pichia pastoris PER6 gene product is a peroxisomal integral membrane protein essential for peroxisome biogenesis and has sequence similarity to the Zellweger syndrome protein PAF-1.

    PubMed Central

    Waterham, H R; de Vries, Y; Russel, K A; Xie, W; Veenhuis, M; Cregg, J M

    1996-01-01

    We report the cloning of PER6, a gene essential for peroxisome biogenesis in the methylotrophic yeast Pichia pastoris. The PER6 sequence predicts that its product Per6p is a 52-kDa polypeptide with the cysteine-rich C3HC4 motif. Per6p has significant overall sequence similarity with the human peroxisome assembly factor PAF-1, a protein that is defective in certain patients suffering from the peroxisomal disorder Zellweger syndrome, and with car1, a protein required for peroxisome biogenesis and caryogamy in the filamentous fungus Podospora anserina. In addition, the C3HC4 motif and two of the three membrane-spanning segments predicted for Per6p align with the C3HC4 motifs and the two membrane-spanning segments predicted for PAF-1 and car1. Like PAF-1, Per6p is a peroxisomal integral membrane protein. In methanol- or oleic acid-induced cells of per6 mutants, morphologically recognizable peroxisomes are absent. Instead, peroxisomal remnants are observed. In addition, peroxisomal matrix proteins are synthesized but located in the cytosol. The similarities between Per6p and PAF-1 in amino acid sequence and biochemical properties, and between mutants defective in their respective genes, suggest that Per6p is the putative yeast homolog of PAF-1. PMID:8628321

  8. DNA motifs determining the accuracy of repeat duplication during CRISPR adaptation in Haloarcula hispanica

    PubMed Central

    Wang, Rui; Li, Ming; Gong, Luyao; Hu, Songnian; Xiang, Hua

    2016-01-01

    Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) acquire new spacers to generate adaptive immunity in prokaryotes. During spacer integration, the leader-preceded repeat is always accurately duplicated, leading to speculations of a repeat-length ruler. Here in Haloarcula hispanica, we demonstrate that the accurate duplication of its 30-bp repeat requires two conserved mid-repeat motifs, AACCC and GTGGG. The AACCC motif was essential and needed to be ∼10 bp downstream from the leader-repeat junction site, where duplication consistently started. Interestingly, repeat duplication terminated sequence-independently and usually with a specific distance from the GTGGG motif, which seemingly served as an anchor site for a molecular ruler. Accordingly, altering the spacing between the two motifs led to an aberrant duplication size (29, 31, 32 or 33 bp). We propose the adaptation complex may recognize these mid-repeat elements to enable measuring the repeat DNA for spacer integration. PMID:27085805
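
    The geometry described in this abstract (a 30-bp repeat, an AACCC motif about 10 bp from the leader-repeat junction, and a GTGGG motif further downstream) can be made concrete by locating both motifs and measuring the distances between them. The sketch below only reports positions; it does not model the adaptation machinery or predict duplication size. The function name motif_layout and the example repeat are illustrative assumptions, not data from the study.

```python
def motif_layout(repeat, first="AACCC", second="GTGGG"):
    """Locate two motifs in a repeat and report their spacing.

    Returns None unless `first` occurs and `second` occurs downstream of it.
    All positions are 0-based offsets from the leader-proximal end.
    """
    i = repeat.find(first)
    if i == -1:
        return None
    j = repeat.find(second, i + len(first))
    if j == -1:
        return None
    return {
        "length": len(repeat),
        "first_start": i,
        "second_start": j,
        "spacing": j - (i + len(first)),         # bases between the motifs
        "tail": len(repeat) - (j + len(second)),  # bases after the second motif
    }


# Illustrative 30-bp repeat, not taken from the abstract.
repeat = "GTTTCAGACGAACCCTTGTGGGATTGAAGC"
print(motif_layout(repeat))

# Inserting one base between the motifs changes the spacing and total length.
altered = repeat[:15] + "A" + repeat[15:]
print(motif_layout(altered))
```

    In the experiments summarized above, changing the spacing between the two motifs led to aberrant duplication sizes, so a layout report like this is a natural first check when designing or interpreting such repeat variants.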

  9. The C-Terminal Sequence and PI motif of the Orchid (Oncidium Gower Ramsey) PISTILLATA (PI) Ortholog Determine its Ability to Bind AP3 Orthologs and Enter the Nucleus to Regulate Downstream Genes Controlling Petal and Stamen Formation.

    PubMed

    Mao, Wan-Ting; Hsu, Hsing-Fun; Hsu, Wei-Han; Li, Jen-Ying; Lee, Yung-I; Yang, Chang-Hsien

    2015-11-01

    This study focused on the investigation of the effects of the PI motif and C-terminus of the Oncidium Gower Ramsey MADS box gene 8 (OMADS8), a PISTILLATA (PI) ortholog, on floral organ formation. 35S::OMADS8 completely rescued and 35S::OMADS8-PI (with the PI motif deleted) partially rescued petal/stamen formation, whereas these deficiencies were not rescued by 35S::OMADS8-C (C-terminal 29 amino acids deleted) in pi-1 mutants. OMADS8 could interact with Arabidopsis APETALA3 (AP3) and enter the nucleus. The nuclear entry efficiency was reduced for OMADS8-PI/AP3 and OMADS8-C/AP3. OMADS8 could also interact with OMADS5/OMADS9 (the Oncidium AP3 ortholog) and enter the nucleus with an efficiency only slightly affected by the deletion of the C-terminal sequence or PI motif. However, the stability of the OMADS8/OMADS5 and OMADS8/OMADS9 complexes was significantly reduced by deletion of the C-terminal sequence or PI motif. Further analysis indicated that the expression of genes downstream of AP3/PI (BNQ1/BNQ2/GNC/At4g30270) was compensated by 35S::OMADS8 and 35S::OMADS8-PI to a level similar to wild-type plants but was not affected by 35S::OMADS8-C in the pi-1 mutants. A similar FRET (fluorescence resonance energy transfer) efficiency was observed for Arabidopsis AGAMOUS (AG) and the Oncidium AG ortholog OMADS4 for OMADS8, OMADS8-PI and OMADS8-C. These results indicated that the OMADS8 PI motif and C-terminus were valuable for the interaction of OMADS8 with the AP3 orthologs to form higher order heterotetrameric complexes that regulated petal/stamen formation in both Oncidium orchids and transgenic Arabidopsis. However, the C-terminal sequence and PI motif were dispensable for the interaction of OMADS8 with the AG orthologs.

  10. Efficient motif search in ranked lists and applications to variable gap motifs

    PubMed Central

    Leibovich, Limor; Yakhini, Zohar

    2012-01-01

    Sequence elements, at all levels—DNA, RNA and protein, play a central role in mediating molecular recognition and thereby molecular regulation and signaling. Studies that focus on measuring and investigating sequence-based recognition make use of statistical and computational tools, including approaches to searching sequence motifs. State-of-the-art motif searching tools are limited in their coverage and ability to address large motif spaces. We develop and present statistical and algorithmic approaches that take as input ranked lists of sequences and return significant motifs. The efficiency of our approach, based on suffix trees, allows searches over motif spaces that are not covered by existing tools. This includes searching variable gap motifs—two half sites with a flexible length gap in between—and searching long motifs over large alphabets. We used our approach to analyze several high-throughput measurement data sets and report some validation results as well as novel suggested motifs and motif refinements. We suggest a refinement of the known estrogen receptor 1 motif in humans, where we observe gaps other than three nucleotides that also serve as significant recognition sites, as well as a variable length motif related to potential tyrosine phosphorylation. PMID:22416066
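
    A variable gap motif, as defined in this abstract, is two half sites separated by a gap of flexible length. The sketch below only shows what matching such a motif means. It does not implement the suffix-tree search or the ranked-list statistics of the paper, and the function name `find_gapped`, the half sites GGTCA and TGACC, and the test sequence are illustrative assumptions.

```python
def find_gapped(seq, left, right, min_gap, max_gap):
    """Return (start, gap) for every occurrence of left + gap + right.

    `start` is the 0-based position of the left half site and `gap` is
    the number of unconstrained positions between the two half sites.
    """
    hits = []
    for start in range(len(seq) - len(left) + 1):
        if not seq.startswith(left, start):
            continue
        for gap in range(min_gap, max_gap + 1):
            if seq.startswith(right, start + len(left) + gap):
                hits.append((start, gap))
    return hits


# Hypothetical sequence: the two half sites are separated by 4 positions.
seq = "TTGGTCAACGATGACCAA"

print(find_gapped(seq, "GGTCA", "TGACC", 2, 5))  # flexible gap
print(find_gapped(seq, "GGTCA", "TGACC", 3, 3))  # fixed gap of three
```

    A fixed-gap search misses the site in this toy sequence, while the flexible-gap search reports it together with the gap length.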

  11. Ovodefensins, an Oviduct-Specific Antimicrobial Gene Family, Have Evolved in Birds and Reptiles to Protect the Egg by Both Sequence and Intra-Six-Cysteine Sequence Motif Spacing.

    PubMed

    Whenham, Natasha; Lu, Tian Chee; Maidin, Maisarah B M; Wilson, Peter W; Bain, Maureen M; Stevenson, M Lynn; Stevens, Mark P; Bedford, Michael R; Dunn, Ian C

    2015-06-01

    Ovodefensins are a novel beta defensin-related family of antimicrobial peptides containing conserved glycine and six cysteine residues. Originally thought to be restricted to the albumen-producing region of the avian oviduct, expression was found in chicken, turkey, duck, and zebra finch in large quantities in many parts of the oviduct, but this varied between species and between gene forms in the same species. Using new search strategies, the ovodefensin family now has 35 members, including reptiles, but no representatives outside birds and reptiles have been found. Analysis of their evolution shows that ovodefensins divide into six groups based on the intra-cysteine amino acid spacing, representing a unique mechanism alongside traditional evolution of sequence. The groups have been used to base a nomenclature for the family. Antimicrobial activity for three ovodefensins from chicken and duck was confirmed against Escherichia coli and a pathogenic E. coli strain as well as a Gram-positive organism, Staphylococcus aureus, for the first time. However, activity varied greatly between peptides, with Gallus gallus OvoDA1 being the most potent, suggesting a link with the different structures. Expression of Gallus gallus OvoDA1 (gallin) in the oviduct was increased by estrogen and progesterone and in the reproductive state. Overall, the results support the hypothesis that ovodefensins evolved to protect the egg, but they are not necessarily restricted to the egg white. Therefore, divergent motif structure and sequence present an interesting area of research for antimicrobial peptide design and understanding protection of the cleidoic egg.

  12. Integrated sequence and immunology filovirus database at Los Alamos

    DOE PAGES

    Yusim, Karina; Yoon, Hyejin; Foley, Brian; ...

    2016-01-01

    The Ebola outbreak of 2013–15 infected more than 28,000 people and claimed more lives than all previous filovirus outbreaks combined. Governmental agencies, clinical teams, and the world scientific community pulled together in a multifaceted response ranging from prevention and disease control, to evaluating vaccines and therapeutics in human trials. We report that as this epidemic is finally coming to a close, refocusing on long-term prevention strategies becomes paramount. Given the very real threat of future filovirus outbreaks, and the inherent uncertainty of the next outbreak virus and geographic location, it is prudent to consider the extent and implications of known natural diversity in advancing vaccines and therapeutic approaches. To facilitate such consideration, we have updated and enhanced the content of the filovirus portion of Los Alamos Hemorrhagic Fever Viruses Database. We have integrated and performed baseline analysis of all family Filoviridae sequences deposited into GenBank, with associated immune response data, and metadata, and we have added new computational tools with web-interfaces to assist users with analysis. Here, we (i) describe the main features of updated database, (ii) provide integrated views and some basic analyses summarizing evolutionary patterns as they relate to geo-temporal data captured in the database and (iii) highlight the most conserved regions in the proteome that may be useful for a T cell vaccine strategy.

  13. Integrated sequence and immunology filovirus database at Los Alamos

    SciTech Connect

    Yusim, Karina; Yoon, Hyejin; Foley, Brian; Feng, Shihai; Macke, Jennifer; Dimitrijevic, Mira; Abfalterer, Werner; Szinger, James; Fischer, Will; Kuiken, Carla; Korber, Bette

    2016-01-01

    The Ebola outbreak of 2013–15 infected more than 28,000 people and claimed more lives than all previous filovirus outbreaks combined. Governmental agencies, clinical teams, and the world scientific community pulled together in a multifaceted response ranging from prevention and disease control, to evaluating vaccines and therapeutics in human trials. We report that as this epidemic is finally coming to a close, refocusing on long-term prevention strategies becomes paramount. Given the very real threat of future filovirus outbreaks, and the inherent uncertainty of the next outbreak virus and geographic location, it is prudent to consider the extent and implications of known natural diversity in advancing vaccines and therapeutic approaches. To facilitate such consideration, we have updated and enhanced the content of the filovirus portion of Los Alamos Hemorrhagic Fever Viruses Database. We have integrated and performed baseline analysis of all family Filoviridae sequences deposited into GenBank, with associated immune response data, and metadata, and we have added new computational tools with web-interfaces to assist users with analysis. Here, we (i) describe the main features of updated database, (ii) provide integrated views and some basic analyses summarizing evolutionary patterns as they relate to geo-temporal data captured in the database and (iii) highlight the most conserved regions in the proteome that may be useful for a T cell vaccine strategy.

  14. Stochastic motif extraction using hidden Markov model

    SciTech Connect

    Fujiwara, Yukiko; Asogawa, Minoru; Konagaya, Akihiko

    1994-12-31

    In this paper, we study the application of an HMM (hidden Markov model) to the problem of representing protein sequences by a stochastic motif. A stochastic protein motif represents the small segments of protein sequences that have a certain function or structure. The stochastic motif, represented by an HMM, has conditional probabilities to deal with the stochastic nature of the motif. This HMM directly reflects the characteristics of the motif, such as a protein periodical structure or grouping. In order to obtain the optimal HMM, we developed the "iterative duplication method" for HMM topology learning. It starts from a small fully-connected network and iterates the network generation and parameter optimization until it achieves sufficient discrimination accuracy. Using this method, we obtained an HMM for a leucine zipper motif. Compared to a symbolic pattern representation, which had an accuracy of 14.8 percent, the HMM achieved 79.3 percent in prediction. Additionally, the method can obtain an HMM for various types of zinc finger motifs, and it might separate the mixed data. We demonstrated that this approach is applicable to the validation of the protein databases; a constructed HMM has indicated that one protein sequence annotated as "leucine-zipper like sequence" in the database is quite different from other leucine-zipper sequences in terms of likelihood, and we found this discrimination is plausible.
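
    The abstract compares sequences "in terms of likelihood" under an HMM. A minimal sketch of how such a likelihood can be computed is the standard scaled forward algorithm, shown below. The two-state model, its probabilities, and the reduced two-letter alphabet ("L" for leucine, "x" for any other residue) are invented for illustration; they are not the topology learned by the iterative duplication method.

```python
import math


def forward_loglik(obs, start, trans, emit):
    """Log-likelihood of `obs` under an HMM, via the scaled forward pass.

    start[s]    : probability of starting in state s
    trans[s][t] : probability of moving from state s to state t
    emit[s][o]  : probability that state s emits symbol o
    """
    alpha = {s: start[s] * emit[s][obs[0]] for s in start}
    loglik = 0.0
    for symbol in obs[1:]:
        # Rescale to avoid underflow and accumulate the log of the scale.
        scale = sum(alpha.values())
        loglik += math.log(scale)
        alpha = {s: v / scale for s, v in alpha.items()}
        # Propagate one step and weight by the emission probability.
        alpha = {
            t: sum(alpha[s] * trans[s][t] for s in alpha) * emit[t][symbol]
            for t in start
        }
    return loglik + math.log(sum(alpha.values()))


# Hypothetical toy model: state "a" prefers leucine, state "b" does not.
start = {"a": 0.6, "b": 0.4}
trans = {"a": {"a": 0.1, "b": 0.9}, "b": {"a": 0.2, "b": 0.8}}
emit = {"a": {"L": 0.9, "x": 0.1}, "b": {"L": 0.1, "x": 0.9}}

for seq in ("LxxxxxxL", "LLLLLLLL"):
    print(seq, forward_loglik(seq, start, trans, emit))
```

    Sequences whose log-likelihood is far below that of the other members of a family are the ones such a model would flag for a second look.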

  15. Temporal motifs in time-dependent networks

    NASA Astrophysics Data System (ADS)

    Kovanen, Lauri; Karsai, Márton; Kaski, Kimmo; Kertész, János; Saramäki, Jari

    2011-11-01

    Temporal networks are commonly used to represent systems where connections between elements are active only for restricted periods of time, such as telecommunication, neural signal processing, biochemical reaction and human social interaction networks. We introduce the framework of temporal motifs to study the mesoscale topological-temporal structure of temporal networks in which the events of nodes do not overlap in time. Temporal motifs are classes of similar event sequences, where the similarity refers not only to topology but also to the temporal order of the events. We provide a mapping from event sequences to coloured directed graphs that enables an efficient algorithm for identifying temporal motifs. We discuss some aspects of temporal motifs, including causality and null models, and present basic statistics of temporal motifs in a large mobile call network.

  16. RegulonDB version 9.0: high-level integration of gene regulation, coexpression, motif clustering and beyond.

    PubMed

    Gama-Castro, Socorro; Salgado, Heladia; Santos-Zavaleta, Alberto; Ledezma-Tejeida, Daniela; Muñiz-Rascado, Luis; García-Sotelo, Jair Santiago; Alquicira-Hernández, Kevin; Martínez-Flores, Irma; Pannier, Lucia; Castro-Mondragón, Jaime Abraham; Medina-Rivera, Alejandra; Solano-Lira, Hilda; Bonavides-Martínez, César; Pérez-Rueda, Ernesto; Alquicira-Hernández, Shirley; Porrón-Sotelo, Liliana; López-Fuentes, Alejandra; Hernández-Koutoucheva, Anastasia; Del Moral-Chávez, Víctor; Rinaldi, Fabio; Collado-Vides, Julio

    2016-01-04

    RegulonDB (http://regulondb.ccg.unam.mx) is one of the most useful and important resources on bacterial gene regulation, as it integrates the scattered scientific knowledge of the best-characterized organism, Escherichia coli K-12, in a database that organizes large amounts of data. Its electronic format enables researchers to compare their results with the legacy of previous knowledge and supports bioinformatics tools and model building. Here, we summarize our progress with RegulonDB since our last Nucleic Acids Research publication describing RegulonDB, in 2013. In addition to maintaining curation up-to-date, we report a collection of 232 interactions with small RNAs affecting 192 genes, and the complete repertoire of 189 Elementary Genetic Sensory-Response units (GENSOR units), integrating the signal, regulatory interactions, and metabolic pathways they govern. These additions represent major progress to a higher level of understanding of regulated processes. We have updated the computationally predicted transcription factors, which total 304 (184 with experimental evidence and 120 from computational predictions); we updated our position-weight matrices and have included tools for clustering them in evolutionary families. We describe our semiautomatic strategy to accelerate curation, including datasets from high-throughput experiments, a novel coexpression distance to search for 'neighborhood' genes to known operons and regulons, and computational developments.

  17. RegulonDB version 9.0: high-level integration of gene regulation, coexpression, motif clustering and beyond

    PubMed Central

    Gama-Castro, Socorro; Salgado, Heladia; Santos-Zavaleta, Alberto; Ledezma-Tejeida, Daniela; Muñiz-Rascado, Luis; García-Sotelo, Jair Santiago; Alquicira-Hernández, Kevin; Martínez-Flores, Irma; Pannier, Lucia; Castro-Mondragón, Jaime Abraham; Medina-Rivera, Alejandra; Solano-Lira, Hilda; Bonavides-Martínez, César; Pérez-Rueda, Ernesto; Alquicira-Hernández, Shirley; Porrón-Sotelo, Liliana; López-Fuentes, Alejandra; Hernández-Koutoucheva, Anastasia; Moral-Chávez, Víctor Del; Rinaldi, Fabio; Collado-Vides, Julio

    2016-01-01

    RegulonDB (http://regulondb.ccg.unam.mx) is one of the most useful and important resources on bacterial gene regulation, as it integrates the scattered scientific knowledge of the best-characterized organism, Escherichia coli K-12, in a database that organizes large amounts of data. Its electronic format enables researchers to compare their results with the legacy of previous knowledge and supports bioinformatics tools and model building. Here, we summarize our progress with RegulonDB since our last Nucleic Acids Research publication describing RegulonDB, in 2013. In addition to maintaining curation up-to-date, we report a collection of 232 interactions with small RNAs affecting 192 genes, and the complete repertoire of 189 Elementary Genetic Sensory-Response units (GENSOR units), integrating the signal, regulatory interactions, and metabolic pathways they govern. These additions represent major progress to a higher level of understanding of regulated processes. We have updated the computationally predicted transcription factors, which total 304 (184 with experimental evidence and 120 from computational predictions); we updated our position-weight matrices and have included tools for clustering them in evolutionary families. We describe our semiautomatic strategy to accelerate curation, including datasets from high-throughput experiments, a novel coexpression distance to search for ‘neighborhood’ genes to known operons and regulons, and computational developments. PMID:26527724

  18. Localization of the labile disulfide bond between SU and TM of the murine leukemia virus envelope protein complex to a highly conserved CWLC motif in SU that resembles the active-site sequence of thiol-disulfide exchange enzymes.

    PubMed Central

    Pinter, A; Kopelman, R; Li, Z; Kayman, S C; Sanders, D A

    1997-01-01

    Previous studies have indicated that the surface (SU) and transmembrane (TM) subunits of the envelope protein (Env) of murine leukemia viruses (MuLVs) are joined by a labile disulfide bond that can be stabilized by treatment of virions with thiol-specific reagents. In the present study this observation was extended to the Envs of additional classes of MuLV, and the cysteines of SU involved in this linkage were mapped by proteolytic fragmentation analyses to the CWLC sequence present at the beginning of the C-terminal domain of SU. This sequence is highly conserved across a broad range of distantly related retroviruses and resembles the CXXC motif present at the active site of thiol-disulfide exchange enzymes. A model is proposed in which rearrangements of the SU-TM intersubunit disulfide linkage, mediated by the CWLC sequence, play roles in the assembly and function of the Env complex. PMID:9311907
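
    The CXXC pattern named in this abstract (two cysteines separated by any two residues, of which CWLC is one instance) is simple enough to scan for with a regular expression. The sketch below is illustrative only: `find_cxxc` and the short peptide strings are hypothetical, not sequences from the paper.

```python
import re

# Lookahead so that overlapping matches such as "CAACBBC" are all reported.
CXXC = re.compile(r"(?=(C[A-Z]{2}C))")


def find_cxxc(protein):
    """Return (0-based position, matched tetrapeptide) for each CXXC site."""
    return [(m.start(), m.group(1)) for m in CXXC.finditer(protein)]


# Hypothetical fragments, one containing CWLC and one containing CGPC.
print(find_cxxc("MKTCWLCAA"))
print(find_cxxc("WCGPCK"))
```

    A pattern match of this kind only locates candidate sites; whether the cysteines actually form a labile disulfide bond is an experimental question.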

  19. RSAT peak-motifs: motif analysis in full-size ChIP-seq datasets.

    PubMed

    Thomas-Chollier, Morgane; Herrmann, Carl; Defrance, Matthieu; Sand, Olivier; Thieffry, Denis; van Helden, Jacques

    2012-02-01

    ChIP-seq is increasingly used to characterize transcription factor binding and chromatin marks at a genomic scale. Various tools are now available to extract binding motifs from peak data sets. However, most approaches are only available as command-line programs, or via a website but with size restrictions. We present peak-motifs, a computational pipeline that discovers motifs in peak sequences, compares them with databases, exports putative binding sites for visualization in the UCSC genome browser and generates an extensive report suited for both naive and expert users. It relies on time- and memory-efficient algorithms enabling the treatment of several thousand peaks within minutes. Regarding time efficiency, peak-motifs outperforms all comparable tools by several orders of magnitude. We demonstrate its accuracy by analyzing data sets ranging from 4000 to 128,000 peaks for 12 embryonic stem cell-specific transcription factors. In all cases, the program finds the expected motifs and returns additional motifs potentially bound by cofactors. We further apply peak-motifs to discover tissue-specific motifs in peak collections for the p300 transcriptional co-activator. To our knowledge, peak-motifs is the only tool that performs a complete motif analysis and offers a user-friendly web interface without any restriction on sequence size or number of peaks.

  20. Integrated and Independent Learning of Hand-Related Constituent Sequences

    ERIC Educational Resources Information Center

    Berner, Michael P.; Hoffmann, Joachim

    2009-01-01

    In almost all daily activities fingers of both hands are used in coordinated succession. The present experiments explored whether learning in such tasks pertains not only to the overall sequence spanning both hands but also to the constituent sequences of each hand. In a serial reaction time task, 2 repeating hand-related sequences were…

  1. Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas.

    PubMed

    Petrov, Anton I; Zirbel, Craig L; Leontis, Neocles B

    2013-10-01

    The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson-Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules, but can also occur at nonhomologous sites in diverse RNAs, and which often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access.

  2. Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas

    PubMed Central

    Petrov, Anton I.; Zirbel, Craig L.; Leontis, Neocles B.

    2013-01-01

    The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson–Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules, but can also occur at nonhomologous sites in diverse RNAs, and which often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access. PMID:23970545

  3. MAR characteristic motifs mediate episomal vector in CHO cells.

    PubMed

    Lin, Yan; Li, Zhaoxi; Wang, Tianyun; Wang, Xiaoyin; Wang, Li; Dong, Weihua; Jing, Changqin; Yang, Xianjun

    2015-04-01

    An ideal gene therapy vector should enable persistent transgene expression without limitations in safety and reproducibility. Recent insight into the ability of chromosomal matrix attachment regions (MARs) to mediate episomal maintenance of genetic elements has allowed the development of a circular episomal vector. Although a MAR-mediated engineered vector has been developed, little is known about which motifs of the MAR confer this function during interaction with the host genome. Here, we report an artificially synthesized DNA fragment containing only characteristic motif sequences that served as an alternative to the human beta-interferon matrix attachment region sequence. The potential of the vector to mediate gene transfer in CHO cells was investigated. The short synthetic MAR motifs were found to mediate episomal vector maintenance at a low copy number for many generations without integration into the host genome. Higher transgene expression was maintained for at least 4 months. In addition, the MAR was maintained episomally and conferred sustained EGFP expression even in nonselective CHO cells. Together, the results demonstrate that a vector based on MAR characteristic sequences can function as a stable episome in CHO cells, supporting long-term and effective transgene expression.

  4. Identifying DNA Binding Motifs by Combining Data from Different Sources

    SciTech Connect

    Mao, Linyong; Resat, Haluk; Nagib Callaos; Katsuhisa Horimoto; Jake Chen; Amy Sze Chan

    2004-07-19

    A transcription factor regulates the expression of its target genes by binding to their operator regions. It functions by affecting the interactions between RNA polymerases and the gene's promoter. Many transcription factors bind to their targets by recognizing a specific DNA sequence pattern, which is referred to as a consensus sequence or a motif. Since it would remove the possible biases, combining biological data from different sources can be expected to improve the quality of the information extracted from the biological data. We analyzed the microarray gene expression data and the organism's genome sequence jointly to determine the transcription factor recognition sequences with more accuracy. Utilizing such a data integration approach, we have investigated the regulation of the photosynthesis genes of the purple non-sulphur photosynthetic bacterium Rhodobacter sphaeroides. The photosynthesis genes in this organism are tightly regulated as a function of environmental growth conditions by three major regulatory systems, PrrB/PrrA, AppA/PpsR and FnrL. In this study, we have detected a previously undefined PrrA consensus sequence, improved the previously known DNA-binding motif of PpsR, and confirmed the consensus sequence of the global regulator FnrL.

  5. Two overlapping sequence motifs within the polyomavirus enhancer are independently the targets of stimulation by both the tumor promoter 12-O-tetradecanoylphorbol-13-acetate and the Ha-ras oncogene

    SciTech Connect

    Yamaguchi, Yyko; Satake, Masanobu; Ito, Yoshiaki

    1989-03-01

    A tumor-promoting phorbol ester, 12-O-tetradecanoylphorbol-13-acetate (TPA), strongly stimulates the activity of polyomavirus enhancer in a human erythroleukemia cell line, K562. The target of stimulation was the previously defined A element (from nucleotides 5107 to 5130) of the enhancer. The authors found that within the A element, two partly overlapping sequence motifs (one from nucleotides 5107 to 5117, the other from nucleotides 5113 to 5121) were independently the targets of TPA stimulation. The former is homologous to the enhancer core sequence of the adenovirus type 5 E1A gene, and the latter shares the consensus AP-1-binding site. In addition, transiently expressed Ha-ras oncogene also stimulated these two subelements in K562 cells, as they reported for NIH 3T3 cells previously.

  6. Sampling Motif-Constrained Ensembles of Networks

    NASA Astrophysics Data System (ADS)

    Fischer, Rico; Leitão, Jorge C.; Peixoto, Tiago P.; Altmann, Eduardo G.

    2015-10-01

    The statistical significance of network properties is conditioned on null models which satisfy specified properties but that are otherwise random. Exponential random graph models are a principled theoretical framework to generate such constrained ensembles, but which often fail in practice, either due to model inconsistency or due to the impossibility to sample networks from them. These problems affect the important case of networks with prescribed clustering coefficient or number of small connected subgraphs (motifs). In this Letter we use the Wang-Landau method to obtain a multicanonical sampling that overcomes both these problems. We sample, in polynomial time, networks with arbitrary degree sequences from ensembles with imposed motifs counts. Applying this method to social networks, we investigate the relation between transitivity and homophily, and we quantify the correlation between different types of motifs, finding that single motifs can explain up to 60% of the variation of motif profiles.

  7. Fitting a mixture model by expectation maximization to discover motifs in biopolymers

    SciTech Connect

    Bailey, T.L.; Elkan, C.

    1994-12-31

    The algorithm described in this paper discovers one or more motifs in a collection of DNA or protein sequences by using the technique of expectation maximization to fit a two-component finite mixture model to the set of sequences. Multiple motifs are found by fitting a mixture model to the data, probabilistically erasing the occurrences of the motif thus found, and repeating the process to find successive motifs. The algorithm requires only a set of unaligned sequences and a number specifying the width of the motifs as input. It returns a model of each motif and a threshold which together can be used as a Bayes-optimal classifier for searching for occurrences of the motif in other databases. The algorithm estimates how many times each motif occurs in each sequence in the dataset and outputs an alignment of the occurrences of the motif. The algorithm is capable of discovering several different motifs with differing numbers of occurrences in a single dataset.
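
    The core idea of this abstract, fitting a two-component mixture (motif versus background) by expectation maximization, can be sketched as a single EM iteration over all width-w windows. This is a simplified illustration under stated assumptions, not the published algorithm: the background model is held fixed, the erasing step and the classification threshold are omitted, and the names `windows`, `seed_pwm` and `em_step`, the seed weights and the toy sequences are all hypothetical.

```python
ALPHABET = "ACGT"


def windows(seqs, w):
    """All overlapping width-w substrings of the input sequences."""
    return [s[i:i + w] for s in seqs for i in range(len(s) - w + 1)]


def seed_pwm(seed, hit=0.7):
    """Starting motif model: probability `hit` on the seed letter per column."""
    miss = (1.0 - hit) / (len(ALPHABET) - 1)
    return [{c: (hit if c == s else miss) for c in ALPHABET} for s in seed]


def em_step(wins, pwm, bg, lam, pseudo=0.1):
    """One EM iteration; returns (new_pwm, new_lam, z).

    z[i] is the posterior probability that window i was generated by the
    motif component rather than by the background component.
    """
    # E-step: posterior membership of each window.
    z = []
    for x in wins:
        p_motif, p_bg = lam, 1.0 - lam
        for j, c in enumerate(x):
            p_motif *= pwm[j][c]
            p_bg *= bg[c]
        z.append(p_motif / (p_motif + p_bg))
    # M-step: re-estimate each motif column from z-weighted letter counts.
    new_pwm = []
    for j in range(len(pwm)):
        counts = {c: pseudo for c in ALPHABET}
        for x, zi in zip(wins, z):
            counts[x[j]] += zi
        total = sum(counts.values())
        new_pwm.append({c: counts[c] / total for c in ALPHABET})
    # The mixing weight becomes the average membership.
    return new_pwm, sum(z) / len(z), z


# Hypothetical toy data with a shared 6-mer.
seqs = ["GGTATAATCC", "CCTATAATGG", "ACGTATAATG"]
wins = windows(seqs, 6)
bg = {c: 0.25 for c in ALPHABET}
pwm, lam = seed_pwm("TATAAT"), 0.1
for _ in range(5):
    pwm, lam, z = em_step(wins, pwm, bg, lam)
print("".join(max(col, key=col.get) for col in pwm), round(lam, 3))
```

    Seeding the motif model from a substring of the data is one common way to start EM; a full implementation would try many seeds and widths and keep the best-scoring model.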

  8. A survey of motif finding Web tools for detecting binding site motifs in ChIP-Seq data

    PubMed Central

    2014-01-01

    ChIP-Seq (chromatin immunoprecipitation sequencing) has provided an advantage for finding motifs, as ChIP-Seq experiments narrow down the motif finding to binding site locations. Recent motif finding tools facilitate motif detection by providing a user-friendly Web interface. In this work, we reviewed nine motif finding Web tools that are capable of detecting binding site motifs in ChIP-Seq data. We showed that each motif finding Web tool has its own advantages for detecting motifs that other tools may not discover. We recommend that users use multiple motif finding Web tools that implement different algorithms for obtaining significant motifs, overlapping similar motifs, and non-overlapping motifs. Finally, we provided our suggestions for future development of motif finding Web tools that better assist researchers in finding motifs in ChIP-Seq data. Reviewers: This article was reviewed by Prof. Sandor Pongor, Dr. Yuriy Gusev, and Dr. Shyam Prabhakar (nominated by Prof. Limsoon Wong). PMID:24555784

  9. VARUN: discovering extensible motifs under saturation constraints.

    PubMed

    Apostolico, Alberto; Comin, Matteo; Parida, Laxmi

    2010-01-01

    The discovery of motifs in biosequences is frequently torn between the rigidity of the model on one hand and the abundance of candidates on the other hand. In particular, motifs that include wild cards or "don't cares" escalate exponentially with their number, and this gets only worse if a don't care is allowed to stretch up to some prescribed maximum length. In this paper, a notion of extensible motif in a sequence is introduced and studied, which tightly combines the structure of the motif pattern, as described by its syntactic specification, with the statistical measure of its occurrence count. It is shown that a combination of appropriate saturation conditions and the monotonicity of probabilistic scores over regions of constant frequency afford us significant parsimony in the generation and testing of candidate overrepresented motifs. A suite of software programs called Varun is described, implementing the discovery of extensible motifs of the type considered. The merits of the method are then documented by results obtained in a variety of experiments primarily targeting protein sequence families. Of equal importance seems the fact that the sets of all surprising motifs returned in each experiment are extracted faster and come in much more manageable sizes than would be obtained in the absence of saturation constraints.

  10. NG6: Integrated next generation sequencing storage and processing environment

    PubMed Central

    2012-01-01

    Background Next generation sequencing platforms are now well implanted in sequencing centres and some laboratories. Upcoming smaller scale machines such as the 454 junior from Roche or the MiSeq from Illumina will increase the number of laboratories hosting a sequencer. In such a context, it is important to provide these teams with an easily manageable environment to store and process the produced reads. Results We describe a user-friendly information system able to manage large sets of sequencing data. It includes, on one hand, a workflow environment already containing pipelines adapted to different input formats (sff, fasta, fastq and qseq), different sequencers (Roche 454, Illumina HiSeq) and various analyses (quality control, assembly, alignment, diversity studies,…) and, on the other hand, a secured web site giving access to the results. The connected user will be able to download raw and processed data and browse through the analysis result statistics. The provided workflows can easily be modified or extended and new ones can be added. Ergatis is used as a workflow building, running and monitoring system. The analyses can be run locally or in a cluster environment using Sun Grid Engine. Conclusions NG6 is a complete information system designed to answer the needs of a sequencing platform. It provides a user-friendly interface to process, store and download high-throughput sequencing data. PMID:22958229

  11. Characteristic motifs for families of allergenic proteins

    PubMed Central

    Ivanciuc, Ovidiu; Garcia, Tzintzuni; Torres, Miguel; Schein, Catherine H.; Braun, Werner

    2008-01-01

    The identification of potential allergenic proteins is usually done by scanning a database of allergenic proteins and locating known allergens with a high sequence similarity. However, there is no universally accepted cut-off value for sequence similarity to indicate potential IgE cross-reactivity. Further, overall sequence similarity may be less important than discrete areas of similarity in proteins with homologous structure. To identify such areas, we first classified all allergens and their subdomains in the Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP/) to their closest protein families as defined in Pfam, and identified conserved physicochemical property motifs characteristic of each group of sequences. Allergens populate only a small subset of all known Pfam families, as all allergenic proteins in SDAP could be grouped to only 130 (of 9318 total) Pfams, and 31 families contain more than four allergens. Conserved physicochemical property motifs for the aligned sequences of the most populated Pfam families were identified with the PCPMer program suite and catalogued in the webserver Motif-Mate (http://born.utmb.edu/motifmate/summary.php). We also determined specific motifs for allergenic members of a family that could distinguish them from non-allergenic ones. These allergen specific motifs should be most useful in database searches for potential allergens. We found that sequence motifs unique to the allergens in three families (seed storage proteins, Bet v 1, and tropomyosin) overlap with known IgE epitopes, thus providing evidence that our motif based approach can be used to assess the potential allergenicity of novel proteins. PMID:18951633

  12. Sharing of four DR-beta sequence motifs between HLA-DRB1*1601 and DRB1*1101 correlates with frequent degenerate T-cell recognition of HA306-320 peptide complexed to these two molecules.

    PubMed

    Zeliszewski, D; Dorval, I; Golvano, J J; Prevost, A; Borras-Cuesta, F; Sterkers, G

    1996-02-01

    This paper shows that the seven HA306-320 specific T-cell clones isolated from one individual recognize the peptide complexed to both autologous HLA-DRB1*1101 and allogeneic HLA-DRB1*1601 (or DRB5*0201) molecules. For each T-cell clone, a single T-cell receptor (TCR) is involved in the recognition of these two different peptide-DR complexes as evidenced by cold target competition experiments. Yet, the seven T-cell clones express several different TCRs as judged by V beta-J beta usage and fine specificities. Furthermore, one representative clone has the same fine specificity for HA306-320 analogues mutated at epitopic residues irrespective of the use of DR1101 or DR1601 APC. These results suggest that structural differences between DRB1*1101 and DRB1*1601 (or DRB5*0201) do not dramatically influence the orientation of HA306-320 in the grooves such that most residues interacting with TCRs are conserved. In another individual, the same pattern of restriction, i.e. DR1101 + DR1601, was found for several HA306-320 specific clones. Two additional patterns, DR1101 + DR0801 and DR1101 + DR0801 + DR1601, were identified. By comparing DR sequences the authors found that DRB1*1101 and DRB1*1601 share four important motifs, i.e. beta 85-86, beta 67-71, beta 57 and beta 28-31 supposed to line three distinct HLA-DR pockets. Three of these motifs are also shared with DRB1*0801. All the results further support that the motif similarities allow the peptide to adopt very similar orientations in the cross-reacting DR molecules.

  13. Integrating sequence, evolution and functional genomics in regulatory genomics

    PubMed Central

    Vingron, Martin; Brazma, Alvis; Coulson, Richard; van Helden, Jacques; Manke, Thomas; Palin, Kimmo; Sand, Olivier; Ukkonen, Esko

    2009-01-01

    With genome analysis expanding from the study of genes to the study of gene regulation, 'regulatory genomics' utilizes sequence information, evolution and functional genomics measurements to unravel how regulatory information is encoded in the genome. PMID:19226437

  14. The Thiamin Pyrophosphate-Motif

    NASA Technical Reports Server (NTRS)

    Dominiak, Paulina M.; Ciszak, Ewa M.

    2003-01-01

    Using databases the authors have identified a common thiamin pyrophosphate (TPP)-motif in the family of functionally diverse TPP-dependent enzymes. This common motif consists of multimeric organization of subunits, two catalytic centers, common amino acid sequence, and specific contacts to provide a flip-flop, or alternate site, mechanism of action. Each catalytic center [PP:PYR] is formed at the interface of the PP-domain binding the magnesium ion, pyrophosphate and aminopyrimidine ring of TPP, and the PYR-domain binding the aminopyrimidine ring of that cofactor. A pair of these catalytic centers constitutes the catalytic core [PP:PYR](sub 2) within these enzymes. Analysis of the structural elements of this catalytic core reveals a novel definition of the common amino acid sequences, which are GXPhiX(sub 4)(G)PhiXXGQ and GDGX(sub 25-30) within the PP-domain, and EX(sub 4)(G)PhiXXGPhi within the PYR-domain, where Phi corresponds to a hydrophobic amino acid. This TPP-motif provides a novel tool for annotation of TPP-dependent enzymes useful in advancing functional proteomics.

  15. Allele drop-out in the MECP2 gene due to G-quadruplex and i-motif sequences when using polymerase chain reaction-based diagnosis for Rett syndrome.

    PubMed

    Saunders, Carol J; Friez, Michael J; Patterson, Melanie; Nzabi, Masha; Zhao, Weiwei; Bi, Chengpeng

    2010-04-01

    Although few examples are formally documented, all polymerase chain reaction-based testing is theoretically vulnerable to allele drop-out (ADO), the failure to amplify one of the two alleles present in a cell. In a clinical setting, this can lead to false positive or negative diagnosis. We investigated the mechanisms leading to ADO in the MECP2 gene in two unrelated female patients undergoing testing for Rett syndrome. Both the patients had two benign DNA variations, c.819G > T and c.1161C > T, that appeared homozygous due to ADO. Bioinformatics analyses indicate that this region of the MECP2 gene is rich in complex tertiary structures called G-quadruplex and i-motifs, the disruption of which by the c.819G > T and c.1161C > T variants leads to preferential amplification of the variant allele. Other examples of ADO likely occur, and consideration of disrupting G-quadruplex and i-motif structures should be given when this phenomenon is unexpected. We identify factors in both the polymerase chain reaction amplification and the sequencing steps that help overcome ADO.

  16. The complete sequence of a Spanish isolate of Broad bean wilt virus 1 (BBWV-1) reveals a high variability and conserved motifs in the genus Fabavirus.

    PubMed

    Ferrer, R M; Guerri, J; Luis-Arteaga, M S; Moreno, P; Rubio, L

    2005-10-01

    The genome of a Spanish isolate of Broad bean wilt virus-1 (BBWV-1) was completely sequenced and compared with available sequences of other isolates of the genus Fabavirus (BBWV-1 and BBWV-2). This consisted of two RNAs of 5814 and 3431 nucleotides, respectively, and their organization was similar to that of other members of the family Comoviridae. Its mean nucleotide identity with a BBWV-1 American isolate was 81.5%, and between 59.8 and 63.5% with seven BBWV-2 isolates. Our analysis showed sequence stretches in the 5' non-coding regions which are conserved in both genomic RNAs and in BBWV-1 and BBWV-2 isolates.

  17. Evolutionary Analysis and Classification of OATs, OCTs, OCTNs, and Other SLC22 Transporters: Structure-Function Implications and Analysis of Sequence Motifs.

    PubMed

    Zhu, Christopher; Nigam, Kabir B; Date, Rishabh C; Bush, Kevin T; Springer, Stevan A; Saier, Milton H; Wu, Wei; Nigam, Sanjay K

    2015-01-01

    The SLC22 family includes organic anion transporters (OATs), organic cation transporters (OCTs) and organic carnitine and zwitterion transporters (OCTNs). These are often referred to as drug transporters even though they interact with many endogenous metabolites and signaling molecules (Nigam, S.K., Nature Reviews Drug Discovery, 14:29-44, 2015). Phylogenetic analysis of SLC22 supports the view that these transporters may have evolved over 450 million years ago. Many OAT members were found to appear after a major expansion of the SLC22 family in mammals, suggesting a physiological and/or toxicological role during the mammalian radiation. Putative SLC22 orthologs exist in worms, sea urchins, flies, and ciona. At least six groups of SLC22 exist. OATs and OCTs form two major clades of SLC22, within which (apart from Oat and Oct subclades), there are also clear Oat-like, Octn, and Oct-related subclades, as well as a distantly related group we term "Oat-related" (which may have different functions). Based on available data, it is arguable whether SLC22A18, which is related to bacterial drug-proton antiporters, should be assigned to SLC22. Disease-causing mutations, single nucleotide polymorphisms (SNPs) and other functionally analyzed mutations in OAT1, OAT3, URAT1, OCT1, OCT2, OCTN1, and OCTN2 map to the first extracellular domain, the large central intracellular domain, and transmembrane domains 9 and 10. These regions are highly conserved within subclades, but not between subclades, and may be necessary for SLC22 transporter function and functional diversification. Our results not only link function to evolutionarily conserved motifs but indicate the need for a revised sub-classification of SLC22.

  18. Evolutionary Analysis and Classification of OATs, OCTs, OCTNs, and Other SLC22 Transporters: Structure-Function Implications and Analysis of Sequence Motifs

    PubMed Central

    Date, Rishabh C.; Bush, Kevin T.; Springer, Stevan A.; Saier, Milton H.; Wu, Wei; Nigam, Sanjay K.

    2015-01-01

    The SLC22 family includes organic anion transporters (OATs), organic cation transporters (OCTs) and organic carnitine and zwitterion transporters (OCTNs). These are often referred to as drug transporters even though they interact with many endogenous metabolites and signaling molecules (Nigam, S.K., Nature Reviews Drug Discovery, 14:29–44, 2015). Phylogenetic analysis of SLC22 supports the view that these transporters may have evolved over 450 million years ago. Many OAT members were found to appear after a major expansion of the SLC22 family in mammals, suggesting a physiological and/or toxicological role during the mammalian radiation. Putative SLC22 orthologs exist in worms, sea urchins, flies, and ciona. At least six groups of SLC22 exist. OATs and OCTs form two major clades of SLC22, within which (apart from Oat and Oct subclades), there are also clear Oat-like, Octn, and Oct-related subclades, as well as a distantly related group we term “Oat-related” (which may have different functions). Based on available data, it is arguable whether SLC22A18, which is related to bacterial drug-proton antiporters, should be assigned to SLC22. Disease-causing mutations, single nucleotide polymorphisms (SNPs) and other functionally analyzed mutations in OAT1, OAT3, URAT1, OCT1, OCT2, OCTN1, and OCTN2 map to the first extracellular domain, the large central intracellular domain, and transmembrane domains 9 and 10. These regions are highly conserved within subclades, but not between subclades, and may be necessary for SLC22 transporter function and functional diversification. Our results not only link function to evolutionarily conserved motifs but indicate the need for a revised sub-classification of SLC22. PMID:26536134

  19. Phylogenetic analysis, based on EPIYA repeats in the cagA gene of Indian Helicobacter pylori, and the implications of sequence variation in tyrosine phosphorylation motifs on determining the clinical outcome.

    PubMed

    Tiwari, Santosh K; Sharma, Vishwas; Sharma, Varun Kumar; Gopi, Manoj; Saikant, R; Nandan, Amrita; Bardia, Avinash; Gunisetty, Sivaram; Katikala, Prasanth; Habeeb, Md Aejaz; Khan, Aleem A; Habibullah, C M

    2011-04-01

    The population of India harbors one of the world's most highly diverse gene pools, owing to the influx of successive waves of immigrants over regular periods in time. Several phylogenetic studies involving mitochondrial DNA and Y chromosomal variation have demonstrated Europeans to have been the first settlers in India. Nevertheless, certain controversy exists, due to the support given to the thesis that colonization was by the Austro-Asiatic group, prior to the Europeans. Thus, the aim was to investigate pre-historic colonization of India by anatomically modern humans, using conserved stretches of five amino acid (EPIYA) sequences in the cagA gene of Helicobacter pylori. Simultaneously, the existence of a pathogenic relationship of tyrosine phosphorylation motifs (TPMs), in 32 H. pylori strains isolated from subjects with several forms of gastric diseases, was also explored. High resolution sequence analysis of the above described genes was performed. The nucleotide sequences obtained were translated into amino acids using MEGA (version 4.0) software for EPIYA. An MJ-Network was constructed for obtaining TPM haplotypes by using NETWORK (version 4.5) software. The findings of the study suggest that Indian H. pylori strains share a common ancestry with Europeans. No specific association of haplotypes with the outcome of disease was revealed through additional network analysis of TPMs.

  20. The highly conserved amino acid sequence motif Tyr-Gly-Asp-Thr-Asp-Ser in alpha-like DNA polymerases is required by phage phi 29 DNA polymerase for protein-primed initiation and polymerization.

    PubMed Central

    Bernad, A; Lázaro, J M; Salas, M; Blanco, L

    1990-01-01

    The alpha-like DNA polymerases from bacteriophage phi 29 and other viruses, prokaryotes and eukaryotes contain an amino acid consensus sequence that has been proposed to form part of the dNTP binding site. We have used site-directed mutants to study five of the six highly conserved consecutive amino acids corresponding to the most conserved C-terminal segment (Tyr-Gly-Asp-Thr-Asp-Ser). Our results indicate that in phi 29 DNA polymerase this consensus sequence, although irrelevant for the 3'→5' exonuclease activity, is essential for initiation and elongation. Based on these results and on its homology with known or putative metal-binding amino acid sequences, we propose that in phi 29 DNA polymerase the Tyr-Gly-Asp-Thr-Asp-Ser consensus motif is part of the dNTP binding site, involved in the synthetic activities of the polymerase (i.e., initiation and polymerization), and that it is involved particularly in the metal binding associated with the dNTP site. PMID:2191296
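    As a small illustration of how such a consensus motif can be located in a protein sequence, the sketch below converts the three-letter motif to one-letter code and reports 1-based match positions. The helper names and the toy sequence are hypothetical; the toy sequence is not the phi 29 DNA polymerase sequence.

```python
# One-letter codes for the residues that appear in the motif.
THREE_TO_ONE = {"Tyr": "Y", "Gly": "G", "Asp": "D", "Thr": "T", "Ser": "S"}

def motif_to_one_letter(motif):
    """Convert e.g. 'Tyr-Gly-Asp' to 'YGD'."""
    return "".join(THREE_TO_ONE[residue] for residue in motif.split("-"))

def find_motif(protein, motif):
    """Return 1-based start positions of every exact match of motif."""
    positions = []
    start = protein.find(motif)
    while start != -1:
        positions.append(start + 1)
        start = protein.find(motif, start + 1)
    return positions

consensus = motif_to_one_letter("Tyr-Gly-Asp-Thr-Asp-Ser")
toy_protein = "MKYGDTDSLA"  # made-up sequence for illustration
print(consensus, find_motif(toy_protein, consensus))
```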

  1. DNA motifs determining the efficiency of adaptation into the Escherichia coli CRISPR array.

    PubMed

    Yosef, Ido; Shitrit, Dror; Goren, Moran G; Burstein, David; Pupko, Tal; Qimron, Udi

    2013-08-27

    Clustered regularly interspaced short palindromic repeats (CRISPR) and their associated proteins constitute a recently identified prokaryotic defense system against invading nucleic acids. DNA segments, termed protospacers, are integrated into the CRISPR array in a process called adaptation. Here, we establish a PCR-based assay that enables evaluating the adaptation efficiency of specific spacers into the type I-E Escherichia coli CRISPR array. Using this assay, we provide direct evidence that the protospacer adjacent motif along with the first base of the protospacer (5'-AAG) partially affect the efficiency of spacer acquisition. Remarkably, we identified a unique dinucleotide, 5'-AA, positioned at the 3' end of the spacer, that enhances efficiency of the spacer's acquisition. Insertion of this dinucleotide increased acquisition efficiency of two different spacers. DNA sequencing of newly adapted CRISPR arrays revealed that the position of the newly identified motif with respect to the 5'-AAG is important for affecting acquisition efficiency. Analysis of approximately 1 million spacers showed that this motif is overrepresented in frequently acquired spacers compared with those acquired rarely. Our results represent an example of a short nonprotospacer adjacent motif sequence that affects acquisition efficiency and suggest that other as yet unknown motifs affect acquisition efficiency in other CRISPR systems as well.
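    The two sequence features named in this abstract can be checked with a few lines of code. The sketch below is an assumption-laden illustration, not the authors' assay or analysis: it assumes the two-base motif lies directly 5' of the protospacer, that both strings are given 5' to 3' on the same strand, and the function and key names are hypothetical.

```python
def acquisition_features(pam, protospacer):
    """Report whether a candidate carries the 5'-AAG feature (motif plus
    first protospacer base) and the 5'-AA dinucleotide at its 3' end."""
    pam = pam.upper()
    protospacer = protospacer.upper()
    return {
        "has_5prime_AAG": pam + protospacer[:1] == "AAG",
        "has_3prime_AA": protospacer.endswith("AA"),
    }

# Toy sequences (made up for illustration).
print(acquisition_features("AA", "GTTCAGGCAA"))
print(acquisition_features("AT", "GTTCAGGCAT"))
```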

  2. Integration of new alternative reference strain genome sequences into the Saccharomyces genome database.

    PubMed

    Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C; Dalusag, Kyla; Demeter, Janos; Engel, Stacia; Hellerstedt, Sage T; Karra, Kalpana; Hitz, Benjamin C; Nash, Robert S; Paskov, Kelley; Sheppard, Travis; Skrzypek, Marek; Weng, Shuai; Wong, Edith; Michael Cherry, J

    2016-01-01

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. To provide a wider scope of genetic and phenotypic variation in yeast, the genome sequences and their corresponding annotations from 11 alternative S. cerevisiae reference strains have been integrated into SGD. Genomic and protein sequence information for genes from these strains are now available on the Sequence and Protein tab of the corresponding Locus Summary pages. We illustrate how these genome sequences can be utilized to aid our understanding of strain-specific functional and phenotypic differences. Database URL: www.yeastgenome.org.

  3. Prediction and prioritization of neoantigens: integration of RNA sequencing data with whole-exome sequencing.

    PubMed

    Karasaki, Takahiro; Nagayama, Kazuhiro; Kuwano, Hideki; Nitadori, Jun-Ichi; Sato, Masaaki; Anraku, Masaki; Hosoi, Akihiro; Matsushita, Hirokazu; Takazawa, Masaki; Ohara, Osamu; Nakajima, Jun; Kakimi, Kazuhiro

    2017-02-01

    The importance of neoantigens for cancer immunity is now well-acknowledged. However, there are diverse strategies for predicting and prioritizing candidate neoantigens, and thus reported neoantigen loads vary a great deal. To clarify this issue, we compared the numbers of neoantigen candidates predicted by four currently utilized strategies. Whole-exome sequencing and RNA sequencing (RNA-Seq) of four non-small-cell lung cancer patients was carried out. We identified 361 somatic missense mutations from which 224 candidate neoantigens were predicted using MHC class I binding affinity prediction software (strategy I). Of these, 207 exceeded the set threshold of gene expression (fragments per kilobase of transcript per million fragments mapped ≥1), resulting in 124 candidate neoantigens (strategy II). To verify mutant mRNA expression, sequencing of amplicons from tumor cDNA including each mutation was undertaken; 204 of the 207 mutations were successfully sequenced, yielding 121 mutant mRNA sequences, resulting in 75 candidate neoantigens (strategy III). Sequence information was extracted from RNA-Seq to confirm the presence of mutated mRNA. Variant allele frequencies ≥0.04 in RNA-Seq were found for 117 of the 207 mutations and regarded as expressed in the tumor, and finally, 72 candidate neoantigens were predicted (strategy IV). Without additional amplicon sequencing of cDNA, strategy IV was comparable to strategy III. We therefore propose strategy IV as a practical and appropriate strategy to predict candidate neoantigens fully utilizing currently available information. It is of note that different neoantigen loads were deduced from the same tumors depending on the strategies applied.
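    The successive filters behind strategies I, II and IV can be sketched as follows. This is a simplified illustration rather than the authors' pipeline: the record type, field names and toy values are assumptions, and each mutation is treated as yielding at most one candidate, whereas in the study one mutation can give rise to several candidate neoantigens. Only the two thresholds (expression of at least 1 and RNA-Seq variant allele frequency of at least 0.04) come from the abstract.

```python
from dataclasses import dataclass

@dataclass
class Mutation:
    name: str
    predicted_binder: bool  # MHC class I binding predicted (assumed flag)
    fpkm: float             # gene expression level
    rna_vaf: float          # variant allele frequency in RNA-Seq

FPKM_MIN = 1.0   # expression threshold stated in the abstract
VAF_MIN = 0.04   # RNA-Seq VAF threshold stated in the abstract

def strategy_i(mutations):
    """Keep mutations with a predicted MHC class I binder."""
    return [m for m in mutations if m.predicted_binder]

def strategy_ii(mutations):
    """Strategy I plus the gene expression threshold."""
    return [m for m in strategy_i(mutations) if m.fpkm >= FPKM_MIN]

def strategy_iv(mutations):
    """Strategy II plus evidence of the mutant allele in RNA-Seq."""
    return [m for m in strategy_ii(mutations) if m.rna_vaf >= VAF_MIN]

# Toy records (made up for illustration).
toy = [
    Mutation("m1", True, 5.0, 0.30),
    Mutation("m2", True, 0.2, 0.00),
    Mutation("m3", True, 3.0, 0.01),
    Mutation("m4", False, 10.0, 0.50),
]
print(len(strategy_i(toy)), len(strategy_ii(toy)), len(strategy_iv(toy)))
```

    Because each strategy adds a filter on top of the previous one, the candidate count can only shrink, which mirrors the decreasing neoantigen loads reported for the same tumors.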

  4. Process sequence optimization for digital microfluidic integration using EWOD technique

    NASA Astrophysics Data System (ADS)

    Yadav, Supriya; Joyce, Robin; Sharma, Akash Kumar; Sharma, Himani; Sharma, Niti Nipun; Varghese, Soney; Akhtar, Jamil

    2016-04-01

    Micro/nano-fluidic MEMS biosensors are devices that detect biomolecules. The emerging micro/nano-fluidic devices provide high throughput and high repeatability with very low response time and reduced device cost compared to traditional devices. This article presents the experimental details for process sequence optimization of digital microfluidics (DMF) using "electrowetting-on-dielectric" (EWOD). Stress-free thick-film deposition of silicon dioxide using PECVD and the subsequent processes for EWOD techniques have been optimized in this work.

  5. Import of desired nucleic acid sequences using addressing motif of mitochondrial ribosomal 5S-rRNA for fluorescent in vivo hybridization of mitochondrial DNA and RNA.

    PubMed

    Zelenka, Jaroslav; Alán, Lukáš; Jabůrek, Martin; Ježek, Petr

    2014-04-01

    Based on the matrix-addressing sequence of mitochondrial ribosomal 5S-rRNA (termed MAM), which is naturally imported into mitochondria, we have constructed an import system for in vivo targeting of mitochondrial DNA (mtDNA) or mt-mRNA, in order to provide fluorescence hybridization of the desired sequences. Thus DNA oligonucleotides were constructed, containing the 5'-flanked T7 RNA polymerase promoter. After in vitro transcription and fluorescent labeling with Alexa Fluor® 488 or 647 dye, we obtained the fluorescent "L-ND5 probe" containing MAM and exemplar cargo, i.e., annealing sequence to a short portion of ND5 mRNA and to the light-strand mtDNA complementary to the heavy strand nd5 mt gene (5'-end 21 base pair sequence). For mitochondrial in vivo fluorescent hybridization, HepG2 cells were treated with dequalinium micelles, containing the fluorescent probes, bringing the probes proximally to the mitochondrial outer membrane and to the natural import system. A verification of import into the mitochondrial matrix of cultured HepG2 cells was provided by confocal microscopy colocalizations. Transfections using lipofectamine or probes without 5S-rRNA addressing MAM sequence or with MAM only were ineffective. Alternatively, the same DNA oligonucleotides with 5'-CACC overhang (substituting T7 promoter) were transcribed from the tetracycline-inducible pENTRH1/TO vector in human embryonic kidney T-REx®-293 cells, while mitochondrial matrix localization after import of the resulting unlabeled RNA was detected by PCR. The MAM-containing probe was then enriched by three orders of magnitude over the natural ND5 mRNA in the mitochondrial matrix. In conclusion, we present a proof-of-principle for mitochondrial in vivo hybridization and mitochondrial nucleic acid import.

  6. Model peptide studies of sequence regions in the elastomeric biomineralization protein, Lustrin A. I. The C-domain consensus-PG-, -NVNCT-motif.

    PubMed

    Zhang, Bo; Wustman, Brandon A; Morse, Daniel; Evans, John Spencer

    2002-05-01

    The lustrin superfamily represents a unique group of biomineralization proteins localized between layered aragonite mineral plates (i.e., nacre layer) in mollusk shell. Recent atomic force microscopy (AFM) pulling studies have demonstrated that the lustrin-containing organic nacre layer in the abalone, Haliotis rufescens, exhibits a typical sawtooth force-extension curve with hysteretic recovery. This force extension behavior is reminiscent of reversible unfolding and refolding in elastomeric proteins such as titin and tenascin. Since secondary structure plays an important role in force-induced protein unfolding and refolding, the question is, What secondary structure(s) exist within the major domains of Lustrin A? Using a model peptide (FPGKNVNCTSGE) representing the 12-residue consensus sequence found near the N-termini of the first eight cysteine-rich domains (C-domains) within the Lustrin A protein, we employed CD, NMR spectroscopy, and simulated annealing/minimization to determine the secondary structure preferences for this sequence. At pH 7.4, we find that the 12-mer sequence adopts a loop conformation, consisting of a "bend" or "turn" involving residues G3-K4 and N7-C8-T9, with extended conformations arising at F1-G3; K4-V6; T9-S10-G11 in the sequence. Minor pH-dependent conformational effects were noted for this peptide; however, there is no evidence for a salt-bridge interaction between the K4 and E12 side chains. The presence of a loop conformation within the highly conserved -PG-, -NVNCT- sequence of C1-C8 domains may have important structural and mechanistic implications for the Lustrin A protein with regard to elastic behavior.

  7. Ichnofabric and siliciclastic depositional systems: Integration for sequence stratigraphic analysis

    SciTech Connect

    Bottjer, D.J.; Droser, M.L.

    1991-03-01

    Much previous research on biogenic sedimentary structures has established how ichnofacies (assemblages of discrete trace fossils) vary within marine depositional systems. However, studies aimed at understanding the distribution of ichnofabric (sedimentary rock fabric resulting from biogenic reworking) have only recently been attempted. Because ichnofabric can be recorded using a semi-quantitative series of ichnofabric indices (ii), its distribution in marine sedimentary rocks can be easily recorded through vertical sequence analysis. Thicknesses of strata recording different ichnofabric indices can be logged from stratigraphic sections or cores. These data are best displayed in histograms as percent of ii recorded from the total thickness measured. These ichnofabric histograms (ichnograms) show variable but distinctive distributions for genetic units such as facies within systems tracts of siliciclastic depositional sequences. An average ichnofabric index for any genetic sedimentary unit can also be computed from the data used to construct ichnograms. Because skeletal fossils are typically much less commonly preserved in siliciclastic than carbonate depositional systems, such ichnofabric analyses have the potential of providing an important new line of evidence for depositional systems and sequence stratigraphic analysis of siliciclastic strata. In petroleum exploration, results from completing analyses of ichnofabric distribution could provide important information including: (1) systems tracts with fine-grained facies that have relatively low ichnofabric values are potential source beds; and (2) petroleum reservoirs that occur in coarse episodically deposited beds are more likely to form in systems tracts with facies that have low rather than high ichnofabric values.
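    The ichnogram and the average ichnofabric index described above can be computed from a logged section as sketched below. The input layout, a list of (ichnofabric index, thickness) pairs, the function names and the toy values are assumptions, and the average is taken here as a thickness-weighted mean, which the abstract does not spell out.

```python
from collections import defaultdict

def ichnogram(intervals):
    """Return {ii: percent of total logged thickness} for a section.

    intervals: list of (ichnofabric_index, thickness) pairs."""
    totals = defaultdict(float)
    for ii, thickness in intervals:
        totals[ii] += thickness
    total = sum(totals.values())
    return {ii: 100.0 * t / total for ii, t in sorted(totals.items())}

def average_ii(intervals):
    """Thickness-weighted mean ichnofabric index (assumed definition)."""
    total = sum(thickness for _, thickness in intervals)
    return sum(ii * thickness for ii, thickness in intervals) / total

# Toy log (made up): index and bed thickness in metres.
log = [(1, 2.0), (3, 1.0), (1, 1.0), (5, 4.0)]
print(ichnogram(log))
print(average_ii(log))
```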

  8. Cloning, expression and functional characterization of the putative regeneration and tolerance factor (RTF/TJ6) as a functional vacuolar ATPase proton pump regulatory subunit with a conserved sequence of immunoreceptor tyrosine-based activation motif.

    PubMed

    Babichev, Yael; Tamir, Ami; Park, Meeyoug; Muallem, Shmuel; Isakov, Noah

    2005-10-01

    In an attempt to identify new immunoreceptor tyrosine-based activation motif (ITAM)-containing human molecules that may regulate hitherto unknown immune cell functions, we BLAST searched the National Center for Biotechnology Information database for ITAM-containing sequences. A human expressed sequence tag showing partial homology to the murine TJ6 (mTJ6) gene and encoding a putative ITAM sequence has been identified and used to clone the human TJ6 (hTJ6) gene from an HL-60-derived cDNA library. hTJ6 was found to encode a protein of 856 residues with a calculated mass of 98 155 Da. Immunolocalization and sequence analysis revealed that hTJ6 is a membrane protein with predicted six transmembrane-spanning regions, typical of ion channels, and a single putative ITAM (residues 452-466) in a juxtamembrane or hydrophobic intramembrane region. hTJ6 is highly homologous to Bos taurus 116-kDa subunit of the vacuolar proton-translocating ATPase. Over-expression of hTJ6 in HEK 293 cells increased H+ uptake into intracellular organelles, an effect that was sensitive to inhibition by bafilomycin, a selective inhibitor of vacuolar H+ pump. Northern blot analysis demonstrated three different hybridizing mRNA transcripts corresponding to 3.2, 5.0 and 7.3 kb, indicating the presence of several splice variants. Significant differences in hTJ6 mRNA levels in human tissues of different origins point to possible tissue-specific function. Although hTJ6 was found to be a poor substrate for tyrosine-phosphorylating enzymes, suggesting that its ITAM sequence is non-functional in protein tyrosine kinase-mediated signaling pathways, its role in organellar H+ pumping suggests that hTJ6 function may participate in protein trafficking/processing.

  9. An Integrated Sequence-Structure Database incorporating matching mRNA sequence, amino acid sequence and protein three-dimensional structure data.

    PubMed Central

    Adzhubei, I A; Adzhubei, A A; Neidle, S

    1998-01-01

    We have constructed a non-homologous database, termed the Integrated Sequence-Structure Database (ISSD) which comprises the coding sequences of genes, amino acid sequences of the corresponding proteins, their secondary structure and straight phi,psi angles assignments, and polypeptide backbone coordinates. Each protein entry in the database holds the alignment of nucleotide sequence, amino acid sequence and the PDB three-dimensional structure data. The nucleotide and amino acid sequences for each entry are selected on the basis of exact matches of the source organism and cell environment. The current version 1.0 of ISSD is available on the WWW at http://www.protein.bio.msu.su/issd/ and includes 107 non-homologous mammalian proteins, of which 80 are human proteins. The database has been used by us for the analysis of synonymous codon usage patterns in mRNA sequences showing their correlation with the three-dimensional structure features in the encoded proteins. Possible ISSD applications include optimisation of protein expression, improvement of the protein structure prediction accuracy, and analysis of evolutionary aspects of the nucleotide sequence-protein structure relationship. PMID:9399866

  10. The Thiamin Pyrophosphate-Motif

    NASA Technical Reports Server (NTRS)

    Dominiak, P.; Ciszak, E.

    2003-01-01

    Using databases the authors have identified a common thiamin pyrophosphate (TPP)-motif in the family of functionally diverse TPP-dependent enzymes. This common motif consists of multimeric organization of subunits and two catalytic centers. Each catalytic center (PP:PYR) is formed at the interface of the PP-domain binding the magnesium ion, pyrophosphate and aminopyrimidine ring of TPP, and the PYR-domain binding the aminopyrimidine ring of that cofactor. A pair of these catalytic centers constitutes the catalytic core (PP:PYR)(sub 2) within these enzymes. Analysis of the structural elements of this catalytic core reveals a novel definition of the common amino acid sequences, which are GXPhiX(sub 4)(G)PhiXXGQ and GDGX(sub 25-30)NN in the PP-domain, and the EX(sub 4)(G)PhiXXGPhi in the PYR-domain, where Phi corresponds to a hydrophobic amino acid. This TPP-motif provides a novel tool for annotation of TPP-dependent enzymes useful in advancing functional proteomics.

  11. The Thiamine-Pyrophosphate-Motif

    NASA Technical Reports Server (NTRS)

    Ciszak, Ewa; Dominiak, Paulina

    2004-01-01

    Thiamin pyrophosphate (TPP), a derivative of vitamin B1, is a cofactor for enzymes performing catalysis in pathways of energy production including the well known decarboxylation of α-keto acid dehydrogenases followed by transketolation. TPP-dependent enzymes constitute a structurally and functionally diverse group exhibiting multimeric subunit organization, multiple domains and two chemically equivalent catalytic centers. Annotation of functional TPP-dependent enzymes, therefore, has not been trivial due to low sequence similarity related to this complex organization. Our approach to analysis of structures of known TPP-dependent enzymes reveals for the first time features common to this group, which we have termed the TPP-motif. The TPP-motif consists of specific spatial arrangements of structural elements and their specific contacts to provide for a flip-flop, or alternate site, enzymatic mechanism of action. Analysis of structural elements entrained in the flip-flop action displayed by TPP-dependent enzymes reveals a novel definition of the common amino acid sequences. These sequences allow for annotation of TPP-dependent enzymes, thus advancing functional proteomics. Further details of three-dimensional structures of TPP-dependent enzymes will be discussed.

  12. Sequence-Based Screening for Rare Enzymes: New Insights into the World of AMDases Reveal a Conserved Motif and 58 Novel Enzymes Clustering in Eight Distinct Families

    PubMed Central

    Maimanakos, Janine; Chow, Jennifer; Gaßmeyer, Sarah K.; Güllert, Simon; Busch, Florian; Kourist, Robert; Streit, Wolfgang R.

    2016-01-01

    Arylmalonate Decarboxylases (AMDases, EC 4.1.1.76) are very rare and mostly underexplored enzymes. Currently only four known and biochemically characterized representatives exist. However, their ability to decarboxylate α-disubstituted malonic acid derivatives to optically pure products without cofactors makes them attractive and promising candidates for the use as biocatalysts in industrial processes. Until now, AMDases could not be separated from other members of the aspartate/glutamate racemase superfamily based on their gene sequences. Within this work, a search algorithm was developed that enables a reliable prediction of AMDase activity for potential candidates. Based on specific sequence patterns and screening methods 58 novel AMDase candidate genes could be identified in this work. Thereby, AMDases with the conserved sequence pattern of Bordetella bronchiseptica’s prototype appeared to be limited to the classes of Alpha-, Beta-, and Gamma-proteobacteria. Amino acid homologies and comparison of gene surrounding sequences enabled the classification of eight enzyme clusters. Particularly striking is the accumulation of genes coding for different transporters of the tripartite tricarboxylate transporters family, TRAP transporters and ABC transporters as well as genes coding for mandelate racemases/muconate lactonizing enzymes that might be involved in substrate uptake or degradation of AMDase products. Further, three novel AMDases were characterized which showed a high enantiomeric excess (>99%) of the (R)-enantiomer of flurbiprofen. These are the recombinant AmdA and AmdV from Variovorax sp. strains HH01 and HH02, originated from soil, and AmdP from Polymorphum gilvum found by a data base search. Altogether our findings give new insights into the class of AMDases and reveal many previously unknown enzyme candidates with high potential for bioindustrial processes. PMID:27610105

  13. An Affinity Propagation-Based DNA Motif Discovery Algorithm.

    PubMed

    Sun, Chunxiao; Huo, Hongwei; Yu, Qiang; Guo, Haitao; Sun, Zhigang

    2015-01-01

    The planted (l, d) motif search (PMS) is one of the fundamental problems in bioinformatics, which plays an important role in locating transcription factor binding sites (TFBSs) in DNA sequences. Nowadays, identifying weak motifs and reducing the effect of local optimum are still important but challenging tasks for motif discovery. To solve the tasks, we propose a new algorithm, APMotif, which first applies the Affinity Propagation (AP) clustering in DNA sequences to produce informative and good candidate motifs and then employs Expectation Maximization (EM) refinement to obtain the optimal motifs from the candidate motifs. Experimental results both on simulated data sets and real biological data sets show that APMotif usually outperforms four other widely used algorithms in terms of high prediction accuracy.
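
    To make the planted (l, d) motif problem concrete, the sketch below checks whether a candidate l-mer occurs in every input sequence with at most d mismatches. This only illustrates the problem definition; it is not APMotif and performs no Affinity Propagation clustering or EM refinement. All function names and the toy sequences are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, seq, d):
    """True if some window of seq with len(motif) letters is within d mismatches of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))


def is_planted_motif(motif, seqs, d):
    """True if motif has an occurrence with at most d mismatches in every sequence."""
    return all(occurs_within(motif, s, d) for s in seqs)


# Toy data: 'ACGT' appears exactly in the first sequence and with one
# mismatch in the other two ('ACCT' and 'ACGA').
toy_seqs = ["TTACGTAA", "GGACCTCC", "ACGAGGGG"]
found = is_planted_motif("ACGT", toy_seqs, d=1)
```

    Exhaustively testing all 4^l candidates this way is only practical for short motifs, which is one reason heuristic strategies such as the clustering and refinement steps named in the record are of interest.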

  14. Systematic reconstruction of RNA functional motifs with high-throughput microfluidics.

    PubMed

    Martin, Lance; Meier, Matthias; Lyons, Shawn M; Sit, Rene V; Marzluff, William F; Quake, Stephen R; Chang, Howard Y

    2012-12-01

    We present RNA-mechanically induced trapping of molecular interactions (RNA-MITOMI), a microfluidic platform that allows integrated synthesis and functional assays for programmable RNA libraries. The interaction of a comprehensive library of RNA mutants with stem-loop-binding protein precisely defined the RNA structural and sequence features that govern affinity. The functional motif reconstructed in a single experiment on our platform uncovers new binding specificities and enriches interpretation of phylogenetic data.

  15. Inhibition of NADPH oxidase activation by synthetic peptides mapping within the carboxyl-terminal domain of small GTP-binding proteins. Lack of amino acid sequence specificity and importance of polybasic motif.

    PubMed

    Joseph, G; Gorzalczany, Y; Koshkin, V; Pick, E

    1994-11-18

    The small GTP-binding protein (G protein) Rac1 is an obligatory participant in the assembly of the superoxide (O2-.)-generating NADPH oxidase complex of macrophages. We investigated the effect of synthetic peptides, mapping within the near carboxyl-terminal domains of Rac1 and of related G proteins, on the activity of NADPH oxidase in a cell-free system consisting of solubilized guinea pig macrophage membrane, a cytosolic fraction enriched in p47phox and p67phox (or total cytosol), highly purified Rac1-GDP dissociation inhibitor for Rho (Rho GDI) complex, and the activating amphiphile, lithium dodecyl sulfate. Peptides Rac1-(178-188) and Rac1-(178-191), but not Rac2-(178-188), inhibited NADPH oxidase activity in a Rac1-dependent system when added prior to or simultaneously with the initiation of activation. However, undecapeptides corresponding to the near carboxyl-terminal domains of RhoA and RhoC and, most notably, a peptide containing the same amino acids as Rac1-(178-188), but in reversed orientation, were also inhibitory. Surprisingly, O2-. production in a Rac2-dependent cell-free system was inhibited by Rac1-(178-188) but not by Rac2-(178-188). Finally, basic polyamino acids containing lysine, histidine, or arginine, also inhibited NADPH oxidase activation. We conclude that inhibition of NADPH oxidase activation by synthetic peptides mapping within the carboxyl-terminal domain of certain small G proteins is not amino acid sequence-specific but related to the presence of a polybasic motif. It has been proposed that such a motif serves as a plasma membrane targeting signal for a number of small G proteins (Hancock, J.F., Paterson, H., and Marshall, C.J. (1990) Cell 63, 133-139).

  16. ATtRACT-a database of RNA-binding proteins and associated motifs.

    PubMed

    Giudice, Girolamo; Sánchez-Cabo, Fátima; Torroja, Carlos; Lara-Pezzi, Enrique

    2016-01-01

    RNA-binding proteins (RBPs) play a crucial role in key cellular processes, including RNA transport, splicing, polyadenylation and stability. Understanding the interaction between RBPs and RNA is key to improve our knowledge of RNA processing, localization and regulation in a global manner. Despite advances in recent years, a unified non-redundant resource that includes information on experimentally validated motifs, RBPs and integrated tools to exploit this information is lacking. Here, we developed a database named ATtRACT (available at http://attract.cnic.es) that compiles information on 370 RBPs and 1583 RBP consensus binding motifs, 192 of which are not present in any other database. To populate ATtRACT we (i) extracted and hand-curated experimentally validated data from CISBP-RNA, SpliceAid-F, RBPDB databases, (ii) integrated and updated the unavailable ASD database and (iii) extracted information from Protein-RNA complexes present in Protein Data Bank database through computational analyses. ATtRACT provides also efficient algorithms to search a specific motif and scan one or more RNA sequences at a time. It also allows discovering de novo motifs enriched in a set of related sequences and compare them with the motifs included in the database. Database URL: http://attract.cnic.es.

  17. Integrated sequence and immunology filovirus database at Los Alamos.

    PubMed

    Yusim, Karina; Yoon, Hyejin; Foley, Brian; Feng, Shihai; Macke, Jennifer; Dimitrijevic, Mira; Abfalterer, Werner; Szinger, James; Fischer, Will; Kuiken, Carla; Korber, Bette

    2016-01-01

    The Ebola outbreak of 2013-15 infected more than 28 000 people and claimed more lives than all previous filovirus outbreaks combined. Governmental agencies, clinical teams, and the world scientific community pulled together in a multifaceted response ranging from prevention and disease control, to evaluating vaccines and therapeutics in human trials. As this epidemic is finally coming to a close, refocusing on long-term prevention strategies becomes paramount. Given the very real threat of future filovirus outbreaks, and the inherent uncertainty of the next outbreak virus and geographic location, it is prudent to consider the extent and implications of known natural diversity in advancing vaccines and therapeutic approaches. To facilitate such consideration, we have updated and enhanced the content of the filovirus portion of Los Alamos Hemorrhagic Fever Viruses Database. We have integrated and performed baseline analysis of all family Filoviridae sequences deposited into GenBank, with associated immune response data, and metadata, and we have added new computational tools with web-interfaces to assist users with analysis. Here, we (i) describe the main features of updated database, (ii) provide integrated views and some basic analyses summarizing evolutionary patterns as they relate to geo-temporal data captured in the database and (iii) highlight the most conserved regions in the proteome that may be useful for a T cell vaccine strategy. Database URL: www.hfv.lanl.gov.

  18. The European Classical Swine Fever Virus Database: Blueprint for a Pathogen-Specific Sequence Database with Integrated Sequence Analysis Tools.

    PubMed

    Postel, Alexander; Schmeiser, Stefanie; Zimmermann, Bernd; Becher, Paul

    2016-11-07

    Molecular epidemiology has become an indispensable tool in the diagnosis of diseases and in tracing the infection routes of pathogens. Due to advances in conventional sequencing and the development of high throughput technologies, the field of sequence determination is in the process of being revolutionized. Platforms for sharing sequence information and providing standardized tools for phylogenetic analyses are becoming increasingly important. The database (DB) of the European Union (EU) and World Organisation for Animal Health (OIE) Reference Laboratory for classical swine fever offers one of the world's largest semi-public virus-specific sequence collections combined with a module for phylogenetic analysis. The classical swine fever (CSF) DB (CSF-DB) became a valuable tool for supporting diagnosis and epidemiological investigations of this highly contagious disease in pigs with high socio-economic impacts worldwide. The DB has been re-designed and now allows for the storage and analysis of traditionally used, well established genomic regions and of larger genomic regions including complete viral genomes. We present an application example for the analysis of highly similar viral sequences obtained in an endemic disease situation and introduce the new geographic "CSF Maps" tool. The concept of this standardized and easy-to-use DB with an integrated genetic typing module is suited to serve as a blueprint for similar platforms for other human or animal viruses.

  19. The European Classical Swine Fever Virus Database: Blueprint for a Pathogen-Specific Sequence Database with Integrated Sequence Analysis Tools

    PubMed Central

    Postel, Alexander; Schmeiser, Stefanie; Zimmermann, Bernd; Becher, Paul

    2016-01-01

    Molecular epidemiology has become an indispensable tool in the diagnosis of diseases and in tracing the infection routes of pathogens. Due to advances in conventional sequencing and the development of high throughput technologies, the field of sequence determination is in the process of being revolutionized. Platforms for sharing sequence information and providing standardized tools for phylogenetic analyses are becoming increasingly important. The database (DB) of the European Union (EU) and World Organisation for Animal Health (OIE) Reference Laboratory for classical swine fever offers one of the world’s largest semi-public virus-specific sequence collections combined with a module for phylogenetic analysis. The classical swine fever (CSF) DB (CSF-DB) became a valuable tool for supporting diagnosis and epidemiological investigations of this highly contagious disease in pigs with high socio-economic impacts worldwide. The DB has been re-designed and now allows for the storage and analysis of traditionally used, well established genomic regions and of larger genomic regions including complete viral genomes. We present an application example for the analysis of highly similar viral sequences obtained in an endemic disease situation and introduce the new geographic “CSF Maps” tool. The concept of this standardized and easy-to-use DB with an integrated genetic typing module is suited to serve as a blueprint for similar platforms for other human or animal viruses. PMID:27827988

  20. Rapid fixation of a distinctive sequence motif in the 3' noncoding region of the clade of West Nile virus invading North America.

    PubMed

    Hughes, Austin L; Piontkivska, Helen; Foppa, Ivo

    2007-09-15

    Phylogenetic analysis of complete genomes of West Nile virus (WNV) by a variety of methods supported the hypothesis that North American isolates of WNV constitute a monophyletic group, together with an isolate from Israel and one from Hungary. We used ancestral sequence reconstruction in order to obtain evidence for evolutionary changes that might be correlated with increased virulence in this clade (designated the N.A. clade). There was one amino acid change (I-->T at residue 356 of the NS3 protein) that occurred in the ancestor of the N.A. clade and remained conserved in all N.A. clade genomes analyzed. There were four changes in the upstream portion of the 3' noncoding region (the AT-enriched region) that occurred in the ancestor of the N.A. clade and remained conserved in all N.A. clade genomes analyzed, changes predicted to alter RNA secondary structure. The AT-enriched region showed a higher rate of substitution in the branch ancestral to the N.A. clade, relative to polymorphism, than did the remainder of the noncoding regions, synonymous sites in coding regions, or nonsynonymous sites in coding regions. The high rate of occurrence of fixed nucleotide substitutions in this region suggests that positive Darwinian selection may have acted on this portion of the 3'NCR and that these fixed changes, possibly in concert with the amino acid change in NS3, may underlie phenotypic effects associated with increased virulence in North American WNV.

  1. Serial number tagging reveals a prominent sequence preference of retrotransposon integration.

    PubMed

    Chatterjee, Atreyi Ghatak; Esnault, Caroline; Guo, Yabin; Hung, Stevephen; McQueen, Philip G; Levin, Henry L

    2014-07-01

    Transposable elements (TE) have both negative and positive impact on the biology of their host. As a result, a balance is struck between the host and the TE that relies on directing integration to specific genome territories. The extraordinary capacity of DNA sequencing can create ultra dense maps of integration that are being used to study the mechanisms that position integration. Unfortunately, the great increase in the numbers of insertion sites detected comes with the cost of not knowing which positions are rare targets and which sustain high numbers of insertions. To address this problem we developed the serial number system, a TE tagging method that measures the frequency of integration at single nucleotide positions. We sequenced 1 million insertions of retrotransposon Tf1 in the genome of Schizosaccharomyces pombe and obtained the first profile of integration with frequencies for each individual position. Integration levels at individual nucleotides varied over two orders of magnitude and revealed that sequence recognition plays a key role in positioning integration. The serial number system is a general method that can be applied to determine precise integration maps for retroviruses and gene therapy vectors.

  2. Golden Ratio Versus Pi as Random Sequence Sources for Monte Carlo Integration

    NASA Technical Reports Server (NTRS)

    Sen, S. K.; Agarwal, Ravi P.; Shaykhian, Gholam Ali

    2007-01-01

    We discuss here the relative merits of these numbers as possible random sequence sources. The quality of these sequences is not judged directly based on the outcome of all known tests for the randomness of a sequence. Instead, it is determined implicitly by the accuracy of the Monte Carlo integration in a statistical sense. Since our main motive of using a random sequence is to solve real world problems, it is more desirable if we compare the quality of the sequences based on their performances for these problems in terms of quality/accuracy of the output. We also compare these sources against those generated by a popular pseudo-random generator, viz., the Matlab rand and the quasi-random generator Halton both in terms of error and time complexity. Our study demonstrates that consecutive blocks of digits of each of these numbers produce a good random sequence source. It is observed that randomly chosen blocks of digits do not have any remarkable advantage over consecutive blocks for the accuracy of the Monte Carlo integration. Also, it reveals that pi is a better source of a random sequence than theta when the accuracy of the integration is concerned.
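
    The idea of using consecutive blocks of digits as a random sequence source can be sketched as follows. The block length of 4, the use of only the first 100 decimal digits of pi, and the integrand are assumptions chosen for brevity; the record does not state the settings used by the authors, and so short a sequence gives only a rough estimate.

```python
# First 100 decimal digits of pi (after the leading 3), in groups of ten.
PI_DIGITS = (
    "1415926535" "8979323846" "2643383279" "5028841971" "6939937510"
    "5820974944" "5923078164" "0628620899" "8628034825" "3421170679"
)


def digit_block_samples(digits, block=4):
    """Map consecutive, non-overlapping digit blocks to numbers in [0, 1)."""
    usable = len(digits) - len(digits) % block
    return [int(digits[i:i + block]) / 10 ** block for i in range(0, usable, block)]


def mc_integrate(f, samples):
    """Monte Carlo estimate of the integral of f over [0, 1]: the mean of f at the samples."""
    return sum(f(x) for x in samples) / len(samples)


samples = digit_block_samples(PI_DIGITS)
estimate = mc_integrate(lambda x: x, samples)  # exact value of the integral is 0.5
```

    With only 25 samples the estimate is coarse; comparing sources in the way the record describes would require far longer digit sequences and repeated runs.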

  3. Rapid Fixation of a Distinctive Sequence Motif in the 3′Noncoding Region of the Clade of West Nile Virus Invading North America

    PubMed Central

    Hughes, Austin L.; Piontkivska, Helen; Foppa, Ivo

    2007-01-01

    Phylogenetic analysis of complete genomes of West Nile virus (WNV) by a variety of methods supported the hypothesis that North American isolates of WNV constitute a monophyletic group, together with an isolate from Israel and one from Hungary. We used ancestral sequence reconstruction in order to obtain evidence for evolutionary changes that might be correlated with increased virulence in this clade (designated the N.A. clade). There was one amino acid change (I→T at residue 356 of the NS3 protein) that occurred in the ancestor of the N.A. clade and remained conserved in all N.A. clade genomes analyzed. There were four changes in the upstream portion of the 3′ noncoding region (the AT-enriched region) that occurred in the ancestor of the N.A. clade and remained conserved in all N.A. clade genomes analyzed, changes predicted to alter RNA secondary structure. The AT-enriched region showed a higher rate of substitution in the branch ancestral to the N.A. clade, relative to polymorphism, than did the remainder of the non-coding regions, synonymous sites in coding regions, or nonsynonymous sites in coding regions. The high rate of occurrence of fixed nucleotide substitutions in this region suggests that positive Darwinian selection may have acted on this portion of the 3′NCR and that these fixed changes, possibly in concert with the amino acid change in NS3, may underlie phenotypic effects associated with increased virulence in North American WNV. PMID:17587514

  4. Motif enrichment tool.

    PubMed

    Blatti, Charles; Sinha, Saurabh

    2014-07-01

    The Motif Enrichment Tool (MET) provides an online interface that enables users to find major transcriptional regulators of their gene sets of interest. MET searches the appropriate regulatory region around each gene and identifies which transcription factor DNA-binding specificities (motifs) are statistically overrepresented. Motif enrichment analysis is currently available for many metazoan species including human, mouse, fruit fly, planaria and flowering plants. MET also leverages high-throughput experimental data such as ChIP-seq and DNase-seq from ENCODE and ModENCODE to identify the regulatory targets of a transcription factor with greater precision. The results from MET are produced in real time and are linked to a genome browser for easy follow-up analysis. Use of the web tool is free and open to all, and there is no login requirement. ADDRESS: http://veda.cs.uiuc.edu/MET/.
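
    Statistical overrepresentation of a motif in a gene set is often assessed with a hypergeometric tail probability. The sketch below shows that generic test only; whether MET uses this statistic is not stated in the record, and the function name, parameter names and numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(total_genes, genes_with_motif, set_size, hits):
    """P(X >= hits) when set_size genes are drawn without replacement from
    total_genes, of which genes_with_motif carry the motif."""
    upper = min(set_size, genes_with_motif)
    tail = sum(
        comb(genes_with_motif, k) * comb(total_genes - genes_with_motif, set_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(total_genes, set_size)


# Toy numbers: 10 genes in total, 5 with the motif, and a gene set of 5 genes
# in which all 5 carry the motif.
p = enrichment_p_value(total_genes=10, genes_with_motif=5, set_size=5, hits=5)
```

    A small p-value suggests that the motif is more frequent in the gene set than expected by chance; when many motifs are tested, a multiple-testing correction would normally be applied as well.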

  5. An integrated SNP mining and utilization (ISMU) pipeline for next generation sequencing data.

    PubMed

    Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A V S K; Varshney, Rajeev K

    2014-01-01

    Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly require working knowledge of command line interface, massive computational resources and expertise which is a daunting task for biologists. Further, the SNP information generated may not be readily used for downstream processes such as genotyping. Hence, a comprehensive pipeline has been developed by integrating several open source next generation sequencing (NGS) tools along with a graphical user interface called Integrated SNP Mining and Utilization (ISMU) for SNP discovery and their utilization by developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction (SAMtools/SOAPsnp/CNS2snp and CbCC) methods and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs in standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data at a fast speed. The pipeline is very useful for the plant genetics and breeding community with no computational expertise in order to discover SNPs and utilize in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge datasets of next generation sequencing. It has been developed in Java language and is available at http://hpc.icrisat.cgiar.org/ISMU as a standalone

  6. An Integrated SNP Mining and Utilization (ISMU) Pipeline for Next Generation Sequencing Data

    PubMed Central

    Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M.; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A. V. S. K.; Varshney, Rajeev K.

    2014-01-01

    Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly require working knowledge of command line interface, massive computational resources and expertise which is a daunting task for biologists. Further, the SNP information generated may not be readily used for downstream processes such as genotyping. Hence, a comprehensive pipeline has been developed by integrating several open source next generation sequencing (NGS) tools along with a graphical user interface called Integrated SNP Mining and Utilization (ISMU) for SNP discovery and their utilization by developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction (SAMtools/SOAPsnp/CNS2snp and CbCC) methods and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs in standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data at a fast speed. The pipeline is very useful for the plant genetics and breeding community with no computational expertise in order to discover SNPs and utilize in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge datasets of next generation sequencing. It has been developed in Java language and is available at http://hpc.icrisat.cgiar.org/ISMU as a standalone

  7. Protein Chaperones Q8ZP25_SALTY from Salmonella Typhimurium and HYAE_ECOLI from Escherichia coli Exhibit Thioredoxin-like Structures Despite Lack of Canonical Thioredoxin Active Site Sequence Motif

    SciTech Connect

    Parish, D.; Benach, J; Liu, G; Singarapu, K; Xiao, R; Acton, T; Hunt, J; Montelione, G; Szyperski, T; et al.

    2008-01-01

    The structure of the 142-residue protein Q8ZP25_SALTY encoded in the genome of Salmonella typhimurium LT2 was determined independently by NMR and X-ray crystallography, and the structure of the 140-residue protein HYAE_ECOLI encoded in the genome of Escherichia coli was determined by NMR. The two proteins belong to Pfam (Finn et al. 34:D247-D251, 2006) PF07449, which currently comprises 50 members, and belongs itself to the 'thioredoxin-like clan'. However, protein HYAE_ECOLI and the other proteins of Pfam PF07449 do not contain the canonical Cys-X-X-Cys active site sequence motif of thioredoxin. Protein HYAE_ECOLI was previously classified as a (NiFe) hydrogenase-1 specific chaperone interacting with the twin-arginine translocation (Tat) signal peptide. The structures presented here exhibit the expected thioredoxin-like fold and support the view that members of Pfam family PF07449 specifically interact with Tat signal peptides.

  8. Integrated next-generation sequencing analysis of whole exome and 409 cancer-related genes.

    PubMed

    Shimoda, Yuji; Nagashima, Takeshi; Urakami, Kenichi; Tanabe, Tomoe; Saito, Junko; Naruoka, Akane; Serizawa, Masakuni; Mochizuki, Tohru; Ohshima, Keiichi; Ohnami, Sumiko; Ohnami, Shumpei; Kusuhara, Masatoshi; Yamaguchi, Ken

    2016-01-01

    The use of next-generation sequencing (NGS) techniques to analyze the genomes of cancer cells has identified numerous genomic alterations, including single-base substitutions, small insertions and deletions, amplification, recombination, and epigenetic modifications. NGS contributes to the clinical management of patients as well as new discoveries that identify the mechanisms of tumorigenesis. Moreover, analysis of gene panels targeting actionable mutations enhances efforts to optimize the selection of chemotherapeutic regimens. However, whole genome sequencing takes several days and costs at least $10,000, depending on sequence coverage. Therefore, laboratories with relatively limited resources must employ a more economical approach. For this purpose, we conducted an integrated nucleotide sequence analysis of a panel of 409 cancer-related genes (409-CRG) combined with whole exome sequencing (WES). Analysis of the 409-CRG panel detected low-frequency variants with high sensitivity, and WES identified moderate and high frequency somatic variants as well as germline variants.

  9. [Personal motif in art].

    PubMed

    Gerevich, József

    2015-01-01

    One of the basic questions of art psychology is whether a personal motif is to be found behind works of art and, if so, how openly or indirectly it appears in the work itself. Analysis of examples and documents from the fine arts and literature allows us to conclude that the personal motif, which the viewer can identify through symbols, at times easily and at others with more difficulty, gives an emotional plus to the artistic product. The personal motif may be found in traumatic experiences, in communication to the model or with other emotionally important persons (mourning, disappointment, revenge, hatred, rivalry, revolt etc.), in self-searching, or self-analysis. The emotions are expressed in artistic activity either directly or indirectly. The intention nourished by the artist's identity (Kunstwollen) may stand in the way of spontaneous self-expression, channelling it into hidden paths. Under the influence of certain circumstances, the artist may arouse in the viewer, consciously or unconsciously, an illusionary, misleading image of himself. An examination of the personal motif is one of the important research areas of art therapy.

  10. Large-scale collection and analysis of full-length cDNAs from Brachypodium distachyon and integration with Pooideae sequence resources.

    PubMed

    Mochida, Keiichi; Uehara-Yamaguchi, Yukiko; Takahashi, Fuminori; Yoshida, Takuhiro; Sakurai, Tetsuya; Shinozaki, Kazuo

    2013-01-01

    A comprehensive collection of full-length cDNAs is essential for correct structural gene annotation and functional analyses of genes. We constructed a mixed full-length cDNA library from 21 different tissues of Brachypodium distachyon Bd21, and obtained 78,163 high quality expressed sequence tags (ESTs) from both ends of ca. 40,000 clones (including 16,079 contigs). We updated gene structure annotations of Brachypodium genes based on full-length cDNA sequences in comparison with the latest publicly available annotations. About 10,000 non-redundant gene models were supported by full-length cDNAs; ca. 6,000 showed some transcription unit modifications. We also found ca. 580 novel gene models, including 362 newly identified in Bd21. Using the updated transcription start sites, we searched a total of 580 plant cis-motifs in the -3 kb promoter regions and determined a genome-wide Brachypodium promoter architecture. Furthermore, we integrated the Brachypodium full-length cDNAs and updated gene structures with available sequence resources in wheat and barley in a web-accessible database, the RIKEN Brachypodium FL cDNA database. The database represents a "one-stop" information resource for all genomic information in the Pooideae, facilitating functional analysis of genes in this model grass plant and seamless knowledge transfer to the Triticeae crops.

  11. FEATnotator: A tool for integrated annotation of sequence features and variation, facilitating interpretation in genomics experiments.

    PubMed

    Podicheti, Ram; Mockaitis, Keithanne

    2015-06-01

    As approaches are sought for more efficient and democratized uses of non-model and expanded model genomics references, ease of integration of genomic feature datasets is especially desirable in multidisciplinary research communities. Valuable conclusions are often missed or slowed when researchers refer experimental results to a single reference sequence that lacks integrated pan-genomic and multi-experiment data in accessible formats. Association of genomic positional information, such as results from an expansive variety of next-generation sequencing experiments, with annotated reference features such as genes or predicted protein binding sites, provides the context essential for conclusions and ongoing research. When the experimental system includes polymorphic genomic inputs, rapid calculation of gene structural and protein translational effects of sequence variation from the reference can be invaluable. Here we present FEATnotator, a lightweight, fast and easy to use open source software program that integrates and reports overlap and proximity in genomic information from any user-defined datasets including those from next generation sequencing applications. We illustrate use of the tool by summarizing whole genome sequence variation of a widely used natural isolate of Arabidopsis thaliana in the context of gene models of the reference accession. Previous discovery of a protein coding deletion influencing root development is replicated rapidly. Appropriate even in investigations of a single gene or genic regions such as QTL, comprehensive reports provided by FEATnotator better prepare researchers for interpretation of their experimental results. The tool is available for download at http://featnotator.sourceforge.net.

  12. The Assembly Motif of a Bacterial Small Multidrug Resistance Protein*

    PubMed Central

    Poulsen, Bradley E.; Rath, Arianna; Deber, Charles M.

    2009-01-01

    Multidrug transporters such as the small multidrug resistance (SMR) family of bacterial integral membrane proteins are capable of conferring clinically significant resistance to a variety of common therapeutics. As antiporter proteins of ∼100 amino acids, SMRs must self-assemble into homo-oligomeric structures for efflux of drug molecules. Oligomerization centered at transmembrane helix four (TM4) has been implicated in SMR assembly, but the full complement of residues required to mediate its self-interaction remains to be characterized. Here, we use Hsmr, the 110-residue SMR family member of the archaebacterium Halobacterium salinarum, to determine the TM4 residue motif required to mediate drug resistance and SMR self-association. Twelve single point mutants that scan the central portion of the TM4 helix (residues 85–104) were constructed and were tested for their ability to confer resistance to the cytotoxic compound ethidium bromide. Six residues were found to be individually essential for drug resistance activity (Gly90, Leu91, Leu93, Ile94, Gly97, and Val98), defining a minimum activity motif of 90GLXLIXXGV98 within TM4. When the propensity of these mutants to dimerize on SDS-PAGE was examined, replacements of all but Ile resulted in ∼2-fold reduction of dimerization versus the wild-type antiporter. Our work defines a minimum activity motif of 90GLXLIXXGV98 within TM4 and suggests that this sequence mediates TM4-based SMR dimerization along a single helix surface, stabilized by a small residue heptad repeat sequence. These TM4-TM4 interactions likely constitute the highest affinity locus for disruption of SMR function by directly targeting its self-assembly mechanism. PMID:19224913
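
    The motif notation 90GLXLIXXGV98 treats X as "any residue", so it maps directly onto a regular expression. The sketch below searches a sequence for that pattern; the toy sequence is invented for illustration and is not the Hsmr sequence, and the function names are hypothetical.

```python
import re


def motif_to_regex(motif):
    """Turn a motif such as 'GLXLIXXGV' into a regex, with X matching any residue.

    The lookahead wrapper lets overlapping occurrences be reported as well.
    """
    body = "".join("." if c == "X" else re.escape(c) for c in motif)
    return re.compile(f"(?=({body}))")


def find_motif(seq, motif):
    """Return 1-based start positions of every occurrence of the motif."""
    return [m.start() + 1 for m in motif_to_regex(motif).finditer(seq)]


# Toy sequences (assumption): a short peptide carrying the motif, and a
# variant in which the first glycine of the motif is replaced by alanine.
toy = "MAAGLALIVSGVTT"
toy_mutant = "MAAALALIVSGVTT"

print(find_motif(toy, "GLXLIXXGV"))         # [4]
print(find_motif(toy_mutant, "GLXLIXXGV"))  # []
```

    A pattern match of this kind only locates candidate motifs; whether a residue is functionally essential still has to come from mutagenesis of the sort the abstract reports.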

  13. Sequenced Integration and the Identification of a Problem-Solving Approach through a Learning Process

    ERIC Educational Resources Information Center

    Cormas, Peter C.

    2016-01-01

    Preservice teachers (N = 27) in two sections of a sequenced, methodological and process integrated mathematics/science course solved a levers problem with three similar learning processes and a problem-solving approach, and identified a problem-solving approach through one different learning process. Similar learning processes used included:…

  14. The Practice Integrated Learning Sequence: Linking Education with the Practice of Medicine.

    ERIC Educational Resources Information Center

    Lanzilotti, Salvatore S.; And Others

    1986-01-01

    The Practice Integrated Learning Sequence (PILS) process was used in an educational program on hypertension management involving 1,013 physicians. Data suggest that PILS can help physicians assess practice needs by evaluating practice behaviors. Results indicate the importance of designing programs that facilitate physicians' development of…

  15. Human Immunodeficiency Virus Reverse Transcriptase and Protease Sequence Database: an expanded data model integrating natural language text and sequence analysis programs.

    PubMed

    Kantor, R; Machekano, R; Gonzales, M J; Dupnik, K; Schapiro, J M; Shafer, R W

    2001-01-01

    The HIV Reverse Transcriptase and Protease Sequence Database is an on-line relational database that catalogs evolutionary and drug-related sequence variation in the human immunodeficiency virus (HIV) reverse transcriptase (RT) and protease enzymes, the molecular targets of anti-HIV therapy (http://hivdb.stanford.edu). The database contains a compilation of nearly all published HIV RT and protease sequences, including submissions from International Collaboration databases and sequences published in journal articles. Sequences are linked to data about the source of the sequence sample and the antiretroviral drug treatment history of the individual from whom the isolate was obtained. During the past year 3500 sequences have been added and the data model has been expanded to include drug susceptibility data on sequenced isolates. Database content has also been integrated with didactic text and the output of two sequence analysis programs.

  16. Integrated and sequence-ordered BAC- and YAC-based physical maps for the rat genome.

    PubMed

    Krzywinski, Martin; Wallis, John; Gösele, Claudia; Bosdet, Ian; Chiu, Readman; Graves, Tina; Hummel, Oliver; Layman, Dan; Mathewson, Carrie; Wye, Natasja; Zhu, Baoli; Albracht, Derek; Asano, Jennifer; Barber, Sarah; Brown-John, Mabel; Chan, Susanna; Chand, Steve; Cloutier, Alison; Davito, Jonathon; Fjell, Chris; Gaige, Tony; Ganten, Detlev; Girn, Noreen; Guggenheimer, Kurtis; Himmelbauer, Heinz; Kreitler, Thomas; Leach, Stephen; Lee, Darlene; Lehrach, Hans; Mayo, Michael; Mead, Kelly; Olson, Teika; Pandoh, Pawan; Prabhu, Anna-Liisa; Shin, Heesun; Tänzer, Simone; Thompson, Jason; Tsai, Miranda; Walker, Jason; Yang, George; Sekhon, Mandeep; Hillier, LaDeana; Zimdahl, Heike; Marziali, Andre; Osoegawa, Kazutoyo; Zhao, Shaying; Siddiqui, Asim; de Jong, Pieter J; Warren, Wes; Mardis, Elaine; McPherson, John D; Wilson, Richard; Hübner, Norbert; Jones, Steven; Marra, Marco; Schein, Jacqueline

    2004-04-01

    As part of the effort to sequence the genome of Rattus norvegicus, we constructed a physical map comprised of fingerprinted bacterial artificial chromosome (BAC) clones from the CHORI-230 BAC library. These BAC clones provide approximately 13-fold redundant coverage of the genome and have been assembled into 376 fingerprint contigs. A yeast artificial chromosome (YAC) map was also constructed and aligned with the BAC map via fingerprinted BAC and P1 artificial chromosome clones (PACs) sharing interspersed repetitive sequence markers with the YAC-based physical map. We have annotated 95% of the fingerprint map clones in contigs with coordinates on the version 3.1 rat genome sequence assembly, using BAC-end sequences and in silico mapping methods. These coordinates have allowed anchoring 358 of the 376 fingerprint map contigs onto the sequence assembly. Of these, 324 contigs are anchored to rat genome sequences localized to chromosomes, and 34 contigs are anchored to unlocalized portions of the rat sequence assembly. The remaining 18 contigs, containing 54 clones, still require placement. The fingerprint map is a high-resolution integrative data resource that provides genome-ordered associations among BAC, YAC, and PAC clones and the assembled sequence of the rat genome.

  17. T-DNA integration into the Arabidopsis genome depends on sequences of pre-insertion sites

    PubMed Central

    Brunaud, Véronique; Balzergue, Sandrine; Dubreucq, Bertrand; Aubourg, Sébastien; Samson, Franck; Chauvin, Stéphanie; Bechtold, Nicole; Cruaud, Corinne; DeRose, Richard; Pelletier, Georges; Lepiniec, Loïc; Caboche, Michel; Lecharny, Alain

    2002-01-01

    A statistical analysis of 9000 flanking sequence tags characterizing transferred DNA (T-DNA) transformants in Arabidopsis sheds new light on T-DNA insertion by illegitimate recombination. T-DNA integration is favoured in plant DNA regions with an A-T-rich content. The formation of a short DNA duplex between the host DNA and the left end of the T-DNA sets the frame for the recombination. The sequence immediately downstream of the plant A-T-rich region is the master element for setting up the DNA duplex, and deletions into the left end of the integrated T-DNA depend on the location of a complementary sequence on the T-DNA. Recombination at the right end of the T-DNA with the host DNA involves another DNA duplex, 2–3 base pairs long, that preferentially includes a G close to the right end of the T-DNA. PMID:12446565

  18. Integration of complete chloroplast genome sequences with small amplicon datasets improves phylogenetic resolution in Acacia.

    PubMed

    Williams, Anna V; Miller, Joseph T; Small, Ian; Nevill, Paul G; Boykin, Laura M

    2016-03-01

    Combining whole genome data with previously obtained amplicon sequences has the potential to increase the resolution of phylogenetic analyses, particularly at low taxonomic levels or where recent divergence, rapid speciation or slow genome evolution has resulted in limited sequence variation. However, the integration of these types of data for large scale phylogenetic studies has rarely been investigated. Here we conduct a phylogenetic analysis of the whole chloroplast genome and two nuclear ribosomal loci for 65 Acacia species from across the most recent Acacia phylogeny. We then combine this data with previously generated amplicon sequences (four chloroplast loci and two nuclear ribosomal loci) for 508 Acacia species. We use several phylogenetic methods, including maximum likelihood bootstrapping (with and without constraint) and ExaBayes, in order to determine the success of combining a dataset of 4000bp with one of 189,000bp. The results of our study indicate that the inclusion of whole genome data gave a far better resolved and well supported representation of the phylogenetic relationships within Acacia than using only amplicon sequences, with the greatest support observed when using a whole genome phylogeny as a constraint on the amplicon sequences. Our study therefore provides methods for optimal integration of genomic and amplicon sequences.

  19. Identifying combinatorial regulation of transcription factors and binding motifs

    PubMed Central

    Kato, Mamoru; Hata, Naoya; Banerjee, Nilanjana; Futcher, Bruce; Zhang, Michael Q

    2004-01-01

    Background: Combinatorial interaction of transcription factors (TFs) is important for gene regulation. Although various genomic datasets are relevant to this issue, each dataset provides relatively weak evidence on its own. Developing methods that can integrate different sequence, expression and localization data has become important. Results: Here we use a novel method that integrates chromatin immunoprecipitation (ChIP) data with microarray expression data and with combinatorial TF-motif analysis. We systematically identify combinations of transcription factors and of motifs. The various combinations of TFs involved multiple binding mechanisms. We reconstruct a new combinatorial regulatory map of the yeast cell cycle in which cell-cycle regulation can be drawn as a chain of extended TF modules. We find that the pairwise combination of a TF for an early cell-cycle phase and a TF for a later phase is often used to control gene expression at intermediate times. Thus the number of distinct times of gene expression is greater than the number of transcription factors. We also see that some TF modules control branch points (cell-cycle entry and exit), and in the presence of appropriate signals they can allow progress along alternative pathways. Conclusions: Combining different data sources can increase statistical power as demonstrated by detecting TF interactions and composite TF-binding motifs. The original picture of a chain of simple cell-cycle regulators can be extended to a chain of composite regulatory modules: different modules may share a common TF component in the same pathway or a TF component cross-talking to other pathways. PMID:15287978

  20. IMSA: integrated metagenomic sequence analysis for identification of exogenous reads in a host genomic background.

    PubMed

    Dimon, Michelle T; Wood, Henry M; Rabbitts, Pamela H; Arron, Sarah T

    2013-01-01

    Metagenomics, the study of microbial genomes within diverse environments, is a rapidly developing field. The identification of microbial sequences within a host organism enables the study of human intestinal, respiratory, and skin microbiota, and has allowed the identification of novel viruses in diseases such as Merkel cell carcinoma. There are few publicly available tools for metagenomic high throughput sequence analysis. We present Integrated Metagenomic Sequence Analysis (IMSA), a flexible, fast, and robust computational analysis pipeline that is available for public use. IMSA takes input sequence from high throughput datasets and uses a user-defined host database to filter out host sequence. IMSA then aligns the filtered reads to a user-defined universal database to characterize exogenous reads within the host background. IMSA assigns a score to each node of the taxonomy based on read frequency, and can output this as a taxonomy report suitable for cluster analysis or as a taxonomy map (TaxMap). IMSA also outputs the specific sequence reads assigned to a taxon of interest for downstream analysis. We demonstrate the use of IMSA to detect pathogens and normal flora within sequence data from a primary human cervical cancer carrying HPV16, a primary human cutaneous squamous cell carcinoma carrying HPV16, the CaSki cell line carrying HPV16, and the HeLa cell line carrying HPV18.
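
    The step in which each taxonomy node receives a score based on read frequency can be sketched as below. This is an illustration under stated assumptions, not IMSA's implementation: the equal split of a read among tied hits, the `taxonomy_scores` function, and the toy taxonomy and read names are all hypothetical.

```python
from collections import defaultdict


def taxonomy_scores(read_hits, parent):
    """Score taxonomy nodes from read assignments.

    read_hits maps a read ID to the list of taxa it hit equally well.
    parent maps a taxon to its parent; a root taxon has no entry.
    Each read contributes a total of 1.0, split evenly among its hits,
    and every contribution is also credited to all ancestors.
    """
    scores = defaultdict(float)
    for taxa in read_hits.values():
        if not taxa:
            continue  # read had no hit after host filtering and alignment
        share = 1.0 / len(taxa)
        for taxon in taxa:
            node = taxon
            while node is not None:
                scores[node] += share
                node = parent.get(node)
    return dict(scores)


# Toy taxonomy and read assignments, invented for this example.
parent = {
    "HPV16": "Papillomaviridae",
    "HPV18": "Papillomaviridae",
    "Papillomaviridae": "Viruses",
}
read_hits = {
    "read1": ["HPV16"],
    "read2": ["HPV16"],
    "read3": ["HPV16", "HPV18"],  # ambiguous read, split between two taxa
    "read4": [],
}
print(taxonomy_scores(read_hits, parent))
```

    Crediting ancestors means a family-level node still accumulates evidence when individual reads cannot be resolved to a single type.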

  1. MADMX: a strategy for maximal dense motif extraction.

    PubMed

    Grossi, Roberto; Pietracaprina, Andrea; Pisanti, Nadia; Pucci, Geppino; Upfal, Eli; Vandin, Fabio

    2011-04-01

    We develop, analyze, and experiment with a new tool, called MADMX, which extracts frequent motifs from biological sequences. We introduce the notion of density to single out the "significant" motifs. The density is a simple and flexible measure for bounding the number of don't cares in a motif, defined as the fraction of solid (i.e., different from don't care) characters in the motif. A maximal dense motif has density above a certain threshold, and any further specialization of a don't care symbol in it or any extension of its boundaries decreases its number of occurrences in the input sequence. By extracting only maximal dense motifs, MADMX reduces the output size and improves performance, while enhancing the quality of the discoveries. The efficiency of our approach relies on a newly defined combining operation, dubbed fusion, which allows for the construction of maximal dense motifs in a bottom-up fashion, while avoiding the generation of nonmaximal ones. We provide experimental evidence of the efficiency and the quality of the motifs returned by MADMX.
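
    The density measure defined in this abstract (the fraction of solid, i.e. non-don't-care, characters in a motif) is simple to compute. The sketch below assumes `.` as the don't-care symbol and uses hypothetical function names. It illustrates only the definition and occurrence counting, not MADMX's fusion-based extraction.

```python
import re


def density(motif, dont_care="."):
    """Fraction of solid (non-don't-care) characters in the motif."""
    if not motif:
        raise ValueError("motif must be non-empty")
    solid = sum(1 for c in motif if c != dont_care)
    return solid / len(motif)


def occurrences(motif, seq, dont_care="."):
    """0-based start positions where the motif matches, overlaps included."""
    body = "".join("." if c == dont_care else re.escape(c) for c in motif)
    return [m.start() for m in re.finditer(f"(?=({body}))", seq)]


print(density("A.C..T"))                # 0.5
print(density("ACGT"))                  # 1.0
print(occurrences("A.C", "AGCATCAAC"))  # [0, 3, 6]
```

    Under a density threshold of, say, 0.6 (an arbitrary value chosen here), `A.C..T` would be rejected while `A.C` (density 2/3) would be kept as a candidate.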

  2. A Bioinformatics Approach for Detecting Repetitive Nested Motifs using Pattern Matching

    PubMed Central

    Romero, José R.; Carballido, Jessica A.; Garbus, Ingrid; Echenique, Viviana C.; Ponzoni, Ignacio

    2016-01-01

    The identification of nested motifs in genomic sequences is a complex computational problem. The detection of these patterns is important to allow the discovery of transposable element (TE) insertions, incomplete reverse transcripts, deletions, and/or mutations. In this study, a de novo strategy for detecting patterns that represent nested motifs was designed based on exhaustive searches for pairs of motifs and combinatorial pattern analysis. These patterns can be grouped into three categories, motifs within other motifs, motifs flanked by other motifs, and motifs of large size. The methodology used in this study, applied to genomic sequences from the plant species Aegilops tauschii and Oryza sativa, revealed that it is possible to identify putative nested TEs by detecting these three types of patterns. The results were validated through BLAST alignments, which revealed the efficacy and usefulness of the new method, which is called Mamushka. PMID:27812277

  3. Convergent evolution and mimicry of protein linear motifs in host-pathogen interactions.

    PubMed

    Chemes, Lucía Beatriz; de Prat-Gay, Gonzalo; Sánchez, Ignacio Enrique

    2015-06-01

    Pathogen linear motif mimics are highly evolvable elements that facilitate rewiring of host protein interaction networks. Host linear motifs and pathogen mimics differ in sequence, leading to thermodynamic and structural differences in the resulting protein-protein interactions. Moreover, the functional output of a mimic depends on the motif and domain repertoire of the pathogen protein. Regulatory evolution mediated by linear motifs can be understood by measuring evolutionary rates, quantifying positive and negative selection and performing phylogenetic reconstructions of linear motif natural history. Convergent evolution of linear motif mimics is widespread among unrelated proteins from viral, prokaryotic and eukaryotic pathogens and can also take place within individual protein phylogenies. Statistics, biochemistry and laboratory models of infection link pathogen linear motifs to phenotypic traits such as tropism, virulence and oncogenicity. In vitro evolution experiments and analysis of natural sequences suggest that changes in linear motif composition underlie pathogen adaptation to a changing environment.

  4. A Bioinformatics Approach for Detecting Repetitive Nested Motifs using Pattern Matching.

    PubMed

    Romero, José R; Carballido, Jessica A; Garbus, Ingrid; Echenique, Viviana C; Ponzoni, Ignacio

    2016-01-01

    The identification of nested motifs in genomic sequences is a complex computational problem. The detection of these patterns is important to allow the discovery of transposable element (TE) insertions, incomplete reverse transcripts, deletions, and/or mutations. In this study, a de novo strategy for detecting patterns that represent nested motifs was designed based on exhaustive searches for pairs of motifs and combinatorial pattern analysis. These patterns can be grouped into three categories, motifs within other motifs, motifs flanked by other motifs, and motifs of large size. The methodology used in this study, applied to genomic sequences from the plant species Aegilops tauschii and Oryza sativa, revealed that it is possible to identify putative nested TEs by detecting these three types of patterns. The results were validated through BLAST alignments, which revealed the efficacy and usefulness of the new method, which is called Mamushka.

  5. Discriminative motif analysis of high-throughput dataset

    PubMed Central

    Yao, Zizhen; MacQuarrie, Kyle L.; Fong, Abraham P.; Tapscott, Stephen J.; Ruzzo, Walter L.; Gentleman, Robert C.

    2014-01-01

    Motivation: High-throughput ChIP-seq studies typically identify thousands of peaks for a single transcription factor (TF). It is common for traditional motif discovery tools to predict motifs that are statistically significant against a naïve background distribution but are of questionable biological relevance. Results: We describe a simple yet effective algorithm for discovering differential motifs between two sequence datasets that is effective in eliminating systematic biases and scalable to large datasets. Tested on 207 ENCODE ChIP-seq datasets, our method identifies correct motifs in 78% of the datasets with known motifs, demonstrating improvement in both accuracy and efficiency compared with DREME, another state-of-the-art discriminative motif discovery tool. More interestingly, on the remaining more challenging datasets, we identify common technical or biological factors that compromise the motif search results and use advanced features of our tool to control for these factors. We also present case studies demonstrating the ability of our method to detect single base pair differences in DNA specificity of two similar TFs. Lastly, we demonstrate discovery of key TF motifs involved in tissue specification by examination of high-throughput DNase accessibility data. Availability: The motifRG package is publicly available via the Bioconductor repository. Contact: yzizhen@fhcrc.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24162561
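
    The idea of a differential motif, one enriched in a foreground set relative to a background set, can be illustrated with a crude k-mer comparison. This is deliberately not the motifRG algorithm: the scoring rule (difference in the fraction of sequences containing a k-mer), the function names and the toy sequences are all assumptions.

```python
from collections import Counter


def kmers_present(seq, k):
    """Set of distinct k-mers occurring in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def differential_kmers(foreground, background, k):
    """Rank k-mers by (fraction of foreground sequences containing it)
    minus (fraction of background sequences containing it)."""
    fg_counts, bg_counts = Counter(), Counter()
    for s in foreground:
        fg_counts.update(kmers_present(s, k))
    for s in background:
        bg_counts.update(kmers_present(s, k))
    scored = [
        (kmer, fg_counts[kmer] / len(foreground) - bg_counts[kmer] / len(background))
        for kmer in fg_counts
    ]
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda kv: (-kv[1], kv[0]))


# Toy data, invented for this example: every foreground sequence carries CAGCTG.
fg = ["AACAGCTGTT", "GGCAGCTGAA", "TCAGCTGCCA"]
bg = ["AATTGGCCAA", "GGTTAACCGG", "TTGGAACCTT"]
print(differential_kmers(fg, bg, 6)[0])  # ('CAGCTG', 1.0)
```

    Scoring against a matched background rather than a naive base-composition model is what lets systematic biases shared by both sets cancel out.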

  6. An IcmF family protein, ImpLM, is an integral inner membrane protein interacting with ImpKL, and its Walker A motif is required for type VI secretion system-mediated Hcp secretion in Agrobacterium tumefaciens.

    PubMed

    Ma, Lay-Sun; Lin, Jer-Sheng; Lai, Erh-Min

    2009-07-01

    An intracellular multiplication F (IcmF) family protein is a conserved component of a newly identified type VI secretion system (T6SS) encoded in many animal and plant-associated Proteobacteria. We have previously identified ImpL(M), an IcmF family protein that is required for the secretion of the T6SS substrate hemolysin-coregulated protein (Hcp) from the plant-pathogenic bacterium Agrobacterium tumefaciens. In this study, we characterized the topology of ImpL(M) and the importance of its nucleotide-binding Walker A motif involved in Hcp secretion from A. tumefaciens. A combination of beta-lactamase-green fluorescent protein fusion and biochemical fractionation analyses revealed that ImpL(M) is an integral polytopic inner membrane protein comprising three transmembrane domains bordered by an N-terminal domain facing the cytoplasm and a C-terminal domain exposed to the periplasm. impL(M) mutants with substitutions or deletions in the Walker A motif failed to complement the impL(M) deletion mutant for Hcp secretion, which provided evidence that ImpL(M) may bind and/or hydrolyze nucleoside triphosphates to mediate T6SS machine assembly and/or substrate secretion. Protein-protein interaction and protein stability analyses indicated that there is a physical interaction between ImpL(M) and another essential T6SS component, ImpK(L). Topology and biochemical fractionation analyses suggested that ImpK(L) is an integral bitopic inner membrane protein with an N-terminal domain facing the cytoplasm and a C-terminal OmpA-like domain exposed to the periplasm. Further comprehensive yeast two-hybrid assays dissecting ImpL(M)-ImpK(L) interaction domains suggested that ImpL(M) interacts with ImpK(L) via the N-terminal cytoplasmic domains of the proteins. In conclusion, ImpL(M) interacts with ImpK(L), and its Walker A motif is required for its function in mediation of Hcp secretion from A. tumefaciens.

  7. cWINNOWER Algorithm for Finding Fuzzy DNA Motifs

    NASA Technical Reports Server (NTRS)

    Liang, Shoudan

    2003-01-01

    The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if multiple mutated copies of the motif (i.e., the signals) are present in the DNA sequence in sufficient abundance. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum number of detectable motifs qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc, by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12000 for (l,d) = (15,4).
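
    The abstract defines a signal as any length-l pattern that differs from the motif by at most d mutations. The sketch below illustrates only that definition by scanning a sequence for windows within Hamming distance d of a known motif. It does not implement the clique-based search of cWINNOWER, and the toy sequence and function names are assumptions.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_signals(seq, motif, d):
    """Return (0-based position, window) for each window within d mismatches."""
    l = len(motif)
    return [
        (i, seq[i:i + l])
        for i in range(len(seq) - l + 1)
        if hamming(seq[i:i + l], motif) <= d
    ]


# Toy (l, d) = (6, 1) instance, invented for this example.
seq = "TTACGTACGGACCTACTT"
print(find_signals(seq, "ACGTAC", 1))
# [(2, 'ACGTAC'), (6, 'ACGGAC'), (10, 'ACCTAC')]
```

    In the actual discovery problem the motif is unknown, which is why the method has to reason about groups of mutually similar windows rather than compare against a fixed pattern as done here.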

  8. Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Patel, Kamlesh D [Ken; SNL,

    2016-07-12

    Kamlesh (Ken) Patel from Sandia National Laboratories (Livermore, California) presents "Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  9. Vy-PER: eliminating false positive detection of virus integration events in next generation sequencing data.

    PubMed

    Forster, Michael; Szymczak, Silke; Ellinghaus, David; Hemmrich, Georg; Rühlemann, Malte; Kraemer, Lars; Mucha, Sören; Wienbrandt, Lars; Stanulla, Martin; Franke, Andre

    2015-07-13

    Several pathogenic viruses such as hepatitis B and human immunodeficiency viruses may integrate into the host genome. These virus/host integrations are detectable using paired-end next generation sequencing. However, the low number of expected true virus integrations may be difficult to distinguish from the noise of many false positive candidates. Here, we propose a novel filtering approach that increases specificity without compromising sensitivity for virus/host chimera detection. Our detection pipeline termed Vy-PER (Virus integration detection bY Paired End Reads) outperforms existing similar tools in speed and accuracy. We analysed whole genome data from childhood acute lymphoblastic leukemia (ALL), which is characterised by genomic rearrangements and usually associated with radiation exposure. This analysis was motivated by the recently reported virus integrations at genomic rearrangement sites and association with chromosomal instability in liver cancer. However, as expected, our analysis of 20 tumour and matched germline genomes from ALL patients finds no significant evidence for integrations by known viruses. Nevertheless, our method eliminates 12,800 false positives per genome (80× coverage) and only our method detects singleton human-phiX174-chimeras caused by optical errors of the Illumina HiSeq platform. This high accuracy is useful for detecting low virus integration levels as well as non-integrated viruses.

  10. Vy-PER: eliminating false positive detection of virus integration events in next generation sequencing data

    PubMed Central

    Forster, Michael; Szymczak, Silke; Ellinghaus, David; Hemmrich, Georg; Rühlemann, Malte; Kraemer, Lars; Mucha, Sören; Wienbrandt, Lars; Stanulla, Martin; Franke, Andre

    2015-01-01

    Several pathogenic viruses such as hepatitis B and human immunodeficiency viruses may integrate into the host genome. These virus/host integrations are detectable using paired-end next generation sequencing. However, the low number of expected true virus integrations may be difficult to distinguish from the noise of many false positive candidates. Here, we propose a novel filtering approach that increases specificity without compromising sensitivity for virus/host chimera detection. Our detection pipeline termed Vy-PER (Virus integration detection bY Paired End Reads) outperforms existing similar tools in speed and accuracy. We analysed whole genome data from childhood acute lymphoblastic leukemia (ALL), which is characterised by genomic rearrangements and usually associated with radiation exposure. This analysis was motivated by the recently reported virus integrations at genomic rearrangement sites and association with chromosomal instability in liver cancer. However, as expected, our analysis of 20 tumour and matched germline genomes from ALL patients finds no significant evidence for integrations by known viruses. Nevertheless, our method eliminates 12,800 false positives per genome (80× coverage) and only our method detects singleton human-phiX174-chimeras caused by optical errors of the Illumina HiSeq platform. This high accuracy is useful for detecting low virus integration levels as well as non-integrated viruses. PMID:26166306

  11. Weak Palindromic Consensus Sequences Are a Common Feature Found at the Integration Target Sites of Many Retroviruses

    PubMed Central

    Wu, Xiaolin; Li, Yuan; Crise, Bruce; Burgess, Shawn M.; Munroe, David J.

    2005-01-01

    Integration into the host genome is one of the hallmarks of the retroviral life cycle and is catalyzed by virus-encoded integrases. While integrase has strict sequence requirements for the viral DNA ends, target site sequences have been shown to be very diverse. We carefully examined a large number of integration target site sequences from several retroviruses, including human immunodeficiency virus type 1, simian immunodeficiency virus, murine leukemia virus, and avian sarcoma-leukosis virus, and found that a statistical palindromic consensus, centered on the virus-specific duplicated target site sequence, was a common feature at integration target sites for these retroviruses. PMID:15795304
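
    A "statistical" palindromic consensus means the consensus of many aligned sites equals its own reverse complement even though few individual sites do. The sketch below shows this with four invented 6-bp sites. The sequences and function names are assumptions and are not data from the study.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of a DNA string over A, C, G, T."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def consensus(sites):
    """Most frequent base in each column of equal-length aligned sites."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*sites))


def is_palindromic(seq):
    """True if the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)


# Toy aligned target sites, invented for this example (no tied columns).
sites = ["GTATAC", "GTTTAC", "GAATAC", "GTATCC"]
cons = consensus(sites)
print(cons, is_palindromic(cons))           # GTATAC True
print([is_palindromic(s) for s in sites])   # [True, False, False, False]
```

    Only one of the four toy sites is itself a palindrome, yet the column-wise consensus is, which is the sense in which such a palindrome is called weak.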

  12. High-throughput sequencing of multiple amplicons for barcoding and integrative taxonomy

    PubMed Central

    Cruaud, Perrine; Rasplus, Jean-Yves; Rodriguez, Lillian Jennifer; Cruaud, Astrid

    2017-01-01

    Until now, the potential of NGS for the construction of barcode libraries or integrative taxonomy has been seldom realised. Here, we amplified (two-step PCR) and simultaneously sequenced (MiSeq) multiple markers from hundreds of fig wasp specimens. We also developed a workflow for quality control of the data. Illumina and Sanger sequences accumulated in the past years were compared. Interestingly, primers and PCR conditions used for the Sanger approach did not require optimisation to construct the MiSeq library. After quality controls, 87% of the species (76% of the specimens) had a valid MiSeq sequence for each marker. Importantly, major clusters did not always correspond to the targeted loci. Nine specimens exhibited two divergent sequences (up to 10%). In 95% of the species, MiSeq and Sanger sequences obtained from the same sampling were similar. For the remaining 5%, species were paraphyletic or the sequences clustered into divergent groups on the Sanger + MiSeq trees (>7%). These problematic cases may represent coding NUMTS or heteroplasms. Our results illustrate that Illumina approaches are not artefact-free and confirm that Sanger databases can contain non-target genes. This highlights the importance of quality controls, working with taxonomists and using multiple markers for DNA-taxonomy or species diversity assessment. PMID:28165046

  13. Integrating ChIP-sequencing and digital gene expression profiling to identify BRD7 downstream genes and construct their regulating network.

    PubMed

    Xu, Ke; Xiong, Wei; Zhou, Ming; Wang, Heran; Yang, Jing; Li, Xiayu; Chen, Pan; Liao, Qianjin; Deng, Hao; Li, Xiaoling; Li, Guiyuan; Zeng, Zhaoyang

    2016-01-01

    BRD7 is a single bromodomain-containing protein that functions as a subunit of the SWI/SNF chromatin-remodeling complex to regulate transcription. It also interacts with the well-known tumor suppressor protein p53 to trans-activate genes involved in cell cycle arrest. In this paper, we report an integrative analysis of genome-wide chromatin occupancy of BRD7 by chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) and digital gene expression (DGE) profiling by RNA-sequencing upon the overexpression of BRD7 in human cells. We localized 156 BRD7-binding peaks representing 184 genes by ChIP-sequencing, and most of these peaks were co-localized with histone modification sites. Four novel motifs were significantly represented in these BRD7-enriched regions. Ingenuity pathway analysis revealed that 22 of these BRD7 target genes were involved in a network regulating cell death and survival. DGE profiling identified 560 up-regulated genes and 1088 down-regulated genes regulated by BRD7. Using Gene Ontology and pathway analysis, we found significant enrichment of the cell cycle and apoptosis pathway genes. For the integrative analysis of the ChIP-seq and DEG data, we constructed a regulating network of BRD7 downstream genes, and this network suggests multiple feedback regulations of the pathways. Furthermore, we validated BIRC2, BIRC3, TXN2, and NOTCH1 genes as direct, functional BRD7 targets, which were involved in the cell cycle and apoptosis pathways. These results provide a genome-wide view of chromatin occupancy and the gene regulation network of the BRD7 signaling pathway.

  14. Genome puzzle master (GPM): an integrated pipeline for building and editing pseudomolecules from fragmented sequences

    PubMed Central

    Zhang, Jianwei; Kudrna, Dave; Mu, Ting; Li, Weiming; Copetti, Dario; Yu, Yeisoo; Goicoechea, Jose Luis; Lei, Yang; Wing, Rod A.

    2016-01-01

    Motivation: Next generation sequencing technologies have revolutionized our ability to rapidly and affordably generate vast quantities of sequence data. Once generated, raw sequences are assembled into contigs or scaffolds. However, these assemblies are mostly fragmented and inaccurate at the whole genome scale, largely due to the inability to integrate additional informative datasets (e.g. physical, optical and genetic maps). To address this problem, we developed a semi-automated software tool—Genome Puzzle Master (GPM)—that enables the integration of additional genomic signposts to edit and build ‘new-gen-assemblies’ that result in high-quality ‘annotation-ready’ pseudomolecules. Results: With GPM, loaded datasets can be connected to each other via their logical relationships which accomplishes tasks to ‘group,’ ‘merge,’ ‘order and orient’ sequences in a draft assembly. Manual editing can also be performed with a user-friendly graphical interface. Final pseudomolecules reflect a user’s total data package and are available for long-term project management. GPM is a web-based pipeline and an important part of a Laboratory Information Management System (LIMS) which can be easily deployed on local servers for any genome research laboratory. Availability and Implementation: The GPM (with LIMS) package is available at https://github.com/Jianwei-Zhang/LIMS Contacts: jzhang@mail.hzau.edu.cn or rwing@mail.arizona.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27318200

  15. An integrated framework for discovery and genotyping of genomic variants from high-throughput sequencing experiments.

    PubMed

    Duitama, Jorge; Quintero, Juan Camilo; Cruz, Daniel Felipe; Quintero, Constanza; Hubmann, Georg; Foulquié-Moreno, Maria R; Verstrepen, Kevin J; Thevelein, Johan M; Tohme, Joe

    2014-04-01

    Recent advances in high-throughput sequencing (HTS) technologies and computing capacity have produced unprecedented amounts of genomic data that have unraveled the genetics of phenotypic variability in several species. However, operating and integrating current software tools for data analysis still require important investments in highly skilled personnel. Developing accurate, efficient and user-friendly software packages for HTS data analysis will lead to a more rapid discovery of genomic elements relevant to medical, agricultural and industrial applications. We therefore developed Next-Generation Sequencing Eclipse Plug-in (NGSEP), a new software tool for integrated, efficient and user-friendly detection of single nucleotide variants (SNVs), indels and copy number variants (CNVs). NGSEP includes modules for read alignment, sorting, merging, functional annotation of variants, filtering and quality statistics. Analysis of sequencing experiments in yeast, rice and human samples shows that NGSEP has superior accuracy and efficiency, compared with currently available packages for variants detection. We also show that only a comprehensive and accurate identification of repeat regions and CNVs allows researchers to properly separate SNVs from differences between copies of repeat elements. We expect that NGSEP will become a strong support tool to empower the analysis of sequencing data in a wide range of research projects on different species.

  16. Development and Assessment of a Horizontally Integrated Biological Sciences Course Sequence for Pharmacy Education

    PubMed Central

    Wright, Nicholas J.D.; Alston, Gregory L.

    2015-01-01

    Objective. To design and assess a horizontally integrated biological sciences course sequence and to determine its effectiveness in imparting the foundational science knowledge necessary to successfully progress through the pharmacy school curriculum and produce competent pharmacy school graduates. Design. A 2-semester course sequence integrated principles from several basic science disciplines: biochemistry, molecular biology, cellular biology, anatomy, physiology, and pathophysiology. Each is a 5-credit course taught 5 days per week, with 50-minute class periods. Assessment. Achievement of outcomes was determined with course examinations, student lecture, and an annual skills mastery assessment. The North American Pharmacist Licensure Examination (NAPLEX) results were used as an indicator of competency to practice pharmacy. Conclusion. Students achieved course objectives and program level outcomes. The biological sciences integrated course sequence was successful in providing students with foundational basic science knowledge required to progress through the pharmacy program and to pass the NAPLEX. The percentage of the school’s students who passed the NAPLEX was not statistically different from the national percentage. PMID:26430276

  17. Motifs and structural blocks retrieval by GHT

    NASA Astrophysics Data System (ADS)

    Cantoni, Virginio; Ferone, Alessio; Petrosino, Alfredo; Polat, Ozlem

    2014-06-01

    The structure of a protein gives more insight into the protein's function than its amino acid sequence does. Protein structure analysis and comparison are important for understanding the evolutionary relationships among proteins, predicting protein functions, and predicting protein folding. Proteins are formed by two basic regular 3D structural patterns, called Secondary Structures (SSs): helices and sheets. A structural motif is a compact 3D protein block referring to a small specific combination of secondary structural elements, which appears in a variety of molecules. In this paper we compare a few approaches for motif retrieval based on the Generalized Hough Transform (GHT). A primary technique is to adopt the single SS as the structural primitive; alternatives are to adopt an SS pair as the primitive structural element, or an SS triplet, and so on up to an entire motif. The richer the primitive, the higher the time for pre-analysis and search, and the simpler the inspection process on the parameter space for analyzing the peaks. Performance comparisons, in terms of precision and computation time, are presented here for the retrieval of motifs composed of three to five SSs over more than 15 million searches. The approach can be easily applied to the retrieval of larger blocks, up to protein domains, or even entire proteins.

  18. Temporal sequence compression by an integrate-and-fire model of hippocampal area CA3.

    PubMed

    August, D A; Levy, W B

    1999-01-01

    Cells in the rat hippocampus fire as a function of the animal's location in space. Thus, a rat moving through the world produces a statistically reproducible sequence of "place cell" firings. With this perspective, spatial navigation can be viewed as a sequence learning problem for the hippocampus. That is, learning entails associating the relationships among a sequence of places that are represented by a sequence of place cell firing. Recent experiments by McNaughton and colleagues suggest the hippocampus can recall a sequence of place cell firings at a faster rate than it was experienced. This speedup, which occurs during slow-wave sleep, is called temporal compression. Here, we show that a simplified model of hippocampal area CA3, based on integrate-and-fire cells and unsupervised Hebbian learning, reproduces this temporal compression. The amount of compression is proportional to the activity level during recall and to the relative timespan of associativity during learning. Compression seems to arise from an alteration of network dynamics between learning and recall. During learning, the dynamics are paced by external input and slowed by a low overall level of activity. During recall, however, external input is absent, and the dynamics are controlled by intrinsic network properties. Raising the activity level by lowering inhibition increases the rate at which the network can transition between previously learned states and thereby produces temporal compression. The tendency for speeding up future activations, however, is limited by the temporal range of associations that were present during learning.

  19. MINER: software for phylogenetic motif identification.

    PubMed

    La, David; Livesay, Dennis R

    2005-07-01

    MINER is web-based software for phylogenetic motif (PM) identification. PMs are sequence regions (fragments) that conserve the overall familial phylogeny. PMs have been shown to correspond to a wide variety of catalytic regions, substrate-binding sites and protein interfaces, making them ideal functional site predictions. The MINER output provides an intuitive interface for interactive PM sequence analysis and structural visualization. The web implementation of MINER is freely available at http://www.pmap.csupomona.edu/MINER/. Source code is available to the academic community on request.

  20. High-resolution sequence stratigraphy from outcrop study, with the integration of log and seismic data

    SciTech Connect

    Ardevol, L.; Krauss, S.; Klimowitz, J.

    1993-09-01

    The detailed sequence stratigraphic analysis of the siliciclastic-dominated Late Cretaceous sediments (Aren Sandstone and Garumnian red beds, south central Pyrenees, Spain) reveals the repeating disposition of critical elements and controlling mechanisms of cycles and sequences. Our approach integrates (a) hierarchy of unconformity-bounded units, (b) physical expression of boundaries traceable from the continent to the basin, (c) featuring facies and depositional systems, (d) well log and seismic expression, and (e) driving basin-filling mechanisms. A comparison to other active basins is suggested in order to prove the validity beyond the regional scale. Four basin-wide transgressive facies cycles were identified and interpreted as third-order units. The transgressive phase of each cycle is represented by mixed shelf deposits, while regressive periods consist of complex delta systems. The cycles are composed within their regressive phase of fourth-order depositional sequences, trapped in structural lows, which are controlled by synsedimentary compressive tectonics. Both cycles and sequences are set up by similar building blocks. Our example, cycle two, localized in the Tremp area, displays seven sequences: lowstand systems tracts are channelized turbidites; transgressive systems tracts are lagoon-barrier systems and/or storm-dominated shoreface deposits; and fluvial, coastal plain, and delta deposits build highstand systems tracts. The physical continuity of the sequences (and cycles) is frequently disrupted by erosion due to lowstand or transgressive processes and active faulting. Synsedimentary ramp anticlines, which control the entire basin, and third-order unconformities have been recognized in seismic lines. Their interpretation has led to the identification and correlation of cycles and sequences in the well logs of the region.

  1. Patterned sequence in the transcriptome of vascular plants

    PubMed Central

    Crane, Charles F

    2007-01-01

    Background Microsatellites (repeated subsequences based on motifs of one to six nucleotides) are widely used as codominant genetic markers because of their frequent polymorphism and relative selective neutrality. Minisatellites are repeats of motifs having seven or more nucleotides. The large number of EST sequences now available in public databases offers an opportunity to compare microsatellite and minisatellite properties and evaluate their evolution over a broad range of plant taxa. Results Repeated motifs from one to 250 nucleotides long were identified in 6793306 expressed sequence tags (ESTs) from 88 genera of vascular plants, using a custom data-processing pipeline that allowed limited variation among repeats. The pipeline processed trimmed but otherwise unfiltered sequence and output nonredundant loci of at least 15 nucleotides, with degree of polymorphism and PCR primers wherever possible. Motifs that were an integral multiple of three in length were more abundant and richer in G/C than other motifs. From 80 to 85% of minisatellite motifs represented repeats within proteins, up to the 228-nucleotide repeat of ubiquitin, but not all of these repeats preserved reading frame. The remaining 15 to 20% of minisatellite motifs were associated with transcribed repetitive elements, e.g., retrotransposons. Relative microsatellite motif frequencies did not correlate tightly to phylogenetic relationship. Evolution of increased microsatellite and EST GC content was evident within the grasses. Microsatellites were less frequent in the transcriptome of genera with large genomes, but there was no evidence for greater dilution of the transcriptome with transposable element transcripts in these genera. Conclusion The relatively low correlation of microsatellite spectrum to phylogeny suggests that repeat loci evolve more rapidly than the surrounding sequence, although tissue specificity of the different EST libraries is a complicating factor. In-frame motifs are more
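
    The repeat definitions in this abstract (microsatellite motifs of one to six nucleotides, loci of at least 15 nucleotides) can be illustrated with a small sketch. This is not the authors' pipeline: their pipeline allowed limited variation among repeats and produced PCR primers, whereas the hypothetical function `find_perfect_repeats` below reports only perfect tandem repeats, and its name, parameters, and toy sequence are assumptions made for illustration.

```python
import re

def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    return (motif + motif).find(motif, 1) == len(motif)

def find_perfect_repeats(seq, min_len=15, max_motif=6):
    """Return (start, end, motif) for perfect tandem repeats of 1..max_motif nt
    whose total length is at least min_len nucleotides."""
    loci = []
    for k in range(1, max_motif + 1):
        min_copies = -(-min_len // k)  # ceiling division
        # One captured motif of length k followed by enough further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. "ATAT", already found as "AT"
                loci.append((m.start(), m.end(), motif))
    return sorted(loci)

# Toy sequence (invented): an (AT)8 dinucleotide repeat and an A15 run.
toy = "GG" + "AT" * 8 + "CC" + "A" * 15 + "G"
print(find_perfect_repeats(toy))
```

    Scanning each motif length separately and discarding non-primitive motifs keeps every locus attributed to its shortest repeating unit.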

  2. ORIO (Online Resource for Integrative Omics): a web-based platform for rapid integration of next generation sequencing data.

    PubMed

    Lavender, Christopher A; Shapiro, Andrew J; Burkholder, Adam B; Bennett, Brian D; Adelman, Karen; Fargo, David C

    2017-04-11

    Established and emerging next generation sequencing (NGS)-based technologies allow for genome-wide interrogation of diverse biological processes. However, accessibility of NGS data remains a problem, and few user-friendly resources exist for integrative analysis of NGS data from different sources and experimental techniques. Here, we present Online Resource for Integrative Omics (ORIO; https://orio.niehs.nih.gov/), a web-based resource with an intuitive user interface for rapid analysis and integration of NGS data. To use ORIO, the user specifies NGS data of interest along with a list of genomic coordinates. Genomic coordinates may be biologically relevant features from a variety of sources, such as ChIP-seq peaks for a given protein or transcription start sites from known gene models. ORIO first iteratively finds read coverage values at each genomic feature for each NGS dataset. Data are then integrated using clustering-based approaches, giving hierarchical relationships across NGS datasets and separating individual genomic features into groups. In focusing its analysis on read coverage, ORIO makes limited assumptions about the analyzed data; this allows the tool to be applied across data from a variety of experiments and techniques. Results from analysis are presented in dynamic displays alongside user-controlled statistical tests, supporting rapid statistical validation of observed results. We emphasize the versatility of ORIO through diverse examples, ranging from NGS data quality control to characterization of enhancer regions and integration of gene expression information. Because ORIO is easily accessible on a public web server, we anticipate its wide use in genome-wide investigations by life scientists.

  3. Motif co-regulation and co-operativity are common mechanisms in transcriptional, post-transcriptional and post-translational regulation.

    PubMed

    Van Roey, Kim; Davey, Norman E

    2015-12-01

    A substantial portion of the regulatory interactions in the higher eukaryotic cell are mediated by simple sequence motifs in the regulatory segments of genes and (pre-)mRNAs, and in the intrinsically disordered regions of proteins. Although these regulatory modules are physicochemically distinct, they share an evolutionary plasticity that has facilitated a rapid growth of their use and resulted in their ubiquity in complex organisms. The ease of motif acquisition simplifies access to basal housekeeping functions, facilitates the co-regulation of multiple biomolecules allowing them to respond in a coordinated manner to changes in the cell state, and supports the integration of multiple signals for combinatorial decision-making. Consequently, motifs are indispensable for temporal, spatial, conditional and basal regulation at the transcriptional, post-transcriptional and post-translational level. In this review, we highlight that many of the key regulatory pathways of the cell are recruited by motifs and that the ease of motif acquisition has resulted in large networks of co-regulated biomolecules. We discuss how co-operativity allows simple static motifs to perform the conditional regulation that underlies decision-making in higher eukaryotic biological systems. We observe that each gene and its products have a unique set of DNA, RNA or protein motifs that encode a regulatory program to define the logical circuitry that guides the life cycle of these biomolecules, from transcription to degradation. Finally, we contrast the regulatory properties of protein motifs and the regulatory elements of DNA and (pre-)mRNAs, advocating that co-regulation, co-operativity, and motif-driven regulatory programs are common mechanisms that emerge from the use of simple, evolutionarily plastic regulatory modules.

  4. De Novo Regulatory Motif Discovery Identifies Significant Motifs in Promoters of Five Classes of Plant Dehydrin Genes

    PubMed Central

    Zolotarov, Yevgen; Strömvik, Martina

    2015-01-01

    Plants accumulate dehydrins in response to osmotic stresses. Dehydrins are divided into five different classes, which are thought to be regulated in different manners. To better understand differences in transcriptional regulation of the five dehydrin classes, de novo motif discovery was performed on 350 dehydrin promoter sequences from a total of 51 plant genomes. Overrepresented motifs were identified in the promoters of five dehydrin classes. The Kn dehydrin promoters contain motifs linked with meristem specific expression, as well as motifs linked with cold/dehydration and abscisic acid response. KS dehydrin promoters contain a motif with a GATA core. SKn and YnSKn dehydrin promoters contain motifs that match elements connected with cold/dehydration, abscisic acid and light response. YnKn dehydrin promoters contain motifs that match abscisic acid and light response elements, but not cold/dehydration response elements. Conserved promoter motifs are present in the dehydrin classes and across different plant lineages, indicating that dehydrin gene regulation is likely also conserved. PMID:26114291

  5. Finding specific RNA motifs: Function in a zeptomole world?

    PubMed Central

    KNIGHT, ROB; YARUS, MICHAEL

    2003-01-01

    We have developed a new method for estimating the abundance of any modular (piecewise) RNA motif within a longer random region. We have used this method to estimate the size of the active motifs available to modern SELEX experiments (picomoles of unique sequences) and to a plausible RNA World (zeptomoles of unique sequences: 1 zmole = 602 sequences). Unexpectedly, activities such as specific isoleucine binding are almost certainly present in zeptomoles of molecules, and even ribozymes such as self-cleavage motifs may appear (depending on assumptions about the minimal structures). The number of specified nucleotides is not the only important determinant of a motif’s rarity: The number of modules into which it is divided, and the details of this division, are also crucial. We propose three maxims for easily isolated motifs: the Maxim of Minimization, the Maxim of Multiplicity, and the Maxim of the Median. These maxims together state that selected motifs should be small and composed of as many separate, equally sized modules as possible. For evenly divided motifs with four modules, the largest accessible activity in picomole scale (1–1000 pmole) pools of length 100 is about 34 nucleotides; while for zeptomole scale (1–1000 zmole) pools it is about 20 specific nucleotides (50% probability of occurrence). This latter figure includes some ribozymes and aptamers. Consequently, an RNA metabolism apparently could have begun with only zeptomoles of RNA molecules. PMID:12554865
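
    The conversion quoted in this abstract (1 zmole = 602 sequences) follows directly from Avogadro's number. The short sketch below reproduces that arithmetic; the function name `molecules` and the prefix table are illustrative assumptions, not part of the authors' method.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI definition)

# SI prefixes used in the abstract, expressed as fractions of a mole.
PREFIX = {"pico": 1e-12, "zepto": 1e-21}

def molecules(amount, prefix):
    """Number of molecules in `amount` pico- or zeptomoles."""
    return amount * PREFIX[prefix] * AVOGADRO

print(round(molecules(1, "zepto")))       # about 602 molecules per zeptomole
print(f"{molecules(1, 'pico'):.3e}")      # roughly 6.022e+11 per picomole
```

    On this scale a 1-1000 zmole pool holds roughly 6 x 10^2 to 6 x 10^5 molecules, about nine orders of magnitude fewer than a picomole-scale SELEX pool.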

  6. cWINNOWER algorithm for finding fuzzy DNA motifs

    NASA Technical Reports Server (NTRS)

    Liang, S.; Samanta, M. P.; Biegel, B. A.

    2004-01-01

    The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if a clique consisting of a sufficiently large number of mutated copies of the motif (i.e., the signals) is present in the DNA sequence. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum detectable clique size qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12,000 for (l, d) = (15, 4). Copyright Imperial College Press.
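
    The signal definition used here (a short pattern differing from a motif of length l by at most d mutations) can be made concrete with a Hamming-distance scan. The sketch below illustrates only that definition, assuming the motif is already known; it is not the cWINNOWER clique search, which has to recover the motif without knowing it. The function names and the toy sequence are assumptions for illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))

def find_signals(sequence, motif, d):
    """Return (position, window) for every window of length l = len(motif)
    that differs from the motif by at most d substitutions."""
    l = len(motif)
    hits = []
    for i in range(len(sequence) - l + 1):
        window = sequence[i:i + l]
        if hamming(window, motif) <= d:
            hits.append((i, window))
    return hits

# Toy example (invented): one exact copy and one copy with two substitutions.
toy = "TTACGTACGTTTACCTACGATT"
for pos, window in find_signals(toy, "ACGTACGT", d=2):
    print(pos, window, hamming(window, "ACGTACGT"))
```

    Two signals derived from the same motif can differ from each other by up to 2d positions, which is why a motif-free search has to look for groups of mutually similar windows rather than for matches to a single template.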

  7. An integrated approach for analyzing clinical genomic variant data from next-generation sequencing.

    PubMed

    Crowgey, Erin L; Stabley, Deborah L; Chen, Chuming; Huang, Hongzhan; Robbins, Katherine M; Polson, Shawn W; Sol-Church, Katia; Wu, Cathy H

    2015-04-01

    Next-generation sequencing (NGS) technologies provide the potential for developing high-throughput and low-cost platforms for clinical diagnostics. A limiting factor to clinical applications of genomic NGS is downstream bioinformatics analysis for data interpretation. We have developed an integrated approach for end-to-end clinical NGS data analysis from variant detection to functional profiling. Robust bioinformatics pipelines were implemented for genome alignment, single nucleotide polymorphism (SNP), small insertion/deletion (InDel), and copy number variation (CNV) detection of whole exome sequencing (WES) data from the Illumina platform. Quality-control metrics were analyzed at each step of the pipeline by use of a validated training dataset to ensure data integrity for clinical applications. We annotate the variants with data regarding the disease population and variant impact. Custom algorithms were developed to filter variants based on criteria, such as quality of variant, inheritance pattern, and impact of variant on protein function. The developed clinical variant pipeline links the identified rare variants to Integrated Genome Viewer for visualization in a genomic context and to the Protein Information Resource's iProXpress for rich protein and disease information. With the application of our system of annotations, prioritizations, inheritance filters, and functional profiling and analysis, we have created a unique methodology for downstream variant filtering that empowers clinicians and researchers to interpret more effectively the relevance of genomic alterations within a rare genetic disease.

  8. An Integrated Approach for Analyzing Clinical Genomic Variant Data from Next-Generation Sequencing

    PubMed Central

    Stabley, Deborah L.; Chen, Chuming; Huang, Hongzhan; Robbins, Katherine M.; Polson, Shawn W.; Sol-Church, Katia; Wu, Cathy H.

    2015-01-01

    Next-generation sequencing (NGS) technologies provide the potential for developing high-throughput and low-cost platforms for clinical diagnostics. A limiting factor to clinical applications of genomic NGS is downstream bioinformatics analysis for data interpretation. We have developed an integrated approach for end-to-end clinical NGS data analysis from variant detection to functional profiling. Robust bioinformatics pipelines were implemented for genome alignment, single nucleotide polymorphism (SNP), small insertion/deletion (InDel), and copy number variation (CNV) detection of whole exome sequencing (WES) data from the Illumina platform. Quality-control metrics were analyzed at each step of the pipeline by use of a validated training dataset to ensure data integrity for clinical applications. We annotate the variants with data regarding the disease population and variant impact. Custom algorithms were developed to filter variants based on criteria, such as quality of variant, inheritance pattern, and impact of variant on protein function. The developed clinical variant pipeline links the identified rare variants to Integrated Genome Viewer for visualization in a genomic context and to the Protein Information Resource’s iProXpress for rich protein and disease information. With the application of our system of annotations, prioritizations, inheritance filters, and functional profiling and analysis, we have created a unique methodology for downstream variant filtering that empowers clinicians and researchers to interpret more effectively the relevance of genomic alterations within a rare genetic disease. PMID:25649353

  9. A novel cysteine-rich sequence-specific DNA-binding protein interacts with the conserved X-box motif of the human major histocompatibility complex class II genes via a repeated Cys-His domain and functions as a transcriptional repressor

    PubMed Central

    1994-01-01

    The class II major histocompatibility complex (MHC) molecules function in the presentation of processed peptides to helper T cells. As most mammalian cells can endocytose and process foreign antigen, the critical determinant of an antigen-presenting cell is its ability to express class II MHC molecules. Expression of these molecules is usually restricted to cells of the immune system and dysregulated expression is hypothesized to contribute to the pathogenesis of a severe combined immunodeficiency syndrome and certain autoimmune diseases. Human complementary DNA clones encoding a newly identified, cysteine-rich transcription factor, NF-X1, which binds to the conserved X-box motif of class II MHC genes, were obtained, and the primary amino acid sequence deduced. The major open reading frame encodes a polypeptide of 1,104 amino acids with a symmetrical organization. A central cysteine-rich portion encodes the DNA-binding domain, and is subdivided into seven repeated motifs. This motif is similar to but distinct from the LIM domain and the RING finger family, and is reminiscent of known metal-binding regions. The unique arrangement of cysteines indicates that the consensus sequence CX3CXL-XCGX1-5HXCX3CHXGXC represents a novel cysteine-rich motif. Two lines of evidence indicate that the polypeptide encodes a potent and biologically relevant repressor of HLA-DRA transcription: (a) overexpression of NF-X1 from a retroviral construct strongly decreases transcription from the HLA-DRA promoter; and (b) the NF-X1 transcript is markedly induced late after induction with interferon gamma (IFN-gamma), coinciding with postinduction attenuation of HLA-DRA transcription. The NF-X1 protein may therefore play an important role in regulating the duration of an inflammatory response by limiting the period in which class II MHC molecules are induced by IFN-gamma. PMID:7964459
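
    A consensus of this kind can be searched for with a regular expression. The sketch below rests on an assumed reading of the printed consensus: X is any residue, X3 is exactly three residues, the X1-5 element is one to five residues, and the hyphen after L is a typesetting artifact rather than part of the pattern. The peptide is synthetic and built to satisfy the pattern; it is not an NF-X1 fragment.

```python
import re

# Assumed translation of the printed consensus into a regular expression:
# C X3 C X L X C G X(1-5) H X C X3 C H X G X C
CONSENSUS = re.compile(r"C.{3}C.L.CG.{1,5}H.C.{3}CH.G.C")

def find_cys_rich_motifs(protein):
    """Return (start, matched_text) for each non-overlapping consensus match."""
    return [(m.start(), m.group()) for m in CONSENSUS.finditer(protein)]

# Synthetic peptide (invented for illustration only).
toy = "MK" + "CAAACALACGAAHACAAACHAGAC" + "EE"
print(find_cys_rich_motifs(toy))
```

    If the hyphen in the printed consensus in fact marks something other than a layout break, the pattern would need to be adjusted accordingly.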

  10. DNA nanotechnology based on i-motif structures.

    PubMed

    Dong, Yuanchen; Yang, Zhongqiang; Liu, Dongsheng

    2014-06-17

    CONSPECTUS: Most biological processes happen at the nanometer scale, and understanding the energy transformations and material transportation mechanisms within living organisms has proved challenging. To better understand the secrets of life, researchers have investigated artificial molecular motors and devices over the past decade because such systems can mimic certain biological processes. DNA nanotechnology based on i-motif structures is one system that has played an important role in these investigations. In this Account, we summarize recent advances in functional DNA nanotechnology based on i-motif structures. The i-motif is a DNA quadruplex that occurs when four stretches of cytosine repeat sequences form C·CH(+) base pairs, and its stabilization requires slightly acidic conditions. This unique property has produced the first DNA molecular motor driven by pH changes. The motor is reliable, and studies show that it is capable of millisecond running speeds, comparable to the speed of natural protein motors. With careful design, the output of these types of motors was combined to drive the bending of micrometer-sized cantilevers. Using established DNA nanostructure assembly and functionalization methods, researchers can easily integrate the motor within other DNA assembled structures and functional units, producing DNA molecular devices with new functions such as suprahydrophobic/suprahydrophilic smart surfaces that switch, intelligent nanopores triggered by pH changes, molecular logic gates, and DNA nanosprings. Recently, researchers have produced motors driven by light and electricity, which have allowed DNA motors to be integrated within silicon-based nanodevices. Moreover, some devices based on i-motif structures have proven useful for investigating processes within living cells. The pH-responsiveness of the i-motif structure also provides a way to control the stepwise assembly of DNA nanostructures. In addition, because of the stability of the i-motif, this

  11. Deep Impact Sequence Planning Using Multi-Mission Adaptable Planning Tools With Integrated Spacecraft Models

    NASA Technical Reports Server (NTRS)

    Wissler, Steven S.; Maldague, Pierre; Rocca, Jennifer; Seybold, Calina

    2006-01-01

    The Deep Impact mission was ambitious and challenging. JPL's well proven, easily adaptable multi-mission sequence planning tools combined with integrated spacecraft subsystem models enabled a small operations team to develop, validate, and execute extremely complex sequence-based activities within very short development times. This paper focuses on the core planning tool used in the mission, APGEN. It shows how the multi-mission design and adaptability of APGEN made it possible to model spacecraft subsystems as well as ground assets throughout the lifecycle of the Deep Impact project, starting with models of initial, high-level mission objectives, and culminating in detailed predictions of spacecraft behavior during mission-critical activities.

  12. Core-seismic integration of lower-middle Miocene sequences of the New Jersey shallow shelf (IODP Exp. 313): Sequence boundaries are impedance contrasts

    NASA Astrophysics Data System (ADS)

    Bassetti, M.; Miller, K. G.; Monteverde, D.; Mountain, G.; Proust, J.; Scienceparty, E.

    2010-12-01

    , respectively. There were 11 cases where there was a minimal lithologic expression of the seismic sequence boundary; many of these occur in a position basinward of the clinoform inflection point and correlate to the juxtaposition of relatively deep-water facies above and below the sequence boundary. Surfaces recognized in the core and on seismic profiles as maximum flooding surfaces and sequence boundaries are also associated with peaks of gamma ray values. Oligocene sequences were only sampled at Site M27, but have minimal sedimentological and seismic expression due to basinal locations and sequence boundaries must be picked by chronostratigraphic gaps. Limited core recovery prevented firm evaluation of uppermost middle to upper Miocene sequences. Excellent recovery of lower to lower middle Miocene sequences allows core-seismic integration that confirms the hypothesis that sequence bounding unconformities and their correlative conformities are major impedance contrasts.

  13. Genetic counselors' views and experiences with the clinical integration of genome sequencing.

    PubMed

    Machini, Kalotina; Douglas, Jessica; Braxton, Alicia; Tsipis, Judith; Kramer, Kate

    2014-08-01

    In recent years, new sequencing technologies known as next generation sequencing (NGS) have provided scientists the ability to rapidly sequence all known coding as well as non-coding sequences in the human genome. As the two emerging approaches, whole exome (WES) and whole genome (WGS) sequencing, have started to be integrated in the clinical arena, we sought to survey health care professionals who are likely to be involved in the implementation process now and/or in the future (e.g., genetic counselors, geneticists and nurse practitioners). Two hundred twenty-one genetic counselors, one third of whom currently offer WES/WGS, participated in an anonymous online survey. The aims of the survey were first, to identify barriers to the implementation of WES/WGS, as perceived by survey participants; second, to provide the first systematic report of current practices regarding the integration of WES/WGS in clinic and/or research across the US and Canada and to illuminate the roles and challenges of genetic counselors participating in this process; and third, to evaluate the impact of WES/WGS on patient care. Our results showed that genetic counseling practices with respect to WES/WGS are consistent with the criteria set forth in the ACMG 2012 policy statement, which highlights indications for testing, reporting, and pre/post test considerations. Our respondents described challenges related to offering WES/WGS, which included billing issues, the duration and content of the consent process, result interpretation and disclosure of incidental findings and variants of unknown significance. In addition, respondents indicated that specialty area (i.e., prenatal and cancer), lack of clinical utility of WES/WGS and concerns about interpretation of test results were factors that prevented them from offering this technology to patients. Finally, study participants identified the aspects of their professional training which have been most beneficial in aiding with the integration of

  14. Motif discovery with data mining in 3D protein structure databases: discovery, validation and prediction of the U-shape zinc binding ("Huf-Zinc") motif.

    PubMed

    Maurer-Stroh, Sebastian; Gao, He; Han, Hao; Baeten, Lies; Schymkowitz, Joost; Rousseau, Frederic; Zhang, Louxin; Eisenhaber, Frank

    2013-02-01

    Data mining in protein databases, derivatives from more fundamental protein 3D structure and sequence databases, has considerable unearthed potential for the discovery of sequence motif--structural motif--function relationships as the finding of the U-shape (Huf-Zinc) motif, originally a small student's project, exemplifies. The metal ion zinc is critically involved in universal biological processes, ranging from protein-DNA complexes and transcription regulation to enzymatic catalysis and metabolic pathways. Proteins have evolved a series of motifs to specifically recognize and bind zinc ions. Many of these, so called zinc fingers, are structurally independent globular domains with discontinuous binding motifs made up of residues mostly far apart in sequence. Through a systematic approach starting from the BRIX structure fragment database, we discovered that there exists another predictable subset of zinc-binding motifs that not only have a conserved continuous sequence pattern but also share a characteristic local conformation, despite being included in totally different overall folds. While this does not allow general prediction of all Zn binding motifs, a HMM-based web server, Huf-Zinc, is available for prediction of these novel, as well as conventional, zinc finger motifs in protein sequences. The Huf-Zinc webserver can be freely accessed through this URL (http://mendel.bii.a-star.edu.sg/METHODS/hufzinc/).

  15. Identification of regulatory motifs in the CHO genome for stable monoclonal antibody production.

    PubMed

    Takagi, Yasuhiro; Yamazaki, Tomomi; Masuda, Kenji; Nishii, Shigeaki; Kawakami, Bunsei; Omasa, Takeshi

    2016-08-20

    Chinese hamster ovary (CHO) cell lines are widely used for therapeutic protein production. When a transgene is integrated into the genome of a CHO cell, the expression level is highly dependent on the site of integration because of positional effects such as gene silencing. To overcome negative positional effects and establish stable CHO cell lines with high productivity, several regulatory DNA elements are used in vector construction. Previously, we established the CHO DR1000L-4N cell line, a stable and high copy number Dhfr gene-amplified cell line. It was hypothesized that the chromosomal location of the exogenous gene-amplified region in the CHO DR1000L-4N genome contains regulatory motifs for stable protein production. Therefore, we isolated DNA regulatory motifs from the CHO DR1000L-4N cell line and determined whether these motifs act as an insulator. Our results suggest that stable expression of a transgene can be promoted by the CHO genome sequence, and it would be a powerful tool for therapeutic protein manufacturing.

  16. Motivated Proteins: A web application for studying small three-dimensional protein motifs

    PubMed Central

    Leader, David P; Milner-White, E James

    2009-01-01

    Background Small loop-shaped motifs are common constituents of the three-dimensional structure of proteins. Typically they comprise between three and seven amino acid residues, and are defined by a combination of dihedral angles and hydrogen bonding partners. The most abundant of these are αβ-motifs, asx-motifs, asx-turns, β-bulges, β-bulge loops, β-turns, nests, niches, Schellmann loops, ST-motifs, ST-staples and ST-turns. We have constructed a database of such motifs from a range of high-quality protein structures and built a web application as a visual interface to this. Description The web application, Motivated Proteins, provides access to these 12 motifs (with 48 sub-categories) in a database of over 400 representative proteins. Queries can be made for specific categories or sub-categories of motif, motifs in the vicinity of ligands, motifs which include part of an enzyme active site, overlapping motifs, or motifs which include a particular amino acid sequence. Individual proteins can be specified, or, where appropriate, motifs for all proteins listed. The results of queries are presented in textual form as an (X)HTML table, and may be saved as parsable plain text or XML. Motifs can be viewed and manipulated either individually or in the context of the protein in the Jmol applet structural viewer. Cartoons of the motifs imposed on a linear representation of protein secondary structure are also provided. Summary information for the motifs is available, as are histograms of amino acid distribution, and graphs of dihedral angles at individual positions in the motifs. Conclusion Motivated Proteins is a publicly and freely accessible web application that enables protein scientists to study small three-dimensional motifs without requiring knowledge of either Structured Query Language or the underlying database schema. PMID:19210785

  17. Non-canonical integration events in Pichia pastoris encountered during standard transformation analysed with genome sequencing

    PubMed Central

    Schwarzhans, Jan-Philipp; Wibberg, Daniel; Winkler, Anika; Luttermann, Tobias; Kalinowski, Jörn; Friehs, Karl

    2016-01-01

    The non-conventional yeast Pichia pastoris is a popular host for recombinant protein production in scientific research and industry. Typically, the expression cassette is integrated into the genome via homologous recombination. Due to unknown integration events, a large clonal variability is often encountered consisting of clones with different productivities as well as aberrant morphological or growth characteristics. In this study, we analysed several clones with abnormal colony morphology and discovered unpredicted integration events via whole genome sequencing. These include (i) the relocation of the locus targeted for replacement to another chromosome (ii) co-integration of DNA from the E. coli plasmid host and (iii) the disruption of untargeted genes affecting colony morphology. Most of these events have not been reported so far in literature and present challenges for genetic engineering approaches in this yeast. Especially, the presence and independent activity of E. coli DNA elements in P. pastoris is of concern. In our study, we provide a deeper insight into these events and their potential origins. Steps preventing or reducing the risk for these phenomena are proposed and will help scientists working on genetic engineering of P. pastoris or similar non-conventional yeast to better understand and control clonal variability. PMID:27958335

  18. Robust design of an optical router based on a tapered side-coupled integrated spaced sequence of optical resonators.

    PubMed

    Bettotti, P; Mancinelli, M; Guider, R; Masi, M; Vanacharla, M Rao; Pavesi, L

    2011-04-15

    A novel (to our knowledge) scheme of an optical router/switch element, composed of a tapered side-coupled integrated spaced sequence of optical resonators, is proposed. It is based on a modified design of the ring sequence in which the resonance conditions are set by the single ring resonance and by the coherent feedback of the sequence of rings. This double condition yields robustness against fabrication defects, dense routing capability, and high switching efficiency.

  19. IQ-motif peptides as novel anti-microbial agents.

    PubMed

    McLean, Denise T F; Lundy, Fionnuala T; Timson, David J

    2013-04-01

    The IQ-motif is an amphipathic, often positively charged, α-helical, calmodulin binding sequence found in a number of eukaryote signalling, transport and cytoskeletal proteins. They share common biophysical characteristics with established, cationic α-helical antimicrobial peptides, such as the human cathelicidin LL-37. Therefore, we tested eight peptides encoding the sequences of IQ-motifs derived from the human cytoskeletal scaffolding proteins IQGAP2 and IQGAP3. Some of these peptides were able to inhibit the growth of Escherichia coli and Staphylococcus aureus with minimal inhibitory concentrations (MIC) comparable to LL-37. In addition some IQ-motifs had activity against the fungus Candida albicans. This antimicrobial activity is combined with low haemolytic activity (comparable to, or lower than, that of LL-37). Those IQ-motifs with anti-microbial activity tended to be able to bind to lipopolysaccharide. Some of these were also able to permeabilise the cell membranes of both Gram positive and Gram negative bacteria. These results demonstrate that IQ-motifs are viable lead sequences for the identification and optimisation of novel anti-microbial peptides. Thus, further investigation of the anti-microbial properties of this diverse group of sequences is merited.

  20. An Integrated System for DNA Sequencing by Synthesis Using Novel Nucleotide Analogues

    PubMed Central

    Guo, Jia; Yu, Lin; Turro, Nicholas J.; Ju, Jingyue

    2010-01-01

    via click chemistry is unambiguously identified with this chip-SBS system. The second generation (G-2) SBS system was developed based on the concept that the closer the structures of the added nucleotide and the primer are to their natural counterparts, the more faithfully the polymerase would incorporate the nucleotide. In this approach, the polymerase reaction is performed with the combination of 3′-capped nucleotide reversible terminators (NRTs) and cleavable fluorescent dideoxynucleotides (ddNTPs). By sacrificing a small amount of the primers permanently terminated by ddNTPs, the majority of the primers extended by the reversible terminators are reverted to the natural ones after each sequencing cycle. We have also developed the 3′-capped nucleotide reversible terminators to solve the problem of deciphering the homopolymeric regions of the template in conventional pyrosequencing. The 3′-capping moiety on the DNA extension product temporarily terminates the polymerase reaction, which allows only one nucleotide to be incorporated during each sequencing cycle. Thus, the number of nucleotides in the homopolymeric regions are unambiguously determined using the 3′-capped NRTs. It has been established that millions of DNA templates can be immobilized on a chip surface through a variety of approaches. Therefore, the integration of these high-density DNA chips with the molecular-level SBS approaches described in this Account is expected to generate a high-throughput and accurate DNA sequencing system with wide applications in biological research and health care. PMID:20121268

  1. BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU

    PubMed Central

    Lee, Lap-Kei; Cheung, Jeanno; Liu, Chi-Man

    2014-01-01

    This paper reports an integrated solution, called BALSA, for the secondary analysis of next generation sequencing data; it exploits the computational power of GPU and an intricate memory management to give a fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, using just a single computing node with a commodity GPU board, takes 5.5 h to process 50-fold whole genome sequencing (∼750 million 100 bp paired-end reads), or just 25 min for 210-fold whole exome sequencing. BALSA’s speed is rooted at its parallel algorithms to effectively exploit a GPU to speed up processes like alignment, realignment and statistical testing. BALSA incorporates a 16-genotype model to support the calling of SNPs and Indels and achieves competitive variant calling accuracy and sensitivity when compared to the ensemble of six popular variant callers. BALSA also supports efficient identification of somatic SNVs and CNVs; experiments showed that BALSA recovers all the previously validated somatic SNVs and CNVs, and it is more sensitive for somatic Indel detection. BALSA outputs variants in VCF format. A pileup-like SNAPSHOT format, while maintaining the same fidelity as BAM in variant calling, enables efficient storage and indexing, and facilitates the App development of downstream analyses. BALSA is available at: http://sourceforge.net/p/balsa. PMID:24949238

  2. Integration of geologic data by applying sequence stratigraphic methods to foreland basins

    SciTech Connect

    Nunez, F.J.; Tover, G.N.S.

    1993-02-01

    Much doubt exists about the application of sequence stratigraphy in areas of strong tectonic influence. The purpose of this paper is to demonstrate the wide applicability of sequence stratigraphy as an alternative to classical lithostratigraphy by utilizing this tool to explain deformed geometries of passive margin basins described by Vail et al. as they became deformed within the foreland basins. The information that is summarized in the present study is the result of the integration of existing data in an area of the Eastern Venezuela Basin. We combined the biostratigraphic data (abundance and diversity curves of fossils from planktonic and benthonic foraminifera to calcareous nannoplankton and palynomorphs) with their depth relation to the electrical log in order to identify condensed sections and sequence boundaries in the manner of Vail and Wornardt. Dipmeter data are incorporated in this study by using stereoplots of structural poles, which allow a priori determination of the source and direction of basin fill and reconstruction of the paleogeography. Combining all of these analytical elements, one derives a conceptual geological model that is then transferred into the interpretation of the seismic lines.

  3. BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU.

    PubMed

    Luo, Ruibang; Wong, Yiu-Lun; Law, Wai-Chun; Lee, Lap-Kei; Cheung, Jeanno; Liu, Chi-Man; Lam, Tak-Wah

    2014-01-01

    This paper reports an integrated solution, called BALSA, for the secondary analysis of next generation sequencing data; it exploits the computational power of GPU and an intricate memory management to give a fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, using just a single computing node with a commodity GPU board, takes 5.5 h to process 50-fold whole genome sequencing (∼750 million 100 bp paired-end reads), or just 25 min for 210-fold whole exome sequencing. BALSA's speed is rooted at its parallel algorithms to effectively exploit a GPU to speed up processes like alignment, realignment and statistical testing. BALSA incorporates a 16-genotype model to support the calling of SNPs and Indels and achieves competitive variant calling accuracy and sensitivity when compared to the ensemble of six popular variant callers. BALSA also supports efficient identification of somatic SNVs and CNVs; experiments showed that BALSA recovers all the previously validated somatic SNVs and CNVs, and it is more sensitive for somatic Indel detection. BALSA outputs variants in VCF format. A pileup-like SNAPSHOT format, while maintaining the same fidelity as BAM in variant calling, enables efficient storage and indexing, and facilitates the App development of downstream analyses. BALSA is available at: http://sourceforge.net/p/balsa.

  4. Drosophila bloom helicase maintains genome integrity by inhibiting recombination between divergent DNA sequences.

    PubMed

    Kappeler, Michael; Kranz, Elisabeth; Woolcock, Katrina; Georgiev, Oleg; Schaffner, Walter

    2008-12-01

    DNA double strand breaks (DSB) can be repaired either via a sequence independent joining of DNA ends or via homologous recombination. We established a detection system in Drosophila melanogaster to investigate the impact of sequence constraints on the usage of the homology based DSB repair via single strand annealing (SSA), which leads to recombination between direct repeats with concomitant loss of one repeat copy. First of all, we find the SSA frequency to be inversely proportional to the spacer length between the repeats, for spacers up to 2.4 kb in length. We further show that SSA between divergent repeats (homeologous SSA) is suppressed in cell cultures and in vivo in a sensitive manner, recognizing sequence divergences smaller than 0.5%. Finally, we demonstrate that the suppression of homeologous SSA depends on the Bloom helicase (Blm), encoded by the Drosophila gene mus309. Suppression of homeologous recombination is a novel function of Blm in ensuring genomic integrity, not described to date in mammalian systems. Unexpectedly, distinct from its function in Saccharomyces cerevisiae, the mismatch repair factor Msh2 encoded by spel1 does not suppress homeologous SSA in Drosophila.

  5. Integration of Residue Attributes for Sequence Diversity Characterization of Terpenoid Enzymes

    PubMed Central

    Ikeda, Shun; Ono, Naoaki; Altaf-Ul-Amin, Md.; Kanaya, Shigehiko

    2014-01-01

    Progress in the “omics” fields such as genomics, transcriptomics, proteomics, and metabolomics has engendered a need for innovative analytical techniques to derive meaningful information from the ever increasing molecular data. KNApSAcK motorcycle DB is a popular database for enzymes related to secondary metabolic pathways in plants. One of the challenges in analyses of protein sequence data in such repositories is the standard notation of sequences as strings of alphabetical characters. This has created lack of a natural underlying metric that eases amenability to computation. In view of this requirement, we applied novel integration of selected biochemical and physical attributes of amino acids derived from the amino acid index and quantified in numerical scale, to examine diversity of peptide sequences of terpenoid synthases accumulated in KNApSAcK motorcycle DB. We initially generated a reduced amino acid index table. This is a set of biochemical and physical properties obtained by random forest feature selection of important indices from the amino acid index. Principal component analysis was then applied for characterization of enzymes involved in synthesis of terpenoids. The variance explained was increased by incorporation of residue attributes for analyses. PMID:24900985

  6. [Specific motifs in the genomes of the family Chlamydiaceae].

    PubMed

    Demkin, V V; Kirillova, N V

    2012-01-01

    Specific motifs in the genomes of the family Chlamydiaceae were discussed. The search for genetic markers for bacterial identification and typing is an urgent problem. Progress in sequencing technology has resulted in the compilation of a database of genomic nucleotide sequences of bacteria. This raised the problem of searching for and selecting genetic targets for identification and typing in bacterial genes based on comparative analysis of complete genomic sequences. The goal of this work was to carry out a comparative genetic analysis of different species of the family Chlamydiaceae. The analysis focused on the detection of specific motifs capable of serving as genetic markers of this family. The consensus domains were detected using Visual Basic for Applications software for MS Excel. Complete coincidence of segments 25 nucleotides long was used as the criterion for consensus domain selection. One complete genomic sequence for each of 8 bacterial species was taken for the experiment. The experimental sample did not contain a complete sequence of C. suis because, at the time of this research, this species was absent from the GenBank database. Comparative analysis of the sequences of C. trachomatis and other representatives of the family Chlamydiaceae revealed 41 motifs common to the 8 Chlamydiaceae species tested in this work. The maximal number of consensus motifs was observed in genes of ribosomal RNA and t-RNA. In addition to the genes of r-RNA and t-RNA, consensus motifs were observed in 5 genes and 6 intergenic segments. The consensus motifs detected in this work in the genes CTL0299, CTLO800, dagA, and hctA can be regarded as identification domains of the family Chlamydiaceae.
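
    The selection criterion described in this abstract (exact coincidence of 25-nucleotide segments across all genomes) can be illustrated with a short Python sketch. This is not the authors' Visual Basic for Applications code; the function names `kmers` and `shared_kmers` are hypothetical, the toy sequences are invented, and the sketch assumes that only the forward strand is compared.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(genomes, k=25):
    """Return the k-mers that occur, with exact identity, in every genome.

    Assumption: forward strand only; reverse complements are not considered.
    """
    sets = [kmers(g, k) for g in genomes]
    if not sets:
        return set()
    return set.intersection(*sets)


# Toy example with invented sequences and k=8 instead of 25.
toy = ["GGACGTACGTCC", "TTACGTACGTAA", "CAACGTACGTGA"]
print(shared_kmers(toy, k=8))
```

    Holding one set of k-mers per genome is simple but memory-hungry for full bacterial genomes; a real analysis would more likely index one genome and stream the others against it.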

  7. Selection of peptide entry motifs by bacterial surface display.

    PubMed Central

    Taschner, Sabine; Meinke, Andreas; von Gabain, Alexander; Boyd, Aoife P

    2002-01-01

    Surface display technologies have been established previously to select peptides and polypeptides that interact with purified immobilized ligands. In the present study, we designed and implemented a surface display-based technique to identify novel peptide motifs that mediate entry into eukaryotic cells. An Escherichia coli library expressing surface-displayed peptides was combined with eukaryotic cells and the gentamicin protection assay was performed to select recombinant E. coli, which were internalized into eukaryotic cells by virtue of the displayed peptides. To establish the proof of principle of this approach, the fibronectin-binding motifs of the fibronectin-binding protein A of Staphylococcus aureus were inserted into the E. coli FhuA protein. Surface expression of the fusion proteins was demonstrated by functional assays and by FACS analysis. The fibronectin-binding motifs were shown to mediate entry of the bacteria into non-phagocytic eukaryotic cells and brought about the preferential selection of these bacteria over E. coli expressing parental FhuA, with an enrichment of 100000-fold. Four entry sequences were selected and identified using an S. aureus library of peptides displayed in the FhuA protein on the surface of E. coli. These sequences included novel entry motifs as well as integrin-binding Arg-Gly-Asp (RGD) motifs and promoted a high degree of bacterial entry. Bacterial surface display is thus a powerful tool to effectively select and identify entry peptide motifs. PMID:12144529

  8. miRDeep*: an integrated application tool for miRNA identification from RNA sequencing data.

    PubMed

    An, Jiyuan; Lai, John; Lehman, Melanie L; Nelson, Colleen C

    2013-01-01

    miRDeep and its varieties are widely used to quantify known and novel micro RNA (miRNA) from small RNA sequencing (RNAseq). This article describes miRDeep*, our integrated miRNA identification tool, which is modeled off miRDeep, but the precision of detecting novel miRNAs is improved by introducing new strategies to identify precursor miRNAs. miRDeep* has a user-friendly graphic interface and accepts raw data in FastQ and Sequence Alignment Map (SAM) or the binary equivalent (BAM) format. Known and novel miRNA expression levels, as measured by the number of reads, are displayed in an interface, which shows each RNAseq read relative to the pre-miRNA hairpin. The secondary pre-miRNA structure and read locations for each predicted miRNA are shown and kept in a separate figure file. Moreover, the target genes of known and novel miRNAs are predicted using the TargetScan algorithm, and the targets are ranked according to the confidence score. miRDeep* is an integrated standalone application where sequence alignment, pre-miRNA secondary structure calculation and graphical display are purely Java coded. This application tool can be executed using a normal personal computer with 1.5 GB of memory. Further, we show that miRDeep* outperformed existing miRNA prediction tools using our LNCaP and other small RNAseq datasets. miRDeep* is freely available online at http://www.australianprostatecentre.org/research/software/mirdeep-star.

  9. Promoter Motifs in NCLDVs: An Evolutionary Perspective.

    PubMed

    Oliveira, Graziele Pereira; Andrade, Ana Cláudia Dos Santos Pereira; Rodrigues, Rodrigo Araújo Lima; Arantes, Thalita Souza; Boratto, Paulo Victor Miranda; Silva, Ludmila Karen Dos Santos; Dornas, Fábio Pio; Trindade, Giliane de Souza; Drumond, Betânia Paiva; La Scola, Bernard; Kroon, Erna Geessien; Abrahão, Jônatas Santos

    2017-01-20

    For many years, gene expression in the three cellular domains has been studied in an attempt to discover sequences associated with the regulation of the transcription process. Some specific transcriptional features were described in viruses, although few studies have been devoted to understanding the evolutionary aspects related to the spread of promoter motifs through related viral families. The discovery of giant viruses and the proposition of the new viral order Megavirales that comprises a monophyletic group, named nucleo-cytoplasmic large DNA viruses (NCLDV), raised new questions in the field. Some putative promoter sequences have already been described for some NCLDV members, bringing new insights into the evolutionary history of these complex microorganisms. In this review, we summarize the main aspects of the transcription regulation process in the three domains of life, followed by a systematic description of what is currently known about promoter regions in several NCLDVs. We also discuss how the analysis of the promoter sequences could bring new ideas about the giant viruses' evolution. Finally, considering a possible common ancestor for the NCLDV group, we discuss possible promoters' evolutionary scenarios and propose the term "MEGA-box" to designate an ancestor promoter motif ('TATATAAAATTGA') that could have evolved gradually by nucleotides' gain and loss and point mutations.
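
    As an illustration of how a motif such as the proposed MEGA-box ('TATATAAAATTGA') might be located in a sequence while tolerating point mutations, the following Python sketch scans every window and counts mismatches. The function name `find_motif` and the test sequence are hypothetical. The sketch covers substitutions only, not the nucleotide gain and loss that the abstract also mentions.

```python
MEGA_BOX = "TATATAAAATTGA"  # ancestor promoter motif quoted in the abstract


def find_motif(seq, motif=MEGA_BOX, max_mismatches=1):
    """Return (position, mismatches) for each window within max_mismatches.

    Positions are 0-based. Only substitutions are modelled (Hamming distance);
    insertions and deletions are not handled in this sketch.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


# Invented sequence: one exact copy and one copy with a single substitution.
toy = "GGG" + "TATATAAAATTGA" + "CCC" + "TATATAAAATTCA" + "GG"
print(find_motif(toy, max_mismatches=1))
```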

  10. Promoter Motifs in NCLDVs: An Evolutionary Perspective

    PubMed Central

    Oliveira, Graziele Pereira; Andrade, Ana Cláudia dos Santos Pereira; Rodrigues, Rodrigo Araújo Lima; Arantes, Thalita Souza; Boratto, Paulo Victor Miranda; Silva, Ludmila Karen dos Santos; Dornas, Fábio Pio; Trindade, Giliane de Souza; Drumond, Betânia Paiva; La Scola, Bernard; Kroon, Erna Geessien; Abrahão, Jônatas Santos

    2017-01-01

    For many years, gene expression in the three cellular domains has been studied in an attempt to discover sequences associated with the regulation of the transcription process. Some specific transcriptional features were described in viruses, although few studies have been devoted to understanding the evolutionary aspects related to the spread of promoter motifs through related viral families. The discovery of giant viruses and the proposition of the new viral order Megavirales that comprises a monophyletic group, named nucleo-cytoplasmic large DNA viruses (NCLDV), raised new questions in the field. Some putative promoter sequences have already been described for some NCLDV members, bringing new insights into the evolutionary history of these complex microorganisms. In this review, we summarize the main aspects of the transcription regulation process in the three domains of life, followed by a systematic description of what is currently known about promoter regions in several NCLDVs. We also discuss how the analysis of the promoter sequences could bring new ideas about the giant viruses’ evolution. Finally, considering a possible common ancestor for the NCLDV group, we discuss possible promoters’ evolutionary scenarios and propose the term “MEGA-box” to designate an ancestor promoter motif (‘TATATAAAATTGA’) that could have evolved gradually by nucleotides’ gain and loss and point mutations. PMID:28117683

  11. HPV integration detection in CaSki and SiHa using detection of integrated papillomavirus sequences and restriction-site PCR.

    PubMed

    Raybould, Rachel; Fiander, Alison; Wilkinson, Gavin W G; Hibbitts, Sam

    2014-09-01

    Human Papillomavirus (HPV) infection is the primary cause of cervical neoplasia. HPV DNA is integrated into the human genome in the majority of cervical cancers. The nature of integration may differ with integration incorporating a single copy of HPV or occurring in concatenated form. Our understanding of HPV tumorigenesis is largely based on studies using characterised cell lines with defined integration sites; these cell lines provide an invaluable standard for validation of diagnostic assays. Cell lines also further understanding of integration mechanisms in clinical samples. The objective of this study was to explore integration assays and to investigate integration events in cell lines where HPV is integrated in concatenated form. Restriction site PCR and detection of integrated papillomavirus sequences were performed on DNA from SiHa and CaSki. A novel integration site on Xq27.3 and HPV genome rearrangements were detected in CaSki DNA. However, where integration was previously detected by FISH in CaSki, and reported to be integrated in concatenated form, integration was not detected by DIPS or RS-PCR. The data presented illustrate that HPV copy number can hinder integration detection; this needs consideration when interpreting results from tests applied to clinical samples.

  12. Agonist and antagonist switch DNA motifs recognized by human androgen receptor in prostate cancer

    PubMed Central

    Chen, Zhong; Lan, Xun; Thomas-Ahner, Jennifer M; Wu, Dayong; Liu, Xiangtao; Ye, Zhenqing; Wang, Liguo; Sunkel, Benjamin; Grenade, Cassandra; Chen, Junsheng; Zynger, Debra L; Yan, Pearlly S; Huang, Jiaoti; Nephew, Kenneth P; Huang, Tim H-M; Lin, Shili; Clinton, Steven K; Li, Wei; Jin, Victor X; Wang, Qianben

    2015-01-01

    Human transcription factors recognize specific DNA sequence motifs to regulate transcription. It is unknown whether a single transcription factor is able to bind to distinctly different motifs on chromatin, and if so, what determines the usage of specific motifs. By using a motif-resolution chromatin immunoprecipitation-exonuclease (ChIP-exo) approach, we find that agonist-liganded human androgen receptor (AR) and antagonist-liganded AR bind to two distinctly different motifs, leading to distinct transcriptional outcomes in prostate cancer cells. Further analysis on clinical prostate tissues reveals that the binding of AR to these two distinct motifs is involved in prostate carcinogenesis. Together, these results suggest that unique ligands may switch DNA motifs recognized by ligand-dependent transcription factors in vivo. Our findings also provide a broad mechanistic foundation for understanding ligand-specific induction of gene expression profiles. PMID:25535248

  13. A type of nucleotide motif that distinguishes tobamovirus species more efficiently than nucleotide signatures.

    PubMed

    Gibbs, A J; Armstrong, J S; Gibbs, M J

    2004-10-01

    The complete genomic sequences of forty-eight tobamoviruses were classified and found to form at least twelve species clusters. Individual species were not conveniently defined by 'nucleotide signatures' (i.e. strings of one or more nucleotides unique to a taxon) as these were scattered sparsely throughout the genomes and were mostly single nucleotides. By contrast all the species were concisely and uniquely distinguished by short nucleotide motifs consisting of conserved genus-specific sites intercalated with variable sites that provided species-specific combinations of nucleotides (nucleotide combination motifs; NC-motifs). We describe the procedure for finding NC-motifs in a convenient and phylogenetically conserved region of the tobamovirus RNA polymerase gene, the '4404-50 motif'. NC-motifs have been found in other sets of homologous sequences, and are convenient for use in published taxonomic descriptions.

  14. Repair of oxidative DNA base damage in the host genome influences the HIV integration site sequence preference.

    PubMed

    Bennett, Geoffrey R; Peters, Ryan; Wang, Xiao-hong; Hanne, Jeungphill; Sobol, Robert W; Bundschuh, Ralf; Fishel, Richard; Yoder, Kristine E

    2014-01-01

    Host base excision repair (BER) proteins that repair oxidative damage enhance HIV infection. These proteins include the oxidative DNA damage glycosylases 8-oxo-guanine DNA glycosylase (OGG1) and mutY homolog (MYH) as well as DNA polymerase beta (Polβ). While deletion of oxidative BER genes leads to decreased HIV infection and integration efficiency, the mechanism remains unknown. One hypothesis is that BER proteins repair the DNA gapped integration intermediate. An alternative hypothesis considers that the most common oxidative DNA base damages occur on guanines. The subtle consensus sequence preference at HIV integration sites includes multiple G:C base pairs surrounding the points of joining. These observations suggest a role for oxidative BER during integration targeting at the nucleotide level. We examined the hypothesis that BER repairs a gapped integration intermediate by measuring HIV infection efficiency in Polβ null cell lines complemented with active site point mutants of Polβ. A DNA synthesis defective mutant, but not a 5'dRP lyase mutant, rescued HIV infection efficiency to wild type levels; this suggested Polβ DNA synthesis activity is not necessary while 5'dRP lyase activity is required for efficient HIV infection. An alternate hypothesis that BER events in the host genome influence HIV integration site selection was examined by sequencing integration sites in OGG1 and MYH null cells. In the absence of these 8-oxo-guanine specific glycosylases the chromatin elements of HIV integration site selection remain the same as in wild type cells. However, the HIV integration site sequence preference at G:C base pairs is altered at several positions in OGG1 and MYH null cells. Inefficient HIV infection in the absence of oxidative BER proteins does not appear related to repair of the gapped integration intermediate; instead oxidative damage repair may participate in HIV integration site preference at the sequence level.

  15. Chicxulub Post-Impact Sedimentary Sequence: Integrated Borehole Paleogene Carbonate Stratigraphy

    NASA Astrophysics Data System (ADS)

    Fucugauchi, J. U.; Perez-Cruz, L. L.; Escobar-Sanchez, E.; Ortega-Nieto, A.; Velasco-Villarreal, M.

    2014-12-01

    The Chicxulub crater was formed by a bolide impact on the southern Gulf of Mexico at ~66 Ma ago that marked the Cretaceous/Paleogene (K/Pg) boundary, represented worldwide by the ejecta layer. The K/Pg boundary layer with its global distribution provides a high resolution marker, allowing high precision stratigraphic analyses in marine and continental sequences. Following crater formation, sedimentation re-established in the carbonate platform, filling the basin. Crater is located half on-land and half offshore, with the crater floor covered by sediments with variable thickness up to about 1 km. The target, impact and post-impact sequences have been drilled and cored, providing samples for stratigraphic, petrographic and physical-chemical laboratory studies. The post-impact stratigraphy has been analyzed in several studies at proximal, intermediate and distal outcrops and in the crater boreholes, using e.g., radiometric dating, micropaleontology, paleomagnetism, and strontium and stable isotope geochemistry. Emphasis has been given on the impact breccias-carbonates contact and the basal Paleocene sequence. Here we re-analyze the available data, revisiting the stratigraphy for the Santa Elena, Tekax, Peto and Yaxcopoil-1 boreholes using newly constructed detailed lithostratigraphic columns in the continuously cored boreholes. Additionally we extend the study to the Paleogene sequence in the Santa Elena and Yaxcopoil-1 boreholes using bulk carbon and oxygen isotopes, magnetic polarity, XRF core geochemistry and magnetic susceptibility stratigraphy. Results spanning chrons c29 to c24 constrain the K/Pg boundary, c29r-c29n polarity reversal and the Paleocene-Eocene thermal maximum, providing high resolution records. The basal Paleocene gap and age differences in an integrated stratigraphy are discussed and correlated to the GPTS scale and IODP marine isotope records. The extent and characteristics of crater structure and target/cover sediments have been imaged with

  16. An integrated sequence stratigraphic and chronostratigraphic analysis of the Pliocene, Tiburon Basin succession, Mejillones Peninsula, Chile

    NASA Astrophysics Data System (ADS)

    Tapia, Claudio A.; Wilson, Gary S.; Ishman, Scott E.; Wilke, Hans G.; Wartho, Jo-Anne; Winter, Diane; Martínez-Pardo, Rubén

    2015-08-01

    We present new findings from Pliocene marine sediments from the Mejillones Peninsula Tiburon Basin of the northern Chile continental margin that provide constraints for the global sea level record. Sedimentologic and sequence stratigraphic studies reveal facies associations of a continental shelf setting. Textural variations indicate that coarsening and fining up of the succession are due to relative sea level rise and fall, respectively. Magnetostratigraphy was integrated with bio- and tephro- stratigraphic data to construct a record of high-resolution chronology. The age model constrains the Tiburon Basin lower section between 4.2 Ma and 2.8 Ma. The record is likely to be controlled in part by sea level change with orbital periodicities of obliquity (∼ 40 ka of frequency) and, between 3.2 Ma and 2.9 Ma a high-amplitude sea level fall is correlated to global climatic deterioration and the onset of major Northern Hemisphere glaciations.

  17. The HIVToolbox 2 Web System Integrates Sequence, Structure, Function and Mutation Analysis

    PubMed Central

    Sargeant, David P.; Deverasetty, Sandeep; Strong, Christy L.; Alaniz, Izua J.; Bartlett, Alexandria; Brandon, Nicholas R.; Brooks, Steven B.; Brown, Frederick A.; Bufi, Flaviona; Chakarova, Monika; David, Roxanne P.; Dobritch, Karlyn M.; Guerra, Horacio P.; Hedden, Michael W.; Kumra, Rma; Levitt, Kelvy S.; Mathew, Kiran R.; Matti, Ray; Maza, Dorothea Q.; Mistry, Sabyasachy; Novakovic, Nemanja; Pomerantz, Austin; Portillo, Josue; Rafalski, Timothy F.; Rathnayake, Viraj R.; Rezapour, Noura; Songao, Sarah; Tuggle, Sean L.; Yousif, Sandy; Dorsky, David I.; Schiller, Martin R.

    2014-01-01

    There is enormous interest in studying HIV pathogenesis for improving the treatment of patients with HIV infection. HIV infection has become one of the best-studied systems for understanding how a virus can hijack a cell. To help facilitate discovery, we previously built HIVToolbox, a web system for visual data mining. The original HIVToolbox integrated information for HIV protein sequence, structure, functional sites, and sequence conservation. This web system has been used for almost 40,000 searches. We report improvements to HIVToolbox including new functions and workflows, data updates, and updates for ease of use. HIVToolbox2, is an improvement over HIVToolbox with new functions. HIVToolbox2 has new functionalities focused on HIV pathogenesis including drug-binding sites, drug-resistance mutations, and immune epitopes. The integrated, interactive view enables visual mining to generate hypotheses that are not readily revealed by other approaches. Most HIV proteins form multimers, and there are posttranslational modification and protein-protein interaction sites at many of these multimerization interfaces. Analysis of protease drug binding sites reveals an anatomy of drug resistance with different types of drug-resistance mutations regionally localized on the surface of protease. Some of these drug-resistance mutations have a high prevalence in specific HIV-1 M subtypes. Finally, consolidation of Tat functional sites reveals a hotspot region where there appear to be 30 interactions or posttranslational modifications. A cursory analysis with HIVToolbox2 has helped to identify several global patterns for HIV proteins. An initial analysis with this tool identifies homomultimerization of almost all HIV proteins, functional sites that overlap with multimerization sites, a global drug resistance anatomy for HIV protease, and specific distributions of some DRMs in specific HIV M subtypes. HIVToolbox2 is an open-access web application available at [http://hivtoolbox2

  18. Two types of deletion within integrated viral sequences mediate reversion of simian virus 40-transformed mouse cells.

    PubMed Central

    Maruyama, K; Oda, K

    1984-01-01

    Simian virus 40 (SV40) DNA insertions from SV40-transformed mouse cell line W-2K-11 and its revertants M18, M31, and M42 were cloned. W-2K-11 cells contain 1.5 copies of the SV40 sequences in a partially tandem duplicated form. The endpoints of the viral sequences at the virus-host junctions are located very close to those reported by others, indicating that there are some preferred sites for integration and rearrangement in SV40 sequences. One flanking cellular sequence is a long stretch of adenine and thymine with repeated AAAT, and the other is a stretch of guanine and cytosine with repeated CCG. There are patchy homologies between the flanking cellular sequences and the corresponding parental SV40 sequences. The sequences around both junctions were retained in all the revertants, whereas most of the internal SV40 sequences coding for large T antigen were deleted. The coding sequences for small T antigen are intact, and small T antigen was expressed in all the revertants. The fragments cloned from M18 and M42 were identical and 3.9 kilobases of SV40 sequences were deleted. The parental SV40 sequences around the deletion site have sequences capable of forming a secondary structure which might reduce the effective distance between the two regions. The SV40 DNA retained in M31 is colinear with SV40 virion DNA, and a unit length of SV40 DNA was deleted within the SV40 sequences present in W-2K-11 cells. These results indicated that two types of deletion occurred during the reversion, one between homologous sequences and the other between nonhomologous sequences. PMID:6319747

  19. A machine-learning approach for predicting palmitoylation sites from integrated sequence-based features.

    PubMed

    Li, Liqi; Luo, Qifa; Xiao, Weidong; Li, Jinhui; Zhou, Shiwen; Li, Yongsheng; Zheng, Xiaoqi; Yang, Hua

    2017-02-01

    Palmitoylation is the covalent attachment of lipids to amino acid residues in proteins. As an important form of protein posttranslational modification, it increases the hydrophobicity of proteins, which contributes to the protein transportation, organelle localization, and functions, therefore plays an important role in a variety of cell biological processes. Identification of palmitoylation sites is necessary for understanding protein-protein interaction, protein stability, and activity. Since conventional experimental techniques to determine palmitoylation sites in proteins are both labor intensive and costly, a fast and accurate computational approach to predict palmitoylation sites from protein sequences is in urgent need. In this study, a support vector machine (SVM)-based method was proposed through integrating PSI-BLAST profile, physicochemical properties, [Formula: see text]-mer amino acid compositions (AACs), and [Formula: see text]-mer pseudo AACs into the principal feature vector. A recursive feature selection scheme was subsequently implemented to single out the most discriminative features. Finally, an SVM method was implemented to predict palmitoylation sites in proteins based on the optimal features. The proposed method achieved an accuracy of 99.41% and Matthews Correlation Coefficient of 0.9773 for a benchmark dataset. The result indicates the efficiency and accuracy of our method in prediction of palmitoylation sites based on protein sequences.
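
    The two figures of merit quoted in this abstract, accuracy and the Matthews Correlation Coefficient, can be computed from a binary confusion matrix. The sketch below uses the standard textbook definitions; the function names and any counts passed to them are illustrative assumptions and are not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews Correlation Coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero (a common convention;
    the coefficient is undefined in that case).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

    Unlike accuracy, the coefficient stays informative when modified and unmodified sites are heavily imbalanced, which is one reason both numbers are often reported together.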

  20. Sequence-based Network Completion Reveals the Integrality of Missing Reactions in Metabolic Networks.

    PubMed

    Krumholz, Elias W; Libourel, Igor G L

    2015-07-31

    Genome-scale metabolic models are central in connecting genotypes to metabolic phenotypes. However, even for well studied organisms, such as Escherichia coli, draft networks do not contain a complete biochemical network. Missing reactions are referred to as gaps. These gaps need to be filled to enable functional analysis, and gap-filling choices influence model predictions. To investigate whether functional networks existed where all gap-filling reactions were supported by sequence similarity to annotated enzymes, four draft networks were supplemented with all reactions from the Model SEED database for which minimal sequence similarity was found in their genomes. Quadratic programming revealed that the number of reactions that could partake in a gap-filling solution was vast: 3,270 in the case of E. coli, where 72% of the metabolites in the draft network could connect a gap-filling solution. Nonetheless, no network could be completed without the inclusion of orphaned enzymes, suggesting that parts of the biochemistry integral to biomass precursor formation are uncharacterized. However, many gap-filling reactions were well determined, and the resulting networks showed improved prediction of gene essentiality compared with networks generated through canonical gap filling. In addition, gene essentiality predictions that were sensitive to poorly determined gap-filling reactions were of poor quality, suggesting that damage to the network structure resulting from the inclusion of erroneous gap-filling reactions may be predictable.

  1. Microelectrophoresis devices with integrated fluorescence detectors and reactors for high-throughput DNA sequencing

    NASA Astrophysics Data System (ADS)

    Soper, Steven A.; Ford, Sean M.; Davies, Jack; Williams, Daryl C.; Cheng, Benxu; Klopf, J. Michael; Calderon, Gina M.; Saile, Volker

    1997-05-01

    This work describes the development of micro-devices for high-throughput DNA sequencing applications. Basically, two research efforts will be discussed: (1) fabrication and characterization of micro-reactors to prepare Sanger chain terminated DNA sequencing fragments on a nanoliter scale; and (2) x-ray photolithography of PMMA substrates for the high aspect ratio preparation of electrophoresis devices. The micro-reactor consisted of a 5'-biotinylated catfish olfactory gene, which was amplified by PCR, and attached to the interior wall of an aminoalkylsilane derivatized fused-silica capillary tube via a streptavidin/biotin linkage. Coverage of the interior capillary wall with biotinylated DNA averaged 77 percent. Stability of the anchored template under pressure and electroosmotic rinsing was favorable, requiring approximately 150 h of continuous rinsing to reduce the coverage by only 50 percent. The capillary micro-reactor was placed inside an air thermocycler to control temperature during Sanger ddNTP chain extension and directly coupled to a capillary separation column filled with a LPA solution via low dead volume capillary interlocks. The complementary DNA fragments generated in the reactor were heat denatured from the immobilized template and directly injected onto a gel-filled capillary using electropumping for size fractionation and detection using NIR-LIF analysis. The total amount of termination fragments in the 31 nL reactor volume was estimated to be 5.2 X 1013 moles and sequencing was shown to produce read lengths on the order of 400 bases. Work will also be described concerning the development of micro-electrophoresis devices in x-ray sensitive photoresists using LIGA techniques. An electrophoresis device with an integrated fluorescence detector was constructed for the high resolution separation of DNA oligonucleotides. The choice of substrate for the electrophoresis was PMMA, due to its intrinsic low electroosmotic flow. Using x-ray lithography in

  2. The right motifs for plant cell adhesion: what makes an adhesive site?

    PubMed

    Langhans, Markus; Weber, Wadim; Babel, Laura; Grunewald, Miriam; Meckel, Tobias

    2017-01-01

    Cells of multicellular organisms are surrounded by and attached to a matrix of fibrous polysaccharides and proteins known as the extracellular matrix. This fibrous network not only serves as a structural support to cells and tissues but also plays an integral part in the process as important as proliferation, differentiation, or defense. While at first sight, the extracellular matrices of plant and animals do not have much in common, a closer look reveals remarkable similarities. In particular, the proteins involved in the adhesion of the cell to the extracellular matrix share many functional properties. At the sequence level, however, a surprising lack of homology is found between adhesion-related proteins of plants and animals. Both protein machineries only reveal similarities between small subdomains and motifs, which further underlines their functional relationship. In this review, we provide an overview on the similarities between motifs in proteins known to be located at the plant cell wall-plasma membrane-cytoskeleton interface to proteins of the animal adhesome. We also show that by comparing the proteome of both adhesion machineries at the level of motifs, we are also able to identify potentially new candidate proteins that functionally contribute to the adhesion of the plant plasma membrane to the cell wall.

  3. A physical map of the highly heterozygous Populus genome: integration with the genome sequence and genetic map

    SciTech Connect

    Kelleher, Colin; CHIU, Dr. R.; Shin, Dr. H.; Krywinski, Martin; Fjell, Chris; Wilkin, Jennifer; Yin, Tongming; Difazio, Stephen P.

    2007-01-01

    As part of a larger project to sequence the Populus genome and generate genomic resources for this emerging model tree, we constructed a physical map of the Populus genome, representing one of the few such maps of an undomesticated, highly heterozygous plant species. The physical map, consisting of 2802 contigs, was constructed from fingerprinted bacterial artificial chromosome (BAC) clones. The map represents approximately 9.4-fold coverage of the Populus genome, which has been estimated from the genome sequence assembly to be 485 ± 10 Mb in size. BAC ends were sequenced to assist long-range assembly of whole-genome shotgun sequence scaffolds and to anchor the physical map to the genome sequence. Simple sequence repeat-based markers were derived from the end sequences and used to initiate integration of the BAC and genetic maps. A total of 2411 physical map contigs, representing 97% of all clones assigned to contigs, were aligned to the sequence assembly (JGI Populus trichocarpa, version 1.0). These alignments represent a total coverage of 384 Mb (79%) of the entire poplar sequence assembly and 295 Mb (96%) of linkage group sequence assemblies. A striking result of the physical map contig alignments to the sequence assembly was the co-localization of multiple contigs across numerous regions of the 19 linkage groups. Targeted sequencing of BAC clones and genetic analysis in a small number of representative regions showed that these co-aligning contigs represent distinct haplotypes in the heterozygous individual sequenced, and revealed the nature of these haplotype sequence differences.

  4. WordSpy: identifying transcription factor binding motifs by building a dictionary and learning a grammar

    PubMed Central

    Wang, Guandong; Yu, Taotao; Zhang, Weixiong

    2005-01-01

    Transcription factor (TF) binding sites or motifs (TFBMs) are functional cis-regulatory DNA sequences that play an essential role in gene transcriptional regulation. Although many experimental and computational methods have been developed, finding TFBMs remains a challenging problem. We propose and develop a novel dictionary based motif finding algorithm, which we call WordSpy. One significant feature of WordSpy is the combination of a word counting method and a statistical model which consists of a dictionary of motifs and a grammar specifying their usage. The algorithm is suitable for genome-wide motif finding; it is capable of discovering hundreds of motifs from a large set of promoters in a single run. We further enhance WordSpy by applying gene expression information to separate true TFBMs from spurious ones, and by incorporating negative sequences to identify discriminative motifs. In addition, we also use randomly selected promoters from the genome to evaluate the significance of the discovered motifs. The output from WordSpy consists of an ordered list of putative motifs and a set of regulatory sequences with motif binding sites highlighted. The web server of WordSpy is available at . PMID:15980501

  5. iSVP: an integrated structural variant calling pipeline from high-throughput sequencing data

    PubMed Central

    2013-01-01

    Background Structural variations (SVs), such as insertions, deletions, inversions, and duplications, are a common feature in human genomes, and a number of studies have reported that such SVs are associated with human diseases. Although the progress of next generation sequencing (NGS) technologies has led to the discovery of a large number of SVs, accurate and genome-wide detection of SVs remains challenging. Thus far, various calling algorithms based on NGS data have been proposed. However, their strategies are diverse and there is no tool able to detect a full range of SVs accurately. Results We focused on evaluating the performance of existing deletion calling algorithms for various spanning ranges from low- to high-coverage simulation data. The simulation data was generated from a whole genome sequence with artificial SVs constructed based on the distribution of variants obtained from the 1000 Genomes Project. From the simulation analysis, deletion calls of various deletion sizes were obtained with each caller, and it was found that the performance was quite different according to the type of algorithms and targeting deletion size. Based on these results, we propose an integrated structural variant calling pipeline (iSVP) that combines existing methods with a newly devised filtering and merging processes. It achieved highly accurate deletion calling with >90% precision and >90% recall on the 30× read data for a broad range of size. We applied iSVP to the whole-genome sequence data of a CEU HapMap sample, and detected a large number of deletions, including notable peaks around 300 bp and 6,000 bp, which corresponded to Alus and long interspersed nuclear elements, respectively. In addition, many of the predicted deletions were highly consistent with experimentally validated ones by other studies. Conclusions We present iSVP, a new deletion calling pipeline to obtain a genome-wide landscape of deletions in a highly accurate manner. From simulation and real data

  6. [Prediction of Promoter Motifs in Virophages].

    PubMed

    Gong, Chaowen; Zhou, Xuewen; Pan, Yingjie; Wang, Yongjie

    2015-07-01

    Virophages have crucial roles in ecosystems and are the transport vectors of genetic materials. To shed light on regulation and control mechanisms in virophage-host systems as well as evolution between virophages and their hosts, the promoter motifs of virophages were predicted on the upstream regions of start codons using an analytical tool for prediction of promoter motifs: Multiple EM for Motif Elicitation. Seventeen potential promoter motifs were identified based on the E-value, location, number and length of promoters in genomes. Sputnik and zamilon motif 2 with AT-rich regions were distributed widely on genomes, suggesting that these motifs may be associated with regulation of the expression of various genes. Motifs containing the TCTA box were predicted to be late promoter motif in mavirus; motifs containing the ATCT box were the potential late promoter motif in the Ace Lake mavirus. AT-rich regions were identified on motif 2 in the Organic Lake virophage, motif 3 in Yellowstone Lake virophage (YSLV)1 and 2, motif 1 in YSLV3, and motif 1 and 2 in YSLV4, respectively. AT-rich regions were distributed widely on the genomes of virophages. All of these motifs may be promoter motifs of virophages. Our results provide insights into further exploration of temporal expression of genes in virophages as well as associations between virophages and giant viruses.

  7. An RNA motif that binds ATP

    NASA Technical Reports Server (NTRS)

    Sassanfar, M.; Szostak, J. W.

    1993-01-01

    RNAs that contain specific high-affinity binding sites for small molecule ligands immobilized on a solid support are present at a frequency of roughly one in 10^10 to 10^11 in pools of random sequence RNA molecules. Here we describe a new in vitro selection procedure designed to ensure the isolation of RNAs that bind the ligand of interest in solution as well as on a solid support. We have used this method to isolate a remarkably small RNA motif that binds ATP, a substrate in numerous biological reactions and the universal biological high-energy intermediate. The selected ATP-binding RNAs contain a consensus sequence, embedded in a common secondary structure. The binding properties of ATP analogues and modified RNAs show that the binding interaction is characterized by a large number of close contacts between the ATP and RNA, and by a change in the conformation of the RNA.

  8. Agrobacterium T-DNA integration in Arabidopsis is correlated with DNA sequence compositions that occur frequently in gene promoter regions.

    PubMed

    Schneeberger, Richard G; Zhang, Ke; Tatarinova, Tatiana; Troukhan, Max; Kwok, Shing F; Drais, Josh; Klinger, Kevin; Orejudos, Francis; Macy, Kimberly; Bhakta, Amit; Burns, James; Subramanian, Gopal; Donson, Jonathan; Flavell, Richard; Feldmann, Kenneth A

    2005-10-01

    Mobile insertion elements such as transposons and T-DNA generate useful genetic variation and are important tools for functional genomics studies in plants and animals. The spectrum of mutations obtained in different systems can be highly influenced by target site preferences inherent in the mechanism of DNA integration. We investigated the target site preferences of Agrobacterium T-DNA insertions in the chromosomes of the model plant Arabidopsis thaliana. The relative frequencies of insertions in genic and intergenic regions of the genome were calculated and DNA composition features associated with the insertion site flanking sequences were identified. Insertion frequencies across the genome indicate that T-strand integration is suppressed near centromeres and rDNA loci, progressively increases towards telomeres, and is highly correlated with gene density. At the gene level, T-DNA integration events show a statistically significant preference for insertion in the 5' and 3' flanking regions of protein coding sequences as well as the promoter region of RNA polymerase I transcribed rRNA gene repeats. The increased insertion frequencies in 5' upstream regions compared to coding sequences are positively correlated with gene expression activity and DNA sequence composition. Analysis of the relationship between DNA sequence composition and gene activity further demonstrates that DNA sequences with high CG-skew ratios are consistently correlated with T-DNA insertion site preference and high gene expression. The results demonstrate genomic and gene-specific preferences for T-strand integration and suggest that DNA sequences with a pronounced transition in CG- and AT-skew ratios are preferred targets for T-DNA integration.
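
    The abstract refers to CG- and AT-skew ratios without defining them. A minimal sketch follows, assuming the common definitions (C - G)/(C + G) and (A - T)/(A + T) computed over a window of sequence; the function names, window size and step are hypothetical choices and may differ from what the authors used.

```python
def cg_skew(seq):
    """(C - G) / (C + G) for one sequence window; 0.0 if no C or G."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    return (c - g) / (c + g) if (c + g) else 0.0


def at_skew(seq):
    """(A - T) / (A + T) for one sequence window; 0.0 if no A or T."""
    s = seq.upper()
    a, t = s.count("A"), s.count("T")
    return (a - t) / (a + t) if (a + t) else 0.0


def windowed(seq, window, step, fn):
    """Apply a skew function to successive windows along a sequence."""
    return [fn(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]
```

    Plotting the windowed values across an insertion site flank would be one way to look for the pronounced transition in skew that the abstract describes.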

  9. The dimerization motif of the glycophorin A transmembrane segment in membranes: importance of glycine residues.

    PubMed

    Brosig, B; Langosch, D

    1998-04-01

    The glycophorin A transmembrane segment homo-dimerizes to a right-handed pair of alpha-helices. Here, we identified the amino acid motif mediating this interaction within a natural membrane environment. Critical residues were grafted onto two different hydrophobic host sequences in a stepwise manner and self-assembly of the hybrid sequences was determined with the ToxR transcription activator system. Our results show that the motif LIxxGxxxGxxxT elicits a level of self-association equivalent to that of the original glycophorin A transmembrane segment. This motif is very similar to the one previously established in detergent solution. Interestingly, the central GxxxG motif by itself already induced strong self-assembly of host sequences and the three-residue spacing between both glycines proved to be optimal for the interaction. The GxxxG element thus appears to be the most crucial part of the interaction motif.
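
    The motifs named in this abstract, LIxxGxxxGxxxT and its GxxxG core, map directly onto regular expressions if x is read as any residue. The sketch below is an illustration only: the helper name and the test string are assumptions, and a lookahead is used so that overlapping occurrences are also reported.

```python
import re

# 'x' in the motif notation is taken to mean any single residue.
GXXXG = re.compile(r"(?=(G...G))")
FULL_MOTIF = re.compile(r"(?=(LI..G...G...T))")


def find_motif(pattern, seq):
    """Return (0-based start, matched text) for every occurrence,
    including overlapping ones."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]


# Hypothetical transmembrane-like test string, for illustration only.
segment = "ITLIIFGVMAGVIGTILLI"
print(find_motif(FULL_MOTIF, segment))
print(find_motif(GXXXG, segment))
```

    A pattern match of this kind only flags candidate sites; whether a given GxxxG occurrence actually drives self-assembly depends on its sequence context, as the grafting experiments in the abstract indicate.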

  10. The dimerization motif of the glycophorin A transmembrane segment in membranes: importance of glycine residues.

    PubMed Central

    Brosig, B.; Langosch, D.

    1998-01-01

    The glycophorin A transmembrane segment homo-dimerizes to a right-handed pair of alpha-helices. Here, we identified the amino acid motif mediating this interaction within a natural membrane environment. Critical residues were grafted onto two different hydrophobic host sequences in a stepwise manner and self-assembly of the hybrid sequences was determined with the ToxR transcription activator system. Our results show that the motif LIxxGxxxGxxxT elicits a level of self-association equivalent to that of the original glycophorin A transmembrane segment. This motif is very similar to the one previously established in detergent solution. Interestingly, the central GxxxG motif by itself already induced strong self-assembly of host sequences and the three-residue spacing between both glycines proved to be optimal for the interaction. The GxxxG element thus appears to be the most crucial part of the interaction motif. PMID:9568912

  11. Knowledge discovery of multilevel protein motifs

    SciTech Connect

    Conklin, D.; Glasgow, J.; Fortier, S.

    1994-12-31

    A new category of protein motif is introduced. This type of motif captures, in addition to global structure, the nested structure of its component parts. A dataset of four proteins is represented using this scheme. A structured machine discovery procedure is used to discover recurrent amino acid motifs and this knowledge is utilized for the expression of subsequent protein motif discoveries. Examples of discovered multilevel motifs are presented.

  12. Motif types, motif locations and base composition patterns around the RNA polyadenylation site in microorganisms, plants and animals

    PubMed Central

    2014-01-01

    Background The polyadenylation of RNA is critical for gene functioning, but the conserved sequence motifs (often called signal or signature motifs), motif locations and abundances, and base composition patterns around mRNA polyadenylation [poly(A)] sites are still uncharacterized in most species. The evolutionary tendency for poly(A) site selection is still largely unknown. Results We analyzed the poly(A) site regions of 31 species or phyla. Different groups of species showed different poly(A) signal motifs: UUACUU at the poly(A) site in the parasite Trypanosoma cruzi; UGUAAC (approximately 13 bases upstream of the site) in the alga Chlamydomonas reinhardtii; UGUUUG (or UGUUUGUU) at mainly the fourth base downstream of the poly(A) site in the parasite Blastocystis hominis; and AAUAAA at approximately 16 bases and approximately 19 bases upstream of the poly(A) site in animals and plants, respectively. Polyadenylation signal motifs are usually several hundred times more abundant around poly(A) sites than in whole genomes. These predominant motifs usually had very specific locations, whether upstream of, at, or downstream of poly(A) sites, depending on the species or phylum. The poly(A) site was usually an adenosine (A) in all analyzed species except for B. hominis, and there was weak A predominance in C. reinhardtii. Fungi, animals, plants, and the protist Phytophthora infestans shared a general base abundance pattern (or base composition pattern) of “U-rich—A-rich—U-rich—Poly(A) site—U-rich regions”, or U-A-U-A-U for short, with some variation for each kingdom or subkingdom. Conclusion This study identified the poly(A) signal motifs, motif locations, and base composition patterns around mRNA poly(A) sites in protists, fungi, plants, and animals and provided insight into poly(A) site evolution. PMID:25052519
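
    Locating a signal motif relative to a poly(A) site, as described in the Results, amounts to a substring search in the region upstream of the site. The sketch below is a simplified illustration: the function name, the 40-base search window, and the convention of measuring from the first base of the signal to the site are all assumptions, not the authors' procedure.

```python
def signal_offsets(seq, site, signal="AAUAAA", upstream=40):
    """Distances (in bases) from the first base of each signal
    occurrence to the poly(A) site.

    seq    : RNA sequence as a string
    site   : 0-based index of the poly(A) site in seq
    Only occurrences lying entirely within `upstream` bases
    before the site are reported.
    """
    start = max(0, site - upstream)
    region = seq[start:site]
    offsets = []
    pos = region.find(signal)
    while pos != -1:
        offsets.append(site - (start + pos))
        pos = region.find(signal, pos + 1)
    return offsets
```

    Collecting these offsets over many annotated sites and histogramming them would show the kind of position preference the abstract reports, for example a peak near 16 bases upstream in animals.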

  13. Sequential visibility-graph motifs

    NASA Astrophysics Data System (ADS)

    Iacovacci, Jacopo; Lacasa, Lucas

    2016-04-01

    Visibility algorithms transform time series into graphs and encode dynamical information in their topology, paving the way for graph-theoretical time series analysis as well as building a bridge between nonlinear dynamics and network science. In this work we introduce and study the concept of sequential visibility-graph motifs, smaller substructures of n consecutive nodes that appear with characteristic frequencies. We develop a theory to compute in an exact way the motif profiles associated with general classes of deterministic and stochastic dynamics. We find that this simple property is indeed a highly informative and computationally efficient feature capable of distinguishing among different dynamics and robust against noise contamination. We finally confirm that it can be used in practice to perform unsupervised learning, by extracting motif profiles from experimental heart-rate series and being able, accordingly, to disentangle meditative from other relaxation states. Applications of this general theory include the automatic classification and description of physical, biological, and financial time series.
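
    As a rough illustration of the idea in this abstract, the sketch below counts the subgraphs induced on n consecutive nodes of a visibility graph. It assumes the horizontal visibility criterion (two points see each other when every point between them is lower than both), which is only one of the visibility algorithms the abstract alludes to; the function names are hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def horizontally_visible(series, i, j):
    """True if points i < j see each other under horizontal visibility."""
    floor = min(series[i], series[j])
    # Adjacent points have no intermediate values, so they are always linked.
    return all(series[k] < floor for k in range(i + 1, j))


def motif_profile(series, n=3):
    """Relative frequency of each subgraph induced on n consecutive nodes.

    A motif is labelled by its edge set, using node positions 0..n-1
    relative to the start of the window.
    """
    counts = Counter()
    for start in range(len(series) - n + 1):
        edges = frozenset(
            (a, b)
            for a, b in combinations(range(n), 2)
            if horizontally_visible(series, start + a, start + b)
        )
        counts[edges] += 1
    total = sum(counts.values())
    return {motif: c / total for motif, c in counts.items()}


profile = motif_profile([3, 1, 2, 5, 4], n=3)
for motif, freq in profile.items():
    print(sorted(motif), round(freq, 3))
```

    For n = 3 only two motifs can occur under this criterion: a chain of two edges, and a triangle when the middle value lies below both of its neighbours. The resulting profile is the kind of feature vector the abstract describes using to distinguish different dynamics.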

  14. Integrating biogeochemistry with multiomic sequence information in a model oxygen minimum zone

    PubMed Central

    Hawley, Alyse K.; Katsev, Sergei; Torres-Beltran, Monica; Bhatia, Maya P.; Kheirandish, Sam; Michiels, Céline C.; Capelle, David; Lavik, Gaute; Doebeli, Michael; Crowe, Sean A.; Hallam, Steven J.

    2016-01-01

    Microorganisms are the most abundant lifeform on Earth, mediating global fluxes of matter and energy. Over the past decade, high-throughput molecular techniques generating multiomic sequence information (DNA, mRNA, and protein) have transformed our perception of this microcosmos, conceptually linking microorganisms at the individual, population, and community levels to a wide range of ecosystem functions and services. Here, we develop a biogeochemical model that describes metabolic coupling along the redox gradient in Saanich Inlet—a seasonally anoxic fjord with biogeochemistry analogous to oxygen minimum zones (OMZs). The model reproduces measured biogeochemical process rates as well as DNA, mRNA, and protein concentration profiles across the redox gradient. Simulations make predictions about the role of ubiquitous OMZ microorganisms in mediating carbon, nitrogen, and sulfur cycling. For example, nitrite “leakage” during incomplete sulfide-driven denitrification by SUP05 Gammaproteobacteria is predicted to support inorganic carbon fixation and intense nitrogen loss via anaerobic ammonium oxidation. This coupling creates a metabolic niche for nitrous oxide reduction that completes denitrification by currently unidentified community members. These results quantitatively improve previous conceptual models describing microbial metabolic networks in OMZs. Beyond OMZ-specific predictions, model results indicate that geochemical fluxes are robust indicators of microbial community structure and reciprocally, that gene abundances and geochemical conditions largely determine gene expression patterns. The integration of real observational data, including geochemical profiles and process rate measurements as well as metagenomic, metatranscriptomic and metaproteomic sequence data, into a biogeochemical model, as shown here, enables holistic insight into the microbial metabolic network driving nutrient and energy flow at ecosystem scales. PMID:27655888

  15. Integrating biogeochemistry with multiomic sequence information in a model oxygen minimum zone.

    PubMed

    Louca, Stilianos; Hawley, Alyse K; Katsev, Sergei; Torres-Beltran, Monica; Bhatia, Maya P; Kheirandish, Sam; Michiels, Céline C; Capelle, David; Lavik, Gaute; Doebeli, Michael; Crowe, Sean A; Hallam, Steven J

    2016-10-04

    Microorganisms are the most abundant lifeform on Earth, mediating global fluxes of matter and energy. Over the past decade, high-throughput molecular techniques generating multiomic sequence information (DNA, mRNA, and protein) have transformed our perception of this microcosmos, conceptually linking microorganisms at the individual, population, and community levels to a wide range of ecosystem functions and services. Here, we develop a biogeochemical model that describes metabolic coupling along the redox gradient in Saanich Inlet-a seasonally anoxic fjord with biogeochemistry analogous to oxygen minimum zones (OMZs). The model reproduces measured biogeochemical process rates as well as DNA, mRNA, and protein concentration profiles across the redox gradient. Simulations make predictions about the role of ubiquitous OMZ microorganisms in mediating carbon, nitrogen, and sulfur cycling. For example, nitrite "leakage" during incomplete sulfide-driven denitrification by SUP05 Gammaproteobacteria is predicted to support inorganic carbon fixation and intense nitrogen loss via anaerobic ammonium oxidation. This coupling creates a metabolic niche for nitrous oxide reduction that completes denitrification by currently unidentified community members. These results quantitatively improve previous conceptual models describing microbial metabolic networks in OMZs. Beyond OMZ-specific predictions, model results indicate that geochemical fluxes are robust indicators of microbial community structure and reciprocally, that gene abundances and geochemical conditions largely determine gene expression patterns. The integration of real observational data, including geochemical profiles and process rate measurements as well as metagenomic, metatranscriptomic and metaproteomic sequence data, into a biogeochemical model, as shown here, enables holistic insight into the microbial metabolic network driving nutrient and energy flow at ecosystem scales.

  16. Interstitial Telomeric Motifs in Squamate Reptiles: When the Exceptions Outnumber the Rule

    PubMed Central

    Rovatsos, Michail; Kratochvíl, Lukáš; Altmanová, Marie; Johnson Pokorná, Martina

    2015-01-01

    Telomeres are nucleoprotein complexes protecting the physical ends of linear eukaryotic chromosomes and therefore helping to ensure their stability and integrity. Additionally, telomeric sequences can be localized in non-terminal regions of chromosomes, forming so-called interstitial telomeric sequences (ITSs). ITSs are traditionally considered to be relics of chromosomal rearrangements and thus very informative in the reconstruction of the evolutionary history of karyotype formation. We examined the distribution of the telomeric motifs (TTAGGG)n using fluorescence in situ hybridization (FISH) in 30 species, representing 17 families of squamate reptiles, and compared them with the collected data from another 38 species from literature. Out of the 68 squamate species analyzed, 35 possess ITSs in pericentromeric regions, centromeric regions and/or within chromosome arms. We conclude that the occurrence of ITSs is rather common in squamates, despite their generally conserved karyotypes, suggesting frequent and independent cryptic chromosomal rearrangements in this vertebrate group. PMID:26252002

  17. Interstitial Telomeric Motifs in Squamate Reptiles: When the Exceptions Outnumber the Rule.

    PubMed

    Rovatsos, Michail; Kratochvíl, Lukáš; Altmanová, Marie; Johnson Pokorná, Martina

    2015-01-01

    Telomeres are nucleoprotein complexes protecting the physical ends of linear eukaryotic chromosomes and therefore helping to ensure their stability and integrity. Additionally, telomeric sequences can be localized in non-terminal regions of chromosomes, forming so-called interstitial telomeric sequences (ITSs). ITSs are traditionally considered to be relics of chromosomal rearrangements and thus very informative in the reconstruction of the evolutionary history of karyotype formation. We examined the distribution of the telomeric motifs (TTAGGG)n using fluorescence in situ hybridization (FISH) in 30 species, representing 17 families of squamate reptiles, and compared them with the collected data from another 38 species from literature. Out of the 68 squamate species analyzed, 35 possess ITSs in pericentromeric regions, centromeric regions and/or within chromosome arms. We conclude that the occurrence of ITSs is rather common in squamates, despite their generally conserved karyotypes, suggesting frequent and independent cryptic chromosomal rearrangements in this vertebrate group.

  18. Rigorous assessment and integration of the sequence and structure based features to predict hot spots

    PubMed Central

    2011-01-01

    effective in predicting hot spots based on sequence features. Hot spots cannot be fully predicted through simple analysis based on physicochemical characteristics, but there is reason to believe that integration of features and machine learning methods can remarkably improve the predictive performance for hot spots. PMID:21798070

  19. Construction of an integrated pepper map using RFLP, SSR, CAPS, AFLP, WRKY, rRAMP, and BAC end sequences.

    PubMed

    Lee, Heung-Ryul; Bae, Ik-Hyun; Park, Soung-Woo; Kim, Hyoun-Joung; Min, Woong-Ki; Han, Jung-Heon; Kim, Ki-Taek; Kim, Byung-Dong

    2009-01-31

    Map-based cloning to find genes of interest, marker-assisted selection (MAS), and marker-assisted breeding (MAB) all require good genetic maps with highly reproducible markers. For map construction as well as chromosome assignment, the development of single-copy PCR-based markers and a map integration process are necessary. In this study, 132 markers (57 STS from BAC-end sequences, 13 STS from RFLP, and 62 SSR) were newly developed as single-copy PCR-based markers. They were used together with 1830 markers previously developed in our lab to construct an integrated map with the Joinmap 3.0 program. This integrated map contained 169 SSR, 354 RFLP, 23 STS from BAC-end sequences, 6 STS from RFLP, 152 AFLP, 51 WRKY, and 99 rRAMP markers on 12 chromosomes. The integrated map contained four genetic maps of two interspecific (Capsicum annuum 'TF68' and C. chinense 'Habanero') and two intraspecific (C. annuum 'CM334' and C. annuum 'Chilsungcho') populations of peppers. The integrated map consisted of 805 markers (map distance of 1858 cM) in interspecific populations and 745 markers (map distance of 1892 cM) in intraspecific populations. The pepper STS markers used were first developed from end sequences of BAC clones from Capsicum annuum 'CM334'. This integrated map will provide useful information for construction of future pepper genetic maps and for assignment of linkage groups to pepper chromosomes.

  20. Revealing constitutively expressed resistance genes in Agrostis species using PCR-based motif-directed RNA fingerprinting.

    PubMed

    Budak, Hikmet; Su, Senem; Ergen, Neslihan

    2006-12-01

    Agrostis species are mainly used in athletic fields and golf courses. Their integrity is maintained by fungicides, which makes the development of disease-resistant varieties a high priority. However, there is a lack of knowledge about resistance (R) genes and their use for genetic improvement in Agrostis species. The objective of this study was to identify and clone constitutively expressed cDNAs encoding R gene-like (RGL) sequences from three Agrostis species (colonial bentgrass (A. capillaris L.), creeping bentgrass (A. stolonifera L.) and velvet bentgrass (A. canina L.)) by PCR-based motif-directed RNA fingerprinting towards relatively conserved nucleotide binding site (NBS) domains. Sixty-one constitutively expressed cDNA sequences were identified and characterized. Sequence analysis of ESTs and probable translation products revealed that RGLs are highly conserved among these three Agrostis species. Fifteen of them were shown to share conserved motifs found in other plant disease resistance genes such as MLA13, Xa1, YR6, YR23 and RPP5. The molecular evolutionary forces, analysed using the Ka/Ks ratio, reflected purifying selection both on NBS and leucine-rich repeat (LRR) intervening regions of discovered RGL sequences in these species. This study presents, for the first time, isolation and characterization of constitutively expressed RGL sequences from Agrostis species revealing the presence of TNL (TIR-NBS-LRR) type R genes in monocot plants. The characterized RGLs will further enhance knowledge on the molecular evolution of the R gene family in grasses.

  1. Integration of latex protein sequence data provides comprehensive functional overview of latex proteins.

    PubMed

    Cho, Won Kyong; Jo, Yeonhwa; Chu, Hyosub; Park, Sang-Ho; Kim, Kook-Hyung

    2014-03-01

    The laticiferous system is one of the most important conduit systems in higher plants, which produces a milky sap known as latex. Latex contains diverse secondary metabolites with various ecological functions. To obtain a comprehensive overview of the latex proteome, we integrated available latex protein sequences and constructed a comprehensive dataset composed of 1,208 non-redundant latex proteins from 20 latex-bearing plants. The results of functional analyses revealed that latex proteins are involved in various biological processes, including transcription, translation, protein degradation and the plant response to environmental stimuli. The results of the comparative analysis showed that the functions of the latex proteins are similar to those of phloem, suggesting the functional conservation of plant vascular proteins. The presence of latex proteins in mitochondria and plastids suggests the production of diverse secondary metabolites. Furthermore, using a BLAST search, we identified 854 homologous latex proteins in eight plant species, including three latex-bearing plants (papaya, castor bean and cassava), suggesting that latex proteins were newly evolved in vascular plants. Taken together, this study is the largest and most comprehensive in silico analysis of the latex proteome. The results obtained here provide useful resources and information for characterizing the evolution of the latex proteome.

  2. Description of the PMAD DC test bed architecture and integration sequence

    NASA Technical Reports Server (NTRS)

    Beach, R. F.; Trash, L.; Fong, D.; Bolerjack, B.

    1991-01-01

    NASA-Lewis is responsible for the development, fabrication, and assembly of the electric power system (EPS) for the Space Station Freedom (SSF). The SSF power system is radically different from previous spacecraft power systems in both the size and complexity of the system. Unlike past spacecraft power systems, the SSF EPS will grow and be maintained on orbit and must be flexible to meet changing user power needs. The SSF power system is also unique in comparison with terrestrial power systems because it is dominated by power electronic converters which regulate and control the power. Although spacecraft historically have used power converters for regulation, they typically involved only a single series regulating element. The SSF EPS involves multiple regulating elements, two or more in series, prior to the load. These unique system features required the construction of a testbed which would allow the development of spacecraft power system technology. A description is provided of the Power Management and Distribution (PMAD) DC Testbed which was assembled to support the design and early evaluation of the SSF EPS. A description of the integration process used in the assembly sequence is also given along with a description of the support facility.

  3. Integrative analyses of transcriptome sequencing identify novel functional lncRNAs in esophageal squamous cell carcinoma

    PubMed Central

    Li, C-Q; Huang, G-W; Wu, Z-Y; Xu, Y-J; Li, X-C; Xue, Y-J; Zhu, Y; Zhao, J-M; Li, M; Zhang, J; Wu, J-Y; Lei, F; Wang, Q-Y; Li, S; Zheng, C-P; Ai, B; Tang, Z-D; Feng, C-C; Liao, L-D; Wang, S-H; Shen, J-H; Liu, Y-J; Bai, X-F; He, J-Z; Cao, H-H; Wu, B-L; Wang, M-R; Lin, D-C; Koeffler, H P; Wang, L-D; Li, X; Li, E-M; Xu, L-Y

    2017-01-01

    Long non-coding RNAs (lncRNAs) have a critical role in cancer initiation and progression, and thus may mediate oncogenic or tumor suppressing effects, as well as be a new class of cancer therapeutic targets. We performed high-throughput sequencing of RNA (RNA-seq) to investigate the expression level of lncRNAs and protein-coding genes in 30 esophageal samples, comprised of 15 esophageal squamous cell carcinoma (ESCC) samples and their 15 paired non-tumor tissues. We further developed an integrative bioinformatics method, denoted URW-LPE, to identify key functional lncRNAs that regulate expression of downstream protein-coding genes in ESCC. A number of known onco-lncRNA and many putative novel ones were effectively identified by URW-LPE. Importantly, we identified lncRNA625 as a novel regulator of ESCC cell proliferation, invasion and migration. ESCC patients with high lncRNA625 expression had significantly shorter survival time than those with low expression. LncRNA625 also showed specific prognostic value for patients with metastatic ESCC. Finally, we identified E1A-binding protein p300 (EP300) as a downstream executor of lncRNA625-induced transcriptional responses. These findings establish a catalog of novel cancer-associated functional lncRNAs, which will promote our understanding of lncRNA-mediated regulation in this malignancy. PMID:28194033

  4. Waves and Particles, The Orbital Atom, Parts One and Two of an Integrated Science Sequence, Teacher's Guide, 1973 Edition.

    ERIC Educational Resources Information Center

    Portland Project Committee, OR.

    This teacher's guide includes parts one and two of the four-part third year Portland Project, a three-year integrated secondary science curriculum sequence. The Harvard Project Physics textbook is used for reading assignments for part one. Assignments relate to waves, light, electricity, magnetic fields, Faraday and the electrical age,…

  5. Motion and Energy Chemical Reactions, Parts One and Two of an Integrated Science Sequence, Teacher's Guide, 1973 Edition.

    ERIC Educational Resources Information Center

    Portland Project Committee, OR.

    This teacher's guide is for the second year of the Portland Project, a three-year integrated secondary science curriculum sequence. The first of two parts in this volume, "Motion and Energy," begins with the study of motion, going from the quantitative description to a consideration of what causes motion and a discussion of Newton's…

  6. Mice and Men Environmental Balance, Parts Three and Four of an Integrated Science Sequence, Teacher's Guide, 1970 Edition.

    ERIC Educational Resources Information Center

    Portland Project Committee, OR.

    This teacher's guide contains parts three and four of the four-part first year Portland Project, a three-year secondary integrated science curriculum sequence. Part three of the guide deals with topics such as the cell, reproduction, embryology, genetics, genetic diseases, genetics and change, populations, effects of density on populations,…

  7. Chemistry of Living Matter, Energy Capture & Growth, Parts Three & Four of an Integrated Science Sequence, Teacher's Guide, 1973 Edition.

    ERIC Educational Resources Information Center

    Portland Project Committee, OR.

    This teacher's guide includes parts three and four of the four-part third year Portland Project, a three-year integrated secondary science curriculum sequence. The underlying intention of the third year is to study energy and its importance to life. Energy-related concepts considered in year one and two, and the concepts related to atomic…

  8. Teaching Note--Integrating Theory and Research Methods in a First-Year Doctoral Sequence or Program

    ERIC Educational Resources Information Center

    Pollio, David E.; MacNeil, Gordon; Womack, Bethany; Brazeal, Michelle; Church, Wesley T., II

    2016-01-01

    This teaching note describes an innovative process in which faculty members worked collaboratively to create an integrated three-course sequence of requisite course content in a PhD program, developed complementary assignments, and coordinated a classroom experience that led to the creation of an individualized area statement and eventual…

  9. SLiMDisc: short, linear motif discovery, correcting for common evolutionary descent

    PubMed Central

    Davey, Norman E.; Shields, Denis C.; Edwards, Richard J.

    2006-01-01

    Many important interactions of proteins are facilitated by short, linear motifs (SLiMs) within a protein's primary sequence. Our aim was to establish robust methods for discovering putative functional motifs. The strongest evidence for such motifs is obtained when the same motifs occur in unrelated proteins, evolving by convergence. In practice, searches for such motifs are often swamped by motifs shared in related proteins that are identical by descent. Predictions of motifs among sets of biologically related proteins, including those both with and without detectable similarity, were made using the TEIRESIAS algorithm. The number of motif occurrences arising through common evolutionary descent was normalized based on treatment of BLAST local alignments. Motifs were ranked according to a score derived from the product of the normalized number of occurrences and the information content. The method was shown to significantly outperform methods that do not discount evolutionary relatedness, when applied to known SLiMs from a subset of the eukaryotic linear motif (ELM) database. An implementation of Multiple Spanning Tree weighting outperformed two other weighting schemes, in a variety of settings. PMID:16855291
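
    The ranking rule stated in this abstract (score = normalized number of occurrences × information content) can be sketched as follows. Both ingredients are simplified assumptions rather than the published method: occurrences are counted once per family of related proteins as a stand-in for the BLAST-based normalization, and information content assumes a uniform background over 20 amino acids. All names and the toy data are hypothetical.

```python
import math

AMINO_ACIDS = 20  # size of the standard amino-acid alphabet


def information_content(pattern):
    """Bits in a motif given as a list of allowed-residue sets.

    None marks a wildcard position, which contributes 0 bits.
    Assumes a uniform residue background (a simplification).
    """
    return sum(
        math.log2(AMINO_ACIDS / len(allowed))
        for allowed in pattern
        if allowed is not None
    )


def normalized_support(proteins, family_of):
    """Count a motif once per family of related proteins (assumed scheme)."""
    return len({family_of[p] for p in proteins})


def rank_motifs(motifs, family_of):
    """Return (score, name) pairs sorted from highest to lowest score."""
    scored = [
        (normalized_support(proteins, family_of) * information_content(pattern), name)
        for name, (pattern, proteins) in motifs.items()
    ]
    return sorted(scored, reverse=True)


# Toy input: proteins A and B are related (same family), C is unrelated.
family_of = {"A": 1, "B": 1, "C": 2}
motifs = {
    "PxxP": ([{"P"}, None, None, {"P"}], ["A", "B", "C"]),
    "[KR]L": ([{"K", "R"}, {"L"}], ["A", "B"]),
}
for score, name in rank_motifs(motifs, family_of):
    print(f"{name}: {score:.2f}")
```

    In this toy case the motif found only in the two related proteins is credited with a single occurrence, so it ranks below the motif that also appears in the unrelated protein. That is the effect the abstract attributes to discounting evolutionary relatedness.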

  10. Efficient Binding of the NOS1AP C-Terminus to the nNOS PDZ Pocket Requires the Concerted Action of the PDZ Ligand Motif, the Internal ExF Site and Structural Integrity of an Independent Element

    PubMed Central

    Li, Li-Li; Cisek, Katryna; Courtney, Michael J.

    2017-01-01

    Neuronal nitric oxide synthase is widely regarded as an important contributor to a number of disorders of excitable tissues. Recently the adaptor protein NOS1AP has emerged as a contributor to several nNOS-linked conditions. As a consequence, the unexpectedly complex mechanisms of interaction between nNOS and its effector NOS1AP have become a particularly interesting topic from the point of view of both basic research and the potential for therapeutic applications. Here we demonstrate that the concerted action of two previously described motif regions contributing to the interaction of nNOS with NOS1AP, the ExF region and the PDZ ligand motif, efficiently excludes an alternate ligand from the nNOS-PDZ ligand-binding pocket. Moreover, we identify an additional element with a denaturable structure that contributes to interaction of NOS1AP with nNOS. Denaturation does not affect the functions of the individual motifs and results in a relatively mild drop, ∼3-fold, of overall binding affinity of the C-terminal region of NOS1AP for nNOS. However, denaturation selectively prevents the concerted action of the two motifs that normally results in efficient occlusion of the PDZ ligand-binding pocket, and results in 30-fold reduction of competition between NOS1AP and an alternate PDZ ligand. PMID:28360833

  11. Neural Circuits: Male Mating Motifs.

    PubMed

    Benton, Richard

    2015-09-02

    Characterizing microcircuit motifs in intact nervous systems is essential to relate neural computations to behavior. In this issue of Neuron, Clowney et al. (2015) identify recurring, parallel feedforward excitatory and inhibitory pathways in male Drosophila's courtship circuitry, which might explain decisive mate choice.

  12. Integration of Seismic Sequence Analysis and High Resolution Sequence Stratigraphy for Delineating the Sedimentation Characteristics and Modeling of Baltim Area, Off-Shore Nile Delta, Egypt

    NASA Astrophysics Data System (ADS)

    Nasr El-Deen Badawy, A. M. E. S.; Abu El-Ata, A. S. A.; El-Gendy, N. H.

    2014-12-01

    This study discusses the Messinian prospectivity of the study area, which is located in the offshore Nile Delta, about 25 km from the Mediterranean Sea shoreline. An integrated exploration approach was applied, using a variety of 2D/3D seismic data, subsurface borehole geologic and log data from selected wells distributed across the study area, and geophysical and biostratigraphic data. The well data comprise well markers and electric logs, while the geological data are represented by lithostratigraphic information and ditch-sample analysis of the studied interval. The geophysical data include check shots, VSP, velocity cubes and 3D seismic lines. The biostratigraphic data include biozones, benthonic-to-planktonic ratios, nannofossils and foraminiferal data. Seismic interpretation and seismic stratigraphic analysis, in the form of seismic sequence analysis, seismic facies analysis, seismic unit analysis and geologic confirmation, were carried out with the aid of the Petrel and Kingdom software packages. The seismic lines were interpreted to define the different parasequences and to pick the various smaller sequences for mapping; picking each sequence from the seismic correlation facilitated the lateral mapping of every sequence. In addition, the structures and isopach of every sequence were interpreted, and the seismic attributes for every sequence made it possible to extract the sands present in each sequence and to study the extent of the sands that act as reservoirs. The integration of all results served as the basis for producing the various models of the study area. The first was the depositional environment model, which showed that the area varies from intertidal-littoral in the south at the Nidoco wells to inner-middle neritic at the Baltim East wells, then to outer neritic, and changes to bathyal and then abyssal in the extreme north. The geologic model for the area was constructed

  13. Off-line consolidation of motor sequence learning results in greater integration within a cortico-striatal functional network.

    PubMed

    Debas, Karen; Carrier, Julie; Barakat, Marc; Marrelec, Guillaume; Bellec, Pierre; Hadj Tahar, Abdallah; Karni, Avi; Ungerleider, Leslie G; Benali, Habib; Doyon, Julien

    2014-10-01

    The consolidation of motor sequence learning is known to depend on sleep. Work in our laboratory and others has shown that the striatum is associated with this off-line consolidation process. In this study, we aimed to quantify the sleep-dependent dynamic changes occurring at the network level using a measure of functional integration. We directly compared changes in connectivity before and after sleep or the simple passage of daytime. As predicted, the results revealed greater integration within the cortico-striatal network after sleep, but not after an equivalent daytime period. Importantly, a similar pattern of results was also observed using a data-driven approach; the increase in integration being specific to a cortico-striatal network, but not to other known functional networks. These findings reveal, for the first time, a new signature of motor sequence consolidation: a greater between-regions interaction within the cortico-striatal system.

  14. Onco-Regulon: an integrated database and software suite for site specific targeting of transcription factors of cancer genes

    PubMed Central

    Tomar, Navneet; Mishra, Akhilesh; Mrinal, Nirotpal; Jayaram, B.

    2016-01-01

    Transcription factors (TFs) bind at multiple sites in the genome and regulate expression of many genes. Regulating TF binding in a gene specific manner remains a formidable challenge in drug discovery because the same binding motif may be present at multiple locations in the genome. Here, we present Onco-Regulon (http://www.scfbio-iitd.res.in/software/onco/NavSite/index.htm), an integrated database of regulatory motifs of cancer genes clubbed with Unique Sequence-Predictor (USP) a software suite that identifies unique sequences for each of these regulatory DNA motifs at the specified position in the genome. USP works by extending a given DNA motif, in 5′→3′, 3′ →5′ or both directions by adding one nucleotide at each step, and calculates the frequency of each extended motif in the genome by Frequency Counter programme. This step is iterated till the frequency of the extended motif becomes unity in the genome. Thus, for each given motif, we get three possible unique sequences. Closest Sequence Finder program predicts off-target drug binding in the genome. Inclusion of DNA-Protein structural information further makes Onco-Regulon a highly informative repository for gene specific drug development. We believe that Onco-Regulon will help researchers to design drugs which will bind to an exclusive site in the genome with no off-target effects, theoretically. Database URL: http://www.scfbio-iitd.res.in/software/onco/NavSite/index.htm PMID:27515825
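
    The extension procedure described for USP (grow a motif one nucleotide per step until it occurs exactly once in the genome) can be sketched as below. This is a minimal illustration under stated assumptions, not the Onco-Regulon implementation: it scans a single strand of an in-memory string, and in "both" mode it adds one nucleotide on each side per step. The function names and the toy genome are hypothetical.

```python
def count_occurrences(genome, pattern):
    """Count possibly overlapping occurrences of pattern in genome."""
    count = 0
    pos = genome.find(pattern)
    while pos != -1:
        count += 1
        pos = genome.find(pattern, pos + 1)
    return count


def extend_until_unique(genome, start, end, direction="both"):
    """Grow genome[start:end] until it occurs exactly once in the genome.

    direction: "right" (5'->3'), "left" (3'->5') or "both".
    Returns the (start, end) slice bounds of the unique sequence, or None
    if the edge of the genome is reached before the sequence is unique.
    """
    if direction not in ("right", "left", "both"):
        raise ValueError("direction must be 'right', 'left' or 'both'")
    while count_occurrences(genome, genome[start:end]) > 1:
        grew = False
        if direction in ("right", "both") and end < len(genome):
            end += 1
            grew = True
        if direction in ("left", "both") and start > 0:
            start -= 1
            grew = True
        if not grew:
            return None
    return start, end


genome = "TTACGAGGACGTCC"  # toy sequence; "ACG" occurs at positions 2 and 8
for direction in ("right", "left", "both"):
    bounds = extend_until_unique(genome, 2, 5, direction)
    print(direction, bounds, genome[bounds[0]:bounds[1]])
```

    Running the three directions for one motif instance yields the three candidate unique sequences mentioned in the abstract. A real genome would need an index rather than repeated linear scans, and would probably need to account for the reverse strand as well.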

  15. Onco-Regulon: an integrated database and software suite for site specific targeting of transcription factors of cancer genes.

    PubMed

    Tomar, Navneet; Mishra, Akhilesh; Mrinal, Nirotpal; Jayaram, B

    2016-01-01

    Transcription factors (TFs) bind at multiple sites in the genome and regulate expression of many genes. Regulating TF binding in a gene specific manner remains a formidable challenge in drug discovery because the same binding motif may be present at multiple locations in the genome. Here, we present Onco-Regulon (http://www.scfbio-iitd.res.in/software/onco/NavSite/index.htm), an integrated database of regulatory motifs of cancer genes clubbed with Unique Sequence-Predictor (USP) a software suite that identifies unique sequences for each of these regulatory DNA motifs at the specified position in the genome. USP works by extending a given DNA motif, in 5'→3', 3' →5' or both directions by adding one nucleotide at each step, and calculates the frequency of each extended motif in the genome by Frequency Counter programme. This step is iterated till the frequency of the extended motif becomes unity in the genome. Thus, for each given motif, we get three possible unique sequences. Closest Sequence Finder program predicts off-target drug binding in the genome. Inclusion of DNA-Protein structural information further makes Onco-Regulon a highly informative repository for gene specific drug development. We believe that Onco-Regulon will help researchers to design drugs which will bind to an exclusive site in the genome with no off-target effects, theoretically. Database URL: http://www.scfbio-iitd.res.in/software/onco/NavSite/index.htm.

  16. Biophysical analysis of binding of WW domains of the YAP2 transcriptional regulator to PPXY motifs within WBP1 and WBP2 adaptors.

    PubMed

    McDonald, Caleb B; McIntosh, Samantha K N; Mikles, David C; Bhat, Vikas; Deegan, Brian J; Seldeen, Kenneth L; Saeed, Ali M; Buffa, Laura; Sudol, Marius; Nawaz, Zafar; Farooq, Amjad

    2011-11-08

    The YAP2 transcriptional regulator mediates a plethora of cellular functions, including the newly discovered Hippo tumor suppressor pathway, by virtue of its ability to recognize WBP1 and WBP2 signaling adaptors among a wide variety of other ligands. Herein, using isothermal titration calorimetry and circular dichroism in combination with molecular modeling and molecular dynamics, we provide evidence that the WW1 and WW2 domains of YAP2 recognize various PPXY motifs within WBP1 and WBP2 in a highly promiscuous and subtle manner. Thus, although both WW domains strictly require the integrity of the consensus PPXY sequence, nonconsensus residues within and flanking this motif are not critical for high-affinity binding, implying that they most likely play a role in stabilizing the polyproline type II helical conformation of the PPXY ligands. Of particular interest is the observation that both WW domains bind to a PPXYXG motif with highest affinity, implicating a preference for a nonbulky and flexible glycine one residue to the C-terminal side of the consensus tyrosine. Importantly, a large set of residues within both WW domains and the PPXY motifs appear to undergo rapid fluctuations on a nanosecond time scale, suggesting that WW-ligand interactions are highly dynamic and that such conformational entropy may be an integral part of the reversible and temporal nature of cellular signaling cascades. Collectively, our study sheds light on the molecular determinants of a key WW-ligand interaction pertinent to cellular functions in health and disease.

  17. BEAUTY: an enhanced BLAST-based search tool that integrates multiple biological information resources into sequence similarity search results.

    PubMed

    Worley, K C; Wiese, B A; Smith, R F

    1995-09-01

    BEAUTY (BLAST enhanced alignment utility) is an enhanced version of the NCBI's BLAST data base search tool that facilitates identification of the functions of matched sequences. We have created new data bases of conserved regions and functional domains for protein sequences in NCBI's Entrez data base, and BEAUTY allows this information to be incorporated directly into BLAST search results. A Conserved Regions Data Base, containing the locations of conserved regions within Entrez protein sequences, was constructed by (1) clustering the entire data base into families, (2) aligning each family using our PIMA multiple sequence alignment program, and (3) scanning the multiple alignments to locate the conserved regions within each aligned sequence. A separate Annotated Domains Data Base was constructed by extracting the locations of all annotated domains and sites from sequences represented in the Entrez, PROSITE, BLOCKS, and PRINTS data bases. BEAUTY performs a BLAST search of those Entrez sequences with conserved regions and/or annotated domains. BEAUTY then uses the information from the Conserved Regions and Annotated Domains data bases to generate, for each matched sequence, a schematic display that allows one to directly compare the relative locations of (1) the conserved regions, (2) annotated domains and sites, and (3) the locally aligned regions matched in the BLAST search. In addition, BEAUTY search results include World-Wide Web hypertext links to a number of external data bases that provide a variety of additional types of information on the function of matched sequences. This convenient integration of protein families, conserved regions, annotated domains, alignment displays, and World-Wide Web resources greatly enhances the biological informativeness of sequence similarity searches. 
    BEAUTY searches can be performed remotely on our system using the "BCM Search Launcher" World-Wide Web pages (URL: http://gc.bcm.tmc.edu:8088/search-launcher/launcher.html).

  18. A novel approach to identifying regulatory motifs in distantly related genomes

    PubMed Central

    Van Hellemont, Ruth; Monsieurs, Pieter; Thijs, Gert; De Moor, Bart; Van de Peer, Yves; Marchal, Kathleen

    2005-01-01

    Although proven successful in the identification of regulatory motifs, phylogenetic footprinting methods still show some shortcomings. To assess these difficulties, most apparent when applying phylogenetic footprinting to distantly related organisms, we developed a two-step procedure that combines the advantages of sequence alignment and motif detection approaches. The results on well-studied benchmark datasets indicate that the presented method outperforms other methods when the sequences become either too long or too heterogeneous in size. PMID:16420672

  19. Conservation defines functional motifs in the squint/nodal-related 1 RNA dorsal localization element

    PubMed Central

    Gilligan, Patrick C.; Kumari, Pooja; Lim, Shimin; Cheong, Albert; Chang, Alex; Sampath, Karuna

    2011-01-01

    RNA localization is emerging as a general principle of sub-cellular protein localization and cellular organization. However, the sequence and structural requirements in many RNA localization elements remain poorly understood. Whereas transcription factor-binding sites in DNA can be recognized as short degenerate motifs, and consensus binding sites readily inferred, protein-binding sites in RNA often contain structural features, and can be difficult to infer. We previously showed that zebrafish squint/nodal-related 1 (sqt/ndr1) RNA localizes to the future dorsal side of the embryo. Interestingly, mammalian nodal RNA can also localize to dorsal when injected into zebrafish embryos, suggesting that the sequence motif(s) may be conserved, even though the fish and mammal UTRs cannot be aligned. To define potential sequence and structural features, we obtained ndr1 3′-UTR sequences from approximately 50 fishes that are closely, or distantly, related to zebrafish, for high-resolution phylogenetic footprinting. We identify conserved sequence and structural motifs within the zebrafish/carp family and catfish. We find that two novel motifs, a single-stranded AGCAC motif and a small stem-loop, are required for efficient sqt RNA localization. These findings show that comparative sequencing in the zebrafish/carp family is an efficient approach for identifying weak consensus binding sites for RNA regulatory proteins. PMID:21149265

  20. Non-integrating lentiviral vectors based on the minimal S/MAR sequence retain transgene expression in dividing cells.

    PubMed

    Xu, Zhen; Chen, Feng; Zhang, Lingling; Lu, Jing; Xu, Peng; Liu, Guang; Xie, Xuemin; Mu, Wenli; Wang, Yajun; Liu, Depei

    2016-10-01

    Safe and efficient gene transfer systems are the basis of gene therapy applications. Non-integrating lentiviral (NIL) vectors are among the most promising candidates for gene transfer tools, because they exhibit high transfer efficiency in both dividing and non-dividing cells and do not present a risk of insertional mutagenesis. However, non-integrating lentiviral vectors cannot introduce stable exogenous gene expression to dividing cells, thereby limiting their application. Here, we report the design of a non-integrating lentiviral vector that contains the minimal scaffold/matrix attachment region (S/MAR) sequence (SNIL), and this SNIL vector is able to retain episomal transgene expression in dividing cells. Using SNIL vectors, we detected the expression of the eGFP gene for 61 days in SNIL-transduced stable CHO cells, either with selection or not. In the NIL group without the S/MAR sequence, however, the transduced cells died under selection for the transient expression of NIL vectors. Furthermore, Southern blot assays demonstrated that the SNIL vectors were retained extrachromosomally in the CHO cells. In conclusion, the minimal S/MAR sequence retained the non-integrating lentiviral vectors in dividing cells, which indicates that SNIL vectors have the potential for use as a gene transfer tool.

  1. Recurrent mutation of the ID3 gene in Burkitt lymphoma identified by integrated genome, exome and transcriptome sequencing.

    PubMed

    Richter, Julia; Schlesner, Matthias; Hoffmann, Steve; Kreuz, Markus; Leich, Ellen; Burkhardt, Birgit; Rosolowski, Maciej; Ammerpohl, Ole; Wagener, Rabea; Bernhart, Stephan H; Lenze, Dido; Szczepanowski, Monika; Paulsen, Maren; Lipinski, Simone; Russell, Robert B; Adam-Klages, Sabine; Apic, Gordana; Claviez, Alexander; Hasenclever, Dirk; Hovestadt, Volker; Hornig, Nadine; Korbel, Jan O; Kube, Dieter; Langenberger, David; Lawerenz, Chris; Lisfeld, Jasmin; Meyer, Katharina; Picelli, Simone; Pischimarov, Jordan; Radlwimmer, Bernhard; Rausch, Tobias; Rohde, Marius; Schilhabel, Markus; Scholtysik, René; Spang, Rainer; Trautmann, Heiko; Zenz, Thorsten; Borkhardt, Arndt; Drexler, Hans G; Möller, Peter; MacLeod, Roderick A F; Pott, Christiane; Schreiber, Stefan; Trümper, Lorenz; Loeffler, Markus; Stadler, Peter F; Lichter, Peter; Eils, Roland; Küppers, Ralf; Hummel, Michael; Klapper, Wolfram; Rosenstiel, Philip; Rosenwald, Andreas; Brors, Benedikt; Siebert, Reiner

    2012-12-01

    Burkitt lymphoma is a mature aggressive B-cell lymphoma derived from germinal center B cells. Its cytogenetic hallmark is the Burkitt translocation t(8;14)(q24;q32) and its variants, which juxtapose the MYC oncogene with one of the three immunoglobulin loci. Consequently, MYC is deregulated, resulting in massive perturbation of gene expression. Nevertheless, MYC deregulation alone seems not to be sufficient to drive Burkitt lymphomagenesis. By whole-genome, whole-exome and transcriptome sequencing of four prototypical Burkitt lymphomas with immunoglobulin gene (IG)-MYC translocation, we identified seven recurrently mutated genes. One of these genes, ID3, mapped to a region of focal homozygous loss in Burkitt lymphoma. In an extended cohort, 36 of 53 molecularly defined Burkitt lymphomas (68%) carried potentially damaging mutations of ID3. These were strongly enriched at somatic hypermutation motifs. Only 6 of 47 other B-cell lymphomas with the IG-MYC translocation (13%) carried ID3 mutations. These findings suggest that cooperation between ID3 inactivation and IG-MYC translocation is a hallmark of Burkitt lymphomagenesis.

  2. Monoclonal antibody specific to a subclass of polyproline-Arg motif provides evidence for the presence of an snRNA-free spliceosomal Sm protein complex in vivo: implications for molecular interactions involving proline-rich sequences of Sm B/B' proteins.

    PubMed

    Filali, M; Qiu, J; Awasthi, S; Fischer, U; Monos, D; Kamoun, M

    1999-08-01

    The human spliceosomal Sm B/B' proteins are essential for the biogenesis of the snRNP particles. B/B' proteins contain several clusters of the PPPPGM/IR sequence, which occurs within the C-terminus of Sm B/B'. This sequence is very similar to the PPPPPGHR sequence of the cytoplasmic tail of the CD2 receptor and closely resembles the class II of SH3 ligands, suggesting a similarly important role. We report that a monoclonal antibody (3E10) against the PPPPPGHR sequence recognizes spliceosomal Sm B/B' proteins. Proteins that are specifically immunoprecipitated by 3E10 include Sm B, B', D1, D2, D3, E, F, and G. However, unlike Y12 and other anti-Sm immunoprecipitates, 3E10 immunoprecipitates appear to lack the U1 snRNP-specific proteins A and C and U snRNAs. These findings indicate that 3E10 recognizes a subset of Sm protein core and suggest the presence of snRNA-free Sm protein complex(es) in vivo. We propose that the epitope bound by 3E10 may become inaccessible upon interactions of Sm proteins and their subsequent incorporation into the core particles. The Sm proline-rich sequences may have an important role in mediating protein-protein interactions necessary for the proper snRNP core assembly or function, or both. To our knowledge, 3E10 is the first well-characterized mAb specific for a subclass of polyproline-arg motif recognizing Sm B/B' and CD2 proteins. The 3E10 antibody can be used to further characterize the nature of protein components in the snRNA-free Sm subcore protein complex(es) that are formed during the snRNP core assembly steps.

  3. The widely used Nicotiana benthamiana 16c line has an unusual T-DNA integration pattern including a transposon sequence

    PubMed Central

    Lorenc, Michał T.; Dudley, Kevin J.; Hellens, Roger P.

    2017-01-01

    Nicotiana benthamiana is employed around the world for many types of research and one transgenic line has been used more extensively than any other. This line, 16c, expresses the Aequorea victoria green fluorescent protein (GFP), highly and constitutively, and has been a major resource for visualising the mobility and actions of small RNAs. Insights into the mechanisms studied at a molecular level in N. benthamiana 16c are likely to be deeper and more accurate with a greater knowledge of the GFP gene integration site. Therefore, using next generation sequencing, genome mapping and local alignment, we identified the location and characteristics of the integrated T-DNA. As suggested from previous molecular hybridisation and inheritance data, the transgenic line contains a single GFP-expressing locus. However, the GFP coding sequence differs from that originally reported. Furthermore, a 3.2 kb portion of a transposon appears to have co-integrated with the T-DNA. The location of the integration mapped to a region of the genome represented by Nbv0.5scaffold4905 in the www.benthgenome.com assembly, and with less integrity to Niben101Scf03641 in the www.solgenomics.net assembly. The transposon is not endogenous to laboratory strains of N. benthamiana or Agrobacterium tumefaciens strain GV3101 (MP90), which was reportedly used in the generation of line 16c. However, it is present in the popular LBA4404 strain. The integrated transposon sequence includes its 5' terminal repeat and a transposase gene, and is immediately adjacent to the GFP gene. This unexpected genetic arrangement may contribute to the characteristics that have made the 16c line such a popular research tool, and it alerts researchers taking transgenic plants to commercial release to be aware of this genomic hitchhiker. PMID:28231340

  4. The widely used Nicotiana benthamiana 16c line has an unusual T-DNA integration pattern including a transposon sequence.

    PubMed

    Philips, Joshua G; Naim, Fatima; Lorenc, Michał T; Dudley, Kevin J; Hellens, Roger P; Waterhouse, Peter M

    2017-01-01

    Nicotiana benthamiana is employed around the world for many types of research and one transgenic line has been used more extensively than any other. This line, 16c, expresses the Aequorea victoria green fluorescent protein (GFP), highly and constitutively, and has been a major resource for visualising the mobility and actions of small RNAs. Insights into the mechanisms studied at a molecular level in N. benthamiana 16c are likely to be deeper and more accurate with a greater knowledge of the GFP gene integration site. Therefore, using next generation sequencing, genome mapping and local alignment, we identified the location and characteristics of the integrated T-DNA. As suggested from previous molecular hybridisation and inheritance data, the transgenic line contains a single GFP-expressing locus. However, the GFP coding sequence differs from that originally reported. Furthermore, a 3.2 kb portion of a transposon appears to have co-integrated with the T-DNA. The location of the integration mapped to a region of the genome represented by Nbv0.5scaffold4905 in the www.benthgenome.com assembly, and with less integrity to Niben101Scf03641 in the www.solgenomics.net assembly. The transposon is not endogenous to laboratory strains of N. benthamiana or Agrobacterium tumefaciens strain GV3101 (MP90), which was reportedly used in the generation of line 16c. However, it is present in the popular LBA4404 strain. The integrated transposon sequence includes its 5' terminal repeat and a transposase gene, and is immediately adjacent to the GFP gene. This unexpected genetic arrangement may contribute to the characteristics that have made the 16c line such a popular research tool, and it alerts researchers taking transgenic plants to commercial release to be aware of this genomic hitchhiker.

  5. Integrated bioinformatics analysis of chromatin regulator EZH2 in regulating mRNA and lncRNA expression by ChIP sequencing and RNA sequencing

    PubMed Central

    Li, Yuan; Luo, Mei; Shi, Xuejiao; Lu, Zhiliang; Sun, Shouguo; Huang, Jianbing; Chen, Zhaoli; He, Jie

    2016-01-01

    Enhancer of zeste homolog 2 (EZH2), a dynamic chromatin regulator in cancer, represents a potential therapeutic target showing early signs of promise in clinical trials. EZH2 ChIP sequencing data in 19 cell lines and RNA sequencing data in ten cancer types were downloaded from GEO and TCGA, respectively. Integrated ChIP sequencing analysis and co-expression analysis were conducted and both mRNA and long noncoding RNA (lncRNA) targets were detected. We detected a median of 4,672 mRNA targets and 4,024 lncRNA targets regulated by EZH2 in 19 cell lines. Twenty mRNA targets and 27 lncRNA targets were found in all 19 cell lines. These mRNA targets were enriched in pathways in cancer and in the Hippo, Wnt, MAPK and PI3K-Akt pathways. Co-expression analysis confirmed numerous targets: mRNA genes (RRAS, TGFBR2, NUF2 and PRC1) and lncRNA genes (LINC00261, DIO3OS, RP11-307C12.11 and RP11-98D18.9) were potential targets and were significantly correlated with EZH2. We predicted genome-wide potential targets and the role of EZH2 as a transcriptional suppressor or activator, which could pave the way for mechanism studies and the targeted therapy of EZH2 in cancer. PMID:27835578

  6. Bayesian multiple-instance motif discovery with BAMBI: inference of recombinase and transcription factor binding sites

    PubMed Central

    Jajamovich, Guido H.; Wang, Xiaodong; Arkin, Adam P.; Samoilov, Michael S.

    2011-01-01

    Finding conserved motifs in genomic sequences is one of the essential problems in bioinformatics. However, achieving high discovery performance without imposing substantial auxiliary constraints on possible motif features remains a key algorithmic challenge. This work describes BAMBI, a sequential Monte Carlo motif-identification algorithm based on a position weight matrix model that does not require additional constraints and is able to estimate such motif properties as length, logo, number of instances and their locations solely on the basis of primary nucleotide sequence data. Furthermore, should biologically meaningful information about motif attributes be available, BAMBI takes advantage of this knowledge to further refine the discovery results. In practical applications, we show that the proposed approach can be used to find sites of such diverse DNA-binding molecules as the cAMP receptor protein (CRP) and Din-family site-specific serine recombinases. Results obtained by BAMBI in these and other settings demonstrate better statistical performance than any of the four widely used profile-based motif discovery methods (MEME, BioProspector with BioOptimizer, SeSiMCMC and Motif Sampler), as measured by the nucleotide-level correlation coefficient. Additionally, in the case of Din-family recombinase target site discovery, the BAMBI-inferred motif is found to be the only one functionally accurate from the underlying biochemical mechanism standpoint. C++ and Matlab code is available at http://www.ee.columbia.edu/~guido/BAMBI or http://genomics.lbl.gov/BAMBI/. PMID:21948794
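
    For readers unfamiliar with the position weight matrix (PWM) model that this abstract builds on, the following Python sketch shows how a log-odds PWM can be built from aligned sites and used to score windows of a sequence. It does not reproduce BAMBI's sequential Monte Carlo inference; the function names, the uniform 0.25 background and the pseudocount of 1 are assumptions chosen for illustration, and the example sites are made up.

```python
import math


def build_pwm(sites, pseudocount=1.0):
    """Log-odds PWM (one dict per column) from equal-length aligned sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({
            # log2 of (smoothed frequency / uniform background of 0.25)
            nt: math.log2(((column.count(nt) + pseudocount) / total) / 0.25)
            for nt in "ACGT"
        })
    return pwm


def best_hit(pwm, sequence):
    """Return (score, position) of the highest-scoring window in sequence."""
    width = len(pwm)
    return max(
        (sum(pwm[i][sequence[pos + i]] for i in range(width)), pos)
        for pos in range(len(sequence) - width + 1)
    )


# Made-up aligned sites and a sequence carrying one consensus instance.
sites = ["TGTGA", "TGTGA", "TGTGC", "TGAGA"]
score, position = best_hit(build_pwm(sites), "CCCTGTGACCC")
```

    A motif-discovery method has to infer both the matrix and the site positions at once; the sketch covers only the scoring half, with the sites given.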

  7. Integrating genomic information with protein sequence and 3D atomic level structure at the RCSB protein data bank.

    PubMed

    Prlić, Andreas; Kalro, Tara; Bhattacharya, Roshni; Christie, Cole; Burley, Stephen K; Rose, Peter W

    2016-12-15

    The Protein Data Bank (PDB) now contains more than 120,000 three-dimensional (3D) structures of biological macromolecules. To allow an interpretation of how PDB data relates to other publicly available annotations, we developed a novel data integration platform that maps 3D structural information across various datasets. This integration bridges from the human genome across protein sequence to 3D structure space. We developed novel software solutions for data management and visualization, while incorporating new libraries for web-based visualization using SVG graphics.

  8. Comprehensive discovery of DNA motifs in 349 human cells and tissues reveals new features of motifs.

    PubMed

    Zheng, Yiyu; Li, Xiaoman; Hu, Haiyan

    2015-01-01

    Comprehensive motif discovery under experimental conditions is critical for the global understanding of gene regulation. To generate a nearly complete list of human DNA motifs under given conditions, we employed a novel approach to de novo discover significant co-occurring DNA motifs in 349 human DNase I hypersensitive site datasets. We predicted 845 to 1325 motifs in each dataset, for a total of 2684 non-redundant motifs. These 2684 motifs contained 54.02 to 75.95% of the known motifs in seven large collections including TRANSFAC. In each dataset, we also discovered 43 663 to 2 013 288 motif modules, groups of motifs with their binding sites co-occurring in a significant number of short DNA regions. Compared with known interacting transcription factors in eight resources, the predicted motif modules on average included 84.23% of known interacting motifs. We further showed new features of the predicted motifs, such as motifs enriched in proximal regions rarely overlapped with motifs enriched in distal regions, motifs enriched in 5' distal regions were often enriched in 3' distal regions, etc. Finally, we observed that the 2684 predicted motifs classified the cell or tissue types of the datasets with an accuracy of 81.29%. The resources generated in this study are available at http://server.cs.ucf.edu/predrem/.

  9. RNAMotifScanX: a graph alignment approach for RNA structural motif identification.

    PubMed

    Zhong, Cuncong; Zhang, Shaojie

    2015-03-01

    RNA structural motifs are recurrent three-dimensional (3D) components found in the RNA architecture. These RNA structural motifs play important structural or functional roles and usually exhibit highly conserved 3D geometries and base-interaction patterns. Analysis of the RNA 3D structures and elucidation of their molecular functions heavily rely on efficient and accurate identification of these motifs. However, efficient RNA structural motif search tools are lacking due to the high complexity of these motifs. In this work, we present RNAMotifScanX, a motif search tool based on a base-interaction graph alignment algorithm. This novel algorithm enables automatic identification of both partially and fully matched motif instances. RNAMotifScanX considers noncanonical base-pairing interactions, base-stacking interactions, and sequence conservation of the motifs, which leads to significantly improved sensitivity and specificity as compared with other state-of-the-art search tools. RNAMotifScanX also adopts a carefully designed branch-and-bound technique, which enables ultra-fast search of large kink-turn motifs against a 23S rRNA. The software package RNAMotifScanX is implemented using GNU C++, and is freely available from http://genome.ucf.edu/RNAMotifScanX.

  10. A poly(A) binding protein-specific sequence motif: MRTENGKSKGFGFVC binding to mRNA poly(A) and polynucleotides and its role on mRNA translation.

    PubMed

    Rubin, H N; Halim, M N; Leavis, P C

    1994-06-01

    A consensus sequence (GKSKGFGFV) was recognized in all the sequenced poly(A) binding proteins. We synthesized a 15-amino acid peptide (corresponding to residues 354-368 in the yeast poly(A) binding protein) which includes the consensus sequence to test its binding affinity to different nucleotides, polynucleotides and mRNA with or without a poly(A) tail. Biochemical and biophysical studies revealed that the 15-amino acid peptide has a strong binding affinity to poly(A) alone or poly(A) attached at the 3' end of mRNA. Circular dichroism spectroscopy demonstrated that the secondary structure of the 15-mer is consistent with that expected based on the structure of the native RNP domain. Furthermore, among the various mononucleotides tested in the present studies, ATP was found to bind preferentially to the 15-mer. To further examine the biological significance of the binding of the 15-mer to the poly(A) tail of mRNA, we performed in vitro translation: translation of poly(A)+ mRNA in the presence of the 15-mer increased globin synthesis almost 2-fold, while translation of the deadenylated mRNA in the presence of the 15-mer barely altered the rate of incorporation of radiolabeled leucine into globin.

  11. Combinatorial motif analysis of regulatory gene expression in Mafb deficient macrophages

    PubMed Central

    2011-01-01

    Background: Deficiency of the transcription factor MafB, which is normally expressed in macrophages, can underlie cellular dysfunction associated with a range of autoimmune diseases and arteriosclerosis. MafB has important roles in cell differentiation and regulation of target gene expression; however, the mechanisms of this regulation and the identities of other transcription factors with which MafB interacts remain uncertain. Bioinformatics methods provide a valuable approach for elucidating the nature of these interactions with transcriptional regulatory elements from a large number of DNA sequences. In particular, identification of patterns of co-occurrence of regulatory cis-elements (motifs) offers a robust approach. Results: Here, the directional relationships among several functional motifs were evaluated using the Log-linear Graphical Model (LGM) after extraction and search for evolutionarily conserved motifs. This analysis highlighted GATA-1 motifs and 5' AT-rich half Maf recognition elements (MAREs) in promoter regions of 18 genes that were down-regulated in Mafb deficient macrophages. GATA-1 motifs and MafB motifs could regulate expression of these genes in a negative and a positive manner, respectively. The validity of this conclusion was tested with data from a luciferase assay that used a C1qa promoter construct carrying both the GATA-1 motifs and MAREs. GATA-1 was found to inhibit the activity of the C1qa promoter with the GATA-1 motifs and MafB motifs. Conclusions: These observations suggest that both the GATA-1 motifs and MafB motifs are important for lineage specific expression of C1qa. In addition, these findings show that analysis of combinations of evolutionarily conserved motifs can be successfully used to identify patterns of gene regulation. PMID:22784578

  12. Integration of PacBio RS into Massive Parallel Sequencing and Data Analysis Pipelining at the UC Davis Genome Center

    PubMed Central

    Vanessa, Rashbrook; O'Geen, Henriette; Nguyen, Oanh; Ashtari, Siranoosh; Fan, Xiaohong; Kim, Ryan

    2013-01-01

    Whole genome sequencing and genomic biology have been widely adopted in many fields of biology as next-generation sequencing (NGS) technology has rapidly improved in quality, read length, and throughput, making whole genome sequencing and association studies possible in a very cost-effective manner. Continued improvement and development of sample preparation protocols and data analysis tools have been significant in helping to extend genome sequencing technology to genomes that were previously difficult to sequence. The recent arrival of the Pacific Biosciences RS (PacBio) has furthered this opportunity by providing options for single-molecule, long-read sequencing in real time and kinetic analysis (methylation). PacBio has been employed successfully for sequencing low-complexity genomic regions such as those with extremely high GC content, long repeats, rearrangements, gene fusions, etc. In this poster we present the optimization of PacBio sample preparation that was fine-tuned to meet the unique challenges of sequencing through "difficult-to-sequence" templates. We discuss the integration of PacBio into a wet lab equipped with other NGS platforms and the data pipelining workflow, including cloud computing and robotic sample preparation, at the Genome Center. The UC Davis Genome Center currently operates NGS technology platforms including HiSeq, MiSeq, and PacBio, and has genotyping capacity using Illumina Infinium and GoldenGate technology. The UC Davis Genome Center and Bioinformatics Program provide the most up-to-date genome technology and informatics support, tailored for specific biological goals, meeting the needs of more than 80 faculty members within the Genome Center and more than 200 campus and off-campus researchers.

  13. DRIMust: a web server for discovering rank imbalanced motifs using suffix trees

    PubMed Central

    Leibovich, Limor; Paz, Inbal; Yakhini, Zohar; Mandel-Gutfreund, Yael

    2013-01-01

    Cellular regulation mechanisms that involve proteins and other active molecules interacting with specific targets often involve the recognition of sequence patterns. Short sequence elements on DNA, RNA and proteins play a central role in mediating such molecular recognition events. Studies that focus on measuring and investigating sequence-based recognition processes make use of statistical and computational tools that support the identification and understanding of sequence motifs. We present a new web application, named DRIMust, freely accessible through the website http://drimust.technion.ac.il for de novo motif discovery services. The DRIMust algorithm is based on the minimum hypergeometric statistical framework and uses suffix trees for an efficient enumeration of motif candidates. DRIMust takes as input ranked lists of sequences in FASTA format and returns motifs that are over-represented at the top of the list, where the determination of the threshold that defines top is data driven. The resulting motifs are presented individually with an accurate P-value indication and as a Position Specific Scoring Matrix. Comparing DRIMust with other state-of-the-art tools demonstrated significant advantage to DRIMust, both in result accuracy and in short running times. Overall, DRIMust is unique in combining efficient search on large ranked lists with rigorous P-value assessment for the detected motifs. PMID:23685432
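
    The minimum hypergeometric (mHG) statistic named in this abstract can be sketched in a few lines of Python. The sketch assumes the ranked list has already been reduced to 0/1 labels (1 where a sequence contains the candidate motif); the function names are hypothetical, and it covers neither DRIMust's suffix-tree enumeration of candidates nor the correction needed to turn the mHG score into a P-value.

```python
from math import comb


def hypergeom_tail(N, B, n, b):
    """P(X >= b) when drawing n items from N that contain B successes."""
    return sum(
        comb(B, k) * comb(N - B, n - k) for k in range(b, min(n, B) + 1)
    ) / comb(N, n)


def mhg_score(labels):
    """Minimum hypergeometric tail over all prefixes of a ranked 0/1 list.

    Returns (score, cutoff), where cutoff is the prefix length at which
    the enrichment of 1s at the top of the list is most extreme.
    """
    N, B = len(labels), sum(labels)
    best, best_n, hits = 1.0, 0, 0
    for n, label in enumerate(labels, start=1):
        hits += label
        tail = hypergeom_tail(N, B, n, hits)
        if tail < best:
            best, best_n = tail, n
    return best, best_n


# Toy ranked list: the motif occurs in the three top-ranked sequences only.
score, cutoff = mhg_score([1, 1, 1, 0, 0, 0, 0, 0])
```

    Because the cutoff is chosen by minimizing over all prefixes, the threshold that defines the "top" of the list comes from the data rather than being fixed in advance, as the abstract describes.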

  14. DRIMust: a web server for discovering rank imbalanced motifs using suffix trees.

    PubMed

    Leibovich, Limor; Paz, Inbal; Yakhini, Zohar; Mandel-Gutfreund, Yael

    2013-07-01

    Cellular regulation mechanisms that involve proteins and other active molecules interacting with specific targets often involve the recognition of sequence patterns. Short sequence elements on DNA, RNA and proteins play a central role in mediating such molecular recognition events. Studies that focus on measuring and investigating sequence-based recognition processes make use of statistical and computational tools that support the identification and understanding of sequence motifs. We present a new web application, named DRIMust, freely accessible through the website http://drimust.technion.ac.il for de novo motif discovery services. The DRIMust algorithm is based on the minimum hypergeometric statistical framework and uses suffix trees for an efficient enumeration of motif candidates. DRIMust takes as input ranked lists of sequences in FASTA format and returns motifs that are over-represented at the top of the list, where the determination of the threshold that defines top is data driven. The resulting motifs are presented individually with an accurate P-value indication and as a Position Specific Scoring Matrix. Comparing DRIMust with other state-of-the-art tools demonstrated significant advantage to DRIMust, both in result accuracy and in short running times. Overall, DRIMust is unique in combining efficient search on large ranked lists with rigorous P-value assessment for the detected motifs.

  15. Integrated protein function prediction by mining function associations, sequences, and protein–protein and gene–gene interaction networks

    PubMed Central

    Cao, Renzhi; Cheng, Jianlin

    2016-01-01

    Motivation: Protein function prediction is an important and challenging problem in bioinformatics and computational biology. Functionally relevant biological information such as protein sequences, gene expression, and protein–protein interactions has been used mostly separately for protein function prediction. One of the major challenges is how to effectively integrate multiple sources of both traditional and new information, such as spatial gene–gene interaction networks generated from chromosomal conformation data, to improve protein function prediction. Results: In this work, we developed three different probabilistic scores (MIS, SEQ, and NET score) to combine protein sequence, function associations, and protein–protein interaction and spatial gene–gene interaction networks for protein function prediction. The MIS score is mainly generated from homologous proteins found by PSI-BLAST search, and also from association rules between Gene Ontology terms, which are learned by mining the Swiss-Prot database. The SEQ score is generated from protein sequences. The NET score is generated from protein–protein interaction and spatial gene–gene interaction networks. These three scores were combined in a new Statistical Multiple Integrative Scoring System (SMISS) to predict protein function. We tested SMISS on the data set of the 2011 Critical Assessment of Function Annotation (CAFA). According to the maximum F-measure, the method performed substantially better than three baseline methods and an advanced method based on protein profile–sequence comparison, profile–profile comparison, and domain co-occurrence networks. PMID:26370280

  16. Multi-sequence magnetic resonance imaging integration framework for image-guided catheter ablation of scar-related ventricular tachycardia

    NASA Astrophysics Data System (ADS)

    Tao, Qian; Milles, Julien; van Huls van Taxis, Carine; Reiber, Johan H. C.; Zeppenfeld, Katja; van der Geest, Rob J.

    2012-02-01

    Catheter ablation is an important option to treat ventricular tachycardias (VT). Scar-related VT is among the most difficult to treat, because myocardial scar, which is the underlying arrhythmogenic substrate, is patient-specific and often highly complex. The scar image from preprocedural late gadolinium enhancement magnetic resonance imaging (LGE-MRI) can provide high-resolution substrate information and, if integrated at the early stage of the procedure, can largely facilitate the procedure with image guidance. In clinical practice, however, early MRI integration is difficult because available integration tools rely on matching the MRI surface mesh and electroanatomical mapping (EAM) points, which is only possible after extensive EAM has been performed. In this paper, we propose to use a priori information on patient posture and a multi-sequence MRI integration framework to achieve accurate MRI integration that can be accomplished at an early stage of the procedure. From the MRI sequences, the left ventricular (LV) geometry, myocardial scar characteristics, and an anatomical landmark indicating the origin of the left main coronary artery are obtained preprocedurally using image processing techniques. Thereby the integration can be realized at the beginning of the procedure after acquiring a single mapping point. The integration method has been evaluated postprocedurally in terms of LV shape match and actual scar match. Compared to the iterative closest point (ICP) method that uses high-intensity mapping (225±49 points), our method using one mapping point reached a mean point-to-surface distance of 5.09±1.09 mm (vs. 3.85±0.60 mm, p<0.05), and scar correlation of -0.51±0.14 (vs. -0.50±0.14, p=NS).

  17. Phylogenomics-guided discovery of a novel conserved cassette of short linear motifs in BubR1 essential for the spindle checkpoint

    PubMed Central

    Bade, Debora

    2016-01-01

    The spindle assembly checkpoint (SAC) maintains genomic integrity by preventing progression of mitotic cell division until all chromosomes are stably attached to spindle microtubules. The SAC critically relies on the paralogues Bub1 and BubR1/Mad3, which integrate kinetochore–spindle attachment status with generation of the anaphase inhibitory complex MCC. We previously reported on the widespread occurrences of independent gene duplications of an ancestral ‘MadBub’ gene in eukaryotic evolution and the striking parallel subfunctionalization that led to loss of kinase function in BubR1/Mad3-like paralogues. Here, we present an elaborate subfunctionalization analysis of the Bub1/BubR1 gene family and perform de novo sequence discovery in a comparative phylogenomics framework to trace the distribution of ancestral sequence features to extant paralogues throughout the eukaryotic tree of life. We show that known ancestral sequence features are consistently retained in the same functional paralogue: GLEBS/CMI/CDII/kinase in the Bub1-like and KEN1/KEN2/D-Box in the BubR1/Mad3-like. The recently described ABBA motif can be found in either or both paralogues. However, we discovered two additional ABBA motifs that flank KEN2. This cassette of ABBA1-KEN2-ABBA2 forms a strictly conserved module in all ancestral and BubR1/Mad3-like proteins, suggestive of a specific and crucial SAC function. Indeed, deletion of the ABBA motifs in human BUBR1 abrogates the SAC and affects APC/C–Cdc20 interactions. Our detailed comparative genomics analyses thus enabled discovery of a conserved cassette of motifs essential for the SAC and show how this approach can be used to uncover hitherto unrecognized functional protein features. PMID:28003474

  18. Dynamic hydrogen bonding and DNA flexibility in minor groove binders: molecular dynamics simulation of the polyamide f-ImPyIm bound to the Mlu1 (MCB) sequence 5'-ACGCGT-3' in 2:1 motif.

    PubMed

    Bruce, Chrystal D; Ferrara, Maddi M; Manka, Julie L; Davis, Zachary S; Register, Janna

    2015-05-01

    Molecular dynamics simulations of the DNA 10-mer 5'-CCACGCGTGG-3' alone and complexed with the formamido-imidazole-pyrrole-imidazole (f-ImPyIm) polyamide minor groove binder in a 2:1 fashion were conducted for 50 ns using the pbsc0 parameters within the AMBER 12 software package. The change in DNA structure upon binding of f-ImPyIm was evaluated via minor groove width and depth, base pair parameters of Slide, Twist, Roll, Stretch, Stagger, Opening, Propeller, and x-displacement, dihedral angle distributions of ζ, ε, α, and γ determined using the Curves+ software program, and hydrogen bond formation. The dynamic hydrogen bonding between the f-ImPyIm and its cognate DNA sequence was compared to the static image used to predict sequence recognition by polyamide minor groove binders. Many of the predicted hydrogen bonds were present in less than 50% of the simulation; however, persistent hydrogen bonds between G5/15 and the formamido group of f-ImPyIm were observed. It was determined that the DNA is wider in the Complex than without the polyamide binder; however, there is flexibility in this particular sequence, even in the presence of the f-ImPyIm as evidenced by the range of minor groove widths the DNA exhibits and the dynamics of the hydrogen bonding that binds the two f-ImPyIm ions to the minor groove. The Complex consisting of the DNA and the 2 f-ImPyIm binders shows slight fraying of the 5' end of the 10-mer at the end of the simulation, but the portion of the oligomer responsible for recognition and binding is stable throughout the simulation. Several structural changes in the Complex indicate that minor groove binders may have a more active role in inhibiting transcription than just preventing binding of important transcription factors.

  19. Definition of a GC-rich motif as regulatory sequence of the human IL-3 gene: coordinate regulation of the IL-3 gene by CLE2/GC box of the GM-CSF gene in T cell activation.

    PubMed

    Nishida, J; Yoshida, M; Arai, K; Yokota, T

    1991-03-01

    The human IL-3 gene, located on chromosome 5, contains several cis-acting DNA sequences, i.e. CLE (conserved lymphokine element) and a GC-rich region, similar to the GM-CSF gene. To investigate the role of these elements, the 5' flanking region of the IL-3 gene was attached to a bacterial chloramphenicol acetyltransferase (CAT) gene. The fusion plasmids were analyzed by an in vitro transcription system using Jurkat cell nuclear extract prepared from cells stimulated with phorbol-12-myristate-13-acetate and calcium ionophore (PMA/A23187), introduced into Jurkat cells, expressed transiently, and stimulated by co-transfection of human T cell leukemia virus type I (HTLV-I) encoded transactivator, p40tax. The GC-rich region enhanced TATA-dependent transcription in the in vitro transcription system and also strongly responded to p40tax stimulation in the in vivo cotransfection assay. Using this GC-rich region as a probe, we identified a constitutive DNA-protein complex, alpha, whose binding specificity correlates with transcription activity. However, this element is not sufficient for the expression of the IL-3 gene in response to T cell activation signals (PMA/A23187) and no sequence was found within the IL-3 gene which mediates the response to PMA/A23187. The enhancer sequence which responds to T cell activation signals may be located outside the IL-3 gene and may be shared by other lymphokines, possibly by GM-CSF. We propose that the GM-CSF enhancer (CLE2/GC box) which mediates the response to T cell activation signals may stimulate the expression of the IL-3 gene.

  20. Regulatory motifs are present in the ITS1 of some flatworm species.

    PubMed

    Van Herwerden, Lynne; Caley, M Julian; Blair, David

    2003-04-15

    Particular sequence motifs can act as transcription regulators. Because the total regulatory effects of such motifs can be related to their abundance, their presence might be expected at locations within the genome where sequences are repeated. Multiple repeats that vary in number among individuals occur within the ribosomal first internal transcribed spacer (ITS1) in some species in three trematode genera: Paragonimus, Schistosoma and Dolichosaccus. In all of these genera we found sequences in ITS1 that are identical to known enhancer motifs. We also searched for, and identified, known regulatory motifs in published ITS1 sequences of other parasitic flatworms including Echinostoma spp. (Trematoda) and Echinococcus spp. (Cestoda) which lack multiple repeats in ITS1. We present three lines of evidence that this widespread occurrence of such motifs within the ITS1 of parasitic flatworms may indicate a functional role in regulating tissue- or stage-specific transcription of ribosomal genes. First, these motifs are identical to ones whose functional roles have been established using in vitro assays of transcriptional rates. Second, in all 18 species investigated here, between one and three different regulatory motifs were identified. In 14 of these 18 species, the probability that at least one of these motifs occurred because of the random assortment of bases within the regions investigated was 10% or less. In 12 of these 14 species, the probability was 5% or less. Third, the evolutionary divergence of flatworm species investigated is quite ancient. Therefore, the interspecific distribution of motifs observed here, in a rapidly evolving region such as ITS1, is unlikely to be attributable solely to shared evolutionary histories. These results, therefore, suggest a broader functional role for the ITS1 than previously thought.
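
    The abstract does not say how the chance probabilities were calculated. As a rough illustration only, the sketch below approximates the probability that a fixed motif appears at least once in a random sequence, assuming independent, equally likely bases and treating start positions as independent. The function name and default base probability are assumptions, not the authors' method.

```python
def chance_occurrence(motif_len, seq_len, base_prob=0.25):
    """Approximate P(a fixed motif occurs at least once by chance).

    Assumes i.i.d. bases, each matching with probability base_prob,
    and ignores the dependence between overlapping start positions.
    """
    if motif_len > seq_len:
        return 0.0
    p_site = base_prob ** motif_len          # match probability at one start position
    positions = seq_len - motif_len + 1      # number of possible start positions
    return 1.0 - (1.0 - p_site) ** positions


if __name__ == "__main__":
    # Hypothetical example: an 8-base motif in a 600-base spacer region.
    print(chance_occurrence(8, 600))
```

    Under these assumptions longer motifs and shorter regions both lower the chance probability, which is the kind of reasoning behind thresholds such as the 10% and 5% levels quoted above.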

  1. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation

    NASA Astrophysics Data System (ADS)

    Sheynkman, Gloria M.; Shortreed, Michael R.; Cesnik, Anthony J.; Smith, Lloyd M.

    2016-06-01

    Mass spectrometry-based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications.

  2. The role of integrated databases in microbial genome sequence analysis and metabolic reconstruction

    SciTech Connect

    Gaasterland, T.; Maltsev, N.; Overbeek, R.

    1997-02-01

    This paper provides an overview of the PUMA system which provides access to data about metabolic pathways, enzymes, compounds, organisms, encoded activity, and assay condition information for enzymes in particular organisms and multiple sequence alignments.

  3. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation.

    PubMed

    Sheynkman, Gloria M; Shortreed, Michael R; Cesnik, Anthony J; Smith, Lloyd M

    2016-06-12

    Mass spectrometry-based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications.

  4. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation

    PubMed Central

    Sheynkman, Gloria M.; Shortreed, Michael R.; Cesnik, Anthony J.; Smith, Lloyd M.

    2016-01-01

    Mass spectrometry–based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications. PMID:27049631

  5. Prioritization Of Nonsynonymous Single Nucleotide Variants For Exome Sequencing Studies Via Integrative Learning On Multiple Genomic Data.

    PubMed

    Wu, Mengmeng; Wu, Jiaxin; Chen, Ting; Jiang, Rui

    2015-10-13

    The rapid advancement of next generation sequencing technology has greatly accelerated the progress for understanding human inherited diseases via such innovations as exome sequencing. Nevertheless, the identification of causative variants from sequencing data remains a great challenge. Traditional statistical genetics approaches such as linkage analysis and association studies have limited power in analyzing exome sequencing data, while relying on simple filtration strategies and predicted functional implications of mutations to pinpoint pathogenic variants is prone to produce false positives. To overcome these limitations, we herein propose a supervised learning approach, termed snvForest, to prioritize candidate nonsynonymous single nucleotide variants for a specific type of disease by integrating 11 functional scores at the variant level and 8 association scores at the gene level. We conduct a series of large-scale in silico validation experiments, demonstrating the effectiveness of snvForest across 2,511 diseases of different inheritance styles and the superiority of our approach over two state-of-the-art methods. We further apply snvForest to three real exome sequencing data sets of epileptic encephalopathies and intellectual disability to show the ability of our approach to identify causative de novo mutations for these complex diseases. The online service and standalone software of snvForest are found at http://bioinfo.au.tsinghua.edu.cn/jianglab/snvforest.

  6. Transcription factor and microRNA-regulated network motifs for cancer and signal transduction networks

    PubMed Central

    2015-01-01

    Background: Molecular networks are the basis of biological processes. Such networks can be decomposed into smaller modules, also known as network motifs. These motifs show interesting dynamical behaviors, in which co-operativity effects between the motif components play a critical role in human diseases. We have developed a motif-searching algorithm, which is able to identify common motif types from the cancer networks and signal transduction networks (STNs). Some of the network motifs are interconnected and can be merged to form more complex structures, the so-called coupled motif structures (CMS). These structures exhibit mixed dynamical behavior, which may lead biological organisms to perform specific functions. Results: In this study, we integrate transcription factors (TFs), microRNAs (miRNAs), miRNA targets and network motifs information to build the cancer-related TF-miRNA-motif networks (TMMN). This allows us to examine the role of network motifs in cancer formation at different levels of regulation, i.e. transcription initiation (TF → miRNA), gene-gene interaction (CMS), and post-transcriptional regulation (miRNA → target genes). Among the cancer networks and STNs we considered, there is a substantial amount of crosstalk through motif interconnections, in particular between the prostate cancer network and the PI3K-Akt STN. Conclusions: To validate the role of network motifs in cancer formation, several examples are presented that demonstrate the effectiveness of the present approach. A web-based platform has been set up which can be accessed at: http://ppi.bioinfo.asia.edu.tw/pathway/. Our results can likely supply CMS information that is currently missing for certain cancer types, which makes the platform an indispensable tool for cancer biology research. PMID:25707690
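
    To make the notion of a network motif concrete, the sketch below enumerates one well-known motif type, the feed-forward loop (X regulates Y, Y regulates Z, and X also regulates Z), in a small directed network. The choice of motif, the function name and the toy edge list are assumptions for illustration; the abstract does not describe the authors' search algorithm in this detail.

```python
def feed_forward_loops(edges):
    """Return all feed-forward loops (x, y, z) with edges x->y, y->z and x->z.

    edges is an iterable of (source, target) pairs of a directed network.
    """
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, set()).add(dst)
        successors.setdefault(dst, set())
    loops = []
    for x in successors:
        for y in successors[x]:
            if y == x:
                continue  # skip self-regulation
            for z in successors[y]:
                # z must be a third, distinct node that x also regulates directly
                if z != x and z != y and z in successors[x]:
                    loops.append((x, y, z))
    return sorted(loops)


if __name__ == "__main__":
    # Hypothetical TF -> miRNA -> target network with a direct TF -> target edge.
    toy_edges = [("TF", "miRNA"), ("miRNA", "target"), ("TF", "target")]
    print(feed_forward_loops(toy_edges))
```

    Motifs that share nodes or edges in such an enumeration are the natural candidates for merging into larger coupled structures.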

  7. Development of a high density integrated reference genetic linkage map for the multinational Brassica rapa Genome Sequencing Project.

    PubMed

    Li, Xiaonan; Ramchiary, Nirala; Choi, Su Ryun; Van Nguyen, Dan; Hossain, Md Jamil; Yang, Hyeon Kook; Lim, Yong Pyo

    2010-11-01

    We constructed a high-density Brassica rapa integrated linkage map by combining a reference genetic map of 78 doubled haploid lines derived from Chiifu-401-42 × Kenshin (CKDH) and a new map of 190 F2 lines derived from Chiifu-401-42 × rapid cycling B. rapa (CRF2). The integrated map contains 1017 markers and covers 1262.0 cM of the B. rapa genome, with an average interlocus distance of 1.24 cM. High similarity of marker order and position was observed among the linkage groups of the maps, with a few short-distance inversions. In total, 155 simple sequence repeat (SSR) markers, anchored to 102 new bacterial artificial chromosomes (BACs), and 146 intron polymorphic (IP) markers were mapped in the integrated map, which would be helpful to align the sequenced BACs in the ongoing multinational Brassica rapa Genome Sequencing Project (BrGSP). Further, comparison of the B. rapa consensus map with the 10 B. juncea A-genome linkage groups by using 98 common IP markers showed high-degree colinearity between the A-genome linkage groups, except for a few markers showing inversion or translocation, suggesting that chromosomes are highly conserved between these Brassica species, although they evolved independently after divergence. The sequence information coming out of BrGSP would be useful for B. juncea breeding, and the identified Arabidopsis chromosomal blocks and known quantitative trait loci (QTL) information of B. juncea could be applied to improve other Brassica crops including B. rapa.
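
    The reported average interlocus distance is consistent with dividing the total map length by the number of markers. The short check below reproduces that arithmetic; the function name is hypothetical, and the formula is inferred from the figures in the abstract rather than stated by the authors.

```python
def mean_marker_spacing(map_length_cm, n_markers):
    """Average spacing in cM, taken here as total map length divided by marker count."""
    return map_length_cm / n_markers


if __name__ == "__main__":
    # Figures from the abstract: 1017 markers spanning 1262.0 cM.
    print(round(mean_marker_spacing(1262.0, 1017), 2))
```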

  8. Toward an Integrated BAC Library Resource for Genome Sequencing and Analysis

    SciTech Connect

    Simon, M. I.; Kim, U.-J.

    2002-02-26

    We developed a great deal of expertise in building large BAC libraries from a variety of DNA sources including humans, mice, corn, microorganisms, worms, and Arabidopsis. We greatly improved the technology for screening these libraries rapidly and for selecting appropriate BACs and mapping BACs to develop large overlapping contigs. We became involved in supplying BACs and BAC contigs to a variety of sequencing and mapping projects and we began to collaborate with Drs. Adams and Venter at TIGR and with Dr. Leroy Hood and his group at University of Washington to provide BACs for end sequencing and for mapping and sequencing of large fragments of chromosome 16. Together with Dr. Ian Dunham and his co-workers at the Sanger Center we completed the mapping and they completed the sequencing of the first human chromosome, chromosome 22. This was published in Nature in 1999 and our BAC contigs made a major contribution to this sequencing effort. Drs. Shizuya and Ding invented an automated highly accurate BAC mapping technique. We also developed long-term collaborations with Dr. Uli Weier at UCSF in the design of BAC probes for characterization of human tumors and specific chromosome deletions and breakpoints. Finally the contribution of our work to the human genome project has been recognized in the publication both by the international consortium and the NIH of a draft sequence of the human genome in Nature last year. Dr. Shizuya was acknowledged in the authorship of that landmark paper. Dr. Simon was also an author on the Venter/Adams Celera project sequencing the human genome that was published in Science last year.

  9. Gender-specific effects on food intake but no inhibition of age-related fat accretion in transgenic mice overexpressing human IGFBP-2 lacking the Cardin-Weintraub sequence motif.

    PubMed

    Wiedmer, Petra; Schwarz, Franziska; Große, Birgit; Schindler, Nancy; Tuchscherer, Armin; Russo, Vincenzo C; Tschöp, Matthias H; Hoeflich, Andreas

    2015-06-01

    IGFBP-2 affects growth and metabolism and is thought to impact on energy homeostasis and the accretion of body fat via its heparin binding domains (HBD). In order to assess the function of the HBD present in the linker domain (HBD1) we have generated transgenic mice overexpressing mutant human IGFBP-2 lacking the PKKLRP sequence and carrying a PNNLAP sequence instead. Transgenic mice expressed high amounts of human IGFBP-2, while endogenous IGFBP-2 or IGF-I serum concentrations were not affected. In both genders we performed a longitudinal analysis of growth and metabolism including at least 4 separate time points between the age of 10 and 52 weeks. Body composition was assessed by nuclear magnetic resonance (NMR) analysis. Food intake was recorded by automated online monitoring. We describe negative effects of mutant human IGFBP-2 on body weight, longitudinal growth and lean body mass (p < 0.05). Very clearly, negative effects of mutant IGFBP-2 were not observed for fat mass accretion throughout life. Instead, relative fat mass was increased in transgenic mice of both genders (p < 0.05). In male mice transgene expression significantly increased absolute mass of total body fat over all age groups (p < 0.05). Food intake was increased in female but decreased in male transgenic mice at an age of 11 weeks. Thus our study clearly demonstrates gender- and time-specific effects of HBD1-deficient hIGFBP-2 (H1d-BP-2) on fat mass accretion and food intake. While our data are in principle in agreement with current knowledge on the role of HB-domains for fat accretion, we now may also speculate on a role of HBD1 for the control of eating behavior.

  10. Prevalence of the EH1 Groucho interaction motif in the metazoan Fox family of transcriptional regulators

    PubMed Central

    Yaklichkin, Sergey; Vekker, Alexander; Stayrook, Steven; Lewis, Mitchell; Kessler, Daniel S

    2007-01-01

    Background: The Fox gene family comprises a large and functionally diverse group of forkhead-related transcriptional regulators, many of which are essential for metazoan embryogenesis and physiology. Defining conserved functional domains that mediate the transcriptional activity of Fox proteins will contribute to a comprehensive understanding of the biological function of Fox family genes. Results: Systematic analysis of 458 protein sequences of the metazoan Fox family was performed to identify the presence of the engrailed homology-1 motif (eh1), a motif known to mediate physical interaction with transcriptional corepressors of the TLE/Groucho family. Greater than 50% of Fox proteins contain sequences with high similarity to the eh1 motif, including ten of the nineteen Fox subclasses (A, B, C, D, E, G, H, I, L, and Q) and Fox proteins of early divergent species such as marine sponge. The eh1 motif is not detected in Fox proteins of the F, J, K, M, N, O, P, R and S subclasses, or in yeast Fox proteins. The eh1-like motifs are positioned C-terminal to the winged helix DNA-binding domain in all subclasses except for FoxG proteins, which have an N-terminal motif. Two similar eh1-like motifs are found in the zebrafish FoxQ1 and in FoxG proteins of sea urchin and amphioxus. The identification of eh1-like motifs by manual sequence alignment was validated by statistical analyses of the Swiss protein database, confirming a high frequency of occurrence of eh1-like sequences in Fox family proteins. Structural predictions suggest that the majority of identified eh1-like motifs are short α-helices, and wheel modeling revealed an amphipathicity that supports this secondary structure prediction. Conclusion: A search for eh1 Groucho interaction motifs in the Fox gene family has identified eh1-like sequences in greater than 50% of Fox proteins. The results predict a physical and functional interaction of TLE/Groucho corepressors with many members of the Fox family of transcriptional

  11. miRBase: integrating microRNA annotation and deep-sequencing data.

    PubMed

    Kozomara, Ana; Griffiths-Jones, Sam

    2011-01-01

    miRBase is the primary online repository for all microRNA sequences and annotation. The current release (miRBase 16) contains over 15,000 microRNA gene loci in over 140 species, and over 17,000 distinct mature microRNA sequences. Deep-sequencing technologies have delivered a sharp rise in the rate of novel microRNA discovery. We have mapped reads from short RNA deep-sequencing experiments to microRNAs in miRBase and developed web interfaces to view these mappings. The user can view all read data associated with a given microRNA annotation, filter reads by experiment and count, and search for microRNAs by tissue- and stage-specific expression. These data can be used as a proxy for relative expression levels of microRNA sequences, provide detailed evidence for microRNA annotations and alternative isoforms of mature microRNAs, and allow us to revisit previous annotations. miRBase is available online at: http://www.mirbase.org/.

  12. Tools for visualization and integration of intermediate sequencing results in large disease gene discovery projects.

    PubMed

    Rzhetsky, A; Kalachikov, S; Ye, X; Zhang, P; Russo, J J

    1998-02-16

    We describe two Java applets which are useful for insightful presentation of intermediate experimental data in gene discovery projects involving large scale sequencing. One of these applets provides a physical map of a genomic region and provides easy access to the second applet, which furnishes a detailed map of sequence contigs associated with clones on the physical map. In particular, the second applet displays all the known information about each contig, including the presence of exons, database homology 'hits', repetitive elements and other features; the graphics are linked to other World Wide Web pages, providing detailed information on each feature. These applets should be useful to other research groups working on large sequencing projects.

  13. Conditional graphical models for protein structural motif recognition.

    PubMed

    Liu, Yan; Carbonell, Jaime; Gopalakrishnan, Vanathi; Weigele, Peter

    2009-05-01

    Determining protein structures is crucial to understanding the mechanisms of infection and designing drugs. However, the elucidation of protein folds by crystallographic experiments can be a bottleneck in the development process. In this article, we present a probabilistic graphical model framework, conditional graphical models, for predicting protein structural motifs. It represents the structure characteristics of a structural motif using a graph, where the nodes denote the secondary structure elements, and the edges indicate the side-chain interactions between the components either within one protein chain or between chains. Then the model defines the optimal segmentation of a protein sequence against the graph by maximizing its "conditional" probability so that it can take advantage of the discriminative training approach. Efficient approximate inference algorithms using the reversible jump Markov Chain Monte Carlo (MCMC) algorithm are developed to handle the resulting complex graphical models. We test our algorithm on four important structural motifs, and our method outperforms other state-of-the-art algorithms for motif recognition. We also hypothesize potential membership proteins of target folds from Swiss-Prot, which further supports the evolutionary hypothesis about viral folds.

  14. SBH and the integration of complementary approaches in the mapping, sequencing, and understanding of complex genomes

    SciTech Connect

    Drmanac, R.; Drmanac, S.; Labat, I.; Vicentic, A.; Gemmell, A.; Stavropoulos, N.; Jarvis, J.

    1992-01-01

    A variant of sequencing by hybridization (SBH) is being developed with a potential to inexpensively determine up to 100 million base pairs per year. The method comprises (1) arraying short clones in 864-well plates; (2) growth of the M13 clones or PCR of the inserts; (3) automated spotting of DNAs by corresponding pin-arrays; (4) hybridization of dotted samples with 200-3000 ³²P- or ³³P-labeled 6- to 8-mer probes; and (5) scoring hybridization signals using storage phosphor plates. Some 200 7- to 8-mers can provide an inventory of the genes if cDNA clones are hybridized, or can define the order of 2-kb genomic clones, creating physical and structural maps with 100-bp resolution; the distribution of G+C, LINEs, SINEs, and gene families would be revealed. cDNAs that represent new genes and genomic clones in regions of interest selected by SBH can be sequenced by a gel method. Uniformly distributed clones from the previous step will be hybridized with 2000-3000 6- to 8-mers. As a result, approximately 50-60% of the genomic regions containing members of large repetitive and gene families and those families represented in GenBank would be completely sequenced. In the less redundant regions, every base pair is expected to be read with 3-4 probes, but the complete sequence cannot be reconstructed. Such partial sequences allow the inference of similarity and the recognition of coding, regulatory, and repetitive sequences, as well as study of the evolutionary processes all the way up to the species delineation.
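
    The idea that partial probe data can still reveal similarity can be sketched with k-mer sets. The example below treats each clone as the set of probe-sized words it contains and compares two clones by the fraction of words they share. It is an idealized illustration that assumes exact matching and error-free signals; the function names and the similarity measure (Jaccard index) are assumptions, not part of the described method.

```python
def kmer_spectrum(sequence, k):
    """Set of all length-k words in a sequence (an idealized hybridization pattern)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_fraction(seq_a, seq_b, k):
    """Jaccard similarity between the k-mer sets of two sequences."""
    a, b = kmer_spectrum(seq_a, k), kmer_spectrum(seq_b, k)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Hypothetical clones; real probes in the abstract are 6- to 8-mers.
    print(sorted(kmer_spectrum("ACGTAC", 3)))
    print(shared_fraction("ACGTACGT", "CGTACGTT", 3))
```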

  15. SBH and the integration of complementary approaches in the mapping, sequencing, and understanding of complex genomes

    SciTech Connect

    Drmanac, R.; Drmanac, S.; Labat, I.; Vicentic, A.; Gemmell, A.; Stavropoulos, N.; Jarvis, J.

    1992-12-01

    A variant of sequencing by hybridization (SBH) is being developed with a potential to inexpensively determine up to 100 million base pairs per year. The method comprises (1) arraying short clones in 864-well plates; (2) growth of the M13 clones or PCR of the inserts; (3) automated spotting of DNAs by corresponding pin-arrays; (4) hybridization of dotted samples with 200-3000 ³²P- or ³³P-labeled 6- to 8-mer probes; and (5) scoring hybridization signals using storage phosphor plates. Some 200 7- to 8-mers can provide an inventory of the genes if cDNA clones are hybridized, or can define the order of 2-kb genomic clones, creating physical and structural maps with 100-bp resolution; the distribution of G+C, LINEs, SINEs, and gene families would be revealed. cDNAs that represent new genes and genomic clones in regions of interest selected by SBH can be sequenced by a gel method. Uniformly distributed clones from the previous step will be hybridized with 2000-3000 6- to 8-mers. As a result, approximately 50-60% of the genomic regions containing members of large repetitive and gene families and those families represented in GenBank would be completely sequenced. In the less redundant regions, every base pair is expected to be read with 3-4 probes, but the complete sequence cannot be reconstructed. Such partial sequences allow the inference of similarity and the recognition of coding, regulatory, and repetitive sequences, as well as study of the evolutionary processes all the way up to the species delineation.

  16. Integrated view of genome structure and sequence of a single DNA molecule in a nanofluidic device

    PubMed Central

    Marie, Rodolphe; Pedersen, Jonas N.; Bauer, David L. V.; Rasmussen, Kristian H.; Yusuf, Mohammed; Volpi, Emanuela; Flyvbjerg, Henrik; Kristensen, Anders; Mir, Kalim U.

    2013-01-01

    We show how a bird’s-eye view of genomic structure can be obtained at ∼1-kb resolution from long (∼2 Mb) DNA molecules extracted from whole chromosomes in a nanofluidic laboratory-on-a-chip. We use an improved single-molecule denaturation mapping approach to detect repetitive elements and known as well as unique structural variation. Following its mapping, a molecule of interest was rescued from the chip; amplified and localized to a chromosome by FISH; and interrogated down to 1-bp resolution with a commercial sequencer, thereby reconciling haplotype-phased chromosome substructure with sequence. PMID:23479649

  17. Construction of an integrated high density simple sequence repeat linkage map in cultivated strawberry (Fragaria × ananassa) and its applicability.

    PubMed

    Isobe, Sachiko N; Hirakawa, Hideki; Sato, Shusei; Maeda, Fumi; Ishikawa, Masami; Mori, Toshiki; Yamamoto, Yuko; Shirasawa, Kenta; Kimura, Mitsuhiro; Fukami, Masanobu; Hashizume, Fujio; Tsuji, Tomoko; Sasamoto, Shigemi; Kato, Midori; Nanri, Keiko; Tsuruoka, Hisano; Minami, Chiharu; Takahashi, Chika; Wada, Tsuyuko; Ono, Akiko; Kawashima, Kumiko; Nakazaki, Naomi; Kishida, Yoshie; Kohara, Mitsuyo; Nakayama, Shinobu; Yamada, Manabu; Fujishiro, Tsunakazu; Watanabe, Akiko; Tabata, Satoshi

    2013-02-01

    The cultivated strawberry (Fragaria × ananassa) is an octoploid (2n = 8x = 56) of the Rosaceae family whose genomic architecture is still controversial. Several recent studies support the AAA'A'BBB'B' model, but its complexity has hindered genetic and genomic analysis of this important crop. To overcome this difficulty and to assist genome-wide analysis of F. × ananassa, we constructed an integrated linkage map by organizing a total of 4474 of simple sequence repeat (SSR) markers collected from published Fragaria sequences, including 3746 SSR markers [Fragaria vesca expressed sequence tag (EST)-derived SSR markers] derived from F. vesca ESTs, 603 markers (F. × ananassa EST-derived SSR markers) from F. × ananassa ESTs, and 125 markers (F. × ananassa transcriptome-derived SSR markers) from F. × ananassa transcripts. Along with the previously published SSR markers, these markers were mapped onto five parent-specific linkage maps derived from three mapping populations, which were then assembled into an integrated linkage map. The constructed map consists of 1856 loci in 28 linkage groups (LGs) that total 2364.1 cM in length. Macrosynteny at the chromosome level was observed between the LGs of F. × ananassa and the genome of F. vesca. Variety distinction on 129 F. × ananassa lines was demonstrated using 45 selected SSR markers.

  18. Characterization of AKR murine leukemia virus sequences in AKR mouse substrains and structure of integrated recombinant genomes in tumor tissues.

    PubMed Central

    Quint, W; Quax, W; van der Putten, H; Berns, A

    1981-01-01

    A specific cDNA probe of AKR murine leukemia virus (AKR-MLV) was prepared to detect AKR-MLV sequences in normal and tumor tissues in a variety of AKR mouse substrains. AKR strains contained up to six endogenous AKR-MLV genomes. All substrains tested had one AKR-MLV locus in common, and closely related substrains had several proviruses integrated in an identical site. Virus-induced tumors in the AKR/FuRdA and AKR/JS strains showed a reintegration pattern of AKR-MLV sequences unique for the individual animal, suggesting a monoclonal origin for the outgrown tumors. An analysis of tumor DNAs from the AKR/FuRdA and AKR/JS substrains with restriction enzymes cleaving within the proviral genome revealed a new EcoRI restriction site and BamHI restriction site not present in normal tissues. The positions of these sites corresponded both with cleavage sites of EcoRI and BamHI in integrated Moloney recombinants and with the structure of isolated AKR mink cell focus-forming viruses. All tumors analyzed to date contain nearly identical integrated recombinant genomes, suggesting a causal relationship between the formation of recombinants and the leukemogenic process. PMID:6268802

  19. Population genomics and transcriptional consequences of regulatory motif variation in globally diverse Saccharomyces cerevisiae strains.

    PubMed

    Connelly, Caitlin F; Skelly, Daniel A; Dunham, Maitreya J; Akey, Joshua M

    2013-07-01

    Noncoding genetic variation is known to significantly influence gene expression levels in a growing number of specific cases; however, the patterns of genome-wide noncoding variation present within populations, the evolutionary forces acting on noncoding variants, and the relative effects of regulatory polymorphisms on transcript abundance are not well characterized. Here, we address these questions by analyzing patterns of regulatory variation in motifs for 177 DNA binding proteins in 37 strains of Saccharomyces cerevisiae. We found considerable polymorphism in regulatory motifs across S. cerevisiae strains (mean π = 0.005) as well as diversity in regulatory motifs (mean 0.91 motif differences per regulatory region). Population genetics analyses reveal that motifs are under purifying selection, and there is considerable heterogeneity in the magnitude of selection across different motifs. Finally, we obtained RNA-Seq data in 22 strains and identified 49 polymorphic DNA sequence motifs in 30 distinct genes that are significantly associated with transcriptional differences between strains. In 22 of these genes, there was a single polymorphic motif associated with expression in the upstream region. Our results provide comprehensive insights into the evolutionary trajectory of regulatory variation in yeast and the characteristics of a compendium of regulatory alleles.

  20. Genome-wide conserved consensus transcription factor binding motifs are hyper-methylated

    PubMed Central

    2010-01-01

    Background: DNA methylation can regulate gene expression by modulating the interaction between DNA and proteins or protein complexes. Conserved consensus motifs exist across the human genome ("predicted transcription factor binding sites": "predicted TFBS") but the large majority of these are proven by chromatin immunoprecipitation and high throughput sequencing (ChIP-seq) not to be biological transcription factor binding sites ("empirical TFBS"). We hypothesize that DNA methylation at conserved consensus motifs prevents promiscuous or disorderly transcription factor binding. Results: Using genome-wide methylation maps of the human heart and sperm, we found that all conserved consensus motifs as well as the subset of those that reside outside CpG islands have an aggregate profile of hyper-methylation. In contrast, empirical TFBS with conserved consensus motifs have a profile of hypo-methylation. 40% of empirical TFBS with conserved consensus motifs resided in CpG islands whereas only 7% of all conserved consensus motifs were in CpG islands. Finally we further identified a minority subset of TF whose profiles are either hypo-methylated or neutral at their respective conserved consensus motifs implicating that these TF may be responsible for establishing or maintaining an un-methylated DNA state, or whose binding is not regulated by DNA methylation. Conclusions: Our analysis supports the hypothesis that at least for a subset of TF, empirical binding to conserved consensus motifs genome-wide may be controlled by DNA methylation. PMID:20875111

  1. Population Genomics and Transcriptional Consequences of Regulatory Motif Variation in Globally Diverse Saccharomyces cerevisiae Strains

    PubMed Central

    Connelly, Caitlin F.; Skelly, Daniel A.; Dunham, Maitreya J.; Akey, Joshua M.

    2013-01-01

    Noncoding genetic variation is known to significantly influence gene expression levels in a growing number of specific cases; however, the patterns of genome-wide noncoding variation present within populations, the evolutionary forces acting on noncoding variants, and the relative effects of regulatory polymorphisms on transcript abundance are not well characterized. Here, we address these questions by analyzing patterns of regulatory variation in motifs for 177 DNA binding proteins in 37 strains of Saccharomyces cerevisiae. We found considerable polymorphism in regulatory motifs across S. cerevisiae strains (mean π = 0.005) as well as diversity in regulatory motifs (mean 0.91 motif differences per regulatory region). Population genetics analyses reveal that motifs are under purifying selection, and there is considerable heterogeneity in the magnitude of selection across different motifs. Finally, we obtained RNA-Seq data in 22 strains and identified 49 polymorphic DNA sequence motifs in 30 distinct genes that are significantly associated with transcriptional differences between strains. In 22 of these genes, there was a single polymorphic motif associated with expression in the upstream region. Our results provide comprehensive insights into the evolutionary trajectory of regulatory variation in yeast and the characteristics of a compendium of regulatory alleles. PMID:23619145
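
    The statistic π quoted in this abstract is conventionally the nucleotide diversity: the average number of differences per site between all pairs of aligned sequences. The abstract does not say which estimator or gap handling the authors used, so the following is only a minimal sketch of the textbook definition. The function name and the toy alignment are hypothetical and not taken from the study.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(aligned: Sequence[str]) -> float:
    """Mean pairwise differences per site over equal-length aligned sequences."""
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned[0])
    if length == 0 or any(len(s) != length for s in aligned):
        raise ValueError("sequences must be non-empty and of equal length")

    total_diffs = 0
    n_pairs = 0
    for a, b in combinations(aligned, 2):
        # Count mismatching columns for this pair of sequences.
        total_diffs += sum(1 for x, y in zip(a, b) if x != y)
        n_pairs += 1
    return total_diffs / (n_pairs * length)


# Hypothetical toy alignment of one motif region in four strains.
toy_alignment = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGTTCGTAC",
    "ACGTACGTAA",
]
toy_pi = nucleotide_diversity(toy_alignment)
```

    In the toy alignment, six pairwise differences spread over six pairs and ten sites give π = 0.1; real data would also need a policy for gaps and ambiguous bases, which this sketch leaves out.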

  2. Automation and integration of multiplexed on-line sample preparation with capillary electrophoresis for DNA sequencing

    SciTech Connect

    Tan, H.

    1999-03-31

    The purpose of this research is to develop a multiplexed sample processing system in conjunction with multiplexed capillary electrophoresis for high-throughput DNA sequencing. The concept from DNA template to called bases was first demonstrated with a manually operated single capillary system. Later, an automated microfluidic system with 8 channels based on the same principle was successfully constructed. The instrument automatically processes 8 templates through reaction, purification, denaturation, pre-concentration, injection, separation and detection in a parallel fashion. A multiplexed freeze/thaw switching principle and a distribution network were implemented to manage flow direction and sample transportation. Dye-labeled terminator cycle-sequencing reactions are performed in an 8-capillary array in a hot air thermal cycler. Subsequently, the sequencing ladders are directly loaded into a corresponding size-exclusion chromatographic column operated at approximately 60 °C for purification. On-line denaturation and stacking injection for capillary electrophoresis is simultaneously accomplished at a cross assembly set at approximately 70 °C. Not only the separation capillary array but also the reaction capillary array and purification columns can be regenerated after every run. DNA sequencing data from this system allow base calling up to 460 bases with accuracy of 98%.

  3. Wasabi: An Integrated Platform for Evolutionary Sequence Analysis and Data Visualization.

    PubMed

    Veidenberg, Andres; Medlar, Alan; Löytynoja, Ari

    2016-04-01

    Wasabi is an open source, web-based environment for evolutionary sequence analysis. Wasabi visualizes sequence data together with a phylogenetic tree within a modern, user-friendly interface: The interface hides extraneous options, supports context sensitive menus, drag-and-drop editing, and displays additional information, such as ancestral sequences, associated with specific tree nodes. The Wasabi environment supports reproducibility by automatically storing intermediate analysis steps and includes built-in functions to share data between users and publish analysis results. For computational analysis, Wasabi supports PRANK and PAGAN for phylogeny-aware alignment and alignment extension, and it can be easily extended with other tools. Along with drag-and-drop import of local files, Wasabi can access remote data through URL and import sequence data, GeneTrees and EPO alignments directly from Ensembl. To demonstrate a typical workflow using Wasabi, we reproduce key findings from recent comparative genomics studies, including a reanalysis of the EGLN1 gene from the tiger genome study: These case studies can be browsed within Wasabi at http://wasabiapp.org:8000?id=usecases. Wasabi runs inside a web browser and does not require any installation. One can start using it at http://wasabiapp.org. All source code is licensed under the AGPLv3.

  4. ParameciumDB: a community resource that integrates the Paramecium tetraurelia genome sequence with genetic data.

    PubMed

    Arnaiz, Olivier; Cain, Scott; Cohen, Jean; Sperling, Linda

    2007-01-01

    ParameciumDB (http://paramecium.cgm.cnrs-gif.fr) is a new model organism database associated with the genome sequencing project of the unicellular eukaryote Paramecium tetraurelia. Built with the core components of the Generic Model Organism Database (GMOD) project, ParameciumDB currently contains the genome sequence and annotations, linked to available genetic data including the Gif Paramecium stock collection. It is thus possible to navigate between sequences and stocks via the genes and alleles. Phenotypes, of mutant strains and of knockdowns obtained by RNA interference, are captured using controlled vocabularies according to the Entity-Attribute-Value model. ParameciumDB currently supports browsing of phenotypes, alleles and stocks as well as querying of sequence features (genes, UniProt matches, InterPro domains, Gene Ontology terms) and of genetic data (phenotypes, stocks, RNA interference experiments). Forms allow submission of RNA interference data and some bioinformatics services are available. Future ParameciumDB development plans include coordination of human curation of the near 40 000 gene models by members of the research community.

  5. Incorporating Writing in an Integrated Calculus, Linear Algebra, and Differential Equations Sequence.

    ERIC Educational Resources Information Center

    Kelly, Susan E.; LeDocq, Rebecca Lewin

    2001-01-01

    Describes the specific courses in a sequence along with how the writing has been implemented in each course. Provides ideas for how to efficiently handle the additional paper load so students receive the necessary feedback while keeping the grading time reasonable. (Author/ASK)

  6. A variety of DNA-binding and multimeric proteins contain the histone fold motif.

    PubMed Central

    Baxevanis, A D; Arents, G; Moudrianakis, E N; Landsman, D

    1995-01-01

    The histone fold motif has previously been identified as a structural feature common to all four core histones and is involved in both histone-histone and histone-DNA interactions. Through the use of a novel motif searching method, a group of proteins containing the histone fold motif has been established. The proteins in this group are involved in a wide variety of functions related mostly to DNA metabolism. Most of these proteins engage in protein-protein or protein-DNA interactions, as do their core histone counterparts. Among these, CCAAT-specific transcription factor CBF and its yeast homologue HAP are two examples of multimeric complexes with different component subunits that contain the histone fold motif. The histone fold proteins are distantly related, with a relatively small degree of absolute sequence similarity. It is proposed that these proteins may share a similar three-dimensional conformation despite the lack of significant sequence similarity. PMID:7651829

  7. Repression domains of class II ERF transcriptional repressors share an essential motif for active repression.

    PubMed

    Ohta, M; Matsui, K; Hiratsu, K; Shinshi, H; Ohme-Takagi, M

    2001-08-01

    We reported previously that three ERF transcription factors, tobacco ERF3 (NtERF3) and Arabidopsis AtERF3 and AtERF4, which are categorized as class II ERFs, are active repressors of transcription. To clarify the roles of these repressors in transcriptional regulation in plants, we attempted to identify the functional domains of the ERF repressor that mediates the repression of transcription. Analysis of the results of a series of deletions revealed that the C-terminal 35 amino acids of NtERF3 are sufficient to confer the capacity for repression of transcription on a heterologous DNA binding domain. This repression domain suppressed the intermolecular activities of other transcriptional activators. In addition, fusion of this repression domain to the VP16 activation domain completely inhibited the transactivation function of VP16. Comparison of amino acid sequences of class II ERF repressors revealed the conservation of the sequence motif (L)/(F)DLN(L)/(F)(x)P. This motif was essential for repression because mutations within the motif eliminated the capacity for repression. We designated this motif the ERF-associated amphiphilic repression (EAR) motif, and we identified this motif in a number of zinc-finger proteins from wheat, Arabidopsis, and petunia plants. These zinc finger proteins functioned as repressors, and their repression domains were identified as regions that contained an EAR motif.
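
    The conserved EAR motif in this abstract, written (L)/(F)DLN(L)/(F)(x)P, can be read as a seven-residue pattern: L or F, then D, L, N, then L or F, then any residue, then P. Under that reading (an interpretation of the notation, not something the abstract spells out), a regular expression can scan a protein sequence for candidate sites. The function name and the test sequence below are hypothetical.

```python
import re
from typing import List, Tuple

# [LF] D L N [LF] x P, where "." stands for the unspecified residue x.
EAR_PATTERN = re.compile(r"[LF]DLN[LF].P")


def find_ear_motifs(protein: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched residues) for each candidate EAR motif."""
    return [(m.start(), m.group()) for m in EAR_PATTERN.finditer(protein.upper())]


# Made-up sequence containing two candidate sites, for illustration only.
toy_protein = "MSAKLDLNLPPAGFDLNFKPQ"
toy_hits = find_ear_motifs(toy_protein)
```

    A regex hit only marks a candidate; the abstract establishes function experimentally through deletion and mutation analysis, which a pattern match cannot replace.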

  8. Targeting functional motifs of a protein family

    NASA Astrophysics Data System (ADS)

    Bhadola, Pradeep; Deo, Nivedita

    2016-10-01

    The structural organization of a protein family is investigated by devising a method based on the random matrix theory (RMT), which uses the physicochemical properties of the amino acid with multiple sequence alignment. A graphical method to represent protein sequences using physicochemical properties is devised that gives a fast, easy, and informative way of comparing the evolutionary distances between protein sequences. A correlation matrix associated with each property is calculated, where the noise reduction and information filtering is done using RMT involving an ensemble of Wishart matrices. The analysis of the eigenvalue statistics of the correlation matrix for the β-lactamase family shows the universal features as observed in the Gaussian orthogonal ensemble (GOE). The property-based approach captures the short- as well as the long-range correlation (approximately following GOE) between the eigenvalues, whereas the previous approach (treating amino acids as characters) gives the usual short-range correlations, while the long-range correlations are the same as that of an uncorrelated series. The distribution of the eigenvector components for the eigenvalues outside the bulk (RMT bound) deviates significantly from RMT observations and contains important information about the system. The information content of each eigenvector of the correlation matrix is quantified by introducing an entropic estimate, which shows that for the β-lactamase family the smallest eigenvectors (low eigenmodes) are highly localized as well as informative. These small eigenvectors, when processed, give clusters involving positions that have well-defined biological and structural importance matching with experiments. The approach is crucial for the recognition of structural motifs as shown in β-lactamase (and other families) and selectively identifies the important positions for targets to deactivate (activate) the enzymatic actions.

  9. Targeting functional motifs of a protein family.

    PubMed

    Bhadola, Pradeep; Deo, Nivedita

    2016-10-01

    The structural organization of a protein family is investigated by devising a method based on the random matrix theory (RMT), which uses the physicochemical properties of the amino acid with multiple sequence alignment. A graphical method to represent protein sequences using physicochemical properties is devised that gives a fast, easy, and informative way of comparing the evolutionary distances between protein sequences. A correlation matrix associated with each property is calculated, where the noise reduction and information filtering is done using RMT involving an ensemble of Wishart matrices. The analysis of the eigenvalue statistics of the correlation matrix for the β-lactamase family shows the universal features as observed in the Gaussian orthogonal ensemble (GOE). The property-based approach captures the short- as well as the long-range correlation (approximately following GOE) between the eigenvalues, whereas the previous approach (treating amino acids as characters) gives the usual short-range correlations, while the long-range correlations are the same as that of an uncorrelated series. The distribution of the eigenvector components for the eigenvalues outside the bulk (RMT bound) deviates significantly from RMT observations and contains important information about the system. The information content of each eigenvector of the correlation matrix is quantified by introducing an entropic estimate, which shows that for the β-lactamase family the smallest eigenvectors (low eigenmodes) are highly localized as well as informative. These small eigenvectors, when processed, give clusters involving positions that have well-defined biological and structural importance matching with experiments. The approach is crucial for the recognition of structural motifs as shown in β-lactamase (and other families) and selectively identifies the important positions for targets to deactivate (activate) the enzymatic actions.

  10. Phosphopeptide interactions with BRCA1 BRCT domains: More than just a motif.

    PubMed

    Wu, Qian; Jubb, Harry; Blundell, Tom L

    2015-03-01

    BRCA1 BRCT domains function as phosphoprotein-binding modules for recognition of the phosphorylated protein-sequence motif pSXXF. While the motif interaction interface provides strong anchor points for binding, protein regions outside the motif have recently been found to be important for binding affinity. In this review, we compare the available structural data for BRCA1 BRCT domains in complex with phosphopeptides in order to gain a more complete understanding of the interaction between phosphopeptides and BRCA1-BRCT domains.

  11. Identification, sequencing and expression of an integral membrane protein of the trans-Golgi network (TGN38).

    PubMed Central

    Luzio, J P; Brake, B; Banting, G; Howell, K E; Braghetta, P; Stanley, K K

    1990-01-01

    Organelle-specific integral membrane proteins were identified by a novel strategy which gives rise to monospecific antibodies to these proteins as well as to the cDNA clones encoding them. A cDNA expression library was screened with a polyclonal antiserum raised against Triton X-114-extracted organelle proteins and clones were then grouped using antibodies affinity-purified on individual fusion proteins. The identification, molecular cloning and sequencing are described of a type 1 membrane protein (TGN38) which is located specifically in the trans-Golgi network. PMID:2204342

  12. Complete Genome Sequence of Streptomyces parvulus 2297, Integrating Site-Specifically with Actinophage R4

    PubMed Central

    Miura, Takamasa; Harada, Chizuko; Guo, Yong; Narisawa, Kazuhiko; Ohta, Hiroyuki; Takahashi, Hideo; Shirai, Makoto

    2016-01-01

    Streptomyces parvulus 2297, which is a host for site-specific recombination according to actinophage R4, is derived from the type strain ATCC 12434. Species of S. parvulus are known as producers of polypeptide antibiotic actinomycins and have been considered for industrial applications. We herein report for the first time the complete genome sequence of S. parvulus 2297. PMID:27563047

  13. Integrating genomic information with protein sequence and 3D atomic level structure at the RCSB protein data bank

    PubMed Central

    Prlić, Andreas; Kalro, Tara; Bhattacharya, Roshni; Christie, Cole; Burley, Stephen K.; Rose, Peter W.

    2016-01-01

    Summary: The Protein Data Bank (PDB) now contains more than 120,000 three-dimensional (3D) structures of biological macromolecules. To allow an interpretation of how PDB data relates to other publicly available annotations, we developed a novel data integration platform that maps 3D structural information across various datasets. This integration bridges from the human genome across protein sequence to 3D structure space. We developed novel software solutions for data management and visualization, while incorporating new libraries for web-based visualization using SVG graphics. Availability and Implementation: The new views are available from http://www.rcsb.org and software is available from https://github.com/rcsb/. Contact: andreas.prlic@rcsb.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27551105

  14. An integrated multiple capillary array electrophoresis system for high-throughput DNA sequencing

    SciTech Connect

    Lu, X.

    1998-03-27

    A capillary array electrophoresis system was chosen to perform DNA sequencing because of several advantages such as rapid heat dissipation, multiplexing capabilities, gel matrix filling simplicity, and the mature nature of the associated manufacturing technologies. There are two major concerns for the multiple capillary systems. One concern is inter-capillary cross-talk, and the other concern is excitation and detection efficiency. Cross-talk is eliminated through proper optical coupling, good focusing and immersing the capillary array in index-matching fluid. A side-entry excitation scheme with orthogonal detection was established for large capillary arrays. Two 100-capillary array formats were used for DNA sequencing. One format is a cylindrical capillary with 150 µm o.d. and 75 µm i.d., and the other format is a square capillary with 300 µm outer edge and 75 µm inner edge. This project is focused on the development of excitation and detection of DNA as well as performing DNA sequencing. The DNA injection schemes are discussed for the cases of single and bundled capillaries. An individual sampling device was designed. The base-calling was performed for a capillary from the capillary array with an accuracy of 98%.

  15. STEME: efficient EM to find motifs in large data sets.

    PubMed

    Reid, John E; Wernisch, Lorenz

    2011-10-01

    MEME and many other popular motif finders use the expectation-maximization (EM) algorithm to optimize their parameters. Unfortunately, the running time of EM is linear in the length of the input sequences. This can prohibit its application to data sets of the size commonly generated by high-throughput biological techniques. A suffix tree is a data structure that can efficiently index a set of sequences. We describe an algorithm, Suffix Tree EM for Motif Elicitation (STEME), that approximates EM using suffix trees. To the best of our knowledge, this is the first application of suffix trees to EM. We provide an analysis of the expected running time of the algorithm and demonstrate that STEME runs an order of magnitude more quickly than the implementation of EM used by MEME. We give theoretical bounds for the quality of the approximation and show that, in practice, the approximation has a negligible effect on the outcome. We provide an open source implementation of the algorithm that we hope will be used to speed up existing and future motif search algorithms.
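
    To make the linear running time concrete, the sketch below shows one E-step of a simple two-component mixture model, in which every window of the motif width is scored against a position weight matrix and a background model. This is a generic illustration of EM-based motif finding, not the STEME or MEME implementation (STEME's suffix-tree approximation is not shown). The function name, the toy matrix, and the prior `lam` are all hypothetical.

```python
from typing import Dict, List

Pwm = List[Dict[str, float]]  # one base -> probability mapping per motif column


def e_step(seq: str, pwm: Pwm, background: Dict[str, float], lam: float) -> List[float]:
    """Posterior probability that each window of len(pwm) bases is a motif site.

    lam is the prior probability that a window is a site. The loop visits every
    window once, so the cost grows linearly with the sequence length.
    """
    width = len(pwm)
    posteriors = []
    for start in range(len(seq) - width + 1):
        p_motif = 1.0
        p_bg = 1.0
        for offset in range(width):
            base = seq[start + offset]
            p_motif *= pwm[offset][base]
            p_bg *= background[base]
        weighted_motif = lam * p_motif
        posteriors.append(weighted_motif / (weighted_motif + (1.0 - lam) * p_bg))
    return posteriors


# Toy inputs with made-up values: a 3-column matrix favouring "ACG".
toy_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
uniform_bg = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
toy_posteriors = e_step("TTACGTT", toy_pwm, uniform_bg, lam=0.1)
best_start = max(range(len(toy_posteriors)), key=toy_posteriors.__getitem__)
```

    Here the window starting at position 2 ("ACG") receives the highest posterior. A full EM run would repeat this pass and re-estimate the matrix from the posteriors on every iteration, which is the per-window cost that an index over repeated substrings aims to reduce.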

  16. Zinc finger binding motifs do not explain recombination rate variation within or between species of Drosophila.

    PubMed

    Heil, Caiti S S; Noor, Mohamed A F

    2012-01-01

    In humans and mice, the Cys2His2 zinc finger protein PRDM9 binds to a DNA sequence motif enriched in hotspots of recombination, possibly modifying nucleosomes, and recruiting recombination machinery to initiate Double Strand Breaks (DSBs). However, since its discovery, some researchers have suggested that the recombinational effect of PRDM9 is lineage or species specific. To test for a conserved role of PRDM9-like proteins across taxa, we use the Drosophila pseudoobscura species group in an attempt to identify recombination associated zinc finger proteins and motifs. We leveraged the conserved amino acid motifs in Cys2His2 zinc fingers to predict nucleotide binding motifs for all Cys2His2 zinc finger proteins in Drosophila pseudoobscura and identified associations with empirical measures of recombination rate. Additionally, we utilized recombination maps from D. pseudoobscura and D. miranda to explore whether changes in the binding motifs between species can account for changes in the recombination landscape, analogous to the effect observed in PRDM9 among human populations. We identified a handful of potential recombination-associated sequence motifs, but the associations are generally tenuous and their biological relevance remains uncertain. Furthermore, we found no evidence that changes in zinc finger DNA binding explain variation in recombination rate between species. We therefore conclude that there is no protein with a DNA sequence specific human-PRDM9-like function in Drosophila. We suggest these findings could be explained by the existence of a different recombination initiation system in Drosophila.

  17. Demonstrating the Effectiveness of an Integrated and Intensive Research Methods and Statistics Course Sequence

    ERIC Educational Resources Information Center

    Pliske, Rebecca M.; Caldwell, Tracy L.; Calin-Jageman, Robert J.; Taylor-Ritzler, Tina

    2015-01-01

    We developed a two-semester series of intensive (six-contact hours per week) behavioral research methods courses with an integrated statistics curriculum. Our approach includes the use of team-based learning, authentic projects, and Excel and SPSS. We assessed the effectiveness of our approach by examining our students' content area scores on the…

  18. Heat, Energy, and Order, Part Two of an Integrated Science Sequence, Student Guide, 1970 Edition.

    ERIC Educational Resources Information Center

    Portland Project Committee, OR.

    Part two of the first year in the Portland Project, a three-year high school integrated science curriculum, is contained in this student guide. This volume, one of four parts in the year course, involves activities relating to what is considered the most powerful unifying concept in science: energy. The macroscopic aspects of heat as embodied in…

  19. A bioinformatics pipeline to search functional motifs within whole-proteome data: a case study of poxviruses.

    PubMed

    Sobhy, Haitham

    2017-04-01

    Proteins harbor domains or short linear motifs, which facilitate their functions and interactions. Finding functional motifs in protein sequences could predict the putative cellular roles or characteristics of hypothetical proteins. In this study, we present Shetti-Motif, which is an interactive tool to (i) map UniProt and PROSITE flat files, (ii) search for multiple pre-defined consensus patterns or experimentally validated functional motifs in large datasets of protein sequences (proteome-wide), and (iii) search for motifs containing repeated residues (low-complexity regions, e.g., Leu-, SR-, PEST-rich motifs, etc.). As proof of principle, using this comparative proteomics pipeline, eleven proteomes encoded by members of the Poxviridae family were searched against about 100 experimentally validated functional motifs. Closely related viruses and viruses that infect the same host cells (e.g. vaccinia and variola viruses) show similar profiles of motif-containing proteins. The motifs encoded by these viruses are correlated, which explains why poxviruses are able to interact with a wide range of host cells. In conclusion, this in silico analysis is useful for establishing datasets of potential proteins for further investigation or for comparison between species.
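
    Searching proteomes for pre-defined consensus patterns, as described above, usually starts by translating a PROSITE-style pattern into a regular expression. The abstract does not describe how Shetti-Motif does this internally, so the converter below is only an illustrative sketch covering the common syntax elements (`x`, `[..]`, `{..}`, repetition counts, and the `<` / `>` anchors). The function names and the toy sequence are hypothetical.

```python
import re
from typing import List, Tuple


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern such as 'N-{P}-[ST]-{P}.' to a regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        core = element.strip("<>")
        repeat = ""
        if core.endswith(")"):
            # 'x(2,4)' -> core 'x', repetition '{2,4}'
            core, _, count = core[:-1].partition("(")
            repeat = "{" + count + "}"
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def scan(protein: str, pattern: str) -> List[Tuple[int, str]]:
    """Return (0-based start, match) for non-overlapping hits of the pattern."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(protein)]


# PROSITE N-glycosylation site pattern applied to a made-up sequence.
toy_regex = prosite_to_regex("N-{P}-[ST]-{P}.")
toy_sites = scan("MNGSANPSQ", "N-{P}-[ST]-{P}.")
```

    The sketch reports only non-overlapping matches and ignores less common syntax such as anchors placed inside square brackets, so a production pipeline would need a more complete parser.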

  20. Isosteric and nonisosteric base pairs in RNA motifs: molecular dynamics and bioinformatics study of the sarcin-ricin internal loop.

    PubMed

    Havrila, Marek; Réblová, Kamila; Zirbel, Craig L; Leontis, Neocles B; Šponer, Jiří

    2013-11-21

    The sarcin-ricin RNA motif (SR motif) is one of the most prominent recurrent RNA building blocks that occurs in many different RNA contexts and folds autonomously, that is, in a context-independent manner. In this study, we combined bioinformatics analysis with explicit-solvent molecular dynamics (MD) simulations to better understand the relation between the RNA sequence and the evolutionary patterns of the SR motif. A SHAPE probing experiment was also performed to confirm the fidelity of the MD simulations. We identified 57 instances of the SR motif in a nonredundant subset of the RNA X-ray structure database and analyzed their base pairing, base-phosphate, and backbone-backbone interactions. We extracted sequences aligned to these instances from large rRNA alignments to determine the frequency of occurrence for different sequence variants. We then used a simple scoring scheme based on isostericity to suggest 10 sequence variants with a highly variable expected degree of compatibility with the SR motif 3D structure. We carried out MD simulations of SR motifs with these base substitutions. Nonisosteric base substitutions led to unstable structures, but so did isosteric substitutions which were unable to make key base-phosphate interactions. The MD technique explains why some potentially isosteric SR motifs are not realized during evolution. We also found that the inability to form stable cWW geometry is an important factor in the case of the first base pair of the flexible region of the SR motif. A comparison of structural, bioinformatics, SHAPE probing, and MD simulation data reveals that explicit solvent MD simulations neatly reflect the viability of different sequence variants of the SR motif. Thus, MD simulations can efficiently complement bioinformatics tools in studies of conservation patterns of RNA motifs and provide atomistic insight into the role of their different signature interactions.

  1. The Locus Lookup Tool at MaizeGDB: Identification of Genomic Regions in Maize by Integrating Sequence Information with Physical and Genetic Maps

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Methods to automatically integrate sequence information with physical and genetic maps are scarce. The Locus Lookup Tool enables researchers to define windows of genomic sequence likely to contain loci of interest where only genetic or physical mapping associations are reported. Using the Locus Look...

  2. Local motifs involved in the canonical structure of the ligand-binding domain in the nuclear receptor superfamily.

    PubMed

    Tsuji, Motonori

    2014-03-01

    Structural and sequence alignment analyses have revealed the existence of class-dependent and -independent local motifs involved in the overall fold of the ligand-binding domain (LBD) in the nuclear receptor (NR) superfamily. Of these local motifs, three local motifs, i.e., AF-2 fixed motifs, were involved in the agonist conformation of the activation function-2 (AF-2) region of the LBD. Receptor-agonist interactions increased the stability of these AF-2 fixed motifs in the agonist conformation. In contrast, perturbation of the AF-2 fixed motifs by a ligand or another protein molecule led the AF-2 architecture to adopt an antagonist conformation. Knowledge of this process should provide us with novel insights into the 'agonism' and 'antagonism' of NRs.

  3. Minimal motif peptide structure of metzincin clan zinc peptidases in micelles.

    PubMed

    Onoda, Akira; Suzuki, Takako; Ishizuka, Hiroaki; Sugiyama, Rumiko; Ariyasu, Shinya; Yamamura, Takeshi

    2009-12-01

    It is well known that the functions of metalloproteins generally originate from their metal-binding motifs. However, the intrinsic nature of individual motifs remains unknown, particularly the details about metal-binding effects on the folding of motifs; the converse is also unknown, although there is no doubt that the motif is the core of the reactivity for each metalloprotein. In this study, we focused our attention on the zinc-binding motif of the metzincin clan family, HEXXHXXGXXH; this family contains the general zinc-binding sequence His-Glu-Xaa-Xaa-His (HEXXH) and the extended GXXH region. We adopted the motif sequence of stromelysin-1 and investigated the folding properties of the Trp-labeled peptides WAHEIAHSLGLFHA (STR-W1), AWHEIAHSLGLFHA (STR-W2), AHEIAHSLGWFHA (STR-W11), and AHEIAHSLGLFHWA (STR-W14) in the presence and absence of zinc ions in hydrophobic micellar environments by circular dichroism (CD) measurements. We assessed successful incorporation of these zinc peptides into micelles using quenching of Trp fluorescence. Results of CD studies indicated that two of the Trp-incorporated peptides, STR-W1 and STR-W14, exhibited helical folding in the hydrophobic region of cetyltrimethylammonium chloride micelle. The NMR structural analysis of the apo STR-W14 revealed that the conformation in the C-terminus GXXH region significantly differed between the apo state in the micelle and the reported Zn-bound state of stromelysin-1 in crystal structures. The structural analyses of the qualitative Zn-binding properties of this motif peptide provide an interesting Zn-binding mechanism: the minimum consensus motif in the metzincin clan, a basic zinc-binding motif with an extended GXXH region, has the potential to serve as a preorganized Zn binding scaffold in a hydrophobic environment.

  4. Integrating next-generation sequencing into clinical oncology: strategies, promises and pitfalls

    PubMed Central

    Horak, Peter; Fröhling, Stefan; Glimm, Hanno

    2016-01-01

    We live in an era of genomic medicine. The past five years brought about many significant achievements in the field of cancer genetics, driven by rapidly evolving technologies and plummeting costs of next-generation sequencing (NGS). The official completion of the Cancer Genome Project in 2014 led many to envision the clinical implementation of cancer genomic data as the next logical step in cancer therapy. Stemming from this vision, the term ‘precision oncology’ was coined to illustrate the novelty of this individualised approach. The basic assumption of precision oncology is that molecular markers detected by NGS will predict response to targeted therapies independently from tumour histology. However, along with a ubiquitous availability of NGS, the complexity and heterogeneity at the individual patient level had to be acknowledged. Not only does the latter present challenges to clinical decision-making based on sequencing data, it is also an obstacle to the rational design of clinical trials. Novel tissue-agnostic trial designs were quickly developed to overcome these challenges. Results from some of these trials have recently demonstrated the feasibility and efficacy of this approach. On the other hand, there is an increasing amount of whole-exome and whole-genome NGS data which allows us to assess ever smaller differences between individual patients with cancer. In this review, we highlight different tumour sequencing strategies currently used for precision oncology, describe their individual strengths and weaknesses, and emphasise their feasibility in different clinical settings. Further, we evaluate the possibility of NGS implementation in current and future clinical trials, and point to the significance of NGS for translational research. PMID:27933214

  5. Genomic integration of the full-length dystrophin coding sequence in Duchenne muscular dystrophy induced pluripotent stem cells.

    PubMed

    Farruggio, Alfonso P; Bhakta, Mital S; du Bois, Haley; Ma, Julia; P Calos, Michele

    2017-04-01

    Plasmid vectors that express the full-length human dystrophin coding sequence in human cells were developed. Dystrophin, the protein mutated in Duchenne muscular dystrophy, is extraordinarily large, providing challenges for cloning and plasmid production in Escherichia coli. The authors expressed dystrophin from the strong, widely expressed CAG promoter, along with co-transcribed luciferase and mCherry marker genes useful for tracking plasmid expression. Introns were added at the 3' and 5' ends of the dystrophin sequence to prevent translation in E. coli, resulting in improved plasmid yield. Stability and yield were further improved by employing a lower-copy number plasmid origin of replication. The dystrophin plasmids also carried an attB site recognized by phage phiC31 integrase, enabling the plasmids to be integrated into the human genome at preferred locations by phiC31 integrase. The authors demonstrated single-copy integration of plasmid DNA into the genome and production of human dystrophin in the human 293 cell line, as well as in induced pluripotent stem cells derived from a patient with Duchenne muscular dystrophy. Plasmid-mediated dystrophin expression was also demonstrated in mouse muscle. The dystrophin expression plasmids described here will be useful in cell and gene therapy studies aimed at ameliorating Duchenne muscular dystrophy.

  6. An AU-Rich Sequence Element (UUUN[A/U]U) Downstream of the Edited C in Apolipoprotein B mRNA Is a High-Affinity Binding Site for Apobec-1: Binding of Apobec-1 to This Motif in the 3′ Untranslated Region of c-myc Increases mRNA Stability

    PubMed Central

    Anant, Shrikant; Davidson, Nicholas O.

    2000-01-01

    Apobec-1, the catalytic subunit of the mammalian apolipoprotein B (apoB) mRNA-editing enzyme, is a cytidine deaminase with RNA binding activity for AU-rich sequences. This RNA binding activity is required for Apobec-1 to mediate C-to-U RNA editing. Filter binding assays, using immobilized Apobec-1, demonstrate saturable binding to a 105-nt apoB RNA with a Kd of ∼435 nM. A series of AU-rich templates was used to identify a high-affinity (∼50 nM) binding site of consensus sequence UUUN[A/U]U, with multiple copies of this sequence constituting the high-affinity binding site. In order to determine whether this consensus site could be functionally demonstrated from within an apoB RNA, circular-permutation analysis was performed, revealing one major (UUUGAU) and one minor (UU) site located 3 and 16 nucleotides, respectively, downstream of the edited base. Secondary-structure predictions reveal a stem-loop flanking the edited base with Apobec-1 binding to the consensus site(s) at an open loop. A similar consensus (AUUUA) is present in the 3′ untranslated regions of several mRNAs, including that of c-myc, that are known to undergo rapid degradation. In this context, it is presumed that the consensus motif acts as a destabilizing element. As an independent test of the ability of Apobec-1 to bind to this sequence, F442A cells were transfected with Apobec-1 and the half-life of c-myc mRNA was determined following actinomycin D treatment. These studies demonstrated an increase in the half-life of c-myc mRNA from 90 to 240 min in control versus Apobec-1-expressing cells. Apobec-1 expression mutants, in which RNA binding activity is eliminated, failed to alter c-myc mRNA turnover. Taken together, the data establish a consensus binding site for Apobec-1 embedded in proximity to the edited base in apoB RNA. Binding to this site in other target RNAs raises the possibility that Apobec-1 may be involved in other aspects of RNA metabolism, independent of its role as an apoB RNA

  7. Bioinformatics study of cancer-related mutations within p53 phosphorylation site motifs.

    PubMed

    Ji, Xiaona; Huang, Qiang; Yu, Long; Nussinov, Ruth; Ma, Buyong

    2014-07-29

    p53 protein has about thirty phosphorylation sites located at the N- and C-termini and in the core domain. The phosphorylation sites are relatively less mutated than other residues in p53. To understand why and how p53 phosphorylation sites are rarely mutated in human cancer, using a bioinformatics approach, we examined the phosphorylation site and its nearby flanking residues, focusing on the consensus phosphorylation motif pattern, amino-acid correlations within the phosphorylation motifs, the propensity of structural disorder of the phosphorylation motifs, and cancer mutations observed within the phosphorylation motifs. Many p53 phosphorylation sites are targets for several kinases. The phosphorylation sites match 17 consensus sequence motifs out of the 29 classified. In addition to proline, which is common in kinase specificity-determining sites, we found a high propensity of acidic residues to be adjacent to phosphorylation sites. Analysis of human cancer mutations in the phosphorylation motifs revealed that motifs with adjacent acidic residues generally have fewer mutations, in contrast to phosphorylation sites near proline residues. p53 phosphorylation motifs are mostly disordered. However, human cancer mutations within phosphorylation motifs tend to decrease the disorder propensity. Our results suggest that the combination of the acidic residues Asp and Glu with phosphorylation sites provides charge redundancy which may safeguard against loss-of-function mutations, and that the natively disordered nature of p53 phosphorylation motifs may help reduce mutational damage. Our results further suggest that engineering acidic amino acids adjacent to potential phosphorylation sites could be a p53 gene therapy strategy.

  8. An improved poly(A) motifs recognition method based on decision level fusion.

    PubMed

    Zhang, Shanxin; Han, Jiuqiang; Liu, Jun; Zheng, Jiguang; Liu, Ruiling

    2015-02-01

    Polyadenylation is the process of adding a poly(A) tail to mRNA 3' ends. Identification of motifs controlling polyadenylation plays an essential role in improving genome annotation accuracy and in better understanding the mechanisms governing gene regulation. The bioinformatics methods used for poly(A) motif recognition have demonstrated that information extracted from sequences surrounding the candidate motifs can greatly help differentiate true motifs from false ones. However, these methods depend on either domain features or string kernels. To date, no methods combining information from different sources have been reported. Here, we propose an improved poly(A) motif recognition method that combines different sources based on decision level fusion. First, two novel prediction methods were proposed based on support vector machine (SVM): one uses domain-specific features and principal component analysis (PCA) to eliminate redundancy (PCA-SVM); the other is based on the Oligo string kernel (Oligo-SVM). Then we proposed a novel machine-learning method for poly(A) motif prediction by combining four poly(A) motif recognition methods, including two state-of-the-art methods (Random Forest (RF) and HMM-SVM) and the two newly proposed methods (PCA-SVM and Oligo-SVM). A decision level information fusion method was employed to combine the decision values of different classifiers by applying the Dempster-Shafer (DS) evidence theory. We evaluated our method on a comprehensive poly(A) dataset that consists of 14,740 samples on 12 variants of poly(A) motifs and 2750 samples containing none of these motifs. Our method achieved accuracy up to 86.13%. Compared with the four classifiers, our evidence theory based method reduces the average error rate by about 30%, 27%, 26% and 16%, respectively. The experimental results suggest that the proposed method is more effective for poly(A) motif recognition.
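
    To make the decision level fusion step concrete, the following sketch applies Dempster's rule of combination to the outputs of two classifiers. It is a minimal illustration rather than the authors' implementation: the abstract does not say how classifier decision values are turned into mass functions, so the mass values, the two-hypothesis frame, and the name `dempster_combine` are all assumptions.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to its mass; the
    masses of one function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection of the two sets.
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            # Disjoint sets contribute to the conflict mass K.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalise by 1 - K so the combined masses sum to 1 again.
    return {h: w / (1.0 - conflict) for h, w in combined.items()}

# Hypothetical frame of discernment and made-up classifier masses.
MOTIF = frozenset({"motif"})
BACKGROUND = frozenset({"background"})
EITHER = MOTIF | BACKGROUND  # mass left unassigned (ignorance)

clf_a = {MOTIF: 0.6, BACKGROUND: 0.1, EITHER: 0.3}
clf_b = {MOTIF: 0.5, BACKGROUND: 0.2, EITHER: 0.3}

fused = dempster_combine(clf_a, clf_b)
for hypothesis, mass in fused.items():
    print(sorted(hypothesis), round(mass, 4))
```

    Fusing more than two classifiers amounts to applying `dempster_combine` repeatedly, since the rule is associative.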

  9. A Combinatorial Code for Splicing Silencing: UAGG and GGGG Motifs

    PubMed Central

    An, Ping; Burge, Christopher B

    2005-01-01

    Alternative pre-mRNA splicing is widely used to regulate gene expression by tuning the levels of tissue-specific mRNA isoforms. Few regulatory mechanisms are understood at the level of combinatorial control despite numerous sequences, distinct from splice sites, that have been shown to play roles in splicing enhancement or silencing. Here we use molecular approaches to identify a ternary combination of exonic UAGG and 5′-splice-site-proximal GGGG motifs that functions cooperatively to silence the brain-region-specific CI cassette exon (exon 19) of the glutamate NMDA R1 receptor (GRIN1) transcript. Disruption of three components of the motif pattern converted the CI cassette into a constitutive exon, while predominant skipping was conferred when the same components were introduced, de novo, into a heterologous constitutive exon. Predominant exon silencing was directed by the motif pattern in the presence of six competing exonic splicing enhancers, and this effect was retained after systematically repositioning the two exonic UAGGs within the CI cassette. In this system, hnRNP A1 was shown to mediate silencing while hnRNP H antagonized silencing. Genome-wide computational analysis combined with RT-PCR testing showed that a class of skipped human and mouse exons can be identified by searches that preserve the sequence and spatial configuration of the UAGG and GGGG motifs. This analysis suggests that the multi-component silencing code may play an important role in the tissue-specific regulation of the CI cassette exon, and that it may serve more generally as a molecular language to allow for intricate adjustments and the coordination of splicing patterns from different genes. PMID:15828859

  10. Mars Exploration 2003 to 2013 - An Integrated Perspective: Time Sequencing the Missions

    NASA Technical Reports Server (NTRS)

    Briggs, G.; McKay, C.

    2000-01-01

    The science goals for the Mars exploration program, together with the HEDS precursor environmental and technology needs, serve as a solid starting point for re-planning the program in an orderly way. Most recently, the community has recognized the significance of subsurface sampling as a key component in "following the water". Accessing samples from hundreds and even thousands of meters beneath the surface is a challenge that will call for technology development and for one or more demonstration missions. Recent mission failures and concerns about the complexity of the previously planned MSR missions indicate that, before we are ready to undertake sample return and deep sampling, the Mars exploration program needs to include: 1) technology development missions; and 2) basic landing site assessment missions. These precursor missions should demonstrate the capability for reliable & accurate soft landing and in situ propellant production. The precursor missions will need to carry out close-up site observations, ground-penetrating radar mapping from orbit and conduct seismic surveys. Clearly the programs should be planned as a single, continuous exploration effort. A prudent minimum list of missions, including surface rovers with ranges of more than 10 km, can be derived from the numerous goals and requirements; they can be sequenced in an orderly way to ensure that time is available to feed forward the results of the precursor missions. One such sequence of missions is proposed for the decade beginning in 2003.

  11. A unified statistical model of protein multiple sequence alignment integrating direct coupling and insertions

    PubMed Central

    Kinjo, Akira R.

    2016-01-01

    The multiple sequence alignment (MSA) of a protein family provides a wealth of information in terms of the conservation pattern of amino acid residues not only at each alignment site but also between distant sites. In order to statistically model the MSA incorporating both short-range and long-range correlations as well as insertions, I have derived a lattice gas model of the MSA based on the principle of maximum entropy. The partition function, obtained by the transfer matrix method with a mean-field approximation, accounts for all possible alignments with all possible sequences. The model parameters for short-range and long-range interactions were determined by a self-consistent condition and by a Gaussian approximation, respectively. Using this model with and without long-range interactions, I analyzed the globin and V-set domains by increasing the “temperature” and by “mutating” a site. The correlations between residue conservation and various measures of the system’s stability indicate that the long-range interactions make the conservation pattern more specific to the structure, and increasingly stabilize better conserved residues. PMID:27924257

  12. Integrating next-generation sequencing and traditional tongue diagnosis to determine tongue coating microbiome

    PubMed Central

    Jiang, Bai; Liang, Xujun; Chen, Yang; Ma, Tao; Liu, Liyang; Li, Junfeng; Jiang, Rui; Chen, Ting; Zhang, Xuegong; Li, Shao

    2012-01-01

    Tongue diagnosis is a unique method in traditional Chinese medicine (TCM). This is the first investigation on the association between traditional tongue diagnosis and the tongue coating microbiome using next-generation sequencing. The study included 19 gastritis patients with a typical white-greasy or yellow-dense tongue coating corresponding to TCM Cold or Hot Syndrome respectively, as well as eight healthy volunteers. An Illumina paired-end, double-barcode 16S rRNA sequencing protocol was designed to profile the tongue-coating microbiome, from which approximately 3.7 million V6 tags for each sample were obtained. We identified 123 and 258 species-level OTUs that were enriched in patients with Cold/Hot Syndromes, respectively, representing "Cold Microbiota" and "Hot Microbiota". We further constructed the tongue microbiota-imbalanced networks associated with Cold/Hot Syndromes. The results reveal an important connection between the tongue-coating microbiome and traditional tongue diagnosis, and illustrate the potential of the tongue-coating microbiome as a novel holistic biomarker for characterizing patient subtypes. PMID:23226834

  13. Integrated genome sequence and linkage map of physic nut (Jatropha curcas L.), a biodiesel plant.

    PubMed

    Wu, Pingzhi; Zhou, Changpin; Cheng, Shifeng; Wu, Zhenying; Lu, Wenjia; Han, Jinli; Chen, Yanbo; Chen, Yan; Ni, Peixiang; Wang, Ying; Xu, Xun; Huang, Ying; Song, Chi; Wang, Zhiwen; Shi, Nan; Zhang, Xudong; Fang, Xiaohua; Yang, Qing; Jiang, Huawu; Chen, Yaping; Li, Meiru; Wang, Ying; Chen, Fan; Wang, Jun; Wu, Guojiang

    2015-03-01

    The family Euphorbiaceae includes some of the most efficient biomass accumulators. Whole genome sequencing and the development of genetic maps of these species are important components in molecular breeding and genetic improvement. Here we report the draft genome of physic nut (Jatropha curcas L.), a biodiesel plant. The assembled genome has a total length of 320.5 Mbp and contains 27,172 putative protein-coding genes. We established a linkage map containing 1208 markers and anchored the genome assembly (81.7%) to this map to produce 11 pseudochromosomes. After gene family clustering, 15,268 families were identified, of which 13,887 existed in the castor bean genome. Analysis of the genome highlighted specific expansion and contraction of a number of gene families during the evolution of this species, including the ribosome-inactivating proteins and oil biosynthesis pathway enzymes. The genomic sequence and linkage map provide a valuable resource not only for fundamental and applied research on physic nut but also for evolutionary and comparative genomics analysis, particularly in the Euphorbiaceae.

  14. Modulation of the oligomerization of myelin proteolipid protein by transmembrane helix interaction motifs.

    PubMed

    Ng, Derek P; Deber, Charles M

    2010-08-17

    Proteolipid protein (PLP) is a highly hydrophobic 276-residue integral membrane protein that constitutes more than 50% of the total protein in central nervous system myelin. Previous studies have shown that this protein exists in myelin as an oligomer rather than as a monomer, and mutations in PLP that lead to neurological disorders such as Pelizaeus-Merzbacher disease and spastic paraplegia type 2 have been reported to affect its normal oligomerization. Here we employ peptide-based and in vivo approaches to examine the role of the TM domain in the formation of PLP quaternary structure through homo-oligomeric helix-helix interactions. Focusing on the TM4 alpha-helix (sequence (239)FIAAFVGAAATLVSLLTFMIAATY(262)), the site of several disease-causing point mutations that involve putative small residue helix-helix interaction motifs in the TM4 sequence, we used SDS-PAGE, fluorescence resonance energy transfer, size-exclusion chromatography, and TOXCAT assays in an Escherichia coli membrane to show that the PLP TM4 helix readily assembles into varying oligomeric states. In addition, through targeted studies of the PLP TM4 alpha-helix with point mutations that selectively eliminate these small residue motifs via substitution of Gly, Ala, or Ser residues with Ile residues, we describe a potential mechanism through which disease-causing point mutations can lead to aberrant PLP assembly. The overall results suggest that TM segments in misfolded PLP monomers that expose and/or create surface-exposed helix-helix interaction sites that are normally masked may have consequences for disease.

  15. Limits on movement integration in children: The concatenation of trained subsequences into composite sequences as a specific experience-triggered skill.

    PubMed

    Ashtamker, Lilach; Karni, Avi

    2015-09-01

    Complex movement sequences may be easier to acquire in sub-segments. Nevertheless, the neuro-behavioral constraints on assembling short multi-element movement segments, acquired piecemeal and serially, into larger, composite units of action, are not clear. Here we examined the ability of children to combine movement subsequences into longer, composite, sequences. Eleven-year-olds were trained in the performance of two, 3-elements, finger-to-thumb opposition movement sequences and were tested, overnight, in the performance of composite, 6-elements, sequences. Two experiments were compared, differing only in whether or not a brief test for integration into a composite sequence was afforded immediately post-training. This composite sequence (Full) was a direct forward integration of the two subsequences, maintaining the order in which the two subsequences were trained. In both experiments, overnight performance of movement elements within the composite sequences was better than naive performance, but slower and less accurate compared to the performance of the identical movement elements in the context of the trained subsequences. Integration was as effective in the Full sequence as when the order between subsequences was switched (Reversed). However, the early test for subsequence integration was critical in inducing clear between-session ('offline') gains, as expressed in overnight performance, in both the Full and Reversed sequences. Without this brief experience in integration, no overnight gains were expressed in any of the 6-elements sequences. Moreover, the immediate post-training test resulted in a relative advantage of the Full and Reversed sequences over a 6-element sequence in which the order of the elements was mirror-reversed within each subsequence. Thus, training on subsequences may not spontaneously lead to an advantage in the performance of composite sequences, in children. However, an early brief experience with a composite sequence can suffice to

  16. H-NS Facilitates Sequence Diversification of Horizontally Transferred DNAs during Their Integration in Host Chromosomes

    PubMed Central

    Higashi, Koichi; Tobe, Toru; Kanai, Akinori; Uyar, Ebru; Ishikawa, Shu; Suzuki, Yutaka; Ogasawara, Naotake; Kurokawa, Ken; Oshima, Taku

    2016-01-01

    Bacteria can acquire new traits through horizontal gene transfer. Inappropriate expression of transferred genes, however, can disrupt the physiology of the host bacteria. To reduce this risk, Escherichia coli expresses the nucleoid-associated protein, H-NS, which preferentially binds to horizontally transferred genes to control their expression. Once expression is optimized, the horizontally transferred genes may actually contribute to E. coli survival in new habitats. Therefore, we investigated whether and how H-NS contributes to this optimization process. A comparison of H-NS binding profiles on common chromosomal segments of three E. coli strains belonging to different phylogenetic groups indicated that the positions of H-NS-bound regions have been conserved in E. coli strains. The sequences of the H-NS-bound regions appear to have diverged more so than H-NS-unbound regions only when H-NS-bound regions are located upstream or in coding regions of genes. Because these regions generally contain regulatory elements for gene expression, sequence divergence in these regions may be associated with alteration of gene expression. Indeed, nucleotide substitutions in H-NS-bound regions of the ybdO promoter and coding regions have diversified the potential for H-NS-independent negative regulation among E. coli strains. The ybdO expression in these strains was still negatively regulated by H-NS, which reduced the effect of H-NS-independent regulation under normal growth conditions. Hence, we propose that, during E. coli evolution, the conservation of H-NS binding sites resulted in the diversification of the regulation of horizontally transferred genes, which may have facilitated E. coli adaptation to new ecological niches. PMID:26789284

  17. Integrated genomic sequencing reveals mutational landscape of T-cell prolymphocytic leukemia.

    PubMed

    Kiel, Mark J; Velusamy, Thirunavukkarasu; Rolland, Delphine; Sahasrabuddhe, Anagh A; Chung, Fuzon; Bailey, Nathanael G; Schrader, Alexandra; Li, Bo; Li, Jun Z; Ozel, Ayse B; Betz, Bryan L; Miranda, Roberto N; Medeiros, L Jeffrey; Zhao, Lili; Herling, Marco; Lim, Megan S; Elenitoba-Johnson, Kojo S J

    2014-08-28

    The comprehensive genetic alterations underlying the pathogenesis of T-cell prolymphocytic leukemia (T-PLL) are unknown. To address this, we performed whole-genome sequencing (WGS), whole-exome sequencing (WES), high-resolution copy-number analysis, and Sanger resequencing of a large cohort of T-PLL. WGS and WES identified novel mutations in recurrently altered genes not previously implicated in T-PLL including EZH2, FBXW10, and CHEK2. Strikingly, WGS and/or WES showed largely mutually exclusive mutations affecting IL2RG, JAK1, JAK3, or STAT5B in 38 of 50 T-PLL genomes (76.0%). Notably, gain-of-function IL2RG mutations are novel and have not been reported in any form of cancer. Further, high-frequency mutations in STAT5B have not been previously reported in T-PLL. Functionally, IL2RG-JAK1-JAK3-STAT5B mutations led to signal transducer and activator of transcription 5 (STAT5) hyperactivation, transformed Ba/F3 cells resulting in cytokine-independent growth, and/or enhanced colony formation in Jurkat T cells. Importantly, primary T-PLL cells exhibited constitutive activation of STAT5, and targeted pharmacologic inhibition of STAT5 with pimozide induced apoptosis in primary T-PLL cells. These results for the first time provide a portrait of the mutational landscape of T-PLL and implicate deregulation of DNA repair and epigenetic modulators as well as high-frequency mutational activation of the IL2RG-JAK1-JAK3-STAT5B axis in the pathogenesis of T-PLL. These findings offer opportunities for novel targeted therapies in this aggressive leukemia.

  18. Distribution of hammerhead and hammerhead-like RNA motifs through the GenBank.

    PubMed

    Ferbeyre, G; Bourdeau, V; Pageau, M; Miramontes, P; Cedergren, R

    2000-07-01

    Hammerhead ribozymes previously were found in satellite RNAs from plant viroids and in repetitive DNA from certain species of newts and schistosomes. To determine if this catalytic RNA motif has a wider distribution, we decided to scrutinize the GenBank database for RNAs that contain hammerhead or hammerhead-like motifs. The search shows a widespread distribution of this kind of RNA motif in different sequences suggesting that they might have a more general role in RNA biology. The frequency of the hammerhead motif is half of that expected from a random distribution, but this fact comes from the low CpG representation in vertebrate sequences and the bias of the GenBank for those sequences. Intriguing motifs include those found in several families of repetitive sequences, in the satellite RNA from the carrot red leaf luteovirus, in plant viruses like the spinach latent virus and the elm mottle virus, in animal viruses like the hepatitis E virus and the caprine encephalitis virus, and in mRNAs such as those coding for cytochrome P450 oxidoreductase in the rat and the hamster.

  19. qPMS9: An Efficient Algorithm for Quorum Planted Motif Search

    NASA Astrophysics Data System (ADS)

    Nicolae, Marius; Rajasekaran, Sanguthevar

    2015-01-01

    Discovering patterns in biological sequences is a crucial problem. For example, the identification of patterns in DNA sequences has resulted in the determination of open reading frames, identification of gene promoter elements, intron/exon splicing sites, and SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have led to domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, discovery of short functional motifs, etc. In this paper we focus on the identification of an important class of patterns, namely, motifs. We study the (l, d) motif search problem or Planted Motif Search (PMS). PMS receives as input n strings and two integers l and d. It returns all sequences M of length l that occur in each input string, where each occurrence differs from M in at most d positions. Another formulation is quorum PMS (qPMS), where the motif appears in at least q% of the strings. We introduce qPMS9, a parallel exact qPMS algorithm that offers significant runtime improvements on DNA and protein datasets. qPMS9 solves the challenging DNA (l, d)-instances (28, 12) and (30, 13). The source code is available at https://code.google.com/p/qpms9/.
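
    The (l, d) problem statement above can be made concrete with a naive enumeration. The following is only a brute-force sketch of the problem definition, not the qPMS9 algorithm; the function names and the default DNA alphabet are illustrative assumptions, and the approach is exponential in l, so it is usable only for very small instances.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, s, d):
    """True if some length-l window of s differs from motif in at most d positions."""
    l = len(motif)
    return any(hamming(motif, s[i:i + l]) <= d for i in range(len(s) - l + 1))


def quorum_planted_motifs(strings, l, d, q=100, alphabet="ACGT"):
    """Brute-force qPMS: every length-l string over the alphabet that occurs
    (with at most d mismatches) in at least q% of the input strings.
    q=100 gives the plain PMS formulation. Hypothetical helper, not qPMS9."""
    found = []
    for cand in product(alphabet, repeat=l):
        motif = "".join(cand)
        hits = sum(occurs_within(motif, s, d) for s in strings)
        # Integer comparison avoids floating-point rounding of the quorum.
        if hits * 100 >= q * len(strings):
            found.append(motif)
    return found
```

    For example, quorum_planted_motifs(["ACGTAC", "TTACGA", "GGACGT"], 3, 0) keeps only the 3-mers shared exactly by all three strings, and lowering q relaxes that requirement to a fraction of the strings.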

  20. S/MAR sequence confers long-term mitotic stability on non-integrating lentiviral vector episomes without selection.

    PubMed

    Verghese, Santhosh Chakkaramakkil; Goloviznina, Natalya A; Skinner, Amy M; Lipps, Hans J; Kurre, Peter

    2014-04-01

    Insertional oncogene activation and aberrant splicing have proved to be major setbacks for retroviral stem cell gene therapy. Integrase-deficient human immunodeficiency virus-1-derived vectors provide a potentially safer approach, but their circular genomes are rapidly lost during cell division. Here we describe a novel lentiviral vector (LV) that incorporates human β-interferon scaffold/matrix-associated region sequences to provide an origin of replication for long-term mitotic maintenance of the episomal LTR circles. The resulting 'anchoring' non-integrating lentiviral vector (aniLV) achieved initial transduction rates comparable with integrating vector followed by progressive establishment of long-term episomal expression in a subset of cells. Analysis of aniLV-transduced single cell-derived clones maintained without selective pressure for >100 rounds of cell division showed sustained transgene expression from episomes and provided molecular evidence for long-term episome maintenance. To evaluate aniLV performance in primary cells, we transduced lineage-depleted murine hematopoietic progenitor cells, observing GFP expression in clonogenic progenitor colonies and peripheral blood leukocyte chimerism following transplantation into conditioned hosts. In aggregate, our studies suggest that scaffold/matrix-associated region elements can serve as molecular anchors for non-integrating lentivector episomes, providing sustained gene expression through successive rounds of cell division and progenitor differentiation in vitro and in vivo.

  1. Evolutionary time-scale of the begomoviruses: evidence from integrated sequences in the Nicotiana genome.

    PubMed

    Lefeuvre, Pierre; Harkins, Gordon W; Lett, Jean-Michel; Briddon, Rob W; Chase, Mark W; Moury, Benoit; Martin, Darren P

    2011-01-01

    Despite having single stranded DNA genomes that are replicated by host DNA polymerases, viruses in the family Geminiviridae are apparently evolving as rapidly as some RNA viruses. The observed substitution rates of geminiviruses in the genera Begomovirus and Mastrevirus are so high that the entire family could conceivably have originated less than a million years ago (MYA). However, the existence of geminivirus related DNA (GRD) integrated within the genomes of various Nicotiana species suggests that the geminiviruses probably originated >10 MYA. Some have even suggested that a distinct New-World (NW) lineage of begomoviruses may have arisen following the separation by continental drift of African and American proto-begomoviruses ∼110 MYA. We evaluate these various geminivirus origin hypotheses using Bayesian coalescent-based approaches to date firstly the Nicotiana GRD integration events, and then the divergence of the NW and Old-World (OW) begomoviruses. Besides rejecting the possibility of a <2 MYA OW-NW begomovirus split, we could also discount that it may have occurred concomitantly with the breakup of Gondwanaland 110 MYA. Although we could only confidently narrow the date of the split down to between 2 and 80 MYA, the most plausible (and best supported) date for the split is between 20 and 30 MYA--a time when global cooling ended the dispersal of temperate species between Asia and North America via the Beringian land bridge.

  2. [Psychopathological study of lie motif in schizophrenia].

    PubMed

    Otsuka, Koichiro; Kato, Satoshi

    2006-01-01

    The theme of a statement is called "lie motif" by the authors when schizophrenic patients say "I have lied to anybody". We tried to analyse the psychopathological characteristics and anthropological meanings of the lie motifs in schizophrenia, which have not been thematically examined until now, based on 4 cases, and contrasting with the lie motif (Lügenmotiv) in depression taken up by A. Kraus (1989). We classified the lie motifs in schizophrenia into the following two types: a) the past directive lie motif: the patients speak about a real lie of theirs, regarding it as a 'petty fault' in their distant past, with feelings of guilt, b) the present directive lie motif: the patients say repeatedly 'I have lied' (about their present speech and behavior), retreating from their previous commitments. The observed false confessions of innocent fault by the patients seem to belong to the present directive lie motif. In comparison with the lie motif in depression, it is characteristic for the lie motif in schizophrenia that the patients feel themselves to already have been caught out by others before they confess the lie. The lie motif in schizophrenia seems to come into being through the attribution process of taking the others' blame on one's own shoulders, which has been pointed out to be common in the guilt experience in schizophrenia. The others' blame on this occasion is due to "the others' gaze" in the experience of the initial self-centralization (i.e. non-delusional self-referential experience) in the early stage of schizophrenia (S. Kato 1999). The others' gaze is supposed to bring to the patients the feeling of amorphous self-revelation, which could also be regarded as a guilt feeling without content. When the guilt feeling is bound with a past concrete fault, the patients tell the past directive lie motif. On the other hand, when the patients cannot find a past fixed content, and feel their present actions as uncertain and experience them as lies, the

  3. Assembly of supramolecular DNA complexes containing both G-quadruplexes and i-motifs by enhancing the G-repeat-bearing capacity of i-motifs

    PubMed Central

    Cao, Yanwei; Gao, Shang; Yan, Yuting; Bruist, Michael F.; Wang, Bing; Guo, Xinhua

    2017-01-01

    The single-step assembly of supramolecular complexes containing both i-motifs and G-quadruplexes (G4s) is demonstrated. This can be achieved because the formation of four-stranded i-motifs appears to be little affected by certain terminal residues: a five-cytosine tetrameric i-motif can bear ten-base flanking residues. However, things become complex when different lengths of guanine-repeats are added at the 3′ or 5′ ends of the cytosine-repeats. Here, a series of oligomers d(XGiXC5X) and d(XC5XGiX) (X = A, T or none; i < 5) are designed to study the impact of G-repeats on the formation of tetrameric i-motifs. Our data demonstrate that tetramolecular i-motif structure can tolerate specific flanking G-repeats. Assemblies of these oligonucleotides are polymorphic, but may be controlled by solution pH and counter ion species. Importantly, we find that the sequences d(TGiAC5) can form the tetrameric i-motif in large quantities. This leads to the design of two oligonucleotides d(TG4AC7) and d(TGBrGGBrGAC7) that self-assemble to form quadruplex supramolecules under certain conditions. d(TG4AC7) forms supramolecules under acidic conditions in the presence of K+ that are mainly V-shaped or ring-like containing parallel G4s and antiparallel i-motifs. d(TGBrGGBrGAC7) forms long linear quadruplex wires under acidic conditions in the presence of Na+ that consist of both antiparallel G4s and i-motifs. PMID:27899568

  4. RNA Bricks—a database of RNA 3D motifs and their interactions

    PubMed Central

    Chojnowski, Grzegorz; Waleń, Tomasz; Bujnicki, Janusz M.

    2014-01-01

    The RNA Bricks database (http://iimcb.genesilico.pl/rnabricks) stores information about recurrent RNA 3D motifs and their interactions, found in experimentally determined RNA structures and in RNA–protein complexes. In contrast to other similar tools (RNA 3D Motif Atlas, RNA Frabase, Rloom), RNA motifs, i.e. ‘RNA bricks’, are presented in the molecular environment in which they were determined, including RNA, protein, metal ions, water molecules and ligands. All nucleotide residues in RNA bricks are annotated with structural quality scores that describe real-space correlation coefficients with the electron density data (if available), backbone geometry and possible steric conflicts, which can be used to identify poorly modeled residues. The database is also equipped with an algorithm for 3D motif search and comparison. The algorithm compares spatial positions of backbone atoms of the user-provided query structure and of stored RNA motifs, without relying on sequence or secondary structure information. This enables the identification of local structural similarities among evolutionarily related and unrelated RNA molecules. In addition, the search utility enables searching ‘RNA bricks’ according to sequence similarity, and makes it possible to identify motifs with modified ribonucleotide residues at specific positions. PMID:24220091

  5. Integrated exome and transcriptome sequencing reveals ZAK isoform usage in gastric cancer

    PubMed Central

    Liu, Jinfeng; McCleland, Mark; Stawiski, Eric W.; Gnad, Florian; Mayba, Oleg; Haverty, Peter M.; Durinck, Steffen; Chen, Ying-Jiun; Klijn, Christiaan; Jhunjhunwala, Suchit; Lawrence, Michael; Liu, Hanbin; Wan, Yinan; Chopra, Vivek; Yaylaoglu, Murat B.; Yuan, Wenlin; Ha, Connie; Gilbert, Houston N.; Reeder, Jens; Pau, Gregoire; Stinson, Jeremy; Stern, Howard M.; Manning, Gerard; Wu, Thomas D.; Neve, Richard M.; de Sauvage, Frederic J.; Modrusan, Zora; Seshagiri, Somasekar; Firestein, Ron; Zhang, Zemin

    2014-01-01

    Gastric cancer is the second leading cause of worldwide cancer mortality, yet the underlying genomic alterations remain poorly understood. Here we perform exome and transcriptome sequencing and SNP array assays to characterize 51 primary gastric tumours and 32 cell lines. Meta-analysis of exome data and previously published data sets reveals 24 significantly mutated genes in microsatellite stable (MSS) tumours and 16 in microsatellite instable (MSI) tumours. Over half the patients in our collection could potentially benefit from targeted therapies. We identify 55 splice site mutations accompanied by aberrant splicing products, in addition to mutation-independent differential isoform usage in tumours. ZAK kinase isoform TV1 is preferentially upregulated in gastric tumours and cell lines relative to normal samples. This pattern is also observed in colorectal, bladder and breast cancers. Overexpression of this particular isoform activates multiple cancer-related transcription factor reporters, while depletion of ZAK in gastric cell lines inhibits proliferation. These results reveal the spectrum of genomic and transcriptomic alterations in gastric cancer, and identify isoform-specific oncogenic properties of ZAK. PMID:24807215

  6. Dipeptide frequency/bias analysis identifies conserved sites of nonrandomness shared by cysteine-rich motifs.

    PubMed

    Campion, S R; Ameen, A S; Lai, L; King, J M; Munzenmaier, T N

    2001-08-15

    This report describes the application of a simple computational tool, AAPAIR.TAB, for the systematic analysis of the cysteine-rich EGF, Sushi, and Laminin motif/sequence families at the two-amino acid level. Automated dipeptide frequency/bias analysis detects preferences in the distribution of amino acids in established protein families, by determining which "ordered dipeptides" occur most frequently in comprehensive motif-specific sequence data sets. Graphic display of the dipeptide frequency/bias data revealed family-specific preferences for certain dipeptides, but more importantly detected a shared preference for employment of the ordered dipeptides Gly-Tyr (GY) and Gly-Phe (GF) in all three protein families. The dipeptide Asn-Gly (NG) also exhibited high-frequency and bias in the EGF and Sushi motif families, whereas Asn-Thr (NT) was distinguished in the Laminin family. Evaluation of the distribution of dipeptides identified by frequency/bias analysis subsequently revealed the highly restricted localization of the G(F/Y) and N(G/T) sequence elements at two separate sites of extreme conservation in the consensus sequence of all three sequence families. The similar employment of the high-frequency/bias dipeptides in three distinct protein sequence families was further correlated with the concurrence of these shared molecular determinants at similar positions within the distinctive scaffolds of three structurally divergent, but similarly employed, motif modules.
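
    The kind of ordered-dipeptide tally described above can be sketched in a few lines. This is an illustration only, not AAPAIR.TAB: the abstract does not define "bias", so the ratio of observed dipeptide frequency to the product of the two single-residue frequencies is an assumption made here, and the function name is hypothetical.

```python
from collections import Counter


def dipeptide_stats(sequences):
    """Return {dipeptide: (count, frequency, bias)} for ordered residue pairs.

    Assumed definition of bias (not taken from the source): observed
    dipeptide frequency divided by the frequency expected if the two
    residues occurred independently.
    """
    mono = Counter()  # single-residue counts
    di = Counter()    # ordered, overlapping dipeptide counts
    for seq in sequences:
        mono.update(seq)
        di.update(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    stats = {}
    for pair, count in di.items():
        freq = count / n_di
        expected = (mono[pair[0]] / n_mono) * (mono[pair[1]] / n_mono)
        stats[pair] = (count, freq, freq / expected)
    return stats
```

    A bias above 1 under this definition means the ordered pair occurs more often than its residue composition alone would suggest, which is the sort of signal reported for Gly-Tyr and Gly-Phe.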

  7. A motif rich in charged residues determines product specificity in isomaltulose synthase.

    PubMed

    Zhang, Daohai; Li, Nan; Swaminathan, Kunchithapadam; Zhang, Lian Hui

    2003-01-16

    Isomaltulose synthase (PalI) catalyzes hydrolysis of sucrose and formation of alpha-1,6 and alpha-1,1 bonds to produce isomaltulose (alpha-D-glucosylpyranosyl-1,6-D-fructofuranose) and a small amount of trehalulose (alpha-D-glucosylpyranosyl-1,1-D-fructofuranose). A potential isomaltulose synthase-specific motif ((325)RLDRD(329)) that contains a 'DxD' motif conserved in many glycosyltransferases was identified based on sequence comparison with reference to the secondary structural features of PalI and homologs. Site-directed mutagenesis analysis of the motif showed that the four charged amino acid residues (Arg(325), Arg(328), Asp(327) and Asp(329)) influence the enzyme kinetics and determine the product specificity. Mutation of these four residues increased trehalulose formation by 17-61% and decreased isomaltulose by 26-67%. We conclude that the 'RLDRD' motif controls the product specificity of PalI.

  8. Genomic analysis of membrane protein families: abundance and conserved motifs

    PubMed Central

    Liu, Yang; Engelman, Donald M; Gerstein, Mark

    2002-01-01

    Background: Polytopic membrane proteins can be related to each other on the basis of the number of transmembrane helices and sequence similarities. Building on the Pfam classification of protein domain families, and using transmembrane-helix prediction and sequence-similarity searching, we identified a total of 526 well-characterized membrane protein families in 26 recently sequenced genomes. To this we added a clustering of a number of predicted but unclassified membrane proteins, resulting in a total of 637 membrane protein families. Results: Analysis of the occurrence and composition of these families revealed several interesting trends. The number of assigned membrane protein domains has an approximately linear relationship to the total number of open reading frames (ORFs) in 26 genomes studied. Caenorhabditis elegans is an apparent outlier, because of its high representation of seven-span transmembrane (7-TM) chemoreceptor families. In all genomes, including that of C. elegans, the number of distinct membrane protein families has a logarithmic relation to the number of ORFs. Glycine, proline, and tyrosine locations tend to be conserved in transmembrane regions within families, whereas isoleucine, valine, and methionine locations are relatively mutable. Analysis of motifs in putative transmembrane helices reveals that GxxxG and GxxxxxxG (which can be written GG4 and GG7, respectively; see Materials and methods) are among the most prevalent. This was noted in earlier studies; we now find these motifs are particularly well conserved in families, however, especially those corresponding to transporters, symporters, and channels. Conclusions: We carried out a genome-wide analysis on patterns of the classified polytopic membrane protein families and analyzed the distribution of conserved amino acids and motifs in the transmembrane helix regions in these families. PMID:12372142
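
    The GG4/GG7 shorthand above (two residues a fixed number of positions apart) lends itself to a simple counting sketch. The function below is a hypothetical helper, not the authors' pipeline; it assumes the helix is given as a one-letter amino acid string and counts overlapping occurrences.

```python
def count_pair_motif(helix, first, second, spacing):
    """Count positions i where helix[i] == first and helix[i + spacing] == second.

    With first = second = "G", spacing=4 corresponds to GxxxG (GG4) and
    spacing=7 to GxxxxxxG (GG7). Overlapping occurrences are all counted.
    """
    return sum(
        1
        for i in range(len(helix) - spacing)
        if helix[i] == first and helix[i + spacing] == second
    )
```

    Counting pairs by index rather than with a regular expression keeps overlapping motifs such as GxxxGxxxG from being undercounted.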

  9. An update on cell surface proteins containing extensin-motifs.

    PubMed

    Borassi, Cecilia; Sede, Ana R; Mecchia, Martin A; Salgado Salter, Juan D; Marzol, Eliana; Muschietti, Jorge P; Estevez, Jose M

    2016-01-01

    In recent years it has become clear that there are several molecular links that interconnect the plant cell surface continuum, which is highly important in many biological processes such as plant growth, development, and interaction with the environment. The plant cell surface continuum can be defined as the space that contains and interlinks the cell wall, plasma membrane and cytoskeleton compartments. In this review, we provide an updated view of cell surface proteins that include modular domains with an extensin (EXT)-motif followed by a cytoplasmic kinase-like domain, known as PERKs (for proline-rich extensin-like receptor kinases); with an EXT-motif and an actin binding domain, known as formins; and with extracellular hybrid-EXTs. We focus our attention on the EXT-motifs with the short sequence Ser-Pro(3-5), which is found in several different protein contexts within the same extracellular space, highlighting a putative conserved structural and functional role. A closer understanding of the dynamic regulation of plant cell surface continuum and its relationship with the downstream signalling cascade is a crucial forthcoming challenge.
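
    The short Ser-Pro(3-5) sequence mentioned above can be located with a regular expression. This is a minimal sketch under one stated assumption: a serine followed by more than five prolines is not reported, which the abstract does not specify. The function name is illustrative.

```python
import re

# Ser followed by 3 to 5 Pro, not followed by a further Pro (assumed cutoff).
EXT_MOTIF = re.compile(r"SP{3,5}(?!P)")


def find_ext_motifs(protein):
    """Return (start_index, matched_text) for each Ser-Pro(3-5) run."""
    return [(m.start(), m.group()) for m in EXT_MOTIF.finditer(protein.upper())]
```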

  10. A novel swarm intelligence algorithm for finding DNA motifs.

    PubMed

    Lei, Chengwei; Ruan, Jianhua

    2009-01-01

    Discovering DNA motifs from co-expressed or co-regulated genes is an important step towards deciphering complex gene regulatory networks and understanding gene functions. Despite significant improvement in the last decade, it still remains one of the most challenging problems in computational molecular biology. In this work, we propose a novel motif finding algorithm that finds consensus patterns using a population-based stochastic optimisation technique called Particle Swarm Optimisation (PSO), which has been shown to be effective in optimising difficult multidimensional problems in continuous domains. We propose to use a word dissimilarity graph to remap the neighborhood structure of the solution space of DNA motifs, and propose a modification of the naive PSO algorithm to accommodate discrete variables. In order to improve efficiency, we also propose several strategies for escaping from local optima and for automatically determining the termination criteria. Experimental results on simulated challenge problems show that our method is both more efficient and more accurate than several existing algorithms. Applications to several sets of real promoter sequences also show that our approach is able to detect known transcription factor binding sites, and outperforms two of the most popular existing algorithms.

  11. The sorting sequence of the peroxisomal integral membrane protein PMP47 is contained within a short hydrophilic loop

    PubMed Central

    1996-01-01

    No targeting sequence for peroxisomal integral membrane proteins has yet been identified. We have previously shown that a region of 67 amino acids is necessary to target Pmp47, a protein that spans the membrane six times, to peroxisomes. This region comprises two membrane spans and the intervening loop. We now demonstrate that the 20 amino acid loop, which is predicted to face the matrix, is both necessary and sufficient for peroxisomal targeting. Sufficiency was demonstrated with both chloramphenicol acetyltransferase and green fluorescent protein as carriers. There is a cluster of basic amino acids in the middle of the loop that we predict protrudes from the membrane surface into the matrix by a flanking stem structure. We show that the targeting signal is composed of this basic cluster and a block of amino acids immediately down-stream from it. PMID:8609161

  12. Sliding mode control of dissolved oxygen in an integrated nitrogen removal process in a sequencing batch reactor (SBR).

    PubMed

    Muñoz, C; Young, H; Antileo, C; Bornhardt, C

    2009-01-01

    This paper presents a sliding mode controller (SMC) for dissolved oxygen (DO) in an integrated nitrogen removal process carried out in a suspended biomass sequencing batch reactor (SBR). The SMC performance was compared against an auto-tuning PI controller with parameters adjusted at the beginning of the batch cycle. A method for cancelling the slow DO sensor dynamics was implemented by using a first order model of the sensor. Tests in a lab-scale reactor showed that the SMC offers a better disturbance rejection capability than the auto-tuning PI controller, furthermore providing reasonable performance in a wide range of operation. Thus, SMC becomes an effective robust nonlinear tool to the DO control in this process, being also simple from a computational point of view, allowing its implementation in devices such as industrial programmable logic controllers (PLCs).

  13. Integrated profiling of microRNA expression in membranous nephropathy using high-throughput sequencing technology.

    PubMed

    Chen, Wenbiao; Lin, Xiaocong; Huang, Jianrong; Tan, Kuibi; Chen, Yuyu; Peng, Wujian; Li, Wuxian; Dai, Yong

    2014-01-01

    The present study analyzed microRNA (miRNA) expression profiles in peripheral blood lymphocyte cells (PBLCs) from patients with membranous nephropathy (MN) and normal controls (NC), in an effort to improve the understanding of the pathogenesis of MN. High-throughput sequencing was performed on 30 MN patients and 30 healthy individuals (NC group). Known and novel miRNAs were analyzed and the results were confirmed by quantitative reverse transcription PCR (qRT-PCR). In total, 326 miRNAs showed a significant difference in expression between the MN and NC groups. This included 286 downregulated miRNAs and 40 upregulated miRNAs. In addition, there were 6 novel miRNAs that presented differential levels of expression between the MN and NC groups. The miRNAs were mapped to the genome, using a short oligonucleotide alignment program (SOAP), to analyze their expression and distribution. Twenty-five percent of the unique miRNAs in the MN group and 52.1% in the NC group were mapped to the genome. One hundred and eight mismatches were identified. Seventy-seven mismatches were detected in a higher proportion of the MN samples, compared with the NC samples. Twenty-five mismatches were detected in a higher proportion of the NC samples than the MN samples. Differential miRNA expression was also detected between 10 randomly selected pair groups, as depicted in a cluster analysis diagram. These data indicate that differential miRNA expression may be involved in the pathogenesis of MN. In addition, the discrepancies between the MN and NC groups, in the mismatched miRNAs that were mapped to the genome, strongly suggest that miRNAs play an important role in the pathogenesis of human disorders. miRNAs may provide a potential breakthrough in the research of MN and may provide a novel biomarker for the diagnosis and treatment of the disease.

  14. Integrated genome analysis suggests that most conserved non-coding sequences are regulatory factor binding sites

    PubMed Central

    Hemberg, Martin; Gray, Jesse M.; Cloonan, Nicole; Kuersten, Scott; Grimmond, Sean; Greenberg, Michael E.; Kreiman, Gabriel

    2012-01-01

    More than 98% of a typical vertebrate genome does not code for proteins. Although non-coding regions are sprinkled with short (<200 bp) islands of evolutionarily conserved sequences, the function of most of these unannotated conserved islands remains unknown. One possibility is that unannotated conserved islands could encode non-coding RNAs (ncRNAs); alternatively, unannotated conserved islands could serve as promoter-distal regulatory factor binding sites (RFBSs) like enhancers. Here we assess these possibilities by comparing unannotated conserved islands in the human and mouse genomes to transcribed regions and to RFBSs, relying on a detailed case study of one human and one mouse cell type. We define transcribed regions by applying a novel transcript-calling algorithm to RNA-Seq data obtained from total cellular RNA, and we define RFBSs using ChIP-Seq and DNAse-hypersensitivity assays. We find that unannotated conserved islands are four times more likely to coincide with RFBSs than with unannotated ncRNAs. Thousands of conserved RFBSs can be categorized as insulators based on the presence of CTCF or as enhancers based on the presence of p300/CBP and H3K4me1. While many unannotated conserved RFBSs are transcriptionally active to some extent, the transcripts produced tend to be unspliced, non-polyadenylated and expressed at levels 10 to 100-fold lower than annotated coding or ncRNAs. Extending these findings across multiple cell types and tissues, we propose that most conserved non-coding genomic DNA in vertebrate genomes corresponds to promoter-distal regulatory elements. PMID:22684627

  15. Comprehensively identifying and characterizing the missing gene sequences in human reference genome with integrated analytic approaches.

    PubMed

    Chen, Geng; Wang, Charles; Shi, Leming; Tong, Weida; Qu, Xiongfei; Chen, Jiwei; Yang, Jianmin; Shi, Caiping; Chen, Long; Zhou, Peiying; Lu, Bingxin; Shi, Tieliu

    2013-08-01

    The human reference genome is still incomplete and a number of gene sequences are missing from it. The approaches to uncover them, the reasons for their absence and their functions are less explored. Here, we comprehensively identified and characterized the missing genes of the human reference genome with RNA-Seq data from 16 different human tissues. By using a combined approach of genome-guided transcriptome reconstruction coupled with genome-wide comparison, we uncovered 3.78 and 2.37 Mb transcribed regions in the human genome assemblies of Celera and HuRef either missed from their homologous chromosomes of NCBI human reference genome build 37.2 or partially or entirely absent from the reference. We further identified a significant number of novel transcript contigs in each tissue from de novo transcriptome assembly that are unalignable to NCBI build 37.2 but can be aligned to at least one of the genomes from Celera, HuRef, chimpanzee, macaca or mouse. Our analyses indicate that the missing genes could result from genome misassembly, transposition, copy number variation, translocation and other structural variations. Moreover, our results further suggest that a large portion of these missing genes are conserved between human and other mammals, implying their important biological functions. In total, 1,233 functional protein domains were detected in these missing genes. Collectively, our study not only provides approaches for uncovering the missing genes of a genome, but also proposes potential reasons why genes are missing from the genome and highlights the importance of uncovering the missing genes of incomplete genomes.

  16. Membrane-bound fatty acid desaturases are inserted co-translationally into the ER and contain different ER retrieval motifs at their carboxy termini.

    PubMed

    McCartney, Andrew W; Dyer, John M; Dhanoa, Preetinder K; Kim, Peter K; Andrews, David W; McNew, James A; Mullen, Robert T

    2004-01-01

    Fatty acid desaturases (FADs) play a prominent role in plant lipid metabolism and are located in various subcellular compartments, including the endoplasmic reticulum (ER). To investigate the biogenesis of ER-localized membrane-bound FADs, we characterized the mechanisms responsible for insertion of Arabidopsis FAD2 and Brassica FAD3 into ER membranes and determined the molecular signals that maintain their ER residency. Using in vitro transcription/translation reactions with ER-derived microsomes, we show that both FAD2 and FAD3 are efficiently integrated into membranes by a co-translational, translocon-mediated pathway. We also demonstrate that while the C-terminus of FAD3 (-KSKIN) contains a functional prototypic dilysine ER retrieval motif, FAD2 contains a novel C-terminal aromatic amino acid-containing sequence (-YNNKL) that is both necessary and sufficient for maintaining localization in the ER. Co-expression of a membrane-bound reporter protein containing the FAD2 C-terminus with a dominant-negative mutant of ADP-ribosylation factor (Arf)1 abolished transient localization of the reporter protein in the Golgi, indicating that the FAD2 peptide signal acts as an ER retrieval motif. Mutational analysis of the FAD2 ER retrieval signal revealed a sequence-specific motif consisting of Phi-X-X-K/R/D/E-Phi-COOH, where -Phi- are large hydrophobic amino acid residues. Interestingly, this aromatic motif was present in a variety of other known and putative ER membrane proteins, including cytochrome P450 and the peroxisomal biogenesis factor Pex10p. Taken together, these data describe the insertion and retrieval mechanisms of FADs and define a new ER localization signal in plants that is responsible for the retrieval of escaped membrane proteins back to the ER.
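
    The C-terminal consensus Phi-X-X-K/R/D/E-Phi-COOH reported above can be expressed as an anchored pattern. The abstract describes Phi only as "large hydrophobic amino acid residues", so the residue set used below (F, W, Y, L, I, M, V) is an assumption for illustration, as is the function name.

```python
import re

# Assumed membership of "Phi" (large hydrophobic residues); not given in the source.
PHI = "FWYLIMV"

# Phi-X-X-[K/R/D/E]-Phi anchored at the very end of the sequence (the C-terminus).
ER_RETRIEVAL = re.compile(rf"[{PHI}][A-Z][A-Z][KRDE][{PHI}]\Z")


def has_er_retrieval_motif(protein):
    """True if the last five residues fit the Phi-X-X-K/R/D/E-Phi pattern."""
    return ER_RETRIEVAL.search(protein.upper()) is not None
```

    Under this assumed residue set, a sequence ending in -YNNKL (the FAD2 C-terminus) fits the pattern, whereas one ending in -KSKIN (the FAD3 dilysine signal) does not, consistent with the two signals being distinct.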

  17. Functional implications of local DNA structures in regulatory motifs.

    PubMed

    Xiang, Qian

    2013-01-01

    The three-dimensional structure of DNA has been proposed to be a major determinant of functional interactions between transcription factors (TFs) and DNA. Here, we use hydroxyl radical cleavage pattern as a measure of local DNA structure. We compared the conservation between DNA sequence and structure in terms of information content and attempted to assess the functional implications of DNA structures in regulatory motifs. We used statistical methods to evaluate the structural divergence of substituting a single position within a binding site and applied them to a collection of putative regulatory motifs. The following are our major observations: (i) we observed more information in structural alignment than in the corresponding sequence alignment for most of the transcription factors; (ii) for each TF, the majority of positions have more information in the structural alignment as compared to the sequence alignment; (iii) we further defined a DNA structural divergence score (SD score) for each wild-type and mutant pair that is distinguished by single-base mutation. The SD score for benign mutations is significantly lower than that of switch mutations. This indicates that structural conservation is also important for a TFBS to be functional and that DNA structures will provide previously unappreciated information for TFs to realize their binding specificity.
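
    For the sequence side of the comparison above, per-position information content is commonly computed from an alignment of binding sites as 2 bits minus the Shannon entropy of the column. The sketch below assumes that standard definition (without small-sample correction) and equal-length, gap-free DNA sites; the abstract does not state the exact formula used, and the structural (hydroxyl radical cleavage) counterpart is not reproduced here.

```python
import math
from collections import Counter


def column_information(sites):
    """Information content (bits) of each column of aligned DNA sites.

    Assumed formula: 2 - H(column), where H is the Shannon entropy of the
    observed base frequencies; 2 bits is the maximum for a 4-letter alphabet.
    """
    width = len(sites[0])
    info = []
    for j in range(width):
        counts = Counter(site[j] for site in sites)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        info.append(2.0 - entropy)
    return info
```

    A fully conserved column scores 2 bits and a column with all four bases equally represented scores 0, so summing the list gives a single conservation figure for the motif.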

  18. Discovering interacting domains and motifs in protein-protein interactions.

    PubMed

    Hugo, Willy; Sung, Wing-Kin; Ng, See-Kiong

    2013-01-01

    Many important biological processes, such as the signaling pathways, require protein-protein interactions (PPIs) that are designed for fast response to stimuli. These interactions are usually transient, easily formed, and disrupted, yet specific. Many of these transient interactions involve the binding of a protein domain to a short stretch (3-10) of amino acid residues, which can be characterized by a sequence pattern, i.e., a short linear motif (SLiM). We call these interacting domains and motifs domain-SLiM interactions. Existing methods have focused on discovering SLiMs in the interacting proteins' sequence data. With the recent increase in protein structures, we have a new opportunity to detect SLiMs directly from the proteins' 3D structures instead of their linear sequences. In this chapter, we describe a computational method called SLiMDIet to directly detect SLiMs on domain interfaces extracted from 3D structures of PPIs. SLiMDIet comprises two steps: (1) interaction interfaces belonging to the same domain are extracted and grouped together using structural clustering and (2) the extracted interaction interfaces in each cluster are structurally aligned to extract the corresponding SLiM. Using SLiMDIet, de novo SLiMs interacting with protein domains can be computationally detected from structurally clustered domain-SLiM interactions for PFAM domains which have available 3D structures in the PDB database.

  19. SA-Mot: a web server for the identification of motifs of interest extracted from protein loops.

    PubMed

    Regad, Leslie; Saladin, Adrien; Maupetit, Julien; Geneix, Colette; Camproux, Anne-Claude

    2011-07-01

    The detection of functional motifs is an important step for the determination of protein functions. We present here a new web server SA-Mot (Structural Alphabet Motif) for the extraction and location of structural motifs of interest from protein loops. Contrary to other methods, SA-Mot does not focus only on functional motifs, but it extracts recurrent and conserved structural motifs involved in structural redundancy of loops. SA-Mot uses the structural word notion to extract all structural motifs from uni-dimensional sequences corresponding to loop structures. Then, SA-Mot provides a description of these structural motifs using statistics computed in the loop data set and in SCOP superfamily, sequence and structural parameters. SA-Mot results correspond to an interactive table listing all structural motifs extracted from a target structure and their associated descriptors. Using this information, the users can easily locate loop regions that are important for the protein folding and function. The SA-Mot web server is available at http://sa-mot.mti.univ-paris-diderot.fr.

  20. Identification of a putative nuclear export signal motif in human NANOG homeobox domain

    SciTech Connect

    Park, Sung-Won; Do, Hyun-Jin; Huh, Sun-Hyung; Sung, Boreum; Uhm, Sang-Jun; Song, Hyuk; Kim, Nam-Hyung; Kim, Jae-Hwan

    2012-05-11

    Highlights: ► We found the putative nuclear export signal motif within human NANOG homeodomain. ► Leucine-rich residues are important for human NANOG homeodomain nuclear export. ► CRM1-specific inhibitor LMB blocked the potent human NANOG NES-mediated nuclear export. -- Abstract: NANOG is a homeobox-containing transcription factor that plays an important role in pluripotent stem cells and tumorigenic cells. To understand how nuclear localization of human NANOG is regulated, the NANOG sequence was examined and a leucine-rich nuclear export signal (NES) motif (MQELSNILNL, residues 125-134) was found in the homeodomain (HD). To functionally validate the putative NES motif, deletion and site-directed mutants were fused to an EGFP expression vector and transfected into COS-7 cells, and the localization of the proteins was examined. While hNANOG HD exclusively localized to the nucleus, a mutant with both NLSs deleted and only the putative NES motif contained (hNANOG HD-ΔNLSs) was predominantly cytoplasmic, as observed by nucleo/cytoplasmic fractionation and Western blot analysis as well as confocal microscopy. Furthermore, site-directed mutagenesis of the putative NES motif in a partial hNANOG HD only containing either one of the two NLS motifs led to localization in the nucleus, suggesting that the NES motif may play a functional role in nuclear export. Furthermore, CRM1-specific nuclear export inhibitor LMB blocked the hNANOG potent NES-mediated export, suggesting that the leucine-rich motif may function in CRM1-mediated nuclear export of hNANOG. Collectively, a NES motif is present in the hNANOG HD and may be functionally involved in CRM1-mediated nuclear export pathway.

  1. Dispom: a discriminative de-novo motif discovery tool based on the jstacs library.

    PubMed

    Grau, Jan; Keilwagen, Jens; Gohr, André; Paponov, Ivan A; Posch, Stefan; Seifert, Michael; Strickert, Marc; Grosse, Ivo

    2013-02-01

    DNA-binding proteins are a main component of gene regulation as they activate or repress gene expression by binding to specific binding sites in target regions of genomic DNA. However, de-novo discovery of these binding sites in target regions obtained by wet-lab experiments is a challenging problem in computational biology, which has not yet been solved satisfactorily. Here, we present a detailed description and analysis of the de-novo motif discovery tool Dispom, which has been developed for finding binding sites of DNA-binding proteins that are differentially abundant in a set of target regions compared to a set of control regions. Two additional features of Dispom are its capability of modeling positional preferences of binding sites and adjusting the length of the motif in the learning process. Dispom yields an increased prediction accuracy compared to existing tools for de-novo motif discovery, suggesting that the combination of searching for differentially abundant motifs, inferring their positional distributions, and adjusting the motif lengths is beneficial for de-novo motif discovery. When applying Dispom to promoters of auxin-responsive genes and those of ABI3 target genes from Arabidopsis thaliana, we identify relevant binding motifs with pronounced positional distributions. These results suggest that learning motifs, their positional distributions, and their lengths by a discriminative learning principle may aid motif discovery from ChIP-chip and gene expression data. We make Dispom freely available as part of Jstacs, an open-source Java library that is tailored to statistical sequence analysis. To facilitate extensions of Dispom, we describe its implementation using Jstacs in this manuscript. In addition, we provide a stand-alone application of Dispom at http://www.jstacs.de/index.php/Dispom for instant use.

  2. Evolving DNA motifs to predict GeneChip probe performance

    PubMed Central

    Langdon, WB; Harrison, AP

    2009-01-01

    Background: Affymetrix High Density Oligonucleotide Arrays (HDONA) simultaneously measure expression of thousands of genes using millions of probes. We use correlations between measurements for the same gene across 6685 human tissue samples from NCBI's GEO database to indicate the quality of individual HG-U133A probes. Low correlation indicates a poor probe. Results: Regular expressions can be automatically created from a Backus-Naur form (BNF) context-free grammar using strongly typed genetic programming. Conclusion: The automatically produced motif is better at predicting poor DNA sequences than an existing human-generated regular expression (RE), suggesting that runs of cytosine, runs of guanine, and mixtures of the two should all be avoided. PMID:19298675
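
    The abstract does not give the evolved regular expression, so the sketch below uses a hypothetical pattern of our own (four or more consecutive C or G bases) purely to show how a regular-expression motif can flag probe sequences. The threshold of four and the function name are assumptions, not values from the paper.

```python
import re

# Hypothetical pattern, NOT the motif evolved in the paper:
# four or more consecutive bases drawn from C and G, in any mixture.
POOR_PROBE = re.compile(r"[CG]{4,}")

def looks_poor(probe):
    """Return True if the probe contains a long run of C/G bases."""
    return POOR_PROBE.search(probe.upper()) is not None

flagged = looks_poor("ATGGGGCATTA")    # contains the run GGGGC
unflagged = looks_poor("ATGCATGCAT")   # longest C/G run has length 2
```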

  3. Nucleic Acid i-Motif Structures in Analytical Chemistry.

    PubMed

    Alba, Joan Josep; Sadurní, Anna; Gargallo, Raimundo

    2016-09-02

    Under the appropriate experimental conditions of pH and temperature, cytosine-rich segments in DNA or RNA sequences may produce a characteristic folded structure known as an i-motif. Besides its potential role in vivo, which is still under investigation, this structure has attracted increasing interest in other fields due to its sharp, fast and reversible pH-driven conformational changes. This "on/off" switch at molecular level is being used in nanotechnology and analytical chemistry to develop nanomachines and sensors, respectively. This paper presents a review of the latest applications of this structure in the field of chemical analysis.

  4. Graph animals, subgraph sampling, and motif search in large networks.

    PubMed

    Baskerville, Kim; Grassberger, Peter; Paczuski, Maya

    2007-09-01

    We generalize a sampling algorithm for lattice animals (connected clusters on a regular lattice) to a Monte Carlo algorithm for "graph animals," i.e., connected subgraphs in arbitrary networks. As with the algorithm in [N. Kashtan et al., Bioinformatics 20, 1746 (2004)], it provides a weighted sample, but the computation of the weights is much faster (linear in the size of subgraphs, instead of superexponential). This allows subgraphs with up to ten or more nodes to be sampled with very high statistics, from arbitrarily large networks. Using this together with a heuristic algorithm for rapidly classifying isomorphic graphs, we present results for two protein interaction networks obtained using the tandem affinity purification (TAP) method: one of Escherichia coli with 230 nodes and 695 links, and one for yeast (Saccharomyces cerevisiae) with roughly ten times more nodes and links. We find in both cases that most connected subgraphs are strong motifs (Z scores >10) or antimotifs (Z scores <-10) when the null model is the ensemble of networks with fixed degree sequence. Strong differences appear between the two networks, with dominant motifs in E. coli being (nearly) bipartite graphs and having many pairs of nodes that connect to the same neighbors, while dominant motifs in yeast tend towards completeness or contain large cliques. We also explore a number of methods that do not rely on measurements of Z scores or comparisons with null models. For instance, we discuss the influence of specific complexes like the 26S proteasome in yeast, where a small number of complexes dominate the k cores with large k and have a decisive effect on the strongest motifs with 6-8 nodes. We also present Zipf plots of counts versus rank. They show broad distributions that are not power laws, in contrast to the case when disconnected subgraphs are included.
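
    The Z-score thresholds quoted above can be illustrated with a minimal sketch. It assumes the subgraph counts in the null ensemble have already been obtained by some other means; the numbers are invented, and neither the sampling algorithm nor the fixed-degree-sequence randomization from the paper is implemented here.

```python
import statistics

def motif_z_score(observed, null_counts):
    """Z score of an observed subgraph count against counts from a null ensemble."""
    mean = statistics.mean(null_counts)
    spread = statistics.stdev(null_counts)  # sample standard deviation
    return (observed - mean) / spread

# Invented counts, for illustration only.
z = motif_z_score(30, [10, 12, 8, 10])
label = "motif" if z > 10 else "antimotif" if z < -10 else "neither"
```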

  5. Graph animals, subgraph sampling, and motif search in large networks

    NASA Astrophysics Data System (ADS)

    Baskerville, Kim; Grassberger, Peter; Paczuski, Maya

    2007-09-01

    We generalize a sampling algorithm for lattice animals (connected clusters on a regular lattice) to a Monte Carlo algorithm for “graph animals,” i.e., connected subgraphs in arbitrary networks. As with the algorithm in [N. Kashtan et al., Bioinformatics 20, 1746 (2004)], it provides a weighted sample, but the computation of the weights is much faster (linear in the size of subgraphs, instead of superexponential). This allows subgraphs with up to ten or more nodes to be sampled with very high statistics, from arbitrarily large networks. Using this together with a heuristic algorithm for rapidly classifying isomorphic graphs, we present results for two protein interaction networks obtained using the tandem affinity purification (TAP) method: one of Escherichia coli with 230 nodes and 695 links, and one for yeast (Saccharomyces cerevisiae) with roughly ten times more nodes and links. We find in both cases that most connected subgraphs are strong motifs (Z scores >10) or antimotifs (Z scores <-10) when the null model is the ensemble of networks with fixed degree sequence. Strong differences appear between the two networks, with dominant motifs in E. coli being (nearly) bipartite graphs and having many pairs of nodes that connect to the same neighbors, while dominant motifs in yeast tend towards completeness or contain large cliques. We also explore a number of methods that do not rely on measurements of Z scores or comparisons with null models. For instance, we discuss the influence of specific complexes like the 26S proteasome in yeast, where a small number of complexes dominate the k cores with large k and have a decisive effect on the strongest motifs with 6-8 nodes. We also present Zipf plots of counts versus rank. They show broad distributions that are not power laws, in contrast to the case when disconnected subgraphs are included.

  6. Electromagnetic Field Seems to Not Influence Transcription via CTCT Motif in Three Plant Promoters

    PubMed Central

    Sztafrowski, Dariusz; Aksamit-Stachurska, Anna; Kostyn, Kamil; Mackiewicz, Paweł; Łukaszewicz, Marcin

    2017-01-01

    It was proposed that magnetic fields (MFs) can influence gene transcription via a CTCT motif located in the human HSP70 promoter. To check the universality of this mechanism, we estimated the potential role of this motif in plant gene transcription in response to MFs using both bioinformatics and experimental studies. We searched potential promoter sequences (1000 bp upstream) in the potato Solanum tuberosum and thale cress Arabidopsis thaliana genomes for the CTCT sequence. The motif was found, on average, 3.6 and 4.3 times per promoter (148,487 and 134,361 motifs in total) in these two species, respectively; however, the CTCT sequences were not randomly distributed in the promoter regions but were preferentially located near the transcription initiation site and were closely packed. In both plants, the closer these CTCT sequences were to the transcription initiation site, the smaller the distance between them. One can assume that genes with many CTCT motifs in their promoter regions can potentially be regulated by MFs. To check this assumption, we tested the influence of MFs on gene expression in a transgenic potato with three promoters (16R, 20R, and 5UGT) containing from 3 to 12 CTCT sequences and driving expression of β-glucuronidase as a reporter gene. The potatoes were exposed to a 50 Hz 60–70 A/m MF for 30 min and the reporter gene activity was measured for up to 24 h. Although other factors induced the reporter gene activity, the MF did not. This implies that the CTCT motif does not mediate the response to MFs in the tested plant promoters. PMID:28326086
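
    The promoter scan described above can be sketched as a simple motif search. The example below locates every CTCT occurrence in a sequence and reports the spacing between neighboring hits. Counting overlapping occurrences is an assumption (the abstract does not say how overlaps were handled), and the short promoter string is invented.

```python
import re

def ctct_positions(promoter, motif="CTCT"):
    """Start positions of every motif occurrence, overlapping hits included."""
    pattern = f"(?={re.escape(motif)})"  # lookahead so overlapping hits are found
    return [m.start() for m in re.finditer(pattern, promoter.upper())]

# Invented toy sequence, for illustration only.
promoter = "AACTCTCTGGACTCTA"
positions = ctct_positions(promoter)
gaps = [b - a for a, b in zip(positions, positions[1:])]
```

    Here `positions` is `[2, 4, 11]` and `gaps` is `[2, 7]`; on real promoters the same lists could be compared against distance from the transcription initiation site.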

  7. Network motif identification in stochastic networks

    NASA Astrophysics Data System (ADS)

    Jiang, Rui; Tu, Zhidong; Chen, Ting; Sun, Fengzhu

    2006-06-01

    Network motifs have been identified in a wide range of networks across many scientific disciplines and are suggested to be the basic building blocks of most complex networks. Nonetheless, many networks come with intrinsic and/or experimental uncertainties and should be treated as stochastic networks. The building blocks in these networks thus may also have stochastic properties. In this article, we study stochastic network motifs derived from families of mutually similar but not necessarily identical patterns of interconnections. We establish a finite mixture model for stochastic networks and develop an expectation-maximization algorithm for identifying stochastic network motifs. We apply this approach to the transcriptional regulatory networks of Escherichia coli and Saccharomyces cerevisiae, as well as the protein-protein interaction networks of seven species, and identify several stochastic network motifs that are consistent with current biological knowledge. Keywords: expectation-maximization algorithm | mixture model | transcriptional regulatory network | protein-protein interaction network

  8. The position of the Gly-xxx-Gly motif in transmembrane segments modulates dimer affinity.

    PubMed

    Johnson, Rachel M; Rath, Arianna; Deber, Charles M

    2006-12-01

    Although the intrinsic low solubility of membrane proteins presents challenges to their high-resolution structure determination, insight into the amino acid sequence features and forces that stabilize their folds has been provided through study of sequence-dependent helix-helix interactions between single transmembrane (TM) helices. While the stability of helix-helix partnerships mediated by the Gly-xxx-Gly (GG4) motif is known to be generally modulated by distal interfacial residues, it has not been established whether the position of this motif, with respect to the ends of a given TM segment, affects dimer affinity. Here we examine the relationship between motif position and affinity in the homodimers of 2 single-spanning membrane protein TM sequences: glycophorin A (GpA) and bacteriophage M13 coat protein (MCP). Using the TOXCAT assay for dimer affinity on a series of GpA and MCP TM segments that have been modified with either 4 Leu residues at each end or with 8 Leu residues at the N-terminal end, we show that in each protein, centrally located GG4 motifs are capable of stronger helix-helix interactions than those proximal to TM helix ends, even when surrounding interfacial residues are maintained. The relative importance of GG4 motifs in stabilizing helix-helix interactions therefore must be considered not only in its specific residue context but also in terms of the location of the interactive surface relative to the N and C termini of alpha-helical TM segments.
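
    To make the notion of motif position concrete, the sketch below finds Gly-xxx-Gly occurrences in a transmembrane segment and reports how far each motif's middle residue lies from the center of the segment. The two segments are invented poly-Leu toys, not the GpA or MCP sequences, and the offset measure is our own illustrative choice rather than a quantity defined in the paper.

```python
import re

def gxxxg_sites(segment):
    """Return (start, offset_from_segment_centre) for each G-x-x-x-G occurrence."""
    seg = segment.upper()
    centre = (len(seg) - 1) / 2
    sites = []
    for m in re.finditer(r"(?=G...G)", seg):  # lookahead keeps overlapping motifs
        motif_centre = m.start() + 2          # middle residue of the 5-residue motif
        sites.append((m.start(), abs(motif_centre - centre)))
    return sites

# Invented 17-residue segments, for illustration only.
central = gxxxg_sites("LLLLLLGAAAGLLLLLL")
terminal = gxxxg_sites("GAAAGLLLLLLLLLLLL")
```

    The centrally placed motif gives an offset of 0.0 and the N-terminal one an offset of 6.0, the kind of positional difference the study relates to dimer affinity.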