Science.gov

Sample records for acid sequence motifs

  1. An amino acid sequence motif sufficient for subnuclear localization of an arginine/serine-rich splicing factor.

    PubMed

    Hedley, M L; Amrein, H; Maniatis, T

    1995-12-05

    We have identified an amino acid sequence in the Drosophila Transformer (Tra) protein that is capable of directing a heterologous protein to nuclear speckles, regions of the nucleus previously shown to contain high concentrations of spliceosomal small nuclear RNAs and splicing factors. This sequence contains a nucleoplasmin-like bipartite nuclear localization signal (NLS) and a repeating arginine/serine (RS) dipeptide sequence adjacent to a short stretch of basic amino acids. Sequence comparisons from a number of other splicing factors that colocalize to nuclear speckles reveal the presence of one or more copies of this motif. We propose a two-step subnuclear localization mechanism for splicing factors. The first step is transport across the nuclear envelope via the nucleoplasmin-like NLS, while the second step is association with components in the speckled domain via the RS dipeptide sequence.

  2. Occurrence probability of structured motifs in random sequences.

    PubMed

    Robin, S; Daudin, J-J; Richard, H; Sagot, M-F; Schbath, S

    2002-01-01

    The problem of extracting from a set of nucleic acid sequences motifs which may have biological function is more and more important. In this paper, we are interested in particular motifs that may be implicated in the transcription process. These motifs, called structured motifs, are composed of two ordered parts separated by a variable distance and allowing for substitutions. In order to assess their statistical significance, we propose approximations of the probability of occurrences of such a structured motif in a given sequence. An application of our method to evaluate candidate promoters in E. coli and B. subtilis is presented. Simulations show the goodness of the approximations.
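    As a rough illustration of what a structured motif is (two ordered parts, a variable spacer, substitutions allowed), the sketch below checks for an occurrence and estimates its probability by brute-force simulation. This is not the authors' approximation: the abstract does not give their formulas, and the uniform i.i.d. background, the function names and all parameters here are assumptions made for the example.

```python
import random

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def has_structured_motif(seq, box1, box2, dmin, dmax, max_subs=0):
    """True if box1 and box2 occur in order, separated by dmin..dmax bases,
    each matched with at most max_subs substitutions."""
    n1, n2 = len(box1), len(box2)
    for i in range(len(seq) - n1 + 1):
        if hamming(seq[i:i + n1], box1) > max_subs:
            continue
        for d in range(dmin, dmax + 1):
            j = i + n1 + d  # start of the second box
            if j + n2 > len(seq):
                break
            if hamming(seq[j:j + n2], box2) <= max_subs:
                return True
    return False

def simulated_occurrence_probability(box1, box2, dmin, dmax, length,
                                     max_subs=0, trials=10000, seed=0):
    """Monte Carlo estimate of P(at least one occurrence) in a random
    sequence, assuming a uniform i.i.d. A/C/G/T background (hypothetical)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        seq = "".join(rng.choice("ACGT") for _ in range(length))
        hits += has_structured_motif(seq, box1, box2, dmin, dmax, max_subs)
    return hits / trials
```

    A call such as `simulated_occurrence_probability("TTGACA", "TATAAT", 16, 18, length=500, max_subs=1)` would give a simulation baseline against which an analytic approximation could be compared; the box sequences and spacer range in that call are illustrative choices, not values taken from the abstract.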

  3. Motif Yggdrasil: sampling sequence motifs from a tree mixture model.

    PubMed

    Andersson, Samuel A; Lagergren, Jens

    2007-06-01

    In phylogenetic foot-printing, putative regulatory elements are found in upstream regions of orthologous genes by searching for common motifs. Motifs in different upstream sequences are subject to mutations along the edges of the corresponding phylogenetic tree; consequently, taking advantage of the tree in the motif search is an appealing idea. We describe the Motif Yggdrasil sampler, the first Gibbs sampler based on a general tree that uses unaligned sequences. Previous tree-based Gibbs samplers have assumed a star-shaped tree or partially aligned upstream regions. We give a probabilistic model (MY model) describing upstream sequences with regulatory elements and build a Gibbs sampler with respect to this model. The model allows toggling, i.e., the restriction of a position to a subset of nucleotides, but does not require aligned sequences or edge lengths, which may be difficult to come by. We apply the collapsing technique to eliminate the need to sample nuisance parameters, and give a derivation of the predictive update formula. We show that the MY model improves the modeling of difficult motif instances and that the use of the tree achieves a substantial increase in nucleotide level correlation coefficient both for synthetic data and 37 bacterial lexA genes. We investigate the sensitivity to errors in the tree and show that, using random trees, the MY sampler still has a performance similar to the original version.

  4. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion

    PubMed Central

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-01-01

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2Logo aims at resolving these issues by allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for low number of observations, and different logotype representations, each capturing different aspects related to amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed). PMID:22638583
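    The "information content" that a sequence logo displays can be sketched as the classic Shannon measure per alignment column. The code below is a minimal illustration only, not Seq2Logo's implementation: it assumes ungapped peptides of equal length over the 20 standard residues, and the flat additive pseudocount is a hypothetical stand-in for whatever pseudo count scheme the server applies.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_information(peptides, pseudocount=0.0):
    """Return the Shannon information content (bits) of each column.

    Assumes every peptide has the same length and contains only the 20
    standard residues. `pseudocount` is added to every residue count
    (a simple flat scheme chosen for illustration).
    """
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    max_bits = math.log2(len(AMINO_ACIDS))
    info = []
    for i in range(length):
        counts = Counter(p[i] for p in peptides)
        total = len(peptides) + pseudocount * len(AMINO_ACIDS)
        entropy = 0.0
        for aa in AMINO_ACIDS:
            freq = (counts.get(aa, 0) + pseudocount) / total
            if freq > 0:
                entropy -= freq * math.log2(freq)
        # A fully conserved column reaches max_bits; a uniform one gives 0.
        info.append(max_bits - entropy)
    return info
```

    With few observations, a column can look perfectly conserved by chance; a positive `pseudocount` pulls its information content below the maximum, which is the kind of correction the abstract describes.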

  5. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion.

    PubMed

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-07-01

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2Logo aims at resolving these issues by allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for low number of observations, and different logotype representations, each capturing different aspects related to amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed).

  6. Retinoic acid-induced down-regulation of the interleukin-2 promoter via cis-regulatory sequences containing an octamer motif.

    PubMed Central

    Felli, M P; Vacca, A; Meco, D; Screpanti, I; Farina, A R; Maroder, M; Martinotti, S; Petrangeli, E; Frati, L; Gulino, A

    1991-01-01

    Retinoic acid (RA) is known to influence the proliferation and differentiation of a wide variety of transformed and developing cells. We found that RA and the specific RA receptor (RAR) ligand Ch55 inhibited the phorbol ester and calcium ionophore-induced expression of the T-cell growth factor interleukin-2 (IL-2) gene. Expression of transiently transfected chloramphenicol acetyltransferase vectors containing the 5'-flanking region of the IL-2 gene was also inhibited by RA. RA-induced down-regulation of the IL-2 enhancer is mediated by RAR, since overexpression of transfected RARs increased RA sensitivity of the IL-2 promoter. Functional analysis of chloramphenicol acetyltransferase vectors containing either internal deletion mutants of the region from -317 to +47 bp of the IL-2 enhancer or multimerized cis-regulatory elements showed that the RA-responsive element in the IL-2 promoter mapped to sequences containing an octamer motif. RAR also inhibited the transcriptional activity of the octamer motif of the immunoglobulin heavy chain enhancer. In spite of the transcriptional inhibition of the IL-2 octamer motif, RA did not decrease the in vitro DNA-binding capability of octamer-1 protein. These results identify a regulatory pathway within the IL-2 promoter which involves the octamer motif and RAR. PMID:1652063

  7. A generalized profile syntax for biomolecular sequence motifs and its function in automatic sequence interpretation

    SciTech Connect

    Bucher, P.; Bairoch, A.

    1994-12-31

    A general syntax for expressing biomolecular sequence motifs is described, which will be used in future releases of the PROSITE data bank and in a similar collection of nucleic acid sequence motifs currently under development. The central part of the syntax is a regular structure which can be viewed as a generalization of the profiles introduced by Gribskov and coworkers. Accessory features implement specific motif search strategies and provide information helpful for the interpretation of predicted matches. Two contrasting examples, representing E. coli promoters and SH3 domains respectively, are shown to demonstrate the versatility of the syntax and its compatibility with diverse motif search methods. It is argued that a comprehensive machine-readable motif collection based on the new syntax, in conjunction with a standard search program, can serve as a general-purpose sequence interpretation and function prediction tool.

  8. Finding sequence motifs in groups of functionally related proteins.

    PubMed

    Smith, H O; Annau, T M; Chandrasegaran, S

    1990-01-01

    We have developed a method for rapidly finding patterns of conserved amino acid residues (motifs) in groups of functionally related proteins. All 3-amino acid patterns in a group of proteins of the type aa1 d1 aa2 d2 aa3, where d1 and d2 are distances that can be varied in a range up to 24 residues, are accumulated into an array. Segments of the proteins containing those patterns that occur most frequently are aligned on each other by a scoring method that obtains an average relatedness value for all the amino acids in each column of the aligned sequence block based on the Dayhoff relatedness odds matrix. The automated method successfully finds and displays nearly all of the sequence motifs that have been previously reported to occur in 33 reverse transcriptases, 18 DNA integrases, and 30 DNA methyltransferases.
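    The first stage described above, accumulating all aa1 d1 aa2 d2 aa3 patterns across a group of proteins, can be sketched as follows. This is a hedged illustration, not the authors' program: the function names are hypothetical, a distance is assumed to be the offset between residue positions (adjacent residues have distance 1), and the Dayhoff-based alignment scoring step is not reproduced.

```python
from collections import defaultdict

def count_triplet_patterns(proteins, max_gap=24):
    """Map each (aa1, d1, aa2, d2, aa3) pattern to the set of indices of
    the proteins that contain it. d1 and d2 range over 1..max_gap."""
    hits = defaultdict(set)
    for idx, seq in enumerate(proteins):
        n = len(seq)
        for i in range(n):
            for d1 in range(1, max_gap + 1):
                j = i + d1
                if j >= n:
                    break
                for d2 in range(1, max_gap + 1):
                    k = j + d2
                    if k >= n:
                        break
                    hits[(seq[i], d1, seq[j], d2, seq[k])].add(idx)
    return hits

def most_common_patterns(proteins, top=5, max_gap=24):
    """Rank patterns by the number of proteins in which they occur."""
    hits = count_triplet_patterns(proteins, max_gap)
    ranked = sorted(hits.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(pattern, len(members)) for pattern, members in ranked[:top]]
```

    Counting each pattern once per protein, rather than once per occurrence, is a design choice made here so that a repeat-rich sequence cannot dominate the ranking; the abstract does not say which convention the original method uses.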

  9. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    SciTech Connect

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by patterns in

  10. Mutation of the aspartic acid residues of the GDD sequence motif of poliovirus RNA-dependent RNA polymerase results in enzymes with altered metal ion requirements for activity.

    PubMed Central

    Jablonski, S A; Morrow, C D

    1995-01-01

    The poliovirus RNA-dependent RNA polymerase, 3Dpol, is known to share a region of sequence homology with all RNA polymerases centered at the GDD amino acid motif. The two aspartic acids have been postulated to be involved in the catalytic activity and metal ion coordination of the enzyme. To test this hypothesis, we have utilized oligonucleotide site-directed mutagenesis to generate defined mutations in the aspartic acids of the GDD motif of the 3Dpol gene. The codon for the first aspartate (3D-D-328 [D refers to the single amino acid change, and the number refers to its position in the polymerase]) was changed to that for glutamic acid, histidine, asparagine, or glutamine; the codons for both aspartic acids were simultaneously changed to those for glutamic acids; and the codon for the second aspartic acid (3D-D-329) was changed to that for glutamic acid or asparagine. The mutant enzymes were expressed in Escherichia coli, and the in vitro poly(U) polymerase activity was characterized. All of the mutant 3Dpol enzymes were enzymatically inactive in vitro when tested over a range of Mg2+ concentrations. However, when Mn2+ was substituted for Mg2+ in the in vitro assays, the mutant that substituted the second aspartic acid for asparagine (3D-N-329) was active. To further substantiate this finding, a series of different transition metal ions were substituted for Mg2+ in the poly(U) polymerase assay. The wild-type enzyme was active with all metals except Ca2+, while the 3D-N-329 mutant was active only when FeC6H7O5 was used in the reaction. To determine the effects of the mutations on poliovirus replication, the mutant 3Dpol genes were subcloned into an infectious cDNA of poliovirus. The cDNAs containing the mutant 3Dpol genes did not produce infectious virus when transfected into tissue culture cells under standard conditions. Because of the activity of the 3D-N-329 mutant in the presence of Fe2+ and Mn2+, transfections were also performed in the presence of the

  11. Probabilistic models for semisupervised discriminative motif discovery in DNA sequences.

    PubMed

    Kim, Jong Kyoung; Choi, Seungjin

    2011-01-01

    Methods for discriminative motif discovery in DNA sequences identify transcription factor binding sites (TFBSs), searching only for patterns that differentiate two sets (positive and negative sets) of sequences. On one hand, discriminative methods increase the sensitivity and specificity of motif discovery, compared to generative models. On the other hand, generative models can easily exploit unlabeled sequences to better detect functional motifs when labeled training samples are limited. In this paper, we develop a hybrid generative/discriminative model which enables us to make use of unlabeled sequences in the framework of discriminative motif discovery, leading to semisupervised discriminative motif discovery. Numerical experiments on yeast ChIP-chip data for discovering DNA motifs demonstrate that the best performance is obtained between the purely generative and the purely discriminative models, and that semisupervised learning improves performance when labeled sequences are limited.

  12. iMotifs: an integrated sequence motif visualization and analysis environment

    PubMed Central

    Piipari, Matias; Down, Thomas A.; Saini, Harpreet; Enright, Anton; Hubbard, Tim J.P.

    2010-01-01

    Motivation: Short sequence motifs are an important class of models in molecular biology, used most commonly for describing transcription factor binding site specificity patterns. High-throughput methods have been recently developed for detecting regulatory factor binding sites in vivo and in vitro and consequently high-quality binding site motif data are becoming available for increasing number of organisms and regulatory factors. Development of intuitive tools for the study of sequence motifs is therefore important. iMotifs is a graphical motif analysis environment that allows visualization of annotated sequence motifs and scored motif hits in sequences. It also offers motif inference with the sensitive NestedMICA algorithm, as well as overrepresentation and pairwise motif matching capabilities. All of the analysis functionality is provided without the need to convert between file formats or learn different command line interfaces. The application includes a bundled and graphically integrated version of the NestedMICA motif inference suite that has no outside dependencies. Problems associated with local deployment of software are therefore avoided. Availability: iMotifs is licensed with the GNU Lesser General Public License v2.0 (LGPL 2.0). The software and its source is available at http://wiki.github.com/mz2/imotifs and can be run on Mac OS X Leopard (Intel/PowerPC). We also provide a cross-platform (Linux, OS X, Windows) LGPL 2.0 licensed library libxms for the Perl, Ruby, R and Objective-C programming languages for input and output of XMS formatted annotated sequence motif set files. Contact: matias.piipari@gmail.com; imotifs@googlegroups.com PMID:20106815

  13. The distribution of RNA motifs in natural sequences.

    PubMed

    Bourdeau, V; Ferbeyre, G; Pageau, M; Paquin, B; Cedergren, R

    1999-11-15

    Functional analysis of genome sequences has largely ignored RNA genes and their structures. We introduce here the notion of 'ribonomics' to describe the search for the distribution of and eventually the determination of the physiological roles of these RNA structures found in the sequence databases. The utility of this approach is illustrated here by the identification in the GenBank database of RNA motifs having known binding or chemical activity. The frequency of these motifs indicates that most have originated from evolutionary drift and are selectively neutral. On the other hand, their distribution among species and their location within genes suggest that the destiny of these motifs may be more elaborate. For example, the hammerhead motif has a skewed organismal presence, is phylogenetically stable and recent work on a schistosome version confirms its in vivo biological activity. The under-representation of the valine-binding motif and the Rev-binding element in GenBank hints at a detrimental effect on cell growth or viability. Data on the presence and the location of these motifs may provide critical guidance in the design of experiments directed towards the understanding and the manipulation of RNA complexes and activities in vivo.

  14. Chaotic motif sampler: detecting motifs from biological sequences by using chaotic neurodynamics

    NASA Astrophysics Data System (ADS)

    Matsuura, Takafumi; Ikeguchi, Tohru

    Identification of a region in biological sequences, the motif extraction problem (MEP), is a task addressed in bioinformatics. However, the MEP is an NP-hard problem. Therefore, it is almost impossible to obtain an optimal solution within a reasonable time frame. To find near-optimal solutions for NP-hard combinatorial optimization problems such as traveling salesman problems, quadratic assignment problems, and vehicle routing problems, chaotic search, which is one of the deterministic approaches, has been proposed and exhibits better performance than stochastic approaches. In this paper, we propose a new alignment method that employs chaotic dynamics to solve the MEPs. It is called the Chaotic Motif Sampler. We show that the performance of the Chaotic Motif Sampler is considerably better than that of the conventional methods such as the Gibbs Site Sampler and the Neighborhood Optimization for Multiple Alignment Discovery.

  15. Import of desired nucleic acid sequences using addressing motif of mitochondrial ribosomal 5S-rRNA for fluorescent in vivo hybridization of mitochondrial DNA and RNA.

    PubMed

    Zelenka, Jaroslav; Alán, Lukáš; Jabůrek, Martin; Ježek, Petr

    2014-04-01

    Based on the matrix-addressing sequence of mitochondrial ribosomal 5S-rRNA (termed MAM), which is naturally imported into mitochondria, we have constructed an import system for in vivo targeting of mitochondrial DNA (mtDNA) or mt-mRNA, in order to provide fluorescence hybridization of the desired sequences. Thus DNA oligonucleotides were constructed, containing the 5'-flanked T7 RNA polymerase promoter. After in vitro transcription and fluorescent labeling with Alexa Fluor® 488 or 647 dye, we obtained the fluorescent "L-ND5 probe" containing MAM and exemplar cargo, i.e., annealing sequence to a short portion of ND5 mRNA and to the light-strand mtDNA complementary to the heavy strand nd5 mt gene (5'-end 21 base pair sequence). For mitochondrial in vivo fluorescent hybridization, HepG2 cells were treated with dequalinium micelles, containing the fluorescent probes, bringing the probes proximally to the mitochondrial outer membrane and to the natural import system. A verification of import into the mitochondrial matrix of cultured HepG2 cells was provided by confocal microscopy colocalizations. Transfections using lipofectamine or probes without 5S-rRNA addressing MAM sequence or with MAM only were ineffective. Alternatively, the same DNA oligonucleotides with 5'-CACC overhang (substituting T7 promoter) were transcribed from the tetracycline-inducible pENTRH1/TO vector in human embryonic kidney T-REx®-293 cells, while mitochondrial matrix localization after import of the resulting unlabeled RNA was detected by PCR. The MAM-containing probe was then enriched by three orders of magnitude over the natural ND5 mRNA in the mitochondrial matrix. In conclusion, we present a proof-of-principle for mitochondrial in vivo hybridization and mitochondrial nucleic acid import.

  16. The highly conserved amino acid sequence motif Tyr-Gly-Asp-Thr-Asp-Ser in alpha-like DNA polymerases is required by phage phi 29 DNA polymerase for protein-primed initiation and polymerization.

    PubMed Central

    Bernad, A; Lázaro, J M; Salas, M; Blanco, L

    1990-01-01

    The alpha-like DNA polymerases from bacteriophage phi 29 and other viruses, prokaryotes and eukaryotes contain an amino acid consensus sequence that has been proposed to form part of the dNTP binding site. We have used site-directed mutants to study five of the six highly conserved consecutive amino acids corresponding to the most conserved C-terminal segment (Tyr-Gly-Asp-Thr-Asp-Ser). Our results indicate that in phi 29 DNA polymerase this consensus sequence, although irrelevant for the 3'→5' exonuclease activity, is essential for initiation and elongation. Based on these results and on its homology with known or putative metal-binding amino acid sequences, we propose that in phi 29 DNA polymerase the Tyr-Gly-Asp-Thr-Asp-Ser consensus motif is part of the dNTP binding site, involved in the synthetic activities of the polymerase (i.e., initiation and polymerization), and that it is involved particularly in the metal binding associated with the dNTP site. PMID:2191296

  17. Classification of protein motifs based on subcellular localization uncovers evolutionary relationships at both sequence and functional levels

    PubMed Central

    2013-01-01

    Background Most proteins have evolved in specific cellular compartments that limit their functions and potential interactions. On the other hand, motifs define amino acid arrangements conserved between protein family members and represent powerful tools for assigning function to protein sequences. The ideal motif would identify all members of a protein family but in practice many motifs identify both family members and unrelated proteins, referred to as True Positive (TP) and False Positive (FP) sequences, respectively. Results To address the relationship between protein motifs, protein function and cellular localization, we systematically assigned subcellular localization data to motif sequences from the comprehensive PROSITE sequence motif database. Using this data we analyzed relationships between localization and function. We find that TPs and FPs have a strong tendency to localize in different compartments. When multiple localizations are considered, TPs are usually distributed between related cellular compartments. We also identified cases where FPs are concentrated in particular subcellular regions, indicating possible functional or evolutionary relationships with TP sequences of the same motif. Conclusions Our findings suggest that the systematic examination of subcellular localization has the potential to uncover evolutionary and functional relationships between motif-containing sequences. We believe that this type of analysis complements existing motif annotations and could aid in their interpretation. Our results shed light on the evolution of cellular organelles and potentially establish the basis for new subcellular localization and function prediction algorithms. PMID:23865897

  18. WildSpan: mining structured motifs from protein sequences

    PubMed Central

    2011-01-01

    Background Automatic extraction of motifs from biological sequences is an important research problem in study of molecular biology. For proteins, it is desired to discover sequence motifs containing a large number of wildcard symbols, as the residues associated with functional sites are usually largely separated in sequences. Discovering such patterns is time-consuming because abundant combinations exist when long gaps (a gap consists of one or more successive wildcards) are considered. Mining algorithms often employ constraints to narrow down the search space in order to increase efficiency. However, improper constraint models might degrade the sensitivity and specificity of the motifs discovered by computational methods. We previously proposed a new constraint model to handle large wildcard regions for discovering functional motifs of proteins. The patterns that satisfy the proposed constraint model are called W-patterns. A W-pattern is a structured motif that groups motif symbols into pattern blocks interleaved with large irregular gaps. Considering large gaps reflects the fact that functional residues are not always from a single region of protein sequences, and restricting motif symbols into clusters corresponds to the observation that short motifs are frequently present within protein families. To efficiently discover W-patterns for large-scale sequence annotation and function prediction, this paper first formally introduces the problem to solve and proposes an algorithm named WildSpan (sequential pattern mining across large wildcard regions) that incorporates several pruning strategies to largely reduce the mining cost. Results WildSpan is shown to efficiently find W-patterns containing conserved residues that are far separated in sequences. We conducted experiments with two mining strategies, protein-based and family-based mining, to evaluate the usefulness of W-patterns and performance of WildSpan. The protein-based mining mode of WildSpan is developed for

  19. The 'helix clamp' in HIV-1 reverse transcriptase: a new nucleic acid binding motif common in nucleic acid polymerases.

    PubMed Central

    Hermann, T; Meier, T; Götte, M; Heumann, H

    1994-01-01

    Amino acid sequences homologous to 259KLVGKL (X)16KLLR284 of human immunodeficiency virus type 1 reverse transcriptase (HIV-1 RT) are conserved in several nucleotide polymerizing enzymes. This amino acid motif has been identified in the crystal structure model as an element of the enzyme's nucleic acid binding apparatus. It is part of the helix-turn-helix structure, alpha H-turn-alpha I, within the 'thumb' region of HIV-1 RT. The motif grasps the complexed nucleic acid at one side. Molecular modeling studies on HIV-1 RT in complex with a nucleic acid fragment suggest that the motif has binding function in the p66 subunit as well as in the p51 subunit, acting as a kind of 'helix clamp'. Given its wide distribution within the nucleic acid polymerases, the helix clamp motif is assumed to be a structure of general significance for nucleic acid binding. PMID:7527138
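    The pattern notation KLVGKL (X)16 KLLR reads as six fixed residues, sixteen arbitrary residues, then four fixed residues (6 + 16 + 4 = 26, consistent with positions 259 to 284). A minimal sketch of an exact-match search follows; the function name is hypothetical, and because the abstract speaks of homologous sequences, a realistic search would need to tolerate substitutions, which this regular expression does not.

```python
import re

# KLVGKL, then any 16 residues, then KLLR (exact residues only).
HELIX_CLAMP = re.compile(r"KLVGKL.{16}KLLR")

def find_helix_clamp(sequence):
    """Return 1-based (start, end) positions of non-overlapping exact
    matches of the pattern in a protein sequence."""
    return [(m.start() + 1, m.end()) for m in HELIX_CLAMP.finditer(sequence)]
```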

  20. Inhibition of NADPH oxidase activation by synthetic peptides mapping within the carboxyl-terminal domain of small GTP-binding proteins. Lack of amino acid sequence specificity and importance of polybasic motif.

    PubMed

    Joseph, G; Gorzalczany, Y; Koshkin, V; Pick, E

    1994-11-18

    The small GTP-binding protein (G protein) Rac1 is an obligatory participant in the assembly of the superoxide (O2-.)-generating NADPH oxidase complex of macrophages. We investigated the effect of synthetic peptides, mapping within the near carboxyl-terminal domains of Rac1 and of related G proteins, on the activity of NADPH oxidase in a cell-free system consisting of solubilized guinea pig macrophage membrane, a cytosolic fraction enriched in p47phox and p67phox (or total cytosol), highly purified Rac1-GDP dissociation inhibitor for Rho (Rho GDI) complex, and the activating amphiphile, lithium dodecyl sulfate. Peptides Rac1-(178-188) and Rac1-(178-191), but not Rac2-(178-188), inhibited NADPH oxidase activity in a Rac1-dependent system when added prior to or simultaneously with the initiation of activation. However, undecapeptides corresponding to the near carboxyl-terminal domains of RhoA and RhoC and, most notably, a peptide containing the same amino acids as Rac1-(178-188), but in reversed orientation, were also inhibitory. Surprisingly, O2-. production in a Rac2-dependent cell-free system was inhibited by Rac1-(178-188) but not by Rac2-(178-188). Finally, basic polyamino acids containing lysine, histidine, or arginine, also inhibited NADPH oxidase activation. We conclude that inhibition of NADPH oxidase activation by synthetic peptides mapping within the carboxyl-terminal domain of certain small G proteins is not amino acid sequence-specific but related to the presence of a polybasic motif. It has been proposed that such a motif serves as a plasma membrane targeting signal for a number of small G proteins (Hancock, J.F., Paterson, H., and Marshall, C.J. (1990) Cell 63, 133-139).

  1. Computational definition of sequence motifs governing constitutive exon splicing.

    PubMed

    Zhang, Xiang H-F; Chasin, Lawrence A

    2004-06-01

    We have searched for sequence motifs that contribute to the recognition of human pre-mRNA splice sites by comparing the frequency of 8-mers in internal noncoding exons versus unspliced pseudo exons and 5' untranslated regions (UTRs) of transcripts of intronless genes. This type of comparison avoids the isolation of sequences that are distinguished by their protein-coding information. We classified sequence families comprising 2069 putative exonic enhancers and 974 putative exonic silencers. Representatives of each class functioned as enhancers or silencers when inserted into a test exon and assayed in transfected mammalian cells. As a class, the enhancer sequences were more prevalent and the silencer elements less prevalent in all exons compared with introns. A survey of 58 reported exonic splicing mutations showed good agreement between the splicing phenotype and the effect of the mutation on the motifs defined here. The large number of effective sequences implied by these results suggests that sequences that influence splicing may be very abundant in pre-mRNA.
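    The core comparison, 8-mer frequencies in one sequence set versus a control set, can be sketched as below. The abstract does not state which statistic the authors used, so the plain frequency difference here is a stand-in chosen for illustration, and the function names are hypothetical.

```python
from collections import Counter

def kmer_frequencies(sequences, k=8):
    """Relative frequency of every overlapping k-mer across the sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}

def frequency_differences(exon_seqs, control_seqs, k=8):
    """Return {kmer: frequency in exons - frequency in controls} for every
    k-mer observed in either set. Positive values mark k-mers enriched in
    exons; negative values mark k-mers enriched in the controls."""
    exon = kmer_frequencies(exon_seqs, k)
    ctrl = kmer_frequencies(control_seqs, k)
    return {kmer: exon.get(kmer, 0.0) - ctrl.get(kmer, 0.0)
            for kmer in set(exon) | set(ctrl)}
```

    Under this reading, strongly positive k-mers would be candidate enhancers and strongly negative ones candidate silencers; a real analysis would also need a significance test, which is omitted here.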

  2. Computational Prediction of Phylogenetically Conserved Sequence Motifs for Five Different Candidate Genes in Type II Diabetic Nephropathy

    PubMed Central

    Sindhu, T; Rajamanikandan, S; Srinivasan, P

    2012-01-01

    Background: Computational identification of phylogenetic motifs helps to characterize known functional features, including catalytic sites, substrate-binding epitopes, and protein-protein interfaces. Furthermore, these motifs are strongly conserved among orthologs, indicating their evolutionary importance. The study aimed to analyze five candidate genes involved in type II diabetic nephropathy and to predict phylogenetic motifs from their corresponding orthologous protein sequences. Methods: AKR1B1, APOE, ENPP1, ELMO1 and IGFBP1 are genes that have been identified through experimental studies as important targets for type II diabetic nephropathy. Their corresponding protein sequences, structures, and orthologous sequences were retrieved from the UniProtKB, PDB, and PHOG databases, respectively. Multiple sequence alignments were constructed using ClustalW, and phylogenetic motifs were identified using MINER. The occurrence of amino acids in the obtained phylogenetic motifs was visualized using WebLogo, and false positive expectations were calculated against phylogenetic similarity. Results: In total, 17 phylogenetic motifs (PMs) were identified from the five proteins; residues such as glycine, leucine, tryptophan, and aspartic acid were found at appreciable frequency, whereas arginine was identified in all the predicted PMs. The results imply that these residues can be important to the functional and structural roles of the proteins, and the calculated false positive expectations imply that they were generally conserved in the traditional sense. Conclusion: The prediction of phylogenetic motifs is an accurate method for detecting functionally important conserved residues. The conserved motifs can be used as potential drug targets for type II diabetic nephropathy. PMID:23113206

  3. Conserved sequence motifs among bacterial, eukaryotic, and archaeal phosphatases that define a new phosphohydrolase superfamily.

    PubMed Central

    Thaller, M. C.; Schippa, S.; Rossolini, G. M.

    1998-01-01

    Members of a new molecular family of bacterial nonspecific acid phosphatases (NSAPs), indicated as class C, were found to share significant sequence similarities to bacterial class B NSAPs and to some plant acid phosphatases, representing the first example of a family of bacterial NSAPs that has a relatively close eukaryotic counterpart. Despite the lack of an overall similarity, conserved sequence motifs were also identified among the above enzyme families (class B and class C bacterial NSAPs, and related plant phosphatases) and several other families of phosphohydrolases, including bacterial phosphoglycolate phosphatases, histidinol-phosphatase domains of the bacterial bifunctional enzymes imidazole-glycerolphosphate dehydratases, and bacterial, eukaryotic, and archaeal phosphoserine phosphatases and trehalose-6-phosphatases. These conserved motifs are clustered within two domains, separated by a variable spacer region, according to the pattern [FILMAVT]-D-[ILFRMVY]-D-[GSNDE]-[TV]-[ILVAM]-[ATSVILMC]-X-{YFWHKR}-X-{YFWHNQ}-X(102,191)-{KRHNQ}-G-D-{FYWHILVMC}-{QNH}-{FWYGP}-D-{PSNQYW}. The dephosphorylating activity common to all these proteins supports the definition of this phosphatase motif and the inclusion of these enzymes into a superfamily of phosphohydrolases that we propose to indicate as "DDDD" after the presence of the four invariant aspartate residues. Database searches retrieved various hypothetical proteins of unknown function containing this or similar motifs, for which a phosphohydrolase activity could be hypothesized. PMID:9684901
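
    A pattern of this kind can be turned into a regular expression for scanning protein sequences. The sketch below assumes the usual PROSITE conventions (square brackets list allowed residues, curly braces list excluded residues, X is any residue, and a parenthesised number or range gives a repeat count). The function name and the toy sequence are hypothetical, and only the first domain of the pattern is used.

```python
import re

# One pattern element: [allowed], {excluded}, or a single letter,
# optionally followed by a repeat such as (4) or (102,191).
_ELEMENT = re.compile(
    r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?"
)


def prosite_to_regex(pattern):
    """Translate a hyphen-separated PROSITE-style pattern into a regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        core, low, high = match.groups()
        if core in ("x", "X"):
            core = "."                         # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"     # excluded residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


# First conserved domain of the pattern quoted in the abstract.
DOMAIN_1 = ("[FILMAVT]-D-[ILFRMVY]-D-[GSNDE]-[TV]-[ILVAM]-[ATSVILMC]"
            "-X-{YFWHKR}-X-{YFWHNQ}")
domain_1_regex = re.compile(prosite_to_regex(DOMAIN_1))

# Hypothetical toy sequence, not a real phosphatase.
hit = domain_1_regex.search("MKADIDGTLAAAAAK")
```

    The spacer element X(102,191) translates to `.{102,191}`, so the full two-domain pattern can be converted the same way.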

  4. Identification of imine reductase-specific sequence motifs.

    PubMed

    Fademrecht, Silvia; Scheller, Philipp N; Nestl, Bettina M; Hauer, Bernhard; Pleiss, Jürgen

    2016-05-01

    Chiral amines are valuable building blocks for the production of a variety of pharmaceuticals, agrochemicals and other specialty chemicals. Only recently, imine reductases (IREDs) were discovered which catalyze the stereoselective reduction of imines to chiral amines. Although several IREDs were biochemically characterized in the last few years, knowledge of the reaction mechanism and the molecular basis of substrate specificity and stereoselectivity is limited. To gain further insights into the sequence-function relationships, the Imine Reductase Engineering Database (www.IRED.BioCatNet.de) was established and a systematic analysis of 530 putative IREDs was performed. A standard numbering scheme based on R-IRED-Sk was introduced to facilitate the identification and communication of structurally equivalent positions in different proteins. A conservation analysis revealed a highly conserved cofactor binding region and a predominantly hydrophobic substrate binding cleft. Two IRED-specific motifs were identified, the cofactor binding motif GLGxMGx(5)[ATS]x(4)Gx(4)[VIL]WNR[TS]x(2)[KR] and the active site motif Gx[DE]x[GDA]x[APS]x(3){K}x[ASL]x[LMVIAG]. Our results indicate a preference toward NADPH for all IREDs and explain why, despite their sequence similarity to β-hydroxyacid dehydrogenases (β-HADs), no conversion of β-hydroxyacids has been observed. Superfamily-specific conservations were investigated to explore the molecular basis of their stereopreference. Based on our analysis and previous experimental results on IRED mutants, an exclusive role of standard position 187 for stereoselectivity is excluded. Alternatively, two standard positions 139 and 194 were identified which are superfamily-specifically conserved and differ in R- and S-selective enzymes.

  5. Phosphotyrosine Substrate Sequence Motifs for Dual Specificity Phosphatases

    PubMed Central

    Zhao, Bryan M.; Keasey, Sarah L.; Tropea, Joseph E.; Lountos, George T.; Dyas, Beverly K.; Cherry, Scott; Raran-Kurussi, Sreejith; Waugh, David S.; Ulrich, Robert G.

    2015-01-01

    Protein tyrosine phosphatases dephosphorylate tyrosine residues of proteins, whereas dual specificity phosphatases (DUSPs) are a subgroup of protein tyrosine phosphatases that dephosphorylate not only Tyr(P) residues but also the Ser(P) and Thr(P) residues of proteins. The DUSPs are linked to the regulation of many cellular functions and signaling pathways. Though many cellular targets of DUSPs are known, the relationship between catalytic activity and substrate specificity is poorly defined. We investigated the interactions of peptide substrates with select DUSPs of four types: MAP kinases (DUSP1 and DUSP7), atypical (DUSP3, DUSP14, DUSP22 and DUSP27), viral (variola VH1), and Cdc25 (A-C). Phosphatase recognition sites were experimentally determined by measuring dephosphorylation of 6,218 microarrayed Tyr(P) peptides representing confirmed and theoretical phosphorylation motifs from the cellular proteome. A broad continuum of dephosphorylation was observed across the microarrayed peptide substrates for all phosphatases, suggesting a complex relationship between substrate sequence recognition and optimal activity. Further analysis of peptide dephosphorylation by hierarchical clustering indicated that DUSPs could be organized by substrate sequence motifs, and peptide-specificities by phylogenetic relationships among the catalytic domains. The most highly dephosphorylated peptides represented proteins from 29 cell-signaling pathways, greatly expanding the list of potential targets of DUSPs. These newly identified DUSP substrates will be important for examining structure-activity relationships with physiologically relevant targets. PMID:26302245

  6. cisExpress: motif detection in DNA sequences

    PubMed Central

    Triska, Martin; Grocutt, David; Southern, James; Murphy, Denis J.; Tatarinova, Tatiana

    2013-01-01

    Motivation: One of the major challenges for contemporary bioinformatics is the analysis and accurate annotation of genomic datasets to enable extraction of useful information about the functional role of DNA sequences. This article describes a novel genome-wide statistical approach to the detection of specific DNA sequence motifs based on similarities between the promoters of similarly expressed genes. This new tool, cisExpress, is especially designed for use with large datasets, such as those generated by publicly accessible whole genome and transcriptome projects. cisExpress uses a task farming algorithm to exploit all available computational cores within a shared memory node. We demonstrate the robust nature and validity of the proposed method. It is applicable for use with a wide range of genomic databases for any species of interest. Availability: cisExpress is available at www.cisexpress.org. Contact: tatiana.tatarinova@usc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23793750

  7. Nucleotide binding database NBDB – a collection of sequence motifs with specific protein-ligand interactions

    PubMed Central

    Zheng, Zejun; Goncearenco, Alexander; Berezovsky, Igor N.

    2016-01-01

    NBDB database describes protein motifs, elementary functional loops (EFLs) that are involved in binding of nucleotide-containing ligands and other biologically relevant cofactors/coenzymes, including ATP, AMP, ATP, GMP, GDP, GTP, CTP, PAP, PPS, FMN, FAD(H), NAD(H), NADP, cAMP, cGMP, c-di-AMP and c-di-GMP, ThPP, THD, F-420, ACO, CoA, PLP and SAM. The database is freely available online at http://nbdb.bii.a-star.edu.sg. In total, NBDB contains data on 249 motifs that work in interactions with 24 ligands. Sequence profiles of EFL motifs were derived de novo from nonredundant Uniprot proteome sequences. Conserved amino acid residues in the profiles interact specifically with distinct chemical parts of nucleotide-containing ligands, such as nitrogenous bases, phosphate groups, ribose, nicotinamide, and flavin moieties. Each EFL profile in the database is characterized by a pattern of corresponding ligand–protein interactions found in crystallized ligand–protein complexes. NBDB database helps to explore the determinants of nucleotide and cofactor binding in different protein folds and families. NBDB can also detect fragments that match to profiles of particular EFLs in the protein sequence provided by user. Comprehensive information on sequence, structures, and interactions of EFLs with ligands provides a foundation for experimental and computational efforts on design of required protein functions. PMID:26507856

  8. Sequence-motif Detection of NAD(P)-binding Proteins: Discovery of a Unique Antibacterial Drug Target

    NASA Astrophysics Data System (ADS)

    Hua, Yun Hao; Wu, Chih Yuan; Sargsyan, Karen; Lim, Carmay

    2014-09-01

    Many enzymes use nicotinamide adenine dinucleotide or nicotinamide adenine dinucleotide phosphate (NAD(P)) as essential coenzymes. These enzymes often do not share significant sequence identity and cannot be easily detected by sequence homology. Previously, we determined all distinct locally conserved pyrophosphate-binding structures (3d motifs) from NAD(P)-bound protein structures, from which 1d sequence motifs were derived. Here, we aim to establish the precision of these 3d and 1d motifs to annotate NAD(P)-binding proteins. We show that the pyrophosphate-binding 3d motifs are characteristic of NAD(P)-binding proteins, as they are rarely found in non-NAD(P)-binding proteins. Furthermore, several 1d motifs could distinguish between proteins that bind only NAD and those that bind only NADP. They could also distinguish NAD(P)-binding proteins from non-NAD(P)-binding ones. Interestingly, one of the pyrophosphate-binding 3d and corresponding 1d motifs was found only in enoyl-acyl carrier protein reductases, which are enzymes essential for bacterial fatty acid biosynthesis. This unique 3d motif serves as an attractive novel drug target, as it is conserved across many bacterial species and is not found in human proteins.

  9. New structural motif for carboxylic acid perhydrolases.

    PubMed

    Yin, DeLu Tyler; Purpero, Vince M; Fujii, Ryota; Jing, Qing; Kazlauskas, Romas J

    2013-02-25

    Some serine hydrolases also catalyze a promiscuous reaction--reversible perhydrolysis of carboxylic acids to make peroxycarboxylic acids. Five X-ray crystal structures of these carboxylic acid perhydrolases show a proline in the oxyanion loop. Here, we test whether this proline is essential for high perhydrolysis activity using Pseudomonas fluorescens esterase (PFE). The L29P variant of this esterase catalyzes perhydrolysis 43-fold faster (k(cat) comparison) than the wild type. Surprisingly, saturation mutagenesis at the 29 position of PFE identified six other amino acid substitutions that increase perhydrolysis of acetic acid at least fourfold over the wild type. The best variant, L29I PFE, catalyzed perhydrolysis 83-times faster (k(cat) comparison) than wild-type PFE and twice as fast as L29P PFE. Despite the different amino acid in the oxyanion loop, L29I PFE shows a similar selectivity for hydrogen peroxide over water as L29P PFE (β(0)=170 vs. 160 M(-1)), and a similar fast formation of acetyl-enzyme (140 vs. 62 U mg(-1)). X-ray crystal structures of L29I PFE with and without bound acetate show an unusual mixture of two different oxyanion loop conformations. The type II β-turn conformation resembles the wild-type structure and is unlikely to increase perhydrolysis, but the type I β-turn conformation creates a binding site for a second acetate. Modeling suggests that a previously proposed mechanism for L29P PFE can be extended to include L29I PFE, so that an acetate accepts a hydrogen bond to promote faster formation of the acetyl-enzyme.

  10. Improved K-means clustering algorithm for exploring local protein sequence motifs representing common structural property.

    PubMed

    Zhong, Wei; Altun, Gulsah; Harrison, Robert; Tai, Phang C; Pan, Yi

    2005-09-01

    Information about local protein sequence motifs is very important to the analysis of biologically significant conserved regions of protein sequences. These conserved regions can potentially determine the diverse conformation and activities of proteins. In this work, recurring sequence motifs of proteins are explored with an improved K-means clustering algorithm on a new dataset. The structural similarity of these recurring sequence clusters to produce sequence motifs is studied in order to evaluate the relationship between sequence motifs and their structures. To the best of our knowledge, the dataset used by our research is the most up-to-date dataset among similar studies for sequence motifs. A new greedy initialization method for the K-means algorithm is proposed to improve traditional K-means clustering techniques. The new initialization method tries to choose suitable initial points, which are well separated and have the potential to form high-quality clusters. Our experiments indicate that the improved K-means algorithm satisfactorily increases the percentage of sequence segments belonging to clusters with high structural similarity. Careful comparison of sequence motifs obtained by the improved and traditional algorithms also suggests that the improved K-means clustering algorithm may discover some relatively weak and subtle sequence motifs, which are undetectable by the traditional K-means algorithms. Many biochemical tests reported in the literature show that these sequence motifs are biologically meaningful. Experimental results also indicate that the improved K-means algorithm generates more detailed sequence motifs representing common structures than previous research. Furthermore, these motifs are universally conserved sequence patterns across protein families, overcoming some weak points of other popular sequence motifs. The satisfactory result of the experiment suggests that this new K-means algorithm may be applied to other areas of bioinformatics.
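
    The idea of choosing well-separated initial points can be sketched with a generic farthest-first rule. The abstract does not spell out the authors' greedy procedure, so the function below is a stand-in technique under that assumption, not their method. Its name and the toy points are hypothetical.

```python
import math


def farthest_first_init(points, k):
    """Pick k initial centres so that each new centre is the point
    farthest from all centres chosen so far (farthest-first traversal).

    Generic illustration of a 'well separated' initialisation; the
    published greedy method may differ in its criteria.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centres = [points[0]]  # arbitrary but deterministic starting point
    while len(centres) < k:
        # Distance of each point to its nearest already-chosen centre.
        farthest = max(
            points,
            key=lambda p: min(math.dist(p, c) for c in centres),
        )
        centres.append(farthest)
    return centres


# Toy 2-D data: two tight pairs and one isolated point.
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 8)]
initial_centres = farthest_first_init(toy_points, 3)
```

    The selected centres would then seed an ordinary K-means loop in place of random initial points.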

  11. Illumina MiSeq sequencing disfavours a sequence motif in the GFP reporter gene

    PubMed Central

    Van den Hoecke, Silvie; Verhelst, Judith; Saelens, Xavier

    2016-01-01

    Green fluorescent protein (GFP) is one of the most used reporter genes. We have used next-generation sequencing (NGS) to analyse the genetic diversity of a recombinant influenza A virus that expresses GFP and found a remarkable coverage dip in the GFP coding sequence. This coverage dip was present when virus-derived RT-PCR product or the parental plasmid DNA was used as starting material for NGS and regardless of whether Nextera XT transposase or Covaris shearing was used for DNA fragmentation. Therefore, the sequence coverage dip in the GFP coding sequence was not the result of emerging GFP mutant viruses or a bias introduced by Nextera XT fragmentation. Instead, we found that the Illumina MiSeq sequencing method disfavours the ‘CCCGCC’ motif in the GFP coding sequence. PMID:27193250

  12. Exhaustive Search for Over-represented DNA Sequence Motifs with CisFinder

    PubMed Central

    Sharov, Alexei A.; Ko, Minoru S.H.

    2009-01-01

    We present CisFinder software, which generates a comprehensive list of motifs enriched in a set of DNA sequences and describes them with position frequency matrices (PFMs). A new algorithm was designed to estimate PFMs directly from counts of n-mer words with and without gaps; then PFMs are extended over gaps and flanking regions and clustered to generate non-redundant sets of motifs. The algorithm successfully identified binding motifs for 12 transcription factors (TFs) in embryonic stem cells based on published chromatin immunoprecipitation sequencing data. Furthermore, CisFinder successfully identified alternative binding motifs of TFs (e.g. POU5F1, ESRRB, and CTCF) and motifs for known and unknown co-factors of genes associated with the pluripotent state of ES cells. CisFinder also showed robust performance in the identification of motifs that were only slightly enriched in a set of DNA sequences. PMID:19740934
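
    A position frequency matrix can be illustrated with a simplified sketch that tallies bases per position from counted words. It assumes ungapped words of equal length, whereas CisFinder also handles gapped words, extension over flanking regions, and clustering. The function names and toy counts are hypothetical.

```python
def position_frequency_matrix(word_counts, alphabet="ACGT"):
    """Build a PFM from a mapping of equal-length words to their counts.

    Returns a dict mapping each base to a list of per-position counts.
    Words containing letters outside `alphabet` raise KeyError.
    """
    words = list(word_counts)
    length = len(words[0])
    if any(len(w) != length for w in words):
        raise ValueError("all words must have the same length")
    pfm = {base: [0] * length for base in alphabet}
    for word, count in word_counts.items():
        for position, base in enumerate(word.upper()):
            pfm[base][position] += count
    return pfm


def consensus(pfm):
    """Most frequent base at each position (ties go to alphabet order)."""
    length = len(next(iter(pfm.values())))
    return "".join(
        max(pfm, key=lambda base: pfm[base][i]) for i in range(length)
    )


# Toy word counts (hypothetical, not taken from the paper's data).
toy_counts = {"ATGCAAAT": 3, "ATGCATAT": 1}
toy_pfm = position_frequency_matrix(toy_counts)
toy_consensus = consensus(toy_pfm)
```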

  13. CyanoLyase: a database of phycobilin lyase sequences, motifs and functions

    PubMed Central

    Bretaudeau, Anthony; Coste, François; Humily, Florian; Garczarek, Laurence; Le Corguillé, Gildas; Six, Christophe; Ratin, Morgane; Collin, Olivier; Schluchter, Wendy M.; Partensky, Frédéric

    2013-01-01

    CyanoLyase (http://cyanolyase.genouest.org/) is a manually curated sequence and motif database of phycobilin lyases and related proteins. These enzymes catalyze the covalent ligation of chromophores (phycobilins) to specific binding sites of phycobiliproteins (PBPs). The latter constitute the building bricks of phycobilisomes, the major light-harvesting systems of cyanobacteria and red algae. Phycobilin lyases sequences are poorly annotated in public databases. Sequences included in CyanoLyase were retrieved from all available genomes of these organisms and a few others by similarity searches using biochemically characterized enzyme sequences and then classified into 3 clans and 32 families. Amino acid motifs were computed for each family using Protomata learner. CyanoLyase also includes BLAST and a novel pattern matching tool (Protomatch) that allow users to rapidly retrieve and annotate lyases from any new genome. In addition, it provides phylogenetic analyses of all phycobilin lyases families, describes their function, their presence/absence in all genomes of the database (phyletic profiles) and predicts the chromophorylation of PBPs in each strain. The site also includes a thorough bibliography about phycobilin lyases and genomes included in the database. This resource should be useful to scientists and companies interested in natural or artificial PBPs, which have a number of biotechnological applications, notably as fluorescent markers. PMID:23175607

  14. QGRS-H Predictor: a web server for predicting homologous quadruplex forming G-rich sequence motifs in nucleotide sequences

    PubMed Central

    Menendez, Camille; Frees, Scott; Bagga, Paramjeet S.

    2012-01-01

    Naturally occurring G-quadruplex structural motifs, formed by guanine-rich nucleic acids, have been reported in telomeric, promoter and transcribed regions of mammalian genomes. G-quadruplex structures have received significant attention because of growing evidence for their role in important biological processes, human disease and as therapeutic targets. Lately, there has been much interest in the potential roles of RNA G-quadruplexes as cis-regulatory elements of post-transcriptional gene expression. Large-scale computational genomics studies on G-quadruplexes have difficulty validating their predictions without laborious testing in ‘wet’ labs. We have developed a bioinformatics tool, QGRS-H Predictor that can map and analyze conserved putative Quadruplex forming 'G'-Rich Sequences (QGRS) in mRNAs, ncRNAs and other nucleotide sequences, e.g. promoter, telomeric and gene flanking regions. Identifying conserved regulatory motifs helps validate computations and enhances accuracy of predictions. The QGRS-H Predictor is particularly useful for mapping homologous G-quadruplex forming sequences as cis-regulatory elements in the context of 5′- and 3′-untranslated regions, and CDS sections of aligned mRNA sequences. QGRS-H Predictor features highly interactive graphic representation of the data. It is a unique and user-friendly application that provides many options for defining and studying G-quadruplexes. The QGRS-H Predictor can be freely accessed at: http://quadruplex.ramapo.edu/qgrs/app/start. PMID:22576365

  15. The bioactive acidic serine- and aspartate-rich motif peptide.

    PubMed

    Minamizaki, Tomoko; Yoshiko, Yuji

    2015-01-01

    The organic component of the bone matrix comprises 40% dry weight of bone. The organic component is mostly composed of type I collagen and small amounts of non-collagenous proteins (NCPs) (10-15% of the total bone protein content). The small integrin-binding ligand N-linked glycoprotein (SIBLING) family, a group of NCPs, is considered to play a key role in bone mineralization. The SIBLING family of proteins shares common structural features, including the arginine-glycine-aspartic acid (RGD) motif and the acidic serine- and aspartic acid-rich motif (ASARM). Clinical manifestations of gene mutations and/or genetically modified mice indicate that SIBLINGs play diverse roles in bone and extraskeletal tissues. ASARM peptides might not be primarily responsible for the functional diversity of SIBLINGs, but this motif is suggested to be a key domain of SIBLINGs. However, the exact function of ASARM peptides is poorly understood. In this article, we discuss the considerable progress made in understanding the role of ASARM as a bioactive peptide.

  16. Sequence-specific high mobility group box factors recognize 10-12-base pair minor groove motifs.

    PubMed

    van Beest, M; Dooijes, D; van De Wetering, M; Kjaerulff, S; Bonvin, A; Nielsen, O; Clevers, H

    2000-09-01

    Sequence-specific high mobility group (HMG) box factors bind and bend DNA via interactions in the minor groove. Three-dimensional NMR analyses have provided the structural basis for this interaction. The cognate HMG domain DNA motif is generally believed to span 6-8 bases. However, alignment of promoter elements controlled by the yeast genes ste11 and Rox1 has indicated strict conservation of a larger DNA motif. By site selection, we identify a highly specific 12-base pair motif for Ste11, AGAACAAAGAAA. Similarly, we show that Tcf1, MatMc, and Sox4 bind unique, highly specific DNA motifs of 12, 12, and 10 base pairs, respectively. Footprinting with a deletion mutant of Ste11 reveals a novel interaction between the 3' base pairs of the extended DNA motif and amino acids C-terminal to the HMG domain. The sequence-specific interaction of Ste11 with these 3' base pairs contributes significantly to binding and bending of the DNA motif.

  17. Computational generation and screening of RNA motifs in large nucleotide sequence pools

    PubMed Central

    Kim, Namhee; Izzo, Joseph A.; Elmetwaly, Shereef; Gan, Hin Hark; Schlick, Tamar

    2010-01-01

    Although identification of active motifs in large random sequence pools is central to RNA in vitro selection, no systematic computational equivalent of this process has yet been developed. We develop a computational approach that combines target pool generation, motif scanning and motif screening using secondary structure analysis for applications to 10^12–10^14-sequence pools; large pool sizes are made possible using program redesign and supercomputing resources. We use the new protocol to search for aptamer and ribozyme motifs in pools up to experimental pool size (10^14 sequences). We show that motif scanning, structure matching and flanking sequence analysis, respectively, reduce the initial sequence pool by 6–8, 1–2 and 1 orders of magnitude, consistent with the rare occurrence of active motifs in random pools. The final yields match the theoretical yields from probability theory for simple motifs and overestimate experimental yields, which constitute lower bounds, for aptamers because screening analyses beyond secondary structure information are not considered systematically. We also show that designed pools using our nucleotide transition probability matrices can produce higher yields for RNA ligase motifs than random pools. Our methods for generating, analyzing and designing large pools can help improve RNA design via simulation of aspects of in vitro selection. PMID:20448026
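
    The cumulative effect of the three screening steps can be worked out with simple arithmetic on the orders of magnitude quoted in the abstract. The sketch below assumes that the reductions combine additively in the exponent and that the starting pool holds 10^14 sequences. The function name is hypothetical, and the resulting range is an inference from the quoted figures, not a number reported by the authors.

```python
def remaining_pool_exponents(initial_exponent, reductions):
    """Return (low, high) base-10 exponents of the pool size left after
    applying every reduction step.

    `reductions` holds (min_orders, max_orders) pairs, one per step.
    """
    low = initial_exponent - sum(max_orders for _, max_orders in reductions)
    high = initial_exponent - sum(min_orders for min_orders, _ in reductions)
    return low, high


# Motif scanning, structure matching, flanking sequence analysis.
steps = [(6, 8), (1, 2), (1, 1)]
low_exp, high_exp = remaining_pool_exponents(14, steps)
```

    Under these assumptions, a pool of 10^14 sequences would shrink to between 10^3 and 10^6 candidates after all three steps.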

  18. Mitoxantrone and Analogues Bind and Stabilize i-Motif Forming DNA Sequences

    PubMed Central

    Wright, Elisé P.; Day, Henry A.; Ibrahim, Ali M.; Kumar, Jeethendra; Boswell, Leo J. E.; Huguin, Camille; Stevenson, Clare E. M.; Pors, Klaus; Waller, Zoë A. E.

    2016-01-01

    There are hundreds of ligands which can interact with G-quadruplex DNA, yet very few which target i-motif. To understand the dynamics between these structures and how they can be affected by intervention with small-molecule ligands, more i-motif binding compounds are required. Herein we describe how the drug mitoxantrone can bind, induce folding of and stabilise i-motif forming DNA sequences, even at physiological pH. Additionally, mitoxantrone was found to bind i-motif forming sequences preferentially over double helical DNA. We also describe the stabilisation properties of analogues of mitoxantrone. This offers a new family of ligands with potential for use in experiments into the structure and function of i-motif forming DNA sequences. PMID:28004744

  19. Mitoxantrone and Analogues Bind and Stabilize i-Motif Forming DNA Sequences

    NASA Astrophysics Data System (ADS)

    Wright, Elisé P.; Day, Henry A.; Ibrahim, Ali M.; Kumar, Jeethendra; Boswell, Leo J. E.; Huguin, Camille; Stevenson, Clare E. M.; Pors, Klaus; Waller, Zoë A. E.

    2016-12-01

    There are hundreds of ligands which can interact with G-quadruplex DNA, yet very few which target i-motif. To understand the dynamics between these structures and how they can be affected by intervention with small-molecule ligands, more i-motif binding compounds are required. Herein we describe how the drug mitoxantrone can bind, induce folding of and stabilise i-motif forming DNA sequences, even at physiological pH. Additionally, mitoxantrone was found to bind i-motif forming sequences preferentially over double helical DNA. We also describe the stabilisation properties of analogues of mitoxantrone. This offers a new family of ligands with potential for use in experiments into the structure and function of i-motif forming DNA sequences.

  20. Physical-chemical property based sequence motifs and methods regarding same

    DOEpatents

    Braun, Werner; Mathura, Venkatarajan S.; Schein, Catherine H.

    2008-09-09

    A data analysis system, program, and/or method, e.g., a data mining/data exploration method, using physical-chemical property motifs. For example, a sequence database may be searched for identifying segments thereof having physical-chemical properties similar to the physical-chemical property motifs.

  1. Nucleic Acid i-Motif Structures in Analytical Chemistry.

    PubMed

    Alba, Joan Josep; Sadurní, Anna; Gargallo, Raimundo

    2016-09-02

    Under the appropriate experimental conditions of pH and temperature, cytosine-rich segments in DNA or RNA sequences may produce a characteristic folded structure known as an i-motif. Besides its potential role in vivo, which is still under investigation, this structure has attracted increasing interest in other fields due to its sharp, fast and reversible pH-driven conformational changes. This "on/off" switch at molecular level is being used in nanotechnology and analytical chemistry to develop nanomachines and sensors, respectively. This paper presents a review of the latest applications of this structure in the field of chemical analysis.

  2. MISCORE: a new scoring function for characterizing DNA regulatory motifs in promoter sequences

    PubMed Central

    2012-01-01

    Background: Computational approaches for finding DNA regulatory motifs in promoter sequences are useful to biologists in terms of reducing the experimental costs and speeding up the discovery process of de novo binding sites. It is important for rule-based or clustering-based motif searching schemes to effectively and efficiently evaluate the similarity between a k-mer (a k-length subsequence) and a motif model, without assuming the independence of nucleotides in motif models and without employing computationally expensive Markov chain models to estimate the background probabilities of k-mers. Also, it is interesting and beneficial to use a priori knowledge in developing advanced searching tools. Results: This paper presents a new scoring function, termed MISCORE, for functional motif characterization and evaluation. MISCORE is free from: (i) any assumption on model dependency; and (ii) the use of a Markov chain model for background modeling. It integrates the compositional complexity of motif instances into the function. Performance evaluations with comparison to the well-known Maximum a Posteriori (MAP) score and Information Content (IC) have shown that MISCORE has promising capabilities to separate and recognize functional DNA motifs and their instances from non-functional ones. Conclusions: MISCORE is a fast computational tool for candidate motif characterization, evaluation and selection. It allows a priori known motif models to be embedded for computing motif-to-motif similarity, which is more advantageous than IC and MAP score. In addition to the merits mentioned above, MISCORE can automatically filter out some repetitive k-mers from a motif model due to the introduction of the compositional complexity in the function. Consequently, the merits of the proposed MISCORE in terms of both motif signal modeling power and computational efficiency will make it more applicable in the development of computational motif discovery tools. PMID:23282090

  3. Detecting Remote Sequence Homology in Disordered Proteins: Discovery of Conserved Motifs in the N-Termini of Mononegavirales phosphoproteins

    PubMed Central

    Karlin, David; Belshaw, Robert

    2012-01-01

    Paramyxovirinae are a large group of viruses that includes measles virus and parainfluenza viruses. The viral Phosphoprotein (P) plays a central role in viral replication. It is composed of a highly variable, disordered N-terminus and a conserved C-terminus. A second viral protein alternatively expressed, the V protein, also contains the N-terminus of P, fused to a zinc finger. We suspected that, despite their high variability, the N-termini of P/V might all be homologous; however, using standard approaches, we could previously identify sequence conservation only in some Paramyxovirinae. We now compared the N-termini using sensitive sequence similarity search programs, able to detect residual similarities unnoticeable by conventional approaches. We discovered that all Paramyxovirinae share a short sequence motif in their first 40 amino acids, which we called soyuz1. Despite its short length (11–16aa), several arguments allow us to conclude that soyuz1 probably evolved by homologous descent, unlike linear motifs. Conservation across such evolutionary distances suggests that soyuz1 plays a crucial role and experimental data suggest that it binds the viral nucleoprotein to prevent its illegitimate self-assembly. In some Paramyxovirinae, the N-terminus of P/V contains a second motif, soyuz2, which might play a role in blocking interferon signaling. Finally, we discovered that the P of related Mononegavirales contain similarly overlooked motifs in their N-termini, and that their C-termini share a previously unnoticed structural similarity suggesting a common origin. Our results suggest several testable hypotheses regarding the replication of Mononegavirales and suggest that disordered regions with little overall sequence similarity, common in viral and eukaryotic proteins, might contain currently overlooked motifs (intermediate in length between linear motifs and disordered domains) that could be detected simply by comparing orthologous proteins. PMID:22403617

  4. Fast and Accurate Discovery of Degenerate Linear Motifs in Protein Sequences

    PubMed Central

    Levy, Emmanuel D.; Michnick, Stephen W.

    2014-01-01

    Linear motifs mediate a wide variety of cellular functions, which makes their characterization in protein sequences crucial to understanding cellular systems. However, the short length and degenerate nature of linear motifs make their discovery a difficult problem. Here, we introduce MotifHound, an algorithm particularly suited for the discovery of small and degenerate linear motifs. MotifHound performs an exact and exhaustive enumeration of all motifs present in proteins of interest, including all of their degenerate forms, and scores the overrepresentation of each motif based on its occurrence in proteins of interest relative to a background (e.g., proteome) using the hypergeometric distribution. To assess MotifHound, we benchmarked it together with state-of-the-art algorithms. The benchmark consists of 11,880 sets of proteins from S. cerevisiae; in each set, we artificially spiked-in one motif varying in terms of three key parameters, (i) number of occurrences, (ii) length and (iii) the number of degenerate or “wildcard” positions. The benchmark enabled the evaluation of the impact of these three properties on the performance of the different algorithms. The results showed that MotifHound and SLiMFinder were the most accurate in detecting degenerate linear motifs. Interestingly, MotifHound was 15 to 20 times faster at comparable accuracy and performed best in the discovery of highly degenerate motifs. We complemented the benchmark by an analysis of proteins experimentally shown to bind the FUS1 SH3 domain from S. cerevisiae. Using the full-length protein partners as sole information, MotifHound recapitulated most experimentally determined motifs binding to the FUS1 SH3 domain. Moreover, these motifs exhibited properties typical of SH3 binding peptides, e.g., high intrinsic disorder and evolutionary conservation, despite the fact that none of these properties were used as prior information. MotifHound is available (http://michnick.bcm.umontreal.ca or http
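
    The hypergeometric scoring mentioned in this abstract can be illustrated with a minimal sketch. It is not the MotifHound implementation: the function names, the use of "." as a wildcard position, and the toy sequences are assumptions made for illustration, and the foreground set is assumed to be a subset of the background.

```python
import re
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N proteins are drawn without replacement from M,
    of which n contain the motif (upper tail of the hypergeometric)."""
    upper = min(n, N)
    total = comb(M, N)
    # math.comb returns 0 for impossible draws, so no extra bounds are needed.
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total


def proteins_with_motif(motif, proteins):
    """Count the proteins that contain the motif at least once."""
    pattern = re.compile(motif)
    return sum(1 for seq in proteins if pattern.search(seq))


def enrichment_pvalue(motif, foreground, background):
    """Overrepresentation of a motif in the foreground relative to the background."""
    M = len(background)
    n = proteins_with_motif(motif, background)
    N = len(foreground)
    k = proteins_with_motif(motif, foreground)
    return hypergeom_sf(k, M, n, N)


# Hypothetical toy data: six background proteins, three of them "of interest".
background = ["APPLPAK", "MKTAYIA", "GPRLPQW", "LLLLKKK", "PALPGGA", "MSTNDEQ"]
foreground = ["APPLPAK", "GPRLPQW", "PALPGGA"]
p_value = enrichment_pvalue("P..P", foreground, background)
```

    With these toy inputs all three foreground proteins, and no others, contain P..P, so the tail probability is 1 / C(6, 3) = 0.05.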

  5. Factoring local sequence composition in motif significance analysis.

    PubMed

    Ng, Patrick; Keich, Uri

    2008-01-01

    We recently introduced a biologically realistic and reliable significance analysis of the output of a popular class of motif finders. In this paper we further improve our significance analysis by incorporating local base composition information. Relying on realistic biological data simulation, as well as on FDR analysis applied to real data, we show that our method is significantly better than the increasingly popular practice of using the normal approximation to estimate the significance of a finder's output. Finally we turn to leveraging our reliable significance analysis to improve the actual motif finding task. Specifically, endowing a variant of the Gibbs Sampler with our improved significance analysis we demonstrate that de novo finders can perform better than has been perceived. Significantly, our new variant outperforms all the finders reviewed in a recently published comprehensive analysis of the Harbison genome-wide binding location data. Interestingly, many of these finders incorporate additional information such as nucleosome positioning and the significance of binding data.

  6. Analysis of Genomic Sequence Motifs for Deciphering Transcription Factor Binding and Transcriptional Regulation in Eukaryotic Cells

    PubMed Central

    Boeva, Valentina

    2016-01-01

    Eukaryotic genomes contain a variety of structured patterns: repetitive elements, binding sites of DNA and RNA associated proteins, splice sites, and so on. Often, these structured patterns can be formalized as motifs and described using a proper mathematical model such as position weight matrix and IUPAC consensus. Two key tasks are typically carried out for motifs in the context of the analysis of genomic sequences. These are: identification in a set of DNA regions of over-represented motifs from a particular motif database, and de novo discovery of over-represented motifs. Here we describe existing methodology to perform these two tasks for motifs characterizing transcription factor binding. When applied to the output of ChIP-seq and ChIP-exo experiments, or to promoter regions of co-modulated genes, motif analysis techniques allow for the prediction of transcription factor binding events and enable identification of transcriptional regulators and co-regulators. The usefulness of motif analysis is further exemplified in this review by how motif discovery improves peak calling in ChIP-seq and ChIP-exo experiments and, when coupled with information on gene expression, allows insights into physical mechanisms of transcriptional modulation. PMID:26941778
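
    The two motif models named in this abstract, IUPAC consensus and position weight matrix, can be sketched as follows. This is an illustrative example only and is not taken from the review: the function names, the uniform background of 0.25, and the toy sequence and matrix are assumptions.

```python
import math
import re

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(consensus):
    """Turn an IUPAC consensus string into a regular expression."""
    return "".join("[" + IUPAC[c] + "]" for c in consensus.upper())


def find_sites(consensus, seq):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    pattern = re.compile("(?=(" + iupac_to_regex(consensus) + "))")
    return [m.start() for m in pattern.finditer(seq.upper())]


def pwm_score(window, pwm, background=0.25):
    """Log-odds score (base 2) of a window under a matrix of base probabilities.

    pwm is a list of dicts, one per position; probabilities must be non-zero.
    """
    return sum(math.log2(col[base] / background) for base, col in zip(window, pwm))


# Toy sequence scanned with the E-box consensus CANNTG.
sites = find_sites("CANNTG", "GGCACGTGAACAGCTGTT")

# Toy two-column matrix (hypothetical values).
toy_pwm = [
    {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125},
    {"A": 0.125, "C": 0.5, "G": 0.25, "T": 0.125},
]
score = pwm_score("AC", toy_pwm)
```

    A consensus match is all-or-nothing, whereas the matrix score grades each window, which is why the two representations tend to be used for different tasks.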

  7. Nuclear Magnetic Resonance Structure of a Novel Globular Domain in RBM10 Containing OCRE, the Octamer Repeat Sequence Motif.

    PubMed

    Martin, Bryan T; Serrano, Pedro; Geralt, Michael; Wüthrich, Kurt

    2016-01-05

    The OCtamer REpeat (OCRE) has been annotated as a 42-residue sequence motif with 12 tyrosine residues in the spliceosome trans-regulatory elements RBM5 and RBM10 (RBM [RNA-binding motif]), which are known to regulate alternative splicing of Fas and Bcl-x pre-mRNA transcripts. Nuclear magnetic resonance structure determination showed that the RBM10 OCRE sequence motif is part of a 55-residue globular domain containing 16 aromatic amino acids, which consists of an anti-parallel arrangement of six β strands, with the first five strands containing complete or incomplete Tyr triplets. This OCRE globular domain is a distinctive component of RBM10 and is more widely conserved in RBM10s across the animal kingdom than the ubiquitous RNA recognition components. It is also found in the functionally related RBM5. Thus, it appears that the three-dimensional structure of the globular OCRE domain, rather than the 42-residue OCRE sequence motif alone, confers specificity on RBM10 intermolecular interactions in the spliceosome.

  8. Functional characterization of sequence motifs in the transit peptide of Arabidopsis small subunit of rubisco.

    PubMed

    Lee, Dong Wook; Lee, Sookjin; Lee, Gil-Je; Lee, Kwang Hee; Kim, Sanguk; Cheong, Gang-Won; Hwang, Inhwan

    2006-02-01

    The transit peptides of nuclear-encoded chloroplast proteins are necessary and sufficient for targeting and import of proteins into chloroplasts. However, the sequence information encoded by transit peptides is not fully understood. In this study, we investigated sequence motifs in the transit peptide of the small subunit of the Rubisco complex by examining the ability of various mutant transit peptides to target green fluorescent protein reporter proteins to chloroplasts in Arabidopsis (Arabidopsis thaliana) leaf protoplasts. We divided the transit peptide into eight blocks (T1 through T8), each consisting of eight or 10 amino acids, and generated mutants that had alanine (Ala) substitutions or deletions, of one or two T blocks in the transit peptide. In addition, we generated mutants that had the original sequence partially restored in single- or double-T-block Ala (A) substitution mutants. Analysis of chloroplast import of these mutants revealed several interesting observations. Single-T-block mutations did not noticeably affect targeting efficiency, except in T1 and T4 mutations. However, double-T mutants, T2A/T4A, T3A/T6A, T3A/T7A, T4A/T6A, and T4A/T7A, caused a 50% to 100% loss in targeting ability. T3A/T6A and T4A/T6A mutants produced only precursor proteins, whereas T2A/T4A and T4A/T7A mutants produced only a 37-kD protein. Detailed analyses revealed that sequence motifs ML in T1, LKSSA in T3, FP and RK in T4, CMQVW in T6, and KKFET in T7 play important roles in chloroplast targeting. In T1, the hydrophobicity of ML is important for targeting. LKSSA in T3 is functionally equivalent to CMQVW in T6 and KKFET in T7. Furthermore, subcellular fractionation revealed that Ala substitution in T1, T3, and T6 produced soluble precursors, whereas Ala substitution in T4 and T7 produced intermediates that were tightly associated with membranes. These results demonstrate that the transit peptide contains multiple motifs and that some of them act in concert or

  9. Sequence motifs and prokaryotic expression of the reptilian paramyxovirus fusion protein

    USGS Publications Warehouse

    Franke, J.; Batts, W.N.; Ahne, W.; Kurath, G.; Winton, J.R.

    2006-01-01

    Fourteen reptilian paramyxovirus isolates were chosen to represent the known extent of genetic diversity among this novel group of viruses. Selected regions of the fusion (F) gene were sequenced, analyzed and compared. The F gene of all isolates contained conserved motifs homologous to those described for other members of the family Paramyxoviridae including: signal peptide, transmembrane domain, furin cleavage site, fusion peptide, N-linked glycosylation sites, and two heptad repeats, the second of which (HRB-LZ) had the characteristics of a leucine zipper. Selected regions of the fusion gene of isolate Gono-GER85 were inserted into a prokaryotic expression system to generate three recombinant protein fragments of various sizes. The longest recombinant protein was cleaved by furin into two fragments of predicted length. Western blot analysis with virus-neutralizing rabbit-antiserum against this isolate demonstrated that only the longest construct reacted with the antiserum. This construct was unique in containing 30 additional C-terminal amino acids that included most of the HRB-LZ. These results indicate that the F genes of reptilian paramyxoviruses contain highly conserved motifs typical of other members of the family and suggest that the HRB-LZ domain of the reptilian paramyxovirus F protein contains a linear antigenic epitope. © Springer-Verlag 2005.

  10. Predicting candidate genomic sequences that correspond to synthetic functional RNA motifs

    PubMed Central

    Laserson, Uri; Gan, Hin Hark; Schlick, Tamar

    2005-01-01

    Riboswitches and RNA interference are important emerging mechanisms found in many organisms to control gene expression. To enhance our understanding of such RNA roles, finding small regulatory motifs in genomes presents a challenge on a wide scale. Many simple functional RNA motifs have been found by in vitro selection experiments, which produce synthetic target-binding aptamers as well as catalytic RNAs, including the hammerhead ribozyme. Motivated by the prediction of Piganeau and Schroeder [(2003) Chem. Biol., 10, 103–104] that synthetic RNAs may have natural counterparts, we develop and apply an efficient computational protocol for identifying aptamer-like motifs in genomes. We define motifs from the sequence and structural information of synthetic aptamers, search for sequences in genomes that will produce motif matches, and then evaluate the structural stability and statistical significance of the potential hits. Our application to aptamers for streptomycin, chloramphenicol, neomycin B and ATP identifies 37 candidate sequences (in coding and non-coding regions) that fold to the target aptamer structures in bacterial and archaeal genomes. Further energetic screening reveals that several candidates exhibit energetic properties and sequence conservation patterns that are characteristic of functional motifs. Besides providing candidates for experimental testing, our computational protocol offers an avenue for expanding natural RNA's functional repertoire. PMID:16254081

  11. GOmotif: A web server for investigating the biological role of protein sequence motifs

    PubMed Central

    2011-01-01

    Background: Many proteins contain conserved sequence patterns (motifs) that contribute to their functionality. The process of experimentally identifying and validating novel protein motifs can be difficult, expensive, and time-consuming. A means for helping to identify in advance the possible function of a novel motif is important to test hypotheses concerning the biological relevance of these motifs, thus reducing experimental trial-and-error. Results: GOmotif accepts PROSITE and regular expression formatted motifs as input and searches a Gene Ontology annotated protein database using motif search tools. The search returns the set of proteins containing matching motifs and their associated Gene Ontology terms. These results are presented as: 1) a hierarchical, navigable tree separated into the three Gene Ontology biological domains - biological process, cellular component, and molecular function; 2) corresponding pie charts indicating raw and statistically adjusted distributions of the results; and 3) an interactive graphical network view depicting the location of the results in the Gene Ontology. Conclusions: GOmotif is a web-based tool designed to assist researchers in investigating the biological role of novel protein motifs. GOmotif can be freely accessed at http://www.gomotif.ca PMID:21943350
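
    Since the abstract mentions both PROSITE-formatted and regular-expression motifs, the following sketch shows how one notation can be translated into the other. It is not GOmotif's code: the function name is hypothetical, and only the common PROSITE elements are handled (residues, x, [..], {..}, repeat counts, and the < and > anchors).

```python
import re

# One PROSITE element: x, a residue, [allowed], or {forbidden}, plus optional (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a PROSITE pattern such as 'N-{P}-[ST]-{P}.' to a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("unsupported PROSITE element: " + element)
        residue, low, high = match.groups()
        if residue == "x":
            core = "."  # any residue
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"  # any residue except those listed
        else:
            core = residue  # single residue or [..] class
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


# The PROSITE N-glycosylation site pattern, searched in a toy sequence.
n_glyc = prosite_to_regex("N-{P}-[ST]-{P}.")
hit = re.search(n_glyc, "MKNGSAL")
```

    Here `n_glyc` is `N[^P][ST][^P]`, and the toy sequence matches at the NGSA stretch.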

  12. A Siglec-like sialic-acid-binding motif revealed in an adenovirus capsid protein

    PubMed Central

    Rademacher, Christoph; Bru, Thierry; McBride, Ryan; Robison, Elizabeth; Nycholat, Corwin M; Kremer, Eric J; Paulson, James C

    2012-01-01

    Sialic-acid-binding immunoglobulin-like lectins (Siglecs) are a family of transmembrane receptors that are well documented to play roles in regulation of innate and adaptive immune responses. To see whether the features that define the molecular recognition of sialic acid were found in other sialic-acid-binding proteins, we analyzed 127 structures with bound sialic acids found in the Protein Data Bank database. Of these, the canine adenovirus 2-fiber knob protein showed close local structural relationship to Siglecs despite low sequence similarity. The fiber knob harbors a noncanonical sialic-acid recognition site, which was then explored for detailed specificity using a custom glycan microarray comprising 58 diverse sialosides. It was found that the adenoviral protein preferentially recognizes the epitope Neu5Acα2-3[6S]Galβ1-4GlcNAc, a structure previously identified as the preferred ligand for Siglec-8 in humans and Siglec-F in mice. Comparison of the Siglec and fiber knob sialic-acid-binding sites reveals conserved structural elements that are not clearly identifiable from the primary amino acid sequence, suggesting a Siglec-like sialic-acid-binding motif that comprises the consensus features of these proteins in complex with sialic acid. PMID:22522600

  13. Identification of disease-specific motifs in the antibody specificity repertoire via next-generation sequencing

    PubMed Central

    Pantazes, Robert J.; Reifert, Jack; Bozekowski, Joel; Ibsen, Kelly N.; Murray, Joseph A.; Daugherty, Patrick S.

    2016-01-01

    Disease-specific antibodies can serve as highly effective biomarkers but have been identified for only a relatively small number of autoimmune diseases. A method was developed to identify disease-specific binding motifs through integration of bacterial display peptide library screening, next-generation sequencing (NGS) and computational analysis. Antibody specificity repertoires were determined by identifying bound peptide library members for each specimen using cell sorting and performing NGS. A computational algorithm, termed Identifying Motifs Using Next-generation sequencing Experiments (IMUNE), was developed and applied to discover disease- and healthy control-specific motifs. IMUNE performs comprehensive pattern searches, identifies patterns statistically enriched in the disease or control groups and clusters the patterns to generate motifs. Using celiac disease sera as a discovery set, IMUNE identified a consensus motif (QPEQPF[PS]E) with high diagnostic sensitivity and specificity in a validation sera set, in addition to novel motifs. Peptide display and sequencing (Display-Seq) coupled with IMUNE analysis may thus be useful to characterize antibody repertoires and identify disease-specific antibody epitopes and biomarkers. PMID:27481573
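
    The consensus motif reported above is written in a bracket notation that maps directly onto a regular expression. The sketch below shows how such a motif could be counted in two groups of peptides. It is not the IMUNE algorithm: the function name and the peptide lists are invented for illustration, and no statistical test is included.

```python
import re

# Consensus motif from the abstract; [PS] means proline or serine at that position.
MOTIF = re.compile("QPEQPF[PS]E")


def fraction_with_motif(peptides):
    """Fraction of peptides that contain the motif at least once."""
    if not peptides:
        return 0.0
    return sum(1 for peptide in peptides if MOTIF.search(peptide)) / len(peptides)


# Hypothetical peptides standing in for sequenced library members.
disease_peptides = ["AAQPEQPFPEGG", "QPEQPFSEKL", "MKLVTT"]
control_peptides = ["QPEQPFAE", "GGSGGS"]

disease_fraction = fraction_with_motif(disease_peptides)
control_fraction = fraction_with_motif(control_peptides)
```

    In this toy case two of the three disease peptides carry the motif and neither control peptide does; a real analysis would compare such fractions across many specimens before calling a motif disease-specific.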

  14. Recurring sequence-structure motifs in (βα)8-barrel proteins and experimental optimization of a chimeric protein designed based on such motifs.

    PubMed

    Wang, Jichao; Zhang, Tongchuan; Liu, Ruicun; Song, Meilin; Wang, Juncheng; Hong, Jiong; Chen, Quan; Liu, Haiyan

    2017-02-01

    An interesting way of generating novel artificial proteins is to combine sequence motifs from natural proteins, mimicking the evolutionary path suggested by natural proteins comprising recurring motifs. We analyzed the βα and αβ modules of TIM barrel proteins by structure alignment-based sequence clustering. A number of preferred motifs were identified. A chimeric TIM was designed by using recurring elements as mutually compatible interfaces. The foldability of the designed TIM protein was then significantly improved by six rounds of directed evolution. The melting temperature has been improved by more than 20°C. A variety of characteristics suggested that the resulting protein is well-folded. Our analysis provided a library of peptide motifs that is potentially useful for different protein engineering studies. The protein engineering strategy of using recurring motifs as interfaces to connect partial natural proteins may be applied to other protein folds.

  15. A sequence upstream of canonical PDZ-binding motif within CFTR COOH-terminus enhances NHERF1 interaction.

    PubMed

    Sharma, Neeraj; LaRusch, Jessica; Sosnay, Patrick R; Gottschalk, Laura B; Lopez, Andrea P; Pellicore, Matthew J; Evans, Taylor; Davis, Emily; Atalar, Melis; Na, Chan-Hyun; Rosson, Gedge D; Belchis, Deborah; Milewski, Michal; Pandey, Akhilesh; Cutting, Garry R

    2016-12-01

    The development of cystic fibrosis transmembrane conductance regulator (CFTR) targeted therapy for cystic fibrosis has generated interest in maximizing membrane residence of mutant forms of CFTR by manipulating interactions with scaffold proteins, such as sodium/hydrogen exchange regulatory factor-1 (NHERF1). In this study, we explored whether COOH-terminal sequences in CFTR beyond the PDZ-binding motif influence its interaction with NHERF1. NHERF1 displayed minimal self-association in blot overlays (NHERF1, Kd = 1,382 ± 61.1 nM) at concentrations well above physiological levels, estimated at 240 nM from RNA-sequencing and 260 nM by liquid chromatography tandem mass spectrometry in sweat gland, a key site of CFTR function in vivo. However, NHERF1 oligomerized at considerably lower concentrations (10 nM) in the presence of the last 111 amino acids of CFTR (20 nM) in blot overlays and cross-linking assays and in coimmunoprecipitations using differently tagged versions of NHERF1. Deletion and alanine mutagenesis revealed that a six-amino acid sequence (1417)EENKVR(1422) and the terminal (1478)TRL(1480) (PDZ-binding motif) in the COOH-terminus were essential for the enhanced oligomerization of NHERF1. Full-length CFTR stably expressed in Madin-Darby canine kidney epithelial cells fostered NHERF1 oligomerization that was substantially reduced (∼5-fold) on alanine substitution of EEN, KVR, or EENKVR residues or deletion of the TRL motif. Confocal fluorescent microscopy revealed that the EENKVR and TRL sequences contribute to preferential localization of CFTR to the apical membrane. Together, these results indicate that COOH-terminal sequences mediate enhanced NHERF1 interaction and facilitate the localization of CFTR, a property that could be manipulated to stabilize mutant forms of CFTR at the apical surface to maximize the effect of CFTR-targeted therapeutics.

  16. PfEMP1-DBL1alpha amino acid motifs in severe disease states of Plasmodium falciparum malaria.

    PubMed

    Normark, Johan; Nilsson, Daniel; Ribacke, Ulf; Winter, Gerhard; Moll, Kirsten; Wheelock, Craig E; Bayarugaba, Justus; Kironde, Fred; Egwang, Thomas G; Chen, Qijun; Andersson, Björn; Wahlgren, Mats

    2007-10-02

    An infection with Plasmodium falciparum may lead to severe malaria as a result of excessive binding of infected erythrocytes in the microvasculature. Vascular adhesion is mediated by P. falciparum erythrocyte membrane protein-1 (PfEMP1), which is encoded for by highly polymorphic members of the var-gene family. Here, we profile var gene transcription in fresh P. falciparum trophozoites from Ugandan children with malaria through var-specific DBL1alpha-PCR amplification and sequencing. A method for subsectioning region alignments into homology areas (MOTIFF) was developed to examine collected sequences. Specific PfEMP1-DBL1alpha amino acid motifs correlated with rosetting and severe malaria, with motif location corresponding to distinct regions of receptor interaction. The method is potentially applicable to other families of variant proteins and may be useful in identifying sequence-phenotype relationships. The results suggest that certain PfEMP1 sequences are predisposed to inducing severe malaria.

  17. Creation of Hybrid Nanorods From Sequences of Natural Trimeric Fibrous Proteins Using the Fibritin Trimerization Motif

    NASA Astrophysics Data System (ADS)

    Papanikolopoulou, Katerina; van Raaij, Mark J.; Mitraki, Anna

    Stable, artificial fibrous proteins that can be functionalized open new avenues in fields such as bionanomaterials design and fiber engineering. An important source of inspiration for the creation of such proteins are natural fibrous proteins such as collagen, elastin, insect silks, and fibers from phages and viruses. The fibrous parts of this last class of proteins usually adopt trimeric, β-stranded structural folds and are appended to globular, receptor-binding domains. It has been recently shown that the globular domains are essential for correct folding and trimerization and can be successfully substituted by a very small (27-amino acid) trimerization motif from phage T4 fibritin. The hybrid proteins are correctly folded nanorods that can withstand extreme conditions. When the fibrous part derives from the adenovirus fiber shaft, different tissue-targeting specificities can be engineered into the hybrid proteins, which therefore can be used as gene therapy vectors. The integration of such stable nanorods in devices is also a big challenge in the field of biomechanical design. The fibritin foldon domain is a versatile trimerization motif and can be combined with a variety of fibrous motifs, such as coiled-coil, collagenous, and triple β-stranded motifs, provided the appropriate linkers are used. The combination of different motifs within the same fibrous molecule to create stable rods with multiple functions can even be envisioned. We provide a comprehensive overview of the experimental procedures used for designing, creating, and characterizing hybrid fibrous nanorods using the fibritin trimerization motif.

  18. Identification of an oligodeoxynucleotide sequence motif that specifically inhibits phosphorylation by protein tyrosine kinases.

    PubMed

    Krieg, A M; Matson, S; Cheng, K; Fisher, E; Koretzky, G A; Koland, J G

    1997-04-01

    Protein tyrosine kinases (PTKs) have central roles in cellular signal transduction. We have identified a sequence motif (CGT[C]GA) in phosphorothioate-modified oligodeoxynucleotides (ODNs) that specifically inhibits the enzymatic activity of recombinant or immunoprecipitated PTK in vitro. Hexamer ODNs containing this motif block both substrate and autophosphorylation of at least four different PTKs but have no apparent effect on the enzymatic activity of a serine/threonine protein kinase. These data suggest possible new applications for ODNs and have implications for the design and interpretation of experiments using antisense or triplex ODNs.

  19. Development of a salicylic acid inducible minimal sub-genomic transcript promoter from Figwort mosaic virus with enhanced root- and leaf-activity using TGACG motif rearrangement.

    PubMed

    Kumar, Deepak; Patro, Sunita; Ghosh, Jayasish; Das, Abhimanyu; Maiti, Indu B; Dey, Nrisingha

    2012-07-15

    In Figwort mosaic virus sub-genomic transcript promoter (F-Sgt), the function of the TGACG regulatory motif was investigated in the background of artificially designed promoter sequences. The 131-bp (FS, -100 to +31) long F-Sgt promoter sequence containing one TGACG motif [FS-(TGACG)] was engineered to generate a set of three modified promoter constructs: [FS-(TGACG)(2), containing one additional TGACG motif at 7 nucleotides upstream of the original one], [FS-(TGACG)(3), containing two additional TGACG motifs at 7 nucleotides upstream and two nucleotides downstream of the original one] and [FS-(TGCTG)(mu), having a mutated TGACG motif]. EMSA and foot-printing analysis confirmed binding of tobacco nuclear factors with modified TGACG motif/s. The transcription-activation of the GUS gene by the TGACG motif/s in above promoter constructs was examined in transgenic tobacco and Arabidopsis plants, and it was observed that the transcription activation was affected by the spacing/s and number/s of the TGACG motif/s. The FS-(TGACG)(2) promoter showed strongest root-activity compared to other modified and CaMV35S promoters. Also under salicylic acid (SA) stress, the leaf-activity of the said promoter was further enhanced. All above findings were confirmed by real-time and semi-qRT PCR analysis. Taken together, these results clearly demonstrated that the TGACG motif plays an important role in inducing the root-specific expression of the F-Sgt promoter. This study advocates the importance of genetic manipulation of functional cis-motif for amending the tissue specificity of a plant promoter. SA inducible FS-(TGACG)(2) promoter with enhanced activity could be a useful candidate promoter for developing plants with enhanced crop productivity.

  20. Definition of the tempo of sequence diversity across an alignment and automatic identification of sequence motifs: Application to protein homologous families and superfamilies

    PubMed Central

    May, Alex C.W.

    2002-01-01

    It is often possible to identify sequence motifs that characterize a protein family in terms of its fold and/or function from aligned protein sequences. Such motifs can be used to search for new family members. Partitioning of sequence alignments into regions of similar amino acid variability is usually done by hand. Here, I present a completely automatic method for this purpose: one that is guaranteed to produce globally optimal solutions at all levels of partition granularity. The method is used to compare the tempo of sequence diversity across reliable three-dimensional (3D) structure-based alignments of 209 protein families (HOMSTRAD) and that for 69 superfamilies (CAMPASS). (The mean alignment length for HOMSTRAD and CAMPASS are very similar.) Surprisingly, the optimal segmentation distributions for the closely related proteins and distantly related ones are found to be very similar. Also, optimal segmentation identifies an unusual protein superfamily. Finally, protein 3D structure clues from the tempo of sequence diversity across alignments are examined. The method is general, and could be applied to any area of comparative biological sequence and 3D structure analysis where the constraint of the inherent linear organization of the data imposes an ordering on the set of objects to be clustered. PMID:12441381

  1. Species-Specific Minimal Sequence Motif for Oligodeoxyribonucleotides Activating Mouse TLR9.

    PubMed

    Pohar, Jelka; Lainšček, Duško; Fukui, Ryutaro; Yamamoto, Chikako; Miyake, Kensuke; Jerala, Roman; Benčina, Mojca

    2015-11-01

    Synthetic oligodeoxyribonucleotides (ODNs) containing unmethylated CpG recapitulate the activation of TLR9 by microbial DNA. ODNs are potent stimulators of the immune response in cells expressing TLR9. Despite extensive use of mice as experimental animals in basic and applied immunological research, the key sequence determinants that govern the activation of mouse TLR9 by ODNs have not been well defined. We performed a systematic investigation of the sequence motif of B class phosphodiester ODNs to identify the sequence properties that govern mouse TLR9 activation. In contrast to ODNs activating human TLR9, where the minimal sequence motif for the receptor activation comprises a pair of closely positioned CpGs, we found that the mouse TLR9 requires a single CpG positioned 4-6 nt from the 5'-end. Activation is augmented by a 5'TCC sequence one to three nucleotides from the CG. The distance of the CG dinucleotide of four to six nucleotides from the 5'-end and the ODN's length fine-tune activation of mouse macrophages. Length of the ODN <23 and >29 nt decreases activation of dendritic cells. The ODNs with minimal sequence induce Th1-type cytokine synthesis in dendritic cells and confirm the expression of cell surface markers in B cells. Identification of the minimal sequence provides an insight into the sequence selectivity of mouse TLR9 and points to the differences in the receptor selectivity between species probably as a result of differences in the receptor binding sites.
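
    The positional rule described in this abstract can be expressed as a simple sequence check. The sketch below is one reading of that rule and not the authors' method. It assumes that "4-6 nt from the 5'-end" means four to six nucleotides precede the CG, and that the TCC lies upstream with one to three nucleotides between it and the CG. The function names and test sequences are hypothetical.

```python
def cpg_positions(odn):
    """0-based indices of every CG dinucleotide, read 5' to 3'."""
    odn = odn.upper()
    return [i for i in range(len(odn) - 1) if odn[i:i + 2] == "CG"]


def has_single_cpg_near_5prime(odn, min_offset=4, max_offset=6):
    """True if there is exactly one CG and 4-6 nucleotides precede it (assumed reading)."""
    positions = cpg_positions(odn)
    return len(positions) == 1 and min_offset <= positions[0] <= max_offset


def has_upstream_tcc(odn, min_gap=1, max_gap=3):
    """True if a TCC ends 1-3 nucleotides before the single CG (assumed reading)."""
    odn = odn.upper()
    positions = cpg_positions(odn)
    if len(positions) != 1:
        return False
    cg = positions[0]
    for gap in range(min_gap, max_gap + 1):
        start = cg - gap - 3  # TCC occupies three positions before the gap
        if start >= 0 and odn[start:start + 3] == "TCC":
            return True
    return False


# Hypothetical 24-nt ODN: TCC, a 2-nt gap, then a single CG with 5 nt before it.
toy_odn = "TCCATCGTTTTTTTTTTTTTTTTT"
near_5prime = has_single_cpg_near_5prime(toy_odn)
tcc_present = has_upstream_tcc(toy_odn)
```

    The length dependence mentioned in the abstract (reduced activation below 23 nt and above 29 nt) is not modelled here.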

  2. Using machine learning to predict gene expression and discover sequence motifs

    NASA Astrophysics Data System (ADS)

    Li, Xuejing

    Recently, large amounts of experimental data for complex biological systems have become available. We use tools and algorithms from machine learning to build data-driven predictive models. We first present a novel algorithm to discover gene sequence motifs associated with temporal expression patterns of genes. Our algorithm, which is based on partial least squares (PLS) regression, is able to directly model the flow of information, from gene sequence to gene expression, to learn cis regulatory motifs and characterize associated gene expression patterns. Our algorithm outperforms traditional computational methods e.g. clustering in motif discovery. We then present a study of extending a machine learning model for transcriptional regulation predictive of genetic regulatory response to Caenorhabditis elegans. We show meaningful results both in terms of prediction accuracy on the test experiments and biological information extracted from the regulatory program. The model discovers DNA binding sites ab initio. We also present a case study where we detect a signal of lineage-specific regulation. Finally we present a comparative study on learning predictive models for motif discovery, based on different boosting algorithms: Adaptive Boosting (AdaBoost), Linear Programming Boosting (LPBoost) and Totally Corrective Boosting (TotalBoost). We evaluate and compare the performance of the three boosting algorithms via both statistical and biological validation, for hypoxia response in Saccharomyces cerevisiae.

  3. Composition for nucleic acid sequencing

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2008-08-26

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  4. A Convex Atomic-Norm Approach to Multiple Sequence Alignment and Motif Discovery

    PubMed Central

    Yen, Ian E. H.; Lin, Xin; Zhang, Jiong; Ravikumar, Pradeep; Dhillon, Inderjit S.

    2016-01-01

    Multiple Sequence Alignment and Motif Discovery, both known to be NP-hard problems, are two fundamental tasks in bioinformatics. Existing approaches to these two problems are based on either local search methods, such as Expectation Maximization (EM) and Gibbs Sampling, or greedy heuristic methods. In this work, we develop a convex relaxation approach to both problems based on the recent concept of atomic norm and develop a new algorithm, termed Greedy Direction Method of Multiplier, for solving the convex relaxation with two convex atomic constraints. Experiments show that our convex relaxation approach produces solutions of higher quality than the standard tools widely used in the bioinformatics community on the Multiple Sequence Alignment and Motif Discovery problems. PMID:27559428

  5. 'Size leap' algorithm: an efficient extraction of the longest common motifs from a molecular sequence set. Application to the DNA sequence reconstruction.

    PubMed

    Danckaert, A; Chappey, C; Hazout, S

    1991-10-01

    We propose a new method, called the 'size leap' algorithm, to search for motifs of maximum size that are common to at least two fragments. It allows the creation of a reduced database of motifs from a set of sequences whose size obeys the series of Fibonacci numbers. The convenience lies in the efficiency of the motif extraction. It can be applied in the establishment of overlap regions for DNA sequence reconstruction and multiple alignment of biological sequences. The method of complete DNA sequence reconstruction by extraction of the longest motifs ('anchor motifs') is presented as an application of the size leap algorithm. The details of a reconstruction from three sequenced fragments are given as an example.
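
    The idea of a longest motif shared by two fragments, and its use as an anchor for joining them, can be illustrated with a minimal Python sketch. It uses plain dynamic programming for the longest common substring, not the size leap algorithm itself (the abstract does not give its details). The function names and the toy fragments are assumptions made for illustration.

```python
def longest_common_motif(a: str, b: str) -> str:
    """Return one longest substring shared by fragments a and b.

    Standard dynamic programming over suffix-match lengths; this is a
    stand-in for illustration, not the 'size leap' algorithm.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run ending at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def merge_on_anchor(a: str, b: str):
    """Join a and b if their longest shared motif ends a and starts b."""
    motif = longest_common_motif(a, b)
    if motif and a.endswith(motif) and b.startswith(motif):
        return a + b[len(motif):]
    return None  # no usable overlap in this orientation


print(longest_common_motif("ACGTTGCA", "TTGCAGGA"))
print(merge_on_anchor("ACGTTGCA", "TTGCAGGA"))
```

    The quadratic-time table is adequate for short fragments; larger sequence sets would call for an indexed approach such as the one the abstract describes.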

  6. The tungsten formylmethanofuran dehydrogenase from Methanobacterium thermoautotrophicum contains sequence motifs characteristic for enzymes containing molybdopterin dinucleotide.

    PubMed

    Hochheimer, A; Schmitz, R A; Thauer, R K; Hedderich, R

    1995-12-15

    Formylmethanofuran dehydrogenases are molybdenum or tungsten iron-sulfur proteins containing a pterin dinucleotide cofactor. We report here on the primary structures of the four subunits FwdABCD of the tungsten enzyme from Methanobacterium thermoautotrophicum which were determined by cloning and sequencing the encoding genes fwdABCD. FwdB was found to contain sequence motifs characteristic for molybdopterin-dinucleotide-containing enzymes indicating that this subunit harbors the active site. FwdA, FwdC and FwdD showed no significant sequence similarity to proteins in the data bases. Northern blot analysis revealed that the four fwd genes form a transcription unit together with three additional genes designated fwdE, fwdF and fwdG. A 17.8-kDa protein and an 8.6-kDa protein, both containing two [4Fe-4S] cluster binding motifs, were deduced from fwdE and fwdG. The open reading frame fwdF encodes a 38.6-kDa protein containing eight binding motifs for [4Fe-4S] clusters suggesting the gene product to be a novel polyferredoxin. All seven fwd genes were expressed in Escherichia coli yielding proteins of the expected size. The fwd operon was found to be located in a region of the M. thermoautotrophicum genome encoding molybdenum enzymes and proteins involved in molybdopterin biosynthesis.

  7. The Runt domain of AML1 (RUNX1) binds a sequence-conserved RNA motif that mimics a DNA element

    PubMed Central

    Fukunaga, Junichi; Nomura, Yusuke; Tanaka, Yoichiro; Amano, Ryo; Tanaka, Taku; Nakamura, Yoshikazu; Kawai, Gota; Sakamoto, Taiichi; Kozu, Tomoko

    2013-01-01

    AML1 (RUNX1) is a key transcription factor for hematopoiesis that binds to the Runt-binding double-stranded DNA element (RDE) of target genes through its N-terminal Runt domain. Aberrations in the AML1 gene are frequently found in human leukemia. To better understand AML1 and its potential utility for diagnosis and therapy, we obtained RNA aptamers that bind specifically to the AML1 Runt domain. Enzymatic probing and NMR analyses revealed that Apt1-S, which is a truncated variant of one of the aptamers, has a CACG tetraloop and two stem regions separated by an internal loop. All the isolated aptamers were found to contain the conserved sequence motif 5′-NNCCAC-3′ and 5′-GCGMGN′N′-3′ (M:A or C; N and N′ form Watson–Crick base pairs). The motif contains one AC mismatch and one base bulged out. Mutational analysis of Apt1-S showed that three guanines of the motif are important for Runt binding as are the three guanines of RDE, which are directly recognized by three arginine residues of the Runt domain. Mutational analyses of the Runt domain revealed that the amino acid residues used for Apt1-S binding were similar to those used for RDE binding. Furthermore, the aptamer competed with RDE for binding to the Runt domain in vitro. These results demonstrated that the Runt domain of the AML1 protein binds to the motif of the aptamer that mimics DNA. Our findings should provide new insights into RNA function and utility in both basic and applied sciences. PMID:23709277

  8. AliBiMotif: integrating alignment and biclustering to unravel transcription factor binding sites in DNA sequences.

    PubMed

    Gonçalves, Joana P; Moreau, Yves; Madeira, Sara C

    2012-01-01

    Transcription Factors (TFs) control transcription by binding to specific sites in the promoter regions of the target genes, which can be modelled by structured motifs. In this paper we propose AliBiMotif, a method combining sequence alignment and a biclustering approach based on efficient string matching techniques using suffix trees to unravel approximately conserved sets of blocks (structured motifs) while straightforwardly disregarding non-conserved stretches in-between. The ability to ignore the width of non-conserved regions is a major advantage of the proposed method over other motif finders, as the lengths of the binding sites are usually easier to estimate than the separating distances.

  9. Association of the amino acid motifs of BoLA-DRB3 alleles with mastitis pathogens in Japanese Holstein cows.

    PubMed

    Yoshida, Tatsuyuki; Mukoyama, Harutaka; Furuta, Hiroki; Kondo, Yasuko; Takeshima, Shin-nosuke; Aida, Yoko; Kosugiyama, Motoaki; Tomogane, Hiroshi

    2009-10-01

    The association of the polymorphism of bovine leukocyte antigen (BoLA-DRB3) genes, identified by the polymerase chain reaction sequence-based typing (PCR-SBT) method, with resistance and susceptibility to mastitis caused by Streptococci, coagulase-negative Staphylococci, Escherichia coli and Staphylococcus aureus was investigated. Blood samples for DNA extraction were collected from 170 Holstein cows (129 mastitis and 41 healthy cows) from 5 districts in Chiba prefecture, Japan. Susceptibility or resistance to the mastitis-causing pathogens was thought to vary by the presence of amino acid substitutions at the 9, 11, 13, and 30 positions. DRB3*0101 and DRB3*1501 had amino acid motifs of Glu(9), Ser(11), Ser(13), and Tyr(30), and they were considered to have susceptibility to all 4 mastitis pathogens. In contrast, DRB3*1101 and DRB3*1401 had amino acid motifs of Gln(9), His(11), Gly(13), and His(30) in these positions, and they also had Val(86), so these alleles were considered to have resistance to Streptococcal and coagulase-negative Staphylococcal mastitis. However, in the case of Escherichia coli mastitis, amino acid substitutions at the 9, 11, 13, and 30 positions had little effect, but rather substitutions at the 47, 67 positions of pocket 7, and at the 71, 74 positions of pocket 4, Tyr(47), Ile(67), Ala(71), and Ala(74), were associated with resistance. This motif was present in DRB3*1201.

  10. Peptide sequence motif analysis of tandem MS data with the SALSA algorithm.

    PubMed

    Liebler, Daniel C; Hansen, Beau T; Davey, Sean W; Tiscareno, Laura; Mason, Daniel E

    2002-01-01

    We have developed a pattern recognition algorithm called SALSA (scoring algorithm for spectral analysis) for the detection of specific features in tandem MS (MS-MS) spectra. Application of the SALSA algorithm to the detection of peptide MS-MS ion series enables identification of MS-MS spectra displaying characteristics of specific peptide sequences. SALSA analysis scores MS-MS spectra based on correspondence between theoretical ion series for peptide sequence motifs and actual MS-MS product ion series, regardless of their absolute positions on the m/z axis. Analyses of tryptic digests of bovine serum albumin (BSA) by LC-MS-MS followed by SALSA analysis detected MS-MS spectra for both unmodified and multiple modified forms of several BSA tryptic peptides. SALSA analysis of MS-MS data from mixtures of BSA and human serum albumin (HSA) tryptic digests indicated that ion series searches with BSA peptide sequence motifs identified MS-MS spectra for both BSA and closely related HSA peptides. Optimal discrimination between MS-MS spectra of variant peptide forms is achieved when the SALSA search criteria are optimized to the target peptide. Application of SALSA to LC-MS-MS proteome analysis will facilitate the characterization of modified and sequence variant proteins.

  11. Functional importance of GGXG sequence motifs in putative reentrant loops of 2HCT and ESS transport proteins.

    PubMed

    Dobrowolski, Adam; Lolkema, Juke S

    2009-08-11

    The 2HCT and ESS families are two families of secondary transporters. Members of the two families are unrelated in amino acid sequence but share similar hydropathy profiles, which suggest a similar folding of the proteins in membranes. Structural models show two homologous domains containing five transmembrane segments (TMSs) each, with a reentrant or pore loop between the fourth and fifth TMSs in each domain. Here we show that GGXG sequence motifs present in the putative reentrant loops are important for the activity of the transporters. Mutation of the conserved Gly residues to Cys in the motifs of the Na(+)-citrate transporter CitS in the 2HCT family and the Na(+)-glutamate transporter GltS in the ESS family resulted in strongly reduced transport activity. Similarly, mutation of the variable residue "X" to Cys in the N-terminal half of GltS essentially inactivated the transporter. The corresponding mutations in the N- and C-terminal halves of CitS reduced transport activity to 60 and 25% of that of the wild type, respectively. Residual activity of any of the mutants could be further reduced by treatment with the membrane permeable thiol reagent N-ethylmaleimide (NEM). The X to Cys mutation (S405C) in the cytoplasmic loop in the C-terminal half of CitS rendered the protein sensitive to the bulky, membrane impermeable thiol reagent 4-acetamido-4'-maleimidylstilbene-2,2'-disulfonic acid (AmdiS) added at the periplasmic side of the membrane, providing further evidence that this part of the loop is positioned between the transmembrane segments. The putative reentrant loop in the C-terminal half of the ESS family does not contain the GGXG motif, but a conserved stretch rich in Gly residues. Cysteine-scanning mutagenesis of a stretch of 18 residues in the GltS protein revealed two residues important for function. Mutant N356C was completely inactivated by treatment with NEM, and mutant P351C appeared to be the counterpart of mutant S405C of CitS; the mutant was

  12. Identification of sequence motifs involved in Dengue virus-host interactions.

    PubMed

    Asnet Mary, J; Paramasivan, R; Shenbagarathai, R

    2016-01-01

    Dengue fever is a rapidly spreading mosquito-borne viral infection that remains a serious global public health problem. As there is no specific treatment or commercial vaccine available for effective control of the disease, attempts to develop novel control strategies are underway. Viruses utilize the surface receptor proteins of the host to enter cells. Though various proteins have been reported as receptors of Dengue virus (DENV) using the Virus Overlay Protein Binding Assay, the precise interaction between DENV and host has not been explored. Understanding the structural features of the domain III envelope glycoprotein would help in developing efficient antiviral inhibitors. Therefore, an attempt was made to identify the sequence motifs present in the domain III envelope glycoprotein of Dengue virus. Computational analysis revealed that the NGR motif is present in the domain III envelope glycoprotein of DENV-1 and DENV-3. Similarly, DENV-1, DENV-2 and DENV-4 were found to contain the Yxxphi motif, which is a tyrosine-based sorting signal responsible for the interaction with a mu subunit of the adaptor protein complex. High-throughput virtual screening resulted in five compounds as lead molecules based on glide score, which ranges from -4.664 to -6.52 kcal/mol. This computational prediction provides an additional tool for understanding the virus-host interactions and helps to identify potential targets in the host. Further experimental evidence is warranted to confirm the virus-host interactions and the inhibitory activity of the reported lead compounds.
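
    Short linear motifs such as NGR and Yxxphi can be located in a protein sequence with simple pattern matching. The Python sketch below is a hypothetical illustration, not the authors' pipeline: it assumes phi stands for a bulky hydrophobic residue (L, I, M, F or V), and the test sequence is a made-up toy string rather than a Dengue virus sequence.

```python
import re

# Assumed motif definitions: 'x' is any residue, 'phi' is taken to be
# a bulky hydrophobic residue (L, I, M, F or V).
MOTIFS = {
    "NGR": "NGR",
    "Yxxphi": "Y..[LIMFV]",
}


def scan_motifs(protein: str) -> dict:
    """Return {motif name: [(1-based position, matched residues), ...]}."""
    protein = protein.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # The lookahead lets overlapping occurrences be reported as well.
        regex = re.compile("(?=(%s))" % pattern)
        hits[name] = [(m.start() + 1, m.group(1)) for m in regex.finditer(protein)]
    return hits


toy_sequence = "MKNGRAYQELT"  # illustrative only, not a viral sequence
print(scan_motifs(toy_sequence))
```

    A match found this way only flags a candidate site; whether the motif is accessible and functional has to be judged from structure and experiment.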

  13. High speed nucleic acid sequencing

    SciTech Connect

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid. Each type of labeled nucleotide comprises an acceptor fluorophore attached to a phosphate portion of the nucleotide such that the fluorophore is removed upon incorporation into a growing strand. Fluorescent signal is emitted via fluorescent resonance energy transfer between the donor fluorophore and the acceptor fluorophore as each nucleotide is incorporated into the growing strand. The sequence is deduced by identifying which base is being incorporated into the growing strand.

  14. [Analysis of the molecular motif for inducing response to jasmonic acid and ethylene in Pib promoter via rice transformation].

    PubMed

    Yu, Li; Yang, Shi-Hu; Jin, Yu-Kuan; Wan, Jian-Min; Zhao, Bao-Quan

    2010-01-01

    The expression of the Pib gene in rice was induced by hormones such as jasmonic acid and ethylene. In order to determine the regions of sequence or motifs in the Pib promoter that are necessary for the response to jasmonic acid and ethylene, the full-length promoter of Pib (-3,572 to 2 bp) and three different 5' deletion fragments of the Pib promoter (-2,692 to 2 bp, -1,335 to 2 bp, -761 to 2 bp) were synthesized by PCR and then substituted for 35S upstream of gus in a binary plasmid to construct recombinant plasmids of Pib promoter-gus fusions. Transgenic rice plants carrying the four recombinant plasmids were produced by Agrobacterium-mediated transformation. Qualitative and quantitative analyses of gus activity in transgenic plants at both protein and mRNA levels were conducted. The promoter activity of the full-length promoter of Pib (-3,572 to 2 bp, pNAR901) was the highest among the four constructs, and the gus activities in its transgenic plant organs were clearly enhanced at 6 h after treatment with jasmonic acid or ethylene. The promoter activity of the deleted Pib promoters was significantly decreased, and the response to jasmonic acid or ethylene treatment was not present when the -3,572 to -2,692 bp sequence was deleted from the Pib promoter. Although the lengths of the deleted Pib promoters of pNAR902 (-2,692 to 2 bp), pNAR903 (-1,335 to 2 bp), and pNAR904 (-761 to 2 bp) differed by more than two- to threefold, the response to jasmonic acid or ethylene treatment was not different among their transgenic plants. All these results indicated that the sequence missing from all three deletion constructs (-3,572 to -2,692 bp) is the region essential for the response to jasmonic acid and ethylene treatment. A search of the Pib promoter sequence indicated that there was only one GCCGCC motif, at -2,722 bp, in this common deleted segment of the Pib promoter

  15. Quadfinder: server for identification and analysis of quadruplex-forming motifs in nucleotide sequences

    PubMed Central

    Scaria, Vinod; Hariharan, Manoj; Arora, Amit; Maiti, Souvik

    2006-01-01

    G-quadruplex secondary structures, which play a structural role in repetitive DNA such as telomeres, may also play a functional role at other genomic locations as targetable regulatory elements which control gene expression. The recent interest in application of quadruplexes in biological systems prompted us to develop a tool for the identification and analysis of quadruplex-forming nucleotide sequences especially in the RNA. Here we present Quadfinder, an online server for prediction and bioinformatics of uni-molecular quadruplex-forming nucleotide sequences. The server is designed to be user-friendly and needs minimal intervention by the user, while providing flexibility of defining the variants of the motif. The server is freely available at URL . PMID:16845097
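
    A uni-molecular quadruplex-forming motif is often described as four runs of guanines separated by three short loops. The Python sketch below searches for that pattern with a regular expression. It is a minimal illustration under assumed defaults (G-runs of at least 3, loops of 1 to 7 nucleotides) and is not Quadfinder's implementation; the function name and parameters are hypothetical.

```python
import re


def find_quadruplex_motifs(seq, g_min=3, loop_min=1, loop_max=7):
    """Return non-overlapping (0-based start, matched sequence) hits.

    Pattern: G-run, then three repeats of (loop + G-run). The default
    values are assumptions chosen for illustration.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA alike
    g_run = "G{%d,}" % g_min
    loop = "[ACGT]{%d,%d}?" % (loop_min, loop_max)
    pattern = re.compile("%s(?:%s%s){3}" % (g_run, loop, g_run))
    return [(m.start(), m.group(0)) for m in pattern.finditer(seq)]


print(find_quadruplex_motifs("TTGGGAGGGAGGGAGGGTT"))
```

    Changing `g_min`, `loop_min` and `loop_max` corresponds to the kind of motif variants the abstract says a user can define.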

  16. Using a color-coded ambigraphic nucleic acid notation to visualize conserved palindromic motifs within and across genomes

    PubMed Central

    2014-01-01

    Background: Ambiscript is a graphically-designed nucleic acid notation that uses symbol symmetries to support sequence complementation, highlight biologically-relevant palindromes, and facilitate the analysis of consensus sequences. Although the original Ambiscript notation was designed to easily represent consensus sequences for multiple sequence alignments, the notation’s black-on-white ambiguity characters are unable to reflect the statistical distribution of nucleotides found at each position. We now propose a color-augmented ambigraphic notation to encode the frequency of positional polymorphisms in these consensus sequences. Results: We have implemented this color-coding approach by creating an Adobe Flash® application (http://www.ambiscript.org) that shades and colors modified Ambiscript characters according to the prevalence of the encoded nucleotide at each position in the alignment. The resulting graphic helps viewers perceive biologically-relevant patterns in multiple sequence alignments by uniquely combining color, shading, and character symmetries to highlight palindromes and inverted repeats in conserved DNA motifs. Conclusion: Juxtaposing an intuitive color scheme over the deliberate character symmetries of an ambigraphic nucleic acid notation yields a highly-functional nucleic acid notation that maximizes information content and successfully embodies key principles of graphic excellence put forth by the statistician and graphic design theorist, Edward Tufte. PMID:24447494
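
    The palindromes that such a notation is meant to highlight are sequences equal to their own reverse complement. As a minimal Python sketch, unrelated to the Ambiscript application itself, they can be found by direct comparison. The function names are hypothetical, and only the unambiguous bases A, C, G and T are handled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the strand."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_palindromes(seq: str, min_len: int = 4, max_len: int = 12):
    """Return (0-based start, word) for reverse-complement palindromes.

    Only even lengths can match with unambiguous bases, so min_len is
    expected to be even and lengths are stepped by 2.
    """
    seq = seq.upper()
    hits = []
    for length in range(min_len, max_len + 1, 2):
        for start in range(len(seq) - length + 1):
            word = seq[start:start + length]
            if word == reverse_complement(word):
                hits.append((start, word))
    return hits


print(find_palindromes("TTGAATTCAA", min_len=6, max_len=6))
```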

  17. How to find a leucine in a haystack? Structure, ligand recognition and regulation of leucine-aspartic acid (LD) motifs.

    PubMed

    Alam, Tanvir; Alazmi, Meshari; Gao, Xin; Arold, Stefan T

    2014-06-15

    LD motifs (leucine-aspartic acid motifs) are short helical protein-protein interaction motifs that have emerged as key players in connecting cell adhesion with cell motility and survival. LD motifs are required for embryogenesis, wound healing and the evolution of multicellularity. LD motifs also play roles in disease, such as in cancer metastasis or viral infection. First described in the paxillin family of scaffolding proteins, LD motifs and similar acidic LXXLL interaction motifs have been discovered in several other proteins, whereas 16 proteins have been reported to contain LDBDs (LD motif-binding domains). Collectively, structural and functional analyses have revealed a surprising multivalency in LD motif interactions and a wide diversity in LDBD architectures. In the present review, we summarize the molecular basis for function, regulation and selectivity of LD motif interactions that has emerged from more than a decade of research. This overview highlights the intricate multi-level regulation and the inherently noisy and heterogeneous nature of signalling through short protein-protein interaction motifs.

  18. The Motif Tool Assessment Platform (MTAP) for sequence-based transcription factor binding site prediction tools.

    PubMed

    Quest, Daniel; Ali, Hesham

    2010-01-01

    Predicting transcription factor binding sites (TFBS) from sequence is one of the most challenging problems in computational biology. The development of (semi-)automated computer-assisted prediction methods is needed to find TFBS over an entire genome, which is a first step in reconstructing mechanisms that control gene activity. Bioinformatics journals continue to publish diverse methods for predicting TFBS on a monthly basis. To help practitioners in deciding which method to use to predict for a particular TFBS, we provide a platform to assess the quality and applicability of the available methods. Assessment tools allow researchers to determine how methods can be expected to perform on specific organisms or on specific transcription factor families. This chapter introduces the TFBS detection problem and reviews current strategies for evaluating algorithm effectiveness. In this chapter, a novel and robust assessment tool, the Motif Tool Assessment Platform (MTAP), is introduced and discussed.

  19. The structure of an endogenous Drosophila centromere reveals the prevalence of tandemly repeated sequences able to form i-motifs

    PubMed Central

    Garavís, Miguel; Méndez-Lago, María; Gabelica, Valérie; Whitehead, Siobhan L.; González, Carlos; Villasante, Alfredo

    2015-01-01

    Centromeres are the chromosomal loci at which spindle microtubules attach to mediate chromosome segregation during mitosis and meiosis. In most eukaryotes, centromeres are made up of highly repetitive DNA sequences (satellite DNA) interspersed with middle repetitive DNA sequences (transposable elements). Despite the efforts to establish complete genomic sequences of eukaryotic organisms, the so-called ‘finished’ genomes are not actually complete because the centromeres have not been assembled due to the intrinsic difficulties in constructing both physical maps and complete sequence assemblies of long stretches of tandemly repetitive DNA. Here we show the first molecular structure of an endogenous Drosophila centromere and the ability of the C-rich dodeca satellite strand to form dimeric i-motifs. The finding of i-motif structures in simple and complex centromeric satellite DNAs leads us to suggest that these centromeric sequences may have been selected not by their primary sequence but by their ability to form noncanonical secondary structures. PMID:26289671

  20. Conserved amino acid motifs from the novel Piv/MooV family of transposases and site-specific recombinases are required for catalysis of DNA inversion by Piv.

    PubMed

    Tobiason, D M; Buchner, J M; Thiel, W H; Gernert, K M; Karls, A C

    2001-02-01

    Piv, a site-specific invertase from Moraxella lacunata, exhibits amino acid homology with the transposases of the IS110/IS492 family of insertion elements. The functions of conserved amino acid motifs that define this novel family of both transposases and site-specific recombinases (Piv/MooV family) were examined by mutagenesis of fully conserved amino acids within each motif in Piv. All Piv mutants altered in conserved residues were defective for in vivo inversion of the M. lacunata invertible DNA segment, but competent for in vivo binding to Piv DNA recognition sequences. Although the primary amino acid sequences of the Piv/MooV recombinases do not contain a conserved DDE motif, which defines the retroviral integrase/transposase (IN/Tnps) family, the predicted secondary structural elements of Piv align well with those of the IN/Tnps for which crystal structures have been determined. Molecular modelling of Piv based on these alignments predicts that E59, conserved as either E or D in the Piv/MooV family, forms a catalytic pocket with the conserved D9 and D101 residues. Analysis of Piv E59G confirms a role for E59 in catalysis of inversion. These results suggest that Piv and the related IS110/IS492 transposases mediate DNA recombination by a common mechanism involving a catalytic DED or DDD motif.

  1. Large Putative PEST-like Sequence Motif at the Carboxyl Tail of Human Calcium Receptor Directs Lysosomal Degradation and Regulates Cell Surface Receptor Level

    PubMed Central

    Zhuang, Xiaolei; Northup, John K.; Ray, Kausik

    2012-01-01

    A deletion between amino acid residues Ser895 and Val1075 in the carboxyl terminus of the human calcium receptor (hCaR), which causes autosomal dominant hypocalcemia, showed enhanced signaling activity and increased cell surface expression in HEK293 cells (Lienhardt, A., Garabédian, M. G., Bai, M., Sinding, C., Zhang, Z., Lagarde, J. P., Boulesteix, J., Rigaud, M., Brown, E. M., and Kottler, M. L. (2000) J. Clin. Endocrinol. Metab. 85, 1695–1702). To identify the underlying mechanism(s) for these increases, we investigated the effects of carboxyl tail truncation and deletion in hCaR mutants using a combination of biochemical and cell imaging approaches to define motifs that participate in regulating cell surface numbers of this G protein-coupled receptor. Our data indicate a rapid constitutive receptor internalization of the cell surface hCaR, accumulating in early (Rab7 positive) and late endosomal (LAMP1 positive) sorting compartments, before targeting to lysosomes for degradation. Recycling of hCaR back to the cell surface was also evident. Truncation and deletion mapping defined a 51-amino acid sequence between residues 920 and 970 that is required for targeting to lysosomes and degradation but not for internalization or recycling of the receptor. No singular sequence motif was identified; instead, the required sequence elements seem to be distributed throughout this entire interval. This interval includes a high proportion of acidic and hydroxylated amino acid residues, suggesting a similarity to a PEST-like degradation motif (PESTfind score of +10), and several glutamine repeats. The results define a novel large PEST-like sequence that participates in the sorting of internalized hCaR routed to the lysosomal/degradation pathway that regulates cell surface receptor numbers. PMID:22158862

  2. Exceptional motifs in different Markov chain models for a statistical analysis of DNA sequences.

    PubMed

    Schbath, S; Prum, B; de Turckheim, E

    1995-01-01

    Identifying exceptional motifs is often used for extracting information from long DNA sequences. The two difficulties of the method are the choice of the model that defines the expected frequencies of words and the approximation of the variance of the difference T(W) between the number of occurrences of a word W and its estimation. We consider here different Markov chain models, either with stationary or periodic transition probabilities. We estimate the variance of the difference T(W) by the conditional variance of the number of occurrences of W given the oligonucleotides counts that define the model. Two applications show how to use asymptotically standard normal statistics associated with the counts to describe a given sequence in terms of its outlying words. Sequences of Escherichia coli and of Bacillus subtilis are compared with respect to their exceptional tri- and tetranucleotides. For both bacteria, exceptional 3-words are mainly found in the coding frame. E. coli palindrome counts are analyzed in different models, showing that many overabundant words are one-letter mutations of avoided palindromes.
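
    The difference T(W) between the observed count of a word W and its estimated count can be illustrated with a short Python sketch. It assumes the common plug-in estimator for a stationary Markov model of maximal order (order k-2 for a word of length k), in which the expected count is N(w1..w(k-1)) * N(w2..wk) / N(w2..w(k-1)). The conditional variance discussed in the abstract is not computed, and the function names are hypothetical.

```python
def count_overlapping(seq: str, word: str) -> int:
    """Count occurrences of word in seq, allowing overlaps."""
    return sum(1 for i in range(len(seq) - len(word) + 1)
               if seq.startswith(word, i))


def expected_count(seq: str, word: str) -> float:
    """Plug-in expected count under a Markov model of order len(word) - 2.

    Assumed estimator: N(prefix) * N(suffix) / N(core), where prefix and
    suffix drop the last and first letter and core drops both.
    """
    if len(word) < 3:
        raise ValueError("word must have at least 3 letters")
    prefix, suffix, core = word[:-1], word[1:], word[1:-1]
    n_core = count_overlapping(seq, core)
    if n_core == 0:
        return 0.0
    return count_overlapping(seq, prefix) * count_overlapping(seq, suffix) / n_core


def observed_minus_expected(seq: str, word: str) -> float:
    """T(W): observed count of W minus its estimated count."""
    return count_overlapping(seq, word) - expected_count(seq, word)


print(observed_minus_expected("ACGACGACG", "ACG"))
print(observed_minus_expected("AAAA", "AAA"))
```

    Turning T(W) into an asymptotically standard normal statistic requires dividing by an estimate of its standard deviation, which depends on the chosen model and is left out of this sketch.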

  3. Synthesis, anti-mycobacterial activity and DNA sequence-selectivity of a library of biaryl-motifs containing polyamides.

    PubMed

    Brucoli, Federico; Guzman, Juan D; Maitra, Arundhati; James, Colin H; Fox, Keith R; Bhakta, Sanjib

    2015-07-01

    The alarming rise of extensively drug-resistant tuberculosis (XDR-TB) strains compels the development of new molecules with novel modes of action to control this world health emergency. Distamycin analogues containing N-terminal biaryl-motifs 2(1-5)(1-7) were synthesised using a solution-phase approach and evaluated for their anti-mycobacterial activity and DNA-sequence selectivity. Thiophene dimer motif-containing polyamide 2(2,6) exhibited 10-fold higher inhibitory activity against Mycobacterium tuberculosis compared to distamycin and library member 2(5,7) showed high binding affinity for the 5'-ACATAT-3' sequence.

  4. Engineering Proteins with Enhanced Mechanical Stability by Force Specific Sequence Motifs

    PubMed Central

    Lu, Wenzhe; Negi, Surendra; Oberhauser, Andres F.; Braun, Werner

    2012-01-01

    Use of atomic force microscopy (AFM) has recently led to a better understanding of the molecular mechanisms of the unfolding process by mechanical forces; however, the rational design of novel proteins with specific mechanical strength remains challenging. We have approached this problem from a new perspective that generates linear physical-chemical properties (PCP) motifs from a limited AFM data set. Guided by our linear sequence analysis we designed and analyzed four new mutants of the titin I1 domain with the goal of increasing the domain's mechanical strength. All four mutants could be cloned and expressed as soluble proteins. AFM data indicate that at least two of the mutants have increased molecular mechanical strength. This observation suggests that the PCP method is useful to graft sequences specific for high mechanical stability to weak proteins to increase their mechanical stability, and represents an additional tool in the design of novel proteins besides steered molecular dynamics calculations, coarse grained simulations and phi-value analysis of the transition state. PMID:22274941

  5. In the TTF-1 homeodomain the contribution of several amino acids to DNA recognition depends on the bound sequence.

    PubMed Central

    Fabbro, D; Tell, G; Leonardi, A; Pellizzari, L; Pucillo, C; Lonigro, R; Formisano, S; Damante, G

    1996-01-01

    The thyroid transcription factor-1 homeodomain (TTF-1HD) shows a peculiar DNA binding specificity, preferentially recognizing sequences containing the 5'-CAAG-3' core motif. Most other homeodomains instead recognize sites containing the 5'-TAAT-3' core motif. Here, we show that TTF-1HD efficiently recognizes another sequence, called D1, devoid of the 5'-CAAG-3' core motif. Different experimental approaches indicate that TTF-1HD contacts the D1 sequence in a manner which is different to that used to interact with sequences containing the 5'-CAAG-3' core motif. The binding activities that mutants of TTF-1HD display with the D1 sequence or with the sequence containing the 5'-CAAG-3' core motif indicate that the role of several DNA-contacting amino acids is different. In particular, during recognition of the D1 sequence, backbone-interacting amino acids not relevant in binding to sequences containing the 5'-CAAG-3' core motif play an important role. In the TTF-1HD, therefore, the contribution of several amino acids to DNA recognition depends on the bound sequence. These data indicate that although a common bonding network exists in all of the HD/DNA complexes, peculiarities important for DNA recognition may occur in single cases. PMID:8811078

  6. Functional structural motifs for protein-ligand, protein-protein, and protein-nucleic acid interactions and their connection to supersecondary structures.

    PubMed

    Kinjo, Akira R; Nakamura, Haruki

    2013-01-01

    Protein functions are mediated by interactions between proteins and other molecules. One useful approach to analyze protein functions is to compare and classify the structures of interaction interfaces of proteins. Here, we describe the procedures for compiling a database of interface structures and efficiently comparing the interface structures. To do so requires a good understanding of the data structures of the Protein Data Bank (PDB). Therefore, we also provide a detailed account of the PDB exchange dictionary necessary for extracting data that are relevant for analyzing interaction interfaces and secondary structures. We identify recurring structural motifs by classifying similar interface structures, and we define a coarse-grained representation of supersecondary structures (SSS) which represents a sequence of two or three secondary structure elements including their relative orientations as a string of four to seven letters. By examining the correspondence between structural motifs and SSS strings, we show that no SSS string has particularly high propensity to be found at interaction interfaces in general, indicating any SSS can be used as a binding interface. When individual structural motifs are examined, there are some SSS strings that have high propensity for particular groups of structural motifs. In addition, it is shown that while the SSS strings found in particular structural motifs for nonpolymer and protein interfaces are as abundant as in other structural motifs that belong to the same subunit, structural motifs for nucleic acid interfaces exhibit somewhat stronger preference for SSS strings. In regard to protein folds, many motif-specific SSS strings were found across many folds, suggesting that SSS may be a useful description to investigate the universality of ligand binding modes.

  7. Endocytosis and Trafficking of Natriuretic Peptide Receptor-A: Potential Role of Short Sequence Motifs

    PubMed Central

    Pandey, Kailash N.

    2015-01-01

    The targeted endocytosis and redistribution of transmembrane receptors among membrane-bound subcellular organelles are vital for their correct signaling and physiological functions. Membrane receptors committed for internalization and trafficking pathways are sorted into coated vesicles. Cardiac hormones, atrial and brain natriuretic peptides (ANP and BNP) bind to guanylyl cyclase/natriuretic peptide receptor-A (GC-A/NPRA) and elicit the generation of intracellular second messenger cyclic guanosine 3',5'-monophosphate (cGMP), which lowers blood pressure and incidence of heart failure. After ligand binding, the receptor is rapidly internalized, sequestrated, and redistributed into intracellular locations. Thus, NPRA is considered a dynamic cellular macromolecule that traverses different subcellular locations through its lifetime. The utilization of pharmacologic and molecular perturbants has helped in delineating the pathways of endocytosis, trafficking, down-regulation, and degradation of membrane receptors in intact cells. This review describes the investigation of the mechanisms of internalization, trafficking, and redistribution of NPRA compared with other cell surface receptors from the plasma membrane into the cell interior. The roles of different short-signal peptide sequence motifs in the internalization and trafficking of other membrane receptors have been briefly reviewed and their potential significance in the internalization and trafficking of NPRA is discussed. PMID:26151885

  8. Helicobacter pylori CagA: analysis of sequence diversity in relation to phosphorylation motifs and implications for the role of CagA as a virulence factor.

    PubMed

    Evans, D J; Evans, D G

    2001-09-01

    CagA is transported into host target cells and subsequently phosphorylated. Clearly this is a mechanism by which Helicobacter pylori could take control of one or more host cell signal transduction pathways. Presumably the end result of this interaction favors survival of H. pylori, irrespective of eventual damage to the host cell. CagA is noted for its amino acid (AA) sequence diversity, both within and outside the variable region of the molecule. The primary purpose of this review is to examine how variation in the type and number of CagA phosphorylation sites might determine the outcome of infection by different strains of H. pylori. The answer to this question could help to explain the widely disparate results obtained when H. pylori CagA status has been compared to type and severity of disease outcome in different populations, that is in different countries. Analysis of all available CagA sequences revealed that CagA contains both tyrosine phosphorylation motifs (TPMs) and cyclic-AMP-dependent phosphorylation motifs (CPMs). There are two potential CPMs near the N-terminus of CagA and at least two in the repeat region; these are not all equally well conserved. We also defined a 48-residue AA sequence, which includes the N-terminal TPM at tyrosine (Y)-122, which distinguishes between Eastern (Hong Kong-Taiwan-Japan-Thailand) H. pylori isolates and those from the West (Europe-Africa-the Americas-Australia). All 28 of the Eastern type CagA proteins have a functional N-terminal TPM whereas 11 of 47 (23.4%) of the Western type contain an inactive motif, with threonine (T) replacing the critical aspartic acid (D) residue. Only 13 of 24 (54%) known CagA sequences have an active TPM in the repeat region and only one has two TPMs in this region. The potential TPM near the C-terminus of CagA is not likely to be important since only 3 of 24 (12.5%) sequences were found to be intact. Protein database searches revealed that the AA sequence immediately following the TPM at Y

  9. Finishing and Special Motifs: Lessons Learned from CRISPR Analysis Using Next-Generation Draft Sequences ( 7th Annual SFAF Meeting, 2012)

    SciTech Connect

    Campbell, Catherine

    2012-06-01

    Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  10. Finishing and Special Motifs: Lessons Learned from CRISPR Analysis Using Next-Generation Draft Sequences ( 7th Annual SFAF Meeting, 2012)

    ScienceCinema

    Campbell, Catherine [Noblis

    2016-07-12

    Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  11. Triazine-Based Sequence-Defined Polymers with Side-Chain Diversity and Backbone-Backbone Interaction Motifs.

    PubMed

    Grate, Jay W; Mo, Kai-For; Daily, Michael D

    2016-03-14

    Sequence control in polymers, well-known in nature, encodes structure and functionality. Here we introduce a new architecture, based on the nucleophilic aromatic substitution chemistry of cyanuric chloride, that creates a new class of sequence-defined polymers dubbed TZPs. Proof of concept is demonstrated with two synthesized hexamers, having neutral and ionizable side chains. Molecular dynamics simulations show backbone-backbone interactions, including H-bonding motifs and pi-pi interactions. This architecture is arguably biomimetic while differing from sequence-defined polymers having peptide bonds. The synthetic methodology supports the structural diversity of side chains known in peptides, as well as backbone-backbone hydrogen-bonding motifs, and will thus enable new macromolecules and materials with useful functions.

  12. DNA recognition for virus assembly through multiple sequence-independent interactions with a helix-turn-helix motif

    PubMed Central

    Greive, Sandra J.; Fung, Herman K.H.; Chechik, Maria; Jenkins, Huw T.; Weitzel, Stephen E.; Aguiar, Pedro M.; Brentnall, Andrew S.; Glousieau, Matthieu; Gladyshev, Grigory V.; Potts, Jennifer R.; Antson, Alfred A.

    2016-01-01

    The helix-turn-helix (HTH) motif features frequently in protein DNA-binding assemblies. Viral pac site-targeting small terminase proteins possess an unusual architecture in which the HTH motifs are displayed in a ring, distinct from the classical HTH dimer. Here we investigate how such a circular array of HTH motifs enables specific recognition of the viral genome for initiation of DNA packaging during virus assembly. We found, by surface plasmon resonance and analytical ultracentrifugation, that individual HTH motifs of the Bacillus phage SF6 small terminase bind the packaging regions of SF6 and related SPP1 genome weakly, with little local sequence specificity. Nuclear magnetic resonance chemical shift perturbation studies with an arbitrary single-site substrate suggest that the HTH motif contacts DNA similarly to how certain HTH proteins contact DNA non-specifically. Our observations support a model where specificity is generated through conformational selection of an intrinsically bent DNA segment by a ring of HTHs which bind weakly but cooperatively. Such a system would enable viral gene regulation and control of the viral life cycle, with a minimal genome, conferring a major evolutionary advantage for SPP1-like viruses. PMID:26673721

  13. Defining RNA motif-aminoglycoside interactions via two-dimensional combinatorial screening and structure-activity relationships through sequencing.

    PubMed

    Velagapudi, Sai Pradeep; Disney, Matthew D

    2013-10-15

    RNA is an extremely important target for the development of chemical probes of function or small molecule therapeutics. Aminoglycosides are the most well studied class of small molecules to target RNA. However, the RNA motifs outside of the bacterial rRNA A-site that are likely to be bound by these compounds in biological systems are largely unknown. If such information were known, it could allow for aminoglycosides to be exploited to target other RNAs and, in addition, could provide invaluable insights into potential bystander targets of these clinically used drugs. We utilized two-dimensional combinatorial screening (2DCS), a library-versus-library screening approach, to select the motifs displayed in a 3×3 nucleotide internal loop library and in a 6-nucleotide hairpin library that bind with high affinity and selectivity to six aminoglycoside derivatives. The selected RNA motifs were then analyzed using structure-activity relationships through sequencing (StARTS), a statistical approach that defines the privileged RNA motif space that binds a small molecule. StARTS allowed for the facile annotation of the selected RNA motif-aminoglycoside interactions in terms of affinity and selectivity. The interactions selected by 2DCS generally have nanomolar affinities, which is higher affinity than the binding of aminoglycosides to a mimic of their therapeutic target, the bacterial rRNA A-site.

  14. DNA consensus sequence motif for binding response regulator PhoP, a virulence regulator of Mycobacterium tuberculosis.

    PubMed

    He, Xiaoyuan; Wang, Shuishu

    2014-12-30

    Tuberculosis has reemerged as a serious threat to human health because of the increasing prevalence of drug-resistant strains and synergetic infection with HIV, prompting an urgent need for new and more efficient treatments. The PhoP-PhoR two-component system of Mycobacterium tuberculosis plays an important role in the virulence of the pathogen and thus represents a potential drug target. To study the mechanism of gene transcription regulation by response regulator PhoP, we identified a high-affinity DNA sequence for PhoP binding using systematic evolution of ligands by exponential enrichment. The sequence contains a direct repeat of two 7 bp motifs separated by a 4 bp spacer, TCACAGC(N4)TCACAGC. The specificity of the direct-repeat sequence for PhoP binding was confirmed by isothermal titration calorimetry and electrophoretic mobility shift assays. PhoP binds to the direct repeat as a dimer in a highly cooperative manner. We found many genes previously identified to be regulated by PhoP that contain the direct-repeat motif in their promoter sequences. Synthetic DNA fragments at the putative promoter-binding sites bind PhoP with variable affinity, which is related to the number of mismatches in the 7 bp motifs, the positions of the mismatches, and the spacer and flanking sequences. Phosphorylation of PhoP increases the affinity but does not change the specificity of DNA binding. Overall, our results confirm the direct-repeat sequence as the consensus motif for PhoP binding and thus pave the way for identification of PhoP directly regulated genes in different mycobacterial genomes.
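    The direct-repeat consensus TCACAGC(N4)TCACAGC quoted above lends itself to a simple mismatch-tolerant scan. The following is a minimal sketch, not the authors' method: the function names, the forward-strand-only search, and the `max_mismatches` threshold are assumptions made for illustration, and plain mismatch counting ignores the position and spacer effects the abstract mentions.

```python
# Minimal sketch: scan a DNA string for windows resembling the direct repeat
# TCACAGC(N4)TCACAGC. Names and the mismatch threshold are hypothetical.
HALF_SITE = "TCACAGC"  # 7 bp motif given in the abstract
SPACER = 4             # 4 bp spacer (N4) given in the abstract


def count_mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def scan_direct_repeats(seq, max_mismatches=2):
    """Return (start, window, mismatches) for each 18 bp window whose two
    half-sites together differ from the consensus in at most
    `max_mismatches` positions. The spacer is not scored."""
    seq = seq.upper()
    half = len(HALF_SITE)
    width = 2 * half + SPACER
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        left = window[:half]
        right = window[half + SPACER:]
        mm = count_mismatches(left, HALF_SITE) + count_mismatches(right, HALF_SITE)
        if mm <= max_mismatches:
            hits.append((i, window, mm))
    return hits


# Made-up test sequences, not from the paper.
exact = scan_direct_repeats("GGTCACAGCATATTCACAGCCC", max_mismatches=0)
one_off = scan_direct_repeats("TCACAGCGGGGTCACTGC", max_mismatches=1)
```

    Searching the reverse complement as well would be a natural extension for promoter scans.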

  15. SVM2Motif—Reconstructing Overlapping DNA Sequence Motifs by Mimicking an SVM Predictor

    PubMed Central

    Vidovic, Marina M. -C.; Görnitz, Nico; Müller, Klaus-Robert; Rätsch, Gunnar; Kloft, Marius

    2015-01-01

    Identifying discriminative motifs underlying the functionality and evolution of organisms is a major challenge in computational biology. Machine learning approaches such as support vector machines (SVMs) achieve state-of-the-art performances in genomic discrimination tasks, but, due to their black-box character, the motifs underlying their decision functions are largely unknown. As a remedy, positional oligomer importance matrices (POIMs) allow us to visualize the significance of position-specific subsequences. Although being a major step towards the explanation of trained SVM models, they suffer from the fact that their size grows exponentially in the length of the motif, which renders their manual inspection feasible only for comparably small motif sizes, typically k ≤ 5. In this work, we extend the work on positional oligomer importance matrices, by presenting a new machine-learning methodology, entitled motifPOIM, to extract the truly relevant motifs, regardless of their length and complexity, underlying the predictions of a trained SVM model. Our framework thereby considers the motifs as free parameters in a probabilistic model, a task which can be phrased as a non-convex optimization problem. The exponential dependence of the POIM size on the oligomer length poses a major numerical challenge, which we address by an efficient optimization framework that allows us to find possibly overlapping motifs consisting of up to hundreds of nucleotides. We demonstrate the efficacy of our approach on a synthetic data set as well as a real-world human splice site data set. PMID:26690911
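    To make the exponential growth concrete, the sketch below counts table entries under the assumption that a POIM stores one score per k-mer per sequence position, that is, 4^k rows for DNA times the number of positions. That layout and the function name are assumptions for illustration; the abstract itself only states that the size grows exponentially with motif length.

```python
# Minimal sketch: how a k-mer-by-position table grows with k.
# Assumes one entry per (k-mer, position) pair; this layout is an assumption.
def poim_entries(k, num_positions, alphabet_size=4):
    """Number of entries in a table with one row per k-mer over the
    alphabet and one column per sequence position."""
    return alphabet_size ** k * num_positions


# Entry counts for a hypothetical 100-position sequence window.
sizes = {k: poim_entries(k, 100) for k in (2, 5, 10)}
```

    Going from k = 5 to k = 10 multiplies the row count by 4^5 = 1024, which is consistent with manual inspection becoming impractical beyond small k.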

  16. A combined sequence and structure based method for discovering enriched motifs in RNA from in vivo binding data.

    PubMed

    Polishchuk, Maya; Paz, Inbal; Kohen, Refael; Mesika, Rona; Yakhini, Zohar; Mandel-Gutfreund, Yael

    2017-03-06

    RNA binding proteins (RBPs) play an important role in regulating many processes in the cell. RBPs often recognize their RNA targets in a specific manner. In addition to the RNA primary sequence, the structure of the RNA has been shown to play a central role in RNA recognition by RBPs. In recent years, many experimental approaches, both in vitro and in vivo, were developed and employed to identify and characterize RBP targets and extract their binding specificities. In vivo binding techniques, such as CrossLinking and ImmunoPrecipitation (CLIP)-based methods, enable the characterization of protein binding sites on RNA targets. However, these methods do not provide information regarding the structural preferences of the protein. While methods to obtain the structure of RNA are available, inferring both the sequence and the structure preferences of RBPs remains a challenge. Here we present SMARTIV, a novel computational tool for discovering combined sequence and structure binding motifs from in vivo RNA binding data relying on the sequences of the target sites, the ranking of their binding scores and their predicted secondary structure. The combined motifs are provided in a unified representation that is informative and easy for visual perception. We tested the method on CLIP-seq data from different platforms for a variety of RBPs. Overall, we show that our results are highly consistent with known binding motifs of RBPs, offering additional information on their structural preferences.

  17. A survey of DNA motif finding algorithms

    PubMed Central

    Das, Modan K; Dai, Ho-Kwok

    2007-01-01

    Background: Unraveling the mechanisms that regulate gene expression is a major challenge in biology. An important task in this challenge is to identify regulatory elements, especially the binding sites in deoxyribonucleic acid (DNA) for transcription factors. These binding sites are short DNA segments that are called motifs. Recent advances in genome sequence availability and in high-throughput gene expression analysis technologies have allowed for the development of computational methods for motif finding. As a result, a large number of motif finding algorithms have been implemented and applied to various motif models over the past decade. This survey reviews the latest developments in DNA motif finding algorithms. Results: Earlier algorithms use promoter sequences of coregulated genes from a single genome and search for statistically overrepresented motifs. Recent algorithms are designed to use phylogenetic footprinting or orthologous sequences and also an integrated approach where promoter sequences of coregulated genes and phylogenetic footprinting are used. All the algorithms studied have been reported to correctly detect the motifs that have been previously detected by laboratory experimental approaches, and some algorithms were able to find novel motifs. However, most of these motif finding algorithms have been shown to work successfully in yeast and other lower organisms, but perform significantly worse in higher organisms. Conclusion: Despite considerable efforts to date, DNA motif finding remains a complex challenge for biologists and computer scientists. Researchers have taken many different approaches in developing motif discovery tools and the progress made in this area of research is very encouraging. Performance comparison of different motif finding tools and identification of the best tools have proven to be a difficult task because tools are designed based on algorithms and motif models that are diverse and complex and our incomplete understanding of

  18. MSDmotif: exploring protein sites and motifs

    PubMed Central

    Golovin, Adel; Henrick, Kim

    2008-01-01

    Background: Protein structures have conserved features – motifs, which have a sufficient influence on the protein function. These motifs can be found in sequence as well as in 3D space. Understanding of these fragments is essential for 3D structure prediction, modelling and drug-design. The Protein Data Bank (PDB) is the source of this information; however, present search tools have limited 3D options to integrate protein sequence with its 3D structure. Results: We describe here a web application for querying the PDB for ligands, binding sites, small 3D structural and sequence motifs and the underlying database. Novel algorithms for chemical fragments, 3D motifs, ϕ/ψ sequences, super-secondary structure motifs and for small 3D structural motif associations searches are incorporated. The interface provides functionality for visualization, search criteria creation, sequence and 3D multiple alignment options. MSDmotif is an integrated system where a results page is also a search form. A set of motif statistics is available for analysis. This set includes molecule and motif binding statistics, distribution of motif sequences, occurrence of an amino-acid within a motif, correlation of amino-acids side-chain charges within a motif and Ramachandran plots for each residue. The binding statistics are presented in association with properties that include a ligand fragment library. Access is also provided through the Distributed Annotation System (DAS) protocol. An additional entry point facilitates XML requests with XML responses. Conclusion: MSDmotif is unique in combining chemical, sequence and 3D data in a single search engine with a range of search and visualisation options. It provides multiple views of data found in the PDB archive for exploring protein structures. PMID:18637174

  19. Phylogenetic Analysis of Geographically Diverse Radopholus similis via rDNA Sequence Reveals a Monomorphic Motif.

    PubMed

    Kaplan, D T; Thomas, W K; Frisse, L M; Sarah, J L; Stanton, J M; Speijer, P R; Marin, D H; Opperman, C H

    2000-06-01

    The nucleic acid sequences of rDNA ITS1 and the rDNA D2/D3 expansion segment were compared for 57 burrowing nematode isolates collected from Australia, Cameroon, Central America, Cuba, Dominican Republic, Florida, Guadeloupe, Hawaii, Nigeria, Honduras, Indonesia, Ivory Coast, Puerto Rico, South Africa, and Uganda. Of the 57 isolates, 55 were morphologically similar to Radopholus similis and seven were citrus-parasitic. The nucleic acid sequences for PCR-amplified ITS1 and for the D2/D3 expansion segment of the 28S rDNA gene were each identical for all putative R. similis. Sequence divergence for both the ITS1 and the D2/D3 was concordant with morphological differences that distinguish R. similis from other burrowing nematode species. This result substantiates previous observations that the R. similis genome is highly conserved across geographic regions. Autapomorphies that would delimit phylogenetic lineages of non-citrus-parasitic R. similis from those that parasitize citrus were not observed. The data presented herein support the concept that R. similis is comprised of two pathotypes: one that parasitizes citrus and one that does not.

  20. Chip-based sequencing nucleic acids

    DOEpatents

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  1. Localization of proteins to the 1,2-propanediol utilization microcompartment by non-native signal sequences is mediated by a common hydrophobic motif.

    PubMed

    Jakobson, Christopher M; Kim, Edward Y; Slininger, Marilyn F; Chien, Alex; Tullman-Ercek, Danielle

    2015-10-02

    Various bacteria localize metabolic pathways to proteinaceous organelles known as bacterial microcompartments (MCPs), enabling the metabolism of carbon sources to enhance survival and pathogenicity in the gut. There is considerable interest in exploiting bacterial MCPs for metabolic engineering applications, but little is known about the interactions between MCP signal sequences and the protein shells of different MCP systems. We found that the N-terminal sequences from the ethanolamine utilization (Eut) and glycyl radical-generating protein MCPs are able to target reporter proteins to the 1,2-propanediol utilization (Pdu) MCP, and that this localization is mediated by a conserved hydrophobic residue motif. Recapitulation of this motif by the addition of a single amino acid conferred targeting function on an N-terminal sequence from the ethanol utilization MCP system that previously did not act as a Pdu signal sequence. Moreover, the Pdu-localized signal sequences competed with native Pdu targeting sequences for encapsulation in the Pdu MCP. Salmonella enterica natively possesses both the Pdu and Eut operons, and our results suggest that Eut proteins might be localized to the Pdu MCP in vivo. We further demonstrate that S. enterica LT2 retained the ability to grow on 1,2-propanediol as the sole carbon source when a Pdu enzyme was replaced with its Eut homolog. Although the relevance of this finding to the native system remains to be explored, we show that the Pdu-localized signal sequences described herein allow control over the ratio of heterologous proteins encapsulated within Pdu MCPs.

  2. Enhanced Binding Affinity for an i-Motif DNA Substrate Exhibited by a Protein Containing Nucleobase Amino Acids.

    PubMed

    Bai, Xiaoguang; Talukder, Poulami; Daskalova, Sasha M; Roy, Basab; Chen, Shengxi; Li, Zhongxian; Dedkova, Larisa M; Hecht, Sidney M

    2017-03-17

    Several variants of a nucleic acid binding motif (RRM1) of putative transcription factor hnRNP LL containing nucleobase amino acids at specific positions have been prepared and used to study binding affinity for the BCL2 i-motif DNA. Molecular modeling suggested a number of amino acids in RRM1 likely to be involved in interaction with the i-motif DNA, and His24 and Arg26 were chosen for modification based on their potential ability to interact with G14 of the i-motif DNA. Four nucleobase amino acids were introduced into RRM1 at one or both of positions 24 and 26. The introduction of cytosine nucleobase 2 into position 24 of RRM1 increased the affinity of the modified protein for the i-motif DNA, consistent with the possible Watson-Crick interaction of 2 and G14. In comparison, the introduction of uracil nucleobase 3 had a minimal effect on DNA affinity. Two structurally simplified nucleobase analogues (1 and 4) lacking both the N-1 and the 2-oxo substituents were also introduced in lieu of His24. Again, the RRM1 analogue containing 1 exhibited enhanced affinity for the i-motif DNA, while the protein analogue containing 4 bound less tightly to the DNA substrate. Finally, the modified protein containing 1 in lieu of Arg26 also bound to the i-motif DNA more strongly than the wild-type protein, but a protein containing 1 both at positions 24 and 26 bound to the DNA less strongly than wild type. The results support the idea of using nucleobase amino acids as protein constituents for controlling and enhancing DNA-protein interaction. Finally, modification of the i-motif DNA at G14 diminished RRM1-DNA interaction, as well as the ability of nucleobase amino acid 1 to stabilize RRM1-DNA interaction.

  3. MINT: software to identify motifs and short-range interactions in trajectories of nucleic acids

    PubMed Central

    Górska, Anna; Jasiński, Maciej; Trylska, Joanna

    2015-01-01

    Structural biology experiments and structure prediction tools have provided many high-resolution three-dimensional structures of nucleic acids. Also, molecular dynamics force field parameters have been adapted to simulating charged and flexible nucleic acid structures on microsecond time scales. Therefore, we can generate the dynamics of DNA or RNA molecules, but we still lack adequate tools for the analysis of the resulting huge amounts of data. We present MINT (Motif Identifier for Nucleic acids Trajectory) — an automatic tool for analyzing three-dimensional structures of RNA and DNA, and their full-atom molecular dynamics trajectories or other conformation sets (e.g. X-ray or nuclear magnetic resonance-derived structures). For each RNA or DNA conformation MINT determines the hydrogen bonding network resolving the base pairing patterns, identifies secondary structure motifs (helices, junctions, loops, etc.) and pseudoknots. MINT also estimates the energy of stacking and phosphate anion-base interactions. For many conformations, as in a molecular dynamics trajectory, MINT provides averages of the above structural and energetic features and their evolution. We show MINT functionality based on all-atom explicit solvent molecular dynamics trajectory of the 30S ribosomal subunit. PMID:26024667

  4. Relation between mRNA expression and sequence information in Desulfovibrio vulgaris: Combinatorial contributions of upstream regulatory motifs and coding sequence features to variations in mRNA abundance

    SciTech Connect

    Wu, Gang; Nie, Lei; Zhang, Weiwen

    2006-05-26

    The context-dependent expression of genes is the core for biological activities, and significant attention has been given to identification of various factors contributing to gene expression at genomic scale. However, so far this type of analysis has been focused either on relation between mRNA expression and non-coding sequence features such as upstream regulatory motifs or on correlation between mRNA abundance and non-random features in coding sequences (e.g. codon usage and amino acid usage). In this study multiple regression analyses of the mRNA abundance and all sequence information in Desulfovibrio vulgaris were performed, with the goal to investigate how much coding and non-coding sequence features contribute to the variations in mRNA expression, and in what manner they act together...

  5. A DNA-binding protein containing two widely separated zinc finger motifs that recognize the same DNA sequence.

    PubMed

    Fan, C M; Maniatis, T

    1990-01-01

    We have isolated a full-length cDNA clone encoding a protein (PRDII-BF1) that binds specifically to a positive regulatory domain (PRDII) of the human IFN-beta gene promoter, and to a similar sequence present in a number of other promoters and enhancers. The sequence of this protein reveals two novel structural features. First, it is the largest sequence-specific DNA-binding protein reported to date (298 kD). Second, it contains two widely separated sets of C2-H2-type zinc fingers. Remarkably, each set of zinc fingers binds to the same DNA sequence motif with similar affinities and methylation interference patterns. Thus, this protein may act by binding simultaneously to reiterated copies of the same recognition sequence. Although the function of PRDII-BF1 is not known, the level of its mRNA is inducible by serum and virus, albeit with different kinetics.

  6. Comparative Analysis of P450 Signature Motifs EXXR and CXG in the Large and Diverse Kingdom of Fungi: Identification of Evolutionarily Conserved Amino Acid Patterns Characteristic of P450 Family

    PubMed Central

    Syed, Khajamohiddin; Mashele, Samson Sitheni

    2014-01-01

    Cytochrome P450 monooxygenases (P450s) are heme-thiolate proteins distributed across the biological kingdoms. P450s are catalytically versatile and play key roles in organisms' primary and secondary metabolism. Identification of P450s across the biological kingdoms depends largely on the identification of two P450 signature motifs, EXXR and CXG, in the protein sequence. Once a putative protein has been identified as P450, it will be assigned to a family and subfamily based on the criteria that P450s within a family share more than 40% homology and members of subfamilies share more than 55% homology. However, to date, no evidence has been presented that can distinguish members of a P450 family. Here, for the first time we report the identification of EXXR- and CXG-motifs-based amino acid patterns that are characteristic of the P450 family. Analysis of P450 signature motifs in the under-explored fungal P450s from four different phyla, ascomycota, basidiomycota, zygomycota and chytridiomycota, indicated that the EXXR motif is highly variable and the CXG motif is somewhat variable. The amino acids threonine and leucine are preferred as second and third amino acids in the EXXR motif and proline and glycine are preferred as second and third amino acids in the CXG motif in fungal P450s. Analysis of 67 P450 families from biological kingdoms such as plants, animals, bacteria and fungi showed conservation of a set of amino acid patterns characteristic of a particular P450 family in EXXR and CXG motifs. This suggests that during the divergence of P450 families from a common ancestor these amino acid patterns evolved and were retained in each P450 family as a signature of that family. The role of amino acid patterns characteristic of a P450 family in the structural and/or functional aspects of members of the P450 family is a topic for future research. PMID:24743800
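    The two signature motifs named above, EXXR and CXG, can be located in a protein string with regular expressions, reading X as any amino acid. This is only a sketch: the function name and the toy sequence are made up, and a motif hit alone would not establish that a protein is a P450.

```python
# Minimal sketch: find EXXR and CXG occurrences in a protein sequence.
# X is read as "any residue"; names and the toy sequence are assumptions.
import re

# Lookahead groups let overlapping occurrences be reported as well.
PATTERNS = {
    "EXXR": re.compile(r"(?=(E[A-Z]{2}R))"),
    "CXG": re.compile(r"(?=(C[A-Z]G))"),
}


def find_signature_motifs(protein):
    """Return {motif name: [(start index, matched residues), ...]}."""
    protein = protein.upper()
    return {
        name: [(m.start(), m.group(1)) for m in pattern.finditer(protein)]
        for name, pattern in PATTERNS.items()
    }


toy = "MAETLRKKFGCPGAA"  # made-up sequence, not a real P450
motifs = find_signature_motifs(toy)
```

    Tallying the residues at the X positions across many sequences is one way to look for the family-specific patterns the abstract describes.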

  7. Distinguishing Proteins From Arbitrary Amino Acid Sequences

    PubMed Central

    Yau, Stephen S.-T.; Mao, Wei-Guang; Benson, Max; He, Rong Lucy

    2015-01-01

    What kinds of amino acid sequences could possibly be protein sequences? From all existing databases that we can find, known proteins are only a small fraction of all possible combinations of amino acids. Beginning with Sanger's first detailed determination of a protein sequence in 1952, previous studies have focused on describing the structure of existing protein sequences in order to construct the protein universe. No one, however, has developed a criterion for determining whether an arbitrary amino acid sequence can be a protein. Here we show that when the collection of arbitrary amino acid sequences is viewed in an appropriate geometric context, the protein sequences cluster together. This leads to a new computational test, described here, that has proved to be remarkably accurate at determining whether an arbitrary amino acid sequence can be a protein. Even more, if the results of this test indicate that the sequence can be a protein, and it is indeed a protein sequence, then its identity as a protein sequence is uniquely defined. We anticipate our computational test will be useful for those who are attempting to complete the job of discovering all proteins, or constructing the protein universe. PMID:25609314

  8. Structural Analysis of a β-Helical Protein Motif Stabilized by Targeted Replacements with Conformationally Constrained Amino Acids

    PubMed Central

    Ballano, Gema; Zanuy, David; Jiménez, Ana I.; Cativiela, Carlos; Nussinov, Ruth; Alemán, Carlos

    2009-01-01

    Here we study conformational stabilization induced in a β-helical nanostructure by position-specific mutations. The nanostructure is constructed through the self-assembly of the β-helical building block excised from E. coli galactoside acetyltransferase (PDB code 1krr, chain A; residues 131-165). The mutations involve substitutions by cyclic, conformationally constrained amino acids. Specifically, a complete structural analysis of the Pro-Xaa-Val sequence [with Xaa being Gly, Ac3c (1-aminocyclopropane-1-carboxylic acid) and Ac5c (1-aminocyclopentane-1-carboxylic acid)], corresponding to the 148-150 loop region in the wild-type (Gly) and mutated (Ac3c and Ac5c) 1krr, has been performed using Molecular Dynamics simulations and X-ray crystallography. Simulations have been performed for the wild-type and mutants of three different systems, namely the building block, the nanoconstruct and the isolated Pro-Xaa-Val tripeptide. Furthermore, the crystalline structures of five peptides of Pro-Xaa-Val or Xaa-Val sequences have been solved by X-ray diffraction analysis and compared with theoretical predictions. Both the theoretical and crystallographic studies indicate that the Pro-Acnc-Val sequences exhibit a high propensity to adopt turn-like conformations, and this propensity is little affected by the chemical environment. Overall, the results indicate that replacement of Gly149 by Ac3c or Ac5c significantly reduces the conformational flexibility of the target site, enhancing the structural specificity of the building block and the nanoconstruct derived from the 1krr β-helical motif. PMID:18811190

  9. The complete amino acid sequence of prochymosin.

    PubMed Central

    Foltmann, B; Pedersen, V B; Jacobsen, H; Kauffman, D; Wybrandt, G

    1977-01-01

    The total sequence of 365 amino acid residues in bovine prochymosin is presented. Alignment with the amino acid sequence of porcine pepsinogen shows that 204 amino acid residues are common to the two zymogens. Further comparison and alignment with the amino acid sequence of penicillopepsin shows that 66 residues are located at identical positions in all three proteases. The three enzymes belong to a large group of proteases with two aspartate residues in the active center. This group forms a family derived from one common ancestor. PMID:329280

  10. CDR3 clonotype and amino acid motif diversity of BV19 expressing circulating human CD8 T cells

    PubMed Central

    Yassai, Maryam B.; Demos, Wendy; Janczak, Teresa; Naumova, Elena N.; Gorski, Jack

    2015-01-01

    Generating a detailed description of human T cell repertoire diversity is an important goal in the study of human immunology. The circulation is the source of most T cells used for studies in humans. Here we use high throughput sequencing of TCR BV19 transcripts from CD8 T cells derived from unmanipulated PBMC from an older HLA-A2 individual to provide a quantitative and qualitative description of the clonotypic CDR3 nucleotide and amino acid composition of the TCR β-chain from this subset of circulating CD8 T cells. Aggregated samples from six time points spanning ~1.5 years were analyzed to smooth possible temporal fluctuation. BV19 encompasses the well studied RS-encoding clonotypes involved in recognition of the M1 58–66 epitope from influenza A in HLA-A2 individuals. The clonotype distribution was diverse, complex and self-similar. The amino acid composition was generally skewed in favor of glycines and there were specific amino acids observed at higher frequency at the NDN start position. The motif repertoire distribution was also diverse, complex and self-similar with respect to CDR3 length, NDN start and length. PMID:26593155

  11. ML2Motif—Reliable extraction of discriminative sequence motifs from learning machines

    PubMed Central

    Kloft, Marius; Müller, Klaus-Robert; Görnitz, Nico

    2017-01-01

    High prediction accuracies are not the only objective to consider when solving problems using machine learning. Instead, particular scientific applications require some explanation of the learned prediction function. For computational biology, positional oligomer importance matrices (POIMs) have been successfully applied to explain the decision of support vector machines (SVMs) using weighted-degree (WD) kernels. To extract relevant biological motifs from POIMs, the motifPOIM method has been devised and showed promising results on real-world data. Our contribution in this paper is twofold: as an extension to POIMs, we propose gPOIM, a general measure of feature importance for arbitrary learning machines and feature sets (including, but not limited to, SVMs and CNNs) and devise a sampling strategy for efficient computation. As a second contribution, we derive a convex formulation of motifPOIMs that leads to more reliable motif extraction from gPOIMs. Empirical evaluations confirm the usefulness of our approach on artificially generated data as well as on real-world datasets. PMID:28346487

  12. Bayesian Markov models consistently outperform PWMs at predicting motifs in nucleotide sequences

    PubMed Central

    Siebert, Matthias; Söding, Johannes

    2016-01-01

    Position weight matrices (PWMs) are the standard model for DNA and RNA regulatory motifs. In PWMs nucleotide probabilities are independent of nucleotides at other positions. Models that account for dependencies need many parameters and are prone to overfitting. We have developed a Bayesian approach for motif discovery using Markov models in which conditional probabilities of order k − 1 act as priors for those of order k. This Bayesian Markov model (BaMM) training automatically adapts model complexity to the amount of available data. We also derive an EM algorithm for de-novo discovery of enriched motifs. For transcription factor binding, BaMMs achieve significantly (P = 1/16) higher cross-validated partial AUC than PWMs in 97% of 446 ChIP-seq ENCODE datasets and improve performance by 36% on average. BaMMs also learn complex multipartite motifs, improving predictions of transcription start sites, polyadenylation sites, bacterial pause sites, and RNA binding sites by 26–101%. BaMMs never performed worse than PWMs. These robust improvements argue in favour of generally replacing PWMs by BaMMs. PMID:27288444
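
    To make the independence assumption of PWMs concrete, the sketch below scores a candidate site as a sum of per-position log-odds terms, so that each position contributes without regard to its neighbours. The three-column matrix, the uniform background and the function names are hypothetical illustrations. This is not the BaMM method, which additionally conditions each position on the preceding nucleotides.

```python
import math

# Hypothetical 3-position PWM: one probability table per motif position.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background nucleotide frequency


def pwm_log_odds(site):
    """Sum of independent per-position log2 odds for a site of PWM width."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(PWM, site))


def best_site(seq):
    """Return (score, 0-based start) of the highest-scoring window in seq."""
    width = len(PWM)
    return max((pwm_log_odds(seq[i:i + width]), i)
               for i in range(len(seq) - width + 1))


score, start = best_site("TTAGCTT")
```

    Because the score is a plain sum, a dependency such as "G at position 2 only when A is at position 1" cannot be expressed, which is the limitation that higher-order models address.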

  13. Membrane-bound fatty acid desaturases are inserted co-translationally into the ER and contain different ER retrieval motifs at their carboxy termini.

    PubMed

    McCartney, Andrew W; Dyer, John M; Dhanoa, Preetinder K; Kim, Peter K; Andrews, David W; McNew, James A; Mullen, Robert T

    2004-01-01

    Fatty acid desaturases (FADs) play a prominent role in plant lipid metabolism and are located in various subcellular compartments, including the endoplasmic reticulum (ER). To investigate the biogenesis of ER-localized membrane-bound FADs, we characterized the mechanisms responsible for insertion of Arabidopsis FAD2 and Brassica FAD3 into ER membranes and determined the molecular signals that maintain their ER residency. Using in vitro transcription/translation reactions with ER-derived microsomes, we show that both FAD2 and FAD3 are efficiently integrated into membranes by a co-translational, translocon-mediated pathway. We also demonstrate that while the C-terminus of FAD3 (-KSKIN) contains a functional prototypic dilysine ER retrieval motif, FAD2 contains a novel C-terminal aromatic amino acid-containing sequence (-YNNKL) that is both necessary and sufficient for maintaining localization in the ER. Co-expression of a membrane-bound reporter protein containing the FAD2 C-terminus with a dominant-negative mutant of ADP-ribosylation factor (Arf)1 abolished transient localization of the reporter protein in the Golgi, indicating that the FAD2 peptide signal acts as an ER retrieval motif. Mutational analysis of the FAD2 ER retrieval signal revealed a sequence-specific motif consisting of Phi-X-X-K/R/D/E-Phi-COOH, where -Phi- are large hydrophobic amino acid residues. Interestingly, this aromatic motif was present in a variety of other known and putative ER membrane proteins, including cytochrome P450 and the peroxisomal biogenesis factor Pex10p. Taken together, these data describe the insertion and retrieval mechanisms of FADs and define a new ER localization signal in plants that is responsible for the retrieval of escaped membrane proteins back to the ER.
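
    The two C-terminal signals discussed in this abstract can be sketched as anchored patterns. In the Python example below, the residue set used for Phi is an assumption, since the abstract says only "large hydrophobic amino acid residues". The dilysine pattern uses the commonly cited KKXX and KXKXX forms, which are likewise not spelled out in the abstract. The function name is hypothetical.

```python
import re

# Assumption: residues treated as "large hydrophobic" (Phi) for illustration.
PHI = "FILMVWY"

# Phi-X-X-K/R/D/E-Phi at the extreme C-terminus.
AROMATIC_LIKE = re.compile(rf"[{PHI}]..[KRDE][{PHI}]$")
# Commonly cited dilysine forms, KKXX or KXKXX, at the extreme C-terminus.
DILYSINE = re.compile(r"(?:KK..|K.K..)$")


def classify_c_terminus(seq):
    """Return the names of the retrieval-motif patterns matched by seq."""
    labels = []
    if AROMATIC_LIKE.search(seq):
        labels.append("Phi-X-X-K/R/D/E-Phi")
    if DILYSINE.search(seq):
        labels.append("dilysine")
    return labels


# Only the terminal pentapeptides come from the abstract; the "AAA" prefix is filler.
fad2_like = classify_c_terminus("AAAYNNKL")
fad3_like = classify_c_terminus("AAAKSKIN")
```

    With these assumptions the -YNNKL terminus matches only the Phi-containing pattern and the -KSKIN terminus matches only the dilysine pattern, consistent with the distinction drawn in the abstract.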

  14. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-05-30

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  15. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-06-06

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  16. Building dictionaries of 1D and 3D motifs by mining the Unaligned 1D sequences of 17 archaeal and bacterial genomes.

    PubMed

    Rigoutsos, I; Gao, Y; Floratos, A; Parida, L

    1999-01-01

    We have used the Teiresias algorithm to carry out unsupervised pattern discovery in a database containing the unaligned ORFs from the 17 publicly available complete archaeal and bacterial genomes and build a 1D dictionary of motifs. These motifs, which we refer to as seqlets, account for and cover 97.88% of this genomic input at the level of amino acid positions. Each of the seqlets in this 1D dictionary was located among the sequences in Release 38.0 of the Protein Data Bank and the structural fragments corresponding to each seqlet's instances were identified and aligned in three dimensions: those of the seqlets that resulted in RMSD errors below a pre-selected threshold of 2.5 Angstroms were entered in a 3D dictionary of structurally conserved seqlets. These two dictionaries can be thought of as cross-indices that facilitate the tackling of tasks such as automated functional annotation of genomic sequences, local homology identification, local structure characterization, comparative genomics, etc.
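
    The RMSD filter mentioned in this abstract can be illustrated with a short Python sketch. It assumes that the two fragments are already superposed and are given as equal-length lists of (x, y, z) coordinates in Angstroms; the abstract does not describe the superposition step itself, and the function names here are hypothetical. Only the 2.5 Angstrom threshold comes from the abstract.

```python
import math

RMSD_THRESHOLD = 2.5  # Angstroms, the threshold quoted in the abstract


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-superposed point lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("fragments must be non-empty and of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def is_structurally_conserved(coords_a, coords_b):
    """True when the pair falls below the RMSD threshold."""
    return rmsd(coords_a, coords_b) < RMSD_THRESHOLD


# Hypothetical two-atom fragments: the second is the first shifted by 3 A in z.
frag_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frag_b = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
```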

  17. ICAP-1, a Novel β1 Integrin Cytoplasmic Domain–associated Protein, Binds to a Conserved and Functionally Important NPXY Sequence Motif of β1 Integrin

    PubMed Central

    Chang, David D.; Wong, Carol; Smith, Healy; Liu, Jenny

    1997-01-01

    The cytoplasmic domains of integrins are essential for cell adhesion. We report identification of a novel protein, ICAP-1 (integrin cytoplasmic domain– associated protein-1), which binds to the β1 integrin cytoplasmic domain. The interaction between ICAP-1 and β1 integrins is highly specific, as demonstrated by the lack of interaction between ICAP-1 and the cytoplasmic domains of other β integrins, and requires a conserved and functionally important NPXY sequence motif found in the COOH-terminal region of the β1 integrin cytoplasmic domain. Mutational studies reveal that Asn and Tyr of the NPXY motif and a Val residue located NH2-terminal to this motif are critical for the ICAP-1 binding. Two isoforms of ICAP-1, a 200–amino acid protein (ICAP-1α) and a shorter 150–amino acid protein (ICAP-1β), derived from alternatively spliced mRNA, are expressed in most cells. ICAP-1α is a phosphoprotein and the extent of its phosphorylation is regulated by the cell–matrix interaction. First, an enhancement of ICAP-1α phosphorylation is observed when cells were plated on fibronectin-coated but not on nonspecific poly-l-lysine–coated surface. Second, the expression of a constitutively activated RhoA protein that disrupts the cell–matrix interaction results in dephosphorylation of ICAP-1α. The regulation of ICAP-1α phosphorylation by the cell–matrix interaction suggests an important role of ICAP-1 during integrin-dependent cell adhesion. PMID:9281591

  18. Modeling and analysis of MH1 domain of Smads and their interaction with promoter DNA sequence motif.

    PubMed

    Makkar, Pooja; Metpally, Raghu Prasad R; Sangadala, Sreedhara; Reddy, Boojala Vijay B

    2009-04-01

    The Smads are a group of related intracellular proteins critical for transmitting the signals to the nucleus from the transforming growth factor-beta (TGF-beta) superfamily of proteins at the cell surface. The prototypic members of the Smad family, Mad and Sma, were first described in Drosophila and Caenorhabditis elegans, respectively. Related proteins in Xenopus, humans, mice and rats were subsequently identified, and are now known as Smads. Smad protein family members act downstream in the TGF-beta signaling pathway mediating various biological processes, including cell growth, differentiation, matrix production, apoptosis and development. Smads range from about 400-500 amino acids in length and are grouped into the receptor-regulated Smads (R-Smads), the common Smads (Co-Smads) and the inhibitory Smads (I-Smads). There are eight Smads in mammals: Smad1/5/8 (bone morphogenetic protein regulated) and Smad2/3 (TGF-beta/activin regulated) are termed R-Smads, Smad4 is denoted as Co-Smad and Smad6/7 are inhibitory Smads. A typical Smad consists of a conserved N-terminal Mad Homology 1 (MH1) domain and a C-terminal Mad Homology 2 (MH2) domain connected by a proline rich linker. The MH1 domain plays a key role in DNA recognition and also facilitates the binding of Smad4 to the phosphorylated C-terminus of R-Smads to form an activated complex. The MH2 domain exhibits transcriptional activation properties. In order to understand the structural basis of interaction of various Smads with their target proteins and the promoter DNA, we modeled the MH1 domain of the remaining mammalian Smads based on known crystal structures of Smad3-MH1 domain bound to GTCT Smad box DNA sequence (1OZJ). We generated a B-DNA structure using average base-pair parameters of Twist, Tilt, Roll and base Slide angles. We then modeled the interaction pose of the MH1 domain of Smad1/5/8 with their corresponding DNA sequence motif GCCG. These models provide the structural basis towards understanding functional

  19. Conserved sequence motifs upstream from the co-ordinately expressed vitellogenin and apoVLDLII genes of chicken.

    PubMed

    van het Schip, F; Strijker, R; Samallo, J; Gruber, M; Geert, A B

    1986-11-11

    The vitellogenin and apoVLDLII yolk protein genes of chicken are transcribed in the liver upon estrogenization. To get information on putative regulatory elements, we compared more than 2 kb of their 5' flanking DNA sequences. Common sequence motifs were found in regions exhibiting estrogen-induced changes in chromatin structure. Stretches of alternating pyrimidines and purines about 30 nucleotides long are present at roughly similar positions. A distinct box of sequence homology in the chicken genes also appears to be present at a similar position in front of the vitellogenin genes of Xenopus laevis, but is absent from the estrogen-responsive egg-white protein genes expressed in the oviduct. In front of the vitellogenin (position -595) and the VLDLII gene (position -548), a DNA element of about 300 base-pairs was found, which possesses structural characteristics of a mobile genetic element and bears homology to the transposon-like Vi element of Xenopus laevis.
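
    Stretches of strictly alternating pyrimidines and purines, as mentioned in this abstract, can be located with a single linear scan. The Python sketch below is a hypothetical illustration and not the authors' procedure. It assumes an upper-case A/C/G/T sequence (any other character is treated as a pyrimidine), and the default minimum length of 30 follows the approximate figure quoted in the abstract.

```python
PURINES = set("AG")


def alternating_stretches(seq, min_len=30):
    """Return half-open (start, end) spans of maximal purine/pyrimidine alternation."""
    runs = []
    start = 0
    for i in range(1, len(seq) + 1):
        # A run ends at the sequence end or where two neighbours share a class.
        if i == len(seq) or (seq[i] in PURINES) == (seq[i - 1] in PURINES):
            if i - start >= min_len:
                runs.append((start, i))
            start = i
    return runs


# Hypothetical toy sequence with a 10-nt alternating run; threshold lowered to 8.
toy = "AACACACACATT"
spans = alternating_stretches(toy, min_len=8)
```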

  20. A Glance at Microsatellite Motifs from 454 Sequencing Reads of Watermelon Genomic DNA

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A single 454 (Life Sciences Sequencing Technology) run of Charleston Gray watermelon (Citrullus lanatus var. lanatus) genomic DNA was performed and sequence data were assembled. A large scale identification of simple sequence repeat (SSR) was performed and SSR sequence data were used for the develo...
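
    A simple way to illustrate large-scale SSR identification is a back-referencing regular expression that finds a short unit repeated in tandem. The Python sketch below is only a hypothetical example: the unit-length limit, the minimum repeat count and the function name are assumptions, and the record does not state which tool or thresholds were actually used.

```python
import re


def find_ssrs(seq, min_repeats=4, max_unit=6):
    """Return (0-based start, repeat unit, repeat count) for tandem repeats."""
    # The lazy unit keeps the shortest repeating unit; \1 demands exact copies.
    pattern = re.compile(
        rf"([ACGT]{{1,{max_unit}}}?)\1{{{min_repeats - 1},}}"
    )
    return [
        (m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
        for m in pattern.finditer(seq)
    ]


# Hypothetical toy read containing an (AT)5 microsatellite.
ssrs = find_ssrs("GGCATATATATATCC")
```

    Flanking sequence around each reported repeat would then be the natural input for primer design when developing SSR markers.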

  1. Analysis of Cytochrome P450 Conserved Sequence Motifs between Helices E and H: Prediction of Critical Motifs and Residues in Enzyme Functions

    PubMed Central

    Oezguen, Numan; Kumar, Santosh

    2014-01-01

    Rational approaches have been extensively used to investigate the role of active site residues in cytochrome P450 (CYP) functions. However, recent studies using random mutagenesis suggest an important role for non-active site residues in CYP functions. Meta-analysis of the random mutants showed that 75% of the functionally important non-active site residues are present in 20% of the entire protein between helices E and H (E-H) and conserved sequence motif (CSM) between 7 and 11. The CSM approach was developed recently to investigate the functional role of non-active site residues in CYP2B4. Furthermore, we identified and analyzed the CSM in multiple CYP families and subfamilies in the E-H region. Results from CSM analysis showed that CSM 7, 8, 10, and 11 are conserved in CYP1, CYP2, and CYP3 families, while CSM 9 is conserved only in CYP2 family. Analysis of different CYP2 subfamilies showed that CYP2B and CYP2C have similar characteristics in the CSM, while the characteristics of CYP2A and CYP2D subfamilies are different. Finally, we analyzed CSM 7, 8, 10, and 11, which are common in all the CYP families/subfamilies analyzed, in fifteen important drug-metabolizing CYPs. The results showed that while CSM 8 is most conserved among these CYPs, CSM 7, 9, and 10 have significant variations. We suggest that CSM8 has a common role in all the CYPs that have been analyzed, while CSM 7, 10, and 11 may have a relatively specific role within the subfamily. We further suggest that these CSM play an important role in opening and closing of the substrate access/egress channel by modulating the flexible/plastic region of the protein. Thus, site-directed mutagenesis of these CSM can be used to study structure-function and dynamic/plasticity-function relationships and to design CYP biocatalysts. PMID:25426333

  2. Design of polymer motifs for nucleic acid recognition and assembly stabilization

    NASA Astrophysics Data System (ADS)

    Zhou, Zhun

    This dissertation describes the synthesis and assembly of bio-functional polymers and the applications of these polymers to drug encapsulation, delivery, and multivalent biomimetic macromolecular recognition between synthetic polymer and nucleic acids. The main content is divided into three parts: (1) polyacidic domains as strongly stabilizing design elements for aqueous phase polyacrylate diblock assembly; (2) small molecule/polymer recognition triggered macromolecular assembly and drug encapsulation; (3) triazine derivatized polymer as a novel class of "bifacial polymer nucleic acid" (bPoNA) and applications of bPoNA to nanoparticle loading of DNA/RNA, silencing delivery as well as control of aptamer function. Through the studies in part (1) and part (2), it was demonstrated that well-designed polymer motifs are not only able to enhance assemblies driven by non-specific hydrophobic effect, but are also able to direct assemblies based on specific recognitions. In part (3) of this dissertation, this concept was further extended by the design of polyacrylate polymers that are capable of discrete and robust hybridization with nucleic acids. This surprising finding demonstrated both fundamental and practical applications. Overall, these studies provided insights into the rational design elements for improving the bio-functions of synthetic polymers, and significantly expanded the scope of biological applications in which polymers synthesized via controlled radical polymerization may play a role.

  3. A common set of conserved motifs in a vast variety of putative nucleic acid-dependent ATPases including MCM proteins involved in the initiation of eukaryotic DNA replication.

    PubMed Central

    Koonin, E V

    1993-01-01

    A new superfamily of (putative) DNA-dependent ATPases is described that includes the ATPase domains of prokaryotic NtrC-related transcription regulators, MCM proteins involved in the initiation of eukaryotic DNA replication, and a group of uncharacterized bacterial and chloroplast proteins. MCM proteins are shown to contain a modified form of the ATP-binding motif and are predicted to mediate ATP-dependent opening of double-stranded DNA in the replication origins. In a second line of investigation, it is demonstrated that the products of unidentified open reading frames from Marchantia mitochondria and from yeast, and a domain of a baculovirus protein involved in viral DNA replication are related to the superfamily III of DNA and RNA helicases that previously has been known to include only proteins of small viruses. Comparison of the multiple alignments showed that the proteins of the NtrC superfamily and the helicases of superfamily III share three related sequence motifs tightly packed in the ATPase domain that consists of 100-150 amino acid residues. A similar array of conserved motifs is found in the family of DnaA-related ATPases. It is hypothesized that the three large groups of nucleic acid-dependent ATPases have similar structure of the core ATPase domain and have evolved from a common ancestor. PMID:8332451

  4. Examination of the transcription factor NtcA-binding motif by in vitro selection of DNA sequences from a random library.

    PubMed

    Jiang, F; Wisén, S; Widersten, M; Bergman, B; Mannervik, B

    2000-08-25

    A recursive in vitro selection among random DNA sequences was used for analysis of the cyanobacterial transcription factor NtcA-binding motifs. An eight-base palindromic sequence, TGTA-(N(8))-TACA, was found to be the optimal NtcA-binding sequence. The more divergent the binding sequences, compared to this consensus sequence, the lower the NtcA affinity. The second and third bases in each four-nucleotide half of the consensus sequence were crucial for NtcA binding, and they were in general highly conserved. The most frequently occurring sequence in the middle weakly conserved region was similar to that of the NtcA-binding motif of the Anabaena sp. strain PCC 7120 glnA gene, previously known to have high affinity for NtcA. This indicates that the middle sequences were selected for high NtcA affinity. Analysis of natural NtcA-binding motifs showed that these could be classified into two groups based on differences in recognition consensus sequences. It is suggested that NtcA naturally recognizes different DNA-binding motifs, or has differential affinities to these sequences under different physiological conditions.
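
    The consensus reported in this abstract, TGTA-(N(8))-TACA, lends itself to a simple mismatch-counting scan, with more mismatches in the two half-sites standing in for lower expected affinity. The Python sketch below is a hypothetical illustration: the function names, the toy sequence and the mismatch cut-off are assumptions, and it searches one strand only.

```python
# Consensus from the abstract; N marks the eight weakly conserved positions.
CONSENSUS = "TGTA" + "N" * 8 + "TACA"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(site):
    """Count positions where site differs from the non-N consensus letters."""
    return sum(1 for c, s in zip(CONSENSUS, site) if c != "N" and c != s)


def scan(seq, max_mismatches=1):
    """Return (start, site, mismatch count) for windows within the cut-off."""
    width = len(CONSENSUS)
    hits = []
    for i in range(len(seq) - width + 1):
        site = seq[i:i + width]
        n = mismatches(site)
        if n <= max_mismatches:
            hits.append((i, site, n))
    return hits


# Hypothetical toy sequence carrying one perfect site.
toy = "CC" + "TGTA" + "ACGTACGT" + "TACA" + "GG"
perfect = scan(toy, max_mismatches=0)
```

    The two half-sites are reverse complements of each other (reverse_complement("TGTA") gives "TACA"), which is what makes the consensus palindromic.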

  5. Spectrometric study of the folding process of i-motif-forming DNA sequences upstream of the c-kit transcription initiation site.

    PubMed

    Bucek, Pavel; Gargallo, Raimundo; Kudrev, Andrei

    2010-12-17

    The c-kit oncogene shows a cytosine-rich DNA region upstream of the transcription initiation site which forms an i-motif structure at slightly acidic pH values (Bucek et al. [5]). In the present study, the pH-induced formation of i-motif-forming sequences 5'-CCC CTC CCT CGC GCC CGC CCG-3' (ckitC1, native), 5'-CCC TTC CCT TGT GCC CGC CCG-3' (ckitC2) and 5'-CCCTT CCC TTTTT CCC T CCC T-3' (ckitC3) was studied by spectroscopic techniques, such as UV molecular absorption and circular dichroism (CD), in tandem with two multivariate data analysis methods, the hard modelling-based matrix method and the soft modelling-based MCR-ALS approach. Use of the hard chemical modelling enabled us to propose the equilibrium model, which describes spectral changes as functions of solution acidity. Additionally, the intrinsic protonation constant, K(in), and the cooperativity parameters, ω(c) and ω(a), were calculated from the fitting procedure of the coupled CD and molecular absorption spectra. In the case of ckitC2 and ckitC3, the hard model correctly reproduced the spectral variations observed experimentally. The results indicated that folding was accompanied by a cooperative process, i.e. the enhancement of protonated structure stability upon protonation. In contrast, unfolding was accompanied by an anticooperative process. Finally, folding of the native sequence, ckitC1, seemed to follow a more complex mechanism.

  6. New melanocortin 1 receptor binding motif based on the C-terminal sequence of alpha-melanocyte-stimulating hormone.

    PubMed

    Schiöth, Helgi B; Muceniece, Ruta; Mutule, Ilga; Wikberg, Jarl E S

    2006-10-01

    The C-terminal tripeptide of the alpha-melanocyte stimulating hormone (alpha-MSH11-13) possesses strong antiinflammatory activity without a known cellular target. In order to better understand the structural requirements for function of such a motif, we designed, synthesized and tested Trp- and Tyr-containing analogues of the alpha-MSH11-13. Seven alpha-MSH11-13 analogues were synthesized and characterized for their binding to the melanocortin receptors recombinantly expressed in insect (Sf9) cells, infected with baculovirus carrying corresponding MC receptor DNA. We also tested these analogues on B16-F1 mouse melanoma cells endogenously expressing the MC1 receptor for binding and for ability to increase cAMP levels as well as on COS-7 cells transfected with the human MC receptors. The data indicate that HS401 (Ac-Tyr-Lys-Pro-Val-NH2) and HS402 (Ac-Lys-Pro-Val-Tyr-NH2) selectively bound to the MC1 receptor and stimulated cAMP generation in a concentration dependent way while the other Tyr- and Trp-containing alpha-MSH11-13 analogues neither bound to MC receptors nor stimulated cAMP. We have thus identified a new MC receptor binding motif derived from the C-terminal sequence of alpha-MSH. The tetrapeptides have novel properties as they both act via MC-ergic pathways and also carry the anti-inflammatory alpha-MSH11-13 message sequence.

  7. A conserved sequence extending motif III of the motor domain in the Snf2-family DNA translocase Rad54 is critical for ATPase activity.

    PubMed

    Zhang, Xiao-Ping; Janke, Ryan; Kingsley, James; Luo, Jerry; Fasching, Clare; Ehmsen, Kirk T; Heyer, Wolf-Dietrich

    2013-01-01

    Rad54 is a dsDNA-dependent ATPase that translocates on duplex DNA. Its ATPase function is essential for homologous recombination, a pathway critical for meiotic chromosome segregation, repair of complex DNA damage, and recovery of stalled or broken replication forks. In recombination, Rad54 cooperates with Rad51 protein and is required to dissociate Rad51 from heteroduplex DNA to allow access by DNA polymerases for recombination-associated DNA synthesis. Sequence analysis revealed that Rad54 contains a perfect match to the consensus PIP box sequence, a widely spread PCNA interaction motif. Indeed, Rad54 interacts directly with PCNA, but this interaction is not mediated by the Rad54 PIP box-like sequence. This sequence is located as an extension of motif III of the Rad54 motor domain and is essential for full Rad54 ATPase activity. Mutations in this motif render Rad54 non-functional in vivo and severely compromise its activities in vitro. Further analysis demonstrated that such mutations affect dsDNA binding, consistent with the location of this sequence motif on the surface of the cleft formed by two RecA-like domains, which likely forms the dsDNA binding site of Rad54. Our study identified a novel sequence motif critical for Rad54 function and showed that even perfect matches to the PIP box consensus may not necessarily identify PCNA interaction sites.

  8. Rapid characterization of CRISPR-Cas9 protospacer adjacent motif sequence elements.

    PubMed

    Karvelis, Tautvydas; Gasiunas, Giedrius; Young, Joshua; Bigelyte, Greta; Silanskas, Arunas; Cigan, Mark; Siksnys, Virginijus

    2015-11-19

    To expand the repertoire of Cas9s available for genome targeting, we present a new in vitro method for the simultaneous examination of guide RNA and protospacer adjacent motif (PAM) requirements. The method relies on the in vitro cleavage of plasmid libraries containing a randomized PAM as a function of Cas9-guide RNA complex concentration. Using this method, we accurately reproduce the canonical PAM preferences for Streptococcus pyogenes, Streptococcus thermophilus CRISPR3 (Sth3), and CRISPR1 (Sth1). Additionally, PAM and sgRNA solutions for a novel Cas9 protein from Brevibacillus laterosporus are provided by the assay and are demonstrated to support functional activity in vitro and in plants.
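
    As an illustration of what a PAM requirement means computationally, the sketch below scans a DNA string for protospacers that are immediately followed by a given PAM. It is not the in vitro assay described in the abstract. The default PAM "NGG" (the widely reported canonical PAM for Streptococcus pyogenes Cas9), the 20-nt protospacer length, the forward-strand-only search, and the function name find_pam_sites are all assumptions made for this example.

```python
import re

# IUPAC nucleotide codes, used to turn a PAM such as "NGG" into a regex.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_pam_sites(seq, pam="NGG", spacer_len=20):
    """Return (start, protospacer, pam_seq) for every forward-strand site.

    `start` is the 0-based index of the first protospacer base.
    Both defaults are illustrative assumptions, not values from the abstract.
    """
    seq = seq.upper()
    pam_regex = "".join(IUPAC[base] for base in pam.upper())
    # A lookahead keeps the match zero-width, so overlapping sites are reported.
    pattern = re.compile(r"(?=([ACGT]{%d})(%s))" % (spacer_len, pam_regex))
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]
```

    A second call on the reverse complement of the input would be needed to cover the opposite strand.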

  9. Critical Role for an acidic amino acid region in platelet signaling by the HemITAM (hemi-immunoreceptor tyrosine-based activation motif) containing receptor CLEC-2 (C-type lectin receptor-2).

    PubMed

    Hughes, Craig E; Sinha, Uma; Pandey, Anjali; Eble, Johannes A; O'Callaghan, Christopher A; Watson, Steve P

    2013-02-15

    CLEC-2 is a member of a new family of C-type lectin receptors characterized by a cytosolic YXXL downstream of three acidic amino acids in a sequence known as a hemITAM (hemi-immunoreceptor tyrosine-based activation motif). Dimerization of two phosphorylated CLEC-2 molecules leads to recruitment of the tyrosine kinase Syk via its tandem SH2 domains and initiation of a downstream signaling cascade. Using Syk-deficient and Zap-70-deficient cell lines we show that hemITAM signaling is restricted to Syk and that the upstream triacidic amino acid sequence is required for signaling. Using surface plasmon resonance and phosphorylation studies, we demonstrate that the triacidic amino acids are required for phosphorylation of the YXXL. These results further emphasize the distinct nature of the proximal events in signaling by hemITAM relative to ITAM receptors.
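
    The hemITAM description (three acidic residues upstream of a YXXL) maps naturally onto a regular expression. The sketch below is only a pattern search over a protein string. The allowed gap between the acidic triplet and the tyrosine (max_gap) is an assumption, since the abstract says only "downstream", and the function name find_hemitam_like and the toy peptide in the test are hypothetical. The search does not check whether a hit is cytosolic.

```python
import re


def find_hemitam_like(protein, max_gap=3):
    """Find [D/E]x3 followed, within `max_gap` residues, by Y-x-x-L.

    Returns a list of (acidic_start, tyrosine_pos, yxxl) tuples with
    1-based positions. `max_gap` is an illustrative assumption.
    """
    # Lookahead so that overlapping candidates are all reported;
    # the lazy gap prefers the YXXL closest to the acidic triplet.
    pattern = re.compile(r"(?=([DE]{3})(.{0,%d}?)(Y..L))" % max_gap)
    hits = []
    for m in pattern.finditer(protein.upper()):
        hits.append((m.start() + 1, m.start(3) + 1, m.group(3)))
    return hits
```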

  10. NullSeq: A Tool for Generating Random Coding Sequences with Desired Amino Acid and GC Contents

    PubMed Central

    Liu, Sophia S.; Hockenberry, Adam J.; Lancichinetti, Andrea; Jewett, Michael C.

    2016-01-01

    The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. In order to accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. While many tools have been developed to create random nucleotide sequences, protein coding sequences are subject to a unique set of constraints that complicates the process of generating appropriate null models. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content for the purpose of hypothesis testing. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content, which we have developed into a python package. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. Furthermore, this approach can easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs which will lead to a better understanding of biological processes as well as more effective engineering of biological systems. PMID:27835644
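
    The idea of drawing random coding sequences under a GC constraint can be sketched as follows. This is not the NullSeq package or its API. It is a simplified illustration in which the amino acid sequence is held fixed (rather than only its composition) and each synonymous codon is chosen with probability proportional to exp(lam * GC count), the exponential form that a maximum-entropy distribution takes under a constraint on mean GC content. The parameter lam and all function names are assumptions, and the step that tunes lam to reach a target GC fraction is omitted.

```python
import math
import random

# Standard genetic code, built from the usual compact TCAG-ordered string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Group codons by the amino acid they encode.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def gc_count(dna):
    return sum(ch in "GC" for ch in dna)


def gc_fraction(dna):
    return gc_count(dna) / len(dna)


def translate(dna):
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def random_coding_sequence(protein, lam=0.0, rng=None):
    """Sample one codon per residue, weighting codons by exp(lam * GC count).

    lam = 0 gives uniform synonymous codon choice; lam > 0 favours GC-rich
    codons and lam < 0 favours AT-rich codons.
    """
    rng = rng or random.Random()
    codons_out = []
    for aa in protein:
        codons = SYNONYMS[aa]
        weights = [math.exp(lam * gc_count(c)) for c in codons]
        codons_out.append(rng.choices(codons, weights=weights, k=1)[0])
    return "".join(codons_out)
```

    In practice lam would be adjusted, for example by bisection, until the expected GC fraction matches the desired value.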

  11. Sequence motifs associated with paternal transmission of mitochondrial DNA in the horse mussel, Modiolus modiolus (Bivalvia: Mytilidae).

    PubMed

    Robicheau, Brent M; Breton, Sophie; Stewart, Donald T

    2017-03-20

    In the majority of metazoans paternal mitochondria represent evolutionary dead-ends. In many bivalves, however, this paradigm does not hold true; both maternal and paternal mitochondria are inherited. Herein, we characterize maternal and paternal mitochondrial control regions of the horse mussel, Modiolus modiolus (Bivalvia: Mytilidae). The maternal control region is 808bp long, while the paternal control region is longer at 2.3kb. We hypothesize that the size difference is due to a combination of repeated duplications within the control region of the paternal mtDNA genome, as well as an evolutionarily ancient recombination event between two sex-associated mtDNA genomes that led to the insertion of a second control region sequence in the genome that is now transmitted via males. In a comparison to other mytilid male control regions, we identified two evolutionarily Conserved Motifs, CMA and CMB, associated with paternal transmission of mitochondrial DNA. CMA is characterized by a conserved purine/pyrimidine pattern, while CMB exhibits a specific 13bp nucleotide string within a stem and loop structure. The identification of motifs CMA and CMB in M. modiolus extends our understanding of Sperm Transmission Elements (STEs) that have recently been identified as being associated with the paternal transmission of mitochondria in marine bivalves.

  12. A Bayesian hidden Markov model for motif discovery through joint modeling of genomic sequence and ChIP-chip data.

    PubMed

    Gelfond, Jonathan A L; Gupta, Mayetri; Ibrahim, Joseph G

    2009-12-01

    We propose a unified framework for the analysis of chromatin (Ch) immunoprecipitation (IP) microarray (ChIP-chip) data for detecting transcription factor binding sites (TFBSs) or motifs. ChIP-chip assays are used to focus the genome-wide search for TFBSs by isolating a sample of DNA fragments with TFBSs and applying this sample to a microarray with probes corresponding to tiled segments across the genome. Present analytical methods use a two-step approach: (i) analyze array data to estimate IP-enrichment peaks then (ii) analyze the corresponding sequences independently of intensity information. The proposed model integrates peak finding and motif discovery through a unified Bayesian hidden Markov model (HMM) framework that accommodates the inherent uncertainty in both measurements. A Markov chain Monte Carlo algorithm is formulated for parameter estimation, adapting recursive techniques used for HMMs. In simulations and applications to a yeast RAP1 dataset, the proposed method has favorable TFBS discovery performance compared to currently available two-stage procedures in terms of both sensitivity and specificity.

  13. A Bayesian hidden Markov model for motif discovery through joint modeling of genomic sequence and ChIP-chip data

    PubMed Central

    Gelfond, Jonathan A. L.; Gupta, Mayetri; Ibrahim, Joseph G.

    2009-01-01

    SUMMARY We propose a unified framework for the analysis of Chromatin (Ch) Immunoprecipitation (IP) microarray (ChIP-chip) data for detecting transcription factor binding sites (TFBSs) or motifs. ChIP-chip assays are used to focus the genome-wide search for TFBSs by isolating a sample of DNA fragments with TFBSs and applying this sample to a microarray with probes corresponding to tiled segments across the genome. Present analytical methods use a two-step approach: (i) analyze array data to estimate IP enrichment peaks then (ii) analyze the corresponding sequences independently of intensity information. The proposed model integrates peak finding and motif discovery through a unified Bayesian hidden Markov model (HMM) framework that accommodates the inherent uncertainty in both measurements. A Markov Chain Monte Carlo algorithm is formulated for parameter estimation, adapting recursive techniques used for HMMs. In simulations and applications to a yeast RAP1 dataset, the proposed method has favorable TFBS discovery performance compared to currently available two-stage procedures in terms of both sensitivity and specificity. PMID:19210737

  14. The LIMP-2/SCARB2 Binding Motif on Acid β-Glucosidase

    PubMed Central

    Liou, Benjamin; Haffey, Wendy D.; Greis, Kenneth D.; Grabowski, Gregory A.

    2014-01-01

    The acid β-glucosidase (glucocerebrosidase (GCase)) binding sequence to LIMP-2 (lysosomal integral membrane protein 2), the receptor for intracellular GCase trafficking to the lysosome, has been identified. Heterologous expression of deletion constructs, the available GCase crystal structures, and binding and co-localization of identified peptides or mutant GCases were used to identify and characterize a highly conserved 11-amino acid sequence, DSPIIVDITKD, within human GCase. The binding to LIMP-2 is not dependent upon a single amino acid, but the interactions of GCase with LIMP-2 are heavily influenced by Asp399 and the di-isoleucines, Ile402 and Ile403. A single alanine substitution at any of these decreases GCase binding to LIMP-2 and alters its pH-dependent binding as well as diminishing the trafficking of GCase to the lysosome and significantly increasing GCase secretion. Enterovirus 71 also binds to LIMP-2 (also known as SCARB2) on the external surface of the plasma membrane. However, the LIMP-2/SCARB2 binding sequences for enterovirus 71 and GCase are not similar, indicating that LIMP-2/SCARB2 may have multiple or overlapping binding sites with differing specificities. These findings have therapeutic implications for the production of GCase and the distribution of this enzyme that is delivered to various organs. PMID:25202012

  15. De novo computational identification of stress-related sequence motifs and microRNA target sites in untranslated regions of a plant translatome

    PubMed Central

    Munusamy, Prabhakaran; Zolotarov, Yevgen; Meteignier, Louis-Valentin; Moffett, Peter; Strömvik, Martina V.

    2017-01-01

    Gene regulation at the transcriptional and translational level leads to diversity in phenotypes and function in organisms. Regulatory DNA or RNA sequence motifs adjacent to the gene coding sequence act as binding sites for proteins that in turn enable or disable expression of the gene. Whereas the known DNA and RNA binding proteins range in the thousands, only a few motifs have been examined. In this study, we have predicted putative regulatory motifs in groups of untranslated regions from genes regulated at the translational level in Arabidopsis thaliana under normal and stressed conditions. The test group of sequences was divided into random subgroups and subjected to three de novo motif finding algorithms (Seeder, Weeder and MEME). In addition to identifying sequence motifs, using an in silico tool we have predicted microRNA target sites in the 3′ UTRs of the translationally regulated genes, as well as identified upstream open reading frames located in the 5′ UTRs. Our bioinformatics strategy and the knowledge generated contribute to understanding gene regulation during stress, and can be applied to disease and stress resistant plant development. PMID:28276452

  16. Alignment of U3 region sequences of mammalian type C viruses: identification of highly conserved motifs and implications for enhancer design.

    PubMed Central

    Golemis, E A; Speck, N A; Hopkins, N

    1990-01-01

    We aligned published sequences for the U3 region of 35 type C mammalian retroviruses. The alignment reveals that certain sequence motifs within the U3 region are strikingly conserved. A number of these motifs correspond to previously identified sites. In particular, we found that the enhancer region of most of the viruses examined contains a binding site for leukemia virus factor b, a viral corelike element, the consensus motif for nuclear factor 1, and the glucocorticoid response element. Most viruses containing more than one copy of enhancer sequences include these binding sites in both copies of the repeat. We consider this set of binding sites to constitute a framework for the enhancers of this set of viruses. Other highly conserved motifs in the U3 region include the retrovirus inverted repeat sequence, a negative regulatory element, and the CCAAT and TATA boxes. In addition, we identified two novel motifs in the promoter region that were exceptionally highly conserved but have not been previously described. PMID:2153223

  17. WeederH: an algorithm for finding conserved regulatory motifs and regions in homologous sequences

    PubMed Central

    Pavesi, Giulio; Zambelli, Federico; Pesole, Graziano

    2007-01-01

    Background This work addresses the problem of detecting conserved transcription factor binding sites and in general regulatory regions through the analysis of sequences from homologous genes, an approach that is becoming more and more widely used given the ever increasing amount of genomic data available. Results We present an algorithm that identifies conserved transcription factor binding sites in a given sequence by comparing it to one or more homologs, adapting a framework we previously introduced for the discovery of sites in sequences from co-regulated genes. Differently from the most commonly used methods, the approach we present does not need or compute an alignment of the sequences investigated, nor resorts to descriptors of the binding specificity of known transcription factors. The main novel idea we introduce is a relative measure of conservation, assuming that true functional elements should present a higher level of conservation with respect to the rest of the sequence surrounding them. We present tests where we applied the algorithm to the identification of conserved annotated sites in homologous promoters, as well as in distal regions like enhancers. Conclusion Results of the tests show how the algorithm can provide fast and reliable predictions of conserved transcription factor binding sites regulating the transcription of a gene, with better performances than other available methods for the same task. We also show examples on how the algorithm can be successfully employed when promoter annotations of the genes investigated are missing, or when regulatory sites and regions are located far away from the genes. PMID:17286865

  18. A sequence motif enriched in regions bound by the Drosophila dosage compensation complex

    PubMed Central

    2010-01-01

    Background In Drosophila melanogaster, dosage compensation is mediated by the action of the dosage compensation complex (DCC). How the DCC recognizes the fly X chromosome is still poorly understood. Characteristic sequence signatures at all DCC binding sites have not hitherto been found. Results In this study, we compare the known binding sites of the DCC with oligonucleotide profiles that measure the specificity of the sequences of the D. melanogaster X chromosome. We show that the X chromosome regions bound by the DCC are enriched for a particular type of short, repetitive sequences. Their distribution suggests that these sequences contribute to chromosome recognition, the generation of DCC binding sites and/or the local spreading of the complex. Comparative data indicate that the same sequences may be involved in dosage compensation in other Drosophila species. Conclusions These results offer an explanation for the wild-type binding of the DCC along the Drosophila X chromosome, contribute to delineate the forces leading to the establishment of dosage compensation and suggest new experimental approaches to understand the precise biochemical features of the dosage compensation system. PMID:20226017

  19. Phenolic acid esterases, coding sequences and methods

    DOEpatents

    Blum, David L.; Kataeva, Irina; Li, Xin-Liang; Ljungdahl, Lars G.

    2002-01-01

    Described herein are four phenolic acid esterases, three of which correspond to domains of previously unknown function within bacterial xylanases, from XynY and XynZ of Clostridium thermocellum and from a xylanase of Ruminococcus. The fourth specifically exemplified xylanase is a protein encoded within the genome of Orpinomyces PC-2. The amino acids of these polypeptides and nucleotide sequences encoding them are provided. Recombinant host cells, expression vectors and methods for the recombinant production of phenolic acid esterases are also provided.

  20. Structural Analysis of a Repetitive Protein Sequence Motif in Strepsirrhine Primate Amelogenin

    PubMed Central

    Bromley, Keith M.; Hacia, Joseph G.; Bromage, Timothy G.; Snead, Malcolm L.; Moradian-Oldak, Janet; Paine, Michael L.

    2011-01-01

    Strepsirrhines are members of a primate suborder that has a distinctive set of features associated with the development of the dentition. Amelogenin (AMEL), the better known of the enamel matrix proteins, forms 90% of the secreted organic matrix during amelogenesis. Although AMEL has been sequenced in numerous mammalian lineages, the only reported strepsirrhine AMEL sequences are those of the ring-tailed lemur and galago, which contain a set of additional proline-rich tandem repeats absent in all other primates species analyzed to date, but present in some non-primate mammals. Here, we first determined that these repeats are present in AMEL from three additional lemur species and thus are likely to be widespread throughout this group. To evaluate the functional relevance of these repeats in strepsirrhines, we engineered a mutated murine amelogenin sequence containing a similar proline-rich sequence to that of Lemur catta. In the monomeric form, the MQP insertions had no influence on the secondary structure or refolding properties, whereas in the assembled form, the insertions increased the hydrodynamic radii. We speculate that increased AMEL nanosphere size may influence enamel formation in strepsirrhine primates. PMID:21437261

  1. Di-acidic Motifs in the Membrane-distal C Termini Modulate the Transport of Angiotensin II Receptors from the Endoplasmic Reticulum to the Cell Surface

    PubMed Central

    Zhang, Xiaoping; Dong, Chunmin; Wu, Qiong J.; Balch, William E.; Wu, Guangyu

    2011-01-01

    The molecular mechanisms underlying the endoplasmic reticulum (ER) export and cell surface transport of nascent G protein-coupled receptors (GPCRs) have just begun to be revealed, and previous studies have shown that hydrophobic motifs in the putative amphipathic 8th α-helical region within the membrane-proximal C termini play an important role. In this study, we demonstrate that di-acidic motifs in the membrane-distal, nonstructural C-terminal portions are required for the exit from the ER and transport to the plasma membrane of angiotensin II receptors, but not adrenergic receptors. More interestingly, distinct di-acidic motifs dictate optimal export trafficking of different angiotensin II receptors, and the export ability of each acidic residue in the di-acidic motifs cannot be fully substituted by another acidic residue. Moreover, the function of the di-acidic motifs is likely mediated through facilitating the recruitment of the receptors onto the ER-derived COPII transport vesicles. Therefore, the di-acidic motifs located in the membrane-distal C termini may represent the first linear motifs which recruit selective GPCRs onto the COPII vesicles to control their export from the ER. PMID:21507945

  2. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-07-21

    A method is disclosed for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe. 11 figs.

  3. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe.

  4. A novel sorting motif in the glutamate transporter excitatory amino acid transporter 3 directs its targeting in Madin-Darby canine kidney cells and hippocampal neurons.

    PubMed

    Cheng, Chialin; Glover, Greta; Banker, Gary; Amara, Susan G

    2002-12-15

    The glutamate transporter excitatory amino acid transporter 3 (EAAT3) is polarized to the apical surface in epithelial cells and localized to the dendritic compartment in hippocampal neurons, where it is clustered adjacent to postsynaptic sites. In this study, we analyzed the sequences in EAAT3 that are responsible for its polarized localization in Madin-Darby canine kidney (MDCK) cells and neurons. Confocal microscopy and cell surface biotinylation assays demonstrated that deletion of the EAAT3 C terminus or replacement of the C terminus of EAAT3 with the analogous region in EAAT1 eliminated apical localization in MDCK cells. The C terminus of EAAT3 was sufficient to redirect the basolateral-preferring EAAT1 and the nonpolarized EAAT2 to the apical surface. Using alanine substitution mutants, we identified a short peptide motif in the cytoplasmic C-terminal region of EAAT3 that directs its apical localization in MDCK cells. Mutation of this sequence also impairs dendritic targeting of EAAT3 in hippocampal neurons but does not interfere with the clustering of EAAT3 on dendritic spines and filopodia. These data provide the first evidence that an identical cytoplasmic motif can direct apical targeting in epithelia and somatodendritic targeting in neurons. Moreover, our results demonstrate that the two fundamental features of the localization of EAAT3 in neurons, its restriction to the somatodendritic domain and its clustering near postsynaptic sites, are mediated by distinct molecular mechanisms.

  5. On the Concept of Cis-regulatory Information: From Sequence Motifs to Logic Functions

    NASA Astrophysics Data System (ADS)

    Tarpine, Ryan; Istrail, Sorin

    The regulatory genome is about the “system level organization of the core genomic regulatory apparatus, and how this is the locus of causality underlying the twin phenomena of animal development and animal evolution” (E.H. Davidson. The Regulatory Genome: Gene Regulatory Networks in Development and Evolution, Academic Press, 2006). Information processing in the regulatory genome is done through regulatory states, defined as sets of transcription factors (sequence-specific DNA binding proteins which determine gene expression) that are expressed and active at the same time. The core information processing machinery consists of modular DNA sequence elements, called cis-modules, that interact with transcription factors. The cis-modules “read” the information contained in the regulatory state of the cell through transcription factor binding, “process” it, and directly or indirectly communicate with the basal transcription apparatus to determine gene expression. This endowment of each gene with the information-receiving capacity through their cis-regulatory modules is essential for the response to every possible regulatory state to which it might be exposed during all phases of the life cycle and in all cell types. We present here a set of challenges addressed by our CYRENE research project aimed at studying the cis-regulatory code of the regulatory genome. The CYRENE Project is devoted to (1) the construction of a database, the cis-Lexicon, containing comprehensive information across species about experimentally validated cis-regulatory modules; and (2) the software development of a next-generation genome browser, the cis-Browser, specialized for the regulatory genome. The presentation is anchored on three main computational challenges: the Gene Naming Problem, the Consensus Sequence Bottleneck Problem, and the Logic Function Inference Problem.

  6. Peptide sequences identified by phage display are immunodominant functional motifs of Pet and Pic serine proteases secreted by Escherichia coli and Shigella flexneri.

    PubMed

    Ulises, Hernández-Chiñas; Tatiana, Gazarian; Karlen, Gazarian; Guillermo, Mendoza-Hernández; Juan, Xicohtencatl-Cortes; Carlos, Eslava

    2009-12-01

    Plasmid-encoded toxin (Pet) and protein involved in colonization (Pic) are serine protease autotransporters of Enterobacteriaceae (SPATEs) secreted by enteroaggregative Escherichia coli (EAEC), which display the GDSGSG sequence or the serine motif. Our research was directed to localize functional sites in both proteins using the phage display method. From 12mer linear and 7mer cysteine-constrained (C7C) libraries displayed on the M13 phage pIII protein we selected different mimotopes using IgG purified from sera of children naturally infected with EAEC producing Pet and Pic proteins, and anti-Pet and anti-Pic IgG purified from rabbits immunized with each one of these proteins. Children's IgG selected a homologous group of sequences forming the consensus sequence motif PQPxK, and the motifs PGxI/LN and CxPDDSSxC were selected by the rabbit anti-Pet and anti-Pic IgGs, respectively. Analysis of the amino terminal region of a panel of SPATEs showed the presence in all of them of sequences matching the PGxI/LN or CxPDDSSxC motifs, and in a three-dimensional model (Modeller 9v2) designed for Pet, both these motifs were found in the globular portion of the protein, close to the protease active site GDSGSG. Antibodies induced in mice by mimotopes carrying the three aforementioned motifs were reactive with Pet, Pic, and with synthetic peptides carrying the immunogenic mimotope sequences TYPGYINHSKA and LLPQPPKLLLP, thus confirming that the peptide moiety of the selected phages induced the antibodies specific for the toxins. The antibodies induced in mice to the PGxI/LN and CxPDDSSxC mimotopes inhibited fodrin proteolysis and macrophage chemotaxis biological activities of Pet. Our results showed that we were able to generate, by a phage display procedure, mimotopes with sequence motifs PGxI/LN and CxPDDSSxC, and to identify them as functional motifs of the Pet, Pic and other SPATEs involved in their biological activities.
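
    The three consensus motifs named in this abstract can be written as regular expressions and checked against the two mimotope peptides it quotes. In this sketch "x" is read as any residue and "I/L" as isoleucine or leucine at that position, which is an interpretation of the notation rather than something the abstract spells out. The names MOTIFS and scan_motifs are hypothetical.

```python
import re

# Motif names as written in the abstract, mapped to regex equivalents.
MOTIFS = {
    "PQPxK": re.compile(r"PQP.K"),
    "PGxI/LN": re.compile(r"PG.[IL]N"),
    "CxPDDSSxC": re.compile(r"C.PDDSS.C"),
}


def scan_motifs(peptide):
    """Return {motif_name: [1-based start positions]} for motifs present."""
    found = {}
    for name, regex in MOTIFS.items():
        starts = [m.start() + 1 for m in regex.finditer(peptide.upper())]
        if starts:
            found[name] = starts
    return found


# The two mimotope sequences quoted in the abstract.
print(scan_motifs("TYPGYINHSKA"))
print(scan_motifs("LLPQPPKLLLP"))
```

    Under this reading, TYPGYINHSKA carries the PGxI/LN motif and LLPQPPKLLLP carries the PQPxK motif, each starting at residue 3.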

  7. High-throughput sequencing enhanced phage display enables the identification of patient-specific epitope motifs in serum

    PubMed Central

    Christiansen, Anders; Kringelum, Jens V.; Hansen, Christian S.; Bøgh, Katrine L.; Sullivan, Eric; Patel, Jigar; Rigby, Neil M.; Eiwegger, Thomas; Szépfalusi, Zsolt; Masi, Federico de; Nielsen, Morten; Lund, Ole; Dufva, Martin

    2015-01-01

    Phage display is a prominent screening technique with a multitude of applications including therapeutic antibody development and mapping of antigen epitopes. In this study, phages were selected based on their interaction with patient serum and exhaustively characterised by high-throughput sequencing. A bioinformatics approach was developed in order to identify peptide motifs of interest based on clustering and contrasting to control samples. Comparison of patient and control samples confirmed a major issue in phage display, namely the selection of unspecific peptides. The potential of the bioinformatic approach was demonstrated by identifying epitopes of a prominent peanut allergen, Ara h 1, in sera from patients with severe peanut allergy. The identified epitopes were confirmed by high-density peptide micro-arrays. The present study demonstrates that high-throughput sequencing can empower phage display by (i) enabling the analysis of complex biological samples, (ii) circumventing the traditional laborious picking and functional testing of individual phage clones and (iii) reducing the number of selection rounds. PMID:26246327

  8. CpG island erosion, polycomb occupancy and sequence motif enrichment at bivalent promoters in mammalian embryonic stem cells.

    PubMed

    Mantsoki, Anna; Devailly, Guillaume; Joshi, Anagha

    2015-11-19

    In embryonic stem (ES) cells, developmental regulators have a characteristic bivalent chromatin signature marked by simultaneous presence of both activation (H3K4me3) and repression (H3K27me3) signals and are thought to be in a 'poised' state for subsequent activation or silencing during differentiation. We collected eleven pairs (H3K4me3 and H3K27me3) of ChIP sequencing datasets in human ES cells and eight pairs in murine ES cells, and predicted high-confidence (HC) bivalent promoters. Over 85% of H3K27me3 marked promoters were bivalent in human and mouse ES cells. We found that (i) HC bivalent promoters were enriched for developmental factors and were highly likely to be differentially expressed upon transcription factor perturbation; (ii) murine HC bivalent promoters were occupied by both polycomb repressive component classes (PRC1 and PRC2) and grouped into four distinct clusters with different biological functions; (iii) HC bivalent and active promoters were CpG rich while H3K27me3-only promoters lacked CpG islands. Binding enrichment of distinct sets of regulators distinguished bivalent from active promoters. Moreover, a 'TCCCC' sequence motif was specifically enriched in bivalent promoters. Finally, this analysis will serve as a resource for future studies to further understand transcriptional regulation during embryonic development.
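
    A simple way to look at the 'TCCCC' observation is to count occurrences of the motif on both strands in two sets of promoter sequences and compare densities. The sketch below does only that counting. It is not the enrichment procedure used in the study, it applies no statistical test, and the function names are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of an A/C/G/T string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_overlapping(seq, motif):
    """Count occurrences of `motif` in `seq`, allowing overlaps."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def motif_hits_both_strands(seq, motif="TCCCC"):
    """Count the motif on the given strand and on its reverse complement."""
    seq = seq.upper()
    motif = motif.upper()
    rc = reverse_complement(motif)
    hits = count_overlapping(seq, motif)
    if rc != motif:  # avoid double-counting palindromic motifs
        hits += count_overlapping(seq, rc)
    return hits


def hits_per_kb(seqs, motif="TCCCC"):
    """Motif density (hits per 1000 bp) over a collection of sequences."""
    total_len = sum(len(s) for s in seqs)
    total_hits = sum(motif_hits_both_strands(s, motif) for s in seqs)
    return 1000.0 * total_hits / total_len if total_len else 0.0
```

    Comparing hits_per_kb for a set of bivalent promoters with the value for a background set gives a crude enrichment ratio; a real analysis would also control for GC content and sequence length.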

  9. CpG island erosion, polycomb occupancy and sequence motif enrichment at bivalent promoters in mammalian embryonic stem cells

    PubMed Central

    Mantsoki, Anna; Devailly, Guillaume; Joshi, Anagha

    2015-01-01

    In embryonic stem (ES) cells, developmental regulators have a characteristic bivalent chromatin signature marked by simultaneous presence of both activation (H3K4me3) and repression (H3K27me3) signals and are thought to be in a ‘poised’ state for subsequent activation or silencing during differentiation. We collected eleven pairs (H3K4me3 and H3K27me3) of ChIP sequencing datasets in human ES cells and eight pairs in murine ES cells, and predicted high-confidence (HC) bivalent promoters. Over 85% of H3K27me3 marked promoters were bivalent in human and mouse ES cells. We found that (i) HC bivalent promoters were enriched for developmental factors and were highly likely to be differentially expressed upon transcription factor perturbation; (ii) murine HC bivalent promoters were occupied by both polycomb repressive component classes (PRC1 and PRC2) and grouped into four distinct clusters with different biological functions; (iii) HC bivalent and active promoters were CpG rich while H3K27me3-only promoters lacked CpG islands. Binding enrichment of distinct sets of regulators distinguished bivalent from active promoters. Moreover, a ‘TCCCC’ sequence motif was specifically enriched in bivalent promoters. Finally, this analysis will serve as a resource for future studies to further understand transcriptional regulation during embryonic development. PMID:26582124

  10. Machine learning study of classifiers trained with biophysiochemical properties of amino acids to predict fibril forming Peptide motifs.

    PubMed

    Kumaran Nair, Smitha Sunil; Subba Reddy, N V; Hareesha, K S

    2012-09-01

    It is important to understand the cause of amyloid illnesses by predicting the short protein fragments capable of forming amyloid-like fibril motifs aiding in the discovery of sequence-targeted anti-aggregation drugs. It is extremely desirable to design computational tools to provide affordable in silico predictions owing to the limitations of molecular techniques for their identification. In this research article, we tried to study, from a machine learning perspective, the performance of several machine learning classifiers that use heterogeneous features based on biochemical and biophysical properties of amino acids to discriminate between amyloidogenic and non-amyloidogenic regions in peptides. Four conventional machine learning classifiers, namely Support Vector Machine, Neural network, Decision tree and Random forest, were trained and tested to find the best classifier that fits the problem domain well. Prior to classification, novel implementations of two biologically-inspired feature optimization techniques based on evolutionary algorithms and methodologies that mimic social life and a multivariate method based on projection are utilized in order to remove the unimportant and uninformative features. Among the dimensionality reduction algorithms considered under the study, prediction results show that algorithms based on evolutionary computation are the most effective. SVM best suits the problem domain among the classifiers considered. The best classifier is also compared with an online predictor to evidence the equilibrium maintained between true positive rates and false positive rates in the proposed classifier. This exploratory study suggests that these methods are promising in providing amyloidogenicity prediction and may be further extended for large-scale proteomic studies.

  11. Methods for analyzing nucleic acid sequences

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid. The method provides a complex comprising a polymerase enzyme, a target nucleic acid molecule, and a primer, wherein the complex is immobilized on a support. A fluorescent label is attached to a terminal phosphate group of the nucleotide or nucleotide analog. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The time duration of the signal from labeled nucleotides or nucleotide analogs that become incorporated is distinguished from freely diffusing labels by a longer retention in the observation volume for the nucleotides or nucleotide analogs that become incorporated than for the freely diffusing labels.

  12. Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast.

    PubMed

    Tsai, Zing Tsung-Yeh; Shiu, Shin-Han; Tsai, Huai-Kuang

    2015-08-01

    Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding compared to SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a "TF-agnostic" predictive model based on three DNA "intrinsic properties" (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model allows prediction of binding regions across DNA-binding domain families, it is TF agnostic and likely describes general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF agnostic model for identifying functional regulatory regions in potentially any sequenced genome.

  13. Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast

    PubMed Central

    Tsai, Zing Tsung-Yeh; Shiu, Shin-Han; Tsai, Huai-Kuang

    2015-01-01

    Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding compared to SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a "TF-agnostic" predictive model based on three DNA “intrinsic properties” (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model allows prediction of binding regions across DNA-binding domain families, it is TF agnostic and likely describes general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF agnostic model for identifying functional regulatory regions in potentially any sequenced genome. PMID:26291518

  14. An aspartic acid at amino acid 108 is required to rescue infectious virus after transfection of a poliovirus cDNA containing a CGDD but not SGDD amino acid motif in 3Dpol.

    PubMed Central

    Walker, D E; McPherson, D; Jablonski, S A; McPherson, S; Morrow, C D

    1995-01-01

    Dpol did not result in the production of virus. Surprisingly, transfection of the poliovirus cDNAs containing the 3D-D-108/C-326 double mutation, but not the 3D-D-108/S-326 mutation, resulted in the production of virus. The virus obtained from transfection of poliovirus cDNAs containing 3D-D-108/C-326 mutation replicated with kinetics similar to that of the wild-type virus. RNA sequence analysis of the region of the 3Dpol containing the 3D-C-326 mutation revealed that the codon for cysteine (UGC) reverted to the codon for tyrosine (UAC). The results of these studies establish that under the appropriate conditions, poliovirus has the capacity to revert mutations within the YGDD amino acid motif of the poliovirus 3Dpol gene and further strengthen the idea that interaction between amino acid 108 and the YGDD region of 3Dpol is required for viral replication. PMID:7494345

  15. Dual hydrogen-bonding motifs in complexes formed between tropolone and formic acid

    NASA Astrophysics Data System (ADS)

    Nemchick, Deacon J.; Cohen, Michael K.; Vaccaro, Patrick H.

    2016-11-01

    The near-ultraviolet π*←π absorption system of weakly bound complexes formed between tropolone (TrOH) and formic acid (FA) under cryogenic free-jet expansion conditions has been interrogated by exploiting a variety of fluorescence-based laser-spectroscopic probes, with synergistic quantum-chemical calculations built upon diverse model chemistries being enlisted to unravel the structural and dynamical properties of the pertinent ground [X̃ ¹A′] and excited [Ã ¹A′(π*π)] electronic states. For binary TrOH · FA adducts, the presence of dual hydrogen-bond linkages gives rise to three low-lying isomers designated (in relative energy order) as INT, EXT1, and EXT2 depending on whether docking of the FA ligand to the TrOH substrate takes place internal or external to the five-membered reaction cleft of tropolone. While the symmetric double-minimum topography predicted for the INT potential surface mediates an intermolecular double proton-transfer event, the EXT1 and EXT2 structures are interconverted by an asymmetric single proton-transfer process that is TrOH-centric in nature. The Ã-X̃ origin of TrOH · FA at ν̃00 = 27 484.45 cm⁻¹ is displaced by δν̃00 = +466.76 cm⁻¹ with respect to the analogous feature for bare tropolone and displays a hybrid type-a/b rotational contour that reflects the configuration of binding. A comprehensive analysis of vibrational landscapes supported by the optically connected X̃ ¹A′ and Ã ¹A′(π*π) manifolds, including the characteristic isotopic shifts incurred by partial deuteration of the labile TrOH and FA protons, has been performed leading to the uniform assignment of numerous intermolecular (viz., modulating hydrogen-bond linkages) and intramolecular (viz., localized on monomer subunits) degrees of freedom. The holistic interpretation of all experimental and computational findings affords compelling evidence that an external-binding motif (attributed to EXT1), rather than the

  16. Flow Cytometry-assisted Cloning of Specific Sequence Motifs from Complex 16S ribosomal RNA Gene Libraries.

    SciTech Connect

    Nielsen, J.L.; Schramm, A.; Bernhard, A.E.; van den Engh, G.J.; Stahl, D.A.

    2004-07-21

    A flow cytometry method was developed for rapid screening and recovery of cloned DNA containing common sequence motifs. This approach, termed fluorescence-activated cell sorting-assisted cloning, was used to recover sequences affiliated with a unique lineage within the Bacteroidetes not abundant in a clone library of environmental 16S rRNA genes. Retrieval and sequence analysis of phylogenetically informative genes has become a standard cultivation-independent technique to investigate microbial diversity in nature (7, 18). Genes encoding the 16S rRNA, because of the relative ease of their selective amplification, have been most frequently employed for general diversity surveys (16). Environmental studies have also focused on specific subpopulations affiliated with a phylogenetic group or identified by genes encoding specific metabolic functions (e.g., ammonia oxidation, sulfate respiration, and nitrate reduction) (8, 15, 20). However, specific populations may be of low abundance (1, 23), or the genes encoding specific metabolic functions may be insufficiently conserved to provide priming sites for general PCR amplification. Three general approaches have been used to obtain 16S rRNA sequence information from low-abundance populations: screening hundreds to thousands of clones in a general 16S rRNA gene library (21), flow cytometric sorting of a subpopulation of environmentally derived cells labeled by fluorescent in situ hybridization (FISH) (27), or selective PCR amplification using primers specific for the subpopulation (2, 23). While the first approach is simply time-consuming and tedious, the second has been restricted to fairly large and strongly fluorescent cells from aquatic samples (5, 27). The third approach often generates fragments of only a few hundred bases due to the limited number of specific priming sites. Partial sequence information often degrades analysis, obscuring or distorting the phylogenetic placement of the new sequences (11, 20). A more robust characterization of environ

  17. Localization and trafficking of an isoform of the AtPRA1 family to the Golgi apparatus depend on both N- and C-terminal sequence motifs.

    PubMed

    Jung, Chan Jin; Lee, Myoung Hui; Min, Myung Ki; Hwang, Inhwan

    2011-02-01

    Prenylated Rab acceptors (PRAs) bind to prenylated Rab proteins and possibly aid in targeting Rabs to their respective compartments. In Arabidopsis, 19 isoforms of PRA1 have been identified and, depending upon the isoforms, they localize to the endoplasmic reticulum (ER), Golgi apparatus and endosomes. Here, we investigated the localization and trafficking of AtPRA1.B6, an isoform of the Arabidopsis PRA1 family. In colocalization experiments with various organellar markers, AtPRA1.B6 tagged with hemagglutinin (HA) at the N-terminus localized to the Golgi apparatus in protoplasts and transgenic plants. The valine residue at the C-terminal end and an EEE motif in the C-terminal cytoplasmic domain were critical for anterograde trafficking from the ER to the Golgi apparatus. The N-terminal region contained a sequence motif for retention of AtPRA1.B6 at the Golgi apparatus. In addition, anterograde trafficking of AtPRA1.B6 from the ER to the Golgi apparatus was highly sensitive to the HA:AtPRA1.B6 level. The region that contains the sequence motif for Golgi retention also conferred the abundance-dependent trafficking inhibition. On the basis of these results, we propose that AtPRA1.B6 localizes to the Golgi apparatus and its ER-to-Golgi trafficking and localization to the Golgi apparatus are regulated by multiple sequence motifs in both the C- and N-terminal cytoplasmic domains.

  18. Amino acid substitutions in the FXYD motif enhance phospholemman-induced modulation of cardiac L-type calcium channels.

    PubMed

    Guo, Kai; Wang, Xianming; Gao, Guofeng; Huang, Congxin; Elmslie, Keith S; Peterson, Blaise Z

    2010-11-01

    We have found that phospholemman (PLM) associates with and modulates the gating of cardiac L-type calcium channels (Wang et al., Biophys J 98: 1149-1159, 2010). The short 17 amino acid extracellular NH(2)-terminal domain of PLM contains a highly conserved PFTYD sequence that defines it as a member of the FXYD family of ion transport regulators. Although we have learned a great deal about PLM-dependent changes in calcium channel gating, little is known regarding the molecular mechanisms underlying the observed changes. Therefore, we investigated the role of the PFTYD segment in the modulation of cardiac calcium channels by individually replacing Pro-8, Phe-9, Thr-10, Tyr-11, and Asp-12 with alanine (P8A, F9A, T10A, Y11A, D12A). In addition, Asp-12 was changed to lysine (D12K) and cysteine (D12C). As expected, wild-type PLM significantly slows channel activation and deactivation and enhances voltage-dependent inactivation (VDI). We were surprised to find that amino acid substitutions at Thr-10 and Asp-12 significantly enhanced the ability of PLM to modulate Ca(V)1.2 gating. T10A exhibited a twofold enhancement of PLM-induced slowing of activation, whereas D12K and D12C dramatically enhanced PLM-induced increase of VDI. The PLM-induced slowing of channel closing was abrogated by D12A and D12C, whereas D12K and T10A failed to impact this effect. These studies demonstrate that the PFXYD motif is not necessary for the association of PLM with Ca(V)1.2. Instead, since altering the chemical and/or physical properties of the PFXYD segment alters the relative magnitudes of opposing PLM-induced effects on Ca(V)1.2 channel gating, PLM appears to play an important role in fine tuning the gating kinetics of cardiac calcium channels and likely plays an important role in shaping the cardiac action potential and regulating Ca(2+) dynamics in the heart.

  19. The C-Terminal Sequence and PI motif of the Orchid (Oncidium Gower Ramsey) PISTILLATA (PI) Ortholog Determine its Ability to Bind AP3 Orthologs and Enter the Nucleus to Regulate Downstream Genes Controlling Petal and Stamen Formation.

    PubMed

    Mao, Wan-Ting; Hsu, Hsing-Fun; Hsu, Wei-Han; Li, Jen-Ying; Lee, Yung-I; Yang, Chang-Hsien

    2015-11-01

    This study focused on the investigation of the effects of the PI motif and C-terminus of the Oncidium Gower Ramsey MADS box gene 8 (OMADS8), a PISTILLATA (PI) ortholog, on floral organ formation. 35S::OMADS8 completely rescued and 35S::OMADS8-PI (with the PI motif deleted) partially rescued petal/stamen formation, whereas these deficiencies were not rescued by 35S::OMADS8-C (C-terminal 29 amino acids deleted) in pi-1 mutants. OMADS8 could interact with Arabidopsis APETALA3 (AP3) and enter the nucleus. The nuclear entry efficiency was reduced for OMADS8-PI/AP3 and OMADS8-C/AP3. OMADS8 could also interact with OMADS5/OMADS9 (the Oncidium AP3 ortholog) and enter the nucleus with an efficiency only slightly affected by the deletion of the C-terminal sequence or PI motif. However, the stability of the OMADS8/OMADS5 and OMADS8/OMADS9 complexes was significantly reduced by deletion of the C-terminal sequence or PI motif. Further analysis indicated that the expression of genes downstream of AP3/PI (BNQ1/BNQ2/GNC/At4g30270) was compensated by 35S::OMADS8 and 35S::OMADS8-PI to a level similar to wild-type plants but was not affected by 35S::OMADS8-C in the pi-1 mutants. A similar FRET (fluorescence resonance energy transfer) efficiency was observed for Arabidopsis AGAMOUS (AG) and the Oncidium AG ortholog OMADS4 for OMADS8, OMADS8-PI and OMADS8-C. These results indicated that the OMADS8 PI motif and C-terminus were valuable for the interaction of OMADS8 with the AP3 orthologs to form higher order heterotetrameric complexes that regulated petal/stamen formation in both Oncidium orchids and transgenic Arabidopsis. However, the C-terminal sequence and PI motif were dispensable for the interaction of OMADS8 with the AG orthologs.

  20. Protospacer recognition motifs

    PubMed Central

    Shah, Shiraz A.; Erdmann, Susanne; Mojica, Francisco J.M.; Garrett, Roger A.

    2013-01-01

    Protospacer adjacent motifs (PAMs) were originally characterized for CRISPR-Cas systems that were classified on the basis of their CRISPR repeat sequences. A few short 2–5 bp sequences were identified adjacent to one end of the protospacers. Experimental and bioinformatical results linked the motif to the excision of protospacers and their insertion into CRISPR loci. Subsequently, evidence accumulated from different virus- and plasmid-targeting assays, suggesting that these motifs were also recognized during DNA interference, at least for the recently classified type I and type II CRISPR-based systems. The two processes, spacer acquisition and protospacer interference, employ different molecular mechanisms, and there is increasing evidence to suggest that the sequence motifs that are recognized, while overlapping, are unlikely to be identical. In this article, we consider the properties of PAM sequences and summarize the evidence for their dual functional roles. It is proposed to use the terms protospacer associated motif (PAM) for the conserved DNA sequence and to employ spacer acquisition motif (SAM) and target interference motif (TIM), respectively, for acquisition and interference recognition sites. PMID:23403393

  1. The Thiamin Pyrophosphate-Motif

    NASA Technical Reports Server (NTRS)

    Dominiak, Paulina M.; Ciszak, Ewa M.

    2003-01-01

    Using databases the authors have identified a common thiamin pyrophosphate (TPP)-motif in the family of functionally diverse TPP-dependent enzymes. This common motif consists of multimeric organization of subunits, two catalytic centers, common amino acid sequence, and specific contacts to provide a flip-flop, or alternate site, mechanism of action. Each catalytic center [PP:PYR] is formed at the interface of the PP-domain binding the magnesium ion, pyrophosphate and aminopyrimidine ring of TPP, and the PYR-domain binding the aminopyrimidine ring of that cofactor. A pair of these catalytic centers constitutes the catalytic core [PP:PYR]* within these enzymes. Analysis of the structural elements of this catalytic core reveals novel definition of the common amino acid sequences, which are GX@&(G)@XXGQ, and GDGX25-30 within the PP- domain, and the E&(G)@XXG@ within the PYR-domain, where Q, corresponds to a hydrophobic amino acid. This TPP-motif provides a novel tool for annotation of TPP-dependent enzymes useful in advancing functional proteomics.

  2. 77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-29

    ... Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request. SUMMARY: The United States....'' SUPPLEMENTARY INFORMATION: I. Abstract Patent applications that contain nucleotide and/or amino acid...

  3. Nucleotide and derived amino acid sequences of the major porin of Comamonas acidovorans and comparison of porin primary structures.

    PubMed Central

    Gerbl-Rieger, S; Peters, J; Kellermann, J; Lottspeich, F; Baumeister, W

    1991-01-01

    The DNA sequence of the gene which codes for the major outer membrane porin (Omp32) of Comamonas acidovorans has been determined. The structural gene encodes a precursor consisting of 351 amino acid residues with a signal peptide of 19 amino acid residues. Comparisons with amino acid sequences of outer membrane proteins and porins from several other members of the class Proteobacteria and of the Chlamydia trachomatis porin and the Neurospora crassa mitochondrial porin revealed a motif of eight regions of local homology. The results of this analysis are discussed with regard to common structural features of porins. PMID:1848840

  4. Native characterization of nucleic acid motif thermodynamics via non-covalent catalysis

    PubMed Central

    Wang, Chunyan; Bae, Jin H.; Zhang, David Yu

    2016-01-01

    DNA hybridization thermodynamics is critical for accurate design of oligonucleotides for biotechnology and nanotechnology applications, but parameters currently in use are inaccurately extrapolated based on limited quantitative understanding of thermal behaviours. Here, we present a method to measure the ΔG° of DNA motifs at temperatures and buffer conditions of interest, with significantly better accuracy (6- to 14-fold lower s.e.) than prior methods. The equilibrium constant of a reaction with thermodynamics closely approximating that of a desired motif is numerically calculated from directly observed reactant and product equilibrium concentrations; a DNA catalyst is designed to accelerate equilibration. We measured the ΔG° of terminal fluorophores, single-nucleotide dangles and multinucleotide dangles, in temperatures ranging from 10 to 45 °C. PMID:26782977
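
    The step from measured equilibrium concentrations to ΔG° described above rests on the standard relation ΔG° = −RT ln K. The sketch below applies that relation only; it does not reproduce the authors' catalytic measurement scheme, and the function names and example concentrations are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def equilibrium_constant(products: list[float], reactants: list[float]) -> float:
    """K = product of product concentrations / product of reactant concentrations.

    Concentrations are in mol/L and taken relative to a 1 M standard state.
    """
    return math.prod(products) / math.prod(reactants)


def delta_g_standard(k_eq: float, temp_celsius: float) -> float:
    """Standard free energy change in kcal/mol from an equilibrium constant."""
    if k_eq <= 0:
        raise ValueError("equilibrium constant must be positive")
    temp_kelvin = temp_celsius + 273.15
    return -R_KCAL * temp_kelvin * math.log(k_eq)
```

    With these definitions, an equilibrium constant of 10 at 25 °C corresponds to roughly −1.36 kcal/mol, and K = 1 gives ΔG° = 0 at any temperature.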

  5. Native characterization of nucleic acid motif thermodynamics via non-covalent catalysis

    NASA Astrophysics Data System (ADS)

    Wang, Chunyan; Bae, Jin H.; Zhang, David Yu

    2016-01-01

    DNA hybridization thermodynamics is critical for accurate design of oligonucleotides for biotechnology and nanotechnology applications, but parameters currently in use are inaccurately extrapolated based on limited quantitative understanding of thermal behaviours. Here, we present a method to measure the ΔG° of DNA motifs at temperatures and buffer conditions of interest, with significantly better accuracy (6- to 14-fold lower s.e.) than prior methods. The equilibrium constant of a reaction with thermodynamics closely approximating that of a desired motif is numerically calculated from directly observed reactant and product equilibrium concentrations; a DNA catalyst is designed to accelerate equilibration. We measured the ΔG° of terminal fluorophores, single-nucleotide dangles and multinucleotide dangles, in temperatures ranging from 10 to 45 °C.

  6. Native characterization of nucleic acid motif thermodynamics via non-covalent catalysis.

    PubMed

    Wang, Chunyan; Bae, Jin H; Zhang, David Yu

    2016-01-19

    DNA hybridization thermodynamics is critical for accurate design of oligonucleotides for biotechnology and nanotechnology applications, but parameters currently in use are inaccurately extrapolated based on limited quantitative understanding of thermal behaviours. Here, we present a method to measure the ΔG° of DNA motifs at temperatures and buffer conditions of interest, with significantly better accuracy (6- to 14-fold lower s.e.) than prior methods. The equilibrium constant of a reaction with thermodynamics closely approximating that of a desired motif is numerically calculated from directly observed reactant and product equilibrium concentrations; a DNA catalyst is designed to accelerate equilibration. We measured the ΔG° of terminal fluorophores, single-nucleotide dangles and multinucleotide dangles, in temperatures ranging from 10 to 45 °C.

  7. Identification of a novel calcium binding motif based on the detection of sequence insertions in the animal peroxidase domain of bacterial proteins.

    PubMed

    Santamaría-Hernando, Saray; Krell, Tino; Ramos-González, María-Isabel

    2012-01-01

    Proteins of the animal heme peroxidase (ANP) superfamily differ greatly in size since they have either one or two catalytic domains that match profile PS50292. The orf PP_2561 of Pseudomonas putida KT2440 that we have called PepA encodes a two-domain ANP. The alignment of these domains with those of PepA homologues revealed a variable number of insertions with the consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D. This motif has also been detected in the structure of pseudopilin (pdb 3G20), where it was found to be involved in Ca(2+) coordination although a sequence analysis did not reveal the presence of any known calcium binding motifs in this protein. Isothermal titration calorimetry revealed that a peptide containing this consensus motif specifically bound calcium ions with affinities ranging between 33 and 79 µM depending on the pH. Microcalorimetric titrations of the purified N-terminal ANP-like domain of PepA revealed Ca(2+) binding with a K(D) of 12 µM and stoichiometry of 1.25 calcium ions per protein monomer. This domain exhibited peroxidase activity after its reconstitution with heme. These data led to the definition of a novel calcium binding motif that we have termed PERCAL and which was abundantly present in animal peroxidase-like domains of bacterial proteins. Bacterial heme peroxidases thus possess two different types of calcium binding motifs, namely PERCAL and the related hemolysin type calcium binding motif, with the latter being located outside the catalytic domains and in their C-terminal end. A phylogenetic tree of ANP-like catalytic domains of bacterial proteins with PERCAL motifs, including single domain peroxidases, was divided into two major clusters, representing domains with and without PERCAL motif containing insertions. We have verified that the recently reported classification of bacterial heme peroxidases in two families (cd09819 and cd09821) is unrelated to these insertions. Sequences matching PERCAL were detected in all kingdoms of life.
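
    The consensus quoted above, G-x-D-G-x-x-[GN]-[TN]-x-D-D, is written in a PROSITE-like notation that maps directly onto a regular expression. The following sketch shows one way to scan a protein sequence for it. The helper names are hypothetical, the converter handles only the elements that appear in this consensus ('x', literal residues and bracketed alternatives), and the test peptide in the usage note is invented for illustration.

```python
import re

PERCAL = "G-x-D-G-x-x-[GN]-[TN]-x-D-D"  # consensus as given in the abstract


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex string.

    Supports only 'x' (any residue), single residues and [..] alternatives.
    """
    parts = []
    for element in pattern.split("-"):
        parts.append("." if element == "x" else element)
    return "".join(parts)


def find_motif(sequence: str, pattern: str = PERCAL) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for every match, overlaps included."""
    # The lookahead lets the scan report overlapping matches as well.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]
```

    For the invented peptide "MKGADGSTGTLDDAA", find_motif returns a single hit starting at position 2 with the matched text "GADGSTGTLDD".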

  8. Detection of nucleic acid sequences by invader-directed cleavage

    DOEpatents

    Brow, Mary Ann D.; Hall, Jeff Steven Grotelueschen; Lyamichev, Victor; Olive, David Michael; Prudent, James Robert

    1999-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The 5' nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge.

  9. Formation and Dissociation of the Interstrand i-Motif by the Sequences d(XnC4Ym) Monitored with Electrospray Ionization Mass Spectrometry

    NASA Astrophysics Data System (ADS)

    Cao, Yanwei; Qin, Yujiao; Bruist, Michael; Gao, Shang; Wang, Bing; Wang, Huixin; Guo, Xinhua

    2015-06-01

    Formation and dissociation of the interstrand i-motifs by DNA with the sequence d(XnC4Ym) (X and Y represent thymine, adenine, or guanine, and n, m range from 0 to 2) are studied with electrospray ionization mass spectrometry (ESI-MS), circular dichroism (CD), and UV spectrophotometry. The ion complexes detected in the gas phase and the melting temperatures (Tm) obtained in solution show that a non-C base residue located at the 5' end favors formation of the four-stranded structures, with T > A > G for imparting stability. Comparatively, no rule is found when a non-C base is located at the 3' end. Detection of penta- and hexa-stranded ions indicates the formation of i-motifs with more than four strands. In addition, the i-motifs seen in our mass spectra are accompanied by single-, double-, and triple-stranded ions, and the trimeric ions were always less abundant during the annealing and heat-induced dissociation processes of the DNA strands in solution (pH = 4.5). This provides direct evidence of a strand-by-strand formation and dissociation pathway of the interstrand i-motif, in which formation of the triple strands is the rate-limiting step. In contrast, the trimeric ions are abundant when the tetramolecular ions are subjected to collision-induced dissociation (CID) in the gas phase, suggesting different dissociation behaviors of the interstrand i-motif in the gas phase and in solution. Furthermore, hysteretic UV absorption melting and cooling curves reveal an irreversible dissociation and association kinetic process of the interstrand i-motif in solution.

  10. Unlocked nucleic acids with a pyrene-modified uracil: synthesis, hybridization studies, fluorescent properties and i-motif stability.

    PubMed

    Perlíková, Pavla; Karlsen, Kasper K; Pedersen, Erik B; Wengel, Jesper

    2014-01-03

    The synthesis of two new phosphoramidite building blocks for the incorporation of 5-(pyren-1-yl)uracilyl unlocked nucleic acid (UNA) monomers into oligonucleotides has been developed. Monomers containing a pyrene-modified nucleobase component were found to destabilize an i-motif structure at pH 5.2, both under molecular crowding and noncrowding conditions. The presence of the pyrene-modified UNA monomers in DNA strands led to decreases in the thermal stabilities of DNA*/DNA and DNA*/RNA duplexes, but these duplexes' thermal stabilities were better than those of duplexes containing unmodified UNA monomers. Pyrene-modified UNA monomers incorporated in bulges were able to stabilize DNA*/DNA duplexes due to intercalation of the pyrene moiety into the duplexes. Steady-state fluorescence emission studies of oligonucleotides containing pyrene-modified UNA monomers revealed decreases in fluorescence intensities upon hybridization to DNA or RNA. Efficient quenching of fluorescence of pyrene-modified UNA monomers was observed after formation of i-motif structures at pH 5.2. The stabilizing/destabilizing effect of pyrene-modified nucleic acids might be useful for designing antisense oligonucleotides and hybridization probes.

  11. Using Weeder, Pscan, and PscanChIP for the Discovery of Enriched Transcription Factor Binding Site Motifs in Nucleotide Sequences.

    PubMed

    Zambelli, Federico; Pesole, Graziano; Pavesi, Giulio

    2014-09-08

    One of the greatest challenges facing modern molecular biology is understanding the complex mechanisms regulating gene expression. A fundamental step in this process requires the characterization of sequence motifs involved in the regulation of gene expression at transcriptional and post-transcriptional levels. In particular, transcription is modulated by the interaction of transcription factors (TFs) with their corresponding binding sites. Weeder, Pscan, and PscanChIP are software tools freely available for noncommercial users as stand-alone or Web-based applications for the automatic discovery of conserved motifs in a set of DNA sequences likely to be bound by the same TFs. Input for the tools can be promoter sequences from co-expressed or co-regulated genes (for which Weeder and Pscan are suitable), or regions identified through genome-wide ChIP-seq or similar experiments (Weeder and PscanChIP). The motifs are either found by a de novo approach (Weeder) or by using descriptors of the binding specificity of TFs (Pscan and PscanChIP).
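
    As an illustration of what scanning with a "descriptor of binding specificity" can look like, the sketch below scores every window of a sequence against a position weight matrix. It assumes the descriptor is a count matrix converted to log-odds scores; the abstract does not say which descriptor format the tools use, and the matrix values, the uniform background, and the function names are all made up for this example. It is not code from Weeder, Pscan, or PscanChIP.

```python
import math

# Hypothetical 4-column count matrix for an illustrative binding site.
# The numbers are invented and do not come from any database or tool.
COUNTS = {
    "A": [8, 0, 0, 1],
    "C": [0, 9, 0, 1],
    "G": [1, 0, 10, 0],
    "T": [1, 1, 0, 8],
}
BACKGROUND = 0.25   # assumed uniform base frequency
PSEUDOCOUNT = 0.5   # avoids log(0) for unobserved bases


def log_odds(counts):
    """Convert a count matrix into one {base: log2 odds} dict per column."""
    width = len(counts["A"])
    matrix = []
    for i in range(width):
        total = sum(counts[b][i] for b in "ACGT") + 4 * PSEUDOCOUNT
        column = {
            b: math.log2((counts[b][i] + PSEUDOCOUNT) / total / BACKGROUND)
            for b in "ACGT"
        }
        matrix.append(column)
    return matrix


def best_hit(seq, matrix):
    """Return (score, 0-based position) of the best-scoring window.

    The sequence is expected to contain only A, C, G and T.
    """
    width = len(matrix)
    scored = [
        (sum(matrix[i][seq[p + i]] for i in range(width)), p)
        for p in range(len(seq) - width + 1)
    ]
    return max(scored)


PWM = log_odds(COUNTS)
score, position = best_hit("TTACGTAA", PWM)
```

    Here the consensus of the toy matrix is ACGT, so the best window in the toy sequence starts at position 2. A real analysis would also need a score threshold and a statistical test for enrichment across the input set, which this sketch leaves out.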

  12. Los Alamos sequence analysis package for nucleic acids and proteins.

    PubMed Central

    Kanehisa, M I

    1982-01-01

    An interactive system for computer analysis of nucleic acid and protein sequences has been developed for the Los Alamos DNA Sequence Database. It provides a convenient way to search or verify various sequence features, e.g., restriction enzyme sites, protein coding frames, and properties of coded proteins. Further, the comprehensive analysis package on a large-scale database can be used for comparative studies on sequence and structural homologies in order to find unnoted information stored in nucleic acid sequences. PMID:6174934
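
    Two of the sequence features named above, restriction enzyme sites and protein coding frames, can be located with a few lines of code. The following is a minimal modern sketch, not the Los Alamos package itself: the function names, the `min_codons` parameter, and the toy sequence are assumptions made for illustration. The only fact relied on is that EcoRI recognizes GAATTC.

```python
import re

STOPS = {"TAA", "TAG", "TGA"}


def find_sites(dna, site):
    """0-based starts of every (possibly overlapping) occurrence of a site."""
    pattern = f"(?={re.escape(site.upper())})"
    return [m.start() for m in re.finditer(pattern, dna.upper())]


def open_frames(dna, min_codons=3):
    """(frame, start, end) of ATG...stop stretches in the three forward frames.

    `end` is exclusive and includes the stop codon.
    """
    dna = dna.upper()
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    found.append((frame, start, i + 3))
                start = None
    return found


# Hypothetical toy sequence with two EcoRI sites and one short coding frame.
toy = "CCGAATTCATGAAACCCGGGTAAGAATTC"
ecori_sites = find_sites(toy, "GAATTC")
frames = open_frames(toy)
```

    Because GAATTC is palindromic, scanning one strand is enough for EcoRI; a non-palindromic site would also need a scan of the reverse complement, and the reverse-strand reading frames are likewise omitted here.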

  13. Motivated Proteins: A web application for studying small three-dimensional protein motifs

    PubMed Central

    Leader, David P; Milner-White, E James

    2009-01-01

    Background Small loop-shaped motifs are common constituents of the three-dimensional structure of proteins. Typically they comprise between three and seven amino acid residues, and are defined by a combination of dihedral angles and hydrogen bonding partners. The most abundant of these are αβ-motifs, asx-motifs, asx-turns, β-bulges, β-bulge loops, β-turns, nests, niches, Schellmann loops, ST-motifs, ST-staples and ST-turns. We have constructed a database of such motifs from a range of high-quality protein structures and built a web application as a visual interface to this. Description The web application, Motivated Proteins, provides access to these 12 motifs (with 48 sub-categories) in a database of over 400 representative proteins. Queries can be made for specific categories or sub-categories of motif, motifs in the vicinity of ligands, motifs which include part of an enzyme active site, overlapping motifs, or motifs which include a particular amino acid sequence. Individual proteins can be specified, or, where appropriate, motifs for all proteins listed. The results of queries are presented in textual form as an (X)HTML table, and may be saved as parsable plain text or XML. Motifs can be viewed and manipulated either individually or in the context of the protein in the Jmol applet structural viewer. Cartoons of the motifs imposed on a linear representation of protein secondary structure are also provided. Summary information for the motifs is available, as are histograms of amino acid distribution, and graphs of dihedral angles at individual positions in the motifs. Conclusion Motivated Proteins is a publicly and freely accessible web application that enables protein scientists to study small three-dimensional motifs without requiring knowledge of either Structured Query Language or the underlying database schema. PMID:19210785

  14. Structure-Specific Nucleic Acid Recognition by L-motifs And Their Diverse Roles in Expression And Regulation Of The Genome

    PubMed Central

    Thapar, Roopa

    2015-01-01

    The high-mobility group (HMG) domain containing proteins regulate transcription, DNA replication and recombination. They adopt L-shaped folds and are structure-specific DNA binding motifs. Here, I define the L-motif super-family that consists of DNA-binding HMG-box proteins and the L-motif of the histone mRNA binding domain of Stem-Loop Binding Protein (SLBP). The SLBP L-motif and HMG-box domains adopt similar L-shaped folds with three α-helices and two or three small hydrophobic cores that stabilize the overall fold, but have very different and distinct modes of nucleic acid recognition. A comparison of the structure, dynamics, protein-protein and nucleic acid interactions, and regulation by PTMs of the SLBP and the HMG-box L-motifs reveals the versatile and diverse modes by which L-motifs utilize their surfaces for structure-specific recognition of nucleic acids to regulate gene expression. PMID:25748361

  15. Hybridization and sequencing of nucleic acids using base pair mismatches

    DOEpatents

    Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

    2001-01-01

    Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.

  16. Nuclear Magnetic Resonance Structural Mapping Reveals Promiscuous Interactions between Clathrin-Box Motif Sequences and the N-Terminal Domain of the Clathrin Heavy Chain

    PubMed Central

    2016-01-01

    The recruitment and organization of clathrin at endocytic sites first to form coated pits and then clathrin-coated vesicles depend on interactions between the clathrin N-terminal domain (TD) and multiple clathrin binding sequences on the cargo adaptor and accessory proteins that are concentrated at such sites. Up to four distinct protein binding sites have been proposed to be present on the clathrin TD, with each site proposed to interact with a distinct clathrin binding motif. However, an understanding of how such interactions contribute to clathrin coat assembly must take into account observations that any three of these four sites on clathrin TD can be mutationally ablated without causing loss of clathrin-mediated endocytosis. To take an unbiased approach to mapping binding sites for clathrin-box motifs on clathrin TD, we used isothermal titration calorimetry (ITC) and nuclear magnetic resonance spectroscopy. Our ITC experiments revealed that a canonical clathrin-box motif peptide from the AP-2 adaptor binds to clathrin TD with a stoichiometry of 3:1. Assignment of 90% of the total visible amide resonances in the TROSY-HSQC spectrum of 13C-, 2H-, and 15N-labeled TD40 allowed us to map these three binding sites by analyzing the chemical shift changes as clathrin-box motif peptides were titrated into clathrin TD. We found that three different clathrin-box motif peptides can each simultaneously bind not only to the previously characterized clathrin-box site but also to the W-box site and the β-arrestin splice loop site on a single TD. The promiscuity of these binding sites can help explain why their mutation does not lead to larger effects on clathrin function and suggests a mechanism by which clathrin may be transferred between different proteins during the course of an endocytic event. PMID:25844500

  17. Insights into the Activity and Substrate Binding of Xylella fastidiosa Polygalacturonase by Modification of a Unique QMK Amino Acid Motif Using Protein Chimeras

    PubMed Central

    Warren, Jeremy G.; Lincoln, James E.; Kirkpatrick, Bruce C.

    2015-01-01

    Polygalacturonases (EC 3.2.1.15) catalyze the random hydrolysis of 1,4-alpha-D-galactosiduronic linkages in pectate and other galacturonans. Xylella fastidiosa possesses a single polygalacturonase gene, pglA (PD1485), and X. fastidiosa mutants deficient in the production of polygalacturonase are non-pathogenic and show a compromised ability to systemically infect grapevines. These results suggested that grapevines expressing sufficient amounts of an inhibitor of X. fastidiosa polygalacturonase might be protected from disease. Previous attempts in our laboratory and others to produce soluble, active X. fastidiosa polygalacturonase for use in inhibition assays have been unsuccessful. In this study, we created two enzymatically active X. fastidiosa / A. vitis polygalacturonase chimeras, AX1A and AX2A, to explore the functionality of X. fastidiosa polygalacturonase in vitro. The AX1A chimera was constructed to specifically test if recombinant chimeric protein, produced in Escherichia coli, is soluble and if the X. fastidiosa polygalacturonase catalytic amino acids are able to hydrolyze polygalacturonic acid. The AX2A chimera was constructed to evaluate the ability of a unique QMK motif of X. fastidiosa polygalacturonase (most polygalacturonases have an R(I/L)K motif) to bind to and allow the hydrolysis of polygalacturonic acid. Furthermore, the AX2A chimera was also used to explore what effect modification of the QMK motif of X. fastidiosa polygalacturonase to a conserved RIK motif has on enzymatic activity. These experiments showed that both the AX1A and AX2A polygalacturonase chimeras were soluble and able to hydrolyze the polygalacturonic acid substrate. Additionally, the modification of the QMK motif to the conserved RIK motif eliminated hydrolytic activity, suggesting that the QMK motif is important for the activity of X. fastidiosa polygalacturonase. This result suggests X. fastidiosa polygalacturonase may preferentially hydrolyze a different pectic substrate or

  18. Insights into the Activity and Substrate Binding of Xylella fastidiosa Polygalacturonase by Modification of a Unique QMK Amino Acid Motif Using Protein Chimeras.

    PubMed

    Warren, Jeremy G; Lincoln, James E; Kirkpatrick, Bruce C

    2015-01-01

    Polygalacturonases (EC 3.2.1.15) catalyze the random hydrolysis of 1,4-alpha-D-galactosiduronic linkages in pectate and other galacturonans. Xylella fastidiosa possesses a single polygalacturonase gene, pglA (PD1485), and X. fastidiosa mutants deficient in the production of polygalacturonase are non-pathogenic and show a compromised ability to systemically infect grapevines. These results suggested that grapevines expressing sufficient amounts of an inhibitor of X. fastidiosa polygalacturonase might be protected from disease. Previous attempts in our laboratory and others to produce soluble, active X. fastidiosa polygalacturonase for use in inhibition assays have been unsuccessful. In this study, we created two enzymatically active X. fastidiosa / A. vitis polygalacturonase chimeras, AX1A and AX2A, to explore the functionality of X. fastidiosa polygalacturonase in vitro. The AX1A chimera was constructed to specifically test if recombinant chimeric protein, produced in Escherichia coli, is soluble and if the X. fastidiosa polygalacturonase catalytic amino acids are able to hydrolyze polygalacturonic acid. The AX2A chimera was constructed to evaluate the ability of a unique QMK motif of X. fastidiosa polygalacturonase (most polygalacturonases have an R(I/L)K motif) to bind to and allow the hydrolysis of polygalacturonic acid. Furthermore, the AX2A chimera was also used to explore what effect modification of the QMK motif of X. fastidiosa polygalacturonase to a conserved RIK motif has on enzymatic activity. These experiments showed that both the AX1A and AX2A polygalacturonase chimeras were soluble and able to hydrolyze the polygalacturonic acid substrate. Additionally, the modification of the QMK motif to the conserved RIK motif eliminated hydrolytic activity, suggesting that the QMK motif is important for the activity of X. fastidiosa polygalacturonase. This result suggests X. fastidiosa polygalacturonase may preferentially hydrolyze a different pectic substrate or

  19. Sequence and spatiotemporal expression analysis of CLE-motif containing genes from the reniform nematode (Rotylenchulus reniformis Linford & Oliveira)

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The reniform nematode, Rotylenchulus reniformis, is a sedentary semi-endoparasitic species with a host range that encompasses more than 77 plant families. Nematode effector proteins containing plant-ligand motifs similar to CLAVATA3/ESR (CLE) peptides have been identified in the Heterodera, Globode...

  20. De Novo Regulatory Motif Discovery Identifies Significant Motifs in Promoters of Five Classes of Plant Dehydrin Genes

    PubMed Central

    Zolotarov, Yevgen; Strömvik, Martina

    2015-01-01

    Plants accumulate dehydrins in response to osmotic stresses. Dehydrins are divided into five different classes, which are thought to be regulated in different manners. To better understand differences in transcriptional regulation of the five dehydrin classes, de novo motif discovery was performed on 350 dehydrin promoter sequences from a total of 51 plant genomes. Overrepresented motifs were identified in the promoters of five dehydrin classes. The Kn dehydrin promoters contain motifs linked with meristem specific expression, as well as motifs linked with cold/dehydration and abscisic acid response. KS dehydrin promoters contain a motif with a GATA core. SKn and YnSKn dehydrin promoters contain motifs that match elements connected with cold/dehydration, abscisic acid and light response. YnKn dehydrin promoters contain motifs that match abscisic acid and light response elements, but not cold/dehydration response elements. Conserved promoter motifs are present in the dehydrin classes and across different plant lineages, indicating that dehydrin gene regulation is likely also conserved. PMID:26114291

  1. Betaine 0.77-perhydrate 0.23-hydrate and common structural motifs in crystals of amino acid perhydrates.

    PubMed

    Minkov, Vasily S; Kapustin, Evgeny A; Boldyreva, Elena V

    2013-04-01

    The title compound, betaine 0.77-perhydrate 0.23-hydrate, (CH3)3N(+)CH2COO(-)·0.77H2O2·0.23H2O, crystallizes in the orthorhombic noncentrosymmetric space group Pca2(1). Chiral molecules of hydrogen peroxide are positionally disordered with water molecules in a ratio of 0.77:0.23. Betaine, 2-(trimethylazaniumyl)acetate, preserves its zwitterionic state, with a positively charged ammonium group and a negatively charged carboxylate group. The molecular conformation of betaine here differs from the conformations of both anhydrous betaine and its hydrate, mainly in the orientation of the carboxylate group with respect to the C-C-N skeleton. Hydrogen peroxide is linked via two hydrogen bonds to carboxylate groups, forming infinite chains along the crystallographic a axis, which are very similar to those in the crystal structure of betaine hydrate. The present work contributes to the understanding of the structure-forming factors for amino acid perhydrates, which are presently attracting much attention. A correlation is suggested between the ratio of amino acid zwitterions and hydrogen peroxide in the unit cell and the structural motifs present in the crystal structures of all currently known amino acid perhydrates. This can help to classify the crystal structures of amino acid perhydrates and to design new crystal structures.

  2. Purification, amino acid sequence and immunological characterization of Ole e 6, a cysteine-enriched allergen from olive tree pollen.

    PubMed

    Batanero, E; Ledesma, A; Villalba, M; Rodríguez, R

    1997-06-30

    The Ole e 6 allergen from olive tree pollen has been isolated by combining gel permeation and reverse-phase chromatographies. It is a single and highly acidic (pI 4.2) polypeptide chain protein. Its NH2-terminal amino acid sequence has been determined by Edman degradation. Total RNA from the olive tree pollen was isolated, and a specific cDNA was amplified by the polymerase chain reaction using a degenerate oligonucleotide primer designed according to the NH2-terminal sequence of the protein. The nucleotide sequencing of the cDNA rendered an open reading frame encoding a 50 amino acid polypeptide chain, in which two sets of the sequential motif Cys-X3-Cys-X3-Cys are present. No sequence similarity has been found between this protein and other previously described polypeptides.
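
    The sequential motif Cys-X3-Cys-X3-Cys reads as a cysteine, any three residues, a cysteine, any three residues, and a cysteine, which maps directly onto a regular expression over one-letter amino acid codes. The sketch below is illustrative only: the function name is an assumption, and the toy sequence is invented (it is not the Ole e 6 sequence, which the abstract does not give).

```python
import re

# Cys-X3-Cys-X3-Cys written as a regular expression.
# The lookahead lets overlapping occurrences be reported as well.
CYS_MOTIF = re.compile(r"(?=(C.{3}C.{3}C))")


def find_cys_motifs(protein):
    """Return (0-based start, matched segment) for every motif occurrence."""
    return [(m.start(), m.group(1)) for m in CYS_MOTIF.finditer(protein.upper())]


# Hypothetical 30-residue toy sequence, NOT the real Ole e 6 sequence.
toy = "MDCAAACGGGCKKKEEECPPPCTTTCLLLS"
hits = find_cys_motifs(toy)
```

    On this toy input the function reports two occurrences of the motif, at positions 2 and 17.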

  3. The Thiamin Pyrophosphate-Motif

    NASA Technical Reports Server (NTRS)

    Dominiak, P.; Ciszak, E.

    2003-01-01

    Using databases the authors have identified a common thiamin pyrophosphate (TPP)-motif in the family of functionally diverse TPP-dependent enzymes. This common motif consists of multimeric organization of subunits and two catalytic centers. Each catalytic center (PP:PYR) is formed at the interface of the PP-domain binding the magnesium ion, pyrophosphate and aminopyrimidine ring of TPP, and the PYR-domain binding the aminopyrimidine ring of that cofactor. A pair of these catalytic centers constitutes the catalytic core (PP:PYR)(sub 2) within these enzymes. Analysis of the structural elements of this catalytic core reveals a novel definition of the common amino acid sequences, which are GXPhiX(sub 4)(G)PhiXXGQ and GDGX(sub 25-30)NN in the PP-domain, and the EX(sub 4)(G)PhiXXGPhi in the PYR-domain, where Phi corresponds to a hydrophobic amino acid. This TPP-motif provides a novel tool for annotation of TPP-dependent enzymes useful in advancing functional proteomics.

  4. The Thiamine-Pyrophosphate-Motif

    NASA Technical Reports Server (NTRS)

    Ciszak, Ewa; Dominiak, Paulina

    2004-01-01

    Thiamin pyrophosphate (TPP), a derivative of vitamin B1, is a cofactor for enzymes performing catalysis in pathways of energy production including the well known decarboxylation of α-keto acid dehydrogenases followed by transketolation. TPP-dependent enzymes constitute a structurally and functionally diverse group exhibiting multimeric subunit organization, multiple domains and two chemically equivalent catalytic centers. Annotation of functional TPP-dependent enzymes, therefore, has not been trivial due to low sequence similarity related to this complex organization. Our approach to analysis of structures of known TPP-dependent enzymes reveals for the first time features common to this group, which we have termed the TPP-motif. The TPP-motif consists of specific spatial arrangements of structural elements and their specific contacts to provide for a flip-flop, or alternate site, enzymatic mechanism of action. Analysis of structural elements entrained in the flip-flop action displayed by TPP-dependent enzymes reveals a novel definition of the common amino acid sequences. These sequences allow for annotation of TPP-dependent enzymes, thus advancing functional proteomics. Further details of three-dimensional structures of TPP-dependent enzymes will be discussed.

  5. Ovodefensins, an Oviduct-Specific Antimicrobial Gene Family, Have Evolved in Birds and Reptiles to Protect the Egg by Both Sequence and Intra-Six-Cysteine Sequence Motif Spacing.

    PubMed

    Whenham, Natasha; Lu, Tian Chee; Maidin, Maisarah B M; Wilson, Peter W; Bain, Maureen M; Stevenson, M Lynn; Stevens, Mark P; Bedford, Michael R; Dunn, Ian C

    2015-06-01

    Ovodefensins are a novel beta defensin-related family of antimicrobial peptides containing conserved glycine and six cysteine residues. Originally thought to be restricted to the albumen-producing region of the avian oviduct, expression was found in chicken, turkey, duck, and zebra finch in large quantities in many parts of the oviduct, but this varied between species and between gene forms in the same species. Using new search strategies, the ovodefensin family now has 35 members, including reptiles, but no representatives outside birds and reptiles have been found. Analysis of their evolution shows that ovodefensins divide into six groups based on the intra-cysteine amino acid spacing, representing a unique mechanism alongside traditional evolution of sequence. The groups have been used to base a nomenclature for the family. Antimicrobial activity for three ovodefensins from chicken and duck was confirmed against Escherichia coli and a pathogenic E. coli strain as well as a Gram-positive organism, Staphylococcus aureus, for the first time. However, activity varied greatly between peptides, with Gallus gallus OvoDA1 being the most potent, suggesting a link with the different structures. Expression of Gallus gallus OvoDA1 (gallin) in the oviduct was increased by estrogen and progesterone and in the reproductive state. Overall, the results support the hypothesis that ovodefensins evolved to protect the egg, but they are not necessarily restricted to the egg white. Therefore, divergent motif structure and sequence present an interesting area of research for antimicrobial peptide design and understanding protection of the cleidoic egg.

  6. Linear array of conserved sequence motifs to discriminate protein subfamilies: study on pyridine nucleotide-disulfide reductases

    PubMed Central

    Avila, César L; Rapisarda, Viviana A; Farías, Ricardo N; De Las Rivas, Javier; Chehín, Rosana

    2007-01-01

    Background The pyridine nucleotide disulfide reductase (PNDR) is a large and heterogeneous protein family divided into two classes (I and II), which reflect the divergent evolution of its characteristic disulfide redox active site. However, not all the PNDR members fit into these categories and this suggests the need of further studies to achieve a more comprehensive classification of this complex family. Results A workflow to improve the clusterization of protein families based on the array of linear conserved motifs is designed. The method is applied to the PNDR large family finding two main groups, which correspond to PNDR classes I and II. However, two other separate protein clusters, previously classified as class I in most databases, are outgrouped: the peroxide reductases (NAOX, NAPE) and the type II NADH dehydrogenases (NDH-2). In this way, two novel PNDR classes III and IV for NAOX/NAPE and NDH-2 respectively are proposed. By knowledge-driven biochemical and functional data analyses done on the new class IV, a linear array of motifs putatively related to Cu(II)-reductase activity is detected in a specific subset of NDH-2. Conclusion The results presented are a novel contribution to the classification of the complex and large PNDR protein family, supporting its reclusterization into four classes. The linear array of motifs detected within the class IV PNDR subfamily could be useful as a signature for a particular subgroup of NDH-2. PMID:17367536

  7. Comparison of loline alkaloid gene clusters across fungal endophytes: predicting the co-regulatory sequence motifs and the evolutionary history.

    PubMed

    Kutil, Brandi L; Greenwald, Charles; Liu, Gang; Spiering, Martin J; Schardl, Christopher L; Wilkinson, Heather H

    2007-10-01

    LOL, a fungal secondary metabolite gene cluster found in Epichloë and Neotyphodium species, is responsible for production of insecticidal loline alkaloids. To analyze the genetic architecture and to predict the evolutionary history of LOL, we compared five clusters from four fungal species (single clusters from Epichloë festucae, Neotyphodium sp. PauTG-1, Neotyphodium coenophialum, and two clusters we previously characterized in Neotyphodium uncinatum). Using PhyloCon to compare putative lol gene promoter regions, we have identified four motifs conserved across the lol genes in all five clusters. Each motif has significant similarity to known fungal transcription factor binding sites in the TRANSFAC database. Conservation of these motifs is further support for the hypothesis that the lol genes are co-regulated. Interestingly, the history of asexual Neotyphodium spp. includes multiple interspecific hybridization events. Comparing clusters from three Neotyphodium species and E. festucae allowed us to determine which Epichloë ancestors are the most likely contributors of LOL in these asexual species. For example, while no present day Epichloë typhina isolates are known to produce lolines, our data support the hypothesis that the E. typhina ancestor(s) of three asexual endophyte species contained a LOL gene cluster. Thus, these data support a model of evolution in which the polymorphism in loline alkaloid production phenotypes among endophyte species is likely due to the loss of the trait over time.

  8. Salicylic Acid Suppresses Jasmonic Acid Signaling Downstream of SCFCOI1-JAZ by Targeting GCC Promoter Motifs via Transcription Factor ORA59

    PubMed Central

    Van der Does, Dieuwertje; Leon-Reyes, Antonio; Koornneef, Annemart; Van Verk, Marcel C.; Rodenburg, Nicole; Pauwels, Laurens; Goossens, Alain; Körbes, Ana P.; Memelink, Johan; Ritsema, Tita; Van Wees, Saskia C.M.; Pieterse, Corné M.J.

    2013-01-01

    Antagonism between the defense hormones salicylic acid (SA) and jasmonic acid (JA) plays a central role in the modulation of the plant immune signaling network, but the molecular mechanisms underlying this phenomenon are largely unknown. Here, we demonstrate that suppression of the JA pathway by SA functions downstream of the E3 ubiquitin-ligase Skip-Cullin-F-box complex SCFCOI1, which targets JASMONATE ZIM-domain transcriptional repressor proteins (JAZs) for proteasome-mediated degradation. In addition, neither the stability nor the JA-induced degradation of JAZs was affected by SA. In silico promoter analysis of the SA/JA crosstalk transcriptome revealed that the 1-kb promoter regions of JA-responsive genes that are suppressed by SA are significantly enriched in the JA-responsive GCC-box motifs. Using GCC:GUS lines carrying four copies of the GCC-box fused to the β-glucuronidase reporter gene, we showed that the GCC-box motif is sufficient for SA-mediated suppression of JA-responsive gene expression. Using plants overexpressing the GCC-box binding APETALA2/ETHYLENE RESPONSE FACTOR (AP2/ERF) transcription factors ERF1 or ORA59, we found that SA strongly reduces the accumulation of ORA59 but not that of ERF1. Collectively, these data indicate that the SA pathway inhibits JA signaling downstream of the SCFCOI1-JAZ complex by targeting GCC-box motifs in JA-responsive promoters via a negative effect on the transcriptional activator ORA59. PMID:23435661

  9. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2006-07-04

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  10. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2002-01-01

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  11. Kit for detecting nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2001-01-01

    A kit is provided for detecting a target nucleic acid sequence in a sample, the kit comprising: a first hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the first hybridization probe including a first complexing agent for forming a binding pair with a second complexing agent; and a second hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the first hybridization probe does not selectively hybridize, the second hybridization probe including a detectable marker; a third hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the third hybridization probe including the same detectable marker as the second hybridization probe; and a fourth hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the third hybridization probe does not selectively hybridize, the fourth hybridization probe including the first complexing agent for forming a binding pair with the second complexing agent; wherein the first and second hybridization probes are capable of simultaneously hybridizing to the target sequence and the third and fourth hybridization probes are capable of simultaneously hybridizing to the target sequence, the detectable marker is not present on the first or fourth hybridization probes and the first, second, third, and fourth hybridization probes each include a competitive nucleic acid sequence which is sufficiently complementary to a third portion of the target sequence that the competitive sequences of the first, second, third, and fourth hybridization probes compete with each other to hybridize to the third portion of the

  12. Analysis and Annotation of Nucleic Acid Sequence

    SciTech Connect

    States, David J.

    2004-07-28

    The aims of this project were to develop improved methods for computational genome annotation and to apply these methods to improve the annotation of genomic sequence data with a specific focus on human genome sequencing. The project resulted in a substantial body of published work. Notable contributions of this project were the identification of basecalling and lane tracking as error processes in genome sequencing and contributions to improved methods for these steps in genome sequencing. This technology improved the accuracy and throughput of genome sequence analysis. Probabilistic methods for physical map construction were developed. Improved methods for sequence alignment, alternative splicing analysis, promoter identification and NF kappa B response gene prediction were also developed.

  13. Phylogenetic analysis, based on EPIYA repeats in the cagA gene of Indian Helicobacter pylori, and the implications of sequence variation in tyrosine phosphorylation motifs on determining the clinical outcome.

    PubMed

    Tiwari, Santosh K; Sharma, Vishwas; Sharma, Varun Kumar; Gopi, Manoj; Saikant, R; Nandan, Amrita; Bardia, Avinash; Gunisetty, Sivaram; Katikala, Prasanth; Habeeb, Md Aejaz; Khan, Aleem A; Habibullah, C M

    2011-04-01

    The population of India harbors one of the world's most highly diverse gene pools, owing to the influx of successive waves of immigrants over regular periods in time. Several phylogenetic studies involving mitochondrial DNA and Y chromosomal variation have demonstrated Europeans to have been the first settlers in India. Nevertheless, certain controversy exists, due to the support given to the thesis that colonization was by the Austro-Asiatic group, prior to the Europeans. Thus, the aim was to investigate pre-historic colonization of India by anatomically modern humans, using conserved stretches of five amino acid (EPIYA) sequences in the cagA gene of Helicobacter pylori. Simultaneously, the existence of a pathogenic relationship of tyrosine phosphorylation motifs (TPMs), in 32 H. pylori strains isolated from subjects with several forms of gastric diseases, was also explored. High resolution sequence analysis of the above described genes was performed. The nucleotide sequences obtained were translated into amino acids using MEGA (version 4.0) software for EPIYA. An MJ-Network was constructed for obtaining TPM haplotypes by using NETWORK (version 4.5) software. The findings of the study suggest that Indian H. pylori strains share a common ancestry with Europeans. No specific association of haplotypes with the outcome of disease was revealed through additional network analysis of TPMs.
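
    The step of translating nucleotide sequences into amino acids and locating EPIYA segments can be sketched in a few lines. This is an illustration under stated assumptions, not a description of how MEGA or NETWORK work: it uses the standard genetic code, the function names are invented, and the toy sequence is made up rather than taken from any cagA gene.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate DNA in the given forward frame; unknown codons become 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def motif_positions(protein, motif="EPIYA"):
    """0-based start of every occurrence of the motif in a protein string."""
    positions = []
    start = protein.find(motif)
    while start != -1:
        positions.append(start)
        start = protein.find(motif, start + 1)
    return positions


# Hypothetical toy fragment encoding two EPIYA segments; not real cagA data.
toy_dna = "GAACCTATTTATGCT" + "AAA" + "GAACCTATTTATGCT"
toy_protein = translate(toy_dna)
epiya_sites = motif_positions(toy_protein)
```

    Counting and positioning the EPIYA segments per strain is only the first step; building haplotype networks from them, as the study describes, requires dedicated software.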

  14. Solid phase sequencing of double-stranded nucleic acids

    DOEpatents

    Fu, Dong-Jing; Cantor, Charles R.; Koster, Hubert; Smith, Cassandra L.

    2002-01-01

    This invention relates to methods for detecting and sequencing of target double-stranded nucleic acid sequences, to nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include nucleic acids in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated determination of molecular weights and identification of the target sequence.

  15. Intramolecular i-motif structure at acidic pH for progressive myoclonus epilepsy (EPM1) repeat d(CCCCGCCCCGCG)n.

    PubMed

    Pataskar, S S; Dash, D; Brahmachari, S K

    2001-10-01

    The most common mutation associated with Progressive Myoclonus Epilepsy (EPM1) of Unverricht-Lundborg type is the expansion of a dodecamer repeat, d(CCCCGCCCCGCG)n. We show that the C-rich strand of this repeat (2-3 copies) forms an intercalated i-motif structure at acidic pH, as judged by CD spectroscopy and anomalous gel electrophoretic mobility. The stability of the structure increases with the length of the repeat. Transient formation of a stable, folded-back structure such as the i-motif could play an important role in the mechanism of expansion of this repeat.
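
    To make the notion of repeat length concrete, the sketch below counts uninterrupted tandem copies of the dodecamer in a DNA string. The function names and the toy sequence are assumptions for illustration; only the repeat unit is taken from the abstract, and nothing here models i-motif folding or stability.

```python
import re

REPEAT_UNIT = "CCCCGCCCCGCG"  # dodecamer repeat unit quoted in the abstract


def tandem_runs(sequence, unit=REPEAT_UNIT):
    """Return (start, copies) for each uninterrupted run of the repeat unit."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    return [(m.start(), len(m.group(0)) // len(unit))
            for m in pattern.finditer(sequence.upper())]


def longest_run(sequence, unit=REPEAT_UNIT):
    """Return the largest number of back-to-back copies, or 0 if none occur."""
    return max((copies for _, copies in tandem_runs(sequence, unit)), default=0)


# Hypothetical toy sequence (assumption): a 3-copy run, a spacer, then 1 copy.
toy_sequence = "AT" + REPEAT_UNIT * 3 + "GGA" + REPEAT_UNIT + "TT"
```

    The regular expression scans left to right and reports non-overlapping runs, so a run that starts in a different phase of the unit would be reported from its first complete copy.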

  16. Analysis of sequences involved in IE2 transactivation of a baculovirus immediate-early gene promoter and identification of a new regulatory motif.

    PubMed

    Shippam-Brett, C E; Willis, L G; Theilmann, D A

    2001-05-01

    Opep-2 is a unique baculovirus early gene that has only been identified in the Orgyia pseudotsugata multiple capsid nucleopolyhedrovirus (OpMNPV). Previous analyses have shown this gene is expressed at very early times post-infection (p.i.) but is shut down by 36-48 h p.i. The promoter of opep-2 therefore represents a class of early genes that is temporally regulated. In this study, a detailed analysis of the opep-2 promoter is performed to analyze the role individual motifs play in early gene expression. A new 13 base pair regulatory element was identified and shown to be essential in controlling high-level expression of this gene. In addition, mutational analysis revealed that GATA and CACGTG motifs, which have been shown to bind cellular factors in Sf9 and Ld652Y cells, played minor roles in influencing opep-2 expression in the absence of other viral factors. The OpMNPV transactivator IE2 causes a significant activation of the opep-2 promoter. Cotransfection of an extensive number of promoter deletions and mutations did not show any sequence specificity for IE2 transactivation. This is the first detailed analysis of the sequence requirements for IE2 transactivation, and these results suggest that IE2 does not bind directly to specific elements in the opep-2 promoter.

  17. Dipeptide Sequence Determination: Analyzing Phenylthiohydantoin Amino Acids by HPLC

    NASA Astrophysics Data System (ADS)

    Barton, Janice S.; Tang, Chung-Fei; Reed, Steven S.

    2000-02-01

    Amino acid composition and sequence determination, important techniques for characterizing peptides and proteins, are essential for predicting conformation and studying sequence alignment. This experiment presents improved, fundamental methods of sequence analysis for an upper-division biochemistry laboratory. Working in pairs, students use the Edman reagent to prepare phenylthiohydantoin derivatives of amino acids for determination of the sequence of an unknown dipeptide. With a single HPLC technique, students identify both the N-terminal amino acid and the composition of the dipeptide. This method yields good precision of retention times and allows use of a broad range of amino acids as components of the dipeptide. Students learn fundamental principles and techniques of sequence analysis and HPLC.

  18. Selective Alkylation of C-Rich Bulge Motifs in Nucleic Acids by Quinone Methide Derivatives.

    PubMed

    Lönnberg, Tuomas; Hutchinson, Mark; Rokita, Steven

    2015-09-07

    A quinone methide precursor featuring a bis-cyclen anchoring moiety has been synthesized and its capacity to alkylate oligonucleotide targets quantified in the presence and absence of divalent metal ions (Zn(2+), Ni(2+) and Cd(2+)). The oligonucleotides were designed for testing the sequence and secondary structure specificity of the reaction. Gel electrophoretic analysis revealed predominant alkylation of C-rich bulges, regardless of the presence of divalent metal ions or even the bis-cyclen anchor. This C-selectivity appears to be an intrinsic property of the quinone methide electrophile as reflected by its reaction with an equimolar mixture of the 2'-deoxynucleosides. Only dA-N1 and dC-N3 alkylation products were detected initially and only the dC adduct persisted for detection under conditions of the gel electrophoretic analysis.

  19. Mathematical Characterization of Protein Sequences Using Patterns as Chemical Group Combinations of Amino Acids.

    PubMed

    Das, Jayanta Kumar; Das, Provas; Ray, Korak Kumar; Choudhury, Pabitra Pal; Jana, Siddhartha Sankar

    2016-01-01

    Comparison of amino acid sequence similarity is the fundamental concept behind the protein phylogenetic tree formation. By virtue of this method, we can explain the evolutionary relationships, but further explanations are not possible unless sequences are studied through the chemical nature of individual amino acids. Here we develop a new methodology to characterize the protein sequences on the basis of the chemical nature of the amino acids. We design various algorithms for studying the variation of chemical group transitions and various chemical group combinations as patterns in the protein sequences. The amino acid sequences of the conventional myosin II head domain from 14 family members are taken to illustrate this new approach. We find two blocks of maximum length 6 aa as 'FPKATD' and 'Y/FTNEKL' without repeating the same chemical nature and one block of maximum length 20 aa with the repetition of chemical nature which are common among all 14 members. We also check commonality with another motor protein sub-family kinesin, KIF1A. Based on our analysis we find a common block of length 8 aa both in myosin II and KIF1A. This motif is located in the neck linker region which could be responsible for the generation of mechanical force, enabling us to find the unique blocks which remain chemically conserved across the family. We also validate our methodology with different protein families such as MYOI, Myosin light chain kinase (MLCK) and Rho-associated protein kinase (ROCK), Na+/K+-ATPase and Ca2+-ATPase. Altogether, our studies provide a new methodology for investigating the conserved amino acids' pattern in different proteins.

  20. Aromatic amino acids providing characteristic motifs in the Raman and SERS spectroscopy of peptides.

    PubMed

    Wei, Fang; Zhang, Dongmao; Halas, Naomi J; Hartgerink, Jeffrey D

    2008-07-31

    Raman and surface-enhanced Raman spectroscopies (SERS) are potentially important tools in the characterization of biomolecules such as proteins and DNA. In this work, SERS spectra of three cysteine-containing aromatic peptides: tryptophan-cysteine, tyrosine-cysteine, and phenylalanine-cysteine, bound to Au nanoshell substrates, were obtained, and compared to their respective normal Raman spectra. While the linewidths of the SERS peaks are significantly broadened (up to 70%), no significant spectral shifts (<6 cm(-1)) of the major Stokes modes were observed between the two modalities. We show that the Raman and SERS spectra of penetratin, a cell-penetrating peptide oligomer, can be composed quite reliably from the spectra of its constituent aromatic amino acids except in the backbone regions where the spectral intensities are critically dependent on the length and conformations of the probed molecules. From this study we conclude that, together with protein backbone groups, aromatic amino acid residues provide the overwhelmingly dominant features in the Raman and SERS spectra of peptides and proteins when present. It follows that the Raman modes of these three small constructed peptides may likely apply to the assignment of Raman and SERS features in the spectra of other peptides and proteins.

  1. Knowledge discovery of multilevel protein motifs

    SciTech Connect

    Conklin, D.; Glasgow, J.; Fortier, S.

    1994-12-31

    A new category of protein motif is introduced. This type of motif captures, in addition to global structure, the nested structure of its component parts. A dataset of four proteins is represented using this scheme. A structured machine discovery procedure is used to discover recurrent amino acid motifs and this knowledge is utilized for the expression of subsequent protein motif discoveries. Examples of discovered multilevel motifs are presented.

  2. Evolution of the hydrogen-bonding motif in the melamine-cyanuric acid co-crystal: a topological study.

    PubMed

    Petelski, Andre N; Peruchena, Nelida M; Sosa, Gladis L

    2016-09-01

    The melamine (M)/cyanuric acid (CA) supramolecular system is perhaps one of the most exploited in the field of self-assembly because of the high complementarity of the components. However, it is necessary to investigate further the factors involved in the assembly process. In this study, we analyzed a set of 13 M(n)/CA(m) clusters (with n, m = 1, 2, 3), taken from crystallographic data, to characterize the nature of the hydrogen bonds involved in the self-assembly of these components as well as to provide greater understanding of the phenomenon. The calculations were performed at the B3LYP/6-311++G(d,p) and ω-B97XD (single point) levels of theory, and the interactions were analyzed within the framework of the quantum theory of atoms in molecules and by means of molecular electrostatic potential maps. Our results show that the stablest structure is the rosette-type motif and the aggregation mechanism is governed by a combination of cooperative and anticooperative effects. Our topological results explain the polymorphism in the self-assembly of coadsorbed monolayers of M and CA. Graphical abstract: The aggregation steps of the melamine-cyanuric co-crystal are driven by a hydrogen-bonded network which is governed by a complex combination of cooperative and anticooperative effects.

  3. The dimerization motif of the glycophorin A transmembrane segment in membranes: importance of glycine residues.

    PubMed

    Brosig, B; Langosch, D

    1998-04-01

    The glycophorin A transmembrane segment homo-dimerizes to a right-handed pair of alpha-helices. Here, we identified the amino acid motif mediating this interaction within a natural membrane environment. Critical residues were grafted onto two different hydrophobic host sequences in a stepwise manner and self-assembly of the hybrid sequences was determined with the ToxR transcription activator system. Our results show that the motif LIxxGxxxGxxxT elicits a level of self-association equivalent to that of the original glycophorin A transmembrane segment. This motif is very similar to the one previously established in detergent solution. Interestingly, the central GxxxG motif by itself already induced strong self-assembly of host sequences and the three-residue spacing between both glycines proved to be optimal for the interaction. The GxxxG element thus appears to be the most crucial part of the interaction motif.
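
    Motifs written in this notation, where x stands for any residue, map directly onto regular expressions. The sketch below is a minimal illustration under stated assumptions: the helper names and the toy peptide are hypothetical, and the peptide is not the real glycophorin A sequence. Only the two motif strings come from the abstract.

```python
import re


def compile_motif(motif):
    """Turn a motif such as 'GxxxG' into a regex; lowercase x is a wildcard."""
    body = motif.replace("x", ".")
    # The lookahead lets overlapping occurrences be reported as well.
    return re.compile(f"(?=({body}))")


def find_motif(motif, sequence):
    """Return (0-based start, matched residues) for every occurrence."""
    return [(m.start(), m.group(1))
            for m in compile_motif(motif).finditer(sequence.upper())]


# Hypothetical toy peptide (assumption), built so that both motifs occur once.
toy_peptide = "AALIAAGAAAGAAATAA"

full_hits = find_motif("LIxxGxxxGxxxT", toy_peptide)
core_hits = find_motif("GxxxG", toy_peptide)
```

    Scanning for the shorter GxxxG core as well as the full motif mirrors the abstract's observation that the core alone already drives self-assembly, so a search restricted to the full pattern would miss core-only candidates.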

  4. The dimerization motif of the glycophorin A transmembrane segment in membranes: importance of glycine residues.

    PubMed Central

    Brosig, B.; Langosch, D.

    1998-01-01

    The glycophorin A transmembrane segment homo-dimerizes to a right-handed pair of alpha-helices. Here, we identified the amino acid motif mediating this interaction within a natural membrane environment. Critical residues were grafted onto two different hydrophobic host sequences in a stepwise manner and self-assembly of the hybrid sequences was determined with the ToxR transcription activator system. Our results show that the motif LIxxGxxxGxxxT elicits a level of self-association equivalent to that of the original glycophorin A transmembrane segment. This motif is very similar to the one previously established in detergent solution. Interestingly, the central GxxxG motif by itself already induced strong self-assembly of host sequences and the three-residue spacing between both glycines proved to be optimal for the interaction. The GxxxG element thus appears to be the most crucial part of the interaction motif. PMID:9568912

  5. Amino acid sequence of mouse submaxillary gland renin.

    PubMed Central

    Misono, K S; Chang, J J; Inagami, T

    1982-01-01

    The complete amino acid sequences of the heavy chain and light chain of mouse submaxillary gland renin have been determined. The heavy chain consists of 288 amino acid residues having a Mr of 31,036 calculated from the sequence. The light chain contains 48 amino acid residues with a Mr of 5,458. The sequence of the heavy chain was determined by automated Edman degradations of the cyanogen bromide peptides and tryptic peptides generated after citraconylation, as well as other peptides generated therefrom. The sequence of the light chain was derived from sequence analyses of the peptides generated by cyanogen bromide cleavage or by digestion with Staphylococcus aureus protease. The sequences in the active site regions in renin containing two catalytically essential aspartyl residues 32 and 215 were found identical with those in pepsin, chymosin, and penicillopepsin. Comparison of the amino acid sequence of renin with that of porcine pepsin indicated a 42% sequence identity of the heavy chain with the amino-terminal and middle regions and a 46% identity of the light chain with the carboxyl-terminal region of the porcine pepsin sequence. Residues identical in renin and pepsin are distributed throughout the length of the molecules, suggesting a similarity in their overall structures. PMID:6812055
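
    A relative molecular mass "calculated from the sequence" is usually obtained by summing average residue masses and adding one water for the free termini. The sketch below assumes exactly that convention; the abstract does not state which mass table or corrections the authors used, so the rounded residue masses, the function name, and the test peptide are assumptions for illustration.

```python
# Approximate average residue masses in daltons (free amino acid minus water).
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02  # one water is added back for the N- and C-terminal groups


def relative_molecular_mass(sequence):
    """Estimate Mr of an unmodified linear peptide from one-letter codes."""
    sequence = sequence.upper()
    unknown = set(sequence) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    if not sequence:
        return 0.0
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER


# Glycylglycine as a small sanity check of the convention.
gly_gly_mass = relative_molecular_mass("GG")
```

    As a rough plausibility check on the reported figures, 31,036 / 288 is about 107.8 per residue for the heavy chain and 5,458 / 48 is about 113.7 for the light chain, both within the usual range for proteins. The estimate ignores disulfide bonds and any post-translational modifications.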

  6. Direct Imaging of Hippocampal Epileptiform Calcium Motifs Following Kainic Acid Administration in Freely Behaving Mice

    PubMed Central

    Berdyyeva, Tamara K.; Frady, E. Paxon; Nassi, Jonathan J.; Aluisio, Leah; Cherkas, Yauheniya; Otte, Stephani; Wyatt, Ryan M.; Dugovic, Christine; Ghosh, Kunal K.; Schnitzer, Mark J.; Lovenberg, Timothy; Bonaventure, Pascal

    2016-01-01

    Prolonged exposure to abnormally high calcium concentrations is thought to be a core mechanism underlying hippocampal damage in epileptic patients; however, no prior study has characterized calcium activity during seizures in the live, intact hippocampus. We have directly investigated this possibility by combining whole-brain electroencephalographic (EEG) measurements with microendoscopic calcium imaging of pyramidal cells in the CA1 hippocampal region of freely behaving mice treated with the pro-convulsant kainic acid (KA). We observed that KA administration led to systematic patterns of epileptiform calcium activity: a series of large-scale, intensifying flashes of increased calcium fluorescence concurrent with a cluster of low-amplitude EEG waveforms. This was accompanied by a steady increase in cellular calcium levels (>5 fold increase relative to the baseline), followed by an intense spreading calcium wave characterized by a 218% increase in global mean intensity of calcium fluorescence (n = 8, range [114–349%], p < 10−4; t-test). The wave had no consistent EEG phenotype and occurred before the onset of motor convulsions. Similar changes in calcium activity were also observed in animals treated with 2 different proconvulsant agents, N-methyl-D-aspartate (NMDA) and pentylenetetrazol (PTZ), suggesting the measured changes in calcium dynamics are a signature of seizure activity rather than a KA-specific pathology. Additionally, despite reducing the behavioral severity of KA-induced seizures, the anticonvulsant drug valproate (VA, 300 mg/kg) did not modify the observed abnormalities in calcium dynamics. These results confirm the presence of pathological calcium activity preceding convulsive motor seizures and support calcium as a candidate signaling molecule in a pathway connecting seizures to subsequent cellular damage. Integrating in vivo calcium imaging with traditional assessment of seizures could potentially increase translatability of pharmacological

  7. Mathematical Characterization of Protein Sequences Using Patterns as Chemical Group Combinations of Amino Acids

    PubMed Central

    Choudhury, Pabitra Pal; Jana, Siddhartha Sankar

    2016-01-01

    Comparison of amino acid sequence similarity is the fundamental concept behind the protein phylogenetic tree formation. By virtue of this method, we can explain the evolutionary relationships, but further explanations are not possible unless sequences are studied through the chemical nature of individual amino acids. Here we develop a new methodology to characterize the protein sequences on the basis of the chemical nature of the amino acids. We design various algorithms for studying the variation of chemical group transitions and various chemical group combinations as patterns in the protein sequences. The amino acid sequences of the conventional myosin II head domain from 14 family members are taken to illustrate this new approach. We find two blocks of maximum length 6 aa as ‘FPKATD’ and ‘Y/FTNEKL’ without repeating the same chemical nature and one block of maximum length 20 aa with the repetition of chemical nature which are common among all 14 members. We also check commonality with another motor protein sub-family kinesin, KIF1A. Based on our analysis we find a common block of length 8 aa both in myosin II and KIF1A. This motif is located in the neck linker region which could be responsible for the generation of mechanical force, enabling us to find the unique blocks which remain chemically conserved across the family. We also validate our methodology with different protein families such as MYOI, Myosin light chain kinase (MLCK) and Rho-associated protein kinase (ROCK), Na+/K+-ATPase and Ca2+-ATPase. Altogether, our studies provide a new methodology for investigating the conserved amino acids’ pattern in different proteins. PMID:27930687

  8. Identification of a Novel Sequence Motif Recognized by the Ankyrin Repeat Domain of zDHHC17/13 S-Acyltransferases

    PubMed Central

    Lemonidis, Kimon; Sanchez-Perez, Maria C.; Chamberlain, Luke H.

    2015-01-01

    S-Acylation is a major post-translational modification affecting several cellular processes. It is particularly important for neuronal functions. This modification is catalyzed by a family of transmembrane S-acyltransferases that contain a conserved zinc finger DHHC (zDHHC) domain. Typically, eukaryote genomes encode for 7–24 distinct zDHHC enzymes, with two members also harboring an ankyrin repeat (AR) domain at their cytosolic N termini. The AR domain of zDHHC enzymes is predicted to engage in numerous interactions and facilitates both substrate recruitment and S-acylation-independent functions; however, the sequence/structural features recognized by this module remain unknown. The two mammalian AR-containing S-acyltransferases are the Golgi-localized zDHHC17 and zDHHC13, also known as Huntingtin-interacting proteins 14 and 14-like, respectively; they are highly expressed in brain, and their loss in mice leads to neuropathological deficits that are reminiscent of Huntington's disease. Here, we report that zDHHC17 and zDHHC13 recognize, via their AR domain, evolutionary conserved and closely related sequences of a [VIAP][VIT]XXQP consensus in SNAP25, SNAP23, cysteine string protein, Huntingtin, cytoplasmic linker protein 3, and microtubule-associated protein 6. This novel AR-binding sequence motif is found in regions predicted to be unstructured and is present in a number of zDHHC17 substrates and zDHHC17/13-interacting S-acylated proteins. This is the first study to identify a motif recognized by AR-containing zDHHCs. PMID:26198635
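
    A bracketed consensus such as [VIAP][VIT]XXQP can be used to screen a set of candidate proteins. The sketch below assumes that X means any residue and runs the scan over a small dictionary; the protein names and sequences are hypothetical toy data, not the SNAP25 or Huntingtin sequences studied in the paper.

```python
import re

# Consensus from the abstract, with X read as "any residue" (assumption).
AR_BINDING = re.compile(r"(?=([VIAP][VIT]..QP))")


def scan_proteins(proteins):
    """Map each protein name to its (start, match) hits; skip names with none."""
    hits = {}
    for name, sequence in proteins.items():
        found = [(m.start(), m.group(1))
                 for m in AR_BINDING.finditer(sequence.upper())]
        if found:
            hits[name] = found
    return hits


# Hypothetical toy sequences (assumption).
toy_proteins = {
    "toy_substrate": "MKTAVITGGQPLLS",
    "toy_control": "MSDEEKLLQPAA",
}
toy_hits = scan_proteins(toy_proteins)
```

    A sequence match is only a candidate. The abstract notes that the motif lies in regions predicted to be unstructured, so a fuller screen would also filter hits by predicted disorder.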

  9. Small yet effective: the ethylene responsive element binding factor-associated amphiphilic repression (EAR) motif.

    PubMed

    Kagale, Sateesh; Rozwadowski, Kevin

    2010-06-01

    The Ethylene-responsive element binding factor-associated Amphiphilic Repression (EAR) motif is a small yet distinct regulatory motif that is conserved in many plant transcriptional regulator (TR) proteins associated with diverse biological functions. We have previously established a list of high-confidence Arabidopsis EAR repressors, the EAR repressome, comprising 219 TRs belonging to 21 different TR families. This class of proteins and the sequence context of the EAR motif exhibited a high degree of conservation across evolutionarily diverse plant species. Our comprehensive genome-wide analysis enabled refining EAR motifs as comprising either LxLxL or DLNxxP. Comparing the representation of these sequence signatures in TRs to that of other repressor motifs we show that the EAR motif is the one most frequently represented, detected in 10 to 25% of the TRs from diverse plant species. The mechanisms involved in regulation of EAR motif function and the cellular fates of EAR repressors are currently not well understood. Our earlier analysis had implicated amino acid residues flanking the EAR motifs in regulation of their functionality. Here, we present additional evidence supporting possible regulation of EAR motif function by phosphorylation of integral or adjacent Ser and/or Thr residues. Additionally, we discuss potential novel roles of EAR motifs in plant-pathogen interaction and processes other than transcriptional repression.
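
    The representation figure quoted here (the share of transcriptional regulators carrying an EAR motif) amounts to a pattern scan followed by a simple fraction. The sketch below shows that calculation under stated assumptions: x is read as any residue, the function names are hypothetical, and the four toy sequences are invented for the example.

```python
import re

# Either EAR signature from the abstract: LxLxL or DLNxxP.
EAR_PATTERN = re.compile(r"L.L.L|DLN..P")


def has_ear_motif(sequence):
    """Return True if the sequence contains at least one EAR-like signature."""
    return EAR_PATTERN.search(sequence.upper()) is not None


def ear_fraction(sequences):
    """Fraction of the given sequences that carry an EAR-like signature."""
    if not sequences:
        return 0.0
    return sum(has_ear_motif(s) for s in sequences) / len(sequences)


# Hypothetical toy regulators (assumption): the first two carry a signature.
toy_regulators = ["MALDLNLAA", "MSDLNGAPKK", "MKKAAGG", "MPLLLAAT"]
toy_fraction = ear_fraction(toy_regulators)
```

    Short patterns like LxLxL match by chance fairly often, which is one reason a curated, high-confidence list such as the EAR repressome is more informative than a raw pattern count.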

  10. Amino Acid Sequence of Human Cholinesterase

    DTIC Science & Technology

    1985-10-01

    liquid chromatography (HPLC). Activity testing of the aged, DFP-labeled cholinesterase showed that 99.8% of the active sites had been labeled, since...acids were quantitated by ninhydrin at the AAA Labs, or by derivatization with phenylisothiocyanate at the University of Michigan. The latter method

  11. New molecular motif for recognizing sialic acid using emissive lanthanide-macrocyclic polyazacarboxylate complexes: deprotonation of a coordinated water molecule controls specific binding.

    PubMed

    Ouchi, Kazuki; Saito, Shingo; Shibukawa, Masami

    2013-06-03

    A new molecular motif--lanthanide-macrocyclic polyazacarboxylate hexadentate complexes, Ln(3+)-ABNOTA--was found to specifically bind to sialic acid with strong emission enhancement and high affinity. The selectivity toward sialic acid over other monosaccharides was one of the highest among artificial receptors. Also, the novel binding mechanism was investigated in detail; binding selectivity is controlled by interactions between sialic acid and both the central metal and a hydroxyl group produced by deprotonation of a coordinated water molecule in the Ln(3+) complex.

  12. A common sequence motif determines the Cajal body-specific localization of box H/ACA scaRNAs.

    PubMed

    Richard, Patricia; Darzacq, Xavier; Bertrand, Edouard; Jády, Beáta E; Verheggen, Céline; Kiss, Tamás

    2003-08-15

    Post-transcriptional synthesis of 2'-O-methylated nucleotides and pseudouridines in Sm spliceosomal small nuclear RNAs takes place in the nucleoplasmic Cajal bodies and it is directed by guide RNAs (scaRNAs) that are structurally and functionally indistinguishable from small nucleolar RNAs (snoRNAs) directing rRNA modification in the nucleolus. The scaRNAs are synthesized in the nucleoplasm and specifically targeted to Cajal bodies. Here, mutational analysis of the human U85 box C/D-H/ACA scaRNA, followed by in situ localization, demonstrates that box H/ACA scaRNAs share a common Cajal body-specific localization signal, the CAB box. Two copies of the evolutionarily conserved CAB consensus (UGAG) are located in the terminal loops of the 5' and 3' hairpins of the box H/ACA domains of mammalian, Drosophila and plant scaRNAs. Upon alteration of the CAB boxes, mutant scaRNAs accumulate in the nucleolus. In turn, authentic snoRNAs can be targeted into Cajal bodies by addition of exogenous CAB box motifs. Our results indicate that scaRNAs represent an ancient group of small nuclear RNAs which are localized to Cajal bodies by an evolutionarily conserved mechanism.

  13. Role of repetitive nine-residue sequence motifs in secretion, enzymatic activity, and protein conformation of a family I.3 lipase.

    PubMed

    Kwon, Hyun-Ju; Haruki, Mitsuru; Morikawa, Masaaki; Omori, Kenji; Kanaya, Shigenori

    2002-01-01

    A family I.3 lipase from Pseudomonas sp. MIS38 (PML) contains 12 repeats of a nine-residue sequence motif in the C-terminal region. To elucidate the role of these repetitive sequences, mutant proteins PML5, PML4, PML1, and PML0, in which 7, 8, 11, and all 12 of the repetitive sequences are deleted, and PMLdelta19, in which 19 C-terminal residues are truncated, were constructed. Escherichia coli DH5 cells carrying the Serratia marcescens Lip system permitted the secretion of the wild-type and all of the mutant proteins except for PMLdelta19, although they were partially accumulated in the cells in an insoluble form as well. Both the secretion level and cellular content of the proteins decreased in the order PML > PML5 > PML4 > PML1 > PML0, indicating that repetitive sequences are not required for secretion of PML but are important for its stability in the cells. All the mutant proteins were purified in a refolded form and their biochemical properties were characterized. CD spectra, the Ca2+ contents, and susceptibility to chymotryptic digestion strongly suggested that the five repetitive sequences remaining in PML5 are sufficient to form a beta-roll structure, whereas the four in PML4 are not. PML5 and PMLdelta19 showed both lipase and esterase activities, whereas PML4, PML1, and PML0 were inactive. These results suggest that the enzymatic activity of PML is not seriously affected by a deletion or truncation at the C-terminal region as long as a succession of repetitive sequences can build a beta-roll structure.

  14. Cystatin. Amino acid sequence and possible secondary structure.

    PubMed Central

    Schwabe, C; Anastasi, A; Crow, H; McDonald, J K; Barrett, A J

    1984-01-01

    The amino acid sequence of cystatin, the protein from chicken egg-white that is a tight-binding inhibitor of many cysteine proteinases, is reported. Cystatin is composed of 116 amino acid residues, and the Mr is calculated to be 13 143. No striking similarity to any other known sequence has been detected. The results of computer analysis of the sequence and c.d. spectrometry indicate that the secondary structure includes relatively little alpha-helix (about 20%) and that the remainder is mainly beta-structure. PMID:6712597

  15. Limb body wall complex, amniotic band sequence, or new syndrome caused by mutation in IQ Motif containing K (IQCK)?

    PubMed Central

    Kruszka, Paul; Uwineza, Annette; Mutesa, Leon; Martinez, Ariel F; Abe, Yu; Zackai, Elaine H; Ganetzky, Rebecca; Chung, Brian; Stevenson, Roger E; Adelstein, Robert S; Ma, Xuefei; Mullikin, James C; Hong, Sung-Kook; Muenke, Maximilian

    2015-01-01

    Limb body wall complex (LBWC) and amniotic band sequence (ABS) are multiple congenital anomaly conditions with craniofacial, limb, and ventral wall defects. LBWC and ABS are considered separate entities by some, and a continuum of severity of the same condition by others. The etiology of LBWC/ABS remains unknown and multiple hypotheses have been proposed. One individual with features of LBWC and his unaffected parents were whole exome sequenced and Sanger sequenced as confirmation of the mutation. Functional studies were conducted using morpholino knockdown studies followed by human mRNA rescue experiments. Using whole exome sequencing, a de novo heterozygous mutation was found in the gene IQCK: c.667C>G; p.Q223E and confirmed by Sanger sequencing in an individual with LBWC. Morpholino knockdown of iqck mRNA in the zebrafish showed ventral defects including failure of ventral fin to develop and cardiac edema. Human wild-type IQCK mRNA rescued the zebrafish phenotype, whereas human p.Q223E IQCK mRNA did not, but worsened the phenotype of the morpholino knockdown zebrafish. This study supports a genetic etiology for LBWC/ABS, or potentially a new syndrome. PMID:26436108

  16. Mouse Vk gene classification by nucleic acid sequence similarity.

    PubMed

    Strohal, R; Helmberg, A; Kroemer, G; Kofler, R

    1989-01-01

    Analyses of immunoglobulin (Ig) variable (V) region gene usage in the immune response, estimates of V gene germline complexity, and other nucleic acid hybridization-based studies depend on the extent to which such genes are related (i.e., sequence similarity) and their organization in gene families. While mouse Igh heavy chain V region (VH) gene families are relatively well-established, a corresponding systematic classification of Igk light chain V region (Vk) genes has not been reported. The present analysis, in the course of which we reviewed the known extent of the Vk germline gene repertoire and Vk gene usage in a variety of responses to foreign and self antigens, provides a classification of mouse Vk genes in gene families composed of members with greater than 80% overall nucleic acid sequence similarity. This classification differed in several aspects from that of VH genes: only some Vk gene families were as clearly separated (by greater than 25% sequence dissimilarity) as typical VH gene families; most Vk gene families were closely related and, in several instances, members from different families were very similar (greater than 80%) over large sequence portions; frequently, classification by nucleic acid sequence similarity diverged from existing classifications based on amino-terminal protein sequence similarity. Our data have implications for Vk gene analyses by nucleic acid hybridization and describe potentially important differences in sequence organization between VH and Vk genes.
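
    Grouping genes into families by a similarity threshold can be sketched as a pairwise identity calculation followed by clustering. The abstract does not say which grouping procedure the authors applied, so the single-linkage rule below is an assumption, as are the function names and the toy sequences. The sketch also assumes the sequences are already aligned to equal length.

```python
def percent_identity(a, b):
    """Percent of identical positions between two pre-aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def group_into_families(sequences, threshold=80.0):
    """Single-linkage grouping: link any pair with identity above threshold."""
    names = list(sequences)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the family representative, compressing paths.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if percent_identity(sequences[first], sequences[second]) > threshold:
                parent[find(first)] = find(second)

    families = {}
    for name in names:
        families.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in families.values())


# Hypothetical toy sequences (assumption), 10 nt each.
toy_genes = {
    "gene_a": "ACGTACGTAC",
    "gene_b": "ACGTACGTAA",
    "gene_c": "TTTTTTTTTT",
    "gene_d": "TTTTTTTTTA",
}
toy_families = group_into_families(toy_genes)
```

    Single linkage can chain loosely related genes into one family through intermediates, which is relevant to the abstract's remark that many Vk families are closely related. A stricter rule, such as requiring every pair in a family to exceed the threshold, would split such chains.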

  17. A survey of motif finding Web tools for detecting binding site motifs in ChIP-Seq data.

    PubMed

    Tran, Ngoc Tam L; Huang, Chun-Hsi

    2014-02-20

    ChIP-Seq (chromatin immunoprecipitation sequencing) has made motif finding easier, because ChIP-Seq experiments narrow the search down to binding site locations. Recent motif finding tools facilitate motif detection by providing user-friendly Web interfaces. In this work, we reviewed nine motif finding Web tools that are capable of detecting binding site motifs in ChIP-Seq data. We showed that each motif finding Web tool has its own advantages for detecting motifs that other tools may not discover. We recommend that users apply multiple motif finding Web tools that implement different algorithms in order to obtain significant motifs, overlapping similar motifs, and non-overlapping motifs. Finally, we provide suggestions for the future development of motif finding Web tools that better assist researchers in finding motifs in ChIP-Seq data.

  18. The cytosolic C-terminus of the glucose transporter GLUT4 contains an acidic cluster endosomal targeting motif distal to the dileucine signal.

    PubMed Central

    Shewan, A M; Marsh, B J; Melvin, D R; Martin, S; Gould, G W; James, D E

    2000-01-01

    The insulin-responsive glucose transporter GLUT4 is targeted to a post-endocytic compartment in adipocytes, from where it moves to the cell surface in response to insulin. Previous studies have identified two cytosolic targeting motifs that regulate the intracellular sequestration of this protein: FQQI(5-8) in the N-terminus and LL(489,490) (one-letter amino acid notation) in the C-terminus. In the present study we show that a GLUT4 chimaera in which the C-terminal 12 amino acids in GLUT4 have been replaced with the same region from human GLUT3 is constitutively targeted to the plasma membrane when expressed in 3T3-L1 adipocytes. To further dissect this domain it was divided into three regions, each of which was mutated en bloc to alanine residues. Analysis of these constructs revealed that the targeting information is contained within the residues TELEYLGP(498-505). Using the transferrin-horseradish peroxidase endosomal ablation technique in 3T3-L1 adipocytes, we show that mutants in which this C-terminal domain has been disrupted are more sensitive to chemical ablation than wild-type GLUT4. These data indicate that GLUT4 contains a targeting signal in its C-terminus, distal to the dileucine motif, that regulates its sorting into a post-endosomal compartment. Similar membrane-distal, acidic-cluster-based motifs are found in the cytosolic tails of the insulin-responsive aminopeptidase IRAP (insulin-regulated aminopeptidase) and the proprotein convertase PC6B, indicating that this type of motif may play an important role in the endosomal sequestration of a number of different proteins. PMID:10926832

  19. Amino acid sequence repertoire of the bacterial proteome and the occurrence of untranslatable sequences

    PubMed Central

    Navon, Sharon Penias; Kornberg, Guy; Chen, Jin; Schwartzman, Tali; Tsai, Albert; Puglisi, Elisabetta Viani; Puglisi, Joseph D.; Adir, Noam

    2016-01-01

    Bioinformatic analysis of Escherichia coli proteomes revealed that all possible amino acid triplet sequences occur at their expected frequencies, with four exceptions. Two of the four underrepresented sequences (URSs) were shown to interfere with translation in vivo and in vitro. Enlarging the URS by a single amino acid resulted in increased translational inhibition. Single-molecule methods revealed stalling of translation at the entrance of the peptide exit tunnel of the ribosome, adjacent to ribosomal nucleotides A2062 and U2585. Interaction with these same ribosomal residues is involved in regulation of translation by longer, naturally occurring protein sequences. The E. coli exit tunnel has evidently evolved to minimize interaction with the exit tunnel and maximize the sequence diversity of the proteome, although allowing some interactions for regulatory purposes. Bioinformatic analysis of the human proteome revealed no underrepresented triplet sequences, possibly reflecting an absence of regulation by interaction with the exit tunnel. PMID:27307442
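    A comparison of observed and expected triplet frequencies, as described above, might be sketched as follows. The expected count here assumes that residues occur independently at their overall proteome frequencies; the abstract does not state which null model was used, so this is an assumption, and `triplet_ratios` is a hypothetical name.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def triplet_ratios(proteins):
    """Return {triplet: observed / expected} for all 8000 amino acid triplets.

    Expected counts assume independent residues (an assumption of this
    sketch, not necessarily the null model used in the study).
    """
    residue_counts = Counter()
    triplet_counts = Counter()
    n_windows = 0
    for seq in proteins:
        residue_counts.update(seq)
        for i in range(len(seq) - 2):
            triplet_counts[seq[i:i + 3]] += 1
            n_windows += 1

    total = sum(residue_counts.values())
    freq = {aa: residue_counts[aa] / total for aa in AMINO_ACIDS}

    ratios = {}
    for triplet in map("".join, product(AMINO_ACIDS, repeat=3)):
        expected = n_windows * freq[triplet[0]] * freq[triplet[1]] * freq[triplet[2]]
        # A triplet built from residues absent in the input has no defined ratio.
        ratios[triplet] = triplet_counts[triplet] / expected if expected > 0 else float("nan")
    return ratios
```

    Triplets with a ratio far below 1 across a whole proteome would be candidates for underrepresented sequences; a real analysis would also need a significance test, which this sketch omits.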

  20. Amino acid sequence repertoire of the bacterial proteome and the occurrence of untranslatable sequences.

    PubMed

    Navon, Sharon Penias; Kornberg, Guy; Chen, Jin; Schwartzman, Tali; Tsai, Albert; Puglisi, Elisabetta Viani; Puglisi, Joseph D; Adir, Noam

    2016-06-28

    Bioinformatic analysis of Escherichia coli proteomes revealed that all possible amino acid triplet sequences occur at their expected frequencies, with four exceptions. Two of the four underrepresented sequences (URSs) were shown to interfere with translation in vivo and in vitro. Enlarging the URS by a single amino acid resulted in increased translational inhibition. Single-molecule methods revealed stalling of translation at the entrance of the peptide exit tunnel of the ribosome, adjacent to ribosomal nucleotides A2062 and U2585. Interaction with these same ribosomal residues is involved in regulation of translation by longer, naturally occurring protein sequences. The E. coli exit tunnel has evidently evolved to minimize interaction with the exit tunnel and maximize the sequence diversity of the proteome, although allowing some interactions for regulatory purposes. Bioinformatic analysis of the human proteome revealed no underrepresented triplet sequences, possibly reflecting an absence of regulation by interaction with the exit tunnel.

  1. Amino acid sequences of proteins from Leptospira serovar pomona.

    PubMed

    Alves, S F; Lefebvre, R B; Probert, W

    2000-01-01

    This report describes partial amino acid sequences from three putative outer envelope proteins from Leptospira serovar pomona. In order to obtain internal fragments for protein sequencing, enzymatic and chemical digestion was performed. The enzyme clostripain was used to digest the 32 and 45 kDa proteins. In situ digestion of the 40 kDa protein was accomplished using cyanogen bromide. The 32 kDa protein generated two fragments, one of 21 kDa and another of 10 kDa that yielded five residues. A fragment of 24 kDa that yielded nineteen amino acid residues was obtained from the 45 kDa protein. A fragment with a molecular weight of 20 kDa, yielding a sequence of twenty amino acids, was obtained from the 40 kDa protein.

  2. Rice MEL2, the RNA recognition motif (RRM) protein, binds in vitro to meiosis-expressed genes containing U-rich RNA consensus sequences in the 3'-UTR.

    PubMed

    Miyazaki, Saori; Sato, Yutaka; Asano, Tomoya; Nagamura, Yoshiaki; Nonomura, Ken-Ichi

    2015-10-01

    Post-transcriptional gene regulation by RNA recognition motif (RRM) proteins through binding to cis-elements in the 3'-untranslated region (3'-UTR) is widely used in eukaryotes to complete various biological processes. Rice MEIOSIS ARRESTED AT LEPTOTENE2 (MEL2) is an RRM protein that functions in the properly timed transition to meiosis. The MEL2 RRM preferentially associated with the U-rich RNA consensus, UUAGUU[U/A][U/G][A/U/G]U, in a sequence-dependent manner and in proportion to the amount of MEL2 protein in vitro. The consensus sequences were located in the putative looped structures of the RNA ligand. A genome-wide survey revealed a tendency for the MEL2-binding consensus to appear in the 3'-UTR of rice genes. Of 249 genes that conserved the consensus in their 3'-UTR, 13 genes were spatiotemporally co-expressed with MEL2 in meiotic flowers, and included several genes with a presumed function in meiosis, such as Replication protein A and OsMADS3. The proteome analysis revealed that the amounts of small ubiquitin-related modifier-like protein and eukaryotic translation initiation factor3-like protein were dramatically altered in mel2 mutant anthers. Taken together with transcriptome and gene ontology results, we propose that the rice MEL2 is involved in the translational regulation of key meiotic genes on 3'-UTRs to achieve the faithful transition of germ cells to meiosis.
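    The consensus UUAGUU[U/A][U/G][A/U/G]U translates directly into a regular expression, so a scan of 3'-UTR sequences for candidate sites could look like the sketch below. The function name is hypothetical, and accepting DNA-style input (T in place of U) is a convenience added here rather than something taken from the abstract.

```python
import re

# Consensus from the abstract, UUAGUU[U/A][U/G][A/U/G]U, as a regular expression.
# The lookahead lets overlapping sites be reported as separate hits.
MEL2_CONSENSUS = re.compile(r"(?=(UUAGUU[UA][UG][AUG]U))")


def find_consensus(utr):
    """Return (0-based start, matched site) pairs for a 3'-UTR given as RNA or DNA."""
    rna = utr.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in MEL2_CONSENSUS.finditer(rna)]
```

    For example, `find_consensus("GGUUAGUUUUAUCC")` reports one site starting at index 2. A match only marks a sequence-level candidate; it says nothing about the looped RNA structure in which the consensus was found.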

  3. A mechanism of immunoreceptor tyrosine-based activation motif (ITAM)-like sequences in the capsid protein VP2 in viral growth and pathogenesis of Coxsackievirus B3.

    PubMed

    Kim, Dae-Sun; Park, Jung-Hyun; Kim, Joo-Young; Kim, Dokeun; Nam, Jae-Hwan

    2012-04-01

    Coxsackievirus B3 (CVB3) is an RNA virus that mainly causes myocarditis. We have reported previously that immunoreceptor tyrosine-based activation motif (ITAM)-like sequences are contained in the capsid protein VP2 of CVB3. The substitution of two tyrosines for phenylalanines in the ITAM-like region causes attenuation of CVB3, possibly via defective viral assembly. In this study, we found that Syk, a downstream molecule of ITAM, interacts with the wild-type (WT) CVB3 VP0 protein, but not with the mutant CVB3 VP0 (called YYFF), and that an inhibitor of Syk reduced the growth of CVB3. The WT CVB3 activated nuclear factor kappa B (NF-κB), a protein activated by ITAM, and eventually induced the production of interleukin-6 (IL-6), one of the proinflammatory cytokines induced by NF-κB, in macrophages. However, the YYFF form did not. In addition, viral VP2 protein may be dependent on the phosphorylation of an ITAM-like region that affected the activation of NF-κB. Taken together, these results suggest that the ITAM-like sequences in CVB3 VP2 can not only affect viral structure but also act as signals in pathogenesis.

  4. Extensive amino acid sequence homologies between animal lectins

    SciTech Connect

    Paroutaud, P.; Levi, G.; Teichberg, V.I.; Strosberg, A.D.

    1987-09-01

    The authors have established the amino acid sequence of the β-D-galactoside binding lectin from the electric eel and the sequences of several peptides from a similar lectin isolated from human placenta. These sequences were compared with the published sequences of peptides derived from the β-D-galactoside binding lectin from human lung and with sequences deduced from cDNAs assigned to the β-D-galactoside binding lectins from chicken embryo skin and human hepatomas. Significant homologies were observed. One of the highly conserved regions that contains a tryptophan residue and two glutamic acid residues is probably part of the β-D-galactoside binding site, which, on the basis of spectroscopic studies of the electric eel lectin, is expected to contain such residues. The similarity of the hydropathy profiles and the predicted secondary structure of the lectins from chicken skin and electric eel, in spite of differences in their amino acid sequences, strongly suggests that these proteins have maintained structural homologies during evolution and together with the other β-D-galactoside binding lectins were derived from a common ancestor gene.

  5. Amino acid sequence of porcine spleen cathepsin D.

    PubMed Central

    Shewale, J G; Tang, J

    1984-01-01

    The amino acid sequence of porcine spleen cathepsin D heavy chain has been determined and, hence, the complete structure of this enzyme is now known. The sequence of the heavy chain was constructed by aligning the structures of peptides generated by cyanogen bromide, trypsin, and endo-proteinase Lys C cleavages. The structure of the light chain has been published previously. The cathepsin D molecule contains 339 amino acid residues in two polypeptide chains: a 97-residue light chain and a 242-residue heavy chain, with a combined Mr of 36,779 (without carbohydrate). There are two carbohydrate units linked to asparagine residues 70 and 192. The disulfide bond arrangement in cathepsin D is probably similar to that of pepsin, because the positions of six half-cystine residues are conserved. The active site aspartyl residues, corresponding to aspartic acid-32 and -215 of pepsin, are located at residues 33 and 224 in the cathepsin D molecule. The amino acid sequence around these aspartyl residues is strongly conserved. Cathepsin D shows a strong homology with other acid proteases. When the sequences of cathepsin D, renin, and pepsin are aligned, 32.7% of the residues are identical. The homology is observed throughout the length of the molecules, indicating that three-dimensional structures of all three molecules are similar. PMID:6587385

  6. Motifs and structural blocks retrieval by GHT

    NASA Astrophysics Data System (ADS)

    Cantoni, Virginio; Ferone, Alessio; Petrosino, Alfredo; Polat, Ozlem

    2014-06-01

    The structure of a protein gives more insight into the protein function than its amino acid sequence. Protein structure analysis and comparison are important for understanding the evolutionary relationships among proteins, predicting protein functions, and predicting protein folding. Proteins are formed by two basic regular 3D structural patterns, called Secondary Structures (SSs): helices and sheets. A structural motif is a compact 3D protein block referring to a small specific combination of secondary structural elements, which appears in a variety of molecules. In this paper we compare a few approaches for motif retrieval based on the Generalized Hough Transform (GHT). A primary technique is to adopt the single SS as the structural primitive; alternatives are to adopt an SS pair as the primitive structural element, or an SS triplet, and so on up to an entire motif. The richer the primitive, the higher the time for pre-analysis and search, and the simpler the inspection process on the parameter space for analyzing the peaks. Performance comparisons, in terms of precision and computation time, are presented here for the retrieval of motifs composed of three to five SSs over more than 15 million searches. The approach can be easily applied to the retrieval of larger blocks, up to protein domains, or even entire proteins.

  7. Bioinformatics study of cancer-related mutations within p53 phosphorylation site motifs.

    PubMed

    Ji, Xiaona; Huang, Qiang; Yu, Long; Nussinov, Ruth; Ma, Buyong

    2014-07-29

    p53 protein has about thirty phosphorylation sites located at the N- and C-termini and in the core domain. The phosphorylation sites are relatively less mutated than other residues in p53. To understand why and how p53 phosphorylation sites are rarely mutated in human cancer, we used a bioinformatics approach to examine each phosphorylation site and its nearby flanking residues, focusing on the consensus phosphorylation motif pattern, amino-acid correlations within the phosphorylation motifs, the propensity of structural disorder of the phosphorylation motifs, and cancer mutations observed within the phosphorylation motifs. Many p53 phosphorylation sites are targets for several kinases. The phosphorylation sites match 17 consensus sequence motifs out of the 29 classified. In addition to proline, which is common in kinase specificity-determining sites, we found a high propensity of acidic residues to be adjacent to phosphorylation sites. Analysis of human cancer mutations in the phosphorylation motifs revealed that motifs with adjacent acidic residues generally have fewer mutations, in contrast to phosphorylation sites near proline residues. p53 phosphorylation motifs are mostly disordered. However, human cancer mutations within phosphorylation motifs tend to decrease the disorder propensity. Our results suggest that the combination of the acidic residues Asp and Glu with phosphorylation sites provides charge redundancy, which may safeguard against loss-of-function mutations, and that the natively disordered nature of p53 phosphorylation motifs may help reduce mutational damage. Our results further suggest that engineering acidic amino acids adjacent to potential phosphorylation sites could be a p53 gene therapy strategy.

  8. A novel cysteine-rich sequence-specific DNA-binding protein interacts with the conserved X-box motif of the human major histocompatibility complex class II genes via a repeated Cys-His domain and functions as a transcriptional repressor

    PubMed Central

    1994-01-01

    The class II major histocompatibility complex (MHC) molecules function in the presentation of processed peptides to helper T cells. As most mammalian cells can endocytose and process foreign antigen, the critical determinant of an antigen-presenting cell is its ability to express class II MHC molecules. Expression of these molecules is usually restricted to cells of the immune system and dysregulated expression is hypothesized to contribute to the pathogenesis of a severe combined immunodeficiency syndrome and certain autoimmune diseases. Human complementary DNA clones encoding a newly identified, cysteine-rich transcription factor, NF-X1, which binds to the conserved X-box motif of class II MHC genes, were obtained, and the primary amino acid sequence deduced. The major open reading frame encodes a polypeptide of 1,104 amino acids with a symmetrical organization. A central cysteine-rich portion encodes the DNA-binding domain, and is subdivided into seven repeated motifs. This motif is similar to but distinct from the LIM domain and the RING finger family, and is reminiscent of known metal-binding regions. The unique arrangement of cysteines indicates that the consensus sequence CX3CXL-XCGX1-5HXCX3CHXGXC represents a novel cysteine-rich motif. Two lines of evidence indicate that the polypeptide encodes a potent and biologically relevant repressor of HLA-DRA transcription: (a) overexpression of NF-X1 from a retroviral construct strongly decreases transcription from the HLA-DRA promoter; and (b) the NF-X1 transcript is markedly induced late after induction with interferon gamma (IFN-gamma), coinciding with postinduction attenuation of HLA-DRA transcription. The NF-X1 protein may therefore play an important role in regulating the duration of an inflammatory response by limiting the period in which class II MHC molecules are induced by IFN-gamma. PMID:7964459

  9. Molecular cloning and sequencing of a cDNA encoding the thioesterase domain of the rat fatty acid synthetase.

    PubMed

    Naggert, J; Witkowski, A; Mikkelsen, J; Smith, S

    1988-01-25

    A cloned cDNA containing the entire coding sequence for the long-chain S-acyl fatty acid synthetase thioester hydrolase (thioesterase I) component as well as the 3'-noncoding region of the fatty acid synthetase has been isolated using an expression vector and domain-specific antibodies. The coding region was assigned to the thioesterase I domain by identification of sequences coding for characterized peptide fragments, amino-terminal analysis of the isolated thioesterase I domain and the presence of the serine esterase active-site sequence motif. The thioesterase I domain is 306 amino acids long with a calculated molecular mass of 33,476 daltons; its DNA is flanked at the 5'-end by a region coding for the acyl carrier protein domain and at the 3'-end by a 1,537-base pairs-long noncoding sequence with a poly(A) tail. The thioesterase I domain exhibits a low, albeit discernible, homology with the discrete medium-chain S-acyl fatty acid synthetase thioester hydrolases (thioesterase II) from rat mammary gland and duck uropygial gland, suggesting a distant but common evolutionary ancestry for these proteins.

  10. Active site amino acid sequence of human factor D.

    PubMed

    Davis, A E

    1980-08-01

    Factor D was isolated from human plasma by chromatography on CM-Sephadex C50, Sephadex G-75, and hydroxylapatite. Digestion of reduced, S-carboxymethylated factor D with cyanogen bromide resulted in three peptides which were isolated by chromatography on Sephadex G-75 (superfine) equilibrated in 20% formic acid. NH2-Terminal sequences were determined by automated Edman degradation with a Beckman 890C sequencer using a 0.1 M Quadrol program. The smallest peptide (CNBr III) consisted of the NH2-terminal 14 amino acids. The other two peptides had molecular weights of 17,000 (CNBr I) and 7000 (CNBr II). Overlap of the NH2-terminal sequence of factor D with the NH2-terminal sequence of CNBr I established the order of the peptides. The NH2-terminal 53 residues of factor D are somewhat more homologous with the group-specific protease of rat intestine than with other serine proteases. The NH2-terminal sequence of CNBr II revealed the active site serine of factor D. The typical serine protease active site sequence (Gly-Asp-Ser-Gly-Gly-Pro) was found at residues 12-17. The region surrounding the active site serine does not appear to be more highly homologous with any one of the other serine proteases. The structural data obtained point out the similarities between factor D and the other proteases. However, complete definition of the degree of relationship between factor D and other proteases will require determination of the remainder of the primary structure.
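    Locating an active site sequence such as Gly-Asp-Ser-Gly-Gly-Pro in a peptide comes down to converting the three-letter motif to one-letter code and reporting 1-based residue positions. The sketch below shows this; the helper names are hypothetical, and the test peptide used in the usage note is invented for illustration and is not the factor D sequence.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(motif):
    """Convert a hyphen-separated three-letter motif, e.g. 'Gly-Asp-Ser', to 'GDS'."""
    return "".join(THREE_TO_ONE[residue] for residue in motif.split("-"))


def find_motif(sequence, motif):
    """Return 1-based (start, end) residue ranges where the motif occurs in a one-letter sequence."""
    target = to_one_letter(motif)
    hits = []
    pos = sequence.find(target)
    while pos != -1:
        hits.append((pos + 1, pos + len(target)))
        pos = sequence.find(target, pos + 1)
    return hits
```

    Applied to a made-up peptide with eleven residues preceding the motif, `find_motif("A" * 11 + "GDSGGPLV", "Gly-Asp-Ser-Gly-Gly-Pro")` returns `[(12, 17)]`, the same residue numbering convention as in the abstract.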

  11. The amino acid sequence of iguana (Iguana iguana) pancreatic ribonuclease.

    PubMed

    Zhao, W; Beintema, J J; Hofsteenge, J

    1994-01-15

    The pyrimidine-specific ribonuclease superfamily constitutes a group of homologous proteins so far found only in higher vertebrates. Four separate families are found in mammals, which have resulted from gene duplications in mammalian ancestors. To learn more about the evolutionary history of this superfamily, the primary structure and other characteristics of the pancreatic enzyme from iguana (Iguana iguana), a herbivorous lizard species belonging to the reptiles, have been determined. The polypeptide chain consists of 119 amino acid residues. The positions of insertions and deletions in the sequence are identical to those in the enzyme from snapping turtle. However, the two enzymes differ at 54% of the amino acid positions. Iguana ribonuclease contains no carbohydrate, although the enzyme possesses three recognition sites for carbohydrate attachment, and has a high number of acidic residues in a localized part of the sequence.

  12. A short sequence motif in the 5' leader of the HIV-1 genome modulates extended RNA dimer formation and virus replication.

    PubMed

    van Bel, Nikki; Das, Atze T; Cornelissen, Marion; Abbink, Truus E M; Berkhout, Ben

    2014-12-19

    The 5' leader of the HIV-1 RNA genome encodes signals that control various steps in the replication cycle, including the dimerization initiation signal (DIS) that triggers RNA dimerization. The DIS folds a hairpin structure with a palindromic sequence in the loop that allows RNA dimerization via intermolecular kissing loop (KL) base pairing. The KL dimer can be stabilized by including the DIS stem nucleotides in the intermolecular base pairing, forming an extended dimer (ED). The role of the ED RNA dimer in HIV-1 replication has hardly been addressed because of technical challenges. We analyzed a set of leader mutants with a stabilized DIS hairpin for in vitro RNA dimerization and virus replication in T cells. In agreement with previous observations, DIS hairpin stability modulated KL and ED dimerization. An unexpected previous finding was that mutation of three nucleotides immediately upstream of the DIS hairpin significantly reduced in vitro ED formation. In this study, we tested such mutants in vivo for the importance of the ED in HIV-1 biology. Mutants with a stabilized DIS hairpin replicated less efficiently than WT HIV-1. This defect was most severe when the upstream sequence motif was altered. Virus evolution experiments with the defective mutants yielded fast replicating HIV-1 variants with second site mutations that (partially) restored the WT hairpin stability. Characterization of the mutant and revertant RNA molecules and the corresponding viruses confirmed the correlation between in vitro ED RNA dimer formation and efficient virus replication, thus indicating that the ED structure is important for HIV-1 replication.

  13. Amino acid sequence of bovine heart coupling factor 6.

    PubMed Central

    Fang, J K; Jacobs, J W; Kanner, B I; Racker, E; Bradshaw, R A

    1984-01-01

    The amino acid sequence of bovine heart mitochondrial coupling factor 6 (F6) has been determined by automated Edman degradation of the whole protein and derived peptides. Preparations based on heat precipitation and ethanol extraction showed allotypic variation at three positions while material further purified by HPLC yielded only one sequence that also differed by a Phe-Thr replacement at residue 62. The mature protein contains 76 amino acids with a calculated molecular weight of 9006 and a pI of approximately 5, in good agreement with experimentally measured values. The charged amino acids are mainly clustered at the termini and in one section in the middle; these three polar segments are separated by two segments relatively rich in nonpolar residues. Chou-Fasman analysis suggests three stretches of alpha-helix coinciding with (or within) the high-charge-density sequences, with a single beta-turn at the first polar-nonpolar junction. Comparison of the F6 sequence with those of other proteins did not reveal any homologous structures. PMID:6149548

  14. Amino acid sequence and comparative antigenicity of chicken metallothionein.

    PubMed Central

    McCormick, C C; Fullmer, C S; Garvey, J S

    1988-01-01

    The complete amino acid sequence of metallothionein (MT) from chicken liver is reported. The primary structure was determined by automated sequence analysis of peptides produced by limited acid hydrolysis and by trypsin digestion. The comparative antigenicity of chicken MT was determined by radioimmunoassay using rabbit anti-rat MT polyclonal antibody. Chicken MT consists of 63 amino acids as compared to 61 found in MTs from mammals. One insertion (and two substitutions) occurs in the amino-terminal region, a region considered invariant among mammalian MTs. Eighteen of the 20 cysteines in chicken MT were aligned with cysteines from other mammalian sequences. Two cysteines near the carboxyl terminus are shifted by one residue due to the insertion of proline in that region. Overall, the chicken protein showed approximately 68% sequence identity in a comparison with various mammalian MTs. The affinity of the polyclonal antibody for chicken MT was decreased by 2 orders of magnitude in comparison to that of a mammalian MT (rat MT isoforms). This reduced affinity is attributed to major substitutions in chicken MT in the regions of the principal determinants of mammalian MTs. Theoretical analysis of the primary structure predicted the secondary structure to consist of reverse turns and random coils with no stable beta or helix conformations. There is no evidence that chicken MT differs functionally from mammalian MTs. PMID:2448773

  15. Sequences Of Amino Acids For Human Serum Albumin

    NASA Technical Reports Server (NTRS)

    Carter, Daniel C.

    1992-01-01

    Sequences of amino acids defined for use in making polypeptides one-third to one-sixth as large as parent human serum albumin molecule. Smaller, chemically stable peptides have diverse applications including service as artificial human serum and as active components of biosensors and chromatographic matrices. In applications involving production of artificial sera from new sequences, little or no concern about viral contaminants. Smaller genetically engineered polypeptides more easily expressed and produced in large quantities, making commercial isolation and production more feasible and profitable.

  16. Nanopores and nucleic acids: prospects for ultrarapid sequencing

    NASA Technical Reports Server (NTRS)

    Deamer, D. W.; Akeson, M.

    2000-01-01

    DNA and RNA molecules can be detected as they are driven through a nanopore by an applied electric field at rates ranging from several hundred microseconds to a few milliseconds per molecule. The nanopore can rapidly discriminate between pyrimidine and purine segments along a single-stranded nucleic acid molecule. Nanopore detection and characterization of single molecules represents a new method for directly reading information encoded in linear polymers. If single-nucleotide resolution can be achieved, it is possible that nucleic acid sequences can be determined at rates exceeding a thousand bases per second.

  17. Repression domains of class II ERF transcriptional repressors share an essential motif for active repression.

    PubMed

    Ohta, M; Matsui, K; Hiratsu, K; Shinshi, H; Ohme-Takagi, M

    2001-08-01

    We reported previously that three ERF transcription factors, tobacco ERF3 (NtERF3) and Arabidopsis AtERF3 and AtERF4, which are categorized as class II ERFs, are active repressors of transcription. To clarify the roles of these repressors in transcriptional regulation in plants, we attempted to identify the functional domains of the ERF repressor that mediates the repression of transcription. Analysis of the results of a series of deletions revealed that the C-terminal 35 amino acids of NtERF3 are sufficient to confer the capacity for repression of transcription on a heterologous DNA binding domain. This repression domain suppressed the intermolecular activities of other transcriptional activators. In addition, fusion of this repression domain to the VP16 activation domain completely inhibited the transactivation function of VP16. Comparison of amino acid sequences of class II ERF repressors revealed the conservation of the sequence motif (L)/(F)DLN(L)/(F)(x)P. This motif was essential for repression because mutations within the motif eliminated the capacity for repression. We designated this motif the ERF-associated amphiphilic repression (EAR) motif, and we identified this motif in a number of zinc-finger proteins from wheat, Arabidopsis, and petunia plants. These zinc finger proteins functioned as repressors, and their repression domains were identified as regions that contained an EAR motif.
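    The EAR motif (L)/(F)DLN(L)/(F)(x)P can be written as a regular expression for scanning protein sequences. The sketch below reads (x) as exactly one residue of any type, which is this example's interpretation of the notation, and uses a hypothetical function name and made-up test sequences.

```python
import re

# (L)/(F)DLN(L)/(F)(x)P, with (x) read as exactly one arbitrary residue
# (an interpretation made for this sketch).
EAR_MOTIF = re.compile(r"[LF]DLN[LF][A-Z]P")


def find_ear(protein):
    """Return (1-based start, matched segment) for each EAR-like match in a one-letter protein sequence."""
    return [(m.start() + 1, m.group()) for m in EAR_MOTIF.finditer(protein.upper())]
```

    On the invented sequence `"MAAALDLNLAPGG"` this returns `[(5, "LDLNLAP")]`, while a single substitution inside the core (`"MAAALALNLAPGG"`) gives no match, mirroring at the pattern level the observation that mutations within the motif eliminated repression.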

  18. Efficient motif search in ranked lists and applications to variable gap motifs.

    PubMed

    Leibovich, Limor; Yakhini, Zohar

    2012-07-01

    Sequence elements at all levels (DNA, RNA and protein) play a central role in mediating molecular recognition and thereby molecular regulation and signaling. Studies that focus on measuring and investigating sequence-based recognition make use of statistical and computational tools, including approaches to searching sequence motifs. State-of-the-art motif searching tools are limited in their coverage and ability to address large motif spaces. We develop and present statistical and algorithmic approaches that take as input ranked lists of sequences and return significant motifs. The efficiency of our approach, based on suffix trees, allows searches over motif spaces that are not covered by existing tools. This includes searching variable gap motifs (two half sites with a flexible length gap in between) and searching long motifs over large alphabets. We used our approach to analyze several high-throughput measurement data sets and report some validation results as well as novel suggested motifs and motif refinements. We suggest a refinement of the known estrogen receptor 1 motif in humans, where we observe gaps other than three nucleotides that also serve as significant recognition sites, as well as a variable length motif related to potential tyrosine phosphorylation.
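    A variable gap motif, two half sites separated by a gap of flexible length, can be illustrated with a plain regular-expression scan. This is not the suffix-tree, ranked-list method of the paper; it only shows what such a motif matches. The function name is hypothetical, and the half sites in the usage note (GGTCA and TGACC, the commonly cited palindromic estrogen response element halves) are an assumption brought in for illustration, not taken from the abstract.

```python
import re


def find_gapped(seq, left, right, min_gap, max_gap):
    """Return (0-based start, gap length) for each occurrence of left + gap + right.

    For each start position only the shortest admissible gap is reported.
    """
    pattern = re.compile(
        f"(?=({re.escape(left)})([ACGT]{{{min_gap},{max_gap}}}?)({re.escape(right)}))"
    )
    return [(m.start(), len(m.group(2))) for m in pattern.finditer(seq.upper())]
```

    For instance, `find_gapped("AAGGTCACAGTGACCTT", "GGTCA", "TGACC", 0, 5)` reports a site at index 2 with a gap of three nucleotides; widening or narrowing `min_gap` and `max_gap` shows which gap lengths a given sequence set supports.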

  19. Gastrointestinal localization of metronidazole by a lactobacilli-inspired tetramic acid motif improves treatment outcomes in the hamster model of Clostridium difficile infection

    PubMed Central

    Cherian, Philip T.; Wu, Xiaoqian; Yang, Lei; Scarborough, Jerrod S.; Singh, Aman P.; Alam, Zahidul A.; Lee, Richard E.; Hurdle, Julian G.

    2015-01-01

    Objectives: Metronidazole, a mainstay treatment for Clostridium difficile infection (CDI), is often ineffective for severe CDI. Whilst this is thought to arise from suboptimal levels of metronidazole in the colon due to rapid absorption, empirical validation is lacking. In contrast, reutericyclin, an antibacterial tetramic acid from Lactobacillus reuteri, concentrates in the gastrointestinal tract. In this study, we modified metronidazole with reutericyclin's tetramic acid motif to obtain non-absorbed compounds, enabling assessment of the impact of pharmacokinetics on treatment outcomes. Methods: A series of metronidazole-bearing tetramic acid substituents were synthesized and evaluated in terms of anti-C. difficile activities, gastric permeability, in vivo pharmacokinetics, efficacy in the hamster model of CDI and mode of action. Results: Most compounds were absorbed less than metronidazole in cell-based Caco-2 permeability assays. In hamsters, lead compounds compartmentalized in the colon rather than the bloodstream with negligible levels detected in the blood, in direct contrast with metronidazole, which was rapidly absorbed into the blood and was undetectable in caecum. Accordingly, four leads were more efficacious (P < 0.05) than metronidazole in C. difficile-infected animals. Improved efficacy was not due to an alternative mode of action, as the leads retained the mode of action of metronidazole. Conclusions: This study provides the clearest empirical evidence that the high absorption of metronidazole lowers treatment outcomes for CDI and suggests a role for the tetramic acid motif for colon-specific drug delivery. This approach also has the potential to lower systemic toxicity and drug interactions of nitroheterocyclic drugs for treating gastrointestine-specific diseases. PMID:26286574

  20. The complementary deoxyribonucleic acid sequence of guinea pig endometrial prorelaxin.

    PubMed

    Lee, Y A; Bryant-Greenwood, G D; Mandel, M; Greenwood, F C

    1992-03-01

    The nucleotide sequence of the relaxin gene transcript in the endometrium of the late pregnant guinea pig has been determined. The strategy used was a combination of polymerase chain reaction (PCR) with primers designed from the mRNA sequence of porcine preprorelaxin, rapid amplification of cDNA ends-PCR, and blunt end cloning in M13 mp18. With heterologous primers, a 226-basepair (bp) segment of the guinea pig relaxin gene sequence was obtained and was used to design a guinea pig-specific primer for use with the rapid amplification of cDNA ends-PCR method. The latter allowed completion of the sequence of 336 bp, with a 96-bp overlap. The sequence obtained shows greater homology at both the nucleotide and amino acid levels with porcine and human relaxins H1 and H2 than with rat relaxin, supporting the thesis that the guinea pig is not a rodent. The transcription of the guinea pig endometrial relaxin gene during pregnancy was confirmed by Northern analysis of guinea pig endometrial tissues with a species-specific cDNA probe. The endometrial relaxin gene is transcribed during pregnancy, but not in lactation, consistent with the observed immunostaining for relaxin.

  1. Quantum-Sequencing: Biophysics of quantum tunneling through nucleic acids

    NASA Astrophysics Data System (ADS)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    Tunneling microscopy and spectroscopy have been used extensively in the physical surface sciences to study quantum tunneling, to measure the electronic local density of states of nanomaterials, and to characterize adsorbed species. Quantum-Sequencing (Q-Seq) is a new method based on tunneling microscopy for electronic sequencing of single molecules of nucleic acids. A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free single-molecule sequencing method. Here, we present the unique "electronic fingerprints" for all nucleotides on DNA and RNA using Q-Seq, along with their intrinsic biophysical parameters. We have analyzed tunneling spectra for the nucleotides at different pH conditions and analyzed the HOMO, LUMO and energy gap for all of them. In addition, we show a number of biophysical parameters to further characterize all nucleobases (electron and hole transition voltage and energy barriers). These results highlight the robustness of Q-Seq as a technique for next-generation sequencing.

  2. Cloning, sequence analysis, and expression in Escherichia coli of the gene encoding an alpha-amino acid ester hydrolase from Acetobacter turbidans.

    PubMed

    Polderman-Tijmes, Jolanda J; Jekel, Peter A; de Vries, Erik J; van Merode, Annet E J; Floris, René; van der Laan, Jan-Metske; Sonke, Theo; Janssen, Dick B

    2002-01-01

    The alpha-amino acid ester hydrolase from Acetobacter turbidans ATCC 9325 is capable of hydrolyzing and synthesizing beta-lactam antibiotics, such as cephalexin and ampicillin. N-terminal amino acid sequencing of the purified alpha-amino acid ester hydrolase allowed cloning and genetic characterization of the corresponding gene from an A. turbidans genomic library. The gene, designated aehA, encodes a polypeptide with a molecular weight of 72,000. Comparison of the determined N-terminal sequence and the deduced amino acid sequence indicated the presence of an N-terminal leader sequence of 40 amino acids. The aehA gene was subcloned in the pET9 expression plasmid and expressed in Escherichia coli. The recombinant protein was purified and found to be dimeric with subunits of 70 kDa. A sequence similarity search revealed 26% identity with a glutaryl 7-ACA acylase precursor from Bacillus laterosporus, but no homology was found with other known penicillin or cephalosporin acylases. There was some similarity to serine proteases, including the conservation of the active site motif, GXSYXG. Together with database searches, this suggested that the alpha-amino acid ester hydrolase is a beta-lactam antibiotic acylase that belongs to a class of hydrolases that is different from the Ntn hydrolase superfamily to which the well-characterized penicillin acylase from E. coli belongs. The alpha-amino acid ester hydrolase of A. turbidans represents a subclass of this new class of beta-lactam antibiotic acylases.
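
    The active-site motif quoted above, GXSYXG, can be located in a protein sequence with a simple pattern scan. The following is a minimal sketch, assuming X stands for any single residue in one-letter code; the function name and the toy sequence are hypothetical and are not taken from the AehA sequence.

```python
import re

# GXSYXG with X = any amino acid (one-letter code). The lookahead lets the
# scan report hits that overlap each other.
ACTIVE_SITE_MOTIF = re.compile(r"(?=(G[A-Z]SY[A-Z]G))")


def find_motif(protein):
    """Return (1-based start position, matched text) for every motif hit."""
    return [(m.start() + 1, m.group(1))
            for m in ACTIVE_SITE_MOTIF.finditer(protein.upper())]


# Hypothetical toy sequence used only to exercise the function.
toy_sequence = "MKTLLAGASYGGHWQRGTSYDGKL"
hits = find_motif(toy_sequence)
for position, text in hits:
    print(position, text)
```

    A pattern hit of this kind only flags a candidate site; the abstract's conclusion rests on sequence comparison and database searches, not on the motif alone.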

  3. Sequence-Based Screening for Rare Enzymes: New Insights into the World of AMDases Reveal a Conserved Motif and 58 Novel Enzymes Clustering in Eight Distinct Families

    PubMed Central

    Maimanakos, Janine; Chow, Jennifer; Gaßmeyer, Sarah K.; Güllert, Simon; Busch, Florian; Kourist, Robert; Streit, Wolfgang R.

    2016-01-01

    Arylmalonate Decarboxylases (AMDases, EC 4.1.1.76) are very rare and mostly underexplored enzymes. Currently only four known and biochemically characterized representatives exist. However, their ability to decarboxylate α-disubstituted malonic acid derivatives to optically pure products without cofactors makes them attractive and promising candidates for the use as biocatalysts in industrial processes. Until now, AMDases could not be separated from other members of the aspartate/glutamate racemase superfamily based on their gene sequences. Within this work, a search algorithm was developed that enables a reliable prediction of AMDase activity for potential candidates. Based on specific sequence patterns and screening methods 58 novel AMDase candidate genes could be identified in this work. Thereby, AMDases with the conserved sequence pattern of Bordetella bronchiseptica’s prototype appeared to be limited to the classes of Alpha-, Beta-, and Gamma-proteobacteria. Amino acid homologies and comparison of gene surrounding sequences enabled the classification of eight enzyme clusters. Particularly striking is the accumulation of genes coding for different transporters of the tripartite tricarboxylate transporters family, TRAP transporters and ABC transporters as well as genes coding for mandelate racemases/muconate lactonizing enzymes that might be involved in substrate uptake or degradation of AMDase products. Further, three novel AMDases were characterized which showed a high enantiomeric excess (>99%) of the (R)-enantiomer of flurbiprofen. These are the recombinant AmdA and AmdV from Variovorax sp. strains HH01 and HH02, originated from soil, and AmdP from Polymorphum gilvum found by a data base search. Altogether our findings give new insights into the class of AMDases and reveal many previously unknown enzyme candidates with high potential for bioindustrial processes. PMID:27610105

  4. Qualitative detection of class IIa bacteriocinogenic lactic acid bacteria from traditional Chinese fermented food using a YGNGV-motif-based assay.

    PubMed

    Liu, Wenli; Zhang, Lanwei; Yi, Huaxi; Shi, John; Xue, Chaohui; Li, Hongbo; Jiao, Yuehua; Shigwedha, Nditange; Du, Ming; Han, Xue

    2014-05-01

    In the present study, a YGNGV-motif-based assay was developed and applied. Given that there is an increasing demand for natural preservatives, we set out to obtain lactic acid bacteria (LAB) that produce bacteriocins against Gram-positive and Gram-negative bacteria. We here isolated 123 LAB strains from 5 types of traditional Chinese fermented food and screened them for the production of bacteriocins using the agar well diffusion assay (AWDA). Then, to acquire LAB producing class IIa bacteriocins, we used a YGNGV-motif-based assay that was based on 14 degenerate primers matching all class IIa bacteriocin-encoding genes currently deposited in NCBI. Eight of the LAB strains identified by AWDA could inhibit Gram-positive and Gram-negative bacteria; 5 of these were YGNGV-amplicon positive. Among these 5 isolates, amplicons from 2 strains (Y31 and Y33) matched class IIa bacteriocin genes. Strain Y31 demonstrated the highest inhibitory activity and the best match to a class IIa bacteriocin gene in NCBI, and was identified as Enterococcus faecium. The bacteriocin from Enterococcus avium Y33 was 100% identical to enterocin P. Both of these strains produced bacteriocins with strong antimicrobial activity against Listeria monocytogenes, Escherichia coli, and Bacillus subtilis, hence these bacteriocins hold promise as potential bio-preservatives in the food industry. These findings also indicated that the YGNGV-motif-based assay used in this study could identify novel class IIa bacteriocinogenic LAB, rapidly and specifically, saving time and labour by by-passing multiple separation and purification steps.
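
    To illustrate why a motif-based PCR assay needs degenerate primers, the YGNGV peptide can be back-translated into an IUPAC-degenerate nucleotide consensus using the standard genetic code. This is only a sketch: the 14 primers used in the study are not given in the abstract, and the helper names below are hypothetical.

```python
# IUPAC-degenerate codons for the four distinct residues in YGNGV
# (standard genetic code: Tyr = TAY, Gly = GGN, Asn = AAY, Val = GTN).
DEGENERATE_CODON = {"Y": "TAY", "G": "GGN", "N": "AAY", "V": "GTN"}

# Number of concrete bases each symbol used above can stand for.
IUPAC_SIZE = {"A": 1, "C": 1, "G": 1, "T": 1, "Y": 2, "N": 4}


def back_translate(peptide):
    """Concatenate the degenerate codon of each residue."""
    return "".join(DEGENERATE_CODON[residue] for residue in peptide)


def degeneracy(dna):
    """Count the distinct nucleotide sequences a degenerate string encodes."""
    total = 1
    for base in dna:
        total *= IUPAC_SIZE[base]
    return total


consensus = back_translate("YGNGV")
print(consensus, degeneracy(consensus))
```

    The fully degenerate 15-mer stands for 256 distinct sequences, which suggests why a practical assay would split the coverage across several less degenerate primers.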

  5. The First Aspartic Acid of the DQxD Motif for Human UDP-Glucuronosyltransferase 1A10 Interacts with UDP-Glucuronic Acid during Catalysis

    PubMed Central

    Xiong, Yan; Patana, Anne-Sisko; Miley, Michael J.; Zielinska, Agnieszka K.; Bratton, Stacie M.; Miller, Grover P.; Goldman, Adrian; Finel, Moshe; Redinbo, Matt R.; Radominska-Pandya, Anna

    2008-01-01

    All UDP-glucuronosyltransferase enzymes (UGTs) share a common cofactor, UDP-glucuronic acid (UDP-GlcUA). The binding site for UDP-GlcUA is localized to the C-terminal domain of UGTs on the basis of amino acid sequence homology analysis and crystal structures of glycosyltransferases, including the C-terminal domain of human UGT2B7. We hypothesized that the 393DQMD-NAK399 region of human UGT1A10 interacts with the glucuronic acid moiety of UDP-GlcUA. Using site-directed mutagenesis and enzymatic analysis, we demonstrated that the D393A mutation abolished the glucuronidation activity of UGT1A10 toward all substrates. The effects of the alanine mutation at Q394, D396, and K399 on glucuronidation activities were substrate-dependent. Previously, we examined the importance of these residues in UGT2B7. Although D393 (D398 in UGT2B7) is similarly critical for UDP-GlcUA binding in both enzymes, the effects of Q394 (Q399 in UGT2B7) to Ala mutation on activity were significant but different between UGT1A10 and UGT2B7. A model of the UDP-GlcUA binding site suggests that the contribution of other residues to cosubstrate binding may explain these differences between UGT1A10 and UGT2B7. We thus postulate that D393 is critical for the binding of glucuronic acid and that proximal residues, e.g., Q394 (Q399 in UGT2B7), play a subtle role in cosubstrate binding in UGT1A10 and UGT2B7. Hence, this study provides important new information needed for the identification and understanding of the binding sites of UGTs, a major step forward in elucidating their molecular mechanism. PMID:18048489

  6. Molecular cloning and amino acid sequence of human 5-lipoxygenase

    SciTech Connect

    Matsumoto, T.; Funk, C.D.; Radmark, O.; Hoeoeg, J.O.; Joernvall, H.; Samuelsson, B.

    1988-01-01

    5-Lipoxygenase (EC 1.13.11.34), a Ca2+- and ATP-requiring enzyme, catalyzes the first two steps in the biosynthesis of the peptidoleukotrienes and the chemotactic factor leukotriene B4. A cDNA clone corresponding to 5-lipoxygenase was isolated from a human lung lambda gt11 expression library by immunoscreening with a polyclonal antibody. Additional clones from a human placenta lambda gt11 cDNA library were obtained by plaque hybridization with the 32P-labeled lung cDNA clone. Sequence data obtained from several overlapping clones indicate that the composite DNAs contain the complete coding region for the enzyme. From the deduced primary structure, 5-lipoxygenase encodes a 673 amino acid protein with a calculated molecular weight of 77,839. Direct analysis of the native protein and its proteolytic fragments confirmed the deduced composition, the amino-terminal amino acid sequence, and the structure of many internal segments. 5-Lipoxygenase has no apparent sequence homology with leukotriene A4 hydrolase or Ca2+-binding proteins. RNA blot analysis indicated substantial amounts of an mRNA species of approximately 2700 nucleotides in leukocytes, lung, and placenta.

  7. Nucleic acid sequence detection using multiplexed oligonucleotide PCR

    DOEpatents

    Nolan, John P.; White, P. Scott

    2006-12-26

    Methods for rapidly detecting single or multiple sequence alleles in a sample nucleic acid are described. Provided are all of the oligonucleotide pairs capable of annealing specifically to a target allele and discriminating among possible sequences thereof, and ligating to each other to form an oligonucleotide complex when a particular sequence feature is present (or, alternatively, absent) in the sample nucleic acid. The design of each oligonucleotide pair permits the subsequent high-level PCR amplification of a specific amplicon when the oligonucleotide complex is formed, but not when the oligonucleotide complex is not formed. The presence or absence of the specific amplicon is used to detect the allele. Detection of the specific amplicon may be achieved using a variety of methods well known in the art, including without limitation, oligonucleotide capture onto DNA chips or microarrays, oligonucleotide capture onto beads or microspheres, electrophoresis, and mass spectrometry. Various labels and address-capture tags may be employed in the amplicon detection step of multiplexed assays, as further described herein.

  8. Sequence of the canine herpesvirus thymidine kinase gene: taxon-preferred amino acid residues in the alphaherpesviral thymidine kinases.

    PubMed

    Rémond, M; Sheldrick, P; Lebreton, F; Foulon, T

    1995-12-01

    Multiple sequence alignments of evolutionarily related proteins are finding increasing use as indicators of critical amino acid residues necessary for structural stability or involved in functional domains responsible for catalytic activities. In the past, a number of alignments have provided such information for the herpesviral thymidine kinases, for which three-dimensional structures are not yet available. We have sequenced the thymidine kinase gene of a canine herpesvirus, and with a multiple alignment have identified amino acids preferentially conserved in either of two taxons, the genera Varicellovirus and Simplexvirus, of the subfamily Alphaherpesvirinae. Since some regions of the thymidine kinases show otherwise elevated levels of substitutional tolerance, these conserved amino acids are candidates for critical residues which have become fixed through selection during the evolutionary divergence of these enzymes. Several pairs with distinctive patterns of distribution among the various viruses occur in or near highly conserved sequence motifs previously proposed to form the catalytic site, and we speculate that they may represent interacting, co-ordinately variable residues.

  9. The amino acid sequence of chymopapain from Carica papaya.

    PubMed Central

    Watson, D C; Yaguchi, M; Lynn, K R

    1990-01-01

    Chymopapain is a polypeptide of 218 amino acid residues. It has considerable structural similarity with papain and papaya proteinase omega, including conservation of the catalytic site and of the disulphide bonding. Chymopapain is like papaya proteinase omega in carrying four extra residues between papain positions 168 and 169, but differs from both papaya proteinases in the composition of its S2 subsite, as well as in having a second thiol group, Cys-117. Some evidence for the amino acid sequence of chymopapain has been deposited as Supplementary Publication SUP 50153 (12 pages) at the British Library Document Supply Centre, Boston Spa., Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies may be obtained on the terms indicated in Biochem. J. (1990) 265, 5. The information comprises Supplement Tables 1-4, which contain, in order, amino acid compositions of peptides from tryptic, peptic, CNBr and mild acid cleavages, Supplement Fig. 1, showing re-fractionation of selected peaks from Fig. 2 of the main paper. Supplement Fig. 2, showing cation-exchange chromatography of the earliest-eluted peak of Fig. 3 of the main paper, Supplement Fig. 3, showing reverse-phase h.p.l.c. of the later-eluted peak from Fig. 3 of the main paper, and Supplement Fig. 4, showing the separation of peptides after mild acid hydrolysis of CNBr-cleavage fragment CB3. PMID:2106878

  10. Exploring the role of putative active site amino acids and pro-region motif of recombinant falcipain-2: a principal hemoglobinase of Plasmodium falciparum.

    PubMed

    Kumar, Amit; Dasaradhi, P V N; Chauhan, Virander S; Malhotra, Pawan

    2004-04-23

    Falcipain-2 is one of the principal hemoglobinases of Plasmodium falciparum, a human malaria parasite. It has the structural organization typical of papain family cysteine proteases: a large pro-domain and a mature domain with conserved active site amino acids. The pro-domain of falcipain-2 also contains two important conserved motifs, "GNFD" and "ERFNIN." The "GNFD" motif has been shown to be responsible for correct folding and stability in many papain family proteases. In the present study, we carried out site-directed mutagenesis to assess the roles of active site residues and pro-domain residues in the activity of falcipain-2. Our results showed that substitutions of the putative active site residues Q36, C42, H174, and N204 resulted in complete loss of falcipain-2 activity, while W206 and D155 mutants retained partial/complete activity in comparison to the wild type falcipain-2. Homology modeling data also corroborate the results of mutagenesis: the Q36, C42, H174, N204, and W206 residues form the active site loop of the enzyme, and D155 lies outside the active pocket. Substitutions in the pro-region did not affect the activity of falcipain-2. This implies that falcipain-2 shares active site residues with other members of the papain family; however, the pro-region of falcipain-2 does not play any role in the activity of the enzyme.

  11. The amino acid sequence of rabbit cardiac troponin I.

    PubMed Central

    Grand, R J; Wilkinson, J M

    1976-01-01

    The complete amino acid sequence of troponin I from rabbit cardiac muscle was determined by the isolation of four unique CNBr fragments, together with overlapping tryptic peptides containing radioactive methionine residues. Overlap data for residues 35-36, 93-94 and 140-145 are incomplete, the sequence at these positions being based on homology with the sequence of the fast-skeletal-muscle protein. Cardiac troponin I is a single polypeptide chain of 206 residues with mol.wt. 23550 and an extinction coefficient, E(1%, 1 cm) at 280 nm, of 4.37. The protein has a net positive charge of 14 and is thus somewhat more basic than troponin I from fast-skeletal muscle. Comparison of the sequences of troponin I from cardiac and fast skeletal muscle shows that the cardiac protein has 26 extra residues at the N-terminus which account for the larger size of the protein. In the remainder of the sequence there is a considerable degree of homology, this being greater in the C-terminal two-thirds of the molecule. The region in the cardiac protein corresponding to the peptide with inhibitory activity from the fast-skeletal-muscle protein is very similar and it seems unlikely that this is the cause of the difference in inhibitory activity between the two proteins. The region responsible for binding troponin C, however, possesses a lower degree of homology. Detailed evidence on which the sequence is based has been deposited as Supplementary Publication SUP 50072 (20 pages), at the British Library Lending Division, Boston Spa, Wetherby, West Yorkshire LS23 7QB, U.K., from whom copies may be obtained on the terms given in Biochem. J. (1976) 153, 5. PMID:1008822

  12. Amino acid sequence of a mouse immunoglobulin mu chain.

    PubMed Central

    Kehry, M; Sibley, C; Fuhrman, J; Schilling, J; Hood, L E

    1979-01-01

    The complete amino acid sequence of the mouse mu chain from the BALB/c myeloma tumor MOPC 104E is reported. The C mu region contains four consecutive homology regions of approximately 110 residues and a COOH-terminal region of 19 residues. A comparison of this mu chain from mouse with a complete mu sequence from human (Ou) and a partial mu chain sequence from dog (Moo) reveals a striking gradient of increasing homology from the NH2-terminal to the COOH-terminal portion of these mu chains, with the former being the least and the latter the most highly conserved. Four of the five sites of carbohydrate attachment appear to be at identical residue positions when the constant regions of the mouse and human mu chains are compared. The mu chain of MOPC 104E has a carbohydrate moiety attached in the second hypervariable region. This is particularly interesting in view of the fact that MOPC 104E binds alpha-(1→3)-dextran, a simple carbohydrate. The structural and functional constraints imposed by these comparative sequence analyses are discussed. PMID:111247

  13. Efficient motif search in ranked lists and applications to variable gap motifs

    PubMed Central

    Leibovich, Limor; Yakhini, Zohar

    2012-01-01

    Sequence elements, at all levels—DNA, RNA and protein, play a central role in mediating molecular recognition and thereby molecular regulation and signaling. Studies that focus on measuring and investigating sequence-based recognition make use of statistical and computational tools, including approaches to searching sequence motifs. State-of-the-art motif searching tools are limited in their coverage and ability to address large motif spaces. We develop and present statistical and algorithmic approaches that take as input ranked lists of sequences and return significant motifs. The efficiency of our approach, based on suffix trees, allows searches over motif spaces that are not covered by existing tools. This includes searching variable gap motifs—two half sites with a flexible length gap in between—and searching long motifs over large alphabets. We used our approach to analyze several high-throughput measurement data sets and report some validation results as well as novel suggested motifs and motif refinements. We suggest a refinement of the known estrogen receptor 1 motif in humans, where we observe gaps other than three nucleotides that also serve as significant recognition sites, as well as a variable length motif related to potential tyrosine phosphorylation. PMID:22416066
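
    A variable gap motif, as described above, is two half sites separated by a gap of flexible length. The sketch below shows only the matching step, using a naive scan rather than the suffix-tree approach of the paper. The half-site strings GGTCA and TGACC, the gap range, and the toy sequences are assumptions chosen for illustration.

```python
def find_all(seq, word):
    """Start indices of every (possibly overlapping) occurrence of word."""
    hits = []
    index = seq.find(word)
    while index != -1:
        hits.append(index)
        index = seq.find(word, index + 1)
    return hits


def variable_gap_hits(seq, left, right, min_gap, max_gap):
    """Return (0-based start of left half site, gap length) for each pairing
    of a left and a right half site whose gap lies in [min_gap, max_gap]."""
    right_starts = set(find_all(seq, right))
    hits = []
    for start in find_all(seq, left):
        end_of_left = start + len(left)
        for gap in range(min_gap, max_gap + 1):
            if end_of_left + gap in right_starts:
                hits.append((start, gap))
    return hits


# Hypothetical ranked list: sequences ordered from strongest to weakest signal.
ranked = ["TTGGTCACAGTGACCAA", "AAGGTCATGACCTT", "CCCCCCCC"]
for rank, seq in enumerate(ranked, start=1):
    print(rank, variable_gap_hits(seq, "GGTCA", "TGACC", 0, 5))
```

    Recording the gap length per hit makes it possible to ask, as the abstract does, whether gaps other than three nucleotides are also enriched near the top of the ranked list; the statistical test for that enrichment is not sketched here.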

  14. Ultrasensitive nucleic acid sequence detection by single-molecule electrophoresis

    SciTech Connect

    Castro, A; Shera, E.B.

    1996-09-01

    This is the final report of a one-year laboratory-directed research and development project at Los Alamos National Laboratory. There has been considerable interest in the development of very sensitive clinical diagnostic techniques over the last few years. Many pathogenic agents are often present in extremely small concentrations in clinical samples, especially at the initial stages of infection, making their detection very difficult. This project sought to develop a new technique for the detection and accurate quantification of specific bacterial and viral nucleic acid sequences in clinical samples. The scheme involved the use of novel hybridization probes for the detection of nucleic acids combined with our recently developed technique of single-molecule electrophoresis. This project is directly relevant to the DOE's Defense Programs strategic directions in the area of biological warfare counter-proliferation.

  15. The Role of Glutamic or Aspartic Acid in Position Four of the Epitope Binding Motif and Thyrotropin Receptor-Extracellular Domain Epitope Selection in Graves' Disease

    PubMed Central

    Inaba, Hidefumi; Martin, William; Ardito, Matt; De Groot, Anne Searls; De Groot, Leslie J.

    2010-01-01

    Context: Development of Graves' disease (GD) is related to HLA-DRB1*0301 (DR3), and more specifically to arginine at position 74 of the DRB1 molecule. The extracellular domain (ECD) of human TSH receptor (hTSH-R) contains the target antigen. Objective and Design: We analyzed the relation between hTSH-R-ECD peptides and DR molecules to determine whether aspartic acid (D) or glutamic acid (E) at position four in the binding motif influenced selection of functional epitopes. Results: Peptide epitopes from TSH-R-ECD with D or E in position four (D/E+) had higher affinity for binding to DR3 than peptides without D/E (D/E−) (IC50 29.3 vs. 61.4, P = 0.0024). HLA-DR7, negatively correlated with GD, and DRB1*0302 (HLA-DR18), not associated with GD, had different profiles of epitope binding. Toxic GD patients who are DR3+ had higher responses to D/E+ peptides than D/E− peptides (stimulation index 1.42 vs. 1.22, P = 0.028). All DR3+ GD patients (toxic + euthyroid) had higher responses, with borderline significance (stimulation index 1.32 vs. 1.18, P = 0.051). Splenocytes of DR3 transgenic mice immunized to TSH-R-ECD responded to D/E+ peptides more than D/E− peptides (stimulation index 1.95 vs. 1.69, P = 0.036). Seven of nine hTSH-R-ECD peptide epitopes reported to be reactive with GD patients' peripheral blood mononuclear cells contain binding motifs with D/E at position four. Conclusions: TSH-R-ECD epitopes with D/E in position four of the binding motif bind more strongly to DRB1*0301 than epitopes that are D/E− and are more stimulatory to GD patients' peripheral blood mononuclear cells and to splenocytes from mice immunized to hTSH-R. These epitopes appear important in immunogenicity to TSH-R due to their favored binding to HLA-DR3, thus increasing presentation to T cells. PMID:20392871

  16. Nucleic acid (cDNA) and amino acid sequences of alpha-type gliadins from wheat (Triticum aestivum).

    PubMed Central

    Kasarda, D D; Okita, T W; Bernardin, J E; Baecker, P A; Nimmo, C C; Lew, E J; Dietler, M D; Greene, F C

    1984-01-01

    The complete amino acid sequence for an alpha-type gliadin protein of wheat (Triticum aestivum Linnaeus) endosperm has been derived from a cloned cDNA sequence. An additional cDNA clone that corresponds to about 75% of a similar alpha-type gliadin has been sequenced and shows some important differences. About 97% of the composite sequence of A-gliadin (an alpha-type gliadin fraction) has also been obtained by direct amino acid sequencing. This sequence shows a high degree of similarity with amino acid sequences derived from both cDNA clones and is virtually identical to one of them. On the basis of sequence information, after loss of the signal sequence, the mature alpha-type gliadins may be divided into five different domains, two of which may have evolved from an ancestral gliadin gene, whereas the remaining three contain repeating sequences that may have developed independently. PMID:6589619

  17. A motif unique to the human DEAD-box protein DDX3 is important for nucleic acid binding, ATP hydrolysis, RNA/DNA unwinding and HIV-1 replication.

    PubMed

    Garbelli, Anna; Beermann, Sandra; Di Cicco, Giulia; Dietrich, Ursula; Maga, Giovanni

    2011-05-12

    DEAD-box proteins are enzymes endowed with nucleic acid-dependent ATPase, RNA translocase and unwinding activities. The human DEAD-box protein DDX3 has been shown to play important roles in tumor proliferation and viral infections. In particular, DDX3 has been identified as an essential cofactor for HIV-1 replication. Here we characterized a set of DDX3 mutants biochemically with respect to nucleic acid binding, ATPase and helicase activity. In particular, we addressed the functional role of a unique insertion between motifs I and Ia of DDX3 and provide evidence for its implication in nucleic acid binding and HIV-1 replication. We show that human DDX3 lacking this domain binds HIV-1 RNA with lower affinity. Furthermore, a specific peptide ligand for this insertion selected by phage display interferes with HIV-1 replication after transduction into HelaP4 cells. Besides broadening our understanding of the structure-function relationships of this important protein, our results identify a specific domain of DDX3 which may be suited as target for antiviral drugs designed to inhibit cellular cofactors for HIV-1 replication.

  18. Dipeptide frequency/bias analysis identifies conserved sites of nonrandomness shared by cysteine-rich motifs.

    PubMed

    Campion, S R; Ameen, A S; Lai, L; King, J M; Munzenmaier, T N

    2001-08-15

    This report describes the application of a simple computational tool, AAPAIR.TAB, for the systematic analysis of the cysteine-rich EGF, Sushi, and Laminin motif/sequence families at the two-amino acid level. Automated dipeptide frequency/bias analysis detects preferences in the distribution of amino acids in established protein families, by determining which "ordered dipeptides" occur most frequently in comprehensive motif-specific sequence data sets. Graphic display of the dipeptide frequency/bias data revealed family-specific preferences for certain dipeptides, but more importantly detected a shared preference for employment of the ordered dipeptides Gly-Tyr (GY) and Gly-Phe (GF) in all three protein families. The dipeptide Asn-Gly (NG) also exhibited high-frequency and bias in the EGF and Sushi motif families, whereas Asn-Thr (NT) was distinguished in the Laminin family. Evaluation of the distribution of dipeptides identified by frequency/bias analysis subsequently revealed the highly restricted localization of the G(F/Y) and N(G/T) sequence elements at two separate sites of extreme conservation in the consensus sequence of all three sequence families. The similar employment of the high-frequency/bias dipeptides in three distinct protein sequence families was further correlated with the concurrence of these shared molecular determinants at similar positions within the distinctive scaffolds of three structurally divergent, but similarly employed, motif modules.
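
    The dipeptide frequency/bias idea can be sketched as a count of ordered residue pairs compared with the count expected from single-residue composition. The abstract does not define how AAPAIR.TAB computes bias, so the observed/expected ratio used here is an assumption, and the function name and toy sequences are hypothetical.

```python
from collections import Counter


def dipeptide_bias(sequences):
    """Map each ordered dipeptide to (observed count, observed/expected).

    Expected counts assume the two residues occur independently at their
    overall single-residue frequencies (an assumed definition of bias).
    """
    singles = Counter()
    pairs = Counter()
    for seq in sequences:
        singles.update(seq)
        pairs.update(seq[i:i + 2] for i in range(len(seq) - 1))
    n_single = sum(singles.values())
    n_pair = sum(pairs.values())
    result = {}
    for dipeptide, count in pairs.items():
        first, second = dipeptide
        expected = (singles[first] / n_single) * (singles[second] / n_single) * n_pair
        result[dipeptide] = (count, count / expected)
    return result


# Hypothetical toy data set, not real EGF, Sushi, or Laminin sequences.
toy_family = ["CGYC", "CGYG"]
bias = dipeptide_bias(toy_family)
for dipeptide, (count, ratio) in sorted(bias.items()):
    print(dipeptide, count, round(ratio, 2))
```

    A ratio well above 1 marks a dipeptide used more often than composition alone would predict, which is the kind of signal the abstract reports for Gly-Tyr and Gly-Phe.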

  19. Structural gene and complete amino acid sequence of Vibrio alginolyticus collagenase.

    PubMed Central

    Takeuchi, H; Shibano, Y; Morihara, K; Fukushima, J; Inami, S; Keil, B; Gilles, A M; Kawamoto, S; Okuda, K

    1992-01-01

    The DNA encoding the collagenase of Vibrio alginolyticus was cloned, and its complete nucleotide sequence was determined. When the cloned gene was ligated to pUC18, the Escherichia coli expression vector, bacteria carrying the gene exhibited both collagenase antigen and collagenase activity. The open reading frame from the ATG initiation codon was 2442 bp in length for the collagenase structural gene. The amino acid sequence, deduced from the nucleotide sequence, revealed that the mature collagenase consists of 739 amino acids with an Mr of 81875. The amino acid sequences of 20 polypeptide fragments were completely identical with the deduced amino acid sequences of the collagenase gene. The amino acid composition predicted from the DNA sequence was similar to the chemically determined composition of purified collagenase reported previously. The analyses of both the DNA and amino acid sequences of the collagenase gene were rigorously performed, but we could not detect any significant sequence similarity to other collagenases. PMID:1311172

  20. Zinc finger binding motifs do not explain recombination rate variation within or between species of Drosophila.

    PubMed

    Heil, Caiti S S; Noor, Mohamed A F

    2012-01-01

    In humans and mice, the Cys(2)His(2) zinc finger protein PRDM9 binds to a DNA sequence motif enriched in hotspots of recombination, possibly modifying nucleosomes, and recruiting recombination machinery to initiate Double Strand Breaks (DSBs). However, since its discovery, some researchers have suggested that the recombinational effect of PRDM9 is lineage or species specific. To test for a conserved role of PRDM9-like proteins across taxa, we use the Drosophila pseudoobscura species group in an attempt to identify recombination associated zinc finger proteins and motifs. We leveraged the conserved amino acid motifs in Cys(2)His(2) zinc fingers to predict nucleotide binding motifs for all Cys(2)His(2) zinc finger proteins in Drosophila pseudoobscura and identified associations with empirical measures of recombination rate. Additionally, we utilized recombination maps from D. pseudoobscura and D. miranda to explore whether changes in the binding motifs between species can account for changes in the recombination landscape, analogous to the effect observed in PRDM9 among human populations. We identified a handful of potential recombination-associated sequence motifs, but the associations are generally tenuous and their biological relevance remains uncertain. Furthermore, we found no evidence that changes in zinc finger DNA binding explains variation in recombination rate between species. We therefore conclude that there is no protein with a DNA sequence specific human-PRDM9-like function in Drosophila. We suggest these findings could be explained by the existence of a different recombination initiation system in Drosophila.

  1. Cloning, sequence analysis and expression of the F1F0-ATPase beta-subunit from wine lactic acid bacteria.

    PubMed

    Sievers, Martin; Uermösi, Christina; Fehlmann, Marc; Krieger, Sibylle

    2003-09-01

    The nucleotide sequences of the genes encoding the F1F0-ATPase beta-subunit from Oenococcus oeni, Leuconostoc mesenteroides subsp. mesenteroides, Pediococcus damnosus, Pediococcus parvulus, Lactobacillus brevis and Lactobacillus hilgardii were determined. Their deduced amino acid sequences showed homology values of 79-98%. Data from the alignment and ATPase tree indicated that O. oeni and L. mesenteroides subsp. mesenteroides formed a group well-separated from P. damnosus and P. parvulus and from the group comprising L. brevis and L. hilgardii. The N-terminus of the F1F0-ATPase beta-subunit of O. oeni contains an additional stretch of 38 amino acid residues. The catalytic site of the ATPase beta-subunit of the investigated strains is characterized by the two conserved motifs GGAGVGKT and GERTRE. The amplified atpD coding sequences were inserted into the pCRT7/CT-TOPO vector using a TA-cloning strategy and transformed into Escherichia coli. SDS-PAGE and Western blot analyses confirmed that O. oeni has an ATPase beta-subunit protein which is larger in size than the corresponding molecules from the other investigated strains.

  2. Stochastic motif extraction using hidden Markov model

    SciTech Connect

    Fujiwara, Yukiko; Asogawa, Minoru; Konagaya, Akihiko

    1994-12-31

    In this paper, we study the application of an HMM (hidden Markov model) to the problem of representing protein sequences by a stochastic motif. A stochastic protein motif represents the small segments of protein sequences that have a certain function or structure. The stochastic motif, represented by an HMM, has conditional probabilities to deal with the stochastic nature of the motif. This HMM directly reflects the characteristics of the motif, such as a protein periodical structure or grouping. In order to obtain the optimal HMM, we developed the "iterative duplication method" for HMM topology learning. It starts from a small fully-connected network and iterates the network generation and parameter optimization until it achieves sufficient discrimination accuracy. Using this method, we obtained an HMM for a leucine zipper motif. Compared with a symbolic pattern representation, which reached an accuracy of 14.8 percent, the HMM achieved 79.3 percent in prediction. Additionally, the method can obtain an HMM for various types of zinc finger motifs, and it might separate the mixed data. We demonstrated that this approach is applicable to the validation of the protein databases; a constructed HMM has indicated that one protein sequence annotated as "leucine-zipper like sequence" in the database is quite different from other leucine-zipper sequences in terms of likelihood, and we found this discrimination is plausible.

  3. Heavy-atom Database System: a tool for the preparation of heavy-atom derivatives of protein crystals based on amino-acid sequence and crystallization conditions.

    PubMed

    Sugahara, Michihiro; Asada, Yukuhiko; Ayama, Haruhiko; Ukawa, Hisashi; Taka, Hideyuki; Kunishima, Naoki

    2005-09-01

    Heavy-atom Database System (HATODAS) is a WWW-based tool designed to assist the heavy-atom derivatization of proteins. The conventional procedure for the preparation of derivatives is usually a time-consuming 'trial-and-error' process. The present program provides a solution for this problem using a database of known heavy-atom derivatives. A database search suggests potential heavy-atom reagents for any target protein based on its amino-acid sequence and crystallization conditions. A mining of the database identified 93 preferred motifs for heavy-atom binding. The motifs are observed frequently at the actual heavy-atom-binding sites encountered in the process of structure determination.

  4. Rapid fixation of a distinctive sequence motif in the 3' noncoding region of the clade of West Nile virus invading North America.

    PubMed

    Hughes, Austin L; Piontkivska, Helen; Foppa, Ivo

    2007-09-15

    Phylogenetic analysis of complete genomes of West Nile virus (WNV) by a variety of methods supported the hypothesis that North American isolates of WNV constitute a monophyletic group, together with an isolate from Israel and one from Hungary. We used ancestral sequence reconstruction in order to obtain evidence for evolutionary changes that might be correlated with increased virulence in this clade (designated the N.A. clade). There was one amino acid change (I-->T at residue 356 of the NS3 protein) that occurred in the ancestor of the N.A. clade and remained conserved in all N.A. clade genomes analyzed. There were four changes in the upstream portion of the 3' noncoding region (the AT-enriched region) that occurred in the ancestor of the N.A. clade and remained conserved in all N.A. clade genomes analyzed, changes predicted to alter RNA secondary structure. The AT-enriched region showed a higher rate of substitution in the branch ancestral to the N.A. clade, relative to polymorphism, than did the remainder of the noncoding regions, synonymous sites in coding regions, or nonsynonymous sites in coding regions. The high rate of occurrence of fixed nucleotide substitutions in this region suggests that positive Darwinian selection may have acted on this portion of the 3'NCR and that these fixed changes, possibly in concert with the amino acid change in NS3, may underlie phenotypic effects associated with increased virulence in North American WNV.

  5. Temporal motifs in time-dependent networks

    NASA Astrophysics Data System (ADS)

    Kovanen, Lauri; Karsai, Márton; Kaski, Kimmo; Kertész, János; Saramäki, Jari

    2011-11-01

    Temporal networks are commonly used to represent systems where connections between elements are active only for restricted periods of time, such as telecommunication, neural signal processing, biochemical reaction and human social interaction networks. We introduce the framework of temporal motifs to study the mesoscale topological-temporal structure of temporal networks in which the events of nodes do not overlap in time. Temporal motifs are classes of similar event sequences, where the similarity refers not only to topology but also to the temporal order of the events. We provide a mapping from event sequences to coloured directed graphs that enables an efficient algorithm for identifying temporal motifs. We discuss some aspects of temporal motifs, including causality and null models, and present basic statistics of temporal motifs in a large mobile call network.

  6. A motif rich in charged residues determines product specificity in isomaltulose synthase.

    PubMed

    Zhang, Daohai; Li, Nan; Swaminathan, Kunchithapadam; Zhang, Lian Hui

    2003-01-16

    Isomaltulose synthase (PalI) catalyzes hydrolysis of sucrose and formation of alpha-1,6 and alpha-1,1 bonds to produce isomaltulose (alpha-D-glucosylpyranosyl-1,6-D-fructofuranose) and a small amount of trehalulose (alpha-D-glucosylpyranosyl-1,1-D-fructofuranose). A potential isomaltulose synthase-specific motif ((325)RLDRD(329)), which contains a 'DxD' motif conserved in many glycosyltransferases, was identified based on sequence comparison with reference to the secondary structural features of PalI and homologs. Site-directed mutagenesis analysis of the motif showed that the four charged amino acid residues (Arg(325), Arg(328), Asp(327) and Asp(329)) influence the enzyme kinetics and determine the product specificity. Mutation of these four residues increased trehalulose formation by 17-61% and decreased isomaltulose by 26-67%. We conclude that the 'RLDRD' motif controls the product specificity of PalI.

  7. Biosynthesis of D-alanyl-lipoteichoic acid: cloning, nucleotide sequence, and expression of the Lactobacillus casei gene for the D-alanine-activating enzyme.

    PubMed Central

    Heaton, M P; Neuhaus, F C

    1992-01-01

    The D-alanine-activating enzyme (Dae; EC 6.3.2.4) encoded by the dae gene from Lactobacillus casei ATCC 7469 is a cytosolic protein essential for the formation of the D-alanyl esters of membrane-bound lipoteichoic acid. The gene has been cloned, sequenced, and expressed in Escherichia coli, an organism which does not possess Dae activity. The open reading frame is 1,518 nucleotides and codes for a protein of 55.867 kDa, a value in agreement with the 56 kDa obtained by electrophoresis. A putative promoter and ribosome-binding site immediately precede the dae gene. A second open reading frame contiguous with the dae gene has also been partially sequenced. The organization of these genetic elements suggests that more than one enzyme necessary for the biosynthesis of D-alanyl-lipoteichoic acid may be present in this operon. Analysis of the amino acid sequence deduced from the dae gene identified three regions with significant homology to proteins in the following groups of ATP-utilizing enzymes: (i) the acid-thiol ligases, (ii) the activating enzymes for the biosynthesis of enterobactin, and (iii) the synthetases for tyrocidine, gramicidin S, and penicillin. From these comparisons, a common motif (GXXGXPK) has been identified that is conserved in the 19 protein domains analyzed. This motif may represent the phosphate-binding loop of an ATP-binding site for this class of enzymes. A DNA fragment (1,568 nucleotides) containing the dae gene and its putative ribosome-binding site has been subcloned and expressed in E. coli. Approximately 0.5% of the total cell protein is active Dae, whereas 21% is in the form of inclusion bodies. The isolation of this minimal fragment without a native promoter sequence provides the basis for designing a genetic system for modulating the D-alanine ester content of lipoteichoic acid. PMID:1385594
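    A consensus such as GXXGXPK, where X stands for any residue, can be located in a protein sequence with a regular expression. The following is only a minimal sketch: the function name `find_motif` and the toy sequence are hypothetical, and the sequence is not the Dae protein or any sequence from the paper.

```python
import re

# GXXGXPK with X = any amino acid (single-letter code).
# The lookahead lets overlapping occurrences be reported as well.
MOTIF_PATTERN = re.compile(r"(?=(G[A-Z]{2}G[A-Z]PK))")

def find_motif(sequence):
    """Return (1-based start position, matched residues) for each GXXGXPK hit."""
    return [(m.start() + 1, m.group(1))
            for m in MOTIF_PATTERN.finditer(sequence.upper())]

# Made-up toy sequence for illustration only.
toy_sequence = "MKTAGSTGKPKGVLAA"
print(find_motif(toy_sequence))
```

    A pattern match of this kind only flags candidate sites; it says nothing about whether a hit actually forms a phosphate-binding loop.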

  8. Localization of the labile disulfide bond between SU and TM of the murine leukemia virus envelope protein complex to a highly conserved CWLC motif in SU that resembles the active-site sequence of thiol-disulfide exchange enzymes.

    PubMed Central

    Pinter, A; Kopelman, R; Li, Z; Kayman, S C; Sanders, D A

    1997-01-01

    Previous studies have indicated that the surface (SU) and transmembrane (TM) subunits of the envelope protein (Env) of murine leukemia viruses (MuLVs) are joined by a labile disulfide bond that can be stabilized by treatment of virions with thiol-specific reagents. In the present study this observation was extended to the Envs of additional classes of MuLV, and the cysteines of SU involved in this linkage were mapped by proteolytic fragmentation analyses to the CWLC sequence present at the beginning of the C-terminal domain of SU. This sequence is highly conserved across a broad range of distantly related retroviruses and resembles the CXXC motif present at the active site of thiol-disulfide exchange enzymes. A model is proposed in which rearrangements of the SU-TM intersubunit disulfide linkage, mediated by the CWLC sequence, play roles in the assembly and function of the Env complex. PMID:9311907

  9. Nucleic acid (cDNA) and amino acid sequences of the maize endosperm protein glutelin-2.

    PubMed Central

    Prat, S; Cortadas, J; Puigdomènech, P; Palau, J

    1985-01-01

    The cDNA coding for a glutelin-2 protein from maize endosperm has been cloned and the complete amino acid sequence of the protein derived for the first time. An immature maize endosperm cDNA bank was screened for the expression of a beta-lactamase:glutelin-2 (G2) fusion polypeptide by using antibodies against the purified 28 kd G2 protein. A clone corresponding to the 28 kd G2 protein was sequenced and the primary structure of this protein was derived. Five regions can be defined in the protein sequence: an 11 residue N-terminal part, a repeated region formed by eight units of the sequence Pro-Pro-Pro-Val-His-Leu, an alternating Pro-X stretch 21 residues long, a Cys rich domain and a C-terminal part rich in Gln. The protein sequence is preceded by 19 residues which have the characteristics of the signal peptide found in secreted proteins. Unlike zeins, the main maize storage proteins, 28 kd glutelin-2 has several homologous sequences in common with other cereal storage proteins. PMID:3839076

  10. RSAT peak-motifs: motif analysis in full-size ChIP-seq datasets.

    PubMed

    Thomas-Chollier, Morgane; Herrmann, Carl; Defrance, Matthieu; Sand, Olivier; Thieffry, Denis; van Helden, Jacques

    2012-02-01

    ChIP-seq is increasingly used to characterize transcription factor binding and chromatin marks at a genomic scale. Various tools are now available to extract binding motifs from peak data sets. However, most approaches are only available as command-line programs, or via a website but with size restrictions. We present peak-motifs, a computational pipeline that discovers motifs in peak sequences, compares them with databases, exports putative binding sites for visualization in the UCSC genome browser and generates an extensive report suited for both naive and expert users. It relies on time- and memory-efficient algorithms enabling the treatment of several thousand peaks within minutes. Regarding time efficiency, peak-motifs outperforms all comparable tools by several orders of magnitude. We demonstrate its accuracy by analyzing data sets ranging from 4000 to 128,000 peaks for 12 embryonic stem cell-specific transcription factors. In all cases, the program finds the expected motifs and returns additional motifs potentially bound by cofactors. We further apply peak-motifs to discover tissue-specific motifs in peak collections for the p300 transcriptional co-activator. To our knowledge, peak-motifs is the only tool that performs a complete motif analysis and offers a user-friendly web interface without any restriction on sequence size or number of peaks.

  11. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2010-07-01 2010-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide...

  12. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2012-07-01 2012-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide...

  13. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2014-07-01 2014-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide...

  14. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2011-07-01 2011-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide...

  15. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2013-07-01 2013-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide...

  16. Modulation of anti-endotoxin property of Temporin L by minor amino acid substitution in identified phenylalanine zipper sequence.

    PubMed

    Srivastava, Saurabh; Kumar, Amit; Tripathi, Amit Kumar; Tandon, Anshika; Ghosh, Jimut Kanti

    2016-11-01

    A 13-residue frog antimicrobial peptide Temporin L (TempL) possesses versatile antimicrobial activities and is considered a lead molecule for the development of new antimicrobial agents. To find out the amino acid sequences that influence the anti-microbial property of TempL, a phenylalanine zipper-like sequence was identified in it which was not reported earlier. Several alanine-substituted analogs and a scrambled peptide having the same composition of TempL were designed for evaluating the role of this motif. To investigate whether leucine residues instead of phenylalanine residues at 'a' and/or 'd' position(s) of the heptad repeat sequence could alter its antimicrobial property, several TempL analogs were synthesized after replacing these phenylalanine residues with leucine residues. Replacing phenylalanine residues with alanine residues in the phenylalanine zipper sequence significantly compromised the anti-endotoxin property of TempL. This is evident from the higher production of tumor necrosis factor-α and interleukin-6 in lipopolysaccharide (LPS)-stimulated rat bone-marrow-derived macrophage cells in the presence of its alanine-substituted analogs than TempL itself. However, replacement of these phenylalanine residues with leucine residues significantly augmented anti-endotoxin property of TempL. A single alanine-substituted TempL analog (F8A-TempL) showed significantly reduced cytotoxicity but retained the antibacterial activity of TempL, while the two single leucine-substituted analogs (F5L-TempL and F8L-TempL), although exhibiting lower cytotoxicity, were able to retain the antibacterial activity of the parent peptide. The results demonstrate how minor amino acid substitutions in the identified phenylalanine zipper sequence in TempL could yield analogs with better antibacterial and/or anti-endotoxin properties with their plausible mechanism of action.

  17. Rapid Fixation of a Distinctive Sequence Motif in the 3′Noncoding Region of the Clade of West Nile Virus Invading North America

    PubMed Central

    Hughes, Austin L.; Piontkivska, Helen; Foppa, Ivo

    2007-01-01

    Phylogenetic analysis of complete genomes of West Nile virus (WNV) by a variety of methods supported the hypothesis that North American isolates of WNV constitute a monophyletic group, together with an isolate from Israel and one from Hungary. We used ancestral sequence reconstruction in order to obtain evidence for evolutionary changes that might be correlated with increased virulence in this clade (designated the N.A. clade). There was one amino acid change (I→T at residue 356 of the NS3 protein) that occurred in the ancestor of the N.A. clade and remained conserved in all N.A. clade genomes analyzed. There were four changes in the upstream portion of the 3′ noncoding region (the AT-enriched region) that occurred in the ancestor of the N.A. clade and remained conserved in all N.A. clade genomes analyzed, changes predicted to alter RNA secondary structure. The AT-enriched region showed a higher rate of substitution in the branch ancestral to the N.A. clade, relative to polymorphism, than did the remainder of the non-coding regions, synonymous sites in coding regions, or nonsynonymous sites in coding regions. The high rate of occurrence of fixed nucleotide substitutions in this region suggests that positive Darwinian selection may have acted on this portion of the 3′NCR and that these fixed changes, possibly in concert with the amino acid change in NS3, may underlie phenotypic effects associated with increased virulence in North American WNV. PMID:17587514

  18. Assembly of supramolecular DNA complexes containing both G-quadruplexes and i-motifs by enhancing the G-repeat-bearing capacity of i-motifs

    PubMed Central

    Cao, Yanwei; Gao, Shang; Yan, Yuting; Bruist, Michael F.; Wang, Bing; Guo, Xinhua

    2017-01-01

    The single-step assembly of supramolecular complexes containing both i-motifs and G-quadruplexes (G4s) is demonstrated. This can be achieved because the formation of four-stranded i-motifs appears to be little affected by certain terminal residues: a five-cytosine tetrameric i-motif can bear ten-base flanking residues. However, things become complex when different lengths of guanine-repeats are added at the 3′ or 5′ ends of the cytosine-repeats. Here, a series of oligomers d(XGiXC5X) and d(XC5XGiX) (X = A, T or none; i < 5) are designed to study the impact of G-repeats on the formation of tetrameric i-motifs. Our data demonstrate that tetramolecular i-motif structure can tolerate specific flanking G-repeats. Assemblies of these oligonucleotides are polymorphic, but may be controlled by solution pH and counter ion species. Importantly, we find that the sequences d(TGiAC5) can form the tetrameric i-motif in large quantities. This leads to the design of two oligonucleotides d(TG4AC7) and d(TGBrGGBrGAC7) that self-assemble to form quadruplex supramolecules under certain conditions. d(TG4AC7) forms supramolecules under acidic conditions in the presence of K+ that are mainly V-shaped or ring-like containing parallel G4s and antiparallel i-motifs. d(TGBrGGBrGAC7) forms long linear quadruplex wires under acidic conditions in the presence of Na+ that consist of both antiparallel G4s and i-motifs. PMID:27899568

  19. Human liver apolipoprotein B-100 cDNA: complete nucleic acid and derived amino acid sequence.

    PubMed Central

    Law, S W; Grant, S M; Higuchi, K; Hospattankar, A; Lackner, K; Lee, N; Brewer, H B

    1986-01-01

    Human apolipoprotein B-100 (apoB-100), the ligand on low density lipoproteins that interacts with the low density lipoprotein receptor and initiates receptor-mediated endocytosis and low density lipoprotein catabolism, has been cloned, and the complete nucleic acid and derived amino acid sequences have been determined. ApoB-100 cDNAs were isolated from normal human liver cDNA libraries utilizing immunoscreening as well as filter hybridization with radiolabeled apoB-100 oligodeoxynucleotides. The apoB-100 mRNA is 14.1 kilobases long encoding a mature apoB-100 protein of 4536 amino acids with a calculated amino acid molecular weight of 512,723. ApoB-100 contains 20 potential glycosylation sites, and 12 of a total of 25 cysteine residues are located in the amino-terminal region of the apolipoprotein providing a potential globular structure of the amino terminus of the protein. ApoB-100 contains relatively few regions of amphipathic helices, but compared to other human apolipoproteins it is enriched in beta-structure. The delineation of the entire human apoB-100 sequence will now permit a detailed analysis of the conformation of the protein, the low density lipoprotein receptor binding domain(s), and the structural relationship between apoB-100 and apoB-48 and will provide the basis for the study of genetic defects in apoB-100 in patients with dyslipoproteinemias. PMID:3464946

  20. Computer selection of oligonucleotide probes from amino acid sequences for use in gene library screening.

    PubMed

    Yang, J H; Ye, J H; Wallace, D C

    1984-01-11

    We present a computer program, FINPROBE, which utilizes known amino acid sequence data to deduce minimum redundancy oligonucleotide probes for use in screening cDNA or genomic libraries or in primer extension. The user enters the amino acid sequence of interest, the desired probe length, the number of probes sought, and the constraints on oligonucleotide synthesis. The computer generates a table of possible probes listed in increasing order of redundancy and provides the location of each probe in the protein and mRNA coding sequence. Activation of a next function provides the amino acid and mRNA sequences of each probe of interest as well as the complementary sequence and the minimum dissociation temperature of the probe. A final routine prints out the amino acid sequence of the protein in parallel with the mRNA sequence listing all possible codons for each amino acid.
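    The idea of ranking candidate probes by redundancy can be illustrated by counting, for each window of the protein, how many distinct coding sequences could encode it under the standard genetic code. This is a hedged sketch and not FINPROBE itself: the abstract does not describe the program's algorithm, the function name `rank_probe_windows` and the toy peptide are hypothetical, and synthesis constraints and dissociation temperature are ignored.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code.
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}

def rank_probe_windows(protein, n_residues):
    """List (redundancy, 1-based start, peptide) tuples, least redundant first.

    Redundancy is the product of codon counts over the window, i.e. the
    number of distinct oligonucleotides that could encode that peptide.
    """
    windows = []
    for start in range(len(protein) - n_residues + 1):
        peptide = protein[start:start + n_residues]
        redundancy = prod(CODON_COUNTS[aa] for aa in peptide)
        windows.append((redundancy, start + 1, peptide))
    return sorted(windows)

# Made-up toy peptide for illustration only.
for redundancy, start, peptide in rank_probe_windows("MWLSR", 2):
    print(redundancy, start, peptide)
```

    Windows rich in Met and Trp (one codon each) come out first, while those containing Leu, Ser or Arg (six codons each) are pushed to the end of the table.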

  1. Identification of second arginine-glycine-aspartic acid motif of ovine vitronectin as the complement C9 binding site and its implication in bacterial infection.

    PubMed

    T, Prasada Rao; T, Lakshmi Prasanth; R, Parvathy; S, Murugavel; Devi, Karuna; Joshi, Paritosh

    2017-02-02

    Vitronectin (Vn), a multifunctional protein of blood and extracellular matrix interacts with complement C9. This interaction may modulate innate immunity. Details of Vn-C9 interaction are limited. An assessment of Vn-C9 interaction was made employing goat homologous system. Vn binding to C9 was observed in three different assays. Using recombinant fragments, the C9 binding was mapped to the N-terminus of Vn. Site directed mutagenesis was performed to alter the second RGD sequence (RGD-2) of Vn. Change of R to G or D to A in RGD-2 caused significant decrease in Vn binding to C9 whereas change of R to G in the first RGD motif (RGD-1) had no effect on Vn binding to C9. These results imply that the RGD-2 of goat Vn is involved in C9 binding. In competitive binding assay, the presence of soluble RGD peptide inhibited Vn binding to C9 whereas heparin had no effect. Vn binding to C9 in terms of bacterial pathogenesis was also evaluated. Serum dependent inhibition of E. coli growth was significantly reverted when Vn or its N-fragment were included in the assay. The C-fragment, which did not support C9 binding, also partly nullified serum dependent inhibition of bacterial growth probably through other serum component(s).

  2. L-Rhamnose-binding lectin from eggs of the Echinometra lucunter: Amino acid sequence and molecular modeling.

    PubMed

    Carneiro, Rômulo Farias; Teixeira, Claudener Souza; de Melo, Arthur Alves; de Almeida, Alexandra Sampaio; Cavada, Benildo Sousa; de Sousa, Oscarina Viana; da Rocha, Bruno Anderson Matias; Nagano, Celso Shiniti; Sampaio, Alexandre Holanda

    2015-01-01

    An L-rhamnose-binding lectin named ELEL was isolated from eggs of the rock boring sea urchin Echinometra lucunter by affinity chromatography on lactosyl-agarose. ELEL is a homodimer linked by a disulfide bond with subunits of 11 kDa each. The new lectin was inhibited by saccharides possessing the same configuration of hydroxyl groups at C-2 and C-4, such as L-rhamnose, melibiose, galactose and lactose. The amino acid sequence of ELEL was determined by tandem mass spectrometry. The ELEL subunit has 103 amino acids, including nine cysteine residues involved in four conserved intrachain disulfide bonds and one interchain disulfide bond. The full sequence of ELEL presents conserved motifs commonly found in rhamnose-binding lectins, including YGR, DPC and KYL. A three-dimensional model of ELEL was created, and molecular docking revealed favorable binding energies for interactions between ELEL and rhamnose, melibiose and Gb3 (Galα1-4Galβ1-4Glcβ1-Cer). Furthermore, ELEL was able to agglutinate Gram-positive bacterial cells, suggesting its ability to recognize pathogens.

  3. RAG-1 interacts with the repeated amino acid motif of the human homologue of the yeast protein SRP1.

    PubMed Central

    Cortes, P; Ye, Z S; Baltimore, D

    1994-01-01

    Genes for immunoglobulins and T-cell receptor are generated by a process known as V(D)J recombination. This process is highly regulated and mediated by the recombination activating proteins RAG-1 and RAG-2. By the use of the two-hybrid protein interaction system, we isolated a human protein that specifically interacts with RAG-1. This protein is the human homologue of the yeast SRP1 (suppressor of a temperature-sensitive RNA polymerase I mutation). The SRP1-1 mutation is an allele-specific dominant suppressor of a temperature-sensitive mutation in the zinc binding domain of the 190-kDa subunit of Saccharomyces cerevisiae RNA polymerase I. The human SRP cDNA clone was used to screen a mouse cDNA library. We obtained a 3.9-kbp cDNA clone encoding the mouse SRP1. The open reading frame of this cDNA encodes a 538-amino acid protein with eight degenerate repeats of 40-45 amino acids each. The mouse and human SRP1 are 98% identical, while the mouse and yeast SRP1 have 48% identity. After cotransfection of the genes encoding RAG-1 and human SRP1 into 293T cells, a stable complex was evident. Deletion analysis indicated that the region of the SRP1 protein interacting with RAG-1 involved four repeats. The domain of RAG-1 that associates with SRP1 mapped N-terminal to the zinc finger domain. Because this region of RAG-1 is not required for recombination and SRP1 appears to be bound to the nuclear envelope, we suggest that this interaction helps to localize RAG-1. PMID:8052633

  4. Probability distribution of intersymbol distances in random symbolic sequences: Applications to improving detection of keywords in texts and of amino acid clustering in proteins

    NASA Astrophysics Data System (ADS)

    Carpena, Pedro; Bernaola-Galván, Pedro A.; Carretero-Campos, Concepción; Coronado, Ana V.

    2016-11-01

    Symbolic sequences have been extensively investigated in the past few years within the framework of statistical physics. Paradigmatic examples of such sequences are written texts, and deoxyribonucleic acid (DNA) and protein sequences. In these examples, the spatial distribution of a given symbol (a word, a DNA motif, an amino acid) is a key property usually related to the symbol importance in the sequence: The more uneven and far from random the symbol distribution, the higher the relevance of the symbol to the sequence. Thus, many techniques of analysis measure in some way the deviation of the symbol spatial distribution with respect to the random expectation. The problem is then to know the spatial distribution corresponding to randomness, which is typically considered to be either the geometric or the exponential distribution. However, these distributions are only valid for very large symbolic sequences and for many occurrences of the analyzed symbol. Here, we obtain analytically the exact, randomly expected spatial distribution valid for any sequence length and any symbol frequency, and we study its main properties. The knowledge of the distribution allows us to define a measure able to properly quantify the deviation from randomness of the symbol distribution, especially for short sequences and low symbol frequency. We apply the measure to the problem of keyword detection in written texts and to study amino acid clustering in protein sequences. In texts, we show how the results improve with respect to previous methods when short texts are analyzed. In proteins, which are typically short, we show how the measure quantifies unambiguously the amino acid clustering and characterize its spatial distribution.
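    The quantity under study, the distance between consecutive occurrences of a symbol, and the geometric distribution that the abstract names as the usual large-sequence reference can be sketched as follows. The exact finite-length distribution derived in the paper is not given in the abstract and is not reproduced here; the function names and the toy sequence are hypothetical.

```python
def intersymbol_distances(sequence, symbol):
    """Distances between consecutive occurrences of `symbol` in `sequence`."""
    positions = [i for i, s in enumerate(sequence) if s == symbol]
    return [b - a for a, b in zip(positions, positions[1:])]

def geometric_pmf(distance, p):
    """P(distance = d) when each site carries the symbol independently
    with probability p (the large-sequence approximation only)."""
    return p * (1 - p) ** (distance - 1)

# Made-up toy sequence for illustration only.
toy = "ACCAAGACA"
distances = intersymbol_distances(toy, "A")
p_hat = toy.count("A") / len(toy)  # empirical symbol frequency
print(distances)
print([round(geometric_pmf(d, p_hat), 3) for d in sorted(set(distances))])
```

    Comparing the observed distances with this reference is the kind of deviation-from-randomness test the abstract describes; for short sequences or rare symbols the geometric form is the approximation the authors say breaks down.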

  5. Probability distribution of intersymbol distances in random symbolic sequences: Applications to improving detection of keywords in texts and of amino acid clustering in proteins.

    PubMed

    Carpena, Pedro; Bernaola-Galván, Pedro A; Carretero-Campos, Concepción; Coronado, Ana V

    2016-11-01

    Symbolic sequences have been extensively investigated in the past few years within the framework of statistical physics. Paradigmatic examples of such sequences are written texts, and deoxyribonucleic acid (DNA) and protein sequences. In these examples, the spatial distribution of a given symbol (a word, a DNA motif, an amino acid) is a key property usually related to the symbol importance in the sequence: The more uneven and far from random the symbol distribution, the higher the relevance of the symbol to the sequence. Thus, many techniques of analysis measure in some way the deviation of the symbol spatial distribution with respect to the random expectation. The problem is then to know the spatial distribution corresponding to randomness, which is typically considered to be either the geometric or the exponential distribution. However, these distributions are only valid for very large symbolic sequences and for many occurrences of the analyzed symbol. Here, we obtain analytically the exact, randomly expected spatial distribution valid for any sequence length and any symbol frequency, and we study its main properties. The knowledge of the distribution allows us to define a measure able to properly quantify the deviation from randomness of the symbol distribution, especially for short sequences and low symbol frequency. We apply the measure to the problem of keyword detection in written texts and to study amino acid clustering in protein sequences. In texts, we show how the results improve with respect to previous methods when short texts are analyzed. In proteins, which are typically short, we show how the measure quantifies unambiguously the amino acid clustering and characterize its spatial distribution.

  6. Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas.

    PubMed

    Petrov, Anton I; Zirbel, Craig L; Leontis, Neocles B

    2013-10-01

    The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson-Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules, but can also occur at nonhomologous sites in diverse RNAs, and which often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access.

  7. Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas

    PubMed Central

    Petrov, Anton I.; Zirbel, Craig L.; Leontis, Neocles B.

    2013-01-01

    The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson–Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules, but can also occur at nonhomologous sites in diverse RNAs, and which often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access. PMID:23970545

  8. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data...

  9. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data...

  10. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data...

  11. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data...

  12. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data...

  13. Characterization of Putative Cholesterol Recognition/Interaction Amino Acid Consensus-Like Motif of Campylobacter jejuni Cytolethal Distending Toxin C

    PubMed Central

    Lai, Chih-Ho; Lai, Cheng-Kuo; Lin, Ying-Ju; Hung, Chiu-Lien; Chu, Chia-Han; Feng, Chun-Lung; Chang, Chia-Shuo; Su, Hong-Lin

    2013-01-01

    Cytolethal distending toxin (CDT) produced by Campylobacter jejuni comprises a heterotrimeric complex formed by CdtA, CdtB, and CdtC. Among these toxin subunits, CdtA and CdtC function as essential proteins that mediate toxin binding to cytoplasmic membranes followed by delivery of CdtB into the nucleus. The binding of CdtA/CdtC to the cell surface is mediated by cholesterol, a major component in lipid rafts. Although the putative cholesterol recognition/interaction amino acid consensus (CRAC) domain of CDT has been reported from several bacterial pathogens, the protein regions contributing to CDT binding to cholesterol in C. jejuni remain unclear. Here, we selected a potential CRAC-like region present in the CdtC from C. jejuni for analysis. Molecular modeling showed that the predicted functional domain had the shape of a hydrophobic groove, facilitating cholesterol localization to this domain. Mutation of a tyrosine residue in the CRAC-like region decreased direct binding of CdtC to cholesterol rather than toxin intermolecular interactions and led to impaired CDT intoxication. These results provide a molecular link between C. jejuni CdtC and membrane-lipid rafts through the CRAC-like region, which contributes to toxin recognition and interaction with cholesterol. PMID:23762481

  14. The Drosophila don juan (dj) gene encodes a novel sperm specific protein component characterized by an unusual domain of a repetitive amino acid motif.

    PubMed

    Santel, A; Winhauer, T; Blümer, N; Renkawitz-Pohl, R

    1997-06-01

    We identified and characterized the don juan gene (dj) of Drosophila melanogaster. The don juan gene codes for a sperm specific protein component with an unusual repetitive six amino acid motif (DPCKKK) in the carboxy-terminal part of the protein. The expression of Don Juan is limited to male germ cells where transcription of the dj gene is initiated during meiotic prophase. But Western blot experiments indicate that DJ protein occurs just postmeiotically. Examination of transgenic flies bearing a dj-promoter-lacZ reporter construct revealed lacZ mRNA distribution resembling the expression pattern of the endogenous dj mRNA in the adult testes, whereas beta-galactosidase expression is exclusively present in postmeiotic germ cells. Thus, these observations strongly suggest that dj transcripts are under translational repression until spermiogenesis. To study the function and subcellular distribution of DJ in spermiogenesis we expressed a chimaeric dj-GFP fusion gene in the male germline exhibiting strong GFP fluorescence in the live testes, where only elongated spermatids are decorated. With regard to the characteristic expression pattern of DJ protein and its conspicuous repeat units, possible functional roles are discussed.

  15. Human retroviruses and AIDS 1996. A compilation and analysis of nucleic acid and amino acid sequences

    SciTech Connect

    Myers, G.; Foley, B.; Korber, B.; Mellors, J.W.; Jeang, K.T.; Wain-Hobson, S.

    1997-04-01

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (1) Nucleic Acid Alignments and Sequences; (2) Amino Acid Alignments; (3) Analysis; (4) Related Sequences; and (5) Database Communications. Information within all the parts is updated throughout the year on the Web site, http://hiv-web.lanl.gov. While this publication could take the form of a review or sequence monograph, it is not so conceived. Instead, the literature from which the database is derived has simply been summarized and some elementary computational analyses have been performed upon the data. Interpretation and commentary have been avoided insofar as possible so that the reader can form his or her own judgments concerning the complex information. In addition to the general descriptions of the parts of the compendium, the user should read the individual introductions for each part.

  16. Two overlapping sequence motifs within the polyomavirus enhancer are independently the targets of stimulation by both the tumor promoter 12-O-tetradecanoylphorbol-13-acetate and the Ha-ras oncogene

    SciTech Connect

    Yamaguchi, Yuko; Satake, Masanobu; Ito, Yoshiaki

    1989-03-01

    A tumor-promoting phorbol ester, 12-O-tetradecanoylphorbol-13-acetate (TPA), strongly stimulates the activity of polyomavirus enhancer in a human erythroleukemia cell line, K562. The target of stimulation was the previously defined A element (from nucleotides 5107 to 5130) of the enhancer. The authors found that within the A element, two partly overlapping sequence motifs (one from nucleotides 5107 to 5117, the other from nucleotides 5113 to 5121) were independently the targets of TPA stimulation. The former is homologous to the enhancer core sequence of the adenovirus type 5 E1A gene, and the latter shares the consensus AP-1-binding site. In addition, transiently expressed Ha-ras oncogene also stimulated these two subelements in K562 cells, as they reported for NIH 3T3 cells previously.

  17. Sampling Motif-Constrained Ensembles of Networks

    NASA Astrophysics Data System (ADS)

    Fischer, Rico; Leitão, Jorge C.; Peixoto, Tiago P.; Altmann, Eduardo G.

    2015-10-01

    The statistical significance of network properties is conditioned on null models which satisfy specified properties but that are otherwise random. Exponential random graph models are a principled theoretical framework to generate such constrained ensembles, but which often fail in practice, either due to model inconsistency or due to the impossibility to sample networks from them. These problems affect the important case of networks with prescribed clustering coefficient or number of small connected subgraphs (motifs). In this Letter we use the Wang-Landau method to obtain a multicanonical sampling that overcomes both these problems. We sample, in polynomial time, networks with arbitrary degree sequences from ensembles with imposed motifs counts. Applying this method to social networks, we investigate the relation between transitivity and homophily, and we quantify the correlation between different types of motifs, finding that single motifs can explain up to 60% of the variation of motif profiles.

  18. Transcriptome Sequencing in Response to Salicylic Acid in Salvia miltiorrhiza

    PubMed Central

    Zhang, Xiaoru; Dong, Juane; Liu, Hailong; Wang, Jiao; Qi, Yuexin; Liang, Zongsuo

    2016-01-01

    Salvia miltiorrhiza is a traditional Chinese herbal medicine, whose quality and yield are often affected by diseases and environmental stresses during its growing season. Salicylic acid (SA) plays a significant role in plants responding to biotic and abiotic stresses, but the involved regulatory factors and their signaling mechanisms are largely unknown. In order to identify the genes involved in SA signaling, the RNA sequencing (RNA-seq) strategy was employed to evaluate the transcriptional profiles in S. miltiorrhiza cell cultures. A total of 50,778 unigenes were assembled, in which 5,316 unigenes were differentially expressed among 0-, 2-, and 8-h SA induction. The up-regulated genes were mainly involved in stimulus response and multi-organism process. A core set of candidate novel genes coding SA signaling component proteins was identified. Many transcription factors (e.g., WRKY, bHLH and GRAS) and genes involved in hormone signal transduction were differentially expressed in response to SA induction. Detailed analysis revealed that genes associated with defense signaling, such as antioxidant system genes, cytochrome P450s and ATP-binding cassette transporters, were significantly overexpressed, which can be used as genetic tools to investigate disease resistance. Our transcriptome analysis will help understand SA signaling and its mechanism of defense systems in S. miltiorrhiza. PMID:26808150

  19. Inframolecular acid base studies of the tris and tetrakis myo-inositol phosphates including the 1,2,3-trisphosphate motif

    NASA Astrophysics Data System (ADS)

    Dozol, Hélène; Blum-Held, Corinne; Guédat, Philippe; Maechling, Clarisse; Lanners, Steve; Schlewer, Gilbert; Spiess, Bernard

    2002-12-01

    The intrinsic acid-base properties of the phosphate groups of three myo-inositol derivatives which display the 1,2,3-trisphosphate motif, i.e. (±)-myo-inositol 1,2,3-trisphosphate (Ins(1,2,3)P3), (±)-myo-inositol 1,2,3,6-tetrakisphosphate (Ins(1,2,3,6)P4), and (±)-myo-inositol 1,2,3,5-tetrakisphosphate (Ins(1,2,3,5)P4) are reported. The studies were performed in 0.2 M KCl solution at 37 °C, near physiological ionic strength and temperature. In addition, in order to shed light on the transition metal complexation properties of Ins(1,2,3)P3, the influence of the Zn2+ cations on its 31P NMR titration curves was investigated. From the titration curves as well as from the determined protonation microconstants, it appears that for Ins(1,2,3)P3, the two lateral P1 and P3 phosphates strongly contribute to stabilise a proton on the central P2 phosphate. However, in the fully deprotonated form of Ins(1,2,3)P3, P1 and P3 repulse each other so that they establish hydrogen bonds with, respectively, their neighbouring OH6 and OH4 hydroxyls. The 1,2,3-trisphosphate motif of Ins(1,2,3,5)P4 behaves very similarly to that of Ins(1,2,3)P3, indicating a poor interaction with the distant P5 phosphate. By contrast, moving a phosphate group from position 5 to position 6 on the myo-inositol ring, as in Ins(1,2,3,6)P4, leads to major changes in the basicity and cooperativity of the phosphate groups. Finally, the presence of Zn2+ cations has a marked influence on the 31P NMR titration curves of Ins(1,2,3)P3, leading to the conclusion that two equatorial phosphates, assisted by a middle axial one, afford an optimal chelating moiety that is able to occupy all sites of the metal coordination polyhedron which could be the reason for its antioxidant properties.

  20. Effector prediction in host-pathogen interaction based on a Markov model of a ubiquitous EPIYA motif

    PubMed Central

    2010-01-01

    Background Effector secretion is a common strategy of pathogen in mediating host-pathogen interaction. Eight EPIYA-motif containing effectors have recently been discovered in six pathogens. Once these effectors enter host cells through type III/IV secretion systems (T3SS/T4SS), tyrosine in the EPIYA motif is phosphorylated, which triggers effectors binding other proteins to manipulate host-cell functions. The objectives of this study are to evaluate the distribution pattern of EPIYA motif in broad biological species, to predict potential effectors with EPIYA motif, and to suggest roles and biological functions of potential effectors in host-pathogen interactions. Results A hidden Markov model (HMM) of five amino acids was built for the EPIYA-motif based on the eight known effectors. Using this HMM to search the non-redundant protein database containing 9,216,047 sequences, we obtained 107,231 sequences with at least one EPIYA motif occurrence and 3115 sequences with multiple repeats of the EPIYA motif. Although the EPIYA motif exists among broad species, it is significantly over-represented in some particular groups of species. For those proteins containing at least four copies of EPIYA motif, most of them are from intracellular bacteria, extracellular bacteria with T3SS or T4SS or intracellular protozoan parasites. By combining the EPIYA motif and the adjacent SH2 binding motifs (KK, R4, Tarp and Tir), we built HMMs of nine amino acids and predicted many potential effectors in bacteria and protista by the HMMs. Some potential effectors for pathogens (such as Lawsonia intracellularis, Plasmodium falciparum and Leishmania major) are suggested. Conclusions Our study indicates that the EPIYA motif may be a ubiquitous functional site for effectors that play an important pathogenicity role in mediating host-pathogen interactions. 
We suggest that some intracellular protozoan parasites could secrete EPIYA-motif containing effectors through secretion systems similar to the

  1. QM Computations on Complete Nucleic Acids Building Blocks: Analysis of the Sarcin-Ricin RNA Motif Using DFT-D3, HF-3c, PM6-D3H, and MM Approaches.

    PubMed

    Kruse, Holger; Havrila, Marek; Šponer, Jiří

    2014-06-10

    A set of conformations obtained from explicit solvent molecular dynamics (MD) simulations of the Sarcin-Ricin internal loop (SRL) RNA motif is investigated using quantum mechanical (QM, TPSS-D3/def2-TZVP DFT-D3) and molecular mechanics (MM, AMBER parm99bsc0+χol3 force field) methods. Solvent effects are approximated using implicit solvent methods (COSMO for DFT-D3; GB and PB for MM). Large-scale DFT-D3 optimizations of the full 11-nucleotide motif are compared to MM results and reveal a higher flexibility of DFT-D3 over the MM in the optimization procedure. Conformational energies of the SRL motif expose significant differences in the DFT-D3 and MM energy descriptions that explain difficulties in MD simulations of the SRL motif. The TPSS-D3 data are in excellent agreement with results obtained by the hybrid functionals PW6B95-D3 and M06-2X. Computationally more efficient methods such as PM6-D3H and HF-3c show promising but partly inconsistent results. It is demonstrated that large-scale DFT-D3 computations on complete nucleic acids building blocks are a viable tool to complement the picture obtained from MD simulations and can be used as benchmarks for faster computational methods. Methodological challenges of large-scale QM computations on nucleic acids such as missing solvent-solute interactions and the truncation of the studied systems are discussed.

  2. Human retroviruses and AIDS, 1992. A compilation and analysis of nucleic acid and amino acid sequences

    SciTech Connect

    Myers, G.; Korber, B.; Berzofsky, J.A.; Pavlakis, G.N.; Smith, R.F.

    1992-10-01

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (I) HIV and SIV Nucleotide Sequences; (II) Amino Acid Sequences; (III) Analyses; (IV) Related Sequences; and (V) Database Communications. Information within all the parts is updated at least twice in each year, which accounts for the modes of binding and pagination in the compendium. While this publication could take the form of a review or sequence monograph, it is not so conceived. Instead, the literature from which the database is derived has simply been summarized and some elementary computational analyses have been performed upon the data. Interpretation and commentary have been avoided insofar as possible so that the reader can form his or her own judgments concerning the complex information. In addition to the general descriptions below of the parts of the compendium, the user should read the individual introductions for each part.

  3. Structural Relationships in the Lysozyme Superfamily: Significant Evidence for Glycoside Hydrolase Signature Motifs

    PubMed Central

    Wohlkönig, Alexandre; Huet, Joëlle; Looze, Yvan; Wintjens, René

    2010-01-01

    Background Chitin is a polysaccharide that forms the hard, outer shell of arthropods and the cell walls of fungi and some algae. Peptidoglycan is a polymer of sugars and amino acids constituting the cell walls of most bacteria. Enzymes that are able to hydrolyze these cell membrane polymers generally play important roles for protecting plants and animals against infection with insects and pathogens. A particular group of such glycoside hydrolase enzymes share some common features in their three-dimensional structure and in their molecular mechanism, forming the lysozyme superfamily. Results Besides having a similar fold, all known catalytic domains of glycoside hydrolase proteins of lysozyme superfamily (families and subfamilies GH19, GH22, GH23, GH24 and GH46) share in common two structural elements: the central helix of the all-α domain, which invariably contains the catalytic glutamate residue acting as general-acid catalyst, and a β-hairpin pointed towards the substrate binding cleft. The invariant β-hairpin structure is interestingly found to display the highest amino acid conservation in aligned sequences of a given family, thereby allowing to define signature motifs for each GH family. Most of such signature motifs are found to have promising performances for searching sequence databases. Our structural analysis further indicates that the GH motifs participate in enzymatic catalysis essentially by containing the catalytic water positioning residue of inverting mechanism. Conclusions The seven families and subfamilies of the lysozyme superfamily all have in common a β-hairpin structure which displays a family-specific sequence motif. These GH β-hairpin motifs contain potentially important residues for the catalytic activity, thereby suggesting the participation of the GH motif to catalysis and also revealing a common catalytic scheme utilized by enzymes of the lysozyme superfamily. PMID:21085702

  4. Fitting a mixture model by expectation maximization to discover motifs in biopolymers

    SciTech Connect

    Bailey, T.L.; Elkan, C.

    1994-12-31

    The algorithm described in this paper discovers one or more motifs in a collection of DNA or protein sequences by using the technique of expectation maximization to fit a two-component finite mixture model to the set of sequences. Multiple motifs are found by fitting a mixture model to the data, probabilistically erasing the occurrences of the motif thus found, and repeating the process to find successive motifs. The algorithm requires only a set of unaligned sequences and a number specifying the width of the motifs as input. It returns a model of each motif and a threshold which together can be used as a Bayes-optimal classifier for searching for occurrences of the motif in other databases. The algorithm estimates how many times each motif occurs in each sequence in the dataset and outputs an alignment of the occurrences of the motif. The algorithm is capable of discovering several different motifs with differing numbers of occurrences in a single dataset.
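
    The core idea in the abstract above, fitting a two-component mixture (motif versus background) by expectation maximization, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: it treats every window of the given width as an independent sample, uses a zero-order background estimated from the input, needs a starting guess `init`, and omits the probabilistic erasing and the classifier threshold. All names and default values are hypothetical.

```python
ALPHABET = "ACGT"


def fit_motif_em(seqs, width, init, iters=50, pseudo=0.01):
    """Fit a motif/background mixture over all windows of length `width`.

    Returns (pwm, lam): a list of per-position letter probabilities and the
    estimated fraction of windows drawn from the motif component.
    """
    windows = [s[i:i + width] for s in seqs for i in range(len(s) - width + 1)]

    # Background component: letter frequencies of the input (add-one smoothed).
    counts = {a: 1.0 for a in ALPHABET}
    for s in seqs:
        for ch in s:
            counts[ch] += 1.0
    total = sum(counts.values())
    background = {a: counts[a] / total for a in ALPHABET}

    # Motif component: start from a soft version of the initial guess.
    pwm = [{a: (0.7 if a == init[j] else 0.1) for a in ALPHABET}
           for j in range(width)]
    lam = 0.05  # assumed starting mixing proportion

    for _ in range(iters):
        # E-step: posterior probability that each window is a motif occurrence.
        resp = []
        for w in windows:
            p_motif, p_bg = lam, 1.0 - lam
            for j, ch in enumerate(w):
                p_motif *= pwm[j][ch]
                p_bg *= background[ch]
            resp.append(p_motif / (p_motif + p_bg))

        # M-step: re-estimate the mixing proportion and the motif columns.
        lam = sum(resp) / len(resp)
        cols = [{a: pseudo for a in ALPHABET} for _ in range(width)]
        for w, r in zip(windows, resp):
            for j, ch in enumerate(w):
                cols[j][ch] += r
        pwm = [{a: c[a] / sum(c.values()) for a in ALPHABET} for c in cols]

    return pwm, lam


def consensus(pwm):
    """Most probable letter at each motif position."""
    return "".join(max(col, key=col.get) for col in pwm)
```

    Because EM only finds a local optimum, the result depends on `init`; a common workaround is to try several starting windows and keep the best-scoring fit.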

  5. The position of the Gly-xxx-Gly motif in transmembrane segments modulates dimer affinity.

    PubMed

    Johnson, Rachel M; Rath, Arianna; Deber, Charles M

    2006-12-01

    Although the intrinsic low solubility of membrane proteins presents challenges to their high-resolution structure determination, insight into the amino acid sequence features and forces that stabilize their folds has been provided through study of sequence-dependent helix-helix interactions between single transmembrane (TM) helices. While the stability of helix-helix partnerships mediated by the Gly-xxx-Gly (GG4) motif is known to be generally modulated by distal interfacial residues, it has not been established whether the position of this motif, with respect to the ends of a given TM segment, affects dimer affinity. Here we examine the relationship between motif position and affinity in the homodimers of 2 single-spanning membrane protein TM sequences: glycophorin A (GpA) and bacteriophage M13 coat protein (MCP). Using the TOXCAT assay for dimer affinity on a series of GpA and MCP TM segments that have been modified with either 4 Leu residues at each end or with 8 Leu residues at the N-terminal end, we show that in each protein, centrally located GG4 motifs are capable of stronger helix-helix interactions than those proximal to TM helix ends, even when surrounding interfacial residues are maintained. The relative importance of GG4 motifs in stabilizing helix-helix interactions therefore must be considered not only in its specific residue context but also in terms of the location of the interactive surface relative to the N and C termini of alpha-helical TM segments.
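
    To make the notion of motif position concrete, the Python sketch below locates Gly-xxx-Gly (GG4) motifs in a transmembrane segment and scores how far each lies from the centre of the segment. The score is an illustrative measure introduced here, not a quantity from the study, and the example segments in the usage note are invented toy sequences rather than GpA or MCP.

```python
import re

# G, any three residues, G. The lookahead lets overlapping motifs all be found.
GG4 = re.compile(r"(?=G...G)")
MOTIF_LEN = 5


def gg4_positions(tm_segment):
    """0-based start index of every Gly-xxx-Gly motif in the segment."""
    return [m.start() for m in GG4.finditer(tm_segment)]


def end_proximity(tm_segment, start):
    """0.0 if the motif is centred in the segment, 1.0 if flush with an end.

    Hypothetical measure: distance between motif centre and segment centre,
    divided by the largest distance the segment length allows.
    """
    motif_centre = start + (MOTIF_LEN - 1) / 2
    segment_centre = (len(tm_segment) - 1) / 2
    max_offset = segment_centre - (MOTIF_LEN - 1) / 2
    if max_offset <= 0:
        return 0.0
    return abs(motif_centre - segment_centre) / max_offset
```

    For the toy segment "LLLLGAAAGLLLL" the single motif starts at index 4 and scores 0.0 (central), whereas in "GAAAGLLLLLLLL" it starts at index 0 and scores 1.0 (at the N-terminal end).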

  6. A survey of motif finding Web tools for detecting binding site motifs in ChIP-Seq data

    PubMed Central

    2014-01-01

    Abstract ChIP-Seq (chromatin immunoprecipitation sequencing) has provided the advantage for finding motifs as ChIP-Seq experiments narrow down the motif finding to binding site locations. Recent motif finding tools facilitate the motif detection by providing user-friendly Web interface. In this work, we reviewed nine motif finding Web tools that are capable for detecting binding site motifs in ChIP-Seq data. We showed each motif finding Web tool has its own advantages for detecting motifs that other tools may not discover. We recommended the users to use multiple motif finding Web tools that implement different algorithms for obtaining significant motifs, overlapping resemble motifs, and non-overlapping motifs. Finally, we provided our suggestions for future development of motif finding Web tool that better assists researchers for finding motifs in ChIP-Seq data. Reviewers This article was reviewed by Prof. Sandor Pongor, Dr. Yuriy Gusev, and Dr. Shyam Prabhakar (nominated by Prof. Limsoon Wong). PMID:24555784

  7. Completion of the amino acid sequence of the alpha 1 chain from type I calf skin collagen. Amino acid sequence of alpha 1(I)B8.

    PubMed Central

    Glanville, R W; Breitkreutz, D; Meitinger, M; Fietzek, P P

    1983-01-01

    The complete amino acid sequence of the 279-residue CNBr peptide CB8 from the alpha 1 chain of type I calf skin collagen is presented. It was determined by sequencing overlapping fragments of CB8 produced by Staphylococcus aureus V8 proteinase, trypsin, Endoproteinase Arg-C and hydroxylamine. Tryptic cleavages were also made specific for lysine by blocking arginine residues with cyclohexane-1,2-dione. This completes the amino acid sequence analysis of the 1054-residues-long alpha 1(I) chain of calf skin collagen. PMID:6354180

  8. VARUN: discovering extensible motifs under saturation constraints.

    PubMed

    Apostolico, Alberto; Comin, Matteo; Parida, Laxmi

    2010-01-01

    The discovery of motifs in biosequences is frequently torn between the rigidity of the model on one hand and the abundance of candidates on the other hand. In particular, motifs that include wild cards or "don't cares" escalate exponentially with their number, and this gets only worse if a don't care is allowed to stretch up to some prescribed maximum length. In this paper, a notion of extensible motif in a sequence is introduced and studied, which tightly combines the structure of the motif pattern, as described by its syntactic specification, with the statistical measure of its occurrence count. It is shown that a combination of appropriate saturation conditions and the monotonicity of probabilistic scores over regions of constant frequency afford us significant parsimony in the generation and testing of candidate overrepresented motifs. A suite of software programs called Varun is described, implementing the discovery of extensible motifs of the type considered. The merits of the method are then documented by results obtained in a variety of experiments primarily targeting protein sequence families. Of equal importance seems the fact that the sets of all surprising motifs returned in each experiment are extracted faster and come in much more manageable sizes than would be obtained in the absence of saturation constraints.

  9. Genome-wide analysis of ethylene-responsive element binding factor-associated amphiphilic repression motif-containing transcriptional regulators in Arabidopsis.

    PubMed

    Kagale, Sateesh; Links, Matthew G; Rozwadowski, Kevin

    2010-03-01

    The ethylene-responsive element binding factor-associated amphiphilic repression (EAR) motif is a transcriptional regulatory motif identified in members of the ethylene-responsive element binding factor, C2H2, and auxin/indole-3-acetic acid families of transcriptional regulators. Sequence comparison of the core EAR motif sites from these proteins revealed two distinct conservation patterns: LxLxL and DLNxxP. Proteins containing these motifs play key roles in diverse biological functions by negatively regulating genes involved in developmental, hormonal, and stress signaling pathways. Through a genome-wide bioinformatics analysis, we have identified the complete repertoire of the EAR repressome in Arabidopsis (Arabidopsis thaliana) comprising 219 proteins belonging to 21 different transcriptional regulator families. Approximately 72% of these proteins contain a LxLxL type of EAR motif, 22% contain a DLNxxP type of EAR motif, and the remaining 6% have a motif where LxLxL and DLNxxP are overlapping. Published in vitro and in planta investigations support approximately 40% of these proteins functioning as negative regulators of gene expression. Comparative sequence analysis of EAR motif sites and adjoining regions has identified additional preferred residues and potential posttranslational modification sites that may influence the functionality of the EAR motif. Homology searches against protein databases of poplar (Populus trichocarpa), grapevine (Vitis vinifera), rice (Oryza sativa), and sorghum (Sorghum bicolor) revealed that the EAR motif is conserved across these diverse plant species. This genome-wide analysis represents the most extensive survey of EAR motif-containing proteins in Arabidopsis to date and provides a resource enabling investigations into their biological roles and the mechanism of EAR motif-mediated transcriptional regulation.
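
    The two conservation patterns named above, LxLxL and DLNxxP, map directly onto regular expressions. The Python sketch below assigns a protein sequence to one of the three classes in the abstract. It is a minimal illustration, not the authors' pipeline: it matches the bare patterns anywhere in the sequence, and the function names as well as the fourth label for sequences carrying both patterns at separate sites are assumptions made here.

```python
import re

# Lookaheads so that overlapping occurrences of the same pattern are all found.
LXLXL = re.compile(r"(?=(L.L.L))")
DLNXXP = re.compile(r"(?=(DLN..P))")


def motif_spans(pattern, seq):
    """(start, end) index pairs, end exclusive, for every match of `pattern`."""
    return [(m.start(), m.start() + len(m.group(1))) for m in pattern.finditer(seq)]


def classify_ear(seq):
    """Return 'LxLxL', 'DLNxxP', 'overlapping', 'both (separate sites)' or None."""
    lx = motif_spans(LXLXL, seq)
    dl = motif_spans(DLNXXP, seq)
    if any(s1 < e2 and s2 < e1 for s1, e1 in lx for s2, e2 in dl):
        return "overlapping"
    if lx and dl:
        return "both (separate sites)"  # category assumed for this sketch
    if lx:
        return "LxLxL"
    if dl:
        return "DLNxxP"
    return None
```

    On invented toy sequences, "MALDLSLAA" is classified as LxLxL, "MADLNAAPK" as DLNxxP, and "MLDLNLAPK" as overlapping, since its LDLNL and DLNLAP matches share residues.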

  10. Characteristic motifs for families of allergenic proteins

    PubMed Central

    Ivanciuc, Ovidiu; Garcia, Tzintzuni; Torres, Miguel; Schein, Catherine H.; Braun, Werner

    2008-01-01

    The identification of potential allergenic proteins is usually done by scanning a database of allergenic proteins and locating known allergens with a high sequence similarity. However, there is no universally accepted cut-off value for sequence similarity to indicate potential IgE cross-reactivity. Further, overall sequence similarity may be less important than discrete areas of similarity in proteins with homologous structure. To identify such areas, we first classified all allergens and their subdomains in the Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP/) to their closest protein families as defined in Pfam, and identified conserved physicochemical property motifs characteristic of each group of sequences. Allergens populate only a small subset of all known Pfam families, as all allergenic proteins in SDAP could be grouped to only 130 (of 9318 total) Pfams, and 31 families contain more than four allergens. Conserved physicochemical property motifs for the aligned sequences of the most populated Pfam families were identified with the PCPMer program suite and catalogued in the webserver Motif-Mate (http://born.utmb.edu/motifmate/summary.php). We also determined specific motifs for allergenic members of a family that could distinguish them from non-allergenic ones. These allergen specific motifs should be most useful in database searches for potential allergens. We found that sequence motifs unique to the allergens in three families (seed storage proteins, Bet v 1, and tropomyosin) overlap with known IgE epitopes, thus providing evidence that our motif based approach can be used to assess the potential allergenicity of novel proteins. PMID:18951633

  11. Sharing of four DR-beta sequence motifs between HLA-DRB1*1601 and DRB1*1101 correlates with frequent degenerate T-cell recognition of HA306-320 peptide complexed to these two molecules.

    PubMed

    Zeliszewski, D; Dorval, I; Golvano, J J; Prevost, A; Borras-Cuesta, F; Sterkers, G

    1996-02-01

    This paper shows that the seven HA306-320 specific T-cell clones isolated from one individual recognize the peptide complexed to both autologous HLA-DRB1*1101 and allogeneic HLA-DRB1*1601 (or DRB5*0201) molecules. For each T-cell clone, a single T-cell receptor (TCR) is involved in the recognition of these two different peptide-DR complexes as evidenced by cold target competition experiments. Yet, the seven T-cell clones express several different TCRs as judged by V beta-J beta usage and fine specificities. Furthermore, one representative clone has the same fine specificity for HA306-320 analogues mutated at epitopic residues irrespective of the use of DR1101 or DR1601 APC. These results suggest that structural differences between DRB1*1101 and DRB1*1601 (or DRB5*0201) do not dramatically influence the orientation of HA306-320 in the grooves such that most residues interacting with TCRs are conserved. In another individual, the same pattern of restriction, i.e. DR1101 + DR1601, was found for several HA306-320 specific clones. Two additional patterns, DR1101 + DR0801 and DR1101 + DR0801 + DR1601, were identified. By comparing DR sequences the authors found that DRB1*1101 and DRB1*1601 share four important motifs, i.e. beta 85-86, beta 67-71, beta 57 and beta 28-31 supposed to line three distinct HLA-DR pockets. Three of these motifs are also shared with DRB1*0801. All the results further support that the motif similarities allow the peptide to adopt very similar orientations in the cross-reacting DR molecules.

  12. An Integrated Sequence-Structure Database incorporating matching mRNA sequence, amino acid sequence and protein three-dimensional structure data.

    PubMed Central

    Adzhubei, I A; Adzhubei, A A; Neidle, S

    1998-01-01

    We have constructed a non-homologous database, termed the Integrated Sequence-Structure Database (ISSD) which comprises the coding sequences of genes, amino acid sequences of the corresponding proteins, their secondary structure and straight phi,psi angles assignments, and polypeptide backbone coordinates. Each protein entry in the database holds the alignment of nucleotide sequence, amino acid sequence and the PDB three-dimensional structure data. The nucleotide and amino acid sequences for each entry are selected on the basis of exact matches of the source organism and cell environment. The current version 1.0 of ISSD is available on the WWW at http://www.protein.bio.msu.su/issd/ and includes 107 non-homologous mammalian proteins, of which 80 are human proteins. The database has been used by us for the analysis of synonymous codon usage patterns in mRNA sequences showing their correlation with the three-dimensional structure features in the encoded proteins. Possible ISSD applications include optimisation of protein expression, improvement of the protein structure prediction accuracy, and analysis of evolutionary aspects of the nucleotide sequence-protein structure relationship. PMID:9399866

  13. Complete amino acid sequence and structure characterization of the taste-modifying protein, miraculin.

    PubMed

    Theerasilp, S; Hitotsuya, H; Nakajo, S; Nakaya, K; Nakamura, Y; Kurihara, Y

    1989-04-25

    The taste-modifying protein, miraculin, has the unusual property of modifying sour taste into sweet taste. The complete amino acid sequence of miraculin purified from miracle fruits by a newly developed method (Theerasilp, S., and Kurihara, Y. (1988) J. Biol. Chem. 263, 11536-11539) was determined by an automatic Edman degradation method. Miraculin was a single polypeptide with 191 amino acid residues. The calculated molecular weight based on the amino acid sequence and the carbohydrate content (13.9%) was 24,600. Asn-42 and Asn-186 were linked N-glycosidically to carbohydrate chains. High homology was found between the amino acid sequences of miraculin and soybean trypsin inhibitor.

  14. Detection and isolation of nucleic acid sequences using a bifunctional hybridization probe

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2000-01-01

    A method for detecting and isolating a target sequence in a sample of nucleic acids is provided using a bifunctional hybridization probe capable of hybridizing to the target sequence that includes a detectable marker and a first complexing agent capable of forming a binding pair with a second complexing agent. A kit is also provided for detecting a target sequence in a sample of nucleic acids using a bifunctional hybridization probe according to this method.

  15. Allele drop-out in the MECP2 gene due to G-quadruplex and i-motif sequences when using polymerase chain reaction-based diagnosis for Rett syndrome.

    PubMed

    Saunders, Carol J; Friez, Michael J; Patterson, Melanie; Nzabi, Masha; Zhao, Weiwei; Bi, Chengpeng

    2010-04-01

    Although few examples are formally documented, all polymerase chain reaction-based testing is theoretically vulnerable to allele drop-out (ADO), the failure to amplify one of the two alleles present in a cell. In a clinical setting, this can lead to a false-positive or false-negative diagnosis. We investigated the mechanisms leading to ADO in the MECP2 gene in two unrelated female patients undergoing testing for Rett syndrome. Both patients had two benign DNA variations, c.819G > T and c.1161C > T, that appeared homozygous due to ADO. Bioinformatics analyses indicate that this region of the MECP2 gene is rich in complex tertiary structures called G-quadruplex and i-motifs, the disruption of which by the c.819G > T and c.1161C > T variants leads to preferential amplification of the variant allele. Other examples of ADO likely occur, and consideration of disrupting G-quadruplex and i-motif structures should be given when this phenomenon is unexpected. We identify factors in both the polymerase chain reaction amplification and the sequencing steps that help overcome ADO.
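
    The abstract does not say which tool was used to find these structures. As a hedged sketch only, the widely used sequence heuristic for putative G-quadruplexes (four runs of three or more G separated by loops of 1-7 nucleotides) can be written as a regular expression, with the C-rich equivalent standing in for putative i-motifs. The function names and the 1-7 loop range are assumptions of this example, not details from the study.

```python
import re

# Assumed heuristic: four tracts of >=3 G (or C) separated by 1-7 nt loops.
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")
IMOTIF_PATTERN = re.compile(
    r"C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}")


def find_g4(seq):
    """Return (start, end) of non-overlapping putative G-quadruplex sites."""
    return [(m.start(), m.end()) for m in G4_PATTERN.finditer(seq.upper())]


def find_imotif(seq):
    """Return (start, end) of non-overlapping putative i-motif sites."""
    return [(m.start(), m.end()) for m in IMOTIF_PATTERN.finditer(seq.upper())]
```

    Comparing the hits for the reference and variant alleles of an amplicon is one simple way to ask whether a variant interrupts a predicted G or C tract.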

  16. Bayesian models and Markov chain Monte Carlo methods for protein motifs with the secondary characteristics.

    PubMed

    Xie, Jun; Kim, Nak-Kyeong

    2005-09-01

    Statistical methods have been developed for finding local patterns, also called motifs, in multiple protein sequences. The aligned segments may imply functional or structural core regions. However, the existing methods often have difficulties in aligning multiple proteins when sequence residue identities are low (e.g., less than 25%). In this article, we develop a Bayesian model and Markov chain Monte Carlo (MCMC) methods for identifying subtle motifs in protein sequences. Specifically, a motif is defined not only in terms of specific sites characterized by amino acid frequency vectors, but also as a combination of secondary characteristics such as hydrophobicity, polarity, etc. Markov chain Monte Carlo methods are proposed to search for a motif pattern with high posterior probability under the new model. A special MCMC algorithm is developed, involving transitions between state spaces of different dimensions. The proposed methods were supported by a simulation study. They were then tested on two real datasets, including a group of helix-turn-helix proteins, and one set from the CATH Protein Structure Classification Database. Statistical comparisons showed that the new approach worked better than a typical Gibbs sampling approach which is based only on an amino acid model.

  17. The extended AT-hook is a novel RNA binding motif.

    PubMed

    Filarsky, Michael; Zillner, Karina; Araya, Ingrid; Villar-Garea, Ana; Merkl, Rainer; Längst, Gernot; Németh, Attila

    2015-01-01

    The AT-hook has been defined as a DNA binding peptide motif that contains a glycine-arginine-proline (G-R-P) tripeptide core flanked by basic amino acids. Recent reports documented variations in the sequence of AT-hooks and revealed RNA binding activity of some canonical AT-hooks, suggesting a higher structural and functional variability of this protein domain than previously anticipated. Here we describe the discovery and characterization of the extended AT-hook peptide motif (eAT-hook), in which basic amino acids appear symmetrical mainly at a distance of 12-15 amino acids from the G-R-P core. We identified 80 human and 60 mouse eAT-hook proteins and biochemically characterized the eAT-hooks of Tip5/BAZ2A, PTOV1 and GPBP1. Microscale thermophoresis and electrophoretic mobility shift assays reveal the nucleic acid binding features of this peptide motif, and show that eAT-hooks bind RNA with one order of magnitude higher affinity than DNA. In addition, cellular localization studies suggest a role for the N-terminal eAT-hook of PTOV1 in nucleocytoplasmic shuttling. In summary, our findings classify the eAT-hook as a novel nucleic acid binding motif, which potentially mediates various RNA-dependent cellular processes.

  18. Identification of amino acids essential for DNA binding and dimerization in p67SRF: implications for a novel DNA-binding motif.

    PubMed Central

    Sharrocks, A D; Gille, H; Shaw, P E

    1993-01-01

    The serum response factor (p67SRF) binds to a palindromic sequence in the c-fos serum response element (SRE). A second protein, p62TCF binds in conjunction with p67SRF to form a ternary complex, and it is through this complex that growth factor-induced transcriptional activation of c-fos is thought to take place. A 90-amino-acid peptide, coreSRF, is capable of dimerizing, binding DNA, and recruiting p62TCF. By using extensive site-directed mutagenesis we have investigated the role of individual coreSRF amino acids in DNA binding. Mutant phenotypes were defined by gel retardation and cross-linking analyses. Our results have identified residues essential for either DNA binding or dimerization. Three essential basic amino acids whose conservative mutation severely reduced DNA binding were identified. Evidence which is consistent with these residues being on the face of a DNA binding alpha-helix is presented. A phenylalanine residue and a hexameric hydrophobic box are identified as essential for dimerization. The amino acid phasing is consistent with the dimerization interface being presented as a continuous region on a beta-strand. A putative second alpha-helix acts as a linker between these two regions. This study indicates that p67SRF is a member of a protein family which, in common with many DNA binding proteins, utilizes an alpha-helix for DNA binding. However, this alpha-helix is contained within a novel domain structure. PMID:8417320

  19. The Crc and Hfq proteins of Pseudomonas putida cooperate in catabolite repression and formation of ribonucleic acid complexes with specific target motifs.

    PubMed

    Moreno, Renata; Hernández-Arranz, Sofía; La Rosa, Ruggero; Yuste, Luis; Madhushani, Anjana; Shingler, Victoria; Rojo, Fernando

    2015-01-01

    The Crc protein is a global regulator that has a key role in catabolite repression and optimization of metabolism in Pseudomonads. Crc inhibits gene expression post-transcriptionally, preventing translation of mRNAs bearing an AAnAAnAA motif [the catabolite activity (CA) motif] close to the translation start site. Although Crc was initially believed to bind RNA by itself, this idea was recently challenged by results suggesting that a protein co-purifying with Crc, presumably the Hfq protein, could account for the detected RNA-binding activity. Hfq is an abundant protein that has a central role in post-transcriptional gene regulation. Herein, we show that the Pseudomonas putida Hfq protein can recognize the CA motifs of RNAs through its distal face and that Crc facilitates formation of a more stable complex at these targets. Crc was unable to bind RNA in the absence of Hfq. However, pull-down assays showed that Crc and Hfq can form a co-complex with RNA containing a CA motif in vitro. Inactivation of the hfq or the crc gene impaired catabolite repression to a similar extent. We propose that Crc and Hfq cooperate in catabolite repression, probably through forming a stable co-complex with RNAs containing CA motifs to result in inhibition of translation initiation.
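
    To make the AAnAAnAA pattern concrete, the sketch below lists CA-motif candidates that fall near a given start codon in an mRNA sequence. The function name, the coordinate convention and the 30-nucleotide window are hypothetical; the abstract only says the motif lies "close to" the translation start site and does not define a distance.

```python
import re

# AAnAAnAA, with n read as any ribonucleotide; lookahead keeps overlapping hits.
CA_MOTIF = re.compile(r"(?=(AA[ACGU]AA[ACGU]AA))")


def ca_motifs_near_start(mrna, start_codon_index, window=30):
    """Return 0-based start positions of CA motifs within `window` nt
    of the start codon. The default window is an assumed value."""
    rna = mrna.upper().replace("T", "U")  # accept DNA-style input too
    lo = max(0, start_codon_index - window)
    hi = start_codon_index + window
    return [m.start(1) for m in CA_MOTIF.finditer(rna)
            if lo <= m.start(1) <= hi]
```

    A hit from this scan marks a candidate target only; the study's evidence for Hfq and Crc binding comes from the biochemical assays it describes.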

  20. The complete sequence of a Spanish isolate of Broad bean wilt virus 1 (BBWV-1) reveals a high variability and conserved motifs in the genus Fabavirus.

    PubMed

    Ferrer, R M; Guerri, J; Luis-Arteaga, M S; Moreno, P; Rubio, L

    2005-10-01

    The genome of a Spanish isolate of Broad bean wilt virus-1 (BBWV-1) was completely sequenced and compared with available sequences of other isolates of the genus Fabavirus (BBWV-1 and BBWV-2). This consisted of two RNAs of 5814 and 3431 nucleotides, respectively, and their organization was similar to that of other members of the family Comoviridae. Its mean nucleotide identity with a BBWV-1 American isolate was 81.5%, and between 59.8 and 63.5% with seven BBWV-2 isolates. Our analysis showed sequence stretches in the 5' non-coding regions which are conserved in both genomic RNAs and in BBWV-1 and BBWV-2 isolates.

  1. Gibbs motif sampling: detection of bacterial outer membrane protein repeats.

    PubMed Central

    Neuwald, A. F.; Liu, J. S.; Lawrence, C. E.

    1995-01-01

    The detection and alignment of locally conserved regions (motifs) in multiple sequences can provide insight into protein structure, function, and evolution. A new Gibbs sampling algorithm is described that detects motif-encoding regions in sequences and optimally partitions them into distinct motif models; this is illustrated using a set of immunoglobulin fold proteins. When applied to sequences sharing a single motif, the sampler can be used to classify motif regions into related submodels, as is illustrated using helix-turn-helix DNA-binding proteins. Other statistically based procedures are described for searching a database for sequences matching motifs found by the sampler. When applied to a set of 32 very distantly related bacterial integral outer membrane proteins, the sampler revealed that they share a subtle, repetitive motif. Although BLAST (Altschul SF et al., 1990, J Mol Biol 215:403-410) fails to detect significant pairwise similarity between any of the sequences, the repeats present in these outer membrane proteins, taken as a whole, are highly significant (based on a generally applicable statistical test for motifs described here). Analysis of bacterial porins with known trimeric beta-barrel structure and related proteins reveals a similar repetitive motif corresponding to alternating membrane-spanning beta-strands. These beta-strands occur on the membrane interface (as opposed to the trimeric interface) of the beta-barrel. The broad conservation and structural location of these repeats suggests that they play important functional roles. PMID:8520488
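
    For readers unfamiliar with the general idea, here is a minimal sketch of a basic Gibbs site sampler: one motif occurrence per sequence, fixed width, uniform background. It is deliberately simpler than the algorithm described in the abstract, which also handles repeats and partitions sites into several motif models; the function name, parameters and defaults are all assumptions of this example.

```python
import math
import random


def gibbs_site_sampler(seqs, width, iters=200, pseudo=1.0, seed=0):
    """Sample one motif start per sequence (basic site sampler sketch)."""
    rng = random.Random(seed)
    alphabet = sorted(set("".join(seqs)))
    # Start from a random motif position in every sequence.
    pos = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iters):
        for i, s in enumerate(seqs):
            # Column counts from all sequences except the one being updated.
            counts = [{a: pseudo for a in alphabet} for _ in range(width)]
            for j, t in enumerate(seqs):
                if j != i:
                    for k in range(width):
                        counts[k][t[pos[j] + k]] += 1
            total = (len(seqs) - 1) + pseudo * len(alphabet)
            # Weight every candidate start by its likelihood ratio against
            # a uniform background (an assumption of this sketch).
            weights = []
            for start in range(len(s) - width + 1):
                logw = sum(
                    math.log(counts[k][s[start + k]] / total * len(alphabet))
                    for k in range(width))
                weights.append(math.exp(logw))
            # Draw the new position in proportion to its weight.
            pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos
```

    Because the positions are sampled rather than maximised, different seeds can give different alignments; in practice several runs are compared and scored.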

  2. DNA motifs determining the efficiency of adaptation into the Escherichia coli CRISPR array.

    PubMed

    Yosef, Ido; Shitrit, Dror; Goren, Moran G; Burstein, David; Pupko, Tal; Qimron, Udi

    2013-08-27

    Clustered regularly interspaced short palindromic repeats (CRISPR) and their associated proteins constitute a recently identified prokaryotic defense system against invading nucleic acids. DNA segments, termed protospacers, are integrated into the CRISPR array in a process called adaptation. Here, we establish a PCR-based assay that enables evaluating the adaptation efficiency of specific spacers into the type I-E Escherichia coli CRISPR array. Using this assay, we provide direct evidence that the protospacer adjacent motif along with the first base of the protospacer (5'-AAG) partially affect the efficiency of spacer acquisition. Remarkably, we identified a unique dinucleotide, 5'-AA, positioned at the 3' end of the spacer, that enhances efficiency of the spacer's acquisition. Insertion of this dinucleotide increased acquisition efficiency of two different spacers. DNA sequencing of newly adapted CRISPR arrays revealed that the position of the newly identified motif with respect to the 5'-AAG is important for affecting acquisition efficiency. Analysis of approximately 1 million spacers showed that this motif is overrepresented in frequently acquired spacers compared with those acquired rarely. Our results represent an example of a short nonprotospacer adjacent motif sequence that affects acquisition efficiency and suggest that other as yet unknown motifs affect acquisition efficiency in other CRISPR systems as well.

  3. Evolutionary Analysis and Classification of OATs, OCTs, OCTNs, and Other SLC22 Transporters: Structure-Function Implications and Analysis of Sequence Motifs.

    PubMed

    Zhu, Christopher; Nigam, Kabir B; Date, Rishabh C; Bush, Kevin T; Springer, Stevan A; Saier, Milton H; Wu, Wei; Nigam, Sanjay K

    2015-01-01

    The SLC22 family includes organic anion transporters (OATs), organic cation transporters (OCTs) and organic carnitine and zwitterion transporters (OCTNs). These are often referred to as drug transporters even though they interact with many endogenous metabolites and signaling molecules (Nigam, S.K., Nature Reviews Drug Discovery, 14:29-44, 2015). Phylogenetic analysis of SLC22 supports the view that these transporters may have evolved over 450 million years ago. Many OAT members were found to appear after a major expansion of the SLC22 family in mammals, suggesting a physiological and/or toxicological role during the mammalian radiation. Putative SLC22 orthologs exist in worms, sea urchins, flies, and ciona. At least six groups of SLC22 exist. OATs and OCTs form two major clades of SLC22, within which (apart from Oat and Oct subclades), there are also clear Oat-like, Octn, and Oct-related subclades, as well as a distantly related group we term "Oat-related" (which may have different functions). Based on available data, it is arguable whether SLC22A18, which is related to bacterial drug-proton antiporters, should be assigned to SLC22. Disease-causing mutations, single nucleotide polymorphisms (SNPs) and other functionally analyzed mutations in OAT1, OAT3, URAT1, OCT1, OCT2, OCTN1, and OCTN2 map to the first extracellular domain, the large central intracellular domain, and transmembrane domains 9 and 10. These regions are highly conserved within subclades, but not between subclades, and may be necessary for SLC22 transporter function and functional diversification. Our results not only link function to evolutionarily conserved motifs but indicate the need for a revised sub-classification of SLC22.

  4. Evolutionary Analysis and Classification of OATs, OCTs, OCTNs, and Other SLC22 Transporters: Structure-Function Implications and Analysis of Sequence Motifs

    PubMed Central

    Date, Rishabh C.; Bush, Kevin T.; Springer, Stevan A.; Saier, Milton H.; Wu, Wei; Nigam, Sanjay K.

    2015-01-01

    The SLC22 family includes organic anion transporters (OATs), organic cation transporters (OCTs) and organic carnitine and zwitterion transporters (OCTNs). These are often referred to as drug transporters even though they interact with many endogenous metabolites and signaling molecules (Nigam, S.K., Nature Reviews Drug Discovery, 14:29–44, 2015). Phylogenetic analysis of SLC22 supports the view that these transporters may have evolved over 450 million years ago. Many OAT members were found to appear after a major expansion of the SLC22 family in mammals, suggesting a physiological and/or toxicological role during the mammalian radiation. Putative SLC22 orthologs exist in worms, sea urchins, flies, and ciona. At least six groups of SLC22 exist. OATs and OCTs form two major clades of SLC22, within which (apart from Oat and Oct subclades), there are also clear Oat-like, Octn, and Oct-related subclades, as well as a distantly related group we term “Oat-related” (which may have different functions). Based on available data, it is arguable whether SLC22A18, which is related to bacterial drug-proton antiporters, should be assigned to SLC22. Disease-causing mutations, single nucleotide polymorphisms (SNPs) and other functionally analyzed mutations in OAT1, OAT3, URAT1, OCT1, OCT2, OCTN1, and OCTN2 map to the first extracellular domain, the large central intracellular domain, and transmembrane domains 9 and 10. These regions are highly conserved within subclades, but not between subclades, and may be necessary for SLC22 transporter function and functional diversification. Our results not only link function to evolutionarily conserved motifs but indicate the need for a revised sub-classification of SLC22. PMID:26536134

  5. Trichomonas vaginalis acidic phospholipase A2: isolation and partial amino acid sequence.

    PubMed

    Escobedo-Guajardo, Brenda L; González-Salazar, Francisco; Palacios-Corona, Rebeca; Torres de la Cruz, Víctor M; Morales-Vallarta, Mario; Mata-Cárdenas, Benito D; Garza-González, Jesús N; Rivera-Silva, Gerardo; Vargas-Villarreal, Javier

    2013-12-01

    Sexually transmitted diseases are a major cause of acute disease worldwide, and trichomoniasis is the most common curable one, generating more than 170 million cases annually worldwide. Trichomonas vaginalis is the causal agent of trichomoniasis and has the ability to destroy in vitro cell monolayers of the vaginal mucosa, where the phospholipases A2 (PLA2) have been reported as potential virulence factors. These enzymes have been partially characterized from the subcellular fraction S30 of pathogenic T. vaginalis strains. The main objective of this study was to purify a phospholipase A2 from T. vaginalis, make a partial characterization, obtain a partial amino acid sequence, and determine its enzymatic participation as a hemolytic factor causing lysis of erythrocytes. Trichomonas S30, RF30 and UFF30 sub-fractions from the GT-15 strain have the capacity to hydrolyze [2-(14)C-PA]-PC at pH 6.0. Proteins from the UFF30 sub-fraction were separated by affinity chromatography into two eluted fractions with detectable PLA2 activity. The EDTA-eluted fraction was analyzed by HPLC using on-line HPLC-tandem mass spectrometry and two protein peaks were observed at 8.2 and 13 kDa. Peptide sequences were identified from the proteins present in the eluted EDTA UFF30 fraction; bioinformatic analysis using Protein Link Global Server loaded with the T. vaginalis protein database suggests that eluted peptides correspond to a putative ubiquitin protein in the 8.2 kDa fraction and a phospholipase preserved in the 13 kDa fraction. The EDTA-eluted fraction hydrolyzed [2-(14)C-PA]-PC and lysed erythrocytes from Sprague-Dawley rats in a time- and dose-dependent manner. The acidic hemolytic activity decreased by 84% with the addition of 100 μM of Rosenthal's inhibitor.

  6. Alanine substitutions of noncysteine residues in the cysteine-stabilized αβ motif

    PubMed Central

    Yang, Ying-Fang; Cheng, Kuo-Chang; Tsai, Ping-Hsing; Liu, Chung-Cheng; Lee, Tian-Ren; Lyu, Ping-Chiang

    2009-01-01

    The protein scaffold is a peptide framework with a high tolerance of residue modifications. The cysteine-stabilized αβ motif (CSαβ) consists of an α-helix and an antiparallel triple-stranded β-sheet connected by two disulfide bridges. Proteins containing this motif share low sequence identity but high structural similarity, and the motif has been suggested as a good scaffold for protein engineering. The Vigna radiata defensin 1 (VrD1), a plant defensin, serves here as a model protein to probe the amino acid tolerance of the CSαβ motif. A systematic alanine substitution is performed on VrD1. The key residues governing the inhibitory function and structure stability are monitored. Thirty-two of 46 residue positions of VrD1 are altered by site-directed mutagenesis techniques. The circular dichroism spectrum, intrinsic fluorescence spectrum, and chemical denaturation are used to analyze the conformation and structural stability of the proteins. The secondary structures were highly tolerant to the amino acid substitutions; however, the protein stabilities varied for each mutant. Many mutants, although they maintained their conformations, altered their inhibitory function significantly. In this study, we report the first alanine scan on a plant defensin containing the CSαβ motif. The information is valuable for use of the CSαβ motif as a scaffold and for protein engineering. PMID:19533758

  7. Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-03-24

    A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration. 14 figs.

  8. Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration.

  9. Model peptide studies of sequence regions in the elastomeric biomineralization protein, Lustrin A. I. The C-domain consensus-PG-, -NVNCT-motif.

    PubMed

    Zhang, Bo; Wustman, Brandon A; Morse, Daniel; Evans, John Spencer

    2002-05-01

    The lustrin superfamily represents a unique group of biomineralization proteins localized between layered aragonite mineral plates (i.e., nacre layer) in mollusk shell. Recent atomic force microscopy (AFM) pulling studies have demonstrated that the lustrin-containing organic nacre layer in the abalone, Haliotis rufescens, exhibits a typical sawtooth force-extension curve with hysteretic recovery. This force extension behavior is reminiscent of reversible unfolding and refolding in elastomeric proteins such as titin and tenascin. Since secondary structure plays an important role in force-induced protein unfolding and refolding, the question is, What secondary structure(s) exist within the major domains of Lustrin A? Using a model peptide (FPGKNVNCTSGE) representing the 12-residue consensus sequence found near the N-termini of the first eight cysteine-rich domains (C-domains) within the Lustrin A protein, we employed CD, NMR spectroscopy, and simulated annealing/minimization to determine the secondary structure preferences for this sequence. At pH 7.4, we find that the 12-mer sequence adopts a loop conformation, consisting of a "bend" or "turn" involving residues G3-K4 and N7-C8-T9, with extended conformations arising at F1-G3; K4-V6; T9-S10-G11 in the sequence. Minor pH-dependent conformational effects were noted for this peptide; however, there is no evidence for a salt-bridge interaction between the K4 and E12 side chains. The presence of a loop conformation within the highly conserved -PG-, -NVNCT- sequence of C1-C8 domains may have important structural and mechanistic implications for the Lustrin A protein with regard to elastic behavior.

  10. The amino acid sequence of protein CM-3 from Dendroaspis polylepis polylepis (black mamba) venom.

    PubMed

    Joubert, F J

    1985-01-01

    Protein CM-3 from Dendroaspis polylepis polylepis venom was purified by gel filtration and ion exchange chromatography. It comprises 65 amino acids including eight half-cystines. The complete amino acid sequence of protein CM-3 has been elucidated. The sequence (residues 1-50) resembles that of the N-terminal sequence of the subunits of a synergistic type protein and residues 51-65 that of the C-terminal sequence of an angusticeps type protein. Mixtures of protein CM-3 and angusticeps type proteins showed no apparent synergistic effect, in that their toxicity in combination was no greater than the sum of their individual toxicities.

  11. The amino acid sequences of the Fd fragments of two human γ heavy chains

    PubMed Central

    Press, E. M.; Hogg, N. M.

    1970-01-01

    The amino acid sequences of the Fd fragments of two human pathological immunoglobulins of the immunoglobulin G1 class are reported. Comparison of the two sequences shows that the heavy-chain variable regions are similar in length to those of the light chains. The existence of heavy chain variable region subgroups is also deduced, from a comparison of these two sequences with those of another γ 1 chain, Eu, a μ chain, Ou, and the partial sequence of a fourth γ 1 chain, Ste. Carbohydrate has been found to be linked to an aspartic acid residue in the variable region of one of the γ 1 chains, Cor. PMID:5449120

  12. Cloning, expression and functional characterization of the putative regeneration and tolerance factor (RTF/TJ6) as a functional vacuolar ATPase proton pump regulatory subunit with a conserved sequence of immunoreceptor tyrosine-based activation motif.

    PubMed

    Babichev, Yael; Tamir, Ami; Park, Meeyoug; Muallem, Shmuel; Isakov, Noah

    2005-10-01

    In an attempt to identify new immunoreceptor tyrosine-based activation motif (ITAM)-containing human molecules that may regulate hitherto unknown immune cell functions, we BLAST searched the National Center for Biotechnology Information database for ITAM-containing sequences. A human expressed sequence tag showing partial homology to the murine TJ6 (mTJ6) gene and encoding a putative ITAM sequence has been identified and used to clone the human TJ6 (hTJ6) gene from an HL-60-derived cDNA library. hTJ6 was found to encode a protein of 856 residues with a calculated mass of 98 155 Da. Immunolocalization and sequence analysis revealed that hTJ6 is a membrane protein with predicted six transmembrane-spanning regions, typical of ion channels, and a single putative ITAM (residues 452-466) in a juxtamembrane or hydrophobic intramembrane region. hTJ6 is highly homologous to Bos taurus 116-kDa subunit of the vacuolar proton-translocating ATPase. Over-expression of hTJ6 in HEK 293 cells increased H+ uptake into intracellular organelles, an effect that was sensitive to inhibition by bafilomycin, a selective inhibitor of vacuolar H+ pump. Northern blot analysis demonstrated three different hybridizing mRNA transcripts corresponding to 3.2, 5.0 and 7.3 kb, indicating the presence of several splice variants. Significant differences in hTJ6 mRNA levels in human tissues of different origins point to possible tissue-specific function. Although hTJ6 was found to be a poor substrate for tyrosine-phosphorylating enzymes, suggesting that its ITAM sequence is non-functional in protein tyrosine kinase-mediated signaling pathways, its role in organellar H+ pumping suggests that hTJ6 function may participate in protein trafficking/processing.

  13. The VQ Motif-Containing Protein Family of Plant-Specific Transcriptional Regulators

    PubMed Central

    Jing, Yanjun; Lin, Rongcheng

    2015-01-01

    The VQ motif-containing proteins (designated as VQ proteins) are a class of plant-specific proteins with a conserved and single short FxxhVQxhTG amino acid sequence motif. VQ proteins regulate diverse developmental processes, including responses to biotic and abiotic stresses, seed development, and photomorphogenesis. In this Update, we summarize and discuss recent advances in our understanding of the regulation and function of VQ proteins and the role of the VQ motif in mediating transcriptional regulation and protein-protein interactions in signaling pathways. Based on the accumulated evidence, we propose a general mechanism of action for the VQ protein family, which likely defines a novel class of transcriptional regulators specific to plants. PMID:26220951
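
    The FxxhVQxhTG pattern above can be scanned for with a regular expression. The following is a minimal sketch, assuming "x" stands for any residue and "h" for a hydrophobic residue; the hydrophobic set, the function name, and the test sequence are illustrative assumptions and do not come from the abstract.

```python
import re

# Assumed hydrophobic residue set for "h"; the abstract does not define it.
HYDROPHOBIC = "AVILMFWC"

# FxxhVQxhTG: "x" is taken to mean any residue, "h" a hydrophobic one.
VQ_MOTIF = re.compile(f"F..[{HYDROPHOBIC}]VQ.[{HYDROPHOBIC}]TG")


def find_vq_motifs(sequence):
    """Return (start, matched_text) pairs (0-based) for non-overlapping hits."""
    return [(m.start(), m.group()) for m in VQ_MOTIF.finditer(sequence.upper())]


# Made-up sequence for illustration, not a real VQ protein.
print(find_vq_motifs("MSSFRALVQELTGKK"))
```

    Changing the assumed hydrophobic set changes which candidates are reported, so it should be chosen to match the definition used in the analysis at hand.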

  14. The Chinese hamster Alu-equivalent sequence: a conserved highly repetitious, interspersed deoxyribonucleic acid sequence in mammals has a structure suggestive of a transposable element.

    PubMed Central

    Haynes, S R; Toomey, T P; Leinwand, L; Jelinek, W R

    1981-01-01

    A consensus sequence has been determined for a major interspersed deoxyribonucleic acid repeat in the genome of Chinese hamster ovary cells (CHO cells). This sequence is extensively homologous to (i) the human Alu sequence (P. L. Deininger et al., J. Mol. Biol., in press), (ii) the mouse B1 interspersed repetitious sequence (Krayev et al., Nucleic Acids Res. 8:1201-1215, 1980), (iii) an interspersed repetitious sequence from African green monkey deoxyribonucleic acid (Dhruva et al., Proc. Natl. Acad. Sci. U.S.A. 77:4514-4518, 1980), and (iv) the CHO and mouse 4.5S ribonucleic acid (this report; F. Harada and N. Kato, Nucleic Acids Res. 8:1273-1285, 1980). Because the CHO consensus sequence shows significant homology to the human Alu sequence, it is termed the CHO Alu-equivalent sequence. A conserved structure surrounding CHO Alu-equivalent family members can be recognized. It is similar to that surrounding the human Alu and the mouse B1 sequences, and is represented as follows: direct repeat-CHO-Alu-A-rich sequence-direct repeat. A composite interspersed repetitious sequence has been identified. Its structure is represented as follows: direct repeat-residue 47 to 107 of CHO-Alu-non-Alu repetitious sequence-A-rich sequence-direct repeat. Because the Alu flanking sequences resemble those that flank known transposable elements, we think it likely that the Alu sequence dispersed throughout the mammalian genome by transposition. PMID:9279371

  15. Identification of a common hyaluronan binding motif in the hyaluronan binding proteins RHAMM, CD44 and link protein.

    PubMed Central

    Yang, B; Yang, B L; Savani, R C; Turley, E A

    1994-01-01

    We have previously identified two hyaluronan (HA) binding domains in the HA receptor, RHAMM, that occur near the carboxyl-terminus of this protein. We show here that these two HA binding domains are the only HA binding regions in RHAMM, and that they contribute approximately equally to the HA binding ability of this receptor. Mutation of domain II using recombinant polypeptides of RHAMM demonstrates that K423 and R431, spaced seven amino acids apart, are critical for HA binding activity. Domain I contains two sets of two basic amino acids, each spaced seven residues apart, and mutation of these basic amino acids reduced their binding to HA-Sepharose. These results predict that two basic amino acids flanking a seven amino acid stretch [hereafter called B(X7)B] are minimally required for HA binding activity. To assess whether this motif predicts HA binding in the intact RHAMM protein, we mutated all basic amino acids in domains I and II that form part of these motifs using site-directed mutagenesis and prepared fusion protein from the mutated cDNA. The altered RHAMM protein did not bind HA, confirming that the basic amino acids and their spacing are critical for binding. A specific requirement for arginine or lysine residues was identified since mutation of K430, R431 and K432 to histidine residues abolished binding. Clustering of basic amino acids either within or at either end of the motif enhanced HA binding activity while the occurrence of acidic residues between the basic amino acids reduced binding. The B(X7)B motif, in which B is either R or K and X7 contains no acidic residues and at least one basic amino acid, was found in all HA binding proteins molecularly characterized to date. Recombinant techniques were used to generate chimeric proteins containing either the B(X7)B motifs present in CD44 or link protein, with the amino-terminus of RHAMM (amino acids 1-238) that does not bind HA. All chimeric proteins containing the motif bound HA in transblot analyses.
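
    The B(X7)B rule stated in this abstract (B is R or K; the seven-residue spacer has no acidic residue and at least one basic residue) can be written as a short scanner. This is a minimal sketch under stated assumptions: "acidic" is taken to mean D or E, "basic" in the spacer is taken to mean R or K, and the function name and test sequences are made up for illustration.

```python
import re

BASIC = set("RK")  # assumed meaning of "basic" inside the spacer

# Lookahead so that overlapping candidate windows are all reported.
# [^DE] excludes the acidic residues D and E from the 7-residue spacer.
CANDIDATE = re.compile(r"(?=([RK][^DE]{7}[RK]))")


def find_bx7b(sequence):
    """Return (start, window) pairs (0-based) for every B(X7)B hit."""
    hits = []
    for m in CANDIDATE.finditer(sequence.upper()):
        window = m.group(1)
        spacer = window[1:-1]
        # The spacer must contain at least one basic residue.
        if BASIC & set(spacer):
            hits.append((m.start(), window))
    return hits


# Made-up sequences: one hit, one with an acidic spacer residue,
# and one whose spacer lacks a basic residue.
print(find_bx7b("AAKGGGRGGGKAA"))
print(find_bx7b("AAKGGGEGGGKAA"))
print(find_bx7b("AAKGGGAGGGKAA"))
```

    A sequence match of this kind only flags candidates; the abstract establishes binding experimentally, by mutagenesis and binding assays.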

  16. Target motifs affecting natural immunity by a constitutive CRISPR-Cas system in Escherichia coli.

    PubMed

    Almendros, Cristóbal; Guzmán, Noemí M; Díez-Villaseñor, César; García-Martínez, Jesús; Mojica, Francisco J M

    2012-01-01

    Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and CRISPR-associated (cas) genes constitute the CRISPR-Cas systems of various bacteria and archaea and mediate degradation of invading nucleic acids containing sequences (protospacers) that are complementary to repeat-intervening spacers. It has been demonstrated that the base sequence identity of a protospacer with the cognate spacer and the presence of a protospacer adjacent motif (PAM) influence CRISPR-mediated interference efficiency. Using an original transformation assay with plasmids targeted by a resident spacer, we show here that natural CRISPR-mediated immunity against invading DNA occurs in wild-type Escherichia coli. Unexpectedly, the strongest activity is observed with protospacer-adjoining nucleotides (interference motifs) that differ from the PAM both in sequence and location. Hence, our results document for the first time native CRISPR activity in E. coli and demonstrate that positions next to the PAM in invading DNA influence their recognition and degradation by these prokaryotic immune systems.

  17. Identification of peptide motif that binds to the surface of zirconia.

    PubMed

    Hashimoto, Kazuhiko; Yoshinari, Masao; Matsuzaka, Kenichi; Shiba, Kiyotaka; Inoue, Takashi

    2011-01-01

    A zirconia-binding peptide motif was identified using a peptide phage display system. Yttria-stabilized zirconia beads and discs were used as the target. A quartz crystal microbalance was used to monitor the binding of phages to zirconia. Starting from a library of phages displaying random sequences of 12-mer peptides, we repeated cycles of biopanning against zirconia beads. After four cycles of biopanning, we isolated a phage clone Φ#17. DNA sequencing of the corresponding portion of Φ#17 unexpectedly revealed that it displayed a 58-mer peptide (amino acid sequence: WMPSDVDINDPQGGGSRPNLHQPKPAAEAASKKKSENRKVPFYSHSWYSSMSEDKRGW). We found that Φ#17 had a significantly higher (300-fold) binding affinity for zirconia discs than phages displaying no peptide. In the quartz crystal microbalance assay, a rapid increase in energy dissipation was observed from Φ#17 but not from the control phages, indicating that Φ#17 binds to the surface of zirconia via its displayed peptide. We successfully identified a peptide motif that binds zirconia.

  18. The amino acid sequence of goat beta-lactoglobulin.

    PubMed

    Préaux, G; Braunitzer, G; Schrank, B; Stangl, A

    1979-11-01

    The isolation of beta-lactoglobulin from milk of the goat is described. The purified protein was checked for purity and has been characterized by its gross composition and end groups. The native or the modified protein was then degraded by tryptic and cyanogen bromide cleavage. The cleavage products were isolated and sequenced in the sequenator using a Quadrol and propyne program. These data provide the complete sequence of beta-lactoglobulin of the goat. The results are discussed and compared particularly with bovine beta-lactoglobulin components AB. Some biological aspects are described.

  19. Layered materials with coexisting acidic and basic sites for catalytic one-pot reaction sequences.

    PubMed

    Motokura, Ken; Tada, Mizuki; Iwasawa, Yasuhiro

    2009-06-17

    Acidic montmorillonite-immobilized primary amines (H-mont-NH(2)) were found to be excellent acid-base bifunctional catalysts for one-pot reaction sequences; they are the first materials with coexisting acid and base sites active for acid-base tandem reactions. For example, tandem deacetalization-Knoevenagel condensation proceeded successfully with the H-mont-NH(2), affording the corresponding condensation product in a quantitative yield. The acidity of the H-mont-NH(2) was strongly influenced by the preparation solvent, and the base-catalyzed reactions were enhanced by interlayer acid sites.

  20. Synthesis of gamma,delta-unsaturated glycolic acids via sequenced Brook and Ireland-Claisen rearrangements.

    PubMed

    Schmitt, Daniel C; Johnson, Jeffrey S

    2010-03-05

    Organozinc, -magnesium, and -lithium nucleophiles initiate a Brook/Ireland-Claisen rearrangement sequence of allylic silyl glyoxylates resulting in the formation of gamma,delta-unsaturated alpha-silyloxy acids.

  1. Computer Simulation of the Determination of Amino Acid Sequences in Polypeptides

    ERIC Educational Resources Information Center

    Daubert, Stephen D.; Sontum, Stephen F.

    1977-01-01

    Describes a computer program that generates a random string of amino acids and guides the student in determining the correct sequence of a given protein by using experimental analytic data for that protein. (MLH)
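
    The abstract gives no detail of the 1977 program itself, so the following is only a minimal sketch of its first step, generating a random string of amino acids. It assumes the 20 standard residues in one-letter code and uniform sampling; the function name and the seed parameter are hypothetical.

```python
import random

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_polypeptide(length, seed=None):
    """Return a random amino acid string; a fixed seed makes it reproducible."""
    rng = random.Random(seed)
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


# A 12-residue "unknown" that a student would then have to sequence.
print(random_polypeptide(12, seed=1977))
```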

  2. Numb directs the subcellular localization of EAAT3 through binding the YxNxxF motif.

    PubMed

    Su, Jin-Feng; Wei, Jian; Li, Pei-Shan; Miao, Hong-Hua; Ma, Yong-Chao; Qu, Yu-Xiu; Xu, Jie; Qin, Jie; Li, Bo-Liang; Song, Bao-Liang; Xu, Zheng-Ping; Luo, Jie

    2016-08-15

    Excitatory amino acid transporter type 3 (EAAT3, also known as SLC1A1) is a high-affinity, Na(+)-dependent glutamate carrier that localizes primarily within the cell and at the apical plasma membrane. Although previous studies have reported proteins and sequence regions involved in EAAT3 trafficking, the detailed molecular mechanism by which EAAT3 is distributed to the correct location still remains elusive. Here, we identify that the YVNGGF sequence in the C-terminus of EAAT3 is responsible for its intracellular localization and apical sorting in rat hepatoma cells CRL1601 and Madin-Darby canine kidney (MDCK) cells, respectively. We further demonstrate that Numb, a clathrin adaptor protein, directly binds the YVNGGF motif and regulates the localization of EAAT3. Mutation of Y503, N505 and F508 within the YVNGGF motif to alanine residues or silencing Numb by use of small interfering RNA (siRNA) results in the aberrant localization of EAAT3. Moreover, both Numb and the YVNGGF motif mediate EAAT3 endocytosis in CRL1601 cells. In summary, our study suggests that Numb is a pivotal adaptor protein that mediates the subcellular localization of EAAT3 through binding the YxNxxF (where x stands for any amino acid) motif.
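
    The relationship between the YxNxxF pattern and the alanine mutations described above can be checked in a few lines. This is a minimal sketch, assuming offsets are counted from Y503 (so Y503, N505 and F508 sit at offsets 0, 2 and 5 of YVNGGF); the function name is hypothetical, and the code models only the sequence pattern, not Numb binding.

```python
import re

# YxNxxF, where x stands for any amino acid.
YXNXXF = re.compile(r"Y.N..F")


def alanine_substitute(motif_hit, offsets):
    """Replace the residues at the given 0-based offsets with alanine."""
    residues = list(motif_hit)
    for i in offsets:
        residues[i] = "A"
    return "".join(residues)


wild_type = "YVNGGF"
# Y503A, N505A and F508A, expressed as offsets from Y503.
mutant = alanine_substitute(wild_type, [0, 2, 5])

print(wild_type, bool(YXNXXF.fullmatch(wild_type)))
print(mutant, bool(YXNXXF.fullmatch(mutant)))
```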

  3. Targeting functional motifs of a protein family

    NASA Astrophysics Data System (ADS)

    Bhadola, Pradeep; Deo, Nivedita

    2016-10-01

    The structural organization of a protein family is investigated by devising a method based on the random matrix theory (RMT), which uses the physiochemical properties of the amino acid with multiple sequence alignment. A graphical method to represent protein sequences using physiochemical properties is devised that gives a fast, easy, and informative way of comparing the evolutionary distances between protein sequences. A correlation matrix associated with each property is calculated, where the noise reduction and information filtering are done using RMT involving an ensemble of Wishart matrices. The analysis of the eigenvalue statistics of the correlation matrix for the β-lactamase family shows the universal features as observed in the Gaussian orthogonal ensemble (GOE). The property-based approach captures the short- as well as the long-range correlation (approximately following GOE) between the eigenvalues, whereas the previous approach (treating amino acids as characters) gives the usual short-range correlations, while the long-range correlations are the same as that of an uncorrelated series. The distribution of the eigenvector components for the eigenvalues outside the bulk (RMT bound) deviates significantly from RMT observations and contains important information about the system. The information content of each eigenvector of the correlation matrix is quantified by introducing an entropic estimate, which shows that for the β-lactamase family the smallest eigenvectors (low eigenmodes) are highly localized as well as informative. These small eigenvectors, when processed, give clusters involving positions that have well-defined biological and structural importance matching with experiments. The approach is crucial for the recognition of structural motifs as shown in β-lactamase (and other families) and selectively identifies the important positions for targets to deactivate (activate) the enzymatic actions.

  4. Targeting functional motifs of a protein family.

    PubMed

    Bhadola, Pradeep; Deo, Nivedita

    2016-10-01

    The structural organization of a protein family is investigated by devising a method based on the random matrix theory (RMT), which uses the physiochemical properties of the amino acid with multiple sequence alignment. A graphical method to represent protein sequences using physiochemical properties is devised that gives a fast, easy, and informative way of comparing the evolutionary distances between protein sequences. A correlation matrix associated with each property is calculated, where the noise reduction and information filtering are done using RMT involving an ensemble of Wishart matrices. The analysis of the eigenvalue statistics of the correlation matrix for the β-lactamase family shows the universal features as observed in the Gaussian orthogonal ensemble (GOE). The property-based approach captures the short- as well as the long-range correlation (approximately following GOE) between the eigenvalues, whereas the previous approach (treating amino acids as characters) gives the usual short-range correlations, while the long-range correlations are the same as that of an uncorrelated series. The distribution of the eigenvector components for the eigenvalues outside the bulk (RMT bound) deviates significantly from RMT observations and contains important information about the system. The information content of each eigenvector of the correlation matrix is quantified by introducing an entropic estimate, which shows that for the β-lactamase family the smallest eigenvectors (low eigenmodes) are highly localized as well as informative. These small eigenvectors, when processed, give clusters involving positions that have well-defined biological and structural importance matching with experiments. The approach is crucial for the recognition of structural motifs as shown in β-lactamase (and other families) and selectively identifies the important positions for targets to deactivate (activate) the enzymatic actions.

  5. An Affinity Propagation-Based DNA Motif Discovery Algorithm.

    PubMed

    Sun, Chunxiao; Huo, Hongwei; Yu, Qiang; Guo, Haitao; Sun, Zhigang

    2015-01-01

    The planted (l, d) motif search (PMS) is one of the fundamental problems in bioinformatics, which plays an important role in locating transcription factor binding sites (TFBSs) in DNA sequences. Nowadays, identifying weak motifs and reducing the effect of local optimum are still important but challenging tasks for motif discovery. To solve the tasks, we propose a new algorithm, APMotif, which first applies the Affinity Propagation (AP) clustering in DNA sequences to produce informative and good candidate motifs and then employs Expectation Maximization (EM) refinement to obtain the optimal motifs from the candidate motifs. Experimental results both on simulated data sets and real biological data sets show that APMotif usually outperforms four other widely used algorithms in terms of high prediction accuracy.
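
    For orientation, the planted (l, d) condition itself can be checked by brute force: a length-l candidate qualifies if every input sequence contains a window within Hamming distance d of it. The sketch below shows only that check, not APMotif, Affinity Propagation, or EM refinement; the function names and the toy sequences are assumptions, and each sequence is assumed to be at least as long as the motif.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_site(motif, sequence):
    """Return (distance, start) of the window in sequence closest to motif."""
    l = len(motif)
    return min(
        (hamming(motif, sequence[i:i + l]), i)
        for i in range(len(sequence) - l + 1)
    )


def is_ld_motif(motif, sequences, d):
    """True if every sequence has a window within d mismatches of motif."""
    return all(best_site(motif, s)[0] <= d for s in sequences)


# Toy DNA sequences, made up for illustration.
seqs = ["TTACGTAA", "GGACCTCC", "ACGAAAAA"]
print([best_site("ACGT", s) for s in seqs])
print(is_ld_motif("ACGT", seqs, d=1), is_ld_motif("ACGT", seqs, d=0))
```

    Verifying one candidate costs time proportional to the total sequence length times l; the hard part of the problem, which the abstract's algorithm addresses, is choosing good candidates.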

  6. Genome sequence of the acid-tolerant strain Rhizobium sp. LPU83.

    PubMed

    Wibberg, Daniel; Tejerizo, Gonzalo Torres; Del Papa, María Florencia; Martini, Carla; Pühler, Alfred; Lagares, Antonio; Schlüter, Andreas; Pistorio, Mariano

    2014-04-20

    Rhizobia are important members of the soil microbiome since they enter into nitrogen-fixing symbiosis with different legume host plants. Rhizobium sp. LPU83 is an acid-tolerant Rhizobium strain featuring a broad-host-range. However, it is ineffective in nitrogen fixation. Here, the improved draft genome sequence of this strain is reported. Genome sequence information provides the basis for analysis of its acid tolerance, symbiotic properties and taxonomic classification.

  7. A molecular mechanism realizing sequence-specific recognition of nucleic acids by TDP-43

    PubMed Central

    Furukawa, Yoshiaki; Suzuki, Yoh; Fukuoka, Mami; Nagasawa, Kenichi; Nakagome, Kenta; Shimizu, Hideaki; Mukaiyama, Atsushi; Akiyama, Shuji

    2016-01-01

    TAR DNA-binding protein 43 (TDP-43) is a DNA/RNA-binding protein containing two consecutive RNA recognition motifs (RRM1 and RRM2) in tandem. Functional abnormality of TDP-43 has been proposed to cause neurodegeneration, but it remains obscure how the physiological functions of this protein are regulated. Here, we show distinct roles of RRM1 and RRM2 in the sequence-specific substrate recognition of TDP-43. RRM1 was found to bind a wide spectrum of ssDNA sequences, while no binding was observed between RRM2 and ssDNA. When two RRMs are fused in tandem as in native TDP-43, the fused construct almost exclusively binds ssDNA with a TG-repeat sequence. In contrast, such sequence-specificity was not observed in a simple mixture of RRM1 and RRM2. We thus propose that the spatial arrangement of multiple RRMs in DNA/RNA binding proteins provides steric effects on the substrate-binding site and thereby controls the specificity of its substrate nucleotide sequences. PMID:26838063

  8. Stable proline box motif at the N-terminal end of alpha-helices.

    PubMed Central

    Viguera, A. R.; Serrano, L.

    1999-01-01

    We describe a novel N-terminal alpha-helix local motif that involves three hydrophobic residues and a Pro residue (Pro-box motif). Database analysis shows that when Pro is the N-cap of an alpha-helix the distribution of amino acids in adjacent positions changes dramatically with respect to the average distribution in an alpha-helix, but not when Pro is at position N1. N-cap Pro residues are usually associated to Ile and Leu, at position N', Val at position N3 and a hydrophobic residue (h) at position N4. The side chain of the N-cap Pro packs against Val, while the hydrophobic residues at positions N' and N4 make favorable interactions. To analyze the role of this putative motif (sequence fingerprint hPXXhh), we have synthesized a series of peptides and analyzed them by circular dichroism (CD) and NMR. We find that this motif is formed in peptides, and that the accompanying hydrophobic interactions contribute up to 1.2 kcal/mol to helix stability. The fact that some of the residues in this fingerprint are not good N-cap and helix formers results in a small overall stabilization of the alpha-helix with respect to other peptides having Gly as the N-cap and Ala at N3 and N4. This suggests that the Pro-box motif will not specially contribute to protein stability but to the specificity of its fold. In fact, 80% of the sequences that contain the fingerprint sequence in the protein database are adopting the described structural motif, and in none of them is the helix extended to place Pro at the more favorable N1 position. PMID:10493574

  9. Bases of motifs for generating repeated patterns with wild cards.

    PubMed

    Pisanti, Nadia; Crochemore, Maxime; Grossi, Roberto; Sagot, Marie-France

    2005-01-01

    Motif inference represents one of the most important areas of research in computational biology, and one of its oldest ones. Despite this, the problem remains very much open in the sense that no existing definition is fully satisfying, either in formal terms, or in relation to the biological questions that involve finding such motifs. Two main types of motifs have been considered in the literature: matrices (of letter frequency per position in the motif) and patterns. There is no conclusive evidence in favor of either, and recent work has attempted to integrate the two types into a single model. In this paper, we address the formal issue in relation to motifs as patterns. This is essential to get at a better understanding of motifs in general. In particular, we consider a promising idea that was recently proposed, which attempted to avoid the combinatorial explosion in the number of motifs by means of a generator set for the motifs. Instead of exhibiting a complete list of motifs satisfying some input constraints, what is produced is a basis of such motifs from which all the other ones can be generated. We study the computational cost of determining such a basis of repeated motifs with wild cards in a sequence. We give new upper and lower bounds on such a cost, introducing a notion of basis that is provably contained in (and, thus, smaller) than previously defined ones. Our basis can be computed in less time and space, and is still able to generate the same set of motifs. We also prove that the number of motifs in all bases defined so far grows exponentially with the quorum, that is, with the minimal number of times a motif must appear in a sequence, something unnoticed in previous work. We show that there is no hope to efficiently compute such bases unless the quorum is fixed.
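
    The quorum condition mentioned above can be illustrated with a small matcher for patterns containing wild cards. This is a minimal sketch that only counts occurrences of a given pattern and compares the count with the quorum; it does not compute a basis of motifs, and the wild card symbol ".", the function names, and the example string are assumptions for illustration.

```python
WILDCARD = "."  # assumed symbol for "any single letter"


def occurrences(pattern, sequence):
    """Start positions (0-based, overlaps allowed) where pattern matches."""
    k = len(pattern)
    return [
        i
        for i in range(len(sequence) - k + 1)
        if all(p == WILDCARD or p == c for p, c in zip(pattern, sequence[i:i + k]))
    ]


def meets_quorum(pattern, sequence, quorum):
    """True if pattern occurs at least `quorum` times in sequence."""
    return len(occurrences(pattern, sequence)) >= quorum


text = "ABCABDABC"
print(occurrences("AB.", text))
print(occurrences("ABC", text))
print(meets_quorum("AB.", text, 3), meets_quorum("ABC", text, 3))
```

    In this toy string the wild card pattern reaches a quorum of 3 while the solid pattern ABC does not, which is the kind of trade-off that makes the set of candidate motifs grow quickly.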

  10. The amino acid sequence of monal pheasant lysozyme and its activity.

    PubMed

    Araki, T; Matsumoto, T; Torikata, T

    1998-10-01

    The amino acid sequence of monal pheasant lysozyme and its activity were analyzed. Carboxymethylated lysozyme was digested with trypsin and the resulting peptides were sequenced. The established amino acid sequence had one amino acid substitution at position 102 (Arg to Gly) compared with Indian peafowl lysozyme and four amino acid substitutions at positions 3 (Phe to Tyr), 15 (His to Leu), 41 (Gln to His), and 121 (Gln to His) compared with chicken lysozyme. Analysis of the time-courses of reaction using N-acetylglucosamine pentamer as a substrate showed a difference of binding free energy change (-0.4 kcal/mol) at subsite A between monal pheasant and Indian peafowl lysozyme. This was assumed to be caused by the amino acid substitution at subsite A with loss of a positive charge at position 102 (Arg102 to Gly).

  11. Single-chain structure of human ceruloplasmin: the complete amino acid sequence of the whole molecule.

    PubMed Central

    Takahashi, N; Ortel, T L; Putnam, F W

    1984-01-01

    We have determined the amino acid sequence of the amino-terminal 67,000-dalton (67-kDa) fragment of human ceruloplasmin and have established overlapping sequences between the 67-kDa and 50-kDa fragments and between the 50-kDa and 19-kDa fragments. The 67-kDa fragment contains 480 amino acid residues and three glucosamine oligosaccharides. These results together with our previous sequence data for the 50-kDa and 19-kDa fragments complete the amino acid sequence of human ceruloplasmin. The polypeptide chain has a total of 1,046 amino acid residues (Mr 120,085) and has attachment sites for four glucosamine oligosaccharides; together these account for the total molecular mass of human ceruloplasmin (132 kDa). The sequence analysis of the peptides overlapping the fragments showed that one additional amino acid, arginine, is present between the 67-kDa and 50-kDa fragments, and another, lysine, is between the 50-kDa and 19-kDa fragments. Only two apparent sites of amino acid interchange have been identified in the polypeptide chain. Both involve a single-point interchange of glycine and lysine that would result in a difference in charge. The results of the complete sequence analysis verified that human ceruloplasmin is composed of a single polypeptide chain and that the subunit-like fragments are produced by proteolytic cleavage during purification (and possibly also in vivo). PMID:6582496

  12. Multiple Genome Sequences of Important Beer-Spoiling Lactic Acid Bacteria

    PubMed Central

    Geissler, Andreas J.; Vogel, Rudi F.

    2016-01-01

    Seven strains of important beer-spoiling lactic acid bacteria were sequenced using single-molecule real-time sequencing. Complete genomes were obtained for strains of Lactobacillus paracollinoides, Lactobacillus lindneri, and Pediococcus claussenii. The analysis of these genomes emphasizes the role of plasmids as the genomic foundation of beer-spoiling ability. PMID:27795248

  13. Discovering interacting domains and motifs in protein-protein interactions.

    PubMed

    Hugo, Willy; Sung, Wing-Kin; Ng, See-Kiong

    2013-01-01

    Many important biological processes, such as the signaling pathways, require protein-protein interactions (PPIs) that are designed for fast response to stimuli. These interactions are usually transient, easily formed, and disrupted, yet specific. Many of these transient interactions involve the binding of a protein domain to a short stretch (3-10) of amino acid residues, which can be characterized by a sequence pattern, i.e., a short linear motif (SLiM). We call these interacting domains and motifs domain-SLiM interactions. Existing methods have focused on discovering SLiMs in the interacting proteins' sequence data. With the recent increase in protein structures, we have a new opportunity to detect SLiMs directly from the proteins' 3D structures instead of their linear sequences. In this chapter, we describe a computational method called SLiMDIet to directly detect SLiMs on domain interfaces extracted from 3D structures of PPIs. SLiMDIet comprises two steps: (1) interaction interfaces belonging to the same domain are extracted and grouped together using structural clustering and (2) the extracted interaction interfaces in each cluster are structurally aligned to extract the corresponding SLiM. Using SLiMDIet, de novo SLiMs interacting with protein domains can be computationally detected from structurally clustered domain-SLiM interactions for PFAM domains which have available 3D structures in the PDB database.

  14. PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences.

    PubMed

    Mirarab, Siavash; Nguyen, Nam; Guo, Sheng; Wang, Li-San; Kim, Junhyong; Warnow, Tandy

    2015-05-01

    We introduce PASTA, a new multiple sequence alignment algorithm. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy and scalability of the leading alignment methods (including SATé). We also show that trees estimated on PASTA alignments are highly accurate--slightly better than SATé trees, but with substantial improvements relative to other methods. Finally, PASTA is faster than SATé, highly parallelizable, and requires relatively little memory.

  15. SETG: Nucleic Acid Extraction and Sequencing for In Situ Life Detection on Mars

    NASA Astrophysics Data System (ADS)

    Mojarro, A.; Hachey, J.; Tani, J.; Smith, A.; Bhattaru, S. A.; Pontefract, A.; Doebler, R.; Brown, M.; Ruvkun, G.; Zuber, M. T.; Carr, C. E.

    2016-10-01

    We are developing an integrated nucleic acid extraction and sequencing instrument: the Search for Extra-Terrestrial Genomes (SETG) for in situ life detection on Mars. Our goals are to identify related or unrelated nucleic acid-based life on Mars.

  16. Draft Genome Sequence of Cyanobacterium sp. Strain IPPAS B-1200 with a Unique Fatty Acid Composition

    PubMed Central

    Starikov, Alexander Y.; Usserbaeva, Aizhan A.; Sinetova, Maria A.; Sarsekeyeva, Fariza K.; Zayadan, Bolatkhan K.; Ustinova, Vera V.; Kupriyanova, Elena V.; Los, Dmitry A.

    2016-01-01

    Here, we report the draft genome of Cyanobacterium sp. IPPAS strain B-1200, isolated from Lake Balkhash, Kazakhstan, and characterized by the unique fatty acid composition of its membrane lipids, which are enriched with myristic and myristoleic acids. The approximate genome size is 3.4 Mb, and the predicted number of coding sequences is 3,119. PMID:27856596

  17. Sequencing and computational analysis of complete genome sequences of Citrus yellow mosaic badna virus from acid lime and pummelo.

    PubMed

    Borah, Basanta K; Johnson, A M Anthony; Sai Gopal, D V R; Dasgupta, Indranil

    2009-08-01

    Citrus yellow mosaic badna virus (CMBV), a member of the Family Caulimoviridae, Genus Badnavirus, is the causative agent of Citrus mosaic disease in India. Although the virus has been detected in several citrus species, only two full-length genomes, one each from Sweet orange and Rangpur lime, are available in publicly accessible databases. In order to obtain a better understanding of the genetic variability of the virus in other citrus mosaic-affected citrus species, we performed the cloning and sequence analysis of complete genomes of CMBV from two additional citrus species, Acid lime and Pummelo. We show that CMBV genomes from the two hosts share high homology with previously reported CMBV sequences and hence conclude that the new isolates represent variants of the virus present in these species. Based on in silico sequence analysis, we predict the possible function of the protein encoded by one of the five ORFs.

  18. Parvalbumins from coelacanth muscle. III. Amino acid sequence of the major component.

    PubMed

    Jauregui-Adell, J; Pechere, J F

    1978-09-26

    The primary structure of the major parvalbumin (pI = 4.52) from coelacanth muscle (Latimeria chalumnae) has been determined. Sequence analysis of the tryptic peptides, in some cases obtained with beta-trypsin, accounts for the total amino acid content of the protein. Chymotryptic peptides provide appropriate sequence overlaps, to complete the localization of the tryptic peptides. Examination of the amino acid sequence of this protein shows the typical structure of a beta-parvalbumin. Its position in the dendrogram of related calcium-binding proteins corresponds to that usually accepted for crossopterygians.

  19. Analysis of cloned cDNA and genomic sequences for phytochrome: complete amino acid sequences for two gene products expressed in etiolated Avena.

    PubMed Central

    Hershey, H P; Barker, R F; Idler, K B; Lissemore, J L; Quail, P H

    1985-01-01

    Cloned cDNA and genomic sequences have been analyzed to deduce the amino acid sequence of phytochrome from etiolated Avena. Restriction endonuclease site polymorphism between clones indicates that at least four phytochrome genes are expressed in this tissue. Sequence analysis of two complete and one partial coding region shows approximately 98% homology at both the nucleotide and amino acid levels, with the majority of amino acid changes being conservative. High sequence homology is also found in the 5'-untranslated region but significant divergence occurs in the 3'-untranslated region. The phytochrome polypeptides are 1128 amino acid residues long corresponding to a molecular mass of 125 kdaltons. The known protein sequence at the chromophore attachment site occurs only once in the polypeptide, establishing that phytochrome has a single chromophore per monomer covalently linked to Cys-321. Computer analyses of the amino acid sequences have provided predictions regarding a number of structural features of the phytochrome molecule. PMID:3001642

  20. Genomic analysis of membrane protein families: abundance and conserved motifs

    PubMed Central

    Liu, Yang; Engelman, Donald M; Gerstein, Mark

    2002-01-01

    Background: Polytopic membrane proteins can be related to each other on the basis of the number of transmembrane helices and sequence similarities. Building on the Pfam classification of protein domain families, and using transmembrane-helix prediction and sequence-similarity searching, we identified a total of 526 well-characterized membrane protein families in 26 recently sequenced genomes. To this we added a clustering of a number of predicted but unclassified membrane proteins, resulting in a total of 637 membrane protein families. Results: Analysis of the occurrence and composition of these families revealed several interesting trends. The number of assigned membrane protein domains has an approximately linear relationship to the total number of open reading frames (ORFs) in the 26 genomes studied. Caenorhabditis elegans is an apparent outlier, because of its high representation of seven-span transmembrane (7-TM) chemoreceptor families. In all genomes, including that of C. elegans, the number of distinct membrane protein families has a logarithmic relation to the number of ORFs. Glycine, proline, and tyrosine locations tend to be conserved in transmembrane regions within families, whereas isoleucine, valine, and methionine locations are relatively mutable. Analysis of motifs in putative transmembrane helices reveals that GxxxG and GxxxxxxG (which can be written GG4 and GG7, respectively; see Materials and methods) are among the most prevalent. This was noted in earlier studies; however, we now find that these motifs are particularly well conserved within families, especially those corresponding to transporters, symporters, and channels. Conclusions: We carried out a genome-wide analysis on patterns of the classified polytopic membrane protein families and analyzed the distribution of conserved amino acids and motifs in the transmembrane helix regions in these families. PMID:12372142
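
    The GxxxG (GG4) and GxxxxxxG (GG7) patterns named in this abstract can be located with a simple scan. The sketch below is only an illustration, not the authors' pipeline: the helper name find_motif and the test string are invented here, and "x" is assumed to mean any single residue.

```python
import re

# Zero-width lookaheads let overlapping occurrences be reported.
GG4 = re.compile(r"(?=(G.{3}G))")   # GxxxG: two glycines separated by 3 residues
GG7 = re.compile(r"(?=(G.{6}G))")   # GxxxxxxG: two glycines separated by 6 residues


def find_motif(pattern, helix):
    """Return (0-based start, matched text) for every occurrence, overlaps included."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(helix)]


# Invented helix-like string, used only to exercise the patterns.
helix = "AILGVLAGLVGALLIFGSAVG"
gg4_hits = find_motif(GG4, helix)
gg7_hits = find_motif(GG7, helix)
print(gg4_hits)
print(gg7_hits)
```

    A real analysis would first need predicted transmembrane segments and a family-level alignment to judge conservation; this snippet only reports where the spacing pattern occurs in one string.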

  1. Identification and characterization of four novel peptide motifs that recognize distinct regions of the transcription factor CP2.

    PubMed

    Kang, Ho Chul; Chung, Bo Mee; Chae, Ji Hyung; Yang, Sung-Il; Kim, Chan Gil; Kim, Chul Geun

    2005-03-01

    Although ubiquitously expressed, the transcriptional factor CP2 also exhibits some tissue- or stage-specific activation toward certain genes such as globin in red blood cells and interleukin-4 in T helper cells. Because this specificity may be achieved by interaction with other proteins, we screened a peptide display library and identified four consensus motifs in numerous CP2-binding peptides: HXPR, PHL, ASR and PXHXH. Protein-database searching revealed that RE-1 silencing factor (REST), Yin-Yang1 (YY1) and five other proteins have one or two of these CP2-binding motifs. Glutathione S-transferase pull-down and coimmunoprecipitation assays showed that two HXPR motif-containing proteins REST and YY1 indeed were able to bind CP2. Importantly, this binding to CP2 was almost abolished when a double amino acid substitution was made on the HXPR sequence of REST and YY1 proteins. The suppressing effect of YY1 on CP2's transcriptional activity was lost by this point mutation on the HXPR sequence of YY1 and reduced by an HXPR-containing peptide, further supporting the interaction between CP2 and YY1 via the HXPR sequence. Mapping the sites on CP2 for interaction with the four distinct CP2-binding motifs revealed at least three different regions on CP2. This suggests that CP2 recognizes several distinct binding motifs by virtue of employing different regions, thus being able to interact with and regulate many cellular partners.
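
    A protein-database search for the four consensus motifs could be approximated as below. This is a hedged sketch rather than the search the authors ran: it assumes "X" stands for any single residue, and the dictionary CP2_MOTIFS, the function motif_positions, and the peptide are all made up for illustration.

```python
import re

# Consensus motifs from the abstract, with X rendered as "." (any residue, an assumption).
CP2_MOTIFS = {"HXPR": "H.PR", "PHL": "PHL", "ASR": "ASR", "PXHXH": "P.H.H"}


def motif_positions(seq):
    """Map each motif name to the 0-based start positions of its occurrences in seq."""
    return {
        name: [m.start() for m in re.finditer(f"(?={pat})", seq)]
        for name, pat in CP2_MOTIFS.items()
    }


# Invented peptide containing one copy of each motif.
peptide = "MKHAPRGGPHLSASRTPQHEH"
hits = motif_positions(peptide)
print(hits)
```

    Short motifs such as PHL or ASR occur by chance in many proteins, so hits of this kind are candidates only; the abstract relies on pull-down and coimmunoprecipitation assays to confirm binding.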

  2. Purification, characterization and partial amino acid sequence of glycogen synthase from Saccharomyces cerevisiae.

    PubMed Central

    Carabaza, A; Arino, J; Fox, J W; Villar-Palasi, C; Guinovart, J J

    1990-01-01

    Glycogen synthase from Saccharomyces cerevisiae was purified to homogeneity. The enzyme showed a subunit molecular mass of 80 kDa. The holoenzyme appears to be a tetramer. Antibodies developed against purified yeast glycogen synthase inactivated the enzyme in yeast extracts and allowed the detection of the protein in Western blots. Amino acid analysis showed that the enzyme is very rich in glutamate and/or glutamine residues. The N-terminal sequence (11 amino acid residues) was determined. In addition, selected tryptic-digest peptides were purified by reverse-phase h.p.l.c. and submitted to gas-phase sequencing. Up to eight sequences (79 amino acid residues) could be aligned with the human muscle enzyme sequence. Levels of identity range between 37 and 100%, indicating that, although human and yeast glycogen synthases probably share some conserved regions, significant differences in their primary structure should be expected. PMID:2114092

  3. Amino acid sequence of anionic peroxidase from the windmill palm tree Trachycarpus fortunei.

    PubMed

    Baker, Margaret R; Zhao, Hongwei; Sakharov, Ivan Yu; Li, Qing X

    2014-12-10

    Palm peroxidases are extremely stable and have uncommon substrate specificity. This study was designed to fill in the knowledge gap about the structures of a peroxidase from the windmill palm tree Trachycarpus fortunei. The complete amino acid sequence and partial glycosylation were determined by MALDI-top-down sequencing of native windmill palm tree peroxidase (WPTP), MALDI-TOF/TOF MS/MS of WPTP tryptic peptides, and cDNA sequencing. The propeptide of WPTP contained N- and C-terminal signal sequences which contained 21 and 17 amino acid residues, respectively. Mature WPTP was 306 amino acids in length, and its carbohydrate content ranged from 21% to 29%. Comparison to closely related royal palm tree peroxidase revealed structural features that may explain differences in their substrate specificity. The results can be used to guide engineering of WPTP and its novel applications.

  4. TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations.

    PubMed

    Abascal, Federico; Zardoya, Rafael; Telford, Maximilian J

    2010-07-01

    We present TranslatorX, a web server designed to align protein-coding nucleotide sequences based on their corresponding amino acid translations. Many comparisons between biological sequences (nucleic acids and proteins) involve the construction of multiple alignments. Alignments represent a statement regarding the homology between individual nucleotides or amino acids within homologous genes. As protein-coding DNA sequences evolve as triplets of nucleotides (codons) and it is known that sequence similarity degrades more rapidly at the DNA than at the amino acid level, alignments are generally more accurate when based on amino acids than on their corresponding nucleotides. TranslatorX novelties include: (i) use of all documented genetic codes and the possibility of assigning different genetic codes for each sequence; (ii) a battery of different multiple alignment programs; (iii) translation of ambiguous codons when possible; (iv) an innovative criterion to clean nucleotide alignments with GBlocks based on protein information; and (v) a rich output, including Jalview-powered graphical visualization of the alignments, codon-based alignments coloured according to the corresponding amino acids, measures of compositional bias and first, second and third codon position specific alignments. The TranslatorX server is freely available at http://translatorx.co.uk.
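
    The core idea, letting a protein alignment dictate where gaps go in the nucleotide alignment, can be sketched as follows. This is not TranslatorX's code: the function backtranslate and the toy sequences are assumptions, and the gapped protein rows are taken as already produced by some external aligner.

```python
def backtranslate(aligned_protein, cds):
    """Thread the codons of `cds` onto a gapped protein row.

    Every residue consumes one codon (3 nt); every protein gap '-' becomes '---'.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) % 3 != 0 or len(codons) != n_residues:
        raise ValueError("CDS length must be exactly 3 x the number of residues")
    codon_iter = iter(codons)
    return "".join("---" if aa == "-" else next(codon_iter) for aa in aligned_protein)


# Toy input: two protein rows already aligned, plus their ungapped coding sequences.
row1, cds1 = "MK-L", "ATGAAACTG"      # M K L
row2, cds2 = "MKAL", "ATGAAGGCTTTA"   # M K A L
aln1 = backtranslate(row1, cds1)
aln2 = backtranslate(row2, cds2)
print(aln1)
print(aln2)
```

    Because gaps are inserted in multiples of three, the reading frame is preserved in every row, which is what makes codon-position-specific alignments possible downstream.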

  5. Amino acid sequence of homologous rat atrial peptides: natriuretic activity of native and synthetic forms.

    PubMed Central

    Seidah, N G; Lazure, C; Chrétien, M; Thibault, G; Garcia, R; Cantin, M; Genest, J; Nutt, R F; Brady, S F; Lyle, T A

    1984-01-01

    A substance called atrial natriuretic factor (ANF), localized in secretory granules of atrial cardiocytes, was isolated as four homologous natriuretic peptides from homogenates of rat atria. The complete sequence of the longest form showed that it is composed of 33 amino acids. The three other shorter forms (2-33, 3-33, and 8-33) represent amino-terminally truncated versions of the 33 amino acid parent molecule as shown by analysis of sequence, amino acid composition, or both. The proposed primary structure agrees entirely with the amino acid composition and reveals no significant sequence homology with any known protein or segment of protein. The short form ANF-(8-33) was synthesized by a multi-fragment condensation approach and the synthetic product was shown to exhibit specific activity comparable to that of the natural ANF-(3-33). PMID:6232612

  6. Nucleotide and deduced amino acid sequences of a new subtilisin from an alkaliphilic Bacillus isolate.

    PubMed

    Saeki, Katsuhisa; Magallones, Marietta V; Takimura, Yasushi; Hatada, Yuji; Kobayashi, Tohru; Kawai, Shuji; Ito, Susumu

    2003-10-01

    The gene for a new subtilisin from the alkaliphilic Bacillus sp. KSM-LD1 was cloned and sequenced. The open reading frame of the gene encoded a 97 amino-acid prepro-peptide plus a 307 amino-acid mature enzyme that contained a possible catalytic triad of residues, Asp32, His66, and Ser224. The deduced amino acid sequence of the mature enzyme (LD1) showed approximately 65% identity to those of subtilisins SprC and SprD from alkaliphilic Bacillus sp. LG12. The amino acid sequence identities of LD1 to those of previously reported true subtilisins and high-alkaline proteases were below 60%. LD1 was characteristically stable during incubation with surfactants and chemical oxidants. Interestingly, an oxidizable Met residue is located next to the catalytic Ser224 of the enzyme as in the cases of the oxidation-susceptible subtilisins reported to date.

  7. Shark myelin basic protein: amino acid sequence, secondary structure, and self-association.

    PubMed

    Milne, T J; Atkins, A R; Warren, J A; Auton, W P; Smith, R

    1990-09-01

    Myelin basic protein (MBP) from the Whaler shark (Carcharhinus obscurus) has been purified from acid extracts of a chloroform/methanol pellet from whole brains. The amino acid sequence of the majority of the protein has been determined and compared with the sequences of other MBPs. The shark protein has only 44% homology with the bovine protein, but, in common with other MBPs, it has basic residues distributed throughout the sequence and no extensive segments that are predicted to have an ordered secondary structure in solution. Shark MBP lacks the triproline sequence previously postulated to form a hairpin bend in the molecule. The region containing the putative consensus sequence for encephalitogenicity in the guinea pig contains several substitutions, thus accounting for the lack of activity of the shark protein. Studies of the secondary structure and self-association have shown that shark MBP possesses solution properties similar to those of the bovine protein, despite the extensive differences in primary structure.

  8. High affinity recognition of a Phytophthora protein by Arabidopsis via an RGD motif.

    PubMed

    Senchou, V; Weide, R; Carrasco, A; Bouyssou, H; Pont-Lezica, R; Govers, F; Canut, H

    2004-02-01

    The RGD tripeptide sequence, a cell adhesion motif present in several extracellular matrix proteins of mammals, is involved in numerous plant processes. In plant-pathogen interactions, the RGD motif is believed to reduce plant defence responses by disrupting adhesions between the cell wall and plasma membrane. Photoaffinity cross-linking of [125I]-azido-RGD heptapeptide in the presence of purified plasma membrane vesicles of Arabidopsis thaliana led to label incorporation into a single protein with an apparent molecular mass of 80 kDa. Incorporation could be prevented by excess RGD peptides, but also by the IPI-O protein, an RGD-containing protein secreted by the oomycete plant pathogen Phytophthora infestans. Hydrophobic cluster analysis revealed that the RGD motif of IPI-O (positions 53-56) is readily accessible for interactions. Single amino acid mutations in the RGD motif in IPI-O (of Asp56 into Glu or Ala) resulted in the loss of protection of the 80-kDa protein from labelling. Thus, the interaction between the two proteins is mediated through RGD recognition and the 80-kDa RGD-binding protein has the characteristics of a receptor for IPI-O. The IPI-O protein also disrupted cell wall-plasma membrane adhesions in plasmolysed A. thaliana cells, whereas IPI-O proteins mutated in the RGD motif (D56A and D56E) did not.

  9. Complete cDNA and derived amino acid sequence of human factor V

    SciTech Connect

    Jenny, R.J.; Pittman, D.D.; Toole, J.J.; Kriz, R.W.; Aldape, R.A.; Hewick, R.M.; Kaufman, R.J.; Mann, K.G.

    1987-07-01

    cDNA clones encoding human factor V have been isolated from an oligo(dT)-primed human fetal liver cDNA library prepared with vector Charon 21A. The cDNA sequence of factor V from three overlapping clones includes a 6672-base-pair (bp) coding region, a 90-bp 5' untranslated region, and a 163-bp 3' untranslated region within which is a poly(A) tail. The deduced amino acid sequence consists of 2224 amino acids inclusive of a 28-amino acid leader peptide. Direct comparison with human factor VIII reveals considerable homology between proteins in amino acid sequence and domain structure: a triplicated A domain and duplicated C domain show approx. 40% identity with the corresponding domains in factor VIII. As in factor VIII, the A domains of factor V share approx. 40% amino acid-sequence homology with the three highly conserved domains in ceruloplasmin. The B domain of factor V contains 35 tandem and approx. 9 additional semiconserved repeats of nine amino acids of the form Asp-Leu-Ser-Gln-Thr-Thr/Asn-Leu-Ser-Pro and 2 additional semiconserved repeats of 17 amino acids. Factor V contains 37 potential N-linked glycosylation sites, 25 of which are in the B domain, and a total of 19 cysteine residues.

  10. Motif enrichment tool.

    PubMed

    Blatti, Charles; Sinha, Saurabh

    2014-07-01

    The Motif Enrichment Tool (MET) provides an online interface that enables users to find major transcriptional regulators of their gene sets of interest. MET searches the appropriate regulatory region around each gene and identifies which transcription factor DNA-binding specificities (motifs) are statistically overrepresented. Motif enrichment analysis is currently available for many metazoan species including human, mouse, fruit fly, planaria and flowering plants. MET also leverages high-throughput experimental data such as ChIP-seq and DNase-seq from ENCODE and ModENCODE to identify the regulatory targets of a transcription factor with greater precision. The results from MET are produced in real time and are linked to a genome browser for easy follow-up analysis. Use of the web tool is free and open to all, and there is no login requirement. ADDRESS: http://veda.cs.uiuc.edu/MET/.
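
    The abstract does not say which statistic MET uses to call a motif overrepresented. One common choice for this kind of question is a one-sided hypergeometric test, sketched below purely as an assumption; the function name and the numbers are illustrative and do not describe MET itself.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k): chance of seeing at least k motif-bearing genes in a set of n,
    when K of the N genes in the genome carry the motif (sampling without replacement).
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Toy numbers: 10 genes in total, 4 carry the motif, the gene set has 3 genes, 2 carry it.
p_value = hypergeom_tail(k=2, n=3, K=4, N=10)
print(p_value)
```

    In practice many motifs are tested at once, so the resulting p-values would need a multiple-testing correction before any motif is reported as enriched.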

  11. An analysis of amino acid sequences surrounding archaeal glycoprotein sequons.

    PubMed

    Abu-Qarn, Mehtap; Eichler, Jerry

    2007-05-01

    Despite having provided the first example of a prokaryal glycoprotein, little is known of the rules governing the N-glycosylation process in Archaea. As in Eukarya and Bacteria, archaeal N-glycosylation takes place at the Asn residues of Asn-X-Ser/Thr sequons. Since not all sequons are utilized, it is clear that other factors, including the context in which a sequon exists, affect glycosylation efficiency. As yet, the contribution to N-glycosylation made by sequon-bordering residues and other related factors in Archaea remains unaddressed. In the following, the surroundings of Asn residues confirmed by experiment as modified were analyzed in an attempt to define sequence rules and requirements for archaeal N-glycosylation.
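
    Collecting the residues that border each Asn-X-Ser/Thr sequon is the first step of the kind of analysis described here. The following sketch assumes X may be any residue, since the abstract places no restriction on it; the function sequon_contexts, the flank width, and the sequence are all hypothetical.

```python
import re

# Asn, any residue, then Ser or Thr; the lookahead keeps overlapping sequons.
SEQUON = re.compile(r"(?=N.[ST])")


def sequon_contexts(seq, flank=3):
    """Return (0-based Asn position, sequon plus up to `flank` residues on each side)."""
    contexts = []
    for m in SEQUON.finditer(seq):
        i = m.start()
        contexts.append((i, seq[max(0, i - flank):i + 3 + flank]))
    return contexts


# Invented sequence with three sequons (NGT, NVS, NPT).
protein = "MKANGTLLQNVSAANPT"
contexts = sequon_contexts(protein)
print(contexts)
```

    Windows near either end of the sequence come back shorter than the others, so any position-specific tally built on them has to account for the truncation.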

  12. Pressure-dependent formation of i-motif and G-quadruplex DNA structures.

    PubMed

    Takahashi, S; Sugimoto, N

    2015-12-14

    Pressure is an important physical stimulus that can influence the fate of cells by causing structural changes in biomolecules such as DNA. We investigated the effect of high pressure on the folding of duplex, DNA i-motif, and G-quadruplex (G4) structures; the non-canonical structures may be modulators of expression of genes involved in cancer progression. The i-motif structure was stabilized by high pressure, whereas the G4 structure was destabilized. The melting temperature of an intramolecular i-motif formed by 5'-dCGG(CCT)10CGG-3' increased from 38.8 °C at atmospheric pressure to 61.5 °C at 400 MPa. This effect was also observed in the presence of 40 wt% ethylene glycol, a crowding agent. In the presence of 40 wt% ethylene glycol, the G4 structure was less destabilized than in the absence of the crowding agent. P-T stability diagrams of duplex DNA with a telomeric sequence indicated that the duplex is more stable than G4 and i-motif structures under low pressure, but the i-motif dominates the structural composition under high pressure. Under crowding conditions, the P-T diagrams indicated that the duplex does not form under high pressure, and i-motif and G4 structures dominate. Our findings imply that temperature regulates the formation of the duplex structure, whereas pressure triggers the formation of non-canonical DNA structures like i-motif and G4. These results suggest that pressure impacts the function of nucleic acids by stabilizing non-canonical structures; this may be relevant to deep sea organisms and during evolution under prebiotic conditions.

  13. Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification

    PubMed Central

    Sinclair, Robert M.; Ravantti, Janne J.

    2017-01-01

    ABSTRACT: Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allows them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE: Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids

  14. Protein Chaperones Q8ZP25_SALTY from Salmonella Typhimurium and HYAE_ECOLI from Escherichia coli Exhibit Thioredoxin-like Structures Despite Lack of Canonical Thioredoxin Active Site Sequence Motif

    SciTech Connect

    Parish, D.; Benach, J; Liu, G; Singarapu, K; Xiao, R; Acton, T; Hunt, J; Montelione, G; Szyperski, T; et al.

    2008-01-01

    The structure of the 142-residue protein Q8ZP25_SALTY encoded in the genome of Salmonella typhimurium LT2 was determined independently by NMR and X-ray crystallography, and the structure of the 140-residue protein HYAE_ECOLI encoded in the genome of Escherichia coli was determined by NMR. The two proteins belong to Pfam (Finn et al. 34:D247-D251, 2006) PF07449, which currently comprises 50 members, and belongs itself to the 'thioredoxin-like clan'. However, protein HYAE_ECOLI and the other proteins of Pfam PF07449 do not contain the canonical Cys-X-X-Cys active site sequence motif of thioredoxin. Protein HYAE_ECOLI was previously classified as a [NiFe] hydrogenase-1 specific chaperone interacting with the twin-arginine translocation (Tat) signal peptide. The structures presented here exhibit the expected thioredoxin-like fold and support the view that members of Pfam family PF07449 specifically interact with Tat signal peptides.

  15. Classification of mouse VK groups based on the partial amino acid sequence to the first invariant tryptophan: impact of 14 new sequences from IgG myeloma proteins.

    PubMed

    Potter, M; Newell, J B; Rudikoff, S; Haber, E

    1982-12-01

    Fourteen new VK sequences derived from BALB/c IgG myeloma proteins were determined to the first invariant tryptophan (Trp 35). These partial sequences were compared with 65 other published VK sequences using a computer program. The 79 sequences were organized according to the length of the sequence from the amino terminus to the first invariant tryptophan (Trp 35), into seven groups (33, 34, 35, 36, 39, 40 and 41aa). A distance matrix of all 79 sequences was then computed, i.e. the number of amino acid substitutions necessary to convert one sequence to another was determined. From these data a dendrogram was constructed. Most of the VK sequences fell into clusters or closely related groups. The definition of a sequence group is arbitrary but facilitates the classification of VK proteins. We used 12 substitutions as the basis for defining a sequence group based on the known number of substitutions that are found in the VK21 proteins. By this criterion there were 18 groups in the Trp 35 dendrogram. Twelve of the 14 new sequences fell into one of these sequence groups; two formed new sequence groups. Collective amino acid sequencing is still encountering new VK structures indicating more sequences will be required to attain an accurate estimate of the total number of VK groups. Updated dendrograms can be quickly generated to include newly generated sequences.
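
    The procedure described here (count substitutions between every pair, then group sequences that fall within a cutoff) can be sketched as below. The abstract does not state the clustering rule, so single-linkage grouping and an inclusive cutoff are assumptions, as are the function names and the short invented sequences; the comparison also assumes equal-length sequences, in line with the abstract's sorting by length to Trp 35.

```python
from itertools import combinations


def substitutions(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs):
    """Symmetric matrix of pairwise substitution counts."""
    n = len(seqs)
    d = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = substitutions(seqs[i], seqs[j])
    return d


def sequence_groups(seqs, max_subs):
    """Single-linkage groups: link any pair differing by at most max_subs substitutions."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    d = distance_matrix(seqs)
    for i, j in combinations(range(len(seqs)), 2):
        if d[i][j] <= max_subs:
            parent[find(j)] = find(i)
    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Invented 6-residue fragments; the abstract's cutoff of 12 applies to ~35-residue sequences.
toy = ["DIVMTQ", "DIVLTQ", "EIVLTQ", "QAVVTP"]
matrix = distance_matrix(toy)
groups = sequence_groups(toy, max_subs=1)
print(groups)
```

    With single linkage, sequences 0 and 2 end up in the same group even though they differ by two substitutions, because each is within the cutoff of sequence 1; a stricter rule such as complete linkage would split them.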

  16. Detection and isolation of nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1997-04-01

    A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized on to the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided.

  17. Detection and isolation of nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1997-01-01

    A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized on to the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided.

  18. Identification of sequence–structure RNA binding motifs for SELEX-derived aptamers

    PubMed Central

    Hoinka, Jan; Zotenko, Elena; Friedman, Adam; Sauna, Zuben E.; Przytycka, Teresa M.

    2012-01-01

    Motivation: Systematic Evolution of Ligands by EXponential Enrichment (SELEX) represents a state-of-the-art technology to isolate single-stranded (ribo)nucleic acid fragments, named aptamers, which bind to a molecule (or molecules) of interest via specific structural regions induced by their sequence-dependent fold. This powerful method has applications in designing protein inhibitors, molecular detection systems, therapeutic drugs and antibody replacement among others. However, full understanding and consequently optimal utilization of the process has lagged behind its wide application due to the lack of dedicated computational approaches. At the same time, the combination of SELEX with novel sequencing technologies is beginning to provide the data that will allow the examination of a variety of properties of the selection process. Results: To close this gap, we developed Aptamotif, a computational method for the identification of sequence–structure motifs in SELEX-derived aptamers. To increase the chances of identifying functional motifs, Aptamotif uses an ensemble-based approach. We validated the method using two published aptamer datasets containing experimentally determined motifs of increasing complexity. We were able to recreate the authors' findings to a high degree, thus proving the capability of our approach to identify binding motifs in SELEX data. Additionally, using our new experimental dataset, we illustrate the application of Aptamotif to elucidate several properties of the selection process. Contact: przytyck@ncbi.nlm.nih.gov, Zuben.Sauna@fda.hhs.gov PMID:22689764

  19. One-step catalytic asymmetric synthesis of all-syn deoxypropionate motif from propylene: Total synthesis of (2R,4R,6R,8R)-2,4,6,8-tetramethyldecanoic acid

    PubMed Central

    Ota, Yusuke; Murayama, Toshiki; Nozaki, Kyoko

    2016-01-01

    In nature, many complex structures are assembled from simple molecules by a series of tailored enzyme-catalyzed reactions. One representative example is the deoxypropionate motif, an alternately methylated alkyl chain containing multiple stereogenic centers, which is biosynthesized by a series of enzymatic reactions from simple building blocks. In organic synthesis, however, the majority of the reported routes require the syntheses of complex building blocks. Furthermore, multistep reactions with individual purifications are required at each elongation. Here we show the construction of the deoxypropionate structure from propylene in a single step to achieve a three-step synthesis of (2R,4R,6R,8R)-2,4,6,8-tetramethyldecanoic acid, a major acid component of a preen-gland wax of the graylag goose. To realize this strategy, we focused on coordinative chain transfer polymerization and optimized the reaction conditions to afford a stereo-controlled oligomer, in contrast to the other synthetic strategies developed to date, which require 3–6 steps per unit with unavoidable byproduct generation. Furthermore, multiple oligomers with different numbers of deoxypropionate units were isolated from one batch, showing applicability to library construction. Our strategy opens the door for facile synthetic routes toward other natural products that share the deoxypropionate motif. PMID:26908873

  20. Amino acid sequence around the active-site serine residue in the acyltransferase domain of goat mammary fatty acid synthetase.

    PubMed Central

    Mikkelsen, J; Højrup, P; Rasmussen, M M; Roepstorff, P; Knudsen, J

    1985-01-01

    Goat mammary fatty acid synthetase was labelled in the acyltransferase domain by formation of O-ester intermediates by incubation with [1-14C]acetyl-CoA and [2-14C]malonyl-CoA. Tryptic-digest and CNBr-cleavage peptides were isolated and purified by high-performance reverse-phase and ion-exchange liquid chromatography. The sequences of the malonyl- and acetyl-labelled peptides were shown to be identical. The results confirm the hypothesis that both acetyl and malonyl groups are transferred to the mammalian fatty acid synthetase complex by the same transferase. The sequence is compared with those of other fatty acid synthetase transferases. PMID:3922356

  1. Piriform spider silk sequences reveal unique repetitive elements.

    PubMed

    Perry, David J; Bittencourt, Daniela; Siltberg-Liberles, Jessica; Rech, Elibio L; Lewis, Randolph V

    2010-11-08

    Orb-weaving spider silk fibers are assembled from very large, highly repetitive proteins. The repeated segments contain, in turn, short, simple, and repetitive amino acid motifs that account for the physical and mechanical properties of the assembled fiber. Of the six orb-weaver silk fibroins, the piriform silk that makes the attachment discs, which lashes the joints of the web and attaches dragline silk to surfaces, has not been previously characterized. Piriform silk protein cDNAs were isolated from phage libraries of three species: A. trifasciata, N. clavipes, and N. cruentata. The deduced amino acid sequences from these genes revealed two new repetitive motifs: an alternating proline motif, where every other amino acid is proline, and a glutamine-rich motif of 6-8 amino acids. Similar to other spider silk proteins, the repeated segments are large (>200 amino acids) and highly homogenized within a species. There is also substantial sequence similarity across the genes from the three species, with particular conservation of the repetitive motifs. Northern blot analysis revealed that the mRNA is larger than 11 kb and is expressed exclusively in the piriform glands of the spider. Phylogenetic analysis of the C-terminal regions of the new proteins with published spidroins robustly shows that the piriform sequences form an ortholog group.

  2. [Personal motif in art].

    PubMed

    Gerevich, József

    2015-01-01

    One of the basic questions of art psychology is whether a personal motif is to be found behind works of art and, if so, how openly or indirectly it appears in the work itself. Analysis of examples and documents from the fine arts and literature allows us to conclude that the personal motif that can be identified by the viewer through symbols, at times easily and at others with more difficulty, gives an emotional plus to the artistic product. The personal motif may be found in traumatic experiences, in communication with the model or with other emotionally important persons (mourning, disappointment, revenge, hatred, rivalry, revolt etc.), in self-searching, or self-analysis. The emotions are expressed in artistic activity either directly or indirectly. The intention nourished by the artist's identity (Kunstwollen) may stand in the way of spontaneous self-expression, channelling it into hidden paths. Under the influence of certain circumstances, the artist may arouse in the viewer, consciously or unconsciously, an illusionary, misleading image of himself. An examination of the personal motif is one of the important research areas of art therapy.

  3. Ligation with nucleic acid sequence-based amplification.

    PubMed

    Ong, Carmichael; Tai, Warren; Sarma, Aartik; Opal, Steven M; Artenstein, Andrew W; Tripathi, Anubhav

    2012-01-01

    This work presents a novel method for detecting nucleic acid targets using a ligation step along with an isothermal, exponential amplification step. We use an engineered ssDNA with two variable regions on the ends, allowing us to design the probe for optimal reaction kinetics and primer binding. This two-part probe is ligated by T4 DNA Ligase only when both parts bind adjacently to the target. The assay demonstrates that the expected 72-nt RNA product appears only when the synthetic target, T4 ligase, and both probe fragments are present during the ligation step. An extraneous 38-nt RNA product also appears due to linear amplification of unligated probe (P3), but its presence does not cause a false-positive result. In addition, 40 mmol/L KCl in the final amplification mix was found to be optimal. It was also found that increasing P5 in excess of P3 helped with ligation and reduced the extraneous 38-nt RNA product. The assay was also tested with a single nucleotide polymorphism target, changing one base at the ligation site. The assay was able to yield a negative signal despite only a single-base change. Finally, using P3 and P5 with longer binding sites results in increased overall sensitivity of the reaction, showing that increasing ligation efficiency can improve the assay overall. We believe that this method can be used effectively for a number of diagnostic assays.

  4. KM+, a mannose-binding lectin from Artocarpus integrifolia: amino acid sequence, predicted tertiary structure, carbohydrate recognition, and analysis of the beta-prism fold.

    PubMed Central

    Rosa, J. C.; De Oliveira, P. S.; Garratt, R.; Beltramini, L.; Resing, K.; Roque-Barreira, M. C.; Greene, L. J.

    1999-01-01

    The complete amino acid sequence of the lectin KM+ from Artocarpus integrifolia (jackfruit), which contains 149 residues/mol, is reported and compared to those of other members of the Moraceae family, particularly that of jacalin, also from jackfruit, with which it shares 52% sequence identity. KM+ presents an acetyl-blocked N-terminus and is not posttranslationally modified by proteolytic cleavage as is the case for jacalin. Rather, it possesses a short, glycine-rich linker that unites the regions homologous to the alpha- and beta-chains of jacalin. The results of homology modeling implicate the linker sequence in sterically impeding rotation of the side chain of Asp141 within the binding site pocket. As a consequence, the aspartic acid is locked into a conformation adequate only for the recognition of equatorial hydroxyl groups on the C4 epimeric center (alpha-D-mannose, alpha-D-glucose, and their derivatives). In contrast, the internal cleavage of the jacalin chain permits free rotation of the homologous aspartic acid, rendering it capable of accepting hydrogen bonds from both possible hydroxyl configurations on C4. We suggest that, together with direct recognition of epimeric hydroxyls and the steric exclusion of disfavored ligands, conformational restriction of the lectin should be considered to be a new mechanism by which selectivity may be built into carbohydrate binding sites. Jacalin and KM+ adopt the beta-prism fold already observed in two unrelated protein families. Despite presenting little or no sequence similarity, an analysis of the beta-prism reveals a canonical feature repeatedly present in all such structures, which is based on six largely hydrophobic residues within a beta-hairpin containing two classic-type beta-bulges. We suggest the term beta-prism motif to describe this feature. PMID:10210179

  5. Thin-film technology for direct visual detection of nucleic acid sequences: applications in clinical research.

    PubMed

    Jenison, Robert D; Bucala, Richard; Maul, Diana; Ward, David C

    2006-01-01

    Certain optical conditions permit the unaided eye to detect thickness changes on surfaces on the order of 20 Å, which are of similar dimensions to monomolecular interactions between proteins or hybridization of complementary nucleic acid sequences. Such detection exploits specific interference of reflected white light, wherein thickness changes are perceived as surface color changes. This technology, termed thin-film detection, allows for the visualization of subattomole amounts of nucleic acid targets, even in complex clinical samples. Thin-film technology has been applied to a broad range of clinically relevant indications, including the detection of pathogenic bacterial and viral nucleic acid sequences and the discrimination of sequence variations in human genes causally related to susceptibility or severity of disease.

  6. An evolutionary analysis of flightin reveals a conserved motif unique and widespread in Pancrustacea.

    PubMed

    Soto-Adames, Felipe N; Alvarez-Ortiz, Pedro; Vigoreaux, Jim O

    2014-01-01

    Flightin is a thick filament protein that in Drosophila melanogaster is uniquely expressed in the asynchronous, indirect flight muscles (IFM). Flightin is required for the structure and function of the IFM and is indispensable for flight in Drosophila. Given the importance of flight acquisition in the evolutionary history of insects, here we study the phylogeny and distribution of flightin. Flightin was identified in 69 species of hexapods in classes Collembola (springtails), Protura, Diplura, and insect orders Thysanura (silverfish), Dictyoptera (roaches), Orthoptera (grasshoppers), Phthiraptera (lice), Hemiptera (true bugs), Coleoptera (beetles), Neuroptera (green lacewing), Hymenoptera (bees, ants, and wasps), Lepidoptera (moths), and Diptera (flies and mosquitoes). Flightin was also found in 14 species of crustaceans in orders Anostraca (brine shrimp), Cladocera (water flea), Isopoda (pill bugs), Amphipoda (scuds, sideswimmers), and Decapoda (lobsters, crabs, and shrimps). Flightin was not identified in representatives of chelicerates, myriapods, or any species outside Pancrustacea (Tetraconata, sensu Dohle). Alignment of amino acid sequences revealed a conserved region of 52 amino acids, referred to herein as WYR, that is bounded by strictly conserved tryptophan (W) and arginine (R) and an intervening sequence with a high content of tyrosines (Y). This motif has no homologs in GenBank or PROSITE and is unique to flightin and paraflightin, a putative flightin paralog identified in decapods. A third motif of unclear affinities to pancrustacean WYR was observed in chelicerates. Phylogenetic analysis of amino acid sequences of the conserved motif suggests that paraflightin originated before the divergence of amphipods, isopods, and decapods. We conclude that flightin originated de novo in the ancestor of Pancrustacea >500 MYA, well before the divergence of insects (~400 MYA) and the origin of flight (~325 MYA), and that its IFM-specific function in Drosophila is a more

  7. Conservation of Shannon's redundancy for proteins. [information theory applied to amino acid sequences]

    NASA Technical Reports Server (NTRS)

    Gatlin, L. L.

    1974-01-01

    Concepts of information theory are applied to examine various proteins in terms of their redundancy in natural originators such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters of protein sequences having a specific length. The tendency of a chain to contain some amino acids more frequently than others and the tendency of a chain to contain certain amino acid pairs more frequently than other pairs are used as randomness measures of individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies except in cases of constraints due to short chain length and genetic codes. Redundant characteristics of highly periodic proteins are discussed. A degree of periodicity parameter is derived.
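
    The composition-based randomness measure mentioned above (some amino acids occurring more often than others) can be illustrated with a first-order Shannon entropy. The following is a minimal sketch, not the parameters derived in the report: the function names are hypothetical, the redundancy is taken as R = 1 - H/H_max with H_max = log2(20), and the pair-frequency term is left out.

```python
import math
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def shannon_entropy(sequence: str) -> float:
    """First-order Shannon entropy of a sequence, in bits per residue."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def first_order_redundancy(sequence: str, alphabet_size: int = 20) -> float:
    """Assumed definition: R = 1 - H / H_max, with H_max = log2(alphabet_size)."""
    h_max = math.log2(alphabet_size)
    return 1.0 - shannon_entropy(sequence) / h_max


if __name__ == "__main__":
    # A sequence using every residue equally often has no compositional redundancy.
    print(round(first_order_redundancy(AMINO_ACIDS), 6))
    # A homopolymer is maximally redundant under this measure.
    print(round(first_order_redundancy("AAAAAAAA"), 6))
```

    Comparing such a value for a real protein with the distribution obtained from shuffled sequences of the same length would be one way to mimic the Monte Carlo comparison described in the abstract.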

  8. RNA internal standard synthesis by nucleic acid sequence-based amplification for competitive quantitative amplification reactions.

    PubMed

    Lo, Wan-Yu; Baeumner, Antje J

    2007-02-15

    Nucleic acid sequence-based amplification (NASBA) reactions have been demonstrated to successfully synthesize new sequences based on deletion and insertion reactions. Two RNA internal standards were synthesized for use in competitive amplification reactions in which quantitative analysis can be achieved by coamplifying the internal standard with the wild type sample. The sequences were created in two consecutive NASBA reactions using the E. coli clpB mRNA sequence as model analyte. The primer sequences of the wild type sequence were maintained, and a 20-nt-long segment inside the amplicon region was exchanged for a new segment of similar GC content and melting temperature. The new RNA sequence was thus amplifiable using the wild type primers and detectable via a new inserted sequence. In the first reaction, the forwarding primer and an additional 20-nt-long sequence were deleted and replaced by a new 20-nt-long sequence. In the second reaction, a forwarding primer containing the wild type primer sequence as a 5' overhang was used. The presence of pure internal standard was verified using electrochemiluminescence and RNA lateral-flow biosensor analysis. Additional sequence deletion in order to shorten the internal standard amplicons and thus generate higher detection signals was found not to be required. Finally, a competitive NASBA reaction between one internal standard and the wild type sequence was carried out, proving its functionality. This new rapid construction method via NASBA provides advantages over the traditional techniques since it requires no traditional cloning procedures, no thermocyclers, and can be completed in less than 4 h.

  9. The Chlamydia effector TarP mimics the mammalian leucine-aspartic acid motif of paxillin to subvert the focal adhesion kinase during invasion.

    PubMed

    Thwaites, Tristan; Nogueira, Ana T; Campeotto, Ivan; Silva, Ana P; Grieshaber, Scott S; Carabeo, Rey A

    2014-10-31

    Host cell signal transduction pathways are often targets of bacterial pathogens, especially during the process of invasion when robust actin remodeling is required. We demonstrate that the host cell focal adhesion kinase (FAK) was necessary for the invasion by the obligate intracellular pathogen Chlamydia caviae. Bacterial adhesion triggered the transient recruitment of FAK to the plasma membrane to mediate a Cdc42- and Arp2/3-dependent actin assembly. FAK recruitment was via binding to a domain within the virulence factor TarP that mimicked the LD2 motif of the FAK binding partner paxillin. Importantly, bacterial two-hybrid and quantitative imaging assays revealed a similar level of interaction between paxillin-LD2 and TarP-LD. The conserved leucine residues within the L(D/E)XLLXXL motif were essential to the recruitment of FAK, Cdc42, p34(Arc), and actin to the plasma membrane. In the absence of FAK, TarP-LD-mediated F-actin assembly was reduced, highlighting the functional relevance of this interaction. Together, the data indicate that a prokaryotic version of the paxillin LD2 domain targets the FAK signaling pathway, with TarP representing the first example of an LD-containing Type III virulence effector.
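
    A consensus such as L(D/E)XLLXXL can be searched for with a regular expression, with X (any residue) written as a wildcard. The sketch below assumes one-letter amino acid codes; the function name and the test sequence are made up for illustration and are not taken from paxillin or TarP.

```python
import re
from typing import List, Tuple

# L(D/E)XLLXXL as a regular expression; the lookahead lets overlapping hits be reported.
LD_MOTIF = re.compile(r"(?=(L[DE].LL..L))")


def find_ld_motifs(protein: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched residues) for each LD-like motif."""
    return [(m.start() + 1, m.group(1)) for m in LD_MOTIF.finditer(protein)]


if __name__ == "__main__":
    toy_sequence = "MKLDALLAALQ"  # hypothetical sequence, for illustration only
    print(find_ld_motifs(toy_sequence))
```

    A hit found this way only shows that the consensus is present; the abstract's conclusions rest on binding and imaging assays, which a pattern match cannot replace.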

  10. Conversion of nicotinic acid to trigonelline is catalyzed by N-methyltransferase belonged to motif B′ methyltransferase family in Coffea arabica

    SciTech Connect

    Mizuno, Kouichi; Matsuzaki, Masahiro; Kanazawa, Shiho; Tokiwano, Tetsuo; Yoshizawa, Yuko; Kato, Misako

    2014-10-03

    Graphical abstract: Trigonelline synthase catalyzes the conversion of nicotinic acid to trigonelline. We isolated and characterized trigonelline synthase gene(s) from Coffea arabica. - Highlights: • Trigonelline is a major compound in coffee beans, as is caffeine. • We isolated and characterized the trigonelline synthase gene. • Coffee trigonelline synthases are highly homologous with coffee caffeine synthases. • This study contributes to a full understanding of pyridine alkaloid metabolism. - Abstract: Trigonelline (N-methylnicotinate), a member of the pyridine alkaloids, accumulates in coffee beans along with caffeine. The biosynthetic pathway of trigonelline is not fully elucidated. While it is quite likely that the production of trigonelline from nicotinate is catalyzed by N-methyltransferase, as is caffeine synthase (CS), the enzyme(s) and gene(s) involved in N-methylation have not yet been characterized. It should be noted that, similar to caffeine, trigonelline accumulation is initiated during the development of coffee fruits. Interestingly, the expression profiles for two genes homologous to caffeine synthases were similar to the accumulation profile of trigonelline. We presumed that these two CS-homologous genes encoded trigonelline synthases. These genes were then expressed in Escherichia coli, and the resulting recombinant enzymes were characterized. Using the N-methyltransferase assay with S-adenosyl[methyl-14C]methionine, it was confirmed that these recombinant enzymes catalyzed the conversion of nicotinate to trigonelline. The coffee trigonelline synthases (termed CTgS1 and CTgS2) were highly identical (over 95% identity) to each other. The sequence homology between the CTgSs and coffee CCS1 was 82%. The pH-dependent activity curves of CTgS1 and CTgS2 revealed optimum activity at pH 7.5. Nicotinate was the specific methyl acceptor for CTgSs, and no activity was detected with any other nicotinate derivatives, or

  11. The Assembly Motif of a Bacterial Small Multidrug Resistance Protein

    PubMed Central

    Poulsen, Bradley E.; Rath, Arianna; Deber, Charles M.

    2009-01-01

    Multidrug transporters such as the small multidrug resistance (SMR) family of bacterial integral membrane proteins are capable of conferring clinically significant resistance to a variety of common therapeutics. As antiporter proteins of ∼100 amino acids, SMRs must self-assemble into homo-oligomeric structures for efflux of drug molecules. Oligomerization centered at transmembrane helix four (TM4) has been implicated in SMR assembly, but the full complement of residues required to mediate its self-interaction remains to be characterized. Here, we use Hsmr, the 110-residue SMR family member of the archaebacterium Halobacterium salinarum, to determine the TM4 residue motif required to mediate drug resistance and SMR self-association. Twelve single point mutants that scan the central portion of the TM4 helix (residues 85–104) were constructed and were tested for their ability to confer resistance to the cytotoxic compound ethidium bromide. Six residues were found to be individually essential for drug resistance activity (Gly90, Leu91, Leu93, Ile94, Gly97, and Val98), defining a minimum activity motif of 90GLXLIXXGV98 within TM4. When the propensity of these mutants to dimerize on SDS-PAGE was examined, replacements of all but Ile resulted in ∼2-fold reduction of dimerization versus the wild-type antiporter. Our work defines a minimum activity motif of 90GLXLIXXGV98 within TM4 and suggests that this sequence mediates TM4-based SMR dimerization along a single helix surface, stabilized by a small residue heptad repeat sequence. These TM4-TM4 interactions likely constitute the highest affinity locus for disruption of SMR function by directly targeting its self-assembly mechanism. PMID:19224913
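
    The motif notation 90GLXLIXXGV98 follows directly from the six essential residues listed above: positions without an essential residue are written as X. A small sketch of that bookkeeping, with a hypothetical function name, assuming the residue numbering given in the abstract:

```python
from typing import Dict

# Essential TM4 residues reported in the abstract, keyed by residue number.
ESSENTIAL_RESIDUES: Dict[int, str] = {
    90: "G", 91: "L", 93: "L", 94: "I", 97: "G", 98: "V",
}


def motif_from_essential(essential: Dict[int, str]) -> str:
    """Write a motif spanning the first to last essential position, with X elsewhere."""
    start, end = min(essential), max(essential)
    core = "".join(essential.get(pos, "X") for pos in range(start, end + 1))
    return f"{start}{core}{end}"


if __name__ == "__main__":
    print(motif_from_essential(ESSENTIAL_RESIDUES))
```

    The X positions here mean only that no single substitution at that position was found to abolish resistance, not that any residue is tolerated.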

  12. In vitro evolution of a peptide with a hematite binding motif that may constitute a natural metal-oxide binding archetype.

    PubMed

    Lower, Brian H; Lins, Roberto D; Oestreicher, Zachery; Straatsma, Tjerk P; Hochella, Michael F; Shi, Liang; Lower, Steven K

    2008-05-15

    Phage-display technology was used to evolve peptides that selectively bind to the metal-oxide hematite (Fe2O3) from a library of approximately 3 billion different polypeptides. The sequences of these peptides contained the highly conserved amino acid motif, Ser/Thr-hydrophobic/aromatic-Ser/Thr-Pro-Ser/Thr. To better understand the nature of the peptide-metal oxide binding demonstrated by these experiments, molecular dynamics simulations were carried out for Ser-Pro-Ser at a hematite surface. These simulations show that hydrogen bonding occurs between the two serine amino acids and the hydroxylated hematite surface and that the presence of proline between the hydroxide residues restricts the peptide flexibility, thereby inducing a structural-binding motif. A search of published sequence data revealed that the binding motif (Ser/Thr-Pro-Ser/Thr) is adjacent to the terminal heme-binding domain of both OmcA and MtrC, which are outer membrane cytochromes from the metal-reducing bacterium Shewanella oneidensis MR-1. The entire five amino acid consensus sequence (Ser/Thr-hydrophobic/aromatic-Ser/Thr-Pro-Ser/Thr) was also found as multiple copies in the primary sequences of metal-oxide binding proteins Sil1 and Sil2 from Thalassiosira pseudonana. We suggest that this motif constitutes a natural metal-oxide binding archetype that could be exploited in enzyme-based biofuel cell design and approaches to synthesize tailored metal-oxide nanostructures.

  13. Amino acid sequences of two nonspecific lipid-transfer proteins from germinated castor bean.

    PubMed

    Takishima, K; Watanabe, S; Yamada, M; Suga, T; Mamiya, G

    1988-11-01

    The amino acid sequences of two nonspecific lipid-transfer proteins (nsLTP), B and C, from germinated castor bean seeds have been determined. Both proteins consist of 92 residues, as for the nsLTP previously reported, and their calculated Mr values are 9847 and 9593 for nsLTP-B and nsLTP-C, respectively. The sequences of nsLTP-B and nsLTP-C, compared to the known sequence of nsLTP-A from the same source, are 68% and 35% similar, respectively. No variation was found at the positions of the cysteine residues, indicating that they might be involved in disulfide bridges.

  14. A classification of glycosyl hydrolases based on amino acid sequence similarities.

    PubMed Central

    Henrissat, B

    1991-01-01

    The amino acid sequences of 301 glycosyl hydrolases and related enzymes have been compared. A total of 291 sequences corresponding to 39 EC entries could be classified into 35 families. Only ten sequences (less than 5% of the sample) could not be assigned to any family. With the sequences available for this analysis, 18 families were found to be monospecific (containing only one EC number) and 17 were found to be polyspecific (containing at least two EC numbers). Implications on the folding characteristics and mechanism of action of these enzymes and on the evolution of carbohydrate metabolism are discussed. With the steady increase in sequence and structural data, it is suggested that the enzyme classification system should perhaps be revised. PMID:1747104

  15. In silico comparative analysis of DNA and amino acid sequences for prion protein gene.

    PubMed

    Kim, Y; Lee, J; Lee, C

    2008-01-01

    Genetic variability might contribute to species specificity of prion diseases in various organisms. In this study, structures of the prion protein gene (PRNP) and its amino acids were compared among species for which sequence data were available. Comparisons of PRNP DNA sequences among 12 species including human, chimpanzee, monkey, bovine, ovine, dog, mouse, rat, wallaby, opossum, chicken and zebrafish allowed us to identify candidate regulatory regions in intron 1 and the 3'-untranslated region (UTR) in addition to the coding region. Highly conserved putative binding sites for transcription factors, such as heat shock factor 2 (HSF2) and myocyte enhancer factor 2 (MEF2), were discovered in intron 1. In the 3'-UTR, the functional sequence (ATTAAA) for nucleus-specific polyadenylation was found in all the analysed species. The functional sequence (TTTTTAT) for maturation-specific polyadenylation was observed without mismatch only in ovine, and with one or two nucleotide mismatches in the other species. A comparison of the amino acid sequences in 53 species revealed a large sequence identity. In particular, the octapeptide repeat region was observed in all the species but frog and zebrafish. Functional changes and susceptibility to prion diseases with various isoforms of prion protein could be caused by numeric variability and conformational changes discovered in the repeat sequences.
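
    Finding a signal such as ATTAAA exactly, or TTTTTAT with one or two mismatches, amounts to sliding a window along the sequence and counting mismatched positions (the Hamming distance). The sketch below is one simple way to do this; the function names and the toy sequences are hypothetical and are not PRNP data.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def scan_signal(sequence: str, signal: str, max_mismatches: int = 0) -> List[Tuple[int, str, int]]:
    """Return (1-based position, window, mismatch count) for every window within the limit."""
    k = len(signal)
    hits = []
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        mismatches = hamming(window, signal)
        if mismatches <= max_mismatches:
            hits.append((i + 1, window, mismatches))
    return hits


if __name__ == "__main__":
    # Toy sequences, made up for illustration.
    print(scan_signal("GGATTAAAGG", "ATTAAA"))
    print(scan_signal("CCTTTTCATCC", "TTTTTAT", max_mismatches=1))
```

    With short signals and a generous mismatch limit, chance hits become frequent, so hits of this kind are candidates rather than evidence of function.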

  16. Transcriptome sequencing revealed the transcriptional organization at ribosome-mediated attenuation sites in Corynebacterium glutamicum and identified a novel attenuator involved in aromatic amino acid biosynthesis.

    PubMed

    Neshat, Armin; Mentz, Almut; Rückert, Christian; Kalinowski, Jörn

    2014-11-20

    The Gram-positive bacterium Corynebacterium glutamicum belongs to the order Corynebacteriales and is used as a producer of amino acids at industrial scales. Due to its economic importance, gene expression and particularly the regulation of amino acid biosynthesis have been investigated extensively. Applying the high-resolution technique of transcriptome sequencing (RNA-seq), recently a vast amount of data has been generated that was used to comprehensively analyze the C. glutamicum transcriptome. By analyzing RNA-seq data from a small RNA cDNA library of C. glutamicum, short transcripts in the known transcriptional attenuator sites of the trp operon, the ilvBNC operon and the leuA gene were verified. Furthermore, whole transcriptome RNA-seq data were used to elucidate the transcriptional organization of these three amino acid biosynthesis operons. In addition, we discovered and analyzed the novel attenuator aroR, located upstream of the aroF gene (cg1129). The DAHP synthase encoded by aroF catalyzes the first step in aromatic amino acid synthesis. The AroR leader peptide contains the amino acid sequence motif F-Y-F, indicating a regulatory effect by phenylalanine and tyrosine. Analysis by real-time RT-PCR suggests that the attenuator regulates the transcription of aroF depending on the cellular amount of tRNA loaded with phenylalanine when comparing a phenylalanine-auxotrophic C. glutamicum mutant fed with limiting and excess amounts of a phenylalanine-containing dipeptide. Additionally, all analyzed attenuators were found to be leaderless transcripts.

  17. Complete amino acid sequence of the N-terminal extension of calf skin type III procollagen.

    PubMed Central

    Brandt, A; Glanville, R W; Hörlein, D; Bruckner, P; Timpl, R; Fietzek, P P; Kühn, K

    1984-01-01

    The N-terminal extension peptide of type III procollagen, isolated from foetal-calf skin, contains 130 amino acid residues. To determine its amino acid sequence, the peptide was reduced and carboxymethylated or aminoethylated and fragmented with trypsin, Staphylococcus aureus V8 proteinase and bacterial collagenase. Pyroglutamate aminopeptidase was used to deblock the N-terminal collagenase fragment to enable amino acid sequencing. The type III collagen extension peptide is homologous to that of the alpha 1 chain of type I procollagen with respect to a three-domain structure. The N-terminal 79 amino acids, which contain ten of the 12 cysteine residues, form a compact globular domain. The next 39 amino acids are in a collagenase triplet sequence (Gly-Xaa-Yaa)n with a high hydroxyproline content. Finally, another short non-collagenous domain of 12 amino acids ends at the cleavage site for procollagen aminopeptidase, which cleaves a proline-glutamine bond. In contrast with type I procollagen, the type III procollagen extension peptides contain interchain disulphide bridges located at the C-terminus of the triple-helical domain. PMID:6331392

  18. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... readable form may be created by any means, such as word processors, nucleotide/amino acid sequence...

  19. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... readable form may be created by any means, such as word processors, nucleotide/amino acid sequence...

  20. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... readable form may be created by any means, such as word processors, nucleotide/amino acid sequence...

  1. Complete amino acid sequence of branched-chain amino acid aminotransferase (transaminase B) of Salmonella typhimurium, identification of the coenzyme-binding site and sequence comparison analysis

    SciTech Connect

    Feild, M.J.

    1988-01-01

    The complete amino acid sequence of the subunit of branched-chain amino acid aminotransferase of Salmonella typhimurium was determined by automated Edman degradation of peptide fragments generated by chemical and enzymatic digestion of S-carboxymethylated and S-pyridylethylated transaminase B. Peptide fragments of transaminase B were generated by treatment of the enzyme with trypsin, Staphylococcus aureus V8 protease, endoproteinase Lys-C, and cyanogen bromide. Protocols were developed for separation of the peptide fragments by reverse-phase high performance liquid chromatography (HPLC), ion-exchange HPLC, and SDS-urea gel electrophoresis. The enzyme subunit contains 308 amino acid residues and has a molecular weight of 33,920 daltons. The coenzyme-binding site was determined by treatment of the enzyme, containing bound pyridoxal 5'-phosphate, with tritiated sodium borohydride prior to trypsin digestion. Monitoring radioactivity incorporation and comparing peptide maps with an apoenzyme tryptic digest allowed identification of the pyridoxylated peptide, which was isolated by reverse-phase HPLC and sequenced. The coenzyme-binding site is a lysyl residue at position 159. Some peptides were further characterized by fast atom bombardment mass spectrometry.

  2. The Q Motif Is Involved in DNA Binding but Not ATP Binding in ChlR1 Helicase

    PubMed Central

    Ding, Hao; Guo, Manhong; Vidhyasagar, Venkatasubramanian; Talwar, Tanu; Wu, Yuliang

    2015-01-01

    Helicases are molecular motors that couple the energy of ATP hydrolysis to the unwinding of structured DNA or RNA and chromatin remodeling. The conversion of energy derived from ATP hydrolysis into unwinding and remodeling is coordinated by seven sequence motifs (I, Ia, II, III, IV, V, and VI). The Q motif, consisting of nine amino acids (GFXXPXPIQ) with an invariant glutamine (Q) residue, has been identified in some, but not all helicases. Compared to the seven well-recognized conserved helicase motifs, the role of the Q motif is less acknowledged. Mutations in the human ChlR1 (DDX11) gene are associated with a unique genetic disorder known as Warsaw Breakage Syndrome, which is characterized by cellular defects in genome maintenance. To examine the roles of the Q motif in ChlR1 helicase, we performed site directed mutagenesis of glutamine to alanine at residue 23 in the Q motif of ChlR1. ChlR1 recombinant protein was overexpressed and purified from HEK293T cells. ChlR1-Q23A mutant abolished the helicase activity of ChlR1 and displayed reduced DNA binding ability. The mutant showed impaired ATPase activity but normal ATP binding. A thermal shift assay revealed that ChlR1-Q23A has a melting point value similar to ChlR1-WT. Partial proteolysis mapping demonstrated that ChlR1-WT and Q23A have a similar globular structure, although some subtle conformational differences in these two proteins are evident. Finally, we found ChlR1 exists and functions as a monomer in solution, which is different from FANCJ, in which the Q motif is involved in protein dimerization. Taken together, our results suggest that the Q motif is involved in DNA binding but not ATP binding in ChlR1 helicase. PMID:26474416

  3. A poly(A) binding protein-specific sequence motif: MRTENGKSKGFGFVC binding to mRNA poly(A) and polynucleotides and its role on mRNA translation.

    PubMed

    Rubin, H N; Halim, M N; Leavis, P C

    1994-06-01

    A consensus sequence (GKSKGFGFV) was recognized in all the sequenced poly(A) binding proteins. We synthesized a 15-amino acid peptide (corresponding to 354-368 in the yeast poly(A) binding protein) which includes the consensus sequence to test its binding affinity to different nucleotides, polynucleotides and mRNA with or without a poly(A) tail. Biochemical and biophysical studies revealed that the 15-amino acid peptide has a strong binding affinity to poly(A) alone or poly(A) attached at the 3' end of mRNA. Circular dichroism spectroscopy demonstrated that the secondary structure of the 15-mer is consistent with that expected based on the structure of the native RNP domain. Furthermore, among the various mononucleotides tested in the present studies, ATP was found to bind preferentially to the 15-mer. We further examined the biological significance of the binding of the 15-mer to the poly(A) tail of mRNA: in vitro translation of the mRNA poly(A)+ in the presence of the 15-mer increased globin synthesis almost 2-fold, while translation of the deadenylated mRNA in the presence of the 15-mer barely altered the rate of incorporation of radiolabeled leucine into globin.

  4. The 3'-5' exonuclease site of DNA polymerase III from gram-positive bacteria: definition of a novel motif structure.

    PubMed

    Barnes, M H; Spacciapoli, P; Li, D H; Brown, N C

    1995-11-07

    The primary structure of the 3'-5' exonuclease (Exo) site of the Gram+ bacterial DNA polymerase III (Pol III) was examined by site-directed mutagenesis of Bacillus subtilis Pol III (BsPol III). It was found to differ significantly from the conventional three-motif substructure established for the Exo site of DNA polymerase I of Escherichia coli (EcPol I) and the majority of other DNA polymerase-exonucleases. Motifs I and II were conventionally organized and anchored functionally by the predicted carboxylate residues. However, the conventional downstream motif, motif III, was replaced by motif III epsilon, a novel 55-amino-acid (aa) segment incorporating three essential aa (His565, Asp533 and Asp570) which are strictly conserved in three Gram+ Pol III and in the Ec Exo epsilon (epsilon). Despite its unique substructure, the Gram+ Pol III-specific Exo site was conventionally independent of Pol, the site of 2'-deoxyribonucleoside 5'-triphosphate (dNTP) binding and polymerization. The entire Exo site, including motif III epsilon, could be deleted without profoundly affecting the enzyme's capacity to polymerize dNTPs. Conversely, Pol and all other sequences downstream of the Exo site could be deleted with little apparent effect on Exo activity. Whether the three essential aa within the unique motif III epsilon substructure participate in the conventional two-metal-ion mechanism elucidated for the model Exo site of EcPol I remains to be established.

  5. The amino acid sequence of cytochromes c-551 from three species of Pseudomonas

    PubMed Central

    Ambler, R. P.; Wynn, Margaret

    1973-01-01

    The amino acid sequences of the cytochromes c-551 from three species of Pseudomonas have been determined. Each resembles the protein from Pseudomonas strain P6009 (now known to be Pseudomonas aeruginosa, not Pseudomonas fluorescens) in containing 82 amino acids in a single peptide chain, with a haem group covalently attached to cysteine residues 12 and 15. In all four sequences 43 residues are identical. Although by bacteriological criteria the organisms are closely related, the differences between pairs of sequences range from 22% to 39%. These values should be compared with the differences in the sequence of mitochondrial cytochrome c between mammals and amphibians (about 18%) or between mammals and insects (about 33%). Detailed evidence for the amino acid sequences of the proteins has been deposited as Supplementary Publication SUP 50015 at the National Lending Library for Science and Technology, Boston Spa, Yorks. LS23 7BQ, U.K., from whom copies can be obtained on the terms indicated in Biochem. J. (1973), 131, 5. PMID:4352718

  6. Draft Genome Sequence of Sorghum Grain Mold Fungus Epicoccum sorghinum, a Producer of Tenuazonic Acid

    PubMed Central

    Oliveira, Rodrigo C.; Davenport, Karen W.; Hovde, Blake; Silva, Danielle; Chain, Patrick S. G.; Correa, Benedito

    2017-01-01

    ABSTRACT The facultative plant pathogen Epicoccum sorghinum is associated with grain mold of sorghum and produces the mycotoxin tenuazonic acid. This fungus can have serious economic impact on sorghum production. Here, we report the draft genome sequence of E. sorghinum (USPMTOX48). PMID:28126937

  7. Snake venom. The amino acid sequence of protein A from Dendroaspis polylepis polylepis (black mamba) venom.

    PubMed

    Joubert, F J; Strydom, D J

    1980-12-01

    Protein A from Dendroaspis polylepis polylepis venom comprises 81 amino acids, including ten half-cystine residues. The complete primary structures of protein A and its variant A' were elucidated. The sequences of proteins A and A', which differ in a single position, show no homology with various neurotoxins and non-neurotoxic proteins and represent a new type of elapid venom protein.

  8. Draft Genome Sequence of Bacillus coagulans NL01, a Wonderful l-Lactic Acid Producer

    PubMed Central

    Zheng, Zhaojuan; Jiang, Ting; Lin, Xi; Zhou, Jie

    2015-01-01

    Here, we report the draft genome sequence of Bacillus coagulans NL01, which can produce l-lactic acid of high optical purity using xylose as a sole carbon source. The draft genome is 3,505,081 bp, with 144 contigs. About 3,903 protein-coding genes and 92 rRNAs are predicted from this assembly. PMID:26089419

  9. MADMX: a strategy for maximal dense motif extraction.

    PubMed

    Grossi, Roberto; Pietracaprina, Andrea; Pisanti, Nadia; Pucci, Geppino; Upfal, Eli; Vandin, Fabio

    2011-04-01

    We develop, analyze, and experiment with a new tool, called MADMX, which extracts frequent motifs from biological sequences. We introduce the notion of density to single out the "significant" motifs. The density is a simple and flexible measure for bounding the number of don't cares in a motif, defined as the fraction of solid (i.e., different from don't care) characters in the motif. A maximal dense motif has density above a certain threshold, and any further specialization of a don't care symbol in it or any extension of its boundaries decreases its number of occurrences in the input sequence. By extracting only maximal dense motifs, MADMX reduces the output size and improves performance, while enhancing the quality of the discoveries. The efficiency of our approach relies on a newly defined combining operation, dubbed fusion, which allows for the construction of maximal dense motifs in a bottom-up fashion, while avoiding the generation of nonmaximal ones. We provide experimental evidence of the efficiency and the quality of the motifs returned by MADMX.
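
    The density measure defined above (the fraction of solid, i.e. non-don't-care, characters in a motif) is easy to compute directly. The sketch below illustrates only that definition and is not the MADMX tool itself; the don't-care symbol `.`, the function names, and the choice of an inclusive threshold comparison are assumptions made for illustration.

```python
def motif_density(motif: str, dont_care: str = ".") -> float:
    """Fraction of solid (non-don't-care) characters in the motif."""
    if not motif:
        raise ValueError("motif must be non-empty")
    solid = sum(1 for ch in motif if ch != dont_care)
    return solid / len(motif)


def is_dense(motif: str, threshold: float, dont_care: str = ".") -> bool:
    """True if the motif density reaches the threshold.

    Assumption: the comparison is inclusive (>=); the abstract only says
    the density must be "above a certain threshold".
    """
    return motif_density(motif, dont_care) >= threshold


if __name__ == "__main__":
    # "A..T" has 2 solid characters out of 4.
    print(motif_density("A..T"), is_dense("A..T", 0.5))
```

    Maximality (no don't care can be specialized and no boundary extended without losing occurrences) is a separate condition that depends on the input sequence and is not covered by this sketch.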

  10. A Bioinformatics Approach for Detecting Repetitive Nested Motifs using Pattern Matching

    PubMed Central

    Romero, José R.; Carballido, Jessica A.; Garbus, Ingrid; Echenique, Viviana C.; Ponzoni, Ignacio

    2016-01-01

    The identification of nested motifs in genomic sequences is a complex computational problem. The detection of these patterns is important to allow the discovery of transposable element (TE) insertions, incomplete reverse transcripts, deletions, and/or mutations. In this study, a de novo strategy for detecting patterns that represent nested motifs was designed based on exhaustive searches for pairs of motifs and combinatorial pattern analysis. These patterns can be grouped into three categories, motifs within other motifs, motifs flanked by other motifs, and motifs of large size. The methodology used in this study, applied to genomic sequences from the plant species Aegilops tauschii and Oryza sativa, revealed that it is possible to identify putative nested TEs by detecting these three types of patterns. The results were validated through BLAST alignments, which revealed the efficacy and usefulness of the new method, which is called Mamushka. PMID:27812277

  11. Convergent evolution and mimicry of protein linear motifs in host-pathogen interactions.

    PubMed

    Chemes, Lucía Beatriz; de Prat-Gay, Gonzalo; Sánchez, Ignacio Enrique

    2015-06-01

    Pathogen linear motif mimics are highly evolvable elements that facilitate rewiring of host protein interaction networks. Host linear motifs and pathogen mimics differ in sequence, leading to thermodynamic and structural differences in the resulting protein-protein interactions. Moreover, the functional output of a mimic depends on the motif and domain repertoire of the pathogen protein. Regulatory evolution mediated by linear motifs can be understood by measuring evolutionary rates, quantifying positive and negative selection and performing phylogenetic reconstructions of linear motif natural history. Convergent evolution of linear motif mimics is widespread among unrelated proteins from viral, prokaryotic and eukaryotic pathogens and can also take place within individual protein phylogenies. Statistics, biochemistry and laboratory models of infection link pathogen linear motifs to phenotypic traits such as tropism, virulence and oncogenicity. In vitro evolution experiments and analysis of natural sequences suggest that changes in linear motif composition underlie pathogen adaptation to a changing environment.

  12. A Bioinformatics Approach for Detecting Repetitive Nested Motifs using Pattern Matching.

    PubMed

    Romero, José R; Carballido, Jessica A; Garbus, Ingrid; Echenique, Viviana C; Ponzoni, Ignacio

    2016-01-01

    The identification of nested motifs in genomic sequences is a complex computational problem. The detection of these patterns is important to allow the discovery of transposable element (TE) insertions, incomplete reverse transcripts, deletions, and/or mutations. In this study, a de novo strategy for detecting patterns that represent nested motifs was designed based on exhaustive searches for pairs of motifs and combinatorial pattern analysis. These patterns can be grouped into three categories, motifs within other motifs, motifs flanked by other motifs, and motifs of large size. The methodology used in this study, applied to genomic sequences from the plant species Aegilops tauschii and Oryza sativa, revealed that it is possible to identify putative nested TEs by detecting these three types of patterns. The results were validated through BLAST alignments, which revealed the efficacy and usefulness of the new method, which is called Mamushka.

  13. Discriminative motif analysis of high-throughput dataset

    PubMed Central

    Yao, Zizhen; MacQuarrie, Kyle L.; Fong, Abraham P.; Tapscott, Stephen J.; Ruzzo, Walter L.; Gentleman, Robert C.

    2014-01-01

    Motivation: High-throughput ChIP-seq studies typically identify thousands of peaks for a single transcription factor (TF). It is common for traditional motif discovery tools to predict motifs that are statistically significant against a naïve background distribution but are of questionable biological relevance. Results: We describe a simple yet effective algorithm for discovering differential motifs between two sequence datasets that is effective in eliminating systematic biases and scalable to large datasets. Tested on 207 ENCODE ChIP-seq datasets, our method identifies correct motifs in 78% of the datasets with known motifs, demonstrating improvement in both accuracy and efficiency compared with DREME, another state-of-the-art discriminative motif discovery tool. More interestingly, on the remaining more challenging datasets, we identify common technical or biological factors that compromise the motif search results and use advanced features of our tool to control for these factors. We also present case studies demonstrating the ability of our method to detect single base pair differences in DNA specificity of two similar TFs. Lastly, we demonstrate discovery of key TF motifs involved in tissue specification by examination of high-throughput DNase accessibility data. Availability: The motifRG package is publicly available via the bioconductor repository. Contact: yzizhen@fhcrc.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24162561

  14. Amino acid sequences of heterotrophic and photosynthetic ferredoxins from the tomato plant (Lycopersicon esculentum Mill.).

    PubMed

    Kamide, K; Sakai, H; Aoki, K; Sanada, Y; Wada, K; Green, L S; Yee, B C; Buchanan, B B

    1995-11-01

    Several forms (isoproteins) of ferredoxin in roots, leaves, and green and red pericarps in tomato plants (Lycopersicon esculentum Mill.) were earlier identified on the basis of N-terminal amino acid sequence and chromatographic behavior (Green et al. 1991). In the present study, a large scale preparation made possible determination of the full length amino acid sequence of the two ferredoxins from leaves. The ferredoxins characteristic of fruit and root were sequenced from the amino terminus to the 30th residue or beyond. The leaf ferredoxins were confirmed to be expressed in pericarp of both green and red fruit. The ferredoxins characteristic of fruit and root appeared to be restricted to those tissues. The results extend earlier findings in demonstrating that ferredoxin occurs in the major organs of the tomato plant where it appears to function irrespective of photosynthetic competence.

  15. Amino acid sequence of myoglobin from white-tailed deer (Odocoileus virginianus).

    PubMed

    Joseph, Poulson; Suman, Surendranath P; Li, Shuting; Fontaine, Michele; Steinke, Laurey

    2012-10-01

    Our objective was to determine the primary structure of white-tailed deer myoglobin (Mb). White-tailed deer Mb was isolated from cardiac muscles employing ammonium sulfate precipitation and gel-filtration chromatography. The amino acid sequence was determined by Edman degradation. Sequence analyses of intact Mb as well as tryptic- and cyanogen bromide-peptides yielded the complete primary structure of white-tailed deer Mb, which shared 100% similarity with red deer Mb. White-tailed deer Mb consists of 153 amino acid residues and shares more than 96% sequence similarity with myoglobins from meat-producing ruminants, such as cattle, buffalo, sheep, and goat. Similar to sheep and goat myoglobins, white-tailed deer Mb contains 12 histidine residues. Proximal (position 93) and distal (position 64) histidine residues responsible for maintaining the stability of heme are conserved in white-tailed deer Mb.

  16. Nucleotide sequence and the encoded amino acids of human apolipoprotein A-I mRNA.

    PubMed Central

    Law, S W; Brewer, H B

    1984-01-01

    The cDNA clones encoding the precursor form of human liver apolipoprotein A-I (apoA-I), preproapoA-I, have been isolated from a cDNA library. A 17-base synthetic oligonucleotide based on residues 108-113 of apoA-I and a 26-base primer-extended, dideoxynucleotide-terminated cDNA were used as hybridization probes to select for recombinant plasmids bearing the apoA-I sequence. The complete nucleic acid sequence of human liver preproapoA-I has been determined by analysis of the cloned cDNA. The sequence is composed of 801 nucleotides encoding 267 amino acid residues. PreproapoA-I contains an 18-amino-acid prepeptide and a 6-amino-acid propeptide connected to the amino terminus of the 243-amino acid mature apoA-I. Southern blotting analysis of chromosomal DNA obtained from peripheral blood indicated the apoA-I gene is contained in a 2.1-kilobase-pair Pst I fragment and there is no gross difference in structural organization between the normal apoA-I gene and the Tangier disease apoA-I gene. PMID:6198645

  17. cWINNOWER Algorithm for Finding Fuzzy DNA Motifs

    NASA Technical Reports Server (NTRS)

    Liang, Shoudan

    2003-01-01

    The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if multiple mutated copies of the motif (i.e., the signals) are present in the DNA sequence in sufficient abundance. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum number of detectable motifs qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc, by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12000 for (l,d) = (15,4).
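
    The signal definition used above (a pattern of length l that differs from the motif in at most d positions) can be illustrated with a Hamming-distance scan. This is only a sketch of the (l,d) signal definition, assuming the motif is already known; it is not the cWINNOWER algorithm, which has to discover an unknown motif from the signals. The function names and the toy sequence are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_signals(sequence: str, motif: str, d: int) -> list[int]:
    """0-based start positions of windows within d mismatches of the motif."""
    l = len(motif)
    return [
        i
        for i in range(len(sequence) - l + 1)
        if hamming(sequence[i:i + l], motif) <= d
    ]


if __name__ == "__main__":
    # Toy example with (l, d) = (4, 1); real searches use e.g. (15, 4).
    print(find_signals("ACGTACGAACTT", "ACGT", 1))
```

    In the motif-finding setting the motif is not given in advance, which is why the winnower family of methods works on a graph of pairwise-similar windows rather than on a direct scan like this one.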

  18. An ion-responsive motif in the second transmembrane segment of rhodopsin-like receptors.

    PubMed

    Parker, M S; Wong, Y Y; Parker, S L

    2008-06-01

    A L(M)xxxD(N, E) motif (x=a non-ionic amino acid residue, most frequently A, S, L or F; small capitals indicating a minor representation) is found in the second transmembrane (tm2) segment of most G-protein coupling metazoan receptors of the rhodopsin family (Rh-GPCRs). Changes in signal transduction, agonist binding and receptor cycling are known for numerous receptors bearing evolved or experimentally introduced mutations in this tm2 motif, especially of its aspartate residue. The [Na(+)] sensitivity of the receptor-agonist interaction relates to this aspartate in a number of Rh-GPCRs. Native non-conservative mutations in the tm2 motif only rarely coincide with significant changes in two other ubiquitous features of the rhodopsin family, the seventh transmembrane N(D)PxxY(F) motif and the D(E)RY(W,F) or analogous sequence at the border of the third transmembrane helix and the second intracellular loop. Native tm2 mutations with Rh-GPCRs frequently result in constitutive signaling, and with visual opsins also in shifts to short-wavelength sensitivity. Substitution of a strongly basic residue for the tm2 aspartate in Taste-2 receptors could be connected to a lack of sodium sensing by these receptors. These properties could be consistent with ionic interactions, and even of ion transfer, that involve the tm2 motif. A decrease in cation sensing by this motif is usually connected to an enhanced constitutive interaction of the mutated receptors with cognate G-proteins, and also relates to both the constitutive and the overall activity of the short-wavelength opsins.

  19. Software scripts for quality checking of high-throughput nucleic acid sequencers.

    PubMed

    Lazo, G R; Tong, J; Miller, R; Hsia, C; Rausch, C; Kang, Y; Anderson, O D

    2001-06-01

    We have developed a graphical interface to allow the researcher to view and assess the quality of sequencing results using a series of program scripts developed to process data generated by automated sequencers. The scripts are written in the Perl programming language and are executable under the cgi-bin directory of a Web server environment. The scripts direct nucleic acid sequencing trace file data output from automated sequencers to be analyzed by the phred molecular biology program and are displayed as graphical hypertext mark-up language (HTML) pages. The scripts are mainly designed to handle 96-well microtiter dish samples, but the scripts are also able to read data from 384-well microtiter dishes 96 samples at a time. The scripts may be customized for different laboratory environments and computer configurations. Web links to the sources and discussion page are provided.

  20. Mitogen-activated protein kinase 4-like carrying an MEY motif instead of a TXY motif is involved in ozone tolerance and regulation of stomatal closure in tobacco

    PubMed Central

    Yanagawa, Yuki; Yoda, Hiroshi; Osaki, Kohei; Amano, Yuta; Aono, Mitsuko; Seo, Shigemi; Kuchitsu, Kazuyuki; Mitsuhara, Ichiro

    2016-01-01

    The mitogen-activated protein kinases (MAPKs/MPKs) are important factors in the regulation of signal transduction in response to biotic and abiotic stresses. Previously, we characterized a MAPK from tobacco, Nicotiana tabacum MPK4 (NtMPK4). Here, we found a highly homologous gene, NtMPK4-like (NtMPK4L), in tobacco as well as other species in Solanaceae and Gramineae. Deduced amino acid sequences of their translation products carried MEY motifs instead of conserved TXY motifs of the MAPK family. We isolated the full length NtMPK4L gene and examined the physiological functions of NtMPK4L. We revealed that NtMPK4L was activated by wounding, like NtMPK4. However, a constitutively active salicylic acid-induced protein kinase kinase (SIPKKEE), which phosphorylates NtMPK4, did not phosphorylate NtMPK4L. Moreover, a tyrosine residue in the MEY motif was not involved in NtMPK4L activation. We also found that NtMPK4L-silenced plants showed rapid transpiration caused by remarkably open stomata. In addition, NtMPK4L-silenced plants completely lost the ability to close stomata upon ozone treatment and were highly sensitive to ozone, suggesting that this atypical MAPK plays a role in ozone tolerance through stomatal regulation. PMID:27126796

  1. Amino acid sequence of band-3 protein from rainbow trout erythrocytes derived from cDNA.

    PubMed Central

    Hübner, S; Michel, F; Rudloff, V; Appelhans, H

    1992-01-01

    In this report we present the first complete band-3 cDNA sequence of a poikilothermic lower vertebrate. The primary structure of the anion-exchange protein band 3 (AE1) from rainbow trout erythrocytes was determined by nucleotide sequencing of cDNA clones. The overlapping clones have a total length of 3827 bp with a 5'-terminal untranslated region of 150 bp, a 2754 bp open reading frame and a 3'-untranslated region of 924 bp. Band-3 protein from trout erythrocytes consists of 918 amino acid residues with a calculated molecular mass of 101 827 Da. Comparison of its amino acid sequence revealed a 60-65% identity within the transmembrane spanning sequence of band-3 proteins published so far. An additional insertion of 24 amino acid residues within the membrane-associated domain of trout band-3 protein was identified, which until now was thought to be a general feature only of mammalian band-3-related proteins. PMID:1637296

  2. Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Patel, Kamlesh D [Ken; SNL,

    2016-07-12

    Kamlesh (Ken) Patel from Sandia National Laboratories (Livermore, California) presents "Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June 2012 in Santa Fe, NM.

  3. Role of the two-component leader sequence and mature amino acid sequences in extracellular export of endoglucanase EGL from Pseudomonas solanacearum.

    PubMed Central

    Huang, J Z; Schell, M A

    1992-01-01

    The egl gene of Pseudomonas solanacearum encodes a 43-kDa extracellular endoglucanase (mEGL) involved in wilt disease caused by this phytopathogen. Egl is initially translated with a 45-residue, two-part leader sequence. The first 19 residues are apparently removed by signal peptidase II during export of Egl across the inner membrane (IM); the remaining residues of the leader sequence (modified with palmitate) are removed during export across the outer membrane (OM). Localization of Egl-PhoA fusion proteins showed that the first 26 residues of the Egl leader sequence are required and sufficient to direct lipid modification, processing, and export of Egl or PhoA across the IM but not the OM. Fusions of the complete 45-residue leader sequence or of the leader and increasing portions of mEgl sequences to PhoA did not cause its export across the OM. In-frame deletion of portions of mEGL-coding sequences blocked export of the truncated polypeptides across the OM without affecting export across the IM. These results indicate that the first part of the leader sequence functions independently to direct export of Egl across the IM while the second part and sequences and structures in mEGL are involved in export across the OM. Computer analysis of the mEgl amino acid sequence obtained from its nucleotide sequence identified a region of mEGL similar in amino acid sequence to regions in other prokaryotic endoglucanases. PMID:1735723

  4. Exploiting Publicly Available Biological and Biochemical Information for the Discovery of Novel Short Linear Motifs

    PubMed Central

    Sayadi, Ahmed; Briganti, Leonardo; Tramontano, Anna; Via, Allegra

    2011-01-01

    The function of proteins is often mediated by short linear segments of their amino acid sequence, called Short Linear Motifs or SLiMs, the identification of which can provide important information about a protein function. However, the short length of the motifs and their variable degree of conservation makes their identification hard since it is difficult to correctly estimate the statistical significance of their occurrence. Consequently, only a small fraction of them have been discovered so far. We describe here an approach for the discovery of SLiMs based on their occurrence in evolutionarily unrelated proteins belonging to the same biological, signalling or metabolic pathway and give specific examples of its effectiveness in both rediscovering known motifs and in discovering novel ones. An automatic implementation of the procedure, available for download, allows significant motifs to be identified, automatically annotated with functional, evolutionary and structural information and organized in a database that can be inspected and queried. An instance of the database populated with pre-computed data on seven organisms is accessible through a publicly available server and we believe it constitutes by itself a useful resource for the life sciences (http://www.biocomputing.it/modipath). PMID:21799808

  5. Identification of a binding motif specific to HNF4 by comparative analysis of multiple nuclear receptors

    PubMed Central

    Fang, Bin; Mane-Padros, Daniel; Bolotin, Eugene; Jiang, Tao; Sladek, Frances M.

    2012-01-01

    Nuclear receptors (NRs) regulate gene expression by binding specific DNA sequences consisting of AG[G/T]TCA or AGAACA half site motifs in a variety of configurations. However, those motifs/configurations alone do not adequately explain the diversity of NR function in vivo. Here, a systematic examination of DNA binding specificity by protein-binding microarrays (PBMs) of three closely related human NRs—HNF4α, retinoid X receptor alpha (RXRα) and COUPTF2—reveals an HNF4-specific binding motif (H4-SBM), xxxxCAAAGTCCA, as well as a previously unrecognized polarity in the classical DR1 motif (AGGTCAxAGGTCA) for HNF4α, RXRα and COUPTF2 homodimers. ChIP-seq data indicate that the H4-SBM is uniquely bound by HNF4α but not 10 other NRs in vivo, while NRs PXR, FXRα, Rev-Erbα appear to bind adjacent to H4-SBMs. HNF4-specific DNA recognition and transactivation are mediated by residues Asp69 and Arg76 in the DNA-binding domain; this combination of amino acids is unique to HNF4 among all human NRs. Expression profiling and ChIP data predict ∼100 new human HNF4α target genes with an H4-SBM site, including several Co-enzyme A-related genes and genes with links to disease. These results provide important new insights into NR DNA binding. PMID:22383578
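
    The motifs named above (the AG[G/T]TCA and AGAACA half sites, the DR1 configuration AGGTCAxAGGTCA, and the H4-SBM xxxxCAAAGTCCA) translate naturally into regular expressions. The sketch below assumes that x stands for any single nucleotide and scans the forward strand only; the pattern names and the `scan_nr_motifs` function are hypothetical, and exact string matching is a much cruder model than the protein-binding microarray data used in the study.

```python
import re

# Motif strings from the abstract, rewritten as regular expressions.
# Assumption: "x" means any one of A, C, G, T.
NR_PATTERNS = {
    "half_site_AGKTCA": "AG[GT]TCA",
    "half_site_AGAACA": "AGAACA",
    "DR1": "AGGTCA[ACGT]AGGTCA",
    "H4-SBM": "[ACGT]{4}CAAAGTCCA",
}


def scan_nr_motifs(dna: str) -> dict[str, list[tuple[int, str]]]:
    """Return {motif name: [(0-based start, matched text), ...]}.

    A lookahead is used so that overlapping matches are all reported.
    Only the given (forward) strand is scanned.
    """
    dna = dna.upper()
    hits = {}
    for name, pattern in NR_PATTERNS.items():
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = [(m.start(), m.group(1)) for m in regex.finditer(dna)]
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence containing one DR1 element.
    for name, found in scan_nr_motifs("TTAGGTCAAAGGTCATT").items():
        print(name, found)
```

    To cover both strands, the same scan could be run on the reverse complement of the input sequence.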

  6. Studies on adenosine triphosphate transphosphorylases. Amino acid sequence of rabbit muscle ATP-AMP transphosphorylase.

    PubMed

    Kuby, S A; Palmieri, R H; Frischat, A; Fischer, A H; Wu, L H; Maland, L; Manship, M

    1984-05-22

    The total amino acid sequence of rabbit muscle adenylate kinase has been determined, and the single polypeptide chain of 194 amino acid residues starts with N-acetylmethionine and ends with leucyllysine at its carboxyl terminus, in agreement with the earlier data on its amino acid composition [Mahowald, T. A., Noltmann, E. A., & Kuby, S. A. (1962) J. Biol. Chem. 237, 1138-1145] and its carboxyl-terminus sequence [Olson, O. E., & Kuby, S. A. (1964) J. Biol. Chem. 239, 460-467]. Elucidation of the primary structure was based on tryptic and chymotryptic cleavages of the performic acid oxidized protein, cyanogen bromide cleavages of the 14C-labeled S-carboxymethylated protein at its five methionine sites (followed by maleylation of peptide fragments), and tryptic cleavages at its 12 arginine sites of the maleylated 14C-labeled S-carboxymethylated protein. Calf muscle myokinase, whose sequence has also been established, differs primarily from the rabbit muscle myokinase's sequence in the following: His-30 is replaced by Gln-30; Lys-56 is replaced by Met-56; Ala-84 and Asp-85 are replaced by Val-84 and Asn-85. A comparison of the four muscle-type adenylate kinases, whose covalent structures have now been determined, viz., rabbit, calf, porcine, and human [for the latter two sequences see Heil, A., Müller, G., Noda, L., Pinder, T., Schirmer, H., Schirmer, I., & Von Zabern, I. (1974) Eur. J. Biochem. 43, 131-144, and Von Zabern, I., Wittmann-Liebold, B., Untucht-Grau, R., Schirmer, R. H., & Pai, E. F. (1976) Eur. J. Biochem. 68, 281-290], demonstrates an extraordinary degree of homology.(ABSTRACT TRUNCATED AT 250 WORDS)

  7. The complete amino acid sequence of a trypsin inhibitor from Bauhinia variegata var. candida seeds.

    PubMed

    Di Ciero, L; Oliva, M L; Torquato, R; Köhler, P; Weder, J K; Camillo Novello, J; Sampaio, C A; Oliveira, B; Marangoni, S

    1998-11-01

    Trypsin inhibitors of two varieties of Bauhinia variegata seeds have been isolated and characterized. Bauhinia variegata candida trypsin inhibitor (BvcTI) and B. variegata lilac trypsin inhibitor (BvlTI) are proteins with Mr of about 20,000 without free sulfhydryl groups. Amino acid analysis shows a high content of aspartic acid, glutamic acid, serine, and glycine, and a low content of histidine, tyrosine, methionine, and lysine in both inhibitors. Isoelectric focusing for both varieties detected three isoforms (pI 4.85, 5.00, and 5.15), which were resolved by HPLC procedure. The trypsin inhibitors show Ki values of 6.9 and 1.2 nM for BvcTI and BvlTI, respectively. The N-terminal sequences of the three trypsin inhibitor isoforms from both varieties of Bauhinia variegata and the complete amino acid sequence of B. variegata var. candida L. trypsin inhibitor isoform 3 (BvcTI-3) are presented. The sequences have been determined by automated Edman degradation of the reduced and carboxymethylated proteins of the peptides resulting from Staphylococcus aureus protease and trypsin digestion. BvcTI-3 is composed of 167 residues and has a calculated molecular mass of 18,529. Homology studies with other trypsin inhibitors show that BvcTI-3 belongs to the Kunitz family. The putative active site encompasses Arg (63)-Ile (64).

  8. Multiple site-selective insertions of non-canonical amino acids into sequence-repetitive polypeptides

    PubMed Central

    Wu, I-Lin; Patterson, Melissa A.; Carpenter Desai, Holly E.; Mehl, Ryan A.; Giorgi, Gianluca

    2013-01-01

    A simple and efficient method is described for introduction of non-canonical amino acids at multiple, structurally defined sites within recombinant polypeptide sequences. E. coli MRA30, a bacterial host strain with attenuated activity for release factor 1 (RF1), is assessed for its ability to support the incorporation of a diverse range of non-canonical amino acids in response to multiple encoded amber (TAG) codons within genetic templates derived from superfolder GFP and an elastin-mimetic protein polymer. Suppression efficiency and isolated protein yield were observed to depend on the identity of the orthogonal aminoacyl-tRNA synthetase/tRNACUA pair and the non-canonical amino acid substrate. This approach afforded elastin-mimetic protein polymers containing non-canonical amino acid derivatives at up to twenty-two positions within the repeat sequence with high levels of substitution. The identity and position of the variant residues were confirmed by mass spectrometric analysis of the full-length polypeptides and proteolytic cleavage fragments resulting from thermolysin digestion. The accumulated data suggest that this multi-site suppression approach permits the preparation of protein-based materials in which novel chemical functionality can be introduced at precisely defined positions within the polypeptide sequence. PMID:23625817

  9. Deduced amino acid sequence of human pulmonary surfactant proteolipid: SPL(pVal)

    SciTech Connect

    Whitsett, J.A.; Glasser, S.W.; Korfhagen, T.R.; Weaver, T.E.; Clark, J.; Pilot-Matias, T.; Meuth, J.; Fox, J.L.

    1987-05-01

    Hydrophobic, proteolipid-like protein of Mr 6500 was isolated from ether/ethanol extracts of human, canine and bovine pulmonary surfactant. Amino acid composition of the protein demonstrated a remarkable abundance of hydrophobic residues, particularly valine and leucine. The N-terminal amino acid sequence of the human protein was determined: N-Leu-Ile-Pro-Cys-Cys-Pro-Val-Asn-Leu-Lys-Arg-Leu-Leu-Ile-Val4... An oligonucleotide probe was used to screen an adult human lung cDNA library and resulted in detection of cDNA clones with predicted amino acid sequence with close identity to the N-terminal amino acid sequence of the human peptide. SPL(pVal) was found within the reading frame of a larger peptide. SPL(pVal) results from proteolytic processing of a larger preprotein. Northern blot analysis detected a single 1.0 kilobase SPL(pVal) RNA which was less abundant in fetal than in adult lung. Mixtures of purified canine and bovine SPL(pVal) and synthetic phospholipids display properties of rapid adsorption and surface tension lowering activity characteristic of surfactant. Human SPL(pVal) is a pulmonary surfactant proteolipid which may therefore be useful in combination with phospholipids and/or other surfactant proteins for the treatment of surfactant deficiency such as hyaline membrane disease in newborn infants.

  10. SUBGROUPS OF AMINO ACID SEQUENCES IN THE VARIABLE REGIONS OF IMMUNOGLOBULIN HEAVY CHAINS*

    PubMed Central

    Cunningham, Bruce A.; Pflumm, Mollie N.; Rutishauser, Urs; Edelman, Gerald M.

    1969-01-01

    The amino acid sequence of the first 133 residues of the heavy (γ) chain from a human γG immunoglobulin (He) has been determined. This γ-chain is identical in Gm type to that of protein Eu, the complete sequence of which has been reported. Comparison of the two sequences substantiates the previous suggestion that there are subgroups of variable regions of heavy chains. The variable region of Eu has been assigned to subgroup I and that of He to subgroup II; on the other hand, the constant regions of the two proteins appear to be identical. Comparison of the sequence of the heavy chain of He with the heavy chain sequences determined in other laboratories suggests that the variable region of subgroup II is at least 118 residues long. The nature and distribution of amino acid variations in this heavy chain subgroup resemble those observed in light chain subgroups. These studies provide evidence that the translocation hypothesis applies to heavy as well as to light chains, viz., genes for variable regions (V) are somatically translocated to genes for constant regions (C) to form complete VC structural genes. PMID:5264153

  11. Complete nucleic acid sequence of Penaeus stylirostris densovirus (PstDNV) from India.

    PubMed

    Rai, Praveen; Safeena, Muhammed P; Karunasagar, Iddya; Karunasagar, Indrani

    2011-06-01

    Infectious hypodermal and hematopoietic necrosis virus (IHHNV) of shrimp has recently been classified as Penaeus stylirostris densovirus (PstDNV). The complete nucleic acid sequence of PstDNV from India was obtained by cloning and sequencing different DNA fragments of the virus. The genome organisation of PstDNV revealed three major coding domains: a left ORF (NS1) of 2001 bp, a mid ORF (NS2) of 1092 bp and a right ORF (VP) of 990 bp. The complete genome and the amino acid sequences of the three proteins, viz., NS1, NS2 and VP, were compared with the genomes of the virus reported from Hawaii, China and Mexico and with partial sequences available from isolates from different regions. The phylogenetic analysis of shrimp, insect and vertebrate parvovirus sequences showed that the Indian PstDNV isolate is phylogenetically more closely related to one of the three isolates from Taiwan (AY355307), and two isolates (AY362547 and AY102034) from Thailand.
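
    The ORF lengths quoted above can be converted into approximate protein lengths with simple arithmetic. The sketch below assumes that each reported length counts a terminal stop codon, which the record does not state; the function name is hypothetical.

```python
def orf_to_protein_length(orf_bp: int) -> int:
    """Residues encoded by an ORF, assuming the length includes one stop codon."""
    codons, remainder = divmod(orf_bp, 3)
    if remainder:
        raise ValueError(f"{orf_bp} bp is not a whole number of codons")
    return codons - 1  # drop the stop codon (assumption)

# ORF lengths in bp as given in the record
orfs = {"NS1": 2001, "NS2": 1092, "VP": 990}
for name, bp in orfs.items():
    print(f"{name}: {bp} bp -> {orf_to_protein_length(bp)} residues")
```

    Under that assumption the three ORFs would encode 666, 363 and 329 residues, respectively.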

  12. DNA Cloning of Plasmodium falciparum Circumsporozoite Gene: Amino Acid Sequence of Repetitive Epitope

    NASA Astrophysics Data System (ADS)

    Enea, Vincenzo; Ellis, Joan; Zavala, Fidel; Arnot, David E.; Asavanich, Achara; Masuda, Aoi; Quakyi, Isabella; Nussenzweig, Ruth S.

    1984-08-01

    A clone of complementary DNA encoding the circumsporozoite (CS) protein of the human malaria parasite Plasmodium falciparum has been isolated by screening an Escherichia coli complementary DNA library with a monoclonal antibody to the CS protein. The DNA sequence of the complementary DNA insert encodes a four-amino acid sequence: proline-asparagine-alanine-asparagine, tandemly repeated 23 times. The CS β-lactamase fusion protein specifically binds monoclonal antibodies to the CS protein and inhibits the binding of these antibodies to native Plasmodium falciparum CS protein. These findings provide a basis for the development of a vaccine against Plasmodium falciparum malaria.
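
    A tandem repeat such as the proline-asparagine-alanine-asparagine (PNAN) unit described above can be counted with a few lines of Python. This is only a sketch: the function name is hypothetical, and the flanking residues in the toy sequence are invented for illustration and are not the real CS protein sequence.

```python
import re

def longest_tandem_run(seq: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `seq`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)

# Toy sequence: 23 copies of PNAN between invented flanking residues
toy = "MKK" + "PNAN" * 23 + "GGS"
print(longest_tandem_run(toy, "PNAN"))  # 23
```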

  13. Amino-Acid Sequence of NADP-Specific Glutamate Dehydrogenase of Neurospora crassa

    PubMed Central

    Wootton, John C.; Chambers, Geoffrey K.; Holder, Anthony A.; Baron, Andrew J.; Taylor, John G.; Fincham, John R. S.; Blumenthal, Kenneth M.; Moon, Kenneth; Smith, Emil L.

    1974-01-01

    A tentative primary structure of the NADP-specific glutamate dehydrogenase [L-glutamate: NADP oxidoreductase (deaminating), EC 1.4.1.4] from Neurospora crassa has been determined. The proposed sequence contains 452 amino-acid residues in each of the identical subunits of the hexameric enzyme. Comparison of the sequence with that of the bovine liver enzyme reveals considerable homology in the amino-terminal portion of the chain, including the vicinity of the reactive lysine, with only shorter stretches of homology within the carboxyl-terminal regions. The significance of this distribution of homologous regions is discussed. PMID:4155068

  14. Brickworx builds recurrent RNA and DNA structural motifs into medium- and low-resolution electron-density maps

    SciTech Connect

    Chojnowski, Grzegorz; Waleń, Tomasz; Piątkowski, Paweł; Potrzebowski, Wojciech; Bujnicki, Janusz M.

    2015-03-01

    Brickworx, a computer program that builds crystal structure models of nucleic acid molecules using recurrent motifs including double-stranded helices, is presented. In a first step, the program searches for electron-density peaks that may correspond to phosphate groups; it may also take into account phosphate-group positions provided by the user. Subsequently, comparing the three-dimensional patterns of the P atoms with a database of nucleic acid fragments, it finds the matching positions of the double-stranded helical motifs (A-RNA or B-DNA) in the unit cell. If the target structure is RNA, the helical fragments are further extended with recurrent RNA motifs from a fragment library that contains single-stranded segments. Finally, the matched motifs are merged and refined in real space to find the most likely conformations, including a fit of the sequence to the electron-density map. The Brickworx program is available for download and as a web server at http://iimcb.genesilico.pl/brickworx.

  15. Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

    DOEpatents

    Studier, F.W.

    1995-04-18

    Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient. 2 figs.
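
    To put the library sizes quoted above in context, the following Python sketch compares each library with the total number of possible oligonucleotides of that length (4^k for length k). The comparison is an added illustration, not part of the patent record, and the function name is hypothetical.

```python
def kmer_space(k: int) -> int:
    """Number of distinct DNA oligonucleotides of length k (4 bases per position)."""
    return 4 ** k

# Library sizes as quoted in the record: primer length -> number of primers
libraries = {8: 10_000, 9: 14_200, 10: 44_000}

for k, size in libraries.items():
    total = kmer_space(k)
    print(f"{k}-mers: {size:,} of {total:,} possible ({size / total:.1%})")
```

    There are 65,536 possible octamers, 262,144 nonamers and 1,048,576 decamers, so each quoted library covers only a fraction of its sequence space.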

  16. Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

    DOEpatents

    Studier, F. William

    1995-04-18

    Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient.

  17. Amino acid sequences recognized by T cells: studies on a merozoite surface antigen from the FCQ-27/PNG isolate of Plasmodium falciparum.

    PubMed

    Rzepczyk, C M; Csurhes, P A; Baxter, E P; Doran, T J; Irving, D O; Kere, N

    1990-08-01

    Twenty-six overlapping peptides, spanning the entire FCQ-27/PNG sequence of the Plasmodium falciparum antigen known as merozoite surface antigen 2 were screened for their ability to induce the proliferation of peripheral blood lymphocytes (PBL) obtained from 12 donors living in Honiara, Solomon Islands where P. falciparum is endemic. A recombinant (r) form of MSA2, known as Ag 1609 was also screened in these assays and tetanus toxoid (TT) antigen was included as a control. The location of the predicted T cell determinants within MSA2 was examined using the algorithm, AMPHI and by scanning MSA2 for amino acid sequences showing the Rothbard motif. There were 13 predicted amphipathic helical sites and five examples of Rothbard sequences in the antigen. The location of these with regard to the peptides tested is shown. Nine of the 12 individuals responded to TT with high stimulation indices (greater than 4) being obtained in the majority of donors. Only three individuals responded to r-MSA2 with the stimulation indices (SI) in the range of 2.4-4.1. Peptides from both the constant and variable regions of MSA2 were recognized in the proliferative assays. However, the majority of the positive proliferative responses were to peptides which spanned the central variable region which included the two copies of the 32-amino-acid repeat occurring in the antigen. High SI comparable to those obtained to TT were seen in some individuals with some peptides. There was considerable variation between donors in number and nature of the peptides recognised and two donors did not respond to any of the antigens tested. The significance of these findings to vaccine development is discussed.

  18. Sequence-specific thermodynamic properties of nucleic acids influence both transcriptional pausing and backtracking in yeast

    PubMed Central

    2017-01-01

    RNA Polymerase II pauses and backtracks during transcription, with many consequences for gene expression and cellular physiology. Here, we show that the energy required to melt double-stranded nucleic acids in the transcription bubble predicts pausing in Saccharomyces cerevisiae far more accurately than nucleosome roadblocks do. In addition, the same energy difference also determines when the RNA polymerase backtracks instead of continuing to move forward. This data-driven model corroborates—in a genome wide and quantitative manner—previous evidence that sequence-dependent thermodynamic features of nucleic acids influence both transcriptional pausing and backtracking. PMID:28301878

  19. Respiratory syncytial virus fusion glycoprotein: nucleotide sequence of mRNA, identification of cleavage activation site and amino acid sequence of N-terminus of F1 subunit.

    PubMed Central

    Elango, N; Satake, M; Coligan, J E; Norrby, E; Camargo, E; Venkatesan, S

    1985-01-01

    The amino acid sequence of respiratory syncytial virus fusion protein (Fo) was deduced from the sequence of a partial cDNA clone of mRNA and from the 5' mRNA sequence obtained by primer extension and dideoxysequencing. The encoded protein of 574 amino acids is extremely hydrophobic and has a molecular weight of 63371 daltons. The site of proteolytic cleavage within this protein was accurately mapped by determining a partial amino acid sequence of the N-terminus of the larger subunit (F1) purified by radioimmunoprecipitation using monoclonal antibodies. Alignment of the N-terminus of the F1 subunit within the deduced amino acid sequence of Fo permitted us to identify a sequence of lys-lys-arg-lys-arg-arg at the C-terminus of the smaller N-terminal F2 subunit that appears to represent the cleavage/activation domain. Five potential sites of glycosylation, four within the F2 subunit, were also identified. Three extremely hydrophobic domains are present in the protein; a) the N-terminal signal sequence, b) the N-terminus of the F1 subunit that is analogous to the N-terminus of the paramyxovirus F1 subunit and the HA2 subunit of influenza virus hemagglutinin, and c) the putative membrane anchorage domain near the C-terminus of F1. PMID:2987829

  20. Analysis of protein function and its prediction from amino acid sequence.

    PubMed

    Clark, Wyatt T; Radivojac, Predrag

    2011-07-01

    Understanding protein function is one of the keys to understanding life at the molecular level. It is also important in the context of human disease because many conditions arise as a consequence of alterations of protein function. The recent availability of relatively inexpensive sequencing technology has resulted in thousands of complete or partially sequenced genomes with millions of functionally uncharacterized proteins. Such a large volume of data, combined with the lack of high-throughput experimental assays to functionally annotate proteins, contributes to the growing importance of automated function prediction. Here, we study proteins annotated by Gene Ontology (GO) terms and estimate the accuracy of functional transfer from protein sequence only. We find that the transfer of GO terms by pairwise sequence alignments is only moderately accurate, showing a surprisingly small influence of sequence identity (SID) in a broad range (30-100%). We developed and evaluated a new predictor of protein function, functional annotator (FANN), from amino acid sequence. The predictor exploits a multioutput neural network framework which is well suited to simultaneously modeling dependencies between functional terms. Experiments provide evidence that FANN-GO (predictor of GO terms; available from http://www.informatics.indiana.edu/predrag) outperforms standard methods such as transfer by global or local SID as well as GOtcha, a method that incorporates the structure of GO.

  1. The Complete Genome Sequence of the Lactic Acid Bacterium Lactococcus lactis ssp. lactis IL1403

    PubMed Central

    Bolotin, Alexander; Wincker, Patrick; Mauger, Stéphane; Jaillon, Olivier; Malarme, Karine; Weissenbach, Jean; Ehrlich, S. Dusko; Sorokin, Alexei

    2001-01-01

    Lactococcus lactis is a nonpathogenic AT-rich gram-positive bacterium closely related to the genus Streptococcus and is the most commonly used cheese starter. It is also the best-characterized lactic acid bacterium. We sequenced the genome of the laboratory strain IL1403, using a novel two-step strategy that comprises diagnostic sequencing of the entire genome and a shotgun polishing step. The genome contains 2,365,589 base pairs and encodes 2310 proteins, including 293 protein-coding genes belonging to six prophages and 43 insertion sequence (IS) elements. Nonrandom distribution of IS elements indicates that the chromosome of the sequenced strain may be a product of recent recombination between two closely related genomes. A complete set of late competence genes is present, indicating the ability of L. lactis to undergo DNA transformation. Genomic sequence revealed new possibilities for fermentation pathways and for aerobic respiration. It also indicated a horizontal transfer of genetic information from Lactococcus to gram-negative enteric bacteria of Salmonella-Escherichia group. [The sequence data described in this paper has been submitted to the GenBank data library under accession no. AE005176.] PMID:11337471

  2. MINER: software for phylogenetic motif identification.

    PubMed

    La, David; Livesay, Dennis R

    2005-07-01

    MINER is web-based software for phylogenetic motif (PM) identification. PMs are sequence regions (fragments) that conserve the overall familial phylogeny. PMs have been shown to correspond to a wide variety of catalytic regions, substrate-binding sites and protein interfaces, making them ideal functional site predictions. The MINER output provides an intuitive interface for interactive PM sequence analysis and structural visualization. The web implementation of MINER is freely available at http://www.pmap.csupomona.edu/MINER/. Source code is available to the academic community on request.

  3. Amino acid sequence of myoglobin from emu (Dromaius novaehollandiae) skeletal muscle.

    PubMed

    Suman, S P; Joseph, P; Li, S; Beach, C M; Fontaine, M; Steinke, L

    2010-11-01

    The objective of the present study was to characterize the primary structure of emu myoglobin (Mb). Emu Mb was isolated from Iliofibularis muscle employing gel-filtration chromatography. Matrix Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry was employed to determine the exact molecular mass of emu Mb in comparison with horse Mb, and Edman degradation was utilized to characterize the amino acid sequence. The molecular mass of emu Mb was 17,380 Da and was close to those reported for ratite and poultry myoglobins. Similar to myoglobins from meat-producing livestock and birds, emu Mb has 153 amino acids. Emu Mb contains 9 histidines. Proximal and distal histidines, responsible for coordinating oxygen-binding property of Mb, are conserved in emu. Emu Mb shared more than 90% homology with ratite and chicken myoglobins, whereas it showed less than 70% sequence similarity with ruminant myoglobins.

  4. Stereochemical Sequence Ion Selectivity: Proline versus Pipecolic-acid-containing Protonated Peptides

    NASA Astrophysics Data System (ADS)

    Abutokaikah, Maha T.; Guan, Shanshan; Bythell, Benjamin J.

    2017-01-01

    Substitution of proline by pipecolic acid, the six-membered ring congener of proline, results in vastly different tandem mass spectra. The well-known proline effect is eliminated and amide bond cleavage C-terminal to pipecolic acid dominates instead. Why do these two ostensibly similar residues produce dramatically differing spectra? Recent evidence indicates that the proton affinities of these residues are similar, so are unlikely to explain the result [Raulfs et al., J. Am. Soc. Mass Spectrom. 25, 1705-1715 (2014)]. An additional hypothesis based on increased flexibility was also advocated. Here, we provide a computational investigation of the "pipecolic acid effect," to test this and other hypotheses to determine if theory can shed additional light on this fascinating result. Our calculations provide evidence for both the increased flexibility of pipecolic-acid-containing peptides, and structural changes in the transition structures necessary to produce the sequence ions. The most striking computational finding is inversion of the stereochemistry of the transition structures leading to "proline effect"-type amide bond fragmentation between the proline/pipecolic acid-congeners: R (proline) to S (pipecolic acid). Additionally, our calculations predict substantial stabilization of the amide bond cleavage barriers for the pipecolic acid congeners by reduction in deleterious steric interactions and provide evidence for the importance of experimental energy regime in rationalizing the spectra.

  5. Self-sequencing of amino acids and origins of polyfunctional protocells

    NASA Technical Reports Server (NTRS)

    Fox, S. W.

    1984-01-01

    The role of proteins in the origin of living things is discussed. It has been experimentally established that amino acids can sequence themselves under simulated geological conditions with highly nonrandom products which accordingly contain diverse information. Multiple copies of each type of macromolecule are formed, resulting in greater power for any protoenzymic molecule than would accrue from a single copy of each type. Thermal proteins are readily incorporated into laboratory protocells. The experimental evidence for original polyfunctional protocells is discussed.

  6. Amino acid sequence of atrial natriuretic peptides in human coronary sinus plasma.

    PubMed

    Yandle, T; Crozier, I; Nicholls, G; Espiner, E; Carne, A; Brennan, S

    1987-07-31

    Two atrial natriuretic peptides were purified from pooled human coronary sinus plasma by Sep-Pak extraction, immunoaffinity chromatography and reverse phase HPLC. The amino acid sequences of the two peptides were homologous with 99-126 human atrial natriuretic peptide (hANP) and 106-126 hANP, the latter being most probably linked to 99-105 ANP by the disulphide bond. The molar ratio of the peptides in plasma, as assessed by radioimmunoassay was 10:3.

  7. Amino Acid Sequences Mediating Vascular Cell Adhesion Molecule 1 Binding to Integrin Alpha 4: Homologous DSP Sequence Found for JC Polyoma VP1 Coat Protein

    PubMed Central

    Meyer, Michael Andrew

    2013-01-01

    The JC polyoma viral coat protein VP1 was analyzed for amino acid sequence homologies to the IDSP sequence which mediates binding of VLA-4 (integrin alpha 4) to vascular cell adhesion molecule 1. Although the full sequence was not found, a DSP sequence was located near the critical arginine residue linked to infectivity of the virus and binding to sialic acid containing molecules such as integrins (3). For the JC polyoma virus, a DSP sequence was found at residues 70, 71 and 72 with homology also noted for the mouse polyoma virus and SV40 virus. Three dimensional modeling of the VP1 molecule suggests that the DSP loop has an accessible site for interaction from the external side of the assembled viral capsid pentamer. PMID:24147211

  8. Amino Acid Sequences Mediating Vascular Cell Adhesion Molecule 1 Binding to Integrin Alpha 4: Homologous DSP Sequence Found for JC Polyoma VP1 Coat Protein.

    PubMed

    Meyer, Michael Andrew

    2013-01-01

    The JC polyoma viral coat protein VP1 was analyzed for amino acid sequence homologies to the IDSP sequence which mediates binding of VLA-4 (integrin alpha 4) to vascular cell adhesion molecule 1. Although the full sequence was not found, a DSP sequence was located near the critical arginine residue linked to infectivity of the virus and binding to sialic acid containing molecules such as integrins (3). For the JC polyoma virus, a DSP sequence was found at residues 70, 71 and 72 with homology also noted for the mouse polyoma virus and SV40 virus. Three dimensional modeling of the VP1 molecule suggests that the DSP loop has an accessible site for interaction from the external side of the assembled viral capsid pentamer.

  9. Identification of genes encoding zinc finger motifs in the cardiovascular system.

    PubMed

    Wang, R; Hwang, D M; Cukerman, E; Liew, C C

    1997-01-01

    The Zn2+-finger DNA-binding domain has been identified in several developmental control proteins, transcription factors and gene products associated with diseases, as well as in several RNA-binding proteins. We applied library screening, expressed sequence tagging (EST sequencing), Zn2+-binding assays and Northern blot hybridization, in order to characterize novel cDNA clones of the human cardiovascular system which contain Zn2+-finger motifs. An embryonic (8-10 weeks gestation) heart lambda ZAP Express cDNA library was screened with an oligonucleotide probe deduced from a consensus amino acid sequence which is highly conserved for Zn2+-finger proteins, and approximately 350 positive clones were isolated from 1 x 10(4) plaque-forming units (pfu) initially plated. The isolated clones were classified as known and novel following single pass automated DNA sequencing. Analysis of Northern blot hybridization delineated the tissue specificity of these clones, as well as their association with cardiac growth and development. Existence of Zn2+-finger motifs in the novel clones was confirmed by Zn2+-binding assay. In this report, we present the characterization of eight novel clones, including the complete cDNA sequences of one of these clones (HHZ-123).

  10. Amino acid sequence similarity between rabies virus glycoprotein and snake venom curaremimetic neurotoxins.

    PubMed

    Lentz, T L; Wilson, P T; Hawrot, E; Speicher, D W

    1984-11-16

    Evidence was presented earlier that a host-cell receptor for the highly neurotropic rabies virus might be the acetylcholine receptor. The amino acid sequence of the glycoprotein of rabies virus was compared by computer analysis with that of snake venom curaremimetic neurotoxins, potent ligands of the acetylcholine receptor. A statistically significant sequence relation was found between a segment of the rabies glycoprotein and the entire sequence of long neurotoxins. The greatest identity occurs with residues considered most important in neurotoxicity, including those interacting with the acetylcholine binding site of the acetylcholine receptor. Because of the similarity between the glycoprotein and the receptor-binding region of the neurotoxins, this region of the viral glycoprotein may function as a recognition site for the acetylcholine receptor. Direct binding of the rabies virus glycoprotein to the acetylcholine receptor could contribute to the neurotropism of this virus.

  11. Partial amino acid sequence of human pancreatic stone protein, a novel pancreatic secretory protein.

    PubMed Central

    Montalto, G; Bonicel, J; Multigner, L; Rovery, M; Sarles, H; De Caro, A

    1986-01-01

    Pancreatic stone protein (PSP) is the major organic component of human pancreatic stones. With the use of monoclonal antibody immunoadsorbents, five immunoreactive forms (PSP-S) with close Mr values (14,000-19,000) were isolated from normal pancreatic juice. By CM-Trisacryl M chromatography the lowest-Mr form (PSP-S1) was separated from the others and some of its molecular characteristics were investigated. The Mr of the PSP-S1 polypeptide chain calculated from the amino acid composition was about 16,100. The N-terminal sequences (40 residues) of PSP and PSP-S1 are identical, which suggests that the peptide backbone is the same for both of these polypeptides. The PSP-S1 sequence was determined up to residue 65 and was found to be different from all other known protein sequences. PMID:3541906

  12. Introduction of Ca(2+)-binding amino-acid sequence into the T4 lysozyme.

    PubMed

    Leontiev, V V; Uversky, V N; Permyakov, E A; Murzin, A G

    1993-03-05

    The 51-62 loop of T4 phage lysozyme was altered by site-directed mutagenesis to obtain maximal homology with the typical EF-hand motif. A Ca(2+)-binding site was designed and created by replacing both Gly-51 and Asn-53 with aspartic acid. The mutant T4 lysozyme (G51D/N53D) was expressed in Escherichia coli. The activity of the G51D/N53D mutant was about 60% of that of the wild-type protein. This mutant can bind Ca2+ ions specifically, although the effective dissociation constant was substantially greater than that of the EF-hand proteins. Stability of the G51D/N53D mutant apo-form to urea- or temperature-induced denaturation was the same as that of the wild-type protein. In the presence of Ca2+ ions in solution the stability of the mutant T4 phage lysozyme was less than that of the wild-type protein. It is suggested that the binding of Ca2+ by the mutant is accompanied by considerable conformational changes in the 'corrected' loop, which can lead to the Ca(2+)-induced destabilization of the protein.
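
    The mutant name G51D/N53D follows the usual wild-type residue, position, new residue convention. The Python sketch below shows one way such a specification could be applied to a sequence string. The function name is hypothetical and the toy sequence is a placeholder, not the real T4 lysozyme sequence.

```python
import re

def apply_mutations(seq: str, spec: str) -> str:
    """Apply point mutations written like 'G51D/N53D' (1-based positions)."""
    residues = list(seq)
    for token in spec.split("/"):
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", token)
        if match is None:
            raise ValueError(f"cannot parse mutation {token!r}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if position < 1 or position > len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"expected {wild_type} at {position}, found {residues[position - 1]}")
        residues[position - 1] = new
    return "".join(residues)

# Placeholder sequence with G at position 51 and N at position 53
toy = "A" * 50 + "GANA" + "A" * 10
mutant = apply_mutations(toy, "G51D/N53D")
print(toy[50:53], "->", mutant[50:53])  # GAN -> DAD
```

    Checking the wild-type residue before substituting guards against off-by-one numbering errors, which are common when a sequence carries a tag or signal peptide.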

  13. Characterization of the microbial acid mine drainage microbial community using culturing and direct sequencing techniques.

    PubMed

    Auld, Ryan R; Myre, Maxine; Mykytczuk, Nadia C S; Leduc, Leo G; Merritt, Thomas J S

    2013-05-01

    We characterized the bacterial community from an AMD tailings pond using both classical culturing and modern direct sequencing techniques and compared the two methods. Acid mine drainage (AMD) is produced by the environmental and microbial oxidation of minerals dissolved from mining waste. Surprisingly, we know little about the microbial communities associated with AMD, despite the fundamental ecological roles of these organisms and large-scale economic impact of these waste sites. AMD microbial communities have classically been characterized by laboratory culturing-based techniques and more recently by direct sequencing of marker gene sequences, primarily the 16S rRNA gene. In our comparison of the techniques, we find that their results are complementary, overall indicating very similar community structure with similar dominant species, but with each method identifying some species that were missed by the other. We were able to culture the majority of species that our direct sequencing results indicated were present, primarily species within the Acidithiobacillus and Acidiphilium genera, although estimates of relative species abundance were only obtained from direct sequencing. Interestingly, our culture-based methods recovered four species that had been overlooked from our sequencing results because of the rarity of the marker gene sequences, likely members of the rare biosphere. Further, direct sequencing indicated that a single genus, completely missed in our culture-based study, Legionella, was a dominant member of the microbial community. Our results suggest that while either method does a reasonable job of identifying the dominant members of the AMD microbial community, together the methods combine to give a more complete picture of the true diversity of this environment.

  14. Cloning of the BssHII restriction-modification system in Escherichia coli: BssHII methyltransferase contains circularly permuted cytosine-5 methyltransferase motifs.

    PubMed Central

    Xu, S; Xiao, J; Posfai, J; Maunus, R; Benner, J

    1997-01-01

    BssHII restriction endonuclease cleaves 5'-GCGCGC-3' on double-stranded DNA between the first and second bases to generate a four base 5' overhang. BssHII restriction endonuclease was purified from the native Bacillus stearothermophilus H3 cells and its N-terminal amino acid sequence was determined. Degenerate PCR primers were used to amplify the first 20 codons of the BssHII restriction endonuclease gene. The BssHII restriction endonuclease gene (bssHIIR) and the cognate BssHII methyltransferase gene (bssHIIM) were cloned in Escherichia coli by amplification of Bacillus stearothermophilus genomic DNA using PCR and inverse PCR. BssHII methyltransferase (M.BssHII) contains all 10 conserved cytosine-5 methyltransferase motifs, but motifs IX and X precede motifs I-VIII. Thus, the conserved motifs of M.BssHII are circularly permuted relative to the motif organizations of other cytosine-5 methyltransferases. M.BssHII and the non-cognate multi-specific phiBssHII methyltransferase, M.phiBssHII [Schumann, J. et al. (1995) Gene, 157, 103-104] share 34% identity in amino acid sequences from motifs I-VIII, and 40% identity in motifs IX-X. A conserved arginine is located upstream of a TV dipeptide in the N-terminus of M.BssHII that may be responsible for the recognition of the guanine 5' of the target cytosine. The BssHII restriction endonuclease gene was expressed in E. coli via a T7 expression vector. PMID:9321648
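
    The cleavage rule stated above (a cut between the first and second bases of GCGCGC, leaving a four-base 5' overhang) can be sketched in Python for the top strand of a linear DNA. The function and constant names are hypothetical, and the toy sequence is invented for illustration.

```python
SITE = "GCGCGC"   # BssHII recognition sequence, read 5'->3'
CUT_OFFSET = 1    # top-strand cut falls between the first and second bases

# The site is palindromic, so the bottom strand is cut one base in from the
# other end; the bases between the two cuts form the 5' overhang.
OVERHANG = SITE[CUT_OFFSET:len(SITE) - CUT_OFFSET]

def digest(seq: str) -> list[str]:
    """Split the top strand of a linear DNA at every BssHII site."""
    seq = seq.upper()
    fragments, start = [], 0
    pos = seq.find(SITE)
    while pos != -1:
        cut = pos + CUT_OFFSET
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(SITE, pos + 1)  # also catches overlapping sites
    fragments.append(seq[start:])
    return fragments

toy = "AAAT" + SITE + "TTTA"   # invented toy sequence with one site
print(digest(toy))             # ['AAATG', 'CGCGCTTTA']
print(OVERHANG, len(OVERHANG)) # CGCG 4
```

    The overhang computed from the cut position is CGCG, four bases long, consistent with the description in the record.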

  15. [MOLECULAR EVOLUTION OF ION CHANNELS: AMINO ACID SEQUENCES AND 3D STRUCTURES].

    PubMed

    Korkosh, V S; Zhorov, B S; Tikhonov, D B

    2016-01-01

    An integral part of modern evolutionary biology is comparative analysis of structure and function of macromolecules such as proteins. The first and critical step to understand evolution of homologous proteins is their amino acid sequence alignment. However, standard algorithms do not provide unambiguous sequence alignments for proteins of poor homology. More reliable results can be obtained by comparing experimental 3D structures obtained at atomic resolution, for instance, with the aid of X-ray structural analysis. If such structures are lacking, homology modeling is used, which may take into account indirect experimental data on functional roles of individual amino-acid residues. An important problem is that the sequence alignment, which reflects genetic modifications, does not necessarily correspond to the functional homology. The latter depends on three-dimensional structures which are critical for natural selection. Since alignment techniques relying only on the analysis of primary structures carry no information on the functional properties of proteins, including 3D structures into consideration is very important. Here we consider several examples involving ion channels and demonstrate that alignment of their three-dimensional structures can significantly improve sequence alignments obtained by traditional methods.

  16. The amino acid sequence of the aspartate aminotransferase from baker's yeast (Saccharomyces cerevisiae).

    PubMed Central

    Cronin, V B; Maras, B; Barra, D; Doonan, S

    1991-01-01

    1. The single (cytosolic) aspartate aminotransferase was purified in high yield from baker's yeast (Saccharomyces cerevisiae). 2. Amino-acid-sequence analysis was carried out by digestion of the protein with trypsin and with CNBr; some of the peptides produced were further subdigested with Staphylococcus aureus V8 proteinase or with pepsin. Peptides were sequenced by the dansyl-Edman method and/or by automated gas-phase methods. The amino acid sequence obtained was complete except for a probable gap of two residues as indicated by comparison with the structures of counterpart proteins in other species. 3. The N-terminus of the enzyme is blocked. Fast-atom-bombardment m.s. was used to identify the blocking group as an acetyl one. 4. Alignment of the sequence of the enzyme with those of vertebrate cytosolic and mitochondrial aspartate aminotransferases and with the enzyme from Escherichia coli showed that about 25% of residues are conserved between these distantly related forms. 5. Experimental details and confirmatory data for the results presented here are given in a Supplementary Publication (SUP 50164, 25 pages) that has been deposited at the British Library Document Supply Centre, Boston Spa, Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies can be obtained on the terms indicated in Biochem. J. (1991) 273, 5. PMID:1859361

  17. Analysis of amino acid sequence variations and immunoglobulin E-binding epitopes of German cockroach tropomyosin.

    PubMed

    Jeong, Kyoung Yong; Lee, Jongweon; Lee, In-Yong; Ree, Han-Il; Hong, Chein-Soo; Yong, Tai-Soon

    2004-09-01

    The allergenicities of tropomyosins from different organisms have been reported to vary. The cDNA encoding German cockroach tropomyosin (Bla g 7) was isolated, expressed, and characterized previously. In the present study, the amino acid sequence variations in German cockroach tropomyosin were analyzed in order to investigate their influence on allergenicity. We also undertook the identification of immunodominant peptides containing immunoglobulin E (IgE) epitopes which may facilitate the development of diagnostic and immunotherapeutic strategies based on the recombinant proteins. Two-dimensional gel electrophoresis and immunoblot analysis with mouse anti-recombinant German cockroach tropomyosin serum were performed to investigate the isoforms at the protein level. Reverse transcriptase PCR (RT-PCR) was applied to examine the sequence diversity. Eleven different variants of the deduced amino acid sequences were identified by RT-PCR. German cockroach tropomyosin has only minor sequence variations that did not seem to affect its allergenicity significantly. These results support the molecular basis underlying the cross-reactivities of arthropod tropomyosins. Recombinant fragments were also generated by PCR, and IgE-binding epitopes were assessed by enzyme-linked immunosorbent assay. Sera from seven patients revealed heterogeneous IgE-binding responses. This study demonstrates multiple IgE-binding epitope regions in a single molecule, suggesting that full-length tropomyosin should be used for the development of diagnostic and therapeutic reagents.

  18. Investigations on dendrimer space reveal solid and liquid tumor growth-inhibition by original phosphorus-based dendrimers and the corresponding monomers and dendrons with ethacrynic acid motifs.

    PubMed

    El Brahmi, Nabil; Mignani, Serge M; Caron, Joachim; El Kazzouli, Saïd; Bousmina, Mosto M; Caminade, Anne-Marie; Cresteil, Thierry; Majoral, Jean-Pierre

    2015-03-07

    The well-known reactive diuretic ethacrynic acid (EA, Edecrin), with low antiproliferative activities, was chemically modified and grafted onto phosphorus dendrimers and the corresponding simple branched phosphorus dendron-like derivatives affording novel nanodevices showing moderate to strong antiproliferative activities against liquid and solid tumor cell lines, respectively.

  19. Complete amino acid sequence of a histidine-rich proteolytic fragment of human ceruloplasmin.

    PubMed

    Kingston, I B; Kingston, B L; Putnam, F W

    1979-04-01

    The complete amino acid sequence has been determined for a fragment of human ceruloplasmin [ferroxidase; iron(II):oxygen oxidoreductase, EC 1.16.3.1]. The fragment (designated Cp F5) contains 159 amino acid residues and has a molecular weight of 18,650; it lacks carbohydrate, is rich in histidine, and contains one free cysteine that may be part of a copper-binding site. This fragment is present in most commercial preparations of ceruloplasmin, probably owing to proteolytic degradation, but can also be obtained by limited cleavage of single-chain ceruloplasmin with plasmin. Cp F5 probably is an intact domain attached to the COOH-terminal end of single-chain ceruloplasmin via a labile interdomain peptide bond. A model of the secondary structure predicted by empirical methods suggests that almost one-third of the amino acid residues are distributed in alpha helices, about a third in beta-sheet structure, and the remainder in beta turns and unidentified structures. Computer analysis of the amino acid sequence has not demonstrated a statistically significant relationship between this ceruloplasmin fragment and any other protein, but there is some evidence for an internal duplication.

  20. The LINKS motif zippers trans-acyltransferase polyketide synthase assembly lines into a biosynthetic megacomplex

    PubMed Central

    Gay, Darren C.; Wagner, Drew T.; Meinke, Jessica L.; Zogzas, Charles E.; Gay, Glen R.; Keatinge-Clay, Adrian T.

    2016-01-01

    Polyketides such as the clinically-valuable antibacterial agent mupirocin are constructed by architecturally-sophisticated assembly lines known as trans-acyltransferase polyketide synthases. Organelle-sized megacomplexes composed of several copies of trans-acyltransferase polyketide synthase assembly lines have been observed by others through transmission electron microscopy to be located at the Bacillus subtilis plasma membrane, where the synthesis and export of the antibacterial polyketide bacillaene takes place. In this work we analyze ten crystal structures of trans-acyltransferase polyketide synthase ketosynthase domains, seven of which are reported here for the first time, to characterize a motif capable of zippering assembly lines into a megacomplex. While each of the three-helix LINKS (Laterally-INteracting Ketosynthase Sequence) motifs is observed to similarly dock with a spatially-reversed copy of itself through hydrophobic and ionic interactions, the amino acid sequences of this motif are not conserved. Such a code is appropriate for mediating homotypic contacts between assembly lines to ensure the ordered self-assembly of a noncovalent, yet tightly-knit, enzymatic network. LINKS-mediated lateral interactions would also have the effect of bolstering the vertical association of the polypeptides that comprise a polyketide synthase assembly line. PMID:26724270

  1. Finding specific RNA motifs: Function in a zeptomole world?

    PubMed Central

    KNIGHT, ROB; YARUS, MICHAEL

    2003-01-01

    We have developed a new method for estimating the abundance of any modular (piecewise) RNA motif within a longer random region. We have used this method to estimate the size of the active motifs available to modern SELEX experiments (picomoles of unique sequences) and to a plausible RNA World (zeptomoles of unique sequences: 1 zmole = 602 sequences). Unexpectedly, activities such as specific isoleucine binding are almost certainly present in zeptomoles of molecules, and even ribozymes such as self-cleavage motifs may appear (depending on assumptions about the minimal structures). The number of specified nucleotides is not the only important determinant of a motif’s rarity: The number of modules into which it is divided, and the details of this division, are also crucial. We propose three maxims for easily isolated motifs: the Maxim of Minimization, the Maxim of Multiplicity, and the Maxim of the Median. These maxims together state that selected motifs should be small and composed of as many separate, equally sized modules as possible. For evenly divided motifs with four modules, the largest accessible activity in picomole scale (1–1000 pmole) pools of length 100 is about 34 nucleotides; while for zeptomole scale (1–1000 zmole) pools it is about 20 specific nucleotides (50% probability of occurrence). This latter figure includes some ribozymes and aptamers. Consequently, an RNA metabolism apparently could have begun with only zeptomoles of RNA molecules. PMID:12554865
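
    The unit conversion quoted in this abstract (1 zmole = 602 sequences) follows directly from the Avogadro constant. A minimal sketch of that arithmetic is given below; the function and dictionary names are hypothetical, and the sketch covers only the unit conversion, not the authors' motif-abundance method.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI definition)

# SI prefixes for the two pool scales compared in the abstract
SI_PREFIX = {"pico": 1e-12, "zepto": 1e-21}


def molecules(amount, prefix):
    """Number of molecules in `amount` prefixed moles (e.g. 1 zeptomole)."""
    return amount * SI_PREFIX[prefix] * AVOGADRO


# One zeptomole is about 602 molecules, matching the figure in the abstract.
print(round(molecules(1, "zepto")))  # 602

# Ratio between the picomole and zeptomole scales: nine orders of magnitude.
print(f"{molecules(1, 'pico') / molecules(1, 'zepto'):.0e}")
```

    The nine orders of magnitude separating the two scales give a sense of how much smaller a plausible RNA World pool would be than a modern SELEX pool.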

  2. cWINNOWER algorithm for finding fuzzy DNA motifs

    NASA Technical Reports Server (NTRS)

    Liang, S.; Samanta, M. P.; Biegel, B. A.

    2004-01-01

    The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if a clique consisting of a sufficiently large number of mutated copies of the motif (i.e., the signals) is present in the DNA sequence. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum detectable clique size qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12,000 for (l, d) = (15, 4). Copyright Imperial College Press.
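
    The definition of a signal used in this abstract (a pattern differing from a motif of length l by up to d mutations) can be illustrated with a Hamming-distance check. The sketch below is not the cWINNOWER algorithm: it assumes the motif is already known, whereas the algorithm described above searches for cliques of signals without knowing the motif. All function names are hypothetical.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_signals(sequence, motif, d):
    """List (position, window, distance) for every window of len(motif)
    in `sequence` that differs from `motif` by at most d substitutions."""
    l = len(motif)
    hits = []
    for i in range(len(sequence) - l + 1):
        window = sequence[i:i + l]
        dist = hamming(window, motif)
        if dist <= d:
            hits.append((i, window, dist))
    return hits


def could_share_motif(a, b, d):
    """Two signals derived from one motif can differ in at most 2*d
    positions (triangle inequality), so this is a necessary condition
    for two windows to be copies of the same motif."""
    return hamming(a, b) <= 2 * d


print(find_signals("TTACGTTT", "ACGA", 1))  # [(2, 'ACGT', 1)]
```

    When the motif is unknown, a pairwise test like `could_share_motif` is the only comparison available between candidate windows, which is why the problem is naturally posed in terms of groups of mutually similar windows.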

  3. Processing and amino acid sequence analysis of the mouse mammary tumor virus env gene product.

    PubMed Central

    Arthur, L O; Copeland, T D; Oroszlan, S; Schochetman, G

    1982-01-01

    The envelope proteins of mouse mammary tumor virus (MMTV) are synthesized from a subgenomic 24S mRNA as a 75,000-dalton glycosylated precursor polyprotein which is eventually processed to the mature glycoproteins gp52 and gp36. In vivo synthesis of this env precursor in the presence of the core glycosylation inhibitor tunicamycin yielded a precursor of approximately 61,000 daltons (P61env). However, a 67,000-dalton protein (P67env) was obtained from cell-free translation with the MMTV 24S mRNA as the template. To determine whether the portion of the protein cleaved from P67env to give P61env was removed from the NH2-terminal end of P67env and as such would represent a leader sequence, the NH2-terminal amino acid sequence of the terminal peptide gp52 was determined. Glutamic acid, and not methionine, was found to be the amino-terminal residue of gp52, indicating that the cleaved portion was derived from the NH2-terminal end of P67env. The NH2-terminal amino acid sequences of gp52's from endogenous and exogenous C3H MMTVs were determined through 46 residues and found to be identical. However, amino acid composition and type-specific gp52 radioimmunoassays from MMTVs grown in heterologous cells indicated primary structure differences between gp52's of the two viruses. The nucleic acid sequence of cloned MMTV DNA fragments (J. Majors and H. E. Varmus, personal communication) in conjunction with the NH2-terminal sequence of gp52 allowed localization of the env gene in the MMTV genome. Nucleotides coding for the NH2 terminus of gp52 begin approximately 0.8 kilobase to the 3' side of the single EcoRI cleavage site. Localization of the env gene at that point agrees with the proposed gene order -gag-pol-env- and also allows sufficient coding potential for the glycoprotein precursor without extending into the long terminal repeat. PMID:6281457

  4. Complete Genome Sequence of a thermotolerant sporogenic lactic acid bacterium, Bacillus coagulans strain 36D1

    PubMed Central

    Rhee, Mun Su; Moritz, Brélan E.; Xie, Gary; Glavina del Rio, T.; Dalin, E.; Tice, H.; Bruce, D.; Goodwin, L.; Chertkov, O.; Brettin, T.; Han, C.; Detter, C.; Pitluck, S.; Land, Miriam L.; Patel, Milind; Ou, Mark; Harbrucker, Roberta; Ingram, Lonnie O.; Shanmugam, K. T.

    2011-01-01

    Bacillus coagulans is a ubiquitous soil bacterium that grows at 50-55 °C and pH 5.0 and ferments various sugars that constitute plant biomass to L(+)-lactic acid. The ability of this sporogenic lactic acid bacterium to grow at 50-55 °C and pH 5.0 makes this organism an attractive microbial biocatalyst for production of optically pure lactic acid at industrial scale not only from glucose derived from cellulose but also from xylose, a major constituent of hemicellulose. This bacterium is also considered a potential probiotic. The complete genome sequence of a representative strain, B. coagulans strain 36D1, is presented and discussed. PMID:22675583

  5. BeadCons: detection of nucleic acid sequences by flow cytometry.

    PubMed

    Horejsh, Douglas; Martini, Federico; Capobianchi, Maria Rosaria

    2005-11-01

    Molecular beacons are single-stranded nucleic acid structures with a terminal fluorophore and a distal, terminal quencher. These molecules are typically used in real-time PCR assays, but have also been conjugated with solid matrices. This unit describes protocols related to molecular beacon-conjugated beads (BeadCons), whose specific hybridization with complementary target sequences can be resolved by cytometry. Assay sensitivity is achieved through the concentration of fluorescence signal on discrete particles. By using molecular beacons with different fluorophores and microspheres of different sizes, it is possible to construct a fluid array system with each bead corresponding to a specific target nucleic acid. Methods are presented for the design, construction, and use of BeadCons for the specific, multiplexed detection of unlabeled nucleic acids in solution. The use of bead-based detection methods will likely lead to the design of new multiplex molecular diagnostic tools.

  6. Measuring nanometer distances in nucleic acids using a sequence-independent nitroxide probe

    PubMed Central

    Qin, Peter Z; Haworth, Ian S; Cai, Qi; Kusnetzow, Ana K; Grant, Gian Paola G; Price, Eric A; Sowa, Glenna Z; Popova, Anna; Herreros, Bruno; He, Honghang

    2008-01-01

    This protocol describes the procedures for measuring nanometer distances in nucleic acids using a nitroxide probe that can be attached to any nucleotide within a given sequence. Two nitroxides are attached to phosphorothioates that are chemically substituted at specific sites of DNA or RNA. Inter-nitroxide distances are measured using a four-pulse double electron–electron resonance technique, and the measured distances are correlated to the parent structures using a Web-accessible computer program. Four to five days are needed for sample labeling, purification and distance measurement. The procedures described herein provide a method for probing global structures and studying conformational changes of nucleic acids and protein/nucleic acid complexes. PMID:17947978

  7. Complete Genome Sequence of a thermotolerant sporogenic lactic acid bacterium, Bacillus coagulans strain 36D1.

    PubMed

    Rhee, Mun Su; Moritz, Brélan E; Xie, Gary; Glavina Del Rio, T; Dalin, E; Tice, H; Bruce, D; Goodwin, L; Chertkov, O; Brettin, T; Han, C; Detter, C; Pitluck, S; Land, Miriam L; Patel, Milind; Ou, Mark; Harbrucker, Roberta; Ingram, Lonnie O; Shanmugam, K T

    2011-12-31

    Bacillus coagulans is a ubiquitous soil bacterium that grows at 50-55 °C and pH 5.0 and ferments various sugars that constitute plant biomass to L(+)-lactic acid. The ability of this sporogenic lactic acid bacterium to grow at 50-55 °C and pH 5.0 makes this organism an attractive microbial biocatalyst for production of optically pure lactic acid at industrial scale not only from glucose derived from cellulose but also from xylose, a major constituent of hemicellulose. This bacterium is also considered a potential probiotic. The complete genome sequence of a representative strain, B. coagulans strain 36D1, is presented and discussed.

  8. The amino acid sequence of Lady Amherst's pheasant (Chrysolophus amherstiae) and golden pheasant (Chrysolophus pictus) egg-white lysozymes.

    PubMed

    Araki, T; Kuramoto, M; Torikata, T

    1990-09-01

    The amino acid sequences of Lady Amherst's pheasant and golden pheasant egg-white lysozymes have been determined. The carboxymethylated lysozymes were digested with trypsin followed by sequencing of the tryptic peptides. Lady Amherst's pheasant lysozyme proved to consist of 129 amino acid residues, and a relative molecular mass of 14,423 Da was calculated. This lysozyme had 6 amino acid substitutions when compared with hen egg-white lysozyme: Phe3 to Tyr, His15 to Leu, Gln41 to His, Asn77 to His, Gln121 to Asn, and a newly found substitution of Ile124 to Thr. The amino acid sequence of golden pheasant lysozyme was identical to that of Lady Amherst's pheasant lysozyme. The phylogenetic tree constructed by the comparison of amino acid sequences of phasianoid bird lysozymes revealed a minimum genetic distance between these pheasants and the turkey-peafowl group.

  9. Investigations on dendrimer space reveal solid and liquid tumor growth-inhibition by original phosphorus-based dendrimers and the corresponding monomers and dendrons with ethacrynic acid motifs

    NASA Astrophysics Data System (ADS)

    El Brahmi, Nabil; Mignani, Serge M.; Caron, Joachim; El Kazzouli, Saïd; Bousmina, Mosto M.; Caminade, Anne-Marie; Cresteil, Thierry; Majoral, Jean-Pierre

    2015-02-01

    The well-known reactive diuretic ethacrynic acid (EA, Edecrin), with low antiproliferative activities, was chemically modified and grafted onto phosphorus dendrimers and the corresponding simple branched phosphorus dendron-like derivatives affording novel nanodevices showing moderate to strong antiproliferative activities against liquid and solid tumor cell lines, respectively. Electronic supplementary information (ESI) available. See DOI: 10.1039/c4nr05983b

  10. A 25-Amino Acid Sequence of the Arabidopsis TGD2 Protein Is Sufficient for Specific Binding of Phosphatidic Acid

    PubMed Central

    Lu, Binbin; Benning, Christoph

    2009-01-01

    Genetic analysis suggests that the TGD2 protein of Arabidopsis is required for the biosynthesis of endoplasmic reticulum-derived thylakoid lipids. TGD2 is proposed to be the substrate-binding protein of a presumed lipid transporter consisting of the TGD1 (permease) and TGD3 (ATPase) proteins. The TGD1, -2, and -3 proteins are localized in the inner chloroplast envelope membrane. TGD2 appears to be anchored with an N-terminal membrane-spanning domain into the inner envelope membrane, whereas the C-terminal domain faces the intermembrane space. It was previously shown that the C-terminal domain of TGD2 binds phosphatidic acid (PtdOH). To investigate the PtdOH binding site of TGD2 in detail, the C-terminal domain of the TGD2 sequence lacking the transit peptide and transmembrane sequences was fused to the C terminus of the Discosoma sp. red fluorescent protein (DR). This greatly improved the solubility of the resulting DR-TGD2C fusion protein following production in Escherichia coli. The DR-TGD2C protein bound PtdOH with high specificity, as demonstrated by membrane lipid-protein overlay and liposome association assays. Internal deletion and truncation mutagenesis identified a previously undescribed minimal 25-amino acid fragment in the C-terminal domain of TGD2 that is sufficient for PtdOH binding. Binding characteristics of this 25-mer were distinctly different from those of TGD2C, suggesting that additional sequences of TGD2 providing the proper context for this 25-mer are needed for wild type-like PtdOH binding. PMID:19416982

  11. Nucleotide sequence of the luxC gene encoding fatty acid reductase of the lux operon from Photobacterium leiognathi.

    PubMed

    Lin, J W; Chao, Y F; Weng, S F

    1993-02-26

    The nucleotide sequence of the luxC gene (EMBL Accession No. 65156) encoding fatty acid reductase (FAR) of the lux operon from Photobacterium leiognathi PL741 was determined and the encoded amino acid sequence deduced. The fatty acid reductase is a component of the fatty